ncnn

ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. ncnn has considered deployment and use on mobile phones from the very beginning of its design. It has no third-party dependencies, is cross-platform, and runs faster than all known open-source frameworks on mobile-phone CPUs. With ncnn's efficient implementation, developers can easily deploy deep learning algorithm models to mobile platforms, create intelligent apps, and bring artificial intelligence to your fingertips. ncnn is currently used in many Tencent applications, such as QQ, Qzone, WeChat, and Pitu.



Technical discussion QQ group: 637093648 (lots of experts); join-question answer: 卷卷卷卷卷

Pocky QQ group (MLIR YES!): 677104663 (lots of experts)

Telegram Group https://t.me/ncnnyes

Discord Channel https://discord.gg/YRsxgmF


Current building status matrix

System CPU (32bit) CPU (64bit) GPU (32bit) GPU (64bit)
Linux (GCC) Build Status Build Status Build Status
Linux (Clang) Build Status Build Status Build Status
Linux (ARM) Build Status Build Status
Linux (MIPS) Build Status Build Status
Linux (RISC-V) Build Status
Windows (VS2015) Build Status Build Status
Windows (VS2017) Build Status Build Status Build Status
Windows (VS2019) Build Status Build Status Build Status
macOS Build Status Build Status
macOS (ARM) Build Status Build Status
Android Build Status Build Status Build Status Build Status
Android-x86 Build Status Build Status Build Status Build Status
iOS Build Status Build Status Build Status
iOS Simulator Build Status Build Status
WebAssembly Build Status
RISC-V GCC/Newlib Build Status Build Status

Supports most commonly used CNN networks



HowTo

how to build ncnn library on Linux / Windows / macOS / Raspberry Pi 3 / Android / NVIDIA Jetson / iOS / WebAssembly / AllWinner D1 / Loongson 2K1000

download prebuilt binary package for Android and iOS

use ncnn with alexnet with detailed steps, recommended for beginners :)


use netron for ncnn model visualization

out-of-the-box web model conversion

ncnn low-level operation api

ncnn param and model file spec

ncnn operation param weight table

how to implement custom layer step by step
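The gist of the custom-layer guide, as a sketch: derive from ncnn::Layer, implement forward (or forward_inplace), and register the class with the net before loading the param file. The MyRelu6 layer below is a made-up example for illustration, not part of ncnn:

```cpp
#include "net.h"  // ncnn; DEFINE_LAYER_CREATOR comes in via layer.h

// hypothetical custom layer: clamp activations to [0, 6]
class MyRelu6 : public ncnn::Layer
{
public:
    MyRelu6()
    {
        one_blob_only = true;   // exactly one input blob, one output blob
        support_inplace = true; // allows forward_inplace
    }

    virtual int forward_inplace(ncnn::Mat& bottom_top_blob, const ncnn::Option& /*opt*/) const
    {
        for (int q = 0; q < bottom_top_blob.c; q++)
        {
            float* ptr = bottom_top_blob.channel(q);
            for (int i = 0; i < bottom_top_blob.w * bottom_top_blob.h; i++)
            {
                if (ptr[i] < 0.f) ptr[i] = 0.f;
                if (ptr[i] > 6.f) ptr[i] = 6.f;
            }
        }
        return 0;
    }
};

DEFINE_LAYER_CREATOR(MyRelu6)

// register before load_param so the parser can resolve the layer type:
//   ncnn::Net net;
//   net.register_custom_layer("MyRelu6", MyRelu6_layer_creator);
//   net.load_param("model.param");
```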


FAQ

ncnn throw error

ncnn produce wrong result

ncnn vulkan


Features

  • Supports convolutional neural networks, multiple inputs, and multi-branch structures; can compute only part of the branches
  • No third-party library dependencies; does not rely on BLAS / NNPACK or any other computing framework
  • Pure C++ implementation, cross-platform; supports Android, iOS, and so on
  • Careful ARM NEON assembly-level optimization; extremely fast computation
  • Sophisticated memory management and data-structure design; very low memory footprint
  • Supports multi-core parallel computing acceleration and ARM big.LITTLE CPU scheduling optimization
  • Supports GPU acceleration via the next-generation low-overhead Vulkan API
  • Extensible model design; supports 8-bit quantization and half-precision floating-point storage; can import caffe/pytorch/mxnet/onnx/darknet/keras/tensorflow(mlir) models
  • Supports zero-copy loading of network models directly referenced from memory
  • Custom layer implementations can be registered and extended
  • Well, it is strong; not afraid of being stuffed with 卷 QvQ
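Taken together, a typical use of these features looks roughly like this; the file names, input size, and the "data"/"output" blob names below are placeholders for your own model:

```cpp
#include "net.h"  // ncnn

// minimal inference sketch: load a model, feed pixels, read one output blob
int run_inference(const unsigned char* bgr_pixels, int img_w, int img_h)
{
    ncnn::Net net;
    // net.opt.use_vulkan_compute = true;  // optional Vulkan GPU path
    if (net.load_param("model.param") != 0)
        return -1;
    if (net.load_model("model.bin") != 0)
        return -1;

    // convert + resize the pixel buffer into an ncnn::Mat in one step
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr_pixels, ncnn::Mat::PIXEL_BGR, img_w, img_h, 224, 224);

    const float mean_vals[3] = {127.5f, 127.5f, 127.5f};
    const float norm_vals[3] = {1 / 127.5f, 1 / 127.5f, 1 / 127.5f};
    in.substract_mean_normalize(mean_vals, norm_vals);

    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("output", out);
    return 0;
}
```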


supported platform matrix

  • ✅ = known to work and runs fast with good optimization
  • ✔️ = known to work, but speed may not be fast enough
  • ❔ = should work, but not confirmed
  • / = not applicable
Windows Linux Android macOS iOS
intel-cpu ✔️ ✔️ ✔️ /
intel-gpu ✔️ ✔️ /
amd-cpu ✔️ ✔️ ✔️ /
amd-gpu ✔️ ✔️ /
nvidia-gpu ✔️ ✔️ /
qcom-cpu ✔️ / /
qcom-gpu ✔️ ✔️ / /
arm-cpu / /
arm-gpu ✔️ / /
apple-cpu / / / ✔️
apple-gpu / / / ✔️ ✔️

Example project


License

BSD 3 Clause

Comments
  • Testing MTCNN gives completely different results

    Running MTCNN's PNet and RNet produces results that differ greatly from the reference output, and feeding a face crop to RNet also yields a very low score.

        const float mean_vals[3] = {127.5f, 127.5f, 127.5f};
        const float norm_vals[3] = {0.0078125f, 0.0078125f, 0.0078125f};

        int hs = ceil(img_h * scales[i]);
        int ws = ceil(img_w * scales[i]);
        ncnn::Mat pnet_img = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR, img_w, img_h, ws, hs);
        pnet_img.substract_mean_normalize(mean_vals, norm_vals);
        ncnn::Extractor Pnet_ex = Pnet.create_extractor();
        Pnet_ex.set_light_mode(true);
        Pnet_ex.input("data", pnet_img);
        ncnn::Mat score, loc;
        Pnet_ex.extract("prob1", score);
        Pnet_ex.extract("conv4-2", loc);

        if (*(score_data + i) >= thresh) ...

  • prepare for release with experimental gpu inference capability


    shader

    • [x] priorbox (ssd)
    • [x] permute (ssd)
    • [x] deconvolution
    • [x] deconvolutiondepthwise (yolo)
    • [x] interp (upsample)
    • [x] reorg (yolov2)
    • [x] prelu
    • [x] reshape
    • [x] tanh
    • [x] sigmoid
    • [x] clip
    • [x] absval
    • [x] shufflechannel (shufflenet)

    example

    • [x] squeezenet-gpu
    • [x] mobilenet-ssd-gpu
    • [x] mobilenet-yolov3-gpu

    benchncnn

    • [x] shufflenet
    • [x] mobilenet-ssd / squeezenet-ssd
    • [x] mobilenet-yolo / mobilenet-yolov3

    binary release

    • [x] vulkan-enabled android prebuilt library
    • [x] vulkan-enabled ios prebuilt framework (arm64 only)

    documentation

    • [x] faq about common vulkan api error
    • [x] faq about packing
    • [x] faq about hybrid gpu/cpu inference practice
    • [x] faq about op fusion
  • Why do the parameters change during model file conversion?

    I generated an ONNX model with pytorch, simplified it with onnxsim, converted it to a .param file with onnx2ncnn, and then optimized the .param with ncnnoptimize. Opening the onnx model file and the .param file with netron, the w and b values of the last fully-connected layer are different. What is going on? Has anyone run into this?

  • mtcnn becomes slower with the 20191113 version

    Code: https://github.com/moli232777144/mtcnn_ncnn. With the ncnn library bundled with the code (dated 20180516) and NDK android-ndk-r16b, detecting the bundled Kobe image 100 times with 2 threads averages 45 ms. After updating ncnn to the 20191113 version and the NDK to android-ndk-r19c, the same 100 detections with 2 threads average 106 ms. Note: the test phone is a vivo NEX A (Snapdragon 710); the code was unchanged after download except for updating the gradle version. Before and after, only the ncnn library and SDK versions changed (if the SDK version is kept the same, the 20191113 version takes even longer). This is a fairly large difference in latency; what might be the cause? Many thanks!

    Update: after testing each historical version one by one, the 20190611 version still averages 44 ms over 1000 iterations with 2 threads, but from the 20190908 version onward it becomes 107 ms.

  • After int8 quantization: yolov5 produces no detection boxes, and lightweight detectors such as yolo-fastest lose significant accuracy

    Some feedback from my quantization experiments, plus a few questions:
    ① Preprocessing and calibration data really do affect the accuracy of the quantized model (the mean/norm values for coco / ImageNet / voc need to be handled separately after conversion; this point can be ignored).
    ② Some models produce no detection boxes after quantization (e.g. yolov5 — some of its processing modules are fairly complex, and I don't know whether that is related).
    ③ Some lightweight models actually get slower after quantization, and their accuracy drops severely. The slowdown has a precondition: it occurs on processors like Intel i7/i5. Two possible reasons: first, as the author has said, ncnn prioritizes ARM-class architectures over such processors; second, yolo-fastest in fp16 already reaches the 20 ms level on Intel processors, even 10 ms with Vulkan, so quantization can hardly improve speed further. Things may be different on a dev board, but my Raspberry Pi is broken, so I could not verify.
    Two questions for nihui: first, what exactly causes the accuracy drop for yolo-fastest (by comparison, the fp16 detection accuracy is acceptable)? Second, why does a model like yolov5 produce no detection boxes at all — single-frame inference is indeed 3x faster, but no boxes means my quantization failed. Any pointers would be appreciated, thanks.

  • Does ncnn have a function like warpAffine?

    void warpAffine(InputArray src, OutputArray dst, InputArray M, Size dsize, int flags=INTER_LINEAR, int borderMode=BORDER_CONSTANT, const Scalar& borderValue=Scalar())

    Also, is there pure C++ code for multithreading? Thanks.
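    Recent ncnn versions do ship affine-warp helpers for packed pixel buffers (warpaffine_bilinear_c1/c2/c3/c4 and get_rotation_matrix, declared in mat.h) — worth checking in your copy. As a self-contained illustration of the 2x3 transform-matrix convention such an API expects, here is a sketch that builds the same rotation matrix as OpenCV's getRotationMatrix2D; the function name is my own:

```cpp
#include <cmath>

// build the row-major 2x3 affine matrix [m00 m01 m02, m10 m11 m12] of a
// rotation by `angle` degrees around (cx, cy) with uniform `scale`,
// following the same convention as OpenCV's getRotationMatrix2D
void make_rotation_matrix(float cx, float cy, float angle, float scale, float* tm)
{
    const float rad = angle * 3.14159265358979323846f / 180.f;
    const float a = scale * cosf(rad);
    const float b = scale * sinf(rad);
    tm[0] = a;  tm[1] = b;  tm[2] = (1.f - a) * cx - b * cy;
    tm[3] = -b; tm[4] = a;  tm[5] = b * cx + (1.f - a) * cy;
}
```

    The resulting matrix is what you would pass as the `tm` argument of ncnn's warpaffine_bilinear_c3 to resample a BGR buffer.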

  • Why do I get "undefined reference to '__kmpc_fork_call'" when building an .so from the .a produced by ncnn?

    Hello, a question: when I use ndk-build to link an .so against the .a produced by building ncnn, I get:

        undefined reference to '__kmpc_fork_call'
        undefined reference to '__kmpc_for_static_init_4'
        undefined reference to '__kmpc_for_static_fini'
        undefined reference to '__kmpc_for_static_init_4'
        layer/convolutiondepthwise.cpp:176: error: undefined reference to '__kmpc_for_static_init_8'

  • error when running inference with ncnn

    Both the build and runtime environments are linux. When running inference with my own code:

        [New LWP 17798] [New LWP 17800] [New LWP 17802] [New LWP 17801] [New LWP 17803] [New LWP 17799]
        [Thread debugging using libthread_db enabled]
        Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
        Core was generated by `./matting-infer'.
        Program terminated with signal SIGSEGV, Segmentation fault.
        #0  0x0000562614d73a39 in ncnn::NetPrivate::forward_layer(int, std::vector<ncnn::Mat, std::allocator<ncnn::Mat> >&, ncnn::Option const&) const ()
        [Current thread is 1 (Thread 0x7f979a75cc00 (LWP 17798))]

  • architecture changes for int8 packing


    • [x] requantize
    • [ ] armv7 im2col+gemm pack8
    • [ ] armv7 conv1x1 pack8
    • [ ] armv7 conv3x3s1 pack8
    • [ ] armv8 im2col+gemm pack8
    • [ ] armv8 conv1x1 pack8
    • [ ] armv8 conv3x3s1 pack8
  • yolact model converted with onnx2ncnn gives wrong mask data when predicting on windows

    Using pytorch 1.4 on ubuntu 16.04, I converted yolact.onnx to yolact-sim.onnx; then on windows I converted it with onnx2ncnn into yolact.param and yolact.bin, optimized the model with ncnnoptimize into yolact-opt.param / yolact-opt.bin, tweaked the model by changing three 0=-3 entries to 0=0, and finally called it from yolact.cpp. Only the mask result is wrong: the output contains many negative numbers, so the mask comes out wrong. The model parameters are:

        Convolution 615 1 1 614 616 0=32 1=1 5=1 6=8192 9=1
        Permute 617 1 1 616 617 0=3
        Concat 813 5 1 631 670 709 748 787 813 0=0
        Concat 814 5 1 643 682 721 760 799 814 0=0
        Concat 815 5 1 656 695 734 773 812 815 0=0
        Softmax 817 1 1 814 817 0=1 1=1

    The program uses:

        ex.extract("616", maskmaps);   // 138x138 x 32
        ex.extract("813", location);   // 4 x 19248
        ex.extract("815", mask);       // maskdim 32 x 19248
        ex.extract("817", confidence); // 81 x 19248

    Could you help me see where the problem is? Thanks.

  • The prebuilt ncnn-vulkan.framework is broken: ld: Framework not found ncnn-vulkan

    I used the prebuilt ncnn-vulkan.framework from github, added it to the project under Build Phases in the usual way, and got an error as soon as I ran: ld: Framework not found ncnn-vulkan. Other framework libraries work without any problem, and an ncnn-vulkan.framework built by someone else, integrated the same way, also works fine. Only the prebuilt ncnn-vulkan.framework from github shows this problem. Please take a look, thanks!

  • macOS: building ncnn-20220729 for i386/x86_64 fails

    Mac model: MacBook Pro (Retina, 13-inch, Early 2015), macOS 12.5. ncnn version: ncnn-20220729-full-source.

    With the openmp build option disabled, running

    mkdir -p build-ios
    cd build-ios
    cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/ios.toolchain.cmake -DIOS_PLATFORM=OS -DIOS_ARCH="armv7;armv7s;arm64;i386;x86_64" ..
    
    make -j 4
    

    produces the following errors:

    ......
    
    [ 42%] Building CXX object src/CMakeFiles/ncnn.dir/gpu.cpp.o
    [ 42%] Building CXX object src/CMakeFiles/ncnn.dir/command.cpp.o
    In file included from /Users/akuvox/Desktop/hm_AI_Project/ncnn-20220729-full-source/src/gpu.cpp:15:
    In file included from /Users/akuvox/Desktop/hm_AI_Project/ncnn-20220729-full-source/src/gpu.h:18:
    In file included from /Users/akuvox/Desktop/hm_AI_Project/ncnn-20220729-full-source/build_x86/src/platform.h:76:
    In file included from /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS15.5.sdk/usr/include/pthread.h:55:
    In file included from /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS15.5.sdk/usr/include/_types.h:27:
    In file included from /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS15.5.sdk/usr/include/sys/_types.h:32:
    /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS15.5.sdk/usr/include/sys/cdefs.h:870:2: error: Unsupported architecture
    #error Unsupported architecture
     ^
    ......
    

    Building only the arm* architectures does not have this problem.

  • On Android, with use_fp16_arithmetic not disabled, part of the convolution layer output is 0

    detail | 详细描述 | 詳細な説明

    When running model inference on Android without disabling use_fp16_arithmetic, part of the output of the first convolution layer is 0; on PC it is not. Has anyone hit a similar problem? Device: Qualcomm Snapdragon 778G Plus. Models: fp16 models optimized with ncnnoptimize (models exported with onnx2ncnn show the same problem). Output of the first convolution layer on Android with use_fp16_arithmetic enabled (left: Android, right: PC): {646132f6-5cf3-4c11-bcaa-b3c8eaa24548} With use_fp16_arithmetic disabled, the accuracy difference is tiny: {f29fd478-4e74-41fb-8952-4411675abadd}

  • [pnnx] convert error when there are nn.Parameters, example provided

    error:

        foldable_constant output_mean.1
        libc++abi: terminating with uncaught exception of type c10::Error: Tensors of type TensorImpl do not have sizes
        Exception raised from sizes_custom at /Users/runner/work/pytorch/pytorch/pytorch/c10/core/TensorImpl.cpp:416 (most recent call first):
        frame #0: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 92 (0x104fa7a1c in libc10.dylib)

    Here is the pytorch model I want to convert:

    
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    def print_shape(*tensors):
        # debug helper so the script is self-contained
        for t in tensors:
            print(t.shape)
    
    
    class Model(nn.Module):
        def __init__(self):
            super(Model, self).__init__()
    
            self.fc1 = nn.Linear(864, 256)
            self.fc2 = nn.Linear(256, 140)
            self.lnw1 = self._build_params([8, 512, 140])
    
        def _build_params(self, shape):
            return nn.Parameter(torch.randn(shape))
    
        def forward(self, x, ph):
            x = self.fc1(x)
            x = F.elu(x)
            x = self.fc2(x)
            x = F.elu(x)
            
            ph = ph.unsqueeze(-1).unsqueeze(-1)
            print_shape(x)
            lpn_w1 = torch.sum(self.lnw1 * ph, dim=1)
    
            print_shape(lpn_w1, x)
            lpn_h1 = torch.einsum("bij,bj->bi", lpn_w1, x)
            return lpn_h1
    
    def to_np_bin(data, bin_f):
        if isinstance(data, torch.Tensor):
            data = data.numpy()
        data.tofile(bin_f)
    
    
    def test():
        torch.set_grad_enabled(False)
        torch.manual_seed(1024)
    
    
        net = Model()
        net.eval()
    
        x = torch.ones(1, 864)
        ph = torch.ones(1, 8)
    
        a = net(x, ph)
        print(a)
    
        # export torchscript
        mod = torch.jit.trace(net, [x, ph])
        save_f = "test_simple_fc.pt"
        mod.save(save_f)
    
        a = mod(x, ph)
        print(a[:, :40])
        print(a.shape)
    
        to_np_bin(a, 'data.bin')
        to_np_bin(x, 'in.bin')
    
    
    if __name__ == "__main__":
        test()
    
    
    

    When converting with pnnx, I got the error above.

    Note that this happens in any scenario where I have nn.Parameters (tested with another, much larger model; same error).

    Please take a look and give some advice on how to fix it.

  • Finish the heapsort of simplestl partial_sort

    Algorithm

    The heapsort-based partial_sort works as follows: given an array of size n, we want the top k largest (smallest) elements.

    • Build a min-heap (max-heap) of size k from the first k elements of the array (using a custom heapify() function).
    • Scan the remaining n-k elements, comparing each with the heap top; if an element is larger (smaller) than the top, swap the two and re-heapify. After the scan we have the top k largest (smallest) elements, though not yet in strict order.
    • Using the fact that the heap top is the largest (smallest) of the k, run an ordinary heapsort over these k elements to produce a strictly descending (ascending) top-k array. All of the above operations are in-place.

    Complexity analysis

    • Time: the previous bubble-sort implementation was O(nk); the heapsort version is O((n-k)·log k).
    • Space: unchanged — everything is in-place, no extra memory is used.
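    A self-contained C++ sketch of the three steps above (heapify-style sift-down, streaming top-k, then an ordinary heapsort pass); the names are mine, not the SimpleSTL ones:

```cpp
#include <cstddef>
#include <utility>

// sift the element at `root` down through the heap a[0..n), where the heap
// is a max-heap with respect to `comp`
template <typename T, typename Compare>
void sift_down(T* a, size_t root, size_t n, Compare comp)
{
    for (;;)
    {
        size_t largest = root;
        const size_t l = 2 * root + 1;
        const size_t r = 2 * root + 2;
        if (l < n && comp(a[largest], a[l])) largest = l;
        if (r < n && comp(a[largest], a[r])) largest = r;
        if (largest == root) return;
        std::swap(a[root], a[largest]);
        root = largest;
    }
}

// in-place heap-based partial sort: afterwards a[0..k) holds the k smallest
// elements of a[0..n) with respect to `comp`, in sorted order
template <typename T, typename Compare>
void partial_sort_heap(T* a, size_t k, size_t n, Compare comp)
{
    if (k == 0 || k > n) return;
    // 1. build a max-heap (w.r.t. comp) from the first k elements
    for (size_t i = k / 2; i > 0; i--)
        sift_down(a, i - 1, k, comp);
    // 2. scan the rest; anything "smaller" than the heap top replaces it
    for (size_t i = k; i < n; i++)
    {
        if (comp(a[i], a[0]))
        {
            std::swap(a[i], a[0]);
            sift_down(a, 0, k, comp);
        }
    }
    // 3. ordinary heapsort over the k elements -> ascending order
    for (size_t end = k; end > 1; end--)
    {
        std::swap(a[0], a[end - 1]);
        sift_down(a, 0, end - 1, comp);
    }
}
```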
  • How do I use pnnx to convert a multi-input model like rvm mobilenetv3 to ncnn?

    error log | 日志或报错信息 | ログ

        terminate called after throwing an instance of 'c10::Error'
          what():  forward() is missing value for argument 'r1'. Declaration: forward(__torch__.model.model.MattingNetwork self, Tensor src, Tensor r1, Tensor r2, Tensor r3, Tensor r4, Tensor downsample_ratio) -> (Tensor[])
        Exception raised from checkAndNormalizeInputs at /pytorch/aten/src/ATen/core/function_schema_inl.h:239 (most recent call first):
        frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fdf348a9a22 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
        frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fdf348a63db in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
        ...
        frame #9: __libc_start_main + 0xe7 (0x7fdedda05c87 in /lib/x86_64-linux-gnu/libc.so.6)

    model | 模型 | モデル

    1. original model rvm_mobilenetv3_fp32.zip

    how to reproduce | 复现步骤 | 再現方法

    1. conversion command: ./build/src/pnnx rvm_mobilenetv3_fp32.pt inputshape=[1,3,1080,1920]
    2. the model actually has six inputs: src = torch.randn(1, 3, 1080, 1920).to("cpu"), rec = (torch.zeros([1, 1, 1, 1]).to("cpu"),) * 4, downsample_ratio = torch.tensor([0.25])
    3.
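    For reference, once such a model is converted, feeding a multi-input network in ncnn is just one ex.input call per input blob on a single Extractor. A sketch for an RVM-style recurrent model — the blob names ("src", "r1i"…"r4i", "fgr", "pha", "r1o"…"r4o") are assumptions taken from typical RVM ports, so check the generated .param for the real ones:

```cpp
#include "net.h"  // ncnn

// one inference step; the r* mats carry recurrent state between frames
void rvm_step(ncnn::Net& net, const ncnn::Mat& src,
              ncnn::Mat& r1, ncnn::Mat& r2, ncnn::Mat& r3, ncnn::Mat& r4,
              ncnn::Mat& fgr, ncnn::Mat& pha)
{
    ncnn::Extractor ex = net.create_extractor();
    ex.input("src", src);
    ex.input("r1i", r1);
    ex.input("r2i", r2);
    ex.input("r3i", r3);
    ex.input("r4i", r4);

    ex.extract("fgr", fgr);
    ex.extract("pha", pha);
    // updated recurrent states, fed back in on the next call
    ex.extract("r1o", r1);
    ex.extract("r2o", r2);
    ex.extract("r3o", r3);
    ex.extract("r4o", r4);
}
```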

Jul 21, 2022
Benchmark framework of compute-in-memory based accelerators for deep neural network (inference engine focused)

DNN+NeuroSim V1.3 The DNN+NeuroSim framework was developed by Prof. Shimeng Yu's group (Georgia Institute of Technology). The model is made publicly a

Aug 5, 2022
ffcnn is a cnn neural network inference framework, written in 600 lines C language.

ffcnn is a convolutional neural network forward-inference library written in C; the complete yolov3 and yolo-fastes

Jul 5, 2022
Ncnn version demo of [CVPR21] LightTrack: Finding Lightweight Neural Network for Object Tracking via One-Shot Architecture Search

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search (ncnn) The official implementation by pytorch: ht

Aug 11, 2022
An Out-of-the-Box TensorRT-based Framework for High Performance Inference with C++/Python Support


Aug 8, 2022
A framework for generic hybrid two-party computation and private inference with neural networks

MOTION2NX -- A Framework for Generic Hybrid Two-Party Computation and Private Inference with Neural Networks This software is an extension of the MOTI

Jul 6, 2022
Mobile Robot Programming Toolkit (MRPT) provides C++ libraries aimed at researchers in mobile robotics and computer vision

The MRPT project 1. Introduction Mobile Robot Programming Toolkit (MRPT) provides C++ libraries aimed at researchers in mobile robotics and computer v

Aug 15, 2022
A lightweight 2D Pose model can be deployed on Linux/Window/Android, supports CPU/GPU inference acceleration, and can be detected in real time on ordinary mobile phones.

Aug 15, 2022
Forward - A library for high performance deep learning inference on NVIDIA GPUs

a library for high performance deep learning inference on NVIDIA GPUs.

Mar 17, 2021
A library for high performance deep learning inference on NVIDIA GPUs.

Forward - A library for high performance deep learning inference on NVIDIA GPUs Forward - A library for high performance deep learning inference on NV

Jul 31, 2022
Edge ML Library - High-performance Compute Library for On-device Machine Learning Inference

Edge ML Library (EMLL) offers optimized basic routines like general matrix multiplications (GEMM) and quantizations, to speed up machine learning (ML) inference on ARM-based devices. EMLL supports fp32, fp16 and int8 data types. EMLL accelerates on-device NMT, ASR and OCR engines of Youdao, Inc.

Jul 21, 2022
PPLNN is a high-performance deep-learning inference engine for efficient AI inferencing.

PPLNN, which is short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inferencing.

Aug 17, 2022
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.

TensorRT Open Source Software This repository contains the Open Source Software (OSS) components of NVIDIA TensorRT. Included are the sources for Tens

Aug 8, 2022
NCNN+Int8+YOLOv4 quantitative modeling and real-time inference

Apr 25, 2022
Simple inference deep head pose ncnn version

ncnn-deep-head-pose Simple implement inference deep head pose ncnn version with high performance and optimized resource. This project based on deep-he

Jun 13, 2022
Helper Class for Deep Learning Inference Frameworks: TensorFlow Lite, TensorRT, OpenCV, ncnn, MNN, SNPE, Arm NN, NNAbla

InferenceHelper This is a helper class for deep learning frameworks especially for inference This class provides an interface to use various deep lear

Aug 1, 2022
This is a sample ncnn android project, it depends on ncnn library and opencv

Jul 28, 2022
GFPGAN-ncnn - a naive NCNN implementation of GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration

GFPGAN-ncnn a naive ncnn implementation of GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration model support: 1.GFPGANClean

Jul 13, 2022
RealSR-NCNN-Android is a simple Android application that based on Realsr-NCNN & Real-ESRGAN.

RealSR-NCNN-Android Real-ESRGAN is a Practical Algorithms for General Image Restoration. RealSR-NCNN-Android is a simple Android application that base

Aug 8, 2022