ncnn is a high-performance neural network inference framework optimized for the mobile platform

ncnn


ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. ncnn has been designed with mobile deployment and use in mind from the very beginning. It has no third-party dependencies, is cross-platform, and runs faster than all known open source frameworks on mobile phone CPUs. With ncnn, developers can easily deploy deep learning algorithm models to the mobile platform, create intelligent apps, and bring artificial intelligence to your fingertips. ncnn is currently used in many Tencent applications, such as QQ, Qzone, WeChat, Pitu and so on.



Technical discussion QQ group: 637093648 (lots of experts), join answer: 卷卷卷卷卷

Pocky QQ group (MLIR YES!): 677104663 (lots of experts)

Telegram Group https://t.me/ncnnyes

Discord Channel https://discord.gg/YRsxgmF


Current build status

Continuous integration builds cover the following targets (CPU 32-bit/64-bit and GPU 32-bit/64-bit where applicable): Linux (GCC), Linux (Clang), Linux (ARM), Linux (MIPS), Linux (RISC-V), Windows (VS2015), Windows (VS2017), Windows (VS2019), macOS, macOS (ARM), Android, Android-x86, iOS, iOS Simulator, WebAssembly, and RISC-V GCC/Newlib.

Supports most commonly used CNN networks



HowTo

how to build ncnn library on Linux / Windows / macOS / Raspberry Pi3 / Android / NVIDIA Jetson / iOS / WebAssembly / AllWinner D1 / Loongson 2K1000

download prebuilt binary packages for Android and iOS

use ncnn with alexnet with detailed steps, recommended for beginners :)

ncnn component usage guide with alexnet, with detailed steps, strongly recommended for beginners :)

use netron for ncnn model visualization

out-of-the-box web model conversion

ncnn low-level operation api

ncnn param and model file spec

ncnn operation param weight table

how to implement custom layer step by step (a compressed sketch follows below)
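For orientation before reading that guide, here is a compressed sketch of the custom layer mechanism, assuming the Layer / DEFINE_LAYER_CREATOR / register_custom_layer API of current ncnn releases; the step-by-step document above remains the authoritative reference:

    #include "layer.h"
    #include "net.h"

    // toy in-place layer that multiplies every element by 2
    class Times2 : public ncnn::Layer
    {
    public:
        Times2() { one_blob_only = true; support_inplace = true; }

        virtual int forward_inplace(ncnn::Mat& bottom_top_blob, const ncnn::Option& /*opt*/) const
        {
            for (int q = 0; q < bottom_top_blob.c; q++)
            {
                float* ptr = bottom_top_blob.channel(q);
                const int size = bottom_top_blob.w * bottom_top_blob.h;
                for (int i = 0; i < size; i++)
                    ptr[i] *= 2.f;
            }
            return 0;
        }
    };
    DEFINE_LAYER_CREATOR(Times2)

    // register before load_param so a .param file can reference the "Times2" layer type:
    // net.register_custom_layer("Times2", Times2_layer_creator);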


FAQ

ncnn throw error

ncnn produce wrong result

ncnn vulkan


Features

  • Supports convolutional neural networks, multiple inputs and multi-branch structures, and can compute only part of the branches
  • No third-party library dependencies; does not rely on BLAS/NNPACK or any other computing framework
  • Pure C++ implementation, cross-platform, supports Android, iOS and so on
  • Careful ARM NEON assembly-level optimization for extremely fast computation
  • Sophisticated memory management and data structure design, very low memory footprint
  • Supports multi-core parallel computing acceleration and ARM big.LITTLE CPU scheduling optimization
  • Supports GPU acceleration via the next-generation low-overhead Vulkan API
  • Extensible model design; supports 8-bit quantization and half-precision floating-point storage; can import caffe/pytorch/mxnet/onnx/darknet/keras/tensorflow(mlir) models
  • Supports loading network models by direct zero-copy memory reference (a minimal usage sketch follows after this list)
  • Custom layer implementations can be registered to extend the framework
  • Well, it is strong, not afraid of being stuffed with 卷 QvQ

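To make the feature list concrete, here is a minimal usage sketch of the C++ API. The model files and blob names (squeezenet_v1.1.param/.bin, "data", "prob") are placeholders taken from the classic squeezenet example, not requirements of the API:

    #include "net.h"

    // classify one packed RGB image buffer (w x h) and return the output scores
    ncnn::Mat classify(const unsigned char* rgb, int w, int h)
    {
        ncnn::Net net;
        net.load_param("squeezenet_v1.1.param");
        net.load_model("squeezenet_v1.1.bin");

        // resize and convert the pixel buffer into an ncnn::Mat, then normalize in place
        ncnn::Mat in = ncnn::Mat::from_pixels_resize(rgb, ncnn::Mat::PIXEL_RGB, w, h, 227, 227);
        const float mean_vals[3] = {104.f, 117.f, 123.f};
        in.substract_mean_normalize(mean_vals, 0);

        ncnn::Extractor ex = net.create_extractor();
        ex.input("data", in);

        ncnn::Mat out;
        ex.extract("prob", out);
        return out;
    }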

supported platform matrix

  • ✅ = known to work and runs fast with good optimization
  • ✔️ = known to work, but speed may not be fast enough
  • ❔ = should work, not confirmed
  • / = not applicable
Windows Linux Android macOS iOS
intel-cpu ✔️ ✔️ ✔️ /
intel-gpu ✔️ ✔️ /
amd-cpu ✔️ ✔️ ✔️ /
amd-gpu ✔️ ✔️ /
nvidia-gpu ✔️ ✔️ /
qcom-cpu ✔️ / /
qcom-gpu ✔️ ✔️ / /
arm-cpu / /
arm-gpu ✔️ / /
apple-cpu / / / ✔️
apple-gpu / / / ✔️ ✔️

Example project


License

BSD 3-Clause

Comments
  • MTCNN test results are completely different

    MTCNN test results are completely different

    I ran MTCNN's PNet and RNet and the results differ a lot from the reference results; a face image fed to RNet also gets a very low score.

    const float mean_vals[3] = {127.5f, 127.5f, 127.5f};
    const float norm_vals[3] = {0.0078125f, 0.0078125f, 0.0078125f};

    int hs = ceil(img_h * scales[i]);
    int ws = ceil(img_w * scales[i]);
    ncnn::Mat pnet_img = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR, img_w, img_h, ws, hs);
    pnet_img.substract_mean_normalize(mean_vals, norm_vals);
    ncnn::Extractor Pnet_ex = Pnet.create_extractor();
    Pnet_ex.set_light_mode(true);
    Pnet_ex.input("data", pnet_img);
    ncnn::Mat score, loc;
    Pnet_ex.extract("prob1", score);
    Pnet_ex.extract("conv4-2", loc);

    if (*(score_data + i) >= thresh) ...

  • prepare for release with experimental gpu inference capability

    prepare for release with experimental gpu inference capability

    shader

    • [x] priorbox (ssd)
    • [x] permute (ssd)
    • [x] deconvolution
    • [x] deconvolutiondepthwise (yolo)
    • [x] interp (upsample)
    • [x] reorg (yolov2)
    • [x] prelu
    • [x] reshape
    • [x] tanh
    • [x] sigmoid
    • [x] clip
    • [x] absval
    • [x] shufflechannel (shufflenet)

    example

    • [x] squeezenet-gpu
    • [x] mobilenet-ssd-gpu
    • [x] mobilenet-yolov3-gpu

    benchncnn

    • [x] shufflenet
    • [x] mobilenet-ssd / squeezenet-ssd
    • [x] mobilenet-yolo / mobilenet-yolov3

    binary release

    • [x] vulkan-enabled android prebuilt library
    • [x] vulkan-enabled ios prebuilt framework (arm64 only)

    documentation

    • [x] faq about common vulkan api error
    • [x] faq about packing
    • [x] faq about hybrid gpu/cpu inference practice
    • [x] faq about op fusion
  • Why do parameters change during model file conversion?

    Why do parameters change during model file conversion?

    I exported an onnx model from pytorch, simplified it with onnxsim, converted it to a .param file with onnx2ncnn, and then optimized the .param with ncnnoptimize. When I open the onnx model file and the .param file in netron, the w and b values of the last fully connected layer are different. What is the problem? Has anyone else run into this?
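    For reference, the conversion pipeline described above usually boils down to the commands below (a sketch of the tools' documented usage; the trailing ncnnoptimize flag selects the storage type and may differ between ncnn versions). Note that ncnnoptimize fuses layers, for example folding a batchnorm or scale into the preceding convolution or innerproduct, so the stored weights and biases of a fused layer can legitimately differ from the values shown for the original onnx graph:

        python -m onnxsim model.onnx model-sim.onnx
        ./onnx2ncnn model-sim.onnx model.param model.bin
        ./ncnnoptimize model.param model.bin model-opt.param model-opt.bin 0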

  • mtcnn becomes slower with the 20191113 version

    mtcnn becomes slower with the 20191113 version

    Code: https://github.com/moli232777144/mtcnn_ncnn. With the ncnn library bundled with that code (dated 20180516) and NDK android-ndk-r16b, running detection 100 times in a loop on the bundled Kobe image with 2 threads averages 45 ms. After updating the ncnn library to 20191113 and the NDK to android-ndk-r19c, the same 100 iterations with 2 threads average 106 ms. Note: the test phone is a vivo NEX A (Snapdragon 710); the code was left unchanged after download except for updating the gradle version. The only differences between the two runs are the ncnn library and the SDK version (if the SDK version is kept unchanged, the 20191113 version takes even longer). The gap is quite large; what could be the reason? Many thanks!

    Update: testing the historical versions one by one, the 20190611 version still averages 44 ms for 1000 iterations with 2 threads, while from the 20190908 version onward it becomes 107 ms.

  • Xiaomi 10/10 Pro always crashes in power-saving mode: __kmp_abort_process

    Xiaomi 10/10 Pro always crashes in power-saving mode: __kmp_abort_process

    error log | 日志或报错信息 | ログ

    Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 31959 (thread-pool-1), pid 31478 (om.timehut.ncnn)
    pid: 31478, tid: 31959, name: thread-pool-1  >>> com.timehut.ncnn <<<
          #01 pc 0000000000b19260  /data/app/~~DewB4Z9g4BUSPMAa4ro2Zw==/com.timehut.ncnn-g1R4w3hYqG12Txs5VaOSeQ==/base.apk!libtimehut-ai.so (offset 0x23ce000) (__kmp_abort_process+52) (BuildId: 91d74b3087e79f0d9d64fe036c708f7533cc5462)
    

    context | 编译/运行环境 | バックグラウンド

    ncnn version: ncnn-20220729-android-vulkan; Android Studio Electric Eel | 2022.1.1 Beta 4; NDK: 24.0.8215888; runtime environment: Xiaomi 10 / Xiaomi 10 Pro in power-saving mode

    how to reproduce | 复现步骤 | 再現方法

    1. In power-saving mode the crash happens every time; testing shows that merely calling System.loadLibrary to load the ncnn-built .so is enough to trigger it
    2. Code as follows

    more | 其他 | その他

        init {
            System.loadLibrary("timehut-ai")
        }
    
  • After int8 quantization: yolov5 produces no detection boxes, and lightweight detectors such as yolo-fastest lose a lot of accuracy

    After int8 quantization: yolov5 produces no detection boxes, and lightweight detectors such as yolo-fastest lose a lot of accuracy

    Reporting back a few results from my quantization experiments, plus some questions. (1) Preprocessing and calibration data really do affect the accuracy of the quantized model (the mean/norm values for coco, ImageNet and voc need to be handled separately after conversion; this point can be ignored). (2) Some models produce no detection boxes at all after quantization (for example yolov5; some of its processing modules are fairly complex, and I am not sure whether that is related). (3) Some lightweight models actually become slower after quantization and their accuracy drops badly, but the slowdown has a precondition: it shows up on Intel i7/i5 class processors (two possible reasons: first, as the author has said, ncnn is tuned primarily for ARM architectures and only secondarily for this kind of processor; second, yolo-fastest in fp16 already reaches about 20 ms on Intel processors and even 10 ms with Vulkan, so quantization can hardly speed it up further. Things may be different on a dev board, but my Raspberry Pi is broken so I could not verify.) Two questions for nihui: first, what exactly causes the accuracy drop for yolo-fastest (its fp16 detection accuracy is still fine by comparison)? Second, why does a model like yolov5 end up with no detection boxes? A single frame does run about three times faster, but no boxes means my quantization failed. Any pointers would be appreciated, thanks.
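    For context, the post-training int8 flow these experiments refer to is driven by the ncnn2table and ncnn2int8 tools. The sketch below follows their documented usage; the mean/norm/shape values are illustrative placeholders and must match the model's own preprocessing:

        ./ncnn2table model-opt.param model-opt.bin imagelist.txt model.table mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[224,224,3] pixel=BGR thread=8 method=kl
        ./ncnn2int8 model-opt.param model-opt.bin model-int8.param model-int8.bin model.table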

  • Does ncnn have a function like the following? warpAffine

    Does ncnn have a function like the following? warpAffine

    void warpAffine(InputArray src, OutputArray dst, InputArray M, Size dsize, int flags=INTER_LINEAR, int borderMode=BORDER_CONSTANT, const Scalar& borderValue=Scalar())

    Also, is there pure C++ code for multi-threading? Thanks.
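    ncnn does ship pixel-level affine warp helpers: the mat_pixel_affine routines declared in mat.h, such as get_rotation_matrix, get_affine_transform, invert_affine_transform and warpaffine_bilinear_c1/c2/c3/c4. A minimal sketch, assuming the signatures of recent ncnn releases; verify the argument order against the mat.h of your version:

        #include "mat.h" // ncnn

        // rotate a BGR image by 30 degrees around its center with bilinear sampling
        void rotate_bgr30(const unsigned char* bgr, int w, int h, unsigned char* out)
        {
            float tm[6];
            // assumed order: angle (degrees), scale, center x, center y, output 2x3 matrix
            ncnn::get_rotation_matrix(30.f, 1.f, w * 0.5f, h * 0.5f, tm);
            ncnn::warpaffine_bilinear_c3(bgr, w, h, out, w, h, tm); // 3-channel bilinear warp
        }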

  • Why do I get "undefined reference to '__kmpc_fork_call'" when building an .so from the .a produced by the ncnn build?

    Why do I get "undefined reference to '__kmpc_fork_call'" when building an .so from the .a produced by the ncnn build?

    Hello, I would like to ask: when I take the .a produced by building ncnn and build an .so with ndk-build, I get
    undefined reference to '__kmpc_fork_call'
    undefined reference to '__kmpc_for_static_init_4'
    undefined reference to '__kmpc_for_static_fini'
    undefined reference to '__kmpc_for_static_init_4'
    layer/convolutiondepthwise.cpp:176: error: undefined reference to '__kmpc_for_static_init_8'
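    The __kmpc_* symbols come from the OpenMP runtime that ncnn uses for multithreading, so the usual cause is that the final .so is linked without OpenMP. A hedged sketch of a possible ndk-build-side fix (standard NDK flags, adjust to your Android.mk module; -static-openmp requires a reasonably recent NDK):

        LOCAL_CFLAGS  += -fopenmp
        LOCAL_LDFLAGS += -fopenmp -static-openmp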

  • Error when running inference with ncnn

    Error when running inference with ncnn

    The build and runtime environments are both Linux. When running inference with my own code, the following appears:
    [New LWP 17798] [New LWP 17800] [New LWP 17802] [New LWP 17801] [New LWP 17803] [New LWP 17799]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Core was generated by `./matting-infer'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 0x0000562614d73a39 in ncnn::NetPrivate::forward_layer(int, std::vector<ncnn::Mat, std::allocator<ncnn::Mat> >&, ncnn::Option const&) const ()
    [Current thread is 1 (Thread 0x7f979a75cc00 (LWP 17798))]

  • architecture changes for int8 packing

    architecture changes for int8 packing

    • [x] requantize
    • [ ] armv7 im2col+gemm pack8
    • [ ] armv7 conv1x1 pack8
    • [ ] armv7 conv3x3s1 pack8
    • [ ] armv8 im2col+gemm pack8
    • [ ] armv8 conv1x1 pack8
    • [ ] armv8 conv3x3s1 pack8
  • The prebuilt ncnn-vulkan.framework is broken: ld: Framework not found ncnn-vulkan

    The prebuilt ncnn-vulkan.framework is broken: ld: Framework not found ncnn-vulkan

    I used the prebuilt ncnn-vulkan.framework from GitHub and added it to the project under Build Phases in the usual way. The build immediately fails with "ld: Framework not found ncnn-vulkan". Other frameworks I use cause no problems at all, and an ncnn-vulkan.framework built by someone else and downloaded from the web, integrated the same way, also works fine. The problem only occurs with the prebuilt ncnn-vulkan.framework from GitHub. Please look into it as soon as possible, thanks!

  • With Vulkan, yolov4 predicts a full-screen box, and yolov5/yolov7 do not use the GPU for accelerated prediction at all; where could the problem be?

    With Vulkan, yolov4 predicts a full-screen box, and yolov5/yolov7 do not use the GPU for accelerated prediction at all; where could the problem be?

    Windows 10 + VS2019 + ncnn-20221128 + protobuf-3.11.2 + VulkanSDK 1.3.231.1. If the latest VulkanSDK 1.3.236.0 is used, yolov4 fails at startup when use_vulkan_compute is set to true, while with false it predicts normally.
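    For checking whether the GPU path is actually requested, the relevant ncnn calls are sketched below (standard ncnn API; the yolov5s.* file names are placeholders, and the pre/post-processing of yolov4/yolov5/yolov7 is outside this snippet):

        #include "net.h"
        #include "gpu.h"

        ncnn::Net net;
        // opt must be set before load_param; only enable Vulkan when a usable device exists
        net.opt.use_vulkan_compute = ncnn::get_gpu_count() > 0;
        net.load_param("yolov5s.param");
        net.load_model("yolov5s.bin");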

  • Add x86 MultiHeadAttention

    Add x86 MultiHeadAttention

    A MultiheadAttention implementation that calls into the Gemm and MatMul versions.

    On an i7-12700, stable diffusion drops to roughly 4 s/step, which still does not feel fast enough; suggestions are welcome. When testing the op, use larger sizes, on the order of (320,1280) and (4096,1280), and a somewhat larger epsilon, around 0.05; at large sizes a few data points cannot hold that precision, but the final output still looks fine.

  • [feature request] int8 quantization support for Convolution1D and ConvolutionDepthWise1D

    [feature request] int8 quantization support for Convolution1D and ConvolutionDepthWise1D

    https://github.com/Tencent/ncnn/blob/c471826da1e1fd3820e4a6690e777479e22c4ceb/tools/quantize/ncnn2table.cpp#L132

    Is there a plan to also support

    • Convolution1D
    • ConvolutionDepthWise1D
  • 'float16x4_t' was not declared in this scope when compile for hisi

    'float16x4_t' was not declared in this scope when compile for hisi

    error log | 日志或报错信息 | ログ

    [ 0%] Built target ncnn-generate-spirv
    [ 0%] Building CXX object src/CMakeFiles/ncnn.dir/blob.cpp.o
    ... (compilation of the other ncnn sources proceeds normally) ...
    In file included from /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_arm.cpp:31:0:
    /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h: In function 'void ncnn::innerproduct_pack4_fp16s_neon(const ncnn::Mat&, ncnn::Mat&, const ncnn::Mat&, const ncnn::Mat&, int, const ncnn::Mat&, const ncnn::Option&)':
    /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:256:45: error: 'float16x4_t' was not declared in this scope
        float32x4_t _w0 = vcvt_f32_f16((float16x4_t)(vget_low_u16(_w01)));
    /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:256:77: error: 'vcvt_f32_f16' was not declared in this scope
    /home/guanchao/wangyn/ai/ncnn/src/layer/arm/innerproduct_fp16s.h:716:68: error: 'vcvt_f16_f32' was not declared in this scope
        _p.val[0] = (uint16x4_t)(vcvt_f16_f32(vld1q_f32(k0)));
    (the same 'float16x4_t' / 'vcvt_f32_f16' errors repeat at innerproduct_fp16s.h:284, 413 and 510, and in innerproduct_gemm_fp16s.h at lines 123, 217, 245, 320, 417 and 436)
    make[2]: *** [src/CMakeFiles/ncnn.dir/build.make:762: src/CMakeFiles/ncnn.dir/layer/arm/innerproduct_arm.cpp.o] Error 1
    make[2]: *** Waiting for unfinished jobs....
    make[1]: *** [CMakeFiles/Makefile2:143: src/CMakeFiles/ncnn.dir/all] Error 2
    make: *** [Makefile:136: all] Error 2

    context | 编译/运行环境 | バックグラウンド

    $ cmake --version cmake version 3.25.0

    CMake suite maintained and supported by Kitware (kitware.com/cmake).

    $ make --version GNU Make 4.2.1 Built for x86_64-pc-linux-gnu Copyright (C) 1988-2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

    how to reproduce | 复现步骤 | 再現方法

    1. cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/himix200.toolchain.cmake ..
    2. make -j$(nproc)

    more | 其他 | その他

Dec 20, 2022
Benchmark framework of compute-in-memory based accelerators for deep neural network (inference engine focused)

DNN+NeuroSim V1.3 The DNN+NeuroSim framework was developed by Prof. Shimeng Yu's group (Georgia Institute of Technology). The model is made publicly a

Nov 24, 2022
ffcnn is a cnn neural network inference framework, written in 600 lines of C.

+----------------------------+ ffcnn convolutional neural network forward inference library +----------------------------+ ffcnn is a convolutional neural network forward inference library written in C; in just over 500 lines of code it implements the complete yolov3 and yolo-fastes

Dec 28, 2022
Ncnn version demo of [CVPR21] LightTrack: Finding Lightweight Neural Network for Object Tracking via One-Shot Architecture Search

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search (ncnn) The official implementation by pytorch: ht

Dec 26, 2022
An Out-of-the-Box TensorRT-based Framework for High Performance Inference with C++/Python Support

An Out-of-the-Box TensorRT-based Framework for High Performance Inference with C++/Python Support

Jan 5, 2023
A framework for generic hybrid two-party computation and private inference with neural networks

MOTION2NX -- A Framework for Generic Hybrid Two-Party Computation and Private Inference with Neural Networks This software is an extension of the MOTI

Nov 29, 2022
Mobile Robot Programming Toolkit (MRPT) provides C++ libraries aimed at researchers in mobile robotics and computer vision

The MRPT project 1. Introduction Mobile Robot Programming Toolkit (MRPT) provides C++ libraries aimed at researchers in mobile robotics and computer v

Dec 24, 2022
Forward - A library for high performance deep learning inference on NVIDIA GPUs

a library for high performance deep learning inference on NVIDIA GPUs.

Mar 17, 2021
A library for high performance deep learning inference on NVIDIA GPUs.

Forward - A library for high performance deep learning inference on NVIDIA GPUs Forward - A library for high performance deep learning inference on NV

Dec 17, 2022
Edge ML Library - High-performance Compute Library for On-device Machine Learning Inference

Edge ML Library (EMLL) offers optimized basic routines like general matrix multiplications (GEMM) and quantizations, to speed up machine learning (ML) inference on ARM-based devices. EMLL supports fp32, fp16 and int8 data types. EMLL accelerates on-device NMT, ASR and OCR engines of Youdao, Inc.

Dec 20, 2022
PPLNN is a high-performance deep-learning inference engine for efficient AI inferencing.

PPLNN, which is short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inferencing.

Dec 29, 2022
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.

TensorRT Open Source Software This repository contains the Open Source Software (OSS) components of NVIDIA TensorRT. Included are the sources for Tens

Jan 4, 2023
A lightweight 2D Pose model can be deployed on Linux/Window/Android, supports CPU/GPU inference acceleration, and can be detected in real time on ordinary mobile phones.

A lightweight 2D Pose model can be deployed on Linux/Window/Android, supports CPU/GPU inference acceleration, and can be detected in real time on ordinary mobile phones.

Jan 3, 2023
NCNN+Int8+YOLOv4 quantitative modeling and real-time inference

NCNN+Int8+YOLOv4 quantitative modeling and real-time inference

Dec 6, 2022
Simple inference deep head pose ncnn version

ncnn-deep-head-pose Simple implement inference deep head pose ncnn version with high performance and optimized resource. This project based on deep-he

Dec 16, 2022
Helper Class for Deep Learning Inference Frameworks: TensorFlow Lite, TensorRT, OpenCV, ncnn, MNN, SNPE, Arm NN, NNAbla

InferenceHelper This is a helper class for deep learning frameworks especially for inference This class provides an interface to use various deep lear

Dec 26, 2022
This is a sample ncnn android project, it depends on ncnn library and opencv

This is a sample ncnn android project, it depends on ncnn library and opencv

Jan 6, 2023
GFPGAN-ncnn - a naive NCNN implementation of GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration

GFPGAN-ncnn a naive ncnn implementation of GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration model support: 1.GFPGANClean

Dec 10, 2022
RealSR-NCNN-Android is a simple Android application that based on Realsr-NCNN & Real-ESRGAN.

RealSR-NCNN-Android Real-ESRGAN is a Practical Algorithms for General Image Restoration. RealSR-NCNN-Android is a simple Android application that base

Jan 3, 2023