ffcnn is a CNN inference framework written in about 600 lines of C.

+----------------------------------------+
 ffcnn: a CNN forward-inference library
+----------------------------------------+

ffcnn is a convolutional neural network forward-inference library written in C.
In just over 500 lines of code it implements complete forward inference for the
yolov3 and yolo-fastest networks. It depends on no third-party libraries,
compiles in any standard C environment, and builds and runs correctly on VC,
msys2+gcc, ubuntu+gcc, and other platforms.

Compared with darknet and ncnn, this code has no special instruction-set
optimizations, but it is much simpler and easier to follow, and can serve as a
reference for anyone learning about convolutional neural networks.


Some notes on darknet and yolov3
--------------------------------
The yolov3 network contains only a few layer types: convolutional, dropout,
shortcut, route, maxpool, upsample, and yolo. It is therefore fairly easy to
implement.

Convolutional layer:
1. Understand what a convolution means and how it is computed
2. Understand the pad and stride parameters of the convolution
3. Each filter also has a bias parameter, added after each output point is
   computed. Without batch normalization (batch_normalize), the computation is:
   x += bias;
   x  = activate(x, type);
4. Understand what grouped convolution is
5. Every output point of the convolution goes through the activation function
6. If the layer has batch normalization (batch_normalize), the computation is:
   x  = (x - rolling_mean) / sqrt(rolling_variance + 0.00001f);
   x *= scale;
   x += bias;
   x  = activate(x, type);
   rolling_mean, rolling_variance, scale, and bias can all be read from the
   darknet weights file

Dropout layer:
During forward inference this layer can be treated as if it did not exist:
the input is passed to the next layer unchanged.

Shortcut layer:
Adds the data of a specified earlier layer to the current layer's data
element-wise, then passes the sum to the next layer.
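
The shortcut layer is just an element-wise sum; a minimal sketch (illustrative
code, not ffcnn's actual implementation):

```c
/* Shortcut layer sketch: add the referenced layer's output to the current
 * layer's output, element by element, over n = w*h*c values. */
static void shortcut_forward(const float *from, float *cur, int n)
{
    int i;
    for (i = 0; i < n; i++) cur[i] += from[i];
}
```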

Route layer:
Concatenates the specified layers (up to 4 of them): width and height stay
the same, the channel count grows, and the result is passed to the next layer.
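
With NCHW storage and matching width and height, concatenating along the
channel axis is just copying each source buffer back to back. A minimal sketch
(illustrative names, not ffcnn's actual code):

```c
#include <string.h>

/* Route layer sketch: concatenate up to 4 source layers along the channel
 * axis. size[i] is the element count (w*h*c) of source i. */
static void route_forward(float *dst, const float *src[], const int size[], int nsrc)
{
    int i;
    for (i = 0; i < nsrc; i++) {
        memcpy(dst, src[i], size[i] * sizeof(float));
        dst += size[i];
    }
}
```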

Maxpool layer:
Max pooling: the output is the maximum of the values covered by the filter
window.
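
For a single channel, max pooling can be sketched as follows (illustrative
code, not ffcnn's actual implementation; the window is simply clipped at the
edges, and the output is assumed to be w/stride by h/stride):

```c
/* Max-pooling sketch for one w*h channel: take the maximum over each
 * size*size window, stepping by stride. */
static void maxpool_forward(const float *src, int w, int h,
                            float *dst, int size, int stride)
{
    int ow = w / stride, oh = h / stride;
    int ox, oy, kx, ky;
    for (oy = 0; oy < oh; oy++) {
        for (ox = 0; ox < ow; ox++) {
            float m = src[(oy * stride) * w + (ox * stride)];
            for (ky = 0; ky < size; ky++) {
                for (kx = 0; kx < size; kx++) {
                    int x = ox * stride + kx, y = oy * stride + ky;
                    if (x < w && y < h && src[y * w + x] > m) m = src[y * w + x];
                }
            }
            dst[oy * ow + ox] = m;
        }
    }
}
```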

Upsample layer:
Upsampling can be understood as enlarging the image; stride specifies the
scale factor, and nearest-neighbour interpolation is good enough.
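
Nearest-neighbour upsampling of one channel replicates each source pixel into
a stride*stride block; a minimal sketch (illustrative, not ffcnn's actual
code):

```c
/* Nearest-neighbour upsample sketch for one w*h channel; the output is
 * (w*stride) x (h*stride). */
static void upsample_forward(const float *src, int w, int h,
                             float *dst, int stride)
{
    int x, y;
    for (y = 0; y < h * stride; y++) {
        for (x = 0; x < w * stride; x++) {
            dst[y * w * stride + x] = src[(y / stride) * w + (x / stride)];
        }
    }
}
```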

Yolo layer:
This layer computes bboxes from the input feature map.
Take yolo-fastest as an example: it has two yolo layers, whose inputs are
10x10x255 and 20x20x255. The 255 is the channel count, and the data is laid
out as:
255 = 3 * (4 + 1 + 80)
The 3 means each grid cell predicts 3 bboxes.
Each bbox consists of 4 coordinate values (x, y, w, h), 1 object score, and
then 80 class scores.
For each bbox, the highest of the 80 class scores decides the bbox's class;
if that score is below the threshold (ignore_thresh) the bbox is discarded.
All bboxes that pass are collected in a list, and one nms pass over that list
produces the final result.

How each bbox's score and (x, y, w, h) are computed:
Let tx, ty, tw, th, bs be the values of channels 0, 1, 2, 3, 4 respectively
(followed by the 80 class scores).

Score: score = sigmoid(bs); (the 80 class scores are computed the same way)
Coordinates:
float bbox_cx = (j + sigmoid(tx)) * grid_width;  (grid_width is the width of the network input layer, i.e. layer 0, divided by the number of grid cells: the pixel width of one cell)
float bbox_cy = (i + sigmoid(ty)) * grid_height; (same method as bbox_cx)
float bbox_w  = (float)exp(tw) * anchor_box_w;   (multiplied by the scale factor, if there is one)
float bbox_h  = (float)exp(th) * anchor_box_h;   (same method as bbox_w)

bbox_cx and bbox_cy are the centre coordinates, bbox_w and bbox_h the width
and height; converting to corner coordinates:
x1 = bbox_cx - bbox_w * 0.5f;
y1 = bbox_cy - bbox_h * 0.5f;
x2 = bbox_cx + bbox_w * 0.5f;
y2 = bbox_cy + bbox_h * 0.5f;


The darknet weights file
------------------------

The file starts with this header:
#pragma pack(1)
typedef struct {
    int32_t  ver_major, ver_minor, ver_revision;
    uint64_t net_seen;
} WEIGHTS_FILE_HEADER;
#pragma pack()

After the header comes all of the weight data. In yolov3 and yolo-fastest,
essentially only the convolutional layers carry weights; the other layer
types have no weight data. Images and convolution filters are both stored in
NCHW format, and a convolutional layer's weights are laid out in this order:

n biases
if (batchnorm) {
    n scales
    n rolling_means
    n rolling_variances
}
n * c * h * w weight values
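
Reading that layout back is straightforward. This is a minimal sketch that
reuses the header struct shown above; the function name and error handling
are illustrative, not ffcnn's actual code.

```c
#include <stdint.h>
#include <stdio.h>

#pragma pack(1)
typedef struct {
    int32_t  ver_major, ver_minor, ver_revision;
    uint64_t net_seen;
} WEIGHTS_FILE_HEADER;
#pragma pack()

/* Read one convolutional layer's weights in the order listed above.
 * n is the filter count, chw the per-filter element count (c*h*w).
 * Returns 0 on success, -1 on a short read. */
static int load_conv_weights(FILE *fp, int n, int chw, int batchnorm,
                             float *bias, float *scale, float *mean,
                             float *variance, float *weights)
{
    if (fread(bias, sizeof(float), n, fp) != (size_t)n) return -1;
    if (batchnorm) {
        if (fread(scale,    sizeof(float), n, fp) != (size_t)n) return -1;
        if (fread(mean,     sizeof(float), n, fp) != (size_t)n) return -1;
        if (fread(variance, sizeof(float), n, fp) != (size_t)n) return -1;
    }
    if (fread(weights, sizeof(float), (size_t)n * chw, fp) != (size_t)n * chw)
        return -1;
    return 0;
}
```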


Features of ffcnn
-----------------
1. Extremely concise and readable C code
2. The core algorithm is only about 600 lines
3. No dependency on any third-party library
4. Easy to port to all kinds of platforms
5. Layers that are no longer needed are freed automatically during inference,
   reducing memory usage
6. The current goal is "make it work first"; performance tuning will come
   later, as time permits
7. Uses darknet's .cfg and .weights files directly (no conversion needed)


ffcnn vs ncnn benchmark
-----------------------

Test environment:
1. Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz (up to 1.90GHz), 8GB RAM
2. win7 64bit + msys2 + mingw32 + gcc 10.3.0
3. ffcnn + yolo-fastest code: https://github.com/rockcarry/ffcnn
4. ncnn  + yolo-fastest code: https://github.com/rockcarry/ffyolodet
5. 100 inference runs on the test image test.bmp

Results:
+--------------+--------------+-------------------+------------------+
| item         | ffcnn-v1.2.0 | ncnn with avx off | ncnn with avx on |
+--------------+--------------+-------------------+------------------+
| elapsed time | 14555 ms     | 14649 ms          | 8424 ms          |
+--------------+--------------+-------------------+------------------+
| memory usage | 5 MB         | 41 MB             | 41 MB            |
+--------------+--------------+-------------------+------------------+
| binary size  | 68 KB        | 1.2 MB            | 1.2 MB           |
+--------------+--------------+-------------------+------------------+

As the table shows, ffcnn already approaches the performance of ncnn with avx
optimization disabled.


[email protected]
20:22 2021/8/7








