Benchmark framework of compute-in-memory based accelerators for deep neural network (inference engine focused)

DNN+NeuroSim V1.3

The DNN+NeuroSim framework was developed by Prof. Shimeng Yu's group (Georgia Institute of Technology). The model is made publicly available on a non-commercial basis. Copyright of the model is maintained by the developers, and the model is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International Public License.

🌟 This is the released version 1.3 (Mar 22, 2021) of the tool. This version improves the inference engine estimation as follows:

1. Validate with real silicon data.
2. Add synchronous and asynchronous mode.
3. Update technology file for FinFET.
4. Add level shifter for eNVM.

👉 👉 👉 In "Param.cpp", to switch mode:

validated = true;       // false: no calibration factor; true: validated by silicon data
synchronous = true;     // false: asynchronous; true: synchronous, clkFreq decided by sensing delay

🌟 This version has also added three default examples for quick start:

1. VGG8 on cifar10
   8-bit "WAGE" mode pretrained model is uploaded to './log/VGG8.pth'
2. DenseNet40 on cifar10
   8-bit "WAGE" mode pretrained model is uploaded to './log/DenseNet40.pth'
3. ResNet18 on imagenet
   "FP" mode pretrained model is loaded from 'https://download.pytorch.org/models/resnet18-5c106cde.pth'

👉 👉 👉 To quickly start inference estimation of default models (skip training)

python inference.py --dataset cifar10 --model VGG8 --mode WAGE
python inference.py --dataset cifar10 --model DenseNet40 --mode WAGE
python inference.py --dataset imagenet --model ResNet18 --mode FP

For estimation of on-chip training accelerators, please visit the released DNN+NeuroSim V2.1.

In the Pytorch/Tensorflow wrapper, users can define the network structure and the precision of the synaptic weights and neural activations. With the integrated NeuroSim, which takes real traces from the wrapper, the framework supports a hierarchical organization from the device level through the circuit and chip levels up to the algorithm level, enabling instruction-accurate evaluation of both the accuracy and the hardware performance of inference.

Developers: Xiaochen Peng 👭 Shanshi Huang 👭 Anni Lu.

This research is supported by NSF CAREER award, NSF/SRC E2CDA program, and ASCENT, one of the SRC/DARPA JUMP centers.

If you use the tool or adapt the tool in your work or publication, you are required to cite the following reference:

X. Peng, S. Huang, Y. Luo, X. Sun and S. Yu, "DNN+NeuroSim: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators with Versatile Device Technologies," IEEE International Electron Devices Meeting (IEDM), 2019.

If you have logistical questions or comments on the model, please contact 👨 Prof. Shimeng Yu. If you have technical questions or comments, please contact 👩 Xiaochen Peng, 👩 Shanshi Huang, or 👩 Anni Lu.

File lists

  1. Manual: Documents/DNN NeuroSim V1.3 Manual.pdf
  2. DNN_NeuroSim wrapped by Pytorch: 'Inference_pytorch'
  3. NeuroSim under Pytorch Inference: 'Inference_pytorch/NeuroSIM'

Installation steps (Linux)

  1. Get the tool from GitHub

git clone https://github.com/neurosim/DNN_NeuroSim_V1.3.git

  2. Train the network to get the model for inference (can be skipped by using pretrained default models)

  3. Compile the NeuroSim codes

make

  4. Run Pytorch/Tensorflow wrapper (integrated with NeuroSim)

For the usage of this tool, please refer to the manual.

References related to this tool

  1. X. Peng, S. Huang, Y. Luo, X. Sun and S. Yu, "DNN+NeuroSim: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators with Versatile Device Technologies," IEEE International Electron Devices Meeting (IEDM), 2019.
  2. X. Peng, R. Liu, S. Yu, "Optimizing weight mapping and data flow for convolutional neural networks on RRAM based processing-in-memory architecture," IEEE International Symposium on Circuits and Systems (ISCAS), 2019.
  3. P.-Y. Chen, S. Yu, "Technological benchmark of analog synaptic devices for neuro-inspired architectures," IEEE Design & Test, 2019.
  4. P.-Y. Chen, X. Peng, S. Yu, "NeuroSim: A circuit-level macro model for benchmarking neuro-inspired architectures in online learning," IEEE Trans. CAD, 2018.
  5. X. Sun, S. Yin, X. Peng, R. Liu, J.-S. Seo, S. Yu, "XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks," ACM/IEEE Design, Automation & Test in Europe Conference (DATE), 2018.
  6. P.-Y. Chen, X. Peng, S. Yu, "NeuroSim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures," IEEE International Electron Devices Meeting (IEDM), 2017.
  7. P.-Y. Chen, S. Yu, "Partition SRAM and RRAM based synaptic arrays for neuro-inspired computing," IEEE International Symposium on Circuits and Systems (ISCAS), 2016.
  8. P.-Y. Chen, D. Kadetotad, Z. Xu, A. Mohanty, B. Lin, J. Ye, S. Vrudhula, J.-S. Seo, Y. Cao, S. Yu, "Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip," IEEE Design, Automation & Test in Europe (DATE), 2015.
  9. S. Wu, et al., "Training and inference with integers in deep neural networks," arXiv: 1802.04680, 2018.
  10. github.com/boluoweifenda/WAGE
  11. github.com/stevenygd/WAGE.pytorch
  12. github.com/aaron-xichen/pytorch-playground
Owner
NeuroSim
Researchers from Prof. Shimeng Yu's group at Georgia Tech
Comments
  • error when running inference script

    Hi, when I run inference.py, an error is raised after trace_command.sh is called: "free(): double free detected in tcache 2". My configuration is Ubuntu 20.04 (Focal), gcc 9.3, CUDA 10.1, cuDNN 7.5.0, Python 3.6, and PyTorch 1.1.0 (following the suggestions in the user manual). I would appreciate your help in resolving the issue.

  • `readLatencyADC` and `CalculateclkFreq`

    Hi, I have 3 questions:

    1. Why is readLatencyADC simply set to numColMuxed? I expected it to be initialized to a quantity with the dimension of time, such as X.readLatency. However, numColMuxed is just the number of columns that share the same read circuit, which has no relation to time. https://github.com/neurosim/DNN_NeuroSim_V1.3/blob/930c75772d43c733eb3090a5636761e058a16520/Inference_pytorch/NeuroSIM/SubArray.cpp#L853

    2. In general, I do not understand what exactly CalculateclkFreq does. Could you describe it? https://github.com/neurosim/DNN_NeuroSim_V1.3/blob/930c75772d43c733eb3090a5636761e058a16520/Inference_pytorch/NeuroSIM/SubArray.cpp#L848

    3. The code snippet above contains an if clause in which readLatency is calculated by summing the latencies of modules such as the ADC, the switch matrix, and so on. The problem is that, unlike in the previous if clause, the CalculateLatency method of these modules is not called. The latency of these modules, i.e. X.readLatency, would therefore be expected to be zero, which is wrong.

    @neurosim @alu75

  • Significant speedup (using CUDA) can be achieved by initialising arrays on the gpu

    I have noticed that, for example, the torch.zeros_like() calls do not specify a device. A significant speedup can be achieved by setting the device to 'cuda' in those calls:

    Currently:

    torch.zeros_like(outputOrignal)
    

    Improved:

    torch.zeros_like(outputOrignal, device='cuda')
    

    The same applies to torch.normal.

    The changes would mainly need to be applied to: https://github.com/neurosim/DNN_NeuroSim_V1.3/blob/3754e10e939e80b4952ba4e09a3afb7972456fc9/Inference_pytorch/modules/quantization_cpu_np_infer.py

  • signed computation

    For the purpose of signed computation, are negative weights supported by the simulator? If so, could you please explain how they should be implemented in a simulation? For context: it is very common to perform signed computation using the differential conductance of a pair of memristors.

  • Conflict in power consumption

    Hello, I am using DNN_NeuroSim for inference. I am unable to find a consistent relation between power consumption and chip area. For instance, in NeuroSim, for the pretrained VGG8, the power consumption and chip area are around 0.1 W and 120 mm². However, in comparable platforms such as ISAAC and PUMA-simulator, the power consumption and chip area are around 65 W and 90 mm². I cannot understand this difference. Please help me.

    PUMA-simulator: https://github.com/Aayush-Ankit/puma-simulator
    ISAAC paper: https://www.cs.utah.edu/~rajeev/pubs/isca16.pdf

  • Questions about AdderTree::CalculateLatency

    Hello @neurosim, I want to ask about the meaning of the first parameter (numRead) of AdderTree::CalculateLatency (void AdderTree::CalculateLatency(double numRead, int numUnitAdd, double _capLoad)), at line 120 of AdderTree.cpp.

    Question 1

    At line 694 of Chip.cpp: Gaccumulation->CalculateLatency(ceil(numTileEachLayer[1][l]*netStructure[l][5]*(numInVector/(double) Gaccumulation->numAdderTree)), numTileEachLayer[0][l], 0); I want to know why netStructure[l][5] needs to be multiplied by numTileEachLayer[1][l].

    Question 2

    At line 489 of Tile.cpp: accumulationCM->CalculateLatency((int)(numInVector/param->numBitInput)*ceil(param->numColMuxed/param->numColPerSynapse), numPE, 0); I want to know what ceil(param->numColMuxed/param->numColPerSynapse) means. Moreover, when param->numColPerSynapse is greater than param->numColMuxed, its value is 0, because both parameters are of type int. Is this a bug?
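The integer-division concern raised in Question 2 can be illustrated with a small sketch. Python's `//` is used here to mimic C++ `int / int` for non-negative operands. The parameter values are hypothetical, and the helper `ceil_div` is not part of NeuroSim.

```python
import math

def cpp_style(num_col_muxed: int, num_col_per_synapse: int) -> int:
    """Mimic ceil(int / int) in C++: the division truncates before ceil runs."""
    return math.ceil(num_col_muxed // num_col_per_synapse)

def ceil_div(a: int, b: int) -> int:
    """True ceiling division for positive integers (hypothetical helper)."""
    return (a + b - 1) // b

# Hypothetical values: numColMuxed = 4, numColPerSynapse = 8
print(cpp_style(4, 8))  # truncated quotient, ceil has no effect
print(ceil_div(4, 8))   # rounded up instead
print(cpp_style(8, 3), ceil_div(8, 3))
```

Whether a result of 0 is intended in the tool is for the developers to confirm. If it is not, one option would be to cast an operand to double before dividing.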

  • Question for conductance variation

    Thank you for the great tool!

    I'm currently using DNN NeuroSim v1.3 to evaluate the off-chip training performance of my memristor.

    I have some questions about the conductance variation in the train.py and inference.py files.

    1. Should I give the conductance variation as a percentage value in the following code?

    parser.add_argument('--vari', default=0, help='conductance variation (e.g. 0.1 standard deviation to generate random variation)')

    I'm curious whether I should use 0.05 or 5 if I have a memristor with 5% cycle-to-cycle conductance variation.

    2. Is the conductance variation in the above argument cycle-to-cycle or device-to-device?

    Again, thank you for the great tool.
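A side note on the `--vari` argument quoted in this question: the `add_argument` call has no `type=`, so argparse returns a string when the value is given on the command line. The sketch below adds `type=float` (an assumption, not the tool's code) and applies the value as the standard deviation of zero-mean Gaussian noise on a normalized conductance. That is only one plausible reading of the help text, under which 5% would be written as 0.05. The function `perturb` is hypothetical.

```python
import argparse
import random

parser = argparse.ArgumentParser()
# type=float is an assumption added for this sketch; the quoted call omits it.
parser.add_argument('--vari', type=float, default=0.0,
                    help='conductance variation as a standard deviation, e.g. 0.05')

def perturb(conductance: float, vari: float, rng: random.Random) -> float:
    """Add zero-mean Gaussian noise with standard deviation `vari` (hypothetical model)."""
    return conductance + rng.gauss(0.0, vari)

args = parser.parse_args(['--vari', '0.05'])
rng = random.Random(0)
samples = [perturb(0.5, args.vari, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(f"mean={mean:.3f}, std={std:.3f}")  # std should be close to 0.05
```

Whether the tool treats the value as absolute or relative, and whether it models cycle-to-cycle or device-to-device variation, is not stated in this thread.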

  • ImportError: cannot import name 'weak_script_method'

    I got the error ImportError: cannot import name 'weak_script_method'. As shown below, the import in "quantization_cpu_np_infer" fails. Could you give me any advice?

    Traceback (most recent call last):
      File "DNN_NeuroSim_V1.3/Inference_pytorch/inference.py", line 14, in <module>
        from utee import hook
      File "/home/ubuntu/DNN_NeuroSim_V1.3/Inference_pytorch/utee/hook.py", line 5, in <module>
        from modules.quantization_cpu_np_infer import QConv2d,QLinear
      File "/home/ubuntu/DNN_NeuroSim_V1.3/Inference_pytorch/modules/quantization_cpu_np_infer.py", line 5, in <module>
        from torch._jit_internal import weak_script_method
    ImportError: cannot import name 'weak_script_method'
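A workaround sometimes used for this kind of ImportError is a guarded import with a no-op fallback. This is only a sketch, assuming the failure comes from an installed PyTorch release that does not expose `weak_script_method` (the first comment in this list mentions PyTorch 1.1.0 as the version suggested by the manual). It is not a fix confirmed by the developers, and other version-specific code may still fail.

```python
try:
    # Present in the older PyTorch release the tool was written against.
    from torch._jit_internal import weak_script_method
except ImportError:
    # Fallback (assumption): a no-op decorator that returns the method unchanged.
    def weak_script_method(fn):
        return fn

class Demo:
    """Hypothetical class showing that the decorator can still be applied."""
    @weak_script_method
    def forward(self, x):
        return x + 1

print(callable(weak_script_method))
```

Installing the PyTorch version recommended in the manual avoids the need for such a shim.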

  • Question for SAR ADC Power Consumption

    Hello,

    First, thank you for the great tool!

    I have a question about the SAR ADC power consumption formula in SarADC.cpp.

    How is the formula below derived, especially the log2(levelOutput) term?

    Column_Power = (0.4710*log2(levelOutput)+1.9529)*1e-6;

    I know that log2(levelOutput) is related to the resolution of the SAR ADC, but I want to know why the column power increases linearly as the resolution of the SAR ADC increases.

    Looking forward to your reply!
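To make the linear dependence concrete, the quoted expression can be evaluated for a few resolutions. This sketch only re-evaluates the formula as written in the question. The function name is hypothetical, and the unit and origin of the coefficients are not stated in this thread.

```python
import math

def column_power(level_output: int) -> float:
    """Evaluate the quoted SarADC.cpp expression for one column."""
    return (0.4710 * math.log2(level_output) + 1.9529) * 1e-6

# Tabulate the expression for 3-bit to 7-bit resolution.
for bits in range(3, 8):
    level_output = 2 ** bits
    print(bits, level_output, f"{column_power(level_output):.4e}")

# Each extra bit of resolution adds the same increment, 0.4710e-6.
step = column_power(64) - column_power(32)
print(f"{step:.4e}")
```

Since log2(levelOutput) equals the number of output bits, the expression is an affine function of the resolution, which is what the question observes.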

  • Issue in Leakage Power.

    Hi, thanks for making your scripts available.

    I have a question about the leakage power reported by NeuroSim. Because there are many intermediate buffers to store the results of each layer, the reported buffer leakage does not seem right to me. I have looked at Buffer.cpp, and I think the energy consumption of the individual components (wlDecoder, precharger, sramWriteDriver, senseAmp) is missing and only the buffer itself has been considered.

    Best Regards,

    Mohammad sabri

  • accuracy larger than 100%

    When testing VGG8, the accuracy output is 1805%.

    I added a line in inference.py to manually accumulate the number of samples compared:

    correct += pred.cpu().eq(indx_target).sum()  # this line already exists
    total_pred += len(pred)                      # newly added line

    I then calculated the accuracy as correct/total_pred. This way, the result is about 91%, which seems reasonable. Is this the correct way of calculating it?
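The counting approach described in this comment can be sketched without PyTorch. Plain lists stand in for the prediction and target tensors, and the batches are made up for illustration. The point is only that the denominator should be the number of samples actually compared.

```python
def accuracy_percent(batches):
    """Accumulate correct predictions and sample count over (pred, target) batches."""
    correct = 0
    total_pred = 0
    for pred, target in batches:
        correct += sum(int(p == t) for p, t in zip(pred, target))
        total_pred += len(pred)
    return 100.0 * correct / total_pred

# Two made-up batches: 5 of 6 predictions match their targets.
batches = [([1, 0, 2], [1, 0, 2]), ([3, 3, 1], [3, 2, 1])]
print(f"{accuracy_percent(batches):.1f}%")
```

Dividing by a count smaller than the number of evaluated samples would push the figure above 100%, which is one possible, unconfirmed explanation of the 1805% reading.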

  • can not find initialization of activityColWrite etc.

    Hello! It seems that activityColWrite and activityRowWrite of the SubArray class are not initialized or assigned a value anywhere. Could you please tell me the meaning of these variables? Thank you! @alu75

  • Where is the output

    After running make, no errors are shown, but no output CSV files are generated anywhere. I am a beginner at this. Am I missing something? Please help. Thank you.

  • Questions about HTree::CalculateLatency

    Hi @neurosim and @alu75, at line 718 of Chip.cpp, I found that the x_init and y_init passed to GhTree->CalculateLatency are always 0, which causes the condition "if (((!x_init) && (!y_init)) || ((!x_end) && (!y_end)))" in HTree::CalculateLatency to always be satisfied. As a result, x_end and y_end have no effect. I'm wondering whether this is a bug or whether tileLocaEachLayer is simply unused.

    Also, at lines 326 to 338 of Chip.cpp, in the calculation of tileLocaEachLayer in ChipFloorPlan, why is thisTileTotal accumulated from i=1 rather than from i=0?
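The observation about the condition can be checked in isolation. The sketch below transcribes only the quoted boolean expression into Python, with a hypothetical function name. It says nothing about what the branch in HTree::CalculateLatency does once taken.

```python
def condition_holds(x_init, y_init, x_end, y_end) -> bool:
    """Transcription of: ((!x_init) && (!y_init)) || ((!x_end) && (!y_end))."""
    return ((not x_init) and (not y_init)) or ((not x_end) and (not y_end))

# With x_init = y_init = 0, the result is True for every end coordinate tried.
always_true = all(condition_holds(0, 0, x_end, y_end)
                  for x_end in range(4) for y_end in range(4))
print(always_true)

# With a non-zero start, the end coordinates do matter.
print(condition_holds(1, 0, 2, 3), condition_holds(1, 0, 0, 0))
```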

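  • Array energy estimation

    Hi, in the energy evaluation of the SubArray I found the following (SubArray.cpp, row 1241):

    // Read
    readDynamicEnergyArray = 0;
    readDynamicEnergyArray += capBL * cell.readVoltage * cell.readVoltage * numReadCells; // Selected BLs activityColWrite
    readDynamicEnergyArray += capRow2 * tech.vdd * tech.vdd * numRow * activityRowRead; // Selected WL
    readDynamicEnergyArray *= numColMuxed;

    I have two questions that I hope you can answer:

    1. It seems that only the dynamic power of the BLs and WLs is calculated; I do not see the power of the RRAM devices themselves?
    2. In inference.py, I see that a median conductance is needed to cancel the effect of the high-resistance-state current, but I do not see the corresponding part in the power evaluation?

    Thank you very much for taking the time to answer! From an admirer of NeuroSim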
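The expression quoted in this question can be restated as a small function to make its terms explicit. This is a transcription of that snippet into Python with hypothetical input values. It is not the tool's code, and it deliberately contains only the two capacitive terms the question points out.

```python
def read_dynamic_energy_array(cap_bl, read_voltage, num_read_cells,
                              cap_row2, vdd, num_row, activity_row_read,
                              num_col_muxed):
    """C*V^2 energy of the selected bitlines and wordlines, scaled by numColMuxed."""
    energy = 0.0
    energy += cap_bl * read_voltage * read_voltage * num_read_cells  # selected BLs
    energy += cap_row2 * vdd * vdd * num_row * activity_row_read     # selected WLs
    energy *= num_col_muxed
    return energy

# Hypothetical values (farads, volts, counts), chosen only for illustration.
e = read_dynamic_energy_array(cap_bl=10e-15, read_voltage=0.5, num_read_cells=16,
                              cap_row2=20e-15, vdd=1.0, num_row=128,
                              activity_row_read=0.5, num_col_muxed=8)
print(f"{e:.3e} J")
```

No term in this expression depends on cell conductance, which is the gap the first question asks about. Whether device energy is accounted for elsewhere in the tool is not answered in this thread.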
  • `Gaccumulation` area

    Hi, can you explain why there is a division by 3 in the following call: Gaccumulation->CalculateArea(NULL, globalBufferHeight/3, NONE); (this code can be found in Chip.cpp, in the ChipCalculateArea function) @neurosim

  • `globalBusWidth`

    Hi, I cannot understand how globalBusWidth is calculated: globalBusWidth += ((desiredTileSizeCM)+(desiredTileSizeCM)/param->numColMuxed)*numTileEachLayer[0][i]*numTileEachLayer[1][i];

    More specifically, my question is about this part: (desiredTileSizeCM)/param->numColMuxed)*numTileEachLayer[0][i]*numTileEachLayer[1][i] @neurosim
