Deep Learning in the C programming language. Provides an easy way to create and train ANNs.


cDNN is a deep learning library written in the C programming language. cDNN provides functions that can be used to create Artificial Neural Networks (ANNs). These functions are designed to be as efficient as possible in both performance and memory.

Features

  1. cDNN provides simple functions for creating ANNs.
  2. These functions are designed to achieve the maximum performance possible on a CPU.
  3. At the core, the matrix library provides the basic matrix-matrix operations required to implement neural networks.
  4. These matrix-matrix operations are very efficient and are as fast as popular scientific computing libraries such as NumPy.
  5. cDNN uses static computation graphs (DAGs) to wire up your ANNs and to perform gradient calculations.
  6. The library also provides helper functions for saving models, printing graphs, and so on.

Documentation

The documentation of cDNN is available here. I have tried to document it as extensively as possible. Feel free to modify or correct it.

More about cDNN

1. Matrix Library

This is the heart of the entire library. It provides the basic matrix-matrix operations required for the functioning of neural networks. These functions are designed to be as efficient as possible in both speed and memory use.

Before we go deeper into how these matrix-matrix operations work, we need to look at how a matrix is created and organized in memory.

typedef struct array{
  float * matrix;
  int shape[2];
}dARRAY;

The above structure type is used to create matrices. float * matrix stores the elements of the matrix and int shape[2] stores the shape, or order, of the matrix.

The elements of a matrix are stored in row-major order in memory.

Consider the following (3,3) matrix:

1 2 3
4 5 6
7 8 9

Its elements would be stored contiguously in memory, row by row:

1 2 3 4 5 6 7 8 9

float * matrix stores this flat array and int shape[2] = {3,3}. The shape tells us the dimensions of the matrix and lets us perform matrix-matrix operations accordingly.

The main advantage of this organization is that it eliminates the use of float ** matrix to store a 2D matrix. Operations would be much slower with float ** matrix, since every element access would require two dependent pointer dereferences.

Elements of a matrix stored in row-major order can be accessed in the following way:

int dims[] = {3,3};
dARRAY * A = ones(dims); //creates a (3,3) matrix of ones
...
//printing the elements of the matrix
for(int i=0;i<A->shape[0];i++)
  for(int j=0;j<A->shape[1];j++)
    printf("%f ",A->matrix[i*A->shape[1]+j]);

A->matrix[i*A->shape[1]+j] allows us to access each element in the matrix.

Now, coming to the main topic: the matrix-matrix operations are performed in two ways:

  1. Using efficient BLAS operations.
  2. Using parallelized loop operations.

Additional details are available in the documentation.

The building blocks discussed above let us create neural networks and perform gradient calculations.

2. Static Computation Graphs

cDNN uses a static computation graph to wire up your neural networks. Some popular deep learning libraries, such as PyTorch and TensorFlow 2, go further and use dynamic computation graphs. Dynamic graphs are more difficult to implement, so this library uses only static graphs.

Fun fact: TensorFlow 1.x used static computation graphs; TensorFlow 2.0 made dynamic (eager) execution the default.

3. Performance

cDNN is as fast as NumPy, and in some cases faster. This makes model training much quicker and helps you iterate over models rapidly.

The major performance boost comes from implementing certain matrix-matrix functions, such as matrix multiplication, through highly optimized BLAS routines. cDNN relies on the BLAS implementation provided by OpenBLAS to perform these operations efficiently.

For the matrix operations that do not use BLAS, cDNN computes thread counts automatically and executes them in parallel. It relies on OpenMP for thread creation and for handling issues such as synchronization.

Installation

Requirements,

  1. gcc
  2. ncurses
  3. OpenBLAS
  4. OpenMP

Installing the Dependencies

On Linux,

$ sudo apt-get install gcc
$ sudo apt-get install gfortran  #important don't miss this!
$ sudo apt-get install libomp-dev
$ sudo apt-get install libncurses-dev

# Build OpenBLAS from source. This is the only way to install OpenBLAS in the form cDNN expects.

$ git clone https://github.com/xianyi/OpenBLAS.git
$ cd OpenBLAS
$ sudo make && sudo make install # This will take a while depending on your system. You may see some warnings along the way; don't worry about them.

On macOS,

$ brew install ncurses
$ brew install gcc # provides gfortran
$ git clone https://github.com/xianyi/OpenBLAS.git
$ cd OpenBLAS
$ sudo make && sudo make install

Installing OpenBLAS from source will take a while. If you run into any errors while installing OpenBLAS, please refer to their User Manual.

Building cDNN

After installing the dependencies, execute the following in the terminal.

$ git clone https://github.com/iVishalr/cDNN.git
$ cd cDNN
$ sudo make && sudo make install

This will create a shared library on your system, which allows you to use the functions in cDNN anywhere. You do not need to keep the source code around once the shared library has been built and installed.

$ sudo make && sudo make install builds a shared library for your platform and places it in /usr/local/lib; the include header files are placed in /usr/local/include.

Note: Please do not change anything in the Makefile of cDNN, as it installs into the standard directories where other shared libraries such as libc.so live. Changing things in the Makefile risks modifying or deleting other libraries on your system.

I know it's a lot of work, but there's no way around it.

Compiling

To compile a test.c file that uses cDNN, type the following in the terminal.

On Linux,

$ export OPENBLAS_NUM_THREADS=2
$ gcc -I /usr/include/ -I /opt/OpenBLAS/include/ test.c -lcdnn -lgomp -lncurses -lopenblas -L /usr/lib/ -L /opt/OpenBLAS/lib/ -lm

Please keep the above linker flags (-lcdnn, -lopenblas, ...) in the same order; otherwise test.c won't compile.

On macOS,

$ gcc test.c -lcdnn -lopenblas -lncurses -I /usr/local/include/ -L /usr/local/lib/ -I /opt/OpenBLAS/include/ -L /opt/OpenBLAS/lib/

Since the shared library depends on OpenBLAS's implementation of cblas.h, you are required to include its header files as well as link against its shared library.

To run the program, execute ./a.out (or ./<name> if you passed -o <name> to gcc).

Examples

/*
File : test.c
Author : Vishal R
Email ID : [email protected] or [email protected]
Abstract : Implements a 5 layer neural network using cDNN
*/

#include <cdnn.h>

int main(){

  Create_Model();

  int x_train_dims[] = {12288,100};
  int y_train_dims[] = {2,100};
  int x_cv_dims[] = {12288,100};
  int y_cv_dims[] = {2,100};
  int x_test_dims[] = {12288,100};
  int y_test_dims[] = {2,100};

  dARRAY * x_train = load_x_train("./data/X_train.t7",x_train_dims);
  dARRAY * y_train = load_y_train("./data/y_train.t7",y_train_dims);

  dARRAY * x_cv = load_x_cv("./data/X_cv.t7",x_cv_dims);
  dARRAY * y_cv = load_y_cv("./data/y_cv.t7",y_cv_dims);

  dARRAY * x_test = load_x_test("./data/X_test.t7",x_test_dims);
  dARRAY * y_test = load_y_test("./data/y_test.t7",y_test_dims);

  Input(.layer_size=12288);
  Dense(.layer_size=64,.activation="relu",.initializer="he",.layer_type="hidden");
  Dense(.layer_size=32,.activation="relu",.initializer="he",.layer_type="hidden",.dropout=0.7);
  Dense(.layer_size=32,.activation="relu",.layer_type="hidden",.dropout=0.5);
  Dense(.layer_size=16,.activation="relu",.layer_type="hidden");
  Dense(.layer_size=2,.activation="softmax",.initializer="random",.layer_type="output");
  Model(.X_train=x_train,.y_train=y_train,.X_cv=x_cv,.y_cv=y_cv,.X_test=x_test,.y_test=y_test,\
        .epochs=1000,.lr=3.67e-5,.optimizer="adam",.checkpoint_every=-1,.batch_size=32);

  Fit();
  Test();
  Save_Model("./DOGS_VS_CATS.t7");

  int img_dims[] = {12288,1};
  dARRAY * test_img1 = load_image("./test_img1.data",img_dims);
  dARRAY * test_img2 = load_image("./test_img2.data",img_dims);

  dARRAY * prediction1 = Predict(test_img1,1);
  dARRAY * prediction2 = Predict(test_img2,1);

  free2d(test_img1);
  free2d(test_img2);
  free2d(prediction1);
  free2d(prediction2);

  Destroy_Model();
}

The above file shows how to create a five-layer neural network using the cDNN library.

Additional examples are available in the Examples folder.

Contributions

If you like this library and would like to make it better, you are welcome to contribute. It takes a team effort to make things better, so I would love to have you on board.

Avoid committing directly to the main branch. Create your own branch and open a pull request; once it is approved, your changes will be merged into the main code.

License

cDNN has an MIT-style license, as found in the LICENSE file.

Owner
Vishal R
Computer Science Student at PES University.