
Minerva: a fast and flexible system for deep learning

Latest News

  • We've removed most of Minerva's dependencies and made it easier to build. In most cases, all you need is:

    ./build.sh

    Please see the wiki page for more information.

  • Minerva's tutorial and API documentation have been released!

  • Minerva has migrated to dmlc, where you can find many awesome machine learning repositories!

  • Minerva now uses cudnn_v2. Please download and use the new library.

  • Minerva now supports the latest version of Caffe's network configuration protobuf format. If you are using an older version, errors may occur; please use the upgrade tool to convert your configuration file.

Overview

Minerva is a fast and flexible tool for deep learning. It provides an NDArray programming interface similar to NumPy's. Both Python and C++ bindings are available, and the resulting code can run on either CPU or GPU. Multi-GPU support is easy to use; please refer to the examples to see how a multi-GPU setting is used.

Quick try

After building and installing Minerva and the Owl package (the Python binding) as described in Install Minerva, run ./run_owl_shell.sh in Minerva's root directory and enter:

>>> x = owl.ones([10, 5])
>>> y = owl.ones([10, 5])
>>> z = x + y
>>> z.to_numpy()

The result will be a 10x5 array filled with the value 2. Minerva supports many NumPy-style ndarray operations; please see the API documentation for more information.

Features

  • N-D array programming interface and easy integration with numpy

    >>> import numpy as np
    >>> x = np.array([1, 2, 3])
    >>> y = owl.from_numpy(x)
    >>> y += 1
    >>> y.to_numpy()
    array([ 2.,  3.,  4.], dtype=float32)

    More can be found in the API cheatsheet.

  • Automatic parallel execution

    >>> x = owl.zeros([256, 128])
    >>> y = owl.randn([1024, 32], 0.0, 0.01)

    The above x and y will be executed concurrently. How is this achieved?

    See Feature Highlight: Data-flow and lazy evaluation, and the sketch following this list.

  • Multi-GPU, multi-CPU support:

    >>> owl.set_device(gpu0)
    >>> x = owl.zeros([256, 128])
    >>> owl.set_device(gpu1)
    >>> y = owl.randn([1024, 32], 0.0, 0.01)

    The above x and y will be executed on two cards simultaneously. How is this achieved?

    See Feature Highlight: Multi GPU Training, and the sketch following this list.
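
The sketch below combines the two features above. It is illustrative only: create_gpu_device is an assumed counterpart of the create_cpu_device call that appears in the comments at the end of this page, and wait_for_all is the synchronization call mentioned there; treat the exact names as assumptions rather than confirmed API.

    >>> gpu0 = owl.create_gpu_device(0)       # assumed API, mirroring owl.create_cpu_device()
    >>> gpu1 = owl.create_gpu_device(1)
    >>> owl.set_device(gpu0)
    >>> x = owl.randn([256, 128], 0.0, 0.01)  # returns immediately; the op is only recorded
    >>> owl.set_device(gpu1)
    >>> y = owl.randn([1024, 32], 0.0, 0.01)  # independent of x, so it can run on gpu1 in parallel
    >>> x.to_numpy()                          # forces evaluation of x (and everything x depends on)
    >>> owl.wait_for_all()                    # or block until every pending operation has finished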

Tutorials and Documentation

  • Tutorials and high-level concepts can be found in our wiki page
  • A step-by-step walkthrough of the MNIST example can be found here
  • We also built a tool that directly reads Caffe's configuration file and trains. See the document.
  • API documentation can be found here

Performance

We will keep this section updated with the latest performance we are able to achieve.

Training speed

Training speed (images/second):

            AlexNet   VGGNet   GoogLeNet
1 card       189.63    14.37       82.47
2 cards      371.01    29.58      160.53
4 cards      632.09    50.26      309.27
  • The performance is measured on a machine with 4 GTX Titan cards.
  • On each card, we use a minibatch size of 256, 24, and 120 for AlexNet, VGGNet, and GoogLeNet, respectively. The total minibatch size therefore grows with the number of cards (for example, training AlexNet on 4 cards uses a total minibatch size of 1024).
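
In other words, scaling is close to linear: relative to one card, AlexNet throughput grows by roughly 1.96x on 2 cards (371.01 / 189.63) and 3.33x on 4 cards (632.09 / 189.63), with the per-card minibatch size held fixed.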

End-to-end training

We also provide some end-to-end training code in the owl package, which can load Caffe's model files and perform training. Note that Minerva is not the same tool as Caffe, and this part of the logic is not our focus; we implemented it mainly to exercise Minerva's powerful and flexible programming interface (a Caffe-like network trainer takes only around 700-800 lines of Python). Below is the training error over time compared with Caffe. Note that Minerva can finish GoogLeNet training in less than four days with four GPU cards.

(Figure: training error over time, compared with Caffe)
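
To give a sense of why a Caffe-like trainer fits in a few hundred lines of owl code, here is a hypothetical skeleton of such a training loop. Everything except owl.set_device, owl.from_numpy, and owl.wait_for_all (which appear elsewhere in this document) is a placeholder: net, load_minibatches, train_set, and create_gpu_device are illustrative names, not actual owl/Minerva APIs.

    import owl

    owl.set_device(owl.create_gpu_device(0))      # assumed counterpart of create_cpu_device
    num_epochs = 10                               # illustrative hyperparameters
    learning_rate = 0.01
    for epoch in range(num_epochs):
        for samples, labels in load_minibatches(train_set, batch_size=256):
            x = owl.from_numpy(samples)           # host -> device transfer
            t = owl.from_numpy(labels)
            acts = net.forward(x)                 # each layer issues lazy owl ops
            grads = net.backward(acts, t)         # gradients are lazy NArrays too
            for w, dw in zip(net.weights, grads):
                w -= learning_rate * dw           # in-place SGD step, still lazy
        owl.wait_for_all()                        # synchronize once per epoch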

Testing error rate

We trained several models from scratch with Minerva to verify its correctness. The following table shows the error rates of the different networks under different testing settings.

                    AlexNet   VGGNet   GoogLeNet
single view, top-1    41.6%    31.6%       32.7%
multi view, top-1     39.7%    30.1%       31.3%
single view, top-5    18.8%    11.4%       11.8%
multi view, top-5     17.5%    10.8%       11.0%
  • AlexNet was trained with the solver, except that we did not use multi-group convolution.
  • GoogLeNet was trained with the quick_solver.
  • We did not train VGGNet from scratch; we only converted the model into Minerva's format and ran testing.

The models can be found at the following links: AlexNet, GoogLeNet, VGGNet.

You can download the trained models and try them on your own machine using the net_tester script.

Next Plan

  • Get rid of boost library dependency by using Cython. (DONE)
  • Large scale LSTM example using Minerva.
  • Easy support for user-defined new operations.

License and support

Minerva is provided under the Apache v2 open source license.

You can use the "issues" tab on GitHub to report bugs. For non-bug issues, please send an email to [email protected]. You can also subscribe to the discussion group: https://groups.google.com/forum/#!forum/minerva-support.

Wiki

For more information on how to install, use or contribute to Minerva, please visit our wiki page: https://github.com/minerva-developers/minerva/wiki

Comments
  • Build for CPU only

    Now Minerva can be built for CPU only.

    I added basic CPU implementations of SigmoidForward, ReluForward, and TanhForward. For the apps, mnist_mlp works fine.

    To-do:

    • other CPU implementations including conv, backward...
  • Can't build apps

    Does Minerva work with CUDA 7? I installed CUDA 7 and all the samples worked well, but I cannot build the Minerva apps; I get errors like "undefined reference to curandGenerateNormal".

    ...
    -- cmake generator: Unix Makefiles
    -- cmake build tool: /usr/bin/make
    -- cmake build type: Release
    -- Found cuDNN (include: ~/cudnn2, library: ~/cudnn2/libcudnn.so)
    -- Found BLAS (include: /usr/include, library: /opt/openblas/lib/libcblas.so)
    -- build C++ applications              -- 1
    -- build unit tests                    -- 0
    -- build cpu-only version              -- 0
    -- build with parameter server support -- 0
    -- build with BLAS library for CPU     -- 1
    -- Build CXX Applications:
    --   mnist_cnn_2gpu
    --   mnist_mlp
    --   main
    --   mnist_cnn
    -- Configuring done
    -- Generating done
    -- Build files have been written to: ~/minerva/release
    [ 16%] Built target gflags
    [ 32%] Built target dmlc-core
    [ 32%] Built target third-party
    Linking CXX shared library ../lib/libminerva.so
    [ 91%] Built target minerva
    Linking CXX executable main
    ../lib/libminerva.so: undefined reference to `curandGenerateNormal'
    ../lib/libminerva.so: undefined reference to `curandSetPseudoRandomGeneratorSeed'
    ../lib/libminerva.so: undefined reference to `curandCreateGenerator'
    

    Here's my config.in file:

    BUILD_DIR=release
    CXX=g++
    CC=gcc
    CXXFLAGS=
    CUDA_ROOT=/usr/local/cuda
    CUDNN_ROOT=/home/zer0n/cudnn2
    BUILD_TYPE=Release
    BUILD_OWL=0
    BUILD_CXX_APPS=1
    BUILD_TESTS=0
    BUILD_WITH_PS=0
    PS_ROOT=
    BUILD_CPU_ONLY=0
    BUILD_WITH_BLAS=1
    BLAS_ROOT=/opt/openblas
    

    I have successfully installed Caffe with CUDA 7 using these instructions, though.

  • Is there a functional google groups or email address for this project?

    The emails I've sent to [email protected] bounce back.

    My question is about recurrent functions and BPTT; does the C++/CUDA codebase currently have implementations?

  • mnist_cnn failed at Epoch #0

    Here is the error: F0310 16:38:25.787619 9513 narray.cpp:126] Check failed: lhs.Size(1) == rhs.Size(0) (512 vs. -6833920) size must match. The acts[6] has size [32845682 4 32 256], and minibatch_size is 256.

  • Does installation of the Owl module require dmlc?

    I followed the Build Owl module instructions here (https://github.com/dmlc/minerva/wiki/Install-Minerva), but the build fails:

    minerva/narray/narray.h:3:26: fatal error: dmlc/logging.h: No such file or directory
     #include <dmlc/logging.h>
    
  • Difference between Minerva and MShadow

    Dear all,

    I was surprised to see that Minerva is now part of the umbrella project DMLC, so I'm a little bit lost in the middle of these awesome tools. What is the difference between them, especially between Minerva, MShadow, and CXXNET? I'm interested in implementing a distributed version of a triplet convolutional network that I have in Caffe.

  • Can't get libminervaps.a

    I'm following the instructions to integrate Minerva with the parameter server, but I always fail to get libminervaps.a. One thing confuses me: I don't understand "then compile with make minerva". Should I run make under the parameter server sources or under minerva? Under the parameter server directory, it says there is no such target; under the minerva directory, it does not give me libminervaps.a. I'm sure that I ran configure with the parameter server option enabled.

  • patch for more flexible/idiomatic use of CMake

    Hi guys, I've made a few changes that make the Minerva build slightly easier to link with other applications that use CMake and the same libraries (e.g., boost). I don't think I've broken anything, and I've tested it on Mac OS without CUDA and on Linux with CUDA, but the build process changes slightly when you run ./configure (you'll need to specify the deps/ directory as the CMAKE_PREFIX_PATH, but I've provided instructions). I'm happy to keep my changes separate, of course, but I'm submitting this pull request since others might find it useful, and it will help me avoid conflicts going forward if you accept it. I would be very happy if you incorporated it into your branch. All the best, Chris D.

  • Unittest compilation failed

    When compiling the unit tests, I got errors on my machine (GCC 4.9.2 prerelease, Arch Linux 2015.03.01, kernel 3.18.6). They are caused by the following things in commit 2e886a4d6a8e04c803dbb117ecd048bb8fe867f8:

    1. unittest_main is built as a shared library, while gtest usually builds only a static .a.
    2. The -flto flag does not work with my GCC.

    I suggest fixing this by changing the unit tests back to a static library and removing the lto flag (or performing a corresponding check on that flag).

  • doesn't compile with cuDNN R2

    cuDNN R2 has a different interface than R1. Our code does not compile with R2, giving messages like "identifier "cudnnTensor4dDescriptor_t" is undefined". We should fix this or at least give a warning about it.

  • Minerva GPU -- free memory for a variable

    Hi,

    Thanks for the great tool. I am trying to use Minerva to run an RNN on a huge dataset. However, GPU memory usage increases gradually, and the program crashes once it exceeds the GPU memory limit. My first question: is there any way/function to free the memory of unused variables? I have used wait_for_all, but it does not solve the problem.

    I tried to write a memory-freeing function using cudaFree. The function frees the memory of the NArray variable, but the pointer to the NArray variable still exists. When I reuse the NArray variable in an assignment, e.g. data = owl.from_numpy(np.array([...])), it works for a new array with a different size, but it crashes when the new array has the same size as the old one that was freed. For example:

    # cannot work
    data = owl.from_numpy(np.arange(10000).reshape(100, 100))
    owl.free_memory(data)  # a function I wrote myself using cudaFree
    data = owl.from_numpy(np.arange(10000).reshape(100, 100))
    owl.free_memory(data)  # core dump here

    # can work
    data = owl.from_numpy(np.arange(10000).reshape(100, 100))
    owl.free_memory(data)
    data = owl.from_numpy(np.arange(20000).reshape(200, 100))
    owl.free_memory(data)

    Would you please help me find the reason? Thanks a lot.

  • Binary Classifier - Log loss function

    Hi, fantastic library! I am trying to use the library for a binary classifier experiment, using the log loss function to train the model. This is for a university experiment around benchmarking different models. Would you have time to provide an example of how to use the library to achieve this, including a visualization of how the algorithm learns and decreases the error? Many thanks. Best, Andrew

  • terminate called after throwing an instance of 'std::out_of_range'

    Here is my Python code:

    import owl, numpy
    a = numpy.zeros((300, 400))
    b = owl.from_numpy(a)

    And it gave me this error:

    terminate called after throwing an instance of 'std::out_of_range'
      what():  _Map_base::at
    Aborted (core dumped)

    help...

  • build failed on Ubuntu 14.04

    File /home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp/CheckSymbolExists.c:

    /* */
    #include <pthread.h>

    int main(int argc, char** argv) {
      (void)argv;
    #ifndef pthread_create
      return ((int*)(&pthread_create))[argc];
    #else
      (void)argc;
      return 0;
    #endif
    }

    Determining if the function pthread_create exists in the pthreads failed with the following output:

    Change Dir: /home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp
    Run Build Command:/usr/bin/make "cmTryCompileExec1004526741/fast"
    /usr/bin/make -f CMakeFiles/cmTryCompileExec1004526741.dir/build.make CMakeFiles/cmTryCompileExec1004526741.dir/build
    make[1]: Entering directory `/home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp'
    /usr/bin/cmake -E cmake_progress_report /home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp/CMakeFiles 1
    Building C object CMakeFiles/cmTryCompileExec1004526741.dir/CheckFunctionExists.c.o
    /usr/bin/gcc -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTryCompileExec1004526741.dir/CheckFunctionExists.c.o -c /usr/share/cmake-2.8/Modules/CheckFunctionExists.c
    Linking C executable cmTryCompileExec1004526741
    /usr/bin/cmake -E cmake_link_script CMakeFiles/cmTryCompileExec1004526741.dir/link.txt --verbose=1
    /usr/bin/gcc -DCHECK_FUNCTION_EXISTS=pthread_create CMakeFiles/cmTryCompileExec1004526741.dir/CheckFunctionExists.c.o -o cmTryCompileExec1004526741 -rdynamic -lpthreads
    /usr/bin/ld: cannot find -lpthreads
    collect2: error: ld returned 1 exit status
    make[1]: *** [cmTryCompileExec1004526741] Error 1
    make[1]: Leaving directory `/home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp'
    make: *** [cmTryCompileExec1004526741/fast] Error 2

    [email protected]:~/github/DMLC/minerva$

  • none checking on owl + removal of compiled pyx files

    I ran into a number of odd errors when accidentally passing None into owl functions. It turns out Cython just tries to accept them, and they turn into invalid memory, causing segfaults (https://github.com/cython/cython/wiki/FAQ#why-does-a-function-with-cdef-d-parameters-accept-none). This pull request tries to fix that. In addition, I removed the generated cpp file for the pyx file.

    The new errors look something like:

    [23:49:49] /home/luke/Repos/minerva/minerva/system/minerva_system.cpp:86: dag engine enabled
    Traceback (most recent call last):
      File "tt.py", line 10, in <module>
        ele.relu(None)
      File "/home/luke/Repos/minerva/owl/owl/elewise.py", line 50, in relu
        return _owl.NArray.relu(x)
    TypeError: Argument 'lhs' has incorrect type (expected owl.libowl.NArray, got NoneType)
    

    The old error looks like:

    [23:51:53] /home/luke/Repos/minerva/minerva/system/minerva_system.cpp:86: dag engine enabled
    Segmentation fault (core dumped)
    

    To my knowledge there are no tests for this pyx file. In addition, I cannot run the CUDA code on my current machine, but as far as I know this shouldn't break anything.

  • race condition in owl imports

    Hello again. I think I found a race condition in the importing of owl. I have a fairly simple test program that crashes maybe 1 in 3 times.

    import owl
    import owl.elewise as ele
    
    c = owl.create_cpu_device()
    owl.set_device(c)
    
    x = owl.zeros((10, 10))
    y = ele.relu(x)
    

    The error received on a bad run looks something like:

    ○ → python tt.py
    [22:55:26] /home/luke/Repos/minerva/minerva/system/minerva_system.cpp:86: dag engine enabled
    [22:55:26] /home/luke/Repos/minerva/minerva/backend/dag/dag_scheduler.cpp:46: create new op node #1 on device #0
    [22:55:26] /home/luke/Repos/minerva/minerva/backend/dag/dag_scheduler.cpp:149: node #1 running right after creation
    [22:55:26] /home/luke/Repos/minerva/minerva/backend/dag/dag_scheduler.cpp:46: create new op node #3 on device #0
    [22:55:26] /home/luke/Repos/minerva/minerva/backend/dag/dag_scheduler.cpp:176: dispatching node #1 to device #0
    [22:55:26] /home/luke/Repos/minerva/minerva/device/device.cpp:95: CPU device #0 create output for task data #0
    [22:55:26] /home/luke/Repos/minerva/minerva/device/data_store.cpp:18: create data #0 length 400
    terminate called after throwing an instance of 'dmlc::Error'
      what():  [22:55:26] minerva/common/singleton.h:13: Check failed: data_ please initialize before use
    Aborted (core dumped)
    

    Sadly, I cannot get a proper stack trace, because when I run it under gdb I get no failure.

    Adding a short sleep just after the import seems to fix the problem.

    This is on CPU, Ubuntu 15.04, with the dag engine enabled.

    Thanks!
