mlpack: a scalable C++ machine learning library

mlpack: a fast, flexible machine learning library


Download: current stable version (3.4.2)

mlpack is an intuitive, fast, and flexible C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "swiss army knife" for machine learning researchers. In addition to its powerful C++ interface, mlpack also provides command-line programs, Python bindings, Julia bindings, Go bindings and R bindings.

mlpack uses an open governance model and is fiscally sponsored by NumFOCUS. Consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.


0. Contents

  1. Introduction
  2. Citation details
  3. Dependencies
  4. Building mlpack from source
  5. Running mlpack programs
  6. Using mlpack from Python
  7. Further documentation
  8. Bug reporting

1. Introduction

The mlpack website, https://www.mlpack.org, contains numerous tutorials and extensive documentation. This README serves as a guide for what mlpack is, how to install it, how to run it, and where to find more documentation. Consult the website for further information.

2. Citation details

If you use mlpack in your research or software, please cite mlpack using the citation below (given in BibTeX format):

@article{mlpack2018,
    title     = {mlpack 3: a fast, flexible machine learning library},
    author    = {Curtin, Ryan R. and Edel, Marcus and Lozhnikov, Mikhail and
                 Mentekidis, Yannis and Ghaisas, Sumedh and Zhang,
                 Shangtong},
    journal   = {Journal of Open Source Software},
    volume    = {3},
    issue     = {26},
    pages     = {726},
    year      = {2018},
    doi       = {10.21105/joss.00726},
    url       = {https://doi.org/10.21105/joss.00726}
}

Citations are beneficial for the growth and improvement of mlpack.

3. Dependencies

mlpack has the following dependencies:

  Armadillo      >= 8.400.0
  Boost (math_c99, spirit) >= 1.58.0
  CMake          >= 3.2.2
  ensmallen      >= 2.10.0
  cereal         >= 1.1.2

All of those should be available in your distribution's package manager. If not, you will have to compile each of them by hand. See the documentation for each of those packages for more information.

If you would like to use or build the mlpack Python bindings, make sure that the following Python packages are installed:

  setuptools
  cython >= 0.24
  numpy
  pandas >= 0.15.0

If you would like to build the Julia bindings, make sure that Julia >= 1.3.0 is installed.

If you would like to build the Go bindings, make sure that Go >= 1.11.0 is installed with this package:

 Gonum

If you would like to build the R bindings, make sure that R >= 4.0 is installed with these R packages:

 Rcpp >= 0.12.12
 RcppArmadillo >= 0.8.400.0
 RcppEnsmallen >= 0.2.10.0
 BH >= 1.58
 roxygen2

If the STB library headers are available, image loading support will be compiled.

If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.

4. Building mlpack from source

This document discusses how to build mlpack from source. These build directions will work for any Linux-like shell environment (for example Ubuntu, macOS, FreeBSD etc). However, mlpack is in the repositories of many Linux distributions and so it may be easier to use the package manager for your system. For example, on Ubuntu, you can install the mlpack library and command-line executables (e.g. mlpack_pca, mlpack_kmeans etc.) with the following command:

$ sudo apt-get install libmlpack-dev mlpack-bin

On Fedora or Red Hat (EPEL):

$ sudo dnf install mlpack-devel mlpack-bin

Note: Older Ubuntu versions may not have the most recent version of mlpack available. For instance, at the time of this writing, Ubuntu 16.04 only has mlpack 2.0.1 available. Options include upgrading your Ubuntu version, finding a PPA or other non-official sources, or installing with a manual build.

There are some useful pages to consult in addition to this section:

mlpack uses CMake as a build system and allows several flexible build configuration options. You can consult any of the CMake tutorials for further documentation, but this tutorial should be enough to get mlpack built and installed.

First, unpack the mlpack source and change into the unpacked directory. Here we use mlpack-x.y.z where x.y.z is the version.

$ tar -xzf mlpack-x.y.z.tar.gz
$ cd mlpack-x.y.z

Then, make a build directory. The directory can have any name, but 'build' is sufficient.

$ mkdir build
$ cd build

The next step is to run CMake to configure the project. Running CMake is the equivalent to running ./configure with autotools. If you run CMake with no options, it will configure the project to build with no debugging symbols and no profiling information:

$ cmake ../

Options can be specified to compile with debugging information and profiling information:

$ cmake -D DEBUG=ON -D PROFILE=ON ../

Options are specified with the -D flag. The allowed options include:

DEBUG=(ON/OFF): compile with debugging symbols
PROFILE=(ON/OFF): compile with profiling symbols
ARMA_EXTRA_DEBUG=(ON/OFF): compile with extra Armadillo debugging symbols
BOOST_ROOT=(/path/to/boost/): path to root of boost installation
ARMADILLO_INCLUDE_DIR=(/path/to/armadillo/include/): path to Armadillo headers
ARMADILLO_LIBRARY=(/path/to/armadillo/libarmadillo.so): Armadillo library
BUILD_CLI_EXECUTABLES=(ON/OFF): whether or not to build command-line programs
BUILD_PYTHON_BINDINGS=(ON/OFF): whether or not to build Python bindings
PYTHON_EXECUTABLE=(/path/to/python_version): Path to specific Python executable
PYTHON_INSTALL_PREFIX=(/path/to/python/): Path to root of Python installation
BUILD_JULIA_BINDINGS=(ON/OFF): whether or not to build Julia bindings
JULIA_EXECUTABLE=(/path/to/julia): Path to specific Julia executable
BUILD_GO_BINDINGS=(ON/OFF): whether or not to build Go bindings
GO_EXECUTABLE=(/path/to/go): Path to specific Go executable
BUILD_GO_SHLIB=(ON/OFF): whether or not to build shared libraries required by Go bindings
BUILD_R_BINDINGS=(ON/OFF): whether or not to build R bindings
R_EXECUTABLE=(/path/to/R): Path to specific R executable
BUILD_TESTS=(ON/OFF): whether or not to build tests
BUILD_SHARED_LIBS=(ON/OFF): compile shared libraries as opposed to
   static libraries
DISABLE_DOWNLOADS=(ON/OFF): whether to disable all downloads during build
DOWNLOAD_ENSMALLEN=(ON/OFF): If ensmallen is not found, download it
ENSMALLEN_INCLUDE_DIR=(/path/to/ensmallen/include): path to include directory
   for ensmallen
DOWNLOAD_STB_IMAGE=(ON/OFF): If STB is not found, download it
STB_IMAGE_INCLUDE_DIR=(/path/to/stb/include): path to include directory for
   STB image library
USE_OPENMP=(ON/OFF): whether or not to use OpenMP if available
BUILD_DOCS=(ON/OFF): build Doxygen documentation, if Doxygen is available
   (default ON)

Other tools can also be used to configure CMake, but those are not documented here. See this section of the build guide for more details, including a full list of options, and their default values.

By default, command-line programs will be built, and if the Python dependencies (Cython, setuptools, numpy, pandas) are available, then Python bindings will also be built. OpenMP will be used for parallelization when possible by default.

Once CMake is configured, building the library is as simple as typing 'make'. This will build all library components as well as 'mlpack_test'.

$ make

If you do not want to build everything in the library, individual components of the build can be specified:

$ make mlpack_pca mlpack_knn mlpack_kfn

If the build fails and you cannot figure out why, register an account on GitHub and submit an issue. The mlpack developers will quickly help you figure it out:

mlpack on GitHub: https://github.com/mlpack/mlpack/

Alternately, mlpack help can be found in IRC at #mlpack on chat.freenode.net.

If you wish to install mlpack to /usr/local/include/mlpack/, /usr/local/lib/, and /usr/local/bin/, make sure you have root privileges (or write permissions to those three directories), and simply type

$ make install

You can now run the executables by name. You can link against mlpack with -lmlpack, and the mlpack headers are found in /usr/local/include/mlpack/. If Python bindings were built, you can access them with the mlpack package in Python.

If running the programs (e.g. $ mlpack_knn -h) gives an error of the form

error while loading shared libraries: libmlpack.so.2: cannot open shared object file: No such file or directory

then be sure that the runtime linker is searching the directory where libmlpack.so was installed (probably /usr/local/lib/ unless you set it manually). One way to do this, on Linux, is to ensure that the LD_LIBRARY_PATH environment variable has the directory that contains libmlpack.so. Using bash, this can be set easily:

export LD_LIBRARY_PATH="/usr/local/lib/:$LD_LIBRARY_PATH"

(or whatever directory libmlpack.so is installed in.)

5. Running mlpack programs

After building mlpack, the executables will reside in build/bin/. You can call them from there, or you can install the library and (depending on system settings) they should be added to your PATH and you can call them directly. The documentation below assumes the executables are in your PATH.

Consider the 'mlpack_knn' program, which finds the k nearest neighbors in a reference dataset of all the points in a query set. That is, we have a query and a reference dataset. For each point in the query dataset, we wish to know the k points in the reference dataset which are closest to the given query point.

Alternately, if the query and reference datasets are the same, the problem can be stated more simply: for each point in the dataset, we wish to know the k nearest points to that point.
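To make the problem statement concrete, here is a minimal brute-force sketch in plain Python. It is not mlpack's implementation, and the function name brute_force_knn and the toy data are made up for illustration; it only shows what "the k nearest reference points for each query point" means.

```python
import math


def brute_force_knn(reference, query, k):
    """For each query point, return the indices and distances of its k
    nearest reference points, ordered by increasing Euclidean distance."""
    neighbors, distances = [], []
    for q in query:
        # Distance from this query point to every reference point.
        scored = sorted((math.dist(q, r), i) for i, r in enumerate(reference))
        neighbors.append([i for _, i in scored[:k]])
        distances.append([d for d, _ in scored[:k]])
    return neighbors, distances


# Toy data (hypothetical): four reference points and two query points.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = [(0.1, 0.0), (4.0, 4.0)]

neighbors, distances = brute_force_knn(reference, query, k=2)
print(neighbors)  # [[0, 1], [3, 2]]
```

Each row of the result corresponds to one query point, which is the same per-point layout as the neighbors and distances outputs described in this README.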

Each mlpack program has extensive help documentation which details what the method does, what each of the parameters is, and how to use them:

$ mlpack_knn --help

Running mlpack_knn on one dataset (that is, the query and reference datasets are the same) and finding the 5 nearest neighbors is very simple:

$ mlpack_knn -r dataset.csv -n neighbors_out.csv -d distances_out.csv -k 5 -v

The -v (--verbose) flag is optional; it gives informational output. It is not unique to mlpack_knn but is available in all mlpack programs. Verbose output also gives timing output at the end of the program, which can be very useful.

6. Using mlpack from Python

If mlpack is installed to the system, then the mlpack Python bindings should be automatically in your PYTHONPATH, and importing mlpack functionality into Python should be very simple:

>>> from mlpack import knn

Accessing help is easy:

>>> help(knn)

The API is similar to the command-line programs. So, running knn() (k-nearest-neighbor search) on the numpy matrix dataset and finding the 5 nearest neighbors is very simple:

>>> output = knn(reference=dataset, k=5, verbose=True)

This will store the output neighbors in output['neighbors'] and the output distances in output['distances']. Other mlpack bindings function similarly, and the input/output parameters exactly match those of the command-line programs.

7. Further documentation

The documentation given here is only a fraction of the available documentation for mlpack. If doxygen is installed, you can type make doc to build the documentation locally. Alternately, up-to-date documentation is available for older versions of mlpack:

8. Bug reporting

(see also mlpack help)

If you find a bug in mlpack or have any problems, numerous routes are available for help.

GitHub is used for bug tracking, and can be found at https://github.com/mlpack/mlpack/. It is easy to register an account and file a bug there, and the mlpack development team will try to quickly resolve your issue.

In addition, mailing lists are available. The mlpack discussion list is available at

mlpack discussion list

and the git commit list is available at

commit list

Lastly, the IRC channel #mlpack on Freenode can be used to get help.

Comments
  • [GSoC] Augmented RNN models - benchmarking framework

    This PR is part of my GSoC project "Augmented RNNs". Implemented:

    • class CopyTask for evaluating models on the sequence copy problem, showcasing benchmarking framework;
    • unit test for it (a simple non-ML model that is hardcoded to copy the sequence required number of times is expected to ace the CopyTask).
  • Swap boost::variant with vtable.

    I updated the abstract class and also updated the Linear layer as an example. There are various layers we have to update, so if anybody would like to work on some of the layers listed below, comment on the PR. Unfortunately I can't enable commit permission for a specific branch. So, to get your changes in, you can fork the repository as usual, create a new feature branch, and make the changes; but instead of opening another PR, just post the link to the branch here and I will cherry-pick the commit.

    Steps:

    1. Inherit from the Layer class. Each layer should implement the functions that are relevant for its layer-specific computations and inherit the rest from the base class.
    2. Rename InputDataType to InputType and OutputDataType to OutputType, to make the interface more consistent with the rest of the codebase.
    3. Use InputType and OutputType instead of arma::mat or arma::Mat<eT>; to make the layer work with the abstract class, we have to follow the interface accordingly.
    4. Provide default layer type to hide some of the template functionalities that could be confusing for users that aren’t familiar with templates. So instead of using Linear<> all the time, a user can just use Linear. This is a result of https://github.com/mlpack/mlpack/issues/2524#issuecomment-664776530.
    5. Update the tests to use the updated interface.

    Example: for an example, check out the Linear layer.

    Here is a list of layers we have to update:

    • [x] adaptive_max_pooling.hpp - @Aakash-kaushik
    • [x] adaptive_mean_pooling.hpp - @Aakash-kaushik
    • [x] add.hpp - @Aakash-kaushik
    • [x] add_merge.hpp - @Aakash-kaushik
    • [x] alpha_dropout.hpp - @Aakash-kaushik
    • [x] atrous_convolution.hpp - @Aakash-kaushik
    • [x] batch_norm.hpp - @Aakash-kaushik
    • [x] base_layer.hpp - @mrityunjay-tripathi
    • [x] bilinear_interpolation.hpp - @mrityunjay-tripathi
    • [x] c_relu.hpp - @zoq
    • [x] celu.hpp - @zoq
    • [x] concat.hpp - @mrityunjay-tripathi
    • [ ] concat_performance.hpp - @hello-fri-end
    • [x] concatenate.hpp - @mrityunjay-tripathi
    • [x] constant.hpp - @zoq
    • [x] convolution.hpp - @mrityunjay-tripathi
    • [x] dropconnect.hpp - @zoq
    • [x] dropout.hpp - @zoq
    • [x] elu.hpp - @zoq
    • [x] fast_lstm.hpp - @mrityunjay-tripathi
    • [x] flexible_relu.hpp - @zoq
    • [x] glimpse.hpp - @mrityunjay-tripathi
    • [ ] gru.hpp - @zoq
    • [x] hard_tanh.hpp - @zoq
    • [x] hardshrink.hpp - @zoq
    • [x] highway.hpp - @mrityunjay-tripathi
    • [x] join.hpp - @mrityunjay-tripathi
    • [x] layer_norm.hpp - @mrityunjay-tripathi
    • [x] leaky_relu.hpp - @zoq
    • [x] linear.hpp - @zoq
    • [x] linear3d.hpp - @mrityunjay-tripathi
    • [x] linear_no_bias.hpp - @zoq
    • [x] log_softmax.hpp - @zoq
    • [x] lookup.hpp - @mrityunjay-tripathi
    • [ ] lstm.hpp - @zoq
    • [x] max_pooling.hpp - @mrityunjay-tripathi
    • [x] mean_pooling.hpp - @mrityunjay-tripathi
    • [ ] minibatch_discrimination.hpp - @hello-fri-end
    • [x] multihead_attention.hpp - @mrityunjay-tripathi
    • [x] multiply_constant.hpp - @zoq
    • [x] multiply_merge.hpp - @mrityunjay-tripathi
    • [x] noisylinear.hpp - @zoq
    • [x] padding.hpp - @mrityunjay-tripathi
    • [x] parametric_relu.hpp - @zoq
    • [x] positional_encoding.hpp - @mrityunjay-tripathi
    • [x] radial_basis_function.hpp - @hello-fri-end
    • [ ] recurrent.hpp - @kaushal07wick
    • [ ] recurrent_attention.hpp - @kaushal07wick
    • [x] reinforce_normal.hpp - @mrityunjay-tripathi
    • [x] reparametrization.hpp - @mrityunjay-tripathi
    • [x] select.hpp - @mrityunjay-tripathi
    • [x] sequential.hpp - @mrityunjay-tripathi
    • [x] softmax.hpp - @zoq
    • [x] softmin.hpp - @zoq
    • [x] softshrink.hpp - @zoq
    • [x] spatial_dropout.hpp - @zoq
    • [x] subview.hpp - @mrityunjay-tripathi
    • [x] transposed_convolution.hpp - @mrityunjay-tripathi
    • [x] virtual_batch_norm.hpp - @mrityunjay-tripathi
    • [x] vr_class_reward.hpp - @mrityunjay-tripathi
    • [x] weight_norm.hpp - @mrityunjay-tripathi

    I left the base layer since I'm not sure yet if it makes sense to implement them as an independent class.


    Building upon the work from @Aakash-kaushik we can get a first impression of the advantage of using boost::visitor compared with a virtual inheritance approach (#2647)

    Note we stripped basically everything out except the FFN class, the Linear layer, the FlexibleReLU layer, and the LogSoftMax layer, which enables us to get a first impression of what timings we can expect from a virtual inheritance approach.

    I tested two scenarios, but used the same network for each:

    FFN<> model;
    model.Add<Linear<> >(trainData.n_rows, 128);
    model.Add<FlexibleReLU<> >();
    model.Add<Linear<> >(128, 256);
    model.Add<Linear<> >(256, 256);
    model.Add<Linear<> >(256, 256);
    model.Add<Linear<> >(256, 256);
    model.Add<Linear<> >(256, 512);
    model.Add<Linear<> >(512, 2048);
    model.Add<Linear<> >(2048, 512);
    model.Add<Linear<> >(512, 8);
    model.Add<Linear<> >(8, 3);
    model.Add<LogSoftMax<> >();
    

    Scenario - 1

    batch-size: 1 iterations: 10000 trials: 10

    vtable - DEBUG=ON

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 494.485s
    elapsed time: 503.777s
    elapsed time: 496.802s
    elapsed time: 499.928s
    elapsed time: 502.504s
    elapsed time: 495.735s
    elapsed time: 495.745s
    elapsed time: 505.284s
    elapsed time: 495.32s
    elapsed time: 495.209s
    --------------------------------------
    elapsed time averaged(10): 498.479s
    

    boost::variant - DEBUG=ON

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 496.419s
    elapsed time: 495.27s
    elapsed time: 494.769s
    elapsed time: 494.922s
    elapsed time: 497.729s
    elapsed time: 497.464s
    elapsed time: 498.024s
    elapsed time: 501.722s
    elapsed time: 500.59s
    elapsed time: 497.925s
    --------------------------------------
    elapsed time averaged (10): 497.483s
    

    vtable - DEBUG=OFF

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 199.713s
    elapsed time: 205.177s
    elapsed time: 200.135s
    elapsed time: 200.179s
    elapsed time: 205.792s
    elapsed time: 198.293s
    elapsed time: 198.535s
    elapsed time: 206.635s
    elapsed time: 198.263s
    elapsed time: 198.521s
    --------------------------------------
    elapsed time averaged(10): 201.124s
    

    boost::variant - DEBUG=OFF

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 198.645s
    elapsed time: 194.854s
    elapsed time: 194.748s
    elapsed time: 194.983s
    elapsed time: 197.42s
    elapsed time: 196.864s
    elapsed time: 197.454s
    elapsed time: 204.318s
    elapsed time: 201.076s
    elapsed time: 200.549s
    --------------------------------------
    elapsed time averaged (10): 198.091s
    

    Scenario - 2

    batch-size: 32 iterations: 10000 trials: 10

    vtable - DEBUG=ON

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 70.4116s
    elapsed time: 70.5631s
    elapsed time: 70.682s
    elapsed time: 70.5635s
    elapsed time: 71.2245s
    elapsed time: 71.1649s
    elapsed time: 71.4714s
    elapsed time: 71.2688s
    elapsed time: 71.3348s
    elapsed time: 71.3406s
    --------------------------------------
    elapsed time averaged(10): 71.0025s
    

    boost::variant - DEBUG=ON

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 70.3247s
    elapsed time: 70.5059s
    elapsed time: 70.5368s
    elapsed time: 70.5208s
    elapsed time: 70.4539s
    elapsed time: 70.788s
    elapsed time: 70.7692s
    elapsed time: 70.9473s
    elapsed time: 70.9146s
    elapsed time: 70.7278s
    --------------------------------------
    elapsed time averaged (10): 70.6489s
    

    vtable - DEBUG=OFF

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 59.7968s
    elapsed time: 59.4626s
    elapsed time: 59.9147s
    elapsed time: 59.9682s
    elapsed time: 60.5511s
    elapsed time: 60.2109s
    elapsed time: 60.7782s
    elapsed time: 60.4981s
    elapsed time: 60.719s
    elapsed time: 60.7632s
    --------------------------------------
    elapsed time averaged(10): 60.2663s
    

    boost::variant - DEBUG=OFF

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 60.8466s
    elapsed time: 61.0629s
    elapsed time: 61.1269s
    elapsed time: 60.7426s
    elapsed time: 60.8178s
    elapsed time: 60.7287s
    elapsed time: 60.864s
    elapsed time: 60.8982s
    elapsed time: 60.9232s
    elapsed time: 60.8519s
    --------------------------------------
    elapsed time averaged (10): 60.8863s
    

    Looking at the timings, boost::variant doesn't provide the speedup I thought it would. On top of that, the small speedup we would gain with boost::variant is marginal in comparison to the actual calculation.
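As a quick check of that conclusion, the relative gap between the averaged timings can be computed directly. The snippet only restates the averages reported in this comment; the dictionary layout and the helper name relative_gap_percent are illustrative.

```python
# Averaged elapsed times in seconds, copied from the benchmark output above.
averages = {
    ("batch 1", "DEBUG=ON"): {"vtable": 498.479, "variant": 497.483},
    ("batch 1", "DEBUG=OFF"): {"vtable": 201.124, "variant": 198.091},
    ("batch 32", "DEBUG=ON"): {"vtable": 71.0025, "variant": 70.6489},
    ("batch 32", "DEBUG=OFF"): {"vtable": 60.2663, "variant": 60.8863},
}


def relative_gap_percent(vtable, variant):
    """Percentage by which the vtable build is slower (positive) or
    faster (negative) than the boost::variant build."""
    return 100.0 * (vtable - variant) / variant


gaps = {key: relative_gap_percent(**times) for key, times in averages.items()}
for key, gap in gaps.items():
    print(key, f"{gap:+.2f}%")
```

With these numbers the two approaches stay within about 2% of each other in every configuration, and the vtable build is slightly faster in the batch-size 32, DEBUG=OFF case.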

  • Adding All Loss Functions

    Hello, I was going through the loss functions and compiled a list of those that aren't implemented yet. I found these by comparing against PyTorch and TensorFlow; kindly refer to them for more information. The list is:

    1. HingeEmbedding Loss (taken by me)
    2. CosineEmbedding Loss (taken up by @kartikdutt18)
    3. MultiLabelMargin Loss
    4. TripletMargin Loss
    5. L1 Loss
    6. BCE Loss

    This might not be a complete list. I will update it as I find more. I hope this is OK with the community. Kindly feel free to take up any of the idle loss functions here. Thank you. :)
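For readers unfamiliar with the listed losses, here is a minimal plain-Python sketch of two of them, L1 loss and hinge embedding loss, following the definitions in the PyTorch documentation with mean reduction. It is not mlpack code; the function names and default margin are assumptions made for illustration.

```python
def l1_loss(predictions, targets):
    """Mean absolute difference between predictions and targets."""
    total = sum(abs(p - t) for p, t in zip(predictions, targets))
    return total / len(predictions)


def hinge_embedding_loss(values, labels, margin=1.0):
    """Mean hinge embedding loss; every label must be +1 or -1.

    A +1 label contributes the value itself; a -1 label contributes
    max(0, margin - value)."""
    total = 0.0
    for x, y in zip(values, labels):
        total += x if y == 1 else max(0.0, margin - x)
    return total / len(values)


l1 = l1_loss([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
hinge = hinge_embedding_loss([0.5, 0.2, 2.0], [1, -1, -1])
print(l1, hinge)
```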

  • Addition of all Activation Functions.

    Hi everyone, I have compiled a list of activation functions that are currently not implemented in mlpack but can be found in either TensorFlow or PyTorch.

    1. ~~SELU~~
    2. CELU
    3. GELU (currently taken up by @himanshupathak21061998)
    4. Hard shrink
    5. LiSHT (I have currently taken up this issue)
    6. Soft shrink (currently taken up by @ojhalakshya)
    7. ISRU (Inverse Square Root Unit)
    8. Inverse Square Root Linear
    9. Square Non Linearity

    I might have missed some functions; feel free to add them to the list. If anyone would like to take up any of the above functions, please feel free to do so. I hope this is okay with members of the organisation. This was done to reduce the effort of finding unimplemented functions, as well as to bring state-of-the-art activation functions to mlpack. In case I missed something or added an activation that has already been implemented, please forgive me. Thanks.
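As a reference for anyone picking one of these up, here is a minimal scalar sketch of four of the listed activations using their commonly published definitions. It is plain Python rather than mlpack code, and the function names and default parameter values are assumptions chosen for illustration.

```python
import math


def hard_shrink(x, lam=0.5):
    """Keep x only when its magnitude exceeds lam; otherwise return 0."""
    return x if abs(x) > lam else 0.0


def soft_shrink(x, lam=0.5):
    """Shrink x toward zero by lam, clamping the band [-lam, lam] to 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lisht(x):
    """LiSHT: x * tanh(x), which is non-negative for every input."""
    return x * math.tanh(x)


def isru(x, alpha=1.0):
    """Inverse Square Root Unit: x / sqrt(1 + alpha * x^2)."""
    return x / math.sqrt(1.0 + alpha * x * x)


for value in (-2.0, -0.3, 0.0, 0.3, 2.0):
    print(value, hard_shrink(value), soft_shrink(value), lisht(value), isru(value))
```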

  • ARMADILLO_INCLUDE_DIR-NOTFOUND/armadillo_bits/config.hpp not found

    -- The C compiler identification is GNU 4.8.1
    -- The CXX compiler identification is GNU 4.8.1
    -- Check for working C compiler: /usr/usc/gnu/gcc/4.8.1/bin/gcc
    -- Check for working C compiler: /usr/usc/gnu/gcc/4.8.1/bin/gcc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/usc/gnu/gcc/4.8.1/bin/g++
    -- Check for working CXX compiler: /usr/usc/gnu/gcc/4.8.1/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Checking for C++11 compiler
    -- Checking for C++11 compiler - available
    -- Looking for backtrace
    -- Looking for backtrace - found
    -- backtrace facility detected in default set of libraries
    -- Found Backtrace: /usr/include
    CMake Error at CMake/FindArmadillo.cmake:327 (message):
      ARMADILLO_INCLUDE_DIR-NOTFOUND/armadillo_bits/config.hpp not found! Cannot
      determine what to link against.
    Call Stack (most recent call first):
      CMakeLists.txt:113 (find_package)

    How can I solve this problem? Thanks a lot.

  • Implementation of SPSA optimizer

    As of now, I have just created the basic files necessary to implement the optimizer for the sake of creating the PR... I'll push the code in the subsequent commits :v:

  • Adapting armadillo's parser for mlpack(Removing Boost Dependencies)

    For background knowledge, look at these

    Sample code to use the feature

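    #include <fstream>
    #include <iostream>
    #include <mlpack/core.hpp>

    int main()
    {
      arma::Mat<double> data;
      std::fstream file;

      // Open the CSV file and stop early if it cannot be read.
      file.open("data.csv");
      if (!file.is_open())
      {
        std::cerr << "Could not open data.csv" << std::endl;
        return 1;
      }

      // Parse the stream as CSV into `data`, then print the raw values.
      mlpack::data::load_data<double>(data, arma::csv_ascii, file);
      data.raw_print();

      return 0;
    }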
    
  • Addition of Essential Metrics Only.

    This is a good first issue and will help new contributors get familiar with the codebase. This issue doesn't aim to add all metrics to mlpack, since each metric would have to be maintained; it aims to add metrics that I either find essential (or have used a couple of times) or that are very common. Metrics that can be added include:

    1. IoU and meanIoU (Taken up by me)
    2. SSIM (Useful when you augment data and need to ensure that you don't augment it to the extent that it becomes irrelevant. I used this on medical scans where there was heavy bias: I used it as a metric to find the right augmentation parameters for oversampling [set augmentation parameters s.t. (average SSIM) > threshold], to automate the process a bit). I think @ojhalakshya is working on it.
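For context on the first item, here is a minimal sketch of IoU (intersection over union) and mean IoU for axis-aligned bounding boxes. It is plain Python, not a proposed mlpack interface; the box layout (x_min, y_min, x_max, y_max) and the function names are assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(predicted, truth):
    """Average IoU over paired predicted and ground-truth boxes."""
    scores = [box_iou(p, t) for p, t in zip(predicted, truth)]
    return sum(scores) / len(scores)


overlap = box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
print(overlap)  # intersection 1, union 7
```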

    Other interesting metrics would be:

    1. r metric
    2. Top K Accuracy metric
    3. ~RMSE (Already implemented)~
    4. [Maybe, Not really sure about this.] Sparse Top K Accuracy
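To illustrate the Top K Accuracy item, here is a small plain-Python sketch: a prediction counts as correct when the true class is among the k highest-scoring classes. The function name and the toy scores are hypothetical and not part of mlpack.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k best-scored classes.

    `scores` holds one list of per-class scores for each sample."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [2, 0, 1]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

With k=1 this reduces to ordinary accuracy, which is why only the second sample counts as correct in the first call.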

    In case some of these are implemented, please forgive my ignorance. Also, anyone who starts working on them should check the following:

    1. Has it already been implemented?
    2. Is there a PR open for it?
    3. Has it been taken up by someone?

    This is especially necessary for functions like RMSE, r metric. Sorry for increasing workload of members and contributors, I think at least some of them will be nice additions. Thanks.

  • Resolve Comments in Go Bindings(#1492) and Add Markdown Documentation

    Hi @rcurtin, I have tried to resolve some of the comments in PR #1492 and also added Markdown documentation for the Go bindings.

    DONE:

    • [x] Build a fully working Go binding using make go.
    • [x] Configure CMake with cmake ../, which would find Go using FindGo.cmake.
    • [x] Add Markdown Documentation for Go Bindings.
    • [x] Resolve underscores to camelcase
    • [x] Tried to avoid unnecessary copies.
    • [x] Resolve output in arma_util.cpp that was going out of scope.
    • [x] Removing unnecessary inputOptions and outputOptions.
    • [x] Resolve documentation for multiple outputs.
    • [x] Add some getter and setter methods for Umat, Urow and Ucol
    • [x] Add tests for Umat, Urow and Ucol
    • [x] Resolve style issues (lines less than 80 characters) in go_binding_test.go
    • [x] Add vector of strings and int parameter types and add their tests.
    • [x] Add matrix with dataset info parameter type.
  • Algorithm yet to be implemented

    Hi there, I am interested in implementing an algorithm or a feature in mlpack which hasn't been implemented yet. It would be great if you could suggest any :smile:

  • Build scripts for Python bindings are not correct [Windows]

    Issue description

    Attempting to build python bindings on Windows using Visual Studio 2017 fails due to several issues:

    1. When using the flag BUILD_PYTHON_BINDINGS, CMake still shows a warning about not building python bindings, even though the bindings will be generated (not a roadblock).
    2. When the flag BUILD_PYTHON_BINDINGS is ON, the library will be built statically by default. I presume the python bindings require mlpack as a DLL? In that case, -DBUILD_SHARED_LIBS=ON must be enforced.
    3. Line 106 of setup.py refers to an invalid path. E.g. package_dir={ '': 'C:/mlpack/build/src/mlpack/bindings/python/' } This path ends in a slash which is not valid in a python package. What is more, I believe this path should be relative. If so, it should be replaced by: package_dir={ '': '.' },
    4. The linker expects to find mlpack and boost libraries in C:\mlpack\build\lib but this directory doesn't exist as a result of an mlpack build. Therefore, the directory needs to be manually created and populated with the following libraries: boost_serialization.lib, libboost_program_options-vc141-mt-1_65_1.lib, libboost_serialization-vc141-mt-1_65_1.lib, mlpack.dll, mlpack.lib
    5. After fixing issues 1 to 4, build will be successful. However, the resulting python package will fail to import mlpack with the following error: ImportError: cannot import name 'test_python_binding' from 'mlpack.test_python_binding' (C:\mlpack\build\src\mlpack\bindings\python\mlpack\test_python_binding.cp37-win_amd64.pyd)

    Your environment

    • version of mlpack: master branch April 19 (3.0.5)
    • operating system: windows 10 64 bits
    • compiler: MSVC 14.1
    • version of dependencies (Boost/Armadillo): boost 1.65.1, armadillo-9.300.2, OpenBLAS.0.2.14.1
    • any other environment information you think is relevant: miniconda3 (python 3.7.1)

    Steps to reproduce

    1. Clone master branch
    2. Run cmake including the flags: -DBUILD_PYTHON_BINDINGS=ON -DBUILD_SHARED_LIBS=ON
    3. Open solution with Visual Studio 2017 and build

    Expected behavior

    To successfully build the Python bindings and have the egg package work (i.e. be able to import mlpack in Python)

    Actual behavior

    Build failures (when workarounds are applied, the resulting package doesn't work)

  • Fix DBSCAN handling of non-core points

    This handles #3339. @iad-ABDUL-RAOUF, thanks for reporting the issue! If you are willing to review the changes here and see if they make sense (at least the comments for the approach), I would appreciate it. I think I have done it correctly but I may have dropped a small detail.

    The problem is that the existing DBSCAN implementation grows clusters "through" noise/non-core points (defined as points that have fewer than the minimum number of neighbors, minPoints). This is demonstrated by the nice test case that @iad-ABDUL-RAOUF supplied. The fix essentially boils down to allowing clusters to add non-core points, but not allowing two disparate clusters to be connected through a noise point.

    Our DBSCAN implementation strategy differs a good deal from the original algorithm's pseudocode and uses a union-find structure to process points serially. I spent a while considering it, and to the best of my understanding our implementation will give the same result as the original algorithm, although it does look quite different.
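
The idea described in this PR can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not mlpack's C++ code: the function and variable names are hypothetical, the range search is brute force, and a point's neighborhood is assumed to include the point itself. Core points are merged with each other, while a non-core point is attached to at most one cluster and never used to merge two clusters.

```python
from math import dist


def find(parent, i):
    """Return the root of i, shortening the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge the set containing b into the set containing a."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


def dbscan_union_find(points, epsilon, min_points):
    """Label points with cluster ids; -1 marks noise (illustrative sketch)."""
    n = len(points)
    # Brute-force range search; each neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= epsilon]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_points for nb in neighbors]
    parent = list(range(n))
    attached = [False] * n  # non-core points already claimed by a cluster

    for i in range(n):
        if not is_core[i]:
            continue  # non-core points never propagate a cluster
        for j in neighbors[i]:
            if is_core[j]:
                union(parent, i, j)
            elif not attached[j]:
                # A non-core point joins the first cluster that reaches it.
                union(parent, i, j)
                attached[j] = True

    labels = [-1] * n
    label_of_root = {}
    for i in range(n):
        if is_core[i] or attached[i]:
            root = find(parent, i)
            labels[i] = label_of_root.setdefault(root, len(label_of_root))
    return labels


# Two dense groups, one point between them, and one isolated point.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (1.25,),
          (2.2,), (2.3,), (2.4,), (2.5,), (10.0,)]
labels = dbscan_union_find(points, epsilon=1.0, min_points=4)
print(labels)
```

In this example the point at 1.25 is within epsilon of one point in each dense group but has too few neighbors to be a core point, so it joins one group without connecting the two.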

  • Fix R build GitHub action


    I don't think we need to merge this before #3343, but this PR aims to address the issues found in the R build of that PR:

    • The Linux R CMD check build fails because rapidjson is not available. This can be addressed simply by installing libcereal-dev for that job.

    • The URL generated for documentation is invalid if we are using git. Here we change it to https://www.mlpack.org/doc/mlpack-git/ instead of https://www.mlpack.org/doc/mlpack-<next version>/.

  • Check shape and size with respect to issue #2820


    This is with respect to issue #2820. It adds shape and size checks for the following methods and their related methods:

    1. Decision Tree
    2. GMM
    3. NCA
    4. Random Forest

    Please review and provide suggestions.
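
As a rough illustration of what such checks look like, the Python sketch below validates that the number of labels matches the number of points and that every point has the expected dimensionality. The helper names and error messages are hypothetical; they are not the utilities added in this PR, which are written in C++.

```python
def check_same_size(data, labels, caller):
    """Raise ValueError if the number of points and labels differ."""
    if len(data) != len(labels):
        raise ValueError(
            f"{caller}: number of points ({len(data)}) does not match "
            f"number of labels ({len(labels)})"
        )


def check_same_dimensionality(data, expected_dims, caller):
    """Raise ValueError if any point does not have expected_dims values."""
    for index, point in enumerate(data):
        if len(point) != expected_dims:
            raise ValueError(
                f"{caller}: point {index} has {len(point)} dimensions, "
                f"expected {expected_dims}"
            )


# Each inner list is one point; this layout is an assumption of the sketch.
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
labels = [0, 1, 0]
check_same_size(data, labels, "training")
check_same_dimensionality(data, 2, "training")
```

Failing early with a message that names the mismatch is the point of such checks: the caller sees which sizes disagree instead of a later, less specific failure.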

  • Fixing DBSCAN Algorithm with issue #3339


    This implements the concept of border points in order to fix issue #3339. Steps performed:

    1. Forming the clusters with core points.
    2. Adding all border points to clusters.
  • DBSCAN behaviour is different from what is described in the original article.


    Not sure whether I should open a bug issue or a question issue. Before using the DBSCAN implementation provided by mlpack, I inspected the code (on the master branch) to check that it clusters as described in the 1996 article [1]. It seems it does not cluster as DBSCAN should.

    DBSCAN in mlpack: see the mlpack/methods/dbscan/dbscan_impl.hpp file. Using the UnionFind class, it forms clusters of points that can be reached step by step through steps of size epsilon. THEN it checks whether each cluster contains more than minpts elements.

    DBSCAN in the original article: see "Algorithm 1" in this 2017 article [2] published by the same authors, which describes the original DBSCAN in clearer terms. For each point, it looks for the point's neighbors within an epsilon radius. BEFORE processing the next point, it checks whether this neighborhood contains at least minpts elements. IF NOT, the cluster is not propagated from the current point.

    [Image: DBSCAN_algo]

    As a consequence, the original algorithm is more robust to noisy datasets. Calling the current mlpack implementation "DBSCAN" is also misleading for users who expect it to run the official DBSCAN version.
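
The difference between the two behaviours described above can be reproduced with a small Python sketch. It is an illustration based only on the description in this report, not on mlpack's or scikit-learn's code; the function name and the propagate_from_core_only flag are hypothetical, and a neighborhood is assumed to include the point itself.

```python
from collections import Counter, deque
from math import dist


def cluster(points, epsilon, min_points, propagate_from_core_only):
    """Grow clusters breadth-first; -1 marks noise (illustrative sketch)."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= epsilon]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_points for nb in neighbors]
    labels = [-1] * n
    next_label = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        if propagate_from_core_only and not is_core[seed]:
            continue  # stays noise unless a core point claims it later
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            if propagate_from_core_only and not is_core[i]:
                continue  # border point: joins the cluster, does not extend it
            for j in neighbors[i]:
                if labels[j] == -1:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    if not propagate_from_core_only:
        # Size check applied only after the clusters have been formed.
        sizes = Counter(labels)
        labels = [lab if sizes[lab] >= min_points else -1 for lab in labels]
    return labels


# Two dense groups joined only through a single sparse point at 1.25.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (1.25,),
          (2.2,), (2.3,), (2.4,), (2.5,)]
chained = cluster(points, 1.0, 4, propagate_from_core_only=False)
density_based = cluster(points, 1.0, 4, propagate_from_core_only=True)
print(chained)
print(density_based)
```

With the size check applied afterwards, the sparse point links both groups into a single cluster; with the neighborhood check applied before propagating, the two groups stay separate.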

    To fix this issue, I would suggest looking at the scikit-learn implementation (files _dbscan.py and _dbscan_inner.pyx):

    https://github.com/scikit-learn/scikit-learn/blob/dc580a8ef5ee2a8aea80498388690e2213118efd/sklearn/cluster/_dbscan.py
    https://github.com/scikit-learn/scikit-learn/blob/dc580a8ef5ee2a8aea80498388690e2213118efd/sklearn/cluster/_dbscan_inner.pyx

    [1] Ester, Martin, et al. "A density-based algorithm for discovering clusters in large spatial databases with noise." kdd. Vol. 96. No. 34. 1996.
    [2] Schubert, Erich, et al. "DBSCAN revisited, revisited: why and how you should (still) use DBSCAN." ACM Transactions on Database Systems (TODS) 42.3 (2017): 1-21.

  • preprocess_split() takes at least 1 positional argument (0 given)


    Issue description

    I am trying out the random forest code in this quickstart, but I am getting this error.

    Your environment

    • version of mlpack: mlpack 4.0.0.post1
    • operating system: Ubuntu 22.10

    Steps to reproduce

    Install the mlpack python package using python3 install mlpack and run the random forest code in the quickstart mentioned above

    Expected behavior

    The code should have given the intended accuracy

    Actual behavior

    I am getting this error

    Traceback (most recent call last):
      File "/home/chetan/Coding/mlpack/python/doc.py", line 12, in <module>
        output = mlpack.preprocess_split(check_input_matrices=True,input=dataset, input_labels=labels, test_ratio=0.3)
      File "mlpack/preprocess_split.pyx", line 27, in mlpack.preprocess_split.preprocess_split
    TypeError: preprocess_split() takes at least 1 positional argument (0 given)
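
An error of this kind means the call did not supply a parameter that the function requires. One generic way to investigate is to list the parameters that have no default value and compare them with the keywords used in the call. The sketch below does this with the standard inspect module. The function split_stand_in and its parameter names are hypothetical and do not reproduce the real preprocess_split signature; inspect.signature may also fail on compiled extension functions that do not expose a signature.

```python
import inspect


def required_parameters(func):
    """Return the names of parameters that have no default value."""
    required = []
    for name, param in inspect.signature(func).parameters.items():
        is_named = param.kind in (
            inspect.Parameter.POSITIONAL_ONLY,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
        if is_named and param.default is inspect.Parameter.empty:
            required.append(name)
    return required


# Hypothetical stand-in used only for this example; NOT the mlpack binding.
def split_stand_in(dataset_, input_labels=None, test_ratio=0.2):
    return dataset_, input_labels, test_ratio


call_keywords = {"input", "input_labels", "test_ratio"}
missing = [name for name in required_parameters(split_stand_in)
           if name not in call_keywords]
print(missing)
```

If a required name shows up as missing even though the call appears to pass the data, the keyword used in the call may simply differ from the parameter name. Whether that is the cause here is not established by the report.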
    
Edge ML Library - High-performance Compute Library for On-device Machine Learning Inference

Edge ML Library (EMLL) offers optimized basic routines like general matrix multiplications (GEMM) and quantizations, to speed up machine learning (ML) inference on ARM-based devices. EMLL supports fp32, fp16 and int8 data types. EMLL accelerates on-device NMT, ASR and OCR engines of Youdao, Inc.

Jan 7, 2023
A lightweight C++ machine learning library for embedded electronics and robotics.

Fido is a lightweight, highly modular C++ machine learning library for embedded electronics and robotics. Fido is especially suited for robotic

Dec 17, 2022
A C++ standalone library for machine learning

Flashlight: Fast, Flexible Machine Learning in C++ Quickstart | Installation | Documentation Flashlight is a fast, flexible machine learning library w

Jan 8, 2023
Flashlight is a C++ standalone library for machine learning

Flashlight is a fast, flexible machine learning library written entirely in C++ from the Facebook AI Research Speech team and the creators of Torch and Deep Speech.

Jan 8, 2023
ML++ - A library created to revitalize C++ as a machine learning front end

ML++ Machine learning is a vast and exciting discipline, garnering attention from specialists of many fields. Unfortunately, for C++ programmers and

Dec 31, 2022
A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

Dec 31, 2022
Machine Learning Framework for Operating Systems - Brings ML to Linux kernel

Machine Learning Framework for Operating Systems - Brings ML to Linux kernel

Nov 24, 2022
RNNLIB is a recurrent neural network library for sequence learning problems. Forked from Alex Graves's work http://sourceforge.net/projects/rnnl/

Origin The original RNNLIB is hosted at http://sourceforge.net/projects/rnnl while this "fork" is created to repeat results for the online handwriting

Dec 26, 2022
Samsung Washing Machine replacing OS control unit

hacksung Samsung Washing Machine WS1702 replacing OS control unit More info at https://www.hackster.io/roni-bandini/dead-washing-machine-returns-to-li

Dec 19, 2022
Caffe: a fast open framework for deep learning.

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berke

Jan 1, 2023
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

Frog - A Tagger-Lemmatizer-Morphological-Analyzer-Dependency-Parser for Dutch Copyright 2006-2020 Ko van der Sloot, Maarten van Gompel, Antal van den

Dec 14, 2022
C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library

Build Status Travis CI VM: Linux x64: Raspberry Pi 3: Jetson TX2: Backstory I set to build ccv with a minimalism inspiration. That was back in 2010, o

Jan 6, 2023
libsvm - A simple, easy-to-use, efficient library for Support Vector Machines. [BSD-3-Clause]

Libsvm is a simple, easy-to-use, and efficient software for SVM classification and regression. It solves C-SVM classification, nu-SVM classification,

Jan 2, 2023
Open Source Computer Vision Library

OpenCV: Open Source Computer Vision Library Resources Homepage: https://opencv.org Courses: https://opencv.org/courses Docs: https://docs.opencv.org/m

Jan 1, 2023
oneAPI Data Analytics Library (oneDAL)

Intel® oneAPI Data Analytics Library Installation | Documentation | Support | Examples | Samples | How to Contribute Intel® oneAPI Data Analytics Libr

Dec 30, 2022
A C library for product recommendations/suggestions using collaborative filtering (CF)

Recommender A C library for product recommendations/suggestions using collaborative filtering (CF). Recommender analyzes the feedback of some users (i

Dec 29, 2022
An open library of computer vision algorithms

VLFeat -- Vision Lab Features Library Version 0.9.21 The VLFeat open source library implements popular computer vision algorithms specialising in imag

Dec 29, 2022