Flashlight is a C++ standalone library for machine learning

Flashlight: Fast, Flexible Machine Learning in C++


Quickstart | Installation | Documentation


Flashlight is a fast, flexible machine learning library written entirely in C++ from the Facebook AI Research Speech team and the creators of Torch and Deep Speech. Its core features include:

  • Just-in-time kernel compilation with modern C++, via the ArrayFire tensor library.
  • CUDA and CPU backends for GPU and CPU training.
  • An emphasis on efficiency and scale.

Native support in C++ and simple extensibility make Flashlight a powerful research framework that's hackable to its core and enables fast iteration on new experimental setups and algorithms without sacrificing performance. In a single repository, Flashlight provides apps for research across multiple domains, including speech recognition, language modeling, image classification, and object detection.

Project Layout

Flashlight is broken down into a few parts:

  • flashlight/lib contains kernels and standalone utilities for sequence losses, beam search decoding, text processing, and more.
  • flashlight/fl is the core neural network library using the ArrayFire tensor library.
  • flashlight/app are applications of the core library to machine learning across domains.
  • flashlight/ext are extensions on top of Flashlight and ArrayFire that are useful across apps.

Quickstart

First, build and install Flashlight and link it to your own project.

Sequential forms a sequence of Flashlight Modules for chaining computation.

Implementing a simple convnet is easy.
#include <flashlight/fl/flashlight.h>

Sequential model;

model.add(View(af::dim4(IM_DIM, IM_DIM, 1, -1))); // e.g. IM_DIM = 28 for MNIST
model.add(Conv2D(
    1 /* input channels */,
    32 /* output channels */,
    5 /* kernel width */,
    5 /* kernel height */,
    1 /* stride x */,
    1 /* stride y */,
    PaddingMode::SAME /* padding mode */,
    PaddingMode::SAME /* padding mode */));
model.add(ReLU());
model.add(Pool2D(
    2 /* kernel width */,
    2 /* kernel height */,
    2 /* stride x */,
    2 /* stride y */));
model.add(Conv2D(32, 64, 5, 5, 1, 1, PaddingMode::SAME, PaddingMode::SAME));
model.add(ReLU());
model.add(Pool2D(2, 2, 2, 2));
model.add(View(af::dim4(7 * 7 * 64, -1)));
model.add(Linear(7 * 7 * 64, 1024));
model.add(ReLU());
model.add(Dropout(0.5));
model.add(Linear(1024, 10));
model.add(LogSoftmax());

Performing forward and backward computation is straightforward:

auto output = model.forward(input);
auto loss = categoricalCrossEntropy(output, target);
loss.backward();

See the MNIST example for a full tutorial including a training loop and dataset abstractions.

Variable is the base Flashlight tensor that operates on ArrayFire arrays. Tape-based automatic differentiation in Flashlight is simple and works as you'd expect.

Autograd Example
auto A = Variable(af::randu(1000, 1000), true /* calcGrad */);
auto B = 2.0 * A;
auto C = 1.0 + B;
auto D = log(C);
D.backward(); // populates A.grad() along with gradients for B, C, and D.

Building and Installing

Install with vcpkg | With Docker | From Source | From Source with vcpkg | Build Your Project with Flashlight

Requirements

At minimum, compilation requires:

  • A C++ compiler with good C++17 support (e.g. gcc/g++ >= 7)
  • CMake — version 3.10 or later, and make
  • A Linux-based operating system.

See the full dependency list for more details if building from source.

Instructions for building/installing Python bindings can be found here.

Flashlight Build Setups

Flashlight can be broken down into several components as described above. Each component can be incrementally built by specifying the correct build options.

There are two ways to work with Flashlight:

  1. As an installed library that you link to with your own project. This is best for building standalone applications dependent on Flashlight.
  2. With in-source development where the Flashlight project source is changed and rebuilt. This is best if customizing/hacking the core framework or the Flashlight-provided app binaries.

Flashlight can be built in one of two ways:

  1. With vcpkg, a C++ package manager.
  2. From source by installing dependencies as needed.

Installing Flashlight with vcpkg

Library Installation with vcpkg

Flashlight is most easily built and installed with vcpkg. Both the CUDA and CPU backends are supported with vcpkg. For either backend, first install Intel MKL. For the CUDA backend, install CUDA >= 9.2, cuDNN, and NCCL. Then, after installing vcpkg, install the libraries and core with:

./vcpkg/vcpkg install flashlight-cuda # CUDA backend, OR
./vcpkg/vcpkg install flashlight-cpu  # CPU backend

To install Flashlight apps, check the features available for installation by running ./vcpkg search flashlight-cuda or ./vcpkg search flashlight-cpu. Each app is a "feature": for example, ./vcpkg install flashlight-cuda[asr] installs the ASR app with the CUDA backend.

Below is the currently-supported list of features (for each of flashlight-cuda and flashlight-cpu):

flashlight-{cuda/cpu}[lib]      # Flashlight libraries
flashlight-{cuda/cpu}[nn]       # Flashlight neural net library
flashlight-{cuda/cpu}[asr]      # Flashlight speech recognition app
flashlight-{cuda/cpu}[lm]       # Flashlight language modeling app
flashlight-{cuda/cpu}[imgclass] # Flashlight image classification app

Flashlight app binaries are also built for the selected features and are installed into the vcpkg install tree's tools directory.

Integrating Flashlight into your own project is simple using vcpkg's CMake toolchain integration.

From-Source Build with vcpkg

First, install the dependencies for your backend of choice using vcpkg (click to expand the below):

Installing CUDA Backend Dependencies with vcpkg

To build the Flashlight CUDA backend from source using dependencies installed with vcpkg, install CUDA >= 9.2, cuDNN, NCCL, and Intel MKL, then build the rest of the dependencies for the CUDA backend based on which Flashlight features you'd like to build:

./vcpkg install \
    cuda intel-mkl fftw3 cub kenlm                \ # if building flashlight libraries
    arrayfire[cuda] cudnn nccl openmpi cereal stb \ # if building the flashlight neural net library
    gflags glog                                   \ # if building any flashlight apps
    libsndfile                                    \ # if building the flashlight asr app
    gtest                                           # optional, if building tests
Installing CPU Backend Dependencies with vcpkg

To build the Flashlight CPU backend from source using dependencies installed with vcpkg, install Intel MKL, then build the rest of the dependencies for the CPU backend based on which Flashlight features you'd like to build:

./vcpkg install \
    intel-mkl fftw3 kenlm                              \ # for flashlight libraries
    arrayfire[cpu] gloo[mpi] openmpi onednn cereal stb \ # for the flashlight neural net library
    gflags glog                                        \ # for any flashlight apps
    libsndfile                                         \ # for the flashlight asr app
    gtest                                                # optional, for tests
Build Using the vcpkg Toolchain File

To build Flashlight from source with these dependencies, clone the repository:

git clone https://github.com/flashlight/flashlight.git && cd flashlight
mkdir -p build && cd build

Then, build from source using vcpkg's CMake toolchain:

cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DFL_BACKEND=CUDA \
    -DCMAKE_TOOLCHAIN_FILE=[path to your vcpkg clone]/scripts/buildsystems/vcpkg.cmake
make -j$(nproc)
make install -j$(nproc) # only if you want to install Flashlight for external use

To build a subset of Flashlight's features, see the build options below.

Building from Source

To build from source, first install the below dependencies. Most are available with your system's local package manager.

Some dependencies marked below are downloaded and installed automatically if not found on the local system. FL_BUILD_STANDALONE determines this behavior — if disabled, dependencies won't be downloaded and built when building Flashlight.

Once all dependencies are installed, clone the repository:

git clone https://github.com/flashlight/flashlight.git && cd flashlight
mkdir -p build && cd build

Then build all Flashlight components with:

cmake .. -DCMAKE_BUILD_TYPE=Release -DFL_BACKEND=[backend] [...build options]
make -j$(nproc)
make install

Setting the MKLROOT environment variable (export MKLROOT=/opt/intel/oneapi/mkl/latest or export MKLROOT=/opt/intel/mkl on most Linux-based systems) can help CMake find Intel MKL if not initially found.

To build a smaller subset of Flashlight features/apps, see the build options below for a complete list of options.

To install Flashlight in a custom directory, use CMake's CMAKE_INSTALL_PREFIX argument. Flashlight libraries can be built as shared libraries using CMake's BUILD_SHARED_LIBS argument.

Flashlight uses modern CMake and IMPORTED targets for most dependencies. If a dependency isn't found, passing -D<package>_DIR to your cmake command or exporting <package>_DIR as an environment variable equal to the path to <package>Config.cmake can help locate dependencies on your system. See the documentation for more details. If CMake is failing to locate a package, check to see if a corresponding issue has already been created before creating your own.

Dependencies

Dependencies marked with * are automatically downloaded and built from source if not found on the system. Setting FL_BUILD_STANDALONE to OFF disables this behavior.

Dependencies marked with ^ are required if building with distributed training enabled (FL_BUILD_DISTRIBUTED — see the build options below). Distributed training is required for all apps.

Dependencies marked with † are installable via vcpkg. See the instructions above for installing those dependencies when doing a Flashlight from-source build.

Component     Backend  Dependencies
libraries     CUDA     CUDA >= 9.2, CUB*† (if CUDA < 11)
              CPU      a BLAS library (Intel MKL >= 2018, OpenBLAS†, etc.)
core          Any      ArrayFire >= 3.7.3†, an MPI library^ (OpenMPI†, etc.), cereal*† >= 1.3.0, stb*†
              CUDA     CUDA >= 9.2, NCCL^, cuDNN
              CPU      oneDNN† >= 2.0, gloo (with MPI)*^†
app: all      Any      Google Glog†, Gflags
app: asr      Any      libsndfile*† >= 10.0.28, a BLAS library (Intel MKL >= 2018, OpenBLAS†, etc.)
app: imgclass Any      -
app: objdet   Any      -
app: lm       Any      -
tests         Any      Google Test (gtest, with gmock)*† >= 1.10.0

Build Options

The Flashlight CMake build accepts the following build options (prefixed with -D when running CMake from the command line):

Name Options Default Value Description
FL_BACKEND CUDA, CPU, OPENCL CUDA Backend with which to build all components.
FL_BUILD_STANDALONE ON, OFF ON Downloads/builds some dependencies if not found.
FL_BUILD_LIBRARIES ON, OFF ON Build the Flashlight libraries.
FL_BUILD_CORE ON, OFF ON Build the Flashlight neural net library.
FL_BUILD_DISTRIBUTED ON, OFF ON Build with distributed training; required for apps.
FL_BUILD_CONTRIB ON, OFF ON Build contrib APIs subject to breaking changes.
FL_BUILD_ALL_APPS ON, OFF OFF Defines default value for every app (see below).
FL_BUILD_APP_ASR ON, OFF FL_BUILD_ALL_APPS Build the automatic speech recognition app.
FL_BUILD_APP_IMGCLASS ON, OFF FL_BUILD_ALL_APPS Build the image classification app.
FL_BUILD_APP_OBJDET ON, OFF FL_BUILD_ALL_APPS Build the object detection app.
FL_BUILD_APP_LM ON, OFF FL_BUILD_ALL_APPS Build the language modeling app.
FL_BUILD_APP_ASR_TOOLS ON, OFF FL_BUILD_APP_ASR Build automatic speech recognition app tools.
FL_BUILD_TESTS ON, OFF ON Build tests.
FL_BUILD_EXAMPLES ON, OFF ON Build examples.
FL_BUILD_EXPERIMENTAL ON, OFF OFF Build experimental components.
CMAKE_BUILD_TYPE See docs. Debug See the CMake documentation.
CMAKE_INSTALL_PREFIX [Directory] See docs. See the CMake documentation.
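
As an example of combining these options, the following configure line (a sketch; adjust the flags to your needs) would build only the ASR app and the components it requires with the CUDA backend, skipping tests and examples:

```shell
cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DFL_BACKEND=CUDA \
    -DFL_BUILD_ALL_APPS=OFF \
    -DFL_BUILD_APP_ASR=ON \
    -DFL_BUILD_TESTS=OFF \
    -DFL_BUILD_EXAMPLES=OFF
```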

Building Your Own Project with Flashlight

Flashlight is most easily linked against using CMake. Flashlight exports the following CMake targets when installed:

  • flashlight::fl-libraries — contains flashlight libraries headers and symbols.
  • flashlight::flashlight — contains flashlight libraries as well as the flashlight core autograd and neural network library.
  • flashlight::flashlight-app-asr — contains the automatic speech recognition app along with the flashlight core and flashlight libraries.
  • flashlight::flashlight-app-imgclass — contains the image classification app along with the flashlight core and flashlight libraries.
  • flashlight::flashlight-app-objdet — contains the object detection app along with the flashlight core and flashlight libraries.
  • flashlight::flashlight-app-lm — contains the language modeling app along with the flashlight core and flashlight libraries.

Given a simple project.cpp file that includes and links to Flashlight:

#include <iostream>

#include <arrayfire.h>
#include <flashlight/fl/flashlight.h>

int main() {
  fl::Variable v(af::constant(1, 1), true);
  auto result = v + 10;
  std::cout << "Hello World!" << std::endl;
  af::print("Array value is ", result.array()); // 11.000
  return 0;
}

The following CMake configuration links Flashlight and sets include directories:

cmake_minimum_required(VERSION 3.10)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

add_executable(myProject project.cpp)

find_package(flashlight CONFIG REQUIRED)
target_link_libraries(myProject PRIVATE flashlight::flashlight)

With a vcpkg Flashlight Installation

If you installed Flashlight with vcpkg, the above CMake configuration for myProject can be built by running:

cd project && mkdir build && cd build
cmake .. \
  -DCMAKE_TOOLCHAIN_FILE=[path to vcpkg clone]/scripts/buildsystems/vcpkg.cmake \
  -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

With a From-Source Flashlight Installation

If using a from-source installation of Flashlight, Flashlight will be found automatically by CMake:

cd project && mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

If Flashlight is installed in a custom location using a CMAKE_INSTALL_PREFIX, passing -Dflashlight_DIR=[install prefix]/share/flashlight/cmake as an argument to your cmake command can help CMake find Flashlight.

Building and Running Flashlight with Docker

Flashlight and its dependencies can also be built with the provided Dockerfiles — see the accompanying Docker documentation for more information.

Contributing and Contact

Contact: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Flashlight is being very actively developed. See CONTRIBUTING for more on how to help out.

Acknowledgments

Some of Flashlight's code is derived from arrayfire-ml.

License

Flashlight is under a BSD license. See LICENSE for more information.

Comments
  • Build errors for CPU backend

    While trying to build for the CPU backend, I get a list of build errors starting with these:

    [ 61%] Linking CXX executable MemoryFrameworkTest
    CMakeFiles/MemoryFrameworkTest.dir/memory/MemoryFrameworkTest.cpp.o: In function `(anonymous namespace)::MockTestMemoryManager::alloc(bool, unsigned int, long long*, unsigned int)':
    MemoryFrameworkTest.cpp:(.text+0x5ca): undefined reference to `testing::internal::UntypedFunctionMockerBase::SetOwnerAndName(void const*, char const*)'
    MemoryFrameworkTest.cpp:(.text+0x5eb): undefined reference to `testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith(void*)'
    CMakeFiles/MemoryFrameworkTest.dir/memory/MemoryFrameworkTest.cpp.o: In function `(anonymous namespace)::MockTestMemoryManager::allocated(void*)':
    MemoryFrameworkTest.cpp:(.text+0x6dc): undefined reference to `testing::internal::UntypedFunctionMockerBase::SetOwnerAndName(void const*, char const*)'
    MemoryFrameworkTest.cpp:(.text+0x6ee): undefined reference to `testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith(void*)'
    CMakeFiles/MemoryFrameworkTest.dir/memory/MemoryFrameworkTest.cpp.o: In function `(anonymous namespace)::MockTestMemoryManager::jitTreeExceedsMemoryPressure(unsigned long)':
    MemoryFrameworkTest.cpp:(.text+0x7dc): undefined reference to `testing::internal::UntypedFunctionMockerBase::SetOwnerAndName(void const*, char const*)'
    MemoryFrameworkTest.cpp:(.text+0x7ee): undefined reference to `testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith(void*)'
    CMakeFiles/MemoryFrameworkTest.dir/memory/MemoryFrameworkTest.cpp.o: In function `(anonymous namespace)::MockTestMemoryManager::getMemoryPressure()':
    MemoryFrameworkTest.cpp:(.text+0x8d8): undefined reference to `testing::internal::UntypedFunctionMockerBase::SetOwnerAndName(void const*, char const*)'
    MemoryFrameworkTest.cpp:(.text+0x8e5): undefined reference to `testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith(void*)'

  • [gh-actions] Action for running lightweight oneDNN benchmarks

    See title — action stub/starting point. Builds on Ubuntu 20.04.

    We'll ship this lib to downstream projects pulling in libflashlight on install.

    Test Plan: CI

  • NameError: name 'CriterionType' is not defined

    This is the error I get:

    NameError                                 Traceback (most recent call last)
    /tmp/ipykernel_18106/1529090364.py in <module>
         31 
         32 upstream = 'wav2vec2_hug_base_960'
    ---> 33 runner = Runner(args, config)
    
    ~/Desktop/ASR/s3prl/s3prl/downstream/runner.py in __init__(self, args, config)
         50         self.upstream = self._get_upstream()
         51         self.featurizer = self._get_featurizer()
    ---> 52         self.downstream = self._get_downstream()
         53         self.all_entries = [self.upstream, self.featurizer, self.downstream]
         54 
    
    ~/Desktop/ASR/s3prl/s3prl/downstream/runner.py in _get_downstream(self)
        131             upstream_rate = self.featurizer.model.downsample_rate,
        132             **self.config,
    --> 133             **vars(self.args)
        134         ).to(self.args.device)
        135 
    
    ~/Desktop/ASR/s3prl/s3prl/downstream/
    

    I reviewed this issue but it did not help me: https://github.com/flashlight/flashlight/issues/416#issue-783305697

    1. when I run import flashlight --> it does not raise any error
    2. when I run from flashlight.lib.text.decoder import CriterionType --> it raises: ModuleNotFoundError: No module named 'flashlight.lib.text'
    3. I can successfully run the examples in flashlight without any error, so the command python flashlight/bindings/python/example/criterion_example.py does NOT raise any error
    4. And I have kenlm in this address /usr/local/share/kenlm

    @jacobkahn I tried to address the questions @tlikhomanenko had asked in that issue in order to make it easier.

    I really really appreciate it if you could give me some hints where should I check.

  • [sfx] time stretch

    Add time stretch using sox's algorithm but without libsox threading issues. The algorithm is copied almost verbatim to make review simpler. Changes to the original code were applied only where required.

  • getting no error while cmake, but it stuck on make

    cmake completes without errors, but it gets stuck on make. Please help.

    cmake .. -DCMAKE_BUILD_TYPE=Release -DFLASHLIGHT_BACKEND=CPU

    -- ArrayFire found (include: /opt/arrayfire/include, library: ArrayFire::afcuda)
    -- Could NOT find cereal (missing: cereal_INCLUDE_DIRS)
    -- cereal NOT found. Will download from source
    -- Checking for [mkl_gf_lp64 - mkl_gnu_thread - mkl_core - iomp5 - pthread - m]
    -- Library mkl_gf_lp64: /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so
    -- Library mkl_gnu_thread: /opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so
    -- Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
    -- Library iomp5: not found
    -- Checking for [mkl_gf_lp64 - mkl_intel_thread - mkl_core - iomp5 - pthread - m]
    -- Library mkl_gf_lp64: /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so
    -- Library mkl_intel_thread: /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
    -- Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
    -- Library iomp5: not found
    -- Checking for [mkl_gf - mkl_gnu_thread - mkl_core - iomp5 - pthread - m]
    -- Library mkl_gf: not found
    -- Checking for [mkl_gf - mkl_intel_thread - mkl_core - iomp5 - pthread - m]
    -- Library mkl_gf: not found
    -- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - iomp5 - pthread - m]
    -- Library mkl_intel_lp64: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so
    -- Library mkl_gnu_thread: /opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so
    -- Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
    -- Library iomp5: not found
    -- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - iomp5 - pthread - m]
    -- Library mkl_intel_lp64: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so
    -- Library mkl_intel_thread: /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
    -- Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
    -- Library iomp5: not found
    -- Checking for [mkl_intel - mkl_gnu_thread - mkl_core - iomp5 - pthread - m]
    -- Library mkl_intel: not found
    -- Checking for [mkl_intel - mkl_intel_thread - mkl_core - iomp5 - pthread - m]
    -- Library mkl_intel: not found
    -- Checking for [mkl_gf_lp64 - mkl_gnu_thread - mkl_core - pthread - m]
    -- Library mkl_gf_lp64: /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so
    -- Library mkl_gnu_thread: /opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so
    -- Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
    -- Library pthread: /usr/lib/x86_64-linux-gnu/libpthread.so
    -- Library m: /usr/lib/x86_64-linux-gnu/libm.so
    -- Checking for [mkl_gf_lp64 - mkl_intel_thread - mkl_core - pthread - m]
    -- Library mkl_gf_lp64: /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so
    -- Library mkl_intel_thread: /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
    -- Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
    -- Library pthread: /usr/lib/x86_64-linux-gnu/libpthread.so
    -- Library m: /usr/lib/x86_64-linux-gnu/libm.so
    -- Checking for [mkl_gf - mkl_gnu_thread - mkl_core - pthread - m]
    -- Library mkl_gf: not found
    -- Checking for [mkl_gf - mkl_intel_thread - mkl_core - pthread - m]
    -- Library mkl_gf: not found
    -- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - pthread - m]
    -- Library mkl_intel_lp64: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so
    -- Library mkl_gnu_thread: /opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so
    -- Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
    -- Library pthread: /usr/lib/x86_64-linux-gnu/libpthread.so
    -- Library m: /usr/lib/x86_64-linux-gnu/libm.so
    -- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - pthread - m]
    -- Library mkl_intel_lp64: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so
    -- Library mkl_intel_thread: /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
    -- Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
    -- Library pthread: /usr/lib/x86_64-linux-gnu/libpthread.so
    -- Library m: /usr/lib/x86_64-linux-gnu/libm.so
    -- Checking for [mkl_intel - mkl_gnu_thread - mkl_core - pthread - m]
    -- Library mkl_intel: not found
    -- Checking for [mkl_intel - mkl_intel_thread - mkl_core - pthread - m]
    -- Library mkl_intel: not found
    -- Checking for [mkl_gf_lp64 - mkl_sequential - mkl_core - m]
    -- Library mkl_gf_lp64: /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so
    -- Library mkl_sequential: /opt/intel/mkl/lib/intel64/libmkl_sequential.so
    -- Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
    -- Library m: /usr/lib/x86_64-linux-gnu/libm.so
    -- MKL found
    -- A library with BLAS API found.
    -- MKLDNN headers found in /usr/local/include
    -- Using MKLDNN library found in /usr/local/lib/libmkldnn.so
    -- Using MKL with MKL-DNN
    -- MKLDNN found
    -- Will build flashlight contrib assets.
    -- Gloo found
    -- NCCL not found
    -- MPI_CXX found
    -- MPI_CXX compile flags: -pthread
    -- MPI_CXX include path: /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include/usr/lib/x86_64-linux-gnu/openmpi/include
    -- MPI_CXX LINK flags path: -pthread
    -- MPI_CXX libraries: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
    -- MPI_C found
    -- MPI_C compile flags: -pthread
    -- MPI_C include path: /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include/usr/lib/x86_64-linux-gnu/openmpi/include
    -- MPI_C LINK flags path: -pthread
    -- MPI_C libraries: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
    -- gtest found: (include: /usr/include, lib: /usr/lib/libgtest.a;/usr/lib/libgtest_main.a
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/test/Desktop/asr/wav2latter/flashlight/build

    make -j 4

    [ 14%] Linking CXX executable DatasetUtilsTest
    [ 14%] Building CXX object tests/CMakeFiles/ContribSerializationTest.dir/__/flashlight/contrib/modules/Transformer.cpp.o
    [ 14%] Built target DatasetUtilsTest
    [ 14%] Linking CXX executable ContribSerializationTest
    [ 14%] Built target ContribSerializationTest
    Makefile:140: recipe for target 'all' failed
    make: *** [all] Error 2

  • Conformer Module

    Hi, do you have an implementation of the Conformer Block in Flashlight you can share?

    These are my rough notes for what conformer seems to be: https://gist.github.com/lunixbochs/4d8a8c0ab9be45469337b82363bc2105

    This is roughly what I have so far, based on TDSBlock: https://gist.github.com/lunixbochs/207eff6e78b29e26712cee6fca42c400

    Here's my janky conformer W2L arch based on TDS:

    V -1 NFEAT 1 0
    SAUG 80 27 2 10 0.05 2
    PD 0 5 3
    C2 1 15 10 1 2 1 0 0
    R
    DO 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    CONFORMER 15 144 4 80 32 20 20 0.1
    V 1200 -1 1 0
    L 1200 NLABEL
    V NLABEL 0 -1 1
    
    CONFORMER is:
        return std::make_shared<w2l::ConformerBlock>(
            channels, encoderDim, attentionHeads, width, kernelSize,
            ffnInnerDim, mhsaInnerDim, dropout, rPad, lNormIncludeTime);
    

    Whitepaper is here: https://arxiv.org/pdf/2005.08100.pdf I believe this is a tensorflow implementation: https://github.com/TensorSpeech/TensorFlowASR/blob/main/tensorflow_asr/models/conformer.py#L209

    I haven't worked with Keras enough to get a sense for the correct activation shapes through this, my dimensions are definitely not quite right, and I'm kind of guessing on the precise architecture. I'm happy to fiddle with this more but if you have any advice for major things I may be doing wrong that would be appreciated. Right now I know for sure my shape going into the transformer is wrong, but I don't know how exactly. And I probably did the depth-wise convolution wrong.

  • Update warpctc library

    Original PR: [!215]

    Summary

    Brings the latest updates of warpctc and uses the warpctc CMakeLists.txt submitted in this PR. It works, but I think it will take time to be approved.

  • Add compiling support for CUDA v11 and cuDNN v8

    Original Issue: [#198] & [#213] & [#147]

    Summary

    • I updated the CMakeLists.txt of the third-party library [warp-ctc](https://github.com/baidu-research/warp-ctc). The latest CMakeLists.txt handles conditional compilation for the compute_30 architecture depending on the CUDA version (CUDA v11 deprecates this architecture).
    • The same is done for the main CMakeLists.txt file.
    • The deprecation policy of CUDA has changed. Therefore, the condition on cudnnSetRNNDescriptor used here needs to also cover versions up to 8000 of cuDNN.
    • Finally, with CUDA v11, NVIDIA cub is included in the CUDA toolkit, but because flashlight downloads and uses its own version, it collides with the Thrust version. Hence, THRUST_IGNORE_CUB_VERSION_CHECK must be defined if using CUDA v11.
  • [Error when importing Flashlight] ImportError: libfl-libraries.so.0: undefined symbol: _ZN2lm5ngram6ConfigC1Ev

    Question

    I want to fine-tune Wav2vec with my own data. I got error when running this command:

    python3.7 fairseq/train.py './labelled_manifest' --save-dir './model_finetuning_wav2vec' --wer-args '("./labelled_manifest/lm.bin","./labelled_manifest/lexicon.txt",2,-1)' --post-process letter --valid-subset valid --no-epoch-checkpoints --best-checkpoint-metric wer --num-workers 128 --max-update 400000 --sentence-avg --task audio_pretraining --arch wav2vec_ctc --w2v-path './w2v2_pre_train_model/wav2vec_small.pt' --labels ltr --apply-mask --mask-selection static --mask-other 0 --mask-length 10 --mask-prob 0.5 --layerdrop 0.1 --mask-channel-selection static --mask-channel-other 0 --mask-channel-length 64 --mask-channel-prob 0.5 --zero-infinity --feature-grad-mult 0.0 --freeze-finetune-updates 10000 --validate-after-updates 10000 --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-08 --lr 2e-05 --lr-scheduler tri_stage --warmup-steps 8000 --hold-steps 32000 --decay-steps 40000 --final-lr-scale 0.05 --final-dropout 0.0 --dropout 0.0 --activation-dropout 0.1 --criterion ctc --attention-dropout 0.0 --max-tokens 1280000 --seed 2337 --log-format json --log-interval 500 --ddp-backend no_c10d

    The error is the following: NameError: name 'CriterionType' is not defined

    I have successfully installed flashlight according to these steps.

    I know that an issue has been opened with same error #468 but it doesn't fix my problem.

    Additional Context

    I'm on Ubuntu 18.04, Python 3.7.4, CUDA 11.2.

  • Augment dataset as we train, augment datasets on disk, and use library to add sound effects

    Summary

    commit 942fc5d1805d13265d6488d8e8b235031716a798 (HEAD -> noise5, gilad/noise5) Author: Gilad Avidov [email protected] Date: Tue Sep 22 01:42:45 2020 -0700

    [sfx] add real-time dataset augmentation to Train.cpp
    
    The input dataset is augmented within the inner workings of the dataset class hierarchy, which reads sound files, transforms them to the frequency domain using an STFT, and extracts features using a filterbank. The augmentation is applied in the time domain, before the frequency-domain transformation. Train.cpp reads datasets for training and validation; a newly added set of flags supports augmenting both datasets.
    
    Here is an example of a training command that applies sound effects to the training dataset. The sound effects are:
    Reverberation, with a random number of echoes, a random absorption coefficient, and a random distance to the reflective objects.
    Additive noise, with 2 random noise clips applied at a random SNR chosen in the range 10..25.
    Normalization of the augmented sound, to ensure the signal is still in a valid range after applying the sound effects.
    The sound effects are applied on the GPU backend.
    
    Example config file: sfx_config_noise_reverb_norm.json
    
    {
        "soundEffectChain": [
            {
                "type_": "AdditiveNoise",
                "noiseConfig_": {
                    "maxTimeRatio_": 1.0,
                    "minSnr_": 10.0,
                    "maxSnr_": 30.0,
                    "nClipsPerUtteranceMin_": 4,
                    "nClipsPerUtteranceMax_": 10,
                    "listFilePath_": "/private/home/wesbz/experiments/noise-unbalanced-16kHz-mono-train.lst",
                    "randomNoiseWithReplacement_": true,
                    "randomSeed_": 1234
                }
            },
            {
                "type_": "Reverberation",
                "reverbConfig_": {
                    "absorptionCoefficientMin_": 0.1,
                    "absorptionCoefficientMax_": 0.5,
                    "distanceToWallInMetersMin_": 1.0,
                    "distanceToWallInMetersMax_": 10.0,
                    "numEchosMin_": 0,
                    "numEchosMax_": 10,
                    "jitter_": 0.1,
                    "randomSeed_": 1234567
                }
            },
            {
                "type_": "Normalize"
            }
        ]
    }
    
    Set the flag --sfx_config_filename to point to the config file
    
    Train train --flagsfile=${FLAGS_FILE}  --sfx_config_filename=sfx_config_noise_reverb_norm.json
    

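    The additive-noise effect above boils down to scaling each noise clip so that the signal-to-noise power ratio matches the sampled SNR. A minimal illustrative sketch of that math (Python for clarity; `mix_at_snr` is a hypothetical helper, not Flashlight's C++ implementation):

    ```python
    import numpy as np

    def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
        """Scale `noise` so that 10*log10(P_signal / P_noise) == snr_db, then add it."""
        signal_power = np.mean(signal ** 2)
        noise_power = np.mean(noise ** 2)
        # Noise power needed for the target SNR: P_s / P_n' = 10^(snr_db / 10)
        target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
        return signal + noise * np.sqrt(target_noise_power / noise_power)
    ```

    Reverberation and normalization then compose with this in a chain, each effect transforming the time-domain samples before the STFT.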
    commit bf6e9b285d3acd194339961476e0fbfea20b86d8 Author: Gilad Avidov [email protected] Date: Tue Sep 22 01:01:12 2020 -0700

    [sfx] add dataset augmentation tool.
    
    ApplySoundEffect is an executable for creating datasets by augmenting existing ones. It applies a sound effect chain to sounds from list files. ApplySoundEffect runs as follows:
    - reads a list of list files
    - reads the sound files from these list files
    - augments the sound files
    - saves the augmented files with the same path, but prefixed on a different root dir
    - adds the augmented files to a consolidated list file
    
    Example for creating new dataset called librispeech-clean-other-noise1-snr-10-20-clips-3 by augmenting librispeech dev-clean and dev-other with noise from a dataset at /dataset/noise1.lst.
    
    Example config file: sfx_config_noise.json
    {
        "soundEffectChain": [
            {
                "type_": "AdditiveNoise",
                "noiseConfig_": {
                    "maxTimeRatio_": 1.0,
                    "minSnr_": 10.0,
                    "maxSnr_": 20.0,
                    "nClipsPerUtteranceMin_": 0,
                    "nClipsPerUtteranceMax_": 3,
                    "listFilePath_": "/private/home/wesbz/experiments/noise-unbalanced-16kHz-mono-train.lst",
                    "randomNoiseWithReplacement_": true,
                    "randomSeed_": 1234
                }
            }
        ]
    }
    
    Set the flag --sfx_config_filename to point to the config file
    
    ApplySoundEffect \
      --input_rootdir=/dataset/librispeech/ \
      --input_listfiles=dev-clean.lst,dev-other.lst \
      --output_rootdir=/home/$USER/ \
      --output_listfile="librispeech-clean-other-noise1-snr-10-20-clips-3.lst" \
      --sfx_config_filename=sfx_config_noise.json
    

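    The same-path-under-a-new-root behavior described above can be sketched as follows (`remap_path` is a hypothetical helper for illustration, not the tool's actual code):

    ```python
    from pathlib import PurePosixPath

    def remap_path(input_path: str, input_rootdir: str, output_rootdir: str) -> str:
        """Keep a file's path relative to its input root, but re-root it under output_rootdir."""
        rel = PurePosixPath(input_path).relative_to(input_rootdir)
        return str(PurePosixPath(output_rootdir) / rel)
    ```

    With the example flags above, a file under /dataset/librispeech/ would keep its relative layout under /home/$USER/.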
    commit 956e12ff24216af9dd82db58998a4b872188850b Author: Gilad Avidov [email protected] Date: Sun Sep 20 23:20:13 2020 -0700

    [sfx] experimental cpu/gpu sound effect augmentation library
    
    Add an experimental sound effect library for high-performance speech augmentation (supporting both CPU and GPU).
    
    Currently implemented effects:
     - additive noise
     - reverberation
     - amplification
     - normalization
    
    Sound effect objects are instantiated, configured, and then chained.
    The sound effect chain is applied to each input, one sound effect at a time.
    Thread-safe.
    
            auto soundEffectChain = std::make_shared<SoundEffectChain>();
    
            // Add reverberation sound effect.
            soundEffectChain->add(std::make_shared<Reverberation>(
                std::make_shared<Reverberation::ReverbEchoRirGenerator>(reverbConf)));
    
            // Add additive noise sound effect.
            soundEffectChain->add(std::make_shared<AdditiveNoise>(
               std::make_shared<AdditiveNoise::RandomNoiseGenerator>(noiseConf)));
    
            // Add sound normalization (scale down when out of range) sound effect.
            soundEffectChain->add(std::make_shared<Normalize>());
    
            // Apply the sound effect chain on sound data.
            std::function<void(std::vector<float>*)> augment = soundEffectChain->asStdFunction();
            augment(&sound);
    
    Add the library at fl::app::asr::sfx namespace and
    app/asr/experimental/soundeffect directory.
    
    Quip: https://fb.quip.com/UobCAPwM6e0t
    

    Test Plan (required)

    [steps by which you tested that your fix resolves the issue. These might include specific commands and configurations]

  • Why does long audio get poor performance?

    Why does long audio get poor performance?

    Question

    I found that decoding a long audio file gives much worse results than splitting it into multiple parts and decoding each part. I use LibriSpeech as the dataset and test with decode_transformer_s2s_ngram.cfg, e.g.

    LibriSpeech/test-clean/1995/1836/1995-1836-0004.flac is a 33.91-second audio clip; its ground-truth text is

    as she awaited her guests she surveyed the table with both satisfaction and disquietude for her social functions were few tonight there were she checked them off on her fingers sir james creighton the rich english manufacturer and lady creighton mister and missus vanderpool mister harry cresswell and his sister john taylor and his sister and mister charles smith whom the evening papers mentioned as likely to be united states senator from new jersey a selection of guests that had been determined unknown to the hostess by the meeting of cotton interests earlier in the day
    

    The decode result is

    as she awaited her guest she surveyed the table with both satisfaction and disquietude for her social functions were few to night there were she checked them off on her fingers sir james clinton the rich english manufacturer and lady horton mister and missus vanderpole mister harry cresswell and his sister john taylor and his sister and mister charles smith whom the evening papers mentioned as likely to be united states senator from new jersey a selection of guests that had been determined unknown to the hostess and mister charles smith whom the evening papers mentioned as likely to be united states senator from
    

    Note that the ending part "evening papers mentioned as likely to be united states senator from" is far from the ground truth "determined unknown to the hostess by the meeting of cotton interests earlier in the day".

    But if I break this long audio into two parts as follows

    // first part ground truth
    as she awaited her guests she surveyed the table with both satisfaction and disquietude for her social functions were few tonight there were she checked them off on her fingers sir james creighton the rich english manufacturer and lady creighton mister and missus vanderpool
    
    //latter part ground truth
    mister harry cresswell and his sister john taylor and his sister and mister charles smith whom the evening papers mentioned as likely to be united states senator from new jersey a selection of guests that had been determined unknown to the hostess by the meeting of cotton interests earlier in the day
    

    The decoding results are better, as follows

    //first part decode
    as she awaited her guest she surveyed the table with both satisfaction and disquietude for her social functions were few to night there were she checked them off on her fingers sir james clinton the rich english manufacturer and lady horton mister and missus vanderpole
    
    //latter part decode
    mister harry creswell and his sister john taylor and his sister and mister charles smith whom the evening papers mentioned as likely to be united states senator from new jersey a selection of guests that had been determined unknown to the hostess by the meeting of cotton interests earlier in the day
    

    My results:

    |                    | WER   | LER   |
    |--------------------|-------|-------|
    | Decode whole audio | 22.9% | 15.3% |
    | Decode first part  | 15.1% | 8.9%  |
    | Decode latter part | 15.5% | 10.1% |

    Is this behavior OK for decoding?
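    For context, the WER/LER numbers above are edit-distance rates between hypothesis and reference, at word and letter granularity respectively. A minimal sketch of the metric (illustrative only, not Flashlight's scorer):

    ```python
    def edit_distance(ref, hyp):
        """Levenshtein distance between two token sequences (one-row DP)."""
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            cur = [i]
            for j, h in enumerate(hyp, 1):
                # deletion, insertion, substitution (0 cost on match)
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
            prev = cur
        return prev[-1]

    def wer(ref_text: str, hyp_text: str) -> float:
        """Word error rate: edits normalized by reference length."""
        ref = ref_text.split()
        return edit_distance(ref, hyp_text.split()) / len(ref)
    ```

    LER is the same computation over characters instead of words.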

  • Bump pillow from 9.0.1 to 9.3.0 in /flashlight/app/objdet/scripts

    Bump pillow from 9.0.1 to 9.3.0 in /flashlight/app/objdet/scripts

    Bumps pillow from 9.0.1 to 9.3.0.

    Release notes

    Sourced from pillow's releases.

    9.3.0

    https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html

    Changes

    ... (truncated)

    Changelog

    Sourced from pillow's changelog.

    9.3.0 (2022-10-29)

    • Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]

    • Initialize libtiff buffer when saving #6699 [radarhere]

    • Inline fname2char to fix memory leak #6329 [nulano]

    • Fix memory leaks related to text features #6330 [nulano]

    • Use double quotes for version check on old CPython on Windows #6695 [hugovk]

    • Remove backup implementation of Round for Windows platforms #6693 [cgohlke]

    • Fixed set_variation_by_name offset #6445 [radarhere]

    • Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]

    • Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]

    • Added ExifTags enums #6630 [radarhere]

    • Do not modify previous frame when calculating delta in PNG #6683 [radarhere]

    • Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]

    • Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]

    • Added GPS TIFF tag info #6661 [radarhere]

    • Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]

    • Do not attempt normalization if mode is already normal #6644 [radarhere]

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

  • Swap to CMake FetchContent and cleanup modules

    Swap to CMake FetchContent and cleanup modules

    See title. CMake FetchContent solves a bunch of install-related issues and makes reasoning about configurable downstream dependencies easier.

    Test plan: CI + local builds without the required dependencies and with FL_BUILD_STANDALONE set to ON

  • CMake Hygiene post-lib removal

    CMake Hygiene post-lib removal

    Numerous improvements to Flashlight's CMake config including:

    • append to CMAKE_MODULE_PATH rather than setting to avoid clobbering downstream projects' path
    • use modern CUDA components from CMake > 3.15
    • remove FL_BUILD_CORE since flashlight/lib is gone and the core always builds by default
    • remove a bunch of useless shims on source paths
    • improvements to distributed build configuration
    • modern usage of CMake FindMPI

    Test plan: CI + build with:

    cmake .. -DFL_USE_ARRAYFIRE=ON -DFL_ARRAYFIRE_USE_CUDA=ON -DFL_USE_ONEDNN=OFF -DFL_BUILD_TESTS=ON -DFL_BUILD_EXAMPLES=OFF -DArrayFire_DIR=/checkpoint/jacobkahn/usr/share/ArrayFire/cmake -DFL_BUILD_DISTRIBUTED=ON -DBUILD_SHARED_LIBS=ON -DFL_BUILD_PKG_SPEECH=ON -DFL_BUILD_PKG_RUNTIME=ON
    
  • python3 setup.py install fails because FFTW3 doesn't support installation via CMake

    python3 setup.py install fails because FFTW3 doesn't support installation via CMake

    Bug Description

    When creating Python bindings for flashlight, we run python setup.py install in the directory bindings/python. The script setup.py starts a new sub-process: https://github.com/flashlight/flashlight/blob/2261fd9e4ff9f6155ddd40eed19f45c2dd72939a/bindings/python/setup.py#L99-L104 It runs CMake to build essential components. The problem is that FFTW3 doesn't support installation via CMake, according to the reply in the issue:

    However, as the documentation says to use configure rather than CMake, the file FFTW3LibraryDepends.cmake may be missing on most installations. Is there any plan to make CMake the default compilation toolchain? In any case, it would be worth supporting FFTW detection through CMake's find_package.

    and the official documentation

    Reproduction Steps

    • cd ./repos/flashlight/bindings/python/
    • Run command python ./setup.py install
    • A trace of the error
    running bdist_egg
    running egg_info
    writing flashlight.egg-info/PKG-INFO
    writing dependency_links to flashlight.egg-info/dependency_links.txt
    writing namespace_packages to flashlight.egg-info/namespace_packages.txt
    writing top-level names to flashlight.egg-info/top_level.txt
    reading manifest file 'flashlight.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'flashlight.egg-info/SOURCES.txt'
    installing library code to build/bdist.linux-x86_64/egg
    running install_lib
    running build_py
    running build_ext
    -- -rdynamic supported.
    -- CUDA found (library: /usr/local/cuda-11.1/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/x86_64-linux-gnu/librt.so include: /usr/local/cuda-11.1/include)
    -- OpenCL found (library: /usr/lib/x86_64-linux-gnu/libOpenCL.so include: /usr/include)
    -- CUDA found (library: /usr/local/cuda-11.1/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/x86_64-linux-gnu/librt.so include: /usr/local/cuda-11.1/include)
    -- CUDA architecture flags: -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80
    -- MKL_THREADING = OMP
    -- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl]
    --   Library mkl_intel_lp64: /usr/lib/x86_64-linux-gnu/libmkl_intel_lp64.so
    --   Library mkl_gnu_thread: /usr/lib/x86_64-linux-gnu/libmkl_gnu_thread.so
    --   Library mkl_core: /usr/lib/x86_64-linux-gnu/libmkl_core.so
    --   Library gomp: -fopenmp
    --   Library pthread: /usr/lib/x86_64-linux-gnu/libpthread.so
    --   Library m: /usr/lib/x86_64-linux-gnu/libm.so
    --   Library dl: /usr/lib/x86_64-linux-gnu/libdl.so
    -- MKL library found
    -- CBLAS found (include: /usr/include/mkl, library: /usr/lib/x86_64-linux-gnu/libmkl_intel_lp64.so;/usr/lib/x86_64-linux-gnu/libmkl_gnu_thread.so;/usr/lib/x86_64-linux-gnu/libmkl_core.so;-fopenmp;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libdl.so)
    CMake Error at /usr/local/lib/cmake/fftw3/FFTW3Config.cmake:14 (include):
      include could not find load file:
    
        /usr/local/lib/cmake/fftw3/FFTW3LibraryDepends.cmake
    Call Stack (most recent call first):
      cmake/FindFFTW3.cmake:22 (find_package)
      flashlight/lib/audio/feature/CMakeLists.txt:17 (find_package)
      flashlight/lib/audio/CMakeLists.txt:16 (include)
      flashlight/lib/CMakeLists.txt:17 (include)
      CMakeLists.txt:118 (include)
    
    
    -- FFTW found
    -- Configuring incomplete, errors occurred!
    See also "~/repos/flashlight/bindings/python/build/temp.linux-x86_64-cpython-38/CMakeFiles/CMakeOutput.log".
    See also "~/repos/flashlight/bindings/python/build/temp.linux-x86_64-cpython-38/CMakeFiles/CMakeError.log".
    Traceback (most recent call last):
      File "./setup.py", line 113, in <module>
        setup(
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/__init__.py", line 87, in setup
        return distutils.core.setup(**attrs)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
        return run_commands(dist)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
        dist.run_commands()
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
        self.run_command(cmd)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/dist.py", line 1217, in run_command
        super().run_command(command)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
        cmd_obj.run()
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/command/install.py", line 74, in run
        self.do_egg_install()
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/command/install.py", line 123, in do_egg_install
        self.run_command('bdist_egg')
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
        self.distribution.run_command(command)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/dist.py", line 1217, in run_command
        super().run_command(command)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
        cmd_obj.run()
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 165, in run
        cmd = self.call_command('install_lib', warn_dir=0)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 151, in call_command
        self.run_command(cmdname)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
        self.distribution.run_command(command)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/dist.py", line 1217, in run_command
        super().run_command(command)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
        cmd_obj.run()
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
        self.build()
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/command/install_lib.py", line 112, in build
        self.run_command('build_ext')
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
        self.distribution.run_command(command)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/dist.py", line 1217, in run_command
        super().run_command(command)
      File "~/miniconda3/envs/lrs/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
        cmd_obj.run()
      File "./setup.py", line 55, in run
        self.build_extensions(ext)
      File "./setup.py", line 99, in build_extensions
        subprocess.check_call(
      File "~/miniconda3/envs/lrs/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '~/repos/flashlight', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=~/repos/flashlight/bindings/python/build/lib.linux-x86_64-cpython-38', '-DPYTHON_EXECUTABLE=~/miniconda3/envs/lrs/bin/python3', '-DFL_BUILD_STANDALONE=OFF', '-DBUILD_SHARED_LIBS=ON', '-DFL_BUILD_CORE=OFF', '-DFL_BUILD_ALL_LIBS=ON', '-DFL_BUILD_EXAMPLES=OFF', '-DFL_BUILD_TESTS=OFF', '-DFL_LIBRARIES_BUILD_FOR_PYTHON=ON', '-DFL_LIBRARIES_USE_MKL=ON', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
    

    Platform and Hardware

    • Linux avsu-ESC8000-G4 5.15.0-52-generic 58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    • GeForce RTX 3090

    Additional Context

    No additional information

    Solution

    As we can see from the error message:

    CMake Error at /usr/local/lib/cmake/fftw3/FFTW3Config.cmake:14 (include):
      include could not find load file:
    
        /usr/local/lib/cmake/fftw3/FFTW3LibraryDepends.cmake
    Call Stack (most recent call first):
      cmake/FindFFTW3.cmake:22 (find_package)
      flashlight/lib/audio/feature/CMakeLists.txt:17 (find_package)
      flashlight/lib/audio/CMakeLists.txt:16 (include)
      flashlight/lib/CMakeLists.txt:17 (include)
      CMakeLists.txt:118 (include)
    
    
    -- FFTW found
    -- Configuring incomplete, errors occurred!
    

    CMake could not find the file FFTW3LibraryDepends.cmake due to the inherent FFTW packaging issue mentioned before. However, we did install FFTW3 via ./configure && make && sudo make install. Since the log shows -- FFTW found, we can safely ignore the error message and continue compiling the Python bindings.

    Solution: Change subprocess.check_call to subprocess.call to ignore CMake errors.
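    A sketch of the idea behind that change (hypothetical wrapper for illustration, not the actual setup.py patch): subprocess.check_call raises CalledProcessError on any nonzero exit, while subprocess.call just returns the exit code.

    ```python
    import subprocess
    import sys

    def run_step(cmd):
        """Like subprocess.check_call, but tolerate a nonzero exit instead of raising,
        so a benign CMake error (e.g. the missing FFTW3LibraryDepends.cmake) doesn't
        abort the whole build."""
        ret = subprocess.call(cmd)
        if ret != 0:
            print(f"warning: {cmd[0]} exited with status {ret}; continuing", file=sys.stderr)
        return ret
    ```

    Note this tolerates all CMake failures, not just this one, so a stricter workaround would inspect the error output before continuing.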

  • Implement indexed update

    Implement indexed update

    Depends on #1052

    Summary

    Support indexed update while retaining the maximal graph and preserving computation semantics. This requires a new IndexUpdateNode, because the "indexed update" semantics cannot be composed from existing nodes.

    Example:

    Tensor t = ...;
    Tensor add = t + t;
    Tensor t0 = t(0);
    t(0, 0) = 42;
    // 1. both t and t0 are affected (if one prints them out, update is reflected)
    // 2. `add` isn't affected (old graph remains intact -- graph computation semantics is immutable)
    // 3. doesn't matter which tensor we use to trigger evaluation (lazy evaluation remains transparent to user)
    
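    The three semantics listed in the comments mirror what eager array libraries already provide; a NumPy analogy of points 1 and 2 (an illustration of the intended semantics, not the JIT implementation):

    ```python
    import numpy as np

    t = np.zeros((2, 2))
    add = t + t    # a materialized result: detached from future updates to t
    t0 = t[0]      # a view: shares storage with t
    t[0, 0] = 42

    assert t0[0] == 42       # 1. views reflect the indexed update
    assert add[0, 0] == 0    # 2. previously computed results are unaffected
    ```

    The JIT version has to reproduce this behavior under lazy evaluation, which is why the update gets its own node rather than rewriting the existing graph.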

    Test Plan (required)

    New unit tests are added

    make JitNodeTest
    make JitEvaluatorTest
    make JitTensorTest
    
  • Valgrind Reporting Memory Leaks in Flashlight Libraries.

    Valgrind Reporting Memory Leaks in Flashlight Libraries.

    Bug Description

    Running inference under the Valgrind tool reports memory leaks in the Flashlight and ArrayFire libraries.

    ==209== 35,580,804 (472 direct, 35,580,332 indirect) bytes in 1 blocks are definitely lost in loss record 4,810 of 4,817 ==209== at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==209== by 0x932E53: std::_Hashtable<int, std::pair<int const, std::vector<fl::lib::text::LexiconDecoderState, std::allocatorfl::lib::text::LexiconDecoderState > >, std::allocator<std::pair<int const, std::vector<fl::lib::text::LexiconDecoderState, std::allocatorfl::lib::text::LexiconDecoderState > > >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_rehash(unsigned long, unsigned long const&) ) ==209== by 0x932FA7: std::_Hashtable<int, std::pair<int const, std::vector<fl::lib::text::LexiconDecoderState, std::allocatorfl::lib::text::LexiconDecoderState > >, std::allocator<std::pair<int const, std::vector<fl::lib::text::LexiconDecoderState, std::allocatorfl::lib::text::LexiconDecoderState > > >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<int const, std::vector<fl::lib::text::LexiconDecoderState, std::allocatorfl::lib::text::LexiconDecoderState > >, false>, unsigned long) ) ==209== by 0x93018D: fl::lib::text::LexiconDecoder::decodeStep(float const, int, int) ) ==209== by 0x5BFE8C: fl::lib::text::Decoder::decode(float const*, int, int) (Decoder.h:54) ==209== by 0xE837608: start_thread (pthread_create.c:477) ==209== by 0x1B495132: clone (clone.S:95)

    ==4513== 269,528 (752 direct, 268,776 indirect) bytes in 2 blocks are definitely lost in loss record 4,446 of 4,519 ==4513== at 0x4C3217F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==4513== by 0x17E8E4F59: cudnnCreate (in /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5) ==4513== by 0x7F5A32: fl::getCudnnHandle() ==4513== by 0x86D85A: fl::batchnorm(fl::Variable const&, fl::Variable const&, fl::Variable const&, fl::Variable&, fl::Variable&, std::vector<int, std::allocator > const&, bool, double, double) ) ==4513== by 0x81FF31: fl::LayerNorm::forward(fl::Variable const&) ) ==4513== by 0x8278BD: fl::UnaryModule::forward(std::vector<fl::Variable, std::allocatorfl::Variable > const&) ) ==4513== by 0x858796: fl::ext::forwardSequentialModuleWithPadMask(fl::Variable const&, std::shared_ptrfl::Module, af::array const&) ) ==4513== by 0xEF866DA: start_thread (pthread_create.c:463) ==4513== by 0x1D57961E: clone (clone.S:95)

    ==209== 25,952 (416 direct, 25,536 indirect) bytes in 4 blocks are definitely lost in loss record 4,642 of 4,817 ==209== at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==209== by 0x935C73: std::_Hashtable<int, std::pair<int const, std::shared_ptrfl::lib::text::TrieNode >, std::allocator<std::pair<int const, std::shared_ptrfl::lib::text::TrieNode > >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_rehash(unsigned long, unsigned long const&) ) ==209== by 0x935E6A: std::__detail::_Map_base<int, std::pair<int const, std::shared_ptrfl::lib::text::TrieNode >, std::allocator<std::pair<int const, std::shared_ptrfl::lib::text::TrieNode > >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](int const&) ) ==209== by 0x9356B9: fl::lib::text::Trie::insert(std::vector<int, std::allocator > const&, int, float) ) ==209== by 0x91835A: fl::app::asr::buildTrie(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool, std::shared_ptrfl::lib::text::LM, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, fl::lib::text::Dictionary const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::vector<std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, std::allocator<std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > >, 
std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::vector<std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, std::allocator<std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > > > > > const&, fl::lib::text::Dictionary const&, int, int) ) ==209== by 0xE837608: start_thread (pthread_create.c:477) ==209== by 0x1B495132: clone (clone.S:95)

    ==4513== 13,880 (1,680 direct, 12,200 indirect) bytes in 42 blocks are definitely lost in loss record 4,317 of 4,519 ==4513== at 0x4C3217F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==4513== by 0x807896: fl::CachingMemoryManager::alloc(bool, unsigned int, long long*, unsigned int) ) ==4513== by 0x80278C: fl::MemoryManagerInstaller::MemoryManagerInstaller(std::shared_ptrfl::MemoryManagerAdapter)::{lambda(void*, void**, int, unsigned int, long long*, unsigned int)#3}::_FUN(void*, void**, int, unsigned int, long long*, unsigned int) ) ==4513== by 0xB19AAA22: MemoryManagerFunctionWrapper::alloc(bool, unsigned int, long long*, unsigned int) (in /opt/arrayfire/lib/libafcuda.so.3.7.3) ==4513== by 0xB1229295: std::unique_ptr<float [], std::function<void (float*)> > cuda::memAlloc(unsigned long const&) (in /opt/arrayfire/lib/libafcuda.so.3.7.3) ==4513== by 0xB0CD3F6C: cuda::Array::Array(af::dim4 const&) (in /opt/arrayfire/lib/libafcuda.so.3.7.3) ==4513== by 0xB0CD4038: cuda::Array cuda::createEmptyArray(af::dim4 const&) (in /opt/arrayfire/lib/libafcuda.so.3.7.3) ==4513== by 0xB17DD2F0: createHandle(af::dim4 const&, af_dtype) (in /opt/arrayfire/lib/libafcuda.so.3.7.3) ==4513== by 0xB17E0D1D: af_create_handle (in /opt/arrayfire/lib/libafcuda.so.3.7.3) ==4513== by 0x7A49B5B: af_create_handle (in /opt/arrayfire/lib/libaf.so.3.7.3) ==4513== by 0x7B600BD: (anonymous namespace)::initEmptyArray(af_dtype, long long, long long, long long, long long) (in /opt/arrayfire/lib/libaf.so.3.7.3) ==4513== by 0x7B61424: af::array::array(af::dim4 const&, af_dtype) (in /opt/arrayfire/lib/libaf.so.3.7.3

    Reproduction Steps

    Use the Valgrind tool with the inference application. Example: valgrind --log-file=/tmp/valgrind_$( date +"%Y_%m_%d-%H_%M_%S" ) --leak-check=full ./bin/inference.exe

    Platform and Hardware

    Intel processor, NVIDIA GPU, Ubuntu 18.04 LTS

    Additional Context

    Flashlight Commit: Version 0.31 Tag


Origin The original RNNLIB is hosted at http://sourceforge.net/projects/rnnl while this "fork" is created to repeat results for the online handwriting

Nov 10, 2022
Samsung Washing Machine replacing OS control unit

hacksung Samsung Washing Machine WS1702 replacing OS control unit More info at https://www.hackster.io/roni-bandini/dead-washing-machine-returns-to-li

Sep 24, 2022
Caffe: a fast open framework for deep learning.

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berke

Nov 18, 2022
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

Frog - A Tagger-Lemmatizer-Morphological-Analyzer-Dependency-Parser for Dutch Copyright 2006-2020 Ko van der Sloot, Maarten van Gompel, Antal van den

Aug 24, 2022
C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library

Build Status Travis CI VM: Linux x64: Raspberry Pi 3: Jetson TX2: Backstory I set to build ccv with a minimalism inspiration. That was back in 2010, o

Nov 23, 2022
libsvm websitelibsvm - A simple, easy-to-use, efficient library for Support Vector Machines. [BSD-3-Clause] website

Libsvm is a simple, easy-to-use, and efficient software for SVM classification and regression. It solves C-SVM classification, nu-SVM classification,

Nov 24, 2022
Open Source Computer Vision Library

OpenCV: Open Source Computer Vision Library Resources Homepage: https://opencv.org Courses: https://opencv.org/courses Docs: https://docs.opencv.org/m

Nov 19, 2022
oneAPI Data Analytics Library (oneDAL)
oneAPI Data Analytics Library (oneDAL)

Intel® oneAPI Data Analytics Library Installation | Documentation | Support | Examples | Samples | How to Contribute Intel® oneAPI Data Analytics Libr

Nov 16, 2022
A C library for product recommendations/suggestions using collaborative filtering (CF)

Recommender A C library for product recommendations/suggestions using collaborative filtering (CF). Recommender analyzes the feedback of some users (i

Nov 17, 2022
An open library of computer vision algorithms

VLFeat -- Vision Lab Features Library Version 0.9.21 The VLFeat open source library implements popular computer vision algorithms specialising in imag

Nov 15, 2022