Facebook AI Research's Automatic Speech Recognition Toolkit

wav2letter++


Important Note:

wav2letter has been moved and consolidated into Flashlight in the ASR application.

Future wav2letter development will occur in Flashlight.

To build the old, pre-consolidation version of wav2letter, check out the wav2letter v0.2 release, which depends on the old Flashlight v0.2 release. The wav2letter-lua project can be found on the wav2letter-lua branch, accordingly.

For more information on wav2letter++, see or cite this arXiv paper.

Recipes

This repository includes recipes to reproduce the following research papers as well as pre-trained models:

Data preparation steps for training and evaluation can be found in the data directory.

Building the Recipes

First, install Flashlight with the ASR application. Then, after cloning the project source:

mkdir build && cd build
cmake .. && make -j8

If Flashlight or ArrayFire is installed in a nonstandard path via a custom CMAKE_INSTALL_PREFIX, they can be located by passing

-Dflashlight_DIR=[PREFIX]/usr/share/flashlight/cmake/ -DArrayFire_DIR=[PREFIX]/usr/share/ArrayFire/cmake

when running cmake.

Join the wav2letter community: https://gitter.im/wav2letter/community

License

wav2letter++ is BSD-licensed, as found in the LICENSE file.

Comments
  • Error: ArrayFire Exception (Device out of memory:101) when TDS CTC is decoded

    I used --iter=100000 instead of 10000000 to train the TDS CTC model. At decoding time, I got:

    ...
    Unable to allocate memory with native alloc for size 41943040 bytes with error 'ArrayFire Exception (Device out of memory:101):
    ArrayFire error: 
    In function fl::MemoryManagerInstaller::MemoryManagerInstaller(std::shared_ptr<fl::MemoryManagerAdapter>)::<lambda(size_t)>
    In file /root/flashlight/flashlight/memory/MemoryManagerInstaller.cpp:178'terminate called after throwing an instance of 'af::exception'
      what():  ArrayFire Exception (Unknown error:999):
    
    In function T* af::array::device() const [with T = void]
    In file src/api/cpp/array.cpp:1024
    *** Aborted at 1592624340 (unix time) try "date -d @1592624340" if you are using GNU date ***
    PC: @     0x7f077ba1ae97 gsignal
    *** SIGABRT (@0x5a) received by PID 90 (TID 0x7f07c12ea380) from PID 90; stack trace: ***
        @     0x7f07b9600890 (unknown)
        @     0x7f077ba1ae97 gsignal
        @     0x7f077ba1c801 abort
        @     0x7f077c40f957 (unknown)
        @     0x7f077c415ab6 (unknown)
        @     0x7f077c415af1 std::terminate()
        @     0x7f077c415d24 __cxa_throw
        @     0x7f079c66a728 af::array::device<>()
        @     0x55793ed1bc4c fl::DevicePtr::DevicePtr()
        @     0x55793ed9823d fl::conv2d()
        @     0x55793ed83c0f fl::AsymmetricConv1D::forward()
        @     0x55793ed57cfe fl::UnaryModule::forward()
        @     0x55793ed699b5 fl::WeightNorm::forward()
        @     0x55793ed8b2a1 fl::Residual::forward()
        @     0x55793ed8b3fd fl::Residual::forward()
        @     0x55793ed4369a fl::Sequential::forward()
        @     0x55793ec62a4a _ZZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEENKUlRKSt6vectorIiSaIiEES8_iiE_clES8_S8_ii
        @     0x55793ec63866 _ZNSt17_Function_handlerIFSt6vectorIS0_IfSaIfEESaIS2_EERKS0_IiSaIiEES8_iiEZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEEUlS8_S8_iiE_E9_M_invokeERKSt9_Any_dataS8_S8_OiSK_
        @     0x55793ebc6ba0 w2l::ConvLM::scoreWithLmIdx()
        @     0x55793ebc7264 w2l::ConvLM::score()
        @     0x55793ea7a9f6 main
        @     0x7f077b9fdb97 __libc_start_main
        @     0x55793ead6d2a _start
    Aborted (core dumped)
    

    Any ideas on how to deal with this? Thank you!

  • PSA: Realtime audio frontend demo for macOS

    This is a working-out-of-the-box demo of realtime speech recognition on macOS with wav2letter++.

    This is based on my C API in https://github.com/facebookresearch/wav2letter/issues/326. There's a src dir in the w2l_cli tarball with the frontend source (w2l_cli.cpp) and scripts/instructions for building all of this from scratch.

    To install:

    wget https://talonvoice.com/research/w2l_cli.tar.gz
    tar -xf w2l_cli.tar.gz && rm w2l_cli.tar.gz
    cd w2l_cli
    wget https://talonvoice.com/research/epoch186-ls3_14.tar.gz
    tar -xf epoch186-ls3_14.tar.gz && rm epoch186-ls3_14.tar.gz
    

    To run: ./bin/w2l emit epoch186-ls3_14/model.bin epoch186-ls3_14/tokens.txt

    Then speak, and you should see emissions (letter predictions) in the terminal output, for example:

    $ ./bin/w2l emit epoch186-ls3_14/model.bin epoch186-ls3_14/tokens.txt 
    helow|world
    this|is|a|test|of|wave|to|leter
    

    Language model decoding is also wired up via ./bin/w2l decode am tokens lm lexicon, but as per #326 it segfaults right now when setting up the Trie.

    There are more pretrained English acoustic models at https://talonvoice.com/research/ that you can try as well.

  • Re: Error: ArrayFire Exception (Device out of memory:101)

    From https://github.com/facebookresearch/wav2letter/blob/master/recipes/models/sota/2019/librispeech/README.md, I ran ResNet CTC training:

    [email protected]:~# wav2letter/build/Train train --flagsfile wav2letter/recipes/models/sota/2019/librispeech/train_am_resnet_ctc.cfg --minloglevel=0 --logtostderr=1
    

    and got:

    ...
    Falling back to using letters as targets for the unknown word: bui
    Falling back to using letters as targets for the unknown word: d'avrigny
    Falling back to using letters as targets for the unknown word: d'avrigny
    Falling back to using letters as targets for the unknown word: d'avrigny
    Falling back to using letters as targets for the unknown word: wildering
    Falling back to using letters as targets for the unknown word: quinci
    Falling back to using letters as targets for the unknown word: impara
    I0607 21:13:24.300433   299 W2lListFilesDataset.cpp:141] 2703 files found. 
    I0607 21:13:24.300465   299 Utils.cpp:102] Filtered 2703/2703 samples
    I0607 21:13:24.300472   299 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 0
    I0607 21:13:24.489915   299 W2lListFilesDataset.cpp:141] 2864 files found. 
    I0607 21:13:24.489969   299 Utils.cpp:102] Filtered 4/2864 samples
    I0607 21:13:24.490177   299 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 715
    I0607 21:13:24.497853   299 Train.cpp:566] Shuffling trainset
    I0607 21:13:24.502732   299 Train.cpp:573] Epoch 1 started!
    Unable to allocate memory with native alloc for size 50331648 bytes with error 'ArrayFire Exception (Device out of memory:101):
    ArrayFire error: 
    In function fl::MemoryManagerInstaller::MemoryManagerInstaller(std::shared_ptr<fl::MemoryManagerAdapter>)::<lambda(size_t)>
    In file /root/flashlight/flashlight/memory/MemoryManagerInstaller.cpp:178'terminate called after throwing an instance of 'af::exception'
      what():  ArrayFire Exception (Unknown error:999):
    
    In function T* af::array::device() const [with T = void]
    In file src/api/cpp/array.cpp:1024
    *** Aborted at 1591564406 (unix time) try "date -d @1591564406" if you are using GNU date ***
    PC: @     0x7f07ea4b5e97 gsignal
    *** SIGABRT (@0x12b) received by PID 299 (TID 0x7f082fd85380) from PID 299; stack trace: ***
        @     0x7f082809b890 (unknown)
        @     0x7f07ea4b5e97 gsignal
        @     0x7f07ea4b7801 abort
        @     0x7f07eaeaa957 (unknown)
        @     0x7f07eaeb0ab6 (unknown)
        @     0x7f07eaeb0af1 std::terminate()
        @     0x7f07eaeb0d24 __cxa_throw
        @     0x7f080b105728 af::array::device<>()
        @     0x55d52ae5390c fl::DevicePtr::DevicePtr()
        @     0x55d52aecdd6d fl::conv2d()
        @     0x55d52ae7ce31 fl::Conv2D::forward()
        @     0x55d52ae8e7de fl::UnaryModule::forward()
        @     0x55d52aec17e1 fl::Residual::forward()
        @     0x55d52aec193d fl::Residual::forward()
        @     0x55d52ae7bd7a fl::Sequential::forward()
        @     0x55d52abbeaf3 _ZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEES_INS0_19FirstOrderOptimizerEES9_ddblE3_clES2_S5_S7_S9_S9_ddbl
        @     0x55d52ab595f8 main
        @     0x7f07ea498b97 __libc_start_main
        @     0x55d52abb936a _start
    Aborted (core dumped)
    

    I noticed that @kfmn, @vineelpratap, @sanjaykasturia, @jacobkahn, and others have discussed similar issues in https://github.com/facebookresearch/wav2letter/issues/133 and https://github.com/facebookresearch/wav2letter/issues/248. May I ask whether there is a way to deal with this? For example, are there any wav2letter models smaller than this one (other than E2E Speech Recognition on Librispeech-Clean Dataset, https://github.com/facebookresearch/wav2letter/tree/master/tutorials/1-librispeech_clean) that might be good for transcribing telephony audio? BTW, I just have 1 GPU, and:

    [image: GPU details]

    Thank you!

  • The error: ModuleNotFoundError: No module named 'sox'

    From

    E2E Speech Recognition on Librispeech-Clean Dataset https://github.com/facebookresearch/wav2letter/tree/master/tutorials/1-librispeech_clean

    I did:

    $ pip install sox
    Collecting sox
      Using cached https://files.pythonhosted.org/packages/60/a0/5bee540554af8376e0313e462629d95bf2f2bc3c8cb60697aa01254e6cf5/sox-1.3.7-py2.py3-none-any.whl
    Installing collected packages: sox
    Successfully installed sox-1.3.7
    

    Then:

    $ python3.7 wav2letter/tutorials/1-librispeech_clean/prepare_data.py --src $W2LDIR/LibriSpeech/ --dst $W2LDIR
    Traceback (most recent call last):
      File "wav2letter/tutorials/1-librispeech_clean/prepare_data.py", line 30, in <module>
        import sox
    ModuleNotFoundError: No module named 'sox'
    Error in sys.excepthook:
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
        from apport.fileutils import likely_packaged, get_recent_crashes
      File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
        from apport.report import Report
      File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
        import apport.fileutils
      File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
        from apport.packaging_impl import impl as packaging
      File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
        import apt
      File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
        import apt_pkg
    ModuleNotFoundError: No module named 'apt_pkg'
    
    Original exception was:
    Traceback (most recent call last):
      File "wav2letter/tutorials/1-librispeech_clean/prepare_data.py", line 30, in <module>
        import sox
    ModuleNotFoundError: No module named 'sox'
    

    If I run it without python3.7 in the command, I get:

    $ wav2letter/tutorials/1-librispeech_clean/prepare_data.py --src $W2LDIR/LibriSpeech/ --dst $W2LDIR
    bash: wav2letter/tutorials/1-librispeech_clean/prepare_data.py: Permission denied
    

    If I do exactly what the tutorial says, I get:

    $ wav2letter/tutorials/librispeech_clean/prepare_data.py --src $W2LDIR/LibriSpeech/ --dst $W2LDIR
    bash: wav2letter/tutorials/librispeech_clean/prepare_data.py: No such file or directory
    

    May I ask how to resolve the above issue?
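    A common cause of this pattern (an assumption about this setup, not specific to wav2letter) is that the `pip` on PATH installs into a different interpreter than `python3.7`; running `python3.7 -m pip install sox` installs into the one that will run the script. A quick diagnostic sketch:

```python
import importlib.util
import sys

def module_available(name):
    """Return True if the current interpreter can import `name`."""
    return importlib.util.find_spec(name) is not None

print(sys.executable)            # which interpreter is actually running
print(module_available("sox"))   # False if pip installed sox elsewhere
```

    If this prints False under python3.7 even though pip reported success, the two point at different Python installations.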

    I assume that the dependencies listed in Building/Running with Docker (https://github.com/facebookresearch/wav2letter/wiki/Building-Running-with-Docker) are all we need.

    In addition, might a model like this be good for transcribing telephony audio? Thank you!

  • Fine tune CTC model

    I want to adapt a wav2letter model to my training dataset. I am following the Colab notebook FineTuneCTC.ipynb, which uses

    ./flashlight/build/bin/asr/fl_asr_tutorial_finetune_ctc model.bin

    I have a few questions around this approach:

    1. My dataset is also English but has different pronunciations for certain words; e.g., carpark is pronounced "capak". There are still speakers in my dataset who use more American English and will pronounce it carpark. I want the ASR system to be able to pick up on both variations.
    2. There are also new words in my dataset, all based on the same token set [a-z] that the acoustic model was trained on.

    So given 1 and 2, does it make sense for me to use FineTuneCTC.ipynb directly, or are there additional steps for me to do here?
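    For (1), one option worth checking (an assumption based on the lexicon format used in these recipes, not an official recommendation) is to give carpark two spellings in the lexicon, so the decoder can map either realization to the same word; the lexicon format allows multiple lines per word. A sketch of generating such entries:

```python
def lexicon_entries(word, spellings):
    """Build lexicon lines in the 'word <tab> t o k e n s |' format used by
    the recipes here: one line per alternative spelling, tokens separated
    by spaces and terminated with the word-boundary token '|'."""
    return [word + "\t" + " ".join(s) + " |" for s in spellings]

# 'carpark' with its standard spelling plus the 'capak' pronunciation variant:
for line in lexicon_entries("carpark", ["carpark", "capak"]):
    print(line)
```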

    Is this the best model to use in my case, or should I use the SOTA 2019 model?

    Thanks!

  • cannot find flashlight::distributed

    I get the following message while trying to build wav2letter. As far as I can tell, the Flashlight build went just fine, and it was happily building all sorts of objects for distributed training. Can you help explain what this means?

    -- Found gflags (include: /usr/include, library: /usr/lib64/libgflags.so)
    -- GFLAGS found
    -- OpenMP found
    -- flashlight found (include: lib: flashlight::flashlight )
    CMake Error at CMakeLists.txt:103 (message):
      flashlight must be build in distributed mode for wav2letter++

  • how to support userwords function?

    Many ASR engines support a userwords function. For example, the original ASR result for sample.wav is "中美数控"; but if you use a userwords.txt that contains "中美速控", the final ASR result is "中美速控" instead of "中美数控". It acts like an ASR result corrector. I think I should do something with the LM's word frequencies to get this behavior, but I don't know where to add it. I'm using KenLM. samples.zip
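    One way to get the "result corrector" behavior without touching the LM is post-processing the decoded text (a sketch under the assumption that the input is already word-segmented, which Chinese output would need first) by fuzzy-matching decoded words against the userwords list:

```python
import difflib

def apply_userwords(text, userwords, cutoff=0.6):
    """Replace each decoded word with the closest userword when the
    similarity ratio clears the cutoff; otherwise keep it unchanged."""
    corrected = []
    for word in text.split():
        match = difflib.get_close_matches(word, userwords, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

print(apply_userwords("中美数控", ["中美速控"]))  # → 中美速控
```

    Boosting userwords inside the beam-search LM scores would be more principled but requires decoder changes; this sketch only rewrites the final text.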

  • Wrong Character Prediction at the Start of Transcription during Decoding

    I trained a wav2letter model successfully on the 100-hour subset of LibriSpeech and decoded using the decoder.cfg file below:

    --lexicon=/hdd1/nfshare_hulk/wav2vec/libri_exp/am/lexicon.txt
    --lm=/hdd1/nfshare_hulk/wav2vec/libri_exp/lm/4-gram.binary
    --am=/hdd1/nfshare_hulk/wav2vec/wav2vec_exp/librispeech_clean_asg_12Conv_prelu_lowreg_512kernel_trainlogs/001_model_#hdd1#nfshare_hulk#wav2vec#libri_exp#lists#test.trans.h5.tsv.bin
    --test=/hdd1/nfshare_hulk/wav2vec/libri_exp/lists/test.trans.h5.tsv
    --sclite=/hdd1/nfshare_hulk/wav2vec/wav2vec_exp/librispeech_clean_asg_12Conv_prelu_lowreg_512kernel_trainlogs/sclite
    --lmweight=2
    --wordscore=0.5
    --beamsize=500
    --beamthreshold=25
    --silweight=-0.5
    --nthread_decoder=1
    --smearing=max
    --show=true

    but during decoding I always get a spurious character at the start of each sample, which is hurting the overall results.

    |T|: they pointedly drew back from john jago as he approached the empty chair next to me and moved round to the opposite side of the table |P|: d they pointedly drew back from john jago as he approach the him to chair next me and move round to the opposite side of the table [sample: test1544, WER: 23.0769%, LER: 9.02256%, slice WER: 23.0769%, slice LER: 9.02256%, decoded samples (thread 0): 1]

    |T|: the ingenious hawkeye who recalled the hasty manner in which the other had abandoned his post at the bedside of the sick woman was not without his suspicions concerning the subject of so much solemn deliberation |P|: d the ingenious hawk eye who recalled the hasty manner in which the other had abandon his post at the bedside of the sick woman was not without his suspicions concerning the subject of so much solemn deliberation [sample: test296, WER: 11.1111%, LER: 2.36967%, slice WER: 16.129%, slice LER: 4.94186%, decoded samples (thread 0): 2]

    |T|: john taylor who had supported her through college was interested in cotton |P|: e john tailor who had supported her few college was interested in cotton [sample: test441, WER: 25%, LER: 13.5135%, slice WER: 17.5676%, slice LER: 6.45933%, decoded samples (thread 0): 3]

    |T|: please forgive me for this underhanded way of admitting i had turned forty |P|: d please forgive me for this underhanded way of admitting at i had turned forty [sample: test2474, WER: 15.3846%, LER: 6.75676%, slice WER: 20.8791%, slice LER: 7.58755%, decoded samples (thread 0): 5]

    For example, for the output below:

    |T|: yes dead these four years an a good job for her too |P|: d yes dead these four years and a good job for her too [sample: test1365, WER: 16.6667%, LER: 5.88235%, slice WER: 22.3022%, slice LER: 7.56972%, decoded samples (thread 0): 8]

    the WER is 16.667%, but if I remove the leading character 'd', the WER drops to 8.3%.

    Any reasons why this is happening?
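    The WER arithmetic checks out: on short utterances a single leading insertion roughly doubles the error rate. A quick sketch (plain word-level Levenshtein distance, the standard WER definition) reproducing the numbers above:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                                              # deletions
    for j in range(len(h) + 1):
        d[0][j] = j                                              # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # sub/match
                          d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1)                           # insertion
    return d[-1][-1] / len(r)

ref = "yes dead these four years an a good job for her too"
hyp = "d yes dead these four years and a good job for her too"
print(round(wer(ref, hyp) * 100, 4))      # 16.6667: insertion 'd' + an->and
print(round(wer(ref, hyp[2:]) * 100, 4))  # 8.3333: only an->and remains
```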

  • AM not saving while running streaming_convnets

    Hi, I have followed the README.md in the streaming_convnets folder. Training runs successfully, but it isn't saving the model after every epoch, nor is it printing any of the per-epoch model stats the way it usually does (I tried the recipe in tutorials/1-libri). Each epoch finishes almost instantly; the logs look something like this:

    I0331 09:08:02.674365 30153 Train.cpp:538] Epoch 999998 started!
    I0331 09:08:02.674376 30153 Train.cpp:531] Shuffling trainset
    I0331 09:08:02.674386 30153 Train.cpp:538] Epoch 999999 started!
    I0331 09:08:02.674399 30153 Train.cpp:531] Shuffling trainset
    I0331 09:08:02.674408 30153 Train.cpp:538] Epoch 1000000 started!
    I0331 09:08:02.674418 30153 Train.cpp:689] Finished training

    Can you tell me what I'm doing wrong? I faced a similar issue with the sota resnet example recipe: after editing the right paths in the cfg file, it throws no error and "trains" instantly without saving any model (.bin file). Thanks

  • SegFault when running tutorial

    I was able to build wav2letter++ from master, and upon running the Train command in this tutorial, the binary runs for a few seconds and exits with a segmentation fault and nothing else.

    Wish I had more info. Here is the system info:

    Google Cloud Platform Deep Learning VM
    Debian GNU/Linux 9.6
    miniconda v4.5.11
    Python 3.7.1
    g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
    CUDA 10.0.130
    cuDNN 7.4.1

  • Python bindings

    I get the error message below when I try pip install -e . in ~/wav2letter/bindings/python:

    
      Running setup.py develop for wav2letter
        ERROR: Command errored out with exit status 1:
         command: /opt/conda/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/root/wav2letter/bindings/python/setup.py'"'"'; __file__='"'"'/root/wav2letter/bindings/python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
             cwd: /root/wav2letter/bindings/python/
        Complete output (36 lines):
        running develop
        running egg_info
        writing wav2letter.egg-info/PKG-INFO
        writing dependency_links to wav2letter.egg-info/dependency_links.txt
        writing top-level names to wav2letter.egg-info/top_level.txt
        reading manifest file 'wav2letter.egg-info/SOURCES.txt'
        writing manifest file 'wav2letter.egg-info/SOURCES.txt'
        running build_ext
        Error: could not load cache
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/root/wav2letter/bindings/python/setup.py", line 109, in <module>
            zip_safe=False,
          File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 144, in setup
            return distutils.core.setup(**attrs)
          File "/opt/conda/lib/python3.7/distutils/core.py", line 148, in setup
            dist.run_commands()
          File "/opt/conda/lib/python3.7/distutils/dist.py", line 966, in run_commands
            self.run_command(cmd)
          File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
            cmd_obj.run()
          File "/opt/conda/lib/python3.7/site-packages/setuptools/command/develop.py", line 38, in run
            self.install_for_development()
          File "/opt/conda/lib/python3.7/site-packages/setuptools/command/develop.py", line 140, in install_for_development
            self.run_command('build_ext')
          File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
            cmd_obj.run()
          File "/root/wav2letter/bindings/python/setup.py", line 48, in run
            self.build_extensions()
          File "/root/wav2letter/bindings/python/setup.py", line 91, in build_extensions
            ["cmake", "--build", "."] + build_args, cwd=self.build_temp
          File "/opt/conda/lib/python3.7/subprocess.py", line 363, in check_call
            raise CalledProcessError(retcode, cmd)
        subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--', '-j4']' returned non-zero exit status 1.
        ----------------------------------------
    ERROR: Command errored out with exit status 1: /opt/conda/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/root/wav2letter/bindings/python/setup.py'"'"'; __file__='"'"'/root/wav2letter/bindings/python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
    

    How can I build the Python bindings? Please help.

  • Bump mako from 1.1.0 to 1.2.2 in /recipes/sota/2019/raw_lm_corpus

    Bumps mako from 1.1.0 to 1.2.2.

    Release notes

    Sourced from mako's releases.

    1.2.2

    Released: Mon Aug 29 2022

    bug

    • [bug] [lexer] Fixed issue in lexer where the regexp used to match tags would not correctly interpret quoted sections individually. While this parsing issue still produced the same expected tag structure later on, the mis-handling of quoted sections was also subject to a regexp crash if a tag had a large number of quotes within its quoted sections.

      References: #366

    1.2.1

    Released: Thu Jun 30 2022

    bug

    • [bug] [tests] Various fixes to the test suite in the area of exception message rendering to accommodate for variability in Python versions as well as Pygments.

      References: #360

    misc

    • [performance] Optimized some codepaths within the lexer/Python code generation process, improving performance for generation of templates prior to their being cached. Pull request courtesy Takuto Ikuta.

      References: #361

    1.2.0

    Released: Thu Mar 10 2022

    changed

    • [changed] [py3k] Corrected "universal wheel" directive in setup.cfg so that building a wheel does not target Python 2.

      References: #351

    • [changed] [py3k] The bytestring_passthrough template argument is removed, as this flag only applied to Python 2.

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

  • Invalid normalized Non-overlap LM corpus link

    Question

    https://github.com/flashlight/wav2letter/tree/main/recipes/sota/2019#non-overlap-lm-corpus-librispeech-official-lm-corpus-excluded-the-data-from-librivox

    The above link doesn't include the normalized corpus. Is it possible for you to release one?

  • 100% wer while training and no prediction on decoding

    Hi,

    I am trying to train Flashlight ASR for my native language. Before training on the full dataset, I was trying to set everything up on a very small dataset in Google Colab. The data I am using for setup purposes is only around 1 hour. I took architecture files from wav2letter's recipes folder. Using the architecture file from conv_glu/wsj, I ran training for 100 epochs, but my WER always remains 100%. Then I tried decoding, and it gives no output/prediction for any audio file.

    What could be causing this, and do you have any suggestions? Do I need to train on more data, or do I have to change some hyperparameters? In the past I have trained models with Kaldi, and I know that once a model is trained it gives some text output even if it is wrong/inaccurate.

    Thanks.

  • Get correct model code in multilingual ASR model notebook

    IMPORTANT: Please do not create a Pull Request without creating an issue first. Changes must be discussed.

    Original Issue: [corresponding issue on Github]

    Note: You can add closes #[issue number] to automatically close the issue that this PR resolves when it is merged.

    Summary

    [Explain the details for this change and the problem that the pull request solves]

    Test Plan (required)

    [steps by which you tested that your fix resolves the issue. These might include specific commands and configurations]

  • Repetitive output in ASR Inference Tutorial

    Question

    I get repetitive outputs in the ASR Inference tutorial. I assume these are the n-best hypotheses. Is there a way to change this? [image]

    Additional Context

    I am using the following model:

    inference_cmd = """./flashlight/build/bin/asr/fl_asr_tutorial_inference_ctc --am_path=am_transformer_ctc_stride3_letters_300Mparams.bin --tokens_path=tokens.txt --lexicon_path=lexicon.txt --lm_path=lm_common_crawl_small_4gram_prun0-6-15_200kvocab.bin --logtostderr=true --sample_rate=16000 --beam_size=50 --beam_size_token=30 --beam_threshold=100 --lm_weight=1.5 --word_score=0 --stderrthreshold=1 --colorlogtostderr=true --audio_list=listfiles.lst"""

  • ArrayFire Exception (Device out of memory:101) while decoding using Beam search decoder

    Bug Description

    I am trying to run inference on a 30-minute audio file using beam search, but it fails immediately with the following error:

    terminate called after throwing an instance of 'af::exception'
      what():  ArrayFire Exception (Device out of memory:101):
    In function af_err af_matmul(void**, af_array, af_array, af_mat_prop, af_mat_prop)
    In file src/api/c/blas.cpp:237

    Reproduction Steps

    This is the command I run:

    !../../build/bin/asr/fl_asr_tutorial_inference_ctc \
        --am_path=am_transformer_ctc_stride3_letters_300Mparams.bin \
        --tokens_path=tokens.txt \
        --lexicon_path=lexicon.txt \
        --lm_path=lm_common_crawl_small_4gram_prun0-6-15_200kvocab.bin \
        --logtostderr=true \
        --sample_rate=16000 \
        --beam_size=50 \
        --beam_size_token=30 \
        --beam_threshold=100 \
        --lm_weight=1.5 \
        --word_score=0 \
        --audio_list=audio.lst

    This is the output:

    I0406 17:32:38.839179 11262 InferenceCTC.cpp:66] Gflags after parsing --flagfile=;--fromenv=;--tryfromenv=;--undefok=;--tab_completion_columns=80;--tab_completion_word=;--help=false;--helpfull=false;--helpmatch=;--helpon=;--helppackage=false;--helpshort=false;--helpxml=false;--version=false;--am_path=am_transformer_ctc_stride3_letters_300Mparams.bin;--audio_list=audio.lst;--beam_size=50;--beam_size_token=30;--beam_threshold=100;--lexicon_path=lexicon.txt;--lm_path=lm_common_crawl_small_4gram_prun0-6-15_200kvocab.bin;--lm_weight=1.5;--sample_rate=16000;--tokens_path=tokens.txt;--word_score=0;--alsologtoemail=;--alsologtostderr=false;--colorlogtostderr=false;--drop_log_memory=true;--log_backtrace_at=;--log_dir=;--log_link=;--log_prefix=true;--logbuflevel=0;--logbufsecs=30;--logemaillevel=999;--logfile_mode=436;--logmailer=/bin/mail;--logtostderr=true;--max_log_size=1800;--minloglevel=0;--stderrthreshold=2;--stop_logging_if_full_disk=false;--symbolize_stacktrace=true;--v=0;--vmodule=;
    I0406 17:32:38.839272 11262 InferenceCTC.cpp:89] [Inference tutorial for CTC] Reading acoustic model from am_transformer_ctc_stride3_letters_300Mparams.bin
    I0406 17:32:40.061904 11262 InferenceCTC.cpp:140] [Inference tutorial for CTC] Network is loaded.
    I0406 17:32:40.433898 11262 InferenceCTC.cpp:152] [Inference tutorial for CTC] Number of classes/tokens in the network: 29
    I0406 17:32:40.433930 11262 InferenceCTC.cpp:155] [Inference tutorial for CTC] Number of words in the lexicon: 200001
    I0406 17:32:40.667213 11262 InferenceCTC.cpp:166] [Inference tutorial for CTC] Language model is constructed.
    I0406 17:32:41.800864 11262 InferenceCTC.cpp:177] [Inference tutorial for CTC] Trie is planted.
    I0406 17:32:41.801892 11262 InferenceCTC.cpp:196] [Inference tutorial for CTC] Beam search decoder is created

    terminate called after throwing an instance of 'af::exception'
      what():  ArrayFire Exception (Device out of memory:101):
    In function af_err af_matmul(void**, af_array, af_array, af_mat_prop, af_mat_prop)
    In file src/api/c/blas.cpp:237

    Platform and Hardware

    I am running on a CPU machine with 4 cores and 8 GB of RAM.
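    Since the exception is raised in `af_matmul` right after "Beam search decoder is created", the failing allocation most likely occurs while the 300M-parameter acoustic model runs forward over the input, which can exceed 8 GB of host RAM for long audio. As a sketch of things commonly tried (not a confirmed fix): feed shorter audio clips via `--audio_list`, and shrink the decoder's search with smaller beam values. The flags below are the same ones from the command above; the specific smaller values are illustrative assumptions, not recommended settings from the maintainers.

    ```shell
    # Re-run with a narrower beam search; shorter clips in audio_short.lst
    # (a hypothetical list file of trimmed audio) reduce peak memory in the
    # acoustic-model forward pass.
    ../../build/bin/asr/fl_asr_tutorial_inference_ctc \
        --am_path=am_transformer_ctc_stride3_letters_300Mparams.bin \
        --tokens_path=tokens.txt \
        --lexicon_path=lexicon.txt \
        --lm_path=lm_common_crawl_small_4gram_prun0-6-15_200kvocab.bin \
        --logtostderr=true \
        --sample_rate=16000 \
        --beam_size=10 \
        --beam_size_token=10 \
        --beam_threshold=20 \
        --lm_weight=1.5 \
        --word_score=0 \
        --audio_list=audio_short.lst
    ```

    Narrowing the beam trades some decoding accuracy for memory and speed; it does not change the memory needed by the acoustic model itself.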
