libsio - A runtime library for Speech Input (STT) & Output (TTS)

Speech To Text

  • unified CTC and WFST decoding via beam search
  • online (streaming) decoding
  • lattice-free, on-the-fly rescoring with arbitrary language models, e.g.:
    • CTC + external LM
    • WFST - (lookahead LM) + (big LM)
    • E2E - (estimated internal LM) + (external LM)
    • one or more domain-specific LMs on top of a base LM
    • flexible contextual biasing
  • streaming + lattice-free -> low latency
  • modular design with the potential to deploy models from various speech toolkits such as Kaldi, k2, ESPnet, SpeechBrain, and WeNet
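
To make the "CTC + external LM" item concrete, here is a minimal Python sketch of streaming CTC prefix beam search with an external LM score added on the fly each time a prefix is extended, so no lattice is built. It is an illustration only and not libsio's API or implementation: the class `StreamingCtcDecoder`, its methods `push` and `best`, and the `lm_score(prefix, token)` callback are hypothetical names, and the LM is assumed to return a log probability that depends only on the prefix and the new token.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_add(a, b):
    """Return log(exp(a) + exp(b)) in a numerically stable way."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi = max(a, b)
    return hi + math.log(math.exp(a - hi) + math.exp(b - hi))


class StreamingCtcDecoder:
    """Hypothetical illustration of CTC prefix beam search; not the libsio API."""

    def __init__(self, blank, beam_size, lm_score=None, lm_weight=0.0):
        self.blank = blank
        self.beam_size = beam_size
        # Assumed callback: lm_score(prefix, token) -> log P(token | prefix).
        self.lm_score = lm_score or (lambda prefix, token: 0.0)
        self.lm_weight = lm_weight
        # prefix -> (log prob of paths ending in blank, ... ending in non-blank)
        self.beams = {(): (0.0, NEG_INF)}
        # prefix -> accumulated weighted LM log score
        self.lm_total = {(): 0.0}

    def push(self, log_probs):
        """Consume one frame of log posteriors over the vocabulary."""
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        lm_next = {}
        for prefix, (p_b, p_nb) in self.beams.items():
            p_any = log_add(p_b, p_nb)
            lm_next[prefix] = self.lm_total[prefix]
            for token, lp in enumerate(log_probs):
                if token == self.blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (log_add(b, p_any + lp), nb)
                    continue
                ext = prefix + (token,)
                if ext not in lm_next:
                    # On-the-fly rescoring: LM is queried when the prefix grows.
                    lm_next[ext] = (self.lm_total[prefix]
                                    + self.lm_weight * self.lm_score(prefix, token))
                b, nb = nxt[ext]
                if prefix and token == prefix[-1]:
                    # A repeated token only extends the prefix after a blank ...
                    nxt[ext] = (b, log_add(nb, p_b + lp))
                    # ... otherwise it collapses into the same prefix.
                    sb, snb = nxt[prefix]
                    nxt[prefix] = (sb, log_add(snb, p_nb + lp))
                else:
                    nxt[ext] = (b, log_add(nb, p_any + lp))
        ranked = sorted(nxt.items(),
                        key=lambda kv: log_add(*kv[1]) + lm_next[kv[0]],
                        reverse=True)[: self.beam_size]
        self.beams = dict(ranked)
        self.lm_total = {p: lm_next[p] for p in self.beams}

    def best(self):
        """Return (tokens, combined log score) of the current best prefix."""
        def total(p):
            return log_add(*self.beams[p]) + self.lm_total[p]
        prefix = max(self.beams, key=total)
        return list(prefix), total(prefix)


# Usage: feed frames as they arrive; the best hypothesis is available any time.
decoder = StreamingCtcDecoder(blank=0, beam_size=4)
for frame in ([math.log(0.4), math.log(0.6)], [math.log(0.4), math.log(0.6)]):
    decoder.push(frame)
tokens, score = decoder.best()
```

Because each frame is consumed as soon as it arrives and the LM is applied during the search rather than in a second pass, a partial result can be read after every `push`. The other score combinations in the list (lookahead LM, internal LM subtraction, domain LMs, contextual biasing) would plug into the same place as `lm_score` in this sketch.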

Text To Speech

Long-term plan; it won't happen soon.

Zen

  • Keep it simple. All software is fated to fight against complexity.
  • Keep it small. Code is debt; less code means less interest to pay.
  • Keep it explicit. Hidden behavior behind non-obvious code is evil.
  • Keep it local. Things should be organized locally for easy understanding.
  • Keep it concrete. Unnecessary abstractions carry cognitive costs; abstract only when it is absolutely necessary.

License

TBD

Status

  • Under heavy development. I don't recommend watching this repo yet, because the commit notifications might be annoying.
  • I haven't provided a good enough pretrained model for users to play with yet. Before releasing any working model, I would rather focus on core runtime functionality. You may read the code and compile it, though.