Text utilities, including beam search decoding, tokenizing, and more, built for use in Flashlight.

Flashlight Text: Fast, Lightweight Utilities for Text

Flashlight Text is a fast, minimal library for text-based operations, including beam search decoding, tokenization, and a Dictionary component.

Quickstart

Flashlight Text has Python bindings for decoder and Dictionary components. To install the bindings from source, install KenLM, then clone the repo and build:

git clone https://github.com/flashlight/text && cd text
cd bindings/python
python3 setup.py install

To install without KenLM, set the environment variable USE_KENLM=0 when running setup.py.
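
As a minimal sketch, the variable can be set inline for a single invocation. This assumes the command is run from bindings/python inside the cloned repo, as in the steps above:

```shell
# Build and install the Python bindings without KenLM support.
# Assumes the current directory is bindings/python in the cloned repo.
USE_KENLM=0 python3 setup.py install
```

Setting the variable inline applies it to that one command only, so later builds in the same shell are unaffected.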

See the full Python binding documentation for examples and more.

Building and Installing

Requirements

At minimum, compilation requires:

  • A C++ compiler with good C++17 support (e.g. gcc/g++ >= 7)
  • CMake (version 3.10 or later) and make
  • A Linux-based operating system.

KenLM Support: If building with KenLM support, KenLM is required. To toggle KenLM support use the FL_TEXT_USE_KENLM CMake option or the USE_KENLM environment variable when building the Python bindings.

Tests: If building tests, Google Test >= 1.10 is required. The FL_TEXT_BUILD_TESTS CMake option toggles building tests.

Instructions for building and installing the Python bindings from source are given in the Quickstart section above.

Building from Source

Building the C++ project from source is simple:

git clone https://github.com/flashlight/text && cd text
mkdir build && cd build
cmake ..
make -j$(nproc)
make test    # run tests
make install # install at the CMAKE_INSTALL_PREFIX

To disable KenLM while building, pass -DFL_TEXT_USE_KENLM=OFF to CMake. To disable building tests, pass -DFL_TEXT_BUILD_TESTS=OFF.

KenLM can be downloaded and installed automatically if it is not found on the local system. The FL_TEXT_BUILD_STANDALONE option controls this behavior: if it is disabled, dependencies are not downloaded and built during the build.
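
For illustration, a configure step that combines these options might look like the sketch below. It assumes the commands are run from the build directory created in the steps above. The three options are independent, so any one of them can be passed on its own.

```shell
# Configure without KenLM support and without tests, and do not
# download or build missing dependencies automatically.
cmake .. \
  -DFL_TEXT_USE_KENLM=OFF \
  -DFL_TEXT_BUILD_TESTS=OFF \
  -DFL_TEXT_BUILD_STANDALONE=OFF
make -j$(nproc)
```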

Adding Flashlight Text to a C++ Project

Given a simple project.cpp file that includes and links to Flashlight Text:

#include <iostream>

#include <flashlight/lib/text/dictionary/Dictionary.h>

int main() {
  fl::lib::text::Dictionary myDict("someFile.dict");
  std::cout << "Dictionary has " << myDict.entrySize()
            << " entries."  << std::endl;
  return 0;
}

The following CMake configuration links Flashlight Text and sets include directories:

cmake_minimum_required(VERSION 3.10)
project(myProject CXX)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

add_executable(myProject project.cpp)

find_package(flashlight-text CONFIG REQUIRED)
target_link_libraries(myProject PRIVATE flashlight::flashlight-text)
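
The example project can then be configured and built like any other CMake project. The sketch below assumes Flashlight Text was installed under a hypothetical prefix, /opt/flashlight-text. CMAKE_PREFIX_PATH is a standard CMake variable and is only needed when the install prefix is not on CMake's default search path.

```shell
# Run from the directory containing project.cpp and CMakeLists.txt.
mkdir build && cd build
# /opt/flashlight-text is a placeholder; use the actual CMAKE_INSTALL_PREFIX.
cmake .. -DCMAKE_PREFIX_PATH=/opt/flashlight-text
make
```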

Contributing and Contact

Contact: [email protected]

Flashlight Text is actively developed. See CONTRIBUTING for more on how to help out.

Citing

You can cite Flashlight using:

@misc{kahn2022flashlight,
      title={Flashlight: Enabling Innovation in Tools for Machine Learning},
      author={Jacob Kahn and Vineel Pratap and Tatiana Likhomanenko and Qiantong Xu and Awni Hannun and Jeff Cai and Paden Tomasello and Ann Lee and Edouard Grave and Gilad Avidov and Benoit Steiner and Vitaliy Liptchinsky and Gabriel Synnaeve and Ronan Collobert},
      year={2022},
      eprint={2201.12465},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

License

Flashlight Text is under an MIT license. See LICENSE for more information.
