This is the repo that hosts the code for Mozilla's translation service

Translation service

HTTP service that uses bergamot-translator and compressed neural machine translation models for fast inference on CPU.

Running locally

  1. git clone this repo
  2. make setup-models
  3. make build-docker
  4. make run
  5. make call

Calling the service

$ curl --header "Content-Type: application/json" \
      --request POST \
      --data '{"from":"es", "to":"en", "text": "Hola Mundo"}' \
      http://0.0.0.0:8080/v1/translate
> {"result": "Hello World"}
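The same call can be made from Python. The following is a minimal sketch using only the standard library, assuming the service is reachable at the base URL you pass in and that it returns a JSON body with a "result" field, as in the curl example above. The function names (build_translate_request, parse_translate_response, translate) are hypothetical helpers for illustration and are not part of the service.

```python
import json
import urllib.request


def build_translate_request(base_url, source_lang, target_lang, text):
    """Build (but do not send) a POST request for the /v1/translate endpoint."""
    payload = {"from": source_lang, "to": target_lang, "text": text}
    # Keep non-ASCII characters as-is and encode the body explicitly as UTF-8.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/translate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_translate_response(raw_body):
    """Extract the translated text from a body shaped like {"result": "..."}."""
    return json.loads(raw_body.decode("utf-8"))["result"]


def translate(base_url, source_lang, target_lang, text, timeout=10.0):
    """Send the request to a running service and return the translated text."""
    request = build_translate_request(base_url, source_lang, target_lang, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_translate_response(response.read())
```

With the service running locally on port 8080, translate("http://localhost:8080", "es", "en", "Hola Mundo") sends the same request as the curl command. Building the request separately from sending it makes it possible to inspect the payload without a running server.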

Service configuration

The directory that contains the models ('esen', 'ende', etc.) should be mounted to /models in the Docker container.

Environment variables to set in container:

PORT - service port (default is 8000)

LOGGING_LEVEL - ERROR, WARNING, INFO or DEBUG (default is INFO)

WORKERS - number of bergamot-translator workers (default is 1). Set to 0 to use the number of available CPUs. It is recommended to keep the number of workers small and scale horizontally with Kubernetes instead.
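To make the defaults concrete, here is a small Python sketch of how the three variables above could be interpreted. It only mirrors the rules stated in this section and is not the service's actual code. The function name read_service_config, the returned dictionary layout, and the rejection of negative WORKERS values or unknown logging levels are assumptions made for illustration.

```python
import os

# Logging levels listed in this README.
VALID_LOGGING_LEVELS = ("ERROR", "WARNING", "INFO", "DEBUG")


def read_service_config(env=None):
    """Resolve PORT, LOGGING_LEVEL and WORKERS using the documented defaults."""
    env = os.environ if env is None else env

    port = int(env.get("PORT", "8000"))

    logging_level = env.get("LOGGING_LEVEL", "INFO")
    if logging_level not in VALID_LOGGING_LEVELS:
        # Assumption: unknown levels are treated as a configuration error.
        raise ValueError("unsupported LOGGING_LEVEL: " + logging_level)

    workers = int(env.get("WORKERS", "1"))
    if workers < 0:
        # Assumption: negative worker counts are treated as a configuration error.
        raise ValueError("WORKERS must be 0 or a positive integer")
    if workers == 0:
        # 0 means "use the number of available CPUs".
        workers = os.cpu_count() or 1

    return {"port": port, "logging_level": logging_level, "workers": workers}
```

Calling read_service_config({}) yields the documented defaults (port 8000, INFO, one worker), while passing {"WORKERS": "0"} resolves the worker count to the CPU count of the host.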

Testing

make python-env - install pip packages

make test - to run integration API tests

make load-test - to run a stress test (requires downloading more models than the unit tests)

Owner
Mozilla
Comments
  • Segfault on loading of some models

    It happens on loading of bgen, enbg, nben, nnen. All other models are loaded correctly. Update of moz-bergamot-translator module didn't help.

    docker run --name translation-service -it --rm -v $(pwd)/tmp:/models -p 8080:8080 -e PORT=8080 translation-service
    [2022-02-23 20:42:24] [data] Loading SentencePiece vocabulary from file /models/bgen/vocab.bgen.spm
    [2022-02-23 20:42:24] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file.
    [2022-02-23 20:42:24] [data] Loading binary shortlist as /models/bgen/lex.50.50.bgen.s2t.bin true
    [2022-02-23 20:42:24] [data] Lexical short list firstNum 100 and bestNum 100
    [2022-02-23 20:42:24] [memory] Extending reserved space to 128 MB (device cpu0)
    [2022-02-23 20:42:24] Loaded model config
    [2022-02-23 20:42:24] Loading scorer of type transformer as feature F0
    [2022-02-23 20:42:24] [memory] Reserving 31 MB, device cpu0
    [2022-02-23 20:42:24] [memory] Reserving 8 MB, device cpu0
    Model bgen is loaded
    [2022-02-23 20:42:24] [data] Loading SentencePiece vocabulary from file /models/enbg/vocab.bgen.spm
    [2022-02-23 20:42:24] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file.
    [2022-02-23 20:42:24] [data] Loading binary shortlist as /models/enbg/lex.50.50.enbg.s2t.bin true
    [2022-02-23 20:42:24] [data] Lexical short list firstNum 100 and bestNum 100
    [2022-02-23 20:42:24] Error: Error: shortlist indices are out of bounds
    [2022-02-23 20:42:24] Error: Aborted from void marian::data::BinaryShortlistGenerator::contentCheck() in /app/3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/data/shortlist.cpp:160
    
    [CALL STACK]
    [0x562d1c41c68c]                                                       + 0x2ad68c
    [0x562d1c41d968]                                                       + 0x2ae968
    [0x562d1c41e25b]                                                       + 0x2af25b
    [0x562d1c41fcef]                                                       + 0x2b0cef
    [0x562d1c2e8514]                                                       + 0x179514
    [0x562d1c2e42ad]                                                       + 0x1752ad
    [0x562d1c2c6f0c]                                                       + 0x157f0c
    [0x562d1c289037]                                                       + 0x11a037
    [0x562d1c2663ae]                                                       + 0xf73ae
    [0x7fa9b709f0b3]    __libc_start_main                                  + 0xf3
    [0x562d1c286ace]                                                       + 0x117ace
    
    [2022-02-23 20:42:24] Error: Segmentation fault
    [2022-02-23 20:42:24] Error: Aborted from setErrorHandlers()::<lambda(int, siginfo_t*, void*)> in /app/3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/common/logging.cpp:134
    
    [CALL STACK]
    [0x562d1c34ee60]                                                       + 0x1dfe60
    [0x562d1c34f0af]                                                       + 0x1e00af
    [0x7fa9b75cb3c0]                                                       + 0x153c0
    [0x7fa9b709d941]    abort                                              + 0x213
    [0x562d1c22714d]                                                       + 0xb814d
    [0x562d1c41d968]                                                       + 0x2ae968
    [0x562d1c41e25b]                                                       + 0x2af25b
    [0x562d1c41fcef]                                                       + 0x2b0cef
    [0x562d1c2e8514]                                                       + 0x179514
    [0x562d1c2e42ad]                                                       + 0x1752ad
    [0x562d1c2c6f0c]                                                       + 0x157f0c
    [0x562d1c289037]                                                       + 0x11a037
    [0x562d1c2663ae]                                                       + 0xf73ae
    [0x7fa9b709f0b3]    __libc_start_main                                  + 0xf3
    [0x562d1c286ace]                                                       + 0x117ace
    
    make: *** [run] Error 127
    
  • Server will not start - Aborted (core dumped)

    Hi!

    I tried to give this server a try. I followed the instructions to build the Docker image which worked well, but then when I try to start the server, I get the following error:

    docker run --name translation-service -it --rm -v $(pwd)/models:/models -p 8000:8000 -e LOGGING_LEVEL=DEBUG translation-service
    
    Looking for models in "/models/uken"
    Adding file trgvocab.uken.spm
    Adding file srcvocab.uken.spm
    Adding file model.uken.intgemm8.bin
    Adding file lex.uken.s2t.bin
    Building models config for uken
    Aborted (core dumped)
    

    I didn't see any errors in the console when building the docker image. I get the same error when I run the latest image from Docker hub mozilla/translation-service.

    The problem seems to be the uken / enuk models: if I remove those folders from the model directory, the server appears to work well!

    Is this perhaps fixed in PR #22?

  • Fix failing on some models

    fixes #18 fixes #19

    • Fixed translation config
    • Updated submodules

    I tried gemm-precision: int8shiftAlphaAll instead of int8shift and observed that translation quality decreased:

    enbg

    int8shift Hello world -> Здравей свят (exact translation according to Google Translate)

    int8shiftAlphaAll Hello world -> Здравей за света (the reverse translation is "Hello to the world" according to Google Translate)

    enru

    int8shift How are you? -> Как дела?

    int8shiftAlphaAll How are you? -> Как дела? Как дела? (it just doubles the phrase which is especially bad)

    So I am leaving int8shift. I am not sure whether there is much difference in speed; I didn't benchmark.

    Does anybody know whether there are any side effects of using int8shift instead of int8shiftAlphaAll? @kpu @XapaJIaMnu @jerinphilip

  • enpl enbg nben causing "Error: shortlist indices are out of bounds"

    Worked well via Docker but I had to delete enpl enbg nben from /models/prod/ first otherwise I'd get e.g

    [2022-06-07 20:16:32] [data] Loading SentencePiece vocabulary from file /models/nben/vocab.nben.spm
    [2022-06-07 20:16:32] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file.
    [2022-06-07 20:16:32] [data] Loading binary shortlist as /models/nben/lex.50.50.nben.s2t.bin true
    [2022-06-07 20:16:32] [data] Lexical short list firstNum 50 and bestNum 50
    [2022-06-07 20:16:32] Error: Error: shortlist indices are out of bounds
    [2022-06-07 20:16:32] Error: Aborted from void marian::data::BinaryShortlistGenerator::contentCheck() in /app/3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/data/shortlist.cpp:160

  • Charset issues with Russian?

    STR:

    • Install and setup the service following the README
    • Try to translate to Russian:
    macbookpro:translation-service anatal$ curl -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8"--header "Content-Type: application/json"       --request POST       --data '{"from":"es", "to":"ru", "text": "Buenos días"}'       http://0.0.0.0:8080/v1/translate
    curl: (3) URL using bad/illegal format or missing URL
    {"result":"\u00d0\u00a4\u00d0\u00ce\u00d0\u00c1\u00e1\u0080\u00d0\u00ce\u00d0\u00c5 \u00e1\u0093\u00e1\u0092\u00e1\u0080\u00d0\u00ce"}
    
  • Tests and small improvements

    • load tests
    • unit tests
    • test models downloading
    • service configuration
    • implemented some dockerflow guidelines

    integration tests are not included in CI yet (next pr)

    fixes #3

  • CI

    The Docker build works. We should merge it to test pushing to Docker Hub from the main branch. The project is configured to use my personal user account for now; changing that is a matter of editing environment variables in the CircleCI project settings.

    fixes #4

  • Dockerhub image doesn't run on MBP with Apple Silicon

    docker run --name translation-service -it --rm -v $(pwd)/firefox-translations-models/models/prod:/models -p 8080:8080 -e PORT=8080 mozilla/translation-service:latest
    WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
    qemu: uncaught target signal 4 (Illegal instruction) - core dumped
    