Brotli compression format

SECURITY NOTE

Please consider updating brotli to version 1.0.9 (latest).

Version 1.0.9 fixes an integer overflow. The overflow happens when the "one-shot" decoding API is used (or the input chunk passed to the streaming API is not limited in size), the input size (chunk size) is larger than 2 GiB, and the input contains uncompressed blocks. After the overflow, memcpy is invoked with a gigantic num value, which will likely cause a crash.
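
Upgrading is the actual fix. For callers of the streaming API, the condition above also suggests keeping each input chunk well below 2 GiB. The following is a minimal sketch of that idea in Python; the names iter_chunks, decode_in_chunks and MAX_CHUNK, and the 1 MiB cap, are illustrative assumptions and not part of brotli.

```python
from typing import Callable, Iterator

# Assumed cap for illustration only: 1 MiB, far below the 2 GiB threshold.
MAX_CHUNK = 1 << 20


def iter_chunks(data: bytes, chunk_size: int = MAX_CHUNK) -> Iterator[bytes]:
    """Yield consecutive slices of data, each at most chunk_size bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    view = memoryview(data)
    for start in range(0, len(view), chunk_size):
        yield bytes(view[start:start + chunk_size])


def decode_in_chunks(data: bytes, feed: Callable[[bytes], bytes],
                     chunk_size: int = MAX_CHUNK) -> bytes:
    """Pass data to feed() in bounded chunks and join whatever it returns."""
    out = bytearray()
    for chunk in iter_chunks(data, chunk_size):
        out += feed(chunk)
    return bytes(out)
```

Here feed stands for any streaming decoder step that accepts a chunk and returns the bytes decoded so far. With the Python module this could be the process method of a brotli.Decompressor() object, assuming the installed binding exposes it under that name.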

Introduction

Brotli is a general-purpose lossless compression algorithm that compresses data using a combination of a modern variant of the LZ77 algorithm, Huffman coding and 2nd order context modeling, with a compression ratio comparable to the best currently available general-purpose compression methods. It is similar in speed to deflate but offers denser compression.

The specification of the Brotli Compressed Data Format is defined in RFC 7932.

Brotli is open-sourced under the MIT License, see the LICENSE file.

Brotli mailing list: https://groups.google.com/forum/#!forum/brotli

Build instructions

Vcpkg

You can download and install brotli using the vcpkg dependency manager:

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install brotli

The brotli port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

Autotools-style CMake

configure-cmake is an autotools-style configure script for CMake-based projects (not supported on Windows).

The basic commands to build, test and install brotli are:

$ mkdir out && cd out
$ ../configure-cmake
$ make
$ make test
$ make install

By default, debug binaries are built. To generate a "release" Makefile, pass the --disable-debug option to configure-cmake.

Bazel

See Bazel

CMake

The basic commands to build and install brotli are:

$ mkdir out && cd out
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./installed ..
$ cmake --build . --config Release --target install

Other CMake configurations can be used as well.

Premake5

See Premake5

Python

To install the latest release of the Python module, run the following:

$ pip install brotli

To install the tip-of-the-tree version, run:

$ pip install --upgrade git+https://github.com/google/brotli

See the Python readme for more details on installing from source, development, and testing.
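
After installing, a quick round trip can confirm that the module imports and works. This is a small sketch rather than an official test; the helper name brotli_roundtrip_ok is made up for this example, and it relies only on the module's compress and decompress functions.

```python
from typing import Optional


def brotli_roundtrip_ok(payload: bytes) -> Optional[bool]:
    """Return True/False for a compress/decompress round trip.

    Returns None when the brotli module is not installed.
    """
    try:
        import brotli
    except ImportError:
        return None
    compressed = brotli.compress(payload)
    return brotli.decompress(compressed) == payload


# Example call; prints True when the module is installed and working.
print(brotli_roundtrip_ok(b"hello brotli " * 100))
```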

Benchmarks

Related projects

Disclaimer: Brotli authors take no responsibility for the third party projects mentioned in this section.

Independent decoder implementation by Mark Adler, based entirely on format specification.

JavaScript port of brotli decoder. Can be used directly via npm install brotli.

Hand-ported decoder/encoder in Haxe by Dominik Homberger. Output source code: JavaScript, PHP, Python, Java and C#.

7Zip plugin

Dart native bindings

Dart compression framework with fast FFI-based Brotli implementation with ready-to-use prebuilt binaries for Win/Linux/Mac

Comments
  • libbrotli

    Hey,

    I created the sub project libbrotli (https://github.com/bagder/libbrotli) a while ago to help build and install a library for brotli encoding/decoding, using only code from this repository. libbrotli is only a meta-project with mostly autotools to build, install and package a "library" for brotli since this original brotli home does not provide that. It only uses compression/decompression source code from the brotli tree.

    This concept seems to resonate with a decent number of users who appreciate being able to get a library (or two actually) out of a build and install process for use in various projects.

    I would prefer if this functionality was provided by the brotli project itself and I'll offer to merge/translate it over to a pull-request or something if you'll agree this is interesting. I'd prefer to remove myself as a middle man here.

  • add brotli to PyPI repository

    It would be nice to add Brotli to the official Python Package Index, so that users can download it with a simple pip install brotli.

    We could add just the sdist tarball, or also some pre-compiled wheel packages for Windows and Mac platforms, maybe built automatically via Travis and/or AppVeyor -- like here

    /cc @khaledhosny

  • Ideas for API improvements

    I recently rewrote Squash's brotli extension for the new API. IMHO the API is pretty nice, and I love that it's C not C++. That said, I do have a few ideas for improvements:

    • [x] For functions which return 1 on success and 0 on failure, as well as other booleans (like the is_last argument to BrotliEncoderWriteMetaBlock), please use a bool (or _Bool if you prefer) instead of an int. Using bool helps make the code more readable and reduces documentation lookups.
    • [x] The decoder puts the state argument at the end, the encoder puts it first. Please make it consistent. FWIW, I prefer the instance to be first (and that's definitely more common).
    • [x] I'd really like to see conformant array parameters. This would require a macro like the one Squash has, but it has the potential to help prevent bugs in software using the API, so I think it would be a good addition. I see you already put the length first, so the change is pretty trivial.
    • [x] It's a bit odd that everything in the encoder is called BrotliEncoder*, but everything in the decoder is Brotli* (e.g., BrotliState not BrotliDecoderState). I think it would be better to move everything in the decoder to BrotliDecoder*
    • [x] Technically, stuff like _BROTLI_COMMA isn't allowed in C99 (at least, but IIRC C89 too): anything which starts with an underscore followed by an uppercase letter or another underscore is reserved. In C99, it's in § 7.1.3. I've taken to using an underscore suffix in my code (e.g., BROTLI_COMMA_) to indicate something is really supposed to be internal.
    • [ ] CamelCase for function names is fairly unusual in C. Switching to lowercase_with_underscores would probably be better; CamelCase is usually used for type names (including callbacks like brotli_alloc_func and brotli_free_func). Obviously there is no real standard, but anecdotally it seems like lowercase_with_underscores is the most common…
    • [ ] Add "zeroed-memory alloc" to memory allocator interface; might improve performance.
    • [ ] It would be nice to have 'restrict' on the buffers. Obviously it would also have to be hidden behind a macro (feel free to steal the one from Hedley)
    • [ ] A lot of parameters can/should be annotated with GCC's nonnull attribute (again, Hedley has a macro you can take). This is great for static analyzers, and if you build with ubsan you can get runtime warnings, too. The macro for this has to be variadic, so you might want to hide this behind a check for C99; variadic macros aren't in C89, but all the common compilers have supported them for a while (even MSVC, since VC8 (2005)).
    • [ ] BrotliEncoderMaxCompressedSize could/should be annotated with the const attribute (GCC ≥ 2.5) or noalias declspec (MSVC, since VC8). Again, Hedley. This one is helpful for optimizing compilers, though honestly I doubt excessive calls to BrotliEncoderMaxCompressedSize is a performance bottleneck.

    These are definitely not major issues, I just wanted to bring them up while changing the API is still an option.

  • Very poor compression ratio on TriMesh binary streams compared to LZMA

    I've long been a proponent of integrating LZMA2/LZMA into the browsers because of its incredible effectiveness for compressing binary data streams. When I saw Brotli I thought that this was likely going to be just as good. It isn't actually great.

    I am a frequent contributor to both http://ThreeJS.org as well as the http://Clara.io online 3D editor. One of the biggest issues we run into is the size of mesh downloads. Right now we are using LZMA.js scripts to do the decompression in worker threads, but this isn't optimal, especially on mobile.

    For example, this real-world large-ish binary trimesh stream, very typical:

    https://d3ijcvgxwtkjmf.cloudfront.net/a4c3c7313b7bdeb68ad46a7e1b761f38z?filename=object-53-batman-tumbler-lw8-12.bingeom

    The original size once downloaded is 6,779,000 bytes (be careful, this stream may be delivered with "Content-Encoding: gzip".)

    Here are the compression results:

    • LZMA
      • Normal: 921,600 bytes
      • Ultra: 920,147 bytes
    • GZip
      • Normal: 2,296,362 bytes
      • Ultra: 2,258,967 bytes
    • Brotli
      • Normal and Ultra: 1,513,459 bytes. (source)

    Brotli is significantly less effective than LZMA in this case -- not just a little but by a huge margin.
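
To put the sizes listed above into ratios, here is a short Python calculation. It uses only the byte counts reported in this comment; the variable names are chosen for this example.

```python
ORIGINAL_BYTES = 6_779_000

# Compressed sizes in bytes, as reported above.
compressed_bytes = {
    "LZMA normal": 921_600,
    "LZMA ultra": 920_147,
    "GZip normal": 2_296_362,
    "GZip ultra": 2_258_967,
    "Brotli": 1_513_459,
}


def compression_ratio(original: int, compressed: int) -> float:
    """Original size divided by compressed size (higher is better)."""
    return original / compressed


ratios = {name: compression_ratio(ORIGINAL_BYTES, size)
          for name, size in compressed_bytes.items()}

# How much larger the Brotli output is than the best LZMA output.
brotli_vs_lzma = compressed_bytes["Brotli"] / compressed_bytes["LZMA ultra"]

for name, value in ratios.items():
    print(f"{name:12s} {compressed_bytes[name]:>9,d} bytes  ratio {value:.2f}")
print(f"Brotli output / LZMA ultra output: {brotli_vs_lzma:.2f}")
```

This works out to a ratio of roughly 7.4 for LZMA, 4.5 for Brotli and 3.0 for GZip, with the Brotli output about 1.6 times the size of the LZMA output.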

    What this means is that we cannot replace our LZMA.js scripts with Brotli support. This is pretty bad for us in the 3D community as we are still stuck with JavaScript-based decompression.

  • BrotliDecoderDecompress() crashed if inbuf is part of mmaped file

    If data points to an mmapped file, the following crashed: BrotliDecoderDecompress(len,data+offset,&decoded_size,inBuffer);

    Adding an extra copy fixed the issue: memcpy(testbuf,data+offset,len); BrotliDecoderDecompress(len,testbuf,&decoded_size,inBuffer);

    Note: in the first case, data+offset is not aligned. Is alignment needed for BrotliDecoderDecompress()?

  • Binary needed

    Hi, if there is not going to be a binary in the 'release' section, could you share here how to compile using ICL? https://twitter.com/Sanmayce/status/965935926735196160

    For various reasons I avoid using 'make' and such, a command line compilist (to differentiate from 'compiler', heh-heh) here.

    The idea is for us to have a command line tool like the legendary PKZIP/PKUNZIP. I would use it on a daily basis; for example, I am currently running several big textual benchmarks:

    • https://github.com/powturbo/TurboBench/issues/10#issuecomment-367445792 To test the parsing prowess of 1GB window, this 900MB DNA set is quite good.
    • https://www.reddit.com/r/datasets/comments/7cise3/reddit_october_comments_are_now_available_with/dqaq9pv/ It is an understatement to call reddit test - big, it is quite literally tera - a teracorpus indeed!

    My wish is to set/present a roofline :P (as opposed to baseline) by using the full power of Brotli with 1GB, currently my testmachine 'Compressionette' (i5-7200u, 8GB DDR4) halved the teratask: zstd-v1.3.3-win64.exe -T2 -12 ... Wanna use bsc with 1024MB block, 7zip with 30bit window, and Zstd with 30bit window as well.

  • decompress brotli in browser?

    Is there a JavaScript Implementation of the decompressor part for in browser usage?

    I found https://github.com/devongovett/brotli.js and created https://github.com/devongovett/brotli.js/issues/2 ... but maybe someone else knows a decompress implementation?

  • Fix for future versions of setuptools

    Starting with Python 3.10, using - instead of _ will trigger a warning (see https://bugs.gentoo.org/796281 for reference)

    Signed-off-by: Marco Scardovi

  • Use MSVC intrinsics in Log2FloorNonZero and FindMatchLengthWithLimit

    This PR finishes off the work started in https://github.com/google/brotli/pull/618

    Switched compiler intrinsic code to use more generic switches and added MSVC-specific versions of the clz and ctz calls where the GCC __builtin functions are used.

    Preliminary testing shows an almost 10% speed improvement for encoding on Windows x64.
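
For readers unfamiliar with what clz and ctz are used for in helpers like these, here is a rough Python sketch of the general technique. It is not brotli's C code: the bodies below are my own illustration, and only the two function names in the title come from the PR.

```python
def log2_floor_nonzero(n: int) -> int:
    """floor(log2(n)) for n > 0. For a 32-bit n this equals 31 - clz(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return n.bit_length() - 1


def count_trailing_zeros(n: int) -> int:
    """Index of the lowest set bit, i.e. what a ctz intrinsic returns."""
    if n == 0:
        raise ValueError("ctz is undefined for 0")
    return (n & -n).bit_length() - 1


def find_match_length_with_limit(s1: bytes, s2: bytes, limit: int) -> int:
    """Count how many leading bytes of s1 and s2 are equal, up to limit."""
    limit = min(limit, len(s1), len(s2))
    matched = 0
    # Compare 8 bytes at a time as little-endian words. In the XOR of two
    # words, the lowest set bit falls inside the first byte that differs.
    while matched + 8 <= limit:
        a = int.from_bytes(s1[matched:matched + 8], "little")
        b = int.from_bytes(s2[matched:matched + 8], "little")
        diff = a ^ b
        if diff:
            return matched + (count_trailing_zeros(diff) >> 3)
        matched += 8
    # Compare the remaining tail one byte at a time.
    while matched < limit and s1[matched] == s2[matched]:
        matched += 1
    return matched
```

In C the same two operations map onto single instructions through compiler intrinsics, which is why having both GCC and MSVC variants matters for speed.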

  • Intel Compiler 18 Regression

    Hi! I downloaded the latest 1.0.5 release, built it like so:

    cd brotli
    mkdir 32
    cd 32
    "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\bin\ipsxe-comp-vars.bat" ia32 vs2017
    cmake -G"Visual Studio 15 2017" -T"Intel C++ Compiler 18.0" ..
    msbuild /m /p:Configuration=Release ALL_BUILD.vcxproj
    

    In the latest version now BROTLI_INLINE is undefined, causing the compile to fail:

    screenshot 2018-06-29 19 16 30

    With the same commands above, 1.0.4 builds fine:

    screenshot 2018-06-29 19 21 13

  • Java implementation of Brotli

    Right now the only way to use Brotli in the backend is to use the Java binding for Brotli (JNI). It is not always possible to run native code in the backend. Could Google release a pure Java implementation of Brotli to facilitate its adoption? I am mostly interested in the decompression part.

  • Add support for -r (recursive) flag in the Linux CLI

    I'm pre-zipping static content before bundling it in a Docker container. This allows me to serve static content gzipped if there's an accept-encoding header that contains gzip without putting extra load on the CPU when serving gzipped content.

    gzip -rk9 ./build/processedResources/jvm/main/static/* || exit

    I wanted to add support for Brotli if the accept-encoding header contains br and then serve whichever one is smaller, file.ext.br or file.ext.gz.

    brotli -rk9 ./build/processedResources/jvm/main/static/* || exit
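
The "serve whichever one is smaller" step could look roughly like the Python sketch below. The function names and the sizes mapping are hypothetical, and the header parsing is deliberately simplified: q-values are ignored except q=0, which is treated as "not acceptable".

```python
from typing import Dict, Optional, Set


def accepted_encodings(accept_encoding: str) -> Set[str]:
    """Parse an Accept-Encoding header into a set of acceptable tokens."""
    accepted = set()
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        params = params.replace(" ", "").lower()
        if params.startswith("q="):
            try:
                if float(params[2:]) == 0:
                    continue  # q=0 means the client refuses this encoding
            except ValueError:
                continue  # malformed q-value: skip the token
        accepted.add(token)
    return accepted


def pick_variant(accept_encoding: str, sizes: Dict[str, int]) -> Optional[str]:
    """Pick the accepted encoding whose pre-compressed file is smallest.

    sizes maps an encoding token ("br", "gzip") to the size in bytes of the
    matching file on disk. Returns None when the original should be served.
    """
    accepted = accepted_encodings(accept_encoding)
    candidates = [(size, enc) for enc, size in sizes.items() if enc in accepted]
    if not candidates:
        return None
    return min(candidates)[1]
```

For example, pick_variant("gzip, deflate, br", {"br": 900, "gzip": 1200}) selects "br", while a client that only sends "gzip" gets the .gz file.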

    Turns out -k and -9 do exactly the same as in gzip, but the -r flag is not supported, which caught me by surprise.

    The short-term workaround is to use find ... | xargs ... | brotli -k9, which I arrived at after extensively studying the find and xargs man pages. Long-term, it would be nice if I could just pass -r and save the hassle of using 3 commands to do one thing.
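
Until the CLI has such a flag, the recursion can also be done in a small script. Below is a minimal Python sketch; compress_tree and its parameters are made up for this example, and the compressor is passed in as a function so the helper itself does not depend on any particular library.

```python
import os
from typing import Callable, List, Tuple


def compress_tree(root: str,
                  compress: Callable[[bytes], bytes],
                  suffix: str = ".br",
                  skip: Tuple[str, ...] = (".br", ".gz")) -> List[str]:
    """Write a compressed copy next to every regular file under root.

    Originals are kept (like -k). Files whose names end in one of the
    skip suffixes are ignored so earlier output is not compressed again.
    Returns the sorted list of files written.
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(skip):
                continue
            source = os.path.join(dirpath, name)
            target = source + suffix
            with open(source, "rb") as src:
                data = src.read()
            with open(target, "wb") as dst:
                dst.write(compress(data))
            written.append(target)
    return sorted(written)
```

With the Python brotli module installed, a call such as compress_tree("./build/processedResources/jvm/main/static", lambda data: brotli.compress(data, quality=9)) would play the role of the -rk9 invocation above.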

  • Can't compile Wget2 with static libraries only

    When I try to compile wget with static libraries only on macOS, I get this error:

    /Library/Developer/CommandLineTools/usr/bin/ranlib: file: .libs/libwget_decompress.a(xsize.o) has no symbols
    CCLD test_linking_encoding
    clang: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
    Undefined symbols for architecture x86_64:
      "_BrotliDefaultAllocFunc", referenced from:
          _BrotliDecoderStateInit in libbrotlidec.a(state.c.o)
      "_BrotliDefaultFreeFunc", referenced from:
          _BrotliDecoderStateInit in libbrotlidec.a(state.c.o)
      "_BrotliGetDictionary", referenced from:
          _BrotliDecoderStateInit in libbrotlidec.a(state.c.o)
      "_BrotliGetTransforms", referenced from:
          _BrotliDecoderStateInit in libbrotlidec.a(state.c.o)
      "_BrotliTransformDictionaryWord", referenced from:
          _ProcessCommands in libbrotlidec.a(decode.c.o)
          _SafeProcessCommands in libbrotlidec.a(decode.c.o)
      "__kBrotliContextLookupTable", referenced from:
          _BrotliDecoderDecompressStream in libbrotlidec.a(decode.c.o)
          _SafeDecodeLiteralBlockSwitch in libbrotlidec.a(decode.c.o)
          _DecodeLiteralBlockSwitch in libbrotlidec.a(decode.c.o)
      "__kBrotliPrefixCodeRanges", referenced from:
          _BrotliDecoderDecompressStream in libbrotlidec.a(decode.c.o)
          _SafeDecodeCommandBlockSwitch in libbrotlidec.a(decode.c.o)
          _DecodeCommandBlockSwitch in libbrotlidec.a(decode.c.o)
          _SafeDecodeLiteralBlockSwitch in libbrotlidec.a(decode.c.o)
          _DecodeLiteralBlockSwitch in libbrotlidec.a(decode.c.o)
          _SafeDecodeDistanceBlockSwitch in libbrotlidec.a(decode.c.o)
          _DecodeDistanceBlockSwitch in libbrotlidec.a(decode.c.o)
          ...
    ld: symbol(s) not found for architecture x86_64

    It compiles fine if the lib folder contains both dynamic and static libraries, but then wget does not link the brotli libraries statically and relies on the dynamic libraries instead.

  • Running the test suite with the autotools build system

    Is it intended to be able to run the test suite when using the undocumented autotools build system?

    After running ./bootstrap and ./configure and make I ran make check but got:

    make: Nothing to be done for `check'.
    

    and trying make test I got:

    make: *** No rule to make target `test'.  Stop.
    

    I see that there is a tests directory with a Makefile in it, but I tried make -C tests test and got:

    make[1]: `brotli' is up to date.
    ./compatibility_test.sh
    Testing decompression of file tests/testdata/*.compressed*
    bin/tmp/*.uncompressed
    ./compatibility_test.sh: line 19: bin/brotli: No such file or directory
    make: *** [test] Error 127
    

    If I use this patch then make -C tests test works:

    --- tests/compatibility_test.sh.orig	2020-08-27 09:12:55.000000000 -0500
    +++ tests/compatibility_test.sh	2022-05-07 08:01:14.000000000 -0500
    @@ -7,10 +7,13 @@
     
     set -o errexit
     
    +cd "$(dirname "${BASH_SOURCE[0]}")/.."
    +
     BROTLI_WRAPPER=$1
    -BROTLI="${BROTLI_WRAPPER} bin/brotli"
    -TMP_DIR=bin/tmp
    +BROTLI="${BROTLI_WRAPPER} ./brotli"
    +TMP_DIR=tmp
     
    +mkdir -p "$TMP_DIR"
     for file in tests/testdata/*.compressed*; do
       echo "Testing decompression of file $file"
       expected=${file%.compressed*}
    --- tests/roundtrip_test.sh.orig	2020-08-27 09:12:55.000000000 -0500
    +++ tests/roundtrip_test.sh	2022-05-07 08:01:14.000000000 -0500
    @@ -6,9 +6,11 @@
     
     set -o errexit
     
    +cd "$(dirname "${BASH_SOURCE[0]}")/.."
    +
     BROTLI_WRAPPER=$1
    -BROTLI="${BROTLI_WRAPPER} bin/brotli"
    -TMP_DIR=bin/tmp
    +BROTLI="${BROTLI_WRAPPER} ./brotli"
    +TMP_DIR=tmp
     INPUTS="""
     tests/testdata/alice29.txt
     tests/testdata/asyoulik.txt
    @@ -19,6 +21,7 @@
     c/dec/decode.c
     """
     
    +mkdir -p "$TMP_DIR"
     for file in $INPUTS; do
       if [ -f $file ]; then
         for quality in 1 6 9 11; do
    
    make[1]: `brotli' is up to date.
    ./compatibility_test.sh
    Testing decompression of file tests/testdata/empty.compressed
    tmp/empty.uncompressed
    Testing decompression of file tests/testdata/ukkonooa.compressed
    tmp/ukkonooa.uncompressed
    ./roundtrip_test.sh
    Roundtrip testing c/enc/encode.c at quality 1
    Roundtrip testing c/enc/encode.c at quality 6
    Roundtrip testing c/enc/encode.c at quality 9
    Roundtrip testing c/enc/encode.c at quality 11
    Roundtrip testing c/common/dictionary.h at quality 1
    Roundtrip testing c/common/dictionary.h at quality 6
    Roundtrip testing c/common/dictionary.h at quality 9
    Roundtrip testing c/common/dictionary.h at quality 11
    Roundtrip testing c/dec/decode.c at quality 1
    Roundtrip testing c/dec/decode.c at quality 6
    Roundtrip testing c/dec/decode.c at quality 9
    Roundtrip testing c/dec/decode.c at quality 11
    

    however I don't know if that change should be necessary or if I'm missing something.

    Maybe what I'm missing is that tests/Makefile is intended to work with the non-autotools Makefile that brotli ships with? If so, maybe instead of modifying tests/Makefile the check target should get some new code so that make check does something equivalent?

  • Autotools build system does not set OS_ defines

    The cmake build system sets some custom defines for various OS types in CMakeLists.txt

    if(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
      add_definitions(-DOS_LINUX)
    elseif(${CMAKE_SYSTEM_NAME} MATCHES "FreeBSD")
      add_definitions(-DOS_FREEBSD)
    elseif(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
      add_definitions(-DOS_MACOSX)
    endif()
    

    The undocumented autotools build system does not. Perhaps it should?

  • gcc_libinit_windows.c: In function 'x_cgo_sys_thread_create':

    gcc_libinit_windows.c:58:19: error: implicit declaration of function '_beginthread'; did you mean 'OpenThread'? [-Werror=implicit-function-declaration]
       58 |         thandle = _beginthread(func, 0, arg);
          |                   ^~~~~~~~~~~~
          |                   OpenThread
    cc1: all warnings being treated as errors
