Brotli compression format

SECURITY NOTE

Please consider updating brotli to version 1.0.9 (latest).

Version 1.0.9 contains a fix for an "integer overflow" problem. It happens when the "one-shot" decoding API is used (or the input chunk for the streaming API is not limited), the input size (chunk size) is larger than 2GiB, and the input contains uncompressed blocks. After the overflow happens, memcpy is invoked with a gigantic num value, which will likely cause a crash.
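
The conditions above suggest one defensive habit for callers of a streaming API: never hand over more than a bounded chunk at a time. The following is a minimal sketch of that idea in plain C, not brotli code. The names feed_in_chunks, chunk_sink, record_chunk, chunk_stats and MAX_CHUNK are hypothetical, and the sink stands in for whatever streaming call the application actually makes. Updating to a fixed version remains the recommended remedy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-call limit: 1 MiB, far below the 2GiB mentioned above. */
#define MAX_CHUNK ((size_t)1 << 20)

/* Hypothetical callback type; returns nonzero to continue, 0 to stop. */
typedef int (*chunk_sink)(const uint8_t* data, size_t size, void* opaque);

/* Splits [data, data + size) into pieces of at most MAX_CHUNK bytes and
   passes them to sink in order. Returns the number of bytes accepted. */
static size_t feed_in_chunks(const uint8_t* data, size_t size,
                             chunk_sink sink, void* opaque) {
  size_t offset = 0;
  while (offset < size) {
    size_t remaining = size - offset;
    size_t n = remaining < MAX_CHUNK ? remaining : MAX_CHUNK;
    if (!sink(data + offset, n, opaque)) break;
    offset += n;
  }
  return offset;
}

/* Demonstration sink: records how many chunks arrived and the largest one. */
typedef struct {
  size_t chunks;
  size_t largest;
  size_t total;
} chunk_stats;

static int record_chunk(const uint8_t* data, size_t size, void* opaque) {
  chunk_stats* stats = (chunk_stats*)opaque;
  (void)data;  /* the demonstration only looks at sizes */
  stats->chunks++;
  stats->total += size;
  if (size > stats->largest) stats->largest = size;
  return 1;
}
```

In a real program the sink would forward each piece to the streaming decoder and handle its output; record_chunk only shows that no piece exceeds the limit.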

Introduction

Brotli is a general-purpose lossless compression algorithm that compresses data using a combination of a modern variant of the LZ77 algorithm, Huffman coding and 2nd order context modeling, with a compression ratio comparable to the best currently available general-purpose compression methods. It is similar in speed to deflate but offers denser compression.

The specification of the Brotli Compressed Data Format is defined in RFC 7932.

Brotli is open-sourced under the MIT License, see the LICENSE file.

Brotli mailing list: https://groups.google.com/forum/#!forum/brotli

Build instructions

Vcpkg

You can download and install brotli using the vcpkg dependency manager:

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install brotli

The brotli port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

Autotools-style CMake

configure-cmake is an autotools-style configure script for CMake-based projects (not supported on Windows).

The basic commands to build, test and install brotli are:

$ mkdir out && cd out
$ ../configure-cmake
$ make
$ make test
$ make install

By default, debug binaries are built. To generate a "release" Makefile, pass the --disable-debug option to configure-cmake.

Bazel

See Bazel

CMake

The basic commands to build and install brotli are:

$ mkdir out && cd out
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./installed ..
$ cmake --build . --config Release --target install

You can use other CMake configurations.

Premake5

See Premake5

Python

To install the latest release of the Python module, run the following:

$ pip install brotli

To install the tip-of-the-tree version, run:

$ pip install --upgrade git+https://github.com/google/brotli

See the Python readme for more details on installing from source, development, and testing.

Benchmarks

Related projects

Disclaimer: Brotli authors take no responsibility for the third party projects mentioned in this section.

Independent decoder implementation by Mark Adler, based entirely on format specification.

JavaScript port of brotli decoder. Could be used directly via npm install brotli

Hand ported decoder / encoder in haxe by Dominik Homberger. Output source code: JavaScript, PHP, Python, Java and C#

7Zip plugin

Dart native bindings

Dart compression framework with fast FFI-based Brotli implementation with ready-to-use prebuilt binaries for Win/Linux/Mac

Owner
Google
Comments
  • libbrotli

    Hey,

    I created the sub project libbrotli (https://github.com/bagder/libbrotli) a while ago to help build and install a library for brotli encoding/decoding, using only code from this repository. libbrotli is only a meta-project with mostly autotools to build, install and package a "library" for brotli since this original brotli home does not provide that. It only uses compression/decompression source code from the brotli tree.

    This concept seems to resonate with a decent amount of users who appreciate being able to get a library (or two actually) out of a build and install process for use in various projects.

    I would prefer if this functionality was provided by the brotli project itself and I'll offer to merge/translate it over to a pull-request or something if you'll agree this is interesting. I'd prefer to remove myself as a middle man here.

  • add brotli to PyPI repository

    It would be nice to add Brotli to the official Python Package Index, so that users can download it with a simple pip install brotli.

    We could add just the sdist tarball, or also some pre-compiled wheel packages for Windows and Mac platforms, maybe built automatically via Travis and/or AppVeyor -- like here

    /cc @khaledhosny

  • Ideas for API improvements

    I recently rewrote Squash's brotli extension for the new API. IMHO the API is pretty nice, and I love that it's C not C++. That said, I do have a few ideas for improvements:

    • [x] For functions which return 1 on success and 0 on failure, as well as other booleans (like the is_last argument to BrotliEncoderWriteMetaBlock), please use a bool (or _Bool if you prefer) instead of an int. Using bool helps make the code more readable and reduces documentation lookups.
    • [x] The decoder puts the state argument at the end, the encoder puts it first. Please make it consistent. FWIW, I prefer the instance to be first (and that's definitely more common).
    • [x] I'd really like to see conformant array parameters. This would require a macro like the one Squash has, but it has the potential to help prevent bugs in software using the API, so I think it would be a good addition. I see you already put the length first, so the change is pretty trivial.
    • [x] It's a bit odd that everything in the encoder is called BrotliEncoder*, but everything in the decoder is Brotli* (e.g., BrotliState not BrotliDecoderState). I think it would be better to move everything in the decoder to BrotliDecoder*.
    • [x] Technically, stuff like _BROTLI_COMMA isn't allowed in C99 (at least, but IIRC C89 too): anything which starts with an underscore followed by an uppercase letter or another underscore is reserved. In C99, it's in § 7.1.3. I've taken to using an underscore suffix in my code (e.g., BROTLI_COMMA_) to indicate something is really supposed to be internal.
    • [ ] CamelCase for function names is fairly unusual in C. Switching to lowercase_with_underscores would probably be better; CamelCase is usually used for type names (including callbacks like brotli_alloc_func and brotli_free_func). Obviously there is no real standard, but anecdotally it seems like lowercase_with_underscores is the most common…
    • [ ] Add "zeroed-memory alloc" to memory allocator interface; might improve performance.
    • [ ] It would be nice to have 'restrict' on the buffers. Obviously it would also have to be hidden behind a macro (feel free to steal the one from Hedley)
    • [ ] A lot of parameters can/should be annotated with GCC's nonnull attribute (again, Hedley has a macro you can take). This is great for static analyzers, and if you build with ubsan you can get runtime warnings, too. The macro for this has to be variadic, so you might want to hide this behind a check for C99; variadic macros aren't in C89, but all the common compilers have supported them for a while (even MSVC, since VC8 (2005)).
    • [ ] BrotliEncoderMaxCompressedSize could/should be annotated with the const attribute (GCC ≥ 2.5) or noalias declspec (MSVC, since VC8). Again, Hedley. This one is helpful for optimizing compilers, though honestly I doubt excessive calls to BrotliEncoderMaxCompressedSize are a performance bottleneck.

    These are definitely not major issues, I just wanted to bring them up while changing the API is still an option.
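
To make the "conformant array parameters" suggestion concrete, here is a minimal sketch of how such a macro could look. It is a hypothetical stand-in written for illustration, not the macro from Squash or Hedley and not part of the brotli API; EXAMPLE_ARRAY_PARAM and example_sum are made-up names. The sketch assumes a C99 compiler with variable length array support and falls back to a plain pointer elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: expands to a C99 conformant array declarator where the
   compiler supports it, and to a plain pointer declarator otherwise. */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) && \
    !defined(__STDC_NO_VLA__) && !defined(__cplusplus)
#define EXAMPLE_ARRAY_PARAM(name, length) name[(length)]
#else
#define EXAMPLE_ARRAY_PARAM(name, length) *name
#endif

/* Sums the first n bytes of input. The length parameter comes first so that
   it is already in scope when the array declarator refers to it. */
static uint32_t example_sum(size_t n,
                            const uint8_t EXAMPLE_ARRAY_PARAM(input, n)) {
  uint32_t sum = 0;
  size_t i;
  for (i = 0; i < n; ++i) sum += input[i];
  return sum;
}
```

Either expansion is adjusted to the same pointer type, so callers are unaffected; the array form only gives compilers and static analyzers a documented link between the buffer and its length.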

  • Very poor compression ratio on TriMesh binary streams compared to LZMA

    I've long been a proponent of integrating LZMA2/LZMA into the browsers because of its incredible effectiveness for compressing binary data streams. When I saw Brotli I thought that this was likely going to be just as good. It isn't actually great.

    I am a frequent contributor to both http://ThreeJS.org as well as the http://Clara.io online 3D editor. One of the biggest issues we run into is the size of mesh downloads. Right now we are using LZMA.js scripts to do the decompression in worker threads, but this isn't optimal, especially on mobile.

    For example, this real-world large-ish binary trimesh stream, very typical:

    https://d3ijcvgxwtkjmf.cloudfront.net/a4c3c7313b7bdeb68ad46a7e1b761f38z?filename=object-53-batman-tumbler-lw8-12.bingeom

    The original size once downloaded is 6,779,000 bytes (be careful, this stream may be delivered with "Content-Encoding: gzip".)

    Here are the compression results:

    • LZMA
      • Normal: 921,600 bytes
      • Ultra: 920,147 bytes
    • GZip
      • Normal: 2,296,362 bytes
      • Ultra: 2,258,967 bytes
    • Brotli
      • Normal and Ultra: 1,513,459 bytes. (source)

    Brotli is significantly less effective than LZMA in this case -- not just a little but by a huge margin.

    What this means is that we can not replace our LZMA.js scripts with Brotli support. This is pretty bad for us in the 3D community as we are still stuck with JavaScript-based decompression.
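
The size of that margin can be worked out from the numbers reported above. The sketch below only does the arithmetic on the "Normal" figures; the function and constant names are made up for illustration.

```c
#include <stdint.h>

/* Compressed size as a percentage of the original size. */
static double percent_of_original(uint64_t compressed, uint64_t original) {
  return 100.0 * (double)compressed / (double)original;
}

/* How many times larger result a is than result b. */
static double size_ratio(uint64_t a, uint64_t b) {
  return (double)a / (double)b;
}

/* Sizes in bytes, copied from the figures reported in the comment above. */
static const uint64_t ORIGINAL_SIZE = 6779000;
static const uint64_t LZMA_NORMAL = 921600;
static const uint64_t GZIP_NORMAL = 2296362;
static const uint64_t BROTLI_SIZE = 1513459;
```

With these inputs, LZMA comes out at roughly 13.6% of the original size, Brotli at roughly 22.3%, and GZip at roughly 33.9%, so the Brotli output is about 1.64 times the size of the LZMA output.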

  • BrotliDecoderDecompress() crashed if inbuf is part of mmaped file

    If data points into an mmapped file, the following call crashed:

        BrotliDecoderDecompress(len,data+offset,&decoded_size,inBuffer);

    Adding an extra copy fixed the issue:

        memcpy(testbuf,data+offset,len);
        BrotliDecoderDecompress(len,testbuf,&decoded_size,inBuffer);

    Note: in the first case, data+offset is not aligned. Is alignment needed for BrotliDecoderDecompress()?

  • Binary needed

    Hi, if there is not going to be a binary in the 'release' section, could you share here how to compile using ICL? https://twitter.com/Sanmayce/status/965935926735196160

    For various reasons I avoid using 'make' and such, a command line compilist (to differentiate from 'compiler', heh-heh) here.

    The idea is for us to have a command line tool like the legendary PKZIP/PKUNZIP. I would use it on a daily basis; for example, I am currently running several big textual benchmarks:

    • https://github.com/powturbo/TurboBench/issues/10#issuecomment-367445792 To test the parsing prowess of 1GB window, this 900MB DNA set is quite good.
    • https://www.reddit.com/r/datasets/comments/7cise3/reddit_october_comments_are_now_available_with/dqaq9pv/ It is an understatement to call reddit test - big, it is quite literally tera - a teracorpus indeed!

    My wish is to set/present a roofline :P (as opposed to baseline) by using the full power of Brotli with 1GB, currently my testmachine 'Compressionette' (i5-7200u, 8GB DDR4) halved the teratask: zstd-v1.3.3-win64.exe -T2 -12 ... Wanna use bsc with 1024MB block, 7zip with 30bit window, and Zstd with 30bit window as well.

  • decompress brotli in browser?

    Is there a JavaScript Implementation of the decompressor part for in browser usage?

    I found https://github.com/devongovett/brotli.js and created https://github.com/devongovett/brotli.js/issues/2 ... but maybe someone else knows a decompress implementation?

  • Fix for future versions of setuptools

    Starting with Python 3.10, the use of - instead of _ triggers a warning (see https://bugs.gentoo.org/796281 for reference).

    Signed-off-by: Marco Scardovi

  • Use MSVC intrinsics in Log2FloorNonZero and FindMatchLengthWithLimit

    This PR finishes off the work started in https://github.com/google/brotli/pull/618

    Switched compiler intrinsic code to use more generic switches and added MSVC-specific versions of the clz and ctz calls where the GCC __builtin functions are used.

    Preliminary testing shows an almost 10% speed improvement for encoding on Windows x64.
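
As a rough illustration of the approach described in this PR, the sketch below selects a count-leading-zeros intrinsic per compiler and falls back to a portable loop. It is not the code from the PR or from the brotli sources; example_log2_floor_nonzero is a made-up name, and the sketch assumes that unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Returns floor(log2(n)) for n != 0. Illustrative only. */
static uint32_t example_log2_floor_nonzero(uint32_t n) {
  assert(n != 0);
#if defined(__GNUC__) || defined(__clang__)
  /* __builtin_clz counts the leading zero bits of an unsigned int;
     for a 32-bit value, 31 XOR that count is the index of the top set bit. */
  return 31u ^ (uint32_t)__builtin_clz(n);
#elif defined(_MSC_VER)
  /* _BitScanReverse stores the index of the highest set bit. */
  unsigned long index;
  _BitScanReverse(&index, n);
  return (uint32_t)index;
#else
  /* Portable fallback: shift right until nothing is left. */
  uint32_t result = 0;
  while (n >>= 1) ++result;
  return result;
#endif
}
```

All three branches compute the same value, so the intrinsic paths can be checked against the fallback loop on any platform.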

  • Intel Compiler 18 Regression

    Hi! I downloaded the latest 1.0.5 release, built it like so:

    cd brotli
    mkdir 32
    cd 32
    "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\bin\ipsxe-comp-vars.bat" ia32 vs2017
    cmake -G"Visual Studio 15 2017" -T"Intel C++ Compiler 18.0" ..
    msbuild /m /p:Configuration=Release ALL_BUILD.vcxproj
    

    In the latest version BROTLI_INLINE is now undefined, causing the compile to fail:

    [screenshot 2018-06-29 19 16 30]

    With the same commands as above, 1.0.4 builds fine:

    [screenshot 2018-06-29 19 21 13]

  • Java implementation of Brotli

    Right now the only way to use Brotli in the backend is to use the Java binding for Brotli (JNI). It is not always possible to run native code in the backend. Could Google release a pure Java implementation of Brotli to facilitate its adoption? I am mostly interested in the decompression part.

  • Possible Integer overflow

    Hi,

    When I look at the source code of brotli/c/enc/compress_fragment_two_pass.c, line 463, it occurs to me that "insert" might be subject to integer overflow. The variable "insert" is equal to (uint32_t)(ip_end - next_emit), where ip_end and next_emit are pointers, which may be 64 bits wide on a 64-bit machine. In this case, ip_end - next_emit may be truncated to a small number when it is larger than the 32-bit maximum value. This might cause the memcpy to copy only part of the source "next_emit" to the destination "*literals".
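
The narrowing described in this report can be reproduced in isolation. The sketch below is generic C rather than the brotli code, and whether distances above 4GiB can actually occur at that call site is not established here. narrowed_length and checked_length are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Mimics the cast pattern discussed above: a 64-bit distance narrowed to
   32 bits. Only the low 32 bits survive the conversion. */
static uint32_t narrowed_length(uint64_t distance) {
  return (uint32_t)distance;
}

/* A defensive variant: returns 1 and stores the value in *out if the
   distance fits in 32 bits, and returns 0 without writing otherwise. */
static int checked_length(uint64_t distance, uint32_t* out) {
  if (distance > UINT32_MAX) return 0;
  *out = (uint32_t)distance;
  return 1;
}
```

For example, a distance of 2^32 + 5 bytes narrows to 5, so a copy sized by the narrowed value would cover only the first 5 bytes, whereas checked_length rejects that distance.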

  • include Windows executable

    Please include a Windows executable in the distribution here on GitHub. It appears that the last version to include a Windows executable was v1.0.4, four years ago. It is my understanding from your readme that this version contained vulnerabilities.

  • BROTLI_OPERATION_FLUSH and BROTLI_OPERATION_PROCESS related

    I want to use brotli compression with Python, but I find that in BrotliCompress\brotli\python\_brotli.cc, when the BrotliEncoderOperation is set to BROTLI_OPERATION_FLUSH, there is no input string argument.

    BROTLI_OPERATION_PROCESS

    static PyObject* brotli_Compressor_process(brotli_Compressor *self, PyObject *args) {
      PyObject *ret = NULL;
      std::vector<uint8_t> output;
      Py_buffer input;
      BROTLI_BOOL ok = BROTLI_TRUE;

    BROTLI_OPERATION_FLUSH

    static PyObject* brotli_Compressor_flush(brotli_Compressor *self) {
      PyObject *ret = NULL;
      std::vector<uint8_t> output;
      BROTLI_BOOL ok = BROTLI_TRUE;

    I want to know what the difference is between setting BrotliEncoderOperation to BROTLI_OPERATION_FLUSH and setting it to BROTLI_OPERATION_PROCESS when compressing data.

  • `pip install brotli` returns a deprecation warning

    For an install (here on a raspberrypi) of brotli 1.0.9 I got a warning from pip which says:

    DEPRECATION: brotli is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559

    • raspberrypi3 with raspbian 10
    • python-3.10.8
    • pip 22.3
  • How to use shared dictionary?

    Hello everyone,

    I see the code for shared dictionaries has been merged, is there a way to use them via the command line? Is there a tutorial somewhere on how to use them from C? (or golang or rust)

    Related to #658
