An implementation of the MessagePack serialization format in C / msgpack.org[C]

CMP

CMP is a C implementation of the MessagePack serialization format. It currently implements version 5 of the MessagePack Spec.

CMP's goal is to be lightweight and straightforward, forcing nothing on the programmer.

License

While I'm a big believer in the GPL, I license CMP under the MIT license.

Example Usage

The following examples use a file as the backend, and are modeled after the examples included with the msgpack-c project.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "cmp.h"

static bool read_bytes(void *data, size_t sz, FILE *fh) {
    return fread(data, sizeof(uint8_t), sz, fh) == (sz * sizeof(uint8_t));
}

static bool file_reader(cmp_ctx_t *ctx, void *data, size_t limit) {
    return read_bytes(data, limit, (FILE *)ctx->buf);
}

static bool file_skipper(cmp_ctx_t *ctx, size_t count) {
    /* fseek returns 0 on success, so compare explicitly to get a bool */
    return fseek((FILE *)ctx->buf, (long)count, SEEK_CUR) == 0;
}

static size_t file_writer(cmp_ctx_t *ctx, const void *data, size_t count) {
    return fwrite(data, sizeof(uint8_t), count, (FILE *)ctx->buf);
}

static void error_and_exit(const char *msg) {
    fprintf(stderr, "%s\n\n", msg);
    exit(EXIT_FAILURE);
}

int main(void) {
    FILE *fh = NULL;
    cmp_ctx_t cmp = {0};
    uint32_t array_size = 0;
    uint32_t str_size = 0;
    char hello[6] = {0};
    char message_pack[12] = {0};

    fh = fopen("cmp_data.dat", "w+b");

    if (fh == NULL) {
        error_and_exit("Error opening cmp_data.dat");
    }

    cmp_init(&cmp, fh, file_reader, file_skipper, file_writer);

    if (!cmp_write_array(&cmp, 2)) {
        error_and_exit(cmp_strerror(&cmp));
    }

    if (!cmp_write_str(&cmp, "Hello", 5)) {
        error_and_exit(cmp_strerror(&cmp));
    }

    if (!cmp_write_str(&cmp, "MessagePack", 11)) {
        error_and_exit(cmp_strerror(&cmp));
    }

    rewind(fh);

    if (!cmp_read_array(&cmp, &array_size)) {
        error_and_exit(cmp_strerror(&cmp));
    }

    /* You can read the str byte size and then read str bytes... */

    if (!cmp_read_str_size(&cmp, &str_size)) {
        error_and_exit(cmp_strerror(&cmp));
    }

    if (str_size > (sizeof(hello) - 1)) {
        error_and_exit("Packed 'hello' length too long\n");
    }

    if (!read_bytes(hello, str_size, fh)) {
        /* read_bytes bypasses CMP, so cmp_strerror has nothing to report */
        error_and_exit("Error reading packed 'hello' bytes");
    }

    /*
     * ...or you can set the maximum number of bytes to read and do it all in
     * one call
     */

    str_size = sizeof(message_pack);
    if (!cmp_read_str(&cmp, message_pack, &str_size)) {
        error_and_exit(cmp_strerror(&cmp));
    }

    printf("Array Length: %u.\n", array_size);
    printf("[\"%s\", \"%s\"]\n", hello, message_pack);

    fclose(fh);

    return EXIT_SUCCESS;
}

Advanced Usage

See the examples folder.

Fast, Lightweight, Flexible, and Robust

CMP uses no internal buffers; conversions, encoding and decoding are done on the fly.

CMP's source and header file together are ~4k LOC.

CMP makes no heap allocations.

CMP uses standardized types rather than declaring its own, and it depends only on stdbool.h, stdint.h and string.h.

CMP is written using C89 (ANSI C), aside, of course, from its use of fixed-width integer types and bool.

On the other hand, CMP's test suite requires C99.

CMP only requires the programmer supply a read function, a write function, and an optional skip function. In this way, the programmer can use CMP on memory, files, sockets, etc.

CMP is portable. It uses fixed-width integer types, and checks the endianness of the machine at runtime before swapping bytes (MessagePack is big-endian).

CMP provides a fairly comprehensive error reporting mechanism modeled after errno and strerror.

CMP is thread aware; while contexts cannot be shared between threads, each thread may use its own context freely.

CMP is tested using the MessagePack test suite as well as a large set of custom test cases. Its small test program is compiled with clang using -Wall -Werror -Wextra ... along with several other flags, and generates no compilation errors in either clang or GCC.

CMP's source is written as readably as possible, using explicit, descriptive variable names and a consistent, clear style.

CMP's source is written to be as secure as possible. Its testing suite checks for invalid values, and data is always treated as suspect before it passes validation.

CMP's API is designed to be clear, convenient and unsurprising. Strings are null-terminated, binary data is not, error codes are clear, and so on.

CMP provides optional backwards compatibility for use with other MessagePack implementations that only implement version 4 of the spec.
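The feature list above notes that CMP only needs a read function, a write function, and an optional skip function, so it can run on memory as well as files. The following is a minimal sketch of what an in-memory backend could look like. To stay compilable on its own it uses a stand-in type, `demo_ctx_t`, that models only the `buf` field used in the file example; `mem_buf_t`, `mem_reader`, `mem_skipper` and `mem_writer` are hypothetical names, not part of CMP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for cmp_ctx_t (assumption): only the `buf` field is modeled. */
typedef struct demo_ctx {
    void *buf;
} demo_ctx_t;

/* Hypothetical backend state: a caller-owned array plus a cursor. */
typedef struct mem_buf {
    uint8_t *data;
    size_t capacity;
    size_t pos;
} mem_buf_t;

/* Copies `limit` bytes out of the buffer; fails if fewer remain. */
static bool mem_reader(demo_ctx_t *ctx, void *data, size_t limit) {
    mem_buf_t *mb = (mem_buf_t *)ctx->buf;
    if (limit > mb->capacity - mb->pos) {
        return false;
    }
    memcpy(data, mb->data + mb->pos, limit);
    mb->pos += limit;
    return true;
}

/* Advances the cursor without copying; fails if fewer bytes remain. */
static bool mem_skipper(demo_ctx_t *ctx, size_t count) {
    mem_buf_t *mb = (mem_buf_t *)ctx->buf;
    if (count > mb->capacity - mb->pos) {
        return false;
    }
    mb->pos += count;
    return true;
}

/* Copies `count` bytes into the buffer; returns 0 if they do not fit. */
static size_t mem_writer(demo_ctx_t *ctx, const void *data, size_t count) {
    mem_buf_t *mb = (mem_buf_t *)ctx->buf;
    if (count > mb->capacity - mb->pos) {
        return 0;
    }
    memcpy(mb->data + mb->pos, data, count);
    mb->pos += count;
    return count;
}
```

With CMP itself, the same three functions would take `cmp_ctx_t *` instead of the stand-in and be passed to `cmp_init` in the same order as in the file example (reader, skipper, writer), with a pointer to the `mem_buf_t` as the buffer argument. Resetting `pos` to 0 plays the role of `rewind`.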

Building

There is no build system for CMP. The programmer can drop cmp.c and cmp.h in their source tree and modify as necessary. No special compiler settings are required to build it, and it generates no compilation errors in either clang or gcc.

Versioning

CMP's versions are single integers. I don't use semantic versioning because I don't guarantee that any version is completely compatible with any other. In general, semantic versioning provides a false sense of security. You should be evaluating compatibility yourself, not relying on some stranger's versioning convention.

Stability

I only guarantee stability for versions released on the releases page. While rare, both master and develop branches may have errors or mismatched versions.

Backwards Compatibility

Version 4 of the MessagePack spec has no BIN type, and provides no STR8 marker. In order to remain backwards compatible with version 4 of MessagePack, do the following:

Avoid these functions:

  • cmp_write_bin
  • cmp_write_bin_marker
  • cmp_write_str8_marker
  • cmp_write_str8
  • cmp_write_bin8_marker
  • cmp_write_bin8
  • cmp_write_bin16_marker
  • cmp_write_bin16
  • cmp_write_bin32_marker
  • cmp_write_bin32

Use these functions in lieu of their v5 counterparts:

  • cmp_write_str_marker_v4 instead of cmp_write_str_marker
  • cmp_write_str_v4 instead of cmp_write_str
  • cmp_write_object_v4 instead of cmp_write_object
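To illustrate why the v4 variants exist, here is a rough sketch of how a string header could be chosen with and without the STR8 marker. The marker bytes (fixstr `0xa0`-`0xbf`, STR8 `0xd9`, STR16 `0xda`, STR32 `0xdb`) come from the MessagePack spec; the function name `encode_str_header` and its `v4_compat` flag are hypothetical and do not describe CMP's internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the header for a string of `len` bytes into `out` (room for 5
 * bytes required) and returns the header size. With `v4_compat` set, the
 * STR8 marker is never emitted and STR16 is used in its place. */
static size_t encode_str_header(uint8_t *out, uint32_t len, bool v4_compat) {
    if (len <= 31) {
        out[0] = (uint8_t)(0xa0 | len);       /* fixstr: length in marker */
        return 1;
    }
    if (len <= 0xff && !v4_compat) {
        out[0] = 0xd9;                        /* STR8: v5 only */
        out[1] = (uint8_t)len;
        return 2;
    }
    if (len <= 0xffff) {
        out[0] = 0xda;                        /* STR16, big-endian length */
        out[1] = (uint8_t)(len >> 8);
        out[2] = (uint8_t)(len & 0xff);
        return 3;
    }
    out[0] = 0xdb;                            /* STR32, big-endian length */
    out[1] = (uint8_t)(len >> 24);
    out[2] = (uint8_t)((len >> 16) & 0xff);
    out[3] = (uint8_t)((len >> 8) & 0xff);
    out[4] = (uint8_t)(len & 0xff);
    return 5;
}
```

A 100-byte string therefore costs one extra header byte in v4-compatible mode, which is the price of staying readable by decoders that do not know STR8.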

Disabling Floating Point Operations

Thanks to tdragon it's possible to disable floating point operations in CMP by defining CMP_NO_FLOAT. No floating point functionality will be included. Fair warning: this changes the ABI.

Setting Endianness at Compile Time

CMP will honor WORDS_BIGENDIAN. If defined to 0 it will convert data to/from little-endian format when writing/reading. If defined to 1 it won't. If not defined, CMP will check at runtime.
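The three cases can be sketched as follows. This is an illustration of the general technique, assuming `WORDS_BIGENDIAN` is either undefined or defined to 0 or 1; the helper names `host_is_big_endian`, `swap32` and `store_be32` are hypothetical and are not CMP functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Uses the compile-time answer when WORDS_BIGENDIAN is defined, and falls
 * back to probing the byte layout of a known value at runtime otherwise. */
static bool host_is_big_endian(void) {
#ifdef WORDS_BIGENDIAN
    return WORDS_BIGENDIAN != 0;
#else
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;   /* low byte stored last means big-endian */
#endif
}

/* Returns v with its four bytes in reverse order. */
static uint32_t swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

/* Stores v into out[0..3] in big-endian (MessagePack wire) order. */
static void store_be32(uint8_t out[4], uint32_t v) {
    if (!host_is_big_endian()) {
        v = swap32(v);
    }
    memcpy(out, &v, 4);
}
```

Defining the macro lets the compiler drop the unused branch entirely, while leaving it undefined keeps a single binary correct on either kind of host.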

Comments
  • Enable Travis-CI.org to catch compiler issues (C89, future gcc warnings etc)

    Hello again,

    thanks for all the recent patches and improvements.

    RE: c89 support https://github.com/camgunz/cmp/issues/36

    How can we prevent this (and other compiler gotchas) in the future? I don't know your workflow but I can guess you might be a busy guy.

    I propose setting up https://travis-ci.org for this project so that it will compile camgunz/cmp in a few ways (-std=c89 -Wall -Wextra -Werror), maybe also with Clang and its static analyzer, and beep if warnings occur.

    I can set this up as a fork+PR but in the end you'll have to be comfortable with travis-ci.org having access to your repo.

    Regards,

    n

  • Add Skipping

    So #3 proposed adding skipping to CMP, and there's some discussion there.

    After trying to implement a version, it looks like the only way to fully implement this is by creating a SAX-style state machine.

    "WHY!?", you might ask. Well I'll tell you, hopefully I'm wrong.

    It's not a backend-support problem. We can add an optional skip callback, and CMP can just set an error whenever cmp_skip_object is called on a context where that callback is NULL.

    The problem is nested arrays and maps. The naive approach is to just have cmp_skip_object recursively call itself, but that leaves CMP open to stack overflow attacks via specifically-crafted data. I absolutely will not do that.

    The alternative is to have a bunch of state in cmp_ctx_t itself and use the heap. There are a few downsides to this:

    • It makes heap allocations, and it's doing this for every nested array/map. This is, therefore, a vector for memory exhaustion given specifically-crafted data.
    • It's a complicated little state machine, in contrast with the rest of CMP, which is extremely clear.
    • It would cause CMP to depend on the C Standard Library, and thus would require a build system. Not that that's a big deal, but it's still a definite paradigm shift.

    I vote against adding skipping to CMP. To use the example in #3 of an RPC server, let's say you're getting MessagePack data as a stream and that's your CMP backend (error handling omitted):

    uint32_t map_size;
    
    cmp_read_map(&cmp, &map_size);
    
    for (uint32_t i = 0; i < map_size; i++) {
        char *arg_name;
        uint32_t str_size;
    
        cmp_read_str_size(&cmp, &str_size);
        read_netstream_string(&arg_name, str_size);
    
        if (!rpc_arg_is_valid(arg_name)) {
            cmp_read_str_size(&cmp, &str_size);
            skip_netstream_bytes(str_size);
        }
    }
    

    This is pretty simple, and the only thing that would be different if CMP added cmp_skip_next_object is that skip_netstream_bytes(str_size) would be replaced with cmp_skip_next_object(&cmp).

    The problem is that cmp_skip_next_object might have to skip a map containing 5000 other arrays, each containing 5000 arrays, each containing 5000 arrays that each contain 5000 entries of the Gettysburg Address. Skipping an unwanted string is much simpler than skipping the next object, whatever it might be. Furthermore, skipping can most easily be handled using backend APIs designed specifically for that; CMP can add no value there. Therefore, I think adding skipping to CMP is out of scope.


    That said, I'm always open to arguments! :) If this is a feature you're really needing and you've got a cool idea on how to do it, I'm absolutely happy to work on it (or, even better, merge a PR ;) ). I just think it's not feasible.

  • The library is too perfect ;)

    Yes... that's the issue. OK, let me explain: the library fully implements the new spec, while most other implementations (Perl and JavaScript, to be specific) only support the old types. You don't notice this AS LONG AS you never have a STR8, because that's literally the only case where things crash. (There is also no BIN support in the old spec, but those implementations don't use it at all.)

    There is actually a very simple trick to fix that: the library just needs to skip STR8 when packing and emit STR16 directly instead. (2 comments, a define?) I think many people will run into this problem if they mix implementations (it's REALLY not an obvious problem; it took me 1-2 days to find out), so I think it would be awesome if that were part of the library by default. I can make the pull request; it's trivial. The only question in my eyes is how (with a define or something different), but I think it's crucial that the default behaviour fits "both ways". People can still enforce STR8 and they can still unpack STR8; there is no damage, it just kills the problems caused by the bad state of the msgpack implementations.

    What do you think?

    (P.S.: I am working on fixing the situation more generally, but it's hard to find responsible people ;). This is a first approach that would at least "fix" the situation.)

  • Static Analysis warnings

    First thank you for the excellent library :)

    In cmp.c there are a lot of warnings reported by our analysis tool. See screenshot. The line numbers may not match, but the function names are mentioned.

    Can you please evaluate if there is any problem with these implicit conversions? It would also be nice if you can cast accordingly to show your intention.

  • Use cmp with char buffer instead of file

    Hi,

    I'm new to MsgPack and C, but I'm loving your implementation so far. I already have some great results. I'm using it to unpack / pack data from a network socket, which stores it into an unsigned char buffer. I couldn't get cmp to read / write directly on the char buffer instead of a file (your example). Is there a simple way to do that and if yes, could you provide me with a simple example how to do it? Thanks a lot in advance!

  • Add peeking

    In my own code it would be very convenient if I could "peek" at an object before committing to reading it. What I am imagining is that each cmp_read__() function from the main api would have a corresponding peek method that returns true/false if the corresponding cmp_read__() function would, in principle, be successful.

    if (cmp_peek_str(cmp)) {
        // It's a string, read it with cmp_read_str
    } else if (cmp_peek_int(cmp)) {
        // It's an int, read it with cmp_read_int
    } else if (cmp_peek_double(cmp)) {
        // etc..
    } else {
        // error - not supported
    }

    While you can accomplish something similar with cmp_read_object() by looking at the type of the cmp_object, you can't then take advantage of the logic already embedded in the cmp_read__() functions to know how the CMP_TYPE__ enumerations relate to each other and should be handled.

    The duty of error checking would still fall mostly on the read functions but you could also add a cmp_peek_error() method that returns true/false if nothing can be peeked because of an error.
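As a rough illustration of the idea, peeking on a memory buffer amounts to classifying the next marker byte without advancing the cursor. The marker ranges below are taken from the MessagePack spec; the function names are hypothetical and are not part of CMP's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copies the marker byte at `pos` into *marker without consuming it. */
static bool peek_marker(const uint8_t *buf, size_t len, size_t pos,
                        uint8_t *marker) {
    if (pos >= len) {
        return false;
    }
    *marker = buf[pos];
    return true;
}

/* fixstr (0xa0-0xbf), str8, str16, str32 */
static bool marker_is_str(uint8_t m) {
    return (m >= 0xa0 && m <= 0xbf) || m == 0xd9 || m == 0xda || m == 0xdb;
}

/* positive fixint, negative fixint, uint8-uint64, int8-int64 */
static bool marker_is_int(uint8_t m) {
    return m <= 0x7f || m >= 0xe0 || (m >= 0xcc && m <= 0xd3);
}

/* float32, float64 */
static bool marker_is_float(uint8_t m) {
    return m == 0xca || m == 0xcb;
}
```

This only works when the backend can look ahead without consuming data. A socket or pipe backend would need its own one-byte lookahead buffer, which is one reason peeking is harder to offer generically than it first appears.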

  • Support compiling without floating point operations

    We would like to use the CMP library for kernel->user mode communication. But it is not allowed to use floating point instructions in the kernel. Therefore we have to exclude everything FPU related from code during compilation.

  • Added skipping support

    This is another attempt to add skipping support (see also #5). This implementation is based on code from CWPack and does not use recursion.

    Before this is merged, two things should be considered:

    1. obj_skip_count is currently a uint32_t and could overflow (e.g. while reading malicious data). Extending the type to uint64_t would help with the problem (at least a bit), but I didn't really want to change it, since I use cmp on an embedded device. Maybe instead, we could check for overflow and exit with an error.
    2. Skipping of strings/bin/ext currently reads one byte at a time, since there is no buffer available. This could be replaced with a skip/seek function pointer. Alternatively, a buffer of configurable size could be added (see #3).
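The counter-based approach described above can be sketched as follows: instead of recursing into containers, keep a count of objects still to be skipped and add each container's children to it, failing if the count would overflow. This is a simplified illustration over a memory buffer that handles only a subset of markers; it is not the code from the pull request, and `skip_object` and `pending` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Skips one complete object starting at *pos. Returns false on truncated
 * input, an unsupported marker, or counter overflow. */
static bool skip_object(const uint8_t *buf, size_t len, size_t *pos) {
    uint32_t pending = 1;  /* objects still to be skipped */
    size_t p = *pos;

    while (pending > 0) {
        uint32_t children = 0;  /* objects nested directly in this one */
        size_t payload = 0;     /* raw bytes following the header */
        uint8_t m;

        if (p >= len) return false;
        m = buf[p++];
        pending--;

        if (m <= 0x7f || m >= 0xe0 || m == 0xc0 || m == 0xc2 || m == 0xc3) {
            /* fixint, nil, false, true: the marker is the whole object */
        } else if (m >= 0xa0 && m <= 0xbf) {
            payload = m & 0x1f;                    /* fixstr */
        } else if (m == 0xd9) {                    /* str8 */
            if (p >= len) return false;
            payload = buf[p++];
        } else if (m >= 0x90 && m <= 0x9f) {
            children = m & 0x0fu;                  /* fixarray */
        } else if (m >= 0x80 && m <= 0x8f) {
            children = 2u * (m & 0x0fu);           /* fixmap: key + value */
        } else if (m == 0xdc || m == 0xde) {       /* array16 / map16 */
            if (len - p < 2) return false;
            children = ((uint32_t)buf[p] << 8) | buf[p + 1];
            p += 2;
            if (m == 0xde) children *= 2;
        } else {
            return false;                          /* not handled here */
        }

        if (payload > len - p) return false;
        p += payload;
        if (children > UINT32_MAX - pending) return false;  /* overflow */
        pending += children;
    }

    *pos = p;
    return true;
}
```

Because the state is a single integer, stack use is constant regardless of how deeply the input nests. The cost is that the counter alone cannot tell how deep the nesting is.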
  • Implicitly convert between float and doubles

    When developing an application communicating between Python and C, I ran into a problem with floating point numbers: Python doesn't distinguish between single and double precision. I figured it made sense to allow cmp to cast them transparently on read.
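A minimal sketch of such a transparent read, operating directly on a byte buffer: accept either the float32 marker (`0xca`) or the float64 marker (`0xcb`) and always hand back a `double`. The function name `decode_float_as_double` is hypothetical, and the code assumes IEEE 754 `float` and `double` whose byte order matches the host's integer byte order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decodes a float32 or float64 object starting at buf[0] into *out,
 * widening float32 to double. Returns false for any other marker or for
 * truncated input. */
static bool decode_float_as_double(const uint8_t *buf, size_t len,
                                   double *out) {
    size_t i;

    if (len >= 5 && buf[0] == 0xca) {
        uint32_t bits = 0;
        float f;
        for (i = 1; i <= 4; i++) {
            bits = (bits << 8) | buf[i];   /* big-endian to host integer */
        }
        memcpy(&f, &bits, sizeof(f));
        *out = (double)f;
        return true;
    }

    if (len >= 9 && buf[0] == 0xcb) {
        uint64_t bits = 0;
        double d;
        for (i = 1; i <= 8; i++) {
            bits = (bits << 8) | buf[i];
        }
        memcpy(&d, &bits, sizeof(d));
        *out = d;
        return true;
    }

    return false;
}
```

Widening float32 to double is lossless, so this direction is safe. The reverse (reading a float64 into a `float`) can lose precision and is better left as an explicit choice by the caller.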

  • Ext type and data size are inverted

    Hello, cmp_write_ext8_marker (and possibly others, as it is the only one I checked) has a slight error. It writes the extension type before the data size, whereas the MessagePack 5 specification says:

    ext 8 stores an integer and a byte array whose length is upto (2^8)-1 bytes:
    +--------+--------+--------+========+
    |  0xc7  |XXXXXXXX|  type  |  data  |
    +--------+--------+--------+========+
    
    where
    * XXXXXXXX is a 8-bit unsigned integer which represents N
    

    Can you fix it, or would you prefer that I do it myself? Also, I think using some kind of unit testing for CMP would be awesome.

    Thank you for this very useful library, by the way.
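For reference, the byte order in the quoted layout can be sketched like this: marker first, then the payload length, then the type. The helper names are hypothetical and are not CMP functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Writes an ext 8 header into out[0..2]: marker, payload length, type. */
static size_t encode_ext8_header(uint8_t out[3], int8_t type, uint8_t size) {
    out[0] = 0xc7;
    out[1] = size;            /* XXXXXXXX: length N comes first */
    out[2] = (uint8_t)type;   /* extension type comes second */
    return 3;
}

/* Reads an ext 8 header back; fails on short input or a wrong marker. */
static bool decode_ext8_header(const uint8_t *buf, size_t len, int8_t *type,
                               uint8_t *size) {
    if (len < 3 || buf[0] != 0xc7) {
        return false;
    }
    *size = buf[1];
    *type = (int8_t)buf[2];
    return true;
}
```

Swapping the two bytes after the marker still round-trips within one implementation, which is why this kind of bug tends to surface only when exchanging data with another MessagePack library.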

  • cmp_skip_object_limit() doesn't work as documented for nested arrays

    The function cmp_skip_object_limit() is documented to have its limit be a limit of depth, but it doesn't work properly when you have nested arrays or maps.

    For example, if one had an array of size 10 that contained 10 zero-length arrays within, that should be fine with a limit of depth 2 according to the description. However, because of the way the code is written, each time an array is encountered, the depth is increased, regardless of whether it was nested or not. As a concrete example, a diff for test/test.c is attached. If you apply that, build, and run ./cmptest, the second bit of tests added fails, displaying this behavior.

    Given that cmp.c is written without any heap allocations, I'm not sure what the solution is for tracking the depth here. One could add an arbitrarily-sized array to keep track of some data, but perhaps you can think of a more appropriate solution. Let me know if you need any more info on the specific issue!
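One way the fixed-size-array idea could look, as a sketch only: keep a small stack of "children remaining" counters, one per open container, so that depth goes back down when a container is finished. For brevity this handles only fixarrays and single-byte scalars on a memory buffer; `MAX_DEPTH` and `skip_with_depth_limit` are hypothetical and this is not how CMP implements `cmp_skip_object_limit()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 8  /* hypothetical fixed bound, so no heap is needed */

/* Skips one object at *pos, failing if nesting would exceed `limit`. */
static bool skip_with_depth_limit(const uint8_t *buf, size_t len,
                                  size_t *pos, uint32_t limit) {
    uint32_t remaining[MAX_DEPTH];  /* unread children per open array */
    uint32_t depth = 0;
    size_t p = *pos;

    if (limit > MAX_DEPTH) limit = MAX_DEPTH;

    do {
        uint8_t m;

        if (p >= len) return false;
        m = buf[p++];

        if (m >= 0x90 && m <= 0x9f) {
            /* fixarray: opens a new nesting level */
            if (depth >= limit) return false;
            remaining[depth++] = m & 0x0fu;
        } else if (m <= 0x7f || m >= 0xe0 || m == 0xc0 || m == 0xc2 ||
                   m == 0xc3) {
            /* scalar: uses up one slot in the enclosing array, if any */
            if (depth > 0) remaining[depth - 1]--;
        } else {
            return false;  /* not handled in this sketch */
        }

        /* Close finished arrays; each one counts as a child of its parent. */
        while (depth > 0 && remaining[depth - 1] == 0) {
            depth--;
            if (depth > 0) remaining[depth - 1]--;
        }
    } while (depth > 0);

    *pos = p;
    return true;
}
```

With this bookkeeping, an array of 10 empty arrays never goes deeper than 2, so it passes with a limit of 2 and fails with a limit of 1. The trade-off is a hard ceiling on depth set at compile time.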

  • integer API cleanup

    The read/write functions are asymmetric and confusing:

    bool cmp_write_integer(cmp_ctx_t *ctx, int64_t d);
    
    bool cmp_read_int(cmp_ctx_t *ctx, int32_t *i);
    bool cmp_read_long(cmp_ctx_t *ctx, int64_t *u);
    bool cmp_read_integer(cmp_ctx_t *ctx, int64_t *u);
    

    I would suggest the following:

    • remove int/integer duplicate functions
    • use distinct names for 32-bit and 64-bit types: int/long or maybe int32/int64
    • add dedicated 32-bit write function (read function is already there) -> this is especially advantageous for 32-bit systems, e.g. microcontrollers, where 64-bit ints are emulated in software
    bool cmp_write_int(cmp_ctx_t *ctx, int32_t d);
    bool cmp_write_long(cmp_ctx_t *ctx, int64_t d);
    
    bool cmp_read_int(cmp_ctx_t *ctx, int32_t *i);
    bool cmp_read_long(cmp_ctx_t *ctx, int64_t *d);
    
  • Add support for zero-copy reading

    As far as I can see, reading a large binary currently requires the reader function to copy it into a caller-supplied buffer. What I would like instead is to get a pointer into the memory-mapped binary data, rather than providing a buffer that the reader function then fills.
