CommonMark parsing and rendering library and program in C

cmark


cmark is the C reference implementation of CommonMark, a rationalized version of Markdown syntax with a spec. (For the JavaScript reference implementation, see commonmark.js.)

It provides a shared library (libcmark) with functions for parsing CommonMark documents to an abstract syntax tree (AST), manipulating the AST, and rendering the document to HTML, groff man, LaTeX, CommonMark, or an XML representation of the AST. It also provides a command-line program (cmark) for parsing and rendering CommonMark documents.

Advantages of this library:

  • Portable. The library and program are written in standard C99 and have no external dependencies. They have been tested with MSVC, gcc, tcc, and clang.

  • Fast. cmark can render a Markdown version of War and Peace in the blink of an eye (127 milliseconds on a ten-year-old laptop, vs. 100-400 milliseconds for an eye blink). In our benchmarks, cmark is 10,000 times faster than the original Markdown.pl, and on par with the very fastest available Markdown processors.

  • Accurate. The library passes all CommonMark conformance tests.

  • Standardized. The library can be expected to parse CommonMark the same way as any other conforming parser. So, for example, you can use commonmark.js on the client to preview content that will be rendered on the server using cmark.

  • Robust. The library has been extensively fuzz-tested using american fuzzy lop. The test suite includes pathological cases that bring many other Markdown parsers to a crawl (for example, thousands-deep nested bracketed text or block quotes).

  • Flexible. CommonMark input is parsed to an AST which can be manipulated programmatically prior to rendering.

  • Multiple renderers. Output in HTML, groff man, LaTeX, CommonMark, and a custom XML format is supported. And it is easy to write new renderers to support other formats.

  • Free. BSD2-licensed.

It is easy to use libcmark in Python, Lua, Ruby, and other dynamic languages: see the wrappers/ subdirectory for some simple examples.

There are also libraries that wrap libcmark for Go, Haskell, Ruby, Lua, Perl, Python, R and Scala.

Installing

Building the C program (cmark) and shared library (libcmark) requires cmake. If you modify scanners.re, then you will also need re2c (>= 0.14.2), which is used to generate scanners.c from scanners.re. We have included a pre-generated scanners.c in the repository to reduce build dependencies.

If you have GNU make, you can simply run make, make test, and make install. This calls cmake to create a Makefile in the build directory, then uses that Makefile to create the executable and library. The binaries can be found in build/src. The default installation prefix is /usr/local. To change the installation prefix, pass the INSTALL_PREFIX variable when you run make for the first time: make INSTALL_PREFIX=path.

For a more portable method, you can use cmake manually. cmake knows how to create build environments for many build systems. For example, on FreeBSD:

mkdir build
cd build
cmake ..  # optionally: -DCMAKE_INSTALL_PREFIX=path
make      # executable will be created as build/src/cmark
make test
make install

Or, to create Xcode project files on OSX:

mkdir build
cd build
cmake -G Xcode ..
open cmark.xcodeproj

The GNU Makefile also provides a few other targets for developers. To run a benchmark:

make bench

For more detailed benchmarks:

make newbench

To run a test for memory leaks using valgrind:

make leakcheck

To reformat source code using clang-format:

make format

To run a "fuzz test" against ten long randomly generated inputs:

make fuzztest

To do a more systematic fuzz test with american fuzzy lop:

AFL_PATH=/path/to/afl_directory make afl

Fuzzing with libFuzzer is also supported but, because libFuzzer is still under active development, may not work with your system-installed version of clang. Assuming LLVM has been built in $HOME/src/llvm/build, the fuzzer can be run with:

CC="$HOME/src/llvm/build/bin/clang" LIB_FUZZER_PATH="$HOME/src/llvm/lib/Fuzzer/libFuzzer.a" make libFuzzer

To make a release tarball and zip archive:

make archive

Installing (Windows)

To compile with MSVC and NMAKE:

nmake

You can cross-compile a Windows binary and DLL on Linux if you have the mingw32 compiler:

make mingw

The binaries will be in build-mingw/windows/bin.

Usage

Instructions for the use of the command line program and library can be found in the man pages in the man subdirectory.

Security

By default, the library will scrub raw HTML and potentially dangerous links (javascript:, vbscript:, data:, file:).

To allow these, use the option CMARK_OPT_UNSAFE (or --unsafe) with the command line program. If doing so, we recommend you use an HTML sanitizer specific to your needs to protect against XSS attacks.
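As an illustration of the kind of link check involved, here is a minimal sketch of a case-insensitive scheme filter for the four schemes listed above. The helper names are hypothetical, the sketch does not strip leading whitespace or decode entities, and libcmark's actual scanner may differ in details (for example, in how it treats some data: URLs).

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive check: does `url` start with the lowercase `scheme`? */
static int has_scheme_prefix(const char *url, const char *scheme) {
  size_t i;
  for (i = 0; scheme[i] != '\0'; i++) {
    if (url[i] == '\0' ||
        tolower((unsigned char)url[i]) != (unsigned char)scheme[i])
      return 0;
  }
  return 1;
}

/* Hypothetical, simplified filter for the schemes listed above;
   not libcmark's real implementation. */
static int is_dangerous_url(const char *url) {
  static const char *const schemes[] = {"javascript:", "vbscript:", "data:",
                                        "file:"};
  size_t i;
  for (i = 0; i < sizeof schemes / sizeof schemes[0]; i++) {
    if (has_scheme_prefix(url, schemes[i]))
      return 1;
  }
  return 0;
}
```

A renderer using such a check would emit an empty href (or drop the link) when is_dangerous_url returns 1 and the unsafe option is off.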

Contributing

There is a forum for discussing CommonMark; you should use it instead of GitHub issues for questions and possibly open-ended discussions. Use the GitHub issue tracker only for simple, clear, actionable issues.

Authors

John MacFarlane wrote the original library and program. The block parsing algorithm was worked out together with David Greenspan. Vicent Marti optimized the C implementation for performance, increasing its speed tenfold. Kārlis Gaņģis helped work out a better parsing algorithm for links and emphasis, eliminating several worst-case performance issues. Nick Wellnhofer contributed many improvements, including most of the C library's API and its test harness.

Owner
CommonMark
A strongly specified, highly compatible implementation of Markdown
Comments
  • Extension support in libcmark

    Hello, I always see "extensions" mentioned on discussions of features in CommonMark (for example the discussion about tables).

    Does libcmark itself support actual extensions, and if so, is there any guide on how to implement one and an index of common extensions? Or are extensions purely conceptual extensions to the specification, up to individual implementations to add?

    I'm pretty sure I could just read the code and find out, but

    • I'm lazy
    • This question may be useful for someone else wondering the same thing.

    Thanks for all your work on CommonMark / pandoc!

  • Extensions redux

    Hi there!

    I've taken the work in #123 and rejigged it a bit. At GitHub we're currently using a Sundown-based parser/renderer, but it's not super extensible. So, we've decided to roll out CommonMark to replace it.

    Here are some of the changes to #123 in this PR:

    • Took out the shared object searcher. I don't think library code should be searching for and loading objects dynamically at runtime. Instead, you as the user register whatever plugins you might have linked in yourself. (Maybe you loaded it dynamically; that's not something for a Markdown library to do, imo.)
    • Expanded the extension interface enough that the two existing plugins (table, strikethrough) can be implemented without any changes to the core code. #123 had table-specific code in the core (outside of the shared object), which limited its usefulness. With this branch, implementing them requires no changes to the core code. Extensions can register their own node types, their own renderer functions for said node types, etc.
    • Fixes the Windows build.
    • Adds tests.
    • Adds an autolink extension.
    • Adds a whitelist extension for the HTML renderer.

    This functionality has all been exposed as opt-in in the Ruby gem commonmarker, which is our primary interface.

  • More sourcepos!

    Found time to work a bit more on https://github.com/jgm/cmark/issues/26; these commits solely implement the definition of a gap-free source map with no empty extents, parsed in the same pass as the nodes.

    A few things to discuss:

    • Ownership of line-termination characters and trailing whitespace at the end of blocks: I make them belong to the root
    • Ownership of reference definitions, which I also attribute to the root, as there's no proper AST node for them
    • Performance: my initial implementation of this approach doubled make bench time (erg); I now observe a 25% difference, which seems reasonable to me. I'm not sure how much more can be ironed out, since the task is inherently a bit complex due to potential discontinuities between the n extents that make up a block. Either we consider this an acceptable performance hit and hope to find smarter ways to reduce it, or we make it conditional on the SOURCEPOS option, as it currently is.

    Things that are in my opinion out of scope and can be discussed later:

    • Actual API for this
    • "Reverse source map", i.e. node to extents

    To test this, just enable the call to "print_parser_source_map" in blocks.c:finalize_document. Here is example output for the case that made me revise my approach:

    >     code
    >     more code
    
    0:1 - block_quote (0x9485f0)
    1:2 - block_quote (0x9485f0)
    2:6 - code_block (0x948740)
    6:14 - code_block (0x948740)
    14:15 - block_quote (0x9485f0)
    15:16 - block_quote (0x9485f0)
    16:20 - code_block (0x948740)
    20:30 - code_block (0x948740)
    

    I'm sure there's more to say, but let's keep this short :)
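    The gap-free map printed above can be pictured as a sorted array of non-empty extents, each starting where the previous one stops, which makes "which node owns byte N?" a binary search. This is a minimal sketch under that assumption; the extent type and helper names are hypothetical (not the branch's actual representation), and node pointers are replaced by labels taken from the example output.

```c
#include <stddef.h>

/* One contiguous byte range of the input owned by a single node. */
typedef struct {
  size_t start;      /* inclusive byte offset */
  size_t stop;       /* exclusive byte offset */
  const char *owner; /* stand-in for a cmark node pointer */
} extent;

/* The extents from the example output above. */
static const extent example_map[] = {
    {0, 1, "block_quote"},  {1, 2, "block_quote"},  {2, 6, "code_block"},
    {6, 14, "code_block"},  {14, 15, "block_quote"}, {15, 16, "block_quote"},
    {16, 20, "code_block"}, {20, 30, "code_block"}};

/* Gap-free invariant: starts at 0, no empty extents, no holes. */
static int is_gap_free(const extent *map, size_t n) {
  size_t i;
  if (n > 0 && map[0].start != 0) return 0;
  for (i = 0; i < n; i++) {
    if (map[i].start >= map[i].stop) return 0;
    if (i > 0 && map[i].start != map[i - 1].stop) return 0;
  }
  return 1;
}

/* Binary search for the extent containing `offset`; NULL if out of range. */
static const char *owner_at(const extent *map, size_t n, size_t offset) {
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (offset < map[mid].start)
      hi = mid;
    else if (offset >= map[mid].stop)
      lo = mid + 1;
    else
      return map[mid].owner;
  }
  return NULL;
}
```

    For example, owner_at(example_map, 8, 3) lands in the 2:6 extent and returns the code_block label, while offset 30 is past the end of the input and returns NULL.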

  • Consider usage of the GLib in libcmark

    This subject was briefly discussed in #100, but I figured a separate issue summing up the arguments for and against it, and discussing whether this would be acceptable, would be useful.

    Argument(s) against using the glib

    The main (and only) argument raised against using the glib is that of portability. I would argue its usage would actually help with portability, with respect to things like loading of plugins, or threading.

    Note that I will open another issue at some point regarding multithreading of the inline parsing phase, as I think this phase is very amenable to parallelization, as long as the separation of inline and block parsing is consistently enforced.

    I work for an Open Source software consultancy, Collabora, where we routinely deploy glib-based solutions on a wide range of architectures and operating systems, including Windows, and I've never seen any issues with glib's portability.

    I'm writing this from my seldom-used Windows partition, where I've just successfully compiled a version of cmark built against the glib. Thanks to the MSYS2 project, this has been a completely painless experience. Note that it is also trivial to provide installers using that solution.

    Arguments for using the glib

    Features

    See https://developer.gnome.org/glib/2.48/ for the full list of features. Here are a few I think are relevant for cmark, in that they could make its codebase much leaner and help implement features in a portable manner.

    • Portable basic types: gboolean would, for example, help us get rid of that code. By the way, I'm not even sure how cmark could compile at all when the preprocessor enters the #elif !defined(__cplusplus) branch there, as true and false will not be defined.
    • Standard high-level data structures: Having to implement a poor man's linked list in my work on extensions isn't something I'm satisfied with; more generally, the more wheels one reinvents, the more surface for bugs one has to maintain. I think (not sure) that reference maps are implemented as a hashtable; one could use a GHashTable for this instead. The AST could be implemented as an N-ary tree, etc.
    • String-related utilities: https://developer.gnome.org/glib/2.48/glib-String-Chunks.html https://developer.gnome.org/glib/2.48/glib-Strings.html and https://developer.gnome.org/glib/2.48/glib-String-Utility-Functions.html would let us get rid of cmark_chunk and cmark_strbuf, which I had to expose in the API for extensions.
    • Unicode handling: I think the plethora of functions defined in there would make most if not all of the utf8 handling code in cmark irrelevant.
    • Error handling / Logging: https://developer.gnome.org/glib/2.48/glib-Warnings-and-Assertions.html, https://developer.gnome.org/glib/2.48/glib-Message-Logging.html and https://developer.gnome.org/glib/2.48/glib-Error-Reporting.html would be more than useful to improve ease of debugging of the library for us developers, as well as offer a more advanced error API.
    • Parsing and lexing utilities: https://developer.gnome.org/glib/2.48/glib-Lexical-Scanner.html, https://developer.gnome.org/glib/2.48/glib-regex-syntax.html: I haven't benchmarked these functions, and I'm pretty sure re2c-generated scanners will blow them out of the water performance-wise. However, their performance might be adequate for extension implementers who want a higher-level lexing / regex interface, and who would have one directly available when linking with libcmark.
    • Filesystem-related utilities: https://developer.gnome.org/glib/2.48/glib-File-Utilities.html and https://developer.gnome.org/glib/2.48/glib-URI-Functions.html contain functions that would help simplify some code paths as well.
    • Command-line parser: would let us replace, and improve on, the equivalent code in the cmark executable.
    • Testing: this wouldn't hurt either :)
    • GApplication: not really needed for now, but one could imagine this being useful if we ever wanted to have a "cmark server".
    • Threading: https://developer.gnome.org/glib/2.48/glib-Thread-Pools.html would help a lot in portably parallelizing inline parsing. I haven't benchmarked this, but I believe there's a significant performance reserve to tap into in that direction, which could be very valuable for one of cmark's announced use cases, which is to be run server-side.
    • Plugin loading: would be extremely useful for extensions obviously.

    Other arguments

    • cmark is already a moderately complex library, which implements internally things that have been standard in the glib for ages. My work on extensions only makes it more complex, with the addition of (Linux-only) plugin-loading code, a linked list, and API that has nothing to do with cmark's actual job... Using the glib would drastically reduce the scope of the library, and actually help improve its behaviour and portability. I think this far outweighs the distribution concerns, which simply need to be addressed once and for all by updating the installation documentation, and possibly providing helpers such as Visual Studio solutions for people who insist on using that piece of technology (I'm afraid I won't be able to help there).
    • The "glib port" is by no means something that needs to happen all at once, it can be done incrementally, and wouldn't interfere with daily development.
    • We could decide at some point to port some of the already somewhat object-oriented API to GObject, which would let us offer dynamic introspection capabilities for JavaScript and Python.
    • I really really think this is a good idea, if that's any help :)

    @nwellnhof, I know you're concerned about this, but please come to this with an open mind: consider all the things the glib would bring to the table, and evaluate whether this port would really prevent you from using cmark at all, or simply mean spending ten minutes figuring out how to update your bundling of cmark, which could benefit other people who use that practice.

  • Parsing ‘* * * * * * … a’ takes quadratic time

    $ python -c 'print("* "*10000 + "a")' | time cmark > /dev/null
    1.21user 0.00system 0:01.23elapsed 98%CPU (0avgtext+0avgdata 6048maxresident)k
    0inputs+0outputs (0major+1188minor)pagefaults 0swaps
    $ python -c 'print("* "*20000 + "a")' | time cmark > /dev/null
    7.55user 0.00system 0:07.59elapsed 99%CPU (0avgtext+0avgdata 9968maxresident)k
    0inputs+0outputs (0major+2245minor)pagefaults 0swaps
    $ python -c 'print("* "*40000 + "a")' | time cmark > /dev/null
    41.23user 0.01system 0:41.44elapsed 99%CPU (0avgtext+0avgdata 18848maxresident)k
    0inputs+0outputs (0major+4410minor)pagefaults 0swaps
    

    Related: jgm/commonmark-hs#2, mity/md4c#66.
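    Because each run doubles the input, the ratio between consecutive timings gives a quick read of the growth rate: strictly quadratic growth would multiply the time by about 2^2 = 4 per doubling. This small sketch, using only the user times reported above (the helper names are illustrative), computes those ratios.

```c
#include <stddef.h>

/* Input sizes ("* " repetitions) and user times in seconds from the report above. */
static const double input_stars[] = {10000.0, 20000.0, 40000.0};
static const double user_seconds[] = {1.21, 7.55, 41.23};

/* Observed time ratio between run i+1 and run i. */
static double observed_ratio(size_t i) {
  return user_seconds[i + 1] / user_seconds[i];
}

/* Time ratio a purely quadratic algorithm would predict: (n[i+1] / n[i])^2. */
static double quadratic_prediction(size_t i) {
  double r = input_stars[i + 1] / input_stars[i];
  return r * r;
}
```

    The observed ratios are about 6.2 and 5.5, both above the predicted 4, so on these three runs the slowdown is at least quadratic.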

  • Build fails on Debian Jessie

    It seems that cmark 0.30.0 can't be compiled on Debian Jessie (compilation of cmark 0.29.0 succeeds):

    From within a Docker image launched with docker run --rm -it debian:jessie bash:

    $ apt-get update && apt-get install -qy cmake curl g++ python3
    [...]
    $ cd "$(mktemp -d)"
    $ curl -sSL -o - https://github.com/commonmark/cmark/archive/0.30.0.tar.gz | tar xz
    $ cd cmark-*
    $ make -s -j$(nproc) cmake_build
    -- The C compiler identification is GNU 4.9.2
    -- The CXX compiler identification is GNU 4.9.2
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Performing Test HAVE_FLAG_ADDRESS_SANITIZER
    -- Performing Test HAVE_FLAG_ADDRESS_SANITIZER - Failed
    -- Performing Test HAVE_FLAG_SANITIZE_ADDRESS
    -- Performing Test HAVE_FLAG_SANITIZE_ADDRESS - Success
    -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
    -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
    -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
    -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
    -- Performing Test COMPILER_HAS_DEPRECATED_ATTR
    -- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
    -- Looking for stdbool.h
    -- Looking for stdbool.h - found
    -- Performing Test HAVE___BUILTIN_EXPECT
    -- Performing Test HAVE___BUILTIN_EXPECT - Success
    -- Performing Test HAVE___ATTRIBUTE__
    -- Performing Test HAVE___ATTRIBUTE__ - Success
    -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.4.2", minimum required is "3")
    -- Configuring done
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    -- Generating done
    -- Build files have been written to: /tmp/tmp.KfonX7KiZZ/cmark-0.30.0/build
    Makefile:37: recipe for target 'build' failed
    make: *** [build] Error 1
    
  • Provide a CMARK_UNSAFE environment variable for backwards compatibility.

    Making safe mode the default is a noble idea. However, an incompatible change without a backwards compatibility option, after years of unchanging behaviour, is a serious headache.

    cmark 0.28 still ships in a lot of still-supported OSes/distros. If you use cmark as part of a build process and your Markdown comes from a trusted/reviewed source, you have to choose between old and new: the old version will complain about an invalid option if you run cmark --unsafe, and the new one will auto-escape HTML without it.

    Environment variables can be an effective backwards compatibility mechanism: for the new versions it will disable safe mode, and for old versions it will have no ill effects.

    This patch adds a CMARK_UNSAFE environment variable in addition to the --unsafe option, with the same effect.

    $ ./build/src/cmark 
    <blink>hello world</blink> ^D
    
    <p><!-- raw HTML omitted -->hello world<!-- raw HTML omitted --></p>
    
    $ CMARK_UNSAFE=1 ./build/src/cmark 
    <blink>hello world</blink> ^D
    
    <p><blink>hello world</blink></p>
    

    I'm open to discussion regarding the naming and implementation.
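    One way to picture the proposed behaviour is a helper that ORs CMARK_OPT_UNSAFE into the options before rendering. This is a minimal sketch, not the patch itself: the helper names are hypothetical, treating any non-empty value other than "0" as "on" is an assumed convention, and the fallback flag value is assumed to match cmark.h (real code should include <cmark.h> instead).

```c
#include <stdlib.h>
#include <string.h>

#ifndef CMARK_OPT_UNSAFE
/* Fallback so this sketch compiles standalone; assumed to match cmark.h. */
#define CMARK_OPT_UNSAFE (1 << 17)
#endif

/* Hypothetical helper: enable unsafe mode when the variable is set,
   non-empty, and not "0" (an assumed convention). */
static int apply_unsafe_env(int options, const char *env_value) {
  if (env_value != NULL && env_value[0] != '\0' && strcmp(env_value, "0") != 0)
    options |= CMARK_OPT_UNSAFE;
  return options;
}

/* Read CMARK_UNSAFE from the process environment. */
static int options_from_environment(int options) {
  return apply_unsafe_env(options, getenv("CMARK_UNSAFE"));
}
```

    The command-line program would call options_from_environment once at startup, alongside its --unsafe handling, so either mechanism ends up setting the same flag.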

  • buffer: proper safety checks for unbounded memory

    Hey @jgm! Long time no chat. :)

    I've been doing some security review on the library. I have some concerns about the way we're handling buffer overflows, so here's a proposed commit. Message as follows:


    The previous work for unbounded memory usage and overflows on the buffer API had several shortcomings:

    1. The total size of the buffer was limited by arbitrarily small precision on the storage type for buffer indexes (typedef'd as bufsize_t). This is not a good design pattern in secure applications, particularly since it requires the addition of helper functions to cast to/from the native size types and the custom type for the buffer, and check for overflows.
    2. The library was calling abort on overflow and memory allocation failures. This is not a good practice for production libraries, since it turns a potential RCE into a trivial, guaranteed DoS to the whole application that is linked against the library. It defeats the whole point of performing overflow or allocation checks when the checks will crash the library and the enclosing program anyway.
    3. The default size limits for buffers were essentially unbounded (capped to the precision of the storage type) and could lead to DoS attacks by simple memory exhaustion (particularly critical on 32-bit platforms). This is not a good practice for a library that handles arbitrary user input.

    Hence, this patchset provides slight (but in my opinion critical) improvements in this area, copying some of the patterns we've used in the past for high-throughput, security-sensitive Markdown parsers:

    1. The storage type for buffer sizes is now platform native (ssize_t). Ideally, this would be a size_t, but several parts of the code expect buffer indexes to be possibly negative. Either way, switching to a native size type is a strict improvement, particularly on 64-bit platforms. All the helpers that ensured values cannot escape the size range have been removed, since they are now superfluous.
    2. The overflow checks have been removed. Instead, the maximum size for a buffer has been set to a safe value for production usage (32 MB) that can be proven not to overflow in practice. Users who need to parse particularly large Markdown documents can increase this value. A static, compile-time check has been added to ensure that the maximum buffer size cannot overflow on any growth operation.
    3. The library no longer aborts on buffer overflow. The CMark library now follows the convention of other Markdown implementations (such as Hoedown and Sundown) and silently handles buffer overflows and allocation failures by dropping data from the buffer. The result is that pathological Markdown documents that try to exploit the library will instead generate truncated (but valid, and safe) outputs.

    All tests after these small refactorings have been verified to pass.


    NOTE: Regarding 32-bit overflows, generating test cases that crash the library is trivial (any input document larger than 2 GB will crash CMark), but most Python implementations have issues with large strings to begin with, so a test case cannot be added to the pathological test suite, since it's written in Python.
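    The three changes described above (a native signed size type, a fixed cap with a compile-time overflow check, and dropping data instead of aborting) can be sketched as follows. This is an illustrative stand-in, not the patch: the sketch_buf names are hypothetical, and it uses C99 ptrdiff_t as a portable signed substitute for the ssize_t the PR mentions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap, using the 32 MB figure from the PR. */
#define SKETCH_BUF_MAX ((ptrdiff_t)32 * 1024 * 1024)

/* C99 compile-time check: doubling any size up to the cap cannot overflow. */
typedef char sketch_buf_max_fits[(SKETCH_BUF_MAX <= PTRDIFF_MAX / 2) ? 1 : -1];

typedef struct {
  char *ptr;
  ptrdiff_t size;  /* bytes in use */
  ptrdiff_t asize; /* bytes allocated */
} sketch_buf;

/* Ensure room for `target` bytes. Returns 0 on success, -1 if the cap or
   the allocator refuses; never aborts. */
static int sketch_buf_grow(sketch_buf *b, ptrdiff_t target) {
  ptrdiff_t new_asize;
  char *p;
  if (target <= b->asize) return 0;
  if (target > SKETCH_BUF_MAX) return -1;
  new_asize = b->asize > 0 ? b->asize : 64;
  while (new_asize < target) new_asize *= 2; /* stays below 2 * SKETCH_BUF_MAX */
  if (new_asize > SKETCH_BUF_MAX) new_asize = SKETCH_BUF_MAX;
  p = realloc(b->ptr, (size_t)new_asize);
  if (p == NULL) return -1;
  b->ptr = p;
  b->asize = new_asize;
  return 0;
}

/* Append up to `len` bytes, silently dropping whatever would exceed the cap
   or cannot be allocated. Returns the number of bytes actually appended. */
static ptrdiff_t sketch_buf_put(sketch_buf *b, const char *data, ptrdiff_t len) {
  if (len <= 0) return 0;
  if (len > SKETCH_BUF_MAX - b->size) len = SKETCH_BUF_MAX - b->size;
  if (len == 0) return 0;
  if (sketch_buf_grow(b, b->size + len) != 0) return 0;
  memcpy(b->ptr + b->size, data, (size_t)len);
  b->size += len;
  return len;
}
```

    Callers that care can compare the return value of sketch_buf_put with the requested length to detect truncation; callers that don't simply get shorter, but still well-formed, output.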

  • smart_punct.txt

    We have already discussed this in commonmark.js, but I spotted a test for smart punctuation here, and I have the same question: why should a Markdown transformer be responsible for doing anything with typography?

  • Remove "-rdynamic" flag for static builds

    I ran into problems trying to build the statically linked cmark executable using musl libc, caused by the "-rdynamic" flag being implicitly added to the build command. The resulting executable would have references to a musl libc shared object instead of being hermetic as expected. I'm not sure if this has any unintended consequences, but at least I have no problems building a dynamically linked executable with this patch applied on my Linux and macOS machines.

  • Implement support for custom memory allocators

    Supersedes https://github.com/jgm/cmark/pull/127

    As discussed on the previous PR, here's a proposal for supporting custom memory allocators. As you can see, I've wired up the cmark_mem structure throughout the parser, I believe without needlessly increasing complexity or memory usage. I haven't implemented pooling allocators yet, but it should be trivial now that we're passing a "memory" structure everywhere we allocate memory.

    I'd love feedback on this design. I'm moderately happy with it. The external API hasn't been broken, and the "default APIs" will continue working as before, using a default allocator (system malloc + abort on OOM). Every single node is now also aware of its allocator, which means that we can choose an allocator when creating new nodes and check that you cannot insert a node from a specific allocator into a document tree created by another allocator.

    To offset the slight memory increase in the node structure, I've put it on a light memory diet, although we could save quite a few bytes if we did something smarter with the prev, next, etc. links in each structure.

    cc @jgm @nwellnhof
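    The design described above can be sketched with a struct of calloc/realloc/free hooks, a counting allocator plugged into it, and nodes that remember their allocator and refuse to be linked across allocators. This is a minimal standalone sketch: sketch_mem is a local stand-in with the same three hooks as cmark_mem in cmark.h, and all other names are hypothetical rather than the PR's code.

```c
#include <stdlib.h>

/* Local stand-in with the same three hooks as cmark_mem. */
typedef struct sketch_mem {
  void *(*calloc)(size_t, size_t);
  void *(*realloc)(void *, size_t);
  void (*free)(void *);
} sketch_mem;

/* Counting allocator: tracks live allocations, aborts on OOM like the default. */
static long live_allocs = 0;

static void *counting_calloc(size_t count, size_t size) {
  void *p = calloc(count, size);
  if (p == NULL) abort();
  live_allocs++;
  return p;
}

static void *counting_realloc(void *old, size_t size) {
  void *p = realloc(old, size); /* assumes size > 0 */
  if (p == NULL) abort();
  if (old == NULL) live_allocs++; /* realloc(NULL, n) is a fresh allocation */
  return p;
}

static void counting_free(void *p) {
  if (p != NULL) live_allocs--;
  free(p);
}

static sketch_mem counting_mem = {counting_calloc, counting_realloc, counting_free};
static sketch_mem system_mem = {calloc, realloc, free};

/* Hypothetical node that remembers the allocator it came from. */
typedef struct sketch_node {
  sketch_mem *mem;
  struct sketch_node *first_child;
} sketch_node;

static sketch_node *sketch_node_new(sketch_mem *mem) {
  sketch_node *node = mem->calloc(1, sizeof *node);
  if (node == NULL) return NULL; /* system_mem does not abort on OOM */
  node->mem = mem;
  return node;
}

static void sketch_node_free(sketch_node *node) { node->mem->free(node); }

/* Refuse to mix nodes from different allocators; returns 1 on success. */
static int sketch_node_set_child(sketch_node *parent, sketch_node *child) {
  if (parent->mem != child->mem) return 0;
  parent->first_child = child;
  return 1;
}
```

    Because the hooks carry no user-data pointer, the counting allocator keeps its state in a static variable; a pooling allocator built on the same shape would face the same constraint.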

  • Parsing of "____a__!__!___"

    Sorry for the line noise; this case was discovered while I was fuzzing my Markdown parser, and I haven't been able to reduce it to a smaller case.

    The following Markdown:

    ____a__!__!___
    

    generates the following output:

    __<strong>a</strong>!<strong>!</strong>_
    

    However, none of the rules in https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis seem to prohibit the second _ and the last _ from being paired to form another emphasis, which would give:

    _<em><strong>a</strong>!<strong>!</strong></em>
    

    Interestingly, if I replace all _ with * I do get the desired output, and in this particular case the distinction shouldn't matter.

    commonmark.js has the same behavior: https://spec.commonmark.org/dingus/?text=____a__!__!___. But GitHub's Markdown implementation has the behavior I expect: a!!___

  • use an in-tree header for symbol export information

    Relying on CMake's GenerateExportHeader produces a file that is longer than the one being added here, and that mostly defines macros not used here. Even so, it only contains macros specific to a single compiler, while failing to consistently handle GNUC (which always supports symbol visibility, even for static libraries).

    Replace this with a more targeted header that is easy to read or include into external build systems, and which is also more robust than the one that only exists inside CMake.
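    A targeted export header of the kind described might look like the sketch below. The SKETCH_ macro names are placeholders, not the header the PR adds; the point is the branch order: on Windows, choose between nothing (static), dllexport (building the DLL) and dllimport (consuming it), and on GCC-compatible compilers apply default visibility for static and shared builds alike.

```c
/* Hypothetical in-tree export header; SKETCH_ names are placeholders. */
#ifndef SKETCH_EXPORT_H
#define SKETCH_EXPORT_H

#if defined(_WIN32) || defined(__CYGWIN__)
#  if defined(SKETCH_STATIC_DEFINE)
#    define SKETCH_EXPORT /* static library: no DLL boundary */
#  elif defined(SKETCH_BUILDING_LIBRARY)
#    define SKETCH_EXPORT __declspec(dllexport)
#  else
#    define SKETCH_EXPORT __declspec(dllimport)
#  endif
#elif defined(__GNUC__) && __GNUC__ >= 4
/* GCC and Clang: default visibility works for static and shared builds. */
#  define SKETCH_EXPORT __attribute__((visibility("default")))
#else
#  define SKETCH_EXPORT
#endif

#endif /* SKETCH_EXPORT_H */

/* Example library source file: mark this translation unit as part of the
   library, then annotate public functions. */
#define SKETCH_BUILDING_LIBRARY 1

SKETCH_EXPORT int sketch_version_major(void) { return 0; }
```

    Because the header is plain preprocessor logic, an external build system can include it without running CMake; it only needs to define the static or building macro where appropriate.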

  • Tracking backslash escapes?

    I think this is similar/related to #131 and #292, but one thing I noticed is that bare square brackets do not roundtrip:

    Input:

    [unescaped brackets],  \[escaped brackets\] and [a link](https://example.com)
    

    Output:

    \[unescaped brackets\],  \[escaped brackets\] and [a link](https://example.com)
    

    Is there a way to add an attribute that can track the position of escaped characters in a line?

    This is useful for me because I'm trying to parse and rewrite documents that have reference links in child documents, and I have to backtrack to identify and protect these links from being overwritten.

    (originally reported this in https://github.com/r-lib/commonmark/issues/20)
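    Until something like this exists in the parser, one workaround is a pre-pass over the raw text that records where backslash escapes occur. This is a minimal sketch with a hypothetical helper name: it treats a backslash followed by any ASCII punctuation character as an escape, as CommonMark does, but it ignores code spans, autolinks and raw HTML (where backslashes are literal), so it is only an approximation.

```c
#include <stddef.h>
#include <string.h>

/* ASCII punctuation characters that CommonMark lets a backslash escape. */
static const char escapable[] = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

/* Record byte offsets of backslash escapes in `line`. Stores at most `max`
   offsets but returns the total number found. Simplification: ignores code
   spans, autolinks and raw HTML. */
static size_t find_backslash_escapes(const char *line, size_t *offsets,
                                     size_t max) {
  size_t count = 0, i = 0, len = strlen(line);
  while (i + 1 < len) {
    if (line[i] == '\\' && strchr(escapable, line[i + 1]) != NULL) {
      if (count < max) offsets[count] = i;
      count++;
      i += 2; /* skip the escaped character, so an escaped backslash counts once */
    } else {
      i++;
    }
  }
  return count;
}
```

    On the input line from this issue, the two recorded offsets (23 and 41) point at the backslashes before the escaped brackets, and the unescaped [unescaped brackets] produces no entries, which is exactly the distinction the roundtrip loses.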

  • Add vcpkg installation instructions

    cmark is available as a port in vcpkg, a C++ library manager that simplifies installation for cmark and other project dependencies. Documenting the install process here will help users get started by providing a single set of commands to build cmark, ready to be included in their projects.

    We also test whether our library ports build in various configurations (dynamic, static) on various platforms (OSX, Linux, Windows: x86, x64) to maintain wide coverage for users.

    I'm a maintainer for vcpkg, and here is what the port script looks like. We try to keep the library maintained as close as possible to the original library. 😊

  • incorrect start_column & end_column

    The input "- \na" (a list marker and a space, a newline, then "a") creates the following hierarchy:

    • Document
      • List
        • Paragraph
          • Text

    The nodes of this AST have the following {start_line, end_line, start_column, end_column} values:

    • Document: {1, 2, 1, 1}
      • List: {1, 2, 1, 1}
        • Paragraph: {1, 2, 3, 1}
          • Text: {2, 2, 3, 3}

    The Text bounds {2, 2, 3, 3} exceed the bounds of the document, which violates some assumptions that cause my application to panic.

    I couldn't find any documentation that makes clear whether this is a bug, but my intuition says it is.
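    To make the violated assumption concrete, here is a minimal C sketch of a check an application might run: every node's (line, column) range should lie within its parent's. The src_pos struct and pos_within function are hypothetical. With libcmark, the four values would come from cmark_node_get_start_line, cmark_node_get_end_line, cmark_node_get_start_column and cmark_node_get_end_column. Whether cmark intends to guarantee this nesting is exactly the open question in the report.

    ```c
    #include <stdbool.h>

    /* Hypothetical struct, fields in the order used in the report:
       {start_line, end_line, start_column, end_column}. */
    typedef struct {
        int start_line, end_line, start_column, end_column;
    } src_pos;

    /* Compare two (line, column) points lexicographically: <0, 0 or >0. */
    static int cmp_point(int line_a, int col_a, int line_b, int col_b) {
        if (line_a != line_b)
            return line_a < line_b ? -1 : 1;
        if (col_a != col_b)
            return col_a < col_b ? -1 : 1;
        return 0;
    }

    /* True if child's range lies inside parent's range (inclusive). */
    static bool pos_within(src_pos child, src_pos parent) {
        return cmp_point(child.start_line, child.start_column,
                         parent.start_line, parent.start_column) >= 0 &&
               cmp_point(child.end_line, child.end_column,
                         parent.end_line, parent.end_column) <= 0;
    }
    ```

    With the reported values, List nests inside Document and Paragraph nests inside List. Text, however, fails the check against both Paragraph and Document, because its end point (line 2, column 3) lies after their end point (line 2, column 1).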

Related tags
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

libpostal: international street address NLP libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP a

Nov 23, 2022
Command-line arguments parsing library.

argparse argparse - A command line arguments parsing library in C (compatible with C++). Description This module is inspired by parse-options.c (git)

Nov 19, 2022
Header only roguelike rendering library.

Header only roguelike rendering library. Support for Opengl33 and Raylib. Features Support for custom glyph atlases with up to 65655 tiles of custom

Nov 4, 2022
tlRender, or timeline render, is an early stage project for rendering editorial timelines

tlRender tlRender, or timeline render, is an early stage project for rendering editorial timelines. The project includes libraries for rendering timel

Nov 15, 2022
A docker image where you can run a judge program and a converter for multiple sequence alignment

genocon2021-docker This repository provides a Docker image that bundles a judge program (eval.c) and a Multiple Sequence Alignment (MSA) conversion program (decode_cigar.py). In addition, sample solution programs (sam

Sep 20, 2021
XEphem is an interactive astronomy program for all UNIX platforms.

XEphem is an interactive astronomy program for all UNIX platforms.

Nov 21, 2022
A System Fetching Program written in C.

A System Fetching Program written in C.

Oct 10, 2022
A simple program to suspend or hibernate your computer

A simple program to suspend or hibernate your computer. It supports hooks before and after suspending.

Nov 9, 2022
Context Free Grammars to Pushdown Automata Conversion, C++ program

CFG-to-PDA-Conversion Context Free Grammars to Pushdown Automata Conversion, C++ program USF Group Project: Was in charge of Lambda Removal, Unit Remo

Mar 15, 2022
Add colors to your program in C with umbrella.h

☂️ umbrella ☂️ Add colors to your program in C with umbrella.h Using in projects

Jan 18, 2022
Isocline is a pure C library that can be used as an alternative to the GNU readline library

Isocline: a portable readline alternative. Isocline is a pure C library that can be used as an alternative to the GNU readline library (latest release

Nov 10, 2022
A Linux library to get the file path of the currently running shared library. Emulates use of Win32 GetModuleHandleEx/GetModuleFilename.

whereami A Linux library to get the file path of the currently running shared library. Emulates use of Win32 GetModuleHandleEx/GetModuleFilename. usag

Sep 24, 2022
Locate the current executable and the current module/library on the file system

Where Am I? A drop-in two files library to locate the current executable and the current module on the file system. Supported platforms: Windows Linux

Nov 9, 2022
A small and portable INI file library with read/write support

minIni minIni is a portable and configurable library for reading and writing ".INI" files. At just below 900 lines of commented source code, minIni tr

Nov 22, 2022
The libxo library allows an application to generate text, XML, JSON, and HTML output using a common set of function calls. The application decides at run time which output style should be produced.

libxo libxo - A Library for Generating Text, XML, JSON, and HTML Output The libxo library allows an application to generate text, XML, JSON, and HTML

Nov 20, 2022
A simple and easy-to-use library to enjoy videogames programming

hb-raylib v3.5 Harbour bindings for raylib 3.5, a simple and easy to use library to learn videogames programming raylib v3.5. The project has an educa

Aug 28, 2022
Small header-only C++ library that helps to initialize Vulkan instance and device object

Vulkan Extensions & Features Help, or VkExtensionsFeaturesHelp, is a small, header-only, C++ library for developers who use Vulkan API.

Oct 12, 2022
Haxe bindings for raylib, a simple and easy-to-use library to learn videogame programming

Haxe bindings for raylib, a simple and easy-to-use library to learn videogame programming, Currently works only for windows but feel free the expand t

Nov 9, 2022