A header only library that provides parser combinators to C++

CPP Parser Combinators

This is an experimental C++ library for rapidly building parsers. It is inspired by Haskell parser combinator libraries such as attoparsec and, like those libraries, allows fully fledged parsers to be built in a few lines of code. It does, however, have the following limitations:

  • Recursive grammars get a little ugly (but no uglier than C++ normally is)
  • Performance has not been measured yet, and little effort has gone into optimising it
  • It requires C++17

The library uses the LGPL-3.0 licence. Because it is a header only library, this is functionally the same as a BSD licence. For more info, see here

Installing

There are no dependencies for building as it is a single header. Simply run make install

You can build the demos with make demo

Quick Tutorial

We are going to create a simple command-line parser. First we must include the header:

#include <cpp_parser/parser.h>

Next we want to define our parser. We have 4 different types of command we want to parse:

  • Short flags, e.g. -f
  • Short arguments, e.g. -a arg or -aarg
  • Long flags, e.g. --flag
  • Long arguments, e.g. --argument arg or --argument=arg

Translating this to the library is quite easy. We will start with the short options:

ParserT<char> ShortFlag = Char('-') >> AnyChar;

ParserT<std::tuple<char,std::string>> ShortArg = ShortFlag & AnyLit
                                               | (ShortFlag << Char(' ')) & AnyLit
                                               ;

The first line says that to parse a short flag, we parse a '-', discard it and then parse any character. To parse a short argument, we parse a short flag and then any string, or we do the same but with a space between the short flag and the string. The & combinator saves the results of the parsers on either side in a tuple. The last definition could have been written in a number of different ways. My favorite is:

auto ShortArg = ShortFlag & ((Char(' ') | True) >> AnyLit);

But it essentially means the same thing.

We now will do the same for long arguments:

ParserT<std::string> LongFlag = Lit("--") >> AnyLit;

ParserT<std::tuple<std::string,std::string>> LongArg =
  LongFlag & ((Char(' ') | Char('=')) >> AnyLit);

It works in the same way as the one before but with a double dash and AnyLit instead of AnyChar.

To simplify the next task, we are going to make a datatype that any flag or arg can be placed in:

struct CLOption {
  std::string name;
  bool hasArg;
  std::string arg;
};

We shall now make some functions that convert the output of the parsers to this type:

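// Build a CLOption from a short flag such as -f
auto mkSFlag = [](char c) {
  return CLOption{std::string(1, c), false, ""};
};

// Build a CLOption from a short argument such as -a arg
auto mkSArg = [](std::tuple<char,std::string> t) {
  return CLOption{std::string(1, std::get<0>(t)), true, std::get<1>(t)};
};

// Build a CLOption from a long flag such as --flag
auto mkLFlag = [](std::string s) {
  return CLOption{s, false, ""};
};

// Build a CLOption from a long argument such as --argument=arg
auto mkLArg = [](std::tuple<std::string,std::string> t) {
  return CLOption{std::get<0>(t), true, std::get<1>(t)};
};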

Next we will use fmap to convert all the parsers into parsers that output something of type CLOption, and then join them together.

auto Option = fmap<std::tuple<std::string,std::string>,CLOption>(mkLArg, LongArg)
            | fmap<std::string,CLOption>(mkLFlag, LongFlag)
            | fmap<std::tuple<char,std::string>,CLOption>(mkSArg, ShortArg)
            | fmap<char,CLOption>(mkSFlag, ShortFlag)
            ;

Now Option is our finished parser! But what does that mean? Option is a function that takes a std::string_view as input and returns a ParserRet<CLOption>. Luckily, instead of digging into this type, you can use the parser like this:

std::optional<CLOption> result = Run(Option(some_string_view));

Combinators and types

Below is a short reference. If it doesn't provide enough details, you can consult the source. It's only one file and it isn't long!

Types and results

ParserRet

This is the return type for a parser unit.

ParserRet<T> = std::optional<std::tuple<T, std::string_view>>

Optional means that a parser can fail. The first element of the tuple is what was parsed. The second element is the part of the string that hasn't been consumed.

ParserT

The parser type:

ParserT<T> = std::function<ParserRet<T>(std::string_view)>;

A parser is a function from a string to a ParserRet
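
To make the two types concrete, here is a minimal standalone sketch. It does not include the library header. The aliases are written the way the descriptions above suggest (the std::string_view remainder is an assumption), and the Char function is a hypothetical stand-in for the library's Char, kept in a separate sketch namespace to make that clear.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string_view>
#include <tuple>

namespace sketch {

// Assumed shapes, reconstructed from the descriptions above.
template <typename T>
using ParserRet = std::optional<std::tuple<T, std::string_view>>;

template <typename T>
using ParserT = std::function<ParserRet<T>(std::string_view)>;

// Hypothetical stand-in for Char: match exactly one expected character.
inline ParserT<char> Char(char expected) {
  return [expected](std::string_view in) -> ParserRet<char> {
    // An empty optional signals failure.
    if (in.empty() || in.front() != expected) return std::nullopt;
    // On success, return the parsed value and the unconsumed input.
    return std::make_tuple(expected, in.substr(1));
  };
}

}  // namespace sketch
```

Calling sketch::Char('-') on "-f" yields the character '-' together with the remainder "f", while calling it on "f" yields an empty optional.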

Many

Just an alias for a list

Many<T> = std::list<T>;

M

This is a helper for when using monadic bind. You need to cast lambdas in order for the code to compile. Lambda types are long and so this helper provides some relief!

M<A,B> = std::function<ParserT<B>(A)>

Get

Returns the result of a parse or a default value:

T Get(ParserRet<T> result, T def)

Run

Like Get but returns optional values instead of a default value when handling failure:

std::optional<T> Run(ParserRet<T> result)
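
The difference between the two helpers can be sketched as follows. This is a standalone approximation written from the descriptions above, not the library's source. The sketch namespace and the assumed ParserRet shape are illustrative.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <tuple>

namespace sketch {

// Assumed shape of a parse result: value plus unconsumed input, or nothing.
template <typename T>
using ParserRet = std::optional<std::tuple<T, std::string_view>>;

// Like Get: unwrap the parsed value, or fall back to a default on failure.
template <typename T>
T Get(const ParserRet<T>& result, T def) {
  return result ? std::get<0>(*result) : def;
}

// Like Run: keep failure visible as an empty optional, drop the remainder.
template <typename T>
std::optional<T> Run(const ParserRet<T>& result) {
  if (!result) return std::nullopt;
  return std::get<0>(*result);
}

}  // namespace sketch
```

Get is convenient when a sensible default exists. Run is the better fit when the caller needs to tell a failed parse apart from a parsed value that happens to equal the default.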

Sequence

A << B

A << B creates a parser out of the parsers A and B. The new parser behaves like running A followed by B. If either fails, the new parser fails. Otherwise it returns the result of A.

Consider an example that parses an integer followed by a space:

ParserT<int> intSpace = Integer << Char(' ');

A >> B

A >> B creates a parser out of the parsers A and B. The behaviour of the new parser is equivalent to running A. If it fails, we fail. Else we discard the result and run parser B.

ParserT<B> operator>> (const ParserT<A>& l, const ParserT<B>& r)
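
The behaviour of << and >> can be sketched with two named functions. This is a standalone illustration, not the library's code: KeepLeft and KeepRight are hypothetical names used here instead of the operators, and Char is a stand-in written for the example.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string_view>
#include <tuple>

namespace sketch {

template <typename T>
using ParserRet = std::optional<std::tuple<T, std::string_view>>;
template <typename T>
using ParserT = std::function<ParserRet<T>(std::string_view)>;

// Stand-in for Char: match exactly one expected character.
inline ParserT<char> Char(char expected) {
  return [expected](std::string_view in) -> ParserRet<char> {
    if (in.empty() || in.front() != expected) return std::nullopt;
    return std::make_tuple(expected, in.substr(1));
  };
}

// Mirrors A << B: run both in order, keep the result of the left parser.
template <typename A, typename B>
ParserT<A> KeepLeft(ParserT<A> l, ParserT<B> r) {
  return [l, r](std::string_view in) -> ParserRet<A> {
    auto first = l(in);
    if (!first) return std::nullopt;
    auto second = r(std::get<1>(*first));  // continue where l stopped
    if (!second) return std::nullopt;
    return std::make_tuple(std::get<0>(*first), std::get<1>(*second));
  };
}

// Mirrors A >> B: run both in order, keep the result of the right parser.
template <typename A, typename B>
ParserT<B> KeepRight(ParserT<A> l, ParserT<B> r) {
  return [l, r](std::string_view in) -> ParserRet<B> {
    auto first = l(in);
    if (!first) return std::nullopt;
    return r(std::get<1>(*first));
  };
}

}  // namespace sketch
```

In both cases the remaining input is whatever is left after B has run. Only the value that is kept differs.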

A | B

This is the alternative combinator. A | B creates a parser out of the parsers A and B. The new parser runs A. If A succeeds, its output is returned. Otherwise B is run and its output is returned.

ParserT<A> operator| (const ParserT<A>& l, const ParserT<A>& r)
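
A minimal standalone sketch of the alternative combinator follows. Or is a hypothetical name standing in for the operator, and the sketch assumes that B is given the original input when A fails, which the description above does not state explicitly.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string_view>
#include <tuple>

namespace sketch {

template <typename T>
using ParserRet = std::optional<std::tuple<T, std::string_view>>;
template <typename T>
using ParserT = std::function<ParserRet<T>(std::string_view)>;

// Stand-in for Char: match exactly one expected character.
inline ParserT<char> Char(char expected) {
  return [expected](std::string_view in) -> ParserRet<char> {
    if (in.empty() || in.front() != expected) return std::nullopt;
    return std::make_tuple(expected, in.substr(1));
  };
}

// Mirrors A | B: try the left parser, fall back to the right one.
template <typename A>
ParserT<A> Or(ParserT<A> l, ParserT<A> r) {
  return [l, r](std::string_view in) -> ParserRet<A> {
    auto first = l(in);
    // Assumption: on failure the right parser restarts from the same input.
    return first ? first : r(in);
  };
}

}  // namespace sketch
```

Both sides must produce the same result type, which is why the tutorial maps every option parser to CLOption before joining them.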

A & B

A & B creates a parser out of the parsers A and B. It means run A then B storing the output of both in a tuple.

ParserT<std::tuple<A,B>> operator& (const ParserT<A>& l, const ParserT<B>& r)
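
The tuple-building behaviour can be sketched as below. This is a standalone illustration rather than the library's implementation. Both is a hypothetical name standing in for the & operator, and Char is a stand-in written for the example.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string_view>
#include <tuple>

namespace sketch {

template <typename T>
using ParserRet = std::optional<std::tuple<T, std::string_view>>;
template <typename T>
using ParserT = std::function<ParserRet<T>(std::string_view)>;

// Stand-in for Char: match exactly one expected character.
inline ParserT<char> Char(char expected) {
  return [expected](std::string_view in) -> ParserRet<char> {
    if (in.empty() || in.front() != expected) return std::nullopt;
    return std::make_tuple(expected, in.substr(1));
  };
}

// Mirrors A & B: run both in order and keep both results as a tuple.
template <typename A, typename B>
ParserT<std::tuple<A, B>> Both(ParserT<A> l, ParserT<B> r) {
  return [l, r](std::string_view in) -> ParserRet<std::tuple<A, B>> {
    auto first = l(in);
    if (!first) return std::nullopt;
    auto second = r(std::get<1>(*first));
    if (!second) return std::nullopt;
    return std::make_tuple(
        std::make_tuple(std::get<0>(*first), std::get<0>(*second)),
        std::get<1>(*second));  // remainder after both parsers
  };
}

}  // namespace sketch
```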

>>= (Monadic bind)

Useful way of constructing complex parsers without the need to extract the raw tuples as you go along!

ParserT<B> operator>>= (ParserT<A> xm, M<A,B> f)

See network_proto.cpp for an example.
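
For readers without the demo at hand, here is a minimal standalone sketch of what a bind does. It is not taken from network_proto.cpp. Bind, Doubled, and the stand-in Char and AnyChar parsers are hypothetical names, and the M alias follows the shape assumed from the description of M above.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string_view>
#include <tuple>

namespace sketch {

template <typename T>
using ParserRet = std::optional<std::tuple<T, std::string_view>>;
template <typename T>
using ParserT = std::function<ParserRet<T>(std::string_view)>;
// Assumed shape of M: a function from a parsed value to the next parser.
template <typename A, typename B>
using M = std::function<ParserT<B>(A)>;

// Stand-in for Char: match exactly one expected character.
inline ParserT<char> Char(char expected) {
  return [expected](std::string_view in) -> ParserRet<char> {
    if (in.empty() || in.front() != expected) return std::nullopt;
    return std::make_tuple(expected, in.substr(1));
  };
}

// Stand-in for AnyChar: accept whatever character comes next.
inline const ParserT<char> AnyChar = [](std::string_view in) -> ParserRet<char> {
  if (in.empty()) return std::nullopt;
  return std::make_tuple(in.front(), in.substr(1));
};

// Mirrors xm >>= f: run xm, feed its value to f, run the parser f returns.
template <typename A, typename B>
ParserT<B> Bind(ParserT<A> xm, M<A, B> f) {
  return [xm, f](std::string_view in) -> ParserRet<B> {
    auto first = xm(in);
    if (!first) return std::nullopt;
    return f(std::get<0>(*first))(std::get<1>(*first));
  };
}

// Example: read any character, then require that same character again.
inline ParserT<char> Doubled() {
  M<char, char> same = [](char c) { return Char(c); };  // the "cast" lambda
  return Bind(AnyChar, same);
}

}  // namespace sketch
```

The second parser depends on the value produced by the first, which is something the other sequencing combinators cannot express.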

Modifying Parsers

fmap

fmap(f, A) creates a new parser from function f and parser A that is equivalent to applying f to the result of running A.

ParserT<B> fmap(std::function<B(A)> f, const ParserT<A>& r)
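
A standalone sketch of the idea, written from the description above rather than from the library source. The sketch namespace and the stand-in Char parser are illustrative. As in the tutorial, the template arguments are given explicitly so that a plain lambda converts to std::function.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string_view>
#include <tuple>

namespace sketch {

template <typename T>
using ParserRet = std::optional<std::tuple<T, std::string_view>>;
template <typename T>
using ParserT = std::function<ParserRet<T>(std::string_view)>;

// Stand-in for Char: match exactly one expected character.
inline ParserT<char> Char(char expected) {
  return [expected](std::string_view in) -> ParserRet<char> {
    if (in.empty() || in.front() != expected) return std::nullopt;
    return std::make_tuple(expected, in.substr(1));
  };
}

// Apply f to the parsed value; the unconsumed input passes through unchanged.
template <typename A, typename B>
ParserT<B> fmap(std::function<B(A)> f, ParserT<A> p) {
  return [f, p](std::string_view in) -> ParserRet<B> {
    auto r = p(in);
    if (!r) return std::nullopt;  // failure is propagated, f is never called
    return std::make_tuple(f(std::get<0>(*r)), std::get<1>(*r));
  };
}

}  // namespace sketch
```

For example, sketch::fmap<char, int>(toDigit, sketch::Char('7')) with a lambda toDigit that returns c - '0' gives a parser that produces the int 7 instead of the character '7'.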

Tag

Tag(P, t) creates a parser out of parser P and value t that, when run, returns the result of P in a tuple with the value t.

ParserT<std::tuple<A,T>> Tag(const ParserT<A>& p, T tag)

Replace

Replace(P, t) creates a parser out of parser P and value t that, when run, replaces the output of P with t if P succeeds.

Lists

many

Runs a parser over and over again until it fails.

ParserT<Many<A>> many(const ParserT<A>& p)

Example: We want to parse a comma separated list.

auto SpaceChars = Char(' ') | Char('\t') | Char('\n') | Char('\r');

auto WhiteSpace = many(SpaceChars);

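auto ListItem = // Your parser for a list item (Item below stands for the type it produces)

auto InnerList = Char(',') >> WhiteSpace >> ListItem;

// Prepend the first item to the list of remaining items
auto cons = [](std::tuple<Item, Many<Item>> i) {
  auto [head,tail] = i;
  tail.push_front(head);
  return tail;
};

auto List = fmap<std::tuple<Item, Many<Item>>, Many<Item>>(cons, ListItem & many(InnerList));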

many1

Same as many, but fails instead of returning an empty list.
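
The relationship between the two can be sketched in standalone form. This is an approximation based on the descriptions above, not the library's code. The check that stops the loop when a parser succeeds without consuming input is an assumption of this sketch, added so that the loop always terminates.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <optional>
#include <string_view>
#include <tuple>
#include <utility>

namespace sketch {

template <typename T>
using ParserRet = std::optional<std::tuple<T, std::string_view>>;
template <typename T>
using ParserT = std::function<ParserRet<T>(std::string_view)>;
template <typename T>
using Many = std::list<T>;

// Stand-in for Char: match exactly one expected character.
inline ParserT<char> Char(char expected) {
  return [expected](std::string_view in) -> ParserRet<char> {
    if (in.empty() || in.front() != expected) return std::nullopt;
    return std::make_tuple(expected, in.substr(1));
  };
}

// Zero or more repetitions: never fails, may return an empty list.
template <typename A>
ParserT<Many<A>> many(ParserT<A> p) {
  return [p](std::string_view in) -> ParserRet<Many<A>> {
    Many<A> items;
    while (auto r = p(in)) {
      // Sketch-only guard: stop if p succeeded without consuming anything.
      if (std::get<1>(*r).size() == in.size()) break;
      items.push_back(std::get<0>(*r));
      in = std::get<1>(*r);
    }
    return std::make_tuple(std::move(items), in);
  };
}

// One or more repetitions: same loop, but an empty list counts as failure.
template <typename A>
ParserT<Many<A>> many1(ParserT<A> p) {
  auto zeroOrMore = many(p);
  return [zeroOrMore](std::string_view in) -> ParserRet<Many<A>> {
    auto r = zeroOrMore(in);
    if (!r || std::get<0>(*r).empty()) return std::nullopt;
    return r;
  };
}

}  // namespace sketch
```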

Parser Units

False

Parser that always fails

ParserT<bool> False

True

Parser that always succeeds

ParserT<bool> True

Not

Not(P) creates a parser that fails if P succeeds and succeeds if P fails.

ParserT<bool> Not(const ParserT<A>& p)

Const

Const(t) creates a parser that consumes no input, always succeeds and returns t.

ParserT<T> Const(T t)

Char

Parses the provided character

ParserT<char> Char(char c)

Lit

Parses the provided string literal

ParserT<std::string> Lit(std::string lit)

AnyChar

Parses any character

ParserT<char> AnyChar

Alpha

Parses any Character that is a letter

ParserT<char> Alpha

Special

A set of characters that aren't parsed by AnyLit

ParserT<char> Special

AnyLit

Parses any string that doesn't contain Special characters

ParserT<std::string> AnyLit

Satisfy

Takes a function, pred, as input that takes in a char and returns a bool. It parses the char c iff pred(c) == true

ParserT<char> Satisfy(std::function<bool(char)> pred)

TakeWhile

Takes a function, pred, as input that takes in a char and returns a bool. It consumes the input string while pred(s[i]) == true

ParserT<std::string> TakeWhile(std::function<bool(char)> pred)
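
Satisfy and TakeWhile differ only in how much input they take, which a standalone sketch makes visible. This is written from the descriptions above and is not the library's code. In particular, returning std::string and succeeding with an empty string when nothing matches are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <tuple>

namespace sketch {

template <typename T>
using ParserRet = std::optional<std::tuple<T, std::string_view>>;
template <typename T>
using ParserT = std::function<ParserRet<T>(std::string_view)>;

// Exactly one character, accepted only if pred says so.
inline ParserT<char> Satisfy(std::function<bool(char)> pred) {
  return [pred](std::string_view in) -> ParserRet<char> {
    if (in.empty() || !pred(in.front())) return std::nullopt;
    return std::make_tuple(in.front(), in.substr(1));
  };
}

// The longest prefix whose characters all satisfy pred.
inline ParserT<std::string> TakeWhile(std::function<bool(char)> pred) {
  return [pred](std::string_view in) -> ParserRet<std::string> {
    std::size_t n = 0;
    while (n < in.size() && pred(in[n])) ++n;
    // Assumption: an empty prefix is still a success in this sketch.
    return std::make_tuple(std::string(in.substr(0, n)), in.substr(n));
  };
}

}  // namespace sketch
```

With a predicate that accepts the characters '0' to '9', Satisfy takes the single character '4' from "42x" while TakeWhile takes the whole prefix "42".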

Take

Takes an integer length as input. Consumes that length of string.

ParserT<std::string> Take(int len)

DigitC

Parses a single digit into a character

ParserT<char> DigitC

Digit

Parses a single digit to an int.

ParserT<int> Digit;

Natural

Parses a natural number

ParserT<int> Natural

Integer

Parses an integer

ParserT<int> Integer
Owner
Jotron AS
Jotron is aiming to open source a number of existing internal projects and some new ones too. This is where you will be able to find it all