Small Extremely Powerful Header Only C++ Lexical Analyzer/String Parser Library





lexpp

Lexpp is made with simplicity and size in mind. The entire library is about 500 lines!

Lexpp covers common tokenizing needs, from splitting a string on separators to classifying tokens with your own logic.

See the examples/ directory for more elaborate usage.

How to Use

Just place the lexpp.h file in your project include directory.

In exactly one .cpp file, define LEXPP_IMPLEMENTATION before including lexpp.h, like this:

#define LEXPP_IMPLEMENTATION
#include "lexpp.h"

You are now ready to use lexpp!
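The define-before-include step above follows the usual single-header convention: the header always exposes declarations, and the function bodies are compiled only where the implementation macro is defined. The sketch below shows that convention collapsed into one file. The names `mylib`, `MYLIB_IMPLEMENTATION` and `shout` are hypothetical and are not taken from lexpp's source.

```cpp
#include <string>

// ---- Part 1: what a hypothetical mylib.h would always expose ----
#ifndef MYLIB_H
#define MYLIB_H
namespace mylib {
    // Declaration only: safe to include from any number of .cpp files.
    std::string shout(const std::string& text);
}
#endif // MYLIB_H

// ---- Part 2: exactly one .cpp file defines the macro before including ----
#define MYLIB_IMPLEMENTATION

// ---- Part 3: the header emits the definitions only when the macro is set ----
#ifdef MYLIB_IMPLEMENTATION
namespace mylib {
    // Definition: compiled once, so the linker sees a single copy.
    std::string shout(const std::string& text) {
        return text + "!";
    }
}
#endif // MYLIB_IMPLEMENTATION
```

If the macro were defined in two .cpp files, a header built this way would typically produce duplicate-symbol errors at link time; if it were defined in none, the link would fail with undefined references.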

Basic Examples

String Parsing

std::string data = "some text to parse! ";
std::vector<std::string> tokens = lexpp::lex(data, " ;\n");

for(std::string& token : tokens){
    std::cout << token << std::endl;
}
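To make the idea of a separator string concrete, here is a standalone sketch of splitting on any character from a set. It is not lexpp's implementation: `split_on_any` is a hypothetical helper, and dropping empty tokens is an assumption made for this sketch, so lexpp's own output may differ in such details.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of lexpp): split `text` wherever a
// character from `separators` occurs. Empty tokens are dropped.
std::vector<std::string> split_on_any(const std::string& text,
                                      const std::string& separators) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : text) {
        if (separators.find(c) != std::string::npos) {
            // Separator reached: close the token collected so far.
            if (!current.empty()) tokens.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // Flush the last token if the text did not end with a separator.
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

With the same input as above, `split_on_any("some text to parse! ", " ;\n")` yields the four tokens `some`, `text`, `to` and `parse!`.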

Some more string parsing

std::string data = "some text to parse! ";
std::vector<std::string> tokens = lexpp::lex(data, {"<=", "<<", "\n", "::", ",", "}", "{", ";", " "}, false);

for(std::string& token : tokens){
    std::cout << token << std::endl;
}
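Multi-character separators such as "<<" or "::" need a match at each position rather than a single-character lookup. The following standalone sketch shows one way to do that, preferring the longest separator that matches. `split_on_strings` is a hypothetical helper, and both the longest-match rule and the dropping of separators and empty tokens are assumptions of this sketch, not documented lexpp behavior.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of lexpp): split `text` on any of the
// given separator strings, taking the longest separator that matches
// at the current position.
std::vector<std::string> split_on_strings(const std::string& text,
                                          const std::vector<std::string>& separators) {
    std::vector<std::string> tokens;
    std::string current;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t matched = 0;  // length of the longest separator found here
        for (const std::string& sep : separators) {
            if (!sep.empty() && sep.size() > matched &&
                text.compare(pos, sep.size(), sep) == 0) {
                matched = sep.size();
            }
        }
        if (matched > 0) {
            // Close the current token and skip over the separator.
            if (!current.empty()) tokens.push_back(current);
            current.clear();
            pos += matched;
        } else {
            current += text[pos];
            ++pos;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

For example, `split_on_strings("a<<b::c", {"<<", "::"})` yields `a`, `b` and `c`.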

Using Custom Token Classifier

enum MyTokens{
    Keyword = 0,
    Number,
    String,
    Other
};

static std::string TokenToString(int tok){
    switch(tok){
        case Keyword: return "Keyword";
        case Number:  return "Number";
        case String:  return "String";
        case Other:   return "Other";
    }
    return "Unknown"; // reached only for values outside MyTokens
}

Now the lexing (is_number is assumed to be a helper you define yourself; it is not shown in this README)

std::vector<std::string> keywords = {"for", "void", "return", "if", "int"};
std::vector<lexpp::Token> tokens = lexpp::lex(data, {"<=", "<<", "\n", "::", ",", "}", "{", "(", ")", ";", " "}, [keywords](std::string& token, bool* discard, bool is_separator) -> int {
    if(std::find(keywords.begin(), keywords.end(), token) != keywords.end()){
        return MyTokens::Keyword;
    }
    if(is_number(token))
        return MyTokens::Number;
    else
        return MyTokens::String;
}, false);

for(lexpp::Token& token : tokens){
    std::cout << TokenToString(token.type) << " -> " << token.value << std::endl;
}
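The classifier above calls `is_number`, which this README never defines. A minimal sketch of such a helper follows, assuming that "number" means a decimal integer with an optional leading sign. The name matches the call in the example, but the body is an illustration and not code from lexpp; extend it if you need floats or hex literals.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative helper (an assumption, not part of lexpp): returns true
// when `token` is an optional '+' or '-' followed by one or more digits.
bool is_number(const std::string& token) {
    std::size_t i = 0;
    if (!token.empty() && (token[0] == '+' || token[0] == '-')) {
        i = 1;  // skip the sign
    }
    if (i == token.size()) {
        return false;  // empty string, or a sign with no digits
    }
    for (; i < token.size(); ++i) {
        // Cast to unsigned char: std::isdigit is undefined for negative values.
        if (!std::isdigit(static_cast<unsigned char>(token[i]))) {
            return false;
        }
    }
    return true;
}
```

Under this definition `"42"` and `"-7"` count as numbers, while `"3.14"`, `"4x"` and `"-"` do not.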

Using the TokenParser class

We need to extend the TokenParser class to create our custom token parser

class MyTokenParser : public lexpp::TokenParser
{
public:
    MyTokenParser(std::string data, std::string separators)
    : TokenParser(data, separators, false){}

    virtual int process_token(std::string& token, bool* discard, bool isSeparator) override
    {
        if(std::find(keywords.begin(), keywords.end(), token) != keywords.end())
            return MyTokens::Keyword;
        else if(is_number(token))
            return MyTokens::Number;
        else if(isSeparator)
            return MyTokens::Other;
        else
            return MyTokens::String;
    }

    std::vector<std::string> keywords = {"for", "void", "return", "if", "int"};
};

Now using the class with the lexer

std::vector<lexpp::Token> tokens = lexpp::lex(std::make_shared<MyTokenParser>(data, "\n :,[]{}().\t"));
for(lexpp::Token& token : tokens){
    std::cout << TokenToString(token.type) << " -> " << token.value << std::endl;
}

Making an email parser with lexpp

First, a struct to store our data

struct Email{
    std::string name;
    std::string domainFront;
    std::string domainEnd;
    std::string domain;
};

Now we need to make our custom token parser for email parsing

class EmailTokenParser : public lexpp::TokenParser
{
public:
EmailTokenParser(std::string data, std::string separators = "\n@.")
:TokenParser(data, separators, true){}

virtual int process_token(std::string& token, bool* discard, bool isSeparator) override
{
    if(isSeparator){
        if(ci == 2){
            currMail.domain = currMail.domainFront + "." + currMail.domainEnd;
            emailIds.push_back(currMail);
            ci = 0;
            *discard = true;
            return 0;  
        }
        if(token.size() <= 0){
            *discard = true;
            return 0;  
        }
        if(token == "\n"){
            ci = 0;
            *discard = true;
            return 0;  
        }
        else if(token == "@"){
            ci = 1;
            *discard = true;
            return 0;                
        }
        else if(token == "."){
            ci = 2;
            *discard = true;
            return 0;                
        }
    }

    if(ci == 0)
        currMail.name = token;
    else if(ci == 1)
        currMail.domainFront = token;
    else if(ci == 2)
        currMail.domainEnd = token;
    return 0; // a non-void function must return a value on every path
}

int ci = 0;
Email currMail;
std::vector<Email> emailIds;
};

Now, finally, calling lex

std::shared_ptr<EmailTokenParser> tok_parser = std::make_shared<EmailTokenParser>(data+"\n", "\n@.");
lexpp::lex(tok_parser);
for(Email& email : tok_parser->emailIds){
    std::cout << "Email : \nNAME: " << email.name << "\nDOMAIN : " << email.domain << std::endl;
}
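To check the parser's results, it can help to compare them against a direct split of a single address. The sketch below does the same three-field split with `std::string::find` and no lexpp at all. `ParsedEmail` and `parse_email` are hypothetical names introduced here, and the sketch assumes the simple `name@front.end` shape used in this example rather than full email syntax.

```cpp
#include <cstddef>
#include <string>

// Hypothetical mirror of the Email struct above, kept separate so the
// sketch stands alone.
struct ParsedEmail {
    std::string name;
    std::string domainFront;
    std::string domainEnd;
    std::string domain;
};

// Hypothetical helper: split "name@front.end" into its parts.
// Returns false when the text does not have that shape.
bool parse_email(const std::string& text, ParsedEmail& out) {
    const std::size_t at = text.find('@');
    if (at == std::string::npos || at == 0) {
        return false;  // no '@', or nothing before it
    }
    const std::size_t dot = text.find('.', at + 1);
    if (dot == std::string::npos || dot == at + 1 || dot + 1 >= text.size()) {
        return false;  // no '.', empty domain front, or empty domain end
    }
    out.name = text.substr(0, at);
    out.domainFront = text.substr(at + 1, dot - at - 1);
    out.domainEnd = text.substr(dot + 1);
    out.domain = out.domainFront + "." + out.domainEnd;
    return true;
}
```

For `john@example.com` this fills `name` with `john` and `domain` with `example.com`, which are the fields the loop above prints.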