Fast CSV parser and writer for Modern C++

csv2

CSV Reader

#include <csv2/reader.hpp>

int main() {
  // The policy types live in the csv2 namespace, so qualify them
  csv2::Reader<csv2::delimiter<','>,
               csv2::quote_character<'"'>,
               csv2::first_row_is_header<true>,
               csv2::trim_policy::trim_whitespace> csv;
               
  if (csv.mmap("foo.csv")) {
    const auto header = csv.header();
    for (const auto row: csv) {
      for (const auto cell: row) {
        // Do something with cell value
        // std::string value;
        // cell.read_value(value);
      }
    }
  }
}
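
The loop above leaves each cell as text. If a column holds numbers, the string filled in by cell.read_value can be converted with the standard library. The following is a minimal sketch, not part of csv2: cell_to_double and cell_to_long are hypothetical helper names, and they rely only on std::strtod and std::strtol.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helpers (not part of csv2): convert the text of one cell.
// Each returns false if the text is empty, has trailing characters after
// the number, or is out of range; `out` is then left untouched.

inline bool cell_to_double(const std::string &text, double &out) {
  if (text.empty()) return false;
  errno = 0;
  char *end = nullptr;
  const double parsed = std::strtod(text.c_str(), &end);
  if (end == text.c_str() || *end != '\0' || errno == ERANGE) return false;
  out = parsed;
  return true;
}

inline bool cell_to_long(const std::string &text, long &out) {
  if (text.empty()) return false;
  errno = 0;
  char *end = nullptr;
  const long parsed = std::strtol(text.c_str(), &end, 10);
  if (end == text.c_str() || *end != '\0' || errno == ERANGE) return false;
  out = parsed;
  return true;
}
```

Inside the inner loop, one would call cell.read_value(value) and then pass value to cell_to_double or cell_to_long, skipping or reporting any cell for which the helper returns false.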

Performance Benchmark

This benchmark measures the average execution time (over 5 runs, after 3 warmup runs) for csv2 to memory-map the input CSV file and iterate over every cell in it. See benchmark/main.cpp for more details.

cd benchmark
g++ -I../include -O3 -std=c++11 -o main main.cpp
./main <csv_file>

Hardware

MacBook Pro (15-inch, 2019)
Processor: 2.4 GHz 8-Core Intel Core i9
Memory: 32 GB 2400 MHz DDR4
Operating System: macOS Catalina version 10.15.3

Results (as of 23 APR 2020)

| Dataset | File Size | Rows | Cols | Time |
|---|---|---|---|---|
| Denver Crime Data | 111 MB | 479,100 | 19 | 0.174s |
| AirBnb Paris Listings | 196 MB | 141,730 | 96 | 0.289s |
| 2015 Flight Delays and Cancellations | 574 MB | 5,819,079 | 31 | 1.047s |
| StackLite: Stack Overflow questions | 870 MB | 17,203,824 | 7 | 1.505s |
| Used Cars Dataset | 1.4 GB | 539,768 | 25 | 1.979s |
| Title-Based Semantic Subject Indexing | 3.7 GB | 12,834,026 | 4 | 5.929s |
| Bitcoin tweets - 16M tweets | 4 GB | 47,478,748 | 9 | 7.040s |
| DDoS Balanced Dataset | 6.3 GB | 12,794,627 | 85 | 12.648s |
| Seattle Checkouts by Title | 7.1 GB | 34,892,623 | 11 | 12.883s |
| SHA-1 password hash dump | 11 GB | 262,974,241 | 2 | 19.505s |
| DOHUI NOH scaled_data | 16 GB | 496,782 | 3213 | 32.780s |

Reader API

Here is the public API available to you:

template <class delimiter = delimiter<','>, 
          class quote_character = quote_character<'"'>,
          class first_row_is_header = first_row_is_header<true>,
          class trim_policy = trim_policy::trim_whitespace>
class Reader {
public:
  
  // Use this if you'd like to mmap and read from file
  bool mmap(string_type filename);

  // Use this if you have the CSV contents in std::string already
  bool parse(string_type contents);

  // Shape
  size_t rows() const;
  size_t cols() const;
  
  // Row iterator
  // If first_row_is_header, row iteration will start
  // from the second row
  RowIterator begin() const;
  RowIterator end() const;

  // Access the first row of the CSV
  Row header() const;
};

Here's the Row class:

// Row class
class Row {
public:
  // Get raw contents of the row
  void read_raw_value(Container& value) const;
  
  // Cell iterator
  CellIterator begin() const;
  CellIterator end() const;
};

and here's the Cell class:

// Cell class
class Cell {
public:
  // Get raw contents of the cell
  void read_raw_value(Container& value) const;
  
  // Get converted contents of the cell
  // Handles escaped content, e.g., 
  // """foo""" => ""foo""
  void read_value(Container& value) const;
};
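
The comment on read_value gives """foo""" => ""foo"" as its example: doubled quote characters are collapsed, while the outer quotes stay in place. The sketch below reproduces that mapping on a plain std::string and adds a separate step that strips one pair of enclosing quotes. Both functions are hypothetical helpers written for illustration; they are not csv2's implementation, and csv2 may behave differently on other inputs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: keep one quote out of every adjacent pair.
std::string collapse_doubled_quotes(const std::string &raw, char quote = '"') {
  std::string out;
  out.reserve(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) {
    out.push_back(raw[i]);
    // When two quotes sit side by side, skip the second one.
    if (raw[i] == quote && i + 1 < raw.size() && raw[i + 1] == quote) ++i;
  }
  return out;
}

// Hypothetical helper: drop one pair of enclosing quotes, if present.
std::string strip_enclosing_quotes(const std::string &text, char quote = '"') {
  if (text.size() >= 2 && text.front() == quote && text.back() == quote)
    return text.substr(1, text.size() - 2);
  return text;
}
```

Applying strip_enclosing_quotes first and collapse_doubled_quotes second turns a quoted field such as "Hello" into Hello, which is the usual reading of a quoted CSV field.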

CSV Writer

This library also provides a basic csv2::Writer class that writes CSV rows to a file. Here is a basic usage example:

#include <csv2/writer.hpp>
#include <fstream>   // std::ofstream
#include <string>
#include <vector>
using namespace csv2;

int main() {
    std::ofstream stream("foo.csv");
    Writer<delimiter<','>> writer(stream);

    std::vector<std::vector<std::string>> rows = 
        {
            {"a", "b", "c"},
            {"1", "2", "3"},
            {"4", "5", "6"}
        };

    writer.write_rows(rows);
    stream.close();
}
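
The example above passes cells that are already plain text. If a cell may itself contain the delimiter, a quote, or a line break, it is commonly quoted before being written (the convention described in RFC 4180). The sketch below shows that step on a std::ostream. quote_field, write_quoted_row, and format_row are hypothetical names, and this is not a description of what csv2::Writer does internally.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: quote a field only when it needs it.
std::string quote_field(const std::string &field, char delim = ',') {
  const bool needs_quotes =
      field.find(delim) != std::string::npos ||
      field.find_first_of("\"\r\n") != std::string::npos;
  if (!needs_quotes) return field;
  std::string out = "\"";
  for (char c : field) {
    if (c == '"') out.push_back('"');  // double every embedded quote
    out.push_back(c);
  }
  out.push_back('"');
  return out;
}

// Hypothetical helper: write one row followed by a newline.
void write_quoted_row(std::ostream &os, const std::vector<std::string> &row,
                      char delim = ',') {
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (i > 0) os << delim;
    os << quote_field(row[i], delim);
  }
  os << '\n';
}

// Convenience wrapper: format one row into a string.
std::string format_row(const std::vector<std::string> &row, char delim = ',') {
  std::ostringstream os;
  write_quoted_row(os, row, delim);
  return os.str();
}
```

One way to combine this with the Writer would be to run each cell through quote_field before handing the row to write_row.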

Writer API

Here is the public API available to you:

template <class delimiter = delimiter<','>>
class Writer {
public:
  
  // Construct using an std::ofstream
  Writer(output_file_stream stream);

  // Use this to write a single row to file
  void write_row(container_of_strings row);

  // Use this to write a list of rows to file
  void write_rows(container_of_rows rows);
};

Compiling Tests

mkdir build && cd build
cmake -DCSV2_TEST=ON ..
make
cd test
./csv2_test

Generating Single Header

python3 utils/amalgamate/amalgamate.py -c single_include.json -s .

Contributing

Contributions are welcome; have a look at the CONTRIBUTING.md document for more information.

License

The project is available under the MIT license.

Comments
  • Quote character is not getting parsed

    Hello,

    I copy-pasted the default example to read a CSV file into a variable, which works fine, but the quote characters are not removed by std::string value; cell.read_value(value);

    My dev environment:
    OS: Windows 10
    Compiler: MSVC 2019

    Example CSV:

    a,b,c
    "Hello", 0.123, "World"

  • could you make a package? (tag/release)

    A package would be easy to integrate with xmake package management, and it would be good for CMake FetchContent management, too.

    Example: https://github.com/xmake-io/xmake-repo/blob/master/packages/a/abseil/xmake.lua

    I have already written a config; if you make the package, I will send it as a PR to the xmake repo:

    package("csv2")
        set_urls("https://github.com/p-ranav/csv2.git")
        set_homepage("https://github.com/p-ranav/csv2")
        set_description("A CSV parser library")
        set_license("MIT")
        -- add_version()

        on_install(function (package)
            os.cp("include/csv2", package:installdir("include"))
        end)
        on_test(function (package)
            assert(package:has_cxxtypes("csv2::Reader<csv2::delimiter<','>, csv2::quote_character<'\"'>, csv2::first_row_is_header<false>>",
            {configs = {languages = "c++11"}, includes = "csv2/reader.hpp"}))
        end)
    package_end()

  • Read a single cell. Can you provide sample

    Hello Pranav, can you provide an example of how to read the contents of a single cell? How to extract the contents that can be a string, an int or a double? I share the request of Amirmasoudabdol, that is to be able to access each cell to read/write the content. Thank you very much. Sergio

  • LICENSE.termcolor doesn't exist

    https://github.com/p-ranav/csv2/blob/68ded29a7af0d6660afc41fb96677462e42578e2/CMakeLists.txt#L56

    I believe this is a copy and paste error and is meant to be LICENSE.mio?

  • warning when compiled with clang 11

    /n1/env_centos/7.6/include/csv.hpp:6313:20: warning: explicitly defaulted move assignment operator is implicitly deleted [-Wdefaulted-function-deleted]
    [build]         CSVReader& operator=(CSVReader&& other) = default;
    [build]                    ^
    [build] /n1/env_centos/7.6/include/csv.hpp:6379:23: note: move assignment operator of 'CSVReader' is implicitly deleted because field 'records' has a deleted move assignment operator
    [build]         RowCollection records = RowCollection(100);
    [build]                       ^
    [build] /n1/env_centos/7.6/include/csv.hpp:5984:24: note: copy assignment operator of 'ThreadSafeDeque<csv::CSVRow>' is implicitly deleted because field '_lock' has a deleted copy assignment operator
    [build]             std::mutex _lock;
    [build]                        ^
    [build] /opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/std_mutex.h:95:12: note: 'operator=' has been explicitly marked deleted here
    [build]     mutex& operator=(const mutex&) = delete;
    

    Does it matter?

  • Fix .pc description and url

    PROJECT_DESCRIPTION and PROJECT_URL are not set in the project command, so they evaluate to nothing. csv2 requires CMake 3.8, but setting HOMEPAGE_URL in the project command is only available in CMake 3.12 and later. The variables seem to be used only in the .pc file, so this change sets the values directly instead of using substitution.

  • Fix invalid cast in rows()

    Hi, I've started using csv2 in my project but got the following error:

    /Users/keichi/Projects/research/csv2/include/csv2/reader.hpp:284:16: error: cannot initialize a variable of type 'char *' with an lvalue
          of type 'const char *const'
        for (char *p = buffer_; (p = (char *)memchr(p, '\n', (buffer_ + buffer_size_) - p)); ++p)
    

    This PR is a quick fix.
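
The error arises because the loop initializes a plain char * from a const char pointer. A minimal sketch of a const-correct variant of the same newline scan is given below. count_newlines is a hypothetical free function over a caller-supplied buffer; it is not the actual patch from this PR.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: count '\n' bytes in [buffer, buffer + size).
// Every pointer stays const, so nothing has to be cast away from const.
std::size_t count_newlines(const char *buffer, std::size_t size) {
  std::size_t count = 0;
  const char *const end = buffer + size;
  const char *p = buffer;
  while (p < end) {
    p = static_cast<const char *>(
        std::memchr(p, '\n', static_cast<std::size_t>(end - p)));
    if (p == nullptr) break;  // no further newline in the buffer
    ++count;
    ++p;  // continue after the newline just found
  }
  return count;
}
```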

  • adding a cell view

    Rather than building a full string, returning a string view may be more efficient in some cases (suppose one wants to drop a column). Something like:

    class Cell {
      [...]
      std::string_view read_view() const {
        return std::string_view(buffer_ + start_, end_ - start_);
      }
      [...]
    };

    Even for conversion, it's probably enough.

  • Empty file can not be parsed

    I use csv.mmap(filename) to read csv file.

    When the input file is empty (size 0), I expect a zero length output.

      if (csv.mmap(fileName)) {
        for (const auto row : csv)  // should not loop
    

    but I get the following error

    libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: Invalid argument
    fish: Job 1, '../bin/mp20_client.exe -in 0....' terminated by signal SIGABRT (Abort)
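
One possible workaround on the caller's side is to check the file size before calling mmap. The sketch below uses only the standard library. file_has_content is a hypothetical helper name, and the idea that a zero-length mapping is what triggers the "Invalid argument" error is an assumption, not something confirmed in this report.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: true only if the file opens and holds at least one byte.
bool file_has_content(const std::string &path) {
  // Open positioned at the end so tellg() reports the file size.
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (!in) return false;
  return static_cast<std::streamoff>(in.tellg()) > 0;
}
```

A caller could then guard the read with if (file_has_content(fileName) && csv.mmap(fileName)) and treat an empty file as zero rows.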

  • Parsing error

    Problem 1

    int,string
    1,

    One column too few is parsed; the last column should be null.

    Problem 2

    int,string,int
    1,"",123

    There are 3 columns in total, but only 2 are parsed: '",123' is treated as one column.

  • How to solve problem? error: no matching function for call to.

    In file included from /tmp/tmp.RkhsHRAu07/main.cpp:2:0:
    /tmp/tmp.RkhsHRAu07/csv2/reader.hpp: In instantiation of ‘bool csv2::Reader<delimiter, quote_character, first_row_is_header, trim_policy>::mmap(StringType&&) [with StringType = const char (&)[8]; delimiter = csv2::delimiter<','>; quote_character = csv2::quote_character<'"'>; first_row_is_header = csv2::first_row_is_header; trim_policy = csv2::trim_policy::trim_characters<' ', '\011'>]’:
    /tmp/tmp.RkhsHRAu07/main.cpp:13:27: required from here
    /tmp/tmp.RkhsHRAu07/csv2/reader.hpp:24:11: error: no matching function for call to ‘mio::basic_mmap<(mio::access_mode)0, char>::basic_mmap(const char [8])’
      mmap_ = mio::mmap_source(filename);
      ^
    /tmp/tmp.RkhsHRAu07/csv2/reader.hpp:24:11: note: candidates are:
    In file included from /tmp/tmp.RkhsHRAu07/csv2/reader.hpp:4:0,
    from /tmp/tmp.RkhsHRAu07/main.cpp:2:
    /tmp/tmp.RkhsHRAu07/include/csv2/mio.hpp:216:3: note: mio::basic_mmap<AccessMode, ByteT>::basic_mmap(mio::basic_mmap<AccessMode, ByteT>&&) [with mio::access_mode AccessMode = (mio::access_mode)0; ByteT = char]
      basic_mmap(basic_mmap &&);
      ^
    /tmp/tmp.RkhsHRAu07/include/csv2/mio.hpp:216:3: note: no known conversion for argument 1 from ‘const char [8]’ to ‘mio::basic_mmap<(mio::access_mode)0, char>&&’
    /tmp/tmp.RkhsHRAu07/include/csv2/mio.hpp:178:3: note: mio::basic_mmap<AccessMode, ByteT>::basic_mmap() [with mio::access_mode AccessMode = (mio::access_mode)0; ByteT = char]
      basic_mmap() = default;
      ^
    /tmp/tmp.RkhsHRAu07/include/csv2/mio.hpp:178:3: note: candidate expects 0 arguments, 1 provided
    gmake[3]: *** [CMakeFiles/TestReadBigFile.dir/main.o] Error 1
    gmake[2]: *** [CMakeFiles/TestReadBigFile.dir/all] Error 2
    gmake[1]: *** [CMakeFiles/TestReadBigFile.dir/rule] Error 2
    gmake: *** [TestReadBigFile] Error 2
