tiny HTTP parser written in C (used in HTTP::Parser::XS et al.)

PicoHTTPParser

Copyright (c) 2009-2014 Kazuho Oku, Tokuhiro Matsuno, Daisuke Murase, Shigeo Mitsunari

PicoHTTPParser is a tiny, primitive, fast HTTP request/response parser.

Unlike most parsers, it is stateless and does not allocate memory by itself. All it does is accept a pointer to the buffer and the output structures, and set the pointers in the latter to point at the relevant portions of the buffer.

The code is widely deployed within Perl applications through popular modules that use it, including Plack, Starman, Starlet, and Furl. It is also the HTTP/1 parser of H2O.

Check out [test.c] to find out how to use the parser.

The software is dual-licensed under the Perl License or the MIT License.

Usage

The library exposes four functions: phr_parse_request, phr_parse_response, phr_parse_headers, phr_decode_chunked.

phr_parse_request

The example below reads an HTTP request from socket sock using read(2), parses it using phr_parse_request, and prints the details.

char buf[4096];
const char *method, *path; /* set by the parser to point into buf */
int pret, minor_version;
struct phr_header headers[100];
size_t buflen = 0, prevbuflen = 0, method_len, path_len, num_headers;
ssize_t rret;

while (1) {
    /* read the request */
    while ((rret = read(sock, buf + buflen, sizeof(buf) - buflen)) == -1 && errno == EINTR)
        ;
    if (rret <= 0)
        return IOError;
    prevbuflen = buflen;
    buflen += rret;
    /* parse the request */
    num_headers = sizeof(headers) / sizeof(headers[0]);
    pret = phr_parse_request(buf, buflen, &method, &method_len, &path, &path_len,
                             &minor_version, headers, &num_headers, prevbuflen);
    if (pret > 0)
        break; /* successfully parsed the request */
    else if (pret == -1)
        return ParseError;
    /* request is incomplete, continue the loop */
    assert(pret == -2);
    if (buflen == sizeof(buf))
        return RequestIsTooLongError;
}

printf("request is %d bytes long\n", pret);
printf("method is %.*s\n", (int)method_len, method);
printf("path is %.*s\n", (int)path_len, path);
printf("HTTP version is 1.%d\n", minor_version);
printf("headers:\n");
for (size_t i = 0; i != num_headers; ++i) {
    printf("%.*s: %.*s\n", (int)headers[i].name_len, headers[i].name,
           (int)headers[i].value_len, headers[i].value);
}

phr_parse_response, phr_parse_headers

phr_parse_response and phr_parse_headers provide interfaces similar to that of phr_parse_request. phr_parse_response parses an HTTP response, and phr_parse_headers parses the headers only.

phr_decode_chunked

The example below decodes incoming data in chunked-encoding. The data is decoded in-place.

struct phr_chunked_decoder decoder = {0}; /* zero-clear */
char *buf = malloc(4096);
size_t size = 0, capacity = 4096, rsize;
ssize_t rret, pret;

/* set consume_trailer to 1 to discard the trailing header, or the application
 * should call phr_parse_headers to parse the trailing header */
decoder.consume_trailer = 1;

do {
    /* expand the buffer if necessary */
    if (size == capacity) {
        capacity *= 2;
        buf = realloc(buf, capacity);
        assert(buf != NULL);
    }
    /* read */
    while ((rret = read(sock, buf + size, capacity - size)) == -1 && errno == EINTR)
        ;
    if (rret <= 0)
        return IOError;
    /* decode */
    rsize = rret;
    pret = phr_decode_chunked(&decoder, buf + size, &rsize);
    if (pret == -1)
        return ParseError;
    size += rsize;
} while (pret == -2);

/* successfully decoded the chunked data */
assert(pret >= 0);
printf("decoded data is at %p (%zu bytes)\n", (void *)buf, size);
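
For readers unfamiliar with the wire format, the sketch below shows what decoding chunked data in place means. It is not phr_decode_chunked: decode_chunked_complete is a hypothetical helper that assumes the whole body is already in memory and that rejects chunk extensions and trailers, whereas the library function works incrementally on partial input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of picohttpparser. Decodes a COMPLETE chunked
 * body in place and returns the decoded length, or -1 on malformed input. */
static long decode_chunked_complete(char *buf, size_t len)
{
    size_t src = 0, dst = 0;

    assert(buf != NULL);
    for (;;) {
        size_t chunk = 0, digits = 0;
        /* read the hexadecimal chunk size */
        while (src < len) {
            int c = (unsigned char)buf[src], v;
            if (c >= '0' && c <= '9')
                v = c - '0';
            else if (c >= 'a' && c <= 'f')
                v = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F')
                v = c - 'A' + 10;
            else
                break;
            chunk = chunk * 16 + (size_t)v;
            ++digits;
            ++src;
        }
        if (digits == 0 || digits > 8)
            return -1;
        /* the size line must end with CRLF */
        if (len - src < 2 || buf[src] != '\r' || buf[src + 1] != '\n')
            return -1;
        src += 2;
        if (chunk == 0) {
            /* last chunk: expect the final CRLF (no trailers in this sketch) */
            if (len - src < 2 || buf[src] != '\r' || buf[src + 1] != '\n')
                return -1;
            return (long)dst;
        }
        /* the chunk data and its trailing CRLF must fit in the input */
        if (len - src < chunk || len - src - chunk < 2)
            return -1;
        /* slide the payload down over the framing bytes already consumed */
        memmove(buf + dst, buf + src, chunk);
        dst += chunk;
        src += chunk;
        if (buf[src] != '\r' || buf[src + 1] != '\n')
            return -1;
        src += 2;
    }
}
```

Given the input `5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n`, the helper leaves the 11 bytes `hello world` at the start of the buffer and returns 11.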

Benchmark

benchmark results

The benchmark code is from fukamachi/[email protected].

The internals of picohttpparser have been described to some extent in my blog entry.

Comments
  • Adds an AVX2 option for the parser.

    This gives a significant performance boost on Haswell CPUs compared to PCMPESTRI: 1.78X for bench.c, and up to 2X in some other cases, measured on an i5-4278U with gcc 4.9.2 -O3 -mavx2 -mbmi2. The change requires the entire parse_headers function to change: instead of looking for the next token, it now creates a bitmap of all tokens. When compiled without -mavx2 it uses the old parse_headers. The code has not been tested much, so it might contain some bugs; I mostly wrote it to check the performance potential of AVX2 for parsing.

  • Create functions for URL parsing

    I use picohttpparser in one of my projects and I really like the simplicity of it.

    One task that comes up quite often with HTTP parsing is parsing the URL, either the full URL or just "path+query". It would be great to have utility functions that parse URLs as well.
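
    As a rough illustration of the "path+query" case, the sketch below splits the path slice reported by the parser at the first '?'. split_target and struct target_parts are hypothetical names, not library functions; the sketch does no percent-decoding and does not handle full URLs with a scheme and host.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper, not part of picohttpparser: two views into the same
     * buffer. Nothing is copied and nothing is NUL-terminated. */
    struct target_parts {
        const char *path;
        size_t path_len;
        const char *query; /* NULL when the target has no '?' */
        size_t query_len;
    };

    static struct target_parts split_target(const char *target, size_t target_len)
    {
        struct target_parts parts = {target, target_len, NULL, 0};
        const char *q;

        assert(target != NULL);
        q = memchr(target, '?', target_len);
        if (q != NULL) {
            parts.path_len = (size_t)(q - target);
            parts.query = q + 1;
            parts.query_len = target_len - parts.path_len - 1;
        }
        return parts;
    }
    ```

    A caller would pass the path and path_len values filled in by phr_parse_request and print the parts with %.*s, exactly as the usage example does for the path.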

  • output of phr_decode_chunked ?

    I am calling phr_decode_chunked() as shown in the documentation. If I print the buffer, I see chunk lengths too. If I make the same request with curl binary, it outputs a single body with all the chunks stitched together.

    For example here is the output of the buffer. The ones in bold are the chunk sizes I suppose. (This is a bit messy to read since on the server I am just serving data which is all numbers)

    00000005 3445 00000005 3444 0000000A 3443 3442 00000005 3441 0000011D 3440 3439 3438 3437 3436 3435 3434 3433 3432 3431 3430 3429 3428 3427 3426 3425 3424 3423 3422 3421 3420 3419 3418 3417 3416 3415 3414 3413 3412 3411 3410 3409 3408 3407 3406 3405 3404 3403 3402 3401 3400 3399 3398 3397 3396 3395 3394 3393 3392 3391 3390 3389 3388 3387 3386 3385 3384 00000005 3383 00000069 3382 3381 3380 3379 3378 3377 3376 3375 3374 3373 3372 3371 3370 3369 3368 3367 3366 3365 3364 3363 3362 00000005 3361 00000037 3360 3359 3358 3357 3356 3355 3354 3353 3352 3351 3350 00000005 3349


    I wish there was a full working example in C on how to use this library..

  • stricter validation of header names

    This PR implements a stricter validation of header names, using the same rules introduced to H2O in https://github.com/h2o/h2o/pull/974. The rule also matches that of Firefox (search nsHttp::IsValidToken in dxr.mozilla.org).

    At the moment, the approach of the PR is to reject handling of an HTTP request with an invalid header name. However, some might want to try to use the request just by ignoring such headers. Note that this function would also be used for decoding an HTTP response sent from upstream servers in case of H2O used as a reverse proxy.

    In that respect, the source code of Firefox states "we skip over mal-formed headers in the hope that we'll still be able to do something useful with the response" (see nsHttpHeaderArray::ParseHeaderLine); however, the code looks like it actually rejects such a request (see nsHttpTransaction::ParseHead, which just bails out when ParseLineSegment, a wrapper function of ParseHeaderLine, returns an error).

    Personally, I think rejecting an HTTP request is the way to go, considering the fact that the invalid header might have been processed by an intermediary before being received by an endpoint using picohttpparser, and since the risk of a disagreement between the intermediary and the endpoint cannot be resolved just by dropping the header.

    see also: http://www-archive.mozilla.org/security/announce/2006/mfsa2006-33.html

  • joyent/http-parser

    Ok, this time for real.

    I have landed https://github.com/joyent/http-parser/pull/200 in http-parser, which should make it quite a bit faster than it was, but still much slower than pico.

    May I ask you to update the graphs?

    Thank you, Fedor.

  • Sse4 integration

    This code is about 20% faster than original on Haswell for bench.c.

    % git co -b tmp origin/kazuho/sse4-integration
    Branch tmp set up to track remote branch kazuho/sse4-integration from origin.
    Switched to a new branch 'tmp'
    has:/home/shigeo/Program/h2o/deps/picohttpparser% gcc -O3 -march=native picohttpparser.c bench.c -g && time ./a.out
    2.156u 0.000s 0:02.15 100.0% 0+0k 0+0io 0pf+0w

    % git co sse4-integration
    Switched to branch 'sse4-integration'
    has:/home/shigeo/Program/h2o/deps/picohttpparser% gcc -O3 -march=native picohttpparser.c bench.c -g && time ./a.out
    1.661u 0.000s 0:01.66 100.0% 0+0k 0+0io 0pf+0w

  • What last_len actually means?

    The signature of phr_parse_response is:

    /* ditto */
    int phr_parse_response(const char *_buf, size_t len, int *minor_version, int *status, const char **msg, size_t *msg_len,
    struct phr_header *headers, size_t *num_headers, size_t last_len);
    

    How should I provide last_len here? What is last_len for?
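
    The usage example in the README passes the previous buffer length (prevbuflen) as this last argument, that is, the number of bytes an earlier call has already seen, and 0 on the first call. One plausible reading, stated here as an assumption rather than a description of the library's internals, is that it lets the parser avoid re-scanning old data when checking whether the header block is complete. The sketch below illustrates that idea with a hypothetical find_header_end helper.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical helper, not the library's code. Returns the offset just past
     * the first "\r\n\r\n", or 0 if the header block is not complete yet.
     * last_len is the number of bytes already scanned by a previous call. */
    static size_t find_header_end(const char *buf, size_t len, size_t last_len)
    {
        size_t i;

        assert(last_len <= len);
        /* back up 3 bytes so a terminator split across two reads is still seen */
        i = last_len < 3 ? 0 : last_len - 3;
        for (; i + 4 <= len; ++i) {
            if (buf[i] == '\r' && buf[i + 1] == '\n' && buf[i + 2] == '\r' && buf[i + 3] == '\n')
                return i + 4;
        }
        return 0;
    }
    ```

    With this bookkeeping, each byte of the buffer is inspected a bounded number of times no matter how many small reads the request arrives in.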

  • Questions about how to use picohttpparser?

    I'd like to use picohttpparser to build a web server using Chez Scheme.

    Anyway, apparently it's possible to only parse the headers. The relevant signature is:

    int phr_parse_headers(const char *buf, size_t len, struct phr_header *headers, size_t *num_headers, size_t last_len);
    

    I am ok with that. But what can I do with the rest of the response in that case? Any pointers?

    I am a newbie regarding building servers and http standard.

    Also, if I only parse the headers, what happens to the method and path?

    TIA!

  • Why not hand written Boyer-Moore in is_complete?

    I was surprised to see is_complete operates on a character at a time, why not this:

    char const*
    find_eom(char const* p, char const* last)
    {
        for(;;)
        {
            if(p + 4 > last)
                return nullptr;
            if(p[3] != '\n')
            {
                if(p[3] == '\r')
                    ++p;
                else
                    p += 4;
            }
            else if(p[2] != '\r')
            {
                p += 4;
            }
            else if(p[1] != '\n')
            {
                p += 2;
            }
            else if(p[0] != '\r')
            {
                p += 2;
            }
            else
            {
                return p + 4;
            }
        }
    }
    
  • Added low_memory error (to indicate no. of headers supplied is not sufficient)

    Please consider merging this modification: It tries to provide more useful information to the caller about the case where the parser fails due to the supplied header array not being sufficient.

    This will greatly help adjusting the header-array size (if required) or identify any denial-of-service / overflow kind of attacks.

    The code uses #defines (only inside the .c file) to easily identify the semantics of errors. Please feel free to change the macro names, if required. Also, the header file comment is updated to include this new error number.

    I left the #defines inside the .c file. But if it is ok with you, I really would like to see them inside .h file (so that the callers can use the macro names to compare the error codes, instead of hard-coded -2, -3 etc..). Easy to use in switch case statement. Let me know if the #defines are ok to move into the header file.

    Error number -3 is used for tracking this memory error, so this should not break compatibility with existing code built on top of this.
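
    To make the proposal concrete, named codes in the header could look roughly like the sketch below. The identifiers are hypothetical and chosen only for illustration; -1 and -2 are the values used in the README usage example, and -3 is the value proposed in this pull request.

    ```c
    #include <stddef.h>

    /* Hypothetical names; only the numeric values come from the text above. */
    enum parse_status {
        PARSE_ERROR = -1,           /* malformed input */
        PARSE_INCOMPLETE = -2,      /* more data is needed */
        PARSE_TOO_MANY_HEADERS = -3 /* supplied header array is too small (proposed) */
    };

    /* Maps a parser return value to a short description for logging. */
    static const char *describe_parse_result(int pret)
    {
        if (pret > 0)
            return "ok";
        switch (pret) {
        case PARSE_ERROR:
            return "parse error";
        case PARSE_INCOMPLETE:
            return "incomplete";
        case PARSE_TOO_MANY_HEADERS:
            return "header array too small";
        default:
            return "unknown";
        }
    }
    ```

    Callers could then switch on the names instead of hard-coded numbers, which is the use case described above.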

    Thank you for the great work and providing this library.

    • GK (Gopalakrishna)

      http://gk.palem.in/

  • Overly aggressive slowloris check?

    The slowloris check seems to be overly aggressive: if I am reading the code correctly, it requires that the entire header arrive by the end of the second call to the parser. The need for this seems to be due to the code not keeping enough state internally.

  • Fix warnings emitted when compiling with -Wsign-conversion

    These are all long/int to size_t or size_t to ssize_t conversions. I've verified that the conversions are all valid so can be explicitly cast to size_t/ssize_t.

  • Examples

    Hello,

    Since the wiki page is empty, where can I find more information (examples) on basic usage, such as getting the value of a specific header (for example a custom header X-Test-With, Accept, or the Host header)?

    And why, if I try to print path without path_len and %.*s, do I get the whole request?

    Also, how do I do strncmp on headers[k].name or headers[k].value? Doing it the "normal" way, if (strncmp(headers[i].name, "X-Test-With", 11) == 0) { }, doesn't detect them at all.
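
    Because the parser only sets pointers into the caller's buffer, none of the reported strings is NUL-terminated, which is why printing path without its length runs on into the rest of the request. A header lookup therefore has to compare lengths as well as bytes, and since HTTP header names are case-insensitive the comparison should ignore case. The sketch below is one way to do that; struct header_view is a stand-in with the same four fields the usage example reads from struct phr_header, and find_header is a hypothetical helper, not a library function.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stddef.h>
    #include <string.h>

    /* Stand-in for struct phr_header so the sketch compiles on its own. */
    struct header_view {
        const char *name;
        size_t name_len;
        const char *value;
        size_t value_len;
    };

    /* Case-insensitive comparison of a length-delimited name with a C string. */
    static int name_equals(const char *name, size_t name_len, const char *wanted)
    {
        size_t i, wanted_len = strlen(wanted);

        if (name == NULL || name_len != wanted_len)
            return 0;
        for (i = 0; i != name_len; ++i) {
            if (tolower((unsigned char)name[i]) != tolower((unsigned char)wanted[i]))
                return 0;
        }
        return 1;
    }

    /* Returns the first header named `wanted`, or NULL if there is none. */
    static const struct header_view *find_header(const struct header_view *headers, size_t num_headers,
                                                 const char *wanted)
    {
        size_t i;

        assert(wanted != NULL);
        for (i = 0; i != num_headers; ++i) {
            if (name_equals(headers[i].name, headers[i].name_len, wanted))
                return &headers[i];
        }
        return NULL;
    }
    ```

    The value of the header that is found should be printed with %.*s and value_len, in the same way the usage example prints every header.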

  • Dead / duplicated code in `is_complete()`

    Hi

    I'm going through the code in order to educate myself on how these parsers work, and I think I've noticed some dead code in is_complete function:

    https://github.com/h2o/picohttpparser/blob/81fe3d99fd90a55cafb993e53fd3000dbc4d564c/picohttpparser.c#L221-L223

    The while loop above can only terminate from:

    • CHECK_EOF in line 206: https://github.com/h2o/picohttpparser/blob/81fe3d99fd90a55cafb993e53fd3000dbc4d564c/picohttpparser.c#L55-L59
    • EXPECT_CHAR, one line below, in line 207: https://github.com/h2o/picohttpparser/blob/81fe3d99fd90a55cafb993e53fd3000dbc4d564c/picohttpparser.c#L61-L69
    • Return statement in line 217: https://github.com/h2o/picohttpparser/blob/81fe3d99fd90a55cafb993e53fd3000dbc4d564c/picohttpparser.c#L217

    Is this really unreachable or am I missing something? Is there a reason for this code to be there?

    Also, given that EXPECT_CHAR already contains CHECK_EOF, and CHECK_EOF doesn't mutate any state, line 207 duplicates line 206, making the routine check for EOF twice in a row, when *buf == '<CR>'.

  • Support for request body parsing

    As far as I can see, phr_parse_request has the ability to parse the request headers (plus method, URL and HTTP version), but not the body, which is essential for POST requests. It would be really nice if the parser had an option to extract and return the body.
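
    On success, phr_parse_request returns the number of bytes it consumed (the usage example prints it as the request length), so whatever follows that offset in the buffer is the start of the body. The sketch below is a minimal illustration, assuming the body is delimited by a Content-Length value that the caller has already extracted from the headers. locate_body and struct body_view are hypothetical names; a chunked request body would instead go through phr_decode_chunked.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical view of the body bytes that follow the header block. */
    struct body_view {
        const char *data; /* first byte after the header block */
        size_t len;       /* body bytes available so far, capped at content_length */
        int complete;     /* non-zero once content_length bytes have arrived */
    };

    /* header_len is the positive value returned by the parser; buflen is the
     * total number of bytes read into buf so far. */
    static struct body_view locate_body(const char *buf, size_t buflen, size_t header_len,
                                        size_t content_length)
    {
        struct body_view body;
        size_t available;

        assert(header_len <= buflen);
        available = buflen - header_len;
        body.data = buf + header_len;
        body.len = available < content_length ? available : content_length;
        body.complete = available >= content_length;
        return body;
    }
    ```

    While complete is zero, the caller keeps reading from the socket and calls the helper again with the larger buflen.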
