A PostgreSQL extension providing an async networking interface accessible via SQL using a background worker and curl.

Comments
  • feat: libuv

    Fixes #46, #13.

    This PR integrates the libuv event loop with curl_multi_socket_action to replace the curl_multi_perform-based approach. Incremental change wasn't really possible, so this is practically a half-rewrite (sorry about the mess!)

    The main changes are:

    • Use libuv as the event loop. Previously we also had a loop which mostly dealt with TTL and submitting new requests to curl. We integrate this logic into an idle handle - basically, it lets you run something once per loop iteration.
    • Unlike the previous curl_multi_perform implementation, we now submit requests to curl one by one, each in its own transaction. This fixes #13 and sidesteps the need for the batch_size GUC, so I omitted it.
    • The previous implementation stored per-request information in a global hash table. We sidestep the need for one by instead using a "request context" that is associated with each curl easy handle via CURLOPT_PRIVATE (sketched after this list).
    • Plugs all the memory leaks that I can eyeball (for whatever that's worth).
    • The request/response handling logic is mostly left unchanged.
    • Adds a basic .clang-format. Feel free to configure it if you prefer.
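
    As a rough illustration of the idle handle and the CURLOPT_PRIVATE request context (illustrative names only, not the exact code in this PR):

    #include <stdint.h>
    #include <stdlib.h>
    #include <uv.h>
    #include <curl/curl.h>
    
    typedef struct {
        int64_t request_id; /* row in http_request_queue */
    } RequestContext;
    
    /* runs once per loop iteration: expire TTLs, pull queued requests,
     * and submit them to curl one by one */
    static void idle_cb(uv_idle_t *handle) { /* ... */ }
    
    static void submit_request(CURLM *multi, int64_t id, const char *url) {
        CURL *easy = curl_easy_init();
        RequestContext *ctx = calloc(1, sizeof *ctx);
        ctx->request_id = id;
        curl_easy_setopt(easy, CURLOPT_URL, url);
        curl_easy_setopt(easy, CURLOPT_PRIVATE, ctx); /* no global hash table needed */
        curl_multi_add_handle(multi, easy);
    }
    
    /* in the worker's main loop:
     *   uv_idle_init(loop, &idle); uv_idle_start(&idle, idle_cb);
     *   uv_run(loop, UV_RUN_DEFAULT);
     * and when a transfer completes, the context comes back via:
     *   curl_easy_getinfo(easy, CURLINFO_PRIVATE, (char **)&ctx); */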

    Caveats:

    • Fundamentally, event loops make understanding runtime behavior harder by exploding the state space of the program. Hopefully this is mitigated by judicious elogs, but it's not perfect.
    • There's one more known failure mode, explained here.

    Blockers:

    • replicate & fix the issue with invalid URLs & https
  • some remarks

    1. https://github.com/supabase/pg_net/blob/04dd22fb7cf267d55057deccdf70aed373dc7032/src/worker.c#L434-L436 No! The idle_cb function is called in the TopMemoryContext context, and it does not pfree what it pallocs, e.g. https://github.com/supabase/pg_net/blob/04dd22fb7cf267d55057deccdf70aed373dc7032/src/worker.c#L452 https://github.com/supabase/pg_net/blob/04dd22fb7cf267d55057deccdf70aed373dc7032/src/worker.c#L486 https://github.com/supabase/pg_net/blob/04dd22fb7cf267d55057deccdf70aed373dc7032/src/worker.c#L490 etc. One way to avoid this is sketched below.
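
    A minimal sketch (assumed context name and setup, not the actual pg_net code) of one way to address this: do the per-iteration work in a short-lived memory context and reset it at the end of each callback, so every palloc made inside is freed in one go.

    #include "postgres.h"
    #include "utils/memutils.h"
    #include <uv.h>
    
    static MemoryContext WorkerMemContext; /* child of TopMemoryContext, created at startup */
    
    static void idle_cb(uv_idle_t *handle)
    {
        MemoryContext old = MemoryContextSwitchTo(WorkerMemContext);
    
        /* ...palloc-heavy work: read the queue, build requests, submit to curl... */
    
        MemoryContextSwitchTo(old);
        MemoryContextReset(WorkerMemContext); /* frees everything palloc'd above */
    }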
  • To 0.5

    Small improvements to make pg_net a bit more stable.

    Closes https://github.com/supabase/pg_net/issues/61 and https://github.com/supabase/pg_net/issues/44.

    Also reduces unnecessary work done by pg_net, to address cases where there is a high volume of requests - ref (slack link)

  • Pathological behavior when `http_request_queue` grows too big

    Hypothetical scenario:

    • a bug in pg_net causes requests to accumulate in http_request_queue
    • a patch is released
    • all requests are fired off at once
    • DB has trouble keeping up, causing OOM/EMFILE/etc.
    • worker enters crashloop
    • DB instance is forever sad

    I haven't tried reproducing the above scenario, but we should take preventive measures against this. Some possible actions:

    • limiting the max number of requests in progress
    • TRUNCATE all requests on worker restart (so it doesn't enter a crashloop) - a rough sketch of this is below
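
    A rough sketch of the second measure, done via SPI from the background worker (assumed table name, not actual pg_net code):

    #include "postgres.h"
    #include "access/xact.h"
    #include "executor/spi.h"
    #include "utils/snapmgr.h"
    
    /* On worker (re)start, clear any backlog so a huge queue can't
     * immediately crash-loop the worker again. */
    StartTransactionCommand();
    SPI_connect();
    PushActiveSnapshot(GetTransactionSnapshot());
    SPI_execute("TRUNCATE net.http_request_queue", false, 0);
    SPI_finish();
    PopActiveSnapshot();
    CommitTransactionCommand();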
  • How does pg_net fit into the overall supabase architecture?

    Hey,

    we are making quite heavy use of pg_net, mainly to make a request to an external api from within a trigger. For some of these, we will now migrate to a postgres-based queue (postgraphile-worker / pg-boss) to have better observability.

    During the migration, I began to wonder how pg_net fits into the supabase architecture, for what use-cases it is supposed to be used and what its limitations are, especially regarding performance.

    As an example, we are using Segment to collect usage data and want to track events from within the database. One option would be to use pg_net to make a request to the Segment API. We created a simple segment.track function which makes the request using pg_net. When our usage grows, we might get a few events per second. How many requests is pg_net supposed to handle? As an alternative, we could either send the requests batch-wise or use a queueing solution and drop pg_net. Further, a webhook or queuing plugin for realtime would probably be a more scalable alternative.

    Is there any official guidance on the scalability and intent of pg_net?

    Best Philipp

  • Handle user referencing private tables

    IIRC, this issue happened because a user created a foreign key to our queue pk

    https://github.com/supabase/pg_net/blob/5ff8b6b6f0d61f4c2ea04e1c2967548474fcd32e/sql/pg_net.sql#L11-L12

    Then the TTL would not work because the DELETEs failed; this would also kill the worker on every restart.

    Maybe the correct behavior in this case is to warn instead of crashing (a sketch below) - though dying might also be good to avoid having excessive rows in the queue.

    Originally posted by @steve-chavez in https://github.com/supabase/pg_net/issues/43#issuecomment-925161384
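
    A minimal sketch of the warn-instead-of-crash option (assumed table/column names, not actual pg_net code): run the TTL DELETE in a subtransaction so an FK violation degrades to a warning rather than killing the worker.

    /* assumes an SPI connection is already open; a full implementation
     * would also restore the memory context and SPI state on rollback */
    BeginInternalSubTransaction(NULL);
    PG_TRY();
    {
        SPI_execute("DELETE FROM net.http_request_queue "
                    "WHERE created < now() - interval '6 hours'", false, 0);
        ReleaseCurrentSubTransaction();
    }
    PG_CATCH();
    {
        RollbackAndReleaseCurrentSubTransaction();
        FlushErrorState();
        elog(WARNING, "pg_net: TTL delete failed, skipping this round");
    }
    PG_END_TRY();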

  • feat: support POST

    What kind of change does this PR introduce?

    Feature

    What is the new behavior?

    Support POST requests via net.http_post(). Right now the function is mostly copy-pasted from http_get(), but I think in the future this should be refactored to use a (public?) generic function net.http().

    @olirice the order of arguments to net.http_post() is a bit out of place since required arguments have to be listed first - let me know if you want them changed. Also I see I'm failing the pytest stuff - anything I should look at? Do you want me to add a test as well? (haven't looked at how it's setup)

    @steve-chavez CMIIW, we currently ignore params and headers in the request, right? Atm content_type is handled separately, but I think we should handle it together with headers (a sketch below). In retrospect, I probably should've omitted content_type in the SQL too... (was following pgsql-http) Also, let me know if I've freed the palloc'd stuff properly.
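
    For the content_type point, a sketch of folding it into the same curl header list as the rest of the headers (illustrative, not the current code):

    /* content type becomes just another entry in the header slist */
    struct curl_slist *req_headers = NULL;
    req_headers = curl_slist_append(req_headers, "Content-Type: application/json");
    /* ...append the user-supplied headers the same way... */
    curl_easy_setopt(easy, CURLOPT_HTTPHEADER, req_headers);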

    Additional context

    Closes #6.

    TODOs

    • [x] merge Content-Type into headers
    • [x] convert body to byte array to align with #5
    • [x] http_post fails when body is null
    • [ ] check memory leak
  • >1024 concurrent requests crashes background worker

    When a large number of requests start at the same time (>1024), the worker crashes.

    Run a webserver locally:

    Process 1

    python3 -m http.server
    

    SQL

    create extension pg_net;
    select net.http_get('http://0.0.0.0:8000') from generate_series(1,5000);
    

    Error

    postgres=# 2021-07-23 16:20:16.307 CDT [32736] ERROR:  CURL error code: 56 
    
  • Handle `Server returned nothing (no headers, no data)` (CURLE_GOT_NOTHING)

    Once this error is returned, the worker stops processing new requests.

    (needs reproducing first; I need to set up an http server that produces this invalid response)

    We also need to enable more tracing to debug the worker state in production.
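
    One possible direction (a sketch, not the current code): treat per-transfer curl failures as data to record on the response row, rather than letting them escape as worker-fatal errors that stall the loop.

    CURLMsg *msg;
    int pending;
    
    while ((msg = curl_multi_info_read(multi, &pending)))
    {
        if (msg->msg == CURLMSG_DONE && msg->data.result != CURLE_OK)
        {
            /* e.g. CURLE_GOT_NOTHING: log it, store curl_easy_strerror()
             * in the response row, and keep processing new requests */
            elog(WARNING, "pg_net: %s", curl_easy_strerror(msg->data.result));
        }
    }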

  • pg_net causes database to crash if url is null

    Bug report

    Describe the bug

    If the url is null, the database crashes

    To Reproduce

    1. supabase init
    2. supabase start
    3. docker restart <your-db-container-name> (see https://github.com/supabase/cli/issues/137)
    4. Go to the SQL editor and run the following code

    create extension if not exists pg_net;
    
    create table if not exists test (
      id uuid primary key default gen_random_uuid()
    );
    
    create or replace function test_trigger()
      returns trigger
      language plpgsql
      security invoker
    as
    $$
    begin
      perform
        net.http_post(
            url:=null,
            body:='{"hello": "world"}'::jsonb
        );
      return new;
    end
    $$;
    
    create trigger call_test_trigger
    after insert on test
    for each row execute procedure test_trigger();
    
    insert into test default values;
    

    Result: Connection terminated unexpectedly

    Expected behavior

    An exception is raised telling the user that the url cannot be null.
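
    A minimal sketch of the expected guard, assuming the check happens in a C entry point (the actual fix could equally live in the SQL layer):

    #include "postgres.h"
    #include "fmgr.h"
    
    Datum
    http_post(PG_FUNCTION_ARGS)
    {
        if (PG_ARGISNULL(0))
            ereport(ERROR,
                    (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
                     errmsg("url cannot be null")));
        /* ... */
    }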

    System information

    • OS: macOS
  • pg_net and http exts have the same http_method domain

    Bug report

    Describe the bug

    If a user already has the http extension on their database and wants to create the pg_net extension, or vice versa, they will get a type "http_method" already exists error.

    To Reproduce

    1. create extension http;
    2. create extension pg_net;
    3. See error: type "http_method" already exists

    System information

    • Version of Postgres: 12.5
  • expand pg_net functionality with more operations and other data types

    Feature request

    Is your feature request related to a problem? Please describe.

    Yes. I would like to use pg_net to access REST micro-services in an asynchronous manner, when those services rely on other HTTP methods besides just GET, such as PUT, PATCH, and DELETE. Moreover, sometimes those services work with payloads that are not JSON and therefore cannot be passed to a PostgreSQL function as a json or jsonb data type.

    Describe the solution you'd like

    In addition to the existing net.http_get(url text, params jsonb, headers jsonb, timeout_milliseconds int) and net.http_post(url text, params jsonb, headers jsonb, timeout_milliseconds int) functions, I would like there to be a master function net.http(request http_request, timeout_milliseconds int), similar to the http.http(request http_request) function in the pgsql-http extension. Like in that extension, http_request would be a data type that has both a method and a content attribute, the latter being varchar. This would be enough to support other HTTP methods and other payloads (see the sketch below).
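
    On the worker side, supporting arbitrary methods and non-JSON payloads could be a small change, assuming requests keep going through curl as they do in pg_net today (illustrative fragment, not pg_net's code):

    /* any verb can be set explicitly, and the payload is an opaque
     * string, so it need not be json/jsonb */
    curl_easy_setopt(easy, CURLOPT_CUSTOMREQUEST, "PATCH"); /* or PUT, DELETE, ... */
    curl_easy_setopt(easy, CURLOPT_POSTFIELDS, request_content); /* e.g. XML or form data */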

    Describe alternatives you've considered

    I have considered and even used the synchronous http extension in conjunction with custom tables and the pg_cron extension to (re)implement a pseudo-async processor, but it's cumbersome and duplicative of the work that's in the pg_net extension.

    Additional context

    No other context is relevant.

  • pg_net worker on hosted not running/requests not being executed

    Bug report

    Describe the bug

    After installing the pg_net extension via the Database -> Extensions GUI, the queries do not get executed, as the background worker is not running. I tested this on self-hosted and it works as intended there.

    To Reproduce

    Steps to reproduce the behavior, please provide code snippets or a repository:

    1. Go to Databases -> Click on Extensions -> Enable pg_net
    2. Switch to SQL Tab and execute following query select net.check_worker_is_up()
    3. Failed to run sql query: the pg_net background worker is not up Other way to test:
    4. Go to Databases -> Click on Extensions -> Enable pg_net
    5. Switch to SQL Tab and execute following query select net.http_get('https://filesamples.com/samples/code/json/sample2.json') and take note of the responding id.
    6. select net.http_collect_response(id) - id = result from query before
    7. get error "request matching request_id not found"

    Expected behavior

    check_worker_is_up should return the process id of the running worker. http_collect_response should return the status SUCCESS along with the response data (this is what happens on my local supabase cli instance).

    Screenshots

    As you can see, the response is not collected and instead it says the request failed.

    As you can see here, the worker is not running.

    Additional context

    This works locally. The error "request matching request_id not found" only arises because the request is never executed and instead "vanishes".

  • Timeout was reached error with incorrect status: SUCCESS

    Bug report

    1. Run supabase/postgres docker
    2. Run the following SQL:
    CREATE EXTENSION IF NOT EXISTS pg_net;
    select
        net.http_post(
            url:='https://httpbin.org/post',
            body:='{"hello": "world"}'::jsonb
        );
    
    select
      *
    from
      net.http_collect_response(1);
    

    See: pg_net: Timeout was reached (video)

    Notice how the status says SUCCESS, but upon inspecting the net._http_response table, it says timeout was reached. Additionally it appears that the request was never made according to the remote server.

  • permission denied for table http_request_queue as an authenticated user

    Bug report

    Describe the bug

    After pg_net got upgraded, it throws a permission denied for table http_request_queue when trying to make a request as an authenticated user.

    To Reproduce

    Upgrade to the latest version and attempt to make an HTTP request using the authenticated role.

    Expected behavior

    Assumption is that any supabase extension works out of the box with the role and auth setup in the database. If it is expected behaviour, it should be documented somewhere.

  • Allow http DELETE

    Feature request

    Is your feature request related to a problem? Please describe.

    We want to send a delete request to a serverless function within a Nextjs api to trigger the deletion of the entity at an external provider after it was deleted in our database.

    Describe the solution you'd like

    Support http delete in addition to post and get.

    Describe alternatives you've considered

    Adding another endpoint, e.g. /delete, and sending a post request there

  • Change table queue to in-memory queue and add callbacks

    Reasons

    • For all the requests to finish, an INSERT on the _http_response table must be done; this reduces throughput. There are cases where the client doesn't care about the response, so it doesn't need to be persisted. For example, function hooks don't do anything with the response.
    • As discussed in https://github.com/supabase/pg_net/pull/50#issuecomment-953796674, we don't have to ensure delivery and do retries since we're just an HTTP client, so we don't really need to persist the requests into http_request_queue.
    • The http_request_queue can grow big with many requests and cause production issues (internal link).
    • Users create FKs to our private tables https://github.com/supabase/pg_net/issues/44, making the bg worker crash when trying to do the TTL.

    Proposal

    Drop the _http_response and http_request_queue tables and instead use an in-memory queue, plus add two callbacks:

    create or replace function net.http_get(
       url text,
    -- ..,
       success_cb text default '',
       error_cb text default ''
    )
    returns void -- no id is returned now
    as $$
    -- ...
    $$ language sql;
    

    Which can be used like:

    select net.http_get(
      url := 'http://domain.com'
    , success_cb := $$ insert into my_table values ($1, $2, $3) $$ -- $1=status, $2=headers, $3=body
    , error_cb := $$ do $_$begin raise exception 'Failed request on %: "%"', $1, $2; end$_$; $$ -- $1=url, $2=error_message
    );
    

    Pros

    • The callbacks are optional, so we don't have to start a transaction for each request if none are provided. Throughput should be greatly increased by this (a sketch after this list).
    • The error_cb can also be an insert/update on a table, so the request can be retried if needed.
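
    A sketch of the worker side of that first point (assumed struct fields, not actual code):

    /* only pay for a transaction + SPI round-trip when the request
     * actually has a callback attached */
    if (req->success_cb[0] != '\0' || req->error_cb[0] != '\0')
    {
        StartTransactionCommand();
        /* run the callback via SPI, binding $1..$3 ... */
        CommitTransactionCommand();
    }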

    @olirice @soedirgo WDYT?
