Modern transactional key-value/row storage library.



Sophia is an advanced transactional MVCC key-value/row storage library.

How does it differ from other storages?

Sophia is a RAM-disk hybrid storage engine. It is designed to provide the best possible on-disk performance without degradation over time. It guarantees O(1) worst-case complexity for read, write and range scan operations.

It adapts to the expected write rate, total capacity and cache size. Memory requirements for common HDD and Flash drives can be seen here.

What is it good for?

It suits server environments that require the lowest-latency writes and reads, predictable behaviour, an optimized storage schema and transaction guarantees. It works efficiently with large volumes of ordered data, such as time series, analytics, events, logs, counters, metrics, full-text search and common key-value data.

Features

  • Full ACID compliance
  • MVCC engine
  • Optimistic, non-blocking concurrency with N-writers and M-readers
  • Pure Append-Only
  • Unique data storage architecture
  • Fast: O(1) worst case for read, write and range scan operations
  • Multi-threaded compaction
  • Multi-databases support (sharing a single write-ahead log)
  • Multi-Statement and Single-Statement Transactions (cross-database)
  • Serialized Snapshot Isolation (SSI)
  • Optimized storage schema (numeric types have zero-cost storage)
  • Can be used to build Secondary Indexes
  • Upsert (fast write-only 'update or insert' operation)
  • Consistent Cursors
  • Prefix search
  • Automatic garbage-collection
  • Automatic key-expire
  • Hot Backup
  • Compression (no fixed-size blocks, no-holes, supported: lz4, zstd)
  • Direct IO support
  • Use mmap or pread access methods
  • Simple and easy to use (minimalistic API, FFI-friendly, amalgamated)
  • Implemented as a small library written in C with zero dependencies
  • Carefully tested
  • Open Source Software, BSD

Support

Sophia Documentation and Bindings for the most common languages are available on the website.

Please use the official Sophia Google Group or StackOverflow to ask general questions.
More information is available here.

Comments
  • Shared and static library symbol issue

    For some reason the compiled libraries don't have matching symbols. I'm not familiar with compiling shared and static libraries or I'd send a PR.

    objdump -D libsophia.a | grep "<sp_recover>:" -
    # Outputs the line with the sp_recover function symbol.
    objdump -D libsophia.so.1.1 | grep "<sp_recover>:" -
    # Nothing is outputted.
    
  • Is this project still active?

    Wondering if this project is still seeing active development? Looks like it's been mostly silent for the last year.

    Is there a roadmap, plans for 2.3 or 3.0?

  • Slow "get" operations?

    I've written a short benchmarking script that takes a handful of ordered, embedded key-value databases and does various operations on them. I've noticed that Sophia is an order of magnitude slower when doing "get" operations for individual keys (both in order and in random order).

    The way the benchmark operates is to:

    1. Insert 250,000 key/value pairs of the format "00000001" -> "00000001" ... "00250000" -> "00250000".
    2. Iterating from 1 to 250,000, fetch each value at the given key.
    3. Iterating from 1 to 250,000 in random order, fetch each value at the given key.
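
The key format and access orders in the steps above can be sketched in C as follows. This is only an illustration, not the original benchmarking script: the names `format_key` and `random_order`, the Fisher-Yates shuffle and the use of `rand()` are assumptions made for the example, and the calls into each database are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define KEY_COUNT 250000u
#define KEY_WIDTH 8               /* "00000001" has eight digits */

/* Write the zero-padded decimal form of n into out (KEY_WIDTH + 1 bytes). */
static void format_key(char out[KEY_WIDTH + 1], unsigned n)
{
    assert(n <= 99999999u);       /* must fit in KEY_WIDTH digits */
    snprintf(out, KEY_WIDTH + 1, "%0*u", KEY_WIDTH, n);
}

/* Fill order[0..count-1] with 1..count, then shuffle it (Fisher-Yates). */
static void random_order(unsigned *order, unsigned count, unsigned seed)
{
    if (count == 0)
        return;
    srand(seed);
    for (unsigned i = 0; i < count; i++)
        order[i] = i + 1;
    for (unsigned i = count - 1; i > 0; i--) {
        unsigned j = (unsigned)rand() % (i + 1);
        unsigned tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
}
```

A driver would call `format_key` once per index for steps 1 and 2, and walk the array filled by `random_order` for step 3, timing each loop separately.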

    I am using vanilla Sophia configuration (2.1.1) with a single string key as the index.

    The times I'm getting look like this:

    berkeleydb
    Sets:         Took 1.1932
    Gets:         Took 1.3449
    Random Gets:  Took 2.3432
    ------------------------------------------------------------
    kc-tree
    Sets:         Took 0.384
    Gets:         Took 0.5559
    Random Gets:  Took 0.7486
    ------------------------------------------------------------
    leveldb
    Sets:         Took 0.5054
    Gets:         Took 0.6866
    Random Gets:  Took 0.9779
    ------------------------------------------------------------
    sophia
    Sets:         Took 0.7066
    Gets:         Took 4.2484
    Random Gets:  Took 5.6793
    ------------------------------------------------------------
    lsm
    Sets:         Took 0.967
    Gets:         Took 0.6051
    Random Gets:  Took 0.8536
    ------------------------------------------------------------
    

    I should note that Sophia seems to be performing about on par with the other databases for writes and (not shown) reading ranges of keys, but when it comes to reading single key/value pairs it is quite slow.

    Is this expected? What's going on here?

  • Multiple databases in the same environment context

    Hi, Dmitry! Have a question.

    #include <stdio.h>
    #include "sophia.h"
    
    void x(void) {
        void *env = sp_env();
        void *ctl = sp_ctl(env);
    
        sp_set(ctl, "sophia.path", "./storage");
    
        sp_set(ctl, "db", "x");       /* "x" database */
        void *dbx = sp_get(ctl, "db.x");
    
        sp_open(env);
    
        void *o = NULL;
    
        char key[] = "foo";
        char val[] = "bar";
    
        o = sp_object(dbx);
        sp_set(o, "key",   key, sizeof(key));
        sp_set(o, "value", val, sizeof(val));
        sp_set(dbx, o);
    
        sp_destroy(env);
    }
    
    void y(void) {
        void *env = sp_env();
        void *ctl = sp_ctl(env);
    
        sp_set(ctl, "sophia.path", "./storage");
    
        sp_set(ctl, "db", "y");       /* "y" database */
        void *dby = sp_get(ctl, "db.y");
    
        sp_open(env);
    
        void *o = NULL;
    
        char key[] = "foo";
    
        o = sp_object(dby);
        sp_set(o, "key", key, sizeof(key));
        void *result = sp_get(dby, o);
        if (result) {
            char *value = sp_get(result, "value", NULL);
            printf("%s\n", value);
            sp_destroy(result);
        }
    
        sp_destroy(env);
    }
    
    int main(void) {
        x();
        y();
    }
    /* prints bar */
    

    Why? Is the log directory shared between databases? I do not understand how to set different log directories for different databases in the same environment context. Is it possible to work with multiple databases in the same environment context?

    Thanks.

  • Added conditional compilation for OSX (Darwin)

    Out of the box sophia won't compile for OS X, so I made it work and added the changes back to the Makefiles so that it will compile on both Linux and OS X. The Linux section is unchanged, flag-wise.

  • Protect against spurious wakeups

  • Serialized Snapshot Isolation (SSI) anomalies

    On the Sophia homepage, one of the listed features is: Serialized Snapshot Isolation (SSI) consistency level.

    In some tests I have experienced anomalies which shouldn't be allowed under SSI.

    The definition of SSI from https://wiki.postgresql.org/wiki/SSI is:

    With true serializable transactions, if you can show that your transaction will do the right thing if there are no concurrent transactions, it will do the right thing in any mix of serializable transactions or be rolled back with an error.

    [...] problems which can occur with certain combinations of transactions at the REPEATABLE READ transaction isolation level, [...] are avoided at the SERIALIZABLE transaction isolation level.

    Wikipedia says about snapshot isolation:

    A transaction executing under snapshot isolation appears to operate on a personal snapshot of the database, taken at the start of the transaction. When the transaction concludes, it will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was taken.

    This implies that in an SSI transaction you can not only Read Your Own Writes, but reads should also be Repeatable.

    This is also reiterated here: Serializable Isolation for Snapshot Databases paper.

    Now it appears that this guarantee is violated in the following test:

    Assume we have a db with two keys, A and B, both with value 5. The invariant is: the sum A + B should always be 10. Nobody should be able to observe an invalid state.

    | Time | Tx1       | Tx2         | Note             |
    |------|-----------|-------------|------------------|
    |      |           |             | Initial state    |
    |      |           |             | A=5, B=5         |
    |------|-----------|-------------|------------------|
    | t0   | BEGIN TX  |             |                  |
    | t1   | READ(A)=5 |             |                  |
    | t2   |           | BEGIN TX    |                  |
    | t3   |           | READ(A)=5   |                  |
    |      |           | READ(B)=5   |                  |
    | t4   |           | WRITE(A)=6  |                  |
    |      |           | WRITE(B)=4  |                  |
    | t5   |           | COMMIT = OK |                  |
    |      |           |             |                  |
    | t6   | READ(B)=4 |             | INVALID! A+B = 9 |

    It is said that any attempt to write and commit in Tx1 will fail; however, a read-only transaction can still observe an invalid state (and potentially cause side effects based on the wrong information).

    After further testing I've concluded that this is because REPEATABLE READ is not provided. For example:

    | Time | Tx1       | Tx2         | Note               |
    |------|-----------|-------------|--------------------|
    |      |           |             | Initial state      |
    |      |           |             | A=5, B=5           |
    |------|-----------|-------------|--------------------|
    | t1   | READ(A)=5 |             |                    |
    | t2   |           | BEGIN TX    |                    |
    | t3   |           | READ(A)=5   |                    |
    |      |           | WRITE(A)=6  |                    |
    | t5   |           | COMMIT = OK |                    |
    |      |           |             |                    |
    | t6   | READ(A)=6 |             | INVALID! A changed |

    From this test we can observe that reads are NOT repeatable. This seems to happen for UPDATES and DELETES but not for INSERTS.

    At the Snapshot isolation level, Repeatable reads must be guaranteed. This is very important to guarantee that a side-effecting transaction will see a consistent view of the world.

    As a result of this test, it seems that the ACTUAL consistency level provided by Sophia (v2.2) is READ COMMITTED with Read-Your-Own-Writes.
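
The first schedule above can be replayed against a toy multi-version store to show the difference between a snapshot read and a read-committed read. This is a minimal sketch under stated assumptions, not Sophia's implementation and not the original test: `toy_store`, `toy_commit`, `toy_read_snapshot`, `toy_read_committed` and `replay_schedule` are hypothetical names, and commit timestamps are modelled as a simple counter.

```c
#include <assert.h>
#include <stddef.h>

enum { KEY_A, KEY_B, TOY_KEYS };
#define TOY_MAX_VERSIONS 8

/* One committed value of a key, stamped with the commit that wrote it. */
struct toy_version {
    unsigned commit_ts;
    int value;
};

struct toy_store {
    struct toy_version versions[TOY_KEYS][TOY_MAX_VERSIONS];
    size_t count[TOY_KEYS];
    unsigned clock;               /* timestamp of the latest commit */
};

/* Atomically commit n writes: all of them get the same new timestamp. */
static void toy_commit(struct toy_store *s, const int *keys,
                       const int *values, size_t n)
{
    s->clock++;
    for (size_t i = 0; i < n; i++) {
        int k = keys[i];
        assert(s->count[k] < TOY_MAX_VERSIONS);
        s->versions[k][s->count[k]].commit_ts = s->clock;
        s->versions[k][s->count[k]].value = values[i];
        s->count[k]++;
    }
}

/* Snapshot read: newest version committed at or before snapshot_ts. */
static int toy_read_snapshot(const struct toy_store *s, int key,
                             unsigned snapshot_ts)
{
    for (size_t i = s->count[key]; i > 0; i--) {
        const struct toy_version *v = &s->versions[key][i - 1];
        if (v->commit_ts <= snapshot_ts)
            return v->value;
    }
    assert(!"no visible version for key");
    return 0;
}

/* Read-committed read: newest committed version, whatever its timestamp. */
static int toy_read_committed(const struct toy_store *s, int key)
{
    return toy_read_snapshot(s, key, s->clock);
}

/* Replay the first schedule and return A + B as observed by Tx1. */
static int replay_schedule(int use_snapshot)
{
    static const int keys[] = { KEY_A, KEY_B };
    static const int initial[] = { 5, 5 };
    static const int tx2_writes[] = { 6, 4 };
    struct toy_store s = { 0 };

    toy_commit(&s, keys, initial, 2);         /* initial state: A=5, B=5 */
    unsigned tx1_snapshot = s.clock;          /* t0: Tx1 begins */
    int a = use_snapshot ? toy_read_snapshot(&s, KEY_A, tx1_snapshot)
                         : toy_read_committed(&s, KEY_A);     /* t1 */
    toy_commit(&s, keys, tx2_writes, 2);      /* t2..t5: Tx2 commits */
    int b = use_snapshot ? toy_read_snapshot(&s, KEY_B, tx1_snapshot)
                         : toy_read_committed(&s, KEY_B);     /* t6 */
    return a + b;
}
```

In this model `replay_schedule(1)` returns 10, because Tx1 keeps reading from its snapshot, while `replay_schedule(0)` returns 9, which is the invalid state shown at t6 in the first table.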

  • Compaction mode per db setting

    From http://sophia.systems/v2.1/admin/ram.html

    sp_setstring(env, "db.test.storage", "in-memory", 0);
    
    // this would set the compaction mode for all dbs right?
    sp_setint(env, "compaction.0.mode", 1);
    

    Allow:

    // per db setting?
    sp_setint(env,  "db.another.storage", "compaction.0.mode", 1);
    
    sp_setint(env,  "db.yetanother.storage", "compaction.0.mode", 2);
    
    // while the other dbs would have the default compaction mode
    
  • Don't hardcode GYP file into creating static libraries

    Hi there!

    I wanted to build a shared library on OS X (where the shared library extension is .dylib), and since the built-in makefile uses the .so suffix I thought I'd use GYP, but the bundled GYP file can only build static libraries. With this change, one can build static or shared libraries depending on the value of 'library'. Here is an example of how to build a shared library on OS X, using plain makefiles as the GYP output:

    
    gyp --depth=. --generator-output out -f make -Dlibrary=shared_library sophia.gyp
    make -C out
    

    @mmalecki: I hope this doesn't break your node binding build process :-)

  • Understanding 'open environment' and 'get db object' invocation order

    Hello guys!

    Question

    Does the order in which sp_open(env) and sp_getobject(env, "db.test") are invoked matter?

    I tried both orders and it worked each time, so apparently it doesn't, but I only tried trivial cases. Is there any case where it does matter?

    Why am I concerned

    CRUD example says:

    void *db = sp_getobject(env, "db.test");
    int rc = sp_open(env);
    

    Common Workflow example says:

    sp_open(env);
    void *db = sp_getobject(env, "db.test");
    

    Thank you!

  • Implement reverse ordering for u32/u64

    This allows reverse ordering in a simple way. Maybe rename "-u32" to "u32_desc"? If accepted, I will come up with some test cases. I have some in rust-sophia already.
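
For readers unfamiliar with the idea, reverse ordering of unsigned keys can be sketched in two common ways: a comparator with swapped operands, or storing the complement of the value so that an ascending index yields descending order. This is a generic illustration and not the code from this pull request; `cmp_u32_asc`, `cmp_u32_desc`, `sort_u32_desc`, `u32_desc_encode` and `u32_desc_decode` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Ascending comparison of two u32 values, written to avoid overflow. */
static int cmp_u32_asc(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Descending comparison: the same test with the operands swapped. */
static int cmp_u32_desc(const void *a, const void *b)
{
    return cmp_u32_asc(b, a);
}

/* Sort an array of u32 values from largest to smallest. */
static void sort_u32_desc(uint32_t *values, size_t n)
{
    assert(values != NULL || n == 0);
    if (n > 1)
        qsort(values, n, sizeof values[0], cmp_u32_desc);
}

/* Alternative: store the complement, so ascending order of the stored
 * key corresponds to descending order of the original value. */
static uint32_t u32_desc_encode(uint32_t v)
{
    return UINT32_MAX - v;
}

static uint32_t u32_desc_decode(uint32_t stored)
{
    return UINT32_MAX - stored;
}
```

The comparator approach keeps stored keys unchanged, while the complement approach needs no engine support but requires every reader to decode the key.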

  • Dead links in readme

    Front page for v2.1 has two dead links in quickstart section:

    http://sphia.org/documentation.html
    http://sphia.org/clients.html

    Where do I find getting started and API documentation for v2.1?

  • database with same scheme

    I found that the contents of two databases with the same scheme in the same environment are always synced with each other, as if they were the same database. Is this the expected behavior, or did I get something wrong?

  • Idea - best effort treatment for certain key-values

    Nowadays many setups require a seamless combination of "ACID" key-value pairs and "best effort" key-value pairs (by analogy, think of how TCP packets and UDP packets are treated).

    Would you consider adding support for a "tag" to distinguish between an "ACID" request on a key-value pair (or set of pairs) and a "best effort" request on a key-value pair (or set of pairs)?

    I could imagine under high contention loads Sophia would simply ignore some ACID invariants when processing a request tagged with "best effort" or in the worst case simply drop the whole request (pretty much like routers and switches do with UDP).

  • CURP - fast replication nearly for free

    There is a "new" approach to very fast replication called CURP (Consistent Unordered Replication Protocol).

    It doesn't have any of the caveats discussed and referenced in https://github.com/pmwkaa/sophia/issues/31 and https://github.com/pmwkaa/sophia/issues/51 and should not be difficult to implement.

    Would you consider implementing such low hanging fruit?

  • Iteration with multipart key

    Hi Dima,

    I have started writing Sophia.cr, a Sophia driver for the Crystal language. In the near future I want to build a TSDB on top of Sophia.cr for monitoring and alerting in my own projects.

    I tried but could not understand the C code that supports iteration by multipart key. I found Sophy, the Python driver, where Coleifer implemented Json, MsgPack and UUID types. It is beautiful and opens up good possibilities, and I would like to implement the same types and a few more.

    Do I understand correctly that Sophia packs and stores a multipart key as bytes with a delimiter (in the scheme) and unpacks it during iteration? How can I participate in this process? Could you explain this?
