Nebula Graph is a distributed, fast open-source graph database featuring horizontal scalability and high availability


English | Chinese
A distributed, scalable, lightning-fast graph database


What is Nebula Graph

Nebula Graph is an open-source graph database capable of hosting super large-scale graphs with billions of vertices (nodes) and trillions of edges, with millisecond latency. It delivers enterprise-grade high performance to simplify the most complex data sets imaginable into meaningful and useful information.

Below is the architecture of Nebula Graph:

[Architecture diagram]

Compared with other graph database solutions, Nebula Graph has the following advantages:

  • Symmetrically distributed
  • Separation of storage and computing
  • Horizontal scalability
  • Strong data consistency via the Raft protocol
  • SQL-like query language
  • Role-based access control for higher-level security

Quick start

Read the Getting started guide to quickly get going with Nebula Graph.

Please note that you must install Nebula Graph from source code, RPM/DEB packages, or Docker Compose before you can actually start using it. If you prefer a video tutorial, visit our YouTube channel.

In case you encounter any problem, be sure to ask us on our official forum.

Documentation

Visualization Tool: Nebula Graph Studio

Visit Nebula Graph Studio for visual exploration of graph data on a web UI.

Supported Clients

Licensing

Nebula Graph is under the Apache 2.0 license, so you can freely download, modify, and deploy the source code to meet your needs. You can also freely deploy Nebula Graph as a back-end service to support your SaaS deployment.

In order to prevent cloud providers from monetizing the project without contributing back, we added Commons Clause 1.0 to the project. As mentioned, we are fully committed to the open source community. We would love to hear your thoughts on the licensing model and are willing to make it more suitable for the community.

Contributing

Contributions are warmly welcomed and greatly appreciated. Here are a few ways you can contribute:

Getting help & Contact

In case you encounter any problems while playing around with Nebula Graph, please reach out for help:

If you like Nebula Graph, please leave us a star.

Comments
  • Spark sstfile generator

    Reopened as a new PR after this repo changed from private to public, replacing PR#208.

    A Spark job which does the following things:

    • Parses an input mapping file to map a Hive table to a tag/edge, in which the table's (logical) primary key should be identified.
    • Uses the Nebula native client to encode a tag's keys and values.
    • Defines a custom Hadoop OutputFormat and RecordWriter, which generates a sub-directory for one partition per worker in the specified SST file output directory.

  • Improve heartbeat between metad and storaged by MetaClient

    1. Add LastUpdateTimeMan to record the latest metadata change (insert, update, delete) time on the MetaServer side.
    2. Enrich the information (lastUpdateTime, partition leader distribution) exchanged in heartbeats between the MetaServer and the MetaClient on the storage node.
    3. The load-data logic inside the meta client can be optimized using lastUpdateTime, which makes metadata synchronization smarter and more timely.
    4. Remove sendHeartBeat_ and load_data_interval_secs from the MetaClient class, and add the bool inStoraged_ to indicate whether a MetaClient runs inside storaged.

    Closes #1173, closes #1060.

  • Support show create tag/edge xxx, show create space xxx SQL

    Supports the show create tag/edge xxx and show create space xxx statements. Closes #439.

    This PR contains the TTL schema changes (PR #422), because it depends on the attributes of the schema.
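
    A minimal sketch of the statements this PR adds (the tag, edge, and space names are illustrative):

    SHOW CREATE TAG player
    SHOW CREATE EDGE follow
    SHOW CREATE SPACE my_space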

  • Ttl schema

    Adds the TTL feature to the syntax and meta schema. It covers the following:

    create tag/edge
    alter tag/edge
    show create tag/edge

    describe tag/edge (only displays column information, not schema attributes)

    show create tag/edge is handled in PR #439.
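
    A hypothetical sketch of how TTL attributes could appear in the tag syntax (the tag name, properties, and exact attribute spelling are illustrative, not taken from the PR):

    CREATE TAG person(name string, created timestamp) ttl_duration = 100, ttl_col = created
    ALTER TAG person ttl_col = ""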

  • Access rights control

    Need to establish the entire access-rights control mechanism. Here are the current thoughts (a hypothetical example follows the list below):

    1. There is one God user, admin, like the root user of Linux systems. admin is the only user who can create a Graph Space.

    2. All other access rights are based on a Graph Space. There are three types of rights for each Graph Space: Admin, User, Guest.

    Admin is an administrator for a given Graph Space. An Admin can delete the Graph Space, manage the schema in the Graph Space, and access (read and write) the data in the Graph Space.

    User is a normal user for a given Graph Space. A User can access (read and write) the data in the Graph Space.

    Guest is a read-only role for a given Graph Space. A Guest cannot modify the data in the Graph Space.

    3. A normal user will be associated with a default Graph Space, which becomes the current Graph Space as soon as the user logs on.

    4. A normal user can have different access rights on different Graph Spaces.
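
    A hypothetical sketch of what such role management might look like in the console (the user name, space names, and exact syntax were not finalized in this proposal):

    CREATE USER alice WITH PASSWORD "secret"
    GRANT ROLE ADMIN ON space1 TO alice
    GRANT ROLE GUEST ON space2 TO alice
    REVOKE ROLE GUEST ON space2 FROM alice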

  • Issue #192: Support multiple edge types in GO statement

    Summary: Implemented OVER with multiple edge types (including OVER *) to support all edge types in the GO statement; see the example after the list below.

    1. Modification of the storage interface:
       1.1 Modified the GetNeighborsRequest structure to pass a list of edge_type to the storage layer.
       1.2 Added an over_all_edges flag to mark "over *", and modified the PropDef structure to represent both tag_id and edge_type.
       1.3 Added the EdgeData structure to represent the edge data returned to the client.
    2. For "over *", the current implementation first gets all edge types from the meta service and then calls the storage interface.
    3. When the YIELD clause does not exist, removed the default rename of "dst_" to "id".
    4. Deleted the getOutBound and getInBound interfaces, because the edges passed to storage may include both inbound and outbound edges.
    5. Added a map from edge type to edge name (toEdgeName) and a map from space ID to all edge types in MetaClient.
    6. Returns the default value of the corresponding type when some properties do not exist.
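
    A minimal sketch of the syntax this PR enables (the vertex ID and edge names are illustrative):

    GO FROM 100 OVER follow, serve YIELD follow._dst, serve._dst
    GO FROM 100 OVER *
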
  • Add config manager to store configurations that could be changed during the runtime

    GflagsManager is used to maintain all gflags in our program. It works like this:

    1. At start-up, we parse all the gflags passed in and try to register them on the meta server.
    2. We can list all gflags, get a specified gflag, or update mutable gflags in the console.
    3. GflagsManager polls gflags from the meta server periodically and updates those gflags whose values have changed, so mutable gflags can be changed dynamically (e.g. load_data_interval_secs).

    How to use config manager

    There are configs for graph/meta/storage in our system. We can set the config of one specified module, or set configs with the same name to the same value across modules. See the examples below.

    in console

    • We can update those mutable variables like this:

    Set and get a specified module's config.

    UPDATE VARIABLES graph:load_data_interval_secs=10
    GET VARIABLES graph:load_data_interval_secs
    

    It will show something like this:

    ==============================================================
    | module |                    name |  type |    mode | value |
    ==============================================================
    |  GRAPH | load_data_interval_secs | INT64 | MUTABLE |    10 |
    --------------------------------------------------------------
    

    Set and get the config with the same name across all modules.

    UPDATE VARIABLES graph:load_data_interval_secs=10
    GET VARIABLES graph:load_data_interval_secs
    

    It will show something like this:

    ===============================================================
    |  module |                    name |  type |    mode | value |
    ===============================================================
    |   GRAPH | load_data_interval_secs | INT64 | MUTABLE |    10 |
    ---------------------------------------------------------------
    |    META | load_data_interval_secs | INT64 | MUTABLE |    10 |
    ---------------------------------------------------------------
    | STORAGE | load_data_interval_secs | INT64 | MUTABLE |    10 |
    ---------------------------------------------------------------
    
    • For immutable variables, we can only get their values.

    Or we can list configs in all modules or a single module:

    SHOW VARIABLES
    SHOW VARIABLES graph
    

    in code

    For those immutable gflags (used for initialization or never changed), GflagsManager does not affect them, and code uses them just as before (FLAGS_xxx).

    For those mutable gflags, GflagsManager updates their values once it gets new values from the meta server. But if we want to use mutable gflags, we may have conflicts; the gflags documentation says: "These programmatic ways to access flags are thread-safe, but direct access is only thread-compatible." So for mutable gflags, we need to get the value by calling GetCommandLineOption, as in the example below:

         // Fetch the current value of a mutable gflag in a thread-safe way.
         std::string flag;
         gflags::GetCommandLineOption("load_config_interval_secs", &flag);
         // The value is returned as a string; convert it and add a random jitter.
         size_t delayMS = stoi(flag) * 1000 + folly::Random::rand32(900);
    

    GetCommandLineOption only gives you the value as a string, so you may need to convert it to the type you want.

    Some issues for now

    1. For those mutable gflags, we need to specify their names in hard-coded form; any better ideas?
    2. When the console prints results, the type of different rows must be the same, so I convert all values to strings for now (when we list all variables, they contain values of different types).
    3. All uint32 and int32 gflags are converted to int64 for now (the reason is that we can only parse int64, double, bool, and string in the console).
  • Add prefix bloom filter support

    Currently, the data models of Vertex and Edge are as follows:

    (1) Vertex: type(1byte)_PartID(3bytes)_VertexID(8bytes)_TagID(4bytes)_Timestamp(8bytes)
    (2) Edge: type(1byte)_PartID(3bytes)_VertexID(8bytes)_EdgeType(4bytes)_Rank(8bytes)_otherVertexId(8bytes)_Timestamp(8bytes)

    We can use "type(1byte)_PartID(3bytes)_VertexID(8bytes)" as the prefix for the bloom filter; QueryEdgePropsProcessor and QueryVertexPropsProcessor can benefit from this a lot.

  • Implement create space, drop space, add hosts, show hosts , remove hosts via MetaClient

    #200 Supports some admin statements in the Query Engine. Implements the following statements via MetaClient: create space, drop space, add hosts, show hosts, remove hosts.
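
    A minimal sketch of these statements (the host addresses, space name, and option values are illustrative):

    ADD HOSTS 192.168.8.1:44500, 192.168.8.2:44500
    SHOW HOSTS
    CREATE SPACE my_space(partition_num=1024, replica_factor=1)
    DROP SPACE my_space
    REMOVE HOSTS 192.168.8.2:44500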

  • Fixed use of precompiled header

    Currently, PCH does not take effect when building in the out-of-source way, since the .gch file, by design, must be in the same directory as the original header. This PR resolves the problem by generating the .gch file in the source tree, i.e. ${CMAKE_CURRENT_SOURCE_DIR}.

    At the same time, the former implementation of FindPCHSupport did not handle the compile options properly, so the options used for PCH generation and source compilation were different, which is erroneous. This PR also addresses that issue.

  • Update docker related configuration

    • Export the ports of nebula services in the vesoft/nebula-graph docker image
    • Reset the log level in the docker image
    • Set the stderr threshold in the nebula graph configuration file rather than in the graph daemon implementation
    • Update the CI configuration for packaging and building nebula
    • Change the package script parameter for the package type: RPM or DEB


  • Create auto_cherry_pick.yml

    Trigger conditions:

    1. When a PR is merged, check whether it has a label starting with "cherry-pick-".
    2. When a "cherry-pick-" label is added to a PR, check the merge status of the PR.
  • Added tuned profile for Nebula Graph

    To make the system settings more consistent and durable across reboots, we take advantage of the tuned service.

    TBD: Build a separate noarch package nebula-tuned-profile.

  • fix some rocksdb api will fail in PlainTable

    What type of PR is this?

    • [X] bug
    • [ ] feature
    • [X] enhancement

    What problem(s) does this PR solve?

    Issue(s) number:

    Description:

    This problem was found in nebula-ng. Some of our RocksDB API usage is not well verified with PlainTable; for example, whether the data is in an SST file will affect the API result. The main problems are:

    1. Since PlainTable only supports prefix-based Seek, we need to specify prefix_same_as_start not only in prefix, but also in range and rangeWithPrefix.
    2. The length of prefix_extractor needs to be modified, because within PlainTable, if the prefix bloom filter is not inserted, the data can no longer be read by prefix. So to make sure all data can be read, we use the minimum prefix length we use, which is 4.

    This is a simple way to fix the problem. In nebula-ng, I will consider refactoring it a bit.

    How do you solve it?

    Special notes for your reviewer, ex. impact of this fix, design document, etc:

    Checklist:

    Tests:

    • [X] Unit test(positive and negative cases)
    • [ ] Function test
    • [ ] Performance test
    • [ ] N/A

    Affects:

    • [ ] Documentation affected (Please add the label if documentation needs to be modified.)
    • [ ] Incompatibility (If it breaks the compatibility, please describe it and add the label.)
    • [ ] If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
    • [ ] Performance impacted: Consumes more CPU/Memory

    Release notes:

    Please confirm whether to be reflected in release notes and how to describe: Not related.

  • [UT] leader_transfer_test failed.

    Please check the FAQ documentation before raising an issue

    Describe the bug (required): see https://github.com/vesoft-inc/nebula/runs/6398765297?check_suite_focus=true

    Your Environments (required)

    • OS: uname -a
    • Compiler: g++ --version or clang++ --version
    • CPU: lscpu
    • Commit id (e.g. a3ffc7d8)

    How To Reproduce (required)

    Steps to reproduce the behavior:

    1. Step 1
    2. Step 2
    3. Step 3

    Expected behavior

    Additional context

  • fix_predicate_in_where

    What type of PR is this?

    • [x] bug
    • [ ] feature
    • [ ] enhancement

    What problem(s) does this PR solve?

    Issue(s) number:

    close https://github.com/vesoft-inc/nebula/issues/4241

    Description:

    How do you solve it?

    Special notes for your reviewer, ex. impact of this fix, design document, etc:

    Checklist:

    Tests:

    • [ ] Unit test(positive and negative cases)
    • [ ] Function test
    • [ ] Performance test
    • [ ] N/A

    Affects:

    • [ ] Documentation affected (Please add the label if documentation needs to be modified.)
    • [ ] Incompatibility (If it breaks the compatibility, please describe it and add the label.)
    • [ ] If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
    • [ ] Performance impacted: Consumes more CPU/Memory

    Release notes:

    Please confirm whether to be reflected in release notes and how to describe:

    ex. Fixed the bug .....
