Nebula Graph is a distributed, fast open-source graph database featuring horizontal scalability and high availability


English | 中文
A distributed, scalable, lightning-fast graph database


What is Nebula Graph

Nebula Graph is an open-source graph database capable of hosting super large-scale graphs with billions of vertices (nodes) and trillions of edges, with millisecond latency. It delivers enterprise-grade high performance, turning the most complex data sets imaginable into meaningful and useful information.

Below is the architecture of Nebula Graph:

[architecture diagram]

Compared with other graph database solutions, Nebula Graph has the following advantages:

  • Symmetrically distributed
  • Storage and computing separation
  • Horizontal scalability
  • Strong data consistency via the Raft protocol
  • SQL-like query language
  • Role-based access control for higher level security

Quick start

Read the Getting started guide to quickly get going with Nebula Graph.

Please note that you must install Nebula Graph (from source code, RPM/DEB packages, or Docker Compose) before you can actually start using it. If you prefer a video tutorial, visit our YouTube channel.

In case you encounter any problems, be sure to ask us on our official forum.

Documentation

Visualization Tool: Nebula Graph Studio

Visit Nebula Graph Studio for visual exploration of graph data on a web UI.

Supported Clients

Licensing

Nebula Graph is licensed under the Apache 2.0 license, so you can freely download, modify, and deploy the source code to meet your needs. You can also freely deploy Nebula Graph as a back-end service to support your SaaS deployment.

In order to prevent cloud providers from monetizing the project without contributing back, we added Commons Clause 1.0 to the project. As mentioned, we are fully committed to the open source community. We would love to hear your thoughts on the licensing model and are willing to make it more suitable for the community.

Contributing

Contributions are warmly welcomed and greatly appreciated. Here are a few ways you can contribute:

Getting help & Contact

In case you encounter any problems while playing around with Nebula Graph, please reach out for help:

If you like Nebula Graph, please leave us a star.

Comments
  • Spark sstfile generator

    Spark sstfile generator

    Reopened as a new PR after this repo changed from private to public, replacing PR #208.

    A spark job which does the following things:

    A Spark job that does the following:

    1. Parses an input mapping file that maps a Hive table to a tag/edge, in which the table's (logical) primary key is identified.
    2. Uses the Nebula native client to encode a tag's key and values.
    3. Defines a custom Hadoop OutputFormat and RecordWriter, which generates one sub-directory per partition per worker in the specified SST file output directory.
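The routing step above can be sketched in Python. The partitioning function and directory layout here are assumptions for illustration, not the generator's actual implementation:

```python
import os

def partition_id(vertex_id: int, num_parts: int) -> int:
    """Hypothetical partitioner: map a vertex id to a 1-based partition."""
    return vertex_id % num_parts + 1

def sst_output_path(base_dir: str, vertex_id: int, num_parts: int) -> str:
    """Route an encoded row to a per-partition sub-directory, as the
    custom OutputFormat/RecordWriter described above would do."""
    part = partition_id(vertex_id, num_parts)
    return os.path.join(base_dir, str(part))
```

With 10 partitions, vertex 101 would land in sub-directory 2 of the output dir.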

  • Improve heartbeat between metad and storaged by MetaClient

    Improve heartbeat between metad and storaged by MetaClient

    1. Add LastUpdateTimeMan to record the latest metadata change (insert, update, delete) time on the MetaServer side.
    2. Enrich the information (lastUpdateTime, partition leader distribution) exchanged via heartbeat between the MetaServer and the MetaClient on the storage node.
    3. Optimize the load-data logic inside the meta client using lastUpdateTime, which makes metadata synchronization smarter and more timely.
    4. Remove sendHeartBeat_ and load_data_interval_secs from the MetaClient class, and add the bool inStoraged_ to indicate whether a meta client runs inside storaged.

    close #1173 close #1060
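The lastUpdateTime optimization can be sketched as follows; the class and method names are hypothetical, not taken from the PR:

```python
class MetaClientSketch:
    """Sketch of the heartbeat optimization: metadata is reloaded only
    when the server reports a change time newer than the cached one."""

    def __init__(self):
        self.last_update_time = 0
        self.loads = 0  # counts how often load_data() actually ran

    def load_data(self):
        self.loads += 1  # stand-in for fetching metadata from the server

    def on_heartbeat_response(self, server_last_update_time: int):
        # Skip the reload entirely when nothing changed on the MetaServer.
        if server_last_update_time > self.last_update_time:
            self.load_data()
            self.last_update_time = server_last_update_time

c = MetaClientSketch()
c.on_heartbeat_response(100)  # metadata changed -> reload
c.on_heartbeat_response(100)  # unchanged -> skip
c.on_heartbeat_response(250)  # changed again -> reload
```

This is why the fixed load_data_interval_secs polling can be removed: refresh work is driven by actual changes rather than a timer.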

  • Support show create tag/edge xxx, show create space xxx SQL

    Support show create tag/edge xxx, show create space xxx SQL

    Support show create tag/edge xxx and show create space xxx. Closes #439.

    This PR contains the TTL schema changes (PR #422), because it depends on the attributes of the schema.

  • Ttl schema

    Ttl schema

    TTL feature in syntax and meta schema. Contains the following content:

    create tag/edge
    alter tag/edge
    show create tag/edge
    
    describe tag/edge  # displays only column information, not schema attributes
    

    show create tag/edge is in PR #439.
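The usual TTL rule can be sketched in a few lines. This is an assumed formulation for illustration (a row expires once the TTL column's value plus TTL_DURATION has passed), not code from the PR:

```python
def is_expired(ttl_col_value: int, ttl_duration: int, now: int) -> bool:
    """A row is expired once ttl_col_value + ttl_duration has been reached.
    All values are epoch seconds (an assumption for this sketch)."""
    return now >= ttl_col_value + ttl_duration

# A row inserted with ttl_col = 1000 and TTL_DURATION = 100 is still
# alive at t = 1099 and expired at t = 1100.
```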

  • Access rights control

    Access rights control

    Need to establish the entire access rights control mechanism. Here are the current thoughts:

    1. There is one God user, admin, like root on Linux systems. admin is the only user who can create a Graph Space.

    2. All other access rights are based on Graph Spaces. There are three types of rights for each Graph Space: Admin, User, Guest.

       Admin is an administrator for a given Graph Space. An Admin can delete the Graph Space, manage the schema in the Graph Space, and access (read and write) the data in the Graph Space.

       User is a normal user for a given Graph Space. A User can access (read and write) the data in the Graph Space.

       Guest is a read-only role for a given Graph Space. A Guest cannot modify the data in the Graph Space.

    3. A normal user is associated with a default Graph Space, which becomes the current Graph Space as soon as the user logs on.

    4. A normal user can have different access rights on different Graph Spaces.
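The per-space role model described above amounts to a capability table. A minimal sketch (the capability names are assumptions for illustration):

```python
# Capabilities granted by each role within a single Graph Space.
CAPABILITIES = {
    "Admin": {"drop_space", "manage_schema", "read", "write"},
    "User":  {"read", "write"},
    "Guest": {"read"},
}

def can(role: str, action: str) -> bool:
    """Check whether a role may perform an action in its Graph Space.
    Unknown roles get no capabilities at all."""
    return action in CAPABILITIES.get(role, set())
```

So a Guest can read but not write, while only an Admin may drop the space or change its schema.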

  •     Issue#192 Support multiple edge types in GO statement

    Issue#192 Support multiple edge types in GO statement

    Summary: implemented traversal over multiple edge types (including over *) to support all edge types in the GO statement.

    1. Modifications to the storage interface:
       1.1 Modified the GetNeighborsRequest structure to pass a list of edge_type to the storage layer.
       1.2 Added an over_all_edges flag to mark "over *", and modified the PropDef structure to represent both tag_id and edge_type.
       1.3 Added the EdgeData structure to represent the edge data returned to the client.
    2. For "over *", the current implementation first gets all edge types from the meta service, then calls the storage interface.
    3. When no yield clause exists, deleted the default rename ("dst_" to "id").
    4. Deleted the getOutBound and getInBound interfaces, because the edges passed to the storage may include both in and out edges.
    5. Added a map from edge type to edge name (toEdgeName) and a map from space id to all edge types in the meta client.
    6. Returns the default value of the corresponding type when some properties do not exist.
  • Add config manager to store configurations that could be changed during the runtime

    Add config manager to store configurations that could be changed during the runtime

    GflagsManager is used to maintain all gflags in our program. It works like this:

    1. On start-up, we parse all gflags passed in and try to register them on the meta server.
    2. In the console, we can list all gflags, get a specified gflag, or update mutable gflags.
    3. GflagsManager polls gflags from the meta server periodically and updates those whose values have changed, so mutable gflags can be changed dynamically (e.g. load_data_interval_secs).

    How to use config manager

    There are configs for graph/meta/storage in our system. We can set the config of one specified module, or set configs of the same name to the same value across modules. See the examples below.

    in console

    • We can update those mutable variables like this:

    Set and get a specified module's config.

    UPDATE VARIABLES graph:load_data_interval_secs=10
    GET VARIABLES graph:load_data_interval_secs
    

    The output looks like this:

    ==============================================================
    | module |                    name |  type |    mode | value |
    ==============================================================
    |  GRAPH | load_data_interval_secs | INT64 | MUTABLE |    10 |
    --------------------------------------------------------------
    

    Set and get a config of the same name across all modules:

    UPDATE VARIABLES graph:load_data_interval_secs=10
    GET VARIABLES graph:load_data_interval_secs
    

    The output looks like this:

    ===============================================================
    |  module |                    name |  type |    mode | value |
    ===============================================================
    |   GRAPH | load_data_interval_secs | INT64 | MUTABLE |    10 |
    ---------------------------------------------------------------
    |    META | load_data_interval_secs | INT64 | MUTABLE |    10 |
    ---------------------------------------------------------------
    | STORAGE | load_data_interval_secs | INT64 | MUTABLE |    10 |
    ---------------------------------------------------------------
    
    • For immutable variables, we can only get their values.

    We can also list configs in all modules or in a single module:

    SHOW VARIABLES
    SHOW VARIABLES graph
    

    in code

    For immutable gflags (used for initialization or never changed), GflagsManager does not affect them, and code uses them just as before (FLAGS_xxx).

    For mutable gflags, GflagsManager updates the value once it gets a new one from the meta server. However, using mutable gflags directly can cause conflicts; the gflags documentation says: "These programmatic ways to access flags are thread-safe, but direct access is only thread-compatible." So for mutable gflags, we need to read the value by calling GetCommandLineOption, as in the example below:

         std::string flag;
         // Thread-safe read of a mutable gflag; the value comes back as a string.
         gflags::GetCommandLineOption("load_config_interval_secs", &flag);
         size_t delayMS = stoi(flag) * 1000 + folly::Random::rand32(900);
    

    GetCommandLineOption only gives you the value as a string, so you may need to convert it to the type you want.

    Some issues for now

    1. For mutable gflags, we need to hard-code their names. Any better ideas?
    2. When the console prints results, every row in a column must have the same type, so all values are converted to strings for now (listing all variables mixes values of different types).
    3. All uint32 and int32 gflags are converted to int64 for now, because the console can only parse int64, double, bool, and string.
  • Add prefix bloom filter support

    Add prefix bloom filter support

    Currently, the data models of Vertex and Edge are as follows:

     (1) Vertex: type(1byte)_PartID(3bytes)_VertexID(8bytes)_TagID(4bytes)_Timestamp(8bytes)
     (2) Edge: type(1byte)_PartID(3bytes)_VertexID(8bytes)_EdgeType(4bytes)_Rank(8bytes)_otherVertexId(8bytes)_Timestamp(8bytes)

    We can use "type(1byte)_PartID(3bytes)_VertexID(8bytes)" as the key prefix for the bloom filter; QueryEdgePropsProcessor and QueryVertexPropsProcessor can benefit from this a lot.
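The 12-byte prefix can be sketched by packing the three fields in order. The byte order here is an assumption for illustration; the real encoding is defined by the storage layer:

```python
def bloom_prefix(key_type: int, part_id: int, vertex_id: int) -> bytes:
    """Build the 12-byte prefix described above:
    type (1 byte) + PartID (3 bytes) + VertexID (8 bytes).
    Big-endian order is assumed here, not taken from the source."""
    return (key_type.to_bytes(1, "big")
            + part_id.to_bytes(3, "big")
            + vertex_id.to_bytes(8, "big"))

prefix = bloom_prefix(1, 5, 42)
```

Because both vertex keys and edge keys share this leading layout, one prefix bloom filter can rule out SST files for both QueryVertexPropsProcessor and QueryEdgePropsProcessor lookups.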

  • Implement create space, drop space, add hosts, show hosts , remove hosts via MetaClient

    Implement create space, drop space, add hosts, show hosts , remove hosts via MetaClient

    #200 Support some admin sentences in the Query Engine. Implement the following statements via MetaClient: create space, drop space, add hosts, show hosts, remove hosts.

  • Fixed use of precompiled header

    Fixed use of precompiled header

    Currently, PCH does not take effect when building out-of-source, since the .gch file, by design, must be in the same directory as the original header. This PR resolves the problem by generating the .gch file in the source tree, i.e. ${CMAKE_CURRENT_SOURCE_DIR}.

    Meanwhile, the former implementation of FindPCHSupport did not handle the compile options properly, so the options used for PCH generation and source compilation differed, which is erroneous. This PR also addresses that issue.

  • Update docker related configuration

    Update docker related configuration

    • Export ports of nebula services in vesoft/nebula-graph docker image
    • Reset log level in docker image
    • Set stderr threshold in nebula graph configure file rather than graph daemon implementation
    • Update CI configuration for packaging and building Nebula
    • Change the package script parameter for the package type: RPM or DEB


  • The result of property-accessing expression is incorrect

    The result of property-accessing expression is incorrect

    Describe the bug (required) As title.

    Your Environments (required) nebula v3.3.0

    How To Reproduce(required)

    ([email protected]) [nba]> match (v:player)-[e]->(n) with {key:v} as m return m.key.player.age limit 3
    +------------------+
    | m.key.player.age |
    +------------------+
    | __NULL__         |
    | __NULL__         |
    | __NULL__         |
    +------------------+
    Got 3 rows (time spent 17007/17372 us)
    

    Expected behavior Get the correct property.

  • Fix update sessions when leader change happens

    Fix update sessions when leader change happens

    What type of PR is this?

    • [x] bug
    • [ ] feature
    • [ ] enhancement

    What problem(s) does this PR solve?

    Issue(s) number:

    Close https://github.com/vesoft-inc/nebula-ent/issues/2152 https://github.com/vesoft-inc/nebula-ent/issues/2176

    Description:

    UpdateSessions() should deal with errors like leader change.

    How do you solve it?

    Special notes for your reviewer, ex. impact of this fix, design document, etc:

    Checklist:

    Tests:

    • [ ] Unit test(positive and negative cases)
    • [ ] Function test
    • [ ] Performance test
    • [ ] N/A

    Affects:

    • [ ] Documentation affected (Please add the label if documentation needs to be modified.)
    • [ ] Incompatibility (If it breaks the compatibility, please describe it and add the label.)
    • [ ] If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
    • [ ] Performance impacted: Consumes more CPU/Memory

    Release notes:

    Please confirm whether to be reflected in release notes and how to describe:

    ex. Fixed the bug .....

  • `skip`+`limit` return results, some data is missing

    `skip`+`limit` return results, some data is missing

    • NebulaGraph version is v3.3

    When paginating the official basketballplayer data set with skip + limit in Cypher, I found a weird rule: as long as limit is no more than about one-third of skip the result is empty, and just above that threshold some rows go missing. After about one hour, all data is lost:


    statements as below:

    ([email protected]) [basketballplayer]> match  ()-[e]->() return e  skip 100 limit 34
    +-----------------------------------------------------------------------+
    | e                                                                     |
    +-----------------------------------------------------------------------+
    | [:follow "player109"->"player125" @0 {degree: 90}]                    |
    | [:serve "player111"->"team200" @0 {end_year: 2018, start_year: 2016}] |
    +-----------------------------------------------------------------------+
    Got 2 rows (time spent 12.682ms/20.443416ms)
    
    Mon, 09 Jan 2023 06:48:09 UTC
    
    ([email protected]) [basketballplayer]> match  ()-[e]->() return e  skip 100 limit 3
    +---+
    | e |
    +---+
    +---+
    Empty set (time spent 7.91ms/8.722916ms)
    
    Mon, 09 Jan 2023 06:48:23 UTC
    
    ([email protected]) [basketballplayer]> match  ()-[e]->() return e  skip 100 limit 30
    +---+
    | e |
    +---+
    +---+
    Empty set (time spent 24.068ms/31.6455ms)
    
    Mon, 09 Jan 2023 06:49:42 UTC
    

    ref: https://discuss.nebula-graph.com.cn/t/topic/11863
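The expected SKIP/LIMIT semantics can be stated in one line of Python, for contrast with the buggy output above (the row count of 134 is an assumption for illustration):

```python
def paginate(rows, skip: int, limit: int):
    """Expected SKIP/LIMIT semantics: drop the first `skip` rows,
    then return at most `limit` of the remainder."""
    return rows[skip:skip + limit]

edges = list(range(134))        # pretend the data set has 134 edges
page = paginate(edges, 100, 34)
# expected: 34 rows (indexes 100..133), not the 2 rows the bug returns
```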

  • Rethink variable scope validation for path pattern

    Rethink variable scope validation for path pattern

    Please check the FAQ documentation before raising an issue

    Describe the bug (required)

    At present, variable validation in a path pattern only happens in the MATCH clause, but when the path pattern is placed in an expression context, the variables do not behave as expected.

    for example:

    MATCH (v:player{name: 'Tim Duncan'})-[e:like*0..2]-(v2)
    WHERE size([i in e WHERE (v)-[i]-(v2) | i])>1
    RETURN count(*) AS cnt
    
    //------
    
    MATCH (v:player{name: 'Tim Duncan'})-[e:like*0..2]-(v2)-[i]-(v3)
    WHERE size([i in e WHERE (v)-[i]-(v2) | i])>1
    RETURN count(*) AS cnt
    

    The tests above are from PR #5215 and report the following error:

    [ERROR (-1004)]: SyntaxError: syntax is ambiguous near `(v)-[i]-(v2)'
    

    Your Environments (required)

    • OS: uname -a
    • Compiler: g++ --version or clang++ --version
    • CPU: lscpu
    • Commit id (e.g. a3ffc7d8)

    How To Reproduce(required)

    Steps to reproduce the behavior:

    1. Step 1
    2. Step 2
    3. Step 3

    Expected behavior

    Additional context

  • Create edge index failed

    Create edge index failed

    Describe the bug The edge index could not be created successfully as expected.

    Environments nebula 3.3.0

    How To Reproduce

    ([email protected]) [test]> CREATE EDGE IF NOT EXISTS E2(
          id int NOT NULL DEFAULT 0 COMMENT "primary key",
          name string NOT NULL,
          createDate DATETIME, location geography(polygon),
          isVisited bool COMMENT "kHop search flag",
          nickName TIME DEFAULT time()
          )
          TTL_DURATION = 100, TTL_COL = "id", COMMENT = "TAG B";
    Execution succeeded (time spent 2535/2793 us)
    
    ([email protected]) [test]>  CREATE EDGE INDEX IF NOT EXISTS idx_E_3 on E2(isVisited, id, nickName, createDate, name);
    [ERROR (-1005)]: Invalid param!
    
    ([email protected]) [test]>  CREATE EDGE INDEX IF NOT EXISTS idx_E_3 on E2(isVisited, id, nickName, createDate);
    Execution succeeded (time spent 1863/2149 us)
    
    

    Expected behavior The index should be created successfully.
