StarRocks

StarRocks (formerly known as DorisDB) is a next-generation, sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.

Technology

  • Native vectorized SQL engine: StarRocks adopts vectorization technology to make full use of the CPU's parallel computing power, achieving sub-second query returns in multi-dimensional analyses, which is 5 to 10 times faster than previous systems.
  • Simple architecture: StarRocks does not rely on any external systems. The simple architecture makes it easy to deploy, maintain and scale out. StarRocks also provides high availability, reliability, scalability and fault tolerance.
  • Standard SQL: StarRocks supports ANSI SQL syntax (fully supporting TPC-H and TPC-DS). It is also compatible with the MySQL protocol, so various clients and BI software can be used to access StarRocks.
  • Smart query optimization: StarRocks can optimize complex queries through CBO (Cost Based Optimizer). With a better execution plan, the data analysis efficiency will be greatly improved.
  • Real-time updates: The update model of StarRocks can perform upsert/delete operations according to the primary key, achieving efficient queries even under concurrent updates (a sketch follows this list).
  • Intelligent materialized view: The materialized views of StarRocks are automatically updated during data import and automatically selected when a query is executed.
  • Convenient query federation: StarRocks allows direct access to data in Hive, MySQL, and Elasticsearch without importing it.
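
A minimal sketch of the primary key update model mentioned above (table and column names are hypothetical, not from this document):

    CREATE TABLE user_profiles (
        user_id BIGINT,
        city    VARCHAR(64),
        score   INT
    ) PRIMARY KEY (user_id)
    DISTRIBUTED BY HASH (user_id) BUCKETS 8;

    -- Upsert: loading a row whose key already exists replaces the old row.
    INSERT INTO user_profiles VALUES (1, 'Beijing', 42);
    INSERT INTO user_profiles VALUES (1, 'Shanghai', 43);

    -- Delete by primary key.
    DELETE FROM user_profiles WHERE user_id = 1;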

Use cases

  • StarRocks supports not only high-concurrency, low-latency point queries, but also high-throughput ad-hoc queries.
  • StarRocks unifies batch and near-real-time streaming data ingestion.
  • Pre-aggregations, flat tables, and star and snowflake schemas are all supported, and all run at enhanced speed.
  • StarRocks hybridizes serving and analytical processing (HSAP) in an easy way. The minimalist architectural design reduces the complexity and maintenance cost of StarRocks and increases its reliability and scalability.

Install

Download the current release here.
For detailed instructions, please refer to deploy.

LICENSE

Code in this repository is provided under the Elastic License 2.0. Some portions are available under open source licenses. Please see our FAQ.

Contributing to StarRocks

Many thanks for your interest in StarRocks! To have your pull request accepted, please follow CONTRIBUTING.md.

Comments
  • [BugFix] fix race condition of workgroup scheduling (backport #12604)

    This is an automatic backport of pull request #12604 done by Mergify.


  • [Feature]Encode integers/binary per column for exchange

    What type of PR is this:

    • [ ] BugFix
    • [x] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues does this PR fix:

    Encode integers/binary per column for exchange, controlled by the session variable transmission_encode_level: if transmission_encode_level & 2, integer columns are encoded with streamvbyte (whether ordered or not); if transmission_encode_level & 4, binary columns are compressed with LZ4; if transmission_encode_level & 1, adaptive encoding is enabled.

    E.g., if transmission_encode_level = 7, SR will adaptively encode number and string columns according to the encoding ratio (< 0.9); if transmission_encode_level = 6, SR will force encoding of number and string columns.

    In short, for transmission_encode_level, 2 stands for encoding integers (and types backed by integers) and 4 stands for encoding strings; JSON and object columns are left to be supported later.
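
    As a quick illustration of how the bits combine (a sketch based on the description above, not from the original PR):

    mysql> SET transmission_encode_level = 3; -- 1 + 2: adaptively encode integer columns only
    mysql> SET transmission_encode_level = 6; -- 2 + 4: force-encode integer and string columns
    mysql> SET transmission_encode_level = 7; -- 1 + 2 + 4: adaptively encode integer and string columns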

    NOTE:

    To stay compatible with older versions when downgrading/upgrading across this PR, transmission_encode_level must be set to 0, and already-running queries must be allowed to finish before replacing the binaries.

    Problem Summary (Required):

    Encode integers for exchange to reduce data size at the cost of a little CPU.

    Effects:

    mysql> set transmission_encode_level =7;
    mysql> select max(l_linenumber), max(l_orderkey),max(l_partkey),max(l_tax),max(l_discount),max(l_extendedprice),max(l_quantity),max(l_shipdate),min(l_suppkey),max(l_commitdate) from orders join lineitem on o_orderkey=l_partkey  where    l_orderkey< 500000000;
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    | max(l_linenumber) | max(l_orderkey) | max(l_partkey) | max(l_tax) | max(l_discount) | max(l_extendedprice) | max(l_quantity) | max(l_shipdate) | min(l_suppkey) | max(l_commitdate) |
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    |                 7 |       499999974 |       20000000 |       0.08 |            0.10 |            104798.50 |           50.00 | 1998-12-01      |              1 | 1998-10-31        |
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    1 row in set (4.32 sec)
    
    mysql> set transmission_encode_level =0;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select max(l_linenumber), max(l_orderkey),max(l_partkey),max(l_tax),max(l_discount),max(l_extendedprice),max(l_quantity),max(l_shipdate),min(l_suppkey),max(l_commitdate) from orders join lineitem on o_orderkey=l_partkey  where    l_orderkey< 500000000;
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    | max(l_linenumber) | max(l_orderkey) | max(l_partkey) | max(l_tax) | max(l_discount) | max(l_extendedprice) | max(l_quantity) | max(l_shipdate) | min(l_suppkey) | max(l_commitdate) |
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    |                 7 |       499999974 |       20000000 |       0.08 |            0.10 |            104798.50 |           50.00 | 1998-12-01      |              1 | 1998-10-31        |
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    1 row in set (5.48 sec)
    

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
  • [Refactor] Refactor event based compaction framework

    What type of PR is this:

    • [ ] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [x] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues does this PR fix:

    Fixes #10503

    Problem Summary (Required):

    This PR only contains the refactoring of the event-based compaction framework:

    1. Optimize CompactionManager's update_tablet() to avoid multiple calls and full copies.
    2. Support compaction on tablets with missing versions.
    3. Optimize the default base compaction score calculation strategy.
    4. Optimize the compaction scheduler to avoid creating unnecessary compaction tasks.
    5. Optimize the cumulative compaction scheduler interval.
    6. Fix a duplicate compaction bug.

    The default compaction strategy still needs tuning after more testing; meanwhile, we will support a size-tiered compaction strategy in subsequent PRs.

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function
  • [Doc] add date functions and update other docs

    What type of PR is this:

    • [ ] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues does this PR fix:

    Fixes #

    Problem Summary (Required):

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
  • [BugFix] fix left join on big chunk

    What type of PR is this:

    • [x] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues does this PR fix:

    Fixes https://github.com/StarRocks/StarRocksTest/issues/1355

    Problem Summary (Required):

    Left outer join with a small left side and a big right side:

    • Add _probe_row_finished to indicate whether the probe row has finished, since an unmatched left row still needs to be emitted

    Test case:

    • https://github.com/StarRocks/StarRocksTest/pull/1360

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
  • [BugFix] Fix the page selection bug in late materialization for large columns

    What type of PR is this:

    • [x] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues does this PR fix:

    Fixes #

    Problem Summary (Required):

    The page selection in late materialization depends on the next_row context, which currently is only updated after the whole selection procedure. Since this selection procedure may contain multiple select operations for a large column, the context is not updated in time for the subsequent operations, which causes unexpected selection results.

    This PR advances the next_row context in each selection operation.

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function
  • [BugFix] string_to_float_internal may lose some precision

    For 2.97, string_to_float_internal computes 2 + 97 / 100, which yields 2.9699999999999998; actually it should be computed as 297 / 100, which yields 2.97.
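
    A hypothetical SQL-level illustration (not from the PR; the exact display depends on the build): before this fix, the cast could return 2.9699999999999998 instead of 2.97.

    mysql> SELECT CAST('2.97' AS DOUBLE);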

    Signed-off-by: xyz [email protected]

    What type of PR is this:

    • [x] bug
    • [ ] feature
    • [ ] enhancement
    • [ ] refactor
    • [ ] others

    Which issues does this PR fix:

    Fixes #9633

    Problem Summary (Required):

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
  • [Doc] modify varchar length limit (backport #12723)

    This is an automatic backport of pull request #12723 done by Mergify.


  • [Enhancement] Add tables_config in information_schema db

    What type of PR is this:

    • [ ] bug
    • [ ] feature
    • [x] enhancement
    • [ ] refactor
    • [ ] others

    Which issues does this PR fix:

    Fixes #9498

    Problem Summary (Required):

    Add information_schema.tables_config to show table configurations. This table contains columns such as PRIMARY_KEY and PARTITION_KEY.
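
    A hedged usage sketch, assuming the column names mentioned above:

    mysql> SELECT TABLE_NAME, PRIMARY_KEY, PARTITION_KEY FROM information_schema.tables_config;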

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
  • [Feature]Drop lake table

    What type of PR is this:

    • [ ] bug
    • [x] feature
    • [ ] enhancement
    • [ ] refactor
    • [ ] others

    Which issues does this PR fix:

    Fixes #

    Problem Summary (Required):

    Drop lake table:

    1. Use a daemon thread to drop lake tablets and delete shards.
    2. Persist shard info in the image and journal.

  • [BugFix] array_append/remove solve only null array

    Signed-off-by: fzhedu [email protected]

    What type of PR is this:

    • [x] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues does this PR fix:

    Fixes https://github.com/StarRocks/starrocks/issues/10123

    Problem Summary (Required):

    array_append/remove should return directly for an only-null array.
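
    A hypothetical illustration (table t and column arr_col are invented): when arr_col contains only NULL values, array_append should return NULL for each row directly instead of touching the array data.

    mysql> SELECT array_append(arr_col, 1) FROM t;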

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function
  • [Enhancement] Instead of using def for null values, use empty to represent null (backport #16265)

    This is an automatic backport of pull request #16265 done by Mergify.


  • [Enhancement][Cherry-pick][Branch-2.4] Make schema changed table not to be moved to trash

    What type of PR is this:

    • [ ] BugFix
    • [ ] Feature
    • [x] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues does this PR fix:

    Fixes #13651

    Problem Summary (Required):

    When a table schema change is complete, the old table is moved to the trash directory, which takes up a lot of disk space. This PR fixes this issue.

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function

    Bugfix cherry-pick branch check:

    • [ ] I have checked the version labels for the target branches to which this PR will be auto-backported
      • [ ] 2.5
      • [ ] 2.4
      • [ ] 2.3
      • [ ] 2.2
  • [BugFix] DictMappingRewriter support compoundPredicate (backport #15737)

    This is an automatic backport of pull request #15737 done by Mergify.


  • [Enhancement] Use thrift 0.17.0

    What type of PR is this:

    • [x] Enhancement

    Problem Summary (Required):

    In Apache Thrift 0.9.3 to 0.13.0, malicious RPC clients could send short messages which would result in a large memory allocation, potentially leading to denial of service.

    Bump up the version to the latest 0.17.0.

    Tested that both insert and select queries work with:

    • FE w/ thrift 0.13.0 and BE w/ thrift 0.17.0
    • FE w/ thrift 0.17.0 and BE w/ thrift 0.13.0

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function

    Bugfix cherry-pick branch check:

    • [ ] I have checked the version labels for the target branches to which this PR will be auto-backported
      • [X] 2.5
      • [ ] 2.4
      • [ ] 2.3
      • [ ] 2.2
  • Rename data distribution type name

    Refactoring request

    The current DataPartition names are chaotic and not straightforward:

    • UNPARTITIONED means broadcast
    • RANDOM means round robin
    • HASH_PARTITIONED means partitioned by hash
Related tags
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

Overview GridDB is Database for IoT with both NoSQL interface and SQL Interface. Please refer to GridDB Features Reference for functionality. This rep

Jan 8, 2023
Velox is a new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

Velox is a C++ database acceleration library which provides reusable, extensible, and high-performance data processing components

Jan 8, 2023
A very fast lightweight embedded database engine with a built-in query language.

upscaledb 2.2.1 Fri 10 Mar 21:33:03 CET 2017 (C) Christoph Rupp, [email protected]; http://www.upscaledb.com This is t

Dec 30, 2022
A redis module, similar to redis zset, but you can set multiple scores for each member to support multi-dimensional sorting

TairZset: Support multi-score sorting zset Introduction Chinese TairZset is a data structure developed based on the redis module. Compared with the na

Dec 1, 2022
The database built for IoT streaming data storage and real-time stream processing.

Dec 26, 2022
A mini database for learning database

Nov 14, 2022
SiriDB is a highly-scalable, robust and super fast time series database

SiriDB is a highly scalable, robust and super fast time series database. Built from the ground up, SiriDB uses a unique mechanism to operate without a global index and allows server resources to be added on the fly. SiriDB's unique query language includes dynamic grouping of time series for easy analysis over large numbers of time series.

Jan 9, 2023
TimescaleDB is an open-source database designed to make SQL scalable for time-series data.

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

Jan 2, 2023
An open-source time series database designed for simplicity, ease of use, and high performance; supports Linux and Windows. Time Series Database

Pinusdb (松果时序数据库) is a time series database designed for small and medium-scale scenarios (fewer than 100,000 devices, less than 1 billion records generated per day). It is designed to be simple, easy to use, and high-performance. It is operated through SQL statements, has a very low learning and usage cost, and provides rich functionality and good performance. Our goal is to become the simplest, easiest-to-use, and most robust single-node time series

Nov 19, 2022
A friendly and lightweight C++ database library for MySQL, PostgreSQL, SQLite and ODBC.

QTL QTL is a C++ library for accessing SQL databases and currently supports MySQL, SQLite, PostgreSQL and ODBC. QTL is a lightweight library that con

Dec 12, 2022
ObjectBox C and C++: super-fast database for objects and structs

ObjectBox Embedded Database for C and C++ ObjectBox is a superfast C and C++ database for embedded devices (mobile and IoT), desktop and server apps.

Dec 23, 2022
dqlite is a C library that implements an embeddable and replicated SQL database engine with high-availability and automatic failover

dqlite dqlite is a C library that implements an embeddable and replicated SQL database engine with high-availability and automatic failover. The acron

Jan 9, 2023
ESE is an embedded / ISAM-based database engine, that provides rudimentary table and indexed access.

Extensible-Storage-Engine A Non-SQL Database Engine The Extensible Storage Engine (ESE) is one of those rare codebases having proven to have a more th

Dec 22, 2022
Nebula Graph is a distributed, fast open-source graph database featuring horizontal scalability and high availability

Nebula Graph is an open-source graph database capable of hosting super large scale graphs with dozens of billions of vertices (nodes) and trillions of edges, with milliseconds of latency.

Dec 24, 2022
OceanBase is an enterprise distributed relational database with high availability, high performance, horizontal scalability, and compatibility with SQL standards.

What is OceanBase database OceanBase Database is a native distributed relational database. It is developed entirely by Alibaba and Ant Group. OceanBas

Jan 4, 2023
Config and tools for config of tasmota devices from mysql database

tasmota-sql Tools for management of tasmota devices based on mysql. The tasconfig command can load config from tasmota and store in sql, or load from

Jan 8, 2022
Serverless SQLite database read from and write to Object Storage Service, run on FaaS platform.

serverless-sqlite Serverless SQLite database read from and write to Object Storage Service, run on FaaS platform. NOTES: This repository is still in t

May 12, 2022
Trilogy is a client library for MySQL-compatible database servers, designed for performance, flexibility, and ease of embedding.

Dec 31, 2022
DB Browser for SQLite (DB4S) is a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite.

DB Browser for SQLite What it is DB Browser for SQLite (DB4S) is a high quality, visual, open source tool to create, design, and edit database files c

Jan 2, 2023