Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring a very recent C++ compiler (with C++20 support) and very recent versions of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything on your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).
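
A quick way to check that the frozen toolchain's prerequisites are met (a minimal sketch; either container runtime is sufficient):

$ docker --version || podman --version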

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
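
If you want faster incremental builds during development, you can build a development mode instead (a sketch, assuming configure.py's --mode flag; the available mode names may vary between versions):

$ ./tools/toolchain/dbuild ./configure.py --mode=dev
$ ./tools/toolchain/dbuild ninja build/dev/scylla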

For further information, please see HACKING.md.

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode flag is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.
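
For example, to give the node two CPU cores and a fixed memory budget instead (a sketch; --smp and --memory are standard Seastar options):

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 2 --memory 4G --developer-mode 1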

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See the test.py manual.
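
A typical invocation through the frozen toolchain might look like the following (a sketch; test selection flags may differ between versions):

$ ./tools/toolchain/dbuild ./test.py --mode=release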

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.
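
As a minimal sketch, the DynamoDB™ API can be enabled by setting the Alternator options at startup (the option values here are assumptions; see the Alternator documentation for the full set of options and write isolation modes):

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1 \
    --alternator-port 8000 --alternator-write-isolation always_use_lwt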

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

  • The users mailing list and Slack channel are for users to discuss configuration, management, and operations of open-source ScyllaDB.
  • The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.
Comments
  • c-s latency caused by high latency from peer node

    1. Start 2 nodes n1, n2 using recent scylla master 1fd701e
    2. Enable slow query tracing on both nodes:
       curl -X POST "http://127.0.0.1:10000/storage_service/slow_query?enable=true&fast=false&threshold=80000"
       curl -X POST "http://127.0.0.2:10000/storage_service/slow_query?enable=true&fast=false&threshold=80000"
    3. Start c-s:
       cassandra-stress write no-warmup cl=TWO n=5000000 -schema 'replication(factor=2)' -port jmx=6868 -mode cql3 native -rate threads=200 -col 'size=FIXED(5) n=FIXED(8)' -pop seq=1500000000..2500000000
    4. Run repair to make c-s latency high enough to trigger the slow query tracing

    See the following trace: node 127.0.0.2 applies the write very fast (less than 100us), while the remote node 127.0.0.1 took 295677 us. This means the 300ms c-s latency seen by the client (c-s) was mostly contributed by the remote node. Due to the tracing issues I reported in https://github.com/scylladb/scylla/issues/9403, we do not know where the time on the remote node was spent. It might be disk, network or cpu contention. But I have a feeling the contention comes from the network when repair runs, since we do not have a network scheduler. So the theory is that the remote node applies the write very quickly, but either the rpc message carrying the request or the one carrying the response is contended, so in the end node 127.0.0.2 got the response with a high latency.
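
    To find which sessions the slow query tracing captured (assuming the standard system_traces.node_slow_log table that slow query tracing writes to), one can run:

    cqlsh> SELECT session_id, node_ip, duration FROM system_traces.node_slow_log;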

    cqlsh> SELECT * from system_traces.events WHERE session_id=ea0a5cc0-2021-11ec-be32-b254958ec4a2;
    
     session_id                           | event_id                             | activity                                                                                           | scylla_parent_id | scylla_span_id  | source    | source_elapsed | thread
    --------------------------------------+--------------------------------------+----------------------------------------------------------------------------------------------------+------------------+-----------------+-----------+----------------+---------
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea0a770a-2021-11ec-be32-b254958ec4a2 |                                                                                    Checking bounds |                0 | 373048741859841 | 127.0.0.2 |              0 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea0a770f-2021-11ec-be32-b254958ec4a2 |                                                                             Processing a statement |                0 | 373048741859841 | 127.0.0.2 |              0 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea0a781b-2021-11ec-be32-b254958ec4a2 | Creating write handler for token: -6493410074079723942 natural: {127.0.0.1, 127.0.0.2} pending: {} |                0 | 373048741859841 | 127.0.0.2 |             27 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea0a782e-2021-11ec-be32-b254958ec4a2 |                                  Creating write handler with live: {127.0.0.1, 127.0.0.2} dead: {} |                0 | 373048741859841 | 127.0.0.2 |             29 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea0a7850-2021-11ec-be32-b254958ec4a2 |                                                                 X Sending a mutation to /127.0.0.1 |                0 | 373048741859841 | 127.0.0.2 |             32 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea0a786a-2021-11ec-be32-b254958ec4a2 |                                                                     X Executing a mutation locally |                0 | 373048741859841 | 127.0.0.2 |             35 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea0a7993-2021-11ec-be32-b254958ec4a2 |                                                            Z Finished executing a mutation locally |                0 | 373048741859841 | 127.0.0.2 |             65 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea0a799c-2021-11ec-be32-b254958ec4a2 |                                                                     Got a response from /127.0.0.2 |                0 | 373048741859841 | 127.0.0.2 |             65 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea0a79e0-2021-11ec-be32-b254958ec4a2 |                                                        Z Finished Sending a mutation to /127.0.0.1 |                0 | 373048741859841 | 127.0.0.2 |             72 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea3794ed-2021-11ec-be32-b254958ec4a2 |                                                                     Got a response from /127.0.0.1 |                0 | 373048741859841 | 127.0.0.2 |         295677 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea3794f3-2021-11ec-be32-b254958ec4a2 |                                       Delay decision due to throttling: do not delay, resuming now |                0 | 373048741859841 | 127.0.0.2 |         295677 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea3797f8-2021-11ec-be32-b254958ec4a2 |                                                                    Mutation successfully completed |                0 | 373048741859841 | 127.0.0.2 |         295755 | shard 0
     ea0a5cc0-2021-11ec-be32-b254958ec4a2 | ea379808-2021-11ec-be32-b254958ec4a2 |                                                               Done processing - preparing a result |                0 | 373048741859841 | 127.0.0.2 |         295756 | shard 0
    
    (13 rows)
    
  • Node stuck 12 hours in decommission

    Installation details
    Scylla version (or git commit hash): 3.1.0.rc5-0.20190902.623ea5e3d
    Cluster size: 4
    OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-02055ad6b0af5669b

    We see that the Thrift and CQL ports are closed, but the nodetool command is stuck:

    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] compaction - Compacted 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-280840-big-Data.db:level=2,
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-279118-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] compaction - Compacting [/var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-280714-big-Data.db:level=1, /var/lib/scy
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:31 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] LeveledManifest - Adding high-level (L3) /var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-277690-big-Data.db to ca
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] storage_service - DECOMMISSIONING: unbootstrap done
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] storage_service - Thrift server stopped
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] storage_service - CQL server stopped
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] storage_service - DECOMMISSIONING: shutdown rpc and cql server done
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] storage_service - DECOMMISSIONING: stop batchlog_manager done
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] gossip - My status = LEFT
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] gossip - No local state or state is in silent shutdown, not announcing shutdown
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] storage_service - DECOMMISSIONING: stop_gossiping done
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 12] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 10] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 9] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 8] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 9] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 13] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] rpc - client 10.0.113.188:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 6] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 13] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 4] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 13] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] rpc - client 10.0.113.188:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 4] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 13] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 2] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 3] rpc - client 10.0.113.188:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 7] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 2] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 3] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 5] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 5] rpc - client 10.0.113.188:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 5] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] rpc - client 10.0.87.51:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] rpc - client 10.0.113.188:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 0] rpc - client 10.0.63.72:7001: client connection dropped: read: Connection reset by peer
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 6] compaction - Compacted 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-286124-big-Data.db:level=2,
    Sep 04 22:43:32 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 6] compaction - Compacting [/var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-285942-big-Data.db:level=1, ]
    Sep 04 22:43:34 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 6] compaction - Compacted 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-286138-big-Data.db:level=2,
    Sep 04 22:43:34 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 6] compaction - Compacting [/var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-285956-big-Data.db:level=1, ]
    Sep 04 22:43:34 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 7] compaction - Compacted 9 sstables to [/var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-282149-big-Data.db:level=3,
    Sep 04 22:43:34 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 7] compaction - Compacting [/var/lib/scylla/data/keyspace1/standard1-2b9793b0ce8f11e98b9a000000000009/mc-278747-big-Data.db:level=3, /var/lib/scy
    Sep 04 22:43:34 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 10] compaction - Compacting [/var/lib/scylla/data/system/large_rows-40550f66085839a09430f27fc08034e9/mc-4252-big-Data.db:level=0, /var/lib/scylla
    Sep 04 22:43:35 ip-10-0-142-68.eu-west-1.compute.internal scylla[46048]:  [shard 10] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/large_rows-40550f66085839a09430f27fc08034e9/mc-4266-big-Data.db:level=0, ].
    

    nodetool has been stuck for more than 12 hours:

    [[email protected] centos]# ps -fp 119286
    UID         PID   PPID  C STIME TTY          TIME CMD
    centos   119286   1759  0 Sep04 ?        00:00:00 /bin/sh /usr/bin/nodetool -u cassandra -pw cassandra decommission
    [[email protected] centos]# date
    Thu Sep  5 12:57:52 UTC 2019
    [[email protected] centos]#
    

    Probably related to the nodetool drain stuck issue #4891 and the old issue #961.

  • resharding + alternator LWT -> Scylla service takes 36 minutes to start

    Installation details

    Kernel Version: 5.13.0-1021-aws

    Scylla version (or git commit hash): 2022.1~rc3-20220406.5cc3b678c with build-id 48dfae0735cd8efc4ae2f5c777beaee2a1e89f4a

    Cluster size: 4 nodes (i3.4xlarge)

    Scylla Nodes used in this run:

    • alternator-48h-2022-1-db-node-81cb61d9-5 (34.241.246.188 | 10.0.2.75) (shards: 14)
    • alternator-48h-2022-1-db-node-81cb61d9-4 (52.30.41.107 | 10.0.3.6) (shards: 14)
    • alternator-48h-2022-1-db-node-81cb61d9-3 (52.214.185.121 | 10.0.1.89) (shards: 14)
    • alternator-48h-2022-1-db-node-81cb61d9-2 (34.242.68.250 | 10.0.1.112) (shards: 14)
    • alternator-48h-2022-1-db-node-81cb61d9-1 (176.34.90.117 | 10.0.0.237) (shards: 14)

    OS / Image: ami-071c70d20f0fdbb2c (aws: eu-west-1)

    Test: longevity-alternator-200gb-48h-test

    Test id: 81cb61d9-8d3f-45ae-8b50-f7882b4a6af8

    Test name: longevity/longevity-alternator-200gb-48h-test

    Test config file(s):

    Issue description

    At 2022-04-16 09:34:34.496 a restart with resharding nemesis was started on node 4. The nemesis shuts down the scylla service, edits the murmur3_partitioner_ignore_msb_bits config value to force resharding (a sketch of this change is shown after the log below), and starts the scylla service again, expecting the initialization to take 5 minutes at most. This time, however, it took 36 minutes for scylla to start:

    2022-04-16T09:36:11+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - installing SIGHUP handler
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - Scylla version 2022.1.rc3-0.20220406.5cc3b678c with build-id 48dfae0735cd8efc4ae2f5c777beaee2a1e89f4a starting ...
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting prometheus API server
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting tokens manager
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting effective_replication_map factory
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting migration manager notifier
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting lifecycle notifier
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - creating tracing
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - creating snitch
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting API server
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - Scylla API server listening on 127.0.0.1:10000 ...
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - initializing storage service
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting gossiper
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - seeds={10.0.0.237}, listen_address=10.0.3.6, broadcast_address=10.0.3.6
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - initializing storage service
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting per-shard database core
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - creating and verifying directories
    2022-04-16T09:36:12+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting database
    2022-04-16T09:40:31+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting storage proxy
    2022-04-16T09:40:31+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting migration manager
    2022-04-16T09:40:31+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting query processor
    2022-04-16T09:40:31+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - initializing batchlog manager
    2022-04-16T09:40:31+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - loading system sstables
    2022-04-16T09:40:31+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - loading non-system sstables
    2022-04-16T09:50:19+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting view update generator
    2022-04-16T09:50:19+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - setting up system keyspace
    2022-04-16T09:50:19+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting commit log
    2022-04-16T09:50:19+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - initializing migration manager RPC verbs
    2022-04-16T09:50:19+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - initializing storage proxy RPC verbs
    2022-04-16T09:50:19+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting streaming service
    2022-04-16T09:50:19+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting hinted handoff manager
    2022-04-16T09:50:19+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting messaging service
    2022-04-16T09:51:20+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting CDC Generation Management service
    2022-04-16T09:51:20+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting CDC log service
    2022-04-16T09:51:20+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting storage service
    2022-04-16T09:51:20+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting sstables loader
    2022-04-16T10:07:47+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting system distributed keyspace
    2022-04-16T10:11:47+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting tracing
    2022-04-16T10:11:48+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - SSTable data integrity checker is disabled.
    2022-04-16T10:11:48+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting auth service
    2022-04-16T10:11:50+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting batchlog manager
    2022-04-16T10:11:50+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting load meter
    2022-04-16T10:11:50+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting cf cache hit rate calculator
    2022-04-16T10:11:50+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting view update backlog broker
    2022-04-16T10:11:53+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - Waiting for gossip to settle before accepting client requests...
    2022-04-16T10:12:06+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - allow replaying hints
    2022-04-16T10:12:07+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - Launching generate_mv_updates for non system tables
    2022-04-16T10:12:07+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting the view builder
    2022-04-16T10:12:25+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting native transport
    2022-04-16T10:12:26+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - starting the expiration service
    2022-04-16T10:12:27+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - serving
    2022-04-16T10:12:27+00:00 alternator-48h-2022-1-db-node-81cb61d9-4 !    INFO |  [shard 0] init - Scylla version 2022.1.rc3-0.20220406.5cc3b678c initialization completed.
    

    Namely, the loading phases took way longer than usual: "starting database" to "starting storage proxy" took over 4 minutes (09:36:12 to 09:40:31), loading non-system sstables took almost 10 minutes (09:40:31 to 09:50:19), and the step after "starting sstables loader" took over 16 minutes (09:51:20 to 10:07:47).
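
    For reference, a sketch of the kind of config change the nemesis makes to force resharding (the value here is arbitrary as long as it differs from the current one; 12 is the default, and the sketch assumes the key is not already present in scylla.yaml):

    $ echo 'murmur3_partitioner_ignore_msb_bits: 4' | sudo tee -a /etc/scylla/scylla.yaml
    $ sudo systemctl restart scylla-server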

    • Restore Monitor Stack command: $ hydra investigate show-monitor 81cb61d9-8d3f-45ae-8b50-f7882b4a6af8
    • Restore monitor on AWS instance using Jenkins job
    • Show all stored logs command: $ hydra investigate show-logs 81cb61d9-8d3f-45ae-8b50-f7882b4a6af8

    Logs:

    db-cluster: https://cloudius-jenkins-test.s3.amazonaws.com/81cb61d9-8d3f-45ae-8b50-f7882b4a6af8/20220424_100353/db-cluster-81cb61d9.tar.gz
    loader-set: https://cloudius-jenkins-test.s3.amazonaws.com/81cb61d9-8d3f-45ae-8b50-f7882b4a6af8/20220424_100353/loader-set-81cb61d9.tar.gz
    monitor-set: https://cloudius-jenkins-test.s3.amazonaws.com/81cb61d9-8d3f-45ae-8b50-f7882b4a6af8/20220424_100353/monitor-set-81cb61d9.tar.gz

    Jenkins job URL

  • Coredumps during restart_then_repair_node nemesis

    This is Scylla's bug tracker, to be used for reporting bugs only. If you have a question about Scylla, and not a bug, please ask it in our mailing-list at [email protected] or in our slack channel.

    • [x] I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

    Installation details
    Scylla version (or git commit hash): 3.1.0.rc4-0.20190826.e4a39ed31
    Cluster size: 4
    OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0ececa5cacea302a8

    During restart_then_repair_node, the target node (# 5) suffered from streaming exceptions:

    (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=26510 node=Node longevity-large-partitions-4d-3-1-db-node-49dc20d4-5 [52.50.193.198 | 10.0.133.1] (seed: False)
    2019-08-27T22:03:51+00:00  ip-10-0-133-1 !WARNING | scylla: [shard 0] range_streamer - Bootstrap with 10.0.10.203 for keyspace=scylla_bench failed, took 773.173 seconds: streaming::stream_exception (Stream failed)
    

    While 2 other nodes suffered from semaphore timeouts (could be related to #4615):

    (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=14442 node=Node longevity-large-partitions-4d-3-1-db-node-49dc20d4-4 [34.245.137.134 | 10.0.178.144] (seed: False)
    2019-08-27T22:06:09+00:00  ip-10-0-178-144 !ERR     | scylla: [shard 7] storage_proxy - Exception when communicating with 10.0.178.144: seastar::semaphore_timed_out (Semaphore timedout)
    

    and created coredumps like so:

    (CoreDumpEvent Severity.CRITICAL): node=Node longevity-large-partitions-4d-3-1-db-node-49dc20d4-4 [34.245.137.134 | 10.0.178.144] (seed: False)
    corefile_urls=
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.4406.1566942687000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.4406.1566942687000000.gz.aa
    backtrace=           PID: 4406 (scylla)
               UID: 996 (scylla)
               GID: 1001 (scylla)
            Signal: 6 (ABRT)
         Timestamp: Tue 2019-08-27 21:51:27 UTC (1min 55s ago)
      Command Line: /usr/bin/scylla --blocked-reactor-notify-ms 500 --abort-on-lsa-bad-alloc 1 --abort-on-seastar-bad-alloc --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 0-11
        Executable: /opt/scylladb/libexec/scylla
     Control Group: /
           Boot ID: 9f0393fe20f04dfab829e5bb5cc4bdad
        Machine ID: df877a200226bc47d06f26dae0736ec9
          Hostname: ip-10-0-178-144.eu-west-1.compute.internal
          Coredump: /var/lib/systemd/coredump/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.4406.1566942687000000
           Message: Process 4406 (scylla) of user 996 dumped core.
                    
                    Stack trace of thread 4430:
                    #0  0x00007f95cfcc953f raise (libc.so.6)
                    #1  0x00007f95cfcb395e abort (libc.so.6)
                    #2  0x00000000040219ab on_allocation_failure (scylla)
    

    Here I'll add links to all of those kinds of coredumps, knowing that there's currently a bit of an issue with uploading them, and hoping that one of them uploaded correctly:

    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.16744.1566943317000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.16744.1566943317000000.gz.aa
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.17150.1566944278000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.17150.1566944278000000.gz.aa
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.17686.1566944862000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.17686.1566944862000000.gz.aa
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.18167.1566945503000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.18167.1566945503000000.gz.aa
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.18731.1566946375000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.18731.1566946375000000.gz.aa
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.19423.1566947108000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.19423.1566947108000000.gz.aa
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.20078.1566947830000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.20078.1566947830000000.gz.aa
    
    
    (CoreDumpEvent Severity.CRITICAL): node=Node longevity-large-partitions-4d-3-1-db-node-49dc20d4-4 [34.245.137.134 | 10.0.178.144] (seed: False)
    corefile_urls=
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.20078.1566947830000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.20078.1566947830000000.gz.aa
    backtrace=           PID: 20078 (scylla)
               UID: 996 (scylla)
               GID: 1001 (scylla)
            Signal: 6 (ABRT)
         Timestamp: Tue 2019-08-27 23:17:10 UTC (1min 57s ago)
      Command Line: /usr/bin/scylla --blocked-reactor-notify-ms 500 --abort-on-lsa-bad-alloc 1 --abort-on-seastar-bad-alloc --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 0-11
        Executable: /opt/scylladb/libexec/scylla
     Control Group: /
           Boot ID: 9f0393fe20f04dfab829e5bb5cc4bdad
        Machine ID: df877a200226bc47d06f26dae0736ec9
          Hostname: ip-10-0-178-144.eu-west-1.compute.internal
          Coredump: /var/lib/systemd/coredump/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.20078.1566947830000000
           Message: Process 20078 (scylla) of user 996 dumped core.
                    
                    Stack trace of thread 20089:
                    #0  0x00007fa119b2853f raise (libc.so.6)
                    #1  0x00007fa119b1295e abort (libc.so.6)
                    #2  0x0000000000469b8e _ZN8logalloc18allocating_section7reserveEv (scylla)
    

    Different backtraces and their translations during the run:
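
    For reference, raw frames like the ones below are typically symbolized against the matching binary with Seastar's addr2line helper (a sketch; the script path and executable location are assumptions):

    $ ./seastar/scripts/seastar-addr2line -e /opt/scylladb/libexec/scylla 0x00000000041808b2 0x000000000406d935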

    Aug 27 22:03:29 ip-10-0-10-203.eu-west-1.compute.internal scylla[5160]:  [shard 10] seastar - Failed to allocate 851968 bytes
    0x00000000041808b2
    0x000000000406d935
    0x000000000406dc35
    0x000000000406dce3
    0x00007f7420b4602f
    /opt/scylladb/libreloc/libc.so.6+0x000000000003853e
    /opt/scylladb/libreloc/libc.so.6+0x0000000000022894
    0x00000000040219aa
    0x0000000004022a0e
    0x000000000131bcb3
    0x000000000137d78f
    0x000000000131725f
    0x000000000136c8b1
    0x00000000014555b5
    0x0000000001296442
    0x000000000145c35a
    0x000000000406ae21
    0x000000000406b01e
    0x000000000414d06d
    0x00000000041776ab
    0x00000000040390dd
    /opt/scylladb/libreloc/libpthread.so.0+0x000000000000858d
    /opt/scylladb/libreloc/libc.so.6+0x00000000000fd6a2
    
    

    Translated:

    void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /usr/include/boost/program_options/variables_map.hpp:146
    seastar::backtrace_buffer::append_backtrace() at /usr/include/boost/program_options/variables_map.hpp:146
     (inlined by) print_with_backtrace at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/reactor.cc:1104
    seastar::print_with_backtrace(char const*) at /usr/include/boost/program_options/variables_map.hpp:146
    sigabrt_action at /usr/include/boost/program_options/variables_map.hpp:146
     (inlined by) operator() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/reactor.cc:5012
     (inlined by) _FUN at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/reactor.cc:5008
    ?? ??:0
    ?? ??:0
    ?? ??:0
    seastar::memory::on_allocation_failure(unsigned long) at memory.cc:?
    operator new(unsigned long) at ??:?
     (inlined by) operator new(unsigned long) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/memory.cc:1674
    seastar::circular_buffer<sstables::promoted_index_block, std::allocator<sstables::promoted_index_block> >::expand(unsigned long) at crtstuff.c:?
     (inlined by) seastar::circular_buffer<sstables::promoted_index_block, std::allocator<sstables::promoted_index_block> >::expand(unsigned long) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/circular_buffer.hh:301
    sstables::promoted_index_blocks_reader::process_state(seastar::temporary_buffer<char>&, sstables::promoted_index_blocks_reader::m_parser_context&) at crtstuff.c:?
     (inlined by) seastar::circular_buffer<sstables::promoted_index_block, std::allocator<sstables::promoted_index_block> >::maybe_expand(unsigned long) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/circular_buffer.hh:331
     (inlined by) void seastar::circular_buffer<sstables::promoted_index_block, std::allocator<sstables::promoted_index_block> >::emplace_back<position_in_partition, position_in_partition, unsigned long&, unsigned long&, std::optional<sstables::deletion_time> >(position_in_partition&&, position_in_partition&&, unsigned long&, unsigned long&, std::optional<sstables::deletion_time>&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/circular_buffer.hh:391
     (inlined by) sstables::promoted_index_blocks_reader::process_state(seastar::temporary_buffer<char>&, sstables::promoted_index_blocks_reader::m_parser_context&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/./sstables/index_entry.hh:416
    data_consumer::continuous_data_consumer<sstables::promoted_index_blocks_reader>::process(seastar::temporary_buffer<char>&) at crtstuff.c:?
     (inlined by) sstables::promoted_index_blocks_reader::process_state(seastar::temporary_buffer<char>&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/./sstables/index_entry.hh:456
     (inlined by) data_consumer::continuous_data_consumer<sstables::promoted_index_blocks_reader>::process(seastar::temporary_buffer<char>&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/sstables/consumer.hh:404
    seastar::future<> seastar::repeat<seastar::future<> seastar::input_stream<char>::consume<std::reference_wrapper<sstables::promoted_index_blocks_reader> >(std::reference_wrapper<sstables::promoted_index_blocks_reader>&&)::{lambda()#1}>(seastar::future<> seastar::input_stream<char>::consume<std::reference_wrapper<sstables::promoted_index_blocks_reader> >(std::reference_wrapper<sstables::promoted_index_blocks_reader>&&)::{lambda()#1}) at crtstuff.c:?
     (inlined by) seastar::future<seastar::consumption_result<char> > std::__invoke_impl<seastar::future<seastar::consumption_result<char> >, sstables::promoted_index_blocks_reader&, seastar::temporary_buffer<char> >(std::__invoke_other, sstables::promoted_index_blocks_reader&, seastar::temporary_buffer<char>&&) at /usr/include/c++/8/bits/invoke.h:60
     (inlined by) std::__invoke_result<sstables::promoted_index_blocks_reader&, seastar::temporary_buffer<char> >::type std::__invoke<sstables::promoted_index_blocks_reader&, seastar::temporary_buffer<char> >(sstables::promoted_index_blocks_reader&, seastar::temporary_buffer<char>&&) at /usr/include/c++/8/bits/invoke.h:96
     (inlined by) std::result_of<sstables::promoted_index_blocks_reader& (seastar::temporary_buffer<char>&&)>::type std::reference_wrapper<sstables::promoted_index_blocks_reader>::operator()<seastar::temporary_buffer<char> >(seastar::temporary_buffer<char>&&) const at /usr/include/c++/8/bits/refwrap.h:319
     (inlined by) seastar::future<> seastar::input_stream<char>::consume<std::reference_wrapper<sstables::promoted_index_blocks_reader> >(std::reference_wrapper<sstables::promoted_index_blocks_reader>&&)::{lambda()#1}::operator()() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/iostream-impl.hh:227
     (inlined by) seastar::future<> seastar::repeat<seastar::future<> seastar::input_stream<char>::consume<std::reference_wrapper<sstables::promoted_index_blocks_reader> >(std::reference_wrapper<sstables::promoted_index_blocks_reader>&&)::{lambda()#1}>(seastar::future<> seastar::input_stream<char>::consume<std::reference_wrapper<sstables::promoted_index_blocks_reader> >(std::reference_wrapper<sstables::promoted_index_blocks_reader>&&)::{lambda()#1}) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future-util.hh:285
    sstables::index_reader::advance_upper_past(position_in_partition_view) at crtstuff.c:?
     (inlined by) seastar::future<> seastar::input_stream<char>::consume<sstables::promoted_index_blocks_reader>(sstables::promoted_index_blocks_reader&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/iostream-impl.hh:236
     (inlined by) data_consumer::continuous_data_consumer<sstables::promoted_index_blocks_reader>::consume_input() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/sstables/consumer.hh:377
     (inlined by) sstables::index_entry::get_next_pi_blocks() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/./sstables/index_entry.hh:614
     (inlined by) sstables::index_reader::advance_upper_past(position_in_partition_view) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/sstables/index_reader.hh:582
    seastar::future<bool> seastar::futurize<seastar::future<bool> >::apply<sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const::{lambda()#1}>(sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const::{lambda()#1}&&, std::tuple<>&&) [clone .constprop.7996] at sstables.cc:?
     (inlined by) seastar::apply_helper<sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#1}&&, std::tuple) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/apply.hh:35
     (inlined by) auto seastar::apply<sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const::{lambda()#1}>(sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const::{lambda()#1}&&, std::tuple<>&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/apply.hh:43
     (inlined by) seastar::future<bool> seastar::futurize<seastar::future<bool> >::apply<sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const::{lambda()#1}>(sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const::{lambda()#1}&&, std::tuple<>&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:1392
    _ZN7seastar12continuationIZZNS_6futureIJEE9then_implIZN8sstables12index_reader34advance_lower_and_check_if_presentEN3dht18ring_position_viewESt8optionalI26position_in_partition_viewEEUlvE_NS1_IJbEEEEET0_OT_ENKUlvE_clEvEUlSF_E_JEE15run_and_disposeEv at crtstuff.c:?
     (inlined by) seastar::future<bool> seastar::future<>::then<sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const::{lambda()#1}, seastar::future<bool> >(sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const::{lambda()#1}&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:917
     (inlined by) sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}::operator()() const at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/sstables/index_reader.hh:775
     (inlined by) seastar::apply_helper<sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#1}&&, std::tuple<>) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/apply.hh:35
     (inlined by) auto seastar::apply<sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}>(sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}&&, std::tuple<>&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/apply.hh:43
     (inlined by) seastar::future<bool> seastar::futurize<seastar::future<bool> >::apply<sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}>(sstables::index_reader::advance_lower_and_check_if_present(dht::ring_position_view, std::optional<position_in_partition_view>)::{lambda()#1}&&, std::tuple<>&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:1392
     (inlined by) _ZZZN7seastar6futureIJEE9then_implIZN8sstables12index_reader34advance_lower_and_check_if_presentEN3dht18ring_position_viewESt8optionalI26position_in_partition_viewEEUlvE_NS0_IJbEEEEET0_OT_ENKUlvE_clEvENUlSE_E_clINS_12future_stateIJEEEEEDaSE_ at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:950
     (inlined by) _ZN7seastar12continuationIZZNS_6futureIJEE9then_implIZN8sstables12index_reader34advance_lower_and_check_if_presentEN3dht18ring_position_viewESt8optionalI26position_in_partition_viewEEUlvE_NS1_IJbEEEEET0_OT_ENKUlvE_clEvEUlSF_E_JEE15run_and_disposeEv at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:377
    seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /usr/include/boost/program_options/variables_map.hpp:146
    seastar::reactor::run_some_tasks() at /usr/include/boost/program_options/variables_map.hpp:146
    seastar::reactor::run_some_tasks() at /usr/include/boost/program_options/variables_map.hpp:146
     (inlined by) seastar::reactor::run() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/reactor.cc:4243
    seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::{lambda()#3}::operator()() const at /usr/include/boost/program_options/variables_map.hpp:146
    std::function<void ()>::operator()() const at /usr/include/c++/8/bits/std_function.h:687
     (inlined by) seastar::posix_thread::start_routine(void*) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/posix.cc:52
    
    Aug 27 22:18:38 ip-10-0-10-203.eu-west-1.compute.internal scylla[31878]:  [shard 0] seastar - Failed to allocate 131072 bytes
    0x00000000041808b2
    0x000000000406d935
    0x000000000406dc35
    0x000000000406dce3
    0x00007f2e2d4c002f
    /opt/scylladb/libreloc/libc.so.6+0x000000000003853e
    /opt/scylladb/libreloc/libc.so.6+0x0000000000022894
    0x00000000040219aa
    0x00000000040235f4
    0x0000000004124fac
    0x0000000000cf0f7d
    0x0000000000cf1027
    0x00000000036ec4bb
    0x0000000004000a85
    0x00000000040030f4
    0x0000000001523df0
    0x0000000001581d82
    0x00000000015a5249
    0x00000000015a7e14
    0x0000000001094cdf
    0x000000000109798d
    0x000000000109872d
    0x000000000109b983
    0x000000000109d0d5
    0x00000000010c786f
    0x00000000010985c1
    0x000000000109b983
    0x000000000109d0d5
    0x00000000010e9aae
    0x00000000010eaa19
    0x0000000000e52db4
    0x000000000406ae21
    0x000000000406b01e
    0x000000000414d06d
    0x0000000003fd51d6
    0x0000000003fd6922
    0x00000000007d9d69
    /opt/scylladb/libreloc/libc.so.6+0x0000000000024412
    0x000000000083a1fd
    
    

    Translation:

    void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /usr/include/boost/program_options/variables_map.hpp:146
    seastar::backtrace_buffer::append_backtrace() at /usr/include/boost/program_options/variables_map.hpp:146
     (inlined by) print_with_backtrace at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/reactor.cc:1104
    seastar::print_with_backtrace(char const*) at /usr/include/boost/program_options/variables_map.hpp:146
    sigabrt_action at /usr/include/boost/program_options/variables_map.hpp:146
     (inlined by) operator() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/reactor.cc:5012
     (inlined by) _FUN at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/reactor.cc:5008
    ?? ??:0
    ?? ??:0
    ?? ??:0
    seastar::memory::on_allocation_failure(unsigned long) at memory.cc:?
    __libc_posix_memalign at ??:?
     (inlined by) posix_memalign at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/memory.cc:1601
    seastar::temporary_buffer<unsigned char>::aligned(unsigned long, unsigned long) at /usr/include/boost/program_options/variables_map.hpp:146
     (inlined by) seastar::file::read_state<unsigned char>::read_state(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../include/seastar/core/file.hh:481
     (inlined by) seastar::shared_ptr_no_esft<seastar::file::read_state<unsigned char> >::shared_ptr_no_esft<unsigned long&, unsigned long&, unsigned long&, unsigned int&, unsigned int&>(unsigned long&, unsigned long&, unsigned long&, unsigned int&, unsigned int&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../include/seastar/core/shared_ptr.hh:164
     (inlined by) seastar::lw_shared_ptr<seastar::file::read_state<unsigned char> > seastar::lw_shared_ptr<seastar::file::read_state<unsigned char> >::make<unsigned long&, unsigned long&, unsigned long&, unsigned int&, unsigned int&>(unsigned long&, unsigned long&, unsigned long&, unsigned int&, unsigned int&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../include/seastar/core/shared_ptr.hh:266
     (inlined by) seastar::lw_shared_ptr<seastar::file::read_state<unsigned char> > seastar::make_lw_shared<seastar::file::read_state<unsigned char>, unsigned long&, unsigned long&, unsigned long&, unsigned int&, unsigned int&>(unsigned long&, unsigned long&, unsigned long&, unsigned int&, unsigned int&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../include/seastar/core/shared_ptr.hh:416
     (inlined by) seastar::posix_file_impl::dma_read_bulk(unsigned long, unsigned long, seastar::io_priority_class const&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/reactor.cc:2352
    checked_file_impl::dma_read_bulk(unsigned long, unsigned long, seastar::io_priority_class const&)::{lambda()#1}::operator()() const at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/sstring.hh:257
     (inlined by) auto do_io_check<checked_file_impl::dma_read_bulk(unsigned long, unsigned long, seastar::io_priority_class const&)::{lambda()#1}, , void>(std::function<void (std::__exception_ptr::exception_ptr)> const&, checked_file_impl::dma_read_bulk(unsigned long, unsigned long, seastar::io_priority_class const&)::{lambda()#1}&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/disk-error-handler.hh:73
    checked_file_impl::dma_read_bulk(unsigned long, unsigned long, seastar::io_priority_class const&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/sstring.hh:257
    tracking_file_impl::dma_read_bulk(unsigned long, unsigned long, seastar::io_priority_class const&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/reader_concurrency_semaphore.cc:184
    seastar::future<seastar::temporary_buffer<char> > seastar::file::dma_read_bulk<char>(unsigned long, unsigned long, seastar::io_priority_class const&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../include/seastar/core/file.hh:421
     (inlined by) seastar::file_data_source_impl::issue_read_aheads(unsigned int)::{lambda()#1}::operator()() const at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/fstream.cc:256
     (inlined by) seastar::future<seastar::temporary_buffer<char> > seastar::futurize<seastar::future<seastar::temporary_buffer<char> > >::apply<seastar::file_data_source_impl::issue_read_aheads(unsigned int)::{lambda()#1}>(seastar::file_data_source_impl::issue_read_aheads(unsigned int)::{lambda()#1}&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../include/seastar/core/future.hh:1402
     (inlined by) seastar::file_data_source_impl::issue_read_aheads(unsigned int) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/fstream.cc:255
    seastar::file_data_source_impl::get() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/fstream.cc:173
    seastar::data_source::get() at /usr/include/c++/8/variant:1356
     (inlined by) seastar::future<> seastar::input_stream<char>::consume<std::reference_wrapper<sstables::data_consume_rows_context_m> >(std::reference_wrapper<sstables::data_consume_rows_context_m>&&)::{lambda()#1}::operator()() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/iostream-impl.hh:206
     (inlined by) seastar::future<> seastar::repeat<seastar::future<> seastar::input_stream<char>::consume<std::reference_wrapper<sstables::data_consume_rows_context_m> >(std::reference_wrapper<sstables::data_consume_rows_context_m>&&)::{lambda()#1}>(seastar::future<> seastar::input_stream<char>::consume<std::reference_wrapper<sstables::data_consume_rows_context_m> >(std::reference_wrapper<sstables::data_consume_rows_context_m>&&)::{lambda()#1}) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future-util.hh:285
    sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const at crtstuff.c:?
     (inlined by) seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context_m>(sstables::data_consume_rows_context_m&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/iostream-impl.hh:236
     (inlined by) data_consumer::continuous_data_consumer<sstables::data_consume_rows_context_m>::consume_input() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/sstables/consumer.hh:377
     (inlined by) sstables::data_consume_context<sstables::data_consume_rows_context_m>::read() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/sstables/data_consume_context.hh:98
     (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/sstables/partition.cc:479
     (inlined by) seastar::apply_helper<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/apply.hh:35
     (inlined by) auto seastar::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/apply.hh:43
     (inlined by) seastar::future<> seastar::futurize<seastar::future<> >::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:1392
     (inlined by) seastar::future<> seastar::future<>::then_impl<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:936
     (inlined by) seastar::future<> seastar::future<>::then<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:917
     (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/sstables/partition.cc:480
    seastar::future<> seastar::do_until<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}) at crtstuff.c:?
     (inlined by) seastar::future<> seastar::futurize<void>::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}&>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:1385
     (inlined by) seastar::future<> seastar::do_until<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#1}, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future-util.hh:507
     (inlined by) sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/sstables/partition.cc:481
     (inlined by) seastar::future<> seastar::do_void_futurize_helper<seastar::future<> >::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}&>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:1359
     (inlined by) seastar::future<> seastar::futurize<void>::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}&>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:1385
     (inlined by) seastar::future<> seastar::do_until<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future-util.hh:507
    sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at crtstuff.c:?
    flat_mutation_reader::impl::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:768
     (inlined by) flat_mutation_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/flat_mutation_reader.hh:337
     (inlined by) operator() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:308
     (inlined by) apply<mutation_reader_merger::prepare_next(seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)>, mutation_reader_merger::reader_and_last_fragment_kind&> at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:1402
     (inlined by) futurize_apply<mutation_reader_merger::prepare_next(seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)>, mutation_reader_merger::reader_and_last_fragment_kind&> at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:1474
     (inlined by) parallel_for_each<mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::prepare_next(seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)> > at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future-util.hh:129
    parallel_for_each<utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4>&, mutation_reader_merger::prepare_next(seastar::lowres_clock::time_point)::<lambda(mutation_reader_merger::reader_and_last_fragment_kind)> > at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:768
     (inlined by) mutation_reader_merger::prepare_next(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:307
    mutation_reader_merger::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:768
    mutation_fragment_merger<mutation_reader_merger>::fetch(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:768
     (inlined by) mutation_fragment_merger<mutation_reader_merger>::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:120
     (inlined by) operator() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:489
    repeat<combined_mutation_reader::fill_buffer(seastar::lowres_clock::time_point)::<lambda()> > at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:768
     (inlined by) combined_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:500
    flat_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/flat_mutation_reader.hh:391
     (inlined by) restricting_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda(flat_mutation_reader&)#1}::operator()(flat_mutation_reader&) const at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:637
     (inlined by) _ZN27restricting_mutation_reader11with_readerIZNS_11fill_bufferENSt6chrono10time_pointIN7seastar12lowres_clockENS1_8durationIlSt5ratioILl1ELl1000EEEEEEEUlR20flat_mutation_readerE_EEDcT_S9_ at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:610
     (inlined by) restricting_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:641
    flat_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:768
     (inlined by) mutation_reader_merger::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:384
    mutation_fragment_merger<mutation_reader_merger>::fetch(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:768
     (inlined by) mutation_fragment_merger<mutation_reader_merger>::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:120
     (inlined by) operator() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:489
    repeat<combined_mutation_reader::fill_buffer(seastar::lowres_clock::time_point)::<lambda()> > at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:768
     (inlined by) combined_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/mutation_reader.cc:500
    flat_mutation_reader::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /usr/include/c++/8/bits/unique_ptr.h:81
     (inlined by) operator() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/flat_mutation_reader.cc:681
    apply<flat_multi_range_mutation_reader<Generator>::fill_buffer(seastar::lowres_clock::time_point) [with Generator = make_flat_multi_range_reader(schema_ptr, mutation_source, const partition_range_vector&, const query::partition_slice&, const seastar::io_priority_class&, tracing::trace_state_ptr, mutation_reader::forwarding)::adapter]::<lambda()>&> at /usr/include/c++/8/bits/unique_ptr.h:81
     (inlined by) apply<flat_multi_range_mutation_reader<Generator>::fill_buffer(seastar::lowres_clock::time_point) [with Generator = make_flat_multi_range_reader(schema_ptr, mutation_source, const partition_range_vector&, const query::partition_slice&, const seastar::io_priority_class&, tracing::trace_state_ptr, mutation_reader::forwarding)::adapter]::<lambda()>&> at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future.hh:1385
     (inlined by) do_until<flat_multi_range_mutation_reader<Generator>::fill_buffer(seastar::lowres_clock::time_point) [with Generator = make_flat_multi_range_reader(schema_ptr, mutation_source, const partition_range_vector&, const query::partition_slice&, const seastar::io_priority_class&, tracing::trace_state_ptr, mutation_reader::forwarding)::adapter]::<lambda()>, flat_multi_range_mutation_reader<Generator>::fill_buffer(seastar::lowres_clock::time_point) [with Generator = make_flat_multi_range_reader(schema_ptr, mutation_source, const partition_range_vector&, const query::partition_slice&, const seastar::io_priority_class&, tracing::trace_state_ptr, mutation_reader::forwarding)::adapter]::<lambda()> > at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future-util.hh:507
     (inlined by) fill_buffer at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/flat_mutation_reader.cc:682
    _ZN7seastar8internal8repeaterIZZ19fragment_and_freeze20flat_mutation_readerSt8functionIFNS_6futureIJNS_10bool_classINS_18stop_iteration_tagEEEEEE15frozen_mutationbEEmENKUlRT_RT0_E_clIS2_28fragmenting_mutation_freezerEEDaSD_SF_EUlvE_E15run_and_disposeEv at frozen_mutation.cc:?
     (inlined by) flat_mutation_reader::operator()(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/flat_mutation_reader.hh:337
     (inlined by) operator() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/frozen_mutation.cc:259
     (inlined by) run_and_dispose at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/include/seastar/core/future-util.hh:218
    seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /usr/include/boost/program_options/variables_map.hpp:146
    seastar::reactor::run_some_tasks() at /usr/include/boost/program_options/variables_map.hpp:146
    seastar::reactor::run_some_tasks() at /usr/include/boost/program_options/variables_map.hpp:146
     (inlined by) seastar::reactor::run() at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../src/core/reactor.cc:4243
    seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../include/seastar/core/future.hh:768
    seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at /jenkins/workspace/scylla-3.1/relocatable-pkg/scylla/seastar/build/release/../../include/seastar/core/future.hh:768
    main at crtstuff.c:?
    ?? ??:0
    _start at ??:?
    
    (CoreDumpEvent Severity.CRITICAL): node=Node longevity-large-partitions-4d-3-1-db-node-49dc20d4-4 [34.245.137.134 | 10.0.178.144] (seed: False)
    corefile_urls=
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.20758.1566948577000000.gz/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.20758.1566948577000000.gz.aa
    backtrace=           PID: 20758 (scylla)
               UID: 996 (scylla)
               GID: 1001 (scylla)
            Signal: 6 (ABRT)
         Timestamp: Tue 2019-08-27 23:29:37 UTC (1min 52s ago)
      Command Line: /usr/bin/scylla --blocked-reactor-notify-ms 500 --abort-on-lsa-bad-alloc 1 --abort-on-seastar-bad-alloc --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 0-11
        Executable: /opt/scylladb/libexec/scylla
     Control Group: /
           Boot ID: 9f0393fe20f04dfab829e5bb5cc4bdad
        Machine ID: df877a200226bc47d06f26dae0736ec9
          Hostname: ip-10-0-178-144.eu-west-1.compute.internal
          Coredump: /var/lib/systemd/coredump/core.scylla.996.9f0393fe20f04dfab829e5bb5cc4bdad.20758.1566948577000000
           Message: Process 20758 (scylla) of user 996 dumped core.
                    
                    Stack trace of thread 20769:
                    #0  0x00007f179a95053f raise (libc.so.6)
                    #1  0x00007f179a93a95e abort (libc.so.6)
                    #2  0x0000000000469b8e _ZN8logalloc18allocating_section7reserveEv (scylla)
                    #3  0x00000000071c0d93 n/a (n/a)
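
    The last frame above is unresolved; a minimal sketch of resolving such raw addresses offline, assuming you have the exact scylla binary (path taken from the Executable field above) with matching debug info:

    $ echo 0x0000000000469b8e | addr2line -C -f -e /opt/scylladb/libexec/scylla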
    
    

    The other node's coredumps:

    (CoreDumpEvent Severity.CRITICAL): node=Node longevity-large-partitions-4d-3-1-db-node-49dc20d4-1 [63.35.248.143 | 10.0.10.203] (seed: True)
    corefile_urls=
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.3f7c927968ca4130a5cfc4b02933017f.5160.1566943409000000.gz/core.scylla.996.3f7c927968ca4130a5cfc4b02933017f.5160.1566943409000000.gz.aa
    backtrace=           PID: 5160 (scylla)
               UID: 996 (scylla)
               GID: 1001 (scylla)
            Signal: 6 (ABRT)
         Timestamp: Tue 2019-08-27 22:03:29 UTC (1min 55s ago)
      Command Line: /usr/bin/scylla --blocked-reactor-notify-ms 500 --abort-on-lsa-bad-alloc 1 --abort-on-seastar-bad-alloc --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 0-11
        Executable: /opt/scylladb/libexec/scylla
     Control Group: /
           Boot ID: 3f7c927968ca4130a5cfc4b02933017f
        Machine ID: df877a200226bc47d06f26dae0736ec9
          Hostname: ip-10-0-10-203.eu-west-1.compute.internal
          Coredump: /var/lib/systemd/coredump/core.scylla.996.3f7c927968ca4130a5cfc4b02933017f.5160.1566943409000000
           Message: Process 5160 (scylla) of user 996 dumped core.
                    
                    Stack trace of thread 5170:
                    #0  0x00007f742044b53f raise (libc.so.6)
                    #1  0x00007f742043595e abort (libc.so.6)
                    #2  0x00000000040219ab on_allocation_failure (scylla)
    

    Other download locations:

    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.3f7c927968ca4130a5cfc4b02933017f.31878.1566944318000000.gz/core.scylla.996.3f7c927968ca4130a5cfc4b02933017f.31878.1566944318000000.gz.aa
    https://storage.cloud.google.com/upload.scylladb.com/core.scylla.996.3f7c927968ca4130a5cfc4b02933017f.32438.1566945161000000.gz/core.scylla.996.3f7c927968ca4130a5cfc4b02933017f.32438.1566945161000000.gz.aa
    
    

    Relevant journalctl logs of the nodes can be found at scratch.scylladb.com/shlomib/longevity-large-partitions-4d-db-cluster.tar
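
    The corefiles above were uploaded in split gzip parts (hence the ".gz.aa" suffix); a rough sketch of reassembling one, assuming every part has been downloaded into the current directory:

    $ cat core.scylla.996.*.gz.* > core.scylla.gz    # concatenate the split parts in order
    $ gunzip core.scylla.gz                          # produces the raw coredump
    $ gdb /opt/scylladb/libexec/scylla core.scylla   # inspect with the matching binary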

  • Performance regression of 780% in p99th latency compared to 2.2.0 for 100% read test


    Installation details:
    Scylla version (or git commit hash): 2.3.rc0-0.20180722.a77bb1fe3
    Cluster size: 3
    OS (RHEL/CentOS/Ubuntu/AWS AMI): AWS AMI (ami-905252ef)
    Instance type: i3.4xlarge

    test_latency_read results showing 780% regression in p99th latency compared to 2.2.0:

    | Version | Op rate total | Latency mean | Latency 99th percentile |
    | -- | -- | -- | -- |
    | 2.2.0 | 39997.0 [2018-07-19 10:26:37] | 1.4 [2018-07-19 10:26:37] | 3.1 [2018-07-19 10:26:37] |
    | 2.3.0 | 37200.0 (6% Regression) | 8.2 (485% Regression) | 27.3 (780% Regression) |
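
    (The regression figures are relative increases; for example, p99 latency went from 3.1 to 27.3, and (27.3 - 3.1) / 3.1 ≈ 7.8, i.e. roughly a 780% regression.)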

    2.3.0 p99th latency looks abnormal and reaches peaks of ~400 ms (screenshot from 2018-07-25 attached).

    The test populates 1 TB of data and then starts a c-s read command (during the first part of the test we can still see compactions that are leftovers of the write population):

    cassandra-stress read no-warmup cl=QUORUM duration=50m -schema 'replication(factor=3)' -port jmx=6868 -mode cql3 native -rate 'threads=100 limit=10000/s' -errors ignore -col 'size=FIXED(1024) n=FIXED(1)' -pop 'dist=gauss(1..1000000000,500000000,50000000)'

    A full screenshot of the per-server metrics dashboard (2018-07-25) is attached.

  • test_latency_mixed_with_nemesis - latency during "steady state" gets to 20 ms without heavy stalls

    Installation details:
    Scylla version (or git commit hash): 4.4.dev-0.20210114.32fd38f34 with build-id 0642bb3b142094f1092b0d276f6efa858081fe96
    Cluster size: 3
    OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-012cafbb2dc4f1e4d (eu-west-1)

    Running a mixed workload with the command:

    cassandra-stress mixed no-warmup cl=QUORUM duration=350m -schema 'replication(factor=3)' -port jmx=6868 -mode cql3 native -rate 'threads=50 throttle=3500/s' -col 'size=FIXED(128) n=FIXED(8)' -pop 'dist=gauss(1..250000000,125000000,12500000)'

    During the steady state, the only stalls detected were:

    2021-01-15T06:40:39+00:00  perf-latency-nemesis-perf-v10-db-node-9420ec57-2 !INFO    | scylla: Reactor stalled for 6 ms on shard 5.
    2021-01-15T06:48:16+00:00  perf-latency-nemesis-perf-v10-db-node-9420ec57-3 !INFO    | scylla: Reactor stalled for 6 ms on shard 5.
    2021-01-15T06:51:27+00:00  perf-latency-nemesis-perf-v10-db-node-9420ec57-2 !INFO    | scylla: Reactor stalled for 6 ms on shard 4.
    2021-01-15T06:58:25+00:00  perf-latency-nemesis-perf-v10-db-node-9420ec57-2 !INFO    | scylla: Reactor stalled for 6 ms on shard 6.
    2021-01-15T06:59:50+00:00  perf-latency-nemesis-perf-v10-db-node-9420ec57-2 !INFO    | scylla: Reactor stalled for 6 ms on shard 7.
    2021-01-15T07:07:13+00:00  perf-latency-nemesis-perf-v10-db-node-9420ec57-2 !INFO    | scylla: Reactor stalled for 6 ms on shard 5.
    
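    For reference, a quick way to pull such stall reports out of a node's journal; a sketch, assuming the standard scylla-server systemd unit:

    $ journalctl -u scylla-server | grep 'Reactor stalled'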

    The values for the steady-state latency are:

    | Metric name | Metric value |
    | -- | -- |
    | c-s P95 | 5.40 |
    | c-s P99 | 19.10 |
    | Scylla P99_read - node-3 | 19.20 |
    | Scylla P99_write - node-1 | 13.76 |
    | Scylla P99_read - node-2 | 23.66 |
    | Scylla P99_write - node-2 | 13.79 |
    | Scylla P99_read - node-1 | 23.55 |
    | Scylla P99_write - node-3 | 1.56 |

    There is a live monitor here, and a live snapshot here (in case the monitor dies).

    From the monitor, we can see the c-s latency and c-s max (screenshot from 2021-01-28), and the Scylla read and write 99th-percentile latencies (screenshots read_99th and write_99th attached).

    Comparing with the original document where we checked these values, we have, for Scylla 4.1:

    | Metric name | Read value | Write value |
    | -- | -- | -- |
    | Mean | 0.9 ms | 0.4 ms |
    | P95 | 7.8 ms | 1.4 ms |
    | P99 | 48.2 ms | 2.5 ms |
    | Max | 71 ms | 71 ms |

    and for Scylla 666.development-0.20200910.02ee0483b:

    | Metric name | Read value | Write value |
    | -- | -- | -- |
    | Mean | 0.7 ms | 0.3 ms |
    | P95 | 3.6 ms | 0.9 ms |
    | P99 | 6 ms | 1.2 ms |
    | Max | 16.8 ms | 16.8 ms |

    All the nodes' logs can be downloaded here.

    Even the c-s 95th percentile is too high for a steady-state period (screenshot c-s_95th attached).

  • Permanent read/write fails after "bad_alloc"


    Installation details:
    Scylla version (or git commit hash): 3.2.4
    Cluster size: 5+5 (multi-DC)
    OS (RHEL/CentOS/Ubuntu/AWS AMI): C7.5 (CentOS 7.5)

    Platform (physical/VM/cloud instance type/docker): bare metal
    Hardware: sockets=2 x Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz, cores=40, hyperthreading=yes, memory=6x32GB DDR4 2666MHz
    Disks: RAID 10 of 10 HDDs, 14 TB each, for data; RAID 1 of 1 TB SSDs for commit logs

    Hi!

    The problem started with errors like "exception during mutation write to 10.161.180.24: std::bad_alloc (std::bad_alloc)" and led to one shard constantly failing most (probably all) write/read operations until scylla-server was manually restarted. I suspect this may be due to large partitions, so here is what we have on that (we have 2 CFs):

    becca/events histograms
    Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                                  (micros)          (micros)           (bytes)
    50%             2.00             16.00          47988.50               770                29
    75%             2.00             20.00          79061.50              5722               215
    95%             6.00             33.00         185724.05             88148              2759
    98%             8.00             36.00         239365.28            182785              5722
    99%            10.00             46.73         295955.11            263210              8239
    Min             0.00              1.00             20.00                73                 2
    Max            24.00          29492.00        2051039.00         464228842           5839588
    
    becca/events_by_ip histograms
    Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                                  (micros)          (micros)           (bytes)
    50%             0.00             16.00              0.00              6866               179
    75%             0.00             19.75              0.00             29521               770
    95%             0.00             33.00              0.00            315852              8239
    98%             0.00             41.00              0.00            785939             20501
    99%             0.00             48.43              0.00           1629722             42510
    Min             0.00              1.00              0.00                73                 0
    Max             0.00          19498.00              0.00         386857368           4866323
    

    In any case, even if some big query arrived and failed, I do not quite understand why all subsequent queries kept failing until the node was restarted.
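
    The histograms above look like nodetool tablehistograms (formerly cfhistograms) output; a sketch of re-collecting them for the two CFs, assuming the keyspace/table names shown:

    $ nodetool tablehistograms becca events
    $ nodetool tablehistograms becca events_by_ip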

    Logs: https://cloud.mail.ru/public/C3AZ/RxPZyKUV6

    Dashboard (by shard)

    (Screenshots from 2020-05-09 attached.)
  • Cassandra Stress times out: BusyPoolException: no available connection and timed out after 5000 MILLISECONDS / using shard-aware driver, get the 1tb longevity to overload


    Installation details:
    Scylla version (or git commit hash): 4.3.rc2-0.20201126.bc922a743 with build-id 840fd4b3f6304765c03e886269b1c2550bf23e53
    Cluster size: 4
    OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-09f30667ba6e09e9b (eu-west-1)
    Scenario: 1tb-7days

    Half an hour into the stress run, at 15:15, consistent BusyPoolException errors started coming from three of the four nodes, and they continued throughout the entire remaining run:

    15:15:22.497 [Thread-641] DEBUG c.d.driver.core.RequestHandler - [1227134168-0] Error querying 10.0.0.5/10.0.0.5:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [10.0.0.5/10.0.0.5:9042] Pool is busy (no available connection and the queue has reached its max size 256)
    ...
    15:32:59.650 [cluster1-nio-worker-21] DEBUG c.d.driver.core.RequestHandler - [540726118-0] Error querying 10.0.0.5/10.0.0.5:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [10.0.0.5/10.0.0.5:9042] Pool is busy (no available connection and timed out after 5000 MILLISECONDS)
    
    15:25:50.717 [Thread-177] DEBUG c.d.driver.core.RequestHandler - [544250492-0] Error querying 10.0.3.37/10.0.3.37:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [10.0.3.37/10.0.3.37:9042] Pool is busy (no available connection and the queue has reached its max size 256)
    ...
    15:32:59.638 [cluster1-nio-worker-29] DEBUG c.d.driver.core.RequestHandler - [640744570-0] Error querying 10.0.1.149/10.0.1.149:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [10.0.1.149/10.0.1.149:9042] Pool is busy (no available connection and timed out after 5000 MILLISECONDS)
    
    15:32:59.638 [cluster1-nio-worker-29] DEBUG c.d.driver.core.RequestHandler - [640744570-0] Error querying 10.0.1.149/10.0.1.149:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [10.0.1.149/10.0.1.149:9042] Pool is busy (no available connection and timed out after 5000 MILLISECONDS)
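
    One quick way to see which endpoints the pool exhaustion came from is to tally the exceptions in the c-s debug log; a sketch (the log file name is hypothetical):

    $ grep -o 'BusyPoolException: \[[^]]*\]' cassandra-stress.log | sort | uniq -c | sort -rn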
    

    At the same time, the stress tool experienced consistent WriteTimeoutException errors, since writes failed to achieve quorum (with replication factor 3, QUORUM needs floor(3/2) + 1 = 2 replicas to acknowledge):

    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 1 acknowledged the write)
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 0 acknowledged the write)
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 0 acknowledged the write)
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 0 acknowledged the write)
    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency QUORUM (2 replica were required but only 0 acknowledged the write)
    
    com.datastax.driver.core.exceptions.OperationTimedOutException: [10.0.1.149/10.0.1.149:9042] Timed out waiting for server response
    com.datastax.driver.core.exceptions.OperationTimedOutException: [10.0.0.5/10.0.0.5:9042] Timed out waiting for server response
    com.datastax.driver.core.exceptions.OperationTimedOutException: [10.0.1.149/10.0.1.149:9042] Timed out waiting for server response
    com.datastax.driver.core.exceptions.OperationTimedOutException: [10.0.0.5/10.0.0.5:9042] Timed out waiting for server response
    com.datastax.driver.core.exceptions.OperationTimedOutException: [10.0.0.5/10.0.0.5:9042] Timed out waiting for server response
    com.datastax.driver.core.exceptions.OperationTimedOutException: [10.0.1.149/10.0.1.149:9042] Timed out waiting for server response
    com.datastax.driver.core.exceptions.OperationTimedOutException: [10.0.0.5/10.0.0.5:9042] Timed out waiting for server response
    com.datastax.driver.core.exceptions.OperationTimedOutException: [10.0.3.37/10.0.3.37:9042] Timed out waiting for server response
    com.datastax.driver.core.exceptions.OperationTimedOutException: [10.0.3.37/10.0.3.37:9042] Timed out waiting for server response
    

    At 16:18, the stress starts to experience EMPTY RESULT errors:

    16:18:24.362 [cluster1-nio-worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.3.37:9042-7, inFlight=128, closed=false] Response received on stream 27584 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is EMPTY RESULT
    16:18:24.362 [cluster1-nio-worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.3.37:9042-7, inFlight=128, closed=false] Response received on stream 27648 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is EMPTY RESULT
    16:18:24.362 [cluster1-nio-worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.3.37:9042-7, inFlight=128, closed=false] Response received on stream 27712 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is EMPTY RESULT
    
    16:18:24.362 [cluster1-nio-worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.0.5:9042-11, inFlight=128, closed=false] Response received on stream 32640 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is EMPTY RESULT
    16:18:24.362 [cluster1-nio-worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.0.5:9042-11, inFlight=128, closed=false] Response received on stream 32704 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is EMPTY RESULT
    16:18:24.362 [cluster1-nio-worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.0.5:9042-11, inFlight=128, closed=false] Response received on stream 0 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is EMPTY RESULT
    
    16:18:24.362 [cluster1-nio-worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.1.149:9042-4, inFlight=128, closed=false] Response received on stream 16320 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is EMPTY RESULT
    16:18:24.362 [cluster1-nio-worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.1.149:9042-4, inFlight=128, closed=false] Response received on stream 16384 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is EMPTY RESULT
    16:18:24.362 [cluster1-nio-worker-0] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.1.149:9042-4, inFlight=128, closed=false] Response received on stream 16448 but no handler set anymore (either the request has timed out or it was closed due to another error). Received message is EMPTY RESULT
    

    Weirdly enough, node #4, 10.0.1.77, does not seem to experience any timeouts. In fact, the only messages I see in the stress log for that time period are healthy heartbeat messages:

    14:42:15.661 [cluster1-nio-worker-3] DEBUG com.datastax.driver.core.Connection - Connection[/10.0.1.77:9042-2, inFlight=1, closed=false] Keyspace set to keyspace1
    16:19:32.899 [cluster1-reconnection-0] DEBUG com.datastax.driver.core.Host.STATES - [10.0.1.77/10.0.1.77:9042] preparing to open 1 new connections, total = 15
    16:19:32.901 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Connection - Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] Connection established, initializing transport
    16:19:32.937 [cluster1-nio-worker-17] DEBUG c.d.s.netty.handler.ssl.SslHandler - [id: 0x14eb560e, L:/10.0.1.115:48940 - R:10.0.1.77/10.0.1.77:9042] HANDSHAKEN: TLS_RSA_WITH_AES_128_CBC_SHA
    16:19:41.082 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Host.STATES - [10.0.1.77/10.0.1.77:9042] Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] Transport initialized, connection ready
    16:20:03.838 [cluster1-reconnection-0] DEBUG com.datastax.driver.core.Host.STATES - [Control connection] established to 10.0.1.77/10.0.1.77:9042
    16:20:33.809 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Connection - Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat
    16:20:41.918 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Connection - Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] heartbeat query succeeded
    16:21:11.926 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Connection - Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat
    16:21:13.881 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Connection - Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] heartbeat query succeeded
    16:21:43.882 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Connection - Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat
    16:21:48.369 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Connection - Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] heartbeat query succeeded
    16:22:18.373 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Connection - Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat
    16:22:22.816 [cluster1-nio-worker-17] DEBUG com.datastax.driver.core.Connection - Connection[10.0.1.77/10.0.1.77:9042-28, inFlight=0, closed=false] heartbeat query succeeded
    

    (Screenshots from 2020-12-03 showing the per-instance write metrics are attached.)

    From the metrics of both foreground and background writes per instance, it seems that node #4 indeed receives fewer writes than any other node. It is possible that this caused the in-flight hint messages of the other nodes to fill up, considering that in the errors above nodes 1-3 reported inFlight=128. There may also be an issue with key distribution between the nodes, which caused the other nodes to receive more stress than they could handle.
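
    To check whether the token/key distribution is actually skewed, one could compare effective ownership per node; a minimal sketch, assuming the default cassandra-stress keyspace keyspace1 seen in the logs above:

    $ nodetool status keyspace1    # the "Owns" column shows effective ownership per node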

    The failed stress command:

    cassandra-stress write cl=QUORUM n=1100200300 -schema 'replication(factor=3) compaction(strategy=LeveledCompactionStrategy)' -port jmx=6868 -mode cql3 native -rate threads=1000 -col 'size=FIXED(200) n=FIXED(5)' -pop seq=1..1100200300
    

    Other prepare stresses for this run:

    cassandra-stress write cl=QUORUM n=50000000 -schema 'replication(factor=3) compression=LZ4Compressor compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=lz4 -rate threads=50 -pop seq=1..50000000 -log interval=5
    cassandra-stress write cl=QUORUM n=50000000 -schema 'replication(factor=3) compression=SnappyCompressor compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=snappy -rate threads=50 -pop seq=1..50000000 -log interval=5
    cassandra-stress write cl=QUORUM n=50000000 -schema 'replication(factor=3) compression=DeflateCompressor compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=none -rate threads=50 -pop seq=1..50000000 -log interval=5
    cassandra-stress write cl=QUORUM n=50000000 -schema 'replication(factor=3) compression=ZstdCompressor compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native compression=none -rate threads=50 -pop seq=1..50000000 -log interval=5
    

    (Each of them runs once, spread across 2 loaders)

    Node list:

    longevity-tls-1tb-7d-4-3-db-node-66a319cd-1 [34.243.3.190 | 10.0.1.149]
    longevity-tls-1tb-7d-4-3-db-node-66a319cd-2 [54.246.50.198 | 10.0.0.5] 
    longevity-tls-1tb-7d-4-3-db-node-66a319cd-3 [54.247.54.152 | 10.0.3.37]
    longevity-tls-1tb-7d-4-3-db-node-66a319cd-4 [52.211.7.163 | 10.0.1.77] 
    

    Logs:

    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                                            Log links for testrun with test id 66a319cd-223d-450b-8f0f-2bb423d39693                                                                                            |
    +-----------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Date            | Log type    | Link                                                                                                                                                                                                                          |
    +-----------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | 20190101_010101 | prometheus  | https://cloudius-jenkins-test.s3.amazonaws.com/66a319cd-223d-450b-8f0f-2bb423d39693/prometheus_snapshot_20201202_164129.tar.gz                                                                                                |
    | 20201202_163157 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/66a319cd-223d-450b-8f0f-2bb423d39693/20201202_163157/grafana-screenshot-overview-20201202_163158-longevity-tls-1tb-7d-4-3-monitor-node-66a319cd-1.png                          |
    | 20201202_163157 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/66a319cd-223d-450b-8f0f-2bb423d39693/20201202_163157/grafana-screenshot-scylla-per-server-metrics-nemesis-20201202_163545-longevity-tls-1tb-7d-4-3-monitor-node-66a319cd-1.png |
    | 20201202_164145 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/66a319cd-223d-450b-8f0f-2bb423d39693/20201202_164145/grafana-screenshot-overview-20201202_164145-longevity-tls-1tb-7d-4-3-monitor-node-66a319cd-1.png                          |
    | 20201202_164145 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/66a319cd-223d-450b-8f0f-2bb423d39693/20201202_164145/grafana-screenshot-scylla-per-server-metrics-nemesis-20201202_164500-longevity-tls-1tb-7d-4-3-monitor-node-66a319cd-1.png |
    | 20201202_165046 | db-cluster  | https://cloudius-jenkins-test.s3.amazonaws.com/66a319cd-223d-450b-8f0f-2bb423d39693/20201202_165046/db-cluster-66a319cd.zip                                                                                                   |
    | 20201202_165046 | loader-set  | https://cloudius-jenkins-test.s3.amazonaws.com/66a319cd-223d-450b-8f0f-2bb423d39693/20201202_165046/loader-set-66a319cd.zip                                                                                                   |
    | 20201202_165046 | monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/66a319cd-223d-450b-8f0f-2bb423d39693/20201202_165046/monitor-set-66a319cd.zip                                                                                                  |
    | 20201202_165046 | sct-runner  | https://cloudius-jenkins-test.s3.amazonaws.com/66a319cd-223d-450b-8f0f-2bb423d39693/20201202_165046/sct-runner-66a319cd.zip                                                                                                   |
    +-----------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    

    To start the monitor using hydra:

    hydra investigate show-monitor 66a319cd-223d-450b-8f0f-2bb423d39693
    
  • some non-prepared statements can leak memory (with set/map/tuple/udt literals)



    Installation details:
    Scylla version: 4.0.4
    Cluster size: 10 nodes, 4 shards per node
    OS: Ubuntu

    After running OK for a few days, nodes consistently start having 'bad_alloc' errors, even though we do not have a lot of data files (~1850 Data files) and our data size (400 G per node) is not that large compared to the memory available to the node (90 G for 4 shards, so about 22 G per shard).

    Aug 11 04:31:13 fr-eqx-scylla-04 scylla[10177]: WARN 2020-08-11 04:31:13,007 [shard 0] storage_proxy - Failed to apply mutation from 192.168.96.47#0: std::bad_alloc (std::bad_alloc)

    Our non-LSA memory is constantly growing, and once it reaches a certain level the node just starts getting bad_alloc:

    (Graph: non-LSA memory per shard on scylla-04, 2020-08-11.)
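
    A sketch of watching this growth directly on a node, assuming the default Prometheus metrics endpoint on port 9180 (exact metric names may differ between versions):

    $ curl -s http://localhost:9180/metrics | grep -i lsa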

  • Repair based node operations


    storage_service: Switch to repair based node operation

    Here is a brief introduction to the node operations Scylla supports and some of their issues.

    • Replace operation

      It is used to replace a dead node. The token ring does not change. It pulls data from only one of the replicas, which might not be the latest copy.

    • Rebuild operation

      It is used to get all the data this node owns from other nodes. It pulls data from only one of the replicas, which might not be the latest copy.

    • Bootstrap operation

      It is used to add a new node into the cluster. The token ring changes. It does not suffer from the "not the latest replica" issue, since the new node pulls data from the existing nodes that are losing the token ranges.

      It suffers from failed streaming: we split the ranges into 10 groups and stream one group at a time; if streaming a group fails, the whole group is restreamed, causing unnecessary data transmission on the wire.

      Bootstrap is also not resumable. Consider a failure after 99.99% of the data has been streamed: if we restart the node, we need to stream all the data again, even though the node already has 99.99% of it.

    • Decommission operation

      It is used to remove a live node from the cluster. The token ring changes. It does not suffer from the "not the latest replica" issue, since the leaving node pushes its own data to the existing nodes.

      It suffers from the same resumability issue as the bootstrap operation.

    • Removenode operation

      It is used to remove a dead node from the cluster. Existing nodes pull data from other existing nodes for the new ranges they own. Each pulls from only one of the replicas, which might not be the latest copy.

    To solve all the issues above, we can use repair-based node operations. The idea behind them is simple: use repair to sync data between replicas instead of streaming.

    The benefits:

    • Latest copy is guaranteed

    • Resumable in nature

    • No extra data is streamed on the wire; e.g., rebuilding twice will not stream the same data twice

    • Unified code path for all the node operations

    • A repair comes for free during bootstrap, replace, and the other operations

    Fixes: #3003
    Fixes: #4208
    Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test

    Changes in v2:

    • Rebased to latest master
    • Do not call ranges.erase in do_decommission_removenode_with_repair
    • Use seastar::thread::maybe_yield
    • Do not pass empty hosts and data_centers in sync_data_with_repair
    • Add more comments about the repair node choosing in removenode operation
    • Use neighbors_set.insert(current_eps.begin(), current_eps.end());
  • possible query failure, crash and/or data corruption if the number of columns in a clustering key or partition key is more than 16


    Installation details:
    Scylla version (or git commit hash): 3.3.rc1-0.20200218.756574d094 with build-id ee19790f9aa13c85860c4e41330be76c65db1e95
    Cluster size: 3 nodes
    OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-00d212758df086c8b
    Instance type: i3.large

    Gemini was executed with a large number of partition keys, clustering keys, and columns. The following gemini command was used:

    /$HOME/gemini -d --duration 480m --warmup 60m -c 50 -m mixed -f --non-interactive \
    --cql-features normal --async-objects-stabilization-backoff 100ms \
    --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" \
    --max-mutation-retries 5 --max-mutation-retries-backoff 100ms \
    --max-partition-keys 12 --min-partition-keys 8 \
    --max-clustering-keys 20 --min-clustering-keys 12 \
    --max-columns 100 --min-columns 80 \
    --test-cluster=10.0.65.153 --outfile /home/centos/gemini_result_0ed0f240-6040-4f6f-aaa2-1891f7462b8b.log --seed 1
    

    The following schema was generated by gemini (note that it has 9 partition-key columns and 13 clustering-key columns, i.e. 22 key columns in total):

    Schema: {
        "keyspace": {
            "name": "ks1",
            "replication": {
                "class": "SimpleStrategy",
                "replication_factor": "3"
            },
            "oracle_replication": {
                "class": "SimpleStrategy",
                "replication_factor": 1
            }
        },
        "tables": [
            {
                "name": "table1",
                "partition_keys": [
                    {
                        "name": "pk0",
                        "type": "int"
                    },
                    {
                        "name": "pk1",
                        "type": "smallint"
                    },
                    {
                        "name": "pk2",
                        "type": "tinyint"
                    },
                    {
                        "name": "pk3",
                        "type": "int"
                    },
                    {
                        "name": "pk4",
                        "type": "varint"
                    },
                    {
                        "name": "pk5",
                        "type": "tinyint"
                    },
                    {
                        "name": "pk6",
                        "type": "varint"
                    },
                    {
                        "name": "pk7",
                        "type": "smallint"
                    },
                    {
                        "name": "pk8",
                        "type": "varint"
                    }
                ],
                "clustering_keys": [
                    {
                        "name": "ck0",
                        "type": "timeuuid"
                    },
                    {
                        "name": "ck1",
                        "type": "timeuuid"
                    },
                    {
                        "name": "ck2",
                        "type": "inet"
                    },
                    {
                        "name": "ck3",
                        "type": "double"
                    },
                    {
                        "name": "ck4",
                        "type": "float"
                    },
                    {
                        "name": "ck5",
                        "type": "uuid"
                    },
                    {
                        "name": "ck6",
                        "type": "inet"
                    },
                    {
                        "name": "ck7",
                        "type": "text"
                    },
                    {
                        "name": "ck8",
                        "type": "smallint"
                    },
                    {
                        "name": "ck9",
                        "type": "varchar"
                    },
                    {
                        "name": "ck10",
                        "type": "float"
                    },
                    {
                        "name": "ck11",
                        "type": "inet"
                    },
                    {
                        "name": "ck12",
                        "type": "inet"
                    }
                ],
                "columns": [
                    {
                        "name": "col0",
                        "type": {
                            "kind": "set",
                            "type": "decimal",
                            "frozen": true
                        }
                    },
                    {
                        "name": "col1",
                        "type": "duration"
                    },
                    {
                        "name": "col2",
                        "type": "blob"
                    },
                    {
                        "name": "col3",
                        "type": {
                            "types": {
                                "udt_1357993852_0": "double",
                                "udt_1357993852_1": "boolean",
                                "udt_1357993852_2": "ascii",
                                "udt_1357993852_3": "uuid",
                                "udt_1357993852_4": "float"
                            },
                            "type_name": "udt_1357993852",
                            "frozen": true
                        }
                    },
                    {
                        "name": "col4",
                        "type": "uuid"
                    },
                    {
                        "name": "col5",
                        "type": "tinyint"
                    },
                    {
                        "name": "col6",
                        "type": "timestamp"
                    },
                    {
                        "name": "col7",
                        "type": "timestamp"
                    },
                    {
                        "name": "col8",
                        "type": "blob"
                    }
                ],
                "indexes": [
                    {
                        "name": "col5_idx",
                        "column": "col5",
                        "column_idx": 5
                    }
                ],
                "materialized_views": [
                    {
                        "name": "table1_mv_0",
                        "partition_keys": [
                            {
                                "name": "col6",
                                "type": "timestamp"
                            },
                            {
                                "name": "pk0",
                                "type": "int"
                            },
                            {
                                "name": "pk1",
                                "type": "smallint"
                            },
                            {
                                "name": "pk2",
                                "type": "tinyint"
                            },
                            {
                                "name": "pk3",
                                "type": "int"
                            },
                            {
                                "name": "pk4",
                                "type": "varint"
                            },
                            {
                                "name": "pk5",
                                "type": "tinyint"
                            },
                            {
                                "name": "pk6",
                                "type": "varint"
                            },
                            {
                                "name": "pk7",
                                "type": "smallint"
                            },
                            {
                                "name": "pk8",
                                "type": "varint"
                            }
                        ],
                        "clustering_keys": [
                            {
                                "name": "ck0",
                                "type": "timeuuid"
                            },
                            {
                                "name": "ck1",
                                "type": "timeuuid"
                            },
                            {
                                "name": "ck2",
                                "type": "inet"
                            },
                            {
                                "name": "ck3",
                                "type": "double"
                            },
                            {
                                "name": "ck4",
                                "type": "float"
                            },
                            {
                                "name": "ck5",
                                "type": "uuid"
                            },
                            {
                                "name": "ck6",
                                "type": "inet"
                            },
                            {
                                "name": "ck7",
                                "type": "text"
                            },
                            {
                                "name": "ck8",
                                "type": "smallint"
                            },
                            {
                                "name": "ck9",
                                "type": "varchar"
                            },
                            {
                                "name": "ck10",
                                "type": "float"
                            },
                            {
                                "name": "ck11",
                                "type": "inet"
                            },
                            {
                                "name": "ck12",
                                "type": "inet"
                            }
                        ]
                    }
                ],
                "known_issues": {
                    "https://github.com/scylladb/scylla/issues/3708": true
                },
                "table_options": null
            }
        ]
    }
    

    The number of columns is less than requested; a gemini issue has been opened for this: https://github.com/scylladb/gemini/issues/224

    The gemini command was expected to run for about 8 hours, but it failed after 5 hours. While it was running, several reactor stalls of 658 ms were observed on each node. Decoded backtrace:

    "backtrace": "void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /usr/include/fmt/format.h:1158\nseastar::backtrace_buffer::append_backtrace() at /usr/include/fmt/format.h:1158\n (inlined by) print_with_backtrace at ./build/release/seastar/./seastar/src/core/reactor.cc:742\nseastar::internal::cpu_stall_detector::generate_trace() at /usr/include/fmt/format.h:1158\nseastar::internal::cpu_stall_detector::maybe_report() at /usr/include/fmt/format.h:1158\n (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1053\n (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1176\n?? ??:0\n?? ??:0\nsigned char* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<signed char>(signed char const*, signed char const*, signed char*) at /usr/include/fmt/format.h:1158\n (inlined by) signed char* std::__copy_move_a<false, signed char const*, signed char*>(signed char const*, signed char const*, signed char*) at /usr/include/c++/9/bits/stl_algobase.h:404\n (inlined by) signed char* std::__copy_move_a2<false, signed char const*, signed char*>(signed char const*, signed char const*, signed char*) at /usr/include/c++/9/bits/stl_algobase.h:440\n (inlined by) signed char* std::copy<signed char const*, signed char*>(signed char const*, signed char const*, signed char*) at /usr/include/c++/9/bits/stl_algobase.h:474\n (inlined by) signed char* std::__copy_n<signed char const*, unsigned long, signed char*>(signed char const*, unsigned long, signed char*, std::random_access_iterator_tag) at /usr/include/c++/9/bits/stl_algo.h:782\n (inlined by) signed char* std::copy_n<signed char const*, unsigned long, signed char*>(signed char const*, unsigned long, signed char*) at /usr/include/c++/9/bits/stl_algo.h:806\n (inlined by) bytes_ostream::write(std::basic_string_view<signed char, std::char_traits<signed char> >) at ././bytes_ostream.hh:258\n (inlined by) bytes_ostream::write(char const*, unsigned long) at ././bytes_ostream.hh:265\n (inlined by) void seastar::simple_memory_input_stream::copy_to<bytes_ostream>(bytes_ostream&) const at ././seastar/include/seastar/core/simple-stream.hh:363\n (inlined by) _ZZN7seastar30fragmented_memory_input_streamIN13bytes_ostream17fragment_iteratorEE7copy_toIS1_EEvRT_ENKUlS5_E_clINS_26simple_memory_input_streamEEEDaS5_ at ././seastar/include/seastar/core/simple-stream.hh:427\n (inlined by) _ZN7seastar30fragmented_memory_input_streamIN13bytes_ostream17fragment_iteratorEE17for_each_fragmentIZNS3_7copy_toIS1_EEvRT_EUlS6_E_EEvmOS6_ at ././seastar/include/seastar/core/simple-stream.hh:394\n (inlined by) void seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>::copy_to<bytes_ostream>(bytes_ostream&) at ././seastar/include/seastar/core/simple-stream.hh:426\n (inlined by) void seastar::memory_input_stream<bytes_ostream::fragment_iterator>::copy_to<bytes_ostream>(bytes_ostream&)::{lambda(bytes_ostream&)#1}::operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(auto, bytes_ostream&) const at ././seastar/include/seastar/core/simple-stream.hh:573\n (inlined by) _ZN7seastar19memory_input_streamIN13bytes_ostream17fragment_iteratorEE11with_streamIZNS3_7copy_toIS1_EEvRT_EUlS7_E_EEDcOS6_ at ././seastar/include/seastar/core/simple-stream.hh:472\n (inlined by) void 
seastar::memory_input_stream<bytes_ostream::fragment_iterator>::copy_to<bytes_ostream>(bytes_ostream&) at ././seastar/include/seastar/core/simple-stream.hh:572\n (inlined by) void ser::serializer<ser::qr_partition_view>::write<bytes_ostream>(bytes_ostream&, ser::qr_partition_view) at ./build/release/gen/idl/query.dist.impl.hh:232\n (inlined by) void ser::serialize<ser::qr_partition_view, bytes_ostream>(bytes_ostream&, ser::qr_partition_view const&) at ././serializer.hh:232\n (inlined by) ser::query_result__partitions<bytes_ostream>::add(ser::qr_partition_view) at ./build/release/gen/idl/query.dist.impl.hh:851\n (inlined by) operator() at ./query.cc:277\n (inlined by) do_with<query::result_merger::get()::<lambda(query::result_view)> > at ./query-result-reader.hh:164\n (inlined by) query::result_merger::get() at ./query.cc:269\noperator() at /usr/include/fmt/format.h:1158\n (inlined by) apply at ././seastar/include/seastar/core/apply.hh:36\n (inlined by) apply<cql3::statements::indexed_table_select_statement::do_execute_base_query(service::storage_proxy&, std::vector<cql3::statements::select_statement::primary_key>&&, service::query_state&, const cql3::query_options&, gc_clock::time_point, seastar::shared_ptr<const service::pager::paging_state>) const::<lambda(auto:175&&)> [with auto:175 = cql3::statements::indexed_table_select_statement::do_execute_base_query(service::storage_proxy&, std::vector<cql3::statements::select_statement::primary_key>&&, service::query_state&, const cql3::query_options&, gc_clock::time_point, seastar::shared_ptr<const service::pager::paging_state>) const::base_query_state&]::<lambda()> > at ././seastar/include/seastar/core/apply.hh:44\n (inlined by) apply<cql3::statements::indexed_table_select_statement::do_execute_base_query(service::storage_proxy&, std::vector<cql3::statements::select_statement::primary_key>&&, service::query_state&, const cql3::query_options&, gc_clock::time_point, seastar::shared_ptr<const service::pager::paging_state>) const::<lambda(auto:175&&)> [with auto:175 = cql3::statements::indexed_table_select_statement::do_execute_base_query(service::storage_proxy&, std::vector<cql3::statements::select_statement::primary_key>&&, service::query_state&, const cql3::query_options&, gc_clock::time_point, seastar::shared_ptr<const service::pager::paging_state>) const::base_query_state&]::<lambda()> > at ././seastar/include/seastar/core/future.hh:1565\n (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1191\n (inlined by) run_and_dispose at ././seastar/include/seastar/core/future.hh:506\nseastar::reactor::run_tasks(seastar::reactor::task_queue&) at /usr/include/fmt/format.h:1158\nseastar::reactor::run_some_tasks() at /usr/include/fmt/format.h:1158\nseastar::reactor::run_some_tasks() at /usr/include/fmt/format.h:1158\n (inlined by) seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2686\nseastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::{lambda()#3}::operator()() const at /usr/include/fmt/format.h:1158\nstd::function<void ()>::operator()() const at /usr/include/c++/9/bits/std_function.h:690\n (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:52\n?? ??:0\n?? 
??:0\n", "raw_backtrace": "0x0000000002a5c472\n0x0000000002a02feb\n0x0000000002a03512\n0x0000000002a03847\n0x00007f5d71700b1f\n0x00007f5d70df4d52\n0x0000000001a980d2\n0x0000000001175222\n0x00000000029ff291\n0x00000000029ff49f\n0x0000000002a3fc65\n0x0000000002a4e8bc\n0x00000000029d9b2d\n/libreloc/libpthread.so.0+0x00000000000094e1\n/libreloc/libc.so.6+0x0000000000101692"}
    

    and a large number of warnings on each node:

    2020-02-19T17:52:48+00:00  gemini-8h-large-num-columns-GeminiL-db-node-f2d6a8e0-1 !WARNING | scylla: [shard 0] tracing - Maximum records limit is hit 2750001 times
    2020-02-19T17:52:49+00:00  gemini-8h-large-num-columns-GeminiL-db-node-f2d6a8e0-1 !WARNING | scylla: [shard 0] tracing - Maximum records limit is hit 2760001 times
    

    along with the following error:

    2020-02-19T17:52:54+00:00  gemini-8h-large-num-columns-GeminiL-db-node-f2d6a8e0-3 !ERR     | scylla: [shard 0] storage_proxy - Exception when communicating with 10.0.207.169: std::runtime_error (marshaling error: read_simple_exactly - size mismatch (expected 4, got 1) Backtrace:   0x2c4f08d#012  0x9fcd3e#012  0x444b28#012  0x4d8fe5#012  0xa78e8b#012  0xeab269#012  0xc27a67#012  0xc28239#012  0xc600e3#012  0xadebf3#012  0xae14c1#012  0x29ff291#012  0x29ff49f#012  0x2a3fc65#012  0x29a5d6f#012  0x29a6e9e#012  0x72a4e3#012  /opt/scylladb/libreloc/libc.so.6+0x271a2#012  0x77548d#012)
    2020-02-19T17:52:54+00:00  gemini-8h-large-num-columns-GeminiL-db-node-f2d6a8e0-3 !ERR     | scylla: [shard 0] storage_proxy - Exception when communicating with 10.0.157.50: std::runtime_error (marshaling error: read_simple_exactly - size mismatch (expected 4, got 1) Backtrace:   0x2c4f08d#012  0x9fcd3e#012  0x444b28#012  0x4d8fe5#012  0xa78e8b#012  0xeab269#012  0xc27a67#012  0xc28239#012  0xc600e3#012  0xadebf3#012  0xae14c1#012  0x29ff291#012  0x29ff49f#012  0x2a3fc65#012  0x29a5d6f#012  0x29a6e9e#012  0x72a4e3#012  /opt/scylladb/libreloc/libc.so.6+0x271a2#012  0x77548d#012)
    

    These errors appeared on each node.
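
    For context, the read_simple_exactly failure above is a deserialization guard: the reader expected a fixed-size value (4 bytes) but found only 1 byte left in the buffer, the kind of failure seen when serialized data is truncated or mis-framed. A minimal C++ illustration of such a check (purely illustrative, not Scylla's actual implementation):

    #include <cstdint>
    #include <cstring>
    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <string_view>

    // Illustrative reader: consumes exactly n bytes or throws, similar in
    // spirit to the read_simple_exactly check in the error message above.
    struct simple_reader {
        std::string_view buf;

        std::string_view read_exactly(size_t n) {
            if (buf.size() < n) {
                throw std::runtime_error(
                    "marshaling error: size mismatch (expected " +
                    std::to_string(n) + ", got " + std::to_string(buf.size()) + ")");
            }
            auto out = buf.substr(0, n);
            buf.remove_prefix(n);
            return out;
        }

        // e.g., a 32-bit int value is always serialized as exactly 4 bytes
        int32_t read_int32() {
            auto bytes = read_exactly(4);   // throws if the frame is truncated
            int32_t v;
            std::memcpy(&v, bytes.data(), 4);
            return v;
        }
    };

    int main() {
        simple_reader r{std::string_view("\x01", 1)};  // only 1 byte available
        try {
            r.read_int32();                            // expected 4, got 1
        } catch (const std::runtime_error& e) {
            std::cerr << e.what() << "\n";  // analogous to the error above
        }
    }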

    db-cluster: https://cloudius-jenkins-test.s3.amazonaws.com/f2d6a8e0-1086-4be2-971a-805b64a240dd/20200219_193418/db-cluster-f2d6a8e0.zip
    loader-set: https://cloudius-jenkins-test.s3.amazonaws.com/f2d6a8e0-1086-4be2-971a-805b64a240dd/20200219_193418/loader-set-f2d6a8e0.zip
    monitor-set: https://cloudius-jenkins-test.s3.amazonaws.com/f2d6a8e0-1086-4be2-971a-805b64a240dd/20200219_193418/monitor-set-f2d6a8e0.zip
    sct-runner: https://cloudius-jenkins-test.s3.amazonaws.com/f2d6a8e0-1086-4be2-971a-805b64a240dd/20200219_193418/sct-runner-f2d6a8e0.zip

    Core dumps were not found.
