Powerful multi-threaded coroutine dispatcher and parallel execution engine

Quantum Library : A scalable C++ coroutine framework

Build status

Quantum is a full-featured and powerful C++ framework build on top of the Boost coroutine library. The framework allows users to dispatch units of work (a.k.a. tasks) as coroutines and execute them concurrently using the 'reactor' pattern.

Features

  • NEW Added support for simpler V2 coroutine API which returns computed values directly.
  • Header-only library and interface-based design.
  • Full integration with Boost asymmetric coroutine library.
  • Highly parallelized coroutine framework for CPU-bound workloads.
  • Support for long-running or blocking IO tasks.
  • Allows explicit and implicit cooperative yielding between coroutines.
  • Task continuations and coroutine chaining for serializing work execution.
  • Synchronous and asynchronous dispatching using futures and promises similar to STL.
  • Support for streaming futures which allows faster processing of large data sets.
  • Support for future references.
  • Cascading execution output during task continuations (a.k.a. past futures).
  • Task prioritization.
  • Internal error handling and exception forwarding.
  • Ability to write lock-free code by synchronizing coroutines on dedicated queues.
  • Coroutine-friendly mutexes and condition variables for locking critical code paths or synchronizing access to external objects.
  • Fast pre-allocated memory pools for internal objects and coroutines.
  • Parallel forEach and mapReduce functions.
  • Various stats API.
  • Sequencer class allowing strict FIFO ordering of tasks based on sequence ids.

Sample code

Quantum is very simple and easy to use:

using namespace Bloomberg::quantum;

// Define a coroutine
int getDummyValue(CoroContextPtr<int> ctx)
{
    int value;
    ...           //do some work
    ctx->yield(); //be nice and let other coroutines run (optional cooperation)
    ...           //do more work and calculate 'value'
    return ctx->set(value);
}

// Create a dispatcher
Dispatcher dispatcher;

// Dispatch a work item to do some work and return a value
int result = dispatcher.post(getDummyValue)->get();

Chaining tasks can also be straightforward. In this example we produce various types in a sequence.

using namespace Bloomberg::quantum;

// Create a dispatcher
Dispatcher dispatcher;

auto ctx = dispatcher.postFirst([](CoroContextPtr<int> ctx)->int {
    return ctx->set(55); //Set the 1st value
})->then([](CoroContextPtr<double> ctx)->int {
    // Get the first value and add something to it
    return ctx->set(ctx->getPrev<int>() + 22.33); //Set the 2nd value
})->then([](CoroContextPtr<std::string> ctx)->int {
    return ctx->set("Hello world!"); //Set the 3rd value
})->finally([](CoroContextPtr<std::list<int>> ctx)->int {
    return ctx->set(std::list<int>{1,2,3}); //Set 4th value
})->end();

int i = ctx->getAt<int>(0); //This will throw 'FutureAlreadyRetrievedException'
                            //since future was already read in the 2nd coroutine
double d = ctx->getAt<double>(1); //returns 77.33
std::string s = ctx->getAt<std::string>(2); //returns "Hello world!";
std::list<int>& listRef = ctx->getRefAt<std::list<int>>(3); //get list reference
std::list<int>& listRef2 = ctx->getRef(); //get another list reference.
                                          //The 'At' overload is optional for last chain future
std::list<int> listValue = ctx->get(); //get list value

Chaining with the new V2 api:

using namespace Bloomberg::quantum;

// Create a dispatcher
Dispatcher dispatcher;

auto ctx = dispatcher.postFirst([](VoidContextPtr ctx)->int {
    return 55; //Set the 1st value
})->then([](VoidContextPtr ctx)->double {
    // Get the first value and add something to it
    return ctx->getPrev<int>() + 22.33; //Set the 2nd value
})->then([](VoidContextPtr ctx)->std::string {
    return "Hello world!"; //Set the 3rd value
})->finally([](VoidContextPtr ctx)->std::list<int> {
    return {1,2,3}; //Set 4th value
})->end();

Building and installing

Quantum is a header-only library and as such no targets need to be built. To install simply run:

> cmake -Bbuild <options> .
> cd build
> make install

CMake options

Various CMake options can be used to configure the output:

  • QUANTUM_BUILD_DOC : Build Doxygen documentation. Default OFF.
  • QUANTUM_ENABLE_DOT : Enable generation of DOT viewer files. Default OFF.
  • QUANTUM_VERBOSE_MAKEFILE : Enable verbose cmake output. Default ON.
  • QUANTUM_ENABLE_TESTS : Builds the tests target. Default OFF.
  • QUANTUM_BOOST_STATIC_LIBS: Link with Boost static libraries. Default ON.
  • QUANTUM_BOOST_USE_MULTITHREADED : Use Boost multi-threaded libraries. Default ON.
  • QUANTUM_USE_DEFAULT_ALLOCATOR : Use default system supplied allocator instead of Quantum's. Default OFF.
  • QUANTUM_ALLOCATE_POOL_FROM_HEAP : Pre-allocates object pools from heap instead of the application stack. Default OFF.
  • QUANTUM_BOOST_USE_SEGMENTED_STACKS : Use Boost segmented stacks for coroutines. Default OFF.
  • QUANTUM_BOOST_USE_PROTECTED_STACKS : Use Boost protected stacks for coroutines (slow!). Default OFF.
  • QUANTUM_BOOST_USE_FIXEDSIZE_STACKS : Use Boost fixed size stacks for coroutines. Default OFF.
  • QUANTUM_INSTALL_ROOT : Specify custom install path. Default is /usr/local/include for Linux or c:/Program Files for Windows.
  • QUANTUM_PKGCONFIG_DIR : Specify custom install path for the quantum.pc file. Default is ${QUANTUM_INSTALL_ROOT}/share/pkgconfig. To specify a relative path from QUANTUM_INSTALL_ROOT, omit leading /.
  • QUANTUM_EXPORT_PKGCONFIG : Generate quantum.pc file. Default ON.
  • QUANTUM_CMAKE_CONFIG_DIR : Specify a different install directory for the project's config, target and version files. Default is ${QUANTUM_INSTALL_ROOT}/share/cmake.
  • QUANTUM_EXPORT_CMAKE_CONFIG : Generate CMake config, target and version files. Default ON.
  • BOOST_ROOT : Specify a different Boost install directory.
  • GTEST_ROOT : Specify a different GTest install directory.

Note: options must be preceded with -D when passed as arguments to CMake.

Running tests

Run the following from the top directory:

> cmake -Bbuild -DQUANTUM_ENABLE_TESTS=ON <options> .
> cd build
> make quantum_test && ctest

Using

To use the library simply include <quantum/quantum.h> in your application. Also, the following libraries must be included in the link:

  • boost_context
  • pthread

Quantum library is fully is compatible with C++11, C++14 and C++17 language features. See compiler options below for more details.

Compiler options

The following compiler options can be set when building your application:

  • __QUANTUM_PRINT_DEBUG : Prints debug and error information to stdout and stderr respectively.
  • __QUANTUM_USE_DEFAULT_ALLOCATOR : Disable pool allocation for internal objects (other than coroutine stacks) and use default system allocators instead.
  • __QUANTUM_ALLOCATE_POOL_FROM_HEAP : Pre-allocates object pools from heap instead of the application stack (default). This affects internal object allocations other than coroutines. Coroutine pools are always heap-allocated due to their size.
  • __QUANTUM_BOOST_USE_SEGMENTED_STACKS : Uses boost segmented stack for on-demand coroutine stack growth. Note that Boost.Context library must be built with property segmented-stacks=on and applying BOOST_USE_UCONTEXT and BOOST_USE_SEGMENTED_STACKS at b2/bjam command line.
  • __QUANTUM_BOOST_USE_PROTECTED_STACKS : Uses boost protected stack for runtime bound-checking. When using this option, coroutine creation (but not runtime efficiency) becomes more expensive.
  • __QUANTUM_BOOST_USE_FIXEDSIZE_STACKS : Uses boost fixed size stack. This defaults to system default allocator.

Application-wide settings

Various application-wide settings can be configured via ThreadTraits, AllocatorTraits and StackTraits.

Documentation

Please see the wiki page for a detailed overview of this library, use-case scenarios and examples.

For class description visit the API reference page.

Comments
  • Implemented experimental::Sequencer

    Implemented experimental::Sequencer

    Signed-off-by: Denis Mindolin [email protected]

    Describe your changes This PR introduces a lightweight Sequencer module (class quantum::experimental::Sequencer) whose functionality is similar to quantum::Sequencer but has the following differences w.r.t. quantum::Sequencer.

    • Ordering of pending tasks in quantum::Sequencer is done via quantum::Dispatcher. In particular, each tasks scheduled via quantum::Sequencer explicitly waits for its dependent tasks to finish. However while in the pending state, these tasks are still enqueued into quantum::Dispatcher. In the presence of a large number of dependent tasks, quantum::Dispatcher needs to keep traversing over those queues to find a tasks with no dependencies (i.e. ready to be executed). This often results in a scheduling lag (for the queue traversal) + wasted CPU cycles for checking the states of dependents. In contrast to that, quantum::experimental::Sequencer constructs a dependency graph outside of quantum::Dispatcher and schedules them via quantum::Dispatcher only when they have no pending dependents.

    Testing performed Unit tests added

    Additional context Add any other context about your contribution here.

  • Regarding Release

    Regarding Release

    This is not an issue. I didn't know where to raise my query. So, raising it here. We are planning to use the Quantum library for one of our products for high scalability and performance requirement. So, can you please let me know if there is any plan for a release which can be used for the product dev?

  • Rename DEPRECATED macro

    Rename DEPRECATED macro

    Signed-off-by: Adam M. Rosenzweig [email protected]

    Describe your changes Remove the DEPRECATED macro, which was causing conflicts in downstream libraries and/or applications which linked in quantum and also other code which referenced DEPRECATED (for other purposes).

    As quantum is not using these macros at all, I have chosen to remove them rather than rename them.

    Testing performed Verified that quantum builds, all tests still pass, and that all of our internal tasks & libraries that depend on quantum build successfully with the change.

  • Coro local storage

    Coro local storage

    Signed-off-by: Denis Mindolin [email protected]

    Describe your changes Adding the coroutine local storage feature. In multi-threaded applications, the thread local storage (TLS) is often useful to associate a variable with a thread and use it in multiple contexts in that thread without passing to functions explicitly. However, TLS won't work well with coroutines that yield because TSL variables from one coroutine may get exposed and corrupted by another coroutine running in the same thread (or a different thread, if shared coroutines are used). Hence the new CLS api (Coroutine Local Storage). It allows to associate with a coroutine any number of named variables of the std::shared_ptr type from the body of the coroutine; and those variables are only visible from the body of that coroutine only.

    The CLS feature is going to be particularly useful in our project for logging purposes to associate each log file line with a coroutine-id/message-id that it corresponds to.

    Testing performed Ran the existing tests and added a new test for CLS.

    Additional context Add any other context about your contribution here.

  • Shared any-coro-queue changes

    Shared any-coro-queue changes

    Signed-off-by: Denis Mindolin [email protected]

    Issue number of the reported bug or feature request: #

    Describe your changes Introducing the shared-any-coro-queue mode. When this feature is enabled, Dispatcher creates a dedicated Queue::QueueId::Any queue and a thread for processing tasks enqueued into it. All coroutine tasks enqueued with queueId = Queue::QueueId::Any will be pushed to that queue. Also each thread in the coroQueueIdRangeForAny range registers itself as a 'helper' for the Queue::QueueId::Any queue and executes tasks from the Queue::QueueId::Any as well as tasks from its own queue.

    Testing performed Ran the existing tests and added a performance test for this mode.

    Additional context Add any other context about your contribution here.

  • Update experimental::Sequencer JSON URI

    Update experimental::Sequencer JSON URI

    Signed-off-by: Adam M. Rosenzweig [email protected]

    Describe your changes Assign a new and unique JSON URI to the experimental::SequencerConfiguration, distinct from the regular SequencerConfiguration.

    Testing performed Verified the two changes match, and code still compiles.

    Additional context This will facilitate clean code in other repositories to parse a JSON document into the appropriate SequencerConfiguration object.

  • Provide local::context() to Mutex::Guard

    Provide local::context() to Mutex::Guard

    Signed-off-by: Adam M. Rosenzweig [email protected]

    Issue number of the reported bug or feature request: N/A

    Describe your changes @demin80 's new experimental::Sequencer cannot be called safely from inside a coroutine. Many of our internal use-cases in fact call into the Sequencer from within coroutines, but because the Mutex::Guard constructors that do not take a ICoroSync::Ptr assert(!local::context()), the task crashes. The change is to use the constructor which does take an ICoroSync::Ptr and populate it from local::context(); this is safe whether the Sequencer is invoked from within a coroutine or not.

    Testing performed After identifying the problem separately, I added a unit test (SequencerExperimentalTest, CoroSafety) and confirmed that this caused an assertion and crash of the test executable. Then I amended the Mutex::Guard objects to take in local::context() and re-ran the tests, which now all passed without a crash.

  • Provided move constructor and move assignment operator for SpinLock type.

    Provided move constructor and move assignment operator for SpinLock type.

    *Issue number of the reported bug or feature request: #150 *

    Describe your changes Provided move constructor and move assignment operator for SpinLock type. Since the copy constructor for atomics is deleted, we'll need to explicitly define the move constructor and assignment operator.

    Testing performed Was unable to recreate the reported bug in my environment as I don't have those versions of Linux distribution or the version of clang. All the current tests part of Quantum passed. From the reported error messages, which are very clearly pointing to the issue, I feel confident that this fix will resolve the issue. Please try this change and report if it fixes it.

    Additional context

  • Inefficient SpinLock implementation

    Inefficient SpinLock implementation

    Describe the bug SpinLock as it is implemented in Quantum kills neighbor HyperThread performance and consumes excessive power.

    Proposal Use PAUSE instruction (_mm_pause(); intrinsic) for short-duration lock. Use a back-off algorithm to gradually reduce bus traffic used for the atomic checks. Insert co-op yield point if it takes too long to acquire the lock.

    Additional context

    • The subject refers to this code: https://github.com/bloomberg/quantum/blob/master/quantum/impl/quantum_spinlock_impl.h#L33
    • Please refer to this blog: https://software.intel.com/en-us/articles/long-duration-spin-wait-loops-on-hyper-threading-technology-enabled-intel-processors and Intel Optimization Manual
  • Added access to the coroutine context from within any function call.

    Added access to the coroutine context from within any function call.

    Signed-off-by: Alexander Damian [email protected]

    Describe your changes

    • Add access to the coroutine context from within any function. This allows to yield, sleep or call any ICoroContext API (such as creating new coroutines) if the function is invoked from a coroutine.
    • changed inner namespace quantum::cls to quantum::local.
    • Added GenericFuture class. This facade util class can wrap any future type at runtime, when the execution context is unknown and can be determined via quantum::local::context(); Example:
    //We want to post a coroutine but this API can be called from either a regular 
    //thread or a coroutine. Since we can't block inside a coroutine, the future 
    //returned must be 'smart' and block only if in a thread.
    GenericFuture<int> someApi(...)
    {
        ... //business logic
        auto ctx = local::context(); //get coroutine context if any
        if (ctx) {
            //post via coroutine context
            return {ctx->post(...), ctx}; 
        }
        else {
            //post via dispatcher
            return dispatcher->post(...);
        }
    }
    
    // usage will be:
    int i = someApi(...).get(); //if we are inside a coroutine we won't block but yield instead
    

    Testing performed Created specific GTEST for this scenario.

  • cls variable improvements

    cls variable improvements

    Signed-off-by: Denis Mindolin [email protected]

    Describe your changes

    1. Got rid of the exception throwing code in cls::variable by using thread_local when the function is called outside of a coroutine
    2. Added cls::Guard for cls variable management.

    Testing performed Ran existing tests + added a test for cls::Guard.

    Additional context Add any other context about your contribution here.

  • Add coroutine state handler callback

    Add coroutine state handler callback

    Coroutine state handler support in API

    Describe your changes Added a coroutine state handler callback which can be used for acquiring/releasing resources of a coroutine between its executions(multiple yields). One of the cases requiring this callback is a synchronization of the coroutine local storage with the thread local storage.

    Testing performed A new test file was added with several coroutine state handlers:

    1. Empty handler
    2. Handler with exception
    3. Handler with coroutine local storage variables management

    Additional context

    1. Added Timer class in tests
    2. Fixed indentation
  • GCC 12 warning: Invalid memory model

    GCC 12 warning: Invalid memory model

    Describe the bug

    Compiler warning invalid-memory-model triggered on atomic_base.h.

    To Reproduce

    Compile with GCC 12.

    Expected behavior

    No warnings

    Environment (please complete the following information):

    • Operating System and Version: macOS 12.4, amd64, Bazel, Homebrew GCC 12.1.0_1, C++17 mode

    Logs

    In member function 'void std::__atomic_base<_IntTp>::store(__int_type, std::memory_order) [with _ITp = int]',
        inlined from 'void Bloomberg::quantum::Task::SuspensionGuard::set(int)' at external/com_bloomberg_quantum/quantum/quantum_task.h:131:34:
    /usr/local/Cellar/[email protected]/12.1.0_1/bin/../lib/gcc/12/gcc/x86_64-apple-darwin21/12/../../../../../../include/c++/12/bits/atomic_base.h:464:25: warning: invalid memory model 'memory_order_acq_rel' for 'void __atomic_store_4(volatile void*, unsigned int, int)' [-Winvalid-memory-model]
      464 |         __atomic_store_n(&_M_i, __i, int(__m));
          |         ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
    /usr/local/Cellar/[email protected]/12.1.0_1/bin/../lib/gcc/12/gcc/x86_64-apple-darwin21/12/../../../../../../include/c++/12/bits/atomic_base.h:464:25: note: valid models are 'memory_order_relaxed', 'memory_order_seq_cst', 'memory_order_release'
    
  • Only use cpu_set_t on Linux

    Only use cpu_set_t on Linux

    Issue number of the reported bug or feature request: #

    Describe your changes

    Makes TaskQueue::pinToCore a no-op on platforms that don't support cpu_set_t (i.e. non-Linux)

    Testing performed

    Compiled quantum on macOS.

    Additional context

    In the future, TaskQueue::pinToCore should be implemented on other platforms to improve performance.

  • Removed used of deprecated forced_unwind

    Removed used of deprecated forced_unwind

    Issue number of the reported bug or feature request: #157

    Describe your changes It replaces the (removed) used of boost::coroutine2::detail::forced_unwind in a catch with boost::context

    Testing performed Just a basic compilation.

    Additional context Details described in issue #157

  • Use of removed boost::coroutine2::detail::forced_unwind struct

    Use of removed boost::coroutine2::detail::forced_unwind struct

    Since boost 1.74, boost::coroutine2::detail::forced_unwind no longer exists . Since this is already mentioned in quantum itself this is just an omission.

    It is also mentioned as an issue in https://github.com/boostorg/coroutine2/issues/23

    To Reproduce Just build with boost >=1.74.

  • Fixed the move semantics related and other issues reported by the newer clang compiler.

    Fixed the move semantics related and other issues reported by the newer clang compiler.

    *Issue number of the reported bug or feature request: #150 *

    Describe your changes Fixed all of the move semantics related and other issues reported by the newer clang compiler.

    Testing performed Was able to run all the included tests successfully which tests the whole fixture and primitives.

    Additional context

Coroutine - C++11 single .h asymmetric coroutine implementation via ucontext / fiber

C++11 single .h asymmetric coroutine implementation API in namespace coroutine: routine_t create(std::function<void()> f); void destroy(routine_t id);

Dec 20, 2022
C++-based high-performance parallel environment execution engine for general RL environments.
C++-based high-performance parallel environment execution engine for general RL environments.

EnvPool is a highly parallel reinforcement learning environment execution engine which significantly outperforms existing environment executors. With

Dec 30, 2022
Termite-jobs - Fast, multiplatform fiber based job dispatcher based on Naughty Dogs' GDC2015 talk.

NOTE This library is obsolete and may contain bugs. For maintained version checkout sx library. until I rip it from there and make a proper single-hea

Jan 9, 2022
lc is a fast multi-threaded line counter.
lc is a fast multi-threaded line counter.

Fast multi-threaded line counter in Modern C++ (2-10x faster than `wc -l` for large files)

Oct 25, 2022
Kokkos C++ Performance Portability Programming EcoSystem: The Programming Model - Parallel Execution and Memory Abstraction

Kokkos: Core Libraries Kokkos Core implements a programming model in C++ for writing performance portable applications targeting all major HPC platfor

Jan 5, 2023
A library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies.

Fiber Tasking Lib This is a library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies. Dependenc

Dec 30, 2022
KRATOS Multiphysics ("Kratos") is a framework for building parallel, multi-disciplinary simulation software
KRATOS Multiphysics (

KRATOS Multiphysics ("Kratos") is a framework for building parallel, multi-disciplinary simulation software, aiming at modularity, extensibility, and high performance. Kratos is written in C++, and counts with an extensive Python interface.

Dec 29, 2022
A golang-style C++ coroutine library and more.

CO is an elegant and efficient C++ base library that supports Linux, Windows and Mac platforms. It pursues minimalism and efficiency, and does not rely on third-party library such as boost.

Jan 5, 2023
A go-style coroutine library in C++11 and more.
A go-style coroutine library in C++11 and more.

cocoyaxi English | 简体中文 A go-style coroutine library in C++11 and more. 0. Introduction cocoyaxi (co for short), is an elegant and efficient cross-pla

Dec 27, 2022
:copyright: Concurrent Programming Library (Coroutine) for C11

libconcurrent tiny asymmetric-coroutine library. Description asymmetric-coroutine bidirectional communication by yield_value/resume_value native conte

Sep 2, 2022
Single header asymmetric stackful cross-platform coroutine library in pure C.
Single header asymmetric stackful cross-platform coroutine library in pure C.

minicoro Minicoro is single-file library for using asymmetric coroutines in C. The API is inspired by Lua coroutines but with C use in mind. The proje

Dec 29, 2022
A C++20 coroutine library based off asyncio
A C++20 coroutine library based off asyncio

kuro A C++20 coroutine library, somewhat modelled on Python's asyncio Requirements Kuro requires a C++20 compliant compiler and a Linux OS. Tested on

Nov 9, 2022
C++20 Coroutine-Based Synchronous Parser Combinator Library

This library contains a monadic parser type and associated combinators that can be composed to create parsers using C++20 Coroutines.

Dec 17, 2022
Async GRPC with C++20 coroutine support

agrpc Build an elegant GRPC async interface with C++20 coroutine and libunifex (target for C++23 executor). Get started mkdir build && cd build conan

Dec 21, 2022
Cppcoro - A library of C++ coroutine abstractions for the coroutines TS

CppCoro - A coroutine library for C++ The 'cppcoro' library provides a large set of general-purpose primitives for making use of the coroutines TS pro

Dec 30, 2022
Mx - C++ coroutine await, yield, channels, i/o events (single header + link to boost)

mx C++11 coroutine await, yield, channels, i/o events (single header + link to boost). This was originally part of my c++ util library kit, but I'm se

Sep 21, 2019
Elle - The Elle coroutine-based asynchronous C++ development framework.
Elle - The Elle coroutine-based asynchronous C++ development framework.

Elle, the coroutine-based asynchronous C++ development framework Elle is a collection of libraries, written in modern C++ (C++14). It contains a rich

Jan 1, 2023
C++14 coroutine-based task library for games

SquidTasks Squid::Tasks is a header-only C++14 coroutine-based task library for games. Full project and source code available at https://github.com/we

Nov 30, 2022
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

moodycamel::ConcurrentQueue An industrial-strength lock-free queue for C++. Note: If all you need is a single-producer, single-consumer queue, I have

Jan 3, 2023