A library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies.

Fiber Tasking Lib

This is a library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies. Dependencies are represented as atomic counters.

Under the covers, the task graph is executed using fibers, which in turn, are run on a pool of worker threads (one thread per CPU core). This allows the scheduler to wait on dependencies without task chaining or context switches.

This library was created as a proof of concept of the ideas presented by Christian Gyrling in his 2015 GDC Talk 'Parallelizing the Naughty Dog Engine Using Fibers'


Example

#include "ftl/task_counter.h"
#include "ftl/task_scheduler.h"

#include <assert.h>
#include <stdint.h>

struct NumberSubset {
	uint64_t start;
	uint64_t end;

	uint64_t total;
};

void AddNumberSubset(ftl::TaskScheduler *taskScheduler, void *arg) {
	(void)taskScheduler;
	NumberSubset *subset = reinterpret_cast<NumberSubset *>(arg);

	subset->total = 0;

	while (subset->start != subset->end) {
		subset->total += subset->start;
		++subset->start;
	}

	subset->total += subset->end;
}

/**
 * Calculates the value of a triangle number by dividing the additions up into tasks
 *
 * A triangle number is defined as:
 *         Tn = 1 + 2 + 3 + ... + n
 *
 * The code is checked against the numerical solution which is:
 *         Tn = n * (n + 1) / 2
 */
int main() {
	// Create the task scheduler and bind the main thread to it
	ftl::TaskScheduler taskScheduler;
	taskScheduler.Init();

	// Define the constants to test
	constexpr uint64_t triangleNum = 47593243ULL;
	constexpr uint64_t numAdditionsPerTask = 10000ULL;
	constexpr uint64_t numTasks = (triangleNum + numAdditionsPerTask - 1ULL) / numAdditionsPerTask;

	// Create the tasks
	// FTL allows you to create Tasks on the stack.
	// However, in this case, that would cause a stack overflow
	ftl::Task *tasks = new ftl::Task[numTasks];
	NumberSubset *subsets = new NumberSubset[numTasks];
	uint64_t nextNumber = 1ULL;

	for (uint64_t i = 0ULL; i < numTasks; ++i) {
		NumberSubset *subset = &subsets[i];

		subset->start = nextNumber;
		subset->end = nextNumber + numAdditionsPerTask - 1ULL;
		if (subset->end > triangleNum) {
			subset->end = triangleNum;
		}

		tasks[i] = {AddNumberSubset, subset};

		nextNumber = subset->end + 1;
	}

	// Schedule the tasks
	ftl::TaskCounter counter(&taskScheduler);
	taskScheduler.AddTasks(numTasks, tasks, ftl::TaskPriority::High, &counter);

	// FTL creates its own copies of the tasks, so we can safely delete the memory
	delete[] tasks;

	// Wait for the tasks to complete
	taskScheduler.WaitForCounter(&counter);

	// Add the results
	uint64_t result = 0ULL;
	for (uint64_t i = 0; i < numTasks; ++i) {
		result += subsets[i].total;
	}

	// Test
	assert(triangleNum * (triangleNum + 1ULL) / 2ULL == result);

	// Cleanup
	delete[] subsets;

	// The destructor of TaskScheduler will shut down all the worker threads
	// and unbind the main thread

	return 0;
}


Automatic Test Matrix

Windows

Linux

OSX

Visual Studio 2017

Windows VC++ 2017 build status

GCC 4.9

Linux GCC 4.9 build status

GCC 4.9

OSX GCC 4.9 build status

Visual Studio 2019

Windows VC++ 2019 build status

GCC 9

Linux GCC 9 build status

GCC 9

OSX GCC 9 build status

Clang 3.7

Linux Clang 3.7 build status

Clang 9

Linux Clang 9 build status

Clang 9

OSX Clang 9 build status


How it works

Honestly, the best explanation is to watch Christian Gyrling’s talk. It’s free to watch (as of the time of writing) from the GDC vault. His explaination of fibers as well as how they used the fiber system in their game engine is excellent. However, I will try to give a TL;DR; version here.

What are fibers

A fiber consists of a stack and a small storage space for registers. It’s a very lightweight execution context that runs inside a thread. You can think of it as a shell of an actual thread.

Why go though the hassle though? What’s the benefit?

The beauty of fibers is that you can switch between them extremely quickly. Ultimately, a switch consists of saving out registers, then swapping the execution pointer and the stack pointer. This is much much faster than a full-on thread context switch.

How do fibers apply to task-based multithreading?

To answer this question, let’s compare to another task-based multithreading library: Intel’s Threading Building Blocks. TBB is an extremely well polished and successful tasking library. It can handle really complex task graphs and has an excellent scheduler. However, let’s imagine a scenario:

  1. Task A creates Tasks B, C, and D and sends them to the scheduler

  2. Task A does some other work, but then it hits the dependency: B, C, and D must be finished.

  3. If they aren’t finished, we can do 2 things:

    1. Spin-wait / Sleep

    2. Ask the scheduler for a new task and start executing that

  4. Let’s take path b

  5. So the scheduler gives us Task G and we start executing

  6. But Task G ends up needing a dependency as well, so we ask the scheduler for another new task

  7. And another, and another

  8. In the meantime, Tasks B, C, and D have completed

  9. Task A could theoretically be continued, but it’s buried in the stack under the tasks that we got while we were waiting

  10. The only way we can resume A is to wait for the entire chain to unravel back to it, or suffer a context switch.

Now, obviously, this is a contrived example. And as I said above, TBB has an awesome scheduler that works hard to alleviate this problem. That said, fibers can help to eliminate the problem altogether by allowing cheap switching between tasks. This allows us to isolate the execution of one task from another, preventing the 'chaining' effect described above.


The Architecture from 10,000 ft

(Christian has some great illustrations on pages 8 - 17 of his slides that help explain the flow of fibers and tasks. I suggest looking at those while you’re reading)

Task Queue - An 'ordinary' queue for holding the tasks that are waiting to be executed. In the current code, there is only one queue. However, a more sophisticated system might have multiple queues with varying priorities.

Fiber Pool - A pool of fibers used for switching to new tasks while the current task is waiting on a dependency. Fibers execute the tasks

Worker Threads - 1 per logical CPU core. These run the fibers.

Waiting Tasks - A list of the tasks that are waiting for a dependency to be fufilled. Dependencies are represented with atomic counters

Tasks can be created on the stack. They’re just a simple struct with a function pointer and an optional void *arg to be passed to the function:

struct Task {
    TaskFunction Function;
    void *ArgData;
};
Task tasks[10];
for (uint i = 0; i < 10; ++i) {
    tasks[i] = {MyFunctionPointer, myFunctionArg};
}

You schedule a task for execution by calling TaskScheduler::AddTasks()

ftl::TaskCounter counter(taskScheduler);
taskScheduler->AddTasks(10, tasks, ftl::TaskPriority::High, &counter);

The tasks get added to the queue, and other threads (or the current one, when it is finished with the current task) can start executing them when they get popped off the queue.

AddTasks can optionally take a pointer to a TaskCounter. If you do, the value of the counter will incremented by the number of tasks queued. Every time a task finishes, the counter will be atomically decremented. You can use this functionality to create depencendies between tasks. You do that with the function

void TaskScheduler::WaitForCounter(TaskCounter *counter);

This is where fibers come into play. If the counter == 0, the function trivially returns. If not, the scheduler will move the current fiber into the Waiting Tasks list and grab a new fiber from the Fiber Pool. The new fiber pops a task from the Task Queue and starts execution with that.

But what about the task we stored in Waiting Tasks? When will it finish being executed?

When the TaskCounter hit zero from decrements, we add all the waiting fibers to the Ready Fibers list in the TaskScheduler. Before a fiber tries to pop a task off the Task Queue, it checks if there are any Ready Fibers. If so, it will return itself to the Fiber Pool and switch to the fiber that is ready. The ready fiber will continue execution right where it left off


Advanced Features

FullAtomicCounter

TaskCounters are implemented with an internal atomic counter. However, access to this atomic counter is protected from the user for performance and algorithmic simplicity reasons. That said, it can be useful to be able to use WaitForCounter on something non task-related. That’s where FullAtomicCounter comes in.

FullAtomicCounter has member functions correlaries for all the "regular" atomic functions (load, store, fetch_add, etc). Each time they’re called, we check all waiting fibers if they’re equal to their target value. In comparison, TaskCounter only checks when the final value is zero. Therefore, FullAtomicCounter has more overhead than TaskCounter, but much greater flexibility

Fibtex

Generally, you shouldn’t use Mutexes in fiber code, for two reasons:

  1. If you take a mutex, and call WaitForCounter(), when WaitForCounter resumes, your code could be on another thread. The mutex unlock will be undefined behavior, and probably lead to a deadlock

  2. Mutex contention will block the worker threads. And since we generally don’t oversubscribe the threads to the cores, this leaves cores idle.

To solve this, we created Fibtex. It implements the std lockable interface, so you can use it with all your favorite wrappers (std::lock_guard, std::unique_lock, etc.) It’s implemented behind the scenes with a TaskCounter, so if a Fibtex is locked, a waiter can switch to another task and do valuable work


Dependencies

  • C++11 Compiler

  • CMake 3.2 or greater


Supported Platforms

Arch

Windows

Linux

OS X

iOS

Android

arm

Needs testing

Tested OK

In theory

In theory

arm_64

Needs testing

Tested OK

In theory

In theory

x86

Tested OK

Needs testing

Needs testing

In theory

x86_64

Tested OK

Tested OK

Tested OK

In theory

ppc

In theory

ppc_64

In theory


Building

FiberTaskingLib is a standard CMake build. However, for detailed instructions on how to build and include the library in your own project, see the documentation page.


License

The library is licensed under the Apache 2.0 license. However, FiberTaskingLib distributes and uses code from other Open Source Projects that have their own licenses:


Contributing

Contributions are very welcome. See the contributing page for more details.


Request for Criticism

This implementation was something I created because I thought Christian’s presentation was really interesting and I wanted to explore it myself. The code is still a work in progress and I would love to hear your critiques of how I could make it better. I will continue to work on this project and improve it as best as possible.

Comments
  • Boost-Context based builds are broken

    Boost-Context based builds are broken

    Functional tests all seg-fault. The unit tests have managed to fix a number of other erros. However, this issue still remains.

    Issue #13 shows that the seg-faults go away if we execute the tests in a specific order. This could mean that the shutdown code is missing something

  • task affinity and the main task

    task affinity and the main task

    I was wondering on how you'd imaging a game engine to use this.

    You have the concept of a Main Task which seems to land on arbitrary threads, just like all other task. Can you explain your thoughts on that?

    I tried to init GLFW once inside that main task and that caused it to crash, even in moments when it was on the main thread. Now I probably don't want to do system calls inside tasks that run on fibers. However, from using the library it is not clear whether the context switching (register and stack) is robust enough.

    1. What CAN run inside these?
    2. can you outline a pseudo main game loop using this (with worst case, OpenGL in use).
  • EmptyQueueBehavior::Sleep on Mac OS X never wakes

    EmptyQueueBehavior::Sleep on Mac OS X never wakes

    Since I'm running a game main loop which creates tasks that update game engine systems, if the loop is constrained by vsync then there will be gaps where no tasks exist. Because of this I wanted to define the empty queue behavior to either sleep the threads OTHER than the thread processing the main loop until the main loop creates more tasks or yield to the os. Using the new_empty_queue_behavior branch.

    Using Mac OS X, EmptyQueueBehavior::Yield doesn't seem to effect thread usage at all, all threads remain at 100% except the one running the main loop which sleeps a bit after swapping the back buffer if vsync is enabled. EmptyQueueBehavior::Sleep seems to halt the program entirely. It doesn't even make it through one full iteration of the main loop. I suspect the condition variable is never waking the threads back up and when the main task does taskScheduler->WaitForCounter(...); it just halts forever and never returns.

  • Segmentation fault in test for Calculate Triangle Number on Mac OS X (Sierra)

    Segmentation fault in test for Calculate Triangle Number on Mac OS X (Sierra)

    I can prevent (or hide) the segmentation fault by moving the code that deletes the task array in TriangleNumberMainTask() so that the delete occurs after the WaitForCounter() call.

    This does not make sense to me as it seems we should be able to delete the array immediately after calling AddTasks().

    This occurs with commit [dd05035] using make (not Xcode). I will test on Windows 10 momentarily.

    EDIT: This does not occur for me using VC2015 (amd_x64) in Release and Debug builds. Just an OS X issue?

  • Add ftl::ThreadLocal to emulate thread_local keyword

    Add ftl::ThreadLocal to emulate thread_local keyword

    As title explains, this provides a nice little wrapper to allow for thread local storage. Acts like the keyword so is non-movable and non-copyable.

    Usage:

    static ftl::ThreadLocal<std::vector<int>> local_int(taskScheduler);
    // Finds the true variable every time
    *local_int = {2};
    assert(local_int->size() == 1);
    
    // Handle finds the true variable once
    auto handle = local_int.GetHandle();
    *handle = {3};
    assert(handle->size() == 1)
    
    taskScheduler->WaitForCounter(...)
    // Don't do this, handle will not update thread after wait. Thread unsafe and might update the wrong one.
    handle->emplace_back(3);
    
    // This is fine
    local_int->emplace_back(4);
    
  • Crash on Thread Join on Ubuntu 16.04 and 18.04

    Crash on Thread Join on Ubuntu 16.04 and 18.04

    Alright, so this is a bit of a heisenbug. I put the test code in this repo so that you can test it (should compile on all desktop platforms fine).

    On Ubuntu 16.04 and 18.04 on both gcc-5 and clang-6.0 there is a crash somewhere within jump_fcontext. The asm says the segfault is on the first line. pthread_join is always on the callstack on the main thread. This manifests as a segmentation fault if you are running it under gdb, but just causes the program to spin indefinitely if run outside of the debugger.

    There's unfortunately not much more information I can provide. You can run the preprocessor on it to remove all the preprocessor magic, and if it helpful I can provide a cpp file with the preprocessesing of the crazy macros already done. However, I highly doubt it's a macro problem, as it is just doing what you would normally do without macros.

    If you need any more information, let me know. Thanks for this lovely library

  • Heavy synchronization activity

    Heavy synchronization activity

    Hi there,

    I've been trying different approaches to speed up my nasty N^2 algorithm in the core of my engine/server. Without any threading: ~44ms, with OpenMP: ~14.5ms, IntelTBB: ~8ms, FiberTaskingLib: ~14ms.

    I did a bit of profiling of this with VTune, and found that a significant portion of time was spent in synchronization.

    image

    image

    What are your thoughts on this?

  • Tests fail thread sanitizer.

    Tests fail thread sanitizer.

    Tried running the tests using the thread sanitizer and they failed in the calc triangles test.

    However I don't know if these are false positives, or a genuine race condition.

    1. edited root CMakeLists.txt set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -fsanitize=thread -fno-omit-frame-pointer")

    2. Edited TaskScheduler::Initialize to limit to two threads. m_numThreads = 2;//RAM: FTLGetNumHardwareThreads();

    3. created a 64 bit clang debug build (clang 3.6.2) CC=clang CXX=clang++ cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug"

    4. ran the test test/FiberTaskingLib-test

    Find shown below the output from the thread sanitizer's first two conflicts:

    WARNING: ThreadSanitizer: data race (pid=17941)
      Read of size 8 at 0x7d6c0001f640 by thread T2:
        #0 std::_Hashtable<int, std::pair<int const, FiberTaskingLib::Fiber*>, std::allocator<std::pair<int const, FiberTaskingLib::Fiber*> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_bucket_index(int const&, unsigned long) const /usr/include/c++/5.2.0/bits/hashtable.h:621 (FiberTaskingLib-test+0x0000004f25ec)
        #1 std::__detail::_Map_base<int, std::pair<int const, FiberTaskingLib::Fiber*>, std::allocator<std::pair<int const, FiberTaskingLib::Fiber*> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](int&&) /usr/include/c++/5.2.0/bits/hashtable_policy.h:617 (FiberTaskingLib-test+0x0000004efd8a)
        #2 std::unordered_map<int, FiberTaskingLib::Fiber*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, FiberTaskingLib::Fiber*> > >::operator[](int&&) /usr/include/c++/5.2.0/bits/unordered_map.h:672 (FiberTaskingLib-test+0x0000004edfd2)
        #3 FiberTaskingLib::TaskScheduler::ThreadStart(void*) ../source/fiber_tasking_lib/task_scheduler.cpp:38 (FiberTaskingLib-test+0x0000004eab98)
    
      Previous write of size 8 at 0x7d6c0001f640 by thread T1:
        [failed to restore the stack]
    
      Location is heap block of size 1688 at 0x7d6c0001f100 allocated by main thread:
        #0 operator new(unsigned long) /build/gcc-multilib/src/gcc-5.2.0/libsanitizer/tsan/tsan_interceptors.cc:571 (libtsan.so.0+0x000000025ef3)
        #1 FunctionalTests_CalcTriangleNum_Test::TestBody() ../test/calc_triangle_num.cpp:51 (FiberTaskingLib-test+0x00000048d14a)
        #2 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004cb4fa)
        #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c2ad0)
        #4 testing::Test::Run() ../libs/gtest/src/gtest.cc:2437 (FiberTaskingLib-test+0x0000004a0f19)
        #5 testing::TestInfo::Run() ../libs/gtest/src/gtest.cc:2612 (FiberTaskingLib-test+0x0000004a18fd)
        #6 testing::TestCase::Run() ../libs/gtest/src/gtest.cc:2730 (FiberTaskingLib-test+0x0000004a2212)
        #7 testing::internal::UnitTestImpl::RunAllTests() ../libs/gtest/src/gtest.cc:4604 (FiberTaskingLib-test+0x0000004aa5b1)
        #8 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004ccc71)
        #9 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c401e)
        #10 testing::UnitTest::Run() ../libs/gtest/src/gtest.cc:4222 (FiberTaskingLib-test+0x0000004a8c56)
        #11 RUN_ALL_TESTS() ../libs/gtest/include/gtest/gtest.h:2326 (FiberTaskingLib-test+0x00000048e752)
        #12 main ../test/main.cpp:16 (FiberTaskingLib-test+0x00000048e671)
    
      Thread T2 (tid=17944, running) created by main thread at:
        #0 pthread_create /build/gcc-multilib/src/gcc-5.2.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000027a37)
        #1 FiberTaskingLib::FTLCreateThread(unsigned long*, unsigned int, void* (*)(void*), void*, unsigned long) ../source/fiber_tasking_lib/thread_abstraction.h:204 (FiberTaskingLib-test+0x0000004ec3a4)
        #2 FiberTaskingLib::TaskScheduler::Initialize(unsigned int, FiberTaskingLib::GlobalArgs*) ../source/fiber_tasking_lib/task_scheduler.cpp:186 (FiberTaskingLib-test+0x0000004eb841)
        #3 FunctionalTests_CalcTriangleNum_Test::TestBody() ../test/calc_triangle_num.cpp:52 (FiberTaskingLib-test+0x00000048d16e)
        #4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004cb4fa)
        #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c2ad0)
        #6 testing::Test::Run() ../libs/gtest/src/gtest.cc:2437 (FiberTaskingLib-test+0x0000004a0f19)
        #7 testing::TestInfo::Run() ../libs/gtest/src/gtest.cc:2612 (FiberTaskingLib-test+0x0000004a18fd)
        #8 testing::TestCase::Run() ../libs/gtest/src/gtest.cc:2730 (FiberTaskingLib-test+0x0000004a2212)
        #9 testing::internal::UnitTestImpl::RunAllTests() ../libs/gtest/src/gtest.cc:4604 (FiberTaskingLib-test+0x0000004aa5b1)
        #10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004ccc71)
        #11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c401e)
        #12 testing::UnitTest::Run() ../libs/gtest/src/gtest.cc:4222 (FiberTaskingLib-test+0x0000004a8c56)
        #13 RUN_ALL_TESTS() ../libs/gtest/include/gtest/gtest.h:2326 (FiberTaskingLib-test+0x00000048e752)
        #14 main ../test/main.cpp:16 (FiberTaskingLib-test+0x00000048e671)
    
      Thread T1 (tid=17943, running) created by main thread at:
        #0 pthread_create /build/gcc-multilib/src/gcc-5.2.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000027a37)
        #1 FiberTaskingLib::FTLCreateThread(unsigned long*, unsigned int, void* (*)(void*), void*, unsigned long) ../source/fiber_tasking_lib/thread_abstraction.h:204 (FiberTaskingLib-test+0x0000004ec3a4)
        #2 FiberTaskingLib::TaskScheduler::Initialize(unsigned int, FiberTaskingLib::GlobalArgs*) ../source/fiber_tasking_lib/task_scheduler.cpp:186 (FiberTaskingLib-test+0x0000004eb841)
        #3 FunctionalTests_CalcTriangleNum_Test::TestBody() ../test/calc_triangle_num.cpp:52 (FiberTaskingLib-test+0x00000048d16e)
        #4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004cb4fa)
        #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c2ad0)
        #6 testing::Test::Run() ../libs/gtest/src/gtest.cc:2437 (FiberTaskingLib-test+0x0000004a0f19)
        #7 testing::TestInfo::Run() ../libs/gtest/src/gtest.cc:2612 (FiberTaskingLib-test+0x0000004a18fd)
        #8 testing::TestCase::Run() ../libs/gtest/src/gtest.cc:2730 (FiberTaskingLib-test+0x0000004a2212)
        #9 testing::internal::UnitTestImpl::RunAllTests() ../libs/gtest/src/gtest.cc:4604 (FiberTaskingLib-test+0x0000004aa5b1)
        #10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004ccc71)
        #11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c401e)
        #12 testing::UnitTest::Run() ../libs/gtest/src/gtest.cc:4222 (FiberTaskingLib-test+0x0000004a8c56)
        #13 RUN_ALL_TESTS() ../libs/gtest/include/gtest/gtest.h:2326 (FiberTaskingLib-test+0x00000048e752)
        #14 main ../test/main.cpp:16 (FiberTaskingLib-test+0x00000048e671)
    
    SUMMARY: ThreadSanitizer: data race /usr/include/c++/5.2.0/bits/hashtable.h:621 std::_Hashtable<int, std::pair<int const, FiberTaskingLib::Fiber*>, std::allocator<std::pair<int const, FiberTaskingLib::Fiber*> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_bucket_index(int const&, unsigned long) const
    ==================
    ==================
    WARNING: ThreadSanitizer: data race (pid=17941)
      Read of size 8 at 0x7d6c0001f638 by thread T3:
        #0 std::_Hashtable<int, std::pair<int const, FiberTaskingLib::Fiber*>, std::allocator<std::pair<int const, FiberTaskingLib::Fiber*> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node(unsigned long, int const&, unsigned long) const <null> (FiberTaskingLib-test+0x0000004f5a72)
        #1 std::_Hashtable<int, std::pair<int const, FiberTaskingLib::Fiber*>, std::allocator<std::pair<int const, FiberTaskingLib::Fiber*> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_node(unsigned long, int const&, unsigned long) const /usr/include/c++/5.2.0/bits/hashtable.h:632 (FiberTaskingLib-test+0x0000004f266c)
        #2 std::__detail::_Map_base<int, std::pair<int const, FiberTaskingLib::Fiber*>, std::allocator<std::pair<int const, FiberTaskingLib::Fiber*> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](int&&) /usr/include/c++/5.2.0/bits/hashtable_policy.h:618 (FiberTaskingLib-test+0x0000004efda6)
        #3 std::unordered_map<int, FiberTaskingLib::Fiber*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, FiberTaskingLib::Fiber*> > >::operator[](int&&) /usr/include/c++/5.2.0/bits/unordered_map.h:672 (FiberTaskingLib-test+0x0000004edfd2)
        #4 FiberTaskingLib::TaskScheduler::ThreadStart(void*) ../source/fiber_tasking_lib/task_scheduler.cpp:38 (FiberTaskingLib-test+0x0000004eab98)
    
      Previous write of size 8 at 0x7d6c0001f638 by thread T1:
        [failed to restore the stack]
    
      Location is heap block of size 1688 at 0x7d6c0001f100 allocated by main thread:
        #0 operator new(unsigned long) /build/gcc-multilib/src/gcc-5.2.0/libsanitizer/tsan/tsan_interceptors.cc:571 (libtsan.so.0+0x000000025ef3)
        #1 FunctionalTests_CalcTriangleNum_Test::TestBody() ../test/calc_triangle_num.cpp:51 (FiberTaskingLib-test+0x00000048d14a)
        #2 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004cb4fa)
        #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c2ad0)
        #4 testing::Test::Run() ../libs/gtest/src/gtest.cc:2437 (FiberTaskingLib-test+0x0000004a0f19)
        #5 testing::TestInfo::Run() ../libs/gtest/src/gtest.cc:2612 (FiberTaskingLib-test+0x0000004a18fd)
        #6 testing::TestCase::Run() ../libs/gtest/src/gtest.cc:2730 (FiberTaskingLib-test+0x0000004a2212)
        #7 testing::internal::UnitTestImpl::RunAllTests() ../libs/gtest/src/gtest.cc:4604 (FiberTaskingLib-test+0x0000004aa5b1)
        #8 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004ccc71)
        #9 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c401e)
        #10 testing::UnitTest::Run() ../libs/gtest/src/gtest.cc:4222 (FiberTaskingLib-test+0x0000004a8c56)
        #11 RUN_ALL_TESTS() ../libs/gtest/include/gtest/gtest.h:2326 (FiberTaskingLib-test+0x00000048e752)
        #12 main ../test/main.cpp:16 (FiberTaskingLib-test+0x00000048e671)
    
      Thread T3 (tid=17945, running) created by main thread at:
        #0 pthread_create /build/gcc-multilib/src/gcc-5.2.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000027a37)
        #1 FiberTaskingLib::FTLCreateThread(unsigned long*, unsigned int, void* (*)(void*), void*, unsigned long) ../source/fiber_tasking_lib/thread_abstraction.h:204 (FiberTaskingLib-test+0x0000004ec3a4)
        #2 FiberTaskingLib::TaskScheduler::Initialize(unsigned int, FiberTaskingLib::GlobalArgs*) ../source/fiber_tasking_lib/task_scheduler.cpp:186 (FiberTaskingLib-test+0x0000004eb841)
        #3 FunctionalTests_CalcTriangleNum_Test::TestBody() ../test/calc_triangle_num.cpp:52 (FiberTaskingLib-test+0x00000048d16e)
        #4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004cb4fa)
        #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c2ad0)
        #6 testing::Test::Run() ../libs/gtest/src/gtest.cc:2437 (FiberTaskingLib-test+0x0000004a0f19)
        #7 testing::TestInfo::Run() ../libs/gtest/src/gtest.cc:2612 (FiberTaskingLib-test+0x0000004a18fd)
        #8 testing::TestCase::Run() ../libs/gtest/src/gtest.cc:2730 (FiberTaskingLib-test+0x0000004a2212)
        #9 testing::internal::UnitTestImpl::RunAllTests() ../libs/gtest/src/gtest.cc:4604 (FiberTaskingLib-test+0x0000004aa5b1)
        #10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004ccc71)
        #11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c401e)
        #12 testing::UnitTest::Run() ../libs/gtest/src/gtest.cc:4222 (FiberTaskingLib-test+0x0000004a8c56)
        #13 RUN_ALL_TESTS() ../libs/gtest/include/gtest/gtest.h:2326 (FiberTaskingLib-test+0x00000048e752)
        #14 main ../test/main.cpp:16 (FiberTaskingLib-test+0x00000048e671)
    
      Thread T1 (tid=17943, running) created by main thread at:
        #0 pthread_create /build/gcc-multilib/src/gcc-5.2.0/libsanitizer/tsan/tsan_interceptors.cc:895 (libtsan.so.0+0x000000027a37)
        #1 FiberTaskingLib::FTLCreateThread(unsigned long*, unsigned int, void* (*)(void*), void*, unsigned long) ../source/fiber_tasking_lib/thread_abstraction.h:204 (FiberTaskingLib-test+0x0000004ec3a4)
        #2 FiberTaskingLib::TaskScheduler::Initialize(unsigned int, FiberTaskingLib::GlobalArgs*) ../source/fiber_tasking_lib/task_scheduler.cpp:186 (FiberTaskingLib-test+0x0000004eb841)
        #3 FunctionalTests_CalcTriangleNum_Test::TestBody() ../test/calc_triangle_num.cpp:52 (FiberTaskingLib-test+0x00000048d16e)
        #4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004cb4fa)
        #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c2ad0)
        #6 testing::Test::Run() ../libs/gtest/src/gtest.cc:2437 (FiberTaskingLib-test+0x0000004a0f19)
        #7 testing::TestInfo::Run() ../libs/gtest/src/gtest.cc:2612 (FiberTaskingLib-test+0x0000004a18fd)
        #8 testing::TestCase::Run() ../libs/gtest/src/gtest.cc:2730 (FiberTaskingLib-test+0x0000004a2212)
        #9 testing::internal::UnitTestImpl::RunAllTests() ../libs/gtest/src/gtest.cc:4604 (FiberTaskingLib-test+0x0000004aa5b1)
        #10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004ccc71)
        #11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) <null> (FiberTaskingLib-test+0x0000004c401e)
        #12 testing::UnitTest::Run() ../libs/gtest/src/gtest.cc:4222 (FiberTaskingLib-test+0x0000004a8c56)
        #13 RUN_ALL_TESTS() ../libs/gtest/include/gtest/gtest.h:2326 (FiberTaskingLib-test+0x00000048e752)
        #14 main ../test/main.cpp:16 (FiberTaskingLib-test+0x00000048e671)
    
    SUMMARY: ThreadSanitizer: data race ??:0 std::_Hashtable<int, std::pair<int const, FiberTaskingLib::Fiber*>, std::allocator<std::pair<int const, FiberTaskingLib::Fiber*> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node(unsigned long, int const&, unsigned long) const
    ==================
    
  • Usage of the ftl scheduler

    Usage of the ftl scheduler

    Hi there,

    I think I am using the library in a wrong way. Because I wanted to make a simple loading screen where I want load a bunch of textures in the background. When I made new tasks it holds everything before continue with doing anything else. Do I need to set something anywhere in order to continue my code after scheduler?

  • Allow dynamically allocated size of fiber wait lists with ftl::AtomicCounter

    Allow dynamically allocated size of fiber wait lists with ftl::AtomicCounter

    This should solve the problems that we talked about in #66 about the amount of things that can wait on an atomic counter. It isn't perfect as it isn't dynamic, but that would require a lockfree list and that's a lot of work and overhead for what should be a simple job.

  • Add ability to use system boost

    Add ability to use system boost

    Hi!

    I tried to compile your library with the latest boost 1.63.0 but it does not work out of the box. Is it possible to provide an alternate way of including boost except that from third party dir?

    1. Boost 1.63.0 does not have fcontext.h
    2. Namespace boost_context is not the default namespace for boost. Should be boost::context.
  • Emscripten support

    Emscripten support

    Hi,

    Thank you for this awesome library. Is it possible to have emscripten support using its fiber library https://emscripten.org/docs/api_reference/fiber.h.html ?

  • Use an atomic bitfield for m_freeFibers instead of an array of atomic<bool>

    Use an atomic bitfield for m_freeFibers instead of an array of atomic

    The idea being that we can use less atomic memory fetches / memory thrashing.

    Another approach would be to still use the array of atomic, but have the threads start from a random index each time. To do this we would need to have a prng initialized and seeded in TLS. (PCG could work well / be fast and simple).

    Thus, GetNextFreeFiberIndex() would first query the prng in order to get the start index, and then it would search from startIndex and loop around to startIndex - 1. (the current behavior iterates from 0 to m_fiberPoolSize)

  • Dynamically changing EmptyQueueBehavior from Sleep to anything else can leave threads permanently sleeping

    Dynamically changing EmptyQueueBehavior from Sleep to anything else can leave threads permanently sleeping

    In multiple places in the code, we switch based on the empty queue behavior and do different logic for each.

    Most prominantly, when FiberStart function fails to find a task or fiber to run, it will either loop, yield, or sleep. If behavior is set to Sleep, in AddTask(s) and when waiting tasks are ready, we then notify sleeping threads to wake.

    const EmptyQueueBehavior behavior = m_emptyQueueBehavior.load(std::memory_order_relaxed);
    if (behavior == EmptyQueueBehavior::Sleep) {
        // Find a thread that is sleeping and wake it
        for (size_t i = 0; i < m_numThreads; ++i) {
            std::unique_lock<std::mutex> lock(m_tls[i].FailedQueuePopLock);
            if (m_tls[i].FailedQueuePopAttempts >= kFailedPopAttemptsHeuristic) {
                m_tls[i].FailedQueuePopAttempts = 0;
                m_tls[i].FailedQueuePopCV.notify_one();
    
                break;
            }
        }
    }
    

    However, if the user changes the behavior mid run, we can end up with a thread permanently sleeping:

    1. Worker thread A goes to sleep since there are no tasks
    2. User changes behavior from Sleep to Yield
    3. User code calls AddTasks()
    4. AddTasks() doesn't wake thread A because behavior is set to Yield now.
  • Update the Copyrights to include all contributors

    Update the Copyrights to include all contributors

    Something to the effect of: Copyright 2020 by the FiberTaskingLib Contributors

    Similar for all the file headers.

    Then create a Contributors.txt file that lists all the contributors.

experimental cooperative threading library for gba in pure C
experimental cooperative threading library for gba in pure C

gba-co-thread Experimental cooperative threading library for Gameboy Advance in pure C. See co_thread.h and co_thread.c for the tiny threading library

Jul 24, 2022
oneAPI Threading Building Blocks (oneTBB)

oneAPI Threading Building Blocks oneAPI Threading Building Blocks (oneTBB) lets you easily write parallel C++ programs that take full advantage of mul

Aug 10, 2022
Tbb - oneAPI Threading Building Blocks (oneTBB)

oneAPI Threading Building Blocks oneTBB is a flexible C++ library that simplifies the work of adding parallelism to complex applications, even if you

Aug 7, 2022
Powerful multi-threaded coroutine dispatcher and parallel execution engine

Quantum Library : A scalable C++ coroutine framework Quantum is a full-featured and powerful C++ framework build on top of the Boost coroutine library

Jul 25, 2022
SymQEMU: Compilation-based symbolic execution for binaries

SymQEMU This is SymQEMU, a binary-only symbolic executor based on QEMU and SymCC. It currently extends QEMU 4.1.1 and works with the most recent versi

Aug 14, 2022
C++-based high-performance parallel environment execution engine for general RL environments.
C++-based high-performance parallel environment execution engine for general RL environments.

EnvPool is a highly parallel reinforcement learning environment execution engine which significantly outperforms existing environment executors. With

Aug 17, 2022
C++14 coroutine-based task library for games

SquidTasks Squid::Tasks is a header-only C++14 coroutine-based task library for games. Full project and source code available at https://github.com/we

Aug 16, 2022
Arcana.cpp - Arcana.cpp is a collection of helpers and utility code for low overhead, cross platform C++ implementation of task-based asynchrony.

Arcana.cpp Arcana is a collection of general purpose C++ utilities with no code that is specific to a particular project or specialized technology are

Aug 10, 2022
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

moodycamel::ConcurrentQueue An industrial-strength lock-free queue for C++. Note: If all you need is a single-producer, single-consumer queue, I have

Aug 15, 2022
A bounded multi-producer multi-consumer concurrent queue written in C++11
A bounded multi-producer multi-consumer concurrent queue written in C++11

MPMCQueue.h A bounded multi-producer multi-consumer concurrent queue written in C++11. It's battle hardened and used daily in production: In the Frost

Aug 13, 2022
C++11 thread safe, multi-producer, multi-consumer blocking queue, stack & priority queue class

BlockingCollection BlockingCollection is a C++11 thread safe collection class that provides the following features: Modeled after .NET BlockingCollect

Aug 7, 2022
Kokkos C++ Performance Portability Programming EcoSystem: The Programming Model - Parallel Execution and Memory Abstraction

Kokkos: Core Libraries Kokkos Core implements a programming model in C++ for writing performance portable applications targeting all major HPC platfor

Aug 10, 2022
Exploration of x86-64 ISA using speculative execution.

Haruspex /həˈrʌspeks/ A religious official in ancient Rome who predicted the future or interpreted the meaning of events by examining the insides of b

Aug 8, 2022
A header-only C++ library for task concurrency
A header-only C++ library for task concurrency

transwarp Doxygen documentation transwarp is a header-only C++ library for task concurrency. It allows you to easily create a graph of tasks where eve

Aug 6, 2022
OOX: Out-of-Order Executor library. Yet another approach to efficient and scalable tasking API and task scheduling.

OOX Out-of-Order Executor library. Yet another approach to efficient and scalable tasking API and task scheduling. Try it Requirements: Install cmake,

Aug 9, 2022
Cpp-taskflow - Modern C++ Parallel Task Programming Library
Cpp-taskflow - Modern C++ Parallel Task Programming Library

Cpp-Taskflow A fast C++ header-only library to help you quickly write parallel programs with complex task dependencies Why Cpp-Taskflow? Cpp-Taskflow

Mar 30, 2021
A General-purpose Parallel and Heterogeneous Task Programming System
A General-purpose Parallel and Heterogeneous Task Programming System

Taskflow Taskflow helps you quickly write parallel and heterogeneous tasks programs in modern C++ Why Taskflow? Taskflow is faster, more expressive, a

Aug 16, 2022
A task scheduling framework designed for the needs of game developers.

Intel Games Task Scheduler (GTS) To the documentation. Introduction GTS is a C++ task scheduling framework for multi-processor platforms. It is design

Aug 14, 2022
A hybrid thread / fiber task scheduler written in C++ 11

Marl Marl is a hybrid thread / fiber task scheduler written in C++ 11. About Marl is a C++ 11 library that provides a fluent interface for running tas

Aug 4, 2022