EnkiTS - A permissively licensed C and C++ Task Scheduler for creating parallel programs. Requires C++11 support.

Support development of enkiTS through Github Sponsors or Patreon



enkiTS


enki Task Scheduler


The primary goal of enkiTS is to help developers create programs which handle both data and task level parallelism to utilize the full performance of multicore CPUs, whilst being lightweight (only a small amount of code) and easy to use.

enkiTS was developed for, and is used in, enkisoftware's Avoyd codebase.

Platforms

  • Windows, Linux, Mac OS, Android (should work on iOS)
  • x64 & x86, ARM

enkiTS is primarily developed on x64 and x86 Intel architectures on MS Windows, with well-tested support for Linux and somewhat less frequently tested support for Mac OS and ARM Android.

Examples

Several examples exist in the example folder.

For further examples, see https://github.com/dougbinks/enkiTSExamples

Building

Building enkiTS is simple: add the files in enkiTS/src to your build system (the _c.* files can be ignored if you only need the C++ interface), and add enkiTS/src to your include path. Unix / Linux builds will likely require the pthreads library.

For C++

  • Use #include "TaskScheduler.h"
  • Add enkiTS/src to your include path
  • Compile / Add to project:
    • TaskScheduler.cpp
  • Unix / Linux builds will likely require the pthreads library.

For C

  • Use #include "TaskScheduler_c.h"
  • Add enkiTS/src to your include path
  • Compile / Add to project:
    • TaskScheduler.cpp
    • TaskScheduler_c.cpp
  • Unix / Linux builds will likely require the pthreads library.

For cmake, on Windows / Mac OS X / Linux with cmake installed, open a prompt in the enkiTS directory and:

  1. mkdir build
  2. cd build
  3. cmake ..
  4. either run make all, or with Visual Studio open enkiTS.sln

Project Features

  1. Lightweight - enkiTS is designed to be lean so you can use it anywhere easily, and understand it.
  2. Fast, then scalable - enkiTS is designed for consumer devices first, so performance on a low number of threads is important, followed by scalability.
  3. Braided parallelism - enkiTS can issue tasks from another task as well as from the thread which created the Task System, and has a simple task interface for both data parallel and task parallelism.
  4. Up-front Allocation friendly - enkiTS is designed for zero allocations during scheduling.
  5. Can pin tasks to a given thread - enkiTS can schedule a task which will only be run on the specified thread.
  6. Can set task priorities - Up to 5 task priorities can be configured via the define ENKITS_TASK_PRIORITIES_NUM (defaults to 3). Higher priority tasks are run before lower priority ones.
  7. Can register external threads to use with enkiTS - enkiTS can be configured with numExternalTaskThreads, and these external threads can then be registered to use the enkiTS API.
  8. Custom allocator API - can configure enkiTS with custom allocators, see example/CustomAllocator.cpp and example/CustomAllocator_c.c.
  9. Dependencies - can set dependencies between tasks, see example/Dependencies.cpp and example/Dependencies_c.c.
  10. Completion Actions - can perform an action on task completion. This avoids the expensive action of adding the task to the scheduler, and can be used to safely delete a completed task. See example/CompletionAction.cpp and example/CompletionAction_c.c.
  11. NEW Can wait for pinned tasks - Can wait for pinned tasks, useful for creating IO threads which do no other work. See example/WaitForPinnedTasks.cpp and example/WaitForPinnedTasks_c.c.

Using enkiTS

C++ usage

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

// define a task set, can ignore range if we only do one thing
struct ParallelTaskSet : enki::ITaskSet {
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // do something here, can issue tasks with g_TS
    }
};

int main(int argc, const char * argv[]) {
    g_TS.Initialize();
    ParallelTaskSet task; // default constructor has a set size of 1
    g_TS.AddTaskSetToPipe( &task );

    // wait for task set (running tasks if they exist)
    // since we've just added it and it has no range we'll likely run it.
    g_TS.WaitforTask( &task );
    return 0;
}
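ExecuteRange() receives a [start, end) slice of the task set. The library-free sketch below illustrates that data-parallel model: it splits a set of size N into contiguous partitions and sums each partition on its own std::thread. It is a simplified sketch built on assumptions, not enkiTS's actual partitioning, which splits tasks dynamically and honours m_MinRange. Partition, SplitIntoPartitions and ParallelSumSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for a task set partition: a half-open range [start, end).
struct Partition {
    uint32_t start;
    uint32_t end;
};

// Split [0, setSize) into at most numPartitions contiguous chunks whose sizes
// differ by at most one. Illustration only, not enkiTS's splitting logic.
std::vector<Partition> SplitIntoPartitions( uint32_t setSize, uint32_t numPartitions ) {
    std::vector<Partition> parts;
    if( setSize == 0 || numPartitions == 0 ) { return parts; }
    if( numPartitions > setSize ) { numPartitions = setSize; }
    uint32_t base  = setSize / numPartitions;
    uint32_t extra = setSize % numPartitions; // the first 'extra' chunks get one more item
    uint32_t start = 0;
    for( uint32_t i = 0; i < numPartitions; ++i ) {
        uint32_t size = base + ( i < extra ? 1u : 0u );
        parts.push_back( { start, start + size } );
        start += size;
    }
    return parts;
}

// Sum data by running one std::thread per partition. Each thread writes only
// its own partial sum, so no locking is needed before the final reduction.
uint64_t ParallelSumSketch( const std::vector<uint32_t>& data, uint32_t numThreads ) {
    std::vector<Partition> parts = SplitIntoPartitions( (uint32_t)data.size(), numThreads );
    std::vector<uint64_t> partial( parts.size(), 0 );
    std::vector<std::thread> threads;
    for( size_t t = 0; t < parts.size(); ++t ) {
        threads.emplace_back( [&, t]() {
            for( uint32_t i = parts[t].start; i < parts[t].end; ++i ) { partial[t] += data[i]; }
        } );
    }
    for( auto& th : threads ) { th.join(); }
    return std::accumulate( partial.begin(), partial.end(), uint64_t( 0 ) );
}
```

A real scheduler reuses a fixed pool of threads rather than creating one per partition. The sketch creates threads only to keep the example self-contained.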

C++ 11 lambda usage

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

int main(int argc, const char * argv[]) {
   g_TS.Initialize();

   enki::TaskSet task( 1, []( enki::TaskSetPartition range_, uint32_t threadnum_  ) {
         // do something here
      }  );

   g_TS.AddTaskSetToPipe( &task );
   g_TS.WaitforTask( &task );
   return 0;
}

Task priorities usage in C++

// See full example in Priorities.cpp
#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

struct ExampleTask : enki::ITaskSet
{
    ExampleTask( uint32_t size_ ) { m_SetSize = size_; }

    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // See full example in Priorities.cpp
    }
};


// This example demonstrates how to run a long running task alongside tasks
// which must complete as early as possible using priorities.
int main(int argc, const char * argv[])
{
    g_TS.Initialize();

    ExampleTask lowPriorityTask( 10 );
    lowPriorityTask.m_Priority  = enki::TASK_PRIORITY_LOW;

    ExampleTask highPriorityTask( 1 );
    highPriorityTask.m_Priority = enki::TASK_PRIORITY_HIGH;

    g_TS.AddTaskSetToPipe( &lowPriorityTask );
    for( int task = 0; task < 10; ++task )
    {
        // run high priority tasks
        g_TS.AddTaskSetToPipe( &highPriorityTask );

        // wait for task but only run tasks of the same priority or higher on this thread
        g_TS.WaitforTask( &highPriorityTask, highPriorityTask.m_Priority );
    }
    // wait for low priority task, run any tasks on this thread whilst waiting
    g_TS.WaitforTask( &lowPriorityTask );

    return 0;
}
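The following single-threaded sketch makes the priority rules above concrete. It keeps one FIFO queue per priority level and always runs the highest-priority ready task. Passing a maximum priority mirrors the idea that WaitforTask( &task, priority ) only runs tasks of the same or higher priority. The type and enum names are hypothetical, and enkiTS's real task pipes are lock-free and shared between threads.

```cpp
#include <array>
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical priority levels. A lower value means a higher priority.
enum SketchPriority { PRIORITY_HIGH = 0, PRIORITY_MED = 1, PRIORITY_LOW = 2, PRIORITY_NUM = 3 };

// One FIFO queue per priority, single-threaded for clarity.
struct PriorityQueues {
    std::array<std::deque<std::function<void()>>, PRIORITY_NUM> queues;

    void Add( SketchPriority p, std::function<void()> task ) { queues[p].push_back( std::move( task ) ); }

    // Run one task, scanning from the highest priority down to maxPriority.
    // Returns false if no eligible task was available.
    bool TryRunOne( SketchPriority maxPriority = PRIORITY_LOW ) {
        for( int p = PRIORITY_HIGH; p <= maxPriority; ++p ) {
            if( !queues[p].empty() ) {
                std::function<void()> task = std::move( queues[p].front() );
                queues[p].pop_front();
                task();
                return true;
            }
        }
        return false;
    }
};
```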

Pinned Tasks usage in C++

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

// define a pinned task, which will only be run on its assigned thread
struct PinnedTask : enki::IPinnedTask {
    void Execute() override {
      // do something here, can issue tasks with g_TS
    }
};

int main(int argc, const char * argv[]) {
    g_TS.Initialize();
    PinnedTask task; //default constructor sets thread for pinned task to 0 (main thread)
    g_TS.AddPinnedTask( &task );

    // RunPinnedTasks must be called on main thread to run any pinned tasks for that thread.
    // Tasking threads automatically do this in their task loop.
    g_TS.RunPinnedTasks();

    // wait for the pinned task; it will normally have been run already by
    // RunPinnedTasks() above, so this should return immediately.
    g_TS.WaitforTask( &task );
    return 0;
}
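Pinned tasks can be pictured as one queue per thread that only that thread drains. The sketch below models this with a mutex-protected queue per thread index. PinnedQueues, AddPinned and RunPinned are hypothetical names chosen to echo AddPinnedTask() and RunPinnedTasks(); they do not describe enkiTS internals.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical per-thread pinned-task queues. A task added for thread N is only
// executed by a call to RunPinned( N ), which thread N makes itself.
class PinnedQueues {
public:
    explicit PinnedQueues( uint32_t numThreads ) : m_Queues( numThreads ) {}

    void AddPinned( uint32_t threadNum, std::function<void()> task ) {
        Queue& q = m_Queues[ threadNum ];
        std::lock_guard<std::mutex> lock( q.mutex );
        q.tasks.push_back( std::move( task ) );
    }

    // Drain and run everything pinned to threadNum. Returns the number of tasks run.
    uint32_t RunPinned( uint32_t threadNum ) {
        Queue& q = m_Queues[ threadNum ];
        std::deque<std::function<void()>> toRun;
        {
            std::lock_guard<std::mutex> lock( q.mutex );
            toRun.swap( q.tasks ); // take the tasks, then run them outside the lock
        }
        for( auto& task : toRun ) { task(); }
        return (uint32_t)toRun.size();
    }

private:
    struct Queue {
        std::mutex mutex;
        std::deque<std::function<void()>> tasks;
    };
    std::vector<Queue> m_Queues;
};
```

The tasks are moved out of the queue before they run. A pinned task can therefore add further pinned tasks without deadlocking on the queue's mutex.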

Dependency usage in C++

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

// define a task set, can ignore range if we only do one thing
struct TaskA : enki::ITaskSet {
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // do something here, can issue tasks with g_TS
    }
};

struct TaskB : enki::ITaskSet {
    enki::Dependency m_Dependency;
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // do something here, can issue tasks with g_TS
    }
};

int main(int argc, const char * argv[]) {
    g_TS.Initialize();
    
    // set dependencies once (can set more than one if needed).
    TaskA taskA;
    TaskB taskB;
    taskB.SetDependency( taskB.m_Dependency, &taskA );

    g_TS.AddTaskSetToPipe( &taskA ); // add first task
    g_TS.WaitforTask( &taskB );      // wait for last
    return 0;
}
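Only taskA is added to the pipe here, yet the code waits on taskB. It relies on the completion of taskA to start taskB. One common way to get this behaviour is a per-task counter of unfinished dependencies, sketched below single-threaded for clarity. SketchTask, AddDependency and RunTask are hypothetical names, and this is a generic technique rather than a description of enkiTS's code.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical dependency-graph node. It counts its unfinished dependencies
// and becomes runnable when that count reaches zero.
struct SketchTask {
    std::function<void()> work;
    std::vector<SketchTask*> dependents; // tasks waiting on this one
    std::atomic<int> pendingDeps{ 0 };
};

// Record that 'task' must wait for 'dependency' to complete.
void AddDependency( SketchTask& task, SketchTask& dependency ) {
    dependency.dependents.push_back( &task );
    task.pendingDeps.fetch_add( 1 );
}

// Run a task, then release any dependents whose last dependency just completed.
void RunTask( SketchTask& task ) {
    task.work();
    for( SketchTask* dep : task.dependents ) {
        if( dep->pendingDeps.fetch_sub( 1 ) == 1 ) { // this was the last dependency
            RunTask( *dep ); // a real scheduler would enqueue it instead of recursing
        }
    }
}
```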

External task thread usage in C++

#include "TaskScheduler.h"

#include <chrono>
#include <thread>

enki::TaskScheduler g_TS;
struct ParallelTaskSet : enki::ITaskSet
{
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // Do something
    }
};

void threadFunction()
{
    g_TS.RegisterExternalTaskThread();

    // sleep for a while instead of doing something such as file IO
    std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );

    ParallelTaskSet task;
    g_TS.AddTaskSetToPipe( &task );
    g_TS.WaitforTask( &task);

    g_TS.DeRegisterExternalTaskThread();
}

int main(int argc, const char * argv[])
{
    enki::TaskSchedulerConfig config;
    config.numExternalTaskThreads = 1; // we have one extra external thread

    g_TS.Initialize( config );

    std::thread exampleThread( threadFunction );

    exampleThread.join();

    return 0;
}

WaitForPinnedTasks thread usage in C++ (useful for IO threads)

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

struct RunPinnedTaskLoopTask : enki::IPinnedTask
{
    void Execute() override
    {
        while( g_TS.GetIsRunning() )
        {
            g_TS.WaitForNewPinnedTasks(); // this thread will 'sleep' until there are new pinned tasks
            g_TS.RunPinnedTasks();
        }
    }
};

struct PretendDoFileIO : enki::IPinnedTask
{
    void Execute() override
    {
        // Do file IO
    }
};

int main(int argc, const char * argv[])
{
    enki::TaskSchedulerConfig config;

    // In this example we create more threads than the hardware can run,
    // because the IO thread will spend most of its time idle or blocked
    // and therefore not scheduled for CPU time by the OS
    config.numTaskThreadsToCreate += 1;

    g_TS.Initialize( config );

    // in this example we place our IO thread at the end (the last task thread)
    RunPinnedTaskLoopTask runPinnedTaskLoopTasks;
    runPinnedTaskLoopTasks.threadNum = g_TS.GetNumTaskThreads() - 1;
    g_TS.AddPinnedTask( &runPinnedTaskLoopTasks );

    // Send the pretend file IO task to the IO thread
    PretendDoFileIO pretendDoFileIO;
    pretendDoFileIO.threadNum = runPinnedTaskLoopTasks.threadNum;
    g_TS.AddPinnedTask( &pretendDoFileIO );

    // ensure runPinnedTaskLoopTasks complete by explicitly calling shutdown
    g_TS.WaitforAllAndShutdown();

    return 0;
}
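The loop in RunPinnedTaskLoopTask lets the IO thread sleep instead of spinning when it has nothing to do. The sketch below shows the same idea with std::condition_variable. A dedicated thread blocks until a task is queued, runs it, and drains any remaining tasks on shutdown. IoThreadSketch is a hypothetical class, not how enkiTS implements WaitForNewPinnedTasks().

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical IO thread that sleeps until work arrives.
class IoThreadSketch {
public:
    IoThreadSketch() : m_Thread( [this] { Loop(); } ) {}
    ~IoThreadSketch() {
        {
            std::lock_guard<std::mutex> lock( m_Mutex );
            m_Running = false;
        }
        m_Cv.notify_one();
        m_Thread.join(); // Loop() drains queued tasks before returning
    }
    void Add( std::function<void()> task ) {
        {
            std::lock_guard<std::mutex> lock( m_Mutex );
            m_Tasks.push_back( std::move( task ) );
        }
        m_Cv.notify_one(); // wake the sleeping IO thread
    }
private:
    void Loop() {
        std::unique_lock<std::mutex> lock( m_Mutex );
        while( true ) {
            // Block (no spinning) until there is a task or shutdown is requested.
            // The predicate is checked under the lock, so a notify issued before
            // the wait starts is not lost, and spurious wakeups are ignored.
            m_Cv.wait( lock, [this] { return !m_Tasks.empty() || !m_Running; } );
            if( m_Tasks.empty() ) { return; } // shutting down with nothing left
            std::function<void()> task = std::move( m_Tasks.front() );
            m_Tasks.pop_front();
            lock.unlock();
            task(); // run outside the lock
            lock.lock();
        }
    }
    std::mutex m_Mutex;
    std::condition_variable m_Cv;
    std::deque<std::function<void()>> m_Tasks;
    bool m_Running = true;
    std::thread m_Thread; // declared last so the members above exist before it starts
};
```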

Bindings

Deprecated

The C++98 compatible branch has been deprecated as I'm not aware of anyone needing it.

The user thread versions are no longer maintained, as they are no longer in use. Similar functionality can be obtained with external task threads (see numExternalTaskThreads).

Projects using enkiTS

Avoyd

Avoyd is an abstract 6 degrees of freedom voxel game. enkiTS was developed for use in our in-house engine powering Avoyd.


Imogen

GPU/CPU Texture Generator


ToyPathTracer

Aras Pranckevičius' code for his series on Daily Path Tracer experiments with various languages.


License (zlib)

Copyright (c) 2013-2020 Doug Binks

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgement in the product documentation would be appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.
Owner
Doug Binks
Game dev, C++, multithreading, Runtime Compiled C++, voxels, graphics. Co-founder of enkisoftware with @juliettef. Occasionally available for consultancy.
Comments
  • question : support sleep-waiting ?

    Hi, it looks like all WaitFor* methods in enkiTS are busy-waiting. In some situations this will result in long spinning times. So what is the preferred/proper way to sleep-wait for tasks to finish in enkiTS?

  • Feature: custom allocators

    Whilst enkiTS only allocates at initialization time, a custom allocator would be useful to some users for tracking memory consumption.

    https://twitter.com/serhii_rieznik/status/1187011358220541952
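The sketch below illustrates the memory-tracking use case with a pair of allocation hooks that record current and peak usage through a user-data pointer. The function signatures are assumptions made for this sketch. The real hook signatures are in example/CustomAllocator.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical statistics block passed to the hooks as user data.
struct AllocStats {
    size_t currentBytes = 0;
    size_t peakBytes    = 0;
    size_t numAllocs    = 0;
};

// Hypothetical allocation hook. The counters are plain (non-atomic), which is only
// safe if allocations are not made from several threads at once.
void* TrackingAlloc( size_t size, void* userData ) {
    AllocStats* stats = static_cast<AllocStats*>( userData );
    void* ptr = std::malloc( size );
    if( ptr ) {
        stats->currentBytes += size;
        stats->numAllocs    += 1;
        if( stats->currentBytes > stats->peakBytes ) { stats->peakBytes = stats->currentBytes; }
    }
    return ptr;
}

// Hypothetical free hook. The caller passes the original size back, so no
// per-allocation header is needed to track usage.
void TrackingFree( void* ptr, size_t size, void* userData ) {
    if( !ptr ) { return; }
    AllocStats* stats = static_cast<AllocStats*>( userData );
    stats->currentBytes -= size;
    std::free( ptr );
}
```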

  • Pinned task problem

    I think there is a problem with TaskScheduler::WakeThreadsForNewTasks() and pinned tasks.

    Consider a possible case: what if the number of suspended threads (those waiting on m_pNewTaskSemaphore to be signalled) increases just before SemaphoreSignal() is called, i.e. the value of waiting was not accurate because some threads fell asleep between the check and the signal? In this case some task threads would idle. Most of the time it's not a big deal, as those threads would wake when the next task arrives. (Not sure if it's possible, but even if we were so unlucky that all the task threads fell asleep just after the check, the calling thread would handle the task itself.)

    Now, when using AddPinnedTask() there's a subtle chance that the thread we pinned the task to was suspended as described above:

    • [User thread] calls AddPinnedTask().
    • [Task thread] falls asleep just after the m_NumThreadsWaitingForNewTasks check.
    • The semaphore is either not released at all, or it wakes some threads but not the desired one.
    • [User thread] calls WaitforTask() and hangs as the thread the task is pinned to can't handle the request.

    This is the problem I ran into while trying to port my code to enkiTS. Though to be honest, I'm not quite sure if this is indeed the case and if my assumption is accurate. Parallel programming is hard.

  • Taskset dependencies

    May I ask if you are planning on adding events in the near future? If you are, I would really like to hear about how you plan on designing them. If not, I would be interested in adding them as a PR.

    My motivation is: I am designing a fully asynchronous image decompressor, so I need to connect different task sets with events, so that the completion of one task set causes a waiting set to enqueue itself.

    Interested to hear your plans for this feature.

  • Crash in Android 11 beta

    This is almost certainly an issue on Android's side, not yours, but I wanted to at least bring things to attention. In enki::DefaultAllocFunc(), non-Win32 programs use posix_memalign(). Our 64-bit Android app is segfaulting at launch, and it seems that the call to posix_memalign() is involved in whatever's going wrong. If we replace that with a call to plain malloc(), our app carries along running "fine". By "fine", I mean this isn't significantly tested or shipped beyond my local build going from segfault-at-launch to looking like all's well.

    Like I said, probably an Android 11 beta issue-- which I'm testing on a Pixel phone-- but at least wanted to make sure you were aware.

    Edit: correction/clarification. It's not the call to posix_memalign() itself that's segfaulting. When TaskScheduler::StartThreads() runs m_pTaskCompleteSemaphore = SemaphoreNew()-- the second call to SemaphoreNew()-- the resulting placement-new in TaskScheduler::New() is what's segfaulting. Even though posix_memalign() returned a "success" error code of 0, and the pointer to memory is non-null.

  • Valgrind errors on OSX High Sierra

    Hi Doug, I ran my image codec, grok, which uses enkiTS, on OSX with valgrind, and I see this error:

    ==3380== Process terminating with default action of signal 11 (SIGSEGV)
    ==3380==  Access not within mapped region at address 0x18
    ==3380==    at 0x100CC05BA: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
    ==3380==    by 0x100CC050C: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
    ==3380==    by 0x100CBFBF8: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
    ==3380==  If you believe this happened as a result of a stack
    ==3380==  overflow in your program's main thread (unlikely but
    ==3380==  possible), you can try to increase the size of the
    ==3380==  main thread stack using the --main-stacksize= flag.
    ==3380==  The main thread stack size used in this run was 8388608.
    

    This is with an earlier version of enkiTS, because with the latest version my program crashes with a BAD_ACCESS error.

    Have you run any valgrind tests on OSX ? Everything looks good on Linux.

    Thanks.

  • Occasional performance spikes in SetEvent

    Using Microprofile on Windows, I noticed that SetEvent can occasionally take longer than usual. enkiTS calls it to wake up the worker threads in AddTaskSetToPipe. SetEvent usually blocks for less than 1ms, but I've seen spikes up into the 20s of ms, which is a problem if the game waits on a task which is blocked by AddTaskSetToPipe. It ends up producing noticeable frame spikes every few seconds. By changing the Event Object to auto-reset in EventCreate (https://msdn.microsoft.com/en-us/library/windows/desktop/ms682655(v=vs.85).aspx) these spikes disappear. However, this will only wake up one thread at a time, and may decrease thread utilization. I could also avoid the issue by increasing the spin count, but this of course increases power consumption. This is probably not a big issue if the threads rarely wait, so it depends on the workload of the scheduler as well as the number of cores in use. Maybe auto-reset mode should be an option?
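The two event modes discussed above can be modelled portably. In the sketch below, a manual-reset event releases every waiter and stays signalled until Reset(). An auto-reset event releases one waiter and clears itself, which is why auto-reset wakes only one thread per set. EventSketch is a hypothetical class built on std::condition_variable, not the Win32 API.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical portable model of manual-reset vs auto-reset event semantics.
class EventSketch {
public:
    explicit EventSketch( bool autoReset ) : m_AutoReset( autoReset ) {}

    void Set() {
        {
            std::lock_guard<std::mutex> lock( m_Mutex );
            m_Signalled = true;
        }
        if( m_AutoReset ) { m_Cv.notify_one(); } // release at most one waiter
        else              { m_Cv.notify_all(); } // release every waiter
    }
    void Reset() {
        std::lock_guard<std::mutex> lock( m_Mutex );
        m_Signalled = false;
    }
    void Wait() {
        std::unique_lock<std::mutex> lock( m_Mutex );
        m_Cv.wait( lock, [this] { return m_Signalled; } );
        if( m_AutoReset ) { m_Signalled = false; } // consume the signal
    }
    // Non-blocking variant: returns true if the event was signalled.
    bool TryWait() {
        std::lock_guard<std::mutex> lock( m_Mutex );
        if( !m_Signalled ) { return false; }
        if( m_AutoReset ) { m_Signalled = false; }
        return true;
    }
private:
    std::mutex m_Mutex;
    std::condition_variable m_Cv;
    bool m_Signalled = false;
    const bool m_AutoReset;
};
```

The trade-off in the report follows from these semantics. Waking every waiter (manual reset) maximizes how many threads pick up new work but costs more per signal. Waking one (auto reset) is cheaper but can leave idle threads asleep.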

  • Feature suggestion: running tasks from non main/task threads

    Hi Doug,

    enkiTS does not allow running tasks or waiting for completion from threads other than main/task threads, as I understand it.

    For example, I would like to be able to use the system from rendering thread, which itself is not a task thread, but a full-fledged thread typically running in parallel with the main one. Or from background loading thread which is mostly idle waiting for IO, but uses tasks to decompress/finalize assets. It could be cool if enkiTS was able to support that. What do you think? Thanks!

    -- Aleksei

  • ThreadSanitizer reports

    I have been testing latest master (4f9941b). ThreadSanitizer, enabled under XCode 11.0, is reporting some data races when running unmodified samples.

    I am reporting the output of one Data race report for ParallelSum as an example.

    ==================
    WARNING: ThreadSanitizer: data race (pid=56086)
      Read of size 4 at 0x7ffeefbff49c by thread T4:
        #0 enki::TaskScheduler::TryRunTask(unsigned int, unsigned int, unsigned int&) TaskScheduler.cpp:412 (ParallelSum:x86_64+0x100007204)
        #1 enki::TaskScheduler::TryRunTask(unsigned int, unsigned int&) TaskScheduler.cpp:377 (ParallelSum:x86_64+0x1000050d0)
        #2 enki::TaskScheduler::TaskingThreadFunction(enki::ThreadArgs const&) TaskScheduler.cpp:236 (ParallelSum:x86_64+0x100004e04)
        #3 decltype(std::__1::forward<void (*)(enki::ThreadArgs const&)>(fp)(std::__1::forward<enki::ThreadArgs>(fp0))) std::__1::__invoke<void (*)(enki::ThreadArgs const&), enki::ThreadArgs>(void (*&&)(enki::ThreadArgs const&), enki::ThreadArgs&&) type_traits:4361 (ParallelSum:x86_64+0x10000d06d)
        #4 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(enki::ThreadArgs const&), enki::ThreadArgs, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(enki::ThreadArgs const&), enki::ThreadArgs>&, std::__1::__tuple_indices<2ul>) thread:342 (ParallelSum:x86_64+0x10000ceb1)
        #5 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(enki::ThreadArgs const&), enki::ThreadArgs> >(void*) thread:352 (ParallelSum:x86_64+0x10000bf09)
    
      Previous write of size 4 at 0x7ffeefbff49c by main thread:
        #0 enki::ITaskSet::ITaskSet() TaskScheduler.h:122 (ParallelSum:x86_64+0x100003288)
        #1 ParallelReductionSumTaskSet::ParallelReductionSumTaskSet(unsigned int) ParallelSum.cpp:81 (ParallelSum:x86_64+0x100003ba8)
        #2 ParallelReductionSumTaskSet::ParallelReductionSumTaskSet(unsigned int) ParallelSum.cpp:82 (ParallelSum:x86_64+0x100002e04)
        #3 main ParallelSum.cpp:146 (ParallelSum:x86_64+0x100002390)
    
      Location is stack of main thread.
    
      Thread T4 (tid=3398714, running) created by main thread at:
        #0 pthread_create <null>:2673040 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2aa2d)
        #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:328 (ParallelSum:x86_64+0x10000be4e)
        #2 std::__1::thread::thread<void (&)(enki::ThreadArgs const&), enki::ThreadArgs, void>(void (&&&)(enki::ThreadArgs const&), enki::ThreadArgs&&) thread:368 (ParallelSum:x86_64+0x10000ba71)
        #3 std::__1::thread::thread<void (&)(enki::ThreadArgs const&), enki::ThreadArgs, void>(void (&&&)(enki::ThreadArgs const&), enki::ThreadArgs&&) thread:360 (ParallelSum:x86_64+0x100006238)
        #4 enki::TaskScheduler::StartThreads() TaskScheduler.cpp:298 (ParallelSum:x86_64+0x100005901)
        #5 enki::TaskScheduler::Initialize(unsigned int) TaskScheduler.cpp:924 (ParallelSum:x86_64+0x10000a687)
        #6 main ParallelSum.cpp:136 (ParallelSum:x86_64+0x10000231a)
    
    SUMMARY: ThreadSanitizer: data race TaskScheduler.cpp:412 in enki::TaskScheduler::TryRunTask(unsigned int, unsigned int, unsigned int&)
    ==================
    ThreadSanitizer report breakpoint hit. Use 'thread info -s' to get extended information about the report.
    
    

    This is reporting that reading subTask.pTask->m_RangeToRun is a data race:

    
    bool TaskScheduler::TryRunTask( uint32_t threadNum_, uint32_t priority_, uint32_t& hintPipeToCheck_io_ )
    {
    // ...
     
            if( subTask.pTask->m_RangeToRun < partitionSize )
            {
                SubTaskSet taskToRun = SplitTask( subTask, subTask.pTask->m_RangeToRun );
           }
    
    
    

    When declaring ParallelSumTaskSet m_ParallelSumTaskSet; inside struct ParallelReductionSumTaskSet

    struct ParallelReductionSumTaskSet : ITaskSet
    {
        ParallelSumTaskSet m_ParallelSumTaskSet;
        uint64_t m_FinalSum;
    
        ParallelReductionSumTaskSet( uint32_t size_ ) : m_ParallelSumTaskSet( size_ ), m_FinalSum(0)
        {
                m_ParallelSumTaskSet.Init( g_TS.GetNumTaskThreads() );
        }
    
        virtual void    ExecuteRange( TaskSetPartition range_, uint32_t threadnum_ )
        {
            g_TS.AddTaskSetToPipe( &m_ParallelSumTaskSet );
            g_TS.WaitforTask( &m_ParallelSumTaskSet );
    
            for( uint32_t i = 0; i < m_ParallelSumTaskSet.m_NumPartialSums; ++i )
            {
                m_FinalSum += m_ParallelSumTaskSet.m_pPartialSums[i].count;
            }
        }
    };
    
    

    will initialize ParallelSumTaskSet::m_RangeToRun in the constructor:

        class ITaskSet : public ICompletable
        {
        public:
            ITaskSet()
                : m_SetSize(1)
                , m_MinRange(1)
                , m_RangeToRun(1)
            {}
    };
    

    I am not an expert in the field, but it looks like a potential false positive, because TryRunTask is executed only after AddTaskSetToPipe.

    I try to keep our software clean of all sanitizer reports, so that I can catch real bugs ;) For this reason, if this or other reports look safe, I suggest adding annotations that disable TSAN where appropriate (using no_sanitize("thread")).
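The sketch below shows the annotation approach suggested above. A macro expands to __attribute__((no_sanitize("thread"))) where the compiler reports support for it and to nothing elsewhere. It is applied to a function that performs the flagged read. SKETCH_NO_TSAN, RangeHolder and ReadRangeUnchecked are hypothetical names.

```cpp
#include <cassert>

// Hypothetical macro that opts a single function out of ThreadSanitizer
// instrumentation on compilers that support the no_sanitize attribute.
#if defined(__has_attribute)
  #if __has_attribute(no_sanitize)
    #define SKETCH_NO_TSAN __attribute__((no_sanitize("thread")))
  #endif
#endif
#ifndef SKETCH_NO_TSAN
  #define SKETCH_NO_TSAN
#endif

// Hypothetical stand-in for a task's range field.
struct RangeHolder {
    unsigned rangeToRun = 1;
};

// A read that TSAN would report, but which is believed to be ordered by other
// synchronization (for example, the task being published only after construction).
SKETCH_NO_TSAN unsigned ReadRangeUnchecked( const RangeHolder& h ) {
    return h.rangeToRun;
}
```

Suppression hides the report but does not remove the race. An alternative is to make such a field a std::atomic and access it with std::memory_order_relaxed. That is not a data race under the C++ memory model and usually costs little on x86 and ARM.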

    Does it make sense for me to report all data races found by the TSAN output?

    Oh, and thanks for your excellent work on the library :)

  • Feature request: check return codes for semaphore system calls

    I noticed that the return codes for semaphore creation etc. aren't being checked for errors, so creation may fail without the caller being notified. What do you think is the best way of handling error conditions: throw an exception, or return false? Thanks!

  • Completion of error handling

  • Q: Manual partitioning

    enkiTS automatically partitions workload into ranges.

    What is a recommended way to manually partition workload and submit these tasks to the scheduler? So, I want to create a list of tasks with manually specified (start, end) range.

    There are two applications:

    • user defined task splitting
    • creating tasks over 2D/3D domain

    Any suggestions are greatly appreciated.
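The sketch below covers both applications, assuming ExecuteRange() hands out a contiguous [start, end) slice of the set. Set the size to the number of tiles (or user-defined pieces), then map each index back to its 2D tile or its stored range inside the task. ForEachTileInRange, ManualRange and ForEachManualRange are hypothetical helpers, not enkiTS API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical: treat the flat index range [start, end) as indices into a
// tilesX x tilesY grid, so a set size of tilesX * tilesY covers a 2D domain.
// Flat index i maps to tile (i % tilesX, i / tilesX).
void ForEachTileInRange( uint32_t start, uint32_t end, uint32_t tilesX,
                         const std::function<void( uint32_t tileX, uint32_t tileY )>& fn ) {
    for( uint32_t i = start; i < end; ++i ) {
        fn( i % tilesX, i / tilesX );
    }
}

// Hypothetical user-defined split: use ranges.size() as the set size, and each
// index handed out by the scheduler selects one user-chosen [start, end) piece.
struct ManualRange {
    uint32_t start;
    uint32_t end;
};

void ForEachManualRange( const std::vector<ManualRange>& ranges, uint32_t start, uint32_t end,
                         const std::function<void( ManualRange )>& fn ) {
    for( uint32_t i = start; i < end; ++i ) { fn( ranges[i] ); }
}
```

For a 3D domain, the same idea extends to (i % tilesX, (i / tilesX) % tilesY, i / (tilesX * tilesY)).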

  • Scheduling tasks with high priority after-the-fact

    This may be out of scope for the library, or potentially I've missed a way to do it. What I would like to be able to do is add a number of tasks with no particular priority and have them be scheduled in parallel in no particular order, but I would want the ability to immediately run one if I discover that it's particularly needed soon (while allowing others to still be scheduled as/when).

    The sketched idea is to have the main thread push tasks for a bunch of deferred operations as it's processing something, but then if the results of one of those deferred operations is needed it wants to either immediately run that task on the main thread or spinloop if it's already been scheduled. The tasks can come in any order so you could push 100 different tasks and then want the results from number 50 before continuing, so waiting for 1-49 would not be ideal as then the main thread can't continue processing in parallel with them.

    As far as I can tell unless I'm misunderstanding the code, WaitForTask does not prioritise the task you pass in to wait on but it looks at all tasks at equal priority. I don't see a way to elevate the priority of a task after it has been created either. I also wondered if the way to solve this would be to add a new task with a dependency on the one I want to run, but at least from a layperson's eye that doesn't seem like it would change the scheduling order.

    Is there a way to do this, and/or is it something you'd be interested in supporting? In principle this could be refactored such that there is no main thread needing to sync, and everything including its processing becomes task-based so this translates into dependency management, but that's further than I'd want to go.
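One generic technique for this gives each task an atomic claim flag. Scheduler workers and an impatient waiter both try to claim the task. The waiter runs it inline if it wins, or otherwise waits for the worker that did. ClaimableTask and RunOrWait are hypothetical names, and this is not an existing enkiTS feature.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical task that can be claimed exactly once by whichever thread gets to it first.
struct ClaimableTask {
    enum State : int { PENDING = 0, RUNNING = 1, DONE = 2 };
    std::function<void()> work;
    std::atomic<int> state{ PENDING };

    // Returns true if this call claimed and ran the task.
    bool TryRun() {
        int expected = PENDING;
        if( !state.compare_exchange_strong( expected, RUNNING ) ) { return false; }
        work();
        state.store( DONE, std::memory_order_release );
        return true;
    }
};

// Run the task inline if nobody has started it yet; otherwise wait for the thread
// that did. The waiter never has to run unrelated earlier tasks first.
void RunOrWait( ClaimableTask& task ) {
    if( task.TryRun() ) { return; }
    while( task.state.load( std::memory_order_acquire ) != ClaimableTask::DONE ) {
        std::this_thread::yield();
    }
}
```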

  • Running tasks via WaitForTask(NULL)

    I want to be able to say "wait up to N microseconds on the current thread for a task to be executable then run at most one task", and optionally repeat that afterwards with a smaller N, as a way of better using time than a usleep(N) on my main thread in between doing other high priority work. The docs for TaskScheduler::WaitForTask say:

    if called with 0 it will try to run tasks, and return if none available.

    And assuming 0 means NULL then that seems pretty much like what I want, however I'm not sure what guarantees there are on how much work this does. Will it run only one task or multiple? Will it keep going until no more are ready?

    If this contractually only runs one task at most then I think that would pretty much do what I want, though I would need to implement the timeout myself with an external spinloop. I'm not sure if that's much less optimal than if you could do a timeout internally. I also don't see a way to tell if this actually ran a task, to be able to run at most one within the timeframe. That would be nice but isn't necessary.
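The external spin loop described here might look like the sketch below. It assumes a hypothetical tryRunOne callable that runs at most one task and reports whether it did. Per the question, WaitforTask(NULL) does not return that information. RunTasksFor is likewise a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical: spend at most 'budget' running tasks one at a time, checking the
// clock between tasks. A running task is never interrupted, so the budget can be
// overrun by up to one task's duration. When nothing is ready this spins until
// the deadline, matching the "wait up to N microseconds" goal.
int RunTasksFor( std::chrono::microseconds budget, const std::function<bool()>& tryRunOne ) {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point deadline = Clock::now() + budget;
    int numRun = 0;
    while( Clock::now() < deadline ) {
        if( tryRunOne() ) { ++numRun; }
    }
    return numRun;
}
```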

    More context on what I'm trying to do, to avoid the XY problem:

    I'm sketching a design where I would have a main thread which adds N tasks, and then be solely responsible for picking up special work from the tasks that needs to go to and from the GPU via a single controlling thread, running it and reading back the results, and adding follow-up tasks to process the results. Normally this would just have a sleep while waiting for results as there's a fair amount of latency in going to the GPU and getting results back, so I want to be able to run tasks on the CPU in the meantime on the main thread without being late to pick up results from the GPU.

    My ideal design then would be for the main thread to have a loop whereby it checks to see if there's GPU work to process, and then if there's nothing to do it runs CPU tasks for a bit before checking again. The key is I would really only want it to run a bounded amount of CPU work before returning to check on the GPU again, to have guarantees on how frequently I will check for GPU work again.

    From what I can see, there are a few options:

    1. Keep all of the GPU work out of enkiTS entirely: have my outer loop look for GPU work, but instead of sleeping when there's nothing to do, call WaitforTask(NULL). Hence the question above.

    2. Rely on the OS scheduler instead of enkiTS's scheduler. Let the main thread do a loop and do a genuine OS sleep in between GPU work, relying on the thread being kicked off the hardware core and add one more task thread to be scheduled at the same time. I'm worried that this could impact the main thread though as now the hardware threads could be oversubscribed, and it depends on when the OS scheduler decides to schedule the main thread again.

    3. Set a high priority pinned task on the main thread, and each time it completes it queues a new pinned task on the main thread to be re-run again. The main thread just does WaitForAll or similar and relies on the enkiTS scheduler to sort it out. I think this might technically work but I have a feeling it's going to boil down to a spinloop, since the GPU task will keep being higher priority and I don't think there's anything to stop it getting scheduled again as soon as a new task is added. I don't see a way to represent that sleep/delay before being scheduled again (at least without some explicit dependency on a task in between which I don't have).

    4. Have the tasks submitting the GPU work add a new high priority pinned task whenever they add new work, but the challenge here is if the GPU isn't ready I don't want the pinned task to be blocking if there's more CPU work to do - so something needs to get it to check again and I'm not guaranteed any more tasks will submit new GPU work after that.

    I may also be missing an obviously better way to do this!

An easy-to-use multithreading thread pool library for C. It is a handy stream-like job scheduler with an automatic garbage collector for non I/O bound computation.

Jun 4, 2022
A hybrid thread / fiber task scheduler written in C++ 11

Marl Marl is a hybrid thread / fiber task scheduler written in C++ 11. About Marl is a C++ 11 library that provides a fluent interface for running tas

Jan 4, 2023
A library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies.

Fiber Tasking Lib This is a library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies. Dependenc

Dec 30, 2022
EnkiTSExamples - Examples for enkiTS

Support development of enkiTS through Github Sponsors or Patreon enkiTS Examples Submodules are licensed under their own licenses, see their contents

Sep 30, 2022
A General-purpose Parallel and Heterogeneous Task Programming System

Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++. Why Taskflow? Taskflow is faster, more expressive, a…

Dec 31, 2022
A General-purpose Parallel and Heterogeneous Task Programming System

Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++. Why Taskflow? Taskflow is faster, more expressive, an…

Dec 26, 2022
Cpp-taskflow - Modern C++ Parallel Task Programming Library

Cpp-Taskflow: A fast C++ header-only library to help you quickly write parallel programs with complex task dependencies. Why Cpp-Taskflow? Cpp-Taskflow …

Mar 30, 2021
Sqrt OS is a simulation of an OS scheduler and memory manager using different scheduling algorithms including Highest Priority First (non-preemptive), Shortest Remaining Time Next, and Round Robin

A CPU scheduler determines an order for the execution of its scheduled processes; it decides which process will run according to a certain data structure that keeps track of the processes in the system and their status.

Sep 7, 2022
OOX: Out-of-Order Executor library. Yet another approach to efficient and scalable tasking API and task scheduling.

OOX: Out-of-Order Executor library. Yet another approach to efficient and scalable tasking API and task scheduling. Try it. Requirements: Install cmake, …

Oct 25, 2022
afl/afl++ with a hierarchical seed scheduler

This is developed based on AFLplusplus (2.68c, Qemu mode), thanks to its amazing maintainers and community. Build and Run: Please follow the instruction…

Nov 25, 2022
Forkpool - A bleeding-edge, lock-free, wait-free, continuation-stealing scheduler for C++20

riften::Forkpool: A bleeding-edge, lock-free, wait-free, continuation-stealing scheduler for C++20. This project uses C++20's coroutines to implement c…

Dec 31, 2022
Bikeshed - Lock free hierarchical work scheduler

bikeshed: Lock free hierarchical work scheduler. Builds with MSVC, Clang and GCC, header only, C99 compliant, …

Dec 30, 2022
Scheduler - Modern C++ Scheduling Library

Scheduler: Modern C++ Header-Only Scheduling Library. Tasks run in a thread pool. Requires C++11 and ctpl_stl.h in the path. Inspired by the Rufus-Schedu…

Dec 21, 2022
Arcana.cpp - Arcana.cpp is a collection of helpers and utility code for low overhead, cross platform C++ implementation of task-based asynchrony.

Arcana.cpp: Arcana is a collection of general purpose C++ utilities with no code that is specific to a particular project or specialized technology are…

Nov 23, 2022
A task scheduling framework designed for the needs of game developers.

Intel Games Task Scheduler (GTS): GTS is a C++ task scheduling framework for multi-processor platforms. It is design…

Jan 3, 2023
A header-only C++ library for task concurrency

transwarp: transwarp is a header-only C++ library for task concurrency. It allows you to easily create a graph of tasks where eve…

Dec 19, 2022
Task System presented in "Better Code: Concurrency - Sean Parent"

task_system: task_system provides a task scheduler for modern C++. The scheduler manages an array of concurrent queues. A task, when scheduled, is enque…

Dec 7, 2022
Jobxx - Lightweight C++ task system

jobxx license: Copyright (c) 2017 Sean Middleditch. This is free and unencumbered software released into the public domain. A…

May 28, 2022
C++14 coroutine-based task library for games

SquidTasks: Squid::Tasks is a header-only C++14 coroutine-based task library for games. Full project and source code available at https://github.com/we …

Nov 30, 2022