Parallel-util - Simple header-only implementation of "parallel for" and "parallel map" for C++11

parallel-util

A single-header implementation of parallel_for, parallel_map, and parallel_exec using C++11.

This library uses CPU multi-threading (std::thread). By default, the concurrency is set to the hardware concurrency (std::thread::hardware_concurrency()).
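As a rough illustration of how such a default could be resolved, the following sketch picks a thread count. The helper name resolve_concurrency is hypothetical and not part of parallel-util; the convention that 0 means "use the default" is an assumption for this sketch. Note that std::thread::hardware_concurrency() is allowed to return 0 when the value cannot be determined, so a fallback is needed.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper (not part of parallel-util): choose how many threads to use.
// A requested value of 0 means "use the hardware concurrency".
inline int resolve_concurrency(int target_concurrency = 0)
{
    assert(target_concurrency >= 0);
    if (target_concurrency > 0) { return target_concurrency; }

    // hardware_concurrency() is only a hint and may return 0; fall back to one thread.
    const unsigned detected = std::thread::hardware_concurrency();
    return detected > 0 ? static_cast<int>(detected) : 1;
}
```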

Usage of parallel_for

Suppose that you have a callable that can be stored in an instance of std::function<void(int)>, for example, one defined by a C++11 lambda expression:

auto process = [](int i) { ... };

and want to parallelize the following for-loop procedure:

for (int i = 0; i < n; ++ i) { process(i); }

With parallel-util, this loop can be parallelized with a single call:

parallelutil::parallel_for(n, process);
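To give an idea of what such a call does, here is a minimal sketch of a parallel for-loop built directly on std::thread. It is not the code in parallel-util.hpp: the name sketch_parallel_for and the contiguous chunking scheme are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Illustrative sketch only (hypothetical name, not the library's implementation).
// Splits [0, n) into contiguous chunks and runs each chunk on its own thread.
inline void sketch_parallel_for(int n, const std::function<void(int)>& process)
{
    assert(n >= 0);
    if (n == 0) { return; }

    const unsigned detected = std::thread::hardware_concurrency();
    const int num_threads = std::min(n, static_cast<int>(detected > 0 ? detected : 1));
    const int chunk = (n + num_threads - 1) / num_threads; // ceiling division

    std::vector<std::thread> threads;
    for (int begin = 0; begin < n; begin += chunk)
    {
        const int end = std::min(n, begin + chunk);
        threads.emplace_back([begin, end, &process]() {
            for (int i = begin; i < end; ++i) { process(i); }
        });
    }

    // Wait for all workers before returning.
    for (auto& thread : threads) { thread.join(); }
}
```

As with any parallel loop, process must be safe to call concurrently with different indices, for example by writing only to the i-th element of a preallocated array.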

Usage of parallel_map

Suppose that you have a callable function that takes an instance of T1 as input and returns an instance of T2 as output, and thus can be stored by an instance of std::function<T2(T1)>. For example,

auto square = [](double x) { return x * x; };

In this case, T1 = T2 = double. Also suppose that you have an array of T1 and want to obtain an array of T2 by applying the function to each array element. For example, you have an array:

std::vector<double> input_array = { 0.2, 0.9, - 0.4, 0.5, 0.3 };

and want to compute their squares. With parallel-util, this can be parallelized by

auto output_array = parallelutil::parallel_map(input_array, square);

where output_array is an array: { 0.04, 0.81, 0.16, 0.25, 0.09 }.
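The mapping can be pictured with the following sketch, which applies a function to each element on several threads. It is an assumption-laden illustration, not the library's code: sketch_parallel_map is a hypothetical name, the output type is assumed to be default-constructible, and it should not be bool (std::vector<bool> does not allow concurrent writes to distinct elements).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Illustrative sketch only (hypothetical name, not the library's implementation).
template <typename T1, typename Callable>
auto sketch_parallel_map(const std::vector<T1>& input, Callable function)
    -> std::vector<typename std::decay<decltype(function(input.front()))>::type>
{
    using T2 = typename std::decay<decltype(function(input.front()))>::type;

    const std::size_t n = input.size();
    std::vector<T2> output(n); // preallocated so each thread writes to its own slots
    if (n == 0) { return output; }

    const std::size_t detected = std::thread::hardware_concurrency();
    const std::size_t num_threads = std::min(n, std::max<std::size_t>(detected, 1));
    const std::size_t chunk = (n + num_threads - 1) / num_threads; // ceiling division
    assert(chunk > 0); // guarantees that the loop below makes progress

    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += chunk)
    {
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back([begin, end, &input, &output, &function]() {
            for (std::size_t i = begin; i < end; ++i) { output[i] = function(input[i]); }
        });
    }
    for (auto& thread : threads) { thread.join(); }

    return output;
}
```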

If you are using the C++17 parallel algorithms (Parallel STL), std::transform called with an execution policy such as std::execution::par provides similar functionality.

Usage of parallel_exec

An arbitrary number of functions whose type is std::function<void()>, for example,

auto process_1 = [](){ ... };
auto process_2 = [](){ ... };
auto process_3 = [](){ ... };

can be executed in parallel by

parallelutil::parallel_exec({ process_1, process_2, process_3 });
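Conceptually, this amounts to starting one thread per function and waiting for all of them. The sketch below shows that idea under stated assumptions: sketch_parallel_exec is a hypothetical name, and the one-thread-per-function policy is chosen for simplicity rather than taken from parallel-util.hpp.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <thread>
#include <vector>

// Illustrative sketch only (hypothetical name, not the library's implementation).
// Runs each given function on its own thread and returns once all have finished.
inline void sketch_parallel_exec(std::initializer_list<std::function<void()>> functions)
{
    std::vector<std::thread> threads;
    threads.reserve(functions.size());

    for (const auto& function : functions)
    {
        assert(function); // calling an empty std::function would throw std::bad_function_call
        threads.emplace_back(function); // the thread holds its own copy of the callable
    }

    for (auto& thread : threads) { thread.join(); }
}
```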

Installation

parallel-util is a header-only, single-file library. It can be used by just copying parallel-util.hpp and pasting it into your project.

Alternatively, it can be installed using cmake. If your project is also managed using cmake, the ExternalProject module or the add_subdirectory command is useful for including parallel-util in your project.

If you want to install parallel-util on your system, use the typical cmake cycle:

git clone https://github.com/yuki-koyama/parallel-util.git
mkdir build
cd build
cmake ../parallel-util
make install

Dependencies

  • C++ Standard Library, including the thread support library (requires -pthread)

Pursuing Further Performance

If you need more performance, consider using a more sophisticated library such as Intel(R) Threading Building Blocks.

Projects using parallel-util

LICENSING

MIT License.

Comments
  • Unbalance inner loops

    The current algorithm that parallel_for uses to assign inner loops to threads is not well designed and can produce unbalanced assignments. For example, consider the following case:

    • 1050 loops
    • 100 threads

    Threads no. 1 to no. 99 are each assigned 10 inner loops (990 in total), but the last thread, no. 100, is assigned the remaining 60 inner loops. The last thread therefore becomes the bottleneck.

    This should be fixed somehow.
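One possible fix, sketched here as an assumption rather than as the change the library adopted, is to spread the remainder over the first threads so that chunk sizes differ by at most one. The function name balanced_ranges is hypothetical. For 1050 loops and 100 threads, the first 50 threads receive 11 loops each and the remaining 50 receive 10 each.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of parallel-util): split [0, n) into num_threads
// half-open ranges [begin, end) whose sizes differ by at most one.
inline std::vector<std::pair<int, int>> balanced_ranges(int n, int num_threads)
{
    assert(n >= 0 && num_threads > 0);

    const int base  = n / num_threads; // every thread gets at least this many loops
    const int extra = n % num_threads; // the first `extra` threads get one more

    std::vector<std::pair<int, int>> ranges;
    ranges.reserve(num_threads);

    int begin = 0;
    for (int t = 0; t < num_threads; ++t)
    {
        const int size = base + (t < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}
```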

  • Added a new task queue-based parallel_for -- which should be the default?

    Recently I added a new parallel_for function:

    template<typename Callable>
    void queue_based_parallel_for(int n, Callable function, int target_concurrency = 0);
    

    This function uses a task queue: each thread takes the next task from the queue whenever it finishes its current one.

    Compared to the original parallel_for, this function is likely to achieve better CPU occupancy, especially when the cost of each local process is computationally heterogeneous (i.e., some processes are light and others are heavy). However, it could be slower than the original parallel_for in some cases because of

    1. cache inefficiency (each thread works on fewer local processes) and
    2. mutex locking for the task queue.

    The question is, which approach should be the default parallel_for?
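For readers who want to experiment with the trade-off, the queue-based idea can be sketched as follows. This is not the code added to the library: the name sketch_queue_based_parallel_for is hypothetical, and the "queue" is reduced to a mutex-protected counter holding the next unclaimed index, which is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative sketch only (hypothetical name, not the library's implementation).
// Each worker repeatedly claims the next unprocessed index until none remain.
template <typename Callable>
void sketch_queue_based_parallel_for(int n, Callable function, int target_concurrency = 0)
{
    assert(n >= 0 && target_concurrency >= 0);

    const unsigned detected = std::thread::hardware_concurrency();
    const int num_threads = target_concurrency > 0
                                ? target_concurrency
                                : static_cast<int>(detected > 0 ? detected : 1);

    int        next_index = 0; // stands in for the task queue
    std::mutex queue_mutex;    // protects next_index

    auto worker = [&]() {
        while (true)
        {
            int i;
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                if (next_index >= n) { return; }
                i = next_index++;
            }
            function(i); // run the task outside the lock
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) { threads.emplace_back(worker); }
    for (auto& thread : threads) { thread.join(); }
}
```

Because every index is claimed under the lock, the locking cost is paid once per task, which matches the second drawback listed above. All workers share the same callable, so it must be safe to invoke concurrently.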
