120 Resources
C/C++ parallel-computing Libraries
PaRSEC: the Parallel Runtime Scheduler and Execution Controller for micro-tasks on distributed heterogeneous systems.
PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core heterogeneous architectures. PaRSEC assigns computation threads to the cores and GPU accelerators, overlaps communication with computation, and uses a dynamic, fully distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse.
Parallel bitonic sorter with limited enclave page cache (default 80MB).
Parallel Oblivious Sorter with SGX Parallel bitonic sorter with limited Intel SGX enclave page cache (default 80MB). Compile and Run Compile make clea
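For readers unfamiliar with the technique named above, here is a minimal sketch of a bitonic sorting network in C++. It is not taken from the listed repository and ignores SGX entirely; the function name `bitonic_sort` is hypothetical, and the sketch assumes the input length is a power of two. The point it illustrates is that the sequence of compared index pairs is fixed in advance and does not depend on the data, and that all comparisons within one `(k, j)` stage are independent, which is what makes the network easy to parallelize.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sorts v in ascending order with an iterative
// bitonic network. Assumption: v.size() is zero or a power of two.
inline void bitonic_sort(std::vector<int>& v) {
    const std::size_t n = v.size();
    // k is the size of the bitonic sequences being merged in this phase.
    for (std::size_t k = 2; k <= n; k <<= 1) {
        // j is the distance between the two elements of each compared pair.
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            // Every iteration of this loop touches a disjoint pair, so the
            // loop body could be distributed over threads for fixed (k, j).
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    // Blocks of size k alternate between ascending and descending.
                    const bool ascending = (i & k) == 0;
                    if ((v[i] > v[partner]) == ascending) {
                        std::swap(v[i], v[partner]);
                    }
                }
            }
        }
    }
}
```

Calling `bitonic_sort` on a vector of eight integers leaves it sorted in ascending order; inputs whose length is not a power of two would need padding first.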
An OpenGL 4.3 / C++ 11 rendering engine oriented towards animation
aer-engine About An OpenGL 4.3 / C++ 11 rendering engine oriented towards animation. Features: Custom animation model format, SKMA, with a Blender exp
Parallel implementation of Dijkstra's shortest path algorithm using MPI
FFTW is a free collection of fast C routines for computing the Discrete Fourier Transform in one or more dimensions
A collection of hash tables for parallel programming, including lock-free, wait-free tables.
Hatrack Hash tables for parallel programming This project consists of fast hash tables suitable for parallel programming, including multiple lock-fr
[WIP] Experimental C++14 multithreaded compile-time entity-component-system library.
ecst Experimental & work-in-progress C++14 multithreaded compile-time Entity-Component-System header-only library. Overview Successful development of
🚀 The fastest WebAssembly interpreter, and the most universal runtime
Wasm3 The fastest WebAssembly interpreter, and the most universal runtime. Based on CoreMark 1.0 and independent benchmarks. Your mileage may vary. Ge
A 32 KB, small-memory-footprint, single binary that runs a list of commands in parallel and waits for their termination
await A 32 KB, small-memory-footprint, single binary that runs a list of commands in parallel and waits for their termination documentation linux install cur
C and Python examples from my book on using PETSc to solve PDEs
p4pdes PETSc for Partial Differential Equations is a new book on using PETSc to solve partial differential equations by modern numerical methods. Orde
Thrust is a C++ parallel programming library which resembles the C++ Standard Library.
Thrust: Code at the speed of light Thrust is a C++ parallel programming library which resembles the C++ Standard Library. Thrust's high-level interfac
Insight Toolkit (ITK) is an open-source, cross-platform toolkit for N-dimensional scientific image processing, segmentation, and registration
ITK: The Insight Toolkit C++ Python Linux macOS Windows Linux (Code coverage) Links Homepage Download Discussion Software Guide Help Examples Issue tr
The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators
RaftLib is a C++ Library for enabling stream/data-flow parallel computation. Using simple right shift operators (just like the C++ streams that you wo
KRATOS Multiphysics ("Kratos") is a framework for building parallel, multi-disciplinary simulation software
KRATOS Multiphysics ("Kratos") is a framework for building parallel, multi-disciplinary simulation software, aiming at modularity, extensibility, and high performance. Kratos is written in C++ and has an extensive Python interface.
A coupling library for partitioned multi-physics simulations, including, but not restricted to fluid-structure interaction and conjugate heat transfer simulations.
PP-Speaker is a linux kernel alsa driver (parallel port audio, covox)
// SPDX-License-Identifier: GPL-2.0-or-later PP-Speaker driver for Linux Copyright (C) 2022-2022 ariel/KotCzarny Small FAQ: Q: What
Reliable PostgreSQL Backup & Restore
pgBackRest Reliable PostgreSQL Backup & Restore Introduction pgBackRest aims to be a reliable, easy-to-use backup and restore solution that can seamle
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
Mobile AI Compute Engine (or MACE for short) is a deep learning inference framework optimized for mobile heterogeneous computing on Android, iOS, Linux and Windows devices.
Rangeless - C++ LINQ-like library of higher-order functions for data manipulation
rangeless::fn range-free LINQ-like library of higher-order functions for manipulation of containers and lazy input-sequences. Documentation What it's
ParallelComputingPlayground - Shows different programming techniques for parallel computing on CPU and GPU
ParallelComputingPlayground Shows different programming techniques for parallel computing on CPU and GPU. Purpose The idea here is to compute a Mandel
Parallel-hashmap - A family of header-only, very fast and memory-friendly hashmap and btree containers.
The Parallel Hashmap Overview This repository aims to provide a set of excellent hash map implementations, as well as a btree alternative to std::map
Parallel-util - Simple header-only implementation of "parallel for" and "parallel map" for C++11
parallel-util A single-header implementation of parallel_for, parallel_map, and parallel_exec using C++11. This library is based on multi-threading on
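To make the idea of a thread-based "parallel for" concrete, here is a minimal sketch using only C++11 `std::thread`. It is not the listed library's API: the name `parallel_for_sketch` and its parameters are hypothetical, and the static chunking strategy is an assumption chosen for simplicity.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not the library's interface): calls body(i) for every
// i in [0, n), splitting the index range into contiguous chunks, one per thread.
inline void parallel_for_sketch(
    std::size_t n,
    const std::function<void(std::size_t)>& body,
    unsigned num_threads = std::thread::hardware_concurrency()) {
    if (n == 0) return;
    // hardware_concurrency() may return 0 when the value is unknown.
    if (num_threads == 0) num_threads = 1;
    const std::size_t workers = std::min<std::size_t>(num_threads, n);
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        threads.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    // Block until every chunk has finished.
    for (auto& t : threads) t.join();
}
```

The caller is responsible for making `body` safe to run concurrently, for example by having each index write to its own element of an output vector.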
EnkiTS - A permissively licensed C and C++ Task Scheduler for creating parallel programs. Requires C++11 support.
Support development of enkiTS through Github Sponsors or Patreon enkiTS Master branch Dev branch enki Task Scheduler A permissively licensed C and C++
Thrust - The C++ parallel algorithms library.
Thrust: Code at the speed of light Thrust is a C++ parallel programming library which resembles the C++ Standard Library. Thrust's high-level interfac
Cpp-taskflow - Modern C++ Parallel Task Programming Library
Cpp-Taskflow A fast C++ header-only library to help you quickly write parallel programs with complex task dependencies Why Cpp-Taskflow? Cpp-Taskflow
Partr - Parallel Tasks Runtime
Parallel Tasks Runtime A parallel task execution runtime that uses parallel depth-first (PDF) scheduling [1]. [1] Shimin Chen, Phillip B. Gibbons, Mic
Nelson - Nelson numerical interpreter
Nelson is an array programming language providing a powerful open computing environment for engineering and scientific applications using modern C/C++
SuanPan - 🧮 An Open Source, Parallel and Heterogeneous Finite Element Analysis Framework
suanPan Introduction 🧮 suanPan is a finite element method (FEM) simulation platform for applications in fields such as solid mechanics and civil/stru
Blitz++ is a C++ template class library which provides array objects for scientific computing
Adorad - Fast, Expressive, & High-Performance Programming Language for those who dare
The Adorad Language Adorad | Documentation | Contributing | Compiler design Key Features of Adorad Simplicity: the language can be learned in less tha
RXMesh - A GPU Mesh Data Structure - SIGGRAPH 2021
RXMesh About RXMesh is a surface triangle mesh data structure and programming model for processing static meshes on the GPU. RXMesh aims at providing a
The problem consists in determining all shortest paths between pairs of nodes in a given graph.
All-Pairs-Shortest-Path-Problem-Parallel-Computing The problem consists in determining all shortest paths between pairs of nodes in a given graph. Exe
Parallel algorithms (quick-sort, merge-sort, enumeration-sort) implemented with Pthreads and CUDA
How to run the program 1. Compile: enter sort-project (cuda-sort-project) and run make; the program is compiled automatically into the executable sort (cudaSort). 2. Run the executable: ./sort or ./cudaSort 3. Clean up: make clean 4. Specify threads
4eisa40 GPU computing : exploiting the GPU to execute advanced simulations
GPU-computing 4eisa40 GPU computing : exploiting the GPU to execute advanced simulations Activities Parallel programming Algorithms Image processing O
SMID, Parallel computing of CNN
Parallel Computing in Deep Reference Network 1. Introduction Deep neural networks are made up of a number of layers of linked nodes, each of which imp
BLAS-like Library Instantiation Software Framework
Contents Introduction Education and Learning What's New What People Are Saying About BLIS Key Features How to Download BLIS Getting Started Example Co
A program developed using MPI for distributed computation of histograms for large data, with performance analysis on multi-core systems
mpi-histo A program developed using MPI for distributed computation of histograms for large data, with performance analysis on multi-core systems. T
Lua HTTP async client using libcurl (supports multi requests in parallel)
lua-async-http The lua-async-http rock is a new Lua rock written in C and based on libcurl. It allows us to make multiple http/https (with client certific
Emergency alert and tracer for realtime high-performance computing app (work in progress, currently supported env is only Linux x86-64).
HPC Emerg Emergency alert and tracer for realtime high-performance computing app (work in progress, currently supported env is only Linux x86-64). Exa
Source code for the TKET quantum compiler, Python bindings and utilities
tket Introduction This repository contains the full source code for tket, a quantum SDK. If you just want to use tket via Python, the easiest way is t
Bruteforce BitCoin Private keys WIF, Minikeys, Passphrases...
Fialka M-125 This is a modified version LostCoins Huge thanks kanhavishva and to all developers whose codes were used in Fialka M-125. Quick start Сon
Parallel programming for everyone.
Tutorial | Examples | Forum Documentation | 简体中文文档 | Contributor Guidelines Overview Taichi (太极) is a parallel programming language for high-performan
This is the git repository for the FFTW library for computing Fourier transforms (version 3.x), maintained by the FFTW authors.
Matplot++: A C++ Graphics Library for Data Visualization 📊🗾
Matplot++ A C++ Graphics Library for Data Visualization Data visualization can help programmers and scientists identify trends in their data and effic
Easy-to-use Scientific Computing library in/for C++ available for Linux and Windows.
Matrix Table of Contents Installation Development 2.1. Linux 2.2. Windows Benchmarking Testing Quick Start Guide 5.1. Initializers 5.2. Slicing 5.3. P
FEMTIC is a 3-D magnetotelluric inversion code written in object-oriented C++.
FEMTIC FEMTIC is a 3-D magnetotelluric inversion code based on the following studies. FEMTIC was made using object-oriented programming with C++. FEMT
Standalone C++ implementation for computing Motif Adjacency Matrices of large directed networks, for 3-node graphlets and 4-node graphlets containing a 4-edge loop.
Building Motif Adjacency Matrices This is an efficient C++ software for building Motif Adjacency Matrices (MAM) of networks, for a range of motifs/gra
Experimental and Comparative Performance Measurements of High Performance Computing Based on OpenMP and MPI
High-Performance-Computing-Experiments Experimental and Comparative Performance Measurements of High Performance Computing Based on OpenMP and MPI 实验结
Parallel library for approximate inference on discrete Bayesian networks
baylib C++ library Baylib is a parallel inference library for discrete Bayesian networks supporting approximate inference algorithms both in CPU and G
Boki: Stateful Serverless Computing with Shared Logs [SOSP '21]
Boki Boki is a research FaaS runtime for stateful serverless computing with shared logs. Boki exports the shared log API to serverless functions, allo
Fetch FreeBSD ports with parallel connection support and connection pipelining.
Parfetch Fetch FreeBSD ports with parallel connection support and connection pipelining. 🔥 This is an experiment. Use at your own risk. This is a glu
Vulkan RDP plugin for standalone Mupen64Plus
mupen64plus-video-parallel Implementation of Themaister's Vulkan RDP emulator over OGL 3.3. Disclaimer Do not expect any support/help. Pull requests w
EDSL for PDE solver composing
HomePage | Document Overview OpFlow (运筹) is an embedded domain specific language (EDSL) for partial differential equation (PDE) solver composing. It a
GPTPU: General-Purpose Computing on (Edge) Tensor Processing Units
GPTPU: General-Purpose Computing on (Edge) Tensor Processing Units Welcome to the repository of ESCAL @ UCR's GPTPU project! We aim at demonstrating t
Built a peer-to-peer group based file sharing system where users could share or download files from the groups they belonged to. Supports parallel downloading with multiple file chunks from multiple peers.
Mini-Torrent Built a peer-to-peer group based file sharing system where users could share or download files from the groups they belonged to. Supports
C++-based high-performance parallel environment execution engine for general RL environments.
EnvPool is a highly parallel reinforcement learning environment execution engine which significantly outperforms existing environment executors. With
Playbit System interface defines an OS-like computing platform which can be implemented on a wide range of hosts
PlaySys The Playbit System interface PlaySys defines an OS-like computing platform which can be implemented on a wide range of hosts like Linux, BSD,
SpDB is a data integration tool designed to organize scientific data from different sources under the same namespace according to a global schema and to provide access to them in a unified form (views)
SpDB is a data integration tool designed to organize scientific data from different sources under the same namespace according to a global schema and to provide access to them in a unified form (views). Its main purpose is to provide a unified data access interface for complex scientific computations in order to enable the interaction and integration between different programs and databases.
Newton fractal in openframeworks, with shaders. (inspired by: 3b1b)
Newton-fractal Newton fractal in openframeworks, with shaders. (inspired by: 3b1b) Formula read more: Newton's method learn more: Newton's Fractal (wh
Course project for Computing Principles
TrojanMap Author: Tianhong Qi This project focuses on using data structures in C++ and implementing various graph algorithms to build a map applicatio
Final Project for Multicore Processors Course at NYU: Parallel Ray Tracing Algorithm
Multicore_ParallelRayTracing Final Project for Multicore Processors Course at NYU: Parallel Ray Tracing Algorithm Team Member: Hanlin He, Yaowei Zong,
An efficient C++17 GPU numerical computing library with Python-like syntax
MatX - Matrix Primitives Library MatX is a modern C++ library for numerical computing on NVIDIA GPUs. Near-native performance can be achieved while us
HashLibPlus is a recommended C++11 hashing library that provides a fluent interface for computing hashes and checksums of strings, files, streams, bytearrays and untyped data to mention but a few.
HashLibPlus HashLibPlus is a recommended C++11 hashing library that provides a fluent interface for computing hashes and checksums of strings, files,
A command line tool for numerically computing Out-of-time-ordered correlations for N=4 supersymmetric Yang-Mills theory and Beta deformed N=4 SYM.
A command line tool to compute OTOC for N=4 supersymmetric Yang–Mills theory This is a command line tool to numerically compute Out-of-time-ordered co
C++ class for creating and computing arbitrary-length integers
BigNumber BigNumber is a C++ class that allows for the creation and computation of arbitrary-length integers. The maximum possible length of a BigNumb
ArrayFire: a general purpose GPU library.
ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs,
A C++17 message passing library based on MPI
MPL - A message passing library MPL is a message passing library written in C++17 based on the Message Passing Interface (MPI) standard. Since the C++
Multi-backend implementation of SYCL for CPUs and GPUs
hipSYCL - a SYCL implementation for CPUs and GPUs hipSYCL is a modern SYCL implementation targeting CPUs and GPUs, with a focus on leveraging existing
A C++ GPU Computing Library for OpenCL
Boost.Compute Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. The core library is a thin C++ wrapper over the OpenCL API an
QMoM methods for fluid dynamics in C++, C, and OpenACC.
GPU-QBMMlib QMoM methods for fluid dynamics in C++, C, and OpenACC. Agenda Add more test cases from Marchisio Debug higher-dimensional algorithms Fami
a Living ENsemble Simulator -- a lens to help you watch biophysics
aLENS (a Living ENsemble Simulator) The motivation, algorithm and examples are discussed in this paper: aLENS: towards the cellular-scale simulation o
CComp: A Parallel Compression Algorithm for Compressed Word Search
The goal of CComp is to achieve better compressed search times while achieving the same compression-decompression speed as other parallel compression algorithms. CComp achieves this by splitting both the word dictionaries and the input stream, processing them in parallel.
A General-purpose Parallel and Heterogeneous Task Programming System
Taskflow Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++ Why Taskflow? Taskflow is faster, more expressive, an
C++ implementation of the Python NumPy library
NumCpp: A Templatized Header Only C++ Implementation of the Python NumPy Library Author: David Pilger Version: License Testing C++
C++ Mathematical Expression Parsing And Evaluation Library
C++ Mathematical Expression Toolkit Library Documentation Section 00 - Introduction Section 01 - Capabilities Section 02 - Example Expressions
C++ library for solving large sparse linear systems with algebraic multigrid method
AMGCL AMGCL is a header-only C++ library for solving large sparse linear systems with algebraic multigrid (AMG) method. AMG is one of the most effecti
Performance Evaluation of a Parallel Image Enhancement Technique for Dark Images on Multithreaded CPU and GPU Architectures
Performance Evaluation of a Parallel Image Enhancement Technique for Dark Images on Multithreaded CPU and GPU Architectures Image processing is a rese
Yet Another Concurrency Library
YACLib YACLib (Yet Another Concurrency Library) is a C++ library for concurrent tasks execution. Documentation Install guide About dependencies Target
A C++17 thread pool for high-performance scientific computing.
We present a modern C++17-compatible thread pool implementation, built from scratch with high-performance scientific computing in mind. The thread pool is implemented as a single lightweight and self-contained class, and does not have any dependencies other than the C++17 standard library, thus allowing a great degree of portability
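As an illustration of what a single-class, standard-library-only thread pool can look like, here is a minimal C++17 sketch. It is not the listed project's implementation or API: the class name `MiniThreadPool` and its `submit` method are hypothetical, and the design (one mutex-protected queue, futures returned through `std::packaged_task`) is just one common approach.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical minimal pool: a fixed set of workers pulling jobs from one queue.
class MiniThreadPool {
public:
    explicit MiniThreadPool(std::size_t n) {
        if (n == 0) n = 1;
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets queued jobs finish, then joins all workers.
    ~MiniThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    MiniThreadPool(const MiniThreadPool&) = delete;
    MiniThreadPool& operator=(const MiniThreadPool&) = delete;

    // Queues a callable taking no arguments and returns a future for its result.
    template <class F>
    auto submit(F&& f) -> std::future<std::invoke_result_t<std::decay_t<F>>> {
        using R = std::invoke_result_t<std::decay_t<F>>;
        auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and queue drained
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

A caller would construct the pool with a worker count, call `submit` with a lambda, and read the result through the returned `std::future`. A production pool would typically add features this sketch omits, such as task priorities or waiting for all tasks at once.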
AREG IoT SDK (or AREG SDK) is a real-time asynchronous communication framework written in C++ for embedded development that enables thin servers to run on connected Things and provide device-specific services at the edge of the IoT network.
AREG IoT SDK to simplify multitasking programming Bring your product to live service enabled Introduction AREG IoT SDK (or AREG SDK) is an Object Remo
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
Pseudofermion functional renormalization group solver for (frustrated) quantum magnets in two and three spatial dimensions.
SpinParser SpinParser ("Spin Pseudofermion Algorithms for Research on Spin Ensembles via Renormalization") is a software platform to perform pseudofer
WasmEdge Runtime is a high-performance, extensible, and hardware optimized WebAssembly Virtual Machine for automotive, cloud, AI, and blockchain applications.
Supplemental source code for "A Halfedge Refinement Rule for Parallel Catmull-Clark Subdivision".
This repository provides source code to reproduce some of the results of my paper "A Halfedge Refinement Rule for Parallel Catmull-Clark Subdivision". The key contribution of this paper is to provide super simple algorithms to compute Catmull-Clark subdivision in parallel with support for semi-sharp creases. The algorithms are compiled in the C header-only library CatmullClark.h. In addition you will find a direct GLSL port of these algorithms in the glsl/ folder. For various usage examples, see the examples/ folder.
Dorylus: Affordable, Scalable, and Accurate GNN Training
Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads This is Dorylus, a Scalable, Resource-eff
CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU execution
CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU execution. The goal is to provide comprehensive inference features and be the most efficient and cost-effective solution to deploy standard neural machine translation systems such as Transformer models.
Shared-Memory Parallel Graph Partitioning for Large K
KaMinPar The graph partitioning software KaMinPar -- Karlsruhe Minimal Graph Partitioning. KaMinPar is a shared-memory parallel tool to heuristically
A webserver hosting a bank system for Minecraft, able to be used from web browser or from CC/OC if you're playing modded.
CCash A webserver hosting a bank system for Minecraft, able to be used from web browser or from CC/OC if you're playing modded. Description the curren
a language for fast, portable data-parallel computation
Halide Halide is a programming language designed to make it easier to write high-performance image and array processing code on modern machines. Halid
C++ Parallel Computing and Asynchronous Networking Engine
As Sogou's C++ server engine, Sogou C++ Workflow supports almost all back-end C++ online services of Sogou, including all search services, cloud input method, online advertisements, etc., handling more than 10 billion requests every day. It is an enterprise-level programming engine with a light and elegant design that can satisfy most C++ back-end development requirements.
monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture
monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture monolish is a linear equation solver library that monolithically fuses va
Fidelius - YeeZ Privacy Computing
Fidelius - YeeZ Privacy Computing Introduction In order to empower data collaboration between enterprises and help enterprises use data to enhance the
Material for the UIBK Parallel Programming Lab (2021)
UIBK PS Parallel Systems (703078, 2021) This repository contains material required to complete exercises for the Parallel Programming lab in the 2021
Fast parallel CTC.
In Chinese 中文版 warp-ctc A fast parallel implementation of CTC, on both CPU and GPU. Introduction Connectionist Temporal Classification is a loss funct
ParaMonte: Plain Powerful Parallel Monte Carlo and MCMC Library for Python, MATLAB, Fortran, C++, C.
Overview | Installation | Dependencies | Parallelism | Examples | Acknowledgments | License | Authors ParaMonte: Plain Powerful Parallel Monte Carlo L
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree