60 Resources
C/C++ CUDA Libraries
A CUDA-accelerated cloth simulation engine based on Extended Position Based Dynamics (XPBD).
Velvet Velvet is a CUDA-accelerated cloth simulation engine based on Extended Position Based Dynamics (XPBD). Why another cloth simulator? There are a
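As a rough illustration of the XPBD idea mentioned above, here is a minimal CPU sketch of one solver iteration for a single distance constraint between two particles. It is not taken from Velvet; the function name `xpbd_distance_step` and its parameters are hypothetical, and a real engine would run many such constraints in parallel on the GPU.

```python
import math

def xpbd_distance_step(p0, p1, w0, w1, rest_length, compliance, dt, lam=0.0):
    """One XPBD iteration for a distance constraint C = |p1 - p0| - rest_length.

    p0, p1: particle positions (sequences of floats)
    w0, w1: inverse masses (0.0 pins a particle)
    compliance: inverse stiffness (0.0 means a rigid constraint)
    lam: accumulated Lagrange multiplier for this constraint in this time step
    Returns (new_p0, new_p1, new_lam). Hypothetical helper, for illustration only.
    """
    d = [b - a for a, b in zip(p0, p1)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0 or (w0 + w1) == 0.0:
        return tuple(p0), tuple(p1), lam  # degenerate case: nothing to correct
    n = [c / length for c in d]           # gradient of C w.r.t. p1 (and -n w.r.t. p0)
    C = length - rest_length
    alpha_tilde = compliance / (dt * dt)  # time-step-scaled compliance
    dlam = (-C - alpha_tilde * lam) / (w0 + w1 + alpha_tilde)
    new_p0 = tuple(a - w0 * dlam * c for a, c in zip(p0, n))
    new_p1 = tuple(b + w1 * dlam * c for b, c in zip(p1, n))
    return new_p0, new_p1, lam + dlam
```

With zero compliance and equal masses, a stretched pair is pulled back to its rest length in one step; a non-zero compliance makes the correction softer and independent of the iteration count, which is the main difference from plain PBD.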
We implemented our own sequential versions of GA, PSO, SA and ACA in C++, plus parallelized versions with CUDA support
We implemented our own sequential versions of GA, PSO, SA and ACA in C++ (some using Eigen3 as the matrix operation backend), plus parallelized versions with CUDA support. All of them are much faster than the popular library scikit-opt.
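For readers unfamiliar with these heuristics, the following is a minimal sequential sketch of simulated annealing (SA) on a one-dimensional objective. It is not the project's code; the function name, the geometric cooling schedule, and all default parameters are assumptions chosen for illustration.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, iters=2000, seed=0):
    """Minimize f starting from x0. Returns (best_x, best_f). Illustrative sketch only."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)   # random neighbour of the current point
        fc = f(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                          # geometric cooling (an assumed schedule)
    return best_x, best_f
```

A parallel version would typically run many independent chains or evaluate many candidates at once, which is where a GPU helps.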
Making it easier to work with shaders
Slang Slang is a shading language that makes it easier to build and maintain large shader codebases in a modular and extensible fashion, while also ma
The dgSPARSE Library (Deep Graph Sparse Library) is a high performance library for sparse kernel acceleration on GPUs based on CUDA.
dgSPARSE Library Introduction The dgSPARSE Library (Deep Graph Sparse Library) is a high performance library for sparse kernel acceleration on GPUs bas
WifSolverCuda - Tool for solving misspelled or damaged Bitcoin Private Key in Wallet Import Format (WIF)
WifSolverCuda Tool for solving misspelled or damaged Bitcoin Private Key in Wallet Import Format (WIF) Usage: WifSolverCuda [-d deviceId] [-b NbBlocks
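To make the problem concrete: a WIF string is the Base58Check encoding of a version byte `0x80`, the 32-byte private key, an optional `0x01` compression flag, and a 4-byte double-SHA256 checksum. A damaged WIF can therefore be repaired by trying candidate characters and keeping the ones whose checksum verifies. The sketch below does this on the CPU for a single wrong character; it is not WifSolverCuda's algorithm, and the helper names are hypothetical.

```python
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data):
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, r = divmod(n, 58)
        out = B58[r] + out
    pad = len(data) - len(data.lstrip(b"\x00"))   # leading zero bytes become '1'
    return "1" * pad + out

def b58decode(s):
    n = 0
    for ch in s:
        n = n * 58 + B58.index(ch)                # raises ValueError on a bad character
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    pad = len(s) - len(s.lstrip("1"))
    return b"\x00" * pad + body

def checksum(payload):
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

def wif_encode(key32, compressed=False):
    payload = b"\x80" + key32 + (b"\x01" if compressed else b"")
    return b58encode(payload + checksum(payload))

def wif_is_valid(wif):
    try:
        raw = b58decode(wif)
    except ValueError:
        return False
    payload, check = raw[:-4], raw[-4:]
    if len(payload) not in (33, 34) or payload[0] != 0x80:
        return False
    if len(payload) == 34 and payload[-1] != 0x01:
        return False
    return checksum(payload) == check

def repair_one_char(wif):
    """Return every valid WIF that differs from `wif` in at most one position."""
    found = set()
    for i in range(len(wif)):
        for ch in B58:
            cand = wif[:i] + ch + wif[i + 1:]
            if wif_is_valid(cand):
                found.add(cand)
    return sorted(found)
```

Each candidate check is independent of the others, which is why this kind of search maps well onto GPU threads once several characters are unknown and the candidate space grows.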
FoxRaycaster, optimized, fixed and with a CUDA option
Like FoxRaycaster, but with a nicer GUI, bug fixes, more optimization, and CUDA support. Used in project: Code from FoxRaycaster, which was based on thi
physically based path tracer on gpu
GPUPathtracer physically based path tracer on gpu Features: integrators (ambient occlusion, path tracing, light tracing, volumetric path tracing, bidirectional path t
Thrust is a C++ parallel programming library which resembles the C++ Standard Library.
Thrust: Code at the speed of light Thrust is a C++ parallel programming library which resembles the C++ Standard Library. Thrust's high-level interfac
Thrust - The C++ parallel algorithms library.
Thrust: Code at the speed of light Thrust is a C++ parallel programming library which resembles the C++ Standard Library. Thrust's high-level interfac
PointPillars MultiHead 40FPS - A REAL-TIME 3D detection network [Pointpillars] compiled by CUDA/TensorRT/C++.
PointPillars A high-performance version of the 3D object detection network PointPillars, which can achieve real-time processing (less th
Adorad - Fast, Expressive, & High-Performance Programming Language for those who dare
The Adorad Language Key Features of Adorad Simplicity: the language can be learned in less tha
RXMesh - A GPU Mesh Data Structure - SIGGRAPH 2021
RXMesh About RXMesh is a surface triangle mesh data structure and programming model for processing static meshes on the GPU. RXMesh aims at providing a
Parallel algorithms (quick-sort, merge-sort, enumeration-sort) implemented with pthreads and CUDA
How to run the program: 1. Build: enter sort-project (cuda-sort-project) and run make; the program is compiled automatically into the executable sort (cudaSort). 2. Run the executable: ./sort or ./cudaSort 3. Clean up: make clean 4. Specify threads
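Of the three algorithms listed, enumeration sort is the one that parallelizes most directly: each element computes its own final position by counting how many elements must precede it. The sequential sketch below shows that idea; it is not the repository's code, and the function name is hypothetical.

```python
def enumeration_sort(values):
    """Sort by rank counting. Each outer iteration is independent of the others,
    so on a GPU every element could be handled by its own thread."""
    n = len(values)
    out = [None] * n
    for i, v in enumerate(values):
        rank = 0
        for j, u in enumerate(values):
            # Ties are broken by original index, which keeps ranks unique (and the sort stable).
            if u < v or (u == v and j < i):
                rank += 1
        out[rank] = v
    return out
```

The total work is O(n²), which is more than quick-sort or merge-sort, but with one thread per element the depth of each thread's loop is only O(n).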
PlenOctree Volume Rendering (supports CUDA & fragment shader backends)
PlenOctree Volume Rendering This is a real-time PlenOctree volume renderer written in C++ using OpenGL, constituting part of the code release for: Ple
A C++-based, cross platform ray tracing library
Visionaray A C++ based, cross platform ray tracing library Getting Visionaray The Visionaray git repository can be cloned using the following commands
Raytracer implemented with CPU and GPU using CUDA
Raytracer This is a training project aimed at learning the ray tracing algorithm and practicing converting sequential CPU code into parallelized GPU code u
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
PSTensor: A Customized Tensor Data Structure Compatible with PyTorch and TensorFlow. You may need this software in the following cases. Manage memory
dvr scanner rewritten in c++.
Dvr-Scanner-CUDA dvr scanner rewritten in c++. FOR WIN32! not linux, yet. this program REQUIRES you have the nvidia cuda toolkit/drivers AND nvcuvid.
CUDA Custom Buffers and example blocks
gr-cuda CUDA Support for GNU Radio using the custom buffer changes introduced in GR 3.10. Custom buffers for CUDA-enabled hardware are provided that c
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU.
Isaac ROS DNN Inference Overview This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom mode
CUDA-accelerated Apriltag detection and pose estimation.
Isaac ROS Apriltag Overview This ROS2 node uses the NVIDIA GPU-accelerated AprilTags library to detect AprilTags in images and publishes their poses,
Brute Force Bitcoin Private keys, Public keys
Rotor-Cuda This is a modified version of KeyHunt v1.7 by kanhavishva. A lot of gratitude to all the developers whose code has been used here. Feature
ArrayFire: a general purpose GPU library.
ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs,
Multi-backend implementation of SYCL for CPUs and GPUs
hipSYCL - a SYCL implementation for CPUs and GPUs hipSYCL is a modern SYCL implementation targeting CPUs and GPUs, with a focus on leveraging existing
Cooperative primitives for CUDA C++.
CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming model
A General-purpose Parallel and Heterogeneous Task Programming System
Taskflow Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++ Why Taskflow? Taskflow is faster, more expressive, an
C++ library for solving large sparse linear systems with algebraic multigrid method
AMGCL AMGCL is a header-only C++ library for solving large sparse linear systems with algebraic multigrid (AMG) method. AMG is one of the most effecti
An easy-to-use image processing library accelerated with CUDA on GPU.
gpucv Have you used OpenCV on your CPU and wanted to run it on a GPU? Did you try installing OpenCV and get frustrated with its installation? Fret not
HIPIFY: Convert CUDA to Portable C++ Code
Tools to translate CUDA source code into portable HIP C++ automatically
Software ray tracer written from scratch in C that can run on CPU or GPU with emphasis on ease of use and trivial setup
A minimalist and platform-agnostic interactive/real-time raytracer. Strong emphasis on simplicity, ease of use and almost no setup to get started with
ppl.cv is a high-performance image processing library of openPPL supporting x86 and cuda platforms.
LightSeq: A High Performance Library for Sequence Processing and Generation
LightSeq is a high performance training and inference library for sequence processing and generation implemented in CUDA. It enables highly efficient computation of modern NLP models such as BERT, GPT, Transformer, etc. It is therefore well suited to Machine Translation, Text Generation, Dialog, Language Modelling, Sentiment Analysis, and other related tasks with sequence data.
TensorRT for Scaled YOLOv4(yolov4-csp.cfg)
TensorRT Scaled YOLOv4 TensorRT for Scaled YOLOv4(yolov4-csp.cfg) Many people have already written a TensorRT version of YOLO, so here is mine. Test environment: ubuntu 18.04 pytorch 1.7.1 jetpack 4.4 CUDA 11.0
CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU execution
CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU execution. The goal is to provide comprehensive inference features and be the most efficient and cost-effective solution to deploy standard neural machine translation systems such as Transformer models.
BM3D denoising filter for VapourSynth, implemented in CUDA
VapourSynth-BM3DCUDA Copyright© 2021 WolframRhodium BM3D denoising filter for VapourSynth, implemented in CUDA Description Please check VapourSynth-BM
Tiny CUDA Neural Networks
This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning fast "fully fused" multi-layer perceptron as well as support for various advanced input encodings, losses, and optimizers.
NVIDIA-contributed CUDA tutorial for Numba
This is an adapted version of one delivered internally at NVIDIA - its primary audience is those who are familiar with CUDA C/C++ programming, but perhaps less so with Python and its ecosystem.
A CUDA implementation of Lattice Boltzmann for fluid dynamics simulation
Lattice Boltzmann simulation I am conscious of being only an individual struggling weakly against the stream of time. But it still remains in my power
monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture
monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture monolish is a linear equation solver library that monolithically fuses va
Fast, differentiable sorting and ranking in PyTorch
Torchsort Fast, differentiable sorting and ranking in PyTorch. Pure PyTorch implementation of Fast Differentiable Sorting and Ranking (Blondel et al.)
hashcat is the world's fastest and most advanced password recovery utility
hashcat is the world's fastest and most advanced password recovery utility, supporting five unique modes of attack for over 300 highly-optimized hashing algorithms. hashcat currently supports CPUs, GPUs, and other hardware accelerators on Linux, Windows, and macOS, and has facilities to help enable distributed password cracking.
Ethereum miner with OpenCL, CUDA and stratum support
Ethminer is an Ethash GPU mining worker: with ethminer you can mine every coin which relies on an Ethash Proof of Work thus including Ethereum, Ethereum Classic, Metaverse, Musicoin, Ellaism, Pirl, Expanse and others. This is the actively maintained version of ethminer. It originates from cpp-ethereum project (where GPU mining has been discontinued) and builds on the improvements made in Genoil's fork. See FAQ for more details.
A library for high performance deep learning inference on NVIDIA GPUs.
Forward - A library for high performance deep learning inference on NVIDIA GPUs Forward - A library for high performance deep learning inference on NV
Forward - A library for high performance deep learning inference on NVIDIA GPUs
a library for high performance deep learning inference on NVIDIA GPUs.
libcu++: The C++ Standard Library for Your Entire System
libcu++, the NVIDIA C++ Standard Library, is the C++ Standard Library for your entire system. It provides a heterogeneous implementation of the C++ Standard Library that can be used in and between CPU and GPU code.
Open3D: A Modern Library for 3D Data Processing
Open3D is an open-source library that supports rapid development of software that deals with 3D data. The Open3D frontend exposes a set of carefully selected data structures and algorithms in both C++ and Python. The backend is highly optimized and is set up for parallelization. We welcome contributions from the open-source community.
ThunderSVM: A Fast SVM Library on GPUs and CPUs
What's new: We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. Added a scikit-learn interface. Overview The miss
ThunderGBM: Fast GBDTs and Random Forests on GPUs
What's new? ThunderGBM won the 2019 Best Paper Award from IEEE Transactions o
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
CatBoost is a machine learning method based on gradient boosting over decision tree
GPU PyTorch TOP in TouchDesigner with CUDA-enabled OpenCV
PyTorchTOP This project demonstrates how to use OpenCV with CUDA modules and PyTorch/LibTorch in a TouchDesigner Custom Operator. Building this projec
GPU Cloth TOP in TouchDesigner using CUDA-enabled NVIDIA Flex
This project demonstrates how to use NVIDIA FleX for GPU cloth simulation in a TouchDesigner Custom Operator. It also shows how to render dynamic meshes from the texture data using custom PBR GLSL material shaders inside TouchDesigner.
Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer
Remotery A realtime CPU/GPU profiler hosted in a single C file with a viewer that runs in a web browser. Supported Platforms: Windows Windows UWP (Hol
VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
VexCL VexCL is a vector expression template library for OpenCL/CUDA. It has been created for ease of GPGPU development with C++. VexCL strives to redu
stdgpu: Efficient STL-like Data Structures on the GPU
stdgpu: Efficient STL-like Data Structures on the GPU
Thin C++-flavored wrappers for the CUDA Runtime API
cuda-api-wrappers: Thin C++-flavored wrappers for the CUDA runtime API nVIDIA's Runtime API for CUDA is int
A General-purpose Parallel and Heterogeneous Task Programming System
Taskflow Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++ Why Taskflow? Taskflow is faster, more expressive, a
ArrayFire: a general purpose GPU library.
ArrayFire is a general-purpose library that simplifies the process of developing software that targets parallel and massively-parallel architectures i
kaldi-asr/kaldi is the official location of the Kaldi project.
Kaldi Speech Recognition Toolkit To build the toolkit: see ./INSTALL. These instructions are valid for UNIX systems including various flavors of Linux
A GPU (CUDA) based Artificial Neural Network library
Updates - 05/10/2017: Added a new example The program "image_generator" is located in the "/src/examples" subdirectory and was submitted by Ben Bogart