Thrust: Code at the speed of light

Thrust is a C++ parallel programming library which resembles the C++ Standard Library. Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB, and OpenMP) facilitates integration with existing software. Develop high-performance applications rapidly with Thrust!

Thrust is included in the NVIDIA HPC SDK and the CUDA Toolkit.

Quick Start

Getting the Thrust Source Code

The CUDA Toolkit provides a recent release of the Thrust source code in include/thrust. This will be suitable for most users.

Users who wish to contribute to Thrust or try out newer features should recursively clone the Thrust GitHub repository:

git clone --recursive https://github.com/NVIDIA/thrust.git

Using Thrust From Your Project

Thrust is a header-only library; there is no need to build or install the project unless you want to run the Thrust unit tests.

For CMake-based projects, we provide a CMake package for use with find_package. See the CMake README for more information. Thrust can also be added via add_subdirectory or tools like the CMake Package Manager.
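As an illustrative sketch (function and target names follow Thrust's CMake README; verify against your Thrust version), consuming the package might look like:

```cmake
# Locate an installed Thrust (e.g. from the CUDA Toolkit) and create an
# interface target; by default it uses the CPP host and CUDA device systems.
find_package(Thrust REQUIRED CONFIG)
thrust_create_target(ThrustCUDA)

add_executable(my_app main.cu)
target_link_libraries(my_app PRIVATE ThrustCUDA)
```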

For non-CMake projects, compile with:

  • The Thrust include path (-I<thrust repo root>)
  • The CUB include path, if using the CUDA device system (-I<thrust repo root>/dependencies/cub/)
  • By default, the CPP host system and CUDA device system are used. These can be changed using compiler definitions:
    • -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_XXX, where XXX is CPP (serial, default), OMP (OpenMP), or TBB (Intel TBB)
    • -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_XXX, where XXX is CPP, OMP, TBB, or CUDA (default).
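For instance, a non-CMake build targeting the OpenMP device system instead of CUDA might be compiled as follows (paths are illustrative, assuming a checkout at ./thrust):

```shell
# Serial CPP host system (default) + OpenMP device system; plain g++ suffices
# here because no CUDA device code is generated.
g++ -O2 -I ./thrust \
    -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP \
    -fopenmp my_app.cpp -o my_app
```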

Examples

Thrust is best explained through examples. The following source code generates random numbers serially and then transfers them to a parallel device where they are sorted.

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <algorithm>
#include <cstdlib>

int main(void)
{
  // generate 32M random numbers serially
  thrust::host_vector<int> h_vec(32 << 20);
  std::generate(h_vec.begin(), h_vec.end(), rand);

  // transfer data to the device
  thrust::device_vector<int> d_vec = h_vec;

  // sort data on the device (846M keys per second on GeForce GTX 480)
  thrust::sort(d_vec.begin(), d_vec.end());

  // transfer data back to host
  thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());

  return 0;
}

This code sample computes the sum of 100 random numbers in parallel:

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <algorithm>
#include <cstdlib>

int main(void)
{
  // generate random data serially
  thrust::host_vector<int> h_vec(100);
  std::generate(h_vec.begin(), h_vec.end(), rand);

  // transfer to device and compute sum
  thrust::device_vector<int> d_vec = h_vec;
  int x = thrust::reduce(d_vec.begin(), d_vec.end(), 0, thrust::plus<int>());
  return 0;
}

Additional usage examples can be found in the examples/ and testing/ directories of the GitHub repo.

Supported Compilers

Thrust is regularly tested using the specified versions of the following compilers. Unsupported versions may emit deprecation warnings, which can be silenced by defining THRUST_IGNORE_DEPRECATED_COMPILER during compilation.

  • NVCC 11.0+
  • NVC++ 20.9+
  • GCC 5+
  • Clang 7+
  • MSVC 2019+ (19.20/16.0/14.20)

Releases

Thrust is distributed with the NVIDIA HPC SDK and the CUDA Toolkit in addition to GitHub.

See the changelog for details about specific releases.

Thrust Release Included In
1.15.0 TBD
1.14.0 NVIDIA HPC SDK 21.9
1.13.1 CUDA Toolkit 11.5
1.13.0 NVIDIA HPC SDK 21.7
1.12.1 CUDA Toolkit 11.4
1.12.0 NVIDIA HPC SDK 21.3
1.11.0 CUDA Toolkit 11.3
1.10.0 NVIDIA HPC SDK 20.9 & CUDA Toolkit 11.2
1.9.10-1 NVIDIA HPC SDK 20.7 & CUDA Toolkit 11.1
1.9.10 NVIDIA HPC SDK 20.5
1.9.9 CUDA Toolkit 11.0
1.9.8-1 NVIDIA HPC SDK 20.3
1.9.8 CUDA Toolkit 11.0 Early Access
1.9.7-1 CUDA Toolkit 10.2 for Tegra
1.9.7 CUDA Toolkit 10.2
1.9.6-1 NVIDIA HPC SDK 20.3
1.9.6 CUDA Toolkit 10.1 Update 2
1.9.5 CUDA Toolkit 10.1 Update 1
1.9.4 CUDA Toolkit 10.1
1.9.3 CUDA Toolkit 10.0
1.9.2 CUDA Toolkit 9.2
1.9.1-2 CUDA Toolkit 9.1
1.9.0-5 CUDA Toolkit 9.0
1.8.3 CUDA Toolkit 8.0
1.8.2 CUDA Toolkit 7.5
1.8.1 CUDA Toolkit 7.0
1.8.0
1.7.2 CUDA Toolkit 6.5
1.7.1 CUDA Toolkit 6.0
1.7.0 CUDA Toolkit 5.5
1.6.0
1.5.3 CUDA Toolkit 5.0
1.5.2 CUDA Toolkit 4.2
1.5.1 CUDA Toolkit 4.1
1.5.0
1.4.0 CUDA Toolkit 4.0
1.3.0
1.2.1
1.2.0
1.1.1
1.1.0
1.0.0

Development Process

Thrust uses the CMake build system to build unit tests, examples, and header tests. To build Thrust as a developer, the following recipe should be followed:

# Clone Thrust and CUB repos recursively:
git clone --recursive https://github.com/NVIDIA/thrust.git
cd thrust

# Create build directory:
mkdir build
cd build

# Configure -- use one of the following:
cmake ..   # Command line interface.
ccmake ..  # ncurses GUI (Linux only)
cmake-gui  # Graphical UI, set source/build directories in the app

# Build:
cmake --build . -j <num jobs>  # invokes make (or ninja, etc)

# Run tests and examples:
ctest
By default, a serial CPP host system, CUDA accelerated device system, and C++14 standard are used. This can be changed in CMake. More information on configuring your Thrust build and creating a pull request can be found in CONTRIBUTING.md.
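As a sketch, switching the device system and C++ standard at configure time might look like the following (option names reflect Thrust's single-config CMake options; see CONTRIBUTING.md for the authoritative list):

```shell
# Illustrative: configure a developer build that uses the TBB device system
# and the C++17 standard instead of the defaults (CUDA, C++14).
cmake .. -DTHRUST_DEVICE_SYSTEM=TBB -DTHRUST_CPP_DIALECT=17
```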

Comments
  • Thrust equivalent to std::complex

    I've put all the code inside thrust/detail/complex. I've also created the unit tests and documentation.

    I've ported FreeBSD's C99 complex implementation, as it seems to be the highest quality available. All the functions, except for pow, are accurate to within a few ULPs.

    Complex atan() and atanh() require C++11 due to the lack of real atanh() in previous versions.

    I've tested with g++ and clang++, but I didn't have the opportunity to try MSVC as I don't have access to it.

  • Intermittent compilation failures with thrust, cuda 10.2 and MSVC 2019

    We experience intermittent compilation failure on our CI server. The CXX compiler identification is MSVC 19.25.28612.0. The CUDA compiler identification is NVIDIA 10.2.89.

    Retrying the compilation typically succeeds. Our CI server now retries compiling the project up to 5 times to avoid this issue. (The issue has never occurred 5 times in a row yet.)

    The error looks as follows

    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2993: 'T': illegal type for non-type template parameter '__formal'
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): note: see reference to class template instantiation 'thrust::detail::allocator_traits_detail::has_value_type<T>' being compiled
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2065: 'U1': undeclared identifier
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2923: 'std::_Select<__formal>::_Apply': 'U1' is not a valid template type argument for parameter '<unnamed-symbol>'
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2144: syntax error: 'unknown-type' should be preceded by ')'
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2144: syntax error: 'unknown-type' should be preceded by ';'
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2238: unexpected token(s) preceding ';'
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2059: syntax error: ')'
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2988: unrecognizable template declaration/definition
    [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2059: syntax error: '<end Parse>'
    

    This affects at least two projects

    • https://github.com/apache/incubator-mxnet/issues/17935
    • https://github.com/pytorch/pytorch/issues/25393
  • Add gpuCI GPU testing, automatic GPU detection, and support for more compilers

    Add gpuCI GPU testing, automatic GPU detection, and support for more compilers:

    • CMake: Add support for detecting the compute archs of the GPUs in your system at configure time.
    • gpuCI: Add a GPU node configuration that builds and tests as little as possible.
    • gpuCI: Cleanup logic for different build and test configurations.
    • gpuCI: Fix an unfortunate typo in determine_build_parallelism.bash which led to the parallelism level not being set.
    • gpuCI: Add support for NVC++.
    • gpuCI: Update to CUDA 11.1 and Ubuntu 20.04.
    • gpuCI: Add NVC++ and ICC configurations to the CPU axis file.
    • gpuCI: Add a GPU axis file.
    • gpuCI: Increase the desired memory per build thread to 4GB.
    • gpuCI: Add a -j switch which controls build parallelism to ci/local/build.bash.
  • OpenCL support

    While CUDA is good for Nvidia cards, the fact is that only half of the userbase can use CUDA. OpenCL support would allow for Thrust to be useful for people with AMD cards, as well as embedded Android platforms which are now getting OpenCL support.

    The other advantage from implementing OpenCL would be that OpenGL 4.x has interop with OpenCL, which would allow for more possibilities regarding cross platform support.

    Source on Android getting OpenCL: http://www.androidcentral.com/nexus-4-and-nexus-10-found-have-opencl-drivers

  • Add transform_input_output_iterator

    Adds a variant of transform iterator adapter that works as both an input iterator and an output iterator. The given input function is applied after reading from the wrapped iterator while the output function is applied before writing to the wrapped iterator. The implementation is largely based on transform_output_iterator, with additional operators added to the proxy reference.

    Also fixes some typos in transform_output_iterator.

  • Support adding Thrust to CMake projects with `add_subdirectory`

    I have been using the GitHub version of Thrust for a while, from before the recent merge with the CUDA one. I manage the dependency using CMake FetchContent. After the recent update in this repo, which includes a CMakeLists.txt, I found that it breaks commands like add_subdirectory() because of the following error:

     MSVC_COMPILER_FLAGS:                                                                                                                                           
     | WARN_ALL : '/Wall'                                                                                                                                          
     | WARNINGS_AS_ERRORS : '/Wx'                                                                                                                                  
     | RELEASE : '/Ox'                                                                                                                                             
     | DEBUG : '/Zi;-D_DEBUG;/MTd'                                                                                                                                 
     | EXCEPTION_HANDLING : '/EHsc'                                                                                                                                
     | CPP : ''                                                                                                                                                    
     | OMP : '/openmp'                                                                                                                                             
     | TBB : ''                                                                                                                                                    
     | CUDA : ''                                                                                                                                                   
     | CUDA_BULK : ''                                                                                                                                              
     | WORKAROUNDS : '/DNOMINMAX;/wd4503'                                                                                                                          
     | C++03 : ''
     | C++11 : '-std=c++11'
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - found
    -- Found Threads: TRUE
    -- Found CUDA: /net/software/modules-sw/cuda/10.1/Linux/RHEL6/x86_64 (found version "10.1")                                                           
    -- Found OpenMP_CXX: -fopenmp (found version "4.5")
    -- Found OpenMP: TRUE (found version "4.5")
    -- Found 49 examples
    -- Found 5 examples/cuda
    -- Found 4 examples/cpp_integration
    -- Found 152 tests in testing
    -- Found 59 tests in backend
    CMake Error at build_cuda/_deps/thrust-src/testing/CMakeLists.txt:48 (add_custom_target):                                                                     
      add_custom_target cannot create target "check" because another target with
      the same name already exists.  The existing target is a custom target
      created in source directory
      "/home/aznb/mycodes/SCgenome_scbmc/build_cuda/_deps/kokkos-src".                                                                         
      See documentation for policy CMP0002 for more details.
    

    Aside from the name conflict via add_custom_target, I was expecting to use Thrust as a header-only library, and I don't expect CMake to configure any of the test targets unless I ask for them.

  • Uninitialized __global__ memory in thrust::sort (cub::RadixSort) - incorrect results/segfaults in thrust::sort, thrust::remove_if, etc.

    We have been getting weird errors in the thrust functions sort_by_key, sort, and remove_if in our custom code and in third-party code such as flann (kd-tree on CUDA) and MinkowskiEngine (a PyTorch custom lib). After a thorough investigation, we discovered that the mentioned functions sometimes randomly produce wrong results (sorted vectors contain values that were not in the original vectors, remove_if does not remove elements matching a condition, etc.). At first, we thought the issues were related to PyTorch, as they occurred when we linked the PyTorch lib, but afterward we were able to produce a minimal example with errors even without any PyTorch stuff. Also, the errors seem to randomly appear or disappear when a line of code is added/removed or a library (e.g. PyTorch) is linked (but not used). I suppose this suggests there is some problem related to a physical address of the code/data.

    We tested our binaries with compute-sanitizer --tool initcheck and in cases when thrust::sort or thrust::remove_if returned corrupted results we got e.g. Uninitialized __global__ memory read of size 4 bytes... errors. As mentioned above, when we removed/added some code/library that did not affect the actual computation the results were miraculously fixed but compute-sanitizer --tool initcheck still returned the error. Therefore it seems sometimes the uninitialized memory actually contains the value it should be initialized with and everything runs okay-ish.

    We tested many versions of the example (below) as well as many versions of our internal code on at least:

    • nvidia devel ubuntu18.04 and ubuntu20.04 docker images with cuda 10.1, 10.2, 11.0, 11.1, 11.2
    • on ubuntu20.04 and arch linux distributions
    • with default thrust/cub and the latest thrust/cub (thrust version 1.12)

    The issues were present in every setup with slight variations - e.g. changing cuda seemed to fix the issue but adding an independent line of code broke the code again.

    We tested this particular example also on Windows and it seems it is the only place where the code runs without Uninitialized __global__ memory warning. But due to compilation difficulties, we were not able to compile our other programs with the same issue and test them yet.


    To reproduce one of the issues, create main.cu, Dockerfile and CMakeLists.txt (file contents below) and run the following commands:

    docker build -t test-docker-image .
    docker run -it --gpus 1 -v $(pwd):/xxx -w /xxx test-docker-image bash
    mkdir build
    cd build
    cmake ..
    make
    compute-sanitizer --tool initcheck bug_test
    

    You should get the following output:

    RUN 0, NUM 128, dev_ptr 0x7fe1c5800000: OK! 
    RUN 0, NUM 256, dev_ptr 0x7fe1c5800000: OK!
    ...
    ========= Uninitialized __global__ memory read of size 4 bytes
    =========     at 0x5b8 in void cub::DeviceRadixSortOnesweepKernel<cub::DeviceRadixSortPolicy<int,cub::NullType,int>::Policy800,bool=0,int,cub::NullType,int,int>(int*,int,bool=0*,int* const *,int*,int* const * const *,cub::DeviceRadixSortPolicy<int,cub::NullType,int>::Policy800*,int* const * const * const *,int*,int,int)
    =========     by thread (214,0,0) in block (0,0,0)
    =========     Address 0x7f13cdc09dd8
    =========     Saved host backtrace up to driver entry point at kernel launch time
    =========     Host Frame:cuLaunchKernel [0x7f1402c1ba6e]
    =========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
    =========     Host Frame: [0x7f1415e1862b]
    =========                in /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudart.so.11.0
    =========     Host Frame:cudaLaunchKernel [0x7f1415e585b1]
    =========                in /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudart.so.11.0
    ...
    

    When PyTorch libs and a specific version of thrust are linked, we also get Host and device vector doesn't match! aside from the Uninitialized __global__ memory warning. Sometimes, in different setups, we got Uninitialized __global__ memory read of size 1 bytes ... or Floating point exception (core dumped).

    Also, we got the uninitialized memory warning when calling thrust::remove_if in one place of our code. Similarly to thrust::sort, the warning occurred when the outcome of the function was incorrect, but it also occurred when the outcome was (probably by chance) correct:

    ========= Uninitialized __global__ memory read of size 4 bytes
    =========     at 0x1d68 in void thrust::cuda_cub::core::_kernel_agent<thrust::cuda_cub::__copy_if::CopyIfAgent<thrust::zip_iterator<thrust::tuple<unsigned int*,unsigned int*,unsigned int*,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>,thrust::cuda_cub::__copy_if::no_stencil_tag_*,thrust::zip_iterator<thrust::tuple<unsigned int*,unsigned int*,unsigned int*,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>,thrust::detail::unary_negate<minkowski::detail::is_first<unsigned int>>,int,int*>,thrust::zip_iterator<thrust::tuple<unsigned int*,unsigned int*,unsigned int*,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>,thrust::cuda_cub::__copy_if::no_stencil_tag_*,thrust::zip_iterator<thrust::tuple<unsigned int*,unsigned int*,unsigned int*,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type>>,thrust::detail::unary_negate<minkowski::detail::is_first<unsigned int>>,int,int*,cub::ScanTileState<int,bool=1>,unsigned long>(unsigned int*,unsigned int*,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type,thrust::null_type)
    =========     by thread (30,0,0) in block (0,0,0)
    =========     Address 0x7fa75a017170
    =========     Saved host backtrace up to driver entry point at kernel launch time
    =========     Host Frame:cuLaunchKernel [0x7fa78effea6e]
    =========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
    =========     Host Frame: [0x7fa7fb2ec62b]
    =========                in /usr/local/cuda/lib64/libcudart.so.11.0
    =========     Host Frame:cudaLaunchKernel [0x7fa7fb32c5b1]
    =========                in /usr/local/cuda/lib64/libcudart.so.11.0
    

    Also a similar example of probably the same problem was mentioned by us in thrust issue https://github.com/NVIDIA/thrust/issues/1341#issuecomment-791642454 and pytorch issue https://github.com/pytorch/pytorch/issues/52663.


    The files:

    main.cu

    #include <iostream>
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/equal.h>
    #include <thrust/functional.h>
    
    int main() 
    {
        for (size_t NUM = 128; NUM < 32768; NUM+=128) 
        {
            for (int run = 0; run < 1; run++) {
                thrust::host_vector<int> h(NUM);
                thrust::device_vector<int> d(NUM);
                for (int i = 0; i < NUM; i++) {
                    int random_number = rand() * 1000;
                    h[i] = random_number;
                    d[i] = random_number;
                }
                thrust::sort(h.begin(), h.end());
                thrust::sort(d.begin(), d.end());
        
                thrust::host_vector<int> d_host(d.begin(), d.end());
                bool sort_ok = thrust::equal(
                    d_host.begin(), d_host.end() - 1, d_host.begin() + 1,	
                    thrust::less_equal<int>());
                bool match = thrust::equal(d_host.begin(), d_host.end(), h.begin());
    
                std::cout << "RUN " << run << ", NUM " << NUM;
                std::cout << ", dev_ptr " << static_cast<void*>(thrust::raw_pointer_cast(d.data())) << ": ";
                if (sort_ok && match) { std::cout << "OK! "; }
                if (!sort_ok) { std::cout << "Wrong sort! "; }
                if (!match) { std::cout << "Host and device vector doesn't match! "; }
                std::cout << std::endl;
            }
        }
    
        return 0;
    }
    

    Dockerfile

    FROM nvidia/cuda:11.0-devel-ubuntu20.04
    RUN apt-get update && apt-get install -y wget
    RUN wget -qO- "https://cmake.org/files/v3.17/cmake-3.17.5-Linux-x86_64.tar.gz" | tar --strip-components=1 -xz -C /usr/local
    

    CMakeLists.txt

    cmake_minimum_required(VERSION 3.17.5)
    project(bug_test CUDA CXX)
    add_executable(bug_test main.cu)
    target_compile_options(bug_test PUBLIC $<$<COMPILE_LANGUAGE:CUDA>:-arch compute_XX>)
    target_compile_options(bug_test PUBLIC $<$<COMPILE_LANGUAGE:CUDA>:-code sm_XX>)
    target_compile_features(bug_test PRIVATE cuda_std_14)
    

    I'll gladly provide other examples if necessary. @allisonvacanti

  • NVBug 3262468: Bogus data produced in CUDA 11(.1) by thrust::sort_by_key with custom comparison operator over two (zipped) key arrays

    Hello,

    while testing GPUSPH on CUDA 11.1 I discovered that the thrust::sort_by_key procedure that we use to sort the particle indices is now producing bogus data. The sort uses a custom comparator that fetches data from two different arrays (using zip_iterator etc). The sort results in one of the (sorted) key arrays being clobbered with invalid data.

    I am currently in the process of finding a minimal test case, but in the meantime the bug can be observed in action on the cuda11-thrust-sort-bug branch of GPUSPH, by running

    make DamBreak3D && ./DamBreak3D --maxiter 1 | grep '64656 255 '
    

    from the git working directory. This should produce no results (as it does on CUDA 10) if the sort is correct, but it results in numerous hits (all clobbered entries with bogus values) in CUDA 11.

    Thanks for looking into this,

  • Variadic tuple preparation

    Some simplifications preparing Thrust for a variadic tuple implementation (some day... #524). Other changes would require a bit more coordination and can come separately, assuming these sorts of changes are now mergeable.

    With -DTHRUST_DEVICE_SYSTEM=CPP I get:

    100% tests passed, 0 tests failed out of 151
    
    Total Test time (real) = 108.10 sec
    
  • Updated shuffle implementation to use better hash function

    Fixes https://github.com/thrust/thrust/issues/1256 by changing the hash function from taus88 to wyhash. Adds a test for a random distribution of numbers.

  • NVBug 2318871: Compilation failure with `zip_iterator` and `complex<float>` in CUDA 9.2

    CUDA version: 9.2 Thrust version: 1.9.2 (bundled in CUDA 9.2) GCC version: 5.4.0

    The following code fails to compile for the CUDA backend, specifically the line "auto zipout_end = ...". It compiles, however, if we make either of the following changes:

    1. change TYPE to float
    2. target openmp backend.
    #include <thrust/device_vector.h>
    #include <thrust/sequence.h>
    #include <thrust/copy.h>
    #include <thrust/gather.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/complex.h>
     
    /* g++ -std=c++11 -I/usr/local/cuda/include -O2 -x c++ -fopenmp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -lgomp minimal.cu
     nvcc -std=c++11 --expt-extended-lambda minimal.cu
    */
     
    // typedef float TYPE;
    typedef thrust::complex<float> TYPE;
     
    int main()
    {
       thrust::device_vector<TYPE> d_vec(10);
       thrust::sequence(d_vec.begin(), d_vec.end());
       thrust::device_vector<TYPE> d_res(10);
     
       auto pred = [] __host__ __device__ (TYPE val) { return abs(val) > 5; };
       auto cntit_begin = thrust::make_counting_iterator(0);
       auto cntit_end = cntit_begin + 10;
     
       // generate indices and values in two calls
       thrust::device_vector<int> indices(10);
       auto indices_end = thrust::copy_if(cntit_begin, cntit_end, d_vec.begin(), indices.begin(), pred);
       thrust::gather(indices.begin(), indices_end, d_vec.begin(), d_res.begin());
     
       // generate indices and values in one call
       auto zipin_begin = thrust::make_zip_iterator(thrust::make_tuple(cntit_begin, d_vec.begin()));
       auto zipin_end = thrust::make_zip_iterator(thrust::make_tuple(cntit_end, d_vec.end()));
       auto zipout_begin = thrust::make_zip_iterator(thrust::make_tuple(indices.begin(), d_res.begin()));
       // the following line fails to compile for combination of cuda backend and complex type
       auto zipout_end = thrust::copy_if(zipin_begin, zipin_end, d_vec.begin(), zipout_begin, pred);
    }
    
  • Remove remnants of `throw()`

    The throw() exception specification was removed in C++20 and will error out on us.

    So rather than that, simply use noexcept, as C++03 is thankfully a thing of the past.

    Fixes nvbug3799847

  • Add a `THRUST_HOST_DEVICE` macro

    clang-format is generally confused by our __host__ and __device__ macros, severely messing up formatting.

    We can replace them with a single macro and tell clang-format not to touch its formatting.

    While we are at it, we can add some other common macros to the list of StatementMacros.

  • Install rules: PATTERN thrust-header-search EXCLUDE doesn't behave as expected

    The PATTERN rules in CMake only match complete file names and don't do globbing, so the current install rules fail to exclude anything.

    What we need to do is use REGEX thrust-header-search.* EXCLUDE instead.
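    Sketched as an install rule, the suggested change might look like this (directory names are illustrative):

    ```cmake
    # REGEX matches against any portion of the path, unlike PATTERN, which
    # only matches complete file names, so this actually excludes the files.
    install(DIRECTORY thrust
            DESTINATION include
            REGEX "thrust-header-search.*" EXCLUDE)
    ```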

  • Better error message for no GPU or incompatible GPU

    Given this small but useless test program:

    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    int main() {
      thrust::device_vector<int> dv;
      thrust::sort(dv.begin(), dv.end());
    }
    

    When compiled with nvcc -arch=sm_80 tiny.cu and then run on a system that doesn't have any GPUs, the error message is:

    terminate called after throwing an instance of 'thrust::system::system_error'
      what():  radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal
    Aborted
    

    When run on a system with a Volta GPU (sm_70), it fails with:

    terminate called after throwing an instance of 'thrust::system::system_error'
      what():  radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function
    Aborted
    

    I don't expect either of those situations to work. The program should crash. What I would like is a better error message, one that gives a naive user some clue about what is happening. The first case should say something about no GPU being available, and the second should mention something about an incompatible GPU (saying that you are trying to run an sm_80 program on an sm_70 GPU would be awesome).

    Making such a change will improve the user experience and make it easier for users to troubleshoot problems.
