Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20

➵ robin_hood unordered map & set Release GitHub license

Travis CI Build Status Appveyor Build Status Codacy Badge Total alerts Language grade: C/C++ Coverage Status

robin_hood::unordered_map and robin_hood::unordered_set is a platform independent replacement for std::unordered_map / std::unordered_set which is both faster and more memory efficient for real-world use cases.

Installation & Usage

Direct Inclusion

  1. Add robin_hood.h to your C++ project.
  2. Use robin_hood::unordered_map instead of std::unordered_map
  3. Use robin_hood::unordered_set instead of std::unordered_set

Conan, the C/C++ Package Manager

  1. Setup your CMakeLists.txt (see Conan documentation on how to use MSBuild, Meson and others) like this:
    project(myproject CXX)
    
    add_executable(${PROJECT_NAME} main.cpp)
    
    include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake) # Include Conan-generated file
    conan_basic_setup(TARGETS) # Introduce Conan-generated targets
    
    target_link_libraries(${PROJECT_NAME} CONAN_PKG::robin-hood-hashing)
  2. Create conanfile.txt in your source dir (don't forget to update the version)
    [requires]
    robin-hood-hashing/3.10.0
    
    [generators]
    cmake
  3. Install and run Conan, then build your project as always:
    pip install conan
    mkdir build
    cd build
    conan install ../ --build=missing
    cmake ../
    cmake --build .
    The robin-hood-hashing package in Conan is kept up to date by Conan contributors. If the version is out of date, please create an issue or pull request on the conan-center-index repository.

Benchmarks

Please see extensive benchmarks at Hashmaps Benchmarks. In short: robin_hood is always among the fastest maps and uses far less memory than std::unordered_map.

Design Choices

  • Two memory layouts. Data is either stored in a flat array, or with node indirection. Access for unordered_flat_map is extremely fast due to no indirection, but references to elements are not stable. It also causes allocation spikes when the map resizes, and will need plenty of memory for large objects. Node based map has stable references & pointers (NOT iterators! Similar to std::unordered_map) and uses const Key in the pair. It is a bit slower due to indirection. The choice is yours; you can either use robin_hood::unordered_flat_map or robin_hood::unordered_node_map directly. If you use robin_hood::unordered_map It tries to choose the layout that seems appropriate for your data.

  • Custom allocator. Node based representation has a custom bulk allocator that tries to make few memory allocations. All allocated memory is reused, so there won't be any allocation spikes. It's very fast as well.

  • Optimized hash. robin_hood::hash has custom implementations for integer types and for std::string that are very fast and falls back to std::hash for everything else.

  • Depends on good Hashing. For a really bad hash the performance will not only degrade like in std::unordered_map, the map will simply fail with an std::overflow_error. In practice, when using the standard robin_hood::hash, I have never seen this happening.

License

Licensed under the MIT License. See the LICENSE file for details.

by martinus

Comments
  • Insert fails with std::overflow_error again

    Insert fails with std::overflow_error again

    Hello.

    I have encountered std::overflow_error again.

    I am using robin hood version 3.10.0.

    This time I cannot create test case. I hope you can give me guidelines to find source of problem. I am working with robin_hood::unordered_flat_set<int32_t>. My application is working with 200 containers and 10 million elements in total, so I cannot find good way to create test example. It takes about 5 minutes of number crunching before application crashes while trying to insert that specific integer into that specific container.

    My application is basically doing this:

    • Create 200 containers.
    • Do work with all containers (container::insert, container::erase, container::clear). This work takes about 5 minutes.
    • At some moment, application has to process some problematic container.
    • Problematic container contains some elements.
    • I know what are new elements that must be inserted into problematic container.
    • Problematic container is cleared before insertion.
    • New elements are inserted into problematic container one-by-one. Insertion fails with std::overflow_error after some number.

    Exception is not thrown if I recreate standalone test case with same elements.

    Thanks.

  • Support for Unique hash map?

    Support for Unique hash map?

    As discussed on https://github.com/CGAL/cgal/issues/4049 I am investigating using robin_hood::unordered_map for a unique hash map. A Unique hash map is a specialization on insertion only and unique hash values allow for a more time and space-efficient implementation. In the case of the implementation in CGAL the memory addresses of Handle objects are used for key-hashes and therefore by their very nature there cannot be duplicates/collisions.

    The CGAL implementation suffers from poor construction speed due to the default hardcoded table size of 512, and this is what I hope to improve by using robin_hood::unordered_map. However the insert speed of the CGAL::Unique_hash_map does seem to outperform by a factor of 0.2, as can be seen in my benchmark project https://github.com/GilesBathgate/hashmap-benchmarks

    What I'd like to establish is whether improvements can be made to either CGAL::Unique_hash_map that would facilitate faster construction, or whether robin_hood::* can be improved with the techniques used in the CGAL implementation.

    I realise that the main purpose of robin hood hashing is to handle the case when there are duplicate keys, so perhaps its not the best candidate. Are there any other faster hash map implementations that can be used for a unique hash map that would be more suitable?

  • robin_hood::map overflow

    robin_hood::map overflow

    Dear developer,

    I am using robin hood map (3.11.2) under MinGW. I got overflow error when insert some number as below log.

    ... Add 20686628 Add 20790466 Add 20950897 Add 20938052 Add 20481416 terminate called after throwing an instance of 'std::overflow_error' what(): robin_hood::map overflow

    I have attach reproduce file. Please check it. :)

    Thank you.

    test.zip

  • Consider porting to C++11 / C++03

    Consider porting to C++11 / C++03

    A large number of development environments still do not have C++14 support, especially in many embedded environments that only support C++03.

    Can you consider porting this powerful and efficient container to these older versions?

    Thanks :-)

  • support remove/erase idiom

    support remove/erase idiom

    I noticed when applying the remove/erase idiom that the following method is not implemented:

    iterator erase ( const_iterator first, const_iterator last );

    Reference: https://en.cppreference.com/w/cpp/container/unordered_map/erase Example: cache.erase( std::remove_if( cache.begin(), cache.end(), [&] ( const pair<int,int>& i ) { return i.second == 1 ); } ), cache.end() );

  • Fix CRC32 support on Windows ARM64 and use runtime detection

    Fix CRC32 support on Windows ARM64 and use runtime detection

    __cpuid isn't present on ARM64 builds in Windows, and CRC32 is always present for ARM64 processors that Windows 10 supports (making runtime detection unnecessary).

    Also, remove the hardware support detection at compile time. You've already added runtime detection (the more flexible method), so use that.

  • Fast CRC32 based hash for hash_bytes and hash_int

    Fast CRC32 based hash for hash_bytes and hash_int

    • [x] CRC with runtime dispatch
    • [x] Use _mm_crc32_u64
    • [x] Don't use upper bits of the hash, only use lower bits. Then a single crc command would be enough
    • [ ] Implement optimized hash_bytes that makes use of CRC32 instruction
  • Not using intrinsics

    Not using intrinsics

    What was the reason for reverting https://github.com/martinus/robin-hood-hashing/commit/f45a309af7d08cb98fc08dc7903ac9a7091db6d5 ? Any reliability concerns or just performance?

    I'm asking because, on certain contexts, including x86intrin.h on Mingw-w64 triggers a bug on g++.

  • Crashing with overflow_error

    Crashing with overflow_error

    Hello,

    I am currently trying to integrate robin hood hashing into my project. As mentioned on your github, I simply included the robin_hood.h and used robin_hood::unordered_map instead of std::unordered_map.

    However, upon start, my program crashes and reports the following error:

    terminate called after throwing an instance of 'std::overflow_error' what(): robin_hood::map overflow Command terminated by signal 6

    Do you have any idea what could cause this?

    Thanks!

    Cheers, Pierre

  • Please add C++ 17 node API to node based map

    Please add C++ 17 node API to node based map

    For node-based maps, these functions can enable writing much more efficient code:

    https://en.cppreference.com/w/cpp/container/unordered_map/extract

    Indeed, I am not using your map in a number of places precisely due to me using this node ptr API.

  • it = map.erase(it) potentially passes object twice

    it = map.erase(it) potentially passes object twice

    It seems @jcageman has stumbled uppon an issue that probably all robin-hood map with backward shift deletion have (see #18). Here is a simple reproducer

    Insert two elements, where both hash to the last bucket of the map. One entry will be the last one of the map, the other one will wrap around and use the first bucket.

    robin_hood::unordered_node_map<int, int, DummyHash> map;
    map[1024] = 1;
    map[2048] = 2;
    
    it = map.begin(); // it points to 2048, the first element which has wrapped around
    ++it;  // it points to the last element, 1024 which is in its original bucket
    it = map.erase(it); // backward shift deletion removes 1024 and wraps back 2048, so it now (again!) points to 2048 and NOT end
    it = map.erase(it); // finally we are at the end.
    

    Also see https://github.com/Tessil/robin-map/issues/20

  • undefined symbol: std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::operator<<<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)

    undefined symbol: std::__1::basic_ostream >& std::__1::operator<<, std::__1::allocator >(std::__1::basic_ostream >&, std::__1::basic_string, std::__1::allocator > const&)

    ld: error: undefined symbol: std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::operator<<<char, std::__1::char_traits<char>, std::__1::allocator<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
    >>> referenced by doctest.h:862 (/disk-samsung/freebsd-ports/devel/robin-hood-hashing/work/robin-hood-hashing-3.11.5/src/test/thirdparty/doctest/doctest.h:862)
    >>>               CMakeFiles/rh.dir/src/test/unit/unit_insert_or_assign.cpp.o:(doctest::String doctest::detail::stringifyBinaryExpr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, char [3]>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*, char const (&) [3]))
    >>> referenced by doctest.h:862 (/disk-samsung/freebsd-ports/devel/robin-hood-hashing/work/robin-hood-hashing-3.11.5/src/test/thirdparty/doctest/doctest.h:862)
    >>>               CMakeFiles/rh.dir/src/test/unit/unit_no_intrinsics.cpp.o:(doctest::String doctest::detail::stringifyBinaryExpr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, char [5]>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*, char const (&) [5]))
    c++: error: linker command failed with exit code 1 (use -v to see invocation)
    ninja: build stopped: subcommand failed.
    

    clang-14 FreeBSD 13.1

  • robin_hoodConfig.cmake missing

    robin_hoodConfig.cmake missing

    I'm trying to make Vulkan Validation layers and the make required robin hood but after using conan to install it and use it, I was expecting there to be a robin_hoodConfig.cmake or robin_hood-config.cmake. Everything works except the robin hood.

    I'm using windows 10, Vulkan 1.3.211.0, cmake 3.10.2

    command to make the validation layers:

    cmake -A x64 -DVULKAN_HEADERS_INSTALL_DIR=C:\libraries\VulkanSDK\1.3.211.0 -DGLSLANG_INSTALL_DIR=C:\libraries\VulkanSDK\1.3.211.0 -DSPIRV_HEADERS_INSTALL_DIR=C:\libraries\VulkanSDK\1.3.211.0 -DSPIRV_TOOLS_INSTALL_DIR=C:\libraries\VulkanSDK\1.3.211.0 -DROBIN_HOOD_HASHING_INSTALL_DIR=C:\robin-hood-hashing-master ..

    output from command:

    -- Selecting Windows SDK version 10.0.22000.0 to target Windows 10.0.19042.


    • NOTE: Not adding target to run update_deps.py automatically. *

    CMake Error at CMakeLists.txt:254 (find_package): Could not find a package configuration file provided by "robin_hood" with any of the following names:

    robin_hoodConfig.cmake
    robin_hood-config.cmake
    

    Add the installation prefix of "robin_hood" to CMAKE_PREFIX_PATH or set "robin_hood_DIR" to a directory containing one of the above files. If "robin_hood" provides a separate development package or SDK, be sure it has been installed.

    -- Configuring incomplete, errors occurred! See also "C:/Vulkan-ValidationLayers-master/build/CMakeFiles/CMakeOutput.log".

  • Idea for robin-hood-hashing V2

    Idea for robin-hood-hashing V2

    • Performance is important but not everything

    • Use an std::vector<std::pair<Key, Value>> (maybe customizable) for the content that is always 100% compact. iteration is perfectly fast.

    • Use an indexing structure into that vector, that cannot overflow, e.g. like so:

      • 3 byte offset from original bucket => 2^24-1 = 16777215 collisions possible, should be plenty
      • 1 byte hash
      • 4 byte index into std::vector
    • => limitations are: no more than 16M collisions (so no overflow check necessary?)

    • 4 byte index, so no more than 4 billion elements are allowed. Should be enough for most use cases.

      • 32bit hash would be enough for indexing. Convert mumx to 32bit would work
      • We could use non-power-of-two-hashmap sizes. Given a well distributed 32bit hash, use idx = ((uint64_t)hash * (uint64_t)mapsize) >> 32
    • 1 byte hash: only every 256th key needs to be checked, that's good enough

    • ordering like robin-hood-hash, we we only need a single < comparison

    • how to deal with capacity() changes / rehash? (resize to capacity + 1, then resize to the new capacity(). This sucks.)

  • robin_hood.h: Change hash val shifts for `info` and `idx` generation

    robin_hood.h: Change hash val shifts for `info` and `idx` generation

    Instead of generating the info hash bits with a mask + shift, we can just use an unsigned shift from the end as that will bring in zeros.

    With that change the idx hash bits can start from the begining so we no longer need to shift out the info hash bits.

    Saves 2x instructions. Only ALU but this function is in the critical path.

    There does not appear to be any issue with false hits changing the tag bits used.

    False Tag Match Rates:

                      :   New,   Cur
    

    Insert Seq Ints: 2^10 : 2.539, 2.832 2^12 : 3.076, 3.003 2^14 : 2.960, 3.497 2^16 : 4.048, 4.306 2^18 : 4.543, 4.732 2^20 : 5.182, 5.553 2^22 : 5.579, 5.557 2^24 : 6.237, 6.017 2^26 : 6.599, 6.432

    Insert Rand Ints (seed = 0): 2^10 : 2.637, 4.199 2^12 : 2.979, 3.613 2^14 : 3.662, 3.955 2^16 : 4.309, 4.849 2^18 : 4.626, 4.564 2^20 : 4.849, 5.044 2^22 : 6.050, 5.279 2^24 : 5.664, 6.350 2^26 : 6.640, 6.771

    Insert Strings: All Words(0) : 4.974, 5.231 10k Words(1) : 3.550, 4.000 URLS(2) : 5.548, 5.683 URLS(2) w/ "http://" : 5.466, 5.831 URLS(2) w/ "https://" : 5.569, 5.761

C++ implementation of a fast hash map and hash set using hopscotch hashing

A C++ implementation of a fast hash map and hash set using hopscotch hashing The hopscotch-map library is a C++ implementation of a fast hash map and

Aug 9, 2022
A repository I'm using to learn hashing with GLib.

GLib Tests Description A repository I'm using to store my progress while learning GNOME's GLib library. Specifically hashing via ghash. Test Files GLi

May 31, 2022
A fast and efficient non-iterating hashmap library

libhash: a fast and efficient non-iterating hashmap library Libhash is a fast and efficient non-iterating hashmap library Usage Usage is easy and simp

May 29, 2022
C++ library to create dynamic structures in plain memory of shared-memory segments

Ищи описание на хабре @mrlolthe1st. #define _CRT_SECURE_NO_WARNINGS #include "shared_structures.h" #include <iostream> #include <fstream> #include <ca

Mar 30, 2022
A family of header-only, very fast and memory-friendly hashmap and btree containers.
A family of header-only, very fast and memory-friendly hashmap and btree containers.

The Parallel Hashmap Overview This repository aims to provide a set of excellent hash map implementations, as well as a btree alternative to std::map

Aug 6, 2022
Algo-Tree is a collection of Algorithms and data structures which are fundamentals to efficient code and good software design
Algo-Tree is a collection of Algorithms and data structures which are fundamentals to efficient code and good software design

Algo-Tree is a collection of Algorithms and data structures which are fundamentals to efficient code and good software design. Creating and designing excellent algorithms is required for being an exemplary programmer. It contains solutions in various languages such as C++, Python and Java.

Jul 24, 2022
A simple and efficient c++ KDTree implementation.

A simple and efficient c++ KD-Tree implementation. This code was written when I was learning c++11.

May 11, 2022
This is a curve topology verification tool based on Fast Linking Numbers for Topology Verification of Loopy Structures.
This is a curve topology verification tool based on Fast Linking Numbers for Topology Verification of Loopy Structures.

Fast Linking Numbers This tool, called verifycurves, takes input models that consist of closed-loop curves, and outputs a topology certificate as a .t

Jan 26, 2022
Cross-platform process memory inspector
Cross-platform process memory inspector

Catsight Cross-platform process memory viewer inspired by x64dbg. Features Cross-platform (currently runs on Linux and Windows). Attach to any process

Jul 27, 2022
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved

Aug 4, 2022
A GREAT program to fuck your memory or swap

Let everyone enjoy the fun of fucking -- Chi_Tang FuckMemory This is a GREAT program to fuck your memory or Swap Installation Dependencies make g++ Li

Mar 3, 2022
HashTableBenchmark - A simple cross-platform speed & memory-efficiency benchmark for the most common hash-table implementations in the C++ world

Hash-Tables Benchmarks This repository contains a bunch of extendable benchmarks mostly for following containers: std:unordered_map from STL. google::

Jul 11, 2022
A fast hash map/hash table (whatever you want to call it) for the C programming language.

C HashMap A fast hash map/hash table (whatever you want to call it) for the C programming language. It can associate a key with a pointer or integer v

Jul 19, 2022
Simple C++ code to benchmark fast division algorithms

fast_division Simple C++ code to benchmark fast division algorithms relying on constant divisors. The code is a companion to the paper Integer Divisio

Jul 31, 2022
This project Orchid-Fst implements a fast text string dictionary search data structure: Finite state transducer (short for FST) in c++ language.This FST C++ open source project has much significant advantages.
This project Orchid-Fst implements a fast text string dictionary search data structure: Finite state transducer (short for FST) in c++ language.This FST C++ open source project has much significant advantages.

Orchid-Fst 1. Project Overview This project Orchid-Fst implements a fast text string dictionary search data structure: Finite state transducer , which

Jul 25, 2022
Another neofetch-like utility but this time it's fast.

SystemFetch Another neofetch-like utility but this time it's fast. Example Speed Here is a table of the time it took to execute all of these programs,

Jul 22, 2021
merge two sorted lists fast

Py Merge Merge sorted list faster than using list.sort or heapq.merge. import merge # create some sorted lists a = list(range(-100, 1700)) b = list(r

Nov 21, 2021
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

moodycamel::ConcurrentQueue An industrial-strength lock-free queue for C++. Note: If all you need is a single-producer, single-consumer queue, I have

Aug 8, 2022