A fast, memory efficient hash map for C++

I now recommend using the parallel hashmap instead of sparsepp, unless you are stuck with a compiler that does not support C++11, or if even slightly higher memory usage is not acceptable. I will personally switch to using the parallel hashmap, which I believe offers a significantly better tradeoff (slightly higher memory usage, much faster).

Sparsepp: A fast, memory efficient hash map for C++

Sparsepp is derived from Google's excellent sparsehash implementation. It aims to achieve the following objectives:

  • A drop-in alternative for unordered_map and unordered_set.
  • Extremely low memory usage (typically about one byte overhead per entry), and most importantly very small memory overhead when resizing.
  • Very efficient, typically faster than your compiler's unordered map/set or Boost's.
  • C++11 support (if supported by compiler).
  • Single header: not anymore; the implementation now spans a few headers (such as spp.h and spp_utils.h), but the library remains header-only.
  • Tested on Windows (vs2010-2015, g++), Linux (g++, clang++) and macOS (clang++).

We believe Sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's dense_hash_map is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance, and insertions cause a resizing).

This hash map, like Google's dense_hash_map, uses open addressing (meaning that the hash table entries are conceptually stored in a big array, and collisions are resolved not with linked lists, but by probing). A big drawback of open addressing is that when the array needs to grow, the memory usage high-water mark is typically about three times the current map size (a new array twice as big as the current one is allocated, and the old array is copied into it before being freed). Sparsepp's implementation resolves this memory usage issue: the high-water mark shows only a small bump when resizing.
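
When the final size of the map is known (or can be estimated) in advance, the resize peak can also be avoided entirely by pre-sizing the table. A minimal sketch using the standard reserve() call (which sparsepp exposes as part of its unordered_map-style interface):

#include <sparsepp/spp.h>

int main()
{
    spp::sparse_hash_map<int, int> m;

    // Reserve room for the expected number of entries up front,
    // so no rehashing (and no temporary second array) happens during the fill.
    m.reserve(1000000);

    for (int i = 0; i < 1000000; ++i)
        m[i] = 2 * i;

    return 0;
}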

For a detailed comparison of various hash implementations, including Sparsepp, please see our write-up.

Example

#include <iostream>
#include <string>
#include <sparsepp/spp.h>

using spp::sparse_hash_map;
 
int main()
{
    // Create a sparse_hash_map of three strings (that map to strings)
    sparse_hash_map<std::string, std::string> email = 
    {
        { "tom",  "[email protected]"},
        { "jeff", "[email protected]"},
        { "jim",  "[email protected]"}
    };
 
    // Iterate and print keys and values 
    for (const auto& n : email) 
        std::cout << n.first << "'s email is: " << n.second << "\n";
 
    // Add a new entry
    email["bill"] = "[email protected]";
 
    // and print it
    std::cout << "bill's email is: " << email["bill"] << "\n";
 
    return 0;
}

Installation

No compilation is needed, as this is a header-only library. Installation consists of copying the sparsepp directory wherever it is convenient to include from your project(s).

Warning - iterator and reference invalidation on erase/insert

  1. erasing elements is likely to invalidate iterators (for example when calling erase())

  2. inserting new elements is likely to invalidate iterators (iterator invalidation can also happen with std::unordered_map if rehashing occurs due to the insertion)

  3. references to values stored in a sparse_hash_map or set are likely to be invalidated on insert()/erase(); this is not the case for std::unordered_map or set (see the sketch below)
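
The sketch below illustrates the third point: code that holds a reference across an insertion is fine with std::unordered_map but unsafe with sparsepp (the keys and values are made up for illustration):

#include <sparsepp/spp.h>
#include <string>

int main()
{
    spp::sparse_hash_map<std::string, int> m = { { "a", 1 }, { "b", 2 } };

    // Unsafe with sparsepp: the reference may dangle after the insertion,
    // because inserting can move the stored values.
    // int& ref = m["a"];
    // m["c"] = 3;
    // int x = ref;        // ok with std::unordered_map, undefined with sparsepp

    // Safe: take a copy (or redo the lookup after the insertion).
    int a_value = m["a"];
    m["c"] = 3;
    return a_value == m["a"] ? 0 : 1;
}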

Usage

As shown in the example above, you need to include the header file: #include <sparsepp/spp.h>

This provides the implementation for the following classes:

namespace spp
{
    template <class Key, 
              class T,
              class HashFcn  = spp_hash<Key>,  
              class EqualKey = std::equal_to<Key>,
              class Alloc    = libc_allocator_with_realloc<std::pair<const Key, T>>>
    class sparse_hash_map;

    template <class Value,
              class HashFcn  = spp_hash<Value>,
              class EqualKey = std::equal_to<Value>,
              class Alloc    = libc_allocator_with_realloc<Value>>
    class sparse_hash_set;
}

These classes provide the same interface as std::unordered_map and std::unordered_set, with the following differences:

  • Calls to erase() may invalidate iterators. However, conformant to the C++11 standard, the position and range erase functions return an iterator pointing to the position immediately following the last of the elements erased. This makes it easy to traverse a sparse hash table and delete elements matching a condition. For example to delete odd values:

    for (auto it = c.begin(); it != c.end(); )
        if (it->first % 2 == 1)
           it = c.erase(it);
        else
           ++it;

    As for std::unordered_map, the order of the elements that are not erased is preserved.

  • Since items are not grouped into buckets, the bucket APIs have been adapted: max_bucket_count is equivalent to max_size, and bucket_count returns the sparsetable size, which is normally at least twice the number of items inserted into the hash_map.

  • Values inserted into sparsepp have to be either copyable and movable, or movable only (see the sketch below and the example movable.cc).
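
    A minimal sketch of the movable-only case (the Blob type below is made up for illustration; movable.cc in the distribution is the authoritative example):

    #include <sparsepp/spp.h>
    #include <string>
    #include <utility>

    // A move-only value type: copy operations deleted, move operations defaulted.
    struct Blob
    {
        explicit Blob(std::string d) : data(std::move(d)) {}
        Blob(Blob&&)                 = default;
        Blob& operator=(Blob&&)      = default;
        Blob(const Blob&)            = delete;
        Blob& operator=(const Blob&) = delete;

        std::string data;
    };

    int main()
    {
        spp::sparse_hash_map<int, Blob> m;
        m.insert(std::make_pair(1, Blob("one"))); // the value is moved into the map
        m.emplace(2, Blob("two"));                // constructed in place from an rvalue
        return 0;
    }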

Memory allocator on Windows (when building with Visual Studio)

When building with the Microsoft compiler, we provide a custom allocator because the default one (from the Visual C++ runtime) fragments memory when reallocating.

This is desirable only when creating large sparsepp hash maps. If you create lots of small hash_maps, memory usage may increase instead of decrease, because the custom memory allocator creates a separate memory space to allocate from (typically 4K) for each hash_map instance, which can be a big waste if only a few items are allocated.

In order to use the custom spp allocator, define the following preprocessor variable before including <sparsepp/spp.h>:

#define SPP_USE_SPP_ALLOC 1
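
For example (the define must precede the sparsepp include, and it is safest to define it consistently across the translation units that use it):

// select the custom spp allocator (Visual Studio builds)
#define SPP_USE_SPP_ALLOC 1
#include <sparsepp/spp.h>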

Integer keys and other hash function considerations

  1. For basic integer types, sparsepp provides a default hash function which does some mixing of the bits of the keys (see Integer Hashing). This prevents a pathological case where inserted keys are sequential (1, 2, 3, 4, ...), and the lookup on non-present keys becomes very slow.

    Of course, the user of sparsepp may provide their own hash function, as shown below:

    #include <sparsepp/spp.h>
    
    struct Hash64 {
        size_t operator()(uint64_t k) const { return (k ^ 14695981039346656037ULL) * 1099511628211ULL; }
    };
    
    struct Hash32 {
        size_t operator()(uint32_t k) const { return (k ^ 2166136261U)  * 16777619UL; }
    };
    
    int main() 
    {
        spp::sparse_hash_map<uint64_t, double, Hash64> map;
        ...
    }
    
  2. When users provide their own hash function, for example when inserting custom classes into a hash map, the resulting hash keys sometimes have similar low-order bits and cause many collisions, decreasing the efficiency of the hash map. To address this use case, sparsepp provides an optional 'mixing' of the hash key (see Integer Hash Function), which can be enabled by defining the preprocessor macro SPP_MIX_HASH (see the snippet below).
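
    For example:

    // enable sparsepp's extra mixing of user-provided hash values;
    // like other configuration macros, define it before including the header
    #define SPP_MIX_HASH 1
    #include <sparsepp/spp.h>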

Example 2 - providing a hash function for a user-defined class

In order to use a sparse_hash_set or sparse_hash_map, a hash function should be provided. Even though the hash function can be provided via the HashFcn template parameter, we recommend injecting a specialization of std::hash for the class into the "std" namespace. For example:

#include <iostream>
#include <functional>
#include <string>
#include <sparsepp/spp.h>

using std::string;

struct Person 
{
    bool operator==(const Person &o) const 
    { return _first == o._first && _last == o._last; }

    string _first;
    string _last;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<> 
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            return seed;
        }
    };
}
 
int main()
{
    // As we have defined a specialization of std::hash() for Person, 
    // we can now create sparse_hash_set or sparse_hash_map of Persons
    // ----------------------------------------------------------------
    spp::sparse_hash_set<Person> persons = { { "John", "Galt" }, 
                                             { "Jane", "Doe" } };
    for (auto& p: persons)
        std::cout << p._first << ' ' << p._last << '\n';
}

The std::hash specialization for Person combines the hash values for both first and last name using the convenient spp::hash_combine function, and returns the combined hash value.

spp::hash_combine is provided by the header sparsepp/spp.h. However, class definitions often appear in header files, and it is desirable to limit the size of headers included in such header files, so we provide the very small header sparsepp/spp_utils.h for that purpose:

#include <string>
#include <sparsepp/spp_utils.h>

using std::string;
 
struct Person 
{
    bool operator==(const Person &o) const 
    { 
        return _first == o._first && _last == o._last && _age == o._age; 
    }

    string _first;
    string _last;
    int    _age;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<> 
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            spp::hash_combine(seed, p._age);
            return seed;
        }
    };
}

Example 3 - serialization

sparse_hash_set and sparse_hash_map can easily be serialized/unserialized to a file or network connection. This support is implemented in the following APIs:

    template <typename Serializer, typename OUTPUT>
    bool serialize(Serializer serializer, OUTPUT *stream);

    template <typename Serializer, typename INPUT>
    bool unserialize(Serializer serializer, INPUT *stream);

The following example demonstrates how a simple sparse_hash_map can be written to a file and then read back. The serializer we use reads from and writes to a file using the stdio APIs, but it would be equally simple to write a serializer using the stream APIs (a sketch of such a serializer follows the example):

#include <cstdio>

#include <sparsepp/spp.h>

using spp::sparse_hash_map;
using namespace std;

class FileSerializer 
{
public:
    // serialize basic types to FILE
    // -----------------------------
    template <class T>
    bool operator()(FILE *fp, const T& value) 
    {
        return fwrite((const void *)&value, sizeof(value), 1, fp) == 1;
    }

    template <class T>
    bool operator()(FILE *fp, T* value) 
    {
        return fread((void *)value, sizeof(*value), 1, fp) == 1;
    }

    // serialize std::string to FILE
    // -----------------------------
    bool operator()(FILE *fp, const string& value) 
    {
        const size_t size = value.size();
        return (*this)(fp, size) && fwrite(value.c_str(), size, 1, fp) == 1;
    }

    bool operator()(FILE *fp, string* value) 
    {
        size_t size;
        if (!(*this)(fp, &size)) 
            return false;
        char* buf = new char[size];
        if (fread(buf, size, 1, fp) != 1) 
        {
            delete [] buf;
            return false;
        }
        new (value) string(buf, (size_t)size);
        delete[] buf;
        return true;
    }

    // serialize std::pair<const A, B> to FILE - needed for maps
    // ---------------------------------------------------------
    template <class A, class B>
    bool operator()(FILE *fp, const std::pair<const A, B>& value)
    {
        return (*this)(fp, value.first) && (*this)(fp, value.second);
    }

    template <class A, class B>
    bool operator()(FILE *fp, std::pair<const A, B> *value) 
    {
        return (*this)(fp, (A *)&value->first) && (*this)(fp, &value->second);
    }
};

int main(int argc, char* argv[]) 
{
    sparse_hash_map<string, int> age{ { "John", 12 }, {"Jane", 13 }, { "Fred", 8 } };

    // serialize age hash_map to "ages.dmp" file
    FILE *out = fopen("ages.dmp", "wb");
    age.serialize(FileSerializer(), out);
    fclose(out);

    sparse_hash_map<string, int> age_read;

    // read from "ages.dmp" file into age_read hash_map 
    FILE *input = fopen("ages.dmp", "rb");
    age_read.unserialize(FileSerializer(), input);
    fclose(input);

    // print out contents of age_read to verify correct serialization
    for (auto& v : age_read)
        printf("age_read: %s -> %d\n", v.first.c_str(), v.second);
}
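
As noted above, a serializer targeting the iostream APIs looks very similar. The sketch below is not part of the sparsepp distribution; it simply mirrors FileSerializer under the same serializer concept, using std::ostream/std::istream in place of FILE:

#include <cstddef>
#include <istream>
#include <new>
#include <ostream>
#include <string>
#include <utility>

class StreamSerializer
{
public:
    // serialize basic types to std::ostream / from std::istream
    // ----------------------------------------------------------
    template <class T>
    bool operator()(std::ostream *os, const T& value)
    {
        os->write(reinterpret_cast<const char *>(&value), sizeof(value));
        return os->good();
    }

    template <class T>
    bool operator()(std::istream *is, T* value)
    {
        is->read(reinterpret_cast<char *>(value), sizeof(*value));
        return is->good();
    }

    // serialize std::string
    // ---------------------
    bool operator()(std::ostream *os, const std::string& value)
    {
        const size_t size = value.size();
        if (!(*this)(os, size))
            return false;
        os->write(value.data(), (std::streamsize)size);
        return os->good();
    }

    bool operator()(std::istream *is, std::string* value)
    {
        size_t size;
        if (!(*this)(is, &size))
            return false;
        std::string tmp(size, '\0');
        if (size && !is->read(&tmp[0], (std::streamsize)size))
            return false;
        new (value) std::string(std::move(tmp)); // construct in place, as FileSerializer does
        return true;
    }

    // serialize std::pair<const A, B> - needed for maps
    // -------------------------------------------------
    template <class A, class B>
    bool operator()(std::ostream *os, const std::pair<const A, B>& value)
    {
        return (*this)(os, value.first) && (*this)(os, value.second);
    }

    template <class A, class B>
    bool operator()(std::istream *is, std::pair<const A, B> *value)
    {
        return (*this)(is, (A *)&value->first) && (*this)(is, &value->second);
    }
};

It would then be used the same way as FileSerializer, passing a pointer to a std::ofstream or std::ifstream opened in binary mode, e.g. age.serialize(StreamSerializer(), &out).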

Thread safety

Sparsepp follows the thread safety rules of the Standard C++ library. In particular:

  • A single sparsepp hash table is thread safe for reading from multiple threads. For example, given a hash table A, it is safe to read A from thread 1 and from thread 2 simultaneously.

  • If a single hash table is being written to by one thread, then all reads and writes to that hash table on the same or other threads must be protected. For example, given a hash table A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A (see the sketch below).

  • It is safe to read and write to one instance of a type even if another thread is reading or writing to a different instance of the same type. For example, given hash tables A and B of the same type, it is safe if A is being written in thread 1 and B is being read in thread 2.
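
As an illustration of the second rule, here is a minimal sketch (with made-up names; any scheme that keeps writes exclusive works) where one thread writes to a table while another reads from it, so both sides take a lock:

#include <functional>
#include <mutex>
#include <thread>
#include <sparsepp/spp.h>

spp::sparse_hash_map<int, int> table;        // shared between threads
std::mutex                     table_mutex;  // guards all access while writers exist

void writer()
{
    for (int i = 0; i < 1000; ++i)
    {
        std::lock_guard<std::mutex> lock(table_mutex); // writes must be protected
        table[i] = i;
    }
}

void reader(int& sum)
{
    for (int i = 0; i < 1000; ++i)
    {
        std::lock_guard<std::mutex> lock(table_mutex); // reads must also be protected
        auto it = table.find(i);                       // as long as any thread writes
        if (it != table.end())
            sum += it->second;
    }
}

int main()
{
    int sum = 0;
    std::thread t1(writer);
    std::thread t2(reader, std::ref(sum));
    t1.join();
    t2.join();
    return 0;
}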

Owner
Gregory Popovitch
C++ developer at Altair Engineering.
Comments
  • Memory error when calling unserialize on file-scope sparse_hash_map.

    Hi Greg,

    Sorry to bother you with this, but I'm trying to bisect a bug that coincides with my conversion to sparsepp. Strangely, I only see a bug when running on my Mac (i.e. never on a linux system), but the bug also only presents when using sparsepp (config details at the bottom). Is the sparse_hash_map safe to read simultaneously from many threads? It's not the case that iterators are e.g. mutating state as they are searching, right? I'd guess not, and I'll continue looking for the bug elsewhere, but I just wanted to make sure.

    Thanks! Rob

    details of compiler (also built using -stdlib=libc++).

    Apple LLVM version 7.3.0 (clang-703.0.31)
    Target: x86_64-apple-darwin15.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    
  • Possible iterator bug?

    I'm replacing some usage of std::unordered_map with sparse_hash_map and I've encountered an occasional issue with iterators.

    map_t::const_iterator iter = map.find(keyA); // after this line iter contains a good value
    map[keyB] = (*iter).second; // after this line iter contains a garbage value
    

    With this code, keyB will occasionally map to some garbage values (0xdddddddd, indicating deleted memory), and not the same value as keyA, which is what I would expect. The same code works fine with std::unordered_map.

    If I store keyA's value in a temp variable everything works as expected:

    map_t::const_iterator iter = map.find(keyA);
    tempVal = (*iter).second;
    map[keyB] = tempVal;
    

    (Unfortunately, I can't reproduce this issue with a simple test case, only our internal code.)

    Using the temp variable is fine for my usage, just curious if what I'm seeing is a bug, or expected behaviour. I saw in the readme that calls to erase() can invalidate iterators. Is the same true with insertions as well?

  • Data race with aggressive and moderate configuration for SPP_ALLOC_SZ

    So, I started using this hash map, and a few days after I pushed the code, random crashes started happening on our tests.

    Apparently, there is a data race inside the function:

    static uint32_t _sizing(uint32_t n)
    {
    #if !defined(SPP_ALLOC_SZ) || (SPP_ALLOC_SZ == 0)                                                                       
            // aggressive allocation first, then decreasing as sparsegroups fill up                                         
            // --------------------------------------------------------------------                                         
            static uint8_t s_alloc_batch_sz[SPP_GROUP_SIZE] = { 0 };                                           
            if (!s_alloc_batch_sz[0])                                               
    

    s_alloc_batch_sz can be accessed from multiple threads. Helgrind immediately complains about using different maps in different threads, so it's not hard to reproduce.

    For my code, I changed the flag to SPP_ALLOC_SZ 1, since I was in this for memory anyway.

    I also tested with adding thread_local to the static, and it fixes the issue. But I wouldn't go with thread local, since I don't want these few extra bytes in the thread stacks of my application.

  • Assertion Failure in erase()

    I tried replacing a usage of gnu_cxx::hash_map in our code with sparsepp::sparse_hash_map, but I get an assertion failure in erase(): sparsepp/sparsepp.h:2824: bool spp::sparsegroup<T, Alloc>::erase_ne(Alloc&, twod_iter&) [with twod_iter = spp::Two_d_iterator<std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t>, spp::sparsegroup<std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t>, spp::libc_allocator_with_realloc<std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t> > >, std::pair<const croix::hashed_squish_bits_t, croix::subwind_map_value_t>, std::bidirectional_iterator_tag>, T = std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t>, Alloc = spp::libc_allocator_with_realloc<std::pair<croix::hashed_squish_bits_t, croix::subwind_map_value_t> >]: Assertion `_group && it.col_current != ne_end()' failed.

    Most of my test cases work, but a few of them fail in this way. Here I'm inserting thousands of large objects (several hundred bytes each) into the hash_map, then creating a vector of iterators sorted by a custom sort function, then erasing them one-by-one to incrementally free the memory. This works correctly for all of my test cases when using gnu_cxx::hash_map, unordered_map, google::sparse_hash_map, and google::dense_hash_map. It only fails with sparsepp. Also, I'm using the latest version of sparsepp, and the sparsepp unit tests pass on my machine.

    What is this assertion meant to test? It's not clear whether this is a sparsepp bug or if I'm doing something illegal in my code.

  • unexpected memory usage after unserialization

    Hi,

    sparsepp is really great!

    I create a <uint64_t, string> map. In my test code, about 178M <uint64_t, string> pairs are inserted into the map, and then I look them up again. While looking them up, the memory usage is ~4.995G.

    I then add serialization and unserialization code (copied from your example 3) between the insertion and look-up parts. During serialization, the memory usage is still ~4.995G. However, after unserialization, the memory usage is ~12.024G. The serialized file is ~4.4G.

    Do you have any idea about this? My computing scenario is building map once and looking-up many times. Thank you!

  • cannot serialize and deserialize map<int,vector<string>> in c++

    When I try to insert values into a map<int, vector<string>> and serialize it into a .dmp file, there is no error. But when I try to deserialize the .dmp file back into a map, I get a segmentation fault. Can you please help me?

  • When hashing on vector

    I want to store unique vectors, so I use spp::sparse_hash_set<vector<int>>. Here is my hash function for vector<int>:

    namespace std
    {
        template<>
        struct hash<vector<int>>
        {
            std::size_t operator()(std::vector<int> const& vec) const
            {
                std::size_t seed = vec.size();
                for (auto& i : vec)
                    seed ^= i + 0x9e3779b9 + (seed << 6) + (seed >> 2);
                return seed;
            }
        };
    }

    Here is my test function:

    void test_hash()
    {
        std::vector<int> a = {2,3,4};
        std::vector<int> b = {4,5,6};
        std::vector<int> c = {2,3,4};
        std::vector<int> d = {1,2,3};

        spp::sparse_hash_set<std::vector<int>> s;
        std::unordered_set<vector<int>> u;

        s.insert(a);
        s.insert(b);

        u.insert(a);
        u.insert(b);

        if (s.find(a) != s.end()) {
            cout << "s has a" << endl;
        }
        if (u.find(a) != u.end()) {
            cout << "u has a" << endl;
        }
    }

    Here is the output: u has a

    So the unordered_map can output the correct answer, but sparse_hash_set cannot. What's the problem? By the way, I really enjoy the efficiency of sparse_hash_set.

  • Performance issue

    Hi,

    Seeing performance issue with finds for hash map <int, int> when inserting keys in order, either ascending or descending:

    Testing sparse map
    2000000 random inserts in 1 seconds
    2000000 random finds in 0 seconds
    2000000 random not-finds in 0 seconds
    2000000 sequential inserts in 0 seconds
    2000000 sequential finds in 0 seconds
    2000000 random not-finds in 19 seconds
    2000000 negative sequential inserts in 1 seconds
    2000000 negative sequential finds in 0 seconds
    2000000 random not-finds in 17 seconds

    Testing STL map
    2000000 random inserts in 2 seconds
    2000000 random finds in 1 seconds
    2000000 random not-finds in 1 seconds
    2000000 sequential inserts in 1 seconds
    2000000 sequential finds in 0 seconds
    2000000 random not-finds in 0 seconds
    2000000 negative sequential inserts in 1 seconds
    2000000 negative sequential finds in 0 seconds
    2000000 random not-finds in 0 seconds

    Code is as follows:

    void testSparseMap(int count)
    {
        spp::sparse_hash_map<int, int> s;
        time_t t;

        printf("Testing sparse map\n");
        t = time(NULL); srand(0);
        for (int i = 0; i < count; ++i) { s.insert(make_pair(rand(), i)); }
        printf("%d random inserts in %ld seconds\n", count, time(NULL) - t);

        t = time(NULL); srand(0);
        for (int i = 0; i < count; ++i) { s.find(rand()); }
        printf("%d random finds in %ld seconds\n", count, time(NULL) - t);

        t = time(NULL); srand(1);
        for (int i = 0; i < count; ++i) { s.find(rand()); }
        printf("%d random not-finds in %ld seconds\n", count, time(NULL) - t);

        s.clear(); s.reserve(0);
        t = time(NULL); srand(0);
        for (int i = 0; i < count; ++i) { s.insert(make_pair(i, i)); }
        printf("%d sequential inserts in %ld seconds\n", count, time(NULL) - t);

        t = time(NULL); srand(0);
        for (int i = 0; i < count; ++i) { s.find(i); }
        printf("%d sequential finds in %ld seconds\n", count, time(NULL) - t);

        t = time(NULL); srand(1);
        for (int i = 0; i < count; ++i) { s.find(rand()); }
        printf("%d random not-finds in %ld seconds\n", count, time(NULL) - t);

        s.clear();
        t = time(NULL); srand(0); s.reserve(0);
        for (int i = 0; i < count; ++i) { s.insert(make_pair(-i, -i)); }
        printf("%d negative sequential inserts in %ld seconds\n", count, time(NULL) - t);

        t = time(NULL); srand(0);
        for (int i = 0; i < count; ++i) { s.find(-i); }
        printf("%d negative sequential finds in %ld seconds\n", count, time(NULL) - t);

        t = time(NULL); srand(1);
        for (int i = 0; i < count; ++i) { s.find(rand()); }
        printf("%d random not-finds in %ld seconds\n", count, time(NULL) - t);

        s.clear(); s.reserve(0);
    }

    This is not an issue on Mac OS X.

  • Compile error in debug mode

    Hi there,

    sparsepp works perfectly fine when compiled in release mode (with an amazingly small memory footprint, thanks!), but doesn't compile with my debug setup.

    The error message comes from the debug version of my STL implementation:

    /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/debug/array:86:52: error: too many arguments to function call, expected single argument '__other', have 2 arguments
          noexcept(noexcept(swap(std::declval<_Tp&>(), std::declval<_Tp&>())))
                            ~~~~                       ^~~~~~~~~~~~~~~~~~~~
    
    /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/debug/array:264:23: note: in instantiation of exception specification for 'swap' requested here
        noexcept(noexcept(__one.swap(__two)))
                          ^
    
    /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_pair.h:195:25: note: in instantiation of exception specification for 'swap<unsigned int, 5>' requested here
          noexcept(noexcept(swap(first, __p.first))
                            ^
    
    /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_pair.h:255:23: note: in instantiation of exception specification for 'swap' requested here
        noexcept(noexcept(__x.swap(__y)))
                          ^
    
    /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_algobase.h:148:7: note: in instantiation of exception specification for 'swap<std::__debug::array<unsigned int, 5>, unsigned long>'
          requested here
          swap(*__a, *__b);
          ^
    
    /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_algo.h:1351:10: note: in instantiation of function template specialization 'std::iter_swap<std::pair<std::__debug::array<unsigned
          int, 5>, unsigned long> *, std::pair<std::__debug::array<unsigned int, 5>, unsigned long> *>' requested here
                      std::iter_swap(__p, __q);
                           ^
    
    /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_algo.h:1419:12: note: (skipping 7 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
          std::__rotate(__first, __middle, __last,
               ^
    
    [...]/sparsepp/sparsepp/spp.h:2744:26: note: in instantiation of member function 'spp::sparse_hashtable<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long>,
          std::__debug::array<unsigned int, 5>, spp::spp_hash<std::__debug::array<unsigned int, 5> >, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long,
          spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> >
          >::SelectKey, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long, spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >,
          spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> > >::SetKey, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const
          std::__debug::array<unsigned int, 5>, unsigned long> > >::sparse_hashtable' requested here
            sparse_hashtable tmp(MoveDontCopy, *this, resize_to);
                             ^
    
    [...]/sparsepp/sparsepp/spp.h:3276:21: note: in instantiation of member function 'spp::sparse_hashtable<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long>,
          std::__debug::array<unsigned int, 5>, spp::spp_hash<std::__debug::array<unsigned int, 5> >, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long,
          spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> >
          >::SelectKey, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long, spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >,
          spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> > >::SetKey, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const
          std::__debug::array<unsigned int, 5>, unsigned long> > >::_resize_delta' requested here
                    if (_resize_delta(1))
                        ^
    
    [...]/sparsepp/sparsepp/spp.h:3797:29: note: in instantiation of function template specialization 'spp::sparse_hashtable<std::pair<const std::__debug::array<unsigned int, 5>, unsigned
          long>, std::__debug::array<unsigned int, 5>, spp::spp_hash<std::__debug::array<unsigned int, 5> >, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long,
          spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> >
          >::SelectKey, spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long, spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >,
          spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> > >::SetKey, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const
          std::__debug::array<unsigned int, 5>, unsigned long> > >::find_or_insert<spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long, spp::spp_hash<std::__debug::array<unsigned int, 5> >,
          std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> > >::DefaultValue>' requested here
            return rep.template find_or_insert<DefaultValue>(key).second;
                                ^
    
    [...]/foo.cpp:326:21: note: in instantiation of member function 'spp::sparse_hash_map<std::__debug::array<unsigned int, 5>, unsigned long,
          spp::spp_hash<std::__debug::array<unsigned int, 5> >, std::equal_to<std::__debug::array<unsigned int, 5> >, spp::libc_allocator<std::pair<const std::__debug::array<unsigned int, 5>, unsigned long> >
          >::operator[]' requested here
                        hashmap[ key ] = value;
    

    My setup: Clang 3.9.1 Debug flags:

    -O2 -DDEBUG -g -ggdb3 -D_GLIBCXX_DEBUG
    

    that is, with the checked STL implementation.

    Any idea what is going on here?

    Thanks a lot! Lucas

  • Support for noncopyable objects

    I'm trying to add a noncopyable, but movable, object into emplace(). I've also tried insert(). Both return a compiler error like https://gist.github.com/springmeyer/503f8a842f8c9a09bccf21325c03c866. I'm seeing this with clang++ on OS X 10.11.

  • Question about un-expectedly *good* speed

    Hi Greg,

    First, thanks for this excellent hash table! I've been wanting something like this for some time, as I find the standard unordered_map in C++ to be reasonably fast, but with very poor memory use. The balance struck by sparsepp is near-perfect, and it's a near drop-in replacement for unordered_map which makes it very easy to use. I'm seeing something that, to me, is completely unexpected. I have a project, RapMap, where I want to replace a Google dense_hash_map with the sparsepp sparse_hash_map to reduce memory usage. I've been doing some benchmarking to see what type of speed hit I would have to take in my particular scenario (the memory usage with sparsepp is already a factor of 2 lower). I got some very unexpected results. Specifically, sparsepp seems to be working faster than the Google dense_hash_map. I'm attaching a timing plot and some /usr/bin/time results for your perusal.

    [timing plot]

    This image shows the total elapsed time (wall clock time) for each dataset using the google dense_hash_map (dense) and the sparsepp sparse_hash_map (sparse). Interestingly, you can see that sparsepp almost always outperforms the dense_hash_map. Now, my workload isn't entirely looking up hash keys, but no other code here changed apart from which hash was being used for the lookup, so that is the only thing to which I can attribute the timing difference. Also very interesting to me is the actual timing logs:

    timing.zip

    the thing that is interesting to me here is the difference in user, system and elapsed time, and the total CPU usage. For each run, I told the process to use 16 threads. The hashing workload here is completely lookups (i.e. there are no insertions or deletions during these runs, apart from the initial de-serialization of the hash from disk). It looks like sparsepp is consistently coming closer to the pre-specified cpu usage percentage (often 1,300-1,500% compared to the 1,000-1,200% with dense_hash_map). It also appears that sparsepp is consistently consuming more user time and less system time than google's dense_hash_map.

    Do you have any idea why I might be seeing these un-expected performance characteristics? Any light you could shed on why I might see this would be greatly appreciated!

    Thanks, Rob

  • Compiler warning on realloc call within sparsepp.

    Hi @greg7mdp,

    Thanks again for both sparsepp and parallel-hashmap! I wanted to report a (fairly verbose) compiler warning I get when using sparsepp within some of our code. Specifically, I get a compiler trace that ends in the following warning. It looks like the root cause is using realloc on a type that is not trivially copyable. Any suggestions on how to fix this?

    sparsepp/spp_utils.h:425:51: warning: 'void* realloc(void*, size_t)' moving an object of non-trivially copyable type 'struct std::pair<const std::pair<unsigned int, unsigned int>, unsigned int>'; use 'new' and 'delete' instead [-Wclass-memaccess]
      425 |         pointer res = static_cast<pointer>(realloc(p, new_size * sizeof(T)));
    
  • Can sparse_hash_map support unique_ptr?

    I have a class member like this:

    HashMap<String, std::unique_ptr<Klass>> _classes; // HashMap is a sparse_hash_map alias
    

    And it will get following errors:

    In file included from /usr/include/c++/7/memory:80:0,
                     from /home/lzx/LiVM/include/sparsepp/spp.h:53,
                     from /home/lzx/LiVM/include/shared/hashMap.h:7,
                     from /home/lzx/LiVM/include/LiVM/classpath/system.h:8,
                     from /home/lzx/LiVM/src/LiVM/classpath/system.cpp:4:
    /usr/include/c++/7/bits/unique_ptr.h: In instantiation of ‘void std::default_delete<_Tp>::operator()(_Tp*) const [with _Tp = LiVM::Klass]’:
    blahblah
    

    And when I change it to:

    HashMap<String, Klass*> _classes;
    

    everything is ok, so what happened?

  • SEGV after moving hash_map

    Linux/Darwin, gcc/clang. Minimal working example (MWE):

    spp::sparse_hash_map<uint64_t, uint64_t> map0;
    map0.emplace(0, 0);
    auto map1 = std::move(map0);
    //map0 = spp::sparse_hash_map<uint64_t, uint64_t>{}; // uncomment this to pass the test
    auto found = map0.find(0); // segfaults here
    CHECK(found == end(map0));
    