Custom memory allocators in C++ to improve the performance of dynamic memory allocation

Table of Contents

 Introduction
 Build instructions
 What's wrong with Malloc?
 Custom allocators
        Linear Allocator
        Stack Allocator
        Pool Allocator
        Free list Allocator
 Benchmarks
        Time complexity
        Space complexity
 Summary
 Last thoughts
 Acknowledgments

Introduction

When applications need more memory this can be allocated in the heap (rather than in the stack) in runtime. This memory is called 'dynamic memory' because it can't be known at compile time and its need changes during the execution. Our programs can ask for dynamic memory usin 'malloc'. Malloc returns an address to a position in memory where we can store our data. Once we're done with that data, we can call 'free' to free the memory and let others processes use it.

For this project I've implemented different ways to manage by ourselves dynamic memory in C++.This means that instead of using native calls like 'malloc' or 'free' we're going to use a custom memory allocator that will do this for us but in a more efficient way. The goal, then, is to understand how the most common allocators work, what they offer and compare them to see which one performs better.

Build instructions

git clone https://github.com/mtrebi/memory-allocators.git
cmake -S memory-allocator -B build 
cmake --build build

What's wrong with Malloc?

  • General purpose: Being a general purpose operation means that it must work in all cases (from 1byte to 1GB or more...). For this reason the implementation is not as efficient as it could be if the needs were more specific.
  • Slow: Sometimes, when allocating memory, malloc needs to change from user to kernel mode to get more memory from the system. When this happens, malloc turns out to be super slow!

Custom allocators

Because every program has specific needs, it makes no sense to use a general purpose allocator. We can choose the right allocator that works best for us. This way we can increase our performance.

In general, custom allocators share some features:

  • Low number of mallocs: Any custom allocator tries to keep the number of mallocs low. To do that, they malloc big chunks of memory and then, they manage this chunk internally to provide smaller allocations.
  • Data structures: Secondary data structures like Linked Lists, Trees, Stacks to manage these big chunks of memory. Usually they are used to keep track of the allocated and/or free portions of memory to speed up operations.
  • Constraints: Some allocators are very specific and have constraints over the data or operations that can be performed. This allows them to achieve a high performance but can only be used in some applications.

Linear allocator

This is the simplest kind of allocator. The idea is to keep a pointer at the first memory address of your memory chunk and move it every time an allocation is done. In this allocator, the internal fragmentation is kept to a minimum because all elements are sequentially (spatial locality) inserted and the only fragmentation between them is the alignment.

Data structure

This allocator only requires a pointer (or an offset) to tell us the position of the last allocation. It doesn't require any extra information or data structure.

Data structure of a Linear Allocator

Complexity: O(1)

Allocate

Simply move the pointer (or offset) forward.

Allocating memory in a Linear Allocator

Complexity: O(1)

Free

Due to its simplicity, this allocator doesn't allow specific positions of memory to be freed. Usually, all memory is freed together.

Stack allocator

This is a smart evolution of the Linear Allocator. The idea is to manage the memory as a Stack. So, as before, we keep a pointer to the current memory address and we move it forward for every allocation. However, we also can move it backwards when a free operation is done. As before, we keep the spatial locality principle and the fragmentation is still very low.

Data structure

As I said, we need the pointer (or offset) to keep track of the last allocation. In order to be able to free memory, we also need to store a header for each allocation that tell us the size of the allocated block. Thus, when we free, we know how many positions we have to move back the pointer.

Data structure of a Stack Allocator

Complexity: O(N*H) --> O(N) where H is the Header size and N is the number of allocations

Allocate

Simply move the pointer (or offset) forward and place a header right before the memory block indicating its size.

Allocating memory in a Stack Allocator

Complexity: O(1)

Free

Simply read the block size from the header and move the pointer backwards given that size.

Freeing memory in a Stack Allocator

Complexity: O(1)

Pool allocator

A Pool allocator is quite different from the previous ones. It splits the big memory chunk in smaller chunks of the same size and keeps track of which of them are free. When an allocation is requested it returns the free chunk size. When a freed is done, it just stores it to be used in the next allocation. This way, allocations work super fast and the fragmentation is still very low.

Splitting scheme in a Pool Allocator

Data structure

To keep track of the free blocks of memory, the Pool allocator uses a Linked List that links the address of each free memory block.

Linked List used in a Pool Allocator

To reduce the space needed, this Linked List is stored in the same free blocks (smart right?). However, this set the constraint that the data chunks must be at least as big as our nodes in the Linked List (so that, we can store the Linked List in the free memory blocks).

In memory Linked List used in a Pool Allocator

Complexity: O(1)

Allocate

An allocation simply means to take (pop) the first free block of the Linked List.

Allocation in a Pool Allocator

The linked list doesn't have to be sorted. Its order its determined by the how the allocations and free are done.

Random State of a Linked List in a Pool Allocator

Complexity: O(1)

Free

Free means to add (push) the freed element as the first element in the Linked List.

Complexity: O(1)

Free list allocator

This is a general purpose allocator that, contrary to the others, doesn't impose any restriction. It allows allocations and deallocations to be done in any order. For this reason, its performance is not as good as its predecessors. Depending on the data structure used to speed up this allocator, there are two common implementations: one that uses a Linked List and one that uses a Red black tree.

Linked list data structure

As the name says, this implementation uses a Linked List to store, in a sorted manner, the start address of each free contiguous block in memory and its size. When an allocation is requested, it searches in the linked list for a block where the data can fit. Then it removes the element from the linked list and places an allocation header (used on deallocations) right before the data (as we did in the Stack allocator). On deallocations, we get back the allocation header to know the size of the block that we are going to free. Once we free it we insert it into the sorted linked list and we try to merge contiguous memory blocks together creating bigger blocks.

Notes: My implementation has some constraints on the size and alignment of the data that can be allocated using this allocator. For example, the minimum size that can be allocated should be equals or bigger than the size required of a Free Node. Otherwise, we would be wasting more space in meta-data than in real data (Something similar happens with the alignment) These constraints are related with my implementation. A better implementation would probably handle these cases. I decided not to do so because performance would be drastically affected. In these cases its probably better to use a different allocator.

Data structure in a Free list Allocator

Complexity: O(NHF + MHA)--> O(M) where N is the number of free blocks, HF is the size of the header of free blocks, M the number of allocated blocks and HA the size of the header of allocated blocks

Linked list Allocate

When an allocation is requested, we look for a block in memory where our data can fit. This means that we have to iterate our linked list until we find a block that has a size equal or bigger than the size requested (it can store this data plus the allocation header) and remove it from the linked list. This would be a first-fit allocation because it stops when it finds the first block where the memory fits. There is another type of search called best-fit that looks for the free memory block of smaller size that can handle our data. The latter operation may take more time because is always iterating through all elements but it can reduce fragmentation.

Allocating in a Free list Allocator

Complexity: O(N) where N is the number of free blocks

Linked list Free

First of all we get back the information about the allocation from the header. Then, we iterate the Linked List to intert the free block in the right position (because is sorted by address). Once we've inserted it, we merge contiguous blocks. We can merge in _O(1) because our Linked List is sorted. We only need to look at the previous and next elements in the linked list to see if we can merge this contiguous blocks. This operation of merging contiguous memory blocks to create bigger blocks is called Coalescence If we used instead a Sorted Linked List of free and allocated blocks, the complexity would be O(1) but the allocation compleixity would be *O(N) where N is the number of free and allocated blocks and space complexity would be much higher. When we free a memory block we also look at the previous and next blocks to see if we can merge them into one bigger block.

Freeing in a Free list Allocator

Complexity: O(N) where N is the number of free blocks

Red black tree data structure

The purpose of using a Red black tree is to speed up allocations and deallocations. In the Linked List (or sequential) implementation every time an operation was made we needed to iterate the Linked List. This was O(N) in all cases.

Using Red Black trees we can reduce its complexity to O(log N) while keeping space complexity quite low because the tree data is stored inside the free memory blocks. In addition, this structure allows a best-fit algorithm to be used, reducing the fragmentation and keeping performance. However, an additional sorted Doubly Linked list is required to store allocated and free elements in order to be able to do coalescence operations in O(1). This implementation is the most common and most used in real systems because it offers high flexibility while keeping performance very high.

Benchmarks

Now its time to make sure that all the effort in designing and implementing custom memory allocators is worth. I've made several benchmarks with different block sizes, number of operations, random order, etc. The time benchmark measures the time execution that takes initializing the allocator 'Init()' (malloc big chunk, setup additional data structures...) and untill the last operation (allocation or free) is performed.

Here I'm only showing what I believe is relevant for the goal of this project.

Time complexity

  • Malloc is without doubt the worst allocator.Due to its general and flexible use. O(n)
  • Free list allocator is A much better choice than malloc as a general purpose allocator.It uses Linked List to speed up allocations/free. It's about three times better than malloc O(n)

The next allocator are even better BUT they are no longer general purpose allocators. They impose restrictions in how we can use them:

  • Pool allocator forces us to always allocate the same size but then we can allocate and deallocate in any order. The complexity of this one is slightly better than the free list allocator, wait what? The complexity of the pool allocator was supposed to be constant not linear! And that's true. What its happening here is that the initialization of the additional data structure (the linked list) is O(n). It has to create all memory chunks in then linked them in the linked list. This operation is hiding the truly complexity of the allocation and free operations that is O(1). So, take into account to initialize the Pool allocator (and all the allocators in general) before to avoid this kind of behaviors.
  • Stack allocator can allocate any size, but deallocations must be done in a LIFO fashion with a O(1) complexity. In the chart the complexity is not completely constant due to init function that has to allocate the first big chunk of memory, similarly as before in the pool allocator.
  • Linear allocator is the simplest and the best performant allocator with a O(1) complexity but its also the most restrictive because single free operations are not allowed. As with the stack, the complexity doesn't look completely constant due to the init function.

Time complexity of different allocators

In the next chart we can see that if we don't include the Init() function in the benchmark, the overall execution time is reduced and as a consequence we can effectively see that the Linear, Stack and Pool allocators are constant while malloc and free list are clearly linear. The free list implementation using black tree can reduce the complexity to O(log n) and therefore its position in the chart would be between the pool allocator and the free list.

Time complexity of different allocators

_Note: The time complexity (in general) scales following a linear fashion regarding the size of the allocation request.

Space complexity

As we can see, even that the space complexity for each allocator is slightly different(due to constants), in the end, all of them have the same space complexity O(N). It is very clear, then, why when denoting big O, constants can be ignored: because its weight in the overall equation is very low when N grows.

Space complexity of different allocators

Summary

This is a brief summary describing when you should use each allocator. From more restrictive and efficient allocators to less efficient and general.

  • Linear allocator. If your data does not follows any specific structure. However, there's a common behavior in time: all data "expires" after a certain time and then is no longer useful and thus can be freed. Think about games for example, you can allocate data in one frame using a this allocator and free all data at the start of the next frame.
  • Stack allocator. The same as the Linear allocator but think if it useful to free elements in a LIFO fashion.
  • Pool allocator. Your data has definitely a structure. All elements of your data have the same size. This is your choice, fast and no fragmentation.
  • Buddy allocator (Not implemented here). Your data is organized in exponential sizes power-of-two (1,2,4,8,16,32...). This allocator performs extremely well when data is structure in that way, being fast and wasting little space.
  • Free list allocator. No structure or common behavior. This allocator allows you to allocate and free memory as you wish. This is a general purpose allocator that works much better than malloc, but is not as good as the previous allocators, given its flexibility to work in all situations.

Last thoughts

  • Avoid dynamic memory as much as possible. Its behavior is unexpected and a source of problems
  • If you are worried about performance and your application uses dynamic memory, think about using a custom allocator instead of malloc
  • Try to understand your data and its behavior to choose the right allocator for you. Specific allocators (that impose restrictions on how we can structure/use our data) are far more better than the generic ones. We saw a huge gap between the specific purpose allocator: Linear, Stack and Pool and the general purpose: Free list and Malloc
  • Always choose a less restrictive (or more general) allocator if unsure. If you later see that your data is structured you can always change to use a more restrictive one.

Future work

  • Implement every memory allocator assuming that the alignment is always 8 bytes and thus everything is always align (we no longer need headers).
  • Implement a Free list allocator using Red Black Trees to improve performance from O(N) to O(log N)
  • Implement a Buddy allocator
  • Implement a Slab allocator
  • Benchmark internal fragmentation
  • Benchmark spatial location (cache misses)

Acknowledgments

Thanks to Vanessa and Krish for helping me in the different stages of this project.

Owner
Mariano Trebino
AI Engine Programmer @Crytek
Mariano Trebino
Comments
  • Can't compile the code.

    Can't compile the code.

    First of all thanks for such a good example and comparison of how memory allocator works.

    Sadly, after I followed the instruction and compiled it in VS, compiler told me I can't use "void* address [m_nOperations]" since m_nOperations is not a constant. How did you get it work?

  • There's a little confusion about the diagram

    There's a little confusion about the diagram

    hi, So lucky,I meet your github, your code of memory-allocator is very nice.But i can not understand the chart. (Time complexity of allocations/free blocksize = 4096)From the result of code(execute main), i don't get any correlation. I would be very grateful if you can answer my puzzlement. thank you.

  • How to use allocated memory

    How to use allocated memory

    Hi! First off, this is an amazing article, and it has really helped me out as a beginner to C/C++. Unfortunately, there lies my problem: as a beginner, I am rather new to how all of this works, so please forgive me if this comes off as an unsophisticated question.

    With that said, the purpose of the stack allocator (or any other allocator), as I understand it, is to segment off a piece of memory in the heap, then from elsewhere in your codebase allocate pieces of that block of memory for individual use, and free it back when you are done. Thus, one (i.e. me, the beginner) would expect that upon allocating memory viamyStack -> Allocate(64, 8), it would return a pointer to the memory you allocated so that you may edit it, and then free it when you are finished.

    Unfortunately for me, while it does return a pointer, I find myself unable to use it as such: upon assigning it (via allocatedMem = &myValue), the compiler throws an error stating that void* is not a ponter-to-object type. (Also, assigning it would be fruitless, as then it would no longer point to the allocated memory.)

    The tl;dr of it is: how do I use memory that I allocate? Are there some steps that I am missing, or is my beginner-ness just getting in the way of me thinking clearly?

    Thanks in advance, -cellman123

  • Can you share rest ppt of Dynamic memory ?

    Can you share rest ppt of Dynamic memory ?

    Thank you for sharing your code and ppt for Dynamic memory. There is only part 1 of your Dynamic memory tutorials and it is really nice to read and to learn. If it is possible, can you share the rest of the docs ! Thank you very much !

  • Correct some broken grammar / spelling / syntax.

    Correct some broken grammar / spelling / syntax.

    See https://duckduckgo.com/?q=english+grammar&ia=web ; https://duckduckgo.com/?q=english+spelling&ia=web and for the lighter side:

    • https://www.youtube.com/watch?v=8Gv0H-vPoDc

    • https://shlomif.livejournal.com/53966.html

    • http://i.imgur.com/HL1ZR.jpg .

  • Fixed build error in MSVC, updated to std library chrono library, etc..

    Fixed build error in MSVC, updated to std library chrono library, etc..

    This error was due to a combination of things, primarily that the Win32 API does not have a few of the time structs defined in <sys/types.h> and <time.h>, as well as <unistd.h>.

    To fix the problem, I changed the time tracking mechanism to the standard library's chrono::high_resolution_clock, and updated all the necessary data structures and functions. I also re-added some of the metrics that were commented out. The time elapsed, operations per second, and time per operation metrics are working as expected now.

    Excerpt from execution:

    
    	BENCHMARK: ALLOCATION
    A	@H 00000292BD9D2070	[email protected] 00000292BD9D2080	S 4112	AP 0	P 16	M 4112	R 99995888
    A	@H 00000292BD9D3080	[email protected] 00000292BD9D3090	S 80	AP 0	P 16	M 4192	R 99995808
    A	@H 00000292BD9D30D0	[email protected] 00000292BD9D30E0	S 4112	AP 0	P 16	M 8304	R 99991696
    A	@H 00000292BD9D40E0	[email protected] 00000292BD9D40F0	S 2064	AP 0	P 16	M 10368	R 99989632
    A	@H 00000292BD9D48F0	[email protected] 00000292BD9D4900	S 528	AP 0	P 16	M 10896	R 99989104
    A	@H 00000292BD9D4B00	[email protected] 00000292BD9D4B10	S 272	AP 0	P 16	M 11168	R 99988832
    A	@H 00000292BD9D4C10	[email protected] 00000292BD9D4C20	S 2064	AP 0	P 16	M 13232	R 99986768
    A	@H 00000292BD9D5420	[email protected] 00000292BD9D5430	S 48	AP 0	P 16	M 13280	R 99986720
    A	@H 00000292BD9D5450	[email protected] 00000292BD9D5460	S 2064	AP 0	P 16	M 15344	R 99984656
    A	@H 00000292BD9D5C60	[email protected] 00000292BD9D5C70	S 4112	AP 0	P 16	M 19456	R 99980544
    	RESULTS:
    		Operations:    	10
    		Time elapsed: 	42 ms
    		Op per sec:    	0.238095 ops/ms
    		Timer per op:  	4.2 ms/ops
    		Memory peak:   	19456 bytes
    
    	BENCHMARK: ALLOCATION/FREE
    A	@H 00000292BD9D0070	[email protected] 00000292BD9D0080	S 4112	AP 0	P 16	M 4112	R 99995888
    A	@H 00000292BD9D1080	[email protected] 00000292BD9D1090	S 80	AP 0	P 16	M 4192	R 99995808
    A	@H 00000292BD9D10D0	[email protected] 00000292BD9D10E0	S 4112	AP 0	P 16	M 8304	R 99991696
    A	@H 00000292BD9D20E0	[email protected] 00000292BD9D20F0	S 2064	AP 0	P 16	M 10368	R 99989632
    A	@H 00000292BD9D28F0	[email protected] 00000292BD9D2900	S 528	AP 0	P 16	M 10896	R 99989104
    A	@H 00000292BD9D2B00	[email protected] 00000292BD9D2B10	S 272	AP 0	P 16	M 11168	R 99988832
    A	@H 00000292BD9D2C10	[email protected] 00000292BD9D2C20	S 2064	AP 0	P 16	M 13232	R 99986768
    A	@H 00000292BD9D3420	[email protected] 00000292BD9D3430	S 48	AP 0	P 16	M 13280	R 99986720
    A	@H 00000292BD9D3450	[email protected] 00000292BD9D3460	S 2064	AP 0	P 16	M 15344	R 99984656
    A	@H 00000292BD9D3C60	[email protected] 00000292BD9D3C70	S 4112	AP 0	P 16	M 19456	R 99980544
    	Merging(n) 00000292BD9D3C60 & 0000000000000000	S 99984656
    F	@ptr 00000292BD9D3C70	[email protected] 00000292BD9D3C60	S 99984656	M 15344
    	Merging(n) 00000292BD9D3450 & 0000000000000000	S 99986720
    F	@ptr 00000292BD9D3460	[email protected] 00000292BD9D3450	S 99986720	M 13280
    	Merging(n) 00000292BD9D3420 & 0000000000000000	S 99986768
    F	@ptr 00000292BD9D3430	[email protected] 00000292BD9D3420	S 99986768	M 13232
    	Merging(n) 00000292BD9D2C10 & 0000000000000000	S 99988832
    F	@ptr 00000292BD9D2C20	[email protected] 00000292BD9D2C10	S 99988832	M 11168
    	Merging(n) 00000292BD9D2B00 & 0000000000000000	S 99989104
    F	@ptr 00000292BD9D2B10	[email protected] 00000292BD9D2B00	S 99989104	M 10896
    	Merging(n) 00000292BD9D28F0 & 0000000000000000	S 99989632
    F	@ptr 00000292BD9D2900	[email protected] 00000292BD9D28F0	S 99989632	M 10368
    	Merging(n) 00000292BD9D20E0 & 0000000000000000	S 99991696
    F	@ptr 00000292BD9D20F0	[email protected] 00000292BD9D20E0	S 99991696	M 8304
    	Merging(n) 00000292BD9D10D0 & 0000000000000000	S 99995808
    F	@ptr 00000292BD9D10E0	[email protected] 00000292BD9D10D0	S 99995808	M 4192
    	Merging(n) 00000292BD9D1080 & 0000000000000000	S 99995888
    F	@ptr 00000292BD9D1090	[email protected] 00000292BD9D1080	S 99995888	M 4112
    	Merging(n) 00000292BD9D0070 & 0000000000000000	S 100000000
    F	@ptr 00000292BD9D0080	[email protected] 00000292BD9D0070	S 100000000	M 0
    	RESULTS:
    		Operations:    	10
    		Time elapsed: 	43 ms
    		Op per sec:    	0.232558 ops/ms
    		Timer per op:  	4.3 ms/ops
    		Memory peak:   	19456 bytes
    
  • Simplify build instructions

    Simplify build instructions

    Use CMake's --build option instead of specify platform specific build tool name. That way we don't need to add any build generator name, CMake choose instead of us.

  • Found a bug in StackAllocator

    Found a bug in StackAllocator

    image There seems to be a bug in this line. headerPtr = &allocationHeader;

    Should be changed to *headerPtr = allocationHeader; otherwise the padding value is not recorded in the header memory. I tested the original code, and the stack head address mismatches after one allocation and one free.

    image

  • Optimize PoolAllocator initialization by adding an Offset variable?

    Optimize PoolAllocator initialization by adding an Offset variable?

    As discussed in readme.md, the pool allocator initialize freeNodeList with O(N) complexity. However before any free() is invoked, it behaves like a LinearAllocator.

    So we could just leave the freeNodeList empty, and add an Offset variable to indicate current free node, and increase this Offset whenever allocation is performed. When free() is called, we add the freed space into freeNodeList, which will be looked upon first for later allocations.

  • StackAllocator is not performant and wastes memory on padding

    StackAllocator is not performant and wastes memory on padding

    I assume that the project has been abandoned since a some pull requests are pending.

    Anyways, I will address a few things here for those coming to this repository in search of knowledge from this code base.

    Problems with using "AllocationHeader"

    • Wasted Memory: AllocationHeader as is used in this repository simply adds overhead in terms of wasted padding memory. Regardless of the fact that the memory(m_start_ptr + m_offset) is already aligned, AllocationHeader header wastes as low as alignment bytes or padding + alignment bytes of memory.
    • Probable Cache Misses: Since the AllocationHeader is stored before the memory pointer that allocate gives away, there is a probability of cache miss when using free if the memory pulled in does not align with cache boundary. It might be a small cost or a big one depending upon how many times free is called. It's bad since the entire point of a custom allocator is performance with solid metrics of memory usage.
    • Current Implementation of AllocationHeader stores garbage: As has been addressed in issue #8, AllocationHeader currently stores garbage since it stores the address of a stack allocated AllocationHeader that would go out of scope once it goes out of scope i.e. as soon as the allocate function ends
    AllocationHeader allocationHeader{padding};
    AllocationHeader * headerPtr = (AllocationHeader*) headerAddress;
    headerPtr = &allocationHeader;   // <---------- allocationHeader would go out of scope
    

    A better way of implementing is to use push/begin constructs that would store the offset of stack and pop/end which would reinstate the stack allocator back to the pushed offset.

  • Fixed StackAllocator having garbage in the header on Free

    Fixed StackAllocator having garbage in the header on Free

    Addressed #8 where calling Free on the StackAllocator would access garbage in the AllocationHeader

    While I was there I made the points declarations more consistent with the codebase and specified the overriding methods in the definition.

    I was going to address #22 in this PR, but after more thought, the highest an alignment should ever be is 64 for 512 SEE intrinsics. So the char shouldn't be an issue and takes up less space than a size_t.

  • freelist allcoator bugfix

    freelist allcoator bugfix

    1. when allocate memory: if rest < sizeof(Node), it should not insert to free list.
    2. when free memory: if free list head is nullptr, we should set head ptr.
    3. when free memory: free node ptr should offset padding bytes.
  • Make AllocationHeader::padding size_t

    Make AllocationHeader::padding size_t

    Compilation Error

    When compiling with Clang on Xcode, I get the following error

    ../src/StackAllocator.cpp:39:39: error: non-constant-expression cannot be narrowed from type 'std::size_t' (aka 'unsigned long') to 'char' in initializer list [-Wc++11-narrowing]
        AllocationHeader allocationHeader{padding};
                                          ^~~~~~~
    ../src/StackAllocator.cpp:39:39: note: insert an explicit cast to silence this issue
        AllocationHeader allocationHeader{padding};
                                          ^~~~~~~
                                          static_cast<char>( )
    

    Compiler Details

    Apple clang version 12.0.0 (clang-1200.0.32.28)
    Target: x86_64-apple-darwin19.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    

    In case padding can go beyond 255, it makes sense to make AllocationHeader::padding a size_t instead of a char as suggested by @talhasaruhan in #8.

Allocator bench - bench of various memory allocators

To run benchmarks Install lockless from https://locklessinc.com/downloads/ in lockless_allocator path make Install Hoard from https://github.com/emery

Dec 4, 2022
A memory allocation program, it is used for doing an experiment to find out the detail of Microsoft Windows taskmgr performance information
A memory allocation program, it is used for doing an experiment to find out the detail of Microsoft Windows taskmgr performance information

memory-allocation-test A memory allocation program, it is used for doing an experiment to find out the detail of Microsoft Windows taskmgr performance

Dec 22, 2022
A tool for tracking memory allocation based ld-preload

libmallocTrace A tool for tracking memory allocation based ld-preload how to build make cd example && make how to use a simple way is to execute some

Mar 12, 2022
A dynamic library for enhancing Tomb Raider 1

Tomb1Main This is a dynamic library for the classic Tomb Raider I game (TombATI version). The purpose of the library is to reimplement all the routine

Dec 31, 2022
Custom implementation of C stdlib malloc(), realloc(), and free() functions.

C-Stdlib-Malloc-Implementation NOT INTENDED TO BE COMPILED AND RAN, DRIVER CODE NOT OWNED BY I, ARCINI This is a custom implmentation of the standard

Dec 27, 2021
The Hoard Memory Allocator: A Fast, Scalable, and Memory-efficient Malloc for Linux, Windows, and Mac.

The Hoard Memory Allocator Copyright (C) 1998-2020 by Emery Berger The Hoard memory allocator is a fast, scalable, and memory-efficient memory allocat

Jan 2, 2023
MMCTX (Memory Management ConTeXualizer), is a tiny (< 300 lines), single header C99 library that allows for easier memory management by implementing contexts that remember allocations for you and provide freeall()-like functionality.

MMCTX (Memory Management ConTeXualizer), is a tiny (< 300 lines), single header C99 library that allows for easier memory management by implementing contexts that remember allocations for you and provide freeall()-like functionality.

Oct 2, 2021
Memory-dumper - A tool for dumping files from processes memory

What is memory-dumper memory-dumper is a tool for dumping files from process's memory. The main purpose is to find patterns inside the process's memor

Nov 9, 2022
Mesh - A memory allocator that automatically reduces the memory footprint of C/C++ applications.
Mesh - A memory allocator that automatically reduces the memory footprint of C/C++ applications.

Mesh: Compacting Memory Management for C/C++ Mesh is a drop in replacement for malloc(3) that can transparently recover from memory fragmentation with

Dec 30, 2022
mimalloc is a compact general purpose allocator with excellent performance.
mimalloc is a compact general purpose allocator with excellent performance.

mimalloc mimalloc (pronounced "me-malloc") is a general purpose allocator with excellent performance characteristics. Initially developed by Daan Leij

Dec 30, 2022
A C++ Class and Template Library for Performance Critical Applications

Spirick Tuning A C++ Class and Template Library for Performance Critical Applications Optimized for Performance The Spirick Tuning library provides a

Dec 6, 2021
STL compatible C++ memory allocator library using a new RawAllocator concept that is similar to an Allocator but easier to use and write.

memory The C++ STL allocator model has various flaws. For example, they are fixed to a certain type, because they are almost necessarily required to b

Dec 26, 2022
Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C
Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C

rpmalloc - General Purpose Memory Allocator This library provides a public domain cross platform lock free thread caching 16-byte aligned memory alloc

Dec 28, 2022
OpenXenium JTAG and Flash Memory programmer
OpenXenium JTAG and Flash Memory programmer

OpenXenium JTAG and Flash Memory programmer * Read: "Home Brew" on ORIGINAL XBOX - a detailed article on why and how * The tools in this repo will all

Oct 23, 2022
manually map driver for a signed driver memory space

smap manually map driver for a signed driver memory space credits https://github.com/btbd/umap tested system Windows 10 Education 20H2 UEFI installati

Dec 17, 2022
Memory instrumentation tool for android app&game developers.
Memory instrumentation tool for android app&game developers.

Overview LoliProfiler is a C/C++ memory profiling tool for Android games and applications. LoliProfiler supports profiling debuggable applications out

Jan 6, 2023
A single file drop-in memory leak tracking solution for C++ on Windows

MemLeakTracker A single file drop-in memory leak tracking solution for C++ on Windows This small piece of code allows for global memory leak tracking

Jul 18, 2022
Dump the memory of a PPL with a userland exploit
Dump the memory of a PPL with a userland exploit

PPLdump This tool implements a userland exploit that was initially discussed by James Forshaw (a.k.a. @tiraniddo) - in this blog post - for dumping th

Dec 29, 2022
Implementation of System V shared memory (a type of inter process communication) in xv6 operating system.

NOTE: we have stopped maintaining the x86 version of xv6, and switched our efforts to the RISC-V version (https://github.com/mit-pdos/xv6-riscv.git)

Feb 21, 2022