Comparison of Memory Allocation Methods

Memory Allocation

Problems

Limit Memory Allocation (if not necessary)

Multithreaded programs often do not scale because the heap is a bottleneck.

When multiple threads simultaneously allocate or deallocate memory from the allocator, the allocator will serialize them. Programs making intensive use of the allocator actually slow down as the number of processors increases.

Malloc (libc) is the worse memory allocation API to use.

Programs should avoid, if possible, allocating/deallocations memory too often and in particular whenever a packet is received.

In the Linux kernel there are available kernel/driver patches for recycling skbuff (kernel memory used to store incoming/outgoing packets).

Using PF_RING (into the driver) for copying packets from the NIC to the circular buffer without any memory allocation increases the capture performance (around 10%) and reduces congestion issues.

Design Evolution

Basic design of malloc() is to dynamically pre-allocate a pool of memory from the OS in which applications can then take smaller pieces from. malloc() is a standard API having a choice of different allocation algorithms and to mitigate the expensive OS system calls (typically done at program initialization time) during allocation of its system memory. The first memory allocation scheme started with a stack-based memory allocation.

Next came the dynamic-based memory allocation scheme where linked-list and bucket-heap mechanism are used to divide the private-heap using size class approach.

Soon, garbage collection algorithm introduced the initial backend of the memory allocation scheme. Frontend covers the usual malloc() API, et. al.

In 2006, a third pool was introduced (after operating system memory pool and library-based memory pool) called the “arena”. Arena is a jemalloc-term and is intended to deal with different memory types such as different-speed memory bank or NUMA-architecture, as well as memory tied to specific to each of the multiple CPU core or even CPU infinity.

Frontend Evolution

Frontend manages the memory being given to the application.

Within the frontend of the memory allocation system, the evolution went in the following order:

link-list free space
heap-bucket size classes (eliminating an object header)
(Process) Owner encoding
single core local allocation buffers (CLABs)
Epoch encoding
Large-size class memory block by direct mmap()
Hazard pointers (safe memory reclamation for lock-free objects) (M.M. Michael, 2004)
Arena memory pool
thread-specific local allocation buffers (TLABs)
constant-time modulo synchronization (early return to OS pool, or FreeBSD madvise call)

Backend Evolution

Backend of the memory allocation system manages the empty, straggling, fragmented or no-longer used memory blocks back to the OS (thereby reducing RSS).

Pool semantic: Remote f-list encoding, using Treiber stack), (R.K. Treiber,
1986)
buddy algorithm
binary buddy algorithm
BIPOP Table (span-based allocator)(S. Schneider, 2006) aka local free list and
remote free list
segment queue (Quasi-linearizability, Y. Afek, 2010)
multi-core distributed queue (A. Haas, 2013)
k-FIFO queue (T.A. Henzinger, 2013)

Competition

There are better ones out there that does not worsen as more threads/processes performs memory allocation system calls; they are listed in best-to-good performance order[seed with source]:

Comparison of malloc design

name	Author	link	repo
ssmalloc		whitepaper
jemalloc	Jason Evans (Facebook	FreeBSD)	whitepaper ,
sfmalloc	“SNU	Korea”	,
Streamflow
ptmalloc3
nedmalloc-1.02
lockless
ptmalloc2	GNU Libc (dlmalloc-forked)
Hoard
libumem	SMI Solaris
mtmalloc
nedmalloc-1
tcmalloc	Google
ctmalloc	Google Code
ptmalloc	GNU Libc (dlmalloc-forked)
dlmalloc	“Doug Lea	FreeBSD”	blog, FTP
pkmalloc		whitepaper
phkmalloc	FreeBSD
malloc	GNU Libc (glibc)

Features of malloc

Algorithms - best-first

Wilderness Preservation - The ``wilderness’‘ chunk represents the space bordering the topmost address allocated from the system. Because it is at the border, it is the only chunk that can be arbitrarily extended (via sbrk in Unix) to be bigger than it is
Deferred Coalescing - Rather than coalescing freed chunks, leave them at their current sizes in hopes that another request for the same size will come along soon. This saves a coalesce, a later split, and the time it would take to find a non-exactly-matching chunk to split.
Preallocation - Rather than splitting out new chunks one-by one, pre-split many at once. This is normally faster than doing it one-at-a-time.

name	Algorithm	Wilderness Preservation	Deferred coalescing free chunk	Pre-split free chunk
ssmalloc
jemalloc
sfmalloc
Streamflow
ptmalloc3
nedmalloc-1.02
lockless
ptmalloc2
Hoard
libumem
mtmalloc
nedmalloc-1
tcmalloc
ctmalloc
ptmalloc
dlmalloc	best-first	Y	Y	Y
pkmalloc
phkmalloc
malloc

Unit Test Approaches

Malloc testing

Test various patterns of memory allocation, aiming for various levels of fragmentation. Perform the tests both in single-threaded and multi-threaded environments.

Checks for correctness:

None of the allocated blocks should overlap, and all should be successfully writable for the requested number of bytes.
Allocations made by posix_memalign should be correctly aligned and freeable by free.
Arguments to calloc which would overflow size_t when multiplied should result in allocation failure, not under-allocation.
Allocating a block so large that subtracting two pointers within that block could overflow ptrdiff_t should not be possible.

Further implementation-specific correctness checks: checking consistency of bookkeeping information before and after each allocated block.

Possible quality-of-implementation checks: Attempting to obtain pathological fragmentation and allocation failure where it should not happen.

Corruption check: The attacker could overflow a buffer dynamically allocated by malloc(3) and:

overwrite the next contiguous boundary tag (Netscape browsers exploit or
underflow such a buffer and overwrite the boundary tag stored just before Secure Locate exploit), or
cause the vulnerable program to perform an incorrect free(3) call LBML traceroute exploit, or
freeing multiple frees, or
overwrite a single byte of a boundary tag with a NUL byte (Sudo exploit)

Known malloc exploits:

[http://www.phrack.org/issues.html?issue=57&id=8#article]
[https://sploitfun.wordpress.com/2015/03/04/heap-overflow-using-malloc-maleficarum/]
[http://phrack.org/issues/66/10.html]

References

[https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc]
Origin of the word malloc is here.
History of malloc is [ here].
Heap and allocators
here
Anatomy of a Program in Memory