Published by Kiya Hauke. Modified over 9 years ago.
1
Reconsidering Custom Memory Allocation. Emery D. Berger, Benjamin G. Zorn, Kathryn S. McKinley. November 2002. Proceedings of the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) 2002.
2
Lecture Topics: custom memory allocators; general-purpose allocators; regions (good performance); reaps (very good performance and more); results and conclusions.
3
Key Contributions of the paper: a comprehensive evaluation of custom allocators; custom allocators vs. general-purpose allocators (memory consumption and performance). Most programmers seeking faster memory allocation should use the Lea allocator rather than writing their own.
4
Key Contributions of the paper – Cont. The custom allocators that do provide higher performance use regions. Reaps are even better.
5
Key Contributions of the paper – Cont. If you need fast regions, use reaps. Otherwise use the Lea allocator rather than any custom allocator.
6
Related Works: Articles in the trade press claim custom allocators are a good idea (“Effective C++”, “The C++ Programming Language”). Benjamin Zorn claimed in 1993 that they are a waste of time. Articles on region allocation (arenas, groups, zones). We find that all of them are true.
7
General-purpose memory allocators: Windows XP allocator; Lea allocator (Linux).
8
Lea Allocator: an approximate best-fit allocator whose behavior depends on object size. Small objects (<64 bytes): allocated from exact-size quicklists. Medium objects (<128K): requests trigger coalescing of the quicklists. Large objects: allocated and freed via mmap. Among the best general-purpose allocators known.
9
Our Benchmarks
10
Emulating Custom Semantics: Custom allocators often support semantics that differ from the C malloc/free interface. Region emulator: provides full region semantics on top of a general-purpose allocator. It records a pointer to each allocated object so that the whole region can be deleted. The pointers are kept in an out-of-band array, so they have no impact on drag.
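The region emulator described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the class name EmulatedRegion and its member names are hypothetical, and a std::vector stands in for the out-of-band pointer array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical region emulator: region semantics layered on malloc/free.
class EmulatedRegion {
public:
    EmulatedRegion() = default;
    EmulatedRegion(const EmulatedRegion&) = delete;
    EmulatedRegion& operator=(const EmulatedRegion&) = delete;
    ~EmulatedRegion() { freeAll(); }

    // Allocate from the general-purpose allocator and record the pointer.
    void* allocate(std::size_t size) {
        objects_.push_back(nullptr);  // reserve the slot first so a failed push cannot leak
        void* p = std::malloc(size);
        if (p == nullptr) {
            objects_.pop_back();
            return nullptr;
        }
        objects_.back() = p;          // out-of-band: nothing is stored inside the object
        return p;
    }

    // Region deletion: free every object recorded so far.
    void freeAll() {
        for (void* p : objects_) std::free(p);
        objects_.clear();
    }

    std::size_t liveObjects() const { return objects_.size(); }

private:
    std::vector<void*> objects_;      // assumed stand-in for the out-of-band array
};

// Usage sketch.
inline void demoEmulatedRegion() {
    EmulatedRegion region;
    void* a = region.allocate(16);
    void* b = region.allocate(32);
    assert(a != nullptr && b != nullptr && a != b);
    assert(region.liveObjects() == 2);
    region.freeAll();                 // one call releases both objects
    assert(region.liveObjects() == 0);
}
```

Because every object still comes from the general-purpose allocator, such an emulator lets region-based programs be measured against allocators that have no native notion of a region.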
11
Custom memory allocators - Definition: a memory allocation mechanism that differs from a general-purpose allocator in at least one of two ways: (1) it may provide more than one object for every allocated chunk of memory; (2) it may not immediately return objects to the system or the general-purpose allocator. Wrappers around the general-purpose allocator do not count.
12
Custom allocators – widespread use: recommended as an optimization technique in the trade press; used in the Apache web server, GCC, and the C++ STL; directly supported by C++ (by overloading the new and delete operators).
13
Why do programmers use custom allocators? Improving runtime performance; reducing memory consumption; improving software engineering (?)
14
Improving runtime performance: the reason in most of our benchmarks. On average, 16% of the run time is spent in the memory allocator. The per-operation cost of general-purpose allocators matters in programs that use the allocator intensively.
15
Improving runtime performance – Cont.
16
Reducing memory consumption
17
Improving software engineering (?): Memory allocated by a custom allocator can't be managed by another allocator. Calling free on a custom-allocated object may cause a segmentation fault. It is difficult to understand the source of memory consumption in the program. No Purify; no parallel allocator for SMP scalability; no GC; no shared multi-language heap.
18
Improving software engineering (!): A region-based allocator simplifies memory management: a memory area can be deleted by a single call, and memory areas are kept separate. Regions are good for multithreaded server applications such as the Apache web server: memory-space isolation and memory-leak prevention.
19
A Taxonomy of Custom Allocators: apply your knowledge about some set of objects. Use regions to free objects that die at the same time; take advantage of object sizes; use known allocation patterns.
20
Benchmark allocators characteristics: per-class allocators; regions; nested regions; obstacks; custom patterns.
21
Per-class allocators: objects of the same size (type); size checks are elided; a freelist holds objects of the specific type; same API as malloc and free.
22
Regions: allocation by incrementing a pointer into large chunks of memory. Only the entire region can be deleted (freeAll function); individual objects cannot. Nested regions: nested object lifetimes. Obstack (“Object Stack”): deletion of every object allocated after a certain object.
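A minimal sketch of the basic region idea, assuming a hypothetical Region class with a fixed default chunk size of 4096 bytes (neither the name nor the size comes from the slides): allocation only bumps a pointer, and freeAll releases every chunk at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical bump-pointer region; names and chunk size are illustrative.
class Region {
public:
    explicit Region(std::size_t chunkSize = 4096) : chunkSize_(chunkSize) {}
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;
    ~Region() { freeAll(); }

    void* allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        if (size > static_cast<std::size_t>(-1) - align) return nullptr;  // rounding would overflow
        // Round up so that every returned pointer stays suitably aligned.
        size = (size == 0) ? align : (size + align - 1) / align * align;
        if (size > remaining_) {
            // Current chunk is exhausted (or absent): start a new one.
            const std::size_t bytes = size > chunkSize_ ? size : chunkSize_;
            chunks_.push_back(nullptr);
            void* chunk = std::malloc(bytes);
            if (chunk == nullptr) {
                chunks_.pop_back();
                return nullptr;
            }
            chunks_.back() = chunk;
            next_ = static_cast<char*>(chunk);
            remaining_ = bytes;
        }
        void* result = next_;   // the allocation itself is just a pointer bump
        next_ += size;
        remaining_ -= size;
        return result;
    }

    // The only way to release memory: drop the whole region.
    void freeAll() {
        for (void* c : chunks_) std::free(c);
        chunks_.clear();
        next_ = nullptr;
        remaining_ = 0;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkSize_;
    std::vector<void*> chunks_;
    char* next_ = nullptr;
    std::size_t remaining_ = 0;
};

// Usage sketch.
inline void demoRegion() {
    Region r(64);
    char* a = static_cast<char*>(r.allocate(8));
    char* b = static_cast<char*>(r.allocate(8));
    assert(a != nullptr && b > a);      // second object sits right after the first
    assert(r.chunkCount() == 1);
    assert(r.allocate(100) != nullptr); // does not fit: a second chunk is added
    assert(r.chunkCount() == 2);
    r.freeAll();
    assert(r.chunkCount() == 0);
}
```

Note that there is deliberately no per-object free: memory held by dead objects stays in the region until freeAll is called.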
23
Custom patterns: a general-purpose allocator optimized for a particular pattern of object behavior.
24
Custom allocators characteristics – Cont.
25
Problems with regions: excessive memory retention; unbounded memory consumption with unbounded buffers, dynamic arrays, and producer-consumer patterns; complicated programming of server applications (Apache).
26
The ideal allocator: Region Semantics + General-Purpose Allocation (heap) = Reaps
27
Heaps: malloc, free. Regions: malloc, freeAll. Reaps: malloc, free, freeAll.
28
Reaps - Example (reap r with its region chunks and associated heap): reapCreate(r); x1 = reapMalloc(r, 8); x2 = reapMalloc(r, 8); x3 = reapMalloc(r, 16); x4 = reapMalloc(r, 8); reapFree(r, x3);
29
Implementation Issues: Initially a reap behaves like a region: allocation by bumping a pointer, from geometrically increasing chunks of memory threaded onto a linked list, with a header for every allocated object. Freed objects (reapFree) are placed in an associated heap, and subsequent allocations use memory from this heap.
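The behavior listed on this slide can be approximated with a toy reap. This is a simplified sketch under several assumptions, not the implementation from the paper: the class ToyReap and all member names are hypothetical, chunks are tracked in a std::vector rather than a linked list, and the "associated heap" is reduced to exact-size free lists with no coalescing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <vector>

// Hypothetical toy reap: region-style bump allocation plus per-object free.
class ToyReap {
public:
    ToyReap() = default;
    ToyReap(const ToyReap&) = delete;
    ToyReap& operator=(const ToyReap&) = delete;
    ~ToyReap() { freeAll(); }

    void* allocate(std::size_t size) {
        if (size > static_cast<std::size_t>(-1) / 8) return nullptr;  // keep size arithmetic from overflowing
        size = roundUp(size == 0 ? 1 : size);
        // First try the associated heap of freed objects.
        auto it = freed_.find(size);
        if (it != freed_.end() && !it->second.empty()) {
            void* p = it->second.back();
            it->second.pop_back();
            return p;
        }
        // Otherwise behave like a region: bump a pointer.
        const std::size_t need = headerSize() + size;
        if (need > remaining_ && !addChunk(need)) return nullptr;
        Header* h = new (next_) Header{size};  // per-object header records the size
        next_ += need;
        remaining_ -= need;
        return reinterpret_cast<char*>(h) + headerSize();
    }

    // Individual deletion: the object moves to the associated heap.
    void free(void* p) {
        if (p == nullptr) return;
        Header* h = reinterpret_cast<Header*>(static_cast<char*>(p) - headerSize());
        freed_[h->size].push_back(p);
    }

    // Region-style deletion of everything at once.
    void freeAll() {
        for (void* c : chunks_) std::free(c);
        chunks_.clear();
        freed_.clear();
        next_ = nullptr;
        remaining_ = 0;
        nextChunkSize_ = 1024;
    }

private:
    struct Header { std::size_t size; };

    static std::size_t roundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }
    static std::size_t headerSize() { return roundUp(sizeof(Header)); }

    bool addChunk(std::size_t need) {
        std::size_t bytes = nextChunkSize_;
        while (bytes < need) bytes *= 2;
        chunks_.push_back(nullptr);
        void* chunk = std::malloc(bytes);
        if (chunk == nullptr) {
            chunks_.pop_back();
            return false;
        }
        chunks_.back() = chunk;
        next_ = static_cast<char*>(chunk);
        remaining_ = bytes;
        nextChunkSize_ = bytes * 2;            // geometrically increasing chunks
        return true;
    }

    std::vector<void*> chunks_;
    std::map<std::size_t, std::vector<void*>> freed_;
    char* next_ = nullptr;
    std::size_t remaining_ = 0;
    std::size_t nextChunkSize_ = 1024;         // assumed initial chunk size
};

// Usage sketch mirroring the x1..x4 example.
inline void demoToyReap() {
    ToyReap r;
    void* x1 = r.allocate(8);
    void* x2 = r.allocate(8);
    void* x3 = r.allocate(16);
    void* x4 = r.allocate(8);
    assert(x1 && x2 && x3 && x4);
    r.free(x3);                  // x3 moves to the associated heap
    void* x5 = r.allocate(16);   // served from the heap, not by bumping
    assert(x5 == x3);
    r.freeAll();
}
```

The header is what makes individual deletion possible: a plain region does not know where one object ends and the next begins.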
30
Reap allocation interface
void reapCreate (void ** reap, void ** parent);
void reapDestroy (void ** reap);
void reapFreeAll (void ** reap); // clear
void * reapMalloc (void ** reap, size_t size);
void reapFree (void ** reap, void * object);
31
Design issues: Heap Layers; Mixins.
32
Design issues – Cont. Layers: RegionHeap, CoalesceableHeap, ClearOptimizedHeap, NestedHeap, LeaHeap, Sbrk.
33
Design issues – Layers. LeaHeap layer: high speed, low fragmentation. NestedHeap layer. ClearOptimizedHeap layer: nothingOnHeap flag; fast allocations by pointer bumping on the first heap; the second heap is used after an object is freed. CoalesceableHeap layer: adds per-object metadata. RegionHeap layer: linked list of allocated objects; clear().
34
Benchmark allocation statistics
35
Benchmark allocation statistics – Cont. Programs with general-purpose allocators: not allocation-intensive; spend little time in the memory allocator. Programs with custom allocators: tend to allocate many small objects; spend more time in the memory allocator; correctly pinpoint the memory manager as a significant factor in performance.
36
Results: different memory management policies compared (general-purpose, custom, reaps) on execution time and memory consumption.
37
Results - technicalities: runtime is the best of three runs; compiled with Visual C++ 6.0; Pentium III 600 MHz with 320 MB of RAM under Windows XP.
38
Runtime Performance
39
Runtime Performance – Cont. Custom vs. Windows: justifies the use of a custom allocator. Lea provides almost the same performance as the custom allocators, except for regions. Reaps are comparable to Lea and to the custom allocators.
40
Memory Consumption
41
Memory Consumption – Cont. No Windows XP results: there is no equivalent way to keep track of memory consumption. Reaps: the benchmarks don't use individual deletion. Mixed results. The region space advantage is misleading.
42
Evaluating Region Allocation: Total drag is an average ratio of heap sizes with and without immediate object deallocation. Freeing every dead object immediately gives a total drag of 1. Non-region allocators: minimal drag. Region allocators: high drag, a substantial increase in memory consumption.
43
Evaluating Region Allocation – Cont.
44
Experimental Comparison to previous work
45
Reaps in Apache: gaining space-consumption advantages by allowing individual deletion. Apache's region calls are rerouted to reaps, plus a reapFree (ap_pfree) call. bc, an arbitrary-precision calculator language, has its malloc and free redefined. Computing the 1000th prime consumes 7.4 MB without ap_pfree and 240 KB with it.
46
Why programmers use custom allocators to no effect: recommended practice; premature optimization; drift; improved competition.
47
Conclusions: Despite widespread belief, a custom allocator doesn't always improve performance. The Lea allocator is as fast or even faster. The exception is the region-based allocator. Reaps: high performance and reduced memory consumption.
48
Future plans: reaps integration with the Hoard scalable memory allocator; reaps integration into a garbage-collected setting.
49
Questions?
50
The End
51
Custom Allocator implementation: the standard C++ way (inheritance) has significant overhead from virtual method dispatch, limits compiler optimizations, and fixes the relations between classes in a single inheritance structure, which makes reuse difficult.
52
Mixins: can be reparented. template <class Super> class Mixin : public Super {}; No single class hierarchy: class Composition1 : public A {}; class Composition2 : public A {};
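To make the reparenting idea concrete, here is a small sketch of mixin layers. The layer names MallocHeap, CountingHeap, and ZeroingHeap are invented for illustration and are not taken from the slides; the point is only that the same layer can sit on top of different parents without a fixed class hierarchy and without virtual calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical bottom layer: a thin wrapper over the system allocator.
class MallocHeap {
public:
    void* malloc(std::size_t sz) { return std::malloc(sz); }
    void free(void* p) { std::free(p); }
};

// Hypothetical mixin layer: counts live objects, whatever its parent is.
template <class Super>
class CountingHeap : public Super {
public:
    void* malloc(std::size_t sz) {
        void* p = Super::malloc(sz);
        if (p != nullptr) ++live_;
        return p;
    }
    void free(void* p) {
        if (p != nullptr) --live_;
        Super::free(p);
    }
    std::size_t live() const { return live_; }

private:
    std::size_t live_ = 0;
};

// Hypothetical mixin layer: zero-fills every new object.
template <class Super>
class ZeroingHeap : public Super {
public:
    void* malloc(std::size_t sz) {
        void* p = Super::malloc(sz);
        if (p != nullptr) std::memset(p, 0, sz);
        return p;
    }
};

// The same CountingHeap layer, reparented onto two different stacks.
using Composition1 = CountingHeap<MallocHeap>;
using Composition2 = CountingHeap<ZeroingHeap<MallocHeap>>;

// Usage sketch.
inline void demoMixins() {
    Composition2 heap;
    unsigned char* p = static_cast<unsigned char*>(heap.malloc(4));
    assert(p != nullptr);
    assert(p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 0);  // zeroed by ZeroingHeap
    assert(heap.live() == 1);                                   // counted by CountingHeap
    heap.free(p);
    assert(heap.live() == 0);
}
```

All calls between layers are resolved at compile time, so the compiler can inline through the whole stack.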
53
Heap Layers: a mixin that provides malloc and free. Coding guidelines: handle NULL returned by malloc() correctly; the destructor must free any memory held by the layer. Top heaps: wrappers around system-provided memory.
54
Example – Composing a Per-Class Allocator: a per-class pool of memory; same-sized objects; singly-linked freelist for memory management; no change to the source code of the original class. PerClassHeap: a utility class that adapts a class to use a heap layer as its allocator. FreeListHeap: a heap layer.
55
Example - PerClassHeap
template <class Object, class SuperHeap>
class PerClassHeap : public Object {
public:
  // Route allocation of this class through its own heap.
  inline void * operator new (size_t sz) { return getHeap().malloc (sz); }
  inline void operator delete (void * ptr) { getHeap().free (ptr); }
private:
  // One heap instance shared by all objects of this class.
  static SuperHeap& getHeap () { static SuperHeap theHeap; return theHeap; }
};
56
Example - FreeListHeap
57
Example - Combination: a Foo subclass that uses per-class pools. class FasterFoo : public PerClassHeap<Foo, FreeListHeap<...> > {};
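The per-class composition can be put together as a compilable sketch. Several pieces here are assumptions: the FreeListHeap body (that slide shows no text), the MallocHeap top heap, and the fields of Foo are all hypothetical, and operator new throws std::bad_alloc on failure instead of returning the raw malloc result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Assumed top heap: a wrapper around system-provided memory.
class MallocHeap {
public:
    void* malloc(std::size_t sz) { return std::malloc(sz); }
    void free(void* p) { std::free(p); }
};

// Assumed freelist layer: caches freed objects on a singly-linked list.
// It relies on every request having the same size (one class per heap).
template <class SuperHeap>
class FreeListHeap : public SuperHeap {
public:
    FreeListHeap() = default;
    FreeListHeap(const FreeListHeap&) = delete;
    FreeListHeap& operator=(const FreeListHeap&) = delete;
    ~FreeListHeap() {
        // Return any memory still held by this layer to the parent.
        while (head_ != nullptr) {
            Node* n = head_;
            head_ = head_->next;
            SuperHeap::free(n);
        }
    }
    void* malloc(std::size_t sz) {
        if (head_ != nullptr) {          // reuse a cached object
            void* p = head_;
            head_ = head_->next;
            return p;
        }
        return SuperHeap::malloc(sz < sizeof(Node) ? sizeof(Node) : sz);
    }
    void free(void* p) {
        if (p == nullptr) return;
        head_ = new (p) Node{head_};     // push onto the freelist
    }

private:
    struct Node { Node* next; };
    Node* head_ = nullptr;
};

// Adapts Object to allocate from SuperHeap via class-level new/delete.
template <class Object, class SuperHeap>
class PerClassHeap : public Object {
public:
    static void* operator new(std::size_t sz) {
        void* p = getHeap().malloc(sz);
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) noexcept { getHeap().free(p); }

private:
    static SuperHeap& getHeap() {
        static SuperHeap theHeap;
        return theHeap;
    }
};

struct Foo {                             // hypothetical original class, unchanged
    int id = 0;
    double value = 0.0;
};

class FasterFoo : public PerClassHeap<Foo, FreeListHeap<MallocHeap>> {};

// Usage sketch.
inline void demoPerClassHeap() {
    FasterFoo* a = new FasterFoo;
    void* addr = a;
    delete a;                            // goes onto FasterFoo's freelist
    FasterFoo* b = new FasterFoo;        // comes straight back off the freelist
    assert(static_cast<void*>(b) == addr);
    delete b;
}
```

Foo itself is untouched; only code that names FasterFoo gets the pooled behavior. Objects should be deleted through a FasterFoo pointer, since Foo has no virtual destructor.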
58
The End!!!