Published by Brian Roley. Modified over 10 years ago.
1
University of Massachusetts, Amherst, Department of Computer Science
Memory Management for High-Performance Applications
Emery Berger, University of Massachusetts, Amherst
2
High-Performance Applications
  Web servers, search engines, scientific codes
  C or C++ (still…)
  Run on one server box or a cluster of them
  Need support at every level: software, compiler, runtime system, operating system, hardware
3
New Applications, Old Memory Managers
  Applications and hardware have changed
    Multiprocessors now commonplace
    Object-oriented, multithreaded
    Increased pressure on memory manager (malloc, free)
  But memory managers have not kept up
    Inadequate support for modern applications
4
Current Memory Managers Limit Scalability
  As we add processors, the program slows down
  Caused by heap contention
  [Figure: Larson server benchmark on 14-processor Sun]
5
The Problem
  Current memory managers are inadequate for high-performance applications on modern architectures
  They limit scalability, application performance, and robustness
6
This Talk
  Building memory managers
    Heap Layers framework [PLDI 2001]
  Problems with current memory managers
    Contention, false sharing, space
  Solution: provably scalable memory manager
    Hoard [ASPLOS-IX]
  Extended memory manager for servers
    Reap [OOPSLA 2002]
7
Implementing Memory Managers
  Memory managers must be
    Space efficient
    Very fast
  Heavily optimized code
    Hand-unrolled loops
    Macros
    Monolithic functions
  Hard to write, reuse, or extend
8
Building Modular Memory Managers
  Classes
    Overhead
    Rigid hierarchy
  Mixins
    No overhead
    Flexible hierarchy
9
A Heap Layer
  template <class SuperHeap>
  class GreenHeapLayer : public SuperHeap {…};
  Mixin with malloc & free methods
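The mixin idea on this slide can be sketched as follows. This is a minimal illustration, not code from the Heap Layers distribution: the names MallocHeap, CountingHeap, and CountingMallocHeap are hypothetical, and the counting behaviour is chosen only because it is easy to observe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Bottom layer (hypothetical name): forwards to the system allocator.
class MallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// A mixin layer: it inherits from whatever heap it is given and adds
// bookkeeping around the superheap's malloc and free. The calls are
// resolved at compile time, so there is no virtual dispatch.
template <class SuperHeap>
class CountingHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    void* p = SuperHeap::malloc(sz);
    if (p != nullptr) ++live_;
    return p;
  }
  void free(void* p) {
    if (p != nullptr) --live_;
    SuperHeap::free(p);
  }
  // Number of objects allocated through this layer and not yet freed.
  std::size_t live() const { return live_; }

private:
  std::size_t live_ = 0;
};

// Layers compose by nesting template arguments.
using CountingMallocHeap = CountingHeap<MallocHeap>;
```

Because the parent class is a template parameter, the same layer can sit on top of any heap that offers malloc and free, which is the flexibility the previous slide attributes to mixins.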
10
Example: Thread-Safe Heap Layer
  LockedHeap: protects the superheap with a lock
  [Figure: LockedMallocHeap]
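A hedged sketch of such a locking layer is shown here. LockedHeap and LockedMallocHeap are the names on the slide; the body, the use of std::mutex, and the MallocHeap bottom layer are assumptions made for illustration and are not taken from the Heap Layers source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Assumed bottom layer: forwards to the system allocator.
class MallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// Holds one lock for the duration of every call into the superheap,
// so a superheap that is not thread-safe can be shared between threads.
template <class SuperHeap>
class LockedHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    std::lock_guard<std::mutex> guard(lock_);
    return SuperHeap::malloc(sz);
  }
  void free(void* p) {
    std::lock_guard<std::mutex> guard(lock_);
    SuperHeap::free(p);
  }

private:
  std::mutex lock_;
};

// Composition: a malloc-based heap wrapped in the locking layer.
class LockedMallocHeap : public LockedHeap<MallocHeap> {};
```

A single lock like this is simple but serializes all threads, which is the heap contention the later slides set out to remove.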
11
Empirical Results
  Heap Layers vs. originals:
    KingsleyHeap vs. BSD allocator
    LeaHeap vs. DLmalloc 2.7
  Competitive runtime and memory efficiency
12
Overview
  Building memory managers
    Heap Layers framework
  Problems with memory managers
    Contention, space, false sharing
  Solution: provably scalable allocator
    Hoard
  Extended memory manager for servers
    Reap
13
Problems with General-Purpose Memory Managers
  Previous work for multiprocessors
    Concurrent single heap [Bigler et al. 85, Johnson 91, Iyengar 92]
      Impractical
    Multiple heaps [Larson 98, Gloger 99]
      Reduce contention but, as we show, cause other problems:
        P-fold or even unbounded increase in space
        Allocator-induced false sharing
14
Multiple Heap Allocator: Pure Private Heaps
  One heap per processor:
    malloc gets memory from its local heap
    free puts memory on its local heap
  Used by STL, Cilk, ad hoc allocators
  [Figure: timeline of x1 … x4 = malloc(1) and free(x1) … free(x4) across processor 0 and processor 1. Key: in use, processor 0; free, on heap 1]
15
Problem: Unbounded Memory Consumption
  Producer-consumer:
    Processor 0 allocates
    Processor 1 frees
  Unbounded memory consumption
    Crash!
  [Figure: processor 0 runs x1 = malloc(1), x2 = malloc(1), x3 = malloc(1); processor 1 runs free(x1), free(x2), free(x3)]
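The producer-consumer failure can be reproduced in a toy model. The sketch below is an assumption-laden simplification: blocks are unit-sized, each heap is only a counter of free blocks, and the names PurePrivateHeaps and producer_consumer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of pure private heaps. Not a real allocator: every block
// has size 1 and a heap is just the number of free blocks it holds.
struct PurePrivateHeaps {
  std::vector<std::size_t> free_blocks;  // free blocks held by each heap
  std::size_t from_os = 0;               // blocks ever requested from the OS

  explicit PurePrivateHeaps(std::size_t processors)
      : free_blocks(processors, 0) {}

  // malloc takes from the caller's own heap, or asks the OS for more.
  void malloc_on(std::size_t proc) {
    if (free_blocks[proc] > 0) {
      --free_blocks[proc];
    } else {
      ++from_os;
    }
  }

  // free puts the block on the heap of the processor that calls free.
  void free_on(std::size_t proc) { ++free_blocks[proc]; }
};

// Processor 0 allocates and processor 1 frees, one object at a time.
// At most one object is live, yet heap 0 never gets memory back.
std::size_t producer_consumer(std::size_t rounds) {
  PurePrivateHeaps heaps(2);
  for (std::size_t i = 0; i < rounds; ++i) {
    heaps.malloc_on(0);
    heaps.free_on(1);
  }
  return heaps.from_os;
}
```

In this model the memory requested from the OS equals the number of rounds, so it grows without bound even though the program's footprint is a single object.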
16
Multiple Heap Allocator: Private Heaps with Ownership
  free returns memory to original heap
  Bounded memory consumption
    No crash!
  Used by “Ptmalloc” (Linux), LKmalloc
  [Figure: x1 = malloc(1), free(x1), x2 = malloc(1), free(x2) across processor 0 and processor 1]
17
Problem: P-fold Memory Blowup
  Occurs in practice
  Round-robin producer-consumer
    processor i mod P allocates
    processor (i+1) mod P frees
  Footprint = 1 (2GB), but space = 3 (6GB)
    Exceeds 32-bit address space: Crash!
  [Figure: x1 = malloc(1), x2 = malloc(1), x3 = malloc(1) and free(x1), free(x2), free(x3) across processors 0, 1, and 2]
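The round-robin pattern can be checked with the same kind of toy model, now with ownership. As before this is only a sketch under stated assumptions: unit-sized blocks, heaps modelled as counters, and hypothetical names (OwnershipHeaps, round_robin).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of private heaps with ownership: a freed block always
// returns to the heap it was allocated from.
struct OwnershipHeaps {
  std::vector<std::size_t> free_blocks;  // free blocks held by each heap
  std::size_t from_os = 0;               // blocks ever requested from the OS

  explicit OwnershipHeaps(std::size_t processors)
      : free_blocks(processors, 0) {}

  // Allocates on the caller's heap and returns the owning heap's index.
  std::size_t malloc_on(std::size_t proc) {
    if (free_blocks[proc] > 0) {
      --free_blocks[proc];
    } else {
      ++from_os;
    }
    return proc;
  }

  // Whichever processor calls free, the block goes back to its owner.
  void free_block(std::size_t owner) { ++free_blocks[owner]; }
};

// In round i, processor i mod P allocates one object and processor
// (i+1) mod P frees it. Only one object is ever live.
std::size_t round_robin(std::size_t processors, std::size_t rounds) {
  OwnershipHeaps heaps(processors);
  for (std::size_t i = 0; i < rounds; ++i) {
    std::size_t owner = heaps.malloc_on(i % processors);
    heaps.free_block(owner);  // performed by processor (i + 1) % processors
  }
  return heaps.from_os;
}
```

Memory is now bounded, but at P blocks for a footprint of one block, which matches the slide's "footprint = 1, space = 3" for three processors.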
18
Problem: Allocator-Induced False Sharing
  False sharing
    Non-shared objects on same cache line
    Bane of parallel applications
    Extensively studied
  All these allocators cause false sharing!
  [Figure: x1 = malloc(1) and x2 = malloc(1), issued on processors 0 and 1, share a cache line: thrash…]
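Whether two objects can falsely share is a question of addresses, which the following sketch makes concrete. The 64-byte line size is an assumption (it varies by machine), and same_cache_line and PaddedCounter are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; real hardware may differ.
constexpr std::size_t kCacheLine = 64;

// True when both addresses fall within the same cache line.
inline bool same_cache_line(const void* a, const void* b) {
  auto ua = reinterpret_cast<std::uintptr_t>(a);
  auto ub = reinterpret_cast<std::uintptr_t>(b);
  return ua / kCacheLine == ub / kCacheLine;
}

// An application-level workaround: give each per-thread object a whole
// line to itself, so neighbours never share one.
struct alignas(kCacheLine) PaddedCounter {
  long value = 0;
};
```

An allocator that hands two adjacent small blocks to different threads produces a pair of pointers for which same_cache_line returns true; padding avoids that at the cost of space, whereas the slides that follow address it inside the allocator.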
19
So What Do We Do Now?
  Where do we put free memory?
    on central heap: heap contention
    on our own heap (pure private heaps): unbounded memory consumption
    on the original heap (private heaps with ownership): P-fold blowup
  How do we avoid false sharing?
20
Overview
  Building memory managers
    Heap Layers framework
  Problems with memory managers
    Contention, space, false sharing
  Solution: provably scalable allocator
    Hoard
  Extended memory manager for servers
    Reap
21
Hoard: Key Insights
  Bound local memory consumption
    Explicitly track utilization
    Move free memory to a global heap
    Provably bounds memory consumption
  Manage memory in large chunks
    Avoids false sharing
    Reduces heap contention
22
Overview of Hoard
  Manage memory in heap blocks
    Page-sized
    Avoids false sharing
  Allocate from local heap block
    Avoids heap contention
  Low utilization
    Move heap block to global heap
    Avoids space blowup
  [Figure: global heap above the per-processor heaps for processor 0 … processor P-1]
23
Summary of Analytical Results
  Space consumption: near-optimal worst case
    Hoard: O(n log M/m + P) {P « n}
    Optimal: O(n log M/m) [Robson 70]: ≈ bin-packing
    Private heaps with ownership: O(P n log M/m)
  Provably low synchronization
  n = memory required, M = biggest object size, m = smallest object size, P = processors
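To see how far apart these bounds are, one can compare them term by term. The lines below are a rough reading, assuming "log M/m" means log(M/m) and writing L for it; constants hidden by the O-notation are ignored, so the ratios are indicative only.

```latex
\[
  L = \log\frac{M}{m}, \qquad
  \frac{\text{ownership}}{\text{Hoard}}
    \sim \frac{P\,n\,L}{n\,L + P}
    = \frac{P}{1 + \dfrac{P}{n\,L}}
    \approx P \quad (P \ll n),
\]
\[
  \frac{\text{Hoard}}{\text{optimal}}
    \sim \frac{n\,L + P}{n\,L}
    = 1 + \frac{P}{n\,L}
    \approx 1 \quad (P \ll n).
\]
```

Under that reading, Hoard's bound differs from the optimal one only by an additive P, while private heaps with ownership can be worse by a factor of about P.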
24
Empirical Results
  Measure runtime on 14-processor Sun
  Allocators
    Solaris (system allocator)
    Ptmalloc (GNU libc)
    mtmalloc (Sun’s “MT-hot” allocator)
  Micro-benchmarks
    Threadtest: no sharing
    Larson: sharing (server-style)
    Cache-scratch: mostly reads & writes (tests for false sharing)
  Real application experience similar
25
Runtime Performance: threadtest
  speedup(x, P) = runtime(Solaris allocator, one processor) / runtime(x on P processors)
  Many threads, no sharing
  Hoard achieves linear speedup
26
Runtime Performance: Larson
  Many threads, sharing (server-style)
  Hoard achieves linear speedup
27
Runtime Performance: false sharing
  Many threads, mostly reads & writes of heap data
  Hoard achieves linear speedup
28
Hoard in the “Real World”
  Open source code
    www.hoard.org
    13,000 downloads
    Solaris, Linux, Windows, IRIX, …
  Widely used in industry
    AOL, British Telecom, Novell, Philips
    Reports: 2x-10x, “impressive” improvement in performance
    Search server, telecom billing systems, scene rendering, real-time messaging middleware, text-to-speech engine, telephony, JVM
  Scalable general-purpose memory manager
29
Overview
  Building memory managers
    Heap Layers framework
  Problems with memory managers
    Contention, space, false sharing
  Solution: provably scalable allocator
    Hoard
  Extended memory manager for servers
    Reap
30
Custom Memory Allocation
  Programmers often replace malloc / free
    Attempt to increase performance
    Provide extra functionality (e.g., for servers)
    Reduce space (rarely)
  Empirical study of custom allocators [OOPSLA 2002]
    Lea allocator often as fast or faster
    Custom allocation ineffective, except for regions
31
Overview of Regions
  Separate areas, deletion only en masse
  API: regioncreate(r), regionmalloc(r, sz), regiondelete(r)
  + Fast
    + Pointer-bumping allocation
    + Deletion of chunks
  + Convenient
    + One call frees all memory
  - Risky
    - Accidental deletion
    - Too much space
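The region interface above can be sketched as a small class. This is an illustrative simplification, not Apache's or any other real region implementation: the class name Region, the 4096-byte default chunk, and the mapping of regioncreate / regionmalloc / regiondelete onto the constructor, malloc, and clear are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

class Region {
public:
  explicit Region(std::size_t chunk_size = 4096) : chunk_size_(chunk_size) {}
  Region(const Region&) = delete;
  Region& operator=(const Region&) = delete;
  ~Region() { clear(); }

  // Pointer-bumping allocation: round the size up, then advance a cursor.
  void* malloc(std::size_t sz) {
    if (sz == 0) sz = 1;
    sz = (sz + kAlign - 1) & ~(kAlign - 1);
    if (sz > remaining_) {
      // Current chunk is exhausted: grab a new one from the system.
      std::size_t n = sz > chunk_size_ ? sz : chunk_size_;
      char* chunk = static_cast<char*>(std::malloc(n));
      if (chunk == nullptr) return nullptr;
      chunks_.push_back(chunk);
      bump_ = chunk;
      remaining_ = n;
    }
    void* p = bump_;
    bump_ += sz;
    remaining_ -= sz;
    return p;
  }

  // Deletion en masse: one call releases every chunk. There is no way
  // to give back a single object.
  void clear() {
    for (char* c : chunks_) std::free(c);
    chunks_.clear();
    bump_ = nullptr;
    remaining_ = 0;
  }

  std::size_t chunks() const { return chunks_.size(); }

private:
  static constexpr std::size_t kAlign = alignof(std::max_align_t);
  std::size_t chunk_size_;
  std::vector<char*> chunks_;
  char* bump_ = nullptr;
  std::size_t remaining_ = 0;
};
```

The sketch shows both sides of the slide: allocation is a few arithmetic operations, but memory that is no longer needed stays reserved until the whole region is cleared.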
32
Why Regions?
  Apparently faster, more space-efficient
  Servers need memory management support:
    Avoid resource leaks
    Tear down memory associated with terminated connections or transactions
  Current approach (e.g., Apache): regions
33
Drawbacks of Regions
  Can’t reclaim memory within regions
    Problem for long-running computations, producer-consumer patterns, off-the-shelf “malloc/free” programs
    Leads to unbounded memory consumption
  Current situation for Apache:
    vulnerable to denial-of-service
    limits runtime of connections
    limits module programming
34
Reap Hybrid Allocator
  Reap = region + heap
    Adds individual object deletion & heap
  API: reapcreate(r), reapmalloc(r, sz), reapfree(r, p), reapdelete(r)
  Can reduce memory consumption
  Fast
    Adapts to use (region or heap style)
    Cheap deletion
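A rough sketch of the hybrid idea follows. It is not the Reap implementation: it assumes a per-object size header and a simple first-fit free list where the real system uses a full heap, and the class name Reap, the constant kChunk, and the method names are illustrative stand-ins for reapcreate / reapmalloc / reapfree / reapdelete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

class Reap {
public:
  Reap() = default;
  Reap(const Reap&) = delete;
  Reap& operator=(const Reap&) = delete;
  ~Reap() { clear(); }

  void* malloc(std::size_t sz) {
    if (sz == 0) sz = 1;
    sz = (sz + kAlign - 1) & ~(kAlign - 1);
    // Heap style: reuse a freed object if one is big enough (first fit).
    for (Header** link = &free_list_; *link != nullptr; link = &(*link)->next) {
      if ((*link)->size >= sz) {
        Header* h = *link;
        *link = h->next;
        return h + 1;
      }
    }
    // Region style: bump-allocate a header followed by the payload.
    std::size_t need = sizeof(Header) + sz;
    if (need > remaining_) {
      std::size_t n = need > kChunk ? need : kChunk;
      char* chunk = static_cast<char*>(std::malloc(n));
      if (chunk == nullptr) return nullptr;
      chunks_.push_back(chunk);
      bump_ = chunk;
      remaining_ = n;
    }
    Header* h = ::new (bump_) Header{sz, nullptr};
    bump_ += need;
    remaining_ -= need;
    return h + 1;
  }

  // Individual deletion: push the object onto the free list for reuse.
  void free(void* p) {
    if (p == nullptr) return;
    Header* h = static_cast<Header*>(p) - 1;
    h->next = free_list_;
    free_list_ = h;
  }

  // Region-style deletion: release everything in one call.
  void clear() {
    for (char* c : chunks_) std::free(c);
    chunks_.clear();
    free_list_ = nullptr;
    bump_ = nullptr;
    remaining_ = 0;
  }

  std::size_t chunks() const { return chunks_.size(); }

private:
  struct alignas(std::max_align_t) Header {
    std::size_t size;  // payload size in bytes
    Header* next;      // next freed object, when on the free list
  };
  static constexpr std::size_t kAlign = alignof(std::max_align_t);
  static constexpr std::size_t kChunk = 4096;  // assumed chunk size
  std::vector<char*> chunks_;
  Header* free_list_ = nullptr;
  char* bump_ = nullptr;
  std::size_t remaining_ = 0;
};
```

If free is never called, the object behaves like a region with a small per-object header; once objects are freed, their space is recycled instead of accumulating until the whole reap is deleted.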
35
Using Reap as Regions
  Reap performance nearly matches regions
36
Reap: Best of Both Worlds
  Combining new / delete with regions usually impossible:
    Incompatible APIs
    Hard to rewrite code
  Use Reap: incorporate new / delete code into Apache
    “mod_bc” (arbitrary-precision calculator)
    Changed 20 lines (out of 8000)
  Benchmark: compute 1000th prime
    With Reap: 240K
    Without Reap: 7.4MB
37
Open Questions
  Grand Unified Memory Manager?
    Hoard + Reap
    Integration with garbage collection
  Effective Custom Allocators?
    Exploit sizes, lifetimes, locality and sharing
  Challenges of newer architectures
    NUMA, SMT/CMP, 64-bit, predication
38
Current Work: Robust Performance
  Currently: no VM-GC communication
    BAD interactions under memory pressure
  Our approach (with Eliot Moss, Scott Kaplan): Cooperative Robust Automatic Memory Management
  [Figure labels: garbage collector / allocator; virtual memory manager; LRU queue; memory pressure; empty pages; reduced impact]
39
Current Work: Predictable VMM
  Recent work on scheduling for QoS
    E.g., proportional-share
  Under memory pressure, VMM is scheduler
    Paged-out processes may never recover
    Intermittent processes may wait a long time
  Scheduler-faithful virtual memory (with Scott Kaplan, Prashant Shenoy)
    Based on page value rather than order
40
Conclusion
  Memory management for high-performance applications
    Heap Layers framework [PLDI 2001]
      Reusable components, no runtime cost
    Hoard scalable memory manager [ASPLOS-IX]
      High-performance, provably scalable & space-efficient
    Reap hybrid memory manager [OOPSLA 2002]
      Provides speed & robustness for server applications
  Current work: robust memory management for multiprogramming
41
The Obligatory URL Slide
  http://www.cs.umass.edu/~emery
42
If You Can Read This, I Went Too Far
43
Hoard: Under the Hood
  [Figure labels: select heap based on size; malloc from local heap, free to heap block; get or return memory to global heap]
44
Custom Memory Allocation
  Very common practice
    Apache, gcc, lcc, STL, database servers…
    Language-level support in C++
  Replace new / delete, bypassing general-purpose allocator
    Reduce runtime – often
    Expand functionality – sometimes
    Reduce space – rarely
  “Use custom allocators”
45
Drawbacks of Custom Allocators
  Avoiding memory manager means:
    More code to maintain & debug
    Can’t use memory debuggers
    Not modular or robust:
      Mix memory from custom and general-purpose allocators → crash!
  Increased burden on programmers
46
Overview
  Introduction
  Perceived benefits and drawbacks
  Three main kinds of custom allocators
  Comparison with general-purpose allocators
  Advantages and drawbacks of regions
  Reaps – generalization of regions & heaps
47
(I) Per-Class Allocators
  Recycle freed objects from a free list
  a = new Class1; b = new Class1; c = new Class1;
  delete a; delete b; delete c;
  a = new Class1; b = new Class1; c = new Class1;
  [Figure: objects a, b, c]
  + Fast
    + Linked list operations
  + Simple
    + Identical semantics
    + C++ language support
  - Possibly space-inefficient
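A per-class allocator of this kind can be written with C++'s class-level operator new and operator delete. The sketch below is illustrative only: Class1 is the name used on the slide, while its payload member, FreeNode, and free_list_ are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Class1 {
public:
  // Allocation: pop a recycled object if one is available.
  static void* operator new(std::size_t sz) {
    if (sz == sizeof(Class1) && free_list_ != nullptr) {
      FreeNode* n = free_list_;
      free_list_ = n->next;
      return n;
    }
    return ::operator new(sz);
  }

  // Deallocation: push the object's storage onto the free list instead
  // of returning it to the general-purpose allocator.
  static void operator delete(void* p, std::size_t sz) {
    if (p == nullptr) return;
    if (sz != sizeof(Class1)) {  // e.g. a larger derived class
      ::operator delete(p);
      return;
    }
    free_list_ = ::new (p) FreeNode{free_list_};
  }

  int payload[4] = {0, 0, 0, 0};  // hypothetical contents

private:
  struct FreeNode {
    FreeNode* next;
  };
  static FreeNode* free_list_;
};

Class1::FreeNode* Class1::free_list_ = nullptr;

static_assert(sizeof(Class1) >= sizeof(void*),
              "a freed object must be able to hold the free-list link");
```

Callers keep writing plain new Class1 and delete, which is the "identical semantics" advantage; the cost is that recycled storage is never handed back for other types, which is where the possible space inefficiency comes from.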
48
(II) Custom Patterns
  Tailor-made to fit allocation patterns
    Example: 197.parser (natural language parser)
  char[MEMORY_LIMIT]
  a = xalloc(8); b = xalloc(16); c = xalloc(8);
  xfree(b); xfree(c);
  d = xalloc(8);
  [Figure: array holding a, b, c, d, with end_of_array marking the top]
  + Fast
    + Pointer-bumping allocation
  - Brittle
    - Fixed memory size
    - Requires stack-like lifetimes
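The stack-like pattern can be sketched as below. This is not the 197.parser source: xalloc, xfree, MEMORY_LIMIT, and end_of_array are the names on the slide, but the header layout, the array size, and the rule that a free pops every trailing freed block are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t MEMORY_LIMIT = 1 << 16;  // assumed size

// Per-block header: link to the block below and a "freed" mark.
struct alignas(std::max_align_t) Header {
  Header* prev;
  bool freed;
};

alignas(std::max_align_t) static char memory[MEMORY_LIMIT];
static char* end_of_array = memory;  // first unused byte
static Header* top = nullptr;        // most recently allocated block

// Pointer-bumping allocation from the fixed array.
void* xalloc(std::size_t sz) {
  constexpr std::size_t align = alignof(std::max_align_t);
  if (sz == 0) sz = 1;
  sz = (sz + align - 1) & ~(align - 1);
  std::size_t need = sizeof(Header) + sz;
  std::size_t left = static_cast<std::size_t>(memory + MEMORY_LIMIT - end_of_array);
  if (need > left) return nullptr;  // fixed memory size: no growth
  Header* h = ::new (end_of_array) Header{top, false};
  top = h;
  end_of_array += need;
  return h + 1;
}

// Marks the block freed, then pops every freed block at the top.
// Space below a live block cannot be reclaimed until that block dies.
void xfree(void* p) {
  if (p == nullptr) return;
  Header* h = static_cast<Header*>(p) - 1;
  h->freed = true;
  while (top != nullptr && top->freed) {
    end_of_array = reinterpret_cast<char*>(top);
    top = top->prev;
  }
}
```

With the slide's sequence, freeing b alone reclaims nothing because c still sits above it; freeing c then pops both, so d lands where b was. Lifetimes that are not stack-like leave freed space stranded, which is the brittleness the slide points out.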
49
(III) Regions
  Separate areas, deletion only en masse
  API: regioncreate(r), regionmalloc(r, sz), regiondelete(r)
  + Fast
    + Pointer-bumping allocation
    + Deletion of chunks
  + Convenient
    + One call frees all memory
  - Risky
    - Accidental deletion
    - Too much space
50
Overview
  Introduction
  Perceived benefits and drawbacks
  Three main kinds of custom allocators
  Comparison with general-purpose allocators
  Advantages and drawbacks of regions
  Reaps – generalization of regions & heaps
51
Custom Allocators Are Faster…
52
Not So Fast…
53
The Lea Allocator (DLmalloc 2.7.0)
  Optimized for common allocation patterns
    Per-size quicklists ≈ per-class allocation
    Deferred coalescing (combining adjacent free objects)
    Highly optimized fastpath
  Space-efficient
54
Space Consumption Results
55
Overview
  Introduction
  Perceived benefits and drawbacks
  Three main kinds of custom allocators
  Comparison with general-purpose allocators
  Advantages and drawbacks of regions
  Reaps – generalization of regions & heaps
61
Conclusion
  Empirical study of custom allocators
    Lea allocator often as fast or faster
    Custom allocation ineffective, except for regions
  Reaps:
    Nearly match region performance without other drawbacks
  Take-home message:
    Stop using custom memory allocators!
62
Software
  http://www.cs.umass.edu/~emery (part of Heap Layers distribution)