1
Scalable Locality-Conscious Multithreaded Memory Allocation
  Scott Schneider, Christos D. Antonopoulos, Dimitrios S. Nikolopoulos
  The College of William and Mary
  The 2006 International Symposium on Memory Management
  June 10, 2006
2
Outline
  Introduction
  Related Work
  Streamflow design: data structures and operations
  Experimental Evaluation
  Conclusions
3
Introduction
  Multithreading is becoming more common
  Sophistication of system software trails hardware
  Synchronization mechanisms used in system software can greatly affect performance
4
Related Work
  Hoard: Emery Berger et al., ASPLOS 2000
    Lock based, per-processor and global heaps
  Michael’s: Maged Michael, PLDI 2004
    Lock-free
  Tcmalloc: Sanjay Ghemawat, part of Google’s perftools
    Lock based
5
Streamflow
  Promote scalability and reduce latency
    Lock-free algorithms and data structures
    Synchronization-free in the common case
    Decoupled remote object deallocation
  Promote locality
    Favors locally recycled objects in private heaps
    Thread-local heaps reduce false sharing
    Removing object headers
    Custom page manager
6
Design: Data Structures
7
[Figure labels: heaps, pageblocks]
14
Design: Allocation
19
Design: Local Free
  Pageblock belongs to the current thread
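The local case can be illustrated with a minimal C sketch. It is not Streamflow's code: the `pageblock` layout, the field names, and the functions `pageblock_init`, `local_alloc`, and `local_free` are all hypothetical. It only shows why a free by the owning thread needs no synchronization (the free list is private to the owner) and how a LIFO list hands back the most recently freed object first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; names and fields are illustrative only. */
typedef struct free_node { struct free_node *next; } free_node;

typedef struct pageblock {
    uintptr_t  owner;       /* id of the owning thread */
    size_t     object_size; /* every object in a pageblock has this size */
    free_node *free_list;   /* touched only by the owner, so no atomics */
    size_t     free_count;
} pageblock;

/* Carve a pointer-aligned buffer into equally sized objects and link them. */
static void pageblock_init(pageblock *pb, uintptr_t owner,
                           void *mem, size_t bytes, size_t object_size)
{
    assert(object_size >= sizeof(free_node));
    assert(object_size % sizeof(void *) == 0);
    pb->owner = owner;
    pb->object_size = object_size;
    pb->free_list = NULL;
    pb->free_count = 0;
    char *base = mem;
    for (size_t off = 0; off + object_size <= bytes; off += object_size) {
        free_node *n = (void *)(base + off);
        n->next = pb->free_list;
        pb->free_list = n;
        pb->free_count++;
    }
}

/* Pop the head of the private free list; NULL when the pageblock is full. */
static void *local_alloc(pageblock *pb)
{
    free_node *n = pb->free_list;
    if (n == NULL)
        return NULL;
    pb->free_list = n->next;
    pb->free_count--;
    return n;
}

/* Returns 1 if the free was local and handled here, 0 if the pageblock
 * belongs to another thread and the caller must take the remote path. */
static int local_free(pageblock *pb, uintptr_t self, void *obj)
{
    if (pb->owner != self)
        return 0;
    free_node *n = obj;
    n->next = pb->free_list;   /* LIFO: most recently freed is reused first */
    pb->free_list = n;
    pb->free_count++;
    return 1;
}
```

Because the freed object goes to the head of the list, the next allocation from the same pageblock returns it again, which is one simple way to favor locally recycled objects.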
21
Design: Remote Free
  Pageblock does not belong to the current thread
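A remote free can be sketched as a lock-free push onto a per-pageblock list that only the owner drains. The code below uses C11 atomics and is an illustration under assumptions, not the Streamflow implementation: `remote_list`, `remote_free`, and `remote_drain` are hypothetical names, and the real design may pack more state into the word it updates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names; a freed object's first word stores the link. */
typedef struct rnode { struct rnode *next; } rnode;

typedef struct remote_list {
    _Atomic(rnode *) head;  /* pushed by any thread, drained by the owner */
} remote_list;

static void remote_list_init(remote_list *rl)
{
    atomic_init(&rl->head, (rnode *)NULL);
}

/* Called by a thread that does not own the pageblock.
 * Retries the compare-and-swap until the push succeeds. */
static void remote_free(remote_list *rl, void *obj)
{
    rnode *n = obj;
    rnode *old = atomic_load_explicit(&rl->head, memory_order_relaxed);
    do {
        n->next = old;  /* old is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(
                 &rl->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Called only by the owner: detach the whole list in one atomic swap,
 * then walk it privately and return the objects to the local free list. */
static rnode *remote_drain(remote_list *rl)
{
    return atomic_exchange_explicit(&rl->head, (rnode *)NULL,
                                    memory_order_acquire);
}
```

In this sketch, remote threads only push and the owner only swaps the entire list out, so no thread ever pops a single node concurrently. That keeps the list free of the ABA problem without version counters, and it decouples the remote deallocation from the owner's allocation path.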
22
Design: Page Manager
  Manages pageblocks
  Implemented using superpages: 4 MB vs. 4 KB pages
    Allows Streamflow to allocate pageblocks in contiguous physical memory regions
    Reduces TLB misses and minor page faults
  Superpage headers are managed similarly to small objects
  Pageblocks are allocated within a superpage using buddy allocation
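Buddy allocation inside a superpage can be sketched as follows. This is a generic buddy allocator over offsets, not Streamflow's page manager: the 16 KB minimum pageblock size, the tree encoding, and the names `superpage_alloc`, `superpage_free`, and `buddy_of` are assumptions made for illustration. Only the 4 MB superpage size comes from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 4 MB superpage (from the slide); 16 KB minimum block is an assumption. */
enum { MIN_BLOCK = 16 << 10, MAX_ORDER = 8 };  /* 16 KB << 8 == 4 MB */
enum { NODE_FREE = 0, NODE_SPLIT = 1, NODE_USED = 2 };

typedef struct superpage {
    /* Binary tree in an array: root at 1, children of i at 2i and 2i+1. */
    unsigned char node[2 << MAX_ORDER];
} superpage;

static void superpage_init(superpage *sp) { memset(sp, 0, sizeof *sp); }

/* A block's buddy is the equal-sized neighbor it can merge with. */
static long buddy_of(long offset, int order)
{
    return offset ^ ((long)MIN_BLOCK << order);
}

static long alloc_at(superpage *sp, size_t i, int node_order, int want)
{
    if (sp->node[i] == NODE_USED)
        return -1;
    if (node_order == want) {
        if (sp->node[i] != NODE_FREE)
            return -1;
        sp->node[i] = NODE_USED;
        size_t first = (size_t)1 << (MAX_ORDER - node_order);
        return (long)(i - first) * ((long)MIN_BLOCK << node_order);
    }
    if (sp->node[i] == NODE_FREE)
        sp->node[i] = NODE_SPLIT;  /* both halves below are still free */
    long off = alloc_at(sp, 2 * i, node_order - 1, want);
    if (off < 0)
        off = alloc_at(sp, 2 * i + 1, node_order - 1, want);
    return off;
}

/* Returns the byte offset of a block of MIN_BLOCK << order bytes, or -1. */
static long superpage_alloc(superpage *sp, int order)
{
    assert(order >= 0 && order <= MAX_ORDER);
    return alloc_at(sp, 1, MAX_ORDER, order);
}

/* Free a block and merge upward while its buddy is also free. */
static void superpage_free(superpage *sp, long offset, int order)
{
    size_t i = ((size_t)1 << (MAX_ORDER - order))
             + (size_t)(offset / ((long)MIN_BLOCK << order));
    assert(sp->node[i] == NODE_USED);
    sp->node[i] = NODE_FREE;
    while (i > 1 && sp->node[i ^ 1] == NODE_FREE) {
        i /= 2;
        sp->node[i] = NODE_FREE;
    }
}
```

Splitting always halves a block and merging always rejoins the two halves, so every pageblock is aligned to its own size and the allocator needs no per-block size header beyond the tree state.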
23
Evaluation: System
  4-processor Dell PowerEdge 6650
    Hyper-Threaded Intel Xeon processors at 2.0 GHz
    2 GB RAM
  SUSE Linux 9.1 with kernel 2.6.13.4 and glibc 2.3.3
  Hoard version 3.3.0
  Tcmalloc version 0.4
  Custom 32-bit implementation of Michael’s allocator
24
Evaluation: Benchmarks
  Sequential
    Parser: SPECINT2000 English parser
  Multithreaded
    Synthetic
      Recycle: stresses local allocations and frees
      Larson: server simulator; stresses remote frees
      Consume: producer-consumer
    Applications
      MPCDM: multithreaded mesh generation
25
Evaluation: Sequential
  [Figure labels: sequential, Streamflow, multithreaded]
26
Evaluation: Multithreaded
30
Conclusions
  Presented a new memory allocator design
    Uses lock-free algorithms and data structures
    Synchronization-free in the common case
    Promotes locality at multiple levels
  Experimental evaluation shows the design performs well in practice
  http://www.cs.wm.edu/streamflow