Composing High-Performance Memory Allocators Emery Berger, Ben Zorn, Kathryn McKinley.

Presentation transcript:

Composing High-Performance Memory Allocators Emery Berger, Ben Zorn, Kathryn McKinley

PLDI Composing High-Performance Memory Allocators - Berger, Zorn, McKinley

Motivation & Contributions
Programs increasingly allocation intensive
– spend more than half of runtime in malloc / free
⇒ programmers require high-performance allocators
– often build their own custom allocators
Heap layers: infrastructure for building memory allocators
– composable, extensible, and high-performance
– based on C++ templates
– custom and general-purpose, competitive with state-of-the-art

Outline
High-performance memory allocators
– focus on custom allocators
– pros & cons of current practice
Previous work
Heap layers
– how it works
– examples
Experimental results
– custom & general-purpose allocators

Using Custom Allocators
Can be very fast:
– Linked lists of objects for highly-used classes
– Region (arena, zone) allocators
“Best practices” [Meyers 1995, Bulka 2001]
– Used in 3 SPEC2000 benchmarks (parser, gcc, vpr), Apache, PGP, SQLServer, etc.
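To make the region (arena, zone) idea concrete, here is a minimal sketch of a bump-pointer region. It is not taken from the talk: the class name Region and its members are hypothetical, and it assumes a fixed capacity chosen up front. Allocation is a pointer increment, and every object is released at once when the region is destroyed.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical bump-pointer region (illustrative only, not the authors' code).
class Region {
public:
  explicit Region(std::size_t capacity)
    : base_(static_cast<char *>(std::malloc(capacity))),
      used_(0),
      capacity_(base_ ? capacity : 0) {}

  // Destroying the region frees every object allocated from it.
  ~Region() { std::free(base_); }

  Region(const Region &) = delete;
  Region & operator=(const Region &) = delete;

  // Returns nullptr when the region cannot satisfy the request.
  void * allocate(std::size_t sz) {
    const std::size_t align = alignof(std::max_align_t);
    if (sz > capacity_ - used_) return nullptr;
    // Round the request up so the next object stays suitably aligned.
    const std::size_t rounded = (sz + align - 1) / align * align;
    if (rounded > capacity_ - used_) return nullptr;
    void * p = base_ + used_;
    used_ += rounded;
    return p;
  }

  std::size_t used() const { return used_; }

private:
  char * base_;
  std::size_t used_;
  std::size_t capacity_;
};
```

There is no per-object free in this sketch, which is where the speed comes from and also why regions only suit objects that die together.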

Custom Allocators Work
Using a custom allocator reduces runtime by 60%

Problems with Current Practice
Brittle code
– written from scratch
– macros/monolithic functions to avoid overhead
⇒ hard to write, reuse or maintain
Excessive fragmentation
– good memory allocators: complicated, not retargetable

Allocator Conceptual Design
People think & talk about heaps as if they were modular:
[Diagram: malloc / free; select heap based on size; manage small objects; manage large objects; system memory manager]

Infrastructure Requirements
Flexible
– can add functionality
Reusable
– in other contexts & in same program
Fast
– very low or no overhead
High-level
– as component-like as possible

Possible Solutions
Criteria: flexible, reusable, fast, high-level
Indirect function calls (Vmalloc [Vo 1996])
– function call overhead
– function-pointer assignment
Object-oriented (CMM [Attardi et al. 1998])
– rigid hierarchy
– virtual method overhead
Mixins (our approach)

Ordinary Classes vs. Mixins
Ordinary classes
– fixed inheritance DAG
– can’t rearrange hierarchy
– can’t use class multiple times
Mixins
– no fixed inheritance DAG
– multiple hierarchies possible
– can reuse classes
– fast: static dispatch
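The contrast can be sketched with two small mixins. All names here (Plain, Other, AddOne, Doubled, First, Second, Third) are hypothetical and chosen only for illustration. Each mixin inherits from its template parameter, so the same classes can be stacked in different orders, applied to different bases, and used more than once in a single hierarchy. Every call is resolved at compile time, with no virtual functions involved.

```cpp
// Two unrelated base classes.
struct Plain { int value() const { return 1; } };
struct Other { int value() const { return 10; } };

// A mixin: its superclass is a template parameter, not a fixed class.
template <class Super>
struct AddOne : public Super {
  int value() const { return Super::value() + 1; }
};

template <class Super>
struct Doubled : public Super {
  int value() const { return 2 * Super::value(); }
};

// The same mixins arranged into different hierarchies.
using First  = AddOne<Doubled<Plain>>;   // (1 * 2) + 1
using Second = Doubled<AddOne<Plain>>;   // (1 + 1) * 2
using Third  = AddOne<AddOne<Other>>;    // AddOne used twice over another base
```

With ordinary classes, AddOne would have to name one specific parent, so neither the reordering in Second nor the double use in Third would be possible without writing new classes.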

A Heap Layer
    template <class SuperHeap>
    class HeapLayer : public SuperHeap {…};

    void * malloc (sz) {
      do something;
      void * p = SuperHeap::malloc (sz);
      do something else;
      return p;
    }
Provides malloc and free methods
“Top heaps” get memory from system
– e.g., mallocHeap uses C library’s malloc and free
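A compilable version of this pattern might look like the sketch below. MallocHeap stands in for the top heap the slide calls mallocHeap, and CountingHeap is a hypothetical layer invented for this example: its "do something" step is just keeping a count of live objects.

```cpp
#include <cstddef>
#include <cstdlib>

// Top heap: obtains memory from the C library.
class MallocHeap {
public:
  void * malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void * p) { std::free(p); }
};

// Hypothetical heap layer: counts objects that are currently allocated.
template <class SuperHeap>
class CountingHeap : public SuperHeap {
public:
  void * malloc(std::size_t sz) {
    void * p = SuperHeap::malloc(sz);  // delegate to the parent heap
    if (p) ++live_;                    // then do the layer's own work
    return p;
  }

  void free(void * p) {
    if (p) --live_;
    SuperHeap::free(p);
  }

  std::size_t live() const { return live_; }

private:
  std::size_t live_ = 0;
};
```

Because the parent is a template parameter, the call to SuperHeap::malloc is an ordinary statically bound call that the compiler is free to inline.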

Example: Thread-safety
LockedHeap protects the parent heap with a single lock
    class LockedMallocHeap : public LockedHeap<mallocHeap> {};

    void * malloc (sz) {
      acquire lock;
      void * p = SuperHeap::malloc (sz);
      release lock;
      return p;
    }
[Diagram: LockedHeap layered on mallocHeap]
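One way to realize this layer in modern C++ is sketched below. The choice of std::mutex and std::lock_guard is an assumption made for this example; the slide only says "acquire lock" and "release lock" and does not specify the lock type. MallocHeap again stands in for mallocHeap.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Top heap: obtains memory from the C library.
class MallocHeap {
public:
  void * malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void * p) { std::free(p); }
};

// Serializes every call into the parent heap with a single lock.
template <class SuperHeap>
class LockedHeap : public SuperHeap {
public:
  void * malloc(std::size_t sz) {
    std::lock_guard<std::mutex> guard(lock_);  // released when guard goes out of scope
    return SuperHeap::malloc(sz);
  }

  void free(void * p) {
    std::lock_guard<std::mutex> guard(lock_);
    SuperHeap::free(p);
  }

private:
  std::mutex lock_;
};

// Composition from the slide: a thread-safe heap built on the C library.
class LockedMallocHeap : public LockedHeap<MallocHeap> {};
```

The same LockedHeap can wrap any other heap unchanged, which is the reuse the mixin approach is meant to provide.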

Example: Debugging
DebugHeap protects against invalid & multiple frees.
    class LockedDebugMallocHeap : public LockedHeap< DebugHeap<mallocHeap> > {};

    void free (p) {
      check that p is valid;
      check that p hasn’t been freed before;
      SuperHeap::free (p);
    }
[Diagram: LockedHeap layered on DebugHeap layered on mallocHeap]
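The two checks can be sketched by remembering which pointers are currently live. This is an assumption about how such a layer could work, not the authors' implementation: the use of std::unordered_set, the rejected() counter, and the policy of silently ignoring a bad free are all choices made for this example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Top heap: obtains memory from the C library.
class MallocHeap {
public:
  void * malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void * p) { std::free(p); }
};

// Hypothetical debugging layer: refuses invalid and repeated frees.
template <class SuperHeap>
class DebugHeap : public SuperHeap {
public:
  void * malloc(std::size_t sz) {
    void * p = SuperHeap::malloc(sz);
    if (p) live_.insert(p);  // remember every pointer handed out
    return p;
  }

  void free(void * p) {
    if (!p) return;
    // erase() returns 0 if p never came from this heap or was already freed.
    if (live_.erase(p) == 0) {
      ++rejected_;
      return;                // do not pass the bad pointer to the parent
    }
    SuperHeap::free(p);
  }

  std::size_t rejected() const { return rejected_; }

private:
  std::unordered_set<void *> live_;
  std::size_t rejected_ = 0;
};
```

Note that the bookkeeping set allocates its own memory through the default allocator, which is acceptable for a sketch but something a production layer would need to account for.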

Implementation in Heap Layers
Modular design and implementation
[Diagram: malloc / free served by three stacked layers]
SegHeap: select heap based on size
SizeHeap: add size info to objects
FreelistHeap: manage objects on freelist
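Two of these layers are small enough to sketch. The code below is an illustration under stated assumptions, not the Heap Layers source: FreelistHeap assumes every request it sees belongs to one size class (which is what a size-selecting layer such as SegHeap would arrange), and the header layout and getSize name in SizeHeap are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Top heap: obtains memory from the C library.
class MallocHeap {
public:
  void * malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void * p) { std::free(p); }
};

// Keeps freed objects on a singly linked list and reuses them.
// Assumes all requests are for the same size class.
template <class SuperHeap>
class FreelistHeap : public SuperHeap {
public:
  ~FreelistHeap() {
    while (head_) { Node * n = head_; head_ = n->next; SuperHeap::free(n); }
  }
  void * malloc(std::size_t sz) {
    if (head_) { Node * n = head_; head_ = n->next; return n; }
    // Objects must be large enough to hold the list link once freed.
    return SuperHeap::malloc(sz < sizeof(Node) ? sizeof(Node) : sz);
  }
  void free(void * p) {
    if (!p) return;
    Node * n = static_cast<Node *>(p);
    n->next = head_;
    head_ = n;
  }
private:
  struct Node { Node * next; };
  Node * head_ = nullptr;
};

// Prepends a header recording each object's requested size.
template <class SuperHeap>
class SizeHeap : public SuperHeap {
public:
  void * malloc(std::size_t sz) {
    if (sz > SIZE_MAX - sizeof(Header)) return nullptr;
    Header * h = static_cast<Header *>(SuperHeap::malloc(sizeof(Header) + sz));
    if (!h) return nullptr;
    h->size = sz;
    return h + 1;  // the object starts right after the header
  }
  void free(void * p) {
    if (p) SuperHeap::free(static_cast<Header *>(p) - 1);
  }
  static std::size_t getSize(const void * p) {
    return (static_cast<const Header *>(p) - 1)->size;
  }
private:
  // The union pads the header so the object after it stays aligned.
  union Header { std::size_t size; std::max_align_t align; };
};
```

Composing them as SizeHeap<FreelistHeap<MallocHeap>> gives a heap whose objects carry their size and whose freed objects are recycled before the C library is asked for more memory.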

Experimental Methodology
Built replacement allocators using heap layers
– custom allocators: XallocHeap (197.parser), ObstackHeap (176.gcc)
– general-purpose allocators: KingsleyHeap (BSD allocator), LeaHeap (based on Lea allocator 2.7.0)
– three weeks to develop
– 500 lines vs. 2,000 lines in original
Compared performance with original allocators
– SPEC benchmarks & standard allocation benchmarks

Experimental Results: Custom Allocation – gcc

Experimental Results: General-Purpose Allocators

Experimental Results: General-Purpose Allocators (continued)

Conclusion
Heap layers: infrastructure for composing allocators
Useful experimental infrastructure
Allows rapid implementation of high-quality allocators
– custom allocators as fast as originals
– general-purpose allocators comparable to state-of-the-art in speed and efficiency

A Library of Heap Layers
Top heaps
– mallocHeap, mmapHeap, sbrkHeap
Building-blocks
– AdaptHeap, FreelistHeap, CoalesceHeap
Combining heaps
– HybridHeap, TryHeap, SegHeap, StrictSegHeap
Utility layers
– ANSIWrapper, DebugHeap, LockedHeap, PerClassHeap, STLAdapter

Heap Layers as Experimental Infrastructure
Kingsley allocator averages 50% internal fragmentation
– what’s the impact of adding coalescing?
Just add a coalescing layer
– two lines of code!
Result:
– Almost as memory-efficient as Lea allocator
– Reasonably fast for all but the most allocation-intensive apps