Institute of Computing Technology On Improving Heap Memory Layout by Dynamic Pool Allocation Zhenjiang Wang Chenggang Wu Institute of Computing Technology,

On Improving Heap Memory Layout by Dynamic Pool Allocation
Zhenjiang Wang, Chenggang Wu, Institute of Computing Technology, Chinese Academy of Sciences
Pen-Chung Yew, University of Minnesota

Outline: Introduction, Dynamic Pool Allocation, Evaluation, Conclusion

Dynamic Memory Allocation
Dynamic heap memory allocation is widely used in modern programs. General-purpose heap allocators focus more on runtime overhead and memory utilization than on the layout of heap data.
[Figure: heap layout of List 1 nodes, List 2 nodes and tree nodes under the Lea allocator (dlmalloc, from which glibc's malloc is derived).]

Pool Allocation
Pool allocation aggregates heap objects into separate memory pools at the time of their allocation.
[Figure: with pool allocation, List 1 nodes, List 2 nodes and tree nodes are placed in separate pools (Pool 1, Pool 2, Pool 3).]
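
To make the contrast between the two slides concrete, here is a toy Python model, not the paper's implementation: a bump-pointer region stands in for the heap, nodes of two lists and a tree are allocated in interleaved order, and we count how many 64-byte cache lines a traversal of List 1 touches. The 16-byte node size, the pool base addresses and the `BumpRegion` class are illustrative assumptions.

```python
LINE = 64  # assumed cache-line size in bytes


class BumpRegion:
    """Toy region that hands out consecutive addresses and never frees."""

    def __init__(self, base: int) -> None:
        self.next = base

    def alloc(self, size: int) -> int:
        addr = self.next
        self.next += size
        return addr


def lines_touched(addrs):
    """Distinct cache lines holding the objects (16-byte objects never straddle a line here)."""
    return len({a // LINE for a in addrs})


def simulate(use_pools: bool, n: int = 30, size: int = 16):
    shared = BumpRegion(0)
    # One region per data structure, placed far apart (hypothetical base addresses).
    pools = {"list1": BumpRegion(1 << 20), "list2": BumpRegion(2 << 20),
             "tree": BumpRegion(3 << 20)}
    addrs = {kind: [] for kind in pools}
    for _ in range(n):
        for kind in ("list1", "list2", "tree"):  # interleaved allocation order
            region = pools[kind] if use_pools else shared
            addrs[kind].append(region.alloc(size))
    return addrs


print(lines_touched(simulate(use_pools=False)["list1"]))  # 22
print(lines_touched(simulate(use_pools=True)["list1"]))   # 8
```

With one shared region, consecutive List 1 nodes are 48 bytes apart, so 30 nodes span 22 lines; in a dedicated pool they are packed four per line and fit in 8.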

Related Work
Garbage collectors [Chilimbi, 1998] [Huang, 2004] [Serrano, 2009]: the GC can move objects at runtime.
Compilers [Lattner, 2005]: data structure based.
Profiling [Seidl, 1998] [Barrett, 1993] [Chilimbi, 2006] [Calder, 1998]: hot data streams, object lifetimes, etc.
Runtime [Zhao, 2006]: call-site based.

Allocation Site
Heap objects allocated from the same call instruction are often affinitive. However, an allocation wrapper can trick a call-site based scheme into aggregating all heap objects into one pool.

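Example
main:
    …
    p = safe_malloc(16)    Pool 1
    …
    q = safe_malloc(28)    Pool 2
    …
    r = safe_malloc(40)    Pool 3
    …
safe_malloc:
    …
    w = malloc(n)          Pool 1
    …
Because every object is actually requested by the single malloc call inside safe_malloc, a call-site based scheme puts all of them into Pool 1.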
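
The failure mode can be reproduced in Python as an analogy only: DPA unwinds the native call stack of a C program, whereas this sketch uses CPython's `sys._getframe` to read the caller's function name and line number. The stand-in `malloc` files each request under two hypothetical keys: the call instruction that invoked it (the call-site scheme), and that call plus the next frame out, which behaves like a fixed-length call chain of length 2.

```python
import sys
from collections import defaultdict

# Hypothetical bookkeeping: instead of real pools, record which key each
# request would be filed under. A "site" is (function name, line number).
by_call_site = defaultdict(list)   # key: the call that invoked malloc
by_two_frames = defaultdict(list)  # key: that call plus the call one frame out


def site(frame):
    return (frame.f_code.co_name, frame.f_lineno)


def malloc(n):
    caller = sys._getframe(1)  # frame that executed the call to malloc
    outer = caller.f_back      # the frame that called that function
    by_call_site[site(caller)].append(n)
    by_two_frames[(site(outer), site(caller))].append(n)
    return bytearray(n)        # stand-in for real memory


def safe_malloc(n):
    # The only malloc call site in the program; a C wrapper would also check for NULL.
    w = malloc(n)
    return w


def main():
    p = safe_malloc(16)
    q = safe_malloc(28)
    r = safe_malloc(40)
    return p, q, r


main()
print(len(by_call_site))   # 1: all three requests share safe_malloc's call
print(len(by_two_frames))  # 3: one key per call in main
```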

Full Call Chain
[Figure: example call graph over main, foo, bar, aaa, bbb, ccc, wrapper and malloc, illustrating full call chains.]

Fixed-length Call Chain
[Figure: the same call graph, illustrating fixed-length call chains.]

Adaptive Partial Call Chain
[Figure: the same call graph, illustrating adaptive partial call chains.]
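
A minimal sketch of the three keying policies, assuming call chains are given outermost-first as tuples of function names (a real implementation would use return addresses, so two calls inside the same function would differ) and assuming the set of wrapper functions is already known. `adaptive_chain` encodes one plausible reading of an adaptive partial call chain: keep the allocator, every wrapper directly above it, and the first non-wrapper caller. The four example chains are hypothetical.

```python
from typing import FrozenSet, Tuple

Chain = Tuple[str, ...]  # outermost caller first, allocator last


def full_chain(chain: Chain) -> Chain:
    """Key on every frame from main down to the allocator."""
    return chain


def fixed_chain(chain: Chain, k: int = 2) -> Chain:
    """Key on the innermost k frames only."""
    return chain[-k:]


def adaptive_chain(chain: Chain, wrappers: FrozenSet[str]) -> Chain:
    """Key on the allocator, the wrappers directly above it, and the first
    non-wrapper caller (one plausible reading of an adaptive partial chain)."""
    i = len(chain) - 2  # the frame that called the allocator
    while i > 0 and chain[i] in wrappers:
        i -= 1          # walk outward past wrapper frames
    return chain[max(i, 0):]


WRAPPERS = frozenset({"wrapper"})  # assumed output of wrapper recognition
chains = [
    ("main", "foo", "malloc"),
    ("main", "aaa", "bbb", "wrapper", "malloc"),
    ("main", "foo", "bar", "wrapper", "malloc"),
    ("main", "ccc", "bar", "wrapper", "malloc"),
]
for name, key in [("full", full_chain), ("fixed", fixed_chain),
                  ("adaptive", lambda c: adaptive_chain(c, WRAPPERS))]:
    print(name, len({key(c) for c in chains}))  # number of distinct allocation sites
# full 4
# fixed 2
# adaptive 3
```

Here the full chain splits the site in `bar` into two keys because `bar` is reached from both `foo` and `ccc`; the fixed-length chain collapses `bbb` and `bar` into one key because both sit behind `wrapper`; the adaptive chain separates those two while treating both paths into `bar` as one site.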

Need for Pool Merging
foo:
    …
    malloc(16)
    …
bar:
    …
    malloc(16)
    …
[Figure: list nodes allocated from both sites.]

Affinity
Objects have type-I affinity if they are linked together to form a data structure. Objects have type-II affinity if pointers to them are stored in the same field of type-I affinitive objects.
[Figure: list nodes with Data 1 and Data 2 objects in Pools 1 to 3; label "Same type".]
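
A sketch of the two affinity definitions under stated assumptions: each object carries a type tag, pointer stores are given as `(source, field, target)` triples, and a pointer between two objects of the same type counts as a structural link (type-I), while any other pointer is a data pointer (a type-II candidate). A runtime system for C binaries would have to infer this from observed pointer stores; all names below are hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def affinity_groups(types, edges):
    """types: object -> type tag; edges: (source, field, target) pointer stores.
    Returns (type-I groups, type-II groups) as lists of sets."""
    parent = {obj: obj for obj in types}
    # Type-I: union objects joined by same-type links (assumed structural).
    for src, _field, dst in edges:
        if types[src] == types[dst]:
            parent[find(parent, src)] = find(parent, dst)
    members = defaultdict(set)
    for obj in types:
        members[find(parent, obj)].add(obj)
    type1 = [group for group in members.values() if len(group) > 1]
    # Type-II: targets stored in the same field of one type-I structure.
    by_field = defaultdict(set)
    for src, field, dst in edges:
        if types[src] != types[dst]:
            by_field[(find(parent, src), field)].add(dst)
    return type1, list(by_field.values())


types = {f"n{i}": "Node" for i in (1, 2, 3)}
types.update({f"a{i}": "Data1" for i in (1, 2, 3)})
types.update({f"b{i}": "Data2" for i in (1, 2, 3)})
edges = [("n1", "next", "n2"), ("n2", "next", "n3")]
edges += [(f"n{i}", "d1", f"a{i}") for i in (1, 2, 3)]
edges += [(f"n{i}", "d2", f"b{i}") for i in (1, 2, 3)]

type1, type2 = affinity_groups(types, edges)
print([sorted(g) for g in type1])        # [['n1', 'n2', 'n3']]
print(sorted(sorted(g) for g in type2))  # [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```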

Pool Merging Example: Before Merging
Suppose objects of Data 2 are allocated from two sites.
[Figure: list nodes, Data 1 objects and Data 2 objects in Pools 1 to 4, with the Data 2 objects split across two pools.]

Pool Merging Example: After Merging
Suppose objects of Data 2 are allocated from two sites.
[Figure: after merging, list nodes, Data 1 objects and Data 2 objects occupy Pools 1 to 3, with all Data 2 objects in a single pool.]
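
Once affinity groups are known, merging can be expressed as union-find over pool ids: any two pools holding objects from the same group are unioned. This sketch assumes the groups are already computed and keeps the lowest-numbered pool as the representative; both choices are illustrative, not taken from the paper. The example mirrors the slide, with the Data 2 objects split between Pools 3 and 4.

```python
def merge_pools(pool_of, groups):
    """pool_of: object -> pool id; groups: iterable of sets of affinitive objects.
    Returns a map from each pool id to the pool it is merged into."""
    parent = {pool: pool for pool in pool_of.values()}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for group in groups:
        pools = sorted({pool_of[obj] for obj in group})
        for other in pools[1:]:
            a, b = find(pools[0]), find(other)
            if a != b:
                parent[max(a, b)] = min(a, b)  # keep the lower-numbered pool
    return {pool: find(pool) for pool in parent}


# List nodes in Pool 1, Data 1 in Pool 2, Data 2 split over Pools 3 and 4.
pool_of = {"n1": 1, "n2": 1, "n3": 1, "a1": 2, "a2": 2, "a3": 2,
           "b1": 3, "b2": 3, "b3": 4}
groups = [{"n1", "n2", "n3"}, {"a1", "a2", "a3"}, {"b1", "b2", "b3"}]
print(merge_pools(pool_of, groups))  # {1: 1, 2: 2, 3: 3, 4: 3}
```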

Data-Structure-Based Dynamic Pool Allocation (DPA)
[Figure: list nodes, Data 1 objects and Data 2 objects of one data structure, placed in Pools 1 to 3.]

Thresholds
A pool may not be beneficial if it holds few objects or if its objects are large.
Object number threshold: a pool forwards its first 100 allocation requests to the system allocator.
Object size threshold: the sizes of these objects must be less than 128 bytes.
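
The two thresholds admit more than one reading; the sketch below assumes a pool is activated only after its first 100 requests have been forwarded to the system allocator, and only if every one of those objects was smaller than 128 bytes. `PoolGate` and the routing strings are hypothetical names. The slide does not say how large objects arriving after activation are handled, so the sketch simply serves every request from an activated pool.

```python
OBJECT_NUMBER_THRESHOLD = 100  # requests forwarded before a pool may activate
OBJECT_SIZE_THRESHOLD = 128    # bytes; forwarded objects must all be smaller


class PoolGate:
    """Per-pool routing decision (hypothetical reading of the two thresholds)."""

    def __init__(self) -> None:
        self.seen = 0          # requests observed so far
        self.all_small = True  # were all observed objects < 128 bytes?

    def route(self, size: int) -> str:
        if self.seen < OBJECT_NUMBER_THRESHOLD:
            self.seen += 1
            self.all_small = self.all_small and size < OBJECT_SIZE_THRESHOLD
            return "system"    # warm-up: forward to the system allocator
        # After warm-up, the pool is used only if every sampled object was small.
        return "pool" if self.all_small else "system"


small = PoolGate()
print([small.route(32) for _ in range(101)][-2:])  # ['system', 'pool']

large = PoolGate()
large.route(256)
for _ in range(99):
    large.route(32)
print(large.route(32))  # system
```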

Platforms and Benchmarks
12 benchmarks from SPEC CPU2000 and SPEC CPU2006.

              Platform #1        Platform #2
CPU           Intel Pentium 4    Intel Xeon
Family        Northwood          Harpertown
Frequency     2.40 GHz           2.33 GHz
L1I cache               32 KB
L1D cache               32 KB
L2 cache      512 KB             6144 KB
Cache line              64 B
Memory        2 GB               16 GB
OS            Linux              Linux

Overall Performance

Cache and TLB Misses

Object Number Threshold

Object Size Threshold

Overhead
Time: less than 1% on average.
    Stack unwinding and hash table lookup: for every allocation request (can be reduced by instrumentation).
    Wrapper recognition: for every function (amortized).
    SSG building and analysis: for every new call chain (amortized).
Space:
    Hash table: 8K.
    IR (several times larger than the code) and SSG (~10K).
    Metadata for pages in pools: 20 bytes per page.
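
The amortization argument can be illustrated with a memo table keyed by call chain: the expensive per-chain work (standing in for wrapper recognition and SSG analysis) runs only the first time a chain is seen, and every later request costs one lookup. `SiteTable` and the `analyze` callback are hypothetical; the stack unwinding that produces each chain is not modeled.

```python
import itertools


class SiteTable:
    """Call chain -> pool id, with the expensive analysis done once per chain."""

    def __init__(self, analyze):
        self.analyze = analyze  # hypothetical callback: chain -> pool id
        self.table = {}         # the per-request hash-table lookup
        self.analyses = 0       # how many times the expensive path ran

    def pool_for(self, chain):
        pool = self.table.get(chain)
        if pool is None:        # first time this chain is seen
            self.analyses += 1
            pool = self.analyze(chain)
            self.table[chain] = pool
        return pool


next_id = itertools.count(1)
sites = SiteTable(analyze=lambda chain: next(next_id))
requests = [("main", "foo", "malloc"), ("main", "bar", "wrapper", "malloc")] * 500
pools = [sites.pool_for(chain) for chain in requests]
print(sites.analyses, sorted(set(pools)))  # 2 [1, 2]
```

With 1,000 requests from two distinct chains, the analysis runs twice. On the space side, assuming 4 KiB pages, 20 bytes of metadata per page amounts to 20/4096 ≈ 0.49% of pool memory.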

Conclusion
We proposed an approach to control the layout of heap data dynamically, based on adaptive partial call chains and pool merging.
We studied several factors that can affect the effectiveness of such layout control.
We obtained average speedups of 12.1% and 10.8% on two x86 machines.

The End
Thanks.