Transparent Pointer Compression for Linked Data Structures. June 12, 2005, MSP 2005. Chris Lattner, Vikram Adve.

Presentation transcript:

Transparent Pointer Compression for Linked Data Structures
June 12, 2005, MSP 2005
Chris Lattner and Vikram Adve

Growth of 64-bit Computing
64-bit architectures are increasingly common:
- New architectures and chips (G5, IA64, X86-64, …)
- High-end systems have existed for many years now
The 64-bit address space is used for many purposes:
- Address space randomization (security)
- Memory-mapping large files (databases, etc.)
- Single address space OSes
- Many 64-bit systems have < 4 GB of physical memory
64 bits is still useful for its virtual address space.

Cost of a 64-bit Virtual Address Space
Pointers must be 64 bits (8 bytes) instead of 32 bits:
- Significant impact for pointer-intensive programs!
Pointer-intensive programs suffer from:
- Reduced effective L1/L2/TLB cache sizes
- Reduced effective memory bandwidth
- Increased alignment requirements, etc.
Pointer-intensive programs are increasingly common:
- Recursive data structures (our focus)
- Object-oriented programs
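To make the size cost concrete, here is a small sketch (the node layout is illustrative, not taken from the paper): a list node with one pointer field roughly doubles in size when compiled for an LP64 target.

  #include <stdio.h>

  struct List {
    struct List *Next;   /* 4 bytes on a 32-bit target, 8 bytes on a 64-bit one */
    int Data;            /* 4 bytes on both */
  };

  int main(void) {
    /* Typically prints 8 on a 32-bit target and 16 on an LP64 target:
       the pointer doubles and padding rounds the struct up to pointer alignment. */
    printf("sizeof(struct List) = %zu\n", sizeof(struct List));
    return 0;
  }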

Previously Published Approaches
Simplest approaches: use 32-bit addressing
- Compile for a 32-bit pointer size ("-m32")
- Force the program image into 32 bits [Adl-Tabatabai'04]
- Loses the advantage of 64-bit address spaces!
Other approaches: exotic hardware support
- Compress pairs of values, speculating that the pointer offset is small [Zhang'02]
- Compress arrays of related pointers [Takagi'03]
- Requires significant changes to the cache hierarchy
No previous fully automatic compiler technique shrinks pointers in recursive data structures (RDSs).

Our Approach (1/2)
1. Use Automatic Pool Allocation [PLDI'05] to partition the heap into memory pools:
- Infers and captures pool homogeneity information
(Figures: original heap layout vs. layout after pool allocation.)

Our Approach (2/2)
2. Replace pointers with 64-bit integer offsets from the start of the pool:
- Change *Ptr into *(PoolBase+Ptr)
3. Shrink the 64-bit integers to 32-bit integers:
- Allows each pool to be up to 4 GB in size
(Figure: layout after pointer compression.)
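A minimal C sketch of the combined idea on a single list (names such as PoolBase and List32, and the use of 0 as a null index, are illustrative assumptions, not the paper's generated code):

  #include <stdint.h>

  /* Compressed node: 'Next' holds a 32-bit byte offset into the pool,
     with 0 standing in for a null pointer in this sketch. */
  struct List32 { uint32_t Next; int32_t Data; };

  /* Step 2: *Ptr becomes *(PoolBase + Ptr).  Step 3: Ptr itself is 32 bits. */
  static inline struct List32 *deref(char *PoolBase, uint32_t Off) {
    return (struct List32 *)(PoolBase + Off);
  }

  /* Walking the compressed list: every 'pointer' load goes through PoolBase. */
  static int sum(char *PoolBase, uint32_t Head) {
    int Total = 0;
    for (uint32_t Cur = Head; Cur != 0; Cur = deref(PoolBase, Cur)->Next)
      Total += deref(PoolBase, Cur)->Data;
    return Total;
  }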

Talk Outline
- Introduction & Motivation
- Automatic Pool Allocation Background
- Pointer Compression Transformation
- Experimental Results
- Conclusion

Automatic Pool Allocation
1. Compute the points-to graph:
- Ensure each pointer has one target ("unification-based" approach)
2. Infer pool lifetimes:
- Uses escape analysis
3. Rewrite the program:
- malloc → poolalloc, free → poolfree
- Insert calls to poolinit/pooldestroy
- Pass pool descriptors to functions
(Figure: two lists, L1 and L2, mapped onto separate pools, Pool 1 through Pool 4.)
For more info: see the MSP paper or the talk at PLDI tomorrow.
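The slides do not show the runtime interface itself; the following is a hedged sketch of the kind of API the rewritten calls would target (function names follow the slide, the signatures are assumptions):

  #include <stddef.h>

  typedef struct Pool Pool;   /* opaque pool descriptor, one per inferred pool */

  /* Assumed shape of the pool runtime that the compiler-inserted calls target. */
  void  poolinit(Pool *PD, size_t NodeSize);   /* set up a pool for one node type */
  void *poolalloc(Pool *PD, size_t Size);      /* allocate a node from that pool  */
  void  poolfree(Pool *PD, void *Ptr);         /* return a node to its pool       */
  void  pooldestroy(Pool *PD);                 /* release the entire pool at once */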

A Simple Pointer-intensive Example
A list of pointers to doubles:

  List *L = 0;
  for (…) {
    List *N = malloc(sizeof(List));
    N->Next = L;
    N->Data = malloc(sizeof(double));
    L = N;
  }

Points-to graph summary: L → List nodes → double cells.

After pool allocation (pool PD1 for the List nodes, PD2 for the doubles):

  List *L = 0;
  for (…) {
    List *N = poolalloc(PD1, sizeof(List));
    N->Next = L;
    N->Data = poolalloc(PD2, sizeof(double));
    L = N;
  }

Effect of Automatic Pool Allocation (1/2)
The heap is partitioned into separate pools:
- Each individual pool is smaller than the total heap
(Same code as the previous slide: the malloc version vs. the pool-allocated version, with the List nodes placed in pool PD1 and the doubles in pool PD2.)

Effect of Automatic Pool Allocation (2/2)
Each pool has a descriptor associated with it:
- Passed into poolalloc/poolfree
We know which pool each pointer points into:
- Given that, we also have the pool descriptor
- e.g. "N", "L" → PD1 and N->Data → PD2
(Same code example as the previous slides.)

Talk Outline
- Introduction & Motivation
- Automatic Pool Allocation Background
- Pointer Compression Transformation
- Experimental Results
- Conclusion

Index Conversion of a Pool
Force pool memory to be contiguous:
- The normal PoolAlloc runtime allocates memory in chunks
- Two implementation strategies for this (see paper)
Change pointers into the pool to integer offsets/indexes from the pool base:
- Replace "*P" with "*(PoolBase + P)"
A pool can be index-converted only if pointers into it point solely to heap memory (no stack or global memory).
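As a toy illustration of why index conversion wants contiguous pool memory (this is not the paper's runtime; all names are hypothetical): if the pool is a single growable block, the block can move when it grows, so only offsets from the pool base remain valid across allocations.

  #include <stdlib.h>

  /* Toy contiguous pool (zero-initialize before use; error handling omitted).
     Valid 'references' into it are byte offsets from Base, because Base itself
     can move whenever the block grows. */
  typedef struct {
    char  *Base;
    size_t Used, Capacity;
  } ToyPool;

  /* Returns a byte offset into the pool rather than a raw pointer. */
  static size_t toy_alloc(ToyPool *P, size_t Size) {
    if (P->Used + Size > P->Capacity) {
      size_t NewCap = P->Capacity ? P->Capacity * 2 : 4096;
      while (NewCap < P->Used + Size) NewCap *= 2;
      P->Base = realloc(P->Base, NewCap);   /* the whole pool may move here */
      P->Capacity = NewCap;
    }
    size_t Off = P->Used;
    P->Used += Size;
    return Off;
  }

  /* Index conversion: *P becomes *(PoolBase + P). */
  #define DEREF(PoolPtr, Off, Type) ((Type *)((PoolPtr)->Base + (Off)))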

Index Compression of a Pointer
Shrink indexes in type-homogeneous pools:
- Shrink from 64 bits to 32 bits
Replace the structure definition and its field accesses:
- Requires accurate type info and type-safe accesses

  Original:                struct List { struct List *Next; int Data; };   List *L = malloc(16);
  After index conversion:  struct List { int64 Next;        int Data; };   L = malloc(16);
  After index compression: struct List { int32 Next;        int Data; };   L = malloc(8);
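A hedged sketch of what a compressed field access might lower to, assuming indexes are element numbers scaled by the compressed node size (field names and layout are illustrative):

  #include <stddef.h>
  #include <stdint.h>

  /* Compressed node: 'Next' is a 32-bit index into the node pool (0 = null). */
  struct CList {
    uint32_t Next;
    int32_t  Data;
  };                 /* 8 bytes instead of 16 on an LP64 target */

  /* The original N->Next load becomes a 32-bit load at
     PoolBase + N * sizeof(struct CList) + offsetof(struct CList, Next). */
  static inline uint32_t load_next(const char *PoolBase, uint32_t N) {
    const struct CList *Node =
        (const struct CList *)(PoolBase + (size_t)N * sizeof(struct CList));
    return Node->Next;
  }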

Index Conversion Example 1
(Figure: the earlier list-of-doubles example. Both heap pointers, Next and Data, are compressible. After index-converting the pools, those fields hold 64-bit integer indexes.)
Index conversion changes pointer dereferences, but not the memory layout.

Index Compression Example 1
(Figure: the example after index conversion, with 64-bit integer indices. Both indexes are then compressed from 64-bit to 32-bit integers, changing the accesses to and the size of the structure.)
'Pointer' registers remain 64 bits.

Impact of Type-Homogeneity/Safety
Compression requires rewriting structures:
- e.g. malloc(32) → malloc(16)
- Rewriting depends on type-safe memory accesses
We can't know how to rewrite unions and other cases:
- Must verify that memory accesses are 'type-safe'
Pool allocation infers type homogeneity:
- Unions, bad C pointer tricks, etc. → non-TH
- Some pools may be TH, others not
Pointers in non-TH pools cannot be index-compressed!
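For illustration, a hypothetical pattern that would make a pool non-type-homogeneous and therefore non-compressible:

  struct Node { struct Node *Next; double Val; };

  /* The same 8 bytes are written as a pointer but read back as an integer, so
     no single struct layout describes every access. A pool holding such slots
     would be classified non-TH and its pointers left at 64 bits. */
  union Slot {
    struct Node *AsPtr;
    long         AsInt;
  };

  long hash_slot(union Slot S) {
    return S.AsInt ^ 0x9e3779b9L;   /* reinterprets the pointer's bits */
  }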

Index Conversion Example 2
Compression of TH memory that is pointed to by non-TH memory.
(Figure: a List pool alongside a pool of unknown type. A heap pointer points from the TH pool into a heap-only pool: compress this pointer! After index conversion it holds a 64-bit integer index.)
Index conversion changes pointer dereferences, but not the memory layout.

Index Compression Example 2
Compression of TH memory that is pointed to by non-TH memory.
(Figure: the example after index conversion, with 64-bit integer indexes. Next step: compress the pointer in the heap. The pointer in the type-safe pool is compressed to a 32-bit integer index, changing the structure's offsets and size; the remaining indexes stay 64 bits.)

Pointer Compression Example 3
Compression of TH pointers pointing to non-TH memory.
(Figure: L points to an array of pointers into a pool of unknown type. Index-convert the non-TH pool to shrink the TH pointers, then index-compress the array of pointers.)

Static Pointer Compression Implementation
Inspect the graph of pools provided by APA:
- Find compressible pointers
- Determine which pools to index-convert
Use rewrite rules to pointer-compress the program:
- e.g. if P1 and P2 are compressed pointers, change:
    P1 = *P2   →   P1' = *(int*)(PoolBase + P2')
Perform an interprocedural call graph traversal:
- Top-down traversal from main()
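In plain C, the rewrite rule above corresponds to something like the following sketch (PoolBase and the byte-offset interpretation of the index are illustrative):

  #include <stdint.h>

  /* Before compression:   P1 = *P2;   (P2 points at a heap cell holding a pointer)
     After compression: both values are 32-bit indexes and the load goes through
     the pool base. */
  static inline uint32_t rewritten_load(const char *PoolBase, uint32_t P2c) {
    uint32_t P1c = *(const uint32_t *)(PoolBase + P2c);
    return P1c;
  }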

Talk Outline
- Introduction & Motivation
- Automatic Pool Allocation Background
- Pointer Compression Transformation
- Dynamic Pointer Compression
- Experimental Results
- Conclusion

Dynamic Pointer Compression Idea
Static compression can break programs:
- Each pool is limited to 2^32 bytes of memory
- The program aborts when the 2^32nd byte is allocated!
Expand pointers dynamically when needed:
- When the 2^32nd byte is allocated, expand pointers to 64 bits
- Traverse and rewrite the pool, using type information
- Similar to (but a bit simpler than) a copying GC pass
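A very rough sketch of what the expansion step could look like, assuming the pool is a contiguous array of nodes of one known type (illustrative only, not the paper's runtime; error handling is omitted):

  #include <stdint.h>
  #include <stdlib.h>

  /* Compressed and expanded node layouts for a simple list pool. */
  struct Node32 { uint32_t Next; int32_t Data; };   /* 8 bytes  */
  struct Node64 { uint64_t Next; int32_t Data; };   /* 16 bytes */

  /* When the compressed pool would overflow 2^32 bytes, copy every node into a
     larger pool, widening each index field; indexes are element numbers here,
     so their values carry over unchanged. */
  static struct Node64 *expand_pool(const struct Node32 *Old, size_t NumNodes) {
    struct Node64 *New = malloc(NumNodes * sizeof(struct Node64));
    for (size_t i = 0; i < NumNodes; ++i) {
      New[i].Next = Old[i].Next;    /* widen the index field 32 -> 64 bits */
      New[i].Data = Old[i].Data;
    }
    return New;
  }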

Dynamic Pointer Compression Cost
Structure offsets and sizes depend on whether a pool is currently compressed:

  P1 = *P2   becomes:

  if (PD->isCompressed)
    P1' = *(int32*)(PoolBase + P2'*C1 + C2);
  else
    P1' = *(int64*)(PoolBase + P2'*C3 + C4);

Use standard optimizations to address this:
- Predication, loop unswitching, jump threading, etc.
See the paper for details.
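Because PD->isCompressed is loop-invariant, a traversal loop can be unswitched so the test is paid once per loop instead of once per dereference. A hand-written sketch of the unswitched form (names and the C1-C4 constants mirror the slide; everything else is illustrative):

  #include <stdint.h>

  struct Pool { char *PoolBase; int isCompressed; };

  /* Walk a list and count its nodes; C1/C2 and C3/C4 stand for the compressed
     and uncompressed node size and field offset used on the slide. */
  static uint64_t walk_list(const struct Pool *PD, uint64_t Head,
                            uint64_t C1, uint64_t C2,
                            uint64_t C3, uint64_t C4) {
    uint64_t Count = 0;
    if (PD->isCompressed) {                    /* check hoisted out of the loop */
      for (uint64_t P = Head; P != 0; ++Count)
        P = *(const uint32_t *)(PD->PoolBase + P * C1 + C2);
    } else {
      for (uint64_t P = Head; P != 0; ++Count)
        P = *(const uint64_t *)(PD->PoolBase + P * C3 + C4);
    }
    return Count;
  }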

Talk Outline
- Introduction & Motivation
- Automatic Pool Allocation Background
- Pointer Compression Transformation
- Experimental Results
- Conclusion

Experimental Results: Two Questions
1. Does pointer compression improve the performance of pointer-intensive programs?
- Cache miss reductions, memory bandwidth improvements
- Memory footprint reduction
2. How does the impact of pointer compression vary across 64-bit architectures?
- Do memory system improvements outweigh the overhead?
Built in the LLVM Compiler Infrastructure.

Static PtrComp Performance Impact
UltraSPARC IIIi w/1MB cache. Peak memory usage; 1.0 = program compiled with LLVM & PA but no PC.

  Benchmark    PA       PC       PC/PA
  bh           8 MB     8 MB     1.00
  bisort       64 MB    32 MB    0.50
  em3d         47 MB    47 MB    1.00
  perimeter    299 MB   171 MB   0.57
  power        882 KB   816 KB   0.93
  treeadd      128 MB   64 MB    0.50
  tsp          128 MB   96 MB    0.75
  ft           9 MB     4 MB     0.51
  ks           47 KB    47 KB    1.00
  llubench     4 MB     2 MB     0.50

Evaluating the Effect of Architecture
Pick one program that scales easily:
- llubench, a linked-list micro-benchmark
- llubench has little computation and many dereferences: the best possible case for pointer compression
Evaluate how PtrComp impacts scalability:
- Compare to the native and pool-allocated versions
Evaluate the overhead introduced by PtrComp:
- Compare PA32 with PC32 ('compress' 32 → 32 bits)
How close is PtrComp to native 32-bit pointers?
- Compare to native-32 and poolalloc-32 for a limit study

SPARC V9 PtrComp (1MB Cache)
(Figure: llubench scalability.) PtrComp overhead is very low; significant scalability impact.

AMD64 PtrComp (1MB Cache)
(Figure: llubench scalability.) PtrComp is faster than PA32 on AMD64! Similar scalability impact.
See the paper for IA64 and IBM SP results.

Pointer Compression Conclusion
Pointer compression can substantially reduce the footprint of pointer-intensive programs:
- … without specialized hardware support!
Significant performance impact for some programs:
- Effectively higher memory bandwidth
- Effectively larger caches
Dynamic compression for full generality:
- Speculate that pools are small, expand if not
- More investigation needed; see the paper!
Questions?