Z-Rays: Divide Arrays and Conquer Speed and Flexibility

Presentation transcript:

Z-Rays: Divide Arrays and Conquer Speed and Flexibility
Jennifer B. Sartor, Stephen M. Blackburn, Daniel Frampton, Martin Hirzel, Kathryn S. McKinley
Department of Computer Sciences

Arrays [Zuse 46]
- >50%
- Time/space tradeoff
- IBM WebSphere, AICAS Jamaica VM, Fiji VM
June

Why Discontiguous?
- Real-time
- Problem: large arrays
  - Collection: space & time unbounded for scan/copy
  - Fragmentation
- [Siebert 00, Bacon et al. 03/05, Chen et al. 03]
- Sacrifice throughput for predictability

Z-Rays
- Flexible, memory and time efficient
- Spine of indirection pointers to arraylets
- Space optimizations
  - Lazy allocation
  - Zero compression
  - Novel arraylet copy-on-write
- Time optimizations
  - Inline first-N bytes into spine
  - Fast array copy
- * Most effective
- 12.7%
- Prior work optimizations: 27-32%

Naïve Discontiguous Arrays
[Figure: an array made of a spine (header, arraylet pointers, remaining elements) and arraylets; access is uniform]
- Remove expensive indirection?
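The naïve layout on this slide can be sketched in plain Java. This is a minimal illustration, not the authors' implementation: the class name `NaiveArrayletArray`, the arraylet size of 256 elements, and the choice to hold the trailing elements in a short final arraylet (rather than in the spine, as the figure suggests) are all assumptions made for the example.

```java
// Hypothetical sketch of a naive discontiguous int array.
// All names and sizes are illustrative assumptions, not taken from the slides.
final class NaiveArrayletArray {
    static final int ARRAYLET_ELEMS = 256; // assumed arraylet size, in elements

    private final int length;
    private final int[][] spine; // the spine: one indirection pointer per arraylet

    NaiveArrayletArray(int length) {
        if (length < 0) throw new NegativeArraySizeException();
        this.length = length;
        int count = (length + ARRAYLET_ELEMS - 1) / ARRAYLET_ELEMS;
        this.spine = new int[count][];
        for (int i = 0; i < count; i++) {
            // Simplification: the last arraylet holds only the remaining elements.
            int size = Math.min(ARRAYLET_ELEMS, length - i * ARRAYLET_ELEMS);
            spine[i] = new int[size];
        }
    }

    int length() { return length; }

    int arrayletCount() { return spine.length; }

    // Uniform access: every read pays one indirection through the spine.
    int get(int index) {
        checkIndex(index);
        return spine[index / ARRAYLET_ELEMS][index % ARRAYLET_ELEMS];
    }

    // Uniform access: every write pays the same indirection.
    void set(int index, int value) {
        checkIndex(index);
        spine[index / ARRAYLET_ELEMS][index % ARRAYLET_ELEMS] = value;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }
}
```

Every access, including the first element, goes through the spine, which is the indirection cost the slide asks about removing.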

Access Statistics
- >85% of accesses are to the first 4KB (mean)

Optimized Discontiguous Arrays
[Figure: array spine (header, inlined first-N, arraylet pointers, remaining elements) pointing to arraylets in the arraylet space; labels: *fast access, slow access]
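A rough Java model of the first-N idea follows. It is a sketch under stated assumptions: `FirstNArray` and its fields are hypothetical names, the arraylet size is invented, and `FIRST_N` is set to 1024 ints to match 2^12 bytes of 4-byte elements. Plain Java cannot inline elements into an object header, so the `firstN` field only stands in for the inlined region; the sketch shows the fast-path/slow-path split, not the real memory layout.

```java
// Hypothetical sketch of first-N inlining. Names and sizes are assumptions.
final class FirstNArray {
    static final int FIRST_N = 1024;       // assumed: 2^12 bytes of 4-byte ints
    static final int ARRAYLET_ELEMS = 256; // assumed arraylet size, in elements

    private final int length;
    private final int[] firstN;      // stands in for elements inlined into the spine
    private final int[][] arraylets; // indirection pointers for everything after first-N

    FirstNArray(int length) {
        if (length < 0) throw new NegativeArraySizeException();
        this.length = length;
        this.firstN = new int[Math.min(length, FIRST_N)];
        int rest = length - firstN.length;
        int count = (rest + ARRAYLET_ELEMS - 1) / ARRAYLET_ELEMS;
        this.arraylets = new int[count][];
        for (int i = 0; i < count; i++) {
            arraylets[i] = new int[Math.min(ARRAYLET_ELEMS, rest - i * ARRAYLET_ELEMS)];
        }
    }

    // True when the index is served from the inlined region.
    static boolean isFastPath(int index) {
        return index < FIRST_N;
    }

    int get(int index) {
        checkIndex(index);
        if (isFastPath(index)) {
            return firstN[index]; // fast path: no arraylet indirection
        }
        int off = index - FIRST_N;
        return arraylets[off / ARRAYLET_ELEMS][off % ARRAYLET_ELEMS]; // slow path
    }

    void set(int index, int value) {
        checkIndex(index);
        if (isFastPath(index)) {
            firstN[index] = value;
            return;
        }
        int off = index - FIRST_N;
        arraylets[off / ARRAYLET_ELEMS][off % ARRAYLET_ELEMS] = value;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }
}
```

Arrays shorter than `FIRST_N` never allocate an arraylet in this sketch, so all of their accesses take the fast path.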

Flexible Arraylets
- Memory management
  - Spine in generational spaces
  - Spine defines array "age" for timely reclamation [Hosking et al. 92]
  - Arraylets non-moving
- Space optimizations [inspired by Chen et al. 03]
  - Lazy allocation
  - Zero compression
- Array copy optimizations
  - Time: fast array copy
  - Space: arraylet sharing with copy-on-write

Lazy Allocation & Zero Compression
[Figure: array spine (header, inlined first-N, arraylet pointers, remaining elements) with pointers into the arraylet space and to a shared zero arraylet; labels: lazy allocate, zero compress]
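One way to picture the two space optimizations is the sketch below. It rests on assumptions not stated on the slide: spine entries start out pointing at a single shared all-zero arraylet, a private arraylet is allocated only on the first non-zero write, and a hypothetical `compressZeros()` method stands in for the pass that re-points all-zero arraylets at the shared one. The class name and the arraylet size are also invented.

```java
// Hypothetical sketch of lazy allocation and zero compression.
// Names, sizes, and the explicit compressZeros() call are assumptions.
final class LazyZeroArray {
    static final int ARRAYLET_ELEMS = 256; // assumed arraylet size, in elements

    // One shared all-zero arraylet; this sketch never writes to it.
    private static final int[] ZERO = new int[ARRAYLET_ELEMS];

    private final int length;
    private final int[][] spine;

    LazyZeroArray(int length) {
        if (length < 0) throw new NegativeArraySizeException();
        this.length = length;
        int count = (length + ARRAYLET_ELEMS - 1) / ARRAYLET_ELEMS;
        this.spine = new int[count][];
        for (int i = 0; i < count; i++) {
            spine[i] = ZERO; // lazy allocation: no private arraylet yet
        }
    }

    int get(int index) {
        checkIndex(index);
        return spine[index / ARRAYLET_ELEMS][index % ARRAYLET_ELEMS];
    }

    void set(int index, int value) {
        checkIndex(index);
        int slot = index / ARRAYLET_ELEMS;
        if (spine[slot] == ZERO) {
            if (value == 0) return;                // still all zero, nothing to allocate
            spine[slot] = new int[ARRAYLET_ELEMS]; // first non-zero write allocates
        }
        spine[slot][index % ARRAYLET_ELEMS] = value;
    }

    // Zero compression: re-point all-zero arraylets at the shared zero arraylet.
    // Returns how many private arraylets were released.
    int compressZeros() {
        int released = 0;
        for (int i = 0; i < spine.length; i++) {
            if (spine[i] != ZERO && isAllZero(spine[i])) {
                spine[i] = ZERO;
                released++;
            }
        }
        return released;
    }

    // Number of privately allocated arraylets.
    int allocatedArraylets() {
        int n = 0;
        for (int[] arraylet : spine) {
            if (arraylet != ZERO) n++;
        }
        return n;
    }

    private static boolean isAllZero(int[] arraylet) {
        for (int v : arraylet) {
            if (v != 0) return false;
        }
        return true;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }
}
```

A large, sparsely written array therefore costs only its spine plus the arraylets that actually hold non-zero data.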

Copy & Share Arraylets
[Figure: source and destination spines (header, first-N, pointers, remaining elements) sharing arraylets in the arraylet space, with a write to a shared arraylet; labels: fast array copy, arraylet copy-on-write]
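Arraylet sharing with copy-on-write might look roughly like this in Java. The sketch makes several simplifying assumptions: it copies whole arrays only (no partial or unaligned ranges), tracks sharing with a conservative per-slot flag, and uses invented names (`CowArray`, `fastCopy`, `sharesArrayletWith`) and an invented arraylet size.

```java
// Hypothetical sketch of arraylet sharing with copy-on-write.
// Names, sizes, and the whole-array copy are assumptions for illustration.
final class CowArray {
    static final int ARRAYLET_ELEMS = 256; // assumed arraylet size, in elements

    private final int length;
    private final int[][] spine;
    // shared[i] is true if arraylet i may also be referenced by another spine.
    // The flag is conservative: it is never cleared on the other spine.
    private final boolean[] shared;

    CowArray(int length) {
        if (length < 0) throw new NegativeArraySizeException();
        this.length = length;
        int count = (length + ARRAYLET_ELEMS - 1) / ARRAYLET_ELEMS;
        this.spine = new int[count][];
        this.shared = new boolean[count];
        for (int i = 0; i < count; i++) {
            spine[i] = new int[ARRAYLET_ELEMS];
        }
    }

    // Fast copy: duplicate only the spine pointers and mark both sides shared.
    private CowArray(CowArray src) {
        this.length = src.length;
        this.spine = src.spine.clone(); // shallow: arraylets are not copied
        this.shared = new boolean[spine.length];
        for (int i = 0; i < spine.length; i++) {
            this.shared[i] = true;
            src.shared[i] = true;
        }
    }

    CowArray fastCopy() {
        return new CowArray(this);
    }

    int get(int index) {
        checkIndex(index);
        return spine[index / ARRAYLET_ELEMS][index % ARRAYLET_ELEMS];
    }

    void set(int index, int value) {
        checkIndex(index);
        int slot = index / ARRAYLET_ELEMS;
        if (shared[slot]) {
            spine[slot] = spine[slot].clone(); // copy-on-write: take a private copy first
            shared[slot] = false;
        }
        spine[slot][index % ARRAYLET_ELEMS] = value;
    }

    // True if both arrays currently point at the same arraylet in this slot.
    boolean sharesArrayletWith(CowArray other, int slot) {
        return spine[slot] == other.spine[slot];
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }
}
```

The copy itself touches only the spine, and an arraylet is duplicated only when one side writes to it, so arraylets that are never written stay shared.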

Methodology
- Jikes Research Virtual Machine
  - Results are % overhead above contiguous
  - Adaptive compilation: 10th iteration, 20 JVM invocations
- GenMarkSweep, 2x min heap
- 19 benchmarks: DaCapo, pseudojbb2005, SPECjvm98
- Core 2 Duo with 2 processors

Overall Result
[Table: configurations Naïve, Naïve A (Chen), Naïve B (Bacon), Z-ray, and Perf Z-ray against the parameters and optimizations listed below; the assignment of values to configurations is not recoverable]
- Arraylet bytes
- First-N: 2^12
- Lazy alloc: ✔✔✔
- Zero compress: ✔✔
- Array copy: ✔✔
- Copy-on-write: ✔
- % time overhead: 31.9, 12

Time Optimizations
- Benchmark accesses, first-N = 2^12 bytes
  - Fast first-N: 91% of array accesses are to the fast path
- Removing first-N from Z-ray adds 10%
- No fast array copy adds 2.8%
[Table: Fast Write %, Slow Write %, Fast Read %, Slow Read %; rows: min, max, mean]

Space Optimizations
- Lazy allocation most effective
  - Xalan saves 56% of collection time, 5.5% total time
- All: lazy allocation, zero compression, copy-on-write
  - Best: xalan 25%, compress 49% of heap
  - Average: save 6% of heap

Z-Ray Takeaways
- Flexible, time and space efficient discontiguous arrays
  - Tunable optimization options
  - Reduce previous overhead by 2-3x
  - Save 6% of heap space
- More efficient for real-time
- Feasible for future chip multiprocessors