CACHETOR: Detecting Cacheable Data to Remove Bloat
Khanh Nguyen, Guoqing Xu
UC Irvine, USA

Introduction
- Bloat: excessive work to accomplish simple tasks
- Modern software suffers from bloat [Xu et al., FoSER 2010]
- It is difficult for compilers to remove the penalty
- One pattern: repeated computations that have the same inputs and produce the same outputs
- 4 of IBM's 18 best practices are to reuse data
Khanh Nguyen - UC Irvine

Example
{adapted from sunflow, an open-source image rendering system}

Original loop:

    float[] fValues = {0.0f, 1.0f, 2.3f, 1.0f, 1.0f, 3.4f, 1.0f, 1.0f, ..., 1.0f};
    int[] iValues = new int[fValues.length];
    for (int i = 0; i < fValues.length; i++) {
        iValues[i] = Float.floatToIntBits(fValues[i]);
    }

With the result for the frequent value 1.0 cached:

    int cached_result = Float.floatToIntBits(1.0f);
    for (int i = 0; i < fValues.length; i++) {
        if (fValues[i] == 1.0f)
            iValues[i] = cached_result;
        else
            iValues[i] = Float.floatToIntBits(fValues[i]);
    }

When the array contents are not known in advance:

    float[] fValues = {?, ?, ?, ?, ..., ?};
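The caching idea on this slide can be tried as a small standalone program. This is only a sketch: the class name `FloatBitsCaching`, its method names, and the sample array are assumptions made for illustration and are not taken from sunflow or from Cachetor.

```java
import java.util.Arrays;

// Illustrative sketch only; names and sample data are assumptions.
class FloatBitsCaching {
    // Baseline: call Float.floatToIntBits for every element.
    static int[] convertAll(float[] values) {
        int[] bits = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            bits[i] = Float.floatToIntBits(values[i]);
        }
        return bits;
    }

    // Cached variant: compute the result for the frequent value 1.0f once
    // and reuse it whenever that value shows up again.
    static int[] convertWithCache(float[] values) {
        final int cachedOne = Float.floatToIntBits(1.0f);
        int[] bits = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            if (values[i] == 1.0f) {
                bits[i] = cachedOne;
            } else {
                bits[i] = Float.floatToIntBits(values[i]);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        float[] fValues = {0.0f, 1.0f, 2.3f, 1.0f, 1.0f, 3.4f, 1.0f, 1.0f};
        // Both variants must produce the same bit patterns.
        System.out.println(Arrays.equals(convertAll(fValues), convertWithCache(fValues)));
    }
}
```

The cached variant only pays off when one value dominates the input, which is the kind of information a value profile can supply.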

The Big Picture
[Diagram: Dynamic Dependence Analysis -> Dependence Profile/Graph -> I-Cachetor, D-Cachetor, M-Cachetor]

Cachetor
- Introduction
- Scalable algorithms for the dependence analysis
- 3 detectors
- Evaluations

Cachetor
- In theory: full value profiling and full dynamic slicing
- In practice: abstract value profiling and abstract dynamic slicing

Overview
- Combine value profiling and dynamic slicing in a mutually beneficial and scalable manner
- Distinct values are used to abstract instruction instances
- Result: an abstract dependence graph
  - Nodes: abstract representations of runtime instances
  - Edges: dependence relationships between nodes

Equivalence Class
[Figure: the runtime instances of instruction i are mapped by f1 into equivalence classes e1 … en]

Equivalence Class
[Figure: instruction instances mapped to the values created]
f1(inst. instance) = value created

[Figure: instruction instances -> (f1) -> values created -> (f2)]
Candidates for f2: Top-N? Hashing?

[Figure: instruction instances -> (f1) -> values created -> (f2)]
f2: hashing
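The two mappings on these slides, f1 (instruction instance to the value it created) and f2 (value to a bounded abstract domain by hashing), can be sketched roughly as below. Everything concrete here is an assumption made for illustration: the class name `AbstractValueProfile`, the string instruction ids, the use of `long` values, and the slot count are not taken from Cachetor itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and the slot count are assumptions, not Cachetor's code.
class AbstractValueProfile {
    private final int slots; // size of the bounded abstract domain
    private final Map<String, long[]> counts = new HashMap<>(); // instruction id -> per-slot frequency

    AbstractValueProfile(int slots) {
        this.slots = slots;
    }

    // f2: hash a concrete value into one of `slots` abstract values.
    int abstractOf(long value) {
        return Math.floorMod(Long.hashCode(value), slots);
    }

    // Record one instruction instance; producedValue plays the role of f1(instance).
    void record(String instructionId, long producedValue) {
        long[] perSlot = counts.computeIfAbsent(instructionId, k -> new long[slots]);
        perSlot[abstractOf(producedValue)]++;
    }

    // Frequencies of the abstract values seen for one static instruction.
    long[] countsFor(String instructionId) {
        long[] perSlot = counts.get(instructionId);
        return perSlot == null ? new long[slots] : perSlot.clone();
    }

    public static void main(String[] args) {
        AbstractValueProfile profile = new AbstractValueProfile(4);
        for (long v : new long[]{1, 1, 1, 5, 2}) {
            profile.record("i1", v);
        }
        System.out.println(java.util.Arrays.toString(profile.countsFor("i1")));
    }
}
```

With four slots, the values 1 and 5 land in the same slot, so they become indistinguishable in the profile. That is the kind of hashing-induced imprecision listed later among the sources of false positives.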

Another Abstraction Level
- Context sensitivity: distinguish entities based on the calling context
- Improves the tool's precision
- Please refer to our paper for details

Cacheability

I-Cachetor
- Detect instructions that create identical values
- Compute cacheability for each static instruction (Inst.CM)
[Figure: cacheability example]

D-Cachetor: Overview
- 2 steps:
  - Step 1: detect cacheable individual objects
  - Step 2: detect cacheable data structures
- Compute cacheability for each allocation site node

D-Cachetor: Step 1
- Compute cacheability for each object (Obj.CM), not considering reference relationships
- Focus: instructions that write primitive-typed fields
[Figure: sequence labeled 1, 2, …, t]

D-Cachetor: Step 2
- Group objects using the reference relationships
- Compute DataStructureCM
- Focus: instructions that write reference-typed fields
- Add only objects whose Obj.CM is within a range
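A rough sketch of this grouping step follows. The slide does not give the range bounds, the traversal order, or the DataStructureCM formula, so several things here are assumptions and not the tool's actual algorithm: the bounds `lo` and `hi` are caller-supplied, traversal stops at objects outside the range, and DataStructureCM is taken to be a plain average of the members' Obj.CM values. The class and field names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of grouping objects into a data structure by following references.
class DataStructureGrouping {
    static final class Obj {
        final String id;
        final double objCM;                      // per-object cacheability from Step 1
        final List<Obj> refs = new ArrayList<>(); // objects reachable through reference-typed fields

        Obj(String id, double objCM) {
            this.id = id;
            this.objCM = objCM;
        }
    }

    // Follow references from root, adding only objects whose Obj.CM lies in [lo, hi].
    // Assumption: an out-of-range object is skipped and not traversed further.
    static Set<Obj> group(Obj root, double lo, double hi) {
        Set<Obj> members = new LinkedHashSet<>();
        Deque<Obj> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.objCM < lo || o.objCM > hi) continue; // outside the range
            if (!members.add(o)) continue;               // already visited (handles cycles)
            for (Obj r : o.refs) work.push(r);
        }
        return members;
    }

    // Assumption: aggregate by plain average; the slides do not state the formula.
    static double dataStructureCM(Set<Obj> members) {
        return members.stream().mapToDouble(o -> o.objCM).average().orElse(0.0);
    }

    public static void main(String[] args) {
        Obj root = new Obj("root", 0.9);
        Obj a = new Obj("a", 0.8);
        Obj b = new Obj("b", 0.1);
        root.refs.add(a);
        root.refs.add(b);
        a.refs.add(root); // cycle back to the root
        Set<Obj> ds = group(root, 0.5, 1.0);
        System.out.println(ds.size() + " members, DataStructureCM = " + dataStructureCM(ds));
    }
}
```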

M-Cachetor
- Detect method calls that have the same inputs and produce the same outputs
- Compute CallSiteCM
- For each call site c: a = f( ), CallSiteCM is:
  - If a is primitive: CallSiteCM = Inst.CM of c
  - If a is a reference: CallSiteCM = the average of DataStructureCM over all data structures rooted at a
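The two-case CallSiteCM definition above can be written down directly. This is a minimal sketch: the class and method names are hypothetical, the cacheability inputs are assumed to be precomputed doubles, and returning 0.0 when no data structure is rooted at the returned reference is an assumption the slide does not cover.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the CallSiteCM rule; names are illustrative only.
class CallSiteCacheability {
    // Primitive return value: CallSiteCM is the Inst.CM of the call instruction c.
    static double forPrimitiveReturn(double instCMOfCallSite) {
        return instCMOfCallSite;
    }

    // Reference return value: CallSiteCM is the average DataStructureCM of all
    // data structures rooted at the returned reference.
    static double forReferenceReturn(List<Double> dataStructureCMs) {
        if (dataStructureCMs.isEmpty()) {
            return 0.0; // assumption: nothing observed means nothing to cache
        }
        double sum = 0.0;
        for (double cm : dataStructureCMs) {
            sum += cm;
        }
        return sum / dataStructureCMs.size();
    }

    public static void main(String[] args) {
        System.out.println(forPrimitiveReturn(0.75));
        System.out.println(forReferenceReturn(Arrays.asList(0.2, 0.4, 0.9)));
    }
}
```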

Implementation
- Jikes RVM
- Optimizing-compiler-only mode
- Context-sensitive
- Evaluated on 14 benchmarks from DaCapo and Java Grande

Overheads

Case Studies

Program      Time Reduction   Space Reduction   GC Runs Reduction   GC Time Reduction
montecarlo   12.1%            98.7%             70.0%               89.2%
raytracer    19.1%            1.2%              33.3%               30.2%
euler        20.5%            0.4%              40.0%               44.8%
bloat        13.1%            12.6%             -7.3%               -4.0%
xalan        5.2%             0.1%              -0.7%               -1.1%

False Positives

Program      D-Cachetor   M-Cachetor
montecarlo   2            6
raytracer    3            4
euler        1            7
bloat        1            4
xalan        4            5

Number of false positives identified among the top 20 items in the reports of D-Cachetor and M-Cachetor.

Sources of False Positives
- Handling of floating-point values
- Context-sensitive reporting
- Missing the actual values
- Hashing-induced false positives

Conclusions
- Cachetor: a novel tool that supports detecting cacheable data to improve performance
- Scalable combination of value profiling and dynamic slicing
- 3 detectors that find cacheable:
  - Instructions
  - Data structures
  - Method calls
- Large optimization opportunities can be found from Cachetor's reports

THANK YOU!
Questions? Comments?

What happened in montecarlo?

    public void runSerial() {
        results = new Vector(nRunsMC);
        // Now do the computation.
        PriceStock ps;
        for (int iRun = 0; iRun < nRunsMC; iRun++) {
            ps = new PriceStock();
            ps.setInitAllTasks(initAllTasks);
            ps.setTask(tasks.elementAt(iRun));
            ps.run();
            results.addElement(ps.getResult());
        }
    }

    private void processSerial() {
        processResults();
    }

    private void initTasks(int nRunsMC) {
        tasks = new Vector(nRunsMC);
        for (int i = 0; i < nRunsMC; i++) {
            String header = "MC run " + String.valueOf(i);
            ToTask task = new ToTask(header, (long) i * 11);
            tasks.addElement((Object) task);
        }
    }

Slide annotations:
- {Calculate the result on the fly}
- ps.setTask(iRun, (long)iRun*11);