Go with the Flow: Profiling Copies to Find Run-time Bloat Guoqing Xu, Matthew Arnold, Nick Mitchell, Atanas Rountev, Gary Sevitsky Ohio State University.


Go with the Flow: Profiling Copies to Find Run-time Bloat
Guoqing Xu, Matthew Arnold, Nick Mitchell, Atanas Rountev, Gary Sevitsky
Ohio State University, IBM T. J. Watson Research Center

Motivation
Higher-level languages such as Java
– Large libraries
– Framework-intensive applications
Consequences
– Massive amounts of wasteful computation
– Performance problems that current JIT compilers cannot solve (grand challenge)
Runtime bloat
– Performance problems caused by inefficiency
– Time (slow) and space (memory hog)

Bloat Example
A commercial document server
– Runs on top of the IBM WebSphere application server
– method invocations and 3000 temporary objects to insert a simple document
Example of bloat
– Decodes the whole cookie into an 8-element hash map, but extracts only 1 or 2 elements
– 1000 method calls

Optimization Challenges
Many large applications are free of hot-spots
– For the document server, no single method contributes more than 3.19% of total application time
– Applications are suffering from systemic runtime bloat
Current JITs do not optimize bloat away
– The JIT cannot perform expensive analysis
– The JIT does not understand the semantics
– The JIT cannot identify “useless” operations
Our goal: assistance with hand-tuning

Copying as Indicator of Bloat
Data copying is an excellent indicator of bloat
– Chains of data flow that carry values from one storage location to another, often via temporary objects
– No values are changed
– The same data is contained in many locations: both wasted space and wasteful operations
Identifying data copying
– Profile data copying for postmortem diagnosis
– Data-based metric

Copy Profiling
A data copy
– A load-store pair without computation in between
– In particular, a heap-load/heap-store pair
– Example: int a = b.f; int c = a; A.g = c;
Data copy profiles
– Percent of instructions that are copies, compared to other activities (e.g., computation or comparison)
– Observation: copy activity is often concentrated in a small number of methods
For the document server, the top 50 methods account for 82% of total copies, but only 24% of total running time
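The load-store definition above can be made concrete with a small sketch. It is not the profiler from the talk: CopyTracker and its methods are hypothetical names, and it counts only completed heap-to-heap copies rather than every copy instruction. Each local variable remembers the heap location its value was loaded from, and any computation erases that memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini-tracker; all names are illustrative, not taken from the tool.
class CopyTracker {
    // For each local variable: the heap location its current value was loaded
    // from. A missing entry means the value came from a computation.
    private final Map<String, String> origin = new HashMap<>();
    private int instructions = 0;
    private int heapToHeapCopies = 0;

    void heapLoad(String local, String heapLoc) {    // local = heapLoc
        instructions++;
        origin.put(local, heapLoc);
    }

    void move(String dst, String src) {              // dst = src
        instructions++;
        if (origin.containsKey(src)) origin.put(dst, origin.get(src));
        else origin.remove(dst);
    }

    void compute(String dst) {                       // dst = <some computation>
        instructions++;
        origin.remove(dst);                          // new data, no longer a copy
    }

    // heapLoc = local; returns the source heap location if this store completes
    // a heap-to-heap copy, or null if the stored value was computed.
    String heapStore(String heapLoc, String local) {
        instructions++;
        String src = origin.get(local);
        if (src != null) heapToHeapCopies++;
        return src;
    }

    int copies() { return heapToHeapCopies; }
    int instructions() { return instructions; }
}

class CopyTrackerDemo {
    public static void main(String[] args) {
        CopyTracker t = new CopyTracker();
        t.heapLoad("a", "b.f");                  // int a = b.f;
        t.move("c", "a");                        // int c = a;
        String src = t.heapStore("A.g", "c");    // A.g = c;
        System.out.println(src + " -> A.g, copies = " + t.copies());
    }
}
```

Replaying the slide's three statements reports one copy from b.f to A.g; inserting a compute("c") before the store would break the pair.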

Copy Chain Profiling
Copy chain profiling
– Understanding the chains as a whole
A copy chain is
– A sequence of copies that carries a value through two or more heap locations
– Node: a heap location (an instance field or a static field)
– Edge: a sequence of copies that transfers a value from one heap location to another
– An edge abstracts away intermediate copies via stack locations

An Example

class List {
    Object[] data;
    int count = 0;

    List() { data = new Object[1000]; }            // O3: backing array

    void add(Object o) { data[count++] = o; }

    Object get(int i) { return data[i]; }

    // Must be public: it overrides the protected Object.clone().
    public List clone() {
        List newL = new List();                    // O2: the copy
        for (int j = 0; j < count; j++)
            newL.add(get(j));                      // element-by-element copy
        return newL;                               // the slide omitted this return
    }
}

class Main {
    public static void main(String[] args) {
        List l = new List();                       // O1
        for (int i = 0; i < 1000; i++)
            l.add(new Integer(i));                 // O4: a fresh object per element
        List l2 = l.clone();
        for (int i = 0; i < 1000; i++) {
            System.out.println(l2.get(i));
        }
    }
}

Copy chains:
O4 → O3.ELEM
O4 → O3.ELEM → O3.ELEM
O3.ELEM → O3.ELEM

Chain Augmentation
We augment a chain with
– A producer node (source of data)
    A constant value
    A new expression
    A computation instruction creating new data
– A consumer node C (sink of data)
    Has only one instance, C
    Shows that the data goes to a computation instruction or a native method
Example
– O4 → O3.ELEM → O3.ELEM → C

Copy Graph
It can be prohibitively expensive to profile copy chains
– Expensive in both time and space
Abstractions
– Static abstractions of objects
– Profile copy chain edges only: copy graph
[Figure: chains over the nodes O1.f, O2.g, O3.e and O4.m]

Copy Graph Profiling
Nodes
– Allocation site nodes “Oi”
– Instance field nodes “Oi.f”
– Static field nodes “A.f”
– Consumer node “C”
Edges annotated with two integers
– Frequency and the number of copied bytes (1, 2, 4, 8)
Example
– Copy chain: O4 → O3.ELEM → O3.ELEM → C
– Copy graph: [figure not preserved in the transcript]
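A copy graph with annotated edges could be represented roughly as follows. This is a sketch under assumptions, not the tool's data structure: CopyGraph, recordCopy and the 4-byte reference width are invented for illustration, and nodes are plain strings such as "O4", "O3.ELEM" and "C". The demo replays the copies of the List example, with the context-insensitive node names used on the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical copy-graph container; class and method names are illustrative.
class CopyGraph {
    static final class Edge {
        long frequency;      // how many times this copy was observed
        final int bytes;     // width of each copy: 1, 2, 4 or 8
        Edge(int bytes) { this.bytes = bytes; }
    }

    // source node -> (target node -> edge annotation)
    private final Map<String, Map<String, Edge>> out = new LinkedHashMap<>();

    void recordCopy(String from, String to, int bytes) {
        out.computeIfAbsent(from, k -> new LinkedHashMap<>())
           .computeIfAbsent(to, k -> new Edge(bytes))
           .frequency++;
    }

    long frequency(String from, String to) {
        Map<String, Edge> targets = out.get(from);
        Edge e = (targets == null) ? null : targets.get(to);
        return (e == null) ? 0 : e.frequency;
    }

    int edgeCount() {
        int n = 0;
        for (Map<String, Edge> targets : out.values()) n += targets.size();
        return n;
    }
}

class CopyGraphDemo {
    public static void main(String[] args) {
        CopyGraph g = new CopyGraph();
        final int REF = 4;  // assumed reference width in bytes
        for (int i = 0; i < 1000; i++) {
            g.recordCopy("O4", "O3.ELEM", REF);       // l.add(new Integer(i))
            g.recordCopy("O3.ELEM", "O3.ELEM", REF);  // newL.add(get(j)) in clone()
            g.recordCopy("O3.ELEM", "C", REF);        // println(l2.get(i))
        }
        System.out.println(g.edgeCount() + " edges, O4 -> O3.ELEM seen "
                + g.frequency("O4", "O3.ELEM") + " times");
    }
}
```

Because both arrays share the name O3, the clone loop shows up as a self-loop on O3.ELEM. Storing only edges keeps the graph small, at the cost of no longer knowing which whole chains actually occurred.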

Context Sensitive Copy Graph
Imprecision from the context-insensitive copy graph
– Invalid copy chains may be derived because nodes are merged
Context sensitivity
– k-Object sensitivity (Milanova-ISSTA 02)
1-object-sensitive naming scheme
– An object is named by its allocation site plus the allocation site of the receiver object of the method containing the allocation site
Example: [figure not preserved in the transcript]
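The naming scheme can be sketched in a few lines. The bracket notation and the ObjectNaming class are assumptions made for illustration. In the List example the array allocation O3 executes inside the constructor, whose receiver was allocated at O1 or O2, so the two arrays that a context-insensitive graph merges into O3 receive distinct names.

```java
// Hypothetical helper; the "site[receiver]" notation is illustrative only.
class ObjectNaming {
    // allocSite:    label of the `new` expression that created the object
    // receiverSite: allocation site of `this` in the method executing that
    //               `new`, or null when the method is static
    static String name(String allocSite, String receiverSite) {
        return (receiverSite == null) ? allocSite : allocSite + "[" + receiverSite + "]";
    }
}

class ObjectNamingDemo {
    public static void main(String[] args) {
        // Backing array of the original list: constructor receiver is O1.
        System.out.println(ObjectNaming.name("O3", "O1"));
        // Backing array of the clone: constructor receiver is O2.
        System.out.println(ObjectNaming.name("O3", "O2"));
        // Allocated in static main(): no receiver, so the plain site is kept.
        System.out.println(ObjectNaming.name("O1", null));
    }
}
```

With these names the copy performed by clone() runs between the ELEM nodes of two different arrays instead of forming a self-loop on O3.ELEM.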

Runtime Flow Tracking
JVM flow tracking
– Implemented in an IBM production JVM: J9
– Tags all application data with dataflow metadata, referred to as tracking data
Shadow locations
– Stack location: an extra local variable
– Object field: a shadow heap that has the same size as the Java heap
– The shadow location of a runtime instance a can be calculated as addr(a) + distance
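The addr(a) + distance rule can be simulated without a real JVM. In the sketch below a single flat array stands in for the address space, and the shadow heap is placed directly after the heap so that distance equals the heap size. ShadowHeap and its methods are hypothetical names; the layout used inside J9 is not described in the slides.

```java
// Hypothetical simulation: one flat array stands in for the address space.
class ShadowHeap {
    private final int heapSize;
    private final int distance;   // constant offset from a heap slot to its shadow slot
    private final long[] memory;  // [0, heapSize) is the heap, [heapSize, 2*heapSize) its shadow

    ShadowHeap(int heapSize) {
        this.heapSize = heapSize;
        this.distance = heapSize;              // assumed: shadow heap follows the heap
        this.memory = new long[2 * heapSize];
    }

    // Shadow location of a heap slot: addr + distance.
    int shadowAddr(int addr) {
        if (addr < 0 || addr >= heapSize) {
            throw new IndexOutOfBoundsException("addr " + addr);
        }
        return addr + distance;
    }

    // Write a value and, alongside it, the tracking data describing its origin.
    void store(int addr, long value, long trackingData) {
        memory[shadowAddr(addr)] = trackingData;   // also validates addr
        memory[addr] = value;
    }

    long load(int addr) {
        shadowAddr(addr);                          // bounds check only
        return memory[addr];
    }

    long trackingData(int addr) {
        return memory[shadowAddr(addr)];
    }
}

class ShadowHeapDemo {
    public static void main(String[] args) {
        ShadowHeap h = new ShadowHeap(16);
        h.store(3, 42L, 0x1L);                     // value 42, tagged with origin 0x1
        System.out.println("value=" + h.load(3)
                + " tag=" + h.trackingData(3)
                + " shadowAddr=" + h.shadowAddr(3));
    }
}
```

A fixed offset means no lookup table is needed to find an object's tracking data, which is why the shadow heap has to be as large as the Java heap.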

Client Analyses to Find Bloat
Hot chain detector
– Recover hot copy chains from the copy graph based on heuristics
Clone detector
– Find pairs of objects (O1, O2) with a large volume of copies occurring between the two object sub-graphs reachable from them
Not assigned to heap (NATH) analysis
– Find allocation site nodes that do not have outgoing edges
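Of the three clients, the NATH analysis is the simplest to sketch: it reports every allocation site with no outgoing edge in the copy graph. The code below assumes the graph is available as a map from each node to its set of successors. The class name, the method name and the hand-written edges for the List example (including the node names O1.data and O2.data) are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical NATH client; names and graph encoding are illustrative.
class NathAnalysis {
    // Returns the allocation sites that have no outgoing edge in the copy graph,
    // i.e. whose objects were never assigned to a heap location.
    static Set<String> notAssignedToHeap(Set<String> allocSites,
                                         Map<String, Set<String>> outEdges) {
        Set<String> result = new TreeSet<>();
        for (String site : allocSites) {
            Set<String> targets = outEdges.get(site);
            if (targets == null || targets.isEmpty()) {
                result.add(site);
            }
        }
        return result;
    }
}

class NathDemo {
    public static void main(String[] args) {
        Set<String> sites = Set.of("O1", "O2", "O3", "O4");
        Map<String, Set<String>> edges = new HashMap<>();
        edges.put("O4", Set.of("O3.ELEM"));            // Integer stored into the array
        edges.put("O3", Set.of("O1.data", "O2.data")); // array stored into List.data
        // O1 and O2 are only ever held in local variables.
        System.out.println(NathAnalysis.notAssignedToHeap(sites, edges));
    }
}
```

The two List objects are reported because they never leave the stack. Such sites are only candidates for bloat, since short-lived objects can be legitimate.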

Detect Real World Bloat
DaCapo bloat
– Data copy profile: 28% of the instructions executed are copies
– 50% of all data copies came from a variety of toString and append methods
What we found from hot copy chains
– Most of these calls centered around code of the form Assert.isTrue(cond, “bug: ” + node)
– The second argument is printed only when cond evaluates to false, yet the string is built on every call
Elimination of these strings resulted in
– 65% reduction in objects created
– 29% - 35% reduction in running time
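The cost of this pattern is that Java evaluates arguments before the call, so the message string is concatenated every time even though an assertion helper needs it only on failure. The sketch below shows one possible remedy, deferring construction with a Supplier. It is not necessarily the change made in the study, and LazyAssert, Node and the call counter are hypothetical stand-ins for the benchmark's classes.

```java
import java.util.function.Supplier;

// Hypothetical assertion helper, not the class used by the benchmark.
class LazyAssert {
    // Eager form: the caller has already built the message.
    static void isTrue(boolean cond, String message) {
        if (!cond) throw new IllegalStateException(message);
    }

    // Lazy form: the message is built only if the check fails.
    static void isTrue(boolean cond, Supplier<String> message) {
        if (!cond) throw new IllegalStateException(message.get());
    }
}

// Stand-in for an expensive-to-print object; counts toString() calls.
class Node {
    static int toStringCalls = 0;

    @Override
    public String toString() {
        toStringCalls++;
        return "Node";
    }
}

class LazyAssertDemo {
    public static void main(String[] args) {
        Node node = new Node();

        LazyAssert.isTrue(true, "bug: " + node);        // string built, then discarded
        int afterEager = Node.toStringCalls;

        LazyAssert.isTrue(true, () -> "bug: " + node);  // string never built
        int afterLazy = Node.toStringCalls;

        System.out.println("toString calls: eager=" + afterEager
                + ", after lazy=" + afterLazy);
    }
}
```

The eager call runs toString() once for a message nobody reads, while the lazy call leaves the counter unchanged. Guarding the call site with an explicit if (!cond) would have the same effect without allocating a lambda.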

Detect Real World Bloat (Cont.)
Eclipse 3.1
– A large framework-intensive application
– Performance problems result from the pile-up of wasteful operations in its plugins
What we found from the NATH analysis report
– Millions of objects were created solely for the purpose of carrying data across one-level method invocations
9.3% running time reduction

Copy Graph Characteristics
Memory overhead
– Shadow heap size: the same as the size of the Java heap
– Space for storing the copy graph: less than 27M for DaCapo, 150M for the IBM document server, for 1-object-sensitive analysis
Time overhead
– On average 37X slowdown for 1-object-sensitive analysis
– Optimization may be achieved by employing sampling-based profiling

Conclusions
Copy activity is a good indicator of bloat
Profiling copies
– Data copy profiles show performance problems
– Copy graph profiling helps pinpoint certain performance bottlenecks in an application
Three client analyses based on the copy graph
Experimental results
– Problems were found in large real-world applications
– Although it incurs significant overhead, the tool works for large-scale, long-running programs

Thank you
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University

Data Flow Tracking Example
[Figure: four columns labeled program, shadow stack, shadow heap and copy graph]
Program, with the shadow stack value shown next to each statement:
    C c = new C(); // alloc1        c’ = 0x1
    …
    int a = c.f;                    a’ = 0x1 + offset(f)
    …
    int b = a;                      b’ = a’
    …
    E e = new E(); // alloc2        e’ = 0xa
    …
    e.m = b;
Shadow heap: the slot at addr(c) + d holds the context and allocation tag 0x1; the slot at addr(e) + d holds 0xa
Copy graph: node alloc 1 with fields f, g, …; node alloc 2 with fields m, …, n