Detecting Inefficiently-Used Containers to Avoid Bloat Guoqing Xu and Atanas Rountev Department of Computer Science and Engineering Ohio State University.

Slides:



Advertisements
Similar presentations
Dataflow Analysis for Datarace-Free Programs (ESOP 11) Arnab De Joint work with Deepak DSouza and Rupesh Nasre Indian Institute of Science, Bangalore.
Advertisements

Artemis: Practical Runtime Monitoring of Applications for Execution Anomalies Long Fei and Samuel P. Midkiff School of Electrical and Computer Engineering.
R O O T S Field-Sensitive Points-to-Analysis Eda GÜNGÖR
Uncovering Performance Problems in Java Applications with Reference Propagation Profiling PRESTO: Program Analyses and Software Tools Research Group, Ohio.
Runtime Techniques for Efficient and Reliable Program Execution Harry Xu CS 295 Winter 2012.
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Go with the Flow: Profiling Copies to Find Run-time Bloat Guoqing Xu, Matthew Arnold, Nick Mitchell, Atanas Rountev, Gary Sevitsky Ohio State University.
Refinement-Based Context-Sensitive Points-To Analysis for JAVA Soonho Kong 17 January Work of Manu Sridharan and Rastislav Bodik, UC Berkeley.
Parallel Inclusion-based Points-to Analysis Mario Méndez-Lojo Augustine Mathew Keshav Pingali The University of Texas at Austin (USA) 1.
ABCD: Eliminating Array-Bounds Checks on Demand Rastislav Bodík Rajiv Gupta Vivek Sarkar U of Wisconsin U of Arizona IBM TJ Watson recent experiments.
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Precise Memory Leak Detection for Java Software Using Container Profiling.
Guoquing Xu, Atanas Rountev Ohio State University Oct 9 th, 2008 Presented by Eun Jung Park.
CORK: DYNAMIC MEMORY LEAK DETECTION FOR GARBAGE- COLLECTED LANGUAGES A TRADEOFF BETWEEN EFFICIENCY AND ACCURATE, USEFUL RESULTS.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 18.
Pointer and Shape Analysis Seminar Context-sensitive points-to analysis: is it worth it? Article by Ondřej Lhoták & Laurie Hendren from McGill University.
CS 290C: Formal Models for Web Software Lecture 10: Language Based Modeling and Analysis of Navigation Errors Instructor: Tevfik Bultan.
Establishing Local Temporal Heap Safety Properties with Applications to Compile-Time Memory Management Ran Shaham Eran Yahav Elliot Kolodner Mooly Sagiv.
Memory Systems Performance Workshop 2004© David Ryan Koes MSP 2004 Programmer Specified Pointer Independence David Koes Mihai Budiu Girish Venkataramani.
Finding Low-Utility Data Structures Guoqing Xu 1, Nick Mitchell 2, Matthew Arnold 2, Atanas Rountev 1, Edith Schonberg 2, Gary Sevitsky 2 1 Ohio State.
Turning Eclipse Against Itself: Finding Errors in Eclipse Sources Benjamin Livshits Stanford University.
Software Bloat Analysis: Detecting, Removing, and Preventing Performance Problems in Modern Large- Scale Object-Oriented Applications Guoqing Xu, Nick.
Scaling CFL-Reachability-Based Points- To Analysis Using Context-Sensitive Must-Not-Alias Analysis Guoqing Xu, Atanas Rountev, Manu Sridharan Ohio State.
LeakChaser: Helping Programmers Narrow Down Causes of Memory Leaks Guoqing Xu, Michael D. Bond, Feng Qin, Atanas Rountev Ohio State University.
1 Refinement-Based Context-Sensitive Points-To Analysis for Java Manu Sridharan, Rastislav Bodík UC Berkeley PLDI 2006.
Language Support for Lightweight transactions Tim Harris & Keir Fraser Presented by Narayanan Sundaram 04/28/2008.
Java Alias Analysis for Online Environments Manu Sridharan 2004 OSQ Retreat Joint work with Rastislav Bodik, Denis Gopan, Jong-Deok Choi.
Overview of program analysis Mooly Sagiv html://
CS 206 Introduction to Computer Science II 01 / 23 / 2009 Instructor: Michael Eckmann.
CACHETOR Detecting Cacheable Data to Remove Bloat Khanh Nguyen Guoqing Xu UC Irvine USA.
Precise Memory Leak Detection for Java Software Using Container Profiling Guoqing Xu, Atanas Rountev Program analysis and software tools group Ohio State.
Impact Analysis of Database Schema Changes Andy Maule, Wolfgang Emmerich and David S. Rosenblum London Software Systems Dept. of Computer Science, University.
Static Control-Flow Analysis for Reverse Engineering of UML Sequence Diagrams Atanas (Nasko) Rountev Ohio State University with Olga Volgin and Miriam.
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University STATIC ANALYSES FOR JAVA IN THE PRESENCE OF DISTRIBUTED COMPONENTS AND.
Graph Data Management Lab, School of Computer Science gdm.fudan.edu.cn XMLSnippet: A Coding Assistant for XML Configuration Snippet.
Understanding Parallelism-Inhibiting Dependences in Sequential Java Programs Atanas (Nasko) Rountev Kevin Van Valkenburgh Dacong Yan P. Sadayappan Ohio.
Semantics-Aware Performance Optimization Harry Xu CS Departmental Seminar 01/13/2012.
Analyzing Large-Scale Object-Oriented Software to Find and Remove Runtime Bloat Guoqing Xu CSE Department Ohio State University Ph.D. Thesis Defense Aug.
John Mellor-Crummey Robert Fowler Nathan Tallent Gabriel Marin Department of Computer Science, Rice University Los Alamos Computer Science Institute HPCToolkit.
Effective Real-time Android Application Auditing
CSE 219 Computer Science III Program Design Principles.
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Merging Equivalent Contexts for Scalable Heap-cloning-based Points-to.
Rethinking Soot for Summary-Based Whole- Program Analysis PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Dacong Yan.
CS 206 Introduction to Computer Science II 09 / 10 / 2009 Instructor: Michael Eckmann.
Chameleon Automatic Selection of Collections Ohad Shacham Martin VechevEran Yahav Tel Aviv University IBM T.J. Watson Research Center Presented by: Yingyi.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 12/6/2006 Lecture 24 – Profilers.
Speculative Region-based Memory Management for Big Data Systems Khanh Nguyen, Lu Fang, Harry Xu, Brian Demsky Donald Bren School of Information and Computer.
Static Detection of Loop-Invariant Data Structures Harry Xu, Tony Yan, and Nasko Rountev University of California, Irvine Ohio State University 1.
1. 2 Preface In the time since the 1986 edition of this book, the world of compiler design has changed significantly 3.
Exploiting Code Search Engines to Improve Programmer Productivity and Quality Suresh Thummalapenta Advisor: Dr. Tao Xie Department of Computer Science.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Estimating the Run-Time Progress of a Call Graph Construction Algorithm Jason Sawin, Atanas Rountev Ohio State University.
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Merging Equivalent Contexts for Scalable Heap-cloning-based Points-to.
CoCo: Sound and Adaptive Replacement of Java Collections Guoqing (Harry) Xu Department of Computer Science University of California, Irvine.
Detecting Inefficiently-Used Containers to Avoid Bloat Guoqing Xu and Atanas Rountev Department of Computer Science and Engineering Ohio State University.
Copyright (c) Systems and Computer Engineering, Carleton University * Object-Oriented Software Development Unit 13 The Collections Framework.
Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities Luca Della Toffola – ETH Zurich Michael Pradel – TU Darmstadt Thomas.
Recommending Adaptive Changes for Framework Evolution Barthélémy Dagenais and Martin P. Robillard ICSE08 Dec 4 th, 2008 Presented by EJ Park.
5/7/03ICSE Fragment Class Analysis for Testing of Polymorphism in Java Software Atanas (Nasko) Rountev Ohio State University Ana Milanova Barbara.
Sept 12ICSM'041 Precise Identification of Side-Effect-Free Methods in Java Atanas (Nasko) Rountev Ohio State University.
Object Naming Analysis for Reverse- Engineered Sequence Diagrams Atanas (Nasko) Rountev Beth Harkness Connell Ohio State University.
ECE 750 Topic 8 Meta-programming languages, systems, and applications Automatic Program Specialization for J ava – U. P. Schultz, J. L. Lawall, C. Consel.
Making k-Object-Sensitive Pointer Analysis More Precise with Still k-Limiting Tian Tan, Yue Li and Jingling Xue SAS 2016 September,
Atanas (Nasko) Rountev Ohio State University
Harry Xu University of California, Irvine & Microsoft Research
Online Subpath Profiling
Building a Whole-Program Type Analysis in Eclipse
CACHETOR Detecting Cacheable Data to Remove Bloat
Human Complexity of Software
Demand-Driven Context-Sensitive Alias Analysis for Java
Context Sensitive Points-to Analysis
Presentation transcript:

Detecting Inefficiently-Used Containers to Avoid Bloat Guoqing Xu and Atanas Rountev Department of Computer Science and Engineering Ohio State University

Container Bloat Pervasively used in large Java applications – Many types and implementations in JDK libraries – User-defined container types Container inefficiency becomes a significant source of runtime bloat – Many memory leak bugs reported in the Sun Java bug database are caused by misuse of containers [Xu-ICSE’08] – Optimizing a misused HashMap reduced the number of created objects by 66% [Xu-PLDI’09] 2

The Targeted Container Inefficiencies Underutilized container – from jflex Vector v = new Vector(); v.addElement(new Interval(’\n’,’\r’)); v.addElement(new Interval(’\u0085’,’\u0085’)); RegExp1 c = new RegExp1(sym.CCLASS, v); … Overpopulated container – from muffin Map getCookies(Request query){ StringTokenizer st = … // decode the query while (st.hasMoreTokens()){ … attrs.put(key, value); // populate the map } return attrs; } Map cookies = getCookies(query); String con= cookies.get ("config"); 3

Use Static Analysis for Bloat Detection Dynamic analysis for bloat detection – Used by all state-of-art tools – “from-symptom-to-cause” diagnosis (heuristics based) – Dependent of specific runs – Many produce false positives – The reports may be hard to interpret Static information may be helpful to reduce false positives – Programmers’ intentions inherent in the code (instead of heuristics) – Independent of inputs and runs 4

Our Approach Step 1: A base static analysis – Automatically extracts container semantics – Based on the context-free-language (CFL) reachability formulation of points-to analysis [Sridharan-Bodik-PLDI’06] – Abstract container operations into ADD and GET Step 2: Dynamic or static analysis comparing frequencies of ADD and GET – For dynamic analysis: instrument and profile – For static analysis: frequency approximation Step 3: Inefficiency detection – #ADD is very small  underutilized container – #ADD >> #GET  overpopulated container 5

CFL-Reachability Formulation Example 6 a = new A(); // o 1 b = new A(); // o 2 c = a; o1o1 a b c o2o2 x y pret o3o3 (1(1 )1)1 (2(2 )2)2 [f[f [ f ]f]f o4o4 e id(p){ return p;} x = id(a); //call 1 y = id(b); //call 2 a.f = new C(); //o 3 b.f = new C(); //o 4 e = x.f; o  pts(v) if o flowsTo v

Extracting Container Semantics For a container object o c, Inside objects and element objects An ADD is concretized as a store operation that – Writes an element object to a field of an inside object A GET is concretized as a load operation that – Reads an element object from a field of an inside object 7

ReachFrom Relation Relation o reachFrom o c – o flowsTo a – b.f = a – b flowsTo o b – o b reachFrom o c A reachFrom path automatically guarantees the context- and field-sensitivity – Demand-driven – Instead of traversing a points-to graph 8 Matched [ ] and ( ) in a reachFrom path oa [f[f b obob ococ flowsTo reachFrom (ArrayList) (Object[]) (A)

Detecting Inside and Element Objects An inside object o i of a container o c is such that – o i reachFrom o c – o i is created in o c ’s class or super class, or other classes specified by the user An element object o e of a container o c is such that – o e is not an inside object – o e reachFrom o c – All objects except o e and o c on this reachFrom path are inside objects o e o i … o i … o c 9 Container internal structure

addTo Reachability An addTo path from an element object o e to a container object o c – o e flowsTo a – b.f = a – b flowsTo o b where o b is an inside object – o a reachFrom o c 10 Matched [ ] and { } in an addTo path : store achieving ADD oa [f[f b obob ococ flowsTo reachFrom (ArrayList) (Object[]) addTo (A)

getFrom Reachability A getFrom path from an element object o e to container object o c – o e flowsTo a – a = b.f – b flowsTo o b where o b is an inside object – o b reachFrom o c 11 : load achieving GET Matched [ ] and ( ) in a getFrom path oa ]f]f b obob ococ flowsTo reachFrom (ArrayList) (Object[]) getFrom (A)

Relevant Contexts Identify calling contexts to associate each ADD/GET with a particular container object The chain of unbalanced method entry/exit edges ( 1 ( 2 ( 2 … before the store/load (achieving ADD/GET) on each addTo/getFrom path Details can be found in the paper 12

Execution Frequency Comparison Dynamic analysis – Instrument the ADD/GET sites and do frequency profiling Static analysis: approximation – Based on loop nesting information– inequality graph – Put ADD/GET with relevant contexts onto inequality graph and check the path Example 13 List l = new List(); while(*) { while(*){ l.add(…); } l.get(…); } Overpopulated Container #ADD >> #GET while(*){ List l = new List(); l.add(…); } Underutilized Container # ADD is small

Evaluation 14 Implemented based on Soot and the Sridharan- Bodik points-to analysis framework [PLDI’06] The static analysis is scalable – 19 large apps including eclipse (41k methods, 1623 containers) analyzed Low false positive rate – For each benchmark, randomly selected 10 container problems for manual check – No FP found for 14 programs – 1-3 FPs found for the remaining 5 apps

Static Analysis vs Dynamic Analysis Static report is easier to understand and verify – Optimization solutions can be easily seen for most problems reported by static analysis – For most problems reported by dynamic analysis, we could not have solutions Highly dependent on inputs and runs Dynamic frequency information can be incorporated for performance tuning – Statically analyze hot containers Case studies can be found in the paper 15

Conclusions The first static analysis targeting bloat detection Find container inefficiency based on loop nesting – Demand driven and client driven – Unsound but has low false positive rate Usage – Find container problems during development – Help performance tuning with the help of dynamic frequency info Static analysis to detect other kinds of bloat? 16

Thank you 17

Inefficiency Detection Mount ADD/GET operations with the relevant contexts onto an inequality graph, and check the inequality edges involved Example 18 for(…){ List l = new List(); A a = new A(); B b = new B(); l.add(a); //call 1 l.add(b); //call 2 } … void add(Object x){ t 1 = this 2.arr; t 1 [ind++] = x; } (1(1 Relevant contexts (2(2 for (1(1 entry_add (2(2 IG UC: No inequality edge on the IG path between the alloc site of l and the ADD store List l = new List(); for(…){ while(…){ A a = new A(); l.add(a); //call 1 } l.get(i); // call 2 } Object get(int j){ t 2 = this 3.arr; return t 2 [j]; } for (1(1 entry_add (2(2 IG while ≤ entry_get Relevant contexts (2(2 OC: Inequality edge exits on the IG path from the GET load to the ADD store

Extracting Container Semantics For a container object o c, Inside object and element object An ADD is concretized as a store that – Writes an element object to a field of an inside object A GET is concretized as a load that – Reads an element object from a field of an inside object Relation o reachFrom o c  Obj  Obj – o flowsTo a – b.f = a – b flowsTo o b – o b reachFrom o c 19 Matched [ ] and { } in a reachFrom path