Detecting Inefficiently-Used Containers to Avoid Bloat Guoqing Xu and Atanas Rountev Department of Computer Science and Engineering Ohio State University.


Container Bloat
Containers are pervasively used in large Java applications
  – Many types and implementations in JDK libraries
  – User-defined container types
Container inefficiency becomes a significant source of runtime bloat
  – Many memory leak bugs reported in the Sun Java bug database are caused by misuse of containers [Xu-ICSE’08]
  – Optimizing a misused HashMap reduced the number of created objects by 66% [Xu-PLDI’09]

The Targeted Container Inefficiencies
Underutilized container (from jflex)
    Vector v = new Vector();
    v.addElement(new Interval('\n','\r'));
    v.addElement(new Interval('\u0085','\u0085'));
    RegExp1 c = new RegExp1(sym.CCLASS, v);
    …
Overpopulated container (from muffin)
    Map getCookies(Request query){
        StringTokenizer st = … // decode the query
        while (st.hasMoreTokens()){
            …
            attrs.put(key, value); // populate the map
        }
        return attrs;
    }
    Map cookies = getCookies(query);
    String con = cookies.get("config");

Use Static Analysis for Bloat Detection
Dynamic analysis for bloat detection
  – Used by all state-of-the-art tools
  – “From-symptom-to-cause” diagnosis (heuristics based)
  – Dependent on specific runs
  – Many produce false positives
  – The reports may be hard to interpret
Static information may help reduce false positives
  – Programmers’ intentions are inherent in the code (instead of heuristics)
  – Independent of inputs and runs

Our Approach
Step 1: A base static analysis
  – Automatically extracts container semantics
  – Based on the context-free-language (CFL) reachability formulation of points-to analysis [Sridharan-Bodik-PLDI’06]
  – Abstracts container operations into ADD and GET
Step 2: Dynamic or static analysis comparing frequencies of ADD and GET
  – For dynamic analysis: instrument and profile
  – For static analysis: frequency approximation
Step 3: Inefficiency detection
  – #ADD is very small → underutilized container
  – #ADD >> #GET → overpopulated container
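
The slide does not say how small "very small" is or how large the gap in "#ADD >> #GET" must be. A minimal sketch of the Step 3 decision rule, assuming a hypothetical `ContainerClassifier` class and purely illustrative cutoffs (`SMALL_ADD_LIMIT`, `OVERPOPULATION_RATIO`), might look like this:

```java
// Hypothetical helper: classifies one container from its ADD and GET counts.
// Both thresholds are illustrative assumptions, not values from the slides.
final class ContainerClassifier {
    enum Verdict { UNDERUTILIZED, OVERPOPULATED, OK }

    static final long SMALL_ADD_LIMIT = 2;       // assumed cutoff for "#ADD is very small"
    static final long OVERPOPULATION_RATIO = 10; // assumed ratio for "#ADD >> #GET"

    static Verdict classify(long adds, long gets) {
        if (adds <= SMALL_ADD_LIMIT) {
            return Verdict.UNDERUTILIZED;
        }
        // Treat zero GETs as one so that the comparison stays a simple multiplication.
        if (adds >= OVERPOPULATION_RATIO * Math.max(gets, 1)) {
            return Verdict.OVERPOPULATED;
        }
        return Verdict.OK;
    }
}
```

With these assumed cutoffs, a container that only ever receives two elements (as in the jflex excerpt) would be flagged as underutilized, and one that is filled many times per lookup (as in the muffin excerpt) would be flagged as overpopulated.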

CFL-Reachability Formulation Example
    a = new A();      // o1
    b = new A();      // o2
    c = a;
    id(p){ return p; }
    x = id(a);        // call 1
    y = id(b);        // call 2
    a.f = new C();    // o3
    b.f = new C();    // o4
    e = x.f;
[Figure: flow graph over the nodes o1, o2, o3, o4, a, b, c, p, ret, x, y, e, with call entry/exit edges labeled (1, )1, (2, )2 and field store/load edges labeled [f, ]f]
o ∈ pts(v) if o flowsTo v
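
As a rough illustration of why the call labels matter in this example, the sketch below checks whether the call entry/exit labels along a path are matched. The `CallMatching` class and its string encoding of labels are assumptions made for illustration. The sketch covers only the parenthesis part of the formulation, not the field labels `[f` and `]f`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical checker for the call-label part of a flowsTo path.
// "(1" means entering a callee from call site 1, ")1" means returning to call site 1.
final class CallMatching {
    static boolean isRealizable(String... labels) {
        Deque<String> pendingCalls = new ArrayDeque<>();
        for (String label : labels) {
            if (label.startsWith("(")) {
                pendingCalls.push(label.substring(1));
            } else if (label.startsWith(")")) {
                String site = label.substring(1);
                // A return with no pending entry is tolerated (partially balanced path);
                // a return to a different call site than the pending entry is rejected.
                if (!pendingCalls.isEmpty() && !pendingCalls.pop().equals(site)) {
                    return false;
                }
            }
            // Any other label (for example a plain assignment) is ignored here.
        }
        return true;
    }
}
```

In the example, the path from o1 through `a`, `p`, and `ret` to `x` carries the labels `(1` and `)1` and is accepted, while the path from o1 to `y` would carry `(1` and `)2` and is rejected, so o1 is not reported in pts(y).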

Extracting Container Semantics
For a container object o_c, identify its inside objects and element objects
An ADD is concretized as a store operation that
  – Writes an element object to a field of an inside object
A GET is concretized as a load operation that
  – Reads an element object from a field of an inside object

ReachFrom Relation
Relation o reachFrom o_c
  – o flowsTo a
  – b.f = a
  – b flowsTo o_b
  – o_b reachFrom o_c
A reachFrom path automatically guarantees context- and field-sensitivity
  – Demand-driven
  – Instead of traversing a points-to graph
Matched [ ] and ( ) in a reachFrom path
[Figure: a reachFrom path through o, a, a store edge [f, b, o_b, and o_c, with edges labeled flowsTo and reachFrom and type annotations (A), (Object[]), (ArrayList)]

Detecting Inside and Element Objects
An inside object o_i of a container o_c is such that
  – o_i reachFrom o_c
  – o_i is created in o_c’s class or superclass, or in other classes specified by the user
An element object o_e of a container o_c is such that
  – o_e is not an inside object
  – o_e reachFrom o_c
  – All objects except o_e and o_c on this reachFrom path are inside objects
[Figure: container internal structure, a chain o_e, o_i, …, o_i, …, o_c]
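
To make the roles concrete, here is a small hand-written container, a hypothetical `TinyList` that is not taken from the slides. The comments mark which object would play the role of an inside object and which statements would be the store and load that concretize ADD and GET. Treating an array slot like a field is an assumption of this sketch.

```java
// Hypothetical mini container, used only to label the roles described above.
final class TinyList {
    // Inside object: the backing array is allocated in the container's own class.
    private Object[] slots = new Object[4];
    private int size;

    void add(Object element) {
        if (size == slots.length) {
            // The replacement array is also allocated here, so it is an inside object too.
            Object[] bigger = new Object[size * 2];
            System.arraycopy(slots, 0, bigger, 0, size);
            slots = bigger;
        }
        slots[size++] = element; // store of an element object into an inside object: ADD
    }

    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return slots[index];     // load of an element object from an inside object: GET
    }
}
```

Objects passed to `add` are created by client code rather than in `TinyList`, so they would be the element objects. A `TinyList` instance itself corresponds to o_c.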

addTo Reachability
An addTo path from an element object o_e to a container object o_c
  – o_e flowsTo a
  – b.f = a (the store achieving ADD)
  – b flowsTo o_b, where o_b is an inside object
  – o_b reachFrom o_c
Matched [ ] and ( ) in an addTo path
[Figure: an addTo path through o, a, a store edge [f, b, o_b, and o_c, with edges labeled flowsTo, reachFrom, and addTo and type annotations (A), (Object[]), (ArrayList)]

getFrom Reachability
A getFrom path from an element object o_e to a container object o_c
  – o_e flowsTo a
  – a = b.f (the load achieving GET)
  – b flowsTo o_b, where o_b is an inside object
  – o_b reachFrom o_c
Matched [ ] and ( ) in a getFrom path

Relevant Contexts
Identify calling contexts to associate each ADD/GET with a particular container object
A relevant context is the chain of unbalanced method entry/exit edges (1 (2 (2 … that precedes the store/load (achieving ADD/GET) on each addTo/getFrom path
Details can be found in the paper

Execution Frequency Comparison
Dynamic analysis
  – Instrument the ADD/GET sites and do frequency profiling
Static analysis: approximation
  – Based on loop nesting information (inequality graph)
  – Put ADD/GET with relevant contexts onto the inequality graph and check the path
Example: overpopulated container (#ADD >> #GET)
    List l = new List();
    while(*) {
        while(*){ l.add(…); }
        l.get(…);
    }
Example: underutilized container (#ADD is small)
    while(*){
        List l = new List();
        l.add(…);
    }
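
The dynamic variant can be pictured with a hand-instrumented list. The sketch below is only an assumption about what such profiling might look like: `CountingList` and `ProfileDemo` are hypothetical names, and a real tool would instrument the ADD/GET sites automatically rather than wrap the container. It replays the overpopulated loop shape with fixed trip counts in place of `while(*)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that counts ADD and GET operations on one container object.
final class CountingList<E> {
    private final List<E> delegate = new ArrayList<>();
    long adds;
    long gets;

    void add(E element) {
        adds++;
        delegate.add(element);
    }

    E get(int index) {
        gets++;
        return delegate.get(index);
    }
}

final class ProfileDemo {
    // Same loop shape as the overpopulated example: ADD in the inner loop, GET in the outer loop.
    // Assumes inner >= 1 so that index i is always populated when it is read.
    static CountingList<Integer> run(int outer, int inner) {
        CountingList<Integer> list = new CountingList<>();
        for (int i = 0; i < outer; i++) {
            for (int j = 0; j < inner; j++) {
                list.add(i * inner + j);
            }
            list.get(i);
        }
        return list;
    }
}
```

Calling `ProfileDemo.run(3, 4)` performs 12 ADDs and 3 GETs, and the ratio grows with the inner trip count. The static approximation draws the same kind of conclusion from the loop nesting alone.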

Evaluation
Implemented based on Soot and the Sridharan-Bodik points-to analysis framework [PLDI’06]
The static analysis is scalable
  – 21 large apps, including eclipse (41k methods, 1623 containers), analyzed
Low false positive rate
  – For each benchmark, randomly selected 20 container problems for manual check
  – No FP found for 14 programs
  – 1-3 FPs found for the remaining 5 apps

Static Analysis vs Dynamic Analysis
Static report is easier to understand and verify
  – Optimization solutions can be easily seen for most problems reported by static analysis
  – For most problems reported by dynamic analysis, we could not find solutions (these reports are highly dependent on inputs and runs)
Dynamic frequency information can be incorporated for performance tuning
  – Statically analyze hot containers
Case studies can be found in the paper

Conclusions
The first static analysis targeting bloat detection
Finds container inefficiency based on loop nesting
  – Demand driven and client driven
  – Unsound but has a low false positive rate
Usage
  – Find container problems during development
  – Support performance tuning using dynamic frequency info
Static analysis to detect other kinds of bloat?