Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities
Luca Della Toffola – ETH Zurich, Michael Pradel – TU Darmstadt, Thomas R. Gross – ETH Zurich


Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities
Luca Della Toffola – ETH Zurich, Michael Pradel – TU Darmstadt, Thomas R. Gross – ETH Zurich
October 30th, OOPSLA15

MemoizeIt
Automatic dynamic analysis for finding memoization opportunities
9 new real-world memoization opportunities

Apache POI – Issue
Performance issue

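Apache POI – Issue
    // In class DateUtil; date_ptrn is a precompiled java.util.regex.Pattern field (elided on the slide)
    public boolean isADateFormat(int idx, String format) {
        StringBuilder sb = new StringBuilder(format.length());
        // Iterate over the input format (sb is still empty here, so sb.length() would be 0)
        for (int i = 0; i < format.length(); i++) {
            // Modify format and write to sb
        }
        String f = sb.toString();
        // Process f using date pattern matching
        return date_ptrn.matcher(f).matches();
    }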

Apache POI – Issue
Java profiler: ranked 10 (189), 4000 calls; no additional information about the bottleneck.

Apache POI – Issue
Research tools: the symptoms they look for are not there*: no nested loops, no memory bloat.
* [Nistor, ICSE13], [Xu, OOPSLA12]

Apache POI – Issue
Observation: many calls have the same input and output values!
Input: parameters + accessed fields. Output: returned value.
Example inputs: (0, "m/d/yy") and (1, "h:mm"); example outputs: true and false.
Memoization?

Apache POI – Issue
Purity analysis? Too conservative: the method has side effects.
Ignore side effects and compare only inputs and outputs!

Apache POI – Issue
MemoizeIt: 1st-ranked method!
MemoizeIt finds calls with the same input and output values.
Memoization!

Apache POI – Issue
    boolean cache_value;
    int cache_key1;
    String cache_key2;  // null until the first call

    public boolean isADateFormatSlow(int idx, String format) {
        // Slow isADateFormat code
    }

    public boolean isADateFormat(int idx, String format) {
        if (cache_key2 != null && cache_key1 == idx && cache_key2.equals(format)) {
            return cache_value;  // cache hit
        }
        // Cache miss: compute the value, then update the cache keys and value
        boolean value = isADateFormatSlow(idx, format);
        cache_key1 = idx;
        cache_key2 = format;
        cache_value = value;
        return value;
    }
Single-entry instance cache: up to 25% speed-up!

MemoizeIt – Contributions
1. Automatic analysis to find memoization opportunities
2. Suggested fix configurations for candidate methods

MemoizeIt – Contributions
Challenge: inputs can be complex objects on the heap, as in boolean DateUtil.isADateFormat(int idx, MyClass format).

MemoizeIt – Contributions
Addressing the challenge: MemoizeIt == Memoization + Iterative

MemoizeIt
Program + input → CPU-time profiling → initial method candidates
Methods are filtered by:
1. Number of executions
2. Average execution time
3. Relative execution time
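The three filtering criteria can be pictured as a predicate over CPU-time profile data. The following is a minimal Java sketch under stated assumptions, not MemoizeIt's implementation: `MethodProfile`, `CandidateFilter`, and all three threshold values are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-method data reported by a CPU-time profiler.
record MethodProfile(String name, long calls, long totalNanos) {
    double averageNanos() {
        return calls == 0 ? 0.0 : (double) totalNanos / calls;
    }
}

final class CandidateFilter {
    // Hypothetical thresholds for illustration; not MemoizeIt's actual settings.
    static final long MIN_CALLS = 2;              // 1. number of executions
    static final double MIN_AVG_NANOS = 5_000.0;  // 2. average execution time (5 microseconds)
    static final double MIN_RELATIVE = 0.01;      // 3. relative execution time (1% of the run)

    // Keeps only the methods that satisfy all three criteria.
    static List<String> initialCandidates(List<MethodProfile> profiles, long programNanos) {
        List<String> candidates = new ArrayList<>();
        for (MethodProfile p : profiles) {
            boolean calledRepeatedly = p.calls() >= MIN_CALLS;
            boolean slowOnAverage = p.averageNanos() >= MIN_AVG_NANOS;
            boolean significant = (double) p.totalNanos() / programNanos >= MIN_RELATIVE;
            if (calledRepeatedly && slowOnAverage && significant) {
                candidates.add(p.name());
            }
        }
        return candidates;
    }
}
```

A method must pass all three checks: a method called once cannot benefit from a cache, and a method that is cheap per call or negligible overall is unlikely to yield a measurable speed-up.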

MemoizeIt
Program + input → CPU-time profiling → input-output profiling

Input-Output Profiling
Input: parameters + accessed fields. Output: returned value. Together they form an input-output tuple (T).
1. For each call of a candidate method (starting from main),
2. trace the method's input and output values,
3. select method candidates: repeated input-output tuples indicate a memoization opportunity.
Example for boolean DateUtil.isADateFormat(int idx, String format), with inputs such as (0, "m/d/yy") and (1, "h:mm") and outputs true and false: multiplicity(T1) = 3, multiplicity(T2) = 2.
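The multiplicity check can be sketched with a hash map that counts identical input-output tuples. This is a hypothetical illustration: it assumes the inputs of each call are already captured as values with content-based `equals`/`hashCode`, and the names `IOTuple`, `IOProfile`, `trace`, and `potentialHitRatio` are invented here rather than taken from MemoizeIt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One traced call: inputs (parameters + accessed fields) and the returned value.
record IOTuple(List<Object> inputs, Object output) {}

final class IOProfile {
    private final Map<IOTuple, Integer> counts = new HashMap<>();
    private int calls = 0;

    // Called once per invocation of the candidate method.
    void trace(List<Object> inputs, Object output) {
        calls++;
        // Defensive copy so later mutation of the caller's list cannot change the key.
        counts.merge(new IOTuple(new ArrayList<>(inputs), output), 1, Integer::sum);
    }

    int multiplicity(IOTuple t) {
        return counts.getOrDefault(t, 0);
    }

    int distinctTuples() {
        return counts.size();
    }

    // The method stays a candidate only if at least one tuple occurs more than once.
    boolean hasRepeatedTuples() {
        return distinctTuples() < calls;
    }

    // Fraction of calls an unbounded cache could have answered: 1 - distinct / calls.
    double potentialHitRatio() {
        return calls == 0 ? 0.0 : 1.0 - (double) distinctTuples() / calls;
    }
}
```

With three calls producing T1 and two producing T2, the profile holds two distinct tuples for five calls, so a perfect cache could answer three of the five calls.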

Challenge – Complex Objects
boolean DateUtil.isADateFormat(int idx, MyClass format)

Challenge – Complex Objects
Two MyClass objects, {x: 45, y: 1, z: → B {a: …}, …} and {x: 45, y: 0, z: → B {a: …}, …}: equal?
Deciding this requires structural and content equivalence.

Challenge – Complex Objects
flat(object) for MyClass {x: 45, y: 1, z: → B {a: …}, …}:
(MyClass 1, [45, 1, (B 1, [...])])
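One way to picture flat(object) with a depth bound is a reflective traversal that turns an object into nested lists and stops following references at a given depth. The sketch below is an assumption-laden illustration, not MemoizeIt's tracing code: `Flattener`, the `<cut>`, `<cycle>`, and `<opaque>` markers, and the `MyClass`/`B` definitions (modeled on the slide's example) are hypothetical. It treats boxed primitives, strings, and enums as values and does not expand arrays.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical classes mirroring the slide's MyClass / B example.
class B {
    final Object a;
    B(Object a) { this.a = a; }
}

class MyClass {
    final int x;
    final int y;
    final B z;
    MyClass(int x, int y, B z) { this.x = x; this.y = y; this.z = z; }
}

final class Flattener {
    // Flattens obj into nested lists [className, field values...], following at most
    // maxDepth levels of objects; anything deeper is replaced by a "<cut>" marker.
    static Object flat(Object obj, int maxDepth) {
        return flat(obj, maxDepth, new IdentityHashMap<>());
    }

    private static Object flat(Object obj, int depth, Map<Object, Boolean> onPath) {
        // Boxed primitives, strings, and enums are compared by value.
        if (obj == null || obj instanceof Number || obj instanceof String
                || obj instanceof Boolean || obj instanceof Character || obj instanceof Enum<?>) {
            return obj;
        }
        if (depth == 0) return "<cut>";
        if (onPath.containsKey(obj)) return "<cycle>";
        onPath.put(obj, Boolean.TRUE);
        List<Object> out = new ArrayList<>();
        out.add(obj.getClass().getName());
        // Arrays are not expanded in this sketch; they contribute only their type name.
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            Field[] fields = c.getDeclaredFields();
            Arrays.sort(fields, Comparator.comparing(Field::getName)); // deterministic order
            for (Field f : fields) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                if (!f.trySetAccessible()) { out.add("<opaque>"); continue; } // e.g. JDK internals
                try {
                    out.add(flat(f.get(obj), depth - 1, onPath));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        onPath.remove(obj);
        return out;
    }
}
```

With depth 1, two MyClass objects that differ only inside B compare as equal; with depth 2 the difference becomes visible. A shallow comparison can therefore over-report repetition, which is why later iterations re-check the surviving candidates at greater depth.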

Challenge – Complex Objects
The object graph reachable from MyClass {x: 45, y: 1, z: → B {a: …}, …} on the heap can be large: we can't keep everything!

Challenge – Complex Objects
Comparing two MyClass objects on the heap (ref 1 and ref 2), one with y: 0 and one with y: 1, both with x: 45 and z → B, level by level (depth = 1, depth = 2, …): equal?
Exhaustive traversal is expensive!

Solution – Iterative Profiling
Compare the same two objects (ref 1 and ref 2) one depth level at a time: depth = 1, then depth = 2, and so on.
The iterative approach can analyze programs with complex structures.

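MemoizeIt
1. Program + input → CPU-time profiling → initial method candidates.
2. Input-output profiling, starting at depth d = 1.
3. Filter the method candidates.
4. If the maximum depth or the time limit is reached, leave the loop; otherwise continue with the new candidates, increment the depth (depth++), and repeat from step 2.
5. Rank the candidates and produce fix suggestions, then exit.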
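The control flow of this loop can be sketched as follows. It is a hypothetical outline: `IterativeProfiler` and the `profileAtDepth` callback (standing in for one round of input-output profiling at a given depth, for example based on a depth-bounded flattening) are invented names, and the early exit when no candidates remain is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

final class IterativeProfiler {
    // profileAtDepth.test(method, depth) is assumed to return true if the method still
    // shows repeated input-output tuples when inputs are compared up to that depth.
    static List<String> refine(List<String> initial, BiPredicate<String, Integer> profileAtDepth,
                               int maxDepth, long timeLimitNanos) {
        long start = System.nanoTime();
        List<String> candidates = new ArrayList<>(initial);
        int depth = 1;
        while (true) {
            // Filter method candidates at the current depth.
            List<String> kept = new ArrayList<>();
            for (String method : candidates) {
                if (profileAtDepth.test(method, depth)) kept.add(method);
            }
            candidates = kept;
            boolean timeUp = System.nanoTime() - start >= timeLimitNanos;
            if (depth >= maxDepth || timeUp || candidates.isEmpty()) break;
            depth++; // the new candidates are re-profiled one level deeper
        }
        return candidates; // handed to ranking and fix suggestion
    }
}
```

Each iteration only re-profiles methods that survived the previous, cheaper iteration, so the expensive deep comparisons are spent on few methods.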

MemoizeIt
Program + input → CPU-time profiling → input-output profiling → ranking of candidates → ranked candidate methods
Ranking is based on:
1. Estimated saved time
2. Estimated hit ratio
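A ranking by estimated saved time and hit ratio might look like the sketch below. The formulas are assumptions for illustration only: the hit ratio is estimated as 1 - distinct tuples / calls, and the saved time as hit ratio × total time, ignoring cache overhead. `CandidateStats` and `Ranking` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical summary of one candidate after input-output profiling.
record CandidateStats(String method, long calls, long distinctTuples, long totalNanos) {
    // Assumed estimate: every repeated tuple would be a cache hit.
    double hitRatio() {
        return calls == 0 ? 0.0 : 1.0 - (double) distinctTuples / calls;
    }

    // Assumed estimate: each hit saves one average-length call; cache cost is ignored.
    double savedNanos() {
        return hitRatio() * totalNanos;
    }
}

final class Ranking {
    // Sorts by estimated saved time, breaking ties by hit ratio, both descending.
    static List<CandidateStats> rank(List<CandidateStats> candidates) {
        List<CandidateStats> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(CandidateStats::savedNanos)
                .thenComparingDouble(CandidateStats::hitRatio)
                .reversed());
        return sorted;
    }
}
```

Ranking primarily by saved time favors methods where memoization pays off in absolute terms, even if a method with a higher hit ratio is cheap overall.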

MemoizeIt
Program + input → CPU-time profiling → input-output profiling → ranking of candidates → fix suggestions
For each ranked candidate method, MemoizeIt suggests the optimal cache configuration among single instance, single global, multi instance, and multi global caches, and indicates whether the cache needs invalidation.
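For contrast with the single-entry instance cache shown for Apache POI, a multi-entry global cache can be sketched as a static map keyed by all inputs. This is a hypothetical example: `MultiGlobalCache`, `invalidate`, the `misses` counter, and the toy `slowIsADateFormat` stand-in are invented here, and the `HashMap` is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

final class MultiGlobalCache {
    // Cache key: all inputs of the memoized method.
    private record Key(int idx, String format) {}

    // "Global": one static map shared by all callers; "multi": many entries.
    private static final Map<Key, Boolean> CACHE = new HashMap<>();
    static int misses = 0; // counts slow computations, for demonstration only

    static boolean isADateFormat(int idx, String format) {
        return CACHE.computeIfAbsent(new Key(idx, format), k -> {
            misses++;
            return slowIsADateFormat(k.idx(), k.format());
        });
    }

    // Needed only if state that the cached results depend on can change.
    static void invalidate() {
        CACHE.clear();
    }

    // Hypothetical stand-in for the expensive computation (not POI's real logic).
    private static boolean slowIsADateFormat(int idx, String format) {
        return format.contains("d") || format.contains("y");
    }
}
```

A global cache shares results across all instances, while an instance cache lives in each object; a multi-entry cache trades memory for a higher hit ratio when calls alternate between several inputs.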

Experimental Setup
Program                 Description
DaCapo 2006 MR2         antlr, bloat, chart, fop, luindex, pmd
Checkstyle 5.6          Source-code style checker
Soot ae0cec69c0         Static program analysis / manipulation
Apache Tika 1.3         Content analysis toolkit
Apache POI 3.9          MS Office documents manipulation

Evaluation – Research Question
Is MemoizeIt effective at finding new memoization opportunities?
1. Manually select realistic input
2. Execute MemoizeIt
3. Manually inspect methods
4. Implement MemoizeIt's suggestions
Timeout for profiling: 1 hour

Evaluation – Results
Is MemoizeIt effective at finding new memoization opportunities?
9 new opportunities in DaCapo-antlr, DaCapo-bloat, DaCapo-fop, Soot, Apache-Tika, Apache-POI, and Checkstyle
1 duplicate method in Apache-Tika and Apache-POI
31 memoization opportunities

Evaluation – Results
Speed-up after implementing MemoizeIt's suggestions (? = value missing from the transcript)
Program              Small workload   Large workload
DaCapo-antlr         1.04 ± ?         ? ± 0.02
DaCapo-bloat         1.08 ± ?         ?
DaCapo-fop           1.05 ± 0.01      NA
Checkstyle           ?                ? ± 0.10
Soot                 1.27 ± ?         ? ± 0.05
Apache-Tika Excel    ?                ? ± 0.02
Apache-Tika Jar      1.09 ± ?         ? ± 0.02
Apache-POI (1)       1.11 ± ?         ? ± 0.01
Apache-POI (2)       1.07 ± ?         ? ± 0.01

Evaluation – Research Question
Is the iterative or exhaustive approach more efficient?

Evaluation – Results
Is the iterative or exhaustive approach more efficient?
Program              Iterative time [min]   Exhaustive time [min]
DaCapo-antlr         ?                      timeout
DaCapo-bloat         ?                      timeout
DaCapo-chart         [22]
DaCapo-fop           18                     timeout
DaCapo-luindex       32                     timeout
DaCapo-pmd           ?                      timeout
Checkstyle           [622]
Soot                 ?                      timeout
Apache-Tika Excel    [5856]
Apache-Tika Jar      [4135]
Apache-POI           [2337]
(? = value missing; [ ] = iterative and exhaustive values fused in the transcript)
Slide legend: iterative wins / exhaustive wins

Related Work
Performance problems: detecting [Xu, OOPSLA12], [Zaparanuks, PLDI12]; understanding [Song, OOPSLA14], [Yu, ASPLOS14]; fixing [Nistor, ICSE15]
Compiler optimizations [Ding, CGO04], [Costa, CGO13], [St-Amour, OOPSLA12]
Incremental computations [Pugh, POPL89]
Other caching techniques [Ma, WWW15]

Conclusions
Profiling of memoization opportunities
New real-world opportunities with relevant speed-ups
Iterative strategy is beneficial
Suggests cache configurations (single instance, single global, multi instance, multi global); suggestions are easy to implement
Artifact evaluated