Published by Agatha Bell. Modified over 9 years ago.
1
Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities
Luca Della Toffola – ETH Zurich
Michael Pradel – TU Darmstadt
Thomas R. Gross – ETH Zurich
October 30th, 2015 – OOPSLA 2015
2
MemoizeIt
An automatic dynamic analysis that finds memoization opportunities
9 new real-world memoization opportunities
3
Apache POI – Issue 55611: Performance Issue
4
Apache POI – Issue 55611

    // Method of class DateUtil
    public boolean isADateFormat(int idx, String format) {
        StringBuilder sb = new StringBuilder(format.length());
        // Iterate over the input string (sb is still empty at this point)
        for (int i = 0; i < format.length(); i++) {
            // Modify format and write to sb
        }
        String f = sb.toString();
        // Process f using date pattern matching
        return date_ptrn.matcher(f).matches();
    }
5
Apache POI – Issue 55611: DateUtil.isADateFormat
Java profiler: ranked 10 (189), 4000 calls
Java profiler: no additional bottleneck info
6
Apache POI – Issue 55611: DateUtil.isADateFormat
Research tools: symptoms are not there*
No nested loops
No memory bloat
* [Nistor, ICSE13], [Xu, OOPSLA12]
7
Apache POI – Issue 55611: DateUtil.isADateFormat
Observation: many calls have the same input and output values!
Input: parameters + accessed fields
Output: returned value
Example inputs: (0, "m/d/yy"), (1, "h:mm")
Example outputs: true, false
Memoization?
8
Apache POI – Issue 55611: DateUtil.isADateFormat
Purity analysis? Too conservative!
The method has side effects.
Ignore side effects!
9
Apache POI – Issue 55611: DateUtil.isADateFormat
MemoizeIt: 1st ranked method!
MemoizeIt finds calls with the same input and output values.
Memoization!
10
Apache POI – Issue 55611
Single-entry instance cache. Up to 25% speed-up!

    boolean cache_value;
    int cache_key1;
    String cache_key2;  // null until the first call

    public boolean isADateFormatSlow(int idx, String format) {
        // Slow isADateFormat code
    }

    public boolean isADateFormat(int idx, String format) {
        // format.equals(...) is safe while cache_key2 is still null
        if (cache_key1 == idx && format.equals(cache_key2)) {
            return cache_value;
        }
        // Update cache keys and value
        cache_key1 = idx;
        cache_key2 = format;
        cache_value = isADateFormatSlow(idx, format);
        return cache_value;
    }
11
MemoizeIt – Contributions
1. Automatic analysis to find memoization opportunities
2. Suggest fix configurations for candidate methods
12
MemoizeIt – Contributions
Challenge: inputs can be objects on the heap, e.g. boolean DateUtil.isADateFormat(int idx, MyClass format)
13
MemoizeIt – Contributions
Challenge
MemoizeIt == Memoization + Iterative
14
MemoizeIt
Program + profiling input -> CPU-Time Profiling -> initial method candidates
Filtering of methods:
1. Number of executions
2. Average execution time
3. Relative execution time
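The three filtering criteria can be illustrated with a minimal sketch. The `MethodProfile` and `CandidateFilter` classes and all threshold parameters are hypothetical names chosen for illustration; the slides do not state how MemoizeIt represents profiles or which thresholds it uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-method summary, as a CPU-time profiler might report it.
final class MethodProfile {
    final String name;
    final long calls;          // number of executions
    final double totalMillis;  // total time spent in the method

    MethodProfile(String name, long calls, double totalMillis) {
        this.name = name;
        this.calls = calls;
        this.totalMillis = totalMillis;
    }

    double averageMillis() {
        return calls == 0 ? 0.0 : totalMillis / calls;
    }
}

final class CandidateFilter {
    // All thresholds are illustrative assumptions, not values from the talk.
    static List<MethodProfile> initialCandidates(List<MethodProfile> profiles,
                                                 double programMillis,
                                                 long minCalls,
                                                 double minAverageMillis,
                                                 double minRelativeTime) {
        List<MethodProfile> kept = new ArrayList<>();
        for (MethodProfile p : profiles) {
            boolean calledOften = p.calls >= minCalls;                       // 1. number of executions
            boolean slowEnough = p.averageMillis() >= minAverageMillis;      // 2. average execution time
            boolean relevant = p.totalMillis / programMillis >= minRelativeTime; // 3. relative execution time
            if (calledOften && slowEnough && relevant) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

A method that runs only once cannot benefit from a cache, and a method that is very cheap or contributes little to the total run time is unlikely to repay the cost of a cache lookup, which is why all three checks must pass in this sketch.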
15
MemoizeIt
Program + profiling input -> CPU-Time Profiling -> Input-Output Profiling
16
Input-Output Profiling
Input: parameters + accessed fields
Output: returned value
Input-output tuple (T)
1. For each call of a candidate method
2. Trace method input-output values
3. Select method candidates
Example: boolean DateUtil.isADateFormat(int idx, String format) with inputs (0, "m/d/yy") and (1, "h:mm") and outputs true and false
multiplicity(T1) = 3, multiplicity(T2) = 2
Repeated input-output -> memoization
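The multiplicity of input-output tuples can be sketched as a simple counting map. This is an assumption-laden illustration: the class `InputOutputProfile` and its methods are hypothetical, and the input is modelled as a plain list of values rather than the traced parameters and fields that the tool records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InputOutputProfile {
    // Key: one observed input-output tuple T; value: multiplicity(T).
    private final Map<List<Object>, Integer> multiplicity = new HashMap<>();

    // Record one call. 'inputs' stands in for parameters plus accessed fields.
    void record(List<?> inputs, Object output) {
        multiplicity.merge(tuple(inputs, output), 1, Integer::sum);
    }

    int multiplicity(List<?> inputs, Object output) {
        return multiplicity.getOrDefault(tuple(inputs, output), 0);
    }

    // In this sketch a method is a candidate if some tuple was seen more than once.
    boolean hasRepeatedTuples() {
        for (int count : multiplicity.values()) {
            if (count > 1) {
                return true;
            }
        }
        return false;
    }

    private static List<Object> tuple(List<?> inputs, Object output) {
        // Copy the inputs so later changes by the caller do not affect the key.
        return Arrays.asList(new ArrayList<Object>(inputs), output);
    }
}
```

Recording three calls with the same input and output and two calls with another pair reproduces the multiplicities 3 and 2 shown on the slide.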
17
Challenge – Complex Objects
boolean DateUtil.isADateFormat(int idx, MyClass format)
18
Challenge – Complex Objects
Are two objects equal? Structural and content equivalence.
Object 1: MyClass { x: 45, y: 1, z: B { a: … } }
Object 2: MyClass { x: 45, y: 0, z: B { a: … } }
19
Challenge – Complex Objects
Object: MyClass { x: 45, y: 1, z: B { a: … } }
flat(object) = (MyClass 1, [45, 1, (B 1, [...])])
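A depth-bounded flattening in the spirit of flat(object) can be sketched with reflection. This is only an illustration under stated assumptions: the classes `MyClass`, `B` and `Flattener` are hypothetical, the output is a plain string, the numeric tag that follows the class name on the slide is omitted, and only fields declared directly in user-defined classes are visited.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example classes mirroring the object graph on the slide.
final class B {
    Object a = new Object();
}

final class MyClass {
    int x = 45;
    int y;
    B z = new B();

    MyClass(int y) {
        this.y = y;
    }
}

final class Flattener {
    // Flattens 'value' into a string, following references at most 'depth' levels.
    static String flat(Object value, int depth) {
        if (value == null) {
            return "null";
        }
        // Boxed primitives and strings are treated as leaf values.
        if (value instanceof Number || value instanceof Boolean
                || value instanceof Character || value instanceof String) {
            return String.valueOf(value);
        }
        if (depth <= 0) {
            return "...";  // reference beyond the depth bound is not explored
        }
        Field[] fields = value.getClass().getDeclaredFields();
        // Reflection does not guarantee an order, so sort by name for a stable result.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        List<String> parts = new ArrayList<>();
        for (Field field : fields) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            try {
                field.setAccessible(true);
                parts.add(flat(field.get(value), depth - 1));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return "(" + value.getClass().getSimpleName() + ", [" + String.join(", ", parts) + "])";
    }
}
```

With these example classes, `Flattener.flat(new MyClass(1), 2)` yields `(MyClass, [45, 1, (B, [...])])`, and two objects that differ only in `y` already flatten to different strings at depth 1.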
20
Challenge – Complex Objects
Heap: MyClass { x: 45, y: 1, z: B { a: … } }
Can't keep everything!
21
Challenge – Complex Objects
Heap: ref 1 and ref 2 point to MyClass { x: 45, y: 0, z: B { a: … } } and MyClass { x: 45, y: 1, z: B { a: … } }
Traversal levels: depth = 1, depth = 2
equals? Exhaustive traversal is expensive!
22
Solution – Iterative Profiling
Heap: ref 1 and ref 2 point to MyClass { x: 45, y: 0, z: B { a: … } } and MyClass { x: 45, y: 1, z: B { a: … } }
Traversal levels: depth = 1, depth = 2
equals?
The iterative approach can analyze programs with complex structures
23
MemoizeIt
Program + profiling input -> CPU-Time Profiling -> initial method candidates
Iterative loop, starting with d = 1:
Input-Output Profiling -> filter method candidates -> new candidates
if max depth || time limit: exit(), otherwise depth++ and repeat
After the loop: candidates ranking -> fix suggestions
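The iterative loop can be sketched as follows. The class `IterativeProfiling` and the `stillCandidate` predicate are hypothetical: the predicate stands in for one input-output profiling run at a given depth, and stopping early when no candidates remain is an added assumption, not something shown on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

final class IterativeProfiling {
    // 'stillCandidate' is a hypothetical stand-in for one profiling run: it tells
    // whether a method still shows repeated input-output behaviour at the given depth.
    static List<String> refine(List<String> initialCandidates,
                               BiPredicate<String, Integer> stillCandidate,
                               int maxDepth,
                               long timeLimitMillis) {
        long deadline = System.currentTimeMillis() + timeLimitMillis;
        List<String> candidates = new ArrayList<>(initialCandidates);
        int depth = 1;  // d = 1
        while (true) {
            // Input-output profiling at the current depth, then filter the candidates.
            List<String> next = new ArrayList<>();
            for (String method : candidates) {
                if (stillCandidate.test(method, depth)) {
                    next.add(method);
                }
            }
            candidates = next;
            boolean outOfTime = System.currentTimeMillis() >= deadline;
            if (depth >= maxDepth || outOfTime || candidates.isEmpty()) {
                break;  // exit()
            }
            depth++;    // explore one more level of the object graph
        }
        return candidates;
    }
}
```

Because methods that fail at a shallow depth are dropped early, the more expensive deep traversals are only paid for the methods that remain.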
24
MemoizeIt
Program + profiling input -> CPU-Time Profiling -> Input-Output Profiling -> Ranking of Candidates -> ranked candidate methods
Ranking based on:
1. Estimated saved time
2. Estimated hit-ratio
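One possible way to compute the two ranking criteria is sketched below. The formulas are assumptions made for illustration, not necessarily those used by MemoizeIt: the hit ratio is estimated for an unbounded cache as the fraction of calls whose input was seen before, and the saved time as that ratio times the method's total time, ignoring lookup cost. `RankedCandidate` and `CandidateRanking` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class RankedCandidate {
    final String name;
    final long calls;           // profiled calls
    final long distinctInputs;  // number of different inputs among those calls
    final double totalMillis;   // total time spent in the method

    RankedCandidate(String name, long calls, long distinctInputs, double totalMillis) {
        this.name = name;
        this.calls = calls;
        this.distinctInputs = distinctInputs;
        this.totalMillis = totalMillis;
    }

    // Assumed estimate: every call after the first one per input is a cache hit.
    double hitRatio() {
        return calls == 0 ? 0.0 : (double) (calls - distinctInputs) / calls;
    }

    // Assumed estimate: hits cost nothing, misses cost the average call time.
    double savedMillis() {
        return hitRatio() * totalMillis;
    }
}

final class CandidateRanking {
    // Sort by estimated saved time, breaking ties by estimated hit ratio (both descending).
    static List<RankedCandidate> rank(List<RankedCandidate> candidates) {
        List<RankedCandidate> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> {
            int bySaved = Double.compare(b.savedMillis(), a.savedMillis());
            return bySaved != 0 ? bySaved : Double.compare(b.hitRatio(), a.hitRatio());
        });
        return sorted;
    }
}
```

Under these assumptions, a method called 4000 times with only 20 distinct inputs has an estimated hit ratio of 0.995 and ranks above a slower method whose inputs rarely repeat.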
25
MemoizeIt
Program + profiling input -> CPU-Time Profiling -> Input-Output Profiling -> Ranking of Candidates -> ranked candidate methods -> Fix Suggestions -> optimal cache configuration
Suggests a configuration among:
Single Instance
Single Global
Multi Instance
Multi Global
+ need for invalidation
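To contrast with the single-entry instance cache shown for Apache POI, here is a minimal sketch of what a "Multi Global" configuration could look like: many entries, held in a static map shared by all callers, plus an explicit invalidation hook. `DateFormatChecker`, its placeholder slow method and the `slowCalls` counter are hypothetical and exist only to make the example observable; the sketch is not thread-safe.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DateFormatChecker {
    // "Multi": one entry per distinct input. "Global": static, shared by all callers.
    private static final Map<List<Object>, Boolean> CACHE = new HashMap<>();

    static int slowCalls = 0;  // demonstration only: counts executions of the slow path

    static boolean isADateFormat(int idx, String format) {
        List<Object> key = Arrays.asList(idx, format);
        Boolean cached = CACHE.get(key);
        if (cached != null) {
            return cached;  // cache hit
        }
        boolean result = isADateFormatSlow(idx, format);
        CACHE.put(key, result);
        return result;
    }

    // Needed when state that the result depends on changes ("need for invalidation").
    static void invalidate() {
        CACHE.clear();
    }

    // Placeholder for the expensive computation; the pattern is an illustrative assumption.
    private static boolean isADateFormatSlow(int idx, String format) {
        slowCalls++;
        return format.matches("[dmyhs/:. -]+");
    }
}
```

A single-entry variant would keep only the last key and value, and an instance variant would make the cache a non-static field, so the four configurations differ in capacity and in scope.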
26
Experimental Setup
Program            Description
DaCapo 2006 MR2    antlr, bloat, chart, fop, luindex, pmd
Checkstyle 5.6     Source-code style checker
Soot ae0cec69c0    Static program analysis / manipulation
Apache Tika 1.3    Content analysis toolkit
Apache POI 3.9     MS Office documents manipulation
27
Evaluation – Research Question
Is MemoizeIt effective at finding new memoization opportunities?
1. Manually select realistic input
2. Execute MemoizeIt
3. Manually inspect methods
4. Implement MemoizeIt's suggestions
Timeout for profiling: 1 hour
28
Evaluation – Results
Is MemoizeIt effective at finding new memoization opportunities?
31 memoization opportunities
9 new opportunities in DaCapo-antlr, DaCapo-bloat, DaCapo-fop, Soot, Apache-Tika, Apache-POI, Checkstyle
1 duplicate method in Apache-Tika and Apache-POI
29
Evaluation – Results
Program              Small workload [speed-up]   Large workload [speed-up]
DaCapo-antlr         1.04 ± 0.03                 1.05 ± 0.02
DaCapo-bloat         1.08 ± 0.03                 -
DaCapo-fop           1.05 ± 0.01                 NA
Checkstyle           -                           9.95 ± 0.10
Soot                 1.27 ± 0.03                 12.93 ± 0.05
Apache-Tika Excel    -                           1.25 ± 0.02
Apache-Tika Jar      1.09 ± 0.01                 1.12 ± 0.02
Apache-POI (1)       1.11 ± 0.01                 1.92 ± 0.01
Apache-POI (2)       1.07 ± 0.01                 1.12 ± 0.01
30
Evaluation – Research Question
Is the iterative or exhaustive approach more efficient?
31
Evaluation – Results
Is the iterative or exhaustive approach more efficient?
Program: iterative time / exhaustive time [minutes]
DaCapo-antlr: timeout
DaCapo-bloat: timeout
DaCapo-chart: 22
DaCapo-fop: 18 / timeout
DaCapo-luindex: 32 / timeout
DaCapo-pmd: timeout
Checkstyle: 622
Soot: timeout
Apache-Tika Excel: 58 / 56
Apache-Tika Jar: 41 / 35
Apache-POI: 23 / 37
Legend: iterative wins, exhaustive wins
32
Related Work
Performance problems
Detecting: [Xu, OOPSLA12], [Zaparanuks, PLDI12]
Understanding: [Song, OOPSLA14], [Yu, ASPLOS14]
Fixing: [Nistor, ICSE15]
Compiler optimizations: [Ding, CGO04], [Costa, CGO13], [St-Amour, OOPSLA12]
Incremental computations: [Pugh, POPL89]
Other caching techniques: [Ma, WWW15]
33
Conclusions
Profiling of memoization opportunities
New real-world opportunities
Relevant speed-ups
Iterative strategy beneficial
Suggests cache configurations: Single Instance, Single Global, Multi Instance, Multi Global
Suggestions easy to implement
Artifact evaluated
https://github.com/lucadt/memoizeit