Published by Agnes Sparks. Modified over 9 years ago.
1
Microarchitectural Characterization of Production JVMs and Java Workloads (work in progress)
Jungwoo Ha (UT Austin)
Magnus Gustafsson (Uppsala Univ.)
Stephen M. Blackburn (Australian Nat'l Univ.)
Kathryn S. McKinley (UT Austin)
2
2/22/08
Challenges of JVM Performance Analysis
- Controlling nondeterminism
  - Just-in-time compilation driven by nondeterministic sampling
  - Garbage collectors
  - Other helper threads
- Production JVMs are not created equal
  - Thread model (kernel, user threads)
  - Type of helper threads
- Need a solid measurement methodology!
  - Isolate each JVM part
3
Forest and Trees
- What performance metrics explain performance differences and bottlenecks?
  - Cache misses? L1 or L2?
  - TLB misses?
  - Number of instructions?
- Inspecting one or two metrics is not always enough
- Performance counters can measure only a small number of metrics at a time
  - Multiple invocations are inevitable for the measurement
4
Case Study: jython
[Chart: application performance (cycles)]
5
Case Study: jython
[Chart: L1 instruction cache misses per cycle]
6
Case Study: jython
[Chart: L1 data cache misses per cycle]
7
Case Study: jython
[Chart: total instructions executed (retired)]
8
Case Study: jython
[Chart: L2 data cache misses per cycle]
9
Project Status
- Established a methodology to characterize application code performance
  - Large number of metrics (40+) measured from hardware performance counters
  - Apples-to-apples comparison of JVMs using standard interfaces (JVMTI, JNI)
- Simulator data for detailed analysis
  - Limit studies: what if the L1 cache had no misses?
  - More performance metrics, e.g. uop mix
10
Performance Counter Methodology
[Timeline diagram: the JVM is invoked y times; within an invocation]
  1st to xth iteration: warm up the JVM
  (x+1)th iteration: stop the JIT, full-heap GC
  (x+2)th to (x+2+(n/p)k)th iteration: measured runs, changing the metric between runs
- Collecting n metrics
- x warmup iterations (x = 10)
- p performance counters (at most p metrics can be measured per iteration)
- n/p iterations needed for measurement
- k redundant measurements for statistical validation (k = 1)
- Need to hold the workload constant across multiple measurements
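The iteration schedule on this slide can be sketched as a small planning function. This is only an illustration, not the authors' harness: the function name `measurement_plan`, the tuple layout, rounding n/p up when p does not divide n, and repeating each metric group k times back to back are all assumptions made here.

```python
def measurement_plan(metrics, counters_per_iter, warmup_iters=10, repeats=1):
    """Return a list of (iteration_number, phase, metrics_measured) tuples.

    Hypothetical sketch: x = warmup_iters, p = counters_per_iter,
    n = len(metrics), k = repeats, following the slide's symbols.
    """
    # Split the n metrics into groups of at most p (assumption: round up).
    groups = [
        tuple(metrics[i:i + counters_per_iter])
        for i in range(0, len(metrics), counters_per_iter)
    ]

    plan = []
    # Iterations 1..x: warm up the JVM, nothing is measured.
    for iteration in range(1, warmup_iters + 1):
        plan.append((iteration, "warmup", ()))

    # Iteration x+1: stop the JIT and force a full-heap GC.
    plan.append((warmup_iters + 1, "stop JIT + full-heap GC", ()))

    # Iterations from x+2 on: one metric group per measured iteration,
    # each group repeated k times (ordering is an assumption).
    iteration = warmup_iters + 2
    for group in groups:
        for _ in range(repeats):
            plan.append((iteration, "measure", group))
            iteration += 1
    return plan


if __name__ == "__main__":
    # 40 metrics and 2 counters, as in the experiment described later.
    names = [f"metric_{i}" for i in range(40)]
    for entry in measurement_plan(names, counters_per_iter=2)[9:13]:
        print(entry)
```

With 40 metrics, 2 counters, x = 10 and k = 1, the sketch yields 10 warmup iterations, 1 stabilizing iteration, and 20 measured iterations.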
11
Performance Counter Methodology
- Stop-the-world garbage collector
  - No concurrent marking
- One perfctr instance per pthread
  - JVM internal threads are different pthreads from the application
- JVMTI callbacks
  - Thread start: start counter
  - Thread finish: stop counter
  - GC start: pause counter (only for user-level threads)
  - GC stop: resume counter (only for user-level threads)
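The callback rules above amount to gating a counter on thread and GC events. The following is a minimal simulation of that gating over a recorded event trace, not real JVMTI or perfctr code: the event names, the function `application_ticks`, and the use of abstract "ticks" in place of hardware counts are hypothetical.

```python
def application_ticks(events):
    """Sum the ticks during which the counter runs for one application thread.

    `events` is a list of (tick, name) pairs in time order. The names are
    made up for this sketch: "thread_start", "thread_finish", "gc_start",
    "gc_stop".
    """
    total = 0
    alive = False        # thread has started and not yet finished
    running_since = None  # tick at which the counter last began running

    for tick, name in events:
        if name == "thread_start":
            alive = True
            running_since = tick          # start counter
        elif name == "gc_start":
            if running_since is not None:
                total += tick - running_since
                running_since = None      # pause counter during GC
        elif name == "gc_stop":
            if alive and running_since is None:
                running_since = tick      # resume counter after GC
        elif name == "thread_finish":
            if running_since is not None:
                total += tick - running_since
                running_since = None      # stop counter
            alive = False
        else:
            raise ValueError(f"unknown event: {name}")
    return total


if __name__ == "__main__":
    trace = [
        (0, "thread_start"),
        (100, "gc_start"),
        (130, "gc_stop"),
        (200, "thread_finish"),
    ]
    print(application_ticks(trace))
```

In this trace the 30 ticks spent in the collector are excluded, so only the 170 application ticks are counted.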
12
Methodology Limitations
- Cannot factor out memory barrier overhead
  - Use the garbage collector with the least application overhead
- If a helper thread runs in the same pthread as the application (user-level thread), it will cause perturbation
  - No evidence of this in J9, HotSpot, or JRockit
- Instrumented code overhead
  - Must be included in the measurement
13
Experiment
Performance Counter Experiment
- Pentium-M uniprocessor
  - 32KB 8-way L1 cache (data & instruction)
  - 2MB 4-way L2 cache
  - 2 hardware counters (18 if multiplexed)
- 1GB memory
- 32-bit Linux 2.6.20 with perfctr patch
- PAPI 3.5.0 library
Simulator Experiment
- PTLsim (http://www.ptlsim.org) x86 simulator
- 64-bit AMD Athlon
14
Experiment
- 3 production JVMs x 2 versions
  - IBM J9, Sun HotSpot JVM, JRockit (perfctr only)
  - 1.5 and 1.6
- Heap size = max(16MB, 4 * minimum heap size)
- 18 benchmarks
  - 9 DaCapo benchmarks
  - 8 SPEC JVM98
  - 1 PseudoJBB
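The heap-size rule can be written out directly. In this sketch the function names are hypothetical, and passing the result through `-Xms`/`-Xmx` with equal values (a fixed-size heap) is an assumption; the slides do not say how the heap size was set.

```python
def heap_size_mb(min_heap_mb):
    """Heap size rule from the slide: max(16MB, 4 * minimum heap size)."""
    return max(16, 4 * min_heap_mb)


def heap_flags(min_heap_mb):
    """Build JVM heap flags for a benchmark.

    Assumption: initial and maximum heap are set to the same value so the
    heap does not resize during the run.
    """
    size = heap_size_mb(min_heap_mb)
    return [f"-Xms{size}m", f"-Xmx{size}m"]


if __name__ == "__main__":
    # Made-up minimum heap sizes, for illustration only.
    for min_heap in (3, 10):
        print(min_heap, heap_flags(min_heap))
```

A benchmark whose minimum heap is below 4MB therefore runs at the 16MB floor, while larger benchmarks get four times their minimum.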
15
Experiment
- 40+ metrics
  - 40 distinct metrics from performance counters
    - L1 or L2 cache misses (instruction, data, read, write)
    - TLB-I misses
    - Branch predictions
    - Resource stalls
  - Richer metrics from the simulator
    - Micro-operation mix
    - Load to store
16
Performance Counter Results (Cycle Counts)
[Charts: PseudoJBB, pmd, jython, jess]
17
Performance Counter Results (Cycle Counts)
[Charts: jack, hsqldb, compress, db]
18
Performance Counter Results
- IBM J9 1.6 performed better than Sun HotSpot 1.6 on average
- JRockit has the most variation in performance
- Full results: ~800 graphs
  - Full jython results in the paper
  - http://z.cs.utexas.edu/users/habals/jvmcmp or Google my name (Jungwoo Ha)
19
Future Work
- JVM activity characterization
  - Garbage collector
  - JIT
- Statistical analysis of performance metrics
  - Metrics correlation
  - Methodology to identify performance bottlenecks
- Multicore performance analysis
20
Conclusions
- Methodology for production JVM comparison
- Performance evaluation data
- Simulator results for deeper analysis
21
Thank you!
23
Simulation Result
24
Perfect Cache: compress
25
Perfect Cache: db