Download presentation
Presentation is loading. Please wait.
1
1 A Case for Vertical Profiling Peter Sweeney Michael Hind IBM Thomas J. Watson Research Center Matthias Hauswirth Amer Diwan University of Colorado at Boulder
2
2 Finding Causes of Performance Phenomena Application Operating System Hardware C Program Application Framework Java Library Virtual Machine Native Library Operating System Hardware Java /.net Program
3
3 Warehouse Transactions Methodology Benchmark:SPECjbb2000 Virtual machine:JikesRVM Initialization 1 thread 120,000 transactions 50 transactions per time slice time
4
4 Expected Performance of Warehouse Thread Inst / Cyc 9,792 million39,816 millionCycles
5
5 Observed Performance of Warehouse Thread Inst / Cyc 0 0.622 9,792 million39,816 millionCycles
6
6 Investigation: Why this Difference? Correlate IPC with more than 100 other hardware performance metrics –No significant overall correlation
7
7 Investigation: Correlate with GC Activity Inst / Cyc 0 0.622 9,792 million39,816 millionCycles
8
8 Phenomenon Pre-GC Dip Inst / Cyc 0 0.622 9,792 million39,816 millionCycles
9
9 Phenomenon Pre-GC Dip Correlate with OS-Level Metric Inst / Cyc EEOff / Cyc 0 0.622 0 0.219 +300% -6% 9,792 million39,816 millionCycles
10
10 Phenomenon Pre-GC Dip Next Steps We have not found the root cause yet… Need metrics from different levels: –Allocation –Synchronization –System calls –Interrupts
11
11 Observed Performance Inst / Cyc 0 0.622 9,792 million39,816 millionCycles
12
12 Phenomenon Continuous increase Inst / Cyc 0 0.622 9,792 million39,816 millionCycles
13
13 Phenomenon Continuous increase Correlate with HW-Level Metric Inst / Cyc 0 0.622 LsuFlush / Cyc 0.037 0 9,792 million39,816 millionCycles
14
14 Phenomenon Continuous increase Correlate with VM-Level MetricNon- Opt AOSOpt StartEnd IPC0.34790.40910.48900.5082 LsuFlush/Cyc0.05330.02500.00170.0007
15
15 Phenomenon Continuous increase Next Steps We have not verified the root cause yet… Need metrics from different levels: –Recompilation activity –Time spent executing non-optimized vs. optimized code
16
16 Gather data about multiple levels Application Framework Java Library Virtual Machine Native Library Operating System Hardware Pre-GC Dip Continuous increase Vertical Profiling
17
17 Vertical Event Trace
18
18 Challenges & Possible Approaches Huge difference in event frequencies –E.g. 7 GCs, but 20 billion instructions completed –Idea: Count high-frequency events, trace low-frequency events Large number of possible metrics –Trace everything: impossible to anticipate, too expensive –Write many specialized profilers: error prone, large effort –Idea: Generate profilers from specification Overhead –E.g. tracing every memory access is very expensive –Idea: Provide tunable profiling parameters for least overhead Perturbation –E.g. instrumenting every memory access perturbs HPMs –Idea: Use separate runs for interfering metrics Separate Traces –E.g. handling non-determinism –Idea: Combine traces using intervals to summarize
19
19 Architecture Specification (what) Parameters (how) TracerTrace ReaderTrace Analyzer Generator Event Stream VisualizerInstrumentations Event creations, Counter updates Event Stream Interval Stream Aggregated Profiles Instrumenters
20
20 Intervals Events Vertical Profiling Specification: What to Profile specification IPC_And_BytesAllocated { hardware counter longCyc; hardware counter longInst; software counter longBytesAllocated; event ThreadSwitch { intfromThread; inttoThread; longcyc = Cyc; longinst = Inst; longbytesAllocated = BytesAllocated; } interval TimeSlice { starts with ThreadSwitch; ends with ThreadSwitch where end.fromThread == start.toThread; doubleipc = (end.inst-start.inst) / (end.cyc-start.cyc); longbytesAllocated = end.bytesAllocated – start.bytesAllocated; } Event Attributes Interval Metrics Counters
21
21 Status Profiling –Hardware Performance Monitors [VM’04] –Software Performance Monitors –Specification-driven (early prototype) Visualization & Analysis –IBM Performance Explorer
22
22 Future Work Evaluate utility –Find root causes of phenomena Evaluate perturbation –Intra-level perturbation (e.g. HPM → HPM) –Inter-level perturbation (e.g. lock tracing → HPM) Semi-automate investigative process –Statistics / Machine learning
23
23 Related Work Trace Analyzer –[Perl 92] Performance Assertion Checking –[Perl et al. 98] Continuous Monitoring Software Performance Counters –[Microsoft] Windows Management Instrumentation HPM and JikesRVM –[Sweeney et al. 04] Using Hardware Performance Monitors to Understand the Behavior of Java Applications
24
24 Questions?
25
25 EXTRAS
26
26 Profiling HPMs: Infrastructure Power4 Performance Monitors AIX 5.x pmsvc Kernel Extension AIX 5.x pmapi Library JikesRVM 2.3.0.1+ HPM Facility OS Hardware C Library VM
27
27 Profiling HPMs: Samples A sample represents a time slice –Start and end time (in time-base or “decrementer” ticks) –8 event counts –Processor id –Java thread id –Preempted or yielding –Java method ending the sample VP (CPU) 1: VP (CPU) 2: 10 ms
28
28 Profiling HPMs: Benchmark SPEC JBB Modified to execute a given number of transactions (120,000) Startup phase (ca. 8 sec) –1 main thread Steady-state phase (ca. 24 sec) –N warehouse threads Configurations –{1,2,3,4} warehouses on {1,2,3,4} processors Steady-state behavior –Ca. 50 transactions per 10 ms time slice
29
29 Performance Explorer Visualizer for JikesRVM hardware performance counter traces Built-in information about all Power4 performance events Support for creating computed metrics (e.g. Inst/Cyc, given Cyc and Instr counter values) Multiple visualizations, like time chart and scatter plot (for correlation of metrics)
30
30 Performance Explorer: Power4 Event Information
31
31 Performance Explorer: Creation of Computed Metrics
32
32 Performance Explorer: Overview of Java Threads
33
33 Performance Explorer: Time Chart
34
34 Performance Explorer: Scatter Plot
35
35 Phenomenon Pre-GC Dip in IPC Other Correlated Metrics MetricNormalDipIncrease IPC0.49240.46095-6.4% EeOff/Cyc0.019650.0785+300% HvCyc/Cyc0.023870.12489+423% GrpDispBlkSbCyc/Cyc0.005950.02577+333% LsuSrqSyncCyc/Cyc0.006120.017+178% StcxFail/StcxPassFail0.000860.00395+362% LsuLrqFullCyc/Cyc0.000770.00271+250%
36
36 Vertical Profiling Matrix Instrument: Observe: HardwareMachine code Byte codeSource code Hardware OS Native libs VM Java libs Framework Application
37
37 Vertical Profiling Matrix Two “vertical” dimensions –What we observe –What we instrument We may observe higher level behavior by instrumenting a lower level, or vice versa –Instrument HW, observe OS time –Instrument byte code, observe branch misses
38
38 Vertical profiling specification: How to profile ParameterPossible Values Buffer size100000, 1000000, 10000000, … Buffer typeJava byte[], Java int[], native Buffer ownershipGlobal, Processor, Thread Buffer access synchronizationNone, Lock-free, Locked Buffer accessJava, Magic Buffer overflow handlingFlush, Disable, Ignore Buffer flushingExplicit, Seg fault, Each thread switch Buffer flush targetFile, Socket, C routine
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.