1
Memory Performance Profiling via Sampled Performance Monitor Event Traces
Diana Villa, Patricia J. Teller, and Jaime Acosta, The University of Texas at El Paso, Department of Computer Science
Trevor Morgan, Exxon/Mobil
Bret Olszewski, IBM Corporation-Austin
5th Annual IBM Austin CAS Conference – 20 February 2004
2
Outline
- Motivation
- Data: Events Profiled, Information Collected
- Analysis Approach
- Performance Evaluation Framework
- Results
- Conclusions and Future Work
3
Motivation
Overall research goal: a general workload characterization model
Project goal:
- Develop a performance evaluation framework to facilitate analysis of large sampled event traces
- Study the load access patterns of key applications
- Identify and remedy performance impediments
4
Data Collection Environment
- IBM eServer pSeries 690 (p690) architecture
- 8- and 32-processor configurations
- TPC-C benchmark
- Data collected via event trace sampling:
  - Timestamp
  - Effective instruction and data addresses
  - CPU id
  - Process id
  - Thread id
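Each sampled event carries the fields listed above. As a minimal sketch, assuming a hypothetical whitespace-separated text format with hexadecimal addresses (the slides do not specify how traces are stored), one sample could be represented and parsed like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceSample:
    """One sampled performance-monitor event, with the fields listed on the slide."""
    timestamp: int
    instr_addr: int  # effective instruction address
    data_addr: int   # effective data address
    cpu_id: int
    pid: int         # process id
    tid: int         # thread id


def parse_sample(line: str) -> TraceSample:
    """Parse one line of a HYPOTHETICAL trace format:
    '<timestamp> <instr_addr hex> <data_addr hex> <cpu> <pid> <tid>'."""
    ts, iaddr, daddr, cpu, pid, tid = line.split()
    return TraceSample(
        timestamp=int(ts),
        instr_addr=int(iaddr, 16),  # base 16 accepts an optional '0x' prefix
        data_addr=int(daddr, 16),
        cpu_id=int(cpu),
        pid=int(pid),
        tid=int(tid),
    )


# Hypothetical trace line, for illustration only.
sample = parse_sample("1045211 0x10003f2c 0x70001a80 3 20514 41237")
print(sample.cpu_id, hex(sample.data_addr))  # prints: 3 0x70001a80
```

A frozen dataclass keeps samples immutable and hashable, which is convenient when they are later grouped or deduplicated.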
5
Platform - 1
[Figure: 8-processor p690 configuration: processors (P) with L2 and L3 caches on two multi-chip modules, MCM 0 and MCM 1]
6
Platform - 2
[Figure: 32-processor p690 configuration: processors (P) with L2 and L3 caches on four multi-chip modules, MCM 0 through MCM 3]
7
Events
Resolution of L2-cache data-load misses:
- L2.5
  - L2.5 shared
  - L2.5 modified
- L2.75
  - L2.75 shared
  - L2.75 modified
- L3
- L3.5
8
L2.5
[Figure: 8-processor p690 configuration (MCM 0, MCM 1) illustrating an L2.5 miss resolution]
Penalty: 73 cycles
9
L2.75
[Figure: 8-processor p690 configuration (MCM 0, MCM 1) illustrating an L2.75 miss resolution]
Penalty: 96 cycles
10
L3
[Figure: 8-processor p690 configuration (MCM 0, MCM 1) illustrating an L3 miss resolution]
Penalty: 112 cycles
11
L3.5
[Figure: 8-processor p690 configuration (MCM 0, MCM 1) illustrating an L3.5 miss resolution]
Penalty: 143 cycles
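The per-level penalties make it possible to weight sampled miss counts by their cost. The sketch below assumes that the shared and modified variants of a level share that level's penalty (the slides give one penalty per level) and ignores any overlap of miss latency with other work; the counts are hypothetical:

```python
# Load-miss penalties in cycles, as given on the slides for the 8-processor p690.
PENALTY_CYCLES = {
    "L2.5": 73,
    "L2.75": 96,
    "L3": 112,
    "L3.5": 143,
}


def estimated_stall_cycles(miss_counts: dict[str, int]) -> int:
    """Sum count * penalty over all miss-resolution events.

    ASSUMPTION: 'L2.5 shared' and 'L2.5 modified' both use the L2.5 penalty,
    and likewise for L2.75.
    """
    total = 0
    for event, count in miss_counts.items():
        level = event.split()[0]  # 'L2.75 modified' -> 'L2.75'
        total += count * PENALTY_CYCLES[level]
    return total


# Hypothetical sampled counts, for illustration only.
counts = {"L2.5 shared": 10, "L2.5 modified": 5, "L2.75 shared": 4, "L3": 2, "L3.5": 1}
print(estimated_stall_cycles(counts))  # prints: 1846
```

Because the input is sampled, the result is useful for comparing regions or levels rather than as an absolute cycle count; scaling it would require the sampling rate, which the slides do not give.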
12
Analysis
Identify application-specific sources of performance degradation associated with data references, at these levels of detail:
- Level of memory hierarchy
- Address-space region (kernel, …, text, buffer pool, data/bss/heap, …)
- Segment
- Page
- Page offset / cache line
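Attributing a sample to a segment, page, and cache line amounts to slicing its effective data address into bit fields. This sketch assumes 256 MB segments, 4 KB pages, and 128-byte cache lines; these sizes are assumptions for illustration, not values stated on the slides:

```python
SEGMENT_BITS = 28  # ASSUMPTION: 256 MB segments
PAGE_BITS = 12     # ASSUMPTION: 4 KB pages
LINE_BITS = 7      # ASSUMPTION: 128-byte cache lines


def decompose(ea: int) -> dict[str, int]:
    """Split an effective address into segment, page within segment,
    cache line within page, and byte offset within the line."""
    return {
        "segment": ea >> SEGMENT_BITS,
        "page": (ea >> PAGE_BITS) & ((1 << (SEGMENT_BITS - PAGE_BITS)) - 1),
        "line": (ea >> LINE_BITS) & ((1 << (PAGE_BITS - LINE_BITS)) - 1),
        "offset": ea & ((1 << LINE_BITS) - 1),
    }


print(decompose(0x70001A80))
# prints: {'segment': 7, 'page': 1, 'line': 21, 'offset': 0}
```

Counting samples per value of each field yields the segment, page, and cache-line breakdowns that the Results slides report.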
13
Performance Evaluation Framework
[Figure: framework data flow]
p690 TPC-C data collection environment → sampled event traces (PID, TID, Timestamp, Instr. Addr., Data Addr.) → Load DB Java tool → database → Report Generation Java tool → reports and graphs
Example report rows:
5   BufferPool       56893   29384
6   Data,BSS,Heap     8799    4855
1   Kernel           23485    9840
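At its core, a region report groups samples by address-space region and counts them. The original framework produces its reports with Java tools over a database; the Python sketch below only illustrates the grouping step. Its segment-to-region map (REGION_BY_SEGMENT) and the 256 MB segment size are hypothetical, and it emits a single count column because the slide does not explain the report's columns:

```python
from collections import Counter

# HYPOTHETICAL segment-to-region map; the real mapping depends on the
# process address-space layout, which the slides do not give.
REGION_BY_SEGMENT = {0x0: "Kernel", 0x2: "Data,BSS,Heap", 0x7: "BufferPool"}


def region_of(data_addr: int) -> str:
    segment = data_addr >> 28  # ASSUMPTION: 256 MB segments
    return REGION_BY_SEGMENT.get(segment, "Other")


def region_report(data_addrs: list[int]) -> list[tuple[str, int]]:
    """Count sampled data addresses per region, most-referenced region first."""
    return Counter(region_of(a) for a in data_addrs).most_common()


# Hypothetical sampled data addresses, for illustration only.
samples = [0x70001A80, 0x70002000, 0x2FF22C10, 0x00010000, 0x70003F00, 0xE0000000]
for region, count in region_report(samples):
    print(f"{region:<15}{count:>8}")
```

Sorting by count puts the regions with the most sampled misses at the top, which is where the Conclusions look for improvement targets.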
14
Results
15
Results - Memory Regions
16
Results - L3 Cache
17
Results - Segment
18
Results - Pages
19
Results - Cache Lines
20
Results - Instructions
Lock Operations      Atomic Operations
simple_lock          fetch_and_add
simple_lock_ppc      fetch_and_add_h
simple_unlock        fetch_and_addlp
disable_lock         fetch_and_or
unlock_enable        fetch_and_orlp
simple_unlock_mem    fetch_and_and
unlock_enable_mem    fetch_and_andlp
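To decide whether sampled misses come from lock or atomic routines, each sampled instruction address has to be mapped to the routine that contains it. One common way to do this is a binary search over sorted routine start addresses. In the sketch below, the start addresses in SYMBOLS and the routine name unrelated_routine are hypothetical placeholders; only the lock and atomic routine names come from the table above:

```python
import bisect
from typing import Optional

# HYPOTHETICAL symbol table: (start address, routine name), sorted by address.
SYMBOLS = [
    (0x00010000, "simple_lock"),
    (0x00010200, "simple_unlock"),
    (0x00010400, "fetch_and_add"),
    (0x00010500, "fetch_and_or"),
    (0x00010600, "unrelated_routine"),
]
STARTS = [start for start, _ in SYMBOLS]

LOCK_OPS = {"simple_lock", "simple_lock_ppc", "simple_unlock", "disable_lock",
            "unlock_enable", "simple_unlock_mem", "unlock_enable_mem"}
ATOMIC_OPS = {"fetch_and_add", "fetch_and_add_h", "fetch_and_addlp", "fetch_and_or",
              "fetch_and_orlp", "fetch_and_and", "fetch_and_andlp"}


def routine_of(instr_addr: int) -> Optional[str]:
    """Return the routine with the greatest start address <= instr_addr, if any."""
    i = bisect.bisect_right(STARTS, instr_addr) - 1
    return SYMBOLS[i][1] if i >= 0 else None


def classify(instr_addr: int) -> str:
    name = routine_of(instr_addr)
    if name in LOCK_OPS:
        return "lock"
    if name in ATOMIC_OPS:
        return "atomic"
    return "other"


for addr in (0x00010010, 0x00010420, 0x00010700, 0x00000100):
    print(hex(addr), classify(addr))
```

The table has no end addresses, so any address past the last start is attributed to the last routine; a real symbol table would include routine sizes to catch addresses that fall between routines.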
21
Conclusions
- Targets for performance improvement of TPC-C are associated mainly with two regions of the address space:
  - buffer pool
  - data, bss, heap
- TPC-C lock instructions are not a key source of performance degradation
- The 8- and 32-processor data show the same reference pattern; thus, a model of TPC-C memory access may be possible
22
Future Work
- Suggest ways to improve the performance of applications executed on the p690
- Enhance the performance evaluation framework
- Quantify the representativeness of sampled event traces
- Expand the study of application data-load behavior:
  - Process characterization
  - Process migration
- Other performance issues:
  - Compulsory vs. capacity/conflict misses
  - False sharing
  - Contention for resources
- Develop synthetic applications that mimic the behavior of key p690 applications; use these to study application behavior and to experiment with application modifications that may affect performance
23
Questions?