

1 Memory Performance Profiling via Sampled Performance Monitor Event Traces
Diana Villa, Patricia J. Teller, and Jaime Acosta, The University of Texas at El Paso, Department of Computer Science
Trevor Morgan, Exxon/Mobil
Bret Olszewski, IBM Corporation-Austin
5th Annual IBM Austin CAS Conference – 20 February 2004

2 Outline
- Motivation
- Data
  - Events profiled
  - Information collected
- Analysis
  - Approach
  - Performance evaluation framework
- Results
- Conclusions and future work

3 Motivation
Overall research goal: a general workload characterization model
Project goal:
- Develop a performance evaluation framework to facilitate analysis of large sampled event traces
- Study the load access patterns of key applications
- Identify and remedy performance impediments

4 Data Collection Environment
- IBM eServer pSeries 690 architecture, 8- and 32-processor configurations
- TPC-C benchmark
- Data collected via event trace sampling:
  - Timestamp
  - Effective instruction and data addresses
  - CPU ID
  - Process ID
  - Thread ID
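
A minimal Python sketch of one sampled event record holding the fields listed above. The slides do not describe the trace file format, so the `TraceSample` type, the `parse_sample` helper, the field order, and the hexadecimal address encoding are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceSample:
    """One sampled performance-monitor event (hypothetical record layout)."""
    timestamp: int    # time of the sample
    instr_addr: int   # effective address of the sampled load instruction
    data_addr: int    # effective address of the data it referenced
    cpu_id: int
    pid: int
    tid: int


def parse_sample(line: str) -> TraceSample:
    """Parse one whitespace-separated trace line.

    Assumed field order: timestamp instr_addr data_addr cpu pid tid,
    with both addresses written in hexadecimal ("0x" prefix optional).
    """
    ts, ia, da, cpu, pid, tid = line.split()
    return TraceSample(
        timestamp=int(ts),
        instr_addr=int(ia, 16),
        data_addr=int(da, 16),
        cpu_id=int(cpu),
        pid=int(pid),
        tid=int(tid),
    )


sample = parse_sample("1024 0x10001a2c 0x7000003e80 3 4711 12")
print(sample)
```

A frozen dataclass keeps records immutable and hashable, so later stages can group or deduplicate samples safely.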

5 Platform - 1
[Figure: 8-processor p690 configuration: processors (P) with L2 and L3 caches on two multichip modules, MCM 0 and MCM 1]

6 Platform - 2
[Figure: 32-processor p690 configuration: processors (P) with L2 and L3 caches on four multichip modules, MCM 0 to MCM 3]

7 Events
Resolution of L2-cache data-load misses:
- L2.5 (L2.5 shared, L2.5 modified)
- L2.75 (L2.75 shared, L2.75 modified)
- L3
- L3.5

8 L2.5
[Figure: 8-processor p690 diagram, MCM 0 and MCM 1]
Penalty: 73 cycles

9 L2.75
[Figure: 8-processor p690 diagram, MCM 0 and MCM 1]
Penalty: 96 cycles

10 L3
[Figure: 8-processor p690 diagram, MCM 0 and MCM 1]
Penalty: 112 cycles

11 L3.5
[Figure: 8-processor p690 diagram, MCM 0 and MCM 1]
Penalty: 143 cycles
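
The per-level penalties above can be used to weight sampled miss counts, so that fewer but costlier misses (for example, L3.5) are compared fairly against more frequent, cheaper ones. The following is a hypothetical sketch, not the authors' tool. It assumes the shared and modified sub-events are charged the same penalty as their parent level, since the slides give only one penalty per level. Because the traces are sampled, the result is a relative weighting rather than an absolute cycle count.

```python
# Load-miss penalties in cycles for each resolution level, as listed on slides 8-11.
PENALTY_CYCLES = {"L2.5": 73, "L2.75": 96, "L3": 112, "L3.5": 143}


def level_of(event: str) -> str:
    """Return the resolution level of an event label, e.g. 'L2.5 shared' -> 'L2.5'."""
    return event.split()[0]


def estimated_stall_cycles(counts: dict[str, int]) -> dict[str, int]:
    """Weight sampled miss counts by the per-level penalty.

    Shared and modified sub-events are folded into their parent level and
    charged the same penalty (an assumption). `counts` should hold either the
    per-level totals or the sub-events, not both, or misses are counted twice.
    """
    totals: dict[str, int] = {}
    for event, n in counts.items():
        level = level_of(event)
        totals[level] = totals.get(level, 0) + n * PENALTY_CYCLES[level]
    return totals


# Illustrative counts only; these are not data from the study.
counts = {"L2.5 shared": 10, "L2.5 modified": 5, "L2.75 modified": 2, "L3": 4}
print(estimated_stall_cycles(counts))
```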

12 Analysis
Identify application-specific sources of performance degradation associated with data references, broken down by:
- Level of the memory hierarchy
- Address-space region (kernel, ..., text, buffer pool, data/bss/heap, ...)
- Segment
- Page
- Page offset / cache line
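
A sampled data address can be split into the segment, page, and cache-line levels listed above using shifts and masks. This sketch assumes 256 MB segments, 4 KB pages, and 128-byte cache lines. Those sizes are typical of AIX on POWER4 but are not stated on the slides. The `decompose` helper and its constants are illustrative. Mapping a segment to a named region such as the buffer pool would require OS- and application-specific layout information, which is not shown here.

```python
SEGMENT_SHIFT = 28  # assumed 256 MB segments
PAGE_SHIFT = 12     # assumed 4 KB pages
LINE_SHIFT = 7      # assumed 128-byte cache lines


def decompose(addr: int) -> dict[str, int]:
    """Split an effective address into segment, page, page offset, and line."""
    pages_per_segment_mask = (1 << (SEGMENT_SHIFT - PAGE_SHIFT)) - 1
    page_offset = addr & ((1 << PAGE_SHIFT) - 1)
    return {
        "segment": addr >> SEGMENT_SHIFT,                         # segment number
        "page": (addr >> PAGE_SHIFT) & pages_per_segment_mask,    # page within segment
        "page_offset": page_offset,                               # byte within page
        "line": page_offset >> LINE_SHIFT,                        # cache line within page
    }


print(decompose(0x7000003E80))
```

Counting samples per key at each level (segment, then page, then line) gives the drill-down used in the results slides.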

13 Performance Evaluation Framework
[Figure: framework data flow]
- The p690 TPC-C data collection environment produces sampled event traces (PID, TID, Timestamp, Instr. Addr., Data Addr.)
- A Database Load Java tool loads the traces into a database (DB)
- A Report Generation Java tool queries the DB and produces reports and graphs
Example report excerpt (column headings not shown):
5 BufferPool 56893 29384
6 Data,BSS,Heap 8799 4855
1 Kernel 23485 9840
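
The Database Load and Report Generation steps can be pictured with the following sketch. It is Python using the standard `sqlite3` module, whereas the original framework uses Java tools. The table schema, the `build_region_report` function, and the caller-supplied `region_of` mapping are hypothetical. The query counts samples per address-space region, which is one plausible shape for the report rows above. The meaning of the numeric report columns is not recoverable from the slides.

```python
import sqlite3


def build_region_report(samples, region_of):
    """Load sampled trace records into a database and count samples per region.

    samples:   iterable of (pid, tid, timestamp, instr_addr, data_addr) tuples
    region_of: caller-supplied (hypothetical) function mapping a data address
               to an address-space region name such as "BufferPool"
    """
    db = sqlite3.connect(":memory:")
    # SQLite integers are signed 64-bit; addresses >= 2**63 would need
    # a different encoding (for example, hex strings).
    db.execute(
        "CREATE TABLE samples (pid INTEGER, tid INTEGER, ts INTEGER, "
        "instr_addr INTEGER, data_addr INTEGER, region TEXT)"
    )
    # Database-load step: one row per sampled event.
    db.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)",
        ((pid, tid, ts, ia, da, region_of(da)) for pid, tid, ts, ia, da in samples),
    )
    # Report-generation step: sample counts per region, largest first.
    rows = db.execute(
        "SELECT region, COUNT(*) AS n FROM samples "
        "GROUP BY region ORDER BY n DESC, region"
    ).fetchall()
    db.close()
    return rows
```

Keeping the raw samples in a database makes it possible to run further queries (per segment, page, or thread) on the same data without re-reading the trace files.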

14 Results

15 Results - Memory Regions

16 Results - L3 Cache

17 Results - Segment

18 Results - Pages

19 Results - Cache Lines

20 Results - Instructions
Lock operations      Atomic operations
simple_lock          fetch_and_add
simple_lock_ppc      fetch_and_add_h
simple_unlock        fetch_and_addlp
disable_lock         fetch_and_or
unlock_enable        fetch_and_orlp
simple_unlock_mem    fetch_and_and
unlock_enable_mem    fetch_and_andlp
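
Attributing sampled instruction addresses to the lock and atomic routines listed above requires resolving each address to a routine name. The following sketch does this with a binary search over a sorted symbol table. The `symbols` input (for example, extracted from a symbol-table dump) and the `attribute_to_symbols` and `lock_share` helpers are hypothetical. For simplicity, the sketch ignores routine end addresses, so an address past the end of the last routine is still attributed to it.

```python
import bisect
from collections import Counter

# Routine names taken from the slide.
LOCK_OPS = {"simple_lock", "simple_lock_ppc", "simple_unlock", "disable_lock",
            "unlock_enable", "simple_unlock_mem", "unlock_enable_mem"}
ATOMIC_OPS = {"fetch_and_add", "fetch_and_add_h", "fetch_and_addlp", "fetch_and_or",
              "fetch_and_orlp", "fetch_and_and", "fetch_and_andlp"}


def attribute_to_symbols(instr_addrs, symbols):
    """Count sampled instruction addresses per routine.

    symbols: list of (start_address, name) pairs sorted by start address.
    Each address goes to the routine with the greatest start <= address.
    """
    starts = [start for start, _ in symbols]
    counts = Counter()
    for addr in instr_addrs:
        i = bisect.bisect_right(starts, addr) - 1
        counts[symbols[i][1] if i >= 0 else "<unknown>"] += 1
    return counts


def lock_share(counts):
    """Fraction of samples that fall in lock or atomic routines."""
    hits = sum(n for name, n in counts.items() if name in LOCK_OPS | ATOMIC_OPS)
    return hits / sum(counts.values())
```

A small `lock_share` value is the kind of evidence behind the conclusion that lock instructions are not a major factor.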

21 Conclusions
- Targets for improving TPC-C performance are associated mainly with two regions of the address space:
  - buffer pool
  - data, bss, heap
- TPC-C lock instructions are not a key contributor to performance degradation
- The 8- and 32-processor data show the same reference pattern; thus, a model of TPC-C memory access may be possible
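
The slides do not say how "same reference pattern" was judged. One possible way to quantify it is to normalize per-region sample counts from each configuration and compute the total variation distance between the two distributions. The distance is 0 for identical distributions and 1 for disjoint ones. This approach is an assumption for illustration, not the authors' method, and `normalize` and `total_variation` are hypothetical helpers. Both assume non-empty count dictionaries.

```python
def normalize(counts):
    """Turn raw per-region sample counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def total_variation(a, b):
    """Total variation distance between two per-region count dictionaries."""
    pa, pb = normalize(a), normalize(b)
    regions = pa.keys() | pb.keys()
    return 0.5 * sum(abs(pa.get(r, 0.0) - pb.get(r, 0.0)) for r in regions)
```

Normalizing first matters because the 32-processor traces would normally contain more samples than the 8-processor traces; only the shape of the distribution is compared.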

22 Future Work
- Suggest ways to improve the performance of applications executed on the p690
- Enhance the performance evaluation framework
- Quantify the representativeness of sampled event traces
- Expand the study of application data-load behavior:
  - Process characterization
  - Process migration
  - Other performance issues: compulsory vs. capacity/conflict misses, false sharing, contention for resources
- Develop synthetic applications that mimic the behavior of key p690 applications; use them to study application behavior and to experiment with application modifications that may affect performance

23 Questions?
