
A Framework for Profiling Multiprocessor Memory Performance
Diana Villa, Jaime Acosta, Patricia J. Teller. Session 7C, July 9, 2004, ICPADS ’04


1 Session 7C, July 9, 2004, ICPADS ’04
A Framework for Profiling Multiprocessor Memory Performance
Diana Villa, Jaime Acosta, Patricia J. Teller
The University of Texas at El Paso, Department of Computer Science
Bret Olszewski
IBM Corporation, Austin, TX

2 Outline
Motivation
Data Collection Environment
  - Workload & Platform
  - Monitored Events
Sampled Event Traces
Performance Evaluation Framework
Data Analysis & Results
Conclusions and Future Work

3 Motivation
Modern Systems
  - Performance governed by memory subsystem
SMPs
  - Deeper and larger memory hierarchies
  - Performance analysis considerations: time to results and size of data set
Goal
  - Develop a new performance analysis methodology

4 Data Collection Environment
Workload
  - TPC-C benchmark: commercial OLTP
Platform
  - IBM eServer pSeries 690 architecture (p690): 8- and 32-processor configurations

5 Platform
[Figure: 8-processor p690 configuration. Diagram labels: P, X, L2, L3, MCM 0, MCM 1.]

6 Platform
[Figure: 32-processor p690 configuration. Diagram labels: P, L2, L3, MCM 0, MCM 1, MCM 2, MCM 3.]

7 Monitored Events
L2-cache data-load misses
  - L2.5
  - L2.75
  - L3
  - L3.5
  - MEM
L1-cache data-load miss
  - L2

8 L2
[Figure: 8-processor p690 diagram, MCM 0 and MCM 1]
Penalty: 12 cycles

9 L2.5
[Figure: 8-processor p690 diagram, MCM 0 and MCM 1]
Penalty: 73 cycles

10 L2.75
[Figure: 8-processor p690 diagram, MCM 0 and MCM 1]
Penalty: 96 cycles

11 L3
[Figure: 8-processor p690 diagram, MCM 0 and MCM 1]
Penalty: 112 cycles

12 L3.5
[Figure: 8-processor p690 diagram, MCM 0 and MCM 1]
Penalty: 143 cycles
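
The per-level penalties on slides 8 through 12 can be combined with load counts to estimate the cycles spent waiting on data loads. The following is a minimal Python sketch of that arithmetic, not part of the original talk. The penalty values are taken from the slides; the function name `stall_cycles` and the example counts are hypothetical. The slides give no penalty for MEM, so it is left out rather than guessed.

```python
# Penalties in cycles, as listed on slides 8-12.
# MEM has no penalty on the slides, so it is deliberately omitted.
PENALTY_CYCLES = {"L2": 12, "L2.5": 73, "L2.75": 96, "L3": 112, "L3.5": 143}


def stall_cycles(counts):
    """Return (total penalty cycles, average penalty per load).

    `counts` maps a level name (e.g. "L2.5") to the number of data loads
    satisfied at that level.
    """
    unknown = set(counts) - set(PENALTY_CYCLES)
    if unknown:
        raise ValueError(f"no penalty known for: {sorted(unknown)}")
    total_loads = sum(counts.values())
    total_cycles = sum(PENALTY_CYCLES[level] * n for level, n in counts.items())
    average = total_cycles / total_loads if total_loads else 0.0
    return total_cycles, average


# Hypothetical counts, chosen only to illustrate the calculation.
example_counts = {"L2": 1000, "L2.5": 100, "L3": 50}
total, average = stall_cycles(example_counts)
print(total, round(average, 2))
```

Because the traces are sampled, counts taken from them would need to be scaled by the sampling period before the totals could be read as actual cycle counts.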

13 Data Collection
10-minute observation interval
Performance Monitoring Unit (PMU)
  - Special-purpose registers
  - Programming interface
Kernel extension eprof
  - PMU configuration
  - Event-based sampling

14 Sampled Event Traces
Sampling
  - Record periodic occurrences of an event
  - 100 events/sec/CPU
Event record

  PID     TID     Timestamp    Effective Instruction Address  Effective Data Address
  372872  184469  0.328104637  000000000000A8C4               0000000000218880

Average number of samples collected/event
  - 238,448 for 8-processor data
  - 212,396 for 32-processor data
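
To make the event record layout concrete, here is a small Python sketch that parses the sample record shown on this slide. It assumes that fields are whitespace-separated, that PID and TID are decimal, and that both addresses are hexadecimal; the slide does not state the on-disk format, so these are assumptions. The names `EventRecord` and `parse_record` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    pid: int
    tid: int
    timestamp: float   # unit is not stated on the slide
    instr_addr: int    # effective instruction address
    data_addr: int     # effective data address


def parse_record(line):
    """Parse one sampled-event record (assumed whitespace-separated)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    pid, tid, timestamp, instr_addr, data_addr = fields
    return EventRecord(
        pid=int(pid),
        tid=int(tid),
        timestamp=float(timestamp),
        instr_addr=int(instr_addr, 16),  # assumed hexadecimal
        data_addr=int(data_addr, 16),    # assumed hexadecimal
    )


# The sample record from the slide.
record = parse_record(
    "372872 184469 0.328104637 000000000000A8C4 0000000000218880"
)
print(record)
```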

15 Performance Framework
[Figure: framework components. Data Collection Environment (TPC-C on p690); Sampled Event Traces (PID, TID, Timestamp, Instr. Addr., Data Addr.); Load DB (Java tool); Database; Report Generation (Java tool); Reports; Graphs.]
Sample report rows shown in the figure:
  5  BufferPool     56893  29384
  6  Data,BSS,Heap  8799   4855
  1  Kernel         23485  9840
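
The load-then-report flow of the framework can be sketched in a few lines of Python. This is an illustration only: the original tools were written in Java, and the sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the framework's database. The table layout, the region address ranges, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical address ranges; the real region boundaries are not in the slides.
REGIONS = [
    ("Kernel", 0x0000, 0x0FFF),
    ("Data,BSS,Heap", 0x1000, 0x1FFF),
    ("BufferPool", 0x2000, 0x2FFF),
]


def load_regions(conn):
    """Create and fill the (hypothetical) address-region table."""
    conn.execute("CREATE TABLE IF NOT EXISTS regions (name TEXT, lo INTEGER, hi INTEGER)")
    conn.executemany("INSERT INTO regions VALUES (?, ?, ?)", REGIONS)


def load_traces(conn, event, records):
    """Load sampled records (pid, tid, ts, instr_addr, data_addr) for one event."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(event TEXT, pid INTEGER, tid INTEGER, ts REAL, "
        "instr_addr INTEGER, data_addr INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)",
        [(event, *r) for r in records],
    )


def region_report(conn, event):
    """Count samples of one event per address region, largest first."""
    return conn.execute(
        "SELECT r.name, COUNT(*) AS n "
        "FROM samples s JOIN regions r ON s.data_addr BETWEEN r.lo AND r.hi "
        "WHERE s.event = ? GROUP BY r.name ORDER BY n DESC, r.name",
        (event,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_regions(conn)
# Made-up records, for illustration only.
load_traces(conn, "L3", [
    (1, 1, 0.1, 0xA8C4, 0x2010),
    (1, 2, 0.2, 0xA8C8, 0x2020),
    (2, 3, 0.3, 0xB000, 0x1004),
])
report = region_report(conn, "L3")
print(report)
```

Keeping the traces in a database means each report is a query, so new breakdowns (per process, per region, per event) do not require re-reading the trace files.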

16 Data Analysis - 1
Overall goal
  - Study effectiveness of p690 memory hierarchy
      Characterize differences between private and shared data loads
      Track missing L2-cache lines across levels of the p690 memory hierarchy
Studied address regions
  - Referenced by 90% of L2-cache data-load misses
  - Private: Data,BSS,Heap
  - Shared: Buffer Pool
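
One way to compare private and shared data loads is to look at where in the hierarchy the L2-cache misses for each region are satisfied. The Python sketch below shows that calculation under stated assumptions. It treats L2.5 and L3 as levels within the requesting processor's MCM and L2.75, L3.5, and MEM as outside it; this follows common POWER4 terminology and is an assumption here, since the slide diagrams that defined the levels are not recoverable. All counts are invented for illustration and are not the study's results.

```python
# Assumption: L2.5 and L3 hits are served from within the requesting MCM.
WITHIN_MCM_LEVELS = {"L2.5", "L3"}


def level_distribution(counts):
    """Return the fraction of sampled misses satisfied at each level."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: n / total for level, n in counts.items()}


def within_mcm_fraction(counts):
    """Return the fraction of sampled misses satisfied within the MCM."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    local = sum(n for level, n in counts.items() if level in WITHIN_MCM_LEVELS)
    return local / total


# Invented sample counts per level, one dict per address region.
private = {"L2.5": 60, "L2.75": 5, "L3": 25, "L3.5": 5, "MEM": 5}
shared = {"L2.5": 20, "L2.75": 30, "L3": 10, "L3.5": 30, "MEM": 10}

for name, counts in (("Data,BSS,Heap", private), ("Buffer Pool", shared)):
    print(name, level_distribution(counts), within_mcm_fraction(counts))
```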

17 Data Analysis - 2
Private data loads
  - Accessible only to owner process
  - Examples: process’ return stack, local variables
  - Ideal: Remain close to executing processor
Shared data loads
  - Accessible by every TPC-C process
  - Examples: application code, global variables
  - Ideal: Remain in higher levels of memory hierarchy

18 Results: 32-Processor Data
[Charts: Shared, Private]

19 Results: 32-Processor Data
Good Application/Architecture Match
[Charts: Private, Shared]

20 Results: 32-Processor Data
Possible Performance Impediment
[Charts: Shared, Private]

21 Results: 32-Processor Data
Shared Data References More Localized than Private Data References
[Charts: Private, Shared]

22 Results: 32-Processor Data
MEM Data Load Hits Primarily Due To Compulsory Misses
[Charts: Shared, Private]

23 Conclusions - 1
Developed new performance evaluation framework
  - Applicable to large SMP systems
  - Sampled performance monitor event traces: manageable, collected in real time
  - Core: database management system (MySQL), Java tools
Applied methodology to study memory-subsystem behavior
  - TPC-C executing on p690
  - Evaluated differences between private and shared data loads

24 Conclusions - 2
References for private data
  - Satisfied within the MCM
  - Good application/architecture match
References for shared data
  - Referenced outside the MCM
  - Increased locality of reference
  - Target for performance improvement
Main memory accesses primarily associated with compulsory misses

25 Future Work
Quantify representativeness of sampled event traces
Enhance performance evaluation framework
Expand study of application data load behavior, e.g., process characterization
Suggest ways to improve performance of TPC-C executing on p690
  - Improved memory management of Buffer Pool resulting in performance improvements
  - Track performance impediments to actual code and/or data structures

26 Thank You. Questions?

