Published by Willis Hubbard. Modified over 9 years ago.
1
Session 7C, July 9, 2004, ICPADS ’04
A Framework for Profiling Multiprocessor Memory Performance
Diana Villa, Jaime Acosta, Patricia J. Teller
The University of Texas at El Paso, Department of Computer Science
Bret Olszewski
IBM Corporation – Austin, TX
2
Outline
  Motivation
  Data Collection Environment
    Workload & Platform
    Monitored Events
    Sampled Event Traces
  Performance Evaluation Framework
  Data Analysis & Results
  Conclusions and Future Work
3
Motivation
  Modern systems
    Performance governed by memory subsystem
  SMPs
    Deeper and larger memory hierarchies
  Performance analysis considerations
    Time to results and size of data set
  Goal
    Develop a new performance analysis methodology
4
Data Collection Environment
  Workload
    TPC-C benchmark
    Commercial OLTP
  Platform
    IBM eServer pSeries 690 architecture (p690)
    8- and 32-processor configurations
5
Platform
  [Figure: 8-processor p690 configuration, showing processors (P) with L2 and L3 caches on MCM 0 and MCM 1]
6
Platform
  [Figure: 32-processor p690 configuration, showing processors (P) with L2 and L3 caches on MCM 0, MCM 1, MCM 2, and MCM 3]
7
Monitored Events
  L1-cache data-load miss: L2
  L2-cache data-load misses: L2.5, L2.75, L3, L3.5, MEM
8
L2
  [Figure: 8-processor p690 diagram (MCM 0, MCM 1) highlighting the L2 data source]
  Penalty: 12 cycles
9
L2.5
  [Figure: 8-processor p690 diagram (MCM 0, MCM 1) highlighting the L2.5 data source]
  Penalty: 73 cycles
10
L2.75
  [Figure: 8-processor p690 diagram (MCM 0, MCM 1) highlighting the L2.75 data source]
  Penalty: 96 cycles
11
L3
  [Figure: 8-processor p690 diagram (MCM 0, MCM 1) highlighting the L3 data source]
  Penalty: 112 cycles
12
L3.5
  [Figure: 8-processor p690 diagram (MCM 0, MCM 1) highlighting the L3.5 data source]
  Penalty: 143 cycles
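The per-level penalties on the preceding slides can be combined with sample counts to estimate an average load-miss penalty. The following is a minimal sketch, not the authors' tool: the function name `average_penalty` and the example counts are hypothetical, and MEM is omitted because the slides give no penalty for it.

```python
# Load-miss penalties in processor cycles, as reported on the slides.
# The slides give no penalty for MEM, so it is deliberately left out.
PENALTY_CYCLES = {"L2": 12, "L2.5": 73, "L2.75": 96, "L3": 112, "L3.5": 143}


def average_penalty(sample_counts):
    """Return the sample-weighted mean penalty in cycles.

    sample_counts maps a level name (e.g. "L2.5") to a number of samples.
    """
    unknown = set(sample_counts) - set(PENALTY_CYCLES)
    if unknown:
        raise KeyError(f"no penalty known for: {sorted(unknown)}")
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("no samples")
    weighted = sum(PENALTY_CYCLES[lvl] * n for lvl, n in sample_counts.items())
    return weighted / total


# Hypothetical counts for illustration only (not measured data).
example_counts = {"L2.5": 50, "L3": 50}
print(average_penalty(example_counts))
```

Because the counts come from sampled traces, the result is an estimate whose quality depends on how representative the samples are.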
13
Data Collection
  10-minute observation interval
  Performance Monitoring Unit (PMU)
    Special-purpose registers
  Programming interface
    Kernel extension
  eprof
    PMU configuration
    Event-based sampling
14
Sampled Event Traces
  Sampling
    Record periodic occurrences of an event
    100 events/sec/CPU
  Event record
    PID     TID     Timestamp    Effective Instruction Address  Effective Data Address
    372872  184469  0.328104637  000000000000A8C4               0000000000218880
  Average number of samples collected per event
    238,448 for 8-processor data
    212,396 for 32-processor data
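To make the event-record layout concrete, here is a minimal parsing sketch. It assumes each record is a single whitespace-separated text line with the five fields in the order shown on the slide, and that the two addresses are hexadecimal; the `EventRecord` class and `parse_record` function are hypothetical names, not part of eprof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    pid: int
    tid: int
    timestamp: float   # units are not stated on the slide
    instr_addr: int    # effective instruction address
    data_addr: int     # effective data address


def parse_record(line: str) -> EventRecord:
    """Parse one record; assumes five whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    pid, tid, ts, instr, data = fields
    # Addresses are assumed to be hexadecimal without a 0x prefix.
    return EventRecord(int(pid), int(tid), float(ts), int(instr, 16), int(data, 16))


# The sample record shown on the slide.
rec = parse_record("372872 184469 0.328104637 000000000000A8C4 0000000000218880")
print(rec.pid, rec.tid, hex(rec.instr_addr), hex(rec.data_addr))
```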
15
Performance Framework
  [Figure: framework data flow. The data collection environment (TPC-C on p690) produces sampled event traces (PID, TID, Timestamp, Instr. Addr., Data Addr.); a Java database-load tool loads them into the DB; a Java report-generation tool produces reports and graphs.]
  Sample report rows shown in the figure:
    5  BufferPool     56893  29384
    6  Data,BSS,Heap   8799   4855
    1  Kernel         23485   9840
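The load-then-report flow of the framework can be sketched in a few lines. This is only an illustration under stated assumptions: the slides describe Java tools and (per the conclusions) MySQL, whereas this sketch substitutes Python's built-in `sqlite3` with an in-memory database so that it is self-contained. The table name `samples`, its columns, both function names, and the two MEM records are hypothetical.

```python
import sqlite3


def load_samples(conn, event_name, lines):
    """Insert whitespace-separated trace lines for one monitored event."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "event TEXT, pid INTEGER, tid INTEGER, ts REAL, "
        "instr_addr INTEGER, data_addr INTEGER)"
    )
    rows = []
    for line in lines:
        pid, tid, ts, instr, data = line.split()
        # Addresses are assumed hexadecimal; SQLite INTEGER holds values below 2**63.
        rows.append((event_name, int(pid), int(tid), float(ts),
                     int(instr, 16), int(data, 16)))
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def samples_per_event(conn):
    """Report step: number of samples stored for each event."""
    cur = conn.execute(
        "SELECT event, COUNT(*) FROM samples GROUP BY event ORDER BY event"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
# First record is the one shown on the slides; the MEM records are made up.
load_samples(conn, "L3", ["372872 184469 0.328104637 000000000000A8C4 0000000000218880"])
load_samples(conn, "MEM", [
    "100 200 0.5 0000000000001000 0000000000002000",
    "100 201 0.6 0000000000001004 0000000000002040",
])
report = samples_per_event(conn)
print(report)
```

Keeping the traces in a database means each report is a query, so new breakdowns (per process, per address region) do not require re-reading the raw trace files.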
16
Data Analysis - 1
  Overall goal
    Study effectiveness of p690 memory hierarchy
    Characterize differences between private and shared data loads
    Track missing L2-cache lines across levels of the p690 memory hierarchy
  Studied address regions
    Referenced by 90% of L2-cache data-load misses
    Private: Data,BSS,Heap
    Shared: Buffer Pool
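Attributing misses to address regions amounts to mapping each sampled effective data address to a region and computing each region's share of the samples. The sketch below illustrates the idea only: the region boundaries in `REGIONS` are invented for the example and do not reflect the actual address-space layout of the studied system, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical boundaries: (start, end_exclusive, region name, sharing class).
# These ranges are made up for illustration; they are not from the slides.
REGIONS = [
    (0x0000_0000, 0x1000_0000, "Kernel", "other"),
    (0x2000_0000, 0x3000_0000, "Data,BSS,Heap", "private"),
    (0x7000_0000, 0x8000_0000, "Buffer Pool", "shared"),
]


def classify(addr):
    """Return (region name, sharing class) for an effective data address."""
    for start, end, name, sharing in REGIONS:
        if start <= addr < end:
            return name, sharing
    return "Unmapped", "other"


def region_shares(addrs):
    """Fraction of sampled misses that fall in each region."""
    counts = Counter(classify(a)[0] for a in addrs)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical sampled data addresses.
sample_addrs = [0x2000_0010, 0x7000_0000, 0x7000_0040, 0x0000_1000]
shares = region_shares(sample_addrs)
print(shares)
```

Summing the shares of the private and shared regions gives the kind of coverage figure quoted on the slide (the fraction of L2-cache data-load misses that reference the studied regions).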
17
Data Analysis - 2
  Private data loads
    Accessible only to owner process
    Examples: process's return stack, local variables
    Ideal: Remain close to executing processor
  Shared data loads
    Accessible by every TPC-C process
    Examples: application code, global variables
    Ideal: Remain in higher levels of memory hierarchy
18
Results: 32-Processor Data
  [Figure: panels Shared and Private]
19
Results: 32-Processor Data
  Good Application/Architecture Match
  [Figure: panels Private and Shared]
20
Results: 32-Processor Data
  Possible Performance Impediment
  [Figure: panels Shared and Private]
21
Results: 32-Processor Data
  Shared Data References More Localized than Private Data References
  [Figure: panels Private and Shared]
22
Results: 32-Processor Data
  MEM Data Load Hits Primarily Due To Compulsory Misses
  [Figure: panels Shared and Private]
23
Conclusions - 1
  Developed new performance evaluation framework
    Applicable to large SMP systems
    Sampled performance monitor event traces
      Manageable, collected in real time
    Core
      Database management system (MySQL), Java tools
  Applied methodology to study memory-subsystem behavior
    TPC-C executing on p690
    Evaluated differences between private and shared data loads
24
Conclusions - 2
  References for private data
    Satisfied within the MCM
    Good application/architecture match
  References for shared data
    Referenced outside the MCM
    Increased locality of reference
    Target for performance improvement
  Main memory accesses primarily associated with compulsory misses
25
Future Work
  Quantify representativeness of sampled event traces
  Enhance performance evaluation framework
  Expand study of application data load behavior
    e.g., process characterization
  Suggest ways to improve performance of TPC-C executing on p690
    Improved memory management of Buffer Pool resulting in performance improvements
    Track performance impediments to actual code and/or data structures
26
Thank You. Questions?