DISSERTATION RESEARCH PLAN
Mitesh Meswani
Outline
- Dissertation Research Update
- Previous Approach and Results
- Modified Research Plan
- Identifying Resources
- Identifying Signatures
- Performance Counters for Profiling
- Representative Tracing and Validation
Previous Methodology
- Trace selection: trace the steady-state execution of the benchmark suite, using CPI to measure representativeness; one trace per benchmark.
- Simulate the traces for different SMT knob settings, recording the best setting for each pair.
- Use regression modeling techniques to generate an analytical model that predicts the best settings for a pair.
- Demonstrate model effectiveness by predicting settings for traces from other benchmarks.
Recap of Previous Results
- Models using decision trees for SPEC CPU2000 and STREAM:
  - Prediction of SMT mode: 97.5%
  - Prediction of SMT thread priority: 83%
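As a sketch of how such a model can be trained, the following is a minimal, hypothetical example using a scikit-learn decision tree; the feature names, training data, and SMT mode labels are illustrative placeholders, not the dissertation's actual inputs.

```python
# Hypothetical sketch: train a decision tree to predict the best SMT mode
# for a trace from profile-derived features. All names and values below
# are illustrative assumptions, not the dissertation's actual data.
from sklearn.tree import DecisionTreeClassifier

# Each row summarizes one trace, e.g. [CPI, L2_share, FPU_share, FXU_share].
X_train = [
    [1.2, 0.30, 0.10, 0.05],
    [0.8, 0.05, 0.40, 0.20],
    [2.1, 0.50, 0.02, 0.10],
]
y_train = ["SMT", "ST", "SMT"]  # best mode found by simulating each trace

clf = DecisionTreeClassifier(max_depth=4)  # shallow tree to limit overfitting
clf.fit(X_train, y_train)

# Predict the best mode for an unseen trace.
print(clf.predict([[1.0, 0.25, 0.15, 0.08]]))
```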
Modified Plan Summary
- Represent a benchmark's use of the relevant shared resources.
- Identify signatures of shared-resource usage within benchmarks using performance counters.
- Use traces that represent the signatures of shared-resource usage and cover 80% of the benchmark's execution.
- Finally, identify the best SMT knob settings for the representative traces.
Shared Resources
- Shared resources (seven): TLB, cache memory (L2, L3), branch unit, FP unit, FXU, compare-register unit, branch prediction hardware (history table).
- How many resources to consider? Analyze the current traces to eliminate resources that contribute less than a threshold value to cycles spent in shared resources.
  - The compare-register unit is not significant.
  - The branch unit is also not significant.
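A minimal sketch of this elimination step, assuming per-resource cycle counts gathered from the traces; the resource counts and the 2% threshold are illustrative assumptions, not values from the dissertation.

```python
# Hypothetical sketch: drop shared resources whose share of shared-resource
# cycles falls below a threshold. Counts and threshold are illustrative.
shared_cycles = {
    "TLB": 1200, "L2": 5400, "L3": 2100, "Branch": 150,
    "FPU": 3000, "FXU": 2600, "CmpReg": 90,
}
THRESHOLD = 0.02  # assumed cutoff: 2% of cycles spent in shared resources

total = sum(shared_cycles.values())
significant = {r: c for r, c in shared_cycles.items() if c / total >= THRESHOLD}
print(sorted(significant))  # for this data, Branch and CmpReg are dropped
```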
Signatures
- How many? A resource may have a mild, moderate, or high contribution to the cycles spent in shared resources.
- Idea: with five resources, equal contribution would mean 100/5 = 20% of cycles per resource; using this as the basis:
  - Mild: 1% to 15%
  - Moderate: 16% to 24%
  - High: greater than 24%
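The bucketing follows directly from the thresholds above; in this minimal sketch, the "negligible" label for contributions under 1% is an added assumption, as are the example shares.

```python
# Sketch of the proposed bucketing (mild 1-15%, moderate 16-24%, high >24%
# of shared-resource cycles). The sub-1% "negligible" label is assumed.
def classify_share(percent: float) -> str:
    """Map a resource's percentage of shared-resource cycles to a level."""
    if percent > 24:
        return "high"
    if percent >= 16:
        return "moderate"
    if percent >= 1:
        return "mild"
    return "negligible"

# A benchmark's signature is the tuple of levels across the five resources.
shares = {"TLB": 8.0, "L2": 30.0, "L3": 18.0, "FPU": 22.0, "FXU": 22.0}
signature = tuple(classify_share(p) for p in shares.values())
print(signature)  # ('mild', 'high', 'moderate', 'moderate', 'moderate')
```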
Finding Signatures
- Profile the benchmark execution to find the cycles spent in the monitored shared resources.
- Using performance counters, sample the counters periodically.
- Categorize the benchmark execution (SPEC CPU2000) into one of the possible permutations.
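A hypothetical sketch of this categorization step: each sampling period's counter values are converted to percentage shares and bucketed, yielding one signature per interval. The sample data is fabricated for illustration.

```python
# Hypothetical sketch: turn periodic counter samples into per-interval
# signatures. Thresholds match the previous slide; data is fabricated.
RESOURCES = ["TLB", "L2", "L3", "FPU", "FXU"]

def classify_share(percent):
    if percent > 24:
        return "high"
    if percent >= 16:
        return "moderate"
    return "mild" if percent >= 1 else "negligible"

samples = [  # one entry per sampling period: cycles attributed to each resource
    {"TLB": 100, "L2": 500, "L3": 200, "FPU": 300, "FXU": 250},
    {"TLB": 50, "L2": 700, "L3": 150, "FPU": 100, "FXU": 400},
]

def interval_signature(sample):
    total = sum(sample.values())
    return tuple(classify_share(100.0 * sample[r] / total) for r in RESOURCES)

for s in samples:
    print(interval_signature(s))
```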
Finding Signatures, Continued
- Profiling benchmark execution: only six counters are allowed per execution.
- What are the counts for a sample period?
  - Merge them from different executions?
  - Use the highest sampling rate?
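One way the "merge them from different executions" option could work is sketched below, under the assumption that the runs are deterministic enough for sample indices to line up across executions; the counter groups and values are illustrative.

```python
# Hypothetical sketch: with only six hardware counters per run, profile the
# benchmark several times with different counter groups and align samples
# by period index. Assumes runs are deterministic enough to line up.
run_a = [{"L2": 500, "L3": 200}, {"L2": 700, "L3": 150}]      # counter group A
run_b = [{"FPU": 300, "FXU": 250}, {"FPU": 100, "FXU": 400}]  # counter group B

merged = [{**a, **b} for a, b in zip(run_a, run_b)]
print(merged[0])  # {'L2': 500, 'L3': 200, 'FPU': 300, 'FXU': 250}
```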
Perf Counters to Collect Data
- Identified counters:
  - FP: completion stalls due to FPU (CMPLU_STALLS_FPU)
  - FXU: completion stalls due to FXU (CMPLU_STALLS_FXU)
- Derived counters:
  - LSU stalls = total stalls in LSU - stalls due to d-cache misses - stalls due to d-TLB misses
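The derived counter is simple arithmetic over the raw counts; in this sketch the raw values are illustrative stand-ins for the events named above.

```python
# Sketch of the derived LSU-stall counter; raw values are illustrative.
total_lsu_stalls = 10_000
dcache_miss_stalls = 4_200
dtlb_miss_stalls = 800

# LSU stalls not already explained by d-cache or d-TLB misses.
lsu_stalls = total_lsu_stalls - dcache_miss_stalls - dtlb_miss_stalls
print(lsu_stalls)  # 5000
```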
Perf Counters to Collect Data, Continued
- Unsolved:
  - TLB: total d-TLB misses and total i-TLB misses are known, but miss resolution sites are not. Total cycles spent accessing the d-TLB are known, but include the cost of both hits and misses.
  - Caches: L2 and L3 hits for data and instructions are known, but may overstate the actual penalty, since execution overlaps misses and some misses occur down mispredicted branches. A possible alternative is the d-cache and i-cache miss penalties on POWER5, which are counted only when completion is stalled.
  - Branch history: affects prediction; a counter is available to count cycles in which a misprediction stalls completion.
Representative Traces
- Collect traces, if required, that represent the signatures found in benchmark profiling.
- Use the performance data from simulating single traces to verify the signatures.
- Collect data for evaluating SMT knobs on the representative traces.
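One plausible way to pick representative traces, sketched under the assumption that each signature's share of execution intervals is known from profiling: greedily take signatures by share until the 80% coverage target is met. The signatures and shares below are fabricated.

```python
# Hypothetical sketch: greedily choose signatures by execution share until
# at least 80% of the benchmark's execution is covered.
signature_share = {  # fraction of execution intervals with each signature
    ("high", "mild"): 0.45,
    ("moderate", "moderate"): 0.30,
    ("mild", "high"): 0.15,
    ("mild", "mild"): 0.10,
}
TARGET = 0.80

covered, chosen = 0.0, []
for sig, share in sorted(signature_share.items(), key=lambda kv: -kv[1]):
    if covered >= TARGET:
        break
    chosen.append(sig)
    covered += share

# The first two signatures cover only 0.75, so a third is needed.
print(chosen, covered)
```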
Validation
- Use scientific applications to verify that they are covered by the signatures for 80% of their execution.
- TO DO: identify test applications.
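The coverage check itself could look like the following sketch: count the fraction of the application's sampled intervals whose signature is already in the known set. The known set and the interval stream are placeholders.

```python
# Hypothetical sketch: what fraction of a new application's sampled
# intervals fall into already-known signatures?
known_signatures = {("high", "mild"), ("moderate", "moderate"), ("mild", "high")}

app_intervals = [  # per-interval signatures observed while profiling the app
    ("high", "mild"), ("high", "mild"), ("moderate", "moderate"),
    ("mild", "mild"), ("mild", "high"),
]

coverage = sum(sig in known_signatures for sig in app_intervals) / len(app_intervals)
print(f"coverage = {coverage:.0%}")  # 80% here, meeting the 80% criterion
```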