1
On the Importance of Optimizing the Configuration of Stream Prefetchers
Ilya Ganusov, Martin Burtscher
Computer Systems Laboratory, Cornell University
2
June 12, 2005 · MSP 2005

Introduction
- Memory wall
  - Increasing gap between processor and memory speeds
  - Concentration on bandwidth at the expense of latency
- Prefetch important data
  - Do not wait until the processor requests data
  - Proactively fetch the data that is likely to be consumed in the near future
3
Stream Prefetching
- Prefetching with outcome-based prediction
  - Use the history of previous misses to guess data addresses that are likely to miss soon
- Stream prefetching
  - A special case of outcome-based prediction
  - Proposed 15 years ago
  - The only hardware prefetching scheme used in modern microprocessors
4
Contributions
- Detailed sensitivity analysis of the main prefetcher parameters on SPECcpu2000 programs
  - No such study exists in the literature
  - Many research papers fail to specify prefetcher parameters in comparative studies
- Case study
  - Evaluate the performance of runahead execution on baselines with different stream prefetcher parameters
5
Outline
- Introduction
- Stream Prefetcher Operation
- Evaluation Methodology
- Experimental Results
- Conclusion
6
How Stream Prefetchers Work
[Diagram: a global miss history feeds a stream table whose entries hold (valid, stream address, stride). Each incoming miss address is checked against the table; if it extends an existing stream, the AGU computes the prefetch address as addr + stride * lookahead and issues the prefetch.]
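The diagram above can be sketched behaviorally. This is a minimal toy model, not the exact hardware from the talk: the class name, the allocation heuristic (two prior misses with a matching stride), and the eviction policy are illustrative assumptions.

```python
class StreamPrefetcher:
    """Toy behavioral model: global miss history + stream table + AGU."""

    def __init__(self, num_streams=8, history_len=16, lookahead=4):
        self.num_streams = num_streams   # stream table entries
        self.history_len = history_len   # global miss history length
        self.lookahead = lookahead       # prefetch distance, in strides
        self.history = []                # recent miss addresses, newest last
        self.streams = []                # tracked streams: {'addr', 'stride'}

    def on_miss(self, addr):
        """Process one cache miss; return a prefetch address or None."""
        # 1. Does the miss extend a tracked stream? If so, advance it and
        #    let the AGU compute addr + stride * lookahead.
        for s in self.streams:
            if addr == s['addr'] + s['stride']:
                s['addr'] = addr
                return addr + s['stride'] * self.lookahead
        # 2. Otherwise, try to allocate a new stream: look for two earlier
        #    misses in the history that form the same stride as (prev, addr).
        for prev in self.history:
            stride = addr - prev
            if stride != 0 and any(prev - p == stride for p in self.history):
                if len(self.streams) >= self.num_streams:
                    self.streams.pop(0)          # evict the oldest stream
                self.streams.append({'addr': addr, 'stride': stride})
                break
        self.history.append(addr)
        self.history = self.history[-self.history_len:]
        return None                              # no prefetch issued
```

For a miss sequence 0, 64, 128, 192 the third miss allocates a stride-64 stream, and the fourth issues a prefetch 4 strides ahead (address 448 with lookahead = 4).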
7
Measured Parameters
[Diagram: the same stream table structure, annotated with the three parameters under study: the miss history length, the number of supported streams (stream table entries), and the prefetch distance (the lookahead in addr + stride * lookahead).]
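A sensitivity study over these three parameters amounts to a sweep of their cross product. The harness below is a hypothetical sketch: the value ranges are illustrative, and `run_simulation` stands in for a cycle-accurate simulator run (the talk uses SimpleScalar) rather than any real API.

```python
from itertools import product

def sweep(run_simulation):
    """Sweep the three measured prefetcher parameters and return the
    best-performing configuration plus the full result map."""
    results = {}
    for history_len, num_streams, distance in product(
            (4, 8, 16, 32),      # miss history length
            (2, 4, 8, 16),       # number of supported streams
            (1, 4, 16, 64)):     # prefetch distance (strides ahead)
        ipc = run_simulation(history_len=history_len,
                             num_streams=num_streams,
                             distance=distance)
        results[(history_len, num_streams, distance)] = ipc
    best = max(results, key=results.get)
    return best, results
```

Plugging in a stub for `run_simulation` exercises all 4 x 4 x 4 = 64 configurations; with a real simulator each point is one full run.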
8
Evaluation Methodology
- Benchmarks
  - 22 SPECcpu2000 programs, highly optimized
  - All F77, C, and C++ programs
  - Multiple reference inputs per program
  - SimPoint interval of 500 million instructions
- Simulated architecture
  - SimpleScalar v4.0 cycle-accurate simulator
  - Aggressive superscalar Alpha 21264-like core
9
Simulated System

Execution core:
  Fetch/issue/commit:    4/4/4
  I-window/ROB/LSQ:      64/128/64
  LdSt/Int/FP units:     2/4/2
  Execution latencies:   similar to Alpha 21264
  Branch predictor:      16K-entry bimodal/gshare hybrid

Memory subsystem:
  Cache sizes:           64KB IL1, 64KB DL1, 1MB L2
  Cache associativity:   2-way L1, 4-way L2
  Cache latencies:       2 cyc L1, 20 cyc L2
  Main memory latency:   400 cycles
10
Outline
- Introduction
- Motivation
- Implementation
- Experimental Results
- Conclusion
11
Miss History Length
- 7 programs are very sensitive
- A 16-entry history is enough
12
Number of Stream Table Entries
- Only 3 programs are sensitive
- More than 8 streams provides little benefit
13
L2 Cache Prefetch Distance
- 11 programs are very sensitive
- FP speedup varies between 80% and 140%
14
Case Study: Runahead Execution
- The performance of stream prefetching is highly dependent on parameter choice
- Another proposal: runahead execution
  - Pseudo-retire long-latency loads that stall the pipeline and continue executing
  - Roll back to the checkpoint after the load returns from memory
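The checkpoint/runahead/rollback cycle described above can be sketched as a toy cache model. This is an illustrative simplification under stated assumptions: the trace format, function names, and the unbounded runahead window are all inventions for the example, not details from the talk.

```python
def run_with_runahead(trace, cache, prefetch):
    """Toy runahead model.

    trace:    list of ('load', addr) operations
    cache:    set of currently cached addresses
    prefetch: callback that installs an address into the cache
    """
    i = 0
    while i < len(trace):
        op, addr = trace[i]
        if op == 'load' and addr not in cache:
            checkpoint = i                     # checkpoint at the stalled load
            # Runahead: keep executing past the miss. Results are invalid,
            # but loads that miss under runahead become prefetches that
            # overlap with the outstanding memory access.
            for op2, addr2 in trace[i + 1:]:
                if op2 == 'load' and addr2 not in cache:
                    prefetch(addr2)
            cache.add(addr)                    # the original miss returns
            i = checkpoint                     # roll back and re-execute
        i += 1
```

In a trace where load 64 misses, the model runs ahead, turns the later miss to 128 into a prefetch, then rolls back; on re-execution both loads hit. This overlap of independent misses is the source of runahead's speedup.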
15
Speedup over Stream Prefetching
- SPEC fp speedup drops by more than 2x
16
Conclusion
- Key observations
  - The performance of the stream prefetcher is highly dependent on its configuration
  - Varying the prefetch distance alone almost doubles the average performance benefit
  - Choosing a non-optimal stream prefetcher as a baseline can distort results by a factor of two
- Conclusion
  - Parameter optimizations are imperative when comparing stream prefetchers to other prefetching techniques
17