Download presentation
Presentation is loading. Please wait.
Published byHillary Robertson Modified over 9 years ago
1
Electrical and Computer Engineering University of Wisconsin - Madison Prefetching Using a Global History Buffer Kyle J. Nesbit and James E. Smith
2
February 20042/19 Outline Motivation Related Work Global History Buffer Prefetching Results Conclusion
3
February 20043/19 Motivation D-Cache misses to main memory are of increasing importance –Main memory is getting farther away (in clock cycles) –Many demanding, memory intensive workloads Computation is inexpensive compared to data accesses –Good opportunity to reevaluate prefetching data structures –Simple computation can supplement table information We consider prefetches from main memory to lowest level cache (L2 cache in this study)
4
February 20044/19 Markov Prefetching Markov prefetching forms address correlations –Joseph and Grunwald (ISCA ‘97) Uses global memory addresses as states in the Markov graph Correlation Table approximates Markov graph B C B A B C Correlation Table 1st predict.2nd predict. miss address A B C A B C B C... AB C 1.5 Miss Address Stream 1.5 Markov Graph A
5
February 20045/19 Correlation Prefetching Distance Prefetching forms delta correlations –Kandiraju and Sivasubramaniam (ISCA ‘02) Delta-based prefetching leads to much smaller table than “classical” Markov Prefetching Delta-based prefetching can remove compulsory misses Markov Prefetching 1 1 -2 1 1 -1 1 Global Delta Stream Distance Prefetching 27 28 29 27 28 29 28 29 Miss Address Stream 1 1 -2 1 global delta 28 29 2829 27 28 29 1st predict.2nd predict. miss address 1st predict.2nd predict.
6
February 20046/19 Global History Buffer (GHB) Holds miss address history in FIFO order Linked lists within GHB connect related addresses –Same static load –Same global miss address –Same global delta Global History Buffer miss addresses Index Table FI Load PC Linked list walk is short compared with L2 miss latency FO
7
February 20047/19 Miss Address Stream Global History Buffer miss addresspointer Index Table 28 29 head pointer 28 27 27 28 29 27 28 29 28 27 GHB - Example => Current => Prefetches Key 28 29 28 29 Global Miss Address
8
February 20048/19 GHB – Deltas 148 1 8 8 1 4 4 1 8 8 Global Delta Stream Miss Address Stream 27 28 36 44 45 49 53 54 62 70 71 1 18 => Current => Prefetches Key 844 WidthDepthHybrid Markov Graph.3.7 71 + 8 => 79 79 + 8 => 87 Prefetches 71 + 4 => 75 79 + 4 => 79 Prefetches 71 + 8 => 79 71 + 4 => 75 Prefetches
9
February 20049/19 GHB – Hybrid Delta Width prefetching suffers from poor accuracy and short look- ahead Depth prefetching has good look-ahead, but may miss prefetch opportunities when a number of “next” addresses have similar probability The hybrid method combines depth and width
10
February 200410/19 79 + 4 => 79 71 + 4 => 75 Global History Buffer miss addresspointer Index Table head pointer 27 28 36 44 45 49 53 1 GHB - Hybrid Example 1 => Current => Prefetches Key 54 62 70 4 8 8 8 Global Delta 1 8 8 1 4 4 1 8 8 Global Delta Stream Miss Address Stream 27 28 36 44 45 49 53 54 62 70 71 1 8 4 4 8 71 + 8 => 79 79 + 8 => 87 Prefetches
11
February 200411/19 Simulation Methodology Simulated SPEC CPU2000 benchmarks Fast forwarded 1 billion instructions and simulated 1 billion instructions Used peak binaries compiled -O4 optimization Results include all benchmarks that have at least a 5% IPC improvement with an ideal L2 cache Issue Width4 Instructions Load Store Queue 64 Entries RUU Size128 Entries Level 1 D-Cache16 KB, 2-way Level 1 I-Cache16 KB, 2-way Level 2 Cache512 KB, 4-way Memory Latency140 Cycles
12
February 200412/19 Simulation Methodology Table walk - one cycle per access IT size reduces table conflicts GHB size reflects prefetch history working set In general, the GHB prefetching requires less history Prefetching MethodTable ConfigurationSize Conventional Distance Prefetching512 Table Entries18 KB GHB Distance Prefetching512 IT Entries & 512 GHB Entries 8 KB
13
February 200413/19 Results Our results compare: –IPC Improvement (harmonic mean) vs. Prefetch Degree –Increase in Memory Traffic per instruction (arithmetic mean) vs. Prefetch Degree –Prefetch Accuracy – The percent of prefetches that are used by the program
14
February 200414/19 Distance Prefetching (Performance) 5% 15% 25% 35% 124816 Prefetch Degree Table (width) GHB (width) GHB (depth) GHB (hybrid) IPC Improvement
15
February 200415/19 Distance Prefetching (Performance) -10% 10% 30% 50% 70% 90% 110% ammp art wupwise swim lucas mgrid applu galgel apsi mcf twolf vpr parser gap bzip2 hmean Table (width) GHB (width) GHB (depth) GHB (hybrid) IPC Improvement (~300%)
16
February 200416/19 Distance Prefetching (Memory Traffic) 0% 30% 60% 90% 120% 150% 180% 124816 Prefetch Degree Table (width) GHB (width) GHB (depth) GHB (hybrid) Increase in Memory Traffic
17
February 200417/19
18
February 200418/19 Distance Prefetching (Memory Traffic) 0% 30% 60% 90% 120% 150% 180% 124816 Prefetch Degree Table (width) GHB (width) GHB (depth) GHB (hybrid) Increase in Memory Traffic
19
February 200419/19 Conclusions More complete picture of history –Allows width, depth, and hybrid –Also can improve other prefetching methods (covered in depth in the paper) Eliminates stale history in a natural way –FIFO discards old history to make room for new history –In a conventional table, old history can remain for a very long time and trigger inaccurate prefetches
20
February 200420/19 Acknowledgements This research was funded by: –An Intel Undergraduate Research scholarship. –A University of Wisconsin Hilldale Undergraduate Research fellowship. –The National Science Foundation under grants CCR-0311361 and EIA-0071924.
21
February 200421/19 Backup Slides
22
February 200422/19 Prefetching Metrics Accuracy is the percent of prefetches that are actually used. Coverage is the percent of memory references prefetched rather than demand fetched. Timeliness indicates if prefetched data arrives early enough to prevent the processor from stalling.
23
February 200423/19 GHB – Deltas 148 1 8 8 1 4 4 1 8 8 Global Delta Stream Miss Address Stream 27 28 36 44 45 49 53 54 62 70 71 1 18 => Current => Prefetches Key 844 Markov Graph.3.7 11
24
February 200424/19 Prefetch Taxonomy To simplify the discussion and illustrate the relation between prefetching methods we introduce a consistent naming convention. Each name is a X/Y pair. –X is the key used for localizing the address stream. –Y is the method for detecting address patterns.
25
February 200425/19 Prefetch Taxonomy We study two localizing methods –No localization or global (G) –Program Counter (PC) And three pattern detection methods –Address Correlation –Delta Correlation –Constant Stride
26
February 200426/19 Prefetch Taxonomy Markov Prefetching - G/AC Distance Prefetching - G/DC Stride Prefetching - PC/CS
27
February 200427/19 Stride Prefetching Table tracks the local history of loads. If a constant stride is detected in a load’s local history, then n + s, n + 2s, …, n + ds are prefetched. – n is the current target address – s is the detected stride – d is the prefetch degree or aggressiveness of the prefetching.
28
February 200428/19 Stride Prefetching TagLast AddressStrideState Reference Prediction Table PC of Load Target Address sub add Prefetch Address
29
February 200429/19 GHB – Stride Prefetching GHB-Stride uses the PC to access the index table. The linked lists contain the local history of each load. Compare the last two local strides. If the same then prefetch n + s, n + 2s, …, n + ds. Global History Buffer miss addresspointer Index Table head pointer A B C A B C B 1 C 1 PC =?
30
February 200430/19 GHB – Local Delta Correlation Form delta correlations within each load’s local history. For example, consider the local miss address stream: Addresses012646566128129 Deltas 116211 1 CorrelationPrefetch Predictions (1,1)6211 (1,62)1162 (62, 1)1621
31
February 200431/19
32
February 200432/19
33
February 200433/19
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.