
Electrical and Computer Engineering University of Wisconsin - Madison Prefetching Using a Global History Buffer Kyle J. Nesbit and James E. Smith.


1 Electrical and Computer Engineering University of Wisconsin - Madison Prefetching Using a Global History Buffer Kyle J. Nesbit and James E. Smith

2 Outline (February 2004)
• Motivation
• Related Work
• Global History Buffer Prefetching
• Results
• Conclusion

3 Motivation
• D-Cache misses to main memory are of increasing importance
  – Main memory is getting farther away (in clock cycles)
  – Many demanding, memory-intensive workloads
• Computation is inexpensive compared to data accesses
  – Good opportunity to reevaluate prefetching data structures
  – Simple computation can supplement table information
• We consider prefetches from main memory to the lowest-level cache (the L2 cache in this study)

4 Markov Prefetching
• Markov prefetching forms address correlations
  – Joseph and Grunwald (ISCA '97)
• Uses global memory addresses as states in the Markov graph
• Correlation Table approximates Markov graph
Miss Address Stream: A B C A B C B C ...
[Figure: Markov graph over states A, B, and C with edge probabilities 1 and .5, next to a Correlation Table with columns miss address, 1st predict., and 2nd predict.]
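The correlation table on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `CorrelationTable`, the most-recent-first ordering of predictions, and the unbounded dictionary standing in for a fixed-size hardware table are all hypothetical choices, not details taken from the slides.

```python
class CorrelationTable:
    """Illustrative Markov-style correlation table (names are hypothetical)."""

    def __init__(self, predictions_per_entry=2):
        self.n = predictions_per_entry
        self.table = {}        # miss address -> successors, most recent first
        self.last_miss = None  # previous miss address in the global stream

    def record_miss(self, address):
        """Record a miss, then return the predicted next miss addresses."""
        if self.last_miss is not None:
            successors = self.table.setdefault(self.last_miss, [])
            if address in successors:
                successors.remove(address)
            successors.insert(0, address)   # assumed most-recent-first order
            del successors[self.n:]         # keep at most n predictions
        self.last_miss = address
        return list(self.table.get(address, []))


table = CorrelationTable()
for miss in "ABCABCBC":
    predictions = table.record_miss(miss)
```

After the stream A B C A B C B C, the entry for C holds two successors (B and A), while A and B each hold one, which mirrors the single-successor and two-successor states of the Markov graph.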

5 Correlation Prefetching
• Distance Prefetching forms delta correlations
  – Kandiraju and Sivasubramaniam (ISCA '02)
• Delta-based prefetching leads to a much smaller table than "classical" Markov Prefetching
• Delta-based prefetching can remove compulsory misses
Miss Address Stream: 27 28 29 27 28 29 28 29
Global Delta Stream: 1 1 -2 1 1 -1 1
[Figure: Markov Prefetching table (miss address, 1st predict., 2nd predict.) compared with the Distance Prefetching table (global delta, 1st predict., 2nd predict.).]
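A minimal sketch of delta correlation, using the miss address stream from this slide. The function names `build_distance_table` and `predict`, the most-recent-first ordering, and the two-prediction limit are assumptions made for illustration only.

```python
def build_distance_table(miss_addresses, predictions_per_entry=2):
    """Map each global delta to the deltas that most recently followed it."""
    table = {}
    prev_addr = None
    prev_delta = None
    for addr in miss_addresses:
        if prev_addr is not None:
            delta = addr - prev_addr
            if prev_delta is not None:
                successors = table.setdefault(prev_delta, [])
                if delta in successors:
                    successors.remove(delta)
                successors.insert(0, delta)          # assumed MRU order
                del successors[predictions_per_entry:]
            prev_delta = delta
        prev_addr = addr
    return table


def predict(table, current_address, current_delta):
    """Turn predicted deltas into prefetch addresses."""
    return [current_address + d for d in table.get(current_delta, [])]


stream = [27, 28, 29, 27, 28, 29, 28, 29]
distance_table = build_distance_table(stream)
prefetches = predict(distance_table, current_address=29, current_delta=1)
```

With this stream the sketch predicts addresses 28 and 30 after the final miss. Address 30 never appeared in the stream, which is how a delta-based scheme can cover a compulsory miss that an address-keyed table cannot.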

6 Global History Buffer (GHB)
• Holds miss address history in FIFO order
• Linked lists within GHB connect related addresses
  – Same static load
  – Same global miss address
  – Same global delta
• Linked list walk is short compared with L2 miss latency
[Figure: Index Table, indexed by load PC, pointing into the Global History Buffer of miss addresses, which is managed as a FIFO.]
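The structure described above can be sketched as a circular buffer plus an index table. This is a simplified software model under stated assumptions: the class and method names are hypothetical, sequence numbers stand in for hardware pointers, and the index table is an unbounded dictionary rather than a fixed-size table.

```python
class GlobalHistoryBuffer:
    """Illustrative GHB: FIFO of misses with per-key linked lists."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size  # slot: (miss_address, previous_seq)
        self.next_seq = 0             # sequence number of the next insertion
        self.index_table = {}         # key -> seq of most recent entry

    def _is_live(self, seq):
        # An entry is live until the FIFO overwrites its slot.
        return seq is not None and self.next_seq - self.size <= seq < self.next_seq

    def insert(self, key, miss_address):
        previous = self.index_table.get(key)
        self.entries[self.next_seq % self.size] = (miss_address, previous)
        self.index_table[key] = self.next_seq
        self.next_seq += 1

    def chain(self, key):
        """Walk the linked list for a key, newest miss first."""
        history = []
        seq = self.index_table.get(key)
        while self._is_live(seq):
            address, seq = self.entries[seq % self.size]
            history.append(address)
        return history


ghb = GlobalHistoryBuffer(size=4)
for key, address in [("pcA", 100), ("pcB", 200), ("pcA", 104), ("pcA", 108)]:
    ghb.insert(key, address)
before = ghb.chain("pcA")   # full local history of pcA
ghb.insert("pcB", 204)      # overwrites the oldest slot (address 100)
after = ghb.chain("pcA")    # the stale tail of the list is dropped
```

The key can be a load PC, a miss address, or a delta, matching the three kinds of linked list named on the slide. The walk stops at the first pointer that refers to an overwritten slot, so old history falls out of the lists without any explicit invalidation.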

7 GHB - Example
[Figure: GHB keyed by global miss address, showing the miss address stream (addresses 27, 28, and 29), the Index Table, and the Global History Buffer with miss address and pointer fields and a head pointer. The key marks the current miss and the resulting prefetches.]

8 GHB – Deltas
Miss Address Stream: 27 28 36 44 45 49 53 54 62 70 71
Global Delta Stream: 1 8 8 1 4 4 1 8 8 1
[Figure: Index Table keyed by global delta (1, 4, 8) and the Markov graph of deltas with edge probabilities .3 and .7. The key marks the current miss and the resulting prefetches.]
Prefetches by method:
  Width:  71 + 8 => 79, 71 + 4 => 75
  Depth:  71 + 8 => 79, 79 + 8 => 87
  Hybrid: 71 + 8 => 79, 79 + 8 => 87, 71 + 4 => 75, 75 + 4 => 79

9 GHB – Hybrid Delta
• Width prefetching suffers from poor accuracy and short look-ahead
• Depth prefetching has good look-ahead, but may miss prefetch opportunities when a number of "next" addresses have similar probability
• The hybrid method combines depth and width
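Width, depth, and hybrid prefetching can be compared with one small function operating on a plain list of miss addresses. This is a sketch under assumptions: the function name `delta_prefetch` and its `width` and `depth` parameters are hypothetical, and a list scan replaces the GHB linked-list walk.

```python
def delta_prefetch(miss_addresses, width, depth):
    """Prefetch addresses from earlier occurrences of the current delta.

    width: how many earlier occurrences to visit, most recent first.
    depth: how many following deltas to apply per occurrence.
    """
    deltas = [b - a for a, b in zip(miss_addresses, miss_addresses[1:])]
    if not deltas:
        return []
    current_delta = deltas[-1]
    # Earlier positions of the current delta, most recent first.
    occurrences = [i for i in range(len(deltas) - 2, -1, -1)
                   if deltas[i] == current_delta]
    prefetches = []
    for i in occurrences[:width]:
        address = miss_addresses[-1]
        for d in deltas[i + 1:i + 1 + depth]:
            address += d
            prefetches.append(address)
    return prefetches


stream = [27, 28, 36, 44, 45, 49, 53, 54, 62, 70, 71]
width_only = delta_prefetch(stream, width=2, depth=1)
depth_only = delta_prefetch(stream, width=1, depth=2)
hybrid = delta_prefetch(stream, width=2, depth=2)
```

On the slides' example stream, width-only yields 79 and 75, depth-only yields 79 and 87, and the hybrid setting yields 79, 87, 75, and 79. The repeated 79 shows that a real implementation would likely filter duplicate prefetches; that filtering is left out here.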

10 GHB - Hybrid Example
Miss Address Stream: 27 28 36 44 45 49 53 54 62 70 71
Global Delta Stream: 1 8 8 1 4 4 1 8 8 1
[Figure: Index Table keyed by global delta and the Global History Buffer with miss address and pointer fields and a head pointer. The key marks the current miss and the resulting prefetches.]
Prefetches: 71 + 8 => 79, 79 + 8 => 87, 71 + 4 => 75, 75 + 4 => 79

11 Simulation Methodology
• Simulated SPEC CPU2000 benchmarks
• Fast-forwarded 1 billion instructions and simulated 1 billion instructions
• Used peak binaries compiled with -O4 optimization
• Results include all benchmarks that have at least a 5% IPC improvement with an ideal L2 cache

  Parameter          Value
  Issue Width        4 Instructions
  Load Store Queue   64 Entries
  RUU Size           128 Entries
  Level 1 D-Cache    16 KB, 2-way
  Level 1 I-Cache    16 KB, 2-way
  Level 2 Cache      512 KB, 4-way
  Memory Latency     140 Cycles

12 Simulation Methodology
• Table walk: one cycle per access
• IT size reduces table conflicts
• GHB size reflects prefetch history working set
• In general, GHB prefetching requires less history

  Prefetching Method                   Table Configuration                 Size
  Conventional Distance Prefetching    512 Table Entries                   18 KB
  GHB Distance Prefetching             512 IT Entries & 512 GHB Entries    8 KB

13 Results
• Our results compare:
  – IPC Improvement (harmonic mean) vs. Prefetch Degree
  – Increase in Memory Traffic per instruction (arithmetic mean) vs. Prefetch Degree
  – Prefetch Accuracy: the percent of prefetches that are used by the program

14 Distance Prefetching (Performance)
[Figure: IPC Improvement vs. Prefetch Degree (1, 2, 4, 8, 16) for Table (width), GHB (width), GHB (depth), and GHB (hybrid).]

15 Distance Prefetching (Performance)
[Figure: IPC Improvement per benchmark (ammp, art, wupwise, swim, lucas, mgrid, applu, galgel, apsi, mcf, twolf, vpr, parser, gap, bzip2, and hmean) for Table (width), GHB (width), GHB (depth), and GHB (hybrid). One off-scale value is annotated as ~300%.]

16 Distance Prefetching (Memory Traffic)
[Figure: Increase in Memory Traffic vs. Prefetch Degree (1, 2, 4, 8, 16) for Table (width), GHB (width), GHB (depth), and GHB (hybrid).]


18 Distance Prefetching (Memory Traffic)
[Figure: Increase in Memory Traffic vs. Prefetch Degree (1, 2, 4, 8, 16) for Table (width), GHB (width), GHB (depth), and GHB (hybrid).]

19 Conclusions
• More complete picture of history
  – Allows width, depth, and hybrid
  – Also can improve other prefetching methods (covered in depth in the paper)
• Eliminates stale history in a natural way
  – FIFO discards old history to make room for new history
  – In a conventional table, old history can remain for a very long time and trigger inaccurate prefetches

20 Acknowledgements
• This research was funded by:
  – An Intel Undergraduate Research scholarship
  – A University of Wisconsin Hilldale Undergraduate Research fellowship
  – The National Science Foundation under grants CCR-0311361 and EIA-0071924

21 Backup Slides

22 Prefetching Metrics
• Accuracy is the percent of prefetches that are actually used.
• Coverage is the percent of memory references prefetched rather than demand fetched.
• Timeliness indicates if prefetched data arrives early enough to prevent the processor from stalling.
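The first two metrics reduce to simple ratios. The sketch below assumes hypothetical counter names and treats coverage as prefetched references over all references that were either prefetched or demand fetched, which is one reading of the definition above.

```python
def prefetch_accuracy(used_prefetches, issued_prefetches):
    """Percent of issued prefetches that the program actually used."""
    if issued_prefetches == 0:
        return 0.0
    return 100.0 * used_prefetches / issued_prefetches


def prefetch_coverage(prefetched_references, demand_fetched_references):
    """Percent of references served by a prefetch instead of a demand fetch."""
    total = prefetched_references + demand_fetched_references
    if total == 0:
        return 0.0
    return 100.0 * prefetched_references / total


accuracy = prefetch_accuracy(used_prefetches=30, issued_prefetches=40)
coverage = prefetch_coverage(prefetched_references=30,
                             demand_fetched_references=70)
```

Timeliness is not a simple ratio of counters; measuring it needs per-prefetch arrival times relative to the demand access, so it is omitted here.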

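23 GHB – Deltas
Miss Address Stream: 27 28 36 44 45 49 53 54 62 70 71
Global Delta Stream: 1 8 8 1 4 4 1 8 8 1
[Figure: Index Table keyed by global delta (1, 4, 8) and the Markov graph of deltas with edge probabilities .3 and .7. The key marks the current miss and the resulting prefetches.]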

24 Prefetch Taxonomy
• To simplify the discussion and illustrate the relation between prefetching methods, we introduce a consistent naming convention.
• Each name is an X/Y pair.
  – X is the key used for localizing the address stream.
  – Y is the method for detecting address patterns.

25 Prefetch Taxonomy
• We study two localizing methods
  – No localization or global (G)
  – Program Counter (PC)
• And three pattern detection methods
  – Address Correlation (AC)
  – Delta Correlation (DC)
  – Constant Stride (CS)

26 Prefetch Taxonomy
• Markov Prefetching: G/AC
• Distance Prefetching: G/DC
• Stride Prefetching: PC/CS

27 Stride Prefetching
• Table tracks the local history of loads.
• If a constant stride is detected in a load's local history, then n + s, n + 2s, …, n + ds are prefetched.
  – n is the current target address
  – s is the detected stride
  – d is the prefetch degree or aggressiveness of the prefetching.
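A simplified stride prefetcher can illustrate the n + s, …, n + ds rule. The sketch assumes a hypothetical `StridePrefetcher` class, keeps only the last address and last stride per load PC, and treats two equal consecutive strides as a detected constant stride; it omits the State field of a full reference prediction table.

```python
class StridePrefetcher:
    """Illustrative per-PC stride detector (simplified, no state machine)."""

    def __init__(self, degree):
        self.degree = degree  # d: number of addresses to prefetch
        self.table = {}       # load PC -> (last_address, last_stride)

    def access(self, pc, address):
        """Record a load and return the addresses to prefetch, if any."""
        prefetches = []
        stride = None
        entry = self.table.get(pc)
        if entry is not None:
            last_address, last_stride = entry
            stride = address - last_address
            # Assumed rule: prefetch when the last two strides match.
            if stride != 0 and stride == last_stride:
                prefetches = [address + k * stride
                              for k in range(1, self.degree + 1)]
        self.table[pc] = (address, stride)
        return prefetches


prefetcher = StridePrefetcher(degree=3)
first = prefetcher.access(pc=0x400, address=100)
second = prefetcher.access(pc=0x400, address=108)
third = prefetcher.access(pc=0x400, address=116)
```

With n = 116, s = 8, and d = 3, the third access returns 124, 132, and 140. The first two accesses return nothing because a stride cannot be confirmed until it has been seen twice.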

28 Stride Prefetching
[Figure: Reference Prediction Table indexed by the PC of the load, with fields Tag, Last Address, Stride, and State. Sub and add blocks sit between the target address and the prefetch address.]

29 GHB – Stride Prefetching
• GHB-Stride uses the PC to access the index table.
• The linked lists contain the local history of each load.
• Compare the last two local strides. If they are the same, then prefetch n + s, n + 2s, …, n + ds.
[Figure: Index Table indexed by PC and the Global History Buffer with miss address and pointer fields and a head pointer; the last two local strides are compared with an equality check (=?).]

30 GHB – Local Delta Correlation
• Form delta correlations within each load's local history.
• For example, consider the local miss address stream:

  Addresses   0  1  2  64  65  66  128  129
  Deltas      1  1  62  1  1  62  1

  Correlation   Prefetch Predictions
  (1, 1)        62  1  1
  (1, 62)       1  1  62
  (62, 1)       1  62  1
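The table above can be reproduced with a short search over one load's local history. The function name `delta_correlation_prefetch` and the backward list scan are assumptions made for illustration; the slides describe the idea, not this code.

```python
def delta_correlation_prefetch(local_addresses, degree):
    """Find the last earlier occurrence of the newest delta pair and
    replay the deltas that followed it."""
    deltas = [b - a for a, b in zip(local_addresses, local_addresses[1:])]
    if len(deltas) < 2:
        return []
    key = (deltas[-2], deltas[-1])
    # Scan backwards over earlier pairs, skipping the newest pair itself.
    for i in range(len(deltas) - 3, -1, -1):
        if (deltas[i], deltas[i + 1]) == key:
            address = local_addresses[-1]
            prefetches = []
            for d in deltas[i + 2:i + 2 + degree]:
                address += d
                prefetches.append(address)
            return prefetches
    return []


local_stream = [0, 1, 2, 64, 65, 66, 128, 129]
predicted = delta_correlation_prefetch(local_stream, degree=3)
```

The newest delta pair is (62, 1), and the deltas that followed its earlier occurrence are 1, 62, 1, matching the last row of the table. Applied to the current address 129, they give prefetch addresses 130, 192, and 193.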




