Energy-Efficient Hardware Data Prefetching Yao Guo, Mahmoud Abdullah Bennaser and Csaba Andras Moritz.

Presentation transcript:

1 Energy-Efficient Hardware Data Prefetching Yao Guo, Mahmoud Abdullah Bennaser and Csaba Andras Moritz

2 CONTENTS  Introduction  Hardware prefetching  Hardware data prefetching methods  Performance speedup  Energy-aware prefetching techniques  PARE  Conclusion  References

3 Introduction  Data prefetching is the process of fetching data needed by the program in advance, before the instruction that requires it is executed.  It hides memory latency.  Two types  Software prefetching  Using the compiler  Hardware prefetching  Using additional circuitry

4 Hardware Prefetching  Uses additional circuitry  Prefetch tables store recent load instructions and the relations between them.  Better performance  Energy overhead comes from  Prefetch table lookups  Unnecessary L1 cache lookups

5 Hardware Data Prefetching Methods  Sequential prefetching  Stride prefetching  Pointer prefetching  Combined stride and pointer prefetching

6 Sequential Prefetching  One block lookahead (OBL) approach  Initiate a prefetch for block b+1 when block b is accessed  Prefetch-on-miss  Prefetch only when an access to block b results in a cache miss  Tagged prefetching  Associates a tag bit with every memory block  When a block is demand-fetched, or a prefetched block is referenced for the first time, the next block is fetched.

7 OBL Approaches [Figure: prefetch-on-miss vs. tagged prefetching. Prefetch-on-miss fetches the next block only when a demand access misses; tagged prefetching keeps a tag bit per block (1 = prefetched, not yet referenced) and fetches the next block when a demand-fetched block or a tagged block is referenced for the first time.]
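The tagged OBL scheme above can be sketched as a small simulation. This is an illustrative model, not the hardware design; the class and method names are assumptions:

```python
# Minimal sketch of one-block-lookahead (OBL) tagged prefetching.
# Integer block numbers stand in for cache blocks.

class TaggedPrefetchCache:
    def __init__(self):
        self.blocks = {}  # block number -> tag bit (True = prefetched, untouched)

    def access(self, b):
        """Demand access to block b; returns the block prefetched, if any."""
        prefetched = None
        if b not in self.blocks:
            # Demand fetch: bring in b untagged and prefetch b+1 with its tag set.
            self.blocks[b] = False
            self.blocks[b + 1] = True
            prefetched = b + 1
        elif self.blocks[b]:
            # First reference to a prefetched block: clear its tag, fetch next.
            self.blocks[b] = False
            self.blocks[b + 1] = True
            prefetched = b + 1
        return prefetched

cache = TaggedPrefetchCache()
assert cache.access(10) == 11    # miss on 10 -> prefetch 11
assert cache.access(11) == 12    # first touch of prefetched 11 -> prefetch 12
assert cache.access(11) is None  # repeated access triggers nothing
```

Sequential access thus keeps one block of lookahead running ahead of the demand stream, while repeated accesses to the same block issue no extra prefetches.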

8 Stride Prefetching  Employ special logic to monitor the processor’s address referencing pattern  Detect constant stride array references originating from looping structures  Compare successive addresses used by load or store instructions

9 Reference Prediction Table (RPT)  64 entries, each 64 bits wide  Holds the most recently used memory instructions  Address of the memory instruction  Previous address accessed by the instruction  Stride value  State field

10 Organization of the RPT [Figure: the RPT is indexed by the PC of the memory instruction; each entry holds an instruction tag, the previous address, the stride, and a state field. The prefetch address is computed as effective address + stride.]
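An RPT entry update can be sketched as follows, assuming a simplified three-state scheme (initial / transient / steady); the field and state names are illustrative:

```python
# Sketch of a reference prediction table (RPT) entry update.
# An entry tracks one memory instruction's previous address, stride, and state.

def rpt_update(entry, pc, addr):
    """Update the RPT entry for instruction pc with its new effective address.
    Returns (entry, prefetch address); a prefetch is issued only in steady state."""
    if entry is None:
        # First encounter: allocate an entry, no prediction yet.
        return {"pc": pc, "prev": addr, "stride": 0, "state": "initial"}, None
    stride = addr - entry["prev"]
    if stride == entry["stride"]:
        entry["state"] = "steady"            # stride confirmed twice in a row
    else:
        entry["state"] = "transient" if entry["state"] == "steady" else "initial"
        entry["stride"] = stride             # learn the new stride
    entry["prev"] = addr
    prefetch = addr + entry["stride"] if entry["state"] == "steady" else None
    return entry, prefetch

entry, pf = None, None
for addr in (100, 108, 116, 124):            # constant stride of 8
    entry, pf = rpt_update(entry, pc=0x400, addr=addr)
assert pf == 132  # steady state reached -> prefetch next address
```

A load sweeping an array with a fixed stride reaches the steady state after two matching strides, after which every execution triggers a prefetch one stride ahead.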

11 Pointer Prefetching  Effective for pointer-intensive programs  No constant stride  Dependence-based prefetching  Detects dependence relationships between loads  Uses two hardware tables  Correlation Table (CT)  Stores dependence information  Potential Producer Window (PPW)  Records the most recently loaded values and the corresponding instructions
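The PPW/CT interplay can be sketched as below. The structure names follow the slide; the Python shape and the single-consumer simplification are assumptions:

```python
# Sketch of dependence-based pointer prefetching. The potential producer
# window (PPW) remembers recently loaded values with the PC that loaded them;
# the correlation table (CT) records which loads produce pointers that later
# loads consume.

ppw = {}   # loaded value -> PC of the load that produced it
ct = set() # PCs of loads known to produce addresses for later loads

def on_load(pc, base_addr, value):
    """Process one executed load; return an address to prefetch, if any."""
    prefetch = None
    if base_addr in ppw:
        ct.add(ppw[base_addr])  # this load's address came from an earlier load
    if pc in ct:
        prefetch = value        # this load produces a pointer: chase it early
    ppw[value] = pc             # the value may feed a future load's address
    return prefetch

# Traversing a linked list: the node at 0x100 holds next pointer 0x200, etc.
assert on_load(pc=1, base_addr=0x100, value=0x200) is None
assert on_load(pc=1, base_addr=0x200, value=0x300) == 0x300
```

Once the dependence is recorded, each execution of the producing load prefetches the next node of the linked structure before the consumer dereferences it.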

12 Combined Stride and Pointer Prefetching  Objective: a technique that works for all types of memory access patterns  Handles both array and pointer accesses  Better performance  Uses all three tables (RPT, PPW, CT)

13 Performance Speedup  The combined (stride + dependence-based) technique has the best speedup for most benchmarks.

14 Energy-aware Prefetching Architecture [Figure: stride and pointer prefetchers sit beside the L1 D-cache; compiler hints on load-queue (LDQ) entries steer prefetch generation, a stride counter and a prefetching filtering buffer (PFB) filter prefetches before they reach the L1 tag and data arrays, and surviving prefetches go to the L2 cache. Four filtering points are marked: compiler-based selective filtering, compiler-assisted adaptive prefetching, prefetch filtering using a stride counter, and hardware filtering using the PFB.]

15 Energy-aware Prefetching Techniques  Compiler-Based Selective Filtering (CBSF)  Searches the prefetch hardware tables only for selected memory instructions  Compiler-Assisted Adaptive Prefetching (CAAP)  Selects different prefetching schemes per access  Compiler-driven Filtering using a Stride Counter (SC)  Reduces prefetching energy  Hardware-based Filtering using the PFB  Reduces L1 cache-related energy overhead

16 Compiler-based Selective Filtering  Searches the prefetch hardware tables only for memory instructions selected by the compiler  Energy is reduced by  Considering only loop and recursive memory accesses  Considering only array and linked-data-structure memory accesses

17 Compiler-assisted Adaptive Prefetching  Selects the prefetching scheme based on the access type  Memory accesses to an array that does not belong to any larger structure are fed only into the stride prefetcher.  Memory accesses to an array that belongs to a larger structure are fed into both the stride and pointer prefetchers.  Memory accesses to a linked data structure with no arrays are fed only into the pointer prefetcher.  Memory accesses to a linked data structure that contains arrays are fed into both prefetchers.
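The four routing rules above can be condensed into a small selection function. The flag names are illustrative stand-ins for the compiler's annotations:

```python
# Sketch of CAAP routing: the compiler marks each memory access with its
# structure kind, and the hint steers it to the matching prefetcher(s).

def select_prefetchers(access):
    """Route a compiler-annotated access to the stride and/or pointer prefetcher."""
    kind = access["kind"]    # "array" or "linked"
    mixed = access["mixed"]  # array inside a larger structure / list holding arrays
    if kind == "array":
        return ("stride", "pointer") if mixed else ("stride",)
    else:  # linked data structure
        return ("stride", "pointer") if mixed else ("pointer",)

assert select_prefetchers({"kind": "array", "mixed": False}) == ("stride",)
assert select_prefetchers({"kind": "linked", "mixed": False}) == ("pointer",)
assert select_prefetchers({"kind": "linked", "mixed": True}) == ("stride", "pointer")
```

Routing each access to only the prefetcher that can actually predict it avoids wasted lookups in the other prefetcher's tables.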

18 Compiler-hinted Filtering Using a Runtime Stride Counter  Reduces prefetching energy wasted on memory access patterns with very small strides.  Prefetches help only when the stride is larger than half the cache line size; smaller strides mostly touch lines already being fetched.  Each stride-counter entry contains  Program counter (PC)  Stride counter  The counter tracks how many times the instruction occurs with a small stride
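A minimal sketch of the filter, assuming a 64-byte cache line and an illustrative cut-off of four small-stride occurrences (the threshold and names are assumptions):

```python
# Sketch of the stride-counter filter: suppress prefetches from loads that
# repeatedly use strides below half the cache line size, since those mostly
# hit lines that are already resident or in flight.

CACHE_LINE = 64                  # assumed cache line size in bytes
SMALL_STRIDE = CACHE_LINE // 2

counters = {}  # PC -> times this load was seen with a small stride

def allow_prefetch(pc, stride, limit=4):
    """Return True if a prefetch for this load should still be issued."""
    if abs(stride) >= SMALL_STRIDE:
        counters[pc] = 0                    # large stride: reset and prefetch
        return True
    counters[pc] = counters.get(pc, 0) + 1
    return counters[pc] < limit             # give up after `limit` small strides

assert all(allow_prefetch(0x40, 8) for _ in range(3))
assert not allow_prefetch(0x40, 8)   # fourth small-stride occurrence is filtered
assert allow_prefetch(0x40, 64)      # a large stride re-enables prefetching
```

The counter lets the hardware stop paying table-lookup and L1-probe energy for loads that the compiler hinted might stride too finely to benefit.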

19 PARE: A Power-aware Prefetch Engine  Designed to reduce power dissipation  Two ways to reduce power  Reduces the size of each entry, based on the spatial locality of memory accesses  Partitions the large table into multiple smaller tables

20 Hardware Prefetch Table [Figure: baseline centralized prefetch table design.]

21 PARE Hardware Prefetch Table  Breaks the whole prefetch table into 16 smaller tables  Each table contains 4 entries  Each entry also contains a group number  Uses only the lower 16 bits of the PC instead of all 32 bits
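PARE's partitioned lookup can be modeled as below. The sizes (16 tables of 4 entries, 16-bit PC tags) follow the slides; the FIFO replacement and function names are illustrative assumptions:

```python
# Sketch of PARE's partitioned prefetch table: a compiler-assigned group
# number selects one small table, so only 4 entries are searched per lookup,
# and entries store just the lower 16 bits of the PC.

NUM_TABLES, ENTRIES_PER_TABLE = 16, 4
tables = [[] for _ in range(NUM_TABLES)]  # each entry: (pc16 tag, payload)

def lookup(pc, group):
    pc16 = pc & 0xFFFF                  # compare only the low 16 PC bits
    for tag, data in tables[group]:     # search a single 4-entry table
        if tag == pc16:
            return data
    return None

def insert(pc, group, data):
    table = tables[group]
    if len(table) == ENTRIES_PER_TABLE:
        table.pop(0)                    # FIFO eviction stands in for the real policy
    table.append((pc & 0xFFFF, data))

insert(0x12345678, group=3, data="stride=8")
assert lookup(0x12345678, group=3) == "stride=8"
assert lookup(0x99995678, group=3) == "stride=8"  # PCs aliasing in the low 16 bits match
```

Searching 4 entries with 16-bit tags instead of 64 entries with 32-bit tags is what cuts the CAM comparison power, at the cost of rare aliasing between PCs that share their low 16 bits.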

22 PARE Table Design

23 Advantages of the PARE Hardware Table  CAM cell power is reduced  Small tables reduce lookup power  Total power consumption is reduced

24 Conclusion  Compiler-assisted and hardware-based energy-aware techniques, together with the new power-aware prefetch engine (PARE), improve performance, reduce the energy overhead of hardware data prefetching, and reduce total energy consumption.

25 References  Y. Guo, "Energy-Efficient Hardware Data Prefetching," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 19, no. 2, Feb. 2011.  A. J. Smith, "Sequential program prefetching in memory hierarchies," IEEE Computer, vol. 11, no. 12, pp. 7-21, Dec. 1978.  A. Roth, A. Moshovos, and G. S. Sohi, "Dependence based prefetching for linked data structures," in Proc. ASPLOS-VIII, Oct. 1998, pp. 115-126.

