1
Energy-Efficient Hardware Data Prefetching
Yao Guo, Mahmoud Abdullah Bennaser, and Csaba Andras Moritz
2
CONTENTS
- Introduction
- Hardware prefetching
- Hardware data prefetching methods
- Performance speedup
- Energy-aware prefetching techniques
- PARE
- Conclusion
- References
3
Introduction
- Data prefetching is the process of fetching data the program will need in advance, before the instruction that requires it executes. It hides the apparent memory latency.
- Two types:
  - Software prefetching: inserted by the compiler
  - Hardware prefetching: performed by additional circuitry
4
Hardware Prefetching
- Uses additional circuitry.
- Prefetch tables store recent load instructions and the relations between load instructions.
- Gives better performance.
- Energy overhead comes from:
  - The energy cost of accessing the prefetch tables
  - Unnecessary L1 cache lookups
5
Hardware Data Prefetching Methods
- Sequential prefetching
- Stride prefetching
- Pointer prefetching
- Combined stride and pointer prefetching
6
Sequential Prefetching
- One-block-lookahead (OBL) approach: initiate a prefetch for block b+1 when block b is accessed.
- Prefetch-on-miss: prefetch b+1 whenever an access to block b results in a cache miss.
- Tagged prefetching: associates a tag bit with every memory block. When a block is demand-fetched, or a prefetched block is referenced for the first time, the next block is fetched.
7
[Figure: OBL approaches compared — prefetch-on-miss vs. tagged prefetch, showing demand-fetched and prefetched blocks and their tag bits (0/1).]
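The tagged variant above can be sketched in a few lines. This is a minimal simulation, not hardware: block numbers are abstract, capacity management is omitted, and the `fetch` callback stands in for a real memory request.

```python
# Minimal sketch of one-block-lookahead (OBL) tagged prefetching.
# A tag bit marks blocks that were prefetched but not yet referenced.

class TaggedPrefetcher:
    def __init__(self):
        self.cache = {}  # block number -> tag bit (True = prefetched, unreferenced)

    def fetch(self, block, prefetched):
        # Bring a block into the cache; the tag bit is set only for prefetches.
        self.cache[block] = prefetched

    def access(self, block):
        if block not in self.cache:
            # Demand miss: fetch block b and prefetch block b+1.
            self.fetch(block, prefetched=False)
            self.fetch(block + 1, prefetched=True)
        elif self.cache[block]:
            # First reference to a prefetched block: clear the tag
            # and prefetch the next block, sustaining the lookahead.
            self.cache[block] = False
            self.fetch(block + 1, prefetched=True)
```

A sequential scan thus keeps exactly one block of lookahead in flight, which is the defining property of OBL.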
8
Stride Prefetching
- Employs special logic to monitor the processor's address-referencing pattern.
- Detects constant-stride array references originating from looping structures.
- Compares successive addresses used by load or store instructions.
9
Reference Prediction Table (RPT)
- 64 entries, 64 bits each.
- Holds the most recently used memory instructions.
- Each entry stores:
  - Address of the memory instruction
  - Previous address accessed by the instruction
  - Stride value
  - State field
10
Organization of RPT
[Figure: RPT organization — the PC indexes an entry (instruction tag, previous address, stride, state); the difference between the effective address and the previous address gives the stride, which is added to the current address to form the prefetch address.]
11
Pointer Prefetching
- Effective for pointer-intensive programs, where there is no constant stride.
- Dependence-based prefetching detects dependence relationships between loads.
- Uses two hardware tables:
  - Correlation Table (CT): stores dependence information
  - Potential Producer Window (PPW): records the most recently loaded values and the corresponding instructions
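The PPW/CT interaction can be sketched as below. This is a simplified model under stated assumptions: the table sizes are illustrative, offsets between the loaded pointer and the next address are ignored, and the CT maps one producer to one consumer.

```python
# Sketch of dependence-based pointer prefetching.
# PPW: recently loaded values and the PCs that produced them.
# CT:  producer PC -> consumer PC dependences discovered so far.

from collections import deque

class DependencePrefetcher:
    def __init__(self, ppw_size=8):
        self.ppw = deque(maxlen=ppw_size)  # (loaded_value, producer_pc)
        self.ct = {}                       # producer_pc -> consumer_pc

    def load(self, pc, base_addr, value):
        """Called on each load; returns a prefetch address or None."""
        # If this load's base address was produced by a recent load,
        # record the producer -> consumer dependence in the CT.
        for loaded_value, producer_pc in self.ppw:
            if loaded_value == base_addr:
                self.ct[producer_pc] = pc
        self.ppw.append((value, pc))
        # If this load is a known producer, prefetch through its value.
        return value if pc in self.ct else None
```

For a linked-list traversal (`p = p->next`), the same PC is both producer and consumer, so after one iteration every loaded pointer triggers a prefetch of the next node.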
12
Combined Stride and Pointer Prefetching
- Objective: evaluate a technique that works for all types of memory-access patterns.
- Handles both array and pointer accesses.
- Better performance.
- Uses all three tables (RPT, PPW, CT).
13
Performance Speedup
- The combined (stride + dependence) technique has the best speedup for most benchmarks.
14
Energy-Aware Prefetching Architecture
[Figure: architecture overview — the LDQ (RA, RB, OFFSET) feeds compiler hints to the stride and pointer prefetchers; a stride counter and a Prefetch Filtering Buffer (PFB) filter prefetches before they reach the L1 D-cache tag and data arrays; surviving prefetches go to the L2 cache alongside regular cache accesses.]
- Compiler-Based Selective Filtering
- Compiler-Assisted Adaptive Prefetching
- Prefetch Filtering using Stride Counter
- Hardware Filtering using PFB
15
Energy-Aware Prefetching Techniques
- Compiler-Based Selective Filtering (CBSF): searches the prefetch hardware tables only for selected memory instructions.
- Compiler-Assisted Adaptive Prefetching (CAAP): selects different prefetching schemes per access.
- Compiler-driven Filtering using a Stride Counter (SC): reduces prefetching energy.
- Hardware-based Filtering using the PFB (PFB): reduces L1 cache-related energy overhead.
16
Compiler-Based Selective Filtering
- Searches the prefetch hardware tables only for the memory instructions identified by the compiler.
- Energy is reduced by:
  - Considering only loop and recursive memory accesses
  - Considering only array and linked-data-structure memory accesses
17
Compiler-Assisted Adaptive Prefetching
- Selects the prefetching scheme based on the access pattern:
  - Memory accesses to an array that does not belong to any larger structure are fed only into the stride prefetcher.
  - Memory accesses to an array that belongs to a larger structure are fed into both the stride and pointer prefetchers.
  - Memory accesses to a linked data structure with no arrays are fed only into the pointer prefetcher.
  - Memory accesses to a linked data structure that contains arrays are fed into both prefetchers.
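The four selection rules above reduce to a small dispatch function. The boolean flags are hypothetical names for the compiler hints attached to each memory instruction; the real mechanism encodes these hints in the instruction stream.

```python
# Sketch of the CAAP selection rules: map compiler hints about an access
# to the set of prefetchers ("stride", "pointer") that should see it.

def select_prefetchers(is_array, inside_struct, is_linked, contains_array):
    targets = set()
    if is_array:
        targets.add("stride")          # arrays go to the stride prefetcher...
        if inside_struct:
            targets.add("pointer")     # ...and also to pointer if in a struct
    if is_linked:
        targets.add("pointer")         # linked structures go to pointer...
        if contains_array:
            targets.add("stride")      # ...and also to stride if they hold arrays
    return targets
```

Feeding each access to only the relevant prefetcher avoids useless table lookups in the other one, which is where the energy saving comes from.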
18
Compiler-Hinted Filtering Using a Runtime Stride Counter
- Reduces prefetching energy wasted on memory-access patterns with very small strides.
- Small strides are not useful: a prefetch helps only when the stride is larger than half the cache-line size, since smaller strides land in a line already being fetched.
- Each counter entry contains:
  - Program Counter (PC)
  - Stride counter
- The counter tracks how many times the instruction occurs with a small stride, so its prefetches can be filtered.
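A minimal sketch of this filter is below, assuming a 64-byte cache line; the counter threshold and width are illustrative choices, not the values from the design.

```python
# Sketch of stride-counter prefetch filtering: suppress prefetches from
# instructions that repeatedly exhibit strides within half a cache line.

CACHE_LINE = 64  # bytes (assumed)

class StrideCounterFilter:
    def __init__(self, threshold=4):
        self.counters = {}          # PC -> small saturating counter
        self.threshold = threshold  # occurrences before filtering kicks in

    def allow_prefetch(self, pc, stride):
        if abs(stride) > CACHE_LINE // 2:
            self.counters[pc] = 0   # useful stride: reset counter, allow
            return True
        # Small stride: count the occurrence and filter once saturated.
        c = self.counters.get(pc, 0) + 1
        self.counters[pc] = min(c, self.threshold)
        return c < self.threshold
```

Resetting the counter on a large stride lets an instruction that alternates between phases regain its prefetches instead of being filtered forever.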
19
PARE: A Power-Aware Prefetch Engine
- Used to reduce power dissipation.
- Two ways to reduce power:
  - Reduces the size of each entry, based on the spatial locality of memory accesses
  - Partitions the large table into multiple smaller tables
20
Hardware Prefetch Table
21
PARE Hardware Prefetch Table
- Breaks the whole prefetch table into 16 smaller tables, each containing 4 entries.
- Each entry also contains a group number.
- Uses only the lower 16 bits of the PC instead of all 32 bits.
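The partitioned organization can be sketched as follows. The hash from PC to group number is an assumption (the slide only says each entry carries a group number); the point is that a lookup activates one 4-entry table and compares 16-bit tags rather than searching one large 64-entry CAM.

```python
# Sketch of the PARE table organization: 16 small tables of 4 entries,
# selected by a group number derived from the PC, with 16-bit PC tags.

class PARETable:
    def __init__(self, groups=16, entries_per_group=4):
        self.groups = [dict() for _ in range(groups)]
        self.entries_per_group = entries_per_group

    def _group_and_tag(self, pc):
        group = self.groups[pc % len(self.groups)]  # pick one small table
        tag = pc & 0xFFFF                           # lower 16 bits of the PC
        return group, tag

    def lookup(self, pc):
        group, tag = self._group_and_tag(pc)        # only 4 entries compared
        return group.get(tag)

    def insert(self, pc, entry):
        group, tag = self._group_and_tag(pc)
        if tag not in group and len(group) >= self.entries_per_group:
            group.pop(next(iter(group)))            # evict oldest in the group
        group[tag] = entry
```

Since only one small table is driven per access, the per-lookup CAM energy scales with 4 entries and 16-bit tags instead of 64 entries and 32-bit tags.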
22
PARE Table Design
23
Advantages of the PARE Hardware Table
- Power consumption per lookup is reduced: CAM cell power is lower and each table is small.
- Total power consumption is reduced.
24
Conclusion
- Improves performance while reducing the energy overhead of hardware data prefetching.
- Reduces total energy consumption.
- Achieved using compiler-assisted and hardware-based energy-aware techniques and a new power-aware prefetch engine (PARE).
25
References
- Y. Guo et al., "Energy-Efficient Hardware Data Prefetching," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 19, no. 2, Feb. 2011.
- A. J. Smith, "Sequential program prefetching in memory hierarchies," IEEE Computer, vol. 11, no. 12, pp. 7–21, Dec. 1978.
- A. Roth, A. Moshovos, and G. S. Sohi, "Dependence based prefetching for linked data structures," in Proc. ASPLOS-VIII, Oct. 1998, pp. 115–126.