MICRO-48, 2015 Computer System Lab, Kim Jeong Won.

Motivation
· First, in most high-volume CPU designs, the program counter (PC) is unavailable at this level of the cache hierarchy.
· Second, a prefetcher located at the last-level cache must work with physical addresses directly, without the benefit of a TLB or other page table information.

Idea
· Address pattern within a page → (A, A-24, A+1, A-23, A+2, A-22, A+3)
· Extracted delta pattern → (-24, +25)
· Five common delta sequences found in LBM
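The delta extraction on this slide can be illustrated with a minimal Python sketch. The function name `extract_deltas` and the base offset `A = 30` are hypothetical choices made for illustration; only the shape of the address pattern comes from the slide.

```python
def extract_deltas(offsets):
    """Return the differences between consecutive accesses within one page."""
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


A = 30  # hypothetical cache-line offset inside the page (assumption)
accesses = [A, A - 24, A + 1, A - 23, A + 2, A - 22, A + 3]
deltas = extract_deltas(accesses)  # alternates between -24 and +25
```

The addresses themselves never repeat, but the delta stream is a simple two-element cycle, which is what makes the pattern predictable.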

Proposal_ Variable Length Delta Prefetcher (VLDP)
A key innovation of VLDP → the use of multiple Delta Prediction Tables (DPTs)
Features of VLDP
· enables the prediction of complex multi-delta access patterns
· works on a per-page basis, and can prefetch a different complex pattern for each page
· uses multiple global prediction tables that can learn common access patterns across many pages
· these prediction tables are indexed by varying lengths of delta histories

Proposal_ Delta History Buffer (DHB)
The Delta History Buffer (DHB) tracks delta histories for recently accessed pages. These histories, in turn, are used to look up the DPT and predict future memory requests.
Fields of a DHB entry
· Page Num. - page number
· Last Add. - page offset of the last address accessed in this page
· Last 4 Deltas - sequence of up to 4 recently observed deltas
· Last Predictor - the DPT level used for the latest delta prediction
· Num. Times Used - the number of times this page has been used
· Last Four Prefetched Offsets - sequence of up to 4 recently prefetched offsets

Proposal_ Prefetch Activation Events (PAE)
When a PAE occurs → a fully associative search of the DHB looks for an entry with a matching page number
On a DHB miss
1. A DHB entry is evicted and assigned to the new page number
2. The page offset of the cache line is recorded in the last address field
On subsequent hits to this page in the DHB
3. The delta is computed
4. The delta is added to the delta sequence (last 4 deltas)
5. The last address field is updated
6. Only the 4 most recent deltas are maintained

Proposal_ Prefetch Activation Events (PAE)
When a PAE occurs → a fully associative search of the DHB looks for an entry with a matching page number
On a DHB hit (after the DHB entry has been updated with the most recent delta)
1. The newly updated delta history is used to index the DPT
2. The DHB entry stores the ID of the DPT table that supplied the prediction (the Last Predictor field)
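The miss and hit handling described on the PAE slides can be sketched roughly as follows. This is an illustrative Python model, not the hardware design: the class names, the capacity of 16 entries, and the LRU eviction order are all assumptions, since the slides do not state the table size or replacement policy. The field names mirror the DHB entry fields listed earlier.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DHBEntry:
    last_offset: int                                   # Last Add.
    deltas: List[int] = field(default_factory=list)    # Last 4 Deltas
    last_predictor: Optional[int] = None               # Last Predictor (DPT level)
    times_used: int = 0                                # Num. Times Used
    last_prefetched: List[int] = field(default_factory=list)


class DeltaHistoryBuffer:
    def __init__(self, capacity=16, history_len=4):
        self.capacity = capacity          # assumed size, not from the slides
        self.history_len = history_len
        self.entries = OrderedDict()      # page number -> DHBEntry, oldest first

    def access(self, page, offset):
        """Handle one prefetch activation event for (page, offset)."""
        entry = self.entries.get(page)
        if entry is None:
            # DHB miss: evict an entry (LRU here, an assumption) and
            # record the offset in the last address field.
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            entry = DHBEntry(last_offset=offset, times_used=1)
            self.entries[page] = entry
            return entry
        # DHB hit: compute the delta, append it, keep only the most recent 4.
        self.entries.move_to_end(page)
        delta = offset - entry.last_offset
        entry.deltas = (entry.deltas + [delta])[-self.history_len:]
        entry.last_offset = offset
        entry.times_used += 1
        return entry


dhb = DeltaHistoryBuffer(capacity=2)
for off in [30, 6, 31, 7, 32, 8]:
    tracked = dhb.access(page=7, offset=off)
```

After the loop, `tracked.deltas` holds only the four most recent deltas for page 7, which is the history that would then be used to index the DPT.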

Proposal_ Offset Prediction Table (OPT)
Offset Prediction Table entry: Offset | Delta prediction | Accuracy (1b)
· Offset - page offset
· Delta prediction - predicted delta for the second page access
· Accuracy - 1-bit accuracy field
Accuracy bit update (old → new)
· OPT prediction = delta (match): 1 → 1
· OPT prediction = delta (match): 0 → 1
· OPT prediction ≠ delta (no match): 1 → 0
· OPT prediction ≠ delta (no match): 0 → 0 *
* if the accuracy bit was already 0, the old predicted delta is replaced with the new observed delta
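The accuracy-bit rules above amount to a small state machine per table entry. A minimal Python sketch follows; the class name, the 64-entry table size, and the choice to issue a prediction only when the accuracy bit is 1 are assumptions made for illustration, not details given on the slide.

```python
class OffsetPredictionTable:
    def __init__(self, num_offsets=64):
        # One entry per page offset: [predicted_delta, accuracy_bit].
        # 64 offsets is an assumed table size.
        self.table = [[None, 0] for _ in range(num_offsets)]

    def predict(self, first_offset):
        """Return the predicted delta, or None when the entry is not trusted."""
        delta, accuracy = self.table[first_offset]
        return delta if accuracy == 1 else None  # gating on the bit is an assumption

    def update(self, first_offset, observed_delta):
        """Apply the accuracy-bit rules once the second access to a page is seen."""
        entry = self.table[first_offset]
        if entry[0] == observed_delta:
            entry[1] = 1                  # match: 0 -> 1 or 1 -> 1
        elif entry[1] == 1:
            entry[1] = 0                  # mismatch: 1 -> 0, keep the old delta
        else:
            entry[0] = observed_delta     # mismatch at 0: replace the delta


opt = OffsetPredictionTable()
opt.update(30, -24)   # first sighting: delta recorded, accuracy stays 0
opt.update(30, -24)   # confirmed: accuracy becomes 1
```

The single accuracy bit gives each entry one chance to recover: a lone mismatch clears the bit but keeps the delta, and only a second consecutive mismatch overwrites it.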

Proposal_ Delta Prediction Table (DPT)
A key feature of the DPT → it is not just a single table, but rather a set of cascaded tables
· Deltas - delta history (obtained from the DHB), used as the keys
· Pred - delta predictions, used as the values
· Accuracy - 2-bit accuracy counter
· nMRU - 1-bit nMRU value

Proposal_ Delta Prediction Table (DPT)
DPT updated by PAE
· any new delta patterns will be allocated in the DPT
· accuracy bits can be updated
· if the prediction accuracy is sufficiently low, the delta prediction field may be updated to reflect the new delta
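A rough Python sketch of cascaded tables keyed by delta histories of different lengths is given below. Several details are assumptions rather than facts from the slides: three levels, preferring the longest matching history at lookup, unbounded dictionaries instead of fixed-size tables with nMRU replacement, and training every level on each update (a simplification that may differ from the actual update policy).

```python
class DeltaPredictionTables:
    MAX_ACCURACY = 3  # saturating value of a 2-bit counter

    def __init__(self, levels=3):
        # tables[i] maps a tuple of the last (i + 1) deltas
        # to [predicted_delta, accuracy_counter].
        self.tables = [dict() for _ in range(levels)]

    def predict(self, history):
        """Return (predicted_delta, level), trying the longest history first."""
        for length in range(min(len(history), len(self.tables)), 0, -1):
            entry = self.tables[length - 1].get(tuple(history[-length:]))
            if entry is not None:
                return entry[0], length
        return None, 0

    def update(self, history, observed_delta):
        """Train every level with the delta that followed `history`."""
        for length in range(1, min(len(history), len(self.tables)) + 1):
            table = self.tables[length - 1]
            key = tuple(history[-length:])
            entry = table.get(key)
            if entry is None:
                table[key] = [observed_delta, 0]       # allocate a new pattern
            elif entry[0] == observed_delta:
                entry[1] = min(entry[1] + 1, self.MAX_ACCURACY)
            elif entry[1] > 0:
                entry[1] -= 1                          # lose confidence first
            else:
                entry[0] = observed_delta              # accuracy low: replace


dpt = DeltaPredictionTables()
pattern = [1, 1, 2] * 4   # hypothetical delta stream
for i in range(1, len(pattern)):
    dpt.update(pattern[:i], pattern[i])
```

In this stream a single delta of 1 is ambiguous (it is followed by 1 or by 2), while the two-delta history (1, 1) is always followed by 2. That is the kind of case where indexing by a longer history helps.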

Proposal_ Multi-Degree Prefetch

Result_ Simulator Parameters

Result_ Performance Evaluation
VLDP performance
· 17.2% better than FDP
· 8.5% better than SBP
· 5.8% better than AMPM

Result_ Comparing VLDP to Prefetchers that use the Program Counter
· VLDP has an accuracy of 61%
· GHB has an accuracy of 33%
· VLDP is 7.1% better than GHB PC/DC
· VLDP is 7.6% better than SMS

Result_ Cache Misses and Prefetcher Coverage

Result_ Prefetcher Accuracy and DRAM accesses
DRAM accesses
· FDP has 3.7%
· SMS has 60.5%
· SBP has 22.6%
· GHB has 5.4%
· AMPM has 13.4%
· VLDP has 17.2%

Result_ Sensitivity Analysis