MICRO-48, 2015 Computer System Lab, Kim Jeong Won.


Motivation
· First, in most high-volume CPU designs, the program counter (PC) is unavailable at this level in the cache hierarchy
· Second, a prefetcher located at the last-level cache must deal with physical addresses directly, without the benefit of a TLB or other page table information

Idea
· Address pattern within a page → (A, A-24, A+1, A-23, A+2, A-22, A+3)
· Extracted delta pattern → (-24, +25), repeating
· Five common delta sequences found in LBM
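The delta extraction on this slide can be sketched in a few lines. This is only an illustration: the function name `extract_deltas` and the concrete starting offset `A = 100` are assumptions made for the example and do not come from the slides.

```python
def extract_deltas(offsets):
    """Return the differences between consecutive page offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


A = 100  # hypothetical starting offset, chosen only for illustration
accesses = [A, A - 24, A + 1, A - 23, A + 2, A - 22, A + 3]
deltas = extract_deltas(accesses)
print(deltas)  # [-24, 25, -24, 25, -24, 25]
```

The access stream looks irregular as raw addresses, but as deltas it is simply the two-element pattern (-24, +25) repeated.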

Proposal_ Variable Length Delta Prefetcher (VLDP)
A key innovation of VLDP → the use of multiple Delta Prediction Tables (DPTs)
Features of VLDP
· enables the prediction of complex multi-delta access patterns
· works on a per-page basis and can prefetch a different complex pattern for each page
· uses multiple global prediction tables that can learn common access patterns across many pages
· these prediction tables are indexed by varying lengths of delta histories

Proposal_ Delta History Buffer (DHB)
The Delta History Buffer (DHB) tracks delta histories for recently accessed pages. These histories, in turn, are used to look up the DPT and predict future memory requests.
· Page Num. - page number
· Last Add. - page offset of the last address accessed in this page
· Last 4 Deltas - sequence of up to 4 recently observed deltas
· Last Predictor - the DPT level used for the latest delta prediction
· Num. Times Used - the number of times this page has been used
· Last Four Prefetched Offsets - sequence of up to 4 recently prefetched offsets
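As a rough sketch, one DHB entry could be modelled as a small record with the six fields listed on this slide. The class name `DHBEntry`, the Python field names, and the use of bounded deques are assumptions for illustration; the slides do not specify field widths or encodings.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

HISTORY_LEN = 4  # the slide keeps up to 4 deltas and 4 prefetched offsets


def _bounded():
    """Create a queue that silently drops its oldest item when full."""
    return deque(maxlen=HISTORY_LEN)


@dataclass
class DHBEntry:
    page_num: int                          # Page Num.
    last_addr: int                         # Last Add.: page offset of last access
    last_deltas: deque = field(default_factory=_bounded)      # Last 4 Deltas
    last_predictor: Optional[int] = None   # Last Predictor: DPT level used
    times_used: int = 0                    # Num. Times Used
    last_prefetched: deque = field(default_factory=_bounded)  # Last Four Prefetched Offsets


entry = DHBEntry(page_num=0x1234, last_addr=10)
for d in (1, 2, 3, 4, 5):
    entry.last_deltas.append(d)
print(list(entry.last_deltas))  # [2, 3, 4, 5]
```

A deque with `maxlen=4` gives the "up to 4 most recent" behaviour without explicit trimming.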

Proposal_ Prefetch Activation Events (PAE)
When a PAE occurs → a fully associative search in the DHB looks for an entry with a matching page number
On a DHB miss
1. A DHB entry is evicted and assigned to the new page number
2. The page offset of the cache line is recorded in the Last Add. field
On subsequent hits to this page in the DHB
3. The delta between the current page offset and Last Add. is computed
4. The delta is added to the delta sequence (Last 4 Deltas)
5. Last Add. is updated
6. Only the 4 most recent deltas are maintained
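The miss and hit steps above can be sketched as follows. The slides do not give the DHB capacity or the eviction policy, so the 16-entry size and the least-recently-used eviction below are assumptions, as are the class and method names.

```python
from collections import OrderedDict, deque

DHB_ENTRIES = 16  # assumption: capacity is not stated on the slide
HISTORY_LEN = 4   # the slide keeps the 4 most recent deltas


class DeltaHistoryBuffer:
    def __init__(self, capacity=DHB_ENTRIES):
        self.capacity = capacity
        self.entries = OrderedDict()  # page_num -> entry, least recent first

    def on_activation(self, page_num, offset):
        """Process one PAE and return the page's delta history as a tuple."""
        entry = self.entries.get(page_num)
        if entry is None:
            # DHB miss: evict an entry if full (LRU is an assumption),
            # then record the offset as the last address for the new page.
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            self.entries[page_num] = {
                "last_addr": offset,
                "deltas": deque(maxlen=HISTORY_LEN),
            }
            return ()
        # DHB hit: compute the delta, append it, and update the last address.
        self.entries.move_to_end(page_num)
        entry["deltas"].append(offset - entry["last_addr"])
        entry["last_addr"] = offset
        return tuple(entry["deltas"])


dhb = DeltaHistoryBuffer(capacity=2)
for off in (30, 6, 31, 7):
    history = dhb.on_activation(page_num=1, offset=off)
print(history)  # (-24, 25, -24)
```

The first access to page 1 is a miss and yields no history; the three following hits produce the deltas -24, +25 and -24.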

Proposal_ Prefetch Activation Events (PAE)
When a PAE occurs → a fully associative search in the DHB looks for an entry with a matching page number
On a DHB hit (after the DHB entry has been updated with the most recent delta)
1. The newly updated delta history is used to index the DPT
2. The DHB entry stores the ID of the DPT table used for the prediction (the Last Predictor field)

Proposal_ Offset Prediction Table (OPT)
Table fields: Offset | Delta prediction | Accuracy (1 bit)
· Offset - page offset
· Delta prediction - predicted delta for the second page access
· Accuracy - 1-bit accuracy field
Accuracy bit update
· OPT prediction = delta (match): 1 → 1
· OPT prediction = delta (match): 0 → 1
· OPT prediction ≠ delta (no match): 1 → 0
· OPT prediction ≠ delta (no match): 0 → 0*
* if the accuracy bit was already 0, the old predicted delta is replaced with the new observed delta
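The four update cases for the 1-bit accuracy field can be written out as a small sketch. The class and method names are hypothetical, the table is modelled as an unbounded dictionary, and starting a newly allocated entry with accuracy 0 is an assumption that the slide does not state.

```python
class OffsetPredictionTable:
    def __init__(self):
        # offset -> [predicted_delta, accuracy_bit]
        self.table = {}

    def predict(self, offset):
        """Return the stored delta prediction for this offset, or None."""
        entry = self.table.get(offset)
        return entry[0] if entry is not None else None

    def update(self, offset, observed_delta):
        """Apply the slide's accuracy rule once the real delta is known."""
        entry = self.table.get(offset)
        if entry is None:
            # Assumption: a new entry starts with accuracy 0.
            self.table[offset] = [observed_delta, 0]
        elif entry[0] == observed_delta:
            entry[1] = 1               # match: 1 -> 1 and 0 -> 1
        elif entry[1] == 1:
            entry[1] = 0               # no match: 1 -> 0, prediction kept
        else:
            entry[0] = observed_delta  # no match at 0: replace the prediction


opt = OffsetPredictionTable()
opt.update(5, 1)       # allocate: predict +1
opt.update(5, 1)       # match, accuracy becomes 1
opt.update(5, 3)       # first mismatch, accuracy drops to 0
print(opt.predict(5))  # 1
opt.update(5, 3)       # second mismatch, prediction replaced
print(opt.predict(5))  # 3
```

The accuracy bit acts as one step of hysteresis: a single mismatch does not overwrite a prediction that has just been confirmed.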

Proposal_ Delta Prediction Table (DPT)
A key feature of the DPT → it is not just a single table, but rather a set of cascaded tables
· Deltas - delta history (obtained from the DHB), used as the keys
· Pred - delta predictions, used as the values
· Accuracy - 2-bit accuracy counter
· nMRU - 1-bit nMRU (not-most-recently-used) value

Proposal_ Delta Prediction Table (DPT)
The DPT is updated on a PAE
· any new delta patterns will be allocated in the DPT
· accuracy bits can be updated
· if the prediction accuracy is sufficiently low, the delta prediction field may be updated to reflect the new delta
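A minimal sketch of cascaded delta prediction tables follows. It rests on several assumptions that are not on the slides: three tables keyed by the last 1, 2 and 3 deltas, lookup that prefers the longest matching history, every level being trained on each update, and "sufficiently low" accuracy meaning a counter value of 0. Finite table capacity and the nMRU bit are left out, and all names are hypothetical.

```python
NUM_TABLES = 3  # assumption: the slides do not give the number of tables
ACC_MAX = 3     # largest value of a 2-bit saturating counter


class DeltaPredictionTables:
    def __init__(self, num_tables=NUM_TABLES):
        # tables[i] maps a tuple of the last (i + 1) deltas -> [pred, accuracy]
        self.tables = [{} for _ in range(num_tables)]

    def predict(self, history):
        """Return (delta, level) from the longest matching history, if any."""
        for level in range(min(len(history), len(self.tables)), 0, -1):
            entry = self.tables[level - 1].get(tuple(history[-level:]))
            if entry is not None:
                return entry[0], level
        return None, None

    def update(self, history, next_delta):
        """Train every level whose key length fits the available history."""
        for level in range(1, min(len(history), len(self.tables)) + 1):
            table = self.tables[level - 1]
            key = tuple(history[-level:])
            entry = table.get(key)
            if entry is None:
                table[key] = [next_delta, 0]           # allocate new pattern
            elif entry[0] == next_delta:
                entry[1] = min(ACC_MAX, entry[1] + 1)  # correct prediction
            elif entry[1] == 0:
                entry[0] = next_delta                  # low accuracy: replace
            else:
                entry[1] -= 1                          # wrong prediction


dpt = DeltaPredictionTables()
deltas = [-24, 25, -24, 25, -24, 25]
for i in range(1, len(deltas)):
    dpt.update(deltas[max(0, i - 4):i], deltas[i])
print(dpt.predict([25, -24]))  # (25, 2)
```

After training on the repeating (-24, +25) pattern, the history (+25, -24) hits in the two-delta table and predicts +25. The returned level corresponds to what the DHB would record in its Last Predictor field.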

Proposal_ Multi-Degree Prefetch

Result_ Simulator Parameters

Result_ Performance Evaluation
· VLDP is 17.2% better than FDP
· VLDP is 8.5% better than SBP
· VLDP is 5.8% better than AMPM

Result_ Comparing VLDP to Prefetchers that use the Program Counter
· VLDP has an accuracy of 61%
· GHB has an accuracy of 33%
· VLDP is 7.1% better than GHB PC/DC
· VLDP is 7.6% better than SMS

Result_ Cache Misses and Prefetcher Coverage

Result_ Prefetcher Accuracy and DRAM accesses
DRAM accesses
· FDP has 3.7%
· SMS has 60.5%
· SBP has 22.6%
· GHB has 5.4%
· AMPM has 13.4%
· VLDP has 17.2%

Result_ Sensitivity Analysis