Uppsala Architecture Research Team Mommy, mommy! I want a hardware cache with few conflicts and low power consumption that is easy to implement! But... that's three wishes in one!!!

Uppsala Architecture Research Team Refinement and Evaluation of the Elbow Cache, or: The Little Cache That Could. Mathias Spjuth

Uppsala Architecture Research Team [Animated diagram: memory references A through H from the address space map into the sets of a 2-way set-associative cache; several references index the same set, so later references evict earlier ones.]

Uppsala Architecture Research Team Conflicts (cont.)
The traditional way of reducing conflicts is to use set-associative caches.
++ Lower miss rate (than direct-mapped)
-- Slower access
-- More complexity (uses more chip area)
-- Higher power consumption

Uppsala Architecture Research Team [Animated diagram: the same references A through H in a 2-way skewed associative cache; each of the two cache banks indexes the references with a different function, so blocks that collide in bank 1 can still be placed in bank 2.]

Uppsala Architecture Research Team [Animated diagram, final state: all references A through H fit in the 2-way skewed associative cache. No conflicts!]

Uppsala Architecture Research Team Skewed associative caches
Use different hashing (skewing) functions for indexing each cache bank.
++ Lower miss rate (than set-associative)
++ More predictable
-- Slightly slower (hashing)
-- "Cannot" use LRU replacement
-- "Cannot" use VI-PT (virtually indexed, physically tagged) addressing
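The per-bank indexing can be sketched as follows. The XOR-of-address-fields skew, the set count, and the block size below are illustrative assumptions, not the actual functions from the talk:

```python
def bank_indexes(addr, n_sets=256, block_bytes=32):
    """Index one address into the two banks of a 2-way skewed cache.

    The XOR-based skew for bank 2 is a common choice, used here as a
    stand-in for the real skewing functions.
    """
    block = addr // block_bytes            # strip the block offset
    low = block % n_sets                   # conventional index field
    high = (block // n_sets) % n_sets      # next field of the address
    return low, low ^ high                 # bank 1 plain, bank 2 skewed
```

Because each bank sees a different index for the same address, two blocks that collide in one bank usually do not collide in the other, which is the property the conflict-free example above relies on.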

Uppsala Architecture Research Team Elbow Cache
Improves the performance of a skewed associative cache by reallocating blocks within the cache. Doing so gives a broader choice of which block to evict as the victim. Timestamps are used as the replacement metric.

Uppsala Architecture Research Team Finding the victim
Two methods:
1. Look-ahead: consider all possible placements before the first reallocation is made.
2. Feedback: only consider the immediate placements, then iterate.

Uppsala Architecture Research Team [Animated diagram: a 2-way elbow cache with look-ahead. A new reference X arrives after A through H; the candidate replacement paths F-B-A and E-D-H are evaluated before the reallocation is made.]

Uppsala Architecture Research Team [Animated diagram: a 2-way elbow cache with feedback. The block displaced by the new reference X is held in a temporary register and fed back into its slot in the other bank, iterating until a free slot is found or the step limit is reached.]

Uppsala Architecture Research Team Finding the victim (cont.)
Look-ahead:
++ Closest to optimal
-- Difficult to implement (more than one transformation)
Feedback:
++ Easy to implement (feed the victim back to the write buffer)
-- Needs extra space in the write buffer
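A minimal sketch of the feedback iteration, modeled as cuckoo-style displacement. The function name, the dictionary cache model, and the blind "push the occupant to the other bank" choice are all simplifying assumptions; the real design picks which block to displace using the timestamp metric.

```python
def elbow_insert(cache, new_block, index_fns, max_steps=7):
    """Feedback-style insertion sketch for a 2-way elbow cache.

    cache:     {bank: {set_index: block}}
    index_fns: one index function per bank.
    A displaced block is pushed to its slot in the other bank; after
    max_steps reallocations the last displaced block becomes the victim
    (in hardware it would sit in the write buffer meanwhile).
    """
    block, bank = new_block, 0
    for _ in range(max_steps + 1):
        idx = index_fns[bank](block)
        evicted, cache[bank][idx] = cache[bank].get(idx), block
        if evicted is None:
            return None                    # free slot found, no victim
        block, bank = evicted, 1 - bank    # feed the victim back, other bank
    return block                           # final victim, written back
```

With two one-set banks where everything collides, a third insertion walks the chain and returns the block inserted first, mirroring how the write buffer absorbs the final victim.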

Uppsala Architecture Research Team Replacement Metrics
Enhanced Not-Recently-Used (NRUE): the best replacement policy known so far for skewed caches. Each block carries two extra bits, a recently-used and a very-recently-used bit, that are set on access to the block. Both bits are cleared at regular intervals; the very-recently-used bit is cleared more often. To pick a victim: first try to find a block with no bit set, then one with only the recently-used bit set, then fall back to random replacement.
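The NRUE selection order can be written out as below; the bit names and the list-of-ways representation are illustrative, not the thesis's actual structures:

```python
import random

def nrue_victim(ways):
    """Pick a victim way under NRUE.

    ways: list of dicts with 'ru' (recently-used) and 'vru'
    (very-recently-used) bits, set on access and cleared periodically.
    """
    # 1. Prefer a way with neither usage bit set.
    for i, w in enumerate(ways):
        if not w["ru"] and not w["vru"]:
            return i
    # 2. Then a way with only the recently-used bit set.
    for i, w in enumerate(ways):
        if w["ru"] and not w["vru"]:
            return i
    # 3. Every way was very recently used: random replacement.
    return random.randrange(len(ways))
```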

Uppsala Architecture Research Team Timestamps
Each cache block stores, next to its data, a timestamp (T_A for block A) copied from a global counter T_curr. The counter is increased on every cache allocation.
Dist(A) = T_curr − T_A             if T_curr ≥ T_A
Dist(A) = T_max − T_A + T_curr     if T_curr < T_A (the counter has wrapped)

Uppsala Architecture Research Team Timestamps [Diagram: timestamps on a counter axis from 0 to T_max. Depending on where T_A lies relative to T_curr and T_B, either Dist(A) > Dist(B), so A is older than B, or Dist(A) < Dist(B), so B is older than A.]
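The distance computation on the wrapping counter can be sketched as follows; T_MAX = 32 matches the 5-bit timestamps in the evaluated configurations:

```python
T_MAX = 32  # a 5-bit timestamp counter wraps at 32

def dist(t_curr, t_a):
    """Age of a block stamped at t_a, given the current counter t_curr.

    Equivalent to the modular difference (t_curr - t_a) % T_MAX.
    """
    if t_curr >= t_a:
        return t_curr - t_a
    return T_MAX - t_a + t_curr  # the counter wrapped after the stamp
```

The block whose timestamp gives the largest distance is the oldest, so it is the preferred victim.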

Uppsala Architecture Research Team Implementation
Look-ahead:
At most one transformation (4 possible victims) per replacement.
The transformation and the load of the new data are done at the same time.

Uppsala Architecture Research Team Implementation
Feedback:
Up to 7 transformations (max. 8 possible victims) per replacement.
Temporary victims are moved to the write buffer before reallocation.
An extra control field is needed in the write buffer.

Uppsala Architecture Research Team Feedback [Block diagram: the two cache banks (data+tag and timestamp arrays), a write buffer whose entries are extended with bank-index and control fields to hold temporary victims, and the read/write paths to memory.]

Uppsala Architecture Research Team Test Configurations
Set associative: 2-way, 4-way, 8-way, 16-way
Fully associative cache
Skewed associative, LRU
Skewed associative, NRUE
Skewed associative, 5-bit timestamp
Elbow cache, 1-step lookahead, 5-bit timestamp
Elbow cache, 7-step feedback, 5-bit timestamp

Uppsala Architecture Research Team Test Configurations (2)
General configuration:
8 KB, 16 KB, 32 KB cache size
L1 data cache with 32-byte block size
Write back, no allocate on write, infinite write buffer (all writes ignored)
Miss Rate Reduction (MRR): MRR = (MR_ref − MR) / MR_ref
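The MRR metric is simple enough to state as code (a sketch; MR_ref is whichever baseline configuration a comparison is made against):

```python
def miss_rate_reduction(mr_ref, mr):
    """Fractional miss-rate reduction relative to a reference config.

    Positive values mean fewer misses than the reference; e.g. a drop
    from a 10% to an 8% miss rate is an MRR of 0.2 (20%).
    """
    return (mr_ref - mr) / mr_ref
```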

Uppsala Architecture Research Team [Results graphs not captured in the transcript.]

Conclusions I. For a 2-way skewed cache, timestamp replacement gives almost the same performance as LRU. II. Timestamps are useful. III. A 2-way elbow cache has roughly the same performance as an 8-way set associative cache of the same size.

Uppsala Architecture Research Team Conclusions (2) IV. The lookahead design is slightly better than the feedback. V. There are drawbacks with all skewed caches (skewing delays, VI-PT). VI. If the problems can be solved, the elbow cache is a good alternative to set associative caches.

Uppsala Architecture Research Team Future Work
Power awareness: how does an elbow cache stand up against traditional set associative caches when power consumption is considered?

Uppsala Architecture Research Team Links UART web:

Uppsala Architecture Research Team ?