Dynamic Associative Caches:


Dynamic Associative Caches: Reducing Dynamic Energy of First Level Caches
Karthikeyan Dayalan, Meltem Ozsoy and Dmitry Ponomarev
Department of Computer Science, State University of New York at Binghamton
Presented at the 32nd IEEE International Conference on Computer Design (ICCD), October 19-22, 2014

Direct-Mapped Cache
Direct indexing: only one cache way is checked.
Can have high miss rates.
[Figure: the address is split into Tag | Index | Byte Offset; the index selects a single line, and one tag comparison produces HIT / MISS.]
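The address split above can be illustrated with a small Python sketch. The geometry (32 KB capacity, 64-byte lines, hence 512 sets and a 9-bit index) is taken from the methodology slide later in the deck; treating the whole 32 KB as direct-mapped is an assumption for illustration.

```python
# Direct-mapped address decomposition: Tag | Index | Byte Offset.
# Geometry assumed: 32 KB direct-mapped cache with 64-byte lines -> 512 sets.

LINE_SIZE = 64                               # bytes per line -> 6 offset bits
NUM_SETS = 512                               # 32 KB / 64 B   -> 9 index bits

OFFSET_BITS = LINE_SIZE.bit_length() - 1     # 6
INDEX_BITS = NUM_SETS.bit_length() - 1       # 9

def split_address(addr):
    """Return (tag, index, offset) for a direct-mapped lookup."""
    offset = addr & (LINE_SIZE - 1)                  # low 6 bits
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)   # next 9 bits select the line
    tag = addr >> (OFFSET_BITS + INDEX_BITS)         # remaining high bits
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
```

Only the single line selected by `index` is probed; its stored tag is compared against `tag` to produce hit or miss.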

Set-Associative Cache
The index selects a set; all ways in the set are checked in parallel, and at most one holds the data.
Energy-inefficient: the tag and data arrays of every way are read on each access, so energy is wasted in the ways that miss.
[Figure: the address is split into Tag | Index | Byte Offset; one tag comparison per way produces HIT / MISS.]
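The energy inefficiency described above can be made concrete with a toy model (an illustrative sketch, not the paper's hardware): every way's tag is probed on every access, even though at most one can hit. The probe counter stands in for the per-way energy spent.

```python
# Toy set-associative lookup: all ways of the selected set are probed in
# parallel in hardware, so the model charges one tag probe per way per access.

class SetAssocCache:
    def __init__(self, num_sets, num_ways):
        # each set holds (valid, tag) per way; data arrays omitted for brevity
        self.sets = [[(False, None)] * num_ways for _ in range(num_sets)]
        self.tag_probes = 0   # proxy for tag-array energy spent

    def lookup(self, index, tag):
        hit_way = None
        for way, (valid, stored_tag) in enumerate(self.sets[index]):
            self.tag_probes += 1          # every way is checked, hit or not
            if valid and stored_tag == tag:
                hit_way = way
        return hit_way                    # None on miss

cache = SetAssocCache(num_sets=64, num_ways=4)
cache.sets[3][2] = (True, 0xABC)
hit = cache.lookup(3, 0xABC)              # hits in way 2, but probes all 4 ways
```

Even on a hit in one way, `tag_probes` grows by the full associativity, which is the wasted work the DAC design targets.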

Dynamic-Associative Cache (DAC)
Key idea: dynamically switch the cache access mode between Direct-Mapped and Set-Associative.
In Direct-Mapped mode, part of the index is used to directly determine the way to access.
Shadow tags keep track of the performance of each mode and trigger a switch as appropriate.
Cache contents need to be invalidated when switching from Set-Associative to Direct-Mapped mode.

DAC Operation
Set-Associative mode: exactly the same as a traditional set-associative cache.
Direct-Mapped mode: the least significant bits of the tag select the way to be accessed. Imagine stacking the ways of a Set-Associative cache on top of each other to form a Direct-Mapped cache.
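The stacking described above can be sketched as follows (a conceptual model, not the paper's circuitry; the per-way set count of 128 is assumed for illustration). For a 4-way cache, two low-order tag bits pick the way, and conceptually the four ways form one tall direct-mapped array.

```python
# Conceptual model of DAC's Direct-Mapped mode: the low tag bits choose the
# way, so only that way's arrays are accessed. Set count is assumed.

NUM_SETS = 128   # sets per way, assumed for illustration
NUM_WAYS = 4     # 4-way, matching the L1 configuration slide

def dm_way_select(tag):
    """Low tag bits pick the single way to probe in Direct-Mapped mode."""
    return tag & (NUM_WAYS - 1)

def dm_line(tag, index):
    """Line number in the conceptually stacked direct-mapped array."""
    return dm_way_select(tag) * NUM_SETS + index
```

With this mapping, an access touches exactly one of the four ways, which is where the per-access energy saving comes from.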

Direct-Mapped Access in DAC
[Figure: the address is split into Tag | way-select bits (00) | Index | Byte Offset; the way-selection logic uses the low tag bits to read only way 0's tag and data arrays, giving energy savings, and a single comparison produces HIT / MISS.]

Direct-Mapped Access in DAC
[Figure: the same access with way-select bits 11, so way 3 is the only way read.]

When to Switch Modes: Shadow Tags
Shadow tags track the hypothetical cache performance in the other mode.
To reduce complexity, only a few sets are shadowed.
[Figure: a small shadow tag array with its own way-selection logic and tag comparators, indexed by the same Index | Byte Offset split.]
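Set sampling can be sketched very simply. The slides say only that "a few sets are shadowed"; which sets, and the sampling interval, are assumptions here (e.g. every 16th set).

```python
# Illustrative set sampling for shadow tags: only sampled sets carry shadow
# state and contribute to the mode-switch counter. Interval is assumed.

SHADOW_EVERY = 16   # assumed sampling interval, not from the slides

def is_shadowed(index):
    """True if this set has shadow tags and updates the counter."""
    return index % SHADOW_EVERY == 0
```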

DAC Access in Direct-Mapped Mode
Each access probes the cache in Direct-Mapped mode and the shadow tags in Set-Associative mode. The counter tracks the difference in misses between the two modes: it is incremented when the shadow tags hit on a cache miss and decremented in the opposite case. Once the counter reaches its threshold value, a mode transition happens.
[Figure: the address feeds both the cache (Direct-Mapped lookup) and the shadow tags (Set-Associative lookup), with hit/miss outcomes driving COUNTER++ / COUNTER--.]

DAC Access in Set-Associative Mode
Here the cache is probed in Set-Associative mode while the shadow tags model the Direct-Mapped mode, updating the same counter with the roles of the two modes reversed.
[Figure: the address feeds both the cache (Set-Associative lookup) and the shadow tags (Direct-Mapped lookup), with hit/miss outcomes driving COUNTER++ / COUNTER--.]

DAC Design Variations
DAC Budget: only the Direct-Mapped to Set-Associative mode transition is supported; the cache is reset to Direct-Mapped mode on context switches.
DAC Deluxe: both transitions are supported.

DAC Mode Transition
Start in Direct-Mapped mode.
Keep track of the difference between the number of misses in both modes, using the shadow tags to track the other mode.
Periodically compare the counter value against a threshold. If it is exceeded, trigger a mode transition; if not, reset the counter to zero.
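The steps above can be sketched as a small controller. The threshold and evaluation period values are assumptions, not from the slides, and the two-way switching corresponds to the DAC Deluxe variant (DAC Budget would only allow the DM-to-SA direction).

```python
# Minimal sketch of the DAC mode-transition policy: a signed counter tracks
# the miss difference between the current mode and the shadowed mode, and is
# compared against a threshold once per period. Values below are assumed.

THRESHOLD = 64          # assumed switch threshold
PERIOD = 10_000         # assumed accesses per evaluation window

class ModeController:
    def __init__(self):
        self.mode = "DM"    # start in Direct-Mapped mode, per the slides
        self.switch = 0     # miss-difference counter (SWITCH in the flowchart)
        self.accesses = 0

    def record(self, miss_in_current_mode, miss_in_shadow_mode):
        # count how much better the other (shadowed) mode would have done
        if miss_in_current_mode and not miss_in_shadow_mode:
            self.switch += 1
        elif miss_in_shadow_mode and not miss_in_current_mode:
            self.switch -= 1
        self.accesses += 1
        if self.accesses % PERIOD == 0:        # periodic comparison
            if self.switch > THRESHOLD:
                self.mode = "SA" if self.mode == "DM" else "DM"
            self.switch = 0                    # reset counter each period
```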

DAC Mode Transition
[Flowchart: on a context switch, SWITCH = 0 and the cache enters Direct-Mapped mode. On each cache access, a shadow hit does SWITCH++ and a shadow miss does SWITCH--. If SWITCH > Threshold, the cache switches to Set-Associative mode; otherwise it continues, and at each reset period SWITCH = 0.]

Line Invalidations on SA->DM Transition
Data placed while in Set-Associative mode may reside in a way that does not match its tag's way-selection bits. After a transition to Direct-Mapped mode such lines are unreachable and duplicate entries can arise, so lines are invalidated on the switch.
[Figure: a Direct-Mapped lookup missing on a line that was placed in a different way while in Set-Associative mode.]
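The reachability problem above can be sketched as follows. The slides say contents are invalidated on the SA-to-DM switch; this sketch shows a selective variant that keeps only lines already sitting in the way their tag bits select, which is an assumption about the policy (writeback of dirty lines is omitted).

```python
# Hedged sketch: after an SA->DM transition, a line is only reachable if it
# sits in the way chosen by its tag's low bits; all other lines are
# invalidated to avoid unreachable data and duplicate entries.

NUM_WAYS = 4

def invalidate_on_switch(cache_set):
    """cache_set: list of (valid, tag) per way; returns the cleaned set."""
    cleaned = []
    for way, (valid, tag) in enumerate(cache_set):
        reachable = valid and (tag & (NUM_WAYS - 1)) == way
        cleaned.append((valid and reachable, tag))
    return cleaned
```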

Simulation Methodology

Parameter            Configuration
Machine width        8-wide fetch, issue and commit
Window size          128-entry ROB, 48-entry LSQ and Issue Queue
Physical registers   128 integer + 128 FP physical registers
L1 I-Cache           32 KB, 4-way set-associative, 64-byte line, 1-cycle hit time
L1 D-Cache
L2 unified cache     512 KB, 8-way set-associative, 128-byte line, 10-cycle hit time
Memory latency       300 cycles

CACTI Parameters

                Direct-Mapped (32 KB)   4-way Set-Associative   DAC (in DM mode)
Energy/access   0.16618 nJ              0.39985 nJ              0.15457 nJ
Leakage/bank    28.4582 mW              64.5997 mW              19.7505 mW
Ndwl/Ntwl       4/2                     8/2
Ndbl/Ntbl       2/2

Impact on IPC
The performance loss is less than 2%, and DAC Budget stays in Set-Associative mode for a long time.

Impact on Cache Misses
DAC covers about half of the MPKI gap between the Direct-Mapped and Set-Associative caches.

DAC Impact on Energy Consumption
DAC saves 80% of the energy compared to Set-Associative caches.

Percentage of Accesses in Direct-Mapped Mode
DAC Deluxe spends more time in Set-Associative mode, as expected.

Conclusions
It is possible to dynamically change cache associativity and obtain the performance advantages of Set-Associative caches with the energy consumption of Direct-Mapped caches.
DAC saves 80% of the dynamic energy in the L1 cache with a performance loss of less than 2%.
DAC can be implemented using simple control logic and a few extra tags to control the switching between the operating modes.

THANK YOU !! QUESTIONS ??

Backup Slide: Handling Synonyms
If the OS ensures that the least significant bits do not change during address translation, synonyms will not occur.
Another option is to check all the tags of the set. This consumes more power, but the overhead is very small since most of the power is consumed by the data arrays.