1 Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders. ISCA 2006, IEEE. By Chuanjun Zhang. Speaker: Wei Zeng

2 Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

3 Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

4 Background • Bottleneck to achieving high performance: the growing gap between memory latency and processor speed • Multilevel memory hierarchy: the cache acts as an intermediary between the fast processor and the much slower main memory • Two cache mapping schemes: direct-mapped cache and set-associative cache
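
A minimal sketch (not from the paper) of how the two mapping schemes turn a byte address into a set index. The geometry matches the talk's 16 kB / 32-byte-line baseline; the helper name is mine.

```python
# Hypothetical helper: set selection for a given associativity.
# Parameters follow the talk's baseline (16 kB cache, 32-byte lines).

CACHE_BYTES = 16 * 1024
LINE_BYTES = 32

def set_index(addr: int, ways: int) -> int:
    """Return the set a byte address maps to."""
    num_sets = CACHE_BYTES // (LINE_BYTES * ways)
    block = addr // LINE_BYTES        # strip the block offset
    return block % num_sets           # low-order block bits pick the set

# Direct-mapped (ways=1): 512 sets, exactly one candidate line per address.
# 2-way (ways=2): 256 sets, but two lines per set, so two addresses that
# collide on the index can still coexist.
print(set_index(0x0000, 1), set_index(0x4000, 1))  # 0 0 -> they evict each other
print(set_index(0x0000, 2), set_index(0x4000, 2))  # 0 0 -> both fit in the set
```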

5 Comparison • Direct-mapped cache: faster access time, consumes less power per access, consumes less area, easy to implement, simple to design, but higher miss rate • Set-associative cache: longer access time, consumes more power per access, consumes more area, but reduces conflict misses and has a replacement policy • Desirable cache: the access time of a direct-mapped cache + the low miss rate of a set-associative cache

6 What is B-Cache? Balanced Cache (B-Cache): A mechanism to provide the benefit of cache block replacement while maintaining the constant access time of a direct-mapped cache

7 New features of the B-Cache • The decoder length of the direct-mapped cache is increased by 3 bits, so accesses to heavily used sets can be reduced to 1/8th of the original design • A replacement policy is added • A programmable decoder is used

8 The problem (an example, figure): with 8-bit addresses, the access pattern 0, 1, 8, 9 repeats, and the conflicting addresses repeatedly evict each other in a direct-mapped cache
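
A toy rerun of the slide's trace, assuming (as the figure suggests) a tiny cache where addresses 0/8 and 1/9 share a set: the direct-mapped cache thrashes on every access, while a 2-way LRU cache takes only the four cold misses.

```python
# Miss counts for the repeating trace 0,1,8,9 under an assumed geometry:
# one-byte blocks, 8 lines total, i.e. 8 direct-mapped sets or 4 2-way sets.

from collections import OrderedDict

def misses_direct(trace, num_sets=8):
    line = [None] * num_sets
    misses = 0
    for a in trace:
        if line[a % num_sets] != a:
            misses += 1
            line[a % num_sets] = a    # the new block evicts the old one
    return misses

def misses_2way_lru(trace, num_sets=4):
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for a in trace:
        s = sets[a % num_sets]
        if a in s:
            s.move_to_end(a)           # refresh LRU position
        else:
            misses += 1
            if len(s) == 2:
                s.popitem(last=False)  # evict the least recently used way
            s[a] = True
    return misses

trace = [0, 1, 8, 9] * 25
print(misses_direct(trace))    # 100 -> every access is a conflict miss
print(misses_2way_lru(trace))  # 4   -> only the cold misses remain
```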

9 B-Cache solution (figure): with the same 8-bit address trace, the B-Cache's miss behavior matches a 2-way cache; X marks an invalid PD entry
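
A functional sketch of the B-Cache lookup as I read it from slides 7-12: the non-programmable index bits (NPI) select a group of BAS sets, the programmable bits (PI) are CAM-matched against that group's decoder entries, and a PD miss reprograms one entry chosen by the replacement policy. Class and field names are mine; the real design selects a single set in the decoder and then compares the stored tag, so the data array is still read once, as in a direct-mapped cache.

```python
# Assumed parameters from slide 12: OI=9, NPI=6, PI=6, BAS=8.
import random

LINE_BYTES = 32
NPI_BITS, PI_BITS, BAS = 6, 6, 8

class BCache:
    def __init__(self):
        # One group per non-programmable index; each of its BAS sets carries
        # a programmable 6-bit decoder entry (CAM) plus an address tag.
        self.pd  = [[None] * BAS for _ in range(2 ** NPI_BITS)]
        self.tag = [[None] * BAS for _ in range(2 ** NPI_BITS)]

    def access(self, addr: int) -> str:
        block = addr // LINE_BYTES
        npi = block % (2 ** NPI_BITS)                # normal decoder
        pi  = (block >> NPI_BITS) % (2 ** PI_BITS)   # CAM-matched bits
        tag = block >> (NPI_BITS + PI_BITS)
        for way in range(BAS):                       # decoder-time CAM match
            if self.pd[npi][way] == pi and self.tag[npi][way] == tag:
                return "hit"
        victim = random.randrange(BAS)               # random replacement
        self.pd[npi][victim] = pi                    # reprogram the decoder
        self.tag[npi][victim] = tag
        return "miss"
```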

10 Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

11 Terminology • Memory address mapping factor (MF) • B-Cache associativity (BAS) • PI: index length of the PD (programmable decoder); NPI: index length of the NPD (non-programmable decoder); OI: index length of the original direct-mapped cache • MF = 2^(PI+NPI) / 2^OI, where MF ≥ 1 • BAS = 2^OI / 2^NPI, where BAS ≥ 1

12 B-Cache organization (figure): MF = 2^(PI+NPI) / 2^OI = 2^(6+6) / 2^9 = 8; BAS = 2^OI / 2^NPI = 2^9 / 2^6 = 2^3 = 8
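
The two terminology formulas as executable checks, plugged with slide 12's values (PI = 6, NPI = 6, OI = 9); the function names are mine.

```python
def mapping_factor(pi, npi, oi):
    return 2 ** (pi + npi) // 2 ** oi    # MF = 2^(PI+NPI) / 2^OI

def b_cache_associativity(oi, npi):
    return 2 ** oi // 2 ** npi           # BAS = 2^OI / 2^NPI

print(mapping_factor(6, 6, 9))           # 8 -> index space is 8x the set count
print(b_cache_associativity(9, 6))       # 8 -> 8 candidate sets per address
```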

13 Replacement policy • Random policy: simple to design and needs very little extra hardware • Least Recently Used (LRU): better hit rate but more area overhead
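
A quick contrast of the two victim-selection options on a PD miss (hypothetical helpers, not the paper's hardware): random needs almost no state, while LRU must track per-entry usage, which is where the extra area goes.

```python
import random

def victim_random(ways=8):
    return random.randrange(ways)        # ~no state; an LFSR suffices in hardware

def victim_lru(last_use, ways=8):
    # last_use[i] = time of entry i's most recent access; maintaining these
    # counters is the area overhead the slide attributes to LRU.
    return min(range(ways), key=last_use.__getitem__)
```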

14 Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

15 Experimental methodology • Primary metric: miss rate • Other metrics: latency, storage, power costs, overall performance, overall energy • Baseline: level-one cache (direct-mapped 16 kB cache with 32-byte line size for instruction and data caches) • 26 SPEC2K benchmarks run using the SimpleScalar tool set

16 Data miss-rate reductions (figure): compares a 16-entry victim buffer, set-associative caches, and B-Caches with different MFs

17 Latency

18 Storage overhead • The additional hardware for the B-Cache is the CAM-based PD • Storage is 4.3% higher than the baseline

19 Power overhead • Extra power consumption: the PD of each subarray • Power reductions: 3-bit data length reduction, and removal of the 3-input NAND gates • Overall, 10.5% higher than the baseline

20 Overall performance • Outperforms the baseline by an average of 5.9% (IPC) • Only 0.3% less than an 8-way cache, but 3.7% higher than a victim buffer

21 Overall energy • The B-Cache consumes the least energy (2% less than the baseline) • The B-Cache reduces the miss rate and hence accesses to the second-level cache, which are more power-costly • On a cache miss, the B-Cache also reduces cache memory accesses through the PD's miss prediction, which greatly lowers the power overhead

22 Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

23 Related work • Reducing the miss rate of direct-mapped caches: page allocation, column-associative cache, adaptive group-associative cache, skewed-associative cache • Reducing the access time of set-associative caches: partial address matching (predicting the hit way), difference-bit cache

24 Compared with previous techniques, the B-Cache is • applicable to both high-performance and low-power embedded systems • balanced without software intervention • feasible and easy to implement

25 Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

26 Conclusion • The B-Cache balances accesses to cache sets by increasing the decoder length and incorporating a replacement policy into a direct-mapped cache design • Programmable decoders dynamically determine which memory addresses map to each cache set • A 16 kB level-one B-Cache reduces the miss rate of a direct-mapped cache by 64.5% for the instruction cache and 37.8% for the data cache • Average IPC improvement: 5.9% • Energy reduction: 2% • Access time: same as a direct-mapped cache

27 Thanks!