Cache Physical Implementation. Panayiotis Charalambous, Xi Research Group.

Presentation transcript:

Cache Physical Implementation. Panayiotis Charalambous, Xi Research Group

Contents: Cache Logical View, Physical View, Case Study (Power 4 L2 Cache)

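Logical Cache Structure: an n-way associative cache has n elements per set and $2^m$ sets. The 32-bit address is split into a tag ($32 - m - k$ bits), an index ($m$ bits), and an offset ($k$ bits). [Figure: the index selects a set, each way's stored tag is compared (=) with the address tag, and the comparator outputs are ORed to produce Hit and select Data.]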
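The address split described on this slide can be sketched as below. This is a minimal illustration, not taken from the slides: the function name `split_address` and the example values m = 7 and k = 5 are assumptions chosen only to show the bit arithmetic.

```python
def split_address(addr, m, k, width=32):
    """Split an address into (tag, index, offset).

    m is the number of index bits (the cache has 2**m sets) and
    k is the number of offset bits; the tag gets the remaining
    width - m - k high-order bits.
    """
    assert 0 <= addr < (1 << width)
    offset = addr & ((1 << k) - 1)          # lowest k bits
    index = (addr >> k) & ((1 << m) - 1)    # next m bits
    tag = addr >> (k + m)                   # remaining high-order bits
    return tag, index, offset


# Hypothetical example: 128 sets (m = 7) and 32-byte blocks (k = 5).
tag, index, offset = split_address(0x12345678, m=7, k=5)
print(hex(tag), index, offset)
```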

Cache Structure

Cache Access
Steps:
1. Decode address
2. Enable the word line
3. Raise the bit lines to high
4. Get the tag value from the tag array
5. Check for tag match
6. Select data output
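The access steps above can be mirrored in a small behavioral model. This is only a software sketch under stated assumptions: the class `SetAssociativeCache` and its `fill` and `read` methods are hypothetical names, and the electrical steps (word line, bit-line precharge) are collapsed into a single list lookup.

```python
class SetAssociativeCache:
    """Behavioral sketch of a set-associative cache read path."""

    def __init__(self, num_sets, ways, block_size):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # Each way of each set holds (valid, tag, data block).
        self.sets = [[(False, 0, None)] * ways for _ in range(num_sets)]

    def decode(self, addr):
        # Step 1: decode the address into tag, index and offset.
        offset = addr % self.block_size
        index = (addr // self.block_size) % self.num_sets
        tag = addr // (self.block_size * self.num_sets)
        return tag, index, offset

    def fill(self, addr, block, way):
        # Helper (not on the slide): place a block into a chosen way.
        tag, index, _ = self.decode(addr)
        self.sets[index][way] = (True, tag, block)

    def read(self, addr):
        tag, index, offset = self.decode(addr)
        # Steps 2-4: selecting the row stands in for enabling the word
        # line, precharging the bit lines and reading out tags and data.
        row = self.sets[index]
        # Step 5: compare every stored tag with the address tag.
        for valid, stored_tag, block in row:
            if valid and stored_tag == tag:
                # Step 6: select the requested byte from the block.
                return True, block[offset]
        return False, None


cache = SetAssociativeCache(num_sets=4, ways=2, block_size=8)
cache.fill(0x40, bytes(range(8)), way=1)
print(cache.read(0x43))  # same block as 0x40
print(cache.read(0x80))  # same set, different tag
```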

Conventional Cache Organization Memory Cell

Read: Set bit and bit´ high. If the value in the cell is 1, then bit´ is discharged. If the value is 0, then bit is discharged. Write: Set bit´ to 0. This forces a 1 into the latch.

Decoder with Driver

Various Components
Comparator: XOR logic.
Multiplexer hierarchy for the offset: first select the block (from the output driver), then the word, then the byte.
Output driver: at most one input bit is high. If an input is 0, the corresponding output is high-impedance. [Figure: inputs I0, I1, …, I7]
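A rough software analogue of these components is sketched below. The function names, the 4-byte word size, and the use of `None` to stand for a high-impedance output are all assumptions made for illustration; the slides do not specify them.

```python
def tags_match(stored_tag, address_tag):
    """XOR comparator: the tags are equal when no bit differs."""
    return (stored_tag ^ address_tag) == 0


def output_driver(select_bits, way_outputs):
    """Pass through the one selected way; None models high impedance."""
    assert sum(select_bits) <= 1, "at most one input may be high"
    for selected, out in zip(select_bits, way_outputs):
        if selected:
            return out
    return None


def select_byte(block, offset, word_size=4):
    """Multiplexer hierarchy: pick the word in the block, then the byte."""
    word_index, byte_index = divmod(offset, word_size)
    word = block[word_index * word_size:(word_index + 1) * word_size]
    return word[byte_index]


blocks = [bytes(range(16)), bytes(range(100, 116))]
hits = [tags_match(0b1011, 0b1001), tags_match(0b1011, 0b1011)]
block = output_driver(hits, blocks)   # second way hits
print(select_byte(block, 6))
```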

Banking
Idea: support multiple simultaneous cache accesses.
Solutions: use multiporting on the bit cells (the cost is high), or divide the cache into independent banks.

Cache Search
Steps:
1. Find the bank (bank index)
2. Find the set in the bank (index)
3. Check that the data is valid and in the cache (tag match)
4. If all is OK, return the data (block and byte offset); otherwise check the lower-level memory
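The search steps above can be sketched as a lookup over a list of banks. The field order is an assumption: the slides do not say whether the bank index sits below or above the set index, and this sketch places it in the low-order index bits. The name `banked_lookup` and the data layout are likewise hypothetical.

```python
def banked_lookup(banks, addr, offset_bits, bank_bits, set_bits):
    """Look up addr in banks[bank][set] -> list of (valid, tag, block).

    Assumed address layout, high to low bits:
    tag | set index | bank index | offset
    """
    offset = addr & ((1 << offset_bits) - 1)
    bank_index = (addr >> offset_bits) & ((1 << bank_bits) - 1)      # step 1
    set_index = (addr >> (offset_bits + bank_bits)) & ((1 << set_bits) - 1)  # step 2
    tag = addr >> (offset_bits + bank_bits + set_bits)
    for valid, stored_tag, block in banks[bank_index][set_index]:    # step 3
        if valid and stored_tag == tag:
            return block[offset]                                      # step 4: hit
    return None  # miss: the caller would check the lower-level memory


# Two banks, two sets per bank, one way per set, 4-byte blocks.
banks = [[[(False, 0, None)] for _ in range(2)] for _ in range(2)]
banks[0][1] = [(True, 11, [10, 20, 30, 40])]

addr = (11 << 4) | (1 << 3) | (0 << 2) | 1   # tag 11, set 1, bank 0, offset 1
print(banked_lookup(banks, addr, offset_bits=2, bank_bits=1, set_bits=1))
```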

Case Study: Power 4
Dual-core, 64-bit processors
L1 D-cache (per processor): 32 KB, 2-way associative, 128-byte line
L1 I-cache (per processor): 64 KB, direct mapped, 128-byte line (4 sectors x 32 B)
L2 cache: ~1.5 MB, 8-way set associative, 128-byte line

Power4 Floorplan

Power4 L2 Logical View: the cache is split into 3 parts of 0.5 MB each, controlled by 4 coherency processors, with one 64 B store queue per processor.

Power4 L2U: ~512 KB, 8 banks, 128 B block size, 8-way associative. [Figure labels: word lines, bit lines, decoders, address bus]

Power4 L2
Cache block size: $C = 512\ \text{KB} = 2^{19}\ \text{B}$
Block size: $128\ \text{B} = 2^{7}\ \text{B}$
8-way associative
8 banks per cache block
Therefore:
Set size is $2^{3} \cdot 2^{7}\ \text{B} = 2^{10}\ \text{B}$
Sets in the cache: $2^{19} / 2^{10} = 2^{9}$ sets
Sets per bank: $2^{9} / 2^{3} = 2^{6}$ sets
[Figure: the 64-bit address is split into tag, index, and offset; the index is further split into bank index and set index.]
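The arithmetic on this slide can be checked with a few lines of Python. The function name `l2_geometry` is hypothetical, and the tag width printed at the end assumes that all 64 address bits take part in the lookup, which the slides do not state.

```python
def l2_geometry(capacity, block_size, ways, banks, address_bits=64):
    """Derive set counts and address field widths for a banked cache.

    All sizes are in bytes and are assumed to be powers of two.
    """
    set_size = ways * block_size            # bytes held by one set
    sets = capacity // set_size             # sets in the whole cache
    sets_per_bank = sets // banks
    offset_bits = block_size.bit_length() - 1
    bank_bits = banks.bit_length() - 1
    set_bits = sets_per_bank.bit_length() - 1
    # Assumption: every remaining address bit is stored as tag.
    tag_bits = address_bits - offset_bits - bank_bits - set_bits
    return {
        "set_size": set_size,
        "sets": sets,
        "sets_per_bank": sets_per_bank,
        "offset_bits": offset_bits,
        "bank_bits": bank_bits,
        "set_bits": set_bits,
        "tag_bits": tag_bits,
    }


# One 512 KB cache block: 128 B lines, 8 ways, 8 banks.
for name, value in l2_geometry(2**19, 128, 8, 8).items():
    print(name, value)
```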

Power4: CACTI Results
Cache parameters common to both runs: associativity 8; block size 128 bytes; read/write ports 1; read ports 0; write ports 0; technology size 0.80 um; Vdd 4.5 V.
Run 1: number of subbanks 8; number of sets 64.
Run 2: number of subbanks 16; number of sets 32.
Best organization in both runs: Ndwl (L1) 16; Ndbl (L1) 1; Nspd (L1) 1; Ntwl (L1) 1; Ntbl (L1) 1; Ntspd (L1) 1; NOR inputs (data) 2; NOR inputs (tag) 2.
Reported fields whose values are not preserved here: total cache size; size in bytes of subbank; access time (ns); cycle time, wave pipelined (ns); total power, all banks (nJ); total power without routing (nJ); total routing power (nJ); maximum bank power (nJ).

CACTI
Data array:
Ndwl: word line split factor
Ndbl: bit line split factor
Nspd: number of sets mapped to a single word line (sectors)
Tag array:
Ntwl: word line split factor
Ntbl: bit line split factor
Ntspd: number of sets mapped to a single word line (sectors)
Increasing Ndbl, Nspd, Ntbl, or Ntspd requires more sense amplifiers.
Increasing Ndwl or Ntwl increases the number of word line drivers.
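To make the data-array parameters concrete, the sketch below computes the shape of one data subarray. It assumes the subarray relations commonly given for the CACTI model (rows = C / (B * A * Ndbl * Nspd), columns = 8 * B * A * Nspd / Ndwl); these formulas do not appear on the slides, and the function name is hypothetical. The example uses a 64-set, 8-way, 128-byte-block subbank with Ndwl = 16, Ndbl = 1, Nspd = 1.

```python
def data_subarray_shape(capacity, block_size, ways, ndwl, ndbl, nspd):
    """Return (rows, columns) of one data subarray.

    Assumed relations (not from the slides):
      rows    = C / (B * A * Ndbl * Nspd)
      columns = 8 * B * A * Nspd / Ndwl
    capacity and block_size are in bytes; columns are counted in bits.
    """
    rows = capacity // (block_size * ways * ndbl * nspd)
    columns = (8 * block_size * ways * nspd) // ndwl
    return rows, columns


# One subbank: 64 sets * 8 ways * 128 B = 64 KB.
subbank_bytes = 64 * 8 * 128
rows, columns = data_subarray_shape(subbank_bytes, 128, 8, ndwl=16, ndbl=1, nspd=1)
print(rows, columns, "subarrays:", 16 * 1)
```

Under these assumed relations, splitting the word lines 16 ways shortens each word line without changing the number of rows, which is consistent with the slide's remark that a larger Ndwl mainly costs additional word line drivers.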

Thank You