Exam 2 Review


Exam 2 Review
- Two's Complement Arithmetic
- Ripple-carry ALU logic and performance
- Look-ahead techniques, performance and equations
- Basic multiplication and division (non-restoring) algorithms
- IEEE 754 floating point standard (definition provided)
- Write a sequence of register transfers to implement a given instruction for MIPS
- Given a set of register transfers, design the control needed for some component

Cache Design Issues
1. Where can a word or block of words be placed in the cache?
2. How can a word be found if it is in the cache?
3. Which word or block of words should be replaced on a cache miss?
4. When do we write the main memory? (Remember: READS dominate WRITES.)

Example: a 1K-word main memory (10-bit addresses, 10-bit words) and a 16-entry cache (4-bit index).

1. Where can a word be placed in the cache? Use the last 4 bits of the address as the cache index.
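The address split above can be sketched in a few lines of Python (an illustrative sketch, not from the slides):

```python
def split_address(addr):
    """Split a 10-bit word address into a 6-bit tag and a 4-bit index."""
    index = addr & 0xF   # last 4 bits select one of the 16 cache entries
    tag = addr >> 4      # remaining 6 bits distinguish the 64 words per entry
    return tag, index

# Example: address 0b1011010110 -> tag 0b101101 (45), index 0b0110 (6)
print(split_address(0b1011010110))
```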

2. How can a word be found if it is in the cache? Each cache entry could hold any of 2^6 = 64 different words, so we need to know the rest of the address: save it in the cache.

Each cache entry now holds the 10-bit data word and a 6-bit tag (the upper 6 bits of the address). A Valid bit is also needed to indicate that the entry holds valid data.

3. Which word should be replaced on a cache miss? The one with the same index.

Hit if: the location has been accessed, and no other location with the same index has been accessed since then.

Temporal locality: the most recently accessed word is the one kept. Spatial locality: a word will not be replaced until an access occurs beyond the size of the cache.

The larger the cache, the lower the miss rate and the lower the average access time (it approaches the hit time).

4. When do we write the main memory? As soon as the cache is written. This is called Write-Through.

Direct Mapped General Structure: an n-bit byte address splits into a Tag (n-k-2 bits), a k-bit Index, and a 2-bit Byte Offset. The cache holds 2^k words; each entry stores a Valid bit, the Tag, and the data word (n bits).

Example: a 32-bit address and a data cache of 2^14 words, so k = 14. Each cache entry is 16 (tag) + 1 (valid) + 32 (data) = 49 bits wide; 49/32 = 1.53, so the cache stores about 53% more bits than the data alone.
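A quick check of the arithmetic above, using the example's numbers:

```python
n, k = 32, 14                    # address bits and index bits from the example
tag_bits = n - k - 2             # the 2-bit byte offset leaves 16 tag bits
entry_bits = tag_bits + 1 + 32   # tag + valid bit + 32-bit data word
print(tag_bits, entry_bits, entry_bits / 32)  # 16 49 1.53125
```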

Figure: a 16K-entry direct-mapped cache. The 32-bit address splits into a 16-bit Tag, a 14-bit Index, and a 2-bit Byte Offset. The Index selects an entry; the stored Tag is compared (=) with the address Tag and combined with the Valid bit to produce Hit, and the entry's 32-bit Data word is read out.

Cache Control for MIPS Lite. Notation: CacheMemory(Field)[Address]. The data field addressed by the PC index is CM(31-0)[PC(15-2)]; the tag field addressed by the PC index is CM(47-32)[PC(15-2)]; the valid bit is CM(48).

Instruction Fetch, assuming a main memory access takes 5 clock cycles.

Was:
S0: M[PC] → IR, PC+4 → PC, → S1

Becomes:
S0: CM(31-0)[PC(15-2)] → IR, PC+4 → PC, → Hit·S1 + Hit'·S10
S10: PC - 4 → PC, → S11
S11: MM[PC] → MMOut, → S12
S12: → S13
S13: → S14
S14: → S15
S15: → S16
S16: MMOut → CM(31-0)[PC(15-2)], PC(31-16) → CM(47-32)[PC(15-2)], 1 → CM(48)[PC(15-2)], → S0

On a miss (Hit'), S10 undoes the PC increment so the fetch can be retried from S0 after the cache entry is filled.

Cache Control for MIPS Lite, instruction fetch with a 5-cycle main memory access:
Hit = Valid[Index] AND (Cache Tag[Index] = Addr Tag), for the cache addressed by PC(15-2).
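The whole direct-mapped lookup can be modeled as a small sketch (hypothetical structure, not the slides' hardware; the class and method names are illustrative):

```python
class DirectMappedCache:
    """Hit = Valid[index] AND (cache_tag[index] == addr_tag)."""

    def __init__(self, index_bits, offset_bits=2):
        self.index_bits = index_bits
        self.offset_bits = offset_bits
        size = 1 << index_bits
        self.valid = [False] * size
        self.tag = [0] * size
        self.data = [0] * size

    def _split(self, addr):
        index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index

    def read(self, addr, memory):
        tag, index = self._split(addr)
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index], True           # hit
        word = memory[addr]                         # miss: fetch from memory
        self.valid[index] = True                    # fill the entry
        self.tag[index] = tag
        self.data[index] = word
        return word, False

# Usage: the first access misses and fills the entry, the second hits.
mem = {0x100: 7}
cache = DirectMappedCache(index_bits=4)
print(cache.read(0x100, mem))  # (7, False)  miss
print(cache.read(0x100, mem))  # (7, True)   hit
```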

Figure: the same 16K-entry cache on a WRITE: the word is written to both the cache entry (Write Cache) and main memory (Write Main).

Memory Write, assuming a main memory access takes 5 clock cycles.

Was:
S5: B → M[ALUOut], → S0

Becomes:
S5: B → CM(31-0)[ALUOut(15-2)], ALUOut(31-16) → CM(47-32)[ALUOut(15-2)], 1 → CM(48)[ALUOut(15-2)], B → MM[ALUOut], → S17
S17: → S18
S18: → S19
S19: → S20
S20: → S0

DECStation 3100: the processor has separate Instruction and Data caches, each direct mapped with a 14-bit index (16K entries of one 32-bit data word, 64 KB of data each), backed by main memory.

DECStation 3100 miss rates (read misses only):

Program   Instruction Miss Rate   Data Miss Rate   Effective Miss Rate
gcc       6.1%                    2.1%             5.4%
spice     1.2%                    1.3%             1.2%

The effective miss rate is the weighted average over all accesses. The instruction miss rate is not always less than the data miss rate, and miss rates clearly depend on the program. A direct-mapped cache with one-word blocks is effective.

For these miss rates, a direct-mapped one-word cache is effective:
Average Memory Access Time = Hit Time + Miss Rate × Miss Penalty
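Plugging in gcc's 5.4% effective miss rate, with an assumed 1-cycle hit time and the 5-cycle main memory access used elsewhere in these slides as the miss penalty:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty."""
    return hit_time + miss_rate * miss_penalty

# gcc: 1 + 0.054 * 5 = about 1.27 cycles per access on average
print(round(amat(1, 0.054, 5), 2))  # 1.27
```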

What if the instruction and data caches were combined into one large cache?

Program   Instruction   Data   Effective   Combined
gcc       6.1%          2.1%   5.4%        4.8%
spice     1.2%          1.3%   1.2%        n/a

One large cache has a lower miss rate than two half-size caches, but split caches can double the bandwidth through simultaneous access (pipelining).

Write-Through Performance Improvement: every write updates both the cache and main memory, and writes can be 10% to 15% of instructions.

Consider a Write Buffer between the processor (and its cache) and main memory. Each buffer entry holds a Valid bit, an Address, and Data. The memory controller writes data from the buffer to main memory and then releases the buffer entry.

The processor writes the cache and the buffer, and continues until:
1. The Write Buffer is full (write stalls: HOLD the processor), or
2. A read miss occurs: wait until the Write Buffer is empty.
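The policy above can be sketched as a small model (hypothetical names and structure; the buffer capacity and drain behavior are illustrative assumptions):

```python
from collections import deque

class WriteBuffer:
    """Write-through buffer: writes queue up; memory drains them later."""

    def __init__(self, capacity, memory):
        self.entries = deque()   # (address, data) pairs awaiting main memory
        self.capacity = capacity
        self.memory = memory     # dict standing in for main memory

    def write(self, addr, data):
        if len(self.entries) == self.capacity:
            self.drain()         # buffer full: the processor must HOLD
        self.entries.append((addr, data))

    def drain(self):
        # The memory controller empties the buffer into main memory.
        while self.entries:
            addr, data = self.entries.popleft()
            self.memory[addr] = data

    def read_miss(self, addr):
        self.drain()             # wait until the write buffer is empty
        return self.memory[addr]

# Usage: with capacity 2, a third write forces a drain before it can queue.
main = {}
buf = WriteBuffer(2, main)
buf.write(0x10, 1)
buf.write(0x14, 2)
buf.write(0x18, 3)           # buffer full: first two entries drain to memory
print(main)                  # {16: 1, 20: 2}
print(buf.read_miss(0x18))   # drains the rest, then reads: 3
```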