Cache Organization and Performance Evaluation Vittorio Zaccaria.

Exercise 1
How many total bits are required for a direct-mapped instruction cache with 64 KB of data and one-word blocks, assuming a 32-bit address?
1 word = 4 bytes
No. of blocks = 64 KB / 4 B = 2^14 blocks
Tag bits = 32 − 14 (index) − 2 (offset) = 16
Size = [16 (tag) + 1 (valid bit) + 4 (block size) × 8] × 2^14 = 802,816 bits
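The arithmetic can be replayed in a short Python sketch (variable names are mine, not from the slides):

```python
# Exercise 1 check: total storage bits of a direct-mapped cache with
# 64 KB of data, one-word (4-byte) blocks, and 32-bit addresses.
data_bytes = 64 * 1024
block_bytes = 4
num_blocks = data_bytes // block_bytes       # 2**14 = 16384 blocks
offset_bits = 2                              # log2(4-byte block)
index_bits = 14                              # log2(16384 blocks)
tag_bits = 32 - index_bits - offset_bits     # 16
bits_per_block = tag_bits + 1 + block_bytes * 8   # tag + valid + data
total_bits = bits_per_block * num_blocks
print(total_bits)  # 802816
```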

Exercise 2: DM cache, 64 blocks × 32 bytes
Assuming byte addressing and 32-bit addresses: How many bits are there in each of the tag, index, and offset fields of the address? How many total bytes of data can be stored in the cache? How many bytes of memory does the cache use (including tags, valid bits, and data)? How many memory blocks map to the same cache block? If the cache is loaded with random blocks, what is the probability of a tag match for a given address?
Index = 6 bits, offset = 5 bits, tag = 21 bits
Data = 2 KB
Storage = (21 + 1 [valid]) × 64 / 8 + 32 × 64 = 2224 bytes
2^21 memory blocks map to each cache block
Probability of a match = 1/2^21
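The same breakdown in Python (a sketch; variable names are mine):

```python
# Exercise 2 check: 64-block direct-mapped cache, 32-byte blocks,
# byte addressing, 32-bit addresses.
num_blocks, block_bytes = 64, 32
offset_bits = 5                                  # log2(32)
index_bits = 6                                   # log2(64)
tag_bits = 32 - index_bits - offset_bits         # 21
data_bytes = num_blocks * block_bytes            # 2048 B = 2 KB
storage_bytes = (tag_bits + 1) * num_blocks // 8 + data_bytes  # 2224
aliases = 2 ** tag_bits          # memory blocks sharing one cache block
match_probability = 1 / aliases
print(tag_bits, data_bytes, storage_bytes, aliases)
```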

Exercise 3
Assume a cache with: cache size = 128 bytes total; 2-word (8-byte) blocks; 2-way set associative. How many blocks does the cache have? How many bits is the index? How many bits is the tag?
Blocks: 128 / 8 = 16 blocks, grouped into 16/2 = 8 sets
Index: log2(8 sets) = 3 bits
Tag: 32 − 3 (index) − 3 (offset) = 26 bits
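Assuming 4-byte words (so 8-byte blocks), the field widths follow mechanically from the stated size and associativity (a sketch; names are mine):

```python
import math

# Exercise 3 check: 128-byte, 2-way set-associative cache with
# 2-word (8-byte) blocks and 32-bit addresses.
cache_bytes, block_bytes, ways = 128, 8, 2
num_blocks = cache_bytes // block_bytes     # 16
num_sets = num_blocks // ways               # 8
offset_bits = int(math.log2(block_bytes))   # 3
index_bits = int(math.log2(num_sets))       # 3
tag_bits = 32 - index_bits - offset_bits    # 26
print(num_blocks, index_bits, tag_bits)
```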

Cache Performance
CPU time = Instruction count × (CPI_execution + Mem accesses per instruction × Miss rate × Miss penalty in cycles) × Clock cycle time
Misses per instruction = Memory accesses per instruction × Miss rate
CPI = CPI_execution + Misses per instruction × Miss penalty in cycles
AMAT = Hit time + Miss rate × Miss penalty (can be expressed in cycles or in seconds)
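These formulas map directly onto two small helper functions (a minimal sketch; the function and parameter names are mine, not from the slides):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in whatever unit the inputs use."""
    return hit_time + miss_rate * miss_penalty

def cpi_with_cache(cpi_exec, mem_acc_per_instr, miss_rate, miss_penalty_cycles):
    """Execution CPI plus memory stall cycles per instruction."""
    return cpi_exec + mem_acc_per_instr * miss_rate * miss_penalty_cycles

# e.g. a 1-cycle hit, 50% miss rate, 20-cycle penalty:
print(amat(1, 0.5, 20))  # 11.0
```

The later exercises are all instances of these two expressions with different parameter values.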

Why misses?
1) Compulsory: the first access to a block cannot be in the cache, so the block must be brought into the cache.
2) Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur as blocks are discarded and later retrieved.
3) Conflict: if the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) occur because multiple blocks compete for the same set.
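The conflict category can be made concrete with a toy direct-mapped cache (an illustrative sketch, not from the slides): the first two misses below are compulsory, and the last two are conflict misses caused by two blocks competing for the same index while most of the cache sits empty.

```python
# Toy direct-mapped cache: 4 blocks of 16 bytes each.
NUM_BLOCKS, BLOCK_BYTES = 4, 16
cache = [None] * NUM_BLOCKS          # tag currently held by each block

def access(addr):
    block = addr // BLOCK_BYTES
    index = block % NUM_BLOCKS
    tag = block // NUM_BLOCKS
    hit = cache[index] == tag
    cache[index] = tag               # fill on miss
    return hit

# 0x00 and 0x40 both map to index 0 with different tags:
results = [access(a) for a in (0x00, 0x40, 0x00, 0x40)]
print(results)  # [False, False, False, False] -- all misses
```

A fully associative cache of the same size would hit on the third and fourth accesses.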

3Cs Absolute Miss Rate (SPEC92)

Exercise 4
Consider a VAX-11/780 with: MP = 6 cycles, CPI_exec = 8.5, MR = 0.11, mem accesses/instruction = 3.
Compute the architecture's CPI with the real cache:
CPI_realCache = CPI_exec + mem_acc/instr × MR × MP = 8.5 + 3 × 0.11 × 6 = 10.48

Exercise 5
Compare the previous architecture in the 100% miss rate case with the same in the 100% hit rate case, and compute the speedup of the ideal cache over the real one.
100% miss: CPI_noCache = 8.5 + 3 × 6 = 26.5
100% hit: CPI_idealCache = CPI_exec = 8.5
Speedup(idealCache vs. realCache) = 10.48 / 8.5 ≈ 1.23
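Exercises 4 and 5 replayed in Python (a sketch; variable names are mine):

```python
# VAX-11/780 parameters: MP = 6 cycles, CPI_exec = 8.5,
# 3 memory accesses per instruction.
cpi_exec, mem_acc, miss_penalty = 8.5, 3, 6

def cpi(miss_rate):
    return cpi_exec + mem_acc * miss_rate * miss_penalty

real = cpi(0.11)    # 10.48
ideal = cpi(0.0)    # 8.5, the 100% hit case
worst = cpi(1.0)    # 26.5, the 100% miss case
print(real, ideal, worst, round(real / ideal, 2))
```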

Exercise 6
Compute the CPI of an architecture with cache, given: CPI_ideal = 1.5, MP = 10, MR = 0.11, mem accesses/instr = 1.4.
CPI_realCache = 1.5 + 1.4 × 0.11 × 10 = 3.04

Exercise 6 (cont.)
Compare the 100% hit rate case with the 100% miss rate case, and compute the speedup of the ideal cache over the real one.
CPI_noCache = 1.5 + 1.4 × 10 = 15.5
CPI_idealCache = 1.5
Speedup = 3.04 / 1.5 ≈ 2
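Exercise 6 replayed in Python (a sketch; variable names are mine):

```python
# CPI_ideal = 1.5, MP = 10 cycles, MR = 0.11, 1.4 mem accesses/instr.
cpi_ideal, mem_acc, miss_penalty, miss_rate = 1.5, 1.4, 10, 0.11
cpi_real = cpi_ideal + mem_acc * miss_rate * miss_penalty   # 3.04
cpi_no_cache = cpi_ideal + mem_acc * miss_penalty           # 15.5
print(round(cpi_real, 2), cpi_no_cache, round(cpi_real / cpi_ideal, 2))
```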

Exercise 7
Consider two architectures, A and B:
Tclk(A) = 20 ns, 8.5% faster than Tclk(B)
Both A and B have mem accesses/instr = 1.3
MP(A) = MP(B) = 200 ns
MR(A) = 3.9%, MR(B) = 3.0%
Compute AMAT(A) and AMAT(B); compute CPI(A) and CPI(B).

Solution 7
Tclk(B) = 20 ns × (1 + 8.5%) = 21.7 ns
AMAT(A) = 20 ns + 200 ns × 3.9% = 27.8 ns
AMAT(B) = 21.7 ns + 200 ns × 3.0% = 27.7 ns
CPI(A) = CPI_exec + 1.3 × 3.9% × (200 ns / 20 ns) = CPI_exec + 1.3 × 3.9% × 10
CPI(B) = CPI_exec + 1.3 × 3.0% × round(200 ns / 21.7 ns) = CPI_exec + 1.3 × 3.0% × 9
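A quick check in Python (a sketch; names are mine, and since CPI_exec is not given, only the memory-stall component of CPI is computed):

```python
# Exercise 7: A's clock is 20 ns and 8.5% faster than B's.
tclk_a = 20.0                    # ns
tclk_b = tclk_a * 1.085          # Tclk(B) = 21.7 ns
mem_acc, mp_ns = 1.3, 200.0

amat_a = tclk_a + mp_ns * 0.039                      # ~27.8 ns
amat_b = tclk_b + mp_ns * 0.030                      # ~27.7 ns
stall_a = mem_acc * 0.039 * round(mp_ns / tclk_a)    # penalty = 10 cycles
stall_b = mem_acc * 0.030 * round(mp_ns / tclk_b)    # penalty = 9 cycles
print(amat_a, amat_b, round(stall_a, 3), round(stall_b, 3))
```

B's faster memory system roughly cancels its slower clock: the two AMATs differ by only 0.1 ns.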

Exercise 8
Architecture A [I$, D$]: issues 1 instruction on 85% of cycles; other cycles NOP.
Architecture B [I$, D$]: issues 2 instructions on 65% of cycles, 1 instruction on 30%; other cycles NOP.
Assume hit time = 1 cycle, miss time = 50 cycles (i.e., 49 extra cycles per miss).
I$ hit rate = 100%; D$ hit rate = 98%; load/store instructions = 33% of all instructions.

Exercise 8 (cont.)
CPI(A) and CPI(B) with a perfect memory system? AMAT in cycles, relative to the D$?
CPI(A) = 100 cycles / 85 instr ≈ 1.17
CPI(B) = 100 / (65 × 2 + 30) = 0.625
AMAT = 1 + 0.02 × 49 = 1.98 cycles

Exercise 8 (cont.)
CPI(A) and CPI(B) with the actual cache?
CPI(A) = 1.17 + 0.33 × 0.02 × 49 ≈ 1.49
CPI(B) = 0.625 + 0.33 × 0.02 × 49 ≈ 0.95
Speedup(B over A) = 1.49 / 0.95 ≈ 1.58
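Exercise 8 replayed in Python (a sketch; names are mine): both machines pay the same per-instruction D-cache stall, so only the base CPI differs.

```python
# 2% D$ miss rate, 33% loads/stores, 49 extra cycles per miss.
miss_rate_d, ls_frac, extra_cycles = 0.02, 0.33, 49

cpi_a_perfect = 100 / 85               # 1 instr on 85% of cycles
cpi_b_perfect = 100 / (65 * 2 + 30)    # 0.625
stall = ls_frac * miss_rate_d * extra_cycles   # stall cycles per instr

cpi_a = cpi_a_perfect + stall          # ~1.5
cpi_b = cpi_b_perfect + stall          # ~0.95
amat_d = 1 + miss_rate_d * extra_cycles        # 1.98 cycles
print(round(cpi_a, 2), round(cpi_b, 2), round(cpi_a / cpi_b, 2))
```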

Exercise 9
300 MHz CPU, 50 MHz bus speed.
D-cache has two 64-bit words per block (16 bytes).
Bus: 2 bytes wide; in burst transfer mode, each block read takes 4 + 7 bus clocks (4 for the first transfer, 1 for each of the remaining 7).
Hit time = 1 cycle; 6% miss rate; ideal I-cache.

Exercise 9 (cont.)
Consider only read data accesses. What is the effective AMAT in ns? How would you speed it up: by doubling the bus width, or by doubling the bus speed? Compute first the AMAT, then the speedup.
AMAT = (1 + 0.06 × (4 + 7) × 300/50) CPU clocks = 4.96 CPU clocks ≈ 16.5 ns

Exercise 9 (cont.)
Doubling the bus width: the first datum still takes 4 bus clocks, but only 3 more transfers follow.
AMAT = (1 + 0.06 × (4 + 3) × 6) CPU clocks = 3.52 CPU clocks ≈ 11.7 ns

Exercise 9 (cont.)
Doubling the bus speed: 1 bus clock = 3 CPU cycles.
AMAT = (1 + 0.06 × (4 + 7) × 3) CPU clocks = 2.98 CPU clocks ≈ 9.9 ns
Speedup(2× bus speed vs. 2× bus width) = 11.7 / 9.9 ≈ 1.18
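All three configurations of Exercise 9 replayed in Python (a sketch; names are mine):

```python
# AMAT for the baseline bus, doubled bus width, and doubled bus speed.
cpu_mhz, hit_time, miss_rate = 300, 1, 0.06

def amat_ns(bus_mhz, bus_clocks_per_block):
    cpu_per_bus = cpu_mhz / bus_mhz       # CPU clocks per bus clock
    clocks = hit_time + miss_rate * bus_clocks_per_block * cpu_per_bus
    return clocks * 1000 / cpu_mhz        # convert CPU clocks to ns

base = amat_ns(50, 4 + 7)    # 2-byte bus, 8 transfers  -> ~16.5 ns
wide = amat_ns(50, 4 + 3)    # 4-byte bus, 4 transfers  -> ~11.7 ns
fast = amat_ns(100, 4 + 7)   # doubled bus clock        -> ~9.9 ns
print(round(base, 1), round(wide, 1), round(fast, 1), round(wide / fast, 2))
```

Doubling the bus speed helps more than doubling the width here because it shrinks both the 4-clock setup and the per-transfer time, while a wider bus only reduces the transfer count.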