Quiz 4 Solution

Given:
- Frequency = 2.5 GHz, clock period = 0.4 ns
- Ideal CPI = 0.4; 30% of instructions are loads or stores
- L1 hit time = 0
- L1 I-cache: 2% miss rate, 32-byte blocks
- L1 D-cache: 5% miss rate, 16-byte blocks
- L2 cache: 10% miss rate, 15 ns access time, 64-byte blocks
- Memory: 75 ns access time

Miss penalties (the bus moves 16 bytes per transfer: 2.5 ns per transfer from L2, 7.5 ns per transfer from memory):
- Data, L1 to L2: 15 ns + (16/16)(2.5 ns) = 17.5 ns
- Instruction, L1 to L2: 15 ns + (32/16)(2.5 ns) = 20 ns
- L2 to memory: 75 ns + (64/16)(7.5 ns) = 105 ns

Part a)
DATA_AMAT = L1 hit time + L1 data miss rate × (L2 hit rate × (data-in-L2 penalty) + L2 miss rate × (data-in-memory penalty))
= 0 + 0.05 × (0.9 × 17.5 + 0.1 × (17.5 + 105)) = 1.4 ns
INST_AMAT = L1 hit time + L1 inst miss rate × (L2 hit rate × (inst-in-L2 penalty) + L2 miss rate × (inst-in-memory penalty))
= 0 + 0.02 × (0.9 × 20 + 0.1 × (20 + 105)) = 0.61 ns
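The part-a arithmetic can be checked with a short sketch. The 16-byte bus width and the per-transfer times (2.5 ns from L2, 7.5 ns from memory) are read off the penalty formulas rather than stated outright in the problem, and the `amat` helper is an illustrative name, not anything from the quiz:

```python
# Sketch of the Part a AMAT calculation (all times in ns).
# Assumed from the penalty formulas: a 16-byte bus, 2.5 ns per
# transfer out of L2, 7.5 ns per transfer out of memory.

L1_HIT = 0.0                     # L1 hit time (given as 0)
L2_ACCESS, MEM_ACCESS = 15.0, 75.0

# Miss penalty = access time + (block size / bus width) * transfer time
data_l1_l2 = L2_ACCESS + (16 / 16) * 2.5     # 17.5 ns (16-byte D-cache block)
inst_l1_l2 = L2_ACCESS + (32 / 16) * 2.5     # 20.0 ns (32-byte I-cache block)
l2_mem     = MEM_ACCESS + (64 / 16) * 7.5    # 105 ns  (64-byte L2 block)

def amat(l1_miss, l2_miss, l1_l2_penalty, l2_mem_penalty, l1_hit=L1_HIT):
    """AMAT = hit time + miss rate * (L2-hit penalty weighted vs. L2-miss penalty)."""
    return l1_hit + l1_miss * ((1 - l2_miss) * l1_l2_penalty
                               + l2_miss * (l1_l2_penalty + l2_mem_penalty))

data_amat = amat(0.05, 0.10, data_l1_l2, l2_mem)   # -> 1.4 ns
inst_amat = amat(0.02, 0.10, inst_l1_l2, l2_mem)   # -> 0.61 ns
```

Note that on an L2 miss the request pays both the L1-to-L2 penalty and the L2-to-memory penalty, which is why the miss term adds the two.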

Part b)
From part a: DATA_AMAT = 1.4 ns, INST_AMAT = 0.61 ns.
Overall CPI = ideal CPI + impact of data accesses + impact of instruction accesses
= 0.4 + (0.3 × DATA_AMAT + 1 × INST_AMAT) × frequency
We multiply by the frequency because this is CPI, not execution time: the frequency converts the AMAT from nanoseconds into cycles.
= 0.4 + (0.3 × 1.4 + 1 × 0.61) × 2.5 GHz = 2.975
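Folding the AMATs into CPI is a one-liner; this sketch just redoes the slide's arithmetic, with the AMAT values hard-coded from part a:

```python
# Part b: convert memory stall time (ns) into stall cycles and add to ideal CPI.
FREQ_GHZ = 2.5                      # 2.5 cycles per ns
DATA_AMAT, INST_AMAT = 1.4, 0.61    # ns, from part a
LOAD_STORE_FRAC = 0.3               # fraction of instructions that access data
IDEAL_CPI = 0.4

# Every instruction performs 1 instruction fetch; 30% also perform a data access.
memory_cpi = (LOAD_STORE_FRAC * DATA_AMAT + 1 * INST_AMAT) * FREQ_GHZ
overall_cpi = IDEAL_CPI + memory_cpi    # -> 2.975
```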

From part b: overall CPI = 0.4 + (0.3 × 1.4 + 1 × 0.61) × 2.5 GHz = 2.975.
Part c) Replace the 2.5 GHz processor with a 3.6 GHz one. Improvement?
New CPI = 0.4 + (0.3 × 1.4 + 1 × 0.61) × 3.6 GHz ≈ 4.1
Execution time = CPI × clock period
Improvement = old execution time / new execution time
= (2.975 / 2.5) / (4.1 / 3.6) = 1.19 / 1.14 ≈ 1.05
About a 5% improvement: the memory stall time is fixed in nanoseconds, so the faster clock speeds up only the compute portion.
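Computing the speedup without rounding the new CPI to 4.1 gives roughly 1.04, which the slide rounds up to 1.05; a sketch of the comparison:

```python
# Part c: speedup from a 3.6 GHz clock; per-instruction memory time (ns) is fixed.
MEM_NS = 0.3 * 1.4 + 1 * 0.61       # 1.03 ns of AMAT per instruction (part a)

def time_per_instr(freq_ghz, ideal_cpi=0.4):
    cpi = ideal_cpi + MEM_NS * freq_ghz   # stall cycles grow with the clock rate
    return cpi / freq_ghz                 # execution time per instruction, in ns

speedup = time_per_instr(2.5) / time_per_instr(3.6)   # ~1.04 (slides round to 1.05)
```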

Part d)
Option 1: a bigger L1 that cuts both L1 miss rates in half, with L1 hit time rising to 1 cycle = 0.4 ns.
DATA_AMAT = L1 hit time + L1 data miss rate × (L2 hit rate × (data-in-L2 penalty) + L2 miss rate × (data-in-memory penalty))
= 0.4 + 0.025 × (0.9 × 17.5 + 0.1 × (17.5 + 105)) = 1.1 ns
INST_AMAT = L1 hit time + L1 inst miss rate × (L2 hit rate × (inst-in-L2 penalty) + L2 miss rate × (inst-in-memory penalty))
= 0.4 + 0.01 × (0.9 × 20 + 0.1 × (20 + 105)) ≈ 0.7 ns
Overall CPI = ideal CPI + impact of data accesses + impact of instruction accesses
= 0.4 + (0.3 × 1.1 + 1 × 0.7) × 2.5 GHz ≈ 2.975. No improvement.
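Option 1 in the same style; the `amat` helper name is illustrative, and the penalties (17.5 ns, 20 ns, 105 ns) are carried over from the earlier slides:

```python
# Part d, option 1: halve the L1 miss rates but pay a 0.4 ns L1 hit time.
def amat(l1_hit, l1_miss, l2_miss, l1_l2, l2_mem=105.0):
    return l1_hit + l1_miss * ((1 - l2_miss) * l1_l2 + l2_miss * (l1_l2 + l2_mem))

data_amat = amat(0.4, 0.025, 0.10, 17.5)   # -> 1.1 ns
inst_amat = amat(0.4, 0.010, 0.10, 20.0)   # -> 0.705 ns (slides round to 0.7)

# Essentially the same CPI as before (~2.99 vs. 2.975): the halved miss rates
# are cancelled out by the extra hit-time cycle on every access.
cpi = 0.4 + (0.3 * data_amat + inst_amat) * 2.5
```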

Part d)
Option 2: a smaller L2 with 10 ns access time but a 15% miss rate.
New miss penalties:
- Data, L1 to L2: 10 ns + (16/16)(2.5 ns) = 12.5 ns
- Instruction, L1 to L2: 10 ns + (32/16)(2.5 ns) = 15 ns
DATA_AMAT = L1 hit time + L1 data miss rate × (L2 hit rate × (data-in-L2 penalty) + L2 miss rate × (data-in-memory penalty))
= 0 + 0.05 × (0.85 × 12.5 + 0.15 × (12.5 + 105)) ≈ 1.41 ns
INST_AMAT = L1 hit time + L1 inst miss rate × (L2 hit rate × (inst-in-L2 penalty) + L2 miss rate × (inst-in-memory penalty))
= 0 + 0.02 × (0.85 × 15 + 0.15 × (15 + 105)) = 0.615 ns
Overall CPI = ideal CPI + impact of data accesses + impact of instruction accesses
= 0.4 + (0.3 × 1.41 + 1 × 0.615) × 2.5 GHz ≈ 3.0. We lose performance.
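And option 2, again with an illustrative `amat` helper; only the L2 access time and miss rate change relative to the baseline:

```python
# Part d, option 2: smaller L2 (10 ns access time, 15% miss rate).
def amat(l1_hit, l1_miss, l2_miss, l1_l2, l2_mem=105.0):
    return l1_hit + l1_miss * ((1 - l2_miss) * l1_l2 + l2_miss * (l1_l2 + l2_mem))

data_l1_l2 = 10 + (16 / 16) * 2.5   # 12.5 ns
inst_l1_l2 = 10 + (32 / 16) * 2.5   # 15.0 ns
data_amat = amat(0.0, 0.05, 0.15, data_l1_l2)   # -> 1.4125 ns
inst_amat = amat(0.0, 0.02, 0.15, inst_l1_l2)   # -> 0.615 ns

# Slightly worse than the baseline 2.975: the faster L2 access does not
# make up for the extra misses that fall through to 105 ns memory.
cpi = 0.4 + (0.3 * data_amat + inst_amat) * 2.5
```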