CENG 450 Computer Systems and Architecture: Cache Review (Amirali Baniasadi)

Problem 1: Final 2004

You are building a system around a processor running at 2.0 GHz. The processor has a CPI of 0.5 excluding memory stalls. 30% of the instructions are loads and stores. The memory system is composed of a unified L1 cache that imposes a 1 ns penalty on hits. The L1 cache has a miss rate of 2% for both instructions and data and has 32-byte blocks. The unified L2 cache has 64-byte blocks and an access time of 25 ns. It is connected to the L1 cache by a 256-bit data bus that runs at 400 MHz and can transfer one 256-bit bus word per bus cycle. Of all memory references sent to the L2 cache, 95% are satisfied without going to main memory. The main memory has an access latency of 100 ns, after which any number of bus words may be transferred at the rate of one per cycle on the 128-bit-wide 133 MHz main memory bus.

a) What is the A.M.A.T.? (6 points)
b) What is the overall CPI? (6 points)
c) You are considering replacing the 2.0 GHz processor with one that runs at 3.0 GHz but is otherwise identical. How much faster does the system run with the faster processor? Assume that the speed of the memory system remains the same in absolute terms. (6 points)
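Not part of the original slide: a minimal back-of-the-envelope sketch (in Python) of how parts a) through c) might be computed. It assumes the usual AMAT = hit time + miss rate x miss penalty formula at each level, that the 1 ns L1 hit penalty is a stall not already folded into the base CPI, and that an L1 refill includes one 256-bit bus transfer from L2; all variable names are my own.

```python
# Illustrative sketch for Problem 1 (not an official answer key).
# Assumptions: the 1 ns L1 hit penalty is a memory stall on top of the base CPI,
# an L1 refill costs one L2 access plus one 256-bit bus word, and an L2 refill
# costs the memory latency plus four 128-bit bus words (one 64-byte block).

clock_ns      = 1.0 / 2.0            # 2.0 GHz -> 0.5 ns per cycle
base_cpi      = 0.5                  # CPI excluding memory stalls
refs_per_inst = 1.0 + 0.30           # one instruction fetch + 30% loads/stores

l1_hit_ns   = 1.0                    # penalty on every L1 hit
l1_miss     = 0.02                   # L1 miss rate (instructions and data)
l2_refill   = 25.0 + 1 * (1.0 / 0.400)            # 25 ns access + one 32-byte bus word
l2_local_mr = 1.0 - 0.95                          # 5% of L2 references go to memory
mem_refill  = 100.0 + (64 // 16) * (1.0 / 0.133)  # 100 ns + four 16-byte bus words

amat = l1_hit_ns + l1_miss * (l2_refill + l2_local_mr * mem_refill)   # part a)

def cpi(clk_ns):
    """Overall CPI: base CPI plus memory stall cycles per instruction."""
    return base_cpi + refs_per_inst * amat / clk_ns

print(f"AMAT = {amat:.2f} ns, overall CPI = {cpi(clock_ns):.2f}")     # parts a), b)

# Part c): memory time in ns stays fixed, only the core's cycle time shrinks.
time_2ghz = cpi(1.0 / 2.0) * (1.0 / 2.0)         # ns per instruction at 2.0 GHz
time_3ghz = cpi(1.0 / 3.0) * (1.0 / 3.0)         # ns per instruction at 3.0 GHz
print(f"speedup with 3.0 GHz core = {time_2ghz / time_3ghz:.3f}x")
```

Under these assumptions the memory stalls dominate, so the faster clock buys only a few percent; a different stall-accounting convention (for example, folding the 1 ns hit penalty into the base CPI) changes the numbers but not the method.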

Problem 2: Quiz 2004

You are building a system around a processor running at 2.5 GHz. The processor has a CPI of 0.4 excluding memory stalls. 30% of the instructions are loads and stores. The memory system is composed of a split L1 cache that imposes no penalties on hits. The I-cache has a miss rate of 2% and 32-byte blocks. The D-cache has a 5% miss rate and 16-byte blocks. The 512 KB unified L2 cache has 64-byte blocks and an access time of 15 ns. It is connected to the L1 cache by a 128-bit data bus that runs at 400 MHz and can transfer one 128-bit bus word per bus cycle. Of all memory references sent to the L2 cache, 90% are satisfied without going to main memory. The main memory has an access latency of 75 ns, after which any number of bus words may be transferred at the rate of one per cycle on the 128-bit-wide 133 MHz main memory bus.

a) What is the A.M.A.T.?
b) What is the overall CPI?
c) You are considering replacing the 2.5 GHz processor with one that runs at 3.6 GHz but is otherwise identical. How much faster does the system run with the faster processor? Assume that the speed of the memory system remains the same in absolute terms.
d) You have the following two options to improve system performance:
1. Use a bigger L1 cache, which cuts the miss rates in half but results in a 1-cycle L1 cache hit time.
2. Use a smaller L2 cache, which reduces the access time to 10 ns but increases the L2 cache miss rate to 15%.
How does each option impact performance, and which one is the better choice (if any)?
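Again not part of the original slide: a sketch of the same style of calculation for the split-L1 system, wrapped in a small function so the two design options in part d) can be evaluated by re-running it with modified parameters. The helper name amat_and_cpi and the assumption that option 1's 1-cycle L1 hit time shows up as a per-reference stall are mine.

```python
# Illustrative sketch for Problem 2 (split L1, no hit penalty); not an official key.
# Assumptions: per-stream AMAT = hit + miss_rate * (L2 refill + local L2 miss rate
# * memory refill); streams are combined by their share of all memory references.

CLK_NS   = 1.0 / 2.5             # 2.5 GHz -> 0.4 ns per cycle
BASE_CPI = 0.4
LD_ST    = 0.30                  # fraction of instructions that are loads/stores

L2_BUS_NS  = 1.0 / 0.400         # 128-bit (16-byte) L1<->L2 bus word at 400 MHz
MEM_BUS_NS = 1.0 / 0.133         # 128-bit (16-byte) memory bus word at 133 MHz

def amat_and_cpi(i_mr=0.02, d_mr=0.05, l1_hit_ns=0.0,
                 l2_access_ns=15.0, l2_local_mr=0.10):
    """Return (AMAT in ns, overall CPI) for the given cache parameters."""
    l2_refill_i = l2_access_ns + (32 // 16) * L2_BUS_NS   # 32-byte I-cache block
    l2_refill_d = l2_access_ns + (16 // 16) * L2_BUS_NS   # 16-byte D-cache block
    mem_refill  = 75.0 + (64 // 16) * MEM_BUS_NS          # 64-byte L2 block

    amat_i = l1_hit_ns + i_mr * (l2_refill_i + l2_local_mr * mem_refill)
    amat_d = l1_hit_ns + d_mr * (l2_refill_d + l2_local_mr * mem_refill)

    stall_ns = 1.0 * amat_i + LD_ST * amat_d   # per instruction: 1 fetch + LD_ST data refs
    amat     = stall_ns / (1.0 + LD_ST)        # averaged over all references
    cpi      = BASE_CPI + stall_ns / CLK_NS
    return amat, cpi

a, c = amat_and_cpi()
print(f"baseline: AMAT = {a:.2f} ns, CPI = {c:.2f}")                  # parts a), b)
_, c1 = amat_and_cpi(i_mr=0.01, d_mr=0.025, l1_hit_ns=CLK_NS)         # option 1 in d)
_, c2 = amat_and_cpi(l2_access_ns=10.0, l2_local_mr=0.15)             # option 2 in d)
print(f"option 1 CPI = {c1:.2f}, option 2 CPI = {c2:.2f}")
```

Part c) follows the same pattern as Problem 1: keep the stall time in nanoseconds fixed and recompute execution time per instruction with the 3.6 GHz cycle time.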

Problem 3: Final 2004

A program consists of two nested loops: a small inner loop and a much larger outer loop. The general structure of the program is given in the figure below. The decimal memory addresses shown represent the locations of the two loops and the beginning and end of the total program. All memory locations in the various sections contain instructions to be executed in straight-line sequencing. The program is to be run on a computer that uses a 2-way set-associative cache. The main memory size is 64 KB, the cache size is 1 KB, and the block size is 128 bytes.

[Figure: program layout from START to END (the last address shown is 1200), with the inner loop executed 20 times and the outer loop executed 10 times; the other decimal addresses in the original figure are not preserved in this transcript.]

The cycle time of the main memory is 100 ns and the cycle time of the cache is 10 ns.

a) Specify the number of bits in the TAG, BLOCK, and WORD fields of a main memory address. (5 points)
b) Compute the total time needed for instruction fetching during the execution of the program in the figure above. (20 points)
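As a small aid (not from the original slides), the sketch below works out part a) from the stated geometry; the constant names are mine. Part b) would additionally need the loop start and end addresses from the figure, which this transcript does not preserve.

```python
# Sketch for Problem 3(a): address-field sizes for the 2-way set-associative cache.
import math

MEM_BYTES   = 64 * 1024        # 64 KB main memory -> 16-bit addresses
CACHE_BYTES = 1 * 1024         # 1 KB cache
BLOCK_BYTES = 128              # 128-byte blocks
WAYS        = 2                # 2-way set associative

addr_bits  = int(math.log2(MEM_BYTES))               # 16 address bits
word_bits  = int(math.log2(BLOCK_BYTES))             # 7 bits: offset within a block
num_sets   = CACHE_BYTES // (BLOCK_BYTES * WAYS)     # 1024 / (128 * 2) = 4 sets
block_bits = int(math.log2(num_sets))                # 2 bits: set index ("BLOCK" field)
tag_bits   = addr_bits - block_bits - word_bits      # 16 - 2 - 7 = 7 bits

print(f"TAG = {tag_bits} bits, BLOCK (set) = {block_bits} bits, WORD = {word_bits} bits")
```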