CSCI206 - Computer Organization & Programming


CSCI206 - Computer Organization & Programming Memory Introduction zyBook: 12.1, 12.2

Many different types of memory
Volatile: SRAM, DRAM, SDRAM (PC100, PC133), DDR SDRAM, DDR2 SDRAM, DDR3 SDRAM, DDR4 SDRAM, GDDR3, GDDR4, GDDR5, RDRAM
Non-volatile: ROM, EEPROM, NOR Flash, NAND Flash, SD (SDHC, SDXC), FRAM, HDD, Optical Drive
WHY?

Competing Features of Memory

Memory hierarchy
- CPU registers
- Cache (on-board the CPU)
- Main memory (on chips, i.e., circuits)
- Secondary storage (typically involving mechanical parts)

Intel Core i7 memory hierarchy

Intel Core i7 cache

Trade-offs
Ideally, memory would be infinitely large, fast, and low-power. This is impossible.
The memory hierarchy simulates a large, fast memory system using a combination of different memory technologies.

Why it works
We can simulate a large/fast memory because of locality:
- Temporal locality: recently accessed data is likely to be accessed again in the future.
- Spatial locality: data near recently accessed data (by address) is more likely to be requested in the future than data that is far away.
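The classic array-sum loop exhibits both kinds of locality at once. A minimal sketch (illustrative only; the array contents are made up):

```python
data = list(range(16))  # 16 consecutive words in memory

total = 0
for i in range(len(data)):
    # temporal locality: `total` and `i` are reused on every iteration
    # spatial locality: data[0], data[1], ... sit at adjacent addresses
    total += data[i]

print(total)  # 120
```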

Cache Memory
The cache is a small amount of fast (expensive) memory holding the data currently being worked on (temporal/spatial locality).
Main memory is much larger, slower, and cheaper.
The processor interfaces with the cache, so memory appears to be fast!
A cache algorithm decides which memory blocks to store and when to move blocks back into main memory.
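One way to see the cache algorithm and locality working together is a toy simulation. This sketch models a direct-mapped cache; the sizes (8 blocks of 4 words) and the `hit_rate` helper are invented for illustration, not taken from the slides:

```python
# Toy direct-mapped cache: 8 blocks of 4 words each (illustrative sizes only).
NUM_BLOCKS, WORDS_PER_BLOCK = 8, 4

def hit_rate(trace):
    """Return the fraction of word addresses in `trace` that hit the cache."""
    tags = [None] * NUM_BLOCKS        # tag currently stored in each cache block
    hits = 0
    for addr in trace:
        block = addr // WORDS_PER_BLOCK
        index = block % NUM_BLOCKS    # which cache block this address maps to
        tag = block // NUM_BLOCKS
        if tags[index] == tag:
            hits += 1                 # hit: block already resident
        else:
            tags[index] = tag         # miss: fetch block from main memory
    return hits / len(trace)

print(hit_rate(range(64)))         # sequential scan: 3 of every 4 accesses hit
print(hit_rate(range(0, 256, 4)))  # stride-4 scan: every access misses
```

A sequential scan benefits from spatial locality (0.75 hit rate here), while a stride equal to the block size defeats it entirely.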

Cache Hit
lw $t0, 0($s0): the word at [$s0+0] is in the cache.
The cache runs at CPU speed, so there is no delay; the data is read from the cache in the MEM stage.
All the pipeline diagrams we have done assume cache hits. Otherwise, MEM-EX or MEM-MEM forwarding would be impossible, since a memory read is much slower than a register read.

Cache Miss
lw $t0, 0($s0): the word at [$s0+0] is NOT in the cache.
Latency to main memory is 100 ns; in comparison, reading a register takes only a few cycles at most.
If the CPU runs at 2 GHz, how many cycles do we stall?
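A back-of-the-envelope answer to the question above, assuming a miss simply stalls the pipeline for the full main-memory latency:

```python
clock_hz = 2e9                   # 2 GHz clock
cycle_time_ns = 1e9 / clock_hz   # 0.5 ns per cycle
miss_latency_ns = 100            # latency to main memory

stall_cycles = miss_latency_ns / cycle_time_ns
print(stall_cycles)  # 200.0: a miss stalls the CPU for about 200 cycles
```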

Cache performance parameters
- Hit rate: the fraction of memory accesses found in a level of the memory hierarchy.
- Miss rate: the fraction of memory accesses not found in a level of the memory hierarchy.
- Hit time: the time required to access a level of the memory hierarchy.
- Miss penalty: the time required to fetch a block into a level of the memory hierarchy from the lower level.

Average Memory Access Time (AMAT)
If a cache hits 80% of the time and the miss penalty is 200 cycles, what is the AMAT (in clock cycles)?
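Using the standard formula AMAT = hit time + miss rate × miss penalty, and assuming a 1-cycle hit time (the slide does not state one):

```python
hit_time = 1           # assumed: a cache hit costs one cycle
miss_rate = 0.20       # 80% hit rate
miss_penalty = 200     # cycles

amat = hit_time + miss_rate * miss_penalty
print(amat)  # 41.0 cycles
```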

Performance
A program with 1M instructions runs on a 2 GHz ideal pipelined processor (CPI = 1). 25% of the instructions access memory, with an 80% hit rate and a 100-cycle miss penalty. How long does the program take to execute?
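A sketch of the computation, assuming instruction fetches always hit and only the 25% of instructions that access data memory can miss:

```python
instructions = 1_000_000
clock_hz = 2e9          # 2 GHz
cpi_base = 1            # ideal pipeline: one cycle per instruction
mem_fraction = 0.25     # fraction of instructions that access data memory
miss_rate = 0.20        # 80% hit rate
miss_penalty = 100      # cycles per miss

stall_cycles = instructions * mem_fraction * miss_rate * miss_penalty
total_cycles = instructions * cpi_base + stall_cycles
time_s = total_cycles / clock_hz
print(stall_cycles)  # 5000000.0 stall cycles
print(total_cycles)  # 6000000.0 total cycles
print(time_s)        # 0.003 seconds, i.e., 3 ms
```

Note that the memory stalls dominate: they add five times the base execution time of the ideal pipeline.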

Approach