ECE 552 / CPS 550 Advanced Computer Architecture I, Lecture 12: Memory – Part 1. Benjamin Lee, Electrical and Computer Engineering, Duke University. www.duke.edu/~bcl15


ECE552 Administrivia
19 October – Homework #3 due
19 October – Project proposals due
- One-page proposal
- 1. What question are you asking? 2. How are you going to answer that question?
- Talk to me if you are looking for project ideas.
23 October – Class discussion
- Roughly one reading per class. Do not wait until the day before!
1. Jouppi. “Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers.”
2. Kim et al. “An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches.”
3. Fromm et al. “The energy efficiency of IRAM architectures.”
4. Lee et al. “Phase change memory architecture and the quest for scalability.”

History of Memory
Core Memory
- The Williams Tube in the Manchester Mark I (1947) was unreliable.
- Forrester invented core memory for the MIT Whirlwind in response.
- First large-scale, reliable main memory.
Magnetic Technology
- Core memory stores bits using magnetic polarity on ferrite cores.
- Ferrite cores are threaded onto a 2D grid of wires.
- Current pulses on the X- and Y-axes could read and write cells.
Performance
- Robust, non-volatile storage.
- 1 microsecond core access time.
[Figure: DEC PDP-8/E core memory board, 4K words x 12 bits (1968).]

Semiconductor Memory
- Static RAM (SRAM): cross-coupled inverters latch the value.
- Dynamic RAM (DRAM): charge stored on a capacitor.
Advent of Semiconductor Memory
- The technology became competitive in the early 1970s.
- Intel was founded to exploit the market for semiconductor memory.
Dynamic Random Access Memory (DRAM)
- Charge on a capacitor maps to a logical value.
- The Intel 1103 was the first commercial DRAM.
- Semiconductor memory quickly replaced core memory in the 1970s.

DRAM – Dennard, 1968
[Figure: 1T DRAM cell – an access transistor gated by the word line connects the bit line to a storage capacitor (implemented as a FET gate, trench, or stacked capacitor) referenced to V_REF. The stacked example shows a TiN top electrode (V_REF), Ta2O5 dielectric, W bottom electrode, and poly word line.]

DRAM Chip Architecture
- Chip organized into 4-8 logical banks, which can be accessed in parallel.
- Each bank implements a 2-D array of bits.
[Figure: one bank – a row address decoder drives 2^N word lines across 2^M bit lines; a column decoder and sense amplifiers select the data bits. Each memory cell stores one bit; row and column addresses together form an (N+M)-bit address.]

Packaging & Memory Channel
DIMM (Dual Inline Memory Module): multiple chips sharing the same clock, control, and address signals. The data pins collectively supply a wide data bus. For example, four x16 chips supply a 64-bit data bus.
[Figure: DRAM chip interface – multiplexed row/column address lines, clock and control signals, and a data bus (x4, x8, x16, or x32).]

Packaging & 3D Stacking
[Figure: Apple A4 package cross-section showing two stacked DRAM die above the processor-plus-logic die, and the A4 package mounted on a circuit board. Source: iFixit, 2010.]

DRAM Operation
1. Activate (ACT)
- Decode the row address (RAS). Enable the addressed row (e.g., 4Kb).
- Bit line and capacitor share charge.
- Sense amplifiers detect the small change in voltage.
- Latch the row contents (a.k.a. the row buffer).
2. Read or Write
- Decode the column address (CAS). Select a subset of the row (e.g., 16b).
- If read, send the latched bits to the chip pins.
- If write, modify the latched bits and charge the capacitor.
- Can perform multiple CAS on the same row without a new RAS (i.e., a row-buffer hit).
3. Precharge
- Charge the bit lines to prepare for the next row access.
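The three-step protocol above can be sketched as a toy model of a single bank with a row buffer. All names and sizes below are illustrative assumptions, not details of any real device or controller:

```python
# Toy model of one DRAM bank: activate / read-write / precharge.
class DRAMBank:
    def __init__(self, rows=4096, cols=1024):
        self.cells = [[0] * cols for _ in range(rows)]
        self.row_buffer = None   # latched row contents (sense amps)
        self.open_row = None     # which row is currently open

    def activate(self, row):
        """ACT: decode the row address, latch the row into the sense amps."""
        self.row_buffer = self.cells[row][:]
        self.open_row = row

    def _ensure_open(self, row):
        """Row-buffer miss: precharge the old row, activate the new one."""
        if self.open_row != row:
            if self.open_row is not None:
                self.precharge()
            self.activate(row)

    def read(self, row, col):
        """CAS read: served from the row buffer if the row is already open."""
        self._ensure_open(row)
        return self.row_buffer[col]

    def write(self, row, col, value):
        """CAS write: modify the latched bits (and, physically, the cell)."""
        self._ensure_open(row)
        self.row_buffer[col] = value
        self.cells[row][col] = value

    def precharge(self):
        """PRE: close the row, ready the bit lines for the next access."""
        self.row_buffer = None
        self.open_row = None

bank = DRAMBank()
bank.write(5, 3, 42)
assert bank.read(5, 3) == 42   # same row: a row-buffer hit, no new ACT
```

Multiple reads and writes to row 5 here reuse the latched row buffer, which is exactly the "multiple CAS without RAS" case on the slide.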

DRAM Chip Architecture
- Activate: latch a row in the sense amplifiers.
- Read/Write: access specific columns in the row.
- Precharge: prepare for the next row.
[Figure: same bank diagram as before – row address decoder, 2^N word lines, 2^M bit lines, column decoder and sense amplifiers.]

DRAM Controller
1. Interfaces to the Processor Datapath
- The processor issues a load/store instruction.
- The memory address maps to particular chips, rows, and columns.
2. Implements the Control Protocol
- (1) Activate a row, (2) read/write the row, (3) precharge.
- Enforces timing parameters between commands.
- The latency of each step is approximately 15-20 ns.
- Various DRAM standards (DDR, RDRAM) have different signals.

Double Data Rate (DDR*) DRAM
[Figure: timing diagram from a Micron 256Mb DDR2 SDRAM datasheet – Row, Column, Precharge, Row' commands, with data transferred on both edges of a 200 MHz clock for a 400 Mb/s data rate per pin.]

Processor-Memory Bottleneck
Memory is usually a performance bottleneck
- The processor is limited by memory bandwidth and latency.
Latency (time for a single transfer)
- Memory access time >> processor cycle time.
- Example: 60 ns latency translates into 60 cycles for a 1 GHz processor.
Bandwidth (number of transfers per unit time)
- Every instruction is fetched from memory.
- Suppose M is the fraction of loads/stores in a program.
- On average, 1+M memory references per instruction.
- For CPI = 1, the system must supply 1+M memory transfers per cycle.
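The bandwidth model above can be written down directly. The M and CPI values in the example are illustrative, not from the slides:

```python
# Memory transfers the system must supply per cycle: every instruction is
# fetched (1 reference), and a fraction M also performs a load or store.
def mem_refs_per_cycle(M, cpi=1.0):
    """References per instruction (1 + M), divided by cycles per instruction."""
    return (1.0 + M) / cpi

# e.g., if half the instructions are loads/stores and CPI = 1,
# the memory system must sustain 1.5 transfers per cycle:
assert mem_refs_per_cycle(0.5) == 1.5
```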

Processor-Memory Latency
Consider a four-way superscalar processor with a 3 GHz clock. In the 100 ns required to access DRAM once, the processor could execute 1,200 instructions.
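The slide's arithmetic is just issue width times clock rate times latency, which can be checked with integer units (cycles per ns):

```python
# 4-way superscalar, 3 GHz clock (3 cycles per ns), 100 ns DRAM access:
issue_width = 4
clock_ghz = 3            # cycles per nanosecond
dram_latency_ns = 100

# Instructions the core could have executed during one DRAM access.
missed_instructions = issue_width * clock_ghz * dram_latency_ns
assert missed_instructions == 1200
```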

Distance Increases Latency
[Figure: a CPU beside a small memory versus a CPU beside a big memory – the larger array's longer wires mean longer access latency.]

Memory Cell Size
Off-chip DRAM has higher density than on-chip SRAM.
[Figure: cell-size comparison of DRAM on a memory chip versus on-chip SRAM in a logic chip. Source: Foss, “Implementing Application-Specific Memory,” ISSCC 1996.]

Memory Hierarchy
Capacity: Register (RF) << SRAM << DRAM
Latency: Register (RF) << SRAM << DRAM
Bandwidth: on-chip >> off-chip
Consider a data access. If the data is located in fast memory (e.g., SRAM), latency is low. If it is not, latency is high (e.g., DRAM).
[Figure: CPU connected to a small, fast memory (RF, SRAM) that holds frequently used data, backed by a big, slow memory (DRAM).]
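A standard way to quantify this effect, implied by the slide though not stated on it, is average memory access time (AMAT). The numbers below are illustrative assumptions:

```python
# Average memory access time for a two-level hierarchy:
# every access pays the fast level's hit time, and a fraction
# (the miss rate) additionally pays the slow level's penalty.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# e.g., 1-cycle SRAM hit, 5% miss rate, 60-cycle DRAM penalty:
result = amat(hit_time=1, miss_rate=0.05, miss_penalty=60)
assert abs(result - 4.0) < 1e-9   # far closer to SRAM than to DRAM speed
```

Even a modest miss rate keeps the average close to the fast memory's latency, which is the whole argument for the hierarchy.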

Memory Hierarchy Management
Small & Fast (Registers)
- The instruction specifies the address (e.g., R5).
- Implemented directly as a register file.
- Hardware might dynamically manage register usage.
- Examples: stack management, register renaming.
Large & Slow (SRAM and DRAM)
- The address is usually computed from values in registers (e.g., ld R1, x(R2)).
- Implemented as a hardware-managed cache hierarchy.
- Hardware decides what data is kept in faster memory.
- Software may provide hints.

Real Memory Reference Patterns
[Figure: scatter plot of memory address versus time, one dot per access. From Donald J. Hatfield and Jeanette Gerald, “Program Restructuring for Virtual Memory,” IBM Systems Journal 10(3), 1971.]

Predictable Patterns
Temporal Locality: if a location is referenced once, the same location is likely to be referenced again in the near future.
Spatial Locality: if a location is referenced once, nearby locations are likely to be referenced in the near future.
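Both kinds of locality show up in even the simplest code. The loop below is a hypothetical example, not from the slides:

```python
# Summing an array exhibits both localities:
# - spatial: the loop touches consecutive addresses in `data`
# - temporal: the accumulator `total` is re-referenced every iteration
data = list(range(1024))

total = 0                 # temporal locality: reused on every iteration
for x in data:            # spatial locality: sequential element accesses
    total += x

assert total == 1023 * 1024 // 2   # sum of 0..1023
```

A cache exploits the spatial pattern by fetching whole blocks of `data` and the temporal pattern by keeping `total` (and recently used blocks) close to the processor.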

Real Memory Reference Patterns
[Figure: the same Hatfield and Gerald (1971) address-versus-time plot, annotated with regions of temporal locality and spatial locality.]

Caches
Caches exploit predictable patterns.
Temporal Locality: caches remember the contents of recently accessed locations.
Spatial Locality: caches fetch blocks of data near recently accessed locations.

Caches
[Figure: the processor sends an address to the cache and receives data; the cache holds lines, each an {address tag, data block} pair whose data bytes are a copy of a main memory location (e.g., location 100); misses go to main memory.]

Cache Controller
The controller examines the address from the datapath and searches the cache for a matching tag.
Cache Hit – address found in cache
- Return a copy of the data from the cache.
Cache Miss – address not found in cache
- Read the block of data from main memory.
- Wait for main memory.
- Return the data to the processor and update the cache.
- What is the update policy?

Data Placement Policy
Fully Associative
- Update – place data in any cache line (a.k.a. block).
- Access – search the entire cache for a matching tag.
Set Associative
- Update – place data within the set of lines determined by the address.
- Access – identify the set from the address, search the set for a matching tag.
Direct Mapped
- Update – place data in the specific line determined by the address.
- Access – identify the line from the address, check for a matching tag.

Placement Policy
Memory line 12 can be placed:
- Fully associative: anywhere in the cache.
- 2-way set associative: anywhere in set 0 (12 mod 4 sets).
- Direct mapped: only into line 4 (12 mod 8 lines).
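The modular arithmetic behind this example can be checked directly; the 8-line cache size is taken from the slide's figure:

```python
# Where memory line 12 may be placed in an 8-line cache.
line = 12
num_lines = 8                  # total cache lines
sets_2way = num_lines // 2     # 2-way: 4 sets of 2 lines each

fully_assoc = "anywhere"       # any of the 8 lines
set_2way = line % sets_2way    # 2-way set associative: within this set
direct = line % num_lines      # direct mapped: exactly this line

assert set_2way == 0           # set 0, either way within the set
assert direct == 4             # line 4, no choice
```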

Direct-Mapped Cache
[Figure: the address splits into a tag (t bits), index (k bits), and line offset (b bits). The index selects one of 2^k lines; the stored tag is compared against the address tag to generate HIT; the offset selects the data word or byte. Each line holds a valid bit, a tag, and data.]
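The tag/index/offset split in the figure is plain bit manipulation. The k and b values below are arbitrary illustrations, not fixed by the slide:

```python
# Split an address into tag | index | offset for a direct-mapped cache
# with 2^k lines of 2^b bytes each.
def split_address(addr, k, b):
    offset = addr & ((1 << b) - 1)          # low b bits: byte within line
    index = (addr >> b) & ((1 << k) - 1)    # next k bits: which line
    tag = addr >> (k + b)                   # remaining high bits
    return tag, index, offset

# e.g., k = 4 (16 lines), b = 5 (32-byte lines):
tag, index, offset = split_address(0x1234, k=4, b=5)

# The three fields reassemble into the original address.
assert (tag << 9) | (index << 5) | offset == 0x1234
```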

ECE 552 / CPS Fully Associative Cache

Update/Replacement Policy
In an associative cache, which cache line in a set should be evicted when the set becomes full?
Random
Least Recently Used (LRU)
- LRU cache state must be updated on every access.
- A true implementation is only feasible for small sets (e.g., 2-way).
- Approximation algorithms exist for larger sets.
First-In, First-Out (FIFO)
- Used in highly associative caches.
Not Most Recently Used (NMRU)
- Implements FIFO with an exception for the most recently used lines.
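True LRU for one set can be sketched in a few lines, which also shows why it is only feasible for small associativity: the recency order must be updated on every single access. This is a software sketch, not a hardware design:

```python
from collections import OrderedDict

# One set of an associative cache with true-LRU replacement.
class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> data, least recent first

    def access(self, tag):
        """Returns True on hit; on miss, evicts the least recently used line."""
        if tag in self.lines:
            self.lines.move_to_end(tag)       # update LRU state on every access
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)    # evict the LRU line
        self.lines[tag] = None                # install the new line
        return False

s = LRUSet(ways=2)
assert s.access('A') is False   # cold miss
assert s.access('B') is False   # cold miss
assert s.access('A') is True    # hit; 'A' becomes most recent
assert s.access('C') is False   # miss; evicts 'B' (LRU), keeps 'A'
assert s.access('B') is False   # confirms 'B' was the one evicted
```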

Cache Example
Given a sequence of memory accesses (read address), complete the table for the cache. The cache is two-way set associative with four lines. Each entry contains the {tag, index} for that line.

Line Size and Spatial Locality
A line is the unit of transfer between the cache and memory. The CPU address splits into a (32-b)-bit line address and a b-bit offset, where 2^b is the line size (a.k.a. block size, in bytes). Example: a 4-word line (Word0-Word3) with b=2.
A larger line size has distinct hardware advantages:
- Less tag overhead.
- Exploits fast burst transfers from DRAM.
- Exploits fast burst transfers over a wide bus.
What are the disadvantages of increasing block size?
- Fewer lines, more line conflicts.
- Can waste bandwidth, depending on the application's spatial locality.
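The tag-overhead advantage can be made concrete: at fixed capacity, larger lines mean fewer lines, so fewer tags to store in total. The cache and address sizes below are illustrative:

```python
from math import log2

# Total tag storage for a direct-mapped cache of the given capacity,
# assuming 32-bit addresses and power-of-two sizes.
def tag_bits_total(capacity_bytes, line_bytes, addr_bits=32):
    num_lines = capacity_bytes // line_bytes
    k = int(log2(num_lines))          # index bits
    b = int(log2(line_bytes))         # offset bits
    tag_bits = addr_bits - k - b      # tag bits per line
    return num_lines * tag_bits

# 16 KB cache: growing the line from 16 B to 64 B quarters the line count,
# so total tag storage shrinks even though each tag stays the same width.
assert tag_bits_total(16384, 16) > tag_bits_total(16384, 64)
```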

ECE 552 / CPS Midterm

Acknowledgements
These slides contain material developed and copyrighted by:
- Arvind (MIT)
- Krste Asanovic (MIT/UCB)
- Joel Emer (Intel/MIT)
- James Hoe (CMU)
- John Kubiatowicz (UCB)
- Alvin Lebeck (Duke)
- David Patterson (UCB)
- Daniel Sorin (Duke)