CS510 Computer Architectures, Lecture 15: Main Memory

Main Memory Background

Performance of main memory:
- Latency: determines the cache miss penalty
  - Access Time (AT): time between a request and the word's arrival
  - Cycle Time (CT): minimum time between successive requests
- Bandwidth: determines the I/O and large-block (L2) miss penalty

Main memory, organized as a 2-D matrix of cells, is DRAM:
- Dynamic: must be refreshed periodically (about every 8 ms)
- AT and CT differ: AT < CT
- Addresses are divided into two halves, multiplexed onto the chip:
  - RAS, or Row Access Strobe
  - CAS, or Column Access Strobe

Cache uses SRAM:
- No refresh (6 transistors/bit vs. 1 transistor/bit for DRAM)
- No difference between AT and CT: AT = CT
- Address is not divided

Main Memory Background

- Size: DRAM/SRAM is about 4~8 times
- Cost and cycle time: SRAM/DRAM is about 8~16 times
- DRAM capacity: quadruples every 3 years, or about 60%/year
- RAS access time: improves only about 7% per year

Main Memory Organization

Simple:
- CPU, cache, bus, and memory are all the same width (32 bits)
- One-word-wide memory

Interleaved:
- CPU, cache, and bus are 1 word wide; memory has N modules (e.g., 4 banks)
- Consecutive words are interleaved across the banks

Wide:
- CPU/MUX path is 1 word; MUX-to-cache, bus, and memory are N words
  (Alpha: 64 bits and 256 bits)
- A multiplexer selects the desired word from the wide block

Main Memory Performance

Timing model (in cycles):
- 1 to send the address
- 6 for the access time
- 1 to send one word of data

Block access time, assuming a 4-word cache block:
- Simple miss penalty      = 4 x (1 + 6 + 1)   = 32
- Wide miss penalty        = 1 + 6 + 1         = 8
- Interleaved miss penalty = 1 + 6 + (4 x 1)   = 11
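The three miss-penalty formulas above can be sketched directly; the timing constants are the slide's assumed values (1 cycle address, 6 cycles access, 1 cycle per word transferred):

```c
/* Timing model from the slide, in cycles. */
enum { T_ADDR = 1, T_ACCESS = 6, T_XFER = 1 };

/* Simple memory: every word pays the full address+access+transfer cost. */
int simple_penalty(int block_words) {
    return block_words * (T_ADDR + T_ACCESS + T_XFER);
}

/* Wide memory: one access fetches the whole block. */
int wide_penalty(void) {
    return T_ADDR + T_ACCESS + T_XFER;
}

/* Interleaved memory: accesses overlap across the banks; only the
 * one-word transfers on the narrow bus are serialized. */
int interleaved_penalty(int block_words) {
    return T_ADDR + T_ACCESS + block_words * T_XFER;
}
```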

Technique for Higher BW: 1. Wider Main Memory

Alpha AXP: 256-bit-wide L2 cache, memory bus, and memory

Drawbacks:
- Expandability: doubling the width doubles the minimum memory increment
- Bus width: a multiplexer is needed to extract the desired word from a block
- Error correction: either keep a separate ECC for every 32 bits, or on a
  partial WRITE: read the block -> modify the word -> calculate the new ECC
  -> store the block
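The read-modify-write sequence that a block-wide ECC forces on a one-word write can be sketched as below. The `wide_block` layout and the XOR "checksum" are illustrative stand-ins only; a real memory uses a SEC/DED code, not XOR:

```c
#include <stdint.h>

/* A 256-bit (8-word) block with one check word per block (assumed layout). */
typedef struct { uint32_t word[8]; uint32_t ecc; } wide_block;

/* Toy checksum standing in for a real ECC code. */
static uint32_t ecc_of(const uint32_t w[8]) {
    uint32_t e = 0;
    for (int i = 0; i < 8; i++) e ^= w[i];
    return e;
}

/* Partial write: read block -> modify word -> recompute ECC -> store. */
void write_word(wide_block *b, int idx, uint32_t value) {
    b->word[idx] = value;        /* modify the word in the read copy */
    b->ecc = ecc_of(b->word);    /* recompute the check over the whole block */
}

/* Small demonstration helper (hypothetical): overwrite word 3, return the
 * recomputed check word. */
uint32_t demo(void) {
    wide_block b = {{1, 2, 3, 4, 5, 6, 7, 8}, 0};
    b.ecc = ecc_of(b.word);
    write_word(&b, 3, 12);
    return b.ecc;
}
```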

Technique for Higher BW: 2. Interleaved Memory

Interleaved memory vs. wide memory: consider a machine with the following
cache and memory parameters:
- memory bus width = 1 word = 32 bits
- memory accesses per instruction = 1.2
- cache miss penalty = 1 + 6 + 1 = 8 cycles
- average CPI (ignoring cache misses) = 2

    block size (words)    miss rate (%)
            1                   3
            2                   2
            4                   1

What is the improvement over the base machine (block size = 1) of 2-way and
4-way interleaving versus doubling the width of the memory and the bus?

Interleaved Memory

Answer:
  CPI = base CPI + (memory refs/instr x miss rate x miss penalty)
      = 2 + (1.2 x (0.03 for 1-word, 0.02 for 2-word, or 0.01 for 4-word
        blocks) x miss penalty)

The CPI for the base machine (simple memory, 1-word blocks):
  2 + (1.2 x 0.03 x 8) = 2.288

2-word blocks:
- 32-bit bus and memory, no interleaving: 2 + (1.2 x 0.02 x (2 x 8)) = 2.384,
  slower than the base machine
- 32-bit bus and memory, interleaving: 2 + (1.2 x 0.02 x (1 + 6 + 2 x 1)) = 2.216,
  faster than the base machine
- 64-bit bus and memory, no interleaving: 2 + (1.2 x 0.02 x 8) = 2.192,
  faster than the base machine

4-word blocks:
- 32-bit bus and memory, no interleaving: 2 + (1.2 x 0.01 x (4 x 8)) = 2.384,
  slower than the base machine
- 32-bit bus and memory, interleaving: 2 + (1.2 x 0.01 x (1 + 6 + 4 x 1)) = 2.132,
  faster than the 2-word case
- 64-bit bus and memory, no interleaving: 2 + (1.2 x 0.01 x (2 x 8)) = 2.192,
  same as the 2-word case
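The CPI arithmetic above reduces to one formula with the slide's assumed base CPI (2) and references per instruction (1.2); a minimal sketch:

```c
/* Effective CPI = base CPI + (memory refs/instr x miss rate x miss penalty).
 * 2.0 and 1.2 are the example's assumed base CPI and refs/instruction. */
double cpi(double miss_rate, int miss_penalty_cycles) {
    return 2.0 + 1.2 * miss_rate * miss_penalty_cycles;
}
```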

Technique for Higher BW: 3. Independent Memory Banks

Interleaved memory gives faster sequential accesses;
independent memory banks give faster independent accesses.

Motivation:
- Higher bandwidth for sequential accesses comes from interleaving sequential
  addresses across banks; all banks share the address lines
- Memory banks for independent accesses: each bank has its own controller and
  separate address lines
  - e.g., 1 bank for I/O, 1 bank for a cache read, 1 bank for a cache write, etc.
  - if a single controller controls all the banks, it can provide fast access
    for only one operation at a time
  - independent banks also benefit miss-under-miss in non-blocking caches

Superbank: all memory banks active on one block transfer
Bank: the word-interleaved portion within a superbank

Address layout: | superbank number | superbank offset |,
where superbank offset = | bank number | bank offset |
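The address split above can be sketched as bit-field extraction, assuming power-of-two field widths; the concrete widths here (1024 words per bank, 4 banks per superbank) are illustrative assumptions, not values from the slide:

```c
#define BANK_OFFSET_BITS 10   /* 1024 words per bank   (assumed) */
#define BANK_NUM_BITS     2   /* 4 banks per superbank (assumed) */

/* Low bits: offset of the word within its bank. */
unsigned bank_offset(unsigned addr) {
    return addr & ((1u << BANK_OFFSET_BITS) - 1);
}

/* Middle bits: which bank inside the superbank. */
unsigned bank_number(unsigned addr) {
    return (addr >> BANK_OFFSET_BITS) & ((1u << BANK_NUM_BITS) - 1);
}

/* High bits: which superbank. */
unsigned superbank_number(unsigned addr) {
    return addr >> (BANK_OFFSET_BITS + BANK_NUM_BITS);
}
```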

Independent Memory Banks

How many banks?
- For sequential accesses, a new bank should deliver a word on each clock
- So: number of banks >= number of clocks needed to access a word in a bank
- Otherwise the access stream returns to a bank before that bank has its next
  word ready

Increasing the capacity of a DRAM chip means fewer chips are needed to build
a memory system of the same capacity, which makes it harder to have many banks.

Technique for Higher BW: 4. Avoiding Bank Conflicts

Even with many banks, certain regular access patterns still cause bank
conflicts, e.g., storing a 256 x 512 array across 128 banks and processing it
by column (512 is an even multiple of 128, so all the elements of a column
fall in the same bank):

  int x[256][512];
  for (j = 0; j < 512; j = j + 1)
      for (i = 0; i < 256; i = i + 1)
          x[i][j] = 2 * x[i][j];

The inner loop walks down a column; successive accesses are 512 words apart,
so every access of the inner loop hits the same bank.

Avoiding Bank Conflicts

SW approaches:
- Loop interchange, to avoid repeatedly accessing the same bank
- Declare the array size not to be a power of 2 (the number of banks is a
  power of 2), so that the addresses of a column's elements are spread across
  different banks

HW: a prime number of banks
- bank number = address MOD (number of banks)
- address within bank = address / (number of banks)
- To avoid a divide on every memory access, use instead
  address within bank = address MOD (number of words in a bank)
  e.g., 31 MOD 7 = 3
- A MOD is cheap (a bit mask) when the modulus is a power of 2; the number of
  banks is prime, but the number of words per bank can be a power of 2
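The loop-interchange fix mentioned above can be sketched by swapping the loops of the earlier example, so the inner loop walks along a row and consecutive accesses fall in consecutive words, hence different banks:

```c
#define ROWS 256
#define COLS 512
static int x[ROWS][COLS];

/* Interchanged loops: j (the row walk) is now innermost, so successive
 * accesses are one word apart instead of 512 words apart. */
static void double_elements(void) {
    for (int i = 0; i < ROWS; i = i + 1)
        for (int j = 0; j < COLS; j = j + 1)
            x[i][j] = 2 * x[i][j];
}

/* Small demonstration helper (hypothetical): set one element, double the
 * array, and return that element. */
int demo(void) {
    x[5][7] = 21;
    double_elements();
    return x[5][7];
}
```

The result is unchanged for any iteration order here, since each element is updated independently; only the bank access pattern improves.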

Chinese Remainder Theorem: Fast Bank Number

If two sets of integers a_i and b_i follow these rules:

  b_i = x MOD a_i,   0 <= b_i < a_i,   0 <= x < a_0 * a_1 * a_2 * ...

and the a_i are pairwise co-prime (a_i and a_j co-prime for i != j), then the
integer x has only one solution (an unambiguous mapping):

  bank number         = b_0 = x MOD a_0;  number of banks = a_0 (3 in the example)
  address within bank = b_1 = x MOD a_1;  size of a bank  = a_1 (8 in the example)

N words at addresses 0 to N-1; a prime number of banks (3); words per bank a
power of 2 (8).

Fast Bank Numbers

Example: 3 banks, 8 words per bank, addresses 0-23

                Sequentially Interleaved      Modulo Interleaved
Addr in bank    Bank 0  Bank 1  Bank 2        Bank 0  Bank 1  Bank 2
     0             0       1       2             0      16       8
     1             3       4       5             9       1      17
     2             6       7       8            18      10       2
     3             9      10      11             3      19      11
     4            12      13      14            12       4      20
     5            15      16      17            21      13       5
     6            18      19      20             6      22      14
     7            21      22      23            15       7      23

For address = 5:
- Bank number = (5) MOD (3) = 2 in both schemes
- Address in bank: sequentially interleaved, 5 / 3 = 1;
  modulo interleaved, (5) MOD (8) = 5
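The modulo-interleaved mapping of the example can be sketched and checked for uniqueness, which is exactly what the Chinese Remainder Theorem guarantees for co-prime bank count and bank size:

```c
/* Modulo interleaving: bank and offset are both residues of the address. */
int mod_bank(int addr, int banks)        { return addr % banks; }
int mod_offset(int addr, int bank_size)  { return addr % bank_size; }

/* Returns 1 if every address 0..banks*bank_size-1 lands in a distinct
 * (bank, offset) slot, 0 if any two addresses collide. */
int mapping_is_unique(int banks, int bank_size) {
    int seen[64] = {0};                  /* large enough for the 3 x 8 example */
    for (int a = 0; a < banks * bank_size; a++) {
        int slot = mod_bank(a, banks) * bank_size + mod_offset(a, bank_size);
        if (seen[slot]++) return 0;
    }
    return 1;
}
```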

Technique for Higher BW: 5. DRAM-Specific Interleaving

DRAM access = Row Access (RAS) + Column Access (CAS)
- Multiple column accesses to one RAS row buffer, sold under several names
  (page mode)
- 64-Mbit DRAM: cycle time = 100 ns, page-mode access = 20 ns

New DRAMs to address the CPU-DRAM speed gap; what will they cost, and will
they survive?
- Synchronous DRAM: a clock signal is provided to the DRAM, so transfers are
  synchronous with the system clock
- RAMBUS (a startup company): reinvents the DRAM interface
  - each chip acts as a module rather than a slice of memory (or a bank)
  - short bus between CPU and chips
  - does its own refresh
  - variable amount of data returned
  - 1 byte / 2 ns (500 MB/s per chip)
- Niche memory only, or main memory?
  - e.g., video RAM for frame buffers: DRAM + fast serial output

Main Memory Summary

- Wider memory: fetches a whole block in one access
- Interleaved memory: for sequential or independent accesses
- Avoiding bank conflicts: SW (loop interchange, array padding) and HW
  (prime number of banks)
- DRAM-specific optimizations: page mode and specialty DRAMs