The Memory System (in1210/01-PDS, TU-Delft)

Presentation transcript:

in1210/01-PDS 1 TU-Delft The Memory System

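in1210/01-PDS 2 TU-Delft Organization (figure: memory drawn as a column of words, each row labelled with its word address and the byte addresses of the bytes it contains)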

in1210/01-PDS 3 TU-Delft Connection Memory-CPU (figure: the CPU's MAR drives the address lines and its MDR connects to the data lines of the memory; the control lines are Read/Write and MFC)

in1210/01-PDS 4 TU-Delft Memory
• Addressable number of bits
• Different orderings
• Speed-up techniques
  - Cache memories
  - Memory interleaving
• Enlargement
  - Virtual memory

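in1210/01-PDS 5 TU-Delft Organisation(1) (figure: 16x8 memory chip; an address decoder on inputs A0-A3 selects one of the word lines W0 ... W15; each cell is a flip-flop (FF); sense/write circuits connect the bit columns to the input/output lines b7 ... b1, b0; control inputs R/W and CS)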

in1210/01-PDS 6 TU-Delft Pinning
Total pins required for 16x8 memory: 16
• 4 address lines
• 8 data lines
• 2 control lines
• 2 power lines

in1210/01-PDS 7 TU-Delft 1K by 1 memory (figure: 32 by 32 memory array; of the 10-bit address, 5 bits go to a 5-bit decoder that selects a word line, and 5 bits steer two 32-to-1 multiplexors, one for input and one for output)

in1210/01-PDS 8 TU-Delft Pinning
Total number of pins required: 16
• 10 address lines
• 2 data lines (in/out)
• 2 control lines
• 2 power lines
For 128 by 8 memory: 19 pins (7+8+2+2)
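The pin counts on the two Pinning slides follow one pattern: address lines plus data lines plus two control and two power lines. A minimal sketch of that arithmetic; the function name `pin_count` and its parameters are illustrative and do not come from the slides:

```python
def pin_count(words, data_lines, control_lines=2, power_lines=2):
    """Pins for a memory chip with `words` addressable words (a power of two)."""
    # For a power of two, (words - 1).bit_length() equals log2(words).
    address_lines = (words - 1).bit_length()
    return address_lines + data_lines + control_lines + power_lines


print(pin_count(16, 8))    # 16x8 memory
print(pin_count(1024, 2))  # 1K by 1 memory with separate in and out data lines
print(pin_count(128, 8))   # 128 by 8 memory
```

The 1K by 1 chip needs no more pins than the 16x8 chip because it trades data lines for address lines.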

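in1210/01-PDS 9 TU-Delft Multiple Modules(1): block-wise organization (figure: the MM address consists of k module bits and m bits for the address in the module; the module bits are the high-order bits and are decoded into the chip-select (CS) inputs of Module 0 ... Module i ... Module n-1)

in1210/01-PDS 10 TU-Delft Multiple Modules(2): interleaving organization (figure: the MM address consists of m bits for the address in the module and k module bits; the module bits are the low-order bits and are decoded into the chip-select (CS) inputs of Module 0 ... Module i ... Module 2**k-1)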
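The difference between the two organizations is only which address bits select the module. A small sketch, assuming a (k+m)-bit address; the function names and the values k=2, m=4 are chosen for illustration:

```python
def blockwise(address, k, m):
    """Block-wise: the k high-order bits select the module."""
    module = (address >> m) & ((1 << k) - 1)
    in_module = address & ((1 << m) - 1)
    return module, in_module


def interleaved(address, k, m):
    """Interleaved: the k low-order bits select the module."""
    module = address & ((1 << k) - 1)
    in_module = (address >> k) & ((1 << m) - 1)
    return module, in_module


K, M = 2, 4  # 4 modules of 16 locations each (illustrative sizes)
for address in range(4):
    print(address, blockwise(address, K, M), interleaved(address, K, M))
```

Consecutive addresses stay in one module in the block-wise case and rotate over the modules in the interleaved case.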

in1210/01-PDS 11 TU-Delft Question?
• What is the advantage of the interleaved organization?
• What is the disadvantage?

in1210/01-PDS 12 TU-Delft Memory Hierarchy (figure: CPU, primary cache, secondary cache, main memory, disks; size increases toward the disks, speed and cost increase toward the CPU)

in1210/01-PDS 13 TU-Delft Caches(1)
• Problem: main memory is slower than CPU registers (by a factor of 5-10)
• Solution: a small, fast memory between CPU and main memory
• Contains: recently referenced memory locations
(figure: CPU, cache, main memory)

in1210/01-PDS 14 TU-Delft Caches(2)
• Works because of the locality principle
• Profit:
  - Cache hit ratio: h
  - Access time cache: c
  - Cache miss ratio: 1-h
  - Access time main memory: m
  - Mean access time: h·c + (1-h)·m
• Cache is transparent to the programmer
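To get a feel for the mean access time formula, here is a short sketch. The numbers (hit ratio 0.95, cache access 1 time unit, main memory 10 time units) are assumptions picked for illustration, not values from the slides:

```python
def mean_access_time(h, c, m):
    """Mean access time for hit ratio h, cache time c, main memory time m."""
    return h * c + (1 - h) * m


# Assumed example values: 95% hits, cache 10 times faster than main memory.
print(mean_access_time(h=0.95, c=1.0, m=10.0))  # about 1.45 time units
```

With these assumed values the mean access time stays close to the cache access time, even though every miss costs ten times as much.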

in1210/01-PDS 15 TU-Delft Caches(3)
• At READ operation
  - If not in cache, get the block into the cache and read from the cache (possibly read-through)
  - If in cache, read from the cache
• At WRITE operation
  - If not in cache, write in main memory
  - If in cache, write in cache, and either:
    » write in main memory as well (store-through), or
    » set the modified (dirty) bit

in1210/01-PDS 16 TU-Delft Caches(3a)
Borrow books from the library and store them according to the first letter of the first author's name in 26 locations:
• Direct mapped: one location per letter, holding a single book
• Associative: any book can go to any of the 26 locations
• Set-associative: 2 locations for letters A-B, 2 for C-D, 2 for E-F, etc.

in1210/01-PDS 17 TU-Delft Caches(4)
• Suppose
  - Main memory is N = 2^n bytes
  - Divided into blocks of b = 2^k bytes
  - Cache: 128 blocks
  - e.g. n=16, k=4, b=16
• Every block in the cache has a valid bit (reset when memory is modified)
• At context switch: invalidate cache

in1210/01-PDS 18 TU-Delft Direct Mapped Cache(1)
• A block in memory (j) can be at only one place in the cache (j mod #cache blocks)
• Place determined by block number
• Main memory address: tag (5 bits) | block (7 bits) | word (4 bits)
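The 5/7/4 split can be tried out on a concrete address. The sketch below assumes the example parameters from the Caches(4) slide (16-bit addresses, 16-byte blocks, 128 cache blocks); the function name and the sample address 0xABCD are illustrative:

```python
WORD_BITS = 4    # 16 bytes per block
BLOCK_BITS = 7   # 128 cache blocks


def split_direct(address):
    """Split a 16-bit address into (tag, cache block, word within block)."""
    word = address & ((1 << WORD_BITS) - 1)
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


address = 0xABCD
tag, block, word = split_direct(address)
print(tag, block, word)

# The block field equals the memory block number j modulo the number of cache blocks.
j = address >> WORD_BITS
print(j % 128 == block)
```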

in1210/01-PDS 19 TU-Delft Direct Mapped Cache(1) (figure: main memory blocks 0 ... 127, 128 ... 255, 256, ... on one side; cache blocks 0, 1, 2, ... on the other, each with a 5-bit tag)

in1210/01-PDS 20 TU-Delft Direct Mapped Cache(1) (same figure, repeated)

in1210/01-PDS 21 TU-Delft Associative(1)
• Each block can be at any place in the cache
• On a cache access: parallel (associative) match of the tag in the address with the tags in all cache entries
• Associative: slower, more expensive, higher hit ratio
• Main memory address: tag (12 bits) | word (4 bits)

in1210/01-PDS 22 TU-Delft Associative(2) (figure: main memory blocks 0 ... 127, 128 ... 255, 256, ...; cache blocks 0, 1, 2, 3, 4, ..., each with a 12-bit tag)

in1210/01-PDS 23 TU-Delft Set-Associative(1)
• Combination of direct mapped and associative
• Cache consists of sets
• Each set is associative
• One block can be placed in only one set, determined by the set number
• Main memory address: tag (6 bits) | set (6 bits) | word (4 bits)

in1210/01-PDS 24 TU-Delft Set-Associative(2) (figure: main memory blocks 0 ... 127, 128 ... 255, 256, ...; cache blocks 0, 1, 2, 3, 4, ..., each with a 6-bit tag and grouped into set 0, set 1, ...)

in1210/01-PDS 25 TU-Delft Set-Associative(2) (same figure, repeated)

in1210/01-PDS 26 TU-Delft Question?
• Main memory: 4 GByte
• Cache: 512 blocks of 64 bytes
• Cache: 8-way set-associative
• How many bits are there in the:
  - byte address within a block
  - set number
  - tag

in1210/01-PDS 27 TU-Delft Answer!
• Main memory: 4 GByte, so a 32-bit address
• Blocks of 64 bytes, so a 6-bit byte address
• 8-way set-associative cache with 512 blocks, so 512/8=64 sets, so a 6-bit set number
• So, a 32-6-6=20-bit tag
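The same reasoning works for any cache geometry whose sizes are powers of two. A minimal sketch; `cache_fields` and its parameter names are illustrative, not part of the lecture:

```python
def cache_fields(address_bits, num_blocks, block_bytes, ways):
    """Return (tag bits, set bits, offset bits) for a set-associative cache.

    ways=1 gives a direct-mapped cache, ways=num_blocks a fully associative one.
    All sizes are assumed to be powers of two.
    """
    offset_bits = (block_bytes - 1).bit_length()
    num_sets = num_blocks // ways
    set_bits = (num_sets - 1).bit_length()
    tag_bits = address_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits


print(cache_fields(32, 512, 64, 8))    # the question above
print(cache_fields(16, 128, 16, 1))    # direct mapped lecture example
print(cache_fields(16, 128, 16, 2))    # set-associative lecture example
print(cache_fields(16, 128, 16, 128))  # associative lecture example
```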

in1210/01-PDS 28 TU-Delft Replacement(1)
(Set) associative replacement algorithms:
• Least Recently Used (LRU)
  - With 2^k blocks per set, implement with a k-bit counter per block
  - Hit: increase by 1 every counter that is lower than the counter of the referenced block, then set the referenced block's counter to 0
  - Miss and set not full: place the new block, set its counter to 0, increase the rest by 1
  - Miss and set full: replace the block whose counter has value 2^k - 1, set the new block's counter to 0, increase the rest by 1

in1210/01-PDS 29 TU-Delft Example k=2 HIT

in1210/01-PDS 30 TU-Delft Example k=2 EMPTY MISS AND SET NOT FULL

in1210/01-PDS 31 TU-Delft Example k=2 MISS AND SET FULL
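The three cases of the counter-based LRU scheme (hit, miss with room in the set, miss with a full set) can be sketched for one set as follows. The list-based representation and the name `lru_access` are assumptions made for this sketch; hardware would keep a k-bit counter next to each block:

```python
def lru_access(tags, counters, tag):
    """Reference `tag` in one set; update tags and counters in place.

    tags[i] is the tag held in block i (None if empty), counters[i] its LRU counter.
    Returns True on a hit, False on a miss.
    """
    ways = len(tags)
    if tag in tags:
        # Hit: counters lower than the referenced one go up by 1.
        pos = tags.index(tag)
        referenced = counters[pos]
        for i in range(ways):
            if tags[i] is not None and counters[i] < referenced:
                counters[i] += 1
        counters[pos] = 0
        return True
    if None in tags:
        pos = tags.index(None)           # miss, set not full: use a free block
    else:
        pos = counters.index(ways - 1)   # miss, set full: counter is 2^k - 1
    for i in range(ways):
        if tags[i] is not None and i != pos:
            counters[i] += 1
    tags[pos] = tag
    counters[pos] = 0
    return False


# k=2: four blocks per set, 2-bit counters.
tags, counters = [None] * 4, [0] * 4
for t in ["A", "B", "C", "D", "B", "E"]:
    print(t, "hit" if lru_access(tags, counters, t) else "miss", tags, counters)
```

In this run the reference to E evicts A, the block that was referenced longest ago.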

in1210/01-PDS 32 TU-Delft Replacement(2)
• Replace oldest block
• Random replacement

in1210/01-PDS 33 TU-Delft Program example: normalize the elements of the first row of A

    int SUM = 0;
    for (j = 0; j < 10; j++) {
        SUM = SUM + A[0][j];
    }
    AVE = SUM / 10;
    for (i = 9; i > -1; i--) {
        A[0][i] = A[0][i] / AVE;
    }

in1210/01-PDS 34 TU-Delft Example cache
Cache with 8 blocks (BLOCK 0 ... BLOCK 7, each with a tag), each block 1 word, LRU replacement. In the set-associative case the blocks are divided into Set 0 and Set 1.
Main memory address (16 bits):
• direct: tag (13 bits) | block (3 bits)
• associative: tag (16 bits)
• set-associative: tag (15 bits) | set (1 bit)

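in1210/01-PDS 35 TU-Delft Examples(2) (figure: the 4x10 array is stored in column order, a(0,0), a(1,0), a(2,0), a(3,0), ..., a(0,9), a(1,9), a(2,9), a(3,9), at memory addresses from 7A00; the figure marks which address bits form the tag in the direct, set-associative and associative case)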

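in1210/01-PDS 36 TU-Delft Direct mapped (figure: contents of cache after each pass, entries marked as miss or hit). Only two block positions are used. After passes j=1, 3, 5, 7, 9 and i=6, 4, 2, 0, the first holds a[0,0], a[0,2], a[0,4], a[0,6], a[0,8], a[0,6], a[0,4], a[0,2], a[0,0] and the second holds a[0,1], a[0,3], a[0,5], a[0,7], a[0,9], a[0,7], a[0,5], a[0,3], a[0,1].

in1210/01-PDS 37 TU-Delft Associative (figure: contents of cache after passes j=7, j=8, j=9, i=1, i=0). After j=7 the eight block positions hold a[0,0] ... a[0,7]. At j=8, a[0,8] replaces a[0,0]; at j=9, a[0,9] replaces a[0,1]. At i=1, a[0,1] is loaded again, and at i=0, a[0,0].

in1210/01-PDS 38 TU-Delft Set-associative (figure: contents of set 0 after passes j=3, j=7, j=9, i=4, i=2, i=0). After j=3 the four block positions of set 0 hold a[0,0] ... a[0,3]; after j=7, a[0,4] ... a[0,7]; after j=9, a[0,8], a[0,9], a[0,6], a[0,7]; after i=4, a[0,4] ... a[0,7] again; after i=2, a[0,4], a[0,5], a[0,2], a[0,3]; after i=0, a[0,0] ... a[0,3].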
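The three traces can be reproduced by replaying the array references of the program example against a small cache model. This is a sketch under stated assumptions: word addresses, a(0,0) at address 7A00 with the 4x10 array in column order (so a(0,j) is at 7A00 + 4j), one reference per loop iteration (the write in the second loop hits right after the read), and LRU within each set. The function name `count_hits` is illustrative:

```python
from collections import OrderedDict


def count_hits(addresses, num_sets, ways):
    """Count hits for a cache of num_sets sets with `ways` one-word blocks each."""
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: tags in LRU order
    hits = 0
    for address in addresses:
        lines = sets[address % num_sets]
        tag = address // num_sets
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[tag] = True
    return hits


BASE, ROWS = 0x7A00, 4
first_loop = [BASE + ROWS * j for j in range(10)]          # j = 0 .. 9
second_loop = [BASE + ROWS * i for i in range(9, -1, -1)]  # i = 9 .. 0
trace = first_loop + second_loop

print("direct mapped  :", count_hits(trace, num_sets=8, ways=1), "hits")
print("associative    :", count_hits(trace, num_sets=1, ways=8), "hits")
print("set-associative:", count_hits(trace, num_sets=2, ways=4), "hits")
```

Under these assumptions the 20 references give 2 hits for the direct mapped cache, 8 for the associative cache and 4 for the set-associative cache, because the array elements of row 0 are four words apart and so compete for only two direct-mapped block positions or a single set.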

in1210/01-PDS 39 TU-Delft PowerPC
• PowerPC 604
• Data and instruction cache
• Caches are 16 Kbytes each
• Four-way set-associative
• 128 sets, each with 4 blocks, each block 8 words of 32 bits

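in1210/01-PDS 40 TU-Delft Example (figure: set 0 holds Block 0 ... Block 3, each with a tag and status bits (st), e.g. tag 00BA2 in Block 0 and tag 003F4 in Block 3; the tag of address F4008 is compared (=?) with the stored tags, with outcome yes or no)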

in1210/01-PDS 41 TU-Delft Virtual Memory(1)
• Problem: a compiled program may not fit into memory
• Solution: virtual memory, where the logical address space is larger than the physical address space
• Logical address space: addresses that instructions can refer to
• Physical address space: addresses that exist in the real machine

in1210/01-PDS 42 TU-Delft Virtual Memory(2)
• To realize virtual memory, we need an address conversion: a_m = f(a_v)
• a_m is the physical address (machine address)
• a_v is the virtual address
• This is generally done by hardware

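in1210/01-PDS 43 TU-Delft Organization (figure: the processor sends the virtual address a_v to the MMU; the MMU sends the physical address a_m to the cache and to main memory; data flows between processor, cache and main memory; main memory and disk storage exchange data by DMA transfer)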

in1210/01-PDS 44 TU-Delft Address translation
• The basic approach is to partition both the physical address space and the virtual address space into equally sized blocks called pages
• A virtual address is composed of a page number and a word within the page, called the offset

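in1210/01-PDS 45 TU-Delft Page tables (figure: the virtual address from the processor consists of a virtual page number and an offset; the page table base register plus the virtual page number gives the address of a page table entry, which holds control bits and the page frame; page frame and offset together form the physical address)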
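The translation a_m = f(a_v) through a page table can be sketched in a few lines. The 4 KByte page size is borrowed from the question further on; the dictionary used as page table, the function name `translate` and the sample mapping are assumptions for illustration, and control bits are left out:

```python
PAGE_BITS = 12  # 4 KByte pages (assumed)


def translate(virtual_address, page_table):
    """Map a virtual address to a physical address via a page table (dict)."""
    page_number = virtual_address >> PAGE_BITS
    offset = virtual_address & ((1 << PAGE_BITS) - 1)
    if page_number not in page_table:
        raise LookupError("page fault: page %#x is not in main memory" % page_number)
    page_frame = page_table[page_number]
    return (page_frame << PAGE_BITS) | offset


page_table = {0x12345: 0x0ABC}  # hypothetical mapping: virtual page -> page frame
print(hex(translate(0x12345678, page_table)))
```

The offset passes through unchanged; only the page number is replaced by the page frame.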

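in1210/01-PDS 46 TU-Delft Associative TLB (figure: the virtual page number of the virtual address from the processor is compared (=?) with the virtual page entries of the TLB; each TLB entry holds a virtual page, control bits and the real page; on a hit, the real page becomes the page frame, which together with the offset forms the physical address; otherwise it is a miss)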

in1210/01-PDS 47 TU-Delft Policies
• Resident set: the pages that are in main memory
• The mechanism works because of the principle of locality
• Acceleration: recent address translations are kept in a separate cache (the TLB)
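Keeping recent translations in a separate cache pays off for the same reason as a data cache: locality. A minimal sketch of such a translation cache in front of a page table; the class name, the LRU replacement and the tiny capacity are assumptions of this sketch, not a description of a particular MMU:

```python
from collections import OrderedDict


class TLB:
    """Small fully associative cache of page-number-to-frame translations."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # dict: virtual page number -> page frame
        self.entries = OrderedDict()  # cached translations in LRU order
        self.hits = 0
        self.misses = 0

    def frame_for(self, page_number):
        if page_number in self.entries:
            self.hits += 1
            self.entries.move_to_end(page_number)
            return self.entries[page_number]
        self.misses += 1
        frame = self.page_table[page_number]  # slow path: page table in memory
        if len(self.entries) == self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry
        self.entries[page_number] = frame
        return frame


page_table = {page: page + 100 for page in range(8)}  # hypothetical mapping
tlb = TLB(capacity=2, page_table=page_table)
for page in [0, 0, 1, 0, 2, 1]:
    tlb.frame_for(page)
print(tlb.hits, tlb.misses)
```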

in1210/01-PDS 48 TU-Delft Replacement
• Page replacement algorithms
• Protection possible through page table register
• Sharing possible through page table
• Hardware support: Memory Management Unit (MMU)

in1210/01-PDS 49 TU-Delft Question?
• Main memory: 256 MByte
• Maximal virtual-address space: 4 GByte
• Page size: 4 KByte
• How many bits are there in the:
  - offset within a page
  - virtual page number
  - (physical) page frame number

in1210/01-PDS 50 TU-Delft Answer!
• Physical address: 256 MByte = 2^8 x 2^20 bytes, so 8+20=28 bits
• Virtual address: 4 GByte, so 32 bits
• Offset in a page: 4 KByte, so 12 bits
• Virtual page number: 32-12=20 bits
• Physical page frame number: 28-12=16 bits
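The answer can be checked mechanically. A short sketch, assuming all sizes are powers of two; the function name `vm_fields` is illustrative:

```python
def vm_fields(physical_bytes, virtual_bytes, page_bytes):
    """Return (offset bits, virtual page number bits, page frame number bits)."""
    # For a power of two, (size - 1).bit_length() equals log2(size).
    offset_bits = (page_bytes - 1).bit_length()
    virtual_bits = (virtual_bytes - 1).bit_length()
    physical_bits = (physical_bytes - 1).bit_length()
    return offset_bits, virtual_bits - offset_bits, physical_bits - offset_bits


KB, MB, GB = 2**10, 2**20, 2**30
print(vm_fields(256 * MB, 4 * GB, 4 * KB))
```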