Published by Cora Richard. Modified over 7 years ago.
1
Administration
Midterm on Thursday, Oct 28. Covers material through 10/21.
Histogram of grades for HW#1 posted on the newsgroup.
Sample problem set (and solutions) on pipelining are posted on the web page.
Last year’s practice exam and last year’s midterm (with solutions) are on the web site (under “Exams”).
2
Main Memory Background
Performance of Main Memory:
  Latency: Cache Miss Penalty
    Access Time: time between the request and the arrival of the word
    Cycle Time: time between requests
  Bandwidth: I/O & Large Block Miss Penalty (L2)
Main Memory is DRAM: Dynamic Random Access Memory
  Dynamic since it needs to be refreshed periodically (8 ms, 1% of the time)
  Addresses divided into 2 halves (memory as a 2D matrix):
    RAS or Row Access Strobe
    CAS or Column Access Strobe
Cache uses SRAM: Static Random Access Memory
  No refresh (6 transistors/bit vs. 1 transistor/bit)
  Size: DRAM/SRAM 4-8; Cost & Cycle time: SRAM/DRAM 8-16
3
DRAM Organization
Row and column addresses are sent separately because pins and packaging are expensive, so the address bus is half as wide.
RAS (Row Access Strobe) typically comes first.
Some DRAMs allow multiple CAS accesses for the same RAS (page mode).
Refresh: a write follows every read, because reading wipes out the data.
Refresh: periodic. Each row is read every 8 ms.
Cost is O(sqrt(capacity)).
4
4 Key DRAM Timing Parameters
tRAC: minimum time from RAS line falling to the valid data output.
  Quoted as the speed of a DRAM when you buy it (it is the number on the purchase sheet).
  A typical 4 Mbit DRAM has tRAC = 60 ns.
tRC: minimum time from the start of one row access to the start of the next.
  tRC = 110 ns for a 4 Mbit DRAM with a tRAC of 60 ns.
tCAC: minimum time from CAS line falling to valid data output.
  15 ns for a 4 Mbit DRAM with a tRAC of 60 ns.
tPC: minimum time from the start of one column access to the start of the next.
  35 ns for a 4 Mbit DRAM with a tRAC of 60 ns.
5
Example Memory Performance
[Timing diagram, built up step by step over slides 5 to 12]
Row Access: 60 ns; Row Cycle: 110 ns; Column Access: 15 ns; Column Cycle: 35 ns
Single access (The Bad News): data is valid 60 ns after RAS, but the next RAS cannot start until 110 ns after the first. After a CAS, data is valid in 15 ns, followed by a wait until the 35 ns column cycle has elapsed.
Multiple accesses (The Good News): after one RAS, repeated CAS accesses to the same row each deliver data in 15 ns and can start every 35 ns.
But refresh is a RAS...
13
DRAM Performance
A 60 ns (tRAC) DRAM can
  perform a row access only every 110 ns (tRC)
  perform a column access (tCAC) in 15 ns, but the time between column accesses is at least 35 ns (tPC).
    In practice, external address delays and turning around buses make it 40 to 50 ns.
These times do not include the time to drive the addresses off the microprocessor nor the memory controller overhead!
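The gap between row cycle time and column cycle time is easiest to see with numbers. The sketch below uses the slide's 4 Mbit figures and a deliberately simplified model, which is an assumption rather than something taken from a datasheet: every word needs its own row access in the first case, and all words share one open row (page mode) in the second. The function names are hypothetical, and controller overhead is ignored, as the slide warns.

```python
# Timing parameters from the slides, in nanoseconds.
T_RAC = 60   # RAS falling to valid data
T_RC = 110   # start of one row access to start of the next
T_PC = 35    # start of one column access to start of the next


def time_separate_rows(n_words):
    """Time until the last word is valid when each word needs its own RAS.

    Assumption: row accesses start back to back every T_RC, and each one
    delivers its word T_RAC after it starts.
    """
    return (n_words - 1) * T_RC + T_RAC


def time_page_mode(n_words):
    """Time until the last word is valid when all words sit in one row.

    Assumption: the first word arrives T_RAC after RAS, and each further
    word arrives one column cycle (T_PC) after the previous one.
    """
    return T_RAC + (n_words - 1) * T_PC


for n in (1, 4, 8):
    print(n, time_separate_rows(n), time_page_mode(n))
```

For a 4-word block this model gives 390 ns with separate row accesses and 165 ns in page mode.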
14
DRAM Trends
DRAMs: capacity +60%/yr, cost –30%/yr
  2.5X cells/area, 1.5X die size in about 3 years
‘98 DRAM fab line costs $2B
Commodity, second-source industry => high volume, low profit, conservative
Order of importance: 1) Cost/bit 2) Capacity
  First RAMBUS: 10X BW, +30% cost => little impact
Gigabit DRAM will take over market
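The two capacity figures on this slide can be checked against each other. The sketch below assumes the cell-density and die-size gains apply over about three years and that chip capacity is their product. The helper name is hypothetical.

```python
def compound(annual_factor, years):
    """Growth factor after compounding an annual factor for some years."""
    return annual_factor ** years


# Capacity +60%/yr compounded over three years.
from_annual_rate = compound(1.60, 3)

# Capacity as (cells per area) x (die area) over the same period.
from_density_and_area = 2.5 * 1.5

print(from_annual_rate, from_density_and_area)
```

Both routes land near 4X per generation (about 4.1 versus 3.75), so the two statements are roughly consistent.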
15
Main Memory Performance
Simple: CPU, Cache, Bus, Memory same width (32 or 64 bits)
Wide: CPU/Mux 1 word; Mux/Cache, Bus, Memory N words (Alpha: 64 bits & 256 bits; UltraSPARC 512)
Interleaved: CPU, Cache, Bus 1 word; Memory N modules (4 modules); the example is word interleaved (logically “wide”).
16
Why not have wider memory?
Pins, packaging.
CPU accesses a word at a time, so a multiplexer is needed in the critical path.
Unit of expansion.
ECC (need to read the full ECC block on every write to a portion of the block).
17
Main Memory Performance
Timing model (word size is 32 bits)
  1 to send address, 6 access time, 1 to send data
  Cache block is 4 words
Simple M.P. = 4 x (1 + 6 + 1) = 32
Wide M.P. = 1 + 6 + 1 = 8
Interleaved M.P. = 1 + 6 + 4 x 1 = 11
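The three miss penalties follow directly from the timing model, so they can be written as small functions. This is a sketch under the slide's assumptions (1 cycle to send the address, 6 cycles of access time, 1 cycle to send a word, 4-word block). The function and parameter names are illustrative.

```python
def simple_miss_penalty(words, addr=1, access=6, xfer=1):
    """One-word-wide memory: every word pays address + access + transfer."""
    return words * (addr + access + xfer)


def wide_miss_penalty(addr=1, access=6, xfer=1):
    """Memory and bus as wide as the block: one access moves the whole block."""
    return addr + access + xfer


def interleaved_miss_penalty(words, addr=1, access=6, xfer=1):
    """One bank per word: accesses overlap, words cross the bus one at a time."""
    return addr + access + words * xfer


print(simple_miss_penalty(4), wide_miss_penalty(), interleaved_miss_penalty(4))
```

Interleaving gets close to the wide organization's penalty while keeping a one-word bus, which is the point of the comparison.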
18
Main Memory Performance
Timing model (word size is 32 bits)
  1 to send address, 6 access time, 2 (or more) to send data
  Cache block is 4 words
Interleaved M.P. = 1 + 6 + 4 x 1 = 11
Independent reads or writes: no need to wait as long as the next operation is to a different bank.
19
Independent Memory Banks
Memory banks for independent accesses vs. faster sequential accesses
  Multiprocessor
  I/O
  CPU with Hit under n Misses, Non-blocking Cache
Superbank: all memory active on one block transfer (or Bank)
Bank: portion within a superbank that is word interleaved (or Subbank)
[Figure: Superbank, Bank]
20
Independent Memory Banks
How many banks?
  number of banks >= number of clocks to access a word in a bank
  This holds for sequential accesses; otherwise the access stream returns to the original bank before it has the next word ready.
Increasing DRAM capacity => fewer chips => harder to have banks
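The bank-count rule can be checked with a toy simulation. The sketch below assumes one new word is requested per clock, words are word-interleaved across banks, and a bank is busy for a fixed number of clocks after it starts an access. The function name and the model itself are assumptions made for illustration.

```python
def sequential_stalls(num_banks, bank_busy_clocks, num_words):
    """Count stall clocks for a sequential access stream over interleaved banks."""
    free_at = [0] * num_banks        # clock at which each bank is ready again
    clock = 0
    stalls = 0
    for word in range(num_words):
        bank = word % num_banks      # word interleaving
        if free_at[bank] > clock:    # bank still busy with its previous word
            stalls += free_at[bank] - clock
            clock = free_at[bank]
        free_at[bank] = clock + bank_busy_clocks
        clock += 1                   # next request issues one clock later
    return stalls


print(sequential_stalls(8, 6, 32))   # banks >= access clocks
print(sequential_stalls(4, 6, 8))    # banks <  access clocks
```

With at least as many banks as access clocks the stream never waits. With fewer, it stalls as soon as it wraps around to a bank that is still busy.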
21
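DRAMs per PC over Time
Number of DRAM chips per PC, by minimum memory size (rows) and DRAM generation (columns); "-" marks combinations not listed.

Minimum        DRAM Generation
Memory Size    ‘86     ‘89     ‘92     ‘96     ‘99     ‘02
               1 Mb    4 Mb    16 Mb   64 Mb   256 Mb  1 Gb
4 MB           32      8       -       -       -       -
8 MB           -       16      4       -       -       -
16 MB          -       -       8       2       -       -
32 MB          -       -       -       4       1       -
64 MB          -       -       -       8       2       -
128 MB         -       -       -       -       4       1
256 MB         -       -       -       -       8       2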
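The chip counts are memory size divided by chip capacity, with bytes converted to bits. A minimal sketch, assuming binary units (1 MB = 8 Mb, 1 Gb = 1024 Mb) and a hypothetical helper name:

```python
def chips_needed(memory_mbytes, chip_mbits):
    """Number of DRAM chips needed to build a memory of the given size."""
    total_mbits = memory_mbytes * 8
    # Round up so a partial chip still counts as one chip.
    return -(-total_mbits // chip_mbits)


print(chips_needed(4, 1))       # 4 MB from 1 Mb chips
print(chips_needed(32, 256))    # 32 MB from 256 Mb chips
print(chips_needed(256, 1024))  # 256 MB from 1 Gb chips
```

As chip capacity grows faster than minimum memory size, the count falls toward one or two chips, which is why independent banks become harder to provide.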
22
Fast Memory Systems: DRAM specific
Multiple CAS accesses: several names (page mode)
  Extended Data Out (EDO): 30% faster in page mode
New DRAMs to address gap:
  RAMBUS: reinvent DRAM interface
    Each chip a module vs. slice of memory
    Short bus between CPU and chips
    Does own refresh
    Variable amount of data returned
    1 byte / 2 ns (500 MB/s per chip)
  Synchronous DRAM: 2 banks on chip, a clock signal to DRAM, transfer synchronous to system clock ( MHz)
RAMBUS first seen as niche (e.g. video memory), now poised to become standard.
23
Main Memory Organization DRAM/Disk interface
24
Four Questions for Memory Hierarchy Designers
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)
25
Four Questions Applied to Virtual Memory
Q1: Where can a block be placed in the upper level? (Block placement: fully-associative vs. page coloring)
Q2: How is a block found if it is in the upper level? (Block identification: translation & lookup, page tables and TLB)
Q3: Which block should be replaced on a miss? (Block replacement: random vs. LRU)
Q4: What happens on a write? (Write strategy: copy-on-write, protection, etc.)
Q?: protection? demand-load vs. prefetch? fixed vs. variable size? unit of transfer vs. frame size?
26
Paging Organization
Page: the size of the information blocks that are transferred from secondary to main storage (M).
Virtual and physical address space are partitioned into blocks of equal size: pages and page frames.
[Figure: hierarchy of reg, cache, mem, disk; pages on disk, frames in memory]
27
Addresses in Virtual Memory
3 addresses to consider
  Physical address: where in main memory the frame is stored
  Virtual address: a logical address, relative to a process / name space / page table
  Disk address: specifies where on disk the page is stored
Disk addresses can be either physical (specifying cylinder, block, etc.) or indirect (another level of naming, even a file system or segment).
Virtual addresses logically include a process_id concatenated to an n-bit address.
28
Virtual Address Space and Physical Address Space sizes
From the point of view of the hierarchy, the disk will have more capacity than DRAM.
BUT, this does not mean that the virtual address space will be bigger than the physical address space.
Virtual addresses provide protection and a naming mechanism.
A long, long time ago, some machines had a physical address space bigger than the virtual address space, and more core than virtual address space (multiple processes in memory at the same time).
29
Address Map
V = {0, 1, ..., n - 1} virtual address space
M = {0, 1, ..., m - 1} physical address space
MAP: V --> M U {0} address mapping function
n > m (n = m and n < m are history)
MAP(a) = a' if data at virtual address a is present at physical address a' and a' in M
       = 0 if data at virtual address a is not present in M (a missing item fault)
[Figure: the processor issues address a from name space V to the address translation mechanism. If the item is present, physical address a' goes to main memory. If not, a missing item fault invokes the fault handler, and the OS performs the transfer from secondary memory to main memory.]
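The mapping function and the fault path can be sketched in a few lines. This is an illustration only: `MISSING` is a hypothetical stand-in for the slide's "not present" result, the resident set is a plain dictionary, and the fault handler simply takes a free frame without doing any replacement or disk I/O.

```python
MISSING = None  # hypothetical marker for "data at a is not present in M"


def MAP(resident, a):
    """Return physical address a' for virtual address a, or MISSING."""
    return resident.get(a, MISSING)


def access(resident, free_frames, a):
    """Translate a, running a toy fault handler on a missing item fault."""
    a_prime = MAP(resident, a)
    if a_prime is MISSING:
        # Fault handler (the OS in the slide): bring the item into a free
        # frame. No replacement policy is modeled here.
        a_prime = free_frames.pop()
        resident[a] = a_prime
    return a_prime


resident = {3: 0}          # virtual address 3 currently lives at physical 0
free_frames = [2, 1]
print(access(resident, free_frames, 3))   # present: no fault
print(access(resident, free_frames, 7))   # missing: handled, then mapped
```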
30
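Paging Organization
[Figure: physical memory is divided into 1K frames, frame 0 through frame 7, at P.A. 0, 1024, ..., 7168. Virtual memory is divided into 1K pages, page 0 through page 31, at V.A. 0, 1024, ..., 31744. The address translation MAP takes pages to frames.]
The page is the unit of mapping and also the unit of transfer from virtual to physical memory.
Address Mapping
  The VA is split into a page no. and a 10-bit disp.
  The Page Table Base Reg plus the page no. gives an index into the page table; the table is located in physical memory.
  Each page table entry holds V, Access Rights, and PA.
  PA + disp gives the physical memory address (actually, concatenation is more likely).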
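The address mapping in the figure can be sketched as follows, assuming 1K pages (a 10-bit displacement) and a page table held as a dictionary from page number to a (valid, frame number) pair. The table layout and names are assumptions for illustration; access rights are omitted.

```python
PAGE_BITS = 10                 # 1K pages, so the displacement is 10 bits
PAGE_SIZE = 1 << PAGE_BITS


def translate(page_table, va):
    """Translate a virtual address to a physical address."""
    page_no = va >> PAGE_BITS          # upper bits index the page table
    disp = va & (PAGE_SIZE - 1)        # lower 10 bits pass through unchanged
    valid, frame_no = page_table[page_no]
    if not valid:
        raise LookupError("page fault on page %d" % page_no)
    # Concatenate frame number and displacement, as the slide suggests.
    return (frame_no << PAGE_BITS) | disp


page_table = {0: (True, 7), 1: (True, 0), 2: (False, 0)}
print(translate(page_table, 5))        # page 0 -> frame 7
print(translate(page_table, 1124))     # page 1 -> frame 0
```

Because frames are aligned to the page size, concatenation and addition give the same result; concatenation just avoids the adder.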
31
Virtual Address and a Cache
[Figure: the CPU sends a VA to translation, which sends a PA to the cache; a hit returns data to the CPU, a miss goes to main memory.]
It takes an extra memory access to translate a VA to a PA.
This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible.
Virtually Addressed Caches Revisited
ASIDE: Why access the cache with a PA at all? VA caches have a problem!
  Synonym / alias problem: two different virtual addresses map to the same physical address => two different cache entries holding data for the same physical address!
  For an update: must update all cache entries with the same physical address, or memory becomes inconsistent.
  Determining this requires significant hardware, essentially an associative lookup on the physical address tags to see if you have multiple hits;
  or a software-enforced alias boundary: same LSBs of VA & PA > cache size.
32
TLBs
A way to speed up translation is to use a special cache of recently used page table entries. This has many names, but the most frequently used is Translation Lookaside Buffer, or TLB.
TLB entry fields: Virtual Address, Physical Address, Dirty, Ref, Valid, Access
Really just a cache on the page table mappings.
TLB access time is comparable to cache access time (much less than main memory access time).
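Since a TLB is just a cache on the page table mappings, a toy version fits in a few lines. The sketch below assumes a fully associative TLB with LRU replacement and a page table held as a dictionary from page number to frame number. The slide specifies neither the organization nor the replacement policy, so both are assumptions, and the class name is illustrative.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB with LRU replacement (assumed policy)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table     # backing page table: page -> frame
        self.entries = OrderedDict()     # cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, page_no):
        if page_no in self.entries:
            self.hits += 1
            self.entries.move_to_end(page_no)    # mark as most recently used
            return self.entries[page_no]
        # Miss: take the slow path through the page table in main memory.
        self.misses += 1
        frame_no = self.page_table[page_no]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)     # evict least recently used
        self.entries[page_no] = frame_no
        return frame_no


tlb = TLB(2, {0: 5, 1: 3, 2: 7})
print([tlb.lookup(p) for p in (0, 1, 0, 2, 1)], tlb.hits, tlb.misses)
```

Only hits avoid the extra memory access for translation, so the hit rate determines how much of that cost the TLB actually hides.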