
1 Faculty of Computer Science © 2006 CMPUT 229 Memory Hierarchy Part 1 Refreshing Memory

2 © 2006 Department of Computing Science CMPUT 229 Reading Assignment Optional: Bryant, Randal E. and O’Hallaron, David, Computer Systems: A Programmer’s Perspective, Prentice Hall, 2003 (B&O), Chapter 6: The Memory Hierarchy. Required: Sections 8.4 and 12.4 of the Clements textbook.

3 © 2006 Department of Computing Science CMPUT 229 Types of Memories
Read/Write Memory (RWM): we can store and retrieve data.
Random Access Memory (RAM): the time required to read or write a bit of memory is independent of the bit’s location.
Static Random Access Memory (SRAM): once a word is written to a location, it remains stored as long as power is applied to the chip, unless the location is written again.
Dynamic Random Access Memory (DRAM): the data stored at each location must be refreshed periodically by reading it and then writing it back again, or else it disappears.

4 [Figure: an 8x4 SRAM array built from individual memory cells (IN/OUT/SEL/WR), with a 3-to-8 row decoder driven by address lines A2 A1 A0, data inputs DIN0–DIN3, data outputs DOUT0–DOUT3, and control signals WE_L, CS_L, and OE_L.]

5 [Figure: the same SRAM array, highlighting the path through one data bit (DIN3/DOUT3) during an access.]

6 [Figure: the same SRAM array, continuing the access sequence for data bit 3.]

7 [Figure: the same SRAM array, completing the access sequence for data bit 3.]

8 © 2006 Department of Computing Science CMPUT 229 Refreshing the Memory [Figure: the voltage on a DRAM cell capacitor (Vcap) decays from VCC toward 0 V after a 1 is written, crossing the HIGH/LOW threshold over time unless periodic refreshes restore it.] The solution is to periodically refresh the memory cells by reading and writing back each one of them.
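As a rough back-of-the-envelope sketch of what "periodically" means (the 64 ms retention window and 8192 rows below are assumed, typical-looking values, not figures from the slide), the refresh controller must walk through the rows fast enough that every cell is rewritten before its charge leaks away:

/* Rough sketch of DRAM refresh scheduling.  The retention window and
 * row count are assumed values, not figures from the slide. */
#include <stdio.h>

int main(void)
{
    double retention_ms = 64.0;   /* assumed: every cell must be refreshed within 64 ms */
    int    rows         = 8192;   /* assumed: rows are refreshed one at a time */

    /* Spreading the refreshes evenly, the controller must refresh
     * one row about every retention_ms / rows. */
    double per_row_us = retention_ms * 1000.0 / rows;
    printf("one row refresh every ~%.1f us\n", per_row_us);   /* about 7.8 us */
    return 0;
}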

9 © 2006 Department of Computing Science CMPUT 229 SRAM with Bi-directional Data Bus [Figure: a microprocessor connected to an SRAM through a bi-directional data bus (DIO0–DIO3), with control signals WE_L, CS_L, and OE_L.]

10 © 2006 Department of Computing Science CMPUT 229 DRAM High Level View [Figure: a DRAM chip organized as a 4x4 array of supercells (rows 0–3, columns 0–3) with an internal row buffer; the memory controller sends a 2-bit address and transfers 8 bits of data to/from the CPU.] Bryant/O’Hallaron, pp. 459

11 © 2006 Department of Computing Science CMPUT 229 DRAM RAS Request [Figure: the memory controller sends RAS = 2 on the address lines; the DRAM copies row 2 of the supercell array into its internal row buffer.] RAS = Row Address Strobe. Bryant/O’Hallaron, pp. 460

12 © 2006 Department of Computing Science CMPUT 229 DRAM CAS Request [Figure: the memory controller then sends CAS = 1; the DRAM returns supercell (2,1) from the internal row buffer on the data lines.] CAS = Column Address Strobe. Bryant/O’Hallaron, pp. 460

13 Memory Modules [Figure: a 64 MB memory module built from eight 8Mx8 DRAMs. To fetch the 64-bit doubleword at main memory address A, the memory controller sends the same (row = i, col = j) address to all eight chips; DRAM 0 supplies bits 0–7, DRAM 1 bits 8–15, and so on up to DRAM 7 supplying bits 56–63, and the eight bytes are assembled into the 64-bit doubleword sent to the CPU chip.] Bryant/O’Hallaron, pp. 461
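To make the byte-lane arrangement on this slide concrete, here is a small sketch (the function name and array are mine, purely for illustration) of how the eight bytes returned by the eight x8 chips combine into one 64-bit doubleword:

/* Sketch: assembling a 64-bit doubleword from the 8 bytes supplied by
 * eight x8 DRAM chips, with DRAM 0 providing bits 0-7 and DRAM 7
 * providing bits 56-63, as on the slide.  Names are illustrative only. */
#include <stdint.h>

uint64_t assemble_doubleword(const uint8_t byte_from_chip[8])
{
    uint64_t dword = 0;
    for (int chip = 0; chip < 8; chip++) {
        /* chip k contributes bits 8*k .. 8*k+7 of the doubleword */
        dword |= (uint64_t)byte_from_chip[chip] << (8 * chip);
    }
    return dword;
}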

14 Read Cycle on an Asynchronous DRAM
Step 1: Apply the row address.
Step 2: RAS goes from high to low and remains low.
Step 3: Apply the column address.
Step 4: WE must be high.
Step 5: CAS goes from high to low and remains low.
Step 6: OE goes low.
Step 7: Data appears.
Step 8: RAS and CAS return to high.

15 © 2006 Department of Computing Science CMPUT 229 Improved DRAMs Central Idea: Each read to a DRAM actually reads a complete row of bits (a word line) from the DRAM core into an array of sense amps. A traditional asynchronous DRAM interface then selects a small number of these bits to be delivered to the cache/microprocessor. All the other bits already extracted from the DRAM cells into the sense amps are wasted.

16 © 2006 Department of Computing Science CMPUT 229 Fast Page Mode DRAMs In a DRAM with Fast Page Mode, a page is defined as all memory addresses that have the same row address. To read in fast page mode, steps 1 to 7 of a standard read cycle are performed. Then OE and CAS are switched high, but RAS remains low. Then steps 3 to 7 (providing a new column address, asserting CAS and OE) are repeated for each new memory location to be read.

17 A Fast Page Mode Read Cycle on an Asynchronous DRAM

18 © 2006 Department of Computing Science CMPUT 229 Enhanced Data Output RAMs (EDO-RAM) The process to read multiple locations in an EDO-RAM is very similar to Fast Page Mode. The difference is that the output drivers are not disabled when CAS goes high. This distinction allows the data from the current read cycle to be present at the outputs while the next cycle begins. As a result, faster read cycle times are possible.

19 An Enhanced Data Output Read Cycle on an Asynchronous DRAM

20 © 2006 Department of Computing Science CMPUT 229 Synchronous DRAMs (SDRAM) A Synchronous DRAM (SDRAM) has a clock input. It operates in a similar fashion to fast page mode and EDO DRAMs, except that consecutive data are output synchronously on the rising (or falling) edge of the clock, instead of on command by CAS. The number of data elements to be output (the length of the burst) is programmable up to the maximum size of the row. The SDRAM clock typically runs an order of magnitude faster than the rate at which individual random accesses can be completed.

21 © 2006 Department of Computing Science CMPUT 229 DDR SDRAM A Double Data Rate (DDR) SDRAM is an SDRAM that transfers data on both the rising and falling edges of the clock. Thus the effective data transfer rate of a DDR SDRAM is twice that of a standard SDRAM running at the same clock frequency.
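As a back-of-the-envelope check of the "twice" claim (the 133 MHz clock and 64-bit bus below are assumed example values, not from the slide):

/* Peak transfer rate: an SDRAM moves one bus-width transfer per clock
 * cycle; DDR uses both clock edges, doubling the rate.  The clock
 * frequency and bus width are assumed example values. */
#include <stdio.h>

int main(void)
{
    double clock_mhz = 133.0;   /* assumed clock frequency */
    int    bus_bytes = 8;       /* assumed 64-bit data bus */

    double sdr_mbytes_per_s = clock_mhz * bus_bytes;        /* one transfer per cycle  */
    double ddr_mbytes_per_s = 2.0 * clock_mhz * bus_bytes;  /* transfers on both edges */

    printf("SDRAM peak: %.0f MB/s, DDR peak: %.0f MB/s\n",
           sdr_mbytes_per_s, ddr_mbytes_per_s);   /* 1064 vs 2128 MB/s */
    return 0;
}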

22 © 2006 Department of Computing Science CMPUT 229 The Rambus DRAM (RDRAM) An RDRAM contains multiple memory arrays (banks). Rambus DRAMs are synchronous and transfer data on both edges of the clock.

23 © 2006 Department of Computing Science CMPUT 229 SDRAM Memory Systems SDRAM memory systems require complex circuits for RAS/CAS/OE. Each DIMM (Dual In-line Memory Module) is connected in parallel with the memory controller, which often requires buffering. The whole clock cycle is needed to establish valid data, and making the bus wider is mechanically complicated.

24 © 2006 Department of Computing Science CMPUT 229 RDRAM Memory Systems

25 © 2006 Department of Computing Science CMPUT 229 Locality We say that a computer program exhibits good locality if the program tends to reference data that is nearby data it has already referenced, or data that has been referenced recently. Because a program might do one of these things but not the other, the principle of locality is separated into two flavors: Temporal locality: a memory location that is referenced once is likely to be referenced multiple times in the near future. Spatial locality: if a memory location is referenced once, then locations that are nearby are likely to be referenced in the near future. Bryant/O’Hallaron, pp. 478

26 © 2006 Department of Computing Science CMPUT 229 Examples In the Sampler function below, RandInt returns a randomly selected integer within the specified interval. Which program has better locality?

int SumVec(int v[], int N)
{
    int i;
    int sum = 0;

    for (i = 0; i < N; i = i + 1)
        sum += v[i];
    return sum;
}

int Sampler(int v[], int N, int K)
{
    int i, j;
    int sum = 0;

    for (i = 0; i < K; i = i + 1)
    {
        j = RandInt(0, N - 1);
        sum += v[j];
    }
    return sum / K;
}

Bryant/O’Hallaron, pp. 479
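SumVec sweeps v sequentially, while Sampler's references to v jump around at random. Another common way to see spatial locality at work, not taken from the slide but a minimal sketch of the same idea, is the traversal order of a 2-D array stored in row-major order:

/* Sketch (not from the slide): two ways to sum a 2-D array stored in
 * row-major order.  sum_rowwise touches consecutive addresses (good
 * spatial locality); sum_colwise strides N ints between successive
 * references (poor spatial locality), despite identical arithmetic. */
#define N 1024

int sum_rowwise(int a[N][N])
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];     /* consecutive memory locations */
    return sum;
}

int sum_colwise(int a[N][N])
{
    int sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];     /* stride of N ints between accesses */
    return sum;
}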

27 Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices sit at the top; larger, slower, and cheaper (per byte) storage devices sit at the bottom.
L0: Registers. CPU registers hold words retrieved from cache memory.
L1: On-chip L1 cache (SRAM). The L1 cache holds cache lines retrieved from the L2 cache.
L2: Off-chip L2 cache (SRAM). The L2 cache holds cache lines retrieved from main memory.
L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (distributed file systems, Web servers).
Bryant/O’Hallaron, pp. 483

28 © 2006 Department of Computing Science CMPUT 229 Caching Principle [Figure: level k holds copies of a subset of the blocks (4, 9, 14, 3) out of the 16 blocks (0–15) at level k+1.] The larger, slower, cheaper storage device at level k+1 is partitioned into blocks. The smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1. Data is copied between levels in block-sized transfer units. Bryant/O’Hallaron, pp. 484

29 © 2006 Department of Computing Science CMPUT 229 Cache Misses Cold misses, or compulsory misses, occur the first time that a piece of data is referenced. Conflict misses occur when two memory references map to the same cache line; they can occur even when the remainder of the cache is not in use. Capacity misses occur when there are no more free lines in the cache, i.e. the set of blocks in active use is larger than the cache.

30 © 2006 Department of Computing Science CMPUT 229 Simplest Cache: Direct Mapped [Figure: a memory with 16 locations (addresses 0–F) mapped onto a 4-byte direct-mapped cache with cache indexes 0–3.] Location 0 of the cache can be occupied by data from memory location 0, 4, 8, C, etc.; in general, any memory location whose 2 LSBs of the address are 0s. The address determines the cache index. Which one should we place in the cache? How can we tell which one is in the cache?
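A minimal sketch of the mapping described on this slide (the loop and names are mine): the cache index is just the 2 least-significant address bits, so addresses 0x0, 0x4, 0x8, and 0xC all land in cache index 0 and can only be told apart by the remaining upper address bits, which is exactly what the tag on the next slide is for.

/* Sketch of the 4-entry direct-mapped cache on this slide: the index is
 * the 2 LSBs of the address; the remaining bits identify which memory
 * location currently occupies that cache entry. */
#include <stdio.h>

int main(void)
{
    for (unsigned addr = 0x0; addr <= 0xF; addr++) {
        unsigned index = addr & 0x3;   /* 2 LSBs select the cache entry */
        unsigned tag   = addr >> 2;    /* upper bits distinguish 0x0, 0x4, 0x8, 0xC, ... */
        printf("address 0x%X -> cache index %u, tag %u\n", addr, index, tag);
    }
    return 0;
}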

31 © 2006 Department of Computing Science CMPUT 229 1 KB Direct Mapped Cache, 32B blocks For a 2^N byte cache: the uppermost (32 - N) bits are always the cache tag; the lowest M bits are the byte select (block size = 2^M). The cache tag is stored as part of the cache "state", along with a valid bit. [Figure: the cache holds 32 lines (cache index 0–31), each with a valid bit, a cache tag, and 32 bytes of cache data (bytes 0–31, 32–63, ..., 992–1023). Example address: cache tag 0x50 in bits 31–10, cache index 0x01 in bits 9–5, byte select 0x00 in bits 4–0.]

32 © 2006 Department of Computing Science CMPUT 229 Direct-mapped Cache Clements pp. 346

33 © 2006 Department of Computing Science CMPUT 229 Identifying sets in Direct-mapped Caches Clements pp. 347

34 © 2006 Department of Computing Science CMPUT 229 Operation of a Direct-mapped Cache Clements pp. 348

35 © 2006 Department of Computing Science CMPUT 229 Fully-Associative Cache Clements pp. 348

36 © 2006 Department of Computing Science CMPUT 229 Two-way Set Associative Cache N-way set associative: there are N entries for each cache index (N direct-mapped caches operating in parallel, with N typically 2 to 4). Example: two-way set associative cache. The cache index selects a "set" from the cache; the two tags in the set are compared in parallel; data is selected based on the tag comparison result. [Figure: two banks of valid/tag/data lines indexed by the cache index; the address tag is compared against both stored tags, the comparison results are ORed to produce Hit, and a multiplexer (Sel1/Sel0) selects the matching cache block.]
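A software sketch of the lookup just described (the geometry, structures, and names are assumptions of mine; hardware compares the two tags in parallel rather than in a loop):

/* Sketch of a two-way set-associative lookup.  The geometry (128 sets,
 * 32-byte blocks) and all names are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 128   /* assumed: 2^7 sets -> 7 set-index bits */
#define WAYS       2
                       /* assumed: 32-byte blocks -> 5 offset bits */
struct line { bool valid; uint32_t tag; /* plus the block's data bytes */ };
struct set  { struct line way[WAYS]; };
static struct set cache[NUM_SETS];

bool lookup(uint32_t addr)
{
    uint32_t set_index = (addr >> 5) & (NUM_SETS - 1);  /* drop offset, keep 7 index bits */
    uint32_t tag       = addr >> 12;                    /* bits above index + offset */

    struct set *s = &cache[set_index];
    for (int w = 0; w < WAYS; w++)        /* hardware checks both ways in parallel */
        if (s->way[w].valid && s->way[w].tag == tag)
            return true;                  /* hit: this way supplies the cache block */
    return false;                         /* miss */
}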

37 © 2006 Department of Computing Science CMPUT 229 Set associative-mapped cache Clements pp. 349

38 © 2006 Department of Computing Science CMPUT 229 L1 and L2 Bus System [Figure: the CPU chip contains the register file, ALU, L1 cache, and bus interface; a cache bus connects the bus interface to the off-chip L2 cache, and the system bus connects it, through the I/O bridge and the memory bus, to main memory.] Bryant/O’Hallaron, pp. 488

39 Cache Organization A cache consists of S = 2^s sets (Set 0 through Set S-1), with E lines per set; each line holds B = 2^b bytes per cache block, t tag bits, and 1 valid bit. Cache size: C = B x E x S data bytes. Bryant/O’Hallaron, pp. 488

40 © 2006 Department of Computing Science CMPUT 229 Address Partition An m-bit address is partitioned into a tag (t bits), a set index (s bits), and a block offset (b bits). The tag is compared with tags in the cache to find a match. The set index is used to find the set where the data might be found in the cache. The block offset selects which word, inside the block, is referenced. Bryant/O’Hallaron, pp. 488
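A minimal sketch of this partition (the helper name is mine; the s = 5, b = 5 usage values correspond to the 1 KB direct-mapped cache with 32-byte blocks from the earlier slide):

/* Split an address into tag, set index, and block offset, given s and b.
 * The helper name and the example values are illustrative. */
#include <stdio.h>
#include <stdint.h>

void split_address(uint32_t addr, unsigned s, unsigned b,
                   uint32_t *tag, uint32_t *set, uint32_t *offset)
{
    *offset = addr & ((1u << b) - 1);          /* selects the word/byte inside the block */
    *set    = (addr >> b) & ((1u << s) - 1);   /* selects the set to search */
    *tag    = addr >> (s + b);                 /* compared against the stored tags */
}

int main(void)
{
    uint32_t tag, set, offset;
    split_address(0x00014020u, 5, 5, &tag, &set, &offset);
    /* prints tag=0x50 set=1 offset=0, matching the example on slide 31 */
    printf("tag=0x%X set=%u offset=%u\n",
           (unsigned)tag, (unsigned)set, (unsigned)offset);
    return 0;
}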

41 © 2006 Department of Computing Science CMPUT 229 Multi-Level Cache Organization [Figure: the CPU registers are backed by separate L1 instruction and data caches (L1 i-cache and L1 d-cache), which are backed by a unified L2 cache, then by main memory, and finally by disk.] Bryant/O’Hallaron, pp. 504

