1
CMPUT 229 - Computer Organization and Architecture I
CMPUT 229 - Fall 2003
Topic D: The Memory Hierarchy
José Nelson Amaral
2
Reading Assignment
Bryant, Randal E., and O’Hallaron, David, Computer Systems: A Programmer’s Perspective, Prentice Hall, 2003 (B&H).
Chapter 6: The Memory Hierarchy
3
Types of Memories
Read/Write Memory (RWM): we can store and retrieve data.
Random Access Memory (RAM): the time required to read or write a bit of memory is independent of the bit’s location.
Static Random Access Memory (SRAM): once a word is written to a location, it remains stored as long as power is applied to the chip, unless the location is written again.
Dynamic Random Access Memory (DRAM): the data stored at each location must be refreshed periodically by reading it and then writing it back again, or else it disappears.
4
[Figure: 8 x 4 SRAM array. A 3-to-8 decoder driven by address inputs A2, A1, A0 selects one of eight rows (0 to 7) of cells; each cell has IN, OUT, SEL, and WR terminals. Data inputs DIN3 to DIN0, data outputs DOUT3 to DOUT0, control inputs WE_L, CS_L, and OE_L, internal signals WR_L and IOE_L.]
5
[Figure: SRAM array with a 3-to-8 decoder (address inputs A2, A1, A0), showing data input DIN3 and data output DOUT3; control inputs WE_L, CS_L, and OE_L, internal signals WR_L and IOE_L.]
8
Refreshing the Memory
[Figure: capacitor voltage Vcap versus time, between 0 V and VCC, with HIGH and LOW thresholds. A stored 0 stays low; after a 1 is written the voltage decays and is restored by periodic refreshes.]
The solution is to periodically refresh the memory cells by reading each one and writing it back.
9
SRAM with Bi-directional Data Bus
[Figure: SRAM cells (IN, OUT, SEL, WR) whose inputs and outputs share the bidirectional data pins DIO3 to DIO0, connected to the microprocessor; control inputs WE_L, CS_L, and OE_L, internal signals WR_L and IOE_L.]
10
DRAM High Level View
[Figure: DRAM chip organized as a 4 x 4 array of supercells (rows 0 to 3, columns 0 to 3) with an internal row buffer. The memory controller (connected to the CPU) drives a 2-bit addr bus and an 8-bit data bus.]
Bryant/O’Hallaron, p. 459
11
DRAM RAS Request
RAS = Row Address Strobe
[Figure: the memory controller sends RAS = 2 on the 2-bit addr bus; the DRAM chip copies row 2 of the 4 x 4 supercell array into the internal row buffer.]
Bryant/O’Hallaron, p. 460
12
DRAM CAS Request
CAS = Column Address Strobe
[Figure: the memory controller sends CAS = 1 on the 2-bit addr bus; the DRAM chip returns supercell (2,1) from the internal row buffer on the 8-bit data bus.]
Bryant/O’Hallaron, p. 460
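The RAS and CAS requests can be mimicked in a few lines of C. This is only a sketch of the 4 x 4 supercell chip in the figures: the type `Dram` and the functions `dram_ras`, `dram_cas`, and `dram_read` are hypothetical names introduced here, and timing and refresh are ignored.

```c
#include <stdint.h>

#define DRAM_ROWS 4
#define DRAM_COLS 4

/* Hypothetical model of the 16-supercell DRAM chip in the figures. */
typedef struct {
    uint8_t cells[DRAM_ROWS][DRAM_COLS]; /* one 8-bit supercell per (row, col) */
    uint8_t row_buffer[DRAM_COLS];       /* internal row buffer */
    int     open_row;                    /* row held in the buffer, -1 if none */
} Dram;

/* RAS request: copy the whole addressed row into the internal row buffer. */
void dram_ras(Dram *d, int row)
{
    for (int c = 0; c < DRAM_COLS; c++)
        d->row_buffer[c] = d->cells[row][c];
    d->open_row = row;
}

/* CAS request: return one supercell from the row buffer. */
uint8_t dram_cas(const Dram *d, int col)
{
    return d->row_buffer[col];
}

/* Read supercell (row, col): a RAS request followed by a CAS request. */
uint8_t dram_read(Dram *d, int row, int col)
{
    dram_ras(d, row);
    return dram_cas(d, col);
}
```

Reading supercell (2,1) corresponds to `dram_read(&d, 2, 1)`; a second `dram_cas` call on the same open row needs no new RAS, which is the property the improved DRAM interfaces exploit.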
13
Memory Modules
[Figure: 64 MB memory module consisting of eight 8M x 8 DRAMs (DRAM 0 to DRAM 7). The memory controller sends addr (row = i, col = j) to every DRAM, and each DRAM returns the byte stored in its supercell (i,j). DRAM 0 supplies bits 0-7, the next DRAMs supply bits 8-15, 16-23, 24-31, 32-39, 40-47, and 48-55, and DRAM 7 supplies bits 56-63. Together they form the 64-bit doubleword at main memory address A, which is sent to the CPU chip.]
Bryant/O’Hallaron, p. 461
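A minimal sketch of how the eight bytes could be combined into one doubleword, assuming (as in the figure) that DRAM k supplies bits 8k to 8k+7. The function name `assemble_doubleword` is hypothetical.

```c
#include <stdint.h>

#define NUM_DRAMS 8

/* Combine the bytes returned by DRAM 0..7 for the same supercell (i, j)
 * into one 64-bit doubleword; DRAM k supplies bits 8k .. 8k+7. */
uint64_t assemble_doubleword(const uint8_t bytes[NUM_DRAMS])
{
    uint64_t word = 0;
    for (int k = 0; k < NUM_DRAMS; k++)
        word |= (uint64_t)bytes[k] << (8 * k);
    return word;
}
```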
14
Read Cycle on an Asynchronous DRAM
Step 1: Apply row address.
Step 2: RAS goes from high to low and remains low.
Step 3: Apply column address.
Step 4: WE must be high.
Step 5: CAS goes from high to low and remains low.
Step 6: OE goes low.
Step 7: Data appears.
Step 8: RAS and CAS return to high.
15
Improved DRAMs
Central Idea: Each read to a DRAM actually reads a complete row of bits, or word line, from the DRAM core into an array of sense amps. A traditional asynchronous DRAM interface then selects a small number of these bits to be delivered to the cache/microprocessor. All the other bits already extracted from the DRAM cells into the sense amps are wasted.
16
Fast Page Mode DRAMs
In a DRAM with Fast Page Mode, a page is defined as all memory addresses that have the same row address. To read in fast page mode, steps 1 to 7 of a standard read cycle are performed. Then OE and CAS are switched high, but RAS remains low. Steps 3 to 7 (providing a new column address, asserting CAS and OE) are then repeated for each new memory location to be read.
17
A Fast Page Mode Read Cycle on an Asynchronous DRAM
18
Enhanced Data Output RAMs (EDO-RAM)
The process to read multiple locations in an EDO-RAM is very similar to Fast Page Mode. The difference is that the output drivers are not disabled when CAS goes high. This distinction allows the data from the current read cycle to remain present at the outputs while the next cycle begins. As a result, faster read cycle times are possible.
19
An Enhanced Data Output Read Cycle on an Asynchronous DRAM
20
Synchronous DRAMs (SDRAM)
A Synchronous DRAM (SDRAM) has a clock input. It operates similarly to fast page mode and EDO DRAM. However, consecutive data is output synchronously on a clock edge (falling or rising), instead of on command by CAS. The number of data elements output (the length of the burst) is programmable up to the maximum size of the row. The clock period of an SDRAM is typically one order of magnitude shorter than the access time of an individual access.
21
DDR SDRAM
A Double Data Rate (DDR) SDRAM is an SDRAM that transfers data on both the rising and the falling edge of the clock. Thus the effective data transfer rate of a DDR SDRAM is twice that of a standard SDRAM with the same clock frequency.
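The factor of two can be checked with a short calculation. The sketch below assumes the usual peak-rate relation (clock frequency times bus width in bytes times transfers per clock); the function name and the 100 MHz, 64-bit figures used in the note are illustrative assumptions, not values from the slides.

```c
/* Peak transfer rate in bytes per second.
 * transfers_per_clock is 1 for a standard SDRAM and 2 for a DDR SDRAM. */
double peak_rate_bytes_per_s(double clock_hz, int bus_width_bits,
                             int transfers_per_clock)
{
    return clock_hz * (bus_width_bits / 8.0) * transfers_per_clock;
}
```

For an assumed 100 MHz clock and a 64-bit bus, this gives 800 MB/s for a standard SDRAM and 1600 MB/s for a DDR SDRAM.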
22
The Rambus DRAM (RDRAM)
Multiple memory arrays (banks).
Rambus DRAMs are synchronous and transfer data on both edges of the clock.
23
SDRAM Memory Systems
Complex circuits for RAS/CAS/OE.
Each DIMM is connected in parallel with the memory controller (DIMM = Dual In-line Memory Module).
Often requires buffering.
Needs the whole clock cycle to establish valid data.
Making the bus wider is mechanically complicated.
24
RDRAM Memory Systems
25
Bus Structure
[Figure: the CPU (register file, ALU, bus interface) connects over the system bus to the I/O bridge. The I/O bridge connects over the memory bus to main memory and over the I/O bus to the USB controller (mouse, keyboard), the graphics adapter (monitor), the disk controller (disk), and expansion slots for other devices such as network adapters.]
Bryant/O’Hallaron, p. 472
26
DMA Request
DMA = Direct Memory Access
[Figure: the bus structure (CPU with register file, ALU, and bus interface; system bus; I/O bridge; memory bus to main memory; I/O bus with USB controller, graphics adapter, disk controller, and expansion slots) during a DMA request.]
Bryant/O’Hallaron, p. 473
27
DMA Transfer
DMA = Direct Memory Access
[Figure: the bus structure (CPU with register file, ALU, and bus interface; system bus; I/O bridge; memory bus to main memory; I/O bus with USB controller, graphics adapter, disk controller, and expansion slots) during a DMA transfer.]
Bryant/O’Hallaron, p. 473
28
DMA Completion Notification
DMA = Direct Memory Access
[Figure: the bus structure (CPU with register file, ALU, and bus interface; I/O bridge; memory bus to main memory; I/O bus with USB controller, graphics adapter, disk controller, and expansion slots), with an interrupt signaling completion.]
Bryant/O’Hallaron, p. 474
29
Locality
We say that a computer program exhibits good locality if the program tends to reference data that is nearby or data that has been referenced recently. Because a program might do one of these things but not the other, the principle of locality is separated into two flavors:
Temporal locality: a memory location that is referenced once is likely to be referenced multiple times in the near future.
Spatial locality: if a memory location is referenced once, then nearby locations are likely to be referenced in the near future.
Bryant/O’Hallaron, p. 478
30
Examples
In the Sampler function below, RandInt returns a randomly selected integer within the specified interval. Which program has better locality?

    int SumVec(int v[], int N)
    {
        int i;
        int sum = 0;

        for (i = 0; i < N; i = i + 1)
            sum += v[i];
        return sum;
    }

    int Sampler(int v[], int N, int K)
    {
        int i, j;
        int sum = 0;

        for (i = 0; i < K; i = i + 1)
        {
            j = RandInt(0, N-1);
            sum += v[j];
        }
        return sum/K;
    }

Bryant/O’Hallaron, p. 479
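The slide does not define RandInt. To experiment with the two functions, one possible stand-in is sketched below; it is an assumption built on the standard `rand()` function (ignoring the small modulo bias), not the course's definition. SumVec and Sampler are repeated so that the block compiles on its own.

```c
#include <stdlib.h>

/* Hypothetical RandInt: returns an integer in [lo, hi] using rand(). */
int RandInt(int lo, int hi)
{
    return lo + rand() % (hi - lo + 1);
}

/* Sequential scan: consecutive elements, good spatial locality. */
int SumVec(int v[], int N)
{
    int sum = 0;
    for (int i = 0; i < N; i = i + 1)
        sum += v[i];
    return sum;
}

/* Random sampling: each access may land anywhere in v, poor spatial locality. */
int Sampler(int v[], int N, int K)
{
    int sum = 0;
    for (int i = 0; i < K; i = i + 1)
        sum += v[RandInt(0, N - 1)];
    return sum / K;
}
```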
31
Memory Hierarchy
Toward the top (L0): smaller, faster, and costlier (per byte) storage devices. Toward the bottom (L5): larger, slower, and cheaper (per byte) storage devices.
L0: Registers. CPU registers hold words retrieved from cache memory.
L1: On-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
L2: Off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from memory.
L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (distributed file systems, Web servers).
Bryant/O’Hallaron, p. 483
32
Caching Principle
Level k: the smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1 (in the figure, blocks 4, 9, 14, and 3).
Level k+1: the larger, slower, cheaper storage device at level k+1 is partitioned into blocks (in the figure, blocks 0 to 15).
Data is copied between levels in block-sized transfer units.
Bryant/O’Hallaron, p. 484
33
Cache Misses
Cold misses, or compulsory misses, occur the first time that a data item is referenced.
Conflict misses occur when two memory references have to occupy the same cache line. They can occur even when the remainder of the cache is not in use.
Capacity misses occur when there are no more free lines in the cache.
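Conflict misses are easy to reproduce in a toy model. The sketch below simulates a direct-mapped cache with 4 sets and 16-byte blocks; both sizes and the function name `count_misses` are assumptions chosen only for illustration.

```c
#include <stdbool.h>

#define NUM_SETS   4    /* hypothetical: 4 sets, one line per set */
#define BLOCK_SIZE 16   /* hypothetical: 16-byte blocks */

/* Count the misses produced by a sequence of byte addresses
 * in an initially empty direct-mapped cache. */
int count_misses(const unsigned addrs[], int n)
{
    bool     valid[NUM_SETS] = { false };
    unsigned tag[NUM_SETS]   = { 0 };
    int misses = 0;

    for (int i = 0; i < n; i++) {
        unsigned block = addrs[i] / BLOCK_SIZE;
        unsigned set   = block % NUM_SETS;
        unsigned t     = block / NUM_SETS;
        if (!valid[set] || tag[set] != t) {   /* cold or conflict miss */
            misses++;
            valid[set] = true;
            tag[set]   = t;
        }
    }
    return misses;
}
```

With these sizes, addresses 0 and 64 map to the same set, so the sequence 0, 64, 0, 64 misses on every access even though three of the four lines are never used. The sequence 0, 16, 0, 16 only takes its two cold misses.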
34
L1 and L2 Bus System
[Figure: CPU chip containing the register file, ALU, L1 cache, and bus interface. The cache bus connects the bus interface to the L2 cache, the system bus connects it to the I/O bridge, and the memory bus connects the I/O bridge to main memory.]
Bryant/O’Hallaron, p. 488
35
Cache Organization
[Figure: a cache drawn as sets 0 to S-1; each set holds E lines, and each line has a valid bit, a tag, and a block of bytes numbered 0 to B-1.]
S = 2^s sets
E lines per set
B = 2^b bytes per cache block
t tag bits per line
1 valid bit per line
Cache size: C = B x E x S data bytes
Bryant/O’Hallaron, p. 488
36
Address Partition
An m-bit address (bits m-1 down to 0) is divided, from most to least significant, into t tag bits, s set index bits, and b block offset bits.
Tag: compared with tags in the cache to find a match.
Set index: used to find the set where the data might be found in the cache.
Block offset: selects which word, inside the block, is referenced.
Bryant/O’Hallaron, p. 488
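The partition can be written with shifts and masks. This is a sketch for 32-bit addresses (m = 32); the struct `AddrParts` and the function `split_address` are hypothetical names, and the code assumes s + b < 32.

```c
#include <stdint.h>

typedef struct {
    uint32_t tag;           /* upper t = 32 - s - b bits */
    uint32_t set_index;     /* middle s bits */
    uint32_t block_offset;  /* lower b bits */
} AddrParts;

/* Split a 32-bit address given s set index bits and b block offset bits. */
AddrParts split_address(uint32_t addr, unsigned s, unsigned b)
{
    AddrParts p;
    p.block_offset = addr & ((1u << b) - 1);
    p.set_index    = (addr >> b) & ((1u << s) - 1);
    p.tag          = addr >> (s + b);
    return p;
}
```

For example, with s = 2 and b = 4, address 0x80004028 has block offset 0x8, set index 2, and tag 0x02000100.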
37
Multi-Level Cache Organization
[Figure: CPU with registers, separate L1 i-cache and L1 d-cache, a unified L2 cache, main memory, and disk.]
Bryant/O’Hallaron, p. 504
38
Writing Cache-Conscious Programs
Problem: Write C code for a function that computes the sum of the elements of a two-dimensional array, a[M][N], of integers.

    /* M and N come first so that a[M][N] is a valid C99
       variable-length array parameter. */
    int SumArray(int M, int N, int a[M][N]);

    /* Row-wise traversal: the inner loop walks along a row. */
    int SumArrayRows(int M, int N, int a[M][N])
    {
        int i, j;
        int sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    /* Column-wise traversal: the inner loop walks down a column. */
    int SumArrayCols(int M, int N, int a[M][N])
    {
        int i, j;
        int sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Bryant/O’Hallaron, p. 508
39
SumArrayRows Data Access Order
[Figure: memory layout of array a (6 ints per row, 4 bytes per int, row-major order) starting at 0x8000 4000: a[0][0] at 0x8000 4000, a[0][1] at 0x8000 4004, ..., a[0][5] at 0x8000 4014, a[1][0] at 0x8000 4018, ..., a[3][4] at 0x8000 4058. SumArrayRows reads the elements in the order a[0][0], a[0][1], a[0][2], ..., that is, at consecutive addresses.]

    int SumArrayRows(int M, int N, int a[M][N])
    {
        int i, j;
        int sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Bryant/O’Hallaron, p. 508
43
SumArrayCols Data Access Order
[Figure: memory layout of array a (6 ints per row, 4 bytes per int, row-major order) starting at 0x8000 4000: a[0][0] at 0x8000 4000, a[0][1] at 0x8000 4004, ..., a[0][5] at 0x8000 4014, a[1][0] at 0x8000 4018, ..., a[3][4] at 0x8000 4058. SumArrayCols reads the elements in the order a[0][0], a[1][0], a[2][0], a[3][0], a[0][1], ..., that is, at addresses 24 bytes apart within each column.]

    int SumArrayCols(int M, int N, int a[M][N])
    {
        int i, j;
        int sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Bryant/O’Hallaron, p. 508
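One rough way to compare the two access orders is to count how often consecutive accesses fall in different cache blocks. The sketch below assumes a 4 x 6 array of 4-byte ints and 16-byte blocks, so four ints share a block; these sizes and the function name `block_switches` are illustrative assumptions, and the count is only a crude proxy for misses in a cache that holds a single block.

```c
#define ROWS 4
#define COLS 6
#define INTS_PER_BLOCK 4   /* hypothetical: 16-byte blocks, 4-byte ints */

/* Count how many accesses touch a different block than the previous access.
 * by_columns = 0 follows the SumArrayRows order, 1 the SumArrayCols order. */
int block_switches(int by_columns)
{
    int switches = 0, prev_block = -1;
    int outer = by_columns ? COLS : ROWS;
    int inner = by_columns ? ROWS : COLS;

    for (int x = 0; x < outer; x++)
        for (int y = 0; y < inner; y++) {
            int i = by_columns ? y : x;                    /* row index */
            int j = by_columns ? x : y;                    /* column index */
            int block = (i * COLS + j) / INTS_PER_BLOCK;   /* row-major layout */
            if (block != prev_block) {
                switches++;
                prev_block = block;
            }
        }
    return switches;
}
```

Under these assumptions the row-wise order changes block 6 times over 24 accesses, while the column-wise order changes block on every one of the 24 accesses.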
47
Read Bandwidth
The rate at which a program reads data from the memory system is called the read throughput or the read bandwidth.
The read throughput of a program depends on the memory hierarchy level from which the data is retrieved.
The read throughput is measured in bytes per second, or more commonly in Mbytes/s.
We can write a program that forces the data to come from the various levels in the hierarchy to estimate the read throughput of each level.
48
Measuring Read Bandwidth

    /* data is a global int array defined elsewhere. */
    void test(int elems, int stride)
    {
        int i;
        int result = 0;
        volatile int sink;

        for (i = 0; i < elems; i += stride)
            result += data[i];
        sink = result; /* to prevent compiler from optimizing away the loop */
    }

Bryant/O’Hallaron, p. 508
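To turn such a loop into a throughput number, it has to be timed. The sketch below is one possible harness, not the textbook's: it uses the standard `clock()` function and repeats the scan until about 0.05 s of CPU time has passed so that the timer resolution does not dominate. The names `scan`, `read_throughput_mbs`, and `MAX_ELEMS`, and the 0.05 s threshold, are assumptions.

```c
#include <time.h>

#define MAX_ELEMS (1 << 16)
int data[MAX_ELEMS];        /* hypothetical global array being scanned */

/* Sum every stride-th element among the first elems elements of data. */
static int scan(int elems, int stride)
{
    int result = 0;
    for (int i = 0; i < elems; i += stride)
        result += data[i];
    return result;
}

/* Estimate read throughput in MB/s for the given working set and stride.
 * elems must not exceed MAX_ELEMS. */
double read_throughput_mbs(int elems, int stride)
{
    volatile int sink = 0;   /* keeps the compiler from discarding the scans */
    long long reads = 0;
    clock_t start = clock(), now;

    do {
        sink += scan(elems, stride);
        reads += (elems + stride - 1) / stride;   /* elements read per scan */
        now = clock();
    } while (now - start < CLOCKS_PER_SEC / 20);

    double seconds = (double)(now - start) / CLOCKS_PER_SEC;
    return (double)reads * sizeof(int) / seconds / 1e6;
}
```

Calling `read_throughput_mbs` for a range of working-set sizes and strides gives the kind of data plotted in a memory mountain, although the absolute numbers depend on the machine and compiler settings.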
49
Pentium III Xeon Memory Mountain
Bryant/O’Hallaron, p. 514
50
Temporal Locality (stride = 1)
51
Spatial Locality Slope (size = 256 KB)