Computer Systems – the impact of caches


1 Computer Systems – the impact of caches
Arnoud Visser Computer Systems – the impact of caches

2 Introduction
Different sorts of memory:
On-die: /1/10 cycles
On-board: 100 cycles
On-disk
Off-machine

3 The CPU-Memory Gap
The gap between disk, DRAM, SRAM, and CPU speeds keeps increasing.

4 Storage Trends: bigger, not faster
[Table: disk trends by year, starting at 1980. Rows: $/MB, access (ms), typical size (MB).]
[Table: DRAM trends by year, starting at 1980. Rows: $/MB, access (ns), typical size (MB).]
(Culled from back issues of Byte and PC Magazine)

5 Processor trends: faster
[Table: SRAM trends by year, starting at 1980. Rows: $/MB, access (ns), typical size (MB).]
[Table: processor trends by year, starting at 1980, including Pent and P-III. Rows: clock rate (MHz), cycle time (ns).]

6 Intel Processors Cache SRAM
486: 8K, -
Pentium (1993): 8 K
Pentium Pro: 256K-1M
Pentium II (1997): 16 K, 512K at ½ speed
Celeron A (1998): 128K
Pentium III Coppermine (2000): 256K
Pentium 4 Willamette: 12 K
Pentium 4 Northwood (2002): 512K

The original Pentium II was code-named "Klamath". It ran at a paltry 66 MHz bus speed, with clock rates from 233 MHz to 300 MHz. In 1998, Intel slightly reworked the processor and released "Deschutes", built on a 0.25 micron process and supporting a 100 MHz system bus. The L2 cache was still separate from the processor core and still ran at only half speed. Intel would not rectify this until the release of the Celeron A and Pentium III. Deschutes ran from 333 MHz up to 450 MHz.

Intel released the Pentium III "Katmai" processor in February 1999, running at 450 MHz on a 100 MHz bus. Katmai introduced the SSE instruction set and still had a 512 KB cache at half speed. Coppermine had only 256 KB, but that cache was located directly on the CPU core rather than on the daughtercard typical of earlier Slot 1 processors. Coppermine also supported the 133 MHz front side bus. It proved to be a performance chip, was and still is used in many PCs, and eventually reached 1+ GHz.

There is also the risk that a larger L2 cache on the same CPU architecture will not provide the desired boost in speed. The transition from the Willamette P4 to the Northwood P4 is a good example: doubling the L2 cache from 256 KB to 512 KB did not by itself provide a big increase in performance. Performance improved only in combination with higher bandwidth (FSB speed).

7 Memory Hierarchy
L0: Registers. CPU registers hold words retrieved from cache memory.
L1: On-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
L2: Off-chip L2 cache. L2 cache holds cache lines retrieved from memory.
L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (distributed file systems, Web servers).
Toward L0: smaller, faster, costlier (per byte) storage devices.
Toward L5: larger, slower, and cheaper (per byte) storage devices.

8 Pay the price
To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk.
SRAM: 4 MB for ~$500
DRAM: 1 GB for ~$200
Disk: 80 GB for ~$110

9 Locality
Principle of Locality: Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves.
Temporal locality: Recently referenced items are likely to be referenced in the near future.
Spatial locality: Items with nearby addresses tend to be referenced close together in time.

11 Locality Example
    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
Data
Reference array elements in succession (stride-1 reference pattern): spatial locality
Reference sum each iteration: temporal locality
Instructions
Reference instructions in sequence: spatial locality
Cycle through loop repeatedly: temporal locality

12 Power Programmer
Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.
Good locality?
    int sumarrayrows(int a[M][N])
    {
        int i, j, sum = 0;
        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

13 Stride-N example
Question: Does this function have good locality?
    int sumarraycols(int a[M][N])
    {
        int i, j, sum = 0;
        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }
The inner loop walks down a column, so successive references are N ints apart (stride-N).

14 Matrix M=2, N=3
int sumarrayrows()
Address         0    4    8   12   16   20
Contents      a00  a01  a02  a10  a11  a12
Access order    1    2    3    4    5    6
int sumarraycols()
Address         0    4    8   12   16   20
Contents      a00  a01  a02  a10  a11  a12
Access order    1    3    5    2    4    6

15 Expect: Stride-1 is better!
int A[2][4]

16 Reality: small matrices fit in cache
int A[32][32]

17 Reality: the performance drop from L1 to L2 cache is not dramatic
int A[180][180]

18 Reality: the penalty only becomes visible when DRAM is accessed
int A[512][512]

19 Memory Mountain

20 Summary
As long as your data fits in the cache, and your program shows good locality, good performance is guaranteed.

21 Assignment
Practice Problem 6.9 (p. 624): 'Order three functions according to the spatial locality enjoyed by each.'
Practice Problem 6.22 (p. 659): 'Estimate the time, in CPU cycles, to read an 8-byte word, from the different L1-d of an i7 processor'

