
Advanced Computer Architecture (CS 704), Lecture 26: Memory Hierarchy Design (Concept of Caching and Principle of Locality)


1 Advanced Computer Architecture (CS 704), Lecture 26: Memory Hierarchy Design (Concept of Caching and Principle of Locality). Prof. Dr. M. Ashraf Chughtai

2 Today's Topics
– Recap: storage trends and the memory hierarchy
– Concept of cache memory
– Principle of locality
– Cache addressing techniques
– RAM vs. cache transactions
– Summary

3 Recap: Storage Devices
Design features of semiconductor memories:
– SRAM
– DRAM
– Magnetic disk storage

4 Recap: Speed and Cost per Byte
– DRAM is slow but cheap relative to SRAM; it serves as the processor's main memory, holding a moderately large amount of data and instructions
– Disk storage is the slowest and cheapest; it serves as secondary storage, holding the bulk of data and instructions

5 Recap: CPU-Memory Access Time
– The gap between the speed of DRAM and disk and the speed of the processor, compared with that of SRAM, is widening rapidly with time

6 CPU-Memory Gap … Cont'd

7 Memory Hierarchy Principles
– The speeds of DRAM and the CPU complement each other
– Memory is therefore organized as a hierarchy, based on the concept of caching and the principle of locality

8 1: Concept of Caching
A cache is a staging area, or temporary place, used to:
– store a frequently used subset of the data or instructions from the relatively cheaper, larger and slower memory; and
– avoid having to go to the main memory every time this information is needed
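A software analogy of the same idea (my own sketch, not part of the lecture): keep a small table of recently fetched items and consult it before going to the slow storage. The array standing in for main memory and all names here are invented for illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINES 4
#define MEM_WORDS   64

/* Stand-in for the large, slow storage (main memory in the lecture's terms). */
static uint32_t slow_memory[MEM_WORDS];

/* One entry of a tiny software cache: the address it holds and the data. */
struct line { bool valid; uint32_t addr; uint32_t data; };
static struct line cache[CACHE_LINES];

static uint32_t cached_read(uint32_t addr)
{
    struct line *l = &cache[addr % CACHE_LINES];  /* pick a line by address      */
    if (l->valid && l->addr == addr)              /* hit: skip the slow storage  */
        return l->data;
    l->valid = true;                              /* miss: fetch and keep a copy */
    l->addr  = addr;
    l->data  = slow_memory[addr];
    return l->data;
}

int main(void)
{
    for (uint32_t i = 0; i < MEM_WORDS; i++)
        slow_memory[i] = i * 10;

    uint32_t a = cached_read(5);   /* miss: fetched from slow_memory   */
    uint32_t b = cached_read(5);   /* hit: served from the small cache */
    printf("%u %u\n", (unsigned)a, (unsigned)b);
    return 0;
}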

9 Caching and Memory Hierarchy
Memory devices of different types are used at each level k of the hierarchy:
– the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1
– programs tend to access the data or instructions at level k more often than they access the data at level k+1

10 Caching and Memory Hierarchy
– Storage at level k+1 can be slower, but larger and cheaper per bit
– The net effect is a large pool of memory that costs as much as the cheap storage near the bottom of the hierarchy, yet serves data and instructions at the rate of the fast storage near the top

11 Examples of Caching in the Hierarchy
Cache Type            What Cached            Where Cached          Latency (cycles)   Managed By
Registers             4-byte word            CPU registers         0                  Compiler
TLB                   Address translations   On-Chip TLB           0                  Hardware
L1 cache              32-byte block          On-Chip L1            1                  Hardware
L2 cache              32-byte block          Off-Chip L2           10                 Hardware
Virtual memory        4-KB page              Main memory           100                Hardware + OS
Buffer cache          Parts of files         Main memory           100                OS
Network buffer cache  Parts of files         Local disk            10,000,000         AFS/NFS client
Browser cache         Web pages              Local disk            10,000,000         Web browser
Web cache             Web pages              Remote server disks   1,000,000,000      Web proxy server

12 2: Principle of Locality
Programs access a relatively small portion of the address space at any instant of time.
For example, we all have a lot of friends, but at any given time most of us can only keep in touch with a small group of them.

13 Principle of Locality
Library analogy (figure): the shelves hold books on Physics, Literature, Electronics, Computers, Chemistry, Electrical Engineering and Civil Engineering. We select 4 books, 2 each on Electronics and Computers, and place them on a small table for fast access.

14 Types of Locality: Temporal and Spatial
Temporal locality is locality in time: if an item is referenced, it will tend to be referenced again soon.

15 Types of Locality: Spatial
Spatial locality is locality in space: if an item is referenced, items whose addresses are close by tend to be referenced soon.

16 A well-written program tends to reuse data and instructions that are:
– either near those it has used recently
– or were recently referenced themselves

17 Principle of Locality
– Spatial locality: items with nearby addresses (i.e., nearby in space) should be located at the same level, as they tend to be referenced close together in time
– Temporal locality: recently referenced items (i.e., referenced close together in time) should be kept at the same memory level, as they are likely to be referenced again in the near future
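A concrete C illustration of spatial locality (my addition, not from the slides): both loops below compute the same sum over a 2-D array, but the first walks memory with stride 1 while the second jumps a whole row between accesses, so the first makes much better use of each cached block.

#include <stdio.h>

#define ROWS 512
#define COLS 512

static int a[ROWS][COLS];

/* Row-major traversal: consecutive accesses touch adjacent addresses,
   so each cached block is fully used (good spatial locality). */
static long sum_row_major(void)
{
    long sum = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal: consecutive accesses are COLS * sizeof(int) bytes
   apart, so spatial locality is poor and more blocks must be fetched. */
static long sum_col_major(void)
{
    long sum = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}

int main(void)
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = 1;
    printf("%ld %ld\n", sum_row_major(), sum_col_major());  /* both print 262144 */
    return 0;
}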

18 Locality Example: Program
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

19 Locality Example: Spatial Locality
– All the array elements a[i] are referenced in succession, one per loop iteration, so the array elements should be located at the same level
– All the instructions of the loop are referenced repeatedly in sequence, so they too should be located at the same level
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

20 Locality Example: Temporal Locality
– The data item sum is referenced on every iteration, i.e. recently referenced data is referenced again and again
– The instructions of the loop body, sum += a[i], are cycled through repeatedly
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

21 How Does the Memory Hierarchy Work? (Based on the Locality Principle)
– The memory hierarchy keeps the more recently accessed data items closer to the processor, because chances are the processor will access them again soon

22 How Does the Memory Hierarchy Work? (Based on the Locality Principle)
– NOT ONLY do we move the item that has just been accessed closer to the processor, but we ALSO move the data items that are adjacent to it

23 Hierarchy List
– Level 0: Register file (datapath)
– Level 1: L1 cache (on-chip cache)
– Level 2: L2 cache (external cache)
– Level 3: Main memory (system-board DRAM)
– Level 4: Disk cache (disk drive)
– Level 5: Disk (magnetic disk)
– Level 6: Optical (CDs etc., bulk storage)
– Level 7: Tape (huge, cheapest storage)

24 Intel Processor Caches
– 80386: no on-chip cache
– 80486: 8 KB on-chip cache with 16-byte lines
– Pentium (all versions): two on-chip L1 caches, one for data and one for instructions
– Pentium 4: two 8 KB L1 caches, plus a 256 KB L2 cache feeding both L1 caches

25 Cache Devices
– The cache is a small SRAM that is made directly accessible to the processor
– DRAM, which is accessible by the cache as well as by the user or programmer, is placed at the next higher level as the main memory
– Larger storage, such as disk, is placed beyond the main memory, farther from the processor

26 Cache Organization (figure: cache placed between the processor and the main memory)

27 Caching in a Memory Hierarchy (figure)
– The larger, slower, cheaper storage device at level k+1 is partitioned into blocks (say 16 blocks, numbered 0-15)
– The smaller, faster, more expensive device at level k caches a subset of these blocks (say 4 of them, e.g. blocks 8, 9, 14 and 3) from level k+1
– Data is copied between the levels in block-sized transfer units (e.g. blocks 4 and 10 being brought into level k)

28 Cache Organization (figure)

29 Cache Addressing – Direct Addressing
– Level k: 4 blocks, addressed by a 2-bit code yy (lines 00, 01, 10, 11)
– Level k+1: 16 blocks, addressed by a 4-bit code xxyy (0000 through 1111)
– The nth block of level k+1 is placed at block (n MOD 4) of level k, i.e. block xxyy maps to line yy; e.g. blocks 4 (0100) and 10 (1010) land on lines 00 and 10
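A rough C sketch of this placement rule (the block counts are the slide's toy numbers, and the function name is invented for illustration):

#include <stdio.h>

#define LEVEL_K_BLOCKS 4   /* cache at level k: 4 blocks, 2-bit index */

/* Direct placement: block n of level k+1 goes to line (n mod 4) at level k. */
static unsigned level_k_line(unsigned block_n)
{
    return block_n % LEVEL_K_BLOCKS;
}

int main(void)
{
    for (unsigned n = 0; n < 16; n++)            /* 16 blocks at level k+1 */
        printf("block %2u -> line %u\n", n, level_k_line(n));
    return 0;  /* e.g. blocks 0, 4, 8 and 12 all compete for line 0 */
}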

30 Memory Hierarchy Terminology (figure: an upper-level memory holding block X sits between the processor and a lower-level memory holding block Y; blocks move to and from the processor through the upper level)

31 Memory Hierarchy Terminology
– Hit: the data the processor wants to access appears in some block in the upper level (example: block X)
– Hit rate: the fraction of memory accesses that are found in the upper level (i.e., hits)
– Hit time: the time to access the upper level, which consists of (i) the RAM access time and (ii) the time to determine whether the access is a hit or a miss

32 Memory Hierarchy Terminology … Cont'd
– Miss: the data needed by the processor is not found in the upper level and has to be retrieved from a block in the lower level (block Y)
– Miss rate = 1 - hit rate
– Miss penalty: the sum of the time (i) to replace a block in the upper level and (ii) to deliver the block to the processor
– Recommendation: the hit time must be much smaller than the miss penalty; otherwise there is no point in having a memory hierarchy
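This recommendation is usually quantified with the average memory access time, AMAT = hit time + miss rate x miss penalty, a standard formula that is not spelled out on the slide. A minimal C sketch with made-up example numbers:

#include <stdio.h>

/* Average memory access time for a single cache level. */
static double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}

int main(void)
{
    /* Illustrative numbers only: 1-cycle hit, 5% miss rate, 100-cycle penalty. */
    printf("AMAT = %.1f cycles\n", amat(1.0, 0.05, 100.0));  /* 6.0 cycles */
    return 0;
}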

33 Request 14: Cache Hit
– The CPU needs object d, which is stored in some block b, say block 14, of the level k+1 memory and in the corresponding block 2 of the cache at level k
– Cache hit: the program finds block 14 in the cache at level k, and object d is transferred to the CPU

34 Request 12: Cache Miss
– The program needs object A, which is stored in some block c, say block 12, at level k+1
– Cache miss: block 12 (from level k+1) is not at level k
– Hence, the level k cache must fetch it from level k+1 and then transfer object A to the CPU

35 Placement and Replacement Policies
– If the level k cache is full, then some current block must be replaced (evicted); which one is the "victim"?
– It depends upon:
– the cache design, which defines the relationship between cache addresses and the higher-level memory addresses
– the placement policy, which determines where the new block can go
– the replacement policy, which defines which block should be evicted (a small sketch of one such policy follows below)
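As an illustration of one possible replacement policy, the sketch below implements a least-recently-used (LRU) victim choice. LRU is a common choice but is not mandated by the slide, and the data-structure layout here is my own invention.

#include <stdbool.h>
#include <stdio.h>

#define WAYS 4

struct line { bool valid; unsigned tag; unsigned long last_used; };

/* Return the index of the line to evict: an invalid line if one exists,
   otherwise the least recently used line (smallest last_used stamp). */
static int choose_victim(const struct line set[WAYS])
{
    int victim = 0;
    for (int i = 0; i < WAYS; i++) {
        if (!set[i].valid)
            return i;                       /* free line: no eviction needed */
        if (set[i].last_used < set[victim].last_used)
            victim = i;                     /* older use -> better victim    */
    }
    return victim;
}

int main(void)
{
    struct line set[WAYS] = {
        { true, 0x1, 40 }, { true, 0x2, 10 }, { true, 0x3, 30 }, { true, 0x4, 20 },
    };
    printf("evict line %d\n", choose_victim(set));  /* line 1, used longest ago */
    return 0;
}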

36 Types of Misses
– Cold (compulsory) miss: occurs when the cache is empty, at the beginning of the cache's use
– Capacity miss: occurs when the set of active cache blocks (the working set) is larger than the cache
– Conflict miss: occurs when the level k cache is large enough, but multiple data objects all map to the same level k block

37 Conflict Miss: Example … Cont'd
– If the placement policy is based on direct addressing, then block n at level k+1 must be placed in block (n mod 4) at level k
– In this case, referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time: since 8 mod 4 = 0, both blocks 0 and 8 of level k+1 are placed at location 00 of level k and keep evicting each other
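This behaviour can be checked with a few lines of C (my own simulation, assuming the slide's 4-line direct-mapped placement): every reference in the stream 0, 8, 0, 8, ... misses because both blocks map to line 0.

#include <stdbool.h>
#include <stdio.h>

#define LINES 4

struct line { bool valid; unsigned block; };

int main(void)
{
    struct line cache[LINES] = {0};
    unsigned refs[] = { 0, 8, 0, 8, 0, 8 };
    int misses = 0;

    for (unsigned i = 0; i < sizeof refs / sizeof refs[0]; i++) {
        unsigned idx = refs[i] % LINES;               /* both 0 and 8 give index 0 */
        if (!cache[idx].valid || cache[idx].block != refs[i]) {
            misses++;                                 /* conflict miss             */
            cache[idx].valid = true;
            cache[idx].block = refs[i];               /* evict the other block     */
        }
    }
    printf("%d misses out of 6 references\n", misses);  /* prints 6 */
    return 0;
}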

38 Cache Design
– We have observed that more than one block from the level k+1 memory (say the main memory), having N blocks, may be placed at the same location (given by the block number MOD M) in the level k memory (say the cache), having M blocks
– Hence, a tag must be associated with each block in the level k (cache) memory to identify its position in the level k+1 (main) memory

39 Direct Mapping Example
– The 16 MB main memory has a 24-bit address bus
– It is organized in 32-bit (4-byte) blocks
– The 16K-word (64 KB) cache therefore requires a 16-bit address within the cache and an 8-bit tag

40 Direct Mapping Address Structure
Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits
– 24-bit address
– 2-bit word identifier (4-byte block)
– 22-bit block identifier for the main memory: an 8-bit tag (= 22 - 14) and a 14-bit slot, line, or index value for the cache
– No two blocks that map to the same line have the same tag field
– The cache is checked by finding the line and comparing the tag
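The split can be written as a few shifts and masks. The following C helper is my own sketch (the function name and example address are invented), using the slide's field widths of an 8-bit tag, a 14-bit line number and a 2-bit word field within a 24-bit address.

#include <stdint.h>
#include <stdio.h>

#define WORD_BITS 2    /* 4-byte block     */
#define LINE_BITS 14   /* 16K cache lines  */
#define TAG_BITS  8    /* 24 - 14 - 2      */

static void split_address(uint32_t addr)   /* addr is a 24-bit main-memory address */
{
    uint32_t word = addr & ((1u << WORD_BITS) - 1);
    uint32_t line = (addr >> WORD_BITS) & ((1u << LINE_BITS) - 1);
    uint32_t tag  = (addr >> (WORD_BITS + LINE_BITS)) & ((1u << TAG_BITS) - 1);
    printf("addr 0x%06X -> tag 0x%02X, line 0x%04X, word %u\n",
           (unsigned)addr, (unsigned)tag, (unsigned)line, (unsigned)word);
}

int main(void)
{
    split_address(0x16339C);   /* arbitrary example address */
    return 0;
}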

41 Direct Mapping Cache Organization (figure)

42 Cache Design: Another Example
– Let us consider another example with realistic numbers: assume we have a 1 KB direct-mapped cache with a block size of 32 bytes
– In other words, each block associated with a cache tag holds 32 bytes of data
– (Figure: cache lines 0, 1, 2, 3, ..., indexed by a line number, each holding a valid bit, a 22-bit cache tag, and 32 bytes of cache data, e.g. bytes 0-31 in line 0, bytes 32-63 in line 1, up to bytes 992-1023 in the last line)

43 Address Translation – Direct-Mapped Cache
– Assume the level k+1 main memory is 4 GB, the block size is 32 bytes, and the level k cache is 1 KB
– (Figure: the 32-bit address, bits 31-0, is split into a cache tag (e.g. 0x01), a cache index, and a byte select (e.g. 0x00); the valid bit and the cache tag are stored as part of the cache "state" alongside the cache data)

44 Cache Design
– With a block size of 32 bytes, the 5 least significant bits of the address are used as the byte select within the cache block
– Since the cache size is 1 KB, the upper 32 minus 10, i.e. 22, bits of the address are stored as the cache tag
– The remaining address bits in the middle, bits 5 through 9, are used as the cache index to select the proper cache block entry
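The same decomposition for this example, assuming the slide's parameters (32-bit address, 5-bit byte select, 5-bit index, 22-bit tag); the code and the sample address are illustrative only:

#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 5   /* 32-byte block          -> bits 4..0   */
#define INDEX_BITS  5   /* 1 KB / 32 B = 32 lines -> bits 9..5   */
                        /* tag = remaining 22 bits -> bits 31..10 */

int main(void)
{
    uint32_t addr   = 0x12345678;                                /* example address */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);          /* byte select     */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);        /* upper 22 bits   */

    printf("tag 0x%06X index %u offset %u\n",
           (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}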

