Module IV Memory Organization
Set Associative Cache It combines the concepts of direct mapping and associative mapping. The cache lines are grouped into sets, and the number of lines in a set can vary from 2 to 16. Part of the address specifies which set holds the data. The data can be stored in any of the lines in that set.
Set Associative Cache A cache with two lines per set is called two-way set associative. Each entry has its own tag. A set is selected using its index.
Set Associative Cache Assume you have
• 16-bit memory address
• 2 KB of cache
• 16-byte lines
• 2-way set associative
The memory address is divided as follows:
Word bits = log2(16) = 4
Number of lines = 2 KB / 16 B = 2^11 / 2^4 = 2^7 = 128
Number of sets = 128 / 2 = 64
Set bits = log2(64) = 6
Tag bits = 16 - (4 + 6) = 6
Address format: Tag (6 bits) | Set (6 bits) | Word (4 bits)
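The field-width arithmetic above can be checked with a short Python sketch. The function name `field_widths` and its parameters are illustrative choices, not part of the slides, and the sketch assumes every size is a power of two.

```python
def log2_exact(n):
    """Return log2(n) for a power of two, using integer arithmetic only."""
    assert n > 0 and n & (n - 1) == 0, "size must be a power of two"
    return n.bit_length() - 1

def field_widths(address_bits, cache_bytes, line_bytes, ways):
    """Return (tag_bits, set_bits, word_bits) for a set-associative cache."""
    word_bits = log2_exact(line_bytes)        # selects a byte within a line
    num_lines = cache_bytes // line_bytes     # total lines in the cache
    num_sets = num_lines // ways              # lines are grouped into sets
    set_bits = log2_exact(num_sets)           # selects a set
    tag_bits = address_bits - (word_bits + set_bits)
    return tag_bits, set_bits, word_bits

# 16-bit address, 2 KB cache, 16-byte lines, 2-way set associative
print(field_widths(16, 2 * 1024, 16, 2))  # (6, 6, 4)
```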
Example Suppose we want to read or write a byte at address 357A (hex) = 0011 0101 0111 1010 (binary).
Tag = 001101 = 13
Set = 010111 = 23
Word = 1010 = 10
If either line of set 23 in the cache has tag 13, then the data at 357A is in the cache. Otherwise, a miss has occurred, and the contents of one of the cache lines of set 23 are replaced by the contents of memory line 001101010111 = 855.
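As a quick check of this example, the following Python sketch splits an address into its tag, set, and word fields with shifts and masks. The helper name `split_address` is hypothetical, and the default field widths (6 set bits, 4 word bits) are the ones from the 2 KB, 2-way example.

```python
def split_address(addr, set_bits=6, word_bits=4):
    """Split an address into (tag, set_index, word) fields."""
    word = addr & ((1 << word_bits) - 1)                   # lowest bits: byte within the line
    set_index = (addr >> word_bits) & ((1 << set_bits) - 1)  # middle bits: set number
    tag = addr >> (word_bits + set_bits)                   # remaining high bits: tag
    return tag, set_index, word

print(split_address(0x357A))   # (13, 23, 10)
print(0x357A >> 4)             # 855, the memory line number (tag and set bits together)
```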
Simulation Consider a line size of 4 bytes. The number of cache memory lines is 8. The cache is 2-way set associative, so the number of sets = 8 / 2 = 4. The number of main memory lines is 24.
Simulation
Memory references (main memory line numbers): 2, 7, 15, 22, 17, 16, 14, 18, 8, 4, 15, 18
Set = line number mod 4. Each cache entry is shown as tag (data), where the tag shown is the memory line number. Data is omitted where the slide does not show it.

Step  Reference  Set            Result  Set contents after the access
1     2          2 mod 4 = 2    MISS    Set 2: 2 (98)
2     7          7 mod 4 = 3    MISS    Set 3: 7 (1283)
3     15         15 mod 4 = 3   MISS    Set 3: 7 (1283), 15 (993)
4     22         22 mod 4 = 2   MISS    Set 2: 2 (98), 22 (1232)
5     17         17 mod 4 = 1   MISS    Set 1: 17 (12)
6     16         16 mod 4 = 0   MISS    Set 0: 16 (982)
7     14         14 mod 4 = 2   MISS    Set 2: 14, 22 (1232)        [14 replaces 2]
8     18         18 mod 4 = 2   MISS    Set 2: 14, 18 (1123)        [18 replaces 22]
9     8          8 mod 4 = 0    MISS    Set 0: 16 (982), 8 (1232)
10    4          4 mod 4 = 0    MISS    Set 0: 4 (8172), 8 (1232)   [4 replaces 16]
11    15         15 mod 4 = 3   HIT     Set 3: 7 (1283), 15 (993)
12    18         18 mod 4 = 2   HIT     Set 2: 14, 18 (1123)

Result: 10 misses and 2 hits. Final cache contents: Set 0: 4, 8; Set 1: 17; Set 2: 14, 18; Set 3: 7, 15.
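The simulation can be replayed with a few lines of Python. This is a minimal sketch, not the slides' own method: it assumes LRU replacement (the slides do not name the policy, but the evictions they show are consistent with it), and the function name `simulate` is hypothetical.

```python
from collections import OrderedDict

def simulate(refs, num_sets=4, ways=2):
    """Replay memory-line references through a set-associative cache.

    Assumes LRU replacement. Returns the HIT/MISS result of each access
    and the final contents of every set (least recently used first).
    """
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: line number -> None
    results = []
    for line in refs:
        s = sets[line % num_sets]       # set index = line number mod number of sets
        if line in s:
            s.move_to_end(line)         # mark as most recently used
            results.append("HIT")
        else:
            if len(s) == ways:
                s.popitem(last=False)   # set is full: evict the least recently used line
            s[line] = None
            results.append("MISS")
    return results, [list(s) for s in sets]

refs = [2, 7, 15, 22, 17, 16, 14, 18, 8, 4, 15, 18]
results, final_sets = simulate(refs)
print(results.count("MISS"), results.count("HIT"))  # 10 2
print(final_sets)  # [[8, 4], [17], [14, 18], [7, 15]]
```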
Replacement Algorithms Because it is simple to implement, LRU (least recently used) is the most popular replacement algorithm: replace the block in the set that has gone longest without being referenced. Another method is FIFO (first in, first out): replace the block in the set that has been in the cache longest. Still another method is LFU (least frequently used): replace the block in the set that has had the fewest references. A technique not based on usage is to pick a line at random from among the candidate lines. Studies show that random replacement provides only slightly inferior performance to an algorithm based on usage.
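The four policies differ only in which bookkeeping value they consult when a set is full. The sketch below illustrates this for one set; the per-line fields (`loaded_at`, `last_used`, `uses`) and the function `choose_victim` are hypothetical names chosen for the illustration, and real hardware tracks this information far more compactly.

```python
import random

def choose_victim(lines, policy, rng=random):
    """Pick the line to replace in a full set under the given policy."""
    if policy == "LRU":       # longest time since last reference
        return min(lines, key=lambda l: l["last_used"])
    if policy == "FIFO":      # longest time in the cache
        return min(lines, key=lambda l: l["loaded_at"])
    if policy == "LFU":       # fewest references
        return min(lines, key=lambda l: l["uses"])
    if policy == "RANDOM":    # not based on usage
        return rng.choice(lines)
    raise ValueError(f"unknown policy: {policy}")

# One 4-way set with made-up bookkeeping values
cache_set = [
    {"tag": "A", "loaded_at": 1, "last_used": 9, "uses": 5},
    {"tag": "B", "loaded_at": 2, "last_used": 4, "uses": 3},
    {"tag": "C", "loaded_at": 3, "last_used": 6, "uses": 1},
    {"tag": "D", "loaded_at": 4, "last_used": 8, "uses": 2},
]
for policy in ("LRU", "FIFO", "LFU"):
    print(policy, choose_victim(cache_set, policy)["tag"])
# LRU B
# FIFO A
# LFU C
```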
Write Policy There are two policies: write-back and write-through.
Write-back: results are written only to the cache; main memory is updated when the line is replaced.
Advantage: faster writes
Disadvantage: main memory data can be out of date
Write-through: every write goes to both the cache and main memory.
Advantage: main memory always holds valid data
Disadvantage: longer write times
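The trade-off between the two policies can be seen in a toy model that counts how many writes reach main memory. This is only a sketch under simplifying assumptions: `TinyCache`, its fields, and the `run` helper are hypothetical, one address stands for one cache line, and eviction is triggered by hand.

```python
class TinyCache:
    """Toy cache; policy is "write-through" or "write-back"."""

    def __init__(self, policy):
        self.policy = policy
        self.lines = {}          # address -> cached value
        self.dirty = set()       # addresses changed in cache but not yet in memory
        self.memory = {}         # stands in for main memory
        self.memory_writes = 0   # number of writes that reached main memory

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            self._write_memory(addr, value)   # every write also goes to memory
        else:
            self.dirty.add(addr)              # write-back: defer the memory update

    def evict(self, addr):
        value = self.lines.pop(addr)
        if addr in self.dirty:                # write-back: flush on replacement
            self.dirty.discard(addr)
            self._write_memory(addr, value)

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1


def run(policy):
    """Write three values to one address, then evict the line."""
    cache = TinyCache(policy)
    for value in (1, 2, 3):
        cache.write(0x10, value)
    stale = cache.memory.get(0x10) != 3       # is memory out of date before eviction?
    cache.evict(0x10)
    return cache.memory_writes, stale, cache.memory[0x10]

print(run("write-through"))  # (3, False, 3)
print(run("write-back"))     # (1, True, 3)
```

Write-back reaches memory once instead of three times, at the cost of memory being stale until the line is replaced.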
Line Size As the line (block) size increases, two specific effects come into play: (1) Larger blocks reduce the number of blocks that fit into a cache. (2) As a block becomes larger, each additional word is farther from the requested word and therefore less likely to be needed in the near future.
Number of Caches Two aspects of this design issue: the number of levels of caches, and the use of unified versus split caches. Multilevel Caches: The simplest such organization is known as a two-level cache. Nowadays, three-level caches (L1, L2, L3) are common.
Number of Caches Unified Versus Split Cache
Unified cache: a single cache is used to store references to both data and instructions.
Split cache: two caches are used, one dedicated to instructions and one dedicated to data. Both exist at the same level, typically as two L1 caches.
Advantages of Unified Cache
For a given cache size, it has a higher hit rate than split caches, because it balances the load between instruction and data fetches automatically.
Only one cache needs to be designed and implemented.
Advantage of Split Cache
It eliminates contention for the cache between the instruction fetch/decode unit and the execution unit.
Cache Coherency Needed if more than one device (typically a processor) has a cache and main memory is shared. If data in one cache are altered, this invalidates not only the corresponding word in main memory, but also that same word in any other cache that holds it. Even if a write-through policy is used, the other caches may contain invalid data. A system that prevents this problem is said to maintain cache coherency.
Cache Coherency Possible approaches to cache coherency include the following:
• Bus watching with write-through
• Hardware transparency
• Noncacheable memory
Bus watching with write-through Each cache controller monitors the address lines to detect write operations to memory by other bus masters. If another master writes to a location in shared memory that also resides in the cache memory, the cache controller invalidates that cache entry.
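The invalidation behaviour described here can be modelled in a few lines of Python. This is a sketch under simplifying assumptions, not a description of real bus hardware: the classes `Bus` and `SnoopingCache` and their methods are hypothetical, and one address stands for one cache line.

```python
class Bus:
    """Shared bus with main memory; notifies every other cache of a write."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, writer, addr, value):
        self.memory[addr] = value             # write-through: memory is always updated
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr)       # other controllers watch the address lines


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                       # address -> cached value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from main memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)            # invalidate our copy, if we hold one


bus = Bus()
bus.memory[0x40] = 1
a, b = SnoopingCache(bus), SnoopingCache(bus)
a.read(0x40)
b.read(0x40)              # both caches now hold the value 1
a.write(0x40, 2)          # b sees the write on the bus and invalidates its entry
print(0x40 in b.lines)    # False
print(b.read(0x40))       # 2 (re-fetched from main memory)
```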
Hardware Transparency Additional hardware is used to ensure that all updates to main memory via cache are reflected in all caches. If one processor modifies a word in its cache, this update is written to main memory. In addition, any matching words in other caches are similarly updated.
Noncacheable memory Only a portion of main memory is shared by more than one processor, and this is designated as noncacheable. In such a system, all accesses to shared memory are cache misses, because the shared memory is never copied into the cache. The noncacheable memory can be identified using chip-select logic or high-address bits.
Memory Interleaving The goal is to reduce memory access time. Main memory is divided into a number of modules, and addresses are arranged such that successive locations are stored in different modules. Because the CPU usually accesses successive locations, accesses to them can proceed in parallel, reducing the effective access time.
Memory Interleaving The lower-order k bits are used to select a module, and the higher-order m bits are used to access a location within the module. There should be 2^k modules; otherwise there will be gaps of non-existent locations.
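The address split for low-order interleaving can be sketched as follows. The function name `interleave` is a hypothetical helper; `k` is the number of module-select bits, so the sketch assumes the number of modules is two raised to the power k.

```python
def interleave(addr, k):
    """Map an address to (module, location within module) using the low-order k bits."""
    module = addr & ((1 << k) - 1)   # lower-order k bits select the module
    location = addr >> k             # remaining higher-order bits select the location
    return module, location

# With k = 2 (four modules), successive addresses fall in different modules
print([interleave(a, 2) for a in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

Addresses 0 to 3 land in four different modules, so they can be accessed in parallel; address 4 wraps around to module 0 at the next location.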
Associative Memory To search for an object in ordinary memory, the number of memory accesses depends on the location of the object and the efficiency of the search algorithm. The time to find an object can be reduced if objects are selected based on their contents rather than their addresses. This type of memory is called Associative Memory or Content Addressable Memory (CAM).
Block Diagram
Associative Memory It consists of a memory array with match logic for m words of n bits each. The Argument Register (A) and Key Register (K) each have n bits, one for each bit of a word. The match register has m bits, one for each word. Each word in memory is compared with A in parallel, and the match register bit of every matching word is set. Reading is then done from the words whose match bits are set.
Associative Memory K is used to mask A: only the bits of A whose corresponding bits in K are set to 1 are compared.
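A masked comparison of this kind can be sketched in Python. The function `cam_match` and the 9-bit values below are illustrative assumptions, and the loop stands in for what the hardware does for all words in parallel.

```python
def cam_match(words, argument, key):
    """Return the match register: 1 for each word that agrees with the
    argument on every bit position where the key has a 1."""
    return [1 if ((w ^ argument) & key) == 0 else 0 for w in words]

A = 0b101111100          # argument register
K = 0b111000000          # key register: compare only the three leftmost bits
words = [
    0b100111100,         # leftmost bits 100 differ from 101 -> no match
    0b101000001,         # leftmost bits 101 equal 101      -> match
]
print(cam_match(words, A, K))   # [0, 1]
```

With the key set to all 1s, the whole word must equal A; with the key set to all 0s, every word matches.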
Associative Memory Cells