Memory Principles
Locality
Temporal Locality: items recently used are likely to be used again (instructions in a loop, local variables)
Spatial Locality: items near recently used items are likely to be used soon (sequential instructions, arrays)
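Not from the slides, just a small C illustration: the running sum shows temporal locality (the same variables are reused every iteration) and the sequential array walk shows spatial locality (the array name and size are arbitrary).

#include <stdio.h>

#define N 1024

int main(void) {
    int data[N];
    for (int i = 0; i < N; i++)
        data[i] = i;

    /* Temporal locality: sum and i are reused on every iteration.
       Spatial locality: data[i] is accessed sequentially, so each
       cache line fetched is fully used before moving on. */
    long sum = 0;
    for (int i = 0; i < N; i++)
        sum += data[i];

    printf("sum = %ld\n", sum);
    return 0;
}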
Memory Technologies
SRAM SRAM: Static Random Access Memory. A circuit that maintains its state; 6-8 transistors per cell; equal access speed to all addresses.
DRAM DRAM: Dynamic RAM. Each cell = one transistor and one capacitor; the capacitor's charge represents the value. Denser than SRAM, slower to read/write than SRAM, and the charge must be refreshed frequently.
DRAM Synchronous DRAM: DDR SDRAM is clock controlled. DDR: Double Data Rate, transfers data on both the rising and falling clock edges. S: Synchronous.
DRAM Details Changing rows is slower than changing columns.
DRAM Details Multiple banks used in parallel (exploits spatial locality): Byte 0: Row 0, Col 0, from banks 0-7; Byte 1: Row 0, Col 1, from banks 8-15; Byte 2: Row 0, Col 2, from banks 16-23.
DRAM Stats Access speed growth has slowed substantially.
Hard Drives Hard drive: nonvolatile (no power required to retain data), rotating magnetic storage.
Hard Drives Data arranged in sectors. Seek time: time for the read/write head to be positioned and for the data to spin past it.
Flash Flash memory: USB sticks, memory cards, solid state drives (SSDs).
Flash Each cell is built around a floating-gate transistor.
Flash Strong current used to inject/remove electrons from floating gate
Flash Charged floating gate = off Uncharged = on
Flash vs HDD HDD: denser, longer lifespan. Flash: much faster seek, faster reads.
Choices Ideal memory: the access time of SRAM with the capacity and cost/GB of disk.
Since 1980, CPU has outpaced DRAM ... 3 cycle delay for memory access
Since 1980, CPU has outpaced DRAM ... Intel i7-6700 42 cycles + 51 ns (~200 cycles) delay for main memory access
Memory Hierarchy
Hierarchy Memory hierarchy: information is moved from larger/slower units to smaller/faster ones as needed.
Hierarchy Cache memory: provides an invisible speedup to main memory.
Cache Cache memory: small, fast (SRAM) memory that stores the subset of main memory we think is most important.
Differences Modern caches are all on the CPU; may be local to one core or shared; may be split into separate instruction and data caches.
Cache How important is it?
Memory Units
Main memory: byte addressed
Machine word: 2-8 bytes (32-bit = 4 bytes, 64-bit = 8 bytes)
Cache: line (or block) of 1 or more words
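As a quick sanity check of those word sizes (an illustration, not from the slides), this snippet prints the pointer size, which is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build:

#include <stdio.h>

int main(void) {
    /* Machine word size: 4 bytes on a 32-bit system, 8 bytes on a 64-bit one. */
    printf("pointer (machine word) : %zu bytes\n", sizeof(void *));
    printf("int                    : %zu bytes\n", sizeof(int));
    return 0;
}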
Process
I need memory location 0x000E
Is it in the L1 cache? Yes: hit, return it. No: miss, search the next level and bring back the whole line
Is it in L2? Yes: hit, return the line. No: miss, search the next level and bring back the whole line
Is it in L3? ... Is it in memory? ...
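A toy sketch of that lookup cascade; the in_l1/in_l2 membership rules below are made up purely so the program runs, and a real cache compares tags in hardware rather than calling functions:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for "is this address currently in level X?" (arbitrary rules). */
static bool in_l1(unsigned addr) { return addr % 7 == 0; }
static bool in_l2(unsigned addr) { return addr % 3 == 0; }

/* Walk the hierarchy for one access, reporting where it was found. */
static const char *lookup(unsigned addr) {
    if (in_l1(addr)) return "L1 hit";
    if (in_l2(addr)) return "L2 hit (line copied into L1)";
    return "miss: whole line fetched from memory into L2 and L1";
}

int main(void) {
    unsigned addrs[] = { 0x000E, 0x0009, 0x000B };
    for (int i = 0; i < 3; i++)
        printf("0x%04X -> %s\n", addrs[i], lookup(addrs[i]));
    return 0;
}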
Direct Mapping Direct mapping : every memory block has one cache entry it can use
Direct Mapped Cache Example: cache has 4 entries, each entry holds 1 word, cache location = word # mod 4.
Direct Mapped Cache Need to track which word is currently in each slot.
Address Breakdown 4 cache entries = 2 index bits: the last 2 bits of the word # specify the cache index.
Direct Mapped Cache Tag: identifies which word is in the cache; all the remaining address bits.
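A small sketch of that split for the 4-entry, one-word-per-entry example: the low 2 bits of the word number select the index and the remaining bits form the tag (the word numbers are arbitrary; note how 1, 5, 9, and 13 all compete for index 1):

#include <stdio.h>

int main(void) {
    unsigned words[] = { 1, 5, 9, 13, 6 };   /* arbitrary word numbers */
    for (int i = 0; i < 5; i++) {
        unsigned w     = words[i];
        unsigned index = w & 0x3;   /* word # mod 4: which of the 4 entries */
        unsigned tag   = w >> 2;    /* remaining bits: who is in that entry */
        printf("word %2u -> index %u, tag %u\n", w, index, tag);
    }
    return 0;
}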
Book Cache Usage Sample 8 blocks, 1 word per block. A valid bit tracks whether the cache line actually holds data.
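A minimal sketch of such an 8-entry, one-word-per-block cache with a valid bit and tag per line; the reference sequence and the pretend data values are chosen here for illustration:

#include <stdbool.h>
#include <stdio.h>

#define ENTRIES 8

struct line {
    bool     valid;  /* has this line ever been filled?          */
    unsigned tag;    /* which block currently occupies the line  */
    unsigned data;   /* the cached word (stand-in value)         */
};

int main(void) {
    struct line cache[ENTRIES] = { 0 };                   /* valid bits start false */
    unsigned refs[] = { 22, 26, 22, 26, 16, 3, 16, 18 };  /* example block numbers  */

    for (int i = 0; i < 8; i++) {
        unsigned block = refs[i];
        unsigned index = block % ENTRIES;                  /* block # mod 8 */
        unsigned tag   = block / ENTRIES;
        bool hit = cache[index].valid && cache[index].tag == tag;
        if (!hit) {                                        /* miss: fill the line */
            cache[index].valid = true;
            cache[index].tag   = tag;
            cache[index].data  = block * 100;              /* pretend memory contents */
        }
        printf("block %2u -> index %u: %s\n", block, index, hit ? "hit" : "miss");
    }
    return 0;
}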
Address Subdivision Cache system with 1024 one-word entries, 4-byte words, 64-bit memory addresses.
Direct Mapped Cache Address format: 4 bytes per word = 2 byte-offset bits; 1 word per line = 0 word-offset bits; 1024 cache lines = 10 index bits; the rest (64 - 2 - 0 - 10 = 52 bits) is the tag.
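A sketch of that split in code, assuming the 2 / 0 / 10 / 52-bit layout above (byte offset, word offset, index, tag); the sample address is arbitrary:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t addr = 0x00007FFF12345678ULL;        /* arbitrary sample address   */

    uint64_t byte_off = addr & 0x3;               /* low 2 bits: byte in word   */
    uint64_t index    = (addr >> 2) & 0x3FF;      /* next 10 bits: cache line   */
    uint64_t tag      = addr >> 12;               /* remaining 52 bits: the tag */

    printf("byte offset = %llu\n", (unsigned long long)byte_off);
    printf("index       = %llu\n", (unsigned long long)index);
    printf("tag         = 0x%llX\n", (unsigned long long)tag);
    return 0;
}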
Address Subdivision Cache system with 256 sixteen-word entries, 4-byte words, 32-bit memory addresses.
Direct Mapped Cache Address format: 4 bytes per word = 2 byte-offset bits; 16 words per line = 4 bits to pick the word out of the line; 256 cache lines = 8 index bits; the rest (32 - 2 - 4 - 8 = 18 bits) is the tag.
Direct Mapped Cache Where is this address in cache? 0010 0101 1111 0010 0000 0000 1010 1110
Direct Mapped Cache Where is this address in cache? 0010 0101 1111 0010 0000 0000 1010 1110
Byte offset: 10 (byte 2 of 4)
Word offset: 1011 (word 11 of the 16 in the cache line)
Cache index: 0000 0010 (line 2 of the 256)
Tag: the remaining 18 high bits
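The same breakdown can be checked in code; this sketch extracts the fields of the slide's example address (0x25F200AE) using the 2 / 4 / 8 / 18-bit split derived above:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0x25F200AEu;  /* 0010 0101 1111 0010 0000 0000 1010 1110 */

    uint32_t byte_off = addr & 0x3;          /* 2 bits: byte within the word   */
    uint32_t word_off = (addr >> 2) & 0xF;   /* 4 bits: word within the line   */
    uint32_t index    = (addr >> 6) & 0xFF;  /* 8 bits: which of the 256 lines */
    uint32_t tag      = addr >> 14;          /* remaining 18 bits              */

    printf("byte offset = %u\n", byte_off);   /* expect 2  */
    printf("word offset = %u\n", word_off);   /* expect 11 */
    printf("index       = %u\n", index);      /* expect 2  */
    printf("tag         = 0x%X\n", tag);
    return 0;
}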
Size Considerations Bigger lines take advantage of spatial locality, at the expense of fewer entries (more competition for each slot!).
Bad Situations for Cache Large 2D arrays traversed in column-major order (figure: row-major vs. column-major access pattern); see the sketch below.
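A sketch contrasting the two traversal orders over one matrix (sizes arbitrary): C stores arrays row-major, so the first loop walks memory sequentially while the second jumps a full row's worth of bytes between accesses and touches a different cache line almost every time.

#include <stdio.h>

#define ROWS 1024
#define COLS 1024

static int m[ROWS][COLS];   /* C stores this row by row in memory */

int main(void) {
    long sum = 0;

    /* Row-major access: consecutive iterations touch adjacent addresses,
       so each cache line fetched is fully used. */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += m[r][c];

    /* Column-major access: consecutive iterations are COLS * sizeof(int)
       bytes apart, so nearly every access lands on a different cache line. */
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            sum += m[r][c];

    printf("sum = %ld\n", sum);
    return 0;
}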
Bad Situations for Cache Data with poor locality: complex object-oriented program structures.
Bad Situations for Cache In contrast, data with good locality: packed structures, for example:
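A sketch of what "packed" means here: an array of plain structs keeps each record and its neighbors contiguous in memory, so sequential iteration has good spatial locality (the struct and field names are made up).

#include <stdio.h>

#define N 1000

/* Records stored contiguously: each element sits right next to the previous
   one, so sequential iteration gets good spatial locality. */
struct point {
    float x, y, z;
};

int main(void) {
    static struct point pts[N];          /* one contiguous block of memory */
    for (int i = 0; i < N; i++)
        pts[i].x = pts[i].y = pts[i].z = (float)i;

    float total = 0.0f;
    for (int i = 0; i < N; i++)          /* walks memory front to back */
        total += pts[i].x;

    printf("total = %f\n", total);
    return 0;
}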