Department of Electronics
Advanced Information Storage 16
Atsufumi Hirohata
16:00 28/November/2013 Thursday (V 120)
Quick Review over the Last Lecture: FeRAM, PRAM and ReRAM
16 Cache Memory
Level 1, Level 2 and Level 3 caches
Racetrack memory
Register
Cache Memory
In a PC, cache memory is used to speed up data processing.
It helps to overcome the von Neumann bottleneck: the processor is much faster than the memories it has to access.
Roles of Cache
To hold the instructions / data that the computer uses frequently.
To read the likely data in advance, i.e. the data that will most probably be read in the near future.
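One way a cache ends up holding "likely" data is that it fetches whole lines: when one byte misses, its neighbours arrive with it and are then read as hits. The sketch below is a minimal illustration under stated assumptions: an unbounded cache that starts empty (so only first-touch misses occur) and the 64-byte line size quoted for the Pentium 4 L1 cache in this lecture. The helper `count_misses` is hypothetical.

```python
LINE_SIZE = 64  # bytes per cache line


def count_misses(addresses, line_size=LINE_SIZE):
    """Return (hits, misses) for byte addresses, assuming an unbounded, initially empty cache."""
    loaded_lines = set()
    hits = misses = 0
    for addr in addresses:
        line = addr // line_size  # the cache line that holds this byte
        if line in loaded_lines:
            hits += 1             # brought in earlier together with a neighbour
        else:
            misses += 1           # first touch: the whole line is fetched
            loaded_lines.add(line)
    return hits, misses


sequential = [4 * i for i in range(1024)]  # 1024 four-byte integers read in order
strided = [64 * i for i in range(1024)]    # one integer from every 64-byte line
print(count_misses(sequential))  # (960, 64)
print(count_misses(strided))     # (0, 1024)
```

Reading in order misses once per 64-byte line (1 in 16 integers), while the strided pattern gains nothing from the line fetch.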
Cache Types
Level 1 Cache
Level 1 / primary cache (L1 cache): static memory integrated with a processor core.
It stores information recently accessed by the processor, improving the data access speed when the CPU accesses the same data multiple times.
Access speed: L1 cache > system memory.
In a modern PC, the L1 cache is split into two caches of equal size: one for storing programme data and another for storing microprocessor instructions.
Level 2 Cache
Level 2 / secondary cache (L2 cache): large static memory, which may be integrated with a processor core.
It stores recently accessed information, reducing the data access time when the same data has already been accessed before.
Access speed: L1 cache > L2 cache.
In a modern PC:
Data pre-fetching buffers programme instructions and data before they are requested.
Inclusive cache: requested data stays in the L2 cache. Exclusive cache: requested data is removed after transfer to the L1 cache.
Unified: stores both programme data and microprocessor instructions.
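The difference between an inclusive and an exclusive L2 cache can be sketched with a toy two-level model. This is a hypothetical illustration, not a description of any real processor: both levels use LRU replacement, capacities are counted in lines, and the class name `TwoLevelCache` is invented here. In the inclusive variant every L1 line is also held in L2 (an L2 eviction also removes the line from L1); in the exclusive variant a line leaving L2 moves to L1 and an L1 victim moves down to L2, so no line is stored twice.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy L1/L2 pair; both levels use LRU replacement (hypothetical model)."""

    def __init__(self, l1_lines, l2_lines, inclusive):
        self.l1 = OrderedDict()  # keys are line addresses, oldest first
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.inclusive = inclusive

    @staticmethod
    def _insert(level, capacity, line):
        """Insert `line` as most recently used; return the evicted line or None."""
        level[line] = True
        level.move_to_end(line)
        if len(level) > capacity:
            victim, _ = level.popitem(last=False)
            return victim
        return None

    def access(self, line):
        if line in self.l1:
            self.l1.move_to_end(line)
            return "L1 hit"
        if line in self.l2:
            result = "L2 hit"
            if self.inclusive:
                self.l2.move_to_end(line)  # inclusive: the data stays in L2
            else:
                del self.l2[line]  # exclusive: removed after transfer to L1
        else:
            result = "miss"
            if self.inclusive:
                # fill L2 too; an L2 victim must also leave L1 to keep inclusion
                victim = self._insert(self.l2, self.l2_lines, line)
                self.l1.pop(victim, None)
        victim = self._insert(self.l1, self.l1_lines, line)
        if victim is not None and not self.inclusive:
            self._insert(self.l2, self.l2_lines, victim)  # L1 victim moves down to L2
        return result


for inclusive in (True, False):
    cache = TwoLevelCache(l1_lines=2, l2_lines=4, inclusive=inclusive)
    results = [cache.access(line) for line in "ABCDEFA"]
    print("inclusive" if inclusive else "exclusive", results[-1])
```

With 2 L1 lines and 4 L2 lines, re-reading the first of six distinct lines misses in the inclusive version but hits in L2 in the exclusive one, because the exclusive pair can hold up to six distinct lines rather than four.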
Level 3 Cache
Level 3 cache (L3 cache): very large static memory outside the processor cores and shared by them.
It stores copies of requested items in case a different core makes a subsequent request.
Access speed: L1 cache > L2 cache > L3 cache > DRAM.
In a modern PC:
Inclusive cache: requested data stays in the L3 cache. Exclusive cache: requested data is removed after transfer to the L1 cache.
Unified: stores both programme data and microprocessor instructions.
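The benefit of the L1 > L2 > L3 > DRAM hierarchy is often summarised by the average memory access time, AMAT = t_L1 + m_L1 (t_L2 + m_L2 (t_L3 + m_L3 t_DRAM)), where t is the hit time of each level and m its local miss rate. The latencies and miss rates below are hypothetical round numbers chosen only for illustration, not measured values for any processor; `amat` is an illustrative helper.

```python
def amat(levels, memory_time):
    """Average memory access time for cache levels listed from L1 outward.

    levels: list of (hit_time, local_miss_rate) pairs; memory_time: DRAM access time.
    """
    time = memory_time
    # work from the level nearest DRAM back towards L1:
    # t_i + m_i * (time of everything further out)
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# hypothetical values in processor clock cycles: (hit time, local miss rate)
levels = [(4, 0.10), (10, 0.50), (40, 0.30)]  # L1, L2, L3
print(f"with caches: {amat(levels, 200):.1f} cycles")  # with caches: 10.0 cycles
print("without caches: 200 cycles")
```

Even though half of the L1 misses also miss in L2 in this example, the average access stays close to the L1 hit time because most requests never leave L1.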
Data Associativity
Cache memory stores data in blocks called lines (64 bytes for the Intel Pentium 4 L1 cache):
Direct mapped: fastest hit times and the best trade-off for large caches.
2-way set / skewed associative: best trade-off for 4 ~ 8 kbyte caches.
4-way set associative.
Fully associative: lowest miss rates and the best trade-off when the miss penalty is very high.
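How the associativity decides where a line may be placed can be seen by splitting a byte address into tag, set index and offset. The sketch assumes a hypothetical 32 kbyte cache with the 64-byte line quoted above; `split_address` is an illustrative helper. With 1 way the cache is direct mapped, and with 512 ways (32 kbyte / 64 byte) it is fully associative with a single set.

```python
def split_address(addr, cache_bytes=32 * 1024, line_bytes=64, ways=1):
    """Return (tag, set_index, offset) of a byte address for the given geometry."""
    num_sets = cache_bytes // (line_bytes * ways)  # lines grouped into sets of `ways`
    offset = addr % line_bytes          # byte position inside the 64-byte line
    line_number = addr // line_bytes    # which memory line the byte belongs to
    set_index = line_number % num_sets  # the only set the line may be placed in
    tag = line_number // num_sets       # stored with the line to identify it
    return tag, set_index, offset


# 1 way = direct mapped, 4-way set associative, 512 ways = fully associative
for ways in (1, 4, 512):
    a = split_address(0x0000, ways=ways)
    b = split_address(0x8000, ways=ways)
    print(f"{ways:>3} ways: 0x0000 -> {a}, 0x8000 -> {b}")
```

Addresses 0x0000 and 0x8000 always land in set 0. A direct-mapped cache has one line per set, so alternating between them evicts each line every time (conflict misses); a 4-way or fully associative cache can keep both, at the price of comparing more tags on each access.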
Cache Miss
When a cache miss occurs, a refill process replaces one of the lines in the cache:
Round robin: lines are refilled in a fixed cyclic order.
Least Recently Used (LRU): the line that has gone unaccessed for the longest time is refilled.
Random: a randomly chosen line is refilled.
SPEC CPU2000 benchmark test carried out by Hill and Cantin:
Hit rate: LRU > Random > Round robin.
Complexity: LRU > Random > Round robin.
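The three refill policies can be compared on a single fully associative set. This is a toy simulation with an invented 11-access trace and 3 ways; it illustrates how each policy chooses the line to replace and does not reproduce the SPEC CPU2000 results of Hill and Cantin. The function `simulate` and its parameters are hypothetical.

```python
import random


def simulate(trace, ways, policy, seed=0):
    """Count hits in one fully associative set of `ways` lines (toy model)."""
    rng = random.Random(seed)
    slots = [None] * ways   # line held by each way (None = empty)
    last_used = [0] * ways  # time of the latest access to each way, for LRU
    next_victim = 0         # round-robin pointer
    hits = 0
    for time, line in enumerate(trace):
        if line in slots:
            hits += 1
            last_used[slots.index(line)] = time
            continue
        # cache miss: choose the way to refill
        if None in slots:
            way = slots.index(None)  # cold start: use an empty way first
        elif policy == "round_robin":
            way = next_victim        # refill the ways in a fixed cyclic order
            next_victim = (next_victim + 1) % ways
        elif policy == "lru":
            way = last_used.index(min(last_used))  # least recently used way
        elif policy == "random":
            way = rng.randrange(ways)              # any way, chosen at random
        else:
            raise ValueError(f"unknown policy: {policy}")
        slots[way] = line
        last_used[way] = time
    return hits


trace = list("ABCABDABEAB")  # invented access trace
for policy in ("round_robin", "lru", "random"):
    print(policy, simulate(trace, ways=3, policy=policy))
```

On this trace LRU keeps the frequently reused lines A and B and scores 6 hits against 4 for round robin; the random result depends on the seed. LRU needs a time stamp (or an ordering) per way, which is the extra complexity listed above.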
Example: Cache Sizes (Intel Nehalem, 2008)
Example: Cache Architecture (Intel Nehalem, 2008)
Memory Development: deeper memory hierarchy
Racetrack Memory
In 2008, a 3-bit racetrack memory was demonstrated by Stuart S. P. Parkin (IBM).
It utilises domain-wall motion driven by spin-transfer torque (STT). [S. S. P. Parkin, Sci. Am. 300, 76 (2009).]
Read / Write Operation
Fully electrical read-out / write-in. [S. S. P. Parkin, Sci. Am. 300, 76 (2009).]
Racetrack-Memory Properties
Racetrack memory architecture: utilises magnetic domain walls (1: head-to-head wall, 0: tail-to-tail wall).
Advantages: CMOS process compatible; 3-dimensional (3D) structure.
Challenges (×): reproducible domain-wall trapping; 3D fabrication.
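Racetrack memory reads and writes at a fixed position while current pulses shift the stored pattern along the wire, so the access cost depends on how far the data must move. The toy model below abstracts away the physical encoding (the head-to-head / tail-to-tail walls above) and assumes that one current pulse moves the pattern by one bit position; the class `Racetrack` and its wire length of 2n - 1 cells are hypothetical.

```python
class Racetrack:
    """Toy racetrack: a bit pattern shifted along a wire past one fixed head."""

    def __init__(self, n_bits):
        self.bits = [0] * n_bits  # stored pattern (physical wall encoding abstracted)
        self.head = n_bits - 1    # fixed read/write cell on a wire of 2*n_bits - 1 cells
        self.shift = self.head    # bit i sits at wire cell i + shift; bit 0 starts at head
        self.pulses = 0           # current pulses used so far (1 pulse = 1 cell, assumed)

    def _move_to(self, i):
        """Shift the whole pattern until bit i is under the head."""
        target = self.head - i
        self.pulses += abs(target - self.shift)
        self.shift = target

    def write(self, i, bit):
        self._move_to(i)
        self.bits[i] = bit  # write element sets the bit under the head

    def read(self, i):
        self._move_to(i)
        return self.bits[i]  # read element senses the bit under the head


pattern = [1, 0, 1, 1, 0, 0, 1, 0]
track = Racetrack(len(pattern))
for i, bit in enumerate(pattern):
    track.write(i, bit)
readback = [track.read(i) for i in range(len(pattern))]
print(readback, track.pulses)
```

Writing eight bits in order costs 7 pulses, but rewinding to bit 0 before reading them back costs another 7, giving 21 in total. This distance-dependent access time is a key difference from random-access memories such as SRAM caches.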
Racetrack Memory Demonstration
MRAM cell structure: … nm wide, 20 nm thick and 10 μm long ferromagnetic wires.
CMOS implementation.
Information Technology Pyramid: layered structures between the CPU and storage
Register
A register is a very fast memory directly attached to a processor.