CACHE _View (9/30/2016)


1 CACHE _View 9/30/2016

2 Memory Hierarchy To take advantage of the locality principle, computer memory is implemented as a memory hierarchy: multiple levels that differ in technology, size, distance from the CPU, speed, and cost.

3 Memory Hierarchy Registers: built from flip-flops, on the CPU, small (e.g., 32 32-bit registers). Caches: built from SRAM; L1 on the CPU, small (2-64 KB); L2 on or near the CPU, medium (256 KB - 2 MB). Main memory: built from DRAM, farther from the CPU, large (128-256 MB, with no fixed limit). Hard drive: magnetic disk, far from the CPU, huge (10-40 GB).

4 Memory Hierarchy
1. Registers: fast storage internal to the processor. Speed = 1 CPU clock cycle; persistence = a few cycles; capacity ~ 0.1 KB to 2 KB.
2. Cache: fast storage internal or external to the processor. Speed = a few CPU clock cycles; persistence = tens to hundreds of pipeline cycles; capacity 0.5 MB to 2 MB.
3. Main memory (physical): main storage, usually external to the processor, < 16 GB. Speed = 1-10 CPU clocks; persistence = msec to days.
4. Disk storage (virtual): very slow; access time = 1 to 15 msec; used as the backing store for main memory.

5 Memory Hierarchy The primary DRAM/SRAM difference: DRAM packs more bits than SRAM into a given silicon area, at the expense of speed. SRAM is a form of flip-flop; DRAM is completely different (it must be refreshed).

Technology     Access time       $/GB (2004)
SRAM           0.5-5 ns          $4,000-$10,000
DRAM           50-70 ns          $100-$200
Magnetic disk  5-20 million ns   $0.50-$2

6 Memory Hierarchy The CPU is fast: it needs quick access to lots of data, otherwise it just sits and waits! A 32-bit 800 MHz Pentium III can process 4 bytes at a time, 800 million times per second (MHz = million clock cycles/sec; GHz = thousand million).
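
The arithmetic behind that claim can be checked directly. A quick sketch (the resulting bytes-per-second figure is derived here, not stated on the slide):

```python
# Peak data rate of a 32-bit, 800 MHz processor:
# 4 bytes per transfer, 800 million transfers per second.
bytes_per_transfer = 4             # one 32-bit word
transfers_per_second = 800_000_000  # 800 MHz
peak_rate = bytes_per_transfer * transfers_per_second
print(peak_rate)  # 3200000000 bytes/sec, i.e. 3.2 billion bytes every second
```

A memory system that cannot supply data at anything near this rate is exactly why the CPU "sits and waits."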

7 Memory Hierarchy The differences in speed, size, and cost among memory technologies mean it makes sense to build memory as a hierarchy. It would be far too expensive to use only the fastest technology!

8 Memory Hierarchy Impact of the memory hierarchy: thanks to the locality principle, the hierarchy gives the illusion that the available memory has the size of the largest level (the least expensive technology) AND the access speed of the smallest level (the most expensive technology).

9 Memory Hierarchy Terminology A memory level closer to the CPU contains a subset of the data at any level farther away (CPU - Cache - Main Memory - Hard Drive). Data is copied only between two adjacent levels at a time, from the lower/bigger/slower level to the upper/smaller/faster level. Block: the minimum unit of data present in, or copied to/from, a level. Hit: the data is present in an upper-level block. Miss: the data is not present in any upper-level block, so the lower level is accessed to copy the block containing the data.

10 Memory Hierarchy Terminology (CPU - Cache (SRAM) - Main Memory (DRAM) - Magnetic Disk) Hit time: time to access the upper level (includes the time to determine whether the access is a hit or a miss). Miss penalty: time to replace a block in the upper level with a block from the lower level, plus the time to deliver the data to the processor (>> hit time). These terms hold across any two adjacent levels (e.g., a main-memory hit vs. a cache hit). Hit rate: fraction of memory accesses found in the upper level (for any two adjacent levels). Miss rate: fraction of memory accesses not found in the upper level (1 - hit rate).
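
These definitions combine into the standard average-memory-access-time formula, AMAT = hit time + miss rate x miss penalty. A minimal sketch; the formula is standard but not stated on the slide, and the numeric values below are illustrative assumptions, not figures from the slides:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time across two adjacent levels."""
    return hit_time + miss_rate * miss_penalty

# Assumed example numbers: 1 ns cache hit time, 5% miss rate,
# 60 ns penalty to fetch the block from DRAM.
print(amat(1.0, 0.05, 60.0))  # 4.0 ns on average per access
```

Note how a small miss rate still dominates the average, because the miss penalty is so much larger than the hit time.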

11 Types of Memory There are two kinds of main memory: random access memory (RAM) and read-only memory (ROM). There are two types of RAM: dynamic RAM (DRAM) and static RAM (SRAM). Dynamic RAM consists of capacitors that slowly leak their charge over time, so it must be refreshed every few milliseconds to prevent data loss. DRAM is "cheap" memory owing to its simple design.

12 Types of Memory SRAM consists of circuits similar to the D flip-flop. SRAM is very fast memory, and it does not need to be refreshed like DRAM does. It is used to build cache memory. ROM does not need to be refreshed either; in fact, it needs very little charge to retain its contents. ROM is used to store permanent or semi-permanent data that persists even while the system is turned off.

13 RAM RAM (random access memory): contents are writable and volatile; used to build large memories. Memory accesses take the same time no matter where in memory they read (unlike a magnetic hard disk). Dynamic RAM (DRAM) stores data as charge on a capacitor, so the contents must be periodically refreshed by a memory controller (read and then rewritten every few milliseconds). DRAM is implemented with one transistor and one capacitor per bit; it is slower but takes up less space on a memory chip to store n bits, which makes it less expensive.

14 The Memory Hierarchy This storage organization can be thought of as a pyramid:

15 Hierarchy List Registers (volatile), L1 cache (volatile), L2 cache (volatile), CDRAM main-memory cache (volatile), main memory (volatile), disk cache (volatile), disk (non-volatile), optical (non-volatile), tape (non-volatile).

16 Cache What is it? A cache is a small amount of fast memory. What makes small fast? Simpler decoding logic, more expensive SRAM technology, and close proximity to the processor. A cache sits between normal main memory and the CPU, or it may be located on the CPU chip or module.

17 Cache (continued)

18 Moving Data between Memory and CPU Instructions and data are sent between main memory and the CPU on a bus (channel). The goal of main-memory architecture is to make this as fast as possible. [Diagram: CPU (control, datapath, registers, PC, ALU), input/output devices, and main memory.]

19 Suppose we've got a processor, main memory, and a channel (bus) that connects them to send data.

20 Suppose we're running a program that makes this series of main-memory accesses: M I S S I S S I P P I. Assume a letter represents both the memory address and the value (4 secs to access).

21 For each access: 1) the processor calls for the letter, 2) the letter is found in main memory, 3) the letter is delivered to the processor through the bus.

22 So that took quite a bit of time, right? 44 secs (11 accesses * 4 secs each). Especially tedious because we called for the same letters over and over.
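
The 44-second figure follows directly from the example's numbers; a one-line check:

```python
accesses = "MISSISSIPPI"  # one letter per main-memory access
MEM_TIME = 4              # seconds per main-memory access, from the example
total = len(accesses) * MEM_TIME
print(total)  # 44 seconds for 11 accesses, all served from main memory
```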

23 Now suppose we add 2 caches. They sit right next to the processor (1 sec to access).

24 Cache1 stores values from memory locations A - M.

25 Cache2 stores values from memory locations N - Z.

26 Each cache can store up to 2 values.

27 Now a cache delivers the letter to the processor if it's in there. Otherwise: 1) the letter is found in main memory, 2) the bus delivers it to a cache, 3) the cache delivers it to the processor.

28 That was a lot faster: 23 secs (4 misses * 4 secs + 7 hits * 1 sec). For 7 of the 11 calls, the letter was found in a cache.
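
The example can be simulated directly. A minimal sketch, assuming each 2-entry cache evicts its oldest letter when full (the slides never say the eviction policy; it does not matter here, since MISSISSIPPI touches at most 2 distinct letters per cache):

```python
MEM_TIME, CACHE_TIME = 4, 1   # access costs (seconds) from the example

def run(accesses):
    cache1, cache2 = [], []   # Cache1 holds letters A-M, Cache2 holds N-Z
    total = 0
    for letter in accesses:
        cache = cache1 if letter <= 'M' else cache2
        if letter in cache:
            total += CACHE_TIME      # hit: served by the cache
        else:
            total += MEM_TIME        # miss: fetched from main memory
            if len(cache) == 2:
                cache.pop(0)         # evict the oldest entry
            cache.append(letter)
    return total

print(run("MISSISSIPPI"))  # 23 seconds: 4 misses (M, I, S, P) + 7 hits
```

Tracing it by hand: M, I fill Cache1; S, P fill Cache2; every later reference to those letters is a 1-second hit.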

29 This example illustrated temporal locality: if we access a memory location once, we're likely to access it again.

30 But suppose instead we're running a program that makes this series of sequential main-memory accesses: A B C D E F.

31 The cache didn't help us here, because every new value kicked an old one out of the cache.
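
Running the same style of simulation on the sequential pattern shows why. A sketch under the same assumptions as before (2-entry caches, oldest-first eviction, A-M letters go to Cache1); the 24-second total is derived here, not stated on the slide:

```python
MEM_TIME, CACHE_TIME = 4, 1

def run(accesses):
    cache1, cache2 = [], []          # 2-entry caches for A-M and N-Z
    total = 0
    for letter in accesses:
        cache = cache1 if letter <= 'M' else cache2
        if letter in cache:
            total += CACHE_TIME
        else:
            total += MEM_TIME
            if len(cache) == 2:
                cache.pop(0)         # a new value kicks an old one out
            cache.append(letter)
    return total

print(run("ABCDEF"))  # 24 seconds: all 6 accesses miss, none is reused
```

Every letter is referenced exactly once, so nothing the cache keeps is ever asked for again.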

32 This example illustrated spatial locality: if we access a memory location, we're likely to access an adjacent memory location.

33 Suppose now that when a letter is called for, the bus delivers it plus the letter at the next address. We're faster again, because half the time the letter is already in the cache. Class participation: 1) How many secs? 3*4 + 6*1 = 18.
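
A sketch of this prefetching variant. To reproduce the slide's accounting (3*4 + 6*1 = 18), this model assumes every access pays 1 second to go through the cache, and a miss additionally pays 4 seconds for a memory fetch that brings in the requested letter plus the one at the next address:

```python
MEM_TIME, CACHE_TIME = 4, 1

def run_prefetch(accesses):
    cache = []                    # 2-entry cache (all letters here are A-M)
    total = 0
    for letter in accesses:
        total += CACHE_TIME       # every access goes through the cache
        if letter not in cache:
            total += MEM_TIME     # fetch the letter AND the next one
            nxt = chr(ord(letter) + 1)
            cache = [letter, nxt]  # the 2-letter block fills the cache
    return total

print(run_prefetch("ABCDEF"))  # 18 seconds: 3 memory fetches + 6 cache accesses
```

Only A, C, and E go to memory; B, D, and F arrive for free with the preceding letter.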

34 Typical memory accesses exhibit temporal locality: e.g., loops invoke the same instructions and data repeatedly.

35 Typical memory accesses also exhibit spatial locality: e.g., instructions and arrays are accessed sequentially.

36 The principle of locality: ideally, a memory design will take advantage of both temporal and spatial locality.
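
A single ordinary loop exhibits both kinds of locality at once; a small illustrative sketch (the array and values are made up for the example):

```python
# Summing an array: a pattern that rewards both kinds of locality.
data = list(range(1000))
total = 0
for i in range(len(data)):
    # Temporal locality: the loop instructions and the variable `total`
    # are reused on every iteration.
    # Spatial locality: data[i] sits right next to data[i + 1] in memory,
    # so a cache block fetched for one element also serves its neighbors.
    total += data[i]
print(total)  # 499500, the sum of 0..999
```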

