What is it and why do we need it? Chris Ward CS147 10/16/2008

What drives us to require a cache? How and why does it work?

What we would prefer in our computer memory:
- Fast
- Large
- Cheap

However, very fast memory is very expensive memory. Since we need large capacity (multi-gigabyte memories today), we have to build a system that is the best compromise while keeping the total cost reasonable.

SRAM vs. DRAM
- DRAM stores each bit as a charge on a capacitor and uses only one transistor per bit.
- SRAM stores each bit in a flip-flop built from four to six transistors, which is why DRAMs are smaller and less expensive.
- SRAM requires no refresh circuitry or other work to keep its data intact; DRAM must be refreshed periodically.
- SRAM is faster than DRAM.

- In the early days of PC technology, memory access was only slightly slower than register access.
- Since the 1980s, the performance gap between processor and memory has been growing: CPU speed has continued to double every few years, while disk and RAM speeds have not improved at anywhere near that rate.
- Main memory (RAM) access time has gone from 50 nanoseconds (billionths of a second) to under 2 nanoseconds, only about a 25x improvement over a 30-year period.

- It has been observed that for about 90% of the time our programs execute, only 10% of our code is used! This is known as the Locality Principle.
- Temporal locality: when a program asks for a location in memory, it will likely ask for that same location again very soon.
- Spatial locality: when a program asks for a memory location at some address (say, 1000), it will likely need a nearby location (1001, 1002, 1003, 1004, ...) soon after. The loop-order sketch below makes this concrete.
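A minimal C sketch of spatial locality (the array size and function names are mine, purely illustrative). Both loops compute the same sum, but the row-major version touches memory in the order it is laid out, so each cache line it fetches is fully used; the column-major version strides across rows and wastes most of each line.

```c
#include <stdio.h>

#define N 1024

static int a[N][N];

/* Row-major traversal: consecutive iterations touch adjacent
   addresses, so each fetched cache line is fully used
   (good spatial locality). */
long sum_row_major(void) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal: consecutive iterations jump N * sizeof(int)
   bytes apart, so a new cache line is needed on almost every access
   (poor spatial locality). */
long sum_col_major(void) {
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void) {
    printf("%ld %ld\n", sum_row_major(), sum_col_major());
    return 0;
}
```

On typical hardware the row-major version runs several times faster for large N, even though the arithmetic is identical; the difference is entirely cache behavior.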

Construct a memory hierarchy that tricks the CPU into thinking it has a very fast, large, cheap memory system.

Level                    Access time               Notes
Registers                < 1 ns                    fastest possible access (usually 1 CPU cycle)
Level 1 (SRAM) cache     2-8 ns                    often accessed in just a few cycles; usually tens to hundreds of kilobytes; ~$80/MB
Level 2 (SRAM) cache     5-12 ns                   latency 2x to 10x higher than L1; now multi-MB; ~$80/MB
Main memory (DRAM)       tens of ns                may take hundreds of cycles, but can be multiple gigabytes; e.g. 2 GB for $11 ($0.0055/MB)
Disk storage             3,000,000-10,000,000 ns   millions of cycles of latency, but very large; e.g. 1 TB for $139 (~$0.00014/MB)
Tertiary storage         (really slow)             several seconds of latency; can be huge

For a 1 GHz CPU, a 50 ns wait means 50 wasted clock cycles. (Main memory and disk estimates: Fry's ad, 10/16/2008.)
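To make that footnote's arithmetic concrete, a trivial helper (the function name is hypothetical, not from the slides): stall cycles are just latency multiplied by clock rate.

```c
#include <stdio.h>

/* Stall cycles = latency (ns) * clock rate (GHz), since 1 GHz means
   one cycle per nanosecond. Illustrative helper only. */
double stall_cycles(double latency_ns, double clock_ghz) {
    return latency_ns * clock_ghz;
}

int main(void) {
    printf("%.0f cycles\n", stall_cycles(50.0, 1.0)); /* prints 50 */
    return 0;
}
```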

- We established that the Locality Principle means only a small amount of memory is needed during most of a program's lifetime.
- We now have a memory hierarchy that places very fast yet expensive RAM near the CPU, and larger, slower, cheaper RAM further away.
- The trick is to keep the data that the CPU wants in the small, fast, expensive memory close to the CPU. And how do we do that?

- Hardware and the operating system are responsible for moving data throughout the memory hierarchy when the CPU needs it.
- Modern programming languages mainly assume two levels of memory: main memory and disk storage.
- Programmers are responsible for moving data between disk and memory through file I/O (a minimal example follows below).
- Optimizing compilers are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently.
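A minimal C sketch of the one hierarchy boundary programmers manage explicitly (the file name is hypothetical): file I/O copies data from disk into main memory, while everything faster than RAM is handled by hardware and the compiler.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *f = fopen("data.bin", "rb"); /* hypothetical input file */
    if (!f) { perror("fopen"); return EXIT_FAILURE; }

    int buf[256];
    /* fread moves bytes from disk into RAM; caches and registers
       above RAM are filled automatically by hardware and compiler. */
    size_t n = fread(buf, sizeof buf[0], 256, f);
    printf("read %zu ints into memory\n", n);

    fclose(f);
    return 0;
}
```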

- A cache algorithm is a computer program or a hardware-maintained structure designed to manage a cache of information.
- When the (smaller) cache is full, the algorithm must choose which items to discard to make room for the new data.
- The "hit rate" of a cache describes how often a searched-for item is actually found in the cache (see the bookkeeping sketch below).
- The "latency" of a cache describes how long after requesting a desired item the cache can return that item.
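Hit rate is just bookkeeping; a tiny C sketch (the struct and field names are mine, not from any real cache API):

```c
#include <stdio.h>

/* Illustrative bookkeeping only: hit rate = hits / accesses. */
struct cache_stats {
    unsigned long accesses;  /* total lookups           */
    unsigned long hits;      /* lookups found in cache  */
};

double hit_rate(const struct cache_stats *s) {
    return s->accesses ? (double)s->hits / (double)s->accesses : 0.0;
}

int main(void) {
    struct cache_stats s = { 1000, 950 };
    printf("hit rate = %.1f%%\n", 100.0 * hit_rate(&s)); /* 95.0% */
    return 0;
}
```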

Each cache organization is a compromise between hit rate and latency.
- Direct-mapped cache: the simplest form of cache and the easiest to check for a hit. Unfortunately, it also has the worst hit rate, because there is only one place where any given address can be stored (a working sketch follows below).
- Fully associative cache: has the best hit ratio, because any line in the cache can hold any address that needs to be cached. However, this cache suffers from the cost of searching every line, and it needs a replacement algorithm, usually some form of LRU ("least recently used").
- N-way set associative cache: a good compromise between the direct-mapped and fully associative caches.
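Here is a minimal direct-mapped lookup in C (the geometry, 64 lines of 64 bytes with 32-bit addresses, is an assumption for illustration). Each address is split into offset, index, and tag; the index selects the single line that can hold it, so the check for a hit is one comparison.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical geometry: 64 lines of 64 bytes, 32-bit addresses.
   Address layout (low to high bits): offset | index | tag. */
#define LINE_BYTES   64u
#define NUM_LINES    64u
#define OFFSET_BITS  6u   /* log2(LINE_BYTES) */
#define INDEX_BITS   6u   /* log2(NUM_LINES)  */

struct line {
    int      valid;
    uint32_t tag;
    /* data payload omitted for brevity */
};

static struct line cache[NUM_LINES];

/* Returns 1 on a hit, 0 on a miss (and installs the line on a miss). */
int cache_access(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);

    if (cache[index].valid && cache[index].tag == tag)
        return 1;                      /* hit */

    cache[index].valid = 1;            /* miss: evict whatever was here */
    cache[index].tag   = tag;
    return 0;
}

int main(void) {
    memset(cache, 0, sizeof cache);
    printf("%d ", cache_access(0x1000));  /* 0: cold miss          */
    printf("%d ", cache_access(0x1004));  /* 1: same line, a hit   */
    printf("%d\n", cache_access(0x1000 + NUM_LINES * LINE_BYTES));
                                          /* 0: conflict miss      */
    return 0;
}
```

The last access shows the direct-mapped weakness: two addresses that share an index evict each other even though the rest of the cache is empty. A fully associative cache would have kept both, at the cost of searching every line on each lookup.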

What happens when we run out of main memory? Our programs need more and more RAM!

- Virtual memory is basically the extension of physical main memory (RAM) into a lower-cost portion of our memory hierarchy (let's say, the hard disk).
- A form of the overlay approach, managed by the OS and called paging, is used to swap "pages" of memory back and forth between the disk and physical RAM (the address split is sketched below).
- Hard disks are huge, but do you remember how slow they are? Millions of times slower than the other memories in our pyramid.
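A small C sketch of the address arithmetic behind paging (assuming 4 KB pages and 32-bit virtual addresses; the helper names are mine): every virtual address is split into a virtual page number and an offset within the page.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS  12u                 /* 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_BITS)

/* High bits select the page, low bits the byte within it. */
uint32_t page_number(uint32_t vaddr) { return vaddr >> PAGE_BITS; }
uint32_t page_offset(uint32_t vaddr) { return vaddr & (PAGE_SIZE - 1); }

int main(void) {
    uint32_t v = 0x00403A7C;
    /* The OS looks the page number up in a page table; if that page
       currently lives on disk, a page fault loads it into a free
       physical frame and the access is retried. */
    printf("page 0x%X, offset 0x%X\n", page_number(v), page_offset(v));
    return 0;                          /* page 0x403, offset 0xA7C */
}
```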


m (milli)   10^-3        k (kilo)    10^3
µ (micro)   10^-6        M (mega)    10^6
n (nano)    10^-9        G (giga)    10^9
p (pico)    10^-12       T (tera)    10^12
f (femto)   10^-15       P (peta)    10^15
a (atto)    10^-18       E (exa)     10^18
z (zepto)   10^-21       Z (zetta)   10^21
y (yocto)   10^-24       Y (yotta)   10^24