Memory and cache CPU Memory I/O.

Slides:



Advertisements
Similar presentations
1 Lecture 13: Cache and Virtual Memroy Review Cache optimization approaches, cache miss classification, Adapted from UCB CS252 S01.
Advertisements

Virtual Memory Introduction to Operating Systems: Module 9.
Lecture 34: Chapter 5 Today’s topic –Virtual Memories 1.
1 Lecture 20 – Caching and Virtual Memory  2004 Morgan Kaufmann Publishers Lecture 20 Caches and Virtual Memory.
1 Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
Memory Organization.
Virtual Memory and Paging J. Nelson Amaral. Large Data Sets Size of address space: – 32-bit machines: 2 32 = 4 GB – 64-bit machines: 2 64 = a huge number.
©UCB CS 162 Ch 7: Virtual Memory LECTURE 13 Instructor: L.N. Bhuyan
1 Virtual Memory Management B.Ramamurthy Chapter 10.
1  2004 Morgan Kaufmann Publishers Chapter Seven.
1 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value is stored as a charge.
1 CSE SUNY New Paltz Chapter Seven Exploiting Memory Hierarchy.
Memory and cache CPU Memory I/O.
Computer Architecture Lecture 28 Fasih ur Rehman.
Lecture 19: Virtual Memory
IT253: Computer Organization
Memory and cache CPU Memory I/O. CEG 320/52010: Memory and cache2 The Memory Hierarchy Registers Primary cache Secondary cache Main memory Magnetic disk.
1 Virtual Memory Main memory can act as a cache for the secondary storage (disk) Advantages: –illusion of having more physical memory –program relocation.
Introduction: Memory Management 2 Ideally programmers want memory that is large fast non volatile Memory hierarchy small amount of fast, expensive memory.
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.
1  1998 Morgan Kaufmann Publishers Chapter Seven.
Virtual Memory Review Goal: give illusion of a large memory Allow many processes to share single memory Strategy Break physical memory up into blocks (pages)
1 Chapter Seven. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value.
CMSC 611: Advanced Computer Architecture Memory & Virtual Memory Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material.
Virtual Memory Chapter 8.
CS161 – Design and Architecture of Computer
Memory: Page Table Structure
Memory Hierarchy Ideal memory is fast, large, and inexpensive
Computer Organization
Virtual Memory Chapter 7.4.
The Memory System (Chapter 5)
ECE232: Hardware Organization and Design
Memory COMPUTER ARCHITECTURE
CS161 – Design and Architecture of Computer
CS352H: Computer Systems Architecture
Lecture 12 Virtual Memory.
Lecture: Large Caches, Virtual Memory
Ramya Kandasamy CS 147 Section 3
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Cache Memory Presentation I
CSE 153 Design of Operating Systems Winter 2018
Module 9: Virtual Memory
Virtual Memory 3 Hakim Weatherspoon CS 3410, Spring 2011
Chapter 8 Digital Design and Computer Architecture: ARM® Edition
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Andy Wang Operating Systems COP 4610 / CGS 5765
Memory and cache CPU Memory I/O.
Lecture 23: Cache, Memory, Virtual Memory
ECE 445 – Computer Organization
Lecture 22: Cache Hierarchies, Memory
5: Virtual Memory Background Demand Paging
Lecture 29: Virtual Memory-Address Translation
Andy Wang Operating Systems COP 4610 / CGS 5765
Memory Hierarchy Memory: hierarchy of components of various speeds and capacities Hierarchy driven by cost and performance In early days Primary memory.
Morgan Kaufmann Publishers Memory Hierarchy: Virtual Memory
M. Usha Professor/CSE Sona College of Technology
Contents Memory types & memory hierarchy Virtual memory (VM)
CSE451 Virtual Memory Paging Autumn 2002
CSC3050 – Computer Architecture
Computer Architecture
Chapter Five Large and Fast: Exploiting Memory Hierarchy
CSE 153 Design of Operating Systems Winter 2019
Fundamentals of Computing: Computer Architecture
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Module 9: Virtual Memory
Sarah Diesburg Operating Systems CS 3430
Andy Wang Operating Systems COP 4610 / CGS 5765
Virtual Memory.
Sarah Diesburg Operating Systems COP 4610
Presentation transcript:

Memory and cache CPU Memory I/O

The Memory Hierarchy Larger Faster More Expensive ~2ns ~4-5ns ~30ns >1ms (~6ms) Registers Primary cache Larger Secondary cache Faster More Expensive Main memory Magnetic disk CEG 320/520 10: Memory and cache

Cache & Locality Cache sits between the CPU and main memory Invisible to the CPU Only useful if recently used items are used again Fortunately, this happens a lot. We call this property locality of reference. CEG 320/520 10: Memory and cache

Locality of reference Temporal locality Spatial locality Recently accessed data/instructions are likely to be accessed again. Most program time is spent in loops Arrays are often scanned multiple times Spatial locality If I access memory address n, I am likely to then access another address close to n (usually n+1, n+2, or n+4) Linear execution of code Linear access of arrays CEG 320/520 10: Memory and cache

How a cache exploits locality Temporal – When an item is accessed from memory it is brought into the cache If it is accessed again soon, it comes from the cache and not main memory Spatial – When we access a memory word, we also fetch the next few words of memory into the cache The number of words fetched is the cache line size, or the cache block size for the machine CEG 320/520 10: Memory and cache

Cache write policies As long as we are only doing READ operations, the cache is an exact copy of a small part of the main memory When we write, should we write to cache or memory? Write through cache – write to both cache and main memory. Cache and memory are always consistent Write back cache – write only to cache and set a “dirty bit”. When the block gets replaced from the cache, write it out to memory. When might the write-back policy be dangerous? CEG 320/520 10: Memory and cache

Cache mapping Direct mapped – each memory block can occupy one and only one cache block Example: Cache block size: 16 words Memory = 64K (4K blocks) Cache = 2K (128 blocks) CEG 320/520 10: Memory and cache

Direct Mapped Cache Memory block n occupies cache block (n mod 128) Consider address $2EF4 001011101111 0100 block: $2EF = 751 word: 4 Cache: 00101 1101111 0100 tag: 5 block: 111 word: 4 CEG 320/520 10: Memory and cache

Fully Associative Cache More efficient cache utilization No wasted cache space Slower cache search Must check the tag of every entry in the cache to see if the block we want is there. CEG 320/520 10: Memory and cache

Set-associative mapping Blocks are grouped into sets Each memory block can occupy any block in its set This example is 2-way set-associative Which of the two blocks do we replace? CEG 320/520 10: Memory and cache

Replacement algorithms Random Oldest first Least accesses Least recently used (LRU): replace the block that has gone the longest time without being referenced. This is the most commonly-used replacement algorithm Easy to implement with a small counter… CEG 320/520 10: Memory and cache

Implementing LRU replacement Suppose we have a 4-way set associative cache… Hit: Increment lower counters Reset counter to 00 Miss Replace the 11 Set to 00 Increment all other counters Block 0 11 Block 1 10 Set 0 Block 2 01 Block 3 00 CEG 320/520 10: Memory and cache

Interactions with DMA If we have a write-back cache, DMA must check the cache before acting. Many systems simply flush the cache before DMA write operations What if we have a memory word in the cache, and the DMA controller changes that word? Stale data We keep a valid bit for each cache line. When DMA changes memory, it must also set the valid bit to 0 for that cache line. “Cache coherence” CEG 320/520 10: Memory and cache

Typical Modern Cache Architecture L0 cache On chip Split 16 KB data/16 KB instructions L1 cache 64 KB unified L2 cache Off chip 128 KB to 16+ MB CEG 320/520 10: Memory and cache

Memory Interleaving Memory is organized into modules or banks Within a bank, capacitors must be recharged after each read operation Successive reads to the same bank are slow Non-interleaved Interleaved Module Byte Byte Module 0000 0002 0004 0006 … 00F8 00FA 00FC 00FE 0100 0102 0104 0106 … 01F8 01FA 01FC 01FE 0200 0202 0204 0206 … 02F8 02FA 02FC 02FE 0000 0006 000C 0012 … 02E8 02EE 02F4 02FA 0002 0008 000E 0014 … 02EA 02F0 02F6 02FC 0004 000A 0010 0016 … 02EC 02F2 02F8 02FE CEG 320/520 10: Memory and cache

Measuring Cache Performance No cache: Often about 10 cycles per memory access Simple cache: tave = hC + (1-h)M C is often 1 clock cycle Assume M is 17 cycles (to load an entire cache line) Assume h is about 90% tave = .9 (1) + (.1)17 = 2.6 cycles/access What happens when h is 95%? CEG 320/520 10: Memory and cache

Multi-level cache performance tave = h1C1 + (1-h1) h2C2 + (1-h1) (1-h2) M h1 = hit rate in primary cache h2 = hit rate in secondary cache C1 = time to access primary cache C2 = time to access secondary cache M = miss penalty (time to load an entire cache line from main memory) CEG 320/520 10: Memory and cache

Virtual Memory CEG 320/520 10: Memory and cache

Virtual Memory: Introduction Motorola 68000 has 24-bit memory addressing and is byte addressable Can address 224 bytes of memory – 16 MB Intel Pentiums have 32-bit memory addressing and are byte addressable Can address 232 bytes of memory – 4 GB What good is all that address space if you only have 256MB of main memory (RAM)? CEG 320/520 10: Memory and cache

Virtual Memory: Introduction Unreal Tournament (full install) uses 2.4 gigabytes on your hard disk. You may only have 256 megabytes of RAM (main memory). How can you play the game if you can’t fit it all into main memory? Other Examples: Neverwinter Nights – 2 gigabytes Microsoft Office – 243 megabytes Quicken Basic 2000 – 28 megabytes CEG 320/520 10: Memory and cache

Working Set Not all of a program needs to be in memory while you are executing it: Error handling routines are not called very often. Character creation generally only happens at the start of the game… why keep that code easily available? Working set is the memory that is consumed at any moment by a program while it is running. Includes stack, allocated memory, active instructions, etc. Examples: Unreal Tournament – 100MB (requires 128MB) Internet Explorer – 20MB CEG 320/520 10: Memory and cache

Virtual Memory In modern computers it is possible to use more memory than the amount physically available in the system Memory not currently being used is temporarily stored on magnetic disk Essentially, the main memory acts as a cache for the virtual memory on disk. CEG 320/520 10: Memory and cache

Memory Management Unit (MMU) Virtual memory must be invisible to the CPU Appears as one large address space The MMU sits between the CPU and the memory system Translates virtual addresses to real (physical) addresses CEG 320/520 10: Memory and cache

Paged Memory Memory is divided into pages Conceptually like a cache block Typically 2K to 16K in size A page can live anywhere in main memory, or on the disk Like cache, we need some way of knowing what page is where in memory Page table instead of tags Page table base register stores the address of the page table CEG 320/520 10: Memory and cache

Page lookup in the page table Control bits: Valid Modified Accessed Occasionally cleared by the OS Used for LRU replacement TLB: A cache for the page table CEG 320/520 10: Memory and cache

The TLB A special, fully-associative cache for the page table is used to speed page lookup This cache is called the Translation Lookaside Buffer or TLB. CEG 320/520 10: Memory and cache

What if you have too many page faults? If the memory address desired cannot be found in the TLB, then a page fault has occurred, and the page containing that memory must be loaded from the disk into main memory. Page faults are costly because moving 2K – 16K (page size) can take a while. What if you have too many page faults? CEG 320/520 10: Memory and cache

Thrashing If a process does not have enough frames to hold its current working set, the page-fault rate is very high Thrashing a process is thrashing when it spends more time paging than executing w/ local replacement algorithms, a process may thrash even though memory is available w/ global replacement algorithms, the entire system may thrash Less thrashing in general, but is it fair? CEG 320/520 10: Memory and cache

Thrashing Diagram Why does paging work? Locality model Process migrates from one locality to another. Localities may overlap. Why does thrashing occur?  size of locality > total memory size What should we do? suspend one or more processes! CEG 320/520 10: Memory and cache

Program Structure How should we arrange memory references to large arrays? Is the array stored in row-major or column-major order? Example: Array A[1024, 1024] of type integer Page size = 1K Each row is stored in one page System has one frame Program 1 for i := 1 to 1024 do for j := 1 to 1024 do A[i,j] := 0; 1024 page faults Program 2 for j := 1 to 1024 do for i := 1 to 1024 do A[i,j] := 0; 1024 x 1024 page faults CEG 320/520 10: Memory and cache