Morgan Kaufmann Publishers

Presentation transcript:

Chapter 5: The Memory Hierarchy

Design choices for a cache:
- Block placement
- Finding a block (these first two determine the shape of the cache)
- Replacement policy
- Write policy

Types of cache misses: cold (compulsory), capacity, and conflict. How can each be reduced?
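To make "block placement" and "finding a block" concrete, here is a minimal sketch (not from the slides) of how a direct-mapped cache splits a byte address into offset, index, and tag fields. The 64-byte blocks and 256 sets are assumed parameters chosen only for illustration.

```c
#include <stdio.h>

/* Illustrative parameters (assumed, not from the slides):
   a direct-mapped cache with 64-byte blocks and 256 sets. */
#define BLOCK_SIZE 64u   /* bytes per block -> 6 offset bits */
#define NUM_SETS   256u  /* blocks in cache -> 8 index bits  */

int main(void) {
    unsigned addr = 0x12345678u;

    unsigned offset = addr % BLOCK_SIZE;              /* byte within the block       */
    unsigned index  = (addr / BLOCK_SIZE) % NUM_SETS; /* which block frame (placement) */
    unsigned tag    = addr / (BLOCK_SIZE * NUM_SETS); /* compared to detect a hit    */

    printf("addr=0x%08x tag=0x%x index=%u offset=%u\n", addr, tag, index, offset);
    return 0;
}
```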

Cache Design Trade-offs

- Increase cache size: decreases capacity misses, but may increase access time.
- Increase associativity: decreases conflict misses, but may increase access time.
- Increase block size: decreases compulsory misses, but increases miss penalty; for very large block sizes, it may increase the miss rate due to pollution.
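These trade-offs are often summarized with average memory access time, AMAT = hit time + miss rate x miss penalty. Below is a minimal sketch comparing two hypothetical designs; all of the numbers (hit times, miss rates, the 100-cycle miss penalty) are made up for illustration, not taken from the slides.

```c
#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty (all values assumed for illustration). */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* Hypothetical design A: small, fast cache. */
    double a = amat(1.0 /* cycles */, 0.05, 100.0);
    /* Hypothetical design B: larger cache -> fewer capacity misses, slower hit. */
    double b = amat(2.0 /* cycles */, 0.03, 100.0);

    printf("AMAT A = %.2f cycles, AMAT B = %.2f cycles\n", a, b);
    return 0;
}
```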

Multilevel On-Chip Caches (§5.13 The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies)

2-Level TLB Organization

Supporting Multiple Issue

Both processors (the ARM Cortex-A8 and the Intel Core i7) have multi-banked caches that allow multiple accesses per cycle, assuming no bank conflicts.

Core i7 cache optimizations (a software sketch of the prefetching idea appears below):
- Return the requested word first
- Non-blocking cache: hit under miss, miss under miss
- Data prefetching
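The Core i7's data prefetching is done in hardware, but the idea can be sketched in software with the GCC/Clang __builtin_prefetch intrinsic: touch data a fixed distance ahead of its use so it is already in the cache when needed. The prefetch distance of 16 elements is an arbitrary assumption, not a tuned value.

```c
#include <stddef.h>

/* Software sketch of the prefetching idea: request data a fixed distance
   ahead so it is (hopefully) resident in the cache by the time it is used. */
#define PREFETCH_DIST 16  /* assumed distance, not a measured optimum */

double sum_with_prefetch(const double *a, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], /*rw=*/0, /*locality=*/1);
        sum += a[i];
    }
    return sum;
}
```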

DGEMM (§5.14 Going Faster: Cache Blocking and Matrix Multiply)

Combine cache blocking and subword parallelism; a cache-blocked sketch follows below.
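The textbook's DGEMM combines AVX subword parallelism with cache blocking; as a simpler hedge, the sketch below shows only the cache-blocking half, using column-major storage and a BLOCKSIZE of 32 chosen as an assumed tuning parameter. Subword parallelism is left to the compiler's auto-vectorizer here.

```c
/* Cache-blocked DGEMM sketch: C = C + A*B for n x n matrices stored
   column-major. Assumes n is a multiple of BLOCKSIZE for brevity. */
#define BLOCKSIZE 32  /* assumed tuning parameter, not a measured optimum */

static void do_block(int n, int si, int sj, int sk,
                     const double *A, const double *B, double *C) {
    for (int i = si; i < si + BLOCKSIZE; ++i)
        for (int j = sj; j < sj + BLOCKSIZE; ++j) {
            double cij = C[i + j * n];               /* C[i][j] */
            for (int k = sk; k < sk + BLOCKSIZE; ++k)
                cij += A[i + k * n] * B[k + j * n];  /* A[i][k] * B[k][j] */
            C[i + j * n] = cij;
        }
}

void dgemm_blocked(int n, const double *A, const double *B, double *C) {
    for (int sj = 0; sj < n; sj += BLOCKSIZE)
        for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
                do_block(n, si, sj, sk, A, B, C);
}
```

Blocking keeps one BLOCKSIZE x BLOCKSIZE tile of each matrix in the cache while it is reused, so the inner loops hit in the cache instead of streaming the full matrices from DRAM on every pass.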

Pitfalls (§5.15 Fallacies and Pitfalls)

Byte vs. word addressing:
- Example: a 32-byte direct-mapped cache with 4-byte blocks
- Byte address 36 maps to block 1, while word address 36 maps to block 4 (the arithmetic is worked out in the sketch below)

Ignoring memory system effects when writing or generating code:
- Example: iterating over rows vs. columns of arrays
- Large strides result in poor locality
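The byte-vs.-word arithmetic can be checked directly; the sketch below uses the slide's parameters (a 32-byte direct-mapped cache with 4-byte blocks, i.e. 8 blocks) and reproduces both mappings.

```c
#include <stdio.h>

int main(void) {
    /* Parameters from the slide: 32-byte direct-mapped cache, 4-byte blocks. */
    unsigned block_bytes = 4, num_blocks = 32 / 4;   /* 8 blocks */

    /* Byte addressing: 36 is a byte address. */
    unsigned byte_addr = 36;
    unsigned blk_byte  = (byte_addr / block_bytes) % num_blocks;       /* = 1 */

    /* Word addressing: 36 is a word address (4 bytes/word), i.e. byte 144. */
    unsigned word_addr = 36;
    unsigned blk_word  = ((word_addr * 4) / block_bytes) % num_blocks; /* = 4 */

    printf("byte 36 -> block %u, word 36 -> block %u\n", blk_byte, blk_word);
    return 0;
}
```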

Pitfalls

In a multiprocessor with a shared L2 or L3 cache:
- Less associativity than the number of cores results in conflict misses
- More cores → need to increase associativity

Using AMAT to evaluate the performance of out-of-order processors:
- Ignores the effect of non-blocked (overlapped) accesses
- Instead, evaluate performance by simulation

Pitfalls

Extending the address range using segments:
- E.g., the Intel 80286
- But a segment is not always big enough
- Makes address arithmetic complicated

Implementing a VMM on an ISA not designed for virtualization:
- E.g., non-privileged instructions can access hardware resources
- Either extend the ISA, or require the guest OS not to use the problematic instructions

Someone summarize this chapter for me.

Concluding Remarks (§5.16)

- Fast memories are small; large memories are slow
- We really want fast, large memories → caching gives this illusion
- Principle of locality: programs use a small part of their memory space frequently
- Memory hierarchy: L1 cache → L2 cache → … → DRAM memory → disk
- Memory system design is critical for multiprocessors