Memory Hierarchies [FLPR12] Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran. Cache- Oblivious Algorithms. ACM Transactions on Algorithms,

Slides:



Advertisements
Similar presentations
Virtual Memory 1 Computer Organization II © McQuain Virtual Memory Use main memory as a cache for secondary (disk) storage – Managed jointly.
Advertisements

1 Lecture 13: Cache and Virtual Memroy Review Cache optimization approaches, cache miss classification, Adapted from UCB CS252 S01.
The Study of Cache Oblivious Algorithms Prepared by Jia Guo.
Virtual Memory main memory can act as a cache for secondary storage motivation: Allow programs to use more memory that there is available transparent to.
Lecture 34: Chapter 5 Today’s topic –Virtual Memories 1.
Hierarchy-conscious Data Structures for String Analysis Carlo Fantozzi PhD Student (XVI ciclo) Bioinformatics Course - June 25, 2002.
Caching IV Andreas Klappenecker CPSC321 Computer Architecture.
Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Current Trends in Algorithms, Complexity Theory,
O(log n) bits or atomic elements
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Virtual Memory 2 P & H Chapter
Memory/Storage Architecture Lab Computer Architecture Virtual Memory.
Overview of Cache and Virtual MemorySlide 1 The Need for a Cache (edited from notes with Behrooz Parhami’s Computer Architecture textbook) Cache memories.
External-Memory and Cache-Oblivious Algorithms: Theory and Experiments Gerth Stølting Brodal Oberwolfach Workshop on ”Algorithm Engineering”, Oberwolfach,
1 Lecture 20 – Caching and Virtual Memory  2004 Morgan Kaufmann Publishers Lecture 20 Caches and Virtual Memory.
Cache-Oblivious B-Trees
S.1 Review: The Memory Hierarchy Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory Processor (Relative) size of.
1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
Review CPSC 321 Andreas Klappenecker Announcements Tuesday, November 30, midterm exam.
1 Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
The Memory Hierarchy II CPSC 321 Andreas Klappenecker.
Translation Buffers (TLB’s)
Virtual Memory. Why do we need VM? Program address space: 0 – 2^32 bytes –4GB of space Physical memory available –256MB or so Multiprogramming systems.
Virtual Memory and Paging J. Nelson Amaral. Large Data Sets Size of address space: – 32-bit machines: 2 32 = 4 GB – 64-bit machines: 2 64 = a huge number.
Cache Oblivious Search Trees via Binary Trees of Small Height
©UCB CS 162 Ch 7: Virtual Memory LECTURE 13 Instructor: L.N. Bhuyan
Virtual Memory Management B.Ramamurthy. Paging (2) The relation between virtual addresses and physical memory addres- ses given by page table.
1  2004 Morgan Kaufmann Publishers Chapter Seven.
1 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value is stored as a charge.
Virtual Memory Topics Virtual Memory Access Page Table, TLB Programming for locality Memory Mountain Revisited.
Chapter 4 Memory Management 4.1 Basic memory management 4.2 Swapping
1 CSE SUNY New Paltz Chapter Seven Exploiting Memory Hierarchy.
Cache Organization of Pentium
Lecture 33: Chapter 5 Today’s topic –Cache Replacement Algorithms –Multi-level Caches –Virtual Memories 1.
Lecture 19: Virtual Memory
Lecture 15: Virtual Memory EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.
The Memory Hierarchy 21/05/2009Lecture 32_CA&O_Engr Umbreen Sabir.
Memory and cache CPU Memory I/O. CEG 320/52010: Memory and cache2 The Memory Hierarchy Registers Primary cache Secondary cache Main memory Magnetic disk.
Λίστα Εργασιών External Memory Data Structures Vitter, J. S. and Shriver, E. 1994a. Algorithms for parallel memory I: Two-level memories. Algorithmica.
Chapter 91 Logical Address in Paging  Page size always chosen as a power of 2.  Example: if 16 bit addresses are used and page size = 1K, we need 10.
Computer Organization CS224 Fall 2012 Lessons 45 & 46.
4.3 Virtual Memory. Virtual memory  Want to run programs (code+stack+data) larger than available memory.  Overlays programmer divides program into pieces.
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
Demand Paging Reference Reference on UNIX memory management
1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.
1  2004 Morgan Kaufmann Publishers Chapter Seven Memory Hierarchy-3 by Patterson.
Memory Hierarchy How to improve memory access. Outline Locality Structure of memory hierarchy Cache Virtual memory.
15-853: Algorithms in the Real World Locality II: Cache-oblivious algorithms – Matrix multiplication – Distribution sort – Static searching.
Virtual Memory Ch. 8 & 9 Silberschatz Operating Systems Book.
1  1998 Morgan Kaufmann Publishers Chapter Seven.
Virtual Memory Review Goal: give illusion of a large memory Allow many processes to share single memory Strategy Break physical memory up into blocks (pages)
Virtual Memory CS740 October 13, 1998 Topics page tables TLBs Alpha 21X64 memory system.
Summary of caches: The Principle of Locality: –Program likely to access a relatively small portion of the address space at any instant of time. Temporal.
1 Chapter Seven. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value.
Virtual Memory 1 Computer Organization II © McQuain Virtual Memory Use main memory as a “cache” for secondary (disk) storage – Managed jointly.
1 ECE3055 Computer Architecture and Operating Systems Lecture 8 Memory Subsystem Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering.
CMSC 611: Advanced Computer Architecture Memory & Virtual Memory Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material.
Virtual Memory Chapter 8.
CS161 – Design and Architecture of Computer
ECE232: Hardware Organization and Design
CS161 – Design and Architecture of Computer
Memory and cache CPU Memory I/O.
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Lecture 11: DMBS Internals
Part V Memory System Design
Memory and cache CPU Memory I/O.
External-Memory and Cache-Oblivious Algorithms: Theory and Experiments
Virtual Memory فصل هشتم.
Presented By Gregory Giguashvili
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Presentation transcript:

Memory Hierarchies [FLPR12] Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran. Cache- Oblivious Algorithms. ACM Transactions on Algorithms, 8(1), Article No. 4, [BFJ02] Gerth Stølting Brodal, Rolf Fagerberg, Riko Jacob. Cache-Oblivious Search Trees via Binary Trees of Small Height. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 39-48, [JM13] Tomasz Jurkiewicz, Kurt Mehlhorn. The cost of address translation, In Proc. 15th Annual Meeting on Algorithm Engineering & Experiments (ALENEX), , 2013.

Memory Hierarchies vs Efficiency  Cache misses (L1, L2, L3,...)  Prefetching  Cache associativity  Virtual to physical mapping  Translation Look-aside Buffer (TLB)  TLB misses

Some Typical Access times LevelAccess timeCache line size L1 Data ~16 KB L1 Instruction ~16 KB 5 ns64 bytes L2 ~512 KB 20 ns64 bytes L3 ~10 MB 30 ns64 bytes Main memory 60 ns Disk 10 ms4 KB

Intel(R) Core(TM) i GHz  32nm, 4 core [8 threads], L1, L2 and L3 line size 64 bytes  L1 instruction 32K 8-way write-through per core  L1 data 32K 8-way write-back per core  L1 cache latency 3 clock cycles  L2 256KB 8-way write-back unified cache per core  L2 cache latency 12 clock cycles  L3 10MB 20-way write-back unified cache shared by ALL cores  L3 cache latency clock cycles  L1 instruction TLB, 4K pages, 64 entries, 4-way  L1 data TLB, 4K pages, 64 entries, 4-way  L2 TLB, 4K pages, 512 entries, 4-way  ALL caches and TLBs use a pseudo LRU replacement policy

Virtual to Physical Address Mapping

Cost of Address Translation [JM13] Tomasz Jurkiewicz, Kurt Mehlhorn. The cost of address translation, In Proc. 15th Annual Meeting on Algorithm Engineering & Experiments (ALENEX), , 2013.

Cache-Oblivious Model  I/O model...but algorithms do not know B and M  Assume optimal cache replacement strategy  Optimal on all levels (under some assumptions) M B [FLPR12] Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran. Cache- Oblivious Algorithms. ACM Transactions on Algorithms, 8(1), Article No. 4, 2012.

Recursive Tree Layout (van Emde Boas layout) Harald Prokop 1999, MIT MSc thesis ”Cache-Oblivious Algorithms”, June 1999 Binary treeSearches O(log B N) IOs Range Searches O(log B N + k/B)

Four Tree Layouts DFS BFS Inorder Recursive / van Emde Boas

vEB Random Searches in Pointer Layouts

9-ary bfs Random Searches in Implicit Layouts

Making Trees Dynamic ?  Trees of bounded depth Andersson and Lai 1990  Rebuild subtrees when depth  log n + O(1)  Insert: O(log 2 n) amortized

Static  Dynamic  Emded dynamic tree into a complete tree  Static layout of tree (e.g. van Emde Boas layout)  Search O(log B N)  Update O(log B N + (log 2 N)/B) [BFJ02] Gerth Stølting Brodal, Rolf Fagerberg, Riko Jacob. Cache-Oblivious Search Trees via Binary Trees of Small Height. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 39-48, 2002.

Insertions into Implicit Layout  Insertions factor slower than searches

Matrix Transpose N x N matrix, divided by N 2 Multiply N x N matrix, divided by N 3 [FLPR12] Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran. Cache- Oblivious Algorithms. ACM Transactions on Algorithms, 8(1), Article No. 4, 2012.