CSE332: Data Abstractions
Lecture 8: Memory Hierarchy
Tyler Robison, Summer 2010

Now what?
- We have a data structure for the dictionary ADT that has worst-case O(log n) behavior
  - One of several interesting/fantastic balanced-tree approaches
- We are about to learn another balanced-tree approach: B-Trees
- First, to motivate why B-Trees are better for really large dictionaries (say, over 1GB = 2^30 bytes), we need to understand some memory-hierarchy basics
  - Don't always assume "every memory access has an unimportant O(1) cost"
  - Learn more in CSE351/333/471 (and CSE378); the focus here is on relevance to data structures and efficiency

A typical hierarchy
"Every desktop/laptop/server is different", but here is a plausible configuration these days:

Capacity, from the CPU outward:
- L1 cache: 128KB = 2^17 bytes
- L2 cache: 2MB = 2^21 bytes
- Main memory: 2GB = 2^31 bytes
- Disk: 1TB = 2^40 bytes

Speed:
- Instructions (e.g., addition): 2^30/sec
- Get data in L1: 2^29/sec = 2 instructions' time
- Get data in L2: 2^25/sec = 30 instructions
- Get data in main memory: 2^22/sec = 250 instructions
- Get data from a "new place" on disk: 2^7/sec = 8,000,000 instructions; "streamed": 2^18/sec
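The "instructions" equivalences follow from dividing the instruction rate by each access rate. Here is a minimal sketch in Java (the course's language) that reproduces them from the slide's ballpark powers of two; the class and variable names are just for illustration.

// Back-of-the-envelope check of the "instruction equivalents" above,
// using the ballpark rates from the slide, expressed as powers of two.
public class HierarchyMath {
    public static void main(String[] args) {
        double instructionsPerSec = Math.pow(2, 30);
        // accesses per second at each level of the hierarchy
        double[] rates  = { Math.pow(2, 29), Math.pow(2, 25), Math.pow(2, 22), Math.pow(2, 7) };
        String[] levels = { "L1 cache", "L2 cache", "main memory", "disk (new place)" };
        for (int i = 0; i < rates.length; i++) {
            // one access takes 1/rate seconds, during which the CPU could
            // have executed instructionsPerSec / rate instructions
            double instEquiv = instructionsPerSec / rates[i];
            System.out.printf("%-18s ~%,.0f instructions%n", levels[i], instEquiv);
        }
    }
}

This prints 2, 32, 256, and 8,388,608, matching the slide's rounded 2 / 30 / 250 / 8,000,000.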

Morals
It is much faster to do:          Than:
  5 million arithmetic ops        1 disk access
  2500 L2 cache accesses          1 disk access
  400 main memory accesses        1 disk access

Why are computers built this way?
- Physical realities (speed of light, closeness to CPU)
- Cost (price per byte of different technologies)
- Disks get much bigger, not much faster
  - Spinning at 7200 RPM accounts for much of the slowness, and disks are unlikely to spin faster in the future
- Speedup at higher levels makes lower levels relatively slower
- Later in the course: more than 1 CPU!

"Fuggedaboutit", usually
- The hardware automatically moves data into the caches from main memory for you
  - Replacing items already there
  - So algorithms run much faster if "data fits in cache" (it often does)
- Disk accesses are done by software (e.g., asking the operating system to open a file or database to access some data)
- So most code "just runs", but sometimes it's worth designing algorithms / data structures with knowledge of the memory hierarchy
  - And when you do, you often need to know one more thing…

Block/line size
- Moving data up the memory hierarchy is slow because of latency (think distance-to-travel)
- Since we're making the trip anyway, we may as well carpool: get a block of data in the same time it would take to get a byte
- What to send? How about nearby memory:
  - It's easy (close by)
  - And likely to be asked for soon (spatial locality)
- Side note: once in cache, we may as well keep it around for a while; a value accessed once is more likely to be accessed again in the near future (more likely than some random other value): temporal locality
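To see spatial locality in action, here is a minimal sketch in Java: both loops below sum the same array and do the same number of additions, but the strided loop touches a new cache line on almost every access. Exact timings depend on the machine, cache sizes, and JIT warm-up; the names and the 64-byte line-size assumption are illustrative.

import java.util.Random;

// Rough demonstration of spatial locality: summing the same array
// sequentially vs. with a cache-line-sized stride. Both loops touch
// every element exactly once, but the strided version wastes most of
// each cache line it pulls in. A sketch, not a rigorous benchmark.
public class LocalityDemo {
    static final int N = 1 << 24;   // 16M ints = 64MB; bigger than any cache
    static final int STRIDE = 16;   // 16 ints = 64 bytes, a typical line size

    public static void main(String[] args) {
        int[] a = new int[N];
        Random rand = new Random(42);
        for (int i = 0; i < N; i++) a[i] = rand.nextInt(100);

        long t0 = System.nanoTime();
        long seq = 0;
        for (int i = 0; i < N; i++) seq += a[i];   // walks cache lines in order
        long t1 = System.nanoTime();

        long strided = 0;
        // same N additions, but each access lands on a different cache line
        for (int start = 0; start < STRIDE; start++)
            for (int i = start; i < N; i += STRIDE) strided += a[i];
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d ms, strided: %d ms (sums %d == %d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seq, strided);
    }
}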

Block/line size
- The amount of data moved from disk into memory is called the "block" size or the "(disk) page" size
- The amount of data moved from memory into cache is called the "line" size (as in "cache line")
- Neither is under program control, but it's good to be aware of them
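The same block idea shows up when software reads from disk. A sketch, assuming a hypothetical large file bigfile.dat: the unbuffered loop issues an OS-level request per byte, while the buffered stream fetches an 8KB block at a time and serves most reads from memory. (The operating system's own page cache will soften the gap, but it stays large.)

import java.io.*;

// Sketch of why block-sized reads matter at the software level.
// "bigfile.dat" is a placeholder path for illustration.
public class BlockReadDemo {
    public static void main(String[] args) throws IOException {
        File f = new File("bigfile.dat");   // hypothetical large file

        long t0 = System.nanoTime();
        long n1 = 0;
        try (InputStream in = new FileInputStream(f)) {
            while (in.read() != -1) n1++;   // one OS request per byte (ouch)
        }
        long t1 = System.nanoTime();

        long n2 = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(f), 8192)) {
            while (in.read() != -1) n2++;   // most reads served from the 8KB buffer
        }
        long t2 = System.nanoTime();

        System.out.printf("unbuffered: %d ms, buffered: %d ms (%d bytes)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, n2);
    }
}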

Connection to data structures
- An array benefits more than a linked list from block moves
  - A language implementation (e.g., Java's) can put linked-list nodes anywhere, whereas an array is typically contiguous memory
  - So arrays benefit more from spatial locality
- Note: "array" doesn't mean "good"
  - A sufficiently large array won't fit in one block
  - Binary heaps "make big jumps" to percolate (landing in a different block)
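A sketch of the array-vs-linked-list point in Java: summing an int[] streams through contiguous memory, while summing a LinkedList<Integer> chases pointers to nodes that may live anywhere on the heap (and boxes every value). Timings are machine- and JIT-dependent; the sizes and names are illustrative.

import java.util.LinkedList;
import java.util.List;

// Contrasting traversal of contiguous vs. scattered data.
public class ArrayVsListDemo {
    static final int N = 1 << 21;   // 2M elements

    public static void main(String[] args) {
        int[] array = new int[N];
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < N; i++) { array[i] = i; list.add(i); }

        long t0 = System.nanoTime();
        long s1 = 0;
        for (int i = 0; i < N; i++) s1 += array[i];   // contiguous walk
        long t1 = System.nanoTime();

        long s2 = 0;
        for (int x : list) s2 += x;                   // pointer chase, node by node
        long t2 = System.nanoTime();

        System.out.printf("array: %d ms, linked list: %d ms (sums %d == %d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, s1, s2);
    }
}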

BSTs?
- Since looking things up in a balanced binary search tree is O(log n), even for n = 2^39 (512GB) we don't have to worry about minutes or hours
- Still, the number of disk accesses matters
  - An AVL tree could have a height of, say, 55, which, based on our proof, means a lot of nodes
  - Most of the nodes will be on disk: the tree is shallow, but it is still many gigabytes big, so it cannot fit in memory
  - Even if memory holds the first 25 nodes on our path, we still need 30 disk accesses
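Where does a height like 55 come from? Our AVL proof bounds the minimum number of nodes in a tree of height h by the recurrence minNodes(h) = minNodes(h-1) + minNodes(h-2) + 1, with minNodes(0) = 1 and minNodes(1) = 2. A small sketch (names illustrative) that finds the largest height an AVL tree with n = 2^39 nodes could have; it prints 54, in line with the slide's ballpark of 55.

// Largest possible AVL height for n = 2^39 nodes, via the
// minimum-node recurrence from the AVL balance proof.
public class AvlHeightBound {
    public static void main(String[] args) {
        long n = 1L << 39;
        long prev = 1, cur = 2;          // minNodes(0), minNodes(1)
        int h = 1;
        while (cur + prev + 1 <= n) {    // can a tree of height h+1 still fit in n nodes?
            long next = cur + prev + 1;
            prev = cur;
            cur = next;
            h++;
        }
        System.out.println("max AVL height for n = 2^39: " + h);  // prints 54
    }
}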

Note about numbers; moral
- All the numbers in this lecture are "ballpark", "back of the envelope" figures
- Even if they are off by, say, a factor of 5, the moral is the same: if your data structure is mostly on disk, you want to minimize disk accesses
- A better data structure in this setting would exploit the block size to avoid disk accesses…