CSC 213 – Large Scale Programming Lecture 37: External Caching & (a,b)-Trees.


Today’s Goal
  Look at advanced Tree structures
    Part of most databases, operating systems
    Anywhere there is a lot of data to be held
  Already examined related (2,4) trees
    Now look at more general definition
    Also examine why we should care

Lies My Professor Told Me
  Big-Oh notation not always accurate
    For example, treats all memory accesses equally
  But many different memories inside machine
    Organized in a pyramid
    Higher == faster
    Lower == cheaper (cheaper also means more memory available)
  Pyramid, top to bottom: registers, L1 cache, L2 cache, main memory (RAM), hard drive

Hierarchy In Perspective
  Suppose the processor needs a beverage
    Registers -- Drink from the mug in its hand
    L1 Cache -- Get from a case in the fridge
    L2 Cache -- Get from tapped barrel in the cellar
    Main memory -- Purchase at the corner Wilson Farms
    Hard drive -- Drive to closest brewery & buy vat
    Network -- Go to Germany & buy Bavaria

Waiting Is a Pain

Not All Accesses Are Equal
  Want each access served by the fastest level possible
    Easy when we are only using a few objects
    Difficult when working with non-trivial data sets
  Two common approaches to avoid the wait
    Caching -- hold data from hard drive in RAM
      Usually stores most recently or frequently used data
    Locality -- organize data to limit amount used
      Match data layout to how storage is organized, improving cache effectiveness
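The caching idea can be sketched in Java as a small least-recently-used cache. This is only an illustration: the class name LruBlockCache and its capacity parameter are made up for this example, and it relies on the access-order mode of java.util.LinkedHashMap rather than on anything specific to this course.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-RAM cache of disk blocks that evicts the least recently used entry. */
final class LruBlockCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity; // assumed maximum number of blocks kept in RAM

    LruBlockCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; returning true evicts the least recently used entry
        return size() > capacity;
    }
}
```

With a capacity of 2, putting blocks 1 and 2, reading block 1, and then putting block 3 should evict block 2, since block 1 was touched more recently.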

Virtual Memory
  “Extends” RAM by using space on hard drive
    Big win if we rarely access the material on disk
    Incredibly slow if always stuck driving to brewery
  Works by dividing memory into pages
    Each page is a constant size (usually 4096 bytes)
    Operating system handles memory at page level
    Limits overhead and maximizes efficiency
    Evicts unused pages to the hard drive for storage
    Reloads a page when it is next accessed

Problems with Binary Trees
  Good way to organize information
    Provides consistent O(log n) processing times
  Organization is very bad for locality, however
    Nodes contain only 1 piece of data
    Must then jump to one of its two children
    Nodes can get randomly spread over heap
    Good torture test for roommate’s computer
  (2,4) trees provide some improvement
    Still have at most 3 elements & 4 children
    Does not use anything like the 4096 bytes in a page

(a,b) Trees to the Rescue!
  Real-world solution to paging killing disks
    Used by Linux & MacOS to track files & directories
    Organization used by MySQL & other databases
    Found in many other places where paging occurs
  (2,4) trees are one example of these
  Can also create others, just follow the rules
    All leaves are found at same level of the tree
    All internal nodes but root have at least a children
    All internal nodes have at most b children
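The three rules can be written down as a small Java checker. This is a sketch under simplifying assumptions: AbNode and AbTreeRules are hypothetical names, a node is treated as a leaf when it has no children, and keys are left out because only the tree shape matters for these rules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal node: only the child list matters for checking the rules. */
final class AbNode {
    final List<AbNode> children = new ArrayList<>();

    AbNode(AbNode... kids) {
        for (AbNode kid : kids) {
            children.add(kid);
        }
    }

    boolean isLeaf() {
        return children.isEmpty();
    }
}

final class AbTreeRules {
    /** Returns true if the tree rooted at root obeys the three (a,b) rules. */
    static boolean isValid(AbNode root, int a, int b) {
        return leafDepth(root, a, b, true) >= 0;
    }

    /** Returns the common distance to the leaves below node, or -1 if a rule is broken. */
    private static int leafDepth(AbNode node, int a, int b, boolean isRoot) {
        if (node.isLeaf()) {
            return 0;
        }
        int count = node.children.size();
        if (count > b) {
            return -1; // rule: at most b children
        }
        if (!isRoot && count < a) {
            return -1; // rule: at least a children, root exempt
        }
        int depth = -2; // sentinel meaning "no child examined yet"
        for (AbNode child : node.children) {
            int d = leafDepth(child, a, b, false);
            if (d < 0) {
                return -1;
            }
            if (depth == -2) {
                depth = d;
            } else if (d != depth) {
                return -1; // rule: all leaves on the same level
            }
        }
        return depth + 1;
    }
}
```

Calling AbTreeRules.isValid(root, 2, 4) checks the (2,4) tree rules; other values of a and b check other members of the family.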

Improving Locality
  For (2,4) trees, a == 2 and b == 4
    Process of splitting and merging nodes still holds
    We only vary the number of children in a Node
  Minimize paging using good sizes for a & b
    Store all of a Node’s elements in an additional dictionary
    Make sure a full Node, including dictionary and child references, fills a page
    Limit number of nearly empty pages by selecting reasonable value for a
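One way to pick b is to work backward from the page size. The arithmetic below is a rough sketch: the 4096-byte page comes from the slides, but the 16-byte header, 8-byte keys, and 8-byte child references are assumptions chosen only for illustration, and PageSizing is a hypothetical class name. Setting a to about half of b is a common choice, not something these slides prescribe.

```java
/** Rough sizing arithmetic; every byte count here is an assumption, not a measurement. */
final class PageSizing {
    /** Largest b such that b child references, b - 1 keys, and a header fit in one page. */
    static int maxChildren(int pageBytes, int headerBytes, int keyBytes, int refBytes) {
        // Solve header + b * ref + (b - 1) * key <= page for b, using integer division
        return (pageBytes - headerBytes + keyBytes) / (keyBytes + refBytes);
    }

    /** Levels needed to reach n items if every node has exactly fanout children. */
    static int levels(long n, int fanout) {
        int levels = 0;
        long reach = 1;
        while (reach < n) {
            reach *= fanout;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        int b = maxChildren(4096, 16, 8, 8); // page size from the slides; other sizes assumed
        int a = (b + 1) / 2;                 // a common choice: about half of b
        System.out.println("b = " + b + ", a = " + a);
        System.out.println("binary tree levels: " + levels(1_000_000_000L, 2));
        System.out.println("(a,b) tree levels at minimum fanout: " + levels(1_000_000_000L, a));
    }
}
```

Under these assumptions each node visited costs at most one page load, so the far smaller number of levels is what cuts down on trips to the disk.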

Insertion
  Always insert data into a leaf node
  Once inserted, check for overflow!
    Overflow means the node is now larger than allowed
  Example: insert(30)

Split In Case Of Overflow
  Split overflowing Node into 2 new Nodes
    Promote median element to the parent Node
    Divide remaining elements between the two new Nodes
  This may cause parent Node to overflow
    So must repeat the process until nothing overflows or we hit the root
    If the root node overflows, we create a new root!
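The split step can be sketched in Java as follows. The names SplitNode, SplitResult, and Splitter are hypothetical, keys are plain integers rather than Entry objects, and the code handles a single split only; inserting the promoted median into the parent and repeating up the tree is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node holding sorted keys; children stays empty for a leaf. */
final class SplitNode {
    final List<Integer> keys = new ArrayList<>();
    final List<SplitNode> children = new ArrayList<>();
}

/** Outcome of one split: two new nodes plus the key to promote to the parent. */
final class SplitResult {
    final SplitNode left;
    final int median;
    final SplitNode right;

    SplitResult(SplitNode left, int median, SplitNode right) {
        this.left = left;
        this.median = median;
        this.right = right;
    }
}

final class Splitter {
    /** Splits an overflowing node around its median key. */
    static SplitResult split(SplitNode full) {
        int mid = full.keys.size() / 2;
        SplitNode left = new SplitNode();
        SplitNode right = new SplitNode();

        // Keys before the median go left, keys after it go right
        left.keys.addAll(full.keys.subList(0, mid));
        right.keys.addAll(full.keys.subList(mid + 1, full.keys.size()));

        // An internal node with k keys has k + 1 children; divide them to match
        if (!full.children.isEmpty()) {
            left.children.addAll(full.children.subList(0, mid + 1));
            right.children.addAll(full.children.subList(mid + 1, full.children.size()));
        }
        return new SplitResult(left, full.keys.get(mid), right);
    }
}
```

For a (2,4) tree node that has overflowed to keys 10, 20, 30, 40, this sketch promotes 30 and leaves 10, 20 on the left and 40 on the right.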

Parent Overflow
  Example: insert(29)

Underflow and Fusion
  Deleting an Entry may cause underflow
  Two possible solutions depending on situation
  Example: remove(15)

Case 1: Transfer
  Has adjacent sibling with elements to spare
    Steal closest Entry from parent (& sibling’s closest child, if any)
    Parent takes sibling’s closest Entry
    We’re done
  Example: remove(10)
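A transfer from a right-hand sibling might look like the Java sketch below. XferNode and Transfer are hypothetical names, keys are plain integers, and index i is assumed to be the position of the underflowing child within its parent; the left-hand case is the mirror image and is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node holding sorted keys; children stays empty for a leaf. */
final class XferNode {
    final List<Integer> keys = new ArrayList<>();
    final List<XferNode> children = new ArrayList<>();
}

final class Transfer {
    /** Refills child i of parent using its right sibling, which must have a key to spare. */
    static void transferFromRight(XferNode parent, int i) {
        XferNode node = parent.children.get(i);
        XferNode sibling = parent.children.get(i + 1);

        // The key separating the two children moves down into the underflowing node
        node.keys.add(parent.keys.get(i));

        // The sibling's closest (smallest) key moves up to replace it in the parent
        parent.keys.set(i, sibling.keys.remove(0));

        // For internal nodes, the sibling's closest child moves across as well
        if (!sibling.children.isEmpty()) {
            node.children.add(sibling.children.remove(0));
        }
    }
}
```

Because the parent keeps the same number of keys, a transfer never propagates upward, which matches the "We’re done" on the slide.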

Case 2: Fusion
  Emptied node has siblings of minimum size
    Merge node & sibling into one
    Steal Entry from parent that was between siblings
    May propagate underflow to parent!
  Example: remove(15)
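Fusion with a right-hand sibling can be sketched in Java along these lines. FuseNode and Fusion are hypothetical names, keys are plain integers, and the method handles one merge only; checking whether the parent now underflows, and repeating if so, is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node holding sorted keys; children stays empty for a leaf. */
final class FuseNode {
    final List<Integer> keys = new ArrayList<>();
    final List<FuseNode> children = new ArrayList<>();
}

final class Fusion {
    /** Merges child i of parent with its right sibling, pulling down the key between them. */
    static void fuseWithRight(FuseNode parent, int i) {
        FuseNode node = parent.children.get(i);
        FuseNode sibling = parent.children.get(i + 1);

        // The parent key that sat between the two siblings moves down
        node.keys.add(parent.keys.remove(i));

        // Everything from the sibling is appended, keeping keys in sorted order
        node.keys.addAll(sibling.keys);
        node.children.addAll(sibling.children);

        // The sibling is now empty and is dropped from the parent
        parent.children.remove(i + 1);
    }
}
```

The parent loses one key and one child here, which is why fusion, unlike transfer, can leave the parent underflowing.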

For Next Lecture
  Look at most popular version of (a,b) Tree
    How a BTree is implemented
    Ways of reading and writing these trees to disk