CS 206 Introduction to Computer Science II 11 / 24 / 2008 Instructor: Michael Eckmann.

Slides:



Advertisements
Similar presentations
CS 206 Introduction to Computer Science II 04 / 10 / 2009 Instructor: Michael Eckmann.
Advertisements

CS 206 Introduction to Computer Science II 03 / 23 / 2009 Instructor: Michael Eckmann.
CS 206 Introduction to Computer Science II 04 / 27 / 2009 Instructor: Michael Eckmann.
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
B-Trees. Motivation for B-Trees Index structures for large datasets cannot be stored in main memory Storing it on disk requires different approach to.
1 B trees Nodes have more than 2 children Each internal node has between k and 2k children and between k-1 and 2k-1 keys A leaf has between k-1 and 2k-1.
CS 206 Introduction to Computer Science II 11 / 19 / 2008 Instructor: Michael Eckmann.
1 Balanced Search Trees  several varieties  AVL trees  trees  Red-Black trees  B-Trees (used for searching secondary memory)  nodes are added.
CS 206 Introduction to Computer Science II 12 / 05 / 2008 Instructor: Michael Eckmann.
Liang, Introduction to Java Programming, Eighth Edition, (c) 2011 Pearson Education, Inc. All rights reserved Chapter Trees and B-Trees.
CS 206 Introduction to Computer Science II 11 / 04 / 2009 Instructor: Michael Eckmann.
Other time considerations Source: Simon Garrett Modifications by Evan Korth.
CS 206 Introduction to Computer Science II 09 / 22 / 2008 Instructor: Michael Eckmann.
CS 206 Introduction to Computer Science II 12 / 03 / 2008 Instructor: Michael Eckmann.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
CS 206 Introduction to Computer Science II 12 / 01 / 2008 Instructor: Michael Eckmann.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
CS 206 Introduction to Computer Science II 02 / 11 / 2009 Instructor: Michael Eckmann.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
Index Structures Parin Shah Id:-207. Topics Introduction Structure of B-tree Features of B-tree Applications of B-trees Insertion into B-tree Deletion.
1 B-Trees Section AVL (Adelson-Velskii and Landis) Trees AVL tree is binary search tree with balance condition –To ensure depth of the tree is.
Advanced Data Structures and Algorithms COSC-600 Lecture presentation-6.
CS 206 Introduction to Computer Science II 09 / 30 / 2009 Instructor: Michael Eckmann.
1 Multiway trees & B trees & 2_4 trees Go&Ta Chap 10.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
B-trees (Balanced Trees) A B-tree is a special kind of tree, similar to a binary tree. However, It is not a binary search tree. It is not a binary tree.
More Trees Multiway Trees and 2-4 Trees. Motivation of Multi-way Trees Main memory vs. disk ◦ Assumptions so far: ◦ We have assumed that we can store.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
Oct 29, 2001CSE 373, Autumn External Storage For large data sets, the computer will have to access the disk. Disk access can take 200,000 times longer.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
INTRODUCTION TO MULTIWAY TREES P INTRO - Binary Trees are useful for quick retrieval of items stored in the tree (using linked list) - often,
B-Trees and Red Black Trees. Binary Trees B Trees spread data all over – Fine for memory – Bad on disks.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
B-Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
CS 206 Introduction to Computer Science II 10 / 05 / 2009 Instructor: Michael Eckmann.
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
CS 206 Introduction to Computer Science II 02 / 13 / 2009 Instructor: Michael Eckmann.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
CS 206 Introduction to Computer Science II 04 / 22 / 2009 Instructor: Michael Eckmann.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
1 CS 310 – Data Structures All figures labeled with “Figure X.Y” Copyright © 2006 Pearson Addison-Wesley. All rights reserved. photo ©Oregon Scenics used.
Week 8 - Wednesday.  What did we talk about last time?  Level order traversal  BST delete  2-3 trees.
CompSci Memory Model  For this course: Assume Uniform Access Time  All elements in an array accessible with same time cost  Reality is somewhat.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
1 More Trees Trees, Red-Black Trees, B Trees.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
ITEC 2620M Introduction to Data Structures Instructor: Prof. Z. Yang Course Website: ec2620m.htm Office: TEL 3049.
Multiway Search Trees Data may not fit into main memory
B+-Trees.
B+-Trees.
B+-Trees.
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
B- Trees D. Frey with apologies to Tom Anastasio
B- Trees D. Frey with apologies to Tom Anastasio
B+-Trees (Part 1).
CSE 373: Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
B-Trees.
Presentation transcript:

CS 206 Introduction to Computer Science II 11 / 24 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS Fall 2008 Today’s Topics Questions/comments? AVL trees –implementation

An AVL tree is a BST with the added restriction that –for all nodes, the height of its left subtree is at most 1 different than the height of its right subtree. The height of an empty subtree is -1. –A node in an AVL tree contains the data item, reference to left child node reference to right child node height of the node Balanced BSTs

When we insert a node into an AVL tree we first insert as we would in a BST and update the height information in all the nodes on its path back to the root but there are several situations that we can encounter: a) the tree retains the properties of an AVL tree (that is, no node has left and right subtrees that differ in height by more than 1)‏ b) or some node, alpha, along the path back to the root has subtrees that differ in height by 2 (stop there with the updating of height information) --- there are 4 cases 1) node inserted into LST of the LC of alpha 2) node inserted into RST of the LC of alpha 3) node inserted into LST of the RC of alpha 4) node inserted into RST of the RC of alpha Note: only nodes on the path back to the root could possibly be effected by the insertion. LST = left subtree, RST = right subtree, LC = left child, RC = right child Balanced BSTs

Red/Black trees are another form of balanced binary search tree, that we will not discuss. There are others too. What I'd like to introduce now is an idea of a tree that worls well for large amounts of data, too big to all fit in memory (RAM). The data will have to partially reside in memory and the rest (the bulk of it) on disk. Why do we care? Any ideas? Other Balanced Trees

We care because disk accesses take so much longer than main memory accesses. Some numbers on the board. Leading up to B-Trees

We would rather do many many calculations in order to save us from having to access the disk. The worst case height for an AVL tree is 1.44 log n which is close to optimal (least) height for a set of nodes, but we'll see that for large amounts of data that reside on disk, it will not perform as well as we'd like. The worst case height of an optimal BST is log n. Log n is the best we can do with binary trees and that's not good enough. We want a small constant number of disk accesses. So, instead of using a Binary tree, we'll use an M-ary tree (where M>2.)‏ The least height of an M-ary tree is log M n. Example on the board for M=5 B-Trees are discussed in section 19.8 in your text. Leading up to B-Trees

B-Trees were created to take that discrepancy in processing time into account (when we have large volumes of data to process.)‏ A B-tree can guarantee only a few disk accesses. An M-ary B-Tree has the following properties –data items stored in leaves only –interior (non-leaf) nodes store a maximum of M-1 keys –root is a leaf or has from 2 to M children –all non-leaf nodes have between the ceil(M/2) up to M children –all leaves at same depth –all leaves have between the ceil(L/2) up to L children L means... (see next few slides)‏ all non leaf nodes have at least half of M children, so for large M this will guarantee that the M-ary tree will not approach anything close to a binary tree (why is that a good thing?)‏ B-Trees

Each node represents a disk block (the amount of data read in one disk access). –example: if a disk block is 8k each node holds M-1 keys and M branches (links)‏ –let's say our keys are of size 32 bytes each –and a link is 4 bytes How would we determine M? –the largest value such that a node doesn't hold more than 8k Determine L by the number of records we can store in one block. –if our records are 256 bytes, how many could we store in one block of 8k? B-Trees

Each node represents a disk block (the amount of data read in one disk access). –example: if a disk block is 8k (=8192 bytes)‏ each node holds M-1 keys and M branches (links)‏ –let's say our keys are of size 32 bytes each –and a link is 4 bytes How would we determine M? –the largest value such that a node doesn't hold more than 8k 32*(M-1) + 4*M = 8192 M = 228 (is the largest M such that we don't go over 8192)‏ Determine L by the number of records we can store in one block. –if our records are 256 bytes, how many could we store in one block of 8k? 8192/256 = 32 From the rules of our B-Tree, each leaf then has to have between 16 and 32 records and each non-leaf node (except the root) has to have at least 114 children (up to 228 children). B-Trees

Example: –From the rules of our B-Tree, each leaf then has to have between 16 and 32 records and each non-leaf node (except the root) has to have at least 114 children (up to 228 children). –Picking the worst case B-tree for our example, that is the one with the least number of children per node to give us our highest possible B-Tree for 10,000,000 records, we'll have at most 625,000 leaves which means the height of our B-tree is 4. The worst height of an M-ary tree is approx. log M/2 n. Why? B-Trees