Search Trees: BSTs and B-Trees David Kauchak cs161 Summer 2009.

Presentation transcript:

Search Trees: BSTs and B-Trees David Kauchak cs161 Summer 2009

Administrative Midterm SCPD contacts Review session: Friday, 7/17 2:15-4:05pm in Skilling Auditorium Practice midterm Homework late/grading policy HW 2 solution HW 3 Feedback

Number guessing game I’m thinking of a number between 1 and n You are trying to guess the answer For each guess, I’ll tell you “correct”, “higher” or “lower” Describe an algorithm that minimizes the number of guesses
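One way to play the guessing game is to halve the remaining range on every guess. The sketch below assumes a hypothetical `respond` callback that answers "correct", "higher" or "lower"; the function names are illustrative and not from the slides.

```python
def guess_number(n, respond):
    """Find a secret number in [1, n] by halving the candidate range.

    respond(g) is assumed to return "correct", "higher" (the secret is
    larger than g) or "lower" (the secret is smaller than g).
    Returns (secret, number_of_guesses).
    """
    lo, hi = 1, n
    guesses = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        guesses += 1
        answer = respond(mid)
        if answer == "correct":
            return mid, guesses
        if answer == "higher":
            lo = mid + 1   # secret lies in the upper half
        else:
            hi = mid - 1   # secret lies in the lower half
    raise ValueError("responses were inconsistent")


def make_responder(secret):
    """Build a truthful responder for a fixed secret (test helper)."""
    def respond(guess):
        if guess == secret:
            return "correct"
        return "higher" if secret > guess else "lower"
    return respond
```

Each guess discards at least half of the remaining candidates, so the number of guesses is at most floor(log_2 n) + 1.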

Binary Search Trees BST – A binary tree where a parent’s value is greater than its left child and less than or equal to its right child Why not? Can be implemented with pointers or an array

Example

What else can we say? All elements to the left of a node are less than the node All elements to the right of a node are greater than or equal to the node The smallest element is the left-most element The largest element is the right-most element
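The ordering property can be checked mechanically. This is a minimal sketch assuming a pointer-based `Node` class (a hypothetical name) and the convention stated above: everything to the left is smaller, everything to the right is greater than or equal.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def is_bst(node, lo=None, hi=None):
    """True if every key k in the subtree satisfies lo <= k < hi.

    lo is an inclusive lower bound and hi an exclusive upper bound;
    None means "no bound".
    """
    if node is None:
        return True
    if lo is not None and node.key < lo:
        return False
    if hi is not None and node.key >= hi:
        return False
    # Left subtree: strictly less than node.key.
    # Right subtree: greater than or equal to node.key.
    return (is_bst(node.left, lo, node.key)
            and is_bst(node.right, node.key, hi))
```

Passing bounds down the tree checks each node against all of its ancestors, not only against its parent, which is what the "all elements to the left/right" statements require.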

Another example: the loner 12

Another example: the twig

Operations Search(T,k) – Does value k exist in tree T? Insert(T,k) – Insert value k into tree T Delete(T,x) – Delete node x from tree T Minimum(T) – What is the smallest value in the tree? Maximum(T) – What is the largest value in the tree? Successor(T,x) – What is the next element in sorted order after x? Predecessor(T,x) – What is the previous element in sorted order before x? Median(T) – Return the median of the values in tree T

Search How do we find an element?

Finding an element Search(T, 9)

Finding an element Search(T, 9) 9 > 12?

Finding an element Search(T, 13)

Finding an element ? Search(T, 13)

Iterative search
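The slide's code did not survive in this transcript. A minimal sketch of an iterative search, assuming a simple `Node` class with `key`, `left` and `right` fields (names chosen here for illustration), could look like this:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def bst_search(root, k):
    """Return a node holding key k, or None if k is not in the tree."""
    node = root
    while node is not None and node.key != k:
        # Smaller keys live on the left, greater-or-equal keys on the right.
        node = node.left if k < node.key else node.right
    return node
```

The loop follows a single root-to-leaf path, so it does at most (height + 1) comparisons.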

Is BSTSearch correct?

Running time of BST Worst case? O(height of the tree) Average case? O(height of the tree) Best case? O(1)

Height of the tree Worst case height? n-1 “the twig” Best case height? floor(log_2 n) complete (or near complete) binary tree Average case height? Depends on two things: the data how we build the tree!

Insertion

Similar to search

Insertion Similar to search Find the correct location in the tree

Insertion keeps track of the previous node we visited so when we fall off the tree, we know

Insertion add node onto the bottom of the tree
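A sketch of insertion as described: walk down as in search, remember the previous node, and attach the new node where the walk falls off the tree. The `Node` class and the choice to send duplicates to the right are assumptions that match the "less than or equal to its right child" convention.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, k):
    """Insert key k and return the (possibly new) root."""
    parent, node = None, root
    while node is not None:
        parent = node                      # remember where we came from
        node = node.left if k < node.key else node.right
    new = Node(k)
    if parent is None:                     # the tree was empty
        return new
    if k < parent.key:
        parent.left = new
    else:                                  # equal keys go to the right
        parent.right = new
    return root


def height(node):
    """Number of edges on the longest root-to-leaf path (-1 if empty)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))
```

Inserting already-sorted keys with this routine produces "the twig": every new key becomes the right child of the previous one.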

Correctness? maintain BST property

Correctness What happens if it is a duplicate?

Inserting duplicate Insert(T, 14)

Running time O(height of the tree)

Running time O(height of the tree) Why not Θ(height of the tree)?

Running time Insert(T, 15)

Height of the tree Worst case: “the twig” – When will this happen?

Height of the tree Best case: “complete” – When will this happen?

Height of the tree Average case for random data? Inserting data in random order into a BST generates a tree whose height is O(log n) on average

Visiting all nodes In sorted order

Visiting all nodes In sorted order

Visiting all nodes In sorted order 5, 8

Visiting all nodes In sorted order 5, 8, 9

Visiting all nodes In sorted order 5, 8, 9, 12

Visiting all nodes What’s happening? 5, 8, 9, 12

Visiting all nodes In sorted order 5, 8, 9, 12, 14

Visiting all nodes In sorted order 5, 8, 9, 12, 14, 20

Visiting all nodes in order
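A minimal sketch of an in-order walk, assuming a simple `Node` class and a caller-supplied `visit` function (both names are illustrative):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def inorder(node, visit):
    """Visit the left subtree, then the node itself, then the right subtree."""
    if node is not None:
        inorder(node.left, visit)
        visit(node.key)          # any operation can go here
        inorder(node.right, visit)


# Example: collect the keys of a small tree in sorted order.
tree = Node(12, Node(8, Node(5), Node(9)), Node(14, None, Node(20)))
out = []
inorder(tree, out.append)
```

Because everything in the left subtree is smaller and everything in the right subtree is at least as large, the keys come out in sorted order.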

any operation

Is it correct? Does it print out all of the nodes in sorted order?

Running time? Recurrence relation: with j nodes in the left subtree and n – j – 1 in the right subtree, T(n) = T(j) + T(n – j – 1) + Θ(1) Or: How much work is done for each call? How many calls? Θ(n)

What about?

Preorder traversal 12, 8, 5, 9, 14, 20 How is this useful? Tree copying: insert into new tree in preorder prefix notation: (2+3)*4 -> * + 2 3 4

What about?

Postorder traversal 5, 9, 8, 20, 14, 12 How is this useful? postfix notation: (2+3)*4 -> 2 3 + 4 *
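The two other walks differ from the in-order walk only in where the node itself is visited. The sketch below assumes a simple `Node` class and uses an expression tree for (2+3)*4 to illustrate the prefix and postfix use cases.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def preorder(node):
    """Node first, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


def postorder(node):
    """Left subtree, then right subtree, then the node last."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]


# Expression tree for (2+3)*4: operators are internal nodes, operands leaves.
expr = Node("*", Node("+", Node("2"), Node("3")), Node("4"))
prefix = " ".join(preorder(expr))
postfix = " ".join(postorder(expr))
```

Inserting keys into an empty BST in preorder recreates the original shape, because each parent is inserted before its descendants.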

Min/Max

Running time of min/max? O(height of the tree)
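A sketch of min and max, assuming a non-empty tree and a simple `Node` class: the smallest key is the left-most node and the largest is the right-most node.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def bst_minimum(node):
    """Follow left pointers until there is no left child."""
    while node.left is not None:
        node = node.left
    return node


def bst_maximum(node):
    """Follow right pointers until there is no right child."""
    while node.right is not None:
        node = node.right
    return node
```

Both loops walk a single path downward, which is where the O(height of the tree) bound comes from.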

Successor and predecessor Predecessor(12)? 9

Successor and predecessor Predecessor in general? largest node of all those smaller than this node rightmost element of the left subtree

Successor Successor(12)? 13

Successor Successor in general? smallest node of all those larger than this node leftmost element of the right subtree

Successor What if the node doesn’t have a right subtree? smallest node of all those larger than this node leftmost element of the right subtree 9 5

Successor What if the node doesn’t have a right subtree? node is the largest the successor is the node that has x as a predecessor 9

Successor successor is the node that has x as a predecessor 9

Successor successor is the node that has x as a predecessor 9 keep going up until we’re no longer a right child

Successor

if we have a right subtree, return the smallest of the right subtree

Successor find the node that x is the predecessor of keep going up until we’re no longer a right child
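Both cases can be sketched as follows. This assumes each node stores a `parent` pointer, which the slides imply by "going up"; the class and helper names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def insert(root, k):
    """Standard BST insert that also maintains parent pointers."""
    parent, node = None, root
    while node is not None:
        parent, node = node, (node.left if k < node.key else node.right)
    new = Node(k)
    new.parent = parent
    if parent is None:
        return new
    if k < parent.key:
        parent.left = new
    else:
        parent.right = new
    return root


def find(root, k):
    node = root
    while node is not None and node.key != k:
        node = node.left if k < node.key else node.right
    return node


def successor(x):
    # Case 1: a right subtree exists, so take its left-most node.
    if x.right is not None:
        node = x.right
        while node.left is not None:
            node = node.left
        return node
    # Case 2: climb until we are no longer a right child.
    y = x.parent
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y          # None means x held the largest key
```

In the second case the climb stops at the first ancestor reached from its left side, which is exactly the node that has x as its predecessor.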

Successor running time O(height of the tree)

Deletion Three cases!

Deletion: case 1 No children Just delete the node

Deletion: case 2 One child Splice out the node

Deletion: case 3 Two children Replace x with its successor

Deletion: case 3 Two children Will we always have a successor? Why successor? No children Larger than the left subtree Less than or equal to right subtree
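The three cases can be sketched in one recursive routine. Note the assumptions: the slides delete a given node x, whereas this sketch deletes by key and returns the new subtree root so that no parent pointers are needed; the helper names are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def bst_delete(root, k):
    """Delete one node with key k (if present) and return the new root."""
    if root is None:
        return None
    if k < root.key:
        root.left = bst_delete(root.left, k)
    elif k > root.key:
        root.right = bst_delete(root.right, k)
    else:
        # Cases 1 and 2: zero or one child, so splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 3: two children. Copy the successor's key here, then
        # delete the successor, which has no left child.
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = bst_delete(root.right, succ.key)
    return root


def inorder_keys(node):
    if node is None:
        return []
    return inorder_keys(node.left) + [node.key] + inorder_keys(node.right)
```

A node with two children always has a successor, because its right subtree is non-empty, and that successor falls under case 1 or case 2 when it is removed in turn.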

Height of the tree Most of the operations take time O(height of the tree) We said trees built from random data have height O(log n), which is asymptotically tight Two problems: We can’t always ensure random data What happens when we delete nodes and insert others after building a tree?

Balanced trees Make sure that the trees remain balanced! Red-black trees AVL trees trees … B-trees

B-tree Defined by one parameter: t Balanced n-ary tree Each node contains between t-1 and 2t-1 keys/data values (i.e. multiple data values per tree node) keys/data are stored in sorted order one exception: the root may have fewer than t-1 keys Each internal node has between t and 2t children (one more child than it has keys) the keys of a parent delimit the values of the children’s keys For example, if key i = 15 and key i+1 = 25 then child i + 1 must have keys between 15 and 25 all leaves have the same depth
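These rules can be written down as a checker. This is a sketch under stated assumptions: nodes are in-memory objects with a `keys` list and a `children` list (empty for leaves), and a non-empty root is allowed as few as one key. The class and function names are made up for illustration.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list means leaf


def _leaf_depth(node, t, is_root, lo, hi):
    """Return the depth of the leaves below node, or raise ValueError."""
    n = len(node.keys)
    min_keys = 1 if is_root else t - 1
    if not (min_keys <= n <= 2 * t - 1):
        raise ValueError("wrong number of keys")
    if node.keys != sorted(node.keys):
        raise ValueError("keys are not in sorted order")
    for k in node.keys:
        if (lo is not None and k < lo) or (hi is not None and k > hi):
            raise ValueError("key outside the range set by the parent")
    if not node.children:
        return 0
    if len(node.children) != n + 1:
        raise ValueError("internal node needs one more child than keys")
    # Child i must hold keys between bounds[i] and bounds[i + 1].
    bounds = [lo] + node.keys + [hi]
    depths = {_leaf_depth(c, t, False, bounds[i], bounds[i + 1])
              for i, c in enumerate(node.children)}
    if len(depths) != 1:
        raise ValueError("leaves are at different depths")
    return depths.pop() + 1


def is_valid_btree(root, t):
    try:
        _leaf_depth(root, t, True, None, None)
        return True
    except ValueError:
        return False
```

The child-count limits (t to 2t) follow from the key-count limits, since an internal node with n keys has exactly n + 1 children.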

Example B-tree: t = 2 [figure: example B-tree with letter keys]

Example B-tree: t = 2 Balanced: all leaves have the same depth

Example B-tree: t = 2 Each node contains between t-1 and 2t-1 keys stored in increasing order

Example B-tree: t = 2 Each node contains between t and 2t children

Example B-tree: t = 2 The keys of a parent delimit the values that a child’s keys can take

When do we use B-trees over other balanced trees? B-trees are generally an on-disk data structure Memory is limited or there is a large amount of data to be stored In the extreme, only one node is kept in memory and the rest on disk Size of the nodes is often determined by a page size on disk. Why? Databases frequently use B-trees

Notes about B-trees Because t is generally large, the height of a B-tree is usually small t = 1001 with height 2 can have over one billion values We will count both run-time as well as the number of disk accesses. Why?

Height of a B-tree B-trees have a similar feeling to BSTs We saw for BSTs that most of the operations depended on the height of the tree How can we bound the height of the tree? We know that nodes must have a minimum number of keys/data items For a tree of height h, what is the smallest number of keys?

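Minimum number of nodes at each depth? Depth 0: 1 root. Depth 1: at least 2 nodes (the root has at least 2 children). Depth 2: at least 2t nodes. In general, depth h: at least $2t^{h-1}$ nodes

Minimum number of keys/values $n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1}$, where 1 is the root's key, $t-1$ is the min. keys per node and $2t^{i-1}$ is the min. number of nodes at depth i

Minimum number of nodes so, $n \ge 1 + 2(t-1)\frac{t^{h}-1}{t-1} = 2t^{h} - 1$, which gives $h \le \log_t \frac{n+1}{2}$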
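The counting argument gives a minimum of 2t^h - 1 keys for a B-tree of height h, and a matching maximum follows from every node being full. The helper names below are illustrative; the formulas are the standard ones for this definition of a B-tree.

```python
def min_keys(t, h):
    """Fewest keys in a B-tree of height h: 1 root key plus
    (t - 1) keys in each of the 2 * t**(i - 1) nodes at depth i."""
    return 2 * t**h - 1


def max_keys(t, h):
    """Most keys in a B-tree of height h: (2t)**d nodes at depth d,
    each holding 2t - 1 keys."""
    return (2 * t) ** (h + 1) - 1


def max_height(n, t):
    """Largest height a B-tree with n >= 1 keys can have,
    i.e. floor(log_t((n + 1) / 2)), computed with integers only."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h
```

With t = 1001, `max_keys(1001, 2)` is well above one billion, and a tree holding one billion keys can never be taller than height 2.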

Searching B-Trees number of keys key[i] child[i]

Searching B-Trees make disk reads explicit

Searching B-Trees iterate through the sorted keys and find the correct location

Searching B-Trees if we find the value in this node, return it

Searching B-Trees if it’s a leaf and we didn’t find it, it’s not in the tree

Searching B-Trees Recurse on the proper child where the value is between the keys
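The annotated steps can be sketched as below. The slide's own pseudocode is not in this transcript, so this is a reconstruction under assumptions: nodes are in-memory objects with `keys` and `children` lists, and the disk read is only marked by a comment.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list means leaf


def btree_search(node, k):
    """Return (node, index) where k is stored, or None if k is absent."""
    # Iterate through the sorted keys to find the correct location.
    i = 0
    while i < len(node.keys) and k > node.keys[i]:
        i += 1
    # If we find the value in this node, return it.
    if i < len(node.keys) and k == node.keys[i]:
        return node, i
    # If it's a leaf and we didn't find it, it's not in the tree.
    if not node.children:
        return None
    # An on-disk implementation would read child i from disk here.
    return btree_search(node.children[i], k)
```

Each recursive call moves one level down, so the number of calls (and of disk reads in an on-disk version) is bounded by the height of the tree.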

Search example: R [figure: the example B-tree]

Search example: R find the correct location

Search example: R the value is not in this node

Search example: R this is not a leaf node

Search example: R find the correct location

Search example: R not in this node and this is not a leaf

Search example: R find the correct location

Search running time How many calls to BTreeSearch? O(height of the tree) = O(log_t n) Disk accesses: one for each call – O(log_t n) Computational time: O(t) keys per node with linear search, so O(t log_t n) overall Why not binary search to find key in a node?

B-Tree insert Starting at root, follow the search path down the tree If the node is full (contains 2t - 1 keys) split the keys into two nodes around the median value add the median value to the parent node If the node is a leaf, insert it into the correct spot Observations Insertions always happen in the leaves When does the height of a B-tree grow? Why do we know it’s always ok when we’re splitting a node to insert the median value into the parent?
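The procedure above (split full nodes on the way down, insert at a leaf) can be sketched as follows. This is an in-memory illustration with made-up class names and no disk I/O, not the slides' own code.

```python
import bisect


class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


class BTree:
    def __init__(self, t):
        self.t = t
        self.root = BTreeNode()

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] around its median key."""
        t = self.t
        full = parent.children[i]
        right = BTreeNode(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]          # upper t - 1 keys
        full.keys = full.keys[:t - 1]       # lower t - 1 keys
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        # The parent is never full here, so the median always fits.
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def insert(self, k):
        # A full root is split first; this is the only way the height grows.
        if len(self.root.keys) == 2 * self.t - 1:
            new_root = BTreeNode(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect.bisect_right(node.keys, k)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if k > node.keys[i]:        # median moved up; pick a side
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, k)         # insertions happen at leaves
```

Because every full node on the search path is split before descending into it, the parent of a node being split always has room for the median key.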

Insertion: t = 2 Insert the keys G C N A H E K Q M F W L T Z D P R X Y S one at a time into an empty B-tree (each node is shown in brackets)

Insertion: t = 2 Insert G: root [G]

Insertion: t = 2 Insert C: root [C G]

Insertion: t = 2 Insert N: root [C G N]

Insertion: t = 2 Insert A: Node is full, so split: root [G], children [C] [N]; A then goes into the left leaf: root [G], children [A C] [N]

Insertion: t = 2 Insert H: root [G], children [A C] [H N]

Insertion: t = 2 Insert E: root [G], children [A C E] [H N]

Insertion: t = 2 Insert K: root [G], children [A C E] [H K N]

Insertion: t = 2 Insert Q: Node [H K N] is full, so split: root [G K], children [A C E] [H] [N]; Q then goes into the right leaf: root [G K], children [A C E] [H] [N Q]

Insertion: t = 2 Insert M: root [G K], children [A C E] [H] [M N Q]

Insertion: t = 2 Insert F: Node [A C E] is full, so split: root [C G K], children [A] [E] [H] [M N Q]; F then goes next to E: root [C G K], children [A] [E F] [H] [M N Q]

Insertion: t = 2 Insert W: root is full, so split: root [G], children [C] [K], leaves [A] [E F] [H] [M N Q]

Insertion: t = 2 Insert W (continued): node [M N Q] is full, so split: root [G], children [C] [K N], leaves [A] [E F] [H] [M] [Q]; W then goes into the right-most leaf: leaves [A] [E F] [H] [M] [Q W]

Insertion: t = 2 … and so on for the remaining keys

Correctness of insert Starting at root, follow search path down the tree If the node is full (contains 2t - 1 keys), split the keys around the median value into two nodes and add the median value to the parent node If the node is a leaf, insert it into the correct spot Does it add the value in the correct spot? Follows the correct search path Inserts in correct position

Correctness of insert Starting at root, follow search path down the tree If the node is full (contains 2t - 1 keys), split the keys around the median value into two nodes and add the median value to the parent node If the node is a leaf, insert it into the correct spot Do we maintain a proper B-tree? Maintain t-1 to 2t-1 keys per node? Always split full nodes when we see them Only split full nodes All leaves at the same level? Only add nodes at leaves

Insert running time Without any splitting Similar to BTreeSearch, with one extra disk write at the leaf O(log_t n) disk accesses O(t log_t n) computation time

When a node is split How many disk accesses? 3 disk write operations 2 for the new nodes created by the split (one is reused, but must be updated) 1 for the parent node to add median value Runtime to split a node O(t) – iterating through the elements a few times since they’re already in sorted order Maximum number of nodes split for a call to insert? O(height of the tree)

Running time of insert O(log_t n) disk accesses O(t log_t n) computational costs

Deleting a node from a B-tree Similar to insertion must make sure we maintain B-tree properties (i.e. all leaves same depth and key/node restrictions) Proactively fix up any node with only t-1 keys before descending into it (borrow a key from a sibling through the parent, or merge with a sibling) O(log_t n) disk accesses O(t log_t n) computational costs

Summary of operations Search, Insertion, Deletion disk accesses: O(log_t n) computation: O(t log_t n) Max, Min disk accesses: O(log_t n) computation: O(log_t n) Tree traversal disk accesses: if 2t ~ page size: O(minimum # pages to store data) Computation: O(n)

Done

B-tree A balanced n-ary tree: Each node x contains between t-1 and 2t-1 keys (the count is denoted n(x)) stored in increasing order, denoted K_x keys are the actual data multiple data points per node

B-tree A balanced n-ary tree: Each internal node also contains n(x)+1 children (between t and 2t), denoted C_x = C_x[1], C_x[2], …, C_x[n(x)+1] The keys of a parent delimit the values that a child’s keys can take: For example, if K_x[i] = 15 and K_x[i+1] = 25 then child i + 1 must have keys between 15 and 25

B-tree A balanced n-ary tree: all leaves have the same depth