Lecture 11: Binary Search Trees (Shang-Hua Teng)



Data Format: each element is a (Key, Entry) pair; the key identifies the element and the entry holds the satellite data.

Insertion and Deletion on dynamic sets
Insert(S,x) – a modifying operation that augments the set S with the element pointed to by x
Delete(S,x) – given a pointer x to an element in the set S, removes x from S. Note that this operation takes a pointer to an element, not a key value.

Querying on dynamic sets
Search(S,k) – given a set S and a key value k, returns a pointer x to an element in S such that key[x] = k, or NIL if no such element belongs to S
Minimum(S) – on a totally ordered set S, returns a pointer to the element of S with the smallest key
Maximum(S) – likewise, for the largest key
Successor(S,x) – given an element x whose key is from a totally ordered set S, returns a pointer to the next larger element in S, or NIL if x is the maximum element
Predecessor(S,x) – likewise, for the next smaller element

Trees – terminology: node, edge, root, leaf, interior node, parent, child, subtree, path. Degree? Depth/Level? Height?

Binary Tree
Node fields: – data – left child – right child – parent (optional)
(Figure: a node with Parent, Data, Left, and Right pointers; the tree is reached through its root pointer.)

Trees
Full tree of height h – all leaves present at level h, all interior nodes full. Total number of nodes in a full binary tree of height h: 2^(h+1) − 1.
Complete tree of height h – full down to level h−1, with level h filled in from left to right.

Applications - Expression Trees
Used to represent infix expressions such as (5*3)+(8-4): operators at interior nodes, operands at the leaves.

Applications - Parse Trees
Used in compilers to check syntax, e.g. for the productions
statement → if cond then statement else statement
statement → if cond then statement

Application - Game Trees

Binary Search Trees Binary Search Trees (BSTs) are an important data structure for dynamic sets In addition to satellite data, elements have: –key: an identifying field inducing a total ordering –left: pointer to a left child (may be NULL) –right: pointer to a right child (may be NULL) –p: pointer to a parent node (NULL for root)

Binary Search Trees
BST property: key[leftSubtree(x)] ≤ key[x] ≤ key[rightSubtree(x)]
Example: the tree with root F, left child B (with children A and D) and right child H (with right child K).

Traversals for a Binary Tree
Pre order – visit the node, go left, go right
In order – go left, visit the node, go right
Post order – go left, go right, visit the node
Level order / breadth first – for d = 0 to height, visit the nodes at level d

Traversal Examples
(Tree: root A with children B and C; B's left child is D, which has children G and H; C has children E and F; F has a right child I.)
Pre order: A B D G H C E F I
In order: G D H B A E C F I
Post order: G H D B E I F C A
Level order: A B C D E F G H I

Traversal Implementation
Recursive implementation of preorder:
– base case? (an empty subtree: return immediately)
– self reference: visit node; pre-order(left child); pre-order(right child)
What changes need to be made for in-order and post-order?

Inorder Tree Walk
InorderTreeWalk(x)
  if x ≠ NIL
    InorderTreeWalk(left[x]);
    print(x);
    InorderTreeWalk(right[x]);
Prints elements in sorted (increasing) order
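The walk above can be sketched in runnable form. Python stands in for the slide's pseudocode, and the `Node` class and the example tree (root F) are this sketch's own assumptions:

```python
# Minimal sketch of InorderTreeWalk, assuming a Node with key/left/right fields.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(x, out):
    """Left subtree, then the node itself, then the right subtree."""
    if x is not None:            # base case: empty subtree, do nothing
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out

# The example BST: F with children B, H; B has A, D; H has a right child K.
root = Node('F', Node('B', Node('A'), Node('D')), Node('H', right=Node('K')))
print(inorder(root, []))  # keys come out sorted: ['A', 'B', 'D', 'F', 'H', 'K']
```

Swapping the `append` before or after the recursive calls yields preorder and postorder, matching the slide's question.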

In order Tree Walk
Example: the tree with root F, left child B (children A, D) and right child H (right child K).
How long will a tree walk take? Θ(n), since each node is visited exactly once.
An inorder walk prints in monotonically increasing order. Why? By the BST property, everything printed before key[x] comes from x's left subtree and is ≤ key[x]; everything printed after comes from its right subtree and is ≥ key[x].

Evaluating an expression tree
Walk the tree in postorder. When visiting a node, use the results of its children to evaluate it.
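A sketch of this postorder evaluation for the earlier expression (5*3)+(8-4); the `ExprNode` class and `OPS` table are this example's own, not from the lecture:

```python
# Postorder evaluation of an expression tree: children first, then the operator.
class ExprNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '/': lambda a, b: a / b}

def evaluate(node):
    if node.left is None and node.right is None:  # leaf: an operand
        return node.val
    a = evaluate(node.left)      # postorder: evaluate both children first...
    b = evaluate(node.right)
    return OPS[node.val](a, b)   # ...then combine them at the operator node

tree = ExprNode('+', ExprNode('*', ExprNode(5), ExprNode(3)),
                     ExprNode('-', ExprNode(8), ExprNode(4)))
print(evaluate(tree))  # (5*3)+(8-4) = 19
```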

Operations on BSTs: Search
Given a key and a pointer to a node, returns an element with that key or NULL:
TreeSearch(x, k)
  if (x = NULL or k = key[x])
    return x;
  if (k < key[x])
    return TreeSearch(left[x], k);
  else
    return TreeSearch(right[x], k);
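The same procedure rendered as a runnable sketch; `Node`, `tree_search`, and the example tree are this example's assumptions:

```python
# TreeSearch from the slide, returning the node with key k or None.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_search(x, k):
    if x is None or k == x.key:
        return x
    return tree_search(x.left, k) if k < x.key else tree_search(x.right, k)

root = Node('F', Node('B', Node('A'), Node('D')), Node('H', right=Node('K')))
print(tree_search(root, 'D').key)  # 'D'  (path F -> B -> D)
print(tree_search(root, 'C'))      # None (path F -> B -> D runs off the tree)
```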

BST Search: Example
Search for D and C in the example tree (root F): the search for D follows F → B → D and succeeds; the search for C follows F → B → D and then hits a NULL left child, so it fails.

Operations of BSTs: Insert Adds an element x to the tree so that the binary search tree property continues to hold The basic algorithm –Like the search procedure above –Insert x in place of NULL –Use a “trailing pointer” to keep track of where you came from (like inserting into singly linked list)
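A hedged Python rendering of the trailing-pointer insertion described above; the names (`tree_insert`, `Node`) and the sample key sequence are this example's own:

```python
# TreeInsert with a trailing pointer y, as in the slide: walk down like a
# search, remembering where we came from, then hang z off y.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.p = key, None, None, None

def tree_insert(root, z):
    y, x = None, root                    # y trails one step behind x
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        return z                         # tree was empty: z is the new root
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root

root = None
for k in ['F', 'B', 'H', 'A', 'D', 'K', 'C']:
    root = tree_insert(root, Node(k))
print(root.left.right.left.key)  # 'C' lands as the left child of D
```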

BST Insert: Example
Insert C into the example tree: the search path F → B → D ends at D's NULL left child, so C becomes the left child of D.

BST Search/Insert: Running Time What is the running time of TreeSearch() or TreeInsert()? – O(h), where h = height of tree What is the height of a binary search tree? –worst case: h = O(n) when tree is just a linear string of left or right children We’ll keep all analysis in terms of h for now Later we’ll see how to maintain h = O(lg n)


Sorting With Binary Search Trees
Informal code for sorting array A of length n:
BSTSort(A)
  for i=1 to n
    TreeInsert(A[i]);
  InorderTreeWalk(root);
Argue that this is Ω(n lg n).
What will be the running time in the
– Worst case?
– Average case? (hint: remind you of anything?)
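BSTSort can be sketched end to end as follows. This is illustrative only: `insert` here recurses instead of using the slide's trailing pointer, and all names are this example's own:

```python
# BSTSort: insert every key into a BST, then read them back with an inorder walk.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, k):
    if root is None:
        return Node(k)
    if k < root.key:
        root.left = insert(root.left, k)
    else:
        root.right = insert(root.right, k)
    return root

def bst_sort(a):
    root = None
    for k in a:                  # n inserts, O(h) each: O(n h) total
        root = insert(root, k)
    out = []
    def walk(x):                 # inorder walk emits the keys in sorted order
        if x is not None:
            walk(x.left)
            out.append(x.key)
            walk(x.right)
    walk(root)
    return out

print(bst_sort([3, 1, 8, 2, 6, 7, 5]))  # [1, 2, 3, 5, 6, 7, 8]
```

On an already-sorted input the tree degenerates into a chain and the n inserts cost Θ(n²), matching the worst case discussed below.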

Sorting With BSTs
Average case analysis – it's a form of quicksort!
for i=1 to n
  TreeInsert(A[i]);
InorderTreeWalk(root);

Sorting with BSTs
The same partitions are done as with quicksort, but in a different order.
In the previous example:
– everything was compared to 3 once
– then those items < 3 were compared to 1 once
– etc.
Same comparisons as quicksort, different order!

Sorting with BSTs Since run time is proportional to the number of comparisons, same expected time as quicksort: O(n lg n) Which do you think is better, quicksort or BSTsort? Why?

Sorting with BSTs Since run time is proportional to the number of comparisons, same time as quicksort: O(n lg n) Which do you think is better, quicksort or BSTSort? Why? Answer: quicksort –Better constants –Sorts in place –Doesn’t need to build data structure

More BST Operations
BSTs are good for more than sorting. For example, they can implement a priority queue, which supports
– Insert
– Minimum
– Extract-Min

BST Operations: Minimum
How can we implement a Minimum() query? Follow left pointers from the root until reaching a node with no left child. What is the running time? O(h).

BST Operations: Successor For deletion, we will need a Successor() operation Draw Fig 13.2 What is the successor of node 3? Node 15? Node 13? What are the general rules for finding the successor of node x? (hint: two cases)

BST Operations: Successor Two cases: –x has a right subtree: successor is minimum node in right subtree –x has no right subtree: successor is first ancestor of x whose left child is also ancestor of x Intuition: As long as you move to the left up the tree, you’re visiting smaller nodes. Predecessor: similar algorithm
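The two successor cases (plus the Minimum query from before) can be sketched as below; the hand-built example tree with parent pointers and the `link` helper are this example's assumptions:

```python
# Minimum and Successor on a BST with parent pointers, following the two cases:
# a right subtree exists (take its minimum), or climb until we leave a left child.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.p = key, None, None, None

def tree_minimum(x):
    while x.left is not None:     # Minimum: keep following left pointers; O(h)
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:       # case 1: minimum node of the right subtree
        return tree_minimum(x.right)
    y = x.p                       # case 2: first ancestor whose left child
    while y is not None and x is y.right:  # is also an ancestor of x
        x, y = y, y.p
    return y                      # None when x was the maximum

def link(parent, child, side):
    setattr(parent, side, child)
    child.p = parent

f, b, h = Node('F'), Node('B'), Node('H')
a, d, k = Node('A'), Node('D'), Node('K')
link(f, b, 'left'); link(f, h, 'right')
link(b, a, 'left'); link(b, d, 'right'); link(h, k, 'right')
print(tree_minimum(f).key)    # 'A'
print(tree_successor(b).key)  # 'D': minimum of B's right subtree
print(tree_successor(d).key)  # 'F': first ancestor reached from a left child
```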

BST Operations: Delete
Deletion is a bit tricky. 3 cases:
– x has no children: remove x
– x has one child: splice out x
– x has two children: swap x with its successor, then perform case 1 or 2 to delete it
Example: delete K, or H, or B from the example tree (root F, with C inserted).

BST Operations: Delete
Why does the two-child case always reduce to the no-child or one-child case? Because when x has 2 children, its successor is the minimum of x's right subtree, which has no left child.
Could we swap x with its predecessor instead of the successor? Of course.
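The three deletion cases can be sketched as follows. This is a simplified illustration, not the slides' procedure: in the two-child case it copies the successor's key into x and then splices out the successor (which has at most one child), and the `insert`/`inorder` helpers are this example's own:

```python
# BST deletion via the three cases; parent pointers as in CLRS-style trees.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.p = key, None, None, None

def insert(root, z):
    y, x = None, root
    while x is not None:
        y, x = x, (x.left if z.key < x.key else x.right)
    z.p = y
    if y is None:
        return z
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root

def minimum(x):
    while x.left is not None:
        x = x.left
    return x

def delete(root, x):
    if x.left and x.right:        # two children: successor has no left child,
        s = minimum(x.right)      # so copy its key here and splice it out instead
        x.key = s.key
        x = s
    child = x.left or x.right     # zero or one child: splice x out
    if child is not None:
        child.p = x.p
    if x.p is None:
        return child              # x was the root
    if x is x.p.left:
        x.p.left = child
    else:
        x.p.right = child
    return root

def inorder(x):
    return inorder(x.left) + [x.key] + inorder(x.right) if x else []

root = None
for key in 'FBHAKDC':
    root = insert(root, Node(key))
root = delete(root, root)         # delete the root F (two children)
print(inorder(root))              # ['A', 'B', 'C', 'D', 'H', 'K']
```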

Next Lecture Up next: guaranteeing an O(lg n) height tree

Dictionary/Table
Keys; operation supported: search. Given a student ID, find the record (entry).

Data Format: (Key, Entry) pairs.

What if the student ID is a 9-digit social security number?
We can still sort by the IDs and apply binary search: for n students we need O(n) space and O(log n) search time.

What if new students come and current students leave?
We need a dynamic dictionary. (The yellow pages are updated only once in a while, which is not truly dynamic.)
Operations to support:
– Insert: add a new (key, entry) pair
– Delete: remove a (key, entry) pair from the dictionary
– Search: given a key, find whether it is in the dictionary, and if it is, return the data record associated with it

How should we implement a dynamic dictionary? How often are entries inserted and removed? How many of the possible key values are likely to be used? What is the likely pattern of searching for keys?

(Key, Entry) pair
For searching purposes it is best to store the key and the entry separately, even though the key's value may also appear inside the entry.
Example: key "Smith" with entry ("Smith", "124 Hawkers Lane", …); key "Yeo" with entry ("Yeo", "1 Apple Crescent", …).

Implementation 1: unsorted sequential array
An array in which (key, entry) pairs are stored consecutively in any order.
insert: add to the back of the array; O(1)
search: scan the keys one at a time, potentially all of them; O(n)
remove: find, then replace the removed element with the last one; O(n)

Implementation 2: sorted sequential array
An array in which (key, entry) pairs are stored consecutively, sorted by key.
insert: add in sorted order; O(n)
find: binary search; O(log n)
remove: find, remove, and shuffle the rest down; O(n)

Implementation 3: linked list (unsorted or sorted)
(key, entry) pairs are stored in linked nodes.
insert: add to the front; O(1), or O(n) for a sorted list
find: search through potentially all the keys, one at a time; O(n), still O(n) for a sorted list since there is no random access
remove: find, then remove by pointer alterations; O(n)

Direct Addressing
Suppose:
– the range of keys is 0..m−1 (the universe)
– keys are distinct
The idea: set up an array T[0..m−1] in which
T[i] = x if x is in the dictionary and key[x] = i
T[i] = NULL otherwise

Direct addressing is a simple technique that works well when the universe of keys is small, since each key corresponds to a unique slot.
Direct-Address-Search(T,k)
  return T[k]
Direct-Address-Insert(T,x)
  T[key[x]] ← x
Direct-Address-Delete(T,x)
  T[key[x]] ← NIL
(Figure: a direct-address table whose slots are mostly NIL, with a few occupied entries.)
O(1) time for all operations.
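A minimal sketch of such a table; the class name and the (key, entry) tuple convention are this example's own:

```python
# Direct-address table over the small universe 0..m-1: one slot per possible key.
class DirectAddressTable:
    def __init__(self, m):
        self.T = [None] * m       # T[i] holds the element with key i, or None

    def search(self, k):
        return self.T[k]          # O(1)

    def insert(self, x):
        self.T[x[0]] = x          # x is a (key, entry) pair; O(1)

    def delete(self, x):
        self.T[x[0]] = None       # O(1)

t = DirectAddressTable(10)
t.insert((3, 'alice'))
t.insert((7, 'bob'))
print(t.search(3))   # (3, 'alice')
print(t.search(5))   # None
```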

The Problem With Direct Addressing
Direct addressing works well when the range m of keys is relatively small. But what if the keys are 32-bit integers?
– Example: spell checking
– Problem 1: the direct-address table would have 2^32 entries, more than 4 billion
– Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be
Solution: map keys to a smaller range 0..m−1. This mapping is called a hash function.

Hash function
A hash function determines the slot of the hash table where the key is placed; in the previous example the hash function was the identity function. We say that a record with key k hashes into slot h(k).
(Figure: the universe U of keys and the set K of actual keys k1..k5, mapped by h into slots 0..m−1 of table T; note that h(k2) = h(k5).)

Next Problem: collision
(Figure: the same mapping; k2 and k5 hash to the same slot, h(k2) = h(k5), a collision.)

Pigeonhole Principle Parque de las Palomas San Juan, Puerto Rico

Resolving Collisions How can we solve the problem of collisions? Solution 1: chaining Solution 2: open addressing

Chaining
Chaining puts elements that hash to the same slot in a linked list.
(Figure: table T whose slots point to linked lists of colliding keys, e.g. one slot's list holding k4 then k1 and another holding k5, k2, k6; empty slots hold NIL.)

Chaining (insert at the head)
(Animation: the keys are inserted one at a time, each new key placed at the head of its slot's list, so later arrivals such as k4 end up in front of earlier ones such as k1.)

Operations
Chained-Hash-Search(T,k) – search for an element with key k in list T[h(k)] (running time proportional to the length of the list)
Chained-Hash-Insert(T,x) – insert x at the head of the list T[h(key[x])] (worst case O(1))
Chained-Hash-Delete(T,x) – delete x from the list T[h(key[x])] (for a singly linked list we may need to find the predecessor first, so the complexity is like that of search)
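These three operations can be sketched as below. Python lists stand in for the linked lists, and Python's built-in `hash` (mod m) stands in for h(k); the class and method names are this example's own:

```python
# Hash table with chaining; insertions go to the head of the slot's list.
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]        # one list ("chain") per slot

    def _h(self, k):
        return hash(k) % self.m                # stand-in for h(k)

    def insert(self, k, entry):
        self.T[self._h(k)].insert(0, (k, entry))  # head of the list: O(1)

    def search(self, k):
        for key, entry in self.T[self._h(k)]:  # time proportional to list length
            if key == k:
                return entry
        return None

    def delete(self, k):
        slot = self.T[self._h(k)]
        slot[:] = [(key, e) for key, e in slot if key != k]

t = ChainedHashTable(5)
t.insert('Smith', '124 Hawkers Lane')
t.insert('Yeo', '1 Apple Crescent')
print(t.search('Smith'))   # '124 Hawkers Lane'
t.delete('Smith')
print(t.search('Smith'))   # None
```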

Analysis of hashing with chaining
Given a hash table with m slots and n elements, the load factor α = n/m.
The worst case is when all n elements hash into the same slot: Θ(n) for searching.
The average performance depends on how well the hash function distributes elements.
Assumption (simple uniform hashing): any element is equally likely to hash into any of the m slots, and for any key, h(k) can be computed in O(1).
Two cases for a search: the search is unsuccessful, or the search is successful.

Unsuccessful search
Theorem 11.1: In a hash table in which collisions are resolved by chaining, an unsuccessful search takes Θ(1+α) on average, under the assumption of simple uniform hashing.
Proof: Simple uniform hashing means any key k is equally likely to hash into any of the m slots, so the average time to search for a given key k is the time to search one slot's list to its end. The average length of each list is α = n/m, the load factor, and computing h(k) takes O(1). Total time: Θ(1+α).

Successful Search
Theorem 11.2: In a hash table in which collisions are resolved by chaining, a successful search takes Θ(1+α) on average, under the assumption of simple uniform hashing.
Proof idea: Simple uniform hashing means any key is equally likely to hash into any of the m slots. Note that Chained-Hash-Insert inserts a new element at the front of the list, so the expected number of elements visited during a search for some element is 1 more than the number of elements in its list at the moment it was inserted.

Successful Search
Take the average over the n elements: (i−1)/m is the expected length of the list to which the i-th element was added (the expected length of each list grows as more elements are inserted). The expected cost of a successful search is therefore
(1/n) Σ_{i=1..n} (1 + (i−1)/m) = 1 + (n−1)/(2m) = 1 + α/2 − α/(2n) = Θ(1 + α).

Analysis of Chaining
Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot.
Given n keys and m slots, the load factor α = n/m is the average number of keys per slot.
What will be the average cost of an unsuccessful search for a key? O(1+α)
What will be the average cost of a successful search? O(1 + α/2) = O(1 + α)

Analysis of Chaining, Continued
So the cost of searching is O(1 + α). If the number of keys n is proportional to the number of slots in the table, then α = O(1): in other words, we can make the expected cost of searching constant by keeping the load factor constant.

Choosing A Hash Function Choosing the hash function well is crucial –Bad hash function puts all elements in same slot –A good hash function: Should distribute keys uniformly into slots Should not depend on patterns in the data Three popular methods: –Division method –Multiplication method –Universal hashing

The Division Method h(k) = k mod m –In words: hash k into a table with m slots using the slot given by the remainder of k divided by m Elements with adjacent keys hashed to different slots: good If keys bear relation to m: bad In Practice: pick table size m = prime number not too close to a power of 2 (or 10)

The Multiplication Method
For a constant A with 0 < A < 1:
h(k) = ⌊m (kA − ⌊kA⌋)⌋
i.e. multiply k by A, take the fractional part, scale by m, and take the floor.
In practice:
– choose m = 2^p
– choose A not too close to 0 or 1
– Knuth: a good choice is A = (√5 − 1)/2
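Both methods can be sketched directly; the function names, the sample key 123456, and the choice m = 701 (a prime not too close to a power of 2) are this example's assumptions:

```python
import math

def h_division(k, m):
    """Division method: h(k) = k mod m; pick m prime, away from powers of 2."""
    return k % m

def h_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * frac(k*A)), with Knuth's A."""
    frac = (k * A) % 1.0          # fractional part of k*A
    return int(m * frac)

m = 701                            # a prime not too close to a power of 2
print(h_division(123456, m))       # 80
print(h_multiplication(123456, 2**14))  # some slot in 0..16383
```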

Universal Hashing
When attempting to foil a malicious adversary, randomize the algorithm. Universal hashing: pick a hash function randomly when the algorithm begins.
– Guarantees good performance on average, no matter what keys the adversary chooses
– Needs a family of hash functions to choose from
– Think of quicksort's randomized pivot

Universal Hashing
Let H be a (finite) collection of hash functions that map a given universe U of keys into the range {0, 1, …, m−1}. H is said to be universal if, for each pair of distinct keys x, y ∈ U, the number of hash functions h ∈ H for which h(x) = h(y) is at most |H|/m.
In other words: with a random hash function from H, the chance of a collision between x and y (x ≠ y) is at most 1/m.

Universal Hashing
Theorem 11.3: Choose h from a universal family of hash functions and hash n keys into a table of m slots, with n ≤ m. Then the expected number of collisions involving a particular key x is less than 1.
Proof: For each pair of distinct keys y, z, let c_yz = 1 if y and z collide and 0 otherwise. By the definition of universality, E[c_yz] ≤ 1/m. Let C_x be the total number of collisions involving key x; then E[C_x] ≤ (n−1)/m, and since n ≤ m we have E[C_x] < 1.

A Universal Hash Function
Choose the table size m to be prime. Decompose key x into r+1 bytes, so that x = ⟨x_0, x_1, …, x_r⟩; the only requirement is that the maximum value of a byte be less than m. Let a = ⟨a_0, a_1, …, a_r⟩ denote a sequence of r+1 elements chosen randomly from {0, 1, …, m−1}, and define the corresponding hash function h_a ∈ H:
h_a(x) = (Σ_{i=0..r} a_i x_i) mod m
With this definition, H has m^(r+1) members.

A Universal Hash Function
H is a universal collection of hash functions (Theorem 11.5).
How to use:
– pick r based on m and the range of keys in U
– pick a hash function by (randomly) picking the a_i
– use that hash function on all keys

Example
Let m = 5, with each piece of the key 2 bits long (so the maximum value of a piece is 3 < m = 5).
Let a = ⟨1, 3⟩, chosen at random from {0, 1, 2, 3, 4}.
For x = 4 = ⟨01, 00⟩ (note r = 1):
h_a(4) = (1·1 + 3·0) mod 5 = 1
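The example above, made concrete; the helper names (`h_a`, `pieces`) are this sketch's own:

```python
# The universal family h_a(x) = (sum a_i * x_i) mod m, with keys split into
# r+1 chunks of 2 bits each, matching the slide's m = 5, a = <1, 3> example.
def h_a(a, xs, m):
    return sum(ai * xi for ai, xi in zip(a, xs)) % m

def pieces(x, r, bits=2):
    """Split key x into r+1 chunks of `bits` bits, most significant first."""
    return [(x >> (bits * i)) & ((1 << bits) - 1) for i in reversed(range(r + 1))]

m, r = 5, 1
a = [1, 3]                        # the randomly chosen coefficients
print(pieces(4, r))               # 4 = <01, 00> -> [1, 0]
print(h_a(a, pieces(4, r), m))    # (1*1 + 3*0) mod 5 = 1
```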

Open Addressing
Basic idea (details in Section 12.4):
– to insert: if the slot is full, try another slot, and another, until an open slot is found (probing)
– to search: follow the same sequence of probes as would be used when inserting the element; if we reach an element with the correct key, return it; if we reach a NULL pointer, the element is not in the table
Good for fixed sets (adding but no deletion); the table need not be much bigger than n.
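A minimal sketch of this idea using linear probing, one possible probe sequence among several. Deletion is omitted, matching the fixed-set note above; the class name and the use of Python's `hash` as the starting slot are this example's assumptions:

```python
# Open addressing with linear probing: on a collision, try the next slot,
# wrapping around; search follows the same probe sequence and stops at None.
class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.T = [None] * m

    def insert(self, k, entry):
        for i in range(self.m):
            j = (hash(k) + i) % self.m            # probe sequence
            if self.T[j] is None or self.T[j][0] == k:
                self.T[j] = (k, entry)
                return
        raise RuntimeError('table full')

    def search(self, k):
        for i in range(self.m):
            j = (hash(k) + i) % self.m            # same probes as insert
            if self.T[j] is None:                 # hit a NULL: not in table
                return None
            if self.T[j][0] == k:
                return self.T[j][1]
        return None

t = LinearProbingTable(8)
t.insert('a', 1)
t.insert('b', 2)
print(t.search('a'))    # 1
print(t.search('zzz'))  # None
```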