Optimal Binary Search Tree

Optimal Binary Search Tree Rytas 12/12/04

1. Preface An OBST is one special kind of advanced tree. It focuses on how to reduce the expected cost of searching a BST. It may not have the lowest height! The construction needs three tables, to record probabilities, costs, and roots.

2. Premise We have n keys (denoted k1, k2, …, kn) in sorted order (so that k1 < k2 < … < kn), and we wish to build a binary search tree from these keys. For each ki, we have a probability pi that a search will be for ki. In contrast, some searches may be for values not among the ki, so we also have n+1 "dummy keys" d0, d1, …, dn representing such values. In particular, d0 represents all values less than k1, dn represents all values greater than kn, and for i = 1, 2, …, n-1, the dummy key di represents all values between ki and ki+1. *The dummy keys are leaves (external nodes), while the data keys are internal nodes.

3. Formula & Proof A search has exactly two outcomes: success (the target is some key ki) and failure (the target falls on some dummy key di). This gives the first statement:

(i=1~n) ∑ pi + (i=0~n) ∑ qi = 1

Because we have probabilities of searches for each key and each dummy key, we can determine the expected cost of a search in a given binary search tree T. Let us assume that the actual cost of a search is the number of nodes examined, i.e., the depth in T of the node found by the search, plus 1. Then the expected cost of a search in T is (the second statement):

E[search cost in T] = (i=1~n) ∑ pi·(depthT(ki)+1) + (i=0~n) ∑ qi·(depthT(di)+1)
                    = 1 + (i=1~n) ∑ pi·depthT(ki) + (i=0~n) ∑ qi·depthT(di)

where depthT denotes a node's depth in the tree T.
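The second statement can be evaluated directly for a fixed tree. Below is a minimal Python sketch (not from the slides); it assumes the probabilities p and q and the node depths of Figure (a), a tree rooted at k2 with children k1 and k4, where k4 has children k3 and k5. These values reproduce the worked example's total.

```python
# Expected search cost of a fixed BST, computed as
#   E = sum(p_i * (depth(k_i) + 1)) + sum(q_i * (depth(d_i) + 1))
# Depths follow Figure (a): k2 is the root, k1 and k4 its children,
# k3 and k5 the children of k4; the d_i are the leaves.

p = {1: 0.15, 2: 0.10, 3: 0.05, 4: 0.10, 5: 0.20}            # key probabilities
q = {0: 0.05, 1: 0.10, 2: 0.05, 3: 0.05, 4: 0.05, 5: 0.10}   # dummy-key probabilities

key_depth   = {1: 1, 2: 0, 3: 2, 4: 1, 5: 2}
dummy_depth = {0: 2, 1: 2, 2: 3, 3: 3, 4: 3, 5: 3}

cost = sum(p[i] * (key_depth[i] + 1) for i in p) \
     + sum(q[i] * (dummy_depth[i] + 1) for i in q)
print(round(cost, 2))  # 2.8
```

Changing the depth maps to those of a different tree over the same keys gives that tree's expected cost, which is how the two figures are compared below.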

Figure (a): a BST rooted at k2, with left child k1 (leaves d0, d1) and right child k4; k4 has children k3 (leaves d2, d3) and k5 (leaves d4, d5).
Figure (b): a BST rooted at k2, with left child k1 (leaves d0, d1) and right child k5 (right leaf d5); k5 has left child k4 (right leaf d4), and k4 has left child k3 (leaves d2, d3).

 i  |   0     1     2     3     4     5
 pi |       0.15  0.10  0.05  0.10  0.20
 qi | 0.05  0.10  0.05  0.05  0.05  0.10

By Figure (a), we can calculate the expected search cost node by node:

 Node# | Depth | Probability | Cost
 k1    |   1   |    0.15     | 0.30
 k2    |   0   |    0.10     | 0.10
 k3    |   2   |    0.05     | 0.15
 k4    |   1   |    0.10     | 0.20
 k5    |   2   |    0.20     | 0.60
 d0    |   2   |    0.05     | 0.15
 d1    |   2   |    0.10     | 0.30
 d2    |   3   |    0.05     | 0.20
 d3    |   3   |    0.05     | 0.20
 d4    |   3   |    0.05     | 0.20
 d5    |   3   |    0.10     | 0.40

Cost = Probability * (Depth + 1)

And the total cost = (0.30 + 0.10 + 0.15 + 0.20 + 0.60 + 0.15 + 0.30 + 0.20 + 0.20 + 0.20 + 0.40) = 2.80. So Figure (a) costs 2.80; Figure (b), on the other hand, costs 2.75, and that tree is in fact optimal. Note that (b) is taller than (a), and that the key k5 has the greatest search probability of any key, yet the root of the OBST shown is k2. (The lowest expected cost of any BST with k5 at the root is 2.85.)

Step 1: The structure of an OBST To characterize the optimal substructure of an OBST, we start with an observation about subtrees. Consider any subtree of a BST. It must contain keys in a contiguous range ki, …, kj, for some 1 ≦ i ≦ j ≦ n. In addition, a subtree that contains keys ki, …, kj must also have as its leaves the dummy keys di-1, …, dj.

We use the optimal substructure to show that we can construct an optimal solution to the problem from optimal solutions to subproblems. Given keys ki, …, kj, one of these keys, say kr (i ≦ r ≦ j), will be the root of an optimal subtree containing these keys. The left subtree of the root kr will contain the keys ki, …, kr-1 and the dummy keys di-1, …, dr-1, and the right subtree will contain the keys kr+1, …, kj and the dummy keys dr, …, dj. As long as we examine all candidate roots kr, where i ≦ r ≦ j, and determine all optimal binary search trees containing ki, …, kr-1 and those containing kr+1, …, kj, we are guaranteed to find an OBST.

There is one detail worth noting about "empty" subtrees. Suppose that in a subtree with keys ki, …, kj, we select ki as the root. By the above argument, ki's left subtree contains the keys ki, …, ki-1; it is natural to interpret this sequence as containing no keys. Subtrees also contain dummy keys, however: this sequence has no actual keys but does contain the single dummy key di-1. Symmetrically, if we select kj as the root, then kj's right subtree contains the keys kj+1, …, kj; this right subtree contains no actual keys, but it does contain the dummy key dj.

Step 2: A recursive solution We are ready to define the value of an optimal solution recursively. We pick as our subproblem domain finding an OBST containing the keys ki, …, kj, where i ≧ 1, j ≦ n, and j ≧ i-1. (It is when j = i-1 that there are no actual keys; we have just the dummy key di-1.) Let us define e[i,j] as the expected cost of searching an OBST containing the keys ki, …, kj. Ultimately, we wish to compute e[1,n].

The easy case occurs when j = i-1. Then we have just the dummy key di-1, and the expected search cost is e[i,i-1] = qi-1. When j ≧ i, we need to select a root kr from among ki, …, kj and then make an OBST with keys ki, …, kr-1 its left subtree and an OBST with keys kr+1, …, kj its right subtree. What happens to the expected search cost of a subtree when it becomes a subtree of a node? The answer is that the depth of each node in the subtree increases by 1.

By the second statement, the expected search cost of this subtree therefore increases by the sum of all the probabilities in the subtree. For a subtree with keys ki, …, kj, let us denote this sum of probabilities as

w(i,j) = (l=i~j) ∑ pl + (l=i-1~j) ∑ ql

Thus, if kr is the root of an optimal subtree containing keys ki, …, kj, we have

e[i,j] = pr + (e[i,r-1] + w(i,r-1)) + (e[r+1,j] + w(r+1,j))

Noting that w(i,j) = w(i,r-1) + pr + w(r+1,j),

we rewrite e[i,j] as

e[i,j] = e[i,r-1] + e[r+1,j] + w(i,j)

The recursive equation above assumes that we know which node kr to use as the root. We choose the root that gives the lowest expected search cost, giving us our final recursive formulation:

e[i,j] = qi-1                                                 if j = i-1
e[i,j] = min{ e[i,r-1] + e[r+1,j] + w(i,j) : i ≦ r ≦ j }      if i ≦ j
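The recursive formulation can be transcribed almost literally as a top-down memoized function. This is a sketch in Python (not from the slides); the helper names `w` and `e` mirror the formulas, and the example probabilities are those of the worked figures.

```python
from functools import lru_cache

p = [None, 0.15, 0.10, 0.05, 0.10, 0.20]   # p[1..n], p[0] unused
q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]   # q[0..n]
n = 5

def w(i, j):
    # w(i,j): total probability mass of keys k_i..k_j and dummies d_{i-1}..d_j
    return sum(p[l] for l in range(i, j + 1)) + sum(q[l] for l in range(i - 1, j + 1))

@lru_cache(maxsize=None)
def e(i, j):
    if j == i - 1:                  # empty range: just the dummy key d_{i-1}
        return q[i - 1]
    # try every candidate root k_r and keep the cheapest split
    return min(e(i, r - 1) + e(r + 1, j) + w(i, j) for r in range(i, j + 1))

print(round(e(1, n), 2))  # 2.75, the optimal cost from the figures
```

Note that this version recomputes w(i,j) in O(n) per call; the table w[i,j] introduced in Step 3 exists precisely to avoid that recomputation.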

The e[i,j] values give the expected search costs in an OBST. To help us keep track of the structure of the OBST, we define root[i,j], for 1 ≦ i ≦ j ≦ n, to be the index r for which kr is the root of an OBST containing keys ki, …, kj.

Step 3: Computing the expected search cost of an OBST We store the e[i,j] values in a table e[1..n+1, 0..n]. The first index needs to run to n+1 rather than n because, in order to have a subtree containing only the dummy key dn, we will need to compute and store e[n+1,n]. The second index needs to start from 0 because, in order to have a subtree containing only the dummy key d0, we will need to compute and store e[1,0]. We will use only the entries e[i,j] for which j ≧ i-1. We also use a table root[i,j] for recording the root of the subtree containing keys ki, …, kj. This table uses only the entries for which 1 ≦ i ≦ j ≦ n.

We will need one other table for efficiency. Rather than compute the value of w(i,j) from scratch every time we are computing e[i,j], we store these values in a table w[1..n+1, 0..n]. For the base case, we compute w[i,i-1] = qi-1 for 1 ≦ i ≦ n+1. For j ≧ i, we compute:

w[i,j] = w[i,j-1] + pj + qj
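The weight table alone is easy to build and check: w[1,n] must equal the total probability mass, 1, by formula (1). A small Python sketch (not from the slides), using the example probabilities:

```python
# Build the weight table with the O(1) recurrence
#   w[i][j] = w[i][j-1] + p[j] + q[j],  seeded by  w[i][i-1] = q[i-1].
p = [None, 0.15, 0.10, 0.05, 0.10, 0.20]   # p[1..n]
q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]   # q[0..n]
n = 5

w = [[0.0] * (n + 1) for _ in range(n + 2)]   # rows 1..n+1, cols 0..n
for i in range(1, n + 2):
    w[i][i - 1] = q[i - 1]                    # base case: empty range
    for j in range(i, n + 1):
        w[i][j] = w[i][j - 1] + p[j] + q[j]

print(round(w[1][n], 2))  # 1.0, all probabilities accounted for
```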

OPTIMAL-BST(p, q, n)
  for i ← 1 to n+1
      e[i, i-1] ← q[i-1]
      w[i, i-1] ← q[i-1]
  for l ← 1 to n
      for i ← 1 to n-l+1
          j ← i+l-1
          e[i, j] ← ∞
          w[i, j] ← w[i, j-1] + p[j] + q[j]
          for r ← i to j
              t ← e[i, r-1] + e[r+1, j] + w[i, j]
              if t < e[i, j]
                  then e[i, j] ← t
                       root[i, j] ← r
  return e and root
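The pseudocode translates line for line into a runnable bottom-up implementation. This Python sketch (not from the slides) runs it on the example probabilities and recovers both the optimal cost 2.75 and the root k2 from the figures:

```python
import math

def optimal_bst(p, q, n):
    # Bottom-up DP following OPTIMAL-BST.
    # p[1..n]: key probabilities (p[0] unused); q[0..n]: dummy-key probabilities.
    e = [[0.0] * (n + 1) for _ in range(n + 2)]     # e[1..n+1][0..n]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]     # w[1..n+1][0..n]
    root = [[0] * (n + 1) for _ in range(n + 1)]    # root[1..n][1..n]

    for i in range(1, n + 2):                       # base cases: empty subtrees
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]
    for l in range(1, n + 1):                       # l = number of keys in the range
        for i in range(1, n - l + 2):
            j = i + l - 1
            e[i][j] = math.inf
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            for r in range(i, j + 1):               # try each candidate root k_r
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root

p = [None, 0.15, 0.10, 0.05, 0.10, 0.20]
q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]
e, root = optimal_bst(p, q, 5)
print(round(e[1][5], 2), root[1][5])  # 2.75 2  (optimal cost; k2 at the root)
```

Running time is Θ(n³): Θ(n²) table entries, each trying O(n) candidate roots.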

The tables e[i,j], w[i,j], and root[i,j] computed by OPTIMAL-BST for the example above; in particular e[1,5] = 2.75, w[1,5] = 1.00, and root[1,5] = 2. [The triangular table contents are not legible in this transcript.]

Advanced Proof-1 Summing the probability weights of all keys (data keys and dummy keys) gives the total probability. Because the probability of searching for ki is pi and the probability of landing on di is qi, we can write:

(i=1~n) ∑ pi + (i=0~n) ∑ qi = 1    …… formula (1)

Advanced Proof-2 We next focus on the probability weight of part of the tree rather than all of it. That is, we have the data ki, …, kj, with 1 ≦ i ≦ j ≦ n, so that ki, …, kj is just one part of the full tree. We can then restrict formula (1) to

w[i,j] = (l=i~j) ∑ pl + (l=i-1~j) ∑ ql

For the recursive structure we can also get another formula:

w[i,j] = w[i,j-1] + pj + qj

With this, we can construct the weight table.

Advanced Proof-3 Finally, we turn to our real topic, the cost, which we want to be optimal. Define the recursive structure's cost e[i,j], meaning the expected cost for keys ki, …, kj, 1 ≦ i ≦ j ≦ n. The tree can be divided into a root, a left subtree, and a right subtree.

Advanced Proof-4 The final cost formula:

e[i,j] = pr + e[i,r-1] + w[i,r-1] + e[r+1,j] + w[r+1,j]

Noting that pr + w[i,r-1] + w[r+1,j] = w[i,j], we get

e[i,j] = (e[i,r-1] + e[r+1,j]) + w[i,j]

taking the minimum of this expression over all candidate roots r. And we use it to construct the cost table!

P.S. In both the weight and the cost calculation, if the range is ki, …, kj with j = i-1, the sequence has no actual key, only the dummy key di-1.

Exercise

 i  |  1    2    3    4    5    6    7
 pi | 0.04 0.06 0.08 0.02 0.10 0.12 0.14
 qi | 0.05