Published by Leslie Evans. Modified over 9 years ago.
1
Chapter 6: Search Trees and More Sorting Algorithms
6.1 Binary Trees: the Bin Tree class with traversal methods
6.2 Search Trees
6.2.1 AVL Trees
6.3 HeapSort and BucketSort
6.3.1 HeapSort
6.3.2 BucketSort
2
Addendum:
A website with animations for AVL trees: http://www.seanet.com/users/arsen/avltree.html
A website with an animation for HeapSort: http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/heapsort.html
3
[Figure: tree with nodes W and Z and sub-trees a, b, c, x, y, shown before and after a new node is inserted; labels: "First rotation", "Second rotation"]
4
6.3 BucketSort
All sorting procedures we have seen so far are based on the comparison of two keys.
The general lower bound for the cost of this kind of procedure is Ω(n log n).
For certain sets of keys, sorting without comparing keys is possible, and it is more efficient!
5
Idea: use the keys to calculate the storage addresses for the elements of the sequence to be sorted (as in hashing).
Example (ideal situation, not frequent): a set of n data objects {s_0, ..., s_{n-1}} with key values 0, ..., n-1, without duplicates, given as an array S.
Sorting algorithm:
    for (int i = 0; i < n; i++) T[S[i].key] = S[i];
Cost: O(n).
6
BucketSort
A set of n data objects {s_0, ..., s_{n-1}} with key values 0, ..., m-1, given as an array S. Duplicate keys are allowed.
void BucketSort(S) {
    int i; int j;
    for (j = 0; j < m; j++) B[j] = null;                  // the buckets (lists)
    for (i = 0; i < n; i++) insert(S[i], B[S[i].key()]);  // put each object into the bucket of its key
    for (j = 0; j < m; j++) output(B[j]);                 // output the buckets in key order
}
Cost: O(n+m).
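The BucketSort pseudocode above can be sketched as runnable Java. This is only an illustration under stated assumptions: the class name BucketSortSketch and the record type Item (an integer key plus a string payload) are hypothetical and do not come from the slides, and each bucket is modelled as an ArrayList.

```java
import java.util.ArrayList;
import java.util.List;

class BucketSortSketch {
    // Hypothetical record type: an integer key in 0..m-1 plus a payload.
    static class Item {
        final int key;
        final String value;
        Item(int key, String value) { this.key = key; this.value = value; }
    }

    // Sorts s by key, assuming every key lies in 0..m-1. Cost: O(n + m).
    static List<Item> bucketSort(List<Item> s, int m) {
        List<List<Item>> b = new ArrayList<>();
        for (int j = 0; j < m; j++) b.add(new ArrayList<>());   // m empty buckets
        for (Item item : s) b.get(item.key).add(item);          // appending keeps equal keys in input order
        List<Item> out = new ArrayList<>(s.size());
        for (List<Item> bucket : b) out.addAll(bucket);         // output buckets 0..m-1 in order
        return out;
    }
}
```

Appending to the end of each bucket makes the sort stable, which matters as soon as BucketSort is used as a building block of a multi-pass sort.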
7
RadixSort
A set of n data objects {s_0, ..., s_{n-1}} with key values $0, \dots, n^k - 1$, given as an array S. Duplicate keys are allowed.
BucketSort would take $O(n + n^k)$ for this.
Making it better (RadixSort):
Write the keys in base n. The keys are then numbers with k digits.
Run the BucketSort algorithm k times, sorting the objects according to one digit per pass, starting with the least significant digit (the last one) and ending with the most significant digit (the first one), e.g. using mod and div.
Cost: O(kn).
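A minimal Java sketch of this idea, assuming plain non-negative int keys instead of data objects. The class and method names (RadixSortSketch, bucketPass, radixSort) are hypothetical; the digit is extracted with div and mod as suggested above.

```java
import java.util.ArrayList;
import java.util.List;

class RadixSortSketch {
    // One stable BucketSort pass on the digit (key / divisor) % base.
    static int[] bucketPass(int[] s, int base, int divisor) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int j = 0; j < base; j++) buckets.add(new ArrayList<>());
        for (int key : s) buckets.get((key / divisor) % base).add(key);
        int[] out = new int[s.length];
        int i = 0;
        for (List<Integer> b : buckets)
            for (int key : b) out[i++] = key;
        return out;
    }

    // k passes, from the least significant digit to the most significant one.
    static int[] radixSort(int[] s, int base, int k) {
        int divisor = 1;
        for (int pass = 0; pass < k; pass++) {
            s = bucketPass(s, base, divisor);
            divisor *= base;
        }
        return s;
    }
}
```

The approach only works because each pass is stable: objects that agree on the current digit keep the order established by the earlier, less significant passes.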
8
Example for RadixSort: n = 10, k = 2.
Sequence to be sorted: 64, 17, 3, 99, 79, 78, 19, 13, 67, 34.
Step 1: insert the keys into buckets according to the last digit:
    bucket 3: 3, 13
    bucket 4: 64, 34
    bucket 7: 17, 67
    bucket 8: 78
    bucket 9: 99, 79, 19
    (buckets 0, 1, 2, 5 and 6 are empty)
After that, output them in bucket order: 3, 13, 64, 34, 17, 67, 78, 99, 79, 19
9
Continuation RadixSort
Step 2: take the sequence obtained from step 1: 3, 13, 64, 34, 17, 67, 78, 99, 79, 19
Insert the keys into buckets according to the penultimate digit:
    bucket 0: 3
    bucket 1: 13, 17, 19
    bucket 3: 34
    bucket 6: 64, 67
    bucket 7: 78, 79
    bucket 9: 99
    (buckets 2, 4, 5 and 8 are empty)
Output them in bucket order: 3, 13, 17, 19, 34, 64, 67, 78, 79, 99.
10
Generalizing: digits in different positions can have different value ranges.
Example: Date = (year, month, day) with ranges ([0..9999], [1..12], [1..31]).
BucketSort the dates first according to day, then month, then year.
11
General remarks about binary trees
They are recursive structures, so many algorithms on them are better (shorter, more elegant) expressed recursively.
In most cases this means executing the algorithm recursively on one or both sub-trees and processing the root node (the order may vary according to the task).
One of the base cases (where there is no further recursive call) is when the pointer to the root of the (sub-)tree is null (empty tree).
To improve efficiency, we can avoid the recursive call when there is no child.
12
Example 1: search
Node search(int x, Node y) {
    // returns a pointer to the node containing x,
    // or null if x is not in the tree
    if (y == null) return null;
    if (y.key == x) return y;
    if (y.key > x) return search(x, y.left);
    return search(x, y.right);
}
13
Example 2: count
int count(Node y) {
    // returns the number of nodes in the tree
    if (y == null) return 0;
    int a = count(y.right);
    int b = count(y.left);
    return a + b + 1;
    // shorter: return count(y.right) + count(y.left) + 1;
}
14
Example 3: check if search tree
boolean isBST(Node y) {
    // returns true if the tree is a binary search tree
    if (y == null) return true;
    if (y.right == null && y.left == null) return true;
    if (!isBST(y.left) || !isBST(y.right)) return false;
    // every key on the left must be smaller, every key on the right bigger;
    // a missing child imposes no condition
    if (y.left != null && max(y.left) >= y.key) return false;
    if (y.right != null && min(y.right) <= y.key) return false;
    return true;
}
15
Example 3: more efficiently
class Resp {
    int min, max;
    boolean ok;
    Resp(int x, int y, boolean z) { min = x; max = y; ok = z; }
}

Resp isBST(Node y) {
    // returns the smallest key, the biggest key and the check result in one pass
    if (y == null) return new Resp(0, 0, true);
    Resp a = null, b = null;
    Resp c = new Resp(y.key, y.key, true);
    if (y.left != null) a = isBST(y.left);
    if (y.right != null) b = isBST(y.right);
    if (a != null && b != null) {
        c.min = a.min; c.max = b.max;
        c.ok = a.ok && b.ok && a.max < y.key && b.min > y.key;
    }
    if (a != null && b == null) {
        c.min = a.min;
        c.ok = a.ok && a.max < y.key;
    }
    if (a == null && b != null) {
        c.max = b.max;
        c.ok = b.ok && b.min > y.key;
    }
    return c;
}
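For experimenting with the one-pass check, here is a compact self-contained variant of the same min/max idea. It is a sketch, not the slides' code: the wrapper class BstCheckSketch, the Node constructor and the split into check and isBST are assumptions made so that the example compiles on its own.

```java
class BstCheckSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static class Resp {
        int min, max;
        boolean ok;
        Resp(int min, int max, boolean ok) { this.min = min; this.max = max; this.ok = ok; }
    }

    // Precondition: y != null. Visits every node exactly once.
    static Resp check(Node y) {
        Resp c = new Resp(y.key, y.key, true);
        if (y.left != null) {
            Resp a = check(y.left);
            c.min = a.min;
            c.ok = c.ok && a.ok && a.max < y.key;   // all left keys must be smaller
        }
        if (y.right != null) {
            Resp b = check(y.right);
            c.max = b.max;
            c.ok = c.ok && b.ok && b.min > y.key;   // all right keys must be bigger
        }
        return c;
    }

    static boolean isBST(Node root) {
        return root == null || check(root).ok;      // the empty tree counts as a search tree
    }
}
```

Because each call returns the minimum and maximum of its sub-tree together with the verdict, no separate min and max traversals are needed, so the whole check costs O(n) for a tree with n nodes.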
16
Chapter 7: Selected Algorithms
7.1 External Search
17
7.1 External Search
The algorithms we have seen so far work well when all data are stored in primary storage (RAM), where access is fast.
Large data sets are frequently stored on secondary storage devices (hard disk), where access is slower (about 100-1000 times slower).
Access is always to a complete block (page) of data (4096 bytes), which is loaded into RAM.
For efficiency: keep the number of page accesses low!
18
For external search: a variant of search trees with 1 node = 1 page.
Multiway (multiple-way) search trees!
19
Definition (multiway search trees)
An empty tree is a multiway search tree with the empty set of keys {}.
Let $T_0, \dots, T_n$ be multiway search trees with keys taken from a common key set S, and let $k_1, \dots, k_n$ be a sequence of keys with $k_1 < \dots < k_n$. Then the sequence
    $T_0\ k_1\ T_1\ k_2\ T_2\ k_3 \dots k_n\ T_n$
is a multiway search tree if and only if:
    for all keys x in $T_0$: $x < k_1$
    for $i = 1, \dots, n-1$ and all keys x in $T_i$: $k_i < x < k_{i+1}$
    for all keys x in $T_n$: $k_n < x$
20
B-Tree
Definition 7.1.2: A B-tree of order m is a multiway search tree with the following characteristics:
$1 \le \#(\text{keys in the root}) \le 2m$ and $m \le \#(\text{keys in the node}) \le 2m$ for all other nodes.
All paths from the root to a leaf have the same length.
Each internal node (not a leaf) which has s keys has exactly s+1 children.
21
Example: a B-tree of order 2:
22
Assessment of B-trees
The minimal possible number of nodes in a B-tree of order m and height h:
The root of the minimal tree has only one key and two children; all other nodes have m keys.
Number of nodes in each of the two sub-trees of the root:
    $1 + (m+1) + (m+1)^2 + \dots + (m+1)^{h-1} = \frac{(m+1)^h - 1}{m}$
Altogether, for the number of keys n in a B-tree of height h:
    $n \ge 2(m+1)^h - 1$
Thus the following holds for each B-tree of height h with n keys:
    $h \le \log_{m+1}((n+1)/2)$
23
Example
The following holds for each B-tree of height h with n keys: $h \le \log_{m+1}((n+1)/2)$.
Example: with a page size of 1 KByte and 8 bytes for each entry plus pointer, we choose m = 63. For an amount of data of n = 1 000 000 we have
    $h \le \log_{64}(500\,000.5) < 4$
and therefore $h_{\max} = 3$.
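The bound can be checked numerically. The following sketch uses a hypothetical helper maxHeight (not part of the slides) that finds the largest h satisfying $2(m+1)^h - 1 \le n$ with integer arithmetic, which avoids rounding problems with floating-point logarithms. It assumes n is at least 1 and small enough that the powers fit into a long.

```java
class BTreeHeightSketch {
    // Largest height h such that a B-tree of order m with n keys can exist,
    // i.e. the largest h with 2 * (m + 1)^h - 1 <= n.
    static int maxHeight(int m, long n) {
        int h = 0;
        long power = m + 1;              // holds (m + 1)^(h + 1)
        while (2 * power - 1 <= n) {     // is height h + 1 still possible?
            h++;
            power *= (m + 1);
        }
        return h;
    }
}
```

For m = 63 and n = 1 000 000 the loop stops at h = 3, because $2 \cdot 64^3 - 1 = 524\,287 \le 1\,000\,000$ while $2 \cdot 64^4 - 1$ is already larger than n.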
24
Algorithms for searching keys in a B-tree
Algorithm search(r, x)
// search for key x in the tree with root node r
// global variable p
in r, search for the first key y >= x, or until there are no more keys
if y == x { stop search; p = r; found }
else if r is a leaf { stop search; p = r; not found }
else if not past the last key { search(pointer to the node before y, x) }
else { search(last pointer, x) }
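A minimal Java sketch of this search, under a few assumptions that are not in the slides: a node stores its keys in a sorted int array, a leaf has children == null, and instead of setting a global variable p the method returns the node that contains x (or null when x is not found). The names BTreeSearchSketch and Node are hypothetical.

```java
class BTreeSearchSketch {
    static class Node {
        int[] keys;        // sorted keys of this page
        Node[] children;   // null for a leaf, otherwise keys.length + 1 pointers
        Node(int[] keys, Node[] children) { this.keys = keys; this.children = children; }
    }

    // Returns the node containing x, or null if x is not in the tree.
    static Node search(Node r, int x) {
        while (r != null) {
            int i = 0;
            while (i < r.keys.length && r.keys[i] < x) i++;     // first key >= x
            if (i < r.keys.length && r.keys[i] == x) return r;  // found
            if (r.children == null) return null;                // leaf reached: not found
            r = r.children[i];  // pointer before keys[i], or the last pointer if past the last key
        }
        return null;
    }
}
```

Each loop iteration touches exactly one node, so the number of page accesses is bounded by the height of the tree plus one. Inside a node a binary search could replace the linear scan, since the keys are sorted.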
25
Algorithms for inserting and deleting keys in a B-tree
Algorithm insert(r, x)
// insert key x into the tree with root r
search for x in the tree with root r;
if x was not found {
    let p be the leaf where the search stopped;
    insert x at the right position in p;
    if p now has 2m+1 keys { overflow(p) }
}
26
Algorithm Split (1)
Algorithm overflow(p) = split(p)
Algorithm split(p), first case: p has a parent q.
Divide the overflowed node. The middle key goes to the parent.
Remark: the splitting may propagate up to the root, in which case the height of the tree increases by one.
27
Algorithm Split (2)
Algorithm split(p), second case: p is the root.
Divide the overflowed node. Open a new level above, containing a new root with the middle key (the new root has one key).
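Insertion with both split cases can be sketched in Java as follows. This is one possible formulation, not the slides' own: instead of parent pointers it uses recursion, and a split is reported to the caller through a small hypothetical Split object (middle key plus new right node). The names BTreeInsertSketch, Node and Split are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class BTreeInsertSketch {
    static class Node {
        List<Integer> keys = new ArrayList<>();
        List<Node> children = new ArrayList<>();   // empty for a leaf
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Result of splitting a node: the middle key and the new right half.
    static class Split { int middle; Node right; }

    final int m;                 // order of the B-tree
    Node root = new Node();
    BTreeInsertSketch(int m) { this.m = m; }

    void insert(int x) {
        Split s = insert(root, x);
        if (s != null) {         // second case: the root was split, open a new level
            Node newRoot = new Node();
            newRoot.keys.add(s.middle);
            newRoot.children.add(root);
            newRoot.children.add(s.right);
            root = newRoot;
        }
    }

    private Split insert(Node p, int x) {
        int i = 0;
        while (i < p.keys.size() && p.keys.get(i) < x) i++;
        if (i < p.keys.size() && p.keys.get(i) == x) return null;   // x already present
        if (p.isLeaf()) {
            p.keys.add(i, x);
        } else {
            Split s = insert(p.children.get(i), x);
            if (s == null) return null;
            p.keys.add(i, s.middle);             // first case: middle key goes to the parent
            p.children.add(i + 1, s.right);
        }
        return p.keys.size() > 2 * m ? split(p) : null;   // overflow at 2m+1 keys
    }

    private Split split(Node p) {
        // p has 2m+1 keys: the first m stay in p, key number m moves up, the last m go right.
        Split s = new Split();
        s.middle = p.keys.get(m);
        s.right = new Node();
        s.right.keys.addAll(p.keys.subList(m + 1, p.keys.size()));
        p.keys.subList(m, p.keys.size()).clear();
        if (!p.isLeaf()) {
            s.right.children.addAll(p.children.subList(m + 1, p.children.size()));
            p.children.subList(m + 1, p.children.size()).clear();
        }
        return s;
    }
}
```

Inserting the keys 1 to 5 into an empty tree of order 2 overflows the root leaf on the fifth key; the split leaves a root with the single key 3 and two leaves holding 1, 2 and 4, 5.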
28
Algorithm delete(r, x)
// delete key x from the tree with root r
search for x in the tree with root r;
if x was found {
    if x is in an internal node {
        exchange x with the next bigger key x' in the tree
        // if x is in an internal node, there must be
        // at least one bigger key in the tree;
        // this key is in a leaf!
    }
    let p be the leaf containing x;
    erase x from p;
    if p is not the root r {
        if p has m-1 keys { underflow(p) }
    }
}
29
Algorithm underflow(p)
if p has a neighboring node p' with s > m keys { balance(p, p') }
else // because p cannot be the root, p must have a neighbor with m keys
{ let p' be the neighbor with m keys; merge(p, p') }
30
Algorithm balance(p, p')
// balance node p with its neighbor p' (s > m, r = (m+s)/2 - m)
31
Algorithm merge(p, p')
// merge node p with its neighbor p'; q denotes the parent of p and p'
perform the following operation (figure):
afterwards:
if (q <> root) and (q has m-1 keys) { underflow(q) }
else if (q = root) and (q is empty) { free q; let root point to p }
32
Recursion
If a merge is required while performing underflow, underflow may have to be performed again one level up.
This process may be repeated up to the root.
33
Example: B-tree of order 2 (m = 2)
34
Cost
Let m be the order of the B-tree and n the number of keys.
Cost of search, insert and delete: $O(h) = O(\log_{m+1}((n+1)/2)) = O(\log_{m+1} n)$.
35
Remark: B-trees can also be used as an internal storage structure, especially B-trees of order 1 (then there are only one or two keys in each node, so no elaborate search inside the nodes is needed).
Cost of search, insert and delete: O(log n).
36
Remark: storage utilization
Over 50%.
Reason: the condition $\frac{1}{2}k \le \#(\text{keys in the node}) \le k$ holds for all nodes other than the root ($k = 2m$).
37
An even higher storage utilization (about 66%) can be achieved with the following condition:
$\frac{2}{3}k \le \#(\text{keys in the node}) \le k$ for all nodes other than the root and its children.
This can be reached by
1) modified balancing, also when inserting
2) splitting only when two neighbors are full.
Drawback: more frequent reorganization is necessary when inserting and deleting.
38
7.2 External Sorting
Problem: sorting a large amount of data which, as in external searching, is stored in blocks (pages).
Efficiency: the number of page accesses should be kept low!
Strategy: a sorting algorithm which processes the data sequentially (no frequent page exchanges): MergeSort!
39
Start: n data items in a file $g_1$, divided into pages of size b:
Page 1: $s_1, \dots, s_b$
Page 2: $s_{b+1}, \dots, s_{2b}$
...
Page k: $s_{(k-1)b+1}, \dots, s_n$   ($k = \lceil n/b \rceil$)
When processed sequentially: only k page accesses instead of n.
40
Variation of MergeSort for external sorting
MergeSort is a divide-and-conquer algorithm. For external sorting: no divide step, only merging.
Definition: run := ordered subsequence within a file.
Strategy: merge runs of increasing length until everything is sorted.
41
Algorithm
Step 1: From the sequence in the input file $g_1$, generate "initial runs" and distribute them over two files $f_1$ and $f_2$, with the same number of runs (±1) in each (there are many strategies for this; see later).
Now: use four files $f_1$, $f_2$, $g_1$, $g_2$.
42
Step 2 (main step):
While the number of runs > 1, repeat: {
    Merge pairs of runs from $f_1$ and $f_2$ into runs of double size, writing them alternately to $g_1$ and $g_2$, until there are no more runs in $f_1$ and $f_2$.
    Merge pairs of runs from $g_1$ and $g_2$ into runs of double size, writing them alternately to $f_1$ and $f_2$, until there are no more runs in $g_1$ and $g_2$.
}
Each loop = two phases
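The two steps can be simulated in memory with a short Java sketch. Everything here is an assumption made for illustration: the "files" are lists of runs, the initial runs are produced by sorting chunks of a fixed length runLength, and one loop iteration performs a single phase, with the roles of the f and g files swapped afterwards (so two iterations correspond to one loop of the algorithm above). The names BalancedMergeSketch, merge and sort are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BalancedMergeSketch {
    // Merge two sorted runs sequentially into one sorted run.
    static List<Integer> merge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size())
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    static List<Integer> sort(List<Integer> input, int runLength) {
        // Step 1: sorted initial runs, distributed alternately to f1 and f2.
        List<List<Integer>> f1 = new ArrayList<>(), f2 = new ArrayList<>();
        for (int start = 0, r = 0; start < input.size(); start += runLength, r++) {
            int end = Math.min(start + runLength, input.size());
            List<Integer> run = new ArrayList<>(input.subList(start, end));
            Collections.sort(run);
            (r % 2 == 0 ? f1 : f2).add(run);
        }
        // Step 2: merge pairs of runs until only one run is left.
        while (f1.size() + f2.size() > 1) {
            List<List<Integer>> g1 = new ArrayList<>(), g2 = new ArrayList<>();
            for (int r = 0; r < f1.size(); r++) {
                // f1 never has fewer runs than f2; a missing partner is an empty run
                List<Integer> other = r < f2.size() ? f2.get(r) : new ArrayList<Integer>();
                (r % 2 == 0 ? g1 : g2).add(merge(f1.get(r), other));
            }
            f1 = g1;   // the g files become the input of the next phase
            f2 = g2;
        }
        return f1.isEmpty() ? new ArrayList<Integer>() : f1.get(0);
    }
}
```

Every phase reads and writes each element once and strictly sequentially, and the number of runs is roughly halved per phase, so about log2 of the number of initial runs phases are needed.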