Review of the second half: Single-Source Shortest Paths
Find a shortest path from station A to station B. Getting a correct algorithm requires serious thinking.
Adjacency-list representation: Let G = (V, E) be a graph, where V is the set of nodes (vertices) and E is the set of edges. For each u ∈ V, the adjacency list Adj[u] contains all nodes in V that are adjacent to u. (Figure: a five-node example graph (a) and its adjacency lists (b).)
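A minimal Java sketch of this representation (not from the slides; it assumes vertices are labeled 0..n−1):

import java.util.ArrayList;
import java.util.List;

class Graph {
    final int n;
    final List<List<Integer>> adj;   // adj.get(u) holds all nodes adjacent to u

    Graph(int n) {
        this.n = n;
        adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
    }

    // For an undirected graph, record the edge in both adjacency lists.
    void addEdge(int u, int v) {
        adj.get(u).add(v);
        adj.get(v).add(u);
    }
}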
Dijkstra’s Algorithm: Dijkstra’s algorithm assumes that w(e) ≥ 0 for each edge e in the graph. Maintain a set S of vertices such that for every vertex v ∈ S, d[v] = δ(s, v), i.e., the shortest path from s to v has been found. (Initial values: S = ∅, d[s] = 0 and d[v] = ∞ for every other v.) (a) Select the vertex u ∈ V−S such that d[u] = min {d[x] | x ∈ V−S}. Set S = S ∪ {u}. (d[u] = δ(s, u) at this moment! Why?) (b) For each node v adjacent to u, do RELAX(u, v, w). Repeat steps (a) and (b) until S = V.
Continue:
DIJKSTRA(G, w, s)
  INITIALIZE-SINGLE-SOURCE(G, s)
  S ← ∅
  Q ← V[G]
  while Q ≠ ∅ do
    u ← EXTRACT-MIN(Q)
    S ← S ∪ {u}
    for each vertex v ∈ Adj[u] do
      RELAX(u, v, w)
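A hedged Java sketch of this pseudocode follows. java.util.PriorityQueue has no decrease-key operation, so instead of an adaptable priority queue it uses the common lazy-deletion substitute: RELAX re-inserts the vertex with its new key, and stale entries are skipped on extraction. The graph encoding (graph[u] is a list of {v, w(u,v)} pairs) is an assumption for illustration.

import java.util.*;

class DijkstraSketch {
    // graph[u] holds edges {v, w(u,v)} with w >= 0; returns d[] of shortest-path costs.
    static int[] shortestPaths(int[][][] graph, int s) {
        int n = graph.length;
        int[] d = new int[n];
        Arrays.fill(d, Integer.MAX_VALUE);   // INITIALIZE-SINGLE-SOURCE: d[v] = infinity
        d[s] = 0;                            // d[s] = 0
        PriorityQueue<int[]> q = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        q.add(new int[]{s, 0});
        while (!q.isEmpty()) {
            int[] top = q.poll();            // u <- EXTRACT-MIN(Q)
            int u = top[0];
            if (top[1] > d[u]) continue;     // stale entry: u is already in S
            for (int[] edge : graph[u]) {    // for each vertex v in Adj[u]
                int v = edge[0], w = edge[1];
                if (d[u] + w < d[v]) {       // RELAX(u, v, w)
                    d[v] = d[u] + w;
                    q.add(new int[]{v, d[v]});
                }
            }
        }
        return d;
    }
}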
Implementation: an adaptable priority queue Q stores the vertices in V−S, keyed by their d[] values. The graph G is represented by adjacency lists, so it takes O(1) time to find an edge (u, v) in step (b).
(Figure (a): the example graph for the Single Source Shortest Path Problem, with source s, vertices u, v, x, y, and edge weights.)
(Figure (b): the first iteration.) (s, x) is the shortest path using one edge. It is also the shortest path from s to x.
Assume EXTRACT-MIN(Q) = x. (s, x) is the shortest path using one edge. Why? Since (s, x) is the shortest among all edges starting from s, it is also the shortest path from s to x. Proof: (1) Suppose that a path P: s->u->…->x is a shortest path. Then w(s, u) ≥ w(s, x), since (s, x) is the cheapest edge leaving s. (2) Since edges have non-negative weights, the total weight of path P is at least w(s, u) ≥ w(s, x). (3) So the edge (s, x) is a shortest path from s to x.
(Figure (c): the second iteration, with updated d[] values.)
Statement: Suppose S = {s, x} and d[y] = min {d[v] | v ∈ V−S}. ……(1) Then d[y] is the cost of a shortest path, i.e., either s->x->y or s->y is a shortest path from s to y. Why? If (s, y) is the shortest path, d[y] is correct. Consider the case where s->x->y is the candidate. Proof by contradiction: assume that neither s->y nor s->x->y is a shortest path, and that P1: s->yy->…->y is a shortest path with yy ∉ S. (At this moment, the algorithm has already tried the cases yy = s and yy = x.) Thus w(P1) < w(s->x->y) (from the assumption that s->x->y is not a shortest path). Since w(e) ≥ 0 for every e, w(s->yy) ≤ w(P1) < w(s->x->y). Therefore d[yy] < d[y], and (1) is not true — a contradiction.
(Figures (d), (e), (f): the remaining iterations of Dijkstra’s algorithm on the example graph; each iteration moves the vertex with minimum d[] into S and relaxes its outgoing edges.)
Theorem: Let S be the set in the algorithm and d[y] = min {d[v] | v ∈ V−S}. ……(1) Then d[y] is the cost of a shortest path. (Hard part.) Proof: Assume that (1) for any v in S, d[v] is the cost of a shortest path from s to v. We want to show that d[y] is also the cost of a shortest path after the execution of step (a). If the shortest path from s to v contains vertices in S ONLY, then d[v] is the length of the shortest path. Assume that (2) d[y] is NOT the cost of the shortest path from s to y, and that P1: s->…->yy->…->y is a shortest path, where yy ∉ S is the first node in P1 not in S. Thus w(P1) < d[y]. Since edge weights are ≥ 0, w(s->…->yy) ≤ w(P1). Thus w(s->…->yy) ≤ w(P1) < d[y]. From (1) and (2), after the execution of step (a), d[yy] ≤ w(s->…->yy) (the predecessor of yy on P1 is in S, so the edge into yy has already been relaxed). Therefore d[yy] < d[y], and (1) is not true — a contradiction.
Time complexity of Dijkstra’s Algorithm: the time complexity depends on the implementation of the adaptable priority queue.
Method 1: Use an array to store the queue. EXTRACT-MIN(Q) takes O(|V|) time, and there are |V| EXTRACT-MIN(Q) calls in total, so the time for them is O(|V|²). RELAX(u, v, w) takes O(1) time, and |E| RELAX(u, v, w) calls are required, so the time for them is O(|E|). The total time required is O(|V|² + |E|) = O(|V|²). Backtracking with the predecessor array π[] gives the shortest path in reverse order.
Method 2: The adaptable priority queue is implemented as a heap. It takes O(log |V|) time to do EXTRACT-MIN(Q) and O(log |V|) time for each RELAX(u, v, w). The total running time is O((|V| + |E|) log |V|) = O(|E| log |V|), assuming the graph is connected. When |E| is O(|V|), the second implementation is better; if |E| = O(|V|²), then the first implementation is better.
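For Method 1, EXTRACT-MIN is just an O(|V|) scan over the vertices not yet in S. A minimal sketch (the names d and inS are illustrative, not from the slides):

class ArrayExtractMin {
    // Returns the vertex u not in S with minimum d[u], or -1 if S already contains every vertex.
    static int extractMin(int[] d, boolean[] inS) {
        int u = -1;
        for (int v = 0; v < d.length; v++)
            if (!inS[v] && (u == -1 || d[v] < d[u]))
                u = v;
        return u;
    }
}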
Method 3: The priority queue is implemented as a Fibonacci heap. It takes O(log |V|) amortized time to do EXTRACT-MIN(Q) and O(1) amortized time to decrease the key value of an entry. The total running time is O(|V| log |V| + |E|). (Not required.)
Binary search tree and AVL tree
ADT for Map: A map stores elements (entries) so that they can be located quickly using keys. Each element (entry) is a key-value pair (k, v), where k is the key and v can be any object storing additional information. Each key is unique (different entries have different keys). A map M supports the following methods:
size(): Return the number of entries in M.
isEmpty(): Test whether M is empty.
get(k): If M contains an entry e with key = k, then return e, else return null.
put(k, v): If M does not contain an entry with key = k, then add (k, v) to the map and return null; else replace the entry with (k, v) and return the old value.
Methods of Map (continued):
remove(k): Remove from M the entry with key = k and return its value; if M has no entry with key = k, then return null.
keys(): Return an iterable collection containing all keys stored in M.
values(): Return an iterable collection containing all values in M.
entries(): Return an iterable collection containing all key-value entries in M.
Remark: a hash table is an implementation of Map.
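This ADT matches java.util.Map closely (java.util.Map names the collection views keySet(), values() and entrySet()); a small sketch of the return-value conventions, assuming nothing beyond the standard library:

import java.util.HashMap;
import java.util.Map;

public class MapDemo {
    public static void main(String[] args) {
        Map<Integer, String> m = new HashMap<>();
        System.out.println(m.put(7, "seven"));   // null: no previous entry with key 7
        System.out.println(m.put(7, "SEVEN"));   // "seven": put replaces and returns the old value
        System.out.println(m.get(7));            // "SEVEN"
        System.out.println(m.get(8));            // null: no entry with key 8
        System.out.println(m.remove(7));         // "SEVEN": remove returns the removed value
        System.out.println(m.isEmpty());         // true
    }
}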
ADT for Dictionary: A dictionary stores elements (entries). Each element (entry) is a key-value pair (k, v), where k is the key and v can be any object storing additional information. Keys are NOT necessarily unique. A dictionary D supports the following methods:
size(): Return the number of entries in D.
isEmpty(): Test whether D is empty.
find(k): If D contains an entry e with key = k, then return e, else return null.
findAll(k): Return an iterable collection containing all entries with key = k.
insert(k, v): Insert an entry into D, returning the entry created.
remove(e): Remove from D an entry e, returning the removed entry or null if e was not in D.
entries(): Return an iterable collection of the key-value entries in D.
Part-F1: Binary Search Trees (Figure: a binary search tree with keys 6, 2, 9, 1, 4, 8, annotated with the comparisons <, =, > made during a search.)
Search Trees: a tree data structure that can be used to implement a dictionary.
find(k): If D contains an entry e with key = k, then return e, else return null.
findAll(k): Return an iterable collection containing all entries with key = k.
insert(k, v): Insert an entry into D, returning the entry created.
remove(e): Remove from D an entry e, returning the removed entry or null if e was not in D.
Binary Search Trees: A binary search tree is a binary tree storing keys (or key-value entries) at its internal nodes and satisfying the following property: let u, v, and w be three nodes such that u is in the left subtree of v and w is in the right subtree of v; then key(u) ≤ key(v) ≤ key(w). Different nodes can have the same key. External nodes do not store items. An inorder traversal of a binary search tree visits the keys in increasing order. (Figure: the example tree with keys 6, 2, 9, 1, 4, 8.)
Search: To search for a key k, we trace a downward path starting at the root. The next node visited depends on the outcome of the comparison of k with the key of the current node. If we reach a leaf, the key is not found and we return null. Example: find(4) calls TreeSearch(4, root).
Algorithm TreeSearch(k, v)
  if T.isExternal(v)
    return v
  if k < key(v)
    return TreeSearch(k, T.left(v))
  else if k = key(v)
    return v
  else { k > key(v) }
    return TreeSearch(k, T.right(v))
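A Java sketch of TreeSearch, with one simplification: instead of the slides’ external (sentinel) nodes it uses null children, so “reaching an external node” becomes “reaching null”.

class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
}

class BST {
    // Mirrors TreeSearch(k, v): returns the node storing k, or null if k is absent.
    static Node treeSearch(int k, Node v) {
        if (v == null) return null;              // external node reached: not found
        if (k < v.key) return treeSearch(k, v.left);
        else if (k == v.key) return v;           // found the entry
        else return treeSearch(k, v.right);      // k > key(v)
    }
}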
Insertion: To perform operation insert(k, o), we search for key k (using TreeSearch).
Algorithm TreeInsert(k, x, v)
  Input: a search key k, an associated value x, and a node v of T to start with
  Output: a new node w in the subtree T(v) that stores the entry (k, x)
  w ← TreeSearch(k, v)
  if k = key(w) then
    return TreeInsert(k, x, T.left(w))   { duplicates go into the left subtree }
  T.insertAtExternal(w, (k, x))
  return w
Example: insert 5. Example: insert another 5? (Figures: the tree before and after inserting 5 at the external node w.)
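A matching insertion sketch on the same null-leaf Node class; as in the pseudocode, a duplicate key is sent into the left subtree.

    // Returns the (possibly new) root of the subtree after inserting k.
    static Node treeInsert(int k, Node v) {
        if (v == null) return new Node(k);                     // insertAtExternal
        if (k < v.key) v.left = treeInsert(k, v.left);
        else if (k == v.key) v.left = treeInsert(k, v.left);   // duplicates go left
        else v.right = treeInsert(k, v.right);
        return v;
    }

Usage: root = treeInsert(5, root); inserting another 5 descends into the left subtree of the existing 5.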
Deletion: To perform operation remove(k), we search for key k. Assume key k is in the tree, and let v be the node storing k. If node v has a leaf child w, we remove v and w from the tree with operation removeExternal(w), which removes w and its parent and replaces v with the remaining child. Example: remove 4. (Figures: the tree before and after removing 4.)
Deletion (cont.): We consider the case where the key k to be removed is stored at a node v whose children are both internal. We find the internal node w that follows v in an inorder traversal, copy key(w) into node v, and remove node w and its left child z (which must be a leaf) by means of operation removeExternal(z). Example: remove 3. (Figures: the tree before and after removing 3.)
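A removal sketch on the same null-leaf Node class, covering both cases above: a node with at most one internal child is spliced out, and a node with two internal children is overwritten with its inorder successor w, which is then removed from the right subtree.

    // Returns the root of the subtree after removing one node with key k.
    static Node treeRemove(int k, Node v) {
        if (v == null) return null;                 // k is not present
        if (k < v.key) v.left = treeRemove(k, v.left);
        else if (k > v.key) v.right = treeRemove(k, v.right);
        else if (v.left == null) return v.right;    // at most one child: splice v out
        else if (v.right == null) return v.left;
        else {
            Node w = v.right;
            while (w.left != null) w = w.left;      // w: inorder successor of v
            v.key = w.key;                          // copy key(w) into v
            v.right = treeRemove(w.key, v.right);   // remove w (it has no left child)
        }
        return v;
    }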
Deletion (Another Example): (Figures: another removal at a node v with two internal children; the key of the inorder successor w = 4 is copied into v, and w is removed.)
Performance: Consider a dictionary with n items implemented by means of a binary search tree of height h. The space used is O(n), and the methods find, insert and remove take O(h) time. The height h is O(n) in the worst case and O(log n) in the best case. Later, we will try to keep h = O(log n).
Part-F2: AVL Trees
AVL Tree Definition (§ 9.2): AVL trees are balanced. An AVL tree is a binary search tree such that for every internal node v of T, the heights of the children of v differ by at most 1. (Figure: an example AVL tree with the heights shown next to the nodes.)
Balanced nodes: An internal node is balanced if the heights of its two children differ by at most 1. Otherwise, the internal node is unbalanced.
Height of an AVL Tree. Fact: The height of an AVL tree storing n keys is O(log n). Proof: Let us bound n(h), the minimum number of internal nodes of an AVL tree of height h. We easily see that n(1) = 1 and n(2) = 2. For h > 2, a minimal AVL tree of height h contains the root node, one AVL subtree of height h−1 and another of height h−2. That is, n(h) = 1 + n(h−1) + n(h−2). Knowing n(h−1) > n(h−2), we get n(h) > 2n(h−2). So n(h) > 2n(h−2), n(h) > 4n(h−4), n(h) > 8n(h−6), …, and by induction n(h) > 2^i · n(h−2i). Taking i = ⌈h/2⌉ − 1 gives n(h) > 2^{⌈h/2⌉−1} · n(1) ≥ 2^{h/2−1}. Taking logarithms: h < 2 log n(h) + 2. Thus the height of an AVL tree is O(log n).
Insertion in an AVL Tree: Insertion is as in a binary search tree; it is always done by expanding an external node. Example: insert 54 into the AVL tree with keys 44, 17, 78, 32, 50, 88, 48, 62. (Figures: the tree before and after the insertion; after inserting the new node w, the tree is no longer balanced, and the nodes c = z, a = y, b = x lie on the path from w to the root.)
Names of important nodes:
w: the newly inserted node (the insertion itself follows the binary search tree method). The heights of some nodes in T might increase after inserting a node; those nodes must be on the path from w to the root, and no other node is affected.
z: the first node we encounter in going up from w toward the root such that z is unbalanced.
y: the child of z with the higher height. y must be an ancestor of w. (Why? Because z is unbalanced after inserting w.)
x: the child of y with the higher height. x must be an ancestor of w, and the height of the sibling of x is smaller than that of x (otherwise, the height of y could not have increased). See the figure on the previous slide.
Algorithm restructure(x):
Input: a node x of a binary search tree T that has both a parent y and a grandparent z.
Output: tree T after a trinode restructuring.
1. Let (a, b, c) be the inorder (increasing) listing of the nodes x, y, and z, and let T0, T1, T2, T3 be a left-to-right (inorder) listing of the four subtrees of x, y, and z not rooted at x, y, or z.
2. Replace the subtree rooted at z with a new subtree rooted at b.
3. Let a be the left child of b, and let T0 and T1 be the left and right subtrees of a, respectively.
4. Let c be the right child of b, and let T2 and T3 be the left and right subtrees of c, respectively.
Restructuring (as Single Rotations): (Figure: the single-rotation cases, where x, y, z are aligned, with a = x, b = y, c = z and subtrees T0–T3.)
Restructuring (as Double Rotations): (Figure: the double-rotation cases, where x lies between y and z, with a = y, b = x, c = z and subtrees T0–T3.)
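The restructure operation can be coded as these rotations. A hedged Java sketch on an illustrative AvlNode class with a height field (height(null) = −1); a double rotation is literally two single rotations, so the four figure cases reduce to these methods.

class AvlNode {
    int key, height;
    AvlNode left, right;
}

class Rotations {
    static int h(AvlNode v) { return v == null ? -1 : v.height; }
    static void update(AvlNode v) { v.height = 1 + Math.max(h(v.left), h(v.right)); }

    // Single rotation: z is too heavy on the left, so y = z.left becomes the subtree root b.
    static AvlNode rotateRight(AvlNode z) {
        AvlNode y = z.left;
        z.left = y.right;        // subtree T2 moves across
        y.right = z;
        update(z); update(y);    // recompute heights bottom-up
        return y;
    }

    // Mirror image: z is too heavy on the right.
    static AvlNode rotateLeft(AvlNode z) {
        AvlNode y = z.right;
        z.right = y.left;
        y.left = z;
        update(z); update(y);
        return y;
    }

    // Double rotation (left-right case): first rotate y = z.left leftward, then z rightward.
    static AvlNode rotateLeftRight(AvlNode z) {
        z.left = rotateLeft(z.left);
        return rotateRight(z);
    }
}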
Insertion Example, continued: (Figure: inserting 54 makes the tree unbalanced at z = 78, with y = 50 and x = 62; after restructure(x), the tree is balanced again.)
Theorem: After an insertion, one restructure operation is enough to ensure that the whole tree is balanced. Proof: look at the four cases on the two restructuring slides (single and double rotations).
Removal in an AVL Tree: Removal begins as in a binary search tree, by calling the binary search tree removal(k); this may cause an imbalance. Example: remove 32 from the AVL tree with keys 44, 17, 78, 32, 50, 88, 48, 62, 54. (Figures: the tree before and after the deletion of 32; w marks the parent of the removed node.)
Rebalancing after a Removal: Let z be the first unbalanced node encountered while travelling up the tree from w, where w is the parent of the removed node (in terms of structure, not the name). Let y be the child of z with the larger height, and let x be the child of y defined as follows: if one of the children of y is taller than the other, choose x as the taller child of y; if both children of y have the same height, select x as the child of y on the same side as y (i.e., if y is the left child of z, then x is the left child of y, and if y is the right child of z, then x is the right child of y). Note that the way to obtain x, y and z differs from insertion.
Rebalancing after a Removal: We perform restructure(x) to restore balance at z. As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached. (Figure: after removing 32, the tree is unbalanced at a = z = 44, with b = y = 62 and c = x = 78; restructure(x) makes 62 the new root.)
Unbalanced after restructuring: (Figure: a case where, after restructure(x) lowers the height of the restructured subtree from 4 to 3, an ancestor higher in the tree becomes unbalanced, so rebalancing must continue toward the root.)
Example a: Which node is w? Let us remove node 17. (Figures: the tree before and after the deletion of 17; w is the node that structurally replaces the removed node’s position.)
Rebalancing: We perform restructure(x) to restore balance at z. As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached. (Figure: after removing 17, the tree is unbalanced at a = z = 44, with b = y = 62 and c = x = 78; after restructure(x), 62 becomes the new root.)
Running Times for AVL Trees:
A single restructure is O(1), using a linked-structure binary tree.
find is O(log n): the height of the tree is O(log n), and no restructuring is needed.
insert is O(log n): the initial find is O(log n), and restructuring up the tree, maintaining heights, is O(log n).
remove is O(log n): the initial find is O(log n), and restructuring up the tree is O(log n).
Part E: Hash Tables (Figure: a small hash table mapping phone-number keys such as 025-612-0001, 981-101-0002, 451-229-0004 to cells 1–4.)
Motivations of Hash Tables: We have n items, each containing a key and a value (k, value); the key uniquely determines the item. Each key could be anything, e.g., a number in [0, 2^32], a string of length 32, etc. How do we store the n items such that, given the key k, we can find the position of the item with key = k in O(1) time? Another constraint: the space required should be O(n). Linked list? Space O(n) but time O(n). Array indexed by key? Time O(1) but the space is too big, e.g., if the key is an integer in [0, 2^32], then the space required is 2^32; if the key is a string of length 30, the space required is 26^30. Hash table: space O(n) and time O(1).
Basic ideas of Hash Tables: A hash function h maps keys of a given type, drawn from a wide range, to integers in a fixed interval [0, N−1], where N is the size of the hash table, such that if k ≠ k’ then h(k) ≠ h(k’). ….. (1) Problem: it is hard to design a function h such that (1) holds. What we can do: we can design a function h so that, with high probability, (1) holds, i.e., (1) may not always hold, but it holds for most of the n keys.
Hash Functions: A hash function h maps keys of a given type to integers in a fixed interval [0, N−1]. Example: h(x) = x mod N is a hash function for integer keys. The integer h(x) is called the hash value of key x. A hash table for a given key type consists of a hash function h and an array (called the table) of size N; the goal is to store item (k, o) at index i = h(k).
Example: We design a hash table storing entries as (HKID, Name), where HKID is a nine-digit positive integer. Our hash table uses an array of size N = 10,000 and the hash function h(x) = the last four digits of x. We need a method to handle collisions. (Figure: keys such as 025-612-0001, 981-101-0002, 451-229-0004 and 200-751-9998 stored at cells 1, 2, 4 and 9998.)
Collision Handling: Collisions occur when different elements are mapped to the same cell. Separate chaining: let each cell in the table point to a linked list of the entries that map there. Separate chaining is simple, but requires additional memory outside the table. (Figure: keys 451-229-0004 and 981-101-0004 chained in the same cell.)
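A minimal separate-chaining sketch for integer keys (class and method names are illustrative): cell h(k) points to a linked list of all {key, value} entries that hash there.

import java.util.LinkedList;

class ChainedHashTable {
    private final LinkedList<int[]>[] table;   // each int[] entry is {key, value}
    private final int N;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int N) {
        this.N = N;
        table = new LinkedList[N];
        for (int i = 0; i < N; i++) table[i] = new LinkedList<>();
    }

    private int h(int k) { return Math.floorMod(k, N); }   // h(x) = x mod N

    void put(int k, int v) {
        for (int[] e : table[h(k)])
            if (e[0] == k) { e[1] = v; return; }           // replace existing entry
        table[h(k)].add(new int[]{k, v});                  // chain a new entry
    }

    Integer get(int k) {
        for (int[] e : table[h(k)])
            if (e[0] == k) return e[1];
        return null;                                       // no entry with key k
    }
}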
Open Addressing: the colliding item is placed in a different cell of the table. Load factor: α = n/N, where n is the number of items to store and N is the size of the hash table. With open addressing, n/N ≤ 1; to get reasonable performance, keep n/N < 0.5.
Linear Probing: Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell. Each table cell inspected is referred to as a “probe”. Colliding items lump together, causing future collisions to produce longer sequences of probes. Example: h(x) = x mod 13; insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order. (Figure: the resulting table with cells 0–12.)
Search with Linear Probing: Consider a hash table A that uses linear probing. get(k): we start at cell h(k) and probe consecutive locations until one of the following occurs: an item with key k is found, an empty cell is found, or N cells have been unsuccessfully probed. To ensure efficiency, if k is not in the table we want to find an empty cell as soon as possible, so the load factor can NOT be close to 1.
Algorithm get(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return null
    else if c.key() = k
      return c.element()
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return null   { N cells unsuccessfully probed }
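The same get(k) as a Java sketch: the table is an array of {key, value} pairs, a null cell means empty, and h(x) = x mod N as in the examples.

class ProbingGet {
    static Integer get(int[][] A, int k) {
        int N = A.length;
        int i = Math.floorMod(k, N);        // i <- h(k)
        for (int p = 0; p < N; p++) {       // at most N probes
            int[] c = A[i];
            if (c == null) return null;     // empty cell: k cannot be further along
            if (c[0] == k) return c[1];     // found the entry
            i = (i + 1) % N;                // next cell, circularly
        }
        return null;                        // N cells unsuccessfully probed
    }
}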
Linear Probing example: h(x) = x mod 13; insert keys 18, 41, 22, 44, 59, 32, 31, 73, 12, 20 in this order. Search for key = 20: h(20) = 20 mod 13 = 7; go through ranks 7, 8, 9, …, 12 and then rank 0, where 20 is found. Search for key = 15: h(15) = 15 mod 13 = 2; go through ranks 2 and 3 and return null. (Figure: the resulting table with cells 0–12.)
Updates with Linear Probing: To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements. remove(k): we search for an entry with key k; if such an entry (k, o) is found, we replace it with the special item AVAILABLE and return element o; else we return null. The other methods have to be modified to skip AVAILABLE cells. put(k, o): we throw an exception if the table is full; we start at cell h(k) and probe consecutive cells until a cell i is found that is either empty or stores AVAILABLE, or N cells have been unsuccessfully probed; we then store entry (k, o) in cell i.
Updates with Linear Probing (continued):
Algorithm put(k, o)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅ or c = AVAILABLE
      A[i] ← (k, o)
      return
    else if c.key() = k
      A[i] ← (k, o)   { replace the old entry }
      return
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  { if p = N, the array is full: signal an error }
Example: h(x) = x mod 13; insert keys 18, 41, 22, 44, 59, 32, 31, 73, 20, 12 in this order. To insert 12, we look at rank 12 and then rank 0. (Figure: the resulting table with cells 0–12.)
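A Java sketch of put(k, o) with the AVAILABLE sentinel. One refinement over the pseudocode above: it remembers the first empty-or-AVAILABLE cell but keeps probing in case k already exists further along, so an existing entry is replaced rather than duplicated.

class ProbingPut {
    static final int[] AVAILABLE = new int[0];     // sentinel marking a deleted cell

    static void put(int[][] A, int k, int o) {
        int N = A.length;
        int i = Math.floorMod(k, N);               // i <- h(k)
        int firstFree = -1;                        // first reusable cell seen so far
        for (int p = 0; p < N; p++) {
            int[] c = A[i];
            if (c == null) {                       // truly empty: k is absent
                A[firstFree == -1 ? i : firstFree] = new int[]{k, o};
                return;
            } else if (c == AVAILABLE) {
                if (firstFree == -1) firstFree = i;   // remember, but keep searching
            } else if (c[0] == k) {
                c[1] = o;                          // key exists: replace the value
                return;
            }
            i = (i + 1) % N;
        }
        if (firstFree != -1) { A[firstFree] = new int[]{k, o}; return; }
        throw new IllegalStateException("the table is full");
    }
}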
A complete example: h(x) = x mod 13. Insert keys 18, 41, 22, 44, 59, 32, 31, 73, 20, 12 in this order. Then remove 20 and 12; their cells become AVAILABLE (A). get(11): check the cells after the AVAILABLE cells. Then insert keys 10 and 11: 10 lands at rank 12 and 11 at rank 0, reusing the AVAILABLE cells. The AVAILABLE cells are hard to deal with; the separate chaining approach is simpler. (Figure: the table before and after, with A marking AVAILABLE cells.)
Performance of Hashing: In the worst case, searches, insertions and removals on a hash table take O(n) time. The worst case occurs when all the keys inserted into the map collide. The load factor α = n/N affects the performance of a hash table: assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1/(1 − α). The expected running time of all the hash-table operations is O(1). In practice, hashing is very fast provided the load factor is not close to 100%. Applications of hash tables: small databases, compilers, browser caches.
Adaptable Heap: a heap that allows changing the key value of an entry in the heap. (See tutorial 11.)
Summary: ADTs: map, dictionary, priority queue, adaptable priority queue. Data structures: hash table, binary tree, AVL tree, heap, adaptable heap. The shortest path problem: algorithm plus data structures (a complete example of solving a non-trivial problem).
Final Exam: The format is similar to that of the midterm. Topics: ADTs; data structures (AVL tree, binary tree, adaptable heap, heap, linked list, stack, hash table, …); describing the algorithm/Java code for a method; given a concrete instance (e.g., a graph, a tree, etc.), performing an insertion or deletion; the shortest path problem. Six questions in the final; answer ALL.