CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann
Michael Eckmann - Skidmore College - CS Fall 2008 Today’s Topics Questions? Dijkstra's algorithm Hash tables and functions
Graphs
For a weighted graph, the shortest path can be defined as the path of minimum total weight (note: weights are non-negative), e.g. finding the lowest cost flights.
Dijkstra's algorithm solves this problem
–It attempts to minimize the weight at each step.
Dijkstra's algorithm is a greedy algorithm. That is, its strategy is to locally minimize the weight, hoping that's the best way to get the minimum weight path through the whole graph.
–Sometimes the local minimum weight is not the correct choice for the overall problem. In that case the algorithm still works; the initial guess was simply wrong and gets replaced by a later, lesser-weight update.
–Dijkstra's algorithm works in a similar way to BFS, but instead of a queue it uses a “minimum” priority queue, that is, a priority queue that returns the item whose priority is least among the items in the priority queue.
–Let's see an example on the board and come up with pseudocode for this algorithm.
Graphs
Example on the board and then pseudocode for this algorithm.
Adjacency list for the example graph (edge weights in parentheses):
0 -> 1(4), 2(2), 4(4)
1 -> 3(3), 4(3)
2 -> 1(1), 4(1)
3 -> 4(1), 5(2)
4 -> 6(2)
5 -> 6(3)
6 -> null
Dijkstra's algorithm, given a starting vertex, will find the minimum weight paths from that starting vertex to all other vertices.
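Not from the slides: one possible way to store this adjacency list in Java, using int pairs {to, weight} for the edges (the class name, variable names, and pair convention are just choices made for illustration):

import java.util.ArrayList;
import java.util.List;

public class ExampleGraph {
    public static void main(String[] args) {
        // adj.get(v) holds v's outgoing edges; each edge is an int pair {to, weight}
        List<List<int[]>> adj = new ArrayList<List<int[]>>();
        for (int v = 0; v < 7; v++) adj.add(new ArrayList<int[]>());

        adj.get(0).add(new int[] {1, 4});   // 0 -> 1(4)
        adj.get(0).add(new int[] {2, 2});   // 0 -> 2(2)
        adj.get(0).add(new int[] {4, 4});   // 0 -> 4(4)
        adj.get(1).add(new int[] {3, 3});   // 1 -> 3(3)
        adj.get(1).add(new int[] {4, 3});   // 1 -> 4(3)
        adj.get(2).add(new int[] {1, 1});   // 2 -> 1(1)
        adj.get(2).add(new int[] {4, 1});   // 2 -> 4(1)
        adj.get(3).add(new int[] {4, 1});   // 3 -> 4(1)
        adj.get(3).add(new int[] {5, 2});   // 3 -> 5(2)
        adj.get(4).add(new int[] {6, 2});   // 4 -> 6(2)
        adj.get(5).add(new int[] {6, 3});   // 5 -> 6(3)
        // vertex 6 has no outgoing edges (the "null" row above)

        System.out.println("vertex 0 has " + adj.get(0).size() + " outgoing edges");
    }
}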
We need code to handle a weighted, directed graph.
We need a “minimum” Priority Queue, that is, one that returns the item with the lowest priority on any given remove().
We need a way to set all the minimum path lengths to Integer.MAX_VALUE (the initial value we want for the path lengths, so that whenever we calculate a lesser weight path, we store that lesser weight instead).
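A minimal sketch of those pieces, assuming we use java.util.PriorityQueue (whose remove()/poll() hands back the least element) and a small helper class of my own, VertexDist, to pair a vertex with its priority:

import java.util.Arrays;
import java.util.PriorityQueue;

public class DijkstraSetup {
    // Hypothetical helper pairing a vertex with its priority (its current pathLen).
    static class VertexDist implements Comparable<VertexDist> {
        int vertex, dist;
        VertexDist(int vertex, int dist) { this.vertex = vertex; this.dist = dist; }
        public int compareTo(VertexDist other) {
            // smaller dist compares as "less", so the PQ hands back the minimum first
            return dist < other.dist ? -1 : (dist == other.dist ? 0 : 1);
        }
    }

    public static void main(String[] args) {
        // all minimum path lengths start at Integer.MAX_VALUE
        int[] pathLen = new int[7];
        Arrays.fill(pathLen, Integer.MAX_VALUE);

        // java.util.PriorityQueue removes its least element first,
        // so it already behaves as a "minimum" priority queue
        PriorityQueue<VertexDist> pq = new PriorityQueue<VertexDist>();
        pq.add(new VertexDist(0, 0));              // startV with priority 0
        pq.add(new VertexDist(2, 5));
        System.out.println(pq.remove().vertex);    // prints 0, the lowest priority item
    }
}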
Dijkstra's algorithm pseudocode (given a startV)
set all vertices to unvisited and all to have pathLen MAX
set pathLen from startV to startV to be 0
add (item=startV, priority=0) to PQ
while (PQ !empty) {
  v = remove the lowest priority vertex from PQ (do this until we get an unvisited vertex out)
  set v to visited
  for all unvisited adjacent vertices (adjV) to v {
    if ( current pathLen from startV to adjV ) > ( weight of the edge from v to adjV + pathLen from startV to v ) then {
      set pathLen from startV to adjV to be weight of the edge from v to adjV + pathLen from startV to v
      add (item=adjV, priority=pathLen just calculated) to PQ
      prev of adjV is set to v
    } // end if
  } // end for
} // end while
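Here is one possible Java translation of that pseudocode (not code handed out in class); the adjacency list uses the same {to, weight} int-pair convention sketched earlier, an int pair with an explicit Comparator plays the role of the VertexDist helper, and the method and variable names are my own choices:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class Dijkstra {

    // Returns the minimum path weight from startV to every vertex; Integer.MAX_VALUE
    // means "never reached".  adj.get(v) holds v's outgoing edges as {to, weight} pairs.
    static int[] shortestPaths(List<List<int[]>> adj, int startV) {
        int n = adj.size();
        int[] pathLen = new int[n];
        int[] prev = new int[n];
        boolean[] visited = new boolean[n];
        Arrays.fill(pathLen, Integer.MAX_VALUE);   // all pathLens start at MAX
        Arrays.fill(prev, -1);
        pathLen[startV] = 0;

        // "minimum" priority queue of {vertex, priority} pairs, smallest priority first
        PriorityQueue<int[]> pq = new PriorityQueue<int[]>(11, new Comparator<int[]>() {
            public int compare(int[] a, int[] b) {
                return a[1] < b[1] ? -1 : (a[1] == b[1] ? 0 : 1);
            }
        });
        pq.add(new int[] { startV, 0 });

        while (!pq.isEmpty()) {
            int v = pq.poll()[0];                  // lowest priority vertex
            if (visited[v]) continue;              // keep removing until we get an unvisited one
            visited[v] = true;
            for (int[] edge : adj.get(v)) {
                int adjV = edge[0], weight = edge[1];
                if (!visited[adjV] && pathLen[v] + weight < pathLen[adjV]) {
                    pathLen[adjV] = pathLen[v] + weight;      // found a lesser weight path
                    prev[adjV] = v;                           // remember how we got to adjV
                    pq.add(new int[] { adjV, pathLen[adjV] });
                }
            }
        }
        return pathLen;
    }

    public static void main(String[] args) {
        // the example graph from the earlier slide: 0->1(4),2(2),4(4)  1->3(3),4(3)  etc.
        int[][][] edges = {
            { {1,4},{2,2},{4,4} }, { {3,3},{4,3} }, { {1,1},{4,1} },
            { {4,1},{5,2} }, { {6,2} }, { {6,3} }, { }
        };
        List<List<int[]>> adj = new ArrayList<List<int[]>>();
        for (int[][] out : edges) adj.add(new ArrayList<int[]>(Arrays.asList(out)));
        System.out.println(Arrays.toString(shortestPaths(adj, 0)));  // min weights from vertex 0
    }
}

Re-adding a vertex to the PQ whenever its pathLen improves, and skipping already-visited vertices when they come off the PQ, matches the pseudocode above; it avoids needing a priority queue that supports changing an item's priority in place.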
Hashing is used to allow very efficient insertion, removal, and retrieval of items.
Consider retrieval (searching) with several structures
–To find data in an unordered linear list structure O(n)
–To find data in an ordered linear list structure O(log n) (binary search)
–To find data in a balanced BST O(log n) (note: finding an arbitrary item in a heap actually takes O(n); a heap only gives fast access to its minimum or maximum)
What orders are better than log n ?
Hashes
Hashing is used to allow
–inserting an item
–removing an item
–searching for an item
all in constant time (in the average case).
Hashing does not provide efficient sorting nor efficient finding of the minimum or maximum item etc.
Hashes
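As a concrete, library-level illustration (not from the slides), Java's own hash-based java.util.HashSet already offers these three operations in average constant time:

import java.util.HashSet;

public class HashSetDemo {
    public static void main(String[] args) {
        HashSet<String> set = new HashSet<String>();
        set.add("cs206");                           // insert: average O(1)
        set.remove("cs206");                        // remove: average O(1)
        System.out.println(set.contains("cs206"));  // search: average O(1); prints false
        // but there is no efficient way to ask a HashSet for its minimum or maximum item
    }
}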
We want to insert our items (of any type: String, int, double, etc.) into a structure that allows fast retrieval.
Terms:
–Hash Table (an array of references to objects (items))
  table_size is the number of places available to store items
–Hash Function (calculates a hash value (an integer) based on some key data about the item we are adding to the hash table)
–Hash Value (the value returned by the hash function)
  the hash value must be an integer value within [0, table_size – 1]
  this gives us the index in the hash table where we wish to store the item.
Hashes
Just to give an idea of how to insert and retrieve items into a hash table (this does not use a good hash function)
–Consider our items are simply ints
–Consider our Hash Function to be f(x) = x % n, where n is the table size (this is not a typical hash function)
–In general the hash function returns a hash value which is then modded by the size of our hash table array to compute the index where we wish to store our item; here the mod by n already puts the value in range.
–example on the board (assume n=8, add items 24, 3, 17, 31)
Then we can reverse the process to see if a particular item is in our hash table.
Hashes
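A quick sketch of that example in Java (n = 8, items 24, 3, 17, 31); this is just my own illustration of the idea, ignoring collisions entirely:

public class SimpleIntHash {
    public static void main(String[] args) {
        int n = 8;
        Integer[] table = new Integer[n];       // null means the slot is empty

        int[] items = { 24, 3, 17, 31 };
        for (int item : items) {
            int index = item % n;               // hash function f(x) = x % n
            table[index] = item;                // 24 -> 0, 3 -> 3, 17 -> 1, 31 -> 7
        }

        // Retrieval "reverses" the process: recompute the index and look there.
        int target = 17;
        boolean found = table[target % n] != null && table[target % n] == target;
        System.out.println(target + (found ? " is " : " is not ") + "in the table");
    }
}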
In our example (assume n=8, add items 24, 3, 17, 31), what if we needed to insert item 11 into our hash? There'd be a collision.
There are several strategies to handle collisions
–the chosen strategy affects how retrieval is handled too
–Open Addressing (aka Probing hash table)
  Place item in next open slot
–or
–Separate chaining
  Each array element is a list
Examples of these two techniques on the board (a rough sketch of both appears below).
Hashes
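A rough sketch of both strategies, using the collision between 3 and 11 (both hash to index 3 when n = 8); these are simplified illustrations of my own, not a full hash table class:

import java.util.LinkedList;

public class CollisionDemo {
    public static void main(String[] args) {
        int n = 8;
        int[] items = { 24, 3, 17, 31, 11 };

        // --- Open addressing (linear probing): put the item in the next open slot ---
        Integer[] probed = new Integer[n];
        for (int item : items) {
            int index = item % n;
            while (probed[index] != null)        // slot taken: probe the next one
                index = (index + 1) % n;
            probed[index] = item;                // for 11, index 3 is taken, so it lands in slot 4
        }

        // --- Separate chaining: each array element is a list of items ---
        LinkedList<Integer>[] chained = new LinkedList[n];
        for (int i = 0; i < n; i++) chained[i] = new LinkedList<Integer>();
        for (int item : items)
            chained[item % n].add(item);         // 3 and 11 both go in the list at index 3

        System.out.println("probing put 11 at index 4: " + probed[4]);
        System.out.println("chain at index 3: " + chained[3]);
    }
}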
Let's come up with a hash table to store Strings
–we'll need to come up with the size of our table
–we'll need to decide whether we will use separate chaining or open addressing
We'll need to create a hash function. (We'll talk about strategies for creating good hash functions next time.)
We'll also allow insertion and retrieval (determine if an item exists in the hash). One possible sketch appears below.
Hashes
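One possible sketch of such a class, assuming separate chaining, a table size of 101, and a deliberately simple character-sum hash function (all of these are choices we would actually settle on in class, not givens):

import java.util.LinkedList;

public class StringHashTable {
    private static final int TABLE_SIZE = 101;             // assumed table size
    private LinkedList<String>[] table;

    @SuppressWarnings("unchecked")
    public StringHashTable() {
        table = new LinkedList[TABLE_SIZE];
        for (int i = 0; i < TABLE_SIZE; i++)
            table[i] = new LinkedList<String>();
    }

    // Very simple hash function: sum the characters, then mod by the table size
    // to get an index in [0, TABLE_SIZE - 1].
    private int hash(String key) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++)
            sum += key.charAt(i);
        return sum % TABLE_SIZE;
    }

    public void insert(String item) {
        int index = hash(item);
        if (!table[index].contains(item))      // avoid duplicate entries in the chain
            table[index].add(item);
    }

    public boolean contains(String item) {
        return table[hash(item)].contains(item);
    }

    public static void main(String[] args) {
        StringHashTable h = new StringHashTable();
        h.insert("skidmore");
        h.insert("hashing");
        System.out.println(h.contains("skidmore"));   // true
        System.out.println(h.contains("graphs"));     // false
    }
}

The character-sum hash is easy to write but distributes items poorly (for example, any two anagrams collide); that is part of what makes a hash function "good", which we'll look at next time.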
Strategies for best performance (we'll go through more of this next time)
–we want items to be distributed evenly throughout the hash table and we want few collisions, so that depends on our choice of hash function and on the size of the hash table
–we also need to decide whether to use a probing hash table or a hash table where collisions are handled by adding the item to a list at that index (hash value)
  quadratic probing and double hashing are other ways of handling collisions
–if these choices are made well, retrieval time is constant on average, but the worst case is O(n)
–we also need to consider the computations needed to insert (computing the hash value)
Hashes