Expected Running Times and Randomized Algorithms
Instructor: Neelima Gupta (ngupta@cs.du.ac.in)
Expected Running Time of Insertion Sort
x_1, x_2, ..., x_{i-1}, x_i, ..., x_n
For i = 2 to n: insert the i-th element x_i into the partially sorted list x_1, x_2, ..., x_{i-1} (at, say, the r-th position).
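The loop described above can be sketched in Python (a minimal sketch; the function and variable names are ours):

```python
def insertion_sort(a):
    """Sort list a in place by inserting each element into the sorted prefix."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift elements of the sorted prefix a[0..i-1] that are larger than key
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key  # insert key at its position in the sorted prefix
    return a
```

Each iteration of the outer loop corresponds to inserting the i-th element of the slide's notation; the comparisons made by the `while` condition are the quantity X_i analyzed below.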
Expected Running Time of Insertion Sort
Let X_i be the random variable representing the number of comparisons required to insert the i-th element of the input array into the sorted subarray of the first i-1 elements. X_i takes the values x_{i1}, x_{i2}, ..., x_{ii}.
E(X_i) = Σ_j x_{ij} p(x_{ij}), 1 ≤ j ≤ i,
where E(X_i) is the expected value of X_i, and p(x_{ij}) is the probability of inserting x_i at the j-th position.
Expected Running Time of Insertion Sort
x_1, x_2, ..., x_{i-1}, x_i, ..., x_n
How many comparisons does it take to insert the i-th element at the j-th position?
Position    # of comparisons
i           1
i-1         2
i-2         3
...         ...
2           i-1
1           i-1
Note: positions 2 and 1 both require i-1 comparisons. Why? To insert an element at position 2 we must compare it with every element down to the first; after that last comparison we already know which of the two comes first and which comes second, so placing it at position 1 needs no extra comparison.
Thus, E(X_i) = (1/i) { Σ_{k=1}^{i-1} k + (i-1) } = (i-1)/2 + (i-1)/i,
where 1/i is the probability of inserting at the j-th position among the i possible positions.
For n elements,
E(X_2 + ... + X_n) = Σ_{i=2}^{n} E(X_i) = Σ_{i=2}^{n} (1/i) { Σ_{k=1}^{i-1} k + (i-1) } = n(n-1)/4 + (n - H_n),
where H_n is the n-th harmonic number.
Therefore the average case of insertion sort takes Θ(n²).
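The closed form can be checked by brute force for small n. This sketch (helper names are ours) counts the comparisons made by the standard linear-scan insertion over all permutations and compares the average with the formula:

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def comparisons(a):
    """Return the number of key comparisons insertion sort makes on list a."""
    a = list(a)
    count = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            count += 1            # one comparison: a[j] > key
            if a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = key
    return count

n = 4
total = sum(comparisons(p) for p in permutations(range(n)))
# E(X_i) = (1/i) * ( sum_{k=1}^{i-1} k + (i-1) ), summed over i = 2..n
expected = sum(Fraction(sum(range(1, i)) + (i - 1), i) for i in range(2, n + 1))
assert Fraction(total, factorial(n)) == expected  # 59/12 for n = 4
```

The inner `while` charges exactly one comparison per test of `a[j] > key`, matching the position-cost table on the previous slide (positions 1 and 2 both cost i-1).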
Quick-Sort
Pick the first item from the array and call it the pivot.
Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
Use recursion to sort the two partitions.
[ partition 1: items ≤ pivot | pivot | partition 2: items > pivot ]
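The three steps above can be sketched as follows (first-element pivot, as on this slide; names are ours and the out-of-place partition is a simplification of the usual in-place one):

```python
def quicksort(a):
    """Sort by picking the first item as pivot and partitioning around it."""
    if len(a) <= 1:
        return a
    pivot = a[0]
    left = [x for x in a[1:] if x <= pivot]   # partition 1: items <= pivot
    right = [x for x in a[1:] if x > pivot]   # partition 2: items > pivot
    return quicksort(left) + [pivot] + quicksort(right)
```

On an already-sorted input every split is 0 : n-1, which is exactly the worst case analyzed on the next slide.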
Quicksort: Expected Number of Comparisons
Partition may generate any of the splits (0 : n-1, 1 : n-2, 2 : n-3, ..., n-2 : 1, n-1 : 0), each with probability 1/n. If T(n) is the expected running time,
T(n) = (n-1) + (1/n) Σ_{k=0}^{n-1} [ T(k) + T(n-1-k) ],
which solves to T(n) = O(n log n).
Randomized Quick-Sort
Pick a random element from the array and call it the pivot.
Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
Use recursion to sort the two partitions.
[ partition 1: items ≤ pivot | pivot | partition 2: items > pivot ]
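Only the pivot choice differs from the deterministic version; a minimal sketch (names are ours):

```python
import random

def randomized_quicksort(a):
    """Quicksort with a uniformly random pivot: no fixed input is worst case."""
    if len(a) <= 1:
        return a
    pivot = a[random.randrange(len(a))]       # random pivot, uniform over a
    left = [x for x in a if x < pivot]        # items < pivot
    equal = [x for x in a if x == pivot]      # items == pivot
    right = [x for x in a if x > pivot]       # items > pivot
    return randomized_quicksort(left) + equal + randomized_quicksort(right)
```

The output is the same for every sequence of coin tosses; only the running time is random.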
Remarks
Not much different from Q-sort, except that earlier the algorithm was deterministic and only the bounds were probabilistic. Here the algorithm itself is also randomized: we pick the pivot element at random. Notice that there is no difference in how the algorithm behaves from that point onwards. In the earlier case we could identify a worst-case input; here no input is worst case.
Randomized Select
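The slide gives only the title; the algorithm it usually refers to is quickselect, which finds the i-th smallest element using a random pivot, in expected O(n) time. A sketch under that assumption (names are ours):

```python
import random

def randomized_select(a, i):
    """Return the i-th smallest element of a (i is 1-based), expected O(n)."""
    pivot = a[random.randrange(len(a))]       # random pivot, as in quicksort
    left = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    if i <= len(left):                        # answer lies among items < pivot
        return randomized_select(left, i)
    if i <= len(left) + len(equal):           # answer is the pivot itself
        return pivot
    # answer lies among items > pivot; adjust the rank accordingly
    return randomized_select([x for x in a if x > pivot],
                             i - len(left) - len(equal))
```

Unlike randomized quicksort, only one side of the partition is recursed into, which is what brings the expected cost down from O(n log n) to O(n).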
Randomized Algorithms
A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution:
b ← random()
if b = 0
  do A ...
else  { i.e., b = 1 }
  do B ...
Its running time depends on the outcomes of the coin tosses.
Assumptions
The coins are unbiased, and the coin tosses are independent.
The worst-case running time of a randomized algorithm may be large but occurs with very low probability (e.g., it occurs when all the coin tosses come up “heads”).
Monte Carlo Algorithms
Running times are guaranteed, but the output may not be completely correct. The probability of error is low.
Las Vegas Algorithms
The output is guaranteed to be correct; bounds on running times hold with high probability.
What type of algorithm is randomized Qsort?
Why Expected Running Times?
Markov’s inequality: P( X > k·E(X) ) < 1/k.
I.e., the probability that the algorithm takes more than 2·E(X) time is less than 1/2, and the probability that it takes more than 10·E(X) time is less than 1/10. This is the reason why Qsort does well in practice.
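Markov's inequality can be checked exactly on any small distribution; the fair die below is our own example, not from the slides:

```python
from fractions import Fraction

# A fair six-sided die: P(X = v) = 1/6 for v = 1..6, so E(X) = 7/2.
dist = {v: Fraction(1, 6) for v in range(1, 7)}
mean = sum(v * p for v, p in dist.items())

def tail(threshold):
    """P(X >= threshold), computed exactly over the distribution."""
    return sum(p for v, p in dist.items() if v >= threshold)

# Markov: P(X >= k * E(X)) <= 1/k for every k > 0.
for k in [1, Fraction(3, 2), 2, 10]:
    assert tail(k * mean) <= Fraction(1) / k
```

The bound is weak (for k = 2 here the true tail is 0, not 1/2), which is why the stronger bounds on the next slide matter.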
Markov’s bound: P( X > k·μ ) < 1/k, where k is a constant and μ = E(X).
Chernoff’s bound: P( X > 2μ ) < 1/2.
A stronger result: P( X > k·μ ) < 1/n^k, where k is a constant.
RANDOMLY BUILT BST
A binary search tree can be built randomly: a randomly selected key becomes the root, playing the same role as the pivot in quicksort. If Rank(x) = i, the keys smaller than x go into the left subtree and the keys greater than x go into the right subtree.
HEIGHT OF THE TREE
X_i : the height of the tree rooted at a node with rank i.
Y_i : the exponential height of the tree, Y_i = 2^{X_i}.
H = max{H1, H2} + 1, where
H1 : height of the left subtree,
H2 : height of the right subtree,
H : height of the tree rooted at x.
Y = 2^H = 2·max{2^{H1}, 2^{H2}}.
Expected value of the exponential height of a tree with n nodes:
E(EH(T(n))) = (2/n) Σ_{k=0}^{n-1} max{ EH(T(k)), EH(T(n-1-k)) } = O(n³),
which gives E(H(T(n))) = O(log n).
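The step from the cubic bound on the expected exponential height to the logarithmic bound on the height itself uses Jensen's inequality (this intermediate step is implicit in the slide); a sketch:

```latex
% 2^x is convex, so by Jensen's inequality 2^{E[H]} \le E[2^H] = E[Y].
2^{E[H]} \;\le\; E[Y] \;=\; O(n^3)
\quad\Longrightarrow\quad
E[H] \;\le\; \log_2 E[Y] \;=\; \log_2 O(n^3) \;=\; O(\log n).
```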
Skip List: Dictionary as ADT
A skip list is a data structure that can be used to maintain a dictionary. Given n keys, we insert them into a linked list that has -∞ as its first node and ∞ as its last node; this is the initial list S0.
Then we flip a coin for each element: if a tail occurs, we also insert the element into the next list S_{i+1}, and so on, until only one element is left in some list S_i.
Example keys between -∞ and ∞: 5, 9, 25, 30, 35, 38, 40.
(head)                                              (tail)
S3:  -∞ ──────────────────────────────────────────── ∞
S2:  -∞ ──────────────── 30 ───────────────────────── ∞
S1:  -∞ ───── 9 ──────── 30 ──────── 38 ───────────── ∞
S0:  -∞ ─ 5 ─ 9 ─ 25 ─── 30 ─ 35 ─── 38 ─ 40 ──────── ∞
Operations on a skip list. Each node has two pointers: right and down.
1. Drop down
This operation is performed when after(p) > key. The pointer p moves down to the same position in the list one level below.
(Diagram: after the drop down, p moves from node 9 in S1 to node 9 in S0.)
2. Scan forward
This operation is performed when after(p) < key. The pointer p moves to the next element in the same list.
E.g., if key = 28 and p is at 9, then after(9) < 28, so we scan forward.
(Diagram: p advances along S0 from 9 to 25.)
Searching for a key k
Keep a pointer p at the first node of the highest list S_h.
repeat:
  if after(p) == k, return after(p)                       // key found
  if after(p) > k:
    if S_cur == S_0, report “key k not found” and exit    // S_cur is the current list
    else drop down to the next list
  if after(p) < k, scan forward, i.e., p ← after(p)
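The search loop above can be sketched concretely. The list-of-lists representation below is our own simplification (each level is just a sorted Python list with sentinels), not the pointer structure of a real skip list:

```python
INF = float("inf")

def search(levels, key):
    """Search a skip list given as levels[0] = highest list S_h down to
    levels[-1] = S0. Each level is a sorted list starting with -inf and
    ending with +inf. Returns True iff key is present, using only the
    scan-forward and drop-down moves described above."""
    p = -INF                       # key of the current node; start at the head
    for level in levels:
        i = level.index(p)         # p occurs in every lower list as well
        while level[i + 1] < key:  # after(p) < key: scan forward
            i += 1
        p = level[i]
        if level[i + 1] == key:    # after(p) == key: found
            return True
        # after(p) > key: fall through to the next level (drop down)
    return False                   # reached S0 without finding key

# The example skip list from the earlier slide.
S = [[-INF, INF],
     [-INF, 30, INF],
     [-INF, 9, 30, 38, INF],
     [-INF, 5, 9, 25, 30, 35, 38, 40, INF]]
```

Searching this S for 25 succeeds (found in S0 after dropping down past 9), while 28 fails in S0 between 25 and 30, matching the two walkthroughs on the next slides.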
Searching for key 25: starting at the head of S3, p drops down through S2 and S1, reaches 9, scans forward in S0, and after(p) = 25. Key found.
Searching for key 28: p follows the same path down to S0 and scans forward past 25, but then after(p) = 30 > 28 and we are already in S0. Key not found.
Deletion of a key, e.g., delete 30: search for 30 as above, then remove it from every list S_i in which it appears, linking each of its predecessors directly to its successor.
Analysis
1. An element x_k is in S_i with probability 1/2^i, for every element. Let X_{k,i} = 1 if x_k is in S_i, and 0 otherwise. Then
E(|S_i|) = Σ_{k=1}^{n} P(x_k ∈ S_i) = n/2^i, and
E(total size) = E( Σ_{i≥0} |S_i| ) = Σ_{i≥0} n/2^i ≤ 2n.
2. Expected height of a skip list: h ≈ log n, since n/2^h = 1 ⇒ h ≈ log n.
Analysis (contd.)
3. Drop down: O(log n), since the pointer p can drop at most h times (the height of the skip list) before S0 is reached, and h = log n.
4. Scan forward: O(log n).
# of elements scanned per level | total no. of levels | total cost
O(1)                            | O(log n)            | O(log n)
The number of elements scanned at the i-th level is no more than 2, because:
the key lies between p and after(p) on the (i+1)-th level (that is why we came down to the i-th level), and
there is only one element of S_i between p and after(p) of the (i+1)-th level: the element pointed to by after(p) in S_i.
Thus we scan at most two elements at S_i: the element pointed to by p (when we came down) and after(p) in S_i.
Hashing
Motivation: symbol tables.
– A compiler uses a symbol table to relate symbols to associated data.
  Symbols: variable names, procedure names, etc.
  Associated data: memory location, call graph, etc.
– For a symbol table (also called a dictionary), we care about search, insertion, and deletion.
– We typically don’t care about sorted order.
Hash Tables
More formally:
– Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
  Insert(T, x)
  Delete(T, x)
  Search(T, x)
– We want these to be fast, but don’t care about sorting the records.
The structure we will use is a hash table.
– It supports all of the above in O(1) expected time!
Hash Functions
Next problem: collision.
(Diagram: keys k1, ..., k5 drawn from the universe U of keys, with K the set of actual keys; h maps them into slots 0 ... m-1 of table T, and h(k2) = h(k5) — a collision.)
Resolving Collisions
How can we solve the problem of collisions?
One solution is chaining.
Other solutions: open addressing.
Chaining
Chaining puts elements that hash to the same slot into a linked list.
(Diagram: keys k1 ... k8 hash into T; colliding keys such as k1 and k4 share one chain, and k5, k2, k7 share another.)
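A minimal sketch of a chained hash table (class and method names are ours; Python lists stand in for the linked lists of the diagram):

```python
class ChainedHashTable:
    """Hash table resolving collisions by chaining."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]    # T[0 .. m-1], one chain per slot

    def _hash(self, key):
        return hash(key) % self.m              # h(k) maps keys into {0,...,m-1}

    def insert(self, key, value):
        """Insert at the head of the chain: O(1)."""
        self.slots[self._hash(key)].insert(0, (key, value))

    def search(self, key):
        """Walk the chain for h(key): O(1 + chain length)."""
        for k, v in self.slots[self._hash(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        """Remove the first pair with this key from its chain."""
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```

With m = 8, the integer keys 1 and 9 collide (both hash to slot 1) and end up in the same chain, illustrating the diagram above.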
Chaining
How do we insert an element? Put it at the head of its slot’s list: O(1).
How do we delete an element? Find it in its chain and unlink it; with a doubly linked list, unlinking takes O(1) once the element is located.
How do we search for an element with a given key? Hash the key and walk the chain at that slot.
Analysis of Chaining
Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot.
Given n keys and m slots in the table, the load factor α = n/m = average # of keys per slot.
What will be the average cost of an unsuccessful search for a key? A: O(1 + α).
What will be the average cost of a successful search? A: O((1 + α)/2) = O(1 + α).
Analysis of Chaining, Continued
So the cost of searching is O(1 + α). If the number of keys n is proportional to the number of slots m, what is α? A: α = O(1).
– In other words, we can make the expected cost of searching constant if we make α constant.
A Final Word About Randomized Algorithms
If we could prove:
P(failure) < 1/k   (we are sort of happy),
P(failure) < 1/n^k (most of the time this is true, and we’re happy),
P(failure) < 1/2^n (this is difficult, but still what we want).
Acknowledgements Kunal Verma Nidhi Aggarwal And other students of MSc(CS) batch 2009.
END