Divide and Conquer for Sorting (2.3/1.3)
Divide (into two equal parts)
Conquer (solve for each part separately)
Combine separate solutions
Mergesort
–Divide into two equal parts
–Sort each part using Mergesort (recursion!)
–Merge two sorted subsequences

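Merging Two Subsequences
Two sorted subsequences, drawn as chains: $x[1] < x[2] < \dots < x[K]$ and $y[1] < y[2] < \dots < y[L]$
If $y[i] > x[j]$ then $y[i+1] > x[j]$, so $x[j]$ never needs to be compared with $y[i+1]$
The merged chain has $K+L-1$ edges = # (comparisons) = linear time
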
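The merge step can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the function name `merge` and the returned comparison counter are assumptions added to make the $K+L-1$ bound visible.

```python
def merge(xs, ys):
    """Merge two sorted lists; return (merged list, number of comparisons)."""
    merged, comparisons = [], 0
    i = j = 0
    # Each comparison moves exactly one element to the output.
    while i < len(xs) and j < len(ys):
        comparisons += 1
        if xs[i] <= ys[j]:
            merged.append(xs[i])
            i += 1
        else:
            merged.append(ys[j])
            j += 1
    # One input is exhausted; the rest is copied without comparisons.
    merged.extend(xs[i:])
    merged.extend(ys[j:])
    return merged, comparisons
```

For inputs of lengths K and L the counter never exceeds K + L - 1, because the loop stops as soon as one list is empty.
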
Merge Sort Execution Example
Recursion Tree
height $\log n$
$n$ comparisons per level
$\log n$ levels
total runtime = $n \log n$

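The level-by-level count can be checked with a short Python sketch of Mergesort that also returns the number of comparisons it made. The name `merge_sort` and the counter are assumptions for illustration; the sketch returns a sorted copy rather than sorting in place.

```python
def merge_sort(a):
    """Return (sorted copy of a, number of element comparisons used)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, count_left = merge_sort(a[:mid])      # conquer left half
    right, count_right = merge_sort(a[mid:])    # conquer right half
    out, i, j = [], 0, 0
    count = count_left + count_right
    # Combine: linear-time merge of the two sorted halves.
    while i < len(left) and j < len(right):
        count += 1
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out += left[i:] + right[j:]
    return out, count
```

For an input of length n that is a power of two, the returned count stays at or below n log2 n, matching the n comparisons per level times log n levels on the slide.
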
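Master Method (4.3)
Recurrence $T(n) = a\,T(n/b) + f(n)$, with $a \ge 1$, $b > 1$
1) If $f(n) = O(n^{\log_b a - \epsilon})$ for some $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$
2) If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$
3) If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some $\epsilon > 0$ and $a\,f(n/b) \le c\,f(n)$ for some $c < 1$, then $T(n) = \Theta(f(n))$
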
Master Method Examples
Mergesort: $T(n) = 2T(n/2) + \Theta(n)$
Strassen (28.2): $T(n) = 7T(n/2) + \Theta(n^2)$

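As a worked sketch, the two examples can be matched against the standard cases of the master method by comparing $f(n)$ with $n^{\log_b a}$:

```latex
\begin{align*}
\text{Mergesort: } & a = 2,\ b = 2,\ n^{\log_2 2} = n,\ f(n) = \Theta(n)
  && \Rightarrow \text{case 2: } T(n) = \Theta(n \log n) \\
\text{Strassen: } & a = 7,\ b = 2,\ n^{\log_2 7} \approx n^{2.81},\ f(n) = \Theta(n^2) = O(n^{\log_2 7 - \epsilon})
  && \Rightarrow \text{case 1: } T(n) = \Theta(n^{\log_2 7})
\end{align*}
```

In the Strassen line any $\epsilon$ with $0 < \epsilon \le \log_2 7 - 2$ works.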
Quicksort ( / )
Sorts in place like insertion sort, unlike merge sort
Divide into two parts such that
–elements of left part ≤ elements of right part
Conquer: recursively solve for each part separately
Combine: trivial - do not do anything

Quicksort(A,p,r)
    if p < r then
        q ← Partition(A,p,r)    //divide
        Quicksort(A,p,q)        //conquer left
        Quicksort(A,q+1,r)      //conquer right

Divide = Partition
PARTITION(A,p,r)
//Partition array from A[p] to A[r] with pivot A[p]
//Result: returns j such that every element of A[p..j] is ≤ every element of A[j+1..r]
    x = A[p]
    i = p - 1
    j = r + 1
    repeat forever
        repeat j = j - 1 until A[j] ≤ x
        repeat i = i + 1 until A[i] ≥ x
        if i < j then exchange A[i] ↔ A[j]
        else return j

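The Quicksort and PARTITION pseudocode can be turned into runnable Python along these lines. This is a sketch using 0-based indices; the default arguments on `quicksort` are an assumption added for convenience.

```python
def partition(a, p, r):
    """Partition a[p..r] around pivot a[p]; return j with a[p..j] <= a[j+1..r]."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        # repeat j = j - 1 until a[j] <= x
        j -= 1
        while a[j] > x:
            j -= 1
        # repeat i = i + 1 until a[i] >= x
        i += 1
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)   # divide
        quicksort(a, p, q)       # conquer left
        quicksort(a, q + 1, r)   # conquer right
```

Note that the recursive call on the left part includes index q. With this partition scheme the pivot is not guaranteed to end at its final position, so it cannot be excluded.
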
How It Works
[Figure: step-by-step trace of PARTITION on an example array, showing the indices i and j moving toward each other and the left and right parts growing until j and i cross.]

Runtime of Quicksort
Worst case:
–every time nothing to move
–pivot = left (right) end of subarray
–$O(n^2)$
[Figure: Recursion Tree of QSort (label: n)]

Runtime of Quicksort
Best case:
–every time partition in (almost) equal parts
–more generally, every split no worse than some fixed proportion
–$O(n \log n)$
Average case = ?

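What is the DQ Recurrence for QSort?
$t(n) = \Theta(n) + \frac{1}{n} \sum_{j=1}^{n} \bigl(t(j-1) + t(n-j)\bigr)$
–all $n$ pivots equiprobable; $j-1$ and $n-j$ are the lengths of the subproblems
$t(n) = \Theta(n) + \frac{2}{n} \sum_{k=0}^{n-1} t(k)$
Guess $t(n) \in O(n \log n)$:
$t(n) \le \Theta(n) + \frac{2}{n} \sum_{i=2}^{n-1} c\, i \log i \le \Theta(n) + c\, n \log n - cn/2$
which is at most $c\, n \log n$ once $c$ is large enough for $cn/2$ to absorb the $\Theta(n)$ term
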
Another QSort Analysis (Shift / Cancel)
(1) $T(n) = n - 1 + \frac{2}{n} \sum_{i=1}^{n-1} T(i)$, $n \ge 2$, $T(1) = 0$
(2) $T(n+1) = n + \frac{2}{n+1} \sum_{i=1}^{n} T(i)$
(1) ⇒ (3) $n\,T(n) = n(n-1) + 2 \sum_{i=1}^{n-1} T(i)$
(2) ⇒ (4) $(n+1)\,T(n+1) = (n+1)n + 2 \sum_{i=1}^{n} T(i)$
(4) - (3): $(n+1)\,T(n+1) - n\,T(n) = 2n + 2T(n)$
$T(n+1) = \frac{n+2}{n+1}\, T(n) + \frac{2n}{n+1}$
$T(n+1) \le \frac{n+2}{n+1}\, T(n) + 2$

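Shift / Cancel (cont.)
Unroll: dividing by $n+2$ gives $\frac{T(n+1)}{n+2} \le \frac{T(n)}{n+1} + \frac{2}{n+2}$, so with $T(1) = 0$: $\frac{T(n)}{n+1} \le 2 \sum_{k=3}^{n+1} \frac{1}{k} = 2\,(H_{n+1} - 3/2)$
$H_n = 1 + 1/2 + 1/3 + \dots + 1/n = \ln n + \gamma + O(1/n)$
$\gamma$ = Euler’s constant = 0.577…
$T(n) \le 2(n+1)(\ln n + \gamma - 3/2) + O(1)$
$T(n) \in O(n \log n)$
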
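The recurrence and the unrolled bound can be compared numerically. The sketch below evaluates $T(n) = n - 1 + \frac{2}{n}\sum_{i=1}^{n-1} T(i)$ directly and checks it against $2(n+1)(H_{n+1} - 3/2)$, the harmonic-number form that unrolling produces; the function names are assumptions for illustration.

```python
def expected_comparisons(n_max):
    """Return a list T with T[n] = n - 1 + (2/n) * sum_{i=1}^{n-1} T[i], T[1] = 0."""
    T = [0.0] * (n_max + 1)
    prefix = 0.0  # running sum T[1] + ... + T[n-1]
    for n in range(2, n_max + 1):
        prefix += T[n - 1]
        T[n] = n - 1 + 2.0 * prefix / n
    return T


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def unrolled_bound(n):
    """Upper bound 2(n+1)(H_{n+1} - 3/2) obtained by unrolling."""
    return 2 * (n + 1) * (harmonic(n + 1) - 1.5)
```

For example, `expected_comparisons(3)[3]` is 8/3 while `unrolled_bound(3)` is 14/3, so the bound is loose by a lower-order term but has the right $2n \ln n$ leading behavior.
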
Randomized Analysis of QSort
Probability space
–Set $\Omega$ of elementary events (experiment outcomes)
–Family $F$ of subsets of $\Omega$, called events
–Probability measure Pr, real-valued function on members of $F$
–where
  $A \subseteq \Omega$, $A \in F \Rightarrow A^c \in F$
  $F$ closed under union, intersection
  For all $A \in F$, $0 \le \Pr(A) \le 1$
  $\Pr(\Omega) = 1$
  For disjoint events $A_1, A_2, \dots$: $\Pr(\bigcup_i A_i) = \sum_i \Pr(A_i)$
Random variable is a function from elements of $\Omega$ into $\mathbb{R}$
–e.g., event $(X = x)$ ≡ set of elements of $\Omega$ for which $X$ assumes the fixed value $x$
For integer-valued r.v. $X$, expectation $E[X] = \sum_i i \cdot \Pr(X = i)$

Randomized Analysis of QSort (cont.)
At each level, we compare each element to the splitter in its partition
Def. A successful pivot $p$ satisfies $n/8 < p < 7n/8$ (here $p$ is the rank of the pivot in a partition of size $n$)
Three facts
–(1) $E[\sum_i X_i] = \sum_i E[X_i]$   linearity of expectation
–(2) If Pr(Heads) is $q$, expected # tosses up to and including first Head is $1/q$   // family size puzzle
–(3) $\Pr(\bigcup_i A_i) \le \sum_i \Pr(A_i)$   prob of union ≤ sum of probs
Given that Pr(successful pivot) = 3/4, a single element $e_i$ participates in at most how many successful partitioning steps?
–Ans: $\log_{8/7} n$, since each successful pivot leaves the element in a part of size at most $7n/8$

Randomized Analysis of QSort (cont.)
What is number of partition steps expected between $k$-th and $(k+1)$-st successful pivots?
–(2) ⇒ 4/3 steps
–(1) ⇒ $\frac{4}{3} \log_{8/7} n$ expected steps in total for one element
–Each element expects to participate in $O(\log n)$ pivots (comparisons)
Define r.v.’s which give # comparisons for each element
–(1) ⇒ get $O(n \log n)$ expected comparisons
So, we have another analysis that gives us the $O(n \log n)$ expected complexity of QSort

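The expected-comparison claim can be explored empirically with a small simulation. The sketch below is an assumption-laden illustration rather than the in-place algorithm from the earlier slides: it picks a uniformly random splitter, builds new lists, and charges one comparison per non-splitter element in each partition step.

```python
import random


def randomized_quicksort(items, rng):
    """Return (sorted list, comparisons), using a uniformly random splitter."""
    if len(items) <= 1:
        return list(items), 0
    pivot = items[rng.randrange(len(items))]
    less = [v for v in items if v < pivot]
    equal = [v for v in items if v == pivot]
    greater = [v for v in items if v > pivot]
    left, count_left = randomized_quicksort(less, rng)
    right, count_right = randomized_quicksort(greater, rng)
    # Every element other than the splitter is compared to it once.
    return left + equal + right, count_left + count_right + len(items) - 1


def average_comparisons(n, trials, seed=0):
    """Average comparison count over several runs on a shuffled range(n)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        data = list(range(n))
        rng.shuffle(data)
        total += randomized_quicksort(data, rng)[1]
    return total / trials
```

Comparing `average_comparisons(n, 50)` with `n * math.log(n)` for a few values of n gives a feel for the $O(n \log n)$ expectation; the constants depend on the random seed.
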
How Badly Can We Deviate From Expectation?
E.g., what is Pr($i$-th element sees ≥ $20 \log n$ pivots)?
–How many successes possible?
–Know: ≤ $\log_{8/7} n$
Pivots are independent (Bernoulli trials)
–Pr(success) = 3/4, Pr(failure) = 1/4
–To see $20 \log n$ pivots, need at least $20 \log n - \log_{8/7} n$ failures in $20 \log n$ tries
Can use (3) to bound probability of so many “bad” events, since in general events may not be independent
–Can get: Pr(# comparisons ≤ $20\, n \log n$) is $1 - O(n^{-6})$

Selection
Best pivot: median
–exercise: analyze complexity when bad pivots chosen
–too expensive ⇒ random splitter
This leads to SELECTION:
–select (L, k) returns $k$-th smallest element of L
–e.g., k = |L|/2 ⇒ median
What is an efficient algorithm?
–O(n log n)? (by sorting)
D/Q Recursion:
–N.B.: This is RSelect, below
–recall pivot from QSort; let $i$ be its rank
–if i<k, look in right part for (k-i)-th smallest
–if i>k, look in left part for k-th smallest

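The D/Q recursion for selection can be sketched in Python as below. This is an illustrative version, not the slides' RSelect verbatim: the name `rselect`, the 1-based `k`, and the three-way split (which handles duplicate values) are assumptions.

```python
import random


def rselect(L, k, rng=random):
    """Return the k-th smallest element of L, where k = 1 gives the minimum."""
    if not 1 <= k <= len(L):
        raise ValueError("k must satisfy 1 <= k <= len(L)")
    pivot = rng.choice(L)                      # random splitter
    left = [v for v in L if v < pivot]
    equal = [v for v in L if v == pivot]
    right = [v for v in L if v > pivot]
    if k <= len(left):
        # pivot rank exceeds k: look in left part for the k-th smallest
        return rselect(left, k, rng)
    if k <= len(left) + len(equal):
        # the pivot itself has rank k
        return pivot
    # pivot rank is below k: look in right part, shifting k by the pivot's rank
    return rselect(right, k - len(left) - len(equal), rng)
```

For example, `rselect(data, (len(data) + 1) // 2)` returns a median of `data`. Only one side is recursed into, which is what separates this from QSort.
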
Randomized Selection
Worst case:
–$\Theta(n^2)$ as in QSort analysis
Suppose can guarantee “good” pivot
–e.g., $n/4 \le i \le 3n/4$
–⇒ subproblem size ≤ $3n/4$
–Let $s(n)$ ≡ time to find good pivot
–⇒ $t(n) \le s(n) + cn + t([3n/4])$
  where $s(n)$ = find pivot; $cn$ = pivot, make subproblem; $t([3n/4])$ = solve subproblem

Randomized Selection
Suppose further: $s(n) \le dn$ for some $d$; ⇒ $t(n) \le (c+d)n + t([3n/4])$
Claim: $t(n) \le kn$ for some $k$ would follow
–Constructive induction or “substitution”
–Ind. Hyp.: $t(m) \le km$ for $m \le n-1$
–Ind. Step: $t(n) \le (c+d)n + k(3n/4) = (c + d + 3k/4)\,n$, which we want to be ≤ $kn$, so that $t(n) \le kn$
–But this is true if $k \ge 4(c+d)$

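The same constant can be seen by unrolling the recurrence instead of guessing. The following is a sketch that ignores the rounding in $[3n/4]$:

```latex
\begin{align*}
t(n) &\le (c+d)\,n + t(3n/4) \\
     &\le (c+d)\,n \left(1 + \tfrac{3}{4} + \left(\tfrac{3}{4}\right)^2 + \cdots\right) \\
     &= (c+d)\,n \cdot \frac{1}{1 - 3/4} = 4(c+d)\,n
\end{align*}
```

The geometric series sums to 4, which matches the threshold $k \ge 4(c+d)$ from the induction step.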
Break
Celebrity Problem: Given n people and a “knows” relation, is there a celebrity?
–Notation: Directed graph (and, let’s say that there can be 0, 1 or 2 directed edges between any two vertices (== people))
What is obvious algorithm?
–Test each person’s celebrityhood
Induction
–Hyp: Can tell whether there is a celebrity among the first n-1 people
–Induction Step:
  Have a celebrity among first n-1 people ⇒ two queries needed to verify whether known by n-th person
  No celebrity among first n-1 people ⇒ check whether n-th person is a celebrity (2(n-1) queries needed)
  (Else no celebrity)
⇒ $\Theta(n^2)$ queries

Break (cont.)
Celebrity Problem: Can you do better?
–Hint: (n-1) + 2(n-1) queries suffice
Why are we focusing on identifying the celebrity?
Key idea: eliminate a non-celebrity with each query (write $K_{ij} = 1$ if $i$ knows $j$)
–$K_{ij} = 0$ ⇒ $j$ not celebrity
–$K_{ij} = 1$ ⇒ $i$ not celebrity
–(We’ve seen this “complement” idea in DQ-MAXMIN)
Another question: Given the adjacency matrix of an undirected graph, describe a fast algorithm that finds all triangles in the graph.
–Q: Why am I asking this now?

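The elimination idea can be sketched in Python as follows. The sketch assumes the usual definition, which the slides do not spell out: a celebrity is known by everyone else and knows nobody else. The matrix name `knows` and the function name are also assumptions.

```python
def find_celebrity(knows):
    """knows[i][j] is True if person i knows person j.

    Return the index of the celebrity, or None if there is none.
    """
    n = len(knows)
    if n == 0:
        return None
    candidate = 0
    # Elimination phase: n - 1 queries, each ruling out one person.
    for person in range(1, n):
        if knows[candidate][person]:
            # candidate knows someone, so candidate is not a celebrity
            candidate = person
        # otherwise `person` is not known by candidate, so `person` is ruled out
    # Verification phase: at most 2(n - 1) queries on the single survivor.
    for other in range(n):
        if other == candidate:
            continue
        if knows[candidate][other] or not knows[other][candidate]:
            return None
    return candidate
```

The two phases use at most (n-1) + 2(n-1) queries in total, matching the hint.
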
D/Q for Arithmetic
Multiplying Large Integers
–$A = a_0 + r\,a_1 + \dots + r^{n-1} a_{n-1}$, $r$ ≡ radix
–“classic” approach ⇒ $\Theta(n^2)$ work
Can we apply D/Q?
–Let $n = 2s$, $r = 10$ ≡ radix; write $A = 10^s w + x$, $B = 10^s y + z$
–$AB = xz + 10^s (wz + xy) + 10^{2s} wy$
–$T(n) = 4T(n/2) + \Theta(n)$; $a = 4$, $b = 2$ in Master Method
–$T(n) \in \Theta(n^2)$
–Need to reduce # subproblems, i.e., want $a < 4$
Observation: $r' = (w+x)(y+z) = wy + (wz+xy) + xz$
–$r' \leftarrow (w+x)(y+z)$
–$p \leftarrow wy$
–$q \leftarrow xz$
–return $10^{2s} p + 10^s (r' - p - q) + q$
–$T(n) \in O(n^{\log_2 3}) = O(n^{1.59})$

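The three-product scheme can be sketched in Python on ordinary integers. The function name `karatsuba`, the single-digit base case, and the choice of `s` as half the longer operand's digit count are assumptions for illustration; the variable names w, x, y, z, p, q, r follow the slide.

```python
def karatsuba(a, b):
    """Multiply non-negative integers a and b using three recursive products."""
    if a < 10 or b < 10:
        return a * b                      # base case: a single-digit operand
    s = max(len(str(a)), len(str(b))) // 2
    w, x = divmod(a, 10 ** s)             # a = 10^s * w + x
    y, z = divmod(b, 10 ** s)             # b = 10^s * y + z
    p = karatsuba(w, y)                   # p = wy
    q = karatsuba(x, z)                   # q = xz
    r = karatsuba(w + x, y + z)           # r = wy + (wz + xy) + xz
    return 10 ** (2 * s) * p + 10 ** s * (r - p - q) + q
```

Python's built-in integers already multiply large values efficiently, so this sketch is only meant to make the recursion concrete.
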
Matrix Multiplication
A = […], B = […] are n x n matrices
$a_{11}, a_{12}$, etc. are $n/2 \times n/2$ submatrices
M = AB = […]
–where $m_{11} = a_{11} b_{11} + a_{12} b_{21}$ etc.
–Evaluation requires 8 multiplies, 4 adds
$T(n) = 8T(n/2) + O(n^2)$ ⇒ $T(n) \in \Theta(n^3)$ (adding two $n/2 \times n/2$ matrices costs $\Theta(n^2)$)
Strassen:
–$p_1 = (a_{21} + a_{22} - a_{11})(b_{22} - b_{12} + b_{11})$
–$p_2 = a_{11} b_{11}$
–$p_3 = a_{12} b_{21}$
–$p_4 = (a_{11} - a_{21})(b_{22} - b_{12})$
–$p_5 = (a_{21} + a_{22})(b_{12} - b_{11})$
–$p_6 = (a_{12} - a_{21} + a_{11} - a_{22})\, b_{22}$
–$p_7 = a_{22}\,(b_{11} + b_{22} - b_{12} - b_{21})$

Strassen’s Matrix Multiplication
With $p_1, \dots, p_7$ as defined on the previous slide:
$AB_{11} = p_2 + p_3$
$AB_{12} = p_1 + p_2 + p_5 + p_6$
$AB_{21} = p_1 + p_2 + p_4 - p_7$
$AB_{22} = p_1 + p_2 + p_4 + p_5$
$T(n) \in \Theta(n^{2.81})$ // 7 multiplies, 24 adds
–Can get to 15 adds

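The seven products can be checked with a recursive Python sketch on lists of lists, assuming n is a power of two. The helper names are assumptions. Note that the sketch subtracts $p_7$ in the lower-left block: expanding the products as defined above shows that $p_1 + p_2 + p_4 - p_7 = a_{21} b_{11} + a_{22} b_{21}$.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split a square matrix into four equal quadrants (m11, m12, m21, m22)."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply n x n matrices (n a power of two) with 7 recursive products."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    p1 = strassen(sub(add(a21, a22), a11), add(sub(b22, b12), b11))
    p2 = strassen(a11, b11)
    p3 = strassen(a12, b21)
    p4 = strassen(sub(a11, a21), sub(b22, b12))
    p5 = strassen(add(a21, a22), sub(b12, b11))
    p6 = strassen(add(sub(a12, a21), sub(a11, a22)), b22)
    p7 = strassen(a22, sub(add(b11, b22), add(b12, b21)))
    c11 = add(p2, p3)
    c12 = add(add(p1, p2), add(p5, p6))
    c21 = sub(add(add(p1, p2), p4), p7)   # p7 enters with a minus sign
    c22 = add(add(p1, p2), add(p4, p5))
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

The sketch recomputes shared sums such as `add(p1, p2)` for clarity; reusing them is how the count of additions is brought down.
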