Published by Arabella Turner. Modified over 9 years ago.
1
Correctness of Huffman Codes; Introduction to Randomized Algorithms; QuickSelect & QuickSort. Monday, July 14th
2
Outline For Today
1. Correctness of Huffman Codes
2. QuickSelect
3. Probability Review
4. QuickSelect Runtime Analysis
5. QuickSort
4
Recap: Encoding-Decoding. An encoder turns the message into a binary blob and a decoder recovers the message from it. Goal: make the binary blob as small as possible while satisfying the protocol.
5
Ex: Variable Length Prefix-free Encoding. A = {a, b, c, d} with the code a → 0, b → 10, c → 110, d → 111. Decoding 110010 one codeword at a time: 110 → c, 0 → a, 10 → b, so the decoded message is cab.
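The decoding walk-through above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `decode_prefix_free` and the dictionary layout are assumptions, and the code relies on the prefix-free property so that the first codeword that matches is always the right one.

```python
def decode_prefix_free(bits, code):
    """Decode a bit string with a prefix-free code given as {letter: codeword}."""
    inverse = {word: letter for letter, word in code.items()}
    decoded = []
    current = ""
    for bit in bits:
        current += bit
        # Prefix-freeness guarantees that no codeword is a prefix of another,
        # so the first match is the only possible match.
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(decoded)


# The code from the slide: a -> 0, b -> 10, c -> 110, d -> 111.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
message = decode_prefix_free("110010", CODE)
print(message)
```

Scanning left to right with a small buffer mirrors walking down the code tree from the root and restarting at the root each time a leaf is reached.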
9
Recap: Prefix-free Encodings and Binary Trees. We can represent each prefix-free code Ɣ as a binary tree T and vice versa. Code 1: a → 0, b → 10, c → 110, d → 111. [Figure: the binary tree for Code 1, with edges labeled 0 and 1 and leaves a, b, c, d.] Encoding of letter x = path from the root to the leaf with x. # bits for x = depth_T(x).
10
Recap: Formal Problem Statement. Input: an alphabet A and the frequencies f(x) of the letters x in A. Output: a binary tree T whose leaves are the letters of A and that has the minimum average bit length (ABL): $\mathrm{ABL}(T) = \sum_{x \in A} f(x) \cdot \mathrm{depth}_T(x)$
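As a small worked example of the objective, the sketch below computes the ABL of a code, taking ABL to be the frequency-weighted sum of codeword lengths (the depth of a leaf equals the length of its codeword). The frequencies are made-up illustration values, not data from the lecture, and `average_bit_length` is a hypothetical helper name.

```python
def average_bit_length(freq, code):
    """ABL = sum over letters x of f(x) * (number of bits in the codeword of x)."""
    return sum(freq[x] * len(code[x]) for x in freq)


# Hypothetical frequencies, chosen only for illustration.
FREQ = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
VARIABLE = {"a": "0", "b": "10", "c": "110", "d": "111"}
FIXED = {"a": "00", "b": "01", "c": "10", "d": "11"}

abl_variable = average_bit_length(FREQ, VARIABLE)
abl_fixed = average_bit_length(FREQ, FIXED)
print(abl_variable, abl_fixed)
```

With these frequencies the variable-length code spends fewer bits per letter on average than the fixed-length one, which is the effect the problem statement asks us to optimize.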
11
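Recap: Observations About Optimal T. Observation 1: The optimal binary tree T is full, i.e., each non-leaf vertex u has exactly 2 children. [Figure: two trees T and T` over the leaves a, b, c, e.] Why? If a non-leaf vertex had only one child, we could contract the edge to that child and shorten every codeword below it without lengthening any other.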
12
Recap: Observation 2 About Optimal T. Claim: In any optimal tree T, if leaf x has depth i and leaf y has depth j such that i < j, then f(x) ≥ f(y). Exchange Argument: if instead f(x) < f(y), swap x and y and get a better tree T`.
13
Corollary In any optimal tree T the two lowest frequency letters are both in the lowest level of the tree!
14
Recap: Huffman’s Key Insight. Observation 1 => optimal trees are full => each leaf has a sibling. Corollary => the 2 lowest freq. letters x, y are at the same level. Swapping letters within the same level does not change the cost of T. Therefore: there is an optimal tree T in which the two lowest frequency letters are siblings (in the lowest level of the tree).
15
Possible Greedy Algorithm. Let x, y be the two lowest frequency letters.
1. Since x, y can be taken to be siblings, treat them as a single meta-letter xy
2. Find an optimal tree T* for the alphabet A - {x, y} + {xy}
3. Expand xy back into x and y in T*
16
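Possible Greedy Algorithm (Example). Ex: A = {x, y, z, t}, and let x, y be the two lowest freq. letters. Let A` = {xy, z, t}. [Figure: T* is a tree for A` with leaves xy, z, t; T is obtained from T* by replacing the leaf xy with an internal vertex whose two children are x and y.]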
17
Huffman’s Algorithm (1951)
procedure Huffman(A, f):
  if (|A| = 2): return T where branches 0, 1 point to A[0] and A[1], respectively
  let x, y be the two lowest frequency letters
  let A` = A - {x, y} + {xy}
  let f` = f - {x, y} + {xy: f(x) + f(y)}
  T* = Huffman(A`, f`)
  expand xy into x, y in T* to get T
  return T
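The recursive procedure can be sketched directly in Python. This is an illustrative version under a few assumptions of my own: the tree is represented implicitly as a `{letter: codeword}` dictionary, the meta-letter xy is represented as the tuple `(x, y)`, and ties between equal frequencies are broken by insertion order. It re-sorts at every level, so it is quadratic or worse and is meant to mirror the pseudocode, not to be fast.

```python
def huffman(freq):
    """Return {letter: codeword} via the recursive merge-and-expand scheme.

    freq maps each letter to its frequency and must contain at least 2 letters.
    """
    if len(freq) < 2:
        raise ValueError("need at least two letters")
    if len(freq) == 2:
        first, second = freq  # the two remaining letters
        return {first: "0", second: "1"}
    # x, y: the two lowest-frequency letters (sorting compares frequencies only).
    x, y = sorted(freq, key=freq.get)[:2]
    meta = (x, y)  # hypothetical representation of the meta-letter xy
    reduced = {k: v for k, v in freq.items() if k not in (x, y)}
    reduced[meta] = freq[x] + freq[y]
    code = huffman(reduced)  # code for the smaller alphabet A`
    prefix = code.pop(meta)  # expand xy back into x and y
    code[x] = prefix + "0"
    code[y] = prefix + "1"
    return code


# Hypothetical frequencies, chosen only for illustration.
FREQ = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman(FREQ)
print(code)
```

Expanding the meta-letter by appending one more bit to its codeword is exactly the step that adds f(x) + f(y) to the ABL in the correctness argument.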
18
Huffman’s Algorithm Correctness (1). By induction on |A|. Base case: |A| = 2 => return the simple full tree with 2 leaves. IH: assume Huffman is optimal for all alphabets of size k-1. Then on an alphabet of size k, Huffman gets an optimal tree T^opt_{k-1} for A` (with the meta-letter xy) and expands xy.
19
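Huffman’s Algorithm Correctness (2). [Figure: T^opt_{k-1} with the leaf xy, and T with xy expanded into x and y.] In T^opt_{k-1}, the meta-letter contributes f(xy)*depth(xy) = (f(x) + f(y))*depth(xy). In T, x and y together contribute (f(x) + f(y))*(depth(xy) + 1). Total diff = f(x) + f(y).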
20
Huffman’s Algorithm Correctness (3). Take any optimal tree Z; we’ll argue ABL(T) ≤ ABL(Z). By the corollary, we can assume that in Z, x and y are also siblings at the lowest level. Consider Z`, obtained by merging x and y into xy in Z => Z` is a valid prefix-free code for A`, which has size k-1.
21
Correctness. [Figure: Z with sibling leaves x and y, and Z` with the merged leaf xy.] ABL(Z) = ABL(Z`) + f(x) + f(y) (the total diff is again f(x) + f(y)!). ABL(T) = ABL(T`) + f(x) + f(y), where T` is the tree Huffman built for A`. By IH: ABL(T`) ≤ ABL(Z`) => ABL(T) ≤ ABL(Z). Q.E.D.
22
Huffman’s Algorithm Runtime Exercise: Make Huffman run in O(|A|log(|A|))?
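One common way to approach this exercise (not necessarily the intended solution) is to keep the current letters and meta-letters in a binary min-heap, so that finding and merging the two lowest frequencies costs O(log |A|) per step. The sketch below assumes Python's standard `heapq` module; the `Node` class and the tie-breaking counter are illustrative choices of mine. The heap phase takes O(|A| log |A|) time; writing out the codewords afterwards costs time proportional to their total length.

```python
import heapq
from itertools import count


class Node:
    """Internal tree vertex with a 0-branch (left) and a 1-branch (right)."""
    __slots__ = ("left", "right")

    def __init__(self, left, right):
        self.left = left
        self.right = right


def huffman_heap(freq):
    """Return {letter: codeword}; freq needs at least 2 letters."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), letter) for letter, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # lowest frequency
        f2, _, t2 = heapq.heappop(heap)  # second lowest frequency
        heapq.heappush(heap, (f1 + f2, next(tiebreak), Node(t1, t2)))
    root = heap[0][2]
    code = {}
    stack = [(root, "")]
    while stack:  # iterative traversal that assigns codewords to leaves
        tree, prefix = stack.pop()
        if isinstance(tree, Node):
            stack.append((tree.left, prefix + "0"))
            stack.append((tree.right, prefix + "1"))
        else:
            code[tree] = prefix
    return code


# Hypothetical frequencies, chosen only for illustration.
FREQ = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
heap_code = huffman_heap(FREQ)
print(heap_code)
```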
24
Randomized Algorithms. [Figure: Input → Randomized Algorithm (flips coins) → Output.] Given a fixed input, they may give: 1. different outputs, 2. different runtimes, depending on the outcomes of the coins. Compared to their deterministic counterparts: often simpler, more practical, more elegant.
25
Problem of Selection. Input: an array A of n integers, and an integer 1 ≤ k ≤ n. Output: the rank-k element of A, i.e. the kth smallest element. If k = 1, find the min. If k = n, find the max. If k = n/2, find the median. Naïve Solution: sort A, return the kth element: O(n log(n)). Can we do better? Maybe O(n)?
26
QuickSelect. Given A, k: Pick a pivot p from A uniformly at random. Partition A into A_L: those < p, and A_R: those > p. If p is the rank-k element, i.e. |A_L| = k-1, return p. Otherwise recurse on either A_L or A_R, depending on the sizes of A_L and A_R.
27
QuickSelect Simulation. [Animation, slides 27-51; the array values are garbled in this transcript.] With k = 6, the pivot 8 is picked and the other elements are moved one at a time to its left (smaller) or right (larger). The pivot 8 turns out to be the 4th smallest element, so the 6th smallest element is to the right of 8.
52
QuickSelect Simulation, continued. [Animation, slides 52-67; the array values are garbled in this transcript.] Recurse on the right subarray with k = 2. After partitioning, the pivot is the 2nd smallest element of the subarray, so return 11.
68
QuickSelect Pseudocode
procedure QuickSelect(A, k):
  pick a pivot element p from A uniformly at random
  for i from 1 to n:
    if A[i] < p: put A[i] into A_L
    else if A[i] > p: put A[i] into A_R
  if |A_L| = k-1: return p   // p is the kth element
  else if |A_L| > k-1: return QuickSelect(A_L, k)
  else: return QuickSelect(A_R, k-1-|A_L|)
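A runnable sketch of the pseudocode, assuming the elements are distinct (with duplicates the partition below would drop copies of the pivot). The function name and the sample array are illustrative, not taken from the slides.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element (1-indexed) of a list of distinct numbers."""
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    pivot = random.choice(items)  # uniformly random pivot
    left = [x for x in items if x < pivot]   # A_L: elements smaller than the pivot
    right = [x for x in items if x > pivot]  # A_R: elements larger than the pivot
    if len(left) == k - 1:
        return pivot  # the pivot is the k-th smallest element
    if len(left) > k - 1:
        return quickselect(left, k)
    # Skip the |A_L| smaller elements and the pivot itself.
    return quickselect(right, k - 1 - len(left))


# Hypothetical input, chosen only for illustration.
data = [31, 7, 13, 8, 14, 1, 19, 11, 4, 10, 98, 16]
sixth = quickselect(data, 6)
print(sixth)
```

The answer does not depend on the random choices; only the running time does.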
69
Correctness of QuickSelect. Informally: if p is the kth smallest element, we correctly return it. Otherwise the kth smallest element is in either A_L or A_R. We pick the correct subarray according to the # of elements < p and update the rank k correctly. Can be made formal with an inductive proof.
70
Runtime of QuickSelect. Assume we are selecting the median (rank = n/2). Question 1: What’s the best scenario? The first pivot is the median: O(n) runtime. Question 2: What’s the worst scenario? We iteratively pick the max or min element as the pivot and end up having n-1 iterations: O(n^2). The runtime of QuickSelect is fundamentally a probability question! How long does QuickSelect take on average? Or, what’s the expected runtime of QuickSelect?
71
Terminology Clarification: Worst-Case vs Average-Case. Worst-case/average-case refer to assumptions about the input. Worst-case: under any input (or the worst input). Average-case: under an “average” input according to some distribution. Our analyses of randomized algorithms will be worst-case: no assumptions about the input distribution. Given any input (or the worst input), what’s the average or expected run-time of the algorithm?
73
Sample Space Ω. Definition (Sample Space Ω): set of all possible outcomes. Ex 1: Rolling two dice: Ω = {(1,1), (1,2), …, (6,6)}. Ex 2: QuickSelect: Ω is the set of all possible sequences of pivot picks, Ω = {(kth), (nth, kth), …, (nth, (n-1)st, 1st, kth)}. Each outcome i ∈ Ω has a probability p(i) ≥ 0.
74
Event S ⊆ Ω. Definition (Event S ⊆ Ω): a set of outcomes from Ω. Ex: Rolling a 10 or more with 2 dice: S = {(4, 6), (6, 4), (5, 5), (5, 6), (6, 5), (6, 6)}. Ex: Picking the kth element as pivot in at most 2 picks: S = {(kth), (nth, kth), ((n-1)st, kth), …, (1st, kth)}. The probability of each event S: $\Pr(S) = \sum_{i \in S} p(i)$. Pr(rolling a 10 or more) = 6/36.
75
Random Variable. Definition (Random Variable): a function from Ω to the real numbers. Ex: X: the sum of the dice. With Ω = {(1,1), (1,2), …, (6,6)}: X((1,1)) = 2, X((1,2)) = 3, …, X((6,6)) = 12. Ex: Y: run-time of QuickSelect. With Ω = {(kth), (nth, kth), …, (nth, (n-1)st, …, kth)}: Y is ~n, ~2n, …, ~n^2, respectively.
76
Indicator Random Variable. Definition (Indicator Random Variable): a RV X from Ω to {0, 1}. With probability p, X = 1. With probability 1-p, X = 0. Examples later.
77
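Expectation. Definition (Expectation E[X]): average value of X: $E[X] = \sum_{i \in \Omega} X(i) \cdot p(i)$, where X(i) is the value of X under outcome i and p(i) is the probability of i. Equivalently, assuming X takes non-negative integer values: $E[X] = \sum_{j \ge 0} j \cdot \Pr(X = j)$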
78
Expectation Examples.
1. Let X be the sum of 2 dice: $E[X] = 7$.
2. Let Y be an indicator random variable: Y = 1 with prob. p, Y = 0 with prob. 1-p. Then $E[Y] = 1 \cdot p + 0 \cdot (1 - p) = p$.
3. Assume we have a coin that comes up heads with prob. p. Let Z be the # of times we have to flip the coin to get a head. Due to the independence of consecutive coin flips, $E[Z] = p \cdot 1 + (1 - p)(1 + E[Z])$, so $E[Z] = 1/p$.
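These examples can be checked numerically. The sketch below enumerates the 36 equally likely dice outcomes with exact fractions, and approximates the expected number of coin flips until the first head by truncating the series sum of j * (1-p)^(j-1) * p. The function name `expected_flips` and the truncation length are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Example 1: exact expectation of the sum of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(outcomes))  # each of the 36 outcomes is equally likely
expected_sum = sum((a + b) * prob for a, b in outcomes)


# Example 3: expected number of flips until the first head.
def expected_flips(p, terms=10_000):
    """Truncated series: sum over j of j * Pr(first head on flip j)."""
    return sum(j * (1 - p) ** (j - 1) * p for j in range(1, terms + 1))


print(expected_sum, expected_flips(0.5), expected_flips(0.25))
```

The truncated series should land very close to 1/p, which matches the closed form obtained from the recurrence that uses independence of the flips.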
79
Facts About Expectation: [equations not preserved in this transcript]
80
**Linearity of Expectation** Let $Z = \sum_j X_j$. Then $E[Z] = \sum_j E[X_j]$, even if the X_j depend on each other (i.e., are not independent). Extremely useful when trying to understand the average value of a complicated random variable Z: we express Z as a sum of simpler random variables.
81
Ex: Birthday Paradox. If there are k people in a room, on average, how many pairs of people have the same birthday? Let Z be the # of pairs of people with the same birthday. Z = 0 if no one shares a birthday, Z = 1 if exactly one pair of people have the same birthday, … E[Z] is difficult to compute from the definition of expectation.
82
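Ex: Birthday Paradox. Let X_(i,j) be an indicator random variable: X_(i,j) = 1 if i, j have the same birthday, X_(i,j) = 0 otherwise. Then $Z = \sum_{i<j} X_{(i,j)}$ and, with 365 equally likely birthdays, $E[X_{(i,j)}] = 1/365$, so by linearity of expectation $E[Z] = \binom{k}{2} \cdot \frac{1}{365}$. When k = 28, E[Z] > 1!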
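A quick numeric check of the birthday calculation, assuming 365 equally likely birthdays so that each pair matches with probability 1/365. The helper name `expected_shared_pairs` is illustrative.

```python
from math import comb


def expected_shared_pairs(k, days=365):
    """E[Z] = (number of pairs among k people) * Pr(a given pair shares a birthday)."""
    return comb(k, 2) / days


# Smallest k for which the expected number of matching pairs exceeds 1.
smallest_k = next(k for k in range(2, 400) if expected_shared_pairs(k) > 1)
print(smallest_k, expected_shared_pairs(28))
```

Linearity of expectation is what makes this a one-line computation: the pair indicators are not independent, yet their expectations still add.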
84
Back to Expected Runtime of QuickSelect Let Z be the runtime of QuickSelect Run time question is equivalent to: What’s E[Z]? Trick: Try to break Z into simpler random variables. 84
85
QuickSelect Execution. Recursion 1: work done = n. Recursion 2: work done = r_2. Recursion 3: work done = r_3. … Recursion k: work done = r_k. Recursion k+1: work done = r_{k+1}. … Termination: work done = 0. Total Work: sum of the work done across all recursions.
86
Phases of QuickSelect. Phase 1: calls when the array size is in (0.75n, n]. Phase 2: calls when the array size is in (0.75^2 n, 0.75n]. … Phase j: calls when the array size is in (0.75^j n, 0.75^{j-1} n]. … The last phase is phase log_{4/3} n, followed by termination.
87
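Expected Runtime In Terms of Phases. $Z = \sum_{j=1}^{\log_{4/3} n} X_j$, so by linearity of expectation $E[Z] = \sum_{j=1}^{\log_{4/3} n} E[X_j]$, where X_j is the work done in phase j. Can we bound E[X_j]?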
88
Bounding the Work Of Each Phase. Consider phase j. At each recursion during phase j, the work done is ≤ (3/4)^{j-1} n. Let Y_j be the # of recursive calls made during phase j, so that $X_j \le Y_j \cdot (3/4)^{j-1} n$. Let’s try to bound the expected number of recursive calls during phase j.
89
Bounding the # Calls Of Each Phase. Let’s say phase j starts with an array of size k, where (3/4)^j n < k ≤ (3/4)^{j-1} n (here k is the array size, not the rank). We are guaranteed to exit the phase when we cut k down to (3/4)k! Let e_1, e_2, …, e_k be our elements in increasing order. Observation: a phase is guaranteed to end when the pivot p is in [e_{k/4}, e_{3k/4}], irrespective of the rank of the item we’re searching for! [Figure: the sorted elements e_1, …, e_k with the middle block e_{k/4}, …, e_{3k/4} marked: if the pivot is from here, the phase is guaranteed to end.]
90
Bounding the Work Of Each Phase. Q: What’s the probability of picking p from [e_{k/4}, e_{3k/4}]? A: 50%. Expected # of picks to pick a central pivot is ≤ 2. Therefore, the expected # of recursions to end phase j: $E[Y_j] \le 2$.
91
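Final Calculations. $E[Z] = \sum_{j} E[X_j] \le \sum_{j} E[Y_j] \cdot (3/4)^{j-1} n \le 2n \sum_{j=1}^{\infty} (3/4)^{j-1} = 8n = O(n)$. Q.E.D.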
92
Summary: QuickSelect’s Runtime Analysis
1. Defined the r.v. Z as the run-time of QuickSelect
2. Broke the execution into log_{4/3} n phases, according to the sizes of the arrays in the recursions
3. Defined X_j as the runtime during phase j
4. Expressed Z as $Z = \sum_j X_j$
5. Bounded X_j: $X_j \le Y_j \cdot (3/4)^{j-1} n$, where Y_j is the # of recursions in phase j
6. Bounded E[Y_j] by 2
7. Using (5) and (6), bounded E[Z] by O(n).
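The linear bound can be sanity-checked empirically. The sketch below measures "work" as the total number of elements scanned over all recursive calls, averaged over many runs when selecting the median. Combining steps 5 and 6 gives a bound of the form 2n * (1 + 3/4 + (3/4)^2 + ...) = 8n, and the measured average should stay below it. The function names, the input `range(n)`, and the fixed seed are illustrative assumptions.

```python
import random


def quickselect_work(items, k, rng):
    """Return (k-th smallest element, total elements scanned over all calls)."""
    work = 0
    while True:
        work += len(items)  # each call scans its whole array once
        pivot = rng.choice(items)
        left = [x for x in items if x < pivot]
        right = [x for x in items if x > pivot]
        if len(left) == k - 1:
            return pivot, work
        if len(left) > k - 1:
            items = left
        else:
            items, k = right, k - 1 - len(left)


def average_work(n, trials, seed=0):
    """Average work of selecting the rank-(n // 2) element of range(n)."""
    rng = random.Random(seed)
    data = list(range(n))
    total = 0
    for _ in range(trials):
        _, work = quickselect_work(data, n // 2, rng)
        total += work
    return total / trials


n = 1000
print(average_work(n, trials=200), 8 * n)
```

This is only an experiment on one input size; the proof is what guarantees the bound for every input.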
94
Back To Sorting: QuickSort. Input: an array A of size n. Output: the elements of A in increasing order. Pick a pivot p from A uniformly at random. Partition A into A_L: those < p, and A_R: those > p. Sort A_L and A_R recursively. Output: [A_L, p, A_R].
95
QuickSort Pseudocode
procedure QuickSort(A):
  if |A| ≤ 1: return A
  pick a pivot element p from A uniformly at random
  for i from 1 to n:
    if A[i] < p: put A[i] into A_L
    else if A[i] > p: put A[i] into A_R
  return [QuickSort(A_L), p, QuickSort(A_R)]
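A runnable sketch of the pseudocode, again assuming distinct elements (duplicates of the pivot would be dropped by this partition). It copies into new lists as the slides describe, so it is not an in-place sort; the sample array is illustrative.

```python
import random


def quicksort(items):
    """Return a sorted copy of a list of distinct numbers."""
    if len(items) <= 1:
        return list(items)  # empty and one-element arrays are already sorted
    pivot = random.choice(items)  # uniformly random pivot
    left = [x for x in items if x < pivot]   # A_L
    right = [x for x in items if x > pivot]  # A_R
    return quicksort(left) + [pivot] + quicksort(right)


# Hypothetical input, chosen only for illustration.
data = [31, 7, 13, 8, 14, 1, 19, 11, 4, 10, 98, 16]
result = quicksort(data)
print(result)
```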
96
QuickSort Simulation. [Animation, slides 96-120; the array values are garbled in this transcript.] A pivot is picked, and every other element is compared with it and copied to its left (smaller) or right (larger); then recurse on both sides. Total work done at a recursive call with m elements: m-1 comparisons with the pivot + m copies = O(m). ** Run-time of QuickSort = O(# comparisons made) **
121
Analysis Roadmap
1. Let Z be the runtime of QuickSort, i.e. the # of comparisons made by QuickSort
2. Let X_(i,j) be the # of times (i, j) gets compared.
3. Express Z as: $Z = \sum_{i<j} X_{(i,j)}$
4. Solve E[X_(i,j)] and sum them up to solve E[Z].
122
Counting # Comparisons Made By QuickSort. Let X_(i,j) be the # of times (i, j) gets compared. Fix a particular recursive call. [Figure: an array, its partition around the pivot, and the resulting subarrays; values garbled in this transcript.] Observation 1: All comparisons are made against the pivot! Observation 2: After this call, the pivot will never be compared to anything else!
123
Counting # (i, j) Comparisons. Condition for an (i, j) comparison: i, j are compared only when 1. they are in the same recursive call together, and 2. one of them is the pivot. And they can never be compared again. Therefore X_(i,j) can only be 0 or 1 (i.e., it is an indicator r.v.).
124
E[X_(i,j)]. Recall: expected value of an indicator R.V.: E[X] = Pr(X = 1). Question: What’s Pr(X_(i,j) = 1)?
125
(i, j) Comparison Simulation (1). [Figure: the sorted elements e_1, e_2, …, e_i, …, e_j, …, e_m over recursive calls 1 to 3.] Recursive calls 1 and 2: the pivot is neither e_i nor e_j: no comparison. By recursive call 3, e_i and e_j are in different recursions, so they cannot be compared again. Total: 0 comparisons!
126
(i, j) Comparison Simulation (2). [Figure: the sorted elements e_1, e_2, …, e_i, …, e_j, …, e_m over recursive calls 1 to 3.] Recursive calls 1 and 2: no comparison. Recursive call 3: 1 (e_i, e_j) comparison (and only 1)! Observation: (e_i, e_j) is compared only if e_i or e_j is the first pivot to be picked among the [e_i, e_j] block.
127
Pr(X_(i,j) = 1) = probability that e_i or e_j is the first element to be picked amongst [e_i, e_{i+1}, …, e_j]: $\Pr(X_{(i,j)} = 1) = \frac{2}{j - i + 1}$
128
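Final Calculations. $E[Z] = \sum_{i<j} E[X_{(i,j)}] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}$. For fixed i: 1/2 + 1/3 + ... + 1/(n-i+1) ≤ ln(n). Therefore $E[Z] \le 2n \ln n = O(n \log n)$. Q.E.D.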
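The final sum can be evaluated exactly and compared both with the 2n ln(n) bound and with an experiment. The sketch below assumes the pair probability 2/(j-i+1) from the analysis; the function names, the input `range(n)`, and the fixed seed are illustrative. The counter follows the slides' accounting of m-1 comparisons per call with m elements.

```python
import math
import random
from fractions import Fraction


def expected_comparisons(n):
    """Exact E[Z] = sum over pairs i < j of 2 / (j - i + 1)."""
    # There are n - d pairs (i, j) with j - i = d, each contributing 2 / (d + 1).
    return sum(Fraction(2 * (n - d), d + 1) for d in range(1, n))


def count_comparisons(items, rng):
    """Comparisons charged to one run of randomized QuickSort on distinct items."""
    if len(items) <= 1:
        return 0
    pivot = rng.choice(items)
    left = [x for x in items if x < pivot]
    right = [x for x in items if x > pivot]
    # Charge m - 1 comparisons against the pivot for this call, then recurse.
    return len(items) - 1 + count_comparisons(left, rng) + count_comparisons(right, rng)


def average_comparisons(n, trials, seed=0):
    rng = random.Random(seed)
    data = list(range(n))
    return sum(count_comparisons(data, rng) for _ in range(trials)) / trials


n = 50
exact = float(expected_comparisons(n))
bound = 2 * n * math.log(n)
print(exact, bound, average_comparisons(n, trials=2000))
```

For n = 3 the exact value is 8/3, which can be verified by hand: the middle pivot costs 2 comparisons and an extreme pivot costs 3.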
129
On Monday: Min Cut and Max Cut