Correctness of Huffman Codes Introduction To Randomized Algorithms QuickSelect & QuickSort Monday, July 14th 1.

2 Outline For Today: 1. Correctness of Huffman Codes 2. QuickSelect 3. Probability Review 4. QuickSelect Runtime Analysis 5. QuickSort

4 Recap: Encoding-Decoding. Goal: Make the binary blob as small as possible while satisfying the protocol. [Figure: encoder → binary blob (e.g., 010010100010010100011…) → decoder]

5-8 Ex: Variable-Length Prefix-free Encoding. Ex: A = {a, b, c, d} with codewords a → 0, b → 10, c → 110, d → 111. Decoding 110010 from left to right: 110 → c, then 0 → a, then 10 → b, giving "cab".
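To make the left-to-right decoding concrete, here is a minimal Python sketch. The codeword table is the slide's example; the names `decode` and `encode` are illustrative, not from the lecture. It relies only on the prefix-free property: as soon as the buffered bits match a codeword, that match must be the right one.

```python
# Codeword table from the slide's example.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def decode(bits: str, code: dict) -> str:
    """Decode a bit string with a prefix-free code by scanning left to right.

    No codeword is a prefix of another, so the first codeword that matches
    the buffered bits is the only possible match.
    """
    lookup = {word: letter for letter, word in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in lookup:              # a complete codeword has been read
            out.append(lookup[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing bits do not form a codeword: {buf!r}")
    return "".join(out)


def encode(text: str, code: dict) -> str:
    """Concatenate the codewords of the letters in text."""
    return "".join(code[ch] for ch in text)


print(decode("110010", CODE))  # cab
```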

9 Recap: Prefix-free Encodings ↔ Binary Trees. We can represent each prefix-free code Ɣ as a binary tree T and vice versa. [Figure: Code 1 (a → 0, b → 10, c → 110, d → 111) drawn as a binary tree with edges labeled 0 and 1 and the letters at the leaves.] The encoding of letter x is the path from the root to the leaf containing x, so the # of bits for x is depth_T(x).

10 Recap: Formal Problem Statement. Input: An alphabet A and the frequencies f(x) of the letters x in A. Output: A binary tree T, whose leaves are the letters of A, that has the minimum average bit length (ABL): $\mathrm{ABL}(T) = \sum_{x \in A} f(x) \cdot \mathrm{depth}_T(x)$.
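A small Python sketch of the ABL objective, assuming the code is given as a letter → codeword table (a codeword's length equals its leaf's depth in T). The frequencies below are hypothetical, picked only to keep the arithmetic easy to follow.

```python
def abl(freq: dict, code: dict) -> float:
    """Average bit length: sum over letters x of f(x) * depth_T(x).

    The depth of a letter's leaf equals the length of its codeword.
    """
    return sum(f * len(code[x]) for x, f in freq.items())


CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
FREQ = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # hypothetical frequencies
print(abl(FREQ, CODE))  # 1.75
```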

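11 Recap: Observations About Optimal T. Observation 1: The optimal binary tree T is full, i.e., each non-leaf vertex u has exactly 2 children. [Figure: a tree T with a non-leaf vertex that has a single child, and the tree T` obtained by removing that vertex.] Why? If a non-leaf vertex u had only one child, we could replace u by that child: every leaf below u moves up one level, so T` is still a valid prefix-free code and its ABL is smaller (strictly, when those leaves have positive frequency).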

12 Recap: Observation 2 About Optimal T. Claim: In any optimal tree T, if leaf x has depth i and leaf y has depth j, s.t. i < j, then f(x) ≥ f(y). Exchange argument: If instead f(x) < f(y), swap x and y to get a tree T` with strictly smaller ABL, contradicting the optimality of T.
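Written out, the exchange step is a one-line calculation. Here T' stands for T` (T with x and y swapped), x starts at depth i, y at depth j, and every other leaf keeps its depth, so only the two swapped terms of the ABL change:

```latex
\begin{aligned}
\mathrm{ABL}(T') - \mathrm{ABL}(T)
  &= \bigl(f(x)\,j + f(y)\,i\bigr) - \bigl(f(x)\,i + f(y)\,j\bigr) \\
  &= (j - i)\,\bigl(f(x) - f(y)\bigr) \;<\; 0
  \qquad \text{whenever } i < j \text{ and } f(x) < f(y).
\end{aligned}
```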

13 Corollary: In any optimal tree T, the two lowest-frequency letters are both in the lowest level of the tree (up to swapping letters of equal frequency, which does not change the cost).

14 Recap: Huffman's Key Insight. Observation 1 ⇒ optimal Ts are full ⇒ each leaf has a sibling. Corollary ⇒ the 2 lowest-frequency letters x, y are at the same (lowest) level. Swapping letters within the same level does not change the cost of T. [Figure: the tree for Code 1.] Hence: There is an optimal tree T in which the two lowest-frequency letters are siblings (in the lowest level of the tree).

15 Possible Greedy Algorithm: 1. Since we may assume the two lowest-frequency letters x, y are siblings, treat them as a single meta-letter xy. 2. Find an optimal tree T* for the alphabet A - {x, y} + {xy}. 3. Expand xy back into x and y in T*.

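16 Possible Greedy Algorithm (Example). Ex: A = {x, y, z, t}, and let x, y be the two lowest-frequency letters. Let A` = {xy, z, t}. [Figure: an optimal tree T* for A` with leaves xy, z, t; replacing the leaf xy by an internal node with children x (branch 0) and y (branch 1) gives the tree T for A.]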

17 Huffman's Algorithm (1951)
procedure Huffman(A, f):
    if |A| = 2: return the tree T whose branches 0 and 1 point to A[0] and A[1], respectively
    let x, y be the two lowest-frequency letters
    let A` = A - {x, y} + {xy}
    let f` = f - {x, y} + {xy: f(x) + f(y)}
    T* = Huffman(A`, f`)
    expand xy in T* into an internal node with children x and y to get T
    return T
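A minimal Python sketch of the recursive procedure above. The tuple-based tree, the helpers `expand` and `codewords`, and the "+"-joined meta-letter names are illustrative assumptions, not part of the lecture. It re-sorts the alphabet on every call, so it runs in roughly O(|A|² log |A|) time rather than the O(|A| log |A|) asked for in the runtime exercise.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf is a letter; an internal node is (branch_0, branch_1)


def huffman(freq: dict) -> Tree:
    """Recursive Huffman following the pseudocode.

    Assumes at least 2 letters and that no letter contains "+", which is
    used here to name the (hypothetical) meta-letters.
    """
    if len(freq) < 2:
        raise ValueError("need at least 2 letters")
    if len(freq) == 2:
        a0, a1 = freq                          # branch 0 -> a0, branch 1 -> a1
        return (a0, a1)
    # two lowest-frequency letters; ties broken by name so output is deterministic
    x, y = sorted(freq, key=lambda c: (freq[c], c))[:2]
    xy = x + "+" + y                           # the meta-letter
    reduced = {c: f for c, f in freq.items() if c not in (x, y)}
    reduced[xy] = freq[x] + freq[y]            # f(xy) = f(x) + f(y)
    t_star = huffman(reduced)
    return expand(t_star, xy, (x, y))


def expand(tree: Tree, meta: str, node: tuple) -> Tree:
    """Replace the leaf `meta` in `tree` by the internal node `node`."""
    if isinstance(tree, str):
        return node if tree == meta else tree
    return (expand(tree[0], meta, node), expand(tree[1], meta, node))


def codewords(tree: Tree, prefix: str = "") -> dict:
    """Codeword of each letter = the 0/1 path from the root to its leaf."""
    if isinstance(tree, str):
        return {tree: prefix}
    return {**codewords(tree[0], prefix + "0"), **codewords(tree[1], prefix + "1")}


FREQ = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # hypothetical frequencies
print(codewords(huffman(FREQ)))  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```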

18 Huffman's Algorithm Correctness (1): By induction on |A|. Base case: |A| = 2 ⇒ return the simple full tree with 2 leaves, which is optimal (each letter gets 1 bit). IH: Assume Huffman is optimal for all alphabets of size k-1. Inductive step (|A| = k): Huffman gets an optimal tree T_{k-1}^{opt} for A`, which contains the meta-letter xy, and expands xy.

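19 Huffman's Algorithm Correctness (2). [Figure: T_{k-1}^{opt} with leaves xy, z, t, and T obtained by expanding xy into children x and y.] In T_{k-1}^{opt}, xy contributes f(xy) · depth(xy) = (f(x) + f(y)) · depth(xy); in T, x and y together contribute (f(x) + f(y)) · (depth(xy) + 1). Total difference: ABL(T) - ABL(T_{k-1}^{opt}) = f(x) + f(y).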

20 Huffman's Algorithm Correctness (3): Take any optimal tree Z; we'll argue ABL(T) ≤ ABL(Z). By the corollary, we can assume that x, y are also siblings at the lowest level of Z. Let Z` be the tree obtained by merging x and y into xy in Z ⇒ Z` is a valid prefix-free code for A`, which has size k-1.

21 Correctness (4). [Figure: Z with sibling leaves x and y, and Z` in which they are merged into the leaf xy.] The total difference is again f(x) + f(y): ABL(Z) = ABL(Z`) + f(x) + f(y) and ABL(T) = ABL(T`) + f(x) + f(y), where T` = T_{k-1}^{opt}. By the IH, ABL(T`) ≤ ABL(Z`) ⇒ ABL(T) ≤ ABL(Z). Q.E.D.

22 Huffman's Algorithm Runtime. Exercise: How can you make Huffman run in O(|A| log |A|) time?
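One common way to reach O(|A| log |A|), sketched here in Python, is to keep the current (meta-)letters in a binary min-heap keyed by frequency, so finding and merging the two lowest-frequency letters costs O(log |A|). Assumptions: letters are strings, an internal node is a (branch_0, branch_1) tuple, and `huffman_code` is an illustrative name. The loop performs |A| - 1 merges; writing out the codeword strings at the end adds time proportional to their total length.

```python
import heapq
from itertools import count


def huffman_code(freq: dict) -> dict:
    """Huffman's algorithm with a binary min-heap keyed by frequency.

    Heap entries are (frequency, tie_breaker, subtree); the unique tie_breaker
    means two subtrees are never compared directly.
    """
    if len(freq) < 2:
        raise ValueError("need at least 2 letters")
    tie = count()
    heap = [(f, next(tie), letter) for letter, f in freq.items()]
    heapq.heapify(heap)                                     # O(|A|)
    while len(heap) > 1:                                    # |A| - 1 merges
        fx, _, x = heapq.heappop(heap)                      # lowest frequency
        fy, _, y = heapq.heappop(heap)                      # second lowest
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))  # meta-letter xy
    root = heap[0][2]

    codes = {}
    stack = [(root, "")]
    while stack:                                            # read codewords off the tree
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


print(sorted(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}).items()))
# [('a', '0'), ('b', '10'), ('c', '110'), ('d', '111')]
```

The counter-based tie breaker is a design choice: without it, two entries with equal frequency would make `heapq` compare a string with a tuple and raise a `TypeError`.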

24 Randomized Algorithms. [Figure: Input → Randomized Algorithm (flips coins) → Output.] Given a fixed input, a randomized algorithm may give: 1. different outputs, 2. different runtimes, depending on the outcomes of the coins. Compared to their deterministic counterparts, randomized algorithms are often simpler, more practical, and more elegant.

25 Problem of Selection. Input: An array A of n integers and an integer 1 ≤ k ≤ n. Output: The rank-k element in A, i.e., the kth smallest element. If k = 1 ⇒ find the min; if k = n ⇒ find the max; if k = n/2 ⇒ find the median. Naïve solution: sort A and return the kth element ⇒ O(n log(n)). Can we do better? Maybe O(n)?

26 QuickSelect. Given A, k: Pick a pivot p from A uniformly at random. Partition the other elements of A into A_L: those < p, and A_R: those ≥ p. If p is the rank-k element, i.e., |A_L| = k-1, return p. Otherwise, recurse on A_L (if |A_L| > k-1) or on A_R (if |A_L| < k-1).

27-67 QuickSelect Simulation (animation, k = 6): A pivot is picked at random (here 8), and each remaining element is moved into A_L (smaller than 8) or A_R (larger than 8). A_L ends up with 3 elements, so 8 is the 4th smallest element and the 6th smallest element is to the right of 8. QuickSelect then recurses on A_R with k = 6 - 4 = 2: the new pivot is 16, the elements are again split around it, and the 2nd smallest element of A_R, 11, is returned.

68 QuickSelect Pseudocode
procedure QuickSelect(A, k):
    pick a pivot element p from A uniformly at random
    for each element a of A other than p:    // one copy of p is set aside
        if a < p: put a into A_L
        else: put a into A_R
    if |A_L| = k-1: return p    // p is the kth smallest element
    else if |A_L| > k-1: return QuickSelect(A_L, k)
    else: return QuickSelect(A_R, k-1-|A_L|)
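A runnable Python sketch of the pseudocode above; `quickselect`, `a_left` and `a_right` are illustrative names, and the input array is hypothetical. It is recursive for readability: the expected recursion depth is O(log n), but in the worst case it is linear, so it is not meant for very large inputs.

```python
import random


def quickselect(a: list, k: int):
    """Return the rank-k element of a (the k-th smallest, 1-indexed).

    Random pivot, partition of the remaining elements, recurse on one side.
    One copy of the pivot is set aside, so repeated values are handled.
    """
    if not 1 <= k <= len(a):
        raise ValueError("k must satisfy 1 <= k <= len(a)")
    p_idx = random.randrange(len(a))                    # pivot chosen uniformly at random
    p = a[p_idx]
    a_left = [x for i, x in enumerate(a) if i != p_idx and x < p]
    a_right = [x for i, x in enumerate(a) if i != p_idx and x >= p]
    if len(a_left) == k - 1:
        return p                                        # p is the k-th smallest element
    if len(a_left) > k - 1:
        return quickselect(a_left, k)
    return quickselect(a_right, k - 1 - len(a_left))    # skip A_L and p


A = [7, 13, 8, 14, 1, 19, 11, 4, 10, 16]  # hypothetical input
print(quickselect(A, 6))  # 11
```

The answer never depends on the coin flips, only the runtime does, which is why the result above is the same on every run.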

69 Correctness of QuickSelect. Informally: If p is the kth smallest element, we correctly return it. Otherwise, the kth smallest element is in either A_L or A_R. We pick the correct subarray according to the # of elements < p and update the rank k accordingly. This can be made formal with an inductive proof.

70 Runtime of QuickSelect. Assume we are selecting the median (rank n/2). Question 1: What's the best scenario? The first pivot is the median: O(n) runtime. Question 2: What's the worst scenario? We repeatedly pick the max or min element as the pivot, end up with about n-1 recursive calls, and the runtime is O(n²). The runtime of QuickSelect is fundamentally a probability question: How long does QuickSelect take on average, i.e., what is the expected runtime of QuickSelect?
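The best/worst-case contrast can be checked with a small Python experiment. It is a sketch under these assumptions: "work" is the sum of the array sizes over all calls (a call on r elements is charged r units), `max_pivot` is a hypothetical adversarial rule that always picks the largest element, and `random_pivot` is the uniform rule QuickSelect uses. The loop is iterative so the quadratic case cannot hit Python's recursion limit.

```python
import random
from typing import Callable, List, Tuple

PivotRule = Callable[[List[int]], int]  # returns the index of the chosen pivot


def select_with_work(a: List[int], k: int, choose_pivot: PivotRule) -> Tuple[int, int]:
    """Iterative QuickSelect returning (rank-k element, total work)."""
    work = 0
    while True:
        work += len(a)
        i = choose_pivot(a)
        p = a[i]
        left = [x for j, x in enumerate(a) if j != i and x < p]
        right = [x for j, x in enumerate(a) if j != i and x >= p]
        if len(left) == k - 1:
            return p, work                # p is the rank-k element
        if len(left) > k - 1:
            a = left                      # answer is among the smaller elements
        else:
            a, k = right, k - 1 - len(left)


def max_pivot(a: List[int]) -> int:
    """Hypothetical adversarial rule: always pivot on the largest element."""
    return max(range(len(a)), key=a.__getitem__)


def random_pivot(a: List[int]) -> int:
    """QuickSelect's rule: a uniformly random index."""
    return random.randrange(len(a))


# Worst case: median of 1..100 with max pivots costs 100 + 99 + ... + 50.
print(select_with_work(list(range(1, 101)), 50, max_pivot))  # (50, 3825)

# Average case over random pivots (fixed seed for repeatability).
random.seed(0)
n, trials = 1000, 50
data = random.sample(range(n), n)         # a shuffled copy of 0..n-1
avg_work = sum(select_with_work(data, n // 2, random_pivot)[1] for _ in range(trials)) / trials
print(avg_work / n)                       # a small constant, far below n
```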

71 Terminology Clarification: Worst-Case vs. Average-Case. Worst-case/average-case refer to assumptions about the input. Worst-case: under any input (or the worst input). Average-case: under an "average" input drawn from some distribution. Our analyses of randomized algorithms will be worst-case: no assumptions about the input distribution. Given any input (or the worst input), what's the average, i.e., expected, runtime of the algorithm over its own coin flips?

73 Sample Space Ω. Definition (Sample Space Ω): The set of all possible outcomes. Ex 1: Rolling two dice: Ω = {(1,1), (1,2), …, (6,6)}. Ex 2: QuickSelect: Ω = all possible sequences of pivot picks: Ω = {(kth), (nth, kth), …, (nth, (n-1)st, 1st, kth)}. Each outcome i ∈ Ω has a probability p(i) ≥ 0.

74 Event S ⊆ Ω. Definition (Event S ⊆ Ω): A set of outcomes from Ω. Ex: Rolling a 10 or more with 2 dice: S = {(4,6), (6,4), (5,5), (5,6), (6,5), (6,6)}. Ex: Picking the kth element as pivot in at most 2 picks: S = {(kth), (nth, kth), ((n-1)st, kth), …, (1st, kth)}. The probability of an event S is $\Pr(S) = \sum_{i \in S} p(i)$; e.g., Pr(rolling a 10 or more) = 6/36.
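The dice example can be checked by brute-force enumeration. This Python sketch assumes two fair, distinguishable dice, so each ordered outcome has p(i) = 1/36; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair dice, each with p(i) = 1/36.
omega = list(product(range(1, 7), repeat=2))
p = {outcome: Fraction(1, 36) for outcome in omega}

# Event S: the two dice sum to 10 or more; Pr(S) is the sum of p(i) over i in S.
S = [o for o in omega if sum(o) >= 10]
print(sorted(S))             # [(4, 6), (5, 5), (5, 6), (6, 4), (6, 5), (6, 6)]
print(sum(p[o] for o in S))  # 1/6
```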

75 Random Variable. Definition (Random Variable): A function X: Ω → ℝ (the real numbers). Ex: X = the sum of the dice, on Ω = {(1,1), (1,2), …, (6,6)}: X(1,1) = 2, X(1,2) = 3, …, X(6,6) = 12. Ex: Y = the runtime of QuickSelect, on Ω = {(kth), (nth, kth), …, (nth, (n-1)st, …, kth)}: Y(kth) ≈ n, Y(nth, kth) ≈ 2n, …, Y(nth, (n-1)st, …, kth) ≈ n².

76 Indicator Random Variable. Definition (Indicator Random Variable): A random variable X: Ω → {0, 1}. With probability p, X = 1; with probability 1-p, X = 0. Examples later.

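77 Expectation. Definition (Expectation E[X]): The average value of X: $E[X] = \sum_{i \in \Omega} X(i) \cdot p(i)$, where X(i) is the value of X under outcome i and p(i) is the probability of i. Equivalently, assuming X takes non-negative integer values: $E[X] = \sum_{j=0}^{\infty} j \cdot \Pr(X = j)$.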

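78 Expectation Examples. 1. Let X be the sum of 2 dice: E[X] = 7. 2. Let Y be an indicator random variable: Y = 1 with prob. p, Y = 0 with prob. 1-p: E[Y] = 1 · p + 0 · (1-p) = p. 3. Assume we have a coin that comes up heads with prob. p. Let Z be the # of times we have to flip the coin to get a head: $E[Z] = \sum_{j \ge 1} j (1-p)^{j-1} p = 1/p$, where $\Pr(Z = j) = (1-p)^{j-1} p$ is due to the independence of successive coin flips.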
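A Python sketch that reproduces the three examples: exact enumeration for the dice, the indicator formula, and a truncated version of the geometric series for the coin. The truncation length `terms` is an arbitrary illustrative choice; the tail it drops is negligible for moderate p.

```python
from fractions import Fraction
from itertools import product

# 1. X = sum of two fair dice: E[X] = sum over outcomes i of X(i) * p(i), p(i) = 1/36.
e_x = sum((Fraction(a + b, 36) for a, b in product(range(1, 7), repeat=2)), Fraction(0))
print(e_x)  # 7


# 2. Indicator Y with Pr(Y = 1) = p: E[Y] = 1 * p + 0 * (1 - p).
def e_indicator(p: Fraction) -> Fraction:
    return 1 * p + 0 * (1 - p)


# 3. Z = # flips until the first head, Pr(Z = j) = (1 - p)^(j - 1) * p.
#    The series sum_j j * Pr(Z = j) is cut off after `terms` terms.
def e_flips_truncated(p: float, terms: int = 10_000) -> float:
    return sum(j * (1 - p) ** (j - 1) * p for j in range(1, terms + 1))


print(e_flips_truncated(0.25))  # close to 1 / 0.25 = 4
```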

79 Facts About Expectation

80 **Linearity of Expectation**: Let $Z = \sum_j X_j$. Then $E[Z] = \sum_j E[X_j]$, even if the X_j depend on each other (i.e., are not independent). This is extremely useful when trying to understand the average value of a complicated random variable Z: we express Z as a sum of simpler random variables.
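A small exact check in Python that linearity does not need independence. The choice of variables is hypothetical: X1 is the first die and X2 is the larger of the two dice, which clearly depends on X1; the helper `expect` just applies the definition of E[X] over the 36 equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product(range(1, 7), repeat=2))   # two fair dice, p(i) = 1/36 each


def expect(X) -> Fraction:
    """E[X] = sum over outcomes i of X(i) * p(i)."""
    return sum((Fraction(X(i), 36) for i in OMEGA), Fraction(0))


def first(o):        # X1: value of the first die
    return o[0]


def larger(o):       # X2: larger of the two dice, dependent on X1
    return max(o)


def total(o):        # Z = X1 + X2
    return first(o) + larger(o)


print(expect(total), expect(first) + expect(larger))  # 287/36 287/36
```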

81 Ex: Birthday Paradox. If there are k people in a room, on average, how many pairs of people have the same birthday? Let Z be the # of pairs of people with the same birthday: Z = 0 if no one shares a birthday, Z = 1 if exactly one pair of people have the same birthday, … E[Z] is difficult to compute from the definition of expectation.

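82 Ex: Birthday Paradox (cont.). Let X_(i,j) be an indicator random variable: X_(i,j) = 1 if i and j have the same birthday, and X_(i,j) = 0 otherwise. Then $Z = \sum_{i<j} X_{(i,j)}$, and, assuming 365 equally likely and independent birthdays, linearity of expectation gives $E[Z] = \sum_{i<j} \Pr(X_{(i,j)} = 1) = \binom{k}{2} \cdot \frac{1}{365}$. When k = 28, E[Z] = 378/365 > 1!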
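The threshold can be found with a few lines of Python, under the same assumption of independent birthdays uniform over 365 days; `expected_shared_pairs` is an illustrative name.

```python
from math import comb


def expected_shared_pairs(k: int, days: int = 365) -> float:
    """E[Z] = sum over pairs i < j of Pr(X_(i,j) = 1) = C(k, 2) / days."""
    return comb(k, 2) / days


# smallest k for which more than one shared pair is expected
k = 1
while expected_shared_pairs(k) <= 1:
    k += 1
print(k, expected_shared_pairs(k))  # 28, about 1.036
```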

84 Back to the Expected Runtime of QuickSelect. Let Z be the runtime of QuickSelect. The runtime question is then equivalent to: What's E[Z]? Trick: Try to break Z into simpler random variables.

85 QuickSelect Execution. Recursion 1: work done = n. Recursion 2: work done = r_2. Recursion 3: work done = r_3. … Recursion k: work done = r_k. Recursion k+1: work done = r_{k+1}. … After termination: work done = 0. (Here r_i is the size of the array in recursion i.) Total work: the sum of the work done across all recursions.

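86 Phases of QuickSelect. [Figure: the sequence of recursions, grouped into phases, ending in termination.] Phase 1: calls when the array size is in ((3/4)n, n]. Phase 2: calls when the array size is in ((3/4)^2 n, (3/4)n]. … Phase j: calls when the array size is in ((3/4)^j n, (3/4)^{j-1} n]. … Phase log_{4/3} n: the last phase.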

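87 Expected Runtime in Terms of Phases: $Z = \sum_{j=1}^{\log_{4/3} n} X_j$, so by linearity of expectation $E[Z] = \sum_{j=1}^{\log_{4/3} n} E[X_j]$, where X_j is the work done in phase j. Can we bound E[X_j]?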

88 Bounding the Work of Each Phase. Consider phase j. At each recursion during phase j, the work done is ≤ (3/4)^{j-1} n. Let Y_j be the # of recursive calls made during phase j, so X_j ≤ Y_j · (3/4)^{j-1} n. Let's try to bound the expected number of recursive calls during phase j.

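89 Bounding the # Calls of Each Phase. Say phase j starts with an array of k elements, where (3/4)^j n < k ≤ (3/4)^{j-1} n (here k denotes the array size, not the rank we are searching for). We are guaranteed to exit the phase once the array shrinks to at most (3/4)k elements. Let e_1, e_2, …, e_k be our elements in increasing order: [Figure: e_1 e_2 … e_{k/4} … e_{3k/4} … e_{k-1} e_k, with the middle block e_{k/4}, …, e_{3k/4} marked "if pivot is from here, phase is guaranteed to end".] Observation: The phase is guaranteed to end when the pivot p is in [e_{k/4}, e_{3k/4}], irrespective of the rank of the item we're searching for, since then both A_L and A_R have at most (3/4)k elements.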

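90 Bounding the Work of Each Phase. [Figure: e_1 e_2 … e_{k/4} … e_{3k/4} … e_{k-1} e_k.] Q: What's the probability of picking p from [e_{k/4}, e_{3k/4}]? A: (At least) 50%, since this block contains about half of the k elements. So, as in the coin-flip example with heads probability 1/2, the expected # of picks to get such a central pivot is ≤ 2. Therefore, the expected # of recursions to end phase j is E[Y_j] ≤ 2.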

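91 Final Calculations: $E[Z] = \sum_j E[X_j] \le \sum_j E[Y_j] \cdot (3/4)^{j-1} n \le 2n \sum_{j \ge 1} (3/4)^{j-1} = 2n \cdot 4 = 8n = O(n)$. Q.E.D.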

92 Summary: QuickSelect's Runtime Analysis. 1. Defined the r.v. Z as the runtime of QuickSelect. 2. Broke the execution into log_{4/3} n phases, according to the sizes of the arrays in the recursions. 3. Defined X_j as the runtime during phase j. 4. Expressed Z as $Z = \sum_j X_j$. 5. Bounded X_j: X_j ≤ Y_j · (3/4)^{j-1} n, where Y_j is the # of recursions in phase j. 6. Bounded E[Y_j] by 2. 7. Using (5) and (6), bounded E[Z] by O(n).

94 Back to Sorting: QuickSort. Input: An array A of size n. Output: The elements of A in increasing order. Pick a pivot p from A uniformly at random. Partition the other elements of A into A_L: those < p, and A_R: those ≥ p. Sort A_L and A_R recursively. Output: [A_L, p, A_R].

95 QuickSort Pseudocode
procedure QuickSort(A):
    if |A| ≤ 1: return A
    pick a pivot element p from A uniformly at random
    for each element a of A other than p:    // one copy of p is set aside
        if a < p: put a into A_L
        else: put a into A_R
    return [QuickSort(A_L), p, QuickSort(A_R)]
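A runnable Python sketch of the QuickSort pseudocode above; `quicksort` is an illustrative name and the sample input is hypothetical. It returns a new list rather than sorting in place. The recursion depth is O(log n) in expectation but can be linear (e.g., with many repeated values), so it is meant for illustration, not very large inputs.

```python
import random


def quicksort(a: list) -> list:
    """Randomized QuickSort following the pseudocode; returns a new sorted list.

    One copy of the pivot is set aside, so repeated values are kept.
    """
    if len(a) <= 1:
        return list(a)
    p_idx = random.randrange(len(a))          # pivot chosen uniformly at random
    p = a[p_idx]
    a_left = [x for i, x in enumerate(a) if i != p_idx and x < p]
    a_right = [x for i, x in enumerate(a) if i != p_idx and x >= p]
    return quicksort(a_left) + [p] + quicksort(a_right)


print(quicksort([7, 13, 8, 14, 1, 19, 11, 4, 10, 16]))
# [1, 4, 7, 8, 10, 11, 13, 14, 16, 19]
```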

96-120 QuickSort Simulation (animation): As in the QuickSelect simulation, a pivot is picked at random (here 8), and every other element is compared with it and moved into A_L (smaller than 8) or A_R (larger than 8); QuickSort then recurses on both A_L and A_R. Total work done at a recursive call with m elements: m-1 comparisons with the pivot + m copies = O(m). **Run-time of QuickSort = O(# comparisons made)**

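121 Analysis Roadmap: 1. Let Z be the runtime of QuickSort, i.e., the # of comparisons made by QuickSort. 2. Let e_1 < e_2 < … < e_n be the elements in increasing order, and let X_(i,j) be the # of times the pair (e_i, e_j) gets compared. 3. Express Z as $Z = \sum_{i<j} X_{(i,j)}$. 4. Solve for E[X_(i,j)] and sum them up (by linearity of expectation) to get E[Z].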

122 Counting # Comparisons Made By QuickSort. Let X_(i,j) be the # of times (i, j) gets compared. Fix a particular recursive call: [Figure: the array, and its partition around the pivot 8 into A_L and A_R.] Observation 1: All comparisons are made against the pivot! Observation 2: After this call, the pivot is never compared to anything else again, since it is excluded from both A_L and A_R!

123 Counting # (i, j) Comparisons. Condition for an (i, j) comparison: e_i and e_j are compared only when 1. they are in the same recursive call together, and 2. one of them is the pivot. After that, they can never be compared again. Therefore X_(i,j) can only be 0 or 1 (i.e., it is an indicator r.v.).

124 E[X_(i,j)]. Recall: the expected value of an indicator r.v. is E[X] = Pr(X = 1). Question: What's Pr(X_(i,j) = 1)?

125 (i, j) Comparison Simulation (1). [Figure: e_1 e_2 … e_i … e_j … e_{m-1} e_m across recursive calls 1-3.] In the first recursive calls, the pivot is neither e_i nor e_j ⇒ no comparison. Once a pivot strictly between e_i and e_j is picked, i and j end up in different recursions and cannot be compared again. Total: 0 comparisons!

126 (i, j) Comparison Simulation (2). [Figure: e_1 e_2 … e_i … e_j … e_{m-1} e_m across recursive calls 1-3.] Recursive calls 1 and 2: the pivot lies outside [e_i, e_j] ⇒ no comparison, and e_i, e_j stay in the same subarray. Recursive call 3: the pivot is e_i (or e_j) ⇒ 1 (e_i, e_j) comparison (and only 1)! Observation: (e_i, e_j) is compared if and only if e_i or e_j is the first pivot picked among the block [e_i, …, e_j].

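127 Pr(X_(i,j) = 1) = the probability that e_i or e_j is the first element picked as a pivot among [e_i, e_{i+1}, …, e_j]. Each of these j - i + 1 elements is equally likely to be picked first, so $\Pr(X_{(i,j)} = 1) = \frac{2}{j - i + 1}$.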
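The formula can be checked with a Monte Carlo sketch in Python, under these assumptions: the input is the distinct values 1..n (so e_i = i), `compared_pairs` is an illustrative instrumented QuickSort that records which pairs meet, and n = 10 with 20,000 trials is an arbitrary choice that keeps the run short.

```python
import random
from collections import Counter


def compared_pairs(a: list) -> set:
    """Run randomized QuickSort on distinct values; return the set of compared pairs.

    Uses an explicit stack instead of recursion.
    """
    pairs = set()
    stack = [a]
    while stack:
        arr = stack.pop()
        if len(arr) <= 1:
            continue
        p = random.choice(arr)                    # uniform random pivot
        for x in arr:
            if x != p:                            # every other element is compared with p
                pairs.add((min(x, p), max(x, p)))
        stack.append([x for x in arr if x < p])
        stack.append([x for x in arr if x > p])
    return pairs


random.seed(1)
n, trials = 10, 20_000
counts = Counter()
for _ in range(trials):
    counts.update(compared_pairs(list(range(1, n + 1))))  # e_i = i

for i, j in [(1, 2), (3, 7), (1, 10)]:
    print((i, j), counts[(i, j)] / trials, "vs", 2 / (j - i + 1))
```

Neighbouring elements (j = i + 1) have nothing between them, so they are always compared, matching 2/(j - i + 1) = 1.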

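128 Final Calculations: $E[Z] = \sum_{i<j} E[X_{(i,j)}] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}$. For fixed i, the inner sum is 2 · (1/2 + 1/3 + … + 1/(n-i+1)) ≤ 2 ln(n), so $E[Z] \le 2n \ln n = O(n \log n)$. Q.E.D.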
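A Python sketch that evaluates the double sum exactly with fractions and compares it with the 2n ln(n) bound. The closed form 2(n+1)H_n - 4n (H_n the nth harmonic number) is a known identity for this sum that is not stated in the lecture; the code only uses it as a cross-check.

```python
from fractions import Fraction
from math import log


def expected_comparisons(n: int) -> Fraction:
    """Exact E[Z] = sum over 1 <= i < j <= n of 2 / (j - i + 1)."""
    return sum(
        (Fraction(2, j - i + 1) for i in range(1, n + 1) for j in range(i + 1, n + 1)),
        Fraction(0),
    )


def closed_form(n: int) -> Fraction:
    """The same sum in closed form: 2(n + 1)H_n - 4n, with H_n = 1 + 1/2 + ... + 1/n."""
    h_n = sum((Fraction(1, d) for d in range(1, n + 1)), Fraction(0))
    return 2 * (n + 1) * h_n - 4 * n


for n in (10, 50, 100):
    print(n, float(expected_comparisons(n)), 2 * n * log(n))
```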

129 On Monday: Min Cut and Max Cut

