Lecture 2: Greedy Algorithms II Shang-Hua Teng

Optimization Problems: A problem that may have many feasible solutions, each with a value. In a maximization problem, we wish to find a solution that maximizes the value; in a minimization problem, we wish to find a solution that minimizes the value.

Data Compression: Suppose we have a 1G-character data file that we wish to include in an email. Suppose the file contains only the 26 letters {a,…,z}, and each letter α in {a,…,z} occurs with frequency f_α. Suppose we encode each letter by a binary code. If we use a fixed-length code, we need 5 bits for each character, and the resulting message length is 5(f_a + f_b + … + f_z) bits. Can we do better?

Huffman Codes: Most character code systems (ASCII, Unicode) use fixed-length encoding. If frequency data is available and there is a wide variety of frequencies, variable-length encoding can save 20% to 90% of the space. Which characters should we assign shorter codes, and which characters will have longer codes?

Data Compression: A Smaller Example. Suppose the file has only 6 letters {a,b,c,d,e,f}, with frequencies (in %) a:45, b:13, c:12, d:16, e:9, f:5. With a fixed-length code we need 3 bits per character, so 3G bits for the whole file; with a variable-length code tailored to the frequencies we can do better, as computed below.
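As a sanity check on these totals, here is a small Python sketch; the frequencies and the variable-length code are the ones implied by the B(T) computation later in this lecture, counted per 100 characters (multiply by 10^7 for the 1G file):

# Frequencies (percent) and the variable-length code used later in the lecture.
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

fixed = 3 * sum(freq.values())                        # 3 bits per character
variable = sum(freq[c] * len(code[c]) for c in freq)  # frequency-weighted code lengths
print(fixed, variable)                                # 300 vs. 224 bits per 100 characters

So the variable-length code uses about 2.24G bits instead of 3G, a savings of roughly 25%.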

How to decode? At first it is not obvious how decoding will work, but it is possible if we use prefix codes.

Prefix Codes: No encoding of a character can be a prefix of the longer encoding of another character. For example, if we encode t as 01, then no other character’s code may begin with 01, since 01 would be a prefix of it. By using a binary tree representation we will generate prefix codes, provided all letters are leaves.

Prefix codes: A message can be decoded uniquely. Follow the tree until reaching a leaf, output that letter, and then repeat! Draw a few more trees and produce their codes!

Some Properties: Prefix codes allow easy decoding. Given a: 0, b: 101, c: 100, d: 111, e: 1101, f: 1100, decode 001011101 going left to right: 0|01011101, a|0|1011101, a|a|101|1101, a|a|b|1101, a|a|b|e. An optimal code must be a full binary tree (a tree where every internal node has two children). For |C| leaves there are |C|-1 internal nodes. The number of bits to encode a file is B(T) = Σ_{c in C} f(c)·d_T(c), where f(c) is the frequency of c and d_T(c) is the depth of c in the tree, which corresponds to the code length of c.
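Since no codeword is a prefix of another, a left-to-right scan can decode greedily. A minimal Python sketch using the code table above (scanning a bit string is equivalent to walking the code tree):

# Decode a bit string under a prefix code: accumulate bits until they
# match a codeword; prefix-freeness makes the first match unambiguous.
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
letter = {bits: ch for ch, bits in code.items()}

def decode(bitstring):
    out, cur = [], ''
    for b in bitstring:
        cur += b
        if cur in letter:          # reached a leaf of the code tree
            out.append(letter[cur])
            cur = ''
    return ''.join(out)

print(decode('001011101'))         # -> 'aabe', as in the slide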

Optimal Prefix Coding Problem. Input: a set of n letters (c_1, …, c_n) with frequencies (f_1, …, f_n). Construct a full binary tree T to define a prefix code that minimizes the average code length B(T) = Σ_i f_i·d_T(c_i).

Greedy Algorithms: Many optimization problems can be solved more quickly using a greedy approach. The basic principle is that locally optimal decisions may be used to build an optimal solution. But the greedy approach may not always lead to an overall optimal solution for all problems. The key is knowing which problems will work with this approach and which will not. We will study the problem of generating Huffman codes.

Greedy algorithms: A greedy algorithm always makes the choice that looks best at the moment. My everyday examples: driving in Los Angeles (or even Boston, for that matter), playing cards, investing in stocks, choosing a university. The hope: a locally optimal choice will lead to a globally optimal solution. For some problems, it works; greedy algorithms also tend to be easier to code.

David Huffman’s idea, from a term paper at MIT: build the tree (code) bottom-up in a greedy fashion. (Huffman was also an origami aficionado.)

Building the Encoding Tree
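The tree figure itself did not survive the transcript. Assuming the example frequencies used later in the lecture (a:45, b:13, c:12, d:16, e:9, f:5), the greedy bottom-up construction repeatedly merges the two least frequent subtrees:

merge f:5 and e:9 into a node of weight 14
merge c:12 and b:13 into a node of weight 25
merge 14 and d:16 into a node of weight 30
merge 25 and 30 into a node of weight 55
merge a:45 and 55 into the root of weight 100

This yields depths a:1, b:3, c:3, d:3, e:4, f:4, matching the code lengths in the decoding example above.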

The Algorithm: An appropriate data structure is a binary min-heap. Rebuilding the heap takes O(lg n) time, and n-1 extractions are made, so the complexity is O(n lg n). The encoding is NOT unique; other encodings may work just as well, but none will work better.
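A compact sketch of this construction in Python, using heapq as the binary min-heap (the node layout and names are illustrative, not from the slides):

import heapq, itertools

def huffman(freq):
    # Build a Huffman code greedily bottom-up; returns {letter: codeword}.
    tie = itertools.count()            # tie-breaker so tuples never compare tree nodes
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:               # n - 1 merges of the two cheapest subtrees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):    # internal node: 0 to the left, 1 to the right
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:                          # leaf: the codeword is the root-to-leaf path
            codes[node] = prefix or '0'
    walk(heap[0][2], '')
    return codes

print(huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))

Each heap operation costs O(lg n) and there are n - 1 merges, giving the O(n lg n) bound. The exact 0/1 labels may differ from the slides’ code, which is fine: the encoding is not unique.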

Correctness of Huffman’s Algorithm. Lemma A: Let x and y be two letters of lowest frequency in C. Then there exists an optimal prefix tree in which x and y appear as sibling leaves of maximum depth. (Proof idea: take any optimal tree and swap x and y into the two deepest sibling leaf positions; since each swap does not increase the cost, the resulting tree T’’ is also an optimal tree.)

Proof of Lemma A: Let T be an optimal tree and let a and b be sibling leaves of maximum depth in T. Without loss of generality, assume f[a] ≤ f[b] and f[x] ≤ f[y]; since x and y have the lowest frequencies, f[x] ≤ f[a] and f[y] ≤ f[b]. Let T’ be T with x and a swapped, and let T’’ be T’ with y and b swapped. The cost difference between T and T’ is B(T) - B(T’) = (f[a] - f[x])(d_T(a) - d_T(x)) ≥ 0, and similarly B(T’) - B(T’’) ≥ 0, so B(T’’) ≤ B(T). But T is optimal, so B(T) ≤ B(T’’), hence B(T’’) = B(T). Therefore T’’ is an optimal tree in which x and y appear as sibling leaves of maximum depth.

Correctness of Huffman’s Algorithm. Lemma B: Let x and y be two letters of lowest frequency in C, let C’ = C - {x, y} ∪ {z} where z is a new letter with f[z] = f[x] + f[y], and let T’ be an optimal prefix tree for C’. Then the tree T obtained from T’ by replacing the leaf z with an internal node having x and y as children is an optimal prefix tree for C. Observation: B(T) = B(T’) + f[x] + f[y], i.e., B(T’) = B(T) - f[x] - f[y]. Indeed, for each c in C - {x, y}, d_T(c) = d_T’(c), so f[c]·d_T(c) = f[c]·d_T’(c); and d_T(x) = d_T(y) = d_T’(z) + 1, so f[x]·d_T(x) + f[y]·d_T(y) = (f[x] + f[y])(d_T’(z) + 1) = f[z]·d_T’(z) + (f[x] + f[y]).

Example of the observation B(T’) = B(T) - f[x] - f[y]: with the frequencies above, B(T) = 45*1 + 12*3 + 13*3 + 5*4 + 9*4 + 16*3 = 224. Merging the two least frequent letters (f and e) into z with f[z] = 5 + 9 = 14 gives B(T’) = 45*1 + 12*3 + 13*3 + (5+9)*3 + 16*3 = 210 = B(T) - 14.

Proof of Lemma B: Prove by contradiction. Suppose that T does not represent an optimal prefix code for C. Then there exists a tree T’’ such that B(T’’) < B(T). Without loss of generality (by Lemma A), T’’ has x and y as siblings. Let T’’’ be the tree T’’ with the common parent of x and y replaced by a leaf z with frequency f[z] = f[x] + f[y]. Then B(T’’’) = B(T’’) - f[x] - f[y] < B(T) - f[x] - f[y] = B(T’). So T’’’ is better than T’, a contradiction to the assumption that T’ is an optimal prefix code for C’.

How did I learn about Huffman code? I was taking the Information Theory class at USC from Professor Irving Reed (of Reed-Solomon code fame). I was TAing for CSCI 303. I taught a lecture on “Huffman Code” for Professor Miller. I wrote a paper.

Stable Matching (Marriage) Problem, Continued. Shang-Hua Teng

An example instance with five universities and five applicants; each list ranks the other side, most preferred first:
Univ. 1: 3,5,2,1,4   Appl. 1: 3,2,5,1,4
Univ. 2: 5,2,1,4,3   Appl. 2: 1,2,5,3,4
Univ. 3: 4,3,5,1,2   Appl. 3: 4,3,2,1,5
Univ. 4: 1,2,3,4,5   Appl. 4: 1,3,4,2,5
Univ. 5: 2,3,4,1,5   Appl. 5: 1,2,4,5,3

Stability: “Mutually Cheating Hearts.” Suppose we pair off all the universities and candidates. Now suppose that some university and some candidate prefer each other to their assigned partners. They will be called a pair of MCHs, or a blocking pair.
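This definition translates directly into a checker. A Python sketch (function and variable names are hypothetical) that hunts for a blocking pair in a proposed matching:

def find_blocking_pair(match, univ_pref, appl_pref):
    # match[u] = applicant assigned to university u; univ_pref[u] and
    # appl_pref[a] rank the other side, most preferred first.
    univ_of = {a: u for u, a in match.items()}
    for u, prefs in univ_pref.items():
        # every applicant u strictly prefers to its current match ...
        for a in prefs[:prefs.index(match[u])]:
            # ... who also strictly prefers u to her/his current university
            if appl_pref[a].index(u) < appl_pref[a].index(univ_of[a]):
                return (u, a)          # a pair of MCHs: the pairing is unstable
    return None                        # no blocking pair: the pairing is stable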

Gale-Shapley Algorithm: Each day, each university that still has an opening does the following. Morning: make an offer to the best candidate to whom it has not yet made an offer. Afternoon (for those candidates with at least one offer): say to today’s best suitor, “Maybe, but for now I will keep your offer,” and to any others, “Not me, but good luck.” Evening: each rejected university gets ready to make an offer to the next candidate on its list. The day WILL COME when no university is rejected, and then each candidate accepts the last university to whom she/he said “maybe.”
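A direct Python sketch of this day-by-day process, using the same data layout as the checker above (each pass through the loop is one university’s morning offer and the candidate’s afternoon reply):

def gale_shapley(univ_pref, appl_pref):
    next_idx = {u: 0 for u in univ_pref}   # next candidate each university will try
    holds = {}                             # candidate -> university whose offer she/he holds
    free = list(univ_pref)                 # universities that still have an opening
    while free:
        u = free.pop()
        a = univ_pref[u][next_idx[u]]      # best candidate not yet offered to
        next_idx[u] += 1
        if a not in holds:
            holds[a] = u                   # "maybe, but for now I will keep your offer"
        elif appl_pref[a].index(u) < appl_pref[a].index(holds[a]):
            free.append(holds[a])          # the previous suitor is rejected
            holds[a] = u
        else:
            free.append(u)                 # "not me, but good luck"
    return {u: a for a, u in holds.items()}

Feeding its output to find_blocking_pair from the sketch above should return None on any instance; that is exactly the stability theorem proved on the next slides.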

Improvement Lemma: If a candidate has an offer, then she/he will always have an offer from now on. Corollary: Each candidate will accept her/his absolute best offer received during the process. [Advantage to candidates?] Corollary: No university can be rejected by all the candidates. Corollary: A matching is achieved by Gale-Shapley.

Theorem: The pairing produced by Gale-Shapley is stable. Proof by contradiction: suppose USC and Adleman are a pair of MCHs. This means USC prefers Adleman to the candidate it matched with, say Lee. Thus, USC offered to Adleman before it offered to Lee. Adleman must have rejected USC for another university he preferred. By the Improvement Lemma, he must like his current university, say UCLA, more than USC. Contradiction!

Each candidate will accept her/his absolute best offer received during the process [power to reject: advantage to candidates?], but each university has the power to determine when and to whom to make an offer [power to choose: advantage to universities?].

Opinion Poll: Who is better off in the Gale-Shapley algorithm, the universities or the candidates?

How should we define what we mean when we say “the optimal candidate for USC”? Flawed Attempt: “The candidate at the top of USC’s list”

The Optimal Match: A university’s optimal match is the highest-ranked candidate for whom there is some stable pairing in which they are matched. That candidate is the best candidate the university can conceivably be matched with in a stable world. Presumably, that candidate might be better than the candidate it gets matched with in the stable pairing output by Gale-Shapley.

The Pessimal Match: A university’s pessimal match is the lowest-ranked candidate for whom there is some stable pairing in which the university is matched to that candidate. That candidate is the lowest-ranked candidate the university could conceivably be matched with in a stable world.
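For small instances, optimal and pessimal matches can be computed by brute force, reusing find_blocking_pair from the stability sketch above (exponential in n, for illustration only):

from itertools import permutations

def stable_pairings(univ_pref, appl_pref):
    # Yield every stable pairing {university: applicant}, by exhaustion.
    univs = list(univ_pref)
    for perm in permutations(list(appl_pref)):
        match = dict(zip(univs, perm))
        if find_blocking_pair(match, univ_pref, appl_pref) is None:
            yield match

def optimal_and_pessimal(u, univ_pref, appl_pref):
    # Candidates u is matched with across ALL stable pairings.
    partners = {m[u] for m in stable_pairings(univ_pref, appl_pref)}
    rank = univ_pref[u].index
    return min(partners, key=rank), max(partners, key=rank)

Comparing gale_shapley(univ_pref, appl_pref)[u] against the first component for every university u is one way to check, on small examples, the theorem stated below.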

Dilemmas: power to reject or power to choose. A pairing is university-optimal if every university gets its optimal candidate. This is the best of all possible stable worlds for every university simultaneously. A pairing is university-pessimal if every university gets its pessimal candidate. This is the worst of all possible stable worlds for every university simultaneously.

Dilemmas: power to reject or power to choose. A pairing is candidate-optimal if every candidate gets her/his optimal job. This is the best of all possible stable worlds for every candidate simultaneously. A pairing is candidate-pessimal if every candidate gets her/his pessimal job. This is the worst of all possible stable worlds for every candidate simultaneously.

The Mathematical Truth: The Gale-Shapley algorithm always produces a university-optimal and candidate-pessimal pairing.

Theorem: Gale-Shapley produces a university-optimal pairing. Suppose not, i.e., some university gets rejected by its optimal match during Gale-Shapley. In particular, let’s say UCLA is the first university to be rejected by its optimal match, Adleman, and that Adleman said “maybe” to USC, whom he prefers. Since UCLA was the first university to be rejected by its optimal match, USC has not yet been rejected by its optimal match, so USC must like Adleman at least as much as USC’s optimal match.

We are assuming that Adleman is UCLA’s optimal match, that Adleman likes USC more than UCLA, and that USC likes Adleman at least as much as its optimal match. We now show that any pairing S in which UCLA hires Adleman cannot be stable (for a contradiction). Suppose S is stable. USC likes Adleman more than its partner in S, since USC likes Adleman at least as much as its optimal match but USC is not matched with Adleman in S. And Adleman likes USC more than his university in S, which is UCLA. So USC and Adleman are a blocking pair. Contradiction!

We’ve shown that any pairing in which UCLA hires Adleman cannot be stable. Thus, Adleman cannot be UCLA’s optimal match (since he can never work there in a stable world). So UCLA never gets rejected by its optimal match in Gale-Shapley, and thus the Gale-Shapley pairing is university-optimal.

Theorem: The Gale-Shapley pairing is candidate-pessimal.