Lecture 4 (week 2) Source Coding and Compression

exercise from the previous class

joint counts over 100 days:

            X=sunny  X=rain             X=sunny  X=rain
  Y1=sunny     45      15     Y2=sunny     0       43
  Y1=rain      12      28     Y2=rain     57        0

Q1: Compute P(X=Y1) and P(X=Y2):
P(X=Y1) = (45 + 28)/100 = 0.73 and P(X=Y2) = 0

Q2: Compute I(X; Y1) and I(X; Y2):
H(X) = ℋ(0.57) = 0.986 bit
to compute H(X|Y1), determine some probabilities:
P(X=sunny|Y1=sunny) = 0.75, P(X=sunny|Y1=rain) = 0.30
P(Y1=sunny) = 0.6, P(Y1=rain) = 0.4
H(X|Y1=sunny) = ℋ(0.75) = 0.811
H(X|Y1=rain) = ℋ(0.30) = 0.881
H(X|Y1) = 0.6×0.811 + 0.4×0.881 = 0.839
I(X; Y1) = H(X) – H(X|Y1) = 0.986 – 0.839 = 0.147 bit

exercise from the previous class (cont’d)

Q2: Compute I(X; Y1) and I(X; Y2):
H(X) = ℋ(0.57) = 0.986 bit
to compute H(X|Y2), determine some probabilities:
P(X=sunny|Y2=sunny) = 0, P(X=sunny|Y2=rain) = 1
P(Y2=sunny) = 0.43, P(Y2=rain) = 0.57
H(X|Y2=sunny) = ℋ(0) = 0
H(X|Y2=rain) = ℋ(1) = 0
H(X|Y2) = 0.43×0 + 0.57×0 = 0
I(X; Y2) = H(X) – H(X|Y2) = 0.986 – 0 = 0.986 bit

Q3: Which is the better forecast? Y2 gives more information.
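To double-check the arithmetic above, here is a small Python sketch of my own (not from the lecture; the function names and the joint-count representation are illustrative) that recomputes H(X), H(X|Y) and I(X; Y) from the joint counts:

```python
import math

def binary_entropy(p):
    """The binary entropy function ℋ(p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def analyze(joint):
    """joint[y][x]: count of days with forecast value y and actual weather x
    (index 0 = sunny, index 1 = rain). Returns (H(X), H(X|Y), I(X;Y))."""
    total = sum(sum(row) for row in joint)
    h_x = binary_entropy(sum(row[0] for row in joint) / total)   # from P(X = sunny)
    h_x_given_y = 0.0
    for row in joint:                     # one row per value of Y
        n_y = sum(row)
        if n_y:
            h_x_given_y += (n_y / total) * binary_entropy(row[0] / n_y)
    return h_x, h_x_given_y, h_x - h_x_given_y

print(analyze([[45, 15], [12, 28]]))   # Y1: about (0.986, 0.839, 0.147)
print(analyze([[0, 43], [57, 0]]))     # Y2: about (0.986, 0.000, 0.986)
```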

chapter 2: compact representation of information

the purpose of chapter 2
We learn how to encode symbols generated by an information source.
source coding = data compression
the purpose of source encoding:
- to give representations which are good for communication
- to discard (捨てる) redundancy (冗長性)
We want a source coding scheme which gives...
- as precise (正確) an encoding as possible
- as compact an encoding as possible
[figure: information source → source encoder → 0101101]

plan of the chapter
- basic properties needed for source coding: uniquely decodable, immediately decodable  [today]
- Huffman code: construction of Huffman code [today], extensions of Huffman code
- theoretical limit of the “compression”
- related topics

words and terms
For now, we consider symbol-by-symbol encodings only.
M ... the set of symbols generated by an information source.
For each symbol in M, associate a sequence (系列) over {0, 1}.
codewords (符号語): the sequences associated to symbols in M
code (符号): the set of codewords
alphabet: {0, 1} in this case ... a binary code

example:  M = {sunny, cloudy, rainy},  C = {00, 010, 101}
  sunny → 00, cloudy → 010, rainy → 101
  three codewords: 00, 010 and 101; 011 is NOT a codeword, for example

encoding and decoding
encode (符号化) ... to determine the codeword for a given symbol
decode (復号(化)) ... to determine the symbol for a given codeword
  sunny ↔ 00, cloudy ↔ 010, rainy ↔ 101
NO separation symbols between codewords:
  010 00 101 101 ... NG,  01000101101 ... OK
Why? With {0, 1, “space”} the alphabet would have three symbols, not two.
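As a small illustration of the last point (my own sketch, not part of the slides), encoding a symbol sequence with the weather code C = {00, 010, 101} simply concatenates the codewords with no separators:

```python
# the weather code from the slide: symbol -> codeword
code = {"sunny": "00", "cloudy": "010", "rainy": "101"}

message = ["cloudy", "sunny", "rainy", "rainy"]
encoded = "".join(code[s] for s in message)   # no "space" symbol is inserted
print(encoded)                                # 01000101101
```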

uniquely decodable codes
A code must be uniquely decodable (一意復号可能):
different symbol sequences are encoded to different 0-1 sequences.
uniquely decodable ⇒ the codewords are all different,
but the converse (逆) does not hold in general.

       a1    a2    a3    a4     uniquely decodable?
  C1   00    10    01    11     yes
  C2   0     01    011   111    yes
  C3   ...   ...   ...   ...    no
  C4   ...   ...   ...   ...    no

with the code C3 ... the two different symbol sequences a1 a3 and a1 a4 a2 are encoded to the same sequence 0110

more than uniqueness

       a1    a2    a3    a4
  C1   00    10    01    11
  C2   0     01    011   111

consider a scenario of using C2...
a1, a4, a4, a1 is encoded to 01111110.
The 0-1 sequence is transmitted at 1 bit/sec.
When does the receiver find out that the first symbol is a1?
Seven seconds later, the receiver has obtained 0111111:
  if 0 comes next, then 0 - 111 - 111 - 0  →  a1, a4, a4, a1
  if 1 comes next, then 01 - 111 - 111  →  a2, a4, a4
We cannot finalize the first symbol even seven seconds later.
 →  a buffer to save data, latency (遅延) of decoding...

immediately decodable codes
A code must be uniquely decodable, and if possible, it should be immediately decodable (瞬時復号可能):
decoding is possible without looking ahead in the sequence;
if you find a codeword pattern, then decode it immediately.
 →  an important property from an engineering viewpoint.
formally writing...
If s ∈ {0, 1}* is written as s = c1·s1 with c1 ∈ C and s1 ∈ {0, 1}*,
then there is no c2 ∈ C with c2 ≠ c1 and s2 ∈ {0, 1}* such that s = c2·s2.

prefix condition
If a code is NOT immediately decodable, then there is s ∈ {0, 1}* such that s = c1·s1 = c2·s2 with different c1 and c2.
 →  the codeword c1 is a prefix (語頭) of c2 (c1 is the same as the beginning part of c2)

Lemma: A code C is immediately decodable if and only if no codeword in C is a prefix of another codeword.
(prefix condition, 語頭条件)

       a1    a2    a3    a4
  C2   0     01    011   111

“0” is a prefix of “01” and “011”; “01” is a prefix of “011”.
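The lemma suggests a direct test. Here is a minimal sketch (my own illustration, not code from the lecture) that checks the prefix condition for a set of codewords:

```python
def is_immediately_decodable(code):
    """True iff no codeword in `code` is a prefix of another codeword."""
    for c1 in code:
        for c2 in code:
            if c1 != c2 and c2.startswith(c1):
                return False      # c1 is a prefix of c2: prefix condition violated
    return True

print(is_immediately_decodable({"00", "010", "101"}))       # True
print(is_immediately_decodable({"0", "01", "011", "111"}))  # False ("0" is a prefix of "01")
```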

break: prefix condition and user interface
The prefix condition is also important elsewhere in engineering design.
bad example: strokes for character writing on the Palm PDA
graffiti (ver. 2): some characters need two strokes → the prefix condition is violated;
  “–” written twice gives “=”, and “–” followed by “1” gives “+”,
  so the single stroke “–” is a prefix of other characters’ stroke sequences
graffiti (ver. 1): basically one stroke only per character

how to achieve the prefix condition
easy ways to construct codes with the prefix condition:
- let all codewords have the same length
- put a special pattern at the end of each codeword
  C = {011, 1011, 01011, 10011} ... a “comma code” ... too straightforward
- select codewords by using a tree structure (code tree)
  for binary codes, we use binary trees
  for k-ary codes, we use trees of degree k
[figure: a code tree with degree 3]

construction of codes (k-ary case)
how to construct a k-ary code with M codewords:
1. construct a k-ary tree T with M leaf nodes
2. for each branch (枝) of T, assign a label in {0, ..., k – 1};
   sibling (兄弟) branches cannot have the same label
3. for each leaf node of T, traverse T from the root to the leaf, concatenating (連接する) the labels on the branches
    →  the obtained sequence is the codeword of that leaf node

example
construct a binary code with four codewords:
Step 1: construct a binary tree with four leaf nodes
Step 2: label the branches with 0 and 1 and read off the codewords 00, 01, 10, 11
the constructed code is {00, 01, 10, 11}

example (cont’d)
other constructions: we can choose different trees and different labelings...
C1 = {0, 10, 110, 111}
C2 = {0, 11, 101, 100}
C3 = {01, 000, 1011, 1010}
The prefix condition is always guaranteed.
 →  Immediately decodable codes are constructed.
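As a sketch of the procedure above (my own illustration; the nested-tuple representation of the code tree is an assumption, not the lecture's notation), the codewords can be read off a binary code tree by concatenating the branch labels from the root to each leaf:

```python
def codewords(tree, prefix=""):
    """Return {symbol: codeword} for a binary code tree given as nested pairs.
    A leaf is a symbol name (string); an internal node is a pair whose first
    branch is labelled 0 and whose second branch is labelled 1."""
    if isinstance(tree, str):              # leaf: the path from the root is its codeword
        return {tree: prefix}
    left, right = tree
    result = codewords(left, prefix + "0")
    result.update(codewords(right, prefix + "1"))
    return result

# the tree behind C1 = {0, 10, 110, 111} on this slide
print(codewords(("a1", ("a2", ("a3", "a4")))))
# {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'}
```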

the “best” among immediately decodable codes
C1 = {0, 10, 110, 111} ... codeword lengths = [1, 2, 3, 3]
C3 = {01, 000, 1011, 1010} ... codeword lengths = [2, 3, 4, 4]
C1 seems to give a more compact representation than C3.
Can we construct even more compact immediately decodable codes?
- codeword lengths = [1, 1, 1, 1]?
- codeword lengths = [1, 2, 2, 3]?
- codeword lengths = [2, 2, 2, 3]?
What is the criterion (基準)?

Kraft’s inequality
Theorem:
A) If a k-ary code {c1, ..., cM} with |ci| = li is immediately decodable, then
      k^(–l1) + ... + k^(–lM) ≤ 1   (Kraft’s inequality)
   holds.
B) If k^(–l1) + ... + k^(–lM) ≤ 1, then we can construct a k-ary immediately decodable code {c1, ..., cM} with |ci| = li.
The proof is omitted in this class ... it uses results from graph theory.
[trivia] The result was given in the Master’s thesis of L. Kraft.

back to the examples
Can we construct more compact immediately decodable codes?
codeword lengths = [1, 2, 2, 3]? ... 2^(–1) + 2^(–2) + 2^(–2) + 2^(–3) = 9/8 > 1
 →  we cannot construct an immediately decodable code.
codeword lengths = [2, 2, 2, 3]? ... 2^(–2) + 2^(–2) + 2^(–2) + 2^(–3) = 7/8 < 1
 →  we can construct an immediately decodable code, by simply constructing a code tree...
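These checks are easy to automate; a minimal sketch of mine (assuming the binary case k = 2 by default):

```python
def kraft_sum(lengths, k=2):
    """Left-hand side of Kraft's inequality for the given codeword lengths."""
    return sum(k ** (-l) for l in lengths)

print(kraft_sum([1, 2, 2, 3]))   # 1.125 > 1: no immediately decodable code exists
print(kraft_sum([2, 2, 2, 3]))   # 0.875 <= 1: such a code can be constructed
print(kraft_sum([1, 2, 3, 3]))   # 1.0   <= 1: attained by C1 = {0, 10, 110, 111}
```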

to the next step
- basic properties needed for source coding: uniquely decodable, immediately decodable  [today]
- Huffman code: construction of Huffman code [today], extensions of Huffman code
- theoretical limit of the “compression”
- related topics

the measure of efficiency
We want to construct a good source coding scheme:
- easy to use ... immediately decodable
- efficient ... what is the efficiency?
We try to minimize the expected length of a codeword for representing one symbol:

   Σ_{i=1}^{M} p_i l_i   ... the average codeword length

  symbol        a1    a2    ...   aM
  probability   p1    p2    ...   pM
  codeword      c1    c2    ...   cM
  length        l1    l2    ...   lM

computing the average codeword length

  symbol        a1    a2    a3    a4
  probability   0.4   0.3   0.2   0.1
  C1            0     10    110   111
  C2            (codewords of lengths 3, 3, 2, 1)
  C3            00    01    10    11

C1: 0.4×1 + 0.3×2 + 0.2×3 + 0.1×3 = 1.9
C2: 0.4×3 + 0.3×3 + 0.2×2 + 0.1×1 = 2.6
C3: 0.4×2 + 0.3×2 + 0.2×2 + 0.1×2 = 2.0
It is expected that C1 gives the most compact representation in typical cases.
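A quick numeric check of these three averages (a sketch of mine; only the codeword lengths matter here):

```python
def average_length(lengths, probs):
    """Average codeword length: the sum of p_i * l_i over all symbols."""
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.4, 0.3, 0.2, 0.1]
print(average_length([1, 2, 3, 3], probs))   # C1: about 1.9
print(average_length([3, 3, 2, 1], probs))   # C2: about 2.6
print(average_length([2, 2, 2, 2], probs))   # C3: about 2.0
```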

Huffman code
The Huffman algorithm gives a clever way to construct a code with a small average codeword length:
1. prepare M isolated nodes, each attached with the probability of one symbol (node = size-one tree)
2. repeat the following operation until all trees are joined into one:
   - select the two trees T1 and T2 having the smallest probabilities
   - join T1 and T2 by introducing a new parent node
   - the sum of the probabilities of T1 and T2 is given to the new tree
David Huffman, 1925–1999

example: “merger of small companies”
start:   A 0.6,  B 0.25,  C 0.1,  D 0.05
step 1:  join the two smallest, C and D  →  A 0.6,  B 0.25,  (C+D) 0.15
step 2:  join the two smallest, B and (C+D)  →  A 0.6,  (B+C+D) 0.4
step 3:  join the remaining two  →  (A+B+C+D) 1.0
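The same procedure in code, as a compact sketch (my own illustration, not the lecture's program): a heap keeps the trees ordered by probability, and each merge prepends one more bit to the codewords in the two merged trees.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> binary codeword."""
    order = count()                      # tie-breaker so equal probabilities never compare dicts
    heap = [(p, next(order), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)  # the two trees with the smallest probabilities
        p2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}        # branch to t1 labelled 0
        merged.update({s: "1" + c for s, c in t2.items()})  # branch to t2 labelled 1
        heapq.heappush(heap, (p1 + p2, next(order), merged))
    return heap[0][2]

code = huffman_code({"A": 0.6, "B": 0.25, "C": 0.1, "D": 0.05})
print(code)   # codeword lengths 1, 2, 3, 3 for A, B, C, D (the 0/1 labels may differ)
```

For this source the average codeword length is 0.6×1 + 0.25×2 + 0.1×3 + 0.05×3 = 1.55 bits per symbol, compared with 2 bits per symbol for an equal-length code over four symbols.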

exercise

  symbol   A    B    C    D    E
  prob.    0.2  0.1  0.3
  codewords

Compare the average length with that of the equal-length code...

exercise

  symbol   A    B    C    D    E    F
  prob.    0.3  0.2  0.1
  codewords

Compare the average length with that of the equal-length code...

different construction, same efficiency
We may have multiple options during the code construction:
- several nodes may have the same smallest probability
- labels can be assigned differently to branches
Different options result in different Huffman codes, but the average codeword length does not depend on the chosen option.
[figure: two different Huffman trees for the same source a1, ..., a5 with probabilities 0.4, 0.2, 0.2, 0.1, 0.1]

summary of today’s class
- basic properties needed for source coding: uniquely decodable, immediately decodable  [today]
- Huffman code: construction of Huffman code [today], extensions of Huffman code
- theoretical limit of the “compression”
- related topics

exercise
Construct a binary Huffman code for the information source given in the table.
Compute the average codeword length of the constructed code.
Can you construct a 4-ary Huffman code for the source?

  symbol   A      B      C      D      E      F      G      H
  prob.    0.363  0.174  0.143  0.098  0.087  0.069  0.045  0.021