1
DL - 2004, Compression – Beeri/Feitelson
Compression
Outline: Introduction; Information theory; Text compression; IL compression
2
General:
- Compression methods depend on data characteristics; there is no universal (best) method.
- Requirements:
  - text, EL's: lossless; images: may be lossy
  - efficiency: how many bits per byte of data? (often given as a percentage)
  - coding should be fast, decoding superfast
3
Compression vs. communications:
Minor difference: communication is always on-line; compression is on-line or off-line (off-line: the complete file is given).
[Figure: source → file / line (with noise) → destination]
4
A general model for statistics-based compression:
- The same model must be used on both sides.
- The model is (often) stored in the compressed file; its size affects compression efficiency.
[Figure: model and coder; model and decoder]
5
Appetizer: Huffman coding
(Standard) binary coding:
- uniquely decodable
- model = table
- efficiency: $\lceil \log_2 q \rceil$ bits/symbol for $q$ symbols (no/little compression)
Can do better if symbol frequencies are known:
- frequent symbol: short code
- rare symbol: long code
- minimizes the average code length
6
Assume: symbols $s_1, \dots, s_q$ with probabilities $p_1 \ge p_2 \ge \dots \ge p_q$.
Huffman's Algorithm (eager construction of code tree):
  Allocate a node for each symbol, weight = symbol probability
  Enter nodes into priority queue Q (small weights first)
  While |Q| > 1 {
    – Remove the first two nodes (smallest weights)
    – Create a new node, make it their parent, assign it the sum of their weights
    – Enter the new node into Q
  }
  Return: the single node in Q (root of tree)
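The algorithm above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the symbol names `a` to `d`, the nested-tuple tree representation, and the insertion-order tie-breaker are all assumptions made for the example. Weights are exact fractions so that sums and comparisons are not affected by floating-point rounding.

```python
import heapq
from fractions import Fraction
from itertools import count

def huffman_tree(weights):
    """Build a Huffman tree from a dict symbol -> probability.

    A leaf is the symbol itself; an internal node is a pair (left, right).
    Symbols must therefore not be tuples (assumption of this sketch).
    """
    order = count()  # tie-breaker so the heap never compares nodes directly
    queue = [(w, next(order), sym) for sym, w in weights.items()]
    heapq.heapify(queue)  # priority queue Q, small weights first
    while len(queue) > 1:
        w1, _, n1 = heapq.heappop(queue)  # the two smallest weights
        w2, _, n2 = heapq.heappop(queue)
        # new parent node, weighted with the sum of its children
        heapq.heappush(queue, (w1 + w2, next(order), (n1, n2)))
    return queue[0][2]  # the single remaining node is the root

def code_lengths(node, depth=0):
    """Return a dict symbol -> depth of its leaf (= code length)."""
    if not isinstance(node, tuple):
        return {node: depth}
    left, right = node
    lengths = code_lengths(left, depth + 1)
    lengths.update(code_lengths(right, depth + 1))
    return lengths

weights = {
    "a": Fraction(1, 2),
    "b": Fraction(1, 4),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}
root = huffman_tree(weights)
lengths = code_lengths(root)
print(root, lengths)
```

With these weights the sketch yields code lengths 1, 2, 3, 3. A different tie-breaking rule could produce a different but equally cheap tree.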
7
Example: weights 1/2, 1/4, 1/8, 1/8
[Figure: successive states of Q during tree construction; merged weights 1/8 + 1/8 = 1/4, 1/4 + 1/4 = 1/2, 1/2 + 1/2 = 1]
8
How are the trees used?
- Coding: for each symbol s, output the binary path from the root to leaf(s).
- Decoding: read the incoming stream of bits and follow the path from the root of the tree. When leaf(s) is reached, output s and return to the root.
- Common model (stored on both sides): the tree
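A small sketch of coding and decoding with a shared tree, under assumptions that are not in the slides: the tree is written as nested pairs `(left, right)` with symbols as leaves, the left branch emits `0` and the right branch emits `1`, and the bit stream is represented as a Python string. The symbols `a` to `d` are hypothetical; the tree has at least two leaves.

```python
def build_codebook(node, prefix=""):
    """Map every symbol to the binary path from the root to its leaf."""
    if not isinstance(node, tuple):
        return {node: prefix}
    left, right = node
    book = build_codebook(left, prefix + "0")
    book.update(build_codebook(right, prefix + "1"))
    return book

def encode(text, tree):
    """Coding: output the root-to-leaf path of each symbol."""
    book = build_codebook(tree)
    return "".join(book[s] for s in text)

def decode(bits, tree):
    """Decoding: follow bits from the root; at a leaf, emit it and restart."""
    out = []
    node = tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if not isinstance(node, tuple):  # reached leaf(s)
            out.append(node)
            node = tree  # return to the root
    return "".join(out)

# Tree for weights 1/2, 1/4, 1/8, 1/8: codes a=0, b=10, c=110, d=111
tree = ("a", ("b", ("c", "d")))
bits = encode("abacad", tree)
print(bits, decode(bits, tree))
```

Both sides only need the same `tree` value, which is the common model the slide refers to.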
9
Expected cost (bits/symbol), where $l_i$ is the code length of symbol $i$:
- Binary: $\lceil \log_2 q \rceil$
- Huffman: $\sum_{i=1}^{q} p_i l_i$
In example: binary: 2; Huffman: 1/2×1 + 1/4×2 + 1/8×3 + 1/8×3 = 1.75
Q: what would be the tree and cost for 5/12, 1/3, 1/6, 1/12?
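The cost computation can be checked mechanically. The sketch below is an illustration under assumptions: it tracks only code lengths (not the tree), breaks ties by insertion order, and uses exact fractions. The function names are hypothetical.

```python
import heapq
from fractions import Fraction
from itertools import count
from math import ceil, log2

def huffman_lengths(probs):
    """Return the Huffman code length of each symbol, by position."""
    order = count()
    heap = [(p, next(order), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:  # every symbol under the new parent gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(order), s1 + s2))
    return lengths

def expected_cost(probs, lengths):
    """Expected bits/symbol: sum of p_i * l_i."""
    return sum(p * l for p, l in zip(probs, lengths))

example = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
exercise = [Fraction(5, 12), Fraction(1, 3), Fraction(1, 6), Fraction(1, 12)]

for probs in (example, exercise):
    lengths = huffman_lengths(probs)
    fixed = ceil(log2(len(probs)))  # fixed-length binary coding
    print(lengths, expected_cost(probs, lengths), fixed)
```

For the example this gives 7/4 = 1.75 bits/symbol against 2 for fixed-length coding. For 5/12, 1/3, 1/6, 1/12 it gives lengths 1, 2, 3, 3 and cost 11/6, roughly 1.83 bits/symbol.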
10
A note on Huffman trees:
The algorithm is non-deterministic:
- In each step, either node can be the left child of the new parent. If the two children of a node are exchanged, the result is also a Huffman tree (closure under rotation w.r.t. nodes).
- Ties between equal weights can be broken arbitrarily. Consider 0.4, 0.2, 0.2, 0.1, 0.1: after the 1st step there are three nodes of weight 0.2, and 2 out of the 3 are selected.
There are many Huffman trees for a given probability distribution.
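To make the effect of tie-breaking concrete, the following sketch runs the same construction twice on 0.4, 0.2, 0.2, 0.1, 0.1 with two tie-breaking rules. The `newest_first` flag is a hypothetical parameter introduced only for this illustration; among equal weights it prefers either the oldest or the most recently created node.

```python
import heapq
from fractions import Fraction
from itertools import count

def huffman_lengths(probs, newest_first=False):
    """Huffman code lengths; ties go to the oldest or the newest node."""
    sign = -1 if newest_first else 1
    order = count()
    heap = [(p, sign * next(order), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, sign * next(order), s1 + s2))
    return lengths

def expected_cost(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

probs = [Fraction(2, 5), Fraction(1, 5), Fraction(1, 5),
         Fraction(1, 10), Fraction(1, 10)]
oldest = huffman_lengths(probs, newest_first=False)
newest = huffman_lengths(probs, newest_first=True)
print(oldest, expected_cost(probs, oldest))
print(newest, expected_cost(probs, newest))
```

The two runs give the length profiles 2, 2, 2, 3, 3 and 1, 2, 3, 4, 4. The trees have different shapes, but both cost 11/5 = 2.2 bits/symbol.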
11
Concepts:
- variable-length code (e.g. Huffman)
- uniquely decodable code: each legal code sequence is generated by a unique source sequence
- instantaneous (prefix) code: the end of the code of each symbol can be recognized as soon as it is read
Examples:
- 0, 010, 01, 10
- 10, 00, 11, 110
- 0, 10, 110, 111 (Huffman of example) (comma code)
- 0, 01, 011, 111 (inverted comma code)
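The four example codes can be classified with a short script. The prefix test follows directly from the definition. For unique decodability the sketch uses the Sardinas–Patterson test, a standard technique that the slides do not mention and that is brought in here only as an illustration. Function names are hypothetical.

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another.

    After sorting, a word that is a prefix of any other word is also
    a prefix of its immediate successor, so adjacent pairs suffice.
    """
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test (not from the slides)."""
    code = set(codewords)
    if len(code) < len(codewords) or "" in code:
        return False

    def suffixes(xs, ys):
        # non-empty w such that x + w == y for some x in xs, y in ys
        return {y[len(x):] for x in xs for y in ys
                if len(y) > len(x) and y.startswith(x)}

    seen = set()
    current = suffixes(code, code)  # dangling suffixes between codewords
    while current:
        if current & code:  # a dangling suffix is itself a codeword
            return False
        seen |= current
        current = (suffixes(code, current) | suffixes(current, code)) - seen
    return True

examples = [
    ["0", "010", "01", "10"],
    ["10", "00", "11", "110"],
    ["0", "10", "110", "111"],
    ["0", "01", "011", "111"],
]
for code in examples:
    print(code, is_prefix_code(code), is_uniquely_decodable(code))
```

According to this check, the first code is not uniquely decodable (for instance, 010 can be read as 010, as 0 followed by 10, or as 01 followed by 0). The second and fourth are uniquely decodable but not prefix codes, and the third is a prefix code.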
12
A prefix code = binary tree
Every binary tree with q leaves is a prefix code for q symbols; lengths of code words = lengths of paths.
Kraft inequality: there exists a q-leaf tree with path lengths $l_1, \dots, l_q$ iff $\sum_{i=1}^{q} 2^{-l_i} \le 1$; the sum $= 1$ iff the tree is complete.
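A quick way to try the Kraft inequality on concrete length profiles, as a sketch with hypothetical function names and exact fractions:

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Sum of 2^(-l) over all code lengths, computed exactly."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def classify(lengths):
    """Interpret the Kraft sum of a list of path lengths."""
    s = kraft_sum(lengths)
    if s > 1:
        return "no tree with these path lengths"
    if s == 1:
        return "tree exists and is complete"
    return "tree exists but is not complete"

for lengths in ([1, 2, 3, 3], [1, 2, 3], [1, 2, 2, 2]):
    print(lengths, kraft_sum(lengths), classify(lengths))
```

The three profiles give sums 1, 7/8 and 5/4 respectively.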
13
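Proof ($\Rightarrow$): assume there exists a tree T with path lengths $l_1, \dots, l_q$.
Take T' to be the full tree of depth $l_{\max} = \max_i l_i$ (full: all paths have the same length).
The number of its leaves: $2^{l_{\max}}$.
A leaf of T at distance $l_i$ from the root has $2^{l_{\max} - l_i}$ leaves of T' under it.
Sum over all leaves of T: $\sum_{i=1}^{q} 2^{l_{\max} - l_i} \le 2^{l_{\max}}$, hence $\sum_{i=1}^{q} 2^{-l_i} \le 1$.
[Figure: tree T embedded in the full tree T']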
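As a worked instance of this counting argument, take the code 0, 10, 110, 111 with path lengths 1, 2, 3, 3. The full tree T' then has depth 3 and 8 leaves, and counting the leaves of T' under each leaf of T gives:

```latex
\underbrace{2^{3-1}}_{4} + \underbrace{2^{3-2}}_{2} + \underbrace{2^{3-3}}_{1} + \underbrace{2^{3-3}}_{1}
  = 8 = 2^{3}
\quad\Longrightarrow\quad
\tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{8} = 1
```

Every leaf of T' is covered exactly once, which is why equality holds for this tree.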
14
A tree is complete if every node has 0 or 2 children.
If T is not complete, it has a node with a single child. Such a node can be "shortened" (spliced out, so every leaf below it moves up one level); the new tree still satisfies $\sum 2^{-l_i} \le 1$ and its sum is strictly larger, hence the given tree must satisfy $\sum 2^{-l_i} < 1$.
Only complete trees have equality.
Comment: in general, a prefix code that is not a complete tree is dominated by a tree with smaller cost.
From now on: trees are complete.
15
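Proof ($\Leftarrow$): Assume $l_1 \le l_2 \le \dots \le l_q$ and $\sum_{i=1}^{q} 2^{-l_i} = 1$.
Lemma: if $\sum_{i=1}^{q} 2^{-l_i} = 1$ and $q \ge 2$, then $l_{q-1} = l_q$.
Replace these two by their sum, i.e. by the single length $l_q - 1$, since $2^{-l_q} + 2^{-l_q} = 2^{-(l_q - 1)}$ (hence q-1 lengths), and use induction.
Assume $\sum_{i=1}^{q} 2^{-l_i} < 1$: must the tree be complete?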
16
McMillan Theorem: there exists a uniquely decodable code with lengths $l_1, \dots, l_q$ iff $\sum_{i=1}^{q} 2^{-l_i} \le 1$.
Corollary: when there is a uniquely decodable code, there is also a prefix code with the same lengths (same cost).
No need to think about the first class.
[Figure: prefix codes as a subset of uniquely decodable codes]
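The corollary is constructive: any lengths that satisfy the Kraft inequality can be turned into a prefix code. One common way to do this, sketched here as an assumption rather than as the method used in the lecture, is to assign codewords in order of increasing length by counting in binary. The function name is hypothetical.

```python
from fractions import Fraction

def prefix_code_from_lengths(lengths):
    """Return a prefix code whose i-th codeword has length lengths[i].

    Raises ValueError if the lengths violate the Kraft inequality.
    Lengths are assumed to be positive integers.
    """
    if any(l < 1 for l in lengths):
        raise ValueError("lengths must be positive")
    if sum(Fraction(1, 2 ** l) for l in lengths) > 1:
        raise ValueError("Kraft inequality violated")
    words = [None] * len(lengths)
    code, prev_len = 0, 0
    # shortest codewords first; sorted() is stable, so ties keep input order
    for i in sorted(range(len(lengths)), key=lambda k: lengths[k]):
        code <<= lengths[i] - prev_len  # pad the counter to the new length
        words[i] = format(code, "0{}b".format(lengths[i]))
        code += 1  # next codeword must not have this one as a prefix
        prev_len = lengths[i]
    return words

# Lengths of the uniquely decodable, non-prefix code 10, 00, 11, 110
print(prefix_code_from_lengths([2, 2, 2, 3]))
# Lengths of the Huffman example
print(prefix_code_from_lengths([1, 2, 3, 3]))
```

For the lengths 2, 2, 2, 3 this produces 00, 01, 10, 110, a prefix code with the same lengths and therefore the same cost as the original non-prefix code.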
17
On optimality of Huffman:
Cost of a tree/code T: $L(T) = \sum_{i=1}^{q} p_i l_i$
Claim: if a tree T does not satisfy
  (*) $p_i > p_j \Rightarrow l_i \le l_j$
then it is dominated by a tree with smaller cost.
Claim: for any T, $L(T_{\text{Huffman}}) \le L(T)$.
Proof: can assume T satisfies (*). Use induction on q.
q = 2: both trees have lengths 1, 1.
18
q > 2 (symbols ordered by decreasing probability, $s_{q-1}$ and $s_q$ being the last two):
- In the Huffman tree, there are two maximal paths that end in sibling nodes.
- In T, the paths for the last two symbols are longest (by (*)), but their ends may not be siblings.
- But T is complete, hence the leaf of $s_q$ has a sibling with the same length; exchange that sibling with the leaf corresponding to $s_{q-1}$.
- Now, in both trees, these two longest paths can be replaced by their parents.
- Case of q-1 (induction hypothesis).
19
Summary:
- Huffman trees are optimal, hence satisfy (*)
- Any two Huffman trees have equal costs
- Huffman trees have min cost among all trees (codes)
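The optimality claim can be sanity-checked on a small alphabet by exhaustive search. This is an empirical check on one distribution, not a proof. The sketch relies on the Kraft inequality to enumerate length profiles instead of trees, and it assumes that lengths above q-1 never help, since a complete binary tree with q leaves has depth at most q-1. Function names are hypothetical.

```python
import heapq
from fractions import Fraction
from itertools import count, product

def huffman_lengths(probs):
    """Huffman code lengths, ties broken by insertion order."""
    order = count()
    heap = [(p, next(order), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(order), s1 + s2))
    return lengths

def cost(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

def best_cost_by_search(probs):
    """Minimum cost over all length profiles that satisfy Kraft."""
    q = len(probs)
    best = None
    for lengths in product(range(1, q), repeat=q):
        if sum(Fraction(1, 2 ** l) for l in lengths) <= 1:
            c = cost(probs, lengths)
            if best is None or c < best:
                best = c
    return best

probs = [Fraction(2, 5), Fraction(1, 5), Fraction(1, 5),
         Fraction(1, 10), Fraction(1, 10)]
huffman_cost = cost(probs, huffman_lengths(probs))
search_cost = best_cost_by_search(probs)
print(huffman_cost, search_cost)
```

For 0.4, 0.2, 0.2, 0.1, 0.1 both values come out as 11/5, in line with the summary above.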