Huffman Encodings Section 9.4. Data Compression: Array Representation Σ denotes an alphabet used for all strings Each element in Σ is called a character.


1 Huffman Encodings Section 9.4

2 Data Compression: Array Representation
Σ denotes an alphabet used for all strings. Each element of Σ is called a character.
Typical representation: contiguous memory. The bit sequence representing a character is called its encoding.
Number of bit sequences of length n? $2^n$
Number of bits needed to represent each character of Σ? $\lceil \log_2 |\Sigma| \rceil$
Data compression problem: “Given a string w over Σ, store it using as few bits as possible in such a way that it can be recovered at will.”
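The fixed-length bound above can be checked with a few lines of Python. This is only a sketch: the function names `bits_per_char` and `fixed_length_bits` are illustrative and do not come from the slides.

```python
def bits_per_char(alphabet_size):
    """Bits per character for a fixed-length encoding: ceil(log2 |Sigma|)."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one character")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2 n) and avoids
    # floating-point rounding.
    return (alphabet_size - 1).bit_length()


def fixed_length_bits(w, alphabet_size):
    """Total bits needed to store the string w with a fixed-length encoding."""
    return len(w) * bits_per_char(alphabet_size)


if __name__ == "__main__":
    for size in (2, 10, 26, 256):
        print(size, "characters need", bits_per_char(size), "bits each")
```

With a 10-character alphabet every character costs 4 bits, so an 8-character string takes 32 bits regardless of which characters it contains.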

3 Motivation for the Solution
For representing strings, we want to take advantage of the fact that not all characters occur with the same frequency.
Example: FitalyStamp: “Do you have what it takes to type 50 words per minute in your palm organizer?”
If only a subset S of Σ is actually used in w, we could represent each character of w in $\lceil \log_2 |S| \rceil$ bits.
Problems:
  We need to know S in advance.
  It doesn’t account for ranking of occurrences.
  Improvement only if $\lceil \log_2 |S| \rceil < \lceil \log_2 |\Sigma| \rceil$.

4 Encoding Trees
Idea: use encodings of different lengths for the members of Σ.
Potential problem: E: 101, T: 110, Q: 101110 (the encoding of Q cannot be told apart from the encoding of E followed by T).
Solution: no character's encoding may be a prefix of another character's encoding.
Suppose that: I: 0000, V: 0001, M: 0010, U: 0011, D: 010, H: 0110, N: 0111, A: 10, ␣ (blank): 110, F: 111
Question: how do we represent these codes in a binary tree?
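The prefix condition can be tested mechanically. The sketch below is one possible check, not something taken from the slides: the name `is_prefix_free` is illustrative, and the character encoded as 110 is assumed to be the blank.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # In lexicographic order, a word that is a prefix of some other word
    # is also a prefix of its immediate successor, so checking
    # neighbours is enough. Duplicate code words are rejected as well.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


# The problematic encoding from the slide: Q = E followed by T.
BAD = {"E": "101", "T": "110", "Q": "101110"}

# The code table from the slide (" " for the blank is an assumption).
GOOD = {"I": "0000", "V": "0001", "M": "0010", "U": "0011", "D": "010",
        "H": "0110", "N": "0111", "A": "10", " ": "110", "F": "111"}

if __name__ == "__main__":
    print(is_prefix_free(BAD))
    print(is_prefix_free(GOOD))
```

Sorting first keeps the check at O(k log k) comparisons for k code words instead of comparing every pair.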

5 Encoding Trees
[Figure: encoding tree whose leaves, from left to right, are I, V, M, U, D, H, N, A, the blank, and F]
Encoding trees can always be assumed to be full!

6 Decoding with Encoding Trees
AIDA FAN: 10000001010110111100111

Procedure TreeDecode(pointer T, bitstream b)
  P ← T
  while not Empty(b) do
    if NextBit(b) = 0 then P ← LC(P)
    else P ← RC(P)
    if isLeaf(P) then
      print(value(P))
      P ← T

How to generate encoding trees?
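A runnable version of the decoding procedure might look like the following Python sketch. The names `Node`, `build_tree`, `encode`, and `tree_decode` are illustrative. The code table is the one from slide 4, with the assumption that 110 encodes the blank, and the bit string is produced by `encode` rather than hard-coded.

```python
class Node:
    """Binary tree node; leaves carry a character in `value`."""

    def __init__(self):
        self.value = None
        self.left = None    # followed on bit 0
        self.right = None   # followed on bit 1

    def is_leaf(self):
        return self.left is None and self.right is None


def build_tree(codes):
    """Build an encoding tree from a {character: bit string} table."""
    root = Node()
    for ch, code in codes.items():
        p = root
        for bit in code:
            side = "left" if bit == "0" else "right"
            if getattr(p, side) is None:
                setattr(p, side, Node())
            p = getattr(p, side)
        p.value = ch
    return root


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def tree_decode(root, bits):
    """Walk from the root; emit a character and restart at each leaf."""
    out, p = [], root
    for bit in bits:
        p = p.left if bit == "0" else p.right
        if p.is_leaf():
            out.append(p.value)
            p = root
    return "".join(out)


CODES = {"I": "0000", "V": "0001", "M": "0010", "U": "0011", "D": "010",
         "H": "0110", "N": "0111", "A": "10", " ": "110", "F": "111"}

if __name__ == "__main__":
    bits = encode("AIDA FAN", CODES)
    print(bits, "->", tree_decode(build_tree(CODES), bits))
```

Because the code is prefix-free, the decoder never needs to look ahead: reaching a leaf always ends exactly one character.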

7 Constructing Encoding Trees
Example: f(A) = 0.35, f(B) = 0.1, f(C) = 0.2, f(D) = 0.2, f(E) = 0.15
Many possible trees (a combinatorial number). We want the one that has minimum cost.
Cost = weighted path length of T, WPL(T):
$\mathrm{WPL}(T) = \sum_{n \in L(T)} \mathrm{Depth}_T(n) \cdot C(n)$
Notation: L(T) is the set of all leaves in T; C(n) is the cost or weight of node n; $\mathrm{Depth}_T(n)$ is the depth of n in T.
Idea 0: use exhaustive search to find the tree with minimum cost.
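To make the cost concrete, the sketch below computes the weighted path length for two candidate trees over the example frequencies. The nested-tuple tree representation and the names `wpl`, `TREE_1`, and `TREE_2` are assumptions made for illustration; neither tree is taken from the slides.

```python
FREQ = {"A": 0.35, "B": 0.1, "C": 0.2, "D": 0.2, "E": 0.15}


def wpl(tree, weight, depth=0):
    """Weighted path length: sum of depth(leaf) * weight(leaf) over leaves.

    A tree is either a character (leaf) or a pair (left, right).
    """
    if isinstance(tree, str):
        return depth * weight[tree]
    left, right = tree
    return wpl(left, weight, depth + 1) + wpl(right, weight, depth + 1)


# Two hypothetical encoding trees over the same five characters.
TREE_1 = (("C", "D"), (("B", "E"), "A"))   # B and E at depth 3
TREE_2 = (("A", "B"), ("C", ("D", "E")))   # D and E at depth 3

if __name__ == "__main__":
    print("WPL(TREE_1) is about", round(wpl(TREE_1, FREQ), 2))
    print("WPL(TREE_2) is about", round(wpl(TREE_2, FREQ), 2))
```

With frequencies that sum to 1, the WPL is the average number of bits per character, so the first tree (about 2.25) is cheaper than the second (about 2.35).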

8 Idea 1: Huffman Encoding Tree
For each character c we know the frequency $f_c$ with which c occurs in w.
Construction method:
Create one node for each character c in Σ with weight $f_c$ (each of these nodes will be a leaf in the tree).
Repeat the following steps while more than one node has no parent:
1. Pick two nodes n1 and n2 with the smallest weights and without a parent.
2. Create a new parent node for n1 and n2 with weight weight(n1) + weight(n2).
Eventually only two nodes without a parent remain; their parent node (the root) is created and the loop ends.
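The construction method can be sketched in Python with a priority queue holding the parentless nodes. This is one possible implementation under stated assumptions, not the slides' own code: the name `huffman_codes` is illustrative, internal nodes are represented as pairs, and ties between equal weights are broken by insertion order.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Return {character: bit string} for a {character: weight} table."""
    if not freq:
        return {}
    order = count()  # tie-breaker so nodes themselves are never compared
    heap = [(w, next(order), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # single-character alphabet
    while len(heap) > 1:
        # 1. Pick the two parentless nodes with the smallest weights.
        w1, _, n1 = heapq.heappop(heap)
        w2, _, n2 = heapq.heappop(heap)
        # 2. Create their parent with weight(n1) + weight(n2).
        heapq.heappush(heap, (w1 + w2, next(order), (n1, n2)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a character
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


if __name__ == "__main__":
    example = {"A": 0.35, "B": 0.1, "C": 0.2, "D": 0.2, "E": 0.15}
    for ch, code in sorted(huffman_codes(example).items()):
        print(ch, code)
```

For the example frequencies this yields 2-bit codes for A, C, and D and 3-bit codes for B and E. The exact bit patterns depend on the tie-breaking rule, but the code lengths and the weighted path length do not change here.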

9 Properties of Huffman Encoding Trees
Characters with higher frequency are placed nearer the root, so they have shorter encodings!
Is the Huffman method for generating encoding trees greedy? Yes!
Theorem. Let N be a set of nodes and C(n) the weight of each node n in N. Let T be a Huffman encoding tree for N. If X is any other encoding tree for N, then WPL(T) ≤ WPL(X).

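10 Compression Ratio
Compression ratio (CR): “$\lceil \log_2 |\Sigma| \rceil$ is to 100 as ($\lceil \log_2 |\Sigma| \rceil - \mathrm{WPL}(T)$) is to the CR”, that is,
$\mathrm{CR} = 100 \cdot \frac{\lceil \log_2 |\Sigma| \rceil - \mathrm{WPL}(T)}{\lceil \log_2 |\Sigma| \rceil}$
Here WPL(T) is computed with relative frequencies as weights, so it is the average number of bits per character.
Huffman compression ratio falls between 20% and 80%.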
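Reading the proportion as CR = 100 · (fixed − WPL(T)) / fixed, where fixed is the number of bits per character of a fixed-length encoding, gives the small sketch below. The function name `compression_ratio` is illustrative, and the WPL is assumed to use relative frequencies that sum to 1.

```python
def compression_ratio(alphabet_size, wpl):
    """Percentage of bits saved relative to a fixed-length encoding.

    alphabet_size: |Sigma|, must be at least 2
    wpl: weighted path length of the encoding tree, i.e. the average
         number of bits per character when weights are frequencies
    """
    if alphabet_size < 2:
        raise ValueError("need at least two characters")
    fixed = (alphabet_size - 1).bit_length()  # ceil(log2 |Sigma|)
    return 100.0 * (fixed - wpl) / fixed


if __name__ == "__main__":
    # Five characters need 3 bits each with a fixed-length encoding;
    # a tree with WPL 2.25 saves 0.75 bits per character.
    print(compression_ratio(5, 2.25), "percent")
```

For the five-character example with frequencies 0.35, 0.1, 0.2, 0.2, and 0.15, a tree with WPL 2.25 gives a ratio of 25%.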

