Published by Irene Price, modified over 9 years ago
1 Data Compression
Hae-sun Jung
CS146, Dr. Sin-Min Lee
Spring 2004
3 Introduction
Compression is used to reduce the volume of information that must be stored, or to reduce the communication bandwidth required to transmit it over a network.
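5 Compression Principles: Entropy Encoding
Run-length encoding
– Lossless, and independent of the type of source information.
– Used when the source information contains long substrings of the same character or binary digit; each run is sent as a (character or bit pattern, number of occurrences) pair, as in fax transmission.
– Example: 000000011111111110000011… is encoded as (0,7) (1,10) (0,5) (1,2)…; because binary runs alternate, sending only the run lengths 7, 10, 5, 2, … is enough once the first bit is known.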
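A minimal Python sketch of the run-length scheme above, assuming the input is a string of '0'/'1' characters; `rle_encode`, `rle_decode` and `run_lengths_only` are illustrative names, not from the slides.

```python
from itertools import groupby

def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Encode a bit string as (bit, run length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (bit, run length) pairs back into the original bit string."""
    return "".join(bit * count for bit, count in pairs)

def run_lengths_only(pairs: list[tuple[str, int]]) -> list[int]:
    """Binary runs alternate, so only the lengths need to be sent,
    assuming the receiver already knows the value of the first bit."""
    return [count for _, count in pairs]

data = "000000011111111110000011"
pairs = rle_encode(data)
print(pairs)                    # [('0', 7), ('1', 10), ('0', 5), ('1', 2)]
print(run_lengths_only(pairs))  # [7, 10, 5, 2]
assert rle_decode(pairs) == data  # lossless round trip
```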
6 Compression Principles: Entropy Encoding
Statistical encoding
– Based on the probability of occurrence of a pattern: the more probable the pattern, the shorter its codeword.
– "Prefix property": a shorter codeword must not form the start of a longer codeword, so the receiver can tell where each codeword ends.
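One way to check the prefix property mechanically, sketched in Python under the assumption that codewords are given as '0'/'1' strings (`is_prefix_free` is a hypothetical helper). After sorting, any codeword that is a prefix of another is also a prefix of the codeword immediately after it, so only neighbouring pairs need comparing.

```python
def is_prefix_free(codewords: list[str]) -> bool:
    """Return True if no codeword is the start (prefix) of another codeword."""
    ordered = sorted(codewords)
    # If a is a prefix of some c, every string sorted between a and c also
    # starts with a, so comparing each codeword with its successor suffices.
    # Duplicate codewords also fail, since a string starts with itself.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

print(is_prefix_free(["10", "11", "010", "011", "000", "001"]))  # True
print(is_prefix_free(["01", "010"]))                              # False
```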
7 Compression Principles: Huffman Encoding
Entropy, H: the theoretical minimum average number of bits per symbol required to transmit a particular stream:
$$H = -\sum_{i=1}^{n} P_i \log_2 P_i$$
where $n$ is the number of symbols and $P_i$ is the probability of symbol $i$.
Efficiency: $E = H / H'$, where $H'$ is the average number of bits per codeword:
$$H' = \sum_{i=1}^{n} N_i P_i$$
where $N_i$ is the number of bits in the codeword for symbol $i$.
8 Example: symbols M (10), F (11), Y (010), N (011), 0 (000), 1 (001) with probabilities 0.25, 0.25, 0.125, 0.125, 0.125, 0.125.
$$H' = \sum_{i=1}^{6} N_i P_i = 2(2 \times 0.25) + 4(3 \times 0.125) = 2.5 \text{ bits/codeword}$$
$$H = -\sum_{i=1}^{6} P_i \log_2 P_i = -\left[2(0.25 \log_2 0.25) + 4(0.125 \log_2 0.125)\right] = 2.5 \text{ bits}$$
$$E = H / H' = 100\%$$
By comparison, fixed-length codewords for six symbols would need 3 bits/codeword.
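A short Python check of the arithmetic above; `entropy` and `average_length` are illustrative helper names, and the codes and probabilities are the ones listed for M, F, Y, N, 0, 1.

```python
import math

def entropy(probs):
    """H = -sum(P_i * log2 P_i): theoretical minimum average bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(codes, probs):
    """H' = sum(N_i * P_i), where N_i is the length of codeword i."""
    return sum(len(c) * p for c, p in zip(codes, probs))

codes = ["10", "11", "010", "011", "000", "001"]  # M, F, Y, N, 0, 1
probs = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]

H = entropy(probs)
H_avg = average_length(codes, probs)
print(H, H_avg, H / H_avg)          # 2.5 2.5 1.0  (efficiency 100 %)
print(math.ceil(math.log2(6)))      # 3 bits for a fixed-length code
```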
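9 Huffman Algorithm
A method for constructing an encoding tree.
– Full binary tree representation: every internal node has exactly two children.
– Each edge of the tree has a value: 0 for the left child, 1 for the right child.
– Data is stored at the leaves, not at internal nodes.
– Result: an encoding tree that yields a variable-length encoding; a symbol's codeword is the sequence of edge values on the path from the root to its leaf.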
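A sketch of the tree representation in Python, assuming a hypothetical `Node` class: leaves carry a symbol, internal nodes only a weight, and codes are read off by following 0 for left edges and 1 for right edges. The tree is hand-built to match the A/B/C/D example on the "Static Huffman Coding" slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Full binary tree node: leaves carry a symbol, internal nodes do not."""
    weight: int
    symbol: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def assign_codes(node: Node, prefix: str = "", codes: Optional[dict] = None) -> dict:
    """Walk the tree, appending '0' for a left edge and '1' for a right edge."""
    if codes is None:
        codes = {}
    if node.symbol is not None:            # data lives only at the leaves
        codes[node.symbol] = prefix or "0"  # a lone leaf still needs one bit
        return codes
    assign_codes(node.left, prefix + "0", codes)
    assign_codes(node.right, prefix + "1", codes)
    return codes

# Hand-built tree for A (4/8), B (2/8), C (1/8), D (1/8); weights in eighths.
tree = Node(8,
            left=Node(4,
                      left=Node(2, left=Node(1, "D"), right=Node(1, "C")),
                      right=Node(2, "B")),
            right=Node(4, "A"))
print(assign_codes(tree))  # {'D': '000', 'C': '001', 'B': '01', 'A': '1'}
```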
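10 Huffman Algorithm
1. Maintain a forest of trees, starting with one single-leaf tree per symbol.
2. The weight of a tree is the sum of the frequencies of its leaves.
3. Repeat N − 1 times (N = number of symbols):
   – Select the two trees with the smallest weights.
   – Form a new tree with these two as its children.
After N − 1 merges a single tree remains; it is the encoding tree.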
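The forest procedure can be sketched with Python's `heapq` module as a priority queue, tracking only tree weights (`huffman_merge_weights` is a hypothetical name). It relies on a standard property not stated on the slides: the weights of the newly formed trees add up to the total number of encoded bits, because each leaf's frequency is counted once for every merge above it, i.e., once per bit of its codeword.

```python
import heapq

def huffman_merge_weights(freqs: list[int]) -> list[int]:
    """Run the forest procedure on tree weights only.

    Each tree starts as a single leaf; N - 1 merges leave one tree.
    Returns the weight of each new tree in the order it was formed.
    """
    forest = list(freqs)
    heapq.heapify(forest)              # min-heap: smallest weights first
    merged = []
    for _ in range(len(freqs) - 1):    # exactly N - 1 merges
        a = heapq.heappop(forest)      # smallest-weight tree
        b = heapq.heappop(forest)      # second smallest
        heapq.heappush(forest, a + b)  # new tree: weight = sum of its leaves
        merged.append(a + b)
    return merged

weights = huffman_merge_weights([4, 2, 1, 1])  # A, B, C, D occurrences
print(weights)       # [2, 4, 8]
print(sum(weights))  # 14 bits, the total for "AAAABBCD"
```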
11 Huffman coding
– A variable-length code in which a character's codeword gets shorter as its frequency increases.
– The code must be prefix-free (the "nonprefix property") to be uniquely decodable.
– Two-pass algorithm:
   – First pass: accumulate the character frequencies and generate the codebook.
   – Second pass: compress the data using the codebook.
12 Huffman coding
Create the codes by constructing a binary tree:
1. Consider all characters as free nodes.
2. Assign the two free nodes with the lowest frequencies to a new parent node whose weight is the sum of their frequencies.
3. Remove the two free nodes from the list and add the newly created parent node to the list of free nodes.
4. Repeat steps 2 and 3 until only one free node is left; it becomes the root of the tree.
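A compact two-pass coder in Python, sketched under these assumptions: the input is a text string, `collections.Counter` does the first (frequency) pass, `heapq` holds the free nodes, and an insertion counter breaks ties between equal weights. `build_codebook` and `compress` are illustrative names. Because tie-breaking and the 0/1 labelling are implementation choices, the exact bits may differ from a hand-built tree (for "AAAABBCD" this sketch gives A = 0, B = 10, C = 110, D = 111), but the total encoded size is the same.

```python
import heapq
from collections import Counter
from itertools import count

def build_codebook(freqs: dict[str, int]) -> dict[str, str]:
    """Build codes by repeatedly merging the two lowest-weight free nodes."""
    tie = count()  # unique tie-breaker so heap entries never compare trees
    # Each free node is (weight, tie, tree); a tree is a symbol or a (left, right) pair.
    free = [(w, next(tie), sym) for sym, w in freqs.items()]
    heapq.heapify(free)
    if len(free) == 1:                   # degenerate single-symbol input
        return {free[0][2]: "0"}
    while len(free) > 1:                 # step 4: until one free node is left
        w1, _, t1 = heapq.heappop(free)  # step 2: the two lowest-frequency nodes
        w2, _, t2 = heapq.heappop(free)
        heapq.heappush(free, (w1 + w2, next(tie), (t1, t2)))  # step 3: new parent
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[tree] = prefix
    walk(free[0][2], "")
    return codes

def compress(text: str) -> tuple[dict[str, str], str]:
    freqs = Counter(text)                      # pass 1: character frequencies
    codes = build_codebook(freqs)
    bits = "".join(codes[ch] for ch in text)   # pass 2: encode with the codebook
    return codes, bits

codes, bits = compress("AAAABBCD")
print(codes)      # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
print(len(bits))  # 14
```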
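13 Huffman coding
– Right branch of the binary tree: 1; left branch: 0.
– Prefix example: if e = "01" and b = "010", then "01" is a prefix of "010", so the bits 010 could be read as b or as e followed by the start of another codeword ("e0"); such a code is not uniquely decodable.
– Nodes with the same frequency: the choice of which goes left and which goes right must be made consistently, so that the encoder and decoder build the same tree.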
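A small Python sketch of greedy decoding (`greedy_decode` is a hypothetical helper) shows why the prefix property matters: the decoder emits a symbol as soon as its buffer matches a codeword, which is only safe when no codeword is the start of another.

```python
def greedy_decode(bits: str, codebook: dict[str, str]) -> tuple[str, str]:
    """Emit a symbol as soon as the buffered bits equal some codeword.

    Returns (decoded text, leftover bits that matched no codeword).
    Only guaranteed correct when the code has the prefix property.
    """
    reverse = {code: sym for sym, code in codebook.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return "".join(out), buf

# Not prefix-free: "01" (e) is the start of "010" (b), so b is never recovered.
print(greedy_decode("010", {"e": "01", "b": "010"}))  # ('e', '0')
# Prefix-free codes from the "Static Huffman Coding" slide decode cleanly.
print(greedy_decode("11110101001000",
                    {"A": "1", "B": "01", "C": "001", "D": "000"}))  # ('AAAABBCD', '')
```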
14 Example (64 data values, an 8 × 8 grid of colors)
RKKKKKKK
KKKRRKKK
KKRRRRGG
KKBCCCRR
GGGMCBRR
BBBMYBBR
GGGGGGGR
GRRRRGRR
15
Color   Frequency   Huffman code
=================================
R       19          00
K       17          01
G       14          10
B       7           110
C       4           1110
M       2           11110
Y       1           11111
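The frequencies and the resulting savings can be checked in Python. The sketch reads the third grid row as "KKRRRRGG" (the reading under which the counts match the table) and uses the codes from the frequency/code table; the 152-bit and 192-bit totals are computed here, not stated on the slides.

```python
from collections import Counter

# The 8x8 color grid from the "Example (64 data)" slide, one string per row.
rows = ["RKKKKKKK", "KKKRRKKK", "KKRRRRGG", "KKBCCCRR",
        "GGGMCBRR", "BBBMYBBR", "GGGGGGGR", "GRRRRGRR"]
# Huffman codes from the slide's frequency/code table.
codes = {"R": "00", "K": "01", "G": "10", "B": "110",
         "C": "1110", "M": "11110", "Y": "11111"}

data = "".join(rows)
freq = Counter(data)
huffman_bits = sum(freq[c] * len(code) for c, code in codes.items())
fixed_bits = len(data) * 3  # 7 colors need 3 bits each with fixed-length codes

print(freq.most_common())
# [('R', 19), ('K', 17), ('G', 14), ('B', 7), ('C', 4), ('M', 2), ('Y', 1)]
print(huffman_bits, fixed_bits)  # 152 192
```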
17 Static Huffman Coding
Huffman (code) tree
– Given: a number of symbols (or characters) and their relative probabilities, known in advance.
– The codes must hold the "prefix property".

Symbol   Occurrence
A        4/8
B        2/8
C        1/8
D        1/8

Symbol   Code
A        1
B        01
C        001
D        000

4×1 + 2×2 + 1×3 + 1×3 = 14 bits are required to transmit "AAAABBCD".
[Figure: Huffman tree with a root node (8), branch nodes (4, 2) and leaf nodes A, B, C, D; edges labelled 0 and 1, illustrating the prefix property]
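Encoding the example message with the table's codes, as a short Python sketch, reproduces the 14-bit count; the comparison with a fixed 2-bit code for four symbols is an addition, not from the slide.

```python
codes = {"A": "1", "B": "01", "C": "001", "D": "000"}  # from the slide's code table
message = "AAAABBCD"

encoded = "".join(codes[s] for s in message)
print(encoded)                      # 11110101001000
print(len(encoded))                 # 14 bits (4*1 + 2*2 + 1*3 + 1*3)
print(len(encoded) / len(message))  # 1.75 bits per symbol
print(2 * len(message))             # 16 bits with a fixed 2-bit code for 4 symbols
```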
18 The end