Slide 1: Data Compression (1)
Hai Tao, Department of Computer Engineering, University of California at Santa Cruz
Slide 2: Data Compression – Why?
- Storing or transmitting multimedia data requires large space or bandwidth.
- Audio: one hour of 44K-sample/sec, 16-bit stereo (two-channel) audio is 3600 x 44000 x 2 x 2 = 633.6 MB, which fills one CD (650 MB). MP3 compression can reduce this by a factor of about 10.
- Image: a 500 x 500 color image is 500 x 500 x 3 = 750 KB without compression; JPEG can reduce this by a factor of 10 to 20.
- Video: one minute of real-time, full-size color video is 60 x 30 x 640 x 480 x 3 = 1.659 GB, so a two-hour movie requires about 200 GB. MPEG-2 compression can bring this down to 4.7 GB (one DVD).
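The size figures above follow directly from multiplying duration, rate, and sample size. A quick sketch of the arithmetic, using the slide's round numbers (44,000 samples/sec rather than the CD standard of 44,100; function names are mine for illustration):

```python
# Back-of-the-envelope sizes for the uncompressed media on this slide.

def audio_bytes(seconds, sample_rate, bytes_per_sample, channels):
    """Raw PCM audio size in bytes."""
    return seconds * sample_rate * bytes_per_sample * channels

def image_bytes(width, height, bytes_per_pixel=3):
    """Raw RGB image size in bytes (3 bytes per pixel for 24-bit color)."""
    return width * height * bytes_per_pixel

def video_bytes(seconds, fps, width, height, bytes_per_pixel=3):
    """Raw video size in bytes (no audio track)."""
    return seconds * fps * image_bytes(width, height, bytes_per_pixel)

hour_of_cd_audio = audio_bytes(3600, 44_000, 2, 2)   # 633,600,000 bytes = 633.6 MB
color_image = image_bytes(500, 500)                  # 750,000 bytes = 750 KB
minute_of_video = video_bytes(60, 30, 640, 480)      # 1,658,880,000 bytes ≈ 1.659 GB
two_hour_movie = 120 * minute_of_video               # ≈ 199 GB, the slide's "200 GB"
```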
Slide 3: Compression methods
- Entropy coding: run-length coding, Huffman coding, arithmetic coding
- Source coding:
  - Prediction: DPCM, DM
  - Transformation: FFT, DCT
  - Layered coding: bit position, sub-sampling, sub-band coding
  - Vector quantization
- Hybrid coding: JPEG, MPEG, H.261, DVI RTV, DVI PLV
Slide 4: Run-length coding
- Example: a scanline of a binary image is
  00000 00000 00000 00000 00010 00000 00000 01000 00000 00000
  a total of 50 bits.
- However, strings of consecutive 0s or 1s can be represented more efficiently as runs:
  0(23) 1(1) 0(12) 1(1) 0(13)
- If each count is represented using 5 bits (plus one bit per run for the symbol), the data shrinks to 5 + 5*5 = 30 bits, a saving of 40% (the compressed data is 60% of the original size).
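The run-splitting step above can be sketched in a few lines. This is a minimal encoder/decoder for a binary string following the slide's (symbol, count) scheme; the function names are mine, not the slide's:

```python
# Minimal run-length coding for a binary string: each maximal run of
# identical symbols becomes a (symbol, count) pair.
from itertools import groupby

def rle_encode(bits: str):
    """Collapse e.g. '0001' into [('0', 3), ('1', 1)]."""
    return [(sym, len(list(run))) for sym, run in groupby(bits)]

def rle_decode(runs):
    """Inverse of rle_encode: expand each run back into symbols."""
    return "".join(sym * count for sym, count in runs)

# The slide's 50-bit scanline: 23 zeros, a one, 12 zeros, a one, 13 zeros.
scanline = "0" * 23 + "1" + "0" * 12 + "1" + "0" * 13
runs = rle_encode(scanline)
# runs == [('0', 23), ('1', 1), ('0', 12), ('1', 1), ('0', 13)]
assert rle_decode(runs) == scanline
```

With 5 runs, each stored as a 5-bit count plus a 1-bit symbol, this matches the slide's 30-bit total.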
Slide 5: Huffman coding
- Example: a language with 4 letters, "A", "B", "S", "Z". To uniquely encode each letter with a fixed-length code, we need two bits: A=00, B=01, S=10, Z=11. The message "AAABSAAAAZ" is then encoded with 20 bits.
- Now instead assign A=0, B=100, S=101, Z=11. The same message can be encoded using only 15 bits.
- The basic idea behind the Huffman coding algorithm is to assign shorter codewords to more frequently used symbols.
Slide 6: Huffman coding – Problem statement
- Given a set of N symbols S = {s_i, i=1,…,N} with probabilities of occurrence P_i, i=1,…,N, find the encoding of the symbols that achieves the minimum transmission rate (bits/symbol).
- Example: five symbols A, B, C, D, E with probabilities P(A)=0.16, P(B)=0.51, P(C)=0.09, P(D)=0.13, P(E)=0.11. Without Huffman coding, 3 bits are needed for each symbol.
Slide 7: Huffman coding – Algorithm
- Each symbol starts as a leaf node of a tree.
- Repeatedly combine the two symbols or composite symbols with the smallest probabilities into a new parent composite symbol, whose probability is the sum of the two; assign bits 0 and 1 to the two links.
- Continue this process until all symbols are merged into one root node.
- For each symbol, the sequence of 0s and 1s on the path from the root to that symbol's leaf is its codeword.
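The repeated "merge the two least probable nodes" step is naturally implemented with a min-heap. A sketch, assuming the slide's symbol probabilities (the heap-based structure and all names here are mine, not the slide's):

```python
# Huffman code construction via a min-heap. Heap entries are
# (probability, tie_breaker, node); a node is either a symbol string
# or a (left, right) pair for a composite node.
import heapq

def huffman_codes(probs: dict) -> dict:
    """Build a code table {symbol: bitstring} from {symbol: probability}."""
    heap = [(p, i, sym) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)                      # unique tie-breaker for heap entries
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # the two least probable nodes
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # composite node: 0 left, 1 right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"      # single-symbol edge case
        return codes
    return walk(heap[0][2], "")

probs = {"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}
codes = huffman_codes(probs)
# B gets a 1-bit code; A, C, D, E get 3-bit codes
# (the exact bit patterns depend on how 0/1 are assigned at each merge)
```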
Slide 8: Huffman coding – Example
- Step 1: combine the two least probable symbols, C (P(C)=0.09) and E (P(E)=0.11), into a composite node CE with P(CE)=0.20; label its two links 0 and 1.
- Step 2: combine the next two least probable nodes, D (P(D)=0.13) and A (P(A)=0.16), into AD with P(AD)=0.29; label its two links 0 and 1.
- Step 3: combine CE (0.20) and AD (0.29) into ACDE with P(ACDE)=0.49; label its two links 0 and 1.
Slide 9: Huffman coding – Example (continued)
- Step 4: combine ACDE (0.49) and B (0.51) into the root node, P(ABCDE)=1.
- Step 5: read off the codewords along the root-to-leaf paths: A=000, B=1, C=011, D=001, E=010.
- Expected bits/symbol: 3 x (0.16+0.09+0.13+0.11) + 1 x 0.51 = 3 x 0.49 + 1 x 0.51 = 1.98 bits/symbol.
- The saving over the 3-bit fixed-length code is (3 - 1.98)/3 = 1.02/3 = 34%.
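The expected-length and saving arithmetic above can be checked directly from the codeword lengths the tree produces (B gets 1 bit, the others 3 bits); the variable names here are mine:

```python
# Verify the slide's expected bits/symbol and saving for the example code
# A=000, B=1, C=011, D=001, E=010.
probs = {"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}
code_len = {"A": 3, "B": 1, "C": 3, "D": 3, "E": 3}

expected_bits = sum(probs[s] * code_len[s] for s in probs)  # 1.98 bits/symbol
saving = (3 - expected_bits) / 3                            # 0.34, i.e. 34%
```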