KUT Graduate Course
Data Compression
Jun-Ki Min

Data Compression
–Huge data leads to large processing time
–Exact result vs. approximated result
   Lossless compression: exact result
   Lossy compression: approximated result

Data Compression
–Advantages
   Reduced storage requirements
   Better data transfer performance
–Disadvantages
   Processing overhead
   Loss of some subtle information (lossy compression only)

General compression techniques
–Lossy compression
   DCT, Wavelet, Patricia Trie
   After compression, the original data representation cannot be reconstructed exactly.
–Lossless compression
   Static: uses fixed symbol probabilities
   Semi-adaptive: uses a preliminary pass over the file to gather statistics
   Adaptive: dynamically estimates the probability of each symbol

Lossless Compression: Static Scheme
Dictionary Encoding
–Assign an integer to each new word in the input
Run Length Encoding (RLE)
–Replace each sequence of identical values with a count field followed by an identifier for the repeated value
–Effective when the data contain enough runs of repeated values
Differential Encoding (Delta Encoding)
–Replace each sequence with a code value that defines its relationship to a specific sequence
–Effective when data are of uniform size and tend to vary relatively slowly

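The run-length and differential schemes above can be sketched in a few lines of Python. This is a minimal illustration, not a production codec: the function names, the (count, value) pair layout, and the numeric reading of delta encoding (store each value as its difference from the previous one) are assumptions made for the example.

```python
from itertools import accumulate, groupby


def rle_encode(data):
    """Replace each run of identical values with a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(data)]


def rle_decode(pairs):
    """Expand (count, value) pairs back into the original string."""
    return "".join(value * count for count, value in pairs)


def delta_encode(values):
    """Keep the first value, then store each value as a difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """A running sum of the deltas restores the original values."""
    return list(accumulate(deltas))


runs = rle_encode("AAABCCDD")
print(runs)                       # [(3, 'A'), (1, 'B'), (2, 'C'), (2, 'D')]
print(rle_decode(runs))           # AAABCCDD

readings = [100, 101, 103, 103, 104]   # hypothetical slowly varying data
deltas = delta_encode(readings)
print(deltas)                     # [100, 1, 2, 0, 1]
print(delta_decode(deltas))       # [100, 101, 103, 103, 104]
```

RLE only pays off when runs are long: the single "B" above costs a full pair. The deltas in the second example are small numbers, which is what makes slowly varying data cheap to store.
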
Lossless Compression: Example 1
Dictionary Encoding
–Input: ABC ABC BC DDD
–Compressed data: 1 1 2 3
–Dictionary: ABC = 1, BC = 2, DDD = 3
RLE
–Input: AAABCCDD
–Compressed data:
Differential Encoding
–Input: Johnson Jonah Jones Jorgenson
–Compressed data: (0)Johnson (2)nah (3)es (2)rgenson

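The dictionary and differential examples above can be reproduced with a short sketch. The function names are hypothetical, and the differential encoder is read here as "store the length of the prefix shared with the previous word, then the remaining suffix", which is the interpretation that matches the (0)Johnson (2)nah (3)es (2)rgenson output.

```python
from os.path import commonprefix


def dictionary_encode(words):
    """Assign the next integer to each new word; return (codes, dictionary)."""
    dictionary = {}
    codes = []
    for word in words:
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1  # codes start at 1, as in the example
        codes.append(dictionary[word])
    return codes, dictionary


def prefix_encode(words):
    """Encode each word as (shared prefix length with previous word, remaining suffix)."""
    encoded = []
    previous = ""
    for word in words:
        shared = len(commonprefix([previous, word]))
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def prefix_decode(encoded):
    """Rebuild each word from the previous word's prefix plus the stored suffix."""
    words = []
    previous = ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        words.append(previous)
    return words


codes, dictionary = dictionary_encode("ABC ABC BC DDD".split())
print(codes, dictionary)   # [1, 1, 2, 3] {'ABC': 1, 'BC': 2, 'DDD': 3}

names = ["Johnson", "Jonah", "Jones", "Jorgenson"]
print(prefix_encode(names))
# [(0, 'Johnson'), (2, 'nah'), (3, 'es'), (2, 'rgenson')]
```
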
Lossless Compression: Semi-Adaptive Scheme
Huffman Encoding
–The most frequent characters are assigned shorter codes and the less frequent characters are assigned longer codes
–Code length ≈ $\log_2(1/p)$ bits for a symbol with relative frequency $p$
–Relatively easy to implement
–Decompression is more complex: the length of each code to be read is not known until its first few bits have been interpreted
–The frequency distribution of the input symbols must be known in advance, which may require two scans of the input

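The two-scan idea and the code-length rule above can be illustrated with a small sketch. It is a minimal version that only computes code lengths (not the bit patterns); the function name and the sample string are assumptions made for the example.

```python
import heapq
from collections import Counter
from math import log2


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over the given frequencies."""
    # Heap items are (weight, tie_breaker, {symbol: depth so far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(weight, i, {symbol: 0}) for i, (symbol, weight) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        w1, _, depths1 = heapq.heappop(heap)   # the two lightest subtrees
        w2, _, depths2 = heapq.heappop(heap)
        # Merging puts every symbol of both subtrees one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


text = "aaaabbcd"                      # hypothetical input
freqs = Counter(text)                  # first scan: gather symbol frequencies
lengths = huffman_code_lengths(freqs)  # the second scan would emit the codes
for symbol in sorted(lengths):
    ideal = log2(len(text) / freqs[symbol])
    print(symbol, lengths[symbol], ideal)
```

For this input the frequencies are powers of 1/2, so the Huffman lengths (1, 2, 3, 3) coincide with $\log_2(1/p)$. For other distributions the lengths are integers close to that value rather than equal to it.
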
Lossless Compression: Example 2
Huffman Code
–Uses a full binary tree: every node has zero or two children
–Input: a1 a7 a4 a2
–Compressed data:
[Figure: Huffman tree with leaves a6, a5, a1, a2, a4, a3, a7]

Lossless Compression: Adaptive Scheme
LZ (Lempel-Ziv) Coding
–Adaptive dictionary encoding
–Converts variable-length strings into fixed-length codes
–Requires only one pass over the original data
–The original sequence must be long enough for the procedure to build up sufficient symbol-frequency experience to achieve good compression over the full ensemble

Lossless Compression: Example 3
LZ
–$O(n^2)$ for a string of n symbols
–Each new table entry is coded as (i, c)
   i: the codeword of the existing table entry (12 bits)
   c: the appended character (8 bits)
–Input: {A B AB AA ABA}
–Compressed data: {(0,A) (0,B) (1,B) (1,A) (3,A)}

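The (i, c) scheme in this example corresponds to LZ78-style parsing, which can be sketched as follows. The function names are hypothetical, codeword 0 is assumed to stand for the empty phrase, and the fixed 12-bit/8-bit packing is left out so that the output stays readable.

```python
def lz78_encode(text):
    """Parse text into (codeword of longest known phrase, next character) pairs."""
    table = {}      # phrase -> codeword; codeword 0 is reserved for the empty phrase
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in table:
            phrase += ch                          # keep extending the known phrase
        else:
            output.append((table.get(phrase, 0), ch))
            table[phrase + ch] = len(table) + 1   # new table entry
            phrase = ""
    if phrase:                                    # input ended inside a known phrase
        output.append((table[phrase], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text by replaying the same table construction."""
    phrases = [""]                                # index 0: empty phrase
    for index, ch in pairs:
        phrases.append(phrases[index] + ch)
    return "".join(phrases[1:])


pairs = lz78_encode("ABABAAABA")
print(pairs)               # [(0, 'A'), (0, 'B'), (1, 'B'), (1, 'A'), (3, 'A')]
print(lz78_decode(pairs))  # ABABAAABA
```

The decoder needs no dictionary from the encoder: it rebuilds the same table entry by entry, which is why a single pass over the data is enough.
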
Arithmetic Encoding
Intuition
–Represents a message by an interval within [0, 1)
–Successive symbols of the message reduce the size of the interval in accordance with the symbol probabilities
–A message is transformed into a variable-length bit string, so the decoder needs some way of knowing when to stop:
   send the size of the message, or
   always append an EOM (end-of-message) symbol

Arithmetic Encoding: Example

Symbol   Interval
a1       [0.00, 0.25)
a2       [0.25, 0.45)
a3       [0.45, 0.60)
a4       [0.60, 0.72)
a5       [0.72, 0.82)
a6       [0.82, 0.92)
a7       [0.92, 1.00)

Input: a1 a7 a4 a2
Output:
   initial: [0.0, 1.0)
   a1 => [0.0, 0.25)
   a7 => [0.0 + 0.25*0.92, 0.0 + 0.25*1.0) = [0.23, 0.25)
   a4 => [0.23 + 0.02*0.60, 0.23 + 0.02*0.72) = [0.242, 0.2444)
   a2 => [0.242 + 0.0024*0.25, 0.242 + 0.0024*0.45) = [0.2426, 0.24308)
Choose a binary value within [0.2426, 0.24308)

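The interval narrowing in this example can be checked with exact rational arithmetic. This is a sketch under stated assumptions: the names `MODEL`, `narrow`, and `shortest_binary` are hypothetical, and the assignment of a3, a5, and a6 to the three remaining intervals in index order is inferred (only a1, a2, a4, and a7 are fixed by the worked steps). The last step uses the exact width of the previous interval, 0.2444 - 0.242 = 0.0024, which gives [0.2426, 0.24308).

```python
from fractions import Fraction
from math import ceil

# Symbol -> [low, high) sub-interval of [0, 1); a3, a5, a6 are assumed placements.
MODEL = {
    "a1": (Fraction("0.00"), Fraction("0.25")),
    "a2": (Fraction("0.25"), Fraction("0.45")),
    "a3": (Fraction("0.45"), Fraction("0.60")),
    "a4": (Fraction("0.60"), Fraction("0.72")),
    "a5": (Fraction("0.72"), Fraction("0.82")),
    "a6": (Fraction("0.82"), Fraction("0.92")),
    "a7": (Fraction("0.92"), Fraction("1.00")),
}


def narrow(symbols, model):
    """Shrink [0, 1) once per symbol, in proportion to that symbol's sub-interval."""
    low, high = Fraction(0), Fraction(1)
    for symbol in symbols:
        width = high - low
        sym_low, sym_high = model[symbol]
        low, high = low + width * sym_low, low + width * sym_high
    return low, high


def shortest_binary(low, high):
    """Return the shortest bit string b such that 0.b (binary) lies in [low, high)."""
    bits = 1
    while True:
        numerator = ceil(low * 2**bits)   # smallest k with k / 2**bits >= low
        if Fraction(numerator, 2**bits) < high:
            return format(numerator, f"0{bits}b")
        bits += 1


low, high = narrow(["a1", "a7", "a4", "a2"], MODEL)
print(float(low), float(high))    # 0.2426 0.24308
print(shortest_binary(low, high))
```

Using `Fraction` avoids the rounding that creeps in when the interval width is truncated by hand; a real coder would instead use fixed-precision integers with renormalization.
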
Lossy Compression
Wavelet-Based
–Easy to compute
–A useful mathematical tool for hierarchically decomposing functions
–Represents a function in terms of a coarse overall shape plus progressively finer details
–Haar wavelet
   The Haar basis is the simplest wavelet basis
   Fast to compute and easy to implement

Haar Wavelet
Given one-dimensional data [9 7 3 5]
–Recursive pairwise averaging and differencing at different resolutions

Resolution   Average      Detail
4            [9 7 3 5]
2            [8 4]        [1, -1]
1            [6]          [2]

–The wavelet transform of the original data is given by [6 2 1 -1]
   c0 = (a1+a2+a3+a4)/4
   c1 = (a1+a2-a3-a4)/4
   c2 = (a1-a2)/2
   c3 = (a3-a4)/2

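The pairwise averaging and differencing can be sketched as below. The function name is hypothetical, and the input [9, 7, 3, 5] is the data implied by the table's averages [8 4] and details [1, -1] (8 ± 1 and 4 ± (-1)). The sketch assumes the input length is a power of two.

```python
def haar_transform(data):
    """Unnormalized Haar decomposition: [overall average, coarsest detail, ..., finest details]."""
    averages = list(data)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Details of the coarser level go in front of the finer ones collected so far.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


print(haar_transform([9, 7, 3, 5]))   # [6.0, 2.0, 1.0, -1.0]
```

The result matches c0..c3 as defined by the formulas: c0 = 24/4 = 6, c1 = (16 - 8)/4 = 2, c2 = (9 - 7)/2 = 1, c3 = (3 - 5)/2 = -1.
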
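Reconstruction of Data: Use an Error Tree
–Each data value is the sum of the coefficients on its root-to-leaf path, added for a left branch and subtracted for a right branch
–Example: d3 = c0 - c1 + c3 = 6 - 2 + (-1) = 3
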
Haar Wavelet Compression
–A large number of the detail coefficients turn out to be very small in magnitude
–Removing these small coefficients introduces only small errors, which yields lossy compression
–Example: for original data [9 7 3 5]
   The Haar wavelet coefficients are [6 2 1 -1]
   Let us keep only two coefficients, 6 and 2
   Then we have [6 2 0 0]
   Reconstructed data = [8 8 4 4]

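Thresholding and reconstruction can be sketched together. The function names are hypothetical; the coefficients [6, 2, 1, -1] are the unnormalized Haar coefficients of [9, 7, 3, 5], the data implied by the earlier resolution table. The sketch keeps the two largest coefficients (6 and 2) and inverts the transform.

```python
def keep_largest(coeffs, k):
    """Zero out all but the k coefficients of largest magnitude."""
    by_magnitude = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(by_magnitude[:k])
    return [c if i in kept else 0 for i, c in enumerate(coeffs)]


def haar_inverse(coeffs):
    """Invert the unnormalized Haar transform, one resolution level at a time."""
    values = [coeffs[0]]              # start from the overall average
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        # Each average splits into (average + detail, average - detail).
        values = [v for avg, d in zip(values, details) for v in (avg + d, avg - d)]
        pos += len(details)
    return values


coeffs = [6, 2, 1, -1]
print(haar_inverse(coeffs))                    # [9, 7, 3, 5]
truncated = keep_largest(coeffs, 2)
print(truncated)                               # [6, 2, 0, 0]
print(haar_inverse(truncated))                 # [8, 8, 4, 4]
```

Each reconstructed value is off by exactly 1 here, because the two dropped details have magnitude 1. Selecting by raw magnitude is a simplification: to minimize squared error, the coefficients are normally normalized before thresholding.
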
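Energy Preservation and Thresholding
$$\mathrm{Energy}(X) = \sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} w_i^2$$
–$x_i$ is an original value and $w_i$ is the corresponding (normalized) wavelet coefficient
L2-norm (squared error)
$$\sum_{i=1}^{N} (x_i - y_i)^2 = \sum_{i=1}^{N} (w_i - \hat{w}_i)^2 = \sum_{i=k+1}^{N} w_i^2$$
–$y_i$ is a reconstructed value and $\hat{w}_i$ is a retained coefficient: $\hat{w}_i = w_i$ for $i \le k$ and $\hat{w}_i = 0$ for $i > k$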
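Both identities can be checked numerically. The sketch below assumes the orthonormal Haar transform, which divides by $\sqrt{2}$ at each level instead of by 2; the energy identity holds for these normalized coefficients but not for the plain averages and differences. Function names and the sample data [9, 7, 3, 5] are assumptions made for the example.

```python
from math import isclose, sqrt


def haar_orthonormal(data):
    """Orthonormal Haar transform: sums and differences are scaled by 1/sqrt(2) per level."""
    approx, details = list(data), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    return approx + details


def haar_orthonormal_inverse(coeffs):
    """Inverse of haar_orthonormal."""
    values, pos = [coeffs[0]], 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        values = [v for s, d in zip(values, details)
                  for v in ((s + d) / sqrt(2), (s - d) / sqrt(2))]
        pos += len(details)
    return values


def energy(values):
    return sum(v * v for v in values)


x = [9, 7, 3, 5]
w = haar_orthonormal(x)                  # approximately [12, 4, 1.414, -1.414]
print(energy(x), energy(w))              # both are 164 up to rounding

k = 2
w_hat = w[:k] + [0.0] * (len(w) - k)     # keep the first k coefficients, zero the rest
y = haar_orthonormal_inverse(w_hat)      # approximately [8, 8, 4, 4]
squared_error = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
print(squared_error, energy(w[k:]))      # both are 4 up to rounding
```

The squared reconstruction error equals the energy of the dropped coefficients, so dropping the coefficients of smallest normalized magnitude minimizes the L2 error for a given k.
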