KUT Graduate Course: Data Compression, Jun-Ki Min

Data Compression
–Huge data leads to large processing time
–Exact result (lossless compression) vs. approximated result (lossy compression)

Data Compression
–Advantages
  Reduced storage requirements
  Better data transfer performance
–Disadvantages
  Processing overhead
  Loss of some subtle information (with lossy compression)

General compression techniques
–Lossy compression
  DCT, Wavelet, Patricia Trie
  After compression, the original data representation cannot be reconstructed exactly.
–Lossless compression
  static: uses fixed symbol probabilities
  semi-adaptive: uses a preliminary pass over the file to gather statistics
  adaptive: dynamically estimates the probability of each symbol

Lossless Compression: Static Scheme
Dictionary Encoding
–Assign an integer to each new word in the input
Run Length Encoding (RLE)
–Replace each sequence of identical values with a count field and an identifier for the repeated value
–Effective when the data contains enough runs of repeated values
Differential Encoding (Delta Encoding)
–Replace each sequence with a code value that defines its relationship to a specific sequence, for example the preceding one
–Effective when data are of uniform size and tend to vary relatively slowly

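Lossless Compression: Example 1
Dictionary Encoding
–Input: ABC ABC BC DDD
–Compressed data: 1 1 2 3
–Dictionary: ABC = 1, BC = 2, DDD = 3
RLE
–Input: AAABCCDD
–Compressed data: 13 21 32 42 (each pair appears to read as symbol number, then run length, with A = 1, B = 2, C = 3, D = 4)
Differential Encoding
–Input: Johnson Jonah Jones Jorgenson
–Compressed data: (0)Johnson (2)nah (3)es (2)rgenson (the number is the length of the prefix shared with the previous word)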
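
The three static schemes in Example 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `dictionary_encode`, `run_length_encode`, and `front_encode` are hypothetical, and the run-length encoder returns (character, count) pairs rather than the two-digit codes printed on the slide.

```python
def dictionary_encode(text):
    """Assign 1, 2, 3, ... to each new word, in order of first appearance."""
    dictionary = {}
    codes = []
    for word in text.split():
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
        codes.append(dictionary[word])
    return codes, dictionary


def run_length_encode(text):
    """Collapse each run of identical characters into a (character, count) pair."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def front_encode(words):
    """Store each word as (length of prefix shared with the previous word, suffix)."""
    encoded = []
    previous = ""
    for word in words:
        shared = 0
        limit = min(len(word), len(previous))
        while shared < limit and word[shared] == previous[shared]:
            shared += 1
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded
```

Calling `front_encode(["Johnson", "Jonah", "Jones", "Jorgenson"])` yields the same (prefix length, suffix) pairs as the differential encoding example.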

Lossless Compression: Semi-Adaptive Scheme
Huffman Encoding
–The most frequent characters are assigned shorter codes and the less frequent characters are assigned longer codes
–Code length $\approx \log_2(1/p)$, where $p$ is the relative frequency of the symbol
–Relatively easy to implement
–Decompression is more complex: the length of each code is not known until its first few bits have been interpreted
–The frequency distribution of the input symbols must be known in advance, which may require two scans of the data

Lossless Compression: Example 2
Huffman Code
–Uses a full binary tree: every node has zero or two children
–Input: a1 a7 a4 a2
–Compressed data: 01 0001 100 11
[Figure: Huffman tree over the symbols a1 to a7]
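
A minimal Huffman coder can be sketched as follows. The symbol weights are an assumption: they are taken from the interval widths of the arithmetic encoding example in this deck, since the tree figure itself is not recoverable. The exact bit patterns depend on how ties and left/right branches are chosen, so they may differ from the slide, but the code lengths for a1, a2, a4, and a7 come out the same (2, 2, 3, and 4 bits). The names `huffman_codes`, `encode`, and `decode` are hypothetical.

```python
import heapq
from itertools import count

# Assumed weights (percent), borrowed from the arithmetic encoding example.
WEIGHTS = {"a1": 25, "a2": 20, "a3": 15, "a4": 12, "a5": 10, "a6": 10, "a7": 8}


def huffman_codes(weights):
    """Build a prefix code by repeatedly merging the two lightest subtrees."""
    if len(weights) == 1:
        return {symbol: "0" for symbol in weights}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(w, next(tie), {symbol: ""}) for symbol, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]


def encode(symbols, codes):
    """Concatenate the code of each symbol into one bit string."""
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Read bits until they match a code; the prefix property makes this unambiguous."""
    inverse = {c: s for s, c in codes.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    return symbols
```

The decoder illustrates why decompression is the harder direction: it cannot know where a code ends until enough bits have been read to match one.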

Lossless Compression: Adaptive Scheme
LZ (Lempel-Ziv) Coding
–Adaptive dictionary encoding
–Converts variable-length strings into fixed-length codes
–Requires only one pass over the original data
–The original sequence must be long enough for the procedure to build up enough symbol frequency experience to achieve good compression over the full ensemble

Lossless Compression: Example 3
LZ
–$O(n^2)$ for a string of $n$ symbols
–Each new table entry is coded as (i, c)
  i: the codeword of an existing table entry (12 bits); 0 denotes the empty string
  c: the appended character (8 bits)
–Input: {A B AB AA ABA}
–Compressed data: {(0,A)(0,B)(1,B)(1,A)(3,A)}
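
The (i, c) scheme in Example 3 matches what is usually called LZ78. The sketch below is one way to write it, assuming the input is a plain character string and ignoring the 12-bit and 8-bit field widths; `lz78_encode` and `lz78_decode` are hypothetical names.

```python
def lz78_encode(text):
    """Emit (index of longest known phrase, next character) pairs."""
    table = {"": 0}  # index 0 is the empty phrase
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in table:
            phrase += ch  # keep extending while the phrase is already known
        else:
            output.append((table[phrase], ch))
            table[phrase + ch] = len(table)  # register the new phrase
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it as (prefix, last char).
        output.append((table[phrase[:-1]], phrase[-1]))
    return output


def lz78_decode(pairs):
    """Rebuild the phrase table while decoding, mirroring the encoder."""
    table = [""]
    pieces = []
    for index, ch in pairs:
        entry = table[index] + ch
        table.append(entry)
        pieces.append(entry)
    return "".join(pieces)
```

Encoding the string `"ABABAAABA"` (the phrases A, B, AB, AA, ABA run together) reproduces the five pairs listed in the example. The decoder needs no transmitted dictionary, which is what makes the scheme adaptive and single-pass.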

Arithmetic Encoding
Intuition
–Represents a message by an interval
–Successive symbols of the message reduce the size of the interval in accordance with the symbol probabilities
–A message is transformed into a variable-sized bit string, so the decoder needs some way of knowing when to stop:
  sending the size of the message, or
  always attaching an EOM (end-of-message) symbol

Arithmetic Encoding: Example
Symbol intervals
  a1: [0.00, 0.25)
  a2: [0.25, 0.45)
  a3: [0.45, 0.60)
  a4: [0.60, 0.72)
  a5: [0.72, 0.82)
  a6: [0.82, 0.92)
  a7: [0.92, 1.00)
Input: a1 a7 a4 a2
Output:
  initial: [0.0, 1.0)
  a1 => [0.0, 0.25)
  a7 => [0.0 + (0.25 * 0.92), 0.0 + (0.25 * 1.0)) = [0.23, 0.25)
  a4 => [0.23 + (0.02 * 0.60), 0.23 + (0.02 * 0.72)) = [0.242, 0.2444)
  a2 => [0.242 + (0.0024 * 0.25), 0.242 + (0.0024 * 0.45)) = [0.2426, 0.24308)
Choose a binary value within [0.2426, 0.24308)
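
The interval narrowing can be checked with a short sketch. It uses exact fractions to avoid floating-point rounding, and it covers only the narrowing step, not the choice of the final binary value or end-of-message handling. The names `INTERVALS` and `narrow` are hypothetical.

```python
from fractions import Fraction

# Symbol intervals from the example, as exact fractions.
INTERVALS = {
    "a1": (Fraction("0.00"), Fraction("0.25")),
    "a2": (Fraction("0.25"), Fraction("0.45")),
    "a3": (Fraction("0.45"), Fraction("0.60")),
    "a4": (Fraction("0.60"), Fraction("0.72")),
    "a5": (Fraction("0.72"), Fraction("0.82")),
    "a6": (Fraction("0.82"), Fraction("0.92")),
    "a7": (Fraction("0.92"), Fraction("1.00")),
}


def narrow(message, intervals=INTERVALS):
    """Return the final [low, high) interval after processing every symbol."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low  # width of the current interval
        sym_low, sym_high = intervals[symbol]
        # Both new bounds are measured from the old low end.
        low, high = low + width * sym_low, low + width * sym_high
    return low, high
```

Each step scales the symbol's interval by the current width, so the width after the whole message equals the product of the symbol probabilities.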

Lossy Compression
Wavelet-Based
–Easy to compute
–A useful mathematical tool for hierarchically decomposing functions
–Represents a function in terms of a coarse overall shape plus successively finer details
Haar Wavelet
–The Haar basis is the simplest wavelet basis
–Fast to compute and easy to implement

Haar Wavelet
Given one-dimensional data [9 7 3 5]
–Recursive pairwise averaging and differencing at different resolutions
  Resolution   Averages     Detail coefficients
  4            [9 7 3 5]
  2            [8 4]        [1 -1]
  1            [6]          [2]
–The wavelet transform of the original data is [6 2 1 -1]
  c0 = (a1 + a2 + a3 + a4) / 4
  c1 = (a1 + a2 - a3 - a4) / 4
  c2 = (a1 - a2) / 2
  c3 = (a3 - a4) / 2
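
The recursive averaging and differencing can be sketched as below, assuming the input length is a power of two. The function name `haar_transform` is hypothetical.

```python
def haar_transform(data):
    """Unnormalized Haar transform: overall average first, then details coarse to fine."""
    values = list(data)
    details = []
    while len(values) > 1:
        pairs = range(0, len(values), 2)
        averages = [(values[i] + values[i + 1]) / 2 for i in pairs]
        differences = [(values[i] - values[i + 1]) / 2 for i in pairs]
        # Coarser details go in front of the finer ones collected so far.
        details = differences + details
        values = averages
    return values + details
```

Each pass halves the number of averages, so the total work is linear in the input length.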

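Reconstruction of Data: Use an Error Tree
–Each data value is recovered by adding or subtracting the coefficients on the path from the root of the error tree to that value
–e.g.) d3 = c0 - c1 + c3 = 6 - 2 + (-1) = 3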

Haar Wavelet Compression
–A large number of the detail coefficients turn out to be very small in magnitude
–Removing these small coefficients introduces only small errors, which gives lossy compression
–e.g.) For original data [9 7 3 5]
  The Haar wavelet coefficients are [6 2 1 -1]
  Keep only two coefficients, 6 and 2
  Then we have [6 2 0 0]
  Reconstructed data = [8 8 4 4]
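
A sketch of the truncation step and the reconstruction it leads to, assuming the coefficient layout of the example (overall average first, then details from coarse to fine) and a power-of-two length. The names `haar_inverse` and `keep_first` are hypothetical.

```python
def haar_inverse(coefficients):
    """Rebuild data from unnormalized Haar coefficients."""
    values = [coefficients[0]]
    position = 1
    while position < len(coefficients):
        details = coefficients[position:position + len(values)]
        finer = []
        for average, detail in zip(values, details):
            # Each average splits into (average + detail, average - detail).
            finer.extend([average + detail, average - detail])
        position += len(values)
        values = finer
    return values


def keep_first(coefficients, k):
    """Lossy step: keep the first k coefficients and zero out the rest."""
    return list(coefficients[:k]) + [0] * (len(coefficients) - k)
```

With all four coefficients the inverse returns the original data exactly; the loss comes only from the zeroed coefficients.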

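Energy Preservation and Thresholding
–Energy(X) = $\sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} w_i^2$
  $x_i$ is an original value and $w_i$ is a wavelet coefficient; the equality holds for the coefficients of a normalized (orthonormal) wavelet basis
–L2-norm (squared error)
  $\sum_{i=1}^{N} (x_i - y_i)^2 = \sum_{i=1}^{N} (w_i - w'_i)^2 = \sum_{i=k+1}^{N} w_i^2$
  where $y_i$ is a reconstructed value and $w'_i$ is a retained coefficient, since $w'_i = w_i$ for $i \le k$ and $w'_i = 0$ for $i > k$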
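
Both identities can be checked numerically with an orthonormal Haar transform, in which sums and differences are divided by $\sqrt{2}$ instead of 2. This is a sketch under that assumption, with hypothetical function names, and it relies on floating-point arithmetic, so comparisons should use a tolerance.

```python
import math

SQRT2 = math.sqrt(2)


def normalized_haar(data):
    """Orthonormal Haar transform (input length must be a power of two)."""
    values = list(data)
    details = []
    while len(values) > 1:
        pairs = range(0, len(values), 2)
        sums = [(values[i] + values[i + 1]) / SQRT2 for i in pairs]
        differences = [(values[i] - values[i + 1]) / SQRT2 for i in pairs]
        details = differences + details
        values = sums
    return values + details


def normalized_haar_inverse(coefficients):
    """Inverse of normalized_haar."""
    values = [coefficients[0]]
    position = 1
    while position < len(coefficients):
        details = coefficients[position:position + len(values)]
        finer = []
        for s, d in zip(values, details):
            finer.extend([(s + d) / SQRT2, (s - d) / SQRT2])
        position += len(values)
        values = finer
    return values


def energy(values):
    """Sum of squared values."""
    return sum(v * v for v in values)
```

For the data [9 7 3 5], the normalized coefficients are approximately [12, 4, 1.414, -1.414], and both the data and the coefficients have energy 164. Dropping the last two coefficients reconstructs approximately [8 8 4 4], whose squared error of 4 equals the energy of the two dropped coefficients.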
