1
Context-based Data Compression
Xiaolin Wu, Polytechnic University, Brooklyn, NY. Part 3: Context Modeling
2
Context model – estimated symbol probability
Variable-length coding schemes need an estimate of the probability of each symbol, i.e. a model. The model can be:
- Static: a fixed global model for all inputs (e.g. English text)
- Semi-adaptive: computed for the specific data being coded and transmitted as side information (e.g. C programs)
- Adaptive: constructed on the fly (any source!)
3
Adaptive vs. Semi-adaptive
Advantages of semi-adaptive:
- Simple decoder
Disadvantages of semi-adaptive:
- Overhead of specifying the model can be high
- Two passes over the data are required
Advantages of adaptive:
- One pass, universal
- As good if not better
Disadvantages of adaptive:
- Decoder is as complex as the encoder
- Errors propagate
4
Adaptation with Arithmetic and Huffman Coding
Huffman coding:
- Manipulate the Huffman tree on the fly
- Efficient algorithms are known, but they remain complex
Arithmetic coding:
- Update the cumulative probability distribution table; efficient data structures and algorithms are known
- The rest stays essentially the same
The main advantage of arithmetic coding over Huffman coding is the ease with which the former can be combined with adaptive modeling techniques (a minimal sketch follows).
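To make the adaptive case concrete, here is a minimal sketch of the model side of an adaptive arithmetic coder; AdaptiveFreqModel and its methods are illustrative names, not a specific coder's API, and the interval-arithmetic encode/decode routines are omitted.

```python
# Minimal sketch of the adaptive-model side of an arithmetic coder.
# A real coder pairs this with interval-narrowing encode/decode routines.

class AdaptiveFreqModel:
    def __init__(self, alphabet_size):
        # Start every count at 1 so no symbol ever has zero probability.
        self.counts = [1] * alphabet_size
        self.total = alphabet_size

    def cum_range(self, symbol):
        """Return (low, high, total) cumulative counts for the symbol,
        which is what the arithmetic coder needs to narrow its interval."""
        low = sum(self.counts[:symbol])
        return low, low + self.counts[symbol], self.total

    def update(self, symbol):
        # Adaptation: bump the count after coding each symbol.
        self.counts[symbol] += 1
        self.total += 1

model = AdaptiveFreqModel(alphabet_size=2)
for s in [0, 0, 1, 0, 1, 1, 0]:
    low, high, total = model.cum_range(s)  # fed to the arithmetic coder
    model.update(s)                        # encoder and decoder update identically
```

Because the encoder and decoder perform the same update after each symbol, no side information is needed, which is the one-pass, universal property noted on the previous slide.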
5
Context models
- If the source is not i.i.d., there are complex dependences between the symbols in the sequence.
- In most practical situations, the pdf of a symbol depends on neighboring symbol values, i.e. its context. Hence we condition the encoding of the current symbol on its context.
- How to select contexts? A rigorous answer is beyond our scope; practical schemes use a fixed neighborhood.
6
Context dilution problem
The minimum code length of a sequence s1 s2 ... sn achievable by arithmetic coding is -log2 P(s1 s2 ... sn) bits, if P(s1 s2 ... sn) is known. The difficulty of estimating P(s1 s2 ... sn) due to insufficient sample statistics prevents the use of high-order Markov models: the number of contexts grows exponentially with the model order, so the counts within each context become too sparse for reliable probability estimates (context dilution).
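To make the dilution argument concrete, here is the decomposition the bound relies on, written in my own notation under an order-k Markov assumption over alphabet A:

```latex
% Arithmetic coding approaches this bound when the model probabilities are exact:
-\log_2 P(s_1 \dots s_n) \;=\; \sum_{t=1}^{n} -\log_2 P\!\left(s_t \mid s_{t-k} \dots s_{t-1}\right)
% An order-k model must estimate a conditional pmf for each of the |A|^k contexts,
% so the number of samples available per context shrinks rapidly as k grows.
```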
7
Estimating probabilities in different contexts
Two approaches (contrasted in the sketch below):
- Maintain symbol occurrence counts within each context; the number of contexts must be kept modest to avoid context dilution.
- Assume the pdf shape within each context is the same (e.g. Laplacian) and only the parameters (e.g. mean and variance) differ; the estimates may not be as accurate, but a much larger number of contexts can be used.
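The sketch below contrasts the two approaches; the function names and the Laplacian fit are illustrative, not the lecture's exact formulation.

```python
# Approach 1: per-context symbol counts.  Approach 2: a fixed pdf shape
# (Laplacian) per context, with only its parameters estimated.
from collections import Counter, defaultdict
import math

counts = defaultdict(Counter)              # context -> symbol counts

def prob_by_counts(context, symbol, alphabet_size):
    c = counts[context]
    # +1 (Laplace) smoothing keeps unseen symbols at nonzero probability.
    return (c[symbol] + 1) / (sum(c.values()) + alphabet_size)

samples = defaultdict(list)                # context -> observed values

def laplacian_density(context, value):
    xs = samples[context]
    mean = sum(xs) / len(xs)
    b = max(sum(abs(x - mean) for x in xs) / len(xs), 1e-6)  # scale parameter
    # A real coder would integrate this density over the quantizer bin
    # to obtain a probability mass for the entropy coder.
    return math.exp(-abs(value - mean) / b) / (2 * b)

# Toy usage
counts[(0, 1)][1] += 5
samples[(0, 1)].extend([-1, 0, 0, 2])
print(prob_by_counts((0, 1), 1, alphabet_size=2))
print(laplacian_density((0, 1), 0))
```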
8
Entropy (Shannon 1948)
Consider a random variable X with source alphabet A = {a1, a2, ..., aN} and probability mass function p(ai) = Prob{X = ai}.
The self-information of the event X = ai is i(ai) = -log p(ai), measured in bits if the log is base 2; an event of lower probability carries more information.
The entropy is the weighted average of the self-information: H(X) = -sum_i p(ai) log p(ai).
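A quick numeric check of the definition (my own example): for a binary source with p = (0.9, 0.1),

```latex
H(X) = -0.9\log_2 0.9 - 0.1\log_2 0.1 \approx 0.9(0.152) + 0.1(3.322) \approx 0.469 \text{ bits,}
```

well below the 1 bit per symbol of a uniform binary source.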
9
Conditional Entropy
Consider two random variables X and Y, with alphabet of X equal to A = {a1, ..., aN} and alphabet of Y equal to B = {b1, ..., bM}.
The conditional self-information of the event X = ai, given Y = bj, is i(ai | bj) = -log p(ai | bj).
The conditional entropy H(X|Y) is the average value of the conditional self-information: H(X|Y) = -sum_{i,j} p(ai, bj) log p(ai | bj).
10
Entropy and Conditional Entropy
The conditional entropy H(X|Y) can be interpreted as the amount of uncertainty remaining about X, given that we know the random variable Y. The additional knowledge of Y should reduce the uncertainty about X: H(X|Y) <= H(X).
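A small numeric illustration of this reduction (my own example), with binary X and Y, joint pmf p(0,0) = p(1,1) = 0.4 and p(0,1) = p(1,0) = 0.1, and H_b denoting the binary entropy function:

```latex
H(X) = H_b(0.5) = 1 \text{ bit}, \qquad
H(X \mid Y) = \sum_{j} p(b_j)\, H(X \mid Y = b_j) = H_b(0.8) \approx 0.722 \text{ bits} \le H(X)
```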
11
Context Based Entropy Coders
[Figure: a binary sequence feeds an entropy coder (EC); the current symbol S is coded using its previous symbols as its context. Example per-context pmfs: A(0.2, 0.8), B(0.4, 0.6), C(0.9, 0.1).]
Now let us see how a context-based entropy coder works. Consider a sequence of symbols; in a transform coding system these symbols are the quantization indices. Let S be the symbol to be coded, and define some of the previously coded symbols as its context: it could be the three preceding symbols, or more. The context space C is the set of all possible values the context can take. For example, if the sequence is binary and we take the previous 3 symbols as the context, there are 2^3 = 8 possible contexts.
To design a context-based entropy coder, we first estimate the pmf of the symbols under each context: p_i(s) is the probability of symbol s under the i-th context c_i. For a binary sequence the pmf under some context might be p(1) = 0.6 and p(0) = 0.4, summing to 1. From this pmf we design a code for the symbols under that context, and the average code length approaches the entropy of that pmf. The final rate is close to the average of these entropies over all contexts, which is exactly the conditional entropy. It can be shown that the more contexts we use, the lower the conditional entropy becomes. But does more context always mean a lower bit rate in practice? The answer is no (recall the context dilution problem). A concrete version of this walkthrough appears in the sketch below.
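The following sketch is a concrete version of the walkthrough: order-3 contexts over a binary sequence, per-context pmf estimation from counts, and the resulting conditional entropy. The function name and the toy sequence are mine.

```python
# Order-3 contexts over a binary sequence: estimate a pmf per context,
# then average the per-context entropies to get H(S | context).
from collections import Counter, defaultdict
import math

def conditional_entropy(seq, k=3):
    counts = defaultdict(Counter)          # context -> symbol counts
    for i in range(k, len(seq)):
        ctx = tuple(seq[i - k:i])          # previous k symbols form the context
        counts[ctx][seq[i]] += 1

    n = sum(sum(c.values()) for c in counts.values())
    h = 0.0
    for ctx, c in counts.items():
        total = sum(c.values())
        p_ctx = total / n                  # weight of this context
        h_ctx = -sum((m / total) * math.log2(m / total) for m in c.values())
        h += p_ctx * h_ctx                 # average entropy over contexts
    return h                               # bits per symbol

seq = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1] * 8
print(conditional_entropy(seq, k=3))       # below the order-0 entropy of 1 bit
```

Raising k lowers the measured conditional entropy, but it also thins out the counts in each context, which is exactly the context dilution trade-off of the earlier slide.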
12
Decorrelation techniques to exploit sample smoothness
- Transforms: DCT, FFT, wavelets
- Differential Pulse Code Modulation (DPCM): predict the current symbol from past observations and code the prediction residual rather than the symbol itself (a minimal sketch follows)
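A minimal DPCM sketch, assuming the simplest possible predictor (the previous sample) and no quantization of the residual; real schemes use richer causal predictors.

```python
# DPCM: code the prediction residual instead of the sample itself.

def dpcm_encode(samples):
    prev = 0
    residuals = []
    for x in samples:
        residuals.append(x - prev)   # residual has a peaked, low-entropy pmf
        prev = x
    return residuals

def dpcm_decode(residuals):
    prev = 0
    samples = []
    for r in residuals:
        prev += r                    # add the prediction back
        samples.append(prev)
    return samples

signal = [10, 12, 13, 13, 15, 18, 20, 19]
res = dpcm_encode(signal)            # [10, 2, 1, 0, 2, 3, 2, -1]
assert dpcm_decode(res) == signal
```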
13
Benefits of prediction and transform
- A priori knowledge is exploited to reduce the self-entropy of the source symbols.
- Higher coding efficiency due to:
  - Fewer parameters to be estimated adaptively
  - Faster convergence of the adaptation
14
Further Reading
- T. Bell, J. Cleary and I. Witten, Text Compression, Prentice Hall. Good coverage of statistical context modeling, though the focus is on text.
- Articles by Rissanen and Langdon in the IEEE Transactions on Information Theory.
- Jayant and Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Good coverage of predictive coding.