Published by Meryl Jackson. Modified over 8 years ago.
1
Information theory
Data compression perspective
Pasi Fränti, 4.2.2016
2
Bits and Codes
One bit: 0 and 1
Two bits: 00, 01, 10, 11
Four bits: 0000, 0001, 0010 … 1111 (16 values)
Eight bits: $2^8 = 256$ values (e.g. ASCII code)
$k$ bits give $2^k$ values
$N$ values need $\log_2 N$ bits
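The relation between bits and values can be checked with a small sketch. The function names `values_for_bits` and `bits_needed` are illustrative, not from the slides, and the bit count is rounded up to a whole number of bits when N is not a power of two.

```python
def values_for_bits(k: int) -> int:
    """Number of distinct values that k bits can represent (2^k)."""
    return 2 ** k


def bits_needed(n_values: int) -> int:
    """Smallest whole number of bits that distinguishes n_values values.

    Equals ceil(log2(n_values)), computed with integers to avoid
    floating-point rounding.
    """
    if n_values < 1:
        raise ValueError("need at least one value")
    return (n_values - 1).bit_length()


print(values_for_bits(4))   # four bits
print(bits_needed(256))     # one byte
print(bits_needed(5))       # not a power of two, so rounded up
```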
3
Entropy
Self-entropy of a symbol: $H(x) = -\log_2 p(x)$
Entropy of the source: $H = -\sum_x p(x) \log_2 p(x)$
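A minimal sketch of the two quantities, assuming the standard definitions (self-entropy as the negative base-2 logarithm of the symbol probability, source entropy as its expected value). The function names are illustrative.

```python
import math


def self_entropy(p: float) -> float:
    """Information content of a symbol with probability p, in bits."""
    return -math.log2(p)


def entropy(probs) -> float:
    """Entropy of a source in bits per symbol.

    Symbols with zero probability contribute nothing to the sum.
    """
    return sum(p * self_entropy(p) for p in probs if p > 0)


print(self_entropy(0.25))        # a symbol with p = 1/4
print(entropy([0.5, 0.5]))       # fair binary source
print(entropy([0.875, 0.125]))   # skewed binary source
```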
4
Prefix code
Example of a prefix code: a = 0, b = 10, c = 110, d = 111
Example of a non-prefix code: a = 0, b = 01, c = 011, d = 111
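The prefix property can be tested mechanically. The sketch below uses hypothetical helper names `is_prefix_code` and `decode`: it checks that no code word is a prefix of another, then decodes a bit string by matching code words greedily from the left, which is unambiguous only for a prefix code.

```python
def is_prefix_code(codewords) -> bool:
    """True if no code word is a prefix of another code word.

    After sorting, a word that is a prefix of any other word is also
    a prefix of its immediate successor, so adjacent pairs suffice.
    """
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits: str, code: dict) -> str:
    """Decode a bit string with a prefix code given as symbol -> code word."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:       # a complete code word has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)


prefix = {"a": "0", "b": "10", "c": "110", "d": "111"}
non_prefix = {"a": "0", "b": "01", "c": "011", "d": "111"}

print(is_prefix_code(prefix.values()))      # True
print(is_prefix_code(non_prefix.values()))  # False: "0" is a prefix of "01"
print(decode("010110111", prefix))          # abcd
```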
5
Probability distribution
6
Entropy of binary model
7
Huffman coding
Panels: Symbols and frequencies; First step of the process; Code tree

x   P(x)   Code
a   0.05   0000
b   0.05   0001
c   0.1    001
d   0.2    01
e   0.3    10
f   0.2    110
g   0.1    111
8
Huffman coding
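The bottom-up construction can be sketched as follows, using the symbol probabilities from the Huffman slide. This is a minimal illustration, not the slides' own implementation: the function name `huffman_code` and the tie-breaking rule (insertion order) are assumptions, so individual code words may differ from the slide while the average code length stays the same.

```python
import heapq
from itertools import count


def huffman_code(probs: dict) -> dict:
    """Build a Huffman code (symbol -> bit string) from symbol probabilities.

    Repeatedly merges the two least probable subtrees; the counter breaks
    ties so the heap never has to compare the dictionaries.
    """
    tiebreak = count()
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        # Prepend one bit to every code word in each merged subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


probs = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}
code = huffman_code(probs)
average = sum(p * len(code[s]) for s, p in probs.items())
for symbol in sorted(code):
    print(symbol, probs[symbol], code[symbol])
print("average code length:", round(average, 2), "bits")
```

For these probabilities the average code length is 2.6 bits per symbol, the same as for the code table on the slide.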
9
Two coding methods
Huffman coding
- David Huffman, 1952
- Prefix code
- Bottom-up algorithm for construction of the code tree
- Optimal when probabilities are of the form $2^{-n}$
Arithmetic coding
- Rissanen, 1976
- General: applies to any source
- Suitable for dynamic models (no explicit code table)
- Optimal for any probability model
- The entire input file is coded as one code word
10
Work space
11
Modeling
12
Static or adaptive model
Static:
+ No side information
+ One pass over the data is enough
- Fails if the model is incorrect
Semi-adaptive:
+ Optimizes model to the input data
- Two passes over the data needed
- Model must also be stored in the file
Adaptive:
+ Optimizes model to the input data
+ One pass over the data is enough
- Must have time to adapt to the data
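The adaptive case can be illustrated with a sketch that codes each symbol with the probability estimated from the symbols seen so far, then updates the counts. Starting every count at 1 is an assumption made here so that no symbol ever has zero probability; the slides do not specify the initialization, and `adaptive_code_length` is an illustrative name.

```python
import math


def adaptive_code_length(sequence, alphabet) -> float:
    """Ideal code length in bits when the model adapts during one pass.

    Each symbol costs -log2 of its current estimated probability, after
    which its count is incremented. Counts start at 1 (an assumption).
    """
    counts = {s: 1 for s in alphabet}
    total = len(counts)
    bits = 0.0
    for s in sequence:
        bits += -math.log2(counts[s] / total)
        counts[s] += 1
        total += 1
    return bits


# The model needs time to adapt: the first symbols are the most expensive.
print(adaptive_code_length("aaaa", "ab"))
print(adaptive_code_length("a" * 64, "ab"))
```

For "aaaa" the estimates are 1/2, 2/3, 3/4 and 4/5, so the total is $\log_2 5 \approx 2.32$ bits.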
13
Using wrong model
ESTIMATED MODEL:
CORRECT MODEL:
AVERAGE CODE LENGTH:
INEFFICIENCY:
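The cost of a wrong model can be sketched under the usual assumption that a symbol coded with estimated probability q costs $-\log_2 q$ bits while it actually occurs with probability p. The average code length is then $-\sum p \log_2 q$, and the inefficiency is its excess over the entropy. The example probabilities below are illustrative only and are not taken from the slide.

```python
import math


def average_code_length(correct, estimated) -> float:
    """Average bits per symbol when data from `correct` is coded with `estimated`."""
    return sum(-p * math.log2(q) for p, q in zip(correct, estimated) if p > 0)


def inefficiency(correct, estimated) -> float:
    """Extra bits per symbol paid for using the estimated model."""
    return average_code_length(correct, estimated) - average_code_length(correct, correct)


correct = [0.875, 0.125]     # illustrative correct model
estimated = [0.5, 0.5]       # illustrative (wrong) estimated model
print(average_code_length(correct, estimated))
print(inefficiency(correct, estimated))
```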
14
Context model: pixel above
15
Context model: pixel to left
16
Summary of context model
NO CONTEXT: f_w = 56, f_b = 8, p_w = 87.5 %, p_b = 12.5 %
  Total bit rate = 10.79 + 24 = 34.79 bits
  Entropy = 34.79 / 64 = 0.54 bpp
PIXEL ABOVE:
  Total bit rate = 33.28 bits
  Entropy = 33.28 / 64 = 0.52 bpp
PIXEL TO LEFT:
  Total bit rate = 7.32 bits
  Entropy = 7.32 / 64 = 0.11 bpp
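The no-context figures can be reproduced with a short sketch. The function `total_bits` is a hypothetical helper that sums the ideal code length over all contexts, each context having its own white and black pixel counts. Only the no-context counts (56 white, 8 black) are given in the summary, so only that case is evaluated here.

```python
import math


def total_bits(counts_per_context: dict) -> float:
    """Ideal total code length when each context has its own probabilities.

    counts_per_context maps a context label to a tuple of symbol counts,
    e.g. (white_count, black_count).
    """
    total = 0.0
    for counts in counts_per_context.values():
        n = sum(counts)
        for f in counts:
            if f > 0:
                total += f * -math.log2(f / n)   # f symbols, each costing -log2(f/n)
    return total


no_context = {"none": (56, 8)}
bits = total_bits(no_context)
print(round(bits, 2), "bits in total")
print(round(bits / 64, 2), "bpp")
```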
17
Using wrong model
18
Dynamic modeling State automaton in QM-coder
19
Example contexts Scanned binary images
20
Effect of context size Scanned binary images
22
Arithmetic coding
23
Block coding
Two problems:
- Impossible to make a useful code table for binary input (every symbol needs at least 1 bit)
- Cannot use fractions of bits (p = 0.9 gives $-\log_2 0.9 \approx 0.15$ bits)
Solution 1: Block coding
- Block symbols
- Contradicts context model
- Alphabet size explodes exponentially with the number of symbols per block: 3-symbol blocks give $256^3 \approx 16$ M block symbols
Solution 2: Arithmetic coding
- Block entire input!
- No explicit code table
24
Interval [0,1] divided up to 3-bit accuracy
25
Arithmetic coding principles
- Length of interval = A
- Coding of A takes $-\log_2 A$ bits
- Divides the interval according to the probabilities
- The lengths of the subintervals sum to 1
Probabilities: p(a) = 0.7, p(b) = 0.2, p(c) = 0.1
Subintervals of [0, 1): a = [0, 0.7), b = [0.7, 0.9), c = [0.9, 1)
26
Coding example
Sequence: aab
Probabilities: p(a) = 0.7, p(b) = 0.2, p(c) = 0.1
A = 0.7 · 0.7 · 0.2 = 0.098
H = $-\log_2 0.098$ = 3.35 bits
27
Coding of sequence aab
Probabilities: p(a) = 0.7, p(b) = 0.2, p(c) = 0.1
Start: [0, 1), divided at 0.7 and 0.9
After a: [0, 0.70), divided at 0.49 and 0.63
After aa: [0, 0.490), divided at 0.343 and 0.441
After aab: [0.343, 0.441)
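The interval narrowing for aab can be traced with a short sketch. The function name `narrow_interval` is illustrative, and the subintervals are laid out in the order a, b, c as on the slide. Floating-point arithmetic is used for readability only.

```python
import math


def narrow_interval(sequence, probs: dict):
    """Return the final interval [low, high) after coding `sequence`.

    probs maps symbol -> probability; the dictionary order defines the
    order of the subintervals inside the current interval.
    """
    bounds, start = {}, 0.0
    for symbol, p in probs.items():
        bounds[symbol] = (start, start + p)   # cumulative range of this symbol
        start += p

    low, high = 0.0, 1.0
    for symbol in sequence:
        width = high - low
        lo_frac, hi_frac = bounds[symbol]
        low, high = low + width * lo_frac, low + width * hi_frac
    return low, high


low, high = narrow_interval("aab", {"a": 0.7, "b": 0.2, "c": 0.1})
print(round(low, 3), round(high, 3))            # final interval
print(round(high - low, 3))                     # its length A
print(round(-math.log2(high - low), 2), "bits")
```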
28
Code length
Length of the final interval:
Its code length:
Length with respect to the distribution:
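The formulas on this slide did not survive extraction. A plausible reconstruction, assuming the standard derivation, is sketched below: the sequence is $x_1 \dots x_n$, the final interval length is the product of the symbol probabilities, and symbol $j$ of a $k$-symbol alphabet occurs $n\,p_j$ times. The notation is an assumption, not the slide's own.

```latex
\begin{align*}
A &= \prod_{i=1}^{n} p(x_i)
  && \text{length of the final interval} \\
L &= -\log_2 A = -\sum_{i=1}^{n} \log_2 p(x_i)
  && \text{its code length} \\
L &= -\sum_{j=1}^{k} n\,p_j \log_2 p_j = n \cdot H
  && \text{length with respect to the distribution}
\end{align*}
```

For the sequence aab this gives $A = 0.7 \cdot 0.7 \cdot 0.2 = 0.098$ and $L = -\log_2 0.098 \approx 3.35$ bits.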
29
Optimality of Arithmetic Coding
Lower bound for interval size:
Upper bound for code length:
Length with respect to the distribution:
The interval length A is generally not an exact power of 2. Round it down to A' ≤ A that is a power of 2.
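The rounding step can be sketched numerically. Rounding A down to the nearest power of two at most halves it, so A/2 < A' ≤ A, and the code length $-\log_2 A'$ exceeds $-\log_2 A$ by less than one bit. The function name is illustrative.

```python
import math


def round_down_to_power_of_two(a: float) -> float:
    """Largest power of two that does not exceed a (for 0 < a <= 1)."""
    return 2.0 ** math.floor(math.log2(a))


A = 0.098                                  # interval length for the sequence aab
A_rounded = round_down_to_power_of_two(A)
print(A_rounded)                           # 2^-4
print(-math.log2(A))                       # ideal code length, about 3.35 bits
print(-math.log2(A_rounded))               # whole-bit code length
```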
30
/* Initialize lower and upper bounds */
low ← 0; high ← 1; buffer ← 0;
/* Calculate cumulative frequencies */
cum[0] ← 0; cum[1] ← p1;
FOR i ← 2 TO k DO
    cum[i] ← cum[i-1] + pi;
WHILE <symbols left> DO
    /* Select the interval for symbol c (symbols numbered 0 … k-1) */
    c ← READ(Input);
    range ← high - low;
    high ← low + range*cum[c+1];
    low ← low + range*cum[c];
    /* Half-point zooming: lower */
    WHILE high < 0.5 DO
        high ← 2*high; low ← 2*low;
        WRITE(0);
        FOR buffer TIMES DO WRITE(1);
        buffer ← 0;
    /* Half-point zooming: higher */
    WHILE low > 0.5 DO
        high ← 2*(high-0.5); low ← 2*(low-0.5);
        WRITE(1);
        FOR buffer TIMES DO WRITE(0);
        buffer ← 0;
    /* Quarter-point zooming */
    WHILE (low > 0.25) AND (high < 0.75) DO
        high ← 2*(high-0.25); low ← 2*(low-0.25);
        buffer ← buffer + 1;
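A runnable sketch of the encoder follows. It departs from the pseudocode in two labeled ways: the three zooming loops are folded into one loop that repeats until no zoom applies, and a final flush of two or more bits is added so the output identifies the last interval (the flush rule is the common one from the literature, not from the slide). Symbols are integers 0 … k-1, and floating-point arithmetic limits this sketch to short inputs.

```python
def arithmetic_encode(symbols, probs):
    """Encode a sequence of symbol indices into a list of bits.

    probs[i] is the probability of symbol i; symbols are indices 0..k-1.
    """
    # Cumulative probabilities: symbol c owns [cum[c], cum[c+1]).
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)

    low, high, pending = 0.0, 1.0, 0    # pending = the pseudocode's `buffer`
    bits = []

    def write(bit):
        """Output a bit followed by the pending opposite bits."""
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)
        pending = 0

    for c in symbols:
        width = high - low
        high = low + width * cum[c + 1]
        low = low + width * cum[c]
        while True:
            if high < 0.5:                        # half-point zooming: lower
                write(0)
                low, high = 2 * low, 2 * high
            elif low > 0.5:                       # half-point zooming: higher
                write(1)
                low, high = 2 * (low - 0.5), 2 * (high - 0.5)
            elif low > 0.25 and high < 0.75:      # quarter-point zooming
                pending += 1
                low, high = 2 * (low - 0.25), 2 * (high - 0.25)
            else:
                break

    # Flush (an addition to the pseudocode): pick a quarter inside the interval.
    pending += 1
    write(0 if low <= 0.25 else 1)
    return bits


bits = arithmetic_encode([0, 0, 1], [0.7, 0.2, 0.1])   # the sequence aab
value = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))
print(bits, value)
```

For aab this yields the bits 0110, the binary fraction 0.375, which lies inside the final interval [0.343, 0.441).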
31
Working space