Information theory Data compression perspective Pasi Fränti 4.2.2016.


1 Information theory Data compression perspective Pasi Fränti 4.2.2016

2 Bits and Codes
One bit: 0 and 1 (2 values)
Two bits: 00, 01, 10, 11 (4 values)
Four bits: 0000, 0001, 0010, …, 1111 (16 values)
Eight bits: 256 values (e.g. ASCII code)
k bits → 2^k values; N values → ⌈log2 N⌉ bits
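The rule above can be checked with a short sketch (plain Python; the function name is illustrative):

```python
import math

def bits_needed(n_values: int) -> int:
    """Minimum number of whole bits needed to distinguish n_values values."""
    return math.ceil(math.log2(n_values))

print(2 ** 8)             # 8 bits give 256 values (e.g. ASCII)
print(bits_needed(256))   # 8
print(bits_needed(1000))  # 10: 1000 values need ceil(log2 1000) bits
```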

3 Entropy
Self-entropy of a symbol x: H(x) = -log2 p(x)
Entropy of a source: H = -Σ p(x) log2 p(x)
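A minimal sketch of the two definitions above, assuming the standard base-2 formulas (function names are my own):

```python
import math

def self_information(p: float) -> float:
    """Self-entropy of a symbol with probability p, in bits: -log2 p."""
    return -math.log2(p)

def entropy(probs) -> float:
    """Entropy of a source: the expected self-information, -sum p*log2 p."""
    return sum(p * self_information(p) for p in probs if p > 0)

print(self_information(0.5))              # 1.0 bit
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: uniform 4-symbol source
```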

4 Prefix code
In a prefix code, no codeword is a prefix of another codeword.
Example of a prefix code: a=0, b=10, c=110, d=111
Example of a non-prefix code: a=0, b=01, c=011, d=111 (a=0 is a prefix of b=01)
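The prefix property is what makes left-to-right decoding unambiguous; a small decoder for the slide's prefix code (names are my own):

```python
# Prefix code from the slide: no codeword is a prefix of another.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE = {v: k for k, v in CODE.items()}

def decode(bits: str) -> str:
    """Scan left to right; the first codeword match is always the right one,
    because no codeword is a prefix of another."""
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in DECODE:
            out.append(DECODE[buf])
            buf = ""
    return "".join(out)

print(decode("0101100111"))  # 0|10|110|0|111 -> "abcad"
```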

5 Probability distribution

6 Entropy of binary model

7 Huffman coding
Symbols, frequencies, first step of the process, and the resulting code tree:

x    P(x)   Code
a    0.05   0000
b    0.05   0001
c    0.1    001
d    0.2    01
e    0.3    10
f    0.2    110
g    0.1    111
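The table can be sanity-checked: a full prefix code satisfies the Kraft equality, and the average code length can be compared with the entropy. A sketch using the slide's numbers:

```python
import math

# Probabilities and Huffman codes from the slide's table.
table = {"a": (0.05, "0000"), "b": (0.05, "0001"), "c": (0.1, "001"),
         "d": (0.2, "01"), "e": (0.3, "10"), "f": (0.2, "110"),
         "g": (0.1, "111")}

kraft = sum(2 ** -len(code) for _, code in table.values())   # 1.0 for a full tree
avg_len = sum(p * len(code) for p, code in table.values())   # bits per symbol
entropy = -sum(p * math.log2(p) for p, _ in table.values())  # lower bound

print(kraft, avg_len, round(entropy, 3))
```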

8 Huffman coding

9 Two coding methods
Huffman coding (David Huffman, 1952)
- Prefix code
- Bottom-up algorithm for constructing the code tree
- Optimal when the probabilities are of the form 2^-n
Arithmetic coding (Rissanen, 1976)
- General: applies to any source
- Suitable for dynamic models (no explicit code table)
- Optimal for any probability model
- The entire input file is coded as one code word
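The bottom-up construction can be sketched with a priority queue. This is one possible tie-breaking, so the individual code lengths may differ from the slide's table, but every Huffman tree gives the same average length:

```python
import heapq
from itertools import count

def huffman_lengths(probs: dict) -> dict:
    """Bottom-up Huffman construction: repeatedly merge the two least
    probable nodes; each merge adds one bit to every symbol inside."""
    tick = count()  # unique tie-breaker so the dicts are never compared
    heap = [(p, next(tick), {s: 0}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)
        p2, _, n2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**n1, **n2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

probs = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2,
         "e": 0.3, "f": 0.2, "g": 0.1}
lengths = huffman_lengths(probs)
print(lengths)
```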

10 Work space

11 Modeling

12 Static or adaptive model
Static:
+ No side information
+ One pass over the data is enough
- Fails if the model is incorrect
Semi-adaptive:
+ Optimizes the model to the input data
- Two passes over the data are needed
- The model must also be stored in the file
Adaptive:
+ Optimizes the model to the input data
+ One pass over the data is enough
- Must have time to adapt to the data

13 Using the wrong model
Estimated model: q(x); correct model: p(x)
Average code length: L = -Σ p(x) log2 q(x)
Inefficiency: L - H = Σ p(x) log2 ( p(x) / q(x) ) ≥ 0

14 Context model: pixel above

15 Context model: pixel to the left

16 Summary of context models
No context: f_w = 56, f_b = 8, p_w = 87.5 %, p_b = 12.5 %
  Total bit rate = 10.79 + 24 = 34.79 bits
  Entropy = 34.79 / 64 = 0.54 bpp
Pixel above:
  Total bit rate = 33.28 bits
  Entropy = 33.28 / 64 = 0.52 bpp
Pixel to the left:
  Total bit rate = 7.32 bits
  Entropy = 7.32 / 64 = 0.11 bpp
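The no-context numbers follow from coding each pixel with its global probability; a sketch of that computation (the function name is my own):

```python
import math

def total_code_length(counts) -> float:
    """Total code length in bits for symbols occurring count_i times,
    coded with the model p_i = count_i / total: sum of -count_i*log2 p_i."""
    total = sum(counts)
    return sum(-c * math.log2(c / total) for c in counts if c > 0)

# No-context model from the slide: 56 white and 8 black pixels.
bits = total_code_length([56, 8])
print(round(bits, 2), round(bits / 64, 2))  # 34.79 bits, 0.54 bpp
```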

17 Using the wrong model

18 Dynamic modeling: state automaton in the QM-coder

19 Example contexts: scanned binary images

20 Effect of context size: scanned binary images


22 Arithmetic coding

23 Block coding
Two problems with symbol-by-symbol coding:
- A code table for binary input is useless: both symbols get a 1-bit code.
- Fractions of bits cannot be used: even when p = 0.9 and the self-entropy is -log2 0.9 ≈ 0.15 bits, a whole bit per symbol must be spent.
Solution 1: Block coding
- Group symbols into blocks and code each block as one symbol.
- Contradicts context modeling.
- The alphabet explodes exponentially with the number of symbols per block: 3-symbol blocks of bytes give 256^3 = 16 M block symbols.
Solution 2: Arithmetic coding
- Treat the entire input as one block!
- No explicit code table.
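The fractional-bit problem can be illustrated numerically (a sketch; p and k are example values):

```python
import math

p = 0.9  # probability of the more likely binary symbol

# Coding symbols one at a time costs at least 1 bit each,
# although the entropy of the source is far lower:
h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(round(h, 3))  # entropy in bits per symbol, well below 1

# Blocking k symbols together lets the rate approach the entropy,
# but the block alphabet grows as 2**k:
k = 3
print(2 ** k)  # block alphabet size for binary input
```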

24 The interval [0,1] divided to 3-bit accuracy

25 Arithmetic coding principles
- The length of the current interval is A.
- Coding the interval takes -log2 A bits.
- Each symbol divides the interval according to the probabilities.
- The lengths of the subintervals sum up to 1.
Probabilities: p(a) = 0.7, p(b) = 0.2, p(c) = 0.1
[Figure: the unit interval split into a = [0, 0.7), b = [0.7, 0.9), c = [0.9, 1).]

26 Coding example: sequence aab
Probabilities: p(a) = 0.7, p(b) = 0.2, p(c) = 0.1
A = 0.7 × 0.7 × 0.2 = 0.098
H = -log2 0.098 = 3.35 bits

27 Coding of sequence aab
Probabilities: p(a) = 0.7, p(b) = 0.2, p(c) = 0.1
Start:   [0, 1)
After a: [0, 0.70)
After a: [0, 0.49)
After b: [0.343, 0.441)
Final interval length: 0.441 - 0.343 = 0.098
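The interval narrowing above can be reproduced with a float-precision sketch (the symbol order a, b, c is assumed for the cumulative slices):

```python
import math

# Probabilities from the slide; dict order defines the slice order.
probs = {"a": 0.7, "b": 0.2, "c": 0.1}

low, high = 0.0, 1.0
for c in "aab":                 # narrow the interval symbol by symbol
    rng = high - low
    start = low
    for s, p in probs.items():  # walk the slices until we hit symbol c
        if s == c:
            low, high = start, start + rng * p
            break
        start += rng * p

print(round(low, 3), round(high, 3))     # final interval [0.343, 0.441)
print(round(-math.log2(high - low), 2))  # about 3.35 bits
```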

28 Code length
Length of the final interval: A = Π p(x_i)
Its code length: -log2 A bits
Length with respect to the distribution: -log2 A = -Σ log2 p(x_i)

29 Optimality of arithmetic coding
The interval A is not exactly a power of 2; round it down to the largest power of 2, A' < A.
Lower bound for the interval size: A' > A/2
Upper bound for the code length: -log2 A' < -log2 A + 1
Length with respect to the distribution: less than -Σ log2 p(x_i) + 1 bits for the entire input, i.e. within one extra bit of the entropy.

30 Arithmetic coding pseudocode

/* Initialize lower and upper bounds */
low ← 0; high ← 1; buffer ← 0;

/* Calculate cumulative frequencies */
cum[0] ← 0; cum[1] ← p1;
FOR i ← 2 TO k DO
    cum[i] ← cum[i-1] + pi;

WHILE symbols left DO
    /* Select the interval for symbol c */
    c ← READ(input);
    range ← high - low;
    high ← low + range * cum[c+1];
    low  ← low + range * cum[c];

    /* Half-point zooming: lower half */
    WHILE high < 0.5 DO
        high ← 2 * high; low ← 2 * low;
        WRITE(0);
        FOR buffer TIMES DO WRITE(1);
        buffer ← 0;

    /* Half-point zooming: upper half */
    WHILE low > 0.5 DO
        high ← 2 * (high - 0.5); low ← 2 * (low - 0.5);
        WRITE(1);
        FOR buffer TIMES DO WRITE(0);
        buffer ← 0;

    /* Quarter-point zooming */
    WHILE (low > 0.25) AND (high < 0.75) DO
        high ← 2 * (high - 0.25); low ← 2 * (low - 0.25);
        buffer ← buffer + 1;
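For illustration, a float-precision Python transcription of the encoder, plus a matching decoder, without the bit-level zooming steps; it therefore only works for short inputs, and all names are my own:

```python
def arithmetic_encode(symbols, probs):
    """Narrow [low, high) by each symbol's probability slice."""
    low, high = 0.0, 1.0
    for c in symbols:
        rng = high - low
        start = low
        for s, p in probs.items():
            if s == c:
                low, high = start, start + rng * p
                break
            start += rng * p
    return low, high

def arithmetic_decode(value, n, probs):
    """Invert the process: find the slice the value falls into, then zoom."""
    out = []
    low, high = 0.0, 1.0
    for _ in range(n):
        rng = high - low
        start = low
        for s, p in probs.items():
            end = start + rng * p
            if start <= value < end:
                out.append(s)
                low, high = start, end
                break
            start = end
    return "".join(out)

probs = {"a": 0.7, "b": 0.2, "c": 0.1}
low, high = arithmetic_encode("aab", probs)
print(arithmetic_decode((low + high) / 2, 3, probs))  # "aab"
```

Any value inside the final interval decodes back to the input, which is why the code length depends only on the interval's size.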

31 Working space

