1 Source Coding: Efficient Data Representation (A.J. Han Vinck)

2 DATA COMPRESSION / REDUCTION
1) INTRODUCTION: presentation of messages in binary format; conversion into binary form
2) DATA COMPRESSION: lossless, data compression without errors
3) DATA REDUCTION: lossy, data reduction using prediction and context

3 CONVERSION into DIGITAL FORM
1. SAMPLING: discrete, exact time samples of the continuous (analog) signal v(t) are produced at times T, 2T, 3T, ...; sample rate R = 1/T
   [figure: continuous signal v(t), its time-discrete samples, and the resulting digital sample values]
2. QUANTIZING: approximation (lossy) of each sample onto a set of discrete levels
3. ENCODING: representation of a level by a binary symbol
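A minimal sketch of the three steps (the sine-wave signal, the 8 Hz sample rate, and the 2-bit quantizer are illustrative choices, not taken from the slide):

```python
import numpy as np

# Example analog signal: a 3 Hz sine wave (highest frequency F_h = 3 Hz).
F_h = 3.0
def v(t):
    return np.sin(2 * np.pi * F_h * t)

# 1. SAMPLING at rate R = 1/T; here R = 8 Hz, which satisfies R >= 2*F_h (see the Nyquist slide).
R = 8.0
T = 1.0 / R
t = np.arange(0.0, 1.0, T)        # sample instants 0, T, 2T, ...
samples = v(t)

# 2. QUANTIZING (lossy): map each sample onto the nearest of 4 discrete levels.
levels = np.array([-0.75, -0.25, 0.25, 0.75])
indices = np.argmin(np.abs(samples[:, None] - levels[None, :]), axis=1)

# 3. ENCODING: represent each level index by a 2-bit binary symbol.
bits = [format(i, '02b') for i in indices]
print(list(zip(t.round(3), samples.round(2), bits)))
```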

4 HOW FAST SHOULD WE SAMPLE?
Principle: an analog signal can be seen as a sum of sine waves, with some highest sine frequency F_h.
Unique reconstruction from its (exact) samples is possible if (Nyquist, 1928) the sample rate R ≥ 2 F_h.
We LIMIT the highest FREQUENCY of the SOURCE without introducing distortion!

5 EXAMPLES:
- text: represent every symbol with 8 bits → storage: 8 * (500 pages) * 1000 symbols = 4 Mbit → compression possible to 1 Mbit (1:4)
- speech: sampling speed 8000 samples/sec; accuracy 8 bits/sample → needed transmission speed 64 kbit/s → compression possible to 4.8 kbit/s (1:10)
- CD music: sampling speed 44.1 k samples/sec; accuracy 16 bits/sample → needed storage capacity for one hour of stereo: 5 Gbit ≈ 1250 books → compression possible to 4 bits/sample (1:4)
- digital pictures: 300 x 400 pixels x 3 colors x 8 bits/sample → 2.9 Mbit/picture; for 25 images/second we need 75 Mbit/s; 2 hours of pictures need 540 Gbit ≈ 130,000 books → compression needed (1:100)
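A quick check of the slide's arithmetic (a sketch; the slide rounds some of these figures up):

```python
# Rough checks of the data-volume figures on the slide.
text_bits  = 8 * 500 * 1000                    # 4 Mbit for a 500-page book
speech_bps = 8000 * 8                          # 64 kbit/s
cd_hour    = 44100 * 16 * 2 * 3600             # ~5.08 Gbit for one hour of stereo
pic_bits   = 300 * 400 * 3 * 8                 # ~2.88 Mbit per picture
video_bps  = pic_bits * 25                     # ~72 Mbit/s at 25 images/s
movie_bits = video_bps * 2 * 3600              # ~518 Gbit for 2 hours

print(text_bits / 1e6, "Mbit")
print(speech_bps / 1e3, "kbit/s")
print(cd_hour / 1e9, "Gbit, i.e. about", round(cd_hour / text_bits), "books")
print(pic_bits / 1e6, "Mbit/picture,", video_bps / 1e6, "Mbit/s")
print(movie_bits / 1e9, "Gbit, i.e. about", round(movie_bits / text_bits), "books")
```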

6 We have to reduce the amount of data!
Using prediction and context:
- statistical properties
- models
- perceptual properties
LOSSLESS: remove redundancy, exact!
LOSSY: remove irrelevance, with distortion!

7 Shannon source coding theorem
Assume:
- independent source outputs
- consider runs of outputs of length L; we expect a certain "type" of runs
- give a code word for an expected run, with prefix '1'
- an unexpected run is transmitted as it appears, with prefix '0'
Example: throw a die 600 times, what do you expect?
Example: throw a coin 100 times, what do you expect?

8 We start with a binary source
Assume a binary sequence x of length L with P(0) = 1 - P(1) = 1 - p; t is the number of 1's.
For L → ∞ and any ε, δ > 0 as small as desired:
Probability( |t/L - p| > ε ) ≤ δ → 0, i.e. Probability( |t/L - p| ≤ ε ) ≥ 1 - δ → 1    (1)
Hence L(p - ε) ≤ t ≤ L(p + ε) with high probability.
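A small simulation of (1) (a sketch; p, ε and the sequence lengths are illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
p, eps, trials = 0.3, 0.04, 10000

# Empirical probability that the fraction of 1's deviates from p by more than eps.
for L in (100, 1000, 10000):
    t = rng.binomial(L, p, size=trials)          # number of 1's in each length-L sequence
    deviate = np.mean(np.abs(t / L - p) > eps)   # estimate of Prob(|t/L - p| > eps)
    print(L, deviate)                            # the deviation probability shrinks with L
```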

9 Consequence 1:
Let A be the set of typical sequences, i.e. those obeying (1): |t/L - p| ≤ ε.
For these sequences P(A) → 1 (as close to 1 as wanted, i.e. P(A) ≥ 1 - δ),
or: almost all observed sequences are typical and have about pL ones.
Note: we use the notation → when we assume that L → ∞.

10 Consequence 2:
The cardinality of the set A satisfies (1 - δ) 2^{L(h(p) - ε)} ≤ |A| ≤ 2^{L(h(p) + ε)},
i.e. |A| ≈ 2^{L h(p)}, where h(p) = -p log2 p - (1 - p) log2 (1 - p) is the binary entropy function.
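The bound can be recovered with the standard argument (a sketch; the slide's own derivation was on an image and may differ in detail). Every typical sequence has roughly the same probability, and summing that probability over A is squeezed between 1 - δ and 1:

```latex
% A typical sequence x with t ones, L(p-\varepsilon) \le t \le L(p+\varepsilon), has probability
\[
P(x) = p^{\,t}(1-p)^{L-t} = 2^{\,t\log_2 p + (L-t)\log_2(1-p)} = 2^{-L\,h(p)\,\pm\,L\varepsilon'} ,
\qquad \varepsilon' = \varepsilon \left|\log_2\tfrac{1-p}{p}\right| .
\]
% Summing over A and using 1 \ge P(A) \ge 1-\delta gives
\[
(1-\delta)\,2^{\,L(h(p)-\varepsilon')} \;\le\; |A| \;\le\; 2^{\,L(h(p)+\varepsilon')} .
\]
```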

11 Shannon 1
Encode:
- every vector in A with an integer number of bits N ≤ L(h(p) + ε) + 1
- every vector in A^c with L bits
- use a prefix bit to signal whether we have a typical sequence or not
The average codeword length:
K = (1 - δ)[L(h(p) + ε) + 1] + δL + 1 ≈ L h(p) bits
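A toy version of this two-part code for short blocks (a sketch; indexing the typical set by enumeration is one possible realization, not necessarily the slide's):

```python
from itertools import product
from math import ceil, log2

p, eps, L = 0.3, 0.1, 12   # illustrative parameters; L is small so we can enumerate

# Typical set A: sequences whose fraction of 1's is within eps of p.
A = [x for x in product('01', repeat=L) if abs(x.count('1') / L - p) <= eps]
index = {x: i for i, x in enumerate(A)}
n_bits = ceil(log2(len(A)))                # bits needed to index a typical sequence

def encode(x):
    """Prefix '1' + index for typical sequences, prefix '0' + raw bits otherwise."""
    if x in index:
        return '1' + format(index[x], f'0{n_bits}b')
    return '0' + ''.join(x)

def decode(code):
    if code[0] == '1':
        return A[int(code[1:], 2)]
    return tuple(code[1:])

x = tuple('010010000100')                  # a typical sequence (3 ones out of 12)
assert decode(encode(x)) == x
print(len(encode(x)), 'bits vs', L, 'raw bits')
```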

12 Shannon 1: converse
Source output: words X^L; encoder output Y^k: k output bits per input block of L symbols.
H(X) = h(p); H(X^L) = L h(p); k = L[h(p) - ε], ε > 0.
Encoder: assignment of one of 2^{L[h(p) - ε]} - 1 code words ≠ 0, or the all-zero code word.
Pe = Prob(all-zero code word assigned) = Prob(error)

13 Shannon 1: converse (cont'd)
With H(X) = h(p), H(X^L) = L h(p), k = L[h(p) - ε], ε > 0:
H(X^L, Y^k) = H(X^L) + H(Y^k | X^L) = L h(p)
            = H(Y^k) + H(X^L | Y^k) ≤ L[h(p) - ε] + h(Pe) + Pe · log2 |source words|
so  L h(p) ≤ L[h(p) - ε] + 1 + L Pe   (since there are 2^L source words)
hence  Pe ≥ ε - 1/L > 0 for large L.
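The same chain written out, using Fano's inequality for the conditional-entropy term and H(Y^k) ≤ k for the encoder output (a restatement of the slide's steps):

```latex
\begin{align*}
H(X^L, Y^k) &= H(X^L) + \underbrace{H(Y^k \mid X^L)}_{=0,\ Y^k \text{ is a function of } X^L} = L\,h(p) \\
            &= H(Y^k) + H(X^L \mid Y^k)
             \;\le\; k + \underbrace{h(P_e) + P_e \log_2 2^{L}}_{\text{Fano's inequality}} \\
L\,h(p)     &\le L[h(p)-\varepsilon] + 1 + L\,P_e
            \;\Longrightarrow\; P_e \;\ge\; \varepsilon - \tfrac{1}{L} \;>\; 0 \quad (L \text{ large}).
\end{align*}
```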

14 Typicality
Homework: calculate, for ε = 0.04 and p = 0.3:
h(p), |A|, P(A), (1 - δ) 2^{L(h(p) - ε)}, 2^{L(h(p) + ε)} as a function of L.
Homework: repeat the same arguments for a more general source with entropy H(X).
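A sketch for the first homework (assumptions: |A| is taken as the sum of binomial coefficients with |t/L - p| ≤ ε, and δ is read off from the exact binomial tail):

```python
from math import comb, log2

p, eps = 0.3, 0.04
h = -p * log2(p) - (1 - p) * log2(1 - p)     # binary entropy h(p) ≈ 0.881

for L in (100, 500, 1000):
    ts = [t for t in range(L + 1) if abs(t / L - p) <= eps]
    card_A = sum(comb(L, t) for t in ts)                              # |A|
    P_A = sum(comb(L, t) * p**t * (1 - p)**(L - t) for t in ts)       # P(A)
    delta = 1 - P_A
    lower = (1 - delta) * 2 ** (L * (h - eps))
    upper = 2 ** (L * (h + eps))
    print(L, round(P_A, 3), f"{lower:.3e} <= {card_A:.3e} <= {upper:.3e}")
```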

15 Sources with independent outputs
Let U be a source with independent outputs: U_1 U_2 ... U_L (subscript = time).
The set of typical sequences can be defined as
A = { u_1 u_2 ... u_L : | -(1/L) log2 P(u_1 u_2 ... u_L) - H(U) | ≤ ε }.
Then, for large L: P(A) → 1, and every sequence in A has probability ≈ 2^{-L H(U)}.

16 Sources with independent outputs cont'd
To see how it works, we can write
P(u_1 u_2 ... u_L) = ∏_{i=1}^{|U|} P_i^{L N_i} = 2^{L Σ_i N_i log2 P_i} ≈ 2^{-L H(U)} for typical sequences (N_i ≈ P_i),
where |U| is the size of the alphabet, P_i the probability that symbol i occurs, and N_i the fraction of occurrences of symbol i.

17 Sources with independent outputs cont'd
The cardinality of the set A is ≈ 2^{L H(U)}.
Proof: 1 ≥ P(A) = Σ_{u ∈ A} P(u) ≥ |A| · 2^{-L(H(U) + ε)}, hence |A| ≤ 2^{L(H(U) + ε)};
similarly, 1 - δ ≤ P(A) ≤ |A| · 2^{-L(H(U) - ε)}, hence |A| ≥ (1 - δ) 2^{L(H(U) - ε)}.
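A numerical check for a small alphabet (a sketch; the ternary distribution, ε and L are illustrative, and L is far too small for P(A) to be close to 1):

```python
from itertools import product
from math import log2, prod

P = {'a': 0.5, 'b': 0.25, 'c': 0.25}       # illustrative source distribution
H = -sum(p * log2(p) for p in P.values())  # H(U) = 1.5 bits
eps, L = 0.1, 12

# Typical set: sequences whose per-symbol log-probability is within eps of -H(U).
A = [u for u in product(P, repeat=L)
     if abs(-sum(log2(P[s]) for s in u) / L - H) <= eps]

print(len(A), 'typical sequences;  2^(L*H) =', round(2 ** (L * H)))
print('P(A) =', round(sum(prod(P[s] for s in u) for u in A), 3))
```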

18 Encoding
Encode:
- every vector in A with an integer number of bits N ≤ L(H(U) + ε) + 1
- every vector in A^c with L bits
- use a prefix bit to signal whether we have a typical sequence or not
The average codeword length:
K = (1 - δ)[L(H(U) + ε) + 1] + δL + 1 ≈ L H(U) + 1 bits

19 Converse
For the converse, see the binary source.

20 Sources with memory
Let U be a source with memory; output: U_1 U_2 ... U_L (subscript = time); states: S ∈ {1, 2, ..., |S|}.
The entropy, or minimum description length:
H(U) = H(U_1 U_2 ... U_L)                          (use the chain rule)
     = H(U_1) + H(U_2 | U_1) + ... + H(U_L | U_{L-1} ... U_2 U_1)
     ≤ H(U_1) + H(U_2) + ... + H(U_L)              (use H(X) ≥ H(X | Y))
How to calculate?

21 Stationary sources with memory
H(U_L | U_{L-1} ... U_2 U_1) ≤ H(U_L | U_{L-1} ... U_2) = H(U_{L-1} | U_{L-2} ... U_1)
(conditioning on less of the past, i.e. less memory, increases the entropy; the equality follows from stationarity)
Conclusion: the innovation H(U_L | U_{L-1} ... U_1) is non-increasing in L, so it must have a limit for large L.

22 Cont'd
H(U_1 U_2 ... U_L) = H(U_1 U_2 ... U_{L-1}) + H(U_L | U_{L-1} ... U_2 U_1)
                   ≤ H(U_1 U_2 ... U_{L-1}) + 1/L [ H(U_1) + H(U_2 | U_1) + ... + H(U_L | U_{L-1} ... U_2 U_1) ]
                     (the last innovation is the smallest, hence at most the average)
                   = H(U_1 U_2 ... U_{L-1}) + 1/L H(U_1 U_2 ... U_L)
thus: 1/L H(U_1 U_2 ... U_L) ≤ 1/(L-1) H(U_1 U_2 ... U_{L-1})
Conclusion: the normalized block entropy is non-increasing, so it has a limit.

23 Cont'd
1/L H(U_1 U_2 ... U_L) = 1/L [ H(U_1) + H(U_2 | U_1) + ... + H(U_L | U_{L-1} ... U_2 U_1) ] ≥ H(U_L | U_{L-1} ... U_2 U_1)
Conclusion: the normalized block entropy is ≥ the innovation.
H(U_1 U_2 ... U_{L+j}) ≤ H(U_1 ... U_{L-1}) + (j + 1) H(U_L | U_{L-1} ... U_2 U_1)
Conclusion: the limit (large j) of the normalized block entropy is ≤ the innovation.
THUS: limit of the normalized block entropy = limit of the innovation.
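A numerical illustration of the two limits for a small stationary source (a sketch; the 2-state binary Markov chain is an arbitrary example, not from the slides):

```python
import numpy as np
from itertools import product

# Binary stationary Markov source: next-symbol distribution depends on the current symbol.
P = np.array([[0.9, 0.1],      # from 0: P(0)=0.9, P(1)=0.1
              [0.4, 0.6]])     # from 1: P(0)=0.4, P(1)=0.6
pi = np.array([0.8, 0.2])      # stationary distribution (pi @ P == pi)

def block_prob(u):
    """Probability of the block u under the stationary Markov source."""
    p = pi[u[0]]
    for a, b in zip(u, u[1:]):
        p *= P[a, b]
    return p

def entropy(probs):
    probs = np.array(probs)
    return -np.sum(probs * np.log2(probs))

prev = 0.0
for L in range(1, 10):
    blocks = list(product((0, 1), repeat=L))
    HL = entropy([block_prob(u) for u in blocks])      # block entropy H(U_1 ... U_L)
    # normalized block entropy (decreasing) vs innovation (its limit), both -> entropy rate
    print(L, round(HL / L, 4), round(HL - prev, 4))
    prev = HL
```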

24 Sources with memory: example
Note: H(U) = minimum representation length; H(U) ≤ average length of a practical scheme.
English text, first approach: consider words as symbols.
Scheme:
1. count word frequencies
2. binary encode the words
3. calculate the average representation length
Conclusion: we need about 12 bits/word; average word length 4.5 letters → H(English) ≈ 2.6 bits/letter
Homework: estimate the entropy of your favorite programming language.
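A sketch of this word-based estimate on any text file (assumptions: the file name is hypothetical, and the entropy of the empirical word distribution is used as the bits/word figure):

```python
import re
from collections import Counter
from math import log2

# Word-based entropy estimate for a text (hypothetical file name).
text = open('sample.txt', encoding='utf-8').read().lower()
words = re.findall(r"[a-z']+", text)

counts = Counter(words)
total = sum(counts.values())
probs = [c / total for c in counts.values()]

bits_per_word = -sum(p * log2(p) for p in probs)          # entropy of the word distribution
avg_word_len = sum(len(w) for w in words) / len(words)    # average word length in letters

print(f"{bits_per_word:.1f} bits/word, {avg_word_len:.1f} letters/word")
print(f"estimate: {bits_per_word / avg_word_len:.2f} bits/letter")
```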

25 Example: Zipf's law
Procedure: order the English words according to frequency of occurrence.
Zipf's law: the frequency of the word at position n is F_n = A/n; for English, A = 0.1.
[figure: log-log plot of word frequency (1 down to 0.001) versus word rank (1 to 1000)]
Result: H(English) = 2.16 bits/letter.
In general: F_n = a n^{-b}, where a and b are constants.
Applications: web access statistics, complexity of languages.
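A sketch reproducing the 2.16 bits/letter figure from Zipf's law (assumptions: the vocabulary size is fixed by requiring the frequencies A/n to sum to 1, and the 4.5 letters/word from the previous slide is used):

```python
from math import log2

A = 0.1
# Vocabulary size: keep adding words until the Zipf frequencies A/n sum to (about) 1.
probs, total, n = [], 0.0, 0
while total < 1.0:
    n += 1
    p = A / n
    probs.append(p)
    total += p
probs = [p / total for p in probs]          # renormalize so the probabilities sum to 1

H_word = -sum(p * log2(p) for p in probs)   # entropy in bits per word
letters_per_word = 4.5                      # average word length from the previous slide
print(n, round(H_word, 2), 'bits/word ->', round(H_word / letters_per_word, 2), 'bits/letter')
```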

26 Sources with memory: example
Another approach: H(U) = H(U_1) + H(U_2 | U_1) + ... + H(U_L | U_{L-1} ... U_2 U_1).
Consider text as stationary, i.e. its statistical properties are fixed.
Measure:
P(a), P(b), ...        → H(U_1)           ≈ 4.1
P(a|a), P(b|a), ...    → H(U_2 | U_1)     ≈ 3.6
P(a|aa), P(b|aa), ...  → H(U_3 | U_2 U_1) ≈ 3.3
...
Note: more given letters reduce the entropy. Shannon's experiments give as a result that 0.6 ≤ H(English) ≤ 1.3.
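A sketch of these measurements on a text of your choice (assumptions: the file name is hypothetical, and the conditional entropies are estimated as differences of empirical n-gram block entropies):

```python
from collections import Counter
from math import log2

# Estimate H(U_1), H(U_2|U_1), H(U_3|U_2 U_1) from letter n-gram counts of a text.
text = ''.join(c for c in open('sample.txt', encoding='utf-8').read().lower()
               if c.isalpha())                     # keep letters only

def block_entropy(k):
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

H1 = block_entropy(1)                              # H(U_1)
H2 = block_entropy(2) - block_entropy(1)           # H(U_2 | U_1)
H3 = block_entropy(3) - block_entropy(2)           # H(U_3 | U_2 U_1)
print(round(H1, 2), round(H2, 2), round(H3, 2))
```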

27 Example: Markov model
A Markov model has states S ∈ {1, ..., |S|}, state probabilities P_t(S = i), and transitions t → t+1 with probability P(s_{t+1} = j | s_t = i).
Every transition determines a specific output, see the example.
[figure: state diagram with states i, j, k, transition probabilities P(s_{t+1} = j | s_t = i) and P(s_{t+1} = k | s_t = i), and binary output labels on the transitions]

28 Markov Sources
Instead of H(U) = H(U_1 U_2 ... U_L), we consider
H(U') = H(U_1 U_2 ... U_L, S_1)
      = H(S_1) + H(U_1 | S_1) + H(U_2 | U_1 S_1) + ... + H(U_L | U_{L-1} ... U_1 S_1)
      = H(U) + H(S_1 | U_L U_{L-1} ... U_1)
Markov property: the output U_t depends on S_t only; S_t and U_t determine S_{t+1}. Hence
H(U') = H(S_1) + H(U_1 | S_1) + H(U_2 | S_2) + ... + H(U_L | S_L)

29 Markov Sources: further reduction
Assume stationarity, i.e. P(S_t = i) = P(S = i) for all t > 0 and all i. Then
H(U_1 | S_1) = H(U_i | S_i),  i = 2, 3, ..., L
H(U_1 | S_1) = P(S_1 = 1) H(U_1 | S_1 = 1) + P(S_1 = 2) H(U_1 | S_1 = 2) + ... + P(S_1 = |S|) H(U_1 | S_1 = |S|)
and H(U') = H(S_1) + L H(U_1 | S_1);  H(U) = H(U') - H(S_1 | U_1 U_2 ... U_L).
Per symbol: 1/L H(U) = H(U_1 | S_1) + 1/L [ H(S_1) - H(S_1 | U_1 U_2 ... U_L) ], where the last term → 0.

30 Example
Stationary state probabilities: P_t(A) = 1/2, P_t(B) = 1/4, P_t(C) = 1/4, with P(A|B) = P(A|C) = 1/2.
[figure: 3-state diagram over states A, B, C with transition probabilities 1/2 and 1/4 and binary outputs on the transitions]
The entropy for this example = 1 1/4.
Homework: check!
Homework: construct another example that is stationary and calculate H.
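A sketch that reproduces the value 1 1/4 (an assumption: the slide's exact diagram is not recoverable here, so the transition matrix below is one choice consistent with the stated stationary distribution and with P(A|B) = P(A|C) = 1/2):

```python
import numpy as np

# States in the order A, B, C.
P = np.array([[0.50, 0.25, 0.25],   # from A
              [0.50, 0.00, 0.50],   # from B  (P(A|B) = 1/2)
              [0.50, 0.50, 0.00]])  # from C  (P(A|C) = 1/2)

pi = np.array([0.5, 0.25, 0.25])    # stationary state probabilities: pi @ P == pi

def h(row):
    row = row[row > 0]
    return -np.sum(row * np.log2(row))

# Entropy rate of a stationary Markov source = sum_i pi_i * H(next state | current = i),
# assuming each transition produces a distinct output symbol (as stated on slide 27).
rate = sum(pi[i] * h(P[i]) for i in range(3))
print(np.allclose(pi @ P, pi), rate)    # True 1.25
```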

31 Appendix, to show (1)
For a binary sequence X_1 ... X_n of length n with P(X_i = 1) = p, the sum t = X_1 + ... + X_n has mean E[t] = np.
Because the X_i are independent, the variance of the sum = the sum of the variances, i.e. Var(t) = n p (1 - p).
Chebyshev's inequality then gives Probability( |t/n - p| > ε ) ≤ p(1 - p) / (n ε²) → 0 as n → ∞.
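The step written out (a standard Chebyshev argument matching the variance remark above; the slide's original equations were on images):

```latex
\[
\operatorname{Var}\!\left(\tfrac{t}{n}\right)
 = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
 = \frac{p(1-p)}{n},
\qquad
\Pr\!\left(\left|\tfrac{t}{n}-p\right| > \varepsilon\right)
 \;\le\; \frac{\operatorname{Var}(t/n)}{\varepsilon^{2}}
 \;=\; \frac{p(1-p)}{n\,\varepsilon^{2}}
 \;\xrightarrow[n\to\infty]{}\; 0 .
\]
```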

