1
Source Coding: Efficient Data Representation
A.J. Han Vinck
2
DATA COMPRESSION / REDUCTION
1) INTRODUCTION: presentation of messages in binary format; conversion into binary form
2) DATA COMPRESSION: lossless, data compression without errors
3) DATA REDUCTION: lossy, data reduction using prediction and context
3
CONVERSION into DIGITAL FORM
1. SAMPLING: discrete, exact time samples of the continuous (analog) signal v(t) are produced; sample values are taken at t = T, 2T, 3T, 4T, ... with sample rate R = 1/T
[figure: the analog waveform v(t), its discrete-time samples at T, 2T, 3T, 4T, and the corresponding digital sample values 01 11 10 10]
2. QUANTIZING: approximation (lossy) of each sample into a set of discrete levels
3. ENCODING: representation of a level by a binary symbol
4
HOW FAST SHOULD WE SAMPLE?
Principle:
- an analog signal can be seen as a sum of sine waves, with some highest sine frequency F_h
- unique reconstruction from its (exact) samples is possible if (Nyquist, 1928) the sample rate R ≥ 2 F_h
We LIMIT the highest FREQUENCY of the SOURCE without introducing distortion!
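As a quick numeric check of the 2 F_h rule (my own illustration, not part of the original slides, using NumPy), the sketch below samples a 3 Hz sine at only 4 Hz and shows that the samples coincide exactly with those of a 1 Hz sine, so the two signals can no longer be told apart from their samples:

```python
import numpy as np

# Aliasing check: a 3 Hz sine sampled at only 4 Hz (< 2 * 3 Hz) produces
# exactly the same samples as a 1 Hz sine (with a sign flip), so the
# original frequency cannot be recovered from the samples.
fs = 4.0                               # sample rate in Hz (too low for a 3 Hz tone)
t = np.arange(16) / fs                 # 16 sample instants

x_3hz = np.sin(2 * np.pi * 3 * t)      # the "true" 3 Hz signal
x_alias = -np.sin(2 * np.pi * 1 * t)   # the 1 Hz alias (3 Hz folded at fs)

print(np.allclose(x_3hz, x_alias))     # True: the sample values coincide
```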
5
EXAMPLES:
- text: represent every symbol with 8 bits; storage: 8 * (500 pages) * 1000 symbols = 4 Mbit; compression possible to 1 Mbit (1:4)
- speech: sampling speed 8000 samples/s, accuracy 8 bits/sample; needed transmission speed 64 kbit/s; compression possible to 4.8 kbit/s (1:10)
- CD music: sampling speed 44.1 k samples/s, accuracy 16 bits/sample; needed storage capacity for one hour of stereo: 5 Gbit (about 1250 books); compression possible to 4 bits/sample (1:4)
- digital pictures: 300 x 400 pixels x 3 colors x 8 bits/sample = 2.9 Mbit/picture; for 25 images/second we need 75 Mbit/s; 2 hours of pictures need 540 Gbit (about 130,000 books); compression needed (1:100)
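The raw (uncompressed) figures above are simple arithmetic; a small script of my own, using the slide's page, symbol and pixel counts as inputs, reproduces them up to the slide's rounding:

```python
# Reproduce the raw (uncompressed) sizes quoted on the slide.
Mbit, Gbit = 1e6, 1e9

text = 8 * 500 * 1000               # 8 bit/symbol * 500 pages * 1000 symbols/page
speech = 8000 * 8                   # samples/s * bits/sample
cd = 44100 * 16 * 2 * 3600          # stereo, one hour
picture = 300 * 400 * 3 * 8         # pixels * colors * bits/sample
video = picture * 25                # 25 images/s
movie = video * 2 * 3600            # 2 hours

print(f"text    : {text / Mbit:.1f} Mbit")      # 4.0 Mbit
print(f"speech  : {speech / 1e3:.0f} kbit/s")   # 64 kbit/s
print(f"CD hour : {cd / Gbit:.1f} Gbit")        # ~5.1 Gbit
print(f"picture : {picture / Mbit:.1f} Mbit")   # ~2.9 Mbit
print(f"video   : {video / Mbit:.0f} Mbit/s")   # ~72 Mbit/s (slide rounds to 75)
print(f"2 hours : {movie / Gbit:.0f} Gbit")     # ~518 Gbit (slide rounds to 540)
```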
6
We have to reduce the amount of data, using prediction and context:
- statistical properties
- models
- perceptual properties
LOSSLESS: remove redundancy, exact!
LOSSY: remove irrelevance, with distortion!
7
Shannon source coding theorem
Assume:
- independent source outputs
- consider runs of outputs of length L: we expect a certain "type" of runs
- give a code word for an expected run, with prefix '1'
- an unexpected run is transmitted as it appears, with prefix '0'
Example: throw a die 600 times, what do you expect?
Example: throw a coin 100 times, what do you expect?
8
We start with a binary source.
Assume: a binary sequence x of length L, with P(0) = 1 − P(1) = 1 − p; t is the number of 1's.
For L → ∞ and any ε > 0 as small as desired:
Probability( |t/L − p| > ε ) → 0, i.e. Probability( |t/L − p| ≤ ε ) → 1
(1)  L(p − ε) ≤ t ≤ L(p + ε) with high probability
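To see statement (1) in action, here is a small simulation of my own (using NumPy) that draws Bernoulli(p) sequences and measures how often t/L stays within ε of p as L grows:

```python
import numpy as np

rng = np.random.default_rng(0)
p, eps, n_runs = 0.3, 0.04, 20000

# Empirical check of (1): the fraction of ones t/L of a Bernoulli(p) sequence
# falls inside [p - eps, p + eps] with probability -> 1 as L grows.
for L in (10, 100, 1000, 10000):
    t = rng.binomial(L, p, size=n_runs)            # number of ones per sequence
    frac_typical = np.mean(np.abs(t / L - p) <= eps)
    print(f"L = {L:6d}   P(|t/L - p| <= eps) ~ {frac_typical:.3f}")
```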
9
Consequence 1: Let A be the set of typical sequences, i.e. those obeying (1): |t/L − p| ≤ ε.
Then P(A) → 1 (as close to 1 as wanted, i.e. P(A) ≥ 1 − ε),
or: almost all observed sequences are typical and have about pL ones.
Note: we use the notation → when we assume that L → ∞.
10
Consequence 2: The cardinality of the set A satisfies
(1 − ε) 2^{L(h(p) − ε)} ≤ |A| ≤ 2^{L(h(p) + ε)},
where h(p) = −p log₂ p − (1 − p) log₂(1 − p) is the binary entropy function, so |A| ≈ 2^{L h(p)}.
11
Shannon 1
Encode every vector in A with N = ⌊L(h(p) + ε)⌋ + 1 bits,
every vector in A^c with L bits.
Use a prefix bit to signal whether we have a typical sequence or not.
The average codeword length:
K ≤ (1 − ε)[L(h(p) + ε) + 1] + εL + 1 ≈ L h(p) bits (for large L and small ε)
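A sketch of this two-part code in Python (my own illustration; the function names are mine): one flag bit marks typicality, a typical sequence gets an index of about L(h(p)+ε) bits, which suffices because |A| ≤ 2^{L(h(p)+ε)}, and everything else is sent uncoded. The average rate then approaches h(p) + ε bits per source bit.

```python
import math
import random

def h(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def codeword_length(x, p, eps):
    """Length of the two-part code for one binary sequence x (list of 0/1):
    1 flag bit, then either an index of ceil(L*(h(p)+eps)) bits (typical,
    enough since |A| <= 2^(L*(h(p)+eps))) or the L raw bits (atypical)."""
    L, t = len(x), sum(x)
    if abs(t / L - p) <= eps:
        return 1 + math.ceil(L * (h(p) + eps))
    return 1 + L

random.seed(1)
p, eps, L, n_runs = 0.3, 0.04, 2000, 500
avg = sum(codeword_length([random.random() < p for _ in range(L)], p, eps)
          for _ in range(n_runs)) / n_runs
print(avg / L, "vs h(p) + eps =", h(p) + eps)   # ~0.92 bits per source bit
```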
12
Shannon 1: converse
Setup: source output words X^L → encoder → Y^k (k output bits per L input symbols)
H(X) = h(p); H(X^L) = L h(p); k = L[h(p) − ε], ε > 0
Encoder: assignment of 2^{L[h(p)−ε]} − 1 code words, or the all-zero code word
Pe = Prob(all-zero code word assigned) = Prob(error)
13
Shannon 1: converse (cont'd)
With k = L[h(p) − ε], ε > 0, and H(X^L) = L h(p):
H(X^L, Y^k) = H(X^L) + H(Y^k | X^L) = L h(p)
            = H(Y^k) + H(X^L | Y^k) ≤ L[h(p) − ε] + h(Pe) + Pe · log₂|source words|   (Fano)
so L h(p) ≤ L[h(p) − ε] + 1 + L · Pe, hence Pe ≥ ε − 1/L > 0 for large L.
14
Typicality
Homework: calculate, for ε = 0.04 and p = 0.3:
h(p), |A|, P(A), (1 − ε) 2^{L(h(p) − ε)}, 2^{L(h(p) + ε)} as a function of L.
Homework: repeat the same arguments for a more general source with entropy H(X).
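A possible solution sketch for the first homework (my own code): it evaluates |A| and P(A) exactly from the binomial distribution and compares them with the two bounds, keeping in mind that the lower bound is asymptotic (it needs P(A) ≥ 1 − ε):

```python
from math import comb, log2

p, eps = 0.3, 0.04
h = -p * log2(p) - (1 - p) * log2(1 - p)           # binary entropy h(p)
print(f"h(p) = {h:.4f}")                           # ~0.8813 bits

for L in (50, 100, 200, 400):
    typical_t = [t for t in range(L + 1) if L * (p - eps) <= t <= L * (p + eps)]
    card_A = sum(comb(L, t) for t in typical_t)                     # |A|, exact
    P_A = sum(comb(L, t) * p**t * (1 - p)**(L - t) for t in typical_t)
    lower = (1 - eps) * 2 ** (L * (h - eps))       # asymptotic lower bound on |A|
    upper = 2 ** (L * (h + eps))                   # upper bound on |A|
    print(f"L={L:4d}  |A|={card_A:.3e}  P(A)={P_A:.3f}  "
          f"bounds: [{lower:.3e}, {upper:.3e}]")
```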
15
Sources with independent outputs
Let U be a source with independent outputs: U₁ U₂ ⋯ U_L (subscript = time).
The set of typical sequences can be defined as
A = { u^L : | −(1/L) log₂ P(u^L) − H(U) | ≤ ε }.
Then, for large L: P(A) → 1, and −(1/L) log₂ P(u^L) ≈ H(U) for every u^L in A.
16
Sources with independent outputs, cont'd
To see how it works, we can write
−(1/L) log₂ P(u^L) = − Σ_{i=1}^{|U|} N_i log₂ P_i ≈ − Σ_{i=1}^{|U|} P_i log₂ P_i = H(U),
where |U| is the size of the alphabet, P_i is the probability that symbol i occurs, and N_i is the fraction of occurrences of symbol i in u^L (so N_i ≈ P_i for typical sequences).
17
Sources with independent outputs, cont'd
The cardinality of the set A ≈ 2^{L H(U)}.
Proof sketch: every u^L in A has 2^{−L(H(U)+ε)} ≤ P(u^L) ≤ 2^{−L(H(U)−ε)}, so
1 ≥ P(A) ≥ |A| 2^{−L(H(U)+ε)}  gives  |A| ≤ 2^{L(H(U)+ε)}, and
1 − ε ≤ P(A) ≤ |A| 2^{−L(H(U)−ε)}  gives  |A| ≥ (1 − ε) 2^{L(H(U)−ε)}.
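The same concentration can be checked numerically for a non-binary i.i.d. source; the sketch below (my own, with an assumed 4-symbol distribution) shows −(1/L) log₂ P(u^L) collapsing onto H(U) as L grows:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([0.5, 0.25, 0.125, 0.125])     # an assumed 4-symbol distribution
H = -np.sum(P * np.log2(P))                 # H(U) = 1.75 bits

# AEP check: -(1/L) log2 P(u^L) concentrates around H(U) for i.i.d. outputs.
for L in (10, 100, 1000):
    u = rng.choice(len(P), size=(2000, L), p=P)     # 2000 sequences of length L
    per_symbol = -np.log2(P[u]).mean(axis=1)        # -(1/L) log2 P(u^L)
    print(f"L={L:5d}  mean={per_symbol.mean():.3f}  "
          f"std={per_symbol.std():.3f}  (H(U)={H:.2f})")
```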
18
Encoding
Encode every vector in A with L(H(U) + ε) + 1 bits,
every vector in A^c uncoded, with L log₂|U| bits (L bits in the binary case).
Use a prefix bit to signal whether we have a typical sequence or not.
The average codeword length:
K ≤ (1 − ε)[L(H(U) + ε) + 1] + ε L log₂|U| + 1 ≈ L H(U) + 1 bits (for large L and small ε)
19
Converse
For the converse, see the binary source case.
20
Sources with memory
Let U be a source with memory. Output: U₁ U₂ ⋯ U_L (subscript = time); states: S ∈ {1, 2, ⋯, |S|}.
The entropy, or minimum description length:
H(U) = H(U₁ U₂ ⋯ U_L)
     = H(U₁) + H(U₂|U₁) + ⋯ + H(U_L | U_{L−1} ⋯ U₂ U₁)   (chain rule)
     ≤ H(U₁) + H(U₂) + ⋯ + H(U_L)   (use H(X) ≥ H(X|Y))
How to calculate?
21
Stationary sources with memory
H(U_L | U_{L−1} ⋯ U₂ U₁) ≤ H(U_L | U_{L−1} ⋯ U₂) = H(U_{L−1} | U_{L−2} ⋯ U₁)   (stationarity)
Conditioning on less of the past can only increase the entropy, so the innovation H(U_L | U_{L−1} ⋯ U₁) is non-increasing in L.
Conclusion: the innovation must have a limit for large L.
22
Cont'd
H(U₁ U₂ ⋯ U_L) = H(U₁ U₂ ⋯ U_{L−1}) + H(U_L | U_{L−1} ⋯ U₂ U₁)
              ≤ H(U₁ U₂ ⋯ U_{L−1}) + (1/L)[ H(U₁) + H(U₂|U₁) + ⋯ + H(U_L | U_{L−1} ⋯ U₂ U₁) ]
                (the last conditional term is the smallest of the L terms, hence at most their average)
              = H(U₁ U₂ ⋯ U_{L−1}) + (1/L) H(U₁ U₂ ⋯ U_L)
Thus: (1/L) H(U₁ U₂ ⋯ U_L) ≤ 1/(L−1) H(U₁ U₂ ⋯ U_{L−1})
Conclusion: the normalized block entropy is non-increasing, so it has a limit.
23
Cont'd
(1/L) H(U₁ U₂ ⋯ U_L) = (1/L)[ H(U₁) + H(U₂|U₁) + ⋯ + H(U_L | U_{L−1} ⋯ U₂ U₁) ] ≥ H(U_L | U_{L−1} ⋯ U₂ U₁)
Conclusion: the normalized block entropy ≥ the innovation.
H(U₁ U₂ ⋯ U_{L+j}) ≤ H(U₁ U₂ ⋯ U_{L−1}) + (j + 1) H(U_L | U_{L−1} ⋯ U₂ U₁)
Conclusion: the limit (large j) of the normalized block entropy ≤ the innovation.
THUS: limit of the normalized block entropy = limit of the innovation.
24
Sources with memory: example
Note: H(U) = minimum representation length; H(U) ≤ average length of a practical scheme.
English text, first approach: consider words as symbols.
Scheme:
1. count word frequencies
2. binary encode the words
3. calculate the average representation length
Conclusion: we need about 12 bits/word; average word length 4.5 letters, so H(English) ≤ 12 / 4.5 ≈ 2.6 bits/letter.
Homework: estimate the entropy of your favorite programming language.
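A sketch of steps 1-3 in Python (my own code; the file name is only a placeholder). It uses the entropy of the empirical word distribution as a stand-in for the average length of an optimal binary word code, which is within one bit of it:

```python
from collections import Counter
from math import log2

def words_as_symbols(text):
    """Steps 1-3 of the scheme: count word frequencies, take the word entropy
    as the (near-optimal) average code length in bits/word, and divide by the
    average word length to get bits/letter."""
    words = text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    bits_per_word = -sum(c / total * log2(c / total) for c in counts.values())
    letters_per_word = sum(len(w) for w in words) / len(words)
    return bits_per_word, letters_per_word, bits_per_word / letters_per_word

# Usage on any large plain-text sample (path is only a placeholder):
# bpw, lpw, bpl = words_as_symbols(open("sample.txt").read())
# print(f"{bpw:.1f} bits/word, {lpw:.1f} letters/word, {bpl:.2f} bits/letter")
```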
25
Example: Zipf's law
Procedure: order the English words according to frequency of occurrence.
Zipf's law: the frequency of the word at position n is F_n = A/n; for English, A ≈ 0.1.
[log-log plot: word frequency (0.001 to 0.1) versus word rank (1 to 1000)]
Result: H(English) ≈ 2.16 bits/letter.
In general: F_n = a n^{−b}, where a and b are constants.
Applications: web access statistics, complexity of languages.
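The 2.16 bits/letter figure can be checked directly (my own sketch): assume F_n = 0.1/n for ranks up to the point where the frequencies sum to 1 (around rank 12,000), compute the word entropy, and divide by 4.5 letters per word:

```python
from math import log2

A = 0.1                       # Zipf constant for English: F_n = A / n
# Use ranks until the frequencies sum to (about) 1, i.e. 0.1 * H_N ~ 1.
N = 12366
freqs = [A / n for n in range(1, N + 1)]
total = sum(freqs)            # ~1.0 (tiny deviation from the rank cut-off)
freqs = [f / total for f in freqs]           # renormalize exactly

h_word = -sum(f * log2(f) for f in freqs)    # entropy per word
print(f"sum of frequencies before renormalizing: {total:.4f}")
print(f"H ~ {h_word:.2f} bits/word ~ {h_word / 4.5:.2f} bits/letter")  # ~2.16
```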
26
Sources with memory: example
Another approach: H(U) = H(U₁) + H(U₂|U₁) + ⋯ + H(U_L | U_{L−1} ⋯ U₂ U₁).
Consider text as stationary, i.e. the statistical properties are fixed.
Measure:
P(a), P(b), ...         → H(U₁) ≈ 4.1
P(a|a), P(b|a), ...     → H(U₂|U₁) ≈ 3.6
P(a|aa), P(b|aa), ...   → H(U₃|U₂U₁) ≈ 3.3
...
Note: more given letters reduce the entropy. Shannon's experiments give as result that 0.6 ≤ H(English) ≤ 1.3 bits/letter.
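A sketch for estimating these conditional letter entropies from a text sample (my own code; the file path is only a placeholder), using H(U_{k+1} | U₁ ⋯ U_k) = H(of (k+1)-grams) − H(of k-grams) on empirical counts:

```python
from collections import Counter
from math import log2

def conditional_letter_entropy(text, order=1):
    """Estimate H(U_{order+1} | U_1 ... U_order) from empirical n-gram counts,
    mirroring the P(a|a), P(a|aa), ... measurements on the slide."""
    text = "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ")
    n = len(text) - order
    grams = Counter(text[i:i + order + 1] for i in range(n))
    contexts = Counter(text[i:i + order] for i in range(n))
    h_joint = -sum(c / n * log2(c / n) for c in grams.values())
    h_context = -sum(c / n * log2(c / n) for c in contexts.values())
    return h_joint - h_context          # H(joint) - H(context)

# Usage on any plain-text sample (path is only a placeholder):
# text = open("sample.txt").read()
# for k in (0, 1, 2):                   # H(U1), H(U2|U1), H(U3|U2U1)
#     print(k + 1, conditional_letter_entropy(text, order=k))
```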
27
Example: Markov model
A Markov model has states S ∈ {1, ⋯, |S|}, state probabilities P_t(S = i), and transitions t → t+1 with probability P(s_{t+1} = j | s_t = i).
Every transition determines a specific output symbol, see the example.
[figure: state diagram with states i, j, k, the transition probabilities P(s_{t+1}=j|s_t=i) and P(s_{t+1}=k|s_t=i), and binary output labels on the transitions]
28
Markov sources
Instead of H(U) = H(U₁ U₂ ⋯ U_L), we consider
H(U') = H(U₁ U₂ ⋯ U_L, S₁)
      = H(S₁) + H(U₁|S₁) + H(U₂|U₁ S₁) + ⋯ + H(U_L | U_{L−1} ⋯ U₁ S₁)
      = H(U) + H(S₁ | U_L U_{L−1} ⋯ U₁)
Markov property: output U_t depends on S_t only; S_t and U_t determine S_{t+1}. Hence
H(U') = H(S₁) + H(U₁|S₁) + H(U₂|S₂) + ⋯ + H(U_L|S_L)
29
Markov sources: further reduction
Assume stationarity, i.e. P(S_t = i) = P(S = i) for all t > 0 and all i. Then:
H(U₁|S₁) = H(U_i|S_i) for i = 2, 3, ⋯, L
H(U₁|S₁) = P(S₁=1) H(U₁|S₁=1) + P(S₁=2) H(U₁|S₁=2) + ⋯ + P(S₁=|S|) H(U₁|S₁=|S|)
and H(U') = H(S₁) + L H(U₁|S₁); H(U) = H(U') − H(S₁ | U₁ U₂ ⋯ U_L).
Per symbol: (1/L) H(U) = H(U₁|S₁) + (1/L)[ H(S₁) − H(S₁ | U₁ U₂ ⋯ U_L) ],
where the bracketed term is ≥ 0 and bounded by H(S₁), so (1/L) H(U) → H(U₁|S₁) for large L.
30
Example
A three-state Markov source with states A, B, C, stationary probabilities P_t(A) = ½, P_t(B) = P_t(C) = ¼, and P(A|B) = P(A|C) = ½; each transition emits its own output symbol.
[figure: state diagram with the transition probabilities ½, ¼, ¼ and binary output labels on the transitions]
The entropy for this example = 1¼ bits per symbol.
Homework: check!
Homework: construct another example that is stationary and calculate H.
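For the "check!" homework, here is one possible reconstruction (my assumption; the original state diagram is not recoverable from the slide): a chain whose stationary distribution is (½, ¼, ¼), with P(A|B) = P(A|C) = ½ and a distinct output symbol per transition. Its per-symbol entropy H(U₁|S₁) = Σᵢ P(S=i) H(U₁|S=i) indeed comes out as 1¼:

```python
import numpy as np

# One possible 3-state chain consistent with the numbers that survive on the
# slide (stationary P(A)=1/2, P(B)=P(C)=1/4, P(A|B)=P(A|C)=1/2); the exact
# transition structure is an assumption, not read off the (lost) diagram.
# Every transition out of a state emits a distinct output symbol, so
# H(U|S=i) equals the entropy of row i of the transition matrix.
P = np.array([[0.5, 0.25, 0.25],    # from A
              [0.5, 0.0,  0.5 ],    # from B
              [0.5, 0.5,  0.0 ]])   # from C

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()
print("stationary distribution:", pi)          # [0.5, 0.25, 0.25]

def row_entropy(row):
    row = row[row > 0]
    return -np.sum(row * np.log2(row))

H = sum(pi[i] * row_entropy(P[i]) for i in range(3))
print("per-symbol entropy H(U1|S1) =", H)      # 1.25
```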
31
Appendix: to show (1)
For a binary sequence X of length n with P(X_i = 1) = p, let t = X₁ + X₂ + ⋯ + X_n.
Because the X_i are independent, the variance of the sum = the sum of the variances:
Var(t) = n p(1 − p), so Var(t/n) = p(1 − p)/n.
By Chebyshev's inequality,
Probability( |t/n − p| > ε ) ≤ Var(t/n) / ε² = p(1 − p) / (n ε²) → 0 as n → ∞,
which is exactly statement (1).