Source Coding: Efficient Data Representation. A.J. Han Vinck.


DATA COMPRESSION / REDUCTION
1) INTRODUCTION: presentation of messages in binary format; conversion into binary form
2) DATA COMPRESSION: lossless, data compression without errors
3) DATA REDUCTION: lossy, data reduction using prediction and context

CONVERSION into DIGITAL FORM
1. SAMPLING: discrete, exact time samples of the continuous (analog) signal v(t) are produced at times T, 2T, 3T, 4T, ...; sample rate R = 1/T
2. QUANTIZING: approximation (lossy) of each sample into a set of discrete levels
3. ENCODING: representation of each level by a binary symbol
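
A minimal Python sketch of these three steps (the 100 Hz test signal, the 1 kHz sample rate and the 3-bit quantizer are illustrative assumptions, not taken from the slide):

```python
import math

def sample(signal, rate_hz, duration_s):
    """Take discrete time samples v(nT) with T = 1/rate_hz."""
    T = 1.0 / rate_hz
    return [signal(n * T) for n in range(int(duration_s * rate_hz))]

def quantize(samples, n_bits, v_min=-1.0, v_max=1.0):
    """Map each sample to the nearest of 2**n_bits levels (the lossy step)."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / levels
    return [min(levels - 1, max(0, int((v - v_min) / step))) for v in samples]

def encode(level_indices, n_bits):
    """Represent every quantizer level by an n_bits binary word."""
    return [f"{i:0{n_bits}b}" for i in level_indices]

# Example: a 100 Hz sine sampled at 1 kHz, 3 bits per sample.
v = lambda t: math.sin(2 * math.pi * 100 * t)
codes = encode(quantize(sample(v, rate_hz=1000, duration_s=0.01), n_bits=3), n_bits=3)
print(codes)   # e.g. ['100', '110', '111', ...]
```

Only the quantizing step loses information: the exact sample values cannot be recovered from the 3-bit words.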

HOW FAST SHOULD WE SAMPLE?
Principle:
- an analog signal can be seen as a sum of sine waves, with some highest sine frequency F_h
- unique reconstruction from its (exact) samples is possible if (Nyquist, 1928) the sample rate R ≥ 2·F_h
We LIMIT the highest FREQUENCY of the SOURCE without introducing distortion!
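
A small numeric illustration of the Nyquist condition (the 2, 3, 5 and 8 kHz frequencies are assumptions chosen for the demo): sampling a 3 kHz sine below 2·F_h makes it indistinguishable from a lower-frequency alias, while sampling above 2·F_h avoids the collision.

```python
import math

def samples(freq_hz, rate_hz, n=8):
    """n samples of sin(2*pi*freq_hz*t) taken at sample rate rate_hz."""
    return [round(math.sin(2 * math.pi * freq_hz * k / rate_hz), 6) for k in range(n)]

# Undersampling: R = 5 kHz < 2 * 3 kHz.  The 3 kHz sine produces exactly the same
# samples as an aliased 2 kHz sine (with opposite sign), so unique reconstruction
# from the samples is impossible.
alias = [round(-math.sin(2 * math.pi * 2000 * k / 5000), 6) for k in range(8)]
print(samples(3000, 5000) == alias)                  # True

# Sampling fast enough: R = 8 kHz > 2 * 3 kHz, no such collision for these tones.
print(samples(3000, 8000) == samples(2000, 8000))    # False
```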

EXAMPLES:
- text: represent every symbol with 8 bits → storage: 8 × (500 pages) × 1000 symbols = 4 Mbit → compression possible to 1 Mbit (1:4)
- speech: sampling speed 8000 samples/s; accuracy 8 bits/sample → needed transmission speed 64 kbit/s → compression possible to 4.8 kbit/s (1:10)
- CD music: sampling speed 44.1 k samples/s; accuracy 16 bits/sample → needed storage capacity for one hour of stereo: 5 Gbit ≈ 1250 books → compression possible to 4 bits/sample (1:4)
- digital pictures: 300 × 400 pixels × 3 colors × 8 bits/sample → 2.9 Mbit/picture; for 25 images/second we need 75 Mbit/s; 2 hours of pictures need 540 Gbit ≈ 135,000 books → compression needed (1:100)
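
These figures are straightforward arithmetic; a quick check (the slide rounds some of the intermediate values):

```python
# Back-of-the-envelope check of the figures on this slide.
Mbit, Gbit = 1e6, 1e9

text = 8 * 500 * 1000                       # 8 bit/symbol * 500 pages * 1000 symbols/page
print(text / Mbit)                          # 4.0 Mbit

speech = 8000 * 8                           # samples/s * bits/sample
print(speech / 1e3)                         # 64.0 kbit/s

cd_hour = 44100 * 16 * 2 * 3600             # stereo, one hour
print(cd_hour / Gbit)                       # ~5.08 Gbit

picture = 300 * 400 * 3 * 8                 # pixels * colors * bits/sample
print(picture / Mbit)                       # ~2.88 Mbit per picture
print(picture * 25 / Mbit)                  # ~72 Mbit/s at 25 images/s (slide: 75 Mb/s)
print(picture * 25 * 2 * 3600 / Gbit)       # ~518 Gbit for 2 hours (slide rounds to 540)
```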

We have to reduce the amount of data!! Using prediction and context:
- statistical properties
- models
- perceptual properties
LOSSLESS: remove redundancy, exact!!
LOSSY: remove irrelevance, with distortion!!

Shannon source coding theorem
Assume:
- independent source outputs
- consider runs of outputs of length L; we expect a certain "type" of runs
- give a code word for an expected run, with prefix '1'
- an unexpected run is transmitted as it appears, with prefix '0'
Example: throw a die 600 times, what do you expect?
Example: throw a coin 100 times, what do you expect?

We start with a binary source
Assume: a binary sequence x of length L with P(0) = 1 − P(1) = 1 − p; t is the number of 1's.
For L → ∞ and ε, δ > 0 as small as desired:
Probability( |t/L − p| > ε ) ≤ δ → 0,
i.e. Probability( |t/L − p| ≤ ε ) ≥ 1 − δ → 1    (1)
Thus L(p − ε) ≤ t ≤ L(p + ε) with high probability.
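
A quick simulation of statement (1); p = 0.3 and ε = 0.04 are taken from the homework slide further down, the L values and trial count are arbitrary choices:

```python
import random

# The fraction of 1's, t/L, concentrates around p as L grows.
random.seed(1)
p, eps, trials = 0.3, 0.04, 1000

for L in (100, 1000, 10000):
    outside = sum(1 for _ in range(trials)
                  if abs(sum(random.random() < p for _ in range(L)) / L - p) > eps)
    print(L, outside / trials)   # estimate of Prob(|t/L - p| > eps); shrinks towards 0
```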

Consequence 1:
Let A be the set of typical sequences, i.e. those obeying (1): |t/L − p| ≤ ε.
Then P(A) → 1 (as close to 1 as wanted, i.e. P(A) ≥ 1 − δ),
or: almost all observed sequences are typical and have about pL ones.
Note: we use the notation ≈ when we assume that L → ∞.

Consequence 2:
The cardinality of the set A is bounded as
(1 − δ)·2^{L(h(p) − ε)} ≤ |A| ≤ 2^{L(h(p) + ε)}, i.e. |A| ≈ 2^{L·h(p)},
where h(p) = −p log₂ p − (1 − p) log₂(1 − p) is the binary entropy function.
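
A numerical check of consequences 1 and 2, again with p = 0.3 and ε = 0.04 from the homework slide (the L values are arbitrary): P(A) tends to 1 while log₂|A|/L stays near h(p) ≈ 0.881 (the gap is of order ε), far below the 1 bit/symbol needed to index all 2^L sequences.

```python
import math

def h(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, eps = 0.3, 0.04
for L in (100, 500, 2000):
    lo, hi = math.ceil(L * (p - eps)), math.floor(L * (p + eps))
    # |A| = number of length-L binary sequences whose weight t is in the typical range
    log2_card = math.log2(sum(math.comb(L, t) for t in range(lo, hi + 1)))
    # P(A) = exact binomial probability of that range, evaluated in the log domain
    # to avoid underflow of p**t for large L
    P_A = sum(2 ** (math.log2(math.comb(L, t)) + t * math.log2(p)
                    + (L - t) * math.log2(1 - p))
              for t in range(lo, hi + 1))
    print(L, round(P_A, 4), round(log2_card / L, 4), round(h(p), 4))
```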

Shannon 1
- Encode every vector in A with an integer number of bits ≤ L(h(p) + ε) + 1.
- Encode every vector in A^c with L bits.
- Use a prefix to signal whether we have a typical sequence or not.
The average codeword length:
K = (1 − δ)[L(h(p) + ε) + 1] + δ·L + 1 ≈ L·h(p) bits
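
A worked evaluation of K for the same p = 0.3 and ε = 0.04 (δ is computed exactly from the binomial distribution); K/L approaches h(p) + ε, which can be brought as close to h(p) as desired by shrinking ε:

```python
import math

def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, eps = 0.3, 0.04
for L in (100, 1000, 5000):
    lo, hi = math.ceil(L * (p - eps)), math.floor(L * (p + eps))
    # delta = P(A^c), from the exact binomial distribution (log domain to avoid underflow)
    P_A = sum(2 ** (math.log2(math.comb(L, t)) + t * math.log2(p)
                    + (L - t) * math.log2(1 - p)) for t in range(lo, hi + 1))
    delta = 1 - P_A
    K = (1 - delta) * (L * (h(p) + eps) + 1) + delta * L + 1
    print(L, round(K / L, 4), round(h(p), 4))   # K/L approaches h(p) + eps
```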

Shannon 1: converse
Source output words X^L → encoder → Y^k: k output bits per input of L symbols.
H(X) = h(p); H(X^L) = L·h(p); k = L[h(p) − ε], ε > 0.
Encoder: assignment of one of the 2^{L[h(p) − ε]} − 1 code words ≠ 0, or the all-zero code word.
Pe = Prob(all-zero code word assigned) = Prob(error).

Shannon 1: converse (cont'd)
As before: H(X) = h(p); H(X^L) = L·h(p); k = L[h(p) − ε], ε > 0.
H(X^L, Y^k) = H(X^L) + H(Y^k | X^L) = L·h(p)   (Y^k is a function of X^L)
            = H(Y^k) + H(X^L | Y^k) ≤ L[h(p) − ε] + h(Pe) + Pe·log₂|source|   (Fano's inequality)
With log₂|source| = L and h(Pe) ≤ 1:
L·h(p) ≤ L[h(p) − ε] + 1 + L·Pe  ⇒  Pe ≥ ε − 1/L > 0 !

Typicality
Homework: calculate, for ε = 0.04 and p = 0.3: h(p), |A|, P(A), (1 − δ)·2^{L(h(p) − ε)} and 2^{L(h(p) + ε)} as a function of L.
Homework: repeat the same arguments for a more general source with entropy H(X).

Sources with independent outputs
Let U be a source with independent outputs: U_1 U_2 … U_L (subscript = time).
The set of typical sequences can be defined as
A = { u : |N_i − P_i| ≤ ε for every symbol i of the alphabet },
where N_i is the fraction of occurrences of symbol i in u and P_i its probability.
Then, for large L: P(A) ≈ 1 and P(u) ≈ 2^{−L·H(U)} for every u in A.

Sources with independent outputs, cont'd
To see how it works, we can write
P(u) = P_1^{L·N_1} · P_2^{L·N_2} · … · P_|U|^{L·N_|U|} = 2^{L·Σ_i N_i log₂ P_i} ≈ 2^{−L·H(U)}   (since N_i ≈ P_i for typical u),
where |U| is the size of the alphabet, P_i the probability that symbol i occurs, and N_i the fraction of occurrences of symbol i.

Sources with independent outputs, cont'd
The cardinality of the set A ≈ 2^{L·H(U)}.
Proof: 1 ≥ P(A) = Σ_{u ∈ A} P(u) ≥ |A|·2^{−L(H(U) + ε)}, so |A| ≤ 2^{L(H(U) + ε)};
and since P(A) ≥ 1 − δ and P(u) ≤ 2^{−L(H(U) − ε)} for u in A, also |A| ≥ (1 − δ)·2^{L(H(U) − ε)}.
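
A simulation for a non-binary i.i.d. source (the 4-letter alphabet and its probabilities are assumptions for the demo): the per-symbol information −(1/L)·log₂ P(u) of an observed sequence concentrates around H(U), so typical sequences all have probability close to 2^(−L·H(U)).

```python
import math, random

random.seed(7)
P = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}     # assumed source probabilities
H = sum(-p * math.log2(p) for p in P.values())        # H(U) = 1.75 bits/symbol

for L in (100, 1000, 10000):
    u = random.choices(list(P), weights=list(P.values()), k=L)   # one source output run
    info = -sum(math.log2(P[s]) for s in u) / L                  # -(1/L) log2 P(u)
    print(L, round(info, 4), round(H, 4))   # info approaches H(U)
```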

Encoding
- Encode every vector in A with an integer number of bits ≤ L(H(U) + ε) + 1.
- Encode every vector in A^c with L·log₂|U| bits.
- Use a prefix to signal whether we have a typical sequence or not.
The average codeword length:
K = (1 − δ)[L(H(U) + ε) + 1] + δ·L·log₂|U| + 1 ≈ L·H(U) + 1 bits

converse For converse, see binary source

Sources with memory
Let U be a source with memory. Output: U_1 U_2 … U_L (subscript = time); states: S ∈ {1, 2, …, |S|}.
The entropy or minimum description length:
H(U) = H(U_1 U_2 … U_L)
     = H(U_1) + H(U_2|U_1) + … + H(U_L | U_{L−1} … U_2 U_1)   (use the chain rule)
     ≤ H(U_1) + H(U_2) + … + H(U_L)   (use H(X) ≥ H(X|Y))
How to calculate?

Stationary sources with memory
H(U_L | U_{L−1} … U_2 U_1) ≤ H(U_L | U_{L−1} … U_2)   (less memory increases the entropy)
                           = H(U_{L−1} | U_{L−2} … U_1)   (stationarity)
So the innovation H(U_L | U_{L−1} … U_1) is non-increasing in L and ≥ 0.
Conclusion: there must be a limit for the innovation for large L.

Cont'd
H(U_1 U_2 … U_L) = H(U_1 U_2 … U_{L−1}) + H(U_L | U_{L−1} … U_2 U_1)
                 ≤ H(U_1 U_2 … U_{L−1}) + 1/L·[H(U_1) + H(U_2|U_1) + … + H(U_L | U_{L−1} … U_2 U_1)]
                   (the last conditional entropy is the smallest of the L chain-rule terms, hence ≤ their average)
                 = H(U_1 U_2 … U_{L−1}) + 1/L·H(U_1 U_2 … U_L)
Thus: 1/L·H(U_1 U_2 … U_L) ≤ 1/(L−1)·H(U_1 U_2 … U_{L−1})
Conclusion: the normalized block entropy has a limit.

Cont'd
1/L·H(U_1 U_2 … U_L) = 1/L·[H(U_1) + H(U_2|U_1) + … + H(U_L | U_{L−1} … U_2 U_1)] ≥ H(U_L | U_{L−1} … U_2 U_1)
Conclusion: the normalized block entropy is ≥ the innovation.
H(U_1 U_2 … U_{L+j}) ≤ H(U_1 U_2 … U_{L−1}) + (j+1)·H(U_L | U_{L−1} … U_2 U_1)
Conclusion: the limit (large j) of the normalized block entropy is ≤ the innovation.
THUS: limit of the normalized block entropy = limit of the innovation.
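
A small numerical illustration with a binary first-order Markov source (the transition probabilities are an assumption for the demo): the block entropy H(U_1…U_L) is computed by brute force; H_L/L decreases towards the innovation H_L − H_{L−1}, and both tend to the same limit.

```python
import math, itertools

# Binary Markov source: stay in the same state with probability 0.9 (assumed values).
P0 = {0: 0.5, 1: 0.5}                                  # stationary start distribution
T = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}         # transition probabilities

def block_entropy(L):
    """H(U_1 ... U_L): sum of -P(seq) log2 P(seq) over all 2^L sequences."""
    H = 0.0
    for seq in itertools.product((0, 1), repeat=L):
        p = P0[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= T[a][b]
        H -= p * math.log2(p)
    return H

prev = 0.0
for L in range(1, 13):
    H_L = block_entropy(L)
    # normalized block entropy H_L/L and innovation H_L - H_{L-1}
    print(L, round(H_L / L, 4), round(H_L - prev, 4))
    prev = H_L
```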

Sources with memory: example
Note: H(U) = minimum representation length; H(U) ≤ average length of a practical scheme.
English text, first approach: consider words as symbols.
Scheme:
1. count word frequencies
2. binary encode the words
3. calculate the average representation length
Conclusion: we need about 12 bits/word; with an average word length of 4.5 letters, H(English) ≤ 12/4.5 ≈ 2.6 bits/letter.
Homework: estimate the entropy of your favorite programming language.

Example: Zipf's law
Procedure: order the English words according to frequency of occurrence.
Zipf's law: the frequency of the word at position n is F_n = A/n; for English, A = 0.1.
[Figure: word frequency versus word order (rank)]
Result: H(English) = 2.16 bits/letter.
In general: F_n = a·n^(−b), where a and b are constants.
Application: web access, complexity of languages.
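
A sketch of where the 2.16 bits/letter comes from, under two assumptions: the vocabulary is cut off at the rank where the Zipf frequencies F_n = 0.1/n sum to 1, and the 4.5 letters/word from the previous slide is used.

```python
import math

A = 0.1
probs, total, n = [], 0.0, 0
while total < 1.0:                    # collect word probabilities 0.1/n until they sum to 1
    n += 1
    p = A / n
    if total + p > 1.0:
        p = 1.0 - total               # trim the last word so the probabilities sum to 1
    probs.append(p)
    total += p

H_word = sum(p * math.log2(1 / p) for p in probs)
print(n)                              # vocabulary size, roughly 12,000 words
print(round(H_word, 2))               # about 9.7 bits/word
print(round(H_word / 4.5, 2))         # about 2.16 bits/letter
```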

Sources with memory: example
Another approach: H(U) = H(U_1) + H(U_2|U_1) + … + H(U_L | U_{L−1} … U_2 U_1).
Consider text as stationary, i.e. the statistical properties are fixed. Measure:
P(a), P(b), ...       → H(U_1) ≈ 4.1
P(a|a), P(b|a), ...   → H(U_2|U_1) ≈ 3.6
P(a|aa), P(b|aa), ... → H(U_3 | U_2 U_1)
Note: more given letters reduce the entropy. Shannon's experiments give as a result that 0.6 ≤ H(English) ≤ 1.3 bits/letter.

Example: Markov model
A Markov model has states S ∈ {1, …, |S|}, state probabilities P_t(S = i), and transitions t → t+1 with probability P(s_{t+1} = j | s_t = i).
Every transition determines a specific output, see the example.
[Figure: state i with outgoing transitions to states j and k, labelled P(s_{t+1} = j | s_t = i) and P(s_{t+1} = k | s_t = i)]

Markov sources
Instead of H(U) = H(U_1 U_2 … U_L), we consider
H(U') = H(U_1 U_2 … U_L, S_1)
      = H(S_1) + H(U_1|S_1) + H(U_2|U_1 S_1) + … + H(U_L | U_{L−1} … U_1 S_1)
      = H(U) + H(S_1 | U_L U_{L−1} … U_1)
Markov property: output U_t depends on S_t only; S_t and U_t determine S_{t+1}.
Hence H(U') = H(S_1) + H(U_1|S_1) + H(U_2|S_2) + … + H(U_L|S_L).

Markov sources: further reduction
Assume stationarity, i.e. P(S_t = i) = P(S = i) for all t > 0 and all i.
Then: H(U_1|S_1) = H(U_i|S_i) for i = 2, 3, …, L, with
H(U_1|S_1) = P(S_1 = 1)·H(U_1|S_1 = 1) + P(S_1 = 2)·H(U_1|S_1 = 2) + … + P(S_1 = |S|)·H(U_1|S_1 = |S|),
and H(U') = H(S_1) + L·H(U_1|S_1).
H(U) = H(U') − H(S_1 | U_1 U_2 … U_L)
Per symbol: 1/L·H(U) = H(U_1|S_1) + 1/L·[H(S_1) − H(S_1 | U_1 U_2 … U_L)], and the last term → 0.

Example
P_t(A) = ½, P_t(B) = ¼, P_t(C) = ¼; P(A|B) = P(A|C) = ½.
[State diagram: three states 0, 1, 2 with probabilities ½, ¼, ¼ and output letters A, B, C on the transitions]
The entropy for this example = 1¼.
Homework: check!
Homework: construct another example that is stationary and calculate H.
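
The state diagram is not fully recoverable in this transcript; below is one transition structure that is consistent with the stated numbers (P_t(A) = ½, P_t(B) = ¼, P_t(C) = ¼, P(A|B) = P(A|C) = ½) and reproduces H = 1¼; treat the specific matrix as an assumption.

```python
import math
from fractions import Fraction as F

# Assumed transition structure (states identified with the last output letter):
# after A the next letter is A, B, C with prob 1/2, 1/4, 1/4; after B it is A or C
# with prob 1/2 each; after C it is A or B with prob 1/2 each.
T = {'A': {'A': F(1, 2), 'B': F(1, 4), 'C': F(1, 4)},
     'B': {'A': F(1, 2), 'C': F(1, 2)},
     'C': {'A': F(1, 2), 'B': F(1, 2)}}

# Check that the stated probabilities P(A)=1/2, P(B)=P(C)=1/4 are stationary.
pi = {'A': F(1, 2), 'B': F(1, 4), 'C': F(1, 4)}
for j in pi:
    assert sum(pi[i] * T[i].get(j, F(0)) for i in pi) == pi[j]

# Entropy rate H(U_1|S_1) = sum over states of pi_i * H(next output | state i).
H = sum(float(pi[i]) * sum(-float(p) * math.log2(float(p)) for p in T[i].values())
        for i in pi)
print(H)   # 1.25 bits/symbol = 1 1/4
```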

Appendix: to show (1)
For a binary sequence X of length n with P(X_i = 1) = p: E[X_i] = p and Var(X_i) = p(1 − p).
Because the X_i are independent, the variance of the sum t = X_1 + … + X_n equals the sum of the variances: Var(t) = n·p(1 − p).
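
A worked version of the argument (the slide's own formulas are images that do not survive in this transcript; the Chebyshev step below is the standard way to complete it):

```latex
\[
  \mathrm{E}[X_i]=p,\qquad \operatorname{Var}(X_i)=p(1-p),\qquad
  t=\sum_{i=1}^{n}X_i,\qquad \operatorname{Var}(t)=n\,p(1-p).
\]
By Chebyshev's inequality,
\[
  \Pr\!\left(\left|\tfrac{t}{n}-p\right|>\varepsilon\right)
  \le \frac{\operatorname{Var}(t/n)}{\varepsilon^{2}}
  = \frac{p(1-p)}{n\,\varepsilon^{2}} \longrightarrow 0 \quad (n\to\infty),
\]
which is statement~(1), with $\delta = p(1-p)/(n\varepsilon^{2})$.
```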