3. Information-Theoretic Foundations (SEAC-3, J. Teuhola, 2014)


Founder: Claude Shannon, 1940s.
Gives bounds for:
- Ultimate data compression
- Ultimate transmission rate of communication
Measure of symbol information:
- Degree of surprise / uncertainty
- Number of yes/no questions (binary decisions) needed to find out the correct symbol
- Depends on the probability p of the symbol

Choosing the information measure
Requirements for the information function I(p):
- I(p) ≥ 0
- I(p1 · p2) = I(p1) + I(p2) for independent events (probabilities multiply, information adds)
- I(p) is continuous in p
The solution is essentially unique: I(p) = −log p = log(1/p).
With base-2 logarithms, the unit of information is the bit.
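
A minimal sketch of this measure in Python (my own illustration, not from the slides; the function name self_info is my own choice):

```python
import math

def self_info(p: float) -> float:
    """Self-information I(p) = -log2(p) in bits for an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

# Additivity for independent events: I(p1 * p2) == I(p1) + I(p2)
print(self_info(0.5))                     # 1.0 bit
print(self_info(0.25))                    # 2.0 bits
print(self_info(0.5) + self_info(0.5))    # also 2.0 bits
```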

Examples
Tossing a fair coin: P(heads) = P(tails) = 1/2
- Information measure for one toss: Inf(heads) = Inf(tails) = −log2(1/2) bits = 1 bit
- Information measure for a 3-sequence: Inf = −log2(1/2 · 1/2 · 1/2) bits = 3 bits
- Optimal coding: heads → 0, tails → 1
An unfair coin: P(heads) = 1/8 and P(tails) = 7/8
- Inf(heads) = −log2(1/8) bits = 3 bits
- Inf(tails) = −log2(7/8) bits ≈ 0.193 bits
- Inf(three tails) = −log2((7/8)^3) bits ≈ 0.578 bits
- Improving the coding requires grouping of tosses into blocks.
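
The numbers above can be checked with a short Python snippet (my own addition, using the same probabilities as the slide):

```python
import math

def info_bits(p: float) -> float:
    """Information content -log2(p), in bits."""
    return -math.log2(p)

# Fair coin
print(info_bits(1/2))            # 1.0 bit per toss
print(info_bits((1/2) ** 3))     # 3.0 bits for any particular 3-sequence

# Unfair coin: P(heads) = 1/8, P(tails) = 7/8
print(info_bits(1/8))            # 3.0 bits
print(info_bits(7/8))            # ~0.193 bits
print(info_bits((7/8) ** 3))     # ~0.578 bits for three tails in a row
```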

Entropy
Measures the average information of a symbol from an alphabet S with probability distribution P = (p1, ..., pq):
H(S) = Σi pi · log2(1/pi) = −Σi pi · log2 pi
Noiseless source encoding theorem (C. Shannon): the entropy H(S) gives a lower bound on the average code length L of any instantaneously decodable coding system.
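
A small Python helper (my own illustration) that computes this quantity; the 4-symbol distribution is chosen only to show a case where the bound is met exactly:

```python
import math
from typing import Iterable

def entropy(probs: Iterable[float]) -> float:
    """H(S) = -sum(p * log2(p)) in bits per symbol; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits per symbol
# A prefix code with codeword lengths 1, 2, 3, 3 has average length 1.75 bits,
# so it reaches the entropy lower bound exactly.
```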

Example case: Binary source
Two symbols, e.g. S = {0, 1}, with probabilities p0 and p1 = 1 − p0.
Entropy: H(S) = −p0 · log2 p0 − p1 · log2 p1
- p0 = 0.5, p1 = 0.5 ⇒ H(S) = 1 bit
- p0 = 0.1, p1 = 0.9 ⇒ H(S) ≈ 0.469 bits
- p0 = 0.01, p1 = 0.99 ⇒ H(S) ≈ 0.081 bits
The more skewed the distribution, the smaller the entropy. The uniform distribution gives the maximum entropy.
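
The three values can be reproduced with a short sketch (my own addition):

```python
import math

def binary_entropy(p0: float) -> float:
    """Entropy of a two-symbol source with probabilities p0 and 1 - p0, in bits."""
    return -sum(p * math.log2(p) for p in (p0, 1.0 - p0) if p > 0)

for p0 in (0.5, 0.1, 0.01):
    print(f"p0 = {p0}: H(S) = {binary_entropy(p0):.3f} bits")
# p0 = 0.5 -> 1.000, p0 = 0.1 -> 0.469, p0 = 0.01 -> 0.081
```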

Example case: Predictive model
Already processed 'context': "HELLO WOR"; what is the next character?

Next char   Prob   Inf (bits)              Weighted information
L           0.95   −log2 0.95 ≈ 0.074      0.95 · 0.074 ≈ 0.070 bits
D           0.04   −log2 0.04 ≈ 4.644      0.04 · 4.644 ≈ 0.186 bits
M           0.01   −log2 0.01 ≈ 6.644      0.01 · 6.644 ≈ 0.066 bits

Weighted sum ≈ 0.322 bits
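
The weighted sum can be verified as follows (my own sketch; the probabilities are those given on the slide):

```python
import math

# Predicted probabilities for the character following the context "HELLO WOR"
next_char_probs = {"L": 0.95, "D": 0.04, "M": 0.01}

weighted_sum = 0.0
for ch, p in next_char_probs.items():
    info = -math.log2(p)                     # information content in bits
    print(f"{ch}: p = {p}, Inf = {info:.3f} bits, weighted = {p * info:.3f} bits")
    weighted_sum += p * info

print(f"Weighted sum = {weighted_sum:.3f} bits")   # ~0.322 bits
```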

Code redundancy
Average redundancy of a code (per symbol): L − H(S).
Redundancy can be made 0 if the symbol probabilities are negative powers of 2. (Note that −log2(2^−i) = i.)
Generally possible:
- Universal code: L ≤ c1 · H(S) + c2
- Asymptotically optimal code: c1 = 1
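
A short check of the zero-redundancy case (my own sketch), using a distribution of negative powers of 2 and a matching prefix code:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs   = [0.5, 0.25, 0.125, 0.125]     # all negative powers of 2
lengths = [1, 2, 3, 3]                  # e.g. codewords 0, 10, 110, 111

L = sum(p * l for p, l in zip(probs, lengths))   # average code length
H = entropy(probs)
print(f"L = {L} bits, H(S) = {H} bits, redundancy = {L - H} bits")   # redundancy = 0.0
```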

Generalization: m-memory source
Conditional information of symbol si in context c (the m preceding symbols):
Inf(si | c) = −log2 P(si | c)
Conditional entropy for a given context c:
H(S | c) = −Σi P(si | c) · log2 P(si | c)
Global entropy over all contexts:
Hm(S) = Σc P(c) · H(S | c)
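
As an illustration (my own, applying the formulas to empirical counts from a toy string), the global entropy of an order-m model can be estimated like this:

```python
import math
from collections import Counter, defaultdict

def order_m_entropy(text: str, m: int) -> float:
    """Empirical global entropy of an order-m model: sum over contexts of
    P(context) * H(next symbol | context), in bits per symbol."""
    context_counts = Counter()
    next_counts = defaultdict(Counter)
    for i in range(m, len(text)):
        ctx, sym = text[i - m:i], text[i]
        context_counts[ctx] += 1
        next_counts[ctx][sym] += 1

    total = sum(context_counts.values())
    h = 0.0
    for ctx, n_ctx in context_counts.items():
        h_ctx = -sum((n / n_ctx) * math.log2(n / n_ctx) for n in next_counts[ctx].values())
        h += (n_ctx / total) * h_ctx
    return h

sample = "abracadabra " * 100
for m in (0, 1, 2):
    print(f"m = {m}: {order_m_entropy(sample, m):.3f} bits/symbol")
```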

About conditional sources
Generalized Markov process:
- Finite-state machine
- For an m-memory source over a q-symbol alphabet there are q^m states
- Transitions correspond to the symbols that follow the m-block
- Transition probabilities are state-dependent
Ergodic source:
- The system settles down to a limiting probability distribution.
- The equilibrium state probabilities can be inferred from the transition probabilities.

Solving the example entropy
Solution: the equilibrium probabilities are the eigenvector of the transition matrix: p0 = 0.385, p1 = 0.615.
Example application: compression of black-and-white images (black and white areas are highly clustered).
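
The transcript does not preserve the example's transition probabilities, so the sketch below (my own) uses a hypothetical two-state transition matrix chosen only so that its stationary distribution matches the stated p0 ≈ 0.385, p1 ≈ 0.615; it shows how the equilibrium and the corresponding entropy rate can be computed:

```python
import math

# Hypothetical transition probabilities for a two-state (0/1) source,
# chosen so that the stationary distribution comes out as ~0.385 / ~0.615.
P = [[0.800, 0.200],     # P[i][j] = probability of moving from state i to state j
     [0.125, 0.875]]

# Two-state equilibrium from the balance condition p0 * P[0][1] = p1 * P[1][0]
p0 = P[1][0] / (P[0][1] + P[1][0])
p1 = 1.0 - p0
print(f"stationary distribution: p0 = {p0:.3f}, p1 = {p1:.3f}")   # 0.385, 0.615

# Entropy rate = sum over states of p_i * H(outgoing transition distribution of state i)
def H(row):
    return -sum(p * math.log2(p) for p in row if p > 0)

rate = p0 * H(P[0]) + p1 * H(P[1])
print(f"entropy rate = {rate:.3f} bits per symbol")
```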

Empirical observations
Shannon's experimental value for the entropy of the English language: ≈ 1 bit per character.
Current text compressor efficiencies:
- gzip: ≈ 2.5–3 bits per character
- bzip2: ≈ 2.5 bits per character
- The best predictive methods: ≈ 2 bits per character
Improvements are still possible! However, digital images, audio and video are more important data types from the compression point of view.
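
Such bits-per-character figures can be measured with the Python standard library (my own sketch; the file name book.txt is a placeholder for any plain-text file):

```python
import bz2
import gzip

def bits_per_char(data: bytes, compress) -> float:
    """Compressed size in bits divided by the number of input characters."""
    return 8 * len(compress(data)) / len(data)

with open("book.txt", "rb") as f:          # placeholder: any plain ASCII text file
    text = f.read()

print("gzip :", bits_per_char(text, lambda d: gzip.compress(d, 9)))
print("bzip2:", bits_per_char(text, lambda d: bz2.compress(d, 9)))
```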

Other extensions of entropy
Joint entropy, e.g. for two random variables X, Y:
H(X, Y) = −Σx Σy p(x, y) · log2 p(x, y)
Relative entropy: the penalty of using a distribution qi instead of the true pi:
D(p || q) = Σi pi · log2(pi / qi)
Differential entropy for a continuous probability distribution with density f:
h(f) = −∫ f(x) · log2 f(x) dx
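
The two discrete quantities can be computed directly (my own sketch with made-up example distributions):

```python
import math

def joint_entropy(pxy: dict) -> float:
    """H(X, Y) for a joint distribution given as {(x, y): probability}."""
    return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

def relative_entropy(p, q) -> float:
    """D(p || q) = sum_i p_i * log2(p_i / q_i): extra bits per symbol paid for coding with q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(joint_entropy(pxy))                       # ~1.72 bits

p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]                    # code designed for the wrong (uniform) model
print(relative_entropy(p, q))                   # 0.25 bits per symbol
```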

Kolmogorov complexity
Measure of message information = the length of the shortest binary program that generates the message.
For a sequence of symbols drawn at random from the distribution of S, this is close to the entropy H(S).
It can be much smaller than the entropy for artificially generated data: pseudo-random numbers, fractals, ...
Problem: Kolmogorov complexity is not computable! (Cf. Gödel's incompleteness theorem and the halting problem of Turing machines.)
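
A small illustration of the pseudo-random case (my own, not from the slides): a few lines of seeded code generate a long string whose empirical entropy is about 1 bit per symbol, yet its Kolmogorov complexity is bounded by the tiny size of the generating program:

```python
import math
import random
from collections import Counter

random.seed(42)
bits = "".join(random.choice("01") for _ in range(1_000_000))

counts = Counter(bits)
h = -sum((c / len(bits)) * math.log2(c / len(bits)) for c in counts.values())
print(f"empirical entropy ~ {h:.4f} bits/symbol, ~{h * len(bits):,.0f} bits in total")
# Yet the program above (plus its seed) is only a few hundred bytes,
# so the string's Kolmogorov complexity is far below that figure.
```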