Rate Distortion Theory

Introduction
The description of an arbitrary real number requires an infinite number of bits, so a finite representation of a continuous random variable can never be perfect. To quantify the "goodness" of a representation of a source, we define a distortion measure, which is a measure of distance between the random variable and its representation. The basic problem in rate distortion theory can then be stated as follows: given a source distribution and a distortion measure, what is the minimum expected distortion achievable at a particular rate? Or, equivalently, what is the minimum rate description required to achieve a particular distortion?

Quantization
In this section we motivate the elegant theory of rate distortion by showing how complicated it is to solve the quantization problem exactly for a single random variable. Since a continuous random source requires infinite precision to represent exactly, we cannot reproduce it exactly using a finite-rate code. We first consider the problem of representing a single sample from the source. Let the random variable to be represented be X, and let the representation of X be denoted X̂(X). If we are given R bits to represent X, the function X̂ can take on 2^R values. The problem is to find the optimum set of values for X̂ (called the reproduction points or code points) and the regions associated with each value of X̂.
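Purely as an illustration (not from the slides), the sketch below represents a single sample with R bits by mapping it to the nearest of 2^R reproduction points under squared error; the particular points used are arbitrary placeholders.

```python
import numpy as np

def quantize(x, points):
    """Map x to the closest reproduction point (squared-error sense)."""
    points = np.asarray(points)
    return points[np.argmin((points - x) ** 2)]

R = 2                                        # bits available for the description
points = np.array([-1.5, -0.5, 0.5, 1.5])    # illustrative reproduction points
assert len(points) == 2 ** R                 # R bits index 2^R values
print(quantize(0.3, points))                 # -> 0.5
```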

Example: Gaussian Distribution
For example, let X ∼ N(0, σ²) and assume a squared-error distortion measure. In this case we wish to find the function X̂(X) such that X̂ takes on at most 2^R values and minimizes E(X − X̂(X))². If we are given one bit to represent X, it is clear that the bit should distinguish whether or not X > 0. To minimize squared error, each reproduced symbol should be the mean of its region.
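Since the conditional mean of each half-line of N(0, σ²) is ±σ√(2/π), a quick Monte Carlo sketch (σ and the sample size chosen arbitrarily) confirms the one-bit quantizer and its distortion:

```python
import numpy as np

sigma = 1.0
x = np.random.default_rng(0).normal(0.0, sigma, 1_000_000)

# Optimal 1-bit quantizer: reproduce each half-line by its conditional mean.
rep = np.where(x >= 0, sigma * np.sqrt(2 / np.pi), -sigma * np.sqrt(2 / np.pi))

print(np.mean(x[x >= 0]))        # ≈ sigma * sqrt(2/pi) ≈ 0.7979
print(np.mean((x - rep) ** 2))   # ≈ (1 - 2/pi) * sigma**2 ≈ 0.3634
```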

Example
Thus X̂(x) = σ√(2/π) for x ≥ 0 and X̂(x) = −σ√(2/π) for x < 0, these being the conditional means of the two half-lines. If we are given 2 bits to represent the sample, the situation is not as simple. Clearly, we want to divide the real line into four regions and use a point within each region to represent the sample.

Optimal Regions and Reconstruction Points
But it is no longer immediately obvious what the representation regions and the reconstruction points should be. We can, however, state two simple properties of optimal regions and reconstruction points for the quantization of a single random variable:
1. Given a set {X̂(w)} of reconstruction points, the distortion is minimized by mapping a source random variable X to the reconstruction point X̂(w) that is closest to it. The set of regions of X defined by this mapping is called a Voronoi or Dirichlet partition defined by the reconstruction points.
2. The reconstruction points should minimize the conditional expected distortion over their respective assignment regions.

The “Good” Quantizer
These two properties enable us to construct a simple algorithm to find a “good” quantizer:
1. Start with a set of reconstruction points and find the optimal set of reconstruction regions (the nearest-neighbor regions with respect to the distortion measure).
2. Find the optimal reconstruction points for these regions (the centroids of the regions if the distortion is squared error).
3. Repeat the iteration with this new set of reconstruction points.
The expected distortion decreases at each stage, so the algorithm converges to a local minimum of the distortion. This algorithm is called the Lloyd algorithm.
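A minimal sketch of the Lloyd iteration for a scalar source under squared error, assuming we train on samples drawn from the source rather than on its density:

```python
import numpy as np

def lloyd(samples, num_points, iters=50):
    """Alternate nearest-neighbor assignment and centroid update (squared error)."""
    rng = np.random.default_rng(0)
    points = rng.choice(samples, size=num_points, replace=False)  # initial codebook
    for _ in range(iters):
        # 1. Optimal regions: assign each sample to its nearest reproduction point.
        idx = np.argmin((samples[:, None] - points[None, :]) ** 2, axis=1)
        # 2. Optimal points: replace each point by the centroid of its region.
        for k in range(num_points):
            if np.any(idx == k):
                points[k] = samples[idx == k].mean()
    return np.sort(points)

samples = np.random.default_rng(1).normal(0.0, 1.0, 100_000)
print(lloyd(samples, 4))   # 2-bit quantizer for N(0, 1); roughly ±0.45 and ±1.51
```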

Peculiarities
Instead of quantizing a single random variable, let us assume that we are given a set of n i.i.d. random variables drawn according to a Gaussian distribution. These random variables are to be represented using nR bits. Since the source is i.i.d., the symbols are independent, and it may appear that the representation of each element is an independent problem to be treated separately. But this is not true: we will represent the entire sequence by a single index taking 2^{nR} values. This treatment of entire sequences at once achieves lower distortion for the same rate than independent quantization of the individual samples.

Encoder and Decoder
Assume that we have a source that produces a sequence X_1, X_2, ..., X_n i.i.d. ∼ p(x), x ∈ X. We assume that the alphabet is finite, but most of the proofs can be extended to continuous random variables. The encoder describes the source sequence X^n by an index f_n(X^n) ∈ {1, 2, ..., 2^{nR}}. The decoder represents X^n by an estimate X̂^n ∈ X̂^n, as illustrated in the figure.
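A sketch of what f_n and g_n might look like for a toy block code; the random codebook here is purely illustrative, not an optimized one:

```python
import numpy as np

n, R = 4, 1                        # block length and rate (bits per symbol)
rng = np.random.default_rng(0)
codebook = rng.normal(0.0, 1.0, size=(2 ** (n * R), n))  # 2^{nR} reproduction n-tuples

def f_n(x_block):
    """Encoder: map the source block to the index of the closest codeword."""
    return int(np.argmin(np.sum((codebook - x_block) ** 2, axis=1)))

def g_n(index):
    """Decoder: reproduce the block from its index."""
    return codebook[index]

x = rng.normal(0.0, 1.0, n)
print(f_n(x), g_n(f_n(x)))
```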

Distortion Function
Definition: A distortion function or distortion measure is a mapping d : X × X̂ → R⁺ from the set of source alphabet–reproduction alphabet pairs into the set of nonnegative real numbers. The distortion d(x, x̂) is a measure of the cost of representing the symbol x by the symbol x̂.

Bounded Distortion
Definition: A distortion measure is said to be bounded if the maximum value of the distortion is finite:
d_max = max_{x ∈ X, x̂ ∈ X̂} d(x, x̂) < ∞.
In most cases, the reproduction alphabet X̂ is the same as the source alphabet X.

Hamming and Squared-Error Distortion
Hamming (probability of error) distortion. The Hamming distortion is given by
d(x, x̂) = 0 if x = x̂, and d(x, x̂) = 1 if x ≠ x̂,
which results in a probability of error distortion, since E d(X, X̂) = Pr(X ≠ X̂).
Squared-error distortion. The squared-error distortion,
d(x, x̂) = (x − x̂)²,
is the most popular distortion measure used for continuous alphabets. Its advantages are its simplicity and its relationship to least-squares prediction.
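As a small illustration, the two distortion measures can be written as helper functions (the names are my own):

```python
def hamming_distortion(x, x_hat):
    """0 if the symbols agree, 1 otherwise; its expectation is Pr(X != X_hat)."""
    return 0 if x == x_hat else 1

def squared_error_distortion(x, x_hat):
    """Squared difference, the usual choice for continuous alphabets."""
    return (x - x_hat) ** 2
```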

Distortion between Sequences
Definition: The distortion between sequences x^n and x̂^n is defined by
d(x^n, x̂^n) = (1/n) Σ_{i=1}^{n} d(x_i, x̂_i).
So the distortion for a sequence is the average of the per-symbol distortions of the elements of the sequence.
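Following this definition, a sketch of the per-sequence distortion (the helper name is hypothetical):

```python
def sequence_distortion(x_seq, x_hat_seq, d):
    """Average per-symbol distortion d over a pair of equal-length sequences."""
    assert len(x_seq) == len(x_hat_seq)
    return sum(d(x, x_hat) for x, x_hat in zip(x_seq, x_hat_seq)) / len(x_seq)

# Hamming distortion between two length-4 binary sequences that differ in one place.
print(sequence_distortion([0, 1, 1, 0], [0, 1, 0, 0], lambda x, y: 0 if x == y else 1))  # 0.25
```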

Rate Distortion Code
Definition: A (2^{nR}, n)-rate distortion code consists of an encoding function f_n : X^n → {1, 2, ..., 2^{nR}} and a decoding (reproduction) function g_n : {1, 2, ..., 2^{nR}} → X̂^n. The distortion associated with the (2^{nR}, n) code is defined as
D = E d(X^n, g_n(f_n(X^n))),
where the expectation is with respect to the probability distribution on X:
D = Σ_{x^n} p(x^n) d(x^n, g_n(f_n(x^n))).
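For a small finite alphabet this expectation can be computed exactly by summing over all source blocks. A sketch with a hypothetical two-codeword code for a Bernoulli(1/2) source under Hamming distortion (so n = 3 and R = 1/3 bit per symbol):

```python
import itertools
from math import prod

n = 3
p = 0.5                                   # P(X = 1) for the Bernoulli source
codebook = [(0, 0, 0), (1, 1, 1)]         # g_n(1), g_n(2): a hypothetical codebook

def d(x, x_hat):                          # per-symbol Hamming distortion
    return 0 if x == x_hat else 1

def f_n(block):                           # encoder: index of the nearest codeword
    return min(range(len(codebook)),
               key=lambda w: sum(d(a, b) for a, b in zip(block, codebook[w])))

# Exact expected distortion: sum over all source blocks, weighted by p(x^n).
D = 0.0
for block in itertools.product([0, 1], repeat=n):
    prob = prod(p if x == 1 else 1 - p for x in block)
    x_hat = codebook[f_n(block)]
    D += prob * sum(d(a, b) for a, b in zip(block, x_hat)) / n
print(D)   # 0.25 for this particular code
```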

Some Terms
The set of n-tuples g_n(1), g_n(2), ..., g_n(2^{nR}), denoted by X̂^n(1), ..., X̂^n(2^{nR}), constitutes the codebook, and f_n^{-1}(1), ..., f_n^{-1}(2^{nR}) are the associated assignment regions. Many terms are used to describe the replacement of X^n by its quantized version X̂^n(w). It is common to refer to X̂^n as the vector quantization, reproduction, reconstruction, representation, source code, or estimate of X^n.

Distortion-Related Definitions
Definition: A rate distortion pair (R, D) is said to be achievable if there exists a sequence of (2^{nR}, n)-rate distortion codes (f_n, g_n) with
lim_{n→∞} E d(X^n, g_n(f_n(X^n))) ≤ D.
Definition: The rate distortion region for a source is the closure of the set of achievable rate distortion pairs (R, D).
Definition: The rate distortion function R(D) is the infimum of rates R such that (R, D) is in the rate distortion region of the source for a given distortion D.
Definition: The distortion rate function D(R) is the infimum of all distortions D such that (R, D) is in the rate distortion region of the source for a given rate R.
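Purely to illustrate how these definitions are read, the sketch below picks R(D) out of a finite, hypothetical list of achievable (R, D) pairs; a real rate distortion region is of course a continuum, and these numbers are made up for illustration only.

```python
# Hypothetical achievable (rate, distortion) pairs, for illustration only.
achievable = [(2.0, 0.01), (1.0, 0.10), (0.5, 0.25), (0.0, 1.00)]

def rate_distortion(D, pairs):
    """Smallest listed rate whose paired distortion does not exceed D."""
    feasible = [R for (R, dist) in pairs if dist <= D]
    return min(feasible) if feasible else float("inf")

print(rate_distortion(0.10, achievable))   # 1.0
print(rate_distortion(0.30, achievable))   # 0.5
```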