Channel Coding Theorem (the most famous theorem in information theory). Channel capacity. Problem: finding the maximum number of distinguishable signals for n uses of a communication channel. This number grows exponentially with n, and the exponent is known as the channel capacity.

Mathematical model. The mathematical analog of a physical signaling system is shown. Problem: two different input sequences may give rise to the same output sequence; the inputs are confusable. We show that we can choose a “nonconfusable” subset of input sequences so that with high probability there is only one highly likely input that could have caused the particular output.

Definitions. Definition: A discrete channel, denoted (𝒳, p(y|x), 𝒴), is a system consisting of an input alphabet 𝒳 and an output alphabet 𝒴 (finite sets) and a probability transition matrix p(y|x) that expresses the probability of observing the output symbol y given that we send the symbol x. The channel is said to be memoryless if the probability distribution of the output depends only on the input at that time and is conditionally independent of previous channel inputs or outputs. Definition: The “information” channel capacity of a discrete memoryless channel is C = max_{p(x)} I(X; Y), where the maximum is taken over all possible input distributions p(x).
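
The capacity definition can be made concrete with a small numerical sketch. The Python snippet below (the helper name mutual_information is my own, not from the slides) evaluates I(X; Y) in bits from an input distribution and a transition matrix; capacity is then the maximum of this value over input distributions. The later examples reuse this helper.

```python
import numpy as np

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits for input distribution p_x and transition matrix
    p_y_given_x[i, j] = p(y_j | x_i)."""
    p_x = np.asarray(p_x, dtype=float)
    P = np.asarray(p_y_given_x, dtype=float)
    p_xy = p_x[:, None] * P                  # joint distribution p(x, y)
    p_y = p_xy.sum(axis=0)                   # output marginal p(y)
    mask = p_xy > 0
    ratio = p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]
    return float(np.sum(p_xy[mask] * np.log2(ratio)))

# Noiseless binary channel: identity transition matrix, uniform input -> 1 bit.
print(mutual_information([0.5, 0.5], np.eye(2)))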

Examples Of Channel Capacity: Noiseless Binary Channel. Any transmitted bit is received without error. One error-free bit can be transmitted per use of the channel, so the capacity is 1 bit. Or, by the definition of C: C = max I(X; Y) = 1 bit, achieved by p(x) = (1/2, 1/2).

Examples Of Channel Capacity: Noisy Channel with Nonoverlapping Outputs. The input can be determined from the output, so the channel is effectively noiseless. C = max I(X; Y) = 1 bit, achieved by p(x) = (1/2, 1/2).

Examples Of Channel Capacity: Noisy Typewriter. The channel input is either received unchanged at the output with probability 1/2 or is transformed into the next letter with probability 1/2. C = max I(X; Y) = max (H(Y) − H(Y|X)) = max H(Y) − 1 = log 26 − 1 = log 13, achieved by using p(x) distributed uniformly over all the inputs.
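
The log 13 value can be checked numerically by feeding the noisy-typewriter transition matrix, with the uniform input that the slide says is optimal, to the mutual_information helper sketched above (which is assumed to be in scope):

```python
import numpy as np

# Noisy typewriter: each of the 26 letters is received unchanged or mapped to
# the (cyclically) next letter, each with probability 1/2.
P = np.zeros((26, 26))
for i in range(26):
    P[i, i] = 0.5
    P[i, (i + 1) % 26] = 0.5

uniform = np.full(26, 1 / 26)
print(mutual_information(uniform, P), np.log2(13))   # both ≈ 3.7004 bits
```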

Examples Of Channel Capacity: Binary Symmetric Channel. This is a model of a channel with errors: each transmitted bit is flipped with crossover probability p, so no received bit is fully reliable. I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(p) ≤ 1 − H(p), and equality is achieved when the input distribution is uniform. Hence, the information capacity of a binary symmetric channel with parameter p is C = 1 − H(p) bits.
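
A quick sanity check of C = 1 − H(p), again reusing the mutual_information helper from above; the grid search over input distributions is only illustrative, since the slide already tells us the uniform input is optimal.

```python
import numpy as np

def binary_entropy(p):
    """H(p) in bits, with 0 log 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

p = 0.1                                          # crossover probability
bsc = np.array([[1 - p, p],
                [p, 1 - p]])

grid = np.linspace(0.01, 0.99, 99)               # candidate values of P(X = 0)
best = max(mutual_information([q, 1 - q], bsc) for q in grid)
print(best, 1 - binary_entropy(p))               # both ≈ 0.5310 bits
```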

Examples Of Channel Capacity: Binary Erasure Channel. A fraction α of the bits are erased. The receiver knows which bits have been erased. [Han Vinck, Essen:] Let P(X = 0) = P_0. Then I(X; Y) = H(X) − H(X|Y), with H(X) = H(P_0) and H(X|Y) = αH(X) = αH(P_0), so I(X; Y) = (1 − α)H(P_0), maximized at P_0 = 1/2. Thus, C_erasure = 1 − α.
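
The same kind of numerical check (reusing the mutual_information helper from the earlier sketch) confirms both I(X; Y) = (1 − α)H(P_0) and its maximum 1 − α at P_0 = 1/2:

```python
import numpy as np

alpha = 0.3                                      # erasure probability
# Output alphabet {0, e, 1}; the middle column is the erasure symbol.
bec = np.array([[1 - alpha, alpha, 0.0],
                [0.0, alpha, 1 - alpha]])

grid = np.linspace(0.01, 0.99, 99)
values = [mutual_information([q, 1 - q], bec) for q in grid]
print(max(values), 1 - alpha)                    # both ≈ 0.7 bits, at P0 = 1/2
```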

Properties Of Channel Capacity
1. C ≥ 0, since I(X; Y) ≥ 0.
2. C ≤ log |𝒳|, since C = max I(X; Y) ≤ max H(X) = log |𝒳|.
3. C ≤ log |𝒴|, for the same reason.
4. I(X; Y) is a continuous function of p(x).
5. I(X; Y) is a concave function of p(x) (Theorem 2.7.4), so a local maximum is a global maximum.
From properties 2 and 3, the maximum is finite, and we are justified in using the term maximum.
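
Concavity (property 5) is what makes the maximization in the capacity definition numerically tractable: any hill-climbing scheme finds the global maximum. Below is a minimal sketch of the Blahut–Arimoto iteration for this maximization; the algorithm is standard, but the function name and numerical details here are my own, and it assumes every output symbol has positive probability under some input.

```python
import numpy as np

def blahut_arimoto(p_y_given_x, iters=200):
    """Numerically maximize I(X;Y) over input distributions for a DMC.
    p_y_given_x[i, j] = p(y_j | x_i). Returns (capacity in bits, optimal p(x))."""
    P = np.asarray(p_y_given_x, dtype=float)
    m = P.shape[0]
    r = np.full(m, 1.0 / m)                             # current guess for p(x)
    for _ in range(iters):
        q = r[:, None] * P                              # backward channel p(x|y)
        q /= q.sum(axis=0, keepdims=True)
        log_r = np.sum(P * np.log(q + 1e-300), axis=1)  # r(x) ∝ prod_y q(x|y)^p(y|x)
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
    q = r[:, None] * P
    q /= q.sum(axis=0, keepdims=True)
    capacity = float(np.sum(r[:, None] * P * np.log2((q + 1e-300) / r[:, None])))
    return capacity, r

# BSC with p = 0.1: expect C = 1 - H(0.1) ≈ 0.531 and a uniform optimal input.
print(blahut_arimoto(np.array([[0.9, 0.1], [0.1, 0.9]])))
```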

PREVIEW OF THE THEOREM (AEP again!) For large block lengths, every channel looks like the noisy typewriter channel: the channel has a subset of inputs that produce essentially disjoint sequences at the output. For each (typical) input n-sequence, there are approximately 2^{nH(Y|X)} possible Y sequences, all of them equally likely. We wish to ensure that no two X sequences produce the same Y output sequence; otherwise, we will not be able to decide which X sequence was sent. The total number of possible (typical) Y sequences is ≈ 2^{nH(Y)}. This set has to be divided into sets of size 2^{nH(Y|X)} corresponding to the different input X sequences. The total number of disjoint sets is therefore less than or equal to 2^{n(H(Y)−H(Y|X))} = 2^{nI(X;Y)}. Hence, we can send at most ≈ 2^{nI(X;Y)} distinguishable sequences of length n.
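
Worked example of this counting, using the noisy typewriter from the earlier slide: with uniform input, H(Y) = log 26 and H(Y|X) = 1, so the ≈ 2^{n log 26} = 26^n typical output sequences split into fans of 2^{nH(Y|X)} = 2^n outputs each, one fan per typical input sequence. The number of disjoint fans is at most 2^{n(log 26 − 1)} = 13^n, i.e., 13 distinguishable signals per channel use, matching the capacity log 13 computed earlier.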

Definitions. A message W is drawn from the index set {1, 2, ..., M}. Definition: The nth extension of the discrete memoryless channel (DMC) is the channel (𝒳^n, p(y^n|x^n), 𝒴^n), where p(y_k | x^k, y^{k−1}) = p(y_k | x_k), k = 1, 2, ..., n. Without feedback (i.e., when the inputs do not depend on past outputs, p(x_k | x^{k−1}, y^{k−1}) = p(x_k | x^{k−1})), the channel transition function of the nth extension reduces to p(y^n | x^n) = ∏_{i=1}^{n} p(y_i | x_i).

Definitions  Y Definition An (M, n) code for the channel ( , p(y|x), Y ) consists of the following: 1. An index set {1, 2,..., M}.  2. An encoding function X n : {1, 2,...,M} →  n, yielding codewords x n (1), x n (2),..., x n (M). The set of codewords is called the codebook. Y 3. A decoding function g : Y n → {1, 2,..., M}. Definition (Conditional probability of error) Let be the conditional probability of error given that index i was sent.

Definitions. Definition: The maximal probability of error λ^{(n)} for an (M, n) code is defined as λ^{(n)} = max_{i ∈ {1, 2, ..., M}} λ_i. Definition: The (arithmetic) average probability of error P_e^{(n)} for an (M, n) code is defined as P_e^{(n)} = (1/M) Σ_{i=1}^{M} λ_i.
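
For the repetition-code toy example above used over a BSC with crossover probability p, both error measures can be computed exactly by enumerating the 2^3 noise patterns (this sketch reuses the encode/decode names introduced earlier):

```python
from itertools import product

p = 0.1                                  # illustrative BSC crossover probability

def conditional_error(w):
    """lambda_w: probability that the decoder fails given that index w was sent."""
    x = encode(w)
    total = 0.0
    for flips in product((0, 1), repeat=3):            # all BSC noise patterns
        y = tuple(b ^ f for b, f in zip(x, flips))
        prob = 1.0
        for f in flips:
            prob *= p if f else (1 - p)
        if decode(y) != w:
            total += prob
    return total

errors = [conditional_error(w) for w in (1, 2)]
lambda_max = max(errors)                 # maximal probability of error
p_e_avg = sum(errors) / len(errors)      # arithmetic average probability of error
print(lambda_max, p_e_avg)               # both = 3 p^2 (1-p) + p^3 = 0.028
```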

Definitions. Definition: The rate R of an (M, n) code is R = (log M)/n bits per transmission. Definition: A rate R is said to be achievable if there exists a sequence of (⌈2^{nR}⌉, n) codes such that the maximal probability of error λ^{(n)} tends to 0 as n → ∞. We write (M = 2^{nR}, n) codes to mean (⌈2^{nR}⌉, n) codes; this simplifies the notation. Definition: The (operational) capacity of a channel is the supremum of all achievable rates. Thus, rates less than capacity yield arbitrarily small probability of error for sufficiently large block lengths.

Jointly Typical Sequences. Definition: The set A_ε^{(n)} of jointly typical sequences {(x^n, y^n)} with respect to the distribution p(x, y) is the set of n-sequences with empirical entropies ε-close to the true entropies: A_ε^{(n)} = { (x^n, y^n) ∈ 𝒳^n × 𝒴^n : |−(1/n) log p(x^n) − H(X)| < ε, |−(1/n) log p(y^n) − H(Y)| < ε, |−(1/n) log p(x^n, y^n) − H(X, Y)| < ε }, where p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i).
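
A direct transcription of this membership test into code (helper names are mine; it assumes the sequences only use symbols of positive probability):

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a (possibly multi-dimensional) pmf."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def jointly_typical(xs, ys, p_xy, eps):
    """Is (x^n, y^n) in A_eps^(n) for the joint pmf p_xy[x, y]?"""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    emp_x = -np.mean([np.log2(p_x[x]) for x in xs])                 # -(1/n) log p(x^n)
    emp_y = -np.mean([np.log2(p_y[y]) for y in ys])                 # -(1/n) log p(y^n)
    emp_xy = -np.mean([np.log2(p_xy[x, y]) for x, y in zip(xs, ys)])
    return (abs(emp_x - entropy(p_x)) < eps and
            abs(emp_y - entropy(p_y)) < eps and
            abs(emp_xy - entropy(p_xy)) < eps)

# Uniform input through a BSC(0.1): joint pmf and an error-free received pair.
p_xy = np.array([[0.45, 0.05], [0.05, 0.45]])
print(jointly_typical([0, 1, 0, 1], [0, 1, 0, 1], p_xy, eps=0.5))   # True
```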

Jointly Typical Sequences. Theorem (Joint AEP): Let (X^n, Y^n) be drawn i.i.d. according to p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i). Then: 1. Pr((X^n, Y^n) ∈ A_ε^{(n)}) → 1 as n → ∞. 2. |A_ε^{(n)}| ≤ 2^{n(H(X,Y)+ε)}. 3. If (X̃^n, Ỹ^n) ~ p(x^n)p(y^n), i.e., X̃^n and Ỹ^n are independent with the same marginals as p(x^n, y^n), then Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) ≤ 2^{−n(I(X;Y)−3ε)}.

There are about 2^{nH(X)} typical X sequences and about 2^{nH(Y)} typical Y sequences. However, since there are only 2^{nH(X,Y)} jointly typical sequences, not all pairs of typical X^n and typical Y^n are also jointly typical. The probability that any randomly chosen pair is jointly typical is about 2^{−nI(X;Y)}. Hence, we can consider about 2^{nI(X;Y)} such pairs before we are likely to come across a jointly typical pair. This suggests that there are about 2^{nI(X;Y)} distinguishable signals X^n. Proof of part 3: if X̃^n and Ỹ^n are independent, Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) = Σ_{(x^n, y^n) ∈ A_ε^{(n)}} p(x^n) p(y^n) ≤ 2^{n(H(X,Y)+ε)} · 2^{−n(H(X)−ε)} · 2^{−n(H(Y)−ε)} = 2^{−n(I(X;Y)−3ε)}.

Channel Coding Theorem. Theorem 7.7.1 (Channel coding theorem): For a discrete memoryless channel, all rates below capacity C are achievable. Specifically, for every rate R < C, there exists a sequence of (2^{nR}, n) codes with maximal probability of error λ^{(n)} → 0. Conversely, any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 must have R ≤ C. Proof (achievability): consider the following random coding argument.

Proof, Achievability
1. A random code C is generated according to p(x): the 2^{nR} codewords are drawn i.i.d. according to p(x^n) = ∏_{i=1}^{n} p(x_i).
2. The code C is then revealed to both sender and receiver. Both sender and receiver are also assumed to know the channel transition matrix p(y|x) for the channel.
3. A message W is chosen according to a uniform distribution: Pr(W = w) = 2^{−nR}, w = 1, 2, ..., 2^{nR}.
4. The wth codeword X^n(w) is sent over the channel.
5. The receiver receives a sequence Y^n according to the distribution P(y^n | x^n(w)) = ∏_{i=1}^{n} p(y_i | x_i(w)).
6. The receiver guesses which message was sent; the decoding rule is given on the next slide. A simulation sketch of this experiment follows below.
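
The steps above can be simulated directly. The sketch below uses toy parameter values chosen so the 2^{nR}-codeword codebook stays small enough to enumerate, and it reuses the jointly_typical helper from the earlier slide: it draws a random codebook, sends W = 1 over a BSC, and decodes by searching for a unique jointly typical codeword.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_coding_trial(n, R, p, eps):
    """One trial: random Bernoulli(1/2) codebook, send W = 1 over a BSC(p),
    decode by looking for a unique jointly typical codeword."""
    M = int(round(2 ** (n * R)))
    codebook = rng.integers(0, 2, size=(M, n))           # i.i.d. random codewords
    y = codebook[0] ^ (rng.random(n) < p).astype(int)    # channel output for W = 1
    p_xy = np.array([[(1 - p) / 2, p / 2],
                     [p / 2, (1 - p) / 2]])              # uniform input through BSC(p)
    hits = [i for i in range(M) if jointly_typical(codebook[i], y, p_xy, eps)]
    return hits == [0]                                   # success iff unique and correct

n, R, p, eps = 200, 0.04, 0.1, 0.2                       # R far below C ≈ 0.53
trials = 20
print(sum(random_coding_trial(n, R, p, eps) for _ in range(trials)) / trials)
```

With these toy numbers the empirical success rate is close to 1; the theorem itself is an asymptotic statement, so the simulation only illustrates the mechanics of steps 1–6, not the precise bound R < I(X; Y) − 3ε.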

Proof, Achievability. The receiver declares that the index Ŵ was sent if the following conditions are satisfied: (X^n(Ŵ), Y^n) is jointly typical, and there is no other index W′ ≠ Ŵ such that (X^n(W′), Y^n) ∈ A_ε^{(n)}. If no such Ŵ exists, or if there is more than one, an error is declared. Let E be the event {Ŵ ≠ W}. By the symmetry of the code construction, the average probability of error averaged over all codes does not depend on the particular index that was sent. Thus, we can assume without loss of generality that the message W = 1 was sent.

Proof, Achievability. Define the events E_i = {(X^n(i), Y^n) ∈ A_ε^{(n)}}, i ∈ {1, 2, ..., 2^{nR}}, where E_i is the event that the ith codeword and Y^n are jointly typical. Recall that Y^n is the result of sending the first codeword X^n(1) over the channel. Then an error occurs in the decoding scheme if either E_1^c occurs (the transmitted codeword and the output are not jointly typical) or E_2 ∪ E_3 ∪ ··· ∪ E_{2^{nR}} occurs (a wrong codeword is jointly typical with the output). Hence, letting P(E) denote Pr(E | W = 1) (these are equal by symmetry), the union bound gives P(E) ≤ P(E_1^c) + Σ_{i=2}^{2^{nR}} P(E_i).

Proof, Achievability. Now, by the joint AEP, P(E_1^c) ≤ ε for sufficiently large n, and the probability that X^n(i) and Y^n are jointly typical is at most 2^{−n(I(X;Y)−3ε)} for i ≠ 1 (since X^n(i) and Y^n are independent). Thus, P(E) ≤ ε + Σ_{i=2}^{2^{nR}} 2^{−n(I(X;Y)−3ε)} = ε + (2^{nR} − 1) 2^{−n(I(X;Y)−3ε)} ≤ ε + 2^{−n(I(X;Y)−R−3ε)}.

Proof, Achievability. The last term is at most ε if n is sufficiently large and R < I(X; Y) − 3ε. Hence, if R < I(X; Y), we can choose ε and n so that the average probability of error, averaged over codebooks and codewords, is less than 2ε. To finish the proof, a series of further arguments and code selections strengthens this from a bound on the average probability of error to a bound on the maximal probability of error.

Proof, The Converse. Lemma: Let Y^n be the result of passing X^n through a discrete memoryless channel of capacity C. Then I(X^n; Y^n) ≤ nC for all p(x^n). Proof: I(X^n; Y^n) = H(Y^n) − H(Y^n | X^n) = H(Y^n) − Σ_{i=1}^{n} H(Y_i | Y_1, ..., Y_{i−1}, X^n) = H(Y^n) − Σ_{i=1}^{n} H(Y_i | X_i) ≤ Σ_{i=1}^{n} H(Y_i) − Σ_{i=1}^{n} H(Y_i | X_i) = Σ_{i=1}^{n} I(X_i; Y_i) ≤ nC, where the second equality uses memorylessness.

Zero-error Codes. To build intuition for the converse, first consider zero-error codes and assume that W is uniformly distributed over {1, 2, ..., 2^{nR}}; thus, H(W) = nR. We can now write nR = H(W) = H(W | Y^n) + I(W; Y^n) = I(W; Y^n) ≤ I(X^n; Y^n) ≤ nC, where H(W | Y^n) = 0 because the code is zero-error (W is determined exactly by Y^n), the next inequality is the data-processing inequality, and the last follows from the lemma. Hence, for any zero-error (2^{nR}, n) code, for all n, R ≤ C.

Proof, The Converse. We have to show that any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 must have R ≤ C. If the maximal probability of error tends to zero, the average probability of error P_e^{(n)} also goes to zero. For W uniform, Fano's inequality gives H(W | Y^n) ≤ 1 + P_e^{(n)} nR, so nR = H(W) = H(W | Y^n) + I(W; Y^n) ≤ 1 + P_e^{(n)} nR + I(X^n; Y^n) ≤ 1 + P_e^{(n)} nR + nC. Dividing by n, we obtain R ≤ 1/n + P_e^{(n)} R + C.

Proof, The Converse. Now letting n → ∞, the first two terms on the right vanish (since P_e^{(n)} → 0), and we obtain R ≤ C. We can rewrite the bound as P_e^{(n)} ≥ 1 − C/R − 1/(nR): if R > C, the probability of error is bounded away from 0 for sufficiently large n.