COT 5611 Operating Systems Design Principles Spring 2012
Dan C. Marinescu Office: HEC 304 Office hours: M-Wd 5:00-6:00 PM
Lecture 17 – Wednesday March 14, 2012
Reading assignment: Chapter 8 from the on-line text; Claude Shannon's paper.
Last time - Information Theory:
Information theory - a statistical theory of communication
Random variables, probability density functions (PDF), cumulative distribution functions (CDF)
Thermodynamic entropy
Shannon entropy
Joint and conditional entropy
Mutual information
Shannon's source coding theorem
Channel capacity
Today
Information Theory: properties of Shannon's entropy, joint and conditional entropy, mutual information, Shannon's source coding theorem, channel capacity
Applications of information theory: error detection and error correction
Applications of information theory
Error detection and error correction: increase redundancy to protect the message.
Data compression: remove redundancy.
Encryption: transform information to protect it.
Properties of the binary Shannon entropy
H(X) > 0 for 0 < p < 1; H(X) is symmetric about p = 0.5; lim_{p→0} H(X) = lim_{p→1} H(X) = 0; H(X) is increasing for 0 < p < 0.5, decreasing for 0.5 < p < 1, and has a maximum at p = 0.5. The binary entropy is a concave function of p, the probability of an outcome.
Note: a function f(x) is convex over an interval (a,b) if f[k x1 + (1-k) x2] ≤ k f(x1) + (1-k) f(x2) for all x1, x2 in (a,b) and 0 ≤ k ≤ 1. A function is concave over an interval (a,b) if -f(x) is convex over (a,b).
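Here H(X) is the binary entropy H(p) = -p log2(p) - (1-p) log2(1-p). A minimal Python sketch (the helper name is mine) that evaluates it and checks the symmetry and the maximum at p = 0.5:

```python
import math

def binary_entropy(p):
    """Binary Shannon entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
    if p in (0.0, 1.0):          # the limits as p -> 0 and p -> 1 are 0
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"H({p}) = {binary_entropy(p):.4f}")

# Symmetry about p = 0.5 and maximum of 1 bit at p = 0.5
assert abs(binary_entropy(0.2) - binary_entropy(0.8)) < 1e-12
assert abs(binary_entropy(0.5) - 1.0) < 1e-12
```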
Shannon entropy is a concave function of p
Joint and conditional entropy; mutual information
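For reference, the standard definitions of these quantities for discrete random variables X and Y with joint distribution p(x,y) and marginals p(x), p(y):

```latex
H(X,Y)      = -\sum_{x,y} p(x,y)\,\log_2 p(x,y)                                % joint entropy
H(X \mid Y) = -\sum_{x,y} p(x,y)\,\log_2 p(x \mid y) = H(X,Y) - H(Y)           % conditional entropy
I(X;Y)      = \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}
            = H(X) + H(Y) - H(X,Y)                                             % mutual information
```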
Properties of joint and conditional entropy
H(X,Y) = H(Y,X) (symmetry of joint entropy)
H(X,Y) ≥ 0 (nonnegativity of joint entropy)
H(X | Y) ≥ 0; H(Y | X) ≥ 0 (nonnegativity of conditional entropy)
H(X | Y) = H(X,Y) - H(Y) (relation between conditional and joint entropy)
H(X,Y) ≥ H(Y) (joint entropy vs. entropy of a single random variable)
H(X,Y) ≤ H(X) + H(Y) (subadditivity)
H(X,Y,Z) + H(Y) ≤ H(X,Y) + H(Y,Z) (strong subadditivity)
H(X | Y) ≤ H(X) (conditioning reduces uncertainty)
H(X,Y,Z) = H(X) + H(Y | X) + H(Z | X,Y) (chain rule for joint entropy)
H(X,Y | Z) = H(Y | X,Z) + H(X | Z) (chain rule for conditional entropy)
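A minimal numerical check of a few of these properties; the small joint probability table and the helper names are my own:

```python
import math

# A small joint distribution p(x, y) over X in {0,1}, Y in {0,1,2}
p_xy = {(0, 0): 0.20, (0, 1): 0.15, (0, 2): 0.15,
        (1, 0): 0.10, (1, 1): 0.25, (1, 2): 0.15}

def H(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

p_x = {x: sum(p for (a, _), p in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in p_xy.items() if b == y) for y in (0, 1, 2)}

H_xy, H_x, H_y = H(p_xy), H(p_x), H(p_y)
H_x_given_y = H_xy - H_y                   # H(X|Y) = H(X,Y) - H(Y)

assert H_xy <= H_x + H_y + 1e-12           # subadditivity
assert H_x_given_y <= H_x + 1e-12          # conditioning reduces uncertainty
assert H_xy >= max(H_x, H_y) - 1e-12       # joint entropy >= entropy of a single rv
print(f"H(X,Y)={H_xy:.4f}  H(X)={H_x:.4f}  H(Y)={H_y:.4f}  H(X|Y)={H_x_given_y:.4f}")
```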
Properties of mutual information
I(X;Y) = I(Y;X) (symmetry of mutual information)
I(X;Y) = H(X) - H(X | Y) (mutual information, entropy, and conditional entropy)
I(X;Y) = H(Y) - H(Y | X) (mutual information, entropy, and conditional entropy)
I(X;X) = H(X) (mutual self-information and entropy)
I(X;X) ≥ 0 (nonnegativity of mutual self-information)
I(X;Y) = H(X) + H(Y) - H(X,Y) (mutual information, entropy, and joint entropy)
I(X;Y | Z) = H(X | Z) - H(X | Y,Z) (conditional mutual information and conditional entropy)
I(X,Y;Z) = I(X;Z | Y) + I(Y;Z) (chain rule for mutual information)
I(X;Z) ≤ I(X;Y) if X → Y → Z form a Markov chain (data processing inequality)
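A minimal sketch illustrating the data processing inequality; the setup is my own: X uniform on {0,1}, Y the output of a binary symmetric channel with error probability eps1 applied to X, and Z the output of a second such channel applied to Y, so X → Y → Z forms a Markov chain.

```python
import math

def H(dist):
    """Shannon entropy (bits) of a distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution {(x, y): p}."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return H(p_x) + H(p_y) - H(p_xy)

def bsc_chain(eps1, eps2):
    """Joint distributions p(x,y) and p(x,z) for X -> BSC(eps1) -> Y -> BSC(eps2) -> Z."""
    p_xy, p_xz = {}, {}
    for x in (0, 1):
        for y in (0, 1):
            pxy = 0.5 * (eps1 if y != x else 1 - eps1)   # X uniform; p(x,y) = p(x) p(y|x)
            p_xy[(x, y)] = pxy
            for z in (0, 1):
                pz_given_y = eps2 if z != y else 1 - eps2
                p_xz[(x, z)] = p_xz.get((x, z), 0) + pxy * pz_given_y
    return p_xy, p_xz

p_xy, p_xz = bsc_chain(0.1, 0.2)
assert mutual_information(p_xz) <= mutual_information(p_xy) + 1e-12   # data processing inequality
print(mutual_information(p_xy), mutual_information(p_xz))
```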
Shannon’s source coding theorem
Informally, Shannon's source coding theorem states that a message containing n independent, identically distributed samples of a random variable X with entropy H(X) can be compressed to a length l_X(n) = nH(X) + o(n).
The justification of this theorem is based on the weak law of large numbers: the mean of a large number of independent, identically distributed random variables x_i approaches the expected value with high probability when n is large. For arbitrary ε > 0 and δ > 0 there is an N such that for n > N, Prob( |(1/n) Σ_{i=1}^{n} x_i - E[X]| ≥ ε ) ≤ δ.
Shannon’s source coding theorem
When the source has an alphabet A with m symbols and messages consist of n independently selected symbols from this alphabet, only some of these sequences are typical; with high probability the source emits a typical sequence. There are about 2^{nH(A)} typical strings, therefore we need log2( 2^{nH(A)} ) = nH(A) bits to encode all typical strings; this is the upper bound for the data compression provided by Shannon's source encoding theorem.
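The count of typical strings can be made concrete for a binary source: the number of n-bit strings with k ones is C(n,k), and log2 C(n,k) is close to nH(k/n) for large n. A small sketch (helper names are mine):

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 1000
for k in (100, 250, 500):
    exact = math.log2(math.comb(n, k))       # bits needed to index the strings with k ones
    approx = n * binary_entropy(k / n)       # ~2^{nH(p)} typical strings with p = k/n
    print(f"n={n} k={k}: log2 C(n,k) = {exact:8.1f}   nH(k/n) = {approx:8.1f}")
```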
Binary erasure channel
Channel capacity
Discrete memoryless channel: C = max_{p(x)} I(X;Y),
the maximum of the mutual information between the input X and the output Y over all input distributions p(x).
The capacity of a noisy channel. For the noisy binary symmetric channel with p the probability of a bit error and q = Prob(X=0):
I(X;Y) = H(Y) - H(Y | X)
H(Y | X) = -{ q [p log p + (1-p) log(1-p)] + (1-q) [p log p + (1-p) log(1-p)] } = -[p log p + (1-p) log(1-p)] = H(p)
We maximize I(X;Y) by making H(Y) = 1, so C = 1 - H(p) = 1 + [p log p + (1-p) log(1-p)].
For p = 1/2, C = 0 because the output is independent of the input; for p = 0 or p = 1, C = 1 and we have a noiseless channel.
The capacity of the binary erasure channel with pe the probability of erasure is Ce = 1 - pe.
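A minimal Python sketch (function names are mine) of the two capacity formulas, with base-2 logarithms so the capacity is in bits per channel use:

```python
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of the binary symmetric channel with error probability p: C = 1 - H(p)."""
    return 1.0 - binary_entropy(p)

def bec_capacity(pe):
    """Capacity of the binary erasure channel with erasure probability pe: Ce = 1 - pe."""
    return 1.0 - pe

for p in (0.0, 0.1, 0.5, 1.0):
    print(f"p={p}: BSC capacity = {bsc_capacity(p):.3f}   BEC capacity = {bec_capacity(p):.3f}")
# p = 0.5 gives BSC capacity 0 (output independent of input); p = 0 or 1 gives capacity 1.
```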
Error detection and error correction (ECC)
Error detection and error correction are based on schemes that increase the redundancy of a message. A crude analogy is to bubble-wrap a fragile item and place it into a box to reduce the chance that the item will be damaged during transport. Redundant information plays the role of the packing materials; it increases the amount of data transmitted, but it also increases the chance that we will be able to restore the original contents of a message distorted during communication. Coding corresponds to the selection of both the packing materials and the strategy to optimally pack the fragile item, subject to the obvious constraints: use the least amount of packing materials and the least amount of effort to pack and unpack.
Error detection: compare what you received with the code words from the common dictionary; if there is no match, one or more errors have occurred.
Error correction: map the received message to a valid code word.
Examples and limitations of ECC
A trivial example of an error detection scheme is the addition of a parity check bit to a word of a given length. This simple scheme is quite effective: it allows us to detect an odd number of errors, but it fails if an even number of errors occur. For example, consider a system that enforces even parity for an eight-bit word. Given an eight-bit string, we add one more bit chosen so that the total number of 1s is even, and we transmit the resulting nine-bit string. The error detection procedure is to count the number of 1s; we decide that the string is in error if this number is odd.
This example also hints at the limitations of error detection mechanisms. A code is designed with certain error detection or error correction capabilities and fails to detect, or to correct, error patterns not covered by the original design of the code. In the previous example, if two errors occur during transmission, say in the 4th and the 7th bits, the received nine-bit string still has even parity (an even number of 1s) and our scheme for error detection fails.
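A small sketch of even parity over an eight-bit word; the bit pattern below is illustrative, not the one on the original slide:

```python
def add_even_parity(bits):
    """Append a parity bit so the total number of 1s in the nine-bit word is even."""
    return bits + [sum(bits) % 2]

def parity_check_ok(word):
    """Accept the word if it contains an even number of 1s."""
    return sum(word) % 2 == 0

data = [1, 0, 1, 1, 0, 0, 1, 0]          # an arbitrary eight-bit word
word = add_even_parity(data)

single_error = word.copy(); single_error[3] ^= 1
double_error = word.copy(); double_error[3] ^= 1; double_error[6] ^= 1

print(parity_check_ok(word))          # True  - no errors
print(parity_check_ok(single_error))  # False - one error is detected
print(parity_check_ok(double_error))  # True  - two errors slip through undetected
```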
Code
n-tuple: a sequence of n symbols from an alphabet A.
Example: A = {0,1,2} and n = 6: any sequence of six symbols from {0,1,2} is a 6-tuple. For the binary alphabet A = {0,1} and n = 3 the 3-tuples are 000, 001, 010, 100, 110, 101, 011, 111.
Code: a set of n-tuples. Example: a binary code C selects 2^k code words from the 2^n possible binary n-tuples. The sender and the receiver share the knowledge of all the code words in C.
Hamming distance: the number of positions in which two code words differ.
Distance d of a code C: the minimum distance between any pair of code words of C.
Hamming sphere of radius d around a code word w: the set of all n-tuples at distance at most d from w.
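A small Python sketch of these definitions (helper names are mine), using the binary alphabet and n = 3:

```python
from itertools import combinations, product

def hamming_distance(u, v):
    """Number of positions in which two n-tuples differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

def code_distance(code):
    """Distance d of a code: minimum Hamming distance over all pairs of code words."""
    return min(hamming_distance(u, v) for u, v in combinations(code, 2))

def hamming_sphere(word, radius, alphabet="01"):
    """All n-tuples over the alphabet at distance at most `radius` from `word`."""
    return {"".join(t) for t in product(alphabet, repeat=len(word))
            if hamming_distance("".join(t), word) <= radius}

C = ["000", "111"]
print(code_distance(C))                  # 3
print(sorted(hamming_sphere("000", 1)))  # ['000', '001', '010', '100']
```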
Block codes
A block code C = [n,M] consists of code words of length n and allows the encoding of M messages.
Example: consider binary [n,M] codes with n = 6 and M = 4. The code C = {c0, c1, c2, c3} with c0 = 000000 and c3 = 111011; out of the 2^6 possible binary 6-tuples we have selected 4 as code words.
Hamming distance of two code words: the number of bit positions in which they differ, e.g., d(c1,c3) = 3.
The Hamming distance of the code C is the minimum distance between any pair of code words: d(C) = 3. Indeed, d(c0,c1) = 4, d(c0,c2) = 3, d(c0,c3) = 5, d(c1,c2) = 5, d(c1,c3) = 3, d(c2,c3) = 3.
To compute the Hamming distance of an [n,M] code it is necessary to compute the distance between all C(M,2) = M(M-1)/2 pairs of code words and then find the pair with the minimum distance.
Encoding
Encoding maps k information symbols into n = k + r symbols by adding r redundancy symbols.
Example, the repetition code: encode 0 as 000 and 1 as 111. The two code words are 000 and 111; the other six 3-tuples are 100, 010, 001, 011, 101, 110. Decode any received 3-tuple with one error as follows:
100, 010, 001 are decoded as 0
011, 101, 110 are decoded as 1
The 3-tuples decoded as 0 form the Hamming sphere of radius 1 around 000, and those decoded as 1 form the Hamming sphere of radius 1 around 111.
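A minimal sketch of this repetition code and its nearest-neighbor decoder, which amounts to majority voting (function names are mine):

```python
def encode_repetition(bit):
    """Encode 0 -> 000 and 1 -> 111."""
    return [bit] * 3

def decode_repetition(received):
    """Nearest-neighbor decoding: map the 3-tuple to the closer of 000 and 111 (majority vote)."""
    return 1 if sum(received) >= 2 else 0

# Any single-bit error lands inside the Hamming sphere of radius 1 around the code word
for sent in (0, 1):
    codeword = encode_repetition(sent)
    for i in range(3):
        corrupted = codeword.copy(); corrupted[i] ^= 1
        assert decode_repetition(corrupted) == sent
print("all single-bit errors corrected")
```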
Errors
Decoding in the presence of errors
Send c; receive v = c + e, where e is the error pattern.
Minimum distance or nearest neighbor decoding: if an n-tuple v is received and there is a unique code word c such that d(v,c) is the minimum over all code words of C, then correct v as the code word c. If no such c exists, report that errors have been detected but no correction is possible. If multiple code words are at the same minimum distance from the received n-tuple, select one of them at random and decode v as that code word.
Maximum likelihood decoding: under this decoding policy, of all possible code words c, the n-tuple v is decoded to the code word c that maximizes the probability P(v,c) that v is received, given that c is sent.
Example of maximum likelihood decoding
Consider the same code C = {c0 = 000000, c1, c2, c3 = 111011} and a probability of a bit error p = 0.15. When we receive an n-tuple v we decode it as the code word c that maximizes p(v,c):
p(v, 000000) = (0.15)^6 ≈ 1.14 x 10^-5
p(v, 101100) = (0.15)^3 x (0.85)^3 ≈ 2.07 x 10^-3
p(v, 010110) = (0.15)^3 x (0.85)^3 ≈ 2.07 x 10^-3
p(v, 111011) = (0.15)^1 x (0.85)^5 ≈ 6.66 x 10^-2
so v is decoded as 111011.
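A sketch that reproduces this computation. The code words not written out above and the received word are assumptions inferred from the probability expressions (c1 = 101100, c2 = 010110, v = 111111); here p(v,c) = p^d (1-p)^(n-d), where d is the Hamming distance between v and c:

```python
C = {"c0": "000000", "c1": "101100", "c2": "010110", "c3": "111011"}
v = "111111"          # received word; assumed, consistent with the exponents on the slide
p = 0.15              # probability of a bit error
n = len(v)

def likelihood(v, c, p):
    """P(v received | c sent) = p^d * (1-p)^(n-d), with d = Hamming distance d(v, c)."""
    d = sum(a != b for a, b in zip(v, c))
    return p**d * (1 - p)**(n - d)

probs = {name: likelihood(v, c, p) for name, c in C.items()}
for name, prob in probs.items():
    print(f"p(v, {C[name]}) = {prob:.2e}")
print("decode v as", max(probs, key=probs.get))   # c3 = 111011 maximizes the likelihood
```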
Error detecting and error correcting codes
The error detection and error correction capabilities of a code are determined by the distance d of the code (the minimum Hamming distance between any pair of code words):
To detect e errors: d ≥ e + 1
To correct e errors: d ≥ 2e + 1
For example, the repetition code {000, 111} has d = 3, so it can detect up to two errors and correct a single error.