Lecture 2: Shannon’s Theory. Lecturer: Meysam Alishahi. Design by: Z. Faraji and H. Hajiabolhassan
Plan Introduction Perfect secrecy Shannon’s theorem Entropy Spurious Keys and Unicity Distance Product Cryptosystems
Claude Shannon (Shannon 1949). In this chapter, we discuss several of Shannon’s ideas about “secrecy systems”.
What is the meaning of security? Think of a padlock: anyone can lock it, but the key is needed to unlock it.
Security. We define some of the most useful criteria: computational security, provable security, unconditional security.
Elementary Probability Theory. Basic properties: Pr: 2^Ω → [0,1]; Pr(Ω) = 1; for pairwise disjoint events, Pr(∪_i A_i) = ∑_i Pr(A_i). [The axiomatic definition of probability takes the above three conditions as axioms.] Immediate consequences: Pr(∅) = 0; Pr(Ā) = 1 − Pr(A); A ⊆ B implies Pr(A) ≤ Pr(B); ∑_{ω∈Ω} Pr(ω) = 1.
Random Variables
Elementary Probability Theory. Let X and Y be discrete random variables and let pr denote a probability distribution function. Then pr(x, y) = pr(X=x | Y=y) pr(Y=y). Set pr(x) := pr(X=x). X and Y are said to be independent if pr(x, y) = pr(x) pr(y) for all possible values x of X and y of Y.
Chain rule and Bayes’ Theorem. Chain rule: Pr(A1,…, An) = Pr(A1) Pr(A2|A1) Pr(A3|A1, A2) ⋯ Pr(An|A1,…, An−1). Bayes’ theorem: if Pr(Y=y) > 0, then Pr(X=x | Y=y) = Pr(X=x) Pr(Y=y | X=x) / Pr(Y=y).
Cryptography: Elementary Definitions. For a key k, let C(k) = {e_k(x) | x ∈ P} be the set of possible ciphertexts under k. Then
pr(Y=y) = ∑_{k : y ∈ C(k)} pr(K=k) pr(X=d_k(y)),
pr(Y=y | X=x) = ∑_{k : y = e_k(x)} pr(K=k),
pr(X=x | Y=y) = pr(X=x) · ∑_{k : y = e_k(x)} pr(K=k) / ∑_{k : y ∈ C(k)} pr(K=k) pr(X=d_k(y)).
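To make these formulas concrete, here is a minimal Python sketch (my own illustration, not from the lecture) that evaluates them by brute force for a toy shift-style cryptosystem over Z_3; the plaintext and key distributions are made-up example values.

```python
# A minimal sketch (illustrative only): brute-force evaluation of the formulas
# above for a toy shift-style cryptosystem over Z_3. The plaintext and key
# distributions below are made-up example values.
from fractions import Fraction as F

P = K = C = [0, 1, 2]
pr_X = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}   # assumed plaintext distribution
pr_K = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)}   # assumed (non-uniform) key distribution

def e(k, x): return (x + k) % 3               # encryption rule e_k(x)
def d(k, y): return (y - k) % 3               # decryption rule d_k(y)

# pr(Y = y) = sum over keys k with y in C(k) of pr(K = k) * pr(X = d_k(y))
pr_Y = {y: sum(pr_K[k] * pr_X[d(k, y)] for k in K) for y in C}

# pr(Y = y | X = x) = sum over keys k with e_k(x) = y of pr(K = k)
def pr_Y_given_X(y, x):
    return sum(pr_K[k] for k in K if e(k, x) == y)

# pr(X = x | Y = y): the last (Bayes-type) formula above
def pr_X_given_Y(x, y):
    return pr_X[x] * pr_Y_given_X(y, x) / pr_Y[y]

print(pr_Y)                                   # a valid distribution on C
print([pr_X_given_Y(x, 0) for x in P])        # differs from pr_X: no perfect secrecy here
```

With the non-uniform key distribution the posterior pr(X=x | Y=y) differs from the prior pr(X=x), which is exactly what perfect secrecy (next slides) rules out.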
Plan Introduction Perfect secrecy Shannon’s theorem Entropy Spurious Keys and Unicity Distance Product Cryptosystems
Perfect Secrecy. A cryptosystem has perfect secrecy if pr(x|y) = pr(x) for all x ∈ P and y ∈ C.
Shift Cipher. Suppose the 26 keys in the Shift Cipher are used with equal probability 1/26. Then, for any plaintext probability distribution, the Shift Cipher has perfect secrecy.
Shift Cipher, proof: We know that
pr(Y=y) = ∑_{k∈Z26} pr(K=k) pr(X=d_k(y)) = (1/26) ∑_{k∈Z26} pr(X = y−k).
On the other hand, ∑_{k∈Z26} pr(X = y−k) = ∑_{x∈Z26} pr(X=x) = 1, so pr(y) = 1/26.
Shift Cipher, proof (continued): We have pr(y|x) = pr(K = (y−x) mod 26) = 1/26. Finally, by Bayes’ theorem, pr(x|y) = pr(x) pr(y|x) / pr(y) = pr(x) · (1/26) / (1/26) = pr(x).
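A small numerical check of this fact (illustrative, not part of the lecture): with the 26 keys used uniformly, the posterior pr(x|y) equals the prior pr(x) for an arbitrary plaintext distribution, here a randomly generated one.

```python
# Illustrative check: the Shift Cipher with uniformly chosen keys has perfect
# secrecy for an arbitrary plaintext distribution (exact rational arithmetic).
from fractions import Fraction as F
import random

random.seed(0)
weights = [random.randint(1, 10) for _ in range(26)]
pr_X = [F(w, sum(weights)) for w in weights]   # an arbitrary plaintext distribution
pr_K = [F(1, 26)] * 26                         # keys chosen uniformly

# pr(Y = y) = (1/26) * sum_x pr(X = x) = 1/26 for every y
pr_Y = [sum(pr_K[k] * pr_X[(y - k) % 26] for k in range(26)) for y in range(26)]
assert all(p == F(1, 26) for p in pr_Y)

# pr(X = x | Y = y) = pr(x) * pr(K = (y - x) mod 26) / pr(Y = y) = pr(x)
for y in range(26):
    for x in range(26):
        post = pr_X[x] * pr_K[(y - x) % 26] / pr_Y[y]
        assert post == pr_X[x]
print("shift cipher with uniform keys: posterior equals prior (perfect secrecy)")
```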
Perfect Secrecy. If pr(x_0) = 0 for some x_0 ∈ P, then pr(x_0|y) = pr(x_0) = 0 for all y ∈ C. So we need only consider those x ∈ P with pr(x) > 0. By Bayes’ theorem, pr(x|y) = pr(x) for all y is equivalent to pr(y|x) = pr(y) for all y. A reasonable assumption: pr(y) > 0 for all y ∈ C.
Perfect Secrecy. When we have a cryptosystem with perfect secrecy: |K| ≥ |C| and |C| ≥ |P|.
Shannon’s Theorem. Suppose (P,C,K,E,D) is a cryptosystem where |K| = |C| = |P|. The cryptosystem provides perfect secrecy if and only if every key is used with equal probability 1/|K|, and for every x ∈ P and y ∈ C there is a unique key k such that e_k(x) = y.
Proof. Fix x ∈ P. By perfect secrecy every y ∈ C must be reachable from x, so |C| = |{e_k(x) | k ∈ K}| = |K|; hence for every x ∈ P and y ∈ C there is a unique key k with e_k(x) = y. Assume that P = {x_1,…,x_n}, fix y ∈ C, and let k_i be the unique key with e_{k_i}(x_i) = y. By Bayes’ theorem, pr(x_i|y) = pr(x_i) pr(y|x_i) / pr(y) = pr(x_i) pr(K=k_i) / pr(y). By the perfect secrecy condition, pr(K=k_i) = pr(y) for every i, so all keys are used with the same probability, namely 1/|K|.
Proof (continued). Recall that C(k) = {e_k(x) | x ∈ P} and
pr(Y=y) = ∑_{k : y ∈ C(k)} pr(K=k) pr(X=d_k(y)),
pr(Y=y | X=x) = ∑_{k : y = e_k(x)} pr(K=k),
pr(X=x | Y=y) = pr(X=x) · ∑_{k : y = e_k(x)} pr(K=k) / ∑_{k : y ∈ C(k)} pr(K=k) pr(X=d_k(y)).
For the converse, plugging pr(K=k) = 1/|K| and the uniqueness of the key taking x to y into these formulas gives pr(X=x | Y=y) = pr(X=x).
A perfectly secret scheme: the one-time pad. Let t be a parameter and K = P = C = {0,1}^t. Gilbert Vernam (1890–1960). Vernam’s cipher: e_k(m) = k xor m and d_k(c) = k xor c (component-wise xor). Correctness is trivial: d_k(e_k(m)) = k xor (k xor m) = m.
Observation. The one-time pad can be generalized as follows. Let (G,+) be a group and K = P = C = G. The following is a perfectly secret encryption scheme: e(k, m) = m + k and d(k, c) = c − k.
Why is the one-time pad not practical? The key has to be as long as the message, and the key cannot be reused. The latter is because e_k(m0) xor e_k(m1) = (k xor m0) xor (k xor m1) = m0 xor m1.
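An illustrative sketch (my own example, not from the slides) of the one-time pad over byte strings, showing correctness and the key-reuse leak just described.

```python
# Illustrative sketch of the one-time pad over bytes, and of why a key must
# never be reused: XOR of two ciphertexts under the same key leaks m0 XOR m1.
import secrets

def otp_encrypt(key: bytes, msg: bytes) -> bytes:
    assert len(key) == len(msg)            # the key must be as long as the message
    return bytes(k ^ m for k, m in zip(key, msg))

otp_decrypt = otp_encrypt                   # decryption is the same XOR

m0 = b"attack at dawn!"
m1 = b"retreat at noon"
key = secrets.token_bytes(len(m0))          # fresh uniformly random key

c0 = otp_encrypt(key, m0)
c1 = otp_encrypt(key, m1)                   # key reuse: a mistake

assert otp_decrypt(key, c0) == m0           # correctness: d_k(e_k(m)) = m
leak = bytes(a ^ b for a, b in zip(c0, c1))
assert leak == bytes(a ^ b for a, b in zip(m0, m1))   # c0 XOR c1 = m0 XOR m1
print("key reuse leaks m0 XOR m1:", leak.hex())
```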
Practicality? Generally, the one-time pad is not very practical, since the key has to be as long as the total length of the encrypted messages, and it is hard to generate truly random strings. However, it is sometimes used (e.g. in military applications) because of the following advantages: perfect secrecy, and short messages can be encrypted using pencil and paper. (Picture: a KGB one-time pad hidden in a walnut shell.) In the 1960s the Americans and the Soviets established a hotline that was encrypted using the one-time pad (an additional advantage: they didn’t need to share their secret encryption methods).
Plan Introduction Perfect secrecy Shannon’s theorem Entropy Spurious Keys and Unicity Distance Product Cryptosystems
Information Theory and Entropy. Information theory tries to solve the problem of communicating as much data as possible over a noisy channel. The measure of information is entropy. Claude Shannon first demonstrated that reliable communication over a noisy channel is possible (this jump-started the digital age).
Knowledge and Information Goal: Reasoning with incomplete information! Problem 1: Description of a state of knowledge! 011000101110011100101011100011101001011101000 Problem 2: Updating probabilities when new information becomes available!
Entropy. Suppose we have a random variable X which takes on a finite set of values. What is the information gained by an event which takes place according to distribution p(X)? Equivalently, if the event has not (yet) taken place, what is the uncertainty about the outcome? This quantity is called the entropy of X and is denoted by H(X). 01100010111001110010101110001…
Entropy. Let X be a discrete random variable with probability distribution function p. Its entropy is H(X) = − ∑_x p(x) log2 p(x) (with the convention 0 · log2 0 = 0).
Entropy: how about n = 3? For n = 3 the distribution satisfies p1 + p2 + p3 = 1.
Entropy: Shannon entropy, the binary entropy function, differential entropy.
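A short Python sketch (illustrative) of the Shannon entropy formula above and of the binary entropy function; the numbers below are just example inputs.

```python
# Shannon entropy and the binary entropy function, as defined above.
from math import log2

def entropy(probs):
    """H(X) = -sum_i p_i log2 p_i (terms with p_i = 0 contribute 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def binary_entropy(p):
    """Entropy of a Bernoulli(p) random variable."""
    return entropy([p, 1 - p])

print(entropy([0.25] * 4))        # uniform on 4 outcomes -> 2.0 bits
print(binary_entropy(0.5))        # fair coin -> 1.0 bit
print(binary_entropy(0.11))       # biased coin -> about 0.5 bits
```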
Symbol Codes. A^N: all strings of length N over A; A*: all strings of finite length. For example, {0,1}^3 = {000, 001, 010, …, 111} and {0,1}* = {0, 1, 00, 01, 10, 11, 000, 001, …}. An encoding of X is any mapping f : X → {0,1}*. f(y): the codeword for y ∈ X; |f(y)|: the length of the codeword.
Notation. We can extend the encoding f by defining f(x1,…, xn) = f(x1)||…||f(xn) for xi ∈ X, and p(x1,…, xn) = p(x1)⋯p(xn). Since f must be decodable, it should be injective.
Definitions. An encoding f is prefix-free if there do not exist distinct x, y ∈ X and z ∈ {0,1}* such that f(x) = f(y)||z. L(f) is the weighted average codeword length of an encoding of X: L(f) = ∑_{x∈X} p(x) |f(x)|.
Our problem. We are going to find an injective encoding f that minimizes L(f).
Huffman’s Encoding. X = {a, b, c, d, e} with probabilities p(a)=0.05, p(b)=0.10, p(c)=0.12, p(d)=0.13, p(e)=0.60. Repeatedly merge the two least probable nodes:
0.05 + 0.10 = 0.15, leaving {0.15, 0.12, 0.13, 0.60};
0.12 + 0.13 = 0.25, leaving {0.15, 0.25, 0.60};
0.15 + 0.25 = 0.40, leaving {0.40, 0.60};
0.40 + 0.60 = 1.
Reading the codewords off the resulting tree gives a=000, b=001, c=010, d=011, e=1.
Huffman’s algorithm solves our problem. Moreover, the encoding f produced by Huffman’s algorithm is prefix-free and H(X) ≤ L(f) ≤ H(X) + 1.
Huffman’s Encoding. With a=000, b=001, c=010, d=011, e=1 we can see that L(f) = 3(0.05 + 0.10 + 0.12 + 0.13) + 1·(0.60) = 1.8 and H(X) ≈ 1.7402.
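A compact Huffman-coding sketch in Python (my own implementation, for illustration) that reproduces the example above: one 1-bit codeword for e, 3-bit codewords for the rest, L(f) = 1.8 and H(X) ≈ 1.7402.

```python
# Illustrative Huffman coding: repeatedly merge the two least probable trees.
import heapq
from math import log2

def huffman(freqs):
    """Return a prefix-free code {symbol: bitstring} for the given probabilities."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)                       # tiebreaker so dicts are never compared
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)    # two least probable trees
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

p = {"a": 0.05, "b": 0.10, "c": 0.12, "d": 0.13, "e": 0.60}
code = huffman(p)
L = sum(p[s] * len(code[s]) for s in p)       # average codeword length
H = -sum(q * log2(q) for q in p.values())     # entropy of the source
print(code)                                   # e gets a 1-bit code, the rest 3 bits
print(L, H)                                   # approximately 1.8 and 1.7402
```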
Entropy of a Random Variable
Choosing Balls Randomly. 8 balls: 4 red, 2 blue, 1 green, 1 purple. Draw one randomly. What is the best sequence of yes/no questions to identify its color? What is the average number of questions?
Choosing Balls Randomly. Best set of questions (8 balls: 4 red, 2 blue, 1 green, 1 purple): first ask “Red?”; if yes, done after 1 question. If no, ask “Blue?”; if yes, 2 questions. If no, ask “Green?”; if yes, 3 questions; if no, the ball is purple, again 3 questions. This is exactly a Huffman code!
Choosing Balls Randomly. Average number of questions: P(red)·1 + P(blue)·2 + P(green)·3 + P(purple)·3 = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75. Entropy: (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/8) log2 8 = 1.75 bits.
Entropy and Information. The amount of information about an event is closely related to its probability of occurrence: an event of probability p carries log2(1/p) bits of information. Entropy is the expected value of the information!
INFORMATION THEORY Communication theory deals with systems for transmitting information from one point to another. Information theory was born with the discovery of the fundamental laws of data compression and transmission.
Convex Functions. A function f : R → R is convex if for all α, β ≥ 0 such that α + β = 1, we have f(αx + βy) ≤ α f(x) + β f(y) for all x, y ∈ R.
Strictly Convex Functions. A convex function f : R → R is strictly convex if for all α, β > 0 such that α + β = 1 and all x ≠ y, we have f(αx + βy) < α f(x) + β f(y).
Jensen’s Inequality. Lemma. Let f : R → R be a convex function, and let α1, α2, …, αn be nonnegative real numbers such that ∑_k αk = 1. Then, for any real numbers x1, x2, …, xn, we have f(∑_k αk xk) ≤ ∑_k αk f(xk). Lemma. Let f be a convex function, and let X be a random variable. Then f(E[X]) ≤ E[f(X)].
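A quick numerical illustration (my own example) of the second lemma, f(E[X]) ≤ E[f(X)], for the convex function f(x) = x².

```python
# Numerical illustration of Jensen's inequality for the convex function x**2.
import random

random.seed(1)
xs = [random.uniform(-1, 3) for _ in range(10_000)]  # samples of an arbitrary X
f = lambda x: x * x

lhs = f(sum(xs) / len(xs))              # f(E[X])
rhs = sum(f(x) for x in xs) / len(xs)   # E[f(X)]
assert lhs <= rhs                       # convexity of x^2 guarantees this
print(lhs, "<=", rhs)
```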
Entropy (Bounds). When is H(X) = 0? Exactly when the result of the experiment is known ahead of time, i.e. some pi = 1. Upper bound? For |X| = n: H(X) ≤ log2 n; nothing can be more uncertain than the uniform distribution. Entropy increases with message length!
Properties of Entropy. THEOREM. Suppose X is a random variable having probability distribution p1, p2, …, pn, where pi > 0 for 1 ≤ i ≤ n. Then H(X) ≤ log2 n, with equality if and only if pi = 1/n for all 1 ≤ i ≤ n.
Proof. We know H(X) = − ∑_{1≤i≤n} pi log2 pi = ∑_{1≤i≤n} pi log2(1/pi). Since log2 is concave, Jensen’s inequality gives H(X) ≤ log2 ∑_{1≤i≤n} pi (1/pi) = log2 n. Equality occurs if and only if pi = 1/n for 1 ≤ i ≤ n.
Joint Entropy. The joint entropy of a pair of discrete random variables X, Y is the amount of information needed on average to specify both their values: H(X,Y) = − ∑_x ∑_y p(x, y) log2 p(x, y).
Theorem. H(X,Y) ≤ H(X) + H(Y), and equality occurs if and only if X and Y are independent random variables. Proof. Let p(X=x_i) = p_i, p(Y=y_j) = q_j, and p(X=x_i, Y=y_j) = r_ij for 1 ≤ i ≤ m, 1 ≤ j ≤ n. Then ∑_i r_ij = q_j and ∑_j r_ij = p_i.
Proof (continued).
H(X) + H(Y) = − ∑_{1≤i≤m} p_i log2 p_i − ∑_{1≤j≤n} q_j log2 q_j
= − ∑_i ∑_j r_ij log2 p_i − ∑_j ∑_i r_ij log2 q_j
= − ∑_i ∑_j r_ij log2 (p_i q_j).   (*)
H(X,Y) = − ∑_i ∑_j r_ij log2 r_ij.   (**)
Hence H(X,Y) − H(X) − H(Y) = ∑_i ∑_j r_ij log2(1/r_ij) + ∑_i ∑_j r_ij log2(p_i q_j) = ∑_i ∑_j r_ij log2(p_i q_j / r_ij).
Proof (continued). Since log2 is concave, Jensen’s inequality gives H(X,Y) − H(X) − H(Y) ≤ log2 ∑_i ∑_j r_ij (p_i q_j / r_ij) = log2 ∑_i ∑_j p_i q_j = 0. Equality occurs in Jensen’s inequality if and only if r_ij = p_i q_j, i.e. p(x_i, y_j) = p(x_i) p(y_j) for all i, j.
Conditional Entropy.
H(X|A) = − ∑_x p(X=x|A) log2 p(X=x|A).
H(Y|X) = ∑_x p(x) H(Y|X=x) = − ∑_x ∑_y p(x) p(Y=y|X=x) log2 p(Y=y|X=x).
The Chain Rule
The Chain Rule. Theorem. H(X,Y) = H(X) + H(Y|X).
Proof.
H(X) + H(Y|X) = − ∑_i p(X=x_i) log2 p(X=x_i) + ∑_i p(X=x_i) H(Y|X=x_i)
= − ∑_i p(x_i) log2 p(x_i) − ∑_i ∑_j p(x_i) p(y_j|x_i) log2 p(y_j|x_i)
= − ∑_i p(x_i) log2 p(x_i) − ∑_i ∑_j p(x_i, y_j) log2 p(y_j|x_i)
= − ∑_i ∑_j p(x_i, y_j) log2 p(x_i) − ∑_i ∑_j p(x_i, y_j) log2 p(y_j|x_i)
= − ∑_i ∑_j p(x_i, y_j) log2 p(x_i, y_j) = H(X,Y).
Corollary. H(X|Y) ≤ H(X), with equality if and only if X and Y are independent. Proof. We know that H(X,Y) ≤ H(X) + H(Y) and, by the chain rule, H(X,Y) = H(Y) + H(X|Y). Hence H(X|Y) ≤ H(X).
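An illustrative numerical check (made-up joint distribution, not from the lecture) of the chain rule H(X,Y) = H(X) + H(Y|X) and of the corollary H(X|Y) ≤ H(X).

```python
# Check the chain rule and "conditioning cannot increase entropy" numerically.
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# joint distribution r[i][j] = p(X = i, Y = j); a made-up example
r = [[0.20, 0.10, 0.10],
     [0.05, 0.30, 0.25]]

p = [sum(row) for row in r]                              # marginal of X
q = [sum(r[i][j] for i in range(2)) for j in range(3)]   # marginal of Y

H_XY = H([r[i][j] for i in range(2) for j in range(3)])
H_X, H_Y = H(p), H(q)
H_Y_given_X = sum(p[i] * H([r[i][j] / p[i] for j in range(3)]) for i in range(2))
H_X_given_Y = H_XY - H_Y                                  # chain rule, roles swapped

assert abs(H_XY - (H_X + H_Y_given_X)) < 1e-12            # chain rule
assert H_X_given_Y <= H_X + 1e-12                         # corollary
print(H_XY, H_X + H_Y_given_X, H_X_given_Y, H_X)
```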
Counterfeit Coin. We have 12 coins which look alike, but one of them is forged: the forged coin is heavier or lighter than the others. Find the minimum number of weighings on a balance scale needed to recognize the forged coin! http://www.dotsphinx.com/games/forged-coin/play
Counterfeit Coin. The answer is 3; find a strategy! Lower bound: consider a random ordering of the coins, and let the random variable X record the position of the forged coin together with whether it is lighter or heavier (24 equally likely possibilities). Assume that the random variables Y1, Y2, …, Ym (the outcomes of the weighings) describe the best strategy, so H(X | Y1, Y2, …, Ym) = 0. Then
H(X | Y1,…, Ym) = H(X, Y1,…, Ym) − H(Y1,…, Ym) = H(X) − H(Y1) − H(Y2|Y1) − ⋯
since the outcomes Y1,…, Ym are determined by X. Moreover H(X) = log2 24 and H(Yi | Y1,…, Y(i−1)) ≤ log2 3, because each weighing has three outcomes. Hence 0 ≥ log2 24 − m log2 3, so m ≥ log2 24 / log2 3 > 2, i.e. at least 3 weighings are needed.
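The lower bound in numbers (a small illustrative computation of the argument above):

```python
# X has 24 equally likely values (12 coins x {lighter, heavier}); each weighing
# has at most 3 outcomes, so m weighings resolve at most m*log2(3) bits.
from math import log2, ceil

H_X = log2(24)                 # about 4.58 bits of uncertainty
per_weighing = log2(3)         # at most about 1.58 bits per weighing
m = ceil(H_X / per_weighing)
print(H_X / per_weighing, m)   # about 2.89 -> at least 3 weighings are necessary
```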
Plan Introduction Perfect secrecy Shannon’s theorem Entropy Spurious Keys and Unicity Distance Product Cryptosystems
Theorem. Let (P,C,K,E,D) be a cryptosystem. Then H(K|C) = H(K) + H(P) − H(C).
Proof. We have H(K,P,C) = H(C|K,P) + H(K,P), and H(C|K,P) = 0 since the ciphertext is determined by the key and the plaintext; so H(K,P,C) = H(K,P). K and P are independent random variables, hence H(K,P) = H(K) + H(P), and so H(K,P,C) = H(K) + H(P). In a similar fashion, H(P|K,C) = 0, hence H(K,P,C) = H(K,C). Therefore
H(K|C) = H(K,C) − H(C) = H(K,P,C) − H(C) = H(K) + H(P) − H(C).
Unicity Distance. Assume that in a given cryptosystem a message is a string x1, x2, …, xn where each xi ∈ P (xi is a letter or block). Encrypt each xi individually with the same key k: yi = e_k(xi), 1 ≤ i ≤ n. How many ciphertext blocks yi do we need to determine k?
Defining a Language. L: “the natural language”, the set of all meaningful messages. P²: the digrams (x1, x2) with x1, x2 ∈ P; more generally Pⁿ: the n-grams (x1, x2, …, xn) with xi ∈ P. Each Pⁿ inherits a probability distribution from L (digrams, trigrams, …), so H(Pⁿ) makes sense.
Entropy and Redundancy of a Language. What is the entropy of a language? HL = lim_{n→∞} H(Pⁿ)/n, the average information per letter. What is the redundancy of a language? RL = 1 − HL / log2 |P|.
English Language. For English, 1 ≤ HL ≤ 1.5. Empirically H(P) ≈ 4.18 and H(P²)/2 ≈ 3.90. RL = 1 − HL / log2 26 ≈ 0.75 (the exact value depends on the estimate of HL).
Definition. K(y) = {k ∈ K | ∃ x ∈ Pⁿ with p(x) > 0 and e_k(x) = y}. The average number of spurious keys is
s_n = ∑_{y∈Cⁿ} p(y) (|K(y)| − 1) = ∑_{y∈Cⁿ} p(y) |K(y)| − 1.
Theorem. Suppose (P,C,K,E,D) is a cryptosystem where |P| = |C| and keys are chosen equiprobably. Let RL denote the redundancy of the underlying language. Then, given a string of ciphertext of length n, where n is sufficiently large, the expected number of spurious keys s_n satisfies s_n ≥ |K| / |P|^(n·RL) − 1.
Proof. By the last theorem, H(K|Cⁿ) = H(K) + H(Pⁿ) − H(Cⁿ). We have H(Pⁿ) ≈ n·HL = n(1 − RL) log2 |P|. Certainly H(Cⁿ) ≤ n log2 |C|. If |P| = |C|, then
H(K|Cⁿ) ≥ H(K) − n·RL log2 |P|.   (1)
Proof (continued). On the other hand,
H(K|Cⁿ) = ∑_{y∈Cⁿ} p(y) H(K|y) ≤ ∑_{y∈Cⁿ} p(y) log2 |K(y)| ≤ log2 ∑_{y∈Cⁿ} p(y) |K(y)| = log2(1 + s_n),   (2)
where the last inequality is Jensen’s (log2 is concave). Combining (1) and (2):
log2(1 + s_n) ≥ H(K) − n·RL log2 |P|.
Since the keys are equiprobable, H(K) = log2 |K|, and exponentiating gives s_n ≥ |K| / |P|^(n·RL) − 1.
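Setting the expected number of spurious keys to roughly zero in this bound gives the usual estimate of the unicity distance, n0 ≈ log2 |K| / (RL log2 |P|); this formula is not spelled out on the slides but follows directly from the bound. A small illustrative computation, assuming RL ≈ 0.75 for English as on the earlier slide:

```python
# Approximate unicity distance n0 ~ log2|K| / (R_L * log2|P|), i.e. the ciphertext
# length beyond which we expect no spurious keys to survive.
from math import log2, factorial

R_L = 0.75                      # assumed redundancy of English
P_size = 26

def unicity_distance(key_space_size):
    return log2(key_space_size) / (R_L * log2(P_size))

print(unicity_distance(26))              # shift cipher: roughly 1.3 letters (asymptotic bound, loose for small n)
print(unicity_distance(factorial(26)))   # substitution cipher: roughly 25 letters
```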
Plan Introduction Perfect secrecy Shannon’s theorem Entropy Spurious Keys and Unicity Distance Product Cryptosystems
Product Cryptosystems. If P = C, the cryptosystem is called endomorphic. Let S1 = (P, P, K1, E1, D1) and S2 = (P, P, K2, E2, D2) be endomorphic cryptosystems. We define the product cryptosystem S1 × S2 to be (P, P, K1 × K2, E1 × E2, D1 × D2).
Product Cryptosystems. In the product cryptosystem S1 × S2, we have e_(k1,k2)(x) = e_k2(e_k1(x)) and d_(k1,k2)(y) = d_k1(d_k2(y)). Correctness: d_(k1,k2)(e_(k1,k2)(x)) = d_k1(d_k2(e_k2(e_k1(x)))) = d_k1(e_k1(x)) = x.
Multiplicative cipher. P = C = Z26, K = {a ∈ Z26 | gcd(a, 26) = 1}, and for any a ∈ K we have e_a(x) = ax (mod 26) and d_a(y) = a⁻¹y (mod 26), where x, y ∈ Z26.
Theorem. Let S be the Shift cipher and M the Multiplicative cipher. Then S × M is the Affine cipher.
Proof. Let M = (P, P, K1, E1, D1) and S = (P, P, K2, E2, D2), and take a ∈ K1, k ∈ K2, x ∈ P. In S × M we have e_(k,a)(x) = a(x + k) mod 26 = (ax + ak) mod 26. So the key (k, a) of S × M acts exactly like the key (a, ak) of the Affine cipher.
Proof (continued). On the other hand, given an Affine key (a, k1) with gcd(a, 26) = 1, setting k = a⁻¹k1 gives ak = k1, so the key (a, k1) of the Affine cipher corresponds to the key (a⁻¹k1, a) of S × M. This correspondence between keys is a bijection, and each key is equiprobable on both sides, so S × M is the Affine cipher.
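A quick check of this correspondence in Python (illustrative; the key values a = 7, k = 3 are my own example):

```python
# Applying the Multiplicative cipher (key a) after the Shift cipher (key k) is
# the same map as the Affine cipher with key (a, a*k mod 26).
def shift_e(k, x):
    return (x + k) % 26

def mult_e(a, x):
    return (a * x) % 26

def affine_e(a, b, x):
    return (a * x + b) % 26

a, k = 7, 3                                   # example key; gcd(7, 26) = 1
for x in range(26):
    assert mult_e(a, shift_e(k, x)) == affine_e(a, (a * k) % 26, x)
print("S x M with key (k, a) acts as the Affine cipher with key (a, a*k mod 26)")
```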
Properties of product cryptosystems. Let S, S1, S2 be cryptosystems. If S1 × S2 = S2 × S1, we say that S1 and S2 commute. If S × S = S, then S is an idempotent cryptosystem (and then Sⁿ = S for every n).
The End