Lecture 2 Shannon’s Theory


1 Lecture 2 Shannon’s Theory
Lecturer: Meysam Alishahi. Design by: Z. Faraji and H. Hajiabolhassan.

2 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

3 Claude Shannon (Shannon 1949)
In this chapter, we discuss several of Shannon's ideas about "secrecy systems".

4 What’s the meaning of security?
Anyone can lock it; the key is needed to unlock it.

5 Security. We define some of the most useful criteria:
Computational security, provable security, unconditional security.

6 Elementary Probability Theory
Basic properties: Pr: 2^Ω → [0,1]; Pr(Ω) = 1; for pairwise disjoint events, Pr(∪i Ai) = Σi Pr(Ai). [Axiomatic definition of probability: take the above three conditions as axioms.] Immediate consequences: Pr(∅) = 0, Pr(Aᶜ) = 1 − Pr(A), A ⊆ B ⇒ Pr(A) ≤ Pr(B), Σ_{a∈Ω} Pr(a) = 1.

7 Random Variables

8 Elementary Probability Theory
Let X and Y be discrete random variables and pr their probability distribution function. Then pr(x, y) = pr(X=x | Y=y)·pr(Y=y). Set pr(x) := pr(X=x). X and Y are said to be independent if pr(x, y) = pr(x)·pr(y) for all possible values x of X and y of Y.

9 Chain rule and Bayes' Theorem
Chain rule: Pr(A1, ..., An) = Pr(A1)·Pr(A2|A1)·Pr(A3|A1,A2) ··· Pr(An|A1, ..., An−1).
Bayes' Theorem: if Pr(B) > 0, then Pr(A|B) = Pr(A)·Pr(B|A) / Pr(B).

10 Cryptography: Elementary Definitions
C(k) = {e_k(x) | x ∈ P}
pr(Y=y) = Σ_{k : y ∈ C(k)} pr(K=k)·pr(X=d_k(y))
pr(Y=y | X=x) = Σ_{k : y = e_k(x)} pr(K=k)
pr(X=x | Y=y) = pr(X=x) · Σ_{k : y = e_k(x)} pr(K=k) / Σ_{k : y ∈ C(k)} pr(K=k)·pr(X=d_k(y))
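To make these formulas concrete, here is a minimal Python sketch (not from the slides) that computes pr(Y=y) and the posterior pr(X=x|Y=y) for a hypothetical toy endomorphic cryptosystem: a shift cipher over Z_4 with assumed, deliberately non-uniform key and plaintext distributions. The names enc, dec, pr_K, pr_X and posterior are illustrative only.

```python
from collections import defaultdict

P = K = C = range(4)                       # toy cryptosystem over Z_4 (assumed)

def enc(k, x):                             # e_k(x) = x + k (mod 4)
    return (x + k) % 4

def dec(k, y):                             # d_k(y) = y - k (mod 4)
    return (y - k) % 4

pr_K = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}    # assumed key distribution
pr_X = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}    # assumed plaintext distribution

# pr(Y=y) = sum over keys k with y in C(k) of pr(K=k) * pr(X=d_k(y))
pr_Y = defaultdict(float)
for k in K:
    for x in P:
        pr_Y[enc(k, x)] += pr_K[k] * pr_X[x]

def posterior(x, y):
    """pr(X=x | Y=y) via Bayes: pr(x) * sum_{k: e_k(x)=y} pr(K=k) / pr(y)."""
    pr_y_given_x = sum(pr_K[k] for k in K if enc(k, x) == y)
    return pr_X[x] * pr_y_given_x / pr_Y[y]

for y in C:
    print(y, [round(posterior(x, y), 3) for x in P])
```

With this non-uniform key distribution the posterior differs from pr(x), so the toy system does not have perfect secrecy; with a uniform key distribution it would.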

11 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

12 Perfect Secrecy. A cryptosystem has perfect secrecy if pr(x|y) = pr(x) for all x ∈ P, y ∈ C.

13 Shift Cipher
Suppose the 26 keys in the Shift Cipher are used with equal probability 1/26. Then, for any plaintext probability distribution, the Shift Cipher has perfect secrecy.

14 Shift Cipher proof: We know pr(Y=y) = Σ_{k∈Z26} pr(K=k)·pr(X=d_k(y)) = (1/26)·Σ_{k∈Z26} pr(X = y − k). On the other hand, Σ_{k∈Z26} pr(X = y − k) = Σ_{x∈Z26} pr(X = x) = 1, so pr(y) = 1/26.

15 Shift Cipher proof: We also have pr(y|x) = pr(K = (y − x) mod 26) = 1/26. Finally, by Bayes' theorem, pr(x|y) = pr(y|x)·pr(x)/pr(y) = pr(x).

16 Perfect Secrecy. If pr(x0) = 0 for some x0 ∈ P, then pr(x0|y) = pr(x0) = 0 for all y ∈ C, so we need only consider x ∈ P with pr(x) > 0. For such x, "pr(x|y) = pr(x) for all y" is equivalent to "pr(y|x) = pr(y) for all y". Reasonable assumption: pr(y) > 0 for all y ∈ C.

17 Perfect Secrecy
In a cryptosystem with perfect secrecy: |K| ≥ |C| and |C| ≥ |P|.

18 Shannon's Theorem. Suppose (P, C, K, E, D) is a cryptosystem where |K| = |C| = |P|. The cryptosystem provides perfect secrecy if and only if every key is used with equal probability 1/|K|, and for all x ∈ P, y ∈ C there is a unique key k such that e_k(x) = y.

19 Proof. By perfect secrecy, for each x ∈ P every y ∈ C must be reachable (otherwise pr(y|x) = 0 ≠ pr(y)), so |C| = |{e_k(x) | k ∈ K}| = |K|, and for all x ∈ P, y ∈ C there is a unique key k_i such that e_{k_i}(x) = y. Assume P = {x_1, ..., x_n}. By Bayes' theorem, pr(x_i|y) = pr(x_i)·pr(y|x_i) / pr(y) = pr(x_i)·pr(K=k_i) / pr(y). By the perfect secrecy condition, pr(K=k_i) = pr(y) for every i, so all keys are used with equal probability.

20 Proof. Recall that C(k) = {e_k(x) | x ∈ P} and
pr(Y=y) = Σ_{k : y ∈ C(k)} pr(K=k)·pr(X=d_k(y))
pr(Y=y | X=x) = Σ_{k : y = e_k(x)} pr(K=k)
pr(X=x | Y=y) = pr(X=x) · Σ_{k : y = e_k(x)} pr(K=k) / Σ_{k : y ∈ C(k)} pr(K=k)·pr(X=d_k(y))

21 A perfectly secret scheme: the one-time pad
t: a parameter; K = P = C = {0,1}^t; encryption is component-wise xor. Gilbert Vernam (1890–1960). Vernam's cipher: e_k(m) = k xor m, d_k(c) = k xor c. Correctness is trivial: d_k(e_k(m)) = k xor (k xor m) = m.
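A minimal Python sketch of Vernam's cipher over byte strings, assuming a truly random key of the same length as the message; the function names are illustrative.

```python
import secrets

def otp_encrypt(key: bytes, msg: bytes) -> bytes:
    assert len(key) == len(msg)             # key as long as the message
    return bytes(k ^ m for k, m in zip(key, msg))

otp_decrypt = otp_encrypt                   # decryption is the same xor

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))         # fresh random key, used once
ct = otp_encrypt(key, msg)
assert otp_decrypt(key, ct) == msg          # d_k(e_k(m)) = k xor (k xor m) = m
```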

22 Observation. The one-time pad can be generalized as follows.
Let (G, +) be a group and K = P = C = G. The following is a perfectly secret encryption scheme: e(k, m) = m + k, d(k, c) = c − k.

23 Why is the one-time pad not practical?
The key has to be as long as the message, and the key cannot be reused. This is because e_k(m0) xor e_k(m1) = (k xor m0) xor (k xor m1) = m0 xor m1.
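The key-reuse problem can be seen directly in code. This sketch (with illustrative messages) xors two ciphertexts produced under the same key and recovers m0 xor m1.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m0 = b"attack at dawn"
m1 = b"retreat now!!!"                      # same length as m0
k = secrets.token_bytes(len(m0))            # the *same* key used twice
c0, c1 = xor(k, m0), xor(k, m1)

assert xor(c0, c1) == xor(m0, m1)           # (k^m0) ^ (k^m1) = m0 ^ m1
```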

24 Practicality?
Generally, the one-time pad is not very practical, since the key has to be as long as the total length of the encrypted messages, and it is hard to generate truly random strings. However, it is sometimes used (e.g. in military applications) because of its advantages: perfect secrecy, and short messages can be encrypted using pencil and paper. (Pictured: a KGB one-time pad hidden in a walnut shell.) In the 1960s the Americans and the Soviets established a hotline that was encrypted using the one-time pad (an additional advantage: they didn't need to share their secret encryption methods).

25 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

26 Information Theory and Entropy
Information theory tries to solve the problem of communicating as much data as possible over a noisy channel. The measure of data is entropy. Claude Shannon first demonstrated that reliable communication over a noisy channel is possible (this jump-started the digital age).

27 Knowledge and Information
Goal: Reasoning with incomplete information! Problem 1: Description of a state of knowledge. Problem 2: Updating probabilities when new information becomes available.

28 Entropy. Suppose we have a random variable X which takes on a finite set of values. What is the information gained by an event which takes place according to the distribution p(X)? Equivalently, if the event has not (yet) taken place, what is the uncertainty about the outcome? This quantity is called the entropy of X and is denoted by H(X).

29 Entropy. Let X be a discrete random variable with probability distribution function p. The entropy of X is H(X) = −Σ_x p(x) log2 p(x).
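A small sketch of this definition in Python; the helper name entropy is illustrative.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a sequence of probabilities (zero terms ignored)."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))      # 1.0 bit: a fair coin
print(entropy([1.0, 0.0]))      # 0.0: the outcome is known in advance
print(entropy([0.25] * 4))      # 2.0 = log2(4): uniform over 4 values
```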

30 Entropy: how about n = 3? Here p1 + p2 + p3 = 1.

31 Entropy: Shannon entropy, the binary entropy formula, differential entropy.

32 Symbol Codes
A^N: all strings of length N; A*: all strings of finite length. {0,1}^3 = {000, 001, 010, ..., 111}; {0,1}* = {0, 1, 00, 01, 10, 11, 000, 001, ...}. An encoding of X is any mapping f: X → {0,1}*. f(y) is the codeword for y ∈ X, and |f(y)| is the length of the codeword.

33 Notation
We can extend the encoding f by defining f(x1, ..., xn) = f(x1)||...||f(xn) for xi ∈ X, and p(x1, ..., xn) = p(x1)···p(xn). Since f must be decodable, it should be injective.

34 Definitions
An encoding f is a prefix-free encoding if there do not exist distinct x, y ∈ X and z ∈ {0,1}* such that f(x) = f(y)||z. L(f) is the weighted average length of an encoding of X; we define L(f) = Σ_{x∈X} p(x)·|f(x)|.

35 Our problem. We are going to find an injective encoding f that minimizes L(f).

36 Huffman's Encoding
X = {a, b, c, d, e} with probabilities a: 0.05, b: 0.10, c: 0.12, d: 0.13, e: 0.60. Repeatedly merge the two least probable nodes: a + b → 0.15 (leaving 0.12, 0.13, 0.15, 0.60); c + d → 0.25 (leaving 0.15, 0.25, 0.60); 0.15 + 0.25 → 0.40 (leaving 0.40, 0.60); 0.40 + 0.60 → 1. Labelling the branches 0/1 gives the codewords a = 000, b = 001, c = 010, d = 011, e = 1.

37 Huffman’s algorithm solves our problem…
Moreover, the encoding f produced by Huffman's algorithm is prefix-free and H(X) ≤ L(f) ≤ H(X) + 1.

38 Huffman's Encoding: a = 000, b = 001, c = 010, d = 011, e = 1.
We can see that L(f) = 1.8 and H(X) ≈ 1.7402.
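The slide's example can be reproduced with a short heap-based sketch of Huffman's algorithm (the exact 0/1 labels may differ from the slide, but the code lengths, and hence L(f) = 1.8 and H(X) ≈ 1.7402, match).

```python
import heapq
from math import log2

def huffman(p):
    """p: dict symbol -> probability. Returns dict symbol -> codeword."""
    heap = [(prob, i, {sym: ""}) for i, (sym, prob) in enumerate(p.items())]
    heapq.heapify(heap)
    counter = len(heap)                        # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)        # two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

p = {"a": 0.05, "b": 0.10, "c": 0.12, "d": 0.13, "e": 0.60}
code = huffman(p)
L = sum(p[s] * len(code[s]) for s in p)        # weighted average length
H = -sum(q * log2(q) for q in p.values())      # entropy
print(code, round(L, 2), round(H, 4))          # L = 1.8, H ≈ 1.7402
```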

39 Entropy of a Random Variable

40 Choosing Balls Randomly
8 balls: 4 red, 2 blue, 1 green, 1 purple. Draw one at random. What is the best sequence of yes/no questions to identify its colour? What is the average number of questions?

41 Choosing Balls Randomly
Best set of questions (8 balls: 4 red, 2 blue, 1 green, 1 purple): ask "Red?"; if yes, 1 question. If no, ask "Blue?"; if yes, 2 questions. If no, ask "Green?"; if yes, 3 questions; if no, it is purple, also 3 questions. This is a Huffman code!

42 Choosing Balls Randomly
Average number of questions: P(red)·1 + P(blue)·2 + P(green)·3 + P(purple)·3 = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75. Entropy = −(1/2)log2(1/2) − (1/4)log2(1/4) − (1/8)log2(1/8) − (1/8)log2(1/8) = 1.75 bits.
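A quick check of this calculation (illustrative):

```python
from math import log2

p = {"red": 4/8, "blue": 2/8, "green": 1/8, "purple": 1/8}
questions = {"red": 1, "blue": 2, "green": 3, "purple": 3}

avg_questions = sum(p[b] * questions[b] for b in p)   # 1.75
H = -sum(q * log2(q) for q in p.values())             # 1.75 bits
print(avg_questions, H)
```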

43 Entropy and Information
The amount of information in an event is closely related to its probability of occurrence: an event of probability p carries log2(1/p) bits of information. Entropy is the expected value of this information.

44 INFORMATION THEORY. Communication theory deals with systems for transmitting information from one point to another. Information theory was born with the discovery of the fundamental laws of data compression and transmission.

45 Convex Functions
A function f: R → R is convex if for all α, β ≥ 0 with α + β = 1 we have f(αx + βy) ≤ α f(x) + β f(y) for all x, y ∈ R.

46 Strictly Convex Functions
A convex function f: R → R is strictly convex if for all α, β > 0 with α + β = 1 and all x ≠ y we have f(αx + βy) < α f(x) + β f(y).

47 Jensen's Inequality. Lemma. Let f: R → R be a convex function, and let α1, α2, ..., αn be nonnegative real numbers such that Σk αk = 1. Then, for any real numbers x1, x2, ..., xn, we have f(Σk αk xk) ≤ Σk αk f(xk). Lemma. Let f be a convex function, and let X be a random variable. Then f(E[X]) ≤ E[f(X)]. (For a concave function such as log2, the inequalities are reversed.)

48 Entropy (Bounds) When H(X) = 0? Upper bound?
H(X) = 0 if the result of the experiment is known ahead of time (necessarily some pi = 1). Upper bound: for |X| = n, H(X) ≤ log2 n; nothing can be more uncertain than the uniform distribution. Entropy increases with message length!

49 Properties of Entropy. THEOREM. Suppose X is a random variable having probability distribution p1, p2, ..., pn, where pi > 0 for 1 ≤ i ≤ n. Then H(X) ≤ log2 n, and equality holds if and only if pi = 1/n for all 1 ≤ i ≤ n.

50 Proof. We know H(X) = −Σ_{1≤i≤n} pi log2 pi = Σ_{1≤i≤n} pi log2(1/pi). Since log2 is concave, Jensen's inequality gives H(X) ≤ log2 Σ_{1≤i≤n} pi·(1/pi) = log2 n. Equality occurs if and only if pi = 1/n for 1 ≤ i ≤ n.

51 Joint Entropy. The joint entropy of a pair of discrete random variables X, Y is the amount of information needed on average to specify both their values.

52 Theorem. H(X,Y) ≤ H(X) + H(Y), and equality occurs if and only if X, Y are independent random variables.
Proof. Let p(X=xi) = pi, p(Y=yj) = qj, p(X=xi, Y=yj) = rij, for 1 ≤ i ≤ m, 1 ≤ j ≤ n. Then Σi rij = qj and Σj rij = pi.

53 Proof. H(X) + H(Y) = −Σi pi log2 pi − Σj qj log2 qj = −Σi Σj rij log2 pi − Σj Σi rij log2 qj = −Σi Σj rij log2(pi qj)  (*)
H(X,Y) = −Σi Σj rij log2 rij  (**)
H(X,Y) − H(X) − H(Y) = Σi Σj rij log2(1/rij) + Σi Σj rij log2(pi qj) = Σi Σj rij log2(pi qj / rij)

54 Proof. Since log2 is concave, by Jensen's inequality H(X,Y) − H(X) − H(Y) ≤ log2 Σi Σj rij·(pi qj / rij) = log2 Σi Σj pi qj = 0. In Jensen's inequality, equality occurs if and only if rij = pi qj, i.e. p(xi, yj) = p(xi)·p(yj) for all i, j.

55 Conditional Entropy
H(X|A) = −Σx p(X=x|A) log2 p(X=x|A)
H(Y|X) = −Σx Σy p(x)·p(Y=y|X=x) log2 p(Y=y|X=x)

56 The Chain Rule

57 The Chain Rule. Theorem. H(X,Y) = H(X) + H(Y|X).
Proof. H(X) + H(Y|X) = −Σi p(X=xi) log2 p(X=xi) + Σi p(X=xi)·H(Y|X=xi)

58 The Chain Rule
H(X) + H(Y|X) = −Σi p(X=xi) log2 p(X=xi) + Σi p(X=xi)·H(Y|X=xi)
= −Σi p(xi) log2 p(xi) − Σi Σj p(xi)·p(yj|xi) log2 p(yj|xi)
= −Σi p(xi) log2 p(xi) − Σi Σj p(xi, yj) log2 p(yj|xi)
= −Σi Σj p(xi, yj) log2 p(xi) − Σi Σj p(xi, yj) log2 p(yj|xi)
= −Σi Σj p(xi, yj) log2 p(xi, yj) = H(X,Y)

59 Corollary. H(X|Y) ≤ H(X), with equality if and only if X and Y are independent. Proof. We know that H(X,Y) ≤ H(X) + H(Y) and, by the chain rule (with the roles of X and Y exchanged), H(X,Y) = H(Y) + H(X|Y). Hence H(X|Y) ≤ H(X).
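A short numerical check (with an assumed joint distribution r) of the chain rule, subadditivity, and this corollary:

```python
from math import log2

r = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}   # assumed joint p(x,y)

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

xs = sorted({x for x, _ in r})
ys = sorted({y for _, y in r})
px = {x: sum(r[(x, y)] for y in ys) for x in xs}           # marginal of X
py = {y: sum(r[(x, y)] for x in xs) for y in ys}           # marginal of Y

HXY, HX, HY = H(r.values()), H(px.values()), H(py.values())
# H(Y|X) directly from the definition: -sum_{x,y} p(x,y) log2 p(y|x)
HY_given_X = -sum(r[(x, y)] * log2(r[(x, y)] / px[x]) for (x, y) in r if r[(x, y)] > 0)

print(round(HXY, 4), round(HX + HY_given_X, 4))   # chain rule: the two agree
print(HXY <= HX + HY)                             # subadditivity: True
print(HXY - HY <= HX)                             # H(X|Y) <= H(X): True
```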

60 Counterfeit Coin
We have 12 coins which look identical. One of them is forged, and the forged coin is heavier or lighter than the others. Find the minimum number of weighings on a balance scale needed to identify the forged coin!

61 Counterfeit Coin

62 Oh No!

63 Counterfeit Coin

64 Counterfeit Coin
The answer is 3; find a strategy! Lower bound: consider a random ordering of the coins. The random variable X gives the position of the forged coin and whether it is lighter or heavier. Let the random variables Y1, Y2, ..., Yk represent the outcomes of the weighings in the best strategy, so H(X | Y1, Y2, ..., Yk) = 0. Then 0 = H(X | Y1, ..., Yk) = H(X, Y1, ..., Yk) − H(Y1, ..., Yk) ≥ H(X) − H(Y1) − H(Y2|Y1) − ... − H(Yk | Y1, ..., Yk−1). Since H(X) = log2 24 and each H(Yi | Y1, ..., Yi−1) ≤ log2 3, we need k ≥ log2 24 / log2 3, i.e. at least 3 weighings.
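The numeric part of this lower bound:

```python
from math import log2, ceil

H_X = log2(12 * 2)        # 12 positions, each lighter or heavier: 24 outcomes
per_weighing = log2(3)    # each weighing has 3 outcomes: left, right, balanced
print(H_X / per_weighing, ceil(H_X / per_weighing))   # ~2.89 -> at least 3 weighings
```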

65 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

66 Theorem. Let (P, C, K, E, D) be a cryptosystem. Then H(K|C) = H(K) + H(P) − H(C).

67 Proof
We have H(K,P,C) = H(C|K,P) + H(K,P). We know H(C|K,P) = 0 (the ciphertext is determined by the key and the plaintext), so H(K,P,C) = H(K,P). Since K and P are independent, H(K,P) = H(K) + H(P), so H(K,P,C) = H(K) + H(P). In a similar fashion, H(P|K,C) = 0, hence H(K,P,C) = H(K,C). Therefore H(K|C) = H(K,C) − H(C) = H(K,P,C) − H(C) = H(K) + H(P) − H(C).
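A sketch checking this identity numerically on a hypothetical toy cipher (a shift over Z_3) with assumed key and plaintext distributions; all names are illustrative.

```python
from math import log2
from collections import defaultdict

pK = {0: 0.5, 1: 0.25, 2: 0.25}        # assumed key distribution
pP = {0: 0.6, 1: 0.3, 2: 0.1}          # assumed plaintext distribution
enc = lambda k, x: (x + k) % 3         # toy shift cipher over Z_3

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint distribution of (K, C), using independence of K and P.
pKC = defaultdict(float)
for k, qk in pK.items():
    for x, qx in pP.items():
        pKC[(k, enc(k, x))] += qk * qx

pC = defaultdict(float)
for (_, y), q in pKC.items():
    pC[y] += q

H_K_given_C = H(pKC.values()) - H(pC.values())            # H(K|C) = H(K,C) - H(C)
rhs = H(pK.values()) + H(pP.values()) - H(pC.values())    # H(K) + H(P) - H(C)
print(round(H_K_given_C, 4), round(rhs, 4))               # the two values agree
```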

68 Unicity Distance. Assume that in a given cryptosystem a message is a string x1, x2, ..., xn, where each xi is in P (xi is a letter or block). Each xi is encrypted individually with the same key k: yi = e_k(xi), 1 ≤ i ≤ n. How many ciphertext blocks yi do we need to determine k?

69 Defining a Language
L: the set of all meaningful messages of length n ≥ 1 ("the natural language"). P²: digrams (x1, x2), x1, x2 ∈ P. Pⁿ: n-grams (x1, x2, ..., xn), xi ∈ P. Each Pⁿ inherits a probability distribution from L (digrams, trigrams, ...), so H(Pⁿ) makes sense.

70 Entropy and Redundancy of a Language
What is the entropy of a language? H_L = lim_{n→∞} H(Pⁿ)/n. What is the redundancy of a language? R_L = 1 − H_L / log2|P|.

71 English Language
Empirically, 1 ≤ H_L ≤ 1.5 for English. H(P) ≈ 4.18 and H(P²)/2 ≈ 3.90. R_L = 1 − H_L / log2 26 is about 75%, depending on the value of H_L.
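For the quoted range of H_L, the redundancy works out roughly as follows (illustrative calculation):

```python
from math import log2

for H_L in (1.0, 1.25, 1.5):
    R_L = 1 - H_L / log2(26)
    print(H_L, round(R_L, 2))   # about 0.79, 0.73, 0.68, i.e. roughly 75%
```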

72 Definition
K(y) = {k ∈ K | ∃ x ∈ Pⁿ with p(x) > 0 and e_k(x) = y}, the set of keys consistent with the ciphertext y. The average number of spurious keys is s_n = Σ_{y∈Cⁿ} p(y)·(|K(y)| − 1) = Σ_{y∈Cⁿ} p(y)·|K(y)| − 1.

73 Theorem. Suppose (P, C, K, E, D) is a cryptosystem where |P| = |C| and keys are chosen equiprobably. Let R_L denote the redundancy of the underlying language. Then, given a string of ciphertext of length n, where n is sufficiently large, the expected number of spurious keys s_n satisfies s_n ≥ |K| / |P|^(n·R_L) − 1.

74 Proof. By the previous theorem, H(K|Cⁿ) = H(K) + H(Pⁿ) − H(Cⁿ). We have H(Pⁿ) ≈ n·H_L = n(1 − R_L)·log2|P|. Certainly H(Cⁿ) ≤ n·log2|C|. If |P| = |C|, then H(K|Cⁿ) ≥ H(K) − n·R_L·log2|P|.  (1)

75 Proof. Recall from (1) that H(K|Cⁿ) ≥ H(K) − n·R_L·log2|P|. On the other hand, H(K|Cⁿ) = Σ_{y∈Cⁿ} p(y)·H(K|y) ≤ Σ_{y∈Cⁿ} p(y)·log2|K(y)| ≤ log2 Σ_{y∈Cⁿ} p(y)·|K(y)| = log2(1 + s_n).  (2)
Combining (1) and (2): log2(1 + s_n) ≥ H(K) − n·R_L·log2|P|. Since the keys are equiprobable, H(K) = log2|K|, and the bound stated in the theorem follows.
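As an illustration of the bound (not from the slides), here is a sketch that plugs in the usual textbook numbers for the substitution cipher over a 26-letter alphabet with R_L ≈ 0.75; the unicity distance is the value of n at which the bound on s_n reaches 0.

```python
from math import log2, factorial

P_size = 26
K_size = factorial(26)        # substitution cipher: 26! keys (illustrative choice)
R_L = 0.75                    # assumed redundancy of English

def spurious_bound(n):
    """Lower bound |K| / |P|^(n*R_L) - 1 on the expected number of spurious keys."""
    return K_size / P_size ** (n * R_L) - 1

unicity = log2(K_size) / (R_L * log2(P_size))   # n at which the bound reaches 0
print(round(unicity, 1))                        # about 25 ciphertext letters
print(round(spurious_bound(25), 2))             # already below 1 at n = 25
```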

76 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

77 Product Cryptosystems
If P = C, the cryptosystem is called endomorphic. Let S1 = (P, P, K1, E1, D1) and S2 = (P, P, K2, E2, D2) be endomorphic cryptosystems. We define the product cryptosystem S1 × S2 to be (P, P, K1 × K2, E1 × E2, D1 × D2).

78 Product Cryptosystems
In the S1 × S2 product cryptosystem we have e_(k1,k2)(x) = e_k2(e_k1(x)) and d_(k1,k2)(y) = d_k1(d_k2(y)). Indeed, d_(k1,k2)(e_(k1,k2)(x)) = d_(k1,k2)(e_k2(e_k1(x))) = d_k1(e_k1(x)) = x.

79 Multiplicative cipher
P = C = Z26, K = {a ∈ Z26 | gcd(a, 26) = 1}, and for any a ∈ K we have e_a(x) = ax (mod 26) and d_a(y) = a⁻¹·y (mod 26), where x, y ∈ Z26.

80 Theorem. Let M be the Multiplicative cipher and S the Shift cipher. Then the product S × M is the Affine cipher.

81 Proof. Let M = (P, P, K1, E1, D1) and S = (P, P, K2, E2, D2), with a ∈ K1, k ∈ K2 and x ∈ P. Then e_(k,a)(x) = a(x + k) mod 26 = (ax + ak) mod 26, so the key (k, a) of S × M acts exactly like the key (a, ak) of the Affine cipher.

82 Proof (continued)
On the other hand, ak = k1 gives k = a⁻¹·k1, hence the key (a, k1) of the Affine cipher corresponds to the key (a⁻¹·k1, a) of S × M. Since gcd(a, 26) = 1, this correspondence is a bijection, so S × M is the Affine cipher and each of its keys is equiprobable.
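A brute-force check of the theorem on Z_26 (illustrative sketch): composing the Shift cipher (key k) with the Multiplicative cipher (key a) acts exactly like the Affine cipher with key (a, ak).

```python
from math import gcd

shift = lambda k, x: (x + k) % 26
mult = lambda a, x: (a * x) % 26
affine = lambda a, b, x: (a * x + b) % 26

for a in (m for m in range(26) if gcd(m, 26) == 1):
    for k in range(26):
        for x in range(26):
            assert mult(a, shift(k, x)) == affine(a, (a * k) % 26, x)
print("S x M coincides with the Affine cipher on Z_26")
```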

83 Properties of product cryptosystems
Let S, S1, S2 be cryptosystems. If S1 × S2 = S2 × S1, we say that S1 and S2 commute. If S × S = S (and hence Sⁿ = S for all n), S is an idempotent cryptosystem.

84 The End

