1
Lecture 2 Shannon’s Theory
Lecturer: Meysam Alishahi. Design by: Z. Faraji and H. Hajiabolhassan
2
Plan
Introduction
Perfect secrecy
Shannon’s theorem
Entropy
Spurious Keys and Unicity Distance
Product Cryptosystems
3
Claude Shannon (Shannon 1949)
In this chapter, we discuss several of Shannon’s ideas about “secrecy systems”.
4
What’s the meaning of security?
Anyone can lock it; the key is needed to unlock it.
5
Security: We define some of the most useful criteria:
Computational security
Provable security
Unconditional security
6
Elementary Probability Theory
Basic properties:
Pr: 2^Ω → [0, 1]
Pr(Ω) = 1
For pairwise disjoint events: Pr(∪i Ai) = ∑i Pr(Ai)
(The axiomatic definition of probability takes the above three conditions as axioms.)
Immediate consequences:
Pr(∅) = 0
Pr(Aᶜ) = 1 − Pr(A)
A ⊆ B ⇒ Pr(A) ≤ Pr(B)
∑a∈Ω Pr(a) = 1
7
Random Variables
8
Elementary Probability Theory
Let X and Y be discrete random variables and pr a probability distribution function. Then pr(x, y) = pr(X=x | Y=y) · pr(Y=y). Set pr(x) := pr(X=x). X and Y are said to be independent if pr(x, y) = pr(x) pr(y) for all possible values x of X and y of Y.
9
Chain rule and Bayes’ Theorem
Bayes’ Theorem: if Pr(B) > 0, then Pr(A|B) = Pr(B|A) Pr(A) / Pr(B).
Chain rule: Pr(A1, …, An) = Pr(A1) Pr(A2|A1) Pr(A3|A1, A2) ··· Pr(An|A1, …, An−1)
10
Cryptography: Elementary Definitions
C(k) = { ek(x) : x ∈ P }
pr(Y=y) = ∑{k : y∈C(k)} pr(K=k) · pr(X=dk(y))
pr(Y=y | X=x) = ∑{k : y=ek(x)} pr(K=k)
pr(X=x | Y=y) = [ pr(X=x) · ∑{k : y=ek(x)} pr(K=k) ] / [ ∑{k : y∈C(k)} pr(K=k) · pr(X=dk(y)) ]
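These formulas can be evaluated mechanically once the encryption rule, the key distribution, and the plaintext distribution are fixed. Below is a minimal Python sketch for a toy cryptosystem; the cipher and all probability values are illustrative assumptions, not taken from the slides. With only two keys the ciphertext leaks information, so pr(X=x | Y=y) differs from pr(X=x).

```python
# Sketch: pr(Y=y), pr(Y=y|X=x) and pr(X=x|Y=y) for a toy cryptosystem.
# The cipher and all probability values below are illustrative assumptions.
P = [0, 1, 2, 3]                                # plaintext space = ciphertext space
K = [0, 1]                                      # key space (deliberately too small)
pr_X = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}         # plaintext distribution (assumed)
pr_K = {k: 0.5 for k in K}                      # keys chosen uniformly (assumed)

def e(k, x):                                    # toy encryption rule: shift mod 4
    return (x + k) % 4

def d(k, y):                                    # corresponding decryption rule
    return (y - k) % 4

# pr(Y=y) = sum over k with y in C(k) of pr(K=k) * pr(X = d_k(y));
# here every y lies in every C(k), so the sum runs over all keys.
pr_Y = {y: sum(pr_K[k] * pr_X[d(k, y)] for k in K) for y in P}

def pr_Y_given_X(y, x):                         # sum over keys with e_k(x) = y
    return sum(pr_K[k] for k in K if e(k, x) == y)

def pr_X_given_Y(x, y):                         # Bayes' theorem
    return pr_X[x] * pr_Y_given_X(y, x) / pr_Y[y]

for y in P:
    print(y, round(pr_Y[y], 3), [round(pr_X_given_Y(x, y), 3) for x in P])
```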
11
Plan
Introduction
Perfect secrecy
Shannon’s theorem
Entropy
Spurious Keys and Unicity Distance
Product Cryptosystems
12
Perfect Secrecy. A cryptosystem has perfect secrecy if pr(x|y) = pr(x) for all x ∈ P and y ∈ C.
13
Shift Cipher
Suppose the 26 keys in the Shift Cipher are used with equal probability 1/26. Then, for any plaintext probability distribution, the Shift Cipher has perfect secrecy.
14
Shift Cipher proof: We know
pr(Y=y) = ∑k∈Z26 pr(K=k) · pr(X=dk(y)) = (1/26) ∑k∈Z26 pr(X = y − k).
On the other hand,
∑k∈Z26 pr(X = y − k) = ∑x∈Z26 pr(X = x) = 1,
so pr(y) = 1/26.
15
Shift Cipher proof: We have that
pr(y|x) = pr(K = (y − x) mod 26) = 1/26.
Finally, by Bayes’ theorem,
pr(x|y) = pr(x) pr(y|x) / pr(y) = pr(x) · (1/26) / (1/26) = pr(x).
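A quick numerical confirmation of this argument; the plaintext distribution below is an arbitrary illustrative choice, and the keys are uniform over Z26 as in the theorem.

```python
# Sketch: verify pr(x|y) = pr(x) for the Shift Cipher with uniform keys.
# The plaintext distribution below is an arbitrary illustrative choice.
import random

random.seed(0)
weights = [random.random() for _ in range(26)]
total = sum(weights)
pr_X = [w / total for w in weights]        # arbitrary plaintext distribution
pr_K = [1 / 26] * 26                       # uniform key distribution

# pr(y) = sum_k pr(K=k) * pr(X = (y - k) mod 26)
def pr_Y(y):
    return sum(pr_K[k] * pr_X[(y - k) % 26] for k in range(26))

# pr(x|y) = pr(x) * pr(y|x) / pr(y), with pr(y|x) = pr(K = (y - x) mod 26)
def pr_X_given_Y(x, y):
    return pr_X[x] * pr_K[(y - x) % 26] / pr_Y(y)

assert all(abs(pr_Y(y) - 1 / 26) < 1e-12 for y in range(26))
assert all(abs(pr_X_given_Y(x, y) - pr_X[x]) < 1e-12
           for x in range(26) for y in range(26))
print("Shift Cipher has perfect secrecy for this plaintext distribution.")
```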
16
Perfect Secrecy. If Pr(x0) = 0 for some x0 ∈ P, then Pr[x0|y] = Pr[x0] = 0 for all y ∈ C. So we need only consider x ∈ P with Pr(x) > 0. By Bayes’ theorem, Pr[x|y] = Pr[x] for all y is equivalent to Pr[y|x] = Pr[y] for all y. It is a reasonable assumption that Pr[y] > 0 for all y ∈ C (a ciphertext that never occurs can be dropped from C).
17
Perfect Secrecy
When we have a cryptosystem with perfect secrecy: |K| ≥ |C| ≥ |P|.
18
Shannon’s Theorem. Suppose (P, C, K, E, D) is a cryptosystem where |K| = |C| = |P|. The cryptosystem provides perfect secrecy if and only if every key is used with equal probability 1/|K|, and for all x ∈ P, y ∈ C there is a unique key k such that ek(x) = y.
19
Proof. Perfect secrecy forces every ciphertext to be reachable from every plaintext, so |C| = |{ek(x) : k ∈ K}| ≤ |K|; since |K| = |C|, for all x ∈ P and y ∈ C there is a unique key k such that ek(x) = y.
Assume that P = {x1, …, xn} and fix y ∈ C; let ki be the unique key with eki(xi) = y. By Bayes’ theorem,
pr(xi|y) = pr(xi) pr(y|xi) / pr(y) = pr(xi) pr(K=ki) / pr(y).
By the perfect secrecy condition pr(xi|y) = pr(xi), so pr(K=ki) = pr(y) for every i; hence all keys are used with equal probability 1/|K|.
20
Proof (continued). Recall that C(k) = { ek(x) : x ∈ P } and
pr(Y=y) = ∑{k : y∈C(k)} pr(K=k) · pr(X=dk(y)),
pr(Y=y | X=x) = ∑{k : y=ek(x)} pr(K=k),
pr(X=x | Y=y) = [ pr(X=x) · ∑{k : y=ek(x)} pr(K=k) ] / [ ∑{k : y∈C(k)} pr(K=k) · pr(X=dk(y)) ].
21
A perfectly secret scheme: one-time pad
t – a parameter; K = P = C = {0,1}^t; encryption is component-wise xor.
Vernam’s cipher (Gilbert Vernam, 1890–1960):
ek(m) = k xor m
dk(c) = k xor c
Correctness is trivial: dk(ek(m)) = k xor (k xor m) = m.
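A minimal sketch of Vernam’s cipher on byte strings (so t = 8 × message length); using Python’s secrets module for key generation is an illustrative choice, not part of the slides.

```python
# Sketch: one-time pad over {0,1}^t, implemented on bytes (t = 8 * length).
import secrets

def keygen(length: int) -> bytes:
    return secrets.token_bytes(length)         # uniformly random key

def otp_xor(key: bytes, data: bytes) -> bytes:
    assert len(key) == len(data), "key must be as long as the message"
    return bytes(k ^ d for k, d in zip(key, data))

message = b"ATTACK AT DAWN"
key = keygen(len(message))
ciphertext = otp_xor(key, message)             # e_k(m) = k xor m
assert otp_xor(key, ciphertext) == message     # d_k(e_k(m)) = m
print(ciphertext.hex())
```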
22
Observation: The one-time pad can be generalized as follows.
Let (G, +) be a group and K = P = C = G. The following is a perfectly secret encryption scheme:
e(k, m) = m + k
d(k, c) = c − k
23
Why is the one-time pad not practical?
The key has to be as long as the message.
The key cannot be reused. This is because:
ek(m0) xor ek(m1) = (k xor m0) xor (k xor m1) = m0 xor m1.
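The key-reuse failure is easy to demonstrate: an eavesdropper who sees two ciphertexts produced under the same key recovers m0 xor m1 without ever learning the key. The messages below are illustrative.

```python
# Sketch: reusing a one-time-pad key leaks m0 xor m1 to an eavesdropper.
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m0, m1 = b"RETREAT AT NINE", b"ATTACK AT DAWN!"
k = secrets.token_bytes(len(m0))       # the SAME key used twice (the mistake)
c0, c1 = xor(k, m0), xor(k, m1)
assert xor(c0, c1) == xor(m0, m1)      # key cancels: c0 xor c1 = m0 xor m1
print(xor(c0, c1))
```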
24
Practicality?
Generally, the one-time pad is not very practical, since:
the key has to be as long as the total length of the encrypted messages,
it is hard to generate truly random strings.
However, it is sometimes used (e.g., in military applications) because of the following advantages:
perfect secrecy,
short messages can be encrypted using pencil and paper.
(Pictured: a KGB one-time pad hidden in a walnut shell.)
In the 1960s the Americans and the Soviets established a hotline that was encrypted using the one-time pad. (An additional advantage: they didn’t need to share their secret encryption methods.)
25
Plan
Introduction
Perfect secrecy
Shannon’s theorem
Entropy
Spurious Keys and Unicity Distance
Product Cryptosystems
26
Information Theory and Entropy
Information theory tries to solve the problem of communicating as much data as possible over a noisy channel.
The measure of data is entropy.
Claude Shannon first demonstrated that reliable communication over a noisy channel is possible (this jump-started the digital age).
27
Knowledge and Information
Goal: Reasoning with incomplete information!
Problem 1: Description of a state of knowledge.
Problem 2: Updating probabilities when new information becomes available.
28
Entropy. Suppose we have a random variable X which takes on a finite set of values according to a probability distribution p(X). What is the information gained by an event which takes place according to this distribution? Equivalently, if the event has not (yet) taken place, what is the uncertainty about the outcome? This quantity is called the entropy of X and is denoted by H(X).
29
Entropy. Let X be a discrete random variable with probability distribution function p. The entropy of X is
H(X) = − ∑x p(x) log2 p(x) = ∑x p(x) log2(1/p(x)).
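The definition translates directly into a few lines of Python (an illustrative helper, not part of the slides); outcomes with probability 0 contribute nothing, following the convention 0 · log2(1/0) = 0.

```python
# Sketch: Shannon entropy of a finite probability distribution, in bits.
from math import log2

def entropy(probs):
    """H(X) = sum_x p(x) * log2(1/p(x)), skipping outcomes with p(x) = 0."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))     # 1.0 bit   (fair coin)
print(entropy([1.0]))          # 0.0 bits  (no uncertainty)
print(entropy([1/6] * 6))      # log2(6), about 2.585 bits (fair die)
```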
30
Entropy: how about n = 3? For n = 3 we have p1 + p2 + p3 = 1.
31
Entropy
Shannon entropy
Binary entropy formula
Differential entropy
32
Symbol Codes
A^N: all strings of length N over the alphabet A
A*: all strings of finite length
{0,1}³ = {000, 001, 010, …, 111}
{0,1}* = {0, 1, 00, 01, 10, 11, 000, 001, …}
An encoding of X is any mapping f: X → {0,1}*.
f(y): the codeword for y ∈ X
|f(y)|: the length of the codeword
33
Notation. We can extend the encoding f by defining
f(x1, …, xn) = f(x1) || … || f(xn), xi ∈ X
p(x1, …, xn) = p(x1) ··· p(xn)
Since f must be decodable, it should be injective.
34
Definitions. An encoding f is a prefix-free encoding if there do not exist x, y ∈ X with x ≠ y and z ∈ {0,1}* such that f(x) = f(y) || z.
L(f) is the weighted average codeword length of an encoding of X. We define
L(f) = ∑x∈X p(x) |f(x)|.
35
Our problem: We are going to find an injective encoding f that minimizes L(f).
36
Huffman’s Encoding. X = {a, b, c, d, e} with probabilities
a: 0.05, b: 0.10, c: 0.12, d: 0.13, e: 0.60.
Repeatedly merge the two smallest probabilities:
0.05 + 0.10 = 0.15, leaving 0.15, 0.12, 0.13, 0.60
0.12 + 0.13 = 0.25, leaving 0.15, 0.25, 0.60
0.15 + 0.25 = 0.40, leaving 0.40, 0.60
0.40 + 0.60 = 1
Reading the tree gives a = 000, b = 001, c = 010, d = 011, e = 1.
37
Huffman’s algorithm solves our problem…
Moreover, the encoding f produced by Huffman’s algorithm is prefix-free and H(X) ≤ L(f) ≤ H(X) + 1.
38
Huffman’s Encoding a=000 b=001 c=010 d=011 e=1
We can see that L(f) = 1.8 and H(X) ≈ 1.7402.
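A compact sketch of Huffman’s algorithm using Python’s heapq (illustrative, not from the slides); run on the distribution above it reproduces the codewords a=000, b=001, c=010, d=011, e=1, and hence L(f) = 1.8 and H(X) ≈ 1.7402.

```python
# Sketch of Huffman's algorithm; reproduces the example above.
import heapq
from math import log2

def huffman(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)        # two least probable subtrees
        p1, i1, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, i1, merged))
    return heap[0][2]

probs = {"a": 0.05, "b": 0.10, "c": 0.12, "d": 0.13, "e": 0.60}
code = huffman(probs)
L = sum(probs[s] * len(w) for s, w in code.items())
H = sum(p * log2(1 / p) for p in probs.values())
print(code)                      # prefix-free code with lengths 3, 3, 3, 3, 1
print(round(L, 4), round(H, 4))  # 1.8  1.7402
```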
39
Entropy of a Random Variable
40
Choosing Balls Randomly
8 balls: 4 red, 2 blue, 1 green, 1 purple. Draw one randomly.
What is the best sequence of questions? What is the average number of questions?
41
Choosing Balls Randomly
Best sequence of questions (8 balls: 4 red, 2 blue, 1 green, 1 purple):
Is it red? If yes, done after 1 question.
If no: Is it blue? If yes, done after 2 questions.
If no: Is it green? If yes, done after 3 questions; if no, it is purple (3 questions).
This is exactly a Huffman code!
42
Choosing Balls Randomly
Average number of questions:
P(red)·1 + P(blue)·2 + P(green)·3 + P(purple)·3 = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75.
Entropy = (1/2)log2 2 + (1/4)log2 4 + (1/8)log2 8 + (1/8)log2 8 = 1.75 bits.
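In Python, the agreement between the average number of questions and the entropy is a one-line check (illustrative):

```python
# Sketch: the ball-drawing example, average question count vs. entropy.
from math import log2

p = {"red": 4/8, "blue": 2/8, "green": 1/8, "purple": 1/8}
questions = {"red": 1, "blue": 2, "green": 3, "purple": 3}   # the strategy above

avg_questions = sum(p[c] * questions[c] for c in p)
H = sum(p[c] * log2(1 / p[c]) for c in p)
print(avg_questions, H)    # 1.75 1.75
```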
43
Entropy and Information
The amount of information about an event is closely related to its probability of occurrence: an event of probability p carries log2(1/p) bits of information. Entropy is the expected value of the information.
44
INFORMATION THEORY. Communication theory deals with systems for transmitting information from one point to another. Information theory was born with the discovery of the fundamental laws of data compression and transmission.
45
Convex Functions. A function f : R → R is convex if for all α, β ≥ 0 with α + β = 1, we have
f(αx + βy) ≤ α f(x) + β f(y) for all x, y ∈ R.
46
Strictly Convex Functions
A convex function f : R → R is strictly convex if for all α, β > 0 with α + β = 1 and all x, y ∈ R with x ≠ y, we have f(αx + βy) < α f(x) + β f(y).
47
Jensen’s Inequality
Lemma. Let f : R → R be a convex function, and let α1, α2, …, αn be nonnegative real numbers such that ∑k αk = 1. Then, for any real numbers x1, x2, …, xn, we have f(∑k αk xk) ≤ ∑k αk f(xk).
Lemma. Let f be a convex function, and let X be a random variable. Then f(E[X]) ≤ E[f(X)].
48
Entropy (Bounds) When H(X) = 0? Upper bound?
H(X) = 0 when the result of the experiment is necessarily known ahead of time (one outcome has probability 1).
Upper bound: for |X| = n we have H(X) ≤ log2 n; nothing can be more uncertain than the uniform distribution.
Entropy increases with message length!
49
Properties of Entropy. THEOREM. Suppose X is a random variable having probability distribution p1, p2, …, pn, where pi > 0 for 1 ≤ i ≤ n. Then H(X) ≤ log2 n, and equality holds if and only if pi = 1/n for all 1 ≤ i ≤ n.
50
Proof. We know
H(X) = − ∑1≤i≤n pi log2 pi = ∑1≤i≤n pi log2(1/pi).
Since log2 is concave (equivalently, −log2 is convex), Jensen’s inequality gives
H(X) ≤ log2 ∑1≤i≤n pi · (1/pi) = log2 n.
Equality occurs if and only if pi = 1/n for all 1 ≤ i ≤ n.
51
Joint Entropy. The joint entropy of a pair of discrete random variables X, Y is the amount of information needed on average to specify both their values: H(X, Y) = − ∑x ∑y p(x, y) log2 p(x, y).
52
Theorem. H(X, Y) ≤ H(X) + H(Y), and equality occurs if and only if X and Y are independent random variables.
Proof. Let p(X=xi) = pi, p(Y=yj) = qj, and p(X=xi, Y=yj) = rij for 1 ≤ i ≤ m, 1 ≤ j ≤ n. Then
∑i rij = qj and ∑j rij = pi.
53
Proof (continued).
H(X) + H(Y) = − ∑1≤i≤m pi log2 pi − ∑1≤j≤n qj log2 qj
= − ∑i ∑j rij log2 pi − ∑j ∑i rij log2 qj
= − ∑i ∑j rij log2(pi qj)   (*)
H(X, Y) = − ∑i ∑j rij log2 rij   (**)
Hence
H(X, Y) − H(X) − H(Y) = ∑i ∑j rij log2(1/rij) + ∑i ∑j rij log2(pi qj) = ∑i ∑j rij log2(pi qj / rij).
54
Proof (continued). Since log2 is concave, Jensen’s inequality gives
H(X, Y) − H(X) − H(Y) ≤ log2 ∑i ∑j pi qj = log2 1 = 0.
In Jensen’s inequality, equality occurs if and only if rij = pi qj, i.e., p(xi, yj) = p(xi) p(yj), i.e., X and Y are independent.
55
Conditional Entropy
H(X|A) = − ∑x p(X=x|A) log2 p(X=x|A)
H(Y|X) = − ∑x ∑y p(x) p(Y=y|X=x) log2 p(Y=y|X=x)
56
The Chain Rule
57
The Chain Rule
Theorem. H(X, Y) = H(X) + H(Y|X).
Proof. H(X) + H(Y|X) = − ∑i p(X=xi) log2 p(X=xi) + ∑i p(X=xi) H(Y|X=xi).
58
The Chain Rule (proof continued)
H(X) + H(Y|X) = − ∑i p(xi) log2 p(xi) + ∑i p(xi) H(Y|X=xi)
= − ∑i p(xi) log2 p(xi) − ∑i ∑j p(xi) p(yj|xi) log2 p(yj|xi)
= − ∑i p(xi) log2 p(xi) − ∑i ∑j p(xi, yj) log2 p(yj|xi)
= − ∑i ∑j p(xi, yj) log2 p(xi) − ∑i ∑j p(xi, yj) log2 p(yj|xi)
= − ∑i ∑j p(xi, yj) log2 p(xi, yj) = H(X, Y).
59
Corollary. H(X|Y) ≤ H(X), with equality if and only if X and Y are independent.
Proof. We know that H(X, Y) ≤ H(X) + H(Y) and, by the chain rule (with the roles of X and Y exchanged), H(X, Y) = H(Y) + H(X|Y). Hence H(X|Y) ≤ H(X).
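Both the chain rule and this corollary can be checked numerically for any joint distribution; the 2 × 3 joint table below is an arbitrary illustrative choice.

```python
# Sketch: numerical check of H(X,Y) = H(X) + H(Y|X) and H(X|Y) <= H(X).
from math import log2

# Arbitrary joint distribution r[i][j] = p(X=i, Y=j) (illustrative values).
r = [[0.10, 0.20, 0.05],
     [0.25, 0.15, 0.25]]

px = [sum(row) for row in r]                                # marginal of X
py = [sum(r[i][j] for i in range(2)) for j in range(3)]     # marginal of Y

def H(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

HXY = H([r[i][j] for i in range(2) for j in range(3)])
HY_given_X = sum(px[i] * H([r[i][j] / px[i] for j in range(3)]) for i in range(2))
HX_given_Y = sum(py[j] * H([r[i][j] / py[j] for i in range(2)]) for j in range(3))

assert abs(HXY - (H(px) + HY_given_X)) < 1e-12     # chain rule
assert HX_given_Y <= H(px) + 1e-12                 # conditioning cannot increase entropy
print(round(HXY, 4), round(H(px) + HY_given_X, 4))
```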
60
Counterfeit Coin
We have 12 similar-looking coins. One of them is forged; the forged coin is heavier or lighter than the others. Find the minimum number of weighings on a balance scale needed to identify the forged coin!
61
Counterfeit Coin
62
Oh No!
63
Counterfeit Coin
64
Counterfeit Coin. The answer is 3; find a strategy! Lower bound:
Consider a random ordering of the coins. The random variable X gives the position of the forged coin and specifies whether it is lighter or heavier, so X has 24 equally likely values and H(X) = log2 24.
Let the random variables Y1, Y2, …, Yn be the outcomes of the weighings in a best strategy. The strategy determines the coin, so H(X | Y1, …, Yn) = 0.
Now
H(X | Y1, …, Yn) = H(X, Y1, …, Yn) − H(Y1, …, Yn) ≥ H(X) − H(Y1) − H(Y2|Y1) − ⋯
Since each weighing has three outcomes, H(Yi | Y1, …, Yi−1) ≤ log2 3. Hence 0 ≥ log2 24 − n log2 3, so n ≥ log 24 / log 3 ≈ 2.89, i.e., at least 3 weighings are needed.
65
Plan
Introduction
Perfect secrecy
Shannon’s theorem
Entropy
Spurious Keys and Unicity Distance
Product Cryptosystems
66
Theorem. Let (P, C, K, E, D) be a cryptosystem. Then H(K|C) = H(K) + H(P) − H(C).
67
Proof. We have H(K, P, C) = H(C | K, P) + H(K, P). We know H(C | K, P) = 0 (the ciphertext is determined by the key and plaintext), so H(K, P, C) = H(K, P).
K and P are independent random variables, hence H(K, P) = H(K) + H(P), so H(K, P, C) = H(K) + H(P).
In a similar fashion H(P | K, C) = 0, hence H(K, P, C) = H(K, C).
Therefore
H(K|C) = H(K, C) − H(C) = H(K, P, C) − H(C) = H(K) + H(P) − H(C).
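The identity can be verified numerically on a small example; the sketch below uses a shift cipher mod 4 with illustrative (assumed) plaintext and key distributions.

```python
# Sketch: check H(K|C) = H(K) + H(P) - H(C) for a toy shift cipher mod 4.
# The plaintext and key distributions below are illustrative assumptions.
from math import log2

pr_X = {0: 0.5, 1: 0.25, 2: 0.15, 3: 0.10}     # plaintext distribution (assumed)
pr_K = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}        # key distribution (assumed)

def H(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Joint distribution of (K, C): p(k, c) = pr(K=k) * pr(X = d_k(c)), d_k(c) = c - k mod 4.
joint = {(k, c): pr_K[k] * pr_X[(c - k) % 4] for k in range(4) for c in range(4)}
pr_C = {c: sum(joint[(k, c)] for k in range(4)) for c in range(4)}

HK, HP, HC = H(pr_K.values()), H(pr_X.values()), H(pr_C.values())
HK_given_C = sum(pr_C[c] * H([joint[(k, c)] / pr_C[c] for k in range(4)])
                 for c in range(4))

print(round(HK_given_C, 6), round(HK + HP - HC, 6))   # the two values agree
```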
68
Unicity Distance. Assume that in a given cryptosystem a message is a string x1, x2, …, xn, where each xi ∈ P (xi is a letter or block). Encrypting each xi individually with the same key k gives yi = ek(xi), 1 ≤ i ≤ n. How many ciphertext blocks yi do we need to determine k?
69
Defining a Language
L: the set of all messages of the natural language, for n ≥ 1.
P²: pairs (x1, x2) with x1, x2 ∈ P; Pⁿ: n-tuples (x1, x2, …, xn) with xi ∈ P, so the length-n strings of L sit inside Pⁿ.
Each Pⁿ inherits a probability distribution from L (digrams, trigrams, …), so H(Pⁿ) makes sense.
70
Entropy and Redundancy of a Language
What is the entropy of a language? HL = limn→∞ H(Pⁿ)/n, the average entropy per letter.
What is the redundancy of a language? RL = 1 − HL / log2 |P|.
71
English Language
Empirically, 1 ≤ HL ≤ 1.5 for English; H(P) = 4.18 and H(P²) = 3.90.
RL = 1 − HL / log2 26 ≈ 75% (the exact value depends on the estimate of HL).
72
Definition. K(y) = { k ∈ K : ∃ x ∈ Pⁿ with p(x) > 0 and ek(x) = y }, the set of keys consistent with the ciphertext y.
The average number of spurious keys is
sn = ∑y∈Cⁿ p(y) (|K(y)| − 1) = ∑y∈Cⁿ p(y) |K(y)| − 1.
73
Theorem. Suppose (P, C, K, E, D) is a cryptosystem where |P| = |C| and keys are chosen equiprobably. Let RL denote the redundancy of the underlying language. Then, given a string of ciphertext of length n, where n is sufficiently large, the expected number of spurious keys sn satisfies
sn ≥ |K| / |P|^(n·RL) − 1.
74
Proof. By the last theorem, H(K|Cⁿ) = H(K) + H(Pⁿ) − H(Cⁿ).
We have H(Pⁿ) ≈ n·HL = n (1 − RL) log2 |P|, and certainly H(Cⁿ) ≤ n log2 |C|.
If |P| = |C|, then
H(K|Cⁿ) ≥ H(K) − n·RL log2 |P|.   (1)
75
Proof (continued). On the other hand,
H(K|Cⁿ) = ∑y∈Cⁿ p(y) H(K|y) ≤ ∑y∈Cⁿ p(y) log2 |K(y)| ≤ log2 ∑y∈Cⁿ p(y) |K(y)| = log2(1 + sn).   (2)
Combining (1) and (2):
log2(1 + sn) ≥ H(K) − n·RL log2 |P|,
and since the keys are equiprobable, H(K) = log2 |K|, which gives sn ≥ |K| / |P|^(n·RL) − 1.
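Setting this bound to zero yields the usual unicity-distance estimate n0 ≈ log2|K| / (RL log2|P|). As a standard illustration (not worked in the slides), the sketch below evaluates both for the Substitution Cipher with 26! keys, assuming RL ≈ 0.75 for English as above.

```python
# Sketch: spurious-key lower bound and unicity-distance estimate for the
# Substitution Cipher (26! keys), assuming R_L = 0.75 for English.
from math import factorial, log2

K_size = factorial(26)          # |K| = 26!
P_size = 26                     # |P| = 26
R_L = 0.75                      # redundancy of English (as in the slides)

def spurious_key_bound(n):
    # s_n >= |K| / |P|**(n * R_L) - 1  (vacuous once it drops below 0)
    return K_size / P_size ** (n * R_L) - 1

unicity = log2(K_size) / (R_L * log2(P_size))
print(round(unicity, 1))        # roughly 25 ciphertext characters
for n in (10, 25, 50):
    print(n, spurious_key_bound(n))
```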
76
Plan
Introduction
Perfect secrecy
Shannon’s theorem
Entropy
Spurious Keys and Unicity Distance
Product Cryptosystems
77
Product Cryptosystems
If P = C, the cryptosystem is called endomorphic.
S1 = (P, P, K1, E1, D1): an endomorphic cryptosystem
S2 = (P, P, K2, E2, D2): an endomorphic cryptosystem
We define the product cryptosystem S1 × S2 to be (P, P, K1 × K2, E1 × E2, D1 × D2).
78
Product Cryptosystems
In the S1 × S2 product cryptosystem, we have
e(k1,k2)(x) = ek2(ek1(x))
d(k1,k2)(y) = dk1(dk2(y))
d(k1,k2)(e(k1,k2)(x)) = dk1(dk2(ek2(ek1(x)))) = dk1(ek1(x)) = x.
79
Multiplicative cipher
P = C = Z26, K = { a ∈ Z26 : gcd(a, 26) = 1 }, and for any a ∈ K we have
ea(x) = a·x (mod 26)
da(y) = a⁻¹·y (mod 26)   (x, y ∈ Z26)
80
Theorem. Let S denote the Shift Cipher and M the Multiplicative Cipher. Then the product S × M is the Affine Cipher.
81
Proof. Let M = (P, P, K1, E1, D1) and S = (P, P, K2, E2, D2), with a ∈ K1, k ∈ K2, x ∈ P. Then
e(k,a)(x) = a(x + k) mod 26 = (ax + ak) mod 26,
so the key (k, a) of S × M acts as the key (a, ak) of the Affine Cipher.
82
Each key is equiprobable
Proof (continued). On the other hand, if ak = k1 then k = a⁻¹k1. Hence the key (a, k1) of the Affine Cipher corresponds to the key (a⁻¹k1, a) of S × M, so S × M is the Affine Cipher. Since gcd(a, 26) = 1, this correspondence between keys is a bijection, and each key is equiprobable.
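A short Python illustration of the proof (the key values are arbitrary): composing the shift ek(x) = x + k with the multiplication ea(x) = ax reproduces the affine map x ↦ ax + ak mod 26, and decryption uses a⁻¹ mod 26.

```python
# Sketch: the product of the Shift and Multiplicative Ciphers is the Affine Cipher.
from math import gcd

def shift_enc(k, x):            # Shift Cipher: e_k(x) = x + k (mod 26)
    return (x + k) % 26

def mult_enc(a, x):             # Multiplicative Cipher: e_a(x) = a * x (mod 26)
    return (a * x) % 26

def affine_enc(a, b, x):        # Affine Cipher: e_(a,b)(x) = a * x + b (mod 26)
    return (a * x + b) % 26

k, a = 3, 7                     # arbitrary shift key and multiplicative key
assert gcd(a, 26) == 1
for x in range(26):
    # S x M applies the shift first, then the multiplication.
    assert mult_enc(a, shift_enc(k, x)) == affine_enc(a, (a * k) % 26, x)

# Decryption of the product uses a^{-1} mod 26 (computed here with pow).
a_inv = pow(a, -1, 26)
for x in range(26):
    y = mult_enc(a, shift_enc(k, x))
    assert shift_enc(-k, mult_enc(a_inv, y)) == x
print("S x M key (k, a) = (3, 7) acts as Affine key (a, ak mod 26) = (7, 21)")
```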
83
Properties of product cryptosystems
Let S, S1, S2 be cryptosystems.
If S1 × S2 = S2 × S1, we say that S1 and S2 commute.
If S × S = S (and hence Sⁿ = S for every n), S is called an idempotent cryptosystem.
84
The End