Lecture 2 Shannon’s Theory


1 Lecture 2 Shannon’s Theory
Lecturer: Meysam Alishahi. Design by: Z. Faraji and H. Hajiabolhassan.

2 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

3 Claude Shannon (Shannon 1949)
In this chapter, we discuss several of Shannon's ideas about "secrecy systems".

4 What’s the meaning of security?
Anyone can lock it; the key is needed to unlock it.

5 Security. We define some of the most useful criteria:
Computational security, provable security, unconditional security.

6 Elementary Probability Theory
Basic properties: Pr: 2^Ω → [0,1]; Pr(Ω) = 1; for pairwise disjoint events, Pr(∪i Ai) = Σi Pr(Ai). [Axiomatic definition of probability: take the above three conditions as axioms.] Immediate consequences: Pr(∅) = 0, Pr(Aᶜ) = 1 − Pr(A), A ⊆ B ⇒ Pr(A) ≤ Pr(B), Σ_{a∈Ω} Pr(a) = 1.

7 Random Variables

8 Elementary Probability Theory
Let X and Y be discrete random variables and pr their probability distribution function. Then pr(x, y) = pr(X=x | Y=y)·pr(Y=y). Set pr(x) := pr(X=x). X and Y are said to be independent if pr(x, y) = pr(x)·pr(y) for all possible values x of X and y of Y.

9 Chain rule and Bayes' Theorem
Chain rule: Pr(A1, ..., An) = Pr(A1)·Pr(A2|A1)·Pr(A3|A1,A2) ··· Pr(An|A1, ..., An−1).
Bayes' Theorem: if Pr(B) > 0, then Pr(A|B) = Pr(A)·Pr(B|A) / Pr(B).

10 Cryptography: Elementary Definitions
C(k) = {e_k(x) | x ∈ P}
pr(Y=y) = Σ_{k : y ∈ C(k)} pr(K=k)·pr(X=d_k(y))
pr(Y=y | X=x) = Σ_{k : y = e_k(x)} pr(K=k)
pr(X=x | Y=y) = pr(X=x) · Σ_{k : y = e_k(x)} pr(K=k) / Σ_{k : y ∈ C(k)} pr(K=k)·pr(X=d_k(y))
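To make these formulas concrete, here is a minimal Python sketch (not from the slides) that computes pr(Y=y) and the posterior pr(X=x|Y=y) for a hypothetical toy endomorphic cryptosystem: a shift cipher over Z_4 with assumed, deliberately non-uniform key and plaintext distributions. The names enc, dec, pr_K, pr_X and posterior are illustrative only.

```python
from collections import defaultdict

P = K = C = range(4)                       # toy cryptosystem over Z_4 (assumed)

def enc(k, x):                             # e_k(x) = x + k (mod 4)
    return (x + k) % 4

def dec(k, y):                             # d_k(y) = y - k (mod 4)
    return (y - k) % 4

pr_K = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}    # assumed key distribution
pr_X = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}    # assumed plaintext distribution

# pr(Y=y) = sum over keys k with y in C(k) of pr(K=k) * pr(X=d_k(y))
pr_Y = defaultdict(float)
for k in K:
    for x in P:
        pr_Y[enc(k, x)] += pr_K[k] * pr_X[x]

def posterior(x, y):
    """pr(X=x | Y=y) via Bayes: pr(x) * sum_{k: e_k(x)=y} pr(K=k) / pr(y)."""
    pr_y_given_x = sum(pr_K[k] for k in K if enc(k, x) == y)
    return pr_X[x] * pr_y_given_x / pr_Y[y]

for y in C:
    print(y, [round(posterior(x, y), 3) for x in P])
```

With this non-uniform key distribution the posterior differs from pr(x), so the toy system does not have perfect secrecy; with a uniform key distribution it would.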

11 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

12 Perfect Secrecy. A cryptosystem has perfect secrecy if pr(x|y) = pr(x) for all x ∈ P, y ∈ C.

13 Shift Cipher
Suppose the 26 keys in the Shift Cipher are used with equal probability 1/26. Then, for any plaintext probability distribution, the Shift Cipher has perfect secrecy.

14 Shift Cipher proof: We know pr(Y=y) = Σ_{k∈Z26} pr(K=k)·pr(X=d_k(y)) = (1/26)·Σ_{k∈Z26} pr(X = y − k). On the other hand, Σ_{k∈Z26} pr(X = y − k) = Σ_{x∈Z26} pr(X = x) = 1, so pr(y) = 1/26.

15 Shift Cipher proof: We also have pr(y|x) = pr(K = (y − x) mod 26) = 1/26. Finally, by Bayes' theorem, pr(x|y) = pr(y|x)·pr(x)/pr(y) = pr(x).

16 Perfect Secrecy. If pr(x0) = 0 for some x0 ∈ P, then pr(x0|y) = pr(x0) = 0 for all y ∈ C, so we need only consider x ∈ P with pr(x) > 0. For such x, "pr(x|y) = pr(x) for all y" is equivalent to "pr(y|x) = pr(y) for all y". Reasonable assumption: pr(y) > 0 for all y ∈ C.

17 Perfect Secrecy
In a cryptosystem with perfect secrecy: |K| ≥ |C| and |C| ≥ |P|.

18 Shannon's Theorem. Suppose (P, C, K, E, D) is a cryptosystem where |K| = |C| = |P|. The cryptosystem provides perfect secrecy if and only if every key is used with equal probability 1/|K|, and for all x ∈ P, y ∈ C there is a unique key k such that e_k(x) = y.

19 Proof. By perfect secrecy, for each x ∈ P every y ∈ C must be reachable (otherwise pr(y|x) = 0 ≠ pr(y)), so |C| = |{e_k(x) | k ∈ K}| = |K|, and for all x ∈ P, y ∈ C there is a unique key k_i such that e_{k_i}(x) = y. Assume P = {x_1, ..., x_n}. By Bayes' theorem, pr(x_i|y) = pr(x_i)·pr(y|x_i) / pr(y) = pr(x_i)·pr(K=k_i) / pr(y). By the perfect secrecy condition, pr(K=k_i) = pr(y) for every i, so all keys are used with equal probability.

20 Proof. Recall that C(k) = {e_k(x) | x ∈ P} and
pr(Y=y) = Σ_{k : y ∈ C(k)} pr(K=k)·pr(X=d_k(y))
pr(Y=y | X=x) = Σ_{k : y = e_k(x)} pr(K=k)
pr(X=x | Y=y) = pr(X=x) · Σ_{k : y = e_k(x)} pr(K=k) / Σ_{k : y ∈ C(k)} pr(K=k)·pr(X=d_k(y))

21 A perfectly secret scheme: the one-time pad
t: a parameter; K = P = C = {0,1}^t; encryption is component-wise xor. Gilbert Vernam (1890–1960). Vernam's cipher: e_k(m) = k xor m, d_k(c) = k xor c. Correctness is trivial: d_k(e_k(m)) = k xor (k xor m) = m.
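A minimal Python sketch of Vernam's cipher over byte strings, assuming a truly random key of the same length as the message; the function names are illustrative.

```python
import secrets

def otp_encrypt(key: bytes, msg: bytes) -> bytes:
    assert len(key) == len(msg)             # key as long as the message
    return bytes(k ^ m for k, m in zip(key, msg))

otp_decrypt = otp_encrypt                   # decryption is the same xor

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))         # fresh random key, used once
ct = otp_encrypt(key, msg)
assert otp_decrypt(key, ct) == msg          # d_k(e_k(m)) = k xor (k xor m) = m
```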

22 Observation. The one-time pad can be generalized as follows.
Let (G, +) be a group and K = P = C = G. The following is a perfectly secret encryption scheme: e(k, m) = m + k, d(k, c) = c − k.

23 Why is the one-time pad not practical?
The key has to be as long as the message, and the key cannot be reused. This is because e_k(m0) xor e_k(m1) = (k xor m0) xor (k xor m1) = m0 xor m1.
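The key-reuse problem can be seen directly in code. This sketch (with illustrative messages) xors two ciphertexts produced under the same key and recovers m0 xor m1.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m0 = b"attack at dawn"
m1 = b"retreat now!!!"                      # same length as m0
k = secrets.token_bytes(len(m0))            # the *same* key used twice
c0, c1 = xor(k, m0), xor(k, m1)

assert xor(c0, c1) == xor(m0, m1)           # (k^m0) ^ (k^m1) = m0 ^ m1
```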

24 Practicality?
Generally, the one-time pad is not very practical, since the key has to be as long as the total length of the encrypted messages, and it is hard to generate truly random strings. However, it is sometimes used (e.g. in military applications) because of its advantages: perfect secrecy, and short messages can be encrypted using pencil and paper. (Pictured: a KGB one-time pad hidden in a walnut shell.) In the 1960s the Americans and the Soviets established a hotline that was encrypted using the one-time pad (an additional advantage: they didn't need to share their secret encryption methods).

25 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

26 Information Theory and Entropy
Information theory tries to solve the problem of communicating as much data as possible over a noisy channel. The measure of data is entropy. Claude Shannon first demonstrated that reliable communication over a noisy channel is possible (this jump-started the digital age).

27 Knowledge and Information
Goal: Reasoning with incomplete information! Problem 1: Description of a state of knowledge. Problem 2: Updating probabilities when new information becomes available.

28 Entropy. Suppose we have a random variable X which takes on a finite set of values. What is the information gained by an event which takes place according to the distribution p(X)? Equivalently, if the event has not (yet) taken place, what is the uncertainty about the outcome? This quantity is called the entropy of X and is denoted by H(X).

29 Entropy. Let X be a discrete random variable with probability distribution function p. The entropy of X is H(X) = −Σ_x p(x) log2 p(x).
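A small sketch of this definition in Python; the helper name entropy is illustrative.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a sequence of probabilities (zero terms ignored)."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))      # 1.0 bit: a fair coin
print(entropy([1.0, 0.0]))      # 0.0: the outcome is known in advance
print(entropy([0.25] * 4))      # 2.0 = log2(4): uniform over 4 values
```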

30 Entropy: how about n = 3? Here p1 + p2 + p3 = 1.

31 Entropy: Shannon entropy, the binary entropy formula, differential entropy.

32 Symbol Codes
A^N: all strings of length N; A*: all strings of finite length. {0,1}^3 = {000, 001, 010, ..., 111}; {0,1}* = {0, 1, 00, 01, 10, 11, 000, 001, ...}. An encoding of X is any mapping f: X → {0,1}*. f(y) is the codeword for y ∈ X, and |f(y)| is the length of the codeword.

33 Notation
We can extend the encoding f by defining f(x1, ..., xn) = f(x1)||...||f(xn) for xi ∈ X, and p(x1, ..., xn) = p(x1)···p(xn). Since f must be decodable, it should be injective.

34 Definitions
An encoding f is a prefix-free encoding if there do not exist distinct x, y ∈ X and z ∈ {0,1}* such that f(x) = f(y)||z. L(f) is the weighted average length of an encoding of X; we define L(f) = Σ_{x∈X} p(x)·|f(x)|.

35 Our problem. We are going to find an injective encoding f that minimizes L(f).

36 Huffman's Encoding
X = {a, b, c, d, e} with probabilities a: 0.05, b: 0.10, c: 0.12, d: 0.13, e: 0.60. Repeatedly merge the two least probable nodes: a + b → 0.15 (leaving 0.12, 0.13, 0.15, 0.60); c + d → 0.25 (leaving 0.15, 0.25, 0.60); 0.15 + 0.25 → 0.40 (leaving 0.40, 0.60); 0.40 + 0.60 → 1. Labelling the branches 0/1 gives the codewords a = 000, b = 001, c = 010, d = 011, e = 1.

37 Huffman’s algorithm solves our problem…
Moreover, the encoding f produced by Huffman's algorithm is prefix-free and H(X) ≤ L(f) ≤ H(X) + 1.

38 Huffman's Encoding: a = 000, b = 001, c = 010, d = 011, e = 1.
We can see that L(f) = 1.8 and H(X) ≈ 1.7402.
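The slide's example can be reproduced with a short heap-based sketch of Huffman's algorithm (the exact 0/1 labels may differ from the slide, but the code lengths, and hence L(f) = 1.8 and H(X) ≈ 1.7402, match).

```python
import heapq
from math import log2

def huffman(p):
    """p: dict symbol -> probability. Returns dict symbol -> codeword."""
    heap = [(prob, i, {sym: ""}) for i, (sym, prob) in enumerate(p.items())]
    heapq.heapify(heap)
    counter = len(heap)                        # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)        # two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

p = {"a": 0.05, "b": 0.10, "c": 0.12, "d": 0.13, "e": 0.60}
code = huffman(p)
L = sum(p[s] * len(code[s]) for s in p)        # weighted average length
H = -sum(q * log2(q) for q in p.values())      # entropy
print(code, round(L, 2), round(H, 4))          # L = 1.8, H ≈ 1.7402
```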

39 Entropy of a Random Variable

40 Choosing Balls Randomly
8 balls: 4 red, 2 blue, 1 green, 1 purple. Draw one at random. What is the best sequence of yes/no questions to identify its colour? What is the average number of questions?

41 Choosing Balls Randomly
Best set of questions (8 balls: 4 red, 2 blue, 1 green, 1 purple): ask "Red?"; if yes, 1 question. If no, ask "Blue?"; if yes, 2 questions. If no, ask "Green?"; if yes, 3 questions; if no, it is purple, also 3 questions. This is a Huffman code!

42 Choosing Balls Randomly
Average number of questions: P(red)·1 + P(blue)·2 + P(green)·3 + P(purple)·3 = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75. Entropy = −(1/2)log2(1/2) − (1/4)log2(1/4) − (1/8)log2(1/8) − (1/8)log2(1/8) = 1.75 bits.
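A quick check of this calculation (illustrative):

```python
from math import log2

p = {"red": 4/8, "blue": 2/8, "green": 1/8, "purple": 1/8}
questions = {"red": 1, "blue": 2, "green": 3, "purple": 3}

avg_questions = sum(p[b] * questions[b] for b in p)   # 1.75
H = -sum(q * log2(q) for q in p.values())             # 1.75 bits
print(avg_questions, H)
```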

43 Entropy and Information
The amount of information in an event is closely related to its probability of occurrence: an event of probability p carries log2(1/p) bits of information. Entropy is the expected value of this information.

44 INFORMATION THEORY. Communication theory deals with systems for transmitting information from one point to another. Information theory was born with the discovery of the fundamental laws of data compression and transmission.

45 Convex Functions
A function f: R → R is convex if for all α, β ≥ 0 with α + β = 1 we have f(αx + βy) ≤ α f(x) + β f(y) for all x, y ∈ R.

46 Strictly Convex Functions
A convex function f: R → R is strictly convex if for all α, β > 0 with α + β = 1 and all x ≠ y we have f(αx + βy) < α f(x) + β f(y).

47 Jensen's Inequality. Lemma. Let f: R → R be a convex function, and let α1, α2, ..., αn be nonnegative real numbers such that Σk αk = 1. Then, for any real numbers x1, x2, ..., xn, we have f(Σk αk xk) ≤ Σk αk f(xk). Lemma. Let f be a convex function, and let X be a random variable. Then f(E[X]) ≤ E[f(X)]. (For a concave function such as log2, the inequalities are reversed.)

48 Entropy (Bounds) When H(X) = 0? Upper bound?
H(X) = 0 if the result of the experiment is known ahead of time (necessarily some pi = 1). Upper bound: for |X| = n, H(X) ≤ log2 n; nothing can be more uncertain than the uniform distribution. Entropy increases with message length!

49 Properties of Entropy. THEOREM. Suppose X is a random variable having probability distribution p1, p2, ..., pn, where pi > 0 for 1 ≤ i ≤ n. Then H(X) ≤ log2 n, and equality holds if and only if pi = 1/n for all 1 ≤ i ≤ n.

50 Proof. We know H(X) = −Σ_{1≤i≤n} pi log2 pi = Σ_{1≤i≤n} pi log2(1/pi). Since log2 is concave, Jensen's inequality gives H(X) ≤ log2 Σ_{1≤i≤n} pi·(1/pi) = log2 n. Equality occurs if and only if pi = 1/n for 1 ≤ i ≤ n.

51 Joint Entropy. The joint entropy of a pair of discrete random variables X, Y is the amount of information needed on average to specify both their values.

52 Theorem. H(X,Y) ≤ H(X) + H(Y), and equality occurs if and only if X, Y are independent random variables.
Proof. Let p(X=xi) = pi, p(Y=yj) = qj, p(X=xi, Y=yj) = rij, for 1 ≤ i ≤ m, 1 ≤ j ≤ n. Then Σi rij = qj and Σj rij = pi.

53 Proof. H(X) + H(Y) = −Σi pi log2 pi − Σj qj log2 qj = −Σi Σj rij log2 pi − Σj Σi rij log2 qj = −Σi Σj rij log2(pi qj)  (*)
H(X,Y) = −Σi Σj rij log2 rij  (**)
H(X,Y) − H(X) − H(Y) = Σi Σj rij log2(1/rij) + Σi Σj rij log2(pi qj) = Σi Σj rij log2(pi qj / rij)

54 Proof. Since log2 is concave, by Jensen's inequality H(X,Y) − H(X) − H(Y) ≤ log2 Σi Σj rij·(pi qj / rij) = log2 Σi Σj pi qj = 0. In Jensen's inequality, equality occurs if and only if rij = pi qj, i.e. p(xi, yj) = p(xi)·p(yj) for all i, j.

55 Conditional Entropy
H(X|A) = −Σx p(X=x|A) log2 p(X=x|A)
H(Y|X) = −Σx Σy p(x)·p(Y=y|X=x) log2 p(Y=y|X=x)

56 The Chain Rule

57 The Chain Rule. Theorem. H(X,Y) = H(X) + H(Y|X).
Proof. H(X) + H(Y|X) = −Σi p(X=xi) log2 p(X=xi) + Σi p(X=xi)·H(Y|X=xi)

58 The Chain Rule
H(X) + H(Y|X) = −Σi p(X=xi) log2 p(X=xi) + Σi p(X=xi)·H(Y|X=xi)
= −Σi p(xi) log2 p(xi) − Σi Σj p(xi)·p(yj|xi) log2 p(yj|xi)
= −Σi p(xi) log2 p(xi) − Σi Σj p(xi, yj) log2 p(yj|xi)
= −Σi Σj p(xi, yj) log2 p(xi) − Σi Σj p(xi, yj) log2 p(yj|xi)
= −Σi Σj p(xi, yj) log2 p(xi, yj) = H(X,Y)

59 Corollary. H(X|Y) ≤ H(X), with equality if and only if X and Y are independent. Proof. We know that H(X,Y) ≤ H(X) + H(Y) and, by the chain rule (with the roles of X and Y exchanged), H(X,Y) = H(Y) + H(X|Y). Hence H(X|Y) ≤ H(X).
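A short numerical check (with an assumed joint distribution r) of the chain rule, subadditivity, and this corollary:

```python
from math import log2

r = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}   # assumed joint p(x,y)

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

xs = sorted({x for x, _ in r})
ys = sorted({y for _, y in r})
px = {x: sum(r[(x, y)] for y in ys) for x in xs}           # marginal of X
py = {y: sum(r[(x, y)] for x in xs) for y in ys}           # marginal of Y

HXY, HX, HY = H(r.values()), H(px.values()), H(py.values())
# H(Y|X) directly from the definition: -sum_{x,y} p(x,y) log2 p(y|x)
HY_given_X = -sum(r[(x, y)] * log2(r[(x, y)] / px[x]) for (x, y) in r if r[(x, y)] > 0)

print(round(HXY, 4), round(HX + HY_given_X, 4))   # chain rule: the two agree
print(HXY <= HX + HY)                             # subadditivity: True
print(HXY - HY <= HX)                             # H(X|Y) <= H(X): True
```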

60 Counterfeit Coin
We have 12 coins which look identical. One of them is forged, and the forged coin is heavier or lighter than the others. Find the minimum number of weighings on a balance scale needed to identify the forged coin!

61 Counterfeit Coin

62 Oh No!

63 Counterfeit Coin

64 Counterfeit Coin
The answer is 3; find a strategy! Lower bound: consider a random ordering of the coins. The random variable X gives the position of the forged coin and whether it is lighter or heavier. Let the random variables Y1, Y2, ..., Yk represent the outcomes of the weighings in the best strategy, so H(X | Y1, Y2, ..., Yk) = 0. Then 0 = H(X | Y1, ..., Yk) = H(X, Y1, ..., Yk) − H(Y1, ..., Yk) ≥ H(X) − H(Y1) − H(Y2|Y1) − ... − H(Yk | Y1, ..., Yk−1). Since H(X) = log2 24 and each H(Yi | Y1, ..., Yi−1) ≤ log2 3, we need k ≥ log2 24 / log2 3, i.e. at least 3 weighings.
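The numeric part of this lower bound:

```python
from math import log2, ceil

H_X = log2(12 * 2)        # 12 positions, each lighter or heavier: 24 outcomes
per_weighing = log2(3)    # each weighing has 3 outcomes: left, right, balanced
print(H_X / per_weighing, ceil(H_X / per_weighing))   # ~2.89 -> at least 3 weighings
```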

65 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

66 Theorem. Let (P, C, K, E, D) be a cryptosystem. Then H(K|C) = H(K) + H(P) − H(C).

67 Proof
We have H(K,P,C) = H(C|K,P) + H(K,P). We know H(C|K,P) = 0 (the ciphertext is determined by the key and the plaintext), so H(K,P,C) = H(K,P). Since K and P are independent, H(K,P) = H(K) + H(P), so H(K,P,C) = H(K) + H(P). In a similar fashion, H(P|K,C) = 0, hence H(K,P,C) = H(K,C). Therefore H(K|C) = H(K,C) − H(C) = H(K,P,C) − H(C) = H(K) + H(P) − H(C).
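A sketch checking this identity numerically on a hypothetical toy cipher (a shift over Z_3) with assumed key and plaintext distributions; all names are illustrative.

```python
from math import log2
from collections import defaultdict

pK = {0: 0.5, 1: 0.25, 2: 0.25}        # assumed key distribution
pP = {0: 0.6, 1: 0.3, 2: 0.1}          # assumed plaintext distribution
enc = lambda k, x: (x + k) % 3         # toy shift cipher over Z_3

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Joint distribution of (K, C), using independence of K and P.
pKC = defaultdict(float)
for k, qk in pK.items():
    for x, qx in pP.items():
        pKC[(k, enc(k, x))] += qk * qx

pC = defaultdict(float)
for (_, y), q in pKC.items():
    pC[y] += q

H_K_given_C = H(pKC.values()) - H(pC.values())            # H(K|C) = H(K,C) - H(C)
rhs = H(pK.values()) + H(pP.values()) - H(pC.values())    # H(K) + H(P) - H(C)
print(round(H_K_given_C, 4), round(rhs, 4))               # the two values agree
```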

68 Unicity Distance. Assume that in a given cryptosystem a message is a string x1, x2, ..., xn, where each xi is in P (xi is a letter or block). Each xi is encrypted individually with the same key k: yi = e_k(xi), 1 ≤ i ≤ n. How many ciphertext blocks yi do we need to determine k?

69 Defining a Language
L: the set of all meaningful messages of length n ≥ 1 ("the natural language"). P²: digrams (x1, x2), x1, x2 ∈ P. Pⁿ: n-grams (x1, x2, ..., xn), xi ∈ P. Each Pⁿ inherits a probability distribution from L (digrams, trigrams, ...), so H(Pⁿ) makes sense.

70 Entropy and Redundancy of a Language
What is the entropy of a language? H_L = lim_{n→∞} H(Pⁿ)/n. What is the redundancy of a language? R_L = 1 − H_L / log2|P|.

71 English Language
Empirically, 1 ≤ H_L ≤ 1.5 for English. H(P) ≈ 4.18 and H(P²)/2 ≈ 3.90. R_L = 1 − H_L / log2 26 is about 75%, depending on the value of H_L.
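For the quoted range of H_L, the redundancy works out roughly as follows (illustrative calculation):

```python
from math import log2

for H_L in (1.0, 1.25, 1.5):
    R_L = 1 - H_L / log2(26)
    print(H_L, round(R_L, 2))   # about 0.79, 0.73, 0.68, i.e. roughly 75%
```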

72 Definition
K(y) = {k ∈ K | ∃ x ∈ Pⁿ with p(x) > 0 and e_k(x) = y}, the set of keys consistent with the ciphertext y. The average number of spurious keys is s_n = Σ_{y∈Cⁿ} p(y)·(|K(y)| − 1) = Σ_{y∈Cⁿ} p(y)·|K(y)| − 1.

73 Theorem. Suppose (P, C, K, E, D) is a cryptosystem where |P| = |C| and keys are chosen equiprobably. Let R_L denote the redundancy of the underlying language. Then, given a string of ciphertext of length n, where n is sufficiently large, the expected number of spurious keys s_n satisfies s_n ≥ |K| / |P|^(n·R_L) − 1.

74 Proof. By the previous theorem, H(K|Cⁿ) = H(K) + H(Pⁿ) − H(Cⁿ). We have H(Pⁿ) ≈ n·H_L = n(1 − R_L)·log2|P|. Certainly H(Cⁿ) ≤ n·log2|C|. If |P| = |C|, then H(K|Cⁿ) ≥ H(K) − n·R_L·log2|P|.  (1)

75 Proof. Recall from (1) that H(K|Cⁿ) ≥ H(K) − n·R_L·log2|P|. On the other hand, H(K|Cⁿ) = Σ_{y∈Cⁿ} p(y)·H(K|y) ≤ Σ_{y∈Cⁿ} p(y)·log2|K(y)| ≤ log2 Σ_{y∈Cⁿ} p(y)·|K(y)| = log2(1 + s_n).  (2)
Combining (1) and (2): log2(1 + s_n) ≥ H(K) − n·R_L·log2|P|. Since the keys are equiprobable, H(K) = log2|K|, and the bound stated in the theorem follows.
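As an illustration of the bound (not from the slides), here is a sketch that plugs in the usual textbook numbers for the substitution cipher over a 26-letter alphabet with R_L ≈ 0.75; the unicity distance is the value of n at which the bound on s_n reaches 0.

```python
from math import log2, factorial

P_size = 26
K_size = factorial(26)        # substitution cipher: 26! keys (illustrative choice)
R_L = 0.75                    # assumed redundancy of English

def spurious_bound(n):
    """Lower bound |K| / |P|^(n*R_L) - 1 on the expected number of spurious keys."""
    return K_size / P_size ** (n * R_L) - 1

unicity = log2(K_size) / (R_L * log2(P_size))   # n at which the bound reaches 0
print(round(unicity, 1))                        # about 25 ciphertext letters
print(round(spurious_bound(25), 2))             # already below 1 at n = 25
```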

76 Plan: Introduction, Perfect secrecy, Shannon's theorem, Entropy, Spurious Keys and Unicity Distance, Product Cryptosystems

77 Product Cryptosystems
If P = C, the cryptosystem is called endomorphic. Let S1 = (P, P, K1, E1, D1) and S2 = (P, P, K2, E2, D2) be endomorphic cryptosystems. We define the product cryptosystem S1 × S2 to be (P, P, K1 × K2, E1 × E2, D1 × D2).

78 Product Cryptosystems
In the S1 × S2 product cryptosystem we have e_(k1,k2)(x) = e_k2(e_k1(x)) and d_(k1,k2)(y) = d_k1(d_k2(y)). Indeed, d_(k1,k2)(e_(k1,k2)(x)) = d_(k1,k2)(e_k2(e_k1(x))) = d_k1(e_k1(x)) = x.

79 Multiplicative cipher
P = C = Z26, K = {a ∈ Z26 | gcd(a, 26) = 1}, and for any a ∈ K we have e_a(x) = ax (mod 26) and d_a(y) = a⁻¹·y (mod 26), where x, y ∈ Z26.

80 Theorem. Let M be the Multiplicative cipher and S the Shift cipher. Then the product S × M is the Affine cipher.

81 Proof. Let M = (P, P, K1, E1, D1) and S = (P, P, K2, E2, D2), with a ∈ K1, k ∈ K2 and x ∈ P. Then e_(k,a)(x) = a(x + k) mod 26 = (ax + ak) mod 26, so the key (k, a) of S × M acts exactly like the key (a, ak) of the Affine cipher.

82 Proof (continued)
On the other hand, ak = k1 gives k = a⁻¹·k1, hence the key (a, k1) of the Affine cipher corresponds to the key (a⁻¹·k1, a) of S × M. Since gcd(a, 26) = 1, this correspondence is a bijection, so S × M is the Affine cipher and each of its keys is equiprobable.
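A brute-force check of the theorem on Z_26 (illustrative sketch): composing the Shift cipher (key k) with the Multiplicative cipher (key a) acts exactly like the Affine cipher with key (a, ak).

```python
from math import gcd

shift = lambda k, x: (x + k) % 26
mult = lambda a, x: (a * x) % 26
affine = lambda a, b, x: (a * x + b) % 26

for a in (m for m in range(26) if gcd(m, 26) == 1):
    for k in range(26):
        for x in range(26):
            assert mult(a, shift(k, x)) == affine(a, (a * k) % 26, x)
print("S x M coincides with the Affine cipher on Z_26")
```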

83 Properties of product cryptosystems
Let S, S1, S2 be cryptosystems. If S1 × S2 = S2 × S1, we say that S1 and S2 commute. If S × S = S (and hence Sⁿ = S for all n), S is an idempotent cryptosystem.

84 The End

