1 Cryptanalysis
Four kinds of attacks (recall)
The objective: determine the key (Kerckhoffs' principle)
Assumption: the plaintext is English text
Basic technique: frequency analysis, based on:
–Probabilities of occurrence of the 26 letters
–Common digrams and trigrams
2 Cryptanalysis: statistical analysis
Probabilities of occurrence of the 26 letters:
–E, with probability about 0.120 (12%)
–T, A, O, I, N, S, H, R, each between 0.06 and 0.09
–D, L, each around 0.04
–C, U, M, W, F, G, Y, P, B, each between 0.015 and 0.028
–V, K, J, X, Q, Z, each less than 0.01
–See Table 1.1
30 common digrams (in decreasing order):
–TH, HE, IN, ER, AN, RE, …
12 common trigrams (in decreasing order):
–THE, ING, AND, HER, ERE, …
3 Cryptanalysis of the Affine cipher
Suppose an attacker obtains the following Affine-cipher ciphertext:
–FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH
Cryptanalysis steps:
–Compute the frequency of each ciphertext letter: R: 8, D: 7, E, H, K: 5 each, F, V: 4 each (see Table 1.2, page 27)
–Guess two letter correspondences, solve the resulting equations for (a, b), decrypt the ciphertext, and judge whether the result is meaningful.
First guess: R → e, D → t, i.e., $e_K(4) = 17$, $e_K(19) = 3$
–Thus $4a + b \equiv 17 \pmod{26}$ and $19a + b \equiv 3 \pmod{26}$, giving $a = 6$, $b = 19$. Since $\gcd(6, 26) = 2 \neq 1$, this is not a valid key, so the guess is incorrect.
Next guess: R → e, E → t gives $a = 13$, which is not valid since $\gcd(13, 26) = 13$.
Guess again: R → e, H → t gives $a = 8$, which is again not valid since $\gcd(8, 26) = 2$.
Guess again: R → e, K → t gives $a = 3$, $b = 5$, which is a valid key.
–$K = (3, 5)$, $e_K(x) = (3x + 5) \bmod 26$, and $d_K(y) = (9y - 19) \bmod 26$.
–Decrypted text: algorithmsarequitegeneraldefinitionsofarithmeticprocesses
If the decrypted text is not meaningful, try another guess.
This needs programming: computing the frequencies and solving the equations.
Since the Affine cipher has only 12 × 26 = 312 keys, a program can also simply try all of them.
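The guess-and-check procedure on this slide can be sketched in Python. This is a minimal sketch rather than the course's own program: the helper names (`encrypt`, `decrypt`, `solve_key`, `guess_keys`) are hypothetical, letters are numbered a = 0, …, z = 25, and the ciphertext is regenerated from the slide's plaintext and key (3, 5) so that the block is self-contained. `pow(a, -1, 26)` assumes Python 3.8 or later.

```python
from collections import Counter
from math import gcd

def encrypt(plaintext, a, b):
    """Affine encryption e_K(x) = (a*x + b) mod 26; lowercase in, uppercase out."""
    return "".join(chr((a * (ord(c) - 97) + b) % 26 + 65) for c in plaintext)

def decrypt(ciphertext, a, b):
    """Affine decryption d_K(y) = a^{-1} * (y - b) mod 26."""
    a_inv = pow(a, -1, 26)  # modular inverse, Python 3.8+
    return "".join(chr((a_inv * (ord(c) - 65 - b)) % 26 + 97) for c in ciphertext)

def solve_key(x1, y1, x2, y2):
    """All valid keys (a, b) with a*x1 + b = y1 and a*x2 + b = y2 (mod 26)."""
    keys = []
    for a in range(26):
        b = (y1 - a * x1) % 26
        # A key is valid only if it fits both guesses and gcd(a, 26) = 1.
        if (a * x2 + b) % 26 == y2 and gcd(a, 26) == 1:
            keys.append((a, b))
    return keys

def guess_keys(ciphertext):
    """Guess most frequent letter -> e, then each other letter (by frequency) -> t."""
    ranked = [c for c, _ in Counter(ciphertext).most_common()]
    e_image = ord(ranked[0]) - 65
    for t_letter in ranked[1:]:
        yield from solve_key(4, e_image, 19, ord(t_letter) - 65)

ciphertext = encrypt("algorithmsarequitegeneraldefinitionsofarithmeticprocesses", 3, 5)
a, b = next(guess_keys(ciphertext))
print((a, b), decrypt(ciphertext, a, b))
```

The first candidate key still has to be judged by reading the decryption; if it is not meaningful, the next value from `guess_keys` can be tried, or all 312 keys can be enumerated directly.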
4 Cryptanalysis of the substitution cipher
The final goal is to find the plaintext letter corresponding to each ciphertext letter.
Ciphertext: Example 1.11, page 28
Steps:
–Compute the letter frequencies (see Table 1.3, page 29)
Guess Z → e. It is quite likely that C, D, F, J, M, R, Y are encryptions of (a subset of) t, a, o, i, n, s, h, r, but the exact correspondence is not yet known.
–Look at digrams, especially those of the form -Z or Z-
ZW occurs 4 times but WZ never occurs, so guess W → d (because ed is a common digram but de is not).
Continue guessing in the same way.
–Look at trigrams, especially THE, ING, AND, …
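The digram and trigram counts used in these steps can be tabulated with a few lines of Python. This is only an illustrative sketch: the function names `ngram_counts` and `digrams_with` are hypothetical, and the sample string is made up rather than taken from Example 1.11.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every overlapping n-gram (n = 1 letters, 2 digrams, 3 trigrams)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def digrams_with(text, letter):
    """Split digram counts into those ending in `letter` (-Z) and starting with it (Z-)."""
    counts = ngram_counts(text, 2)
    ending = {d: c for d, c in counts.items() if d[1] == letter}
    starting = {d: c for d, c in counts.items() if d[0] == letter}
    return ending, starting

sample = "ZWZWAZ"  # made-up ciphertext fragment
ending, starting = digrams_with(sample, "Z")
print(ending, starting)
```

Comparing `starting` and `ending` for the letter guessed to be e is what supports a guess such as W → d: a digram that is frequent in one direction and absent in the other.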
5 Cryptanalysis of the Vigenere cipher
The cryptanalysis of the Vigenere cipher is systematic and can be fully automated.
Step 1: determine the length m of the keyword
–Kasiski test and index of coincidence
Step 2: determine $K = (k_1, k_2, \ldots, k_m)$
–Determine each $k_i$ separately.
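6 Kasiski test: determine keyword length m
Observation: two identical plaintext segments are encrypted to the same ciphertext whenever they appear $\delta$ positions apart in the plaintext, where $\delta \equiv 0 \pmod{m}$. Conversely, two identical ciphertext segments (of length at least three, say) are likely to correspond to identical plaintext segments.
So search the ciphertext for pairs of identical segments and record the distances between their starting positions, $\delta_1, \delta_2, \ldots$; then m should divide every $\delta_i$, i.e., m divides $\gcd(\delta_1, \delta_2, \ldots)$.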
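The search described here can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, segments are fixed at length 3 by default, and distances are measured from the first occurrence of each repeated segment. Accidental repeats can pull the gcd down to 1, so the result is a hint rather than a proof.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def kasiski_distances(ciphertext, seg_len=3):
    """Map each repeated segment to the distances of its later occurrences from the first."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - seg_len + 1):
        positions[ciphertext[i:i + seg_len]].append(i)
    return {seg: [p - pos[0] for p in pos[1:]]
            for seg, pos in positions.items() if len(pos) > 1}

def keyword_length_hint(distances):
    """The gcd of the recorded distances; the keyword length m should divide it."""
    return reduce(gcd, distances)

print(kasiski_distances("ABCXXABCYYABC"))
print(keyword_length_hint([165, 235, 275, 285]))
```

The second call uses the four distances quoted for the segment CHR in the Vigenere example of these slides.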
7 Index of coincidence
Can be used to determine m, as well as to confirm the m found by the Kasiski test.
Definition: suppose $x = x_1 x_2 \ldots x_n$ is a string of length n. The index of coincidence of x, denoted $I_c(x)$, is the probability that two random elements of x are identical.
–Denote the frequencies of A, B, …, Z in x by $f_0, f_1, \ldots, f_{25}$
–$I_c(x) = \dfrac{\sum_{i=0}^{25} \binom{f_i}{2}}{\binom{n}{2}} = \dfrac{\sum_{i=0}^{25} f_i (f_i - 1)}{n(n-1)}$ (Formula IC)
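Formula IC translates directly into code. The sketch below assumes the input is a string of letters only; the function name `index_of_coincidence` is a hypothetical choice, and returning 0.0 for strings shorter than two characters is a convention adopted here to avoid dividing by zero.

```python
from collections import Counter

def index_of_coincidence(text):
    """I_c(x) = sum_i f_i (f_i - 1) / (n (n - 1)), with f_i the letter counts of x."""
    n = len(text)
    if n < 2:
        return 0.0  # convention for this sketch: undefined cases map to 0
    counts = Counter(text)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

print(index_of_coincidence("AABB"))
```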
8 Index of coincidence (cont.)
Suppose x is a string of English text, and denote the expected probabilities of occurrence of A, B, …, Z by $p_0, p_1, \ldots, p_{25}$, with values from Table 1.1. Then $I_c(x) \approx \sum_{i=0}^{25} p_i^2 = 0.065$ (since the probability that two random elements are both A is $p_0^2$, both B is $p_1^2$, …).
Question: if y is a ciphertext obtained by a shift cipher, what is $I_c(y)$?
Answer: it should be about 0.065, because the individual probabilities are permuted but $\sum_{i=0}^{25} p_i^2$ is unchanged.
Therefore, suppose $y = y_1 y_2 \ldots y_n$ is a ciphertext from the Vigenere cipher. For any given m, divide y into m substrings:
–$\mathbf{y}_1 = y_1 y_{m+1} y_{2m+1} \ldots$
–$\mathbf{y}_2 = y_2 y_{m+2} y_{2m+2} \ldots$
–…
–$\mathbf{y}_m = y_m y_{2m} y_{3m} \ldots$
If m is indeed the keyword length, then each $\mathbf{y}_i$ is a shift cipher and $I_c(\mathbf{y}_i)$ is about 0.065; otherwise, the substrings look much more random and $I_c(\mathbf{y}_i) \approx 26 (1/26)^2 = 1/26 \approx 0.038$.
9 Index of coincidence (cont.)
To verify a keyword length m, divide the ciphertext into m substrings and compute the index of coincidence of each substring by Formula IC.
If all IC values of the substrings are around 0.065, then m is likely the correct keyword length; otherwise it is not.
To use $I_c$ to determine the keyword length m: try m = 2, 3, … in turn until reaching an m for which all substrings have an IC value around 0.065.
Now, how to determine the keyword $K = (k_1, k_2, \ldots, k_m)$? Assume m is given.
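The search over m can be sketched as below. The names `substrings`, `substring_ics` and `find_keyword_length` are hypothetical, and the acceptance threshold of 0.06 is an assumption chosen to sit between the random value 0.038 and the English value 0.065; short ciphertexts may need a different cutoff or a manual look at the values.

```python
from collections import Counter

def index_of_coincidence(text):
    """Formula IC: sum_i f_i (f_i - 1) / (n (n - 1))."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(f * (f - 1) for f in Counter(text).values()) / (n * (n - 1))

def substrings(ciphertext, m):
    """Split y into y_1, ..., y_m, where y_i takes every m-th letter starting at i."""
    return [ciphertext[i::m] for i in range(m)]

def substring_ics(ciphertext, m):
    """IC value of each of the m substrings."""
    return [index_of_coincidence(s) for s in substrings(ciphertext, m)]

def find_keyword_length(ciphertext, max_m=20, threshold=0.06):
    """Return the first m = 2, 3, ... whose substrings all reach the (assumed) threshold."""
    for m in range(2, max_m + 1):
        if all(ic >= threshold for ic in substring_ics(ciphertext, m)):
            return m
    return None

print(substrings("ABCDEFG", 3))
```

A multiple of the true keyword length also passes this test, which is why the search runs upward and stops at the first success.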
10 Determine keyword $K = (k_1, k_2, \ldots, k_m)$
1. Determine each $k_i$ (from $\mathbf{y}_i$) independently.
2. Observation:
2.1 Let $f_0, f_1, \ldots, f_{25}$ denote the frequencies of A, B, …, Z in $\mathbf{y}_i$, and let $n' = n/m$ be the length of $\mathbf{y}_i$.
2.2 Then the probability distribution of the 26 letters in $\mathbf{y}_i$ is $\dfrac{f_0}{n'}, \ldots, \dfrac{f_{25}}{n'}$.
2.3 If the shift key is $k_i$, then $f_{0+k_i}$ (i.e., the frequency of the ciphertext letter A + $k_i$) is the frequency of a in the corresponding plaintext $\mathbf{x}_i$, …, and $f_{25+k_i}$ is the frequency of z in $\mathbf{x}_i$ (subscripts are computed modulo 26). Since $\mathbf{x}_i$ is normal English text, the distribution $\dfrac{f_{0+k_i}}{n'}, \ldots, \dfrac{f_{25+k_i}}{n'}$ should be "close to" the ideal distribution $p_0, \ldots, p_{25}$.
So: $\dfrac{f_{0+k_i}}{n'} p_0 + \cdots + \dfrac{f_{25+k_i}}{n'} p_{25} \approx p_0^2 + \cdots + p_{25}^2 = 0.065$
11 Determine keyword $K = (k_1, k_2, \ldots, k_m)$ (cont.)
Therefore, define: $M_g = \sum_{i=0}^{25} \dfrac{p_i f_{i+g}}{n'}$ (the subscript $i + g$ is taken modulo 26).
When $g = k_i$, $M_g$ will generally be around 0.065 (i.e., $\sum_{i=0}^{25} p_i^2$).
On the other hand, for any $g \neq k_i$, $\dfrac{f_{0+g}}{n'} p_0 + \cdots + \dfrac{f_{25+g}}{n'} p_{25}$ will not be close to 0.065; $M_g$ will be considerably smaller.
So let g run from 0 to 25 and compute $M_g$; if $M_g$ is around 0.065 for some g, then $k_i = g$.
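The computation of $M_g$ can be sketched in Python. Two things here are assumptions: the probability table `ENGLISH_P` holds commonly quoted approximate English letter frequencies standing in for Table 1.1, which is not reproduced in these slides, and the function names are hypothetical. The sketch picks the g with the largest $M_g$ instead of testing against 0.065, which is a small simplification of the rule on the slide.

```python
# Assumed approximate probabilities of A..Z in English text (stand-in for Table 1.1).
ENGLISH_P = [
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
    0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
    0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001,
]

def shift_scores(y_i):
    """Return [M_0, ..., M_25] with M_g = sum_i p_i * f_{(i+g) mod 26} / n'."""
    n_prime = len(y_i)
    f = [0] * 26
    for ch in y_i:
        f[ord(ch) - ord("A")] += 1
    return [sum(ENGLISH_P[i] * f[(i + g) % 26] for i in range(26)) / n_prime
            for g in range(26)]

def best_shift(y_i):
    """Choose k_i as the g that maximizes M_g."""
    scores = shift_scores(y_i)
    return max(range(26), key=scores.__getitem__)

# Toy substring: plaintext "eeeetttaao" shifted by 19 (e -> X, t -> M, a -> T, o -> H).
print(best_shift("XXXXMMMTTH"))
```

Running `best_shift` on each of the m substrings and converting the resulting shifts to letters (0 → A, …, 25 → Z) gives a candidate keyword.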
12 Cryptanalysis of the Vigenere cipher: example
Example 1.12, page 33.
–Use the Kasiski test to determine the keyword length
CHR appears five times, starting at positions 1, 166, 236, 276, 286. The distances from the first occurrence are 165, 235, 275, 285, whose gcd is 5, so m = 5.
–Use the index of coincidence to verify m = 5
Divide the ciphertext into $\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3, \mathbf{y}_4, \mathbf{y}_5$. Compute $f_0, f_1, \ldots, f_{25}$ for each $\mathbf{y}_i$ and then $I_c(\mathbf{y}_i)$, obtaining 0.063, 0.068, 0.069, 0.061, 0.072, so m = 5 is correct.
–Determine $k_i$ for i = 1, …, 5
Compute $M_g = \sum_{i=0}^{25} \dfrac{p_i f_{i+g}}{n'}$ for g = 0, 1, …, 25, and if $M_g \approx 0.065$, let $k_i = g$.
As a result, $k_1 = 9$, $k_2 = 0$, $k_3 = 13$, $k_4 = 4$, $k_5 = 19$, i.e., the keyword is JANET.
13 Cryptanalysis of the Hill cipher
Difficult to break with ciphertext only.
Easy to break when both plaintext and ciphertext are known (known-plaintext attack).
Suppose at least m distinct plaintext-ciphertext pairs are given:
$x_j = (x_{1,j}, x_{2,j}, \ldots, x_{m,j})$
$y_j = (y_{1,j}, y_{2,j}, \ldots, y_{m,j})$
Then define two m × m matrices X and Y whose j-th rows are $x_j$ and $y_j$, respectively.
Then $Y = XK$; if X is invertible modulo 26, then $K = X^{-1} Y$.
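14 Cryptanalysis of the Hill cipher: example
Suppose the plaintext is friday, the ciphertext is PQCFKU, and m = 2.
Then $e_K(f, r) = (P, Q)$ and $e_K(i, d) = (C, F)$. That is:
$\begin{pmatrix} 15 & 16 \\ 2 & 5 \end{pmatrix} = \begin{pmatrix} 5 & 17 \\ 8 & 3 \end{pmatrix} K$
$K = \begin{pmatrix} 5 & 17 \\ 8 & 3 \end{pmatrix}^{-1} \begin{pmatrix} 15 & 16 \\ 2 & 5 \end{pmatrix} = \begin{pmatrix} 9 & 1 \\ 2 & 15 \end{pmatrix} \begin{pmatrix} 15 & 16 \\ 2 & 5 \end{pmatrix} = \begin{pmatrix} 7 & 19 \\ 8 & 3 \end{pmatrix}$
Then use the third pair, i.e., (a, y) and (K, U), to verify K.
If m is unknown, try m = 2, 3, …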
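The friday / PQCFKU example can be checked with a short Python sketch. It is restricted to m = 2 and uses hypothetical helper names (`to_rows`, `inverse_2x2_mod26`, `matmul_mod26`, `recover_key_2x2`); letters are numbered a = 0, …, z = 25, and `pow(det, -1, 26)` assumes Python 3.8 or later (it raises `ValueError` when X is not invertible modulo 26).

```python
def to_rows(text, m=2):
    """Convert letters to numbers 0..25 and group them into rows of length m."""
    nums = [ord(ch) - ord("a") for ch in text.lower()]
    return [nums[i:i + m] for i in range(0, len(nums), m)]

def inverse_2x2_mod26(mat):
    """Inverse of a 2x2 matrix modulo 26 via the adjugate formula."""
    (a, b), (c, d) = mat
    det_inv = pow((a * d - b * c) % 26, -1, 26)  # ValueError if not invertible
    return [[(d * det_inv) % 26, (-b * det_inv) % 26],
            [(-c * det_inv) % 26, (a * det_inv) % 26]]

def matmul_mod26(p, q):
    """Matrix product p * q with entries reduced modulo 26."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) % 26
             for j in range(len(q[0]))] for i in range(len(p))]

def recover_key_2x2(plaintext, ciphertext):
    """K = X^{-1} Y, using the first two plaintext-ciphertext pairs as rows of X and Y."""
    x = to_rows(plaintext)[:2]
    y = to_rows(ciphertext)[:2]
    return matmul_mod26(inverse_2x2_mod26(x), y)

key = recover_key_2x2("friday", "PQCFKU")
print(key)
# Verify with the third pair: (a, y) should encrypt to (K, U).
print(matmul_mod26(to_rows("ay"), key) == to_rows("KU"))
```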
15 Cryptanalysis of the LFSR stream cipher
Vulnerable to a known-plaintext attack.
Suppose m, a plaintext binary string $x_1, x_2, \ldots, x_n$, and the corresponding ciphertext binary string $y_1, y_2, \ldots, y_n$ are known. As long as $n \geq 2m$, the key can be recovered:
–The keystream is $z_i = (x_i + y_i) \bmod 2$, for i = 1, 2, …, n.
–The initialization vector of K is then $(z_1, \ldots, z_m)$.
–It remains to determine the coefficients $(c_0, c_1, \ldots, c_{m-1})$ of K. Recall that $z_{i+m} = \sum_{j=0}^{m-1} c_j z_{i+j} \bmod 2$ for all $i \geq 1$, i.e.,
–$(z_{m+1}, z_{m+2}, \ldots, z_{2m}) = (c_0, c_1, \ldots, c_{m-1}) \begin{pmatrix} z_1 & z_2 & \cdots & z_m \\ z_2 & z_3 & \cdots & z_{m+1} \\ \vdots & \vdots & & \vdots \\ z_m & z_{m+1} & \cdots & z_{2m-1} \end{pmatrix}$
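16 Cryptanalysis of the LFSR stream cipher (cont.)
Therefore, with the inverse taken modulo 2:
–$(c_0, c_1, \ldots, c_{m-1}) = (z_{m+1}, z_{m+2}, \ldots, z_{2m}) \begin{pmatrix} z_1 & z_2 & \cdots & z_m \\ z_2 & z_3 & \cdots & z_{m+1} \\ \vdots & \vdots & & \vdots \\ z_m & z_{m+1} & \cdots & z_{2m-1} \end{pmatrix}^{-1}$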
17 Cryptanalysis of the LFSR stream cipher: example
Example 1.14, page 37. Suppose an LFSR with m = 5 and the following:
Ciphertext string: 101101011110010
Plaintext string: 011001111111000
Then the keystream is: 110100100001010
Therefore the initialization vector is: 11010
Using the next five keystream elements, 01000, set up the equation for the coefficients $(c_0, c_1, c_2, c_3, c_4)$ and solve it.
The result is $(c_0, c_1, c_2, c_3, c_4) = (1, 0, 0, 1, 0)$, i.e., $z_{i+5} = (z_i + z_{i+3}) \bmod 2$.
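The known-plaintext attack on the LFSR can be sketched in Python. Instead of inverting the matrix explicitly, this sketch solves the m recurrence equations $\sum_j c_j z_{i+j} = z_{i+m}$ by Gauss-Jordan elimination over GF(2), which is a swapped-in but equivalent technique. The function names `keystream_bits` and `recover_lfsr` are hypothetical.

```python
def keystream_bits(plaintext_bits, ciphertext_bits):
    """z_i = (x_i + y_i) mod 2, from two equal-length strings of '0'/'1'."""
    return [int(x) ^ int(y) for x, y in zip(plaintext_bits, ciphertext_bits)]

def recover_lfsr(z, m):
    """Solve for (c_0, ..., c_{m-1}) from the first 2m keystream bits z."""
    if len(z) < 2 * m:
        raise ValueError("need at least 2m keystream bits")
    # Row i encodes c_0*z_i + ... + c_{m-1}*z_{i+m-1} = z_{i+m} (augmented column last).
    rows = [z[i:i + m] + [z[i + m]] for i in range(m)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular modulo 2; m may be wrong")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col and rows[r][col]:
                # Adding rows modulo 2 is a bitwise XOR.
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[col])]
    return [rows[j][m] for j in range(m)]

# Keystream prefix whose last five bits are the 01000 quoted in the example.
z = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(z[:5], recover_lfsr(z, 5))
```

With this ten-bit prefix the sketch returns the coefficients (1, 0, 0, 1, 0), matching the recurrence $z_{i+5} = (z_i + z_{i+3}) \bmod 2$ stated in the example.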