Stream ciphers 2 Session 2
Contents
– PN generators with LFSRs
– Statistical testing of PN generator sequences
– Cryptanalysis of stream ciphers
PN generators with LFSRs
The computational complexity of the Berlekamp-Massey algorithm is quadratic in the length of the shortest LFSR capable of generating the intercepted sequence. Thus, if the linear complexity is very high, predicting the next bits of the sequence becomes computationally infeasible.
PN generators with LFSRs
The linear complexity achievable with a single LFSR is small: it is at most the length of the register. Therefore, to prevent the cryptanalysis of a pseudorandom sequence generator, we must design it so that its linear complexity is too high for the Berlekamp-Massey algorithm to be applied in practice.
PN generators with LFSRs
Since the output sequences of LFSRs have good statistical properties, it is a good idea to base PN generators on LFSRs. But to increase the linear complexity, we have to combine the outputs of several LFSRs in a non-linear manner, through non-linear Boolean functions.
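The weakness of a single LFSR can be illustrated with a minimal sketch: the fragment below generates bits with a small LFSR and recovers its linear complexity with the Berlekamp-Massey algorithm over GF(2). The function names `lfsr` and `berlekamp_massey`, the tap convention, and the example polynomial x^4 + x + 1 are illustrative assumptions, not taken from the slides.

```python
def lfsr(taps, state, n):
    """Generate n bits from an LFSR over GF(2).

    taps  -- positions in the state that are XORed to form the feedback bit
             (assumed convention: s[t+L] = XOR of s[t+i] for i in taps)
    state -- initial state (the key); state[0] is output first
    """
    state = list(state)
    out = []
    for _ in range(n):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out


def berlekamp_massey(s):
    """Return the linear complexity of the binary sequence s."""
    n = len(s)
    c = [0] * (n + 1)   # current connection polynomial
    b = [0] * (n + 1)   # connection polynomial before the last length change
    c[0] = b[0] = 1
    length, m = 0, -1
    for i in range(n):
        # discrepancy between s[i] and the bit predicted by the current LFSR
        d = s[i]
        for j in range(1, length + 1):
            d ^= c[j] & s[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length = i + 1 - length
                m = i
                b = previous
    return length


# Hypothetical 4-stage LFSR with characteristic polynomial x^4 + x + 1 (primitive)
sequence = lfsr(taps=[0, 1], state=[1, 0, 0, 0], n=30)
print(berlekamp_massey(sequence))      # linear complexity of the full sample
print(berlekamp_massey(sequence[:8]))  # already determined by 2*4 = 8 bits
```

In this sketch only eight intercepted bits are enough to recover the register length, which is why the generators discussed next combine several LFSRs non-linearly.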
Algebraic normal form
It is the form of a Boolean function that uses only the operations ⊕ (sum modulo 2, XOR) and · (product modulo 2, AND). In the ANF, the number of variables in the product that includes the largest number of variables is called the non-linear order of the function. Example: the non-linear order of the function $f(x_1,x_2,x_3) = x_1 \oplus x_1 x_3 \oplus x_2 x_3$ is 2.
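Algebraic normal form
The ANF of a Boolean function can be determined from its truth table by means of the Möbius transform: the coefficient of the product indexed by u is $a_u = \bigoplus_{x \preceq u} f(x)$, where $x \preceq u$ means that $x_i \le u_i$ for every i.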
Algebraic normal form
Example: n=3

x0 x1 x2 | f
 0  0  0 | 0
 0  0  1 | 1
 0  1  0 | 0
 0  1  1 | 1
 1  0  0 | 0
 1  0  1 | 1
 1  1  0 | 1
 1  1  1 | 0
Algebraic normal form
All sums are modulo 2.
u=000: $a_{000} = f(0,0,0) = 0$
u=001: $a_{001} = f(0,0,0) + f(0,0,1) = 0 + 1 = 1$
u=010: $a_{010} = f(0,0,0) + f(0,1,0) = 0 + 0 = 0$
u=011: $a_{011} = f(0,0,0) + f(0,0,1) + f(0,1,0) + f(0,1,1) = 0 + 1 + 0 + 1 = 0$
u=100: $a_{100} = f(0,0,0) + f(1,0,0) = 0 + 0 = 0$
u=101: $a_{101} = f(0,0,0) + f(0,0,1) + f(1,0,0) + f(1,0,1) = 0 + 1 + 0 + 1 = 0$
u=110: $a_{110} = f(0,0,0) + f(0,1,0) + f(1,0,0) + f(1,1,0) = 0 + 0 + 0 + 1 = 1$
u=111: $a_{111} = f(0,0,0) + f(0,0,1) + f(0,1,0) + f(0,1,1) + f(1,0,0) + f(1,0,1) + f(1,1,0) + f(1,1,1) = 0$
Then: $f(x_0,x_1,x_2) = a_{001} x_2 + a_{110} x_0 x_1 = x_2 + x_0 x_1$
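The coefficient computation above can be sketched in a few lines of Python using the fast (butterfly) form of the binary Möbius transform. The function names `anf_coefficients` and `nonlinear_order` are hypothetical, and the truth table is assumed to be indexed by the integer whose binary representation is x0 x1 x2, with x0 as the most significant bit.

```python
def anf_coefficients(truth_table):
    """Binary Moebius transform: truth table -> ANF coefficients a_u.

    truth_table[x] holds f(x), with x read as a binary number;
    the list length must be a power of two.
    """
    a = list(truth_table)
    n = len(a).bit_length() - 1
    if len(a) != 1 << n:
        raise ValueError("truth table length must be a power of two")
    for i in range(n):
        step = 1 << i
        for x in range(len(a)):
            if x & step:
                # add (mod 2) the entry whose i-th bit is cleared
                a[x] ^= a[x ^ step]
    return a


def nonlinear_order(coefficients):
    """Largest number of variables in any product present in the ANF."""
    return max((bin(u).count("1") for u, a in enumerate(coefficients) if a),
               default=0)


# Truth table consistent with the example: f = x2 + x0*x1
table = [0, 1, 0, 1, 0, 1, 1, 0]
coeffs = anf_coefficients(table)
for u, a in enumerate(coeffs):
    print(format(u, "03b"), a)
print("non-linear order:", nonlinear_order(coeffs))
```

Only the coefficients with index 001 and 110 come out as 1, matching $f = x_2 + x_0 x_1$.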
Non-linear combiners
In these generators, the keystream sequence is obtained by combining the output sequences of several LFSRs in a non-linear manner. Example: it is possible to use a Boolean function (without memory).
Non-linear combiners
If F is a Boolean function of N periodic input sequences $a_1(t), a_2(t), \dots, a_N(t)$, then the output sequence $b(t) = F(a_1(t), a_2(t), \dots, a_N(t))$ is a linear combination of various products of these sequences. The products are given by the ANF of the function F.
Non-linear combiners
Given the ANF of the function F, let F* be the function obtained from F by replacing the sum and product modulo 2 with the sum and product of integers. Then, for the linear complexity and the period of the output sequence of F, the following holds:
Non-linear combiners
Example (1) – If the characteristic polynomials of the input sequences are:
All these polynomials are primitive!
Non-linear combiners
Example (2) – Then
Non-linear combiners
The sum of N sequences in GF(q) (1)
– The equality holds if the characteristic polynomials of the input sequences have no common factors.
Non-linear combiners
The sum of N sequences in GF(q) (2)
– Obviously, if the periods of the input sequences are pairwise coprime, then
Non-linear combiners
The sum of N sequences in GF(q) (3)
– Example:
Primitive! The periods are Mersenne primes.
Non-linear combiners
The product of N sequences in GF(q) (1)
– Theorem (Golić, 1989): if the periods $Per(a_i)$ are pairwise coprime, then
– Theorem (Lidl, Niederreiter): if the periods $Per(a_i)$ are pairwise coprime, then
Non-linear combiners
Example
Primitive! The periods are Mersenne primes.
Non-linear combiners
The general case (1)
– Consider the Boolean function obtained by removing all the products from the function F except those of the maximum order, and the corresponding integer function.
Non-linear combiners
The general case (2)
– Theorem (Golić, 1989): if F depends on all the N input variables and the periods $Per(a_i)$ are pairwise coprime, then
Non-linear combiners
The general case (3)
– Example (1)
Non-linear combiners
The general case (4)
– Example (2): if the characteristic polynomials of the input sequences are:
Then
Primitive, periods Mersenne primes.
Non-linear combiners
The general case (5)
– Example – Geffe's generator (1)
Non-linear combiners
The general case (6)
– Example – Geffe's generator (2)
– Equivalent scheme
Non-linear combiners
The general case (7)
– Example – Geffe's generator (3): if we choose primitive feedback polynomials, with periods that are Mersenne primes:
Then
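Since the numeric values of this example are not preserved, the following sketch evaluates the integer function F* for Geffe's generator under stated assumptions: the combining function is taken in its commonly cited form F = x1·x2 ⊕ x2·x3 ⊕ x3 (x2 acts as the selector), and the LFSR lengths 2, 3 and 5 are hypothetical choices whose maximal periods 3, 7 and 31 are Mersenne primes. The function names are illustrative.

```python
def geffe_integer_function(l1, l2, l3):
    """F*(L1, L2, L3) for F = x1*x2 XOR x2*x3 XOR x3, evaluated over the integers.

    Under the conditions quoted in the slides (primitive feedback polynomials,
    pairwise coprime periods) this value is the linear complexity of the output.
    """
    return l1 * l2 + l2 * l3 + l3


def geffe_period(l1, l2, l3):
    """Product of the maximal periods 2^L - 1 of the three LFSRs."""
    return (2 ** l1 - 1) * (2 ** l2 - 1) * (2 ** l3 - 1)


# Hypothetical register lengths; periods 3, 7 and 31 are Mersenne primes
lengths = (2, 3, 5)
print("linear complexity:", geffe_integer_function(*lengths))
print("period:", geffe_period(*lengths))
```

The linear complexity grows only as a sum of pairwise products of the register lengths, so it remains modest unless the registers are long.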
Statistical testing of PN generators
The output sequence of a generator of pseudorandom sequences looks random, but it is not. Pseudorandom generators expand a truly random sequence (the key) into a much longer sequence, such that an adversary cannot distinguish between the pseudorandom sequence and a truly random sequence.
Statistical testing of PN generators
To gain confidence in the security of this type of generator, various statistical tests, specially designed for this purpose, are applied. The fact that a generator passes a set of statistical tests should be considered a necessary condition, although not a sufficient one, for the security of the generator.
Statistical testing of PN generators
If the result X of an experiment can take any real value, then X is a continuous random variable. The probability density function f(x) of a continuous random variable X can be integrated and the following holds:
– $f(x) \ge 0$ for all $x \in \mathbb{R}$
– $\int_{-\infty}^{\infty} f(x)\,dx = 1$
– For all $a, b \in \mathbb{R}$: $P(a < X \le b) = \int_a^b f(x)\,dx$
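Statistical testing of PN generators
A continuous random variable X has a normal distribution with mean $\mu$ and variance $\sigma^2$ if its probability density function is:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, $-\infty < x < \infty$
We say that X is $N(\mu,\sigma^2)$. If X is $N(0,1)$, then we say that X has a standard normal distribution.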
Statistical testing of PN generators
If the random variable X is $N(\mu,\sigma^2)$, then the variable $Z = (X-\mu)/\sigma$ is $N(0,1)$.
Euler's gamma function: $\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\,dx$, $t > 0$
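Statistical testing of PN generators
A continuous random variable X has a $\chi^2$ distribution with $\nu$ degrees of freedom if its probability density function is
$f(x) = \frac{1}{\Gamma(\nu/2)\,2^{\nu/2}}\, x^{\nu/2-1} e^{-x/2}$ for $0 \le x < \infty$, and $f(x) = 0$ for $x < 0$.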
Statistical testing of PN generators
A statistical hypothesis H is an assertion about the distribution of one or more random variables. A hypothesis test is a procedure, based on the observed values of the random variable, that leads to the acceptance or rejection of the hypothesis H.
Statistical testing of PN generators
The test only provides a measure of the strength of the evidence given by the data against the hypothesis. The conclusion is probabilistic. The level of significance $\alpha$ of the test of the hypothesis H is the probability of rejecting the hypothesis H when it is true.
Statistical testing of PN generators
The hypothesis to be tested is called the null hypothesis, $H_0$. The alternative hypothesis is denoted by $H_1$ or $H_a$. In cryptography:
– $H_0$: the given generator is a random sequence generator.
– $\alpha$ is between 0.001 and 0.05.
Statistical testing of PN generators
A test:
– Determines a statistic for the sample of the output sequence.
– This statistic is compared with the expected value for a random sequence.
Statistical testing of PN generators
How is the comparison carried out? (1)
– The computed statistic, $X_0$, follows (usually) a $\chi^2$ distribution with $\nu$ degrees of freedom.
– It is assumed that this statistic takes large values for non-random sequences.
Statistical testing of PN generators
How is the comparison carried out? (2)
– In order to achieve the significance level $\alpha$, a threshold $x_\alpha$ is chosen (by means of the corresponding table), such that $P(X_0 > x_\alpha) = \alpha$.
– If the value of the statistic for the sample of the output sequence, $X_s$, satisfies $X_s > x_\alpha$, then the sequence fails the test.
Statistical testing of PN generators
Basic tests for cryptographic use:
– frequency test,
– serial test,
– poker test,
– runs test,
– autocorrelation test,
– etc.
Statistical testing of PN generators
Frequency test (1)
– Purpose: determine if the number of zeros and ones in a sequence s of length n is approximately the same.
– $n_0$: number of zeros, $n_1$: number of ones.
– The statistic: $X_1 = \frac{(n_0 - n_1)^2}{n}$
Statistical testing of PN generators
Frequency test (2)
– The statistic follows approximately a $\chi^2$ distribution with 1 degree of freedom.
– The approximation is good enough if $n \ge 10$.
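A minimal sketch of the frequency test, assuming the usual statistic $X_1 = (n_0 - n_1)^2 / n$ and the tabulated $\chi^2$ threshold 3.8415 for 1 degree of freedom at significance level 0.05. The function name and the sample sequences are hypothetical.

```python
CHI2_1DF_ALPHA_005 = 3.8415  # chi-square threshold, 1 degree of freedom, alpha = 0.05


def frequency_statistic(bits):
    """X1 = (n0 - n1)^2 / n for a list of 0/1 values."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    return (n0 - n1) ** 2 / n


balanced = [0, 1] * 50          # 50 zeros, 50 ones
biased = [1] * 60 + [0] * 40    # 40 zeros, 60 ones

for name, sample in (("balanced", balanced), ("biased", biased)):
    x1 = frequency_statistic(sample)
    verdict = "fails" if x1 > CHI2_1DF_ALPHA_005 else "passes"
    print(name, x1, verdict)
```

The balanced sample is perfectly periodic and still passes, which shows why the frequency test alone is not sufficient.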
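Statistical testing of PN generators
Serial test (1)
– Tries to determine if the numbers of occurrences of 00, 01, 10 and 11 as (overlapping) subsequences of s are approximately the same.
– The statistic: $X_2 = \frac{4}{n-1}\left(n_{00}^2 + n_{01}^2 + n_{10}^2 + n_{11}^2\right) - \frac{2}{n}\left(n_0^2 + n_1^2\right) + 1$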
Statistical testing of PN generators
Serial test (2)
– The statistic follows approximately a $\chi^2$ distribution with 2 degrees of freedom.
– The approximation is good enough if $n \ge 21$.
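A minimal sketch of the serial test, assuming the usual statistic $X_2 = \frac{4}{n-1}(n_{00}^2 + n_{01}^2 + n_{10}^2 + n_{11}^2) - \frac{2}{n}(n_0^2 + n_1^2) + 1$ computed over overlapping pairs, and the tabulated $\chi^2$ threshold 5.9915 for 2 degrees of freedom at significance level 0.05. The function name and the sample sequence are hypothetical.

```python
CHI2_2DF_ALPHA_005 = 5.9915  # chi-square threshold, 2 degrees of freedom, alpha = 0.05


def serial_statistic(bits):
    """Serial (two-bit) test statistic over overlapping pairs of a 0/1 list."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    pair_counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(bits, bits[1:]):
        pair_counts[(a, b)] += 1
    sum_pairs = sum(c * c for c in pair_counts.values())
    return 4 / (n - 1) * sum_pairs - 2 / n * (n0 * n0 + n1 * n1) + 1


# The alternating sequence is balanced, but 00 and 11 never occur
alternating = [0, 1] * 50
x2 = serial_statistic(alternating)
print(x2, "fails" if x2 > CHI2_2DF_ALPHA_005 else "passes")
```

The alternating sequence has as many zeros as ones, yet the serial statistic exposes it immediately.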
Statistical testing of PN generators
Poker test (1)
– A positive integer m is considered such that $\lfloor n/m \rfloor \ge 5 \cdot 2^m$.
– The sequence s is divided into $k = \lfloor n/m \rfloor$ non-overlapping parts of size m.
– $n_i$ is the number of occurrences of the type i of the sequence of length m, $1 \le i \le 2^m$ (that is, i indexes the $2^m$ possible sequences of length m, for example through the integer whose binary representation is the sequence).
Statistical testing of PN generators
Poker test (2)
– The test determines if every sequence of length m appears approximately the same number of times.
– The statistic: $X_3 = \frac{2^m}{k}\left(\sum_{i=1}^{2^m} n_i^2\right) - k$
– The statistic follows approximately a $\chi^2$ distribution with $2^m - 1$ degrees of freedom.
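A minimal sketch of the poker test, assuming the usual statistic $X_3 = \frac{2^m}{k}\sum_i n_i^2 - k$ over $k = \lfloor n/m \rfloor$ non-overlapping blocks. The function name and the sample sequences are hypothetical; the block types are indexed here from 0 to $2^m - 1$ by their binary value.

```python
def poker_statistic(bits, m):
    """Poker test statistic for a 0/1 list split into non-overlapping m-bit blocks."""
    k = len(bits) // m
    counts = [0] * (1 << m)
    for j in range(k):
        value = 0
        for b in bits[j * m:(j + 1) * m]:
            value = (value << 1) | b   # binary value of the block
        counts[value] += 1
    return (1 << m) / k * sum(c * c for c in counts) - k


# With m = 2 there are 4 block types; k = 20 blocks satisfies k >= 5 * 2^m
uniform = [0, 0, 0, 1, 1, 0, 1, 1] * 5   # every 2-bit block appears 5 times
constant = [0] * 40                       # only the block 00 appears

print(poker_statistic(uniform, 2))
print(poker_statistic(constant, 2))
```

The result is compared with the $\chi^2$ threshold for $2^m - 1$ degrees of freedom, here 3.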
Statistical testing of PN generators
Runs test (1)
– A run of length i: a subsequence of s formed by i consecutive zeros or i consecutive ones that are neither preceded nor followed by the same symbol.
– A run of zeros: gap
– A run of ones: block
Statistical testing of PN generators
Runs test (2)
– Purpose: determine if the number of runs of different lengths in the sequence s is that expected in a random sequence.
– The expected number of gaps (or blocks) of length i in a random sequence of length n is $e_i = \frac{n-i+3}{2^{i+2}}$.
– It is considered that k is equal to the largest integer i for which $e_i \ge 5$.
Statistical testing of PN generators
Runs test (3)
– We denote by $B_i$ and $H_i$ the number of blocks and gaps of length i in s, for each i, $1 \le i \le k$.
– The statistic: $X_4 = \sum_{i=1}^{k} \frac{(B_i - e_i)^2}{e_i} + \sum_{i=1}^{k} \frac{(H_i - e_i)^2}{e_i}$
– The statistic follows approximately a $\chi^2$ distribution with $2k-2$ degrees of freedom.
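A minimal sketch of the runs test, assuming the usual expected count $e_i = (n-i+3)/2^{i+2}$ and the statistic $X_4 = \sum_{i=1}^{k} (B_i - e_i)^2/e_i + \sum_{i=1}^{k} (H_i - e_i)^2/e_i$. The function name is hypothetical, and the sample is an illustrative, deliberately non-random sequence built by repeating a 40-bit pattern four times.

```python
from itertools import groupby


def runs_statistic(bits):
    """Return (X4, degrees of freedom) of the runs test for a 0/1 list."""
    n = len(bits)

    def expected(i):
        return (n - i + 3) / 2 ** (i + 2)

    # k is the largest run length whose expected count is at least 5
    k = 0
    while expected(k + 1) >= 5:
        k += 1

    blocks = [0] * (k + 1)  # blocks[i]: runs of ones of length i
    gaps = [0] * (k + 1)    # gaps[i]:   runs of zeros of length i
    for symbol, run in groupby(bits):
        length = len(list(run))
        if length <= k:
            (blocks if symbol == 1 else gaps)[length] += 1

    x4 = sum((blocks[i] - expected(i)) ** 2 / expected(i)
             + (gaps[i] - expected(i)) ** 2 / expected(i)
             for i in range(1, k + 1))
    return x4, 2 * k - 2


pattern = "11100 01100 01000 10100 11101 11100 10010 01001".replace(" ", "")
sample = [int(c) for c in pattern * 4]   # n = 160, so k = 3
x4, dof = runs_statistic(sample)
print(round(x4, 4), dof)
```

For this sample the statistic is far above 9.4877, the tabulated $\chi^2$ threshold for 4 degrees of freedom at significance level 0.05, so the sequence fails the runs test.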
Statistical testing of PN generators
Autocorrelation test (1)
– Checks the correlation between s and shifted versions of s.
– An integer d, $1 \le d \le \lfloor n/2 \rfloor$, is considered.
– The number of bits in s that are not equal to their d-shifts is $A(d) = \sum_{i=0}^{n-d-1} s_i \oplus s_{i+d}$
Statistical testing of PN generators
Autocorrelation test (2)
– The statistic: $X_5 = \frac{2\left(A(d) - \frac{n-d}{2}\right)}{\sqrt{n-d}}$
– The statistic follows approximately an $N(0,1)$ distribution.
– The approximation is good enough if $n-d \ge 10$.
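A minimal sketch of the autocorrelation test, assuming the usual definitions $A(d) = \sum_{i=0}^{n-d-1} s_i \oplus s_{i+d}$ and $X_5 = 2\left(A(d) - \frac{n-d}{2}\right)/\sqrt{n-d}$. Because both very small and very large values of A(d) are suspicious, the sketch applies a two-sided test with the standard normal threshold 1.96 for significance level 0.05. The function name and the sample sequence are hypothetical.

```python
from math import sqrt

NORMAL_TWO_SIDED_ALPHA_005 = 1.96  # standard normal threshold, two-sided, alpha = 0.05


def autocorrelation_statistic(bits, d):
    """X5 for shift d, where 1 <= d <= len(bits) // 2."""
    n = len(bits)
    # A(d): number of positions where s differs from its d-shift
    a_d = sum(bits[i] ^ bits[i + d] for i in range(n - d))
    return 2 * (a_d - (n - d) / 2) / sqrt(n - d)


alternating = [0, 1] * 50
for d in (1, 2):
    x5 = autocorrelation_statistic(alternating, d)
    verdict = "fails" if abs(x5) > NORMAL_TWO_SIDED_ALPHA_005 else "passes"
    print(d, round(x5, 4), verdict)
```

For the alternating sequence every bit differs from its 1-shift and equals its 2-shift, so the statistic is strongly positive for d = 1 and strongly negative for d = 2.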
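Cryptanalysis of stream ciphers
[Figure: A enciphers the plaintext with the key and sends the ciphertext to B, who deciphers it with the key to recover the plaintext; the cryptanalyst intercepts the ciphertext and tries to decrypt it without knowing the key.]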
Cryptanalysis of stream ciphers
The problem of cryptanalysis
– Given some information related to the cryptosystem (at least the ciphertext), determine the plaintext and/or the key.
The goal of the designer is to make this problem as difficult as possible for the cryptanalyst.
Cryptanalysis of stream ciphers
General assumption: all the details of the cryptosystem are known to the cryptanalyst. The only unknown is the key.
Types of attack
– Ciphertext-only attack
– Known plaintext attack
– Chosen plaintext attack
– Chosen ciphertext attack
Cryptanalysis of stream ciphers
The ciphertext-only attack is (in general) the most difficult one for the cryptanalyst. The more information the cryptanalyst has, the easier the attack.
Cryptanalysis of stream ciphers
The "brute force attack"
– Elementary attack: no knowledge about cryptanalysis is necessary.
– Assumptions: the cryptosystem is known; the ciphertext is known.
– The goal: determine the key/plaintext.
– The means: trying all the possible keys.
Cryptanalysis of stream ciphers
Complexity of the brute force attack
– Extremely high if there are many possible keys, which makes the attack impractical.
Key space: the total number of keys possible in a cryptosystem.
Cryptanalysis of stream ciphers
Examples of key space size

Key space, 40 bits: 1.1 × 10^12
Key space, 56 bits (DES): 7.2 × 10^16
Key space, 128 bits: 3.4 × 10^38
Key space, 256 bits: 1.2 × 10^77
Number of 256-bit primes: 1…
Age of the Sun in seconds: 1.8 × 10^17
Number of clock pulses of a 3 GHz computer clock through the Sun's age: 5.4 × 10^26
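The orders of magnitude in such a table can be checked with a short calculation. The sketch below assumes a hypothetical attacker who tests 10^9 keys per second; the function name `brute_force_years` and that rate are illustrative and not taken from the slides.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_years(key_bits, keys_per_second):
    """Years needed to try every key of the given length at the given rate."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR


RATE = 1e9  # hypothetical: one billion keys tested per second

for bits in (40, 56, 128, 256):
    print(f"{bits:3d} bits: {2 ** bits:.1e} keys, "
          f"{brute_force_years(bits, RATE):.1e} years to try them all")

# Clock pulses of a 3 GHz clock over 1.8e17 seconds
print(f"{3e9 * 1.8e17:.1e}")
```

Even at this rate, exhausting a 128-bit key space takes vastly more clock pulses than a 3 GHz clock would produce over the age of the Sun.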
Cryptanalysis of stream ciphers
A cryptosystem's security is ultimately determined by the size of its key space. However, this is only an upper limit on that security measure: a flaw in the system design may significantly reduce the effective key space. The task of the cryptanalyst is to find such a flaw and to use it to attack the system.
Cryptanalysis of stream ciphers
Basic attack methods against stream (and block) ciphers
– Algebraic
– Statistical
Algebraic attacks (1)
– The key symbols (e.g. bits) are the unknowns in the system of equations assigned to the PRNG.
Cryptanalysis of stream ciphers
Algebraic attacks (2)
– Given all the details of the PRNG to be cryptanalyzed (except the key bits), determine the system of equations that relates the bits of the output sequence to the bits of the key.
– The designer's goal: to make this system as non-linear as possible.
– The reason: non-linear systems are difficult to solve. No efficient general method is known other than trying all the possible values of the variables: $2^n$ possibilities for a system with n variables.
Cryptanalysis of stream ciphers
Algebraic attacks (3)
– The problem of solving a non-linear system in GF(2) corresponds to the satisfiability problem (SAT).
– Cook's theorem (1971): SAT is NP-complete.
– However, some instances of the SAT problem may be easier to solve.
– The designer should check the system assigned to the PRNG.
Cryptanalysis of stream ciphers
Algebraic attacks (4)
– Example: LFSR
– The output sequence: 1110…
– The initial state: $a_0, a_1, a_2, a_3$
– The output bits: $y_0=1$, $y_1=1$, $y_2=1$, $y_3=0$
– The equations express each output bit $y_0, \dots, y_3$ as a linear combination of $a_0, \dots, a_3$: a linear system, easy to solve!
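Cryptanalysis of stream ciphers
Algebraic attacks (5)
– Example (1): consider a non-linear PRNG with seven key bits $x_1, \dots, x_7$ whose output is the AND of two linear sequences.
[Figure: scheme of the non-linear PRNG]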
Cryptanalysis of stream ciphers
Algebraic attacks (6)
– Example (2): the system of equations
(1) $y_1 = (x_1+x_4)(x_5+x_7) = x_1x_5 + x_1x_7 + x_4x_5 + x_4x_7$
(2) $y_2 = (x_1+x_4+x_3)(x_5+x_7+x_6) = x_1x_5 + x_1x_7 + x_1x_6 + x_4x_5 + x_4x_7 + x_4x_6 + x_3x_5 + x_3x_7 + x_3x_6$
…
(we need 7 independent equations)
Cryptanalysis of stream ciphers
Algebraic attacks (7)
– Example (3): methods of solving the system
– The brute force method: try all the possible solutions (the all-zero solution is not permitted).
– The linearization method:
  – Replace all the products by new variables.
  – Solve the obtained linear system (e.g. by Gaussian elimination).
  – Try to guess the variables that were included in the products, given the values of the new variables, in such a way that the overall system is consistent.
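Cryptanalysis of stream ciphers
Algebraic attacks (8)
– Example (4): the linearized system
$y_1 = z_1 + z_2 + z_3 + z_4$
$y_2 = z_1 + z_2 + z_5 + z_3 + z_4 + z_6 + z_7 + z_8 + z_9$
where $z_1 = x_1x_5$, $z_2 = x_1x_7$, $z_3 = x_4x_5$, $z_4 = x_4x_7$, $z_5 = x_1x_6$, $z_6 = x_4x_6$, $z_7 = x_3x_5$, $z_8 = x_3x_7$, $z_9 = x_3x_6$.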
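The "solve the obtained linear system" step of the linearization method can be sketched as Gauss-Jordan elimination over GF(2). The function `solve_gf2` and the small three-variable system below are hypothetical illustrations, not the system of the example; free variables, if any, are simply set to 0.

```python
def solve_gf2(rows, rhs):
    """Solve a linear system over GF(2).

    rows -- list of coefficient rows (lists of 0/1)
    rhs  -- list of right-hand-side bits
    Returns one solution as a list of bits, or None if the system is inconsistent.
    """
    m = [list(r) + [b] for r, b in zip(rows, rhs)]   # augmented matrix
    n = len(rows[0])
    pivot_columns = []
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, len(m)) if m[i][col]), None)
        if pivot is None:
            continue                      # no pivot: free variable
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col]:
                m[i] = [a ^ b for a, b in zip(m[i], m[r])]   # row addition mod 2
        pivot_columns.append(col)
        r += 1
    # a row reading 0 = 1 means the system has no solution
    if any(row[-1] and not any(row[:-1]) for row in m):
        return None
    solution = [0] * n
    for i, col in enumerate(pivot_columns):
        solution[col] = m[i][-1]
    return solution


# Hypothetical system: z1 + z2 = 1, z2 + z3 = 0, z1 + z2 + z3 = 1
print(solve_gf2([[1, 1, 0], [0, 1, 1], [1, 1, 1]], [1, 0, 1]))
```

After this step, the values found for the new variables still have to be matched against consistent values of the original key bits, which is where the search mentioned in the method comes in.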
Cryptanalysis of stream ciphers
Algebraic attacks (9)
– Other methods of solving non-linear systems, applied in cryptanalysis:
  – Linear consistency test (LCT)
  – Methods of computational commutative algebra (Gröbner bases etc.)
  – etc.
– No matter how sophisticated the method used to solve the system, the cryptanalysis of a seriously designed system always includes search.
Cryptanalysis of stream ciphers
Statistical methods (1)
– In the previous example, the majority of the output symbols will be zero, due to the AND combining function.
– The non-linearity of the assigned system of equations is the highest possible.
– However, it is possible to make use of the bad statistical properties of the output sequence to determine the plaintext sequence.
Cryptanalysis of stream ciphers
Statistical methods (2)
– Example: with the AND output combiner, the probability of zero in the output sequence will be ¾. This means that, upon enciphering with this sequence as the keystream, the probability that the plaintext bit is equal to the ciphertext bit is ¾.
– Consequence: easy reconstruction of the plaintext.
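The value ¾ can be checked by enumerating the inputs of the combiner, assuming that the two combined bits are independent and uniformly distributed. The function name `probability_of_zero` is a hypothetical helper.

```python
from itertools import product


def probability_of_zero(combiner, n_inputs):
    """Fraction of input combinations for which the combiner outputs 0."""
    inputs = list(product((0, 1), repeat=n_inputs))
    zeros = sum(1 for x in inputs if combiner(*x) == 0)
    return zeros / len(inputs)


p_and = probability_of_zero(lambda a, b: a & b, 2)
p_xor = probability_of_zero(lambda a, b: a ^ b, 2)
print("AND combiner:", p_and)   # biased keystream
print("XOR combiner:", p_xor)   # balanced keystream, for comparison

# Ciphertext bit = plaintext bit XOR keystream bit, so the two are equal
# exactly when the keystream bit is 0.
print("P(ciphertext bit = plaintext bit) with AND:", p_and)
```

A balanced combiner such as XOR gives ½, which leaks nothing through this particular statistic.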
Cryptanalysis of stream ciphers
Statistical methods (3)
– Correlation: the output sequence coincides too much with one or more internal sequences. This enables correlation attacks, a kind of statistical attack.
– Correlation attacks: it is possible to divide the task of the cryptanalyst into several less difficult tasks ("divide and conquer").
Cryptanalysis of stream ciphers
Statistical methods (4)
– Typical example: Geffe's generator
– F is balanced: good statistical properties.
Cryptanalysis of stream ciphers
Statistical methods (5)
– Problem: Correlation!
Cryptanalysis of stream ciphers
Statistical methods (6)
– Since the output sequence is correlated with both input sequences, we can independently guess the bits of the input sequences with high probability if the output sequence is known.
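The correlation can be made concrete by enumerating the truth table of the combining function. The sketch assumes Geffe's function in its commonly cited form, where x2 selects between x1 and x3, that is F = x1·x2 ⊕ (1 ⊕ x2)·x3; the function names are hypothetical, and the inputs are assumed independent and uniformly distributed.

```python
from itertools import product


def geffe(x1, x2, x3):
    """Geffe combiner (assumed form): output x1 when x2 = 1, otherwise x3."""
    return (x1 & x2) ^ ((1 ^ x2) & x3)


def agreement(index):
    """Probability that the output equals the input with the given index (0, 1, 2)."""
    inputs = list(product((0, 1), repeat=3))
    return sum(1 for x in inputs if geffe(*x) == x[index]) / len(inputs)


ones = sum(geffe(*x) for x in product((0, 1), repeat=3))
print("P(output = 1):", ones / 8)       # balanced function
print("P(output = x1):", agreement(0))
print("P(output = x2):", agreement(1))
print("P(output = x3):", agreement(2))
```

The output is balanced, yet it agrees with x1 and with x3 three times out of four, so each of those two registers can be attacked separately, which is the "divide and conquer" idea behind correlation attacks.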