STATISTICAL AND PERFORMANCE ANALYSIS OF SHA-3 HASH CANDIDATES Ashok V Karunakaran Department of Computer Science Rochester Institute of Technology Committee.


1 STATISTICAL AND PERFORMANCE ANALYSIS OF SHA-3 HASH CANDIDATES Ashok V Karunakaran Department of Computer Science Rochester Institute of Technology Committee Chair: Prof. Stanislaw Radziszowski. Reader: Prof. Peter Bajorski. Observer: Prof. Christopher Homan.

2 Project Abstract Randomness - A good hash function should behave as much like a random function as possible. Statistical tests help determine the randomness of a hash function, and NIST provides a statistical test suite for this purpose. This tool has been used to analyze the randomness of the final five hash functions. Performance - This is the second most important factor in determining a good hash function. The performance of all fourteen Round 2 candidates was measured for small messages, using Java on Sun platform machines. Security - Security is the most important criterion for hash functions. Grøstl is one of the final five candidates, and its architecture, design and security features have been studied in detail. Some of the successful attacks on reduced versions are also explained. The lesser-known Round 2 candidates Fugue and ECHO have also been studied.

3 Hash function Input: String of arbitrary size. Output: Predetermined fixed size string.

4 Hash function requirements Pre-image, second pre-image and collision resistance. Collision – A pair x ≠ y such that h(x) = h(y). Birthday paradox – Bounds the cost of a generic collision attack ◦ q ≈ 1.17√m for ε = ½ (m = 365, q = 23). ◦ The birthday bound for an n-bit digest is 2^(n/2).
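
The birthday estimate above can be checked numerically. The sketch below uses the standard approximation q ≈ √(2m ln(1/(1−ε))), which reduces to roughly 1.17√m for ε = ½; the function name `birthday_queries` is illustrative and not taken from the slides.

```python
import math

def birthday_queries(m: int, eps: float = 0.5) -> float:
    """Approximate number of random samples needed to see a collision
    among m equally likely values with probability eps."""
    return math.sqrt(2 * m * math.log(1 / (1 - eps)))

# Classic birthday setting: m = 365 possible birthdays.
print(math.ceil(birthday_queries(365)))

# For a 256-bit digest there are m = 2**256 outputs; the exponent of q
# lands close to 256 / 2 = 128, i.e. the 2^(n/2) birthday bound.
print(math.log2(birthday_queries(2**256)))
```

For m = 365 the estimate is about 22.5, so 23 people are needed, which agrees with the slide.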

5 The need for a new hash function Most commonly used hash functions are broken ◦ Collisions in MD5 and SHA-0. ◦ Security flaws in SHA-1. Increasing hardware power and parallelization capabilities.

6 SHA-3 Competition Organized by NIST. Started on Nov. 2, 2007. Received 64 entries. 51 met minimum requirements. Round 1 ◦ First candidate conference at KU Leuven, Belgium on Feb 25-28, 2009. ◦ 14 candidates on July 24, 2009.

7 Round 2 and 3 Round 2 ◦ Second candidate conference at Santa Barbara, CA on August 23-24, 2010. ◦ 5 candidates on Dec. 9, 2010. Round 3/ Final Round ◦ Final conference in Spring 2012. ◦ Select a winner later in 2012.

8 Round 2 and 3 Candidates BLAKE BMW CubeHash ECHO Fugue Grøstl Hamsi JH Keccak Luffa Shabal SHAvite-3 SIMD Skein

9 Randomness and Statistics Hash function should behave indistinguishably from a random function. Avoid finding patterns, which lead to collisions. Statistical randomness tests to determine hash function randomness. Pseudo-randomness is sufficient.

10 Statistical Tests Motivation: Decide whether a particular statement or claim is correct. Null hypothesis: The output of a hash function is random, irrespective of the input. Alternative hypothesis: The output is not random. Test statistic: Computed from sample data. Helps in deciding whether to reject/accept the null hypothesis.

11 NIST Test Suite Statistical test suite for random and pseudo-random number generators for cryptographic applications. Helpful in detecting deviations of a binary sequence from randomness. Total of 15 tests. Ex., Frequency Test, Longest runs of ones in a block.

12 P-value and Significance level The P-value is calculated from the test statistic. It is the probability that a perfect random number generator would have produced a sequence less random than the sequence that was tested. P-value = 1 implies perfect randomness. P-value = 0 implies complete non-randomness.

13 P-value and Significance level (cont.) Significance level ( α ) denotes the probability of Type 1 error. ◦ False positive, occurs when a statistical test rejects a true null hypothesis. If P-value ≥ α then the null hypothesis is accepted. ◦ Meaning, the sequence appears to be random. If P-value < α then the null hypothesis is rejected.

14 P-value and Significance level (cont.) For the project, ◦ α = 0.01 ◦ One would expect 1 sequence in 100 sequences to be rejected. ◦ P-value ≥ 0.01 indicates that the sequence would be considered random with a confidence of 99%. ◦ P-value < 0.01 indicates that the sequence is considered non-random with a confidence of 99%.

15 Frequency Test Tests the proportion of zeros and ones in the sequence. For a random sequence, the two proportions should be about the same. Test Description: ◦ Convert bits to -1 or +1 and then add: S_n = X_1 + X_2 + … + X_n. For example, if ε = 1011010101, then n = 10 and S_n = 2.

16 Frequency Test (cont.) ◦ Compute the test statistic, s_obs = |S_n| ⁄ √n. s_obs = 2 ⁄ √10 = 0.63245. ◦ Compute P-value = erfc(s_obs ⁄ √2). P-value = erfc(0.63245 ⁄ √2) = 0.527089. Decision: P-value > 0.01, so accept the sequence as random.
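
The frequency test steps can be sketched in a few lines of Python. This is a minimal illustration, not the NIST reference code; the function name `frequency_test` is an assumption, and the input is taken to be a string of '0' and '1' characters.

```python
import math

def frequency_test(bits: str) -> float:
    """Return the P-value of the frequency (monobit) test for a '0'/'1' string."""
    n = len(bits)
    # Map 0 -> -1 and 1 -> +1, then add everything up to get S_n.
    s_n = sum(1 if b == "1" else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

p_value = frequency_test("1011010101")
print(f"P-value = {p_value:.6f}")
print("random" if p_value >= 0.01 else "non-random")
```

Run on the example sequence ε = 1011010101, this should reproduce the P-value of about 0.527 shown on the slide.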

17 Longest Run of Ones in a Block Tests the longest run of ones within M-bit blocks. It should be similar to what is expected of a random sequence. Test Description: ◦ Input: 11001100000101010110110001001100111000000000001001001101010100010001001111010110100000001101011111001100111001101101100010110010. ◦ Input length n: 128 bits. ◦ Divide the input into M-bit blocks. M = 8.

18 Longest Run of Ones in a Block (cont.) ◦ The longest run of ones in each subblock is noted. ◦ Calculate the frequencies of the longest runs: ν_0 = 4; ν_1 = 9; ν_2 = 3; ν_3 = 0. ◦ Compute χ²(obs), a measure of how well the observed longest run lengths match the expected longest run lengths within M-bit blocks.

Subblock   Max-Run   Subblock   Max-Run
11001100   2         00010101   1
01101100   2         01001100   2
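
The longest-run computation for the 128-bit example can be sketched as follows. The category probabilities for M = 8 are the values given in the NIST test suite documentation; the function names are illustrative, and the P-value uses the closed form of the incomplete gamma function for 3 degrees of freedom so that only the standard library is needed.

```python
import math

# Category probabilities for M = 8: longest run <= 1, == 2, == 3, >= 4.
PI_M8 = [0.2148, 0.3672, 0.2305, 0.1875]

# The 128-bit example input, written as its sixteen 8-bit blocks.
EPSILON = (
    "11001100" "00010101" "01101100" "01001100"
    "11100000" "00000010" "01001101" "01010001"
    "00010011" "11010110" "10000000" "11010111"
    "11001100" "11100110" "11011000" "10110010"
)

def longest_run(block: str) -> int:
    """Length of the longest run of '1' characters in a block."""
    return max(len(run) for run in block.split("0"))

def longest_run_test(bits: str, m: int = 8):
    blocks = [bits[i:i + m] for i in range(0, len(bits) - m + 1, m)]
    freq = [0, 0, 0, 0]
    for block in blocks:
        # Clamp the run length into the four categories above.
        freq[min(max(longest_run(block), 1), 4) - 1] += 1
    n_blocks = len(blocks)
    chi2 = sum((v - n_blocks * p) ** 2 / (n_blocks * p)
               for v, p in zip(freq, PI_M8))
    # P-value = Q(3/2, chi2/2), the upper regularized incomplete gamma function.
    x = chi2 / 2
    p_value = math.erfc(math.sqrt(x)) + 2 * math.sqrt(x / math.pi) * math.exp(-x)
    return freq, chi2, p_value

freq, chi2, p_value = longest_run_test(EPSILON)
print(freq, round(chi2, 4), round(p_value, 4))
```

The four frequencies come out as 4, 9, 3 and 0, and the P-value is well above 0.01, so this short example sequence would be accepted as random.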

19 Longest Run of Ones in a Block (cont.)

20 Inputs for the experiment Numbers – Hash of numbers 0-3999. ◦ Tests require a length of at least 10^6 bits. ◦ For 256-bit output, 256 × 4000 = 1,024,000 bits. KAT Inputs – 2048 hexadecimal inputs from the official candidate documentation.
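
A sketch of how the "Numbers" input stream could be assembled is shown below. Two things here are assumptions rather than details from the slides: SHA-256 from Python's `hashlib` stands in for a 256-bit candidate (the candidates themselves are not in the standard library), and each number is hashed as its decimal ASCII string.

```python
import hashlib

def numbers_input(count: int = 4000) -> bytes:
    """Concatenate the 256-bit digests of the numbers 0 .. count-1."""
    out = bytearray()
    for i in range(count):
        # Assumption: the number is encoded as a decimal ASCII string.
        out += hashlib.sha256(str(i).encode("ascii")).digest()
    return bytes(out)

stream = numbers_input()
print(len(stream) * 8)  # number of bits handed to the test suite
```

With 4000 digests of 256 bits each, the stream just clears the 10^6-bit minimum required by the tests.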

21 Inputs for the experiment (cont.) From file – The NIST document on the statistical test suite. ◦ Every 10Kb – Each input block is 10Kb long. The first input is the first 10Kb; the second input skips the first m = 1Kb and takes the next n = 10Kb. ◦ Every 100Kb – Each input block is 100Kb long. In this case, 100 bytes are skipped before the next input block. This ensures a mix of overlapping and non-overlapping data across the blocks.

22 Output for BLAKE-256

Tests                  Numbers              KAT                  10Kb                 100Kb
App. Entropy           0.531403             0.132928             0.365077             0.476437
Block Freq.            0.550332             0.999349             0.105159             0.634999
Cumulative Sums        0.324573, 0.201009   0.988702, 0.943249   0.000432, 0.001383   0.129711, 0.221312
FFT                    0.204233             0.655976             0.255107             0.617123
Frequency              0.187412             0.765466             0.000966             0.127740
Linear Complex         0.867403             0.312439             0.551978             0.693519
Longest Run            0.095483             0.382246             0.697027             0.936944
Overlapping Template   0.099496             0.718846             0.180799             0.214866
Rank                   0.077948             0.162680             0.946797             0.843130

23 Output for BLAKE-256 (contd.)

Tests                       Numbers              KAT                  10Kb                 100Kb
Runs                        0.753526             0.978062             0.863215             0.048920
Serial                      0.876547, 0.838931   0.252703, 0.520978   0.625307, 0.854685   0.988346, 0.986553
Universal                   0.861028             0.057151             0.382927             0.833105
Non-overlapping Template    0.272553, 0.156433   0.748985, 0.001491   0.013372, 0.593525   0.376109, 0.329376
Random Excursions           0.560459, 0.148643   0.997930, 0.945050   0.000000, 0.000000   0.381784, 0.935452
Random Excursions Variant   0.612882, 0.582494   0.163078, 0.205123   0.000000, 0.000000   0.219435, 0.393705
Total Bits                  1024000              524288               1677056              16936192
No. of 0’s                  511333               262036               840665               8464962
No. of 1’s                  512667               262252               836391               8471230

24 Results and Conclusions P-values of 0.0 do not indicate failed tests; they indicate tests that are not applicable to the input. All hash functions appear random. ◦ Failed results are outliers rather than the norm. ◦ They are not enough to classify a function as non-random. Areas of failed tests can be explored further.

25 Performance Second most important criterion. Most of the existing work has been done with C as the programming language. The following combination has not been studied comprehensively before ◦ Language – Java ◦ Platform – Sun ◦ Message size – Small

26 Specification Machine – Sun Microsystems Ultra 20. Config – AMD 2.2GHz processor. OS – SunOS 5.10 (Solaris 10). Small messages – size < 8192 bytes. Java code – sphlib, a library of hash function implementations in C and Java.

27 Input size (I/p) = 1024 bytes

             256-bit output            512-bit output
Candidates   Mbytes/s   Cycles/byte    Mbytes/s   Cycles/byte
SHA-2        57.90      38             19.69      111.73
BLAKE        45.5       48.35          27.48      80.06
Grøstl       11.56      190.31         6.87       320.23
JH           8.33       264.11         8.33       264.11
Keccak       12.63      174.19         6.89       319.3
Skein        38.24      57.53          30.11      73.07
Hamsi        18.50      118.92         7.12       308.99
BMW          42.89      51.29          36.84      59.72
CubeHash     23.75      92.63          23.87      92.17
ECHO         11.24      195.73         5.75       382.61
Fugue        22.69      96.96          11.62      189.33
Luffa        33.26      66.15          18.97      115.97
Shabal       104.37     21.08          103.36     21.28
SHAvite      24.11      91.25          13.97      157.48
SIMD         12.10      181.82         0.75       2933.33
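
The Cycles/byte column appears to follow directly from the Mbytes/s column and the 2.2 GHz clock of the test machine. The sketch below shows that conversion; it assumes 1 Mbyte = 10^6 bytes, which is not stated on the slides but matches the tabulated values.

```python
CLOCK_HZ = 2.2e9  # AMD 2.2 GHz processor from the specification slide

def cycles_per_byte(mbytes_per_s: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a throughput in Mbytes/s into processor cycles per byte."""
    # Assumption: 1 Mbyte = 10**6 bytes.
    return clock_hz / (mbytes_per_s * 1e6)

for name, speed in [("BLAKE-256", 45.5), ("Shabal-256", 104.37), ("SIMD-512", 0.75)]:
    print(f"{name}: {cycles_per_byte(speed):.2f} cycles/byte")
```

Because the two columns carry the same information, comparing candidates on either one gives the same ordering.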

28 256 output bits

29 512 output bits

30 Performance and Message length Most candidates claim better performance than SHA-2. It is interesting to see how performance is affected by message length. For the final five candidates, 16-byte and 4096-byte inputs were hashed.

31 Performance and Message length (cont.)

Candidates   16-256   4096-256   16-512   4096-512
SHA-2        11.89    61.43      2.39     21.93
BLAKE        10.93    47.68      3.47     29.99
Grøstl       2.8      12.38      0.67     7.74
JH           1.8      8.75       1.7      8.64
Keccak       1.52     13.7       1.56     7.26
Skein        9.18     38.77      3.78     31.76

32 Performance and Message length (cont.) Rate of hashing (growth in speed from 16-byte to 4096-byte inputs) ◦ Keccak-256 > SHA-256. ◦ Grøstl-512 > SHA-512.
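
One way to read the comparison above is as the ratio between the speed at 4096-byte inputs and the speed at 16-byte inputs. That reading is an interpretation, not something the slides state, but it can be checked against the figures on slide 31. The sketch below computes the ratio for each candidate and lists those that grow faster than SHA-2.

```python
# (speed at 16 bytes, speed at 4096 bytes), copied from slide 31.
THROUGHPUT = {
    256: {"SHA-2": (11.89, 61.43), "BLAKE": (10.93, 47.68),
          "Grøstl": (2.8, 12.38), "JH": (1.8, 8.75),
          "Keccak": (1.52, 13.7), "Skein": (9.18, 38.77)},
    512: {"SHA-2": (2.39, 21.93), "BLAKE": (3.47, 29.99),
          "Grøstl": (0.67, 7.74), "JH": (1.7, 8.64),
          "Keccak": (1.56, 7.26), "Skein": (3.78, 31.76)},
}

def growth(pair):
    """Ratio of large-message speed to small-message speed."""
    small, large = pair
    return large / small

def faster_growing_than_sha2(bits: int):
    base = growth(THROUGHPUT[bits]["SHA-2"])
    return [name for name, pair in THROUGHPUT[bits].items()
            if name != "SHA-2" and growth(pair) > base]

for bits in (256, 512):
    print(bits, faster_growing_than_sha2(bits))
```

Under this reading, only Keccak exceeds SHA-2 at 256 bits and only Grøstl at 512 bits, which is consistent with the two bullets on the slide.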

33 Performance and Block size For JH, the performance remains the same for the 256 and 512 versions. ◦ Only one large internal state of 1024 bits. For BLAKE and Keccak, the performance differs by a factor of almost two. ◦ The 256 version has a block size of 512 whereas the 512 version has a block size of 1024.

34 (Performance table for 1024-byte inputs, repeated from slide 27.)

35 Hardware vs Software implementation The paper "Visualizing area-time tradeoffs for SHA-3" covers hardware implementations of the candidates.

36 Hardware vs Software implementation

Rank   Hardware    Software
1)     Keccak      Shabal
2)     CubeHash    Skein
3)     JH          BLAKE
4)     Shabal      CubeHash
5)     Skein       Luffa
6)     Fugue       SHAvite-3
7)     Luffa       Fugue
8)     BLAKE       JH
9)     Hamsi       Hamsi
10)    SHAvite-3   Keccak
11)    Grøstl      Grøstl

37 Hardware vs Software implementation (cont.) Among the final five candidates ◦ Grøstl remains last in both implementations. ◦ Keccak has the biggest difference in terms of position. ◦ JH and BLAKE swap positions with BLAKE performing better in software. ◦ Skein is the only one to perform reasonably well in both.

38 Security of Grøstl One of the final five candidates. Developed at the Technical University of Denmark (DTU) and TU Graz. What makes Grøstl interesting? ◦ Does not use a block cipher as a building block, unlike the SHA family. ◦ Based on a small number of individual permutations. ◦ Borrows components from AES, such as the S-box.

39 Hash Function Construction Message M is padded and split into ℓ-bit message blocks m_1, …, m_t. o If the digest length n ≤ 256, ℓ = 512; otherwise ℓ = 1024. The compression function f is iterated as follows: h_i ← f(h_{i-1}, m_i) for i = 1 to t. The initial value h_0 = iv is predefined. The final value h_t is passed to the output transformation: H(M) = Ω(h_t).

40 Compression Function Based on two permutations P and Q. Defined as f(h, m) = P(h ⊕ m) ⊕ Q(m) ⊕ h. Design of P and Q: inspired by Rijndael. Each permutation consists of r rounds, and each round consists of a number of round transformations.

41 Design of P and Q (cont.) The four round transformations o AddRoundConstant o SubBytes o ShiftBytes o MixBytes One round consists of the above transformations in the following order R = MixBytes ◦ ShiftBytes ◦ SubBytes ◦ AddRoundConstant.

42 Byte Sequence to State Matrix Mapping is done in a similar way to Rijndael. The 64-byte sequence 00 01 02 … 3f is mapped to an 8×8 matrix.
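
Since the matrix figure did not survive in this transcript, the sketch below illustrates the Rijndael-style mapping, in which the byte sequence fills the matrix column by column. The function name `bytes_to_state` is illustrative.

```python
def bytes_to_state(data: bytes, rows: int = 8):
    """Map a byte sequence to a state matrix, filling it column by column."""
    cols = len(data) // rows
    return [[data[r + rows * c] for c in range(cols)] for r in range(rows)]

state = bytes_to_state(bytes(range(64)))  # the sequence 00 01 02 ... 3f
for row in state:
    print(" ".join(f"{b:02x}" for b in row))
```

The first column of the printed matrix reads 00 01 … 07 from top to bottom, and the first row reads 00 08 10 … 38.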

43 AddRoundConstant Adds a round dependent constant to the matrix. Transformation in round i is defined as A ← A ⊕ C[i]

44 SubBytes Each byte in the matrix is substituted with the corresponding value from the S-box. The S-box is the same as the one used in Rijndael. The transformation is as follows: a_{i,j} ← S(a_{i,j}), 0 ≤ i < 8, 0 ≤ j < v, where a_{i,j} is the element in row i and column j, and v is the number of columns.

45 ShiftBytes Cyclically shifts the bytes within each row to the left by a number of positions. All bytes in row i are shifted σ_i positions to the left, where σ = [0, 1, 2, 3, 4, 5, 6, 7].
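
A minimal sketch of ShiftBytes on an 8×8 state follows, treating the state as a list of rows and using the σ vector from the slide. The function name and the list-of-rows representation are assumptions made for illustration.

```python
SIGMA = [0, 1, 2, 3, 4, 5, 6, 7]

def shift_bytes(state, sigma=SIGMA):
    """Rotate row i of the state to the left by sigma[i] positions."""
    shifted = []
    for row, s in zip(state, sigma):
        s %= len(row)
        shifted.append(row[s:] + row[:s])
    return shifted

# Each row holds the column indices 0..7, so the result shows where each byte came from.
example = [list(range(8)) for _ in range(8)]
for row in shift_bytes(example):
    print(row)
```

Row 0 is left untouched, row 1 moves one position left, and so on, so the eight bytes of any one column end up in eight different columns.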

46 MixBytes Each column in the matrix is multiplied by a constant 8x8 matrix. The transformation is defined as A ← B × A.

47 Output Transformation Defined as Ω(x) = trunc_n(P(x) ⊕ x), where trunc_n(x) discards all but the trailing n bits of x and n is the length of the message digest.
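
The overall structure (iterated compression followed by the output transformation) can be sketched as below. This is a structural illustration only: `toy_p` and `toy_q` are hypothetical placeholder permutations, NOT the real Grøstl P and Q, and padding is left to the caller, so the output is not a Grøstl digest.

```python
BLOCK = 64  # bytes, i.e. 512-bit blocks for digests of at most 256 bits

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_p(x: bytes) -> bytes:
    # Placeholder permutation, NOT the real P: rotate the bytes, add 1 to each.
    return bytes((b + 1) % 256 for b in x[1:] + x[:1])

def toy_q(x: bytes) -> bytes:
    # Placeholder permutation, NOT the real Q: rotate the bytes, flip all bits.
    return bytes(b ^ 0xFF for b in x[3:] + x[:3])

def compress(h: bytes, m: bytes) -> bytes:
    """f(h, m) = P(h xor m) xor Q(m) xor h."""
    return xor(xor(toy_p(xor(h, m)), toy_q(m)), h)

def output_transform(x: bytes, n_bits: int) -> bytes:
    """Omega(x) = trunc_n(P(x) xor x): keep only the trailing n bits."""
    return xor(toy_p(x), x)[-(n_bits // 8):]

def toy_hash(padded: bytes, iv: bytes, n_bits: int = 256) -> bytes:
    assert len(padded) % BLOCK == 0, "message must already be padded"
    h = iv
    for i in range(0, len(padded), BLOCK):
        h = compress(h, padded[i:i + BLOCK])
    return output_transform(h, n_bits)

digest = toy_hash(bytes(2 * BLOCK), iv=bytes(BLOCK))
print(len(digest) * 8, digest.hex())
```

Swapping in real implementations of P, Q, the padding rule and the predefined iv would turn this skeleton into the construction described on the slides.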

48 Cryptanalysis Differential Cryptanalysis There are at least 9^2 = 81 active S-boxes in a 4-round differential trail. o MixBytes ensures that the branch number is 9: a difference in k > 0 bytes of a column results in a difference in at least 9 - k bytes after one MixBytes operation. o ShiftBytes moves the bytes of one column to 8 different columns. Maximum difference propagation probability of the S-box = 2^-6.
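
Combining the two figures on this slide gives an upper bound on the probability of any 4-round differential trail. The short derivation below assumes, as is usual in this kind of argument, that the active S-boxes contribute independently.

```latex
\Pr[\text{4-round differential trail}]
  \;\le\; \left(2^{-6}\right)^{9^{2}}
  \;=\; 2^{-6 \cdot 81}
  \;=\; 2^{-486}
```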

49 Cryptanalysis (cont.) Linear Cryptanalysis o Linear trails propagate similarly to differential trails. o Maximum correlation of the S-box = 2^-3. Integrals o Sets of plaintexts are chosen with one part held constant and the other part varying through all possibilities. o For example, an attack may choose 256 plaintexts that have all but 8 of their bits the same, but all differ in those 8 bits. o Such a set has an XOR sum of 0. o The XOR sums of the corresponding ciphertexts provide information about the cipher’s operation.

50 Integrals (cont.) ◦ Similar to integrals on AES. ◦ Grøstl-256 o 2^120 texts for 6 and 7 rounds. o The texts are balanced in every byte of input and output. ◦ Grøstl-512 o 2^704 texts for 8 and 9 rounds. o For 8 rounds, the texts are balanced in every byte of input and output. o For 9 rounds, every byte of input and every bit of output is balanced. ◦ Conclusion: Integrals cannot expose non-random behavior in Grøstl.

51 Cryptanalysis (cont.) Algebraic Cryptanalysis o Targets the AES S-box, which is used by Grøstl. o One AES encryption involves 200 S-box applications, which give 8000 quadratic equations in 1600 variables (solving them yields the key). o The time complexity of solving this system is unknown. o Grøstl-256 and Grøstl-512 have 1280 and 3584 S-box applications, respectively.

52 Rebound Attack Can be applied to designs based on block ciphers or permutations. Consists of two phases: ◦ Inbound phase: Meet-in-the-middle (E_in) plus exploiting the available degrees of freedom.

53 Rebound Attack (cont.) ◦ Outbound phase: Use the values obtained from the inbound phase to move in the forward (E_fw) and backward (E_bw) directions to find collisions. Collisions found on reduced Grøstl ◦ Grøstl-256: 4 out of 10 rounds. ◦ Grøstl-512: 5 out of 14 rounds.

54 Internal Differential Attack Exploits the differential trails between parallel computations that are not distinct enough. The idea is to devise a differential path that represents the difference between the two computation paths rather than the differences between the inputs. Grøstl has two permutations, P and Q, which are very similar to each other.

55 Internal Differential Attack (cont.) Compute two internal states, A and B. o A ⊕ B = Δ_in. o P(A) ⊕ Q(B) = Δ_out. Collisions Found: o Grøstl-256: 5 rounds, 2^79 computations and 2^64 memory. o Grøstl-512: 6 rounds, 2^177 computations and 2^64 memory. P and Q were modified for the final round of the competition to make them more different from each other.

56 Conclusion Frontrunners among the five ◦ Performance: o Good: BLAKE and Skein. o Bad: Keccak. o Ugly: Grøstl and JH. ◦ Randomness tests: Weakest is BLAKE. ◦ Novel algorithm: Skein and Keccak. ◦ Potential Winners: Skein or Keccak.

57 Thank You. Questions?

