
1 Digital Signature Using the MD5 Algorithm: Hardware Acceleration
Final Presentation
Students: Eyal Mendel & Aleks Dyskin
Instructor: Evgeny Fiksman
High Speed Digital Systems Laboratory

2 Agenda
Introduction
HW/SW System Design
Performance Evaluation
Conclusions & Summary

4 Project Goals (Introduction)
Hardware accelerator design and implementation
Evaluation of the C-to-FPGA technique
Case study: the MD5 algorithm
Tool: ASC (A Stream Compiler)

5 MD5 Goals/Usage (Introduction)
Goal: The MD5 (Message Digest 5) algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem.
Usage: MD5 is widely used as a cryptographic hash function. Specified in RFC 1321, it has been employed in a wide variety of security applications and is commonly used to check the integrity of files.
The MD5 algorithm takes a message of arbitrary length as input and produces a 128-bit message digest of that input as output. The digest of a message or file is computed in 5 steps.

6 MD5 Steps (1) (Introduction)
Step 1: Append Padding Bits. The message is "padded" so that its length (in bits) is congruent to 448, modulo 512. Padding is always performed: a single "1" bit is appended, followed by "0" bits.
Step 2: Append Length. A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step.
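Steps 1 and 2 can be sketched in software as follows. This is a minimal illustration for byte-aligned messages, not the project's code; the function name md5_pad is hypothetical.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Apply MD5 steps 1 and 2 to a byte-aligned message."""
    # b: original length in bits, kept modulo 2**64 as RFC 1321 specifies.
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    # Step 1: a single 1 bit (0x80), then zero bits up to 448 mod 512 bits,
    # which is 56 mod 64 bytes.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step 2: append b as a 64-bit value, low-order byte first.
    return padded + struct.pack("<Q", bit_len)

block = md5_pad(b"a")
print(len(block))  # one 512-bit block, i.e. 64 bytes
```

Note that a 56-byte message no longer fits into a single block once the padding and length are added, so it expands to two 512-bit blocks.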

7 MD5 Steps (2) (Introduction)
Step 3: Initialize MD Buffer. A four-word buffer (A, B, C, D) is used to compute the message digest. Each of A, B, C, D is a 32-bit register. The registers are initialized to the following hexadecimal values: a = 0x67452301; b = 0xefcdab89; c = 0x98badcfe; d = 0x10325476
Step 4: Process Message in 16-Word Blocks. Four auxiliary functions are defined first; each takes three 32-bit words as input and produces one 32-bit word as output.
Step 5: Output. The message digest produced as output is A, B, C, D. That is, we begin with the low-order byte of A and end with the high-order byte of D.
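For reference, the register initialization and the four auxiliary functions can be written out as below. The constants and function definitions follow RFC 1321; the Python names (MASK32, digest_bytes) are chosen here for illustration only.

```python
import struct

MASK32 = 0xFFFFFFFF  # keep every result within 32 bits

# Step 3: initial register values as given in RFC 1321.
A0, B0, C0, D0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

# Step 4: auxiliary functions, one per round.
def F(x, y, z):
    return ((x & y) | (~x & z)) & MASK32

def G(x, y, z):
    return ((x & z) | (y & ~z)) & MASK32

def H(x, y, z):
    return (x ^ y ^ z) & MASK32

def I(x, y, z):
    return (y ^ (x | (~z & MASK32))) & MASK32

# Step 5: the digest is A, B, C, D, each written low-order byte first.
def digest_bytes(a, b, c, d):
    return struct.pack("<4I", a, b, c, d)
```

F acts as a bitwise selector: where a bit of x is 1 it passes the bit of y, otherwise the bit of z.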

8 ASC Overview (Introduction)
ASC (A Stream Compiler) simplifies the exploration of hardware accelerators by turning the hardware design task into a software design process that uses only 'gcc' and 'make' to obtain a hardware netlist.
A single C++ program with custom types and operators is all the syntax that is needed.
ASC provides the whole environment and implements all the protocols needed for communication between the HW module and the CPU.

9 SW Model Evaluation (1) (Introduction)
Figure: Accelerated Part
The maximum speed-up is obtained in the ideal case, in which process and speed_up take 0 sec to evaluate.
The evaluation of the finish stage was done for the worst case, i.e. the case in which the append_bits step is performed. In the general case, append_bits is performed only once per file or string.
All measurements were taken on a Xilinx PowerPC.

10 SW Model Evaluation (2) (Introduction)
For a large number of chunks, the total speed-up is expressed in terms of:
n, the number of chunks
Tsw1, Thw1, the average execution times (software, hardware) of a chunk that is not the last
Tsw2, Thw2, the average execution times (software, hardware) of the last chunk
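The slide's formula did not survive in the transcript. One natural reading of the definitions above, offered here only as an assumption, is the ratio of total software time to total hardware time over n chunks, where n - 1 chunks are "not last" and one is the last. The function name and the sample numbers are hypothetical.

```python
def total_speedup(n, t_sw1, t_hw1, t_sw2, t_hw2):
    """Assumed form: total SW time over total HW time for n chunks."""
    sw_time = (n - 1) * t_sw1 + t_sw2   # n - 1 ordinary chunks plus the last one
    hw_time = (n - 1) * t_hw1 + t_hw2
    return sw_time / hw_time

# Arbitrary illustrative times, not measurements from the project.
print(total_speedup(1, 10.0, 5.0, 9.0, 3.0))       # single chunk: t_sw2 / t_hw2
print(total_speedup(10**6, 10.0, 5.0, 9.0, 3.0))   # many chunks: tends to t_sw1 / t_hw1
```

Under this reading, the overall speed-up for large files is dominated by the speed-up of the ordinary chunks, and the cost of the last chunk becomes negligible.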

12 System High-Level (SW/HW System Design)
This module serves as the input/output of the system, starting and finishing the process.
Manages the MD5 hardware interface.
Serial communication manager between the PC and the M310 board.
Step 4 implementation.
SW reference module for comparison.

13 SW/HW Algorithm Flow (SW/HW System Design)

14 HW Accelerator Insights (SW/HW System Design)
The basic structure of the hardware module, after the initial "on paper" design, is based on periodic updates of the registers for every 4 inputs.
In each process cycle, four 32-bit words are sent to the processing unit and each register is updated once.

15 Processing Unit (SW/HW System Design)
Detailed explanation of one process cycle:
Problem: which result is relevant for a given 'i'?
Each register is updated once and serves as an input to the processing of the next word. Note that the registers themselves are updated only at the end of a process cycle, i.e. the new values become visible only in the next process cycle.
In each block, all 4 logical functions are computed and only one of them is selected with the help of the mask mechanism (next slide).
The process cycle is run 16 times per 512-bit input (32 bits × 16 = 512 bits).

16 Function Masking (SW/HW System Design)
After each process cycle we need to choose the right result from the 4 available (one per function) in order to update the registers.
The mechanism is based on the counter register i.
All the mask variables (F_mask, G_mask, etc.) are 32-bit unsigned and are initialized to '0'. The appropriate variable is set to 0xFFFFFFFF according to bits 2-3 of the counter register.
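The masking idea can be modelled in software as below. This is a sketch under two assumptions not stated in the slides: the counter i counts process cycles 0..15 within a block, and bits are numbered from 0, so bits 2-3 give the round number 0..3. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def select_masks(i):
    """Return (F_mask, G_mask, H_mask, I_mask) for process-cycle counter i."""
    round_index = (i >> 2) & 0b11   # bits 2-3 of the counter
    # Exactly one mask is all ones; the other three stay at 0.
    return tuple(MASK32 if r == round_index else 0 for r in range(4))

def masked_select(i, f_res, g_res, h_res, i_res):
    """Combine the four parallel results, keeping only the relevant one."""
    f_mask, g_mask, h_mask, i_mask = select_masks(i)
    return (f_res & f_mask) | (g_res & g_mask) | (h_res & h_mask) | (i_res & i_mask)

print(hex(masked_select(9, 0x11111111, 0x22222222, 0x33333333, 0x44444444)))
```

AND-ing with an all-ones or all-zeros word and OR-ing the four results acts as a 4-to-1 multiplexer, which suits a stream-style description where all four functions are always evaluated.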

17 T-Table Access (1) (SW/HW System Design)
Every process cycle we need to fetch 32 × 4 = 128 bits from the T-table.
Problem:
a. ASC supports only 32-bit-wide memories.
b. Using a single 2-port BRAM results in 2 clock cycles to fetch all four words.

18 T-Table Access (2) (SW/HW System Design)
Each memory bank has 2 sets of address pins: one for odd addresses and one for even addresses.
T0 and T1 get their values from the Low memory bank, and T2 and T3 get their values from the High memory bank.
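A software model of the two-bank arrangement might look like the sketch below. The T-table definition is the one from RFC 1321; the exact address layout is not given in the slides, so the split used here (words 0 and 1 of each group of four in the Low bank, words 2 and 3 in the High bank, at an even and an odd address) is an assumption, and the names low_bank, high_bank and fetch are hypothetical.

```python
import math

# T[i] = floor(2**32 * |sin(i + 1)|), i = 0..63 (RFC 1321).
T = [int((2 ** 32) * abs(math.sin(i + 1))) & 0xFFFFFFFF for i in range(64)]

# Assumed layout: each bank holds 32 words, two per process cycle.
low_bank = [T[4 * k + j] for k in range(16) for j in (0, 1)]
high_bank = [T[4 * k + j] for k in range(16) for j in (2, 3)]

def fetch(k):
    """Read T0..T3 for process cycle k using one even and one odd address per bank."""
    even, odd = 2 * k, 2 * k + 1
    return low_bank[even], low_bank[odd], high_bank[even], high_bank[odd]

print([hex(t) for t in fetch(0)])
```

With two dual-port banks, all four 32-bit constants are available in the same access, which avoids the two-cycle fetch of a single 2-port memory.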

20 HW Module Performance (Performance Evaluation)
Processing one 512-bit block of data takes 680 ns.
S_CYCLE = 4 clock cycles
S_LOOP = 16 + 1
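The figures on this slide can be cross-checked with a little arithmetic. The clock period and throughput below are derived values, not numbers stated in the presentation, and they assume that the 680 ns covers exactly S_CYCLE × S_LOOP clock cycles.

```python
S_CYCLE = 4          # clock cycles per process cycle (from the slide)
S_LOOP = 16 + 1      # process cycles per block, as written on the slide
BLOCK_TIME_NS = 680  # time to process one 512-bit block (from the slide)

clock_cycles = S_CYCLE * S_LOOP                 # 68 clock cycles per block
clock_period_ns = BLOCK_TIME_NS / clock_cycles  # implied clock period: 10 ns
clock_mhz = 1000 / clock_period_ns              # implied clock frequency: 100 MHz
throughput_mbps = 512 / BLOCK_TIME_NS * 1000    # about 753 Mbit/s for the process step alone

print(clock_cycles, clock_period_ns, clock_mhz, round(throughput_mbps))
```

The 680 ns also matches the 0.68 usec hardware process time reported in the measurements.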

21 Measurements (1) (Performance Evaluation)

String                 Init.   Append   Finish_SW   Total (SW)   Finish_HW   Total (HW)
'a'                    2.1     6.68     91.14       99.92        66.98       75.76
'Aleks'                2.1     8.58     89.62       100.3        64.78       75.46
'message digest'       2.1     13.1     86.2        101.4        57.88       73.08
All 56-byte strings    2.1     8.77     73.24       84.11        50.78       61.65

All times are in usec.
Finish_SW = Append Bits_SW + Process_SW + Output_SW
Finish_HW = Append Bits_SW + Process_HW + Output_SW
Average HW vs. SW speed-up = about 1.35 times

22 Measurements (2) (Performance Evaluation)

                       Finish Software                   Finish Hardware
String                 Append bits   Process   Output    Process
'a'                    64.1          24.84     2.2       0.68
'Aleks'                62            25.52     2.1       0.68
'message digest'       55            29        2.2       0.68
All 56-byte strings    47.9          23.14     2.2       0.68

All times are in usec.
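The two measurement tables can be checked against each other with a short script. The values are transcribed from the slides; where a cell did not survive in the transcript, the script assumes Init = 2.1 and Process_HW = 0.68 for every row and Output = 2.2 for the last two rows, which is what the stated totals imply. The variable and function names are hypothetical.

```python
# Times in usec. INIT and PROCESS_HW are assumed common to all rows.
INIT = 2.1
PROCESS_HW = 0.68

# string: (Append, Append bits, Process_SW, Output)
rows = {
    "a": (6.68, 64.1, 24.84, 2.2),
    "Aleks": (8.58, 62.0, 25.52, 2.1),
    "message digest": (13.1, 55.0, 29.0, 2.2),      # Output inferred from the total
    "56-byte strings": (8.77, 47.9, 23.14, 2.2),    # Output inferred from the total
}

def totals(append, append_bits, process_sw, output):
    """Return (total software time, total hardware-accelerated time)."""
    finish_sw = append_bits + process_sw + output
    finish_hw = append_bits + PROCESS_HW + output
    return INIT + append + finish_sw, INIT + append + finish_hw

speedups = {}
for name, values in rows.items():
    total_sw, total_hw = totals(*values)
    speedups[name] = total_sw / total_hw

average_speedup = sum(speedups.values()) / len(speedups)
print(round(average_speedup, 2))  # rounds to 1.35
```

The per-string totals reproduce the Total columns of the first table, and the mean of the four ratios agrees with the ×1.35 worst-case speed-up quoted in the conclusions.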

24 Conclusions (1) (Conclusions & Summary)
A ×1.35 speed-up is achieved with the HW implementation (worst case), based on the average speed-up calculation in the previous slides. This is the worst case (lower bound), in which the append steps are performed for every chunk; in reality they are performed only for the last "partial" chunk.
The expected speed-up in the ideal case for one chunk is:
A larger theoretical speed-up can be achieved with large data, when append_bit is evaluated only for the last chunk. In that case an ideal speed-up of 2.83 is expected, while the measurements reach a speed-up of about 2.75 (graph on the next slide).
The ASC tool proved able to implement complicated hardware modules with only a few software commands, and its code is easy to read.

25 Conclusions (2): Speed-Up Prediction
Where:
T1s, T1h are the average execution times of a chunk that is not the last
T2s, T2h are the average execution times of the last chunk
su2 is the speed-up for a not_last chunk
su1 is the speed-up for the last chunk
n is the number of chunks

26 Summary (Conclusions & Summary)
We learned ASC: the design approach and the debug and synthesis process.
We showed the feasibility of an MD5 implementation with ASC.
Design and implementation of the algorithm, from pseudo code to hardware
Masking mechanism
Parallel processing and mux-ing of the appropriate result
Overcoming hardware limitations with a creative approach (memory implementation)
Flow control
Project goals were partially achieved: the file version was not implemented.

27 Further Work (Conclusions & Summary)
Further acceleration can be reached using a pipelined architecture.
Further development of the file version.

28 The End Thank you for your time.

