Hash Function Performance Metrics

GMU SHA Core Interface & Hash Function Performance Metrics

Interface

Why Does the Interface Matter?
Pin limit: total number of I/O ports ≤ total number of FPGA I/O pins
Support for the maximum throughput: time to load the next message block ≤ time to process the current block
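The two conditions above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that one w-bit word is loaded per clock cycle and that loading and processing use the same clock. The function names and the example numbers are hypothetical and do not come from the slides.

```python
import math


def load_cycles(block_size_bits: int, w: int) -> int:
    """Cycles needed to load one message block, assuming one w-bit word per cycle."""
    return math.ceil(block_size_bits / w)


def interface_is_adequate(block_size_bits: int, w: int,
                          block_processing_cycles: int,
                          io_ports: int, fpga_io_pins: int) -> bool:
    """Check both slide conditions: the pin limit and the load-time limit."""
    pin_ok = io_ports <= fpga_io_pins
    throughput_ok = load_cycles(block_size_bits, w) <= block_processing_cycles
    return pin_ok and throughput_ok


# Hypothetical example: 512-bit block, 32-bit bus, 65 cycles per block.
print(load_cycles(512, 32))                       # 16
print(interface_is_adequate(512, 32, 65, 70, 400))
```

With these assumed numbers the next block loads in 16 cycles while the current one takes 65, so the bus width does not limit throughput.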

Interface: Two possible solutions
[Figure: two SHA core variants; labels: msg_bitlen, message, end_of_msg, zero_word]
Solution 1: length of the message communicated at the beginning
+ easy to implement passive source circuit
− area overhead for the counter of message bits
Solution 2: dedicated end-of-message port
− more intelligent source circuit required
+ no need for internal message bit counter

SHA Core: Interface & Typical Configuration
[Figure: Input FIFO, SHA core, and Output FIFO connected in series, each with clk and rst. Data buses (w bits each): ext_idata, idata, odata, ext_odata. FIFO status and control signals: fifoin_full, fifoin_empty, fifoin_write, fifoin_read, fifoout_full, fifoout_empty, fifoout_write, fifoout_read. SHA core handshake ports: src_ready, src_read, dst_ready, dst_write.]
The SHA core is an active component; the surrounding FIFOs are passive and widely available.
The input interface is separate from the output interface.
Processing the current block, reading the next block, and storing the result for the previous message can all be done in parallel.

SHA Core Interface
[Figure: SHA core block with ports clk, rst, din, dout (w-bit data buses), src_ready, src_read, dst_ready, dst_write]

SHA Core Interface + Surrounding FIFOs
[Figure: Input FIFO, SHA core, and Output FIFO sharing clk and rst. Input FIFO signals: ext_idata, fifoin_full, fifoin_write, fifoin_empty, fifoin_read, idata. SHA core ports: din, dout, src_ready, src_read, dst_ready, dst_write. Output FIFO signals: odata, fifoout_full, fifoout_write, fifoout_empty, fifoout_read, ext_odata. Data buses are w bits wide.]

Operation of FIFO
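As a rough behavioral picture of the FIFOs surrounding the core, the following Python sketch models a FIFO with the four signals named on the slides (full, empty, write, read). It is a software illustration only, not the VHDL used in the project; the class name, the depth parameter, and the choice to ignore a write when full and a read when empty are assumptions.

```python
from collections import deque


class Fifo:
    """Behavioral model of a passive FIFO with full/empty flags (illustrative)."""

    def __init__(self, depth: int):
        self.depth = depth
        self._words = deque()

    @property
    def full(self) -> bool:
        return len(self._words) == self.depth

    @property
    def empty(self) -> bool:
        return len(self._words) == 0

    def write(self, word: int) -> bool:
        """Store a word; the write is ignored (returns False) when the FIFO is full."""
        if self.full:
            return False
        self._words.append(word)
        return True

    def read(self):
        """Return the oldest word, or None when the FIFO is empty."""
        if self.empty:
            return None
        return self._words.popleft()


fifo = Fifo(depth=2)
fifo.write(0xAA)
fifo.write(0xBB)
print(fifo.full, fifo.read() == 0xAA)
```

In this model the active side (the SHA core, in the slides' terminology) decides when to call read or write after inspecting the flags; the FIFO itself never initiates a transfer.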

Communication Protocol for Unpadded Messages
[Figure: streams of w-bit words.
a) msg_bitlen, message
b) seg_0_bitlen, seg_0, seg_1_bitlen, seg_1, …, seg_n-1_bitlen, seg_n-1
The figure also labels a zero_word.]
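Format a) can be pictured as a length word followed by the message words. The sketch below builds such a word stream in Python under several assumptions that the slides do not state: msg_bitlen occupies exactly one w-bit word, the message is a whole number of bytes, words are packed big-endian, and the last word is filled with zero bits. The function name frame_unpadded is hypothetical.

```python
def frame_unpadded(message: bytes, w: int = 32) -> list[int]:
    """Return [msg_bitlen, word_0, word_1, ...] as w-bit integers (illustrative)."""
    if w % 8 != 0:
        raise ValueError("this sketch assumes w is a multiple of 8")
    word_bytes = w // 8
    bitlen = len(message) * 8
    if bitlen >= 1 << w:
        raise ValueError("msg_bitlen does not fit in one w-bit word")
    # Zero-fill the final word (an assumption, not taken from the slides).
    padded_len = -(-len(message) // word_bytes) * word_bytes
    data = message.ljust(padded_len, b"\x00")
    words = [bitlen]
    for i in range(0, len(data), word_bytes):
        words.append(int.from_bytes(data[i:i + word_bytes], "big"))
    return words


print([hex(x) for x in frame_unpadded(b"abc", w=32)])  # ['0x18', '0x61626300']
```

The segmented format b) would repeat the same length-then-data pattern once per segment.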

SHA Core Interface with Additional Faster I/O Clock
[Figure: SHA core block with ports clk, io_clk, rst, din, dout, src_ready, src_read, dst_ready, dst_write]

SHA Core Interface with Two Clocks + Surrounding FIFOs
[Figure: Input FIFO, SHA core, and Output FIFO with clk, io_clk, and rst. Input FIFO signals: ext_idata, fifoin_full, fifoin_write, fifoin_empty, fifoin_read, idata. SHA core ports: din, dout, src_ready, src_read, dst_ready, dst_write. Output FIFO signals: odata, fifoout_full, fifoout_write, fifoout_empty, fifoout_read, ext_odata. Data buses are w bits wide.]

Communication Protocol for Padded Messages Without Message Splitting
[Figure: stream of w-bit words; labels: msg_len_ap | last = 1, msg_len_bp, message]
msg_len_ap: message length after padding [bits]
msg_len_bp: message length before padding [bits]

Communication Protocol for Padded Messages With Message Splitting
[Figure: stream of w-bit words; labels: seg_0_len_ap | last=0, seg_0, seg_1_len_ap | last=0, seg_1, …, seg_n-1_len_ap | last=1, seg_n-1, seg_n-1_len_bp]
seg_i_len_ap: segment i length after padding* [bits]
seg_i_len_bp: segment i length before padding [bits]
* For all i < n-1, the length of segment i after padding is assumed to be a multiple of the message block size, b (characteristic of each function), and thus also of the word size, w.
The last segment cannot consist of only padding bits. It must include at least one message bit.

Performance Metrics

Performance Metrics - Speed
Throughput for long messages [Mbit/s]
Throughput for short messages [Mbit/s]
Execution time for short messages [ns]
These metrics allow for easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), and ASICs (various libraries).

Performance Metrics - Speed
Time to hash N blocks of message [cycles] = Htime(N). The exact formula comes from analysis of a block diagram and is confirmed by functional simulation.
Minimum clock period [ns] = T. Taken from a place & route and/or static timing analysis report file.

Time to Hash N Blocks of the Message [clock cycles]

Performance Metrics - Speed
Minimum time to hash N blocks of message [ns]:
$$\text{Htime}(N) \cdot T$$
Maximum throughput (for long messages):
$$\frac{\text{block\_size}}{T \cdot (\text{Htime}(N+1) - \text{Htime}(N))} = \frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$
Effective maximum throughput for short messages: [formula not preserved in the transcript]
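The long-message throughput formula can be evaluated directly once Htime(N) and T are known. The sketch below is a minimal illustration; the Htime formula (3 + 65·N cycles), the 512-bit block size, and the 5 ns clock period are hypothetical values chosen for the example, not figures from the slides.

```python
from typing import Callable


def hash_time_ns(htime: Callable[[int], int], n_blocks: int, t_ns: float) -> float:
    """Minimum time to hash n_blocks blocks: Htime(N) * T, in ns."""
    return htime(n_blocks) * t_ns


def max_throughput_mbit_s(htime: Callable[[int], int],
                          block_size_bits: int, t_ns: float, n: int = 1) -> float:
    """block_size / (T * (Htime(N+1) - Htime(N))), converted to Mbit/s."""
    block_processing_time = htime(n + 1) - htime(n)  # cycles per extra block
    # bits per ns equals Gbit/s; multiply by 1000 to obtain Mbit/s.
    return 1000.0 * block_size_bits / (t_ns * block_processing_time)


def htime_example(n: int) -> int:
    """Hypothetical cycle-count formula: 3 setup cycles plus 65 cycles per block."""
    return 3 + 65 * n


print(hash_time_ns(htime_example, 2, 5.0))                      # 665.0
print(round(max_throughput_mbit_s(htime_example, 512, 5.0), 2)) # 1575.38
```

Because the fixed 3-cycle overhead cancels in Htime(N+1) − Htime(N), it affects only short messages, which is why the slides treat short-message throughput separately.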

Performance Metrics - Speed
Maximum throughput (for long messages):
$$\frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$
block_size: from the specification
T: from the place & route report and/or static timing analysis report
block_processing_time: from analysis of the block diagram and/or functional simulation

Performance Metrics - Area
For the basic, folded, and unrolled architectures, we force these vectors to look as follows through the synthesis and implementation options: [Figure: resource vectors, labeled Area]

Choice of Optimization Target
Primary optimization target: throughput-to-area ratio
Features:
practical: good balance between speed and cost
very reliable guide through the entire design process, facilitating the choice of high-level architecture, the implementation of basic components, and the choice of tool options
leads to high-speed, close-to-maximum-throughput designs
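Using the throughput-to-area ratio as a selection criterion amounts to a single division per candidate design. The sketch below ranks three candidates; the throughput and area numbers are made up for illustration, and the area unit is left abstract because the transcript does not preserve it.

```python
def throughput_to_area(throughput_mbit_s: float, area: float) -> float:
    """Throughput-to-area ratio; higher is better."""
    return throughput_mbit_s / area


# Hypothetical candidates: name -> (throughput in Mbit/s, area in abstract units).
candidates = {
    "basic": (1575.0, 1200.0),
    "folded": (800.0, 500.0),
    "unrolled": (1700.0, 2600.0),
}

ratios = {name: throughput_to_area(t, a) for name, (t, a) in candidates.items()}
best = max(ratios, key=ratios.get)
print(best, round(ratios[best], 3))  # folded 1.6
```

In this invented example the unrolled design is the fastest but has the worst ratio, which shows how the metric balances speed against cost.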

Our Design Flow
[Flow chart; elements: Specification; Interface; Controller Template; Datapath Block Diagram; Controller ASM Chart; Library of Basic Components; VHDL Code; Formulas for Throughput & Hash Time; Max. Clock Freq.; Resource Utilization; Throughput, Area, Throughput/Area, Hash Time for Short Messages]

How to compare hardware speed vs. software speed?
eBASH reports (http://bench.cr.yp.to/results-hash.html)
In graphs: Time(n) = time in clock cycles vs. message size in bytes for n-byte messages, with n = 0, 1, 2, 3, …, 2048, 4096
In tables: performance in cycles/byte for n = 8, 64, 576, 1536, 4096, long msg
$$\text{Performance for long message} = \frac{\text{Time}(4096) - \text{Time}(2048)}{2048}$$
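The long-message figure is the slope between the 2048-byte and 4096-byte measurements, which removes the fixed per-message overhead. A minimal sketch follows; the cycle counts in the example are hypothetical and are not eBASH results.

```python
def long_message_cycles_per_byte(time_2048: float, time_4096: float) -> float:
    """(Time(4096) - Time(2048)) / 2048, in cycles/byte."""
    return (time_4096 - time_2048) / 2048


# Hypothetical cycle counts for 2048-byte and 4096-byte messages.
print(long_message_cycles_per_byte(30000, 58000))  # 13.671875
```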

How to compare hardware speed vs. software speed?
$$\text{Throughput [Gbit/s]} = \frac{8\ \text{bits/byte} \cdot \text{clock frequency [GHz]}}{\text{Performance for long message [cycles/byte]}}$$
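This conversion puts a software result in cycles/byte on the same scale as a hardware throughput. The sketch below applies the formula; the 3 GHz clock and 8 cycles/byte are example values only, not measurements.

```python
def software_throughput_gbit_s(clock_ghz: float, cycles_per_byte: float) -> float:
    """8 bits/byte * clock frequency [GHz] / performance [cycles/byte]."""
    return 8.0 * clock_ghz / cycles_per_byte


# Hypothetical processor: 3 GHz clock, 8 cycles/byte for long messages.
print(software_throughput_gbit_s(3.0, 8.0))  # 3.0
```

The result in Gbit/s can be compared with a hardware figure in Mbit/s after multiplying by 1000.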