Previously, we discussed “prototyping” code for SHA1 and SHA256

Previously, we discussed “prototyping” code for SHA1 and SHA256:
- eval_sha1.sv (219.88 MHz)
- eval_sha256.sv (129.4 MHz)

Today, we will consider prototyping the “unfolding” of SHA1 and SHA256 (2 rounds per cycle):
- eval_sha1_2x.sv (151.17 MHz, 31% lower Fmax)
- eval_sha256_2x.sv (86.99 MHz, 33% lower Fmax)

Note that doing 2 rounds per cycle does not reduce Fmax by 50%, only by about 31-33%. Because the number of cycles per block is halved, the time to hash a block still goes down.
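To see why a 31-33% drop in Fmax still pays off, the arithmetic below converts each Fmax into time per 512-bit block. It is a back-of-the-envelope sketch, assuming 80 rounds per block for SHA1 and 64 for SHA256 and counting only the round cycles (no cycles for reading the 16 words or generating padding). The names `ROUNDS`, `DESIGNS`, and `block_latency_ns` are hypothetical.

```python
# Compression rounds per 512-bit block (SHA-1: 80, SHA-256: 64).
ROUNDS = {"sha1": 80, "sha256": 64}

# (design, algorithm, rounds per cycle, Fmax in MHz) from the results above.
DESIGNS = [
    ("eval_sha1",      "sha1",   1, 219.88),
    ("eval_sha1_2x",   "sha1",   2, 151.17),
    ("eval_sha256",    "sha256", 1, 129.4),
    ("eval_sha256_2x", "sha256", 2, 86.99),
]


def block_latency_ns(algo, rounds_per_cycle, fmax_mhz):
    """Time for the round cycles of one block; ignores load/padding cycles (assumption)."""
    cycles = ROUNDS[algo] / rounds_per_cycle
    return cycles * 1000.0 / fmax_mhz  # clock period in ns = 1000 / Fmax(MHz)


for name, algo, r, f in DESIGNS:
    print(f"{name:15s} {block_latency_ns(algo, r, f):7.1f} ns/block")

# Speedup of the 2x (unfolded) design over the 1x design for each algorithm.
for algo, base, unfolded in (("sha1", 219.88, 151.17), ("sha256", 129.4, 86.99)):
    s = block_latency_ns(algo, 1, base) / block_latency_ns(algo, 2, unfolded)
    print(f"{algo}: 2x unfolding is {s:.2f}x faster per block")
```

Under these assumptions, unfolding gives a per-block speedup of roughly 1.3x to 1.4x rather than 2x, since each cycle gets longer.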

Design            ALUTs   Registers   Fmax (MHz)
eval_sha1           205         680       219.88
eval_sha1_2x        384         679       151.17
eval_sha256         526         774       129.40
eval_sha256_2x      940         779        86.99

Unfolding nearly doubles the ALUT count, while the number of registers stays essentially unchanged.
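The “AREA*DELAY” metric mentioned later can be estimated directly from this table. The sketch below assumes that area is measured in ALUTs and that delay is the round time for one block (80 rounds for SHA1, 64 for SHA256, divided by the unfolding factor, at Fmax). The course's exact metric definitions may differ, and `RESULTS` and `area_delay` are hypothetical names.

```python
# Synthesis results from the table above.
# rounds: compression rounds per block; unfold: rounds per cycle.
RESULTS = {
    "eval_sha1":      dict(aluts=205, fmax=219.88, rounds=80, unfold=1),
    "eval_sha1_2x":   dict(aluts=384, fmax=151.17, rounds=80, unfold=2),
    "eval_sha256":    dict(aluts=526, fmax=129.4,  rounds=64, unfold=1),
    "eval_sha256_2x": dict(aluts=940, fmax=86.99,  rounds=64, unfold=2),
}


def area_delay(r):
    """ALUTs x (ns per block); assumes area = ALUTs and ignores load/padding cycles."""
    delay_ns = (r["rounds"] / r["unfold"]) * 1000.0 / r["fmax"]
    return r["aluts"] * delay_ns


for name, r in RESULTS.items():
    print(f"{name:15s} area*delay = {area_delay(r):10.0f} ALUT*ns")
```

With these assumptions, both 2x designs score worse on area*delay (about 1.36x for SHA1 and 1.33x for SHA256). The ALUT count nearly doubles while the delay improves by less than 2x.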

To implement unfolding:
- It is best to read in all 16 words from memory (or generate the necessary padding) before processing each block.
- To “hide” the delay of reading in the 16 words (or generating padding), you can read ahead the 16 words (or generate the padding) for the next block while the current block is being processed.
- Unfolding is possibly a good design strategy for the “DELAY” metric, but you will likely need a different design for the “AREA*DELAY” metric.
- Unfolding performance can be improved further by “pipelining” (see Lecture 10 on unfolding).
- The W’s and the K’s can also be pre-computed, since they do not depend on A, B, C, D, E, …
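As a software reference for these points, here is a minimal Python model of SHA1 in which several rounds are chained per loop iteration. Each iteration stands in for one clock cycle of an unfolded datapath. It assumes all 16 words of the block are already buffered, and it computes the W’s and K’s independently of A..E. This is a sketch for checking results, not the course's eval_sha1_2x.sv. All function names are hypothetical, and the cycle count covers only the round cycles.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

# K_t depends only on the round index t, so it can be a precomputed table.
K = [0x5A827999] * 20 + [0x6ED9EBA1] * 20 + [0x8F1BBCDC] * 20 + [0xCA62C1D6] * 20


def rotl(x, n):
    """32-bit rotate left."""
    return ((x << n) | (x >> (32 - n))) & MASK


def f(t, b, c, d):
    """SHA-1 round function f_t(B, C, D)."""
    if t < 20:
        return (b & c) | (~b & d)
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d  # rounds 20-39 and 60-79


def message_schedule(block):
    """Expand one 64-byte block into W[0..79]; needs only the 16 message words, not A..E."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def one_round(state, t, w):
    """One SHA-1 round: A' = ROTL5(A) + f + E + K_t + W_t, then shift the registers."""
    a, b, c, d, e = state
    temp = (rotl(a, 5) + f(t, b, c, d) + e + K[t] + w[t]) & MASK
    return (temp, a, rotl(b, 30), c, d)


def compress_unfolded(h, block, unfold=2):
    """Compress one block, chaining `unfold` rounds per iteration (= one modeled cycle)."""
    assert 80 % unfold == 0
    w = message_schedule(block)  # W's computed ahead of the round datapath
    state = tuple(h)
    cycles = 0
    for base in range(0, 80, unfold):
        for t in range(base, base + unfold):  # combinationally chained rounds
            state = one_round(state, t, w)
        cycles += 1
    return [(x + y) & MASK for x, y in zip(h, state)], cycles


def sha1_pad(msg):
    """Append 0x80, zero bytes, and the 64-bit big-endian bit length."""
    bit_len = 8 * len(msg)
    msg += b"\x80"
    msg += b"\x00" * ((56 - len(msg) % 64) % 64)
    return msg + struct.pack(">Q", bit_len)


def sha1_unfolded(msg, unfold=2):
    """Return (hex digest, total round cycles) for the unfolded model."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    padded = sha1_pad(msg)
    total_cycles = 0
    for i in range(0, len(padded), 64):
        h, cycles = compress_unfolded(h, padded[i:i + 64], unfold)
        total_cycles += cycles
    return b"".join(struct.pack(">I", x) for x in h).hex(), total_cycles


if __name__ == "__main__":
    digest, cycles = sha1_unfolded(b"abc", unfold=2)
    # Cross-check the unfolded model against Python's built-in SHA-1.
    assert digest == hashlib.sha1(b"abc").hexdigest()
    print(digest, cycles)
```

For `b"abc"` (one padded block), this prints the standard SHA-1 test digest a9993e364706816aba3e25717850c26c9cd0d89d and 40 cycles (80 rounds at 2 rounds per cycle). Changing `unfold` changes only the cycle count, not the digest.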

To use a different unfolding or pipelining strategy for each hash algorithm, you can implement a different state machine sequence for each one, e.g.,