Previously, we discussed “prototyping” code for SHA1 and SHA256

Previously, we discussed “prototyping” code for SHA1 and SHA256:

eval_sha1.sv (219.88 MHz)

eval_sha256.sv (129.4 MHz)

Today, we will consider prototyping the “unfolding” of SHA1 and SHA256 (2 rounds per cycle):

eval_sha1_2x.sv (151.17 MHz, 31% slower Fmax)

eval_sha256_2x.sv (86.99 MHz, 33% slower Fmax)

Note that doing 2 rounds per cycle does not reduce Fmax by 50%; the reduction is more like 31-33%.
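The Fmax figures above can be turned into a rough net speedup with a few lines of arithmetic. The Python sketch below is only an illustration: the function name `unfolding_summary` is hypothetical, and it assumes throughput is simply rounds per cycle times Fmax, ignoring any cycles spent reading words or generating padding.

```python
# Fmax figures (MHz) quoted above for the 1-round and 2-round-per-cycle prototypes.
FMAX_MHZ = {
    "sha1": (219.88, 151.17),
    "sha256": (129.4, 86.99),
}


def unfolding_summary(fmax_1x, fmax_2x, rounds_per_cycle=2):
    """Return (fractional Fmax loss, net speedup in rounds per second).

    Assumption: throughput = rounds per cycle * Fmax, with no other overhead.
    """
    fmax_loss = 1.0 - fmax_2x / fmax_1x
    speedup = rounds_per_cycle * fmax_2x / fmax_1x
    return fmax_loss, speedup


for name, (f1, f2) in FMAX_MHZ.items():
    loss, speedup = unfolding_summary(f1, f2)
    print(f"{name}: Fmax loss {loss:.1%}, round throughput x{speedup:.2f}")
```

Under that simplifying assumption, losing about a third of Fmax while doing twice the work per cycle still leaves a net gain of roughly one third in rounds per second, which is why the 31-33% figure matters more than the raw Fmax drop.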

eval_sha1: #ALUTs = 205, #registers = 680, Fmax = 219.88 MHz

eval_sha1_2x: #ALUTs = 384, #registers = 679, Fmax = 151.17 MHz

eval_sha256: #ALUTs = 526, #registers = 774, Fmax = 129.4 MHz

eval_sha256_2x: #ALUTs = 940, #registers = 779, Fmax = 86.99 MHz
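To compare the resource numbers above at a glance, the short Python sketch below computes how much each count grows from the base design to its 2x version. The dictionary and the `growth` helper are hypothetical names introduced here for illustration; the numbers are the ones reported above.

```python
# (ALUTs, registers) as reported above for each prototype.
RESOURCES = {
    "eval_sha1": (205, 680),
    "eval_sha1_2x": (384, 679),
    "eval_sha256": (526, 774),
    "eval_sha256_2x": (940, 779),
}


def growth(base, unfolded):
    """Return (ALUT ratio, register ratio) of the unfolded design over the base."""
    base_aluts, base_regs = RESOURCES[base]
    unf_aluts, unf_regs = RESOURCES[unfolded]
    return unf_aluts / base_aluts, unf_regs / base_regs


for base in ("eval_sha1", "eval_sha256"):
    alut_ratio, reg_ratio = growth(base, base + "_2x")
    print(f"{base}: ALUTs x{alut_ratio:.2f}, registers x{reg_ratio:.3f}")
```

The ALUT count grows by a bit less than 2x while the register count stays essentially flat. One plausible reading is that unfolding duplicates the combinational round logic but keeps the same state registers.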

To implement unfolding, it is best to read in all 16 words from memory (or generate the necessary padding) before processing each block.

To “hide” the delay of reading in 16 words (or generating padding), you can read ahead the 16 words (or generate the padding) for the next block.

Unfolding is possibly a good design strategy for the “DELAY” metric, but you will likely need a different design for the “AREA*DELAY” metric.

You can further improve unfolding performance by “pipelining” (see Lecture 10 on unfolding).

You can also pre-compute the W’s and the K’s, as they do not depend on A, B, C, D, E …
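As a software illustration of these points, the Python sketch below models SHA1 with two rounds per loop iteration. It is a behavioral model only, not the eval_sha1_2x.sv code; all function names (`rotl`, `f_and_k`, `one_round`, `pad`, `sha1_2x`) are hypothetical. It reads all 16 words of a block first, expands the W values before any round runs (possible because they do not depend on A..E), and then chains two rounds per iteration, with each iteration standing in for one clock cycle. The round functions and constants follow the SHA-1 definition in FIPS 180-4.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def f_and_k(t, b, c, d):
    """Round function value and constant K for SHA-1 round t."""
    if t < 20:
        return (b & c) | (~b & d), 0x5A827999
    if t < 40:
        return b ^ c ^ d, 0x6ED9EBA1
    if t < 60:
        return (b & c) | (b & d) | (c & d), 0x8F1BBCDC
    return b ^ c ^ d, 0xCA62C1D6


def one_round(state, w, t):
    """Apply a single SHA-1 round to the (a, b, c, d, e) state."""
    a, b, c, d, e = state
    f, k = f_and_k(t, b, c, d)
    # k + w does not depend on a..e, so hardware could compute it ahead of time.
    temp = (rotl(a, 5) + f + e + k + w) & MASK
    return temp, a, rotl(b, 30), c, d


def pad(message):
    """Append the 0x80 byte, zero fill, and the 64-bit big-endian bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", 8 * len(message))


def sha1_2x(message):
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    data = pad(message)
    for off in range(0, len(data), 64):
        # Read all 16 words of the block first, then expand to 80 W values.
        w = list(struct.unpack(">16I", data[off:off + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        state = tuple(h)
        # Each iteration models one clock cycle performing two chained rounds.
        for t in range(0, 80, 2):
            state = one_round(one_round(state, w[t], t), w[t + 1], t + 1)
        h = [(x + y) & MASK for x, y in zip(h, state)]
    return b"".join(struct.pack(">I", x) for x in h).hex()


# Sanity check against the standard library implementation.
assert sha1_2x(b"abc") == hashlib.sha1(b"abc").hexdigest()
```

The nested `one_round(one_round(...))` call mirrors why Fmax drops: the second round's logic depends on the first round's output within the same cycle, so the critical path grows, though by less than a factor of two.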

To implement a different unfolding or pipelining strategy for each hash algorithm, you can implement a different state machine sequence. e.g.,