Kris Gaj Office hours: Monday, 7:30-8:30 PM Tuesday, 6:00-7:00 PM, and by appointment Research and teaching interests: cryptography computer arithmetic.

Kris Gaj
Research and teaching interests: cryptography, computer arithmetic, VLSI design and testing
Contact: Engineering Bldg., room 3225, (703)

ECE 645 is part of:
- MS in EE and MS in CpE: pre-approved course in the Digital Systems Design concentration; elective course in other concentration areas
- Certificate in VLSI Design/Manufacturing
- PhD in IT
- PhD in ECE

DIGITAL SYSTEMS DESIGN
1. ECE 545 Digital System Design with VHDL (K. Gaj): project, FPGA design with VHDL, Aldec/Synplicity/Xilinx/Altera
2. ECE 645 Computer Arithmetic (K. Gaj): project, FPGA design with VHDL or Verilog, Aldec/Synplicity/Xilinx/Altera
3. ECE 586 Digital Integrated Circuits (D. Ioannou)
4. ECE 681 VLSI Design for ASICs (N. Klimavicz): project/lab, front-end and back-end ASIC design with Synopsys tools
5. ECE 682 VLSI Test Concepts (T. Storey): homework

Prerequisites
Permission of the instructor, granted assuming that you know:
- VHDL or Verilog
- a high-level programming language (preferably C)
ECE 545 Digital System Design with VHDL or

Prerequisite knowledge
This class assumes proficiency with the FPGA CAD tools from ECE 545. You are expected to be proficient with:
- synthesizable VHDL coding
- advanced VHDL testbenches, including file input/output
- Xilinx FPGA synthesis and post-synthesis simulation
- Xilinx FPGA place-and-route and post-place-and-route simulation
- reading and interpreting all synthesis and implementation reports

Course web page: ECE web page > Courses > Course web pages > ECE 645

Grading (Computer Arithmetic: lecture and project)
- Project 1: 20%
- Project 2: 30%
- Homework: 10%
- Midterm exam (in class): 15%
- Final exam (in class): 25%

Advanced digital circuit design course covering efficient addition and subtraction, multiplication, division and modular reduction, and exponentiation of:
- integers (unsigned and signed)
- real numbers (fixed-point; single- and double-precision floating-point)
- elements of the Galois field GF(2^n) (polynomial basis)

Course Objectives
At the end of this course you should be able to:
- understand mathematical and gate-level algorithms for computer addition, subtraction, multiplication, division, and exponentiation
- understand the tradeoffs between different arithmetic architectures in performance, area, latency, scalability, etc.
- synthesize and implement computer arithmetic blocks on FPGAs
- be comfortable with different number systems, and have familiarity with floating-point and Galois field arithmetic for future study
- understand sources of error in computer arithmetic and the basics of error analysis
This knowledge will come about through homework, projects, and practice exams.

Lecture topics (1)
INTRODUCTION
1. Applications of computer arithmetic algorithms
2. Number representation: unsigned integers, signed integers, fixed-point real numbers, floating-point real numbers, elements of the Galois field GF(2^n)
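The integer and fixed-point representations listed above can be illustrated with a minimal Python sketch, assuming n-bit two's complement for signed integers and a fixed-point format with a chosen number of fractional bits, rounded to nearest. The function names are hypothetical, chosen only for this example.

```python
def to_twos_complement(value: int, n: int) -> int:
    """Encode a signed integer as an n-bit two's complement bit pattern."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError("value out of range for n-bit two's complement")
    return value & ((1 << n) - 1)


def from_twos_complement(bits: int, n: int) -> int:
    """Decode an n-bit two's complement pattern: the MSB has weight -2^(n-1)."""
    return bits - (1 << n) if (bits >> (n - 1)) & 1 else bits


def to_fixed_point(x: float, frac_bits: int) -> int:
    """Quantize x to an integer count of 2^-frac_bits units (round to nearest)."""
    return round(x * (1 << frac_bits))


def from_fixed_point(q: int, frac_bits: int) -> float:
    """Interpret an integer as a fixed-point number with frac_bits fractional bits."""
    return q / (1 << frac_bits)


print(bin(to_twos_complement(-5, 8)))               # 0b11111011
print(from_fixed_point(to_fixed_point(0.1, 8), 8))  # 0.1015625 (rounding error)
```

With 8 fractional bits, round-to-nearest keeps the quantization error within 2^-9, a simple instance of the error sources mentioned in the course objectives.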

ADDITION AND SUBTRACTION
1. Basic addition, subtraction, and counting
2. Carry-lookahead, carry-select, and hybrid adders
3. Adders based on parallel-prefix networks

MULTIOPERAND ADDITION
1. Carry-save adders
2. Wallace and Dadda trees
3. Adding multiple unsigned and signed numbers
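Carry-save addition can be sketched in Python as a 3:2 compressor: each bit position is an independent full adder, so no carry propagates until a single final carry-propagate addition. This minimal sketch assumes non-negative operands and reduces them with a linear chain of CSAs; a Wallace or Dadda tree would instead reduce many operands in parallel levels. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 compression: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z                              # sum bit of every full adder
    c = ((x & y) | (x & z) | (y & z)) << 1     # carry bit, moved to the next weight
    return s, c


def multioperand_sum(operands: Iterable[int]) -> int:
    """Reduce operands with a chain of CSAs, then do one carry-propagate addition."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_add(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]                 # each CSA turns 3 operands into 2
    return sum(ops)                            # final carry-propagate adder


print(multioperand_sum([13, 7, 22, 5, 9]))     # 56
```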

TECHNOLOGY
1. Internal structure of Xilinx and Altera FPGAs
2. ASIC standard cell libraries and synthesis tools for ASICs
3. Two-operand and multi-operand addition in FPGAs

MULTIPLICATION
1. Tree and array multipliers
2. Sequential multipliers
3. Multiplication of signed numbers and squaring
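A sequential (radix-2 shift-and-add) multiplier processes one multiplier bit per clock cycle. The Python sketch below is a hedged behavioral model assuming unsigned n-bit operands and a 2n-bit product register whose lower half initially holds the multiplier; it is an illustration, not an RTL design from the course, and the function name is hypothetical.

```python
def shift_add_multiply(a: int, x: int, n: int) -> int:
    """Radix-2 sequential multiplier for unsigned n-bit a (multiplicand) and x (multiplier)."""
    mask = (1 << n) - 1
    if not (0 <= a <= mask and 0 <= x <= mask):
        raise ValueError("operands must be unsigned n-bit values")
    hi, lo = 0, x              # hi: upper half of the product register; lo: multiplier bits
    for _ in range(n):         # one iteration models one clock cycle
        if lo & 1:             # current multiplier bit decides whether to add
            hi += a            # (n+1)-bit adder output, carry kept in hi
        # shift the whole (carry, hi, lo) register right by one position
        lo = (lo >> 1) | ((hi & 1) << (n - 1))
        hi >>= 1
    return (hi << n) | lo


print(shift_add_multiply(13, 11, 4))  # 143
```

After n cycles the multiplier bits have been shifted out and the register holds the full 2n-bit product, which is why this organization needs only one n-bit adder.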

TECHNOLOGY
1. Pipelining
2. Multi-cycle paths
3. Multiplication in Xilinx and Altera FPGAs:
 - using distributed logic
 - using embedded multipliers
 - using DSP blocks

LONG INTEGER ARITHMETIC
1. Modular exponentiation
2. Montgomery multipliers and exponentiation units
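Montgomery multiplication replaces division by the modulus N with shifts by k bits, where R = 2^k > N and N is odd. The sketch below models the word-level REDC form of the algorithm together with left-to-right square-and-multiply exponentiation, using Python big integers; hardware Montgomery multipliers typically work bit-serially or in high radix, which this model does not attempt to reproduce. It assumes Python 3.8+ (for pow with exponent -1), and all function names are hypothetical.

```python
def montgomery_setup(n: int, k: int):
    """Return R = 2^k and n' with n * n' == -1 (mod R); requires n odd and R > n."""
    if n % 2 == 0 or (1 << k) <= n:
        raise ValueError("need an odd modulus n < 2^k")
    r = 1 << k
    return r, (-pow(n, -1, r)) % r


def mont_mul(a: int, b: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery product a * b * R^-1 mod n, for 0 <= a, b < n."""
    r_mask = (1 << k) - 1
    t = a * b
    m = ((t & r_mask) * n_prime) & r_mask   # makes t + m*n divisible by R
    u = (t + m * n) >> k                    # exact division by R is a shift
    return u - n if u >= n else u           # u < 2n, so one subtraction suffices


def mont_exp(x: int, e: int, n: int, k: int) -> int:
    """x^e mod n by left-to-right square-and-multiply in the Montgomery domain."""
    r, n_prime = montgomery_setup(n, k)
    x_bar = (x * r) % n                     # convert x to Montgomery form
    acc = r % n                             # Montgomery form of 1
    for bit in bin(e)[2:]:                  # scan exponent bits from the MSB
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x_bar, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)  # convert back to ordinary form


print(mont_exp(4, 13, 497, 9))  # 445
```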

DIVISION
1. Basic restoring and non-restoring sequential dividers
2. SRT and high-radix dividers
3. Array dividers
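A basic restoring divider produces one quotient bit per cycle by shifting in the next dividend bit and trying a subtraction. The Python model below is a sketch, assuming an unsigned 2n-bit dividend z, an n-bit divisor d, and no overflow (z < d * 2^n). Where the hardware would add d back to restore a negative partial remainder, the model simply does not commit the trial difference. The function name is hypothetical.

```python
from typing import Tuple


def restoring_divide(z: int, d: int, n: int) -> Tuple[int, int]:
    """Unsigned restoring division of a 2n-bit z by an n-bit d; returns (quotient, remainder)."""
    if d == 0 or z >= d << n:
        raise ValueError("division by zero or quotient overflow")
    rem = z >> n                    # upper half of the dividend (already < d)
    low = z & ((1 << n) - 1)        # lower half, shifted in one bit per cycle
    q = 0
    for i in range(n - 1, -1, -1):  # one iteration models one clock cycle
        rem = (rem << 1) | ((low >> i) & 1)  # shift in the next dividend bit
        trial = rem - d                      # trial subtraction
        if trial >= 0:
            rem, q = trial, (q << 1) | 1     # keep the difference, quotient bit 1
        else:
            q <<= 1                          # restore: keep old remainder, bit 0
    return q, rem


print(restoring_divide(143, 13, 4))  # (11, 0)
```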

FLOATING POINT AND GALOIS FIELD ARITHMETIC
1. Floating-point units
2. Galois field GF(2^n) units
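Multiplication in GF(2^n) with a polynomial basis is carry-free: partial products are combined with XOR and the result is reduced modulo an irreducible polynomial. The minimal sketch below uses the shift-and-XOR method and assumes the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) as a default; other field sizes need their own irreducible polynomial. The function name is hypothetical.

```python
def gf2n_mul(a: int, b: int, n: int = 8, poly: int = 0x11B) -> int:
    """Multiply a and b (both < 2^n) in GF(2^n), polynomial basis, modulo poly."""
    result = 0
    for _ in range(n):
        if b & 1:
            result ^= a      # addition in GF(2^n) is bitwise XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & (1 << n):     # degree n reached: reduce modulo poly
            a ^= poly
    return result


print(hex(gf2n_mul(0x57, 0x83)))  # 0xc1, the worked example in FIPS-197
```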

Literature (1)
Required textbook: Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design, 2nd edition, Oxford University Press, 2010.

Literature (2)
Recommended textbooks:
1. Jean-Pierre Deschamps, Gery Jean Antoine Bioul, Gustavo D. Sutter, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems, Wiley-Interscience.
2. Milos D. Ercegovac and Tomas Lang, Digital Arithmetic, Morgan Kaufmann Publishers.
3. Israel Koren, Computer Arithmetic Algorithms, 2nd edition, A. K. Peters, Natick, MA.

Literature (2)
VHDL books:
1. Pong P. Chu, RTL Hardware Design Using VHDL: Coding for Efficiency, Portability, and Scalability, Wiley-IEEE Press.
2. Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press.
3. Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing.

Literature (3)
Supplementary books:
1. E. E. Swartzlander, Jr., Computer Arithmetic, vols. I and II, IEEE Computer Society Press.
2. Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, Chapter 14, Efficient Implementation, CRC Press, Inc., 1998.

Literature (3)
Proceedings of conferences:
- ARITH: International Symposium on Computer Arithmetic
- ASIL: Asilomar Conference on Signals, Systems, and Computers
- ICCD: International Conference on Computer Design
- CHES: Workshop on Cryptographic Hardware and Embedded Systems
Journals and periodicals:
- IEEE Transactions on Computers, in particular the special issues on computer arithmetic: 8/70, 6/73, 7/77, 4/83, 8/90, 8/92, 8/94, 7/00, 3/05
- IEEE Transactions on Circuits and Systems
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- IEE Proceedings: Computers and Digital Techniques
- Journal of VLSI Signal Processing

Homework
- reading assignments
- design of small hardware units using VHDL
- analysis of computer arithmetic algorithms and implementations

Exams
- Midterm exam: 2 hrs 30 minutes, in class; multiple choice + short problems
- Final exam: 2 hrs 45 minutes, comprehensive; conceptual questions, analysis and design of arithmetic units
- Practice exams on the web
Tentative days of exams:
- Midterm exam: Monday, March 23
- Final exam: Tuesday, May 11, 7:30-10:15 PM

Project (1)
Project I (individual, 20% of grade): comprehensive analysis of basic operations of SHA-3 candidates
- different for all students; done individually
- optimization criteria: minimum latency, minimum area, minimum product latency · area, use of embedded FPGA resources (BRAMs, embedded multipliers, DSP units)
- final report due Tuesday, March 16

Limitations of the Current Approach
- Time and effort: one designer needs too much time to implement all candidates.
- Accuracy of comparison: multiple designers introduce significant inaccuracies associated with different skills and coding styles.

Problem
How can we predict the ranking and relative performance of candidate algorithms without a time-consuming hardware implementation at the Register Transfer Level (RTL)?
Applications:
- ranking of candidate algorithms submitted to contests (large number of candidates, time limit)
- ranking of candidate algorithms during the design process by the designers themselves (no experience in hardware design, short response time needed)

Features of our Problem to Exploit
- No need to obtain a functioning netlist or HDL description (performance numbers are sufficient)
- Limited accuracy required (differences in performance below 20% are considered insignificant)
- Limited number of basic operations
- Limited number of architectures used in practice

The proposed approach

Steps of Our Methodology (1)
1. Determine the minimum set of basic operations required to implement a given class of cryptographic transformations.
2. Determine the required range of parameters of these operations (e.g., operand sizes in arithmetic operations).
3. Implement basic operations in RTL VHDL (or Verilog) in a parametric fashion (using constants and generics).
4. Characterize all operations, for all required parameter values, using Xilinx and/or Altera development environments:
 - area and latency
 - low-cost FPGAs and high-performance FPGAs

Major operations of AES finalists (Mars, Twofish, Serpent, RC6, Rijndael)
- S-boxes: Mars, Twofish, Serpent, Rijndael
- Integer multiplication: Mars, RC6
- Variable rotation: Mars, RC6
- Multiplication in GF(2^m): Twofish, Rijndael

Auxiliary operations of AES finalists (Mars, Twofish, Serpent, RC6, Rijndael)
- Boolean
- Addition/subtraction
- Permutation
- Fixed rotation

Major cipher operations (1): S-box
Software:
- C: WORD S[1<<n] = { 0x23, 0x34, 0x... };
- ASM: S DW 23H, 34H, 56H, ...
Hardware: an n x m ROM or direct logic. The n-bit address x1, x2, ..., xn selects one of 2^n words, each giving the m-bit output y1, y2, ..., ym, so the S-box holds 2^n · m bits.
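The ROM view of an S-box can be modeled in a few lines of Python: the input acts as an n-bit address into a table of 2^n words of m bits, and the storage cost grows as 2^n · m. This is a sketch; the 4x4 table below is an arbitrary illustrative affine map, not taken from any real cipher, and the helper names are hypothetical.

```python
from typing import Callable, Sequence


def make_sbox(table: Sequence[int], n: int, m: int) -> Callable[[int], int]:
    """Model an n x m S-box as a ROM with 2^n words of m bits each."""
    if len(table) != 1 << n:
        raise ValueError("table must hold exactly 2^n entries")
    if any(not 0 <= v < (1 << m) for v in table):
        raise ValueError("every entry must fit in m bits")
    words = tuple(table)

    def sbox(x: int) -> int:
        return words[x & ((1 << n) - 1)]  # n-bit address selects one m-bit word

    return sbox


def rom_bits(n: int, m: int) -> int:
    """Storage of an n x m S-box implemented as a ROM: 2^n * m bits."""
    return (1 << n) * m


# Illustrative 4x4 table (an affine map, not from any real cipher)
toy = make_sbox([(7 * x + 3) % 16 for x in range(16)], n=4, m=4)
print(toy(1))           # 10
print(rom_bits(9, 32))  # 16384 bits for a 9x32 S-box such as the one in MARS
```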

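Major cipher operations (2): Variable rotation A <<< B
Software:
- C: C = (A << B) | (A >> (32 - B)); (valid for 1 ≤ B ≤ 31; for B = 0 the right shift by 32 is undefined in C)
- ASM: ROL A, B
Hardware:
- ROL32, mux-based rotation: five levels of 32-bit 2-to-1 multiplexers controlled by B[4], B[3], B[2], B[1], B[0]; the level controlled by B[4] selects A<<<16 or A<<<0, and so on down to B[0]
- high-speed clock CLK': the rotation takes min(B, 32 - B) CLK' cycles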
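The two hardware options above can be compared with a short Python model: a direct 32-bit rotate, a five-level mux-based (barrel) rotator in which level k rotates by 2^k when bit B[k] is set, and the cycle count of the fast-clock approach. This is a behavioral sketch with hypothetical names; Python integers have no undefined shifts, so B = 0 needs no special case here.

```python
MASK32 = 0xFFFFFFFF


def rol32(a: int, b: int) -> int:
    """Rotate a 32-bit word left by b positions."""
    b &= 31
    return ((a << b) | (a >> (32 - b))) & MASK32


def rol32_mux(a: int, b: int) -> int:
    """Barrel rotator: level k is a row of 2-to-1 muxes rotating by 2^k if B[k] = 1."""
    for k in range(5):            # levels for B[0] .. B[4]; order does not matter
        if (b >> k) & 1:
            a = rol32(a, 1 << k)
    return a


def serial_rotation_cycles(b: int) -> int:
    """CLK' cycles when rotating one position per cycle, left or right."""
    b &= 31
    return min(b, 32 - b)


print(hex(rol32(0x80000001, 1)))   # 0x3
print(serial_rotation_cycles(20))  # 12
```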

Auxiliary cipher operations (1): Permutation P
Hardware: only an order of wires (n inputs x1, x2, ..., xn routed to n outputs y1, y2, ..., yn)
Software: a complex sequence of instructions
- C: <<, |, &
- ASM: ROL, OR, AND
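The hardware/software contrast for permutations shows up clearly in code: what is free wiring in hardware becomes a shift, an AND, and an OR per bit in software. A minimal Python sketch, with a hypothetical 4-bit permutation chosen only for illustration:

```python
from typing import Sequence


def permute_bits(x: int, perm: Sequence[int]) -> int:
    """Bit permutation: output bit i is input bit perm[i].

    In hardware this is only wiring; in software each bit costs a shift,
    an AND and an OR.
    """
    y = 0
    for i, src in enumerate(perm):
        y |= ((x >> src) & 1) << i  # extract bit src, place it at position i
    return y


swap_pairs = [1, 0, 3, 2]  # hypothetical 4-bit permutation: swap adjacent bits
print(bin(permute_bits(0b0110, swap_pairs)))  # 0b1001
```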

Auxiliary cipher operations (4): Addition/subtraction, C = A + B mod 2^n (n = 32, 16)
Hardware: n-bit adder/subtractor
Software:
- C: unsigned long A, B, C; C = A + B; (gives mod 2^32 only where unsigned long is 32 bits wide)
- ASM: ADD
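An n-bit adder/subtractor computes A - B as A + (NOT B) + 1, and dropping the carry-out gives the result mod 2^n. A minimal Python sketch of that behavior, with a hypothetical function name:

```python
def add_sub_mod_2n(a: int, b: int, n: int, subtract: bool = False) -> int:
    """n-bit adder/subtractor: A - B is computed as A + (NOT B) + 1, result mod 2^n."""
    mask = (1 << n) - 1
    if subtract:
        b, cin = ~b & mask, 1  # invert B and set the carry-in
    else:
        cin = 0
    return (a + b + cin) & mask  # discarding the carry-out gives mod 2^n


print(hex(add_sub_mod_2n(0xFFFFFFFF, 1, 32)))        # 0x0
print(hex(add_sub_mod_2n(3, 5, 16, subtract=True)))  # 0xfffe
```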

Multiple designs for hardware adders, trading delay for area:
- Ripple-carry adder (RC)
- Carry-skip adder (CS)
- Carry-lookahead adder (CLA)
- Carry-select adder
- Parallel-prefix network adder (Kogge-Stone, Brent-Kung)
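The difference between the first and last entries in this list can be illustrated with bit-level functional models: the ripple-carry adder passes the carry through n full adders, while a Kogge-Stone parallel-prefix adder computes all carries with generate/propagate combinations in ceil(log2 n) levels. These Python models are a sketch that only checks the logic and counts prefix levels; they say nothing about real FPGA delay or area, and Brent-Kung is not shown. The function names are hypothetical.

```python
def ripple_carry_add(a: int, b: int, n: int, cin: int = 0):
    """Bit-level ripple-carry adder: the carry passes through all n full adders."""
    s, c = 0, cin
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c


def kogge_stone_add(a: int, b: int, n: int, cin: int = 0):
    """Kogge-Stone parallel-prefix adder; returns (sum, carry-out, prefix levels)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # propagate
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)  # fold the carry-in into position 0
    d, levels = 1, 0
    while d < n:
        # each position i >= d combines with position i - d (prefix operator)
        G, P = (
            [G[i] if i < d else G[i] | (P[i] & G[i - d]) for i in range(n)],
            [P[i] if i < d else P[i] & P[i - d] for i in range(n)],
        )
        d, levels = d * 2, levels + 1
    carries = [cin] + G[:n - 1]  # carry into bit i = group generate of bits 0..i-1
    s = 0
    for i in range(n):
        s |= (p[i] ^ carries[i]) << i
    return s, G[n - 1], levels


print(kogge_stone_add(200, 100, 8))  # (44, 1, 3): 300 = 256 + 44, 3 prefix levels
```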

Figure: Delay and area in HARDWARE of basic operations (axes: area, delay): addition (RC), addition (CLA), Boolean, permutation, fixed rotation, variable rotation, modular multiplication, GF(2^n) multiplication, S-box 4x4, S-box 8x8, S-box 9x32, modular inverse

Figure: Delay and memory in SOFTWARE of basic operations (axes: memory, delay): addition, multiplication, Boolean, permutation, fixed rotation, variable rotation, GF(2^n) multiplication, S-box 4x4, S-box 8x8, S-box 9x32, modular inverse

Steps of Our Methodology (2)
5. Develop a simple, human-friendly notation to describe cryptographic algorithms (or their repetitive parts, i.e., rounds) that reveals the parallelism present in the algorithm.
 - A graphical representation is more human-friendly.
 - A textual representation is easier to process by computer programs.
 Possible approach: start from a textual description and adopt one of the existing graphical editors.

Steps of Our Methodology (2)
6. Develop a tool capable of estimating algorithm performance in terms of area and throughput using:
 - a high-level description
 - a library of basic components
 - a choice of architecture
 - optimization criteria (minimum area, maximum throughput, maximum throughput-to-area ratio, etc.)
 - other constraints, such as the required clock frequency
7. Calibrate the developed tools using existing RTL designs for a limited subset of the algorithms.

Possible Problems
- Routing (interconnect) delays
- Optimizations on the boundary between two operations
- Combining multiple operations into one (e.g., using a look-up table approach)
- Inter-round optimizations
- Resource-sharing techniques, in particular resource sharing between encryption and decryption circuits
- Dependence of results on the selected FPGA devices
- Others

Summary
Main project goals:
- Provide the cryptographic community, and in particular standardization organizations and groups, with a reliable and fast way of comparing a large number of candidates for a cryptographic standard
- Save designers of cryptographic algorithms from design blunders (such as that of the IBM team in the case of MARS)
Project in progress. Feedback and collaboration are very welcome.

Figure: Delay and memory in SOFTWARE of basic operations, MARS – IBM team (axes: memory, delay): addition, multiplication, Boolean, permutation, fixed rotation, variable rotation, GF(2^n) multiplication, S-box 4x4, S-box 8x8, S-box 9x32, modular inverse

Figure: Delay and area in HARDWARE of basic operations, MARS – IBM team (axes: area, delay): addition (RC), addition (CLA), Boolean, permutation, fixed rotation, variable rotation, modular multiplication, GF(2^n) multiplication, S-box 4x4, S-box 8x8, S-box 9x32, modular inverse

Project (2)
Project II (30% of grade): real-life application
- Requirements derived from the analysis of an application
- Software implementation (typically public domain) used as a source of test vectors and to determine the HW/SW speed ratio
- Several project topics proposed on the web; you can also suggest a project topic yourself
- New design in the area of public key cryptography, cryptanalysis, digital signal processing, etc.

Project II (rules)
- Can be done in a group of 1-3 students
- Every team works on a slightly different problem
- Project topics should be more complex for larger teams
- Cooperation (but not exchange of code) between teams is encouraged
- Oral presentation and written report: Tuesday, May 4

Figure: Degrees of freedom and possible trade-offs: speed, area, power, testability (related courses: ECE 645, ECE 682, ECE 586, ECE 681)

Figure: Degrees of freedom and possible trade-offs: speed, area, latency, throughput

Primary applications (1)
Execution units of general-purpose microprocessors:
- integer units: integers (8, 16, 32, 64 bits)
- floating-point units: real numbers (32, 64 bits)

Primary applications (2)
Digital signal and digital image processing: real or complex numbers (fixed-point or floating-point), e.g., digital filters, Discrete Fourier Transform, Discrete Hilbert Transform
- general-purpose DSP processors
- specialized circuits

Primary applications (3)
Coding: elements of the Galois fields GF(2^n) (4-64 bits)
- error detection codes
- error correcting codes

Figure: Secret-key (symmetric) cryptosystems: Alice encrypts and Bob decrypts over the network, both using the shared key of Alice and Bob, K_AB

Hash function
A hash function h maps a message m of arbitrary length to a hash value h(m) of fixed length. It is computationally infeasible to find m and m' such that m ≠ m' and h(m) = h(m').
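The arbitrary-length-in, fixed-length-out property can be seen with any standard hash. This sketch uses SHA3-256 from Python's hashlib (available since Python 3.6), chosen because SHA-3 is the context of Project I; the wrapper name is hypothetical.

```python
import hashlib


def h(message: bytes) -> str:
    """Hash a message of any length to a fixed 256-bit value (64 hex digits)."""
    return hashlib.sha3_256(message).hexdigest()


print(len(h(b"")), len(h(b"x" * 10_000)))  # 64 64
```

Finding two different messages with equal outputs is exactly what collision resistance is meant to rule out; the fixed output length guarantees collisions exist, but they should be infeasible to find.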

Primary applications (4)
Cryptography:
- integers (16, 32 bits): IDEA, RC6, MARS, Twofish, Rijndael, SHA-3 candidates
- elements of the Galois field GF(2^n) (4, 8 bits)

Main and auxiliary operations of the AES finalists:
- MARS: main: MUL32, 2 x ROL32, S-box 9x32; auxiliary: XOR, ADD/SUB32
- RC6: main: 2 x SQR32, 2 x ROL32; auxiliary: XOR, ADD/SUB32
- Twofish: main: 96 S-box 4x4, 24 MUL GF(2^8); auxiliary: XOR, ADD32
- Rijndael: main: 16 S-box 8x8, 24 MUL GF(2^8); auxiliary: XOR
- Serpent: main: 8 x 32 S-box 4x4; auxiliary: XOR

Figure: Public-key (asymmetric) cryptosystems: Alice encrypts with the public key of Bob, K_B; Bob decrypts with his private key, k_B

RSA as a trap-door one-way function
Public key: $C = f(M) = M^e \bmod N$
Private key: $M = f^{-1}(C) = C^d \bmod N$
where $N = P \cdot Q$, with $P, Q$ large prime numbers, and $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$.

RSA keys
Public key: $\{e, N\}$
Private key: $\{d, P, Q\}$
$N = P \cdot Q$, $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$, where $P, Q$ are large prime numbers.
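The key relations above can be checked with a toy RSA instance in Python. This sketch uses the tiny textbook primes P = 61, Q = 53 purely for illustration (real keys use primes hundreds of digits long) and assumes Python 3.8+ for the modular inverse pow(e, -1, phi); the function names are hypothetical.

```python
from math import gcd


def rsa_keygen(p: int, q: int, e: int):
    """Return public key (e, N) and private key (d, P, Q) with e*d = 1 mod (P-1)(Q-1)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be invertible modulo (P-1)(Q-1)")
    d = pow(e, -1, phi)
    return (e, n), (d, p, q)


def rsa_encrypt(m: int, public_key) -> int:
    e, n = public_key
    return pow(m, e, n)          # C = M^e mod N


def rsa_decrypt(c: int, private_key) -> int:
    d, p, q = private_key
    return pow(c, d, p * q)      # M = C^d mod N


pub, priv = rsa_keygen(61, 53, 17)
c = rsa_encrypt(65, pub)
print(pub, priv[0], c, rsa_decrypt(c, priv))  # (17, 3233) 2753 2790 65
```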

Primary applications (5)
Cryptography:
- long integers ( ,000 bits): public key cryptography (RSA, DSA, Diffie-Hellman)
- elements of the Galois field GF(2^n) ( bits): elliptic curve cryptosystems

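Primary applications (5)
Cipher breaking: public key cryptography (RSA)
RSA public key: $\{e, N\}$; RSA private key: $\{d, P, Q\}$
Breaking RSA by factoring: given $N$, find the primes $P, Q$ with $N = P \cdot Q$, then recover $d$ from $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$.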