Kris Gaj
Office hours: Monday, 7:30-8:30 PM; Thursday, 7:30-8:30 PM
Research and teaching interests: cryptography, computer arithmetic, VLSI design and testing
Contact: Science & Technology II, room 223; kgaj@gmu.edu, kgaj01@yahoo.com; (703) 993-1575

ECE 645 is part of:
- MS in EE and MS in CpE: required course in the Digital Systems Design concentration, elective course in other concentration areas
- Certificate in VLSI Design/Manufacturing
- PhD in IT
- PhD in ECE

Spring 2006 enrollment as of January 23, 2006:
MS in CpE: 7
MS in EE: 6
BS in CpE: 1
PhD in ECE: 1
PhD in IT: 1
MS in ISA: 1
NDG: 1

DIGITAL SYSTEMS DESIGN (concentration advisor: Ken Hintz)
1. ECE 545 Introduction to VHDL (K. Gaj, K. Hintz): project, VHDL, Aldec/Synplicity/Xilinx and ModelSim/Synopsys
2. ECE 645 Computer Arithmetic: HW and SW Implementation (K. Gaj): project, VHDL, Aldec/Synplicity/Xilinx and ModelSim/Synopsys
3. ECE 586 Digital Integrated Circuits (D. Ioannou)
4. ECE 681 VLSI Design Automation (T. Storey): project/lab, back-end design with Synopsys tools

Figure: design levels (algorithmic, register-transfer, gate, transistor, layout, devices) and the courses that cover them: ECE 645 Computer Arithmetic, ECE 545 Introduction to VHDL, ECE 586 Digital Integrated Circuits, ECE 681 VLSI Design Automation, ECE 684 MOS Device Electronics, ECE 584 Semiconductor Device Fundamentals.

Prerequisites: ECE 545 Introduction to VHDL, or permission of the instructor, granted assuming that you know:
- VHDL or Verilog
- a high-level programming language (preferably C)

Course web page: ECE web page > Courses > Course web pages > ECE 645
http://teal.gmu.edu/courses/ECE645/index.htm

Computer Arithmetic: lecture + project
Project 1: 20%
Project 2: 30%
Homework: 15%
Midterm exam 1 (in class): 20%
Midterm exam 2 (take-home): 15%

Advanced digital circuit design course covering efficient:
- addition and subtraction
- multiplication
- division and modular reduction
- exponentiation
of:
- integers (unsigned and signed)
- real numbers (fixed point; single and double precision floating point)
- elements of the Galois field GF(2^n) (polynomial basis)

Lecture topics (1): INTRODUCTION
1. Applications of computer arithmetic algorithms
2. Number representation:
   - unsigned integers
   - signed integers
   - fixed-point real numbers
   - floating-point real numbers
   - elements of the Galois field GF(2^n)

ADDITION AND SUBTRACTION
1. Basic addition, subtraction, and counting
2. Carry-lookahead, carry-select, and hybrid adders
3. Adders based on parallel prefix networks
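A parallel prefix adder (item 3 above) computes every carry at once by merging bit-level generate and propagate signals in about log2(n) levels. The following is a minimal Python sketch of one such network, the Kogge-Stone topology, with hardware signals modeled as Python lists; the function name `kogge_stone_add` is our own, and a carry-in of 0 is assumed.

```python
def kogge_stone_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Add two unsigned width-bit integers with a Kogge-Stone prefix network.

    Returns (sum mod 2**width, carry_out). Carry-in is assumed to be 0.
    """
    # Bit-level generate (g) and propagate (p) signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Prefix levels: at distance d, position i merges its group (G, P)
    # with the group ending at position i - d. After ceil(log2(width))
    # levels, G[i] is the carry out of bit position i.
    G, P = g[:], p[:]
    d = 1
    while d < width:
        new_G, new_P = G[:], P[:]
        for i in range(d, width):
            new_G[i] = G[i] | (P[i] & G[i - d])
            new_P[i] = P[i] & P[i - d]
        G, P = new_G, new_P
        d *= 2

    # Sum bit i = p_i XOR (carry into bit i) = p_i XOR G[i - 1].
    s = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s, G[width - 1]
```

Each pass of the `while` loop corresponds to one row of prefix cells, so a 256-bit adder of this kind needs 8 rows; the price is a large number of cells and long wires per row.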

MULTIOPERAND ADDITION
1. Carry-save adders
2. Wallace and Dadda trees
3. Adding multiple signed numbers
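A carry-save adder (item 1) is a row of independent full adders that turns three operands into a sum word and a carry word without propagating any carry. The Python sketch below reduces a list of unsigned operands with such rows and finishes with one carry-propagate addition; it uses a simple linear chain of carry-save adders, whereas Wallace and Dadda trees arrange the same 3:2 counters into a tree of logarithmic depth. The function names are illustrative.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """One row of full adders (3:2 counters): x + y + z == s + c."""
    s = x ^ y ^ z                              # per-bit sum output
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority = carry, weighted x2
    return s, c


def multioperand_add(operands: list[int]) -> int:
    """Sum unsigned integers: carry-save reduction down to two words,
    then a single carry-propagate addition."""
    ops = list(operands)
    while len(ops) > 2:
        x, y, z = ops.pop(), ops.pop(), ops.pop()
        ops.extend(carry_save_add(x, y, z))   # 3 words in, 2 words out
    return sum(ops)                            # final carry-propagate adder
```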

MULTIPLICATION
1. Tree and array multipliers
2. Sequential multipliers
3. Multiplication of signed numbers and squaring
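A sequential multiplier (item 2) examines one multiplier bit per clock cycle. The sketch below models the radix-2 right-shift algorithm in Python, following the recurrence p(j+1) = (p(j) + x_j · a · 2^k) / 2 that is common in textbook treatments; the function name and the use of an unbounded Python integer for the 2k-bit partial product register are assumptions of this sketch.

```python
def sequential_multiply(a: int, x: int, k: int) -> int:
    """Radix-2 shift-and-add multiplication of two unsigned k-bit integers.

    One loop iteration models one clock cycle; multiplier bits are used LSB first.
    """
    if not (0 <= a < 2**k and 0 <= x < 2**k):
        raise ValueError("operands must be unsigned k-bit integers")
    p = 0                      # partial product register (2k bits in hardware)
    for j in range(k):
        if (x >> j) & 1:
            p += a << k        # add multiplicand aligned with the upper half
        p >>= 1                # shift right; the bit shifted out is always 0 here
    return p
```

After k cycles the recurrence gives p = a · x exactly, so the loop count, not the operand values, fixes the latency at k cycles.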

DIVISION
1. Basic restoring and non-restoring sequential dividers
2. SRT and high-radix dividers
3. Array dividers
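A restoring sequential divider (item 1) produces one quotient bit per cycle by a trial subtraction that is undone ("restored") when the partial remainder goes negative. A minimal Python sketch, assuming a 2k-bit unsigned dividend, a k-bit divisor, and a k-bit quotient; the function name is our own.

```python
def restoring_divide(z: int, d: int, k: int) -> tuple[int, int]:
    """Radix-2 restoring division: 2k-bit dividend z by k-bit divisor d.

    Returns (quotient, remainder) with z == q * d + r and 0 <= r < d.
    """
    if d == 0:
        raise ZeroDivisionError("divisor is zero")
    if (z >> k) >= d:
        raise OverflowError("quotient does not fit in k bits")
    r = z >> k                          # partial remainder: upper half of z
    q = 0
    for j in range(k - 1, -1, -1):      # one quotient bit per cycle, MSB first
        r = (r << 1) | ((z >> j) & 1)   # shift in the next dividend bit
        q <<= 1
        trial = r - d                   # tentative subtraction
        if trial >= 0:
            r = trial                   # keep it: quotient bit = 1
            q |= 1
        # otherwise keep the old r ("restore"): quotient bit = 0
    return q, r
```

The overflow check up front is the usual precondition for this kind of divider: if the upper half of the dividend is not smaller than the divisor, the quotient needs more than k bits.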

FLOATING POINT AND GALOIS FIELD ARITHMETIC
1. Floating-point units
2. Galois field GF(2^n) units
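In polynomial basis, an element of GF(2^m) is an m-bit word whose bits are polynomial coefficients: addition is bitwise XOR, and multiplication is carry-less shift-and-add followed by reduction modulo an irreducible polynomial. The Python sketch below uses the MSB-first (left-to-right) variant; the example polynomial 0x11B = x^8 + x^4 + x^3 + x + 1 is the one used by AES (Rijndael), and the function name is illustrative.

```python
def gf2m_multiply(a: int, b: int, m: int, poly: int) -> int:
    """Multiply a, b in GF(2^m), polynomial basis.

    poly is the irreducible field polynomial including the x^m term,
    e.g. 0x11B for GF(2^8) as used in AES.
    """
    result = 0
    for i in range(m - 1, -1, -1):   # scan b from its most significant bit
        result <<= 1                 # multiply accumulator by x
        if result >> m:              # degree reached m: reduce modulo poly
            result ^= poly
        if (b >> i) & 1:
            result ^= a              # add a (addition in GF(2^m) is XOR)
    return result
```

In hardware, each loop iteration becomes one row of AND and XOR gates (bit-parallel multiplier) or one clock cycle (bit-serial multiplier).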

Similar courses at other universities
- University of California, Santa Barbara: Behrooz Parhami, ECE252B: Computer Arithmetic
- University of Massachusetts, Amherst: Israel Koren, ECE666: Digital Computer Arithmetic
- Lehigh University: Michael Schulte, ECE496: High-Speed Computer Arithmetic
- Worcester Polytechnic Institute: Berk Sunar, EE-579 V: Computer Arithmetic Circuits
- Stanford University: Michael Flynn, EE486: Advanced Computer Arithmetic
- University of California, Davis: Vojin Oklobdzija, ECE278: Computer Arithmetic for Digital Implementation

New in this course:
- a real-life project based on VHDL or Verilog HDL
- operations in the Galois field (with applications in cryptography and communications)

Possible topics for a scholarly paper or research project for CpE and EE students: advanced computer arithmetic
- square root
- exponential and logarithmic functions
- trigonometric functions
- hyperbolic functions
- fault-tolerant arithmetic
- low-power arithmetic
- high-throughput arithmetic

Three curriculum options
- MS Thesis Option: 2 core courses, 4 required courses, 2 elective courses, ECE 799 Master's Thesis (6 cr. hrs)
- Research Project Option: 2 core courses, 4 required courses, 3 elective courses, ECE 798 Research Project
- Scholarly Paper Option: 2 core courses, 4 required courses, 4 elective courses, scholarly paper

Literature (1)
Required textbook:
- Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
Recommended textbooks:
- Milos D. Ercegovac and Tomas Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2004.
- Israel Koren, Computer Arithmetic Algorithms, 2nd edition, A. K. Peters, Natick, MA, 2002.

Literature (2): VHDL books (used in ECE 545 in Fall 2005)
1. Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.
2. Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004.

Literature (3): supplementary books
1. E. E. Swartzlander, Jr., Computer Arithmetic, vols. I and II, IEEE Computer Society Press, 1990.
2. Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, Chapter 14: Efficient Implementation, CRC Press, Inc., 1998.
3. Christof Paar, Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields, VDI Verlag, 1994.

Literature (4)
Conference proceedings:
- ARITH: International Symposium on Computer Arithmetic
- ASIL: Asilomar Conference on Signals, Systems, and Computers
- ICCD: International Conference on Computer Design
- CHES: Workshop on Cryptographic Hardware and Embedded Systems
Journals and periodicals:
- IEEE Transactions on Computers, in particular the special issues on computer arithmetic: 8/70, 6/73, 7/77, 4/83, 8/90, 8/92, 8/94
- IEEE Transactions on Circuits and Systems
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- IEE Proceedings: Computers and Digital Techniques
- Journal of VLSI Signal Processing

Homework
- reading assignments (main textbook + articles)
- analysis of hardware and software algorithms and implementations
- design of small hardware units using VHDL or Verilog
- optional assignments
- possibility of trading analysis vs. design vs. coding

Midterm exams
- Exam 1: 2 hrs 30 minutes, in class; multiple choice + short problems
- Exam 2: 48 hrs, take-home; analysis and design of arithmetic units using VHDL or Verilog HDL
- Practice exams on the web
Tentative days of exams:
- Exam 1: Monday, March 27
- Exam 2: Saturday-Sunday, May 6-7

Project (1)
Project I (20% of grade): design and comparative analysis of fast adders (several hundred bits long)
Optimization criteria:
- minimum latency
- maximum throughput
- minimum area
- minimum product latency · area
- maximum ratio throughput/area
- scalability
Similar for all students; done individually.
Final report due Monday, March 20.

Project (2)
Project II (30% of grade):
- fast multiplication, squaring, division, modular reduction, or modular exponentiation of long unsigned or signed integers, or
- fast addition or multiplication of floating-point numbers

Project II (rules)
- Real-life application
- Requirements derived from the analysis of the application
- Typically both hardware and software design
- Several project topics proposed on the web
- You can choose a project topic by yourself
- Can be done in a group of 1-3 students
- Written report and oral presentation: Monday, May 15

Project II (rules)
- Cooperation (but not exchange of code) between teams is encouraged
- Every team works on a slightly different problem
- Project topics should be more complex for larger teams

Project
Hardware:
- VHDL (or Verilog) code
- latency and/or throughput
- area
Software:
- high-level language (C preferred)
- execution time
- memory requirements
Both: scalability

Degrees of freedom and possible trade-offs: speed, area, power, testability
Related courses: ECE 645; ECE 682; ECE 586, 681

Degrees of freedom and possible trade-offs: speed (latency, throughput) vs. area

Timing parameters
Parameter | Definition | Units | Effect of pipelining
Latency | time from input to output | ns | bad
Throughput | number of output bits per time unit | Mbits/s | good
Delay | time from point to point | ns |
Clock period | time from rising edge to rising edge of clock | ns |
Clock frequency | 1 / clock period | MHz |
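To see why pipelining is "bad" for latency but "good" for throughput, the sketch below evaluates the parameters above for an idealized circuit whose combinational delay splits evenly into equal stages, each followed by a register with a fixed overhead, and which accepts a new input every clock cycle. These assumptions, the function name, and the example numbers are hypothetical.

```python
def pipeline_metrics(logic_delay_ns: float, stages: int,
                     reg_overhead_ns: float, bits_per_result: int):
    """Idealized timing parameters of a circuit split into equal pipeline stages.

    Returns (clock_period_ns, clock_frequency_MHz, latency_ns, throughput_Mbit_s).
    """
    clock_period = logic_delay_ns / stages + reg_overhead_ns  # one stage + register
    clock_frequency = 1000.0 / clock_period                    # 1/ns = 1000 MHz
    latency = stages * clock_period                            # input -> output
    throughput = bits_per_result * clock_frequency             # one result per cycle
    return clock_period, clock_frequency, latency, throughput


for stages in (1, 4):
    period, freq, lat, thr = pipeline_metrics(20.0, stages, 1.0, 64)
    print(f"{stages} stage(s): T = {period:.1f} ns, f = {freq:.1f} MHz, "
          f"latency = {lat:.1f} ns, throughput = {thr:.0f} Mbit/s")
```

With 20 ns of logic, 1 ns of register overhead, and 64-bit results, going from 1 to 4 stages raises the latency from 21 ns to 24 ns while the throughput grows from about 3.0 Gbit/s to about 10.7 Gbit/s.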

Project technologies: semi-custom Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs)

Levels of design description
- Algorithmic level
- Register Transfer Level (the level of description most suitable for synthesis)
- Logic (gate) level
- Circuit (transistor) level
- Physical (layout) level

Register Transfer Level (RTL) design description
Figure: blocks of combinational logic separated by registers driven by the clock.

Figure: simplified design flow: RTL block, synthesis, estimated area and estimated timing.

VHDL design styles
- Structural: components and interconnects
- Dataflow: concurrent statements
- Behavioral (algorithmic): sequential statements, used for registers, state machines, and test benches
Only a subset of these constructs is most suitable for use in this course.

CAD software available at GMU (1): VHDL simulators
- Aldec Active-HDL (under Windows)
- ModelSim (under Unix)
Available from all PCs in the ECE educational labs using an X-terminal emulator, remotely from home using a fast Internet connection, and in the FPGA Lab, S&T II, room 203.
A student edition can be purchased on an individual basis ($59.95 + S&H).

CAD software available at GMU (2): tools used for logic synthesis
- FPGA synthesis: Synplicity Synplify Pro (under Windows), Xilinx XST (under Windows)
- ASIC synthesis: Synopsys Design Compiler (under Unix)
Available from all PCs in the ECE educational labs using an X-terminal emulator, remotely from home using a fast Internet connection, and in the FPGA Lab, S&T II, room 203.

CAD software available at GMU (3): tools used for implementation (mapping, placing & routing) in the FPGA technology
- Xilinx ISE (under Windows), available in the FPGA Lab, S&T II, room 203

How to learn VHDL for synthesis by yourself?
- Lecture slides for ECE 545 from Fall 2005
- Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.
- Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004.
- Individual or small-group hands-on sessions with the TA
- Practice, practice, practice!

Testbench
Figure: a non-synthesizable testbench drives a synthesizable design entity, which can have several alternative architectures (Architecture 1, Architecture 2, ..., Architecture N).

Design environment
Figure: the same test vectors (inputs) are applied to the HDL design (VHDL or Verilog) and to a reference model (C); the actual results of the design are compared with the expected results from the reference model.
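The flow above can be sketched for a simple adder: a reference model computes expected results for random test vectors, the vectors are written as text lines that an HDL testbench could read, and the results reported by the HDL simulation are compared line by line. The slide calls for a C reference model; Python is used here only for consistency with the other examples, and the line format "A B SUM COUT" in hexadecimal, the seed, and all function names are hypothetical.

```python
import random


def reference_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Reference model of a width-bit unsigned adder: (sum, carry_out)."""
    total = a + b
    return total % (1 << width), total >> width


def make_test_vectors(n: int, width: int, seed: int = 2006) -> list[str]:
    """n lines 'A B SUM COUT' (hex): random inputs plus expected results."""
    rng = random.Random(seed)        # fixed seed: reproducible vectors
    digits = (width + 3) // 4        # hex digits per operand
    lines = []
    for _ in range(n):
        a, b = rng.getrandbits(width), rng.getrandbits(width)
        s, c = reference_add(a, b, width)
        lines.append(f"{a:0{digits}x} {b:0{digits}x} {s:0{digits}x} {c:x}")
    return lines


def mismatches(expected: list[str], actual: list[str]) -> list[int]:
    """Indices of vectors whose actual result (e.g. from HDL simulation) differs."""
    if len(expected) != len(actual):
        raise ValueError("different number of expected and actual results")
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
```

Keeping the seed fixed makes a failing vector reproducible after the HDL code has been fixed.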

Primary applications (1): execution units of general-purpose microprocessors
- integer units: integers (8, 16, 32, 64 bits)
- floating-point units: real numbers (32, 64 bits)

Primary applications (2): digital signal and digital image processing
- real numbers (fixed-point or floating-point)
- e.g., digital filters, Discrete Fourier Transform, Discrete Hilbert Transform
- general-purpose DSP processors, specialized circuits

Primary applications (3): coding
- elements of the Galois fields GF(2^n) (4-64 bits)
- error detection codes, error correcting codes

Secret-key (symmetric) cryptosystems
Figure: Alice encrypts a message with the key K_AB shared by Alice and Bob and sends it over the network; Bob decrypts it with the same key K_AB.

Primary applications (4): cryptography
- integers (16, 32 bits): secret-key cryptography (IDEA, RC6, MARS, Twofish, Rijndael)
- elements of the Galois field GF(2^n) (4, 8 bits)

Operations used by the AES finalists
Cipher | Main operations | Auxiliary operations
MARS | MUL32, 2 x ROL32, S-box 9x32 | XOR, ADD/SUB32
RC6 | 2 x SQR32, 2 x ROL32 | XOR, ADD/SUB32
Twofish | 96 S-box 4x4, 24 MUL GF(2^8) | XOR, ADD32
Rijndael | 16 S-box 8x8, 24 MUL GF(2^8) | XOR
Serpent | 8 x 32 S-box 4x4 | XOR

Public-key (asymmetric) cryptosystems
Figure: Alice encrypts a message with Bob's public key K_B and sends it over the network; Bob decrypts it with his private key k_B.

RSA as a trap-door one-way function
$$C = f(M) = M^e \bmod N \qquad \text{(public key)}$$
$$M = f^{-1}(C) = C^d \bmod N \qquad \text{(private key)}$$
where $N = P \cdot Q$, with $P, Q$ large prime numbers, and $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$.
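The trap-door relations above can be checked on a toy key. The Python sketch below uses the tiny, completely insecure primes P = 61 and Q = 53 (a common textbook example, not a value from these slides) and Python's built-in three-argument `pow`; `pow(e, -1, phi)` computes a modular inverse and needs Python 3.8 or later. Real keys use primes hundreds of digits long, and practical RSA also needs padding, which is omitted here.

```python
from math import gcd

# Toy RSA key: illustration only, far too small to be secure.
P, Q = 61, 53
N = P * Q                    # modulus: 3233
phi = (P - 1) * (Q - 1)      # (P-1)(Q-1) = 3120
e = 17                       # public exponent
assert gcd(e, phi) == 1      # e must be invertible modulo (P-1)(Q-1)
d = pow(e, -1, phi)          # private exponent: e*d = 1 mod (P-1)(Q-1)


def encrypt(m: int) -> int:
    """C = M^e mod N, using the public key {e, N}."""
    return pow(m, e, N)


def decrypt(c: int) -> int:
    """M = C^d mod N, using the private exponent d."""
    return pow(c, d, N)
```

For example, `encrypt(65)` gives 2790 and `decrypt(2790)` gives 65 back.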

RSA keys
Public key: {e, N}
Private key: {d, P, Q}
where $N = P \cdot Q$, with $P, Q$ large prime numbers, and $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$.

Primary applications (5): cryptography
- long integers (1000-2000 bits): public-key cryptography (RSA, DSS, Diffie-Hellman)
- elements of the Galois field GF(2^n) (150-250 bits): elliptic curve cryptosystems

Topic 1
Application: modern secret-key ciphers, candidates for the new Advanced Encryption Standard (AES): MARS (developed by IBM), RC6 (developed at MIT)
Function: 32-bit unsigned multiplication and squaring modulo 2^32:
$$C = A \cdot B \bmod 2^{32}, \qquad C = A^2 \bmod 2^{32}$$
Optimization: maximum throughput, minimum latency, minimum area
Environment: hardware; software for 8-bit processors
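On an 8-bit processor, A and B each split into four bytes, and only the byte products a_i · b_j with i + j < 4 can affect the low 32 bits: 10 byte multiplications instead of 16. Squaring saves more, because each cross product a_i · a_j (i ≠ j) appears twice and can be computed once and doubled: 6 byte multiplications. A minimal Python sketch of both ideas; the function names are hypothetical, and the multi-byte accumulation is done with Python integers rather than the add-with-carry instructions a real 8-bit implementation would use.

```python
MASK32 = 0xFFFFFFFF


def to_bytes_le(x: int) -> list[int]:
    """Split a 32-bit word into 4 bytes, least significant first."""
    return [(x >> (8 * i)) & 0xFF for i in range(4)]


def mul_mod_2_32_8bit(a: int, b: int) -> int:
    """A * B mod 2^32 from 8x8 -> 16-bit products (10 of them)."""
    A, B = to_bytes_le(a), to_bytes_le(b)
    acc = 0
    for i in range(4):
        for j in range(4 - i):                # skip i + j >= 4 (bits >= 32)
            acc += (A[i] * B[j]) << (8 * (i + j))
    return acc & MASK32


def sqr_mod_2_32_8bit(a: int) -> int:
    """A^2 mod 2^32 from 8x8 -> 16-bit products (6 of them)."""
    A = to_bytes_le(a)
    acc = 0
    for i in range(4):
        for j in range(i, 4 - i):             # pairs i <= j with i + j <= 3
            pp = A[i] * A[j]
            if j != i:
                pp <<= 1                      # cross product counted twice
            acc += pp << (8 * (i + j))
    return acc & MASK32
```

The same counting argument carries over to hardware, where skipping the upper partial products removes the corresponding rows of an array multiplier.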

Topic 2
Application: digital filters
Function: 64-bit signed multiplier-accumulator (MAC) accumulating at least 256 partial products:
$$C = \sum_{i=1}^{256} A_i \cdot B_i$$
Environment: hardware; software for a general-purpose DSP or microprocessor
Optimization: hardware: maximum throughput, limited area; software: minimum execution time, limited memory
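A key sizing question for this MAC is the accumulator width. The product of two n-bit two's complement numbers needs 2n bits (the worst case is (-2^(n-1))^2 = 2^(2n-2)), and summing K such products needs ceil(log2 K) extra guard bits, so 64-bit operands and 256 terms call for 128 + 8 = 136 bits. The Python sketch below models a width-bit two's complement accumulator to check this; all names are illustrative.

```python
def mac_width(n_bits: int, n_terms: int) -> int:
    """Accumulator width that cannot overflow for n_terms products of two
    n_bits-bit signed numbers: 2*n_bits + ceil(log2(n_terms))."""
    return 2 * n_bits + (n_terms - 1).bit_length()


def wrap(x: int, width: int) -> int:
    """Reduce x to a width-bit two's complement value (hardware wraparound)."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def mac(a: list[int], b: list[int], width: int) -> int:
    """C = sum(A_i * B_i) computed in a width-bit accumulator."""
    acc = 0
    for x, y in zip(a, b):
        acc = wrap(acc + x * y, width)
    return acc
```

With the worst-case operands (all A_i = B_i = -2^63), a 136-bit accumulator returns the exact sum 2^134, while a 128-bit accumulator silently wraps around to 0.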

Topic 3
Application: general-purpose microprocessor
Function: multiplication of two 64-bit signed numbers, and division of a 128-bit number by a 64-bit number:
$$C = A \cdot B, \qquad C = A / B$$
Environment: hardware; software for a 64-bit processor without built-in multiplication and division
Optimization: hardware: minimum latency, maximum throughput, limited area; software: minimum execution time, limited memory
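One way to obtain the signed 64 x 64 -> 128-bit product is to multiply the two operands as unsigned bit patterns and then correct the result: if A is negative, its unsigned value is A + 2^64, so B · 2^64 must be subtracted, and symmetrically for B, all modulo 2^128. The Python sketch below shows this correction; Python's `*` stands in for an unsigned multiply routine (on the processor of Topic 3 that would itself be built from shifts and additions), and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed(x: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's complement number."""
    return x - (1 << width) if x >> (width - 1) else x


def smul64(a: int, b: int) -> int:
    """Signed 64x64 -> 128-bit product from an unsigned product.

    a, b: 64-bit two's complement bit patterns (0 <= a, b < 2**64).
    Returns the 128-bit two's complement bit pattern of the product.
    """
    p = a * b                 # unsigned 128-bit product
    if a >> 63:               # a negative: its unsigned value is a_signed + 2^64
        p -= b << 64
    if b >> 63:               # b negative: its unsigned value is b_signed + 2^64
        p -= a << 64
    return p & MASK128        # the 2^128 cross term vanishes modulo 2^128
```

The correction costs only two conditional 64-bit subtractions from the upper half of the product, which is why unsigned multipliers are often reused for signed operands.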

Topic 4
Application: modern public-key ciphers: RSA, Diffie-Hellman, elliptic curve cryptosystems
Function: modular exponentiation
$$C = M^E \bmod N$$
where M and N are arbitrary 768-bit numbers and $E = 2^{16} + 1$
Environment: hardware; software for 32-bit or 8-bit processors
Optimization: hardware: minimum latency, limited area; software: minimum execution time, limited memory
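With E = 2^16 + 1 (binary 1 followed by fifteen 0s and a final 1), left-to-right square-and-multiply exponentiation needs only 16 modular squarings and 1 modular multiplication, so the speed of 768-bit modular squaring dominates. The Python sketch below counts these operations; Python's `%` stands in for whatever modular reduction method a real design would use (e.g. Montgomery multiplication), and the function name is our own.

```python
def modexp_left_to_right(m: int, e: int, n: int) -> tuple[int, int, int]:
    """C = M^E mod N by left-to-right binary (square-and-multiply) exponentiation.

    Returns (C, number of modular squarings, number of modular multiplications).
    Assumes e >= 1 and n > 1.
    """
    bits = bin(e)[2:]          # exponent bits, MSB first; bits[0] == '1'
    c = m % n                  # the leading 1 bit: C = M
    squarings = multiplications = 0
    for bit in bits[1:]:
        c = c * c % n          # one modular squaring per remaining bit
        squarings += 1
        if bit == "1":
            c = c * m % n      # one modular multiplication per 1 bit
            multiplications += 1
    return c, squarings, multiplications
```

For an exponent of t bits with w one-bits, this loop performs t - 1 squarings and w - 1 multiplications.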

Topic 5
Application: general-purpose microprocessor or digital signal processor
Function: floating-point addition and multiplication according to ANSI/IEEE 754:
$$Z = X + Y, \qquad Z = X \cdot Y$$
Environment: hardware; software for a 32-bit processor without floating-point operations
Optimization: hardware: minimum latency, maximum throughput, limited area; software: minimum execution time, limited memory
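A software or hardware floating-point unit starts by unpacking each IEEE 754 single-precision operand into its three fields: 1 sign bit, an 8-bit exponent biased by 127, and a 23-bit fraction with an implicit leading 1 for normal numbers. The Python sketch below uses the standard `struct` module to expose these fields and to rebuild the value of a normal number; the function names are illustrative, and zeros, subnormals, infinities, and NaNs are not handled by `float32_value`.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Sign, biased exponent, and fraction of x rounded to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF     # biased by 127
    fraction = bits & 0x7FFFFF         # 23 stored bits, hidden 1 not stored
    return sign, exponent, fraction


def float32_value(sign: int, exponent: int, fraction: int) -> float:
    """Value of a normal number: (-1)^s * 1.f * 2^(e - 127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)
```

For instance, -2.5 = -1.01 (binary) x 2^1 unpacks to sign 1, exponent 128, and fraction 0x200000. A multiplier then adds the exponents (subtracting one bias) and multiplies the 24-bit significands, while an adder must first align the significands by the exponent difference.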

Famous computer arithmetic bugs and flaws

Learn to deal with approximations
In digital arithmetic, one has to come to grips with approximation and with questions like:
- When is an approximation good enough?
- What margin of error is acceptable?
Be aware of the applications you are designing the arithmetic circuit or program for, and analyze the implications of your approximation.

Calculators
$$u = \underbrace{\sqrt{\sqrt{\cdots\sqrt{2}}}}_{10\ \text{times}} = 1.000\,677\,131 \qquad v = 2^{1/1024} = 1.000\,677\,131$$
$$x = \underbrace{(((u^2)^2)\cdots)^2}_{10\ \text{times}} = 1.999\,999\,963 \qquad x' = u^{1024} = 1.999\,999\,973$$
$$y = \underbrace{(((v^2)^2)\cdots)^2}_{10\ \text{times}} = 1.999\,999\,983 \qquad y' = v^{1024} = 1.999\,999\,994$$
- Hidden digits in the internal representation of numbers
- Different algorithms give slightly different results
- Very good accuracy

Consequences of bad approximations
Example: failure of the Patriot missile (1991 Feb. 25)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
- An American Patriot missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile.
- The Scud struck an American Army barracks, killing 28.
- Cause, per GAO/IMTEC-92-26 report: "software problem" (inaccurate calculation of the time since boot).
Specifics of the problem:
- Time in tenths of a second, as measured by the system's internal clock, was multiplied by 1/10 to get the time in seconds.
- Internal registers were 24 bits wide.
- 1/10 = 0.0001 1001 1001 1001 1001 100 (binary, chopped to 24 b)
- Error $\approx 0.1100\,1100\ldots_2 \times 2^{-23} \approx 9.5 \times 10^{-8}$
- Error in a 100-hr operation period $\approx 9.5 \times 10^{-8} \times 100 \times 60 \times 60 \times 10 \approx 0.34$ s
- Distance traveled by the Scud $\approx (0.34\ \text{s}) \times (1676\ \text{m/s}) \approx 570$ m
- This put the Scud outside the Patriot's "range gate".
Ironically, the fact that the bad time calculation had been improved in some (but not all) code parts contributed to the problem, since it meant that the inaccuracies did not cancel out.
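The Patriot numbers above can be reproduced with exact rational arithmetic. The Python sketch below truncates 1/10 to the 23 fractional bits shown on the slide, using the standard `fractions` module, and propagates the per-tick error over 100 hours of clock ticks; the variable names are our own.

```python
from fractions import Fraction

FRAC_BITS = 23                                    # fractional bits kept
tenth = Fraction(1, 10)
# Chop (truncate) 1/10 to FRAC_BITS bits after the binary point.
tenth_chopped = Fraction(int(tenth * 2**FRAC_BITS), 2**FRAC_BITS)
error = tenth - tenth_chopped                     # error per 0.1 s clock tick

ticks = 100 * 60 * 60 * 10                        # tenths of a second in 100 hours
drift_s = float(error * ticks)                    # accumulated clock error (s)
distance_m = drift_s * 1676                       # Scud speed: 1676 m/s

print(format(int(tenth_chopped * 2**FRAC_BITS), "023b"))  # chopped fraction bits
print(float(error), drift_s, distance_m)
```

Without intermediate rounding, the drift comes out at about 0.343 s and the distance at about 575 m; the slide rounds the drift to 0.34 s, which gives about 570 m.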

Consequences of bad approximations
Example: explosion of the Ariane rocket (1996 June 4)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
- The unmanned Ariane 5 rocket launched by the European Space Agency veered off its flight path, broke up, and exploded only 30 seconds after lift-off (altitude of 3700 m).
- The $500 million rocket (with cargo) was on its first voyage after a decade of development costing $7 billion.
- Cause: "software error in the inertial reference system"
Specifics of the problem:
- A 64-bit floating-point number relating to the horizontal velocity of the rocket was being converted to a 16-bit signed integer.
- A software exception arose in the SRI (inertial reference system) during the conversion, because the 64-bit floating-point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767).

Pentium bug (1)
- October 1994: Thomas Nicely, Lynchburg College, Virginia, finds an error in his computer calculations and traces it back to the Pentium processor.
- November 7, 1994: first press announcement, Electronic Engineering Times.
- Late 1994: Tim Coe, Vitesse Semiconductor, presents an example with the worst-case error:
  c = 4 195 835 / 3 145 727
  Pentium: 1.333 739 06...
  Correct result: 1.333 820 44...
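The size of the error in Tim Coe's example can be checked with ordinary double-precision arithmetic. The Python sketch below compares the correct quotient with the truncated Pentium result quoted on the slide, and also evaluates the commonly cited check x - (x / y) · y, which should be (close to) 0 but comes out near 256 when the flawed quotient is used.

```python
x, y = 4195835, 3145727

correct = x / y                     # IEEE 754 double division: 1.33382044...
flawed = 1.33373906                 # truncated Pentium result from the slide

relative_error = (correct - flawed) / correct
residue_correct = x - correct * y   # essentially 0
residue_flawed = x - flawed * y     # roughly 256

print(correct, relative_error, residue_correct, residue_flawed)
```

A relative error of about 6 x 10^-5 means the flawed quotient is correct to only about 14 bits, far below the precision expected from the floating-point unit.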

Pentium bug (2)
- November 30, 1994: Intel admits a "subtle flaw"; Intel's white paper about the bug and its possible consequences; replacements based on customer needs.
- Intel: average spreadsheet user affected once in 27,000 years
- IBM: average spreadsheet user affected once every 24 days
- December 20, 1994: announcement of no-questions-asked replacements

Pentium bug (3)
- The error was traced back to the lookup table used by the radix-4 SRT division algorithm.
- The table has 2048 cells, 1066 of which hold non-zero values from {-2, -1, 1, 2}.
- 5 non-zero values were not downloaded correctly to the lookup table due to an error in the C script.
