ECE 545 – Introduction to VHDL ECE 645 – Project 2 Project Options
2 Project 2 Overview
Project 2 will involve the FPGA implementation of a complex digital arithmetic function.
The project will have an application in either cryptography or signal processing.
Due to the scope of the project, students should work in groups of 3.
The specification and scope of each project will be worked out interactively between the group and the instructor.
3 Project Options
Each group will implement one of the following projects on an FPGA:
Cryptography related
1. Trial division sieve
2. Elliptic curve method of factoring
3. RSA encryption & decryption with Montgomery multipliers based on carry save adders
Signal processing related
4. Iterative and pipeline CORDIC (coordinate rotation digital computer) processors
5. Finite impulse response filter architectures for FPGA implementations
6. Direct digital frequency synthesis
Cryptography Projects Background ECE 645 – Computer Arithmetic
5 RSA Public Key Cryptosystem
Encryption with the PUBLIC KEY: $C = f(M) = M^e \bmod N$
Decryption with the PRIVATE KEY: $M = f^{-1}(C) = C^d \bmod N$
$N = P \cdot Q$, where $P$, $Q$ are large prime numbers
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
6 RSA Keys
PUBLIC KEY: $\{e, N\}$
PRIVATE KEY: $\{d, P, Q\}$
$N = P \cdot Q$, where $P$, $Q$ are large prime numbers
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
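The key relations above can be checked numerically with a toy example. The sketch below is only an illustration: the primes 61 and 53, the exponent 17, and the message 65 are assumed values chosen for readability (real keys use large primes), and the function names are hypothetical. It relies on Python 3.8+ for the modular inverse via `pow(e, -1, phi)`.

```python
def make_keys(p, q, e):
    """Build a toy RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # e * d = 1 mod ((P-1)(Q-1))
    return (e, n), (d, p, q)     # public key {e, N}, private key {d, P, Q}


def encrypt(m, public_key):
    e, n = public_key
    return pow(m, e, n)          # C = M^e mod N


def decrypt(c, private_key):
    d, p, q = private_key
    return pow(c, d, p * q)      # M = C^d mod N


# Assumed toy values, far too small for real use.
public_key, private_key = make_keys(61, 53, 17)
ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
```

With these assumed values, `recovered` equals the original message 65.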
7 Factoring 1024-bit RSA keys using Number Field Sieve (NFS)
[Figure: NFS steps: Polynomial Selection; Relation Collection (Sieving, then Cofactoring of 200 bit & 350 bit numbers using trial division and the ECM method); Linear Algebra; Square Root]
Topic 1: Trial Division Sieve
10 Topic 1: Trial Division Sieve (1)
Given:
Inputs:
Variables:
1. Integers $N_1, N_2, N_3, \ldots$, each $k$ bits in size
Constants:
2. Factor base = set of all primes not exceeding a certain bound $B$ = $\{p_1 = 2, p_2 = 3, p_3 = 5, \ldots, p_t \le B\}$
Parameters of interest: $4 \le k \le \ldots$, $\ldots \le B \le 10^5$
11 Topic 1: Trial Division Sieve (2)
Required:
Outputs:
For each integer $N_i$: a list of the primes from the factor base that divide $N_i$, and the number of times each prime divides $N_i$.
For example, if $N_i = p_1^{e_1} \cdot p_2^{e_2} \cdot p_3^{e_3} \cdot M_i$, where $M_i$ is not divisible by any prime belonging to the factor base, then the output is $\{p_1, e_1\}, \{p_2, e_2\}, \{p_3, e_3\}$
12 Topic 1: Trial Division Sieve (3)
Example:
Constants: $k = 10$, $B = 5$, factor base = $\{2, 3, 5\}$
Variables:
$N_1 = 408 = 2^3 \cdot 3 \cdot 17$
$N_2 = 630 = 2 \cdot 3^2 \cdot 5 \cdot 7$
Outputs:
For $N_1$: $\{2, 3\}, \{3, 1\}$
For $N_2$: $\{2, 1\}, \{3, 2\}, \{5, 1\}$
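A software reference model can be handy for checking a hardware sieve against the example above. The following is a minimal sketch, not the required architecture: the function names are hypothetical, the factor base is built with a simple sieve of Eratosthenes, and the bound is taken as inclusive (primes up to and including B), which matches the example where B = 5 yields {2, 3, 5}.

```python
def primes_up_to(bound):
    """Sieve of Eratosthenes: all primes p with p <= bound."""
    is_prime = [True] * (bound + 1)
    primes = []
    for p in range(2, bound + 1):
        if is_prime[p]:
            primes.append(p)
            for multiple in range(p * p, bound + 1, p):
                is_prime[multiple] = False
    return primes


def trial_division(n, factor_base):
    """Return ([(prime, exponent), ...], cofactor) for n over the factor base.

    The cofactor is the part M_i of n that no factor-base prime divides.
    """
    found = []
    for p in factor_base:
        exponent = 0
        while n % p == 0:        # divide out p as many times as possible
            n //= p
            exponent += 1
        if exponent:
            found.append((p, exponent))
    return found, n


factor_base = primes_up_to(5)
result_n1 = trial_division(408, factor_base)
result_n2 = trial_division(630, factor_base)
```

For the slide's inputs this yields the pairs (2, 3), (3, 1) with cofactor 17 for 408, and (2, 1), (3, 2), (5, 1) with cofactor 7 for 630.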
Topic 2: Elliptic Curve Method of Factoring
14 Elliptic Curves
Addition: P = (6,19), Q = (7,12), R = P + Q = (13,7)
Doubling: P = (3,13), 2P = P + P = (7,11)
15 ECM Algorithm
Inputs:
N – number to be factored
E – elliptic curve
P_0 – point of the curve E: initial point
B_1 – smoothness bound for Phase 1
B_2 – smoothness bound for Phase 2
Outputs:
q – factor of N, 1 < q ≤ N, or FAIL
16 ECM Algorithm Phase 1
[Figure: Phase 1 algorithm listing, divided into precomputations, main computations, and postcomputations]
17 ECM Algorithm Phase 2
[Figure: Phase 2 algorithm listing, divided into main computations and postcomputations]
18 Hierarchy of Elliptic Curve Operations
Top level: ECM; scalar multiplication k·P
Medium level: elliptic curve point operations: point addition P+Q, point doubling 2P
Low level: modular arithmetic (ring operations): modular multiplication x·y mod N, modular addition x+y mod N, modular subtraction x-y mod N
Hardware blocks shown in the figure: functional units, control unit, host computer
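The three levels of the hierarchy can be mirrored in a small software model. The sketch below makes several assumptions that the slides do not state: it uses affine coordinates on a short Weierstrass curve y^2 = x^3 + x + 1 over the integers modulo 23, a curve chosen only because it reproduces the addition and doubling examples on the Elliptic Curves slide. All names are hypothetical. An actual ECM implementation works modulo the composite number N to be factored and may use a different curve form and coordinate system.

```python
# Assumed curve parameters (not given in the slides): y^2 = x^3 + A*x + B mod P_MOD
P_MOD = 23
A = 1
B = 1
INFINITY = None  # point at infinity (group identity)


# Low level: modular arithmetic (ring operations)
def mod_add(x, y):
    return (x + y) % P_MOD

def mod_sub(x, y):
    return (x - y) % P_MOD

def mod_mul(x, y):
    return (x * y) % P_MOD

def mod_inv(x):
    return pow(x, -1, P_MOD)  # requires Python 3.8+


# Medium level: elliptic curve point operations
def point_double(pt):
    if pt is INFINITY:
        return INFINITY
    x, y = pt
    if y == 0:
        return INFINITY
    slope = mod_mul(mod_add(mod_mul(3, mod_mul(x, x)), A), mod_inv(mod_mul(2, y)))
    x3 = mod_sub(mod_sub(mod_mul(slope, slope), x), x)
    y3 = mod_sub(mod_mul(slope, mod_sub(x, x3)), y)
    return (x3, y3)

def point_add(p1, p2):
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Same point: double it; opposite points: sum is the identity.
        return point_double(p1) if y1 == y2 else INFINITY
    slope = mod_mul(mod_sub(y2, y1), mod_inv(mod_sub(x2, x1)))
    x3 = mod_sub(mod_sub(mod_mul(slope, slope), x1), x2)
    y3 = mod_sub(mod_mul(slope, mod_sub(x1, x3)), y1)
    return (x3, y3)


# Top level: scalar multiplication k*P by left-to-right double-and-add
def scalar_multiply(k, pt):
    result = INFINITY
    for bit in bin(k)[2:]:
        result = point_double(result)
        if bit == "1":
            result = point_add(result, pt)
    return result
```

Under these assumptions, `point_add((6, 19), (7, 12))` gives (13, 7) and `point_double((3, 13))` gives (7, 11), matching the slide. Note how every point operation reduces to a handful of modular multiplications, additions, and subtractions, which is what makes the low-level functional units the natural hardware building blocks.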
Topic 3: RSA Encryption & Decryption with Montgomery Multipliers based on Carry Save Adders
20 RSA as a Trap-Door One-Way Function
Encryption with the PUBLIC KEY: $C = f(M) = M^e \bmod N$
Decryption with the PRIVATE KEY: $M = f^{-1}(C) = C^d \bmod N$
$N = P \cdot Q$, where $P$, $Q$ are large prime numbers
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
21 Exponentiation: $Y = X^E \bmod N$
$E = (e_{L-1}, e_{L-2}, \ldots, e_1, e_0)_2$
Right-to-left binary exponentiation:
    Y = 1; S = X;
    for i=0 to L-1 {
        if (e_i == 1) Y = Y · S mod N;
        S = S^2 mod N;
    }
Left-to-right binary exponentiation:
    Y = 1;
    for i=L-1 downto 0 {
        Y = Y^2 mod N;
        if (e_i == 1) Y = Y · X mod N;
    }
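Both exponentiation schemes translate directly into executable form, which is useful as a golden model for a hardware testbench. The following is a minimal sketch with hypothetical function names; it assumes a non-negative exponent and a modulus greater than 1, and derives the bit length L from the exponent itself.

```python
def exp_right_to_left(x, e, n):
    """Y = X^E mod N, scanning exponent bits from e_0 up to e_{L-1}."""
    y = 1
    s = x % n
    while e > 0:
        if e & 1:                # current bit e_i is 1
            y = (y * s) % n
        s = (s * s) % n          # S holds X^(2^i) mod N
        e >>= 1
    return y


def exp_left_to_right(x, e, n):
    """Y = X^E mod N, scanning exponent bits from e_{L-1} down to e_0."""
    y = 1
    for i in range(e.bit_length() - 1, -1, -1):
        y = (y * y) % n
        if (e >> i) & 1:         # current bit e_i is 1
            y = (y * x) % n
    return y
```

In the right-to-left variant the squaring and the multiplication in one iteration are independent and could run in parallel on two multipliers; the left-to-right variant needs only one running value but its two operations are sequential.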
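22 Montgomery Modular Multiplication
$C = A \cdot B \bmod M$, where $A$, $B$, $M$ are $k$-bit numbers
Integer domain → Montgomery domain:
$A \rightarrow A' = A \cdot 2^k \bmod M$
$B \rightarrow B' = B \cdot 2^k \bmod M$
$C = A \cdot B \bmod M \rightarrow C' = C \cdot 2^k \bmod M$
Montgomery product:
$C' = MP(A', B', M) = A' \cdot B' \cdot 2^{-k} \bmod M = (A \cdot 2^k)(B \cdot 2^k) \cdot 2^{-k} \bmod M = A \cdot B \cdot 2^k \bmod M$
23 Montgomery Modular Multiplication
Conversion into the Montgomery domain ($A \rightarrow A'$): $A' = MP(A, 2^{2k} \bmod M, M)$
Conversion back to the integer domain ($C' \rightarrow C$): $C = MP(C', 1, M)$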
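The Montgomery product and the two domain conversions can be modelled bit-serially in software. The sketch below is one common radix-2 formulation, offered as an assumption rather than the architecture the project requires: it takes the bit width k as an explicit extra argument (the slides write MP with three arguments), requires an odd modulus M < 2^k with operands smaller than M, and uses ordinary integer addition where a hardware design would use carry-save adders. Function names are hypothetical.

```python
def mp(a, b, m, k):
    """Radix-2 Montgomery product: a * b * 2^(-k) mod m.

    Assumes m is odd, m < 2^k, and 0 <= a, b < m.
    """
    s = 0
    for i in range(k):
        if (a >> i) & 1:         # add b when bit a_i is set
            s += b
        if s & 1:                # make s even by adding the odd modulus
            s += m
        s >>= 1                  # exact division by 2
    if s >= m:                   # single final correction
        s -= m
    return s


def to_montgomery(a, m, k):
    """A' = MP(A, 2^(2k) mod M, M) = A * 2^k mod M."""
    return mp(a, pow(2, 2 * k, m), m, k)


def from_montgomery(c_mont, m, k):
    """C = MP(C', 1, M)."""
    return mp(c_mont, 1, m, k)


def modmul_via_montgomery(a, b, m, k):
    """C = A * B mod M computed through the Montgomery domain."""
    c_mont = mp(to_montgomery(a, m, k), to_montgomery(b, m, k), m, k)
    return from_montgomery(c_mont, m, k)
```

A single multiplication gains nothing from the two conversions; the scheme pays off in exponentiation, where the operands stay in the Montgomery domain across all squarings and multiplications and are converted only once at the start and once at the end.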
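24 Fast Modular Exponentiation using Chinese Remainder Theorem
$M = C^d \bmod N$ is computed from two half-size exponentiations:
$M_P = C_P^{d_P} \bmod P$, where $C_P = C \bmod P$ and $d_P = d \bmod (P-1)$
$M_Q = C_Q^{d_Q} \bmod Q$, where $C_Q = C \bmod Q$ and $d_Q = d \bmod (Q-1)$
$M = (M_P \cdot R_Q + M_Q \cdot R_P) \bmod N$
where
$R_P = (P^{-1} \bmod Q) \cdot P = P^{Q-1} \bmod N$
$R_Q = (Q^{-1} \bmod P) \cdot Q = Q^{P-1} \bmod N$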
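The recombination formula can be verified with a few lines of code. This is a minimal sketch with a hypothetical function name; the toy parameters in the usage line (P = 61, Q = 53, d = 2753, C = 2790) are assumed illustration values, not values from the slides. It needs Python 3.8+ for `pow(x, -1, m)`.

```python
def rsa_decrypt_crt(c, d, p, q):
    """Compute M = C^d mod (P*Q) from two half-size exponentiations."""
    n = p * q
    c_p, c_q = c % p, c % q
    d_p, d_q = d % (p - 1), d % (q - 1)
    m_p = pow(c_p, d_p, p)       # M_P = C_P^(d_P) mod P
    m_q = pow(c_q, d_q, q)       # M_Q = C_Q^(d_Q) mod Q
    r_p = pow(p, -1, q) * p      # R_P: 0 mod P and 1 mod Q
    r_q = pow(q, -1, p) * q      # R_Q: 1 mod P and 0 mod Q
    return (m_p * r_q + m_q * r_p) % n


# Assumed toy key; real keys use large primes.
message = rsa_decrypt_crt(2790, 2753, 61, 53)
```

The result matches a direct computation of `pow(2790, 2753, 3233)`. The speed-up comes from operating on numbers half the size of N with exponents half the size of d; R_P and R_Q depend only on the key and can be precomputed.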
Topic 4: Iterative and Pipeline CORDIC (Coordinate Rotation Digital Computer) Processors
26 Rotations and Pseudo-Rotations in CORDIC
Key ideas in CORDIC:
- If we have a computationally efficient way of rotating a vector, we can evaluate cos, sin, and $\tan^{-1}$ functions
- Rotation by an arbitrary angle is difficult, so we:
  perform pseudorotations that require simpler operations
  use special angles to synthesize the desired angle $z$: $z = \alpha^{(1)} + \alpha^{(2)} + \cdots + \alpha^{(m)}$
COordinate Rotation DIgital Computer used this method in the 1950s; modern electronic calculators also use it
27 Rotating a Vector by an Angle
[Fig: A pseudorotation step in CORDIC]
$x^{(i+1)} = x^{(i)} \cos\alpha^{(i)} - y^{(i)} \sin\alpha^{(i)} = (x^{(i)} - y^{(i)} \tan\alpha^{(i)}) / (1 + \tan^2\alpha^{(i)})^{1/2}$
$y^{(i+1)} = y^{(i)} \cos\alpha^{(i)} + x^{(i)} \sin\alpha^{(i)} = (y^{(i)} + x^{(i)} \tan\alpha^{(i)}) / (1 + \tan^2\alpha^{(i)})^{1/2}$
$z^{(i+1)} = z^{(i)} - \alpha^{(i)}$
Recall that $\cos\theta = 1 / (1 + \tan^2\theta)^{1/2}$
Our strategy: eliminate the terms $(1 + \tan^2\alpha^{(i)})^{1/2}$ and choose the angles $\alpha^{(i)}$ so that $\tan\alpha^{(i)}$ is a power of 2; each step then needs only two shift-adds
28 Pseudorotating a Vector by an Angle
[Fig: A pseudorotation step in CORDIC]
Pseudorotation: whereas a real rotation does not change the length $R^{(i)}$ of the vector, a pseudorotation step increases its length to:
$R^{(i+1)} = R^{(i)} / \cos\alpha^{(i)} = R^{(i)} (1 + \tan^2\alpha^{(i)})^{1/2}$
$x^{(i+1)} = x^{(i)} - y^{(i)} \tan\alpha^{(i)}$
$y^{(i+1)} = y^{(i)} + x^{(i)} \tan\alpha^{(i)}$
$z^{(i+1)} = z^{(i)} - \alpha^{(i)}$
29 Basic CORDIC Iterations
CORDIC iteration: in step $i$, we pseudorotate by an angle whose tangent is $d_i 2^{-i}$ (the angle $e^{(i)}$ is fixed; only the direction $d_i$ is to be picked)
$x^{(i+1)} = x^{(i)} - d_i\, y^{(i)}\, 2^{-i}$
$y^{(i+1)} = y^{(i)} + d_i\, x^{(i)}\, 2^{-i}$
$z^{(i+1)} = z^{(i)} - d_i \tan^{-1} 2^{-i} = z^{(i)} - d_i\, e^{(i)}$
Table 22.1 Value of the function $e^{(i)} = \tan^{-1} 2^{-i}$, in degrees and radians, for $0 \le i \le 9$ (columns: $i$; $e^{(i)}$ in degrees, approximate; $e^{(i)}$ in radians, precise)
Example: for a 30° angle, 30.0 ≈ 45.0 – 26.6 + 14.0 – 7.1 + 3.6 + 1.8 – 0.9 + 0.4 – 0.2 + 0.1 = 30.1
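The iteration can be exercised in a floating-point model before committing to a fixed-point hardware design. The sketch below assumes rotation mode (drive z toward zero, starting from x = 1/K, y = 0, where K is the accumulated pseudorotation gain) so that x and y converge to the cosine and sine of the input angle. The function names, the default of 16 iterations, and the choice d_i = +1 when z is non-negative are illustrative assumptions; multiplications by powers of 2 stand in for the hardware shifts.

```python
import math


def angle_table(count=10):
    """Rows (i, e(i) in degrees, e(i) in radians) with e(i) = atan(2^-i)."""
    return [(i, math.degrees(math.atan(2.0 ** -i)), math.atan(2.0 ** -i))
            for i in range(count)]


def cordic_rotation(angle_rad, iterations=16):
    """Rotation-mode CORDIC; returns (x, y, z) with x ~ cos, y ~ sin, z ~ 0.

    Converges only for angles within roughly +/- 99.9 degrees, the sum of
    all the elementary angles e(i).
    """
    # Accumulated length growth K = product of (1 + 2^(-2i))^(1/2).
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, angle_rad   # pre-scale x to cancel the gain
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0        # pick the direction d_i
        shift = 2.0 ** -i                  # a right shift by i in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)          # z(i+1) = z(i) - d_i * e(i)
    return x, y, z


cos_30, sin_30, residual = cordic_rotation(math.radians(30.0))
```

In hardware the values of e(i) would come from a small lookup table like the one `angle_table` produces, and 1/K would be a precomputed constant. An iterative design reuses one shift-add stage for every step, while a pipelined design unrolls the loop into one stage per iteration.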
30 Project Task Implement iterative and pipeline solutions to CORDIC in various modes
Topic 5: Finite Impulse Response Filter Architectures for FPGA Implementations
32 FIR Filters
Digital filters are widely used in digital communications and audio/video processing. In particular, finite impulse response (FIR) filters are used for their ease of implementation and stability.
33 Example: Gigabit Ethernet
[Figure: Gigabit Ethernet example, with the digital filters boxed in blue]
Digital filters play a crucial role in digital communication chips such as Ethernet transceivers, cable modems, DSL modems, satellite receivers, mobile phones, etc.
34 Direct Form Filter
[Figure: direct form FIR filter: input x(n), chain of $z^{-1}$ delays, tap coefficients $h_0, h_1, h_2, \ldots, h_{N-1}$, output y(n)]
An FIR filter implements a convolution in the time domain
Critical path of N-tap filter: N-1 adds + 1 multiply
Arithmetic complexity of N-tap filter modeled as: N multiplications/sample + N-1 adds/sample
35 Project Task: FIR Architecture Explorations and Optimizations
Transpose form
Parallel subexpression sharing
Canonic signed digit representations using carry-save addition
Parallel, word-serial, bit-serial implementation
Xilinx DSP multipliers and multiply-accumulate structures
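The direct form computes the convolution sum y(n) = h_0 x(n) + h_1 x(n-1) + ... + h_{N-1} x(n-N+1) from a tapped delay line. A minimal behavioral sketch follows, with a hypothetical function name and the assumption that the delay line starts out cleared to zero.

```python
def fir_direct_form(samples, taps):
    """Direct-form FIR: tapped delay line, N multiplies, one N-input sum."""
    delay_line = [0] * len(taps)     # holds x(n), x(n-1), ..., x(n-N+1)
    output = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the new sample in
        acc = 0
        for h, delayed in zip(taps, delay_line):
            acc += h * delayed               # N multiplies, N-1 real adds
        output.append(acc)
    return output


# Feeding a unit impulse returns the tap coefficients themselves.
impulse_response = fir_direct_form([1, 0, 0, 0], [1, 2, 3])
```

In hardware the inner loop becomes N parallel multipliers feeding a chain of N-1 adders, which is where the critical path of one multiply plus N-1 adds comes from.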
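As a starting point for the first item, the transpose form can be modelled behaviorally. In this structure the input sample is broadcast to all multipliers and the registers sit between the adders, so the longest combinational path is commonly one multiply plus one add rather than a full adder chain. The sketch below is an assumption-laden illustration (hypothetical function name, registers cleared at start), not a prescribed design.

```python
def fir_transpose_form(samples, taps):
    """Transpose-form FIR: broadcast input, registers between the adders."""
    n = len(taps)
    state = [0] * (n - 1)            # one register after each adder but the last
    output = []
    for x in samples:
        products = [h * x for h in taps]     # every tap sees the same sample
        y = products[0] + (state[0] if state else 0)
        # Each register captures its tap product plus the register behind it.
        new_state = []
        for j in range(n - 1):
            behind = state[j + 1] if j + 1 < n - 1 else 0
            new_state.append(products[j + 1] + behind)
        state = new_state
        output.append(y)
    return output


impulse_response = fir_transpose_form([1, 0, 0, 0], [1, 2, 3])
```

The output sequence is identical to that of the direct form for the same taps and input; only the placement of the registers, and therefore the timing, differs.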
Topic 6: Direct Digital Frequency Synthesis
37 Direct Digital Frequency Synthesis
Direct digital frequency synthesis is used to generate sine and cosine functions for digital communication applications.
Used in many applications: cell phones, cable modems, satellite receivers, etc.
38 DDFS: Basic Understanding and Architecture
Output of DDFS is a sine and cosine waveform
k = frequency control word
L = accumulator bit width
$N = 2^L$ = number of slots in ROM
D = number of output bits
$\phi(n) = (nk) \bmod N$
$1/T$ = clock frequency
$f_0 = 1/(NT)$ = lowest frequency output (i.e. resolution)
$f_c = k f_0 = k/(NT)$ = desired frequency; output will be $\cos(2\pi f_c nT)$ and $\sin(2\pi f_c nT)$
$f_{max}$ = greatest frequency achievable = $1/(2T)$ = $\tfrac{1}{2} f_{clk}$
[Figure: the frequency control word k feeds an L-bit accumulator (+) that addresses a ROM with N slots; the ROM produces the D-bit outputs $\cos(2\pi/N \cdot \phi(n))$ and $\sin(2\pi/N \cdot \phi(n))$]
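The ROM-based architecture can be simulated in a few lines: a phase accumulator steps by k modulo N each clock and the phase addresses sine and cosine lookup tables. The sketch below uses hypothetical function names and assumes the D-bit outputs are signed two's-complement values scaled to a peak of 2^(D-1) - 1, which the slides do not specify.

```python
import math


def make_rom(L, D):
    """Build N = 2^L entry cosine and sine tables with signed D-bit values."""
    n_slots = 1 << L
    scale = (1 << (D - 1)) - 1       # assumed full-scale amplitude
    cos_rom = [round(scale * math.cos(2 * math.pi * a / n_slots))
               for a in range(n_slots)]
    sin_rom = [round(scale * math.sin(2 * math.pi * a / n_slots))
               for a in range(n_slots)]
    return cos_rom, sin_rom


def ddfs(k, L, D, num_samples):
    """Return (cos, sin) output pairs for frequency control word k."""
    n_slots = 1 << L
    cos_rom, sin_rom = make_rom(L, D)
    phase = 0                        # phi(0) = 0
    samples = []
    for _ in range(num_samples):
        samples.append((cos_rom[phase], sin_rom[phase]))
        phase = (phase + k) % n_slots    # phi(n+1) = (phi(n) + k) mod N
    return samples


# L = 4 gives N = 16 slots; k = 4 produces one period every N/k = 4 samples.
waveform = ddfs(k=4, L=4, D=8, num_samples=5)
```

With these example parameters the output repeats every four samples, consistent with f_c = k/(NT). The ROM size grows as 2^L, which is one reason alternative DDFS architectures are worth investigating.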
39 DDFS: Example Output
40 Project task
The ROM-based architecture is simplistic; superior architectures exist.
Investigate various DDFS architectures and implement them on an FPGA.