Random Number Generator. Dmitriy Solmonov (W1-1), David Levitt (W1-2), Jesse Guss (W1-3), Sirisha Pillalamarri (W1-4), Matt Russo (W1-5). Design Manager: Thiago Hersan. March 1, 2006. Component Layout and Floorplan. Project Objective: Create a Cryptographically Secure Pseudo-Random Number Generator
Need for Encryption Explain how a good random number can make data transfer that much more secure.
Random Number? Pseudo-random number generator Uses RC4 encryption algorithm –Cryptographically secure Internally Updated Seed –not in programmer's visible state –hacker
Usage
Demand Potential markets –Defense and Intelligence Organizations –Gambling –Component of future secure mobile communications
The IBAA Algorithm
#define ALPHA (8)
#define SIZE (1<<ALPHA)
#define ind(x) ((x)&(0x1F))
#define barrel(a) (((a)<<19)^((a)>>13)) /* beta=32, shift=19 */
…
for(i=0;i<SIZE;++i){
  x=m[ind(i)];
  a=barrel(a)+m[ind(i+16)];
  y1=m[ind(x)]+a;
  y=y1+b;
  m[ind(i)]=y;
  b=m[ind(y>>ALPHA)]+x;
  r[ind(i)]=b;
}
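The loop on this slide can be wrapped into a compilable function as a minimal sketch. Assumptions not stated on the slide: the memory depth of 32 words (`WORDS`) is inferred from the `0x1F` mask, `uint32_t` stands in for the 32-bit word type, and the function name `ibaa_pass` is hypothetical. Seeding of `m`, `aa`, and `bb` is left to the caller and is not addressed here.

```c
#include <stdint.h>

#define ALPHA 8
#define SIZE (1u << ALPHA)                     /* loop count, as on the slide */
#define WORDS 32u                              /* assumed depth, implied by the 0x1F mask */
#define ind(x) ((x) & 0x1Fu)
#define barrel(a) (((a) << 19) ^ ((a) >> 13))  /* 32-bit rotate left by 19 */

/* One pass of the slide's loop over m (state) and r (results).
 * Each of the four additions below corresponds to one '+' in the slide's loop. */
static void ibaa_pass(uint32_t m[WORDS], uint32_t r[WORDS],
                      uint32_t *aa, uint32_t *bb)
{
    uint32_t a = *aa, b = *bb;
    for (uint32_t i = 0; i < SIZE; ++i) {
        uint32_t x  = m[ind(i)];
        a = barrel(a) + m[ind(i + 16)];   /* addition 1: update accumulator */
        uint32_t y1 = m[ind(x)] + a;      /* addition 2 */
        uint32_t y  = y1 + b;             /* addition 3 */
        m[ind(i)] = y;                    /* write back to state memory */
        b = m[ind(y >> ALPHA)] + x;       /* addition 4: next result */
        r[ind(i)] = b;                    /* write to result memory */
    }
    *aa = a;
    *bb = b;
}
```

Splitting `y` into `y1 + b` keeps every statement down to a single two-input addition, which is the form a four-adder datapath would need.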
Algorithm Animation TBC
Algorithm to Architecture Explain progression from C code to choice of hardware.
Algorithm to Architecture Explain the choice for a 2-Stage Pipeline with multiple cycles per stage.
Algorithm to Architecture Explain why 4 cycles per stage yields the best throughput under hardware assumptions
SRAM (M) SRAM (R) FSM Adder Counter Control Logic Register Counter (M1, M2, M3) Registers Adder (X) Reg (B) Reg (Y) Reg Adder (Y1) Reg Adder (A) Reg (M4) Reg
typedef unsigned int u4; /* unsigned four bytes, 32 bits */
#define ALPHA (8)
#define SIZE (1<<ALPHA)
#define ind(x) ((x)&(SIZE-1))
#define barrel(a) (((a)<<19)^((a)>>13)) /* beta=32,shift=19 */
static void ibaa(
  u4 *m,  /* Memory: array of SIZE ALPHA-bit terms */
  u4 *r,  /* Results: the sequence, same size as m */
  u4 *aa, /* Accumulator: a single value */
  u4 *bb) /* the previous result */
{
  register u4 a,b,x,y,i;
  a = *aa; b = *bb;
  for (i=0; i<SIZE; ++i)
  {
    x = m[i];
    a = barrel(a) + m[ind(i+(SIZE/2))]; /* set a */
    m[i] = y = m[ind(x)] + a + b;       /* set m */
    r[i] = b = m[ind(y>>ALPHA)] + x;    /* set r */
  }
  *bb = b; *aa = a;
}
Floorplan Evolution: #1
Floorplan #2
Final Floorplan
Animation showing what happens on every cycle of the loop.
DFM & ME The Rules –Everything is on a grid –Everything is mono-directional –All metal widths are the same –Contacts same width as metals
Why DFM Easier to perform RET Manufacturability A must for the new generation of transistor sizes.
Pros Regular Layout Enforced Standardization More Accurate Resolution Contacts match metal widths
Example: Group Propagate
Cons Harder to “cut corners” More time-consuming Increased Area Decreased Speed More Metal Layers Learning Curve
Adder Four adders execute 256 times each to generate one number. Hybrid carry skip, carry look ahead, conditional sum, … Fast and low power. Chirca, Schulte, Glossner, et al. “A Static Low-Power, High-Performance 32-bit Carry Skip Adder”
32-Bit Adder Block Diagram. Blocks: CS4, CS18, CS6, CS4. Inputs: A[3:0] B[3:0], A[9:4] B[9:4], A[27:10] B[27:10], A[31:28] B[31:28]. Sums: S[31:28], S[27:10], S[9:4], S[3:0]. Carries: C[0], C’[4], C[10], C’[28], C[32]
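A behavioral sketch of how a carry-skip adder with these block boundaries could work is given below. It assumes "CS" denotes a carry-skip block and models only the skip decision per block. The internal carry look-ahead and conditional-sum logic of the cited design, and the complemented carries (C’), are not modeled. The function name `carry_skip_add` is hypothetical.

```c
#include <stdint.h>

/* Block boundaries from the block diagram: [3:0], [9:4], [27:10], [31:28]. */
static const unsigned block_lo[4]  = {0, 4, 10, 28};
static const unsigned block_len[4] = {4, 6, 18, 4};

/* Adds a + b + cin block by block; *cout (if non-NULL) receives C[32]. */
static uint32_t carry_skip_add(uint32_t a, uint32_t b, unsigned cin, unsigned *cout)
{
    uint32_t sum = 0;
    unsigned carry = cin & 1u;
    for (int k = 0; k < 4; ++k) {
        uint32_t mask = (1u << block_len[k]) - 1u;     /* widest block is 18 bits */
        uint32_t ab = (a >> block_lo[k]) & mask;
        uint32_t bb = (b >> block_lo[k]) & mask;
        uint32_t s  = ab + bb + carry;                 /* sum inside the block */
        unsigned propagate = ((ab ^ bb) == mask);      /* every bit position propagates */
        unsigned ripple_carry = (unsigned)((s >> block_len[k]) & 1u);
        sum |= (s & mask) << block_lo[k];
        /* Skip path: when the whole block propagates, the incoming carry
         * bypasses the block instead of rippling through it. */
        carry = propagate ? carry : ripple_carry;
    }
    if (cout) {
        *cout = carry;
    }
    return sum;
}
```

In software both paths give the same carry; in hardware the skip path is what shortens the critical path through a fully propagating block.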
First CS4 Block 32-Bit Adder
CS18 Block 32-Bit Adder
32 Bit Fast Adder
Adder Performance Discuss trade-offs in speed and power.
SRAM Single Bus Cell Double Bus Cell
SRAM Single Bus
Dual Bus SRAM
Discuss Speed and Power SRAM power consumption Why we can’t do better with the SRAM
Verification Tested architectural Verilog against C code for matching 1024-bit number results. Tested architectural Verilog against structural Verilog for matching port outputs.
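One way the C side of such a comparison might look is sketched below: the 32 result words are flattened into a single 1024-bit hex string and compared with a string captured from simulation. The word order (most significant word first), the lowercase hex format, and the names `result_to_hex` and `results_match` are assumptions for illustration, not the team's actual test flow.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RESULT_WORDS 32                      /* 32 x 32 bits = 1024 bits */
#define RESULT_HEX_LEN (RESULT_WORDS * 8)    /* 256 hex digits */

/* Write r as 256 lowercase hex digits, highest-index word first (assumed order). */
static void result_to_hex(const uint32_t r[RESULT_WORDS],
                          char out[RESULT_HEX_LEN + 1])
{
    for (int i = 0; i < RESULT_WORDS; ++i) {
        /* Each call writes 8 digits plus a terminator that the next call overwrites. */
        snprintf(out + 8 * i, 9, "%08" PRIx32, r[RESULT_WORDS - 1 - i]);
    }
}

/* Returns 1 when the C model's result equals the hex string from simulation. */
static int results_match(const uint32_t r[RESULT_WORDS], const char *sim_hex)
{
    char model_hex[RESULT_HEX_LEN + 1];
    result_to_hex(r, model_hex);
    return strcmp(model_hex, sim_hex) == 0;
}
```

Comparing a single string per run keeps the check independent of how the simulator groups its output words.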
Verification Verified schematic against Verilog implementation in Cadence –Made sure that output was the same –Checked delays and voltage levels Verified layout vs. schematic –Checked levels with parasitics –Performed LVS test
Poly Density 7.06%
Metal Density %
Metal2 Density 18.85%
Metal3 Density 19.24%
Metal5 Density 4.75% Metal4 Density 8.91%
Critical Delay
Specs Pins –40 input pins (including clock, vdd, gnd) –32 output pins (the random number) 475 MHz chip speed 436 kHz throughput
Putting it All Together
Columns: Part, Trans Count, Area, Density, Prop Delay, Schematic #s, ExtractRC #s, Power 500MHz, Power 500 MHz
Adders (4): 5,856 (1,464 ea.) 25,200 um2 (6,300 um2 ea.) ns 1.56 ns 600 uW 620 uW 140 uW 148 uW
SRAM (M&R): 17,736 (M=10,458 R=7,278) 51,000 um2 (M=35,000 R=16, (M=0.293 R=0.456) 735ps 845ps W: 510 uW W: 3.25 mW R: 190 uW R: 1.40 mW 270 uW 1.86 mW
Registers (10): 6400 (640 ea.) 38,400 um2 (3,840 um2 ea.) ps 275 ps 530 uW 590 uW 130 uW 145 uW
Total: ,000 um ns 475 MHz mW
Questions