E-Voting Machine - Design Presentation
Secure Electronic Voting Terminal
Group M1: Bohyun Jessica Kim, Jonathan Chiang, Chi Ho Yoon, Donald Cober
Mon. Sept 29
- System Hardware Component Diagram
- Gate-level Data Path
- Updated Transistor Estimates
- Floorplan
Status Update
- Behavioral Verilog: Entire System
- Gate-level Hardware Block Diagram
- Updated Transistor Count Calculations
- Initial Floorplan
- Structural Verilog: Entire System
- Refined Floorplan
[Figure: system block diagram. Blocks attached to the Data Bus: Machine Init FSM, User ID FSM, Selection FSM, Confirmation FSM, Display, User ID SRAM, Message ROM, Card Reader, Fingerprint Scanner, Encryption Key SRAM, User Input, Write-in SRAM, Choice SRAM, TX_Check, Selection Counter. COMMS datapath: Key Register, COMMS Register, Shift Register In, Shift Register Out, XORs, 8-bit Full Adders, 8-bit Add/Sub, 8-bit MUXes, 8-bit REGs, constant, init.]
SUPER MUX!
SuperMux:
- Our data flow consists of shuffling 8 bits of data from a source to a destination
- These sources and destinations are SRAMs, User Input, COMMS, etc.; many are bidirectional
- Since only one piece of data is sent at a time, it makes sense to use a bus configuration for data movement rather than a set of giant muxes
- We can gate which sources/destinations (drop points) are connected to the bus with one level of pass logic
- This way the data only ever goes through two layers of pass logic: one to get onto the bus and one to get off of the bus
- We will still call this the SuperMux for legacy purposes
- Layout will be fun
[Figure: data[7:0] bus with drop points]
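A minimal behavioral sketch of the bus idea, written in Python rather than hardware. Each drop point is gated onto a shared 8-bit bus, and exactly one source may drive it at a time. The class name `DataBus`, the method `transfer`, and the drop-point names are illustrative assumptions, not part of the design.

```python
class DataBus:
    """Behavioral model of an 8-bit shared bus with gated drop points (illustrative only)."""

    def __init__(self, drop_points):
        # Each drop point holds one byte; the names are hypothetical.
        self.values = {name: 0 for name in drop_points}

    def transfer(self, src_enables, dest_enables):
        """Move one byte from the single enabled source to every enabled destination."""
        sources = [name for name, enabled in src_enables.items() if enabled]
        if len(sources) != 1:
            # Two pass gates driving the bus at once would be bus contention.
            raise ValueError("exactly one source must drive the bus")
        data = self.values[sources[0]] & 0xFF      # first pass-logic layer: source -> bus
        for name, enabled in dest_enables.items():
            if enabled:
                self.values[name] = data           # second pass-logic layer: bus -> destination
        return data


bus = DataBus(["card_reader", "key_sram", "comms"])
bus.values["card_reader"] = 0x5A
moved = bus.transfer({"card_reader": True, "comms": False}, {"key_sram": True})
```

The model makes the two-layer property explicit: a byte passes one gate to reach the bus and one gate to leave it, regardless of how many drop points exist.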
Tiny Encryption Algorithm: Project Specs
Original Implementation:
- 64-bit blocks: two 32-bit inputs
- 128-bit key: four 32-bit keys (K[0], K[1], K[2], K[3])
- Feistel structure: symmetric structure used in block ciphers
- “Magic” constant: 9E3779B9 (Delta) = 2^32 / (golden ratio)
- 64 Feistel rounds = 32 cycles
E-Voting Machine Implementation:
- 16-bit blocks: two 8-bit inputs
- 32-bit key: four 8-bit keys
- 32 Feistel rounds = 16 cycles
Decision:
- Scale the golden ratio (about 1.6) up by a factor of 10 to 16, scale 2^16 by 10 (= 655360), and divide by 16 to get Delta. This avoids using floating point for the key scheduler.
- New Delta = A000. Truncate the least significant hex digit to A00 so that the starting sum fits in 16 bits when decrypting, since A00 * 8 cycles = 0x5000.
Hardware:
- 4-bit and 5-bit shifters
- 16-bit multipliers
- 16-bit adder / subtractor
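The constant arithmetic on this slide can be checked with a few integer operations. The sketch below assumes our reading of the slide: the golden ratio is approximated as 16/10, and "truncating" means dropping the least significant hex digit of A000. The variable names are ours.

```python
import math

PHI = (1 + math.sqrt(5)) / 2            # golden ratio, about 1.618

# Original TEA constant: floor(2^32 / phi)
delta_32 = int(2**32 / PHI)

# Scaled-down constant: approximate phi as 16/10 so the division stays in integers
delta_16 = (2**16 * 10) // 16

# Assumed truncation: drop the least significant hex digit
delta_trunc = delta_16 >> 4

# Sum after eight accumulations of the truncated delta
sum_after_8 = delta_trunc * 8

print(hex(delta_32), hex(delta_16), hex(delta_trunc), hex(sum_after_8))
# prints: 0x9e3779b9 0xa000 0xa00 0x5000
```

With the untruncated value, eight accumulations would give 0x50000, which no longer fits in 16 bits.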
COMMS BLOCK Hardware Implementation 1
Function computed each iteration:
sum += delta;
v0 += ((v1 << 4) + k0) ^ (v1 + sum) ^ ((v1 >> 5) + k1);
v1 += ((v0 << 4) + k2) ^ (v0 + sum) ^ ((v0 >> 5) + k3);
States (control signals per state: sel_out, sel_shift[1:0], sel_sum):
(1) inA = delta, inB = sum[7:0]: v_out0 = sum[7:0]
(2) inA = v1, inB = sum[7:0]: v_out1 = (C+D)
(3) inA = v1 << 4, inB = k0: v_out2 = (A+B) ^ (C+D)
(4) inA = v1 >> 5, inB = k1: v_out3 = (A+B) ^ (C+D) ^ (E+F)
(5) inA = v0, inB = out: v_outx = V0 + ((A+B) ^ (C+D) ^ (E+F))
Here (A+B) = (v1 << 4) + k0, (C+D) = v1 + sum, and (E+F) = (v1 >> 5) + k1.
States (6)-(9) are the same as above, except that they use k2 and k3 and swap v1 and v0.
The implementation goes through 9 states (clk cycles) each iteration to update the output function v_outx.
Reuses:
- (1x) 8-bit full adder/sub (ripple carry) [16*8 = 128]
- (2x) 2:1 8-bit MUX for output pass-through [4*8*2 = 64]
- (8x) 2-input XORs [6*8 = 48]
- (1x) 8-bit REG [11*8 = 88]
- (1x) 4:1 8-bit MUX for shifting selection [12*8 = 96]
In addition, the logic needs to iterate 8 times and is controlled by an FSM that uses:
- (2x) 3:1 8-bit MUX for state input selection [8*8*2 = 128]
- (2x) 1-bit counter adder for updating the cycle [16*2 = 32]
- (2x) 1-bit REG for storing the updated cycle [11*2 = 22]
Total: 606
Advantages:
- Saves transistors and area for the COMMS block
Disadvantages:
- Very heavy pass logic from the MUX layers and XORs
- High clk frequency required, since the same components are reused to calculate v_outx in stages. This translates to higher power consumption, since we are trying to do more with less hardware.
Tradeoff:
- Every 8-bit MUX uses 4*8 = 32 transistors, compared to 16*8 = 128 transistors for an 8-bit full adder. However, MUXes add heavy pass logic, so this is an area vs. power tradeoff.
[Figure: Implementation 1 datapath. 4:1 8-bit MUX (logical shifter) selecting inA[7:0] by sel_shift[1:0]: 00 = delta, 01 = v1, 10 = v1 << 4, 11 = v1 >> 5. 8-bit Full Adder/Sub (T: 128) on inA[7:0] and inB[7:0], XOR (T: 48), 8-bit MUX with 8'h00 selected by sel_sum (T: 32), 8-bit REG (T: 88), output MUXes selected by sel_out (T: 64) producing v_outx; 3:1 8-bit MUXes, 1-bit REG, and 1-bit Full Adder for the cycle counter.]
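The Implementation 1 total can be re-derived from the per-component counts listed on the slide. The short tally below uses only those figures; the list and function names are ours.

```python
# (component, transistors per unit, number of units), as listed for Implementation 1
IMPL1 = [
    ("8-bit full adder/sub (ripple carry)", 16 * 8, 1),
    ("2:1 8-bit MUX",                       4 * 8,  2),
    ("2-input XOR",                         6,      8),
    ("8-bit REG",                           11 * 8, 1),
    ("4:1 8-bit MUX",                       12 * 8, 1),
    ("3:1 8-bit MUX",                       8 * 8,  2),
    ("1-bit counter adder",                 16,     2),
    ("1-bit REG",                           11,     2),
]


def total_transistors(parts):
    """Sum transistors over a list of (name, per_unit, units) entries."""
    return sum(per_unit * units for _, per_unit, units in parts)


impl1_total = total_transistors(IMPL1)
print(impl1_total)  # prints 606
```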
COMMS BLOCK Hardware Implementation 2
Function computed each iteration:
sum += delta;
v0 += ((v1 << 4) + k0) ^ (v1 + sum) ^ ((v1 >> 5) + k1);
v1 += ((v0 << 4) + k2) ^ (v0 + sum) ^ ((v0 >> 5) + k3);
Implementation 2 calculates all 3 parts of the function concurrently and completes a full iteration in 2 clk cycles.
Uses:
- (1x) 8-bit full adder/sub (ripple carry) [16*8 = 128]
- (3x) 8-bit full adder (ripple carry) [16*8*3 = 384]
- (4x) 2:1 8-bit MUX for output pass-through [4*8*4 = 128]
- (16x) 2-input XORs [6*16 = 96]
- (2x) 8-bit REG [11*8*2 = 176]
- (1x) 1-bit counter adder for updating the cycle [16]
- (1x) 1-bit REG for storing the updated cycle [11]
Total: 939
In addition, the logic does not need a complex FSM; it just needs to do 8 iterations.
Advantages:
- Low pass logic, speed performance, low power; MUX logic transistor count essentially halved
Disadvantages:
- Higher transistor count and larger area
Tradeoff:
- Larger area, but the reduced MUX pass logic and the absence of a complex FSM simplify the design, increase speed, and minimize power.
[Figure: Implementation 2 datapath. Three 8-bit Full Adders (T: 128 each) computing V1 + sum, {V1[3:0], 4'b0} + K0, and {5'b0, V1[7:5]} + K1 in parallel; XORs; 8-bit Full Adder with V0; 8-bit Add/Sub with delta; 8-bit MUXes (T: 32 each) selected by sel_out (0: pass sum, V1; 1: pass new sum, V); 8-bit REGs (T: 88 each) producing v_outx; 1-bit REG and 1-bit Full Adder for the cycle counter.]
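For reference, here is a minimal software model of the reduced-width round function that the COMMS block computes, following the standard TEA update with 8-bit halves and 8 iterations. It is a sketch, not the group's implementation. The delta value `DELTA_8` and the key bytes in `KEY` are hypothetical placeholders chosen only to exercise the code.

```python
DELTA_8 = 0xA0                     # hypothetical 8-bit delta, for illustration only
KEY = (0x01, 0x23, 0x45, 0x67)     # hypothetical four 8-bit key bytes


def _mix(v, total, ka, kb, mask):
    """One mixing term: ((v << 4) + ka) ^ (v + total) ^ ((v >> 5) + kb), truncated to the word width."""
    return ((((v << 4) & mask) + ka) ^ (v + total) ^ ((v >> 5) + kb)) & mask


def tea_encrypt(v0, v1, key, delta=DELTA_8, cycles=8, width=8):
    mask = (1 << width) - 1
    total = 0
    for _ in range(cycles):
        total = (total + delta) & mask
        v0 = (v0 + _mix(v1, total, key[0], key[1], mask)) & mask
        v1 = (v1 + _mix(v0, total, key[2], key[3], mask)) & mask
    return v0, v1


def tea_decrypt(v0, v1, key, delta=DELTA_8, cycles=8, width=8):
    mask = (1 << width) - 1
    total = (delta * cycles) & mask    # start from the final sum and run the cycles backwards
    for _ in range(cycles):
        v1 = (v1 - _mix(v0, total, key[2], key[3], mask)) & mask
        v0 = (v0 - _mix(v1, total, key[0], key[1], mask)) & mask
        total = (total - delta) & mask
    return v0, v1


cipher = tea_encrypt(0x12, 0x34, KEY)
plain = tea_decrypt(*cipher, KEY)      # recovers (0x12, 0x34)
```

A model like this can serve as a golden reference when checking the Verilog: both hardware implementations should produce the same `v0`, `v1` per iteration, differing only in how many clock cycles each iteration takes.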
E-Voting TEA Gate Level Hardware
Full Adder
Common full adder: Mirror Adder
- Uses 28 transistors (including 4 transistors in inverters)
- NMOS and PMOS networks are completely symmetrical
- Logic: S = a ⊕ b ⊕ Carryin; Carryout = (a ⊕ b)·Carryin + a·b
E-Voting TEA Gate Level Hardware
Full Adder
What we decided to use in this project: 1-bit full adder
- Uses pass-transistor logic to compute XNOR
- The sum bit equals A ^ B ^ Cin, where A and B are the two inputs and Cin is the carry-in input; a mux at the bottom selects the Cout bit
- This adder is used 8 times to compute all 8 bits of data
- Uses inverters to strengthen the signal at the end of each XNOR
- Uses only 16 transistors, yet produces a strong signal
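A bit-level sketch of how eight 1-bit full adders chain into the 8-bit ripple-carry adder. The carry is written as a mux (Cout = Cin when A ^ B is 1, otherwise A), which is one common way to realize "muxing the Cout bit" and is logically equivalent to Cout = (A ⊕ B)·Cin + A·B. Whether the group's schematic muxes exactly this way is an assumption.

```python
def full_adder(a, b, cin):
    """1-bit full adder: XOR-based sum, mux-based carry."""
    p = a ^ b                    # propagate signal (the XNOR stage, inverted)
    s = p ^ cin                  # sum bit = A ^ B ^ Cin
    cout = cin if p else a       # mux: pass Cin when A != B, otherwise pass A
    return s, cout


def ripple_add8(x, y, cin=0):
    """Chain eight full adders, least significant bit first."""
    result = 0
    carry = cin
    for i in range(8):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_add8(0xF0, 0x1F))   # prints (15, 1)
```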
E-Voting TEA Gate Level Hardware
XOR
- Avoids using two T-gates
- Uses 6 transistors (XNOR + inverter)
MUX
- T-gate mux
- 4 transistors
- Very small, hence difficult to lay out
E-Voting TEA Gate Level Hardware
REG
TSPC Register
- True single-phase clock flip-flop
- Advantages: single clock distribution, small area for clock lines, high speed, and no skew between clock phases
- We will use the 8T version instead of the 9T version
SRAM Gate Level Hardware
SRAM Cell
- 6T SRAM cell
- Smaller transistor size
- Lower energy dissipation
- Efficient layout
SRAM Gate Level Hardware
Address Decoder
- Combination of inverters and NAND gates
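A small logical model of a decoder built from inverters and NAND gates: each word line is a NAND of the true or complemented address bits, followed by an inverter to make it active-high. The gate arrangement is a generic textbook structure assumed for illustration. The slide does not specify the exact circuit.

```python
def decode_address(addr, bits):
    """Return a one-hot list of word lines for an address of the given width."""
    a = [(addr >> i) & 1 for i in range(bits)]
    a_n = [bit ^ 1 for bit in a]                 # inverters on each address bit
    word_lines = []
    for word in range(1 << bits):
        # Pick the true or complemented bit so that all inputs are 1 only for this word
        inputs = [a[i] if (word >> i) & 1 else a_n[i] for i in range(bits)]
        nand_out = 0 if all(inputs) else 1       # NAND gate
        word_lines.append(nand_out ^ 1)          # output inverter: active-high word line
    return word_lines


print(decode_address(2, 2))   # prints [0, 0, 1, 0]
```

For the 2-bit Key SRAM address this yields four word lines, and for the 6-bit Write-In SRAM address it yields 64.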
SRAM Gate Level Hardware
SRAM
- Input/output tri-state buffers?
- Need for a sense amplifier?
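[Figure: Machine Initialization FSM interface diagram. Blocks: Machine Initialization FSM, Encryption Key SRAM (4 byte), Card Reader, Data Bus, COMMS, Message ROM. Signal labels: 2-bit address and 8-bit data (Encryption Key SRAM); 1-bit Card Detected signal (Card Reader); 1-bit Activate Next; 8-bit data (Data Bus); 1-bit Data Ready and 8-bit data (COMMS); 1-bit message; 8-bit data (Message ROM); 4-bit data bus control.]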
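[Figure: User ID FSM interface diagram. Blocks: User ID FSM, User ID SRAM (8 byte), Card Reader, Fingerprint Scanner, Data Bus, COMMS, Message ROM, Display, User Input. Signal labels: 3-bit address and 8-bit data (User ID SRAM); 1-bit Card Detected signal (Card Reader); 1-bit Finger Scanned signal and 8-bit data (Fingerprint Scanner); 1-bit Activate Next; 1-bit Activate This; 1-bit Reactivate This; 8-bit data (Data Bus); 1-bit Data Ready and 8-bit data (COMMS); 2-bit message; 8-bit data (Message ROM); 8-bit data (Display); 1-bit Yes signal and 1-bit No signal (User Input); 7-bit data bus control.]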
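[Figure: Selection FSM interface diagram. Blocks: Selection FSM, Choice SRAM (4 byte), User Input, Selection Counter, Data Bus, COMMS, Message ROM, Display. Signal labels: 2-bit address and 8-bit data (Choice SRAM); 1-bit Next Page signal and 1-bit Previous Page signal (User Input); 8-bit data and 3-bit count (Selection Counter); 1-bit Activate Next; 1-bit Activate This; 1-bit Reactivate This; 8-bit data (Data Bus); 1-bit Data Ready and 8-bit data (COMMS); 2-bit message; 8-bit data (Message ROM); 8-bit data (Display); 6-bit data bus control.]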
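[Figure: Confirmation FSM interface diagram. Blocks: Confirmation FSM, User Input, Data Bus, COMMS, Message ROM, Display, User ID SRAM (8 byte), Write-in SRAM (64 byte), Choice SRAM (4 byte), TX_Check. Signal labels: 1-bit Yes signal and 1-bit No signal (User Input); 1-bit Reactivate Selection; 1-bit Reactivate User ID; 1-bit Activate This; 1-bit Data Ready and 8-bit data (COMMS); 2-bit message; 8-bit data (Message ROM); 8-bit data (Display); 8-bit data bus control; 8-bit data and 3-bit address (User ID SRAM); 8-bit data and 6-bit address (Write-in SRAM); 8-bit data and 2-bit address (Choice SRAM); 1-bit Reset and 1-bit TX_good (TX_Check).]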
SUPER MUX!
The statement that we only transfer one byte of data at a time is technically false.
For example: Encryption Key SRAM (4 byte), COMMS, and Message ROM
- When the Message ROM is sending a message to the COMMS, the COMMS are using data from the Encryption Key SRAM to encode the message
- We can circumvent this by hardwiring the Encryption Key SRAM data to the COMMS key input, in addition to attaching it to the bus
- This only works because the Key SRAM will never be active on the data bus while the COMMS are accessing it
[Figure: Encryption Key SRAM (4 byte), COMMS, and Message ROM on the Data Bus]
SUPER MUX!
Other hardwired connections:
Choice SRAM to TX Check
- The transmission check confirms that the data sent to the main computer and held in its current session matches the choices stored in our SRAM
- During the Confirmation FSM, the SRAM data is sent to the main computer and the main computer echoes it back
- The echo is streamed into the TX Check (as well as the display), and the TX Check compares it, as it is streaming, to the Choice SRAM
Write-In SRAM and User Input
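The streaming comparison can be sketched as follows: each echoed byte is compared against the Choice SRAM entry at the same address as it arrives, and a single good/bad flag is kept. The function name `tx_check` and the list-based SRAM are illustrative assumptions. The real block is a hardwired comparator.

```python
def tx_check(choice_sram, echoed_stream):
    """Compare echoed bytes against the Choice SRAM contents as they stream in."""
    tx_good = True
    received = 0
    for addr, echoed in enumerate(echoed_stream):
        # A byte beyond the SRAM size or a mismatch clears the flag for the session
        if addr >= len(choice_sram) or echoed != choice_sram[addr]:
            tx_good = False
        received += 1
    # The echo must also cover every stored choice
    return tx_good and received == len(choice_sram)


choices = [0x01, 0x04, 0x02, 0x03]        # hypothetical 4-byte Choice SRAM contents
ok = tx_check(choices, iter([0x01, 0x04, 0x02, 0x03]))
bad = tx_check(choices, iter([0x01, 0x04, 0x07, 0x03]))
```

Because the comparison happens byte by byte, no second buffer is needed to hold the echo.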
Converting Behavioral Verilog to Transistor Counts

module machine_init_fsm(clk, cardDetectSig, commDetectSig, actNext,
                        mux_src, mux_dest, message, address);
  // State codes (`s1 to `s6) and the `..._SRC, `..._DEST, and `KEY_REQUEST
  // macros are assumed to be `define'd elsewhere in the project.
  input clk, cardDetectSig, commDetectSig;
  output reg actNext;
  output reg [8:0] mux_src;
  output reg [7:0] mux_dest;
  output reg [2:0] message;
  output reg [1:0] address;
  reg [2:0] state, next_state;
  reg [1:0] next_address;

  //Initialize
  initial begin
    actNext = 0;
    state = 0;
    next_state = 0;
  end

  //Main FSM
  always @(*) begin
    if (!actNext) begin
      case (state)
        `s1: begin
          mux_src = 0;
          mux_dest = 0;
          //Wait for card data
          if (cardDetectSig) begin
            //Send card data to the Key SRAM
            next_address = 0;
            next_state = `s2;
          end
        end
        `s2: begin
          mux_src = `CARD_SRC;
          mux_dest = `KEY_SRAM_DEST;
          //Read in 4 bytes from the card reader
          if (address == 3) begin
            next_state = `s3;
          end
          next_address = address + 1;
        end
        `s3: begin
          //Send a key request to the comms
          message = `KEY_REQUEST;
          mux_src = `MESSAGE_SRC;
          mux_dest = `COMMS_DEST;
          next_state = `s4;
        end
        `s4: begin
          mux_src = 0;
          mux_dest = 0;
          next_address = 0;
          //Wait for data to arrive
          if (commDetectSig == 0) begin
            next_state = `s4;
          end else begin
            next_state = `s5;
          end
        end
        `s5: begin
          mux_src = `COMMS_SRC;
          mux_dest = `KEY_SRAM_DEST;
          //Read in 4 bytes from the comms
          if (address == 3) begin
            next_state = `s6;
          end
          next_address = address + 1;
        end
        `s6: begin
          //Proceed: release the outputs and activate the next FSM
          mux_src = 9'bzzzzzzzzz;
          mux_dest = 8'bzzzzzzzz;
          message = 3'bzzz;
          address = 2'bzz;
          next_address = 2'bzz;
          actNext = 1;
        end
      endcase
    end else begin
      mux_src = 9'bzzzzzzzzz;
      mux_dest = 8'bzzzzzzzz;
      message = 3'bzzz;
      address = 2'bzz;
      next_address = 2'bzz;
    end
  end

  //State Register
  always @(posedge clk) begin
    state <= next_state;
    address <= next_address;
  end
endmodule

Machine Init FSM
1. Create registers: 6 states => 3 D flip-flops + 2-bit SRAM address
2. State change logic: most changes are sequential increments, so the flip-flops are configured as counters
3. Further logic: the remaining logic consists of output signals generated mostly from the state; random logic can be approximated from the number and configuration of outputs
[Figure: five D flip-flops (D, Q, ~Q, clock)]
State: src, dest, message
2: CARD, KEY, 0
3: MESSAGE, COMMS, KEY_REQUEST
5: COMMS, KEY, 0
6: z, NEXT, z
- 5 distinct 1-bit outputs
- Each 1-bit output is derived from a 3-bit input (state)
- Approx. 2 1/2 2-input gates for each, so ~10 transistors for each distinct output
- 50 transistors total for random logic
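The estimation recipe for the Machine Init FSM can be written as a small helper: count the state flip-flops from the number of states, add the address bits, and budget about 10 transistors of random logic per distinct 1-bit output. The function name and the returned dictionary layout are ours. The 10-transistor figure is the slide's approximation.

```python
def fsm_estimate(num_states, address_bits, distinct_outputs, transistors_per_output=10):
    """Rough register and random-logic estimate for one FSM block."""
    # Minimum number of bits needed to encode num_states distinct states
    state_bits = (num_states - 1).bit_length()
    return {
        "state_flip_flops": state_bits,
        "total_flip_flops": state_bits + address_bits,
        "random_logic_transistors": distinct_outputs * transistors_per_output,
    }


machine_init = fsm_estimate(num_states=6, address_bits=2, distinct_outputs=5)
print(machine_init)
# prints {'state_flip_flops': 3, 'total_flip_flops': 5, 'random_logic_transistors': 50}
```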
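Converting Behavioral Verilog to Transistor Counts (cont)
Columns: Block, States, Address, Registers, Distinct Outputs, Random, Transistors
- Machine Init FSM: 6 states, 2-bit address
- User ID FSM: 12 states, 3-bit address
- Selection FSM: 7 states, 2-bit address
- Confirmation FSM: 9 states, 6-bit address
- User Input: NA states, 6-bit address
- Selection Counter: NA states
- TX Compare: NA states, 2-bit address, 3 registers, 1 distinct output, 0 random, 33 transistors
Columns: Block, Points on Bus, T-gates, Transistors
- Data Bus MUX
Columns: Block, Messages, Inputs, ~Gates / Bit, Transistors
- Message ROM: 8 messages (1 byte), 8 inputs, 7 gates / bit (35 transistors), 280 transistors
Total: 1425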
Converting Behavioral Verilog to Transistor Counts (cont)
Columns: Block, Bits, Address transistors, Transistors
- Key SRAM: 32 bits, 8*(2^2)+2*2 = 36
- User ID SRAM: 64 bits, 8*(2^3)+2*3 = 70
- Choice SRAM: 32 bits, 8*(2^2)+2*2 = 36
- Write-In SRAM: 512 bits, 8*(2^6)+2*6 = 524
Columns: Block, Bits, Transistors
- COMMs: 939
- Shift In: 8 bits, 88
- Shift Out: 8 bits, 88
- Input/Output MUX: 8 bits, 32
- Register: 16 bits, 176
Total: 7254
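The stated total of 7254 can be reproduced from the figures in the deck under one assumption: each SRAM costs 6 transistors per bit (the 6T cell from the SRAM slide) plus its address-decoder transistors. The per-SRAM totals did not survive in the table, so this is a reconstruction. The 1425 figure is the control-logic total from the preceding slide.

```python
# name: (bits, address bits)
SRAMS = {
    "Key SRAM":      (32, 2),
    "User ID SRAM":  (64, 3),
    "Choice SRAM":   (32, 2),
    "Write-In SRAM": (512, 6),
}


def address_transistors(addr_bits):
    """Decoder estimate from the slide: 8*(2^n) + 2*n."""
    return 8 * 2**addr_bits + 2 * addr_bits


def sram_transistors(bits, addr_bits, cell_transistors=6):
    """Assumed cost: 6T per bit plus the address decoder."""
    return cell_transistors * bits + address_transistors(addr_bits)


sram_total = sum(sram_transistors(bits, addr) for bits, addr in SRAMS.values())
comms_total = 939 + 88 + 88 + 32 + 176    # COMMs, Shift In, Shift Out, I/O MUX, Register
control_total = 1425                      # FSMs, data bus MUX, and Message ROM
grand_total = sram_total + comms_total + control_total

print(sram_total, comms_total, grand_total)   # prints 4506 1323 7254
```

The match with the slide's total suggests the 6T-per-bit assumption is how the missing column was computed, though the deck does not state it.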
[Figure: floorplan. Blocks: Write-In SRAM, Choice SRAM, User ID SRAM, Encryption Key SRAM, Comm Register, MUX, User Input, User ID FSM, COMMS, Shift In, Shift Out, Selection FSM, Confirmation FSM, Machine Init FSM.]
Questions? Thank you!