Example Best and Median Results


Targeting Delay Only: effectively create 16 SHA256 units to work in parallel.
Targeting Area*Delay: effectively use one SHA256 unit to enumerate 16 nonces.

                             Best Delay Only   Median Delay Only   Best Area*Delay   Median Area*Delay
#ALUTs                       25,201            31,607              1,627             1,525
#Registers                   19,432            20,932              1,230             2,076
Area                         44,633            52,539              2,857             3,601
Fmax (MHz)                   182.55            134.01              179.21            151.92
#Cycles                      225               242                 2,201             2,252
Delay (microsecs)            1.233             1.806               12.282            14.821
Area*Delay (millisec*area)   55.012            94.877              35.089            53.369
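The derived rows of the results table can be reproduced from the measured ones. The sketch below assumes Area = #ALUTs + #Registers (as stated for the summary spreadsheet), Delay = #Cycles / Fmax, and Area*Delay reported in area times milliseconds. The function name `summarize` is made up for illustration and is not part of the project files.

```python
def summarize(aluts, registers, fmax_mhz, cycles):
    """Return (area, delay in microseconds, area*delay in area*milliseconds)."""
    area = aluts + registers
    delay_us = cycles / fmax_mhz            # cycles / MHz gives microseconds
    area_delay = area * delay_us / 1000.0   # convert microseconds to milliseconds
    return area, delay_us, area_delay


# "Best Delay Only" column from the table
area, delay_us, area_delay = summarize(25_201, 19_432, 182.55, 225)
print(area, round(delay_us, 3), round(area_delay, 3))
```

Running the same function on the "Best Area*Delay" column (1,627 ALUTs, 1,230 registers, 179.21 MHz, 2,201 cycles) shows why the slower single-unit design wins on Area*Delay: its area is roughly 16 times smaller while its delay is only about 10 times longer.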

Tips

    function logic [255:0] sha256_op(input logic [31:0] a, b, c, d, e, f, g, h, w,
                                     input logic [7:0] t);
        logic [31:0] S1, S0, ch, maj, t1, t2; // internal signals
    begin
        S1 = rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25);
        ch = (e & f) ^ ((~e) & g);
        t1 = ch + S1 + h + k[t] + w;
        S0 = rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22);
        maj = (a & b) ^ (a & c) ^ (b & c);
        t2 = maj + S0;
        sha256_op = {t1 + t2, a, b, c, d + t1, e, f, g};
    end
    endfunction

You can precompute just the “h + k[t]” part of t1, one cycle ahead, as

    p = g + k[t+1];        // g is h[t+1]
    t1 = ch + S1 + w + p;

This way, you don’t need to precompute “w” 2 cycles ahead (just 1 cycle ahead is enough).

You can also precompute as follows:

    p = g + w + k[t+1];    // g is h[t+1], but w needs to be t+2
    t1 = ch + S1 + p;

Here, you have to precompute “w” 2 cycles ahead.

You will need to figure out for yourself how to implement this in SystemVerilog.
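The first precomputation trick relies on the fact that the value in register g during round t becomes h in round t+1, so g + k[t+1] can be formed one round early. The Python sketch below is a software reference model only, not the SystemVerilog implementation the slide asks for. It checks that the plain and precomputed formulations produce the same registers. The helper names (`_round`, `run_plain`, `run_precomputed`) and the test words are hypothetical, and only the first eight standard SHA-256 round constants are used.

```python
MASK = 0xFFFFFFFF

# First eight SHA-256 round constants and the standard initial hash values.
K = [0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
     0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5]
H_INIT = (0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
          0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19)


def rightrotate(x, r):
    return ((x >> r) | (x << (32 - r))) & MASK


def _round(state, w, h_plus_k):
    """One SHA-256 round where the caller supplies (h + k[t]) already summed."""
    a, b, c, d, e, f, g, _h = state
    s1 = rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25)
    ch = (e & f) ^ ((~e & MASK) & g)
    t1 = (ch + s1 + w + h_plus_k) & MASK
    s0 = rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (maj + s0) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def run_plain(words):
    """Reference: compute h + k[t] inside the same round that uses it."""
    state = H_INIT
    for t, w in enumerate(words):
        state = _round(state, w, (state[7] + K[t]) & MASK)
    return state


def run_precomputed(words):
    """Pipelined form: p = g + k[t+1] is formed one round before it is used."""
    state = H_INIT
    p = (state[7] + K[0]) & MASK          # first p has to come from h itself
    for t, w in enumerate(words):
        next_p = (state[6] + K[t + 1]) & MASK if t + 1 < len(K) else 0
        state = _round(state, w, p)
        p = next_p
    return state


WORDS = [0x01234567 * (i + 1) & MASK for i in range(8)]  # arbitrary test words
print(run_plain(WORDS) == run_precomputed(WORDS))
```

In hardware the benefit is that p lives in its own register, which shortens the adder chain that forms t1 in the critical round.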

Tips

If you cannot “fit” 16 “SHA 256 computations” into the FPGA, then try fitting 8 “SHA 256 computations”, etc. For example, #ALUTs > 30,000 and #registers > 20,000 are possible when parallel execution is applied.

While more sophisticated pipelining (e.g. p = g + k + w) can increase Fmax by e.g. 20 MHz, if the logic is too complex, then Fmax could be reduced by > 20 MHz. The net effect could be negative. For example, it’s possible to get Fmax = 150 MHz without sophisticated pipelining.

Make sure you avoid referencing w[t], which will create MUXes and decoders (this will increase area and decrease Fmax).

Pipelining can be effective. Just be careful not to make the logic too complicated.

The design may not “fit” in the FPGA if the logic is too complex, even when #ALUTs is below the 36,000 maximum on the device.
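One common way to avoid a variable index such as w[t] is to keep only the last 16 schedule words in a shift register and always read fixed positions. The Python sketch below is a software model, not synthesizable code. It assumes the standard SHA-256 message schedule and checks that a 16-word shifting window yields the same words as the indexed formulation. The function names and the test block are hypothetical.

```python
MASK = 0xFFFFFFFF


def rightrotate(x, r):
    return ((x >> r) | (x << (32 - r))) & MASK


def sigma0(x):
    return rightrotate(x, 7) ^ rightrotate(x, 18) ^ (x >> 3)


def sigma1(x):
    return rightrotate(x, 17) ^ rightrotate(x, 19) ^ (x >> 10)


def expand_indexed(block):
    """Textbook form: a 64-entry array addressed by the round counter t."""
    w = list(block)
    for t in range(16, 64):
        w.append((w[t - 16] + sigma0(w[t - 15]) + w[t - 7] + sigma1(w[t - 2])) & MASK)
    return w


def expand_shifting(block):
    """Shift-register form: only fixed positions 0, 1, 9 and 14 are ever read."""
    window = list(block)          # always holds the 16 most recent words
    out = list(block)
    for _ in range(16, 64):
        new = (window[0] + sigma0(window[1]) + window[9] + sigma1(window[14])) & MASK
        window = window[1:] + [new]   # shift by one word, append the new word
        out.append(new)
    return out


BLOCK = [(i * 0x01010101) & MASK for i in range(16)]  # arbitrary test block
print(expand_indexed(BLOCK) == expand_shifting(BLOCK))
```

Because every read in the shifting form uses a constant position, the hardware equivalent needs plain wires rather than a 16-to-1 or 64-to-1 multiplexer driven by t.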

Tips

For most people, the critical path will be through the “sha256_op” logic that updates registers “a, b, c, … h”.

If you already have an efficient “Delay-only” design (meaning Fmax ≈ 150 MHz, #Cycles < 250, fitter successful), you may try separating out the “sha256_op” logic and “a, b, c, … h” registers into a separate “always_ff” statement (again, sample code below is not necessarily complete):

    always_ff @(posedge clk) begin
        case (state)
            PREP: if (phase2) begin
                      a <= h0;
                      ...
                      h <= h7;
                  end else begin
                      a <= 32'h6a09e667;
                      ...
                      h <= 32'h5be0cd19;
                  end
            COMPUTE: begin
                {a, b, c, d, e, f, g, h} <= sha256_op(a, b, c, d, e, f, g, h, w[15], t);
            end
        endcase
    end

[Figure: block diagram with h0 .. h7, constants, MUX, registers a, b, ... h, and sha256_op.]

Tips

If you separate out the “always_ff” statement like this, then only this always_ff statement can assign to “a, b, c, … h”.

Note that we don’t need “reset_n” for “a, b, c, …”; they will just be implemented as “edge-triggered flip-flops”.

Note that this “always_ff” statement does not contain “next state logic” for state. The always_ff is only a function of “state”.

Tips

You can also put something like this inside a “module”. But in this case, “state” cannot be declared as an enum, and we’ll have to define the state labels as parameters. The state encoding must be defined consistently with the main module.

    // Module name is a placeholder; the slide omits it.
    // phase2 is added as an input because the body reads it.
    module sha256_regs (input logic clk, phase2,
                        input logic [3:0] state,
                        input logic [31:0] h0, h1, h2, h3, h4, h5, h6, h7, w,
                        input logic [7:0] t,
                        output logic [31:0] a, b, c, d, e, f, g, h);

        parameter IDLE=4'b0000, PREP=4'b0001, COMPUTE=4'b0010, ...

        always_ff @(posedge clk) begin
            case (state)
                PREP: if (phase2) begin
                          a <= h0;
                          ...
                          h <= h7;
                      end else begin
                          a <= 32'h6a09e667;
                          ...
                          h <= 32'h5be0cd19;
                      end
                COMPUTE: begin
                    {a, b, c, d, e, f, g, h} <= sha256_op(a, b, c, d, e, f, g, h, w, t);
                end
            endcase
        end
    endmodule

[Figure: block diagram with h0 .. h7, constants, MUX, registers a, b, ... h, and sha256_op.]

Tips

There are many possible implementations, so there is no single “right way”.

A good rule of thumb is to make your code easy to read. If there are so many nested if-then-else statements that the code is hard to read, try to simplify it; simpler code tends to lead to better implementations.

Minimizing the number of states is not necessarily good if it means that you have to add many if-then-else statements to effectively recreate the same next-state logic.

Complexity: it should be possible to implement the complete design in 300-400 lines of code.

Tips

Debug your design first with a smaller NUM_NONCES, e.g., by changing the NUM_NONCES parameter in both the testbench and your design to NUM_NONCES = 1 or NUM_NONCES = 2.

Testbench (you can change this parameter to try a smaller design):

    module tb_bitcoin_hash();
        parameter NUM_NONCES = 16;
        ...
        initial begin
            ...
            $stop;
        end
    endmodule

Your Design:

    module bitcoin_hash(input logic clk, reset_n ...);
        parameter NUM_NONCES = 16;
        ...
        always_ff @(posedge clk, negedge reset_n) begin
            if (!reset_n) begin
                ...
            end else case (state)
                ...
            endcase
        end
    endmodule

Final Project Submission

Put the following files into (LastName, FirstName)_(LastName, FirstName)_finalproject.zip:

finalsummary.xlsx (see link in Project5 page or Class schedule page)
bitcoin_hash1.sv (min delay) and bitcoin_hash2.sv (min area*delay). Add other sv files if you split your designs into different sv files.
transcript1.txt (min delay) and transcript2.txt (min area*delay)
message1.txt (min delay) and message2.txt (min area*delay)
bitcoin_hash1.fit.rpt (min delay) and bitcoin_hash2.fit.rpt (min area*delay)
bitcoin_hash1.sta.rpt (min delay) and bitcoin_hash2.sta.rpt (min area*delay)

finalsummary.xlsx

See the finalsummary.xlsx template provided. A link to this spreadsheet is on the Project 5 page and the Class Schedule page.
If you worked alone, just fill out one row.
The spreadsheet already contains calculation fields, e.g. Area = #ALUTs + #Registers. Please use them.
Make sure to use the Arria II GX EP2AGX45DF29I5 device.
Make sure to use Fmax for the Slow 900mV 100C Model.
Make sure to use the Total number of cycles.

bitcoin_hash1.sv and bitcoin_hash2.sv

Name your “min delay” design “bitcoin_hash1.sv” and your “min area*delay” design “bitcoin_hash2.sv”.
Include other sv files if you have more, and rename them as needed.

transcript1.txt and transcript2.txt

Copy of the ModelSim simulation results. Only the simulation results for tb_bitcoin_hash.sv are needed.
After you run the “run -all” command, you can save your transcript by going to the “File” menu and clicking on “save transcript as”.
The transcript file will contain the history of all commands used in the current ModelSim session. You can clear the current transcript by going to the “Transcript” menu on the GUI and clicking “Clear”.
Use the Total number of cycles for your cycle count.

message1.txt and message2.txt

Copy of the Quartus compilation messages. You can save the messages by “right-clicking” the message window and choosing “save message”.
IMPORTANT: Make sure that there are no warnings about “latches” or “inferred latches”.

bitcoin_hash1.fit.rpt and bitcoin_hash2.fit.rpt

Copy of the fitter reports (not the flow report) with area numbers.
Make sure to use the Arria II GX EP2AGX45DF29I5 device.
IMPORTANT: Make sure Total block memory bits is 0.

No Block Memory Bits

Your bitcoin_hash1.fit.rpt and bitcoin_hash2.fit.rpt files must report that Total block memory bits is 0 (otherwise the design will not pass).
If not, go to “Assignments→Settings” in Quartus, go to “Compiler Settings”, click “Advanced Settings (Synthesis)”, and turn OFF “Auto RAM Replacement” and “Auto Shift Register Replacement”.

No Inferred Megafunctions/Latches

Check your Quartus compilation messages for the following:

No inferred megafunctions: these are most likely caused by block memories or shift-register replacement. You can turn OFF “Auto RAM Replacement” and “Auto Shift Register Replacement” in “Advanced Settings (Synthesis)”. If you still see “inferred megafunctions”, contact the Professor. Your design will not pass if it has inferred megafunctions.

No inferred latches: your design will not pass if it has inferred latches.

bitcoin_hash1.sta.rpt and bitcoin_hash2.sta.rpt

Copy of the sta (static timing analysis) reports.
Make sure to use Fmax for the Slow 900mV 100C Model.
IMPORTANT: Make sure “clk” is the ONLY clock. You must assign mem_clk = clk;
Your bitcoin_hash1.sta.rpt and bitcoin_hash2.sta.rpt must show that “clk” is the only clock.