Example Best and Median Results


1 Example Best and Median Results
Targeting Delay Only: effectively create 16 SHA256 units that work in parallel.
Targeting Area*Delay: effectively use one SHA256 unit to enumerate the 16 nonces.
                        Best Delay   Median Delay   Best         Median
                        Only         Only           Area*Delay   Area*Delay
#ALUTs                  25,201       31,607         1,627        1,525
#Registers              19,432       20,932         1,230        2,076
Area                    44,633       52,539         2,857        3,601
Fmax (MHz)              182.55       134.01         179.21       151.92
#Cycles                 225          242            2,201        2,252
Delay (µs)              1.233        1.806          12.282       14.821
Area*Delay (ms × area)  55.012       94.877         35.089       53.369
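The derived rows of the table follow from the others. Assuming Delay = #Cycles / Fmax (this reproduces the Best Delay Only column) and Area = #ALUTs + #Registers (as defined in the finalsummary.xlsx template), that column works out as:

```latex
\begin{aligned}
\text{Area} &= \#\text{ALUTs} + \#\text{Registers} = 25{,}201 + 19{,}432 = 44{,}633 \\
\text{Delay} &= \frac{\#\text{Cycles}}{F_{\max}} = \frac{225}{182.55\,\text{MHz}} \approx 1.233\,\mu\text{s} \\
\text{Area} \times \text{Delay} &\approx 44{,}633 \times 1.233\,\mu\text{s} \approx 55.0\ \text{ms} \cdot \text{area}
\end{aligned}
```

This is why the Area*Delay designs win on that metric despite roughly 10x the cycle count: their area is more than 15x smaller.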

2 Tips
function logic [255:0] sha256_op(input logic [31:0] a, b, c, d, e, f, g, h, w,
                                 input logic [7:0] t);
  logic [31:0] S1, S0, ch, maj, t1, t2; // internal signals
  begin
    S1 = rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25);
    ch = (e & f) ^ ((~e) & g);
    t1 = ch + S1 + h + k[t] + w;
    S0 = rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22);
    maj = (a & b) ^ (a & c) ^ (b & c);
    t2 = maj + S0;
    sha256_op = {t1 + t2, a, b, c, d + t1, e, f, g};
  end
endfunction
Because register h in round t+1 holds the value that g holds in round t, you can precompute just the h + k part of t1 one cycle early:
  p = g + k[t+1]; // g is h[t+1]
  t1 = ch + S1 + w + p;
This way, you do not need to precompute “w” 2 cycles ahead (1 cycle ahead is enough).
You can also precompute as follows:
  p = g + w + k[t+1]; // g is h[t+1], but w needs to be t+2
  t1 = ch + S1 + p;
Here, you have to precompute “w” 2 cycles ahead. You will need to figure out for yourself how to implement this in SystemVerilog.
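The identity behind the first precomputation option can be checked in simulation before wiring it into a pipeline. The sketch below assumes a hypothetical package `sha256_pre_pkg` with two hypothetical functions: `round_ref`, which does the same arithmetic as `sha256_op` but takes the round constant as an explicit `k_t` argument instead of indexing `k[t]`, and `round_pre`, which takes a precomputed `p` in place of `h` and `k_t`. It is not a complete design, only the arithmetic of one round.

```systemverilog
// Sketch only: package and function names are hypothetical, and k_t is
// passed in directly so the example does not need the 64-entry k[] table.
package sha256_pre_pkg;

  // 32-bit rotate right by r (0..31)
  function automatic logic [31:0] rightrotate(input logic [31:0] x,
                                              input logic [4:0]  r);
    return (x >> r) | (x << (32 - r));
  endfunction

  // Reference round: same arithmetic as sha256_op on the slide
  function automatic logic [255:0] round_ref(
      input logic [31:0] a, b, c, d, e, f, g, h, w, k_t);
    logic [31:0] S1, S0, ch, maj, t1, t2;
    S1  = rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25);
    ch  = (e & f) ^ ((~e) & g);
    t1  = ch + S1 + h + k_t + w;      // five operands
    S0  = rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22);
    maj = (a & b) ^ (a & c) ^ (b & c);
    t2  = maj + S0;
    return {t1 + t2, a, b, c, d + t1, e, f, g};
  endfunction

  // Precomputed round: p = h + k[t] was formed one cycle earlier as
  // g_prev + k[t], because this round's h is the previous round's g.
  function automatic logic [255:0] round_pre(
      input logic [31:0] a, b, c, d, e, f, g, w, p);
    logic [31:0] S1, S0, ch, maj, t1, t2;
    S1  = rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25);
    ch  = (e & f) ^ ((~e) & g);
    t1  = ch + S1 + w + p;            // one operand fewer than round_ref
    S0  = rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22);
    maj = (a & b) ^ (a & c) ^ (b & c);
    t2  = maj + S0;
    return {t1 + t2, a, b, c, d + t1, e, f, g};
  endfunction

endpackage
```

In a design, p would be one more register updated next to a, b, ... h, for example `p <= g + k[t+1];` during COMPUTE, so that the nonblocking read of the current g lines up with the next round's h. Whether the shorter adder chain actually raises Fmax depends on how the fitter places it, as the next slide warns.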

3 Tips
If you cannot “fit” 16 “SHA 256 computations” into the FPGA, try fitting 8 “SHA 256 computations”, and so on. With parallel execution, designs with more than 30,000 ALUTs and more than 20,000 registers are possible. More sophisticated pipelining (e.g. precomputing p = g + k + w) can increase Fmax by, say, 20 MHz, but if it makes the logic too complex, Fmax can instead drop by more than 20 MHz, so the net effect could be negative. For example, it is possible to reach Fmax = 150 MHz without sophisticated pipelining. Make sure you avoid referencing w[t] (indexing w with the round counter), which creates MUXes and decoders that increase area and decrease Fmax. Pipelining can be effective; just be careful not to make the logic too complicated. A design may not “fit” in the FPGA if the logic is too complex, even when #ALUTs is below the device maximum of 36,000.
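One common way to avoid indexing w[t] is to keep the last 16 message-schedule words in a shift register and read only fixed positions. The sketch below is a hypothetical module (`wsched` and all its port names are assumptions); it follows the slides' convention that `sha256_op(..., w[15], t)` uses w[15] as the current word. The input `msg_t` is assumed to be supplied by the caller, for example from a separate shift register of message words, and is only used for rounds 0 to 15.

```systemverilog
// Sketch only: module and port names are hypothetical.
// Convention: after the clock edge for round t, w[15] holds W[t].
module wsched (
  input  logic        clk,
  input  logic        en,      // advance one round
  input  logic [7:0]  t,       // round index 0..63
  input  logic [31:0] msg_t,   // message word W[t], used only for t < 16
  output logic [31:0] w_cur    // W[t] after the clock edge for round t
);
  logic [31:0] w [16];
  logic [31:0] s0, s1, w_new;

  function automatic logic [31:0] rotr(input logic [31:0] x, input int r);
    return (x >> r) | (x << (32 - r));
  endfunction

  // W[t] = s1(W[t-2]) + W[t-7] + s0(W[t-15]) + W[t-16].
  // Before the edge, w[15] = W[t-1], ..., w[0] = W[t-16], so only
  // fixed indices are read and no t-controlled MUX is needed.
  always_comb begin
    s0    = rotr(w[1], 7) ^ rotr(w[1], 18) ^ (w[1] >> 3);
    s1    = rotr(w[14], 17) ^ rotr(w[14], 19) ^ (w[14] >> 10);
    w_new = w[0] + s0 + w[9] + s1;
  end

  always_ff @(posedge clk) begin
    if (en) begin
      for (int i = 0; i < 15; i++) w[i] <= w[i+1];  // plain shift
      w[15] <= (t < 16) ? msg_t : w_new;
    end
  end

  assign w_cur = w[15];
endmodule
```

The only selection left is the 2:1 choice between the incoming message word and the newly computed word, instead of a 64-way decode on t.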

4 Tips
For most people, the critical path will be through the “sha256_op” logic that updates registers “a, b, c, … h”. If you already have an efficient “Delay-only” design (meaning Fmax ≈ 150 MHz, #Cycles < 250, and a successful fit), you may try separating the “sha256_op” logic and the “a, b, c, … h” registers into their own “always_ff” statement (again, the sample code below is not necessarily complete):
always_ff @(posedge clk) begin
  case (state)
    PREP: if (phase2) begin
            a <= h0; ... h <= h7;
          end else begin
            a <= 32'h6a09e667; ... h <= 32'h5be0cd19;
          end
    COMPUTE: begin
      {a, b, c, d, e, f, g, h} <= sha256_op(a, b, c, d, e, f, g, h, w[15], t);
    end
  endcase
end
[Figure: block diagram with h0 .. h7 constants, a MUX, registers a, b, ... h, and sha256_op]

5 Tips
If you separate out the “always_ff” statement like this, then only this always_ff statement can assign to “a, b, c, … h”. Note that we don’t need “reset_n” for “a, b, c, …”; they will just be implemented as “edge-triggered flip-flops” without reset. Also note that this “always_ff” statement does not contain the “next state logic” for state: it only reads “state” to select what to load, and never assigns it.

6 Tips
You can also put something like this inside its own “module”.
In this case, however, “state” cannot simply be declared as an enum, so we have to define the state labels as parameters. They must be defined consistently with the state encoding in the main module:
// (module name not given on the slide)
module (input logic clk,
        input logic [3:0] state,
        input logic phase2,   // added: phase2 is used below
        input logic [31:0] h0, h1, h2, h3, h4, h5, h6, h7, w,
        input logic [7:0] t,
        output logic [31:0] a, b, c, d, e, f, g, h);
  parameter IDLE = 4'b0000, PREP = 4'b0001, COMPUTE = 4'b0010, ...
  // sha256_op (and the k array it uses) must also be visible here
  always_ff @(posedge clk) begin
    case (state)
      PREP: if (phase2) begin
              a <= h0; ... h <= h7;
            end else begin
              a <= 32'h6a09e667; ... h <= 32'h5be0cd19;
            end
      COMPUTE: begin
        {a, b, c, d, e, f, g, h} <= sha256_op(a, b, c, d, e, f, g, h, w, t);
      end
    endcase
  end
endmodule

7 Tips
There are many possible implementations, so there is no single “right way”.
A good rule of thumb is to make your code easy to read. If there are so many nested if-then-else statements that the code is hard to read, try to simplify it, as simpler code tends to lead to better implementations. Minimizing the number of states is not necessarily good if it means adding many if-then-else statements that effectively recreate the same next-state logic. Complexity: it should be possible to implement the complete design in ... lines of code.

8 Tips
Debug your design first with a smaller NUM_NONCES, e.g. by changing the NUM_NONCES parameter in both the testbench and your design to NUM_NONCES = 1 or NUM_NONCES = 2.
Testbench (you can change this parameter to try a smaller design):
module tb_bitcoin_hash();
  parameter NUM_NONCES = 16;
  ...
  initial begin
    ...
    $stop;
  end
endmodule
Your design:
module bitcoin_hash(input logic clk, reset_n ...);
  parameter NUM_NONCES = 16;
  ...
  always_ff @(posedge clk, negedge reset_n) begin
    if (!reset_n) begin
      ...
    end else
      case (state)
        ...
      endcase
  end
endmodule
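As an alternative to editing the number in two files, SystemVerilog lets a testbench pass its own NUM_NONCES down when it instantiates the design. The sketch below uses hypothetical stand-ins (`toy_hash` for bitcoin_hash, `tb_small` for tb_bitcoin_hash). It assumes, as on the slide, that the design declares NUM_NONCES with `parameter` in its body and has no `#(...)` parameter port list (with one, body parameters become local and cannot be overridden). The instantiation line in the provided testbench would need the `#(...)` override added.

```systemverilog
// Sketch only: toy_hash and tb_small are hypothetical stand-ins.
module toy_hash (input logic clk, output logic [31:0] num_nonces);
  parameter NUM_NONCES = 16;       // body parameter, as on the slide
  assign num_nonces = NUM_NONCES;  // report the elaborated value
endmodule

module tb_small;
  parameter NUM_NONCES = 2;        // change only this while debugging
  logic        clk = 0;
  logic [31:0] n;

  // Pass the testbench's value down instead of editing the design file
  toy_hash #(.NUM_NONCES(NUM_NONCES)) dut (.clk(clk), .num_nonces(n));

  initial #1 $display("design elaborated with NUM_NONCES = %0d", n);
endmodule
```

This keeps the testbench's expected results and the design's nonce count from drifting apart, which is an easy mistake when two files must be edited by hand.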

9 Final Project Submission
Put the following files into (LastName, FirstName)_(LastName, FirstName)_finalproject.zip:
- finalsummary.xlsx (see link in the Project 5 page or Class Schedule page)
- bitcoin_hash1.sv (min delay) and bitcoin_hash2.sv (min area*delay). Add other .sv files if you split your designs into several .sv files.
- transcript1.txt (min delay) and transcript2.txt (min area*delay)
- message1.txt (min delay) and message2.txt (min area*delay)
- bitcoin_hash1.fit.rpt (min delay) and bitcoin_hash2.fit.rpt (min area*delay)
- bitcoin_hash1.sta.rpt (min delay) and bitcoin_hash2.sta.rpt (min area*delay)

10 finalsummary.xlsx
See the finalsummary.xlsx template provided; the link to this spreadsheet is on the Project 5 page and the Class Schedule page. If you worked alone, just fill out one row. The spreadsheet already contains calculated fields, e.g. Area = #ALUTs + #Registers; please use them. Make sure to use the Arria II GX EP2AGX45DF29I5 device. Make sure to use the Fmax for the Slow 900mV 100C Model. Make sure to use the Total number of cycles.

11 bitcoin_hash1.sv and bitcoin_hash2.sv
Name your “min delay” design “bitcoin_hash1.sv” and your “min area*delay” design “bitcoin_hash2.sv”. If your designs use other .sv files, include them as well and rename them as needed.

12 transcript1.txt and transcript2.txt
Copy of the ModelSim simulation results. You only need the simulation results for tb_bitcoin_hash.sv. After you run the “run -all” command, you can save your transcript by going to the “File” menu and clicking “Save Transcript As”. The transcript file will contain the history of all commands used in the current ModelSim session; you can clear the current transcript by going to the “Transcript” menu in the GUI and clicking “Clear”. Use the Total number of cycles for your cycle count.

13 message1.txt and message2.txt
Copy of the Quartus compilation messages. You can save the messages by right-clicking the message window and choosing “save message”. IMPORTANT: Make sure that there are no warnings about “latches” or “inferred latches”.

14 bitcoin_hash1.fit.rpt and bitcoin_hash2.fit.rpt
Copy of the fitter reports (not the flow report), which contain the area numbers. Make sure to use the Arria II GX EP2AGX45DF29I5 device. IMPORTANT: Make sure Total block memory bits is 0.

15 No Block Memory Bits
Your bitcoin_hash1.fit.rpt and bitcoin_hash2.fit.rpt files must report Total block memory bits = 0 (otherwise the design will not pass). If they do not, go to “Assignments→Settings” in Quartus, go to “Compiler Settings”, click “Advanced Settings (Synthesis)”, and turn OFF “Auto RAM Replacement” and “Auto Shift Register Replacement”.

16 No Inferred Megafunctions/Latches
Check your Quartus compilation messages for the following. No inferred megafunctions: these are most likely caused by block-memory or shift-register replacement; you can turn OFF “Auto RAM Replacement” and “Auto Shift Register Replacement” in “Advanced Settings (Synthesis)”. If you still see “inferred megafunctions”, contact the professor. Your design will not pass if it has inferred megafunctions. No inferred latches: your design will not pass if it has inferred latches.

17 bitcoin_hash1.sta.rpt and bitcoin_hash2.sta.rpt
Copy of the STA (static timing analysis) reports. Make sure to use the Fmax for the Slow 900mV 100C Model. IMPORTANT: Make sure “clk” is the ONLY clock. You must assign mem_clk = clk; your bitcoin_hash1.sta.rpt and bitcoin_hash2.sta.rpt must show “clk” as the only clock.

