These 19 words are given and fixed

Presentation transcript:

Bitcoin Hashing

Bitcoin's header (20 words in total):

- Version (1 word): block version number. Updated when you upgrade the software and it specifies a new version.
- hashPrevBlock (8 words): 256-bit hash of the previous block header. Updated when a new block comes in.
- hashMerkleRoot (8 words): 256-bit hash based on all of the transactions in the block. Updated when a transaction is accepted.
- Time (1 word): current timestamp as seconds since 1970-01-01T00:00 UTC. Updated every few seconds.
- Bits (1 word): current target in compact format. Updated when the difficulty is adjusted.
- Nonce (1 word): 32-bit number (starts at 0). Updated each time a hash is tried (increments).

Main point: the first 19 words (Version through Bits) are given and fixed. Only the nonce changes as different nonces are tried.

Bitcoin Hashing

Change the input message by changing the nonce (32 bits = 1 word), starting with nonce = 0. Keep trying new nonces 1, 2, ... until the final hash < target.

[Figure: the fixed part of the block header (19 words) plus the nonce forms a 640-bit message, which is padded into 2 blocks and hashed with SHA256 to give an 8-word hash H0..H7. That 256-bit hash is then padded into 1 block and hashed with SHA256 again to give the final 8-word hash H0..H7.]
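The two padding steps in the figure can be sketched as word-select functions. This is a minimal sketch, not the course's reference code: the package name, function names, and argument names are all hypothetical. It assumes standard SHA-256 padding (a single 1 bit, zero fill, then the 64-bit message length in bits) and that the nonce is the 20th message word, as in message = {block header, nonce}.

```systemverilog
// Sketch only: package and function names are hypothetical, not from the slides.
package bitcoin_pad_pkg;

  // Second 512-bit block of the first SHA-256: header words 16..18, the nonce,
  // then SHA-256 padding for a 640-bit (20-word) message.
  function automatic logic [31:0] first_hash_block2_word(
      input int          idx,                 // 0..15, word index within the block
      input logic [31:0] hdr16, hdr17, hdr18, // last three fixed header words
      input logic [31:0] nonce);
    case (idx)
      0:       return hdr16;
      1:       return hdr17;
      2:       return hdr18;
      3:       return nonce;
      4:       return 32'h8000_0000;  // the mandatory 1 bit, followed by zeros
      15:      return 32'd640;        // low word of the message length in bits
      default: return 32'h0000_0000;  // zero fill (word 14 is the high length word)
    endcase
  endfunction

  // Only block of the second SHA-256: the 8-word first hash, then padding
  // for a 256-bit message.
  function automatic logic [31:0] second_hash_word(
      input int          idx,                 // 0..15
      input logic [31:0] h[8]);               // H0..H7 of the first hash
    if (idx < 8)        return h[idx];
    else if (idx == 8)  return 32'h8000_0000;
    else if (idx == 15) return 32'd256;
    else                return 32'h0000_0000;
  endfunction

endpackage
```

Only word 3 of the second block depends on the nonce, which is why everything else in the 640-bit message can be treated as fixed.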

Final Project

For the final project, we will simply compute the final hashes for 16 nonces, nonce = 0, 1, 2, ..., 15, without checking whether any is < target.

Key observation: the hash computation for the 1st block of the 1st hash is the same for all nonce values, so it can be computed just once.
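One way to exploit the key observation is to compute the 1st block once and copy its result into every nonce's working state before the nonce-dependent 2nd block starts. The sketch below assumes that approach. The module name, the load strobe, and the array ports are hypothetical and do not appear in the slides.

```systemverilog
// Sketch only: module, port, and signal names are hypothetical.
module midstate_fanout #(parameter int NUM_NONCES = 16) (
    input  logic        clk,
    input  logic        load,                     // pulse when block 1 of hash 1 is done
    input  logic [31:0] midstate[8],              // H0..H7 after block 1 (nonce-independent)
    output logic [31:0] state[NUM_NONCES][8]);    // starting H0..H7 for each nonce

  always_ff @(posedge clk) begin
    if (load) begin
      // Every nonce starts its 2nd block from the same intermediate hash.
      for (int n = 0; n < NUM_NONCES; n++)
        for (int i = 0; i < 8; i++)
          state[n][i] <= midstate[i];
    end
  end

endmodule
```

In a sequential (one nonce at a time) design the same idea applies: keep the block-1 result in a register and reload it at the start of each nonce instead of recomputing it.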

Final Project

Compute the final hash SHA256(SHA256(message)) for 16 nonces = 0, 1, ..., 15, where each message = {block header, nonce}. This produces 16 final hashes:

H0[0], H1[0], H2[0], H3[0], H4[0], H5[0], H6[0], H7[0]
H0[1], H1[1], H2[1], H3[1], H4[1], H5[1], H6[1], H7[1]
...
H0[15], H1[15], H2[15], H3[15], H4[15], H5[15], H6[15], H7[15]

We will write only H0[0], H0[1], ..., H0[15] to memory, a total of 16 words.

Final Project Module Interface

- Wait in the idle state for start.
- Read the 19-word block header starting at message_addr.
- Compute the final hash SHA256(SHA256(message)) for 16 nonces, where each message = {block header, nonce}.
- Write only the final H0 for each of the 16 nonces into memory, starting at output_addr.
- Set done to 1 when finished.

[Figure: the bitcoin_hash module with inputs clk, reset_n, start, message_addr[31:0] and output_addr[31:0], and output done. It connects to a memory (provided by the testbench) through the memory interface: mem_clk, mem_addr[15:0], mem_we, mem_write_data[31:0], mem_read_data[31:0].]

Final Project Module Interface

Write the final hash values H0[0], H0[1], ..., H0[15] as 16 words to memory, starting at output_addr, as follows:

output_addr: H0[0]
output_addr + 1: H0[1]
...
output_addr + 15: H0[15]
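The address layout above can be sketched as a small write sequencer that emits one word per cycle. This is an illustrative sketch under assumptions, not the required design: the module name, the go strobe, the h0 array, and the internal idx and busy registers are hypothetical, and it assumes the testbench memory accepts a write on every clock edge where mem_we is high.

```systemverilog
// Sketch only: module name, go, h0, idx, and busy are hypothetical.
module h0_writer #(parameter int NUM_NONCES = 16) (
    input  logic        clk, reset_n,
    input  logic        go,                   // one-cycle pulse: all hashes are ready
    input  logic [31:0] output_addr,
    input  logic [31:0] h0[NUM_NONCES],       // final H0 for each nonce
    output logic        mem_we,
    output logic [15:0] mem_addr,
    output logic [31:0] mem_write_data,
    output logic        done);

  logic [$clog2(NUM_NONCES):0] idx;           // counts 0..NUM_NONCES
  logic                        busy;

  always_ff @(posedge clk, negedge reset_n) begin
    if (!reset_n) begin
      busy <= 1'b0; done <= 1'b0; mem_we <= 1'b0; idx <= '0;
      mem_addr <= '0; mem_write_data <= '0;
    end else if (go && !busy) begin
      busy <= 1'b1; done <= 1'b0; idx <= '0;
    end else if (busy) begin
      if (idx < NUM_NONCES) begin
        mem_we         <= 1'b1;
        mem_addr       <= output_addr[15:0] + idx;  // output_addr + n holds H0[n]
        mem_write_data <= h0[idx];
        idx            <= idx + 1'b1;
      end else begin
        mem_we <= 1'b0; busy <= 1'b0; done <= 1'b1; // last word was output_addr + 15
      end
    end
  end

endmodule
```

With NUM_NONCES = 16 the write phase takes 16 write cycles, which matches the 16-cycle "write out H0" step in the cycle estimates.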

Final Project Module Interface

Your assignment is to design the bitcoin_hash module (the yellow box in the figure):

    module bitcoin_hash (input  logic        clk, reset_n, start,
                         input  logic [31:0] message_addr, output_addr,
                         output logic        done, mem_clk, mem_we,
                         output logic [15:0] mem_addr,
                         output logic [31:0] mem_write_data,
                         input  logic [31:0] mem_read_data);
      ...
    endmodule

Rough Estimation of Cycles

Basic implementation: at least 2147 cycles

- 19 cycles: read 19 words.
- 64 cycles: process 1st block in 1st SHA256 hash (same for all 16 nonces).
- 16*64 = 1024 cycles: for each nonce, process 2nd block of 1st SHA256 hash.
- 16*64 = 1024 cycles: for each nonce, compute 2nd SHA256 hash.
- 16 cycles: for each nonce, write out H0.

Total: 19 + 64 + 1024 + 1024 + 16 = 2147.

Rough Estimation of Cycles

Hide reading: at least 2128 cycles

- 64 cycles: process 1st block in 1st SHA256 hash, with the 19 words read "on-the-fly" (same for all 16 nonces).
- 16*64 = 1024 cycles: for each nonce, process 2nd block of 1st SHA256 hash.
- 16*64 = 1024 cycles: for each nonce, compute 2nd SHA256 hash.
- 16 cycles: for each nonce, write out H0.

Total: 64 + 1024 + 1024 + 16 = 2128.

Rough Estimation of Cycles

Parallel execution: at least 208 cycles

- 64 cycles: process 1st block in 1st SHA256 hash, with the 19 words read "on-the-fly" (same for all 16 nonces).
- 64 cycles: for all 16 nonces, compute in parallel the 2nd block of 1st SHA256 hash (requires more hardware).
- 64 cycles: for all 16 nonces, compute in parallel the 2nd SHA256 hash.
- 16 cycles: for each nonce, write out H0.

Total: 64 + 64 + 64 + 16 = 208.

Rough Estimation of Cycles

Parallel execution and unfolding: at least 112 cycles

- 32 cycles: process 1st block in 1st SHA256 hash by doing 2 rounds per cycle, with the 19 words read "on-the-fly" (same for all 16 nonces).
- 32 cycles: for all 16 nonces, compute in parallel the 2nd block of 1st SHA256 hash by doing 2 rounds per cycle (requires more hardware).
- 32 cycles: for all 16 nonces, compute in parallel the 2nd SHA256 hash by doing 2 rounds per cycle.
- 16 cycles: for each nonce, write out H0.

Total: 32 + 32 + 32 + 16 = 112.
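Doing 2 rounds per cycle amounts to chaining two copies of the SHA-256 round function combinationally, so that 64 rounds take 32 clock cycles. The sketch below shows one possible shape for this. The package, type, and function names are hypothetical, and the round constants k0, k1 and message-schedule words w0, w1 are assumed to be supplied by logic not shown here.

```systemverilog
// Sketch only: package, type, and function names are hypothetical.
package sha256_round_pkg;

  typedef struct packed {
    logic [31:0] a, b, c, d, e, f, g, h;
  } sha_state_t;

  // 32-bit rotate right, for 0 < r < 32.
  function automatic logic [31:0] rotr(input logic [31:0] x, input int unsigned r);
    return (x >> r) | (x << (32 - r));
  endfunction

  // One SHA-256 compression round: k is the round constant, w the schedule word.
  function automatic sha_state_t sha256_round(
      input sha_state_t s, input logic [31:0] k, w);
    logic [31:0] s1, s0, ch, maj, t1, t2;
    s1  = rotr(s.e, 6) ^ rotr(s.e, 11) ^ rotr(s.e, 25);
    ch  = (s.e & s.f) ^ (~s.e & s.g);
    t1  = s.h + s1 + ch + k + w;
    s0  = rotr(s.a, 2) ^ rotr(s.a, 13) ^ rotr(s.a, 22);
    maj = (s.a & s.b) ^ (s.a & s.c) ^ (s.b & s.c);
    t2  = s0 + maj;
    return '{a: t1 + t2, b: s.a, c: s.b, d: s.c,
             e: s.d + t1, f: s.e, g: s.f, h: s.g};
  endfunction

  // Unfolding by two: rounds t and t+1 are chained combinationally, so a single
  // registered update advances the state by two rounds.
  function automatic sha_state_t sha256_two_rounds(
      input sha_state_t s, input logic [31:0] k0, w0, k1, w1);
    return sha256_round(sha256_round(s, k0, w0), k1, w1);
  endfunction

endpackage
```

The trade-off is a longer combinational path per cycle, which can lower the achievable clock frequency, and the message schedule must also deliver two words per cycle. Whether unfolding is a net win depends on the resulting Fmax.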

Implementing Parallelism

Module instantiation can be used to create multiple instances of the SHA256 unit.

[Figure: a row of SHA256 units, one per nonce.]

    parameter NUM_NONCES = 16;

    // INSTANTIATE SHA256 MODULES
    genvar q;
    generate
      for (q = 0; q < NUM_NONCES; q++) begin : generate_sha256_blocks
        sha256_block block (
          .clk(clk),
          .reset_n(reset_n),
          .start(blk_start[q]),
          ...
          .done(blk_done[q]));
      end
    endgenerate

    always_ff @(posedge clk, negedge reset_n) begin

Implementing Parallelism

Parallelism can also be implemented as "vectorization" (effectively SIMD execution, like a GPU). The code below creates 16 sets of A, B, ..., H registers and 16 sets of logic for sha256_op, all under the same state machine control.

    parameter NUM_NONCES = 16;

    logic [31:0] A[NUM_NONCES], B[NUM_NONCES], ..., H[NUM_NONCES];

    always_ff @(posedge clk, negedge reset_n) begin
      if (!reset_n) begin
        ...
      end else case (state)
        IDLE:
          ...
        COMPUTE: begin
          // Same operation applied to every nonce's registers in the same cycle
          for (int n = 0; n < NUM_NONCES; n++) begin
            {A[n], B[n], ..., H[n]} <= sha256_op(A[n], B[n], ..., H[n], ..., t);
          end
        end
      endcase
    end

Optimization in Quartus

In practice, the Quartus optimization modes don't always do what you want, so wait until the end to try out different optimization modes.