Digital Design using FPGAs and Verilog HDL Project IT – Autumn 2016 Mahdad Davari <mahdad.davari@it.uu.se>
Programmable Devices
Since 1969: PROM, (E)EPROM, PAL, PLA, GAL, CPLD, FPGA
Key players in the programmable-device industry: Altera (first CPLD), Xilinx (first FPGA)
FPGA from a Bird’s-Eye View
FPGA in a Nutshell
Logic Slice
Look-Up Table (LUT) [diagram: SRAM cells addressed by the inputs a, b, c]
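A LUT can be thought of as a tiny memory whose address lines are the logic inputs. The following is a minimal sketch of that idea, assuming a 3-input LUT with inputs a, b, c as in the diagram. The module name lut3, the parameter INIT and its default contents (a majority function) are hypothetical and do not correspond to any vendor primitive.

```verilog
// Hypothetical model of a 3-input LUT (not a vendor primitive).
// The 8 configuration bits play the role of the SRAM cells.
module lut3 #(parameter [7:0] INIT = 8'b1110_1000) // assumed contents: majority(a, b, c)
             (input a, b, c, output f);
  wire [7:0] cells = INIT;      // configuration bits, fixed at "programming" time
  assign f = cells[{a, b, c}];  // the inputs form the address of the selected cell
endmodule
```

Changing INIT changes the implemented function without changing the structure, which is roughly what loading a bit stream does for each LUT in the device.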
FPGA Bird’s-Eye View
Roadmap: Programmable Devices; FPGA Design Flow; FPGA vs GP-CPU vs ASIC; Accelerator Design Example; Verilog HDL Example
FPGA Design Flow
1. Design Entry (RTL design using HDL)
2. Behavioral Simulation (ModelSim). Behaviour OK? If NO, revise the design and simulate again.
3. Synthesis (Quartus II)
4. Place and Route (PAR) (Quartus II)
5. Timing Analysis (Quartus II). Speed OK? If NO, revise the design and repeat the flow.
6. Generate Bit Stream and Programme the Device (Quartus II)
CPU vs FPGA vs ASIC [figure: comparison of the three on a scale from High to Low]
One Monday Morning … FFT algorithm on CPU
Butterfly Operation
4-Point Butterfly Operation [diagram: inputs X0, X2, X1, X3 pass through two stages of 2-Point BF blocks to produce outputs Y0, Y1, Y2, Y3]
8-Point Butterfly Operation
16-Point Butterfly Operation
32-Point Butterfly Operation
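Speedup: $\text{Speedup} \approx \dfrac{(8\,T_{Mem} + 24\,T_{ALU})_{CPU}}{(1\,T_{Mem} + 3\,T_{ALU})_{Accel}} \approx 8\times$, assuming $T_{Mem}^{CPU} \approx T_{Mem}^{Accel}$ and $T_{ALU}^{CPU} \approx T_{ALU}^{Accel}$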
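The 8x figure follows from factoring the numerator once the memory and ALU latencies are taken to be the same on both platforms. One plausible reading of the coefficients, which the slide does not state explicitly, is that the CPU performs 8 memory accesses and 24 ALU operations one after another, while the accelerator needs 1 wide access and 3 pipeline stages.

```latex
\[
\text{Speedup} \approx
\frac{8\,T_{Mem} + 24\,T_{ALU}}{1\,T_{Mem} + 3\,T_{ALU}}
= \frac{8\,\bigl(T_{Mem} + 3\,T_{ALU}\bigr)}{T_{Mem} + 3\,T_{ALU}}
= 8
\]
```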
Top-Down Design [diagram: 8-Point FFT as a single top-level block with outputs O0 to O7]
Top-Down Design [diagram: Top: 8-Point FFT decomposed into FFT: Stage 1, FFT: Stage 2 and FFT: Stage 3, each built from four 2-Point BF blocks with ports i0, i1, o0, o1]
Top-Down Design [diagram: pipeline registers Pipe1, Pipe2, Pipe3 added after Stage 1, Stage 2 and Stage 3]
Top-Down Design [diagram: inputs applied in the order I0, I4, I2, I6, I1, I5, I3, I7; outputs O0 to O7]
Top-Down Design [diagram: control signals Valid, Ready, Reset and Clock added to the top level]
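Bottom-Up Implementation: 2-Point BF (inputs X0, X1; outputs Y0, Y1)
module butterfly (x0, x1, y0, y1);
  input x0, x1;
  output y0, y1;
  // Behavioural description: sum and difference of the two inputs
  assign y0 = x0 + x1;
  assign y1 = x0 - x1;
  // Structural alternative to the two assigns, assuming Adder and Subtractor modules exist:
  //   Adder Add1 (y0, x0, x1);
  //   Subtractor Sub1 (y1, x0, x1);
endmodule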
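The butterfly on the slide uses 1-bit ports to keep the example short. A minimal sketch of a multi-bit variant is shown here, assuming signed two's-complement samples. The module name butterfly_w and the parameter WIDTH are hypothetical, and like the slide's version it contains no twiddle-factor multiplication.

```verilog
// Hypothetical multi-bit butterfly; WIDTH is an assumed parameter.
// The outputs are one bit wider than the inputs so that the sum and
// the difference cannot overflow.
module butterfly_w #(parameter WIDTH = 8)
  (input  signed [WIDTH-1:0] x0, x1,
   output signed [WIDTH:0]   y0, y1);
  assign y0 = x0 + x1;  // operands are sign-extended to WIDTH+1 bits
  assign y1 = x0 - x1;
endmodule
```

Growing the word length by one bit per stage is one common way to avoid overflow; scaling the result after each stage is another.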
Bottom-Up Implementation: one FFT stage built from four 2-Point BF blocks
module fft (i0, i1, i2, i3, i4, i5, i6, i7, o0, o1, o2, o3, o4, o5, o6, o7);
  input i0, i1, i2, i3, i4, i5, i6, i7;
  output o0, o1, o2, o3, o4, o5, o6, o7;
  // Positional port connections: the order must match the module's port list
  butterfly bf1 (i0, i1, o0, o1);
  butterfly bf2 (i2, i3, o2, o3);
  butterfly bf3 (i4, i5, o4, o5);
  // Named port connections: the order does not matter
  butterfly bf4 (.y0 (o6), .y1 (o7), .x0 (i6), .x1 (i7));
endmodule
Top-Down Design [diagram repeated: Top: 8-Point FFT with FFT: Stage 1 to Stage 3, pipeline registers Pipe1 to Pipe3, inputs I0, I4, I2, I6, I1, I5, I3, I7, outputs O0 to O7, and control signals Valid, Ready, Reset, Clk]
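Bottom-Up Implementation
// Top-level interface written with scalar ports
module top (i0, i1, i2, i3, i4, i5, i6, i7, valid, rst, clk, o0, o1, o2, o3, o4, o5, o6, o7, ready);
  input i0, i1, i2, i3, i4, i5, i6, i7, valid, rst, clk;
  output o0, o1, o2, o3, o4, o5, o6, o7, ready;
endmodule
// The same interface written more compactly with vector ports
// (o and ready are reg because they are assigned in an always block later)
module top (input [7:0] i, input valid, rst, clk, output reg [7:0] o, output reg ready);
  // Pipeline registers: bit 8 carries the valid flag, bits 7:0 the data
  reg [8:0] pipe1;
  reg [8:0] pipe2;
  reg [8:0] pipe3;
  wire [7:0] w1;
  wire [7:0] w2;
  wire [7:0] w3;
  // fft has one scalar port per output, so each output bit is connected individually
  fft stage1 (i[0], i[4], i[2], i[6], i[1], i[5], i[3], i[7], w1[0], w1[1], w1[2], w1[3], w1[4], w1[5], w1[6], w1[7]);
  fft stage2 (pipe1[0], pipe1[2], pipe1[1], pipe1[3], pipe1[4], pipe1[6], pipe1[5], pipe1[7], w2[0], w2[1], w2[2], w2[3], w2[4], w2[5], w2[6], w2[7]);
  fft stage3 (pipe2[0], pipe2[4], pipe2[2], pipe2[6], pipe2[1], pipe2[5], pipe2[3], pipe2[7], w3[0], w3[1], w3[2], w3[3], w3[4], w3[5], w3[6], w3[7]);
  // continued in the next slide …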
Bottom-Up Implementation
// continued from the previous slide …
  always @ (posedge clk) begin
    if (rst) begin
      // Three equivalent ways of writing a zero constant
      pipe1 <= 9'b000000000;
      pipe2 <= 9'd0;
      pipe3 <= 0;
    end else begin
      // The valid flag travels with the data through the pipeline
      pipe1 <= {valid, w1};
      pipe2 <= {pipe1[8], w2};
      pipe3 <= {pipe2[8], w3};
    end
  end
  // continued in the next slide …
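Bottom-Up Implementation
// continued from the previous slide …
  // Drive the outputs from the last pipeline register.
  // A sensitivity list may be written as (a or b) or (a, b); always @ (*) covers every signal read in the block.
  // o and ready must be declared as reg for this form.
  always @ (pipe3)
  begin
    {ready, o} = pipe3;
  end
  // Alternative when o and ready are plain wire outputs: assign {ready, o} = pipe3;
endmodule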
Testbench [diagram: an Input Generator drives Top (the Design Under Test); the Output is compared (==) with the Expected Result for the same Input; if they match: Test OK!]
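Testbench
module fft_tb;
  reg clk, rst, valid;
  reg [7:0] i;    // widths match the 8-bit ports of top
  wire [7:0] o;
  wire ready;
  top dut (.i(i), .valid(valid), .rst(rst), .clk(clk), .o(o), .ready(ready));
  // Free-running clock with a period of 10 time units
  always #5 clk = !clk;
  initial begin
    // top clears its pipeline while rst is high, so reset is asserted first
    rst = 1; clk = 0; valid = 0;
    rst = #20 1'b0;    // release reset
    i = #20 8'hff;     // apply the input
    valid = 1'b1;
    valid = #10 1'b0;  // valid is high for one clock cycle
    #50 $finish;
  end
endmodule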
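The testbench above only applies stimulus. A self-checking variant compares the output against an expected result. The following is a minimal sketch of that pattern applied to the 1-bit butterfly, which is repeated here only so that the example stands alone. The testbench name butterfly_tb and the variables k and errors are hypothetical. For 1-bit operands both the sum and the difference reduce to an XOR, which serves as the expected result.

```verilog
// 1-bit butterfly as on the earlier slide (repeated to keep the example self-contained)
module butterfly (input x0, x1, output y0, y1);
  assign y0 = x0 + x1;
  assign y1 = x0 - x1;
endmodule

// Hypothetical self-checking testbench: input generator, expected result, comparison
module butterfly_tb;
  reg x0, x1;
  wire y0, y1;
  integer k, errors;

  butterfly dut (.x0(x0), .x1(x1), .y0(y0), .y1(y1));

  initial begin
    errors = 0;
    // Input generator: apply all four input combinations
    for (k = 0; k < 4; k = k + 1) begin
      {x0, x1} = k[1:0];
      #1;  // let the combinatorial outputs settle
      // Expected result: a 1-bit sum and a 1-bit difference both equal x0 XOR x1
      if (y0 !== (x0 ^ x1) || y1 !== (x0 ^ x1)) begin
        errors = errors + 1;
        $display("Mismatch: x0=%b x1=%b y0=%b y1=%b", x0, x1, y0, y1);
      end
    end
    if (errors == 0) $display("Test OK!");
    $finish;
  end
endmodule
```

The same structure should carry over to the pipelined top module, with the comparison delayed until ready is asserted.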
Net Types in Verilog
Wire: used only as a connector, or on the left-hand side of "assign", e.g. "assign w = a & b"
Reg: implements combinatorial or sequential logic; assigned inside "always" blocks
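Combinatorial vs. Sequential
// combinatorial, using a wire
wire myWire;
assign myWire = a | b;
// combinatorial, using a reg
reg myReg;
always @ (a or b) // also @ (a, b)
  myReg = a | b;
// N.B. every branch must assign the reg; the else branch here is assumed to clear it, matching a | b
always @ (a or b) begin
  if (a == 1 || b == 1) myReg = 1;
  else myReg = 0;
end
// sequential
reg myReg;
always @ (posedge Clk)
  myReg <= a | b;
N.B. a net should be assigned in ONLY a single block. Use "=" for combinatorial logic and "<=" for sequential logic.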
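The reason for using "<=" in clocked blocks is easiest to see with two registers that exchange their values. The following is a small simulation-only sketch; the module and signal names are hypothetical. With non-blocking assignments both right-hand sides are sampled before either register is updated, whereas with blocking assignments the first update is visible to the second statement.

```verilog
// Hypothetical demonstration of blocking vs. non-blocking assignments
module swap_demo;
  reg clk;
  reg a_nb, b_nb;  // updated with non-blocking assignments
  reg a_b,  b_b;   // updated with blocking assignments

  // Non-blocking: both right-hand sides are read first, so the values swap
  always @ (posedge clk) begin
    a_nb <= b_nb;
    b_nb <= a_nb;
  end

  // Blocking: a_b is overwritten first, so b_b receives the new a_b
  always @ (posedge clk) begin
    a_b = b_b;
    b_b = a_b;
  end

  initial begin
    clk = 0;
    a_nb = 0; b_nb = 1;
    a_b  = 0; b_b  = 1;
    #5 clk = 1;  // a single rising clock edge
    #1 $display("non-blocking: a=%b b=%b  blocking: a=%b b=%b", a_nb, b_nb, a_b, b_b);
    #1 $finish;
  end
endmodule
```

After the clock edge the non-blocking pair holds a=1, b=0 (swapped), while the blocking pair holds a=1, b=1 (the original value of a is lost).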
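Two-Dimensional Input Ports
// Array ports are not supported in Verilog-2001 (they are allowed in SystemVerilog):
module myModule (input [7:0] i [0:3], output [7:0] o [0:3]);
// Workaround: flatten the array into one wide vector and unpack it inside the module
module myModule (input [31:0] i, output [31:0] o);
  wire [7:0] myArray [0:3];
  // Unpack: myArray[0] receives the least-significant byte of i
  assign {myArray [3], myArray [2], myArray [1], myArray [0]} = i;
  // Pack: myArray[0] is placed in the most-significant byte of o
  assign o = {myArray [0], myArray [1], myArray [2], myArray [3]};
endmodule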
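Writing out the concatenations by hand becomes tedious for larger arrays. One possible alternative is a generate loop with indexed part-selects. This is a sketch only: the module name flat_array and the parameters N and W are hypothetical. It keeps the element order of the slide, so element 0 comes from the least-significant end of i and goes to the most-significant end of o.

```verilog
// Hypothetical parameterised version of the flatten/unpack workaround
module flat_array #(parameter N = 4,   // number of elements (assumed)
                    parameter W = 8)   // bits per element (assumed)
  (input  [N*W-1:0] i,
   output [N*W-1:0] o);
  wire [W-1:0] myArray [0:N-1];
  genvar g;
  generate
    for (g = 0; g < N; g = g + 1) begin : unpack
      // Element g is taken from bits [g*W + W-1 : g*W] of the flat input
      assign myArray[g] = i[g*W +: W];
      // Elements are written back in reverse order, as on the slide
      assign o[(N-1-g)*W +: W] = myArray[g];
    end
  endgenerate
endmodule
```

With the default parameters, an input of 32'hAABBCCDD produces an output of 32'hDDCCBBAA.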
Useful References
https://www.doulos.com/knowhow/verilog_designers_guide/ (good starting point into Verilog)
https://inst.eecs.berkeley.edu/~cs150/Documents/Nets.pdf (net types in Verilog, wire vs. reg)
http://www.asic-world.com/tidbits/blocking.html (blocking vs. non-blocking assignments, see the example)
http://web.mit.edu/6.111/www/f2007/handouts/L06.pdf (another reference for blocking vs. non-blocking assignments and finite-state-machine design; slides 1 to 7 and slides 11 to 15)
http://www.asic-world.com/verilog/art_testbench_writing1.html (writing testbenches in Verilog)
http://www.rfwireless-world.com/source-code/ (useful source code examples; jump to Verilog part)
http://www.fpl2016.org/slides/Gupta%20--%20Accelerating%20Datacenter%20Workloads.pdf (HARP-related material)
http://web.cs.ucla.edu/~haoyc/pdf/dac16.pdf (HARP-related paper)
https://pdfs.semanticscholar.org/8b8f/8cb7885bc751fa919d216d96caf4a0234717.pdf (HARP-related paper)
Thank you!