6.375 Tutorial 3 Scheduling, Sce-Mi & FPGA Tools Ming Liu

Presentation transcript:

6.375 Tutorial 3 Scheduling, Sce-Mi & FPGA Tools Ming Liu Feb 26, 2016 http://csg.csail.mit.edu/6.375

Overview
- Scheduling Example
- Synthesis Boundaries
- Sce-Mi
- FPGA Architecture/Tools
- Timing Analysis

Scheduling Recap
1. Intra-rule: What makes a legal rule?
   - No double write
   - No combinational cycles
2. Inter-rule: When can rules fire in the same cycle?
   - Parallel execution of rules appears as if rules executed in sequential one-rule-at-a-time order
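The inter-rule condition can be checked by hand on small cases. The sketch below is a hypothetical C++ model, not the demo code from the tutorial: the `State` struct and the three example rules are invented for illustration. It fires two rules "in parallel" (both read the start-of-cycle state) and compares the result against both sequential orders, for one starting state only.

```cpp
#include <functional>

// Hypothetical two-register state; not taken from the tutorial's demo code.
struct State {
    int x;
    int y;
    bool operator==(const State& o) const { return x == o.x && y == o.y; }
};

// A rule reads the state it is given ("old") and writes its updates into "next".
using Rule = std::function<void(const State& old, State& next)>;

// Hardware-style parallel firing: both rules read the same start-of-cycle state.
State fireParallel(const State& s, const Rule& r1, const Rule& r2) {
    State next = s;
    r1(s, next);
    r2(s, next);
    return next;
}

// Reference semantics: "first" fires, then "second" sees its updates.
State fireSequential(const State& s, const Rule& first, const Rule& second) {
    State mid = s;
    first(s, mid);
    State next = mid;
    second(mid, next);
    return next;
}

// True if, starting from s, parallel firing matches at least one sequential order.
bool parallelIsLegal(const State& s, const Rule& r1, const Rule& r2) {
    State p = fireParallel(s, r1, r2);
    return p == fireSequential(s, r1, r2) || p == fireSequential(s, r2, r1);
}

// Example rules (illustrative only).
Rule incX()     { return [](const State& o, State& n) { n.x = o.x + 1; }; }
Rule copyXtoY() { return [](const State& o, State& n) { n.y = o.x; }; }
Rule copyYtoX() { return [](const State& o, State& n) { n.x = o.y; }; }
```

Starting from `State{1, 5}`, `incX` and `copyXtoY` can fire together because the result matches the order "copyXtoY, then incX". The pair `copyXtoY` and `copyYtoX` swaps the registers when fired in parallel, which matches neither sequential order, so a scheduler would not fire both in one cycle. The model does not handle two rules writing the same register, which the intra-rule and conflict checks would rule out separately.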

Packet Switching (Demo)
[Figure: a switch connecting UCLA, USC, MIT and Harvard]
- Show code on csgml
- Which rules can fire together?
- Compile and show schedule

Synthesis Boundary
- By default BSV compiles your design into a single flat Verilog module
  - All methods/rules in-lined
- Use synthesis pragma to generate separate modules (.v files)

    (* synthesize *)
    module mkMultiplier (Multiplier);

Synthesis Boundary
Benefits:
- Faster compile times: modules are reused
- Modularization at the circuit level
  - Better for synthesis and reporting
Drawbacks:
- Polymorphic modules not supported
- More conservative guard lifting

Synthesis Boundary: Handling Polymorphism
- Verilog does not support polymorphic interfaces
- Wrap polymorphic modules with specific instantiations

    FFT#(N, ComplexData) fft <- mkFFT();

    (* synthesize *)
    module mkSynthesizableFFT(FFT#(N, ComplexData));
        FFT#(N, ComplexData) fft <- mkFFT();
        return fft;
    endmodule

Synthesis Boundary: Guard Logic (Example)
- Synthesis boundaries simplify guard logic

    method Action doAction( Bool x );
        if( x ) begin
            aQ.enq(a);
        end
        else begin
            bQ.enq(b);
        end
    endmethod

- Lifted guard without synthesis boundary: (x && aQ.notFull) || (!x && bQ.notFull)
- Lifted guard with synthesis boundary: aQ.notFull && bQ.notFull
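To see what "more conservative" costs, the two lifted guards can be compared over every input combination. This is a small C++ sketch of the Boolean expressions above; the function names are made up for illustration, and the comment about method arguments reflects the usual explanation that a method's ready signal cannot depend on its arguments.

```cpp
// Guard available when doAction is inlined: it may depend on the argument x.
bool guardWithoutBoundary(bool x, bool aNotFull, bool bNotFull) {
    return (x && aNotFull) || (!x && bNotFull);
}

// Guard across a synthesis boundary: x is a method argument,
// so the guard has to hold for either value of x.
bool guardWithBoundary(bool aNotFull, bool bNotFull) {
    return aNotFull && bNotFull;
}

// Counts input combinations where the method could safely run
// but the boundary guard blocks it.
int countLostOpportunities() {
    int lost = 0;
    for (int bits = 0; bits < 8; ++bits) {
        bool x = (bits & 1) != 0;
        bool a = (bits & 2) != 0;
        bool b = (bits & 4) != 0;
        if (guardWithoutBoundary(x, a, b) && !guardWithBoundary(a, b)) {
            ++lost;
        }
    }
    return lost;
}
```

Of the eight combinations, two are lost: enqueueing into aQ while only bQ is full, and enqueueing into bQ while only aQ is full. The boundary guard never allows a case the precise guard forbids, so it is safe, just pessimistic.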

Sce-Mi

Sce-Mi
- Standard Co-Emulation Modeling Interface
- A software/hardware communication framework
[Figure: Host Processor (User SW, SceMi Library, PCIe Driver) linked over PCIe to the FPGA (PCIe Endpoint, SceMi Xactors, SceMiLayer, DUT (AP))]

Sce-Mi Abstraction
- Ports and proxies form a FIFO abstraction over a link
- TCP (simulation) or PCIe (HW)
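The FIFO abstraction can be pictured with an in-process stand-in for one direction of the link. The class below is a hypothetical sketch and is not part of the SceMi library: `FifoLink`, `send`, and `tryReceive` are invented names, and a real link would carry messages over TCP or PCIe rather than a local queue.

```cpp
#include <cstddef>
#include <queue>

// Hypothetical in-process model of one direction of the link.
template <typename T>
class FifoLink {
public:
    // Sending side: enqueue a message for the other end.
    void send(const T& msg) { q_.push(msg); }

    // Receiving side: pop the oldest message into out.
    // Returns false (and leaves out untouched) if nothing is pending.
    bool tryReceive(T& out) {
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }

    // Number of messages sent but not yet received.
    std::size_t pending() const { return q_.size(); }

private:
    std::queue<T> q_;
};
```

The property that matters to both sides is that messages arrive in the order they were sent, regardless of whether the transport underneath is a socket or PCIe.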

Sce-Mi Example: Calculator
[Figure: Testbench connected over the Link to a ServerXactor in front of the Calculator (DUT)]

Sce-Mi: Hardware
Calculator DUT

    typedef Bit#(32) Value;
    typedef enum { ADD, SUB, MUL } Operation deriving (Bits, Eq);
    typedef union tagged {
        void Clear;
        struct { Operation op; Value val; } Operate;
    } Command deriving (Bits, Eq);

    typedef Server#(Command, Value) Calculator;

    module mkCalculator (Calculator);
        …
    endmodule

Server interface:
- Put commands
- Get results

Sce-Mi: Hardware
SceMiLayer

    // Wrap DUT in synthesis boundary
    (* synthesize *)
    module [Module] mkDutWrapper (Calculator);
        Calculator calc <- mkCalculator();
        return calc;
    endmodule

    module [SceMiModule] mkSceMiLayer();
        SceMiClockConfiguration conf = defaultValue;
        SceMiClockPortIfc clk_port <- mkSceMiClockPort(conf);
        // Instantiate the DUT
        Calculator dut <- buildDut(mkDutWrapper, clk_port);
        // Create a Server transactor on the Calculator dut interface
        Empty calcxactor <- mkServerXactor(dut, clk_port);
        Empty shutdown <- mkShutdownXactor();
    endmodule

Note: DutWrapper in Lab 4 is slightly more complex (discussed later)

Sce-Mi: Software
- Auto-generated classes based on your BSV types

    #include "bsv_scemi.h"
    #include "SceMiHeaders.h"

    int main(int argc, char* argv[]){
        //Initialize scemi
        …
        // Initialize the SceMi inport
        InportProxyT<Command> inport ("", "scemi_calcxactor_req_inport", sceMi);
        // Initialize the SceMi outport
        OutportQueueT<Value> outport ("", "scemi_calcxactor_resp_outport", sceMi);
        …
    }

- SceMi auto-generates the Command class
- The template argument (Command, Value) is the type of the port
- Name of port is found in the "scemi.params" file

Sce-Mi: Software
Sending/Receiving

    // Construct the cmd
    Command cmd;
    cmd.the_tag = Command::tag_Operate;
    cmd.m_Operate.m_val = 100;
    cmd.m_Operate.m_op = Operation(Operation::e_ADD);
    // Send the cmd to hardware (blocking)
    inport.sendMessage(cmd);
    // Get results!
    std::cout << outport.getMessage() << std::endl;

Sce-Mi: Transactors
- Many transactors are available; see the documentation

Sce-Mi: Build Process
- SceMi has its own build tool: build
  - Lab 4: project.bld
- Always simulate first:
  1. Wrap DUT in SceMiLayer in hardware
  2. Run build: "build -v"
     - This creates scemi.params and C++ headers
  3. Write/update your software C++ code
  4. Re-run "build -v"

Sce-Mi: Simulation
- SceMi simulation is done over TCP sockets
- "build -v" will generate two binaries
  - bsim_dut: simulated hardware
  - tb: the host software
- Run both binaries. They will communicate with each other.

Sce-Mi: Building Hardware
- Run build: "vivado_setup build -v"
  - Runs synthesis tool (for a long time) to create FPGA bitfile
  - Compiles software tb
- Log into a server with FPGA attached
  - programfpga <path_to_bitfile>
- Run software
  - runtb ./tb 2.0
- Demo: show calculator SceMi simulation demo

Clocks and Resets in BSV
- All BSV modules have implicit clock and reset interfaces
  - Compiled Verilog files will have these ports
- Modules can also have additional clocks/resets as parameters

    //One default clk/rst
    module mkDut(DutIfc);

    //Two clks, one default rst
    module mkDut2Clk#(Clock clk_usr)(DutIfc);

- Can specify clocks/reset explicitly

    DutIfc dut <- mkDut(clocked_by clk, reset_by rst);
    DutIfc dut <- mkDut2Clk(clk_usr, clocked_by clk, reset_by rst);

SceMi Clocks
- SceMi transactors run at 50MHz
- But our DUT runs faster in a different clock domain
- Thus need clock domain crossing primitives at the DUT interface
  - Synchronization FIFOs/Registers
  - Bluespec provides these

    //Scemi clock to user clock (dut) sync fifo
    SyncFIFOIfc#(Sample) toApSyncQ <- mkSyncFIFOFromCC(2, clk_usr);
    //User clock to Scemi clock sync fifo
    SyncFIFOIfc#(Sample) fromApSyncQ <- mkSyncFIFOToCC(2, clk_usr, rst_usr);

Running Design on FPGA

FPGA Hardware Setup
[Figure: Xilinx VC707 (Virtex 7) Development Board, with PCIe labeled]

FPGA Architecture Brief
- CLB: configurable logic block
- Routing switches
- BRAMs (~1-8MB total)
- DSP
- IOB: I/O buffers
- Hard IPs: PCIe, DRAM
- Clocking network

Tool Flow for FPGA
- SceMi build file invokes all tools
[Figure: tool flow. Hardware side: BSV, Bluespec Compiler, Verilog, then Xilinx Vivado (Synthesis, Map, Place and Route, with Constraints (.xdc)), producing the Bit File. Software side: C++/SceMi, g++, producing the Software TB Binary.]

Reading Timing Reports
- Focus on Max Delay Path (setup)
- Slack: Timing margin
- Location: Physical location of the slice
- Delay Type: which resource in the slice the path goes through
  - FDRE: register, LUT: look up table, CARRY4: carry chain
  - Net: routing delay
- Incr (ns): Additional delay added
- Path (ns): Total delay
- Demo: show failed 75MHz design on lightning; open Vivado
  ~/6.375_test/mingliu/audio/scemi/fpga/xilinx_1_75mhz_fail/mkBridge/mkBridge.runs/impl_1
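The relationship between the Incr column, the Path column, and slack can be written out as arithmetic. The sketch below is a simplified model under stated assumptions: it treats slack as the clock period minus the summed path delay and ignores clock skew, clock uncertainty, and register setup time, all of which a real report accounts for. The function names are illustrative.

```cpp
#include <vector>

// Clock period in ns for a frequency in MHz (75 MHz gives about 13.33 ns).
double periodNs(double freqMHz) {
    return 1000.0 / freqMHz;
}

// Total path delay: the running sum of the per-element increments,
// which is what the Path (ns) column accumulates from Incr (ns).
double pathDelayNs(const std::vector<double>& incrNs) {
    double total = 0.0;
    for (double d : incrNs) total += d;
    return total;
}

// Simplified setup slack: period minus path delay.
// A negative value means the path does not fit in one cycle.
double setupSlackNs(double freqMHz, const std::vector<double>& incrNs) {
    return periodNs(freqMHz) - pathDelayNs(incrNs);
}
```

With made-up increments of 0.5, 6.0 and 8.0 ns (14.5 ns total), the slack is negative at 75 MHz but comfortably positive at 50 MHz, which is why lowering the clock or shortening the path both fix a violation.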

Debugging Timing Errors
- Search for "VIOLATED" in the timing report
- Look at which module the error is from
  - Source and destination
  - What logic is generally on the path
- Remember that you are looking at one particular path: when there are timing failures on one path, there are typically many paths failing in the same way
  - TNS Failing Endpoints tells you exactly how many
- Resolve long combinational paths by pipelining/folding your design
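The first step, searching the report for "VIOLATED", is easy to script. The helper below is a minimal sketch that counts matching lines in report text already loaded into a string; the function name is made up, and it assumes nothing about the report layout beyond the keyword itself.

```cpp
#include <sstream>
#include <string>

// Counts the lines of a timing report that contain the word "VIOLATED".
int countViolatedLines(const std::string& reportText) {
    std::istringstream in(reportText);
    std::string line;
    int count = 0;
    while (std::getline(in, line)) {
        if (line.find("VIOLATED") != std::string::npos) {
            ++count;
        }
    }
    return count;
}
```

A count of zero is a quick sanity check before programming the FPGA. A non-zero count only says that something failed, so the report itself is still needed to find the source, destination, and logic on the path.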

Examples of Long Paths

    rule doMult;
        outR <= a * b;
    endrule

- Multipliers/adders generally expensive

    Reg#(Bit#(32)) a <- mkReg(0); //value set dynamically
    Reg#(Bit#(32)) b <- mkReg(0); //value set dynamically
    method setInputs (Bit#(32) a, Bit#(32) b) …

- Proportional to width

    Reg#(Bit#(4)) a <- mkReg(0);
    Reg#(Bit#(4)) b <- mkReg(0);

- Can be cheap for constants

    Bit#(32) a = 2;
    Reg#(Bit#(32)) b <- mkReg(0);

Examples of Long Paths
- Dynamic selection creates muxes

    Vector#(N, Reg#(Bit#(32))) outVectR <- replicateM(mkReg(0));
    Reg#(Bit#(TLog#(N))) sel <- mkReg(0); //dynamically set

    rule doSel;
        outVectR[sel] <= ..
    endrule

- Many-input muxes create long combinational paths
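A rough way to see why wide selection hurts timing is to count how many levels of logic a selector needs as N grows. The sketch below assumes the selector is built as a tree of k-input muxes, which is a simplification: how selection logic actually maps onto LUTs depends on the device and the synthesis tool. The function name and the tree model are illustrative.

```cpp
#include <cassert>

// Levels of k-input muxes needed to select one of n inputs,
// i.e. the smallest L such that k^L >= n.
int muxTreeLevels(int n, int k) {
    assert(n >= 1 && k >= 2);
    int levels = 0;
    long long reach = 1;  // inputs selectable with the levels counted so far
    while (reach < n) {
        reach *= k;
        ++levels;
    }
    return levels;
}
```

Under this model, selecting among 64 entries with 4-input muxes takes 3 levels and 65 entries takes 4. Each level adds both logic and routing delay to the path, and the whole tree is replicated for every bit of the data.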

Examples of Long Paths
- Likely OK

    rule doPipe;
        for (Integer sel=0; sel< n; sel=sel+1) begin
            outFIFO[sel].enq( inpFIFO[sel].first );
            inpFIFO[sel].deq;
        end
    endrule

- Many conflicting rules -> large mux

    for (Integer sel=0; sel< n; sel=sel+1) begin
        rule doPipe;
            outOneFIFO.enq( inpFIFO[sel].first );
            inpFIFO[sel].deq;
        endrule
    end

A Few Notes
- Please don't program the FPGA until design meets timing
- Send me an email if having issues with FPGA
- Servers with FPGAs attached: bdbm[12-14].csail.mit.edu
  - Do not use these for long compiles
- Additional servers for compiling: bdbm[10-11].csail.mit.edu
- Sharing
  - Use "w" and "top" to check who is using machine
  - Only one person can use the FPGA at a time
  - Change TCP port in project.bld in sim
- Start early! Synthesis takes >30 minutes.

Lab 4 Code