Computer Architecture Lab at
1 FPGAs and Bluespec: Experiences and Practices
Eric S. Chung, James C. Hoe {echung,


2 My learning experience w/ Bluespec
This talk:
– Share actual design experiences, pitfalls, problems, and solutions
– Suggestions for Bluespec

3 Why Bluespec?
August 13, 2007, Eric S. Chung / Bluespec Workshop
Our project
– Multiprocessor UltraSPARC III architectural simulator using FPGAs
– Run full-system SPARC apps (e.g., Solaris, OLTP)
– Run-time instrumentation (e.g., CMP cache) 100x faster than SW
The role of Bluespec
– Retain flexibility & abstraction comparable to SW-based simulators
– Reduce design & verification time for FPGAs
[Figure: SPARC CPUs and memory on the Berkeley Emulation Engine (BEE2), 5 Virtex-II Pro 70 FPGAs]

4 Completed design details
Large multi-FPGA system built from scratch (4/07 – now):
– 16 independent CPU contexts in a 64-bit UltraSPARC III pipeline
– Non-blocking caches and memory subsystem
– Multiple clock domains within/across multiple FPGA chips
– 20k lines of Bluespec, pipeline runs up to 90 IPC = 1
[Figure labels: L1 I, 16-way interleaved SPARC pipeline, L1 D, FPGA 1, FPGA 2, 16-way CMP cache simulator, memory controllers, memory traces, “functional” trace generator]

5 Summary of lessons learned
Lesson #1: Your Bluespec FPGA toolbox: black or white?
Lesson #2: Obsessive-Compulsive Synthesis Syndrome
Lesson #3: I’m compiling as fast as I can, Captain!
Lesson #4: Stress-free with Assertions
Lesson #5: Look Ma! No Waveforms!
Lesson #6: Have no fear, multi-clock is here
Lesson #7: Guilt-free Verilog

6 L1: Your FPGA toolbox: Black or White?
Two approaches to creating an FPGA Bluespec toolbox:
– Black: was given to me and just works, no area/timing intuition
– White: know exactly how many LUTs/FFs/BRAMs you’re getting
A cautionary tale:
– We initially used Standard Prelude prims extensively (e.g., FIFO)
Example 1: 64-bit 16-entry FIFO from Bluespec Standard Prelude
Xilinx XST synthesis report: 1069 flip-flops, 623 LUTs
Example 2: Same module redone using Xilinx distributed RAMs
Xilinx XST synthesis report: 21 flip-flops, 163 LUTs
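The gap between the two synthesis reports is easier to judge with a little arithmetic. The Python sketch below only reworks the numbers quoted on the slide. It assumes (this is not stated in the talk) that the Standard Prelude FIFO keeps every data bit in its own flip-flop. The names storage_bits, prelude and dist_ram are illustrative.

```python
# Numbers quoted from the two XST reports on the slide.
WIDTH_BITS = 64
DEPTH = 16
prelude = {"ffs": 1069, "luts": 623}   # Standard Prelude FIFO
dist_ram = {"ffs": 21, "luts": 163}    # redone with Xilinx distributed RAMs


def storage_bits(width, depth):
    """Raw data storage of a FIFO, in bits."""
    return width * depth


bits = storage_bits(WIDTH_BITS, DEPTH)

# Assumption: one flip-flop per stored bit in the Prelude FIFO, so the
# remainder would be pointers and control state.
ff_overhead = prelude["ffs"] - bits

ff_ratio = prelude["ffs"] / dist_ram["ffs"]
lut_ratio = prelude["luts"] / dist_ram["luts"]

print(f"storage bits:         {bits}")
print(f"FFs beyond storage:   {ff_overhead}")
print(f"FF ratio (Ex1/Ex2):   {ff_ratio:.1f}x")
print(f"LUT ratio (Ex1/Ex2):  {lut_ratio:.1f}x")
```

Under that assumption, 1024 of the 1069 flip-flops in Example 1 would be plain data storage, which the distributed-RAM version moves into LUT-based memory.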

7 L2: Obsessive-Compulsive Synthesis Syndrome (OCSS)
Don’t wait until the end to synthesize your Bluespec!
– High-level abstraction makes it almost too easy to “program” HW
– Not easy to determine area/timing overheads after 20K lines

module mkFooBaz( FooBaz#(idx_t, data_t) )
   provisos( Bits#(idx_t, idx_nt), Bits#(data_t, data_nt) );
   // One register per index value: 2^idx_nt entries, not idx_nt
   Vector#( TExp#(idx_nt), Reg#(Bit#(data_nt)) ) array <- replicateM( mkReg(?) );
   method Action write( idx_t idx, data_t din );
      array[pack(idx)] <= pack(din);
   endmethod
   method data_t read( idx_t idx );
      return unpack( array[pack(idx)] );
   endmethod
endmodule

This is an array of N FF-based registers w/ an N-to-1 mux at the read port. Is it obvious?

Quick tip (OCSS is good for you)
Make it effortless to go from *.bsv file → synthesis report

$> make mkClippy Clippy.bsv
$> compiling ./Clippy.bsv …
$> Total number of 4-input LUTs used: 500,000
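A back-of-the-envelope model shows why a register array like the one above gets expensive. This Python sketch is a rough estimate under stated assumptions, not a synthesis result. It assumes one register per index value (2^index_bits entries). It assumes the N-to-1 read mux is built as a tree of 2:1 muxes, each taking one 4-input LUT. It ignores dedicated mux primitives and the write-decode logic. The function name register_array_cost is hypothetical.

```python
def register_array_cost(index_bits, data_bits):
    """Rough FF/LUT estimate for a flip-flop-based register array.

    Assumptions (illustrative only):
      * one register per index value, i.e. 2**index_bits entries
      * the read port is an N-to-1 mux per data bit, built from
        (N - 1) two-input muxes, one 4-input LUT each
      * write-enable decode and dedicated mux resources are ignored
    """
    entries = 2 ** index_bits
    flip_flops = entries * data_bits
    mux2_per_bit = entries - 1
    read_mux_luts = mux2_per_bit * data_bits
    return {
        "entries": entries,
        "flip_flops": flip_flops,
        "read_mux_luts": read_mux_luts,
    }


# Example: 4-bit index, 64-bit data
cost = register_array_cost(index_bits=4, data_bits=64)
print(cost)
```

Even at 16 entries this model gives about a thousand flip-flops and nearly as many LUTs for the read mux alone. That is the kind of overhead frequent synthesis runs are meant to expose early.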

8 L3: I’m compiling as fast as I can, captain!
Problem: big designs w/ lots of rules take forever to compile
– E.g., compiling our SPARC design takes 30 min on a 2.93 GHz Core 2 Duo
Workarounds:
– Incremental module compilation w/ (* synthesize *) pragmas → very effective but forgoes passing interfaces into a module
– Lower scheduler’s effort & improve your rule/method predicates
Feedback for Bluespec
a) “-prof” flag that gives timing feedback & suggests optimizations
b) more documentation on what each compile stage does
c) “-j 2” parallel compilation?

9 L4: Stress-free with Assertions
Assert and OVLAssert libraries (USE THEM)
– Our SPARC design has over 300 static + dynamic assertions
– Caught > 50% of design bugs in simulation
Key difference from Verilog assertions:
– Assertion test expressions automatically include rule predicates
– Test expressions look VERY clean
Suggestions
– Synthesizable assertions for run-time debugging
– Assertions at rule-level? (e.g., if R1, R2 fire, then R3 eventually must fire)

10 L5: Look Ma! No Waveforms!
Interesting consequence of atomic rule-based semantics:
– $display() statements easily associated with atomic rule actions
– Majority of our debugging was done with traces only
– Very similar to SW debugging
Suggestions
– Support trace-based debugging more explicitly (gdb for Bluespec?)
– Controlled verbosity/severity of $display statements
– Context-sensitive $display

11 L6: Have no fear, Multi-clock is here
Multiple clock domains show up in large designs
– Sometimes start at freq < normal clock to speed up place & route
– But synchronization is generally tricky
Bluespec Clocks library to the rescue
– Contains many clock crossing primitives
– Most importantly, compiler statically catches illegal clock crossings
– TAKE advantage of this feature
(Anecdote) our system has 4 clock domains over 2 FPGAs
– With Bluespec, had no synchronization problems on FIRST try

12 L7: Guilt-free Verilog
Sometimes talking to Verilog is unavoidable
– Systems rarely come in a single HDL
– Learn how to import Verilog into Bluespec (import “BVI”)
– Understand what methods are and how they map to wires
Sometimes you feel like writing Verilog (and that’s okay!)
– Synthesis tools can be fickle
– Some behaviors better suited to synchronous FSMs (e.g., synchronous hand-shake to DDR2 controller)
– Solutions: write sequential FSM within 1 giant Bluespec rule OR write it in Verilog and wrap it into a Bluespec interface

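13 Example: “Verilog-style” Bluespec

Wire#(Bool) en_clippy <- mkBypassWire();

rule clippy( True );
   // Next-state logic, written like a Verilog always block
   State_t nstate = Idle;
   case( state )
      Idle:      nstate = En_clippy;
      En_clippy: nstate = Idle;
      default:   dynamicAssert(False,…);
   endcase
   // Output logic
   if( state == En_clippy ) en_clippy <= True;
   // Commit the next state; without this the FSM never advances
   state <= nstate;
endrule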
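The “one giant rule” style can be read as an ordinary synchronous FSM. Each firing of the rule is one clock cycle that computes the next state and drives the outputs. The Python sketch below is a software model of that reading, not Bluespec. It assumes the state register resets to Idle and is updated with nstate every cycle. It treats the wire as False whenever it is not driven. The names State, step and run are illustrative.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    EN_CLIPPY = 1


def step(state):
    """Model one firing of the rule: return (next_state, en_clippy)."""
    if state is State.IDLE:
        nstate = State.EN_CLIPPY
    elif state is State.EN_CLIPPY:
        nstate = State.IDLE
    else:
        # Plays the role of the dynamicAssert in the default branch.
        raise AssertionError("unreachable state")
    # Output depends on the current state, as in the rule body.
    en_clippy = state is State.EN_CLIPPY
    return nstate, en_clippy


def run(cycles, state=State.IDLE):
    """Simulate a number of clock cycles and record the enable signal."""
    trace = []
    for _ in range(cycles):
        state, en = step(state)
        trace.append(en)
    return trace


print(run(4))  # [False, True, False, True]
```

In this model the enable pulses on every other cycle, which is the behavior the two-state case statement describes.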

14 Conclusion
Big thanks to Bluespec
Your feedback/comments are welcome!
Learn more about our FPGA emulation efforts: