BSV 1 for ESL 2 Design Rishiyur S. Nikhil, PhD. CTO, Bluespec, Inc. © Bluespec, Inc., 2008 Lecture in 6.375, MIT March 10, 2008 1: Bluespec SystemVerilog.

Slides:



Advertisements
Similar presentations
SOC Design: From System to Transistor
Advertisements

ECOE 560 Design Methodologies and Tools for Software/Hardware Systems Spring 2004 Serdar Taşıran.
Using emulation for RTL performance verification
March, 2007http://csg.csail.mit.edu/arvind802.11a-1 Architectural Exploration: Area-Performance tradeoff in a Transmitter Arvind Computer Science.
June 5, Architectural Exploration: a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.
Computer Architecture Lab at Building a Synthesizable x86 Eriko Nurvitadhi, James C. Hoe, Babak Falsafi S IMFLEX /P ROTOFLEX.
Behavioral Design Outline –Design Specification –Behavioral Design –Behavioral Specification –Hardware Description Languages –Behavioral Simulation –Behavioral.
Mahapatra-Texas A&M-Fall'001 cosynthesis Introduction to cosynthesis Rabi Mahapatra CPSC498.
CSCE 121, Sec 200, 507, 508 Fall 2010 Prof. Jennifer L. Welch.
Recap – Our First Computer WR System Bus 8 ALU Carry output A B S C OUT F 8 8 To registers’ input/output and clock inputs Sequence of control signal combinations.
Chapter 15 IA 64 Architecture Review Predication Predication Registers Speculation Control Data Software Pipelining Prolog, Kernel, & Epilog phases Automatic.
Describing Syntax and Semantics
Guide To UNIX Using Linux Third Edition
C++ fundamentals.
Maria-Cristina Marinescu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology A Synthesis Algorithm for Modular Design of.
Stmt FSM Richard S. Uhler Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology (based on a lecture prepared by Arvind)
ESL: System Level Design Bluespec ESEPro: ESL Synthesis Extenstions for SystemC Rishiyur S. Nikhil CTO, Bluespec, Inc. ( Lecture.
On “Control Adaptivity”: how to sell the idea of Atomic Transactions to the HW design community Rishiyur Nikhil Bluespec, Inc. WG 2.8 meeting, June 16,
1 Chapter 2. The System-on-a-Chip Design Process Canonical SoC Design System design flow The Specification Problem System design.
CS 355 – Programming Languages
An Introduction Chapter Chapter 1 Introduction2 Computer Systems  Programmable machines  Hardware + Software (program) HardwareProgram.
February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013
Extreme Makeover for EDA Industry
CS 390- Unix Programming Environment CS 390 Unix Programming Environment Topics to be covered: Distributed Computing Fundamentals.
SystemC and Levels of System Abstraction: Part I.
Chonnam national university VLSI Lab 8.4 Block Integration for Hard Macros The process of integrating the subblocks into the macro.
ESL and High-level Design: Who Cares? Anmol Mathur CTO and co-founder, Calypto Design Systems.
Copyright © Bluespec Inc Bluespec SystemVerilog™ Design Example A DMA Controller with a Socket Interface.
February 14, 2007L04-1http://csg.csail.mit.edu/6.375/ Bluespec-1: Design methods to facilitate rapid growth of SoCs Arvind Computer Science & Artificial.
September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Unit-1 Introduction Prepared by: Prof. Harish I Rathod
The Macro Design Process The Issues 1. Overview of IP Design 2. Key Features 3. Planning and Specification 4. Macro Design and Verification 5. Soft Macro.
Execution of an instruction
March, 2007Intro-1http://csg.csail.mit.edu/arvind Design methods to facilitate rapid growth of SoCs Arvind Computer Science & Artificial Intelligence Lab.
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
Architecture View Models A model is a complete, simplified description of a system from a particular perspective or viewpoint. There is no single view.
Computer Simulation of Networks ECE/CSC 777: Telecommunications Network Design Fall, 2013, Rudra Dutta.
EE121 John Wakerly Lecture #17
System-on-Chip Design Hao Zheng Comp Sci & Eng U of South Florida 1.
September 8, 2009http://csg.csail.mit.edu/koreaL03-1 Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
March 1, 2006http://csg.csail.mit.edu/6.375/L09-1 Bluespec-3: Architecture exploration using static elaboration Arvind Computer Science & Artificial Intelligence.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Problem: design complexity advances in a pace that far exceeds the pace in which verification technology advances. More accurately: (verification complexity)
Recap – Our First Computer WR System Bus 8 ALU Carry output A B S C OUT F 8 8 To registers’ read/write and clock inputs Sequence of control signal combinations.
1 The user’s view  A user is a person employing the computer to do useful work  Examples of useful work include spreadsheets word processing developing.
Introduction to Bluespec: A new methodology for designing Hardware
Introduction to Bluespec: A new methodology for designing Hardware
System-on-Chip Design
Sequential Circuits: Constructive Computer Architecture
CS/COE0447 Computer Organization & Assembly Language
Design Flow System Level
Combinational Circuits in Bluespec
Stmt FSM Arvind (with the help of Nirav Dave)
Combinational Circuits in Bluespec
Introduction to cosynthesis Rabi Mahapatra CSCE617
Lesson 4 Synchronous Design Architectures: Data Path and High-level Synthesis (part two) Sept EE37E Adv. Digital Electronics.
Introduction to Bluespec: A new methodology for designing Hardware
Stmt FSM Arvind (with the help of Nirav Dave)
Combinational Circuits in Bluespec
Constructive Computer Architecture: Guards
Design methods to facilitate rapid growth of SoCs
Introduction to Bluespec: A new methodology for designing Hardware
Computer Operation 6/22/2019.
Presentation transcript:

BSV 1 for ESL 2 Design Rishiyur S. Nikhil, PhD. CTO, Bluespec, Inc. © Bluespec, Inc., 2008 Lecture in 6.375, MIT March 10, : Bluespec SystemVerilog 2: Electronic System Level

2 SoCs: complex assemblies of many application—specific blocks (IPs) On-chip memory banks Structured on- chip networks General- purpose processors Many IPs must be implemented in HW to save power. And, at many power/performance operating points Application- specific processing units

3 need to explore architectures (for speed, area. power) need to refine Central SoC design concerns HW/SW interface (e.g., mem-mapped register read/write) avail. early, fast sim, but not accurate Early software First HW models SW development needs platforms early SW development needs fast platforms Correctness HW Implementation Software HW-accurate, but avail. late, slow sim

4 Central SoC design concerns (contd.): reuse models, implementations, testbenches SoC 1 Across chips and in derivative chips SoC 2SoC n During refinement and exploration

5 The only feasible approach Rapid prototyping for wireless designs: the five-ones approach, M.Rupp, A.Burg and E.Beck, Signal Processing 83:7, 2003, pp “One code, one environment, one team, one documentation, one revision control” I.e., need a single, consistent language and environment for the whole ESL design process But even this is not enough

6 BSV addresses SoC design concerns HW/SW interface (e.g., mem-mapped register read/write) Early software architecture variation refinement First HW models HW Implementation Software One language/ semantics Atomicity: correctness Strong parameterization, higher-order descriptions: reuse Atomicity: composability, control-adaptive reuse Synthesizability even for early models (faster sim, FPGA accel) C, SystemC integration

7 BSV enables a top-down refinement methodology High-level Transactional I/F Design IPVerification IPSoftware High-level Transactional I/F High-level Transactional I/F Final Bus/ Interconnect I/F void TestBench:: Result( void* tbPointer ) { TestBench* tb = reinterpret_cast ( tbPointer ); unsigned data[4]; doResp( tb->drv0, &(tb->out[0]), data ); void TestBench:: Result( void* tbPointer ) { TestBench* tb = reinterpret_cast ( tbPointer ); unsigned data[4]; doResp( tb->drv0, &(tb->out[0]), data ); cout << "data is: " << hex << data[0] << data[1] << data[2] << data[3] << endl; Tcl_Obj* cmd = Tcl_NewStringObj( "reportResults ", -1 ); for( int i=0; i<4; i++ ) void TestBench:: Result( void* tbPointer ) { TestBench* tb = reinterpret_cast ( tbPointer ); unsigned data[4]; doResp( tb->drv0, &(tb->out[0]), data ); cout << "data is: " << hex << data[0] << data[1] << data[2] << data[3] << endl; Tcl_Obj* cmd = Tcl_NewStringObj( "reportResults ", -1 ); for( int i=0; i<4; i++ ) void TestBench:: Result( void* tbPointer ) { TestBench* tb = reinterpret_cast ( tbPointer ); unsigned data[4]; doResp( tb->drv0, &(tb->out[0]), data ); cout << "data is: " << hex << data[0] << data[1] << data[2] << data[3] << endl; Tcl_Obj* cmd = Tcl_NewStringObj( "reportResults ", -1 ); for( int i=0; i<4; i++ ) Add Functionality Add Architectural Detail Initial Testbench (approx) Final Testbench Explore Architectures Final IP Initial IP (approx) Add Functionality Refine/ Replace Tools FPGA/Emulation (>>100X speed) Bluesim (10X speed) RTL simulation (1X speed) Bluespec Synthesis or RTL synthesis Power Estimation Execution at every step Verify/ Analyze/ Debug Verilog

8 Topics today Early models: executable, synthesizable FSMs Interface abstraction Parameterizable microarchitectures C/SystemC integration

9 Early models (executable, synthesizable)

10 Rules allow early models to be like executable and synthesizable specs: e.g., an ISS (instruction set simulator) rule add(instruction_D.first.op == Add && cpuState == Execute); Instruction add = instruction_D.first; // Get the register values involved let a = regfile.sub(add.ra); let b = regfile.sub(add.rb); // Perform the addition let d = add32(a, b); // Writeback the results update_regfile(add.rd, d); // Calculate the overflow bit Bool overflow = (((a[31] == 1) && (b[31] == 1) && (d[31] == 0)) || ((a[31] == 0) && (b[31] == 0) && (d[31] == 1))); // Update Flags update_condition0_ovl(add.rc, d, overflow); update_carry_overflow(add.oe, overflow, False, 1'b0); increment_pc; endrule BSV code: Close to spec No extraneous logic for managing shared resources Scalable – incrementally add functionality Excerpt from Assembler Manual

11 FPGA acceleration Since all of BSV is synthesizable, even such early, functional models can be synthesized into RTL and run on FPGA platforms/emulators Not optimized, but still faster than simulation Typical uses: Architectural exploration Fast emulation for earlier verification Early software development

12 FSMs

13 The StmtFSM package BSV’s rules are of course powerful enough to express any FSM. A rule’s condition identifies a state A rule’s Action specifies the state transition and any other actions for that state However, there are some common idioms (“design patterns”) for FSMs which are made simpler with syntactic support

14 Example Initialize a memory with a 2-D pattern An FSM is needed since the memory can only accept 1 write per cycle typedef enum { Pre, Loop, Post, Done} Phases deriving (Bits, Eq); Reg#(Phases) p <- mkReg (Pre); Reg#(int) i <- mkRegU; Reg#(int) j <- mkRegU; Reg#(Addr) addr <- mkRegU; rule prelude (p == Pre);... pre initialization actions... p <= Loop; i <= 0; j <= 0; addr <= addr0; endrule rule loop (p == Loop && (i < nI) && (j < nJ)); mem.write (addr, f (i, j)); addr <= addr + 1; if (j < nJ-1) j <= j + 1; else begin j <= 0; if (i < nI-1) i <= i + 1; else p <= Post; end endrule rule postlude (p == Post);... post-initialization actions... p <= Done; endrule f(i,j) nJ nI i jaddr0 Messy “bureaucracy” to sequence the FSM

15 Example Initialize a memory with a 2-D pattern (must be written as an FSM since only memory can only accept 1 write per cycle) Stmt s = seq action... pre initialization actions... addr <= addr0; endaction for (i <= 0; i < nI; i <= i + 1) for (j <= 0; j < nJ; j <= j + 1) action mem.write (addr, f (i, j)); addr <= addr + 1; endaction;... post-initialization actions... endseq; FSM fsm <= mkFSM (s); rule... fsm.start(); endrule import StmtFSM::* Reg#(int) i <- mkRegU; Reg#(int) j <- mkRegU; Reg#(Addr) addr <- mkRegU; f(i,j) nJ nI i jaddr0

16 StmtFSM sublanguage The Stmt sublanguage is signalled by ‘seq- endseq’ or ‘par-endpar’ brackets Representing expressions of type ‘Stmt’ A Stmt is a specification of an FSM Sublanguage constructs (cf. Ref Guide C.5): Primitives: any Action (including composite actions) seq s1...sN endseq: sequential composition par s1...sN endpar: parallel composition  All sJ’s initiated in parallel; completes when all have completed if-then, if-then-else for-, while-, repeat(n)- loops, with optional break, continue

17 Using and composing FSMs Various modules take Stmt specs as parameters and generate the specified FSMs FSMs have a standard BSV interface: Note: fsm methods can be used from other FSMs fsm.done() can be used in rule conditions fsm.waitTillDone() can be used in a rule body to block the rule (implicit condition is same as fsm.done()) interface FSM; method Action start (); method Bool done (); method Action waitTillDone (); endinterface: FSM module mkFSM #( Stmt s ) ( FSM ); FSM interface mkFSM module

18 StmtFSM semantics, implementation Integrated cleanly into the language mkFSM takes the Stmt spec and generates the corresponding Rules—the Actions in the spec become rule bodies Compiler manages all state book-keeping  Automatic generation of state variables, state registers, state update Standard rule semantics for FSM actions (e.g., shared counters, queues)  Blocks on implicit conditions, if necessary  Atomic (arbitrated in usual way w.r.t. other actions) Fully synthesizable

19 Modules with FSM interfaces mkFSM#(Stmt s) (FSM) mkFSMWithPred#(Stmt s, Bool b) (FSM) Global ‘disable/suspend’ control on the FSM mkAutoFSM#(Stmt s) (Empty) Build the FSM, plus boilerplate that runs it exactly once and quits See Ref Guide C.5 for more discussion You can also write your own module with an FSM interface See “FSMs from lists”, generating an FSM from a list of (current-state,action,next-state) triples, in Small Examples Suite, Example 12g

20 Stmt specfsm = seq write( 15, 51 ); read( 15 ) ; ack ; write( 16, 61 ) ; write( 17, 71 ) ; // a memory operation and // an acknowledge can occur // simultaneously action read( 16 ) ; ack ; endaction action read( 17 ) ; ack ; endaction ack ; endseq ; FSM testfsm <- mkFSM (specfsm); StmtFSM - example rule run ( True ); testfsm.start ; endrule rule done (testfsm.done); $finish(0); endrule seq / endseq define a sequence (multi-cycle) action / endaction define a simultaneous block (in one cycle) par / endpar defines a parallel block (multi-cycle)

21 Stmt specfsm = seq write( 15, 51 ); read( 15 ) ; ack ; write( 16, 61 ) ; write( 17, 71 ) ; // a memory operation and // an acknowledge can occur // simultaneously action read( 16 ) ; ack ; endaction action read( 17 ) ; ack ; endaction ack ; endseq ; mkAutoFSM (specfsm); StmtFSM - example Same FSM, using mkAutoFSM

22 State Machine Generation Example // Specify an FSM generating a test seqence Stmt test_seq = seq for (i <= 0; i < nI; i <= i + 1) // each source for (j <= 0; j < nJ; j <= j + 1) begin // each destination let pkt <- gen_packet (); send_packet (i, j, pkt); // test i-j path in isolation end // then, test arbitration by sending packets simultaneously to same dest action send_packet (0, 1, pkt0); // to dest 1 send_packet (1, 1, pkt1); // to dest 1 (so, collision) endaction endseq mkAutoFSM (fsm); // Generate the FSM and code to run it automatically

23 On efficiency of generated FSMs In general, compositional implementation of FSMs will not be as efficient as an implementation with: State-encoding optimizations based on global knowledge  E.g., use single wide register for (i,j)  E.g., one-hot encodings State-encoding optimizations based on domain knowledge  E.g., state can be inferred from other data registers Nevertheless, BSV’s StmtFSM package is useful most of the time Rarely fall back to express FSMs directly as rules Because FSM is a standard BSV interface, you can also write your own FSM generators  See “FSMs from lists”, Small Examples Suite, Example 12g

24 FSM summary Stmt sublanguage captures certain common and useful FSM idioms: sequencing, parallel, conditional, iteration FSM modules automatically implement Stmt specs FSM interface permits composition of FSMs Most importantly, same Rule semantics Actions in FSMs are atomic Actions automatically block on method implicit conditions Parallel actions, whether in the same FSM or different FSMs, automatically arbitrated based on atomicity semantics

25 Interface abstraction

26 Transactional interfaces: Get/Put Provides simple handshaking mechanism for getting data from a module or putting data into it Easy to connect together interface Get#(type t); // polymorphic method ActionValue#(t) get(); endinterface: Get interface Put#(type t); method Action put(t x); endinterface: Put get data ready enable put data enable ready

27 data ready enable data enableready Get and Put Interfaces Example: interface provided by a cache (c) to a processor (p) interface CacheIfc; interface Put#(Req_t) p2c_request; interface Get#(Resp_t) c2p_ response; … endinterface module mkCache (CacheIfc); FIFO#(Req_t) p2c <- mkFIFO; FIFO#(Resp_t) c2p <- mkFIFO; … rules expressing cache logic … interface p2c_request; method Action put (Req_t req); p2c.enq (req); endmethod endinterface interface c2p_response; method ActionValue#(Resp_t) get (); let resp = c2p.first; c2p.deq; return resp; endmethod; endinterface endmodule get put mkCache

28 data ready enable data enableready Get and Put Interfaces Example: interface provided by a cache (c) to a processor (p) interface CacheIfc; interface Put#(Req_t) p2c_request; interface Get#(Resp_t) c2p_ response; … endinterface module mkCache (CacheIfc); FIFO#(Req_t) p2c <- mkFIFO; FIFO#(Resp_t) c2p <- mkFIFO; … rules expressing cache logic … interface p2c_request; method Action put (Req_t req); p2c.enq (req); endmethod endinterface interface c2p_response; method ActionValue#(Resp_t) get (); let resp = c2p.first; c2p.deq; return resp; endmethod; endinterface endmodule get put mkCache function Put#(Req_t) fifoToPut(FIFO#(Req_t) fifo); return ( interface Put; method Action put (a); fifo.enq (a); endmethod endinterface); endfunction

29 data ready enable data enableready Get and Put Interfaces Example: interface provided by a cache (c) to a processor (p) interface CacheIfc; interface Put#(Req_t) p2c_request; interface Get#(Resp_t) c2p_ response; … endinterface module mkCache (CacheIfc); FIFO#(Req_t) p2c <- mkFIFO; FIFO#(Resp_t) c2p <- mkFIFO; … rules expressing cache logic … interface p2c_request; method Action put (Req_t req); p2c.enq (req); endmethod endinterface interface c2p_response; method ActionValue#(Resp_t) get (); let resp = c2p.first; c2p.deq; return resp; endmethod; endinterface endmodule get put mkCache function Get#(Resp_t) fifoToGet(FIFO#(Resp_t) fifo); return ( interface Get; method ActionValue#(Resp_t) get (); let a = fifo.first; fifo.deq; return a; endmethod; endinterface); endfunction

30 data ready enable data enableready “Interface transformers”: Converting FIFOs to Get/Put The BSV library provides functions to convert FIFOs to Get/Put interface CacheIfc; interface Put#(Req_t) p2c_request; interface Get#(Resp_t) c2p_ response; … endinterface module mkCache (CacheIfc); FIFO#(Req_t) p2c <- mkFIFO; FIFO#(Resp_t) c2p <- mkFIFO; … rules expressing cache logic … interface p2c_request = fifoToPut (p2c); interface c2p_response = fifoToGet (c2p); … endmodule get put mkCache Absolutely no difference in the HW!

31 client Client/Server interfaces Get/Put pairs are very common, and duals of each other, so the library defines Client/Server interface types for this purpose interface Client #(req_t, resp_t); interface Get#(req_t) request; interface Put#(resp_t) response; endinterface interface Server #(req_t, resp_t); interface Put#(req_t) request; interface Get#(resp_t) response; endinterface data ready enable data enable ready get server data ready enable data enable ready get put req_tresp_t

32 Client/Server interfaces interface CacheIfc; interface Server#(Req_t, Resp_t) ipc; interface Client#(Req_t, Resp_t) icm; endinterface module mkCache (CacheIfc); FIFO#(Req_t) p2c <- mkFIFO; FIFO#(Resp_t) c2p <- mkFIFO; FIFO#(Req_t) c2m <- mkFIFO; FIFO#(Resp_t) m2c <- mkFIFO; … rules expressing cache logic … interface Server ipc interface Put put(a); method Action put(a); p2c.enq(a); endmethod endinterface interface Get get; method ActionValue#(Resp_t) get; c2p.deq; return c2p.first; endmethod endinterface endinterface // Server mkCache get put client get put server client get put mkMem mkProcessor server get put

33 Client/Server interfaces and more interface transformers interface CacheIfc; interface Server#(Req_t, Resp_t) ipc; interface Client#(Req_t, Resp_t) icm; endinterface module mkCache (CacheIfc); // from / to processor FIFO#(Req_t) p2c <- mkFIFO; FIFO#(Resp_t) c2p <- mkFIFO; // to / from memory FIFO#(Req_t) c2m <- mkFIFO; FIFO#(Resp_t) m2c <- mkFIFO; … rules expressing cache logic … interface ipc = fifosToServer (p2c, c2p); interface icm = fifosToClient (c2m, m2c); endmodule mkCache get put client get put server client get put mkMem mkProcessor server get put

34 rule connect Connecting Get and Put A module m1 providing a Get interface can be connected to a module m2 providing a Put interface with a simple rule module mkTop (…) Get#(int) m1 <- mkM1; Put#(int) m2 <- mkM2; rule connect; let x <- m1.get(); m2.put (x); // note implicit conditions endrule endmodule get data ready enable put data enable ready

35 Abstracting connections: Connectables There are many pairs of types that are duals of each other Get/Put, Client/Server, YourTypeT1/YourTypeT2, … The BSV library defines an overloaded module mkConnection which encapsulates the connections between such duals The BSV library predefines the implementation of mkConnection for Get/Put, Client/Server, etc. Because overloading in BSV is extensible, you can overload mkConnection to work on your own interface types T1 and T2 The BSV library also recursively defines mkConnection for tuples and vectors of interfaces that are already individually connectable (including your own interface types)

36 mkConnection Using these interface facilities, assembling systems becomes very easy interface CacheIfc; interface Server#(Req_t, Resp_t) ipc; interface Client#(Req_t, Resp_t) icm; endinterface module mkTopLevel (…) // instantiate subsystems Client #(Req_t, Resp_t) p <- mkProcessor; Cache_Ifc #(Req_t, Resp_t) c <- mkCache; Server #(Req_t, Resp_t) m <- mkMem; // instantiate connects mkConnection (p, c.ipc); // Server connection mkConnection (c.icm, m); // Client connection endmodule mkCache get put client get put server client get put mkMem mkProcessor server get put

37 Summary of interface abstraction BSV allows writing very high level transactional interfaces that remain synthesizable and efficient We can very quickly, succinctly and correctly define very complex interfaces, and connect them, using interface abstraction, and BSV library elements Get/Put, Client/Server, Connectables, etc.

38 Parameterizable microarchitecture

39 Example: a Transmitter ControllerScramblerEncoderInterleaverMapper IFFT Cyclic Extend headers data IFFT Transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers accounts for 85% area 24 Uncoded bits One OFDM symbol (64 Complex Numbers) Must produce one OFDM symbol every 4 sec Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol

40 Combinational IFFT in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute_1Permute_2Permute_3 All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power,... * * * * *j t2t2 t0t0 t3t3 t1t1 stage_f function

41 Superfolded circular pipeline: Just one Bfly-4 node! in0 … in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 Bfly4 Permute_1 Permute_2 Permute_3 Stage Counter 0 to 2 Index Counter 0 to 15 64, 4-way Muxes 4, 16-way Muxes 4, 16-way DeMuxes Even the Permute blocks can be unified through parameterization, and reused

42 Different microarchitectural choices  different power/area/speed operating points Higher-order descriptions in BSV Can write functions that compute with any types: modules, rules, actions, interfaces Parameterization in BSV Parameters can be of any type Can conditionally instantiate modules/ state elements Thus: can write a single source which, depending on parameters, “unfolds” to different microarchitectures: degree of “data parallelism”, degree of pipelining, degree of resource sharing, etc. Power/area/speed tradeoff not easy to predict without actual synthesis of different microarchitectures, because of complex interaction with optimization E.g., see MIT’s experiences with xx, OFDM, H.264

43 C and SystemC integration

44 Encapsulating existing C code Well-defined mechanism to incorporate existing C code (for both Bluesim and Verilog sim) “import BDPI” Typical uses: Initial rough models of IP behavior, to eventually be replaced by BSV code Initial reuse of reference codes for standard algorithms (wireless, audio/video, image processing, security,...) Testbenches (may never be replaced by BSV code)  Generate input/output vectors  Generate random numbers  Use C to do complex arithmetic Instrumentation: write log files of model activity

45 The import-BDPI statement Importing C function “add32” as BSV function with the same name: import “BDPI” function Bit#(32) add32 (Bit#(32) v1, Bit#(32) vs); Importing C function “add32” as BSV function called “add”: import “BDPI” add32 = function Bit#(32) add (Bit#(32) v1, Bit#(32) vs); Imported C functions can also have Action and ActionValue#() return types

46 Argument/result Types BSV TypeC Type Stringchar* Bit#(0) – Bit#(8) unsigned char Bit#(9) – Bit#(32) unsigned int Bit#(33) – Bit#(64) unsigned long long Bit#(65) - unsigned int* Bit#(n)unsigned int* Other types are allowed if they can be packed to bits

47 Wide and Polymorphic Data Wide and poly data are passed by reference: unsigned int* For wide data, no size is passed to the C function; the size should be known For poly data, BSV takes no position on how the size is passed; left to the user Size can be passed as an additional argument (see next slide)

48 Polymorphic Example // This function computes a checksum for any size // bit-vector. The second argument is the size of // the input bit-vector. import "BDPI" checksum = function Bit#(32) checksum_raw (Bit#(n), Bit#(32)); // This wrapper handles the passing of the size function Bit#(32) checksum (Bit#(n) vec); return checksum_raw(vec, fromInteger(valueOf(n))); endfunction

49 Implicit pack / unpack Example: import "BDPI“ function Bool my_and (Bool, Bool); Since Bool packs to a Bit#(1), it would connect to a C function such as the following unsigned char my_and (unsigned char x, unsigned char y);

50 Compilation and Linking Imported C functions are compiled like any other C source file See BSV User Guide for details on how to link in the.o files The bsc compiler generates all the auxiliary “glue” for linking in the compiled C code, whether in Bluesim or in Verilog sim

51 Encapsulating BSV into SystemC models SystemC ( is very popular for system modeling and testbencheswww.systemc.org The bsc compiler can encapsulate a BSV module into a SystemC module, which can then be used in a standard SystemC environment “-systemc” flag to bsc (cf. User Guide Sec 4.2) Incorporates the fast Bluesim engine Typical uses: Plug into existing SystemC models and environments to reuse existing testbench

52 Wrapup

53 BSV enables a top-down refinement methodology High-level Transactional I/F Design IPVerification IPSoftware High-level Transactional I/F High-level Transactional I/F Final Bus/ Interconnect I/F void TestBench:: Result( void* tbPointer ) { TestBench* tb = reinterpret_cast ( tbPointer ); unsigned data[4]; doResp( tb->drv0, &(tb->out[0]), data ); void TestBench:: Result( void* tbPointer ) { TestBench* tb = reinterpret_cast ( tbPointer ); unsigned data[4]; doResp( tb->drv0, &(tb->out[0]), data ); cout << "data is: " << hex << data[0] << data[1] << data[2] << data[3] << endl; Tcl_Obj* cmd = Tcl_NewStringObj( "reportResults ", -1 ); for( int i=0; i<4; i++ ) void TestBench:: Result( void* tbPointer ) { TestBench* tb = reinterpret_cast ( tbPointer ); unsigned data[4]; doResp( tb->drv0, &(tb->out[0]), data ); cout << "data is: " << hex << data[0] << data[1] << data[2] << data[3] << endl; Tcl_Obj* cmd = Tcl_NewStringObj( "reportResults ", -1 ); for( int i=0; i<4; i++ ) void TestBench:: Result( void* tbPointer ) { TestBench* tb = reinterpret_cast ( tbPointer ); unsigned data[4]; doResp( tb->drv0, &(tb->out[0]), data ); cout << "data is: " << hex << data[0] << data[1] << data[2] << data[3] << endl; Tcl_Obj* cmd = Tcl_NewStringObj( "reportResults ", -1 ); for( int i=0; i<4; i++ ) Add Functionality Add Architectural Detail Initial Testbench (approx) Final Testbench Explore Architectures Final IP Initial IP (approx) Add Functionality Refine/ Replace Tools FPGA/Emulation (>>100X speed) Bluesim (10X speed) RTL simulation (1X speed) Bluespec Synthesis or RTL synthesis Power Estimation Execution at every step Verify/ Analyze/ Debug Verilog

End

55 Extras

56 Traditional (C/C++ based) High Level Synthesis Source code: standard C/C++ Sequential source semantics to specify highly parallel implementations! Works only for compute-intensive “loop-and-array” algorithms (which can be automatically parallelized) Yes: computation kernels of 802.x, H.264, MIMO,... No: Processors, caches, DMAs, interconnects, I/O peripherals,..., and even the rest of 802.x, H.264,... Even acceptable algorithms need significant “massaging” to produce good synthesis Plus, need tool-specific, and sometimes technology-specific “constraints” guiding synthesis (loop unrolling, resource sharing,...)  Observation: these essentially constrain/specify the microarchitecture (which is explicit in BSV) Non-trivial integration into rest of system BSV, in contrast: microarchitecture precisely specified by designer, but with such strong functional abstraction and parameterization that: often shorter than C/C++ codes single source can represent family of microarchitectures (degree of pipelining, degree of data parallelism, degree of ad-hoc parallelism,...) transparent, predictable, controllable Applications demonstrated: H.264 decode, OFDM, MIMO decode, CIECAM, and more