1 On Controllers, Soft Connections, and Logical Topologies
Michael Pellauer, MIT CSAIL
Angshuman Parashar, Michael Adler, Joel Emer, Intel VSSAD
2 The Setup (for both our HAsim simulator and this talk)
Virtex5 110t on a HiTechGlobal PCIe accelerator
  Future: FSB-based accelerators. Larrabee?
Use HAsim’s Remote-Request-Response (RRR) protocol for communication between SW and HW
  Allows calls from one to the other
[Figure labels: Host Processor, PCIe, FPGA; example calls: run program, dump stats, emulate instr, translate address]
3 The Problem of the Day
Just because you can talk doesn’t mean you have anything interesting to say!
We must control higher-level interactions between software and hardware
  Example: “Dump Stats” command
  Transmit requests intra-FPGA, aggregate responses
  Future: think about multiple-FPGA setup
[Figure labels: FPGA, PCIe Interface, RRR, Controller, Cache, Branch Pred, …; request shown: dump stats]
4 The HAsim Controller
Software sees it as… / Hardware sees it as…
[Figure labels, in source order: Controller, Host Software, run, pause, …, setParam, dump stats; Controller, run, pause, …, setParam, dump stats, enable events, debug, assertion fail; RRR]
Different modules access different services
Which modules use which service is very fluid
5 Problem: HDLs’ Inflexible Interfaces
Branch Predictor has a bug
  Want to send some debug info to the Controller
Fundamental Problem: HDLs allow communication only up and down the hierarchy
  Verilog OOMRs are not an acceptable solution
  Gets worse if we have alternative modules
[Figure: HW Module Instantiation; labels: Simulator, Core, Controller, Front End, RRR, Fetch, PCIe, Branch Pred]
6 Our Solution: Soft Connections
Goal: “soften” rigid communication hierarchy
Users separately instantiate named endpoints
  Can read and write as if they were half of a guarded FIFO (FI and FO)
  Instantiator’s interface does not change
Bluespec standard ModuleCollect library
[Figure: mkSend “fet2dec” (send()) and mkRecv “fet2dec” (recv()); the connection between them is added during the Bluespec compiler’s static elaboration phase]
7 Review: Static Elaboration Phase
Inline function calls and datatypes as combinational logic
Instantiate modules with specific parameters
Resolve polymorphism/overloading
[Figure: Software Toolflow: source, Compile, .exe, run w/ params (run1, run2, run3, …). Hardware Toolflow: source, Elaborate w/ params (design1, design2, design3), run (run1, run1.1, …, run2.1, …, run3.1, …)]
8 Elaboration-Time Algorithm
  let (sends, recvs) = getCollection()   // Get from ModuleCollect
  for each s in sends do
    let rs = matchByName(s.name, recvs)
    if rs == {} then
      // an optional send may legally stay unconnected
      if not s.optional then error(“Unmatched Send:” + s.name)
    else if rs == {r} then
      connect(s, r)                       // instantiate buffering
    else
      error(“Multiple Receives connected to:” + s.name)
    recvs = recvs – rs                    // remove matched recvs
  for each r in recvs do
    error(“Unmatched Receive:” + r.name)
Open Question: Can we do this in SystemVerilog as well?
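The matching pass above can be sketched in ordinary software. This is a minimal Python sketch, not the HAsim implementation: `Endpoint` and `match_connections` are hypothetical names, errors are collected in a list instead of aborting elaboration, and `connect` is reduced to recording the pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A named soft-connection endpoint (hypothetical stand-in for mkSend/mkRecv)."""
    name: str
    optional: bool = False


def match_connections(sends, recvs):
    """Pair each send with the single receive of the same name.

    Returns (pairs, errors): the connections to instantiate and the
    error messages that elaboration would report.
    """
    remaining = list(recvs)
    pairs, errors = [], []
    for s in sends:
        matches = [r for r in remaining if r.name == s.name]
        if not matches:
            # An optional send may stay unconnected without an error.
            if not s.optional:
                errors.append(f"Unmatched Send: {s.name}")
        elif len(matches) == 1:
            pairs.append((s, matches[0]))
        else:
            errors.append(f"Multiple Receives connected to: {s.name}")
        # Remove every receive that was considered for this send.
        remaining = [r for r in remaining if r.name != s.name]
    for r in remaining:
        errors.append(f"Unmatched Receive: {r.name}")
    return pairs, errors
```

Collecting errors instead of stopping at the first one is a choice made for the sketch; it reports every dangling endpoint in one pass.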
9 “Multicast” Connections
A one-to-many Send (broadcast)
  [Figure: one mkBcast “start_prog” (broadcast()) and three mkRecv “start_prog” (recv()); the receivers are standard receive modules]
A many-to-one Recv (listener)
  [Figure: three mkSend “debug_out” (send()) and one mkListener “debug_out” (listen()) that sees ID + data; the senders are standard send modules]
(now multiple recvs are no longer an error)
10 Building 2-Way Communication
More complex abstractions from primitives
Client/Server
  Pair of normal send and recv
  [Figure: mkClient “mem_load” (makeReq(), getResp()) and mkServer (getReq(), makeResp())]
“Multicast” Client/Server
  [Figure: several standard Client modules, mkClient “mem_load” (makeReq(), getResp()), and one mkServer (getReq(), makeResp()) that sees ID + data]
  [Figure: one mkClient “stats_count” (broadcastReq(), getResp()) that sees ID + data, and several standard Server modules, mkServer “stats_count” (getReq(), makeResp())]
11 Controller Services: Revisited
Which should get which type of soft connection?
Commands/Params: Receive from software, send to many modules
  One-to-Many Broadcast
  Can make a nice abstraction for local commands, params
Events/Stats: Receive from software, send to many modules, aggregate responses
  Many-to-one Client
Assertions/Debug: Receive from many modules, send to software
  Many-to-one Receive
12 Case Study: span
span(c) = number of instantiation boundaries crossed between sender and receiver
  Roughly, the pain of changing a communication path
In HAsim, 118/217 connections are to/from Controller
  We start to worry about the massive fan-in
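One way to make the span metric concrete is to compute it from instance paths. This is a minimal Python sketch under an assumption the slides do not state: each endpoint is identified by the slash-separated path of the module instance that contains it, and the span is the number of hierarchy edges on the tree path between the two instances. The function name `span` follows the slide; the example hierarchy in the usage note is hypothetical.

```python
def span(sender_path, receiver_path):
    """Count instantiation boundaries between two module instances.

    Each path names the instance containing an endpoint, for example
    "Simulator/Core/FrontEnd" (a hypothetical hierarchy). The result is
    the number of levels walked up from the sender to the deepest
    common ancestor plus the levels walked down to the receiver.
    """
    a = sender_path.split("/")
    b = receiver_path.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)
```

With an assumed nesting of `Simulator/Core/FrontEnd/Fetch/BranchPred` for the sender and `Simulator/Controller` for the receiver, the connection leaves four instances on the way up and enters one on the way down.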
13 Logical Topology vs Physical Topology
We described the “logical” communication topology
  Could be implemented with a different physical topology
  Could use Rings/Trees/Grids to offset massive fan-in
Implemented: Rings and Trees
  So far no improvement over physical point-to-point
[Figure: send and recv endpoints for “foo” attached to stations. One station has an address for “foo” (#5); one station has to know #5 means “foo”; another station doesn’t have #5]
Station routing tables made at elaboration
Connection interface does not change!
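The station idea can be illustrated with a small ring model. This is a minimal Python sketch, not the HAsim implementation: it assumes a unidirectional ring, assigns each connection name a numeric address at "elaboration" time, and gives every station a table of the addresses it delivers locally. `build_ring` and `route` are hypothetical names.

```python
def build_ring(recvs_per_station):
    """Build routing state for a ring of stations.

    recvs_per_station lists, in ring order, the connection names whose
    receive endpoint sits at each station. Returns (address_of, tables):
    address_of maps a name to its numeric address (what a sending
    station needs), and tables[i] maps the addresses station i delivers
    locally back to their names.
    """
    names = sorted({n for recvs in recvs_per_station for n in recvs})
    address_of = {name: i for i, name in enumerate(names)}
    tables = [{address_of[n]: n for n in recvs} for recvs in recvs_per_station]
    return address_of, tables


def route(tables, src, addr):
    """Forward a message around the ring until a station owns addr.

    Returns (destination station index, number of hops taken).
    """
    n = len(tables)
    for hops in range(n):
        i = (src + hops) % n
        if addr in tables[i]:
            return i, hops
        # Otherwise this station does not have the address: pass it on.
    raise KeyError(addr)
```

Endpoints only ever see the connection name; the addresses and tables are derived data, which is why the connection interface can stay the same when the physical topology changes.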
14 Take Aways
FPGA-as-accelerator model is rapidly maturing
  The FPGA-as-raw-fabric model is not ideal
  Something like HAsim’s Controller helps
  Coordinates interaction between FPGA/SW
Need different hardware-design techniques for FPGA accelerators
  More flexibility needed: reconfigurations common
Soft Connections bring flexibility to interfaces
  Make it easier to have a fluid set of modules which interact with the controller
Logical topology != Physical topology
  Designer needs help with both
Thank You!
Extra Slides
17 The Controller’s Services
Commands: Receive “start” or “pause” from software
  Controller distributes to all interested hardware modules
Params: Receive dynamic command line values
  Controller distributes to interested hardware modules
Events: Software can enable, disable
  Controller aggregates, sends to software
Stats: Software requests dump periodically
  Controller passes on request, aggregates responses
Assertions: Controller passes failures on to software
Debug: Controller passes info on to software
18 Making “Gateware” more like Software
Ultimately we want many distributed “services” throughout the FPGA talking to software
They communicate at different rates
  [Figure: services by rate (Common, Variable, Rare); labels: Events, Loads/Stores, Debugging Messages, Stats, Assertion failures]
It makes sense for the variable/rare services to share the same interconnect on the FPGA
Flexibility of communication == Easier development
Today: Development plan and issues
19 Review: Soft Connections
Point-to-Point
  [Figure: mkSend “fet2dec” (send()) and mkRecv “fet2dec” (recv())]
Client/Server
  [Figure: mkClient “funcp_fet” (makeReq(), getResp()) and mkServer (getReq(), makeResp())]
“Smart” Synthesis Boundaries
  [Figure: modules A and B (mkB); a send “fet2dec” (send()) appears on interface outg with try_xfer() and xfer_ack() methods]
  Compiler Log: “Dangling Send fet2dec [3] {Inst}”
  addDanglingSend(mkB.outg[3], “fet2dec”, “Inst”);
20 Proposed Primitive: One-To-Many
A “Broadcast” Send
[Figure: one mkBcast “start_prog” (broadcast()) and four mkRecv “start_prog” (recv()); the receivers are standard receive modules]
  rule when (all r == 1): all r <= 0; q.deq()
       when (r[0] == 0): try_xfer(q.first()); if (ack) r[0] <= 1
       when (r[1] == 0): try_xfer(q.first()); if (ack) r[1] <= 1
       when (r[2] == 0): try_xfer(q.first()); if (ack) r[2] <= 1
       when (r[3] == 0): try_xfer(q.first()); if (ack) r[3] <= 1
All rules and registers inserted during static elaboration (don’t know how many receivers during instantiation)
Tougher alternative: many FIFOs
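The rules above can be mimicked with a cycle-level software model. This is a minimal Python sketch rather than generated hardware: the `Broadcast` class and its `step` method are hypothetical, the per-receiver `sent` flags play the role of the `r` registers, and the dequeue rule is assumed to take a cycle of its own.

```python
from collections import deque


class Broadcast:
    """One queue, one 'already sent' flag per receiver."""

    def __init__(self, num_receivers):
        self.q = deque()
        self.sent = [False] * num_receivers

    def broadcast(self, value):
        self.q.append(value)

    def step(self, ready):
        """Advance one cycle.

        ready[i] says whether receiver i acknowledges a transfer this
        cycle. Returns the (receiver index, value) pairs delivered.
        """
        delivered = []
        if not self.q:
            return delivered
        if all(self.sent):
            # Every receiver has the head element: clear flags, dequeue.
            self.sent = [False] * len(self.sent)
            self.q.popleft()
            return delivered
        head = self.q[0]
        for i, ok in enumerate(ready):
            # Offer the head only to receivers that have not taken it yet.
            if not self.sent[i] and ok:
                self.sent[i] = True
                delivered.append((i, head))
        return delivered
```

A slow receiver stalls the queue for everyone, since the head is dequeued only after all flags are set; the "many FIFOs" alternative mentioned on the slide would avoid that at the cost of more buffering.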
21 Proposed Primitive: Many-to-One
A “listener” receive
[Figure: four mkSend “debug_out” (send()) and one mkListener “debug_out” (listen()) that sees ID + data; the senders are standard send modules]
  rule when (q0.notEmpty): try_xfer(q0.first(), 0); if (ack) q0.deq()
  rule when (q1.notEmpty): try_xfer(q1.first(), 1); if (ack) q1.deq()
  rule when (q2.notEmpty): try_xfer(q2.first(), 2); if (ack) q2.deq()
  rule when (q3.notEmpty): try_xfer(q3.first(), 3); if (ack) q3.deq()
All rules inserted during static elaboration (don’t know IDs during instantiation)
Is a fairness guarantee needed?
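If a fairness guarantee does turn out to be needed, round-robin arbitration is one simple candidate. This is a minimal Python sketch of that option, not something the slides commit to: the `Listener` class and the `next_id` pointer are hypothetical, and each sender's queue is modeled as a software deque.

```python
from collections import deque


class Listener:
    """Many-to-one receive that tags each value with its sender ID."""

    def __init__(self, num_senders):
        self.queues = [deque() for _ in range(num_senders)]
        self.next_id = 0  # sender to consider first on the next listen()

    def send(self, sender_id, value):
        self.queues[sender_id].append(value)

    def listen(self):
        """Return (sender_id, value) from the next non-empty queue in
        round-robin order, or None when every queue is empty."""
        n = len(self.queues)
        for offset in range(n):
            i = (self.next_id + offset) % n
            if self.queues[i]:
                # Start after the served sender next time, so a busy
                # sender cannot starve the others.
                self.next_id = (i + 1) % n
                return i, self.queues[i].popleft()
        return None
```

Without the rotating pointer, a fixed priority order would let sender 0 monopolize the listener whenever its queue stays non-empty.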
22 Proposed Primitive: Hub Servers
Hub Server, Distributed Clients
  1 Many-to-One Connection
  Reverse is many One-to-One connections
  Remove the ID and send it to the appropriate destination
[Figure: several standard Client modules, mkClient “mem_load” (makeReq(), getResp()), and one mkHubServer (getReq(), makeResp()) that sees ID + data]
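The hub server's two directions can be modeled in a few lines. This is a minimal Python sketch under stated assumptions: `HubServer` and its snake_case methods are hypothetical stand-ins for the makeReq/getReq/makeResp/getResp methods named on the slide, requests share one ID-tagged queue, and each client has its own response queue.

```python
from collections import deque


class HubServer:
    """One many-to-one request path, many one-to-one response paths."""

    def __init__(self, num_clients):
        self.requests = deque()  # (client_id, data), shared by all clients
        self.responses = [deque() for _ in range(num_clients)]

    # Client side
    def make_req(self, client_id, data):
        self.requests.append((client_id, data))

    def get_resp(self, client_id):
        q = self.responses[client_id]
        return q.popleft() if q else None

    # Server side
    def get_req(self):
        """Next request as (client_id, data), or None if none is pending."""
        return self.requests.popleft() if self.requests else None

    def make_resp(self, client_id, data):
        # The ID picks the destination and is dropped: the client
        # receives only the data.
        self.responses[client_id].append(data)
```

The server logic stays oblivious to how many clients exist; it only echoes back the ID it was handed with each request.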
23 Proposed Primitive: Hub Client
Hub Client, Distributed Servers
  1 One-to-Many Connection
  1 Many-to-One Connection
[Figure: one mkHubClient “stats_count” (broadcastReq(), getResp()) that sees ID + data, and several standard Server modules, mkServer “stats_count” (getReq(), makeResp())]
Ability to send to individuals as well?