ESL: System Level Design
Bluespec ESEPro: ESL Synthesis Extensions for SystemC
Rishiyur S. Nikhil, CTO, Bluespec, Inc.
Lecture 16, delivered by Arvind, March 16, 2007
(Only a subset of Nikhil’s slides is included)

The central ESL design problem
- Early models: available early; very fast simulation; not HW-accurate (timing, area)
- HW implementation: not available early; slower simulation; HW-accurate
- Early software and the first HW model(s) meet at the HW/SW interface (e.g., register read/write)
- From model to implementation: explore architectures (for speed, area, power), then refine
- Required: a uniform computational model (single paradigm), at a higher level than RTL, even for implementation

Another ESL design problem
- Reuse (models and implementations) across SoC 1, SoC 2, ..., SoC n
- Required: powerful parameterization and powerful module interface semantics

Bluespec enables ESL
- Rules and Rule-based Interfaces provide a uniform computational model suitable both for high-level system modeling and for HW implementation
- Atomic Transaction semantics are very powerful for expressing complex concurrency
  - Formal and informal reasoning about correctness
  - Automatic synthesis of complex control logic to manage concurrency
- Rules map naturally to HW (“state machines”); they are synthesizable; no mental shifting of gears during refinement
- Can be mixed with regular SystemC, TLM, and C++, for mixed-model and whole-system modeling
- Enables Design-by-Refinement and Design-by-Contract
- BSV: Bluespec SystemVerilog
- ESEPro: Bluespec’s ESL Synthesis Extensions to SystemC

Rule Concurrent Semantics
- “Untimed” semantics:
    Forever: execute any enabled rule
- “Timed”, or “Clock Scheduled”, semantics (Bluespec scheduling technology):
    In each clock: execute a subset of enabled rules (in parallel, but consistent with untimed semantics)
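
The two semantics can be illustrated with a small plain C++ sketch. It is not the Bluespec scheduler: the `Rule` struct, the `touches` sets, and the greedy priority order are assumptions made for illustration. Rules chosen in one clock touch disjoint state, so running them one after another gives the same result as running them in parallel, which keeps the clocked step consistent with the untimed one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// A rule: a guard, an atomic body, and the names of the state elements it
// reads or writes. The "touches" set is an assumption of this sketch; it
// stands in for the conflict analysis a real scheduler would perform.
struct Rule {
    std::string name;
    std::function<bool()> guard;
    std::function<void()> body;
    std::set<std::string> touches;
};

// Untimed semantics: fire one enabled rule (here, the first one found).
// Returns the name of the rule that fired, or "" if none is enabled.
std::string step_untimed(std::vector<Rule>& rules) {
    for (Rule& r : rules) {
        assert(r.guard && r.body);
        if (r.guard()) { r.body(); return r.name; }
    }
    return "";
}

// Clocked semantics: evaluate all guards first, greedily pick enabled rules
// whose state does not overlap with rules already picked, then fire them.
std::vector<std::string> step_clock(std::vector<Rule>& rules) {
    std::set<std::string> claimed;
    std::vector<Rule*> chosen;
    for (Rule& r : rules) {
        assert(r.guard && r.body);
        if (!r.guard()) continue;
        bool conflict = false;
        for (const std::string& s : r.touches)
            if (claimed.count(s) != 0) conflict = true;
        if (conflict) continue;  // loses arbitration in this clock
        claimed.insert(r.touches.begin(), r.touches.end());
        chosen.push_back(&r);
    }
    std::vector<std::string> fired;
    for (Rule* r : chosen) { r->body(); fired.push_back(r->name); }
    return fired;
}

// Three rules over two registers; sum_to_a conflicts with both increments.
int demo_clock() {
    int a = 0, b = 0;
    std::vector<Rule> rules = {
        {"inc_a",    [&] { return a < 2; }, [&] { ++a; },    {"a"}},
        {"inc_b",    [&] { return b < 2; }, [&] { ++b; },    {"b"}},
        {"sum_to_a", [&] { return true; },  [&] { a += b; }, {"a", "b"}},
    };
    std::size_t n1 = step_clock(rules).size();  // inc_a and inc_b fire together
    step_clock(rules);                          // same again: a == 2, b == 2
    std::size_t n3 = step_clock(rules).size();  // only sum_to_a is enabled now
    assert(n1 == 2 && n3 == 1);
    return a;                                   // 2 + 2
}
```

Calling `demo_clock()` runs three clocks. In the first two, the increments fire side by side while `sum_to_a` is held back; in the third, `sum_to_a` fires alone.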

Bluespec Tools Architecture
[Figure: tool flow. Labels recovered from the diagram:]
- Inputs: BSV (SystemVerilog*) and ESEPro (SystemC*)
- Common Synthesis Engine (ESEComp and BSC), with stage labels: Parsing, Static Checking, Scheduling, Optimization, Power Optimization, RTL Generation
- Output: RTL, which goes to synthesis
- Bluesim: rapid, source-level simulation and interactive debug of BSV; cycle-accurate w/ Verilog sim
- Blueview: debug
- ESE and ESEPro: gcc with systemc.h and esl.h produces a .exe for untimed and timed simulation

Outline
- Limitations of SystemC in modeling SoCs
- ESEPro’s Rule-based Interfaces
- Model-to-implementation refinement with SystemC and ESEPro modules
- Seamless interoperation of SystemC TLM and ESEPro modules
- ESEPro-to-RTL synthesis
- An example

Example illustrating why modeling hardware-accurate complex concurrency is difficult in standard SystemC (threads and events)

A 2x2 switch, with stats
Spec:
- Packets arrive on two input FIFOs, and must be switched to two output FIFOs
- Certain “interesting packets” must be counted
[Figure: 2x2 switch, with blocks labeled “Determine Queue”, “+1”, and “Count certain packets”]

The first version of the SystemC code is easy

    void thread1 () {
      while (true) {
        Pkt x = in0->first(); in0->deq();
        if (x.dest == 0) out0->enq (x);
        else             out1->enq (x);
        if (count(x)) c++;
      }
    }

    void thread2 () {
      while (true) {
        Pkt x = in1->first(); in1->deq();
        if (x.dest == 0) out0->enq (x);
        else             out1->enq (x);
        if (count(x)) c++;
      }
    }

- first() and deq() block if the input FIFO is empty; enq() blocks if the output FIFO is full
- It all works fine because of “cooperative parallelism”

Cooperative parallelism model
- The two increments to the counter do not need to be protected with “locks” because SystemC defines parallelism as cooperative, i.e.:
  - Threads only switch at “wait()” statements
  - Threads do not interleave
- But real hardware has real parallelism! Hence a gap between model and implementation
- Cooperative multithreading also makes it hard to simulate models in parallel (e.g., on a modern multi-core or SMP machine)
- This code would have problems with preemptive parallelism

There could be some subtle mistakes

    void thread1 () {
      while (true) {
        int tmp = c;
        Pkt x = in0->first(); in0->deq();
        if (x.dest == 0) out0->enq (x);
        else             out1->enq (x);
        if (count(x)) c = tmp + 1;
      }
    }

    void thread2 () {
      while (true) {
        int tmp = c;
        Pkt x = in1->first(); in1->deq();
        if (x.dest == 0) out0->enq (x);
        else             out1->enq (x);
        if (count(x)) c = tmp + 1;
      }
    }

- If the threads interleave because first(), deq(), or enq() blocks, c will be updated incorrectly (non-atomically)
- Cooperative parallelism is not the same as atomicity
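
The lost update can be reproduced deterministically in plain C++ by splitting one loop iteration at the point where a blocking FIFO call could suspend the thread. This is a sketch, not SystemC: `CountingThread`, `before_block`, and `after_block` are hypothetical names, and the interleaving is scripted by hand rather than produced by a scheduler.

```cpp
#include <cassert>

// One loop iteration of a counting thread, split at the point where a
// blocking call (first/deq/enq) could hand control to the other thread.
struct CountingThread {
    int tmp = 0;
    void before_block(const int& c) { tmp = c; }   // int tmp = c;
    void after_block(int& c) { c = tmp + 1; }      // c = tmp + 1;
};

// Both threads read c, both block, then both write: one increment is lost.
int run_interleaved() {
    int c = 0;
    CountingThread t1, t2;
    t1.before_block(c);
    t2.before_block(c);
    assert(t1.tmp == t2.tmp);  // both saw the same stale value
    t1.after_block(c);
    t2.after_block(c);
    return c;
}

// Each iteration runs to completion before the other starts (atomic).
int run_atomic() {
    int c = 0;
    CountingThread t1, t2;
    t1.before_block(c);
    t1.after_block(c);
    t2.before_block(c);
    t2.after_block(c);
    return c;
}
```

Two packets were counted in both runs, yet `run_interleaved()` returns 1 while `run_atomic()` returns 2.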

Hardware has additional “resource contention” constraints
- Each output FIFO can be enq’d by only one process at a time (in the same clock)
- Arbitration is needed if both processes want to enq() on the same FIFO simultaneously
- SystemC’s cooperative multitasking makes it easy to ignore this, but much harder to model it accurately
- Accurately modeling this makes the code messier

Hardware has additional “resource contention” constraints
- The counter can be incremented by only one process at a time
- Arbitration is needed if both want to increment
- SystemC’s cooperative multitasking makes it easy to ignore this, but much harder to model it accurately
- Accurately modeling this makes the code messier

Hardware has additional “resource contention” constraints
- No intermediate buffering: a process should transfer a packet only when both its input FIFO and its output FIFO are ready, and it has priority on its output FIFO and the counter
- SystemC’s blocking methods make it easy to ignore this, but much harder to model it accurately
- Accurately modeling this makes the code messier

Hardware typically has additional “resource contention” constraints
- These constraints must be modeled in order to model HW performance accurately (latencies, bandwidth)
- In SystemC, this exposes full/empty tests on FIFOs, and adds locks/semaphores, polling of locks/semaphores, …
- The code becomes a mess
- If we want synthesizability, the code increasingly resembles RTL written in SystemC notation

Limitations of SystemC/C++
- Accurate SoC modeling involves lots of concurrency and dynamic, fine-grain resource sharing
  - Because these are the characteristics of HW
  - Most blocks in an SoC are HW; a few blocks (e.g., processor, DSP) involve software (typically C, C++)
- “Threads and Events” (SystemC’s concurrency model) are far too low-level for this
  - They require tedious, explicit management of concurrent access to shared state
  - Weak semantics for module composition
  - They do not scale to large systems
  - They are the source of the majority of bugs in RTL and SystemC (race conditions, inconsistent state, protocol errors, …)
- By contrast, advanced SW systems (e.g., Operating Systems, Database Systems, Transaction Processing Systems) use Atomic Transactions to manage complex concurrency

Other issues with SystemC/C++
- No early feedback on HW implementability during modeling, because of the distance of SystemC semantics from HW
  - Threads, stacks, dynamic allocation, events, locks, global variables, undisciplined instantaneous access to global/remote data
  - Undisciplined access to shared resources
- No credible synthesis from a sequential, thread-based model of computation (except for loop-and-array computational kernels)
  - The design has to be re-implemented in RTL

Literature on problems with threads (and the advantages of atomicity)
- The Problem with Threads, Edward A. Lee, IEEE Computer, 39:5, pp. 33-42, May 2006
- Why threads are a bad idea (for most purposes), John K. Ousterhout, Invited Talk, USENIX Technical Conference, January 1996
- Composable memory transactions, Tim Harris, Simon Marlow, Simon Peyton Jones and Maurice Herlihy, in ACM Conf. on Principles and Practice of Parallel Programming (PPoPP)
- Atomic Transactions, Nancy A. Lynch, Michael Merritt, William E. Weihl and Alan Fekete, Morgan Kaufmann, San Mateo, CA, 1994, 476 pp.
- … and more …

2x2 switch: the meat of the ESEPro code

    ESL_RULE (r0);
      Pkt x = in0->first(); in0->deq();
      if (x.dest == 0) out0->enq(x);
      else             out1->enq(x);
      if (count(x)) c++;
    }

    ESL_RULE (r1);
      Pkt x = in1->first(); in1->deq();
      if (x.dest == 0) out0->enq(x);
      else             out1->enq(x);
      if (count(x)) c++;
    }

- Atomicity of rules captures all the “resource contention” constraints of hardware implementation
- Further, this code is synthesizable to RTL as written
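
To make concrete what rule atomicity implies for this switch, here is a plain C++ emulation with the implicit conditions and the arbitration written out by hand. It is a sketch under stated assumptions, not ESEPro code: the `Fifo` and `Switch` types, the FIFO depth of 4, the `interesting` flag standing in for `count(x)`, and the fixed priority of r0 over r1 are all choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Pkt { int dest; bool interesting; };  // "interesting" stands in for count(x)

// Bounded FIFO with explicit readiness tests (the rules' implicit conditions).
class Fifo {
public:
    explicit Fifo(std::size_t cap) : cap_(cap) {}
    bool can_deq() const { return !q_.empty(); }
    bool can_enq() const { return q_.size() < cap_; }
    const Pkt& first() const { assert(can_deq()); return q_.front(); }
    void deq() { assert(can_deq()); q_.pop_front(); }
    void enq(const Pkt& p) { assert(can_enq()); q_.push_back(p); }
    std::size_t size() const { return q_.size(); }
private:
    std::size_t cap_;
    std::deque<Pkt> q_;
};

struct Switch {
    Fifo in0{4}, in1{4}, out0{4}, out1{4};
    int c = 0;

    Fifo& out_for(const Pkt& p) { return p.dest == 0 ? out0 : out1; }

    // A rule may fire only if its whole body can complete: no partial transfer.
    bool can_fire(Fifo& in) {
        return in.can_deq() && out_for(in.first()).can_enq();
    }

    // Rule body: dequeue, route, count. Runs as one indivisible step.
    void fire(Fifo& in) {
        Pkt x = in.first();
        in.deq();
        out_for(x).enq(x);
        if (x.interesting) c++;
    }

    // One clock: r0 has priority. r1 fires in the same clock only if it does
    // not contend with r0 for an output FIFO or for the counter.
    int clock() {
        bool f0 = can_fire(in0);
        bool f1 = can_fire(in1);
        if (f0 && f1) {
            const Pkt& a = in0.first();
            const Pkt& b = in1.first();
            bool same_out = (a.dest == 0) == (b.dest == 0);
            bool both_count = a.interesting && b.interesting;
            if (same_out || both_count) f1 = false;
        }
        if (f0) fire(in0);
        if (f1) fire(in1);
        return (f0 ? 1 : 0) + (f1 ? 1 : 0);  // number of rules fired
    }
};

// Two interesting packets for the same output: they are serialized.
int demo_switch() {
    Switch s;
    s.in0.enq({0, true});
    s.in1.enq({0, true});
    int first = s.clock();   // only r0 fires
    int second = s.clock();  // now r1 fires
    assert(first == 1 && second == 1);
    assert(s.out0.size() == 2);
    return s.c;
}
```

Everything inside `clock()` and `can_fire()` is the control logic that the rule-based source leaves unwritten; in this emulation it has to be maintained by hand whenever the spec changes.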

Managing change
- Specs always change. Imagine:
  - Some packets are multicast (go to both FIFOs)
  - Some packets are dropped (go to no FIFO)
  - More complex arbitration
    - FIFO collision: in favor of r1
    - Counter collision: in favor of r2
    - Fair scheduling
  - Several counters for several kinds of interesting packets
  - Non-exclusive counters (e.g., TCP IP)
  - M input FIFOs, N output FIFOs (parameterized)
- What if these changes are required 6 months after original coding?
- With Rules these are easy, because the source code remains uncluttered by all the complex control and mux logic
- Atomicity ensures correctness

Interfaces: raising the level of abstraction (while preserving Rule semantics)
- Interfaces can also contain other interfaces
  - We use this to build a hierarchy of interfaces: Get/Put, Client/Server, …
  - These capture common interface design patterns
  - There is no HW overhead to such abstraction
- Connections between standard interfaces can be packaged (and used, and reused)
  - “Connectable” interfaces
- All these are synthesizable

Get and Put Interfaces
- Provide simple methods for getting data from a module or putting data into it
- Easy to connect together

    template <typename T>
    ESL_INTERFACE ( Get ) {
      ESL_METHOD_ACTIONVALUE_INTERFACE ( get, T );
    }

    template <typename T>
    ESL_INTERFACE ( Put ) {
      ESL_METHOD_ACTION_INTERFACE ( put, T x );
    }

Get and Put Interfaces
- Get and Put are just interface specifications
- Many different kinds of modules can provide Get and Put interfaces
- E.g., a FIFO’s enq() can be viewed as a put() operation, and a FIFO’s first()/deq() can be viewed as a get() operation

Interface transformers/transactors
- Because of the abstractions of interfaces and modules, it is easy to write interface transformers/transactors
- This example is from the ESEPro library, transforming a FIFO interface into a Get interface

    ESL_MODULE_TEMPLATE ( fifoToGet, Get, T ) {
      FIFO<T> *f;

      ESL_METHOD_ACTIONVALUE (get, true, T) {
        T temp = f->first();
        f->deq();
        return temp;
      }

      ESL_CTOR ( fifoToGet, FIFO<T> *ff ) : f ( ff ) {
        ESL_END_CTOR;
      }
    };

Interface transformers/transactors
- Another example from the ESEPro library, transforming a FIFO interface into a Put interface:

    ESL_MODULE_TEMPLATE ( fifoToPut, Put, T ) {
      FIFO<T> *f;

      ESL_METHOD_ACTION (put, true, T x) {
        f->enq (x);
      }

      ESL_CTOR ( fifoToPut, FIFO<T> *ff ) : f ( ff ) {
        ESL_END_CTOR;
      }
    };
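
The same transformer idea can be written in plain C++ without the ESEPro macros. This is an illustrative analogue, not the library's implementation: the abstract classes, the `can_get`/`can_put` methods (explicit stand-ins for the implicit conditions of rule-based methods), and the bounded `Fifo` are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Get and Put as pure interface specifications.
template <typename T>
struct Get {
    virtual ~Get() = default;
    virtual bool can_get() const = 0;  // explicit form of the implicit condition
    virtual T get() = 0;
};

template <typename T>
struct Put {
    virtual ~Put() = default;
    virtual bool can_put() const = 0;
    virtual void put(const T& x) = 0;
};

// A bounded FIFO with the first/deq/enq methods used in the slides.
template <typename T>
class Fifo {
public:
    explicit Fifo(std::size_t cap) : cap_(cap) {}
    bool not_empty() const { return !q_.empty(); }
    bool not_full() const { return q_.size() < cap_; }
    const T& first() const { assert(not_empty()); return q_.front(); }
    void deq() { assert(not_empty()); q_.pop_front(); }
    void enq(const T& x) { assert(not_full()); q_.push_back(x); }
private:
    std::size_t cap_;
    std::deque<T> q_;
};

// Transformer: view a FIFO's first()/deq() as a get().
template <typename T>
class FifoToGet : public Get<T> {
public:
    explicit FifoToGet(Fifo<T>* f) : f_(f) {}
    bool can_get() const override { return f_->not_empty(); }
    T get() override { T temp = f_->first(); f_->deq(); return temp; }
private:
    Fifo<T>* f_;
};

// Transformer: view a FIFO's enq() as a put().
template <typename T>
class FifoToPut : public Put<T> {
public:
    explicit FifoToPut(Fifo<T>* f) : f_(f) {}
    bool can_put() const override { return f_->not_full(); }
    void put(const T& x) override { f_->enq(x); }
private:
    Fifo<T>* f_;
};

// Put two values in through one adapter and read them back through the other.
int demo_fifo_adapters() {
    Fifo<int> f(2);
    FifoToPut<int> in(&f);
    FifoToGet<int> out(&f);
    assert(in.can_put() && !out.can_get());
    in.put(7);
    in.put(8);
    assert(!in.can_put());          // capacity 2 reached
    int first = out.get();          // values come out in FIFO order
    return first * 10 + out.get();
}
```

The adapters hold only a pointer to the FIFO and add no state of their own, which mirrors the slides' point that such abstraction costs nothing in hardware.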

Nested interfaces
- An interface can contain not only methods, but also nested interfaces

    template ESL_INTERFACE ( Server ) {
      ESL_SUBINTERFACE ( request, Put );
      ESL_SUBINTERFACE ( response, Get );
    }

Sub-interfaces: using transformers
- The ESEPro library provides functions to convert FIFOs to Get/Put

    typedef Server CacheIfc;

    ESL_MODULE ( mkCache, CacheIfc ) {
      FIFO *p2c;
      FIFO *c2p;

      … rules expressing cache logic …

      ESL_CTOR ( mkCache, … ) {
        request  = new fifoToPut ("req", p2c);
        response = new fifoToGet ("rsp", c2p);
      }

- Absolutely no difference in the HW!

Client/Server interfaces
- Get/Put pairs are very common, and duals of each other, so the library defines Client/Server interface types for this purpose

    ESL_INTERFACE ( Client ) {
      ESL_SUBINTERFACE ( request, Get );
      ESL_SUBINTERFACE ( response, Put );
    };

    ESL_INTERFACE ( Server ) {
      ESL_SUBINTERFACE ( request, Put );
      ESL_SUBINTERFACE ( response, Get );
    };

[Figure: a client and a server joined by two get/put links, labeled req_t and resp_t]

Client/Server interfaces

    ESL_INTERFACE ( CacheIfc ) {
      ESL_SUBINTERFACE ( ipc, Server );
      ESL_SUBINTERFACE ( icm, Client );
    };

    ESL_MODULE ( mkCache, CacheIfc ) {
      // from / to processor
      FIFO *p2c;
      FIFO *c2p;
      // to / from memory
      FIFO *c2m;
      FIFO *m2c;

      … rules expressing cache logic …

      ESL_CTOR ( mkCache ) {
        …
        ipc = fifosToServer (p2c, c2p);
        icm = fifosToClient (c2m, m2c);
        ESL_END_CTOR;
      }

[Figure: mkProcessor (client) linked to mkCache (server and client), linked to mkMem (server); each link is a get/put pair]

Connecting Get and Put
- A module m1 providing a Get interface can be connected to a module m2 providing a Put interface with a simple rule

    ESL_MODULE ( mkTop, … ) {
      Get *m1;
      Put *m2;

      ESL_RULE ( connect, true ) {
        x = m1->get();
        m2->put (x);    // note implicit conditions
      }

“Connectable” interface pairs
- There are many pairs of types that are duals of each other
  - Get/Put, Client/Server, YourTypeT1/YourTypeT2, …
- The ESEPro library defines an overloaded, templated module mkConnection which encapsulates the connections between such duals
- The ESEPro library predefines the implementation of mkConnection for Get/Put, Client/Server, etc.
- Because overloading in C++ is extensible, you can overload mkConnection to work on your own interface types T1 and T2

mkConnection
- Using these interface facilities, assembling systems becomes very easy

    ESL_MODULE ( mkTopLevel, … ) {
      // instantiate subsystems
      Client   *p;
      CacheIfc *c;
      Server   *m;

      // instantiate connections
      new mkConnection< Client<…>, Server<…> > ("p2c", p, c->ipc);
      new mkConnection< Client<…>, Server<…> > ("c2m", c->icm, m);
    }

[Figure: mkProcessor (client) linked to mkCache’s server (ipc); mkCache’s client (icm) linked to mkMem (server); each link is a get/put pair]
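
The way an overloaded connection function composes Get/Put links into a Client/Server link can be sketched in plain C++. This is not the ESEPro `mkConnection`: `mk_connection`, the callable-based `Get`/`Put` structs, the `Rule` alias, and the toy memory that answers `addr + 10` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

// Get/Put as bundles of callables; "ready" models the implicit condition.
template <typename T>
struct Get { std::function<bool()> ready; std::function<T()> get; };
template <typename T>
struct Put { std::function<bool()> ready; std::function<void(const T&)> put; };

// Client and Server are duals built from Get/Put sub-interfaces.
template <typename Req, typename Resp>
struct Client { Get<Req> request; Put<Resp> response; };
template <typename Req, typename Resp>
struct Server { Put<Req> request; Get<Resp> response; };

// Queue adapters (unbounded queues, so put is always ready in this sketch).
template <typename T>
Get<T> fifo_to_get(std::deque<T>& q) {
    return { [&q] { return !q.empty(); },
             [&q] { T x = q.front(); q.pop_front(); return x; } };
}
template <typename T>
Put<T> fifo_to_put(std::deque<T>& q) {
    return { [] { return true; },
             [&q](const T& x) { q.push_back(x); } };
}

using Rule = std::function<bool()>;  // returns true if the rule fired

// Get -> Put: one rule that fires only when both sides are ready.
template <typename T>
Rule mk_connection(Get<T> g, Put<T> p) {
    return [g, p] {
        if (!g.ready() || !p.ready()) return false;
        p.put(g.get());
        return true;
    };
}

// Client <-> Server: overload that reuses the Get -> Put connection twice.
template <typename Req, typename Resp>
std::vector<Rule> mk_connection(Client<Req, Resp> c, Server<Req, Resp> s) {
    return { mk_connection(c.request, s.request),
             mk_connection(s.response, c.response) };
}

// A processor-side client talks to a toy memory server.
int demo_client_server() {
    std::deque<int> p_req, p_resp, m_req, m_resp;
    p_req.push_back(2);  // one pending request for address 2
    Client<int, int> proc{ fifo_to_get(p_req), fifo_to_put(p_resp) };
    Server<int, int> mem{ fifo_to_put(m_req), fifo_to_get(m_resp) };

    std::vector<Rule> rules = mk_connection(proc, mem);
    rules.push_back([&] {  // memory behavior: respond with addr + 10
        if (m_req.empty()) return false;
        m_resp.push_back(m_req.front() + 10);
        m_req.pop_front();
        return true;
    });

    bool progress = true;  // run until no rule can fire
    while (progress) {
        progress = false;
        for (Rule& r : rules) if (r()) progress = true;
    }
    assert(p_resp.size() == 1);
    return p_resp.front();
}
```

Because the Client/Server overload is defined in terms of the Get/Put one, adding a connection for a new pair of dual types is just one more overload.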

Rules and Levels of abstraction
- AL/FL (Algorithm/Function level)
- PV (Programmer’s View)
- PVT (PV with Timing)
- AV (Architect’s View)
- CA (Cycle-accurate)
- IM (Implementation)
[Figure: the levels are annotated with “Rules, C, C++, Matlab, …”, “Untimed Rules (no clocks)”, and “Clocked Rules (scheduled)”]

Module structure
- A system model can contain a mixture of SystemC modules and ESEPro modules
- Typical SystemC modules:
  - CPU ISS models
  - Behavioral models
  - C++ code targeted for behavioral synthesis
  - Existing SystemC IP
- Typical ESEPro modules:
  - Complex control
  - Requiring HW-realistic architectural exploration
[Figure: SoC model with blocks Processor (App/ISS), DSP (App/ISS), DMA, Mem Controller, DRAM model, Interconnect, L2 cache, Codec model. Legend: SystemC; Rule-based SystemC]

Simulation flow
- System model: a mix of SystemC and Rule-based SystemC modules (Processor (ISS/App), DSP (ISS/App), DMA, Mem Controller, DRAM model, Interconnect, L2 cache, Codec model)
- Source layers: ESL + ESL class defs/libs; TLM + TLM class defs/libs; core SystemC + SystemC class defs/libs
- Everything is built and run with standard SystemC tools (gcc, OSCI sim, gdb, …)

Synthesis flow
- Synthesizable subset: ESEPro Rule-based modules
  - much higher level than RTL
  - already validated in BSV
- The Bluespec synthesis tool generates RTL
- The RTL goes to Verilog sim and to RTL synthesis, physical design, and tapeout
[Figure: system model with blocks Processor (ISS), DSP (App), DMA, Mem Controller, DRAM model, Interconnect, L2 cache, Codec model. Legend: SystemC; Rule-based SystemC]

System refinement using ESEPro
- ESEPro modules can be introduced early, as they can be written at a very high level, can interface to TLM modules, and can themselves be refined
- System-level testbenches can be reused at all levels
- SystemC modules with standard TLM interfaces interoperate seamlessly with ESEPro modules
  - Behavioral models, Design IP, Verification IP, …
- More information at:
- Website also has a free distribution called “ESE”

Mixing models: all combinations
- TLM Master with TLM Slave
- ESEPro Master with TLM Slave
- TLM Master with ESEPro Slave
- ESEPro Master with ESEPro Slave
- The TLM Master and Slave are taken unmodified from the OSCI TLM distribution examples
[Figure: the four combinations, with transitions labeled “Replace Master” and “Replace Slave”. Legend: SystemC; Rule-based SystemC]

Structure of TLM modules in demo (from OSCI_TLM/examples/example_3_2)
- TLM master: write (addr, data) and read (addr, data &), through basic_initiator_port
- TLM slave: write (addr, data) and read (addr, data &), through basic_slave_base
- Between them: RSP = transport (REQ)
- transport () is a basic TLM interface call
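
The layering in this slide, with read/write convenience calls on top of a single request/response call, can be shown with a plain C++ analogue. It does not use the OSCI TLM classes: `TransportIf`, `InitiatorPort`, `MemSlave`, and the `Req`/`Rsp` layouts are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <map>

enum class Cmd { Read, Write };
struct Req { Cmd cmd; int addr; int data; };
struct Rsp { bool ok; int data; };

// The single interface call between master and slave: RSP = transport(REQ).
struct TransportIf {
    virtual ~TransportIf() = default;
    virtual Rsp transport(const Req& req) = 0;
};

// Slave side: a small memory that decodes each request.
class MemSlave : public TransportIf {
public:
    Rsp transport(const Req& req) override {
        if (req.cmd == Cmd::Write) {
            mem_[req.addr] = req.data;
            return {true, 0};
        }
        auto it = mem_.find(req.addr);
        if (it == mem_.end()) return {false, 0};  // read of unwritten address
        return {true, it->second};
    }
private:
    std::map<int, int> mem_;
};

// Master side: write(addr, data) and read(addr, data&) are thin wrappers
// that build a request and call transport().
class InitiatorPort {
public:
    explicit InitiatorPort(TransportIf* t) : target_(t) { assert(t != nullptr); }
    bool write(int addr, int data) {
        return target_->transport({Cmd::Write, addr, data}).ok;
    }
    bool read(int addr, int& data) {
        Rsp r = target_->transport({Cmd::Read, addr, 0});
        if (r.ok) data = r.data;
        return r.ok;
    }
private:
    TransportIf* target_;
};

// Write a value, then read it back through the same port.
int demo_transport() {
    MemSlave mem;
    InitiatorPort port(&mem);
    port.write(4, 14);
    int d = 0;
    bool ok = port.read(4, d);
    assert(ok);
    return d;
}
```

Since the master depends only on `TransportIf`, the slave behind it can be swapped for a different implementation without touching the master.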

TLM master and ESEPro slave
- TLM master: write (addr, data) and read (addr, data &), through basic_initiator_port, calling transport ()
- mkConnection (channel) sits between the TLM master and the ESEPro slave
- ESEPro slave: write (addr, data) and read (addr, data &), behind a Server interface

Example: ESEPro SoC model for synthesis (from ST/GreenSocs “TAC” model)
- Blocks: Router, Initiator 0, Initiator 1, Target 0, Target 1, Timer (< 1000 lines of source code)
- Annotations in the figure: “Set timer” and “Respond to timer interrupt”
- Legend: M = Master interface, S = Slave interface

SoC Model: Behavior
- Initiators repeatedly do read/write transactions to Targets, via the Router
- At startup, Initiator 1 writes to the Timer registers via the Router, starting the timer
- When the Timer’s time period expires, it generates an interrupt to Initiator 1

Synthesis example: SoC Model in ESEPro (from ST/GreenSocs “TAC” model)
- Simulation (ESEPro™): ESL + ESL class defs/libs and core SystemC class defs/libs, run with standard SystemC tools (gcc, OSCI sim, gdb, …)
- Synthesis (ESEComp™): the Bluespec synthesis tool produces RTL, which goes to Verilog sim and Magma synthesis
- The SystemC simulation and the generated RTL are cycle accurate
- This capability is unique to ESEComp

Side-by-side simulation comparison

SystemC Simulation:

    Cycle 12
    Target[1]: got request from initiator[1], addr is 1001
    Target[1]: sending response, data 1011
    Target[0]: got request from initiator[0], addr is 4
    Target[0]: sending response, data 14
    Initiator_with_intr_in[1]: forwarding req, addr = 1003
    Initiator[0]: got response addr 2, data 12
    Initiator[0]: sending req, addr =
    Cycle 13
    Timer: generating interrupt
    Initiator[1]: sending req, addr =
    Cycle 14
    Target[1]: got request from initiator[0], addr is 1005
    Target[1]: sending response, data 1015
    Target[0]: got request from initiator[1], addr is 2
    Target[0]: sending response, data 12
    Initiator_with_intr_in[1]: forwarding req, addr = 4
    Initiator[1]: got response addr 0, data 10
    Initiator[0]: got response addr 1003, data 1013
    Initiator[0]: sending req, addr =
    Cycle 15
    Initiator_with_intr_in[1] received interrupt
    Initiator[1]: sending req, addr = 1005

Verilog (Generated) Simulation:

    Cycle 12
    Initiator[0]: sending req, addr = 6
    Initiator[0]: got response addr 2, data 12
    Target[0]: got request from initiator[0], addr is 4
    Target[0]: sending response, data 14
    Target[1]: got request from initiator[1], addr is 1001
    Target[1]: sending response, data 1011
    Initiator_with_intr_in[1]: forwarding req, addr =
    Cycle 13
    Initiator[1]: sending req, addr = 4
    Timer: generating interrupt
    Cycle 14
    Initiator[0]: sending req, addr = 1007
    Initiator[0]: got response addr 1003, data 1013
    Initiator[1]: got response addr 0, data 10
    Target[0]: got request from initiator[1], addr is 2
    Target[0]: sending response, data 12
    Target[1]: got request from initiator[0], addr is 1005
    Target[1]: sending response, data 1015
    Initiator_with_intr_in[1]: forwarding req, addr =
    Cycle 15
    Initiator[1]: sending req, addr = 1005
    Initiator_with_intr_in[1] received interrupt

Cycle accurate: the order of messages within each cycle varies, but that’s OK, because the actions are parallel

SoC Router: Magma Synthesis Results
- ESEComp’s Verilog output run through Magma’s synthesis tools
- TSMC 0.18 µm libraries
- Design easily meets 400 MHz

Thanks