Communication Modeling for System-Level Design Andrew B. Kahng #,* Kambiz Samadi * CSE # and ECE * Departments,


1 Communication Modeling for System-Level Design
Andrew B. Kahng #,* (abk@cs.ucsd.edu)
Kambiz Samadi * (kambiz@vlsicad.ucsd.edu)
CSE # and ECE * Departments, UCSD
November 24, 2008

2 ISSOC-2008 2 Outline  Motivation  Communication Synthesis for Network-on-Chip  Network-on-Chip Architecture Modeling  Buffered Interconnect Model  Router Power and Area Model  Bus Architecture Modeling  Conclusions

3 ISSOC-2008 3 Motivation  Focus of design process is shifting from “computation” to “communication”  Device and interconnect performance scaling mismatches cause breakdown of traditional across-chip communication  System-level designers require accurate, yet simple models to bridge planning and implementation stages  Today’s system-level performance, power modeling suffers:  Ad hoc selection of models  Poor balance between accuracy and simplicity  Lack of model extensibility across future technology nodes  Due to design performance / power constraints, early-stage design exploration has become crucial  Our Goal: Develop accurate models that are easily usable by system-level design early in the design cycle

4 ISSOC-2008 4 Communication Synthesis for Network-on-Chip  Given  An input specification as a set of communication constraints  A library of communication components  An objective function (e.g., power, area, delay)  Find  A network-on-chip implementation as a composition of library components that  Satisfies the specification  Minimizes the cost function  Communication Synthesis Infrastructure (COSI)  Based on the Platform-Based Design methodology  Takes specification and library descriptions in XML format  Produces a variety of outputs, including a cycle accurate SystemC implementation of the optimal network-on-chip

5 Constraint-Driven Communication Synthesis
[Figure: constraint-driven communication synthesis flow. Blocks: Application; Implementation; Constraints Propagation; Point-to-Point Specification; On-Chip Communication Library; Perf. / Cost Abstractions; Synthesis; Synthesis Result.]

6 ISSOC-2008 6 Buffered Interconnect Model  Components  Repeater delay model  Separate models for intrinsic delay, output slew, input capacitance  Wire delay model  Accounts for coupling capacitance impact on wire delay  Repeater power model  Accounts for sub-threshold and gate leakages  Repeater area model  Derived from existing cell layouts (can be extrapolated)  Wire area model  Derived from wire width and spacing (can be extrapolated) Local Interconnect Device ITRS PTM Min. Inverter R d C in I off t intrinsic MASTAR Interconnect Chapter SPICE Sim. Interconnect T H ILD W min S min ε ILD TIERS(L,I,SG,G) Intermediate Global Semi-global LEF/ITF. lib Automatic Extraction Automatic Extraction Technology parameter extraction flows.  Inputs for repeater delay calculation  Delay and slew values for a set of input slew and load capacitance values (obtained from Liberty / SPICE)  Input capacitance for different repeater size (Liberty, PTM)  Inputs for wire delay calculation  Wire dimensions (ITRS/PTM, LEF, ITF)  Inter-wire spacings for global and intermediate layers (ITRS/PTM, LEF, ITF)  Inputs for power calculation  Input capacitance (Liberty, PTM)  Wire parasitics (computed in wire delay calculation)

7 ISSOC-2008 7 Repeater and Wire Models  delay = i(slew in ) + r(slew in ) * C L  r(s) = f(size, slew in )  slew out = f(slew in,C L )  wire delay = Elmore Intrinsic Delay Model – i(slew in ) Drive Res Model – r(slew in )  Repeater area and power linear with repeater size  Predictions extend down to 16nm  Delay model is < 15% of PrimeTime

8 ISSOC-2008 8 Impact on System-Level Design  Testcases  VPROC: video processor with 42 cores and 128-bit datawidth  dVOPD: dual video object plane decoder with 26 cores and 128-bit datawidth  Original model (Orig.) underestimates power compared to the Proposed Model (Prop.)  Original Model is very optimistic in delay  becomes more critical as technology scales and the chip size becomes larger

9 ORION2.0: Accurate NoC Router Models
[Figure: ORION2.0 (NEW!) model structure. Inputs: circuit implementation & buffering scheme (SRAM and register FIFO; MUX-tree and matrix crossbar; different arbitration schemes; hybrid buffering scheme); architectural parameters (# of ports; # of buffers; # of crossbar ports; # of VCs; voltage, frequency); technology parameters (interconnect parameters; device parameters; scaling factors for future technologies; …). Component models: FIFO, arbiter, crossbar, clock, link. Outputs: leakage power, dynamic power, area.]
- Built on top of ORION1.0
- Provides previously missing power subcomponents
- Provides significant accuracy improvement vs. ORION1.0
- Uses our automatic flows to obtain technology inputs
- To appear in DATE-2009 (A. B. Kahng, B. Li, L.-S. Peh, and K. Samadi)

10 Validation and Significance Assessment
- Validation: two Intel NoC chips
  - (1) Intel 80-core Teraflops, and (2) Intel SCC
  - ORION2.0 offers significant accuracy improvement
- System-level impact: COSI-OCC
  - ORION2.0 models lead to better-performing NoCs: (1) fewer hops, and (2) fewer routers
  - The relative power cost of an additional port is lower in ORION2.0 than in ORION1.0

11 ISSOC-2008 11 AMBA Models  Signal Bus Modeling:  system-level interconnect model (described earlier)  Logic Modeling (multiplexers, decoders, and arbiter):  Block latency based on gate delay model (cf. Carloni et al. ASPDAC’08)  Dynamic power is computed after measuring the switching capacitance  Leakage power is computed from average device leakages  Area is computed from cell areas of logic gates

12 AMBA Modeling and Bus vs. NoC Study
- Delay, power, and area models are within 11% of the physical implementation
- Functional forms verified against a physical implementation of an AMBA-AHB controller
- The bus vs. NoC study enables design space exploration of heterogeneous communication fabrics
[Figure: AMBA model inputs and outputs. Inputs: technology & design style (min. width, spacing, thickness; dielectric thickness, constant; device drive resistance, capacitance, leakage; width/spacing, buffering scheme); floorplan (location of all masters, slaves; optionally, locations of arbiter, decoder, and multiplexers); bit widths of all masters, slaves; transaction (read and write length; address progression). Outputs: delay, leakage, dynamic power, area.]

13 Conclusions and Future Directions
- Accurate models can drive effective system-level exploration
- Reproducible methodology for extracting inputs to models
- Modeling at different levels of abstraction
  - Protocol encapsulation (e.g., handshaking for AMBA bus allocation)
  - Buses, pipelined rings (e.g., EIB in IBM Cell)
  - Routers, network interfaces
  - FIFOs, queues, crossbar switches (ORION2.0)
- Extending to other technologies
  - 3D IC integration (e.g., TSV modeling, multi-layer router modeling)

14 Backup Slides

15 Communication Synthesis Key Elements
- Specification of input constraints
  - Set of IP cores: area and interface
  - End-to-end communication requirements between pairs of IP cores: latency and throughput
- Characterization of library of components
  - Interface types, max number of ports
  - Max capacities: bandwidth, latency, max distance
  - Performance and cost model
- Component instantiation and parallel composition
  - Rename, set parameters of library components
  - Composition based on algebra on quantities (including type compatibility)

16 ISSOC-2008 16  Synthesis of optimal network-on-chip  Return valid composition that meets input constraints and  Minimizes the objective function (e.g., power dissipation) (Original Specification) Platform Instance 1 Platform Instance 2 Communication Synthesis Example

17 ISSOC-2008 17  COSI is a public-domain software package for NoC synthesis http://embedded.eecs.berkeley.edu/cosi/ COSI: Communication Synthesis Infrastructure

18 Dynamic and Leakage Power Models
- Leakage power: subthreshold and gate
  - From 65nm onward, gate leakage becomes significant
  - $I'_{sub}(i,s)$ and $I'_{gate}(i,s)$ are the subthreshold and gate leakage currents per unit transistor width for a specific technology
  - $W_{sub}(i,s)$ and $W_{gate}(i,s)$ are the effective widths of component $i$ at input state $s$ for subthreshold and gate leakage, respectively
  - Key circuit components: INVx1, NAND2x1, NOR2x1, and DFF
- Dynamic power: switching capacitance
  - Clock power: $P_{clk} = \alpha \, C_{clk} \, V_{dd}^2 \, f$, where $C_{clk} = C_{sram\text{-}fifo} + C_{pipeline\text{-}registers} + C_{register\text{-}fifo} + C_{wiring}$ and $\alpha$ is the switching activity
  - Physical links: due to charging and discharging of the capacitive load, $P_d = \alpha \, C_{load} \, V_{dd}^2 \, f$, where $C_{load} = C_{ground} + C_{coupling} + C_{input}$
  - Register-based FIFO: implemented as shift registers
  - Other components: we use ORION 1.0 models
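The formulas above translate directly into code. The dynamic-power functions follow the clock and link equations term by term. The leakage function is our own hedged reading of the definitions, since the combining formula itself is not in this transcript: per input state, leakage current is taken as $I'_{sub} W_{sub} + I'_{gate} W_{gate}$, states are weighted by an assumed probability (equiprobable in the example), and the result is multiplied by $V_{dd}$. The clock's default activity of 1.0 and all numeric values are likewise assumptions.

```python
def dynamic_power_w(alpha, c_farad, vdd, freq_hz):
    """P = alpha * C * Vdd^2 * f."""
    return alpha * c_farad * vdd ** 2 * freq_hz

def link_power_w(alpha, c_ground, c_coupling, c_input, vdd, freq_hz):
    """Physical link: C_load = C_ground + C_coupling + C_input."""
    return dynamic_power_w(alpha, c_ground + c_coupling + c_input, vdd, freq_hz)

def clock_power_w(c_sram_fifo, c_pipeline_regs, c_register_fifo, c_wiring,
                  vdd, freq_hz, alpha=1.0):
    """Clock: C_clk = C_sram-fifo + C_pipeline-registers + C_register-fifo + C_wiring.
    alpha defaults to 1.0, assuming the clock net switches every cycle."""
    c_clk = c_sram_fifo + c_pipeline_regs + c_register_fifo + c_wiring
    return dynamic_power_w(alpha, c_clk, vdd, freq_hz)

def leakage_power_w(vdd, states):
    """Leakage of one component, weighted over its input states (assumed scheme).
    states: [(prob, i_sub_a_per_um, w_sub_um, i_gate_a_per_um, w_gate_um), ...]"""
    return vdd * sum(p * (i_sub * w_sub + i_gate * w_gate)
                     for p, i_sub, w_sub, i_gate, w_gate in states)

# Hypothetical values: 1 GHz, Vdd = 1.0 V.
link = link_power_w(0.5, 100e-15, 50e-15, 10e-15, vdd=1.0, freq_hz=1e9)
clk = clock_power_w(20e-15, 30e-15, 10e-15, 40e-15, vdd=1.0, freq_hz=1e9)
# INVx1 with two equiprobable input states (illustrative per-um currents and widths).
inv_leak = leakage_power_w(1.0, [(0.5, 0.1e-6, 0.2, 0.01e-6, 0.1),
                                 (0.5, 0.1e-6, 0.1, 0.01e-6, 0.2)])
print(f"link: {link * 1e6:.1f} uW, clock: {clk * 1e6:.1f} uW, "
      f"INVx1 leakage: {inv_leak * 1e9:.2f} nW")
```

A router-level leakage estimate would then sum such per-component values over the instances of INVx1, NAND2x1, NOR2x1, and DFF that make up each block.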

19 ISSOC-2008 19 Area Model  As number of cores increases, the area occupied by communication components becomes significant (19% of total tile area in the Intel 80-core Teraflops Chip)  Gate area model by Yoshida et al. (DAC’04)  Link area model by Carloni et al. (ASPDAC’08)  We model FIFO, crossbar switch, and arbiter areas using the adopted gate area model Area arbiter = (Area NOR2x1.2(R-1)R) + (Area DFF.(R(R-1)/2)) + (Area INVx1.R) Matrix Arbiter

