Explicit Modeling of Control and Data for Improved NoC Router Estimation
Andrew B. Kahng +*, Bill Lin * and Siddhartha Nath +
UCSD CSE + and ECE * Departments
{abk, billlin,
Outline
- Motivation
- Our work: Overview
- Methodology
- Flit-level power estimation
- Summary
NoC Modeling So Far… (ORION)
[Figure: router block diagram with Arbiter, XBAR, buffers (BUF I, BUF E, BUF W, BUF N, BUF S), and links from SRC to SINK]
- ORION1.0 (2002): 6NOR + 2INV + DFF
- ORION2.0 (2009): 6NOR + 2INV + DFF, plus leakage power and clock power
What Is The Problem?
- RTL code mismatch
- Logic transformation and technology mapping mismatch
[Figure: router block diagram with Arbiter, XBAR, buffers (BUF I, BUF E, BUF W, BUF N, BUF S), and links from SRC to SINK; assumed template 6NOR + 2INV + DFF]
How Bad Is It?
- Router RTL generators:
  - Netmaker (Cambridge, UK)
  - Stanford NoC (Stanford)
- Errors shown in the figure: 460% and 89%
- Why such large errors?
  - Assumed logic template is inaccurate
  - Control logic is not modeled
  - Implementation details are missing
We Propose: Step 1
Derive router component block parametric models from post-synthesis netlists
- Parameters: P = #ports, V = #VCs, B = #BUFs, F = flit width
- Key idea: no assumed logic template; component models are derived from actual RTL synthesized with cell libraries
- [Tables: # instances versus P, V, B, F] # instances ~ P^2 and # instances ~ F, hence XBAR ~ P^2 F
We Propose: Step 2
Automatic fitting of models with post-P&R power and area
- [Table: area versus P, V, B, F] an LSQR fit of XBAR ~ P^2 F gives XBAR area = a_1 * P^2 F + a_0
- Key idea: capture implementation details using an automatic regression fit
- Characterization is performed only once and is usable for multiple design space explorations
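The fit on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes ordinary least squares stands in for the LSQR step; the names `fit_line` and `xbar_area`, the design points, and the area values are hypothetical placeholders generated from a known line, not measured post-P&R data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ a1 * xs + a0 (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a1, a0


# Hypothetical (P, F) design points; real training points would come from P&R runs.
configs = [(4, 16), (5, 32), (6, 32), (8, 64)]

# Parametric feature from Step 1: XBAR ~ P^2 F.
features = [P ** 2 * F for P, F in configs]

# Placeholder "tool" areas generated from area = 2.5 * P^2 F + 100 (made-up numbers).
areas = [2.5 * x + 100.0 for x in features]

a1, a0 = fit_line(features, areas)


def xbar_area(P, F):
    """Predicted XBAR area for a new design point, using the fitted a1 and a0."""
    return a1 * P ** 2 * F + a0
```

Once `a1` and `a0` are fitted, `xbar_area` can be evaluated for any (P, F) without rerunning synthesis or P&R, which is the sense in which characterization happens only once.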
Model Development
- Two RTL generators:
  - Netmaker (Cambridge, UK)
  - Stanford NoC
- SP&R tools:
  - Cadence RC and Synopsys DC for hierarchical synthesis to analyze each block
  - Cadence SOC Encounter for P&R
- Flow: NoC router RTL generators (µArch params: P, V, B, F; impl. params: clock frequency) -> synthesis and P&R (DC/RC, SOCE) -> analysis of blocks (XBAR, SW & VC arbiter, input & output buffers) -> new models for each component block

Component   Model
XBAR        P^2 F
SWVC        9(P^2 V^2 + P^2 + PV - P)
InBUF       180PV + 2PVBF + 2P^2 VB + 3PVB + 5P^2 B + P^2 + PF + 15P
OutBUF      25P + 80PV
CLKCTRL     0.02(SWVC + InBUF + OutBUF)
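The component models in the table translate directly into code. The sketch below evaluates them for one design point; the function names and the example parameter values (P=5, V=2, B=4, F=32) are illustrative assumptions, while the formulas are copied from the table.

```python
def xbar(P, V, B, F):
    """Crossbar model: P^2 F."""
    return P ** 2 * F


def swvc(P, V, B, F):
    """Switch and VC arbiter model: 9(P^2 V^2 + P^2 + PV - P)."""
    return 9 * (P ** 2 * V ** 2 + P ** 2 + P * V - P)


def inbuf(P, V, B, F):
    """Input buffer model from the table."""
    return (180 * P * V + 2 * P * V * B * F + 2 * P ** 2 * V * B
            + 3 * P * V * B + 5 * P ** 2 * B + P ** 2 + P * F + 15 * P)


def outbuf(P, V, B, F):
    """Output buffer model: 25P + 80PV."""
    return 25 * P + 80 * P * V


def clkctrl(P, V, B, F):
    """Clock/control model: 2% of the SWVC, InBUF and OutBUF models combined."""
    return 0.02 * (swvc(P, V, B, F) + inbuf(P, V, B, F) + outbuf(P, V, B, F))


def router_models(P, V, B, F):
    """Evaluate every component model for one (P, V, B, F) design point."""
    blocks = {"XBAR": xbar, "SWVC": swvc, "InBUF": inbuf,
              "OutBUF": outbuf, "CLKCTRL": clkctrl}
    return {name: model(P, V, B, F) for name, model in blocks.items()}


# Example design point (hypothetical): 5 ports, 2 VCs, 4 buffers, 32-bit flits.
example = router_models(P=5, V=2, B=4, F=32)
```

These values are the model-side estimates; they are the inputs that the regression fit later scales against post-P&R data.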
Overall Methodology
- Manual (basic):
  - Quick and easy
  - Misses implementation details
- LSQR (regression fit):
  - Accurate (captures implementation details)
  - One-time overhead (generation of P&R training data points)
[Figure: methodology flow. Inputs: manual estimates for gate count; technology library (cell area, cell leakage, pin cap., internal energy); post-P&R data per block (std. cell count and area, leakage power, internal power, switching power). Outputs: ORION_NEW models for area and power (leakage, internal, switching)]
Results: Area And Power
- Power: 6.5x reduction in worst-case estimation error
- Area: 4x reduction in worst-case estimation error
- Methodology scales across technologies and router RTL generators
Flit-level Power Estimation
- Dynamic power estimation using flit-level bit encodings
- Integrated with a full-system NoC simulator (GARNET)
[Figure: flow. Post-P&R router netlist + testbench -> gate-level simulation -> VCD -> power analysis -> power report -> regression fit with ORION_NEW models -> flit-level power model -> GARNET (gem5) -> flit-level power estimates]
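One way to picture a flit-level power model is sketched below. The slides do not say which features of the bit encoding enter the regression, so this sketch assumes the simplest plausible one: the number of bits that toggle between consecutive flits. The function names, the linear form, and the coefficients `k1` and `k0` are all hypothetical; in practice the coefficients would come from the regression fit against gate-level power reports.

```python
def bit_toggles(prev_flit, next_flit):
    """Number of bit positions that differ between two flits (Hamming distance)."""
    return bin(prev_flit ^ next_flit).count("1")


def flit_energy(prev_flit, next_flit, k1, k0):
    """Assumed linear model: k1 per toggled bit plus a fixed per-flit cost k0."""
    return k1 * bit_toggles(prev_flit, next_flit) + k0


def trace_energy(flits, k1, k0, idle=0):
    """Sum the modeled energy over a flit trace, starting from an idle bus value."""
    total = 0.0
    prev = idle
    for flit in flits:
        total += flit_energy(prev, flit, k1, k0)
        prev = flit
    return total
```

A network simulator that exposes the flit payloads per cycle could call `trace_energy` on each router's traffic to accumulate dynamic energy, under the assumptions stated above.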
Results: Flit-level Power
- Accurate estimation of flit-level dynamic power
- 3.6x reduction in worst-case estimation error
Summary
- New hybrid modeling methodology: relax the template mindset
  - Explicitly models control and data signals
  - Captures RTL and implementation details
- Using the proposed parametric regression methodology, worst-case estimation errors are reduced by a factor of
  - 6.5x from ORION2.0 for power
  - 4x from ORION2.0 for area
- We propose an application of our methodology to flit-level dynamic power modeling and integration with GARNET
  - 3.6x worst-case error reduction in dynamic power estimation
- Ongoing: non-parametric modeling of post-P&R power and area
Thank You!
Backup
Regression analysis approach
Multi-step regression fit
- Step 1: Fit instances of each router component with post-layout instance counts
    a_1 * Insts_{model} + a_0 = Insts_{tool}
- Step 2a: Fit area of each router component with post-layout area
    b_1 * Insts^R_{model} + b_0 = Area_{tool}
    where Insts^R_{model} = a_1 * Insts_{model} + a_0
- Step 2b: Fit power of each router component with post-layout power (leakage, internal, switching separately)
    {c_5, d_5, e_5} * Insts^R_{model,XBAR}
    + {c_4, d_4, e_4} * Insts^R_{model,SWVC}
    + {c_3, d_3, e_3} * Insts^R_{model,InBUF}
    + {c_2, d_2, e_2} * Insts^R_{model,OutBUF}
    + {c_1, d_1, e_1} * Insts^R_{model,CLKCTRL}
    + {c_0, d_0, e_0}
    = {P_{leak,tool}, P_{int,tool}, P_{sw,tool}}
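Steps 1 and 2a chain two one-variable fits for a single component, which the following Python sketch illustrates. It assumes ordinary least squares for each fit; the helper `fit_line`, the variable names, and all numbers are hypothetical placeholders constructed so the fit is exact, not data from the study. Step 2b has the same shape but uses one coefficient per component block and is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training points for one component (for example, XBAR).
insts_model = [800.0, 1800.0, 3200.0, 5000.0]     # parametric model estimates
insts_tool = [1010.0, 2210.0, 3890.0, 6050.0]     # placeholder post-layout counts
area_tool = [3230.0, 6830.0, 11870.0, 18350.0]    # placeholder post-layout areas

# Step 1: a_1 * Insts_model + a_0 = Insts_tool
a1, a0 = fit_line(insts_model, insts_tool)

# Refined instance counts Insts^R_model used by the later steps.
insts_refined = [a1 * x + a0 for x in insts_model]

# Step 2a: b_1 * Insts^R_model + b_0 = Area_tool
b1, b0 = fit_line(insts_refined, area_tool)


def predict_area(insts):
    """Area estimate for a new model-side instance count."""
    return b1 * (a1 * insts + a0) + b0
```

Keeping the instance-count fit separate from the area fit means the refined counts can be reused as the regressors for leakage, internal, and switching power in Step 2b.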
Related work
- Architecture templates
  - ORION2.0
- Gate-level analytical models
- Parametric regression
  - Pre- and post-layout power estimation
  - RTL simulations
- Non-parametric regression
  - MARS
[Figure: taxonomy of NoC modeling. Regression model (parametric, non-parametric) and circuit model (arch templates, analytical); ORION_NEW + regression; flit-level. Other labels: Control, Tool]
Significant departure: relax the “template” mindset
Results
- Avg. estimation error in # instances reduced from 109.5% to 8.8%
  - Avg. estimation error in area reduced to 9.8%
  - Avg. estimation error in power reduced to 4.58%