Explicit Modeling of Control and Data for Improved NoC Router Estimation Andrew B. Kahng +*, Bill Lin * and Siddhartha Nath + UCSD CSE + and ECE * Departments.

Presentation transcript:

Explicit Modeling of Control and Data for Improved NoC Router Estimation Andrew B. Kahng +*, Bill Lin * and Siddhartha Nath + UCSD CSE + and ECE * Departments {abk, billlin,

Outline
- Motivation
- Our work: Overview
- Methodology
- Flit-level power estimation
- Summary

NoC Modeling So Far… (ORION)
[Figure: router block diagram with labels Arbiter, XBAR, BUF I, BUF E, BUF W, BUF N, BUF S, Link SRC, Link SINK]
- ORION1.0 (2002): 6NOR + 2INV + DFF
- ORION2.0 (2009): 6NOR + 2INV + DFF; leakage power; clock power

What Is The Problem?
- RTL code mismatch
- Logic transformation and technology mapping mismatch
[Figure: router block diagram with labels Arbiter, XBAR, BUF I, BUF E, BUF W, BUF N, BUF S, Link SRC, Link SINK; arbiter template 6NOR + 2INV + DFF]

How Bad Is It?
Router RTL generators:
- Netmaker (Cambridge, UK)
- Stanford NoC (Stanford)
[Chart: error values of 460% and 89%]
Why such large errors?
- Assumed logic template inaccurate
- Control logic not modeled
- Implementation details missing

We Propose: Step 1
Derive router component block parametric models from post-synthesis netlists.
Parameters: P = #Ports, V = #VCs, B = #BUFs, F = flit width
Key idea:
- No assumed logic template
- Component models derived from actual RTL synthesized with cell libraries
[Tables: # instances versus (P, V, B, F)] # Instances ~ $P^2$; # Instances ~ $F$; therefore XBAR ~ $P^2 F$

We Propose: Step 2
Automatic fitting of models with post-P&R power and area.
[Diagram: XBAR ~ $P^2 F$ and a table of (P, V, B, F, Area) feed a least-squares regression (LSQR)] XBAR area = $a_1 \cdot P^2 F + a_0$
Key idea:
- Capture implementation details using automatic regression fit
- Characterization performed only once and usable for multiple design space explorations
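The Step 2 fit can be illustrated with a minimal sketch of a one-variable least-squares fit of XBAR area against the $P^2 F$ term. The training points, the coefficient values 2.5 and 100, and the function names are all hypothetical placeholders chosen so the fit can be checked; they are not data or code from the presentation.

```python
# Hypothetical sketch: ordinary least-squares fit of area = a1 * (P^2 * F) + a0.
# All data points are synthetic placeholders, not measurements from the slides.

def xbar_feature(p, f):
    """Crossbar scaling term from the slide: XBAR ~ P^2 * F."""
    return p * p * f

def fit_line(xs, ys):
    """Closed-form least squares for y = a1 * x + a0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a1, a0

# Synthetic stand-in for post-P&R training points (P, F, area), generated
# from area = 2.5 * P^2 * F + 100 purely for illustration.
training = [(p, f, 2.5 * xbar_feature(p, f) + 100.0)
            for p in (3, 5, 7) for f in (16, 32, 64)]

xs = [xbar_feature(p, f) for p, f, _ in training]
ys = [area for _, _, area in training]
a1, a0 = fit_line(xs, ys)

def predict_xbar_area(p, f):
    """Evaluate the fitted model for a new (P, F) configuration."""
    return a1 * xbar_feature(p, f) + a0
```

Once `a1` and `a0` are fitted, `predict_xbar_area` can be evaluated for any configuration without rerunning P&R, which is the "characterize once, explore many times" idea on the slide.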

Model Development
Two RTL generators:
- Netmaker (Cambridge, UK)
- Stanford NoC
SP&R tools:
- Cadence RC & Synopsys DC for hierarchical synthesis to analyze each block
- Cadence SOC Encounter for P&R
Flow: NoC router RTL generators (µArch params: P, V, B, F; impl params: clock frequency) -> Synthesis and P&R (DC/RC, SOCE) -> Analysis of blocks (XBAR, SW & VC arbiter, input & output buffers) -> New models for each component block
Component   Model
XBAR        $P^2 F$
SWVC        $9(P^2 V^2 + P^2 + PV - P)$
InBUF       $180PV + 2PVBF + 2P^2 VB + 3PVB + 5P^2 B + P^2 + PF + 15P$
OutBUF      $25P + 80PV$
CLKCTRL     $0.02(\mathrm{SWVC} + \mathrm{InBUF} + \mathrm{OutBUF})$
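The component table can be evaluated directly. The sketch below transcribes the five expressions into functions, assuming they are instance-count estimates (as the "# Instances" labels on the Step 1 slide suggest). The function names and the example configuration (P=5, V=2, B=4, F=32) are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: the component models from the table as plain functions.
# p = #ports, v = #VCs, b = #buffers, f = flit width.

def xbar(p, v, b, f):
    return p * p * f

def swvc(p, v, b, f):
    return 9 * (p * p * v * v + p * p + p * v - p)

def inbuf(p, v, b, f):
    return (180 * p * v + 2 * p * v * b * f + 2 * p * p * v * b
            + 3 * p * v * b + 5 * p * p * b + p * p + p * f + 15 * p)

def outbuf(p, v, b, f):
    return 25 * p + 80 * p * v

def clkctrl(p, v, b, f):
    # CLKCTRL is modeled as 2% of the arbiter and buffer estimates.
    return 0.02 * (swvc(p, v, b, f) + inbuf(p, v, b, f) + outbuf(p, v, b, f))

def router_estimates(p, v, b, f):
    """Return the per-component estimates as a dictionary."""
    models = {"XBAR": xbar, "SWVC": swvc, "InBUF": inbuf,
              "OutBUF": outbuf, "CLKCTRL": clkctrl}
    return {name: model(p, v, b, f) for name, model in models.items()}

# Example configuration (an assumption, not taken from the slides).
estimates = router_estimates(p=5, v=2, b=4, f=32)
```

These raw estimates are the inputs that the regression step then scales against post-P&R data.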

Overall Methodology
Manual (basic):
- Quick and easy
- Misses implementation details
LSQR (regression fit):
- Accurate (captures implementation details)
- One-time overhead (generation of P&R training data points)
[Flow diagram labels: manual estimates for gate count; technology library (cell area, cell leakage, pin cap., internal energy); post-P&R data per block (std. cell count & area, leakage power, internal power, switching power); LSQR; ORION_NEW models; outputs: area, power (leakage, internal, switching)]

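Results: Area And Power
- Power: 6.5x reduction in worst-case estimation error relative to ORION2.0
- Area: 4x reduction in worst-case estimation error relative to ORION2.0
- Methodology scales across technologies and router RTL generators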

Flit-level Power Estimation
- Dynamic power estimation using flit-level bit encodings
- Have integrated with full-system NoC simulator (GARNET)
[Flow diagram: post-P&R router netlist + testbench -> gate-level simulation -> VCD -> power analysis -> power report -> regression fit (with ORION_NEW models) -> flit-level power model -> GARNET (gem5) -> flit-level power estimates]

Results: Flit-level Power
- Accurate estimation of flit-level dynamic power
- 3.6x reduction in worst-case dynamic power estimation error

Summary
- New hybrid modeling methodology: relax the template mindset
  - Explicitly models control and data signals
  - Captures RTL and implementation details
- Using the proposed parametric regression methodology, worst-case estimation errors are reduced by a factor of
  - 6.5x from ORION2.0 for power
  - 4x from ORION2.0 for area
- We propose an application of our methodology for flit-level dynamic power modeling and integration with GARNET
  - 3.6x worst-case error reduction in dynamic power estimation
- Ongoing: Non-parametric modeling of post-P&R power and area

Thank You!

Backup

Regression analysis approach
Multi-step regression fit:
- Step 1: Fit instances of each router component with post-layout instance counts
  $a_1 \cdot \mathrm{Insts}_{model} + a_0 = \mathrm{Insts}_{tool}$
- Step 2a: Fit area of each router component with post-layout area
  $b_1 \cdot \mathrm{Insts}^{R}_{model} + b_0 = \mathrm{Area}_{tool}$, where $\mathrm{Insts}^{R}_{model} = a_1 \cdot \mathrm{Insts}_{model} + a_0$
- Step 2b: Fit power of each router component with post-layout power (leakage, internal, switching separately)
  $\{c_5, d_5, e_5\} \cdot \mathrm{Insts}^{R}_{model,XBAR} + \{c_4, d_4, e_4\} \cdot \mathrm{Insts}^{R}_{model,SWVC} + \{c_3, d_3, e_3\} \cdot \mathrm{Insts}^{R}_{model,InBUF} + \{c_2, d_2, e_2\} \cdot \mathrm{Insts}^{R}_{model,OutBUF} + \{c_1, d_1, e_1\} \cdot \mathrm{Insts}^{R}_{model,CLKCTRL} + \{c_0, d_0, e_0\} = \{P^{tool}_{leak}, P^{tool}_{int}, P^{tool}_{SW}\}$
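Steps 1 and 2a chain two one-variable fits: the first refines the model's instance estimate against tool-reported counts, and the second maps the refined count to area. A minimal sketch follows, assuming a single component and synthetic numbers; the lists `insts_model`, `insts_tool`, and `area_tool` and the generating coefficients are hypothetical. Step 2b needs a multivariate least-squares solve over the five components and is not shown.

```python
# Hypothetical sketch of Step 1 followed by Step 2a for one router component.
# All numbers are synthetic placeholders, not data from the presentation.

def fit_line(xs, ys):
    """Closed-form least squares for y = k1 * x + k0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k1 = sxy / sxx
    return k1, mean_y - k1 * mean_x

# Parametric-model instance estimates for four training configurations.
insts_model = [800.0, 1600.0, 3200.0, 6400.0]
# Synthetic stand-ins for post-layout instance counts and areas.
insts_tool = [1.2 * x + 50.0 for x in insts_model]
area_tool = [3.0 * t + 10.0 for t in insts_tool]

# Step 1: a1 * Insts_model + a0 = Insts_tool
a1, a0 = fit_line(insts_model, insts_tool)

# Refined instance estimate: Insts^R_model = a1 * Insts_model + a0
insts_refined = [a1 * x + a0 for x in insts_model]

# Step 2a: b1 * Insts^R_model + b0 = Area_tool
b1, b0 = fit_line(insts_refined, area_tool)

def predict_area(insts_model_value):
    """Chain both fits to estimate area from a raw model instance count."""
    return b1 * (a1 * insts_model_value + a0) + b0
```

Keeping the two fits separate mirrors the slide: instance-count error and area-per-instance error are characterized independently, so the refined count can be reused for the power fits.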

Related work
- Architecture templates
  - ORION2.0
- Gate-level analytical models
- Parametric regression
  - Pre- and post-layout power estimation
  - RTL simulations
- Non-parametric regression
  - MARS
[Figure: NoC modeling taxonomy. Regression model: parametric, non-parametric. Circuit model: arch templates, analytical. Other labels: Control, Tool. This work: ORION_NEW + regression; flit-level]
Significant departure: relax the "template" mindset

Results
- Avg. estimation error in # instances reduced from 109.5% to 8.8%
  - Avg. estimation error in area reduced to 9.8%
  - Avg. estimation error in power reduced to 4.58%