Physical Synthesis Ing. Pullini Antonio


Outline: Delay of Interconnects; Scaling Trends; WLMs and their limits; Physical Synthesis

Interconnects

(Figure: wire cross-sections in past, present and future technologies.) Wire resistance and metal migration force lower resistance and therefore "taller" geometry. Capacitance couples to neighbors. Total capacitance does not get smaller!

Early Models: Older technology had wide wires relative to the feature size. More cross-section area implies less resistance and more capacitance, so the wire is modeled with capacitance only. (Figure: wire of length L, height H, width W.)

However... With scaling, the width of the wire is reduced and its resistance is no longer negligible. A lumped RC model is a good enough approximation. (Figure: wire of length L, height H, width W.)

Interconnect Resistance: By Ohm's law, the resistance of a wire is proportional to its length L and inversely proportional to its cross-section HW: $R = \rho L / (H W)$. The resistivity $\rho$ is a property of the material. (Figure: wire of length L, height H, width W.)
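A minimal sketch of the resistance formula above. The function name and the numeric values (approximate bulk copper resistivity, a 1 mm wire with a 0.5 µm by 0.25 µm cross-section) are illustrative assumptions, not taken from the slides.

```python
def wire_resistance(rho, length, height, width):
    """Resistance of a rectangular wire, R = rho * L / (H * W), in SI units."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return rho * length / (height * width)


# Illustrative values (assumptions): approximate bulk copper resistivity,
# 1 mm long wire, 0.5 um tall, 0.25 um wide.
RHO_CU = 1.7e-8  # ohm * m
r_wire = wire_resistance(RHO_CU, 1e-3, 0.5e-6, 0.25e-6)
r_half_width = wire_resistance(RHO_CU, 1e-3, 0.5e-6, 0.125e-6)

print(f"R = {r_wire:.1f} ohm, with half the width: {r_half_width:.1f} ohm")
```

Halving the width doubles the resistance, which is why resistance stops being negligible as wires narrow.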

Interconnect Capacitance: Capacitance of a wire = f(shape, distance to surrounding wires, distance to the substrate). Estimating capacitance is a matter of determining where the field lines go. To get an accurate estimate, electric field solvers (2D or 3D) are used, e.g. FastCap or Raphael.

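Area Capacitance (Figure: wire of width W, length L and height H carrying current along its length, separated from the substrate by a dielectric of thickness t_di, with electric field lines between wire and substrate.)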
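The area (parallel-plate) term is not written out in the transcript. Under the usual parallel-plate approximation, and with $\varepsilon_{di}$ introduced here as an assumed symbol for the dielectric permittivity, it can be sketched as:

```latex
C_{area} = \frac{\varepsilon_{di}}{t_{di}} \, W L
```

This covers only the field lines that run straight from the bottom of the wire to the substrate; fringing and coupling to neighboring wires add to it.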

Fringing Capacitance (Figure: conductor cross-section of height H with fringing fields; the effective plate width is $w \approx W - H/2$.)

Today's Interconnect

Timing Models: The older lumped RC model is no longer valid. At today's frequencies the wavelength is comparable to the wire length, so the interconnect must be treated as a distributed system.

Elmore Delay: Elmore analyzed the distributed model and came up with the figures for delay. (Figure: RC ladder from V_in to V_out with N segments; segment i has series resistance R_i and a capacitance C_i to ground at node i.)
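For the RC ladder described above, the Elmore delay at the last node is the sum over nodes of each node capacitance times the total resistance upstream of it, $\tau_N = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j$. A minimal sketch follows; the function name and the three-segment example values are illustrative assumptions.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    resistances[i] is the series resistance feeding node i + 1 and
    capacitances[i] is the capacitance from node i + 1 to ground.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    tau = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance shared by this node and all later ones
        tau += upstream_r * c    # each capacitor charges through its upstream resistance
    return tau


# Illustrative three-segment ladder: 10 ohm and 1 fF per segment.
tau = elmore_delay([10.0, 10.0, 10.0], [1e-15, 1e-15, 1e-15])
print(f"Elmore delay: {tau * 1e15:.1f} fs")
```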

Wire Model: Assume the wire is modeled by N equal-length segments. For large values of N:
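The expression for the limit is missing from the transcript. As a sketch, applying the Elmore sum to N equal segments gives the standard result below, where $r$ and $c$ are assumed symbols for resistance and capacitance per unit length, $L$ is the wire length, and $R = rL$, $C = cL$ are the totals.

```latex
\tau_N = \sum_{i=1}^{N} \frac{cL}{N} \cdot \frac{i \, rL}{N}
       = rcL^2 \, \frac{N(N+1)}{2N^2}
       = RC \, \frac{N+1}{2N}
\quad\xrightarrow{\; N \to \infty \;}\quad
\tau = \frac{RC}{2} = \frac{rcL^2}{2}
```

A distributed wire therefore has about half the delay predicted by a single lumped RC, and the delay still grows with the square of the length.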

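Ideal Scaling
Ideal process scaling (scale factor $\sigma \approx 0.7\times$):
– Device geometries shrink by $\sigma$; device delay shrinks by $\sigma$
– Wire geometries (width $w$, spacing $S$, length $l$) shrink by $\sigma$, but height $h$ shrinks less
– R per unit length: $\rho / (w\sigma \cdot h) = r/\sigma$
– Cc per unit length: $h \cdot \varepsilon / (S\sigma) = c_c/\sigma$
– C per unit length: area term set by $\varepsilon$, $w\sigma$, $l\sigma$ and $t_{di}$
– R per unit length rises, C per unit length scales, Cc per unit length rises
(Figure: wire of height h, width w, length l and spacing S, before and after scaling.)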

Interconnect Role
Short interconnect:
– Used to connect nearby cells, R_driver >> R_interconnect
– Minimize wire C, i.e., use short min-width wires
Medium to long-distance ("global") interconnect:
– R_driver ≈ R_interconnect
– Size wires to trade off area vs. delay
– Increasing width: capacitance increases, resistance decreases. Need to find an acceptable tradeoff (the wire sizing problem)
"Fat" wires:
– Thicker cross-sections in higher metal layers
– Useful for reducing delays for global wires
– Inductance issues, sharing of a limited resource

Block Scaling: Block area often stays the same while the number of cells and nets doubles. Global interconnect lengths don't shrink; local interconnect lengths shrink.

Microprocessor Interconnect (Source: Intel): Local interconnect scales with the technology (S_Local = S_Technology); global interconnect scales with the die (S_Global = S_Die).
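The wire sizing tradeoff can be illustrated with a small sweep. This is a sketch under stated assumptions, not a sizing algorithm from the slides: the delay is an Elmore estimate for a driver, a uniform wire and a load, and every default value (sheet resistance, area and fringe capacitance, driver resistance, load) is a hypothetical number chosen only to make the tradeoff visible.

```python
def wire_delay(width_um, length_um=5000.0, r_driver=100.0, c_load=20e-15,
               r_sheet=0.1, c_area=0.04e-15, c_fringe=0.08e-15):
    """Elmore delay in seconds of driver + uniform wire + load.

    Assumed units: r_sheet in ohm/square, c_area in F/um^2,
    c_fringe in F/um (taken as independent of width).
    """
    r_wire = r_sheet * length_um / width_um               # falls as the wire widens
    c_wire = (c_area * width_um + c_fringe) * length_um   # grows as the wire widens
    return r_driver * (c_wire + c_load) + r_wire * (c_wire / 2 + c_load)


def best_width(candidates_um):
    """Return the candidate width with the smallest estimated delay."""
    return min(candidates_um, key=wire_delay)


widths = [0.5, 1.0, 2.0, 3.0, 4.0, 8.0]
for w in widths:
    print(f"w = {w:3.1f} um  delay = {wire_delay(w) * 1e12:6.1f} ps")
print("best candidate:", best_width(widths), "um")
```

Very narrow wires are dominated by wire resistance and very wide ones by the capacitance the driver must charge, so the minimum lies in between.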

Interconnect Delay Scaling
Delay of a wire of length $l$: $\tau_{int} = (rl)(cl) = rcl^2$ (first order)
Local interconnects:
– $\tau_{int} = (r/\sigma)(c)(l\sigma)^2 = rcl^2\sigma$
– Local interconnect delay does not improve relative to the (faster) devices
Global interconnects:
– $\tau_{int} = (r/\sigma)(c/\sigma)(l)^2 = (rcl^2)/\sigma^2$
– Global interconnect delay doubles: unsustainable!
– Problem somewhat mitigated using buffers
Interconnect delay is increasingly dominant.
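A quick arithmetic check of the "doubles" claim, as a sketch. It assumes a scale factor of 0.7 per generation (called `sigma` here) and assumes the per-unit-length factors are r/sigma for resistance and c/sigma for the capacitance of global wires; the function names are hypothetical.

```python
def local_delay_factor(sigma):
    """Delay ratio for a local wire: (r/sigma) * c * (l*sigma)^2 over r*c*l^2."""
    return (1.0 / sigma) * sigma ** 2


def global_delay_factor(sigma):
    """Delay ratio for a global wire: (r/sigma) * (c/sigma) * l^2 over r*c*l^2."""
    return (1.0 / sigma) * (1.0 / sigma)


SIGMA = 0.7  # assumed scale factor per technology generation
print(f"local wire delay factor : {local_delay_factor(SIGMA):.2f}")
print(f"global wire delay factor: {global_delay_factor(SIGMA):.2f}")
print(f"global, after 3 generations: {global_delay_factor(SIGMA) ** 3:.1f}x")
```

With these assumptions the global factor is 1/0.49, slightly more than 2 per generation, and it compounds across generations unless buffers break the quadratic dependence on length.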

Delay with technology scaling

Old Style Design Flow

Problems
WLMs are statistical:
– Inaccurate
RTL is written without knowledge of physical hierarchy:
– Top level nets
– Pins on physical blocks
– Obstruction, blockages and power straps
– RAM, Macro locations
Constraints are estimated

More issues... Handoff: the synthesized netlist is thrown "over-the-wall" to the vendor/foundry, and it can take weeks to get results.

Wire Load Models (WLM): With traditional flows, all nets with the same fanout have the same estimated interconnect delay during front-end design.

Estimated vs. Real After placement, it is obvious that nets with the same fanout will not have the same interconnect delay
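The gap between the two views can be sketched in a few lines. Everything here is a hypothetical illustration: the wire load table, the capacitance per micron, and the use of half-perimeter wirelength as a stand-in for the placed net length are assumptions, not values or methods from the slides.

```python
# Hypothetical wire load table: fanout -> estimated net capacitance in fF.
WLM_CAP_FF = {1: 5.0, 2: 9.0, 3: 14.0, 4: 20.0}
CAP_PER_UM_FF = 0.2  # assumed wire capacitance per micron, in fF


def wlm_cap(fanout):
    """Wire load model: the estimate depends on fanout only."""
    return WLM_CAP_FF[min(fanout, max(WLM_CAP_FF))]


def hpwl(pins):
    """Half-perimeter wirelength of the pins' bounding box, in microns."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def placed_cap(pins):
    """Capacitance estimate once pin locations are known."""
    return hpwl(pins) * CAP_PER_UM_FF


# Two nets with one driver and two sinks each (fanout 2), placed differently.
near_net = [(0, 0), (10, 5), (5, 10)]
far_net = [(0, 0), (500, 200), (100, 900)]

for name, pins in (("near", near_net), ("far", far_net)):
    fanout = len(pins) - 1
    print(f"{name}: WLM {wlm_cap(fanout):.1f} fF, placed {placed_cap(pins):.1f} fF")
```

Both nets get the same wire load estimate, while the placement-based estimates differ by almost two orders of magnitude.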

Placement Issues
Place/Route:
– Congestion due to flat placement
– Large number of ECOs perturbs the design: a good flat placement can be destroyed by a large ECO
– Convergence becomes a moving target

More problems with loops...
Feedback to Synthesis:
– Huge SDF and set_load files
Synthesis:
– On the critical path, if synthesis needs to "rip up" the path, new nets may be based on CWLM
– Capacity and runtime: throwing a huge (full chip) problem to synthesis

Towards timing closure

Where is the problem? Timing after routing is only slightly different from timing after placement. The inaccuracy lies between synthesis and placement.

Possible solutions?
– Get Physical Information Earlier: understand placement of macros, RAMs, power, etc., and get better constraints for synthesis
– Hierarchical Logical/Physical Design: can isolate timing problems more efficiently; ECOs can be done with minimal impact
– Timing Predictability and Analysis: can understand the impact of design tradeoffs
– Put Synthesis and Placement Together: eliminates the manual iteration loop altogether

Need for Physical Synthesis

Outline: High End ASIC Design
– Reference point: "typical" design
– Use of hierarchy in design
– Handoff from logic design to physical design
– Physical tie-ins to synthesis
– Noise problems

Typical ASIC Chip
Borderline "SOC":
– Video Graphic Chip
– Network interface/router chip
– 0.18 u technology, layers
Design:
– 5 large blocks, each with ~12 RAMs
– 5K pins, 250K instances
– 5 global clocks, 200 derived/gated clocks
– 27K lines of timing constraints, e.g.
  set_output_delay -clock Clk1 -rise -min -add_delay [get_ports {MemWriteBus[3]}]
  set_false_path -setup -rise -from [get_pins {GR_FE_STAGE1_CNTRL_MISC24BIT_REG_1_16A/CK}]
Care abouts:
– Correctness
– Timing convergence
– On time delivery

ASIC Flow: Got Hierarchy?
All chips have hard blocks in them.
The percentage of design starts that are hierarchical increases yearly:
– methodology: insulates sub-projects
– tool reasons: capacity
– IP reasons: may not have control over some blocks
High-end ASIC chips are "hierarchical" and have soft blocks that are placed independently.

Partitioning: Partition the design so that top level logic modules = physical clusters. Physical sub-blocks are flattened in floorplanning.

Logic to Physical Flow (simplified), from RTL to GDSII:
– Chip RTL Planning: Synthesis1, Floorplan Generation, Chip Level Time Budgeting
– Block Implementation: Synthesis2 & Placement, Block routing
– Chip Integration: Chip Timing Closure (Pins, Buffers, Global Routing)
– Finalize Top Level Routing, Extraction, Address (or ignore) Signal Integrity Issues, LVS+DRC
(Figure labels: Design Planner, Synthesis/Place, Chip Assembly, Chip Finishing.)

Black Box from Initial RTL model

Tying Logical to Physical Blocks

Getting More Physical

Realistic Timing Numbers after Block Level Design

Key to Timing Closure: Time Budgeting. Applied at many levels: chip, block, path... Starts after entities are identified, not necessarily defined. (Figure labels: Constraints, Parasitics.)

Block Implementation: Core of Physical Synthesis
Physical Synthesis turns generic logic gates with no placement into optimized gates with detailed placement.
Requires context from the chip:
– shape
– obstructions
– pin positions
– timing constraints, etc.

Physical Synthesis Flow: RTL to Placed Netlist

Timing Effect of Physical Synthesis (Figure: four panels with bars labeled Positive and Negative, for a design of 700K gates, 0.25u, 10 ns, 32 RAMs. Panels: After Normal P&R; After Normal P&R + Post Optimization; After PhysicalCompiler Placement; After Wroute with PhysicalCompiler Placement.)

Physical Synthesis Details

Timing Calculation: Calculations use pin-to-pin Steiner routing. Each net is calculated individually. No wireload models are used.
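A per-net estimate of this kind can be sketched as follows. This is not the tool's algorithm: as a stated simplification it uses a rectilinear minimum spanning tree (Prim's algorithm), which is an upper bound on the Steiner tree length, and a lumped delay model. The driver resistance, capacitance per micron and pin capacitance are hypothetical values.

```python
def manhattan(a, b):
    """Rectilinear distance between two (x, y) pin locations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rectilinear_mst_length(pins):
    """Length of a rectilinear minimum spanning tree (Prim's algorithm).

    Used here as a simple upper bound on the Steiner tree length.
    """
    if len(pins) < 2:
        return 0
    # Cheapest known connection from the growing tree to each unconnected pin.
    best = {i: manhattan(pins[0], pins[i]) for i in range(1, len(pins))}
    total = 0
    while best:
        nxt = min(best, key=best.get)
        total += best.pop(nxt)
        for i in best:
            best[i] = min(best[i], manhattan(pins[nxt], pins[i]))
    return total


def net_delay_ps(pins, r_driver=1000.0, cap_per_um_ff=0.2, pin_cap_ff=2.0):
    """Lumped estimate: driver resistance times (wire + sink pin capacitance).

    pins[0] is the driver; all parameter defaults are illustrative assumptions.
    """
    wire_cap_ff = rectilinear_mst_length(pins) * cap_per_um_ff
    load_ff = wire_cap_ff + pin_cap_ff * (len(pins) - 1)
    return r_driver * load_ff * 1e-3  # ohm * fF = 1e-15 s = 1e-3 ps


net = [(0, 0), (300, 0), (300, 400)]
print(f"tree length {rectilinear_mst_length(net)} um, "
      f"delay {net_delay_ps(net):.1f} ps")
```

Because the length comes from the actual pin locations of each net, two nets with the same fanout no longer receive the same delay.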

Timing Calculation (Cont.)‏

Integration of Physical Synthesis: Linking with Test. Path reordering can be chosen to reduce wiring congestion.

Physical Synthesis & Power Analysis

Phy. Synthesis & Power Optimization (Figure labels: Power Consumed = 2.4W; Power Consumed = 1.2W.)
– Gates can be resynthesized based on wire length
– Gated clocks can be inserted by proximity

Assembly of Blocks into chip

Chip Level Timing Closure (Figure: Block A, Block B and I/O.)