Presentation is loading. Please wait.

Presentation is loading. Please wait.

Physical Synthesis Ing. Pullini Antonio

Similar presentations


Presentation on theme: "Physical Synthesis Ing. Pullini Antonio"— Presentation transcript:

1 Physical Synthesis Ing. Pullini Antonio antonio.pullini@epfl.ch

2 Delay of Interconnects Scaling Trends WLM and their limits Physical Synthesis Outline

3 Interconnects

4 past present future Wire resistance and metal migration force lower resistance and therefore ‘taller’ geometry Capacitance couples to neighbors Total capacitance does not get smaller!

5 Early Models Wire width feature size Older technology had wide wires More cross-section area implies less resistance and more capacitance. Model wire only with capacitance L H W 

6 However... With scaling, width of wire reduced. Resistance of the wire no longer negligible. Lumped RC is good enough approximation. L H W 

7 Interconnect Resistance Ohm’s Law: Resistance of wire proportional to wire length (L) and inversely proportional to cross-section(HW)‏ resistivity is the property of the material. L H W

8 Interconnect Capacitance Capacitance of a wire = f (Shape, Distance to surrounding wires, Distance to the substrate )‏ Estimating Capacitance is a matter of determining where the field lines go. To get an accurate estimate electric field solvers (2D or 3D) are used. E.g. Fastcap or Rafael

9 Area Capacitance Dielectric Substrate Current W L H t di Electric Fields

10 Fringing Capacitance Conductor Fringing Fields  H w + w  W-H/2

11 Todays Interconnect

12 Timing Models Older Lumped RC no more valid At todays frequencies wavelenght comparable to wire length We need to consider interconnect as a distributed sistem

13 Elmore Delay Elmore analyzed the distributed model and came up with the figures for delay. V in V out R1R1 C1C1 R2R2 C2C2 12 R i-1 C i-1 i-1 RiRi CiCi i R N-1 C N-1 N-1 RNRN CNCN N

14 Wire Model Assume: Wire modeled by N equal-length segments For large values of N:

15 Ideal Scaling Ideal process scaling: –Device geometries shrink by  = 0.7x)‏ Device delay shrinks by  –Wire geometries shrink by  but h shrink less R/  :  /(w .h) = r/  Cc/  : (h).  /(S  ) = cc/  C/  :  (w  ).(l  )  tdi  ‏ R/  rise, C/  scale, Cc/  rise SGD h w l S ll h SS ww

16 Interconnect Role Short interconnect –Used to connect nearby cells, R driver >> R interconnect –Minimize wire C, i.e., use short minwidth wires Medium to long-distance (“global”) interconnect –R driver  R interconnect –Size wires to tradeoff area vs. delay –Increasing width  Capacitance increases, Resistance decreases Need to find acceptable tradeoff - wire sizing problem “Fat” wires –Thicker cross-sections in higher metal layers –Useful for reducing delays for global wires –Inductance issues, sharing of limited resource

17 Block Scaling Block area often stays same # cells, # nets doubles Global interconnect lengths don’t shrink Local interconnect lengths shrink

18 Microprocessor Interconnect Global Interconnect S Local = S Technology S Global = S Die Source: Intel

19 Interconnect Delay Scaling Delay of a wire of length l : –  int = (rl)(cl) = rcl 2 (first order)‏ Local interconnects : –  int = (r/  )(c)(l  ) 2 = rcl 2  – Local interconnect delay unchanged (compare to faster devices)‏ Global interconnects : –  int = (r/  )(c/  )(l) 2 = (rcl 2 )/  2 – Global interconnect delay doubles – unsustainable! – Problem somewhat mitigated using buffers Interconnect delay increasingly more dominant

20 Delay with technology scaling

21 Old Style Design Flow

22 Problems WLMs are statistical – Inaccurate RTL is written without knowledge of physical hierarchy – Top level nets – Pins on physical blocks – Obstruction, blockages and power straps – RAM, Macro locations Constraints are estimated

23 More issues... Handoff – Synthesized netlist is thrown “over-the-wall” to the vendor/foundry, can take weeks to get results

24 Wire Load Models (WLM)‏ With traditional flows, all nets with the same fanout have the same estimated interconnect delay during front-end design

25 Estimated vs. Real After placement, it is obvious that nets with the same fanout will not have the same interconnect delay

26 Placement Issues Place/Route – Congestion due to flat placement – Large number of ECOs perturbs design Good flat placement can be destroyed by a large ECO Convergence becomes a moving target

27 More problems with loops... Feedback to Synthesis – Huge SDF and set_load files Synthesis – On the critical path, if synthesis needs to “rip up” the path, new nets may be based on CWLM – Capacity and runtime…throwing a huge (full chip) problem to synthesis

28 Towards timing closure

29 Where is the problem? Timing after routing is only slightly different than after placement Unacuracy is between synthesis and placement

30 Possible solutions? Get Physical Information Earlier – Understand placement of macros, RAMs, power, etc., and better constraints for synthesis Hierarchical Logical/Physical Design – Can isolate timing problems more efficiently, ECOs can be done with minimal impact Timing Predictability and Analysis – Can understand impact of design tradeoffs Put Synthesis and Placement Together – Eliminates the manual iteration loop altogether

31 Need for Physical Synthesis

32 Outline: High End ASIC Design Reference point: “typical” design Use of hierarchy in design Handoff from logic design to physical design Physical tie-ins to synthesis Noise problems

33 Borderline “SOC” – Video Graphic Chip – Network interface/router chip – 0.18 u technology, 6 - 8 layers Design – 5 large blocks, each with ~12 RAMS 5K pins 250K instances 5 global clocks, 200 derived/gated clocks 27K lines of timing constraints – set_output_delay 4000 -clock Clk1 -rise -min -add_delay [get_ports {MemWriteBus[3]} – set_false_path -setup -rise -from [get_pins {GR_FE_STAGE1_CNTRL_MISC24BIT_REG_1_16A/CK}] Care abouts – Correctness – Timing convergence – On time delivery Typical ASIC Chip

34 ASIC Flow: Got Hierarchy? All chips have hard blocks in them Percent of design starts that are hierarchical increases yearly – methodology: insulates sub-projects – tool reasons: capacity – IP reasons: may not have control over some blocks high-end ASIC chips are “hierarchical” and have soft blocks that are placed independently

35 Partitioning Partition design so that top level logic modules = physical clusters Physical sub-blocks are flattened in floorplanning

36 Logic to Physical Flow (simplified)‏ Design Planner Chip Finishing Chip Assembly Chip RTL Planning Synthesis1, Floorplan Generation, Chip Level Time Budgeting Block Implementation Synthesis2 & Placement Block routing Chip Integration Chip Timing Closure: Pins, Buffers, Global Routing Finalize Top Level Routing, Extraction, Address (or ignore) Signal Integrity Issues, LVS+DRC Synthesis/Place RTL GDSII

37 Black Box from Initial RTL model

38 Tying Logical to Physical Blocks

39 Getting More Physical

40 Realistic Timing Numbers after Block Level Design

41 Constraints Parasitics Key to Timing Closure: Time Budgeting Applied at many levels: chip, block, path... Start after entities are identified, not necessarily defined

42 Block Implementation: Core of Physical Synthesis Physical Synthesis – turn generic logic gates with no placement into – optimized gates with detailed placement Requires context from chip: – shape – obstructions – pin positions – timing constraints, etc

43 Physical Synthesis Flow RTL to Placed Netlist

44 Timing Effect of Physical Synthesis After Normal P&R After Normal P&R + Post Optimization After Wroute with PhysicalCompiler Placement After PhysicalCompiler Placement Positive Negative 700K gates 0.25u 10 ns 32 RAMs 700K gates 0.25u 10 ns 32 RAMs

45 Physical Synthesis Details

46 Timing Calculation Calculations use pin-to-pin Steiner Routing Each net is calculated individually No wireload models are used

47 Timing Calculation (Cont.)‏

48 Integration of Physical Synthesis: Linking with Test Path reordering can be chosen to reduce wiring congestion

49 Physical Synthesis & Power Analysis

50 Phy. Synthesis & Power Optimization Power Consumed = 2.4W Power Consumed = 1.2W - Gates can be resynthesized based on wire length - Gated clocks can be inserted by proximity

51 Assembly of Blocks into chip

52 Block B Block A I/O Block A Chip Level Timing Closure


Download ppt "Physical Synthesis Ing. Pullini Antonio"

Similar presentations


Ads by Google