Download presentation
Presentation is loading. Please wait.
1
Physical Synthesis Ing. Pullini Antonio antonio.pullini@epfl.ch
2
Delay of Interconnects Scaling Trends WLM and their limits Physical Synthesis Outline
3
Interconnects
4
past present future Wire resistance and metal migration force lower resistance and therefore ‘taller’ geometry Capacitance couples to neighbors Total capacitance does not get smaller!
5
Early Models Wire width feature size Older technology had wide wires More cross-section area implies less resistance and more capacitance. Model wire only with capacitance L H W
6
However... With scaling, width of wire reduced. Resistance of the wire no longer negligible. Lumped RC is good enough approximation. L H W
7
Interconnect Resistance Ohm’s Law: Resistance of wire proportional to wire length (L) and inversely proportional to cross-section(HW) resistivity is the property of the material. L H W
8
Interconnect Capacitance Capacitance of a wire = f (Shape, Distance to surrounding wires, Distance to the substrate ) Estimating Capacitance is a matter of determining where the field lines go. To get an accurate estimate electric field solvers (2D or 3D) are used. E.g. Fastcap or Rafael
9
Area Capacitance Dielectric Substrate Current W L H t di Electric Fields
10
Fringing Capacitance Conductor Fringing Fields H w + w W-H/2
11
Todays Interconnect
12
Timing Models Older Lumped RC no more valid At todays frequencies wavelenght comparable to wire length We need to consider interconnect as a distributed sistem
13
Elmore Delay Elmore analyzed the distributed model and came up with the figures for delay. V in V out R1R1 C1C1 R2R2 C2C2 12 R i-1 C i-1 i-1 RiRi CiCi i R N-1 C N-1 N-1 RNRN CNCN N
14
Wire Model Assume: Wire modeled by N equal-length segments For large values of N:
15
Ideal Scaling Ideal process scaling: –Device geometries shrink by = 0.7x) Device delay shrinks by –Wire geometries shrink by but h shrink less R/ : /(w .h) = r/ Cc/ : (h). /(S ) = cc/ C/ : (w ).(l ) tdi R/ rise, C/ scale, Cc/ rise SGD h w l S ll h SS ww
16
Interconnect Role Short interconnect –Used to connect nearby cells, R driver >> R interconnect –Minimize wire C, i.e., use short minwidth wires Medium to long-distance (“global”) interconnect –R driver R interconnect –Size wires to tradeoff area vs. delay –Increasing width Capacitance increases, Resistance decreases Need to find acceptable tradeoff - wire sizing problem “Fat” wires –Thicker cross-sections in higher metal layers –Useful for reducing delays for global wires –Inductance issues, sharing of limited resource
17
Block Scaling Block area often stays same # cells, # nets doubles Global interconnect lengths don’t shrink Local interconnect lengths shrink
18
Microprocessor Interconnect Global Interconnect S Local = S Technology S Global = S Die Source: Intel
19
Interconnect Delay Scaling Delay of a wire of length l : – int = (rl)(cl) = rcl 2 (first order) Local interconnects : – int = (r/ )(c)(l ) 2 = rcl 2 – Local interconnect delay unchanged (compare to faster devices) Global interconnects : – int = (r/ )(c/ )(l) 2 = (rcl 2 )/ 2 – Global interconnect delay doubles – unsustainable! – Problem somewhat mitigated using buffers Interconnect delay increasingly more dominant
20
Delay with technology scaling
21
Old Style Design Flow
22
Problems WLMs are statistical – Inaccurate RTL is written without knowledge of physical hierarchy – Top level nets – Pins on physical blocks – Obstruction, blockages and power straps – RAM, Macro locations Constraints are estimated
23
More issues... Handoff – Synthesized netlist is thrown “over-the-wall” to the vendor/foundry, can take weeks to get results
24
Wire Load Models (WLM) With traditional flows, all nets with the same fanout have the same estimated interconnect delay during front-end design
25
Estimated vs. Real After placement, it is obvious that nets with the same fanout will not have the same interconnect delay
26
Placement Issues Place/Route – Congestion due to flat placement – Large number of ECOs perturbs design Good flat placement can be destroyed by a large ECO Convergence becomes a moving target
27
More problems with loops... Feedback to Synthesis – Huge SDF and set_load files Synthesis – On the critical path, if synthesis needs to “rip up” the path, new nets may be based on CWLM – Capacity and runtime…throwing a huge (full chip) problem to synthesis
28
Towards timing closure
29
Where is the problem? Timing after routing is only slightly different than after placement Unacuracy is between synthesis and placement
30
Possible solutions? Get Physical Information Earlier – Understand placement of macros, RAMs, power, etc., and better constraints for synthesis Hierarchical Logical/Physical Design – Can isolate timing problems more efficiently, ECOs can be done with minimal impact Timing Predictability and Analysis – Can understand impact of design tradeoffs Put Synthesis and Placement Together – Eliminates the manual iteration loop altogether
31
Need for Physical Synthesis
32
Outline: High End ASIC Design Reference point: “typical” design Use of hierarchy in design Handoff from logic design to physical design Physical tie-ins to synthesis Noise problems
33
Borderline “SOC” – Video Graphic Chip – Network interface/router chip – 0.18 u technology, 6 - 8 layers Design – 5 large blocks, each with ~12 RAMS 5K pins 250K instances 5 global clocks, 200 derived/gated clocks 27K lines of timing constraints – set_output_delay 4000 -clock Clk1 -rise -min -add_delay [get_ports {MemWriteBus[3]} – set_false_path -setup -rise -from [get_pins {GR_FE_STAGE1_CNTRL_MISC24BIT_REG_1_16A/CK}] Care abouts – Correctness – Timing convergence – On time delivery Typical ASIC Chip
34
ASIC Flow: Got Hierarchy? All chips have hard blocks in them Percent of design starts that are hierarchical increases yearly – methodology: insulates sub-projects – tool reasons: capacity – IP reasons: may not have control over some blocks high-end ASIC chips are “hierarchical” and have soft blocks that are placed independently
35
Partitioning Partition design so that top level logic modules = physical clusters Physical sub-blocks are flattened in floorplanning
36
Logic to Physical Flow (simplified) Design Planner Chip Finishing Chip Assembly Chip RTL Planning Synthesis1, Floorplan Generation, Chip Level Time Budgeting Block Implementation Synthesis2 & Placement Block routing Chip Integration Chip Timing Closure: Pins, Buffers, Global Routing Finalize Top Level Routing, Extraction, Address (or ignore) Signal Integrity Issues, LVS+DRC Synthesis/Place RTL GDSII
37
Black Box from Initial RTL model
38
Tying Logical to Physical Blocks
39
Getting More Physical
40
Realistic Timing Numbers after Block Level Design
41
Constraints Parasitics Key to Timing Closure: Time Budgeting Applied at many levels: chip, block, path... Start after entities are identified, not necessarily defined
42
Block Implementation: Core of Physical Synthesis Physical Synthesis – turn generic logic gates with no placement into – optimized gates with detailed placement Requires context from chip: – shape – obstructions – pin positions – timing constraints, etc
43
Physical Synthesis Flow RTL to Placed Netlist
44
Timing Effect of Physical Synthesis After Normal P&R After Normal P&R + Post Optimization After Wroute with PhysicalCompiler Placement After PhysicalCompiler Placement Positive Negative 700K gates 0.25u 10 ns 32 RAMs 700K gates 0.25u 10 ns 32 RAMs
45
Physical Synthesis Details
46
Timing Calculation Calculations use pin-to-pin Steiner Routing Each net is calculated individually No wireload models are used
47
Timing Calculation (Cont.)
48
Integration of Physical Synthesis: Linking with Test Path reordering can be chosen to reduce wiring congestion
49
Physical Synthesis & Power Analysis
50
Phy. Synthesis & Power Optimization Power Consumed = 2.4W Power Consumed = 1.2W - Gates can be resynthesized based on wire length - Gated clocks can be inserted by proximity
51
Assembly of Blocks into chip
52
Block B Block A I/O Block A Chip Level Timing Closure
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.