Physical Synthesis Ing. Pullini Antonio
Outline
– Delay of Interconnects
– Scaling Trends
– WLM and their limits
– Physical Synthesis
Interconnects
[Figure: wire cross-sections: past, present, future]
Wire resistance and metal migration force lower resistance and therefore ‘taller’ geometry.
Capacitance couples to neighbors.
Total capacitance does not get smaller!
Early Models
Wire width and feature size: older technology had wide wires.
More cross-section area implies less resistance and more capacitance.
Model the wire with capacitance only.
[Figure: wire of length L, height H, width W]
However...
With scaling, the width of the wire is reduced.
The resistance of the wire is no longer negligible.
A lumped RC model is a good enough approximation.
[Figure: wire of length L, height H, width W]
Interconnect Resistance
Ohm’s Law: the resistance of a wire is proportional to its length (L) and inversely proportional to its cross-section (HW): $R = \rho L / (H W)$.
The resistivity $\rho$ is a property of the material.
[Figure: wire of length L, height H, width W]
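The proportionality above can be checked with a few lines of Python. This is only a sketch: the function name `wire_resistance`, the wire dimensions, and the approximate bulk-copper resistivity are illustrative assumptions, not values from the slides.

```python
def wire_resistance(resistivity, length, height, width):
    """Resistance of a rectangular wire, R = rho * L / (H * W), in SI units."""
    if height <= 0 or width <= 0:
        raise ValueError("cross-section dimensions must be positive")
    return resistivity * length / (height * width)


RHO_CU = 1.7e-8  # ohm*m, approximate bulk copper (assumed value)

# Hypothetical wire: 1 mm long, 0.5 um tall, 0.2 um wide.
r_narrow = wire_resistance(RHO_CU, 1e-3, 0.5e-6, 0.2e-6)
# Same wire at twice the width: the resistance halves.
r_wide = wire_resistance(RHO_CU, 1e-3, 0.5e-6, 0.4e-6)

print(f"narrow: {r_narrow:.1f} ohm, wide: {r_wide:.1f} ohm")
```

Doubling the width (or the height) halves the resistance, which is why the trend toward narrower wires pushes designers toward taller cross-sections.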
Interconnect Capacitance
Capacitance of a wire = f(shape, distance to surrounding wires, distance to the substrate).
Estimating capacitance is a matter of determining where the field lines go.
For an accurate estimate, electric field solvers (2D or 3D) are used, e.g. FastCap or Raphael.
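Area Capacitance
[Figure: wire of width W, length L and height H carrying current over a dielectric of thickness t_di above the substrate, with electric field lines between wire and substrate]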
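The area (parallel-plate) component suggested by the figure labels can be sketched as follows. The formula is the standard parallel-plate approximation, $C = \varepsilon_{di} W L / t_{di}$, and it ignores fringing fields. The function name and the example values (an oxide-like relative permittivity of 3.9 and the wire dimensions) are assumptions for illustration only.

```python
EPS0 = 8.854e-12  # F/m, vacuum permittivity


def area_capacitance(eps_r, width, length, t_di):
    """Parallel-plate capacitance of a wire over the substrate (SI units).

    Ignores fringing fields and coupling to neighboring wires.
    """
    if t_di <= 0:
        raise ValueError("dielectric thickness must be positive")
    return eps_r * EPS0 * width * length / t_di


# Hypothetical wire: 0.2 um wide, 1 mm long, 0.5 um above the substrate.
c_narrow = area_capacitance(3.9, 0.2e-6, 1e-3, 0.5e-6)
# Doubling the width doubles the area capacitance.
c_wide = area_capacitance(3.9, 0.4e-6, 1e-3, 0.5e-6)

print(f"narrow: {c_narrow * 1e15:.2f} fF, wide: {c_wide * 1e15:.2f} fF")
```

Widening the wire lowers its resistance but raises this capacitance in proportion, so the RC product of the area term alone does not improve.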
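Fringing Capacitance
[Figure: conductor cross-section of width W and height H with fringing fields, redrawn as a plate of width w plus rounded edges, with w = W - H/2]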
Today's Interconnect
Timing Models
The older lumped RC model is no longer valid.
At today's frequencies the wavelength is comparable to the wire length.
We need to consider the interconnect as a distributed system.
Elmore Delay
Elmore analyzed the distributed model and derived an expression for the delay.
[Figure: RC ladder from V_in to V_out with nodes 1, 2, ..., i-1, i, ..., N-1, N; node i has series resistance R_i and capacitance C_i to ground]
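Wire Model
Assume: wire modeled by N equal-length segments.
For large values of N: $\tau = rcL^2/2 = RC/2$, where r and c are the resistance and capacitance per unit length, R = rL and C = cL.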
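The behavior for large N can be checked numerically. The sketch below computes the Elmore delay of an RC ladder, where each capacitor is charged through all the resistance upstream of it, and applies it to a wire cut into N equal segments. Function names and the normalized values r = c = L = 1 are illustrative assumptions.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the last node of an RC ladder.

    Each capacitor C_i is weighted by the total series resistance
    between the input and node i.
    """
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r
        delay += upstream_r * c
    return delay


def uniform_wire_delay(r_per_len, c_per_len, length, n_segments):
    """Elmore delay of a wire modeled by n_segments equal RC sections."""
    seg = length / n_segments
    rs = [r_per_len * seg] * n_segments
    cs = [c_per_len * seg] * n_segments
    return elmore_delay(rs, cs)


# Normalized wire (r = c = L = 1), so the lumped estimate RC equals 1.
for n in (1, 2, 10, 1000):
    print(n, uniform_wire_delay(1.0, 1.0, 1.0, n))
```

For N segments the result is RC(N + 1)/(2N): a single lumped section gives RC, while many sections approach RC/2, so the lumped model overestimates the delay of a distributed wire by up to a factor of two.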
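Ideal Scaling
Ideal process scaling:
– Device geometries shrink by $\sigma$ (= 0.7x); device delay shrinks by $\sigma$
– Wire geometries shrink by $\sigma$, but h shrinks less
  R/µm: $\rho / (w\sigma \cdot h) = r/\sigma$
  Cc/µm: $h \cdot \varepsilon / (S\sigma) = c_c/\sigma$
  C/µm: $(w\sigma)(l\sigma)\,\varepsilon / t_{di}$
– R/µm rises, C/µm scales, Cc/µm rises
[Figure: transistor (S, G, D) and wires of height h, width w, length l and spacing S, before and after scaling]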
Interconnect Role
Short interconnect
– Used to connect nearby cells, R_driver >> R_interconnect
– Minimize wire C, i.e., use short min-width wires
Medium to long-distance (“global”) interconnect
– R_driver ≈ R_interconnect
– Size wires to trade off area vs. delay
– Increasing width: capacitance increases, resistance decreases
– Need to find an acceptable tradeoff: the wire sizing problem
“Fat” wires
– Thicker cross-sections in higher metal layers
– Useful for reducing delays for global wires
– Inductance issues, sharing of limited resource
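The wire sizing tradeoff can be illustrated with a small sweep. The model below is a sketch under stated assumptions, not the method of any particular tool: the driver is a resistor, the wire is one distributed RC section (its own capacitance counted at half weight, as in the Elmore approximation), and every parameter value is made up for the example.

```python
def driver_wire_delay(width_um, length_um, r_driver, r_sheet,
                      c_area, c_fringe, c_load):
    """Elmore-style delay (ohm * fF = fs) of a driver, a wire and a load.

    r_sheet is in ohm/square, c_area in fF/um^2, c_fringe in fF/um.
    """
    r_wire = r_sheet * length_um / width_um               # ohm
    c_wire = (c_area * width_um + c_fringe) * length_um   # fF
    return r_driver * (c_wire + c_load) + r_wire * (c_wire / 2 + c_load)


def best_width(widths, **params):
    """Return the candidate width with the smallest delay."""
    return min(widths, key=lambda w: driver_wire_delay(w, **params))


# Hypothetical 5 mm global wire; all values are illustrative assumptions.
PARAMS = dict(length_um=5000.0, r_driver=100.0, r_sheet=0.08,
              c_area=0.04, c_fringe=0.08, c_load=20.0)
WIDTHS = [round(0.5 + 0.1 * i, 1) for i in range(46)]  # 0.5 .. 5.0 um

w_opt = best_width(WIDTHS, **PARAMS)
print(w_opt, driver_wire_delay(w_opt, **PARAMS))
```

With these numbers the minimum falls at about 2.1 um: narrower wires are dominated by wire resistance, wider ones by the extra capacitance the driver has to charge. A real flow would also weigh the routing area that a wider wire consumes.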
Block Scaling
– Block area often stays same
– # cells, # nets doubles
– Global interconnect lengths don’t shrink
– Local interconnect lengths shrink
Microprocessor Interconnect
Local interconnect: S_Local = S_Technology
Global interconnect: S_Global = S_Die
Source: Intel
Interconnect Delay Scaling
Delay of a wire of length l:
– $\tau_{int} = (rl)(cl) = rcl^2$ (first order)
Local interconnects (length shrinks by the scale factor $\sigma$):
– $\tau_{int} = (r/\sigma)(c/\sigma)(l\sigma)^2 = rcl^2$
– Local interconnect delay unchanged (compare to faster devices)
Global interconnects (length does not shrink):
– $\tau_{int} = (r/\sigma)(c/\sigma)(l)^2 = (rcl^2)/\sigma^2$
– Global interconnect delay doubles: unsustainable!
– Problem somewhat mitigated using buffers
Interconnect delay is increasingly dominant.
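The two scaling cases can be verified with a few lines of arithmetic. This sketch assumes a scale factor of 0.7 (called `sigma` here), per-unit-length resistance and capacitance that both grow by 1/sigma, and the first-order delay r·c·l². The function names are illustrative.

```python
def wire_delay(r, c, length):
    """First-order wire delay, r * c * l^2 (r and c per unit length)."""
    return r * c * length ** 2


def scaled_delay_ratio(sigma, length_shrinks):
    """Ratio of wire delay after one process generation to the delay before.

    Assumes r and c per unit length both grow by 1/sigma; the length
    shrinks by sigma for local wires and stays fixed for global wires.
    """
    r, c, length = 1.0, 1.0, 1.0
    before = wire_delay(r, c, length)
    new_length = length * sigma if length_shrinks else length
    after = wire_delay(r / sigma, c / sigma, new_length)
    return after / before


local_ratio = scaled_delay_ratio(0.7, length_shrinks=True)
global_ratio = scaled_delay_ratio(0.7, length_shrinks=False)
print(f"local: {local_ratio:.2f}x, global: {global_ratio:.2f}x")
```

The local ratio is 1 (unchanged in absolute terms, but worse relative to devices that got faster) and the global ratio is 1/0.49, roughly 2, which is the doubling referred to above.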
Delay with technology scaling
Old Style Design Flow
Problems
WLMs are statistical
– Inaccurate
RTL is written without knowledge of physical hierarchy
– Top level nets
– Pins on physical blocks
– Obstruction, blockages and power straps
– RAM, Macro locations
Constraints are estimated
More issues... Handoff – Synthesized netlist is thrown “over-the-wall” to the vendor/foundry, can take weeks to get results
Wire Load Models (WLM)
With traditional flows, all nets with the same fanout have the same estimated interconnect delay during front-end design.
Estimated vs. Real
After placement, it is obvious that nets with the same fanout will not have the same interconnect delay.
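The gap between the two views can be made concrete with a toy example. The wire load table, the capacitance per micron, and the pin coordinates below are all hypothetical, and the half-perimeter bounding box is used only as a simple stand-in for the wire length of a placed net.

```python
# Hypothetical wire load model: fanout -> estimated wire capacitance (pF).
WLM_CAP_BY_FANOUT = {1: 0.010, 2: 0.018, 3: 0.027, 4: 0.037}
CAP_PER_UM = 0.0002  # pF per um of wire (assumed value)


def wlm_cap(fanout):
    """Statistical estimate: depends only on fanout, not on placement."""
    largest = max(WLM_CAP_BY_FANOUT)
    return WLM_CAP_BY_FANOUT[min(fanout, largest)]


def half_perimeter(pins):
    """Half-perimeter of the bounding box of the pin locations, in um."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def placed_cap(pins):
    """Placement-based estimate from the net's bounding box."""
    return CAP_PER_UM * half_perimeter(pins)


# Two nets, each with one driver and two sinks (fanout 2), in um.
net_a = [(0, 0), (20, 10), (15, 30)]      # pins close together
net_b = [(0, 0), (400, 50), (100, 600)]   # pins spread across the block

for name, net in (("A", net_a), ("B", net_b)):
    fanout = len(net) - 1
    print(name, wlm_cap(fanout), placed_cap(net))
```

Both nets get the same wire load estimate because they have the same fanout, while the placement-based estimates differ by a factor of 20 in this example.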
Placement Issues
Place/Route
– Congestion due to flat placement
– Large number of ECOs perturbs design
Good flat placement can be destroyed by a large ECO.
Convergence becomes a moving target.
More problems with loops...
Feedback to Synthesis
– Huge SDF and set_load files
Synthesis
– On the critical path, if synthesis needs to “rip up” the path, new nets may be based on CWLM
– Capacity and runtime…throwing a huge (full chip) problem to synthesis
Towards timing closure
Where is the problem?
Timing after routing is only slightly different from timing after placement.
The inaccuracy is between synthesis and placement.
Possible solutions?
Get Physical Information Earlier
– Understand placement of macros, RAMs, power, etc., and get better constraints for synthesis
Hierarchical Logical/Physical Design
– Can isolate timing problems more efficiently, ECOs can be done with minimal impact
Timing Predictability and Analysis
– Can understand impact of design tradeoffs
Put Synthesis and Placement Together
– Eliminates the manual iteration loop altogether
Need for Physical Synthesis
Outline: High End ASIC Design
– Reference point: “typical” design
– Use of hierarchy in design
– Handoff from logic design to physical design
– Physical tie-ins to synthesis
– Noise problems
Typical ASIC Chip
Borderline “SOC”
– Video Graphic Chip
– Network interface/router chip
– 0.18 u technology, layers
Design
– 5 large blocks, each with ~12 RAMs
– 5K pins
– 250K instances
– 5 global clocks, 200 derived/gated clocks
– 27K lines of timing constraints, e.g.
  set_output_delay -clock Clk1 -rise -min -add_delay [get_ports {MemWriteBus[3]}]
  set_false_path -setup -rise -from [get_pins {GR_FE_STAGE1_CNTRL_MISC24BIT_REG_1_16A/CK}]
Care abouts
– Correctness
– Timing convergence
– On time delivery
ASIC Flow: Got Hierarchy?
All chips have hard blocks in them.
Percent of design starts that are hierarchical increases yearly
– methodology: insulates sub-projects
– tool reasons: capacity
– IP reasons: may not have control over some blocks
High-end ASIC chips are “hierarchical” and have soft blocks that are placed independently.
Partitioning Partition design so that top level logic modules = physical clusters Physical sub-blocks are flattened in floorplanning
Logic to Physical Flow (simplified)
From RTL to GDSII:
– Chip RTL Planning: Synthesis1, Floorplan Generation, Chip Level Time Budgeting
– Block Implementation: Synthesis2 & Placement, Block routing
– Chip Integration: Chip Timing Closure (Pins, Buffers, Global Routing)
– Finalize: Top Level Routing, Extraction, Address (or ignore) Signal Integrity Issues, LVS+DRC
Tools: Design Planner, Synthesis/Place, Chip Assembly, Chip Finishing
Black Box from Initial RTL model
Tying Logical to Physical Blocks
Getting More Physical
Realistic Timing Numbers after Block Level Design
Key to Timing Closure: Time Budgeting
[Figure labels: Constraints, Parasitics]
– Applied at many levels: chip, block, path...
– Start after entities are identified, not necessarily defined
Block Implementation: Core of Physical Synthesis
Physical Synthesis
– turns generic logic gates with no placement into
– optimized gates with detailed placement
Requires context from chip:
– shape
– obstructions
– pin positions
– timing constraints, etc.
Physical Synthesis Flow RTL to Placed Netlist
Timing Effect of Physical Synthesis
[Figure: positive vs. negative timing results for four cases: After Normal P&R; After Normal P&R + Post Optimization; After Wroute with PhysicalCompiler Placement; After PhysicalCompiler Placement]
Design: 700K gates, 0.25u, 10 ns, 32 RAMs
Physical Synthesis Details
Timing Calculation
– Calculations use pin-to-pin Steiner routing
– Each net is calculated individually
– No wireload models are used
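A per-net length estimate of this kind can be sketched as follows. The slides do not say which Steiner algorithm the tool uses, so this example substitutes a rectilinear minimum spanning tree (Prim's algorithm), which connects pins directly and is therefore an upper bound on the rectilinear Steiner tree length. Function names and pin coordinates are illustrative assumptions.

```python
def manhattan(p, q):
    """Rectilinear distance between two pin locations."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def rmst_length(pins):
    """Length of a rectilinear minimum spanning tree over the pins.

    Stand-in for a Steiner estimate: a spanning tree uses no extra
    branch points, so it is never shorter than the Steiner tree.
    """
    if len(pins) < 2:
        return 0
    # best[i] = distance from pin i to the closest pin already in the tree
    best = {i: manhattan(pins[0], pins[i]) for i in range(1, len(pins))}
    total = 0
    while best:
        nxt = min(best, key=best.get)
        total += best.pop(nxt)
        for i in best:
            d = manhattan(pins[nxt], pins[i])
            if d < best[i]:
                best[i] = d
    return total


# Each net is evaluated from its own pin locations (um), not from fanout.
nets = {
    "n1": [(0, 0), (10, 0), (10, 5)],
    "n2": [(0, 0), (2, 0), (0, 2), (2, 2)],
}
for name, pins in nets.items():
    print(name, rmst_length(pins))
```

The resulting length per net can then be multiplied by per-unit-length resistance and capacitance to obtain net-specific parasitics in place of a wire load table entry.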
Timing Calculation (Cont.)
Integration of Physical Synthesis: Linking with Test
Path reordering can be chosen to reduce wiring congestion.
Physical Synthesis & Power Analysis
Phy. Synthesis & Power Optimization
[Figure: two versions of a design, Power Consumed = 2.4W and Power Consumed = 1.2W]
- Gates can be resynthesized based on wire length
- Gated clocks can be inserted by proximity
Assembly of Blocks into chip
Chip Level Timing Closure
[Figure: chip with Block A, Block B and I/O]