Lecture 21: Packaging, Power, & Clock


Outline Packaging Power Distribution Clock Distribution 21: Package, Power, and Clock

Packages: Package functions. Electrical connection of signals and power from chip to board, with little delay or distortion. Mechanical connection of chip to board. Removes heat produced on chip. Protects chip from mechanical damage. Compatible with thermal expansion. Inexpensive to manufacture and test.

Package Types: Through-hole vs. surface mount.

Chip-to-Package Bonding: Traditionally, the chip is surrounded by a pad frame. Metal pads on 100 to 200 µm pitch. Gold bond wires attach pads to package. Lead frame distributes signals in package. Metal heat spreader helps with cooling.

Advanced Packages: Bond wires contribute parasitic inductance. Fancy packages have many signal and power layers, like tiny printed circuit boards. Flip-chip places connections across the surface of the die rather than around the periphery. Top-level metal pads are covered with solder balls. The chip flips upside down, is carefully aligned to the package (done blind!), and is heated to melt the balls. Also called C4 (Controlled Collapse Chip Connection).

LGA Package 1 1366 gold-plated pads

Package Parasitics: Use many VDD, GND in parallel. Inductance, IDD.

Heat Dissipation: A 60 W light bulb has a surface area of 120 cm². An Itanium 2 die dissipates 130 W over 4 cm². Chips have enormous power densities, so cooling is a serious challenge. The package spreads heat to a larger surface area. Heat sinks may increase surface area further. Fans increase airflow rate over the surface area. Liquid cooling is used in extreme cases ($$$).

Thermal Resistance: ΔT = θja·P. ΔT: temperature rise on chip. θja: thermal resistance of chip junction to ambient. P: power dissipation on chip. Thermal resistances combine like resistors, in series and in parallel. θja = θjp + θpa (series combination).

Example: Your chip has a heat sink with a thermal resistance to the package of 4.0 °C/W. The resistance from chip to package is 1 °C/W. The system box ambient temperature may reach 55 °C. The chip temperature must not exceed 100 °C. What is the maximum chip power dissipation? (100 °C - 55 °C) / (4 °C/W + 1 °C/W) = 9 W.
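The same calculation can be sketched in a few lines of Python. The function name and argument layout are illustrative choices, not part of the lecture; the numbers are the ones from the example.

```python
def max_power_w(t_max_c, t_ambient_c, thetas_c_per_w):
    """Largest power (W) that keeps the die at or below t_max_c.

    thetas_c_per_w lists thermal resistances (degC/W) that are in series
    between the junction and ambient, so they simply add.
    """
    theta_ja = sum(thetas_c_per_w)
    return (t_max_c - t_ambient_c) / theta_ja


# Chip-to-package 1 degC/W in series with the 4 degC/W heat sink path.
p_max = max_power_w(t_max_c=100.0, t_ambient_c=55.0, thetas_c_per_w=[1.0, 4.0])
print(f"Maximum power: {p_max:.1f} W")
```

Halving the heat-sink resistance to 2 °C/W would raise the budget to 15 W, which is why the sink usually dominates the thermal design.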

Temperature Sensor: Monitor die temperature and throttle performance if it gets too hot. Use a pair of pnp bipolar transistors (vertical pnp is available in CMOS). The voltage difference is proportional to absolute temperature. Measure with an on-chip A/D converter.
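The proportionality comes from the standard bipolar relation: two matched transistors biased at a current-density ratio N differ in base-emitter voltage by ΔVBE = (kT/q)·ln N. The sketch below evaluates that relation and inverts it to recover temperature. The ratio N = 8 and the function names are assumptions for illustration; the slides do not give a specific sensor design.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def delta_vbe(temp_k, density_ratio):
    """Base-emitter voltage difference (V) of two matched BJTs."""
    return (K_BOLTZMANN * temp_k / Q_ELECTRON) * math.log(density_ratio)


def temp_from_delta_vbe(dv, density_ratio):
    """Invert delta_vbe: absolute temperature (K) from the measured voltage."""
    return dv * Q_ELECTRON / (K_BOLTZMANN * math.log(density_ratio))


N = 8.0  # assumed current-density ratio, not taken from the slides
dv = delta_vbe(300.0, N)
print(f"dVBE at 300 K: {dv * 1e3:.1f} mV")
print(f"Recovered temperature: {temp_from_delta_vbe(dv, N):.1f} K")
```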

Power Distribution: Power distribution network functions. Carry current from pads to transistors on chip. Maintain stable voltage with low noise. Provide average and peak power demands. Provide current return paths for signals. Avoid electromigration and self-heating wearout. Consume little chip area and wire. Easy to lay out.

Power Requirements: VDD = VDDnominal - Vdroop. Want Vdroop < +/- 10% of VDD. Sources of Vdroop: IR drops and L di/dt noise. IDD changes on many time scales.

IR Drop: A chip draws 24 W from a 1.2 V supply. The power supply impedance is 5 mΩ. What is the IR drop? IDD = 24 W / 1.2 V = 20 A. IR drop = (20 A)(5 mΩ) = 100 mV.
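As a quick check, the IR-drop arithmetic can be written as a small Python sketch that also compares the result with the 10% droop target stated earlier. The function name is illustrative.

```python
def ir_drop_v(power_w, vdd_v, r_supply_ohm):
    """IR drop (V) across the supply resistance at a given power draw."""
    i_dd = power_w / vdd_v  # average supply current
    return i_dd * r_supply_ohm


VDD = 1.2
drop = ir_drop_v(power_w=24.0, vdd_v=VDD, r_supply_ohm=5e-3)
print(f"IR drop: {drop * 1e3:.0f} mV ({100 * drop / VDD:.1f}% of VDD)")
print("Within 10% budget:", drop < 0.10 * VDD)
```

The 100 mV drop is about 8.3% of the 1.2 V supply, so IR drop alone consumes most of a 10% budget before any L di/dt noise is counted.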

IR Introduced Noise

Power Distribution

Power Distribution: Low-level distribution is in metal 1. Power has to be strapped in higher layers of metal. The spacing is set by IR drop, electromigration, and inductive effects. Always use multiple contacts on straps.

Power and Ground Distribution

3 Metal Layers (EV4)

4 Metal Layers (EV5)

6 Metal Layers (EV6)

Power Supply Droop

L di/dt Noise

L di/dt Noise: A 1.2 V chip switches from an idle mode consuming 5 W to a full-power mode consuming 53 W. The transition takes 10 clock cycles at 1 GHz. The supply inductance is 0.1 nH. What is the L di/dt droop? ΔI = (53 W - 5 W) / (1.2 V) = 40 A. Δt = 10 cycles * (1 ns / cycle) = 10 ns. L di/dt droop = (0.1 nH) * (40 A / 10 ns) = 0.4 V.
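The example can be reproduced with a short Python sketch, which also makes it easy to see how stretching the transition reduces the droop. The function name and the 100-cycle variant are illustrative additions.

```python
def ldidt_droop_v(p_idle_w, p_full_w, vdd_v, n_cycles, f_clk_hz, l_supply_h):
    """Inductive supply droop (V) for a linear current ramp."""
    delta_i = (p_full_w - p_idle_w) / vdd_v  # change in supply current (A)
    delta_t = n_cycles / f_clk_hz            # duration of the ramp (s)
    return l_supply_h * delta_i / delta_t


fast = ldidt_droop_v(5.0, 53.0, 1.2, n_cycles=10, f_clk_hz=1e9, l_supply_h=0.1e-9)
slow = ldidt_droop_v(5.0, 53.0, 1.2, n_cycles=100, f_clk_hz=1e9, l_supply_h=0.1e-9)
print(f"10-cycle transition:  {fast:.2f} V")
print(f"100-cycle transition: {slow:.2f} V")
```

A 0.4 V droop on a 1.2 V supply is far outside a 10% budget, which motivates the mitigation techniques on the next slide.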

Dealing with L di/dt: Separate power pins for I/O pads and chip core. Multiple power and ground pins. Careful selection of positions of power and ground pins on package. Increase rise and fall times as much as possible. Schedule current-consuming transitions. Use advanced packaging technologies. Use decoupling capacitances on the board. Use decoupling capacitances on chip.

Choosing the Right Pin

Decoupling Capacitance

Bypass Capacitors: Need low supply impedance at all frequencies. Ideal capacitors have impedance decreasing with ω. Real capacitors have parasitic R and L, which leads to a resonant frequency of the capacitor.
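A real capacitor is commonly modeled as a series R-L-C, with impedance Z = R + j(ωL - 1/(ωC)) and self-resonant frequency f = 1/(2π√(LC)). The sketch below uses that model; the component values (1 µF, 10 mΩ, 1 nH) are assumed for illustration and do not come from the slides.

```python
import math


def cap_impedance(f_hz, c_f, esr_ohm, esl_h):
    """Complex impedance of a capacitor with series parasitic R and L."""
    w = 2 * math.pi * f_hz
    return complex(esr_ohm, w * esl_h - 1.0 / (w * c_f))


def self_resonant_hz(c_f, esl_h):
    """Frequency at which the inductive and capacitive reactances cancel."""
    return 1.0 / (2 * math.pi * math.sqrt(esl_h * c_f))


C, ESR, ESL = 1e-6, 10e-3, 1e-9  # assumed example part
f0 = self_resonant_hz(C, ESL)
print(f"Self-resonant frequency: {f0 / 1e6:.2f} MHz")
for f in (f0 / 100, f0, f0 * 100):
    print(f"{f:12.3e} Hz  |Z| = {abs(cap_impedance(f, C, ESR, ESL)):.4f} ohm")
```

Below resonance the part behaves like a capacitor, at resonance its impedance bottoms out at the series resistance, and above resonance it behaves like an inductor.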

De-coupling Capacitor Ratios: EV4: total effective switching capacitance = 12.5 nF, 128 nF of de-coupling capacitance, de-coupling/switching capacitance ~ 10x. EV5: 13.9 nF of switching capacitance, 160 nF of de-coupling capacitance. EV6: 34 nF of effective switching capacitance, 320 nF of de-coupling capacitance -- not enough! Source: B. Herrick (Compaq)
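The ratios for all three processors follow directly from the figures above. A small sketch (the dictionary layout is just a convenient way to hold the slide's numbers):

```python
# (switching capacitance, de-coupling capacitance) in nF, from the slide.
CHIPS_NF = {
    "EV4": (12.5, 128.0),
    "EV5": (13.9, 160.0),
    "EV6": (34.0, 320.0),
}


def decap_ratio(switching_nf, decap_nf):
    """De-coupling capacitance divided by switching capacitance."""
    return decap_nf / switching_nf


for name, (switching, decap) in CHIPS_NF.items():
    print(f"{name}: {decap_ratio(switching, decap):.1f}x")
```

EV6 has the lowest ratio of the three even though its absolute on-chip de-coupling capacitance is the largest.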

EV6 De-coupling Capacitance: Design for Idd = 25 A @ Vdd = 2.2 V, f = 600 MHz. 0.32 µF of on-chip de-coupling capacitance was added, under major busses and around major gridded clock drivers; it occupies 15-20% of die area. A 1 µF, 2 cm² Wirebond Attached Chip Capacitor (WACC) significantly increases "near-chip" de-coupling. 160 Vdd/Vss bondwire pairs on the WACC minimize inductance. Source: B. Herrick (Compaq)

EV6 WACC Source: B. Herrick (Compaq)

Power System Model: Power comes from a regulator on the system board. Board and package add parasitic R and L. Bypass capacitors help stabilize the supply voltage, but capacitors also have parasitic R and L. Simulate the system for time and frequency responses.

Frequency Response: Multiple capacitors in parallel. A large capacitor near the regulator has low impedance at low frequencies, but also has a low self-resonant frequency. Small capacitors near the chip and on chip have low impedance at high frequencies. Choose caps to get low impedance at all frequencies.
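The idea of covering the whole frequency range with a bank of different capacitors can be explored with a minimal sketch that combines series R-L-C capacitor models in parallel. All component values below are hypothetical placeholders chosen only to span a bulk, a near-chip, and an on-chip capacitor; a real design would use extracted or datasheet values.

```python
import math


def cap_impedance(f_hz, c_f, esr_ohm, esl_h):
    """Complex impedance of a capacitor with series parasitic R and L."""
    w = 2 * math.pi * f_hz
    return complex(esr_ohm, w * esl_h - 1.0 / (w * c_f))


def parallel(impedances):
    """Impedance of several branches connected in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)


# (C in F, ESR in ohm, ESL in H): hypothetical values, not from the slides.
BANK = [
    (100e-6, 20e-3, 5e-9),   # bulk capacitor near the regulator
    (1e-6, 10e-3, 1e-9),     # small capacitor near the chip
    (10e-9, 50e-3, 10e-12),  # on-chip bypass capacitance
]


def bank_impedance(f_hz, bank=BANK):
    """Impedance of the whole capacitor bank at one frequency."""
    return parallel([cap_impedance(f_hz, *part) for part in bank])


for f in (1e4, 1e5, 1e6, 1e7, 1e8, 1e9):
    print(f"{f:8.0e} Hz  |Z| = {abs(bank_impedance(f)) * 1e3:10.2f} mohm")
```

Sweeping more finely usually reveals peaks between the self-resonant frequencies of neighboring capacitors, which is one reason the slide recommends simulating the full system.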

Example: Pentium 4. Power supply impedance for Pentium 4: spike near 100 MHz caused by package L. Step response to a sudden supply current change: 1st droop: on-chip bypass caps; 2nd droop: package capacitance; 3rd droop: board capacitance. [Xu08] [Wong06]

Distributed Model

Charge Pumps: Sometimes a different supply voltage is needed but little current is required, for example 20 V for Flash memory programming, or negative body bias for leakage control during sleep. Generate the voltage on-chip with a charge pump.

Energy Scavenging: Ultra-low power systems can scavenge their energy from the environment rather than needing batteries. Solar calculator (solar cells). RFID tags (antenna). Tire pressure monitors powered by the vibrational energy of tires (piezoelectric generator). Thin-film microbatteries deposited on the chip can store energy for times of peak demand.

Capacitive Cross Talk

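Capacitive Cross Talk, Dynamic Node: [Figure: dynamic node Y (capacitance CY) at the output of a clocked pull-down network (PDN) with inputs In1, In2, In3, coupled through CXY to wire X, which swings between 0 V and 2.5 V.] 3 x 1 µm overlap: 0.19 V disturbance.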
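For a floating (dynamic) node, the coupled disturbance is set by a capacitive divider: ΔVY = ΔVX · CXY / (CXY + CY). The slide does not list the capacitances, so the values in this sketch (CXY = 0.5 fF, CY = 6 fF) are assumptions, picked because they give a disturbance close to the 0.19 V quoted above.

```python
def coupled_disturbance(dv_x, c_xy, c_y):
    """Voltage step (V) induced on floating node Y by a step dv_x on wire X."""
    return dv_x * c_xy / (c_xy + c_y)


# Assumed capacitances; only the 2.5 V swing comes from the slide.
dv_y = coupled_disturbance(dv_x=2.5, c_xy=0.5e-15, c_y=6e-15)
print(f"Disturbance on Y: {dv_y:.2f} V")
```

Because nothing restores the charge on a dynamic node, this disturbance persists until the node is next driven.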

Capacitive Cross Talk, Driven Node: τXY = RY(CXY + CY). Keep the time constant smaller than the rise time. [Figure: disturbance on driven node Y, V (Volt) from 0 to 0.5 versus t (nsec) from 0 to 1, for increasing rise time tr of aggressor X.]
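The rule of thumb above can be checked numerically: compute τXY = RY(CXY + CY) and compare it with the aggressor's rise time. The driver resistance, capacitances, and rise times in this sketch are assumed values for illustration only.

```python
def crosstalk_time_constant(r_y, c_xy, c_y):
    """Time constant (s) with which the driver of Y restores the node."""
    return r_y * (c_xy + c_y)


def driver_keeps_up(tau, t_rise):
    """True when the time constant is smaller than the aggressor rise time."""
    return tau < t_rise


# Assumed values: 10 kohm driver, 0.5 fF coupling, 6 fF to ground.
tau = crosstalk_time_constant(r_y=10e3, c_xy=0.5e-15, c_y=6e-15)
print(f"tau = {tau * 1e12:.0f} ps")
for t_rise in (30e-12, 200e-12):
    print(f"rise time {t_rise * 1e12:.0f} ps: driver keeps up = {driver_keeps_up(tau, t_rise)}")
```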

Dealing with Capacitive Cross Talk: Avoid floating nodes. Protect sensitive nodes. Make rise and fall times as large as possible. Use differential signaling. Do not run wires together for a long distance. Use shielding wires. Use shielding layers.

Shielding: [Figure: a shielding wire tied to GND next to the signal wire, and shielding layers tied to VDD and GND above the substrate (GND).]

Cross Talk and Performance: When neighboring lines switch in the opposite direction of the victim line, delay increases. Delay depends on activity in neighboring wires. Miller effect on the coupling capacitance Cc: both terminals of the capacitor are switched in opposite directions (0 → Vdd, Vdd → 0), so the effective voltage is doubled and additional charge is needed (from Q = CV).
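A common way to capture this in delay estimates is to scale the coupling capacitance by a switching factor: 0 when the neighbor switches in the same direction, 1 when it is quiet, and 2 when it switches in the opposite direction. The sketch below applies that convention; the capacitance values are hypothetical.

```python
# Multiplier applied to the coupling capacitance for each neighbor activity.
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_capacitance(c_gnd, c_adj, neighbor):
    """Capacitance the victim driver effectively has to charge."""
    return c_gnd + MILLER_FACTOR[neighbor] * c_adj


C_GND_FF, C_ADJ_FF = 100.0, 50.0  # hypothetical values in fF
for activity in ("same", "quiet", "opposite"):
    c_eff = effective_capacitance(C_GND_FF, C_ADJ_FF, activity)
    print(f"neighbor {activity:8s}: C_eff = {c_eff:.0f} fF")
```

For a fixed driver, delay scales roughly with the effective capacitance, so in this example the worst case is twice as slow as the best case.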

Impact of Cross Talk on Delay: r is the ratio between capacitance to GND and capacitance to the neighbor.

Dealing with Cross-Talk: Evaluate and improve. Constructive layout generation. Predictable structures. Avoid worst-case patterns.

Structured Predictable Interconnect: Example: Dense Wire Fabric ([Sunil Khatri]). Trade-off: cross-coupling capacitance 40x lower and 2% delay variation, at the cost of increased area and overall capacitance. Also: FPGAs, VPGAs.

Clock Distribution: On a small chip, the clock distribution network is just a wire, and possibly an inverter for clkb. On practical chips, the RC delay of the wire resistance and gate load is very long. Variations in this delay cause the clock to get to different elements at different times. This is called clock skew. Most chips use repeaters to buffer the clock and equalize the delay. This reduces but doesn't eliminate skew.

Example

Example: Skew comes from differences in gate and wire delay. With the right buffer sizing, clk1 and clk2 could ideally arrive at the same time, but power supply noise changes buffer delays. clk2 and clk3 will always see RC skew.

Clock Uncertainties

Clock Nonidealities: Clock skew: spatial variation in temporally equivalent clock edges; deterministic + random, tSK. Clock jitter: temporal variations in consecutive edges of the clock signal; modulation + random noise; cycle-to-cycle (short-term) tJS; long-term tJL. Variation of the pulse width: important for level-sensitive clocking.

Review: Skew Impact. Ideally the full cycle is available for work. Skew adds sequencing overhead. It increases the effective hold time too.
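For edge-triggered flip-flops, the usual textbook constraints make both effects concrete: the setup constraint tpd ≤ Tc - (tpcq + tsetup + tskew) shrinks the time left for logic, and the hold constraint tcd ≥ thold - tccq + tskew raises the minimum delay that short paths must provide. The sketch below evaluates both; every timing value in it is a hypothetical number in picoseconds, not data from the lecture.

```python
def max_logic_delay(t_cycle, t_pcq, t_setup, t_skew):
    """Setup constraint: longest logic propagation delay that still works."""
    return t_cycle - (t_pcq + t_setup + t_skew)


def min_contamination_delay(t_hold, t_ccq, t_skew):
    """Hold constraint: shortest logic delay needed to avoid a race."""
    return t_hold - t_ccq + t_skew


# Hypothetical flip-flop parameters, all in picoseconds.
T_CYCLE, T_PCQ, T_CCQ, T_SETUP, T_HOLD = 1000, 80, 50, 50, 40

for skew in (0, 50):
    t_pd = max_logic_delay(T_CYCLE, T_PCQ, T_SETUP, skew)
    t_cd = min_contamination_delay(T_HOLD, T_CCQ, skew)
    print(f"skew {skew:3d} ps: logic budget {t_pd} ps, required min delay {t_cd} ps")
```

With no skew the required minimum delay is negative, so back-to-back flops are safe; with 50 ps of skew every short path needs at least 40 ps of logic or added delay.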

Solutions: Reduce clock skew: careful clock distribution network design; plenty of metal wiring resources. Analyze clock skew: only budget actual, not worst-case skews; local vs. global skew budgets. Tolerate clock skew: choose circuit structures insensitive to skew.

Clock Dist. Networks: Ad hoc. Grids. H-tree. Hybrid.

H-Trees: Fractal structure. Gets clock arbitrarily close to any point. Matched delay along all paths. Delay variations cause skew. A and B might see big skew.

More realistic H-tree [Restle98]

Itanium 2 H-Tree: Four levels of buffering: primary driver, repeater, second-level clock buffer, gater. Route around obstructions.

Itanium 2 Repeaters

Spines

Pentium IV Clock Spines

Pentium IV Clock Spines

Clock Grids: Use a grid on two or more levels to carry the clock. Make wires wide to reduce RC delay. Ensures low skew between nearby points, but possibly large skew across the die.

The Grid System: No RC-matching. Large power.

Alpha Clock Grids

Example: DEC Alpha 21164

21164 Clocking: tcycle = 3.3 ns, trise = 0.35 ns, tskew = 150 ps. 2-phase single-wire clock, distributed globally. 2 distributed driver channels: reduced RC delay/skew, improved thermal distribution. 3.75 nF clock load. 58 cm final driver width. Local inverters for latching. Conditional clocks in caches to reduce power. More complex race checking. Device variation. [Figures: clock waveform; location of clock driver on die (pre-driver, final drivers).]

Clock Skew in Alpha Processor

EV6 (Alpha 21264) Clocking: 600 MHz, 0.35 micron CMOS. tcycle = 1.67 ns, trise = 0.35 ns, tskew = 50 ps. 2-phase, with multiple conditional buffered clocks. 2.8 nF clock load. 40 cm final driver width. Local clocks can be gated "off" to save power. Reduced load/skew. Reduced thermal issues. Multiple clocks complicate race checking. [Figure: global clock waveform.]

21264 Clocking

EV6 Clock Results: [Figures: GCLK skew (at Vdd/2 crossings), scale 5 to 50 ps; GCLK rise times (20% to 80% extrapolated to 0% to 100%), scale 300 to 345 ps.]

EV7 Clock Hierarchy: Active skew management and multiple clock domains. + widely dispersed drivers; + DLLs compensate static and low-frequency variation; + divides design and verification effort; - DLL design and verification is added work; + tailored clocks.

Hybrid Networks: Use an H-tree to distribute the clock to many points. Tie these points together with a grid. Ex: IBM Power4, PowerPC. H-tree drives 16-64 sector buffers. Buffers drive a total of 1024 points. All points are shorted together with a grid.

Clock Gaters

Adaptive Deskewing

Self-timed and Asynchronous Design. Functions of the clock in synchronous design: 1) acts as completion signal; 2) ensures the correct ordering of events. Truly asynchronous design: 1) completion is ensured by careful timing analysis; 2) ordering of events is implicit in logic. Self-timed design: 1) completion ensured by completion signal; 2) ordering imposed by handshaking protocol.

Self-Timed Pipelined Datapath

Completion Signal Generation

Completion Signal Generation

Completion Signal in DCVSL

Self-Timed Adder

Completion Signal Using Current Sensing