Washington State University

Presentation transcript:

EE 587 SoC Design & Test. Partha Pande, School of EECS, Washington State University, pande@eecs.wsu.edu

Lecture 4: SoC Physical Design Issues: Power and Clock Distribution

Overview. Reading: HJS, Chapter 11, Power Grid and Clock Design. For background information, refer to Chapter 5 of the textbook.

Purpose of Power Distribution: The goal of the power distribution system is to deliver the required current across the chip while maintaining the voltage levels necessary for proper operation of the logic circuits. Both power and ground must be routed to all gates. Design challenges: How many power and ground pins should we allocate? Which layers of metal should be used to route power/ground? How wide should we make the wires to minimize voltage drops and reliability problems? How do we keep VDD and Gnd within the noise budget? How do we verify the overall power distribution system?

Design Example: Target impedance of the power grid. For a supply voltage of 1.2 V and a supply current of 100 A, with a 10% noise budget, the power grid impedance can be at most 1.2 mΩ (0.1 × 1.2 V / 100 A).
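
The target-impedance figure above follows from dividing the allowed voltage drop by the supply current. A minimal sketch of that arithmetic is shown here; the function name `target_impedance` is hypothetical and introduced only for illustration.

```python
def target_impedance(vdd_volts, noise_fraction, supply_current_amps):
    """Largest grid impedance that keeps the drop within the noise budget."""
    allowed_drop = vdd_volts * noise_fraction   # e.g. 10% of 1.2 V = 0.12 V
    return allowed_drop / supply_current_amps   # Ohm's law: Z = V / I

# Values taken from the slide: 1.2 V supply, 10% budget, 100 A.
z_target = target_impedance(1.2, 0.10, 100.0)
print(f"Target impedance: {z_target * 1e3:.2f} mOhm")
```

A larger current or a tighter noise budget lowers the target, which is why high-current chips need very low-impedance grids.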

Power Distribution Issues: IR Drop. Narrow line widths increase metal line resistance. As current flows through the power grid, voltage drops occur. The actual voltage supplied to the transistors is less than Vdd, which impacts speed and functionality. Wire widths must be chosen to handle the current demands of each segment. [Figure: power grid with nodes n1 to n8; the voltage at the interior nodes is less than Vdd.]

Block Interaction Yields IR Drop. [Plots courtesy of Simplex Solutions, Inc.]

Power Grid Issues: Static IR Drop. Block placement and global power routing determine the IR drop on the chip. Possible solutions: rearrange blocks; add more Vdd pins; connect the bottom portion of the grid to the top portion. [Plot courtesy of Simplex Solutions, Inc.]

Power Grid Issues: Static IR Drop. If we connect the bottom portion of the grid to the top portion, the IR drop is reduced significantly. However, this is only one part of the problem; we must also examine electromigration. [Plot courtesy of Simplex Solutions, Inc.]

Power Grid Issues: Electromigration. As current flows down narrow wires, the metal begins to migrate. Metal lines break over time due to metal fatigue. The effect depends on average/peak current density. Wires need to be widened enough to avoid this phenomenon. [Figure: power grid with nodes n1 to n8.]

Case Study – IR and EM Tradeoff

L di/dt Effects in the Power Supply: In addition to IR drop, power system inductance is also an issue. The inductance may be due to the power pin, power bump, or power grid. The overall voltage drop is Vdrop = IR + L di/dt. Distribute decoupling capacitors (decaps) liberally throughout the design. Capacitors store up charge and can provide an instantaneous source of current for switching.

On-chip Decoupling Caps: On-chip decaps help to stabilize the power grid voltage. They are the first line of defense against noise, which can extend beyond 10 GHz. Simple example: drop across inductors = 2 × L × di/dt = 2 × 0.2 nH × 20 mA / 100 ps = 80 mV (problematic if the supply is 1.2 V). An actual power pad or bump may need to support thousands of inverters. Use capacitors to supply instantaneous charge to the inverters.
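
The 80 mV figure in the simple example can be reproduced with a few lines. This is a sketch only: the function name `inductive_drop` is hypothetical, and the factor of 2 is assumed to account for one inductance in the Vdd path and one in the Vss path, as the slide's "2 x L" suggests.

```python
def inductive_drop(inductance_h, delta_i_a, delta_t_s, n_inductors=2):
    """Voltage drop across series supply inductance: n * L * di/dt."""
    return n_inductors * inductance_h * delta_i_a / delta_t_s

# Slide values: 0.2 nH per inductor, 20 mA switched in 100 ps.
drop = inductive_drop(0.2e-9, 20e-3, 100e-12)
fraction_of_supply = drop / 1.2   # relative to a 1.2 V supply

print(f"L di/dt drop: {drop * 1e3:.0f} mV "
      f"({fraction_of_supply:.1%} of a 1.2 V supply)")
```

A single inverter already consumes a noticeable share of a 10% noise budget, which is the motivation for supplying the instantaneous charge from local decaps instead.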

Making a Decoupling Cap: Decaps are basically NMOS transistors. The top plate is polysilicon, the bottom plate is the inverted channel, and the insulator is the gate oxide. Connect the poly to Vdd and the source/drain to Vss. The low-frequency capacitance is roughly Cox × W × L. Because these are large capacitors used at high frequencies, a more accurate representation is needed.
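
As a rough illustration of the low-frequency estimate Cox × W × L, the sketch below computes the capacitance of one decap device. The oxide thickness (2 nm) and the device dimensions (W = 10 µm, L = 1 µm) are assumptions chosen for illustration and do not come from the slides; Cox is taken as the parallel-plate value for SiO2.

```python
EPS_0 = 8.854e-12      # vacuum permittivity, F/m
EPS_R_SIO2 = 3.9       # relative permittivity of SiO2

def decap_capacitance(width_m, length_m, tox_m):
    """Low-frequency gate capacitance: Cox * W * L, with Cox = eps_ox / tox."""
    cox = EPS_0 * EPS_R_SIO2 / tox_m      # F per square metre
    return cox * width_m * length_m

# Assumed (hypothetical) values: tox = 2 nm, W = 10 um, L = 1 um.
c_decap = decap_capacitance(10e-6, 1e-6, 2e-9)
print(f"Decap capacitance: {c_decap * 1e15:.1f} fF")
```

This ignores the channel and gate resistance that limit the response at high frequency, which is the reason the slide asks for a more accurate model.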

How Much Decoupling Cap? To estimate the required decap value, run SPICE on a patch of chip area with the power grid, part of a logic block, and a sprinkle of decaps. The amount of decap depends on: the acceptable ripple on Vdd-Vss (typically a 10% noise budget); the switching activity of the logic circuits (usually need 10X the switched cap); the current provided by the power grid (di/dt); the required frequency response (high-frequency operation); and how much decap already exists (non-switching diffusion, gate, and wire caps).

Simulation with Decoupling Caps: Plot the center region of the grid, i.e., the local Vdd-Vss at the worst-case location in the patch (center). The first dip can be dealt with by a low-frequency on-chip voltage regulator and low-frequency decaps. The steady-state ripple is controlled by high-frequency decoupling caps. Adjust the location of the decaps until the ripple is within the noise budget.

Designing Power Distribution: The floorplanner should be aware of IR + L di/dt drop and EM problems and design accordingly. This requires knowledge of the current distributions and voltage drop constraints of the blocks being placed. Provide an adequate number of VDD and Gnd pins. Route the power distribution system according to the current demands of the blocks. Widen wires based on the expected current density in the branches. Distribute decoupling capacitors liberally throughout the design. Verify the full chip with IR/EM tools.

Reducing the Effects of IR Drop and L di/dt: Stagger the firing of buffers (bad idea: increases skew). Use different power grid tap points for clock buffers (but this makes routing more complicated for automated tools). Use smaller buffers (but this degrades edge rates and increases delay). Make power busses wider (requires area, but should be done). Use more Vdd/Vss pins; adjust the locations of the Vdd/Vss pins. Put in power straps where needed to deliver current. Place decoupling capacitors wherever there is free space. Integrate decoupling capacitors into buffer cells; these caps act as decoupling caps when they are not switching.

Power Routing Examples. [Figure: Blocks A and B supplied by a single trunk versus multiple trunks.]

Simple Routing Examples. [Figure: Blocks A and B supplied by double-ended connections versus wider trunks.]

Interleaved Power/Ground Routing. [Figure: interleaved Vdd/Vss lines.]

Flip-Flop and Clock Design: Flip-flops and latches are used to gate signals in sequential logic designs. Their critical parameters are the setup and hold times. Clock design is also a complex issue in DSM due to the RC delay components in the interconnect and the power dissipation. We will look at clock trees, H-trees, and clock grids, followed by an overall examination of the issues of clock skew, IR drop, and signal integrity, and how to manage them using circuit techniques. Let's start by revisiting the expected limits of clock speed.

Clocked D Flip-flop: A very useful FF, widely used in IC design for temporary storage of data. May be edge-triggered (flip-flop) or level-sensitive (transparent latch). Truth table: D = 0 gives Qn+1 = 0; D = 1 gives Qn+1 = 1. [Figure: D flip-flop symbol with data input D, clock input Clk (CK), and output Q.]

Latch vs. Flip-flop: Latch (level-sensitive, transparent): when the clock is high, it passes the In value to Out; when the clock is low, it holds the value that In had when the clock fell. Flip-flop (edge-triggered, non-transparent): on the rising edge of the clock (positive-edge triggered), it transfers the value of In to Out; it holds the value at all other times.
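
The difference between the two elements can be seen in a small cycle-by-cycle behavioral model. This is an illustrative sketch, not a circuit-level model: the function name `simulate` and the sample waveforms are hypothetical, and both outputs are assumed to start at 0.

```python
def simulate(clk, d):
    """Return (latch_out, ff_out) for sampled clock and data waveforms."""
    latch_q, ff_q = 0, 0       # assumed initial state of both elements
    prev_clk = 0
    latch_out, ff_out = [], []
    for c, din in zip(clk, d):
        if c == 1:
            latch_q = din      # latch is transparent while the clock is high
        if prev_clk == 0 and c == 1:
            ff_q = din         # flip-flop samples only on the rising edge
        prev_clk = c
        latch_out.append(latch_q)
        ff_out.append(ff_q)
    return latch_out, ff_out

clk = [0, 1, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 0, 1, 0, 0, 0, 1]
latch_out, ff_out = simulate(clk, d)
print("latch:", latch_out)   # follows d whenever clk is 1
print("ff:   ", ff_out)      # changes only at 0 -> 1 clock transitions
```

While the clock stays high, the latch output tracks every change on the data input, whereas the flip-flop output keeps the value captured at the edge.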

Alternative View: Clocks serve to slow down signals that are too fast. Flip-flops and latches act as barriers. With a latch, a signal can propagate through while the clock is high. With a flip-flop, the signal only propagates through on the rising edge. Flip-flops consist of two latch-like elements (master and slave latch).

Clocking Overhead: FFs and latches have setup and hold times that must be satisfied. If Din arrives before the setup time and is stable until after the hold time, the FF will work; if Din arrives after the hold time, it will fail; in between, it may or may not work. The FF delays the slowest signal by the setup + clk-q delay (Tsetup + Tclk-q) in the worst case. A latch has small setup and hold times, but it delays late-arriving signals by Td-q. [Figure: timing diagrams for a flip-flop and a latch showing Din, Clk, and Qout with Tsetup, Thold, Tclk-q, and Td-q.]

Clock Skew: Not all clocks arrive at the same time, i.e., they may be skewed. SKEW = mismatch in the arrival times of clock edges at the FFs. Skew causes two problems. (1) The cycle time gets longer by the skew: Tcycle = Td + Tsetup + Tclk-q + Tskew. This shows up as a SETUP time violation; the fix is to shorten the critical path. (2) The part can get the wrong answer when the logic delay between flops is very short (Td close to 0) and the capturing clock arrives late. This shows up as a HOLD time violation; the fix is to insert buffers or delay elements. [Figure: two flops separated by logic, with early and late clock arrivals, Tclk-q, and Tsetup marked.]
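
The two skew problems described above can be checked numerically. The sketch below uses the common textbook form of the setup and hold constraints, which is an assumption here and not taken verbatim from the slides; the function names and all delay values (in ps) are hypothetical.

```python
def min_cycle_time(t_clk_q, t_logic_max, t_setup, t_skew):
    """Setup constraint: the cycle must cover clk-q, logic, setup, and skew."""
    return t_clk_q + t_logic_max + t_setup + t_skew

def hold_slack(t_clk_q, t_logic_min, t_hold, t_skew):
    """Hold constraint: data must not change before hold time plus skew.
    A negative result indicates a hold violation."""
    return t_clk_q + t_logic_min - (t_hold + t_skew)

# Hypothetical delays in picoseconds.
cycle = min_cycle_time(t_clk_q=100, t_logic_max=700, t_setup=50, t_skew=100)
slack_short_path = hold_slack(t_clk_q=100, t_logic_min=0, t_hold=50, t_skew=100)
slack_with_buffer = hold_slack(t_clk_q=100, t_logic_min=80, t_hold=50, t_skew=100)

print("minimum cycle time (ps):", cycle)
print("hold slack, no logic between flops (ps):", slack_short_path)
print("hold slack after inserting 80 ps of delay (ps):", slack_with_buffer)
```

Slowing the clock can repair a setup violation, but a hold violation is independent of the cycle time, so it has to be fixed by adding delay to the short path or by reducing skew.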

Overhead for a Clock: CMOS FO4 delay is roughly 425 ps/µm × Leff. For 0.13 µm, FO4 delay ≈ 40-50 ps. For a 1 GHz clock, this allows < 20 FO4 gate delays per cycle. Clock overhead (including margins for setup/hold): 2 FFs/latches cost about 2-3 FO4 delays, and skew costs approximately 2-3 FO4 delays, so the overhead of the clock is roughly 4-6 FO4 delays. That leaves 14-16 FO4 delays to work with for logic. We need to reduce the skew and FF cost. [Figure: clock cycle Tcycle divided into skew, Tclk-q, and Tlogic.]
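
The FO4 budget above is simple arithmetic and can be sketched as follows. The function names are hypothetical, and the 50 ps FO4 delay is the upper end of the slide's 40-50 ps range; the 0.1 µm effective channel length in the last line is an assumed value for illustration.

```python
def fo4_delay_ps(leff_um, ps_per_um=425.0):
    """Rule of thumb from the slide: FO4 delay is about 425 ps/um * Leff."""
    return ps_per_um * leff_um

def fo4_per_cycle(clock_hz, fo4_ps):
    """Number of FO4 delays that fit in one clock period."""
    period_ps = 1e12 / clock_hz
    return period_ps / fo4_ps

total = fo4_per_cycle(1e9, 50.0)          # 1 GHz clock, 50 ps FO4
logic_min = total - 6                     # worst-case clock overhead: 6 FO4
logic_max = total - 4                     # best-case clock overhead: 4 FO4

print(f"FO4 delays per cycle: {total:.0f}")
print(f"FO4 delays left for logic: {logic_min:.0f} to {logic_max:.0f}")
print(f"FO4 for an assumed Leff of 0.1 um: {fo4_delay_ps(0.1):.1f} ps")
```

With the clock overhead consuming a fifth to nearly a third of the cycle, reducing skew and flip-flop cost directly increases the logic depth available per cycle.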

Signal Integrity Issues at FFs: What happens if a glitch occurs in a clock signal? The flip-flop captures and propagates incorrect data. Any signal that, if glitched, could cause a logic upset can be viewed as a "clock" signal. Clocks and such signals need to be spaced out or shielded. [Figure: positive-edge triggered flip-flop with D, Q, and Clk.]

Signal Integrity Issues at FFs: What happens if a glitch occurs in a data signal? The flip-flop captures and propagates incorrect data. We need to ensure that the data signal is stable during the FF setup time. Shielding with stable signals or spacing is needed. [Figure: positive-edge triggered flip-flop with D, Q, and Clk.]

Sources of Clock Skew: Main sources: (1) Imbalance between different paths from the clock source to the FFs: interconnect length determines RC delays; capacitive coupling effects cause delay variations; buffer sizing; number of loads driven. (2) Process variations across the die: interconnect and devices have different statistical variations. Secondary sources: (1) IR drop in the power supply; (2) L di/dt drop in the supply.

IR Drop Impacts on Clock Skew: Ideal Vdd: low delay, low skew. Conservative Vdd: high delay, low skew. Actual IR drop impact: delay about 5-15% larger, skew about 25-30% larger. [Figure labels: skew, delay (latency).]

Power Dissipation in Clocks: Significant power dissipation can occur in clocks in high-performance designs. The clock switches on every cycle, so P = CV²f (i.e., activity factor a = 1). Clock capacitance can be in the nF range, say 1 nF = 1000 pF. Assuming a power supply of 1.8 V, CV = 1800 pC of charge. If the clock switches every 2 ns (500 MHz), that is 0.9 A. For VDD = 1.8 V, P = IV = 0.9 × 1.8 ≈ 1.6 W in the clock circuit alone. Much of the power (and the skew) occurs in the final drivers due to the sizing up of buffers to drive the flip-flops. The key to reducing the power is to examine the equation CV²f and reduce the terms wherever possible. VDD is usually given to us, and we would not want to reduce the swing due to coupling noise, etc. Look more closely at C and f.
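
The clock-power arithmetic above can be reproduced step by step. This is a minimal sketch using the slide's own numbers; the function name `clock_power` is hypothetical.

```python
def clock_power(c_farads, vdd_volts, freq_hz, activity=1.0):
    """Dynamic power P = a * C * V^2 * f; a = 1 for a clock net."""
    return activity * c_farads * vdd_volts ** 2 * freq_hz

C_CLK = 1e-9      # 1 nF of total clock capacitance
VDD = 1.8         # supply voltage in volts
F_CLK = 500e6     # 500 MHz, i.e. one switching event every 2 ns

charge = C_CLK * VDD                 # charge moved per cycle, in coulombs
current = charge * F_CLK             # average supply current, in amperes
power = clock_power(C_CLK, VDD, F_CLK)

print(f"Charge per cycle: {charge * 1e12:.0f} pC")
print(f"Average current:  {current:.2f} A")
print(f"Clock power:      {power:.2f} W")
```

Because power scales linearly with C and f but quadratically with V, halving the clock capacitance halves the clock power, which is why the slide directs attention to C and f once VDD is fixed.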

Reducing Power in Clocking: Gated clocks: clock signals can be gated through an AND gate before being applied to the flip-flop; this is more of a total chip power savings. All clock trees should have the same type of gating, whether it is used or not, and at the same level, for total balance. Reduce the overall capacitance (again, shielding vs. spacing): (a) shielding gives higher total cap but less area; (b) spacing gives lower cap but more area. There is a tradeoff between the two approaches due to coupling noise: approach (a) is better for inductive noise; (b) is better for capacitive noise. [Figure: (a) clock routed between shield wires; (b) clock spaced away from Signal 1 and Signal 2.]

Clock Design Objectives: Now that we understand the role of the clock and some of the key issues, how do we design it? Minimize the clock skew (in the presence of IR drop). Minimize the clock delay (latency). Minimize the clock power (and area). Maximize noise immunity (due to coupling effects). Maximize the clock reliability (signal EM). Problems that we will have to deal with: routing the clock to all flip-flops on the chip; driving unbalanced loading, which will not be known until the chip is nearly completed; on-chip process/temperature variations.

Clock Design: Tree. Multi-stage clock tree: minimal area cost; requires clock-tree management; uses a large superbuffer to drive downstream buffers; balancing may be an issue. [Figure: main clock driver feeding secondary clock drivers.]

H-Tree Clock Configurations: Place the clock root at the center of the chip and distribute it as an H structure to all areas of the chip. The clock is delayed by an equal amount to every section of the chip. Local skew inside blocks is kept within tolerable limits.
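
The equal-delay property of an H-tree follows from its geometry: every leaf is reached through the same wire length. The sketch below builds an idealized H-tree recursively and checks this. It models wire length only, ignoring buffers, loads, and process variation; the function name `h_tree_leaves` and the 10 mm × 10 mm die size are assumptions for illustration.

```python
def h_tree_leaves(x, y, half_w, half_h, levels, path_len=0.0):
    """Return (x, y, wire_length_from_root) for each leaf of an ideal H-tree."""
    if levels == 0:
        return [(x, y, path_len)]
    leaves = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            # One H: horizontal bar of half_w, then vertical stroke of half_h.
            leaves += h_tree_leaves(
                x + sx * half_w, y + sy * half_h,
                half_w / 2, half_h / 2,
                levels - 1, path_len + half_w + half_h,
            )
    return leaves

# Assumed 10 mm x 10 mm die with the clock root at the center (0, 0).
leaves = h_tree_leaves(0.0, 0.0, 2.5, 2.5, levels=3)
lengths = {round(length, 9) for _, _, length in leaves}

print("number of leaves:", len(leaves))
print("distinct root-to-leaf wire lengths (mm):", lengths)
```

Three levels reach 64 evenly spread regions with a single root-to-leaf length, so in this idealized model the skew between regions is zero; real designs still see skew from unequal loads and variation.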

Grid Clock Configurations: Greater area cost. Easier skew control. Increased power consumption. Electromigration risk increased at drivers. Severely restricts floorplan and routing.

Clock Design and Verification: Many design styles. Low-speed designs: regular signals, symmetric tree. Medium-speed designs: balanced H-tree. High-speed designs: balanced buffered H-tree, grid. Clock verification is more complex in DSM: RC interconnect delays; signal integrity (capacitive coupling, inductance); IR drop; signal electromigration; clock jitter, i.e., time-domain variation of a given clock signal due to random noise, IR drop, temperature, etc. [Figure: jitter around a clock edge.]

Clock Design Today: Old methodology: route the clock; route the rest of the nets; extract clock parasitics; perform timing verification; balance the clock by "snaking" the route in reserved areas. Advanced clock verification: IR drop and L di/dt effects; coupling capacitance; electromigration checks; full-chip skew/slew analysis; jitter analysis; inductance effects; process variations.

Good Practices in Clock Design: Try to achieve the lowest latency (superbuffer/H-tree). Control transition times (keep edge rates sharp). Use one type of clock buffer for good matching (except perhaps in the last leg, where you need adjustable buffers). Have min/max line lengths for good matching. Determine whether spacing or shielding provides the better tradeoff. Use integral decoupling in buffers to reduce IR and L di/dt drop.