1
A Novel Clock Distribution and Dynamic De-skewing Methodology
Arjun Kapoor – University of Colorado at Boulder
Nikhil Jayakumar – Texas A&M University, College Station
Sunil P. Khatri – Texas A&M University, College Station
2
Introduction
- Clock distribution is critical in ICs.
- In typical ICs, the clock is distributed from one central clock signal to several sites on the die.
- The requirement is to minimize skew between these sites.
- One commonly used network is the H-tree, which has zero skew if process variations are ignored.
- With diminishing feature sizes and increasing die sizes, intra-die variations lead to increased skew across a die.
3
Previous Approaches – Hierarchical H-tree De-skew
- Phase detectors are located on the domain boundaries of each leg of the H-tree.
- The worst-case skew between two neighboring leaves can be as high as (2n+1)D, where:
  - D = guardband of the phase detector
  - n = number of levels
- Source: “A Design for Digital Dynamic Clock Deskew”, Dike et al.
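To make the (2n+1)D bound concrete, the short Python sketch below evaluates it for a few tree depths. The function name is hypothetical, and the 10 ps guardband is an illustrative value, not a figure from the talk.

```python
def hierarchical_worst_skew(n_levels: int, guardband_ps: float) -> float:
    """Worst-case skew between two neighboring leaves: (2n + 1) * D."""
    if n_levels < 1:
        raise ValueError("an H-tree has at least one level")
    return (2 * n_levels + 1) * guardband_ps


if __name__ == "__main__":
    # D = 10 ps is an illustrative guardband, not a value from the talk.
    for n in (3, 6, 9):
        skew = hierarchical_worst_skew(n, 10.0)
        print(f"n={n}: worst-case neighbor skew = {skew:.0f} ps")
```

The bound grows linearly with depth, so deeper trees on larger dies lose more to detector guardband.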
4
Previous Approaches – Mesh Deskew
- Phase detectors are used between each pair of neighboring leaf nodes of the H-tree.
- Clock skew between neighboring leaves is now D (the guardband of the phase detector).
- Clock skew across the die is still high: up to mD between any two leaf nodes, where m = number of phase detectors between the two leaf nodes.
- Source: “A Design for Digital Dynamic Clock Deskew”, Dike et al.
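The mD bound can be illustrated with a small sketch. It assumes, purely for illustration, that the 64 leaves of a 6-level tree sit on an 8 x 8 grid with one phase detector between each horizontally or vertically adjacent pair. Under that assumption, m for two leaves is the number of detectors on a shortest grid path, which is their Manhattan distance. The function names are hypothetical.

```python
from typing import Tuple


def detectors_between(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """m for leaves at grid positions a and b: detectors on a shortest mesh path."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mesh_skew_bound(a: Tuple[int, int], b: Tuple[int, int], guardband_ps: float) -> float:
    """Skew bound m * D between two leaves of the mesh."""
    return detectors_between(a, b) * guardband_ps


if __name__ == "__main__":
    D = 10.0  # illustrative guardband in ps
    print(mesh_skew_bound((3, 3), (3, 4), D))  # neighbors: m = 1
    print(mesh_skew_bound((0, 0), (7, 7), D))  # opposite corners of the 8 x 8 grid: m = 14
```

Neighbors stay within D, but opposite corners of the die can still differ by 14D under this grid assumption.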
5
Our Approach
- The clock signal is returned from the leaf nodes.
- A single phase detector sits at the center of the tree.
- All returned clock signals are compared with the same delayed reference signal.
- De-skewing can be done at boot-up time or dynamically during free cycles.
6
Our Approach
- Use a modified buffered H-tree, with buffers at each level.
  - This is not typically done, because of process variation in the buffers.
- Wire width sizing is reversed:
  - Typical H-tree: width decreases with level.
  - Our H-tree: width increases with level, so that the buffer at each level sees the same load.
- We utilize the clock shield wires and a single phase detector.
7
Network Topology
- The clock is assumed to be routed on metal 6.
- A typical H-tree requires a clock wire with a shield wire on either side.
- We use an additional return wire of the same width as the clock wire.
8
The H-Tree
- Each section of the H-tree has tri-stateable inverters in both the forward and the return clock networks.
- Forward network: always ON.
- Return network: only the sections on the path to the node being deskewed are turned ON.
9
Wire Widths
- Traditional H-tree: wire widths are larger at the center and narrower near the leaf nodes, which is necessary to ensure clean signals at the leaf nodes.
- Our H-tree: wire widths are larger near the leaf nodes and narrower at the center, to ensure that each buffer sees the same load.
- Sizes (in microns) are derived for a 20 mm x 20 mm die and a 1 GHz target clock frequency.

Level   Traditional H-tree   Our clock tree
        (length x width)     (length x width)
1       5000 x 50            5000 x 1.5
2       5000 x 20            5000 x 1.5
3       2500 x 6             2500 x 3
4       2500 x 3             2500 x 3
5       1250 x 1.5           1250 x 6
6       1250 x 1.5           1250 x 6
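A quick arithmetic check on the table shows why the reversed sizing equalizes buffer load. In the proposed tree, length x width is the same at every level. The Python sketch below multiplies the two columns, using plate area as a first-order stand-in for wire capacitance. This is an assumption: it ignores fringing capacitance, the next buffer's input load, and the fact that wire resistance (proportional to length/width) still differs by level. The names `TRADITIONAL`, `OURS`, and `wire_area_um2` are hypothetical.

```python
# (length_um, width_um) per level, copied from the table above
TRADITIONAL = {1: (5000, 50), 2: (5000, 20), 3: (2500, 6),
               4: (2500, 3), 5: (1250, 1.5), 6: (1250, 1.5)}
OURS = {1: (5000, 1.5), 2: (5000, 1.5), 3: (2500, 3),
        4: (2500, 3), 5: (1250, 6), 6: (1250, 6)}


def wire_area_um2(tree):
    """First-order proxy for each segment's wire capacitance: length x width."""
    return {level: length * width for level, (length, width) in tree.items()}


if __name__ == "__main__":
    for name, tree in (("traditional", TRADITIONAL), ("ours", OURS)):
        print(name, wire_area_um2(tree))
```

Every level of the proposed tree comes out at 7500 µm², whereas the traditional tree ranges from 250000 µm² at level 1 down to 1875 µm² at the leaves.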
10
Deskewing Operation
- We use only one phase detector, unlike previous deskewing methods.
- The clock signal returned from each node is compared with a single reference signal, using the single phase detector at the chip center.
- The largest skew (after deskewing) between any two nodes is therefore not a function of the phase detector: its accuracy and guardband are unimportant.
- The required delay is achieved using a tunable capacitor bank.
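The claim that residual skew does not depend on the detector can be illustrated numerically. The Python sketch below rests on three simplifying assumptions:
- The detector trips at the reference delay plus some unknown but fixed offset. The offset is the same for every leaf because the same detector is used.
- The reference is delayed past every leaf.
- The measured returned delay is treated as the leaf's clock delay, so return-path mismatch is ignored.

Each leaf then ends at the smallest capacitor-bank step that reaches the trip point. As a result, all leaves land within one step of each other regardless of the offset. The function names and delay values are hypothetical.

```python
import math


def tuned_delays(leaf_delays_ps, reference_ps, step_ps, detector_offset_ps, max_code=127):
    """Final leaf delays after each leaf is tuned against the same reference.

    The shared detector trips once a leaf's delay reaches reference + offset;
    each leaf ends at the smallest bank code that reaches that point.
    """
    threshold = reference_ps + detector_offset_ps
    final = []
    for d in leaf_delays_ps:
        if d > threshold:
            raise ValueError("reference must be delayed past every leaf")
        code = math.ceil((threshold - d) / step_ps)
        if code > max_code:
            raise ValueError("capacitor bank range too small for this leaf")
        final.append(d + code * step_ps)
    return final


def skew(delays):
    return max(delays) - min(delays)


if __name__ == "__main__":
    leaves = [100.0, 137.0, 215.0]  # illustrative initial delays (ps)
    for offset in (-5.0, 0.0, 7.0):  # unknown but fixed detector offset (ps)
        print(offset, skew(tuned_delays(leaves, 260.0, 2.5, offset)))
```

Changing the offset shifts every leaf's final delay together, so only the bank step size, not the detector, limits the residual skew in this model.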
11
Deskewing Operation
- Deskewing is performed at a slower clock rate:
  - A slower clock is required for the phase detector to work.
  - It minimizes cross-talk: while the clock signal travels back on the return path, the forward path should be stable.
  - Ensure that half the clock period is greater than the round-trip delay of the clock signal.
- The return path is grounded (acting as a shield) during non-deskew mode.
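The timing rule (half the clock period greater than the round-trip delay) gives an upper bound on the deskew clock frequency: f < 1 / (2 · t_rt). The Python sketch below encodes that bound; the function names are hypothetical. The 100 MHz rate is the deskew rate quoted later in the talk, while the 4 ns and 6 ns round-trip values are illustrative.

```python
def max_deskew_clock_hz(round_trip_s: float) -> float:
    """Upper bound on the deskew clock: T/2 > t_rt  <=>  f < 1 / (2 * t_rt)."""
    return 1.0 / (2.0 * round_trip_s)


def deskew_clock_ok(f_hz: float, round_trip_s: float) -> bool:
    """True if half the clock period exceeds the round-trip delay."""
    return (0.5 / f_hz) > round_trip_s


if __name__ == "__main__":
    # At 100 MHz the half period is 5 ns, so the round trip must be shorter than that.
    print(deskew_clock_ok(100e6, 4e-9), deskew_clock_ok(100e6, 6e-9))
```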
12
Tunable Bank at Leaf Nodes
- Capacitors are binary weighted to allow precise control of the delay.
- A resistor is added to increase the incremental delay per capacitor.
- The resistor value is chosen so that the slew rate of the last segment is not appreciably changed and the incremental delay is as desired.
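Binary weighting makes the switched-in capacitance, and to first order the added delay, linear in the control code. Under that weighting, a 7-bit bank gives 128 evenly spaced settings. The Python sketch below assumes bit i switches in 2^i unit capacitors and estimates the added delay as simply R · C. That estimate is a first-order simplification, not the authors' delay model. The resistor and unit-capacitor values, and the function names, are hypothetical.

```python
def bank_capacitance_fF(code: int, c_unit_fF: float, bits: int = 7) -> float:
    """Capacitance switched in by `code`: bit i enables a capacitor of 2**i unit caps."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range for this bank")
    return sum(c_unit_fF * (1 << i) for i in range(bits) if (code >> i) & 1)


def added_delay_ps(code: int, c_unit_fF: float, r_ohm: float, bits: int = 7) -> float:
    """First-order added delay, R * C. Note 1 ohm * 1 fF = 1e-15 s = 1e-3 ps."""
    return r_ohm * bank_capacitance_fF(code, c_unit_fF, bits) * 1e-3


if __name__ == "__main__":
    # Illustrative values: R = 500 ohm, unit cap = 2 fF, i.e. about 1 ps per code step.
    for code in (0, 1, 64, 127):
        print(code, added_delay_ps(code, 2.0, 500.0))
```

A larger resistor scales every step up together, which matches the slide's use of the resistor to set the incremental delay per capacitor.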
13
The Phase Detector
- Condition LAG: O is low at T1 and high at T2, meaning A lags B; the phase detector is not tripped.
- The phase detector is said to be tripped when condition LAG does not hold.
- The delay is incrementally increased until the LAG condition fails to hold (the phase detector trips).
- Hence the guardband of the phase detector is unimportant.
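The incremental search on this slide can be sketched in Python as a loop. The loop raises the bank code one step at a time and stops at the first code for which condition LAG no longer holds. `sample_o` is a hypothetical hook standing in for the hardware: it programs the bank with a code and returns O sampled at T1 and T2. The function names are also hypothetical.

```python
from typing import Callable, Tuple


def lag(o_at_t1: int, o_at_t2: int) -> bool:
    """Condition LAG: detector output O is low at T1 and high at T2."""
    return o_at_t1 == 0 and o_at_t2 == 1


def tune_leaf(sample_o: Callable[[int], Tuple[int, int]], max_code: int = 127) -> int:
    """Step the bank code up from 0 until LAG fails, i.e. the detector trips.

    `sample_o(code)` is a hypothetical hook: it programs the leaf's capacitor bank
    with `code` and returns O sampled at (T1, T2).
    """
    for code in range(max_code + 1):
        if not lag(*sample_o(code)):
            return code
    raise RuntimeError("detector never tripped: bank range exhausted")


if __name__ == "__main__":
    # Fake detector: LAG holds until the added delay reaches code 42.
    fake = lambda code: (0, 1) if code < 42 else (1, 1)
    print(tune_leaf(fake))  # prints 42
```

Because the loop only looks for the first trip, a detector whose trip point is shifted shifts every leaf by the same amount.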
14
Communicating with Tunable Banks and Tri-stateable Buffers
- Use a 2-wire serial communication scheme.
- Use shift registers at each tunable bank and tri-stateable buffer.
- At most 6 bits are required to address each tri-stateable node of a 6-level H-tree network.
- 7 bits are required for a 7-bit capacitor bank.
- First assert the reset signal (derived from the signal wires), then send a 6-bit address (to select the correct capacitor bank and return path), then send the 7-bit data (the capacitance value).
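After the reset, the bits clocked onto the serial line for one update can be sketched in Python as a 6-bit address followed by 7 bits of capacitance data, 13 bits in total. Two details are assumptions: sending each field MSB first, and the names `to_bits` and `build_frame`. The reset itself, which is derived from the signal wires, is not modeled.

```python
from typing import List


def to_bits(value: int, width: int) -> List[int]:
    """MSB-first bit list (the bit order on the wire is an assumption)."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the field")
    return [(value >> i) & 1 for i in reversed(range(width))]


def build_frame(address: int, cap_code: int, addr_bits: int = 6, data_bits: int = 7) -> List[int]:
    """Bits sent after the reset: the address first, then the capacitance value."""
    return to_bits(address, addr_bits) + to_bits(cap_code, data_bits)


if __name__ == "__main__":
    print(build_frame(0b101101, 100))
```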
15
Addressing Mechanism
[Figure: 3-level H-tree with each node labeled by its address. Labels shown: 0, 1; 00, 10, 11, 01; 010, 000, 110, 100, 011, 001, 111, 101]
- Up or right branch = 1
- Down or left branch = 0
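The addressing rule can be sketched in Python as follows. Each branch contributes one bit (up or right = 1, down or left = 0), so a level-m node has an m-bit address. The figure labels suggest that each new level's bit is prepended, which would make the level-1 bit the rightmost one. The sketch follows that reading, but it is an assumption. Under it, the return-path buffers to enable for a leaf are the suffixes of the leaf's address. The names `BIT`, `node_address`, and `return_path_nodes` are hypothetical.

```python
from typing import Dict, List

# Branch direction -> address bit, as on the slide.
BIT: Dict[str, str] = {"up": "1", "right": "1", "down": "0", "left": "0"}


def node_address(path: List[str]) -> str:
    """Address of the node reached from the root by `path` (one branch per level).

    Each new level's bit is prepended (our reading of the figure labels; an assumption).
    """
    addr = ""
    for direction in path:
        addr = BIT[direction] + addr
    return addr


def return_path_nodes(leaf_addr: str) -> List[str]:
    """Addresses of the return-path buffers to enable, from level 1 down to the leaf."""
    return [leaf_addr[-k:] for k in range(1, len(leaf_addr) + 1)]


if __name__ == "__main__":
    leaf = node_address(["right", "down", "left"])
    print(leaf, return_path_nodes(leaf))
```

For example, the path right, down, left yields the leaf address 001. Its ancestors 1 and 01 also appear among the figure's labels.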
16
m-bit Decoder to Address the Tri-stateable Buffers
- Serial shift registers shift in the m bits of the address (m is the level in the H-tree at which the tri-state buffer is located).
- Clocking is stopped by the last flip-flop.
- Combinational logic checks whether the m bits in the shift register match the address of the tri-state buffer.
- A HIT signal is generated if all m bits are in and the address matches.
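A behavioral Python model of one m-bit decoder is sketched below. It accepts serial bits only until m of them are in, mimicking the last flip-flop gating the clock. It then raises HIT if the captured bits equal the buffer's own address. The class name is hypothetical. Comparing bits in the order received is an assumption, since the slide does not state the bit order.

```python
class AddressDecoder:
    """Behavioral model of one m-bit address decoder (hypothetical, for illustration)."""

    def __init__(self, own_address: str):
        self.own_address = own_address  # e.g. "10" for a level-2 buffer
        self.m = len(own_address)
        self.reset()

    def reset(self) -> None:
        """Serial reset: clear the shift register and re-enable clocking."""
        self.bits = ""

    def shift(self, bit: str) -> None:
        """Shift in one bit; ignored once m bits are in (clocking stopped)."""
        if len(self.bits) < self.m:
            self.bits += bit

    @property
    def hit(self) -> bool:
        """HIT: all m bits are in and they match this buffer's address."""
        return len(self.bits) == self.m and self.bits == self.own_address


if __name__ == "__main__":
    decoder = AddressDecoder("10")
    for b in "101101":  # a 6-bit address stream; only the first 2 bits are captured
        decoder.shift(b)
    print(decoder.bits, decoder.hit)
```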
17
7-bit Decoder for Selecting the Capacitance Value
- Data is shifted in serially (similar to the scheme used to address the tri-state buffers).
- The HIT signal from the decoder of the last tri-stateable buffer produces a reset pulse.
- Clocking is stopped by the last flip-flop (and released only when the next HIT signal arrives).
18
Overall Operation of the Serial Communication Scheme
- Follow the sequence: serial reset, transmit address, transmit data.
- Each such sequence requires 13 clock cycles.
- Each leaf node requires at most 2^7 = 128 such sequences (for a 7-bit capacitor bank).
- With deskew done at 100 MHz, a 6-level H-tree (64 leaf nodes) would be deskewed in about 1 ms.
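The 1 ms figure follows from multiplying the numbers on this slide. The Python sketch below does that arithmetic, reading the 13 cycles as 6 address bits plus 7 data bits. That reading is an assumption: it matches the count, but the slide does not say how the reset is counted. The function name is hypothetical.

```python
def deskew_time_s(leaves: int = 64, addr_bits: int = 6, data_bits: int = 7,
                  bank_bits: int = 7, f_hz: float = 100e6) -> float:
    """Worst-case deskew time: every leaf steps through all 2**bank_bits codes,
    each code costing one address + data sequence of (addr_bits + data_bits) cycles."""
    cycles_per_sequence = addr_bits + data_bits  # 13 cycles
    sequences_per_leaf = 2 ** bank_bits          # 128 sequences
    total_cycles = leaves * sequences_per_leaf * cycles_per_sequence
    return total_cycles / f_hz


if __name__ == "__main__":
    print(f"{deskew_time_s() * 1e3:.3f} ms")
```

The result is 64 x 128 x 13 = 106496 cycles, or about 1.06 ms at 100 MHz, consistent with the slide's "about 1 ms".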
19
Experimental Results
- Initial skew: 115 ps. After dynamic deskew, the skew is reduced to 3 ps.
- Simulated process variations (t_ox, µ, L_eff, V_T), with values as suggested by “Characterization and modelling of clock skew with process variations”, Zarkesh-Ha et al.
20
Experimental Results (continued)
- Compared against a traditional (non-buffered) H-tree with no deskew mechanism, operating at 1 GHz.
- 7.9% lower power in our network:
  - Many small buffers are used.
  - The wire loads involved are smaller (the improvement would be higher at higher frequencies).

Category                   Original area   Our area     Overhead
Wiring                     1.635x10^6      2.21x10^6    34.86%
Central clock driver       480             –            24.56%
Regenerators               18432           18432
Tri-state inverters        –               4408
Tri-state controllers      –               307
Capacitance controllers    –               410
Capacitors                 –               4880
21
Conclusions
- We present a novel clock distribution network with dynamic de-skewing capability.
- We can de-skew nodes that are skewed by 300 ps down to 3 ps.
- We do this with a 7.9% power reduction and a 34% area overhead compared to a traditional H-tree.
22
Thank you.