Jieyi Long, Ja Chun Ku, Seda Ogrenci Memik, Yehea Ismail. Dept. of EECS, Northwestern Univ. SACTA: A Self-Adjusting Clock Tree Architecture to Cope with Temperature Variation

Presentation transcript:

1 Jieyi Long, Ja Chun Ku, Seda Ogrenci Memik, Yehea Ismail Dept. of EECS, Northwestern Univ. SACTA: A Self-Adjusting Clock Tree Architecture to Cope with Temperature Variation

2 Outline
Introduction
Motivation
SACTA
  architecture
  skew buffer design
  optimization
Experimental results
Conclusion

3 Introduction
Temperature impacts
  affect transistor and interconnect delay
  can cause timing violations
Existing techniques
  temperature-insensitive clock tree [1]
  robust clock scheduling [3]
  Razor technology [4]
  each has pros and cons

4 Introduction
On-chip temperature variation
  input data dependent
  spatial and temporal variation
  hard to predict at design time
  a dynamic architecture is highly desired
Requirements
  small reaction time
  reasonable overhead

8 Motivation
A one-dimensional pipeline
  combinational logic blocks act like springs
  temperature acts like forces applied to the springs
[Figure: pipeline registers R_1, R_2, R_3 driven by clk; axes: position x, temperature θ/ºC]

9 Motivation
A one-dimensional pipeline
  combinational logic blocks act like springs
  temperature acts like forces applied to the springs
  what if the clock skews also act like springs?
[Figure: pipeline registers R_1, R_2, R_3 driven by clk; axes: position x, temperature θ/ºC]

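10 Motivation
Clock skews
  $x_i$: clock signal arrival time at register $R_i$
  $D_{i,i+1} = T_{c\text{-}q} + T_{logic(max)} + T_{int} + T_{setup}$
  $d_{i,i+1} = T_{c\text{-}q} + T_{logic(min)} + T_{int} + T_{hold}$
Clock skew constraints
  $-d_{i,i+1} \le x_i - x_{i+1} \le T_{cp} - D_{i,i+1}$
  the upper bound is the setup time constraint; the lower bound is the hold time constraint
[Figure: pipeline registers R_1, R_2, R_3 driven by clk]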
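As a rough illustration of the two-sided skew constraint on this slide, the sketch below computes D and d for one stage and checks whether a pair of clock arrival times satisfies both bounds. The function names and all numeric delays are hypothetical and chosen only for the example; the formulas are the ones stated on the slide.

```python
def path_delays(t_cq, t_logic_max, t_logic_min, t_int, t_setup, t_hold):
    """Return (D, d) for one pipeline stage, using the slide's definitions."""
    D = t_cq + t_logic_max + t_int + t_setup
    d = t_cq + t_logic_min + t_int + t_hold
    return D, d


def skew_ok(x_i, x_next, D, d, t_cp):
    """Check -d <= x_i - x_next <= T_cp - D (hold bound, then setup bound)."""
    skew = x_i - x_next
    return -d <= skew <= t_cp - D


# Hypothetical delays in picoseconds (not taken from the presentation).
D, d = path_delays(t_cq=50, t_logic_max=700, t_logic_min=200,
                   t_int=30, t_setup=40, t_hold=20)
print(D, d)                               # D = 820, d = 300
print(skew_ok(0, 0, D, d, t_cp=1000))     # zero skew lies inside [-300, 180]
print(skew_ok(200, 0, D, d, t_cp=1000))   # skew of 200 exceeds the setup bound
```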

11 Motivation
Clock skew constraints
  $-d_{i,i+1} \le x_i - x_{i+1} \le T_{cp} - D_{i,i+1}$
Observation
  $d_{i,i+1}$, $-(x_i - x_{i+1})$ and $D_{i,i+1}$ should be made to have the same dependency on temperature
[Figure: pipeline registers R_1, R_2, R_3 driven by clk]

12 Motivation
How do $d_{i,i+1}$ and $D_{i,i+1}$ depend on temperature?
  HSPICE simulation vs. linear model
  we only need to make the clock skews linearly dependent on temperature

13 Motivation
Constraints revisited
  $-d_{i,i+1} \le x_i - x_{i+1} \le T_{cp} - D_{i,i+1}$
  assume the operating temperature ranges between $\theta_{min}$ and $\theta_{max}$
  the constraints form a quadrangle
  we only need to couple $x_i - x_{i+1}$ with the local temperature $\theta_{i,i+1}$ and make it a line lying strictly within the quadrangle
[Figure: delay $t_d$ versus temperature $\theta$ between $\theta_{min}$ and $\theta_{max}$; curves: $T_{cp} - D_{i,i+1}(\theta_{i,i+1})$, $-d_{i,i+1}(\theta_{i,i+1})$, and $(x_i - x_{i+1})(\theta_{i,i+1})$]
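The quadrangle argument can be sketched numerically: if both bounds and the skew are linear in temperature, satisfying the constraints at the two end temperatures implies they hold everywhere in between. The slopes and offsets below are hypothetical, and the linear form of the bounds is the modelling assumption from the previous slide, not measured data.

```python
THETA_MIN, THETA_MAX = 25.0, 125.0  # example operating range in deg C


def inside_quadrangle(skew, lower, upper, thetas):
    """True if lower(t) <= skew(t) <= upper(t) for every sampled temperature."""
    return all(lower(t) <= skew(t) <= upper(t) for t in thetas)


# Hypothetical linear models (ps), written in terms of THETA_MAX - theta.
def upper(theta):   # stands for T_cp - D(theta)
    return 180.0 + 1.0 * (THETA_MAX - theta)


def lower(theta):   # stands for -d(theta)
    return -300.0 + 0.5 * (THETA_MAX - theta)


def skew(theta):    # temperature-coupled skew x_i - x_{i+1}
    return -50.0 + 0.8 * (THETA_MAX - theta)


# Checking only the two corners of the temperature range...
endpoints_ok = inside_quadrangle(skew, lower, upper, [THETA_MIN, THETA_MAX])
# ...agrees with a dense sweep, because all three functions are lines.
sweep = [THETA_MIN + i for i in range(101)]
sweep_ok = inside_quadrangle(skew, lower, upper, sweep)
print(endpoints_ok, sweep_ok)
```

A skew that does not track temperature (for example a constant 200 ps) would leave the quadrangle at the hot end in this example, which is the case the slide is arguing against.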

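14 SACTA: Architecture
Self-Adjusting Clock Tree Architecture
  $x_i - x_{i+1} = (f_i - f_{i+1} - s_i) + k_i \Delta\theta$, where $\Delta\theta = \theta_{max} - \theta$
  Automatic Temperature Adjustable (ATA) skew buffer: delay $s_i - k_i \Delta\theta$
  Temperature-insensitive (fixed) skew buffer: delay $f_i$
[Figure: clock tree from clk to registers $R_1, \dots, R_i, R_{i+1}, \dots, R_n$ with fixed buffers $f_1, \dots, f_i, f_{i+1}, \dots, f_n$ and ATA buffers $s_1 - k_1\Delta\theta$, $s_i - k_i\Delta\theta$, $s_{i+1} - k_{i+1}\Delta\theta$]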
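The skew expression on this slide is easy to evaluate directly. The following is a minimal sketch; the function name `sacta_skew` and the buffer values are hypothetical, and only the formula itself comes from the slide.

```python
def sacta_skew(f_i, f_next, s_i, k_i, theta, theta_max=125.0):
    """Skew x_i - x_{i+1} = (f_i - f_{i+1} - s_i) + k_i * (theta_max - theta)."""
    delta_theta = theta_max - theta
    return (f_i - f_next - s_i) + k_i * delta_theta


# Hypothetical buffer delays (ps) and temperature coefficient (ps per deg C).
hot = sacta_skew(f_i=100.0, f_next=80.0, s_i=60.0, k_i=0.5, theta=125.0)
cold = sacta_skew(f_i=100.0, f_next=80.0, s_i=60.0, k_i=0.5, theta=25.0)
print(hot, cold)  # the skew grows by k_i * 100 as the stage cools by 100 deg C
```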

15 SACTA: Skew Buffer Design
Fixed skew buffer
  bias the gates at the Zero Temperature Coefficient (ZTC) point
[Figure: fixed skew buffer circuit; labels: $V_{dd}$, $V_{ZTC}$, $I_{ZTC}$, $V_{ZTC}$ reference, transistors $M_1$ to $M_4$, min-size devices, $W_{min}$, $L \in [L_{min}, 5L_{min}]$]

16 SACTA: Skew Buffer Design
ATA skew buffer
[Figure: ATA skew buffer circuit with fixed buffers; labels: $V_{dd}$, $V_{ZTC}$ reference, min-size devices, $W_{min}$, $L \in [L_{min}, 5L_{min}]$]

17 SACTA: Optimization
Optimizing the clock tree
  $f_i$ and $s_i$ are positively related to the overhead
  minimize the sum of $f_i$ and $s_i$
Constraints
  skew buffer design constraints: $s_i \ge s_{min}$, $f_i \ge f_{min}$, $k_i - \lambda s_i = 0$
  timing correctness: for $\theta = \theta_{max}, \theta_{min}$, $-d_{i,i+1}(\theta) \le (x_i - x_{i+1})(\theta) \le T_{cp} - D_{i,i+1}(\theta)$
[Figure: delay $t_d$ versus temperature between $\theta_{min}$ and $\theta_{max}$; curves: $T_{cp} - D_{i,i+1}(\theta_{i,i+1})$, $-d_{i,i+1}(\theta_{i,i+1})$, and $(x_i - x_{i+1})(\theta_{i,i+1})$]

18 SACTA: Optimization
SACTA optimization formulation
  Minimize $\sum s_i + \sum f_i$
  subject to
    $f_i - s_i - f_{i+1} \le T_{cp} - D_{i,i+1}$
    $f_i - s_i - f_{i+1} \ge -d_{i,i+1}$
    $f_i - s_i + k_i \Delta\theta_M - f_{i+1} \le T_{cp} - D_{i,i+1} + \Gamma_{i,i+1} \Delta\theta_M$
    $f_i - s_i + k_i \Delta\theta_M - f_{i+1} \ge -d_{i,i+1} + \gamma_{i,i+1} \Delta\theta_M$
    $k_i - \lambda s_i = 0$
    $s_i \ge s_{min}$; $f_i, f_{i+1} \ge f_{min}$
    $i = 1, 2, \dots, n-1$
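To make the formulation concrete, here is a small feasibility checker that evaluates a candidate set of buffer delays against every constraint listed on this slide. It is a sketch, not a solver: the function names, the list layout (n entries for f, n-1 entries for the per-stage quantities) and all numbers are assumptions made for the example.

```python
def check_design(f, s, D, d, Gamma, gamma, t_cp, lam, dtheta_m, f_min, s_min):
    """Return True if (f, s) satisfies all constraints of the formulation.

    f has n entries; s, D, d, Gamma and gamma have n-1 entries (assumed layout).
    """
    if any(fi < f_min for fi in f) or any(si < s_min for si in s):
        return False
    for i in range(len(f) - 1):
        k = lam * s[i]                    # from k_i - lambda * s_i = 0
        hot = f[i] - s[i] - f[i + 1]      # left-hand side of the first two rows
        cold = hot + k * dtheta_m         # left-hand side of the next two rows
        if not (-d[i] <= hot <= t_cp - D[i]):
            return False
        lo = -d[i] + gamma[i] * dtheta_m
        hi = t_cp - D[i] + Gamma[i] * dtheta_m
        if not (lo <= cold <= hi):
            return False
    return True


def overhead(f, s):
    """Objective value: sum of all fixed and ATA buffer delays."""
    return sum(f) + sum(s)


# Hypothetical three-register pipeline; delays in ps, slopes in ps per deg C.
params = dict(D=[820, 820], d=[300, 300], Gamma=[1.0, 1.0], gamma=[0.5, 0.5],
              t_cp=1000, lam=0.01, dtheta_m=100, f_min=20, s_min=40)
good = check_design(f=[100, 60, 20], s=[40, 40], **params)
bad = check_design(f=[400, 60, 20], s=[40, 40], **params)
print(good, bad, overhead([100, 60, 20], [40, 40]))
```

A real flow would hand these constraints to an optimizer; the checker is only meant to show how each row of the formulation is evaluated.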

19 SACTA: Optimization
Transforming the problem into a network flow formulation
  define four new variables
    $f_i^{\Delta} = f_i - f_{min}$
    $s_i^{\Delta} = s_i - s_{min}$
    $u_i = f_i - s_i - f_{i+1} + d_{i,i+1}$
    $v_i = f_i - s_i (1 - \lambda \Delta\theta_M) - f_{i+1} + d_{i,i+1} - \gamma_{i,i+1} \Delta\theta_M$
  with these, the optimization problem can be rewritten as a flow problem

20 SACTA: Optimization
Generalized min-cost flow formulation
  Minimize $\sum s_i^{\Delta} + \sum f_i^{\Delta}$
  subject to
    balance conditions
      $-f_i^{\Delta} + s_i^{\Delta} + f_{i+1}^{\Delta} + u_i = d_{i,i+1} - s_{min}$
      $-(\lambda \Delta\theta_M) s_i^{\Delta} - u_i + v_i = -\gamma_{i,i+1} \Delta\theta_M + (\lambda \Delta\theta_M) s_{min}$
    bounds on the flows
      $0 \le u_i \le T_{cp} - D_{i,i+1} + d_{i,i+1}$
      $0 \le v_i \le T_{cp} - D_{i,i+1} + d_{i,i+1} + (\Gamma_{i,i+1} - \gamma_{i,i+1}) \Delta\theta_M$
      $s_i^{\Delta}, f_i^{\Delta}, f_{i+1}^{\Delta} \ge 0$
    $i = 1, 2, \dots, n-1$
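One way to see why the substitution helps is to compute the new variables for a single stage and observe that the original two-sided timing constraints become plain bounds on $u_i$ and $v_i$. The sketch below does that; the function names and numbers are hypothetical, and the variable definitions are the ones given on the preceding slide.

```python
def flow_variables(f_i, f_next, s_i, d, gamma, lam, dtheta_m, f_min, s_min):
    """Compute the four substituted variables for one stage."""
    return {
        "f_delta": f_i - f_min,
        "s_delta": s_i - s_min,
        "u": f_i - s_i - f_next + d,
        "v": f_i - s_i * (1 - lam * dtheta_m) - f_next + d - gamma * dtheta_m,
    }


def within_bounds(fv, D, d, Gamma, gamma, t_cp, dtheta_m):
    """Check 0 <= u <= cap_u, 0 <= v <= cap_v and non-negative offsets."""
    cap_u = t_cp - D + d
    cap_v = t_cp - D + d + (Gamma - gamma) * dtheta_m
    return (0 <= fv["u"] <= cap_u and 0 <= fv["v"] <= cap_v
            and fv["f_delta"] >= 0 and fv["s_delta"] >= 0)


# Hypothetical stage: delays in ps, slopes in ps per deg C.
fv = flow_variables(f_i=100, f_next=60, s_i=40, d=300, gamma=0.5,
                    lam=0.005, dtheta_m=100, f_min=20, s_min=40)
ok = within_bounds(fv, D=820, d=300, Gamma=1.0, gamma=0.5,
                   t_cp=1000, dtheta_m=100)
print(fv, ok)
```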

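21 SACTA: Optimization
Graph-based depiction of the constraints
  balance conditions (one per node $p_i$, $q_i$)
    $-f_i^{\Delta} + s_i^{\Delta} + f_{i+1}^{\Delta} + u_i = d_{i,i+1} - s_{min}$
    $-(\lambda \Delta\theta_M) s_i^{\Delta} - u_i + v_i = -\gamma_{i,i+1} \Delta\theta_M + (\lambda \Delta\theta_M) s_{min}$
  bounds on the flows
    $0 \le u_i \le T_{cp} - D_{i,i+1} + d_{i,i+1}$
    $0 \le v_i \le T_{cp} - D_{i,i+1} + d_{i,i+1} + (\Gamma_{i,i+1} - \gamma_{i,i+1}) \Delta\theta_M$
    $s_i^{\Delta}, f_i^{\Delta}, f_{i+1}^{\Delta} \ge 0$
  edge labels (cost, capacity, flow)
    $(0,\ T_{cp} - D_{i,i+1} + d_{i,i+1},\ u_i)$
    $(0,\ T_{cp} - D_{i,i+1} + d_{i,i+1} + (\Gamma_{i,i+1} - \gamma_{i,i+1}) \Delta\theta_M,\ v_i)$
    $(1,\ +\infty,\ f_i^{\Delta})$
    $(1,\ +\infty,\ f_{i+1}^{\Delta})$
    $(1,\ +\infty,\ s_i^{\Delta})$
[Figure: flow graph fragment with nodes $p_i$ and $q_i$]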

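22 SACTA: Optimization
Graph-based depiction of the constraints
  edge labels (cost, capacity, flow)
    $(0,\ T_{cp} - D_{i,i+1} + d_{i,i+1},\ u_i)$
    $(0,\ T_{cp} - D_{i,i+1} + d_{i,i+1} + (\Gamma_{i,i+1} - \gamma_{i,i+1}) \Delta\theta_M,\ v_i)$
    $(1,\ +\infty,\ f_i^{\Delta})$
    $(1,\ +\infty,\ f_{i+1}^{\Delta})$
    $(1,\ +\infty,\ s_i^{\Delta})$
[Figure: complete flow graph with nodes $p_1, p_2, p_3, \dots, p_{n-1}$, $q_1, q_2, q_3, \dots, q_{n-1}$, and $w$]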
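The (cost, capacity, flow) labels can be captured in a small data structure. The sketch below records only the labels listed on the slide; it deliberately does not encode which nodes each edge connects, since the transcript does not preserve the graph topology. The `Edge` class, helper names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    name: str
    cost: float
    capacity: float
    flow: float = 0.0


def stage_edges(i, D, d, Gamma, gamma, t_cp, dtheta_m):
    """Edges contributed by stage i, labelled as (cost, capacity) on the slide."""
    return [
        Edge(f"u_{i}", 0, t_cp - D + d),
        Edge(f"v_{i}", 0, t_cp - D + d + (Gamma - gamma) * dtheta_m),
        Edge(f"s_{i}", 1, math.inf),   # carries s_i^Delta
        Edge(f"f_{i}", 1, math.inf),   # carries f_i^Delta
    ]


def total_cost(edges):
    """Sum of cost * flow; only the s and f edges have non-zero cost."""
    return sum(e.cost * e.flow for e in edges)


def capacities_respected(edges):
    return all(0 <= e.flow <= e.capacity for e in edges)


# Hypothetical single stage with flows assigned by hand.
edges = stage_edges(1, D=820, d=300, Gamma=1.0, gamma=0.5,
                    t_cp=1000, dtheta_m=100)
for e, flow in zip(edges, [300, 270, 0, 80]):
    e.flow = flow
print(total_cost(edges), capacities_respected(edges))
```

Because the u and v edges cost nothing, the total cost reduces to the sum of the $s_i^{\Delta}$ and $f_i^{\Delta}$ flows, matching the objective of the flow formulation.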

23 Experimental Results
Experiments
  six different systolic pipelines
  both balanced and unbalanced pipelines are examined
  target temperature range: $\theta_{max}$ = 125 ºC, $\theta_{min}$ = 25 ºC

24 Experimental Results
Uniform temperature distribution
  maximum permissible temperature
[Figure: maximum permissible temperature T/°C for the six pipelines]

25 Experimental Results
Uniform temperature distribution (125 ºC)
  relative performance improvement
[Figure: relative performance (RP) for the six pipelines]

26 Experimental Results
Various temperature profiles
  X: timing error, Y: no timing error
  each profile is given by temperatures $s_1$ to $s_5$ (ºC); the values are not preserved in this transcript

  Thermal profile   Pipelines w/o SACTA    Pipelines w/ SACTA
                    PB PU RB RU FB FU      PB PU RB RU FB FU
  1                 X  X  X  X  X  X       Y  Y  Y  Y  Y  Y
  2                 X  X  Y  Y  X  X       Y  Y  Y  Y  Y  Y
  3                 X  X  Y  Y  X  X       Y  Y  Y  Y  Y  Y
  4                 X  X  X  X  X  X       X  Y  X  Y  X  Y
  5                 X  X  X  X  X  X       X  Y  X  X  X  X

27 Experimental Results
Hardware overhead
[Table: on-tree inverter count and pipeline cell count for pipelines PB, PU, RB, RU, FB, FU]

28 Conclusions
Temperature variation affects circuit timing
Dynamic architectures are required
SACTA
  architecture, skew buffer design, optimization
  SACTA enhances system robustness and performance
  the hardware overhead of SACTA is small
