Jieyi Long and Seda Ogrenci Memik, Dept. of EECS, Northwestern Univ. Automated Design of Self-Adjusting Pipelines

Jieyi Long and Seda Ogrenci Memik, Dept. of EECS, Northwestern Univ.
Automated Design of Self-Adjusting Pipelines

2 Outline
Introduction
Self-Adjusting Pipeline (SAP)
Systematic Design Framework of SAP
Experiment: Microprocessor Pipeline
Conclusions

3 Introduction
Aggressive Scaling Down
– Process variation (PV)
– Circuit behaviors are harder to predict
Steadily Increasing Integration Capacity
– Complexity of the designs amplified from generation to generation
Challenges for Traditional CAD Flow
– Synthesis: has to be conservative
– Simulation: challenging due to the variations
– Verification: very tricky due to the complexity
Novel Design Methodologies are Needed!

4 Introduction
Self-Adjusting Architecture
– A promising methodology to address the above-mentioned challenges
– Offers a way to handle uncertainty after manufacturing
– Still in its infancy
– Automated design tools
Examples of Self-Adjusting Architectures
– Razor [Ernst et al. MICRO2003]
– SACTA [Long et al. ICCAD2007]

5 Self-Adjusting Pipeline (SAP)
Impact of Process Variation
Traditionally, pipeline stages are designed to have the same nominal delay (i.e., balanced).
The impacts of PV on different stages are different.
Due to PV, balanced designs are actually NOT balanced!
[Figure: pipeline with registers R1, R2, R3 driven by clk]

6 Self-Adjusting Pipeline (SAP)
Impact of Process Variation
FMAX Model [Bowman et al, TCAD 2007]: with the same nominal delay, pipeline stages having 1) a larger number of independent critical paths and 2) a smaller logic depth have a higher probability of a longer delay.
[Figure: pipeline with registers R1, R2, R3 driven by clk; stages labeled More Vulnerable and Less Vulnerable]

7 Self-Adjusting Pipeline (SAP)
Allocate Execution Time on a Need Basis
Our solution: Create dynamic clock skews to satisfy the actual need of the stages
[Figure: pipeline with registers R1, R2, R3 driven by clk]

8 Self-Adjusting Pipeline (SAP)
How to obtain the actual execution time of each stage?
Razor: Detect after execution
Our solution: Measure and predict!
[Figure: pipeline with registers R1, R2, R3 driven by clk]

9 Self-Adjusting Pipeline (SAP)
How to fix the timing error?
Our solution: Measure and predict!
– We predict the error before it manifests itself, so we might have time to fix it
[Figure: pipeline with registers R1, R2, R3 driven by clk]

10 Self-Adjusting Pipeline (SAP)
Two supporting circuit elements
– Delay sensor [based on Ghosh et al. TCAD 2007]
– Adjustable skew buffer
[Figure: Delay Sensor; labels: T_max, T_D, V_REF, CLK, P, Sawtooth, TAH]

11 Self-Adjusting Pipeline (SAP)
Two supporting circuit elements
– Delay sensor [based on Ghosh et al. TCAD 2007]
– Adjustable skew buffer
[Figure: Adjustable Skew Buffer]

12 Systematic Design Framework of SAP
Objective Function: Average Performance
Speed Binning
– Frequency bins: $[f_1, f_2], \ldots, [f_i, f_{i+1}], \ldots, [f_n, f_{n+1}]$
– Yield $y_i$ of bin $[f_i, f_{i+1}]$: the fraction of chips falling into the bin
Performance Metric of a Set of Chips
– Batch Performance [Das et al., ASGI 2007]: $BP = \sum_{i=1}^{n} f_i \cdot y_i$
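As a minimal sketch of the Batch Performance metric above, the following Python snippet evaluates $BP = \sum_i f_i \cdot y_i$ for a small set of bins. The function name `batch_performance` and the bin edges and yields are illustrative assumptions, not values from the presentation.

```python
from typing import Sequence


def batch_performance(bin_freqs: Sequence[float], bin_yields: Sequence[float]) -> float:
    """Return BP = sum_i f_i * y_i for bins [f_i, f_{i+1}] with yields y_i."""
    if len(bin_freqs) != len(bin_yields):
        raise ValueError("need exactly one yield per frequency bin")
    return sum(f * y for f, y in zip(bin_freqs, bin_yields))


# Hypothetical binning data (not from the slides): lower bin edges in GHz
# and the fraction of chips that falls into each bin.
bin_lower_edges_ghz = [1.8, 2.0, 2.2]
bin_yields = [0.2, 0.5, 0.3]

bp = batch_performance(bin_lower_edges_ghz, bin_yields)
print(f"batch performance: {bp:.3f} GHz")
```

Each chip is credited with the lower edge of its bin, so shifting yield toward faster bins raises BP even when the total yield is unchanged.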

13 Systematic Design Framework of SAP
Variables
Locations of the delay sensors:
– Cannot be too early in the stage, otherwise the prediction will not be accurate
– Cannot be too late in the stage, otherwise we do not have time to fix the error
Nominal delay of the adjustable skew buffer:
– Cannot be too small, otherwise the timing error in the first stage is not fixed
– Cannot be too large, otherwise there might be a timing error in the second stage

14 Systematic Design Framework of SAP
Automated Delay Sensor Insertion and Clock Skew Buffer Configuration
Problem Definition
Given: 1) two back-to-back pipeline stages, where the first stage is more vulnerable to process variation than the second; 2) the max tolerable delay of each internal node in the pipeline.
Determine: 1) the locations of the delay sensors; 2) the nominal delay of the adjustable skew buffers, such that the Batch Performance is maximized.

15 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
Directed Acyclic Graph (DAG) representation of the pipelines
– Gates → vertices
– Registers → primary I/O vertices
– Interconnects → directed edges
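The mapping above (gates to vertices, registers to primary I/O vertices, interconnects to directed edges) can be sketched as a small Python data structure. The class name `PipelineDAG`, its methods, and the four-vertex example netlist are hypothetical; the topological sort is included only to confirm that the graph is acyclic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PipelineDAG:
    successors: dict = field(default_factory=dict)     # vertex -> fan-out vertices
    primary_inputs: set = field(default_factory=set)   # register outputs driving logic
    primary_outputs: set = field(default_factory=set)  # register inputs driven by logic

    def add_vertex(self, name, kind="gate"):
        """Add a gate, or a register-side vertex when kind is 'pi' or 'po'."""
        self.successors.setdefault(name, [])
        if kind == "pi":
            self.primary_inputs.add(name)
        elif kind == "po":
            self.primary_outputs.add(name)

    def add_edge(self, src, dst):
        """Add a directed edge for an interconnect from src to dst."""
        self.successors[src].append(dst)

    def topological_order(self):
        """Kahn's algorithm; raises if the graph is not acyclic."""
        indegree = {v: 0 for v in self.successors}
        for outs in self.successors.values():
            for w in outs:
                indegree[w] += 1
        ready = deque(v for v, d in indegree.items() if d == 0)
        order = []
        while ready:
            v = ready.popleft()
            order.append(v)
            for w in self.successors[v]:
                indegree[w] -= 1
                if indegree[w] == 0:
                    ready.append(w)
        if len(order) != len(self.successors):
            raise ValueError("graph contains a cycle")
        return order


# Hypothetical one-stage netlist: register r1 -> gates g1, g2 -> register r2.
dag = PipelineDAG()
dag.add_vertex("r1", kind="pi")
dag.add_vertex("g1")
dag.add_vertex("g2")
dag.add_vertex("r2", kind="po")
for src, dst in [("r1", "g1"), ("r1", "g2"), ("g1", "g2"), ("g2", "r2")]:
    dag.add_edge(src, dst)

order = dag.topological_order()
```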

16 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
Directed Acyclic Graph (DAG) representation of the pipelines
– Primary Path: a path between a primary input vertex and a primary output vertex
Coverage Requirement: each primary path must be covered by one and only one delay sensor
– The edges with delay sensors on them form a cut of the DAG

17 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
We assign a decision variable $x_i$ to each vertex
– The decision variables specify the locations of the sensors: a sensor is on edge $(v_i, v_j)$ iff $x_i - x_j = 1$

18 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
Constraints specifying the Coverage Requirement
$x_i - x_j \ge 0$, for each edge $(v_i, v_j)$
$x_p = 1$, for all $v_p$ in $PI_1$
$x_p = 0$, for all $v_p$ in $PI_2$
$x_q = 0$, for all $v_q$ in $PO_1$ or $PO_2$
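To illustrate how the decision variables encode sensor locations, the sketch below checks the coverage constraints for a given 0/1 assignment and counts the sensors on every primary path of the first stage. The function names, the five-vertex graph, and the assignment are hypothetical. Path enumeration is exponential in general and is used here only as a check on a toy example.

```python
def sensor_edges(edges, x):
    """Edges that carry a sensor: x_i - x_j = 1."""
    return [(i, j) for (i, j) in edges if x[i] - x[j] == 1]


def satisfies_coverage(edges, x, pi1, pi2, po):
    """Check x_i - x_j >= 0 on every edge plus the boundary conditions."""
    monotone = all(x[i] - x[j] >= 0 for (i, j) in edges)
    boundary = (all(x[p] == 1 for p in pi1)
                and all(x[p] == 0 for p in pi2)
                and all(x[q] == 0 for q in po))
    return monotone and boundary


def sensors_per_primary_path(edges, x, pi1, po):
    """Count sensor edges on each primary path (brute-force enumeration)."""
    succ = {}
    for i, j in edges:
        succ.setdefault(i, []).append(j)
    counts = []

    def walk(v, seen):
        if v in po:
            counts.append(seen)
            return
        for w in succ.get(v, []):
            walk(w, seen + (1 if x[v] - x[w] == 1 else 0))

    for p in sorted(pi1):
        walk(p, 0)
    return counts


# Hypothetical first stage: primary input a, gates g1..g3, primary output o.
edges = [("a", "g1"), ("g1", "g2"), ("g2", "o"), ("a", "g3"), ("g3", "o")]
x = {"a": 1, "g1": 1, "g2": 0, "g3": 1, "o": 0}

feasible = satisfies_coverage(edges, x, pi1={"a"}, pi2=set(), po={"o"})
cut = sensor_edges(edges, x)
counts = sensors_per_primary_path(edges, x, pi1={"a"}, po={"o"})
```

Because $x$ is 1 at every $PI_1$ vertex, 0 at every primary output, and never increases along an edge, it drops from 1 to 0 exactly once on each primary path, which is why the sensor edges form a cut.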

19 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
Forbidden Vertex Set $V_F$
– The delay from the vertex to any primary output is less than the worst-case delay of the OR-MUX chain
$x_f = 0$, for all $v_f$ in $V_F$

20 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
Objective Function: Batch Performance
$Pr(f)$: the probability that the pipeline stage meets the timing constraints at frequency $f$
$BP = \sum_{i=1}^{n} f_i \cdot y_i = \sum_{i=1}^{n} f_i \cdot (Pr(f_{i+1}) - Pr(f_i))$

21 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
Analysis of $Pr(f)$
We consider two situations: 1) at least one error is predicted; 2) no error is predicted
Some definitions:
$D_i$: accumulative delay at node $i$
$D_i^m$: the maximum tolerable delay at node $i$
$\alpha(x) = 1$ if $x > 0$, $0$ otherwise

22 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
At least one error is predicted
If the sensor on edge $(v_i, v_j)$ detects an error: $\alpha((x_i - x_j)(D_i - D_i^m)) = 1$
At least one sensor detects an error: $R_1 = \sum_{(v_i, v_j)} \alpha((x_i - x_j)(D_i - D_i^m)) > 0$
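A minimal sketch of the detection count $R_1$ follows, using the step function $\alpha$ as defined in the slides. The graph, the assignment `x`, and the delay values (in ns) are hypothetical numbers chosen for illustration.

```python
def alpha(value):
    """Step function from the slides: 1 if value > 0, else 0."""
    return 1 if value > 0 else 0


def r1(edges, x, delay, max_tolerable):
    """Number of sensors that flag an error.

    A sensor sits on edge (i, j) when x[i] - x[j] == 1; it fires when the
    accumulated delay D_i exceeds the maximum tolerable delay D_i^m.
    """
    return sum(alpha((x[i] - x[j]) * (delay[i] - max_tolerable[i]))
               for (i, j) in edges)


# Hypothetical first stage with sensors on edges (g1, g2) and (g3, o).
edges = [("a", "g1"), ("g1", "g2"), ("g2", "o"), ("a", "g3"), ("g3", "o")]
x = {"a": 1, "g1": 1, "g2": 0, "g3": 1, "o": 0}

# Hypothetical accumulated delays D_i and tolerable delays D_i^m, in ns.
delay = {"a": 0.00, "g1": 0.30, "g2": 0.45, "g3": 0.20}
max_tolerable = {"a": 0.00, "g1": 0.25, "g2": 0.40, "g3": 0.25}

fired = r1(edges, x, delay, max_tolerable)
```

Here only the sensor after `g1` fires: `g2` is also late, but the edge leaving it carries no sensor, so the factor $x_i - x_j = 0$ masks it.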

23 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
At least one error is predicted
The skew buffer will be reconfigured to generate a skew of amount $\delta$
For stage 1, the effective clock cycle time becomes $(1/f + \delta)$; we should have, for each $v_k \in PO_1$, $\alpha(1/f + \delta - D_k) = 1$
For stage 2, the effective clock cycle time becomes $(1/f - \delta)$; we should have, for each $v_k \in PO_2$, $\alpha(1/f - \delta - D_k) = 1$

24 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
At least one error is predicted
The skew buffer will be reconfigured to generate a skew of amount $\delta$
Timing correctness requirement:
$R_2 = \left( \prod_{v_k \in PO_1} \alpha(1/f + \delta - D_k) \right) \left( \prod_{v_k \in PO_2} \alpha(1/f - \delta - D_k) \right) = 1$

25 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
At least one error is predicted
The probability of the error being fixed: $Pr(R_1 > 0 \text{ and } R_2 = 1)$

26 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
No error is predicted
There is actually no timing error:
$R_3 = \left( \prod_{v_k \in PO_1} \alpha(1/f - D_k) \right) \left( \prod_{v_k \in PO_2} \alpha(1/f - D_k) \right) = 1$

27 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
The probability that the pipeline executes correctly:
$Pr(f) = Pr(R_1 > 0 \text{ and } R_2 = 1) + Pr(R_1 = 0 \text{ and } R_3 = 1)$
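One way to evaluate the two-case expression for $Pr(f)$ is to count, over a set of sampled delay realizations, how many fall into either correct-execution case. The sketch below does this with four hand-written samples. The sampling approach, the sample layout (`r1`, `po1`, `po2`), and all numbers are assumptions for illustration; the slides do not say how $Pr(f)$ is computed.

```python
from math import prod


def alpha(value):
    """Step function from the slides: 1 if value > 0, else 0."""
    return 1 if value > 0 else 0


def executes_correctly(sample, f, delta):
    """Apply R2 when an error was predicted (R1 > 0), otherwise R3."""
    period = 1.0 / f
    if sample["r1"] > 0:
        # Skew delta lengthens stage 1 and shortens stage 2.
        r2 = (prod(alpha(period + delta - d) for d in sample["po1"])
              * prod(alpha(period - delta - d) for d in sample["po2"]))
        return r2 == 1
    # No prediction: both stages must fit in the unmodified period.
    r3 = (prod(alpha(period - d) for d in sample["po1"])
          * prod(alpha(period - d) for d in sample["po2"]))
    return r3 == 1


def estimate_pr(samples, f, delta):
    """Fraction of samples in which the pipeline executes correctly."""
    return sum(executes_correctly(s, f, delta) for s in samples) / len(samples)


# Hypothetical samples: R1 value and output arrival times D_k (ns) per stage.
samples = [
    {"r1": 0, "po1": [0.45], "po2": [0.40]},  # no error predicted, none occurs
    {"r1": 1, "po1": [0.53], "po2": [0.40]},  # predicted and fixed by the skew
    {"r1": 1, "po1": [0.53], "po2": [0.47]},  # skew breaks stage 2
    {"r1": 0, "po1": [0.52], "po2": [0.40]},  # error missed by the sensors
]

pr = estimate_pr(samples, f=2.0, delta=0.05)  # f in GHz, so the period is 0.5 ns
```

The third and fourth samples show the two failure modes named in the variable discussion: a skew that is too large for the second stage, and a late path that no sensor predicted.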

28 Systematic Design Framework of SAP
Mixed-Integer Programming Formulation
max $\sum_{i=1}^{n} f_i \cdot (Pr(f_{i+1}) - Pr(f_i))$
s.t.
$x_i - x_j \ge 0$, for each edge $(v_i, v_j)$
$x_p = 1$, for all $v_p$ in $PI_1$
$x_p = 0$, for all $v_p$ in $PI_2$
$x_q = 0$, for all $v_q$ in $PO_1$ or $PO_2$
$x_f = 0$, for all $v_f$ in $V_F$
$x_i = 0$ or $x_i = 1$

29 Systematic Design Framework of SAP
Simulated Annealing: Solving the MIP Formulation
Solution Space: $X = \{x_1, x_2, \ldots, x_n\}$ satisfying the constraints of the MIP formulation
Initial Solution: $x_i = 1$ iff $v_i$ belongs to $PI_1$

30 Systematic Design Framework of SAP
Simulated Annealing: Solving the MIP Formulation
Solution Perturbation $M_0(x_j, X)$ (0 to 1 toggle):
i) keep the value of $x_i$ for each $i \ne j$;
ii) change the value of $x_j$ from 0 to 1 if 1) $x_j = 0$ and $x_i = 1$ for each edge $(v_i, v_j)$, and 2) $v_j$ does not belong to $V_F$


32 Systematic Design Framework of SAP
Simulated Annealing: Solving the MIP Formulation
Solution Perturbation $M_1(x_i, X)$ (1 to 0 toggle):
i) keep the value of $x_j$ for each $j \ne i$;
ii) change the value of $x_i$ from 1 to 0 if $x_i = 1$ and $x_j = 0$ for each edge $(v_i, v_j)$
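The initial solution and the two toggle moves can be combined into an annealing loop along the following lines. This is a sketch under stated assumptions, not the authors' implementation: the Metropolis acceptance rule, the geometric cooling schedule, the parameter names, and the `score` callback (a stand-in for Batch Performance) are all hypothetical. Vertices fixed by the MIP constraints ($PI_1$ at 1; $PI_2$ and primary outputs at 0) are excluded from toggling, which the slides leave implicit.

```python
import math
import random


def can_toggle_0_to_1(j, x, preds, forbidden, fixed_zero):
    """M0: x_j may become 1 only if every predecessor is already 1."""
    return (x[j] == 0 and j not in forbidden and j not in fixed_zero
            and all(x[i] == 1 for i in preds.get(j, [])))


def can_toggle_1_to_0(i, x, succs, fixed_one):
    """M1: x_i may become 0 only if every successor is already 0."""
    return (x[i] == 1 and i not in fixed_one
            and all(x[j] == 0 for j in succs.get(i, [])))


def anneal(edges, pi1, fixed_zero, forbidden, score,
           steps=2000, t0=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    vertices = sorted({v for e in edges for v in e})
    preds, succs = {}, {}
    for i, j in edges:
        preds.setdefault(j, []).append(i)
        succs.setdefault(i, []).append(j)

    x = {v: (1 if v in pi1 else 0) for v in vertices}  # initial solution
    current = score(x)
    best, best_score = dict(x), current
    temp = t0
    for _ in range(steps):
        v = rng.choice(vertices)
        if can_toggle_0_to_1(v, x, preds, forbidden, fixed_zero):
            new_value = 1
        elif can_toggle_1_to_0(v, x, succs, pi1):
            new_value = 0
        else:
            temp *= cooling
            continue
        x[v] = new_value
        candidate = score(x)
        # Assumed Metropolis rule: always accept improvements, sometimes accept losses.
        if candidate >= current or rng.random() < math.exp((candidate - current) / temp):
            current = candidate
            if current > best_score:
                best, best_score = dict(x), current
        else:
            x[v] = 1 - new_value  # reject the move and restore the old value
        temp *= cooling
    return best, best_score


# Hypothetical chain a -> g1 -> g2 -> g3 -> o with g3 in the forbidden set.
edges = [("a", "g1"), ("g1", "g2"), ("g2", "g3"), ("g3", "o")]


def toy_score(x):
    """Stand-in objective that is largest when the sensor sits after g2."""
    return -abs(sum(x.values()) - 3)


best, best_score = anneal(edges, pi1={"a"}, fixed_zero={"o"},
                          forbidden={"g3"}, score=toy_score)
```

Both moves preserve $x_i - x_j \ge 0$ on every edge, so the search never leaves the feasible solution space and no repair step is needed.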


34 Experiment: Microprocessor Pipeline
DEC Alpha-like 6 Stage Pipeline
– Cache and IF are next to each other
Cache has a lot of critical paths, each consisting of a small number of gates
IF has just a few critical paths, each consisting of a large number of gates
According to the FMAX model [Bowman et al, TCAD 2007], the delay of the cache tends to be longer
[Figure: pipeline stages IF, MAP, IQ, REG, ALU, Cache, with stages labeled More Vulnerable and Less Vulnerable]

35 Experiment: Microprocessor Pipeline
DEC Alpha-like 6 Stage Pipeline
– Cache and IF are next to each other
It will be beneficial to create dynamic clock skew when the cache does need a longer execution time
[Figure: pipeline stages IF, MAP, IQ, REG, ALU, Cache, with stages labeled More Vulnerable and Less Vulnerable]

36 Experiment: Microprocessor Pipeline
Setup
The critical paths of the Cache and IF are extracted from the Verilog code of the OpenSPARC processor
We assume 45nm technology
[Figure: pipeline stages IF, MAP, IQ, REG, ALU, Cache, with stages labeled More Vulnerable and Less Vulnerable]

37 Experiment: Microprocessor Pipeline
Results: the average frequency increases from 1.989 GHz to 2.178 GHz (9.5% improvement)
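The reported improvement follows directly from the two average frequencies; as a quick check of the arithmetic:

```latex
\[
\frac{2.178 - 1.989}{1.989} = \frac{0.189}{1.989} \approx 0.095 = 9.5\%
\]
```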

38 Conclusions
We identified the challenges in modern VLSI CAD tool design
We proposed to leverage the Self-Adjusting Pipeline to solve the problems
We proposed a systematic Design Framework of SAP
Application: Microprocessor Pipeline
Experimental results illustrate the effectiveness of our approach
