Chapter 5b Stochastic Circuit Optimization


Prof. Lei He
Electrical Engineering Department
University of California, Los Angeles
URL: eda.ee.ucla.edu
Email: lhe@ee.ucla.edu

Outline
On-chip decap allocation with stochastic current model
Thermal Aware Clock Tree Routing
  Backgrounds and Motivations
  Modeling and Problem Formulation
  Algorithms
  Experimental Results
  Conclusions
Temperature Aware Microprocessor Floorplanning

Clock Tree Synthesis in Synchronous Circuits
Clock signals synchronize data transfer between functional elements in a synchronous design.
Different clock structures exist (tree, mesh, hybrid, etc.).
Clock skew is the delay difference between two sinks of the clock tree.
Clock skew has become one of the most significant concerns in clock tree synthesis for high-performance designs.
[Figure: chip block diagram with blocks labeled PLL, MEM-ctrl, Sys, Disp, AUDIO and VIDEO. Source: Intel.]
In a synchronous design, data transfer between functional elements is synchronized by clock signals. In terms of topology, the clock signal can be delivered by a clock tree, a clock mesh, or a hybrid clock network. This work considers clock tree synthesis only. One important issue in clock synthesis is clock skew, which is the maximum difference in the arrival time of the clock signal at two different components. Clock skew forces designers to use a longer clock period, which makes the system slower. So, in addition to other objectives, clock skew should be minimized during clock routing. The right diagram shows clock skew versus clock frequency. The main observation is that as the frequency increases, the skew becomes comparable to the clock period. In fact, clock skew has become the number-one concern in clock tree synthesis for high-performance designs.

Methodologies for Clock Skew Minimization
Sources of skew:
  Unbalanced clock distribution
  Process, supply voltage and temperature (PVT) variation
  Uncertainty from loading
Methodologies:
  Active de-skew circuit using a micro-controller
  Passive balanced embedding by CAD algorithms
Variation-induced skew needs to be considered!
[Figure: clock tree over nodes s0 to s4, a, b and v, shown at the topology generation (Topo-Gen) and embedding stages.]
High-performance design is achieved by deep-submicron (DSM) technology or heterogeneous integration. It has two trends. One is to design for high speed, constrained by signal, power and thermal integrity. The other is to design for robustness under process, Vdd and temperature variation. These trends bring the following new challenges for CAD. First, designing for high speed introduces strong electromagnetic coupling. Second, deep-submicron design results in a distributed circuit model with a large number of nets and ports. Moreover, the integration is usually heterogeneous and hence results in a structured, multi-physics model. In addition, variation and modification introduce a large number of perturbations or parameters to a nominal design. This challenges circuit-level simulation, because detailed verification and automation would never finish in practical time. A fast simulator is therefore needed.

Outline
Thermal Aware Clock Tree Routing
  Backgrounds and Motivations
  Modeling and Problem Formulation
    Variation Sources: Spatial & Temporal
    Temperature Correlations
  Algorithms
  Experimental Results
  Conclusions
Temperature Aware Microprocessor Floorplanning

Spatial Temperature Variation Induced Skew
Spatial variation: non-uniform power density generates an on-chip temperature gradient.
Clock tree embedding considering the spatial temperature variation: TACO.
TACO ignores the time-variant temperature under different workloads.
Because different functional units are distributed across the chip, the power density is non-uniform. The left figure shows the Intel dual-core architecture. The power dissipation of a core is 15 times larger than that of a cache. Such non-uniform power density may cause a significant on-chip temperature gradient, as shown in the right figure. One piece of work presented in 2005 considers this spatial temperature variation. However, it ignores the temporal variation of the temperature due to different workloads.
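To make the link between a temperature gradient and skew concrete, the following is a minimal sketch, not the model used in the slides. It assumes a linear temperature dependence of wire resistance and a single-segment Elmore delay per branch. All constants (R0_PER_UM, C_PER_UM, BETA, T_REF, the wire lengths and the temperatures) are hypothetical values chosen for illustration.

```python
# Hypothetical electrical parameters; none of these come from the slides.
R0_PER_UM = 0.1      # wire resistance in ohm per micron at T_REF (assumed)
C_PER_UM = 0.2e-15   # wire capacitance in farad per micron (assumed)
BETA = 0.004         # 1/K, approximate temperature coefficient of resistance (assumed)
T_REF = 25.0         # reference temperature in Celsius (assumed)


def wire_delay(length_um, temp_c, c_load):
    """Elmore delay of one uniform wire at temperature temp_c driving a lumped load."""
    r = R0_PER_UM * length_um * (1.0 + BETA * (temp_c - T_REF))
    c = C_PER_UM * length_um
    return r * (c / 2.0 + c_load)


def skew(delays):
    """Skew is the largest difference between any two sink delays."""
    return max(delays) - min(delays)


c_sink = 10e-15  # sink load capacitance in farad (assumed)

# Two equal-length branches at the same temperature: perfectly balanced.
uniform = [wire_delay(2000, 60.0, c_sink), wire_delay(2000, 60.0, c_sink)]

# The same two branches, one crossing a hot region and one a cooler region.
gradient = [wire_delay(2000, 100.0, c_sink), wire_delay(2000, 50.0, c_sink)]

print("skew, uniform temperature:", skew(uniform))
print("skew, with gradient:      ", skew(gradient))
```

Under these assumptions a tree that is geometrically balanced has zero skew at uniform temperature, and a nonzero skew as soon as the two branches see different temperatures.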

Temporal Temperature Variation Induced Skew
Two SPEC2000 applications, Ammp and Gzip, produce significantly different temperature maps.
Dilemma: optimizing skew for one application hurts the other.
If we run different applications on the same chip, the temperature maps may differ significantly. In the left figure, we can achieve zero skew by selecting a good layout under the current temperature map: both source-to-sink path delays are 7ns. However, for the same clock tree layout, when the application changes, the on-chip temperature changes as well. The S->A delay becomes 2ns while the S->B delay becomes 6ns, so the skew becomes 4ns instead of zero. We are now in a dilemma: optimizing clock skew for one application may result in very bad skew for the other. That is exactly the problem we are trying to solve in this work.
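The two-workload example above can be restated as a short calculation. This is only a sketch of the bookkeeping; the workload names are placeholders, and the delay values are the ones quoted in the notes.

```python
def skew(arrival_times):
    """Skew of one clock tree: largest minus smallest source-to-sink delay."""
    return max(arrival_times.values()) - min(arrival_times.values())


# Source-to-sink delays in ns for the same layout under two temperature maps.
# "workload_1" and "workload_2" are placeholder names for the two applications.
delays_by_workload = {
    "workload_1": {"A": 7.0, "B": 7.0},
    "workload_2": {"A": 2.0, "B": 6.0},
}

skews = {name: skew(delays) for name, delays in delays_by_workload.items()}
worst_case_skew = max(skews.values())

print(skews)            # per-workload skew in ns
print(worst_case_skew)  # the quantity a workload-aware embedding tries to reduce
```

The layout is perfect for the first workload (0 ns) but its worst-case skew over both workloads is 4 ns, which is why the objective has to be taken over all temperature maps rather than over one.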

Problem Formulation
Given:
  The source, sinks and an initial embedding of the clock tree
  Each region modeled by the mean and variance of its temperature, and the correlation between variations
To find:
  A re-embedding of the clock tree
  that minimizes the worst-case skew under all temperature variations
Formally, we formulate the problem as follows. We are given the source, the sinks and an initial embedding of the clock tree. Each region is modeled by the mean and variance of its temperature, and by the correlation (covariance) between variations. We try to find a re-embedding of the clock tree that minimizes the worst-case skew under all temperature variations. The figure shows the result for one of our test designs: the black wires are the original clock embedding, and the red wires show the difference between the re-embedded tree and the original one.
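One way to write this objective compactly is sketched below. The notation is an assumption introduced here and does not appear on the slide: $E$ is an embedding (the locations of the merging points), $\mathcal{T}$ is the set of temperature profiles under consideration, and $d_i(E, T)$ is the source-to-sink delay of sink $i$ under embedding $E$ and profile $T$.

```latex
\min_{E} \; \max_{T \in \mathcal{T}} \; \max_{i,\,j} \;
\bigl| \, d_i(E, T) - d_j(E, T) \, \bigr|
```

The inner maximum over sink pairs is the skew for one temperature profile, the middle maximum is the worst case over profiles, and the outer minimum is taken over the candidate re-embeddings.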

Correlations in Temperature Variation
Spatial and temporal correlation:
  Strong correlations exist between temperatures for different workloads and different regions on the chip.
  Resource sharing between workloads causes temporal correlation.
Considering temperature correlations during optimization can compress the search space!
[Figure: correlation map; element (i,j) is the correlation between areas i and j.]
By power-thermal simulation, we extract the correlation between temperature values for different workloads and different regions on the chip. The figure shows the correlation map extracted from a sequence of inputs from 6 SPEC2000 applications. The element (i,j) in this map denotes the correlation strength between sub-regions i and j under different workloads. We observe strong correlation between temperature values for different workloads and different regions on the chip. In fact, the correlation of the temperature variation exceeds 0.8 between most chip regions. By studying these correlations, we can reduce the search space of our algorithm, since the same rules can be applied to tree nodes with strong correlations.
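The extraction of a correlation map from sampled temperature profiles can be sketched as follows. The data here is synthetic and only stands in for the output of a power-thermal simulator: each row is one temperature profile, each column one chip region, and a shared workload-driven component is assumed so that regions move together. The sizes and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_profiles, n_regions = 100, 6  # illustrative sizes, not from the slides

# Synthetic temperatures: a chip-wide component shared by all regions
# plus a smaller region-specific component.
shared = rng.normal(size=(n_profiles, 1))
local = rng.normal(scale=2.0, size=(n_profiles, n_regions))
temps = 70.0 + 10.0 * shared + local

# Element (i, j) is the correlation between regions i and j across profiles.
corr = np.corrcoef(temps, rowvar=False)

print(np.round(corr, 2))
```

With `rowvar=False`, `np.corrcoef` treats columns as variables, so the result is an `n_regions` by `n_regions` symmetric matrix with ones on the diagonal.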

Outline
Thermal Aware Clock Tree Routing
  Backgrounds and Motivations
  Modeling and Problem Formulation
  Algorithms
  Experimental Results
  Conclusions
Temperature Aware Microprocessor Floorplanning

Re-embedding Process (An example)
[Figure: clock tree topology and its embedding with nodes y, a, b, c and v. Legend: perturbation option, sink, original merging point.]
Let us first look at an example of our perturbation-based algorithm. A clock tree topology is shown on the left and its embedding on the right. For each merging point, say x here, we consider several perturbation options, and for each option we calculate the skew after applying that perturbation.

Re-embedding Process (An example)
[Figure: the same tree with nodes y, a, b, c and v, showing the new merging point.]

Delay, Skew Calculation for Clock Tree
The clock tree is a single-input multiple-output (SIMO) linear system.
  We care about the impulse response at each sink.
Perturbed Modified Nodal Analysis (MNA):
  x holds the state variables for the source, sinks and merging points.
  L selects the sink responses.
Define a new state variable containing both the nominal (x) and perturbed (Δx) state variables.
  This gives a structured and parameterized state matrix.
The number of perturbation configurations, I = 5^N, is huge! (N is the number of merging points.)
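The equations on this slide were not preserved in the transcript. As a hedged reconstruction, the standard first-order form of a perturbed MNA system is sketched below. The symbols $G$, $C$, $B$, $u$, $\Delta G$ and $\Delta C$ are assumptions introduced here (conductance and capacitance matrices, input map, source waveform and their perturbations); only $x$, $\Delta x$ and $L$ are named on the slide.

```latex
% Nominal system and sink responses
G\,x(t) + C\,\frac{dx(t)}{dt} = B\,u(t), \qquad y(t) = L^{T} x(t)

% Substituting G + \Delta G, C + \Delta C and x + \Delta x, and dropping second-order terms:
G\,\Delta x + C\,\frac{d\,\Delta x}{dt} = -\,\Delta G\,x - \Delta C\,\frac{dx}{dt}

% Stacking nominal and perturbed states gives a block lower-triangular system
\begin{bmatrix} G & 0 \\ \Delta G & G \end{bmatrix}
\begin{bmatrix} x \\ \Delta x \end{bmatrix}
+
\begin{bmatrix} C & 0 \\ \Delta C & C \end{bmatrix}
\frac{d}{dt}
\begin{bmatrix} x \\ \Delta x \end{bmatrix}
=
\begin{bmatrix} B \\ 0 \end{bmatrix} u(t)
```

The first block row is the nominal system and the second is the first-order perturbation equation, which is one way to obtain a "structured and parameterized" state matrix of the kind the slide describes.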

Compressing State Matrix by Temperature Correlation
Motivations:
  Spatial and temporal correlation of the temperature values removes the need to exhaustively evaluate all perturbation combinations.
  Highly correlated merging points should be perturbed in the same fashion.
Solution:
  Cluster merging points based on correlation strength.
  Perform the same perturbation for all points within one cluster.
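The effect of perturbing all points in a cluster together can be illustrated with a small enumeration. This is a sketch under assumptions: the five option names are hypothetical (the slides only imply five options per merging point), and the cluster labels are made up.

```python
from itertools import product

# Five hypothetical perturbation options per merging point.
OPTIONS = ("stay", "up", "down", "left", "right")


def cluster_configurations(labels, n_clusters):
    """Yield one perturbation per merging point, with every point in a
    cluster receiving the same perturbation as the rest of its cluster."""
    for choice in product(OPTIONS, repeat=n_clusters):
        yield tuple(choice[cluster] for cluster in labels)


# Cluster id of each of 6 merging points (made-up assignment, 3 clusters).
labels = [0, 0, 1, 1, 1, 2]
configs = list(cluster_configurations(labels, n_clusters=3))

print(len(configs))               # configurations to evaluate with clustering
print(len(OPTIONS) ** len(labels))  # configurations without clustering
```

For this toy case clustering leaves 5^3 = 125 configurations instead of 5^6 = 15625, the same kind of reduction the slides describe at larger scale.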

Merging Points Clustering by Temperature Correlation
Objective:
  Given the correlation matrix C of the merging points, which is a low-rank matrix (N >> K),
  partition the N merging points into K clusters
  so as to maximize the correlation strength within each of the K clusters.
[Figure: correlation matrix C.]

Merging Points Clustering by Temperature Correlation
Objective:
  Given the correlation matrix C of the merging points, which is a low-rank matrix (N >> K),
  partition the N merging points into K clusters.
Decide the number of clusters K:
  Singular Value Decomposition (SVD) reveals the effective rank (K) of C.
Partition the merging points into K clusters:
  The K-Means clustering algorithm is employed.
[Figure: low-rank approximation with K = 4 and N = 70.]
The number of perturbation configurations is reduced from 5^70 to 5^4.
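The two steps (SVD to pick K, then K-Means to partition) can be sketched with NumPy alone. Everything specific here is an assumption for illustration: the 95% singular-value threshold, the farthest-first initialization, and the idealized block-structured correlation matrix. The slides only state that SVD reveals the rank and that K-Means is used.

```python
import numpy as np


def estimate_rank(corr, energy=0.95):
    """Smallest K whose leading singular values hold `energy` of the total."""
    s = np.linalg.svd(corr, compute_uv=False)
    cumulative = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(cumulative, energy) + 1)


def farthest_first_init(points, k):
    """Deterministic initialization: each new center is the point farthest
    from all centers chosen so far."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(dist.argmax())])
    return np.array(centers)


def kmeans(points, k, n_iter=20):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first_init(points, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


# Idealized correlation matrix for N = 70 merging points in 4 groups:
# correlation 1 inside a group, 0 across groups.
sizes = [20, 20, 15, 15]
membership = np.repeat(np.arange(4), sizes)
corr = (membership[:, None] == membership[None, :]).astype(float)

k = estimate_rank(corr)
labels = kmeans(corr, k)  # each row of C is the feature vector of one point

print("K =", k)
print("configurations with clustering:", 5 ** k)
```

Using the rows of C as feature vectors means two merging points land in the same cluster when they correlate with the rest of the chip in the same way. A real correlation matrix is noisier than this block matrix, so the threshold and the initialization would need tuning.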

Structural Reduction & Transient Time Analysis
Cluster-based reduction (SVD + K-Means)
Structural reduction [Hao Yu, DAC'06]
Transient time analysis (backward Euler)
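The transient analysis step can be illustrated with a backward Euler integration of a small RC network. This sketch covers only the time stepping, not the cluster-based or structural reduction. The state-space form C dx/dt + G x = B u(t), the two-node RC ladder and all component values are assumptions chosen for illustration.

```python
import numpy as np


def backward_euler(G, C, B, u, h, n_steps):
    """Integrate C dx/dt + G x = B u(t) from x(0) = 0 with step h.

    Each step solves (C/h + G) x[n+1] = (C/h) x[n] + B u(t[n+1]).
    """
    x = np.zeros(G.shape[0])
    A = C / h + G
    history = [x.copy()]
    for n in range(1, n_steps + 1):
        rhs = (C / h) @ x + B * u(n * h)
        x = np.linalg.solve(A, rhs)
        history.append(x.copy())
    return np.array(history)


def delay_50(history, h, node, v_final=1.0):
    """Time of the first sample at or above 50% of v_final.

    Assumes the node actually crosses 50% within the simulated window.
    """
    idx = int(np.argmax(history[:, node] >= 0.5 * v_final))
    return idx * h


# Two-node RC ladder driven by a unit step: source -R1- node0 -R2- node1.
R1 = R2 = 100.0        # ohm (assumed)
C1 = C2 = 10e-15       # farad (assumed)
G = np.array([[1 / R1 + 1 / R2, -1 / R2],
              [-1 / R2, 1 / R2]])
C = np.diag([C1, C2])
B = np.array([1 / R1, 0.0])

h = 1e-14              # time step in seconds
history = backward_euler(G, C, B, u=lambda t: 1.0, h=h, n_steps=2000)

print("50% delay at node0:", delay_50(history, h, 0))
print("50% delay at node1:", delay_50(history, h, 1))
```

Backward Euler is unconditionally stable for this kind of RC system, which is a common reason to prefer it for stiff interconnect models even though it is only first-order accurate. The skew between two sinks would be the difference of their 50% delays.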

Outline
Thermal Aware Clock Tree Routing
  Backgrounds and Motivations
  Modeling and Problem Formulation
  Algorithms
  Experimental Results
  Conclusions
Temperature Aware Microprocessor Floorplanning

Experimental Settings
Temperature variation profiles are obtained by a micro-architecture-level power-temperature transient simulator with 6 SPEC2000 applications.
100 temperature profiles are collected, sampled every 10 million clock cycles.
Two algorithms are compared:
  DME method: minimizes wire length for zero skew under the Elmore delay model at nominal temperature.
  Our PECO: minimizes skew under a more accurate high-order macromodel with temperature variations.

Skew Distribution
Under 100 temperature maps, PECO reduces both the worst-case skew and the mean skew.

Experimental Results (cont.)
PECO reduces the worst-case skew by up to 5X (e.g., for net r5).
  Skew is measured with the higher-order delay model, considering temperature variations for all applications.
  Skew reduction increases for larger clock nets.
PECO increases wire length by less than 1%.
Runtime:
  The optimization time of PECO is less than that of DME.
  Model building time is still long, but the model is more accurate.
Note that the DME method achieves the optimal wire length under the zero-skew constraint for the deterministic scenario.

Conclusions
Studied clock optimization for workload-dependent temperature variation.
Reduced the worst-case skew by up to 5X with only 1% wire-length overhead compared to the best existing method.
The methodologies can be extended to handle:
  PVT variations with spatial correlations
  Other design freedoms such as floorplanning, power/ground optimization, etc.

Reading Assignment
Thermal aware clock: Hao Yu, Yu Hu, Chun-Chen Liu, and Lei He, "Minimal Skew Clock Embedding Considering Time Variant Temperature Gradient," ACM International Symposium on Physical Design (ISPD), March 2007.
Thermal aware floorplanning: Chun-Ta Chu, Xinyi Zhang, Lei He and Tom Tong Jing, "Temperature Aware Microprocessor Floorplanning Considering Application Dependent Power Load," IEEE/ACM International Conf. on Computer-Aided Design (ICCAD), 2007.