1
Chapter 5b Stochastic Circuit Optimization
Prof. Lei He
Electrical Engineering Department
University of California, Los Angeles
URL: eda.ee.ucla.edu
2
Outline
- On-chip decap allocation with stochastic current model
- Thermal Aware Clock Tree Routing
  - Backgrounds and Motivations
  - Modeling and Problem Formulation
  - Algorithms
  - Experimental Results
  - Conclusions
- Temperature Aware Microprocessor Floorplanning
3
Clock Tree Synthesis in Synchronous Circuits
- Clock signals synchronize data transfer between functional elements in a synchronous design
- Different clock structures exist [Tree, Mesh, Hybrid, etc.]
- Clock skew is the delay difference between two sinks of the clock tree
- Clock skew has become one of the most significant concerns in clock tree synthesis for high-performance designs
[Figure: clock distribution across chip blocks (PLL, MEM-ctrl, Sys, Disp, Audio, Video); source: Intel]
In synchronous designs, data transfer between functional elements is synchronized by clock signals. In terms of topology, the clock signal can be delivered by a clock tree, a clock mesh, or a hybrid clock network. In this work, we concern ourselves with clock tree synthesis only. One important issue in clock synthesis is clock skew, which is the maximum difference in the arrival time of a clock signal at two different components. Clock skew forces designers to use a longer period between clock pulses, which makes the system slower. So, in addition to other objectives, clock skew should be minimized during clock routing. The right diagram shows clock skew vs. clock frequency. The main observation is that as the frequency becomes higher, the skew becomes a larger fraction of the clock period. In fact, clock skew becomes the No. 1 concern in clock tree synthesis for high-performance designs.
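The skew definition above (maximum difference in clock arrival time between sinks) and the observation that skew matters more at higher frequency can be checked with a few lines of Python. This is a minimal sketch; the sink names and picosecond arrival times are hypothetical values chosen only for illustration.

```python
def clock_skew(arrival_ps):
    """Clock skew: the largest difference in clock arrival time between
    any two sinks, i.e. max arrival minus min arrival."""
    times = list(arrival_ps.values())
    return max(times) - min(times)


def skew_fraction_of_period(skew_ps, freq_ghz):
    """Express a skew (in ps) as a fraction of the clock period at freq_ghz."""
    period_ps = 1000.0 / freq_ghz  # 1 GHz -> 1000 ps period
    return skew_ps / period_ps


# Hypothetical arrival times (ps) at three sinks.
arrivals = {"s1": 102.0, "s2": 95.0, "s3": 110.0}
skew = clock_skew(arrivals)                      # 110 - 95 = 15 ps
print(skew, skew_fraction_of_period(skew, 1.0))  # 15 ps is 1.5% of a 1 GHz period
print(skew_fraction_of_period(skew, 5.0))        # but 7.5% of a 5 GHz period
```

The same absolute skew consumes five times more of the cycle at 5 GHz than at 1 GHz, which is why skew dominates at high frequency.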
4
Methodologies for Clock Skew Minimization
The sources of skew:
- Unbalanced clock distribution
- Process, supply voltage and temperature (PVT) variation
- Uncertainty from loading
Methodologies:
- Active de-skew circuit using a micro-controller
- Passive balanced embedding by CAD algorithms
Variation-induced skew needs to be considered!
[Figure: clock tree topology generation (Topo-Gen) and embedding for nodes s0 to s4, a, b and v]
High-performance design is achieved through deep-submicron (DSM) technology or heterogeneous integration. It follows two trends. One is to design for high speed, constrained by signal/power/thermal integrity. The other is to design for robustness under process/Vdd/temperature variation. These trends bring the following new challenges for CAD. First, designing for high speed introduces strong electromagnetic coupling. Second, design in deep submicron results in a distributed circuit model with a large number of nets and ports. Moreover, the integration is usually heterogeneous and hence results in a structured, multi-physics model. In addition, variation and modification introduce a large number of perturbations or parameters to a nominal design. This challenges circuit-level simulation, because detailed verification and automation would never finish. A fast simulator is therefore needed.
5
Outline
- Thermal Aware Clock Tree Routing
  - Backgrounds and Motivations
  - Modeling and Problem Formulation
    - Variation Sources: Spatial & Temporal
    - Temperature Correlations
  - Algorithms
  - Experimental Results
  - Conclusions
- Temperature Aware Microprocessor Floorplanning
6
Spatial Temperature Variation Induced Skew
- Spatial variation: non-uniform power density generates an on-chip temperature gradient
- Clock tree embedding considering the spatial temperature variation: TACO
  - Ignores the time-variant temperature under different workloads
Due to the distribution of different functional units, the power density is non-uniform over the chip. The left figure shows the Intel dual-core architecture. The power dissipation of a core is 15 times larger than that of a cache. Such non-uniform power density may cause a significant on-chip temperature gradient, as shown in the right figure. One piece of work, presented in 2005 (TACO), considers such spatial temperature variation. However, it ignores the temporal variation of the temperature due to different workloads.
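Why a temperature gradient creates skew can be sketched with the Elmore delay of two geometrically identical paths that cross regions at different temperatures. This assumes a linear temperature model for wire resistance, R(T) = R0 (1 + BETA (T - T_REF)), with BETA roughly the bulk-copper value; the coefficient, the segment values and the 80 C / 50 C temperatures are all hypothetical, and side branches of the tree are ignored.

```python
# Assumed linear temperature model for wire resistance:
#   R(T) = R0 * (1 + BETA * (T - T_REF))
BETA = 0.0039   # 1/degC, roughly the bulk-copper value (assumption)
T_REF = 25.0    # degC at which R0 is specified (assumption)


def seg_resistance(r0, temp_c, beta=BETA, t_ref=T_REF):
    """Resistance of a wire segment at temperature temp_c."""
    return r0 * (1.0 + beta * (temp_c - t_ref))


def chain_elmore_delay(segments, load_cap):
    """Elmore delay of a source-to-sink path modeled as a chain of
    pi-model segments [(r0_ohm, c_farad, temp_c), ...] ending in load_cap.
    Side branches of the tree are ignored in this simplified sketch."""
    delay = 0.0
    downstream = load_cap                    # capacitance beyond the current segment
    for r0, c, temp in reversed(segments):   # walk from sink back to source
        r = seg_resistance(r0, temp)
        delay += r * (c / 2.0 + downstream)  # half its own cap plus all downstream cap
        downstream += c
    return delay


def path(temp_c):
    """Four identical 100-ohm / 20-fF segments, all at temp_c (hypothetical)."""
    return [(100.0, 20e-15, temp_c)] * 4


load = 10e-15
# Same geometry, uniform temperature: zero skew.
print(chain_elmore_delay(path(50.0), load) - chain_elmore_delay(path(50.0), load))
# Sink A's path crosses a hot core (80 C), sink B's a cool cache (50 C): nonzero skew.
d_a = chain_elmore_delay(path(80.0), load)
d_b = chain_elmore_delay(path(50.0), load)
print(d_a - d_b)
```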
7
Temporal Temperature Variation Induced Skew
- Significantly different temperature maps from two SPEC2000 applications: Ammp and Gzip
- Dilemma: optimizing skew for one application hurts the other
If we run different applications on the same chip, the temperature maps may be significantly different. We can achieve zero skew in the left figure by selecting a good layout under the current temperature map: both source-to-sink path delays are 7 ns. However, for the same clock tree layout, when the application changes, the on-chip temperature changes as well, which makes the S->A delay 2 ns and the S->B delay 6 ns, so the skew becomes 4 ns instead of zero. We are now in a dilemma: optimizing clock skew for one application may result in very bad skew for the other. In fact, that is exactly the problem we are trying to solve in this work!
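The arithmetic of the two-application example, written out as a short worked equation, where d denotes the source-to-sink delay for the fixed layout:

```latex
\begin{aligned}
\text{skew} &= \max_{i,j}\,\lvert d_{S\to i} - d_{S\to j}\rvert \\
\text{first map: } \text{skew} &= \lvert 7 - 7 \rvert = 0\ \text{ns} \\
\text{second map: } \text{skew} &= \lvert 2 - 6 \rvert = 4\ \text{ns} \\
\text{worst case over both maps} &= \max(0,\,4) = 4\ \text{ns}
\end{aligned}
```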
8
Problem Formulation
Given:
- The source, sinks and an initial embedding of the clock tree
- Each region is modeled by the mean and variance of its temperature, and by the correlation between variations
To find:
- A re-embedding of the clock tree
To minimize:
- The worst-case skew under all temperature variations
Formally, we formulate this problem as follows. Given the source, sinks and an initial embedding of the clock tree, each region is modeled by the mean and variance of its temperature, and by the correlation (covariance) between variations. We try to find a re-embedding of the clock tree that minimizes the worst-case skew under all temperature variations. The figure shows the result for one of our test designs: the black wires are the original clock embedding, and the red wires show the difference between the re-embedded tree and the original one.
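The min-max objective can be illustrated by comparing candidate embeddings on their worst-case skew across temperature profiles. In this sketch the "original" delays reuse the two-application example (7/7 ns, then 2/6 ns); the "re-embedded" delays are hypothetical. Enumerating candidates explicitly like this is only feasible for toy cases, since the number of embeddings grows exponentially with the tree size.

```python
def skew(delays):
    """Skew for one temperature profile: max minus min sink delay."""
    return max(delays.values()) - min(delays.values())


def worst_case_skew(delays_per_profile):
    """Worst-case skew of one embedding over all temperature profiles."""
    return max(skew(d) for d in delays_per_profile)


def best_embedding(candidates):
    """Pick the candidate embedding whose worst-case skew is smallest.
    candidates: {name: [{sink: delay_ns}, ...]} with one dict per profile."""
    return min(candidates, key=lambda name: worst_case_skew(candidates[name]))


candidates = {
    # Delays from the two-application example (S->A, S->B).
    "original": [{"A": 7.0, "B": 7.0}, {"A": 2.0, "B": 6.0}],
    # Hypothetical re-embedding that trades nominal skew for robustness.
    "re-embedded": [{"A": 5.0, "B": 6.0}, {"A": 4.0, "B": 5.0}],
}
for name, profiles in candidates.items():
    print(name, worst_case_skew(profiles))
print("chosen:", best_embedding(candidates))
```

The re-embedded tree is never zero-skew, yet it wins because its worst case (1 ns) beats the original's worst case (4 ns).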
9
Correlations in Temperature Variation
Spatial and temporal correlation:
- Strong correlations exist between temperatures for different workloads and different regions on chip
- Resource sharing between workloads causes temporal correlation
Considering temperature correlations during optimization can compress the search space!
[Figure: extracted correlation map; element (i,j) is the correlation between areas i and j]
By power-thermal simulation, we extract the correlation between temperature values for different workloads and different regions on chip. The figure shows the correlation map extracted from a sequence of inputs from 6 SPEC2000 applications. Element (i,j) of this map denotes the correlation strength between sub-regions i and j under different workloads. We observe strong correlation between temperature values for different workloads and different regions on chip; in fact, the correlation of temperature variation exceeds 0.8 between most chip regions. By exploiting these correlations, we can reduce the search space of our algorithm, since the same perturbation rules can be applied to tree nodes with strong correlations.
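Extracting a region-by-region correlation map from a set of temperature profiles amounts to a Pearson correlation over columns. A minimal NumPy sketch follows; the temperature data is synthetic (a shared workload-driven component plus independent noise per region), not the SPEC2000 profiles from the slides.

```python
import numpy as np


def region_correlation(temps):
    """Pearson correlation between regions.
    temps: array of shape (num_profiles, num_regions); each row is one
    temperature map, each column one chip region."""
    return np.corrcoef(temps, rowvar=False)


# Synthetic data (NOT from the slides): 100 profiles, 6 regions. A shared
# workload-driven component makes all regions heat up and cool down together.
rng = np.random.default_rng(0)
common = rng.normal(size=(100, 1))
temps = 60.0 + 10.0 * common + rng.normal(scale=1.0, size=(100, 6))

C = region_correlation(temps)
off_diag = C[~np.eye(6, dtype=bool)]
print(C.shape, off_diag.min())
print("fraction of pairs above 0.8:", np.mean(off_diag > 0.8))
```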
10
Outline
- Thermal Aware Clock Tree Routing
  - Backgrounds and Motivations
  - Modeling and Problem Formulation
  - Algorithms
  - Experimental Results
  - Conclusions
- Temperature Aware Microprocessor Floorplanning
11
Re-embedding Process (An example)
[Figure: clock tree topology with nodes a, b, c, v, y (left) and its embedding (right), marking the sinks, the original merging point and its perturbation options]
Let's first look at an example of our perturbation-based algorithm. Given the clock tree topology shown on the left and its embedding on the right, for each merging point (say x here), we consider several perturbation options and, for each option, calculate the skew after applying that perturbation.
12
Re-embedding Process (An example)
[Figure: the same clock tree (nodes a, b, c, v, y) after re-embedding, showing the new merging point]
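The per-merging-point search in this example can be sketched as a greedy coordinate-descent loop. This is an illustrative heuristic, not necessarily how PECO searches: the five options per point (stay, or one grid step in +x, -x, +y, -y) are an assumption, and `cost` is a hypothetical callback that in practice would return the worst-case skew of an embedding over all temperature profiles.

```python
# Five assumed perturbation options per merging point: stay, or move one
# grid step in +x, -x, +y, -y.
OFFSETS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def greedy_reembed(positions, cost, step=1.0, max_sweeps=100):
    """Greedy re-embedding sketch.
    positions: {merging_point: (x, y)}; cost: callable(positions) -> float,
    e.g. worst-case skew over all temperature profiles (hypothetical)."""
    pos = dict(positions)
    best = cost(pos)
    for _ in range(max_sweeps):
        improved = False
        for name in list(pos):
            x, y = pos[name]
            # Evaluate every perturbation option for this merging point,
            # keeping all other merging points fixed.
            trials = []
            for dx, dy in OFFSETS:
                trial = dict(pos)
                trial[name] = (x + dx * step, y + dy * step)
                trials.append((cost(trial), trial))
            c, trial = min(trials, key=lambda t: t[0])  # ties keep "stay"
            if c < best:
                best, pos, improved = c, trial, True
        if not improved:  # no option improved any point: local optimum
            break
    return pos, best


def toy_cost(p):
    """Toy stand-in for worst-case skew; its minimum is at x = (2, 0)."""
    return abs(p["x"][0] - 2.0) + abs(p["x"][1])


print(greedy_reembed({"x": (0.0, 0.0)}, toy_cost))  # ({'x': (2.0, 0.0)}, 0.0)
```

Perturbing one point at a time keeps each sweep linear in the number of merging points, at the price of possibly stopping in a local optimum rather than exploring every combination.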
13
Delay, Skew Calculation for Clock Tree
- The clock tree is a single-input multiple-output (SIMO) linear system
- We care about the impulse response at each sink
- Perturbed Modified Nodal Analysis (MNA)
  - x holds the variables of the source, sinks and merging points
  - L selects the sink responses
- Define a new state variable containing both the nominal (x) and perturbed (Δx) state variables
  - This yields a structured and parameterized state matrix
- The number of perturbation configurations, I = 5^N, is huge! (N is the number of merging points)
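One way to write the perturbed MNA system described above, as a first-order sketch: assume the nominal tree obeys G x + C dx/dt = B u with sink outputs y = L^T x, and that a perturbation changes the matrices to G + ΔG and C + ΔC. Dropping the second-order terms (ΔG Δx and ΔC dΔx/dt) gives an augmented system in [x; Δx] whose state matrices are structured (block lower-triangular) and parameterized by ΔG and ΔC. The exact formulation in the paper may differ.

```latex
\begin{aligned}
&\text{nominal:} && G\,x + C\,\dot{x} = B\,u, \qquad y = L^{T} x \\
&\text{perturbed:} && (G + \Delta G)(x + \Delta x) + (C + \Delta C)(\dot{x} + \Delta\dot{x}) = B\,u \\
&\text{first order:} && G\,\Delta x + C\,\Delta\dot{x} = -\,\Delta G\,x - \Delta C\,\dot{x}
\end{aligned}

\begin{bmatrix} G & 0 \\ \Delta G & G \end{bmatrix}
\begin{bmatrix} x \\ \Delta x \end{bmatrix}
+
\begin{bmatrix} C & 0 \\ \Delta C & C \end{bmatrix}
\begin{bmatrix} \dot{x} \\ \Delta\dot{x} \end{bmatrix}
=
\begin{bmatrix} B \\ 0 \end{bmatrix} u,
\qquad
y + \Delta y = \begin{bmatrix} L^{T} & L^{T} \end{bmatrix}
\begin{bmatrix} x \\ \Delta x \end{bmatrix}
```

Each perturbation configuration supplies its own ΔG and ΔC, which is why evaluating every configuration separately is expensive.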
14
Compressing State Matrix by Temperature Correlation
Motivations:
- Spatial and temporal correlation of the temperature values removes the need to exhaustively evaluate all perturbation combinations
- Highly correlated merging points should be perturbed in the same fashion
Solution:
- Cluster merging points based on correlation strength
- Perform the same perturbation for all points within one cluster
15
Merging Points Clustering by Temperature Correlation
Objective:
- Given the correlation matrix C of the N merging points, which is low-rank (N >> K)
- Partition the N merging points into K clusters
- Maximize the correlation strength within each of the K clusters
[Figure: correlation matrix C]
16
Merging Points Clustering by Temperature Correlation
Objective:
- Given the correlation matrix C of the N merging points, which is low-rank (N >> K)
- Partition the N merging points into K clusters
Decide the cluster number K:
- Singular value decomposition (SVD) reveals the real rank (K) of C
Partition the merging points into K clusters:
- The K-means clustering algorithm is employed
Low-rank approximation: K = 4, N = 70, so the number of perturbation configurations is reduced from 5^70 to 5^4
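The two steps above (SVD to decide K, then K-means to partition the merging points) can be sketched with NumPy. Assumptions not taken from the slides: the 95% singular-value energy threshold for choosing K, the use of rank-K spectral coordinates as the K-means features, the deterministic farthest-point seeding, and the synthetic correlation matrix in the usage example. A library routine such as scikit-learn's `KMeans` could replace the hand-written loop.

```python
import numpy as np


def estimate_rank(C, energy=0.95):
    """Smallest K whose leading singular values capture `energy` of
    sum(s**2). The 0.95 threshold is an assumption, not from the slides."""
    s = np.linalg.svd(C, compute_uv=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(frac, energy) + 1)


def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance of every point to its nearest chosen center
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_merging_points(C, energy=0.95):
    """Estimate K from the SVD of C, then run k-means on the rank-K
    spectral coordinates U_K * s_K of each merging point."""
    k = estimate_rank(C, energy)
    U, s, _ = np.linalg.svd(C)
    return k, kmeans(U[:, :k] * s[:k], k)


# Synthetic block-correlated C (NOT from the slides): 12 merging points in
# 3 groups of 4; within-group correlation 0.95, across groups 0.1.
groups = np.repeat(np.arange(3), 4)
C = np.where(groups[:, None] == groups[None, :], 0.95, 0.1)
np.fill_diagonal(C, 1.0)

k, labels = cluster_merging_points(C)
print(k, labels)
print("configurations:", 5 ** len(C), "->", 5 ** k)  # one shared option per cluster
```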
17
Structural Reduction & Transient Time Analysis
- Cluster-based reduction (SVD + K-means)
- Structural reduction [Hao Yu, DAC'06]
- Transient time analysis (backward Euler)
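Backward-Euler transient analysis, as named on the slide, can be sketched on a tiny RC tree written in MNA form C dv/dt + G v = b u(t). Everything concrete here is a hypothetical stand-in: the three-node tree, its resistor and capacitor values, the unit-step input, the time step and the 50%-crossing delay metric. The clustered, structurally reduced macromodel is not reproduced; its reduced matrices would take the place of G and C.

```python
import numpy as np


def stamp_resistor(G, i, j, r):
    """Add a resistor between node i and node j (j=None means ground) to G."""
    g = 1.0 / r
    G[i, i] += g
    if j is not None:
        G[j, j] += g
        G[i, j] -= g
        G[j, i] -= g


def backward_euler(G, C, b, u, h, steps):
    """Integrate C dv/dt + G v = b u(t) from v(0) = 0 with backward Euler:
    (C/h + G) v[k+1] = (C/h) v[k] + b u(t[k+1]). Returns an array (steps+1, n)."""
    A = C / h + G
    v = np.zeros(G.shape[0])
    out = [v.copy()]
    for k in range(1, steps + 1):
        v = np.linalg.solve(A, C @ v / h + b * u(k * h))
        out.append(v.copy())
    return np.array(out)


def delay_50(wave, h):
    """First time point (s) at which a node reaches 50% of a 1 V step."""
    return np.argmax(wave >= 0.5) * h


# Hypothetical 3-node RC tree: driver (100 ohm) -> node 0, which branches
# to sink 1 (200 ohm) and sink 2 (400 ohm). Node capacitances in farads.
G = np.zeros((3, 3))
stamp_resistor(G, 0, None, 100.0)   # driver resistance (Norton equivalent)
stamp_resistor(G, 0, 1, 200.0)
stamp_resistor(G, 0, 2, 400.0)
C = np.diag([10e-15, 20e-15, 20e-15])
b = np.array([1.0 / 100.0, 0.0, 0.0])  # 1 V step injected through the driver
h, steps = 0.1e-12, 2000               # 0.1 ps step, 200 ps window

v = backward_euler(G, C, b, lambda t: 1.0, h, steps)
d1, d2 = delay_50(v[:, 1], h), delay_50(v[:, 2], h)
print(d1, d2, "skew:", d2 - d1)
```

Backward Euler is chosen here because it is unconditionally stable for RC networks, so the step size is limited by accuracy rather than by the smallest time constant.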
18
Outline
- Thermal Aware Clock Tree Routing
  - Backgrounds and Motivations
  - Modeling and Problem Formulation
  - Algorithms
  - Experimental Results
  - Conclusions
- Temperature Aware Microprocessor Floorplanning
19
Experimental Settings
- Temperature variation profiles are obtained by a micro-architecture-level power-temperature transient simulator running 6 SPEC2000 applications
- 100 temperature profiles are collected, one every 10 million clock cycles
- Two algorithms are compared:
  - DME method: minimizes wire length for zero skew under the Elmore delay model at nominal temperature
  - Our PECO: minimizes skew under a more accurate high-order macromodel with temperature variations
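The DME baseline builds zero-skew trees by repeatedly merging subtrees under the Elmore model. Its core merging step can be sketched with Tsay's exact zero-skew formula: for subtrees with delays t1, t2 and capacitances C1, C2 joined by a wire of length L (resistance r and capacitance c per unit length), placing the merging point at fraction x = [(t2 - t1) + rL(C2 + cL/2)] / [rL(C1 + C2 + cL)] from subtree 1 equalizes the two Elmore delays. DME's deferred merging segments and wire snaking are not modeled, and the numbers are hypothetical.

```python
def zero_skew_tap(t1, c1, t2, c2, length, r, c):
    """Fraction x of a wire of `length` (measured from subtree 1) at which
    to place the merging point so both subtrees see equal Elmore delay.
    t1, t2: subtree delays; c1, c2: subtree capacitances; r, c: wire
    resistance and capacitance per unit length. x outside [0, 1] means
    wire snaking would be needed (not handled in this sketch)."""
    num = (t2 - t1) + r * length * (c2 + c * length / 2.0)
    den = r * length * (c1 + c2 + c * length)
    return num / den


def delay_through(wire_len, t_sub, c_sub, r, c):
    """Elmore delay from the merging point down a wire into a subtree."""
    return r * wire_len * (c * wire_len / 2.0 + c_sub) + t_sub


# Hypothetical values in consistent (arbitrary) units.
t1, c1, t2, c2, L, r, c = 10.0, 2.0, 12.0, 3.0, 100.0, 0.1, 0.2
x = zero_skew_tap(t1, c1, t2, c2, L, r, c)
print(x)  # about 0.528: the tap sits closer to the slower subtree 2
print(delay_through(x * L, t1, c1, r, c), delay_through((1 - x) * L, t2, c2, r, c))
```

After merging, the new subtree has capacitance C1 + C2 + cL and the common delay printed above, so the same step can be applied bottom-up. Note that this balancing uses a single nominal delay per subtree, which is exactly what breaks down when the temperature map changes.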
20
Skew distribution under 100 temperature maps: PECO reduces both the worst-case skew and the mean skew
21
Experimental Results (cont.)
- PECO reduces the worst-case skew by up to 5X (for net r5)
  - Skew is measured with a higher-order delay model considering temperature variations for all applications
  - Skew reduction increases for larger clock nets
- PECO increases wire length by less than 1%
- Runtime
  - The optimization time of PECO is less than that of DME
  - Model building time is still long, but the model is more accurate
Note that the DME method achieves the optimal wire length under zero-skew constraints for the deterministic scenario.
22
Conclusions
- Studied clock optimization for workload-dependent temperature variation
- Reduced the worst-case skew by up to 5X with only 1% wire-length overhead compared to the best existing method
- The methodologies can be extended to handle:
  - PVT variations with spatial correlations
  - Other design freedoms, such as floorplanning, power/ground optimization, etc.
23
Reading Assignment
Thermal aware clock:
- Hao Yu, Yu Hu, Chuenchen Liu, and Lei He, "Minimal Skew Clock Embedding Considering Time Variant Temperature Variation Gradient," ACM International Symposium on Physical Design (ISPD), March 2007.
Thermal aware floorplanning:
- Chun-Ta Chu, Xinyi Zhang, Lei He, and Tom Tong Jing, "Temperature Aware Microprocessor Floorplanning Considering Application Dependent Power Load," IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2007.