Fang Gong1, Hao Yu2, and Lei He1 1Electrical Engineering Dept., UCLA

PiCAP: A Parallel and Incremental Capacitance Extraction Considering Stochastic Process Variation
Fang Gong1, Hao Yu2, and Lei He1
1Electrical Engineering Dept., UCLA 2Berkeley Design Automation
Presented by Fang Gong

Thanks for the introduction. Today I will present "A Parallel and Incremental Capacitance Extraction Considering Stochastic Process Variation". This paper is joint work with Hao Yu of Berkeley Design Automation and was supervised by Prof. Lei He at UCLA.

Outline
Background and Motivation
Algorithms
Experimental Results
Conclusion and Future Work

I'll follow this outline in my presentation. First, we will give the background and motivation. Then we introduce our new geometric moments and explain how to perform the parallel FMM and the incremental GMRES. Finally, we will show some experimental results and conclude the talk. <click> First, we will talk about the background and motivation.


Process Variation and Cap Extraction
Process variation leads to capacitance variation
  OPC lithography and CMP polishing
Capacitance variation affects circuit performance
  Delay variation and analog mismatch
Figures from [Kang and Gupta]

As feature sizes have scaled down in recent years, VLSI design suffers from process variation that comes from chemical mechanical polishing (CMP) and optical proximity effects and can no longer be neglected. The figures show preliminary results for a chip layout in 90nm technology. The right figure shows the fraction of M2-layer wire segments whose extracted capacitance changes by a given amount when the actual litho-simulated shapes are taken into account. Clearly, process variation changes the capacitance. This can affect timing and signal integrity analysis, and lead to delay variation and analog mismatch. So it is important to extract capacitance accurately while considering process variation.

Background of BEM Based Cap Extraction
Capacitance extraction in FastCap
Procedures
  Discretize the metal surface into panels
  Form a linear system by collocation
    Results in dense potential coefficients
  Solve by iterative GMRES
    Fast multipole method (FMM) to evaluate the matrix-vector product (MVP)
    Preconditioned GMRES iteration with guided convergence
Difficulties for stochastic capacitance extraction
  How to consider variations in FMM?
  How to consider different variations in the preconditioner?
Figure labels: source panel j, observer panel i

First, let me introduce some background on FastCap, a BEM-based capacitance extraction method. FastCap partitions conductor surfaces into small panels and computes the potential coefficient matrix, with one equation for each Pij. For example, the potential at panel i due to the charge on source panel j can be evaluated with this equation. This gives a linear system Pq=v, where q is the charge vector and v is the known panel potential. FastCap then solves the linear system with preconditioned GMRES and uses FMM to evaluate the matrix-vector product P*q. From this, we can see two problems in applying FastCap to stochastic capacitance extraction: (1) How to consider process variation in FMM? (2) How to consider different variations from many sources in the preconditioner?
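The procedure on this slide (panels, collocation, dense P, solve Pq=v) can be sketched as follows. This is a minimal illustration and not FastCap itself: it assumes square panels of side `a` with uniform charge, approximates off-diagonal entries by the point-charge kernel 1/(4*pi*eps0*r), uses the closed-form centre potential of a uniformly charged square for the self term, and replaces GMRES and FMM with plain Gaussian elimination because the toy system is tiny. All function names are hypothetical.

```python
import math

EPS0 = 8.854187817e-12  # vacuum permittivity in F/m


def potential_matrix(centers, a):
    """Dense potential-coefficient matrix P for square panels of side a.

    Assumption: off-diagonal entries use the point-charge approximation;
    the self term is the centre potential of a uniformly charged square.
    """
    n = len(centers)
    k = 1.0 / (4.0 * math.pi * EPS0)
    self_term = k * 4.0 * math.log(1.0 + math.sqrt(2.0)) / a
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                P[i][j] = self_term
            else:
                P[i][j] = k / math.dist(centers[i], centers[j])
    return P


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for col in range(c, n + 1):
                M[r][col] -= f * M[c][col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][col] * x[col] for col in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def plate_capacitance(n_side, width):
    """Self-capacitance of a square plate at 1 V, split into n_side^2 panels."""
    a = width / n_side
    centers = [((i + 0.5) * a, (j + 0.5) * a, 0.0)
               for i in range(n_side) for j in range(n_side)]
    P = potential_matrix(centers, a)
    q = solve(P, [1.0] * len(centers))  # P q = v with v = 1 V on every panel
    return sum(q)                       # C = total charge / 1 V
```

Calling `plate_capacitance(4, 1.0)` gives the capacitance in farads of a 1 m square plate under these simplifications; refining the discretization moves the estimate upward, which is the usual behaviour of this kind of collocation scheme.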

Motivation of Our Work
Existing works
  Stochastic integral by low-rank approximation
    Zhu, Z. and White, J. "FastSies: a fast stochastic integral equation solver for modeling the rough surface effect". In Proceedings of IEEE/ACM ICCAD 2005.
    Pros: Rigorous formulation
    Cons: Random integral is slow for full-chip extraction
  Stochastic orthogonal polynomial (SOP) expansion
    Cui, J., et al. "Variational capacitance modeling using orthogonal polynomial method". In Proceedings of the 18th ACM Great Lakes Symposium on VLSI 2008.
    Pros: An efficient non-Monte-Carlo approach
    Cons: SOP expansion results in an augmented and dense linear system
Objective of our work
  Fast multipole method (FMM) with nearly O(n) performance, with a further parallel improvement.
  The preconditioner should be updated incrementally for different variations.

To handle this problem, there are many works on stochastic capacitance extraction. For example, a stochastic integral method with low-rank approximation has been proposed, but it is slow for full-chip extraction. Another is the stochastic orthogonal polynomial expansion method, which results in an augmented and dense system. Therefore, our objective is mainly to improve the efficiency of variational capacitance extraction. First, FMM with nearly linear complexity can be used to reduce the time for the MVP, and can be further parallelized. Also, the preconditioner should be updated incrementally for different variations.

Now, let's introduce the stochastic geometric moments first.

Flow of piCAP
Represent Pij with stochastic geometric moments
Use parallel FMM to evaluate the MVP of Pxq
Obtain capacitance (mean and variance) with incrementally preconditioned GMRES
Flow diagram: Geometry Info and Process Variation -> Geometric Moments -> Potential Coefficient -> Solve with GMRES (build spectrum preconditioner, incrementally update preconditioner, evaluate the MVP (Pxq) with FMM in parallel) -> Calculate Cij with the charge distribution

To do so, we develop a stochastic capacitance extraction method called piCAP. The entire computing flow is shown in this figure. First, piCAP uses geometric moments to integrate process variation into the potential coefficient matrix. In other words, Pij can be expressed with an explicit geometry dependence, which captures process variation correctly. We then develop a modified FMM so that we can evaluate the MVP P*q with variation in parallel. Furthermore, we develop iGMRES to solve the variational system. We use a spectral preconditioner to improve iteration convergence, and we update it incrementally for different variations.

Stochastic Geometric Moment
Consider two independent variation sources: panel distance (d) and panel width (w)
Multipole expansion along x-y-z coordinates: multipole moments and local moments
Mi and Li show an explicit dependence on geometry parameters, and are called geometric moments.
Figure labels: source cube, observer cube, source panel, d0, r0

In this paper, we consider two stochastic variation sources: panel distance 'd' and panel width 'h'. To define geometric moments, we first need to consider the multipole expansion along the x-y-z coordinates. Assume there are two cubes, a source cube and an observer cube, as shown in this figure; 'r' and 'd' are vectors. We can define the multipole expansion of 1/|r - d|, which is the potential at the center of the observer cube due to a unit charge on the source panel in free space. The local expansion, denoted by Li, is obtained simply by exchanging d and h in Mi. Clearly, both Mi and Li have an explicit dependence on panel width h and distance d. We call them geometric moments.
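The expansion of 1/|r - d| mentioned here can be checked numerically. The sketch below uses the standard textbook Cartesian Taylor expansion about d = 0 (monopole, dipole, quadrupole terms); how the paper groups these terms into Mi and Li is not preserved in the transcript, so the function names and the truncation order are assumptions made for illustration.

```python
import math


def inverse_distance_expansion(r, d, order=2):
    """Truncated Cartesian Taylor expansion of 1/|r - d| for |d| < |r|."""
    rr = math.sqrt(sum(x * x for x in r))
    dr = sum(a * b for a, b in zip(d, r))  # d . r
    dd = sum(a * a for a in d)             # d . d
    terms = [
        1.0 / rr,                                        # monopole
        dr / rr**3,                                      # dipole
        (3.0 * dr * dr - dd * rr * rr) / (2.0 * rr**5),  # quadrupole
    ]
    return sum(terms[: order + 1])


def exact_inverse_distance(r, d):
    """Reference value 1/|r - d|."""
    return 1.0 / math.dist(r, d)
```

Each term depends explicitly on the geometry (the components of r and d), which is what makes it possible to perturb the geometry and see the effect on the potential directly.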

Stochastic Potential Coefficient Expansion
Relate geometric parameters to random variables
  Let \xi_h be the random variable for panel width w, and \xi_d the random variable for panel distance d.
  Geometric moments Mp and Lp are: [equations not preserved in transcript]
Now, the potential coefficient is: [equation not preserved in transcript]
n-order stochastic orthogonal polynomial expansion of P: [equation not preserved in transcript]
Accordingly, the m-th order (m = 2n + n(n − 1)) expansion of the charge is: [equation not preserved in transcript]

For two different variation sources, the moments become as follows: \xi_h is the variation for panel width, and \xi_d is the variation for panel distance. Thus, the geometric moments including variations can be written with these two equations using a first-order SOP. As such, Pij can be expressed with Mi and Li, and thus in terms of the variations \xi_d and \xi_h. Furthermore, we can derive the general SOP expression of the potential matrix P and the charge vector q. This equation shows the n-order expansion of P, while q can be expanded up to order m.

Augmented System
Recap: SOP expansion leads to a large and dense system equation

After taking the inner product of both sides with each orthogonal polynomial, the Pq=v system becomes one augmented and dense linear system, as shown in this equation. The number of variables increases to up to 3X that of the original system. Therefore, it is time-consuming to calculate its MVP with the charge vector and then solve it. So the next problem is how to evaluate the MVP with parallel FMM to reduce CPU time, and how to solve the system efficiently with preconditioned GMRES, including the spectrum preconditioner as well as its incremental update. We will discuss these topics one by one in the following sections.
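To make the 3X growth concrete, here is a minimal sketch of how a first-order expansion in two variables produces a system three times the original size. It assumes independent standard normal variables \xi_d and \xi_h, the basis {1, \xi_d, \xi_h}, and P = P0 + Pd*\xi_d + Ph*\xi_h; these choices and the function names are illustrative assumptions, since the slide's own equation is not preserved.

```python
def augment_first_order(P0, Pd, Ph):
    """Assemble the 3n x 3n Galerkin system for P = P0 + Pd*xi_d + Ph*xi_h.

    Projection onto 1:    P0 q0 + Pd q1 + Ph q2 = v
    Projection onto xi_d: Pd q0 + P0 q1         = 0
    Projection onto xi_h: Ph q0 + P0 q2         = 0
    """
    n = len(P0)
    zero = [[0.0] * n for _ in range(n)]
    blocks = [
        [P0, Pd, Ph],
        [Pd, P0, zero],
        [Ph, zero, P0],
    ]
    A = []
    for block_row in blocks:
        for i in range(n):
            A.append([x for blk in block_row for x in blk[i]])
    return A


def augment_rhs(v):
    """Only the mean equation sees the deterministic panel potentials v."""
    return list(v) + [0.0] * (2 * len(v))
```

With n panels the unknowns are the mean charge q0 and the two sensitivity vectors q1 and q2, so the system has 3n variables, and every block is as dense as the original P.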

Parallel Fast Multipole Method--upward
Overview of the parallel fast multipole method (FMM)
  Group panels in cubes, and build a hierarchical tree for the cubes
    We use 8-degree trees in the implementation, but 2-degree trees for illustration here.
  A parallel FMM distributes cubes to different processors
Upward pass
  Starting from the bottom level, it calculates stochastic geometric moments
  Update the parent's moments by summing the moments of its children (called the M2M operation)
  M2M operations can be performed in parallel at different nodes
Figure labels: Level 0, Level 1, Level 2, Level 3, M2M

As we know, FMM uses a hierarchical tree to evaluate the MVP with O(N) or O(N log N) complexity. But our augmented system is larger and denser, so we want to evaluate it more efficiently. Thus, we develop a parallel FMM. It first builds an oct-tree structure, which means each cube has 8 child cubes, except for the bottom cubes. Then it distributes all cubes to different processors. We use a 2-degree tree to explain the key steps: the upward pass and the downward pass. As shown in the figure,…(follow animation scripts)…
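The upward pass can be sketched on a 2-degree tree like the one used for illustration on the slide. The sketch is a simplification under stated assumptions: cubes live on a 1-D line, only the monopole and dipole moments are kept, and a thread pool stands in for separate processors (in CPython this shows the independence of the M2M operations rather than a real speedup). All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def leaf_moments(charges, positions, center):
    """Monopole and dipole moments of point charges about a cube center."""
    m0 = sum(charges)
    m1 = sum(q * (x - center) for q, x in zip(charges, positions))
    return (m0, m1)


def m2m(child_moments, child_centers, parent_center):
    """Shift each child's moments to the parent center and sum them."""
    m0 = sum(c[0] for c in child_moments)
    m1 = sum(c[1] + c[0] * (cc - parent_center)
             for c, cc in zip(child_moments, child_centers))
    return (m0, m1)


def merge_pair(pair):
    """Build one parent cube from two sibling cubes."""
    (c0, m_a), (c1, m_b) = pair
    parent = 0.5 * (c0 + c1)
    return (parent, m2m([m_a, m_b], [c0, c1], parent))


def upward_pass(leaves, workers=4):
    """leaves: list of (center, moments); length must be a power of two.

    Returns the list of levels, bottom level first.
    """
    levels = [leaves]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(levels[-1]) > 1:
            cur = levels[-1]
            pairs = [(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)]
            # All parents on one level are independent of each other.
            levels.append(list(pool.map(merge_pair, pairs)))
    return levels
```

The root moments produced this way equal the moments computed directly from all charges about the root center, which is the property the M2M operation is meant to preserve.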

Parallel Fast Multipole Method--Downwards
Downward pass
  At the top level, calculate the potential between two cubes (called the M2L operation).
  The potential is further distributed down to the children from their parent in parallel (called the L2L operation).
  Calculate the near-field potential directly in parallel
  Sum the L2L results with the near-field potential for all panels at the bottom level and return Pxq
M2M and L2L are local operations, while M2L is a global operation.
How to reduce communication traffic due to the global operation?
Figure labels: Level 0, Level 1, Level 2, Level 3, M2M, M2L, L2L

After that, it enters the downward pass, which distributes the parent node's interactions down to its children. Firstly,…(follow animation scripts)…

Reduction of traffic between processors
Global data dependence exists in the M2L operation at the top level
Pre-fetch moments: each cube distributes its moments to all cubes on its dependency list before the calculations. As such, it can hide communication time.
Figure labels: cube under calculation; dependency list: Cube 0, Cube 1, …, Cube k

A parallel FMM needs efficient communication of multipole expansion (ME) moments between processors. We notice that the data dependency comes from the interaction list during M2L operations. That is, a local cube needs to know the ME moments of the cubes in its interaction list. One straightforward way is to build the interaction list with all remote cubes whose ME moments contribute to the potential of the given cube. But this results in a huge communication overhead. So, for a given cube, we use a complement interaction list, which lists all cubes that require its ME moments. As such, FMM can anticipate which ME moments will be required by other processors, and distribute the required moments prior to the computation.
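The complement interaction list is the inverse of the interaction list, so it can be built with one pass over the data. The sketch below assumes interaction lists are given as a dictionary from cube id to the cubes whose moments it needs, and that an `owner` map assigns each cube to a processor; both structures and the function names are hypothetical.

```python
from collections import defaultdict


def dependency_lists(interaction_lists):
    """Invert interaction lists: for each cube, which cubes need its moments."""
    deps = defaultdict(list)
    for cube, sources in interaction_lists.items():
        for src in sources:
            deps[src].append(cube)
    return dict(deps)


def prefetch_plan(interaction_lists, owner):
    """Messages (source cube, destination processor) to send before M2L.

    Consumers on the same processor as the source need no message, and
    each destination processor receives a given cube's moments only once.
    """
    plan = set()
    for src, consumers in dependency_lists(interaction_lists).items():
        for cube in consumers:
            if owner[cube] != owner[src]:
                plan.add((src, owner[cube]))
    return sorted(plan)
```

Because the plan is known before the M2L step starts, the sends can be issued early and overlapped with local work, which is the communication hiding described on the slide.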

Flow of piCAP
Use spectrum preconditioner to accelerate convergence
Incrementally update the preconditioner for different variations

Deflated Spectral Iteration
Why do we need a spectral preconditioner?
  GMRES needs too many iterations to achieve convergence.
  A spectral preconditioner shifts the spectrum of the system matrix to improve iteration convergence
  Deflated spectral iteration: compute k partial eigen-pairs (k=1 is power iteration), then build the spectrum preconditioner
Why do we need incremental preconditioning?
  Variation can significantly change the spectral distribution
  Simultaneously considering all variations increases the complexity of our model.
  Building each preconditioner for different variations is expensive

With geometric moments, we have one large system and can evaluate P*q with FMM in parallel. The system Pq=v should then be solved with GMRES, but it may need too many iterations. So, to improve GMRES convergence, we use a spectral preconditioner W to shift the eigenvalue distribution of the P matrix. To build this preconditioner, we first use the power iteration algorithm to calculate the first K eigen-pairs Vk and Dk. Then the spectrum preconditioner can be built with this equation. In the figure below, the left plot is the spectrum of the original P matrix and the right plot is the spectrum of the preconditioned P matrix. We can observe that the eigenvalues are shifted to near one and are well clustered after preconditioning.
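A small sketch of the k=1 case may help. It runs power iteration on a symmetric matrix and then applies a rank-one spectral update of the form W = I + sigma * v v^T / lambda. That form is an assumption chosen for illustration, because the slide's preconditioner equation is not preserved in the transcript; `sigma` and the function names are hypothetical.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def power_iteration(A, iters=200):
    """Dominant eigenpair (lam, v) of a symmetric matrix A, with |v| = 1."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(A, v)))  # Rayleigh quotient
    return lam, v


def apply_spectral_preconditioner(x, lam, v, sigma=1.0):
    """Return W x for the assumed form W = I + sigma * v v^T / lam."""
    coeff = sigma * sum(a * b for a, b in zip(v, x)) / lam
    return [xi + coeff * vi for xi, vi in zip(x, v)]
```

Under this assumed form, A W v = (lambda + sigma) v, so the eigenvalue belonging to the computed eigenvector is moved by sigma while applying W costs only a dot product and a vector update.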

Incremental Precondition
For the updated system, the update for the i-th eigenvector is: [equation not preserved in transcript], defined in terms of a subspace of eigenvectors and the updated spectrum
Updated preconditioner W' is: [equation not preserved in transcript]
The inverse operation only involves the diagonal matrix DK
Consider different variations by partially updating the nominal preconditioner.
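For orientation, the standard first-order perturbation result for a symmetric matrix has the same ingredients as this slide (the other eigenvectors and a diagonal matrix of eigenvalue differences). It is given here as a textbook formula, not as the slide's own equation; the symbols \delta P, \lambda_i and v_i are assumptions, with P perturbed to P + \delta P and (\lambda_i, v_i) the nominal eigen-pairs with distinct eigenvalues.

```latex
\delta\lambda_i \approx v_i^{T}\,\delta P\,v_i,
\qquad
\delta v_i \approx \sum_{j \neq i}
  \frac{v_j^{T}\,\delta P\,v_i}{\lambda_i - \lambda_j}\; v_j
```

The only inverse needed is that of the scalars \lambda_i - \lambda_j, that is, of a diagonal matrix, which is consistent with the remark that the inverse operation only involves a diagonal matrix.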

Next, we give some experimental results to validate our method.

Accuracy Comparison
Setup: two panels with random variation in distance d and width w
Result: stochastic geometric moments have high accuracy, with an average error of 1.8%, and can be up to ~1000X faster than MC

Tables on the slide compare MC (3000) and piCAP in Cij (fF) and Time (s) for three configurations (the numeric entries are not preserved in this transcript):
  2 panels, d0 = 7.07μm, w0 = 1μm, d1 = 20%d0
  2 panels, d0 = 11.31μm, w0 = 1μm, d1 = 10%d0
  2 panels, d0 = 4.24μm, w0 = 1μm, d1 = 20%d0, w1 = 20%h0

First, we verify the accuracy of our method. We use two distant panels with random variations in distance and width, respectively. We compare piCAP with the Monte-Carlo method using 3000 runs. Under different geometry parameters and variation configurations, we can see that piCAP not only keeps high accuracy, with a maximum error of 1.8%, but is also up to ~1000X faster than Monte-Carlo.
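The shape of this comparison can be reproduced on a toy problem. The sketch below does not compute a capacitance; it takes the scalar kernel f(d) = 1/d with d = d0 + d1*xi and xi a standard normal variable, and compares a first-order expansion with 3000 Monte-Carlo samples. Treating d1 as a standard deviation, the choice of kernel, the seed and the function names are all assumptions made for illustration.

```python
import random
import statistics


def first_order_stats(d0, d1):
    """Mean and standard deviation of 1/d to first order in d1."""
    mean = 1.0 / d0
    std = d1 / d0**2  # |f'(d0)| * d1
    return mean, std


def monte_carlo_stats(d0, d1, samples=3000, seed=0):
    """Sample mean and standard deviation of 1/d from random draws."""
    rng = random.Random(seed)
    values = [1.0 / (d0 + d1 * rng.gauss(0.0, 1.0)) for _ in range(samples)]
    return statistics.fmean(values), statistics.stdev(values)
```

The expansion needs one evaluation of the kernel and its derivative, while Monte-Carlo needs one evaluation per sample; that gap in the number of evaluations is the source of the speedup that an expansion-based method can achieve.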

Runtime for parallel FMM
Setup
  Two-layer example with 20 conductors. Others: 40, 80, 160 conductors
  Evaluate Pxq (MVP) with 10% perturbation on panel distance
Result
  All examples achieve about 3X speedup with 4 processors

Speedup relative to 1 processor:
#wire      20      40      80      160
#panels    12360   10320   11040   12480
1 proc.    1.0     1.0     1.0     1.0
2 proc.    1.7X    1.4X    1.7X    1.7X
3 proc.    2.0X    2.0X    2.0X    2.0X
4 proc.    2.7X    2.9X    3.0X    2.8X

Then, we validate the speed. In this experiment, we use a two-layer structure with 20, 40, 80 and 160 conductors. The 20-conductor case is shown in the right figure. We compare the MVP time with 10% perturbation on panel distance. The times with different numbers of processors are compared in the following table. From this table, we can observe that all examples achieve about 3X speedup with 4 processors due to the parallel FMM acceleration.
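The speedups in the table can be turned into parallel efficiency (speedup divided by processor count) with a few lines. The numbers are copied from the table; the helper names are hypothetical.

```python
# processors -> speedups for the 20, 40, 80 and 160 wire examples
SPEEDUP = {
    2: [1.7, 1.4, 1.7, 1.7],
    3: [2.0, 2.0, 2.0, 2.0],
    4: [2.7, 2.9, 3.0, 2.8],
}


def mean_speedup(procs):
    """Average speedup over the four examples for a processor count."""
    values = SPEEDUP[procs]
    return sum(values) / len(values)


def parallel_efficiency(speedup, procs):
    """Fraction of ideal linear speedup that is achieved."""
    return speedup / procs
```

For 4 processors the mean speedup is 2.85, which corresponds to an efficiency of about 0.71.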

Efficiency of spectral preconditioner
Setup: three test structures: single plate, 2x2 bus, cubic
Result
  Compare diagonal preconditioning with spectrum preconditioning
  Spectrum preconditioning accelerates the convergence of GMRES (3X).

         # panel   # variable   diagonal prec.       spectral prec.
                                # iter   Time(s)     # iter   Time(s)
plate    256       768          29       24.59       11       8.625
cubic    864       2592         32       49.59                19.394
bus      1272      3816         41       72.58       15       29.21

Next, we verify the efficiency of the spectrum preconditioner. In this experiment, we use 3 different structures: single plate, cross-over structure, and cube. We use a simple diagonal preconditioner as the baseline. From this table, we can see that the spectrum preconditioner accelerates GMRES convergence with about 3X speedup for these examples. Although the spectrum preconditioner takes a little more time to build than the diagonal preconditioner, this cost becomes negligible compared with the time for MVP evaluation, especially since the MVP is repeated many times. So the total runtime with the spectrum preconditioner is less than that with the diagonal preconditioner.

Speedup by Incremental Precondition
Setup
  Test on the two-layer 20-conductor example
  Incremental update of the nominal preconditioner for different variation sources
  Compare with the non-incremental one
Result: up to 15X speedup over the non-incremental results, and only the incremental one can finish all large examples.

discretization (w-t-l)   #panel   #variable
3x3x7                    2040     6120
3x3x15                   3960     11880
3x3x24                            18360
3x3x50                   12360    37080
Total runtime (s), non-incremental and incremental: only the entries 81.375 (3x3x7) and "-" (3x3x24) are preserved in this transcript.

Finally, we compare the total runtime of incremental GMRES with the non-incremental case. In this experiment, we use the 2-layer structure with 20 conductors, but apply different discretizations. We list the discretization, panel number, variable number and total time for the two methods in the following table. The results show about 15X speedup over the non-incremental method. Moreover, only incremental GMRES can finish all large examples.

Conclusion and Future Work
Introduce stochastic geometric moments
Develop a parallel FMM to evaluate the matrix-vector product with process variation
Develop a spectral preconditioner that is updated incrementally to consider different variations
Future work: extend our parallel and incremental solver to other IC-variation related stochastic analysis

To conclude my talk, I will summarize the main contributions of this paper. We propose stochastic geometric moments to integrate process variation into FMM, and develop a fast parallel FMM to evaluate the MVP with variation. Moreover, we develop an incrementally preconditioned GMRES to consider different variations with improved convergence. In future work, we may apply this parallel and incremental solver to other stochastic analysis problems.

Thanks.
