1 ECE 530 – Analysis Techniques for Large-Scale Electrical Systems
Lecture 11: Iterative Methods for Sparse Linear Systems
Prof. Hao Zhu
Dept. of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
10/7/2015

2 Iterative Methods for Sparse Linear Systems
Direct solution methods based on LU decomposition were originally preferred because of their robustness and predictable behavior.
Iterative methods for solving general, large, sparse linear systems have been gaining popularity, a trend that can be traced back to the 1960s and 1970s.
Iterative methods have started to approach the quality and robustness of direct methods.
Iterative methods are easier to implement efficiently on high-performance computers (HPCs) than direct methods.

3 Iterative Methods
The problem is still to solve Ax = b.
Stationary (or relaxation) methods: x(i+1) = Gx(i) + c, where G and c do not depend on the iteration count i.
Non-stationary methods: x(i+1) = x(i) + α(i) p(i), where the computation involves information that changes at each iteration.
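One standard instance of a stationary method is the Jacobi iteration, where the splitting A = D + R (D the diagonal of A, R the off-diagonal remainder) gives G = −D⁻¹R and c = D⁻¹b. The sketch below is an illustrative addition, not part of the lecture, and its test system is a made-up diagonally dominant example.

    import numpy as np

    def jacobi(A, b, tol=1e-8, max_iter=500):
        """Stationary (Jacobi) iteration x(i+1) = G x(i) + c for Ax = b.

        With the splitting A = D + R (D diagonal, R off-diagonal),
        G = -D^{-1} R and c = D^{-1} b.
        """
        D = np.diag(A)                 # diagonal entries of A
        R = A - np.diag(D)             # off-diagonal remainder
        x = np.zeros_like(b, dtype=float)
        for _ in range(max_iter):
            x_new = (b - R @ x) / D    # one stationary update
            if np.linalg.norm(x_new - x) < tol:
                return x_new
            x = x_new
        return x

    # Hypothetical diagonally dominant test system (not from the lecture)
    A = np.array([[4.0, 1.0], [2.0, 5.0]])
    b = np.array([1.0, 2.0])
    print(jacobi(A, b), np.linalg.solve(A, b))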

4 Non-stationary methods
Convergence conditions for stationary methods depend on the matrix G, and hence on the matrix A.
Therefore, their practical applicability can be limited by the specific linear system to be solved.
Non-stationary methods can overcome this limitation by recasting the linear system as an optimization problem and then applying iterative first-order optimization methods.

5 An Optimization Problem
We will first focus on a simpler scenario where A is symmetric (i.e., A = Aᵀ) and positive definite (i.e., A ≻ 0, all eigenvalues positive).
Consider the quadratic function
f(x) = (1/2) xᵀAx − bᵀx
The optimal x* that minimizes f(x) is given by the solution of ∇f(x) = Ax − b = 0, which is exactly the solution to Ax = b.
Steepest descent is a classical optimization method.
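To make the equivalence concrete, the following illustrative sketch (not from the lecture) minimizes the quadratic f(x) numerically and compares the minimizer with the direct solution of Ax = b; the 2×2 values of A and b are arbitrary.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical symmetric positive definite system (illustrative only)
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 4.0])

    f = lambda x: 0.5 * x @ A @ x - b @ x      # quadratic objective
    grad = lambda x: A @ x - b                 # gradient = Ax - b

    x_min = minimize(f, x0=np.zeros(2), jac=grad).x
    print(x_min, np.linalg.solve(A, b))        # the two should agree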

6 Examples of Quadratic Functions
(Figures omitted) Quadratic forms f(x) for: a) a positive definite matrix; b) a negative definite matrix; c) a singular (positive semidefinite) matrix; d) an indefinite matrix (a saddle surface).

7 Steepest Descent Algorithm
Iteratively update x along the negative gradient direction −∇f(x) = b − Ax.
The stepsize is selected to minimize f(x) along −∇f(x).
Set i = 0, ε > 0, x(0) = 0, so r(0) = b − Ax(0) = b
While ||r(i)|| ≥ ε Do
  (a) calculate the best stepsize α(i) = [(r(i))ᵀ r(i)] / [(r(i))ᵀ A r(i)]
  (b) x(i+1) = x(i) + α(i) r(i)
  (c) r(i+1) = r(i) − α(i) A r(i)
  (d) i := i + 1
End While
Note there is only one matrix-vector multiply per iteration.
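A minimal Python sketch of this steepest descent loop is given below, assuming a dense symmetric positive definite A; the test system is a hypothetical example, not the lecture's.

    import numpy as np

    def steepest_descent(A, b, eps=1e-8, max_iter=10_000):
        """Steepest descent for Ax = b with A symmetric positive definite."""
        x = np.zeros_like(b, dtype=float)
        r = b - A @ x                      # r(0) = b - Ax(0) = b
        i = 0
        while np.linalg.norm(r) >= eps and i < max_iter:
            Ar = A @ r                     # the single matrix-vector multiply
            alpha = (r @ r) / (r @ Ar)     # exact line search along r
            x = x + alpha * r
            r = r - alpha * Ar             # update residual without recomputing b - Ax
            i += 1
        return x, i

    # Hypothetical SPD test system (illustrative only)
    A = np.array([[3.0, 2.0], [2.0, 6.0]])
    b = np.array([2.0, -8.0])
    x, iters = steepest_descent(A, b)
    print(x, iters)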

8 SD Example
Starting at (−2, −2), take the direction of steepest descent of f.
Find the point on the intersection of the two surfaces that minimizes f.
(Figure omitted: intersection of the surfaces. The gradient at the bottommost point is orthogonal to the gradient of the previous step.)

9 SD Example: Solution Path

10 Steepest Descent Convergence
At the limit point, the gradient vanishes: r(i) = b − Ax(i) = 0.
We define the A-norm of x as ||x||_A = (xᵀAx)^(1/2).
We can show exponential (geometric) convergence, that is,
||x(i) − x*||_A ≤ [(κ − 1)/(κ + 1)]^i ||x(0) − x*||_A
where κ is the condition number of A, i.e., κ = λmax(A)/λmin(A).
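As an illustrative check (not part of the lecture), the snippet below runs a few steepest descent iterations on a small made-up SPD system and prints the A-norm error next to the theoretical bound.

    import numpy as np

    # Hypothetical SPD system (illustrative only)
    A = np.array([[3.0, 2.0], [2.0, 6.0]])
    b = np.array([2.0, -8.0])
    x_star = np.linalg.solve(A, b)

    eigs = np.linalg.eigvalsh(A)
    kappa = eigs.max() / eigs.min()            # condition number of A
    rate = (kappa - 1.0) / (kappa + 1.0)

    a_norm = lambda v: np.sqrt(v @ A @ v)      # the A-norm ||v||_A

    x = np.zeros(2)
    e0 = a_norm(x - x_star)
    for i in range(1, 8):
        r = b - A @ x
        alpha = (r @ r) / (r @ (A @ r))
        x = x + alpha * r
        print(i, a_norm(x - x_star), rate**i * e0)   # error vs. theoretical bound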

11 Steepest Descent Convergence
Because (κ − 1)/(κ + 1) < 1, the error will decrease with each steepest descent iteration, albeit potentially quite slowly for large κ.
The function value decreases a little more quickly, as per
f(x(i)) − f(x*) ≤ [(κ − 1)/(κ + 1)]^(2i) [f(x(0)) − f(x*)]
since f(x) − f(x*) = (1/2)||x − x*||_A², but this can still be really slow if κ is large.
The issue is that steepest descent often finds itself taking steps along the same directions as its earlier steps.

12 SD Case Study
Convergence speed also depends on the ratio of the initial guess's components along the eigenvector directions, reflected by another parameter μ.
The worst case occurs when μ = ±κ.

13 Conjugate Direction Methods
An improvement over steepest descent is to take exactly n steps along a set of n search directions and obtain the solution after those n steps.
This is the basic idea of conjugate direction methods.
(Figure omitted: comparison of steepest descent with a conjugate direction approach.)

14 Conjugate Direction Methods
The basic idea is that the n search directions, denoted by d(0), d(1), …, d(n−1), need to be A-orthogonal, that is,
(d(i))ᵀ A d(j) = 0 for all i ≠ j
At the i-th iteration, we update
x(i+1) = x(i) + α(i) d(i)

15 Stepsize Selection
The stepsize α(i) is chosen such that
f(x(i) + α(i) d(i)) = min_α f(x(i) + α d(i))
By setting the derivative to zero,
(d(i))ᵀ ∇f(x(i) + α(i) d(i)) = (d(i))ᵀ (A(x(i) + α(i) d(i)) − b) = 0
we obtain the stepsize (cf. the SD update in slide 7)
α(i) = [(d(i))ᵀ(b − Ax(i))] / [(d(i))ᵀ A d(i)] = [(d(i))ᵀ r(i)] / [(d(i))ᵀ A d(i)]
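As a quick sanity check on this closed-form stepsize, the illustrative snippet below compares it with a brute-force line search along an arbitrary direction d; the values of A, b, x, and d are made up, not taken from the lecture.

    import numpy as np

    # Arbitrary SPD system, current iterate, and search direction (illustrative only)
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 4.0])
    x = np.array([0.5, -0.5])
    d = np.array([1.0, 2.0])

    f = lambda z: 0.5 * z @ A @ z - b @ z
    r = b - A @ x

    alpha_formula = (d @ r) / (d @ A @ d)          # closed-form exact line search

    alphas = np.linspace(-2.0, 4.0, 100_001)       # brute-force search over alpha
    alpha_grid = alphas[np.argmin([f(x + a * d) for a in alphas])]
    print(alpha_formula, alpha_grid)               # should be nearly identical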

16 Convergence Proof
To prove the convergence of the conjugate direction method, we can show that
x(i+1) = argmin_{x ∈ M_i} f(x), where M_i = {x(0) + span{d(0), …, d(i)}}
This is exactly due to the A-orthogonality of the d(i)'s.
Suppose all of d(0), d(1), …, d(n−1) are linearly independent (l.i.); then we have
M_{n−1} = x(0) + span{d(0), …, d(n−1)} = Rⁿ
Therefore, x(n) = argmin f(x) = x* is the optimum.

17 Linearly Independent Directions
Proposition: If A is positive definite and the set of nonzero vectors d(0), d(1), …, d(n−1) is A-orthogonal, then these vectors are linearly independent (l.i.).
Proof: Suppose there are constants a_i, i = 0, 1, …, n−1, such that
a_0 d(0) + a_1 d(1) + … + a_{n−1} d(n−1) = 0
Multiplying by A and then taking the scalar product with d(i) gives
a_i (d(i))ᵀ A d(i) = 0
Since A is positive definite, (d(i))ᵀ A d(i) > 0, so it follows that a_i = 0 for every i.
Hence, the vectors are l.i. (Recall vectors are l.i. if and only if the only such constants are all zero.)

18 Conjugate Direction Method
Given the search direction d(i), the i-th iteration is
α(i) = [(d(i))ᵀ r(i)] / [(d(i))ᵀ A d(i)]
x(i+1) = x(i) + α(i) d(i)
r(i+1) = r(i) − α(i) A d(i)
What we have not yet covered is how to get the n search directions. We'll cover that shortly, but the next slide presents an algorithm, followed by an example.

19 Orthogonalization
To quickly generate A-orthogonal search directions, one can use the so-termed Gram-Schmidt orthogonalization procedure.
Suppose we are given a l.i. set of n vectors {u_0, u_1, …, u_{n−1}}; successively construct d(j), j = 0, 1, …, n−1, by removing from u_j all of its components along the previous directions d(0), …, d(j−1):
d(j) = u_j − Σ_{k<j} [ (d(k))ᵀ A u_j / ((d(k))ᵀ A d(k)) ] d(k)
The trick is to use the gradient (residual) directions, i.e., u_i = r(i) for all i = 0, 1, …, n−1, which yields the very popular conjugate gradient method updates.
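A deliberately naive sketch of this Gram-Schmidt A-orthogonalization is shown below; the conjugate gradient method avoids storing all previous directions, but the snippet illustrates the construction itself. The matrix and starting vectors are arbitrary illustrative choices.

    import numpy as np

    def a_orthogonalize(U, A):
        """Gram-Schmidt A-orthogonalization of the columns of U.

        Returns D whose columns satisfy D[:, i].T @ A @ D[:, j] = 0 for i != j.
        """
        n = U.shape[1]
        D = np.zeros_like(U, dtype=float)
        for j in range(n):
            d = U[:, j].copy()
            for k in range(j):
                dk = D[:, k]
                # remove the component of u_j along d_k, measured in the A-inner product
                d -= (dk @ A @ U[:, j]) / (dk @ A @ dk) * dk
            D[:, j] = d
        return D

    # Arbitrary SPD matrix and l.i. starting vectors (illustrative only)
    A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    U = np.eye(3)
    D = a_orthogonalize(U, A)
    print(np.round(D.T @ A @ D, 10))   # should be (numerically) diagonal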

20 Conjugate Gradient Method
Set i = 0, ε > 0, x(0) = 0, so r(0) = b − Ax(0) = b
While ||r(i)|| ≥ ε Do
  (a) If i = 0 Then d(0) = r(0)
      Else
        β(i) = [(r(i))ᵀ r(i)] / [(r(i−1))ᵀ r(i−1)]
        d(i) = r(i) + β(i) d(i−1)
      End
Upon obtaining d(i) using Gram-Schmidt, the x(i) update is very similar to the SD method.

21 Conjugate Gradient Algorithm
  (b) Update the stepsize α(i) = [(d(i))ᵀ r(i)] / [(d(i))ᵀ A d(i)]
  (c) x(i+1) = x(i) + α(i) d(i)
  (d) r(i+1) = r(i) − α(i) A d(i)
  (e) i := i + 1
End While
Note that there is only one matrix-vector multiply per iteration!
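Putting slides 20 and 21 together, a compact sketch of the conjugate gradient loop could look as follows, assuming a symmetric positive definite A; the test system is a made-up illustration rather than the lecture's example.

    import numpy as np

    def conjugate_gradient(A, b, eps=1e-10):
        """Conjugate gradient for Ax = b, with A symmetric positive definite."""
        n = len(b)
        x = np.zeros(n)
        r = b - A @ x                          # r(0) = b
        d = r.copy()                           # d(0) = r(0)
        rs_old = r @ r
        for _ in range(n):                     # exact solution within n steps (in exact arithmetic)
            if np.sqrt(rs_old) < eps:
                break
            Ad = A @ d                         # the single matrix-vector multiply
            alpha = rs_old / (d @ Ad)          # note (d(i))' r(i) = (r(i))' r(i) here
            x = x + alpha * d
            r = r - alpha * Ad
            rs_new = r @ r
            beta = rs_new / rs_old             # beta(i) = r(i)'r(i) / r(i-1)'r(i-1)
            d = r + beta * d
            rs_old = rs_new
        return x

    # Hypothetical SPD system (illustrative only)
    A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(conjugate_gradient(A, b), np.linalg.solve(A, b))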

22 Conjugate Gradient Example
Using the same (3×3) system as before, with A and b as given there.
Select i = 0, x(0) = 0, ε = 0.1; then r(0) = b.
With i = 0, d(0) = r(0) = b.

23 Conjugate Gradient Example
α(0) = [(d(0))ᵀ r(0)] / [(d(0))ᵀ A d(0)] = 0.0582
This first step exactly matches steepest descent.

24 Conjugate Gradient Example
With i = 1, solve for β(1) = [(r(1))ᵀ r(1)] / [(r(0))ᵀ r(0)], and set d(1) = r(1) + β(1) d(0).
Then
α(1) = [(d(1))ᵀ r(1)] / [(d(1))ᵀ A d(1)] = 1.388

25 Conjugate Gradient Example
And update
x(2) = x(1) + α(1) d(1)
r(2) = r(1) − α(1) A d(1)

26 Conjugate Gradient Example
With i = 2, solve for β(2) = [(r(2))ᵀ r(2)] / [(r(1))ᵀ r(1)], and set d(2) = r(2) + β(2) d(1).
Then
α(2) = [(d(2))ᵀ r(2)] / [(d(2))ᵀ A d(2)] = 0.078

27 Conjugate Gradient Example
And finally
x(3) = x(2) + α(2) d(2)
r(3) = r(2) − α(2) A d(2) = 0
Done in n = 3 iterations!
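The numbers above come from the lecture's earlier example system, which is not reproduced here; the n-step termination property itself can be checked on any small SPD system using the conjugate_gradient sketch given after slide 21, for example:

    import numpy as np

    # Arbitrary 3x3 SPD system (not the lecture's example)
    rng = np.random.default_rng(0)
    M = rng.standard_normal((3, 3))
    A = M @ M.T + 3 * np.eye(3)            # guaranteed symmetric positive definite
    b = rng.standard_normal(3)

    x = conjugate_gradient(A, b)           # uses the sketch defined after slide 21
    print(np.linalg.norm(b - A @ x))       # residual near machine precision after n = 3 steps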

28 General Krylov Subspace Method
In conjugate gradient, the iterate x(i) actually minimizes
f(x) = (1/2) xᵀAx − bᵀx
over the linear manifold {x(0) + K_i(r(0), A)}, where K_i(r(0), A) = span{r(0), A r(0), …, A^(i−1) r(0)} is the i-th Krylov subspace.
A more generic Krylov method is to find x(i) in {x(0) + K_i(r(0), A)} that minimizes ||r(i)|| = ||b − Ax(i)||.
For positive definite (PD) A, both methods attain x(n) = x* = A⁻¹b.
For non-PD A, we can use the more general Generalized Minimal Residual method (GMRES).
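In practice both of these Krylov solvers are available off the shelf; the illustrative sketch below calls SciPy's cg and gmres on a small sparse test system. The matrix and right-hand side are made up, and tolerance keyword arguments vary across SciPy versions, so only the basic calls are shown.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import cg, gmres

    # Hypothetical sparse SPD test matrix (tridiagonal) and right-hand side
    n = 100
    A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    x_cg, info_cg = cg(A, b)         # conjugate gradient (A should be SPD)
    x_gm, info_gm = gmres(A, b)      # GMRES (also works for general nonsymmetric A)

    print(info_cg, np.linalg.norm(b - A @ x_cg))
    print(info_gm, np.linalg.norm(b - A @ x_gm))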

