ECE 530 – Analysis Techniques for Large-Scale Electrical Systems


ECE 530 – Analysis Techniques for Large-Scale Electrical Systems
Lecture 11: Iterative Methods for Sparse Linear Systems
Prof. Hao Zhu
Dept. of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
haozhu@illinois.edu
10/7/2015

Iterative Methods for Sparse Linear Systems
Direct solution methods based on LU decomposition were originally preferred because of their robustness and predictable behavior.
Iterative methods for solving general, large, sparse linear systems have been gaining popularity, a trend that traces back to the 1960s and 1970s.
Iterative methods have since approached the quality and robustness of direct methods.
Iterative methods are easier to implement efficiently on high-performance computers (HPCs) than direct methods.
Some of these notes are adapted from http://cseweb.ucsd.edu/classes/sp10/cse245/

Iterative Methods
The problem is still to solve Ax = b.
Stationary (or relaxation) methods: x(i+1) = Gx(i) + c, where G and c do not depend on the iteration count i.
Non-stationary methods: x(i+1) = x(i) + α(i) p(i), where the computation involves information that changes at each iteration.
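To make the stationary update concrete, here is a minimal sketch (not from the slides; the function name and test matrix are my own) of the Jacobi method, a classical stationary iteration in which G = -D^(-1)(A - D) and c = D^(-1)b, with D the diagonal part of A:

```python
import numpy as np

def jacobi(A, b, tol=1e-8, max_iter=1000):
    """Stationary (Jacobi) iteration: x(i+1) = G x(i) + c with fixed G and c."""
    D = np.diag(np.diag(A))          # diagonal part of A
    R = A - D                        # off-diagonal part
    G = -np.linalg.solve(D, R)       # iteration matrix, fixed for all i
    c = np.linalg.solve(D, b)        # constant vector, fixed for all i
    x = np.zeros_like(b, dtype=float)
    for i in range(max_iter):
        x_new = G @ x + c
        if np.linalg.norm(x_new - x) < tol:
            return x_new, i + 1
        x = x_new
    return x, max_iter

# Example: a diagonally dominant system, for which Jacobi converges.
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
x, iters = jacobi(A, b)
print(x, iters, np.allclose(A @ x, b, atol=1e-6))
```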

Non-stationary Methods
Convergence conditions for stationary methods depend on the matrix G, and hence on the matrix A.
Therefore, their practical applicability can be limited to specific classes of linear systems.
Non-stationary methods can overcome this limitation by recasting the linear system as an optimization problem and then applying iterative first-order optimization methods.

An Optimization Problem
We first focus on the simpler scenario where A is symmetric (i.e., A = A^T) and positive definite (i.e., A ≻ 0, all eigenvalues positive).
Consider the quadratic function
  f(x) = (1/2) x^T A x − b^T x
The optimal x* that minimizes f(x) is given by the solution of ∇f(x) = Ax − b = 0, which is exactly the solution to Ax = b.
Steepest descent is a classical optimization method for this problem.
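As a quick numerical check (not part of the slides; the matrix, vector, and use of scipy.optimize.minimize are purely illustrative), the sketch below minimizes f(x) = (1/2) x^T A x − b^T x for a small symmetric positive definite A and confirms that the minimizer coincides with the solution of Ax = b:

```python
import numpy as np
from scipy.optimize import minimize

# Small symmetric positive definite system (illustrative values).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 4.0])

# Quadratic f(x) = 0.5 x^T A x - b^T x and its gradient Ax - b.
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

# Minimize f starting from the origin; the minimizer should solve Ax = b.
res = minimize(f, x0=np.zeros(2), jac=grad)
print(res.x)                     # minimizer of f
print(np.linalg.solve(A, b))     # direct solution of Ax = b (same vector)
```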

Examples of Quadratic Functions
[Figure: quadratic surfaces f(x) for (a) a positive definite matrix (a unique minimum), (b) a negative definite matrix, (c) a singular matrix, and (d) an indefinite matrix (a saddle point).]

Steepest Descent Algorithm
Iteratively update x along the negative gradient direction −∇f(x) = b − Ax.
The stepsize is selected to minimize f(x) along −∇f(x).

Set i = 0, ε > 0, x(0) = 0, so r(0) = b − Ax(0) = b
While ||r(i)|| > ε Do
  (a) calculate the best stepsize α(i) = (r(i))^T r(i) / ((r(i))^T A r(i))
  (b) x(i+1) = x(i) + α(i) r(i)
  (c) r(i+1) = r(i) − α(i) A r(i)
  (d) i := i + 1
End While
Note there is only one matrix–vector multiply (A r(i)) per iteration.
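A minimal Python/NumPy sketch of the loop above (my own code, not from the slides; the variable names and test matrix are illustrative), following steps (a)-(d) with a single matrix-vector product per iteration:

```python
import numpy as np

def steepest_descent(A, b, eps=1e-8, max_iter=10000):
    """Steepest descent for Ax = b with A symmetric positive definite."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                      # r(0) = b - A x(0) = b
    i = 0
    while np.linalg.norm(r) >= eps and i < max_iter:
        Ar = A @ r                     # the single matrix-vector multiply
        alpha = (r @ r) / (r @ Ar)     # (a) exact line-search stepsize
        x = x + alpha * r              # (b) move along the residual direction
        r = r - alpha * Ar             # (c) update residual without recomputing b - Ax
        i += 1
    return x, i

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # SPD example matrix (illustrative)
b = np.array([2.0, -8.0])
x, iters = steepest_descent(A, b)
print(x, iters)                          # x should satisfy Ax = b (here x = [2, -2])
```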

SD Example
Starting at (-2, -2), take the direction of steepest descent of f.
Find the point on the intersection of the two surfaces (the vertical plane through the descent direction and the surface of f) that minimizes f.
The gradient at that bottommost point is orthogonal to the gradient of the previous step.

SD Example: Solution Path

Steepest Descent Convergence
At the limit point, the gradient vanishes: r(i) = b − Ax(i) = 0.
We define the A-norm of x as ||x||_A = sqrt(x^T A x).
We can show exponential convergence, that is,
  ||x(i) − x*||_A ≤ ((κ − 1)/(κ + 1))^i ||x(0) − x*||_A
where κ is the condition number of A, i.e., κ = λ_max(A)/λ_min(A).

Steepest Descent Convergence
Because (κ − 1)/(κ + 1) < 1, the error decreases with each steepest descent iteration, albeit potentially quite slowly for large κ.
The function value decreases a little quicker, as per
  f(x(i)) − f(x*) ≤ ((κ − 1)/(κ + 1))^(2i) (f(x(0)) − f(x*))
but this can still be very slow if κ is large.
The issue is that steepest descent often finds itself taking steps along the same direction as its earlier steps.

SD Case Study
Convergence speed also depends on the ratio of the initial guess's components along the eigenvector directions, reflected by another parameter μ.
The worst case occurs when μ = ±κ.
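Before moving on, here is a small numerical illustration (mine, not from the slides; both test matrices are made up) of how the condition number κ drives the steepest descent iteration count for a fixed tolerance:

```python
import numpy as np

def sd_iterations(A, b, eps=1e-8, max_iter=100000):
    """Count steepest descent iterations needed to reach ||r|| < eps."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    i = 0
    while np.linalg.norm(r) >= eps and i < max_iter:
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)
        x, r = x + alpha * r, r - alpha * Ar
        i += 1
    return i

b = np.array([1.0, 1.0])
A_well = np.diag([1.0, 2.0])       # condition number kappa = 2
A_ill  = np.diag([1.0, 200.0])     # condition number kappa = 200
print(sd_iterations(A_well, b))    # a handful of iterations
print(sd_iterations(A_ill, b))     # far more iterations, growing roughly with kappa
```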

Conjugate Direction Methods
An improvement over steepest descent is to take exactly n steps along a set of n carefully chosen search directions and obtain the solution after those n steps.
This is the basic idea of the conjugate direction methods.
[Figure: comparison of steepest descent with a conjugate direction approach.
Image source: http://en.wikipedia.org/wiki/File:Conjugate_gradient_illustration.svg]

Conjugate Direction Methods
The basic idea is that the n search directions, denoted by d(0), d(1), …, d(n−1), need to be A-orthogonal, that is,
  (d(i))^T A d(j) = 0 for all i ≠ j
At the i-th iteration, we update
  x(i+1) = x(i) + α(i) d(i)

Stepsize Selection
The stepsize α(i) is chosen such that
  f(x(i) + α(i) d(i)) = min_α f(x(i) + α d(i))
Setting to zero the derivative with respect to α,
  (d(i))^T ∇f(x(i) + α(i) d(i)) = (d(i))^T (A(x(i) + α(i) d(i)) − b) = 0
we obtain the stepsize (cf. the SD update in the Steepest Descent Algorithm slide)
  α(i) = (d(i))^T (b − A x(i)) / ((d(i))^T A d(i)) = (d(i))^T r(i) / ((d(i))^T A d(i))

Convergence Proof
To prove the convergence of the conjugate direction method, we can show that
  x(i+1) = arg min_{x ∈ M_i} f(x), where M_i = {x(0) + span{d(0), …, d(i)}}
This is exactly due to the A-orthogonality of the d(i)'s.
Supposing all of d(0), d(1), …, d(n−1) are linearly independent (l.i.), we have
  M_{n−1} = x(0) + span{d(0), …, d(n−1)} = R^n
Therefore, x(n) = arg min f(x) = x* is the optimum.

Linearly Independent Directions
Proposition: If A is positive definite and the set of nonzero vectors d(0), d(1), …, d(n−1) is A-orthogonal, then these vectors are linearly independent (l.i.).
Proof: Suppose there are constants a_j, j = 0, 1, …, n−1, such that
  a_0 d(0) + a_1 d(1) + … + a_{n−1} d(n−1) = 0
Multiplying by A and then taking the scalar product with d(i) gives
  a_i (d(i))^T A d(i) = 0
since the cross terms vanish by A-orthogonality. Since A is positive definite, (d(i))^T A d(i) > 0, so it follows that a_i = 0 for every i.
Hence the vectors are l.i. (recall that l.i. holds exactly when the only such constants are all zero).

Conjugate Direction Method
Given the search direction d(i), the i-th iteration updates
  x(i+1) = x(i) + α(i) d(i), with α(i) = (d(i))^T r(i) / ((d(i))^T A d(i))
What we have not yet covered is how to get the n search directions. We'll cover that shortly, but the next slide presents an algorithm, followed by an example.

Orthogonalization
To quickly generate A-orthogonal search directions, one can use the so-termed Gram–Schmidt orthogonalization (conjugation) procedure.
Suppose we are given an l.i. set of n vectors {u_0, u_1, …, u_{n−1}}; successively construct d(j), j = 0, 1, …, n−1, by removing from u_j all of its components along the previously constructed directions d(0), …, d(j−1), with respect to the A-inner product. A sketch of this conjugation step appears below.
The trick is to use the gradient directions, i.e., u_i = r(i) for all i = 0, 1, …, n−1, which yields the very popular conjugate gradient method updates.
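A minimal sketch of Gram–Schmidt conjugation with respect to the A-inner product (my own code, not from the slides; the matrix and starting vectors are illustrative). Each new direction d(j) is u_j minus its A-projections onto the earlier directions, and the result is checked to be A-orthogonal:

```python
import numpy as np

def a_orthogonalize(U, A):
    """Gram-Schmidt conjugation: make the columns of U A-orthogonal."""
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for j in range(n):
        d = U[:, j].astype(float)
        for k in range(j):
            # Remove the component of u_j along d(k) in the A-inner product.
            beta = (U[:, j] @ (A @ D[:, k])) / (D[:, k] @ (A @ D[:, k]))
            d -= beta * D[:, k]
        D[:, j] = d
    return D

A = np.array([[3.0, 2.0], [2.0, 6.0]])       # SPD matrix (illustrative)
U = np.array([[1.0, 0.0], [0.0, 1.0]])       # any l.i. set of vectors
D = a_orthogonalize(U, A)
print(D[:, 0] @ (A @ D[:, 1]))               # ~0: the directions are A-orthogonal
```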

Conjugate Gradient Method
Set i = 0, ε > 0, x(0) = 0, so r(0) = b − Ax(0) = b
While ||r(i)|| > ε Do
  (a) If i = 0 Then d(0) = r(0)
      Else Begin
        β(i) = [r(i)]^T r(i) / ([r(i−1)]^T r(i−1))
        d(i) = r(i) + β(i) d(i−1)
      End
Upon obtaining d(i) using Gram–Schmidt, the x(i) update is very similar to the SD method.

Conjugate Gradient Algorithm
  (b) Update stepsize α(i) = (d(i))^T r(i) / ((d(i))^T A d(i))
  (c) x(i+1) = x(i) + α(i) d(i)
  (d) r(i+1) = r(i) − α(i) A d(i)
  (e) i := i + 1
End While
Note that there is only one matrix–vector multiply (A d(i)) per iteration!
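Putting steps (a)-(e) together, here is a minimal Python/NumPy sketch of the conjugate gradient loop (my own code, following the algorithm above; the test matrix is illustrative, not the example used on the next slides):

```python
import numpy as np

def conjugate_gradient(A, b, eps=1e-10, max_iter=None):
    """Conjugate gradient for Ax = b with A symmetric positive definite."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                          # r(0) = b
    d = r.copy()                           # d(0) = r(0)
    rs_old = r @ r
    for i in range(max_iter):
        if np.sqrt(rs_old) < eps:
            return x, i
        Ad = A @ d                         # the single matrix-vector multiply
        alpha = rs_old / (d @ Ad)          # (b) stepsize (d^T r = r^T r here)
        x = x + alpha * d                  # (c)
        r = r - alpha * Ad                 # (d)
        rs_new = r @ r
        beta = rs_new / rs_old             # (a) Gram-Schmidt coefficient
        d = r + beta * d
        rs_old = rs_new
    return x, max_iter

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # SPD example (illustrative)
b = np.array([1.0, 2.0, 3.0])
x, iters = conjugate_gradient(A, b)
print(x, iters)                            # converges in at most n = 3 iterations
print(np.allclose(A @ x, b))
```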

Conjugate Gradient Example
Using the same system (A, b) as before, select i = 0, x(0) = 0, ε = 0.1; then r(0) = b.
With i = 0, d(0) = r(0) = b.

Conjugate Gradient Example
  α(0) = (d(0))^T r(0) / ((d(0))^T A d(0)) = 0.0582
This first step exactly matches steepest descent.

Conjugate Gradient Example
With i = 1, solve for β(1) = [r(1)]^T r(1) / ([r(0)]^T r(0)) = 725/12450 = 0.0582, and then form d(1) = r(1) + β(1) d(0).
The stepsize is
  α(1) = (d(1))^T r(1) / ((d(1))^T A d(1)) = 1.388

Conjugate Gradient Example
And then update x(2) = x(1) + α(1) d(1) and r(2) = r(1) − α(1) A d(1).

Conjugate Gradient Example
With i = 2, solve for β(2) = [r(2)]^T r(2) / ([r(1)]^T r(1)), and then form d(2) = r(2) + β(2) d(1).
The stepsize is
  α(2) = (d(2))^T r(2) / ((d(2))^T A d(2)) = 0.078

Conjugate Gradient Example
And finally x(3) = x(2) + α(2) d(2), with ||r(3)|| below the tolerance.
Done in 3 = n iterations!

General Krylov Subspace Method
In conjugate gradient, the iterate x(i) actually minimizes f(x) = (1/2) x^T A x − b^T x over the linear manifold {x(0) + K_i(r(0), A)}, where K_i(r(0), A) = span{r(0), Ar(0), …, A^(i−1) r(0)} is the Krylov subspace.
A more generic Krylov method is to find the x(i) in {x(0) + K_i(r(0), A)} that minimizes ||r(i)|| = ||b − Ax(i)||.
For symmetric positive definite (PD) A, both methods attain x(n) = x* = A^(−1) b.
For non-PD A, we can use the more general Generalized Minimum Residual Method (GMRES).
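In practice, library implementations of these Krylov methods are readily available. A minimal sketch (not part of the slides; the matrices are made up and SciPy is assumed to be installed) using SciPy's sparse solvers, where cg targets symmetric positive definite systems and gmres handles general nonsymmetric ones:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg, gmres

# Symmetric positive definite system: conjugate gradient applies.
A_spd = csr_matrix(np.array([[4.0, 1.0], [1.0, 3.0]]))
b = np.array([1.0, 2.0])
x_cg, info_cg = cg(A_spd, b)          # info == 0 indicates convergence
print(x_cg, info_cg)

# General (nonsymmetric) system: use GMRES instead.
A_gen = csr_matrix(np.array([[2.0, 1.0], [-1.0, 3.0]]))
x_gm, info_gm = gmres(A_gen, b)
print(x_gm, info_gm)
```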