Computational Intelligence
Winter Term 2015/16
Prof. Dr. Günter Rudolph, Lehrstuhl für Algorithm Engineering (LS 11), Fakultät für Informatik, TU Dortmund

Towards CMA-ES

Mutation: Y = X + Z with Z ~ N(0, C), a multinormal distribution. It is the maximum entropy distribution with support R^n for a given expectation vector and covariance matrix.

How should we choose the covariance matrix C? Unless we have learned something about the problem during the search, we should not prefer any direction, so: covariance matrix C = I_n (unit matrix).

[Figure: isolines of N(0, C) for C = I_n (circles), C = diag(s_1, ..., s_n) (axis-parallel ellipses), and C with orthogonally rotated principal axes (rotated ellipses).]
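As a quick illustration (not from the slides; the function name is mine), here is a minimal numpy sketch of such a mutation. Z ~ N(0, C) is sampled via the Cholesky factor of C, so any of the choices of C above can be plugged in:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# No knowledge about the problem yet: isotropic mutation, C = I_n.
C_iso = np.eye(n)
# Axis-parallel scaling only: C = diag(s_1, ..., s_n).
C_diag = np.diag([1.0, 0.25, 4.0, 0.01, 2.25])

def mutate(x, C):
    """Y = X + Z with Z ~ N(0, C): z = L u, u ~ N(0, I_n), where C = L L'."""
    L = np.linalg.cholesky(C)                    # lower-triangular Cholesky factor
    return x + L @ rng.standard_normal(len(x))

y_iso = mutate(np.zeros(n), C_iso)
y_aniso = mutate(np.zeros(n), C_diag)
```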

Towards CMA-ES

Claim: mutations should be aligned to the isolines of the problem (Schwefel 1981). If true, then the covariance matrix should be the inverse of the Hessian matrix!

Assume f(x) ≈ ½ x'Ax + b'x + c, so that H = A. Z ~ N(0, C) has density f_Z(z) = (2π)^(-n/2) det(C)^(-1/2) exp(-½ z'C⁻¹z), so its isolines are the sets {z : z'C⁻¹z = const}. These coincide with the isolines of the quadratic model iff C⁻¹ = A, i.e. C = A⁻¹ = H⁻¹.

Many proposals exist for how to adapt the covariance matrix. Extreme case: collect at least (n+1)(n+2)/2 pairs (x, f(x)), as many as the quadratic model has coefficients, apply multiple linear regression to obtain estimators for A, b, c, and invert the estimated matrix A! OK, but: O(n⁶)! (Rudolph 1992)
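To make the O(n⁶) remark concrete, here is a hedged sketch of that regression idea (function name and setup are mine, not from the slides): fit the quadratic model by ordinary least squares and read the symmetric matrix A off the coefficient vector. Solving for the O(n²) unknown coefficients is what drives the overall cost to O(n⁶).

```python
import numpy as np

def estimate_A_by_regression(xs, fs):
    """Least-squares fit of f(x) = 0.5 x'Ax + b'x + c; returns the estimate of A.
    The model has (n+1)(n+2)/2 coefficients, so at least that many samples are
    needed; solving for these O(n^2) unknowns yields O(n^6) work overall."""
    m, n = xs.shape
    cols = [np.ones(m)]                          # constant term c
    cols += [xs[:, i] for i in range(n)]         # linear terms b_i x_i
    for i in range(n):                           # quadratic terms
        for j in range(i, n):
            scale = 0.5 if i == j else 1.0       # 0.5 a_ii x_i^2, a_ij x_i x_j
            cols.append(scale * xs[:, i] * xs[:, j])
    theta, *_ = np.linalg.lstsq(np.column_stack(cols), fs, rcond=None)
    A = np.zeros((n, n))
    k = 1 + n
    for i in range(n):
        for j in range(i, n):
            A[i, j] = A[j, i] = theta[k]
            k += 1
    return A                                     # then C = inv(A), if A is p.d.
```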

Towards CMA-ES

Doubts: are isoline-aligned mutations really optimal? The principal axis should point in the negative gradient direction! (proof next slide)

Most (effective) algorithms behave like this: they run roughly in the negative gradient direction, and sooner or later they approach the longest principal axis of the Hessian. At that point the negative gradient direction coincides with the direction to the optimum, which is parallel to the longest principal axis of the Hessian, which in turn is parallel to the longest principal axis of the inverse covariance matrix (so Schwefel's rule is OK in this situation).

Towards CMA-ES

Decompose the mutation as Z = r Q u, with A = B'B and B = Q⁻¹. If Qu were deterministic ... ⇒ set Qu = −∇f(x) (the direction of steepest descent). [The derivation announced on the previous slide is not preserved in the transcript.]

Towards CMA-ES

Apart from (inefficient) regression, how can we get the matrix elements of Q? Iteratively: C(k+1) = update( C(k), Population(k) ).

Basic constraint: C(k) must be symmetric and positive definite (p.d.) for all k ≥ 0, otherwise the Cholesky decomposition C = Q'Q is impossible.

Lemma. Let A and B be square matrices and α, β > 0.
a) A, B symmetric ⇒ αA + βB symmetric.
b) A positive definite and B positive semidefinite ⇒ αA + βB positive definite.

Proof:
ad a) C = αA + βB is symmetric, since c_ij = α a_ij + β b_ij = α a_ji + β b_ji = c_ji.
ad b) For all x ∈ R^n \ {0}: x'(αA + βB)x = α x'Ax + β x'Bx > 0, since x'Ax > 0 and x'Bx ≥ 0. ∎

Towards CMA-ES

Theorem. A square matrix C(k) is symmetric and positive definite for all k ≥ 0 if it is built via the iterative formula C(k+1) = α_k C(k) + β_k v_k v_k', where C(0) = I_n, v_k ≠ 0, α_k > 0 and liminf β_k > 0.

Proof: If v ≠ 0, then the matrix V = vv' is symmetric and positive semidefinite, since by definition of the dyadic product v_ij = v_i · v_j = v_j · v_i = v_ji for all i, j, and for all x ∈ R^n: x'(vv')x = (x'v) · (v'x) = (x'v)² ≥ 0. Thus every matrix v_k v_k', k ≥ 0, is symmetric and p.s.d. Owing to the previous lemma, C(k+1) is symmetric and p.d. if C(k) is symmetric and p.d. and v_k v_k' is symmetric and p.s.d. Since C(0) = I_n is symmetric and p.d., it follows that C(1) is symmetric and p.d.; repeating the argument yields the statement of the theorem. ∎
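The theorem is easy to probe numerically. The following small sketch (my own; the constants α_k = 0.9, β_k = 0.1 are arbitrary choices satisfying the hypotheses) applies the update repeatedly and uses numpy's Cholesky factorization as a positive-definiteness check: it raises LinAlgError if C ever loses definiteness.

```python
import numpy as np

rng = np.random.default_rng(1)
n, steps = 4, 100
C = np.eye(n)                          # C(0) = I_n
alpha, beta = 0.9, 0.1                 # alpha_k > 0, liminf beta_k > 0

for _ in range(steps):
    v = rng.standard_normal(n)         # some v_k != 0 (here: random)
    C = alpha * C + beta * np.outer(v, v)
    np.linalg.cholesky(C)              # raises LinAlgError iff C is not p.d.
```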

CMA-ES

Idea: do not estimate the matrix C in each iteration; instead, approximate it iteratively! (Hansen, Ostermeier et al. 1996ff.) → Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

Set the initial covariance matrix to C(0) = I_n and update

C(t+1) = (1 − η) C(t) + η Σ_{i=1..μ} w_i d_i d_i'

where
- η ∈ (0,1) is the "learning rate",
- m is the mean of all μ selected parents,
- d_i = (x_{i:λ} − m) / σ, with the offspring sorted by fitness: f(x_{1:λ}) ≤ f(x_{2:λ}) ≤ ... ≤ f(x_{λ:λ}).

The weighted sum of dyadic products d_i d_i' is a positive semidefinite dispersion matrix. Complexity: O(μn² + n³).
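A minimal sketch of this rank-μ update in numpy (function and variable names are mine; the weights w_i are assumed nonnegative and summing to one, as is common):

```python
import numpy as np

def rank_mu_update(C, X_sel, m, sigma, eta, weights):
    """C(t+1) = (1 - eta) C(t) + eta * sum_i w_i d_i d_i',
    with d_i = (x_{i:lambda} - m) / sigma for the mu selected parents
    (rows of X_sel, already sorted by fitness)."""
    D = (X_sel - m) / sigma                  # shape (mu, n); rows are the d_i
    rank_mu = (weights[:, None] * D).T @ D   # sum_i w_i d_i d_i', an (n, n) matrix
    return (1.0 - eta) * C + eta * rank_mu
```

The matrix product costs O(μn²); the additional O(n³) stated above comes from decomposing C in order to sample from N(0, C).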

CMA-ES

Variant: with m again the mean of all selected parents, maintain an "evolution path"

p(t+1) = (1 − c) p(t) + (c (2 − c) μ_eff)^(1/2) (m(t) − m(t−1)) / σ(t)

with p(0) = 0 and c ∈ (0,1), and update the covariance matrix by the rank-one rule

C(t+1) = (1 − η) C(t) + η p(t) (p(t))'

with C(0) = I_n. Complexity: O(n²) per update; the Cholesky decomposition of C(t) still costs O(n³).
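A corresponding sketch of the evolution-path variant (again my own naming; c, η and μ_eff are taken as given parameters):

```python
import numpy as np

def path_update(p, m_new, m_old, sigma, c, mu_eff):
    """p(t+1) = (1 - c) p(t) + sqrt(c (2 - c) mu_eff) (m(t) - m(t-1)) / sigma(t)."""
    return (1.0 - c) * p + np.sqrt(c * (2.0 - c) * mu_eff) * (m_new - m_old) / sigma

def rank_one_update(C, p, eta):
    """C(t+1) = (1 - eta) C(t) + eta p p', an O(n^2) operation."""
    return (1.0 - eta) * C + eta * np.outer(p, p)
```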

CMA-ES

State of the art: CMA-ES (currently in many variants) → successful applications in practice.

Implementations in C, C++, Java, Fortran, Python, Matlab, R, and Scilab are available on the WWW:
http://www.lri.fr/~hansen/cmaes_inmatlab.html
http://shark-project.sourceforge.net/ (EAlib, C++)
…
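For Python, the pycma package by Hansen et al. (`pip install cma`) is one widely used implementation. A minimal ask/tell loop on its built-in Rosenbrock test function might look like this (a sketch based on pycma's documented interface; verify the details against its documentation):

```python
import cma  # pip install cma

# Start from x0 = (0, ..., 0) in 8 dimensions with initial step size sigma0 = 0.5.
es = cma.CMAEvolutionStrategy(8 * [0.0], 0.5)
while not es.stop():
    xs = es.ask()                                # sample lambda offspring
    es.tell(xs, [cma.ff.rosen(x) for x in xs])   # rank by fitness, adapt C and sigma
print(es.result.xbest)                           # best solution found
```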