Search and Optimization Methods. Based in part on Chapter 8 of Hand, Mannila, & Smyth. David Madigan.

Introduction. This chapter is about finding the models and parameters that minimize a general score function S. Often we have to conduct a parameter search for each visited model. The number of possible structures can be immense: for example, there are about 3.6 × 10^13 undirected graphical models with 10 vertices.
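As a quick check (not on the original slide), the order of magnitude follows from choosing, for each pair of the 10 vertices, whether or not to include an edge:

```latex
\binom{10}{2} = 45 \ \text{possible edges},
\qquad
2^{45} \approx 3.5 \times 10^{13}
\ \text{distinct undirected graphs on 10 labelled vertices,}
```

which is of the same order as the figure quoted above.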

Greedy Search
1. Initialize. Choose an initial state M_k.
2. Iterate. Evaluate the score function at all adjacent states and move to the best one.
3. Stopping Criterion. Repeat step 2 until no further improvement can be made.
4. Multiple Restarts. Repeat steps 1-3 from different starting points and choose the best solution found.
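A minimal Python sketch of this greedy search with restarts (not from the original slides); `score`, `neighbors`, and `random_state` are placeholder callables that would have to be supplied for a particular model space:

```python
def greedy_search(score, neighbors, random_state, n_restarts=10):
    """Greedy (hill-climbing) minimization of a score function with restarts.

    score(state)      -> value of the score function S at a state
    neighbors(state)  -> iterable of states adjacent to `state`
    random_state()    -> a random initial state for each restart
    """
    best_state, best_score = None, float("inf")
    for _ in range(n_restarts):
        state = random_state()                      # 1. initialize
        current = score(state)
        while True:                                 # 2. iterate over neighbors
            candidates = [(score(s), s) for s in neighbors(state)]
            if not candidates:
                break
            cand_score, cand_state = min(candidates, key=lambda p: p[0])
            if cand_score >= current:               # 3. stop: no improvement
                break
            state, current = cand_state, cand_score
        if current < best_score:                    # 4. keep best over restarts
            best_state, best_score = state, current
    return best_state, best_score
```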

Systematic Search Heuristics Breadth-first, Depth-first, Beam-search, etc.

Parameter Optimization. Finding the parameters θ that minimize a score function S(θ) is usually equivalent to the problem of minimizing a complicated function in a high-dimensional space. Define the gradient of S as g(θ) = ∇S(θ) = (∂S/∂θ_1, …, ∂S/∂θ_d)^T. When closed-form solutions to ∇S(θ) = 0 exist, there is no need for numerical methods.

Gradient-Based Methods
1. Initialize. Choose an initial value θ = θ_0.
2. Iterate. Starting with i = 0, let θ_{i+1} = θ_i + λ_i v_i, where v_i is the direction of the next step and λ_i is the distance (step size). Generally choose v_i to be a direction that improves the score.
3. Convergence. Repeat step 2 until S appears to have reached a local minimum.
4. Multiple Restarts. Repeat steps 1-3 from different initial starting points and choose the best minimum found.

Univariate Optimization. Let g(θ) = S'(θ). Newton-Raphson proceeds as follows. Suppose g(θ_s) = 0 at the minimizing value θ_s; a first-order expansion of g about the current iterate then yields the update shown below.
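The slide's formula is not preserved in this transcript; the standard Newton-Raphson step it refers to follows from expanding g around the current iterate θ_i:

```latex
0 = g(\theta_s) \approx g(\theta_i) + (\theta_s - \theta_i)\, g'(\theta_i)
\quad\Longrightarrow\quad
\theta_{i+1} \;=\; \theta_i - \frac{g(\theta_i)}{g'(\theta_i)} .
```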

1-D Gradient Descent. Here the update is θ_{i+1} = θ_i − λ g(θ_i), with the step size λ usually chosen to be quite small. This is a special case of Newton-Raphson in which 1/g'(θ_i) is replaced by a constant λ.

Multivariate Case. Curse of dimensionality again. For example, suppose S is defined on a d-dimensional unit hypercube, and suppose we rule out the sub-cube in which all components of θ are less than 1/2 (half of each coordinate's range): if d = 1, we have eliminated half the parameter space; if d = 2, we have eliminated only 1/4 of the parameter space; if d = 20, we have eliminated only about 1/1,000,000 of the parameter space!
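The fractions quoted above are just (1/2)^d, the volume of a sub-cube whose sides span half of each coordinate:

```latex
\left(\tfrac{1}{2}\right)^{1} = \tfrac{1}{2},
\qquad
\left(\tfrac{1}{2}\right)^{2} = \tfrac{1}{4},
\qquad
\left(\tfrac{1}{2}\right)^{20} = \tfrac{1}{1{,}048{,}576} \approx \tfrac{1}{1{,}000{,}000}.
```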

Multivariate Gradient Descent. The update is θ_{i+1} = θ_i − λ g(θ_i), where g(θ) = ∇S(θ); −g(θ_i) points in the direction of steepest descent. Guaranteed to converge if λ is small enough. Essentially the same as the back-propagation method used in neural networks. The constant λ can be replaced with second-derivative information ("quasi-Newton" methods use an approximation to the Hessian).
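A minimal multivariate steepest-descent sketch in Python (not from the slides), using a finite-difference gradient so it needs only the score function itself; the fixed step size `lam`, tolerance, and example function are illustrative assumptions:

```python
import numpy as np

def numerical_gradient(S, theta, h=1e-6):
    """Central-difference approximation to the gradient of S at theta."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        grad[j] = (S(theta + e) - S(theta - e)) / (2 * h)
    return grad

def gradient_descent(S, theta0, lam=0.01, tol=1e-8, max_iter=10_000):
    """Steepest descent: theta_{i+1} = theta_i - lam * grad S(theta_i)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = numerical_gradient(S, theta)
        if np.linalg.norm(g) < tol:      # gradient (near) zero: local minimum
            break
        theta = theta - lam * g
    return theta

# Example: minimize a simple quadratic bowl; the minimum is at (1, -2).
if __name__ == "__main__":
    S = lambda t: (t[0] - 1.0) ** 2 + (t[1] + 2.0) ** 2
    print(gradient_descent(S, theta0=[0.0, 0.0]))
```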

Simplex Search Method. Evaluates d+1 points arranged in a hyper-tetrahedron (a simplex). For example, with d = 2, it evaluates S at the vertices of an equilateral triangle. Reflect the triangle in the side opposite the vertex with the highest value. Repeat until oscillation occurs, then halve the sides of the triangle. No calculation of derivatives is required.
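A rough Python sketch of this reflect-and-shrink scheme; the axis-aligned initial simplex and the shrink-when-reflection-fails rule are simplifying assumptions, not details from the slides:

```python
import numpy as np

def simplex_search(S, x0, side=1.0, tol=1e-8, max_iter=500):
    """Basic simplex search: reflect the worst vertex, shrink on failure."""
    d = len(x0)
    # initial simplex: x0 plus d points offset along each coordinate axis
    verts = [np.asarray(x0, dtype=float)]
    for i in range(d):
        v = np.array(x0, dtype=float)
        v[i] += side
        verts.append(v)
    vals = [S(v) for v in verts]

    for _ in range(max_iter):
        worst = int(np.argmax(vals))
        best = int(np.argmin(vals))
        centroid = np.mean([v for j, v in enumerate(verts) if j != worst], axis=0)
        reflected = 2.0 * centroid - verts[worst]        # reflect worst vertex
        r_val = S(reflected)
        if r_val < vals[worst]:                          # reflection helped
            verts[worst], vals[worst] = reflected, r_val
        else:                                            # oscillation: halve sides
            for j in range(len(verts)):
                if j != best:
                    verts[j] = (verts[j] + verts[best]) / 2.0
                    vals[j] = S(verts[j])
        if max(vals) - min(vals) < tol:
            break
    best = int(np.argmin(vals))
    return verts[best], vals[best]
```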

EM for a two-component Gaussian mixture. The mixture log-likelihood (below) is tricky to maximize directly…
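The slide's formula is not in the transcript; for data y_1, …, y_n, the standard two-component Gaussian mixture log-likelihood it refers to has the form (notation assumed, following the usual treatment):

```latex
\ell(\theta) \;=\; \sum_{i=1}^{n}
\log\!\Big[(1-\pi)\,\phi_{\theta_1}(y_i) \;+\; \pi\,\phi_{\theta_2}(y_i)\Big],
\qquad
\theta = (\pi,\ \mu_1,\sigma_1^2,\ \mu_2,\sigma_2^2),
```

where φ_{θ_k} is the normal density with mean μ_k and variance σ_k^2; the sum inside the logarithm is what makes direct maximization awkward.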

EM for a two-component Gaussian mixture, continued. This is the "E-step": it does a soft assignment of observations to mixture components (the responsibilities below).
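Again the formula itself is missing from the transcript; the soft assignment referred to is the usual posterior responsibility of component 2 for observation i, given current parameter estimates:

```latex
\hat{\gamma}_i \;=\;
\frac{\hat{\pi}\,\phi_{\hat{\theta}_2}(y_i)}
     {(1-\hat{\pi})\,\phi_{\hat{\theta}_1}(y_i) + \hat{\pi}\,\phi_{\hat{\theta}_2}(y_i)},
\qquad i = 1,\dots,n .
```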

EM for two-component Gaussian mixture – Algorithm
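The algorithm appears only as an image in the source; below is a compact Python sketch of the standard EM iteration for this model (the initialization and fixed iteration count are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(y, n_iter=200):
    """EM for a two-component 1-D Gaussian mixture (standard algorithm sketch)."""
    y = np.asarray(y, dtype=float)
    # crude initialization: lower and upper quartiles as starting means
    mu1, mu2 = np.percentile(y, 25), np.percentile(y, 75)
    var1 = var2 = np.var(y)
    pi = 0.5
    for _ in range(n_iter):
        # E-step: responsibilities of component 2 for each observation
        p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(var1))
        p2 = pi * norm.pdf(y, mu2, np.sqrt(var2))
        gamma = p2 / (p1 + p2)
        # M-step: weighted maximum-likelihood updates
        mu1 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
        mu2 = np.sum(gamma * y) / np.sum(gamma)
        var1 = np.sum((1 - gamma) * (y - mu1) ** 2) / np.sum(1 - gamma)
        var2 = np.sum(gamma * (y - mu2) ** 2) / np.sum(gamma)
        pi = np.mean(gamma)
    return pi, (mu1, var1), (mu2, var2)

# Example with synthetic data drawn from a known two-component mixture.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
    print(em_two_gaussians(y))
```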

EM with Missing Data. Let Q(H) denote a probability distribution for the missing data H. EM works with a functional of Q and θ that is a lower bound on the log-likelihood l(θ), as shown below.
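The bound itself is missing from the transcript; the standard form (by Jensen's inequality, with D the observed data and H the missing data, notation assumed) is:

```latex
l(\theta) \;=\; \log \sum_{H} p(H, D \mid \theta)
\;=\; \log \sum_{H} Q(H)\,\frac{p(H, D \mid \theta)}{Q(H)}
\;\ge\; \sum_{H} Q(H)\,\log \frac{p(H, D \mid \theta)}{Q(H)}
\;=:\; F(Q, \theta).
```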

EM (continued). In the E-step, the maximum of the bound over Q is achieved when Q equals the posterior distribution of the missing data. In the M-step, we need to maximize the resulting expected complete-data log-likelihood over θ, as written below.
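The two conditions, in the notation assumed above:

```latex
\text{E-step:}\quad Q(H) \;=\; p(H \mid D, \theta^{(t)}),
\qquad
\text{M-step:}\quad \theta^{(t+1)} \;=\; \arg\max_{\theta}\;
\sum_{H} p(H \mid D, \theta^{(t)})\,\log p(H, D \mid \theta).
```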

EM Normal Mixture Example