Optimization by Model Fitting Chapter 9 Luke, Essentials of Metaheuristics, 2011 Byung-Hyun Ha R1.


1 Outline
- Introduction
- Model fitting by classification
- Model fitting with a distribution
- Summary

2 Introduction
- Exploring and/or exploiting the solution space
  - Construction or composition
  - Tweak or mutation
  - Recombination or crossover
  - ... other ways?
- From the perspective of statistics
  - Population and sampling
    - e.g., the set of all students (population) and a sample of students whose height is examined
  - Tweaking in search (metaheuristics)
    - Sampling the space of candidate solutions to select high-quality ones
- An alternative to selecting and Tweaking: using a (statistical) model
  - Classification model
    - Graduate students ;-), decision trees, neural networks, ...
  - Probability distribution

3 Introduction
- Example: T-problem with 5 jobs
  - Training by sampling 15 solutions from the population of 120, then asking: what is the quality of a given solution (shown in the figure)?
  - How? By classification or by using a probability distribution
[Figure: the solution space as the population, with each solution annotated by its objective value in parentheses, e.g., (23), (15), (12); sampling yields a sample of 15 solutions as representatives of the population. Is there something we can do with it?]
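The population-versus-sample idea above can be sketched as follows. This is only an illustration: it assumes the T-problem is a single-machine total-tardiness problem (the slide does not define it), and the processing times and due dates are made up, so the objective values will not match the figure.

```python
import itertools
import random

# Hypothetical job data (not from the slides): processing time and due date per job.
PROCESSING = [3, 2, 4, 1, 5]
DUE = [4, 6, 8, 3, 12]

def total_tardiness(order):
    """Sum of max(0, completion time - due date) over jobs in the given order."""
    time, tardiness = 0, 0
    for job in order:
        time += PROCESSING[job]
        tardiness += max(0, time - DUE[job])
    return tardiness

# The whole solution space: every ordering of 5 jobs, 5! = 120 solutions.
population = list(itertools.permutations(range(5)))

# A sample of 15 solutions stands in for the population.
rng = random.Random(0)
sample = rng.sample(population, 15)

# Training set: (solution, objective value) pairs a model could be fitted to.
training_set = [(order, total_tardiness(order)) for order in sample]
```

The model-fitting question is then whether the 15 pairs in `training_set` say anything useful about the 105 solutions that were not sampled.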

4 Model Fitting by Classification
- Classification problem
  - Given a collection of records, find a model for the class attribute as a function of the values of the other attributes
  - Fitting a model, also called model induction or machine learning
[Figure: a training set of solutions with objective values in parentheses -> induction -> a classification model. Query or test: "Is this a good solution?" Generation: "Give me a good solution!"]

5 Model Fitting by Classification
- Examples of classification algorithms
  - Graduate students, by the nagging of professors ;-)
  - Decision trees, by C4.5 and ID3
  - k-nearest-neighbor (kNN), by the kNN algorithm
  - Neural networks, by the backpropagation algorithm
[Figure: records (training set) and a decision tree for classification (or prediction)]

6 Model Fitting by Classification
- Classification problem (revisited)
  - Given a collection of records, find a model for the class attribute as a function of the values of the other attributes
- Application of classification to search
  - Given a collection of solutions, find a model for fitness as a function of the values of the components of solutions
- Generating children from the model
  - Rejection sampling with discriminative models (Algorithms 115 and 117)
  - Region-based sampling with generative models (Algorithm 116)
- Learnable Evolution Model (LEM)
  - Algorithm 114
[Figure: a classification model used either as a discriminative model ("Is this a good solution?", rejection sampling) or as a generative model ("Give me a good solution!", region-based sampling)]
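A minimal sketch of rejection sampling with a discriminative model, not a transcription of Algorithm 115 or 117. The kNN vote, the unit-square solution space, and the rule "good if x + y > 1" used to label the training set are all assumptions made for illustration.

```python
import random

def knn_is_good(candidate, labeled, k=3):
    """Discriminative model: majority vote of the k nearest labeled solutions."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(labeled, key=lambda rec: sq_dist(candidate, rec[0]))[:k]
    good_votes = sum(1 for _, is_good in nearest if is_good)
    return good_votes * 2 > len(nearest)

def rejection_sample(labeled, n_children, rng, max_tries=10_000):
    """Draw random candidates in [0, 1]^2 and keep those the model calls good."""
    children = []
    for _ in range(max_tries):
        if len(children) == n_children:
            break
        candidate = (rng.random(), rng.random())
        if knn_is_good(candidate, labeled):
            children.append(candidate)
    return children

# Hypothetical training set: points in [0, 1]^2, labeled good if x + y > 1.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]
labeled = [(p, p[0] + p[1] > 1.0) for p in points]

children = rejection_sample(labeled, n_children=20, rng=rng)
```

The model only answers "is this good?", so candidates have to be generated blindly and filtered. That is the cost of a discriminative model, and it grows as the good region shrinks.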

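7 Model Fitting by Classification
- Examples
  - Inducing a decision tree
  - Generating children from a decision tree
[Figure: good and bad points in the x-y plane and a decision tree induced from them, with leaves labeled good or bad; the transcript retains the values y 0.7, x 0.3, y 0.6, x 0.5, y 0.4, x 0.7]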
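Generating children from a decision tree can be sketched as region-based sampling over the tree's "good" leaves. The boxes below are hypothetical and are not the tree from the figure. Weighting regions by area is also an assumption, not a claim about Algorithm 116.

```python
import random

# Hypothetical "good" leaves of a decision tree over x, y in [0, 1],
# each an axis-aligned box ((x_lo, x_hi), (y_lo, y_hi)). Not the slide's tree.
GOOD_REGIONS = [
    ((0.0, 0.3), (0.7, 1.0)),
    ((0.5, 0.7), (0.0, 0.4)),
]

def area(region):
    """Area of an axis-aligned box."""
    (x_lo, x_hi), (y_lo, y_hi) = region
    return (x_hi - x_lo) * (y_hi - y_lo)

def sample_child(regions, rng):
    """Pick a good region with probability proportional to its area,
    then draw a point uniformly inside it."""
    region = rng.choices(regions, weights=[area(r) for r in regions], k=1)[0]
    return tuple(rng.uniform(lo, hi) for lo, hi in region)

def in_region(point, region):
    """True if the point lies inside the box."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(point, region))

rng = random.Random(2)
children = [sample_child(GOOD_REGIONS, rng) for _ in range(10)]
```

Unlike rejection sampling, every draw lands in a region the model considers good, so no candidate is wasted.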

8 Model Fitting by Classification
- Example (cont'd)
  - A model that specifies the probability
[Figure: the x-y plane and a decision tree over x and y with leaves labeled good or bad]

9 Model Fitting by Classification
- Example (Talbi, 2009)
  - Application of a rule-based classifier to the crossover operator
  - Rules
    - If X4 = 5 and X5 < 2, then class = best
    - ...
  - Patterns matching the rules
    - e.g., * * * 5 1 ... (assuming the symbols lost in the transcript are wildcards)
  - Possible crossover?

10 Model Fitting with a Distribution
- An alternative form of model
  - A distribution over an infinite-sized population
    - A set of candidate solutions: a sample from the population
  - Working with the sample distribution
- Estimation of Distribution Algorithm (EDA)
  - Representing the distribution of an infinite population with a number of samples
  - Loop: sample a set of individuals -> assess them -> adjust the distribution to reflect the new fitness results
  - Algorithm 118: An Abstract Estimation of Distribution Algorithm (EDA)
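The sample, assess, adjust loop can be written generically. The sketch below follows the loop as described on the slide, not the book's Algorithm 118. The function names, the toy one-gene distribution, and the shrink factor of 0.9 are assumptions.

```python
import random
from typing import Callable, List, Tuple, TypeVar

D = TypeVar("D")  # distribution
S = TypeVar("S")  # solution

def abstract_eda(
    dist: D,
    sample: Callable[[D, random.Random], S],
    fitness: Callable[[S], float],
    update: Callable[[D, List[Tuple[S, float]]], D],
    pop_size: int,
    generations: int,
    rng: random.Random,
) -> Tuple[D, S, float]:
    """Repeat: sample individuals, assess them, adjust the distribution."""
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        scored = [(s, fitness(s)) for s in (sample(dist, rng) for _ in range(pop_size))]
        for s, f in scored:
            if f > best_fit:
                best, best_fit = s, f
        dist = update(dist, scored)
    return dist, best, best_fit

# Toy instantiation (hypothetical): the distribution is (center, width) over one real gene.
def sample_uniform(dist, rng):
    center, width = dist
    return rng.uniform(center - width, center + width)

def recenter(dist, scored):
    """Move the center to the best sample and shrink the width slightly."""
    best_x = max(scored, key=lambda sf: sf[1])[0]
    return (best_x, dist[1] * 0.9)

final_dist, best, best_fit = abstract_eda(
    (0.0, 5.0), sample_uniform, lambda x: -(x - 3.0) ** 2, recenter,
    pop_size=20, generations=30, rng=random.Random(3))
```

Everything that distinguishes one EDA from another sits in `sample` and `update`. The loop itself does not change.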

11 Model Fitting with a Distribution
- Representing distributions for a genotype with n genes
  - Using an n-dimensional histogram
    - A fairly high-resolution grid is needed to represent the distribution accurately (c.f., kd-tree or quadtree)
    - A very large number of grid points: a^n when the distribution of each gene is discretized into a pieces
  - Using a parametric distribution
    - e.g., m Gaussian curves
    - How many Gaussian curves?
    - An n-dimensional Gaussian: a mean vector of size n and a covariance matrix of size n^2
    - 1,000 genes? 1,000,000 numbers

12 Model Fitting with a Distribution
- Representing distributions (cont'd)
  - Using marginal distributions
    - Projecting the full distribution onto a single dimension for each gene
    - Representing each single-gene distribution, again
      - A 1-dimensional array as a histogram
      - 1-dimensional Gaussians as a parametric representation
    - Size of the representation?
    - Problems (very big)?
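The sizes of these representations can be worked out directly. The helper names below are illustrative. The covariance matrix is counted as n^2 entries to match the slide, although a symmetric matrix has fewer distinct ones.

```python
def full_histogram_cells(n_genes, pieces_per_gene):
    """n-dimensional histogram: a^n grid cells."""
    return pieces_per_gene ** n_genes

def full_gaussian_numbers(n_genes):
    """One n-dimensional Gaussian: mean vector (n) plus covariance matrix (n^2)."""
    return n_genes + n_genes ** 2

def marginal_histogram_cells(n_genes, pieces_per_gene):
    """One 1-dimensional histogram per gene: n * a cells."""
    return n_genes * pieces_per_gene

def marginal_gaussian_numbers(n_genes):
    """One 1-dimensional Gaussian per gene: a mean and a variance each, 2n numbers."""
    return 2 * n_genes

n = 1000
covariance_entries = full_gaussian_numbers(n) - n  # the slide's 1,000,000 numbers
```

For 1,000 genes the marginal Gaussians need 2,000 numbers where the full Gaussian needs over a million. The saving comes from discarding all information about how genes relate to one another.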

13 Model Fitting with a Distribution
- Univariate Estimation of Distribution Algorithms
  - Population-Based Incremental Learning (PBIL)
    - Genes have finite discrete values
    - n marginal distributions for n genes, initially uniform
    - Representation?
    - Truncation selection of good solutions sampled from the distribution
    - Gradual update of the marginal distributions
    - Algorithm 119: Population-Based Incremental Learning
  - Univariate Marginal Distribution Algorithm (UMDA)
    - A variation on PBIL
    - Any selection procedure is allowed
    - The distribution D is entirely replaced each time around (α = 1)
    - A large sample is required (why?)
  - Compact Genetic Algorithm (cGA)
    - Genes have boolean values
    - Each marginal distribution is updated by pairwise comparison of individuals
    - c.f., modeling a finite population instead of an infinite one
    - Algorithm 120: The Compact Genetic Algorithm
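A minimal PBIL sketch following the bullet points above, not a transcription of Algorithm 119. The parameter names, the OneMax fitness (count of ones), and all numeric settings are assumptions for illustration.

```python
import random

def pbil(fitness, n_genes, n_values, pop_size, n_keep, alpha, generations, rng):
    """Population-Based Incremental Learning over genes with finite discrete values."""
    values = list(range(n_values))
    # One marginal distribution per gene, initially uniform.
    dists = [[1.0 / n_values] * n_values for _ in range(n_genes)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample each gene independently from its own marginal distribution.
        pop = [[rng.choices(values, weights=d, k=1)[0] for d in dists]
               for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) > best_fit:
            best, best_fit = pop[0], fitness(pop[0])
        selected = pop[:n_keep]  # truncation selection
        # Gradual update: blend each marginal with the selected individuals' frequencies.
        for j, d in enumerate(dists):
            for v in values:
                freq = sum(1 for ind in selected if ind[j] == v) / n_keep
                d[v] = (1 - alpha) * d[v] + alpha * freq
    return dists, best, best_fit

# Toy run (hypothetical settings): maximize the number of ones in 20 boolean genes.
dists, best, best_fit = pbil(sum, n_genes=20, n_values=2, pop_size=50,
                             n_keep=10, alpha=0.2, generations=60,
                             rng=random.Random(4))
```

Setting `alpha=1` replaces each marginal outright with the frequencies from the selected individuals, which is the UMDA-style update noted on the slide. With no memory of earlier generations the estimate rests on one sample alone, which suggests why a large sample is then needed.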

14 Model Fitting with a Distribution
- Univariate Estimation of Distribution Algorithms (cont'd)
  - Real-valued representations
    - By discretization of each marginal distribution
      - Histogram approach
      - Using PBIL directly
    - By a parametric approach
      - e.g., using a single Gaussian
      - Unbiased estimators of mean and variance for parameter estimation
      - Updating each marginal distribution by linear combination
      - Using multiple distributions (c.f., Expectation Maximization (EM) algorithm)
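The parametric approach with a single Gaussian per gene can be sketched as below. The function name, the initial mean and variance, and the toy objective are assumptions. `statistics.variance` is the sample variance (it divides by n - 1), used here as the unbiased estimator.

```python
import random
import statistics

def gaussian_marginal_eda(fitness, n_genes, pop_size, n_keep, alpha,
                          generations, rng, init_mean=0.0, init_var=9.0):
    """Univariate EDA with one Gaussian (mean, variance) per real-valued gene."""
    means = [init_mean] * n_genes
    variances = [init_var] * n_genes
    for _ in range(generations):
        pop = [[rng.gauss(m, v ** 0.5) for m, v in zip(means, variances)]
               for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        selected = pop[:n_keep]  # truncation selection
        for j in range(n_genes):
            column = [ind[j] for ind in selected]
            new_mean = statistics.mean(column)
            new_var = statistics.variance(column)  # divides by n - 1
            # Linear combination of the old and the newly estimated parameters.
            means[j] = (1 - alpha) * means[j] + alpha * new_mean
            variances[j] = (1 - alpha) * variances[j] + alpha * new_var
    return means, variances

# Toy objective (hypothetical): maximum at (2, 2, 2).
def neg_sq_dist(x):
    return -sum((xi - 2.0) ** 2 for xi in x)

means, variances = gaussian_marginal_eda(
    neg_sq_dist, n_genes=3, pop_size=60, n_keep=15, alpha=0.5,
    generations=40, rng=random.Random(5))
```

A single Gaussian per gene can only describe one peak. Using several Gaussians per gene, fitted for example with EM as the slide mentions, is the way to model multimodal marginals.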

15 Model Fitting with a Distribution
- Multivariate Estimation of Distribution Algorithms
  - Problems in univariate estimation (using marginal distributions)
    - Assumption of no linkage between genes (c.f., cooperative coevolution)
  - An alternative
    - Using bivariate distributions: one distribution for every pair of genes
    - Using three genes per distribution, four genes per distribution, ...
  - A better way
    - Multivariate distributions for strongly linked genes, selectively
    - e.g., Bayes Network (c.f., not only about how good a solution is, but also about why it is good)
    - (Hierarchical) Bayesian Optimization Algorithm
    - Algorithm 121: An Abstract Version of the Bayesian Optimization Algorithm (BOA)
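A quick count shows why "one distribution for every pair (triple, quadruple, ...) of genes" does not scale, and why the slide suggests modeling only strongly linked genes. The helper names are illustrative.

```python
import math

def n_joint_distributions(n_genes, genes_per_distribution):
    """Number of distinct gene subsets of the given size: C(n, k)."""
    return math.comb(n_genes, genes_per_distribution)

def cells_per_distribution(values_per_gene, genes_per_distribution):
    """Histogram cells needed for one joint distribution over k discrete genes."""
    return values_per_gene ** genes_per_distribution

n = 1000
pairs = n_joint_distributions(n, 2)    # bivariate: every pair of genes
triples = n_joint_distributions(n, 3)  # three genes per distribution
```

For 1,000 genes there are 499,500 pairs and 166,167,000 triples, and each of those distributions is itself larger than a marginal.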

16 Hybrid Metaheuristics (Talbi, 2009)
- Combining with X
  - Mathematical programming approaches
    - Enumeration algorithms
    - Relaxation and decomposition methods
    - Branch and cut and price algorithms
  - Constraint programming
  - Data mining techniques
  - Multiobjective optimization
- Classical hybrid approaches
  - Low-level relay hybrids
  - Low-level teamwork hybrids
  - High-level relay hybrids
  - High-level teamwork hybrids

17 Summary
- Exploring and/or exploiting the solution space
  - From the perspective of statistics
- Model fitting by classification
  - Employing decision trees, kNN, neural networks
  - Generating children from the model
- Model fitting with a distribution
  - Estimation of Distribution Algorithm
  - Representing distributions
    - n-dimensional histogram, parametric distributions, marginal distributions
  - Univariate Estimation of Distribution Algorithms
    - Problems
  - Multivariate Estimation of Distribution Algorithms
    - Bayes Network