Estimation of Distribution Algorithm based on Markov Random Fields
Siddhartha Shakya
School of Computing, The Robert Gordon University

Outline
From GAs to EDAs
Probabilistic Graphical Models in EDAs
– Bayesian networks
– Markov Random Fields
Fitness modelling approach to estimating and sampling an MRF in EDA
– Gibbs distribution, energy function and modelling the fitness
– Estimating parameters (fitness modelling approach)
– Sampling the MRF (several different approaches)
Conclusion

Genetic Algorithms (GAs)
Population-based optimisation technique
Based on Darwin's theory of evolution
A solution is encoded as a string of symbols known as a chromosome
A population of solutions is generated
Genetic operators are then applied to the population to produce the next generation, which replaces the parent population
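A minimal sketch of the generational loop just described, assuming binary chromosomes, tournament selection, one-point crossover and bit-flip mutation (all illustrative choices, not specific to this talk):

```python
import random

def simple_ga(fitness, n, pop_size=50, generations=100, p_mut=0.01):
    """Minimal generational GA over binary chromosomes (illustrative sketch)."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            # Tournament selection of two parents
            p1 = max(random.sample(pop, 2), key=fitness)
            p2 = max(random.sample(pop, 2), key=fitness)
            # One-point crossover
            cut = random.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation
            child = [1 - g if random.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop  # offspring replace the parent population
    return max(pop, key=fitness)

# Example: OneMax, the benchmark used later in the talk
best = simple_ga(fitness=sum, n=20)
```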

Simple GA simulation

GA to EDA

Simple EDA simulation

Joint Probability Distribution (JPD)
A solution is treated as a set of random variables
The joint probability distribution (JPD) assigns a probability to every complete assignment of those variables
Its size is exponential in the number of variables, so it is not feasible to calculate in most cases
It needs simplification!

Factorisation of the JPD
Univariate model: no interactions; the simplest model
Bivariate model: pairwise interactions
Multivariate model: interactions among more than two variables
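The three model classes correspond to the following factorisations (reconstructed here from the standard EDA literature; the slide's original formula images are not in the transcript):

```latex
\underbrace{p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i)}_{\text{univariate}}
\qquad
\underbrace{p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid x_{j(i)})}_{\text{bivariate (chain/tree)}}
\qquad
\underbrace{p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid \mathbf{pa}_i)}_{\text{multivariate}}
```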

Typical estimation and sampling of the JPD in EDAs
Learn the interactions between the variables in the solution
Learn the probabilities associated with the interacting variables
This specifies the JPD: p(x)
Sample the JPD (i.e. the learned probabilities)

Probabilistic Graphical Models
An efficient tool to represent the factorisation of the JPD
A marriage between probability theory and graph theory
Consist of two components:
– Structure
– Parameters
Two types of PGM:
– Directed PGMs (Bayesian networks)
– Undirected PGMs (Markov Random Fields)

Directed PGMs (Bayesian networks)
Structure: a Directed Acyclic Graph (DAG)
Independence relationship: a variable is conditionally independent of the rest of the variables given its parents
Parameters: conditional probabilities
(Figure: an example DAG over variables X1–X5)

Bayesian networks
The factorisation of the JPD is encoded in terms of conditional probabilities: the JPD of a BN is the product of the conditional probability of each variable given its parents
(Figure: an example DAG over variables X1–X5)
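In symbols (reconstructed; the factorisation on the right is a hypothetical example for a five-variable DAG, since the slide's graph is not recoverable from the transcript):

```latex
p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid \mathbf{pa}_i)
\quad\text{e.g.}\quad
p(\mathbf{x}) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1)\, p(x_4 \mid x_2, x_3)\, p(x_5 \mid x_4)
```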

Estimating a Bayesian network
Estimate the structure
Estimate the parameters
This completely specifies the JPD
The JPD can then be sampled
(Figure: an example DAG over variables X1–X5)

BN-based EDAs
1. Initialise parent solutions
2. Select a set from the parent solutions
3. Estimate a BN from the selected set
   a. Estimate the structure
   b. Estimate the parameters
4. Sample the BN to generate the new population
5. Replace the parents with the new set and go to 2 until the termination criterion is satisfied
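The loop above as a sketch. To keep it short, step 3 is collapsed to the simplest possible "network" (no edges, i.e. univariate marginals, as in UMDA); a full BN-based EDA would estimate structure and conditional probability tables here instead:

```python
import random

def bn_eda_skeleton(fitness, n, pop_size=100, sel_size=50, generations=50):
    """Skeleton of a model-building EDA (model details deliberately simplified)."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Select the fittest subset of the parent population (truncation selection)
        selected = sorted(pop, key=fitness, reverse=True)[:sel_size]
        # 3. Estimate the model; here just the univariate marginal of each bit
        probs = [sum(x[i] for x in selected) / sel_size for i in range(n)]
        # 4. Sample the model to generate the new population
        pop = [[1 if random.random() < probs[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
        # 5. The new population replaces the parents
    return max(pop, key=fitness)
```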

How to estimate and sample a BN in EDAs
Estimating the structure:
– Score + search techniques
– Conditional independence tests
Estimating the parameters:
– Trivial in EDAs, since the dataset is complete
– Estimate the probabilities of parents before children
Sampling:
– Probabilistic Logic Sampling (sample parents before children)
(Figure: an example DAG over variables X1–X5)
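A small sketch of Probabilistic Logic Sampling, i.e. ancestral sampling in a topological order so every variable is sampled after its parents. The `order`/`cpts` data structures are hypothetical representations chosen for this sketch:

```python
import random

def probabilistic_logic_sampling(order, cpts):
    """Sample one complete assignment by visiting nodes parents-first.

    `order`: a topological ordering of the node names.
    `cpts`: node -> (parent_list, function mapping parent values to P(node = 1)).
    """
    sample = {}
    for node in order:
        parents, p_one = cpts[node]
        p = p_one(tuple(sample[q] for q in parents))
        sample[node] = 1 if random.random() < p else 0
    return sample

# Hypothetical two-node network X1 -> X2
cpts = {
    "X1": ([], lambda _: 0.7),
    "X2": (["X1"], lambda pa: 0.9 if pa[0] == 1 else 0.2),
}
s = probabilistic_logic_sampling(["X1", "X2"], cpts)
```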

BN-based EDAs
A well-established approach in EDAs: BOA, EBNA, LFDA, MIMIC, COMIT, BMDA
References:
– Larrañaga and Lozano 2002
– Pelikan 2002

Markov Random Fields (MRFs)
Structure: an undirected graph
Local independence: a variable is conditionally independent of the rest of the variables given its neighbours
Global independence: two sets of variables are conditionally independent of each other if there is a third set that separates them
Parameters: potential functions defined on the cliques
(Figure: an example undirected graph over variables X1–X6)

Markov Random Field
The factorisation of the JPD is encoded in terms of potential functions over the maximal cliques: the JPD of an MRF is the normalised product of the clique potentials
(Figure: an example undirected graph over variables X1–X6)
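In symbols (the standard form, reconstructed since the slide's equation image is not in the transcript), where C ranges over the maximal cliques and Z is the partition function:

```latex
p(\mathbf{x}) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C),
\qquad
Z = \sum_{\mathbf{x}} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C)
```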

Estimating a Markov Random Field
Estimate the structure from data
Estimate the parameters
– Requires the potential functions to be numerically defined
This completely specifies the JPD
The JPD can then be sampled
– There is no specific sampling order (it is not a DAG), which makes sampling somewhat problematic
(Figure: an example undirected graph over variables X1–X6)

MRFs in EDA
Have recently been proposed as an estimation of distribution technique in EDAs
– Shakya et al. 2004, 2005
– Santana et al. 2003, 2005

MRF-based EDA
1. Initialise parent solutions
2. Select a set from the parent solutions
3. Estimate an MRF from the selected set
   a. Estimate the structure
   b. Estimate the parameters
4. Sample the MRF to generate the new population
5. Replace the parents with the new solutions and go to 2 until the termination criterion is satisfied

How to estimate and sample an MRF in EDA
Learning the structure:
– Conditional independence tests (MN-EDA, MN-FDA)
– Linkage detection algorithm (LDFA)
Learning the parameters:
– Junction tree approach (FDA)
– Junction graph approach (MN-FDA)
– Kikuchi approximation approach (MN-EDA)
– Fitness modelling approach (DEUM)
Sampling:
– Probabilistic Logic Sampling (FDA, MN-FDA)
– Probability vector approach (DEUMpv)
– Direct sampling of the Gibbs distribution (DEUMd)
– Metropolis sampler (Is-DEUMm)
– Gibbs sampler (Is-DEUMg, MN-EDA)

Fitness modelling approach
Hammersley–Clifford theorem: the JPD of any MRF follows a Gibbs distribution (a)
The energy of the Gibbs distribution is defined in terms of the potential functions over the cliques
Assume that the probability of a solution is proportional to its fitness (b)
From (a) and (b), a model of the fitness function – the MRF fitness model (MFM) – is derived
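Reconstructed from the DEUM papers (the slide's equation images are not in the transcript; constant and temperature conventions vary slightly between papers):

```latex
\text{(a)}\quad
p(\mathbf{x}) = \frac{e^{-U(\mathbf{x})/T}}{\sum_{\mathbf{y}} e^{-U(\mathbf{y})/T}},
\qquad U(\mathbf{x}) = \sum_{C \in \mathcal{C}} V_C(\mathbf{x}_C)
\qquad\qquad
\text{(b)}\quad
p(\mathbf{x}) = \frac{f(\mathbf{x})}{\sum_{\mathbf{y}} f(\mathbf{y})}
```

Equating (a) and (b) and taking logarithms gives the MRF fitness model:

```latex
-\ln f(\mathbf{x}) = \frac{U(\mathbf{x})}{T} + \mathrm{const}
\;\;\Longrightarrow\;\;
U(\mathbf{x}) \propto -\ln f(\mathbf{x})
```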

MRF fitness model (MFM)
Properties:
– Completely specifies the JPD for the MRF
– Negative relationship between fitness and energy, i.e. minimising energy = maximising fitness
Tasks:
– Need to find a structure for the MRF
– Need to numerically define the clique potential functions

MRF fitness model (MFM)
Let us start with the simplest model, the univariate model – this eliminates structure learning :)
For the univariate model there are n singleton cliques
Assign a potential function to each singleton clique
The corresponding MFM, and its form as a Gibbs distribution, follow directly
(Figure: variables X1–X6 as singleton cliques)
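Following the DEUM papers, with the bipolar encoding of bits as -1/+1 (reconstructed; the exact constant and temperature conventions vary between papers):

```latex
U(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i x_i, \qquad x_i \in \{-1, +1\}
\qquad\Longrightarrow\qquad
p(\mathbf{x}) = \frac{e^{-\sum_i \alpha_i x_i / T}}{Z},
\qquad
-\ln f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i x_i
```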

Estimating MRF parameters using the MFM
Each chromosome gives us one linear equation
Applying it to a set of selected solutions gives us a system of linear equations
Solving this system gives us an approximation to the MRF parameters
Knowing the MRF parameters completely specifies the JPD
The next step is to sample the JPD
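A sketch of this estimation step for the univariate MFM, solved by SVD-backed least squares. The epsilon guard and the 0/1-to-bipolar conversion are illustrative choices (some DEUM variants also include a constant term):

```python
import numpy as np

def fit_univariate_mfm(population, fitnesses, eps=1e-9):
    """Fit the univariate MFM parameters alpha by least squares.

    Each selected solution x (encoded as -1/+1) contributes one equation
    sum_i alpha_i * x_i = -ln f(x); N solutions give an N x n linear system
    A @ alpha = b, solved here via numpy's SVD-based lstsq.
    """
    A = np.where(np.asarray(population) > 0, 1.0, -1.0)    # 0/1 -> -1/+1
    b = -np.log(np.asarray(fitnesses, dtype=float) + eps)  # guard f(x) = 0
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alpha

# Example: 4-bit OneMax, fitness = number of ones
pop = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
alpha = fit_univariate_mfm(pop, [sum(x) for x in pop])
```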

General DEUM framework
Distribution Estimation Using MRF algorithm (DEUM)
1. Initialise parent population P
2. Select a set D from P (one can even use D = P!)
3. Build an MFM, fit it to D, and estimate the MRF parameters
4. Sample the MRF to generate the new population
5. Replace P with the new population and go to 2 until the termination criterion is satisfied

How to sample the MRF
Probability vector approach
Direct sampling of the Gibbs distribution
Metropolis sampling
Gibbs sampling

Probability vector approach to sampling the MRF
Minimise U(x) in order to maximise f(x)
To minimise U(x), each term α_i x_i should be minimal
This suggests: if α_i is negative then the corresponding x_i should be positive (and vice versa)
We could obtain an optimal chromosome for the current population just by looking at the signs of the α_i
However, the current population does not always contain enough information to generate the optimum
Instead, we use the sign of each α_i to update a vector of probabilities

DEUM with probability vector (DEUMpv)

Updating rule
Uses the sign of each MRF parameter to direct the search towards the value of the respective variable that minimises the energy U(x)
A learning rate controls convergence
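A PBIL-style sketch of such an update; the exact rule in the DEUMpv paper may differ in detail:

```python
def update_probability_vector(p, alpha, lam=0.1):
    """Shift each p_i towards the value of x_i that lowers the energy.

    A negative alpha_i makes x_i = +1 (bit 1) energy-decreasing, so p_i is
    pulled towards 1; a positive alpha_i pulls it towards 0. The learning
    rate `lam` controls how fast the vector converges.
    """
    return [(1 - lam) * pi + lam * (1.0 if a < 0 else 0.0)
            for pi, a in zip(p, alpha)]
```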

Simulation of DEUMpv

Results: OneMax problem

Results: F6 function optimisation

Results: Trap-5 function (a deceptive problem) – no solution found

Sampling the MRF
Probability vector approach
Direct sampling of the Gibbs distribution
Metropolis sampling
Gibbs sampling

Direct sampling of the Gibbs distribution
In the probability vector approach, only the signs of the MRF parameters were used
However, one can sample directly from the Gibbs distribution and make use of the values of the MRF parameters
A temperature coefficient can also be used to manipulate the probabilities

Direct sampling of the Gibbs distribution

Direct sampling of the Gibbs distribution
The temperature coefficient plays an important role
Decreasing T cools each probability towards either 1 or 0, depending on the sign and value of the corresponding α
This forms the basis for DEUM based on direct sampling of the Gibbs distribution (DEUMd)
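With the bipolar encoding above, the univariate Gibbs distribution gives each marginal in closed form (reconstructed; conventions for the inverse temperature vary):

```latex
p(x_i = +1) = \frac{e^{-\alpha_i/T}}{e^{-\alpha_i/T} + e^{+\alpha_i/T}}
            = \frac{1}{1 + e^{2\alpha_i/T}}
\;\xrightarrow{\;T \to 0\;}\;
\begin{cases} 1 & \alpha_i < 0 \\ 0 & \alpha_i > 0 \end{cases}
```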

DEUM with direct sampling (DEUMd)
1. Generate an initial population P of size M
2. Select the N fittest solutions, N ≤ M
3. Calculate the MRF parameters
4. Generate M new solutions by sampling the univariate distribution
5. Replace P with the new population and go to 2 until the termination criterion is satisfied
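Putting the pieces together, a sketch of DEUMd that reuses fit_univariate_mfm() from the earlier sketch; the population sizes, temperature schedule and stopping rule are illustrative choices:

```python
import math
import random

def deumd(fitness, n, pop_size=100, sel_size=50, generations=100,
          temp=2.0, cooling=0.95):
    """Sketch of DEUMd: fit a univariate MFM, sample its Gibbs marginals."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        selected = sorted(pop, key=fitness, reverse=True)[:sel_size]
        alpha = fit_univariate_mfm(selected, [fitness(x) for x in selected])
        # Marginal of each bit: p(x_i = 1) = 1 / (1 + exp(2 * alpha_i / T)),
        # with the exponent clamped to avoid overflow as T -> 0
        zs = [max(-60.0, min(60.0, 2.0 * a / temp)) for a in alpha]
        probs = [1.0 / (1.0 + math.exp(z)) for z in zs]
        pop = [[1 if random.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        temp *= cooling  # cooling pushes the marginals towards 0 or 1
    return max(pop, key=fitness)
```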

DEUMd simulation

Experimental results: OneMax problem

F6 function optimisation

Plateau problem (n = 180)

Checker Board problem (n = 100)

Trap function of order 5 (n = 60)

Experimental results (mean ± standard deviation; mean values missing below were lost in transcription)

Problem          Metric        GA               UMDA             PBIL             DEUMd
Checker Board    Fitness       ± (4.39)         ± (9.2)          ± (8.7)          ± (5.17)
                 Evaluations   ± ( )            ± (9127)         ± ( )            ± ( )
Equal-Products   Fitness       ± ( )            5.03 ± (18.29)   9.35 ± (43.36)   2.14 ± (6.56)
                 Evaluations   ± (0)            ± (0)            ± (0)            ± (0)
Colville         Fitness       0.61 ± (1.02)    ± (102.26)       2.69 ± (2.54)    0.61 ± (0.77)
                 Evaluations   ± (0)            ± ( )            ± (0)            ± (0)
Six Peaks        Fitness       99.1 ± (9)       ± (3.37)         ± (1.06)         100 ± (0)
                 Evaluations   ± (4940)         ± ( )            ± ( )            ± ( )

Analysis of results
For univariate problems (OneMax), given a population size of 1.5n, P = D and T → 0, the solution was found in a single generation
For problems with low-order dependencies between variables (Plateau and Checker Board), performance was significantly better than that of other univariate EDAs
For deceptive problems with higher-order dependencies (Trap function and Six Peaks), DEUMd was deceived, but by slowing the cooling rate it was able to find the solution for Trap of order 5
For problems where the optimum was not known, performance was comparable to that of GAs and other EDAs, and better in some cases

Cost–benefit analysis (the cost)
Polynomial cost of estimating the distribution, compared to the linear cost of other univariate EDAs:
– Cost of computing the univariate marginal frequencies (other univariate EDAs): linear
– Cost of computing the SVD to solve the system of equations (DEUMd): polynomial

Cost–benefit analysis (the benefit)
DEUMd can significantly reduce the number of fitness evaluations
The quality of solutions was better for DEUMd than for the other compared EDAs
DEUMd should be tried on problems where the increased solution quality outweighs the computational cost

Sampling the MRF
Probability vector approach
Direct sampling of the Gibbs distribution
Metropolis sampling
Gibbs sampling

Example problem: 2D Ising spin glass
Given the coupling constants J, find the value of each spin that minimises the energy H
An MRF fitness model is built for this problem
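The standard 2D Ising spin glass energy (reconstructed; sign conventions vary), where ⟨i, j⟩ ranges over neighbouring lattice sites:

```latex
H(\mathbf{s}) = -\sum_{\langle i, j \rangle} J_{ij}\, s_i s_j,
\qquad s_i \in \{-1, +1\}
```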

Metropolis sampler

Difference in energy
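A sketch of the Metropolis sampler over binary strings: propose a single-bit flip and accept it with probability min(1, exp(-ΔU/T)). For clarity the energy is recomputed from scratch here; in DEUM the difference in energy can be computed incrementally from the MFM parameters:

```python
import math
import random

def metropolis_sample(x, energy, temp, steps):
    """Metropolis sampler over a binary string (illustrative sketch)."""
    u = energy(x)
    for _ in range(steps):
        i = random.randrange(len(x))
        x[i] = 1 - x[i]                      # propose a single-bit flip
        du = energy(x) - u
        if du <= 0 or random.random() < math.exp(-du / temp):
            u += du                          # accept the move
        else:
            x[i] = 1 - x[i]                  # reject: undo the flip
    return x
```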

DEUM with Metropolis sampler

Results

Sampling the MRF
Probability vector approach
Direct sampling of the Gibbs distribution
Metropolis sampling
Gibbs sampling

Conditionals from the Gibbs distribution
For the 2D Ising spin glass problem, the conditional distribution of each spin given its neighbours follows from the Gibbs distribution:
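In closed form (reconstructed from the Hamiltonian above), where N(i) is the set of lattice neighbours of site i:

```latex
p\left(s_i = +1 \mid \mathbf{s}_{N(i)}\right)
= \frac{1}{1 + \exp\!\left(-\frac{2}{T} \sum_{j \in N(i)} J_{ij}\, s_j\right)}
```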

Gibbs sampler
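A sketch of one Gibbs-sampling sweep using the conditional above; the `s`, `J` and `neighbours` data structures are hypothetical representations chosen for this sketch:

```python
import math
import random

def gibbs_sweep(s, J, neighbours, temp):
    """One Gibbs sweep over a spin configuration (illustrative sketch).

    `s`: site -> spin in {-1, +1}; `neighbours[i]`: the lattice neighbours
    of site i; `J[(i, j)]`: coupling constant, stored once with i < j.
    """
    for i in s:
        field = sum(J[tuple(sorted((i, j)))] * s[j] for j in neighbours[i])
        p_up = 1.0 / (1.0 + math.exp(-2.0 * field / temp))
        s[i] = +1 if random.random() < p_up else -1
    return s
```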

DEUM with Gibbs sampler

Results

Summary
From GA to EDA
The PGM approach to modelling and sampling the distribution in EDAs
DEUM: an MRF approach to modelling and sampling
Learning the structure: no structure learning so far (fixed models are used)
Learning the parameters: fitness modelling approach
Sampling the MRF:
– Probability vector approach
– Direct sampling of the Gibbs distribution
– Metropolis sampler
– Gibbs sampler
The results are encouraging, and there is a lot more to explore