Bayesian Sampling and Ensemble Learning in Generative Topographic Mapping
Akio Utsugi, National Institute of Bioscience and Human-Technology
Neural Processing Letters, vol. 12, no. 3, pp. 277-290
Summarized by Jong-Youn Lim

Introduction
- SOM (self-organizing map): a minimal model for the formation of topology-preserving maps, and an information-processing tool for extracting a hidden smooth manifold from data.
- Drawback: no explicit statistical model of the data-generation process.
- Alternatives: the elastic net, and generative topographic mapping (GTM), which is based on a mixture of spherical Gaussian generators with a constraint on the centroids.

- Hyperparameter search in GTM can be carried out with a Gibbs sampler, which works well on small data sets but is time-consuming on large ones.
- This motivates a deterministic algorithm that produces the estimates quickly: ensemble learning, which minimizes the variational free energy of the model, an upper bound on the negative log evidence.

Generative topographic mapping (GTM)
- Two versions: the original regression version and a Gaussian-process version.
- The model consists of a spherical Gaussian mixture density and a Gaussian-process prior on the centroids.
- The spherical Gaussian mixture density is sketched below.
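As a sketch of the density the slide refers to, written in generic notation (the symbols m, w_j, beta, and d are assumptions here, not necessarily the paper's), the spherical Gaussian mixture over a data point x is:

```latex
% Spherical Gaussian mixture density of GTM (generic form; notation assumed).
\[
f(x \mid W, \beta)
  = \frac{1}{m} \sum_{j=1}^{m}
    \left(\frac{\beta}{2\pi}\right)^{d/2}
    \exp\!\left(-\frac{\beta}{2}\,\lVert x - w_j \rVert^{2}\right).
\]
```

In the regression version the centroids w_j are constrained through a linear model on fixed basis functions of a latent grid, while in the Gaussian-process version the smoothness constraint is expressed directly as a Gaussian-process prior on the centroids.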

- W has a Gaussian prior, from which Bayesian inference of W follows.
- Inference of the hyperparameters h is based on their evidence f(X|h); the maximizer of the evidence is called the generalized maximum likelihood (GML) estimate of h (see the sketch below).
- The approximations used in the original hyperparameter-search algorithm are valid only when data are abundant.
- The hyperparameter search is therefore improved using a Gibbs sampler.
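In the same generic notation, the evidence that drives the hyperparameter search integrates the parameters out, and the GML estimate maximizes it:

```latex
\[
f(X \mid h) = \int f(X \mid W, h)\, f(W \mid h)\, dW,
\qquad
\hat{h}_{\mathrm{GML}} = \arg\max_{h}\, f(X \mid h).
\]
```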

Gibbs sampler in GTM
- Any moment of the posteriors can be obtained precisely as an average over a long sample series.
- The Gibbs sampler is an MCMC method that does not require the design of a trial (proposal) distribution.
- It works with the conditional posteriors on Y and W.
- Conditional posterior on Y: each inner unit is selected according to p, the posterior selection probabilities of the inner units (sketched below).
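The transcript omits the expression for p; under the spherical Gaussian mixture sketched above, the posterior selection probability of inner unit j for data point x_i has the usual softmax form (a sketch, not necessarily the paper's exact expression):

```latex
\[
p_{ij} \;=\; \Pr(y_i = j \mid x_i, W, \beta)
  \;=\; \frac{\exp\!\left(-\tfrac{\beta}{2}\,\lVert x_i - w_j \rVert^{2}\right)}
             {\sum_{k=1}^{m} \exp\!\left(-\tfrac{\beta}{2}\,\lVert x_i - w_k \rVert^{2}\right)},
\]
```

and the conditional posterior on Y draws each y_i independently from this categorical distribution.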

- The conditional posterior on W is obtained by normalizing f(X, Y, W | h), i.e., the product of terms (1), (2) and (4) of the paper, with respect to W. A Python sketch of one Gibbs sweep over Y and W follows.
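A minimal Python sketch of one Gibbs sweep over Y and W may help fix ideas. It assumes spherical Gaussian components with precision beta and, for brevity, replaces the paper's smoothness (Gaussian-process) prior on the centroids with an independent N(0, (1/alpha) I) prior on each w_j; with the GP prior, the W-draw would instead be a joint Gaussian with a coupled covariance.

```python
import numpy as np

def gibbs_sweep(X, W, alpha, beta, rng):
    """One Gibbs sweep over the unit assignments Y and the centroids W.

    X: (n, d) data matrix; W: (m, d) centroid matrix (updated in place).
    Sketch only: an independent N(0, 1/alpha * I) prior replaces GTM's
    smoothness prior on the centroids.
    """
    n, d = X.shape
    m = W.shape[0]

    # Conditional posterior on Y: categorical with softmax selection probabilities.
    sq_dist = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)   # (n, m)
    logits = -0.5 * beta * sq_dist
    logits -= logits.max(axis=1, keepdims=True)                    # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    Y = np.array([rng.choice(m, p=P[i]) for i in range(n)])

    # Conditional posterior on W: independent Gaussians given the assignments.
    for j in range(m):
        members = X[Y == j]
        n_j = members.shape[0]
        prec = alpha + beta * n_j                                  # posterior precision
        mean = (beta / prec) * members.sum(axis=0) if n_j else np.zeros(d)
        W[j] = rng.normal(mean, np.sqrt(1.0 / prec), size=d)

    return Y, W
```

Repeating such sweeps and averaging the sampled W values approximates the posterior moments, as the previous slide notes.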

Conditional posteriors on the hyperparameters
- The conditional posteriors on the hyperparameters are likewise obtained by normalizing the joint density f(X, Y, W, h) with respect to each hyperparameter.
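As an illustration only: if conjugate Gamma priors were placed on the precision hyperparameters (an assumption here; the transcript does not show the paper's actual priors), the conditional posterior of the noise precision beta would again be Gamma,

```latex
\[
\beta \mid X, Y, W \;\sim\;
\mathrm{Gamma}\!\left(a_{\beta} + \frac{nd}{2},\;
  b_{\beta} + \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - w_{y_i} \rVert^{2}\right),
\]
```

with an analogous form for the precision of the prior on W.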

Ensemble learning in GTM
- Ensemble learning is a deterministic algorithm that estimates the parameters and hyperparameters concurrently.
- It uses an approximating ensemble density Q and its variational free energy F on a model H (written out below).
- If Q is restricted to a factorial form, a straightforward algorithm for minimizing F is obtained.
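Written out in generic notation, with theta = (Y, W, h) collecting the parameters and hyperparameters, the variational free energy and the factorial restriction are:

```latex
\[
F(Q) = \int Q(\theta)\,\ln\frac{Q(\theta)}{f(X, \theta \mid H)}\, d\theta
     = -\ln f(X \mid H)
       + \mathrm{KL}\!\big(Q(\theta) \,\big\|\, f(\theta \mid X, H)\big)
     \;\ge\; -\ln f(X \mid H),
\qquad
Q(\theta) = Q(Y)\,Q(W)\,Q(h).
\]
```

Since the KL term is non-negative, minimizing F tightens an upper bound on the negative log evidence.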

The optimization procedure
- The partial ensembles are given initial densities.
- A new density Q(Y) is obtained from the other densities by the update formula sketched below.
- Each of the other partial ensembles is updated with the same formula, with Y and the target variable exchanged.
- These updates of the partial ensembles are repeated until a convergence condition is satisfied.
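The update formula the slide alludes to is the standard coordinate update of a factorial variational approximation, sketched here in generic form:

```latex
\[
Q(Y) \;\propto\;
\exp\!\Big( \big\langle \ln f(X, Y, W, h \mid H) \big\rangle_{Q(W)\,Q(h)} \Big),
\]
```

with the analogous updates for Q(W) and Q(h) obtained by exchanging the roles of the variables; each such update cannot increase F, so the iteration converges to a local minimum.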

Simulations
- The two algorithms are compared in simulations: ensemble learning (the deterministic algorithm) and the Gibbs sampler.
- Artificial data x_i, i = 1, ..., n, are generated from two independent standard Gaussian random series; three noise levels are used (a placeholder sketch of such a construction follows).
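The transcript does not reproduce the generating map or the noise levels, so the following Python sketch is only a hypothetical stand-in: two independent standard Gaussian series supply the latent coordinate and the observation noise, a smooth placeholder map embeds the latent series in the plane, and three placeholder noise levels are used.

```python
import numpy as np

def make_artificial_data(n=200, noise_level=0.1, seed=0):
    """Hypothetical stand-in for the paper's artificial data.

    Two independent standard Gaussian series are drawn: one supplies the
    latent coordinate, the other the observation noise.  The smooth map
    (here a sine curve) and the noise levels are placeholders, since the
    transcript does not reproduce the paper's actual construction.
    """
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n)              # latent series {s_i}
    e = rng.standard_normal((n, 2))         # noise series {e_i}
    X = np.column_stack((s, np.sin(s))) + noise_level * e
    return X

# Three placeholder noise levels, mirroring the slide's three-level setup.
datasets = [make_artificial_data(noise_level=nl) for nl in (0.05, 0.1, 0.2)]
```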


Conclusion
- A simulation experiment showed the superiority of the Gibbs sampler on small data sets and the validity of the deterministic algorithm on large data sets.