Guillaume Bouchard Xerox Research Centre Europe

Presentation transcript:

Efficient Bounds for the Softmax Function: Applications to Inference in Hybrid Models
Guillaume Bouchard, Xerox Research Centre Europe

Deterministic Inference in Hybrid Graphical Models
[Figure: a hybrid graphical model; the legend distinguishes discrete vs. continuous and observed vs. hidden variables.]
Discrete variables with continuous parents (or a large number of discrete parents) pose three problems:
- no sufficient statistic,
- no conjugate distribution,
- intractable inference.
Approximate deterministic inference options:
- local sampling,
- deterministic approximations (Gaussian quadrature, delta method, Laplace approximation),
- maximizing a variational lower bound (free-energy methods).
December 7, 2007 Guillaume Bouchard, Xerox Research Center Europe

Variational inference
[Figure: plate diagram with inputs X1i, X2i, weights β1, β2, and output Yi, repeated over data points i.]
Focus: Bayesian multinomial logistic regression.
Mean-field approximation: the variational distribution Q belongs to an approximating family, and maximizing the resulting bound requires an upper bound on the softmax log-partition function.

Bounding the log-partition function (1)
Binary case (one dimension): the classical bound of [Jordan and Jaakkola]. We propose its multiclass extension.
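The slide's formulas did not survive transcription. For reference, the binary bound alluded to here is usually written with a variational parameter ξ, and the multiclass extension takes the following form (our reconstruction of the standard results, not the slide's own notation):

```latex
% Binary case: Jaakkola--Jordan quadratic bound on the logistic
% log-partition function, with variational parameter \xi:
\log\!\big(1 + e^{x}\big) \;\le\;
  \lambda(\xi)\,\big(x^2 - \xi^2\big) + \tfrac{1}{2}(x - \xi) + \log\!\big(1 + e^{\xi}\big),
\qquad
\lambda(\xi) = \frac{1}{4\xi}\tanh\!\Big(\frac{\xi}{2}\Big).

% Multiclass extension: for any \alpha \in \mathbb{R},
\log \sum_{k=1}^{K} e^{x_k} \;\le\;
  \alpha + \sum_{k=1}^{K} \log\!\big(1 + e^{x_k - \alpha}\big),
% and each summand can again be bounded by the binary quadratic bound.
```

The bound is tight at x = ξ in the binary case, and α plays the role of an extra variational parameter in the multiclass case.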

Bounding the log-partition function (2)
[Figures: comparison of the bounds for K = 2 and K = 10.]

Other upper bounds
- Concavity of the log [e.g. Blei et al.]
- Worst curvature [Böhning]
- Bound using hyperbolic cosines [Jebara]
- Local approximation [Gibbs]: not proved to be an upper bound.
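To make the comparison concrete, the sketch below numerically checks two of these bounds against the exact log-partition function: the product-of-sigmoids bound proposed in the talk and Böhning's worst-curvature bound (a quadratic expansion whose curvature matrix (I - 11ᵀ/K)/2 dominates the softmax Hessian). Function names and the test setup are ours, not from the slides.

```python
import math
import random

def lse(x):
    # exact log-partition function log sum_k exp(x_k), computed stably
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

def bouchard_bound(x, alpha):
    # log-sum-exp <= alpha + sum_k log(1 + exp(x_k - alpha)), for any real alpha
    return alpha + sum(math.log1p(math.exp(v - alpha)) for v in x)

def bohning_bound(x, psi):
    # quadratic expansion of lse around psi with curvature A = (I - 11^T/K)/2
    K = len(x)
    p = [math.exp(v - lse(psi)) for v in psi]          # softmax(psi)
    d = [xi - pi for xi, pi in zip(x, psi)]            # x - psi
    quad = 0.5 * (0.5 * sum(v * v for v in d) - 0.5 * sum(d) ** 2 / K)
    return lse(psi) + sum(pi * di for pi, di in zip(p, d)) + quad

# sanity check: both bounds dominate the exact value on random inputs
random.seed(0)
for _ in range(1000):
    K = random.randint(2, 10)
    x     = [random.uniform(-5, 5) for _ in range(K)]
    psi   = [random.uniform(-5, 5) for _ in range(K)]
    alpha = random.uniform(-5, 5)
    assert bouchard_bound(x, alpha) >= lse(x) - 1e-9
    assert bohning_bound(x, psi)    >= lse(x) - 1e-9
```

Tightening either bound then amounts to optimizing its free parameter (α here, the expansion point ψ for Böhning's bound).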

Proof
Idea: expand the product of inverted sigmoids. The result is upper-bounded by K quadratic upper bounds and lower-bounded by a linear function (log-convexity of f).
Proof: apply Jensen's inequality.
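The expansion argument the slide alludes to can be reconstructed as follows (our paraphrase of the standard derivation, since the slide's equations are missing):

```latex
% Since 1 + e^{y} = \sigma(-y)^{-1}, the product of inverted sigmoids
% expands into a sum over all subsets S of \{1,\dots,K\}:
\prod_{k=1}^{K} \big(1 + e^{y_k}\big)
  \;=\; \sum_{S \subseteq \{1,\dots,K\}} \exp\Big(\sum_{k \in S} y_k\Big)
  \;\ge\; \sum_{k=1}^{K} e^{y_k},
% because the singleton subsets alone already reproduce the sum and
% every remaining term is nonnegative. Setting y_k = x_k - \alpha and
% taking logarithms gives the multiclass bound:
\log \sum_{k=1}^{K} e^{x_k}
  \;\le\; \alpha + \sum_{k=1}^{K} \log\!\big(1 + e^{x_k - \alpha}\big).
```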

Bounds on the Expectation
- Exponential bound
- Quadratic bound
[Figure: simulations.]
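The slide's formulas are missing. One natural reading of the "exponential bound", which we state here as an assumption rather than the slide's own equation, is the Jensen step that pushes a Gaussian expectation through the log-sum-exp:

```latex
% For x \sim \mathcal{N}(\mu, \Sigma), concavity of the logarithm gives
\mathbb{E}\Big[\log \sum_{k} e^{x_k}\Big]
  \;\le\; \log \sum_{k} \mathbb{E}\big[e^{x_k}\big]
  \;=\; \log \sum_{k} e^{\mu_k + \Sigma_{kk}/2},
% using the Gaussian moment generating function for each coordinate.
```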

Bayesian multinomial logistic regression
Exponential bound: cannot be maximized in closed form; requires gradient-based optimization or a fixed-point equation (unstable!).
Quadratic bound: analytic update.
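The analytic update itself is lost from the transcript. To illustrate why a quadratic bound yields closed-form Gaussian updates, here is the binary-case analogue using the Jaakkola-Jordan bound; this is a sketch under our own naming, assuming NumPy, and it is not the multiclass update from the talk.

```python
import numpy as np

def lam(xi):
    # lambda(xi) = tanh(xi/2) / (4 xi), with continuous limit 1/8 as xi -> 0
    xi = np.maximum(xi, 1e-8)
    return np.tanh(xi / 2.0) / (4.0 * xi)

def vb_logistic(X, y, m0, S0, n_iter=30):
    """Variational Bayes for binary logistic regression with the
    Jaakkola-Jordan quadratic bound: every update is in closed form."""
    S0inv = np.linalg.inv(S0)
    xi = np.ones(len(y))                          # one variational parameter per point
    for _ in range(n_iter):
        Sinv = S0inv + 2.0 * (X.T * lam(xi)) @ X  # posterior precision
        S = np.linalg.inv(Sinv)
        m = S @ (S0inv @ m0 + X.T @ (y - 0.5))    # posterior mean
        # optimal xi_n^2 = x_n^T (S + m m^T) x_n
        xi = np.sqrt(np.einsum('ni,ij,nj->n', X, S + np.outer(m, m), X))
    return m, S

# toy data: 2-D points labelled by the sign of a noisy linear score
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = (X @ w_true + 0.5 * rng.normal(size=200) > 0).astype(float)
m, S = vb_logistic(X, y, np.zeros(2), np.eye(2))
```

Because the bound is quadratic in the weights, the Gaussian prior stays conjugate and no gradient steps are needed, which is exactly the advantage the slide claims for the quadratic bound over the exponential one.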

Numerical experiments
Iris dataset: 4 dimensions, 3 classes. Prior: unit variance.
Experiment: batch-update learning, compared to an MCMC estimate based on 100K samples. Error = Euclidean distance between the mean and variance parameters.
Results: the "worst curvature" bound is faster and more accurate…

Conclusion
- Multinomial links in graphical models are feasible.
- Existing bounds work well, and we can expect further improvements.
Remark: better bounds are only needed in the Bayesian setting; for MAP estimation, even a loose bound converges.
Future work: application to discriminative learning; mixture-based mean-field approximation.


Backup slides




Jebara's bound
One dimension: hyperbolic cosine bound.
Multi-dimensional case.