Guillaume Bouchard Xerox Research Centre Europe

Presentation transcript:

Efficient Bounds for the Softmax Function: Applications to Inference in Hybrid Models
Guillaume Bouchard, Xerox Research Centre Europe

Deterministic Inference in Hybrid Graphical Models
[Figure: a hybrid graphical model; the legend distinguishes discrete vs. continuous and observed vs. hidden variables.]
Discrete variables with continuous parents (or a large number of discrete parents) pose three problems:
- no sufficient statistic,
- no conjugate distribution,
- intractable inference.
Approximate deterministic inference options:
- local sampling,
- deterministic approximations (Gaussian quadrature, delta method, Laplace approximation),
- maximizing a variational lower bound (free-energy methods).
December 7, 2007 Guillaume Bouchard, Xerox Research Center Europe

Variational inference
[Figure: plate diagram with inputs X1i, X2i, weights β1, β2, and output Yi, repeated over data points i.]
Focus: Bayesian multinomial logistic regression.
Mean-field approximation: the variational distribution Q belongs to an approximating family, and maximizing the resulting bound requires an upper bound on the softmax log-partition function.

Bounding the log-partition function (1)
Binary case (one dimension): the classical bound of [Jordan and Jaakkola]. We propose its multiclass extension.
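The slide's formulas did not survive transcription. For reference, the binary bound alluded to here is usually written with a variational parameter ξ, and the multiclass extension takes the following form (our reconstruction of the standard results, not the slide's own notation):

```latex
% Binary case: Jaakkola--Jordan quadratic bound on the logistic
% log-partition function, with variational parameter \xi:
\log\!\big(1 + e^{x}\big) \;\le\;
  \lambda(\xi)\,\big(x^2 - \xi^2\big) + \tfrac{1}{2}(x - \xi) + \log\!\big(1 + e^{\xi}\big),
\qquad
\lambda(\xi) = \frac{1}{4\xi}\tanh\!\Big(\frac{\xi}{2}\Big).

% Multiclass extension: for any \alpha \in \mathbb{R},
\log \sum_{k=1}^{K} e^{x_k} \;\le\;
  \alpha + \sum_{k=1}^{K} \log\!\big(1 + e^{x_k - \alpha}\big),
% and each summand can again be bounded by the binary quadratic bound.
```

The bound is tight at x = ξ in the binary case, and α plays the role of an extra variational parameter in the multiclass case.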

Bounding the log-partition function (2)
[Figures: comparison of the bounds for K = 2 and K = 10.]

Other upper bounds
- Concavity of the log [e.g. Blei et al.]
- Worst curvature [Böhning]
- Bound using hyperbolic cosines [Jebara]
- Local approximation [Gibbs]: not proved to be an upper bound.
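To make the comparison concrete, the sketch below numerically checks two of these bounds against the exact log-partition function: the product-of-sigmoids bound proposed in the talk and Böhning's worst-curvature bound (a quadratic expansion whose curvature matrix (I - 11ᵀ/K)/2 dominates the softmax Hessian). Function names and the test setup are ours, not from the slides.

```python
import math
import random

def lse(x):
    # exact log-partition function log sum_k exp(x_k), computed stably
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

def bouchard_bound(x, alpha):
    # log-sum-exp <= alpha + sum_k log(1 + exp(x_k - alpha)), for any real alpha
    return alpha + sum(math.log1p(math.exp(v - alpha)) for v in x)

def bohning_bound(x, psi):
    # quadratic expansion of lse around psi with curvature A = (I - 11^T/K)/2
    K = len(x)
    p = [math.exp(v - lse(psi)) for v in psi]          # softmax(psi)
    d = [xi - pi for xi, pi in zip(x, psi)]            # x - psi
    quad = 0.5 * (0.5 * sum(v * v for v in d) - 0.5 * sum(d) ** 2 / K)
    return lse(psi) + sum(pi * di for pi, di in zip(p, d)) + quad

# sanity check: both bounds dominate the exact value on random inputs
random.seed(0)
for _ in range(1000):
    K = random.randint(2, 10)
    x     = [random.uniform(-5, 5) for _ in range(K)]
    psi   = [random.uniform(-5, 5) for _ in range(K)]
    alpha = random.uniform(-5, 5)
    assert bouchard_bound(x, alpha) >= lse(x) - 1e-9
    assert bohning_bound(x, psi)    >= lse(x) - 1e-9
```

Tightening either bound then amounts to optimizing its free parameter (α here, the expansion point ψ for Böhning's bound).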

Proof
Idea: expand the product of inverted sigmoids. The result is upper-bounded by K quadratic upper bounds and lower-bounded by a linear function (log-convexity of f).
Proof: apply Jensen's inequality.
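The expansion argument the slide alludes to can be reconstructed as follows (our paraphrase of the standard derivation, since the slide's equations are missing):

```latex
% Since 1 + e^{y} = \sigma(-y)^{-1}, the product of inverted sigmoids
% expands into a sum over all subsets S of \{1,\dots,K\}:
\prod_{k=1}^{K} \big(1 + e^{y_k}\big)
  \;=\; \sum_{S \subseteq \{1,\dots,K\}} \exp\Big(\sum_{k \in S} y_k\Big)
  \;\ge\; \sum_{k=1}^{K} e^{y_k},
% because the singleton subsets alone already reproduce the sum and
% every remaining term is nonnegative. Setting y_k = x_k - \alpha and
% taking logarithms gives the multiclass bound:
\log \sum_{k=1}^{K} e^{x_k}
  \;\le\; \alpha + \sum_{k=1}^{K} \log\!\big(1 + e^{x_k - \alpha}\big).
```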

Bounds on the Expectation
- Exponential bound
- Quadratic bound
[Figure: simulations.]
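The slide's formulas are missing. One natural reading of the "exponential bound", which we state here as an assumption rather than the slide's own equation, is the Jensen step that pushes a Gaussian expectation through the log-sum-exp:

```latex
% For x \sim \mathcal{N}(\mu, \Sigma), concavity of the logarithm gives
\mathbb{E}\Big[\log \sum_{k} e^{x_k}\Big]
  \;\le\; \log \sum_{k} \mathbb{E}\big[e^{x_k}\big]
  \;=\; \log \sum_{k} e^{\mu_k + \Sigma_{kk}/2},
% using the Gaussian moment generating function for each coordinate.
```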

Bayesian multinomial logistic regression
Exponential bound: cannot be maximized in closed form; requires gradient-based optimization or a fixed-point equation (unstable!).
Quadratic bound: analytic update.
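The analytic update itself is lost from the transcript. To illustrate why a quadratic bound yields closed-form Gaussian updates, here is the binary-case analogue using the Jaakkola-Jordan bound; this is a sketch under our own naming, assuming NumPy, and it is not the multiclass update from the talk.

```python
import numpy as np

def lam(xi):
    # lambda(xi) = tanh(xi/2) / (4 xi), with continuous limit 1/8 as xi -> 0
    xi = np.maximum(xi, 1e-8)
    return np.tanh(xi / 2.0) / (4.0 * xi)

def vb_logistic(X, y, m0, S0, n_iter=30):
    """Variational Bayes for binary logistic regression with the
    Jaakkola-Jordan quadratic bound: every update is in closed form."""
    S0inv = np.linalg.inv(S0)
    xi = np.ones(len(y))                          # one variational parameter per point
    for _ in range(n_iter):
        Sinv = S0inv + 2.0 * (X.T * lam(xi)) @ X  # posterior precision
        S = np.linalg.inv(Sinv)
        m = S @ (S0inv @ m0 + X.T @ (y - 0.5))    # posterior mean
        # optimal xi_n^2 = x_n^T (S + m m^T) x_n
        xi = np.sqrt(np.einsum('ni,ij,nj->n', X, S + np.outer(m, m), X))
    return m, S

# toy data: 2-D points labelled by the sign of a noisy linear score
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = (X @ w_true + 0.5 * rng.normal(size=200) > 0).astype(float)
m, S = vb_logistic(X, y, np.zeros(2), np.eye(2))
```

Because the bound is quadratic in the weights, the Gaussian prior stays conjugate and no gradient steps are needed, which is exactly the advantage the slide claims for the quadratic bound over the exponential one.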

Numerical experiments
Iris dataset: 4 dimensions, 3 classes. Prior: unit variance.
Experiment: batch-update learning, compared to an MCMC estimate based on 100K samples. Error = Euclidean distance between the mean and variance parameters.
Results: the "worst curvature" bound is faster and more accurate…

Conclusion
- Multinomial links in graphical models are feasible.
- Existing bounds work well, and we can expect further improvements.
Remark: better bounds are only needed in the Bayesian setting; for MAP estimation, even a loose bound converges.
Future work: application to discriminative learning; mixture-based mean-field approximation.


Backup slides




Jebara's bound
One dimension: hyperbolic cosine bound.
Multi-dimensional case.