Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models. Mohammad Emtiyaz Khan, joint work with Benjamin Marlin and Kevin Murphy, University of British Columbia.


Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models. Mohammad Emtiyaz Khan, joint work with Benjamin Marlin and Kevin Murphy. University of British Columbia, June 29, 2011.

Piecewise Bounds for Binary Latent Gaussian Models, ICML. Mohammad Emtiyaz Khan.

Modeling Binary Data. The main topic of our paper is Bernoulli-Logistic Latent Gaussian Models (bLGMs).

bLGMs - Classification Models. Bayesian logistic regression and Gaussian process classification (Jaakkola and Jordan, 1996; Rasmussen, 2004; Gibbs and MacKay, 2000; Kuss and Rasmussen, 2006; Nickisch and Rasmussen, 2008; Kim and Ghahramani, 2003). Figures reproduced using the GPML toolbox.

bLGMs - Latent Factor Models. Probabilistic PCA and factor analysis models (Tipping, 1999; Collins, Dasgupta and Schapire, 2001; Mohamed, Heller and Ghahramani, 2008; Girolami, 2001; Yu and Tresp, 2004).

Parameter Learning is Intractable. The logistic likelihood is not conjugate to the Gaussian prior, so the marginal likelihood involves an intractable integral. We propose piecewise bounds to obtain tractable lower bounds on the marginal likelihood.

Learning in bLGMs

Bernoulli-Logistic Latent Gaussian Models. [Plate diagram: for n = 1:N, latent Gaussian variables z_1n, ..., z_Ln with parameters µ and Σ are mapped through the weight matrix W to binary observations y_1n, ..., y_Dn. The parameter set is (µ, Σ, W).]
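
The generative process in the plate diagram can be sketched as follows. This is a minimal illustration, assuming hypothetical sizes L, D, N and randomly drawn parameters rather than anything from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: L latent factors, D binary outputs, N data cases.
L, D, N = 3, 10, 500

mu = np.zeros(L)              # latent mean µ
Sigma = np.eye(L)             # latent covariance Σ
W = rng.normal(size=(D, L))   # weight (factor loading) matrix W

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# z_n ~ N(µ, Σ); y_dn ~ Bernoulli(sigmoid(w_d^T z_n))
Z = rng.multivariate_normal(mu, Sigma, size=N)    # shape (N, L)
P = sigmoid(Z @ W.T)                              # shape (N, D)
Y = (rng.uniform(size=(N, D)) < P).astype(int)    # binary observations
```

Classification models (GP classification) and latent factor models (binary factor analysis) are both instances of this template, differing in which of µ, Σ, W are fixed, shared, or learned.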

Learning Parameters of bLGMs. Maximize the marginal likelihood log p(Y | θ) = Σ_n log ∫ p(y_n | z_n, W) N(z_n | µ, Σ) dz_n, which is intractable because the Bernoulli-logistic likelihood is not conjugate to the Gaussian prior.

Variational Lower Bound (Jensen's). With a Gaussian posterior approximation q(z_n) = N(m_n, V_n), Jensen's inequality gives a lower bound on the marginal likelihood consisting of tractable terms in m and V plus Gaussian expectations of the logistic log-partition function, which are intractable and must themselves be bounded.
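
Spelled out, the Jensen bound takes the standard variational form for a latent Gaussian model, written here for one data case:

```latex
\log p(\mathbf{y}) \;\ge\; \mathbb{E}_{q(\mathbf{z})}\big[\log p(\mathbf{y} \mid \mathbf{z})\big] \;-\; \mathrm{KL}\big[\, q(\mathbf{z}) \,\|\, \mathcal{N}(\mathbf{z} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) \,\big],
\qquad q(\mathbf{z}) = \mathcal{N}(\mathbf{z} \mid \mathbf{m}, \mathbf{V}).
```

The KL term between two Gaussians is available in closed form in m and V; the expectation term reduces to one-dimensional Gaussian expectations of log(1 + e^x), and that is where bounds on the logistic log-partition function enter.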

Quadratic Bounds. Bohning's bound (Bohning, 1992): log(1 + e^x) ≤ log(1 + e^ψ) + σ(ψ)(x − ψ) + (1/8)(x − ψ)², a fixed-curvature quadratic upper bound. Jaakkola's bound (Jaakkola and Jordan, 1996): log(1 + e^x) ≤ λ(ξ)(x² − ξ²) + (x − ξ)/2 + log(1 + e^ξ), with λ(ξ) = tanh(ξ/2)/(4ξ). Both bounds have unbounded error.
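
Both quadratic bounds are easy to state and check numerically. The sketch below implements them as upper bounds on the logistic log-partition function log(1 + e^x) (upper-bounding the log-partition lower-bounds the likelihood); the formulas are the standard ones from the cited papers, while the grid and the variational site values are arbitrary.

```python
import numpy as np

def llp(x):
    """Logistic log-partition log(1 + e^x), computed stably."""
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bohning_bound(x, psi):
    # Bohning (1992): second-order Taylor expansion at psi with the
    # curvature replaced by its global maximum 1/4.
    return llp(psi) + sigmoid(psi) * (x - psi) + 0.125 * (x - psi) ** 2

def jaakkola_bound(x, xi):
    # Jaakkola and Jordan (1996): quadratic bound that is tight at +/- xi.
    lam = np.tanh(xi / 2.0) / (4.0 * xi)
    return lam * (x ** 2 - xi ** 2) + (x - xi) / 2.0 + llp(xi)

x = np.linspace(-10.0, 10.0, 2001)
for site in (0.5, 2.0, 5.0):
    assert np.all(bohning_bound(x, site) >= llp(x) - 1e-12)
    assert np.all(jaakkola_bound(x, site) >= llp(x) - 1e-12)
```

Because both are single global quadratics, their gap to log(1 + e^x) grows without limit as x moves away from the site, which is the "unbounded error" problem the piecewise bounds address.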

Problems with Quadratic Bounds. A 1-D example with µ = 2, σ = 2: generate data, fix µ = 2, and compare the marginal likelihood and its lower bound as functions of σ. [Plate diagram: z_n with parameters µ, σ generates y_n, for n = 1:N.] As this is a 1-D problem, we can compute lower bounds without Jensen's inequality, so the plots that follow show errors due only to the error in the bounds.

Problems with Quadratic Bounds. [Figure: Bohning, Jaakkola, and piecewise bounds on the 1-D example; the piecewise bound is built from pieces Q1(x), Q2(x), Q3(x).]

Piecewise Bounds
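
The reason piecewise bounds keep the variational objective tractable is that the Gaussian expectation of each piece, restricted to its interval, has a closed form via truncated Gaussian moments. A sketch for a single linear piece ax + b on (l, u), with made-up coefficients, checked against Monte Carlo:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    """Standard normal PDF."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def expected_linear_piece(a, b, l, u, m, v):
    """E_{N(x|m,v)}[(a*x + b) * 1{l < x < u}] in closed form."""
    s = sqrt(v)
    alpha, beta = (l - m) / s, (u - m) / s
    mass = Phi(beta) - Phi(alpha)          # P(l < x < u)
    dev = s * (phi(alpha) - phi(beta))     # E[(x - m) * 1{l < x < u}]
    return a * (m * mass + dev) + b * mass

# Monte Carlo check with a hypothetical piece and Gaussian (m = 1, v = 4):
rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=2_000_000)
a, b, l, u = 0.3, -0.2, -1.0, 2.5
mc = np.mean((a * x + b) * ((x > l) & (x < u)))
cf = expected_linear_piece(a, b, l, u, m=1.0, v=4.0)
assert abs(mc - cf) < 1e-2
```

Summing such terms over all pieces gives the expectation of the whole piecewise bound in closed form; quadratic pieces additionally need the truncated second moment, which is also closed form.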

Finding Piecewise Bounds. Find the cut points and the parameters of each piece by minimizing the maximum error. Linear pieces (Hsiung, Kim and Boyd, 2008); quadratic pieces (fit with the Nelder-Mead method). The resulting piecewise bounds are fixed, computed once ahead of time, and their accuracy can be increased by increasing the number of pieces.
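
As a simplified illustration of why more pieces help (using chords on a finite interval, not the minimax-optimal construction of Hsiung, Kim and Boyd), note that log(1 + e^x) is convex, so the chord between two cut points upper-bounds it on that interval:

```python
import numpy as np

def llp(x):
    """Logistic log-partition log(1 + e^x)."""
    return np.logaddexp(0.0, x)

def chord_upper_bound(x, cuts):
    """Piecewise-linear upper bound on llp over [cuts[0], cuts[-1]]:
    on each interval, the chord through the endpoints lies above a
    convex function."""
    out = np.empty_like(x)
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        mask = (x >= lo) & (x <= hi)
        slope = (llp(hi) - llp(lo)) / (hi - lo)
        out[mask] = llp(lo) + slope * (x[mask] - lo)
    return out

x = np.linspace(-5.0, 5.0, 1001)
for n_pieces in (3, 10, 30):
    cuts = np.linspace(-5.0, 5.0, n_pieces + 1)
    assert np.all(chord_upper_bound(x, cuts) >= llp(x) - 1e-12)

# Maximum error shrinks as the number of pieces grows.
max_err_3 = np.max(chord_upper_bound(x, np.linspace(-5, 5, 4)) - llp(x))
max_err_30 = np.max(chord_upper_bound(x, np.linspace(-5, 5, 31)) - llp(x))
assert max_err_30 < max_err_3
```

The paper's bounds differ in two ways: the cut points and piece parameters are chosen to minimize the maximum error rather than placed uniformly, and the outermost pieces are valid on the whole real line rather than a truncated interval.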

Linear vs Quadratic

Results

Binary Factor Analysis (bFA). [Plate diagram: for n = 1:N, latent variables Z_1n, ..., Z_Ln are mapped through W to binary observations Y_1n, ..., Y_Dn.] UCI voting dataset with D = 15, N = 435, and an 80-20% train-test split. We compare the cross-entropy error of missing-value prediction on test data.
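
The cross-entropy error used here is the average negative Bernoulli log-likelihood of the held-out entries under the model's predicted probabilities; a minimal sketch with made-up predictions:

```python
import numpy as np

def cross_entropy_error(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of held-out binary entries
    under predicted Bernoulli probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Hypothetical predictions for five missing entries:
y_missing = np.array([1, 0, 1, 1, 0])
p_missing = np.array([0.9, 0.2, 0.7, 0.6, 0.1])
err = cross_entropy_error(y_missing, p_missing)   # ≈ 0.260
```

Lower is better; a tighter bound on the log-partition function yields better-calibrated predicted probabilities and hence lower cross-entropy on the held-out entries.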

bFA – Error vs Time. [Figure: cross-entropy error against training time for Bohning, Jaakkola, piecewise linear with 3 pieces, piecewise quadratic with 3 pieces, and piecewise quadratic with 10 pieces.]

bFA – Error Across Splits. [Figure: error with the piecewise quadratic bound plotted against error with the Bohning and Jaakkola bounds across train-test splits.]

Gaussian Process Classification. We repeat the experiments described in Kuss and Rasmussen (2006). We set µ = 0 and use the squared exponential kernel Σ_ij = σ exp[−(x_i − x_j)²/s], estimating σ and s. We run experiments on the Ionosphere dataset (D = 200) and compare the cross-entropy prediction error on test data. [Plate diagram: inputs X and kernel parameters s, σ define µ, Σ over latent variables z_1, ..., z_D, which generate binary labels y_1, ..., y_D.]
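
A sketch of the kernel as written on the slide (with the conventional minus sign in the exponent, as in a standard squared-exponential kernel); the inputs and hyperparameter values below are made up:

```python
import numpy as np

def se_kernel(X, sigma, s):
    """Squared-exponential kernel:
    Sigma_ij = sigma * exp(-||x_i - x_j||^2 / s)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    d2 = np.maximum(d2, 0.0)                          # guard tiny fp negatives
    return sigma * np.exp(-d2 / s)

# Hypothetical inputs: 5 points in 2-D, sigma = 1.5, s = 2.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K = se_kernel(X, sigma=1.5, s=2.0)
assert np.allclose(np.diag(K), 1.5)   # k(x, x) = sigma
assert np.allclose(K, K.T)            # kernel matrix is symmetric
```

Here σ sets the prior variance of the latent function and s the length scale; both are estimated by maximizing the (lower-bounded) marginal likelihood.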

GP – Marginal Likelihood

GP – Prediction Error

EP vs Variational. We see that the variational approach underestimates the marginal likelihood in some regions of parameter space; however, both methods give comparable results for prediction error. In general, the variational EM algorithm for parameter learning is guaranteed to converge when appropriate numerical methods are used, and Nickisch and Rasmussen (2008) describe the variational approach as more principled than EP.

Conclusions

Fixed piecewise bounds can give a significant improvement in estimation and prediction accuracy relative to variational quadratic bounds. We can drive the error in the logistic-log-partition bound to zero by increasing the number of pieces, though this increase in accuracy comes with a corresponding increase in computation time. Unlike many other frameworks, we have very fine-grained control over the speed-accuracy trade-off through the number of pieces in the bound.

Thank You

Piecewise Bounds: Optimization Problem


Latent Gaussian Graphical Model. LED dataset, 24 variables, N = 2000. [Plate diagram: for n = 1:N, latent Gaussian variables z_1n, ..., z_Dn with parameters µ, Σ generate binary observations y_1n, ..., y_Dn.]

Sparse Version