Bayesian Generalized Kernel Mixed Models. Zhihua Zhang, Guang Dai and Michael I. Jordan. JMLR 2011.

Summary of contributions
– Propose generalized kernel models (GKMs) as a framework in which sparsity can be given an explicit treatment and in which a fully Bayesian methodology can be carried out.
– Develop a data augmentation methodology that yields an MCMC algorithm for inference.
– Show that the approach is related to Gaussian processes and provides a flexible approximation method for GPs.

Bayesian approach for kernel supervised learning
The form of the regressor or classifier is given by

f(x) = b + \sum_{i=1}^{n} K(x, x_i)\,\beta_i.

For a Mercer kernel K, there exists a corresponding mapping (say \psi) from the input space to a feature space, such that K(x_i, x_j) = \psi(x_i)^\top \psi(x_j). This provides an equivalent representation in the feature space,

f(x) = b + \psi(x)^\top w, \quad \text{where } w = \sum_{i=1}^{n} \beta_i\,\psi(x_i).
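A minimal sketch of this functional form, assuming a Gaussian (RBF) kernel; the kernel choice, the bandwidth `theta`, and the function names are illustrative, not from the paper:

```python
import numpy as np

def rbf_kernel(X1, X2, theta=1.0):
    """Gaussian (RBF) kernel: K(x, x') = exp(-theta * ||x - x'||^2)."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-theta * sq_dists)

def gkm_f(X_new, X_train, beta, b=0.0, theta=1.0):
    """Evaluate f(x) = b + sum_i K(x, x_i) * beta_i at each row of X_new."""
    return b + rbf_kernel(X_new, X_train, theta) @ beta
```

For classification, f(x) would be passed through a link function such as the probit or logistic, in keeping with the generalized-model framing.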

Generalized Kernel Models

Prior for regression coefficients

Sparse models
Recall that the number of active vectors is the number of non-zero components of \beta.
– We are thus interested in a prior for \beta which allows some of its components to be exactly zero.

Methodology
For the indicator vector \gamma = (\gamma_1, \ldots, \gamma_n)^\top \in \{0,1\}^n, \gamma_i = 1 marks the i-th input as an active vector, while \gamma_i = 0 forces the corresponding \beta_i to zero.
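A small sketch of how the indicator construction induces sparsity when drawing from the prior; the iid Bernoulli prior on \gamma and the Gaussian slab are assumptions about the standard spike-and-slab construction, not the paper's exact prior:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sparse_prior(n, p_active=0.2, slab_sd=1.0):
    """Draw (gamma, beta): gamma_i ~ Bernoulli(p_active) flags the active
    vectors, and beta_i is exactly zero wherever gamma_i = 0."""
    gamma = rng.binomial(1, p_active, size=n)
    beta = np.where(gamma == 1, rng.normal(0.0, slab_sd, size=n), 0.0)
    return gamma, beta
```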

Graphical model

Inference
– Gibbs sampling for most parameters
– Metropolis-Hastings for kernel parameters
– Reversible jump Markov chain for \gamma, which takes 2^n distinct values:
  – For small n, the posterior may be obtained exactly by computing the normalizing constant, summing over all possible values of \gamma (see the sketch below).
  – For large n, a reversible jump MC sampler may be employed to identify models with high posterior probability.
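A sketch of the small-n case, enumerating all 2^n indicator vectors; `log_marginal` and `log_prior` are hypothetical callables standing in for the model-specific quantities log p(y | gamma) and log p(gamma):

```python
import itertools
import numpy as np

def posterior_over_gamma(n, log_marginal, log_prior):
    """Exact posterior over all 2^n indicator vectors; feasible only for small n."""
    gammas = [np.array(g) for g in itertools.product((0, 1), repeat=n)]
    log_post = np.array([log_marginal(g) + log_prior(g) for g in gammas])
    log_post -= log_post.max()          # guard against overflow
    probs = np.exp(log_post)
    probs /= probs.sum()                # the normalizing constant, by full summation
    return gammas, probs
```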

Automatic choice of active vectors
We generate a proposal \gamma^* from the current value of \gamma by one of three possible moves (a sketch follows below).
Prediction: given the retained posterior samples, the predictive value at a new input x is approximated by the Monte Carlo average of f(x) over those samples.
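A sketch of the proposal step; the transcript does not name the three moves, so the birth/death/swap set standard in this literature is assumed here:

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_gamma(gamma):
    """Propose gamma* from the current gamma via one of three moves."""
    gamma = gamma.copy()
    active = np.flatnonzero(gamma == 1)
    inactive = np.flatnonzero(gamma == 0)
    move = rng.choice(["birth", "death", "swap"])
    if move == "birth" and inactive.size > 0:     # add one active vector
        gamma[rng.choice(inactive)] = 1
    elif move == "death" and active.size > 0:     # remove one active vector
        gamma[rng.choice(active)] = 0
    elif active.size > 0 and inactive.size > 0:   # swap an active and an inactive one
        gamma[rng.choice(active)] = 0
        gamma[rng.choice(inactive)] = 1
    return gamma
```

The proposed \gamma^* is then accepted or rejected with the usual reversible jump acceptance probability.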

Sparse Gaussian process for classification
Given a function f(x) = \psi(x)^\top w with w \sim N(0, I), f is a Gaussian process with zero mean and covariance function K(x_i, x_j) = \psi(x_i)^\top \psi(x_j), and vice versa. Also, writing \mathbf{f} = (f(x_1), \ldots, f(x_n))^\top = K\boldsymbol{\beta}, a prior \boldsymbol{\beta} \sim N(0, K^{-1}) induces \mathbf{f} \sim N(0, K), so a GKM with few active vectors acts as a sparse approximation to the full GP.
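A one-line check of this connection, under the \boldsymbol{\beta} \sim N(0, K^{-1}) assumption above (with K symmetric positive definite):

```latex
\mathbf{f} = K\boldsymbol{\beta}, \qquad
\operatorname{Cov}(\mathbf{f})
  = K\,\operatorname{Cov}(\boldsymbol{\beta})\,K^{\top}
  = K K^{-1} K = K,
\quad\text{so } \mathbf{f} \sim \mathcal{N}(\mathbf{0},\, K).
```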

Sparse GP classification

Results