A Bayesian Matrix Factorization Model for Relational Data (UAI 2010)
Relational Learning via Collective Matrix Factorization (SIGKDD 2008)
Authors: Ajit P. Singh & Geoffrey J. Gordon
Presenter: Xian Xing Zhang

Basic ideas
– Collective matrix factorization (CMF) is proposed for relational learning when an entity participates in multiple relations.
– Several matrices (with different types of support) are factored simultaneously with shared parameters.
– CMF is extended to a hierarchical Bayesian model to enhance the sharing of statistical strength.

An example of application
Functional Magnetic Resonance Imaging (fMRI):
– fMRI data can be viewed as a real-valued relation, Response(stimulus, voxel) ∈ [0, 1]
– stimulus side-information is a binary relation, Co-occurs(word, stimulus) ∈ {0, 1}, which records whether the stimulus word co-occurs with other commonly used words in a large corpus
– The goal is to predict unobserved values of the Response relation

Basic model description
In the fMRI example, the Co-occurs relation is an m×n matrix X; the Response relation is an n×r matrix Y. The likelihood of each matrix: Co-occurs (p_X) is modeled by a Bernoulli distribution; Response (p_Y) is modeled by a Gaussian.
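To make the two coupled likelihoods concrete, here is a minimal NumPy sketch of a CMF-style joint log-likelihood. The factor names U (words), V (shared stimulus factors), W (voxels) and the helper sigmoid are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cmf_log_likelihood(X, Y, U, V, W, sigma2=1.0):
    """Joint log-likelihood of the two relations under a CMF-style model.

    X : (m, n) binary Co-occurs(word, stimulus) matrix -> Bernoulli link
    Y : (n, r) real-valued Response(stimulus, voxel) matrix -> Gaussian link
    U : (m, k) word factors, V : (n, k) stimulus factors, W : (r, k) voxel factors.
    The stimulus factor V is shared by both matrices, which is what couples them.
    """
    # Bernoulli likelihood for Co-occurs: p(X | U, V) = Bern(sigmoid(U V^T))
    P = sigmoid(U @ V.T)
    ll_x = np.sum(X * np.log(P) + (1 - X) * np.log(1 - P))

    # Gaussian likelihood for Response: p(Y | V, W) = N(V W^T, sigma2 I)
    M = V @ W.T
    ll_y = -0.5 * np.sum((Y - M) ** 2) / sigma2 \
           - 0.5 * Y.size * np.log(2 * np.pi * sigma2)
    return ll_x + ll_y
```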

Hierarchical Collective Matrix Factorization
Information between entities can only be shared indirectly, through another factor: e.g., in f(UV'), two distinct rows of U are correlated only through V.
The hierarchical prior acts as a shrinkage estimator for the rows of a factor, pooling information indirectly through Θ.
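One way to read the hierarchical prior is that every row of a factor is drawn from a shared Gaussian whose hyperparameters play the role of Θ. The sketch below assumes that reading (mu, Sigma as the pooled hyperparameters); it is an illustration of the shrinkage idea, not the paper's exact prior.

```python
import numpy as np

def hierarchical_log_prior(U, mu, Sigma):
    """Log-density of factor rows under a shared Gaussian prior.

    Each row U[i] ~ N(mu, Sigma); the hyperparameters (mu, Sigma) stand in
    for Theta, pooling information across rows and shrinking each row
    toward a common mean instead of treating rows as fully independent.
    """
    k = U.shape[1]
    diff = U - mu                                        # (n, k) deviations from the shared mean
    prec = np.linalg.inv(Sigma)
    quad = np.einsum('ij,jk,ik->i', diff, prec, diff)    # per-row Mahalanobis terms
    _, logdet = np.linalg.slogdet(Sigma)
    return np.sum(-0.5 * (quad + logdet + k * np.log(2 * np.pi)))
```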

Bayesian Inference
Hessian Metropolis-Hastings (HMH):
– Random-walk Metropolis-Hastings samples from a Gaussian proposal whose mean is the current sample F_i(t) and whose covariance is fixed, which is problematic to tune.
– HMH uses both the gradient and the Hessian to automatically construct a proposal distribution at each sampling step. This is claimed as the main technical contribution of the UAI 2010 paper.
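The following is an illustrative sketch of one Hessian-based Metropolis-Hastings update for a single factor row, assuming the log-posterior f, its gradient, and its Hessian are available as callables and that the negative Hessian is positive definite near the current point. It shows the general idea (Newton-style proposal mean, inverse negative Hessian as proposal covariance), not the paper's exact algorithm.

```python
import numpy as np

def hessian_mh_step(f, grad, hess, x, rng):
    """One Metropolis-Hastings update with a local Gaussian proposal built
    from the gradient and Hessian of the log-posterior f at x.

    Proposal: q(. | x) = N(x + H^{-1} g, H^{-1}) with H = -Hessian(f)(x),
    so both the step direction and the step size adapt to the local
    curvature, unlike a fixed-covariance random walk.
    """
    def proposal_params(z):
        g = grad(z)
        H = -hess(z)                       # negative Hessian acts as a local precision
        cov = np.linalg.inv(H)             # assumes H is positive definite here
        mean = z + cov @ g                 # Newton-style shift toward the local mode
        return mean, cov

    def log_q(z, mean, cov):
        d = z - mean
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(z) * np.log(2 * np.pi))

    mean_x, cov_x = proposal_params(x)
    x_new = rng.multivariate_normal(mean_x, cov_x)

    # The proposal is asymmetric, so the reverse density must be included.
    mean_y, cov_y = proposal_params(x_new)
    log_alpha = (f(x_new) - f(x)
                 + log_q(x, mean_y, cov_y) - log_q(x_new, mean_x, cov_x))
    if np.log(rng.uniform()) < log_alpha:
        return x_new
    return x
```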

Related work

Experiment setting
The Co-occurs(word, stimulus) relation is collected by measuring whether or not the stimulus word occurs within five tokens of a word in the Google Tera-word corpus.
Hold-out prediction: predict held-out entries of the observed relations.
Fold-in prediction: predict a new row in Y.
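A small sketch of one plausible fold-in procedure, under the assumption that the new row of Y corresponds to a new stimulus whose Co-occurs column is observed: fit that stimulus's latent factor against the fixed word factors, then predict its responses through the fixed voxel factors. The optimizer and names below are illustrative, not the paper's implementation.

```python
import numpy as np

def fold_in_predict(x_new, U, W, n_iter=50, lr=0.1, reg=1.0):
    """Fold-in prediction for a new stimulus (a new row of Y).

    x_new : (m,) observed binary Co-occurs column for the new stimulus.
    U     : (m, k) fixed word factors, W : (r, k) fixed voxel factors.
    The new stimulus factor v is fit by gradient ascent on its Bernoulli
    likelihood (with a small ridge penalty), holding U fixed; the Response
    row is then predicted as v W^T under the Gaussian link.
    """
    k = U.shape[1]
    v = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-U @ v))     # predicted co-occurrence probabilities
        g = U.T @ (x_new - p) - reg * v      # gradient of the penalized log-likelihood
        v += lr * g
    return W @ v                              # predicted Response row, shape (r,)
```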

Experiment results

Discussions
– Existing methods force one to choose between ignoring parameter uncertainty and making Gaussianity assumptions.
– Modeling non-Gaussian response types significantly improves predictive accuracy.
– While non-Gaussianity complicates the construction of proposal distributions for Metropolis-Hastings, it has a significant impact on predictive accuracy.