Relational Learning via Collective Matrix Factorization (SIGKDD 2008)
A Bayesian Matrix Factorization Model for Relational Data (UAI 2010)
Authors: Ajit P. Singh & Geoffrey J. Gordon
Presenter: Xian Xing Zhang
Basic ideas Collective matrix factorization (CMF) is proposed for relational learning when an entity participates in multiple relations: several matrices (with different types of support) are factored simultaneously with shared parameters. CMF is then extended to a hierarchical Bayesian model to enhance the sharing of statistical strength.
An example of application Functional Magnetic Resonance Imaging (fMRI): – fMRI data can be viewed as a real-valued relation, Response(stimulus, voxel) ∈ ℝ – stimulus side-information: a binary relation, Co-occurs(word, stimulus) ∈ {0, 1}, collected by recording whether the stimulus word co-occurs with other commonly used words in a large corpus – The goal is to predict unobserved values of the Response relation
Basic model description In the fMRI example, the Co-occurs relation is an m×n matrix X and the Response relation is an n×r matrix Y. Each matrix has its own likelihood: Co-occurs (p_X) is modeled with a Bernoulli distribution, Response (p_Y) with a Gaussian.
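A minimal numpy sketch of this setup (toy sizes and random data, not the authors' code; factor names U, V, W are illustrative): the binary matrix X gets a Bernoulli likelihood with mean sigmoid(UVᵀ), the real-valued matrix Y gets a Gaussian likelihood with mean VWᵀ, and the stimulus factor V is shared between the two.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, r, k = 20, 15, 10, 4  # words, stimuli, voxels, latent dimension (toy sizes)

# Factors: U (words), V (stimuli, shared between both relations), W (voxels)
U = rng.normal(scale=0.1, size=(m, k))
V = rng.normal(scale=0.1, size=(n, k))
W = rng.normal(scale=0.1, size=(r, k))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Binary Co-occurs matrix X (m x n): Bernoulli log-likelihood with mean sigmoid(U V')
X = rng.integers(0, 2, size=(m, n)).astype(float)
p_X = sigmoid(U @ V.T)
loglik_X = np.sum(X * np.log(p_X) + (1 - X) * np.log(1 - p_X))

# Real-valued Response matrix Y (n x r): Gaussian log-likelihood with mean V W'
Y = rng.normal(size=(n, r))
sigma2 = 1.0
resid = Y - V @ W.T
loglik_Y = -0.5 * np.sum(resid**2) / sigma2 - 0.5 * Y.size * np.log(2 * np.pi * sigma2)

# Collective objective: the shared factor V couples the two matrices,
# so fitting X also constrains the factorization of Y
total_loglik = loglik_X + loglik_Y
```

Because V appears in both terms, maximizing the joint log-likelihood transfers information from the Co-occurs side data into the Response prediction.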
Hierarchical Collective Matrix Factorization Information between entities can only be shared indirectly, through another factor: e.g., in f(UV'), two distinct rows of U are correlated only through V. The hierarchical prior acts as a shrinkage estimator for the rows of a factor, pooling information indirectly through Θ.
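The shrinkage effect can be illustrated with a toy numpy sketch (not the paper's model; the shared mean `theta_hat` stands in for Θ, and the variances are assumed known for simplicity): rows of a factor drawn around a common prior mean are estimated better when each noisy row estimate is pulled toward the pooled mean.

```python
import numpy as np

rng = np.random.default_rng(2)

# Rows of a factor U are drawn around a shared prior mean (playing the role of Θ)
k = 3
theta_true = np.array([1.0, -0.5, 2.0])
U = theta_true + rng.normal(scale=0.5, size=(50, k))   # 50 rows of the factor

# Noisy per-row estimates (e.g., from limited observations of each entity)
U_hat = U + rng.normal(scale=1.0, size=U.shape)

# Shrinkage: with prior variance tau2 and noise variance sigma2, the posterior
# mean of each row is a precision-weighted average of its own estimate and the
# pooled mean estimated from all rows
tau2, sigma2 = 0.25, 1.0
theta_hat = U_hat.mean(axis=0)
w = tau2 / (tau2 + sigma2)            # weight on the per-row estimate
U_shrunk = w * U_hat + (1 - w) * theta_hat

# Pooling reduces total estimation error relative to the raw per-row estimates
err_raw = np.mean((U_hat - U) ** 2)
err_shrunk = np.mean((U_shrunk - U) ** 2)
```

This is the sense in which the hierarchical prior pools information: rows with little data borrow strength from the other rows through the shared hyperparameter.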
Bayesian Inference Hessian Metropolis-Hastings (HMH): – In random-walk Metropolis-Hastings, the proposal is a Gaussian centered at the current sample F_i(t) with a fixed covariance matrix, which is hard to tune and scales poorly. – HMH instead uses both the gradient and the Hessian of the log-posterior to automatically construct a proposal distribution at each sampling step. This is claimed as the main technical contribution of the UAI 2010 paper.
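A 1-D sketch of a gradient/Hessian-based MH step on a toy Gaussian target (an illustration of the idea, not the paper's implementation): the proposal mean is a Newton step from the current point and the proposal variance is the inverse negative Hessian, with the usual MH correction for the asymmetric proposal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy log-posterior: a 1-D Gaussian N(2, 0.5^2), so gradient and Hessian are analytic
mu, s2 = 2.0, 0.25
def log_post(x): return -0.5 * (x - mu)**2 / s2
def grad(x):     return -(x - mu) / s2
def hess(x):     return -1.0 / s2

def logq(a, mean, var):
    # Log-density of the Gaussian proposal
    return -0.5 * (a - mean)**2 / var - 0.5 * np.log(2 * np.pi * var)

def hmh_step(x):
    # Proposal: mean = one Newton step from x, variance = inverse negative Hessian
    prop_var = 1.0 / -hess(x)
    prop_mean = x + prop_var * grad(x)
    x_new = rng.normal(prop_mean, np.sqrt(prop_var))
    # Reverse-proposal terms for the Metropolis-Hastings acceptance ratio
    rev_var = 1.0 / -hess(x_new)
    rev_mean = x_new + rev_var * grad(x_new)
    log_alpha = (log_post(x_new) - log_post(x)
                 + logq(x, rev_mean, rev_var) - logq(x_new, prop_mean, prop_var))
    return x_new if np.log(rng.random()) < log_alpha else x

x = 0.0
samples = []
for _ in range(2000):
    x = hmh_step(x)
    samples.append(x)
samples = np.array(samples[500:])   # discard burn-in
```

On this Gaussian target the curvature-matched proposal coincides with the target itself, so every step is accepted; for the non-Gaussian likelihoods in CMF the same construction adapts the proposal locally, which is exactly what a fixed random-walk covariance cannot do.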
Related work
Experiment setting The Co-occurs(word, stimulus) relation is collected by measuring whether or not the stimulus word occurs within five tokens of a word in the Google Tera-word corpus. Two prediction tasks: hold-out prediction (predict held-out entries of Y) and fold-in prediction (predict a new row of Y).
Experiment results
Discussions Existing methods force one to choose between ignoring parameter uncertainty and making Gaussianity assumptions. While non-Gaussian response types complicate the construction of proposal distributions for Metropolis-Hastings, modeling them significantly improves predictive accuracy.