Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. Discussion led by Chunping Wang.

Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. Peter D. Hoff; to appear in the Journal of Computational and Graphical Statistics. Discussion led by Chunping Wang, ECE, Duke University, July 10, 2009.

Outline
- Introduction and motivations
- Sampling from the vector von Mises-Fisher (vMF) distribution (existing method)
- Sampling from the matrix von Mises-Fisher (mMF) distribution
- Sampling from the Bingham-von Mises-Fisher (BMF) distribution
- One example
- Conclusions

Introduction
The matrix Bingham distribution (quadratic term): p_B(X | A, B) proportional to etr(B X^T A X).
The matrix von Mises-Fisher distribution (linear term): p_MF(X | C) proportional to etr(C^T X).
The matrix Bingham-von Mises-Fisher distribution: p_BMF(X | A, B, C) proportional to etr(C^T X + B X^T A X).
Here etr(.) = exp(trace(.)), and X lies on the Stiefel manifold, denoted V_{R,m}: the set of rank-R, m x R matrices with orthonormal columns.
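As a quick reference, the three unnormalized log-densities above can be evaluated directly. This is a minimal sketch (the function name is ours); the Bingham and mMF cases are recovered by zeroing the appropriate parameter:

```python
import numpy as np

def log_bmf(X, A, B, C):
    """Unnormalized log-density of the matrix BMF distribution:
    log etr(C^T X + B X^T A X) = tr(C^T X) + tr(B X^T A X).
    Setting C = 0 gives the matrix Bingham density; setting A = 0
    (or B = 0) gives the matrix von Mises-Fisher density."""
    return np.trace(C.T @ X) + np.trace(B @ X.T @ A @ X)
```
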

Motivations
Sampling orthonormal matrices from distributions over the Stiefel manifold is useful in many applications. Examples:
Factor analysis: observed matrix Y = U D V^T + E, with latent orthonormal factors U and V. Given uniform priors over the Stiefel manifold, the full conditionals of U and V are matrix von Mises-Fisher distributions (the exponent is linear in each).

Motivations
Principal components: observed matrix Y, with each row y_i ~ N(0, Sigma). Eigenvalue decomposition Sigma = U Lambda U^T. The likelihood satisfies p(Y | U, Lambda) proportional to etr(-Lambda^{-1} U^T (Y^T Y) U / 2), so the posterior of U with respect to a uniform prior is a matrix Bingham distribution with A = -Y^T Y / 2 and B = Lambda^{-1}.

Motivations
Network data: Y is a symmetric binary observed matrix, with y_ij the 0-1 indicator of a link between nodes i and j. Model the links through a latent matrix Z = U Lambda U^T + E, with y_ij = 1 when z_ij > 0, where E is a symmetric matrix of independent standard normal noise. The posterior of U with respect to a uniform prior is p(U | Z, Lambda) proportional to etr(Lambda U^T Z U / 2), a matrix Bingham distribution.

Sampling from the vMF Distribution (Wood, 1994)
The vector von Mises-Fisher distribution is a distribution on the (m-1)-sphere in R^m: p(x | kappa, mu) proportional to exp(kappa mu^T x), where the unit vector mu defines the modal direction and kappa >= 0 is the concentration parameter. The density is constant over all x at a given angle from the modal vector.

Sampling from the vMF Distribution (Wood, 1994)
(1) A simple direction: for the modal direction e_1 = (1, 0, ..., 0)^T, write x = (w, sqrt(1 - w^2) v^T)^T with v uniform on the unit sphere in R^{m-1}; the scalar w has density proportional to (1 - w^2)^{(m-3)/2} exp(kappa w) and is sampled by rejection using a transformed Beta proposal envelope.
(2) An arbitrary direction: for a fixed orthogonal matrix Q whose first column is mu, the rotated sample Q x follows the vMF distribution with modal direction mu.
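A runnable sketch of this sampler, combining steps (1) and (2); the Householder reflection at the end plays the role of the fixed orthogonal matrix rotating e_1 to mu (the helper logic and constants follow Wood's envelope, but the implementation details are ours):

```python
import numpy as np

def sample_vmf(mu, kappa, rng):
    """Wood (1994)-style rejection sampler for the vMF distribution on the
    (m-1)-sphere with modal direction mu and concentration kappa."""
    m = len(mu)
    b = (-2*kappa + np.sqrt(4*kappa**2 + (m - 1)**2)) / (m - 1)
    x0 = (1 - b) / (1 + b)
    c = kappa*x0 + (m - 1)*np.log(1 - x0**2)
    while True:  # draw w = mu^T x by rejection from a transformed Beta envelope
        Z = rng.beta((m - 1)/2, (m - 1)/2)
        w = (1 - (1 + b)*Z) / (1 - (1 - b)*Z)
        if kappa*w + (m - 1)*np.log(1 - x0*w) - c >= np.log(rng.uniform()):
            break
    v = rng.normal(size=m - 1)          # uniform direction orthogonal to e_1
    v /= np.linalg.norm(v)
    x = np.concatenate([[w], np.sqrt(1 - w**2)*v])   # sample with mode e_1
    u = np.zeros(m); u[0] = 1.0; u -= mu             # Householder: e_1 -> mu
    if np.linalg.norm(u) > 1e-12:
        u /= np.linalg.norm(u)
        x = x - 2*np.dot(u, x)*u
    return x
```
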

Sampling from the mMF Distribution
Rejection sampling scheme 1: uniform envelope. Draw Y uniformly on the Stiefel manifold and accept when a uniform draw falls below the ratio of the mMF density to a bound on that density (acceptance region vs. rejection region). Because the bound is loose, the acceptance rate collapses as the dimension or the magnitude of C grows: extremely inefficient.

Sampling from the mMF Distribution
Rejection sampling scheme 2: based on sampling from vMF. Proposal samples are drawn column by column from vMF density functions, each column constrained to be orthogonal to the other columns of Y: rotate the modal direction into the null space of the previously sampled columns, draw a vMF sample there, then rotate the sample back so that it is orthogonal to the previous columns. The resulting proposal distribution is a product of vMF densities on spheres of decreasing dimension.

Sampling from the mMF Distribution
Sampling scheme: draw a proposal Y column by column as above, then accept or reject it with probability given by the ratio of the target mMF density to the envelope built from the proposal density; repeat until a draw is accepted.

Sampling from the mMF Distribution
A Gibbs sampling scheme: sample the columns of X iteratively, each from its vMF full conditional on the sphere lying in the null space of the remaining columns. Note that when R = m, a column is determined up to sign given the others, so the single-column sampler cannot mix; remedy: sampling two columns at a time. Non-orthogonality among the columns of C adds to the autocorrelation in the Gibbs sampler; remedy: with the SVD C = U D V^T, performing the Gibbs sampler on Z = X V, whose mMF density has the orthogonal parameter matrix U D.
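A structural sketch of one such Gibbs sweep. The lower-dimensional vMF draw itself (e.g. Wood's method from the earlier slides) is abstracted as a callable `draw_vmf(mu, kappa)`; the null-space rotation is what keeps the columns orthonormal (function names are ours):

```python
import numpy as np

def complement_basis(M):
    """Orthonormal basis for the orthogonal complement of span(M)."""
    U = np.linalg.svd(M, full_matrices=True)[0]
    return U[:, M.shape[1]:]

def gibbs_mmf_sweep(X, C, draw_vmf, rng):
    """One Gibbs sweep for the matrix von Mises-Fisher density etr(C^T X):
    each column is redrawn from its vMF full conditional on the sphere
    lying in the null space of the remaining columns."""
    m, R = X.shape
    for r in rng.permutation(R):
        N = complement_basis(np.delete(X, r, axis=1))  # m x (m-R+1) basis
        c = N.T @ C[:, r]             # conditional parameter, null-space coords
        kappa = np.linalg.norm(c)
        if kappa < 1e-12:             # flat conditional: uniform on the subsphere
            z = rng.normal(size=N.shape[1])
            z /= np.linalg.norm(z)
        else:
            z = draw_vmf(c / kappa, kappa)
        X[:, r] = N @ z               # rotate back; X stays orthonormal
    return X
```
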

Sampling from the BMF Distribution
The vector Bingham distribution: p(x | A) proportional to exp(x^T A x) on the unit sphere. A Gibbs sampler updates one coordinate at a time: substitute theta = x_i^2 and sample theta from the conditional density obtained from the variable substitution, by rejection sampling or grid sampling; updating theta against the normalized direction of the remaining coordinates gives better mixing. Because the density is symmetric about zero, the sign of x_i is uniformly distributed given theta.
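One concrete way to realize this coordinate-wise scheme, sketched under our own design choices: rotate to the eigenbasis of A, where the density has no cross terms (so the sign of each coordinate is exactly uniform given its square), and grid-sample theta = x_i^2:

```python
import numpy as np

def gibbs_vector_bingham(A, n_sweeps=400, ngrid=400, rng=None):
    """Coordinate-wise Gibbs sampler for the vector Bingham density
    p(x) ∝ exp(x^T A x) on the unit sphere. In the eigenbasis of A the
    density is exp(sum_j lam_j u_j^2), so flipping any coordinate's sign
    leaves it unchanged; theta = u_i^2 is drawn by grid sampling."""
    rng = np.random.default_rng() if rng is None else rng
    lam, E = np.linalg.eigh(A)                 # A = E diag(lam) E^T
    m = len(lam)
    u = rng.normal(size=m)
    u /= np.linalg.norm(u)
    grid = (np.arange(ngrid) + 0.5) / ngrid    # grid for theta in (0, 1)
    samples = []
    for _ in range(n_sweeps):
        for i in range(m):
            y = np.delete(u, i)
            y /= np.linalg.norm(y)             # direction of the other coords
            q = np.delete(lam, i) @ y**2
            logp = (-0.5*np.log(grid) + 0.5*(m - 3)*np.log1p(-grid)
                    + grid*lam[i] + (1 - grid)*q)
            p = np.exp(logp - logp.max())
            p /= p.sum()
            theta = rng.choice(grid, p=p)
            u[i] = rng.choice([-1.0, 1.0])*np.sqrt(theta)   # uniform sign
            u[np.arange(m) != i] = np.sqrt(1 - theta)*y
        samples.append(E @ u)                  # rotate back to original basis
    return np.array(samples)
```
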

Sampling from the BMF Distribution
The vector Bingham-von Mises-Fisher distribution: p(x | A, c) proportional to exp(c^T x + x^T A x). With the linear term, the density is not symmetric about zero any more, so s = sign(x_i) is no longer uniformly distributed on {-1, +1}: the update of theta and s should be done jointly. The modified steps 2(b) and 2(c) sample theta from its conditional with the sign summed out, then draw s from its two-point conditional given theta.
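A sketch of this joint (theta, s) update for one coordinate, in the eigenbasis of the quadratic parameter. Here `lam_i` is the eigenvalue for coordinate i, `q` and `r` collect the quadratic and linear contributions of the remaining coordinates, and `ci` is the coordinate's own linear coefficient (all names and the grid size are ours):

```python
import numpy as np

def sample_theta_sign(lam_i, q, r, ci, m, rng, ngrid=400):
    """Joint update of theta = x_i^2 and s = sign(x_i) for one coordinate
    of the vector BMF sampler. theta is drawn from its conditional with
    the sign summed out, then s from its two-point conditional."""
    grid = (np.arange(ngrid) + 0.5) / ngrid
    base = (-0.5*np.log(grid) + 0.5*(m - 3)*np.log1p(-grid)
            + grid*lam_i + (1 - grid)*q + np.sqrt(1 - grid)*r)
    cross = np.sqrt(grid)*ci                    # the s-dependent term
    logp = np.logaddexp(base + cross, base - cross)   # sum over s = +/-1
    p = np.exp(logp - logp.max())
    p /= p.sum()
    theta = rng.choice(grid, p=p)
    c = np.sqrt(theta)*ci
    s = 1.0 if rng.uniform() < 1.0/(1.0 + np.exp(-2*c)) else -1.0
    return theta, s
```
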

Sampling from the BMF Distribution
The matrix Bingham-von Mises-Fisher distribution: rewrite the density etr(C^T X + B X^T A X) as a function of a single column x_r given the other columns; the full conditional of x_r is then a vector BMF distribution on the sphere in the null space of the remaining columns, so the vector sampler applies column by column.

Sampling from the BMF Distribution
Sample two columns at a time: parameterize the 2-dimensional orthonormal matrices as
    [ cos(phi)  -s sin(phi) ]
    [ sin(phi)   s cos(phi) ],   phi in [0, 2*pi), s in {-1, +1},
so that uniform pairs on the circle (uniform phi) together with a uniform sign s cover the set of such matrices.
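This parameterization makes a two-column update one-dimensional: every orthonormal pair spanning the same plane as the current columns Z is Z times one of these 2x2 matrices. A minimal grid-sampling sketch for a user-supplied target log-density (function names and grid size are ours):

```python
import numpy as np

def pair_matrix(phi, s=1.0):
    """2x2 orthonormal matrix parameterized by an angle phi and a sign s."""
    return np.array([[np.cos(phi), -s*np.sin(phi)],
                     [np.sin(phi),  s*np.cos(phi)]])

def sample_pair(Z, log_density, rng, ngrid=360):
    """Grid sampler for a two-column update: evaluate the target
    log-density over a grid on (phi, s) and draw one candidate."""
    phis = 2*np.pi*(np.arange(ngrid) + 0.5)/ngrid
    cands = [(phi, s) for phi in phis for s in (-1.0, 1.0)]
    logp = np.array([log_density(Z @ pair_matrix(phi, s)) for phi, s in cands])
    p = np.exp(logp - logp.max())
    p /= p.sum()
    phi, s = cands[rng.choice(len(cands), p=p)]
    return Z @ pair_matrix(phi, s)
```

Because pair_matrix(phi, s) is orthonormal, the returned pair automatically remains orthonormal and spans the same plane as Z.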


Example: Eigenmodel estimation for network data

Example: Eigenmodel estimation for network data indicator of a link between nodes i and j. Posterior with respect to uniform prior, symmetric binary observed matrix, with the /21 E: symmetric matrix of independent standard normal noise BMF distribution with

Example: Eigenmodel estimation for network data
[Figure: samples from two independent Markov chains with different starting values.]


Conclusions
A sampling scheme for a family of exponential-family distributions over the Stiefel manifold was developed;
this enables Bayesian inference for orthonormal matrices and lets prior information be incorporated during inference;
the author mentions several applications and implements the sampling scheme on a network data set.

References
Andrew T. A. Wood. Simulation of the von Mises Fisher distribution. Comm. Statist. Simulation Comput., 23, 1994.
G. Ulrich. Computer generation of distributions on the m-sphere. Appl. Statist., 33, 1984.
J. G. Saw. A family of distributions on the m-sphere and some hypothesis tests. Biometrika, 65, 69-74, 1978.