Published by Parker Lefort. Modified over 9 years ago.
1
Bayesian Factor Regression Models in the “Large p, Small n” Paradigm
Mike West, Duke University
Presented by: John Paisley, Duke University
2
Outline
Empirical Factor Regression (SVD)
Latent Factor Regression
Sparse Factor Regression
3
Linear Regression & Empirical Factor Regression
Linear regression: [equation not preserved]
SVD regression: [equation not preserved]
D is a diagonal matrix of singular values.
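The slide's equations did not survive, so the following is only a minimal sketch of regression on empirical (SVD) factors. It assumes the predictors are stored as an n × p array with samples in rows, which may be the transpose of the slide's convention. The names `F`, `d`, `At`, and `theta`, and all of the data, are illustrative rather than taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 200                       # "large p, small n" (synthetic sizes)
X = rng.standard_normal((n, p))      # synthetic predictors, samples in rows
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Thin SVD: X = F @ diag(d) @ At, where d holds the singular values
# (the diagonal of D) and F has orthonormal columns (the empirical factors).
F, d, At = np.linalg.svd(X, full_matrices=False)

# Regress y on the k = n factors instead of the p original predictors.
theta, *_ = np.linalg.lstsq(F, y, rcond=None)
```

Because the columns of `F` are orthonormal, the least-squares solution here reduces to `F.T @ y`; the regression has moved from p coefficients to at most n.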
4
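Empirical Factor Regression
By definition, [equation not preserved]
Regression is now done in factor space using generalized shrinkage (ridge regression) priors on the factor-space coefficients, e.g. the RVM.
Problem of inversion: the map from the original regression coefficients to the factor-space coefficients is many-to-one.
[equation not preserved] is the canonical “least-norm” inverse.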
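A minimal sketch of the inversion problem, assuming samples in rows and the factor-space coefficients defined as theta = D A' beta (an assumption about notation, since the slide's equations are missing). It uses a plain ridge penalty with a hypothetical value `tau`, not the RVM, and synthetic data throughout.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 15, 100
X = rng.standard_normal((n, p))      # synthetic predictors
y = rng.standard_normal(n)           # synthetic response

F, d, At = np.linalg.svd(X, full_matrices=False)   # X = F diag(d) At
A = At.T                                           # p x k, A.T @ A = I

tau = 0.5                            # hypothetical ridge penalty
# F has orthonormal columns, so the ridge estimate is simple shrinkage.
theta = (F.T @ y) / (1.0 + tau)

# Least-norm inverse: the shortest beta satisfying diag(d) @ A.T @ beta = theta.
beta = A @ (theta / d)

# Any component orthogonal to the columns of A leaves theta unchanged,
# which is why the map from beta to theta is many-to-one.
z = rng.standard_normal(p)
beta_alt = beta + (z - A @ (A.T @ z))
```

Both `beta` and `beta_alt` map to the same `theta`, but `beta` has the smaller Euclidean norm; this is the sense in which the inverse is "least-norm".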
5
Example: Biscuit Dough Data
NIR spectroscopy reflectance values are the predictors.
The response is the fat content of the dough samples.
39 training and 39 testing samples: the data are pooled, and the test responses are treated as missing values to be imputed.
The top 16 factors are used, chosen by the size of their singular values.
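The pooling idea can be sketched as follows. This is a simplified stand-in that assumes a least-squares plug-in for the missing responses rather than the Bayesian imputation used in the presentation. The sample counts and k = 16 come from the slide; the number of predictors `p` and all data are synthetic and do not represent the biscuit dough measurements.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test, k = 39, 39, 16      # counts from the slide
n, p = n_train + n_test, 300         # p is a made-up value

# Synthetic stand-in for spectra: low-rank signal plus small noise.
latent = rng.standard_normal((n, 5))
X = latent @ rng.standard_normal((5, p))
X += 0.05 * rng.standard_normal((n, p))
w = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
y = latent @ w + 0.05 * rng.standard_normal(n)

# Pool training and test predictors before computing the factors.
F, d, At = np.linalg.svd(X, full_matrices=False)
F_k = F[:, :k]                       # singular values come sorted, largest first

train = np.arange(n_train)
test = np.arange(n_train, n)

# Fit on rows with observed responses, then fill in the "missing" ones.
theta, *_ = np.linalg.lstsq(F_k[train], y[train], rcond=None)
y_imputed = F_k[test] @ theta
rmse = np.sqrt(np.mean((y_imputed - y[test]) ** 2))
```

Pooling lets the test predictors contribute to the estimated factors even though their responses are never used in the fit.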
6
Example: Biscuit Dough Data (2)
Left: fitted and predicted vs. true values.
Right: least-norm inverse of beta.
The ~1700 nm range is an absorbance region for fat.
As can be seen, the solution is not sparse.
7
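Latent Factor Regression
Loosen the exact factor decomposition of X to a latent factor model with loadings B and an idiosyncratic noise term.
Under proper constraints on B, this finds common structure in X and isolates idiosyncrasies to noise.
Now, variation in X has less effect on y.
The implied prior is [equation not preserved]
When the noise variance Phi → 0, this reverts to empirical factor regression.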
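One way to see the effect of the noise term is to compute the regression coefficients that a latent factor model implies for y given x. The sketch below assumes a standard factor-analysis form, x = B lam + nu and y = theta' lam + eps with lam ~ N(0, I) and nu ~ N(0, diag(Phi)); this form, the function name `implied_beta`, and all numeric values are assumptions, since the slide's equations are missing.

```python
import numpy as np

rng = np.random.default_rng(3)
p, k = 50, 3
B = rng.standard_normal((p, k))      # loadings (hypothetical values)
theta = np.array([1.0, -0.5, 2.0])   # factor-space regression coefficients
Phi = 0.1 * np.ones(p)               # diagonal idiosyncratic variances


def implied_beta(B, theta, Phi):
    """Coefficients beta such that E[y | x] = beta @ x under the assumed model."""
    Bt_Pinv = B.T / Phi                       # B' Phi^{-1}, shape k x p
    M = np.eye(B.shape[1]) + Bt_Pinv @ B      # I + B' Phi^{-1} B
    # E[lam | x] = M^{-1} B' Phi^{-1} x, and y depends on x only through lam.
    return np.linalg.solve(M, Bt_Pinv).T @ theta


beta = implied_beta(B, theta, Phi)

# As the noise variance shrinks, beta approaches the least-norm solution
# of B.T @ beta = theta, i.e. the noise-free factor regression.
beta_small = implied_beta(B, theta, 1e-8 * np.ones(p))
beta_limit = np.linalg.pinv(B).T @ theta
```

With larger `Phi`, the coefficients are shrunk relative to `beta_limit`, which is one reading of "variation in X has less effect on y".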
8
Sparse Latent Factor Regression
With respect to gene expression profiling, “multiple biological factors underlie patterns of gene expression variation, so latent factor approaches are natural – we imagine that latent factors reflect individual biological functions… This is a motivating context for sparse models.”
Each column of B represents the genes involved in a particular biological factor.
Each row of B represents a particular gene’s involvement across biological factors.
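The row and column reading of B can be illustrated with a toy loadings matrix. Everything here is hypothetical: the gene labels, the values in `B`, and the helper names `genes_in_factor` and `factors_for_gene` are made up for illustration and are not estimates from any data.

```python
import numpy as np

genes = ["g1", "g2", "g3", "g4", "g5"]   # hypothetical gene labels

# Hypothetical sparse loadings: rows are genes, columns are factors.
B = np.array([
    [0.9, 0.0,  0.0],
    [0.0, 0.0,  1.2],
    [0.4, 0.0, -0.7],
    [0.0, 0.8,  0.0],
    [0.0, 0.0,  0.0],
])


def genes_in_factor(B, genes, j):
    """Column view: genes with a non-zero loading on factor j."""
    return [genes[i] for i in np.flatnonzero(B[:, j])]


def factors_for_gene(B, i):
    """Row view: indices of the factors that gene i loads on."""
    return np.flatnonzero(B[i, :]).tolist()


third_factor_genes = genes_in_factor(B, genes, 2)
g3_factors = factors_for_gene(B, 2)
```

Exact zeros in B are what make this kind of read-off possible; with a dense loadings matrix every gene would appear in every factor.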
9
Example: Gene Expression Data
p = 6128 genes measured using Affymetrix DNA microarrays
n = 49 breast cancer tumor samples
k = 25 factors
Factor 3 separates the tumors by estrogen receptor (ER) status. Red: ER-positive tumors; blue: ER-negative tumors.
10
Example: Gene Expression Data Comparison with results obtained using empirical SVD factors
11
Conclusion
Sparse factor regression modeling is a promising framework for dimensionality reduction of predictors.
Only those factors that are relevant (e.g. factor 3) are of interest.
Therefore, only those genes with non-zero values in that column of B are meaningful.