Inverse Regression Methods. Prasad Naik. 7th Triennial Choice Symposium, Wharton, June 16, 2007.

Outline
- Motivation
- Principal Components Regression (PCR)
- Sliced Inverse Regression (SIR)
- Applications
- Constrained Inverse Regression (CIR)
- Partial Inverse Regression (PIR): the p > N problem and simulation results

Motivation
- Estimate the high-dimensional model y = g(x1, x2, ..., xp)
- The link function g(.) is unknown
- Small p (≤ 6 variables): apply multivariate local (linear) polynomial regression
- Large p (> 10 variables): the curse of dimensionality leads to the empty-space phenomenon

Principal Components Regression (PCR; Massy 1965, JASA)
- High-dimensional data X (n x p) with covariance matrix Σ_x
- Eigenvalue decomposition Σ_x e = λ e, giving eigenpairs (λ1, e1), (λ2, e2), ..., (λp, ep)
- Retain K components (e1, e2, ..., eK), where K < p
- Low-dimensional data Z = (z1, z2, ..., zK), where zi = X ei are the "new" variables (or factors)
- How should the dimension K of the low-dimensional subspace be chosen?
- These are not the most predictive variables, because the information in y is ignored
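A minimal numpy sketch of the PCR construction above; the function name is illustrative, and regressing y on an intercept plus the K retained factors is one common way to complete the PCR step.

    import numpy as np

    def pcr_factors(X, y, K):
        """Principal components regression: retain the top-K eigenvectors of Cov(X),
        form the factors Z = X E_K, and regress y on them."""
        Xc = X - X.mean(axis=0)                 # center the predictors
        Sigma_x = np.cov(Xc, rowvar=False)      # p x p sample covariance
        eigvals, eigvecs = np.linalg.eigh(Sigma_x)
        order = np.argsort(eigvals)[::-1]       # sort eigenpairs by decreasing eigenvalue
        E_K = eigvecs[:, order[:K]]             # retain K components (e1, ..., eK)
        Z = Xc @ E_K                            # low-dimensional factors z_i = X e_i
        design = np.column_stack([np.ones(len(y)), Z])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        return E_K, Z, coef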

Sliced Inverse Regression (SIR; Li 1991, JASA)
- Similar idea: reduce X (n x p) to Z (n x K)
- Generalized eigendecomposition Σ_η e = λ Σ_x e, where Σ_η = Cov(E[X|y])
- Retain K* components (e1, ..., eK*)
- Create new variables Z = (z1, ..., zK*), where zi = X ei
- K* is the smallest integer q (= 0, 1, 2, ...) for which the remaining eigenvalues are judged to be zero (sequential test in Li 1991)
- These are the most predictive variables across any set of unit-norm vectors e and any transformation T(y)
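A rough numpy sketch of the SIR eigenproblem above, assuming n > p (so the sample Σ_x is invertible) and a simple equal-count slicing of y; the slice count and names are illustrative.

    import numpy as np
    from scipy.linalg import eigh

    def sir_directions(X, y, n_slices=10, K=1):
        """Sliced inverse regression: solve Sigma_eta e = lambda Sigma_x e,
        with Sigma_eta = Cov(E[X|y]) estimated from slice means of X."""
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        Sigma_x = np.cov(Xc, rowvar=False)

        # Estimate Cov(E[X|y]) by averaging outer products of slice means
        order = np.argsort(y)
        Sigma_eta = np.zeros((p, p))
        for idx in np.array_split(order, n_slices):
            m = Xc[idx].mean(axis=0)
            Sigma_eta += (len(idx) / n) * np.outer(m, m)

        # Generalized symmetric eigenproblem; eigh returns eigenvalues in ascending order
        eigvals, eigvecs = eigh(Sigma_eta, Sigma_x)
        E = eigvecs[:, ::-1][:, :K]             # top-K directions e1, ..., eK
        return E, Xc @ E                        # new variables z_i = X e_i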

SIR Applications (Naik, Hagerty, Tsai 2000, JMR)
- Model: p variables reduced to K factors
- New product development context: 28 variables reduced to 1 factor
- Direct marketing context: 73 variables reduced to 2 factors

Constrained Inverse Regression (CIR; Naik and Tsai 2005, JASA)
- Can we extract meaningful factors? Yes
- First capture the prior knowledge about the factor structure in a set of constraints
- Then apply the proposed method, CIR

Example 4.1 from Naik and Tsai (2005, JASA)
- Consider a 2-factor model with p = 5 variables
- Factor 1 includes variables (4, 5)
- Factor 2 includes variables (1, 2, 3)
- Constraint sets: each factor's direction is constrained so that the coefficients of the variables outside that factor are zero
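As a concrete illustration of the constraint sets above, one natural encoding (the exact matrices are given in the paper; this form is an assumption) selects, for each factor, the variables whose coefficients are forced to zero:

    import numpy as np

    # p = 5 variables; factor 1 uses variables (4, 5), factor 2 uses variables (1, 2, 3).
    # Each constraint matrix picks out the variables that must have zero coefficients
    # in that factor's direction, so the constraint reads C.T @ e = 0.
    I5 = np.eye(5)
    C1 = I5[:, [0, 1, 2]]   # factor 1: coefficients of variables 1, 2, 3 are zero
    C2 = I5[:, [3, 4]]      # factor 2: coefficients of variables 4, 5 are zero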

CIR (contd.)
- CIR approach: solve the eigenvalue decomposition (I - P_c) Σ_η e = λ Σ_x e, where P_c is the projection matrix onto the space spanned by the constraints
- When P_c = 0, CIR reduces to SIR (i.e., the methods are nested)
- Shrinkage (e.g., Lasso): set insignificant effects to zero by formulating an appropriate constraint; this improves the t-values of the remaining effects (i.e., efficiency)
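A hedged numpy sketch of the CIR eigenproblem above. The projection P_c is assumed here to be the orthogonal projection onto the column space of a constraint matrix C (such as C1 or C2 from Example 4.1); see Naik and Tsai (2005) for the exact construction.

    import numpy as np
    from scipy.linalg import eig

    def cir_directions(Sigma_eta, Sigma_x, C=None, K=1):
        """Constrained inverse regression sketch: solve (I - P_c) Sigma_eta e = lambda Sigma_x e."""
        p = Sigma_x.shape[0]
        if C is None:
            P_c = np.zeros((p, p))                    # P_c = 0 recovers ordinary SIR
        else:
            P_c = C @ np.linalg.pinv(C.T @ C) @ C.T   # assumed projection onto span(C)
        lhs = (np.eye(p) - P_c) @ Sigma_eta
        vals, vecs = eig(lhs, Sigma_x)                # generalized (possibly non-symmetric) problem
        order = np.argsort(vals.real)[::-1]           # sort by decreasing real part
        return np.real(vecs[:, order[:K]])            # top-K constrained directions

Calling cir_directions(Sigma_eta, Sigma_x, C=C1) with the constraint matrix from Example 4.1 would, for instance, extract a factor 1 direction restricted to variables 4 and 5.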

The p > N Problem
- OLS, MLE, SIR, and CIR break down when p > N
- Partial Inverse Regression (PIR; Li, Cook, Tsai, Biometrika, forthcoming)
- Combines ideas from PLS and SIR
- Works well even when p > 3N and the variables are highly correlated
- Applies to the single-index model with unknown link g(.)

p > N Solution
- To estimate β, first construct the matrix R, where e1 is the principal eigenvector of Σ_η = Cov(E[X|y])
- Then compute the estimator from R (the formulas for R and for the estimator are given in Li, Cook, and Tsai)
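The slide omits the definitions of R and of the estimator, so the following is only an illustrative reconstruction of the PLS-and-SIR combination described on the previous slide, under assumed forms; it is not the published estimator of Li, Cook, and Tsai.

    import numpy as np

    def pir_style_estimate(X, y, e1, d=3):
        """Illustrative PLS/SIR hybrid: build R from Krylov-type directions generated
        by Sigma_x acting on the SIR eigenvector e1, then estimate beta within span(R).
        The construction of R and the final formula are assumptions for this sketch."""
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        Sigma_x = np.cov(Xc, rowvar=False)
        sigma_xy = Xc.T @ yc / (len(y) - 1)

        # Assumed construction: R = [e1, Sigma_x e1, ..., Sigma_x^{d-1} e1]
        cols = [e1]
        for _ in range(d - 1):
            cols.append(Sigma_x @ cols[-1])
        R = np.column_stack(cols)

        # Restrict estimation to span(R): beta = R (R' Sigma_x R)^- R' sigma_xy
        beta = R @ np.linalg.pinv(R.T @ Sigma_x @ R) @ R.T @ sigma_xy
        return beta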

Conclusions
- Inverse regression methods offer estimators applicable to a remarkably broad class of models and to high-dimensional data, including p > N (conceptually the limiting case)
- The estimators are closed-form, so they are easy to code (just a few lines) and computationally inexpensive
- No iterations, re-sampling, or draws (hence no do or for loops), so convergence is guaranteed
- Standard errors for inference are derived in the cited papers