
1 Inverse Regression Methods
Prasad Naik
7th Triennial Choice Symposium, Wharton, June 16, 2007

2 Outline
- Motivation
- Principal Components (PCR)
- Sliced Inverse Regression (SIR)
- Application
- Constrained Inverse Regression (CIR)
- Partial Inverse Regression (PIR)
  - p > N problem
  - simulation results

3 Motivation
- Estimate the high-dimensional model: y = g(x1, x2, ..., xp)
- Link function g(.) is unknown
- Small p (≤ 6 variables): apply multivariate local (linear) polynomial regression
- Large p (> 10 variables): curse of dimensionality => empty-space phenomenon

4 Principal Components (PCR, Massy 1965, JASA)
- High-dimensional data X (n x p) with covariance matrix Σ_x
- Eigenvalue decomposition Σ_x e = λ e gives the pairs (λ1, e1), (λ2, e2), ..., (λp, ep)
- Retain K components, (e1, e2, ..., eK), where K < p
- Low-dimensional data Z = (z1, z2, ..., zK), where zi = X ei are the "new" variables (or factors)
- Low-dimensional subspace, K = ??
- Not the most predictive variables, because the y information is ignored
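
A minimal NumPy sketch of the PCR reduction above, assuming the predictors are centered before the eigendecomposition; the function name and the choice of K are illustrative, not part of the original slides.

```python
import numpy as np

def pcr_factors(X, K):
    """Extract K principal-component factors from an n x p data matrix X."""
    Xc = X - X.mean(axis=0)                 # center the predictors
    Sigma_x = np.cov(Xc, rowvar=False)      # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma_x)
    rank = np.argsort(eigvals)[::-1]        # order eigenpairs by decreasing eigenvalue
    E_K = eigvecs[:, rank[:K]]              # retain the leading K eigenvectors (e1, ..., eK)
    Z = Xc @ E_K                            # new low-dimensional variables zi = X ei
    return Z, E_K
```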

5 Sliced Inverse Regression (SIR, Li 1991, JASA)
- Similar idea: X (n x p) → Z (n x K)
- Generalized eigen-decomposition Σ_η e = λ Σ_x e, where Σ_η = Cov(E[X|y])
- Retain K* components, (e1, ..., eK*)
- Create new variables Z = (z1, ..., zK*), where zi = X ei
- K* is the smallest integer q (= 0, 1, 2, ...) such that the eigenvalues beyond the first q are not significantly different from zero
- Most predictive variables across any set of unit-norm vectors e's and any transformation T(y)
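
A sketch of the SIR eigenproblem above, assuming Σ_η is estimated with equal-count slices on y; the number of slices and the choice of K* are left as inputs here rather than selected by the sequential test mentioned on the slide.

```python
import numpy as np
from scipy.linalg import eigh

def sir_directions(X, y, n_slices=10, K=1):
    """Sliced Inverse Regression: solve Sigma_eta e = lambda Sigma_x e."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma_x = np.cov(Xc, rowvar=False)      # assumed positive definite (n > p)

    # Estimate Sigma_eta = Cov(E[X|y]) by slicing y and averaging X within slices.
    slice_order = np.argsort(y)
    Sigma_eta = np.zeros((p, p))
    for idx in np.array_split(slice_order, n_slices):
        m = Xc[idx].mean(axis=0)            # slice mean of the centered predictors
        Sigma_eta += (len(idx) / n) * np.outer(m, m)

    # Generalized eigendecomposition: Sigma_eta e = lambda Sigma_x e.
    eigvals, eigvecs = eigh(Sigma_eta, Sigma_x)
    rank = np.argsort(eigvals)[::-1]
    E = eigvecs[:, rank[:K]]                # leading K directions (e1, ..., eK)
    Z = Xc @ E                              # new variables zi = X ei
    return Z, E, eigvals[rank]
```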

6 SIR Applications (Naik, Hagerty, Tsai 2000, JMR)
- Model: p variables reduced to K factors
- New Product Development context: 28 variables → 1 factor
- Direct Marketing context: 73 variables → 2 factors

7 Constrained Inverse Regression (CIR, Naik and Tsai 2005, JASA)
- Can we extract meaningful factors? Yes
- First capture this prior information in a set of constraints
- Then apply our proposed method, CIR

8 Example 4.1 from Naik and Tsai (2005, JASA)
- Consider a 2-factor model with p = 5 variables
- Factor 1 includes variables (4, 5)
- Factor 2 includes variables (1, 2, 3)
- Constraint sets: given as formulas in the paper (an illustrative encoding follows below)
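
The constraint sets themselves do not survive this transcript. As an illustrative encoding only, the structure "factor 1 loads only on variables 4 and 5; factor 2 loads only on variables 1, 2, 3" can be written as one constraint matrix per factor, each selecting the coefficients that factor is forced to zero on; the matrices A1 and A2 below are assumptions about one such encoding, not the exact matrices from Example 4.1.

```python
import numpy as np

# Illustrative constraint matrices for the 2-factor, p = 5 example:
# each row selects a coefficient that the factor is NOT allowed to load on.
A1 = np.zeros((3, 5)); A1[[0, 1, 2], [0, 1, 2]] = 1.0   # factor 1: zero loadings on variables 1-3
A2 = np.zeros((2, 5)); A2[[0, 1], [3, 4]] = 1.0         # factor 2: zero loadings on variables 4-5
```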

9 CIR (contd.)
- CIR approach: solve the eigenvalue decomposition (I − Pc) Σ_η e = λ Σ_x e, where Pc is the projection matrix built from the constraint set
- When Pc = 0, we get SIR (i.e., SIR is nested within CIR)
- Shrinkage (e.g., Lasso): set insignificant effects to zero by formulating an appropriate constraint; this improves t-values for the other effects (i.e., efficiency)
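
A hedged sketch of the constrained eigenproblem above. Taking Pc to be the orthogonal projection onto the row space of a constraint matrix A is an assumption for illustration; Naik and Tsai (2005) give the exact construction. Passing an empty A recovers SIR, matching the nesting noted on the slide.

```python
import numpy as np
from scipy.linalg import eig

def cir_directions(Sigma_eta, Sigma_x, A=None, K=1):
    """Constrained Inverse Regression sketch: (I - Pc) Sigma_eta e = lambda Sigma_x e."""
    p = Sigma_x.shape[0]
    if A is None or A.size == 0:
        P_c = np.zeros((p, p))                       # no constraints: reduces to SIR
    else:
        P_c = A.T @ np.linalg.solve(A @ A.T, A)      # projection onto the row space of A (assumed form)
    # Generalized (non-symmetric) eigenproblem; keep the real parts of the leading pairs.
    eigvals, eigvecs = eig((np.eye(p) - P_c) @ Sigma_eta, Sigma_x)
    rank = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, rank[:K]].real, eigvals[rank].real
```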

10 p > N Problem
- OLS, MLE, SIR, and CIR break down when p > N
- Partial Inverse Regression (PIR; Li, Cook, Tsai, Biometrika, forthcoming)
- Combines ideas from PLS and SIR
- Works well even when p > 3N and the variables are highly correlated
- Single-index model with g(.) unknown

11 p > N Solution
- To estimate β, first construct the matrix R from e1, the principal eigenvector of Σ_η = Cov(E[X|y])
- The estimator for β then follows in closed form from R (a hedged sketch follows below)
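
The slide's formulas for R and for the final estimator are not reproduced in this transcript. The sketch below is one plausible reading that combines a PLS-style Krylov construction with the SIR direction e1, as the previous slide suggests; the form of R, the number of columns q, and the final projection step are assumptions for illustration, not the exact formulas of Li, Cook, and Tsai.

```python
import numpy as np

def pir_beta(X, y, n_slices=10, q=2):
    """Partial Inverse Regression sketch for a single-index model, p possibly > N."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma_x = (Xc.T @ Xc) / n                      # sample covariance (may be singular when p > N)

    # e1: principal eigenvector of the sliced estimate of Sigma_eta = Cov(E[X|y]).
    slice_order = np.argsort(y)
    Sigma_eta = np.zeros((p, p))
    for idx in np.array_split(slice_order, n_slices):
        m = Xc[idx].mean(axis=0)
        Sigma_eta += (len(idx) / n) * np.outer(m, m)
    e1 = np.linalg.eigh(Sigma_eta)[1][:, -1]

    # R collects q Krylov-style columns (e1, Sigma_x e1, ..., Sigma_x^{q-1} e1) -- assumed form.
    cols, v = [], e1
    for _ in range(q):
        cols.append(v)
        v = Sigma_x @ v
    R = np.column_stack(cols)

    # Project onto the span of R to obtain a direction estimate for beta (assumed final step).
    beta = R @ np.linalg.pinv(R.T @ Sigma_x @ R) @ (R.T @ e1)
    return beta / np.linalg.norm(beta)
```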

12 Conclusions
- Inverse Regression Methods offer estimators applicable to a remarkably broad class of models and to high-dimensional data, including p > N (which is conceptually the limiting case)
- Estimators are closed-form, so they are:
  - Easy to code (just a few lines)
  - Computationally inexpensive
  - Free of iterations, re-sampling, or draws (hence no do or for loops)
  - Guaranteed to converge
- Standard errors for inference are derived in the cited papers

