Nonparametric Regression and Clustered/Longitudinal Data
Raymond J. Carroll, Texas A&M University

Outline
- Longitudinal, marginal nonparametric model
- Kernel methods
- Working-independence and pseudo-observation methods
- Comparison with smoothing splines
- Locality, consistency, etc.

Panel Data (for simplicity)
- i = 1,…,n clusters/individuals
- j = 1,…,m observations per cluster

Subject | Wave 1 | Wave 2 | … | Wave m
   1    |   X    |   X    | … |   X
   2    |   X    |   X    | … |   X
   …    |        |        |   |
   n    |   X    |   X    | … |   X

Panel Data
- i = 1,…,n clusters/individuals; j = 1,…,m observations
- n = 1, m → ∞: standard time series
- n → ∞, m bounded: our case
- n → ∞, m → ∞: some results apply

The Marginal Nonparametric Model
- Y = response; X = time-varying covariate
- Question: can we improve efficiency by accounting for correlation?
- We can do so for parametric problems (via generalized least squares, GLS)
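For concreteness, the marginal model in question can be written out as follows (the slide omits the display; this is the standard setup in this literature, and the symbol μ for the mean function is my notation):

```latex
Y_{ij} = \mu(X_{ij}) + \varepsilon_{ij}, \qquad i = 1,\dots,n, \quad j = 1,\dots,m,
```

with E(ε_i) = 0 and cov(ε_i) = Σ within each cluster, and independence across clusters. The efficiency question is whether knowledge of Σ improves estimation of the function μ.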

Independent Data: Splines
- Splines (smoothing splines, P-splines, etc.) with penalty parameter λ
- The penalized fit = a ridge regression fit: some bias, smaller variance
- λ = 0 is over-parameterized least squares; λ = ∞ is a polynomial regression
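As a concrete sketch of the point above (my own illustration, not from the slides): a P-spline with a truncated-line basis is literally a ridge regression, with λ = 0 giving over-parameterized least squares and large λ shrinking toward the unpenalized (here, linear) part.

```python
import numpy as np

def pspline_fit(x, y, knots, lam):
    """Penalized (P-)spline fit with a truncated-line basis.

    The penalized least-squares solution is a ridge regression:
    minimize ||y - B b||^2 + lam * b' D b, where D penalizes only
    the truncated-basis coefficients (not the polynomial part).
    """
    # Design matrix: intercept, linear term, truncated lines (x - k)_+
    B = np.column_stack([np.ones_like(x), x] +
                        [np.clip(x - k, 0.0, None) for k in knots])
    # Penalty matrix: identity on the spline coefficients only
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    b = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
    return B @ b  # fitted values

# Small lam: close to least squares; huge lam: shrinks to a line
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 100)
fit = pspline_fit(x, y, knots=np.linspace(0.05, 0.95, 20), lam=1.0)
```

The bias/variance trade-off on the slide is visible here: increasing λ always increases the training error (bias) while stabilizing the coefficients (variance).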

Independent Data: Kernels
- Kernels (local averages, local linear, etc.), with kernel density function K and bandwidth h
- As the bandwidth h → 0, only observations with X near t get any weight in the fit
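A minimal local linear smoother illustrating the locality just described (an illustrative sketch, with a Gaussian kernel; any kernel density function K would do):

```python
import numpy as np

def local_linear(x, y, t, h):
    """Local linear kernel estimate of E[Y | X = t].

    Weighted least squares of y on (x - t) with kernel weights
    K((x - t) / h); as h -> 0, only points with x near t get weight.
    """
    w = np.exp(-0.5 * ((x - t) / h) ** 2)          # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - t])  # local linear design
    XtW = X.T * w                                  # weighted normal equations
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]  # intercept = fitted value at t

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = x ** 2 + rng.normal(0, 0.05, 200)
est = local_linear(x, y, t=0.5, h=0.1)
```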

Kernel Methods
- Largely based on working independence: the correlation structure is ignored entirely in the fitting, and the standard errors are fixed up afterwards
- Large literature (see my web site for papers and references)
- Significant loss of efficiency possible, as with any problem

Kernel Methods
- The first kernel methods that tried to account for correlation failed
- Bizarre result: knowing the correct covariance matrix was worse than working independence
- Justification for working independence?
- Difficulty: defining "locality" for multivariate observations with the same mean function

Pseudo-observation Kernel Methods
- Pseudo-observations transform the responses
- Construction: a linear transformation of Y
- The mean μ(X) remains unchanged
- Obvious(?) choice: make the covariance matrix diagonal
- Apply standard kernel smoothers to the independent pseudo-observations

Pseudo-observation Kernel Methods
- Choices: infinitely many, but at least one works
- Note: the mean is unchanged
- Iterate: start with working independence, transform, apply a working-independence smoother, transform again, etc.
- Efficiency: always better than working independence

Pseudo-observation Kernel Methods
- Construction: mean μ(X) unchanged; covariance made diagonal, i.e., back to independence
- Generalizes to time series, e.g., AR(1)
- Efficiency with respect to working independence
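A sketch of one such construction (my own illustration: the slides do not specify which linear transformation is used, so this takes Cholesky whitening, rescaled to preserve the marginal variances — the mean stays μ(X) and the within-cluster covariance becomes diagonal):

```python
import numpy as np

def pseudo_observations(Y, X, mu_hat, Sigma):
    """One pseudo-observation step (illustrative sketch).

    A linear transform of each cluster's responses that (i) leaves
    the mean mu(X) unchanged and (ii) makes the within-cluster
    covariance diagonal, so a working-independence smoother can
    then be applied to the transformed data.
    """
    L = np.linalg.cholesky(Sigma)
    A = np.linalg.inv(L)                         # whitens: A Sigma A' = I
    A = np.diag(np.sqrt(np.diag(Sigma))) @ A     # rescale: diagonal covariance
    m = mu_hat(X)                                # current estimate of the mean
    # Y, X are (n_clusters, m_obs); transform the residuals row by row
    return m + (Y - m) @ A.T

# Exchangeable correlation 0.6, m = 4 observations per cluster
m_obs, rho = 4, 0.6
Sigma = rho * np.ones((m_obs, m_obs)) + (1 - rho) * np.eye(m_obs)
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (500, m_obs))
eps = rng.multivariate_normal(np.zeros(m_obs), Sigma, 500)
Y = np.sin(2 * np.pi * X) + eps
Ytilde = pseudo_observations(Y, X, lambda x: np.sin(2 * np.pi * x), Sigma)
```

In the iterative scheme described above, `mu_hat` would be the current working-independence fit rather than the true mean used here for illustration.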

Pseudo-observation Kernel Methods
- Time series: generalizations to finite-order ARMA processes are possible
- Multiple transformations are chosen so that the resulting estimates are asymptotically independent, then averaged
- In AR(1), reverse the roles of the current and lagged variables
- It is not clear, however, that insisting on a transformation to independence is efficient

Accounting for Correlation: Splines
- Splines have an obvious analogue for non-independent data
- Let Σ be the covariance matrix
- Penalized generalized least squares (GLS) = GLS ridge regression
- Because splines are based on likelihood ideas, they generalize quickly to new problems
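The GLS-ridge analogue can be sketched as follows (again my own illustration, with a truncated-line basis; the point is that the only change from the independent-data spline is replacing the least-squares criterion with a Σ⁻¹-weighted one):

```python
import numpy as np

def gls_pspline(X, Y, Sigma, knots, lam):
    """Penalized GLS spline fit = GLS ridge regression (a sketch).

    Solves min_b (Y - B b)' Omega (Y - B b) + lam * b' D b,
    where Omega = block-diag(Sigma^{-1}): generalized least
    squares plus a ridge penalty on the spline coefficients.
    """
    n, m = X.shape
    Sinv = np.linalg.inv(Sigma)
    xs, ys = X.ravel(), Y.ravel()
    B = np.column_stack([np.ones_like(xs), xs] +
                        [np.clip(xs - k, 0.0, None) for k in knots])
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    # Accumulate B' Omega B and B' Omega y cluster by cluster
    BtOB = np.zeros((B.shape[1], B.shape[1]))
    BtOy = np.zeros(B.shape[1])
    for i in range(n):
        Bi = B[i * m:(i + 1) * m]
        yi = ys[i * m:(i + 1) * m]
        BtOB += Bi.T @ Sinv @ Bi
        BtOy += Bi.T @ Sinv @ yi
    b = np.linalg.solve(BtOB + lam * D, BtOy)
    return (B @ b).reshape(n, m)

# Exchangeable correlation 0.6, 200 clusters of m = 4
rho, m = 0.6, 4
Sigma = rho * np.ones((m, m)) + (1 - rho) * np.eye(m)
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (200, m))
Y = np.sin(2 * np.pi * X) + rng.multivariate_normal(np.zeros(m), Sigma, 200)
fit = gls_pspline(X, Y, Sigma, knots=np.linspace(0.05, 0.95, 15), lam=1.0)
```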

Efficiency (with respect to WI) of Splines and Pseudo-Observation Kernels: Splines Superior
- Exchng: exchangeable with correlation 0.6
- AR: autoregressive with correlation 0.6
- Near Sing: a nearly singular covariance matrix

New Construction
- Due to Naisyin Wang (Biometrika, 2003); proceeds in multiple steps
- Get an initial estimate; m observations per cluster/individual
- Consider observation j = 1. Assume the mean at the other observations, μ(X_ik), is known and equal to the current estimate for k = 2,…,m
- Form the local likelihood score with only the 1st component mean unknown

New Construction
- Continue: consider observation j. Assume μ(X_ik) is known and equal to the current estimate for k ≠ j
- Form the local likelihood score with only the j-th component mean unknown
- Repeat for all j; sum the local likelihood scores over j and solve
- This gives a new estimate; now iterate

Efficiency (with respect to WI) of Splines and Wang-type Kernels: Nearly Identical
- Exchng: exchangeable with correlation 0.6
- AR: autoregressive with correlation 0.6
- Near Sing: a nearly singular covariance matrix

GLS Splines and New Kernels
- We now know the relationship between GLS splines and the new kernel methods
- Both are pseudo-observation methods, with identical pseudo-observations
- Working independence is applied to the pseudo-observations in both
- Only the fitting method at each stage differs (splines versus kernels!)
- Independence? The pseudo-observations are not actually independent

GLS Splines and New Kernels
- Let Σ⁻¹ be the inverse covariance matrix
- Form the pseudo-observations: adjust each Y_ij using the off-diagonal elements of Σ⁻¹ and the residuals of the current fit, and weight the j-th component by the (j,j) element of Σ⁻¹
- Algorithm: iterate until convergence, using your favorite method (splines, kernels, etc.)
- This is what GLS splines and the new kernels do; not a priori obvious!
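The iteration just described can be sketched in code. Caveat: the slide omits the actual formulas, so the pseudo-response and weight below (Ỹ_ij = Y_ij + (1/s_jj) Σ_{k≠j} s_jk (Y_ik − μ̂(X_ik)), with weight s_jj, where s_jk are entries of Σ⁻¹) are my reconstruction of the standard form in this literature, not a verbatim transcription:

```python
import numpy as np

def wll_smoother(x, y, w, h=0.1):
    """Weighted local linear smoother, evaluated at the data points."""
    out = np.empty_like(y, dtype=float)
    for idx, t in enumerate(x):
        kw = w * np.exp(-0.5 * ((x - t) / h) ** 2)
        Xd = np.column_stack([np.ones_like(x), x - t])
        XtW = Xd.T * kw
        out[idx] = np.linalg.solve(XtW @ Xd, XtW @ y)[0]
    return out

def iterate_gls_smoother(X, Y, Sigma, smoother, n_iter=10):
    """Iterative pseudo-observation algorithm (reconstructed sketch).

    Each pass forms pseudo-responses using the off-diagonal elements
    of Sigma^{-1} and the current residuals, weights component j by
    the (j,j) element, and refits with a working-independence smoother.
    """
    Sinv = np.linalg.inv(Sigma)
    sjj = np.diag(Sinv)
    # Start from the plain working-independence fit
    mu = smoother(X.ravel(), Y.ravel(), np.ones(Y.size)).reshape(Y.shape)
    for _ in range(n_iter):
        resid = Y - mu
        # Off-diagonal part of Sinv applied to the residuals, row-wise
        adjust = (resid @ Sinv - resid * sjj) / sjj
        Ytilde = Y + adjust
        w = np.broadcast_to(sjj, Y.shape)
        mu = smoother(X.ravel(), Ytilde.ravel(), w.ravel()).reshape(Y.shape)
    return mu

# Exchangeable correlation 0.6, 100 clusters of m = 4
m, rho = 4, 0.6
Sigma = rho * np.ones((m, m)) + (1 - rho) * np.eye(m)
rng = np.random.default_rng(4)
X = rng.uniform(0, 1, (100, m))
Y = np.sin(2 * np.pi * X) + rng.multivariate_normal(np.zeros(m), Sigma, 100)
mu = iterate_gls_smoother(X, Y, Sigma, wll_smoother)
```

Swapping `wll_smoother` for a spline fitter leaves the outer loop unchanged, which is exactly the "use your favorite method" point on the slide.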

GLS Splines and New Kernels
- It is easy to see that GLS splines have an exact formula (GLS ridge regression)
- Less obvious, but true: the new kernel methods also have an exact formula
- Both are linear in the responses

GLS Splines and New Kernels: Locality
- Write out the linear expressions: the fits are weighted sums of the responses, with weight functions W_S,ij(t) for splines and W_K,ij(t) for kernels
- We generated data and fixed the first X for the first person at X_11 = 0.25
- We then investigated the weight functions as a function of t, the point at which you want to estimate the regression function

The weight functions W_S,ij(t, X_11 = 0.25) and W_K,ij(t, X_11 = 0.25) for a specific case of correlated data, fit with working independence. Note the similarity of shape and the locality: X_11 = 0.25 gets weight only if t is near 0.25. (Red = kernel, blue = spline.)

The weight functions W_S,ij(t, X_11 = 0.25) and W_K,ij(t, X_11 = 0.25) for a specific case of correlated data, fit with GLS. Note the similarity of shape and the lack of locality. (Red = kernel, blue = spline.)

The spline weight functions W_S,ij(t, X_11 = 0.25) for a specific case of correlated data, GLS versus working independence. (Red = GLS, blue = working independence.)

Three Questions
- Why is neither GLS splines nor GLS kernels local in the usual sense?
- The weight functions look similar in data. Does this mean that splines and kernels are, in some sense, asymptotically equivalent?
- Theory for kernels is possible. Can we use these results/ideas to derive bias/variance theory for GLS splines?

Locality
- GLS splines and kernels are iterative versions of working independence applied to the pseudo-observations
- Nonlocality is thus clear: if any X in a cluster, say X_i1, is near t, then all X's in that cluster, such as X_i2, get weight in estimating μ(t)
- Locality thus holds at the cluster/individual level

Spline and Kernel Equivalence
- We have shown that, for correlated data, a result similar to Silverman's for independent data holds
- Asymptotically, the spline weight function is equivalent to a kernel weight function of the type described by Silverman

Spline and Kernel Equivalence
- The bandwidth changes, though: for cubic smoothing splines with smoothing parameter λ, let the density of X_ij be f_j
- The effective bandwidth at t then depends on λ, the densities f_j, and the correlation structure
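For reference, Silverman's classical independent-data result, which the correlated-data statement above extends (this display is the independent-data version, up to the parametrization of λ; the slide's own formula is not recoverable from the transcript): a cubic smoothing spline with penalty λ behaves asymptotically like a kernel estimator with local bandwidth

```latex
h(t) \;=\; \left( \frac{\lambda}{n\, f(t)} \right)^{1/4},
```

where f is the design density. In the correlated case, the analogous effective bandwidth also involves the densities f_j and the correlation structure, as the slide notes.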

Asymptotic Theory for GLS Splines
- We have derived results showing that GLS splines have smaller asymptotic variance than working-independence splines for the same bandwidth
- The same result holds for kernels

Asymptotic Theory for GLS Splines
- We have derived the bias and variance formulae for cubic smoothing splines with penalty parameter λ → 0
- Without going into technical details, these formulae are the same as those for kernels with the equivalent bandwidth
- This generalizes the work of Nychka to non-i.i.d. settings

Conclusions
- Accounting for correlation to improve efficiency in nonparametric regression is possible
- Pseudo-observation methods can be defined, and form an essential link
- GLS splines and the "right" GLS kernels have the same asymptotics
- Locality of estimation is at the cluster level, not the individual X_ij level
- GLS spline theory

Coauthors
Raymond Carroll, Alan Welsh, Naisyin Wang, Enno Mammen, Xihong Lin, Oliver Linton
A series of papers summarizing these results and their history is on my web site.
(Photo: Fairy Penguin)

Advertisement
Semiparametric Regression: regression via penalized regression splines
David Ruppert, Matt Wand, Raymond Carroll
Cambridge University Press, 2003