Raymond J. Carroll Texas A&M University Non/Semiparametric Regression and Clustered/Longitudinal Data.


Outline
Series of semiparametric problems:
- Panel data
- Matched studies
- Family studies
- Finance applications

Outline
- General framework: likelihood-criterion functions
- Algorithms: kernel-based
- General results: semiparametric efficiency
- Backfitting and profiling
- Splines and kernels: summary and conjectures

Acknowledgments
Xihong Lin, Harvard University

Basic Problems
Semiparametric problems have two ingredients: a parameter of interest and an unknown function. The key is that the unknown function is evaluated multiple times in computing the likelihood for an individual.

Example 1: Panel Data
i = 1,…,n clusters/individuals; j = 1,…,m observations per cluster.

Subject | Wave 1 | Wave 2 | … | Wave m
1       | X      | X      | … | X
2       | X      | X      | … | X
…       |        |        |   |
n       | X      | X      | … | X

Example 1: Marginal Parametric Model
Y = response; X, Z = time-varying covariates. General result: we can improve efficiency for the parametric component by accounting for correlation, via generalized least squares (GLS).
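To fix ideas, here is a minimal GLS sketch on hypothetical clustered data with an exchangeable working correlation; the names, seed, and settings are illustrative, not from the talk:

```python
import numpy as np

def gls(X, Y, V):
    """Generalized least squares: beta = (X' V^-1 X)^-1 X' V^-1 Y,
    where V is the working covariance of the stacked responses."""
    Vinv = np.linalg.inv(V)
    XtVi = X.T @ Vinv
    return np.linalg.solve(XtVi @ X, XtVi @ Y)

# Toy panel: n subjects, m waves each, exchangeable correlation rho.
rng = np.random.default_rng(0)
n, m, rho = 200, 3, 0.5
R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))  # within-cluster correlation
L = np.linalg.cholesky(R)
beta_true = np.array([1.0, 2.0])
X = rng.normal(size=(n * m, 2))
eps = (rng.normal(size=(n, m)) @ L.T).ravel()       # correlated errors
Y = X @ beta_true + eps
V = np.kron(np.eye(n), R)                           # block-diagonal covariance
beta_hat = gls(X, Y, V)
```

With the true within-cluster correlation supplied, GLS is more efficient than ordinary least squares; in practice V is built from an estimated working correlation.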

Example 1: Marginal Semiparametric Model
Y = response; X, Z = varying covariates. Question: can we improve efficiency for the parametric component by accounting for correlation?

Example 1: Marginal Nonparametric Model
Y = response; X = varying covariate. Question: can we improve efficiency by accounting for correlation (GLS)?

Example 2: Matched Studies
Prospective logistic model: i = person, S = stratum. The usual idea is that the stratum-dependent random variables may have been chosen by an extremely weird process, and hence are impossible to model.

Example 2: Matched Studies
The usual likelihood conditions within each stratum. Note how the conditioning removes the stratum effect. Also note: the function is evaluated twice per stratum.
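For a 1:1 matched pair, the standard conditional-likelihood calculation (reconstructed here in generic notation, not the slide's lost display) runs as follows, with linear predictor $\eta_{iS} = \theta_S + X_{iS}^{T}\beta$:

```latex
% Condition on one case per stratum S; the stratum effect \theta_S cancels:
P\bigl(Y_{1S}=1 \mid Y_{1S}+Y_{2S}=1\bigr)
  = \frac{e^{\theta_S + X_{1S}^{T}\beta}}
         {e^{\theta_S + X_{1S}^{T}\beta} + e^{\theta_S + X_{2S}^{T}\beta}}
  = \frac{e^{X_{1S}^{T}\beta}}{e^{X_{1S}^{T}\beta} + e^{X_{2S}^{T}\beta}} .
```

In the semiparametric version, the linear predictor also carries a nonparametric term evaluated once per subject, which is why the unknown function appears twice per stratum after conditioning.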

Example 3: Model in Finance
Note how the function is evaluated m times for each subject.

Example 3: Model in Finance
The previous literature used an integration estimator, namely one first solved via backfitting. Computation was pretty horrible. For us: exact computation and a general theory.

Example 4: Twin Studies
Each family consists of twins, followed longitudinally. The baseline for each twin is modeled nonparametrically; the longitudinal part is modeled parametrically.

General Formulation
These examples all have common features: they have a parameter; they have an unknown function; and the function is evaluated multiple times for each unit (individual, matched pair, family). This last feature distinguishes them from standard semiparametric models.

General Formulation
Y_ij = response; X_ij, Z_ij = possibly varying covariates. All my examples share the same form of loglikelihood (or criterion function).

General Formulation: Examples
The loglikelihood (or criterion function) has this common form. As stated previously, this is not a standard semiparametric problem, because of the multiple function evaluations.

General Formulation: Overview
For these problems, I will give constructive methods of estimation with:
- Asymptotic expansions and inference available
- Semiparametric efficiency, when the criterion function is a likelihood function
- Methods that avoid solving integral equations

The Semiparametric Model
Y = response; X, Z = time-varying covariates. Question: can we improve efficiency for the parametric component by accounting for correlation, i.e., what method is semiparametric efficient?

Semiparametric Efficiency
The semiparametric efficient score is readily worked out, but it involves a Fredholm equation of the 2nd kind that is effectively impossible to solve directly: it involves the densities of each X conditional on the others. The usual device of solving integral equations does not work here (or at least is not worth trying).
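For reference, a Fredholm equation of the second kind has the generic form (shown only to fix ideas; the talk's actual equation is more involved):

```latex
\theta(x) \;=\; g(x) \;+\; \int K(x,t)\,\theta(t)\,dt ,
```

with g and the kernel K known. Here K would involve the conditional densities of each X given the others, which is what makes direct solution impractical.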

The Efficient Score (Yuck!)

My Approach
First pretend that, if you knew the parameter, you could solve for the function. I am going to suggest an algorithm for estimating the function, and then turn to the question of estimating the parameter.

Profiling in Gaussian Problems
Profile methods work like this: fix the parameter; apply your smoother; call the result the profiled function estimate; then maximize the Gaussian loglikelihood in the parameter. There is an explicit solution for most smoothers in Gaussian cases.

Profiling
Profile methods maximize the profiled loglikelihood. This can be difficult numerically in nonlinear problems; a type of backfitting is often much easier numerically.

Backfitting Methods
Backfitting methods work like this: fix the parameter; apply your smoother; call the result the function estimate; maximize the loglikelihood in the parameter; iterate until convergence. (There is an explicit solution for most smoothers, but it differs from profiling.)

Backfitting/Profiling Example
Partially linear model, with one function. Define the relevant conditional expectations, and fit them by local linear kernel regression (or whatever).

Backfitting/Profiling Example
The two estimators are numerically different but asymptotically equivalent. The equivalence is a subtle calculation, even in this simple context.
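A hedged sketch of the profile estimator in the partially linear model Y = X'β + θ(Z) + ε: partial out E(Y|Z) and E(X|Z), then solve for β by least squares. A simple Nadaraya–Watson smoother stands in for the local linear fit, and all names and data are illustrative:

```python
import numpy as np

def nw_smooth(z, zgrid, values, h):
    """Nadaraya-Watson estimate of E(values | Z = z) with a Gaussian kernel."""
    w = np.exp(-0.5 * ((z[:, None] - zgrid[None, :]) / h) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ values

def profile_plm(Y, X, Z, h=0.1):
    """Profile estimator for Y = X beta + theta(Z) + eps."""
    Ey = nw_smooth(Z, Z, Y, h)                    # E(Y | Z)
    Ex = nw_smooth(Z, Z, X, h)                    # E(X | Z), column-wise
    Xt, Yt = X - Ex, Y - Ey                       # partial out Z
    beta = np.linalg.solve(Xt.T @ Xt, Xt.T @ Yt)
    theta_hat = nw_smooth(Z, Z, Y - X @ beta, h)  # plug-in fit of theta
    return beta, theta_hat

rng = np.random.default_rng(1)
n = 500
Z = rng.uniform(0, 1, n)
X = rng.normal(size=(n, 1)) + Z[:, None]          # X correlated with Z
Y = X @ np.array([1.5]) + np.sin(2 * np.pi * Z) + 0.3 * rng.normal(size=n)
beta_hat, _ = profile_plm(Y, X, Z)
```

The backfitting variant instead iterates between smoothing Y − X'β on Z and updating β; it is numerically different but (with undersmoothing) asymptotically equivalent.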

Backfitting/Profiling Example
The asymptotic equivalence of profiling and backfitting in this partially linear model has one subtlety. Profiling: off-the-shelf smoothers are OK. Backfitting: off-the-shelf smoothers need to be undersmoothed to get rid of asymptotic bias.

Backfitting/Profiling
Hu et al. (2004, Biometrika) showed that in general problems:
- Backfitting is generally more variable than profiling, for linear-type problems
- Backfitting and profiling need not have the same limit distributions

General Formulation: Revisited
Y_ij = response; X_ij, Z_ij = varying covariates. The key is that in the loglikelihood (or criterion function) the function is evaluated multiple times for each individual. The goal is to estimate the parameter and the function efficiently.

General Formulation: Revisited
What I want to show you is a constructive solution, i.e., one that can be computed: different from solving integral equations, completely general, and theoretically sound. The methodology is based on kernel methods, i.e., local methods. First, a little background.

Simple Local Likelihood
Consider a nonparametric regression with iid data. The loglikelihood function is the sum of the individual log-density contributions.

Simple Local Likelihood
Let K be a density function and h a bandwidth. Your target is the function at a point x. The kernel weights for local likelihood are K_h(X_i − x) = K{(X_i − x)/h}/h. If K is the uniform density, only observations within h of x get any weight.

Simple Local Likelihood
Only observations within h = 0.25 of x = −1.0 get any weight.

Simple Local Likelihood
Near x, the function should be nearly linear. The idea then is to do a likelihood estimate local to x via weighting, i.e., maximize the kernel-weighted loglikelihood with a local linear approximation to the function, then announce the fitted intercept as the estimate at x.
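In the Gaussian case the locally weighted likelihood maximization reduces to weighted least squares on (1, X − x). A minimal sketch with a uniform kernel, echoing the h = 0.25, x = −1.0 picture (data and bandwidth are illustrative):

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear fit at x0: maximize the kernel-weighted Gaussian
    loglikelihood, i.e. weighted least squares on (1, X - x0).
    Returns the intercept, which estimates the function at x0."""
    w = (np.abs(X - x0) <= h).astype(float)     # uniform kernel weights
    D = np.column_stack([np.ones_like(X), X - x0])
    WD = D * w[:, None]
    a, b = np.linalg.solve(D.T @ WD, WD.T @ Y)  # (a, b) ~ (theta(x0), theta'(x0))
    return a

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, 2000)
Y = np.sin(X) + 0.1 * rng.normal(size=2000)
theta_hat = local_linear(-1.0, X, Y, h=0.25)    # only |X + 1| <= 0.25 get weight
```

The fitted intercept a estimates the function at x0; the slope b estimates its derivative there.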

Simple Local Likelihood
In the linear model, local likelihood is local linear regression. It is essentially equivalent to loess, splines, etc. I'll now use local likelihood ideas to solve the general problem.

General Formulation: Revisited
The goal is to estimate the function in the likelihood (or criterion function) at a target value t. Fix the parameter, and pretend that the formulation involves different functions, one for each of the multiple evaluations.

General Formulation: Revisited
Pretend that the formulation involves different functions, and that all but one of them are known. Fit a local linear regression via local likelihood, and get the local score function for the remaining one.

General Formulation: Revisited
Repeat for each function in turn: pretend the others are known; fit a local linear regression; get the local score function. Finally, solve the combined local score equations. There is an explicit solution in the Gaussian cases.

Main Results
Semiparametric efficient for the parameter. Backfitting (undersmoothed) = profiling. The equivalence of backfitting and profiling is not obvious in the general case.

Main Results
Explicit variance formulae. High-order expansions for parameters and functions, used for estimating population quantities such as population means.

Marginal Approaches
The most standard approach is a marginal one: often, we can write the marginal model in terms of a known function G. A similar device is to write the likelihood function for single observations.

Marginal Approaches
The marginal approaches ignore the correlation structure. Lots, and lots, and lots of papers. The methods tend to be very inefficient if the correlation structure is important.

Econometric Example
In panel data, interest can be in random- versus fixed-effects models. In our usual variance-components model, the random effect is independent of everything. If so, this is a version of our partially linear model, hence already solved by us.

Econometric Example
Econometricians, though, worry that the individual effect is correlated with Z or X. This says that the effect represents unmeasured variables; this is the fixed-effects model. They want to know the effects of (X, Z), controlling for individual factors.

Econometric Example
Starting model: include the individual effects. Get rid of those terms, e.g., by differencing. A special case of our model!
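A sketch of the differencing device on a simulated fixed-effects model Y_ij = b_i + βX_ij + ε_ij, where X is correlated with the individual effect b_i (everything here is hypothetical illustration, not the talk's data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 300, 4
beta_true = 2.0
b = rng.normal(size=n) * 3.0                 # individual fixed effects
X = rng.normal(size=(n, m)) + b[:, None]     # X correlated with b
Y = b[:, None] + beta_true * X + rng.normal(size=(n, m))

# Differencing against wave 1 removes b_i:
#   Y_ij - Y_i1 = beta (X_ij - X_i1) + (eps_ij - eps_i1)
dY = Y[:, 1:] - Y[:, :1]
dX = X[:, 1:] - X[:, :1]
beta_fe = (dX.ravel() @ dY.ravel()) / (dX.ravel() @ dX.ravel())

# Naive pooled OLS ignores b_i and is biased, since X is correlated with b.
beta_ols = (X.ravel() @ Y.ravel()) / (X.ravel() @ X.ravel())
```

Pooled OLS is biased because it ignores the correlation between X and b; differencing removes b entirely, at the price of correlated differenced errors — which is where the efficiency discussion on the next slide enters.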

Econometric Example
Model: the error terms are correlated over j = 2,…,m. The variance efficiency loss from ignoring these correlations is (2+m)/4.

Econometric Example
Example: China Health and Nutrition Survey. No parametric part. Response Y = caloric intake (log scale); predictor X = income. The initial random-effects model result suggests that, for very low incomes, an increase in income is NOT associated with an increase in calories.

Econometric Example
The random-effects model suggests that for very low incomes an increase in income is NOT associated with an increase in calories. The fixed-effects model fits with economic theory and common sense, and a specification test confirms this.

Econometric Example
The fixed-effects cubic regression fit is far too steep at either end; the nonparametric fit makes much more sense.

Remarks on Splines
Splines are a practical alternative to kernels. Penalized splines (smoothing splines, P-splines, etc.), with a penalty parameter:
- Easy to develop, very flexible
- Computable, truly nonparametric
- Difficult theory (Mammen & van de Geer; Mammen & Nielsen)
In the partially linear model, smoothing splines, for example, are equivalent to kernel methods.
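A minimal penalized-spline sketch: a truncated-line basis with many knots plus a ridge penalty on the knot coefficients. The basis, knot count, and penalty value are illustrative choices, not the talk's:

```python
import numpy as np

def pspline_fit(x, y, knots, lam):
    """Penalized spline: basis [1, x, (x-k1)+, ..., (x-kK)+], with a ridge
    penalty lam applied only to the truncated-line coefficients."""
    B = np.column_stack([np.ones_like(x), x] +
                        [np.clip(x - k, 0, None) for k in knots])
    P = np.eye(B.shape[1])
    P[0, 0] = P[1, 1] = 0.0                  # do not penalize the linear part
    coef = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    return B @ coef

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 1, 400))
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=400)
knots = np.linspace(0.05, 0.95, 20)          # many knots; the penalty smooths
fit = pspline_fit(x, y, knots, lam=1.0)
```

With many knots and no penalty (lam = 0), the fit becomes the wiggly unpenalized curve criticized two slides below; the penalty is what restores smoothness.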

Remarks on Splines
Unpenalized splines: there are theoretical results for non-penalized splines. These methods assume fixed, known knots, then slowly grow the number of knots. They are theoretically equivalent to our methods, but the theory, and the method, is irrelevant.

Unpenalized Splines
No penalty and a standard number of knots = crazy curves.

Unpenalized Splines
The theoretical results for unpenalized splines require that the number of knots k grow with the sample size n at a particular rate. Yet every paper in this area does data analysis with at most 5 knots. Why?

Splines With Knot Selection
There is a nice literature on using fixed-knot splines, but with the knots selected: basically, use model-selection techniques to zero out some of the coefficients. This gets the smoothness back.

Conclusions
General likelihood. Distinguishing property: the unknown function is evaluated repeatedly for each individual. Kernel method: iterated local likelihood calculations, with an explicit solution in the Gaussian cases.

Conclusions
General results. Semiparametric efficiency: by construction, with no integral equations to solve. Backfitting and profiling: asymptotically equivalent.

Conclusions
Smoothing splines and kernels: asymptotically the same in the Gaussian case. Splines are generally easier to compute, although smoothing-parameter selection can be intensive. Unpenalized splines: irrelevant theory, need knot selection.

Conclusions
Splines and kernels: one might conjecture that splines can be constructed for the general problem that are asymptotically efficient. Open problem: is this true, and how?

Thanks!

Conjectured Approach
Mammen and Nielsen worked in a nonlinear least squares context with multiple functions. Roughly, the obvious version of their method applies here. Both methods are semiparametric efficient when profiled.

Conjectured Approach
Roughly, the obvious version of the Mammen and Nielsen method can be used for our general model.