Score Tests in Semiparametric Models Raymond J. Carroll Department of Statistics Faculties of Nutrition and Toxicology Texas A&M University

Papers available at my web site

Texas is surrounded on all sides by foreign countries: Mexico to the south and the United States to the east, west and north

College Station, home of Texas A&M University I-35 I-45 Big Bend National Park Wichita Falls, Wichita Falls, that’s my hometown West Texas Palo Duro Canyon, the Grand Canyon of Texas Guadalupe Mountains National Park East Texas 

Palo Duro Canyon of the Red River

Co-Authors Arnab Maity

Co-Authors Nilanjan Chatterjee

Co-Authors Kyusang Yu, Enno Mammen

Outline Parametric Score Tests; Straightforward extension to semiparametric models; Profile Score Testing; Gene-Environment Interactions; Repeated Measures

Parametric Models Parametric Score Tests. Parameter of interest = β. Nuisance parameter = θ. Interested in testing whether β = 0. Log-likelihood function = L(Y, β, θ).

Parametric Models Score tests are convenient when it is easy to maximize the null loglikelihood but hard to maximize the entire loglikelihood.

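Parametric Models Let θ̂(β) be the MLE of θ for a given value of β. Let subscripts denote derivatives. Then the normalized score test statistic is just the normalized sum of the β-derivatives of the loglikelihood, evaluated at β = 0 and θ̂(0).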

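Parametric Models Let I be the Fisher information evaluated at β = 0, with sub-matrices such as I_ββ, I_βθ and I_θθ. Then, using likelihood properties, the score statistic under the null hypothesis is asymptotically equivalent to the normalized sum of the efficient scores, L_β minus I_βθ I_θθ^{-1} L_θ, evaluated at the true θ.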

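Parametric Models The asymptotic variance of the score statistic is I_ββ minus I_βθ I_θθ^{-1} I_θβ. Remember, all computed at the null β = 0. Under the null, if β has dimension p, then the quadratic form of the score statistic in the inverse of this variance is asymptotically chi-squared with p degrees of freedom.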
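
In symbols, the parametric score test described on these slides can be sketched as follows. The notation $L(Y, \beta, \theta)$ for the loglikelihood of one observation, $T_n$ for the statistic and $V$ for its variance is an assumption made for illustration; the results themselves are standard likelihood theory.

```latex
\begin{align*}
T_n &= n^{-1/2} \sum_{i=1}^{n} L_\beta\{Y_i, 0, \hat\theta(0)\}, \\
T_n &= n^{-1/2} \sum_{i=1}^{n} \bigl\{ L_\beta(Y_i, 0, \theta)
       - I_{\beta\theta} I_{\theta\theta}^{-1} L_\theta(Y_i, 0, \theta) \bigr\} + o_p(1), \\
V &= I_{\beta\beta} - I_{\beta\theta} I_{\theta\theta}^{-1} I_{\theta\beta}, \\
T_n^{\top} V^{-1} T_n &\to \chi^2_p \quad \text{in distribution under } \beta = 0 .
\end{align*}
```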

Parametric Models The key point about the score test is that all computations are done at the null hypothesis. Thus, if maximizing the loglikelihood at the null is easy, the score test is easy to implement.
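
As a small illustration of "everything is computed at the null", here is a minimal sketch of a parametric score test. It assumes a logistic regression with a scalar covariate X, an intercept as the nuisance parameter, and a null hypothesis of zero slope; the function name `logistic_score_test` is hypothetical and the example is not taken from the slides.

```python
import math
import numpy as np

def logistic_score_test(y, x):
    """Score test of beta = 0 in logit P(Y=1 | X) = theta + beta * X (scalar X)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.size
    # Null fit: with an intercept only, the MLE of P(Y=1) is the sample mean.
    p_hat = y.mean()
    # Normalized score for beta, evaluated at beta = 0 and the null fit.
    score = np.sum(x * (y - p_hat)) / math.sqrt(n)
    # Efficient information: I_bb - I_bt * I_tt^{-1} * I_tb = p(1-p) * Var(X).
    info = p_hat * (1.0 - p_hat) * np.mean((x - x.mean()) ** 2)
    stat = score ** 2 / info
    # Upper tail of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Only the intercept-only model is ever fitted here; the alternative model with a slope is never maximized, which is the convenience the slide describes.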

Semiparametric Models Now the loglikelihood has the form L{Y, X, β, θ(Z)}. Here, θ(·) is an unknown function. The obvious score statistic is the normalized sum of L_β{Y, X, 0, θ̂(Z)}, where θ̂(·) is an estimate under the null.

Semiparametric Models Estimating θ(·) in a loglikelihood like this is standard. Kernel methods use local likelihood. Splines use penalized loglikelihood.

Simple Local Likelihood Let K be a density function, and h a bandwidth. Your target is the function at z. The kernel weights for local likelihood are K{(Z_i - z)/h}, up to normalization. If K is the uniform density, only observations within h of z get any weight.

Simple Local Likelihood Only observations within h = 0.25 of x = -1.0 get any weight
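
The weighting in this picture can be sketched in a few lines. The sketch assumes K is the uniform density on [-1, 1] and uses the common normalization K{(Z_i - z)/h}/h; the function name `uniform_kernel_weights` is hypothetical.

```python
import numpy as np

def uniform_kernel_weights(z_obs, z0, h):
    """Kernel weights K((Z_i - z0) / h) / h with K uniform on [-1, 1]."""
    u = (np.asarray(z_obs, dtype=float) - z0) / h
    # The uniform density equals 1/2 on [-1, 1] and 0 elsewhere.
    return np.where(np.abs(u) <= 1.0, 0.5, 0.0) / h

# Observations at -1.2 and -1.0 lie within h = 0.25 of z0 = -1.0; the others do not.
weights = uniform_kernel_weights([-1.2, -1.0, -0.7, 0.3], z0=-1.0, h=0.25)
```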

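Simple Local Likelihood Near z, the function should be nearly linear. The idea then is to do a likelihood estimate local to z via weighting, i.e., maximize the kernel-weighted loglikelihood with θ(Z_i) replaced by a linear function of Z_i - z. Then announce the fitted intercept as the estimate of θ(z).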
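
A minimal sketch of this local likelihood step, under assumptions that are not from the slides: the response is binary with logit P(Y=1 | Z) = θ(Z), the kernel is Gaussian rather than uniform, and the weighted likelihood is maximized by plain Newton iterations. The function name `local_linear_logistic` is hypothetical.

```python
import numpy as np

def local_linear_logistic(y, z, z0, h, n_iter=25):
    """Local linear likelihood estimate of theta(z0) for binary Y."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    # Gaussian kernel weights (normalizing constants do not affect the maximizer).
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)
    # Local design: intercept plus slope in (Z_i - z0).
    design = np.column_stack([np.ones_like(z), z - z0])
    coef = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ coef)))
        grad = design.T @ (w * (y - p))
        hess = (design * (w * p * (1.0 - p))[:, None]).T @ design
        coef = coef + np.linalg.solve(hess, grad)  # Newton step
    # "Announce" the fitted intercept as the estimate of theta(z0).
    return coef[0]
```

Repeating the call over a grid of z0 values traces out an estimate of the whole function.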

Simple Local Likelihood It is well-known that the optimal bandwidth is of order n^{-1/5}. The bandwidth can be estimated from data using such things as cross-validation.

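Score Test Problem The score statistic is the obvious one, with θ(·) replaced by its estimate under the null. Unfortunately, when the bandwidth is of the optimal order, this statistic is no longer asymptotically normally distributed with mean zero. The asymptotic test level = 1!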

Score Test Problem The problem can be fixed up in an ad hoc way by undersmoothing, i.e., setting the bandwidth to be of smaller order than the optimal one. This defeats the point of the score test, which is to use standard methods, not ad hoc ones.
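
A hedged sketch of why the optimal bandwidth causes trouble. It assumes a local linear fit with a second-order kernel, so that the bias of $\hat\theta(z)$ is of order $h^2$; the rates below are the standard ones for that setting rather than formulas copied from the slides.

```latex
\begin{align*}
h_{\mathrm{opt}} &\propto n^{-1/5}, \\
\text{bias of the plug-in score statistic} &= O(n^{1/2} h^2) \quad \text{in general}, \\
n^{1/2} h_{\mathrm{opt}}^2 &\propto n^{1/10} \to \infty, \\
\text{undersmoothing requires } n h^4 &\to 0 .
\end{align*}
```

With the optimal bandwidth the bias term diverges, which is consistent with a test that rejects with probability tending to one.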

Profiling in Semiparametrics In profile methods, one does a series of steps. For every β, estimate the function θ(·) by using local likelihood to maximize the kernel-weighted loglikelihood with β held fixed. Call it θ̂(z, β).

Profiling in Semiparametrics Then maximize over β the semiparametric profile loglikelihood, in which θ̂(·, β) is plugged in for θ(·). The maximization is often difficult, hence the need for score tests.

Profiling in Semiparametrics The semiparametric profile loglikelihood has many of the same features as profiling does in parametric problems. The key feature is that it is a projection, so that it is orthogonal to the score for θ(·), or to any function of Z alone.

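Profiling in Semiparametrics The semiparametric profile score is the derivative of the profile loglikelihood with respect to β, i.e., by the chain rule, the sum of L_β + L_θ θ̂_β(Z, β), where θ̂_β is the derivative of θ̂(z, β) with respect to β.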

Profiling in Semiparametrics The problem is to compute θ̂_β(z, 0), the derivative of θ̂(z, β) with respect to β at the null, without doing profile likelihood!

Profiling in Semiparametrics The definition of local likelihood is that for every β, the conditional expectation of L_θ{Y, X, β, θ(z, β)} given Z = z is zero. Differentiate with respect to β.
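
Written out, the differentiation step can be sketched as follows. The notation $L\{Y, X, \beta, \theta\}$ for the loglikelihood and $\theta(z, \beta)$ for the local likelihood target at fixed $\beta$ is assumed for illustration, with $\{\cdot\}$ abbreviating the arguments $\{Y, X, \beta, \theta(z, \beta)\}$.

```latex
\begin{align*}
0 &= E\bigl[ L_\theta\{Y, X, \beta, \theta(z, \beta)\} \mid Z = z \bigr]
     \quad \text{for every } \beta, \\
0 &= E\bigl[ L_{\theta\beta}\{\cdot\}
     + L_{\theta\theta}\{\cdot\}\, \theta_\beta(z, \beta) \mid Z = z \bigr]
     \quad \text{(chain rule in } \beta \text{)} .
\end{align*}
```

Because $\theta_\beta(z, \beta)$ depends only on $z$, it can be pulled out of the conditional expectation and solved for.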

Profiling in Semiparametrics Then θ_β(z, β) = -E(L_θβ | Z = z) / E(L_θθ | Z = z). Algorithm: Estimate numerator and denominator by nonparametric regression. All done at the null model!
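
The algorithm on this slide can be sketched for one concrete case. The assumptions here are not from the slides: a logistic model logit P(Y=1 | X, Z) = Xβ + θ(Z) with scalar X, a Nadaraya-Watson smoother with a Gaussian kernel standing in for "nonparametric regression", and null fitted probabilities `p_null` supplied by any null-model fit. The function names are hypothetical.

```python
import numpy as np

def nadaraya_watson(z_eval, z, v, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E(v | Z = z_eval)."""
    w = np.exp(-0.5 * ((z_eval[:, None] - z[None, :]) / h) ** 2)
    return (w @ v) / w.sum(axis=1)

def profiled_score_logistic(y, x, z, p_null, h):
    """Per-observation profiled score for beta at the null, logistic case."""
    var = p_null * (1.0 - p_null)
    # For the logistic loglikelihood: L_{beta theta} = -X p(1-p), L_{theta theta} = -p(1-p).
    num = nadaraya_watson(z, z, -x * var, h)   # estimates E(L_{beta theta} | Z)
    den = nadaraya_watson(z, z, -var, h)       # estimates E(L_{theta theta} | Z)
    theta_beta = -num / den
    # Profiled score: L_beta + L_theta * theta_beta = (X + theta_beta) (Y - p).
    return (x + theta_beta) * (y - p_null)
```

In this logistic case the projection reduces to centering X at a weighted conditional mean given Z, so a covariate that is a function of Z alone contributes nothing to the score.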

Results There are two things to estimate at the null model: θ(·) and θ_β(·). Any method can be used without affecting the asymptotic properties. This is not true without profiling.

Results We have implemented the test in some cases using the following methods: kernels; splines from gam in Splus; splines from R; penalized regression splines. All results are similar, which is as it should be: because we have projected and profiled, the method of fitting does not matter.

Results The null distribution of the score test is asymptotically the same as if θ(·) and θ_β(·) were known.

Results This means its variance is the same as the variance of the profiled score L_β + L_θ θ_β at the null. This is trivial to estimate. If you use different methods, the asymptotic variance may differ.
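
A minimal sketch of the "trivial to estimate" step, assuming the per-observation profiled scores at the null are already available as an array of shape (n, p). The variance is estimated by their empirical second moment, which is an illustrative choice, and the function name `score_statistic` is hypothetical.

```python
import numpy as np

def score_statistic(scores):
    """Quadratic-form score statistic from per-observation score contributions."""
    s = np.asarray(scores, dtype=float)
    if s.ndim == 1:
        s = s[:, None]                       # treat a vector as p = 1
    n = s.shape[0]
    total = s.sum(axis=0) / np.sqrt(n)       # normalized score
    cov = s.T @ s / n                        # empirical second moment of the scores
    return float(total @ np.linalg.solve(cov, total))
```

Under the null the result would be compared with a chi-squared distribution with p degrees of freedom.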

Results With this substitution, the semiparametric score test requires no undersmoothing. Any method works. How does one do undersmoothing for a spline or an orthogonal series?

Results Finally, the method is a locally semiparametric efficient test for the null hypothesis. The power does not depend on the method of nonparametric regression that you use.

Example Colorectal adenoma: a precursor of colorectal cancer. N-acetyltransferase 2 (NAT2): plays an important role in detoxification of certain aromatic carcinogens present in cigarette smoke. Case-control study of colorectal adenoma. Association between colorectal adenoma and the candidate gene NAT2 in relation to smoking history.

Example Y = colorectal adenoma; X = genetic information (below); Z = years since stopping smoking

More on the Genetics Subjects were genotyped for six known functional SNPs related to NAT2 acetylation activity. Genotype data were used to construct diplotype information, i.e., the pair of haplotypes the subjects carried along their pair of homologous chromosomes.

More on the Genetics We identified the 14 most common diplotypes. We ran analyses on the k most common ones, for k = 1,…,14.

The Model The model is a version of what is done in genetics, with an arbitrary interaction parameter. The interest is in the genetic effects, so we want to know whether β = 0. However, we want more power if there are interactions.

The Model For the moment, pretend the interaction parameter is fixed. This is an excellent example of why score testing is useful: the model is very difficult to fit numerically. With extensions to such things as longitudinal data and additive models, it is nearly impossible to fit.

The Model Note however that under the null, the model is simple nonparametric logistic regression. Our methods only require fits under this simple null model.

The Method The interaction parameter is not identified at the null. However, the derivative of the loglikelihood evaluated at the null depends on it. Thus, the score statistic depends on it.
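
To see how a score can depend on a parameter that the null model does not identify, here is a hedged sketch. The specific form below, a logistic model with a Tukey-type interaction between $X^\top\beta$ and $\theta(Z)$ governed by a parameter $\gamma$, is an assumption chosen to be consistent with the description on these slides; the symbols $H$ and $\gamma$ and the exact parameterization are not taken from the transcript.

```latex
\begin{align*}
\Pr(Y = 1 \mid X, Z) &= H\{ X^\top \beta + \theta(Z) + \gamma\, X^\top \beta\, \theta(Z) \},
  \qquad H(u) = \frac{1}{1 + e^{-u}}, \\
\beta = 0 \;&\Rightarrow\; \Pr(Y = 1 \mid X, Z) = H\{\theta(Z)\} \quad \text{for every } \gamma, \\
L_\beta \big|_{\beta = 0} &= X \{ 1 + \gamma\, \theta(Z) \} \bigl[ Y - H\{\theta(Z)\} \bigr].
\end{align*}
```

Under this assumed form, $\gamma$ drops out of the null model but remains in the score for $\beta$, so a score statistic can be computed for each value of $\gamma$.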

The Method Our theory gives a linear expansion and an easily calculated covariance matrix for each value of the interaction parameter. The statistic, as a process in that parameter, converges weakly to a Gaussian process.

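The Method Following Chatterjee, et al. (AJHG, 2006), the overall test statistic is taken as the maximum of the score statistic over values of the interaction parameter in an interval (a, c). The interval (a, c) is arbitrary, but we take it as (-3, 3).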

Critical Values Critical values are easy to obtain via simulation. Let b = 1,…,B index the simulations. By the weak convergence, the score process has the same limit distribution as its simulated counterpart (with estimates under the null) in the simulated world.

Critical Values This means that the observed statistic and its simulated versions have the same limit distribution under the null, so you just simulate a lot of times to get the null critical value.
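
One way such a simulation could look is sketched below. The scheme is an assumption, not the slides' exact recipe: the estimated null score contributions are held fixed, each simulation multiplies them by independent standard normal draws, and the statistic is the maximum quadratic form over a grid of values of the unidentified parameter. The function name `sup_score_test` and the array layout are hypothetical.

```python
import numpy as np

def sup_score_test(scores, n_sim=1000, seed=0):
    """Sup-type score test with simulated null distribution.

    scores: array of shape (n, G, p) holding estimated null score
    contributions for n subjects on a grid of G parameter values.
    """
    s = np.asarray(scores, dtype=float)
    n, n_grid, _ = s.shape
    # Empirical second-moment matrix of the scores at each grid value.
    cov = np.einsum("ngp,ngq->gpq", s, s) / n

    def sup_quadratic_form(total):
        # total has shape (G, p): one normalized score vector per grid value.
        return max(
            float(total[g] @ np.linalg.solve(cov[g], total[g]))
            for g in range(n_grid)
        )

    observed = sup_quadratic_form(s.sum(axis=0) / np.sqrt(n))
    rng = np.random.default_rng(seed)
    simulated = np.empty(n_sim)
    for b in range(n_sim):
        mult = rng.standard_normal(n)        # one multiplier per subject
        total_b = np.einsum("n,ngp->gp", mult, s) / np.sqrt(n)
        simulated[b] = sup_quadratic_form(total_b)
    p_value = float(np.mean(simulated >= observed))
    return observed, p_value
```

The same multipliers are used across the whole grid within a simulation, so the dependence of the statistic across grid values is preserved.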

Simulation We did a simulation under a more complex model (theory easily extended) Here X = independent BVN, variances = 1, and with means given as c = 0 is the null

Simulation In addition, We varied the true values as

Power Simulation

Simulation Summary The test maintains its Type I error. Little loss of power compared to the no-interaction test when there is no interaction. Great gain in power when there is interaction. Results here were for kernels; they were almost numerically identical for penalized regression splines.

NAT2 Example Case-control study with 700 cases and 700 controls. As stated before, there were 14 common diplotypes. Our X was the design matrix for the k most common, k = 1,2,…,14.

NAT2 Example Z was years since stopping smoking. Co-factors S were age and gender. The model is slightly more complex because of the non-smokers (Z=0), but those details are hidden here.

NAT2 Example Results

Stronger evidence of genetic association is seen with the new model. For example, with 12 diplotypes, our p-value was 0.036, while that of the usual method was 0.214.

Extensions: Repeated Measures We have extended the results to repeated measures models. If there are J repeated measures, the loglikelihood depends on θ(Z_1), …, θ(Z_J). Note: one function, but evaluated multiple times.

Extensions: Repeated Measures If there are J repeated measures, the loglikelihood involves θ(·) evaluated at each of Z_1, …, Z_J. There is no straightforward kernel method for this. Wang (2003, Biometrika) gave a solution in the Gaussian case with no parameters. Lin and Carroll (2006, JRSSB) gave the efficient profile solution in the general case including parameters.

Extensions: Repeated Measures It is straightforward to write out a profiled score at the null for this loglikelihood. The form is the same as in the non-repeated measures case: a projection of the score for β onto the score for θ(·).

Extensions: Repeated Measures Here the estimation of θ_β(·) is not trivial because it is the solution of a complex integral equation.

Extensions: Repeated Measures Using the Wang (2003, Biometrika) method of nonparametric regression using kernels, we have figured out a way to estimate θ_β(·). This solution is the heart of a new paper (Maity, Carroll, Mammen and Chatterjee, JRSSB, 2009).

Extensions: Repeated Measures The result is a score-based method: it is based entirely on the null model and does not need to fit the profile model. It is a projection, so any estimation method can be used, not just kernels. There is an equally impressive extension to testing genetic main effects in the possible presence of interactions.

Extensions: Nuisance Parameters Nuisance parameters are easily handled with a small change of notation

Extensions: Additive Models We have developed a version of this for the case of repeated measures with additive models in the nonparametric part

Extensions: Additive Models The additive model method uses smooth backfitting (see multiple papers by Park, Yu and Mammen)

Summary Score testing is a powerful device in parametric problems. It is generally computationally easy. It is equivalent to projecting the score for β onto the score for the nuisance parameters.

Summary We have generalized score testing from parametric problems to a variety of semiparametric problems. This involved a reformulation using the semiparametric profile method. It is equivalent to projecting the score for β onto the score for θ(·). The key was to compute this projection while doing everything at the null model.

Summary Our approach avoided artificialities such as ad hoc undersmoothing. It is semiparametric efficient. Any smoothing method can be used, not just kernels. Multiple extensions were discussed.