Andrew Thomson on Generalised Estimating Equations (and simulation studies)

Topics Covered What are GEE? Relationship with robust standard errors Why they are not as complicated as they appear How does simulation answer (or not) the differences between different GEE approaches

Issues… My results are questionable (thanks to Richard…) Not shown in their entirety But – Agree with other studies Fixed cluster size is definitely correct

A simple example Consider simple uncorrelated linear regression, e.g. height on weight Minimize sum of squares

Simple example II Differentiate with respect to each parameter and set equal to 0. In general, if we have p covariates then minimizing the sum of squares is the same as solving p estimating equations
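A minimal sketch of this idea, not taken from the slides: for a straight-line fit there are two parameters, so there are two estimating equations, and the least-squares solution makes both of them zero. The data values and the function names `fit_line` and `estimating_equations` are made up for illustration.

```python
def fit_line(x, y):
    """Solve the two least-squares estimating equations for y = a + b*x.

    Equation 1: sum(y_i - a - b*x_i) = 0
    Equation 2: sum(x_i * (y_i - a - b*x_i)) = 0
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def estimating_equations(a, b, x, y):
    """Evaluate both estimating equations at the point (a, b)."""
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return sum(resid), sum(xi * ri for xi, ri in zip(x, resid))


# Hypothetical weights (kg) and heights (cm), for illustration only.
weight = [55.0, 62.0, 70.0, 81.0, 90.0]
height = [160.0, 165.0, 172.0, 178.0, 183.0]

a, b = fit_line(weight, height)
u1, u2 = estimating_equations(a, b, weight, height)
print(a, b, u1, u2)  # u1 and u2 are zero up to rounding error
```

GEE keeps this "solve p equations" structure and changes only how each observation is weighted.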

Extensions Non-linear regression (logistic) Weighting, based on the correlation of the results

Surprisingly – Not that bad For each cluster, D_j is a 2 x m_ij matrix

A is an m_ij x m_ij matrix with diagonal elements Independence – Identity matrix Exchangeable – 1s on the diagonal, rho everywhere else Unadjusted studies -
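The two working correlation structures named above can be written out directly. This is an illustrative sketch in plain Python; the function names `independence` and `exchangeable` are assumptions, not part of any GEE library.

```python
def independence(m):
    """m x m identity matrix: observations in a cluster treated as uncorrelated."""
    return [[1.0 if i == k else 0.0 for k in range(m)] for i in range(m)]


def exchangeable(m, rho):
    """m x m matrix with 1s on the diagonal and rho everywhere else."""
    return [[1.0 if i == k else rho for k in range(m)] for i in range(m)]


for row in exchangeable(3, 0.2):
    print(row)
```

Setting rho to 0 in `exchangeable` gives back the independence matrix, which is why independence can be seen as a special case.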

So what is D_j^T V_j ? Independence – Control Independence – IV Exch – Control Exch – IV

Missing Out Some Algebra Independence. Estimate And estimate OR as Exch -

Simple Interpretation Independence gives equal weight to each observation Exchangeable gives weight proportional to the variance (measured by rho) No obvious working correlation matrix which gives equal weight to each cluster

Note on Simulation Used to make inferences about a method's behaviour when its theoretical properties are unclear Simulator has choice over – Parameters varied – Output measured These should answer relevant questions

Relevance for simulation studies Equal cluster sizes give the same point estimate Any potential benefits of one approach over the other in terms of precision (measured by MSE) cannot be found Simulation studies should always consider the variable cluster size case
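The equal cluster size point can be checked numerically. The sketch below assumes the simplest setting: one arm, a common event probability, and an estimate that is a weighted average of cluster proportions. Under that assumption a cluster of size m gets total weight m under independence and m / (1 + (m - 1) * rho) under an exchangeable working correlation. The function names and the counts are hypothetical.

```python
def cluster_weight(m, rho):
    """Total weight for a cluster of size m; rho = 0 corresponds to independence."""
    return m / (1 + (m - 1) * rho)


def weighted_proportion(events, sizes, rho):
    """Weighted average of the cluster-level proportions."""
    weights = [cluster_weight(m, rho) for m in sizes]
    props = [e / m for e, m in zip(events, sizes)]
    return sum(w * p for w, p in zip(weights, props)) / sum(weights)


# Hypothetical event counts in four clusters of one arm.
events = [2, 5, 3, 6]
equal_sizes = [10, 10, 10, 10]
varied_sizes = [5, 40, 8, 20]

# Equal sizes: every cluster has the same weight, so rho does not matter.
print(weighted_proportion(events, equal_sizes, 0.0),
      weighted_proportion(events, equal_sizes, 0.3))

# Variable sizes: the two working correlations give different estimates.
print(weighted_proportion(events, varied_sizes, 0.0),
      weighted_proportion(events, varied_sizes, 0.3))
```

With equal sizes the weights cancel, so a simulation restricted to that case cannot show any difference in MSE between the two approaches.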

Unadjusted studies What outcome (OR, RR, RD) are we interested in measuring? What weights do we use for each cluster? Does the estimating procedure e.g. confidence interval construction have the right size?

Estimating the Variance Done using robust standard errors F is a matrix which depends on V and D is estimated by Independence is identical to robust standard errors Criticism of GEE is also criticism of RSE
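The formulas on this slide did not survive in the transcript, so the sketch below is not a reconstruction of them. It shows the standard cluster-robust (sandwich) variance in the simplest case I can state with confidence: an intercept-only linear model with an independence working correlation, where the "bread" is 1/n and the "meat" is the sum of squared cluster residual totals. The function name and the data are hypothetical.

```python
def naive_and_robust_variance(clusters):
    """Return (mean, naive variance, robust variance) for the overall mean.

    clusters: list of lists of outcomes, one inner list per cluster.
    """
    all_obs = [y for c in clusters for y in c]
    n = len(all_obs)
    mean = sum(all_obs) / n

    # Naive variance: treats all n observations as independent.
    s2 = sum((y - mean) ** 2 for y in all_obs) / (n - 1)
    naive = s2 / n

    # Sandwich: bread * meat * bread, with bread = 1/n and
    # meat = sum over clusters of (sum of residuals in the cluster)^2.
    meat = sum(sum(y - mean for y in c) ** 2 for c in clusters)
    robust = meat / n ** 2
    return mean, naive, robust


# Hypothetical binary outcomes in four clusters of four.
clusters = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
mean, naive, robust = naive_and_robust_variance(clusters)
print(mean, naive, robust)
```

Because outcomes within a cluster resemble each other in this toy data, the robust variance comes out larger than the naive one.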

Problems and solutions is biased downwards for small samples (< 40 clusters) p-values too small We “know” what this bias is (function of D and V). Let's call it H We replace with Basically changing the filling of our sandwich

C.I. Construction
1. Wald Test
   a) Independence
   b) Exchangeable
   c) Bias Corrected
2. Score Test (adjusted score test)
Evaluate score equations at H0 to obtain a χ² statistic.

More on the score test Score test is conservative Using bias correction will make it worse Multiply χ² statistic by J / (J-1) CI construction is done using the bisection algorithm
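A hedged sketch of the two steps mentioned here: the J / (J-1) adjustment and a bisection search for a confidence limit. It assumes J is the number of clusters, which the transcript does not state explicitly. The real score statistic requires refitting the model at each candidate value; here a toy quadratic statistic stands in for it, and all names (`adjusted_statistic`, `bisect_limit`, `toy_stat`) are hypothetical.

```python
CHI2_1DF_95 = 3.841458820694124  # 95th percentile of chi-square with 1 df


def adjusted_statistic(stat, n_clusters):
    """Multiply the chi-square statistic by J / (J - 1)."""
    return stat * n_clusters / (n_clusters - 1)


def bisect_limit(stat_fn, inside, outside, crit=CHI2_1DF_95, tol=1e-10):
    """Find where stat_fn crosses crit between two starting values.

    inside:  a parameter value not rejected (stat_fn(inside) < crit)
    outside: a parameter value rejected (stat_fn(outside) > crit)
    """
    while abs(outside - inside) > tol:
        mid = (inside + outside) / 2
        if stat_fn(mid) < crit:
            inside = mid   # mid is still inside the interval
        else:
            outside = mid  # mid is rejected
    return (inside + outside) / 2


# Toy stand-in for a test statistic as a function of the null value theta0.
est, se = 0.5, 0.2


def toy_stat(theta0):
    return ((est - theta0) / se) ** 2


upper = bisect_limit(toy_stat, inside=est, outside=est + 10 * se)
lower = bisect_limit(toy_stat, inside=est, outside=est - 10 * se)
print(lower, upper)
print(adjusted_statistic(3.2, n_clusters=5))
```

The confidence interval is the set of null values that the test does not reject, so each limit is a root of "statistic minus critical value", which is what bisection finds.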

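Results! - Size (5% Nominal)
Naïve: 12%
Ind: 11% (4-6 clusters), 9.5% (15-20 clusters)
Exch: 9% (4-6 clusters), 8% (15-20 clusters)
B.C.: 7.5% (4-6 clusters), 7% (15-20 clusters)
Adj. Score: 5.2% (4-6 clusters), 5% (15-20 clusters)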

Power H0 is not true. Simulation studies tend to use the beta-binomial distribution to simulate Common rho (?) If size is above nominal, power will be inflated as well. If they have the same size, does MSE have an effect?
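As an illustration of beta-binomial simulation with a common rho, the sketch below draws a cluster-specific probability from a Beta(a, b) distribution and then draws binary outcomes within the cluster. It uses the standard relationship that a Beta(a, b) mixing distribution has mean a / (a + b) and induces an intracluster correlation of 1 / (a + b + 1). The function names, cluster sizes, seed and parameter values are hypothetical.

```python
import random


def beta_params(p, rho):
    """Beta(a, b) with mean p and intracluster correlation rho = 1 / (a + b + 1)."""
    scale = (1 - rho) / rho
    return p * scale, (1 - p) * scale


def simulate_arm(sizes, p, rho, rng):
    """Simulate the number of events in each cluster of one trial arm."""
    a, b = beta_params(p, rho)
    events = []
    for m in sizes:
        p_j = rng.betavariate(a, b)  # cluster-specific event probability
        events.append(sum(rng.random() < p_j for _ in range(m)))
    return events


rng = random.Random(2024)        # fixed seed so the run is repeatable
sizes = [12, 30, 8, 50, 20]      # variable cluster sizes
events = simulate_arm(sizes, p=0.3, rho=0.05, rng=rng)
print(list(zip(sizes, events)))
```

Simulating under H0 uses the same p in both arms; simulating for power uses different values of p with the same rho.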

Power results In general above nominal. Due to incorrect size Naïve > Ind > Exch > B.C = Score This result is expected and surprising at the same time. Score and B.C actually attain the nominal level Considered later

Adjusted studies Very few have been done ( 2.5) Beta-binomial distribution is not amenable to including covariates Cluster level covariate – same argument applies for the fixed / variable cluster size issue Results are identical

Why is the adjusted score powerful?
1. The score test is just better
2. Power is based on p-values, rather than on C.I.s containing 1. It is possible to have a p-value that is significant while the confidence interval contains 1
3. Score statistic not derived for all data sets due to model fitting

Fitting the models
– R: various libraries (gee, geese, geepack). No score test. Crashes
– STATA: xtgee. No score test
– SAS: Proc Genmod. Score test. No score test CI construction
– S-Plus: code from authors (allegedly)

Convergence
Depends on number of clusters
– 15–20 clusters: 100% convergence
– 10 clusters: 99.7% convergence
– 4–6 clusters: 99% convergence
Score test – lose even more in SAS
– 15–20 clusters: lose another 0.5%
– 4–6 clusters: lose another 1%

Conclusions If you wish to use GEE then the adjusted score test is the (only?) appropriate way for a small number of clusters This is perhaps questionable The most complicated model to fit in terms of code.

What Should Simulation Do? Reflect what you’ll see in practice –Variable cluster size –Include individual level covariates (ideally imbalanced) Look not only at size but power (and coverage) Measure MSE for no IV cases Sensitivity to departures from assumptions

Number of Studies that do this 0 Mine does. Perhaps ‘luck’ rather than judgement Designed it 2 years ago Decided 2 months ago that it was actually quite good

‘Luck’ 1 supervisor, 2 advisors One advisor suggested MSE The other was adamant I did sensitivity analysis Richard obviously made outstanding contribution. Something of a consortium approach

Data sharing Given this – might be useful to have data files available online Use these for any further analysis methods that may become available Server space? Interactivity? Results?

Thank You