From t-test to multilevel analyses, Part 2


From t-test to multilevel analyses, Part 2
Stein Atle Lie, statistician, professor, Uni Health, Uni Research
http://folk.uib.no/msesl/GLM

Outline
Paired t-test (mean and standard deviation)
Two-group t-test (means and standard deviations)
Linear regression
GLM (general linear models)
GLMM (general linear mixed models)
…
Software: PASW (formerly SPSS), Stata, R, gllamm (Stata)

Paired t-test
The straightforward way to analyze two repeated measures is a paired t-test. The measurement at time 1 or location 1 (e.g., Data1) is compared directly with the measurement at time 2 or location 2 (e.g., Data2). Is the difference between Data1 and Data2 (Diff = Data1 − Data2) different from 0?
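
A minimal Stata sketch of this test (assuming the two measurements are stored wide, one row per subject, in hypothetical variables data1 and data2):

. ttest data1 == data2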

Paired t-test
The paired t-test can only be performed on complete (balanced) data: each subject must have both measurements. What happens if we delete two observations from Data2? (Only 8 complete pairs remain.)

Two-group t-test
If we now treat the data from time 1 and time 2 (or location 1 and location 2) as independent (even though they are not) and run a two-group t-test on the full dataset of 2 × 10 = 20 observations, we get the following.

ID     Grp (or time)   Dummy   Data
1007   1               0        6.1
1008   1               0        4.6
1009   1               0        5.0
1010   1               0        4.8
1011   1               0        7.9
1012   1               0       13.8
1013   1               0        1.4
1014   1               0        9.5
1015   1               0       12.2
1016   1               0        8.5
1007   2               1        8.3
1008   2               1        9.8
1009   2               1       15.6
1010   2               1        9.2
1011   2               1       10.4
1012   2               1        3.3
1013   2               1        8.4
1014   2               1       16.1
1015   2               1        …
1016   2               1        …

Two-group t-test (n=20 [10+10])

PASW:
T-TEST GROUPS=Grp(1 2) /VARIABLES=Data.
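
The Stata equivalent (a sketch, using the long-format variables data and grp from the regressions below):

. ttest data, by(grp)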

Two-group t-test
Observe that the means for Grp1 and Grp2 are equal to the means for Data1 and Data2 (from the paired t-test), and that the mean difference is also equal! The difference between the paired t-test and the two-group t-test lies in the variance and the number of observations, and therefore in the standard deviation and standard error, and hence in the p-value and confidence intervals.
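
As a reminder (standard formulas, not from the slides): the paired test uses the standard deviation s_d of the n within-pair differences, whereas the two-group test uses the pooled standard deviation s_p of all observations:

\[
t_{\text{paired}} = \frac{\bar d}{s_d/\sqrt{n}},
\qquad
t_{\text{two-group}} = \frac{\bar x_1 - \bar x_2}{s_p\,\sqrt{1/n_1 + 1/n_2}}
\]

When the two measurements are positively correlated within subjects, s_d is smaller than s_p, which is why the two tests give the same mean difference but different standard errors and p-values.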

Two-group t-test (s1 = s2)

[Figure: two normal density curves with means m1 and m2, equal standard deviations s1 = s2, and the mean difference D between them]

ANOVA (analysis of variance, s1 = s2 = s3)

[Figure: three normal density curves with means m1, m2, and m3 and equal standard deviations]
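
Although the slides only illustrate it graphically, the corresponding one-way ANOVA of these data in Stata would be:

. oneway data grp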

Linear regression
If we now perform an ordinary linear regression with the data as the outcome (dependent variable) and the group (time) variable (Grp = 1 and 2) as the independent variable, we obtain the following output.

Linear regression (n=20)

Stata:
. regress data grp

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =    1.38
       Model |  21.0124998     1  21.0124998           Prob > F      =  0.2554
    Residual |   274.01701    18  15.2231672           R-squared     =  0.0712
-------------+------------------------------           Adj R-squared =  0.0196
       Total |   295.02951    19  15.5278689           Root MSE      =  3.9017

------------------------------------------------------------------------------
        data |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         grp |       2.05   1.744888     1.17   0.255    -1.615873    5.715873
       _cons |       5.33    2.75891     1.93   0.069    -.4662545    11.12625
------------------------------------------------------------------------------

Linear regression
The coefficient for group is identical to the mean difference, and the standard error, t-statistic, and p-value are identical to those from the two-group t-test.

Linear regression
Now replace the group variable (Grp = 1 and 2) with a dummy variable (dummy = 0 for grp 1 and dummy = 1 for grp 2). The coefficient for the dummy is equal to the coefficient for grp (the mean difference), and the coefficient for the constant term is now equal to the mean for grp 1 (its standard error, however, is not!).
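
A one-line Stata sketch for creating such a dummy (assuming grp is coded 1 and 2):

. generate dummy = (grp == 2)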

Linear regression (n=20)

Stata:
. regress data dummy

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =    1.38
       Model |  21.0124998     1  21.0124998           Prob > F      =  0.2554
    Residual |   274.01701    18  15.2231672           R-squared     =  0.0712
-------------+------------------------------           Adj R-squared =  0.0196
       Total |   295.02951    19  15.5278689           Root MSE      =  3.9017

------------------------------------------------------------------------------
        data |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |       2.05   1.744888     1.17   0.255    -1.615873    5.715873
       _cons |       7.38   1.233822     5.98   0.000     4.787836    9.972164
------------------------------------------------------------------------------

Linear models in Stata
In ordinary linear models (regress and glm) in Stata, one may add an option for clustered data to obtain robust standard errors adjusted for intragroup correlation. (This is ideal when you want to adjust for clustered data but are not interested in the correlation within or between groups.)

Linear regression (n=20)

Stata:
. regress data dummy, cluster(id)

Linear regression                                      Number of obs =      20
                                                       F(  1,     9) =    2.64
                                                       Prob > F      =  0.1388
                                                       R-squared     =  0.0712
                                                       Root MSE      =  3.9017

                                  (Std. Err. adjusted for 10 clusters in id)
------------------------------------------------------------------------------
             |               Robust
        data |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |       2.05   1.262145     1.62   0.139    -.8051699     4.90517
       _cons |       7.38   1.224847     6.03   0.000     4.609204     10.1508
------------------------------------------------------------------------------

Linear models in Stata
Thus, we now have an alternative to the paired t-test. The mean difference is identical to that obtained from the paired t-test, and the standard errors (and p-values) are adjusted for intragroup correlation. As an alternative, we may use the program gllamm (Generalized Linear Latent And Mixed Models) in Stata: http://www.gllamm.org/
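
gllamm is a user-written package; if it is not already installed, it can normally be obtained from the SSC archive:

. ssc install gllamm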

gllamm (n=20)

gllamm (Stata):
. gllamm data dummy, i(id)

number of level 1 units = 20
number of level 2 units = 10

------------------------------------------------------------------------------
        data |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |       2.05   1.167852     1.76   0.079    -.2389486    4.338949
       _cons |   7.379808   1.172819     6.29   0.000     5.081124    9.678492
------------------------------------------------------------------------------

Variance at level 1
  6.8193955 (3.0174853)

level 2 (id)
  var(1): 6.8114516 (4.5613185)

Linear models in Stata
If we now delete two of the observations in Grp 2, we still obtain coefficients ("mean differences") calculated from all available data (n=18), with standard errors corrected for intragroup correlation, using the commands regress, glm, or gllamm.

Linear regression (n=18)

Stata:
. regress data dummy, cluster(id)

Linear regression                                      Number of obs =      18
                                                       F(  1,     9) =    1.63
                                                       Prob > F      =  0.2332
                                                       R-squared     =  0.0587
                                                       Root MSE      =  4.1303

                                  (Std. Err. adjusted for 10 clusters in id)
------------------------------------------------------------------------------
             |               Robust
        data |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |     1.9575   1.531486     1.28   0.233    -1.506963    5.421963
       _cons |       7.38   1.228869     6.01   0.000     4.600105     10.1599
------------------------------------------------------------------------------

gllamm (n=18)

gllamm (Stata):
. gllamm data dummy, i(id)

number of level 1 units = 18
number of level 2 units = 10

log likelihood = -48.538837

------------------------------------------------------------------------------
        data |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |   2.458305   1.253552     1.96   0.050     .0013882    4.915223
       _cons |   7.357426   1.232548     5.97   0.000     4.941677    9.773176
------------------------------------------------------------------------------

Variance at level 1
  6.4041537 (3.3485133)

level 2 (id)
  var(1): 8.7561818 (5.1671805)

Intraclass correlation (ICC)

Variance at level 1:        6.4041537 (3.3485133)
Variance at level 2 (id):   8.7561818 (5.1671805)

The total variance is hence 6.4042 + 8.7562 = 15.1603 (and the total standard deviation is hence √15.1603 = 3.8936). The proportion of variance attributed to level 2 is therefore ICC = 8.7562 / 15.1603 = 0.578.
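
In symbols, with the level-2 (between-subject) variance σu² and the level-1 (residual) variance σe² defined as in the random intercept model below, this is

\[
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
             = \frac{8.7562}{8.7562 + 6.4042} \approx 0.578
\]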

Linear regression
Ordinary linear regression assumes the data are normal and i.i.d. (independent and identically distributed).

Linear regression

[Figure: scatter plot of points (x1,y1), …, (xi,yi), …, (xn,yn) with the fitted regression line y = b0 + b1·x, marking the intercept b0, the slope b1, and a residual. Example variable pairs: cortisol vs. months, height vs. weight, cortisol vs. time]

Linear regression
Assumptions:
1) y1, y2, …, yn are independent and normally distributed
2) The expectation of Yi is E(Yi) = b0 + b1·xi (a linear relation between X and Y)
3) The variance of Yi is var(Yi) = σ² (equal variance for ALL values of X)

Linear regression
Assumptions in terms of the residuals (ei), with yi = b0 + b1·xi + ei:
1) e1, e2, …, en are independent and normally distributed
2) The expectation of ei is E(ei) = 0
3) The variance of ei is var(ei) = σ²

Ordinary linear regression
The model for an ordinary linear regression can thus be expressed as:
yi = b0 + b1·xi + ei,   ei ~ N(0, σe²)
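
A short Stata sketch (not on the slides) for inspecting these assumptions after a fit; ehat is a hypothetical variable name:

. regress data dummy
. predict ehat, residuals
. histogram ehat, normal
. rvfplot

The histogram (with an overlaid normal density) checks normality of the residuals, and the residual-versus-fitted plot checks the constant-variance assumption.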

Random intercept model

[Figure: scatter plot of points (x11,y11), …, (xij,yij), …, (xnp,ynp) with parallel group-specific regression lines yij = b0 + b1·xij + vij; each group has intercept b0 + uj, with between-group standard deviation σu and within-group (residual) standard deviation σe]

Random intercept model
For a random intercept model, we can express the regression line(s) and the variance components as:
yij = b0 + b1·xij + vij
vij = uj + eij
eij ~ N(0, σe²)   (individual)
uj ~ N(0, σu²)   (group)

Random intercept model
Alternatively, we may express the formulas for the simple variance-component model in terms of random intercepts:
yij = b0j + b1·xij + eij
b0j = b0 + uj
eij ~ N(0, σe²)   (individual)
uj ~ N(0, σu²)   (group)
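
As a cross-check (an assumption about available software; the slides themselves use gllamm), the same random-intercept model can be fitted with Stata's built-in mixed-effects command (xtmixed in older Stata versions):

. mixed data dummy || id: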