1 Multiple Regression EPP 245/298 Statistical Analysis of Laboratory Data.


October 26, 2006, EPP 245 Statistical Analysis of Laboratory Data

2 Cystic Fibrosis Data

Lung function data for cystic fibrosis patients (7-23 years old).

age: a numeric vector. Age in years.
sex: a numeric vector code. 0: male, 1: female.
height: a numeric vector. Height (cm).
weight: a numeric vector. Weight (kg).
bmp: a numeric vector. Body mass (% of normal).
fev1: a numeric vector. Forced expiratory volume.
rv: a numeric vector. Residual volume.
frc: a numeric vector. Functional residual capacity.
tlc: a numeric vector. Total lung capacity.
pemax: a numeric vector. Maximum expiratory pressure.

3 Some Stata Commands

. insheet using "C:\TD\CLASS\K30Bench2005\cystfibr.csv"
(11 vars, 25 obs)
. graph matrix age sex height weight bmp fev1 rv frc tlc pemax
. graph export cystfibr-scm.wmf
. regress pemax age sex height weight bmp fev1 rv frc tlc
. rvfplot
. graph export cystfibr-rvf.wmf


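5

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  9,    15) =    2.93
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
         sex |
      height |
      weight |
         bmp |
        fev1 |
          rv |
         frc |
         tlc |
       _cons |
------------------------------------------------------------------------------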

6 (regression output as on slide 5)

T-test of additional value of variable: in each coefficient row, t and P>|t| test whether that variable adds explanatory value once the other eight predictors are already in the model.

7 (regression output as on slide 5)

Test of whole model: F(9, 15) = 2.93 and its Prob > F test whether all nine slope coefficients are zero at once.
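For reference, the two tests annotated on slides 6 and 7 can be written out as follows. The symbols are notation assumed here, not taken from the slides: n = 25 observations, k = 9 predictors, b_j the estimated coefficient of predictor j and se(b_j) its standard error.

```latex
% Whole-model F test: are all k slope coefficients zero?
F = \frac{SS_{\text{Model}} / k}{SS_{\text{Residual}} / (n - k - 1)}
  = \frac{MS_{\text{Model}}}{MS_{\text{Residual}}},
\qquad k = 9,\; n - k - 1 = 25 - 9 - 1 = 15 .

% Per-variable t test: does predictor j add anything given the others?
t_j = \frac{b_j}{\operatorname{se}(b_j)},
\qquad \text{referred to a } t \text{ distribution with } n - k - 1 = 15 \text{ degrees of freedom.}
```

The degrees of freedom match the F(9, 15) shown in the Stata output header.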


9 (regression output as on slide 5)

Least significant variable: sex, the predictor with the largest P>|t| in the full model. It is the first variable dropped.

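10

. regress pemax age height weight bmp fev1 rv frc tlc

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  8,    16) =    3.49
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
      height |
      weight |
         bmp |
        fev1 |
          rv |
         frc |
         tlc |
       _cons |
------------------------------------------------------------------------------

Least significant variable: tlc (dropped in the next fit).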

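11

. regress pemax age height weight bmp fev1 rv frc

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  7,    17) =    4.16
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
      height |
      weight |
         bmp |
        fev1 |
          rv |
         frc |
       _cons |
------------------------------------------------------------------------------

Least significant variable: frc (dropped in the next fit).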

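12

. regress pemax age height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  6,    18) =    5.04
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
      height |
      weight |
         bmp |
        fev1 |
          rv |
       _cons |
------------------------------------------------------------------------------

Least significant variable: age (dropped in the next fit).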

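13

. regress pemax height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  5,    19) =    6.23
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |
      weight |
         bmp |
        fev1 |
          rv |
       _cons |
------------------------------------------------------------------------------

Least significant variable: height (dropped in the next fit).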

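14

. regress pemax weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  4,    20) =    7.96
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |
         bmp |
        fev1 |
          rv |
       _cons |
------------------------------------------------------------------------------

Least significant variable: rv (dropped in the next fit).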

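15

. regress pemax weight bmp fev1

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  3,    21) =    9.28
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |
         bmp |
        fev1 |
       _cons |
------------------------------------------------------------------------------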

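16

. stepwise, pr(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p =         >=         removing sex
p =         >=         removing tlc
p =         >=         removing frc
p =         >=         removing age
p =         >=         removing height
p =         >=         removing rv

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  3,    21) =    9.28
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |
      weight |
         bmp |
       _cons |
------------------------------------------------------------------------------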

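17

. stepwise, pr(.1) pe(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p =         >=         removing sex
p =         >=         removing tlc
p =         >=         removing frc
p =         >=         removing age
p =         >=         removing height
p =         >=         removing rv

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  3,    21) =    9.28
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |
      weight |
         bmp |
       _cons |
------------------------------------------------------------------------------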
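One check that is not on the slides, sketched here as an illustration: after refitting the full model, Stata's test command can test the six dropped variables jointly, rather than one at a time as in the elimination steps. The sketch assumes the cystic fibrosis data are already in memory, loaded as on slide 3.

```stata
* Assumes the cystfibr data are already loaded (see the insheet command on slide 3)
* Refit the full nine-predictor model
regress pemax age sex height weight bmp fev1 rv frc tlc

* Joint F test that the six variables removed by elimination all have zero coefficients
* (6 numerator and 15 denominator degrees of freedom with 25 observations)
test sex tlc frc age height rv
```

Because these six variables were picked precisely for looking insignificant, this test shares the selection problem and should be read as descriptive only.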

18 Cautionary Notes

The significance levels are not necessarily believable after variable selection.
The original full-model F statistic is significant, indicating that there is some significant relationship: F(9, 15) = 2.93.
After variable selection, F(3, 21) = 9.28, and its p-value is biased: the same data were used both to choose the three variables and to test them, so the result looks more significant than it should.

19

* 25 observations of pure noise: nine predictors and a response, all independent standard normals
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
* Full model: y is unrelated to every predictor by construction
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
* Backward elimination at the 0.1 level on the same noise data
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

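20

. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  9,    15) =    0.91
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |
          x2 |
          x3 |
          x4 |
          x5 |
          x6 |
          x7 |
          x8 |
          x9 |
       _cons |
------------------------------------------------------------------------------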

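21

. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p =         >=         removing x4
p =         >=         removing x6
p =         >=         removing x1
p =         >=         removing x7
p =         >=         removing x8
p =         >=         removing x3
p =         >=         removing x5
p =         >=         removing x9

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  1,    23) =    7.23
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |
       _cons |
------------------------------------------------------------------------------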
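The single noise run above can be repeated many times to see how often backward elimination keeps at least one pure-noise predictor. The following is a minimal sketch, not part of the original slides. The program name noisestep, the seed, and the 200 replications are all arbitrary choices made for illustration. It also uses rnormal() in place of invnormal(uniform()), which assumes a Stata release that provides rnormal().

```stata
* Hypothetical helper (not from the slides): one replication of the pure-noise experiment
capture program drop noisestep
program define noisestep, rclass
    drop _all
    set obs 25
    * Nine predictors and a response, all independent standard normals
    forvalues j = 1/9 {
        generate x`j' = rnormal()
    }
    generate y = rnormal()
    * Backward elimination at the 0.1 level, as on slide 21
    quietly stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
    * Number of predictors left in the final model (0 if all were removed)
    return scalar kept = e(df_m)
end

* Arbitrary seed so the run is reproducible
set seed 12345
simulate kept = r(kept), reps(200) nodots: noisestep

* Distribution of the number of retained noise predictors
summarize kept
tabulate kept
* Number of replications in which at least one noise predictor survived
count if kept > 0
```

Dividing the final count by 200 gives the share of runs in which selection produced an apparently significant model from noise, which can be compared with the nominal 0.1 level.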