SPH 247 Statistical Analysis of Laboratory Data
April 23, 2010



Cystic Fibrosis Data

Lung function data for cystic fibrosis patients (7-23 years old). All variables are numeric vectors.

    age     Age in years
    sex     Sex code (0: male, 1: female)
    height  Height (cm)
    weight  Weight (kg)
    bmp     Body mass (% of normal)
    fev1    Forced expiratory volume
    rv      Residual volume
    frc     Functional residual capacity
    tlc     Total lung capacity
    pemax   Maximum expiratory pressure

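    cf <- read.csv("cystfibr.csv")   # read the cystic fibrosis data
    pairs(cf)                        # scatterplot matrix of all variables
    attach(cf)                       # make the columns visible by name
    cf.lm <- lm(pemax ~ age+sex+height+weight+bmp+fev1+rv+frc+tlc)
    print(summary(cf.lm))            # coefficient table and overall F test
    print(anova(cf.lm))              # sequential ANOVA
    print(drop1(cf.lm,test="F"))     # single-term deletion F tests
    plot(cf.lm)                      # diagnostic plots
    step(cf.lm)                      # stepwise selection by AIC
    detach(cf)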


    > source("cystfibr.r")
    > cf.lm <- lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc)
    > print(summary(cf.lm))
    …
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    age
    sex
    height
    weight
    bmp
    fev1
    rv
    frc
    tlc

    Residual standard error: on 15 degrees of freedom
    Multiple R-Squared: , Adjusted R-squared:
    F-statistic: on 9 and 15 DF, p-value:

    > print(anova(cf.lm))
    Analysis of Variance Table

    Response: pemax
              Df Sum Sq Mean Sq F value Pr(>F)
    age                                        **
    sex
    height
    weight
    bmp
    fev1
    rv
    frc
    tlc
    Residuals
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Performs sequential ANOVA: each term is tested after adjusting only for the terms listed before it.

    > print(drop1(cf.lm, test = "F"))
    Single term deletions

    Model:
    pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc
           Df Sum of Sq RSS AIC F value Pr(F)
    age
    sex
    height
    weight
    bmp
    fev1
    rv
    frc
    tlc

Performs Type III ANOVA: each term is tested after adjusting for all the other terms in the model.
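The difference between the sequential table from anova() and the single-term deletions from drop1() is easiest to see with correlated predictors. The following is a minimal sketch on simulated data, not the cystic fibrosis data; the names d, x1, x2, and fit are made up for illustration.

```r
# Simulated data (hypothetical; not the cystic fibrosis data set)
set.seed(1)
n  <- 25
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)    # x2 is strongly correlated with x1
y  <- 1 + 2 * x2 + rnorm(n)      # y depends on x2 only
d  <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 + x2, data = d)

seq_tab  <- anova(fit)               # sequential: x1 alone, then x2 given x1
drop_tab <- drop1(fit, test = "F")   # each term given all the other terms

# F statistics for x1 under the two schemes (these generally differ)
f_seq_x1  <- seq_tab["x1", "F value"]
f_drop_x1 <- drop_tab["x1", "F value"]

# F statistics for the last term entered (these agree)
f_seq_x2  <- seq_tab["x2", "F value"]
f_drop_x2 <- drop_tab["x2", "F value"]

# The drop1 F for a single coefficient is the square of its t statistic
t_x1 <- coef(summary(fit))["x1", "t value"]

print(seq_tab)
print(drop_tab)
```

Only the last term in the formula gets the same test in both tables. For every earlier term, the sequential test ignores the predictors that come after it, so reordering the formula changes the anova() table but not the drop1() table.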


    > step(cf.lm)
    Start:  AIC=
    pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc

             Df Sum of Sq RSS AIC
    - sex
    - tlc
    - height
    - age
    - frc
    - fev1
    - rv
    - weight
    - bmp

    Step:  AIC=167.2
    pemax ~ age + height + weight + bmp + fev1 + rv + frc + tlc
    ……………

    Step:  AIC=
    pemax ~ weight + bmp + fev1 + rv

             Df Sum of Sq RSS AIC
    - rv
    - bmp
    - fev1
    - weight

    Call:
    lm(formula = pemax ~ weight + bmp + fev1 + rv)

    Coefficients:
    (Intercept)  weight  bmp  fev1  rv
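The AIC column printed by step() and drop1() for a linear model comes from extractAIC(), which computes n log(RSS/n) + 2p, where p is the number of estimated coefficients. A small sketch on simulated data (the data frame d and the model are hypothetical, chosen only to check the arithmetic):

```r
# Simulated data (hypothetical; not the cystic fibrosis data set)
set.seed(2)
n <- 25
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + d$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = d)

rss <- sum(residuals(fit)^2)     # residual sum of squares
p   <- length(coef(fit))         # coefficients, including the intercept
aic_by_hand <- n * log(rss / n) + 2 * p

aic_step <- extractAIC(fit)[2]   # the value step() and drop1() report

print(c(by_hand = aic_by_hand, extractAIC = aic_step))
```

For a fixed data set this differs from AIC(fit) only by an additive constant, so both rank candidate models the same way; step() removes the term whose deletion gives the smallest value.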

    > cf.lm2 <- lm(pemax ~ rv+bmp+fev1+weight)
    > summary(cf.lm2)

    Call:
    lm(formula = pemax ~ rv + bmp + fev1 + weight)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    rv
    bmp                                              *
    fev1                                             *
    weight                                           ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 20 degrees of freedom
    Multiple R-Squared: , Adjusted R-squared:
    F-statistic: on 4 and 20 DF, p-value:

Cautionary Notes

The significance levels are not necessarily believable after variable selection.
The original full-model F-statistic is significant, indicating that there is some significant relationship: F(9,15) = 2.93, p =
After variable selection, F(3,21) = 9.28, p = , which is biased because the same data were used both to choose the variables and to test them.

    set obs 25
    generate x1 = invnormal(uniform())
    generate x2 = invnormal(uniform())
    generate x3 = invnormal(uniform())
    generate x4 = invnormal(uniform())
    generate x5 = invnormal(uniform())
    generate x6 = invnormal(uniform())
    generate x7 = invnormal(uniform())
    generate x8 = invnormal(uniform())
    generate x9 = invnormal(uniform())
    generate y = invnormal(uniform())
    regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
    stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

    . regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |  SS  df  MS        Number of obs =
             |                    F(  9,    15) =    0.91
       Model |                    Prob > F      =
    Residual |                    R-squared     =
             |                    Adj R-squared =
       Total |                    Root MSE      =

           y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
          x1 |
          x2 |
          x3 |
          x4 |
          x5 |
          x6 |
          x7 |
          x8 |
          x9 |
       _cons |

    . stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                          begin with full model
    p =        >=         removing x4
    p =        >=         removing x6
    p =        >=         removing x1
    p =        >=         removing x7
    p =        >=         removing x8
    p =        >=         removing x3
    p =        >=         removing x5
    p =        >=         removing x9

      Source |  SS  df  MS        Number of obs =
             |                    F(  1,    23) =    7.23
       Model |                    Prob > F      =
    Residual |                    R-squared     =
             |                    Adj R-squared =
       Total |                    Root MSE      =

           y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
          x2 |
       _cons |
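The Stata run above is a single draw. A rough R analogue, repeated many times, shows how often the overall F test looks significant after selection even though y is unrelated to every predictor. This is a sketch under stated assumptions: the function select_then_test() is hypothetical, it imitates backward elimination with a removal threshold of 0.1 rather than reproducing Stata's stepwise command exactly, and a run in which no predictor survives is counted as non-significant (p = 1).

```r
# Backward elimination by p-value on pure noise, then the overall F test
# of the selected model. All names here are hypothetical.
select_then_test <- function(n = 25, k = 9, p_remove = 0.1) {
  d <- as.data.frame(matrix(rnorm(n * (k + 1)), n, k + 1))
  names(d) <- c("y", paste0("x", 1:k))
  vars <- names(d)[-1]
  repeat {
    if (length(vars) == 0) return(1)        # nothing survives selection
    fit <- lm(reformulate(vars, response = "y"), data = d)
    pv  <- coef(summary(fit))[-1, 4]        # p-values, intercept excluded
    if (max(pv) < p_remove) break           # every remaining term is kept
    vars <- vars[-which.max(pv)]            # drop the least significant term
  }
  f <- summary(fit)$fstatistic              # overall F of the selected model
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

set.seed(247)
p_after <- replicate(500, select_then_test())

# Share of noise-only data sets whose selected model has overall p < 0.05
reject_rate <- mean(p_after < 0.05)
print(reject_rate)
```

If the post-selection p-values were trustworthy, reject_rate would be close to 0.05. A value well above that is the bias the cautionary notes describe.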