Slide 1: STATS 330: Lecture 6

Slide 2: Inference for the Regression Model

Aim of today's lecture: to discuss how we assess the significance of variables in the regression.

Key concepts:
- Standard errors
- Confidence intervals for the coefficients
- Tests of significance

Reference: Coursebook, Section 3.2

Slide 3: Variability of the Regression Coefficients

- Imagine that we keep the x's fixed, but resample the errors and refit the plane. How much would the plane (i.e. the estimated coefficients) change?
- This gives us an idea of the variability (accuracy) of the estimated coefficients as estimates of the coefficients of the true regression plane.

Slide 4: The Regression Model (cont.)

- The data are scattered above and below the plane.
- The size of the "sticks" (the vertical deviations from the plane) is random, controlled by σ², and doesn't depend on x1, x2.

Slide 5: Variability of Coefficients (2)

Variability depends on:
- The arrangement of the x's (the more correlation among them, the more change; see Lecture 8)
- The error variance (the more scatter about the true plane, the more the fitted plane changes)

We measure variability by the standard error of the coefficients. A small simulation illustrating the resampling idea is sketched below.
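To make the resampling idea concrete, here is a minimal simulation sketch (not from the slides; the "true" coefficients, sample size and σ are made-up illustrative values): the x's stay fixed, the errors are regenerated, and the plane is refitted many times.

# Simulation sketch: fixed x's, fresh errors each time, refit the plane.
set.seed(1)
n  <- 31
x1 <- runif(n, 8, 21)              # fixed explanatory variables
x2 <- runif(n, 60, 90)
beta  <- c(-58, 4.7, 0.34)         # illustrative "true" coefficients
sigma <- 4                         # illustrative error standard deviation
coefs <- replicate(1000, {
  y <- beta[1] + beta[2]*x1 + beta[3]*x2 + rnorm(n, sd = sigma)
  coef(lm(y ~ x1 + x2))            # coefficients of the refitted plane
})
apply(coefs, 1, sd)                # spread of the estimates across refits

The standard deviations printed at the end are close to the standard errors that summary() reports for any one fit.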

Slide 6: Cherries

Call:
lm(formula = volume ~ diameter + height, data = cherry.df)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
diameter      4.7082     0.2643  17.816  < 2e-16 ***
height        0.3393     0.1302   2.607   0.0145 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-Squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16

The "Std. Error" column gives the standard errors of the coefficients.

Slide 7: Confidence Interval

For each coefficient, a 95% confidence interval takes the form

    estimated coefficient +/- t x standard error

where t is a percentage point of a t-distribution (details on the next slide).

Slide 8: Confidence Interval (2)

A 95% confidence interval for a regression coefficient is of the form

    estimated coefficient +/- standard error x t

where t is the 97.5% point of the appropriate t-distribution. The degrees of freedom are n-k-1, where n is the number of cases (observations) in the regression and k is the number of variables (assuming the model has a constant term). A sketch of this computation in R follows.
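As a minimal sketch (assuming cherry.lm has been fitted as on slide 6), the intervals can be computed directly from the fitted model; they should agree with confint() on the next slide.

# 95% confidence intervals by hand
est  <- coef(cherry.lm)                          # estimated coefficients
se   <- summary(cherry.lm)$coefficients[, 2]     # their standard errors
tval <- qt(0.975, df = df.residual(cherry.lm))   # 97.5% point, df = n-k-1
cbind(lower = est - tval*se, upper = est + tval*se)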

Slide 9: Example: cherry trees

Use the function confint on the object created by lm:

> confint(cherry.lm)
                 2.5 %   97.5 %
(Intercept)   -75.6823 -40.2931
diameter        4.1668   5.2496
height          0.0726   0.6059

Slide 10: Hypothesis Test

- Often we ask "do we need a particular variable, given the others are in the model?"
- Note that this is not the same as asking "is a particular variable related to the response?"
- We can test the former by examining the ratio of the coefficient to its standard error.

Slide 11: Hypothesis Test (2)

- The test statistic is the t-statistic

      t = estimated coefficient / standard error

- The bigger |t|, the more we need the variable.
- Equivalently, the smaller the p-value, the more we need the variable.

A sketch of the computation follows.
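As a minimal sketch in R (assuming cherry.lm as before), the t-statistics and two-sided p-values can be recovered from the summary table:

# t-statistics and p-values by hand
cf <- summary(cherry.lm)$coefficients
cf[, "Estimate"] / cf[, "Std. Error"]      # reproduces cf[, "t value"]
2 * pt(abs(cf[, "t value"]), df = df.residual(cherry.lm),
       lower.tail = FALSE)                 # two-sided p-values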

Slide 12: Cherries

From the summary of cherry.lm (slide 6), the t-values and p-values are in the last two columns:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
diameter      4.7082     0.2643  17.816  < 2e-16 ***
height        0.3393     0.1302   2.607   0.0145 *

All variables are required, since the p-values are small (< 0.05).

Slide 13: P-value

[Figure: density curve for t with 28 degrees of freedom; the p-value is the total area in the two tails beyond +/- t.]

Slide 14: Other Hypotheses

- Overall significance of the regression: do none of the variables have a relationship with the response?
- Use the F-statistic: the bigger F, the more evidence that at least one variable has a relationship with the response; equivalently, the smaller the p-value, the more evidence that at least one variable has a relationship. (An equivalent model-comparison sketch follows.)
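One way to see the overall F-test in R (a sketch, assuming cherry.df and cherry.lm as above): it compares the intercept-only model against the full model, and reproduces the F and p-value on the last line of summary().

# Overall F-test as a comparison with the intercept-only model
null.lm <- lm(volume ~ 1, data = cherry.df)
anova(null.lm, cherry.lm)   # F = 255 on 2 and 28 df, p < 2.2e-16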

Slide 15: Cherries

From the summary of cherry.lm (slide 6), the F-value and its p-value appear on the last line:

Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-Squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16

Slide 16: Testing if a Subset is Required

- Often we want to test whether a subset of variables is unnecessary.
- Terminology:
  - Full model: the model with all the variables
  - Submodel: the model with a set of variables deleted
- The test is based on comparing the RSS of the submodel with the RSS of the full model. The full model RSS is always smaller (why?).

Slide 17: Testing if a Subset is Adequate (2)

- If the full model RSS is not much smaller than the submodel RSS, the submodel is adequate: we don't need the extra variables.
- To do the test, we:
  - Fit both models and get the RSS for each.
  - Calculate the test statistic (see next slide).
  - If the test statistic is large (equivalently, the p-value is small), the submodel is not adequate.

Slide 18: Test Statistic

The test statistic is

    F = [ (RSS(submodel) - RSS(full model)) / d ] / s²

where
- d is the number of variables dropped, and
- s² is the estimate of σ² from the full model (the residual mean square).

R has a function anova to do the calculations; a by-hand sketch follows.
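As a minimal by-hand sketch (assuming two fitted models, model.full and model.sub, as in the free fatty acid example later in the lecture), this reproduces the F-statistic that anova() reports:

# Subset F-statistic computed from the two fitted models
rss.sub  <- deviance(model.sub)                          # RSS of the submodel
rss.full <- deviance(model.full)                         # RSS of the full model
d  <- df.residual(model.sub) - df.residual(model.full)   # variables dropped
s2 <- rss.full / df.residual(model.full)                 # residual mean square
F.stat <- ((rss.sub - rss.full) / d) / s2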

Slide 19: P-values

- When the smaller model is correct, the test statistic has an F-distribution with d and n-k-1 degrees of freedom.
- We assess whether the value of F calculated from the sample is a plausible value from this distribution by means of a p-value; a one-line R sketch follows.
- If the p-value is too small, we reject the hypothesis that the submodel is OK.
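Continuing the by-hand sketch from slide 18, the p-value is the upper-tail area of this F-distribution:

# Upper-tail area beyond the observed F
pf(F.stat, df1 = d, df2 = df.residual(model.full), lower.tail = FALSE)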

Slide 20: P-values (cont.)

[Figure: F density curve; the p-value is the area to the right of the observed value of F.]

Slide 21: Example

- Free fatty acid data: use physical measures to model a biochemical parameter in overweight children.
- Variables are:
  - ffa: free fatty acid level in blood (response)
  - age (months)
  - weight (pounds)
  - skinfold thickness (inches)

Slide 22: Data

[Data listing: columns ffa, age, weight and skinfold … 20 observations in all.]

Slide 23: Analysis (1)

> model.full <- lm(ffa ~ age + weight + skinfold, data = fatty.df)
> summary(model.full)

In the coefficient table, weight is significant (**) and the intercept is significant (*), but age and skinfold are not. This suggests that

- age is not required if weight and skinfold are retained,
- skinfold is not required if weight and age are retained.

Can we get away with just weight?

Slide 24: Analysis (2)

> model.sub <- lm(ffa ~ weight, data = fatty.df)
> anova(model.sub, model.full)
Analysis of Variance Table

Model 1: ffa ~ weight
Model 2: ffa ~ age + weight + skinfold
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)
  …

The small F and large p-value suggest that weight alone is adequate. But the test should be interpreted with caution, as we "pre-tested": we chose this comparison only after looking at the individual p-values in the full model.

Slide 25: Testing a Combination of Coefficients

- Cherry trees: our model is V = c x D^β1 x H^β2, or

      log(V) = β0 + β1 log(D) + β2 log(H)

- Dimensional analysis suggests β1 + β2 = 3 (volume has the dimensions of length cubed, while diameter and height are lengths). How can we test this?
- The test statistic is

      t = (b1 + b2 - 3) / se(b1 + b2)

- The p-value is the area under the t-curve beyond +/- t.

Slide 26: Testing a Combination (cont.)

We can use the "R330" function test.lc to compute the value of t:

> cherry.lm <- lm(log(volume) ~ log(diameter) + log(height), data = cherry.df)
> cc <- c(0, 1, 1)
> c <- 3
> test.lc(cherry.lm, cc, c)

The output lists $est (the estimate of β1 + β2), $std.err (its standard error), $t.stat, $df (28) and $p.val. A base-R version of the same calculation is sketched below.
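For reference, a base-R sketch of the same calculation, using the covariance matrix of the estimates (this reproduces what test.lc reports for the model above):

# Linear combination test by hand: H0: b1 + b2 = 3
b  <- coef(cherry.lm)                   # (b0, b1, b2)
V  <- vcov(cherry.lm)                   # covariance matrix of the estimates
cc <- c(0, 1, 1)                        # coefficients of the combination
est <- sum(cc * b)                      # estimate of b1 + b2
se  <- drop(sqrt(t(cc) %*% V %*% cc))   # its standard error
t.stat <- (est - 3) / se
2 * pt(abs(t.stat), df = df.residual(cherry.lm),
       lower.tail = FALSE)              # two-sided p-value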

Slide 27: The "R330" Package

- A set of functions written for the course, in the form of an R package.
- Install the package using the R packages menu (see the coursebook for details), then type

      library(R330)

Slide 28: Testing a Combination (cont.)

- In general, we might want to test c0 β0 + c1 β1 + c2 β2 = c (in our example c0 = 0, c1 = 1, c2 = 1, c = 3).
- The estimate is c0 b0 + c1 b1 + c2 b2.
- The test statistic is

      t = (c0 b0 + c1 b1 + c2 b2 - c) / se(c0 b0 + c1 b1 + c2 b2)