Estimating the Variance of the Error Terms


Estimating the Variance of the Error Terms

The unbiased estimator for σε2 is MSE = SSE/(n − 2):

sse=sum(result$residuals^2)  # 16811.16
mse=sse/(80-2)               # 215.5277
sigma.hat=sqrt(mse)          # 14.68086

anova(result)
Response: Weight
          Df Sum Sq Mean Sq F value    Pr(>F)
Waist      1  79284   79284  367.86 < 2.2e-16 ***
Residuals 78  16811     216
Total     79  96095

summary(result)
lm(formula = Weight ~ Waist)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -51.7279    11.1288  -4.648 1.34e-05 ***
Waist         2.3947     0.1249  19.180   < 2e-16 ***
Residual standard error: 14.68 on 78 degrees of freedom
Multiple R-squared: 0.8251, Adjusted R-squared: 0.8228
F-statistic: 367.9 on 1 and 78 DF, p-value: < 2.2e-16

[Figure: scatter plot of the points Pi(xi, yi) around the fitted line y = β0 + β1x, illustrating the decomposition SSTO = SSE + SSR.]

Since the p-value is less than 0.05, we conclude that the regression model accounts for a significant amount of the variability in weight.

R2 = SSR/SSTO
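The arithmetic behind MSE, the residual standard error, and R2 can be checked directly from the sums of squares in the ANOVA table above. A minimal sketch, using the numbers reported on the slide:

```python
import math

# Sums of squares taken from the slide's anova(result) output.
sse = 16811.16    # residual (error) sum of squares
ssr = 79284.0     # regression sum of squares
ssto = sse + ssr  # total sum of squares: SSTO = SSE + SSR
n = 80            # sample size

mse = sse / (n - 2)         # unbiased estimator of the error variance
sigma_hat = math.sqrt(mse)  # residual standard error
r_squared = ssr / ssto      # proportion of variability explained

print(round(mse, 4))        # ~215.5277
print(round(sigma_hat, 5))  # ~14.68086
print(round(r_squared, 4))  # ~0.8251
```

These reproduce the 215.5277, 14.68, and 0.8251 values reported by `anova(result)` and `summary(result)`.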

Things that affect the slope estimate

Watch the regression podcast by Dr. Will posted on our course webpage.

Three things affect the slope estimate:
Sample size (n): as n increases, the standard error of the slope estimate decreases.
Variability of the error terms (σε2): the smaller σε is, the smaller the standard error of the slope estimate.
Spread of the independent variable.

summary(result)
lm(formula = Weight ~ Waist)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -51.7279    11.1288  -4.648 1.34e-05 ***
Waist         2.3947     0.1249  19.180   < 2e-16 ***

Computing the slope estimate and its standard error by hand:
SS=function(x,y){sum((x-mean(x))*(y-mean(y)))}
SSxy=SS(Waist,Weight)   # 33108.35
SSxx=SS(Waist,Waist)    # 13825.73
SSyy=SS(Weight,Weight)  # 96095.4 = SSTO
Beta1.hat=SSxy/SSxx     # 2.39469
MSE=anova(result)$Mean[2]  # 215.5277
SE.beta1=sqrt(MSE/SSxx)    # 0.1248554

Testing H0: β1 = 0 vs. H1: β1 ≠ 0:
t.obs=(Beta1.hat-0)/SE.beta1   # 19.17971
p.value=2*(1-pt(19.18,df=78))  # virtually 0
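The slope estimate and its standard error follow directly from the sums of squares computed above. A sketch of the same arithmetic, using the SSxy, SSxx, and MSE values reported on the slide:

```python
import math

# Sums of squares and MSE reported on the slide.
ss_xy = 33108.35
ss_xx = 13825.73
mse = 215.5277

beta1_hat = ss_xy / ss_xx          # least-squares slope: SSxy / SSxx
se_beta1 = math.sqrt(mse / ss_xx)  # standard error of the slope: sqrt(MSE / SSxx)
t_obs = (beta1_hat - 0) / se_beta1 # t statistic for H0: beta1 = 0

print(round(beta1_hat, 5))  # ~2.39469
print(round(se_beta1, 7))   # ~0.1248554
print(round(t_obs, 4))      # ~19.1797
```

Note how SSxx appears in the denominator of the standard error: a larger spread of the independent variable makes the slope estimate more precise, matching the third point above.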

Confidence Intervals

The (1-α)100% C.I. for β1 is Beta1.hat ± t(1-α/2, df=n-2)·SE.beta1. Hence, the 90% C.I. for β1 for our example is

Lower=Beta1.hat - qt(0.95,df=78)*SE.beta1  # 2.186853
Upper=Beta1.hat + qt(0.95,df=78)*SE.beta1  # 2.602528

confint(result,level=.90)
                  5 %       95 %
(Intercept) -70.253184 -33.202619
Waist         2.186853   2.602528

Estimating the mean response (μy) at a specified value of x:
predict(result,newdata=data.frame(Waist=c(80,90)))
       1        2
139.8473 163.7942

Confidence interval for the mean response (μy) at a specified value of x:
predict(result,newdata=data.frame(Waist=c(80,90)),interval="confidence")
       fit      lwr      upr
1 139.8473 136.0014 143.6932
2 163.7942 160.4946 167.0938
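The 90% interval for the slope can be reproduced outside R. A sketch using SciPy's t quantile function (the counterpart of R's `qt`), with the slope estimate and standard error from the previous slide:

```python
from scipy import stats

# Slope estimate and standard error reported on the slides.
beta1_hat = 2.39469
se_beta1 = 0.1248554
df = 78  # n - 2 = 80 - 2

# A 90% CI uses the 0.95 quantile of the t distribution (alpha/2 = 0.05 in each tail).
tcrit = stats.t.ppf(0.95, df)
lower = beta1_hat - tcrit * se_beta1
upper = beta1_hat + tcrit * se_beta1

print(round(lower, 6), round(upper, 6))  # ~2.186853 ~2.602528
```

This matches the `confint(result, level=.90)` output for Waist above.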

Prediction Intervals

Predicting the value of the response variable at a specified value of x:
predict(result,newdata=data.frame(Waist=c(80,90)))
       1        2
139.8473 163.7942

Prediction interval for a new response value (yn+1) at a specified value of x:
predict(result,newdata=data.frame(Waist=c(80,90)),interval="prediction")
       fit      lwr      upr
1 139.8473 110.3680 169.3266
2 163.7942 134.3812 193.2072

predict(result,newdata=data.frame(Waist=c(80,90)),interval="prediction",level=.99)
       fit      lwr      upr
1 139.8473 100.7507 178.9439
2 163.7942 124.7855 202.8029

Note that the only difference between the prediction interval and the confidence interval for the mean response is the addition of 1 inside the square root. This makes the prediction intervals wider than the confidence intervals for the mean response.
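The "addition of 1 inside the square root" amounts to adding MSE to the squared standard error of the mean response. A sketch that backs out the standard error of the mean fit at Waist = 80 from the confidence interval on the previous slide (a derived value, not reported directly), then rebuilds the prediction interval:

```python
import math
from scipy import stats

df = 78
tcrit = stats.t.ppf(0.975, df)  # predict() defaults to 95% intervals
mse = 215.5277
fit = 139.8473  # fitted value at Waist = 80, from the slides

# se.fit backed out from the slide's confidence interval [136.0014, 143.6932]:
se_fit = (143.6932 - fit) / tcrit

# The prediction interval adds MSE inside the square root, widening the interval.
se_pred = math.sqrt(se_fit**2 + mse)
lower = fit - tcrit * se_pred
upper = fit + tcrit * se_pred

print(round(lower, 2), round(upper, 2))  # ~110.37 ~169.33
```

This recovers the [110.3680, 169.3266] prediction interval at Waist = 80, confirming that the extra MSE term is the only difference between the two intervals.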

Confidence and Prediction Bands

Working-Hotelling (1-α)100% confidence band:
result=lm(Weight~Waist)
CI=predict(result,se.fit=TRUE)  # se.fit = SE(mean)
W=sqrt(2*qf(0.95,2,78))  # 2.495513
band.lower=CI$fit - W*CI$se.fit
band.upper=CI$fit + W*CI$se.fit
plot(Waist,Weight,xlab="Waist",ylab="Weight",main="Confidence Band")
abline(result)
points(sort(Waist),sort(band.lower),type="l",lwd=2,lty=2,col="Blue")
points(sort(Waist),sort(band.upper),type="l",lwd=2,lty=2,col="Blue")

The (1-α)100% prediction band:
mse=anova(result)$Mean[2]
se.pred=sqrt(CI$se.fit^2+mse)
band.lower.pred=CI$fit - W*se.pred
band.upper.pred=CI$fit + W*se.pred
points(sort(Waist),sort(band.lower.pred),type="l",lwd=2,lty=2,col="Red")
points(sort(Waist),sort(band.upper.pred),type="l",lwd=2,lty=2,col="Red")
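The Working-Hotelling multiplier W replaces the pointwise t critical value so the band covers the entire regression line simultaneously. A sketch of the W computation using SciPy's F quantile function (the counterpart of R's `qf`):

```python
import math
from scipy import stats

# Working-Hotelling multiplier for a 95% simultaneous confidence band:
# W = sqrt(2 * F(0.95; 2, n-2)), with 2 regression parameters and n - 2 = 78 error df.
df1, df2 = 2, 78
W = math.sqrt(2 * stats.f.ppf(0.95, df1, df2))

print(round(W, 6))  # ~2.495513
```

Since W ≈ 2.50 exceeds the pointwise critical value t(0.975, 78) ≈ 1.99, the band is wider than the pointwise confidence intervals at every x.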

Tests for Correlations

Testing H0: ρ = 0 vs. H1: ρ ≠ 0.
cor(Waist,Weight)  # Computes the Pearson correlation coefficient, r
0.9083268
cor.test(Waist,Weight,conf.level=.99)  # Tests Ho: rho=0 and also constructs a C.I. for rho

Pearson's product-moment correlation
data: Waist and Weight
t = 19.1797, df = 78, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
99 percent confidence interval:
 0.8409277 0.9479759

Note that these results (t = 19.1797, df = 78) are exactly the same as what we got when testing H0: β1 = 0 vs. H1: β1 ≠ 0.

Testing H0: ρ = 0 vs. H1: ρ ≠ 0 using the (nonparametric) Spearman's method.
cor.test(Waist,Weight,method="spearman")  # Test of independence using Spearman's rank correlation rho

Spearman's rank correlation rho
S = 8532, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.9
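The equivalence between the correlation test and the slope test follows from the identity t = r·sqrt(n − 2)/sqrt(1 − r²), and r² equals the regression R-squared. A sketch verifying both with the r reported above:

```python
import math

# Pearson correlation coefficient reported on the slide.
r = 0.9083268
n = 80

# The t statistic for H0: rho = 0 equals the t statistic for H0: beta1 = 0.
t_obs = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# The squared correlation equals the coefficient of determination.
r_squared = r**2

print(round(t_obs, 4))      # ~19.1797
print(round(r_squared, 4))  # ~0.8251
```

Both numbers match the slope t-test (19.1797 on 78 df) and the Multiple R-squared (0.8251) from `summary(result)`.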

Model Diagnostics

Model: Yi = (β0 + β1xi) + εi, where the εi's are uncorrelated with a mean of 0 and constant variance σ2ε, and the εi's are normally distributed. (Normality is needed in the test for the slope.)

Assessing uncorrelatedness of the error terms:
plot(result$residuals,type='b')

Assessing normality:
qqnorm(result$residuals); qqline(result$residuals)
shapiro.test(result$residuals)
W = 0.9884, p-value = 0.6937

Assessing constant variance:
plot(result$fitted,result$residuals)
levene.test(result$residuals,Waist)
Test Statistic = 2.1156, p-value = 0.06764
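The same residual-based checks can be sketched outside R. A minimal example on simulated data (the Waist/Weight data themselves are not reproduced here; the coefficients and error SD below are borrowed from the fitted model purely for illustration), fitting the line with NumPy and running a Shapiro-Wilk normality check with SciPy:

```python
import numpy as np
from scipy import stats

# Simulate data resembling the example: the slide's fitted coefficients
# and residual SD are used as the (hypothetical) truth.
rng = np.random.default_rng(42)
x = rng.uniform(60, 110, 80)
y = -51.73 + 2.39 * x + rng.normal(0, 14.68, 80)

# Fit the simple linear regression and compute residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Normality check: Shapiro-Wilk test on the residuals
# (analogous to shapiro.test(result$residuals) in R).
stat, pvalue = stats.shapiro(residuals)
print(round(stat, 4), round(pvalue, 4))
```

A large Shapiro-Wilk p-value (as with the 0.6937 on the slide) gives no evidence against normality; the residual-vs-fitted and residual-sequence plots play the same role for constant variance and uncorrelatedness.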