
Slide 1: STATS 330: Lecture 7

Slide 2: Prediction
Aims of today's lecture:
- Describe how to use the regression model to predict future values of the response, given the values of the covariates.
- Describe how to estimate the mean response for particular values of the covariates.

Slide 3: Typical questions
- Given the height and diameter, can we predict the volume of a cherry tree? E.g. if a particular tree has diameter 11 inches and height 85 feet, can we predict its volume?
- What is the average volume of all cherry trees having a given height and diameter? E.g. if we consider all cherry trees with diameter 11 inches and height 85 feet, can we estimate their mean volume?

Slide 4: The regression model (revisited)
- The diagram shows the true plane, which describes the mean response for any x1, x2.
- The short "sticks" represent the "errors".
- The y-observation is the sum of the mean and the error: Y = μ + ε.
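The decomposition Y = μ + ε can be made concrete with a tiny R simulation; this is only a sketch, and the plane's coefficients below are made up purely for illustration:

set.seed(1)                       # reproducible made-up data
x1 <- runif(30, 60, 90)           # hypothetical covariate values
x2 <- runif(30, 8, 16)
mu <- 2 + 0.5 * x1 + 1.5 * x2     # the "true plane": mean response at (x1, x2)
y  <- mu + rnorm(30, sd = 3)      # observation = mean + normal error with zero mean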

Slide 5: Prediction: how to do it
- Suppose we want to predict the response of an individual whose covariates are x1, …, xk. For the cherry tree example, k = 2, and x1 (height) is 85, x2 (diameter) is 11.
- Since the response we want to predict is Y = μ + ε, we predict Y by making separate predictions of μ and ε and adding the results together.

Slide 6: Prediction: how to do it (2)
- Since μ is the value of the true regression plane at x1, …, xk, we predict μ by the value of the fitted plane at x1, …, xk. This is ŷ = b0 + b1x1 + … + bkxk, where b0, …, bk are the estimated regression coefficients.
- We have no information about ε other than the fact that it is sampled from a normal distribution with zero mean. Thus, we predict ε by zero.

Slide 7: Prediction: how to do it (3)
Thus, to predict the response y when the covariates are x1, …, xk, we use the value of the fitted plane at x1, …, xk as the predictor:

  ŷ = b0 + b1x1 + … + bkxk

Slide 8: Inner product form
The predictor can be written compactly as an inner product:

  ŷ = x'b

where b = (b0, b1, …, bk)' is the vector of regression coefficients and x = (1, x1, …, xk)' is the vector of predictor variables, with a leading 1 to match the intercept.
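As a sketch in R (assuming the cherry.lm fit from slide 14), the predictor is exactly this inner product:

b <- coef(cherry.lm)    # (b0, b1, b2): intercept, diameter and height coefficients
x <- c(1, 11, 85)       # (1, diameter, height); the leading 1 multiplies b0
sum(x * b)              # x'b: the fitted plane at diameter 11, height 85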

Slide 9: Prediction error
- Prediction error is measured by the expected value of (actual − predictor)².
- This equals Var(predictor) + σ². The first part comes from predicting the mean, the second part from predicting the error. Var(predictor) depends on the x's and on σ²; see the formula on slide 11.
- The standard error of prediction is sqrt(Var(predictor) + σ²).
- Prediction intervals are approximately predictor ± 2 standard errors, or more precisely…

Slide 10: Prediction interval
- A prediction interval is an interval which contains the actual value of the response with a given probability, e.g. 0.95.
- The 95% interval for the response at x is

  x'b ± qt(0.975, n − k − 1) × sqrt(Var(predictor) + σ²)

(in practice σ² is replaced by its estimate).

Slide 11: Var(predictor)
Arrange the covariate data (used for fitting the model) into a matrix X, with a leading column of ones for the intercept. Then

  Var(predictor) = σ² x'(X'X)⁻¹x

where x = (1, x1, …, xk)' contains the covariates at which we predict.
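This formula can be checked in R (a sketch, again assuming the cherry.lm fit); with σ² replaced by its estimate, its square root reproduces the se.fit value that predict() reports on slide 15:

X  <- model.matrix(cherry.lm)   # covariate data with a leading column of ones
x  <- c(1, 11, 85)              # (1, diameter, height) for the new tree
s2 <- summary(cherry.lm)$sigma^2                      # estimate of sigma^2
sqrt(s2 * drop(t(x) %*% solve(crossprod(X)) %*% x))   # sqrt of Var(predictor)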

Slide 12: qt
qt(0.975, 28) is the point of the t-distribution with 28 degrees of freedom that has area 0.975 to its left, leaving area 0.025 in the upper tail.
[Figure: t-density with the areas 0.975 and 0.025 shaded.]
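In R, qt() returns the quantile and pt() the left-tail area, so each inverts the other:

qt(0.975, 28)           # 97.5th percentile of the t-distribution with 28 df: 2.0484
pt(qt(0.975, 28), 28)   # recovers the left-tail area: 0.975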

Slide 13: Doing it in R
- Suppose we want to predict the volume of 2 cherry trees. The first has diameter 11 inches and height 85 feet, the second has diameter 12 inches and height 90 feet.
- The first step is to make a data frame containing the new data. The names must be the same as in the original data.
- Then we combine the results of the fit (regression coefficients, estimate of the error variance) with the new data (the predictor variables) to calculate the predictor.

Slide 14: Doing it in R (2)

# calculate the fit from the original data
cherry.lm <- lm(volume ~ diameter + height, data = cherry.df)
# make the new data frame (the predictor variables)
new.df <- data.frame(diameter = c(11, 12), height = c(85, 90))
# do the prediction, combining the results of the fit with the new data
predict(cherry.lm, new.df)

This only gives the predictions. What about prediction errors, prediction intervals etc.?

Slide 15: Doing it in R (3)

predict(cherry.lm, new.df, se.fit = T, interval = "prediction")

$fit
   fit  lwr  upr
   …
$se.fit
   …
$df
[1] 28
$residual.scale
[1] …

Notes: the lwr and upr columns are the 95% prediction intervals; $se.fit is the square root of Var(predictor), not the standard error of prediction; $residual.scale is the residual standard error, i.e. the square root of the residual mean square.

Slide 16: Hand calculation
The prediction is the fitted value ŷ = x'b.
SE(predictor) = sqrt(se.fit^2 + residual.scale^2)
Prediction interval: prediction ± SE(predictor) × qt(0.975, n − k − 1)
Here n − k − 1 = 28, so the multiplier is qt(0.975, 28) = 2.0484; plugging the se.fit and residual.scale values from slide 15 into these formulas reproduces the lwr and upr columns of the prediction interval.
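The same hand calculation can be scripted from the components that predict() returns; a sketch, assuming cherry.lm and new.df from slide 14:

out  <- predict(cherry.lm, new.df, se.fit = TRUE)
se.p <- sqrt(out$se.fit^2 + out$residual.scale^2)   # SE(predictor)
tval <- qt(0.975, out$df)                           # qt(0.975, 28) = 2.0484
cbind(fit = out$fit,
      lwr = out$fit - tval * se.p,                  # matches interval = "prediction"
      upr = out$fit + tval * se.p)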

Slide 17: Estimating the mean response
- Suppose we want to estimate the mean response of all individuals whose covariates are x1, …, xk.
- Since the mean we want to estimate is the height of the true regression plane at x1, …, xk, we use as our estimate the height of the fitted plane at x1, …, xk. This is the same fitted value ŷ = b0 + b1x1 + … + bkxk as before.

Slide 18: Standard error of the estimate
The standard error of the estimate is the square root of Var(predictor). Note that this is less than the standard error of the prediction, sqrt(Var(predictor) + σ²).
Confidence interval: estimate ± qt(0.975, n − k − 1) × standard error of the estimate.
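Because the standard error of the estimate is exactly the se.fit component, the confidence interval can also be computed by hand; a sketch, using the same fit and new data as before:

est <- predict(cherry.lm, new.df, se.fit = TRUE)
est$fit + qt(0.975, est$df) * outer(est$se.fit, c(lwr = -1, upr = 1))   # 95% CI for the mean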

Slide 19: Doing it in R
- Suppose we want to estimate the mean volume of all cherry trees having diameter 11 inches and height 85 feet.
- As before, the first step is to make a data frame containing the new data. Names must be the same as in the original data.

Slide 20: Doing it in R (2)

> new.df <- data.frame(diameter = 11, height = 85)
> predict(cherry.lm, new.df, se.fit = T, interval = "confidence")

$fit
   fit  lwr  upr
[1,] …
$se.fit
[1] …
$df
[1] 28
$residual.scale
[1] …

Here lwr and upr give the confidence interval for the mean response; note that it is shorter than the prediction interval.
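Putting the two kinds of interval side by side makes the difference plain; a sketch with the single-tree new.df from this slide:

predict(cherry.lm, new.df, interval = "confidence")   # interval for the mean volume
predict(cherry.lm, new.df, interval = "prediction")   # wider: also allows for the error ε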

Slide 21: Example: Hydrocarbon data
When petrol is pumped into a tank, hydrocarbon vapours are forced into the atmosphere. To reduce this significant source of air pollution, devices are installed to capture the vapour. A laboratory experiment was conducted in which the amount of vapour given off was measured under carefully controlled conditions. In addition to the response, there were four variables thought relevant for prediction:
t.temp: initial tank temperature (degrees F)
p.temp: temperature of the dispensed petrol (degrees F)
t.vp: initial vapour pressure in tank (psi)
p.vp: vapour pressure of the dispensed petrol (psi)
hc: emitted dispensed hydrocarbons (g) (response)

Slide 22: The data

> petrol.df
   t.temp p.temp t.vp p.vp hc
   …

125 observations in all.

Slide 23: Pairs plot
[Figure: scatterplot matrix (pairs plot) of t.temp, p.temp, t.vp, p.vp and hc.]
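The plot can be reproduced, and the visual impressions checked numerically, with a short sketch (assuming petrol.df is loaded):

pairs(petrol.df)           # scatterplot matrix of all five variables
round(cor(petrol.df), 2)   # e.g. the strong correlation between t.vp and p.vp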

Slide 24: Preliminary conclusions
- All variables seem related to the response.
- p.vp and t.vp seem highly correlated.
- Quite strong correlations between some of the other variables.
- No obvious outliers.

Slide 25: Fit the "full" model

> petrol.reg <- lm(hc ~ t.temp + p.temp + t.vp + p.vp, data = petrol.df)
> summary(petrol.reg)

Call:
lm(formula = hc ~ t.temp + p.temp + t.vp + p.vp, data = petrol.df)

Residuals:
   Min   1Q  Median  3Q  Max
   …

Continued on next slide...

Slide 26: Fit the "full" model (2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     …         …        …        …
t.temp          …         …        …        …
p.temp          …         …        …     …e-05 ***
t.vp            …         …        …        …   **
p.vp            …         …        …     …e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: … on 120 degrees of freedom
Multiple R-squared: …, Adjusted R-squared: …
F-statistic: … on 4 and 120 DF, p-value: < 2.2e-16

Slide 27: Conclusions
- Large R².
- Significant coefficients, except for t.temp.
- The model seems satisfactory.
- We can move on to prediction.
Let's predict hydrocarbon emissions when t.temp = 28, p.temp = 30, t.vp = 3 and p.vp = 3.5.

Slide 28: Prediction

> pred.df <- data.frame(t.temp = 28, p.temp = 30, t.vp = 3, p.vp = 3.5)
> predict(petrol.reg, pred.df, interval = "p")
   fit  lwr  upr
[1,] …   …    …

Hydrocarbon emission is predicted to be between 20.2 and 31.9 grams.
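For comparison, the corresponding estimate of the mean emission over all such fillings would use a confidence interval, which (as on slide 18) is shorter than the prediction interval; a sketch:

predict(petrol.reg, pred.df, interval = "confidence")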