Regression: Several Explanatory Variables

Example: Scottish hill races data. These data are made available in R via data(hills, package = "MASS"). They give record times (minutes) in 1984 of 35 Scottish hill races, together with distance (miles) and total height climbed (feet). We regard time as the response variable, and seek to model how its conditional distribution depends on the explanatory variables distance and climb.

The R command pairs(hills) produces a scatterplot matrix of the three variables.
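A minimal sketch of loading the data and producing the scatterplot matrix (assuming the MASS package is installed):

```r
# Load the Scottish hill races data from the MASS package
data(hills, package = "MASS")

# 35 races: dist (miles), climb (feet), time (minutes)
str(hills)

# Scatterplot matrix of all pairwise relationships
pairs(hills)
```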

These show that the response variable time has a strong positive association with each of the explanatory variables distance and climb, the dependence on distance being the stronger. However, the two explanatory variables distance and climb also have a strong positive association with each other, and this complicates the modelling.

Preliminary analysis of the data suggests that the observation (number 18) corresponding to Knock Hill is almost certainly in error: the time is much too great for the given distance and climb, and it may have been misrecorded by 1 hour. We therefore omit Knock Hill from the analysis. (The plot and identify commands can be used to locate such points interactively.)
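A sketch of dropping the suspect row by name; identify() needs an interactive graphics device, so it is shown commented out, and the name hills2 for the cleaned data frame is purely illustrative:

```r
data(hills, package = "MASS")

# Interactively: click suspect points on the plot to label them
# plot(hills$dist, hills$time)
# identify(hills$dist, hills$time, labels = rownames(hills))

# Knock Hill's time looks misrecorded by about an hour; omit it
hills2 <- hills[rownames(hills) != "Knock Hill", ]
nrow(hills2)  # 34 races remain
</imports>
```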

On physical grounds we attempt to find a model with zero intercept. We consider first a linear model (Model 1) involving both the explanatory variables distance and climb: time = a × distance + b × climb + ε.

The fitted model is time = 5.47 × dist + … × climb + ε.

The analysis, in particular the t- and p-values associated with the estimates of the coefficients, shows that distance and climb are both important explanatory variables. (This can be confirmed by noting the much poorer fits obtained if either variable is omitted.)
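A sketch of fitting Model 1, using the source's name hills.model.1; the name hills2 for the data with Knock Hill removed is illustrative, and the - 1 in the formula suppresses the intercept:

```r
data(hills, package = "MASS")
hills2 <- hills[rownames(hills) != "Knock Hill", ]  # drop the suspect record

# Model 1: linear in dist and climb, zero intercept on physical grounds
hills.model.1 <- lm(time ~ dist + climb - 1, data = hills2)
summary(hills.model.1)   # t- and p-values for both coefficients
```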

The command plot(hills.model.1) produces the standard regression diagnostic plots.

The Cook's distance plot shows that the observations for Bens of Jura and Lairig Ghru both have a very large influence on the fit. These observations have the largest values of climb and distance respectively. This leads us to suspect that there may be some nonlinear dependence on climb and/or distance, which would be physically quite natural. It seems reasonable here to introduce quadratic terms as a first attempt to model any nonlinearity.
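The Cook's distances behind that diagnostic plot can also be inspected directly; a sketch, again assuming hills2 is the data with Knock Hill removed:

```r
data(hills, package = "MASS")
hills2 <- hills[rownames(hills) != "Knock Hill", ]
hills.model.1 <- lm(time ~ dist + climb - 1, data = hills2)

# Cook's distances: which races most influence the fitted coefficients?
cd <- cooks.distance(hills.model.1)
head(sort(cd, decreasing = TRUE), 3)
```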

We consider now the (quite elaborate) model (Model 2): time = a₀ × distance + b₀ × distance² + c₀ × climb + d₀ × climb² + ε.

The fitted model is now: time = 5.62 × distance + … × distance² + … × climb + … × climb² + ε. The analysis, most notably the t- and p-values associated with the estimate of the coefficient of climb², shows that there is indeed evidence of nonlinearity in the dependence on climb, and (given also physical considerations) quite possibly in the dependence on distance.
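A sketch of fitting Model 2; the quadratic terms are wrapped in I() so that ^ is arithmetic rather than formula notation, and the names hills2 and hills.model.2 are illustrative:

```r
data(hills, package = "MASS")
hills2 <- hills[rownames(hills) != "Knock Hill", ]

# Model 2: add quadratic terms in dist and climb, still with zero intercept
hills.model.2 <- lm(time ~ dist + I(dist^2) + climb + I(climb^2) - 1,
                    data = hills2)
summary(hills.model.2)
```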

Finally, the residuals of Model 1 can be plotted against those of Model 2.
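A sketch of that comparison; the short names m1 and m2 are illustrative, and since Model 1's terms are a subset of Model 2's, the residual sum of squares cannot increase:

```r
data(hills, package = "MASS")
hills2 <- hills[rownames(hills) != "Knock Hill", ]
m1 <- lm(time ~ dist + climb - 1, data = hills2)
m2 <- lm(time ~ dist + I(dist^2) + climb + I(climb^2) - 1, data = hills2)

# Points pulled toward zero on the vertical axis show Model 2's improvement
plot(resid(m1), resid(m2))
abline(h = 0, lty = 2)
abline(0, 1, lty = 3)   # reference line: equal residuals under both models
```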

This suggests that Model 2 is a considerable improvement, at least insofar as it reduces the large residuals associated with the three labelled observations. The observations corresponding to Bens of Jura and Lairig Ghru remain moderately influential.