STATS 330: Lecture 5

Tutorials
- These will cover computing details
- Held in the basement floor tutorial lab, Building 303 South (303S B75)
- Tutorial 1: Wednesday; Tutorial 2: Friday 2-3
- Start this Thursday

My office hours
- Tuesday 10:30 – 12:00
- Thursday 10:30 – 12:00
- Room 265, Building 303S (come up to the second floor in the Computer Science lifts, go through the doors and turn right)

Tutor office hours
- Tuesday 1-4 pm
- Thursday 12-2, 3-4 pm
- Friday 1-3 pm
- All in Room (first floor, Building 303)

Next week
- Monday and Thursday lectures by Arden Miller
- No lecture Tuesday
- Don't forget the assignment due Aug 4

Today's lecture: The multiple regression model
Aim of the lecture:
- To describe the type of data suitable for multiple regression
- To review the multiple regression model
- To describe some plots that check the suitability of the model

Suitable data
Suppose our data frame contains
- a numeric "response" variable Y
- one or more numeric "explanatory" variables X1, ..., Xk (also called covariates or independent variables)
and we want to "explain" (model, predict) Y in terms of the explanatory variables, e.g.
- explain the volume of cherry trees in terms of height and diameter
- explain NOx in terms of C and E

The regression model
The model assumes:
- The responses are normally distributed with means μ (each response has a different mean) and constant variance σ²
- The mean response μ of a typical observation depends on the covariates through a linear relationship μ = β0 + β1x1 + … + βk xk
- The responses are independent

The regression plane
The relationship μ = β0 + β1x1 + … + βk xk can be visualized as a plane. The plane is determined by the coefficients β0, β1, …, βk, e.g. for k = 2 (k = number of covariates).

[Figure: the regression plane for k = 2, with slope β1 in the x1 direction and slope β2 in the x2 direction.]

The regression model (cont.)
- The data are scattered above and below the plane
- The size of the "sticks" is random, controlled by σ², and doesn't depend on x1, x2

Alternative form of the model
Y = β0 + β1x1 + … + βk xk + ε, where ε is normally distributed with mean zero and variance σ².

Checking the data
- How can we tell when this model is suitable?
- Suitable = data randomly scattered about the plane, "planar" for short
- k = 1: draw a scatterplot of y vs x
- k = 2: use a spinner, look for an "edge"
- k > 2: use coplots; if the data are planar the panels should be parallel: coplot(y ~ x1 | x2 * x3) (see the sketch below)
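A minimal R sketch of these checks, assuming a data frame df with a response y and numeric covariates x1, x2, x3 (illustrative names, not the lecture's data):

# k = 1: simple scatterplot of the response against the single covariate
plot(y ~ x1, data = df)
# k = 2: a static 3-D view as a stand-in for a rotating "spinner" plot
library(lattice)
cloud(y ~ x1 * x2, data = df)
# k > 2: conditioning plots; roughly parallel panels suggest a planar relationship
coplot(y ~ x1 | x2 * x3, data = df)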

Interpretation of coefficients
- The regression coefficient β0 gives the average response when all covariates are zero
- The regression coefficient β1 gives the slope of the plane in the x1 direction. It
  - measures the average increase in the response for a unit increase in x1 when all other covariates are held constant
  - is the slope of the plots of y versus x1 in a coplot
  - is not necessarily the slope of y versus x1 in a scatterplot

Example
Consider data from the model y = 5 - 1*x1 + 4*x2 + N(0,1), where x1 and x2 are highly correlated, r = 0.95. The regression of y on x1 alone has slope 2.767 (figures below; a simulation sketch follows them):

[Figure: scatterplot of y vs x1 with fitted line; slope 2.767.]

[Figure: coplot of y vs x1 given x2; the within-panel slopes are all about -1.]
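A minimal R sketch that reproduces the flavour of this example; the sample size and random seed are assumptions, so the fitted slope will only be close to the lecture's 2.767:

# Simulate from y = 5 - 1*x1 + 4*x2 + N(0,1) with cor(x1, x2) close to 0.95
set.seed(330)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)   # gives cor(x1, x2) of about 0.95
y  <- 5 - 1 * x1 + 4 * x2 + rnorm(n)
coef(lm(y ~ x1))        # slope of y on x1 alone: about 2.8, not -1
coef(lm(y ~ x1 + x2))   # both covariates: coefficients close to 5, -1 and 4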

Points to note
- The regression coefficients β1 and β2 refer to the conditional mean of the response, given the covariates (in this case conditional on both x1 and x2)
- β1 is the slope in the coplot, conditioning on x2
- The coefficient in the plot of y versus x1 alone refers to a different conditional distribution (conditional on x1 only)
- The two slopes are the same if and only if x1 and x2 are uncorrelated

Estimation of the coefficients
- We estimate the (unknown) regression plane by the "least squares plane" (best fitting plane)
- Best fitting plane = the plane that minimizes the sum of squared vertical deviations of the data from the plane
- That is, minimize the least squares criterion Σi (yi − β0 − β1xi1 − … − βk xik)²

Estimation of the coefficients (2)
- The R function lm calculates the coefficients of the best fitting plane
- This function solves the normal equations, a set of linear equations derived by differentiating the least squares criterion with respect to the coefficients

Math stuff (only if you like it)
Arrange the data on the response and covariates into a vector y of length n and an n × (k + 1) matrix X whose first column is all 1's (for the intercept) and whose remaining columns hold the covariates.

Math stuff: Normal equations
The estimates b are the solutions of the normal equations X'X b = X'y.
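A small R sketch of this, reusing the simulated x1, x2 and y from the example above; solving the normal equations directly gives the same coefficients as lm:

X <- cbind(1, x1, x2)                # design matrix with a leading column of 1's
b <- solve(t(X) %*% X, t(X) %*% y)   # solves (X'X) b = X'y
drop(b)                              # estimates from the normal equations
coef(lm(y ~ x1 + x2))                # same values from lm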

Calculation of the best fitting plane
> lm(y ~ x1 + x2)

Call:
lm(formula = y ~ x1 + x2)

Coefficients:
(Intercept)           x1           x2

How well does the plane fit?
- Judge this by examining the residuals and fitted values. Each observation has a fitted value:
  - Yi is the response for observation i; xi1, xi2 are the values of the explanatory variables for observation i
  - The fitted plane is ŷ = b0 + b1x1 + b2x2, where b0, b1, b2 are the estimated coefficients
  - The fitted value for observation i is ŷi = b0 + b1xi1 + b2xi2 (the height of the fitted plane at (xi1, xi2))

How well does the plane fit? (cont.)
- The residual for observation i is the difference between the response and the fitted value: ei = yi − ŷi
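In R, the fitted values and residuals of an lm fit can be extracted directly; a short sketch using the simulated example from earlier:

fit <- lm(y ~ x1 + x2)
head(fitted(fit))                   # fitted values
head(residuals(fit))                # residuals = response minus fitted value
plot(fitted(fit), residuals(fit))   # a good fit shows no pattern here
abline(h = 0, lty = 2)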

How well does the plane fit? (cont.)
- For a good fit, the residuals must
  - be small relative to the y's
  - show no pattern (not depend on the fitted values, the x's, etc.)
- For the model to be useful, we should have a strong relationship between the response and the explanatory variables

Measuring goodness of fit
- How can we measure the relative size of the residuals and the strength of the relationship between y and the x's?
- The "ANOVA identity" is a way of explaining this

Measuring goodness of fit (cont.)
- Consider the "residual sum of squares" RSS = Σi (yi − ŷi)²
- It provides an overall measure of the size of the residuals

Measuring goodness of fit (cont.)
- Consider the "total sum of squares" TSS = Σi (yi − ȳ)²
- It provides an overall measure of the variability of the data

Math stuff: the ANOVA identity
- Math result (the ANOVA identity): TSS = RegSS + RSS, where RegSS is the regression sum of squares
- Since RegSS is non-negative, RSS can be no larger than TSS, so RSS/TSS is between 0 and 1

Interpretation
- The smaller RSS is compared to TSS, the better the fit
- If RSS is 0 (and TSS is positive), we have a perfect fit
- If RegSS = 0, then RSS = TSS and the fitted plane is flat (all coefficients except the constant are zero), so the x's don't help predict y

R²
- Another way to express this is the coefficient of determination R²
- This is defined as R² = 1 − RSS/TSS = RegSS/TSS
- R² is always between 0 and 1: 1 means a perfect fit, 0 means a flat plane
- R² is the square of the correlation between the observations and the fitted values

Estimate of σ²
- Recall that σ² controls the scatter of the observations about the regression plane:
  - the bigger σ², the more scatter
  - the bigger σ², the smaller R²
- σ² is estimated by s² = RSS/(n − k − 1)
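These quantities are easy to compute by hand from an lm fit and to check against summary(); a sketch, again using the simulated example above (fit, y and n as defined there):

RSS   <- sum(residuals(fit)^2)    # residual sum of squares
TSS   <- sum((y - mean(y))^2)     # total sum of squares
RegSS <- TSS - RSS                # ANOVA identity: TSS = RegSS + RSS
R2    <- 1 - RSS / TSS            # coefficient of determination
k  <- 2                           # two covariates in this fit
s2 <- RSS / (n - k - 1)           # estimate of sigma^2
c(R2 = R2, s = sqrt(s2))
summary(fit)$r.squared            # should agree with R2
summary(fit)$sigma                # should agree with sqrt(s2)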

Calculations
cherry.lm <- lm(volume ~ diameter + height, data = cherry.df)
summary(cherry.lm)
- The lm function produces an "lm object" that contains all the information from fitting the regression
- lm = "linear model" (spelt with a lower-case "ell", not an upper-case "i"!)

Cherries

Call:
lm(formula = volume ~ diameter + height, data = cherry.df)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-07 ***
diameter                                 < 2e-16 ***
height                                            *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:       on 28 degrees of freedom
Multiple R-squared: 0.948,  Adjusted R-squared:
F-statistic: 255 on 2 and 28 DF,  p-value: < 2.2e-16

(The slide annotates this output: the residual summary, the coefficient table, R², and the residual standard error, which is the estimate of σ.)