Multiple Regression: Predicting a response with multiple explanatory variables



Assumptions
– The sample is representative of the population
– Errors are random with a mean of zero
– Independent (explanatory) variables are measured without error
– Independent variables are linearly independent of one another (no severe multicollinearity)
– Errors are uncorrelated
– Error variance is constant (homoscedasticity)
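
Most of these assumptions can be screened once a model has been fit. Below is a minimal sketch, assuming the Kalahari data frame (columns LMS, People, Days) used later in these slides; vif() comes from the car package, which is not part of the slides and is only one convenient way to check for multicollinearity.

# Fit a candidate model (assumes Kalahari is already loaded)
fit <- lm(LMS ~ People + Days, data = Kalahari)

# Residuals vs fitted, normal Q-Q, scale-location and leverage plots:
# systematic patterns suggest non-constant variance or correlated errors
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# Variance inflation factors as a rough multicollinearity screen
library(car)   # assumption: the car package is installed
vif(fit)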

Data/Distribution Issues
– Consider outlier values: accurate estimates may require eliminating them or using robust approaches
– Non-normal distributions may require transformation
– Plot the response against each explanatory variable
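
The plotting step can be done quickly in R; a sketch, assuming the Kalahari data frame used later in the slides, with LMS as the response.

# Scatterplot matrix of the response and the explanatory variables
# (column names assumed from the models fit later in the slides)
pairs(Kalahari[, c("LMS", "People", "Days")])

# Or one scatterplot per explanatory variable
plot(LMS ~ People, data = Kalahari)
plot(LMS ~ Days, data = Kalahari)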

Modeling
– We want a model that predicts the response with as few explanatory variables as possible
– R² measures the proportion of variability in the response accounted for by the explanatory variables
– Adjusted R² also takes the number of explanatory variables into account
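
Both quantities are returned by summary() on a fitted model; a minimal sketch, again assuming the Kalahari data:

fit <- lm(LMS ~ People + Days, data = Kalahari)
s <- summary(fit)
s$r.squared       # proportion of variability in LMS accounted for
s$adj.r.squared   # the same, penalized for the number of explanatory variables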

Modeling Methods
– The general approach is to include the variables that are theoretically relevant to predicting the response
  – Gradually remove variables that are not significant, comparing each reduced model against the fuller one for a significant difference
– Automatic stepwise methods
  – Forward and backward selection
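
R's built-in step() function automates this kind of selection, although it compares models by AIC rather than by individual significance tests, so it is not identical to the manual procedure described above. A sketch, assuming the Kalahari data:

# Backward selection: start from the full model and drop terms
full <- lm(LMS ~ People + Days, data = Kalahari)
step(full, direction = "backward")

# Forward selection: start from an intercept-only model and add terms
null <- lm(LMS ~ 1, data = Kalahari)
step(null, scope = ~ People + Days, direction = "forward")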

A Simple Example
– The Kalahari data include the site area (LMS), the number of days each site was occupied, and the number of people who occupied it
– In Rcmdr: Statistics | Fit models | Linear model
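
Outside of Rcmdr, the data can be inspected directly before modeling; a sketch, assuming the Kalahari data frame is already loaded in the session (how it is loaded depends on the course materials):

str(Kalahari)      # variable names and types
head(Kalahari)     # first few rows
summary(Kalahari)  # basic summaries of each column, including LMS, People and Days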

Two Models
– Model 1: LMS ~ People + Days
– Model 2: LMS ~ People * Days
  – which expands to LMS ~ People + Days + People:Days
– Check the significance of the individual slopes
– Compare the two models for a significant difference

> LinearModel.1 <- lm(LMS ~ People + Days, data=Kalahari)
> summary(LinearModel.1)

Call:
lm(formula = LMS ~ People + Days, data = Kalahari)

[The numeric summary() output was not preserved in the transcript. What survives: People is highly significant (Pr(>|t|) on the order of 1e-05, '***'), while the intercept and Days are significant at the 0.05 level ('*'); residual degrees of freedom = 12; F-statistic on 2 and 12 DF, p-value = 6.377e-05.]

> LinearModel.2 <- lm(LMS ~ People*Days, data=Kalahari)
> summary(LinearModel.2)

Call:
lm(formula = LMS ~ People * Days, data = Kalahari)

[The numeric summary() output was not preserved in the transcript. What survives: none of the coefficients (Intercept, People, Days, People:Days) carries a significance star; residual degrees of freedom = 11; F-statistic on 3 and 11 DF.]

> anova(LinearModel.1, LinearModel.2)
Analysis of Variance Table

Model 1: LMS ~ People + Days
Model 2: LMS ~ People * Days
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)
[numeric rows not preserved in the transcript]
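
The same two fits and the nested-model F test can be reproduced in a plain R script; a sketch (the numeric results are whatever the Kalahari data give):

m1 <- lm(LMS ~ People + Days, data = Kalahari)   # additive model
m2 <- lm(LMS ~ People * Days, data = Kalahari)   # adds the People:Days interaction
summary(m1)
summary(m2)
anova(m1, m2)   # F test of whether the interaction term improves the fit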

Darl Points
– Create a subset of DartPoints containing only the Darl points
– Model 1: Length ~ Width + Thick
– Model 2: Length ~ Width * Thick
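
One way to build the subset from the command line is sketched below; the column holding the point type is assumed here to be called Name, which may need to be adjusted to the actual column name in DartPoints.

# Keep only the Darl points (Name is an assumed column name)
Darl <- subset(DartPoints, Name == "Darl")
nrow(Darl)   # how many Darl points ended up in the subset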

> LinearModel.4 <- lm(Length ~ Width + Thick, data=Darl)
> summary(LinearModel.4)

Call:
lm(formula = Length ~ Width + Thick, data = Darl)

[The numeric summary() output was not preserved in the transcript. What survives: Width and Thick are each significant at the 0.05 level ('*'), while the intercept is not flagged; residual degrees of freedom = 24; F-statistic on 2 and 24 DF, p-value = 8.554e-05.]

> LinearModel.5 <- lm(Length ~ Width * Thick, data=Darl)
> summary(LinearModel.5)

Call:
lm(formula = Length ~ Width * Thick, data = Darl)

[The numeric summary() output was not preserved in the transcript. What survives: none of the coefficients (Intercept, Width, Thick, Width:Thick) carries a significance star; residual degrees of freedom = 23; F-statistic on 3 and 23 DF.]

> anova(LinearModel.4, LinearModel.5)
Analysis of Variance Table

Model 1: Length ~ Width + Thick
Model 2: Length ~ Width * Thick
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)
[numeric rows not preserved in the transcript]