MULTIPLE REGRESSION Using more than one variable to predict another

Last Week
- Coefficient of determination (r²)
  - Explained variance between 2 variables
- Simple linear regression: y = mx + b
  - Predicting one variable from another, based on explained variance: if r² is large, it should be a good predictor
  - Predicting one dependent variable from one independent variable
- SEE, residuals

Tonight
- Predicting one DV from one IV is simple linear regression
- Predicting one DV from multiple IVs is called multiple linear regression
- More IVs usually allow for a better prediction of the DV
- If IV A explains 20% of the variance (r² = 0.20) and IV B explains 30% of the variance (r² = 0.30), can I use both to predict the dependent variable?

Example: Activity Dataset
- To demonstrate, we'll use the same data as last week, on the pedometer and armband
- Goal: to predict Armband Calories (actual calories expended) as accurately as possible
- Let's start by trying to predict Armband Calories with body weight
- Complete a simple linear regression with body weight

Simple Regression
- Here is the simple regression output from using Body Weight (kg) to predict Armband Calories: [regression output]

Simple Regression
- Results using Body Weight (kg):
  - r² =
  - SEE = calories
- Can we improve on this equation by adding in new variables?
- First, we have to determine if other variables in the dataset might be related to Armband Calories
- Use a correlation matrix (a sketch of this step follows below)
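A minimal sketch of this step in Python with statsmodels, assuming the activity data live in a CSV with hypothetical column names (weight_kg, armband_cal); the course's actual file and variable names are not shown in the slides:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names; the course dataset is not included here.
df = pd.read_csv("activity.csv")

# Simple linear regression: predict Armband Calories from Body Weight (kg).
X = sm.add_constant(df["weight_kg"])  # adds the intercept term b
model = sm.OLS(df["armband_cal"], X).fit()

print("r-squared:", model.rsquared)
# SEE (standard error of estimate) = square root of the residual mean square
print("SEE:", np.sqrt(model.mse_resid))
```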

Correlations
- Notice several variables have some association with Armband Calories:

  Variable      r    r²
  Height
  Weight
  BMI
  PedSteps
  PedCalories
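Continuing the sketch above, a table like this can be read off a pandas correlation matrix (column names again hypothetical):

```python
# Pearson correlations among all variables; squaring the r column
# with Armband Calories gives the r² column shown in the table.
cols = ["armband_cal", "height", "weight_kg", "bmi", "ped_steps", "ped_calories"]
corr = df[cols].corr()
print(corr["armband_cal"])        # r of each variable with Armband Calories
print(corr["armband_cal"] ** 2)   # corresponding r² values
```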

Create the new regression equation
- The simple regression equation looks like: y = mx + b
- The multiple regression equation looks like: y = m₁x₁ + m₂x₂ + b
- Subscripts are used to help organize the variables
- All we are doing is adding an additional variable into our equation; that new variable will have its own slope, m₂
- For the sake of simplicity, let's add in pedometer steps as x₂ (see the sketch below)
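Extending the earlier sketch to two predictors is just a wider design matrix; statsmodels estimates m₁, m₂, and b together (column names still the hypothetical ones from above):

```python
# Multiple regression: y = m1*x1 + m2*x2 + b
X2 = sm.add_constant(df[["weight_kg", "ped_steps"]])
model2 = sm.OLS(df["armband_cal"], X2).fit()

print(model2.params)              # b (const), m1 (weight_kg), m2 (ped_steps)
print("r-squared:", model2.rsquared)
print("SEE:", np.sqrt(model2.mse_resid))
```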

OUTPUT…

Multiple Regression Output

Simple to Multiple
- Results using Body Weight (kg):
  - r² =
  - SEE = calories
- Results using Body Weight and Pedometer Steps:
  - r² =
  - SEE = calories
  - r² change =
- If 2 variables are good, would 3 be even better?
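The r² change is just the difference between the two fitted models' r² values; a short sketch continuing from the fits above:

```python
# r² change: how much extra variance the second predictor explains
r2_change = model2.rsquared - model.rsquared
print("r² change from adding pedometer steps:", r2_change)
```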

Adding one more in…
- In addition to body weight (x₁) and pedometer steps (x₂), let's add in age (x₃)

Multiple Regression Output 2

Simple to Multiple
- Results using Body Weight (kg):
  - r² =
  - SEE = calories
- Results using Body Weight and Pedometer Steps:
  - r² =
  - SEE = calories
  - r² change =
- Results using Body Weight, PedSteps, and Age:
  - r² =
  - SEE =
  - r² change = 0.017

Multiple Regression Decisions
- Should we recommend that age be used in the model?
- These decisions can be difficult; "model building" or "model reduction" is more of an art than a science
- Consider:
  - p-value of age in the model =
  - r² change from adding age = 0.017, or 1.7% of the variance
  - More coefficients (predictors) make the model more complicated to use and interpret
  - Does it make sense to include age? Should age be related to caloric expenditure?
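To weigh these decisions, the fitted model exposes the coefficient p-values directly; a sketch continuing from above, assuming a hypothetical age column:

```python
# Three-predictor model: weight, steps, and age
X3 = sm.add_constant(df[["weight_kg", "ped_steps", "age"]])
model3 = sm.OLS(df["armband_cal"], X3).fit()

print(model3.pvalues["age"])                      # p-value for the age coefficient
print("r² change:", model3.rsquared - model2.rsquared)
print(model3.summary())                           # full table: coefficients, p-values, r²
```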

Other Regression Issues
- Sample Size
  - With too small a sample, you lack the statistical power to generalize your results to other samples or the whole population
  - You increase your risk of a Type II error (failing to reject the null hypothesis when it is actually false, i.e., missing a real effect)
  - In multiple regression, the more variables you use in your model, the greater your risk of a Type II error
  - This is a complicated issue, but essentially you need large samples to use several predictors
  - Guidelines…

Other Regression Issues
- Sample Size
  - Tabachnick & Fidell (1996): N > 50 + 8m, where N = appropriate sample size and m = number of IVs
  - So, if you use 3 predictors (like we just did in our example): 50 + 8 × 3 = 74 subjects
  - You can find several different 'guess-timates'; I usually just try to have 30 subjects, plus another 30 for each variable in the model (i.e., 30 + 30m)
  - I like to play it safe…
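Both rules of thumb are easy to compute; a small sketch (the 50 + 8m form is the Tabachnick & Fidell guideline, the 30 + 30m form is the instructor's more conservative habit):

```python
def n_tabachnick_fidell(m: int) -> int:
    """Tabachnick & Fidell (1996) guideline: N > 50 + 8m."""
    return 50 + 8 * m

def n_conservative(m: int) -> int:
    """Rule of thumb from the slides: 30 subjects plus 30 per predictor."""
    return 30 + 30 * m

print(n_tabachnick_fidell(3))  # 74 subjects for 3 predictors
print(n_conservative(3))       # 120 subjects
```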

Other Regression Issues
- Multiple regression has the same statistical assumptions as correlation/simple regression
  - Check for normal distribution, outliers, etc.
- One new concern with multiple regression is the idea of collinearity
  - You have to be careful that your IVs (predictor variables) are not highly correlated with each other
  - Collinearity can cause a model to overestimate r²
  - It can also cause one new variable to eliminate another
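A common way to quantify collinearity among the IVs is the variance inflation factor (VIF); a sketch with statsmodels, using the same hypothetical column names as before (a rough rule of thumb flags VIF values above about 5 to 10):

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

ivs = df[["weight_kg", "ped_steps", "age", "ped_calories"]]
X = sm.add_constant(ivs)

# VIF of each IV: how much its variance is inflated by correlation with the other IVs
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```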

Example: Collinearity
- Results of MLR using Body Weight, PedSteps, Age:
  - r² =
  - SEE =
- Imagine we want to add in one other variable, Pedometer Calories
- Look at the correlation matrix first…

- Notice that Armband Calories is highly correlated with both Pedometer Steps and Pedometer Calories
- Initially, this looks great because we might have two very good predictors to use
- But notice that Pedometer Calories is very highly correlated with Pedometer Steps
- These two variables are probably collinear: they are very similar and may not explain 'unique' variance

Here is the MLR result with Weight, Steps, and Age: [regression output]

Here is the MLR result after adding Pedometer Calories to the model: [regression output]

- Pedometer Calories becomes the only significant predictor in the model. In other words, the variance explained by the other 3 variables can also be explained by Pedometer Calories; not all 4 variables add 'unique' variance to the model.

Example: Collinearity
- Results of MLR using Body Weight, PedSteps, Age:
  - r² =
  - SEE =
- Results of MLR using Body Weight, PedSteps, Age, and PedCalories:
  - r² =
  - SEE =
- Results of MLR using just PedCalories (eliminates the collinearity):
  - r² =
  - SEE =
- Which model is the best model? Remember, we'd like to pick the strongest model with the fewest predictor variables (see the comparison sketch below)
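When comparing candidate models like these, adjusted r² (which penalizes extra predictors) and SEE give a quick side-by-side view; a sketch continuing from the fits above, with a hypothetical one-predictor PedCalories model:

```python
model_pc = sm.OLS(df["armband_cal"],
                  sm.add_constant(df["ped_calories"])).fit()

for label, m in [("weight + steps + age", model3),
                 ("pedcalories only", model_pc)]:
    print(label, "adj r²:", round(m.rsquared_adj, 3),
          "SEE:", round(np.sqrt(m.mse_resid), 1))
```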

Model Building
- Collinearity makes model building more difficult:
  1) When you add in new variables you have to look at r², r² change, and SEE, but you also have to notice what's happening to the other IVs in the model
  2) Sometimes you need to remove variables that used to be good predictors
  3) This is why the model with the most variables is not always the best model; sometimes you can do just as well with 1 or 2 variables

What to do about Collinearity?
- Your approach: use a correlation matrix to examine the variables BEFORE you try to build your model
  1) Check the IVs' correlations with the DV (high correlations will probably be the best predictors), but…
  2) Check the IVs' correlations with the other IVs (high correlations probably indicate collinearity)
- If you do find that two IVs are highly correlated, be aware that having them both in the model is probably not the best approach (pick the better one and keep it)

QUESTIONS…?
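Both checks can be read off the same correlation matrix computed earlier; a sketch reusing the cols list from the correlation example above:

```python
corr = df[cols].corr()

# 1) IVs' correlations with the DV: the strongest are candidate predictors
print(corr["armband_cal"].drop("armband_cal").sort_values(ascending=False))

# 2) IVs' correlations with each other: values near 1 warn of collinearity
iv_corr = corr.drop("armband_cal").drop(columns="armband_cal")
print(iv_corr)
```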

Upcoming…
- In-class activity on MLR…
- Homework (not turned in, due to the exam):
  - Cronk Section 5.4
  - OPTIONAL: Holcomb Exercises 31 and 32
    - Multiple correlation, NOT full multiple linear regression
    - Similar to MLR, but looks at the model's r instead of building a prediction equation
- Mid-term exam next week
- Group differences after spring break (t-test, ANOVA, etc.)