January 6, 2009 - afternoon session 1 Statistics Micro Mini Multiple Regression January 5-9, 2008 Beth Ayers.

Presentation transcript:

January 6, afternoon session 1 Statistics Micro Mini Multiple Regression January 5-9, 2008 Beth Ayers

2 Tuesday 1pm-4pm Session
Dummy variables
Multiple regression
‒ Using quantitative and categorical explanatory variables
‒ Interactions among explanatory variables
‒ Linear regression vs. ANCOVA
Two article critiques

3 Dummy Variables
Categorical explanatory variables can be used in a linear regression if they are coded as dummy variables.
For binary variables, the most frequently used codes are 0/1 and -1/+1.
For a nominal variable with k levels, create k-1 explanatory variables that are each 0/1.
‒ Each subject can have a value of one for at most one of these explanatory variables; a subject in the baseline level has zero for all of them
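The k-1 coding rule can be sketched in a few lines of Python. This is only an illustration: the function name `dummy_code`, the sample data, and the choice to default the baseline to the first level in sorted order are assumptions, not part of the slides.

```python
def dummy_code(values, baseline=None):
    """Return (column_levels, rows): k-1 columns of 0/1 codes for a k-level variable."""
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # assumption: default to the first level in sorted order
    if baseline not in levels:
        raise ValueError(f"baseline {baseline!r} is not one of the observed levels")
    kept = [level for level in levels if level != baseline]
    # One 0/1 column per non-baseline level; baseline subjects get all zeros.
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


tutors = ["A", "B", "C", "B", "A"]      # hypothetical data
columns, rows = dummy_code(tutors)      # Tutor A becomes the baseline
for tutor, row in zip(tutors, rows):
    print(tutor, dict(zip(columns, row)))
```

Passing `baseline="B"` instead would produce columns for A and C, which is the recoding needed when a different tutor should serve as the reference group.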

4 Dummy Variables
Suppose we’d like to use a categorical variable that indicates which tutor (A or B) the student used. Define:

5 Significance testing
To test whether $X_2$ has an effect:
‒ $H_0: \beta_2 = 0$
‒ $H_1: \beta_2 \neq 0$
‒ This is the usual t-test for a regression coefficient; we don’t need to do anything different for dummy variables
If $\beta_2 = 0$, then there is no difference between the mean responses for Tutor A and Tutor B.
If $\beta_2 \neq 0$, then $\beta_2$ is the difference between the mean responses for Tutor A and Tutor B.
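The claim that a dummy coefficient is a difference in mean responses can be checked numerically in the simplest case. The sketch below assumes a regression on the 0/1 dummy alone (no $X_1$), and the scores are made up for illustration; with a quantitative covariate in the model, the coefficient is instead the difference adjusted for that covariate.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical exam scores; dummy = 0 for one tutor and 1 for the other.
dummy = [0, 0, 0, 1, 1, 1]
scores = [70.0, 74.0, 72.0, 80.0, 83.0, 77.0]

intercept, slope = simple_ols(dummy, scores)
mean_0 = mean([s for d, s in zip(dummy, scores) if d == 0])
mean_1 = mean([s for d, s in zip(dummy, scores) if d == 1])

print(intercept, mean_0)        # intercept equals the mean of the group coded 0
print(slope, mean_1 - mean_0)   # slope equals the difference in group means
```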

6 Interpretation
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Can think of this as two equations
‒ When $X_2 = 0$: $Y = \beta_0 + \beta_1 X_1$
‒ When $X_2 = 1$: $Y = \beta_0 + \beta_1 X_1 + \beta_2 \cdot 1 = (\beta_0 + \beta_2) + \beta_1 X_1$
Then $\beta_0 + \beta_2$ is the new intercept for the case where $X_2 = 1$

7 Dummy Variables
Suppose we have three tutors (A, B, C). Define:
‒ $X_2 = 1$ if the student used Tutor B, and 0 otherwise
‒ $X_3 = 1$ if the student used Tutor C, and 0 otherwise
Tutor A is considered the baseline

8 Interpretation
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$
Can think of this as three equations
‒ When $X_2 = 0$ and $X_3 = 0$: $Y = \beta_0 + \beta_1 X_1$
‒ When $X_2 = 1$ and $X_3 = 0$: $Y = \beta_0 + \beta_1 X_1 + \beta_2 \cdot 1 = (\beta_0 + \beta_2) + \beta_1 X_1$
‒ When $X_2 = 0$ and $X_3 = 1$: $Y = \beta_0 + \beta_1 X_1 + \beta_3 \cdot 1 = (\beta_0 + \beta_3) + \beta_1 X_1$
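The three equations amount to three parallel lines that share a slope and differ only in intercept. A minimal Python sketch, with made-up coefficient values and a hypothetical helper name `tutor_lines`:

```python
def tutor_lines(b0, b1, b2, b3):
    """Map each tutor to (intercept, slope) under Y = b0 + b1*X1 + b2*X2 + b3*X3."""
    # (X2, X3) codes: both dummies are zero for the baseline tutor.
    dummies = {"A": (0, 0), "B": (1, 0), "C": (0, 1)}
    return {tutor: (b0 + b2 * x2 + b3 * x3, b1)
            for tutor, (x2, x3) in dummies.items()}


# Hypothetical coefficients, chosen only for illustration.
lines = tutor_lines(b0=20.0, b1=0.5, b2=-4.0, b3=1.0)
for tutor, (intercept, slope) in lines.items():
    print(f"Tutor {tutor}: Y = {intercept} + {slope} * X1")
```

Every tutor gets the same slope `b1`; only an interaction term would let the slopes differ.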

9 Interpretation
$\beta_2$ is then the difference between the mean responses for Tutor A and Tutor B.
$\beta_3$ is then the difference between the mean responses for Tutor A and Tutor C.
To formally compare Tutor B to Tutor C, one must rerun the regression using either Tutor B or Tutor C as the baseline.
‒ To informally compare them, one can look at the difference between $\beta_2$ and $\beta_3$

10 Significance testing
Again, we can use the usual t-test for a regression coefficient; we don’t need to do anything different.

11 Example
We want to see if there is a gender effect in predicting efficiency.
$\text{Efficiency} = \beta_0 + \beta_1 \cdot \text{WPM} + \beta_2 \cdot \text{Gender}$
‒ where Gender = 1 for females and 0 for males

12 Example

13 Example

14 Example
Step 1
‒ F-statistic: 791
‒ P-value =
So at least one of the two variables is important in predicting Efficiency

15 Example
Step 2
‒ Test words per minute
‒ T-statistic:
‒ P-value =
‒ Test Gender
‒ T-statistic:
‒ P-value =
Both words per minute and gender are important in predicting efficiency

16 Example
Regression Equations
‒ Males
‒ Efficiency = – 0.49 · WPM
‒ Females
‒ Efficiency = – 0.49 · WPM – 3.14 · 1
‒ Efficiency = – 0.49 · WPM

17 Interpretation of the Parameters
For words per minute: holding gender constant, for each additional word per minute that a student can type, Efficiency changes by -0.49 minutes on average (the student is about half a minute more efficient).
For Gender: holding words per minute constant, females are, on average, more efficient by 3.14 minutes (the Gender coefficient is -3.14).

18 Interaction
An interaction occurs between two or more explanatory variables (not between an explanatory variable and the response variable).
An interaction occurs when the effect of a change in the level or value of one explanatory variable depends on the level or value of another explanatory variable.
In regression we account for an interaction by adding a variable that is the product of two existing explanatory variables.

19 Interpretation
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$
Assume that $X_2$ is a dummy variable and $X_1 X_2$ is the interaction.
Again, can think of this as two equations
‒ When $X_2 = 0$: $Y = \beta_0 + \beta_1 X_1$
‒ When $X_2 = 1$: $Y = \beta_0 + \beta_1 X_1 + \beta_2 \cdot 1 + \beta_3 X_1 \cdot 1 = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_1$
We can think of this as a new intercept and a new slope for the case where $X_2 = 1$
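To see how the product term gives each group its own intercept and slope, here is a small Python sketch. The coefficient values and the helper names `group_line` and `predict` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def group_line(b0, b1, b2, b3, x2):
    """(intercept, slope) of Y on X1 for dummy value x2, given
    Y = b0 + b1*X1 + b2*X2 + b3*X1*X2."""
    return (b0 + b2 * x2, b1 + b3 * x2)


def predict(b0, b1, b2, b3, x1, x2):
    """Evaluate the full model, including the product (interaction) term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * (x1 * x2)


coefs = dict(b0=10.0, b1=2.0, b2=3.0, b3=0.5)  # hypothetical values

line_0 = group_line(x2=0, **coefs)   # baseline group
line_1 = group_line(x2=1, **coefs)   # group with the dummy switched on
print(line_0, line_1)

# The gap between the groups is b2 + b3 * X1, so it depends on X1.
for x1 in (0.0, 4.0, 8.0):
    gap = predict(x1=x1, x2=1, **coefs) - predict(x1=x1, x2=0, **coefs)
    print(x1, gap)
```

Because the gap between the two lines changes with `x1`, no single number describes "the" group difference once the interaction is in the model.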

20 Interpretation
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$
$\beta_3$ is called the interaction effect.
$\beta_1$ and $\beta_2$ are called main effects.

21 Interpretation
If $\beta_3$ is not significant, drop the interaction and rerun the regression.
‒ Including the interaction when it is not significant can alter the interpretations of the other variables
If $\beta_3$ is significant, we do not need to check whether $\beta_1$ and $\beta_2$ are significant. We will always keep $X_1$ and $X_2$ in the regression.
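The decision rule can be written out as a small helper. This is a sketch under stated assumptions: the function name `terms_to_drop`, the `"X1*X2"` naming scheme for interactions, the 0.05 cutoff, and the p-values are all made up for illustration and do not come from the slides.

```python
def terms_to_drop(p_values, interactions, alpha=0.05):
    """Return the terms to drop before the next rerun.

    Non-significant interactions go first. Main effects that are part of a
    retained interaction are never dropped, whatever their p-values.
    """
    weak = [term for term in interactions if p_values[term] >= alpha]
    if weak:
        return weak  # drop these and rerun before judging main effects
    protected = {part for term in interactions for part in term.split("*")}
    return [term for term in p_values
            if term not in interactions
            and term not in protected
            and p_values[term] >= alpha]


# Hypothetical p-values for three different fitted models.
case_weak = terms_to_drop({"X1": 0.30, "X2": 0.01, "X1*X2": 0.60}, ["X1*X2"])
case_strong = terms_to_drop({"X1": 0.30, "X2": 0.20, "X1*X2": 0.001}, ["X1*X2"])
case_main = terms_to_drop({"X1": 0.30, "X2": 0.01}, [])
print(case_weak, case_strong, case_main)
```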

22 Interaction Example
Suppose we have two versions of a tutor and we want to know which helps students study for a math test.
In addition, we want to know if a student’s SAT math score affects their exam score.
We know which tutor each student used, and we also have their SAT score and exam score.

23 Interaction Example - EDA

24 Interaction Example
Sample output

25 Interaction Example
Step 1: are any of the variables significant in predicting exam score?
‒ F-statistic: 6025
‒ P-value =
Step 2: check the interaction first
‒ T-statistic:
‒ P-value =
We do not need to check the main effects since the interaction is significant

26 Interaction Example
Regression equation
Tutor A (tutor = 1)
‒ Exam score = ( ) + ( ) MathSAT
‒ Exam score = MathSAT
Tutor B (tutor = 0)
‒ Exam score = MathSAT

27 Interpretation of Coefficients
The intercept for students using Tutor A is 6.39 points higher than for students using Tutor B; because the slopes differ, the gap between the two groups changes with Math SAT score.
For students using Tutor A, for each point that their Math SAT score increases, their exam score increases by 0.11 on average.
For students using Tutor B, for each point that their Math SAT score increases, their exam score increases by 0.06 on average.

28 Interaction Example

29 Example
Explanatory variables
‒ GPA (0-5 scale)
‒ Math SAT score
‒ Time on tutor (in hours)
‒ Tutor used (A, B, C)
Response variable
‒ Exam score

30 Exploratory Analysis

               Exam score   GPA   Math SAT   Time on Tutor
Exam score
GPA
Math SAT
Time on Tutor                                1.00

Tutor        A    B    C
# students   15   18   17

31 Plots

32 The Regression
We think that time on tutor and the type of tutor may have an interaction

33 Analysis
Step 1
‒ F-stat = 769.5, p-value =
Step 2
‒ Test the interactions first
‒ Test Time * Tutor B: T-statistic = , P-value =
‒ Test Time * Tutor C: T-statistic = , P-value = 0.847

34 Next Steps
Since neither interaction is significant, I would drop those two variables and rerun the regression.
Including the interaction when it is not significant can alter the interpretations of the other variables.

35 Updated regression

36 Analysis
Step 1
‒ F-stat = 1111, p-value =
Step 2
‒ Test gpa: T-statistic = 10.28, P-value =
‒ Test Math SAT score: T-statistic = 70.03, P-value =
‒ Test time on tutor: T-statistic = -0.43, P-value =
‒ Test Tutor B: T-statistic = , P-value =
‒ Test Tutor C: T-statistic = 2.60, P-value =

37 Next step
Time on tutor is not significant
‒ Drop time and rerun

38 Analysis
Step 1
‒ F-stat = 1414, p-value =
Step 2
‒ Test gpa: T-statistic = 10.51, P-value =
‒ Test Math SAT score: T-statistic = 70.69, P-value =
‒ Test Tutor B: T-statistic = , P-value =
‒ Test Tutor C: T-statistic = 2.67, P-value = 0.011

39 Interpretation
For each additional GPA point, a student scores on average 2.1 points higher on the final exam.
For each additional Math SAT point, a student scores on average 0.11 points higher on the final exam.

40 Interpretation of Dummy Variables
Students who used Tutor B scored on average 4.6 points lower on the final exam, compared to students using Tutor A.
Students who used Tutor C scored on average 1.1 points higher on the final exam, compared to students using Tutor A.

41 Interpretation of Dummy Variables
We can say that students who used Tutor C scored on average 1.10 - (-4.63) = 5.73 points higher than students who used Tutor B.
However, to say whether the difference is significant, one would need to rerun the regression with either Tutor B or Tutor C as the baseline.
Although 5.73 is large, since we do NOT have a test statistic and p-value, we cannot make any claims about significance.
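The informal comparison is just a shift of the dummy coefficients to a new reference group. The sketch below uses the two coefficients quoted on the slides (-4.63 and 1.10, with Tutor A fixed at 0 as the baseline); the helper name `rebaseline` is hypothetical. It reproduces only the point estimates, not the standard errors or p-values that a rerun of the regression would supply.

```python
def rebaseline(coefs, new_baseline):
    """Shift dummy coefficients so that new_baseline becomes the reference (0)."""
    shift = coefs[new_baseline]
    return {level: round(value - shift, 2) for level, value in coefs.items()}


# Differences relative to Tutor A, taken from the slides.
relative_to_a = {"A": 0.0, "B": -4.63, "C": 1.10}

relative_to_b = rebaseline(relative_to_a, "B")
print(relative_to_b)  # Tutor C vs Tutor B is the 5.73 quoted on the slide
```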

42 Check Assumptions

43 Example
Suppose we have the following regression
Exam Score = *gpa *MathSAT + 1.3*time *TutorB *TutorC + 1.8*time*TutorB - 1.7*time*TutorC
Assume that in Step 1 we reject the null and that in Step 2 gpa, Math SAT, and the interaction are significant.
Remember, since the interaction is significant, we are not concerned with the significance of time or tutor alone.

44 Interpretation
Tutor A
‒ Exam Score = *gpa *MathSAT + 1.3*time
Tutor B
‒ Exam Score = *gpa *MathSAT + 1.3*time *TutorB + 1.8*time*TutorB
‒ Exam Score = ( ) *gpa *MathSAT + ( )*time
‒ Exam Score = *gpa *MathSAT + 3.1*time
Tutor C
‒ Exam Score = *gpa *MathSAT + 1.3*time *TutorC - 1.7*time*TutorC
‒ Exam Score = ( ) *gpa *MathSAT + ( )*time
‒ Exam Score = *gpa *MathSAT *time

45 Interpretation
For each additional point in GPA, a student’s exam score increases by 3.21.
For each additional point in Math SAT, a student’s exam score increases by 0.18.
Students who use Tutor B score on average 1.01 points higher on the final exam than students using Tutor A.
Students who use Tutor C score on average 1.44 points lower on the final exam than students using Tutor A.

46 Interpretation
Students using Tutor A
‒ For each additional minute on the tutor, students’ exam scores increase by 1.3
Students using Tutor B
‒ For each additional minute on the tutor, students’ exam scores increase by 3.1
Students using Tutor C
‒ For each additional minute on the tutor, students’ exam scores decrease by 0.40
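The three time slopes follow from adding each tutor's interaction coefficient to the baseline time coefficient. A short Python check using the values from the example regression (1.3 for time, +1.8 for time*TutorB, -1.7 for time*TutorC); the variable names are illustrative only:

```python
time_coef = 1.3                                 # slope on time for the baseline (Tutor A)
interaction = {"A": 0.0, "B": 1.8, "C": -1.7}   # time * tutor coefficients

# Each tutor's slope on time = baseline slope + that tutor's interaction term.
time_slopes = {tutor: round(time_coef + extra, 2)
               for tutor, extra in interaction.items()}
print(time_slopes)  # {'A': 1.3, 'B': 3.1, 'C': -0.4}
```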

47 ANCOVA
Analysis of Covariance
‒ At least one quantitative and one categorical explanatory variable
‒ In general, the main interest is the effect of the categorical variable, and the quantitative variable is considered to be a control variable
‒ It is a blending of regression and ANOVA

48 ANCOVA
Can run the analysis either as a linear regression with a dummy variable or as an ANCOVA model, in which case the output is similar to that of ANOVA models.
Will get the same results in either case!
Different statistical packages make one or the other easier to run.
It is a matter of preference and interpretation.