Module 32: Multiple Regression

This module reviews simple linear regression and then discusses multiple regression. The next module contains several examples.


Reviewed 19 July 05/MODULE 32

Module Content

A. Review of Simple Linear Regression
B. Multiple Regression
C. Relationship to ANOVA and Analysis of Covariance

A. Review of Simple Linear Regression

B. Multiple Regression

For simple linear regression, we used the formula for a straight line; that is, we used the model:

Y = α + βx

For multiple regression, we include more than one independent variable, and for each new independent variable we add a new term to the model:

Y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ

Population Equation

The population equation is:

Y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ

The β's are the coefficients for the independent variables in the true or population equation, and the x's are the values of the independent variables for the member of the population.

Sample Equation

The sample equation is:

ŷⱼ = b₀ + b₁x₁ + b₂x₂ + … + bₖxₖ

where ŷⱼ represents the regression estimate of the dependent variable for the jth member of the sample and the b's are estimates of the β's.

The Multiple Regression Process

The process involves using data from a sample to obtain an overall expression of the relationship between the dependent variable y and the independent variables, the x's. This is done in such a manner that the impact of the x's on the value of y, both collectively and individually, can be estimated.

The Multiple Regression Concept

Conceptually, multiple regression is a straightforward extension of the simple linear regression procedures. Simple linear regression is a bivariate situation; that is, it involves two dimensions, one for the dependent variable Y and one for the independent variable x. Multiple regression is a multivariable situation, with one dependent variable and multiple independent variables.

CARDIA Example

The data in the table on the following slide are:

Dependent variable:
  y = BMI

Independent variables:
  x₁ = Age in years
  x₂ = FFNUM, a measure of fast food usage
  x₃ = Exercise, an exercise intensity score
  x₄ = Beers per day
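Since the CARDIA data table itself is not reproduced in this transcript, here is a minimal sketch of fitting a four-predictor model of this form in Python with statsmodels. The variable names, sample size, intercept, and all data values below are hypothetical stand-ins, not the original analysis.

```python
# A minimal sketch: fitting the four-predictor model with statsmodels.
# All data values are hypothetical stand-ins (n = 20 subjects).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 20
df = pd.DataFrame({
    "age":      rng.integers(20, 40, n),    # x1: age in years
    "ffnum":    rng.integers(0, 8, n),      # x2: fast food usage
    "exercise": rng.integers(50, 500, n),   # x3: exercise intensity score
    "beer":     rng.integers(0, 5, n),      # x4: beers per day
})
# Hypothetical response built around the slide's fitted slopes plus noise;
# the intercept 20 is made up (the original b0 is not shown in the transcript)
df["bmi"] = (20 + 0.084 * df["age"] + 0.422 * df["ffnum"]
             - 0.001 * df["exercise"] + 0.326 * df["beer"]
             + rng.normal(0, 2, n))

fit = smf.ols("bmi ~ age + ffnum + exercise + beer", data=df).fit()
print(fit.params)     # b0, b1, b2, b3, b4
print(fit.summary())  # full table: estimates, t tests, ANOVA, R-squared
```

The summary output is analogous to the regression output on the slides that follow: one ANOVA table for the overall model and one row of estimate, t statistic, and p-value per coefficient.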

[Regression output slide: parameter estimates b₀, b₁, b₂, b₃, b₄, with one df for each independent variable in the model.]


The Multiple Regression Equation

From the output we have b₁ = 0.084 (Age), b₂ = 0.422, b₃ = -0.001, and b₄ = 0.326, together with the intercept b₀. So,

ŷ = b₀ + 0.084x₁ + 0.422x₂ - 0.001x₃ + 0.326x₄

The Multiple Regression Coefficient

The interpretation of a multiple regression coefficient is similar to that of the simple linear regression coefficient, except that the phrase "adjusted for the other terms in the model" should be added. For example, the coefficient for Age in the model is b₁ = 0.084, so for every one-year increase in age, BMI goes up 0.084 units, adjusted for the other three terms in the model.

Global Hypothesis

The first step is to test the global hypothesis:

H₀: β₁ = β₂ = β₃ = β₄ = 0  vs  H₁: at least one βᵢ ≠ 0

The ANOVA highlighted in the green box at the top of the next slide tests this hypothesis. The observed F exceeds F₀.₉₅(4, 15) = 3.06, so the hypothesis is rejected. Thus, we have evidence that at least one of the βᵢ ≠ 0.

Variation in BMI Explained by Model

The amount of variation in the dependent variable, BMI, explained by its regression relationship with the four independent variables is

R² = SS(Model)/SS(Total) = 0.79, or 79%
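Continuing the hypothetical statsmodels sketch above, the global F statistic, its p-value, and R² can be read directly off the fitted model, and scipy supplies the critical value used on the slide:

```python
# Global F test and R-squared for the model `fit` from the earlier sketch
from scipy import stats

print(fit.fvalue, fit.f_pvalue)   # overall F statistic and its p-value
print(fit.rsquared)               # R^2 = SS(Model)/SS(Total)

# Critical value F_0.95 with (4, 15) degrees of freedom, as on the slide
f_crit = stats.f.ppf(0.95, dfn=4, dfd=15)   # about 3.06
print(fit.fvalue > f_crit)        # True means reject the global H0
```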

Tests for Individual Parameters

If the global hypothesis is rejected, it is then appropriate to examine hypotheses for the individual parameters, such as H₀: β₁ = 0 vs H₁: β₁ ≠ 0. P = 0.66 for this test is greater than α = 0.05, so we accept H₀: β₁ = 0.

Outcome of Individual Parameter Tests

From the ANOVA, we have:

b₁ = 0.084, P = 0.66
b₂ = 0.422, P = 0.01
b₃ = -0.001, P = 0.54
b₄ = 0.326, P = 0.01

So b₂ = 0.422 and b₄ = 0.326 appear to represent terms that should be explored further.
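In the same hypothetical sketch, these per-coefficient t tests come straight from the fitted model's coefficient table:

```python
# Per-coefficient t statistics and two-sided p-values (each adjusted for
# the other terms in the model); compare each p-value to alpha = 0.05
print(fit.tvalues)
print(fit.pvalues)
```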

Stepwise Multiple Regression

Backward elimination: Start with all independent variables, test the global hypothesis and, if it is rejected, eliminate step by step those independent variables for which β = 0.

Forward: Start with a "core" subset of essential variables and add others step by step.
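statsmodels has no built-in stepwise selection, so below is a minimal hand-rolled sketch of the backward elimination idea just described; the function name and the p-value threshold are illustrative assumptions.

```python
# A hand-rolled sketch of backward elimination: refit after dropping the
# least significant predictor until every remaining p-value is below alpha.
import statsmodels.formula.api as smf

def backward_eliminate(data, response, predictors, alpha=0.05):
    preds = list(predictors)
    while preds:
        fit = smf.ols(f"{response} ~ {' + '.join(preds)}", data=data).fit()
        pvals = fit.pvalues.drop("Intercept")
        worst = pvals.idxmax()        # least significant remaining term
        if pvals[worst] <= alpha:
            return fit                # every remaining term is significant
        preds.remove(worst)           # drop the weakest term and refit
    return None                       # no term survived elimination

# e.g., with the hypothetical CARDIA-style data frame from the earlier sketch:
# best = backward_eliminate(df, "bmi", ["age", "ffnum", "exercise", "beer"])
```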

Backward Elimination

The next few slides show the process and steps for the backward elimination procedure.

[Regression output slide for the backward elimination steps: estimates b₀ through b₄ and the global hypothesis test.]

Forward Stepwise Regression

The next two slides show the process and steps for forward stepwise regression. In this procedure, the first independent variable entered into the model is the one with the highest correlation with the dependent variable.

C. Relationship to ANOVA and Analysis of Covariance

Multiple regression procedures can be used to analyze data from one-way ANOVA, randomized block, or factorial designs simply by setting up the independent variables properly for the regression analyses. To demonstrate this process, we will work with the one-way ANOVA problem for diastolic blood pressure on the following slide.

[Data table: blood pressure measurements for n = 30 children randomly assigned to receive one of three drugs, A, B, or C.]

The ANOVA Approach

H₀: μA = μB = μC  vs  H₁: the three means are not all equal

Reject H₀ since F = 3.54 is greater than F₀.₉₅(2, 27) = 3.35.

Multiple Regression Approach

The multiple regression approach requires coding the drug groups in such a manner that they can be handled as independent variables in the regression model. That is, we need to prepare a data table such as the one on the following slides.

Coding the Independent Variables

We can use a coding scheme for the x's to indicate the drug group for each participant. For three drugs we need two x's, with:

x₁ = 1 if the person received drug A, 0 otherwise
x₂ = 1 if the person received drug B, 0 otherwise

Implications of Coding Scheme

The values of x₁ and x₂ for the three drug groups are:

Drug   x₁   x₂
A       1    0
B       0    1
C       0    0

It takes only two x's to code the three drugs.
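As an illustration, this two-indicator coding can be produced with pandas; the column names and subject data below are hypothetical.

```python
# Building the two drug indicators with pandas (hypothetical data)
import pandas as pd

drug = pd.Series(["A", "B", "C"] * 10, name="drug")  # 30 hypothetical subjects
dummies = pd.get_dummies(drug, dtype=int)            # columns A, B, C
x = dummies[["A", "B"]].rename(columns={"A": "x1", "B": "x2"})
print(x.drop_duplicates())
#    x1  x2
# 0   1   0   (drug A)
# 1   0   1   (drug B)
# 2   0   0   (drug C)
```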

Use of Coding Scheme

Person 1 has y (BP) = 100 and receives Drug A, so x₁ = 1 and x₂ = 0.
Person 2 has y (BP) = 102 and receives Drug B, so x₁ = 0 and x₂ = 1.
Person 3 has y (BP) = 105 and receives Drug C, so x₁ = 0 and x₂ = 0.

Indicator Variables

These "indicator" variables provide a mechanism for including categories in analyses using multiple regression techniques. If they are used properly, they can be made to represent complex study designs. Adding such variables to a multiple regression analysis is readily accomplished. For proper interpretation, one needs to keep in mind how the different variables are defined; otherwise, the process is straightforward multiple regression.

Complete Data Table

Coding Scheme and Means

x₁ = 1 if the person received drug A, 0 otherwise
x₂ = 1 if the person received drug B, 0 otherwise

Under this coding:

β₀ = μC
β₁ = μA - μC
β₂ = μB - μC

so β₁ = β₂ = 0 implies μA = μB = μC.
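A self-contained sketch, assuming made-up blood pressure data, that checks this correspondence numerically: the regression intercept estimates the drug C mean, and the two slopes estimate the mean differences from drug C.

```python
# Check that the coded regression reproduces the group means
# (all blood pressure values below are hypothetical)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
bp = pd.DataFrame({"drug": ["A", "B", "C"] * 10})    # 30 hypothetical children
true_means = {"A": 100.0, "B": 97.2, "C": 96.5}      # made-up group means
bp["y"] = bp["drug"].map(true_means) + rng.normal(0, 4, len(bp))

bp["x1"] = (bp["drug"] == "A").astype(int)           # 1 if drug A, else 0
bp["x2"] = (bp["drug"] == "B").astype(int)           # 1 if drug B, else 0
fit = smf.ols("y ~ x1 + x2", data=bp).fit()

print(fit.params)                      # b0 ~ mean(C), b1 ~ mean(A)-mean(C), b2 ~ mean(B)-mean(C)
print(bp.groupby("drug")["y"].mean())  # b0, b0 + b1, b0 + b2 reproduce these
```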

[SAS General Linear Models output for dependent variable Y: the ANOVA table (Model, Error, Corrected Total, with sums of squares, mean squares, F value, and Pr > F), R-Square, Type I and Type III sums of squares for X1 and X2, and parameter estimates with t tests and standard errors for INTERCEPT, X1, and X2. The model F test is the same as the one-way ANOVA F test.]

Sample Means

The model provides estimates of the parameters, so the drug means are:

Drug A = b₀ + b₁
Drug B = b₀ + b₂ = 96.5 + 0.7 = 97.2
Drug C = b₀ = 96.5

Adjusted R²

Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p)

where n = sample size, p = number of parameters including β₀, and R² is the usually reported R².
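A one-line sketch of the computation; the n = 20 and p = 5 plugged in below match the CARDIA degrees of freedom (4 model, 15 error).

```python
# Adjusted R-squared from n, p (parameters including beta_0), and R-squared
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p)

# CARDIA example: R^2 = 0.79 with n = 20 and p = 5 (four slopes plus intercept)
print(adjusted_r2(0.79, n=20, p=5))   # about 0.73
```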

Standardized Regression Coefficients

bᵢ' = bᵢ(sᵢ/sy)

where bᵢ' = standardized regression coefficient, sᵢ = standard deviation of xᵢ, and sy = standard deviation of the dependent variable y.
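A minimal sketch of the computation; the standard deviations below are made-up illustrative values, not from the CARDIA output.

```python
# Standardized coefficient: rescale b_i by the ratio of standard deviations
def standardized_coef(b, s_x, s_y):
    return b * (s_x / s_y)

# Illustrative only: s_x and s_y are hypothetical values
print(standardized_coef(0.422, s_x=4.1, s_y=3.9))
```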