Chapter 12 Multiple Regression.


Multiple Regression

- Numeric response variable (y) and k numeric predictor variables (k < n).
- Model: Y = β0 + β1x1 + ⋯ + βkxk + ε
- Partial regression coefficients: βi is the effect on the mean response of increasing the i-th predictor variable by 1 unit, holding all other predictors constant.
- Model assumptions (on the error terms ε): normally distributed with mean 0; constant variance σ²; independent (problematic when the data form a series in time or space).

Example - Effect of Birth Weight on Body Size in Early Adolescence

- Response: height at early adolescence (n = 250 cases).
- Predictors (k = 6 explanatory variables):
  - Adolescent age (x1, in years, 11-14)
  - Tanner stage (x2, units not given)
  - Gender (x3 = 1 if male, 0 if female)
  - Gestational age (x4, in weeks at birth)
  - Birth length (x5, units not given)
  - Birth weight group (x6 = 1,...,6: <1500 g (1), 1500-1999 g (2), 2000-2499 g (3), 2500-2999 g (4), 3000-3499 g (5), >3500 g (6))
- Source: Falkner, et al (2004)

Least Squares Estimation

- Population model for the mean response: E(Y) = β0 + β1x1 + ⋯ + βkxk
- Least squares fitted (predicted) equation: ŷ = β̂0 + β̂1x1 + ⋯ + β̂kxk, with the estimates chosen to minimize SSE = Σ(yi − ŷi)².
- All statistical software packages and spreadsheets can compute the least squares estimates and their standard errors (see the sketch below).
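
As a concrete illustration of that last point, here is a minimal sketch of fitting such a model in Python with statsmodels; the file name and column names are hypothetical, and the slides do not prescribe any particular tool:

```python
# A minimal sketch, assuming a CSV with columns named y, x1, x2, x3
# (hypothetical file and column names).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")  # hypothetical data file

# Ordinary least squares: estimates beta_0..beta_3 by minimizing SSE
fit = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(fit.summary())  # coefficients, standard errors, t-tests, overall F-test, R^2
```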

Analysis of Variance

- Direct extension of the ANOVA from simple linear regression.
- The only adjustments are to the degrees of freedom: DF(Regression) = k, DF(Error) = n − (k+1).

Testing the Overall Model - F-test

- Tests whether any of the explanatory variables are associated with the response.
- H0: β1 = β2 = ⋯ = βk = 0 (none of the x's is associated with y)
- HA: not all βi = 0 (test statistic reconstructed below)
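
The slide's test statistic did not survive the transcript; the standard form, consistent with the degrees of freedom given above, is:

```latex
F_{\mathrm{obs}}
  = \frac{\mathrm{MSR}}{\mathrm{MSE}}
  = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/\bigl(n-(k+1)\bigr)}
  = \frac{R^2/k}{(1-R^2)/\bigl(n-(k+1)\bigr)},
\qquad \text{reject } H_0 \text{ if } F_{\mathrm{obs}} \ge F_{\alpha,\,k,\,n-(k+1)}.
```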

Example - Effect of Birth Weight on Body Size in Early Adolescence

- The authors did not print an ANOVA table, but did provide: n = 250, k = 6, R² = 0.26.
- H0: β1 = ⋯ = β6 = 0; HA: not all βi = 0 (computation below)
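
Working the numbers the slide does provide through the R² form of the F statistic:

```latex
F_{\mathrm{obs}}
  = \frac{0.26/6}{(1-0.26)/(250-7)}
  = \frac{0.0433}{0.00305}
  \approx 14.2,
```

which far exceeds the roughly 2.1 critical value of the F distribution with 6 and 243 degrees of freedom at α = 0.05, so H0 is rejected.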

Testing Individual Partial Coefficients - t-tests

- Used to determine whether the response is associated with a single explanatory variable, after controlling for the others.
- H0: βi = 0; HA: βi ≠ 0 (2-sided alternative; test statistic below)
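
The test statistic, again not reproduced in the transcript but standard:

```latex
t_{\mathrm{obs}} = \frac{\hat\beta_i}{SE(\hat\beta_i)},
\qquad \text{reject } H_0 \text{ if } |t_{\mathrm{obs}}| \ge t_{\alpha/2,\,n-(k+1)}.
```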

Example - Effect of Birth Weight on Body Size in Early Adolescence

Controlling for all other predictors, adolescent age, Tanner stage, and birth length are associated with adolescent height.

Comparing Regression Models

- Conflicting goals: explain variation in Y while keeping the model as simple as possible (parsimony).
- We can test whether a subset of k − g predictors (possibly including cross-product terms) can be dropped from a model that contains the remaining g predictors: H0: βg+1 = ⋯ = βk = 0
- Complete model: contains all k predictors.
- Reduced model: eliminates the predictors in H0.
- Fit both models, obtaining the sums of squares (or R²) from each: complete: SSR_C, SSE_C (R²_C); reduced: SSR_R, SSE_R (R²_R).

Comparing Regression Models

- H0: βg+1 = ⋯ = βk = 0 (after removing the effects of x1,…,xg, none of the other predictors is associated with Y)
- HA: H0 is false
- P-value based on the F distribution with k − g and n − (k+1) degrees of freedom (statistic below).
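
The partial F statistic from which this P-value is computed, reconstructed in the notation above:

```latex
F_{\mathrm{obs}}
  = \frac{(SSE_R - SSE_C)/(k-g)}{SSE_C/\bigl(n-(k+1)\bigr)}
  = \frac{(R^2_C - R^2_R)/(k-g)}{(1-R^2_C)/\bigl(n-(k+1)\bigr)}.
```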

Models with Dummy Variables

- Some models have both numeric and categorical explanatory variables (recall gender in the earlier example).
- If a categorical variable has m levels, create m − 1 dummy variables that take the value 1 if the level of interest is present, 0 otherwise.
- The baseline level of the categorical variable is the one for which all m − 1 dummy variables are set to 0.
- The regression coefficient corresponding to a dummy variable is the difference between the mean for that level and the mean for the baseline group, controlling for all numeric predictors (see the sketch below).
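
A minimal sketch with made-up data: statsmodels creates the m − 1 dummy variables automatically when a predictor is marked categorical with C(). Everything here (data values, variable names) is hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "height": [152.1, 148.3, 160.2, 155.7, 149.9, 158.4],
    "age":    [11.2, 12.0, 13.5, 12.8, 11.9, 13.1],
    "gender": ["F", "M", "M", "F", "F", "M"],  # 2 levels -> 1 dummy variable
})

# "F" sorts first, so it is treated as the baseline level; the coefficient
# on C(gender)[T.M] is the male-minus-female difference in mean height,
# controlling for age.
fit = smf.ols("height ~ age + C(gender)", data=df).fit()
print(fit.params)
```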

Example - Deep Cervical Infections

- Subjects: patients with deep neck infections.
- Response (Y): length of stay in hospital.
- Predictors (one numeric, 11 dichotomous):
  - Age (x1)
  - Gender (x2 = 1 if female, 0 if male)
  - Fever (x3 = 1 if body temperature > 38°C, 0 if not)
  - Neck swelling (x4 = 1 if present, 0 if absent)
  - Neck pain (x5 = 1 if present, 0 if absent)
  - Trismus (x6 = 1 if present, 0 if absent)
  - Underlying disease (x7 = 1 if present, 0 if absent)
  - Respiration difficulty (x8 = 1 if present, 0 if absent)
  - Complication (x9 = 1 if present, 0 if absent)
  - WBC > 15000/mm³ (x10 = 1 if present, 0 if absent)
  - CRP > 100 mg/ml (x11 = 1 if present, 0 if absent)
- Source: Wang, et al (2003)

Example - Weather and Spinal Patients

- Subjects: visitors to the National Spinal Network in 23 cities who completed the SF-36 form.
- Response: Physical Function subscale (1 of 10 reported).
- Predictors (the transcript numbered precipitation and pressure both as x7; renumbered here):
  - Patient's age (x1)
  - Gender (x2 = 1 if female, 0 if male)
  - High temperature on day of visit (x3)
  - Low temperature on day of visit (x4)
  - Dew point (x5)
  - Wet bulb (x6)
  - Total precipitation (x7)
  - Barometric pressure (x8)
  - Length of sunlight (x9)
  - Moon phase (new, waxing crescent, 1st quarter, waxing gibbous, full moon, waning gibbous, last quarter, waning crescent; presumably 8 − 1 = 7 dummy variables)
- Source: Glaser, et al (2004)

Modeling Interactions

- Statistical interaction: when the effect of one predictor (on the response) depends on the level of other predictors.
- Can be modeled (and thus tested) with cross-product terms (case of 2 predictors): E(Y) = α + β1X1 + β2X2 + β3X1X2
- X2 = 0 ⇒ E(Y) = α + β1X1
- X2 = 10 ⇒ E(Y) = α + β1X1 + 10β2 + 10β3X1 = (α + 10β2) + (β1 + 10β3)X1
- The effect on E(Y) of increasing X1 by 1 therefore depends on the level of X2, unless β3 = 0 (tested with a t-test; see the sketch below).
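
A minimal sketch with hypothetical data: the x1:x2 cross-product term lets the slope on x1 vary with x2, and the t-test on that coefficient tests β3 = 0.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "y":  [3.1, 4.2, 6.8, 7.9, 5.0, 9.3, 2.8, 8.1],
    "x1": [1.0, 2.0, 3.0, 4.0, 2.5, 4.5, 1.2, 3.8],
    "x2": [0.0, 0.0, 10.0, 10.0, 5.0, 10.0, 0.0, 5.0],
})

# "x1 * x2" expands to x1 + x2 + x1:x2 (main effects plus the interaction)
fit = smf.ols("y ~ x1 * x2", data=df).fit()
print(fit.summary().tables[1])  # inspect the t-test on the x1:x2 row
```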

Logistic Regression

- Binary response variable and numeric and/or categorical explanatory variable(s).
- Goal: model the probability of a particular outcome as a function of the predictor variable(s).
- Problem: probabilities are bounded between 0 and 1 (a straight-line model can produce fitted values outside this range).

Logistic Regression with 1 Predictor

- Response: presence/absence of a characteristic.
- Predictor: numeric variable observed for each case.
- Model: π(x) = probability of presence at predictor level x, with the logistic form π(x) = e^(β0+β1x) / (1 + e^(β0+β1x)).
- β1 = 0 ⇒ P(presence) is the same at each level of x.
- β1 > 0 ⇒ P(presence) increases as x increases.
- β1 < 0 ⇒ P(presence) decreases as x increases.

Logistic Regression with 1 Predictor

- β0 and β1 are unknown parameters and must be estimated using statistical software such as SPSS, SAS, or STATA.
- Primary interest is in estimating and testing hypotheses regarding β1.
- Large-sample test (Wald test): z = β̂1 / SE(β̂1); some software reports this z-test, others the equivalent chi-square z² with 1 d.f. (sketch below).
- H0: β1 = 0; HA: β1 ≠ 0
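
The slides use SPSS; as a freely available alternative, here is a minimal Python sketch with made-up dose-response data (all values hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "dose":   [0, 0, 0, 2.5, 2.5, 5, 5, 5, 10, 10, 10, 10],
    "relief": [0, 0, 1, 0,   1,   0, 1, 1, 1,  1,  0,  1],
})

# Maximum likelihood fit of the one-predictor logistic model
fit = smf.logit("relief ~ dose", data=df).fit(disp=0)
print(fit.params["dose"])                    # beta1-hat
print(fit.bse["dose"])                       # its standard error
print(fit.params["dose"] / fit.bse["dose"])  # Wald z statistic
```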

Example - Rizatriptan for Migraine

- Response: complete pain relief at 2 hours (yes/no).
- Predictor: dose (mg): placebo (0), 2.5, 5, 10.
- Source: Gijsman, et al (1997)

Example - Rizatriptan for Migraine (SPSS output; the table shown on this slide is not reproduced in the transcript)

Odds Ratio

- Interpretation of the regression coefficient β1:
- In linear regression, the slope coefficient is the change in the mean response as x increases by 1 unit.
- In logistic regression, we can show that the odds of the outcome at level x are odds(x) = π(x)/(1 − π(x)) = e^(β0+β1x) (derivation below).
- Thus e^β1 represents the multiplicative change in the odds of the outcome when x increases by 1 unit.
- If β1 = 0, the odds (and probability) are equal at all x levels (e^β1 = 1).
- If β1 > 0, the odds (and probability) increase as x increases (e^β1 > 1).
- If β1 < 0, the odds (and probability) decrease as x increases (e^β1 < 1).
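
Writing out the step that the slide's (missing) display carried:

```latex
\mathrm{odds}(x) = \frac{\pi(x)}{1-\pi(x)} = e^{\beta_0+\beta_1 x}
\quad\Longrightarrow\quad
\frac{\mathrm{odds}(x+1)}{\mathrm{odds}(x)}
  = \frac{e^{\beta_0+\beta_1(x+1)}}{e^{\beta_0+\beta_1 x}}
  = e^{\beta_1}.
```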

95% Confidence Interval for the Odds Ratio

- Step 1: construct a 95% CI for β1: β̂1 ± 1.96 SE(β̂1).
- Step 2: raise e ≈ 2.718 to the lower and upper bounds of that CI.
- If the entire interval is above 1, conclude a positive association.
- If the entire interval is below 1, conclude a negative association.
- If the interval contains 1, we cannot conclude there is an association.
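
A worked illustration with made-up numbers (the example's actual estimates did not survive the transcript): suppose β̂1 = 0.35 with SE(β̂1) = 0.10. Then

```latex
0.35 \pm 1.96(0.10) = (0.154,\ 0.546)
\quad\Longrightarrow\quad
\text{95\% CI for OR} = \bigl(e^{0.154},\ e^{0.546}\bigr) \approx (1.17,\ 1.73),
```

and since the whole interval lies above 1, we would conclude a positive association.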

Example - Rizatriptan for Migraine

- 95% CI for β1 and the corresponding 95% CI for the population odds ratio were computed as above (the numeric values appeared in a slide image not reproduced in the transcript).
- Conclusion: positive association between dose and the probability of complete relief.

Multiple Logistic Regression

- Extension to more than one predictor variable (either numeric or dummy variables).
- With k predictors, the model is written: π(x1,…,xk) = e^(β0+β1x1+⋯+βkxk) / (1 + e^(β0+β1x1+⋯+βkxk)).
- Adjusted odds ratio for raising xi by 1 unit, holding all other predictors constant: ORi = e^βi.
- Inferences on βi and ORi are conducted as described above for the single-predictor case (sketch below).
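
A minimal sketch with hypothetical data: multiple logistic regression with one numeric and one dummy predictor; exponentiating a coefficient and its confidence interval gives the adjusted odds ratio and its 95% interval.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "ed":     [0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    "age":    [52, 55, 61, 67, 63, 71, 58, 65, 69, 51, 73, 57],
    "smoker": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1],
})

fit = smf.logit("ed ~ age + smoker", data=df).fit(disp=0)
print(np.exp(fit.params))       # adjusted odds ratios
print(np.exp(fit.conf_int()))   # 95% CIs for the odds ratios
```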

Example - ED in Older Dutch Men

- Response: presence/absence of ED (n = 1688).
- Predictors (k = 12 dummy variables):
  - Age stratum (50-54*, 55-59, 60-64, 65-69, 70-78)
  - Smoking status (nonsmoker*, smoker)
  - BMI stratum (<25*, 25-30, >30)
  - Lower urinary tract symptoms (none*, mild, moderate, severe)
  - Under treatment for cardiac symptoms (no*, yes)
  - Under treatment for COPD (no*, yes)
- * Baseline group for dummy variables.
- Source: Blanker, et al (2001)

Example - ED in Older Dutch Men

Interpretations: the risk of ED appears to be:
- increasing with the age, BMI, and LUTS strata;
- higher among smokers;
- higher among men being treated for cardiac symptoms or for COPD.