Linear Regression and Correlation

Linear Regression and Correlation
Explanatory and response variables are numeric
The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line)
Model: Y = β₀ + β₁x + ε
β₁ > 0 ⇒ positive association
β₁ < 0 ⇒ negative association
β₁ = 0 ⇒ no association

Least Squares Estimation of β₀, β₁
β₀ ≡ mean response when x = 0 (y-intercept)
β₁ ≡ change in mean response when x increases by 1 unit (slope)
β₀, β₁ are unknown parameters (like μ)
β₀ + β₁x ≡ mean response when the explanatory variable takes on the value x
Goal: choose values (estimates b₀, b₁) that minimize the sum of squared errors (SSE) of the observed values about the fitted straight line:
SSE = Σ (yᵢ − (b₀ + b₁xᵢ))²

Example - Pharmacodynamics of LSD
Response (y): math score (mean among 5 volunteers)
Predictor (x): LSD tissue concentration (mean of 5 volunteers)
(Raw data and scatterplot of score vs. LSD concentration shown on slide.)
Source: Wagner, et al (1968)

Least Squares Computations
Sxx = Σ(x − x̄)²   Sxy = Σ(x − x̄)(y − ȳ)   Syy = Σ(y − ȳ)²
b₁ = Sxy / Sxx
b₀ = ȳ − b₁x̄
Fitted values: ŷ = b₀ + b₁x, with SSE = Σ(y − ŷ)²
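As a minimal sketch, these computations can be reproduced in Python with NumPy. The x and y arrays below are hypothetical illustration data, not the slide's LSD dataset:

```python
import numpy as np

# Illustrative data (hypothetical values, not the slide's LSD dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([78.0, 66.0, 65.0, 50.0, 48.0, 38.0, 30.0])

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)          # sum of squared deviations of x
Sxy = np.sum((x - xbar) * (y - ybar))  # cross-products of x and y deviations

b1 = Sxy / Sxx          # least squares slope
b0 = ybar - b1 * xbar   # least squares intercept

yhat = b0 + b1 * x                # fitted values
SSE = np.sum((y - yhat) ** 2)     # sum of squared errors
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {SSE:.3f}")
```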

Example - Pharmacodynamics of LSD
(Worked computations table shown on slide; column totals given in the bottom row.)

SPSS Output and Plot of Equation

Inference Concerning the Slope (β₁)
Parameter: slope in the population model (β₁)
Estimator: least squares estimate b₁
Estimated standard error: SE(b₁) = s / √Sxx, where s = √(SSE/(n − 2))
Methods of making inference regarding the population:
– Hypothesis tests (2-sided or 1-sided)
– Confidence intervals

Hypothesis Test for β₁
Test statistic: t = b₁ / SE(b₁), with n − 2 degrees of freedom
2-sided test
– H₀: β₁ = 0
– Hₐ: β₁ ≠ 0
1-sided test
– H₀: β₁ = 0
– Hₐ⁺: β₁ > 0 or
– Hₐ⁻: β₁ < 0

(1 − α)100% Confidence Interval for β₁
b₁ ± t(α/2, n−2) · SE(b₁)
Conclude a positive association if the entire interval is above 0
Conclude a negative association if the entire interval is below 0
Cannot conclude an association if the interval contains 0
The conclusion based on the interval is the same as the 2-sided hypothesis test at level α
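A brief sketch of the slope test and confidence interval using scipy.stats, continuing with the same hypothetical data as the earlier block:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])   # hypothetical data
y = np.array([78.0, 66.0, 65.0, 50.0, 48.0, 38.0, 30.0])

res = stats.linregress(x, y)       # two-sided p-value tests H0: beta1 = 0
t_stat = res.slope / res.stderr    # t = b1 / SE(b1), df = n - 2
t_crit = stats.t.ppf(0.975, df=len(x) - 2)   # t(0.025, n-2)
ci = (res.slope - t_crit * res.stderr, res.slope + t_crit * res.stderr)
print(f"t = {t_stat:.3f}, p = {res.pvalue:.4f}")
print(f"95% CI for beta1: ({ci[0]:.3f}, {ci[1]:.3f})")
```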

Example - Pharmacodynamics of LSD
Testing H₀: β₁ = 0 vs Hₐ: β₁ ≠ 0
95% confidence interval for β₁:
(Test statistic and interval computed on slide.)

Confidence Interval for the Mean When x = x*
Mean response at a specific level x*: E(Y) = β₀ + β₁x*
Estimated mean response and standard error (replacing the unknown β₀ and β₁ with estimates):
ŷ = b₀ + b₁x*   SE(ŷ) = s·√(1/n + (x* − x̄)²/Sxx)
Confidence interval for the mean response: ŷ ± t(α/2, n−2) · SE(ŷ)

Prediction Interval for a Future Response When x = x*
Response at a specific level x*: Y = β₀ + β₁x* + ε
Estimated response and standard error (replacing the unknown β₀ and β₁ with estimates):
ŷ = b₀ + b₁x*   SE(pred) = s·√(1 + 1/n + (x* − x̄)²/Sxx)
Prediction interval for a future response: ŷ ± t(α/2, n−2) · SE(pred)
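A minimal sketch computing both intervals at a chosen x*, using the standard formulas above; the data and the value x* = 4.5 are hypothetical:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])   # hypothetical data
y = np.array([78.0, 66.0, 65.0, 50.0, 48.0, 38.0, 30.0])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)                              # least squares fit
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # residual SD
Sxx = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, df=n - 2)

x_star = 4.5
y_hat = b0 + b1 * x_star
se_mean = s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / Sxx)
se_pred = s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / Sxx)

print(f"CI for mean response: {y_hat:.2f} +/- {t_crit * se_mean:.2f}")
print(f"Prediction interval:  {y_hat:.2f} +/- {t_crit * se_pred:.2f}")
```

The prediction interval is always wider than the confidence interval at the same x*, because it adds the variance of a single future observation (the extra 1 under the square root).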

Correlation Coefficient
Measures the strength of the linear association between two variables
Takes on the same sign as the slope estimate from the linear regression
Not affected by linear transformations of y or x
Does not distinguish between dependent and independent variables (e.g. height and weight)
Population parameter: ρ
Pearson's correlation coefficient: r = Sxy / √(Sxx·Syy)

Correlation Coefficient
Values close to 1 in absolute value ⇒ strong linear association, positive or negative according to the sign
Values close to 0 imply little or no association
If the data contain outliers (are non-normal), Spearman's coefficient of correlation can be computed based on the ranks of the x and y values
The test of H₀: ρ = 0 is equivalent to the test of H₀: β₁ = 0
Coefficient of determination (r²): proportion of the variation in y "explained" by the regression on x:
r² = (Syy − SSE) / Syy
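Both measures, and the test of H₀: ρ = 0, are available in scipy; a sketch with the same hypothetical data:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])   # hypothetical data
y = np.array([78.0, 66.0, 65.0, 50.0, 48.0, 38.0, 30.0])

r, p_pearson = stats.pearsonr(x, y)        # Pearson's r and test of H0: rho = 0
rho_s, p_spearman = stats.spearmanr(x, y)  # rank-based Spearman alternative
print(f"Pearson r = {r:.3f} (p = {p_pearson:.4f}), r^2 = {r**2:.3f}")
print(f"Spearman rho = {rho_s:.3f} (p = {p_spearman:.4f})")
```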

Example - Pharmacodynamics of LSD
(Computations of Syy and SSE shown on slide.)

Example - SPSS Output: Pearson's and Spearman's Measures

Analysis of Variance in Regression
Goal: partition the total variation in y into variation "explained" by x and random variation
The three sums of squares and degrees of freedom are:
– Total: SST = Σ(y − ȳ)², DFT = n − 1
– Error: SSE = Σ(y − ŷ)², DFE = n − 2
– Model: SSM = Σ(ŷ − ȳ)², DFM = 1

Analysis of Variance in Regression
Analysis of variance F-test:
H₀: β₁ = 0
Hₐ: β₁ ≠ 0
Test statistic: F = MSM/MSE = (SSM/1) / (SSE/(n − 2)), compared to the F distribution with 1 and n − 2 degrees of freedom
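A sketch of the regression ANOVA partition and F-test, again on hypothetical data:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])   # hypothetical data
y = np.array([78.0, 66.0, 65.0, 50.0, 48.0, 38.0, 30.0])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
yhat = b0 + b1 * x
SST = np.sum((y - y.mean()) ** 2)   # total variation, df = n - 1
SSE = np.sum((y - yhat) ** 2)       # error variation, df = n - 2
SSM = SST - SSE                     # model variation, df = 1

F = (SSM / 1) / (SSE / (n - 2))
p = stats.f.sf(F, 1, n - 2)         # upper-tail F probability
print(f"F = {F:.3f}, p = {p:.4f}")  # F equals the square of the slope t statistic
```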

Example - Pharmacodynamics of LSD
Total sum of squares, error sum of squares, and model sum of squares:
(Computed on slide.)

Example - Pharmacodynamics of LSD
Analysis of variance F-test: H₀: β₁ = 0 vs Hₐ: β₁ ≠ 0
(ANOVA table and F statistic shown on slide.)

Example - SPSS Output

Multiple Regression
Numeric response variable (Y)
p numeric predictor variables (p < n)
Model: Y = β₀ + β₁x₁ + ⋯ + βₚxₚ + ε
Partial regression coefficients: βᵢ ≡ the effect (on the mean response) of increasing the i-th predictor variable by 1 unit, holding all other predictors constant
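A hedged sketch of fitting such a model with statsmodels; the data are simulated and the coefficient values are arbitrary:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))                      # three hypothetical predictors
y = 2.0 + X @ np.array([1.5, -0.8, 0.0]) + rng.normal(size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()      # intercept + partial slopes
print(model.params)       # b0, b1, b2, b3
print(model.summary())    # t-tests, overall F-test, R^2, etc.
```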

Example - Effect of Birth Weight on Body Size in Early Adolescence
Response: height at early adolescence (n = 250 cases)
Predictors (p = 6 explanatory variables):
– Adolescent age (x₁, in years)
– Tanner stage (x₂, units not given)
– Gender (x₃ = 1 if male, 0 if female)
– Gestational age (x₄, in weeks at birth)
– Birth length (x₅, units not given)
– Birthweight group (x₆ = 1,…,6, ordered categories from the lowest group (1) up to >3500 g (6))
Source: Falkner, et al (2004)

Least Squares Estimation
Population model for the mean response: E(Y) = β₀ + β₁x₁ + ⋯ + βₚxₚ
Least squares fitted (predicted) equation, minimizing SSE: ŷ = b₀ + b₁x₁ + ⋯ + bₚxₚ
All statistical software packages/spreadsheets can compute the least squares estimates and their standard errors

Analysis of Variance
Direct extension of the ANOVA based on simple linear regression
The only adjustments are to the degrees of freedom:
– DFM = p
– DFE = n − p − 1

Testing for the Overall Model - F-test
Tests whether any of the explanatory variables are associated with the response
H₀: β₁ = ⋯ = βₚ = 0 (none of the x's is associated with y)
Hₐ: not all βᵢ = 0
Test statistic: F = (R²/p) / ((1 − R²)/(n − p − 1))

Example - Effect of Birth Weight on Body Size in Early Adolescence
The authors did not print the ANOVA table, but did provide the following: n = 250, p = 6, R² = 0.26
H₀: β₁ = ⋯ = β₆ = 0
Hₐ: not all βᵢ = 0
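Since the ANOVA table was not printed, the F statistic can be recovered from the reported R² as a quick check (the arithmetic uses only the slide's numbers; the p-value uses scipy):

```python
from scipy import stats

n, p, r2 = 250, 6, 0.26                     # values reported by the authors
F = (r2 / p) / ((1 - r2) / (n - p - 1))     # F = (R^2/p) / ((1-R^2)/(n-p-1))
p_value = stats.f.sf(F, p, n - p - 1)
print(f"F = {F:.2f}, p = {p_value:.2e}")    # F is about 14.2 on (6, 243) df
```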

Testing Individual Partial Coefficients - t-tests
Used to determine whether the response is associated with a single explanatory variable, after controlling for the others
H₀: βᵢ = 0
Hₐ: βᵢ ≠ 0 (2-sided alternative)
Test statistic: t = bᵢ / SE(bᵢ), with n − p − 1 degrees of freedom
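These partial t-tests are reported directly by the statsmodels fit from the earlier sketch; a brief self-contained version (simulated data, arbitrary coefficients):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                        # hypothetical predictors
y = 2.0 + X @ np.array([1.5, -0.8, 0.0]) + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.tvalues)   # t = b_i / SE(b_i), df = n - p - 1
print(fit.pvalues)   # two-sided p-values for H0: beta_i = 0
```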

Example - Effect of Birth Weight on Body Size in Early Adolescence
Controlling for all other predictors, adolescent age, Tanner stage, and birth length are associated with the adolescent height measurement

Testing for the Overall Model - F-test
Tests whether any of the explanatory variables are associated with the response
H₀: β₁ = ⋯ = βₚ = 0 (none of the X's is associated with Y)
Hₐ: not all βᵢ = 0
The P-value is based on the F-distribution with p numerator and (n − p − 1) denominator degrees of freedom

Comparing Regression Models
Conflicting goals: explaining variation in Y while keeping the model as simple as possible (parsimony)
We can test whether a subset of p − g predictors (possibly including cross-product terms) can be dropped from a model that contains the remaining g predictors: H₀: β_{g+1} = ⋯ = βₚ = 0
– Complete model: contains all p predictors
– Reduced model: eliminates the predictors in H₀
– Fit both models, obtaining the error sum of squares for each (or R² from each)

Comparing Regression Models
H₀: β_{g+1} = ⋯ = βₚ = 0 (after removing the effects of x₁,…,x_g, none of the other predictors is associated with Y)
Hₐ: H₀ is false
Test statistic: F = [(SSE_R − SSE_C)/(p − g)] / [SSE_C/(n − p − 1)], where SSE_R and SSE_C are the error sums of squares of the reduced and complete models
The P-value is based on the F-distribution with p − g and n − p − 1 degrees of freedom
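A sketch of this partial F-test with statsmodels, on simulated data where the last two of p = 4 predictors truly have zero coefficients:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 4))                 # four hypothetical predictors
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

complete = sm.OLS(y, sm.add_constant(X)).fit()        # all p = 4 predictors
reduced = sm.OLS(y, sm.add_constant(X[:, :2])).fit()  # first g = 2 predictors

# F = [(SSE_R - SSE_C)/(p - g)] / [SSE_C/(n - p - 1)];
# compare_f_test computes this partial F-test directly
F, p_value, df_diff = complete.compare_f_test(reduced)
print(f"F = {F:.3f}, p = {p_value:.4f}, df = {df_diff}")  # H0: beta3 = beta4 = 0
```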

Models with Dummy Variables
Some models have both numeric and categorical explanatory variables (recall gender in the earlier example)
If a categorical variable has k levels, create k − 1 dummy variables, each taking the value 1 if its level is present and 0 otherwise
The baseline level of the categorical variable is the one for which all k − 1 dummy variables are set to 0
The regression coefficient corresponding to a dummy variable is the difference between the mean for that level and the mean for the baseline group, controlling for all numeric predictors
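A small sketch of this encoding using the statsmodels formula interface; the variable names and data are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: a numeric predictor plus a 3-level categorical variable
df = pd.DataFrame({
    "y": [10.0, 12.0, 9.0, 15.0, 14.0, 11.0],
    "age": [30, 40, 35, 50, 45, 38],
    "group": ["A", "B", "C", "A", "B", "C"],
})

# C(group) creates k - 1 = 2 dummy variables automatically, with "A" as the
# baseline level; each dummy coefficient is the difference in mean response
# from the baseline, controlling for age.
fit = smf.ols("y ~ age + C(group)", data=df).fit()
print(fit.params)
```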

Example - Deep Cervical Infections
Subjects: patients with deep neck infections
Response (Y): length of stay in hospital
Predictors (one numeric, 11 dichotomous):
– Age (x₁)
– Gender (x₂ = 1 if female, 0 if male)
– Fever (x₃ = 1 if body temperature > 38°C, 0 if not)
– Neck swelling (x₄ = 1 if present, 0 if absent)
– Neck pain (x₅ = 1 if present, 0 if absent)
– Trismus (x₆ = 1 if present, 0 if absent)
– Underlying disease (x₇ = 1 if present, 0 if absent)
– Respiration difficulty (x₈ = 1 if present, 0 if absent)
– Complication (x₉ = 1 if present, 0 if absent)
– WBC > 15000/mm³ (x₁₀ = 1 if present, 0 if absent)
– CRP > 100 μg/ml (x₁₁ = 1 if present, 0 if absent)
Source: Wang, et al (2003)

Example - Weather and Spinal Patients
Subjects: visitors to the National Spinal Network in 23 cities who completed the SF-36 form
Response: physical function subscale (1 of 10 reported)
Predictors:
– Patient's age (x₁)
– Gender (x₂ = 1 if female, 0 if male)
– High temperature on day of visit (x₃)
– Low temperature on day of visit (x₄)
– Dew point (x₅)
– Wet bulb (x₆)
– Total precipitation (x₇)
– Barometric pressure (x₈)
– Length of sunlight (x₉)
– Moon phase (new, waxing crescent, 1st quarter, waxing gibbous, full moon, waning gibbous, last quarter, waning crescent; presumably 8 − 1 = 7 dummy variables)
Source: Glaser, et al (2004)

Analysis of Covariance
Combination of 1-way ANOVA and linear regression
Goal: compare numeric responses among k groups, adjusting for numeric concomitant variable(s), referred to as covariate(s)
Clinical trial applications: the response is the post-treatment score, and the covariate is the pre-treatment score
Epidemiological applications: outcomes compared across exposure conditions, adjusted for other risk factors (age, smoking status, sex, ...)
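A minimal ANCOVA sketch in the clinical-trial setting, fitting a regression of the post-treatment score on the pre-treatment covariate plus a treatment dummy; the data and group labels are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial data: post-treatment score adjusted for the
# pre-treatment score (the covariate), compared across two groups
df = pd.DataFrame({
    "post": [12.0, 15.0, 11.0, 18.0, 16.0, 14.0, 20.0, 17.0],
    "pre":  [10.0, 12.0, 9.0, 13.0, 11.0, 10.0, 14.0, 12.0],
    "trt":  ["control", "control", "control", "control",
             "drug", "drug", "drug", "drug"],
})

ancova = smf.ols("post ~ pre + C(trt)", data=df).fit()
print(ancova.params)   # C(trt)[T.drug] is the covariate-adjusted group difference
```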

Nonlinear Regression
Theory often leads to nonlinear relations between variables. Examples:
– 1-compartment PK model with 1st-order absorption and elimination
– Sigmoid-Emax (S-shaped) PD model

Example - P24 Antigens and AZT
Goal: model the time course of P24 antigen levels after oral administration of zidovudine
Model fit individually in 40 HIV+ patients (model equation shown on slide), where:
– E(t) is the antigen level at time t
– E₀ is the initial level
– A is the coefficient of reduction of P24 antigen
– k_out is the rate constant of decrease of P24 antigen
Source: Sasomsin, et al (2002)

Example - P24 Antigens and AZT
Among the 40 individuals to whom the model was fit, the means and standard deviations of the PK "parameters" are given below:
(Table and fitted model for the "mean subject" shown on slide.)

Example - P24 Antigens and AZT
(Plot of the fitted model shown on slide.)

Example - MK639 in HIV+ Patients
Response: Y = log₁₀(RNA change)
Predictor: x = MK639 AUC(0-6h)
Model: sigmoid-Emax:
Y = β₀x^β₂ / (β₁^β₂ + x^β₂) + ε
where:
– β₀ is the maximum effect (the limit as x → ∞)
– β₁ is the x level producing 50% of the maximum effect
– β₂ is a parameter affecting the shape of the function
Source: Stein, et al (1996)
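Assuming the standard sigmoid-Emax form written above, a sketch of fitting it by nonlinear least squares with scipy's curve_fit; the (x, y) values are hypothetical stand-ins, not the trial data:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid_emax(x, b0, b1, b2):
    """Sigmoid-Emax: b0 = max effect, b1 = x giving 50% of max, b2 = shape."""
    return b0 * x**b2 / (b1**b2 + x**b2)

# Hypothetical (x, y) pairs standing in for AUC and log10 RNA change
x = np.array([50.0, 100.0, 200.0, 400.0, 800.0])
y = np.array([0.3, 0.55, 0.9, 1.15, 1.3])

# Nonlinear fits need starting values (p0); these are rough guesses
params, cov = curve_fit(sigmoid_emax, x, y, p0=[1.5, 200.0, 1.0])
print(dict(zip(["b0", "b1", "b2"], params.round(3))))
```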

Example - MK639 in HIV+ Patients
Data on n = 5 subjects in a Phase 1 trial (shown on slide)
Model fit using SPSS (estimates slightly different from the notes, which used SAS)

Example - MK639 in HIV+ Patients
(Plot of the fitted sigmoid-Emax curve shown on slide.)

Data Sources
Wagner, J.G., G.K. Aghajanian, and O.H. Bing (1968). "Correlation of Performance Test Scores with Tissue Concentration of Lysergic Acid Diethylamide in Human Subjects," Clinical Pharmacology and Therapeutics, 9: