Biostatistics in Practice Peter D. Christenson Biostatistician Session 5: Methods for Assessing Associations.

Readings for Session 5 from StatisticalPractice.com

Simple Linear Regression
Introduction to Simple Linear Regression
Transformations in Linear Regression

Multiple Regression
Introduction to Multiple Regression
What Does Multiple Regression Look Like?
Which Predictors are More Important?

Also, without any reading: Correlation

Correlation

Visualize Y (vertical) by X (horizontal) in a scatterplot.

The Pearson correlation, r, is used to measure the association between two measures X and Y. It ranges from -1 (perfect inverse association) to 1 (perfect direct association).

The value of r does not depend on:
the scales (units) of X and Y
which role X and Y assume, as in an X-Y plot

The value of r does depend on:
the ranges of X and Y
the values chosen for X, if X is fixed and Y is measured
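These properties can be checked directly from the definition of r. A minimal sketch in Python (the data are made up for illustration), confirming that r is unchanged by a change of units for X or by swapping the roles of X and Y:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Made-up data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]

r = pearson_r(x, y)
r_scaled = pearson_r([10 * a + 3 for a in x], y)   # change of units for x
r_swapped = pearson_r(y, x)                        # swap the roles of x and y
# r, r_scaled, and r_swapped are all identical
```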

Graphs and Values of Correlation

Correlation Depends on Ranges of X and Y

Graph B contains only the graph A points inside the ellipse, so the correlation is reduced in graph B. Thus, correlations for the same quantities X and Y may be quite different in different study populations.
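The range effect is easy to demonstrate numerically. A small sketch with fabricated data: y follows x plus a fixed zero-mean wiggle, and restricting attention to a narrow band of x values (the analogue of graph B) shrinks r:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Fabricated "population": strong linear trend plus alternating scatter
x = list(range(1, 21))
y = [xi + (1.5 if i % 2 == 0 else -1.5) for i, xi in enumerate(x)]

r_full = pearson_r(x, y)                 # correlation over the full range of x
narrow = [(a, b) for a, b in zip(x, y) if 8 <= a <= 13]
r_narrow = pearson_r([a for a, _ in narrow], [b for _, b in narrow])
# Same scatter, but less x (signal) range, so r_narrow < r_full
```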

Regression

Again: Y (vertical) by X (horizontal) scatterplot, as with correlation. See next slide.

X and Y now assume different roles:
Y is an outcome, response, output, or dependent variable
X is an input, predictor, or independent variable

Regression analysis is used to:
Fit a straight line through the scatterplot.
Measure the X-Y association, as does correlation.
Predict Y from X, and assess the precision of the prediction.

Regression Example

X-Y Association

If the slope = 0, then X and Y are not associated. But the slope measured from a sample will never be exactly 0. How different from 0 does a measured slope need to be in order to claim that X and Y are associated?

Test H0: slope = 0 vs. HA: slope ≠ 0, with the rule: claim association (HA) if tc = |slope/SE(slope)| > t ≈ 2. Under this rule there is a 5% chance of claiming an X-Y association that really does not exist.

Note the similarity to the t-test for means: tc = |mean/SE(mean)|. The formula for SE(slope) is in statistics books.
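The test can be carried out from first principles. A minimal sketch with made-up data (a real analysis would use the exact t critical value for n-2 degrees of freedom rather than the rough cutoff of 2):

```python
import math

def slope_test(x, y):
    """Least-squares slope, its standard error, and the t statistic for H0: slope = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))          # residual standard deviation
    se_slope = s / math.sqrt(sxx)
    t = slope / se_slope
    return slope, se_slope, t

# Made-up data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
slope, se_slope, t = slope_test(x, y)
# |t| is far above 2 here, so we would claim an X-Y association
```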

X-Y Association, Continued

Refer to the Regression Example graph above. We are 95% sure that the true line for the X-Y association is within the inner (....) band about the line estimated from our limited sample data.

If our test of H0: slope = 0 vs. HA: slope ≠ 0 results in claiming HA, then the inner (....) band does not include the horizontal line, and vice versa. X and Y are significantly associated.

We can also test H0: ρ = 0 vs. HA: ρ ≠ 0, where ρ is the true correlation estimated by r. The result is identical to that for the slope. Thus, correlation and regression are equivalent methods for assessing whether two variables are linearly associated.
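The equivalence can be verified numerically: the t statistic for the correlation is t = r·sqrt(n-2)/sqrt(1-r²), and it equals the t statistic for the slope exactly. A sketch with made-up data:

```python
import math

# Made-up data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

# t statistic for H0: slope = 0
slope = sxy / sxx
sse = sum((b - (my - slope * mx + slope * a)) ** 2 for a, b in zip(x, y))
t_slope = slope / (math.sqrt(sse / (n - 2)) / math.sqrt(sxx))

# t statistic for H0: rho = 0
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# t_slope and t_corr agree (up to floating-point error)
```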

Prediction from Regression

Again, refer to the Regression Example graph above. The fitted regression line is used for:

1. Predicting y for an individual with a known value of x. We are 95% sure that the individual's true y is between the outer (---) band endpoints vertically above x. This interval is analogous to mean ± 2SD.

2. Predicting the mean y for "all" subjects with a known value of x. We are 95% sure that this mean is between the inner (....) band endpoints vertically above x. This interval is analogous to mean ± 2SE.
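The two bands come from two different standard errors. A sketch (made-up data) computing both at a chosen x0; the prediction (outer-band) SE always exceeds the mean (inner-band) SE because it adds the individual scatter s, mirroring the SD-vs-SE analogy above:

```python
import math

def interval_ses(x, y, x0):
    """Fitted value at x0, plus the SE for the mean of Y at x0 (inner band)
    and the SE for a new individual Y at x0 (outer band)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - slope * mx
    s = math.sqrt(sum((b - (b0 + slope * a)) ** 2 for a, b in zip(x, y)) / (n - 2))
    fit = b0 + slope * x0
    se_mean = s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)        # inner (....) band
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)    # outer (---) band
    return fit, se_mean, se_pred

# Made-up data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
fit, se_mean, se_pred = interval_ses(x, y, 5.0)
ci = (fit - 2 * se_mean, fit + 2 * se_mean)   # rough 95% interval for the mean y at x=5
pi = (fit - 2 * se_pred, fit + 2 * se_pred)   # rough 95% interval for an individual y at x=5
```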

Example Software Output

The regression equation is: Y = b0 + b1 X

Predictor   Coeff    StdErr    T       P
Constant    ...      ...       ...     < ...
X           ...      ...       ...     < ...

S = ...    R-Sq = 79.0%

Predicted Values at X = 100:
Fit: ...   SE(Fit): ...   95% CI: ...   95% PI: ...

The 95% CI gives the range of Y values, with 95% assurance, for the mean of all subjects with x = 100; the 95% PI gives the range for an individual with x = 100.

tc = slope/SE(slope) = 2.16/0.112 ≈ 19.3. It should be between about -2 and 2 if the true slope were 0, so X and Y are significantly associated.

Regression Issues

1. We are assuming that the relation is linear.
2. We can generalize to more complicated non-linear associations.
3. Transformations, e.g., logarithmic, can be made to achieve linearity.
4. The vertical distances between the actual y's and the predicted y's (on the line) are called "residuals". Their magnitude should not depend on the value of x (e.g., they should not tend to be larger for larger x), and they should be symmetrically distributed about 0. If not, transformations can often achieve this.
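Point 4 can be checked directly from the fit. A minimal sketch (fabricated data): least-squares residuals always average exactly 0 by construction, so what must actually be examined, by eye or by a formal test, is whether their spread changes with x:

```python
def residuals(x, y):
    """Residuals y - (b0 + b1*x) from the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]

# Made-up data for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
res = residuals(x, y)
# Least squares forces the residuals to sum to 0; a fan shape
# (larger |residual| at larger x) is what transformations should fix.
```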

Multiple Regression: Geometric View

"Multiple" refers to using more than one X (say X1 and X2) simultaneously to predict Y. Geometrically, this is fitting a slanted plane to a cloud of points.

Graph from the readings. LHCY is the Y (homocysteine) to be predicted from the two X's: LCLC (folate) and LB12 (B12).

LHCY = b0 + b1·LCLC + b2·LB12

Multiple Regression: More General

More than 2 predictors can be used. The equation is for a "hyperplane":

y = b0 + b1x1 + b2x2 + ... + bkxk

A more realistic functional form, more complex than a plane, can be used. For example, to fit curvature for x2, use:

y = b0 + b1x1 + b2x2 + b3x2²

If the predictors themselves are highly correlated, then the fitted equation is imprecise. [This is because the x1 and x2 data then lie almost along a line in the x1-x2 plane, so the fitted plane is like an unstable tabletop with the table legs not well-spaced.]

How many and which variables to include? Prediction strategies (e.g., stepwise selection) differ from assessing the "significance" of factors.
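Fitting such a plane means solving the normal equations (X'X)b = X'y. A minimal sketch with fabricated data lying exactly on a known plane, solved here by Cramer's rule (fine for 3 coefficients; real software uses more numerically stable methods):

```python
def det3(m):
    """Determinant of a 3x3 matrix, by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(x1, x2, y):
    """Least-squares b0, b1, b2 for y = b0 + b1*x1 + b2*x2,
    solving the normal equations (X'X)b = X'y by Cramer's rule."""
    n = len(y)
    cols = [[1.0] * n, list(x1), list(x2)]
    A = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(3)]
         for i in range(3)]
    v = [sum(c * yi for c, yi in zip(cols[i], y)) for i in range(3)]
    d = det3(A)          # shrinks toward 0 when x1 and x2 are nearly collinear
    coeffs = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = v[i]   # replace column k by X'y
        coeffs.append(det3(Ak) / d)
    return coeffs

# Fabricated data lying exactly on the plane y = 3 + 2*x1 - x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3 + 2 * a - b for a, b in zip(x1, x2)]
b0, b1, b2 = fit_plane(x1, x2, y)   # recovers (3, 2, -1)
```

The collinearity warning above is visible in this code: if x1 and x2 were nearly proportional, d would be close to 0 and the divisions would make the coefficients wildly sensitive to small changes in y, which is the "wobbly tabletop" picture.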

Reading Example: HDL Cholesterol

Parameter    Estimate   Std Error   T     Pr > |t|   Standardized Estimate
Intercept    ...
AGE          ...
BMI          ...
BLC          ...
PRSSY        ...
DIAST        ...
GLUM         ...
SKINF        ...
LCHOL        0.311      ...

The predictors are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol.

LHDL = b0 + bAGE·Age + … + 0.311·LCHOL

Reading Example: Coefficients

Interpretation of the coefficients ("parameter estimates") from the fitted equation LHDL = b0 + bAGE·Age + … + 0.311·LCHOL on the previous slide:

1. The entire equation is needed for making predictions.
2. Each coefficient measures the difference in expected LHDL between 2 subjects if the factor differs by 1 unit between the two subjects, and if all other factors are the same. E.g., expected LHDL is lower in a subject whose BMI is 1 unit greater, but who is the same as another subject on all other factors.
3. The situation in (2) may be unrealistic, or even impossible.
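Point 2 is simply a property of a linear equation. A sketch with hypothetical coefficients: only the 0.311 LCHOL coefficient appears in the slide, and the intercept and BMI coefficient below are made up (the BMI coefficient is taken negative only to match the slide's statement that expected LHDL is lower at higher BMI):

```python
# Hypothetical coefficients for illustration; only 0.311 (LCHOL) is from the slide.
b0, b_bmi, b_lchol = 1.0, -0.02, 0.311

def predicted_lhdl(bmi, lchol):
    return b0 + b_bmi * bmi + b_lchol * lchol

# Two subjects identical except that BMI differs by 1 unit:
diff = predicted_lhdl(26, 5.0) - predicted_lhdl(25, 5.0)
# diff equals the BMI coefficient b_bmi, so expected LHDL is
# lower for the subject with the higher BMI
```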

Reading Example: Predictors

P-values measure the "independent" effect of a factor; i.e., whether it is associated with the outcome (LHDL here) after accounting for all of the other effects in the model.

Which factors should be included in the equation? Should we simply remove those that are not significant at the 0.05 level? In general, it depends on the goal:

For prediction, more predictors → less bias, but less precision. "Stepwise methods" balance this trade-off.

For assessing the importance of a particular factor, we need to include that factor along with other factors that are either biologically or statistically related to the outcome.