Biostatistics in Practice, Session 5: Associations and Confounding. Youngju Pak, Ph.D., Biostatistician. http://research.LABioMed.org/Biostat
Revisiting the Food Additives Study (from Table 3): the table reports both unadjusted and adjusted estimates. What does "adjusted" mean? How is it done?

Goal One of Session 5. Earlier sessions: compare means of a single measure among groups, using t-tests or ANOVA. Session 5: relate two or more measures, using correlation or regression. [Figure: scatter plot with fitted slope ΔY/ΔX, from Qu et al. (2005), JCEM 90]

Goal Two of Session 5. Try to isolate the effects of different characteristics on an outcome. In the previous slide: gender and BMI on GH peak.

Correlation. The standard English word "correlate": (a) to establish a mutual or reciprocal relation between; (b) to show correlation or a causal relationship between. In statistics, it has a more precise meaning.

Correlation in Statistics. Correlation: a measure of the strength of LINEAR association. Positive correlation: the two variables move in the same direction; as one variable increases, the other also tends to increase LINEARLY, and vice versa. Example: weight vs. height. Negative correlation: the two variables move opposite to each other; as one variable increases, the other tends to decrease LINEARLY (an inverse relationship). Example: physical activity level vs. abdominal height (visceral fat).

Pearson r correlation coefficient. r can be any value from -1 to +1. r = -1 indicates a perfect negative LINEAR relationship between the two variables; r = +1 indicates a perfect positive LINEAR relationship; r = 0 indicates no LINEAR relationship.

Scatter Plot: r = 1.0

Scatter Plot: r = -1.0

Scatter Plot: r = 0

Anemic women (Anemia.sav, n = 20): variables Hb (g/dl) and PCV (%). r expresses how well the data fit a straight line. Here, Pearson's r = 0.673.

Correlations in real data

Logic for the Value of the Correlation

Pearson's r = Σ(X − X̄)(Y − Ȳ) / √[ Σ(X − X̄)² · Σ(Y − Ȳ)² ]

where X̄ and Ȳ are the sample means. Statistical software gives r.
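As a concrete check, the formula above can be computed directly in a few lines of Python. This is a minimal sketch; the height/weight pairs are made-up illustrative data, not from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation, computed directly from the formula:
    r = sum((x - xbar)(y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2))
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xbar) ** 2 for x in xs) *
                    sum((y - ybar) ** 2 for y in ys))
    return num / den

# Hypothetical height (cm) / weight (kg) pairs: a positive correlation
heights = [150, 160, 165, 170, 180]
weights = [52, 60, 63, 70, 80]
print(round(pearson_r(heights, weights), 3))  # prints 0.993
```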

Correlation Depends on the Ranges of X and Y. Graph B contains only the graph A points inside the ellipse; the correlation is reduced in graph B. Thus correlations for the same quantities X and Y may be quite different in different study populations.

Simple Linear Regression (SLR). X and Y now assume unique roles: Y is the outcome (response, output, dependent) variable; X is the input (predictor, explanatory, independent) variable. Regression analysis is used to: measure more than the X-Y association, as with correlation; and fit a straight line through the scatter plot, for: prediction of the mean of Y from X, and estimation of the change in the mean of Y for a unit change in X, i.e., the rate of change of the mean of Y per unit change in X (the slope, or regression coefficient, measures the "effect" of X on Y).

SLR Example. [Figure: fitted line with residuals e_i; least squares minimizes Σe_i². Bands show the range for the mean and the range for individuals.] Statistical software gives all of this information.
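The least-squares fit the figure illustrates has a simple closed form, sketched below in Python. The data points are hypothetical, chosen only to show the mechanics.

```python
def least_squares(xs, ys):
    """Intercept and slope that minimize the sum of squared residuals sum(e_i^2).
    Closed form: slope = Sxy / Sxx, intercept = ybar - slope * xbar,
    where Sxy = sum((x - xbar)(y - ybar)) and Sxx = sum((x - xbar)^2).
    """
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical data lying close to the line y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
print(round(b0, 2), round(b1, 2))  # prints 0.05 1.99
```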

Hypothesis Testing for the True Slope. Test H0: true slope = 0 vs. Ha: true slope ≠ 0, with the rule: claim an association (slope ≠ 0) if t_c = |slope / SE(slope)| > t ≈ 2. There is then a 5% chance of claiming an X-Y association that does not really exist. Note the similarity to the t-test for means: t_c = |mean / SE(mean)|. The formula for SE(slope) is in statistics books.
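A sketch of the test statistic in Python, using the standard textbook formula for SE(slope) (assumed here; the slides only say it is "in statistics books"). The data are hypothetical.

```python
import math

def slope_t_test(xs, ys):
    """t statistic for testing H0: true slope = 0 in simple linear regression.

    Uses the standard formula SE(slope) = sqrt(SSE / (n - 2)) / sqrt(Sxx),
    where SSE is the residual sum of squares and Sxx = sum((x - xbar)^2).
    """
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return slope / se

# Hypothetical data with a clear upward trend: |t_c| is far above ~2,
# so by the rule on this slide we would claim an association (slope != 0).
t_c = slope_t_test([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(abs(t_c) > 2)  # prints True
```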

Example Software Output. The regression equation is Y mean = b0 + b1 X, with fitted slope 2.16 and SE(slope) 0.112 ("Constant" in the output refers to the intercept b0); the output also reports S and R-Sq = 79.0%. Predicted values at X = 100: the fit, SE(fit), a 95% CI (the range of the mean of all subjects with X = 100), and a 95% PI (the range for an individual with X = 100). The t statistic, 2.16/0.112, should be between about -2 and 2 if the true slope = 0; here it is far larger, so we claim an association.

Multiple Regression We now generalize to prediction from multiple characteristics. The next slide gives a geometric view of prediction from two factors simultaneously.

Multiple Linear Regression: A Geometric View. Suppose the multiple predictors are continuous. Geometrically, fitting the model is fitting a slanted plane to a cloud of points: LHCY is the Y (homocysteine) to be predicted from the two X's, LCLC (folate) and LB12 (B12). LHCY = b0 + b1 LCLC + b2 LB12 is the equation of the plane.

Multiple Regression: Software

Output: values of b0, b1, and b2 for LHCY mean = b0 + b1 LCLC + b2 LB12.

How Are Coefficients Interpreted? LHCY mean = b0 + b1 LCLC + b2 LB12. Outcome: LHCY; predictors: LCLC and LB12. LB12 may have both an independent and an indirect (via LCLC) association with LHCY. [Diagram: correlation arrows among LHCY, LCLC, and LB12, with coefficients b1 and b2 in question.]

Coefficients: Meaning of Their Values. LHCY = b0 + b1 LCLC + b2 LB12. The mean LHCY increases by b2 for a 1-unit increase in LB12, if the other factors (LCLC) remain constant, or, equivalently, adjusting for the other factors in the model (LCLC). It may be physiologically impossible to hold one predictor constant while changing the other by 1 unit.
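The "holding other factors constant" reading can be verified mechanically: on the fitted plane, raising LB12 by 1 unit at fixed LCLC moves the predicted mean by exactly b2. The coefficient values below are made-up placeholders, not the fitted values from the slides.

```python
# Hypothetical coefficients for LHCY = b0 + b1*LCLC + b2*LB12 (illustrative only)
b0, b1, b2 = 2.0, -0.5, -0.3

def lhcy_mean(lclc, lb12):
    """Predicted mean LHCY from the fitted plane."""
    return b0 + b1 * lclc + b2 * lb12

# Increase LB12 by 1 unit while holding LCLC constant:
# the predicted mean changes by b2 (up to float rounding)
diff = lhcy_mean(1.0, 3.0) - lhcy_mean(1.0, 2.0)
print(round(diff, 10))  # prints -0.3
```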

Figure 2. Determining the relative and combined explanatory power of age, gender, BMI, ethnicity, and sport type on the markers. (* for age, gender, and BMI.)

Another Example: HDL Cholesterol. Output (www.StatisticalPractice.com) lists a coefficient, standard error, t, and Pr > |t| for the intercept and each predictor: AGE, BMI, BLC, PRSSY, DIAST, GLUM, SKINF, LCHOL (the intercept and BMI rows show Pr > |t| < .0001). The predictors of log(HDL) are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The equation is: Log(HDL) mean = b0 + b1(Age) + … + (coefficient)(LCHOL).

HDL Example: Coefficients. Interpretation of the coefficients on the previous slide: 1. The entire equation is needed for making predictions. 2. Each coefficient measures the difference in mean Log(HDL) between two subjects who differ by 1 unit on that factor, with all other factors the same. E.g., expected Log(HDL) is lower in a subject whose BMI is 1 unit greater but who matches the other subject on the remaining factors. Continued…

HDL Example: Coefficients (continued). 3. The p-values measure how strong the association of a factor with Log(HDL) is when the other factors do not change. This is sometimes expressed as "after accounting for other factors" or "adjusting for other factors", and is called an independent association. SKINF by itself is probably associated with Log(HDL), but its p = 0.42 says that it carries no additional information for predicting Log(HDL) after accounting for other factors such as BMI.

Special Cases of Multiple Regression. So far, our predictors were all measured over a continuum, like age or concentration; this is simply called multiple regression. When some predictors are grouping factors like gender or ethnicity, the regression has other special names: ANOVA and analysis of covariance.

Analysis of Variance. All predictors are grouping factors. One-way ANOVA: only one predictor, which may have just 2 "levels" (such as gender) or more (such as ethnicity). Two-way ANOVA: two grouping predictors, such as decade of age and genotype.

Two-Way ANOVA. Interaction in a two-way ANOVA measures whether the effect of one factor depends on the other factor: it is a difference of differences in the outcome, e.g., (Trt − Control)Female − (Trt − Control)Male. The effect of treatment, adjusted for gender, is a weighted average of the group differences over the two gender groups, i.e., of (Trt − Control)Female and (Trt − Control)Male.
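The difference-of-differences definition of interaction can be illustrated with hypothetical cell means for a 2×2 treatment-by-gender design (all numbers invented for illustration):

```python
# Hypothetical cell means for a 2x2 design (treatment x gender)
means = {
    ("trt", "F"): 12.0, ("ctrl", "F"): 10.0,   # treatment effect in females: +2.0
    ("trt", "M"): 11.0, ("ctrl", "M"): 10.5,   # treatment effect in males:   +0.5
}

effect_f = means[("trt", "F")] - means[("ctrl", "F")]
effect_m = means[("trt", "M")] - means[("ctrl", "M")]

# Interaction = difference of the two treatment-control differences:
# nonzero here, so the treatment effect depends on gender
interaction = effect_f - effect_m
print(effect_f, effect_m, interaction)  # prints 2.0 0.5 1.5
```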

Analysis of Covariance. At least one primary predictor is a grouping factor, such as treatment group, and at least one predictor is continuous, such as age; the continuous predictor is called a "covariate". Interest is often in comparing the groups; the covariate is often a nuisance. Confounder: a covariate that both co-varies with the outcome and is distributed differently among the groups.