
Biostatistics in Practice Session 5: Associations and confounding. Youngju Pak, Ph.D., Biostatistician. http://research.LABioMed.org/Biostat


1 Biostatistics in Practice Session 5: Associations and confounding. Youngju Pak, Ph.D., Biostatistician. http://research.LABioMed.org/Biostat

2 Revisiting the Food Additives Study

3 Unadjusted Adjusted What does “adjusted” mean? How is it done? From Table 3

4 Goal One of Session 5 Earlier: Compare means for a single measure among groups. Use t-test, ANOVA. Session 5: Relate two or more measures. Use correlation or regression. [Figure from Qu et al. (2005), JCEM 90:1563-1569, annotated with Δ and ΔY/ΔX.]

5 Goal Two of Session 5 Try to isolate the effects of different characteristics on an outcome. Previous slide: Gender BMI GH Peak

6 Correlation
- Standard English word "correlate": to establish a mutual or reciprocal relation between; b: to show correlation or a causal relationship between.
- In statistics, correlation has a more precise meaning.

7 Correlation in Statistics
- Correlation: a measure of the strength of LINEAR association.
- Positive correlation: the two variables move in the same direction. As one variable increases, the other also tends to increase LINEARLY, and vice versa. Example: Weight vs. Height.
- Negative correlation: the two variables move in opposite directions. As one variable increases, the other tends to decrease LINEARLY, and vice versa (inverse relationship). Example: Physical activity level vs. abdominal height (visceral fat).

8 Pearson r correlation coefficient
r can be any value from -1 to +1.
- r = -1 indicates a perfect negative LINEAR relationship between the two variables.
- r = +1 indicates a perfect positive LINEAR relationship between the two variables.
- r = 0 indicates that there is no LINEAR relationship between the two variables.

9 Scatter Plot: r = 0

10 Correlations in real data

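11 Logic for Value of Correlation
$$ \text{Pearson's } r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} $$
Here $\bar{X}$ and $\bar{Y}$ are X mean and Y mean. [Figure: scatter plot with regions marked + and -.] Statistical software gives r.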
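The formula on this slide can be checked by hand on a few points. The sketch below is only an illustration: the function name `pearson_r` and all data values are made up, not taken from the presentation. It also shows the point of slides 8 and 9, that r = 0 rules out only a LINEAR relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed directly from the sums in the slide formula."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up heights (cm) and weights (kg): a strong positive linear trend.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 90]
r_positive = pearson_r(heights, weights)

# A perfect but nonlinear (quadratic) relationship: y = x squared.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r_quadratic = pearson_r(xs, ys)

print(round(r_positive, 3), round(r_quadratic, 3))
```

In the second data set Y is completely determined by X, yet the positive and negative cross-products cancel, so r comes out as zero.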

12 Simple Linear Regression (SLR)
- X and Y now assume unique roles: Y is an outcome, response, output, dependent variable. X is an input, predictor, explanatory, independent variable.
- Regression analysis is used to: Measure more than just the X-Y association that correlation gives. Fit a straight line through the scatter plot, for: Prediction of Y mean from X. Estimation of Δ in Y mean for a unit change in X = rate of change of Y mean per unit change in X (slope = regression coefficient, which measures the "effect" of X on Y).

13 SLR Example [Figure: scatter plot with fitted line, residuals e_i, and bands labeled "Range for mean" and "Range for individuals".] Least Squares Method: minimizes Σ e_i^2. Statistical software gives all this info.
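A minimal sketch of what "minimizes Σ e_i^2" means in practice, using the standard closed-form least squares solution for one predictor. The helper names `fit_line` and `sum_sq_residuals` and the five data points are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for Y mean = intercept + slope * X."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def sum_sq_residuals(xs, ys, intercept, slope):
    """Sum of e_i squared, where e_i = observed y minus the line's y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, roughly on a line with slope 2.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
best = sum_sq_residuals(xs, ys, intercept, slope)

# Any other line, for example one with a slightly steeper slope, fits worse.
worse = sum_sq_residuals(xs, ys, intercept, slope + 0.1)

print(intercept, slope, best, worse)
```

Nudging either the intercept or the slope away from the fitted values can only increase the sum of squared residuals, which is the defining property of the least squares line.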

14 Example Software Output
The regression equation is: Y mean = 81.6 + 2.16 X

Predictor   Coeff    StdErr   T       P
Constant    81.64    11.47    7.12    <0.0001    (refers to intercept)
X           2.1557   0.1122   19.21   <0.0001

S = 21.72   R-Sq = 79.0%

Predicted Values:
X     Fit      SE(Fit)   95% CI              95% PI
100   297.21   2.17      292.89 - 301.52     253.89 - 340.52

Annotations:
- Fit: predicted y = 81.6 + 2.16(100).
- 95% CI: range of Ys with 95% assurance for the mean of all subjects with x=100.
- 95% PI: range of Ys with 95% assurance for an individual with x=100.
- T for X: 19.21 = 2.16/0.112; it should be between ~ -2 and 2 if the "true" slope = 0.
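The numbers in this output can be reproduced approximately from one another. The sketch below uses only values printed on the slide, with one assumption: the 95% multiplier `t_mult = 1.99` is a guess, since the slide does not give the sample size or degrees of freedom. The prediction-interval formula used, sqrt(S^2 + SE(Fit)^2), is the usual textbook one and is assumed to match what the software did.

```python
import math

# Values copied from the software output on the slide.
intercept, slope = 81.64, 2.1557
slope_se = 0.1122
s = 21.72          # residual standard deviation "S"
se_fit = 2.17      # SE(Fit) at X = 100
x_new = 100

# ASSUMPTION: 95% t multiplier; the slide does not state the degrees of freedom.
t_mult = 1.99

# Predicted mean of Y at X = 100.
fit = intercept + slope * x_new

# T statistic for the slope: coefficient divided by its standard error.
t_stat = slope / slope_se

# 95% CI for the mean of all subjects with X = 100.
ci = (fit - t_mult * se_fit, fit + t_mult * se_fit)

# 95% PI for one individual: adds individual scatter S to the uncertainty of the fit.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - t_mult * se_pred, fit + t_mult * se_pred)

print(round(fit, 2), round(t_stat, 2))
print([round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

The interval for an individual is much wider than the interval for the mean because it includes the person-to-person scatter S, not just the uncertainty in where the line sits.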

15 Multiple Regression We now generalize to prediction from multiple characteristics. The next slide gives a geometric view of prediction from two factors simultaneously.

16 Multiple Linear Regression: Geometric View Suppose multiple predictors are continuous. LHCY is the Y (homocysteine) to be predicted from the two X's: LCLC (folate) and LB12 (B12). Geometrically, this is fitting a slanted plane to a cloud of points; LHCY = b0 + b1 LCLC + b2 LB12 is the equation of the plane. www.StatisticalPractice.com

17 Multiple Regression: Software

18 Output: Values of b0, b1, and b2 for LHCY mean = b0 + b1 LCLC + b2 LB12

19 How Are Coefficients Interpreted? LHCY mean = b0 + b1 LCLC + b2 LB12, with LHCY as the outcome and LCLC and LB12 as the predictors. [Diagram: LCLC and LB12 linked by "Correlation"; paths to LHCY labeled "b1?" and "b2?".] LB12 may have both an independent and an indirect (via LCLC) association with LHCY.

20 Coefficients: Meaning of their Values LHCY mean = b0 + b1 LCLC + b2 LB12, with LHCY as the outcome and LCLC and LB12 as the predictors. Mean LHCY increases by b2 for a 1-unit increase in LB12 … if other factors (LCLC) remain constant, or … adjusting for other factors in the model (LCLC). It may be physiologically impossible to maintain one predictor constant while changing the other by 1 unit.
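To make "adjusting for other factors" concrete, here is a small simulation sketch. Everything in it is synthetic: the variable names only borrow the slide's labels, and the "true" coefficients (0.5 for LCLC, -0.3 for LB12) and the strength of the LCLC-LB12 correlation are invented for illustration. It compares the slope for LB12 when LCLC is ignored (unadjusted) with the slope when LCLC is in the model (adjusted), using the closed-form least squares solution for two predictors.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Synthetic, correlated predictors and an outcome (all values invented).
lclc = [rng.gauss(0, 1) for _ in range(n)]
lb12 = [0.8 * c + rng.gauss(0, 0.6) for c in lclc]
lhcy = [1.0 + 0.5 * c - 0.3 * b + rng.gauss(0, 0.5)
        for c, b in zip(lclc, lb12)]

def centered_sum(us, vs):
    """Sum of (u - u mean)(v - v mean), the building block of least squares."""
    u_mean = sum(us) / len(us)
    v_mean = sum(vs) / len(vs)
    return sum((u - u_mean) * (v - v_mean) for u, v in zip(us, vs))

s11 = centered_sum(lclc, lclc)
s22 = centered_sum(lb12, lb12)
s12 = centered_sum(lclc, lb12)
s1y = centered_sum(lclc, lhcy)
s2y = centered_sum(lb12, lhcy)

# Unadjusted: simple regression of LHCY on LB12 alone.
b2_unadjusted = s2y / s22

# Adjusted: solve the two-predictor normal equations for b1 and b2.
det = s11 * s22 - s12 ** 2
b1_adjusted = (s22 * s1y - s12 * s2y) / det
b2_adjusted = (s11 * s2y - s12 * s1y) / det

print(b2_unadjusted, b1_adjusted, b2_adjusted)
```

By construction, the adjusted LB12 coefficient should land near -0.3, while the unadjusted slope should be near +0.1, because LB12 partly stands in for the positively associated LCLC. The sign change is a feature of these invented numbers, not a claim about the homocysteine data.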

21 * * for age, gender, and BMI. Figure 2. Determine the relative and combined explanatory power of age, gender, BMI, ethnicity, and sport type on the markers.

22 Another Example: HDL Cholesterol
Output:
            Coefficient   Std Error   t       Pr > |t|
Intercept    1.16448      0.28804      4.04   <.0001
AGE         -0.00092      0.00125     -0.74   0.4602
BMI         -0.01205      0.00295     -4.08   <.0001
BLC          0.05055      0.02215      2.28   0.0239
PRSSY       -0.00041      0.00044     -0.95   0.3436
DIAST        0.00255      0.00103      2.47   0.0147
GLUM        -0.00046      0.00018     -2.50   0.0135
SKINF        0.00147      0.00183      0.81   0.4221
LCHOL        0.31109      0.10936      2.84   0.0051

The predictors of log(HDL) are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The equation is: Log(HDL) mean = 1.16 - 0.00092(Age) +…+ 0.311(LCHOL) www.StatisticalPractice.com

23 HDL Example: Coefficients Interpretation of coefficients on previous slide:
1. Need to use the entire equation for making predictions.
2. Each coefficient measures the difference in mean LHDL between 2 subjects if the factor differs by 1 unit between the two subjects, and if all other factors are the same. E.g., expected LHDL is 0.012 lower in a subject whose BMI is 1 unit greater, but is the same as the other subject on other factors.
Continued …

24 HDL Example: Coefficients Interpretation of coefficients two slides back:
3. P-values measure the strength of evidence for an association of a factor with Log(HDL), if other factors do not change. This is sometimes expressed as "after accounting for other factors" or "adjusting for other factors", and is called independent association. SKINF probably is associated with Log(HDL) when considered alone. Its p=0.42 says that it has no additional info to predict Log(HDL), after accounting for other factors such as BMI.
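Points 1 and 2 of the interpretation can be illustrated with the coefficients from the HDL output. In the sketch below the coefficients are copied from the slide, but the two subjects and all of their measurements are hypothetical, and the units of each predictor are not stated in the source, so the predicted values themselves should not be read as realistic.

```python
# Coefficients copied from the HDL software output (slide 22).
coef = {
    "Intercept": 1.16448, "AGE": -0.00092, "BMI": -0.01205,
    "BLC": 0.05055, "PRSSY": -0.00041, "DIAST": 0.00255,
    "GLUM": -0.00046, "SKINF": 0.00147, "LCHOL": 0.31109,
}

def predict_log_hdl(subject):
    """Point 1: a prediction needs the entire equation, every term included."""
    return coef["Intercept"] + sum(coef[name] * value
                                   for name, value in subject.items())

# HYPOTHETICAL subject; values and units are invented for illustration.
subject_a = {"AGE": 50, "BMI": 25, "BLC": 1.0, "PRSSY": 120,
             "DIAST": 80, "GLUM": 90, "SKINF": 20, "LCHOL": 2.3}

# Second subject: identical except that BMI is 1 unit greater.
subject_b = dict(subject_a, BMI=subject_a["BMI"] + 1)

# Point 2: the difference in predicted Log(HDL) equals the BMI coefficient.
difference = predict_log_hdl(subject_b) - predict_log_hdl(subject_a)

print(round(difference, 5))
```

Whatever values are chosen for the other factors, they cancel in the subtraction, which is why the BMI coefficient can be read as a difference between two otherwise identical subjects.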

25

26 Special Cases of Multiple Regression So far, our predictors were all measured over a continuum, like age or concentration. This is simply called multiple regression. When some predictors are grouping factors like gender or ethnicity, regression has other special names: ANOVA, Analysis of Covariance.
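One way to see why a grouping factor fits into regression is to code it as a 0/1 indicator. The sketch below is an illustration under assumptions: the data and the helper name `fit_line` are made up, and it covers only the simplest case of one two-level grouping factor with no other predictors. In that case the regression slope is the difference between the two group means, the same comparison made by a two-group ANOVA or t-test.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for Y mean = intercept + slope * X."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

# Made-up outcome values for two groups.
group0 = [10.0, 12.0, 14.0]   # indicator = 0
group1 = [15.0, 17.0, 19.0]   # indicator = 1

# Code the grouping factor as a 0/1 predictor and stack the outcomes.
indicator = [0] * len(group0) + [1] * len(group1)
outcome = group0 + group1

intercept, slope = fit_line(indicator, outcome)

mean0 = sum(group0) / len(group0)
mean1 = sum(group1) / len(group1)

# Intercept = mean of the reference group; slope = difference in group means.
print(intercept, slope, mean0, mean1 - mean0)
```

Adding a continuous predictor such as age alongside the indicator would turn this into the analysis of covariance setting named on the slide, where the group difference is adjusted for that covariate.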

