Biostatistics in Practice, Session 5: Associations and Confounding. Youngju Pak, Ph.D., Biostatistician. http://research.LABioMed.org/Biostat

Revisiting the Food Additives Study

From Table 3: unadjusted and adjusted results. What does "adjusted" mean? How is it done?

Goal One of Session 5
Earlier sessions: compare means for a single measure among groups, using the t-test or ANOVA.
Session 5: relate two or more measures to each other, using correlation or regression.
(Figure from Qu et al. (2005), JCEM 90, annotated with ΔY/ΔX.)

Goal Two of Session 5
Try to isolate the effects of different characteristics on an outcome. In the previous slide, the characteristics were gender and BMI, and the outcome was GH peak.

Correlation
In standard English, to "correlate" means (a) to establish a mutual or reciprocal relation between, or (b) to show correlation or a causal relationship between. In statistics, correlation has a more precise meaning.

Correlation in Statistics
Correlation is a measure of the strength of LINEAR association between two variables.
- Positive correlation: the two variables move in the same direction. As one variable increases, the other also tends to increase LINEARLY, and vice versa. Example: weight vs. height.
- Negative correlation: the two variables move opposite to each other. As one variable increases, the other tends to decrease LINEARLY, and vice versa (an inverse relationship). Example: physical activity level vs. abdominal height (visceral fat).

Pearson r Correlation Coefficient
The correlation coefficient r can be any value from -1 to +1.
- r = -1 indicates a perfect negative LINEAR relationship between the two variables.
- r = +1 indicates a perfect positive LINEAR relationship between the two variables.
- r = 0 indicates no LINEAR relationship between the two variables.

Scatter plot: r = 0 (no linear relationship).

Correlations in real data

Logic for the Value of Correlation
Pearson's r = Σ(X − X_mean)(Y − Y_mean) / √[ Σ(X − X_mean)² × Σ(Y − Y_mean)² ]
Statistical software gives r.
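The formula above can be checked directly in, for example, Python. This is a minimal sketch with made-up numbers (not data from the lecture) that compares the hand computation against the library routine:

# Pearson's r computed from the definition and checked against SciPy.
import numpy as np
from scipy import stats

x = np.array([160.0, 165.0, 170.0, 175.0, 180.0])   # e.g., heights (made-up)
y = np.array([55.0, 61.0, 66.0, 72.0, 80.0])        # e.g., weights (made-up)

r_by_hand = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)
r_library, p_value = stats.pearsonr(x, y)           # software gives r (and a p-value)
print(r_by_hand, r_library, p_value)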

Simple Linear Regression (SLR)
X and Y now assume unique roles:
- Y is the outcome, response, output, or dependent variable.
- X is the input, predictor, explanatory, or independent variable.
Regression analysis is used to:
- Do more than measure the X-Y association, which is what correlation does.
- Fit a straight line through the scatter plot, for:
  - Prediction of Y_mean from X.
  - Estimation of the change in Y_mean per unit change in X, i.e., the rate of change of Y_mean as X increases by one unit. This is the slope, or regression coefficient, and it measures the "effect" of X on Y.

SLR Example (figure)
The least-squares method chooses the line that minimizes Σ e_i², the sum of squared residuals e_i. The plot also shows a narrower band for the mean and a wider band for individuals. Statistical software gives all of this information.
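As a small illustration of the least-squares idea, the sketch below (made-up data, with Python assumed as the software) computes the closed-form estimates of the intercept and slope that minimize Σ e_i²:

# Closed-form least-squares estimates for simple linear regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])              # made-up data

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
b0 = y.mean() - b1 * x.mean()                                               # intercept

residuals = y - (b0 + b1 * x)                        # the e_i
print(b0, b1, np.sum(residuals ** 2))                # minimized sum of squared residuals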

Example Software Output
The regression equation is Y_mean = b0 + b1·X. The output lists, for each predictor (Constant and X), its Coeff, StdErr, T, and P, along with S and R-Sq = 79.0%. It also gives predicted values at X = 100: the Fit, SE(Fit), a 95% CI, and a 95% PI.
How to read this output:
- Predicted y = b0 + b1(100).
- The 95% CI is the range of Y values, with 95% assurance, for the mean of all subjects with x = 100; the 95% PI is the corresponding range for an individual with x = 100.
- T = Coeff/StdErr (here 2.16/0.112) and should be between about -2 and 2 if the "true" slope were 0.
- "Constant" refers to the intercept.
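Output of this kind can be reproduced with standard statistical software. The sketch below uses Python's statsmodels on simulated data (the variable names and numbers are invented, not the values behind the table above), including the 95% CI for the mean and the 95% PI for an individual at X = 100:

# OLS fit plus confidence and prediction intervals at X = 100.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(50, 150, 30)
y = 20 + 2.0 * x + rng.normal(scale=15, size=x.size)  # simulated data

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.summary())                                  # Coeff, StdErr, T, P, R-Sq

new_x = sm.add_constant(np.array([100.0]), has_constant="add")
pred = fit.get_prediction(new_x)
print(pred.summary_frame(alpha=0.05))                 # mean_ci_* = 95% CI, obs_ci_* = 95% PI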

Multiple Regression
We now generalize to prediction from multiple characteristics. The next slide gives a geometric view of prediction from two factors simultaneously.

Multiple Linear Regression: Geometric View
Suppose the multiple predictors are continuous. Geometrically, multiple regression fits a slanted plane to a cloud of points. LHCY is the Y (homocysteine) to be predicted from the two X's: LCLC (folate) and LB12 (B12). LHCY = b0 + b1·LCLC + b2·LB12 is the equation of the plane.

Multiple Regression: Software

Output: values of b0, b1, and b2 for LHCY_mean = b0 + b1·LCLC + b2·LB12
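A hedged sketch of how those values might be obtained with Python's statsmodels, assuming the data sit in a file with columns named LHCY, LCLC, and LB12 as on the slide (the file name is hypothetical):

# Multiple linear regression of LHCY on LCLC and LB12.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("homocysteine.csv")                  # hypothetical file
fit = smf.ols("LHCY ~ LCLC + LB12", data=df).fit()

print(fit.params)     # b0 (Intercept), b1 (LCLC), b2 (LB12)
print(fit.summary())  # standard errors, t statistics, p-values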

How Are Coefficients Interpreted?
LHCY_mean = b0 + b1·LCLC + b2·LB12, with LHCY the outcome and LCLC and LB12 the predictors.
(Diagram: LCLC and LB12 each have an arrow to LHCY, labeled b1? and b2?, and there is a correlation between LCLC and LB12.)
LB12 may have both an independent and an indirect (via LCLC) association with LHCY. So what exactly do b1 and b2 measure?

Coefficients: Meaning of Their Values
LHCY = b0 + b1·LCLC + b2·LB12 (outcome and predictors as before).
Mean LHCY increases by b2 for a 1-unit increase in LB12:
- if the other factor (LCLC) remains constant, or
- adjusting for the other factors in the model (LCLC).
It may be physiologically impossible to hold one predictor constant while changing the other by 1 unit.
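To make the "holding other factors constant" reading concrete, here is a tiny sketch with invented coefficient values (not the fitted ones): raising LB12 by one unit while LCLC is fixed changes the predicted mean LHCY by exactly b2.

# The difference in predicted mean LHCY for a 1-unit change in LB12,
# with LCLC held fixed, equals b2 regardless of the value of LCLC.
b0, b1, b2 = 1.57, -0.08, -0.01        # hypothetical coefficients

def predicted_lhcy(lclc, lb12):
    return b0 + b1 * lclc + b2 * lb12

diff = predicted_lhcy(lclc=3.0, lb12=6.0) - predicted_lhcy(lclc=3.0, lb12=5.0)
print(diff)                            # equals b2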

Figure 2. Determine the relative and combined explanatory power of age, gender, BMI, ethnicity, and sport type on the markers. (* for age, gender, and BMI.)

Another Example: HDL Cholesterol
Output (from www.StatisticalPractice.com): a table of Coefficient, Std Error, t, and Pr > |t| for Intercept, AGE, BMI, BLC, PRSSY, DIAST, GLUM, SKINF, and LCHOL; the Intercept and BMI rows show Pr > |t| < .0001.
The predictors of log(HDL) are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The equation has the form Log(HDL)_mean = intercept + (coefficient)(AGE) + … + (coefficient)(LCHOL), with the numeric values taken from the output.
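A sketch of how such output could be produced in Python's statsmodels, using the predictor names from the table; the data file and the existence of an untransformed HDL column are assumptions:

# Multiple regression of log(HDL) on the predictors listed above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hdl_study.csv")                     # hypothetical file
df["LHDL"] = np.log(df["HDL"])                        # assumes an HDL column

fit = smf.ols("LHDL ~ AGE + BMI + BLC + PRSSY + DIAST + GLUM + SKINF + LCHOL",
              data=df).fit()
print(fit.summary())   # Coefficient, Std Error, t, Pr > |t| for each predictor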

HDL Example: Coefficients
Interpretation of the coefficients on the previous slide:
1. You need to use the entire equation for making predictions.
2. Each coefficient measures the difference in mean Log(HDL) between two subjects whose value of that factor differs by 1 unit, if all other factors are the same. E.g., expected Log(HDL) is lower in a subject whose BMI is 1 unit greater but who matches the other subject on all other factors.
(Continued on the next slide.)

HDL Example: Coefficients (continued)
Interpretation of the coefficients two slides back, continued:
3. P-values measure how strong the association of a factor with Log(HDL) is when the other factors do not change. This is sometimes expressed as "after accounting for other factors" or "adjusting for other factors", and is called an independent association. SKINF by itself is probably associated with Log(HDL), but its p = 0.42 says that it carries no additional information for predicting Log(HDL) after accounting for other factors such as BMI.

Special Cases of Multiple Regression
So far, our predictors were all measured over a continuum, like age or concentration; this is simply called multiple regression. When some predictors are grouping factors like gender or ethnicity, the regression has other special names:
- ANOVA (all predictors are grouping factors).
- Analysis of covariance (ANCOVA: a mix of grouping factors and continuous predictors).
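As a sketch of those special cases (the column names outcome, gender, and age are placeholders, not from the slides), the same regression machinery in Python's statsmodels handles a grouping factor via C(), giving ANOVA when all predictors are grouping factors and analysis of covariance when a continuous covariate is added:

# ANOVA and ANCOVA as special cases of regression.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("study_data.csv")                    # hypothetical file

anova_fit = smf.ols("outcome ~ C(gender)", data=df).fit()          # ANOVA
ancova_fit = smf.ols("outcome ~ C(gender) + age", data=df).fit()   # ANCOVA

print(sm.stats.anova_lm(anova_fit, typ=2))            # ANOVA table
print(ancova_fit.summary())                           # adjusted (ANCOVA) coefficients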