Data analysis is largely a search for patterns – that is, for meaningful relations among various items observed - K. Godfrey
Objectives of the study
The starting point of any investigation must be to define its objectives clearly, since these determine the appropriate study design and the type of data needed. Objectives can be grouped into three main types:
(1) Estimation of certain features of a population, e.g., what percentage of children have received a full course of vaccination by their birthday?
(2) Investigation of the relationship or association between a factor of interest and a particular outcome, e.g., (i) Is age related to blood pressure during adulthood? (ii) Are respiratory infections more common among children whose parents smoke?
(3) Evaluation of a drug, therapy or intervention aimed at reducing the incidence (or severity) of disease, e.g., does the administration of BCG reduce the risk of tuberculosis?
Many biomedical studies are designed to explore the relationship between two variables, and specifically to determine whether these variables are independent or dependent. Examples: 1. Obesity and blood pressure: are obesity and blood pressure independent, or do overweight men tend to have high blood pressure?
Purposes of studying the relationship between two continuous variables:
(1) To assess whether the two variables are associated, i.e., whether the values of one variable tend to be higher (or lower) for higher values of the other variable.
(2) To enable the value of one variable to be predicted from any known value of the other variable.
The method of assessing the association between two continuous variables is known as correlation; prediction of one continuous variable from another is done by linear regression.
What is the difference between association and correlation? In everyday usage, association and correlation mean roughly the same thing. In the statistical sense, association is the term used for the relationship between categorical variables, whereas correlation is the term used for the relationship between continuous variables.
Correlation analysis describes the linear association between two continuous variables that vary over their respective ranges of values. Two variables are considered related when a change in one is likely to be accompanied by a change in the other. A simple way to depict correlation is a graphical representation called a “scatter plot”.
The more linear and diagonal the pattern, the stronger the relationship. [Figure: scatter plot with axes BP and BW]
Example
Positive correlation: gestational age (GA) against birth weight of the baby (BW).
Negative correlation: in relation to the prognosis of myocardial infarction (MI), the relationship between left ventricular ejection fraction and the QRS score traced from the ECG.
Correlation can also be quantified by means of various indices.
[Figure: scatter plot panels for r = 0, r = 0.7, r = 1 (perfectly positive) and r = -1 (perfectly negative)]
Different indices of Correlation Coefficient
Pearson’s product moment correlation coefficient (r): used to assess the linear relationship between two normally distributed continuous variables.
Spearman’s rank correlation coefficient (ρ, rho): used to assess two continuous variables, at least one of which is not normally distributed.
Kendall’s rank correlation coefficient (τ, tau): a rank-based measure of association between two ordinal or continuous variables, based on concordant and discordant pairs.
Intra class correlation coefficient (ICC): used to assess agreement between two continuous variables that are measured on the same scale.
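As a rough illustration of how a rank-based index differs from Pearson's r, the sketch below computes Spearman's rho as Pearson's r applied to ranks, with tied values sharing the average of their ranks. The function names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample values are hypothetical and are not taken from the slides.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r = Cov(X, Y) / (SD_x * SD_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy)) / n
    sd_x = sqrt(sum(a * a for a in dx) / n)
    sd_y = sqrt(sum(b * b for b in dy) / n)
    return cov / (sd_x * sd_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y rises with x, but along a curve rather than a line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
r_linear = pearson_r(x, y)    # below 1, because the pattern is curved
rho_rank = spearman_rho(x, y) # essentially 1, because the order is preserved
```

Because the ranks of x and y agree exactly here, rho comes out at essentially 1 even though the points do not lie on a straight line, which is why rank-based indices are preferred when the normality assumption is doubtful.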
How is the correlation coefficient reported and interpreted? The correlation coefficient always lies between -1 and +1 and measures the strength and direction of a linear relationship. A value of +1 indicates a perfect positive relationship and -1 a perfect negative relationship. A value of zero means no linear relationship.
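Calculation of Pearson’s product moment correlation coefficient (r)
$$ r = \frac{\mathrm{Cov}(X, Y)}{SD_x \, SD_y} = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{1}{n}\sum (x - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y - \bar{y})^2}} $$
where Cov(X, Y) is the covariance of X and Y, and SD_x and SD_y are the standard deviations of x and y respectively.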
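The calculation can be sketched in a few lines of Python. This is a minimal illustration that follows the covariance-over-standard-deviations definition with the 1/n convention; the function name `pearson_r` and the sample values are hypothetical and are not the data from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r = Cov(X, Y) / (SD_x * SD_y), using 1/n throughout."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]  # deviations of x from its mean
    dy = [yi - mean_y for yi in y]  # deviations of y from its mean
    cov = sum(a * b for a, b in zip(dx, dy)) / n
    sd_x = sqrt(sum(a * a for a in dx) / n)
    sd_y = sqrt(sum(b * b for b in dy) / n)
    # Note: a constant x or y gives SD = 0 and r is then undefined.
    return cov / (sd_x * sd_y)

# Hypothetical illustrative values, NOT the data from the slides.
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
pressures = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(weights, pressures)
```

The 1/n factors cancel between numerator and denominator, so using n - 1 instead would give the same r.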
Example: Body weight (BW) and systolic blood pressure (BP) in 10 newborn babies. Is there any association between them? [Table: BW and BP for each baby]
Scatter plot [Figure: scatter plot with axes BP and BW]
Calculation of Correlation Coefficient
Table columns: Sr. No; (1) BW (x); (2) BP (y); (3) x - mean = (1) - 2.69; (4) y - mean = (2) - 76.9; (3) × (4) = (x - mean)(y - mean); (3)^2 = (x - mean)^2; (4)^2 = (y - mean)^2.
Summary rows: Total (∑x, ∑y), Mean, Variance, SD.
r = Cov(X, Y) / (SD_x × SD_y) = Cov(X, Y) / (0.684 × 16.92) = 0.92
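95% Confidence Interval of Correlation Coefficient (r)
$$ Z = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right), \qquad SE(Z) = \frac{1}{\sqrt{n - 3}} $$
$$ Z_1 = Z - 1.96 \times SE(Z), \qquad Z_2 = Z + 1.96 \times SE(Z) $$
$$ 95\%\ \text{CI for } r = \frac{e^{2Z_1} - 1}{e^{2Z_1} + 1} \ \text{to}\ \frac{e^{2Z_2} - 1}{e^{2Z_2} + 1} $$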
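The interval can be computed with a short Python sketch based on Fisher's z transformation. It uses r = 0.92 and n = 10 from the worked example; the function name `correlation_ci` and its `z_crit` parameter are hypothetical names chosen for illustration.

```python
from math import exp, log, sqrt

def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for r via Fisher's z transformation.

    z_crit = 1.96 corresponds to a 95% interval.
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = 0.5 * log((1 + r) / (1 - r))  # transform r to the z scale
    se = 1 / sqrt(n - 3)              # standard error of z
    z1 = z - z_crit * se
    z2 = z + z_crit * se

    def back_transform(v):
        # Inverse of the z transformation, returning to the r scale.
        return (exp(2 * v) - 1) / (exp(2 * v) + 1)

    return back_transform(z1), back_transform(z2)

# Values from the worked example: r = 0.92 from n = 10 babies.
lower, upper = correlation_ci(0.92, 10)
```

For these inputs the interval is roughly 0.69 to 0.98. It is not symmetric about 0.92, because the back-transformation compresses values near 1.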
Misuses of Correlation Coefficient
Concluding that there is no relationship from a zero correlation coefficient, when in fact a strong non-linear relationship may exist.
Drawing unwarranted conclusions from a spurious correlation.
Concluding a cause-and-effect relationship when there may be only an indirect relationship.
Concluding agreement between pairs of measurements when they may not have the same values at all points.
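Two of these pitfalls are easy to reproduce numerically. The sketch below uses made-up values (not from the slides) to show a zero coefficient alongside an exact curved relationship, and a coefficient of 1 for paired measurements that never agree; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r = Cov(X, Y) / (SD_x * SD_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy)) / n
    sd_x = sqrt(sum(a * a for a in dx) / n)
    sd_y = sqrt(sum(b * b for b in dy) / n)
    return cov / (sd_x * sd_y)

# Pitfall 1: y is completely determined by x (y = x squared), yet r is zero
# because the relationship is not linear. Hypothetical values.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]
r_curve = pearson_r(x, y)

# Pitfall 2: the second method always reads double the first, so the two
# never agree, yet r is 1. Hypothetical values.
method_a = [1.0, 2.0, 3.0, 4.0, 5.0]
method_b = [2 * v for v in method_a]
r_scaled = pearson_r(method_a, method_b)
```

In the first case a scatter plot would reveal the curve immediately, which is one reason to plot the data before relying on r. In the second, an index of agreement such as the ICC is the appropriate tool, not r.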