CORRELATION ANALYSIS
Let X and Y be random variables. The correlation problem can be modeled by $E(Y_i) = \alpha + \beta\, E(x_i)$. From an experimental point of view, this means that we observe a random vector (X, Y) drawn from some bivariate population.
Recall that if (X, Y) is a bivariate random variable, then the correlation coefficient $\rho$ is defined as
$$\rho = \frac{E\big[(X - \mu_X)(Y - \mu_Y)\big]}{\sqrt{E\big[(X - \mu_X)^2\big]\, E\big[(Y - \mu_Y)^2\big]}}$$
where $\mu_X$ and $\mu_Y$ are the means of the random variables X and Y, respectively.
Definition 19.1. If $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is a random sample from a bivariate population, then the sample correlation coefficient is defined as
$$R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\; \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
The corresponding quantity computed from data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is denoted by r, and it is an estimate of the correlation coefficient $\rho$.
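The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `sample_correlation` and the toy data are assumptions introduced here, and only the standard library is used.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r for paired data (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the sample means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Toy data, made up for illustration only (not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = sample_correlation(xs, ys)
print(f"r = {r:.4f}")
```

Because the toy points lie close to a straight line with positive slope, the printed value should be close to 1.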
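Theorem 19.7. The sample correlation coefficient r satisfies the inequality $-1 \le r \le 1$. Moreover, $r = \pm 1$ if and only if the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, with $n \ge 3$, are collinear.
When (X, Y) has a bivariate normal distribution and $\rho = 0$, the statistic
$$t = \frac{\sqrt{n-2}\; r}{\sqrt{1 - r^2}}$$
has a t-distribution with $n - 2$ degrees of freedom. Hence the test of the null hypothesis $H_o : \rho = 0$ against $H_a : \rho \ne 0$ at significance level $\alpha$ is: "Reject $H_o : \rho = 0$ if $|t| \ge t_{\alpha/2}(n-2)$."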
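The decision rule can be sketched as follows. The function names are hypothetical, and the critical value $t_{\alpha/2}(n-2)$ is assumed to be looked up in a t-table and passed in by the caller, since the Python standard library has no t-quantile function. The values r = 0.60 and n = 27 are made up for illustration.

```python
import math


def correlation_t_statistic(r, n):
    """t = sqrt(n - 2) * r / sqrt(1 - r^2), used to test rho = 0."""
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 degrees of freedom is positive")
    if not -1.0 < r < 1.0:
        raise ValueError("the statistic is undefined for |r| = 1")
    return math.sqrt(n - 2) * r / math.sqrt(1.0 - r * r)


def reject_zero_correlation(r, n, critical_value):
    """Two-sided rule: reject rho = 0 when |t| >= t_{alpha/2}(n - 2)."""
    return abs(correlation_t_statistic(r, n)) >= critical_value


# Illustrative (made-up) inputs: r = 0.60 from n = 27 pairs, alpha = 0.05.
# The tabulated critical value t_{0.025}(25) is about 2.060.
t_value = correlation_t_statistic(0.60, 27)
decision = reject_zero_correlation(0.60, 27, critical_value=2.060)
print(f"t = {t_value:.3f}, reject H0: {decision}")
```

Passing the critical value explicitly keeps the sketch free of third-party dependencies; a statistics library could supply the quantile instead.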
Example 19.11. The following data were obtained in a study of the relationship between the weight and chest size of infants at birth:

x (weight)       2.76   2.17   5.53   4.31   2.30   3.70
y (chest size)   29.5   26.3   36.6   27.8   28.3   28.6

Determine the sample correlation coefficient r and then test the null hypothesis $H_o : \rho = 0$ against the alternative hypothesis $H_a : \rho \ne 0$ at a significance level of 0.01.
Answer: From the above data, we have
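$$S_{xx} = 8.565, \qquad S_{xy} = 18.557, \qquad S_{yy} = 65.788.$$
Hence
$$r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} = \frac{18.557}{\sqrt{(8.565)(65.788)}} = 0.782.$$
The computed t value is given by
$$t = \frac{\sqrt{n-2}\; r}{\sqrt{1 - r^2}} = \frac{\sqrt{6-2}\,(0.782)}{\sqrt{1 - (0.782)^2}} = 2.509.$$
Since $|t| = 2.509 < t_{0.005}(4) = 4.604$, we do not reject the null hypothesis $H_o : \rho = 0$.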
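The arithmetic in this example can be checked with a short script. This is only a sketch: the variable names are chosen here for illustration, the data and the critical value 4.604 are taken from the example, and $S_{xx}$, $S_{xy}$, $S_{yy}$ are assumed to be the sums of squares and cross-products of deviations from the sample means.

```python
import math

# Data from Example 19.11.
x = [2.76, 2.17, 5.53, 4.31, 2.30, 3.70]
y = [29.5, 26.3, 36.6, 27.8, 28.3, 28.6]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products of deviations from the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation coefficient and the test statistic for rho = 0.
r = s_xy / math.sqrt(s_xx * s_yy)
t = math.sqrt(n - 2) * r / math.sqrt(1.0 - r * r)

critical_value = 4.604  # t_{0.005}(4), as quoted in the example
reject = abs(t) >= critical_value

print(f"Sxx = {s_xx:.3f}, Sxy = {s_xy:.3f}, Syy = {s_yy:.3f}")
print(f"r = {r:.3f}, t = {t:.3f}, reject H0: {reject}")
```

Computed at full precision, the results agree with the hand calculation to about two decimal places; small differences in the third decimal arise because the text rounds r to 0.782 before computing t. The conclusion is unchanged: the null hypothesis is not rejected.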