Theme 5. Association 1. Introduction. 2. Bivariate tables and graphs. 3. Quantitative variables: covariance, Pearson correlation coefficient, variance-covariance matrix and correlation matrix. 4. Semiquantitative variables: Spearman coefficient. 5. Qualitative variables: chi-square and Cramer's V indices. 6. Association between variables of different scales. 7. Concept of nonlinear relationships.
Introduction So far we have focused on measures of central tendency, variability, skewness and kurtosis of a single variable. In practice, however, it is common to examine two or more variables together (e.g., the relationship between performance and intelligence). Here we focus on the relationship between two variables, measured on n paired observations, and compute an index of the degree of linear relationship between them: the Pearson linear correlation coefficient.
Graphical representation [Figure: three scatterplots of performance and IQ, titled "Negative linear relation", "No relation" and "Positive linear relation".] Note: The Pearson correlation coefficient measures linear correlation.
Graphical representation [Figure: two scatterplots of performance and IQ, titled "Non-linear relation" and "Linear relation".] Note: The Pearson correlation coefficient measures linear correlation.
Graphical representation [Figure: three scatterplots of performance and IQ, titled "Perfect linear relation", "Strong linear relation" and "Weak linear relation".] Now we need an index that reports the extent to which X and Y are related, and whether the relationship is positive or negative.
Covariance and Pearson’s index Scenario 1: when the linear relationship is positive, Y is typically above its mean when X is above its mean. Scenario 2: when the linear relationship is negative, Y is typically below its mean when X is above its mean. [Figure: two scatterplots of performance and intelligence, one for each scenario.]
Covariance The covariance is the average of the products of the deviations of X and Y from their respective means. In scenario 1 the covariance is positive, and in scenario 2 it is negative, so its sign tells us whether the relationship between X and Y is positive or negative. Problem: the covariance is not a bounded index (e.g., how should a covariance of 6 be interpreted in terms of the degree of association?), because it does not take the variability of the variables into account. So we use another index.
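The formula on the slide was not preserved, so the following is only a sketch under a stated assumption: the descriptive form of the covariance, $s_{xy} = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})$, which divides by n (some texts divide by n - 1 instead). The function name `covariance` and all scores are hypothetical. The example shows the sign of the covariance in the two scenarios, and how its size changes when one variable is merely rescaled.

```python
def covariance(x, y):
    """Descriptive covariance: mean product of deviations (assumes division by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

# Hypothetical scores for five subjects
iq = [90, 100, 110, 120, 130]
performance_up = [4, 5, 6, 8, 7]     # scenario 1: tends to rise with IQ
performance_down = [8, 7, 6, 5, 4]   # scenario 2: tends to fall with IQ

cov_positive = covariance(iq, performance_up)
cov_negative = covariance(iq, performance_down)

# Same data as scenario 1, with performance expressed on a scale 10 times larger
cov_rescaled = covariance(iq, [10 * p for p in performance_up])

print(cov_positive, cov_negative, cov_rescaled)
```

The rescaled covariance is ten times larger even though the relationship itself is unchanged, which is why the covariance on its own says little about the degree of association.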
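Pearson coefficient The Pearson correlation coefficient divides the covariance by the product of the standard deviations of the two variables: $r_{xy} = \frac{s_{xy}}{s_x s_y}$, where $s_{xy}$ is the covariance of X and Y, and $s_x$ and $s_y$ are their standard deviations.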
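As an illustration, the coefficient can be sketched in a few lines. The function name `pearson_r` and the scores are hypothetical, and the covariance and standard deviations are assumed to use the same divisor (n here; the divisor cancels in the ratio).

```python
import math

def pearson_r(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return s_xy / (s_x * s_y)

# Hypothetical scores for five subjects
iq = [90, 100, 110, 120, 130]
performance = [4, 5, 6, 8, 7]

r = pearson_r(iq, performance)

# Rescaling performance changes the covariance but leaves r untouched
r_rescaled = pearson_r(iq, [10 * p for p in performance])

print(round(r, 3), round(r_rescaled, 3))
```

Unlike the covariance, the result is bounded and does not depend on the units of either variable.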
Properties of Pearson’s r Property 1. The Pearson correlation index lies between -1 and +1. A Pearson correlation index of -1 indicates a perfect negative linear relationship. A Pearson correlation index of +1 indicates a perfect positive linear relationship. A Pearson correlation index of 0 indicates no linear relationship. (Notice that an index close to 0 does not rule out some kind of non-linear relationship: the Pearson index only measures linear relationship.)
Properties of Pearson’s r Property 2. The Pearson correlation index does not change in absolute value when we apply a linear transformation to the variables (its sign flips only when one of the transformations has a negative slope). For example, the Pearson correlation between temperature in degrees Celsius and the level of depression is the same as the correlation between temperature in degrees Fahrenheit and the level of depression.
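The temperature example can be checked numerically. This is a minimal sketch with made-up temperatures and depression scores; `pearson_r` is a hypothetical helper written for the illustration.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data: temperature in Celsius and a depression score
celsius = [10, 15, 20, 25, 30]
depression = [7, 6, 6, 4, 3]

# Linear transformation with positive slope: F = 9/5 * C + 32
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

# Linear transformation with negative slope: reverse the temperature scale
reversed_scale = [-c for c in celsius]

r_celsius = pearson_r(celsius, depression)
r_fahrenheit = pearson_r(fahrenheit, depression)
r_reversed = pearson_r(reversed_scale, depression)

print(r_celsius, r_fahrenheit, r_reversed)
```

The Celsius and Fahrenheit correlations agree, while reversing the scale keeps the absolute value and flips the sign.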
More on Pearson’s r Interpretation To interpret the strength of the relationship between the variables under study, we have to consider what we are measuring. In any case, it is very important to draw a scatterplot. For example, in the accompanying scatterplot it is clear that there is no relationship between intelligence and performance. However, the Pearson correlation index computed on these data takes a very high value, caused by the atypical score in the top right corner. [Figure: scatterplot of performance and IQ with one atypical point in the top right corner.]
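The effect of a single atypical score can be reproduced with a small sketch. The data are invented for the illustration: nine subjects arranged so that IQ and performance are exactly uncorrelated, plus one hypothetical outlier that is high on both. `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(ss_x * ss_y)

# Nine hypothetical subjects: every IQ value is paired with every performance value,
# so there is no linear relationship at all
iq = [95, 95, 95, 100, 100, 100, 105, 105, 105]
performance = [4, 5, 6, 4, 5, 6, 4, 5, 6]

r_without_outlier = pearson_r(iq, performance)

# Add one atypical subject, very high on both variables
r_with_outlier = pearson_r(iq + [150], performance + [10])

print(round(r_without_outlier, 3), round(r_with_outlier, 3))
```

A scatterplot of these ten points would immediately show that the high coefficient rests on one observation, which is the reason for drawing the plot before interpreting r.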
More on Pearson’s r Interpretation (2) It is important to note that "correlation does not imply causation". The fact that two variables are highly correlated does not imply that X causes Y or that Y causes X.
More on Pearson’s r Interpretation (3) It is important to note that the Pearson correlation coefficient may be affected by third variables. For example, if we went to a school, measured the children's height and gave them a test of numerical ability, the taller children would also show more numerical ability. Of course, that may be simply because the older children are taller than the younger children. If this "third" variable is controlled (by "partial correlation"), there will hardly be any important relationship between height and numerical ability. There are many cases where a third variable is the cause of a high relationship between X and Y (and it is often difficult to identify). [Figure: scatterplot of numerical ability and height, with the points grouped by age: 6, 8, 10, 12 and 14 years.]
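The slide does not give a formula for the partial correlation. The sketch below assumes the standard first-order expression, $r_{xy \cdot z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$, and uses invented data for ten children in which height and numerical ability both grow with age but are otherwise unrelated. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(ss_x * ss_y)

def partial_r(x, y, z):
    """Correlation between x and y with z held constant (first-order partial)."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: two children at each age
age = [6, 6, 8, 8, 10, 10, 12, 12, 14, 14]
height = [117, 113, 127, 123, 137, 133, 147, 143, 157, 153]   # cm
numerical_ability = [7, 5, 7, 9, 10, 10, 13, 11, 13, 15]      # test score

r_height_ability = pearson_r(height, numerical_ability)
r_height_ability_given_age = partial_r(height, numerical_ability, age)

print(round(r_height_ability, 3), round(r_height_ability_given_age, 3))
```

In these constructed data the raw correlation is very high, yet it vanishes once age is controlled, because age accounts for all of the shared variation.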
More on Pearson’s r Interpretation (4) The value of the Pearson coefficient depends in part on the variability of the group. If we compute the Pearson coefficient between intelligence and performance with all subjects, its value is quite high. However, if we use only the individuals with low IQ (or only those with high IQ) and compute the correlation with performance, the value of the Pearson coefficient will be noticeably lower. A heterogeneous group gives a greater degree of relationship between the variables than a homogeneous group. [Figure: scatterplot of performance and IQ, with the low-IQ and high-IQ subgroups marked.]
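A small numerical sketch of this effect, with invented scores for eight subjects (four with low IQ, four with high IQ). The cut-off of 100 and the helper `pearson_r` are assumptions made only for the illustration.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(ss_x * ss_y)

# Hypothetical scores: performance rises with IQ, with some scatter
iq = [75, 80, 85, 90, 105, 110, 115, 120]
performance = [3, 0, 1, 6, 9, 6, 7, 12]

pairs = list(zip(iq, performance))
low = [(i, p) for i, p in pairs if i < 100]    # homogeneous subgroup: low IQ only
high = [(i, p) for i, p in pairs if i >= 100]  # homogeneous subgroup: high IQ only

r_all = pearson_r(iq, performance)
r_low = pearson_r([i for i, _ in low], [p for _, p in low])
r_high = pearson_r([i for i, _ in high], [p for _, p in high])

print(round(r_all, 3), round(r_low, 3), round(r_high, 3))
```

The scatter around the trend is the same in every group; only the range of IQ differs, and that alone lowers the coefficient in the two subgroups.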
5.4 Other coefficients Of course, it is also possible to measure the degree of relationship between variables that are not quantitative. The case in which the variables X and Y are ordinal: remember that, with variables on an ordinal scale, we can establish an order between the values but we do not know the distances between them. (If we knew the distances between the values, we would have at least an interval scale.) We can calculate the Spearman correlation coefficient or the Kendall correlation coefficient. (We will see the first one.)
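Spearman's rank correlation coefficient What we have is two sequences of ordinal values. The Spearman coefficient is a special case of the Pearson correlation coefficient: it is the Pearson coefficient applied to the ranks. When there are no tied ranks it reduces to $r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the ordinal value of subject i on X and the ordinal value of subject i on Y, and n is the number of subjects.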
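The claim that the Spearman coefficient is a special case of the Pearson coefficient can be checked on a small example. This sketch assumes the usual no-ties formula, $r_s = 1 - 6\sum d_i^2 / (n(n^2 - 1))$, and uses invented ranks for five subjects; the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(ss_x * ss_y)

def spearman_rs(rank_x, rank_y):
    """Spearman coefficient from rank differences (valid when there are no ties)."""
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical ordinal positions of five subjects on X and on Y
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

rs = spearman_rs(rank_x, rank_y)
r_on_ranks = pearson_r(rank_x, rank_y)

print(rs, r_on_ranks)
```

Both routes give the same value here; the shortcut formula only saves arithmetic.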
Spearman's rank correlation coefficient (properties) First. It is bounded, like the Pearson coefficient, between -1 and +1. A Spearman coefficient of +1 means that whoever is first in X is first in Y, whoever is second in X is second in Y, etc. A Spearman coefficient of -1 means that whoever is first in X is last in Y, etc. Second. Its calculation is simple (simpler than that of the Pearson correlation coefficient). However, with computers this is irrelevant these days.
5.5 Qualitative Variables χ² test as a measure of association The chi-square test is a nonparametric test that is used to measure the association between two variables when we have contingency tables. More generally, it is also used to assess the divergence between observed (empirical) scores and predicted (theoretical) scores. In general, the chi-square statistic is obtained as follows: $\chi^2 = \sum \frac{(f_e - f_t)^2}{f_t}$, where $f_e$ are the empirical frequencies and $f_t$ are the theoretical frequencies.
χ² test as a measure of association: The case of 2 qualitative variables The empirical frequencies are those that we have in the contingency table. Now, how do we compute the theoretical frequencies? The process is simple: if both variables are independent, the theoretical frequency of each cell is the product of the total frequency of its row and the total frequency of its column, divided by N. To calculate "chi-square" with crosstabs on the Internet: http://faculty.vassar.edu/lowry/newcs.html
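The computation described above can be sketched as follows. The 2x2 table is invented, and the function names `expected_frequencies` and `chi_square` are hypothetical; the theoretical frequencies follow the rule "row total times column total, divided by N".

```python
def expected_frequencies(observed):
    """Theoretical frequencies under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]

def chi_square(observed):
    """Sum over all cells of (empirical - theoretical)^2 / theoretical."""
    expected = expected_frequencies(observed)
    return sum(
        (fe - ft) ** 2 / ft
        for observed_row, expected_row in zip(observed, expected)
        for fe, ft in zip(observed_row, expected_row)
    )

# Hypothetical contingency table (rows: categories of X, columns: categories of Y)
observed = [
    [30, 10],
    [20, 40],
]

expected = expected_frequencies(observed)
chi2 = chi_square(observed)

print(expected)
print(round(chi2, 3))
```

With row totals of 40 and 60, column totals of 50 and 50, and N = 100, the theoretical frequencies are 20 in the first row and 30 in the second.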
χ² test as a measure of association: derived coefficients and interpretation From the chi-square statistic we can derive a number of measures of association between variables. They quantify the strength of the relationship between two variables. Case of 2x2 tables: phi coefficient, $\phi = \sqrt{\chi^2 / N}$. This index is interpreted analogously to the Pearson coefficient.
χ² test as a measure of association: Other coefficients If we have more than 2 rows or columns: Cramer’s index, $V = \sqrt{\frac{\chi^2}{N \cdot m}}$, where m is the smaller of (number of rows - 1) and (number of columns - 1). This index is interpreted similarly to Pearson’s r (except for the issue of the sign; V is always positive). Note that if the table is 2x2 this index matches the "phi" index (see the previous slide).
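A minimal sketch of Cramer's index, assuming the usual expression $V = \sqrt{\chi^2 / (N \cdot m)}$ with m as defined above. The two contingency tables are invented and the function names are hypothetical. For the 2x2 table m equals 1, so the result is also the phi coefficient.

```python
import math

def chi_square(observed):
    """Chi-square statistic, with theoretical frequencies computed under independence."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, fe in enumerate(row):
            ft = row_totals[i] * column_totals[j] / n
            total += (fe - ft) ** 2 / ft
    return total

def cramers_v(observed):
    """Cramer's V: sqrt(chi-square / (N * m)), m = min(rows - 1, columns - 1)."""
    n = sum(sum(row) for row in observed)
    m = min(len(observed) - 1, len(observed[0]) - 1)
    return math.sqrt(chi_square(observed) / (n * m))

# Hypothetical 2x2 table: here V coincides with phi
table_2x2 = [
    [30, 10],
    [20, 40],
]

# Hypothetical 3x2 table
table_3x2 = [
    [20, 10],
    [10, 20],
    [15, 15],
]

v_2x2 = cramers_v(table_2x2)
v_3x2 = cramers_v(table_3x2)

print(round(v_2x2, 3), round(v_3x2, 3))
```

Because V is built from a square root, it carries no sign: it describes how strong the association is, not its direction.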