Psychology 820: Correlation, Regression & Prediction
Concept of Correlation: A coefficient of correlation (r or ρ, "rho") is a statistical summary of the degree and direction of the relationship or association between two variables (X and Y). Degree of relationship: the magnitude of a correlation ranges from 0 to 1.00. Direction of relationship: in a positive (+) relationship, a high score on X goes with a high score on Y; in a negative (-) relationship, a high score on X goes with a low score on Y.
The Bivariate Normal Distribution: A family of three-dimensional surfaces.
Scatterplots: The chief purpose of the scatterplot is to study the nature of the relationship between two variables. Components of r: the Pearson product-moment correlation.
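The components of the Pearson r can be made concrete with a short sketch. It assumes the usual deviation-score form (sum of cross-products divided by the square root of the product of the two sums of squares); the function name `pearson_r` and the five score pairs are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson r from deviation scores: cross-products over the root of the sums of squares."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))  # how X and Y vary together
    sum_xx = sum(dx * dx for dx in dev_x)                  # variability of X
    sum_yy = sum(dy * dy for dy in dev_y)                  # variability of Y
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical scores, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Here the cross-product sum is 6 and the sums of squares are 10 and 6, so r = 6 / sqrt(60).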
Additional Measures of Relationships: Spearman rank correlation (both X and Y are ranks); phi coefficient (both X and Y are dichotomies); point-biserial coefficient (one dichotomous variable and one continuous measure); biserial correlation (one artificial dichotomy and one continuous measure); tetrachoric coefficient (both X and Y are artificial dichotomies).
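Two of these measures can be sketched as the Pearson formula applied to recoded data: Spearman's coefficient is the Pearson r of the ranks, and phi is the Pearson r of two 0/1 variables. The helper names and the data below are illustrative assumptions; the biserial and tetrachoric coefficients need a normality model and are not covered by this sketch.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

# Spearman: Pearson r computed on ranks (hypothetical monotone, nonlinear data)
scores_x = [1, 2, 3, 4, 5]
scores_y = [1, 8, 27, 64, 125]
spearman = pearson_r(average_ranks(scores_x), average_ranks(scores_y))

# Phi: Pearson r computed on two dichotomies coded 0/1 (hypothetical data)
item_a = [1, 1, 1, 0, 0, 0]
item_b = [1, 1, 0, 1, 0, 0]
phi = pearson_r(item_a, item_b)

print(spearman, round(phi, 4))
```

The same `pearson_r` call with one 0/1 variable and one continuous variable gives the point-biserial coefficient.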
Linear and Curvilinear Relationships: Only the degree of linear relationship is described by r or ρ. If there is a substantial nonlinear relationship between two variables, a different correlation coefficient (such as eta, η) should be used.
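A small sketch of why r can miss a curvilinear relationship. It assumes eta is computed as the correlation ratio, the square root of the between-group sum of squares over the total sum of squares, with Y grouped by each distinct value of X. The U-shaped data are invented for the example.

```python
import math
from collections import defaultdict

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_ratio(x, y):
    """Eta: sqrt(SS_between / SS_total), grouping Y by the distinct values of X."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    grand_mean = sum(y) / len(y)
    ss_total = sum((yi - grand_mean) ** 2 for yi in y)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return math.sqrt(ss_between / ss_total)

# Hypothetical U-shaped data: Y is roughly X squared, two observations per X value
x = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
y = [4.1, 3.9, 1.1, 0.9, 0.1, -0.1, 1.1, 0.9, 4.1, 3.9]

r = pearson_r(x, y)            # essentially zero: no linear trend
eta = correlation_ratio(x, y)  # close to 1: Y is almost fully determined by X
print(round(abs(r), 4), round(eta, 4))
```

Eta needs repeated or grouped X values to be informative; with one observation per X value it is trivially 1.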
Linear Transformations and Correlation: A linear transformation of X or Y does not change the magnitude of the correlation coefficient. This includes transformations to z-scores or T-scores, adding or subtracting a constant, and multiplying or dividing by a non-zero constant. Multiplying or dividing by a negative constant reverses the sign of r but leaves its size unchanged.
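The invariance can be checked numerically. This is a minimal sketch on made-up scores: X is converted to z-scores and to T-scores (assumed here to be 50 + 10z), and is also multiplied by a negative constant to show the sign reversal.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / len(x))

z_x = [(xi - mean_x) / sd_x for xi in x]  # z-scores
t_x = [50 + 10 * z for z in z_x]          # T-scores
flipped_x = [-2 * xi + 3 for xi in x]     # negative multiplier

r_raw = pearson_r(x, y)
r_z = pearson_r(z_x, y)
r_t = pearson_r(t_x, y)
r_flipped = pearson_r(flipped_x, y)

print(round(r_raw, 4), round(r_z, 4), round(r_t, 4), round(r_flipped, 4))
```

The first three values agree; the last has the same size with the opposite sign.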
Effects of Variability on Correlation: The variability (heterogeneity) of the sample has an important influence on r. Range restriction (a more homogeneous sample) tends to lower r.
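Range restriction can be illustrated with a small simulation. Everything in it is an assumption made for the example: a population correlation of 0.6, 2,000 simulated cases, and a selection rule that keeps only cases more than one standard deviation above the mean on X.

```python
import math
import random

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(820)  # fixed seed so the run is repeatable
rho = 0.6         # assumed population correlation

# Simulate bivariate normal scores with correlation rho
x = [random.gauss(0, 1) for _ in range(2000)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]

r_full = pearson_r(x, y)

# Keep only the high scorers on X (a restricted, more homogeneous sample)
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 1.0]
r_restricted = pearson_r([p[0] for p in kept], [p[1] for p in kept])

print(len(kept), round(r_full, 2), round(r_restricted, 2))
```

In the full sample r should land near 0.6, while the selected group shows a noticeably smaller value even though the underlying relationship has not changed.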
Causation and Correlation: Correlation must be carefully distinguished from causation. Related cautions: the third-variable factor and the effect of outliers.
Regression and Prediction: Prediction and correlation are two sides of the same coin. Regression is usually the statistical method of choice when the predicted variable is measured on an ordinal, interval, or ratio scale. Simple linear regression (one IV and one DV) extends to multiple regression (more than one IV).
The Regression Effect: The sons of tall fathers tend to be taller than average, but shorter than their fathers. The sons of short fathers tend to be shorter than average, but taller than their fathers. This is regression to the mean.
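The regression effect follows from the standard-score form of the prediction equation. The worked line below is a sketch; the value r = 0.50 and the father who is 2 standard deviations above the mean are hypothetical numbers chosen for illustration.

```latex
\hat{z}_Y = r \, z_X
\qquad\Longrightarrow\qquad
\lvert \hat{z}_Y \rvert \le \lvert z_X \rvert \quad \text{whenever } \lvert r \rvert \le 1 .

\text{Example (assumed values): } r = 0.50,\; z_X = +2.0
\;\Rightarrow\; \hat{z}_Y = 0.50 \times 2.0 = +1.0 .
```

The predicted son is still above average (+1.0) but closer to the mean than his father (+2.0).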
Regression Equation: Y = bX + c (the equation of a straight line). This line is also called the line of best fit, the least-squares line, or the prediction equation.
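A minimal sketch of how b and c can be obtained by least squares, assuming the usual formulas (slope equals the sum of cross-products over the sum of squares of X, and the line passes through the two means). The function name and the data are illustrative.

```python
def least_squares_line(x, y):
    """Return slope b and intercept c of the least-squares line Y = bX + c."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx      # slope
    c = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b, c

# Hypothetical scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, c = least_squares_line(x, y)
predicted = b * 6 + c  # predicted Y for a new case with X = 6
print(round(b, 2), round(c, 2), round(predicted, 2))  # 0.6 2.2 5.8
```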
Proportion of Variance Interpretation of Correlation: The coefficient of determination (r²) is the proportion of variance in Y that can be accounted for by knowing X and, conversely, the proportion of variance in X that can be accounted for by knowing Y. The coefficient of nondetermination (k² = 1 − r²) is the proportion of variance "not accounted for".
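The variance interpretation can be verified on a small data set. This sketch fits a least-squares line, splits the total sum of squares of Y into a residual part and an explained part, and compares the result with the squared Pearson r. The data and variable names are made up.

```python
import math

# Hypothetical scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)
ss_total = sum((yi - mean_y) ** 2 for yi in y)

# Least-squares line and its residuals
b = sum_xy / sum_xx
c = mean_y - b * mean_x
ss_residual = sum((yi - (b * xi + c)) ** 2 for xi, yi in zip(x, y))

r = sum_xy / math.sqrt(sum_xx * ss_total)
r_squared = 1 - ss_residual / ss_total  # proportion of variance accounted for
k_squared = ss_residual / ss_total      # proportion not accounted for

print(round(r ** 2, 2), round(r_squared, 2), round(k_squared, 2))  # 0.6 0.6 0.4
```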
Homoscedasticity: This assumption means that the variance around the regression line is the same for all values of the predictor variable (X). The plot on the right shows a violation of this assumption. For the lower values on the X-axis, the points are all very near the regression line. For the higher values on the X-axis, there is much more variability around the regression line. In a bivariate normal distribution, the variance of scores on Y is the same for all values of X; this equal variance of Y scores at each value of X is known as homoscedasticity.
Part Correlation: The correlation of X₁ (IQ) with X₂ (achievement posttest) after the portion of the posttest that can be predicted from the pretest has been removed.
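One way to sketch the part correlation is to remove the pretest from the posttest by regression and then correlate IQ with what is left. The scores are invented, the pretest is treated as a third variable, and the closed-form expression at the end is the standard part (semipartial) formula, included as a cross-check.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """Part of y left over after a least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [c - (my + b * (a - mx)) for a, c in zip(x, y)]

# Hypothetical scores for six students
iq = [95, 100, 105, 110, 115, 120]
posttest = [50, 52, 55, 60, 62, 66]
pretest = [40, 45, 43, 50, 55, 52]

# Remove from the posttest whatever the pretest predicts, then correlate with IQ
posttest_residual = residuals(posttest, pretest)
part_r = pearson_r(iq, posttest_residual)

# Cross-check with the closed-form part correlation
r12 = pearson_r(iq, posttest)
r13 = pearson_r(iq, pretest)
r23 = pearson_r(posttest, pretest)
part_r_formula = (r12 - r13 * r23) / math.sqrt(1 - r23 ** 2)

print(round(part_r, 4), round(part_r_formula, 4))
```

Only the posttest is residualized here; IQ is left in its original form.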
Partial Correlation: A simple extension of part correlation. The correlation of X₁ and X₂ with X₃ “held constant”, removed, or partialed out is a partial correlation.
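A sketch of the partial correlation using the standard first-order formula, cross-checked by removing X₃ from both X₁ and X₂ through regression and correlating the two sets of residuals. The variable names and scores are hypothetical.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """Part of y left over after a least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [c - (my + b * (a - mx)) for a, c in zip(x, y)]

# Hypothetical scores
x1 = [95, 100, 105, 110, 115, 120]
x2 = [50, 52, 55, 60, 62, 66]
x3 = [40, 45, 43, 50, 55, 52]

r12 = pearson_r(x1, x2)
r13 = pearson_r(x1, x3)
r23 = pearson_r(x2, x3)

# First-order partial correlation of X1 and X2 with X3 partialed out
partial_r = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Cross-check: remove X3 from BOTH variables, then correlate the residuals
partial_r_check = pearson_r(residuals(x1, x3), residuals(x2, x3))

print(round(partial_r, 4), round(partial_r_check, 4))
```

The difference from the part correlation is that X₃ is removed from both variables rather than from only one of them.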
Multiple Regression: Multiple regression is the statistical method most commonly employed for predicting Y from two or more independent variables.
Multiple Correlation: The correlation between Y and predicted Y, when the prediction is based on two or more independent variables, is termed the multiple correlation (R).
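For the two-predictor case, the multiple correlation can be sketched without matrix algebra. The code below assumes the standard formulas for standardized regression weights with two predictors, builds the predicted scores, and correlates them with Y. The data and names are invented for the example.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def z_scores(values):
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]

# Hypothetical scores: Y predicted from two independent variables
y = [50, 52, 55, 60, 62, 66]
x1 = [95, 100, 105, 110, 115, 120]
x2 = [40, 45, 43, 50, 55, 52]

r_y1 = pearson_r(y, x1)
r_y2 = pearson_r(y, x2)
r_12 = pearson_r(x1, x2)

# Standardized regression weights for two predictors
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Predicted Y in standard-score form, then its correlation with observed Y
y_hat = [beta1 * a + beta2 * b for a, b in zip(z_scores(x1), z_scores(x2))]
multiple_r = pearson_r(y, y_hat)

# Cross-check with the closed-form R squared for two predictors
r_squared_formula = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

print(round(multiple_r, 4), round(math.sqrt(r_squared_formula), 4))
```

With more than two predictors the weights are normally obtained by solving the full set of normal equations, which this sketch does not attempt.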