Correlation and Regression. Q560: Experimental Methods in Cognitive Science, Lecture 13
Correlation and Regression. Correlation and regression are related techniques that differ depending on the type of variable. Fixed variable: the values of the variable are determined by the experimenter, so a replication of the experiment would produce the same values. Random variable: the values of the variable are beyond the experimenter’s control, and we don’t know what they will be until we collect the data. E.g.: running speed (Y) and number of trials to reach criterion (Y) are random variables, whereas the number of food pellets given as reinforcement (X) is fixed by the experimenter.
Correlation and Regression. Technically, regression involves predicting a random variable (Y) from a fixed variable (X). In this situation no sampling error is involved in X, and repeated replications will involve the same values of X, which is what allows for prediction. Correlation describes the situation in which both X and Y are random variables. In this case, the values of X and Y vary from one replication to another, so sampling error is involved in both variables. Unfortunately, this distinction is rarely followed in practice…
Person   X   Y
A        1   1
B        1   3
C        3   2
D        4   5
E        6   4
F        7   5
[Scatter plot of the six (X, Y) points, labeled A to F]
Correlation measures the direction and degree of the relationship between X and Y. 1. Direction: positive (+) or negative (-). Examples: the correlation between beer sales and temperature is positive; the correlation between coffee sales and temperature is negative.
Pearson Correlation Coefficient. The most commonly used correlation is the Pearson correlation, which measures the degree and direction of a linear relationship between two variables. $r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}}$
Pearson Correlation. Previously, SS (the “sum of squared deviations”) was our measure of variability. Now, SP (the “sum of products” of deviations) is our measure of covariability: $SS_X = \sum (X - M_X)^2$ and $SP = \sum (X - M_X)(Y - M_Y)$.
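A minimal sketch of the two sums, assuming the six Person A-F scores (X, Y) listed earlier in this lecture as input. It computes SS and SP from their definitional formulas (deviations from the mean) and checks SP against the computational shortcut SP = ΣXY - (ΣX)(ΣY)/n; the two routes should agree. The function names `ss`, `sp`, and `sp_computational` are illustrative only, not part of SPSS or any library.

```python
def ss(values):
    """Sum of squared deviations from the mean (definitional formula)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def sp(xs, ys):
    """Sum of products of paired deviations (definitional formula)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """Computational shortcut: SP = sum(XY) - sum(X) * sum(Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


# Scores for persons A-F from the table earlier in the lecture
X = [1, 1, 3, 4, 6, 7]
Y = [1, 3, 2, 5, 4, 5]

print(round(ss(X), 2), round(ss(Y), 2), round(sp(X, Y), 2), round(sp_computational(X, Y), 2))
```

Unlike SS, SP can be negative: it is positive when X and Y tend to deviate from their means in the same direction and negative when they deviate in opposite directions.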
Pearson Correlation. Calculation of the Pearson correlation: $r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$
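The formula r = SP / √(SS_X · SS_Y) can be sketched directly in Python. This assumes the Person A-F scores from earlier in the lecture as example input; `pearson_r` is a hypothetical helper name, and SPSS computes the same quantity through its own procedures.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # covariability
    ss_x = sum((x - mx) ** 2 for x in xs)                 # variability of X
    ss_y = sum((y - my) ** 2 for y in ys)                 # variability of Y
    return sp / math.sqrt(ss_x * ss_y)


X = [1, 1, 3, 4, 6, 7]
Y = [1, 3, 2, 5, 4, 5]
print(round(pearson_r(X, Y), 3))  # 0.766
```

Because the denominator is always positive, the sign of r comes entirely from SP, which is why r reports direction as well as degree.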
Pearson Correlation. An example: [Worked-example table with columns X, Y, X², Y², and XY]
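A minimal sketch of how the X, Y, X², Y², and XY columns feed the computational formulas SS = ΣX² - (ΣX)²/n and SP = ΣXY - (ΣX)(ΣY)/n. It assumes the six Person A-F scores from earlier in the lecture as stand-in data, which are not necessarily the values used in the slide's example.

```python
import math

X = [1, 1, 3, 4, 6, 7]
Y = [1, 3, 2, 5, 4, 5]
n = len(X)

# One row per person: X, Y, X^2, Y^2, XY
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(X, Y)]
for row in rows:
    print(*row, sep="\t")

# Column totals
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*rows))
print("Sums:", sum_x, sum_y, sum_x2, sum_y2, sum_xy, sep="\t")

# Computational formulas for SS and SP, then r
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
sp = sum_xy - sum_x * sum_y / n
r = sp / math.sqrt(ss_x * ss_y)
print(f"r = {r:.3f}")
```

The table layout is convenient by hand because only five column totals are needed; no deviation scores have to be computed.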
[Figure with four panels, A) to D)]
Some Issues with Correlation. Nonlinearity: the data may be consistently related, but not in a linear fashion. Outliers: correlation is particularly susceptible to a few extreme scores, so always look at the plot!
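Both issues can be shown numerically in a short sketch. The curved data (Y = X²) and the single extreme point (X = 20, Y = 0) are hypothetical values chosen for illustration; the other six points are the Person A-F scores from earlier in the lecture.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Nonlinearity: Y is perfectly determined by X (Y = X^2), yet r is 0
x_curve = [-2, -1, 0, 1, 2]
y_curve = [x ** 2 for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Outlier: the six Person A-F scores, then the same scores plus one hypothetical extreme point
X = [1, 1, 3, 4, 6, 7]
Y = [1, 3, 2, 5, 4, 5]
r_clean = pearson_r(X, Y)
r_outlier = pearson_r(X + [20], Y + [0])

print(f"curve: r = {r_curve:.2f}")
print(f"clean: r = {r_clean:.2f}, with outlier: r = {r_outlier:.2f}")
```

In the first case r = 0 even though Y is a perfect function of X, and in the second a single point turns a fairly strong positive r negative; neither problem is visible from the coefficient alone.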
Computing correlations with SPSS using semantic associativity metrics
Regression
What does this line accomplish? The relationship between SAT and GPA is “easier to see.” The line is the “central tendency” of the relationship, a simplified description of it, and it can be used for prediction. X = predictor variable; Y = predicted (criterion) variable. The statistical technique for finding the best-fitting straight line for a set of data is called regression.
Regression. Linear equation: $Y = bX + a$, where b = slope and a = y-intercept.
Regression. Regression equation: $\hat{Y} = bX + a$, with $b = \frac{SP}{SS_X}$ and $a = M_Y - bM_X$
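A minimal sketch of fitting the regression line with b = SP / SS_X and a = M_Y - bM_X, assuming the Person A-F scores from earlier in the lecture as data. `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares line Y-hat = bX + a, with b = SP / SS_X and a = M_Y - b * M_X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    a = my - b * mx
    return b, a


X = [1, 1, 3, 4, 6, 7]
Y = [1, 3, 2, 5, 4, 5]
b, a = fit_line(X, Y)
print(f"Y-hat = {b:.2f}X + {a:.2f}")  # Y-hat = 0.50X + 1.50


def predict(x):
    """Predicted Y for a given X using the fitted line."""
    return b * x + a


print(f"{predict(5):.2f}")  # 4.00
```

Because a = M_Y - bM_X, the fitted line always passes through the point (M_X, M_Y), and predict(0) returns the intercept a.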
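Regression. Predicting performance on a statistics test given the number of hours studied: $\text{Predicted score on test} = (\text{mean score on test with no studying}) + \text{weight} \times (\text{number of hours you have studied})$. Here the mean score with no studying plays the role of the intercept a, and the weight is the slope b.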
Regression. Predicting performance on a statistics test given the amount of stress: $\text{Predicted score on test} = (\text{mean score on test with no stress}) + \text{weight} \times (\text{amount of stress})$
Regression. Predicting performance on a statistics test given the amount of stress and hours studied: $\text{Predicted score on test} = (\text{mean score on test with no stress or studying}) + b_1 \times (\text{amount of stress}) + b_2 \times (\text{number of hours studied})$
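With two predictors, b1, b2, and the intercept can still be found from SS and SP sums, by solving the two least-squares equations SS_1·b1 + SP_12·b2 = SP_1Y and SP_12·b1 + SS_2·b2 = SP_2Y. This is a minimal sketch under assumptions: the stress, hours, and score values are hypothetical and were generated without noise from score = 50 - 2·stress + 5·hours so that the recovered weights can be checked, and assigning b1 to stress and b2 to hours follows the order in the slide title. `two_predictor_regression` is a hypothetical helper name.

```python
def two_predictor_regression(x1, x2, ys):
    """Least-squares fit of Y-hat = b1*X1 + b2*X2 + a using SS/SP deviation sums."""
    n = len(ys)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(ys) / n

    def sp(u, mu, v, mv):
        # Sum of products of deviations (equals SS when u and v are the same variable)
        return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

    ss_1 = sp(x1, m1, x1, m1)
    ss_2 = sp(x2, m2, x2, m2)
    sp_12 = sp(x1, m1, x2, m2)
    sp_1y = sp(x1, m1, ys, my)
    sp_2y = sp(x2, m2, ys, my)

    denom = ss_1 * ss_2 - sp_12 ** 2  # zero if X1 and X2 are perfectly correlated
    b1 = (sp_1y * ss_2 - sp_12 * sp_2y) / denom
    b2 = (sp_2y * ss_1 - sp_12 * sp_1y) / denom
    a = my - b1 * m1 - b2 * m2
    return b1, b2, a


# Hypothetical data built from score = 50 - 2*stress + 5*hours (no noise)
stress = [1, 2, 3, 4, 5, 6]
hours = [2, 1, 4, 3, 6, 5]
score = [50 - 2 * s + 5 * h for s, h in zip(stress, hours)]

b1, b2, a = two_predictor_regression(stress, hours, score)
print(f"b1 (stress) = {b1:.2f}, b2 (hours) = {b2:.2f}, a = {a:.2f}")
```

With real, noisy data the weights would only approximate any underlying values, and SPSS reports the same coefficients in its regression output.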
Example [Figure panels B) and C)]
Standard Error of the Estimate. How “good” is this prediction? (How close are the actual Y values to the regression line?) The standard error of estimate measures the standard distance between the regression line and the actual data points.
[Figure: two panels labeled “More error” and “Less error”]
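Regression. The standard error of estimate is very similar to the standard deviation. $SS_{\text{error}} = \sum (Y - \hat{Y})^2$. Standard error of estimate: $\sqrt{\frac{SS_{\text{error}}}{df}} = \sqrt{\frac{SS_{\text{error}}}{n - 2}}$, where df = n - 2 because two values (the slope b and the intercept a) are estimated from the data.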
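A minimal sketch of SS_error and the standard error of estimate, assuming the Person A-F scores from earlier in the lecture and df = n - 2. It fits the line first, then sums the squared distances between each actual Y and its predicted value.

```python
import math

X = [1, 1, 3, 4, 6, 7]
Y = [1, 3, 2, 5, 4, 5]
n = len(X)

# Fit the regression line: b = SP / SS_X, a = M_Y - b * M_X
mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
b = sp / ss_x
a = my - b * mx

# SS_error: squared distances between actual Y and predicted Y-hat
y_hat = [b * x + a for x in X]
ss_error = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))

# Standard error of estimate with df = n - 2 (a and b are both estimated from the data)
se_estimate = math.sqrt(ss_error / (n - 2))
print(f"SS_error = {ss_error:.2f}, standard error of estimate = {se_estimate:.2f}")
```

For these data SS_error = 5.50, and the standard error of estimate is about 1.17, i.e. the actual scores lie roughly one point from the regression line on average.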
[Worked-example table and plot for persons A to E, columns X and Y] $SS_{\text{error}} = \sum (Y - \hat{Y})^2 = 38.67$
Linear regression with SPSS using large-scale word recognition databases. See the regression crib sheet online for interpreting the output.