Psych 230 Psychological Measurement and Statistics Pedro Wolf September 30, 2009
Homework Question 2 a.
Homework 2 b. = c. r² = .93. Ninety-three percent of the variance in test Y can be predicted from test X. 2d. Yes, the large correlation indicates that one can predict scores on test Y from scores on test X.
Homework 6 a – This statement implies a cause and effect relation. – The correlation by itself does not imply this. – It may be that people with more education have more access to mental healthcare and therefore have the opportunity to use it.
Homework 6b. – Again this statement implies a cause and effect relation. – There could be other factors. Children who play an instrument may have parents who stress the value of practice and education. 6c. – This statement is false. – A negative correlation means there is a relationship.
Homework 8 a. Test 4, r = .633; r² = .4007. b. 2 × .4007 = .8014; r = √.8014 = .8952. c. They ask the same type of question.
Last Time…. Correlation r value indicates – Direction of relationship between two variables – Strength of relationship between two variables
Today…. Regression
Correlation tells us about the strength of the relationship between 2 variables. By itself it does not let us predict one score from another. We can use linear regression to do this.
Correlation When you run a correlation you convert everything to z scores: r = (ΣZxZy) / N
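The z-score formula can be tried out directly. The following is a minimal sketch, assuming the standard deviation is computed by dividing by N (which matches dividing the sum of products by N); the function names and the five-student dataset are hypothetical, not from the course.

```python
import math

def z_scores(values):
    """Convert raw scores to z scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """r = (sum of Zx * Zy) / N, the formula from the slide."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical scores for five students on two tests.
test_x = [2, 4, 6, 8, 10]
test_y = [1, 3, 7, 9, 10]

r = pearson_r(test_x, test_y)
print(round(r, 3))
```

Because both variables are standardized first, the result does not depend on the units or scale of the original tests.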
Regression We build on correlation by adding a “line of best fit” to the data. The previous plot was on a standardized scale. Any known X score lets us predict the Y score.
Line of best fit Remember this from high school? Y = mX + b. We use: Y = a + b_y(X) + error, where b_y is the slope of the line, a is the Y intercept (where the line hits the y-axis), and error is the part of Y the line leaves unexplained.
Slope Slope (b_y) is the steepness of the line: change in Y / change in X. The more Y changes for every unit change of X, the steeper the slope.
Y-intercept This is where the line crosses the Y axis. When X = 0, the value of Y is the intercept.
Line of best fit The resulting line comes as close as possible to the existing data points: it minimizes the sum of squared vertical distances between the points and the line.
Determining the Regression Line The following is the formula for determining the slope For the intercept
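As an illustration, here is a minimal sketch of one standard least-squares form for the slope and intercept: slope = Σ(X – Xbar)(Y – Ybar) / Σ(X – Xbar)², intercept = Ybar – slope × Xbar. This is an assumption about which version of the formulas is meant; it is algebraically equivalent to slope = r(Sy/Sx). The function name and data are hypothetical.

```python
def regression_line(x, y):
    """Return (slope, intercept) of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared X deviations.
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp / ss_x
    # The line passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that fall exactly on the line Y = 1 + 2X.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = regression_line(x, y)
print(slope, intercept)
```

The sketch does not guard against the case where every X score is identical, in which the slope is undefined.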
Y prime The line formula gives us the value of Y we would predict if given X. We write this as Y’. We have to distinguish it from the actual Y, because our estimate Y’ is not totally accurate.
Why predict Y? We already have Y scores, and Y’ isn’t as good as Y. But the regression lets you predict new data. Use SAT scores to predict college performance. Use morbidity data to predict longevity of smokers. Use past status of markets to predict their future status.
Making predictions You can rewrite the line formula as: Y’ = Ybar + r(Sy / Sx)(X – Xbar). The slope is the middle term: b_y = r(Sy/Sx). Get the intercept by rearranging: a = Ybar – b_y(Xbar).
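The rearrangement can be written out step by step. This is a sketch of the algebra, assuming the usual least-squares property that the line passes through the point of means (Xbar, Ybar); here \bar{X} and \bar{Y} stand for Xbar and Ybar.

```latex
\begin{align*}
Y' &= a + b_y X \\
a  &= \bar{Y} - b_y \bar{X} \\
Y' &= \bar{Y} - b_y \bar{X} + b_y X
    = \bar{Y} + b_y (X - \bar{X}) \\
   &= \bar{Y} + r \frac{S_y}{S_x} (X - \bar{X})
\end{align*}
```

Read this way, a prediction starts at the mean of Y and moves away from it in proportion to how far X is from its own mean.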
Example Jessica wants to predict her final exam grade from the midterm. She earned a 74 on the midterm. The mean grade on the midterm was 70 and s = 4. In previous years, the mean on the final was 75 and s = 4. The correlation between the two tests was r = .60. What score can Jessica predict? Y’ = Ybar + r(Sy / Sx)(X – Xbar) = 75 + (.60)(4/4)(74 – 70)
Example Y’ = 75 + (.60)(4/4)(74 – 70) = 75 + (.6)(1)(4) = 75 + 2.4 = 77.4. What if the correlation between the midterm and final was 1?
Example Y’ = Ybar + r(Sy / Sx)(X – Xbar) Y’ = 75 + (1)(4/4)(74 – 70) = 75 + 4 = 79. The correlation is perfect here. The difference in score values (74 vs. 79) only reflects the different means of the two tests. The distance from the mean is identical: one standard deviation above on both tests.
Example Y’ = Ybar + r(Sy / Sx) (X – Xbar) What if the correlation between the midterm and final was 0?
Example Y’ = Ybar + r(Sy / Sx)(X – Xbar) Y’ = 75 + (0)(4/4)(74 – 70) = 75. The best prediction is the mean when the variables are uncorrelated, or when the correlation is unknown. When the variables are correlated, regression lets us do better than predicting the mean.
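The three cases in Jessica's example can be checked with a short sketch of Y’ = Ybar + r(Sy / Sx)(X – Xbar). The function and parameter names are hypothetical; the numbers are the ones from the example, and the results should match 77.4, 79, and 75 up to floating-point rounding.

```python
def predict_y(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predicted Y for a given X: Y' = Ybar + r * (Sy / Sx) * (X - Xbar)."""
    slope = r * (sd_y / sd_x)
    return mean_y + slope * (x - mean_x)

# Jessica's midterm score is 74; midterm mean 70, s = 4; final mean 75, s = 4.
for r in (0.60, 1.0, 0.0):
    predicted = predict_y(74, mean_x=70, sd_x=4, mean_y=75, sd_y=4, r=r)
    print(f"r = {r:.2f}: predicted final = {predicted:.1f}")
```

As r shrinks toward 0, the prediction is pulled toward the mean of the final, which is why the uncorrelated case simply returns 75.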
Variation If r = ±1, all variation is explained; if r = 0, none of the variation is explained. The closer the points fall to the regression line, the greater the variation explained.
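The link between r and explained variation can be illustrated numerically. The sketch below, using hypothetical data and a hypothetical function name, fits the least-squares line and compares r² with the proportion of Y's variation that the line accounts for; the two should agree up to rounding.

```python
import math

def r_and_variance_explained(x, y):
    """Return (r, proportion of Y variation explained by the regression line)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)  # total variation in Y
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    # Unexplained variation: squared distances from the points to the line.
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    r = sp / math.sqrt(ss_x * ss_y)
    return r, 1 - ss_residual / ss_y

# Hypothetical scores for five students on two tests.
r, explained = r_and_variance_explained([2, 4, 6, 8, 10], [1, 3, 7, 9, 10])
print(round(r ** 2, 3), round(explained, 3))
```

The sketch assumes both variables actually vary; if all Y scores were equal, the total variation would be zero and the proportion undefined.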
Causation As with correlation, we can’t infer causation with regression. We’re observing variables that correlate, not running experiments. Beware of lurking variables. Another explanation may fit the data better.
Midterm For the midterm you are going to have to integrate what you have learned. You are going to be given one or more research problems with small datasets. – Because all you know how to do right now is descriptive statistics and correlation/regression analyses, they will be correlational designs. You are going to have to run all the descriptive statistics you know (e.g. the mean, standard deviation, range, mode, etc. for the two variables). Draw a scatterplot. You will then calculate the correlation and report whether or not it is significant. You will then do a regression, calculate the slope and intercept, and draw the line of best fit through the scatterplot. – I may give you a value for X and ask you to predict a corresponding Y value given your regression line.
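The descriptive-statistics step for one variable might look like the sketch below, using Python's standard `statistics` module. The dataset is hypothetical, and using the population standard deviation (dividing by N) is an assumption chosen to match r = (ΣZxZy) / N; the midterm is presumably done by hand or with whatever tools the course specifies.

```python
import statistics

# Hypothetical small dataset for one variable (not from the course).
scores = [2, 4, 4, 6, 9]

mean = statistics.mean(scores)
# Population SD (divide by N); an assumption, see the note above.
sd = statistics.pstdev(scores)
score_range = max(scores) - min(scores)
mode = statistics.mode(scores)

print(mean, round(sd, 3), score_range, mode)
```

The same steps would be repeated for the second variable before moving on to the scatterplot, correlation, and regression.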
Homework 2 a-d 10 15