With the growth of internet service providers, a researcher decides to examine whether there is a correlation between the cost of internet service per month (rounded to the nearest dollar) and the degree of customer satisfaction (on a scale of 1 to 10, with 1 being not at all satisfied and 10 being extremely satisfied). The researcher only includes programs with comparable types of services. Determine if customers should be happy about paying more. dollars satisfaction 11 6 18 8 17 10 15 4 9 5 12 3 19 22 2 25
Practice Situation 1: Based on a sample of 100 subjects, you find that the correlation between extraversion and happiness is r = .15. Determine if this value is significantly different from zero. Situation 2: Based on a sample of 600 subjects, you find that the correlation between extraversion and happiness is r = .15. Determine if this value is significantly different from zero.
Step 1 Situation 1 H1: r is not equal to 0 (the two variables are related to each other) H0: r is equal to 0 (the two variables are not related to each other) Situation 2
Step 2 Situation 1 df = 98 t crit = +1.984 and -1.984 Situation 2
Step 3 Situation 1 r = .15 Situation 2
Step 4 Situation 1 Situation 2
Step 5 Situation 1 If tobs falls in the critical region: Reject H0, and accept H1 If tobs does not fall in the critical region: Fail to reject H0 Situation 2
Step 6 Situation 1: Based on a sample of 100 subjects, you find that the correlation between extraversion and happiness is r = .15. Determine if this value is significantly different from zero. There is not a significant relationship between extraversion and happiness. Situation 2: Based on a sample of 600 subjects, you find that the correlation between extraversion and happiness is r = .15. Determine if this value is significantly different from zero. There is a significant relationship between extraversion and happiness.
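The formula from Step 4 did not survive in these notes. As a minimal sketch, assuming the standard t test for a correlation, t = r * sqrt(N - 2) / sqrt(1 - r^2) with df = N - 2, both situations can be checked as follows. The function name `t_for_r` is illustrative, and the cutoff of about 1.96 used for Situation 2 (df = 598) is an assumption, since the slide leaves it blank.

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: r = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t1 = t_for_r(0.15, 100)   # Situation 1, df = 98
t2 = t_for_r(0.15, 600)   # Situation 2, df = 598

# Compare |t obs| with the two-tailed critical value at alpha = .05.
print(round(t1, 2), abs(t1) > 1.984)  # 1.984 is the Step 2 value for df = 98
print(round(t2, 2), abs(t2) > 1.96)   # 1.96 is an assumed cutoff for df = 598
```

The same r = .15 gives a t of about 1.50 with 100 subjects and about 3.71 with 600, which is why only Situation 2 is significant.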
Practice You collect data from 53 females and find the correlation between candy and depression is -.40. Determine if this value is significantly different from zero. You collect data from 53 males and find the correlation between candy and depression is -.50. Determine if this value is significantly different from zero.
Practice You collect data from 53 females and find the correlation between candy and depression is -.40. t obs = 3.12 t crit = 2.00 You collect data from 53 males and find the correlation between candy and depression is -.50. t obs = 4.12
Practice You collect data from 53 females and find the correlation between candy and depression is -.40. You collect data from 53 males and find the correlation between candy and depression is -.50. Is the effect of candy significantly different for males and females?
Hypothesis H1: the two correlations are different H0: the two correlations are not different
Testing Differences Between Correlations Must be independent for this to work
When the population value of r is not zero, the distribution of r values becomes skewed. Easy to fix! Use Fisher’s r transformation (page 746).
Testing Differences Between Correlations Note: what would the z value be if there were no difference between these two values (i.e., if H0 were true)?
Testing Differences Z = -.625 What is the probability of obtaining a Z score of this size or greater if the difference between these two r values were zero? p = .267 If p < .025, reject H0 and accept H1. If p ≥ .025, fail to reject H0. The two correlations are not significantly different from each other!
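The formula slides for this test are missing from these notes. The sketch below assumes the usual procedure for two independent correlations: transform each r with Fisher's transformation, then divide the difference by sqrt(1/(N1 - 3) + 1/(N2 - 3)). The function names and the order of subtraction (males minus females, chosen to match the sign of the Z above) are assumptions.

```python
import math

def fisher_z(r):
    """Fisher's transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def z_diff(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

def upper_tail_p(z):
    """One-tailed probability of a standard normal value of at least |z|."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

z = z_diff(-0.50, 53, -0.40, 53)   # males minus females
p = upper_tail_p(z)
print(round(z, 3), round(p, 3))
```

Computed without table rounding, z comes out near -.63 instead of -.625, and p stays well above .025, so the conclusion is unchanged.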
Remember this: Statistics Needed Need to find the best place to draw the regression line on a scatter plot Need to quantify the cluster of scores around this regression line (i.e., the correlation coefficient)
Regression allows us to predict! (scatter plot)
Straight Line Y = mX + b Where: Y and X are variables representing scores m = slope of the line (constant) b = intercept of the line with the Y axis (constant)
Excel Example
That’s nice but. . . . How do you figure out the best values to use for m and b? First, let’s move into the language of regression.
Regression Equation Y = a + bX Where: Y = value predicted from a particular X value a = point at which the regression line intersects the Y axis b = slope of the regression line X = X value for which you wish to predict a Y value
Practice Y = -7 + 2X What is the slope and the Y-intercept? Determine the value of Y for each X: X = 1, X = 3, X = 5, X = 10
Practice Y = -7 + 2X What is the slope and the Y-intercept? Slope = 2, Y-intercept = -7 Determine the value of Y for each X: X = 1, X = 3, X = 5, X = 10 Answers: Y = -5, Y = -1, Y = 3, Y = 13
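The practice answers can be checked with a few lines of code. This is only an illustration of plugging X values into Y = a + bX; the helper name `predict` is made up for the example.

```python
def predict(a, b, x):
    """Predicted Y from the regression equation Y = a + bX."""
    return a + b * x

a, b = -7, 2   # intercept and slope read off Y = -7 + 2X
predictions = [predict(a, b, x) for x in (1, 3, 5, 10)]
print(predictions)  # [-5, -1, 3, 13]
```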
Finding a and b Uses the least squares method Minimizes error Error = $Y - \hat{Y}$; $\sum (Y - \hat{Y})^2$ is minimized
Error = $Y - \hat{Y}$; $\sum (Y - \hat{Y})^2$ is minimized (scatter plot with two example errors marked: Error = 1 and Error = .5)
Finding a and b Ingredients COVxy Sx2 Mean of Y and X
Regression
Regression Ingredients Mean Y = 4.6 Mean X = 3 COVxy = 3.75 S²x = 2.50
Regression Equation Y = a + bX Equation for predicting smiling from talking: Y = .10 + 1.50(X)
Regression Equation Y = .10 + 1.50(X) How many times would a person likely smile if they talked 15 times?
Regression Equation Y = .10 + 1.50(X) How many times would a person likely smile if they talked 15 times? Y = .10 + 1.50(15) = 22.6
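The slides list the ingredients for a and b, but the formulas themselves are missing here. A minimal sketch, assuming the usual least-squares results b = COVxy / S²x and a = (mean of Y) - b * (mean of X), reproduces the equation above. The function name `slope_intercept` is illustrative.

```python
def slope_intercept(mean_x, mean_y, cov_xy, var_x):
    """Least-squares slope b and intercept a from summary statistics."""
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b

# Ingredients from the smiling (Y) and talking (X) example.
a, b = slope_intercept(mean_x=3, mean_y=4.6, cov_xy=3.75, var_x=2.50)
y_hat = a + b * 15   # predicted smiles for someone who talks 15 times

print(round(a, 2), round(b, 2), round(y_hat, 2))
```

Rounded to two decimals this gives a = 0.1, b = 1.5, and a prediction of 22.6, matching the slide.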
Y = 0.1 + (1.5)X (scatter plot; the regression line is drawn through two predicted points: X = 1, Y = 1.60 and X = 5, Y = 7.60)
Mean Y = 14.50; Sy = 4.43 Mean X = 6.00; Sx= 2.16 Quantify the relationship with a correlation and draw a regression line that predicts aggression.
∑XY = 326 ∑Y = 58 ∑X = 24 N = 4
COV = -7.33 Sy = 4.43 Sx= 2.16
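The step from the sums to the covariance is not shown in these notes. The sketch below assumes the computational formula COVxy = (∑XY - ∑X∑Y / N) / (N - 1) and then r = COVxy / (Sx * Sy); the function name is illustrative, and the value of r is computed here because the slides do not state it.

```python
def covariance_from_sums(sum_xy, sum_x, sum_y, n):
    """Sample covariance from raw sums (computational formula)."""
    return (sum_xy - sum_x * sum_y / n) / (n - 1)

cov = covariance_from_sums(sum_xy=326, sum_x=24, sum_y=58, n=4)
r = cov / (2.16 * 4.43)   # Sx = 2.16 and Sy = 4.43 from the slides

print(round(cov, 2), round(r, 2))
```

The covariance comes out at -7.33, as on the slide, and the correlation is roughly -.77.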
Regression Ingredients Mean Y = 14.5 Mean X = 6 COVxy = -7.33 S²x = 4.67
Regression Equation Y = a + bX Y = 23.92 + (-1.57)X
Y = 23.92 + (-1.57)X (scatter plot with the regression line drawn in; the Y axis runs from 10 to 22)
Hypothesis Testing Have learned: how to calculate r as an estimate of the relationship between two variables; how to calculate b as a measure of the rate of change of Y as a function of X. Next: determine if these values are significantly different from 0.
Testing b The significance tests for r and b are equivalent. If X and Y are related (r), then it must be true that Y varies with X (b). Important to learn b significance tests for multiple regression.
Steps for testing b value 1) State the hypothesis 2) Find t-critical 3) Calculate b value 4) Calculate t-observed 5) Decision 6) Put answer into words
Practice You are interested in whether candy consumption significantly alters a person's depression. Create a graph showing the relationship between candy consumption and depression (note: you must figure out which is X and which is Y).
Practice
          Candy  Depression
Charlie   5      55
Augustus  7      43
Veruca    4      59
Mike      3      108
Violet    4      65
Step 1 H1: b is not equal to 0 H0: b is equal to zero
Step 2 Calculate df = N - 2 = 3. In the t table (page 747), the first column is df; look at an alpha of .05 with two tails. t crit = 3.182 and -3.182
Step 3
          Candy  Depression
Charlie   5      55
Augustus  7      43
Veruca    4      59
Mike      3      108
Violet    4      65
COV = -30.5 N = 5 r = -.81 Sy = 24.82 Sx = 1.52
Step 3 Y = 127 + (-13.26)X b = -13.26 COV = -30.5 N = 5 r = -.81 Sx = 1.52 Sy = 24.82
Step 4 Calculate t-observed b = Slope Sb = Standard error of slope
Step 4 Syx = Standard error of estimate Sx = Standard Deviation of X
Step 4 Sy = Standard Deviation of y r = correlation between x and y
Note
Error = $Y - \hat{Y}$; $\sum (Y - \hat{Y})^2$ is minimized (scatter plot with two example errors marked: Error = 1 and Error = .5)
Step 4 Note: this is the same value as the t-observed for r
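The Step 4 formulas were lost from these notes; only their legends remain. The sketch below assumes the common textbook forms Syx = Sy * sqrt((1 - r^2)(N - 1)/(N - 2)), Sb = Syx / (Sx * sqrt(N - 1)), and t = b / Sb, and compares the result with the t for r. The function name `t_for_slope` is illustrative.

```python
import math

def t_for_slope(b, r, s_x, s_y, n):
    """t statistic for H0: b = 0, built from summary statistics."""
    # Standard error of estimate (spread of scores around the regression line).
    s_yx = s_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))
    # Standard error of the slope.
    s_b = s_yx / (s_x * math.sqrt(n - 1))
    return b / s_b

# Candy (X) and depression (Y) values from Step 3.
t_b = t_for_slope(b=-13.26, r=-0.81, s_x=1.52, s_y=24.82, n=5)

# t for r computed from the same data, for comparison.
t_r = -0.81 * math.sqrt(5 - 2) / math.sqrt(1 - 0.81 ** 2)

print(round(t_b, 2), round(t_r, 2))
```

Both values land at about -2.4; the small gap between them comes from rounding in the summary statistics, not from the tests being different.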
Step 5 If tobs falls in the critical region: Reject H0, and accept H1 If tobs does not fall in the critical region: Fail to reject H0
t distribution tcrit = -3.182 tcrit = 3.182 tobs = -2.39 (falls between the two critical values, so it is not in the critical region)
Practice Page 288 9.18
9.18 The regression equation for faculty shows that the best estimate of starting salary for faculty is $15,000 (intercept). For every additional year, the salary increases on average by $900 (slope). For administrative staff, the best estimate of starting salary is $10,000 (intercept), and for every additional year the salary increases on average by $1,500 (slope). They will be equal at 8.33 years of service.
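The 8.33 years can be verified by setting the two regression lines equal: 15,000 + 900X = 10,000 + 1,500X. A short sketch of that arithmetic, with an illustrative function name:

```python
def break_even_years(intercept_1, slope_1, intercept_2, slope_2):
    """X at which two regression lines predict the same Y."""
    return (intercept_1 - intercept_2) / (slope_2 - slope_1)

# Faculty: Y = 15,000 + 900X; administrative staff: Y = 10,000 + 1,500X.
years = break_even_years(15_000, 900, 10_000, 1_500)
print(round(years, 2))  # 8.33
```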
Practice Page 290 9.23
9.23 r = .68, r′ = .829; r = .51, r′ = .563 Z = .797 p = .2119 The correlations are not significantly different from each other
SPSS Problem #3 Due March 14th Page 287 9.2 9.3 9.10 and create a graph by hand