Prediction II: Assumptions and Interpretive Aspects
Assumptions of Regression
Normal Distribution
– Both variables should be normally distributed.
– For non-normal distributions, we use non-parametric tests.
Continuous Variables
– Variables must be measured on an interval or ratio scale.
– Non-parametric tests are better suited to scores collected on nominal or ordinal scales.
Linearity
– The relation between the two variables should be linear.
Homoscedasticity
– The variability of the actual Y values about Y′ (the predicted value) must be the same for all values of X.
Linearity
– In nonlinear (curvilinear) distributions, r is lower than the true strength of the relationship, so prediction is less successful.
– Some characteristics in nature are curvilinearly related. For such variables, we need more advanced techniques.
– For instance, the relationship between anxiety and success is curvilinear:
  – When anxiety is low, success is low (motivation is low).
  – When anxiety is moderate, success is high (motivation is high and anxiety does not yet have a detrimental effect).
  – When anxiety is high, success is low (the organism is overwhelmed).
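The sketch below shows why r understates a curvilinear relationship. It assumes made-up anxiety scores from 0 to 10 and a hypothetical inverted-U success score that peaks at medium anxiety. Success is completely determined by anxiety here, yet Pearson's r comes out as zero. Correlating success with the squared distance from the midpoint recovers the relationship; this transformation is just one simple option, used here for illustration. The helper `pearson_r` is written for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: success peaks at medium anxiety (5) and drops at both ends.
anxiety = list(range(11))
success = [25 - (a - 5) ** 2 for a in anxiety]

# The linear r misses the perfect (but curved) relationship.
r_linear = pearson_r(anxiety, success)

# Correlating with squared distance from the midpoint straightens the curve.
distance_sq = [(a - 5) ** 2 for a in anxiety]
r_transformed = pearson_r(distance_sq, success)

print(f"linear r = {r_linear:.3f}")            # linear r = 0.000
print(f"transformed r = {r_transformed:.3f}")  # transformed r = -1.000
```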
Homoscedasticity
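One informal way to eyeball the homoscedasticity assumption (equal spread of actual Y values about Y′ at every value of X) is to fit the regression line and then compare the spread of the residuals for the lower and upper halves of X. The sketch below applies this to two small hypothetical data sets. The half-split and the function names are assumptions made for illustration; this is a rough visual aid, not a formal test.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope b and intercept c for Y' = bX + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return b, my - b * mx


def sd(values):
    """Standard deviation using n in the denominator."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def spread_by_half(xs, ys):
    """Residual SD for the lower and the upper half of the X values."""
    b, c = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2
    low = [y - (b * x + c) for x, y in pairs[:half]]
    high = [y - (b * x + c) for x, y in pairs[half:]]
    return sd(low), sd(high)


xs = list(range(1, 11))
# Equal spread: the error alternates between -1 and +1 at every X.
even_ys = [2 * x + (-1) ** x for x in xs]
# Fan shape: the error grows with X, so the spread widens.
fan_ys = [2 * x + (-1) ** x * x for x in xs]

even_low, even_high = spread_by_half(xs, even_ys)
fan_low, fan_high = spread_by_half(xs, fan_ys)
print(f"equal spread: high/low residual SD ratio = {even_high / even_low:.2f}")
print(f"fan shape:    high/low residual SD ratio = {fan_high / fan_low:.2f}")
```

A ratio near 1 is consistent with homoscedasticity. A ratio well above 1 suggests that the variability about Y′ changes with X.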
Interpretive Aspects: Factors Influencing r
Range of Talent
– When the range of Y, X, or both is restricted, r is lower than its true value.
– This is because r depends on both $S^2_{YX}$ and $S^2_Y$, through the ratio $S^2_{YX} / S^2_Y$ in formula B. For a least-squares line, $r^2 = 1 - S^2_{YX} / S^2_Y$.
– If we restrict the variance of Y, for instance, the standard error of prediction $S_{YX}$ stays about the same while $S^2_Y$ shrinks. The ratio therefore grows, and r gets lower.
– See Figure 11.1 on page 195.
– This is what we call the ceiling and floor effect.
[Table: Range of Talent example, Item per minute and Payment ($) for workers A–N; apparent r = 0.84, real r = 0.98]
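The following is a small simulation of the range-of-talent effect. It assumes hypothetical, normally distributed scores with a population r of about .80; the slope 0.8, the noise SD 0.6, the cutoff X > 1, and the seed are all made up. Keeping only the cases above the cutoff shrinks $S_Y$, while the standard error of prediction $S_{YX}$ stays roughly the same, so r drops. With n in every denominator, the sample values satisfy $r^2 = 1 - S^2_{YX} / S^2_Y$ exactly.

```python
import math
import random


def summarize(xs, ys):
    """Return (r, S_Y, S_YX), using n rather than n - 1 in every denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    # Least-squares line Y' = bX + c and its residual (unexplained) variation.
    b = sxy / sxx
    c = my - b * mx
    ss_res = sum((y - (b * x + c)) ** 2 for x, y in zip(xs, ys))
    return r, math.sqrt(syy / n), math.sqrt(ss_res / n)


rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [0.8 * x + 0.6 * rng.gauss(0, 1) for x in xs]
r_full, sy_full, syx_full = summarize(xs, ys)

# Restrict the range of talent: keep only the high scorers on X.
kept = [(x, y) for x, y in zip(xs, ys) if x > 1.0]
r_cut, sy_cut, syx_cut = summarize([x for x, _ in kept], [y for _, y in kept])

print(f"full range: r = {r_full:.2f}, S_Y = {sy_full:.2f}, S_YX = {syx_full:.2f}")
print(f"restricted: r = {r_cut:.2f}, S_Y = {sy_cut:.2f}, S_YX = {syx_cut:.2f}")
```

In this sketch the restriction is made on X. Restricting on Y, or on both variables, also lowers r, but it can bias the fitted line as well, so the X cut keeps the comparison simple.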
Heterogeneity of Samples
– When samples are pooled, the correlation for the aggregated data depends on where the sample values lie relative to one another in both the X and Y dimensions.
– Let's say Professors Aktan and Göktürk each prepared final exams for two courses: Statistics and Int. Research Methods.
– Students always get 20 points higher on Göktürk's exams.
[Figure: Statistics vs. Research scores; Göktürk's students r = 0.95, Aktan's students r = 0.95, pooled r = 0.98]
– Aktan insists on giving his own Statistics exam.
[Figure: Statistics vs. Research scores; Göktürk's students r = 0.95, Aktan's students r = 0.95, pooled r = 0.58]
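This hypothetical simulation pools two groups, each with a within-group r of about .95. The means, SDs, group sizes, and seed are made up, so the pooled values will not match the slides exactly. In the first pooling, the second group is 20 points higher on both exams, so the groups line up along the regression line and the pooled r rises. In the second, the group is 20 points higher on Research only, so the groups sit side by side and the pooled r falls. Reading the Aktan scenario as a shift on Research only is an assumption made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_group(rng, n=1000):
    """Hypothetical (Statistics, Research) scores with r of roughly .95."""
    stats = [rng.gauss(50, 10) for _ in range(n)]
    research = [s + rng.gauss(0, 3.3) for s in stats]
    return stats, research


rng = random.Random(7)
aktan_stats, aktan_research = make_group(rng)
gokturk_stats, gokturk_research = make_group(rng)
r_aktan = pearson_r(aktan_stats, aktan_research)
r_gokturk = pearson_r(gokturk_stats, gokturk_research)

# Case 1: the second group scores 20 points higher on BOTH exams.
r_both = pearson_r(aktan_stats + [s + 20 for s in gokturk_stats],
                   aktan_research + [y + 20 for y in gokturk_research])

# Case 2: the second group scores 20 points higher on Research only.
r_research_only = pearson_r(aktan_stats + gokturk_stats,
                            aktan_research + [y + 20 for y in gokturk_research])

print(f"within groups: r = {r_aktan:.2f} and {r_gokturk:.2f}")
print(f"pooled, shifted on both exams: r = {r_both:.2f}")
print(f"pooled, shifted on Research only: r = {r_research_only:.2f}")
```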
Interpretive Aspects: Regression Equation
The β coefficient shows the slope of the regression line.
– General equation of a straight line: $Y = bX + c$
– Regression of Y on X: $Y' = bX + c$, where $b = r\,(S_Y / S_X)$ and $c = \bar{Y} - b\,\bar{X}$
– To see what β means, let's use two z-score distributions, each with mean 0 and SD 1.
– The intercept is $c = \bar{Z}_Y - b\,\bar{Z}_X$. Since both means are 0, $c = 0$.
– The slope is $b = r\,(S_{Z_Y} / S_{Z_X}) = r\,(1/1) = r$, so $Z'_Y = r Z_X$.
– As you can see, β is equal to r when both variables are in z-score form.
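Here is a quick numerical check of the z-score argument, using a handful of made-up Statistics and Research scores. After both variables are converted to z scores, the least-squares slope equals r and the intercept is 0, up to floating-point error. The function names are illustrative.

```python
import math


def z_scores(values):
    """Convert raw scores to z scores (mean 0, SD 1, using n in the SD)."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def fit_line(xs, ys):
    """Least-squares slope b and intercept c for Y' = bX + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return b, my - b * mx


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw scores for eight students.
statistics_scores = [55, 62, 70, 48, 81, 66, 74, 59]
research_scores = [60, 65, 72, 50, 85, 61, 78, 58]

b, c = fit_line(z_scores(statistics_scores), z_scores(research_scores))
r = pearson_r(statistics_scores, research_scores)
print(f"slope in z form = {b:.4f}, intercept = {c:.4f}, r = {r:.4f}")
```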
– Now, let's say we calculated r between Statistics and Research scores for students of Çağ, ODTÜ and Mersin Universities:
  – Çağ University: r = .82
  – Mersin University: r = .62
  – ODTÜ: r = .35
– Predicted z scores in Research ($Z'_Y = r Z_X$) for each z score in Statistics:

Statistics z            -3      -2      -1       0       1       2       3
Çağ (r = .82)        -2.46   -1.64   -0.82    0.00    0.82    1.64    2.46
Mersin (r = .62)     -1.86   -1.24   -0.62    0.00    0.62    1.24    1.86
ODTÜ (r = .35)       -1.05   -0.70   -0.35    0.00    0.35    0.70    1.05
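Because the regression line in z-score form is $Z'_Y = r Z_X$, each row of the table is just r multiplied by the Statistics z scores. Here is a minimal sketch that regenerates the table from the three correlations; the dictionary layout and names are illustrative. The smaller r is, the closer the predicted z scores stay to 0.

```python
# Correlations between Statistics and Research scores, from the example.
correlations = {"Çağ": 0.82, "Mersin": 0.62, "ODTÜ": 0.35}
z_statistics = [-3, -2, -1, 0, 1, 2, 3]

# In z-score form the regression line is Z'_Y = r * Z_X (slope r, intercept 0).
predicted = {
    university: [round(r * z, 2) for z in z_statistics]
    for university, r in correlations.items()
}

for university, row in predicted.items():
    print(f"{university:<8}" + "".join(f"{value:>8.2f}" for value in row))
```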
Interpretive Aspects: Proportion of Variance in Y Associated with Variance in X
– The correlation coefficient has a special meaning: its square equals the proportion of variance in Y that is explained by the variance in X. This is the explained variance.
– $r^2$ = proportion of explained variance
– $1 - r^2$ = proportion of unexplained variance
– Let's say the correlation between depression and GPA is .67. Then $r^2 = .67^2 \approx .45$, so variation in depression accounts for about 45% of the variation in GPA, and about 55% remains unexplained.
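Here is a minimal sketch of the explained-variance idea, assuming made-up depression and GPA scores for eight students. The share of the total variation in GPA that the regression line accounts for, $1 - SS_{\text{unexplained}} / SS_{\text{total}}$, equals $r^2$. The last line reproduces the arithmetic of the example, $.67^2 \approx .45$.

```python
import math

# Hypothetical scores for eight students.
depression = [10, 14, 8, 20, 16, 12, 18, 6]
gpa = [3.4, 3.0, 3.6, 2.2, 2.9, 3.1, 2.6, 3.5]

n = len(depression)
mx, my = sum(depression) / n, sum(gpa) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(depression, gpa))
sxx = sum((x - mx) ** 2 for x in depression)
ss_total = sum((y - my) ** 2 for y in gpa)

r = sxy / math.sqrt(sxx * ss_total)
b = sxy / sxx
c = my - b * mx

# Unexplained part: squared distances of actual GPA from predicted GPA.
ss_unexplained = sum((y - (b * x + c)) ** 2 for x, y in zip(depression, gpa))
explained = 1 - ss_unexplained / ss_total

print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, explained proportion = {explained:.3f}")
print(f"r = .67 gives r^2 = {0.67 ** 2:.2f}")  # r = .67 gives r^2 = 0.45
```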
We can see the meaning of this in the figure below.