Published by Ross Walton. Modified over 6 years ago.
1
INFERENTIAL STATISTICS: REGRESSION ANALYSIS AND STANDARDIZATION
© LOUIS COHEN, LAWRENCE MANION AND KEITH MORRISON © 2018 Louis Cohen, Lawrence Manion and Keith Morrison; individual chapters, the contributors
2
STRUCTURE OF THE CHAPTER
Regression analysis (prediction tests for parametric data)
Simple linear regression (predicting the value of one variable from the known value of another variable)
Multiple regression (calculating the different weightings of independent variables on a dependent variable)
Standardized scores (used in calculating regressions and comparing sets of data with different means and standard deviations)
3
REGRESSION
Regression is a statistical technique for modelling the relationship between variables. From knowing the values of one variable we can predict the values of another variable.
4
SAFETY CHECKS FOR REGRESSION ANALYSIS
Sample size: the larger, the better.
Avoid multicollinearity, i.e. avoid strong correlations between the independent variables.
Avoid singularity (where one independent variable is a combination of other independent variables).
Avoid outliers (remove outliers).
The measurements are from a random sample (or at least a probability-based one).
All variables are real numbers (ratio data), or at least the dependent variable must be.
All variables are measured without error.
5
SAFETY CHECKS FOR REGRESSION ANALYSIS
There is an approximately linear (straight-line) relationship between the dependent variable and the independent variable(s).
Normal distribution of the variables.
The residuals for the dependent variable are approximately normally distributed.
Homoscedasticity: the variance of the residuals for the dependent variable is the same across the range of values of all other variables.
The residuals are not strongly correlated with the independent variables.
Each case is independent of the others.
Interaction effects of independent variables are measured.
7
SIMPLE LINEAR REGRESSION
Simple linear regression: the model includes one explanatory (independent) variable and one explained (dependent) variable.
The relationship between examinations and stress.
8
A SIMPLE REGRESSION
9
A SIMPLE REGRESSION (SPSS)
10
A SIMPLE REGRESSION
The beta weighting (β) is ‘the amount of standard deviation unit of change in the dependent variable for each standard deviation unit of change in the independent variable’.
Here the standardized beta weighting is .966, and it is highly statistically significant (see the ‘Sig.’ column). This means that for every standard deviation unit change in the independent variable (‘hours per week on private study’) there is .966 of a standard deviation unit rise in the dependent variable (‘score on final university examination’), i.e. there is nearly a one-to-one correspondence.
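The definition of the beta weighting can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data values are invented for illustration and are not the chapter's data set. It fits a straight line to the z-scores of both variables and shows that, with a single predictor, the standardized beta is the same number as Pearson's correlation coefficient.

```python
import numpy as np

# Hypothetical data, invented for illustration (not the chapter's data set).
hours = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
score = np.array([42.0, 48.0, 55.0, 57.0, 66.0, 70.0])


def zscores(values):
    """Convert raw values to z-scores (sample standard deviation, ddof=1)."""
    return (values - values.mean()) / values.std(ddof=1)


# Slope of the least-squares line fitted to the standardized variables:
# this is the standardized beta weighting.
beta = np.polyfit(zscores(hours), zscores(score), 1)[0]

# With one predictor, the standardized beta equals Pearson's r.
r = np.corrcoef(hours, score)[0, 1]

print(round(beta, 3), round(r, 3))
```

The two printed numbers agree, which is why a standardized beta close to 1 in a simple regression signals a nearly one-to-one relationship in standard deviation units.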
11
MULTIPLE LINEAR REGRESSION
The model is a linear equation with at least two explanatory (independent) variables and one explained (dependent) variable.
12
SAFETY CHECKS FOR MULTIPLE REGRESSION
Sample size, random sampling and parametric data (can be checked before deciding whether to embark on multiple regression)
Collinearity
Normality, linearity and homoscedasticity
Outliers and distributions
Residual scatterplot analysis
13
SOME ASSUMPTIONS IN REGRESSION
Random sampling.
Ratio data.
The removal of outliers: check outliers by calculating the Mahalanobis distance (in SPSS).
The supposed linearity of the measures is justifiable.
Interaction effects of independent variables (in non-recursive models) are measured.
The selection for the inclusion and exclusion of variables is justifiable.
The dependent variable and the residuals (the distances of the cases from the line of best fit) are approximately normally distributed.
14
SOME ASSUMPTIONS IN REGRESSION
The variance of each variable is consistent across the range of values for all other variables (or at least the next assumption is true).
The independent variables are approximately normally distributed, and the variation is even across the levels/values of the variable (homoscedasticity).
Collinearity/multicollinearity is avoided.
Regressions are only as robust as the variables included, and the inclusion or removal of one or more independent variables affects their relative weightings on the dependent variable.
15
USING MULTIPLE REGRESSION
Multiple regression is useful in calculating the relative weighting of two or more independent variables on a dependent variable.
Using the beta (β) weighting, multiple regression calculates how many standard deviation units are changed in the dependent variable for each standard deviation unit of change in each of the independent variables.
For example, let us say that we wished to investigate the relative weighting of ‘hours per week of private study’ and ‘motivation level’ as independent variables acting on the dependent variable ‘score on final university examination’.
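To make the idea of relative weightings concrete, here is a rough sketch, assuming NumPy and using synthetic data generated with invented coefficients (none of the numbers come from the chapter). All three variables are converted to z-scores, so the least-squares coefficients are standardized beta weightings and can be compared directly.

```python
import numpy as np

# Synthetic data; every coefficient below is invented for illustration.
rng = np.random.default_rng(0)
n_cases = 500
hours = rng.normal(30, 8, n_cases)        # hours per week of private study
motivation = rng.normal(5, 2, n_cases)    # motivation level
score = 20 + 0.7 * hours + 1.0 * motivation + rng.normal(0, 3, n_cases)


def zscores(values):
    """Convert raw values to z-scores (sample standard deviation, ddof=1)."""
    return (values - values.mean()) / values.std(ddof=1)


# z-scores have a mean of 0, so no intercept column is needed.
predictors = np.column_stack([zscores(hours), zscores(motivation)])
betas, *_ = np.linalg.lstsq(predictors, zscores(score), rcond=None)

print("beta for hours:", round(betas[0], 3))
print("beta for motivation:", round(betas[1], 3))
```

In this synthetic example the weighting for hours comes out larger than the weighting for motivation, which is the kind of comparison the beta weightings are used for.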
16
(Diagram: ‘Hours of study per week’ and ‘Level of motivation’ acting on ‘Final examination score’.)
17
USING MULTIPLE REGRESSION (SPSS)
The Adjusted R square is .938, i.e. the proportion of the variance in the dependent variable that is explained by the two independent variables is very high (93.8 per cent).
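For readers who want to see how an adjusted R square relates to the ordinary R square, the sketch below uses the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of cases and k the number of independent variables. The function name and the input values are hypothetical; the slide does not report the unadjusted R square or the sample size.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    degrees_of_freedom = n_cases - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (1 - r_squared) * (n_cases - 1) / degrees_of_freedom


# Hypothetical inputs: R^2 = .90 from 50 cases and 2 independent variables.
print(round(adjusted_r_squared(0.90, 50, 2), 3))  # 0.896
```

The adjustment shrinks R square slightly, and shrinks it more when the sample is small or the number of independent variables is large.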
18
USING MULTIPLE REGRESSION
The analysis of variance (ANOVA) is highly statistically significant (Sig. = .000), i.e. the model, in which the two independent variables together predict the dependent variable, is statistically significant.
19
USING MULTIPLE REGRESSION (SPSS)
20
USING MULTIPLE REGRESSION
The independent variable ‘hours per week of private study’ has the strongest positive predictive power (β = .904) on the dependent variable ‘score on final university examination’, and this is statistically significant (the column ‘Sig.’ indicates that the level of significance, at .000, is stronger than .001).
The independent variable ‘motivation level’ also has positive predictive power (β = .104) on the dependent variable ‘score on final university examination’, and this is statistically significant (the column ‘Sig.’ indicates the level of significance at .001).
Though both independent variables have a statistically significant weighting on the dependent variable, the beta weighting of ‘hours per week of private study’ (β = .904) is much higher than that of ‘motivation level’ (β = .104), i.e. ‘hours per week on private study’ is a stronger predictor of ‘score on final university examination’ than ‘motivation level’.
21
USING MULTIPLE REGRESSION
If the hours per week spent in private study and the motivation level of the student are known, then the researcher can predict the likely score on the final university examination. The formula would be:
‘Score on final university examination’ = (β x ‘hours per week on private study’) + (β x ‘motivation level’)
In the example, the β for ‘hours per week on private study’ is 0.904, and the β for ‘motivation level’ is 0.104. These are the relative weightings of the two independent variables.
So, for example, for a student who spends 60 hours per week on private study and has a high motivation level (9), the formula becomes:
‘Score on final university examination’ = (0.904 x 60) + (0.104 x 9) = 54.24 + 0.936 = 55.176
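The arithmetic in the formula can be reproduced in a few lines. This is only a sketch with hypothetical function names. The first function applies the slide's weightings exactly as stated. The second is offered as a cautious aside: beta weightings are expressed in standard deviation units, so a prediction on the raw score scale would normally use the unstandardized B coefficients and the constant, which appear in the SPSS coefficients table elsewhere in this presentation (22.577, .714 and .404).

```python
# Standardized beta weightings quoted on the slide.
BETA_HOURS = 0.904
BETA_MOTIVATION = 0.104

# Unstandardized values from the SPSS coefficients table in this presentation.
CONSTANT = 22.577
B_HOURS = 0.714
B_MOTIVATION = 0.404


def weighted_sum(hours, motivation):
    """The slide's formula: (beta x hours) + (beta x motivation)."""
    return BETA_HOURS * hours + BETA_MOTIVATION * motivation


def raw_score_prediction(hours, motivation):
    """Raw-scale prediction: constant + (B x hours) + (B x motivation)."""
    return CONSTANT + B_HOURS * hours + B_MOTIVATION * motivation


print(round(weighted_sum(60, 9), 3))          # 55.176
print(round(raw_score_prediction(60, 9), 3))  # 69.053
```

The two results differ because the beta weightings describe relative importance, whereas the B coefficients and the constant are tied to the original measurement scales.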
22
For example, if the beta weightings were different, and for different factors, the relationship between examination mark, study time and intelligence could be:
Examination mark = (β x Study time) + (β x Intelligence)
Examination mark = (0.65 x Study time) + (0.30 x Intelligence)
A student with an intelligence score of 110 who studies for 30 hours per week will obtain the following examination mark:
Examination mark = (0.65 x 30) + (0.30 x 110) = 19.5 + 33 = 52.5
If the same student studies for 40 hours then:
Examination mark = (0.65 x 40) + (0.30 x 110) = 26 + 33 = 59
23
STRESS IN TEACHING (SPSS): BETA WEIGHTINGS OF VARIABLES
24
SIX STEPS IN MULTIPLE REGRESSION
1 Conduct those ‘safety checks’ which can be conducted before the calculations proceed (e.g. random sampling, sample size).
2 Run the multiple regression.
3 Conduct the ‘safety checks’ once you have data: collinearity; normality; linearity; homoscedasticity; residuals scatterplot analysis; outliers; standardized residuals values.
4 Note the Adjusted R Square (to see the amount of variance in the dependent variable that is explained by the independent variables).
5 Check ANOVA and its significance level (to see if the model is statistically significant).
6 Note the Standardized Beta coefficients (β) and their statistical significance levels. This tells you the relative weight of each of the independent variables on the dependent variable.
25
STRESS IN TEACHING (SPSS) (removing variables affects beta weightings)
26
COLLINEARITY DIAGNOSTICS (MULTICOLLINEARITY)
Multicollinearity: the correlations between the independent variables should not be too high.
Collinearity diagnostics indicate the level of correlation.
If the collinearity between two variables is too high then it may be advisable to remove one.
Multicollinearity is tested by (a) Tolerance and (b) the Variance Inflation Factor.
27
COLLINEARITY DIAGNOSTICS (MULTICOLLINEARITY)
Tolerance: ‘An indicator of how much of the variability of the specified independent is not explained by the other independent variables in the model ... If this value is very small (less than .10), it indicates that the multiple correlation with other variables is high, suggesting the possibility of collinearity’ (Pallant, 2013, p. 164).
VIF: ‘The VIF (Variance Inflation Factor), which is just the inverse of the Tolerance Factor (1 divided by Tolerance). VIF values above 10 would be a concern here, indicating multicollinearity’ (Pallant, 2013, p. 164).
28
COLLINEARITY DIAGNOSTICS (SPSS) (MULTICOLLINEARITY)
Coefficients (a)
Model 1                            B        Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)                         22.577   1.366                16.531   .000
Hours per week on private study    .714     .024         .904    29.286   .000   .655        1.528
Motivation level                   .404     .119         .104    3.385    .001   .655        1.528
B and Std. Error are the Unstandardized Coefficients; Beta is the Standardized Coefficient; Tolerance and VIF are the Collinearity Statistics.
a. Dependent Variable: Score on final university examination
29
COLLINEARITY DIAGNOSTICS (MULTICOLLINEARITY)
In the example:
Hours per week on private study: Tolerance = .655; VIF = 1.528
Motivation level: Tolerance = .655
There is no problem with multicollinearity.
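The two collinearity checks can be expressed as a short sketch. The function names are hypothetical; the thresholds (.10 and 10) and the definition of VIF as 1 divided by Tolerance are the ones quoted from Pallant in this presentation.

```python
def variance_inflation_factor(tolerance):
    """VIF is the inverse of Tolerance (1 divided by Tolerance)."""
    if tolerance <= 0:
        raise ValueError("tolerance must be greater than zero")
    return 1.0 / tolerance


def collinearity_is_acceptable(tolerance):
    """Apply the quoted rules of thumb: Tolerance above .10, VIF not above 10."""
    return tolerance > 0.10 and variance_inflation_factor(tolerance) <= 10


print(round(variance_inflation_factor(0.655), 3))  # 1.527
print(collinearity_is_acceptable(0.655))           # True
```

The hand calculation gives 1.527 rather than the 1.528 shown in the SPSS output, most likely because SPSS works from the unrounded Tolerance value.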
30
STEPWISE REGRESSION
Purpose: to find a model with predictive accuracy by working with a limited number of independent variables drawn from a longer list, determining which ones have a statistically significant influence on the dependent variable.
Stepwise multiple regression enters variables one at a time, in a sequence, to see which add to the explanatory power of the model, judged by whether each one increases the R-square value.
Stepwise multiple regression enables the researcher to see which variables have predictive power and which do not, i.e. which to include in and which to exclude from an explanatory model.
31
LOGISTIC REGRESSION
Logistic regression enables the researcher to work with categorical variables in a multiple regression where the dependent variable is a categorical variable. The independent variables may be categorical, discrete or continuous.
32
EXAMINING MULTIPLE REGRESSION (SPSS OUTPUT)
Check collinearity statistics: Tolerance must be higher than .10; VIF (Variance Inflation Factor) must not be higher than 10.
Check normality, linearity and homoscedasticity: the Normal Probability Plot (Normal P-P Plot) should have points going in a straight diagonal line, bottom left to top right; the Scatterplot should be a rectangle with scores concentrated in the centre (along the 0 point), avoiding curvilinear or uneven distribution.
Check that the Cook’s Distance maximum value is below 1 and that the Mahalanobis Distance is lower than the critical value.
33
EXAMINING MULTIPLE REGRESSION SPSS OUTPUT
Check the Adjusted R Square.
Check ANOVA and its significance level.
Check the Standardized Beta Coefficients and their significance levels.
Square each Part correlation coefficient to see the contribution of each variable to the total Adjusted R Square (i.e. how much of the total variance in the dependent variable is explained by each independent variable).
34
THE NEED FOR A STANDARDIZED SCORE
A child tells his parents that he scored a mark of 75 for a maths test; his parents scold him.
A child tells his parents that he scored a mark of 2 for a history test; his parents praise him.
A child tells his parents that he scored a mark of 25 for an English test and a mark of 60 for a physics test; his parents praise him for both.
A child tells his parents that he scored a mark of 80 for a geography test and a mark of 120 for a chemistry test; his parents scold him for both.
35
THE NEED FOR A STANDARDIZED SCORE
We need to know how to judge whether a mark is high or low.
We need to be able to compare marks between one test and another.
Therefore we need to know the scale of the marks, the range of the marks, the mean of the marks and the distribution of the marks either side of the mean.
36
THE NEED FOR A STANDARDIZED SCORE
We need to know how to compare marks from a test which:
uses one scale with marks from a test which uses another scale;
has one range of marks with marks from a test that has another range of marks;
has a mean which is different from the mean of another test;
has a distribution around the mean which is different from the distribution of another test.
37
THE Z-SCORE (STANDARDIZED SCORE)
Standardizing scores lets us judge whether a mark is high or low.
Standardizing scores lets us compare marks between one test and another when the two tests have different scales, ranges, means and distributions around the mean.
38
Z-SCORES
Sets of z-scores all have the same mean (0) and standard deviation (1), even though the original sets of scores had different means and standard deviations, i.e. z-scores let you compare fairly.
A z-score tells us how many standard deviations someone’s score is above or below the mean.
39
Z-SCORES
To calculate the z-score, subtract the mean from the raw score and divide that answer by the standard deviation.
For example, if the raw score is 15, the mean is 10 and the standard deviation is 4, then 15 – 10 = 5 and 5 ÷ 4 = 1.25.
Here the z-score tells us that the person’s score is 1.25 standard deviations above the mean.
Is that score good or bad? How good or bad is it?
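The calculation just described can be written as a small function. This is a minimal sketch with a hypothetical function name; the first call uses the numbers from the slide, and the second uses the age example (mean 35, standard deviation 13, value 61) that appears later in the presentation.

```python
def z_score(raw_score, mean, standard_deviation):
    """Subtract the mean from the raw score, then divide by the standard deviation."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be greater than zero")
    return (raw_score - mean) / standard_deviation


print(z_score(15, 10, 4))   # 1.25
print(z_score(61, 35, 13))  # 2.0
```

A positive result means the raw score is above the mean, and a negative result means it is below the mean.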
40
Two standard deviations either side of the mean account for 95.4% of the population.
One standard deviation either side of the mean accounts for 68.3% of the population.
(Figure: the normal curve, with the mean marked.)
41
STANDARDIZED SCORE (Z-SCORE)
A z-score of +1.4 indicates that someone is 1.4 standard deviations above the mean.
A z-score of –1.4 indicates that someone is 1.4 standard deviations below the mean.
If the z-score is positive, it indicates that the value is above the mean. If the z-score is negative, it means that the value is below the mean.
Is that z-score good or bad? How good or bad is it?
We need to know about the probability of a certain value falling into a certain range of values.
42
The normal curve lets us interpret the probability of a score falling into a certain range of scores/values.
68% of the population lie between -1 and +1 standard deviations.
95% of the population lie between -2 and +2 standard deviations.
99.7% of the population lie between -3 and +3 standard deviations.
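The percentages for the normal curve can be verified with Python's standard library. The sketch below, with a hypothetical function name, uses the error function to compute the proportion of a normal distribution that lies within a given number of standard deviations either side of the mean.

```python
import math


def proportion_within(n_standard_deviations):
    """Proportion of a normal distribution within +/- n standard deviations."""
    return math.erf(n_standard_deviations / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * proportion_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

These figures apply only when the scores are approximately normally distributed.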
43
The normal curve lets us interpret the probability of a score falling into a certain range of scores/values.
44
Let us say that, for the ages of 1,200 people:
The mean = 35
The standard deviation = 13
If a member of the group says he is 61 years old, then it is clear that this person is much older than the average. But how much older? To be exact, we can convert his age into a z-score.
45
z = (61 – 35) ÷ 13 = 2
This tells us that 61 is 2 standard deviations above the mean.
Refer to the ‘areas under the standard normal curve’ table (in statistics textbooks): for z = 2, the ‘area under curve beyond one point’ is .0228. The proportion of people who are 61 years of age or more is only 2.3 per cent of the total.
46
For example, if the z-score is 1.25:
Is that score good or bad? How good or bad is it?
Refer to the ‘areas under the standard normal curve’ table (in statistics textbooks): for z = ±1.25, the ‘area under curve beyond one point’ is .1056. The proportion of people whose z-score is 1.25 or more is only 10.56 per cent of the total. So, the score is very good.
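Where a printed ‘areas under the standard normal curve’ table is not to hand, the same areas can be computed. This is a sketch only, with a hypothetical function name, using the complementary error function from Python's standard library and assuming the scores are normally distributed.

```python
import math


def area_beyond(z):
    """Area under the standard normal curve beyond the point z (upper tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(round(area_beyond(2.0), 4))   # 0.0228
print(round(area_beyond(1.25), 4))  # 0.1056
```

Multiplying the result by 100 gives the percentage of the population expected to score at or above that z-score.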
47
CALCULATING Z-SCORES WITH SPSS
Click ‘Analyze’ → ‘Descriptive Statistics’ → ‘Descriptives’.
Send over the variables to ‘Variables’.
Click the box ‘Save standardized values as variables’.
Click ‘OK’.
Two new variables will be created.
48
T-SCORES
Some people are uncomfortable with z-scores, as they do not like negative scores and they do not like an average being 0. To overcome this, z-scores can be converted to T-scores.
To convert a z-score to a T-score, multiply the z-score by 10 and add 50 to the result. For example, a z-score of .5 multiplied by 10 gives 5, and then, with 50 added, gives 55. The T-score is 55.
Many IQ tests and standardized tests convert z-scores. For example, a common conversion in IQ tests is to multiply the z-score by 15 and add 100. So a z-score on an IQ test might be .5, which multiplied by 15 gives 7.5, and with 100 added gives 107.5, i.e. the z-score converts to an IQ score of 107.5.
49
THE McCALL T-SCORE
The McCall T-score has a mean of 50 and a standard deviation of 10:
McCall T-score = 50 ± (z-score x 10)
For the ± sign, the part in brackets (calculated from the size of the z-score) is added to the 50 if the z-score is positive (i.e. if the raw score is above the mean) and subtracted if the z-score is negative (i.e. if the raw score is below the mean).
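Both conversions follow the same pattern: multiply the z-score by the new standard deviation and add the new mean. A minimal sketch, with hypothetical function names, keeps the sign of the z-score so that the ± rule is handled automatically.

```python
def rescale_z(z, new_mean, new_standard_deviation):
    """Convert a z-score to a scale with the given mean and standard deviation."""
    return new_mean + z * new_standard_deviation


def t_score(z):
    """T-score: mean 50, standard deviation 10."""
    return rescale_z(z, 50, 10)


def iq_score(z):
    """Common IQ conversion: mean 100, standard deviation 15."""
    return rescale_z(z, 100, 15)


print(t_score(0.5))             # 55.0
print(iq_score(0.5))            # 107.5
print(round(t_score(-1.4), 1))  # 36.0
```

A negative z-score lowers the result below the new mean, which matches subtracting the bracketed part in the McCall formula.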