DESCRIPTIVE STATISTICS
Louis Cohen, Lawrence Manion and Keith Morrison
© 2018 Louis Cohen, Lawrence Manion and Keith Morrison; individual chapters, the contributors
STRUCTURE OF THE CHAPTER
A cautionary note about missing data
Frequencies, percentages and crosstabulations
Measures of central tendency and dispersal
Taking stock
Correlations and measures of association
Partial correlations
Reliability
MISSING DATA
Data may be Missing Completely At Random (MCAR), i.e. there is no pattern to the missing data for any variable.
Data may be Missing At Random (MAR), where there is a pattern to the missing data, but not for the main dependent variable.
Data may be Missing Not At Random (MNAR), where there is a pattern in the missing data that affects the main dependent variable (e.g. low-income families may not respond to a survey item).
ADDRESSING MISSING DATA
If the missing data are randomly scattered, and the number of missing cases is so small that they cannot seriously distort the overall findings, then the researcher might simply exclude those cases.
If the missing data are not randomly scattered but are systematically missing, i.e. there is a pattern in the non-response, then this is a major problem for the researcher, who may decide not to pursue that part of the analysis or may use imputation methods.
Conduct a sensitivity analysis: calculate the number of different responses/cases that would be required to overturn or seriously change the findings of the analysis. If the number of missing cases is so low that it could not upset the findings, then the researcher might proceed, reporting the number of missing cases.
ADDRESSING MISSING DATA
Adopt a deletion method: exclude any cases whose data are incomplete on any variable, or use only those cases which are complete on all the variables.
Adopt an imputation method: a general term for methods that try to calculate what the missing values might be so that they can be included in the analysis, i.e. substituting missing values with plausible, calculated values.
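By way of illustration only, a minimal pandas sketch of the two broad strategies just described; the data frame and column names are invented, and mean imputation is only one of many possible imputation methods:

```python
import pandas as pd

# Hypothetical survey data with missing values (column names are illustrative)
df = pd.DataFrame({
    "income": [21000, None, 34000, 28000, None, 45000],
    "test_score": [54, 61, None, 72, 66, 80],
})

# Deletion method: keep only cases that are complete on all variables
complete_cases = df.dropna()

# Simple imputation method: replace each missing value with the column mean
imputed = df.fillna(df.mean(numeric_only=True))

print(complete_cases)
print(imputed)
```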
FREQUENCIES AND PERCENTAGES
Graphical forms of data presentation:
Frequency and percentage tables
Bar charts (for nominal and ordinal data)
Histograms (for continuous – interval and ratio – data)
Line graphs
Pie charts
High and low charts
Scatterplots
Stem and leaf displays
Boxplots (box and whisker plots)
FREQUENCIES AND PERCENTAGES
Bar charts present categorical and discrete data, and show highest and lowest values.
Avoid using a third dimension (e.g. depth) in a graph when it is unnecessary; a third dimension must provide additional information.
Histograms present continuous data.
Line graphs show trends, particularly in continuous data, for one or more variables at a time.
Multiple line graphs show trends in continuous data on several variables in the same graph.
FREQUENCIES AND PERCENTAGES
Pie charts and bar charts show proportions.
Crosstabulations show interdependence.
Boxplots show the distribution of values for several variables in a single chart, together with their range and medians.
Stacked bar charts show the frequencies of different groups within a specific variable for two or more variables in the same chart.
Scatterplots show the relationship between two variables, or several sets of two or more variables, on the same chart.
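As a small illustration of two of these forms, a matplotlib sketch (the scores are invented for the example) producing a histogram and a boxplot of the same continuous variable:

```python
import matplotlib.pyplot as plt

# Invented continuous scores, purely for illustration
scores = [42, 48, 51, 55, 55, 58, 60, 61, 63, 65, 67, 70, 74, 78, 85]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: distribution of a continuous variable
ax1.hist(scores, bins=6)
ax1.set_title("Histogram")

# Boxplot: median, interquartile range and range in one display
ax2.boxplot(scores)
ax2.set_title("Boxplot")

plt.tight_layout()
plt.show()
```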
CROSSTABULATIONS
A crosstabulation is a presentational device.
Rows for nominal data, columns for ordinal data.
Independent variables as row data, dependent variables as column data.
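A brief pandas sketch of a bivariate crosstabulation with the independent variable as rows and the dependent variable as columns; the variables and values are hypothetical:

```python
import pandas as pd

# Hypothetical nominal data: school sector (independent) and attitude (dependent)
df = pd.DataFrame({
    "sector":   ["state", "state", "private", "private", "state", "private"],
    "attitude": ["in favour", "against", "in favour", "in favour", "against", "against"],
})

# Rows = independent variable, columns = dependent variable
counts = pd.crosstab(df["sector"], df["attitude"])

# Row percentages are often easier to interpret than raw counts
row_percent = pd.crosstab(df["sector"], df["attitude"], normalize="index") * 100

print(counts)
print(row_percent.round(1))
```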
BIVARIATE CROSSTABULATION
TRIVARIATE CROSSTABULATION
Acceptability of formal, written public examinations

Formal, written        Traditionalist                        Progressivist/child-centred
public exams           Socially       Socially               Socially       Socially
                       advantaged     disadvantaged          advantaged     disadvantaged
In favour              65%            70%                    35%            20%
Against                35%            30%                    65%            80%
Total per cent         100%           100%                   100%           100%
MEASURES OF CENTRAL TENDENCY AND DISPERSAL
The mode (the score obtained by the greatest number of people):
For categorical (nominal) and ordinal data
The mean (the average score):
For continuous data
Used if the data are not skewed
Used if there are no outliers
MEASURES OF CENTRAL TENDENCY AND DISPERSAL
The median (the score obtained by the middle person in a ranked group of people, i.e. it has an equal number of scores above it and below it):
For continuous data
Used if the data are skewed
Used if there are outliers
Used if the standard deviation is high
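A short sketch using Python's statistics module (the scores are invented) showing why the median is preferred to the mean when an outlier is present:

```python
import statistics

# Invented scores containing one extreme outlier (90)
scores = [4, 5, 5, 6, 7, 8, 90]

print(statistics.mode(scores))    # 5: the most frequent score
print(statistics.mean(scores))    # about 17.86: pulled upwards by the outlier
print(statistics.median(scores))  # 6: unaffected by the outlier
```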
MEASURES OF CENTRAL TENDENCY AND DISPERSAL
Standard deviation: the average distance of each score from the mean, i.e. the average difference between each score and the mean, and how much the scores, as a group, deviate from the mean.
A standardized measure of dispersal.
For interval and ratio data.
STANDARD DEVIATION
The standard deviation is calculated, in its most simplified form, as:

$$SD = \sqrt{\frac{\sum d^2}{N}}$$

where:
d² = the deviation of the score from the mean (average), squared
Σ = the sum of
N = the number of cases

A low standard deviation indicates that the scores cluster together, whilst a high standard deviation indicates that the scores are widely dispersed.
Chart: High standard deviation – scores 1, 2, 3, 4, 20; Mean = 6 (scores widely dispersed around the mean).
Chart: Moderately high standard deviation – scores 1, 2, 6, 10, 11; Mean = 6.
Chart: Low standard deviation – scores 5, 6, 6, 6, 7; Mean = 6 (scores cluster closely around the mean).
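To make the three charts concrete, a numpy sketch computing the standard deviation of each set of five scores (all with a mean of 6), using the population formula shown on the earlier slide:

```python
import numpy as np

# The three sets of scores from the charts above, each with a mean of 6
high     = np.array([1, 2, 3, 4, 20])
moderate = np.array([1, 2, 6, 10, 11])
low      = np.array([5, 6, 6, 6, 7])

for name, scores in [("high", high), ("moderate", moderate), ("low", low)]:
    # ddof=0 gives the population formula sqrt(sum(d^2) / N)
    print(name, scores.mean(), round(scores.std(ddof=0), 2))
```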
THE RANGE AND INTERQUARTILE RANGE
The range: the difference between the minimum and maximum score.
A measure of dispersal.
Outliers exert a disproportionate effect.
The interquartile range: the difference between the first and the third quartile (the 25th and the 75th percentile), i.e. the middle 50 per cent of scores (the second and third quartiles).
Overcomes problems of outliers/extreme scores.
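A short numpy sketch (scores invented, including one outlier) contrasting the range with the interquartile range:

```python
import numpy as np

scores = np.array([3, 5, 6, 6, 7, 8, 9, 11, 12, 40])  # note the outlier (40)

data_range = scores.max() - scores.min()      # range: heavily affected by the outlier
q1, q3 = np.percentile(scores, [25, 75])      # 25th and 75th percentiles
iqr = q3 - q1                                 # interquartile range: middle 50% of scores

print(data_range, q1, q3, iqr)
```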
CORRELATION
A measure of association between two variables.
Note the direction of the correlation:
Positive: as one variable increases, the other variable increases.
Negative: as one variable increases, the other variable decreases.
The strongest positive correlation coefficient is +1.
The strongest negative correlation coefficient is -1.
CORRELATION
Foot size   Hand size
1           1
2           2
3           3
4           4
5           5
Perfect positive correlation: +1
CORRELATION
Foot size   Hand size
1           5
2           4
3           3
4           2
5           1
Perfect negative correlation: -1
CORRELATION
Hand size   Foot size
1           2
2           1
3           4
4           3
5           5
Positive correlation: <+1
PERFECT POSITIVE CORRELATION
PERFECT NEGATIVE CORRELATION
MIXED CORRELATION
CORRELATIONS
Spearman rank order correlation for ordinal (ranked) data.
Pearson product moment correlation for interval and ratio data.
CORRELATIONS
Begin with a null hypothesis (e.g. there is no relationship between the size of hands and the size of feet).
The task is to reject, not to support, the null hypothesis.
If the null hypothesis is not supported for 95 per cent, 99 per cent or 99.9 per cent of the population, then there is a statistically significant relationship between the size of hands and the size of feet at the 0.05, 0.01 and 0.001 levels of significance respectively.
These levels of significance – the 0.05, 0.01 and 0.001 levels – are the levels at which statistical significance is frequently taken to be demonstrated.
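A small scipy.stats sketch computing both coefficients and their p-values, using the hand and foot sizes from the "positive correlation < +1" table above; the null hypothesis of no relationship is rejected when p falls below 0.05 (or 0.01, or 0.001):

```python
from scipy import stats

# Hand and foot sizes from the earlier table
hand = [1, 2, 3, 4, 5]
foot = [2, 1, 4, 3, 5]

# Pearson for interval/ratio data, Spearman for ordinal (ranked) data
r_pearson, p_pearson = stats.pearsonr(hand, foot)
r_spearman, p_spearman = stats.spearmanr(hand, foot)

print(round(r_pearson, 2), round(p_pearson, 3))
print(round(r_spearman, 2), round(p_spearman, 3))

# Reject the null hypothesis of no relationship if p < 0.05 (or 0.01, 0.001)
print("significant at 0.05" if p_pearson < 0.05 else "not significant at 0.05")
```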
CORRELATION
Note the magnitude of the correlation coefficient:
0.20 to 0.35: slight association
0.35 to 0.65: sufficient for crude prediction
0.65 to 0.85: sufficient for accurate prediction
>0.85: strong correlation
Note the direction of the correlation (positive/negative).
Ensure that the relationship is linear and not curvilinear (i.e. where the line changes direction at an inflection point).
CURVILINEAR RELATIONSHIP
MULTIPLE AND PARTIAL CORRELATIONS
Multiple correlation: the degree of association between three or more variables simultaneously.
Partial correlation: the degree of association between two variables after the influence of a third has been controlled or partialled out.
Controlling for the effects of a third variable means holding it constant whilst manipulating the other two variables.
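A first-order partial correlation can be computed from the three pairwise coefficients using the standard formula r_xy.z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). A numpy sketch with entirely hypothetical variables x, y and control variable z:

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation between x and y, controlling for z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical data: x and y are both driven by z
rng = np.random.default_rng(0)
z = rng.normal(size=100)
x = z + rng.normal(scale=0.5, size=100)
y = z + rng.normal(scale=0.5, size=100)

print(round(np.corrcoef(x, y)[0, 1], 2))   # zero-order correlation, inflated by z
print(round(partial_corr(x, y, z), 2))     # much smaller once z is partialled out
```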
RELIABILITY
Split-half reliability (correlation between one half of a test and the other matched half).
The alpha coefficient.
SPLIT-HALF RELIABILITY (Spearman-Brown)

$$\text{Reliability} = \frac{2r}{1 + r}$$

where r = the actual correlation between the two halves of the instrument (e.g. 0.85).

$$\text{Reliability} = \frac{2 \times 0.85}{1 + 0.85} = \frac{1.70}{1.85} = 0.919 \text{ (very high)}$$
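A brief sketch of the split-half procedure with invented item scores: total the two matched halves for each respondent, correlate the halves, then apply the Spearman-Brown correction shown above:

```python
import numpy as np

# Invented item scores: rows = respondents, columns = 6 test items
items = np.array([
    [3, 4, 3, 4, 4, 3],
    [2, 2, 3, 2, 3, 2],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 1, 2],
    [4, 4, 4, 3, 4, 4],
])

# Split into odd-numbered and even-numbered items; total each half per respondent
half_a = items[:, ::2].sum(axis=1)
half_b = items[:, 1::2].sum(axis=1)

# Correlation between the two matched halves
r = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown correction: reliability of the full-length instrument
reliability = (2 * r) / (1 + r)
print(round(r, 3), round(reliability, 3))
```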
CRONBACH’S ALPHA
Reliability as internal consistency: Cronbach’s alpha (the alpha coefficient of reliability).
A coefficient of inter-item correlations.
It calculates the average of all possible split-half reliability coefficients.
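A numpy sketch of the usual computational formula for Cronbach’s alpha, α = k/(k − 1) × (1 − Σ item variances / total-score variance), applied to an invented item matrix:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; items is a 2-D array, rows = respondents, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of items
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Invented item scores: rows = respondents, columns = items
items = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
]
print(round(cronbach_alpha(items), 3))
```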
INTERPRETING THE RELIABILITY COEFFICIENT
Maximum is +1
>.90: very highly reliable
.80–.90: highly reliable
.70–.79: reliable
.60–.69: marginally/minimally reliable
<.60: unacceptably low reliability