DESCRIPTIVE STATISTICS


1 DESCRIPTIVE STATISTICS
© LOUIS COHEN, LAWRENCE MANION AND KEITH MORRISON © 2018 Louis Cohen, Lawrence Manion and Keith Morrison; individual chapters, the contributors

2 STRUCTURE OF THE CHAPTER
A cautionary note about missing data
Frequencies, percentages and crosstabulations
Measures of central tendency and dispersal
Taking stock
Correlations and measures of association
Partial correlations
Reliability

3 MISSING DATA
Data may be Missing Completely At Random (MCAR), i.e. there is no pattern to the missing data for any variables.
Data may be Missing At Random (MAR), where there is a pattern to the missing data, but not for the main dependent variable.
Data may be Missing Not At Random (MNAR), where there is a pattern in the missing data that affects the main dependent variable (e.g. low-income families may not respond to a survey item).

4 ADDRESSING MISSING DATA
If the missing data are randomly scattered, and the number of missing cases is so small that they cannot seriously distort the overall findings, then the researcher might simply exclude those cases.
If the missing data are not randomly scattered but are systematically missing, i.e. there is a pattern in the non-response, then this is a major problem for the researcher, who may decide not to pursue that part of the analysis or may use imputation methods.
Conduct sensitivity analysis: calculate the number of different responses/cases required to overturn or seriously change the findings of the analysis. If the number of missing cases is so low that it could not upset the findings, then the researcher might proceed, reporting the number of missing cases.

5 ADDRESSING MISSING DATA
Adopt a deletion method for missing data: exclude any cases whose data are incomplete on any variable or only use those cases which are complete on all the variables.
Adopt the imputation method: a general term given to the methods of trying to calculate what the missing values might be so that they can be included in the analysis, i.e. substituting missing values with plausible, calculated values.
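The two approaches above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented, and mean substitution is used as one simple example of imputation (the slide does not name a specific imputation method).

```python
from statistics import mean

# Hypothetical survey records; None marks a missing response.
records = [
    {"age": 25, "score": 70},
    {"age": None, "score": 65},
    {"age": 31, "score": None},
    {"age": 40, "score": 80},
]

def listwise_delete(rows):
    """Deletion method: keep only cases complete on every variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean_impute(rows, key):
    """Imputation (assumed method): replace missing values of one
    variable with the mean of that variable's observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

complete_cases = listwise_delete(records)                     # 2 of 4 cases remain
imputed = mean_impute(mean_impute(records, "age"), "score")   # all 4 cases kept
```

Deletion halves this small sample, whereas imputation keeps every case at the cost of inserting calculated rather than observed values.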

6 FREQUENCIES AND PERCENTAGES
Frequency and percentage tables
Bar charts (for nominal and ordinal data)
Histograms (for continuous – interval and ratio – data)
Line graphs
Pie charts
High and low charts
Scatterplots
Stem and leaf displays
Boxplots (box and whisker plots)
Graphical forms of data presentation

7 FREQUENCIES AND PERCENTAGES
Bar charts present categorical and discrete data, highest and lowest.
Avoid using a third dimension (e.g. depth) in a graph when it is unnecessary; a third dimension to a graph must provide additional information.
Histograms present continuous data.
Line graphs show trends, particularly in continuous data, for one or more variables at a time.
Multiple line graphs show trends in continuous data on several variables in the same graph.

8 FREQUENCIES AND PERCENTAGES
Pie charts and bar charts show proportions.
Crosstabulations show interdependence.
Boxplots show the distribution of values for several variables in a single chart, together with their range and medians.
Stacked bar charts show the frequencies of different groups within a specific variable for two or more variables in the same chart.
Scatterplots show the relationship between two variables or several sets of two or more variables on the same chart.

9 CROSSTABULATIONS
A crosstabulation is a presentational device.
Rows for nominal data, columns for ordinal data.
Independent variables as row data, dependent variables as column data.
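As a rough sketch of the layout described above, the following Python builds a crosstabulation with the independent variable as rows and the dependent variable as columns, reporting row percentages. The responses and the function name `crosstab_row_percent` are invented for illustration.

```python
from collections import Counter

# Hypothetical responses: (independent variable, dependent variable).
responses = [
    ("traditionalist", "in favour"), ("traditionalist", "in favour"),
    ("traditionalist", "against"),
    ("progressivist", "in favour"),
    ("progressivist", "against"), ("progressivist", "against"),
    ("progressivist", "against"),
]

def crosstab_row_percent(pairs):
    """Return {row: {column: percentage of that row's cases}}."""
    counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    columns = sorted({col for _, col in pairs})
    return {
        row: {col: 100 * counts[(row, col)] / row_totals[row] for col in columns}
        for row in sorted(row_totals)
    }

table = crosstab_row_percent(responses)
for row, cells in table.items():
    print(row, {col: f"{pct:.0f}%" for col, pct in cells.items()})
```

Each row sums to 100 per cent, so the groups of the independent variable can be compared directly even when their sizes differ.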

10 BIVARIATE CROSSTABULATION

11 TRIVARIATE CROSSTABULATION
Acceptability of formal, written public examinations

                    Traditionalist                  Progressivist/child-centred
Formal, written     Socially      Socially          Socially      Socially
public exams        advantaged    disadvantaged     advantaged    disadvantaged
In favour           65%           70%               35%           20%
Against             35%           30%               65%           80%
Total per cent      100%          100%              100%          100%

12 MEASURES OF CENTRAL TENDENCY AND DISPERSAL
The mode (the score obtained by the greatest number of people)
  For categorical (nominal) and ordinal data
The mean (the average score)
  For continuous data
  Used if the data are not skewed
  Used if there are no outliers

13 MEASURES OF CENTRAL TENDENCY AND DISPERSAL
The median (the score obtained by the middle person in a ranked group of people, i.e. it has an equal number of scores above it and below it)
  For continuous data
  Used if the data are skewed
  Used if there are outliers
  Used if the standard deviation is high
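A short Python illustration of why the median is preferred when there are outliers. The scores are invented, and the value 48 is deliberately extreme.

```python
from statistics import mean, median, mode

# Hypothetical scores; 48 is an outlier.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 48]

print(mean(scores))    # 9: pulled upwards by the single outlier
print(median(scores))  # 5: the middle score of the ranked list
print(mode(scores))    # 5: the most frequent score
```

Here the mean (9) is higher than every score except the outlier, whereas the median and mode still describe the bulk of the group.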

14 MEASURES OF CENTRAL TENDENCY AND DISPERSAL
Standard deviation (the average distance of each score from the mean, i.e. how much the scores, as a group, deviate from the mean)
  A standardized measure of dispersal
  For interval and ratio data

15 STANDARD DEVIATION
The standard deviation is calculated, in its most simplified form, as:
$SD = \sqrt{\frac{\sum d^2}{N}}$ or $SD = \sqrt{\frac{\sum d^2}{N - 1}}$
$d^2$ = the deviation of the score from the mean (average), squared
$\sum$ = the sum of
$N$ = the number of cases
A low standard deviation indicates that the scores cluster together, whilst a high standard deviation indicates that the scores are widely dispersed.
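The calculation can be sketched in Python as follows: square each score's deviation from the mean, sum the squares, divide by N (or by N - 1 when treating the scores as a sample), and take the square root. The two score sets are invented so that both have a mean of 6.

```python
import math

def standard_deviation(scores, sample=False):
    """Square root of the summed squared deviations divided by N
    (or by N - 1 if sample=True)."""
    n = len(scores)
    m = sum(scores) / n
    sum_d2 = sum((x - m) ** 2 for x in scores)
    return math.sqrt(sum_d2 / (n - 1 if sample else n))

clustered = [5, 6, 6, 6, 7]    # hypothetical scores, mean 6
dispersed = [1, 3, 6, 9, 11]   # hypothetical scores, mean 6

print(standard_deviation(clustered))  # low: scores cluster around the mean
print(standard_deviation(dispersed))  # high: scores are widely dispersed
```

The standard library functions `statistics.pstdev` (divide by N) and `statistics.stdev` (divide by N - 1) give the same results.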

16 High standard deviation
[Figure: distribution of scores with Mean = 6, showing a high standard deviation]

17 Moderately high standard deviation
[Figure: distribution of scores with Mean = 6, showing a moderately high standard deviation]

18 Low standard deviation
[Figure: distribution of scores with Mean = 6, showing a low standard deviation]

19 THE RANGE AND INTERQUARTILE RANGE
The range
  The difference between the minimum and maximum score.
  A measure of dispersal.
  Outliers exert a disproportionate effect.
The interquartile range
  The difference between the first and the third quartile (the 25th and the 75th percentile), i.e. the middle 50 per cent of scores (the second and third quartiles).
  Overcomes problems of outliers/extreme scores.
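A small Python sketch comparing the two measures on invented scores, with and without an outlier. It assumes the default ("exclusive") quartile method of `statistics.quantiles`; other software may use slightly different quartile conventions and return slightly different values.

```python
from statistics import quantiles

def score_range(scores):
    """Difference between the maximum and minimum score."""
    return max(scores) - min(scores)

def interquartile_range(scores):
    """Difference between the third and first quartile."""
    q1, _, q3 = quantiles(scores, n=4)
    return q3 - q1

scores = [1, 2, 3, 4, 5, 6, 7]   # hypothetical scores
with_outlier = scores + [100]    # same scores plus one extreme value

print(score_range(scores), score_range(with_outlier))                  # 6 99
print(interquartile_range(scores), interquartile_range(with_outlier))  # 4.0 4.5
```

One extreme score moves the range from 6 to 99 but barely changes the interquartile range.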

20 CORRELATION
Measure of association between two variables
Note the direction of the correlation
  Positive: As one variable increases, the other variable increases
  Negative: As one variable increases, the other variable decreases
The strongest positive correlation coefficient is +1.
The strongest negative correlation coefficient is -1.

21 CORRELATION
Foot size    Hand size
1            1
2            2
3            3
4            4
5            5
Perfect positive correlation: +1

22 CORRELATION
Foot size    Hand size
1            5
2            4
3            3
4            2
5            1
Perfect negative correlation: -1

23 CORRELATION
Hand size    Foot size
1            2
2            1
3            4
4            3
5            5
Positive correlation: <+1
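The three hand size/foot size tables can be checked with a short Python sketch of the Pearson product-moment coefficient. The function name `pearson_r` is illustrative, and the code does not handle a variable with no variation (which would divide by zero).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y divided by the
    square root of the product of their summed squared deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

first = [1, 2, 3, 4, 5]
print(pearson_r(first, [1, 2, 3, 4, 5]))  # 1.0  perfect positive
print(pearson_r(first, [5, 4, 3, 2, 1]))  # -1.0 perfect negative
print(pearson_r(first, [2, 1, 4, 3, 5]))  # 0.8  positive, but less than +1
```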

24 PERFECT POSITIVE CORRELATION

25 PERFECT NEGATIVE CORRELATION

26 MIXED CORRELATION

27 CORRELATIONS
Spearman correlation for nominal and ordinal data
Pearson correlation for interval and ratio data
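One way to see the difference between the two coefficients is that the Spearman coefficient is the Pearson coefficient computed on ranks rather than on raw scores. The sketch below assumes average ranks for tied values; the data and function names are illustrative.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman rank-order correlation: Pearson's r applied to ranks."""
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]   # rises with xs, but not in a straight line

rho = spearman_rho(xs, ys)   # rank orders agree perfectly
r = pearson_r(xs, ys)        # below 1 because the relationship is not linear
```

Because only the ordering matters, the rank-based coefficient suits ordinal data, while Pearson's r uses the actual distances between scores.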

28 CORRELATIONS
Begin with a null hypothesis (e.g. there is no relationship between the size of hands and the size of feet).
The task is not to support the hypothesis, i.e. the burden of responsibility is not to support the null hypothesis.
If the hypothesis is not supported for 95 per cent or 99 per cent or 99.9 per cent of the population, then there is a statistically significant relationship between the size of hands and the size of feet at the 0.05, 0.01 and 0.001 levels of significance respectively.
These levels of significance – the 0.05, 0.01 and 0.001 levels – are the levels at which statistical significance is frequently taken to be demonstrated.
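As a hedged illustration of testing the null hypothesis, the sketch below uses an exact permutation test. This is a swapped-in technique chosen because it needs no statistical tables; it is not the procedure described on the slide. It pairs one variable with every possible reordering of the other and reports the proportion of reorderings that give a correlation at least as strong as the one observed.

```python
import math
from itertools import permutations

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys):
    """Two-sided exact permutation p-value for the correlation.
    Only practical for very small samples (n! reorderings)."""
    observed = abs(pearson_r(xs, ys))
    reorderings = list(permutations(ys))
    as_extreme = sum(
        1 for p in reorderings if abs(pearson_r(xs, p)) >= observed - 1e-12
    )
    return as_extreme / len(reorderings)

hand = [1, 2, 3, 4, 5]   # hypothetical data
foot = [1, 2, 3, 4, 5]

p = permutation_p_value(hand, foot)   # 2 of the 120 reorderings reach |r| = 1
print(p < 0.05)                       # True: significant at the 0.05 level
```

With only five cases, even a perfect correlation yields p = 2/120 (about 0.017), which passes the 0.05 level but not the 0.01 level; small samples limit the significance level that can be reached.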

29 CORRELATION
Note the magnitude of the correlation coefficient:
  0.20 to 0.35: slight association
  0.35 to 0.65: sufficient for crude prediction
  0.65 to 0.85: sufficient for accurate prediction
  >0.85: strong correlation
Note the direction of the correlation (positive/negative)
Ensure that the relationships are linear and not curvilinear (i.e. the line reaches an inflection point)

30 CURVILINEAR RELATIONSHIP

31 MULTIPLE AND PARTIAL CORRELATIONS
Multiple correlation
  The degree of association between three or more variables simultaneously.
Partial correlation
  The degree of association between two variables after the influence of a third has been controlled or partialled out.
  Controlling for the effects of a third variable means holding it constant whilst manipulating the other two variables.
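Partialling out a third variable can be sketched with the standard first-order partial correlation formula, which is not shown on the slide and is supplied here as an illustration: the correlation between x and y, less the product of each variable's correlation with z, rescaled by the variance in x and y that z does not account for. The input correlations below are invented.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialled out
    (first-order partial correlation)."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical correlations: x and y correlate at 0.5, and each
# correlates at 0.6 with a third variable z.
print(partial_correlation(0.5, 0.6, 0.6))  # about 0.22
```

In this invented case most of the apparent association between x and y is shared with z; when z is unrelated to both (correlations of 0), the partial correlation equals the original coefficient.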

32 RELIABILITY
Split-half reliability (correlation between one half of a test and the other matched half)
The alpha coefficient

33 SPLIT-HALF RELIABILITY (Spearman-Brown)
Reliability = $\frac{2r}{1 + r}$
r = the actual correlation between the two halves of the instrument (e.g. 0.85)
Reliability = $\frac{2 \times 0.85}{1 + 0.85} = \frac{1.70}{1.85} = 0.919$ (very high)
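The Spearman-Brown calculation is small enough to check directly. The sketch below uses the standard formula for split-half reliability, 2r / (1 + r), with the slide's example value of r = 0.85; the function name is illustrative.

```python
def spearman_brown(r):
    """Split-half reliability from the correlation r between the two
    halves of an instrument: 2r / (1 + r)."""
    return 2 * r / (1 + r)

reliability = spearman_brown(0.85)
print(round(reliability, 3))  # 0.919
```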

34 CRONBACH’S ALPHA
Reliability as internal consistency: Cronbach’s alpha (the alpha coefficient of reliability).
A coefficient of inter-item correlations.
It calculates the average of all possible split-half reliability coefficients.
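A minimal sketch of Cronbach's alpha, assuming the usual variance-based computational formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The slide itself does not give a formula, and the item scores below are invented.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]   # total score per respondent
    sum_item_variances = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / pvariance(totals))

# Hypothetical scores of four respondents on three items.
consistent = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
less_consistent = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]

alpha_consistent = cronbach_alpha(consistent)            # 1.0: items agree perfectly
alpha_less_consistent = cronbach_alpha(less_consistent)  # about 0.72
```

The more the items rank respondents in the same way, the closer alpha comes to its maximum of +1.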

35 INTERPRETING THE RELIABILITY COEFFICIENT
Maximum is +1
>.90 very highly reliable
.80–.90 highly reliable
.70–.79 reliable
.60–.69 marginally/minimally reliable
<.60 unacceptably low reliability
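The bands above can be written as a simple lookup. This sketch assumes the bands are contiguous (for example, a value of .795 is treated as "reliable"), which the slide does not state explicitly; the function name is illustrative.

```python
def interpret_reliability(coefficient):
    """Map a reliability coefficient to the verbal bands listed above."""
    if coefficient > 0.90:
        return "very highly reliable"
    if coefficient >= 0.80:
        return "highly reliable"
    if coefficient >= 0.70:
        return "reliable"
    if coefficient >= 0.60:
        return "marginally/minimally reliable"
    return "unacceptably low reliability"

print(interpret_reliability(0.919))  # very highly reliable
print(interpret_reliability(0.65))   # marginally/minimally reliable
```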

