Published by Quentin Miller. Modified over 9 years ago
1
Data analysis: Explore
GAP Toolkit 5
Training in basic drug abuse data management and analysis
Training session 9
2
Objectives
–To define a standard set of descriptive statistics used to analyse continuous variables
–To examine the Explore facility in SPSS
–To introduce the analysis of a continuous variable according to values of a categorical variable, an example of bivariate analysis
–To introduce further SPSS Help options
–To reinforce the use of SPSS syntax
3
SPSS Descriptive Statistics
–Analyse/Descriptive Statistics/Frequencies
–Analyse/Descriptive Statistics/Explore
–Analyse/Descriptive Statistics/Descriptives
4
Exercise: continuous variable
Generate a set of standard summary statistics for the continuous variable Age.
5
Explore: Age
6
Explore: Descriptive Statistics
Descriptives: AGE
                                      Statistic   Std. Error
Mean                                  31.78       .315
95% Confidence Interval for Mean
    Lower Bound                       31.16
    Upper Bound                       32.40
5% Trimmed Mean                       31.31
Median                                31.00
Variance                              154.614
Std. Deviation                        12.434
Minimum                               1
Maximum                               77
Range                                 76
Interquartile Range                   20.00
Skewness                              .427        .062
Kurtosis                              -.503       .124
7
Exercise: Help
–What’s This?
–Results Coach
–Case Studies
8
Measures of central tendency
Most commonly:
–Mode
–Median
–Mean
–5 per cent trimmed mean
9
The mode
The mode is the most frequently occurring value in a data set
Suitable for nominal level of measurement and above
Example:
–The mode of the variable "first most frequently used drug" is Alcohol, with 717 cases, approximately 46 per cent of valid responses
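A minimal Python sketch of finding the mode of a nominal variable by counting each category with `collections.Counter`. The responses below are hypothetical illustration data, not the survey records behind the Alcohol figure.

```python
from collections import Counter

# Hypothetical responses for "first most frequently used drug" (illustration only)
responses = ["Alcohol", "Cannabis", "Alcohol", "Heroin", "Alcohol", "Cannabis"]

counts = Counter(responses)
# most_common(1) returns a one-element list holding the (value, count) pair with the highest count
mode_value, mode_count = counts.most_common(1)[0]
share = mode_count / len(responses)  # proportion of valid responses

print(mode_value, mode_count, round(share * 100), "per cent")  # Alcohol 3 50 per cent
```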
10
Bimodal
Describes a distribution in which two categories each have a large number of cases
Example:
–The distribution of Employment is bimodal: employed and unemployed have a similar number of cases, and more cases than any of the other categories
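One way to spot an exactly tied bimodal variable is `statistics.multimode` (Python 3.8+), which returns every value sharing the highest frequency. The Employment values below are hypothetical illustration data.

```python
from statistics import multimode

# Hypothetical Employment categories (illustration only)
employment = ["Employed", "Unemployed", "Student", "Employed",
              "Unemployed", "Retired", "Employed", "Unemployed"]

# multimode returns all values that share the highest count
modes = multimode(employment)
print(modes)  # ['Employed', 'Unemployed']
```

Exact ties are uncommon in real data; the looser sense used on the slide (two categories with similar, large counts) still needs a judgement from the frequency table.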
11
The median
The median is the middle value when the data are ordered from low to high; with an even number of values it is the mean of the two middle values
Half the data values lie below the median and half above
Because the data have to be ordered, the median is not suitable for nominal data, but it is suitable for ordinal level of measurement and above
12
Example: median
Seizures of opium in Germany, 1994-1998 (kilograms)
Year      1994   1995   1996   1997   1998
Seizure     36     15     45     42    286
Source: United Nations (2000). World Drug Report 2000 (United Nations publication, Sales No. GV.E.00.0.10).
13
Sort the seizure data in ascending order:
Year      1995   1994   1997   1996   1998
Seizure     15     36     42     45    286
Rank         1      2      3      4      5
The middle (3rd-ranked) value is the median; the median annual seizure of opium in Germany between 1994 and 1998 was 42 kilograms.
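The sort-then-take-the-middle procedure can be sketched in Python with the standard `statistics.median`, using the seizure figures from the World Drug Report table. The four-value call, which drops 1998, is a hypothetical illustration of the even-count rule.

```python
from statistics import median

# Opium seizures in Germany, 1994-1998 (kg)
seizures = {1994: 36, 1995: 15, 1996: 45, 1997: 42, 1998: 286}

# Sort the values in ascending order, as on the slide
ranked = sorted(seizures.values())
print(ranked)                    # [15, 36, 42, 45, 286]

# Odd count: the median is the middle (3rd of 5) value
print(median(ranked))            # 42

# Hypothetical even count (1998 dropped): mean of the two middle values, (36 + 42) / 2
print(median([15, 36, 42, 45]))  # 39.0
```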
14
The mean
Add the values in the data set and divide by the number of values
The mean is only truly applicable to interval and ratio data, as it involves adding the values
It is sometimes applied to ordinal data, or to ordinal scales constructed from a number of Likert items, but this requires the assumption that the differences between adjacent scale values are equal, e.g. that the difference between 1 and 2 is the same as the difference between 5 and 6
15
Example: mean
Seizures of opium in Germany, 1994-1998 (kilograms)
Year      1994   1995   1996   1997   1998
Seizure     36     15     45     42    286
Sample size = 5
36 + 15 + 45 + 42 + 286 = 424
424 / 5 = 84.8
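The same sum-and-divide arithmetic as a short Python sketch on the seizure data.

```python
# Opium seizures in Germany, 1994-1998 (kg)
seizures = [36, 15, 45, 42, 286]

total = sum(seizures)   # 424
n = len(seizures)       # sample size 5
mean = total / n        # 84.8
print(total, n, mean)   # 424 5 84.8
```

The mean (84.8 kg) is about twice the median (42 kg) because the single 1998 value of 286 kg pulls it upwards.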
16
The 5 per cent trimmed mean
The 5 per cent trimmed mean is the mean calculated on the data set after the top 5 per cent and bottom 5 per cent of values have been removed
It is an estimator that is more resistant to outliers than the mean
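A simplified Python sketch of a trimmed mean that drops floor(proportion × n) whole values from each end. The function name `trimmed_mean` is hypothetical, and this is an assumption about the method: SPSS may weight cases at the trimming boundary instead of dropping only whole cases, so its figure can differ slightly.

```python
import math

def trimmed_mean(values, proportion=0.05):
    """Hypothetical helper: mean after dropping floor(proportion * n) values from each end.

    Simplified sketch; SPSS may weight boundary cases rather than drop whole cases.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = math.floor(proportion * len(data))  # whole cases cut from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

seizures = [36, 15, 45, 42, 286]
print(trimmed_mean(seizures))       # 5%: floor(0.25) = 0 cases cut, so 84.8
print(trimmed_mean(seizures, 0.2))  # 20%: cuts 15 and 286, mean of 36, 42, 45 = 41.0
```

With only five values a 5 per cent trim removes nothing; the 20 per cent call is there only to show the trimming at work.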
17
95 per cent confidence interval for the mean
An indication of the expected error (precision) when the sample mean is used to estimate the population mean
In repeated sampling, about 95 out of every 100 intervals calculated in this way around the sample mean would contain the population mean
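A sketch of the interval as mean ± quantile × standard error, checked against the AGE figures in the Explore output (mean 31.78, standard error .315). This uses the large-sample normal quantile from `statistics.NormalDist` (about 1.96) as a stated simplification; SPSS computes the interval with a t quantile, which is practically identical for a sample of this size. The helper name `mean_ci` is hypothetical.

```python
from statistics import NormalDist

def mean_ci(mean, std_error, level=0.95):
    """Hypothetical helper: approximate interval mean +/- z * standard error.

    Uses the normal quantile (about 1.96 for 95 per cent) as a large-sample
    stand-in for the t quantile SPSS uses.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_error, mean + z * std_error

# Mean and standard error of AGE from the Explore output
lower, upper = mean_ci(31.78, 0.315)
print(round(lower, 2), round(upper, 2))  # 31.16 32.4
```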
18
Measures of dispersion
–The range
–The inter-quartile range
–The variance
–The standard deviation
19
The range
A measure of the spread of the data
Range = maximum − minimum
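The range formula on the seizure data, as a minimal Python sketch. Because it uses only the two most extreme values, a single large year such as 1998 dominates it.

```python
# Opium seizures in Germany, 1994-1998 (kg)
seizures = [36, 15, 45, 42, 286]

# Range = maximum - minimum
data_range = max(seizures) - min(seizures)
print(data_range)  # 271
```

The same formula gives 77 − 1 = 76 for Age in the Explore output.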
20
Quartiles
1st quartile: 25 per cent of the values lie below the 1st quartile and 75 per cent above
2nd quartile: the median; 50 per cent of the values lie below and 50 per cent above
3rd quartile: 75 per cent of the values lie below and 25 per cent above
21
Inter-quartile range
IQR = 3rd quartile − 1st quartile
The inter-quartile range measures the spread, or range, of the middle 50 per cent of the data
Suitable for ordinal level of measurement and above
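A Python sketch of the quartiles and IQR using the standard `statistics.quantiles`. Quartile definitions vary between packages, and small samples make the differences obvious: the "exclusive" method places the p-th quantile at position (n + 1)p, which, as far as we know, matches SPSS's default weighted-average definition in Explore, while "inclusive" uses position (n − 1)p + 1.

```python
from statistics import quantiles

# Opium seizures in Germany, 1994-1998 (kg)
seizures = [36, 15, 45, 42, 286]

# quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3
for method in ("exclusive", "inclusive"):
    q1, q2, q3 = quantiles(seizures, n=4, method=method)
    print(method, q1, q2, q3, "IQR =", q3 - q1)
# exclusive 25.5 42.0 165.5 IQR = 140.0
# inclusive 36.0 42.0 45.0 IQR = 9.0
```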
22
Variance
The average squared difference (deviation) from the mean; for a sample, SPSS divides the sum of squared deviations by n − 1 rather than by n
Measured in squared units
Requires interval or ratio level of measurement
23
Standard deviation
The square root of the variance
Expressed in the same units as the original variable
24
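Example: standard deviation and variance
Seizures of opium in Germany, 1994-1998 (kilograms)
Year    Seizure   Deviation   Squared deviation
1994        36       -48.8             2381.44
1995        15       -69.8             4872.04
1996        45       -39.8             1584.04
1997        42       -42.8             1831.84
1998       286       201.2            40481.44
Total      424         0.0            51150.80
Count: 5    Mean: 84.8
Variance = 51150.8 / 5 ≈ 10230 (dividing by n)
Standard deviation = √10230 ≈ 101
Dividing by n − 1 instead, as SPSS does for a sample, gives a variance of 51150.8 / 4 ≈ 12788 and a standard deviation of ≈ 113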
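The worked table as a Python sketch: squared deviations from the mean, summed, then divided by n (as in the table) or by n − 1 (the sample variance that SPSS reports). The last line cross-checks both against the standard `statistics.pvariance` and `statistics.variance`.

```python
import math
from statistics import pvariance, variance

# Opium seizures in Germany, 1994-1998 (kg)
seizures = [36, 15, 45, 42, 286]
mean = sum(seizures) / len(seizures)                # 84.8
squared_devs = [(x - mean) ** 2 for x in seizures]  # 2381.44, 4872.04, ...
ss = sum(squared_devs)                              # about 51150.8

pop_var = ss / len(seizures)         # divide by n, as in the table
samp_var = ss / (len(seizures) - 1)  # divide by n - 1, sample variance

print(round(pop_var, 2), round(math.sqrt(pop_var), 1))    # 10230.16 101.1
print(round(samp_var, 2), round(math.sqrt(samp_var), 1))  # 12787.7 113.1
print(math.isclose(pop_var, pvariance(seizures)),
      math.isclose(samp_var, variance(seizures)))         # True True
```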
25
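Distribution or shape of the data
The normal distribution
Skewness:
–Positive or right-hand skewed
–Negative or left-hand skewed
Kurtosis (SPSS reports it so that a normal distribution scores 0):
–Platykurtic: flatter, with lighter tails than the normal distribution (negative kurtosis, e.g. Age, -.503)
–Mesokurtic: similar to the normal distribution (kurtosis near 0)
–Leptokurtic: more peaked, with heavier tails than the normal distribution (positive kurtosis)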
26
The normal distribution
Symmetrical data: the mean, the median and the mode coincide
[Figure: symmetrical density curve f(X) against X, with the mean, median and mode at the same point]
27
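Right-hand skew (+)
Right-hand skew: the extreme large values drag the mean towards them, so the mean typically lies above the median, which lies above the mode
[Figure: right-skewed density f(X) against X, with the mode, median and mean in that order from left to right]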
28
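Left-hand skew (-)
Left-hand skew: the extreme small values drag the mean towards them, so the mean typically lies below the median, which lies below the mode
[Figure: left-skewed density f(X) against X, with the mean, median and mode in that order from left to right]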
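A Python sketch of a skewness coefficient, where a positive value indicates right-hand skew. It assumes the small-sample-adjusted Fisher-Pearson formula G1 = g1 × √(n(n − 1)) / (n − 2), which we understand to be the statistic SPSS reports; the function name `sample_skewness` is hypothetical.

```python
import math

def sample_skewness(values):
    """Hypothetical helper: adjusted Fisher-Pearson skewness G1.

    Assumption: G1 = g1 * sqrt(n(n-1)) / (n-2), with g1 = m3 / m2**1.5.
    Positive values indicate right-hand skew, negative values left-hand skew.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

# Opium seizures in Germany, 1994-1998 (kg)
seizures = [36, 15, 45, 42, 286]
print(round(sample_skewness(seizures), 2))  # 2.17 (positive: right-hand skew)
```

This agrees with the earlier results: the mean (84.8 kg) lies well above the median (42 kg), as expected for right-hand skewed data.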
29
Bivariate analysis
Continuous dependent variable (e.g. Age)
Categorical independent variable (e.g. Gender)
30
Explore
31
Explore: Options button
32
Explore: Plots button
33
Explore: Statistics button
34
Descriptives: AGE by Gender
                                  Male                     Female
                                  Statistic   Std. Error   Statistic   Std. Error
Mean                              31.43       .340         33.39       .789
95% CI for Mean: Lower Bound      30.76                    31.84
95% CI for Mean: Upper Bound      32.09                    34.94
5% Trimmed Mean                   31.03                    32.77
Median                            30.00                    33.00
Variance                          144.286                  193.593
Std. Deviation                    12.012                   13.914
Minimum                           1                        14
Maximum                           70                       77
Range                             69                       63
Interquartile Range               19.00                    23.00
Skewness                          .370        .069         .472        .138
Kurtosis                          -.573       .138         -.602       .376
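The idea behind Explore with a BY variable, computing the same summary statistics separately for each category, sketched in Python. The (gender, age) records are hypothetical illustration data, not the survey data summarised in the table.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical (gender, age) records, illustration only
records = [("Male", 25), ("Female", 34), ("Male", 31), ("Female", 29),
           ("Male", 40), ("Female", 45), ("Male", 22)]

# Group the continuous variable (age) by the categorical variable (gender)
ages_by_gender = defaultdict(list)
for gender, age in records:
    ages_by_gender[gender].append(age)

# Compute the same summary statistics separately for each group
for gender, ages in sorted(ages_by_gender.items()):
    print(gender, len(ages), round(mean(ages), 2), median(ages), round(stdev(ages), 2))
```

`stdev` divides by n − 1, the same convention as the Std. Deviation row above.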
35
[Figure: histograms of Age, one panel each for Male and Female]
36
Boxplot of Age by Gender
[Figure: boxplots of Age for each gender, with the median, the inter-quartile range (box) and an outlier labelled]
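A Python sketch of how a boxplot decides which points to draw individually. It assumes, as we understand SPSS's convention, that the box runs between Tukey's hinges, that "outliers" lie more than 1.5 box lengths beyond the box and "extremes" more than 3 box lengths. The helper names are hypothetical; the data are the opium seizures.

```python
from statistics import median

def tukey_hinges(values):
    """Hypothetical helper: medians of the lower and upper halves.

    For an odd number of values the overall median is included in both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return median(data[:half]), median(data[n - half:])

def classify_outliers(values, k_outlier=1.5, k_extreme=3.0):
    """Hypothetical helper: flag values more than k box lengths beyond the box."""
    lower, upper = tukey_hinges(values)
    box = upper - lower  # box length
    result = {}
    for x in values:
        distance = max(lower - x, x - upper, 0)  # how far outside the box
        if distance > k_extreme * box:
            result[x] = "extreme"
        elif distance > k_outlier * box:
            result[x] = "outlier"
    return result

seizures = [36, 15, 45, 42, 286]
print(tukey_hinges(seizures))       # (36, 45)
print(classify_outliers(seizures))  # {15: 'outlier', 286: 'extreme'}
```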
37
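Syntax: Explore
* Explore Age by Gender (no total-sample results): boxplots, histograms and descriptive statistics with a 95 per cent confidence interval.
EXAMINE VARIABLES=age BY gender
  /ID=id
  /PLOT BOXPLOT HISTOGRAM
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.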
38
Summary
–Measures of central tendency
–Measures of variation
–Quantiles
–Measures of shape
–Bivariate analysis for a categorical independent variable and continuous dependent variable
–Histograms
–Boxplots