Published by Preston Rich. Modified over 6 years ago.
1
Descriptive Statistics
Healey Chapters 3 and 4 (1e) or Ch. 3 (2/3e)
Measures of Central Tendency and Dispersion
2
Measures of Central Tendency
1. Mode = can be used for any kind of data, but it is the only measure of central tendency available for nominal (qualitative) data.
Formula: the value that occurs most often, or the category or interval with the highest frequency.
Note: Omit Formula 3.1 (Variation Ratio) in Healey and Prus 2nd Cdn. Edition.
3
Example for Nominal Variables:
Frequency table columns: Religion, frequency, cf, proportion, %, Cum%
Categories: Catholic, Protestant, Jewish, Muslim, Other, None, Total
Central Tendency: MODE = largest category = Catholic
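As a minimal sketch of finding the mode for a nominal variable like religion, the Python below counts categories and returns the most frequent one. The `responses` list and the `mode_of` helper are hypothetical illustrations, not the frequencies from the slide.

```python
from collections import Counter

# Hypothetical nominal data (NOT the slide's frequencies): one label per respondent.
responses = ["Catholic", "Protestant", "Catholic", "None", "Jewish",
             "Catholic", "Protestant", "Muslim", "Other", "Catholic"]

def mode_of(values):
    """Return the category (or categories, if tied) with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    # Keep every category whose frequency equals the highest frequency.
    return [category for category, freq in counts.items() if freq == top]

print(mode_of(responses))  # ['Catholic']
```

Returning a list rather than a single value keeps ties visible, since a distribution can have more than one mode.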
4
Central Tendency (cont.)
2. Median = exact centre or middle of the ordered data (the 50th percentile).
Formula: Array the data (put them in order).
- When the sample size is even, the median falls halfway between the two middle numbers. To calculate: find the values in positions (n/2) and (n/2)+1, add them together, and divide the total by 2.
- When the sample size is odd, the median is the value in the exact middle position, (n+1)/2.
5
Example for Raw Data: Suppose you have the following set of test scores:
66, 89, 41, 98, 76, 77, 68, 60, 60, 67, 69, 66, 98, 52, 74, 66, 89, 95, 66, 69
1. Array (put in order) your data:
41, 52, 60, 60, 66, 66, 66, 66, 67, 68, 69, 69, 74, 76, 77, 89, 89, 95, 98, 98
N = 20 (N is even)
6
To calculate:
- find the two middle positions: (n/2) and (n/2)+1
- add together the two middle numbers
- divide the total by 2
First middle number: (20/2) = the 10th number
Second middle number: (20/2)+1 = the 11th number
Look at the ordered data: the middle numbers are 68 and 69
The median is (68+69)/2 = 68.5
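The position rule above can be sketched in Python as follows. The function name `median_by_position` is an illustrative choice, and the subtraction of 1 reflects Python's 0-based list indexing.

```python
scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

def median_by_position(values):
    """Median via the (n/2), (n/2)+1 rule for even n and (n+1)/2 for odd n."""
    ordered = sorted(values)           # array the data
    n = len(ordered)
    if n % 2 == 0:
        lower = ordered[n // 2 - 1]    # the (n/2)th case
        upper = ordered[n // 2]        # the ((n/2)+1)th case
        return (lower + upper) / 2
    return ordered[(n + 1) // 2 - 1]   # the ((n+1)/2)th case

print(median_by_position(scores))  # 68.5
```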
7
Median for Aggregate (grouped) Data
This formula is shown in Healey 1st Cdn. Edition but NOT in 2/3 Cdn. Edition. We will NOT COVER this one!
8
Properties of median:
- for numerical data at interval or ordinal level
- "balance point" (half of the cases fall above it and half below)
- not affected by outliers
- median is appropriate when distribution is highly skewed
9
3. Mean for Raw Data
The mean is the sum of the measurements divided by the number of subjects.
Formula: X-bar = ΣXi / N
Data (from above): 66, 89, 41, 98, 76, 77, 68, 60, 60, 67, 69, 66, 98, 52, 74, 66, 89, 95, 66, 69
10
Example for Mean
Formula: X-bar = ΣXi / N = 1446 / 20 = 72.3
The mean for these test scores is 72.30
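A short Python check of this arithmetic, using the same test scores, might look like the sketch below; the variable names are illustrative.

```python
scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

total = sum(scores)   # the sum of all Xi
n = len(scores)       # N, the number of cases
mean = total / n      # X-bar = (sum of Xi) / N

print(total, n, mean)  # 1446 20 72.3
```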
11
Mean for Aggregate (Grouped) Data (Note: not in text but covered in class)
To calculate the mean for grouped data, you need a frequency table that includes a column for the midpoints (m) and a column for the product of the frequencies and the midpoints (fm).
Formula: X-bar = Σ (fm) / N
12
Frequency table:
Score    f    m*      (fm)
41-50    1    45.5    45.5
51-60    3    55.5    166.5
...
N = 20   Σ (fm) = 1420
* Find midpoints first
13
Calculating Mean for Grouped Data:
Formula: X-bar = Σ (fm) / N = 1420 / 20 = 71
The mean for the grouped data is 71.
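The grouped calculation can be sketched in Python by building the f, m, and fm columns from the raw test scores. This assumes the class intervals continue in widths of 10 up to 91-100; only the first two intervals appear in the frequency table, so the remaining ones are an assumption.

```python
scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

# Assumption: intervals continue in widths of 10 beyond the two shown in the table.
intervals = [(41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]

rows = []
for low, high in intervals:
    f = sum(1 for x in scores if low <= x <= high)  # frequency in the interval
    m = (low + high) / 2                            # midpoint of the interval
    rows.append((low, high, f, m, f * m))           # last column is fm

n = sum(row[2] for row in rows)        # N = sum of the frequencies
sum_fm = sum(row[4] for row in rows)   # sum of the fm column
grouped_mean = sum_fm / n

for low, high, f, m, fm in rows:
    print(f"{low}-{high}  f={f}  m={m}  fm={fm}")
print(n, sum_fm, grouped_mean)  # 20 1420.0 71.0
```

The grouped mean (71) differs slightly from the raw-data mean (72.3) because every score in an interval is treated as if it sat at the midpoint.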
14
Properties of the Mean:
- only for numerical data at the interval/ratio level
- "balance point"
- can be affected by outliers, which produce a skewed distribution: the tail becomes elongated and the mean is pulled in the direction of the outlier.
Example:
no outlier: $30000, 30000, 35000, 25000, then mean = $30000
but if an outlier is present: $130000, 30000, 35000, 25000, then mean = $55000
(the mean is pulled up or down in the direction of the outlier)
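To see how one outlier moves the mean much more than the median, here is a small Python sketch using the two income lists from the example. It relies on the standard-library `statistics` module.

```python
from statistics import mean, median

no_outlier = [30000, 30000, 35000, 25000]
with_outlier = [130000, 30000, 35000, 25000]

for label, data in [("no outlier", no_outlier), ("with outlier", with_outlier)]:
    # Format to whole dollars so the output does not depend on int/float types.
    print(f"{label}: mean={mean(data):.0f} median={median(data):.0f}")
# no outlier: mean=30000 median=30000
# with outlier: mean=55000 median=32500
```

The single large value raises the mean by $25000 but the median by only $2500.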
15
NOTE: When distribution is symmetric, mean = median = mode
For skewed distributions, the mean will lie in the direction of the skew, i.e.
- skewed to right (tail pulled to right): mean > median (positive skew)
- skewed to left (tail pulled to left): median > mean (negative skew)
16
Measures of Dispersion
Describe how variable the data are, i.e. how spread out they are around the mean.
Also called measures of variation or variability.
17
Variability for Non-numerical Data (Nominal or Ordinal Level Data)
Measures of variability for non-numerical (nominal or ordinal) data are rarely used.
We will not be covering these in class.
Omit Formula 4.1 (IQV) in Healey and Prus 1st Canadian Edition.
Omit Formula 3.1 (Variation Ratio) in Healey and Prus 2/3 Canadian Edition.
18
2. Range (for numerical data)
Range = difference between the largest and smallest observations
e.g. if data are $130000, 35000, 30000, 30000, 30000, 30000, 25000, 25000, then range = $130000 - $25000 = $105000
19
Interquartile Range (Q):
This is the difference between the 75th and the 25th percentiles (the middle 50%).
Gives a better idea than the range of what the middle of the distribution looks like.
Formula: Q = Q3 - Q1 (where Q3 is the case in position N x .75 and Q1 is the case in position N x .25, counting from the lowest value in the ordered data)
Using above data (N = 8): Q = Q3 - Q1 = (6th case - 2nd case) = $30000 - $25000 = $5000
The interquartile range (Q) is $5000.
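The range and interquartile range for the income data can be sketched in Python as below. The position rule (N x .25 and N x .75) follows the slide and is assumed to give whole numbers, as it does for N = 8; statistical software often uses other quantile conventions and may report slightly different quartiles.

```python
incomes = [130000, 35000, 30000, 30000, 30000, 30000, 25000, 25000]

ordered = sorted(incomes)               # ascending order
n = len(ordered)

data_range = ordered[-1] - ordered[0]   # largest minus smallest

# Slide's rule: Q1 is the (N x .25)th case and Q3 is the (N x .75)th case.
# Assumption: both positions are whole numbers (true for N = 8).
q1_pos = int(n * 0.25)
q3_pos = int(n * 0.75)
q1 = ordered[q1_pos - 1]                # subtract 1 for 0-based indexing
q3 = ordered[q3_pos - 1]
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 105000 25000 30000 5000
```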
20
3. Variance and Standard Deviation:
For raw data at the interval/ratio level. Most common measure of variation.
The numerator in the formula is known as the sum of squares, and the denominator is either the population size N or, for sample data, n-1 (the sample size minus 1).
The variance is denoted by S² and the standard deviation, which is the square root of the variance, by S.
21
Definitional Formula for Variance and Standard Deviation:
Variance: s² = Σ (Xi - X-bar)² / N
Standard Deviation: s = √[ Σ (Xi - X-bar)² / N ]
(the standard deviation is the square root of the variance; the variance is simply the standard deviation squared)
22
Example for S and S²:
Data: 66, 89, 41, 98, 76, 77, 68, 60, 60, 67, 69, 66, 98, 52, 74, 66, 89, 95, 66, 69
- Find ΣXi²: square each Xi and find the total.
- Find (ΣXi)²: find the total of all Xi and square it.
- Substitute the above and N into the formula for S.
- For S², simply square S.
S =
S² =
23
A working formula for the standard deviation:
Note: the definitional formula for standard deviation is not practical for use with data when N>10. The working formula, which is much easier to do on your calculator, should be used instead. Both formulae give exactly the same result. Try it!
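To check that the two approaches agree, the Python sketch below computes the standard deviation of the test scores both ways. The working formula is assumed here to be the common computational form, s = sqrt((ΣXi² - (ΣXi)²/N) / N), with N in the denominator to match the definitional formula; the exact form in the textbook may be written differently.

```python
import math

scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]
n = len(scores)
mean = sum(scores) / n

# Definitional formula: squared deviations from the mean, summed, divided by N.
ss_definitional = sum((x - mean) ** 2 for x in scores)
s_definitional = math.sqrt(ss_definitional / n)

# Working formula (assumed form): uses only the sum of X and the sum of X squared.
sum_x = sum(scores)
sum_x_sq = sum(x * x for x in scores)
ss_working = sum_x_sq - sum_x ** 2 / n
s_working = math.sqrt(ss_working / n)

print(sum_x_sq, sum_x ** 2)  # 108900 2090916
print(math.isclose(s_definitional, s_working))  # True
```

The working form needs only two running totals, which is why it is easier on a calculator.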
24
Properties of S:
- always greater than or equal to 0
- the greater the variation about the mean, the greater S is
- n-1 corrects for bias when using sample data. S tends to underestimate the real population standard deviation when based on sample data, so to correct for this we use n-1. The larger the sample size, the smaller the difference this correction makes.
- when calculating the standard deviation for the whole population, use N in the denominator.
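The effect of the n-1 correction can be illustrated with Python's standard-library `statistics` module, where `pstdev` divides by N and `stdev` divides by n-1. Treating the 20 test scores as either a whole population or a sample is purely for illustration.

```python
import math
from statistics import pstdev, stdev

scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]
n = len(scores)

s_population = pstdev(scores)  # sum of squares divided by N
s_sample = stdev(scores)       # sum of squares divided by n - 1

print(f"{s_population:.3f} {s_sample:.3f}")  # 14.755 15.138

# The two differ by a factor of sqrt(n / (n - 1)), which approaches 1 as n grows.
print(math.isclose(s_sample / s_population, math.sqrt(n / (n - 1))))  # True
```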
25
NOTE: σ, N and mu (µ) denote population parameters
s, n and x-bar (X̄) denote sample statistics
26
Remember the Rounding Rules!
Always use as many decimal places as your calculator can handle during the calculation.
Round your final answer to 2 decimal places, rounding to the nearest number.
Engineer's Rule: when the digit to be dropped is exactly 5 (followed only by 0's), round the preceding digit to the nearest EVEN number (e.g. 2.345 becomes 2.34 and 2.355 becomes 2.36).
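The round-half-to-even rule can be sketched with Python's `decimal` module. The helper `round_final` is a hypothetical name, and values are passed as strings so that an exact 5 is not distorted by binary floating-point representation.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_final(value, places=2):
    """Round a decimal string to `places` places, sending exact halves to the even digit."""
    quantum = Decimal(10) ** -places   # Decimal('0.01') when places=2
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)

print(round_final("2.345"))   # 2.34  (exact half, 4 is already even)
print(round_final("2.355"))   # 2.36  (exact half, 5 rounds up to even 6)
print(round_final("2.3451"))  # 2.35  (not an exact half, so ordinary rounding)
```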
27
Homework Questions Healey and Prus 1e:
#3.1, #3.5, #3.11 (compute s for 8 nations also), #3.15
SPSS: Read the SPSS sections for Ch. 3 and 4 in 1st Cdn. Edition and for Ch. 4 in 2/3 Cdn. Edition.
Try some of the SPSS exercises for practice.