Published by Brice Stevens. Modified over 9 years ago.
1
There are several TYPES of variables that reflect characteristics of the data: Ratio, Interval, Ordinal, Nominal.
2
Ratio scale: constant-size interval between adjacent values on the measurement scale; existence of a meaningful zero point.
3
Interval scale: constant-size interval between adjacent values on the measurement scale; no true zero value.
[Figure labels: compass points N, S, E, W; scale marked -10, 0, 10]
4
Ordinal scale: data that convey only relative magnitude. Examples: Tall, Medium, Short; Dark, Medium, Light.
5
Nominal scale: data in which there is no meaningful numerical information. Example: Single, Married, Divorced, Widowed.
6
Another useful classification:
Continuous: data can take on any value. E.g. height, range 150 to 210 cm; Bill: 174.25 cm.
Discrete: data can take on only certain values. E.g. number of hands, range 0 to 3; Bill: 2 hands.
7
Two more important issues with data:
Accuracy: how close a measured value is to the real value.
Precision: how close repeated measurements are to one another.
Let's say Bill's real height is 174.25 cm.
8
Accurate, precise: 174.25
Not accurate, not precise: 172, 178, 171, 174, 182, 168
Not accurate, precise: 170.25
9
Frequency Distribution: occurrence of the various values observed for the variable.
Raw frequency: counts.
Relative frequency: counts divided by total number of observations.
11
Variable: Hair Colour. Sample size = 5.
Frequency of Black Hair = 2
Frequency of Brown Hair = 3
(Must add to 5)
Relative Frequency of Black Hair = 2/5 = 0.4
Relative Frequency of Brown Hair = 3/5 = 0.6
(Must add to 1)
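The hair-colour counts above can be reproduced with a short Python sketch. The list `hair` is a hypothetical sample built only to match the slide's totals (2 black, 3 brown); it is not the original data.

```python
from collections import Counter

# Hypothetical sample matching the slide: 2 black-haired, 3 brown-haired people.
hair = ["black", "brown", "brown", "black", "brown"]

freq = Counter(hair)                 # raw frequency: count per category
n = sum(freq.values())               # sample size; the counts must add to n
rel_freq = {colour: count / n for colour, count in freq.items()}

print(dict(freq))
print(rel_freq)                      # relative frequencies must add to 1
```

Dividing each count by `n` is all that separates a raw from a relative frequency, which is why the relative values always sum to 1.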
12
Variable: Height. Sample size = 5.
Height (cm)   Frequency   Relative Frequency
168           1           1/5 = 0.2
172           1           1/5 = 0.2
175           1           1/5 = 0.2
178           1           1/5 = 0.2
183           1           1/5 = 0.2
13
Make categories. E.g. number above and number below the mid-point of the range.
Range: Maximum - Minimum = 183 cm - 168 cm = 15 cm
Mid-point: halfway between Min and Max = Min + (Range / 2) = 168 cm + 7.5 cm = 175.5 cm
14
Frequency of Heights Below 175.5 cm = 3
Frequency of Heights Above 175.5 cm = 2
Relative Frequency of Heights Below 175.5 cm = 3/5 = 0.6
Relative Frequency of Heights Above 175.5 cm = 2/5 = 0.4
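As a rough check of the two-category split, the sketch below recomputes the range, the mid-point and the counts from the five heights given on the earlier slide. Variable names are illustrative only.

```python
# The five heights (cm) from the Height frequency slide.
heights = [168, 172, 175, 178, 183]

lowest, highest = min(heights), max(heights)
value_range = highest - lowest             # Maximum - Minimum
midpoint = lowest + value_range / 2        # Min + (Range / 2)

below = sum(h < midpoint for h in heights) # heights under the mid-point
above = sum(h > midpoint for h in heights) # heights over the mid-point

print(value_range, midpoint, below, above)
print(below / len(heights), above / len(heights))
```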
15
Could make THREE categories. Divide range by 3: 15 cm / 3 = 5 cm
Category 1: 168 cm to 168 cm + 5 cm = 168 cm to 173 cm
Category 2: 174 cm to 174 cm + 5 cm = 174 cm to 179 cm
Category 3: 180 cm to 180 cm + 5 cm = 180 cm to 185 cm
16
Frequency of Heights in 168 cm to 173 cm = 2
Frequency of Heights in 174 cm to 179 cm = 2
Frequency of Heights in 180 cm to 185 cm = 1
Relative Frequency of Heights in 168 cm to 173 cm = 2/5 = 0.4
Relative Frequency of Heights in 174 cm to 179 cm = 2/5 = 0.4
Relative Frequency of Heights in 180 cm to 185 cm = 1/5 = 0.2
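The three-category counts can be sketched the same way. This example assumes the class limits 168-173, 174-179 and 180-185 cm from the three-category slide, with both limits treated as inclusive; that inclusiveness is an assumption, since the slides do not state it.

```python
heights = [168, 172, 175, 178, 183]

# Class limits from the three-category slide; both ends assumed inclusive.
classes = [(168, 173), (174, 179), (180, 185)]

counts = [sum(low <= h <= high for h in heights) for low, high in classes]
rel_freq = [c / len(heights) for c in counts]

for (low, high), c, r in zip(classes, counts, rel_freq):
    print(f"{low}-{high} cm: frequency {c}, relative frequency {r}")
```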
17
Mother's age and baby's birth weight: data from Massachusetts
18
Range of the Birth Weight data:
Minimum: 709 g
Maximum: 4990 g
Difference: 4281 g
Let's say we want to look at the distribution of data across 10 categories. Each category would span 428.1 g, but for convenience we'll round to 430 g. Also, instead of starting our first category at 709 g, we'll use 700 g.
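A minimal sketch of how the ten category limits follow from these choices. It assumes whole-gram limits where each class starts 1 g above the previous upper limit; under that assumption a constant 430 g width gives 4141-4570 and 4571-5000 as the last two classes.

```python
minimum, maximum = 709, 4990   # observed extremes (g)
n_classes = 10

exact_width = (maximum - minimum) / n_classes   # 4281 / 10
width = 430                                     # rounded for convenience
start = 700                                     # convenient lower limit

classes = []
lower = start
for i in range(n_classes):
    upper = start + (i + 1) * width   # upper limit of class i + 1
    classes.append((lower, upper))
    lower = upper + 1                 # next class starts 1 g higher

for number, (low, high) in enumerate(classes, start=1):
    print(number, f"{low}-{high}")
```

Starting at 700 g and rounding the width up means the ten classes run to 5000 g, so they cover every observation from 709 g to 4990 g.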
19
Category   Range (g)    Freq.   Rel. Freq.
1          700-1130     3       0.015873016
2          1131-1560    3       0.015873016
3          1561-1990    14      0.074074074
4          1991-2420    29      0.153439153
5          2421-2850    34      0.17989418
6          2851-3280    44      0.232804233
7          3281-3710    33      0.174603175
8          3711-4140    23      0.121693122
9          4141-4570    4       0.021164021
10         4571-5000    2       0.010582011
20
The previous breakdown is OK as long as I have measured weight to the nearest gram. BUT, if I've measured to the nearest 0.1 gram --> my categories may miss some observations (e.g. 1130.4 g falls between 700-1130 and 1131-1560). So need to adjust…
21
Category   Measured to the nearest gram   Measured to the nearest 0.1 gram
1          700-1130                       700-1130.9
2          1131-1560                      1131-1560.9
3          1561-1990                      1561-1990.9
4          1991-2420                      1991-2420.9
5          2421-2850                      2421-2850.9
6          2851-3280                      2851-3280.9
7          3281-3710                      3281-3710.9
8          3711-4140                      3711-4140.9
9          4141-4570                      4141-4570.9
10         4571-5000                      4571-5000.9
22
Histogram: graphical representation of a frequency distribution.
[Figure: histogram; x-axis: Hair colour, y-axis: Frequency]
23
[Figure: Frequency distribution of neonatal birth weight; x-axis: Birth Weight Category, y-axis: Frequency]
24
[Figure: Frequency distribution of neonatal birth weight; x-axis: Birth Weight Category, y-axis: Relative Frequency]
25
Category   Range (g)    Mid-point (g)
1          700-1130     915
2          1131-1560    1346
3          1561-1990    1776
4          1991-2420    2206
5          2421-2850    2636
6          2851-3280    3066
7          3281-3710    3496
8          3711-4140    3926
9          4141-4570    4356
10         4571-5000    4786
26
[Figure: Frequency distribution of neonatal birth weight; x-axis: Birth Weight Category Mid-point, y-axis: Frequency]
27
Category   Range (g)    Freq.   Rel. Freq.   Cum. Freq.
1          700-1130     3       0.0158       0.0158
2          1131-1560    3       0.0158       0.0317
3          1561-1990    14      0.07407      0.1058
4          1991-2420    29      0.15343      0.2592
5          2421-2850    34      0.17989      0.4391
6          2851-3280    44      0.23280      0.6719
7          3281-3710    33      0.17460      0.8465
8          3711-4140    23      0.12169      0.9682
9          4141-4570    4       0.02116      0.9894
10         4571-5000    2       0.01058      1.0
Cumulative Frequency: the Cum. Freq. at any category is equal to the frequency at that category plus the frequency in each previous category. (Here it is expressed as a proportion of n = 189.)
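The running total described above can be sketched with `itertools.accumulate`. The frequency list assumes category 2 holds 3 observations; that value is inferred from the cumulative column (0.0317 = 6/189) and from n = 189, not read directly from the slide.

```python
from itertools import accumulate

# Frequencies for categories 1..10; the second value (3) is an inferred assumption.
freqs = [3, 3, 14, 29, 34, 44, 33, 23, 4, 2]
n = sum(freqs)

cum_freq = list(accumulate(freqs))          # running total of counts
cum_rel_freq = [c / n for c in cum_freq]    # as a proportion of n

for category, (c, r) in enumerate(zip(cum_freq, cum_rel_freq), start=1):
    print(category, c, round(r, 4))
```

The last cumulative count must equal n and the last cumulative proportion must equal 1, which is a quick sanity check on any frequency table.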
28
[Figure: Frequency distribution of neonatal birth weight; x-axis: Birth Weight Category, y-axis: Cumulative Frequency]
29
Measures of Central Tendency
Mean: Average
Median: Middle Value
Mode: Most Frequent Value
These generally tell you where the majority of the observations lie. Each one tells something slightly different.
30
The Mean: The mean is calculated by summing the observed values and dividing the sum by the total number of observations.
Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
31
A die has 6 sides, showing 1, 2, 3, 4, 5 and 6 dots.
34
Name      Observation i   Height X_i
Rishi     1               172
Anne      2               185
Bill      3               132
Cristin   4               191
Rich      5               205
n = 5, $\sum X_i = 885$
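Completing the arithmetic for this table: a small sketch that sums the five heights and divides by n, following the definition of the sample mean. The dictionary name is illustrative.

```python
# Heights X_i for the five observations on the slide.
heights = {"Rishi": 172, "Anne": 185, "Bill": 132, "Cristin": 191, "Rich": 205}

n = len(heights)
total = sum(heights.values())   # sum of X_i
sample_mean = total / n         # sample mean = sum of X_i divided by n

print(n, total, sample_mean)
```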
35
n = 189
37
Another way to calculate the mean. Suppose you had a frequency distribution for the number of cancerous moles on people who regularly visit Club Med.
# cancerous moles (X)   Frequency (f)
0                       8
1                       4
2                       8
3                       10
4                       2
5                       1
38
# cancerous moles (X)   Frequency (f)   f * X
0                       8               0
1                       4               4
2                       8               16
3                       10              30
4                       2               8
5                       1               5
$n = \sum f = 33$
$\sum X = \sum f \cdot X = 63$
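The mean then follows by dividing the sum of f * X by n. A short sketch using the mole counts from the table (variable names are illustrative):

```python
# Frequency distribution from the slide: value X -> frequency f.
mole_freq = {0: 8, 1: 4, 2: 8, 3: 10, 4: 2, 5: 1}

n = sum(mole_freq.values())                        # n = sum of f
sum_fx = sum(f * x for x, f in mole_freq.items())  # sum of f * X
mean = sum_fx / n                                  # mean from grouped counts

print(n, sum_fx, round(mean, 2))
```

This gives the same answer as writing out all 33 observations and averaging them, with far less bookkeeping.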
39
The Mode: the most frequently occurring value in a set of measurements.
[Figure: Frequency distribution of neonatal birth weight; x-axis: Birth Weight Category, y-axis: Frequency]
40
Category   Range (g)    Freq.   Rel. Freq.
1          700-1130     3       0.015873016
2          1131-1560    3       0.015873016
3          1561-1990    14      0.074074074
4          1991-2420    29      0.153439153
5          2421-2850    34      0.17989418
6          2851-3280    44      0.232804233
7          3281-3710    33      0.174603175
8          3711-4140    23      0.121693122
9          4141-4570    4       0.021164021
10         4571-5000    2       0.010582011
Category 6 (2851-3280 g) has the highest frequency (44). Its mid-point is 3065.5 --> report the MODE as 3065.5
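For grouped data, the mode reported here is the mid-point of the class with the highest frequency. A minimal sketch, assuming the same inferred frequency for category 2 (3 observations) and constant 430 g classes:

```python
# Class limits (g) and frequencies; category 2's count and the last two
# class limits are assumptions inferred from n = 189 and the 430 g width.
classes = [(700, 1130), (1131, 1560), (1561, 1990), (1991, 2420), (2421, 2850),
           (2851, 3280), (3281, 3710), (3711, 4140), (4141, 4570), (4571, 5000)]
freqs = [3, 3, 14, 29, 34, 44, 33, 23, 4, 2]

modal_index = freqs.index(max(freqs))    # position of the modal class
low, high = classes[modal_index]
mode = (low + high) / 2                  # report the class mid-point

print(modal_index + 1, (low, high), mode)
```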
41
The Median: the middle measurement of a set of data --> data must be ordered
Heights (cm): 178 143 123 189 187 205 168 173 198
Ordered Heights (cm): 123 143 168 173 178 187 189 198 205
Observation (X):      1   2   3   4   5   6   7   8   9
Median is 178 cm
42
Heights (cm): 178 143 123 189 187 205 168 173 198 162
Ordered Heights (cm): 123 143 162 168 173 178 187 189 198 205
Observation (X):      1   2   3   4   5   6   7   8   9   10
Middle observation is 5.5 --> median is midway between observation 5 and observation 6
Median is (173 + 178)/2 = 175.5
43
General formula for Median: if n is an odd number, $M = X_{(n+1)/2}$
44
General formula for Median: if n is an even number, $M = \frac{X_{n/2} + X_{(n/2)+1}}{2}$
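Both cases can be sketched in one small function. It is checked here against the two height examples from the preceding slides; the function name `median` is illustrative, and Python's standard `statistics.median` would serve equally well.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)          # data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # X_{(n+1)/2}, written with a zero-based index
    return (ordered[mid - 1] + ordered[mid]) / 2   # midway between X_{n/2} and X_{n/2+1}

odd_sample = [178, 143, 123, 189, 187, 205, 168, 173, 198]
even_sample = odd_sample + [162]

print(median(odd_sample))
print(median(even_sample))
```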
45
# cancerous moles (X)   Frequency (f)   Cumulative Frequency
0                       8               8
1                       4               12
2                       8               20
3                       10              30
4                       2               32
5                       1               33
$M = X_{(n+1)/2} = X_{17} = 2$
46
Ordered observations (n = 33):
0 0 0 0 0 0 0 0 1 1 1 1
2 2 2 2 2 2 2 2 3 3 3 3
3 3 3 3 3 3 4 4 5
47
Category   Range (g)    Freq.   Cum. Freq.
1          700-1130     3       3
2          1131-1560    3       6
3          1561-1990    14      20
4          1991-2420    29      49
5          2421-2850    34      83
6          2851-3280    44      127
7          3281-3710    33      160
8          3711-4140    23      183
9          4141-4570    4       187
10         4571-5000    2       189
$M = X_{(n+1)/2} = X_{190/2} = X_{95}$, which falls in category 6 (observations 84 to 127).
48
Median = (lower limit of class) + ((0.5 * n - cum. freq. of the previous class) / # obs. in interval) * (interval size)
= 2851 + ((0.5 * 189 - 83) / 44) * 430
= 2851 + ((94.5 - 83) / 44) * 430
= 2963.4
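The interpolation above can be sketched as follows. The code locates the median class as the first class whose cumulative frequency reaches 0.5 * n, then applies the formula. As before, the frequency of category 2 and the lower limit of category 10 are inferred assumptions; neither affects the result.

```python
from itertools import accumulate

lower_limits = [700, 1131, 1561, 1991, 2421, 2851, 3281, 3711, 4141, 4571]
freqs = [3, 3, 14, 29, 34, 44, 33, 23, 4, 2]   # category 2 count is inferred
width = 430                                     # interval size (g)

n = sum(freqs)
cum = list(accumulate(freqs))
half = 0.5 * n

# Median class: first class whose cumulative frequency reaches n / 2.
k = next(i for i, c in enumerate(cum) if c >= half)
cum_prev = cum[k - 1] if k > 0 else 0           # cum. freq. of the previous class

grouped_median = lower_limits[k] + (half - cum_prev) / freqs[k] * width
print(k + 1, cum_prev, round(grouped_median, 1))
```

The formula assumes observations are spread evenly within the median class, so the result is an estimate rather than the exact middle observation.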
49
[Figure: Frequency distribution of neonatal birth weight; x-axis: Birth Weight Category, y-axis: Frequency]
50
Symmetrical, unimodal distribution: Mean, Mode and Median coincide.
51
Symmetrical, bimodal distribution: Mean, Median, Mode
52
Asymmetric distribution: Mode, Median, Mean
53
Asymmetric distribution: Mean, Median, Mode
54
Measures of Dispersion and Variability
55
[Figure: Frequency distribution of neonatal birth weight; x-axis: Birth Weight Category, y-axis: Frequency]
56
[Figure: Birth Weight (g), marking the Mean, Maximum, Minimum and Range]
58
[Figure: Birth Weight (g) by Observation i, marking the Mean, Maximum, Minimum and the Deviation of an observation from the mean]
60
Average Deviation from the Mean --> on average, how much do the individual observations differ from the mean?
61
i   X_i   $X_i - \bar{X}$
1   1.2   1.2 - 1.8 = -0.6
2   1.4   -0.4
3   1.6   -0.2
4   1.8   0.0
5   2.0   0.2
6   2.2   0.4
7   2.4   0.6
$\sum X_i = 12.6$, n = 7
63
Average Absolute Deviation from the Mean --> on average, how much do the individual observations differ from the mean?
64
i   X_i   $X_i - \bar{X}$     $|X_i - \bar{X}|$
1   1.2   1.2 - 1.8 = -0.6    |1.2 - 1.8| = 0.6
2   1.4   -0.4                0.4
3   1.6   -0.2                0.2
4   1.8   0.0                 0.0
5   2.0   0.2                 0.2
6   2.2   0.4                 0.4
7   2.4   0.6                 0.6
$\sum X_i = 12.6$, n = 7, $\sum (X_i - \bar{X}) = 0.0$
65
Sum of Squared Deviations “Sum of Squares”
66
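i   X_i   $X_i - \bar{X}$   $|X_i - \bar{X}|$   $(X_i - \bar{X})^2$
1   1.2   -0.6              0.6                 (-0.6)^2 = 0.36
2   1.4   -0.4              0.4                 0.16
3   1.6   -0.2              0.2                 0.04
4   1.8   0.0               0.0                 0
5   2.0   0.2               0.2                 0.04
6   2.2   0.4               0.4                 0.16
7   2.4   0.6               0.6                 0.36
$\sum X_i = 12.6$, n = 7
Sum of deviations = 0.0; mean absolute deviation = 0.34; sum of squares = 1.12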
67
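Variance --> mean sum of squares
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$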
68
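i   X_i   $X_i - \bar{X}$   $|X_i - \bar{X}|$   $(X_i - \bar{X})^2$
1   1.2   -0.6              0.6                 (-0.6)^2 = 0.36
2   1.4   -0.4              0.4                 0.16
3   1.6   -0.2              0.2                 0.04
4   1.8   0.0               0.0                 0
5   2.0   0.2               0.2                 0.04
6   2.2   0.4               0.4                 0.16
7   2.4   0.6               0.6                 0.36
$\sum X_i = 12.6$, n = 7
Sum of deviations = 0.0; mean absolute deviation = 0.34; sum of squares = 1.12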
69
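Standard Deviation
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$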
70
Coefficient of Variation --> allows comparison of variability among samples measured in different units or scales. It is $s$ expressed as a % of the mean: $CV = \frac{s}{\bar{X}} \times 100\%$
71
Mean Deviation   Variance   Standard deviation   CV
0.34             0.1867     0.43                 0.24
0.26             0.1367     0.37                 0.21
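The first row of this summary can be reproduced from the seven values used in the deviation tables (1.2 to 2.4). The sketch below uses the sample formulas (divisor n - 1) and reports CV as a proportion rather than a percentage, which is what the tabulated 0.24 suggests; both choices are assumptions inferred from the numbers, not stated on the slide.

```python
import math

x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
n = len(x)
mean = sum(x) / n

deviations = [xi - mean for xi in x]
mean_abs_dev = sum(abs(d) for d in deviations) / n   # average absolute deviation
sum_of_squares = sum(d ** 2 for d in deviations)     # sum of squared deviations
variance = sum_of_squares / (n - 1)                  # sample variance
std_dev = math.sqrt(variance)                        # sample standard deviation
cv = std_dev / mean                                  # coefficient of variation

print(round(mean_abs_dev, 2), round(variance, 4), round(std_dev, 2), round(cv, 2))
```

The plain sum of deviations is always zero (up to rounding), which is why the absolute or squared deviations are used instead.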
72
Standard Error of the Mean
Recall: $\bar{X}$ and $s$ are estimates of $\mu$ and $\sigma$. How good are these measures? We need the level of uncertainty (due to sampling error) in the mean:
$SE_{\bar{X}} = \frac{s}{\sqrt{n}}$
73
Confidence Intervals
SE = measure of how far $\bar{X}$ is likely to be from $\mu$.
2 * SE = 95% confidence, i.e. $\mu$ is inside $\bar{X} \pm 2 \cdot SE$ about 95% of the time.
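A minimal sketch of the standard error and the approximate 95% interval, reusing the seven values from the dispersion slides purely for illustration. The multiplier 2 is the slide's rule of thumb; for a sample this small a t-based multiplier would give a wider interval, so treat the output as a rough guide only.

```python
import math

x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]   # illustrative data from the dispersion slides
n = len(x)
mean = sum(x) / n

# Sample standard deviation (divisor n - 1).
s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

se = s / math.sqrt(n)            # standard error of the mean
half_width = 2 * se              # rule-of-thumb 95% half-width
ci = (mean - half_width, mean + half_width)

print(round(se, 3), tuple(round(v, 2) for v in ci))
```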
74
Reporting variability about the mean.
In text: in a table, as in the previous slide. Or, for example, in a manuscript, I might write: "The mean (± 95% CI) for the random samples of 100, 50, 25 and 10 was 24.84079 (±0.1816), 24.91241 (±0.31996), 24.86719 (±0.40142) and 25.16212 (±0.859), respectively."
You are not restricted to using confidence intervals when reporting variability about the mean, i.e. I could have used mean ± std dev, or mean ± std error.
75
Graphically: Box Plot or Box and Whisker Plot
[Figure: box plot showing Mean, Standard Error and 95% CI]
76
Graphically: Box Plot or Box and Whisker Plot
[Figure: box plot showing Mean, Standard Error and 95% CI]
77
Graphically: Box Plot or Box and Whisker Plot
[Figure: box plot showing Mean and 95% CI]
78
Graphically: Box Plot or Box and Whisker Plot
[Figure: box plot showing Mean and 95% CI]