TYPES: There are several types of variables that reflect characteristics of the data: Ratio, Interval, Ordinal, Nominal.


1 There are several TYPES of variables that reflect characteristics of the data: Ratio, Interval, Ordinal, Nominal

2 Ratio scale: constant size interval between adjacent values on the measurement scale; existence of a meaningful zero point

3 Interval scale: constant size interval between adjacent values on the measurement scale; no true zero value (Figure: compass points N, S, E, W; scale marked -10, 0, 10)

4 Ordinal scale: data that convey only relative magnitude, e.g. Tall, Medium, Short; Dark, Medium, Light

5 Nominal scale: data in which there is no meaningful numerical information, e.g. Single, Married, Divorced, Widowed

6 Another useful classification
Continuous: data can take on any value. E.g. height, 150 to 210 cm range; Bill: 174.25 cm
Discrete: data can take on only certain values. E.g. # of hands, 0 to 3 range; Bill: 2 hands

7 2 more important issues with data
Accuracy: how close a measured value is to the real value
Precision: how close repeated measurements are to one another
Let’s say Bill’s real height is 174.25 cm.

8 (Figure: repeated measurements of Bill’s height)
Accurate, Precise: 174.25
Not Accurate, Not Precise: 172, 178, 171, 174, 182, 168
Not Accurate, Precise: 170.25

9 Frequency Distribution: occurrence of the various values observed for the variable
Raw frequency: counts
Relative frequency: counts divided by total number of observations

10

11 Variable: Hair Colour. Sample size = 5
Frequency of Black Hair = 2
Frequency of Brown Hair = 3
(Must add to 5)
Relative Frequency of Black Hair = 2/5 = 0.4
Relative Frequency of Brown Hair = 3/5 = 0.6
(Must add to 1)
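The hair colour counts can be reproduced with a short Python sketch. The order of the five observations in `hair` is an assumption (the slide gives only the totals), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(observations):
    """Return {value: (raw frequency, relative frequency)} for a sample."""
    counts = Counter(observations)
    n = len(observations)
    return {value: (count, count / n) for value, count in counts.items()}

# Assumed ordering of the sample: 2 black, 3 brown (n = 5)
hair = ["Black", "Brown", "Brown", "Black", "Brown"]
table = frequency_table(hair)

for colour, (freq, rel_freq) in table.items():
    print(f"{colour}: frequency = {freq}, relative frequency = {rel_freq}")
```

Raw frequencies add up to the sample size and relative frequencies add up to 1, as the slide notes.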

12 Variable: Height. Sample size = 5
Height (cm)   Frequency   Relative Frequency
168           1           1/5 = 0.2
172           1           1/5 = 0.2
175           1           1/5 = 0.2
178           1           1/5 = 0.2
183           1           1/5 = 0.2

13 Make categories, e.g. number above and number below the mid-point of the range
Range: Maximum - Minimum = 183 cm - 168 cm = 15 cm
Mid-point: half way between Min and Max = Min + (Range / 2) = 168 cm + 7.5 cm = 175.5 cm

14 Frequency of Heights Below 175.5 cm = 3
Frequency of Heights Above 175.5 cm = 2
Relative Frequency of Heights Below 175.5 cm = 3/5 = 0.6
Relative Frequency of Heights Above 175.5 cm = 2/5 = 0.4
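As a minimal sketch, the mid-point split can be computed directly from the five heights. How to treat a value exactly equal to the mid-point is not covered by the slides; none occurs here, so the sketch ignores that case.

```python
heights_cm = [168, 172, 175, 178, 183]

data_range = max(heights_cm) - min(heights_cm)   # 183 - 168 = 15
mid_point = min(heights_cm) + data_range / 2     # 168 + 7.5 = 175.5

below = [h for h in heights_cm if h < mid_point]
above = [h for h in heights_cm if h > mid_point]
n = len(heights_cm)

print("below:", len(below), len(below) / n)
print("above:", len(above), len(above) / n)
```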

15 Could make THREE categories. Divide range by 3: 15 cm / 3 = 5 cm
Category 1: 168 cm to 168 cm + 5 cm --> 168 cm to 173 cm
Category 2: 174 cm to 174 cm + 5 cm --> 174 cm to 179 cm
Category 3: 180 cm to 180 cm + 5 cm --> 180 cm to 185 cm

16 Frequency of Heights in 168 cm to 173 cm = 2
Frequency of Heights in 174 cm to 179 cm = 2
Frequency of Heights in 180 cm to 185 cm = 1
Relative Frequency of Heights in 168 cm to 173 cm = 2/5 = 0.4
Relative Frequency of Heights in 174 cm to 179 cm = 2/5 = 0.4
Relative Frequency of Heights in 180 cm to 185 cm = 1/5 = 0.2
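A rough Python sketch of the three-category count, using the category limits defined on slide 15 and treating both limits as inclusive (an assumption that suits whole-centimetre data). `count_in_categories` is a hypothetical helper name.

```python
heights_cm = [168, 172, 175, 178, 183]

# Category limits from slide 15, both ends inclusive (assumed)
categories = [(168, 173), (174, 179), (180, 185)]

def count_in_categories(values, limits):
    """Count how many values fall inside each (low, high) category."""
    return [sum(1 for v in values if low <= v <= high) for low, high in limits]

counts = count_in_categories(heights_cm, categories)
rel_freqs = [c / len(heights_cm) for c in counts]

for (low, high), c, r in zip(categories, counts, rel_freqs):
    print(f"{low} cm to {high} cm: frequency = {c}, relative frequency = {r}")
```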

17 Mother’s age and babies’ birth weight data from Massachusetts

18 Range of the Birth Weight data: Minimum: 709 g; Maximum: 4990 g; Difference: 4281 g. Let’s say we want to look at the distribution of data across 10 categories. Each category would span 428.1 g, but for convenience we’ll round to 430 g. Also, instead of starting our first category at 709 g we’ll use 700 g.
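One way to turn these choices into class limits is sketched below. Rounding the width up to the next multiple of 10 g is an assumption that happens to give the slide's 430 g; the variable names are illustrative only.

```python
import math

minimum_g, maximum_g = 709, 4990
n_categories = 10

data_range = maximum_g - minimum_g            # 4281 g
raw_width = data_range / n_categories         # 428.1 g
width = math.ceil(raw_width / 10) * 10        # rounded up to 430 g (assumed rule)
start = 700                                   # convenient value just below 709 g

# Whole-gram classes: each class after the first starts 1 g above the previous upper limit
limits = []
for k in range(n_categories):
    lower = start + k * width + (1 if k > 0 else 0)
    upper = start + (k + 1) * width
    limits.append((lower, upper))

for number, (lower, upper) in enumerate(limits, start=1):
    print(number, f"{lower}-{upper}")
```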

19
Category   Range (g)    Freq.   Rel. Freq.
1          700-1130       3     0.015873016
2          1131-1560      3     0.015873016
3          1561-1990     14     0.074074074
4          1991-2420     29     0.153439153
5          2421-2850     34     0.17989418
6          2851-3280     44     0.232804233
7          3281-3710     33     0.174603175
8          3711-4140     23     0.121693122
9          4141-4570      4     0.021164021
10         4571-5000      2     0.010582011

20 The previous breakdown is OK as long as I have measured weight to the nearest gram. BUT, if I’ve measured to the nearest 0.1 gram --> my categories may miss some observations (e.g. 1130.5 g falls between the first two categories). So need to adjust…

21
Category   Measured to the nearest gram   Measured to the nearest 0.1 gram
1          700-1130                       700-1130.9
2          1131-1560                      1131-1560.9
3          1561-1990                      1561-1990.9
4          1991-2420                      1991-2420.9
5          2421-2850                      2421-2850.9
6          2851-3280                      2851-3280.9
7          3281-3710                      3281-3710.9
8          3711-4140                      3711-4140.9
9          4141-4570                      4141-4570.9
10         4571-5000                      4571-5000.9

22 Histogram: graphical representation of a frequency distribution (Figure: Frequency by Hair colour)

23 (Figure: Frequency distribution of neonatal birth weight; Frequency by Birth Weight Category)

24 (Figure: Frequency distribution of neonatal birth weight; Relative Frequency by Birth Weight Category)

25
Category   Range (g)    Mid-point
1          700-1130      915
2          1131-1560    1346
3          1561-1990    1776
4          1991-2420    2206
5          2421-2850    2636
6          2851-3280    3066
7          3281-3710    3496
8          3711-4140    3926
9          4141-4570    4356
10         4571-5000    4786

26 (Figure: Frequency distribution of neonatal birth weight; Frequency by Birth Weight Category Mid-point)

27
Category   Range (g)    Freq.   Rel. Freq.   Cum. Freq.
1          700-1130       3     0.0158       0.0158
2          1131-1560      3     0.0158       0.0317
3          1561-1990     14     0.07407      0.1058
4          1991-2420     29     0.15343      0.2592
5          2421-2850     34     0.17989      0.4391
6          2851-3280     44     0.23280      0.6719
7          3281-3710     33     0.17460      0.8465
8          3711-4140     23     0.12169      0.9682
9          4141-4570      4     0.02116      0.9894
10         4571-5000      2     0.01058      1.0
Cumulative Frequency: Cum. Freq. at any category is equal to the frequency at that category plus the frequency in each previous category. (The Cum. Freq. column here is expressed as a proportion of n = 189.)
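The running total can be sketched with `itertools.accumulate`. The ten class frequencies below are an assumption: they are the values whose running totals match the cumulative counts 3, 6, 20, 49, 83, 127, 160, 183, 187, 189 shown on slide 47.

```python
from itertools import accumulate

# Assumed class frequencies for the 10 birth-weight categories (n = 189)
freq = [3, 3, 14, 29, 34, 44, 33, 23, 4, 2]
n = sum(freq)

cum_freq = list(accumulate(freq))            # running total of counts
cum_rel_freq = [c / n for c in cum_freq]     # running total as a proportion of n

for category, (c, r) in enumerate(zip(cum_freq, cum_rel_freq), start=1):
    print(category, c, round(r, 4))
```

The last cumulative count equals n and the last cumulative proportion equals 1.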

28 (Figure: Frequency distribution of neonatal birth weight; Cumulative Frequency by Birth Weight Category)

29 Measures of Central Tendency: Mean (Average), Median (Middle Value), Mode (Most Frequent Value)
These generally tell you where the majority of the observations lie
Each one tells something slightly different

30 The Mean: The mean is calculated by summing the observed values and dividing the sum by the total number of observations.
Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

31 A die has 6 sides, 1 dot, 2, 3, 4, 5, and 6

32

33

34
Name      Observation i   Height X_i
Rishi     1               172
Anne      2               185
Bill      3               132
Cristin   4               191
Rich      5               205
n = 5, ΣX_i = 885
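Applied to the five heights on this slide, the sample mean works out as in the sketch below; `sample_mean` is a hypothetical helper name.

```python
def sample_mean(values):
    """Sum the observations and divide by the number of observations."""
    values = list(values)
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

heights = [172, 185, 132, 191, 205]   # X_1 .. X_5 from the slide
total = sum(heights)                  # 885
mean_height = sample_mean(heights)    # 885 / 5

print(total, mean_height)
```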

35 n = 189

36

37 Another way to calculate the mean. Suppose you had a frequency distribution for the number of cancerous moles on people who regularly visit Club Med
# cancerous moles (X)   Frequency (f)
0                       8
1                       4
2                       8
3                       10
4                       2
5                       1

38
# cancerous moles (X)   Frequency (f)   f * X
0                       8               0
1                       4               4
2                       8               16
3                       10              30
4                       2               8
5                       1               5
n = Σ f’s = 33
Σ X’s = Σ f*X = 63
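A small sketch of the same calculation: with a frequency table, the mean is Σ f*X divided by Σ f. Variable names are illustrative.

```python
moles = [0, 1, 2, 3, 4, 5]        # X: number of cancerous moles
freq = [8, 4, 8, 10, 2, 1]        # f: number of people with that many moles

n = sum(freq)                                    # Σ f
sum_fx = sum(f * x for f, x in zip(freq, moles)) # Σ f*X
mean_moles = sum_fx / n

print(n, sum_fx, round(mean_moles, 2))
```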

39 The Mode: the most frequently occurring value in a set of measurements (Figure: Frequency distribution of neonatal birth weight; Frequency by Birth Weight Category)

40
Category   Range (g)    Freq.   Rel. Freq.
1          700-1130       3     0.015873016
2          1131-1560      3     0.015873016
3          1561-1990     14     0.074074074
4          1991-2420     29     0.153439153
5          2421-2850     34     0.17989418
6          2851-3280     44     0.232804233
7          3281-3710     33     0.174603175
8          3711-4140     23     0.121693122
9          4141-4570      4     0.021164021
10         4571-5000      2     0.010582011
The most frequent category is category 6 (2851-3280, frequency 44). Its mid-point is 3065.5 --> report the MODE as 3065.5
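For grouped data, the mode reported above is the mid-point of the most frequent class. A minimal sketch follows; the class limits are generated from the 700 g start and 430 g width chosen on slide 18, and the ten frequencies are assumed values whose running totals match the cumulative counts on slide 47.

```python
start, width = 700, 430

# Whole-gram class limits built from the slide-18 start and width
classes = [(start + k * width + (1 if k > 0 else 0), start + (k + 1) * width)
           for k in range(10)]

# Assumed class frequencies (n = 189)
freq = [3, 3, 14, 29, 34, 44, 33, 23, 4, 2]

modal_index = max(range(len(freq)), key=lambda k: freq[k])
modal_class = classes[modal_index]
mode_estimate = (modal_class[0] + modal_class[1]) / 2

print(modal_index + 1, modal_class, mode_estimate)
```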

41 The Median: the middle measurement of a set of data --> data must be ordered
Heights (cm):           178 143 123 189 187 205 168 173 198
Ordered Heights (cm):   123 143 168 173 178 187 189 198 205
Observation (X):        1   2   3   4   5   6   7   8   9
Median is 178 cm

42
Heights (cm):           178 143 123 189 187 205 168 173 198 162
Ordered Heights (cm):   123 143 162 168 173 178 187 189 198 205
Observation (X):        1   2   3   4   5   6   7   8   9   10
Middle observation is 5.5 --> median is midway between observation 5 and observation 6
Median is (173+178)/2 = 175.5

43 General formula for Median: If n is an odd number: $M = X_{(n+1)/2}$

44 General formula for Median: If n is an even number: $M = \frac{X_{n/2} + X_{(n/2)+1}}{2}$
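The odd and even cases can be sketched in one function and checked against the height examples on slides 41 and 42. The function name `median` is illustrative; indices in the code are 0-based, whereas the slides count observations from 1.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # observation (n+1)/2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2  # observations n/2 and n/2 + 1

odd_sample = [178, 143, 123, 189, 187, 205, 168, 173, 198]
even_sample = odd_sample + [162]

print(median(odd_sample))    # n = 9
print(median(even_sample))   # n = 10
```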

45
# cancerous moles (X)   Frequency (f)   Cumulative Frequency
0                       8               8
1                       4               12
2                       8               20
3                       10              30
4                       2               32
5                       1               33
$M = X_{(n+1)/2} = X_{17} = 2$
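With a frequency table, the median can be located from the cumulative frequencies: it is the first value whose cumulative count reaches position (n+1)/2. The sketch below assumes n is odd, as it is here.

```python
from itertools import accumulate

moles = [0, 1, 2, 3, 4, 5]
freq = [8, 4, 8, 10, 2, 1]

cum_freq = list(accumulate(freq))
n = cum_freq[-1]
position = (n + 1) // 2          # middle observation, valid for odd n only

# First value whose cumulative frequency reaches the middle position
median_moles = next(x for x, c in zip(moles, cum_freq) if c >= position)

print(cum_freq, position, median_moles)
```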

46 Ordered observations: 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 5

47
Category   Range (g)    Freq.   Cum. Freq.
1          700-1130       3       3
2          1131-1560      3       6
3          1561-1990     14      20
4          1991-2420     29      49
5          2421-2850     34      83
6          2851-3280     44     127
7          3281-3710     33     160
8          3711-4140     23     183
9          4141-4570      4     187
10         4571-5000      2     189
$M = X_{(n+1)/2} = X_{190/2} = X_{95}$

48 Median = (lower limit of class) + ((0.5*n - cum. freq. of the previous class) / # obs in interval) * (interval size)
= 2851 + ((0.5*189 - 83)/44) * 430
= 2851 + ((94.5 - 83)/44) * 430
= 2963.4
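The interpolation on this slide can be written as a small function. `grouped_median` and its parameter names are hypothetical; the arguments are the values used on the slide.

```python
def grouped_median(lower_limit, n, cum_freq_before, class_freq, interval_size):
    """Interpolate the median inside the class that contains the middle observation."""
    return lower_limit + ((0.5 * n - cum_freq_before) / class_freq) * interval_size

median_weight = grouped_median(
    lower_limit=2851,      # lower limit of the class containing the median
    n=189,                 # total number of observations
    cum_freq_before=83,    # cumulative frequency of the previous class
    class_freq=44,         # number of observations in the median class
    interval_size=430,     # class width in grams
)

print(round(median_weight, 1))
```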

49 (Figure: Frequency distribution of neonatal birth weight; Frequency by Birth Weight Category)

50 Symmetrical, unimodal distribution (Figure labels: Mean, Mode and Median)

51 Symmetrical, bimodal distribution (Figure labels: Mean, Median, Mode)

52 Asymmetric distribution (Figure labels: Mode, Median, Mean)

53 Asymmetric distribution (Figure labels: Mean, Median, Mode)

54 Measures of Dispersion and Variability

55 (Figure: Frequency distribution of neonatal birth weight; Frequency by Birth Weight Category)

56 (Figure: Birth Weight (g), with Mean, Maximum, Minimum and Range marked)

57

58 (Figure: Birth Weight (g) by Observation i, with Mean, Maximum, Minimum and Deviation marked)

59

60 Average Deviation from the Mean --> on average, how much do the individual observations differ from the mean?

61
i   X_i   Deviation from the mean
1   1.2   1.2 - 1.8 = -0.6
2   1.4   -0.4
3   1.6   -0.2
4   1.8    0.0
5   2.0    0.2
6   2.2    0.4
7   2.4    0.6
ΣX = 12.6, n = 7

62

63 Average Absolute Deviation from the Mean --> on average, how much do the individual observations differ from the mean?

64
i   X_i   Deviation            Absolute deviation
1   1.2   1.2 - 1.8 = -0.6     |1.2 - 1.8| = 0.6
2   1.4   -0.4                 0.4
3   1.6   -0.2                 0.2
4   1.8    0.0                 0.0
5   2.0    0.2                 0.2
6   2.2    0.4                 0.4
7   2.4    0.6                 0.6
ΣX = 12.6, n = 7; sum of deviations = 0.0
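A short sketch of why the plain average deviation is not useful (it is always zero, up to floating-point rounding) while the average absolute deviation is. The data are the seven values from the slide; variable names are illustrative.

```python
x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
n = len(x)
mean = sum(x) / n                                   # 12.6 / 7

deviations = [xi - mean for xi in x]
mean_deviation = sum(deviations) / n                # cancels out to about 0
mean_abs_deviation = sum(abs(d) for d in deviations) / n   # 2.4 / 7

print(round(mean, 2), round(mean_deviation, 10), round(mean_abs_deviation, 2))
```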

65 Sum of Squared Deviations (“Sum of Squares”): $SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$

66
i   X_i   Deviation   Absolute deviation   Squared deviation
1   1.2   -0.6        0.6                  (-0.6)^2 = 0.36
2   1.4   -0.4        0.4                  0.16
3   1.6   -0.2        0.2                  0.04
4   1.8    0.0        0.0                  0
5   2.0    0.2        0.2                  0.04
6   2.2    0.4        0.4                  0.16
7   2.4    0.6        0.6                  0.36
ΣX = 12.6, n = 7; sum of deviations = 0.0; mean absolute deviation = 0.34; sum of squares = 1.12

67 Variance --> mean sum of squares
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

68 (Table repeated from slide 66: sum of squares = 1.12, n = 7)

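69 Standard Deviation
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$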
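The sum of squares, variance and standard deviation for the seven values used in the preceding tables can be sketched as follows. The sample versions divide by n - 1, which is what reproduces the 0.1867 and 0.43 reported on slide 71; the population versions (dividing by n) are shown only for comparison.

```python
import math

x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
n = len(x)
mean = sum(x) / n

sum_of_squares = sum((xi - mean) ** 2 for xi in x)   # about 1.12

population_variance = sum_of_squares / n             # divide by N
sample_variance = sum_of_squares / (n - 1)           # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(round(sum_of_squares, 2), round(sample_variance, 4), round(sample_sd, 2))
```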

70 Coefficient of Variation --> allows comparison of variability among samples measured in different units or scales. s expressed as a % of the mean: $CV = \frac{s}{\bar{X}} \times 100\%$
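As a rough illustration, the coefficient of variation for the seven values 1.2 to 2.4 used earlier in the deck can be computed with the standard library. `coefficient_of_variation` is a hypothetical helper that returns a proportion; multiply by 100 for a percentage.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean (as a proportion)."""
    return statistics.stdev(values) / statistics.mean(values)

x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
cv = coefficient_of_variation(x)

print(round(cv, 2), f"{cv * 100:.0f}%")
```

Because the units cancel, the same function can be applied to samples measured on different scales.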

71
Mean Deviation   Variance   Standard deviation   CV
0.34             0.1867     0.43                 0.24
0.26             0.1367     0.37                 0.21

72 Standard Error of the Mean
Recall: $\bar{X}$ and s are estimates of μ and σ
How good are these measures?
Need level of uncertainty (due to sampling error) in the mean: $SE_{\bar{X}} = \frac{s}{\sqrt{n}}$

73 Confidence Intervals
SE = measure of how far $\bar{X}$ is likely to be from μ
2 * SE = 95% confidence (approximately)
I.e. μ is inside $\bar{X} \pm 2 \cdot SE$ about 95% of the time
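A minimal sketch of the standard error and the rule-of-thumb interval of mean ± 2 * SE, applied to the seven values 1.2 to 2.4 used earlier in the deck. `mean_with_interval` and its `multiplier` parameter are hypothetical names; the multiplier of 2 is the slide's approximation, not an exact critical value.

```python
import math
import statistics

def mean_with_interval(values, multiplier=2.0):
    """Return (mean, standard error, (lower, upper)) using mean ± multiplier * SE."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)   # s / sqrt(n)
    half_width = multiplier * se
    return mean, se, (mean - half_width, mean + half_width)

x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
mean, se, (lower, upper) = mean_with_interval(x)

print(round(mean, 2), round(se, 3), round(lower, 2), round(upper, 2))
```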

74 Reporting variability about the mean. In a table, as in the previous slide. Or in text: for example, in a manuscript, I might write: The mean (± 95% CI) for the random samples of 100, 50, 25 and 10 was 24.84079 (±0.1816), 24.91241 (±0.31996), 24.86719 (±0.40142) and 25.16212 (±0.859), respectively. You are not restricted to using confidence intervals when reporting variability about the mean, i.e. I could have used mean ± std dev, or mean ± std error.

75 Graphically: Box Plot or Box and Whisker Plot (Figure legend: Mean, Standard Error, 95% CI)

76 Graphically: Box Plot or Box and Whisker Plot (Figure legend: Mean, Standard Error, 95% CI)

77 Graphically: Box Plot or Box and Whisker Plot (Figure legend: Mean, 95% CI)

78 Graphically: Box Plot or Box and Whisker Plot (Figure legend: Mean, 95% CI)

