Published by Colleen Grant. Modified over 9 years ago.
1
Chapter 2 Statistical description of quantitative variables
2
Teaching contents
In this chapter, we shall study descriptive techniques for quantitative variables.
Section 1 Frequency distribution table and frequency distribution graph
Section 2 Measures of central tendency
Section 3 Measures of dispersion tendency
3
Teaching aims
To learn how to use frequency tables and graphs.
To master the application of the different summary indexes.
4
Department of Health Statistics Section 1 Frequency distribution table and frequency distribution graph
5
Part 1 Frequency distribution table and graph of a qualitative variable
Part 2 Frequency distribution table and graph of a quantitative variable
Part 3 Usage of the frequency distribution graph
6
[Example 1.1] University officials periodically review the distribution of undergraduate majors to help determine a fair allocation of resources. The following data were obtained.
Table 1.1 The distribution of undergraduate majors
7
Fig 1.1 The distribution of undergraduate majors
8
[Example 1.2] The techniques will be illustrated using the Scottish Heart Health Study, but for simplicity we shall take only one variable recorded on 50 subjects.
9
Table 1.2 Serum total cholesterol (mmol/L) of 50 subjects from the Scottish Heart Health Study
5.75 6.29 6.13 6.78 6.46
6.76 5.98 6.25 6.31 5.99
6.47 5.71 5.19 4.35 5.35
7.11 6.89 6.05 7.01 5.86
5.42 4.92 7.12 5.85 5.64
7.04 6.23 5.71 6.74 6.36
5.75 7.71 6.19 7.55 6.76
7.14 5.73 6.73 7.86 5.51
6.02 6.54 5.34 6.92 7.15
6.55 7.16 4.79 6.64 6.83
10
How to describe the data in Table 1.2?
List all the data one by one. But this makes it difficult for the reader to grasp the distribution characteristics of the 50 individuals.
Summarize the data using specific indexes, which is economical in space and easier for the reader to understand.
11
FREQUENCY DISTRIBUTION TABLE and FREQUENCY DISTRIBUTION GRAPH
Step 1 Find MIN and MAX, and compute the range.
Step 2 Set up the class intervals.
Step 3 Assign each observation to one of the class intervals.
12
Step 1
MIN = 4.35, MAX = 7.86, RANGE = 7.86 - 4.35 = 3.51
Range is the difference between MAX and MIN.
13
Step 2
Divide the range by the approximate number of class intervals. Generally we wish to have 7 to 15 class intervals, depending on the sample size: the larger the sample, the more class intervals we use.
14
Step 2
Suppose we wish to have 7 class intervals. The interval width is then 3.51 (range) / 7 ≈ 0.5, so we choose 0.5 as the interval width.
15
Step 2
Divide the range by the desired number of class intervals.
Your attention: The first class interval must contain MIN, and the last one must contain MAX.
16
Step 3
Construct the frequency distribution by keeping a tally of the number of measurements falling in each interval.
17
Step 3
Your attention: Each class interval includes its lower limit (L) but not its upper limit (U). For example, a value of 5.5 belongs to the fourth group (5.5 to 6.0), not the third (5.0 to 5.5).
18
Table 1.3 Frequency distribution table for serum total cholesterol (each class interval is given by its lower limit and upper limit)
Percentage is the frequency divided by the sample size (50).
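The three steps above can be sketched in Python. This is only an illustrative sketch: the function name `frequency_table` is hypothetical, and the starting point 4.0 is an assumption inferred from the slide's remark that 5.5 falls in the fourth group. Table 1.3 itself is not reproduced in this transcript.

```python
import math

# Serum total cholesterol (mmol/L) of the 50 subjects in Table 1.2.
cholesterol = [
    5.75, 6.29, 6.13, 6.78, 6.46, 6.76, 5.98, 6.25, 6.31, 5.99,
    6.47, 5.71, 5.19, 4.35, 5.35, 7.11, 6.89, 6.05, 7.01, 5.86,
    5.42, 4.92, 7.12, 5.85, 5.64, 7.04, 6.23, 5.71, 6.74, 6.36,
    5.75, 7.71, 6.19, 7.55, 6.76, 7.14, 5.73, 6.73, 7.86, 5.51,
    6.02, 6.54, 5.34, 6.92, 7.15, 6.55, 7.16, 4.79, 6.64, 6.83,
]

def frequency_table(values, start, width):
    """Tally values into intervals [L, L + width), beginning at `start`.

    Assumes every value is >= start (hypothetical helper, not from the slides).
    """
    n_classes = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_classes
    for v in values:
        # Lower limit included, upper limit excluded.
        counts[math.floor((v - start) / width)] += 1
    rows, cumulative = [], 0
    for k, f in enumerate(counts):
        cumulative += f
        lower = start + k * width
        percentage = 100 * f / len(values)
        rows.append((lower, lower + width, f, percentage, cumulative))
    return rows

# Step 1: MIN, MAX and range.
print(min(cholesterol), max(cholesterol),
      round(max(cholesterol) - min(cholesterol), 2))

# Steps 2 and 3: width 0.5, first interval assumed to start at 4.0.
table = frequency_table(cholesterol, start=4.0, width=0.5)
for lower, upper, f, pct, cum in table:
    print(f"{lower:.1f}-{upper:.1f}  f={f:2d}  {pct:4.1f}%  cumulative={cum}")
```

Each row holds the lower limit, upper limit, frequency, percentage, and cumulative frequency, so the same rows can later be reused for grouped percentiles.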
19
Fig 1.2 Frequency distribution graph for serum total cholesterol
20
The difference
21
Usage of frequency distribution graph
1 To describe the characteristics of the frequency distribution. From Table 1.3 and Fig 1.2, we can see that the serum total cholesterol of most people is between 5.0 and 7.0 mmol/L; the proportion outside this range is relatively small.
22
How to describe the distribution characteristics of data?
Central tendency
Dispersion tendency
23
Describe how data are distributed
Positive-skewed
Negative-skewed
Symmetric
24
Table 2 Mercury concentration of hair in 238 healthy people (positive-skewed): number of people by hair mercury concentration
25
Table 3 Myoglobin concentration in blood serum of 101 normal people (negative-skewed): number of people by serum myoglobin concentration
26
2 To find outliers (values that are too large or too small). For instance, if all the serum total cholesterol values lie between 4.0 and 8.0 and one value is 28 (implausibly large), we call it an outlier and should check whether it was recorded correctly.
3 To serve as a way of describing data.
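The outlier check in usage 2 can be sketched as a simple range test. The helper name `flag_outliers` and the short sample are hypothetical; the sample combines a few values from Table 1.2 with the implausible value 28 mentioned in the slide, and the plausible range 4.0 to 8.0 is taken from the same slide.

```python
def flag_outliers(values, low, high):
    """Return the values that fall outside the interval [low, high)."""
    return [v for v in values if not (low <= v < high)]

# Illustrative sample: four values from Table 1.2 plus the suspicious 28.
sample = [5.75, 6.29, 28, 6.13, 4.35]
suspicious = flag_outliers(sample, low=4.0, high=8.0)
print(suspicious)  # [28]
```

A flagged value is not deleted automatically; as the slide says, it should be checked against the original record.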
27
Section 2 Measures of central tendency
28
Central tendency
Central tendency reflects the average level of a series of measurements.
1 Arithmetic mean
2 Geometric mean
3 Median and percentile
4 Mode
29
The arithmetic mean
[Definition] The arithmetic mean, also called the mean, is defined as the sum of the measurements divided by the total number of measurements.
30
[Symbols] The population mean is denoted by the Greek letter $\mu$ (read "mu") and the sample mean is denoted by the symbol $\bar{X}$ (read "X-bar").
[Sample mean] $\bar{X} = \frac{\sum X}{n}$
[Population mean] $\mu = \frac{\sum X}{n}$
Here $n$ is the total number of observations, $X$ is a particular value, and $\sum$ (read "sigma") indicates the operation of adding.
31
[Example 2.1] The mean score on a given test can be found for an entire class. Take a look at this American History class:
32
[Solution] We find the mean score by adding all the scores together and dividing by 10 (the number of scores).
33
[Properties of the arithmetic mean]
All the values are included in computing the mean.
The mean is easily affected by extremely large or extremely small values.
34
[Notice] The mean should only be used for homogeneous data. For example, we can compute the mean height of ten-year-old boys, but it is not meaningful to calculate the mean height of boys aged 1 to 14 years. The mean is an appropriate summary only when the distribution is approximately normal (symmetric).
35
The mean can be used.
36
Geometric mean
[Definition] The geometric mean is defined as the nth root of the product of the n numbers.
[Symbol] G
37
[Formula] $G = \sqrt[n]{X_1 X_2 \cdots X_n} = \lg^{-1}\left(\frac{\sum \lg X}{n}\right)$
38
[Example 2.3] The serum antibody levels of six patients are listed: 1:10, 1:20, 1:40, 1:80, 1:80, 1:160. Calculate the geometric mean.
39
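[Solution] $G = \lg^{-1}\left(\frac{\lg 10 + \lg 20 + \lg 40 + \lg 80 + \lg 80 + \lg 160}{6}\right) = \lg^{-1}(1.6522) \approx 45$
So the geometric mean is 1:45. Here X is the reciprocal of the antibody level, $\lg X$ is the logarithm of the reciprocal, 6 is the sample size, and $\lg^{-1}$ is the inverse logarithm.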
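The calculation in Example 2.3 can be checked with a few lines of Python. This is a minimal sketch: `geometric_mean` is a hypothetical helper that averages the base-10 logarithms and then takes the inverse logarithm, as described in the solution.

```python
import math

def geometric_mean(values):
    """Inverse log (base 10) of the mean of the logarithms."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

# Reciprocals of the antibody levels 1:10, 1:20, 1:40, 1:80, 1:80, 1:160.
titers = [10, 20, 40, 80, 80, 160]
g = geometric_mean(titers)
print(f"1:{g:.0f}")  # 1:45
```

Working on the log scale avoids forming the very large product of the reciprocals directly, which matters for longer series.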
40
[Usage of G] The geometric mean is often used for data that form a geometric progression, such as 1:2, 1:4, 1:8, 1:16, 1:32.
41
Median
[Definition] The median, also called the 50th percentile, is the midpoint of the observations when they are arranged in ascending order.
42
[Formula] When n is odd, the median is the middle value when the data are arranged in ascending order: $M = X_{\left(\frac{n+1}{2}\right)}$
When n is even, the median is the mean of the middle two values when the data are arranged in ascending order: $M = \frac{1}{2}\left(X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right)$
43
[Example 2.5] Each of 7 children in the second grade was given a reading aptitude test. The scores were as shown below.
95 86 64 81 75 76 69
Determine the median test score.
44
[Solution] First, we arrange the scores in ascending order:
64 69 75 76 81 86 95
There are 7 measurements, and the fourth is the midpoint value, so the median is 76. Equivalently, using the formula, $M = X_{\left(\frac{7+1}{2}\right)} = X_{(4)} = 76$.
45
[Example 2.6] An experiment was conducted to measure the effectiveness of a new procedure for pruning grapes. Ten workers were each assigned the task of pruning an acre of grapes. The productivity, measured in worker-hours/acre, was recorded for each person:
4.4 4.9 3.8 5.2 4.7 4.6 5.4 3.8 4.0 4.3
Determine the median productivity for the group.
46
[Solution] Arrange the data in ascending order:
3.8 3.8 4.0 4.3 4.4 4.6 4.7 4.9 5.2 5.4
Compute the mean of the 5th and 6th values: $M = \frac{4.4 + 4.6}{2} = 4.5$
47
[Exercise] Exercise capacity (in seconds) was determined for each of 11 patients being treated for chronic heart failure.
906 684 897 1320 1200 882 711 837 1008 1170 1056
Determine the median and mean.
Answer: Mean 970, Median 906
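The odd and even cases of the median can be sketched in one small function and applied to Examples 2.5 and 2.6 and to the exercise. The function name `median` and the variable names are illustrative choices, not part of the slides.

```python
def median(values):
    """Middle value for odd n, mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]  # 1-based position (n + 1) / 2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

reading = [95, 86, 64, 81, 75, 76, 69]                        # Example 2.5
pruning = [4.4, 4.9, 3.8, 5.2, 4.7, 4.6, 5.4, 3.8, 4.0, 4.3]  # Example 2.6
capacity = [906, 684, 897, 1320, 1200, 882, 711,
            837, 1008, 1170, 1056]                            # exercise

print(median(reading))                         # 76
print(round(median(pruning), 2))               # 4.5
print(median(capacity))                        # 906
print(round(sum(capacity) / len(capacity)))    # 970
```

In the exercise the mean (about 970) lies above the median (906) because the largest values, such as 1320, pull the mean upward.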
48
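When the sample size is very large, or for grouped data, we can choose another formula to compute the median ($P_{50}$).
The percentile $P_x$ divides the ordered data so that x% of the observations lie below it and (100 - x)% lie above it. The minimum is $P_0$, the median M is $P_{50}$, and the maximum is $P_{100}$.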
49
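$P_x = L + \frac{i}{f_x}\left(n \cdot x\% - \sum f_L\right)$; for the median, x = 50.
$f_x$ = frequency of the group including the median (written $f_m$ for the median)
$i$ = interval width
$L$ = lower limit of the group including the median
$\sum f_L$ = cumulative frequency below the group including the median
$n$ = sample size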
50
[Example 2.7] Determine the median in example 1.2.
51
[Frequency distribution table for example 1.2, showing the lower limit and upper limit of each class interval]
52
To determine which interval the median belongs to, we must find the first interval for which the cumulative relative frequency reaches 0.50. This interval is the one containing the median.
53
For these data, the interval from 6.0 to 6.5 is the first interval for which the cumulative relative frequency reaches 0.50, as shown in the table, column 6. So this interval contains the median. Then $L = 6.0$, $f_m = 11$, $n = 50$, $i = 0.5$, $\sum f_L = 18$, and
$M = L + \frac{i}{f_m}\left(n \cdot 50\% - \sum f_L\right) = 6.0 + \frac{0.5}{11}(25 - 18) \approx 6.32$ (mmol/L)
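The grouped-data formula can be sketched as a small function. The function name `grouped_percentile` is hypothetical. The class frequencies below were tallied from Table 1.2 under the assumption that the intervals start at 4.0 with width 0.5; they agree with the values $f_m = 11$ and $\sum f_L = 18$ quoted above, but the original table is not reproduced in this transcript.

```python
def grouped_percentile(x, lowers, freqs, width):
    """P_x = L + (i / f_x) * (n * x% - cumulative frequency below L)."""
    n = sum(freqs)
    target = n * x / 100
    below = 0  # cumulative frequency below the current interval
    for lower, f in zip(lowers, freqs):
        if f > 0 and below + f >= target:
            return lower + width / f * (target - below)
        below += f
    raise ValueError("x must be between 0 and 100")

# Assumed class intervals [4.0, 4.5), [4.5, 5.0), ..., [7.5, 8.0).
lowers = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
freqs = [1, 2, 4, 11, 11, 11, 7, 3]

median = grouped_percentile(50, lowers, freqs, width=0.5)
print(round(median, 2))  # 6.32
```

The same function can be called with x = 25 or x = 75 to work through the percentile exercise that follows.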
54
[Exercise] Calculate $P_{25}$ and $P_{75}$ in example 1.2.
55
[Properties of the median]
It is not affected by extreme values.
It is the best index when there is no exact value at one or both ends of the distribution.
56
[Exercise] A doctor measured the delitescence (incubation period, in days) of an infectious disease in 10 patients. The outcomes are as follows:
6, 13, 5, 9, 12, 10, 8, 11, 8, >14
Calculate the average delitescence.
57
[Answer] There is no exact value at the right end of the distribution, so we should choose the median. First, we sort the data from smallest to largest:
5 6 8 8 9 10 11 12 13 >14
Then we calculate the mean of the two middle values, 9 and 10, which is 9.5. So the average delitescence is 9.5 days.
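One way to see why the median still works here is to store the open-ended value ">14" as positive infinity. This encoding is an illustrative assumption, not something the slides prescribe: it only records that the value exceeds every exact observation.

```python
import math
import statistics

# ">14" is stored as +infinity: we only know it is the largest observation.
days = [6, 13, 5, 9, 12, 10, 8, 11, 8, math.inf]

print(sorted(days)[4:6])         # the two middle values: [9, 10]
print(statistics.median(days))   # 9.5
print(sum(days) / len(days))     # inf, so the mean cannot be computed
```

The median depends only on the two middle values, so the unknown upper value does not affect it, whereas the mean becomes undefined.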
58
[Usage of median] The median can be used for any type of quantitative variable: not only for data with a normal distribution, but also for data with a skewed distribution or when some values in the data are not known exactly. In symmetrical data, the mean theoretically equals the median.
59
Mode
[Definition] The mode of a set of measurements is defined as the measurement that occurs most often (with the highest frequency).
60
[Example 2.8] Find the mode of 9 undergraduates' English scores:
76 87 69 76 85 80 79 81 83
The score 76 appears twice and every other score appears once, so the mode is 76.
61
The mode is the observed value that occurs most often. In some cases, there may be more than one mode.
62
[Example 2.9] Find the modes of 10 boys' heights (m):
1.45, 1.50, 1.32, 1.37, 1.45, 1.60, 1.48, 1.41, 1.35, 1.50
There are two modes in this example: 1.45 and 1.50.
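Both mode examples can be reproduced with `statistics.multimode` from the Python standard library (available since Python 3.8), which returns every value tied for the highest frequency. The variable names are illustrative.

```python
import statistics

scores = [76, 87, 69, 76, 85, 80, 79, 81, 83]            # Example 2.8
heights = [1.45, 1.50, 1.32, 1.37, 1.45, 1.60,
           1.48, 1.41, 1.35, 1.50]                        # Example 2.9

print(statistics.multimode(scores))    # [76]
print(statistics.multimode(heights))   # [1.45, 1.5]
```

Returning a list rather than a single number makes the case of more than one mode explicit.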
63
Summary
In a normal distribution, the mean, median, and mode are identical. For normal distributions, the mean is the most efficient measure and reflects the characteristics of all the measurements.
65
Section 3 Measures of dispersion tendency
66
Central tendency reflects the average level of a quantitative variable. But knowing only the central tendency of the distribution is not enough; we should also describe the variation of the observations.
67
Group A: 3 4 5 6 7
Group B: 1 3 5 7 9
Mean of group A = (3+4+5+6+7)/5 = 5
Mean of group B = (1+3+5+7+9)/5 = 5
The means are equal, but the dispersions of the two groups are different.
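A short sketch makes the difference between the two groups concrete. It uses the range and the sample standard deviation, both of which are defined later in this section; the variable names are illustrative.

```python
import statistics

group_a = [3, 4, 5, 6, 7]
group_b = [1, 3, 5, 7, 9]

for name, data in (("A", group_a), ("B", group_b)):
    mean = statistics.mean(data)
    data_range = max(data) - min(data)
    sd = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    print(f"Group {name}: mean={mean}, range={data_range}, SD={sd:.2f}")
```

Both groups have mean 5, but group B has twice the range (8 against 4) and twice the standard deviation (about 3.16 against 1.58).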
68
Dispersion tendency
Dispersion tendency reflects the degree of variability of different measurements.
1 Range
2 Quartile range
3 Variance or standard deviation
4 Coefficient of variation
69
Range
[Definition] The range is the difference between MAX and MIN.
70
[Example 3.1] Determine the range of the following data set: 1, 6, 2, 3, 9, 7, 5
[Solution 3.1] RANGE = 9 - 1 = 8
71
Merit of the range: It is the simplest measure of data variability.
Limitation of the range: It is the least informative measure, because it reflects only the difference between MAX and MIN, and it is easily affected by extreme values.
72
Interquartile range
[Definition] The interquartile range is the distance between the third quartile $Q_3$ ($P_{75}$) and the first quartile $Q_1$ ($P_{25}$). This distance covers the middle 50 percent of the observations.
Interquartile range = $Q_3 - Q_1$
73
[Example 3.2] Calculate the IQR in example 1.2 using the following table.
74
[Frequency distribution table for example 1.2, showing the lower limit and upper limit of each class interval]
75
[Solution 3.2] First, we calculate $P_{25}$ and $P_{75}$. Then
IQR = $P_{75} - P_{25}$ = 6.87 - 5.75 = 1.12
76
[Properties] The IQR (Q), although more sensitive to data pileup about the midpoint than the range, is still not sufficient for our purpose. It reflects only the variability of the middle 50% of the measurements, and it is of limited use in interpreting the variability of a single set of measurements.
77
Variance
[Definition] The population variance of a set of n measurements $X_1, X_2, \ldots, X_n$ with arithmetic mean $\mu$ is the sum of the squared deviations divided by n: $\sigma^2 = \frac{\sum (X - \mu)^2}{n}$
78
[Definition] The sample variance of a set of n measurements $X_1, X_2, \ldots, X_n$ with arithmetic mean $\bar{X}$ is the sum of the squared deviations divided by n - 1.
79
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
where $S^2$ is the sample variance, $\bar{X}$ is the mean, $n - 1$ is the degree of freedom, and $(X - \bar{X})^2$ is the squared deviation.
80
[Example 3.3] The time between an electric light stimulus and a bar press to avoid a shock was noted for each of five conditioned rats. Use the data below to compute the sample variance.
Shock avoidance times (seconds): 5, 4, 3, 1, 3
81
[Solution 3.3] The sample mean is 3.2. The deviations and the squared deviations are shown below.
$X_i$     $X_i - \bar{X}$     $(X_i - \bar{X})^2$
5         1.8                 3.24
4         0.8                 0.64
3         -0.2                0.04
1         -2.2                4.84
3         -0.2                0.04
TOTAL 16  0                   8.80
82
[Solution 3.3] Using the total of the squared deviations column, we find the sample variance to be $S^2 = \frac{8.80}{5 - 1} = 2.2$
83
[Properties]
All values are used in the calculation.
It is less affected by a single extreme value than the range is.
The units of the variance are difficult to interpret: they are the square of the original units.
84
Standard deviation
[Definition] The standard deviation is the positive square root of the variance.
[Symbol] Population standard deviation: $\sigma$. Sample standard deviation: $S$.
85
[Example 3.4] Calculate the sample standard deviation in Example 3.3.
[Solution 3.4] $S = \sqrt{S^2} = \sqrt{2.2} \approx 1.48$ (seconds)
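Examples 3.3 and 3.4 can be reproduced step by step in Python. This is a minimal sketch that follows the deviation table by hand instead of calling a library routine; the variable names are illustrative.

```python
import math

times = [5, 4, 3, 1, 3]  # shock avoidance times (seconds), Example 3.3

n = len(times)
mean = sum(times) / n
squared_deviations = [(x - mean) ** 2 for x in times]
sample_variance = sum(squared_deviations) / (n - 1)  # divisor n - 1
sample_sd = math.sqrt(sample_variance)

print(round(mean, 2), round(sum(squared_deviations), 2))  # 3.2 8.8
print(round(sample_variance, 2), round(sample_sd, 2))     # 2.2 1.48
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they can serve as a cross-check.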
86
[Properties]
It is the most commonly used measure of the variability of a quantitative variable, because it reflects the variability of all the observations.
It should be used only when the data come from an approximately normal distribution.
87
Coefficient of variation
[Definition] The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage: $CV = \frac{S}{\bar{X}} \times 100\%$
88
[Usage]
To compare the variability of measurements with different units, such as height (cm) and weight (kg).
To compare the variability of two groups whose means are quite different, one very small and the other very large, such as the weights of elephants and infants.
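The first usage can be illustrated with a small sketch. The helper `coefficient_of_variation` and the summary values below are hypothetical, made up purely for illustration; they are not the data of any example in this chapter.

```python
def coefficient_of_variation(sd, mean):
    """CV = standard deviation / mean, expressed as a percentage."""
    return sd / mean * 100

# Hypothetical summary values (NOT taken from the slides).
height_cv = coefficient_of_variation(sd=6.0, mean=170.0)  # height in cm
weight_cv = coefficient_of_variation(sd=6.0, mean=60.0)   # weight in kg

print(f"height CV = {height_cv:.1f}%, weight CV = {weight_cv:.1f}%")
```

Although the two standard deviations are numerically equal in this made-up case, they are in different units and cannot be compared directly; the unit-free CV shows that weight varies more relative to its mean.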
89
[Example 3.6] A doctor measured the heights and weights of 50 people. Given the outcome, compare the variability of height and weight: which is larger?
90
[Solution 3.6] So the variability of weight is much larger.
93
Description of data from normal distribution
Description of data from skewed distribution