Chapter 2 Statistical description of quantitative variables
Teaching contents
In this chapter, we study descriptive techniques for quantitative variables.
Section 1 Frequency distribution table and frequency distribution graph
Section 2 Measures of central tendency
Section 3 Measures of dispersion tendency
Teaching aims
To learn how to use frequency tables and graphs.
To master the application of the different descriptive indexes.
Department of Health Statistics Section 1 Frequency distribution table and frequency distribution graph
Part 1 Frequency distribution table and graph of a qualitative variable
Part 2 Frequency distribution table and graph of a quantitative variable
Part 3 Usage of the frequency distribution graph
[Example 1.1] University officials periodically review the distribution of undergraduate majors to help determine a fair allocation of resources. The following data were obtained.
Table 1.1 The distribution of undergraduate majors
Fig 1.1 The distribution of undergraduate majors
[Example 1.2] The techniques will be illustrated using the Scottish Heart Health Study. For simplicity, we take only one variable, recorded on 50 subjects.
Table 1.2 Serum total cholesterol (mmol/L) of 50 subjects from the Scottish Heart Health Study
How should the data in Table 1.2 be described? We could list all 50 values one by one, but that makes it difficult for the reader to see how the values are distributed. Summarizing them with specific indexes is more economical in space and easier for the reader to understand.
Frequency distribution table and frequency distribution graph
Step 1 Find MIN and MAX, and compute the range.
Step 2 Set up the class intervals.
Step 3 Assign every observation to one of the class intervals.
Step 1 MIN = 4.35, MAX = 7.86, RANGE = 7.86 - 4.35 = 3.51 (mmol/L). The range is the difference between MAX and MIN.
Step 2 Divide the range by the approximate number of class intervals. Generally 7 to 15 class intervals are used, depending on the sample size: the larger the sample, the more class intervals. Suppose we want 7 class intervals. The interval width is then 3.51 (range)/7 ≈ 0.5, so we choose 0.5 as the interval width. Note: the first class interval must contain MIN, and the last one must contain MAX.
Step 3 Construct the frequency distribution by tallying the number of measurements that fall in each interval. Note: each class interval includes its lower limit (L) but not its upper limit (U). For example, a value of 5.5 belongs to the fourth group, whose lower limit is 5.5, and not to the third group, whose upper limit is 5.5.
Table 1.3 Frequency distribution table for serum total cholesterol (each class runs from its lower limit to its upper limit). Percentage is the frequency divided by the sample size (50), times 100%.
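The three steps above can be sketched in Python. This is a minimal sketch, assuming equal-width half-open intervals [L, U) and a first lower limit obtained by rounding MIN down to a multiple of the width. The function `build_frequency_table` and the sample values are hypothetical. The 50 values of Table 1.2 are not reproduced here, so only MIN = 4.35 and MAX = 7.86 are reused.

```python
import math


def build_frequency_table(data, width, start=None):
    """Sketch of Steps 1-3: equal-width classes [L, U) with frequency and percentage."""
    lo, hi = min(data), max(data)                # Step 1: MIN and MAX
    data_range = hi - lo                         # Step 1: RANGE = MAX - MIN
    if start is None:
        # Assumption: first lower limit is MIN rounded down to a multiple of the width
        start = math.floor(lo / width) * width
    k = math.floor((hi - start) / width) + 1     # enough classes for the last one to contain MAX
    counts = [0] * k
    for x in data:                               # Step 3: tally each observation
        j = math.floor((x - start) / width)      # [L, U): L is included, U is not
        counts[j] += 1
    n = len(data)
    table = []
    for j, f in enumerate(counts):
        lower = start + j * width
        table.append((lower, lower + width, f, 100 * f / n))  # (L, U, frequency, percentage)
    return data_range, table


# Hypothetical values (not the Table 1.2 data); only MIN and MAX come from the text
sample = [4.35, 4.92, 5.50, 5.73, 6.05, 6.21, 6.48, 7.10, 7.86]
rng, table = build_frequency_table(sample, width=0.5)
print(f"RANGE = {rng:.2f}")
for lower, upper, f, pct in table:
    print(f"{lower:.1f} - {upper:.1f}: {f} ({pct:.1f}%)")
```

Floating-point values on a class boundary can be misassigned when the width cannot be represented exactly (for example 0.1). Working in integer units, such as hundredths of mmol/L, avoids this.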
Fig 1.2 Frequency distribution graph for serum total cholesterol
The difference
Usage of frequency distribution graph
1 To describe the characteristics of the frequency distribution. From Table 1.3 and Fig 1.2, we can see that the serum total cholesterol of most people lies between 5.0 and 7.0 mmol/L, and the proportion outside this range is very small.
How should the distribution of data be described? By its central tendency and its dispersion tendency.
Describing how data are distributed: positively skewed, negatively skewed, or symmetric.
Table 2 Mercury concentration of hair in 238 healthy people (horizontal axis: mercury concentration of hair; vertical axis: number of people). Positively skewed.
Table 3 Myoglobin concentration in blood serum of 101 normal people (horizontal axis: myoglobin concentration in blood serum; vertical axis: number of people). Negatively skewed.
2 To detect outliers (values that are too large or too small) easily. For instance, suppose all serum total cholesterol values lie between 4.0 and 8.0 mmol/L and one value is 28, which is implausibly large. We call such a value an outlier and should check whether it was recorded correctly.
3 To present the data.
Section 2 Measures of central tendency
Central tendency: arithmetic mean, geometric mean, median and percentile, mode. Central tendency reflects the average level of a series of measurements.
The arithmetic mean [Definition] The arithmetic mean, also called the mean, is the sum of the measurements divided by the total number of measurements.
[Symbols] The population mean is denoted by the Greek letter μ (read "mu"). The sample mean is denoted by the symbol $\bar{X}$ (read "X-bar").
[Sample mean] $\bar{X} = \frac{\sum X}{n}$
[Population mean] $\mu = \frac{\sum X}{N}$
Here n is the total number of observations in the sample, N is the number of observations in the population, X is a particular value, and Σ (read "sigma") indicates the operation of adding.
[Example 2.1] The mean score on a test can be found for an entire class. Consider the scores of this American History class:
[Solution] We find the mean score by adding all the scores together and dividing by 10 (the number of scores).
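Here is a minimal Python sketch of the sample-mean formula $\bar{X} = \sum X / n$. The ten scores below are hypothetical stand-ins for the American History class table, which is not reproduced here. The result is cross-checked against the standard library's `statistics.mean`.

```python
import statistics


def arithmetic_mean(values):
    """Sample mean: X-bar = (sum of X) / n."""
    return sum(values) / len(values)


# Hypothetical scores for a class of 10 (the original table is not reproduced here)
scores = [72, 85, 90, 64, 78, 88, 95, 70, 81, 77]
x_bar = arithmetic_mean(scores)
print(x_bar)                    # 80.0
print(statistics.mean(scores))
```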
[Properties of the Arithmetic Mean] All values are used in computing the mean. The mean is easily affected by extremely large or small values.
[Notice] The mean should only be used for homogeneous data. For example, we can compute the mean height of ten-year-old boys, but it is not meaningful to calculate the mean height of boys aged 1 to 14 years. The mean is appropriate when the distribution is normal or approximately normal.
The mean can be used.
Geometric Mean [Definition] The geometric mean is the nth root of the product of n numbers. [Symbol] G
[Formula] $G = \sqrt[n]{X_1 X_2 \cdots X_n}$, or equivalently, in logarithmic form, $G = \lg^{-1}\left(\frac{\sum \lg X}{n}\right)$
[Example 2.3] The serum antibody titers of six patients are 1:10, 1:20, 1:40, 1:80, 1:80, 1:160. Calculate the geometric mean.
[Solution] Let X be the reciprocal of each antibody titer (10, 20, 40, 80, 80, 160) and lg X its common logarithm. With sample size n = 6, take the inverse logarithm of the mean logarithm:
$G = \lg^{-1}\left(\frac{\sum \lg X}{n}\right) = \lg^{-1}\left(\frac{\lg 10 + \lg 20 + \lg 40 + \lg 80 + \lg 80 + \lg 160}{6}\right) = \lg^{-1}\left(\frac{9.9134}{6}\right) = \lg^{-1}(1.6522) \approx 44.9$
So the geometric mean titer is about 1:45.
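The logarithmic form of the geometric mean can be checked numerically. This sketch uses a hypothetical helper, `geometric_mean_log`, which averages the common logarithms of the reciprocals and then takes the inverse logarithm. The standard library's `statistics.geometric_mean` (Python 3.8+) computes the same quantity directly.

```python
import math
import statistics


def geometric_mean_log(values):
    """G = lg^-1( sum(lg X) / n ): inverse log of the mean common logarithm."""
    mean_lg = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_lg


# Reciprocals of the titers 1:10, 1:20, 1:40, 1:80, 1:80, 1:160 (Example 2.3)
reciprocals = [10, 20, 40, 80, 80, 160]
g = geometric_mean_log(reciprocals)
print(f"G = {g:.1f}, a titer of about 1:{round(g)}")  # G = 44.9, a titer of about 1:45
print(statistics.geometric_mean(reciprocals))          # same value, computed directly
```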
[Usage of G] The geometric mean is often used for data that increase in geometric proportion (a constant ratio between successive values), such as titers of 1:2, 1:4, 1:8, 1:16, 1:32.
Median [Definition] The median, also called the 50th percentile, is the midpoint of the observations when they are arranged in ascending order.
[Formula] Arrange the data in ascending order, $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$.
When n is odd, the median is the middle value: $M = X_{\left(\frac{n+1}{2}\right)}$
When n is even, the median is the mean of the two middle values: $M = \frac{1}{2}\left[X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right]$
[Example 2.5] Each of 7 children in the second grade was given a reading aptitude test; the scores are shown below. Determine the median test score.
[Solution] First, arrange the scores in ascending order. There are 7 measurements, so the median is the fourth value, $X_{\left(\frac{7+1}{2}\right)} = X_{(4)}$. The median is 76.
[Example 2.6] An experiment was conducted to measure the effectiveness of a new procedure for pruning grapes. Ten workers were each assigned the task of pruning an acre of grapes, and each person's productivity, measured in worker-hours/acre, was recorded. Determine the median productivity for the group.
[Solution] Arrange the data in ascending order. Since n = 10 is even, the median is the mean of the 5th and 6th values, $M = \frac{1}{2}\left[X_{(5)} + X_{(6)}\right]$.
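The odd/even rule for the median translates directly into code. This is a minimal sketch: the function name `median` and both data sets are hypothetical. Positions in the comments are 1-based as in the text, while Python indexes from 0.

```python
import statistics


def median(values):
    """Median by the odd/even rule; positions in comments are 1-based as in the text."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # X_((n+1)/2), e.g. the 4th value when n = 7
    return (xs[mid - 1] + xs[mid]) / 2    # mean of X_(n/2) and X_(n/2+1), e.g. 5th and 6th when n = 10


# Hypothetical data sets (the original scores are not reproduced here)
odd_data = [3, 9, 1, 7, 5]
even_data = [4, 8, 6, 2]
print(median(odd_data), median(even_data))        # 5 5.0
print(statistics.median(odd_data), statistics.median(even_data))
```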
[Exercise] Exercise capacity (in seconds) was determined for each of 11 patients being treated for chronic heart failure. Determine the median and the mean. [Answer] Mean = 970 s, median = 906 s.
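When the sample size is very large, or for grouped data, another formula can be used to compute the median ($P_{50}$). The $x$th percentile $P_x$ divides the ordered data so that $x\%$ of the observations lie below it and $(100-x)\%$ lie above it. MIN is $P_0$, MAX is $P_{100}$, and the median is $M = P_{50}$.
[Formula] $P_x = L + \frac{i}{f_x}\left(n \cdot x\% - \sum f_L\right)$, and for the median $M = P_{50} = L + \frac{i}{f_m}\left(\frac{n}{2} - \sum f_L\right)$
Here $f_x$ is the frequency of the group containing $P_x$ ($f_m$ for the group containing the median), $i$ is the interval width, $L$ is the lower limit of that group, and $\sum f_L$ is the cumulative frequency of the groups below it.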
[Example 2.7] Determine the median for the data in Example 1.2.
To determine which interval contains the median, find the first interval for which the cumulative frequency reaches $n/2$ (a cumulative relative frequency of 0.50). That interval contains the median.
For these data, the interval from 6.0 to 6.5 is the first one whose cumulative relative frequency reaches 0.50, as shown in column 6 of the table, so it contains the median. Then $L = 6.0$, $f_m = 11$, $n = 50$, $i = 0.5$, and $\sum f_L = 18$ (the cumulative frequency below this interval). This gives
$M = P_{50} = 6.0 + \frac{0.5}{11}\left(\frac{50}{2} - 18\right) = 6.0 + \frac{0.5 \times 7}{11} \approx 6.32$ mmol/L
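The grouped-data percentile formula, and the search for the class that contains $P_x$, can be sketched as below. The function names are hypothetical. The frequencies passed to `find_class` are invented for illustration. The median call reuses only the values stated in Example 2.7 (L = 6.0, i = 0.5, f_m = 11, Σf_L = 18, n = 50).

```python
def grouped_percentile(lower, width, freq_in_class, cum_below, n, x):
    """P_x = L + (i / f_x) * (n * x% - sum f_L) for grouped data."""
    return lower + width / freq_in_class * (n * x / 100 - cum_below)


def find_class(freqs, n, x):
    """Return (index, cumulative frequency below) of the first class whose
    cumulative frequency reaches n * x%."""
    target = n * x / 100
    cum = 0
    for j, f in enumerate(freqs):
        if cum + f >= target:
            return j, cum
        cum += f
    raise ValueError("n * x% exceeds the total frequency")


# Median of Example 1.2, using only the values stated in Example 2.7
m = grouped_percentile(lower=6.0, width=0.5, freq_in_class=11, cum_below=18, n=50, x=50)
print(f"M = {m:.2f} mmol/L")  # M = 6.32 mmol/L

# Hypothetical frequencies, only to show how the median class is located
print(find_class([2, 5, 11, 7], n=25, x=50))  # (2, 7)
```

The same two functions give $P_{25}$ and $P_{75}$ by passing x = 25 or x = 75 together with the matching class values.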
[Exercise] Calculate $P_{25}$ and $P_{75}$ for the data in Example 1.2.
[Properties of the Median] It is not affected by extreme values. It is the best index when one or both ends of the distribution have no exact values (open-ended values).
[Exercise] A doctor recorded the incubation period (delitescence, in days) of an infectious disease in 10 patients. The outcomes are: 6, 13, 5, 9, 12, 10, 8, 11, 8, > 14. Calculate the average incubation period.
[Answer] There is no exact value at the right end of the distribution, so the median should be used. Sort the data from smallest to largest: 5, 6, 8, 8, 9, 10, 11, 12, 13, > 14. Since n = 10, the median is the mean of the 5th and 6th values, (9 + 10)/2 = 9.5. So the average incubation period is 9.5 days.
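The same answer can be reproduced with `statistics.median`, assuming the open-ended value "> 14" is encoded as `math.inf`. That encoding is only a convenience. Because the value is the largest, its exact size does not affect the 5th and 6th sorted values. The ordinary mean, however, becomes infinite, which is why the median is preferred here.

```python
import math
import statistics

# Incubation periods (days); "> 14" is only known to exceed 14, so it is
# encoded as +infinity (assumption: any value larger than all others works)
periods = [6, 13, 5, 9, 12, 10, 8, 11, 8, math.inf]

print(sorted(periods))              # [5, 6, 8, 8, 9, 10, 11, 12, 13, inf]
m = statistics.median(periods)      # mean of the 5th and 6th sorted values
print(m)                            # 9.5
print(sum(periods) / len(periods))  # inf: the arithmetic mean is not usable here
```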
[Usage of median] The median can be used for any type of quantitative variable. This includes normally distributed data, skewed data, and data in which some values are unknown. For symmetric data, the mean and the median are theoretically equal.
Mode [Definition] The mode of a set of measurements is the measurement that occurs most often (with the highest frequency).
[Example 2.8] Find the mode of the English scores of 9 undergraduates. The score 76 appears twice, more often than any other score, so the mode is 76.
The mode is the value that occurs most often. In some cases there is more than one mode.
[Example 2.9] Find the mode of the heights (m) of 10 boys: 1.45, 1.50, 1.32, 1.37, 1.45, , 1.41, 1.35, 1.50. There are two modes in this example: 1.45 and 1.50.
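Python's `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, which covers the case of more than one mode. This sketch uses only the eight heights that are legible in Example 2.9, since the list there contains a gap.

```python
import statistics

# The eight legible heights (m) from Example 2.9; the original list has a gap
heights = [1.45, 1.50, 1.32, 1.37, 1.45, 1.41, 1.35, 1.50]

modes = statistics.multimode(heights)  # all values tied for the highest frequency
print(modes)  # [1.45, 1.5]
```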
[Summary] In a normal distribution, the mean, median, and mode are identical. For normal distributions, the mean is the most efficient measure, and it reflects the information in all measurements.
Section 3 Measures of dispersion tendency
Central tendency reflects the average level of a quantitative variable. However, knowing the central tendency alone is not enough; we should also describe the variation of the observations.
Group A: Group B: Mean of group A = ( )/5 = 5; mean of group B = ( )/5 = 5. The two groups have the same mean, but their dispersions are different.
Dispersion tendency: range, interquartile range, variance or standard deviation, coefficient of variation. Dispersion tendency reflects the degree of variability among measurements.
Range [Definition] The range is the difference between MAX and MIN.
[Example 3.1] Determine the range of the following data set: 1, 6, 2, 3, 9, 7, 5. [Solution 3.1] RANGE = 9 - 1 = 8.
[Merit of range] It is the simplest measure of data variability. [Limitation of range] It is the least informative measure, because it reflects only the difference between MAX and MIN, and it is easily affected by extreme values.
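Here is a short sketch of the range and its main limitation. The first data set is Example 3.1. The second is a hypothetical variant in which the largest value is replaced by an extreme one. The helper name `value_range` is illustrative.

```python
def value_range(values):
    """RANGE = MAX - MIN."""
    return max(values) - min(values)


data = [1, 6, 2, 3, 9, 7, 5]       # Example 3.1
print(value_range(data))           # 8

# Hypothetical: replace the largest value 9 with an extreme 90
extreme = [1, 6, 2, 3, 90, 7, 5]
print(value_range(extreme))        # 89: a single extreme value changes the range greatly
```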
Interquartile Range [Definition] The interquartile range (IQR) is the distance between the third quartile $Q_3$ ($P_{75}$) and the first quartile $Q_1$ ($P_{25}$). This distance covers the middle 50 percent of the observations. $IQR = Q_3 - Q_1$
[Example 3.2] Calculate the IQR for the data in Example 1.2, using the frequency distribution table (Table 1.3).
[Solution 3.2] First, calculate $P_{25}$ and $P_{75}$ with the percentile formula for grouped data. Then $IQR = P_{75} - P_{25} = 1.12$ mmol/L.
[Properties] The IQR (Q) is more sensitive than the range to how the data pile up around the midpoint, but it is still not sufficient for our purpose. It reflects only the variability of the middle 50% of the measurements, and it is limited in interpreting the variability of a single set of measurements.
Variance [Definition] The population variance of a set of N measurements $X_1, X_2, \ldots, X_N$ with arithmetic mean μ is the sum of the squared deviations divided by N: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
[Definition] The sample variance of a set of n measurements $X_1, X_2, \ldots, X_n$ with sample mean $\bar{X}$ is the sum of the squared deviations divided by n - 1: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
In this formula, $\bar{X}$ is the sample mean, $(X - \bar{X})^2$ is the squared deviation, and n - 1 is the degrees of freedom.
[Example 3.3] The time between an electric light stimulus and a bar press to avoid a shock was recorded for each of five conditioned rats. Use the data below to compute the sample variance. Shock avoidance times (seconds): 5, 4, 3, 1, 3
[Solution 3.3] The sample mean is $\bar{X} = (5 + 4 + 3 + 1 + 3)/5 = 3.2$. The deviations and the squared deviations are shown below.

| $X_i$ | $X_i - \bar{X}$ | $(X_i - \bar{X})^2$ |
|---|---|---|
| 5 | 1.8 | 3.24 |
| 4 | 0.8 | 0.64 |
| 3 | -0.2 | 0.04 |
| 1 | -2.2 | 4.84 |
| 3 | -0.2 | 0.04 |
| TOTAL | 0 | 8.80 |

Using the total of the squared deviations column, the sample variance is
$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{8.80}{5 - 1} = 2.2$ (seconds²)
[Properties] All values are used in the calculation. The variance is affected by extreme values. Its units are difficult to interpret, because they are the square of the original units.
Standard deviation [Definition] The standard deviation is the positive square root of the variance. [Symbols] Population standard deviation σ; sample standard deviation S, $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
[Example 3.4] Calculate the sample standard deviation in Example 3.3. [Solution 3.4] $S = \sqrt{2.2} \approx 1.48$ seconds
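The hand calculation of Examples 3.3 and 3.4 can be reproduced step by step in Python. The results are cross-checked against `statistics.variance`, `statistics.stdev`, and `statistics.pvariance`. This is a minimal sketch, and the variable names are illustrative.

```python
import math
import statistics

times = [5, 4, 3, 1, 3]  # shock avoidance times (seconds), Example 3.3

n = len(times)
x_bar = sum(times) / n                        # sample mean
sq_dev = [(x - x_bar) ** 2 for x in times]    # squared deviations (X_i - X-bar)^2
ss = sum(sq_dev)                              # sum of squared deviations
s2 = ss / (n - 1)                             # sample variance: divide by n - 1 (degrees of freedom)
s = math.sqrt(s2)                             # sample standard deviation
sigma2 = ss / n                               # population formula (divide by n), for comparison only

print(f"mean = {x_bar}, SS = {ss:.2f}, S^2 = {s2:.2f}, S = {s:.2f}")
# mean = 3.2, SS = 8.80, S^2 = 2.20, S = 1.48

# Cross-check against the standard library
print(statistics.variance(times), statistics.stdev(times), statistics.pvariance(times))
```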
[Properties] It is the best measure of the variability of a quantitative variable, because it reflects the variability of all the data. It should be used only when the data come from a normal (or approximately normal) distribution.
Coefficient of Variation [Definition] The coefficient of variation (CV) is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage: $CV = \frac{S}{\bar{X}} \times 100\%$
[Usage] 1. Comparing the variability of measurements with different units, such as height (cm) and weight (kg). 2. Comparing groups whose means differ greatly, one very small and the other very large, such as the weights of elephants and of infants.
[Example 3.6] A doctor measured the heights and weights of 50 people; the results are shown below. Which has the larger variability, height or weight?
[Solution 3.6] Because height and weight have different units, compare their coefficients of variation. The CV of weight is much larger, so weight has much larger variability.
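Here is a minimal sketch of the CV formula. The Example 3.6 results are not reproduced here, so the means and standard deviations below are hypothetical. The point is that the CV is unitless, so centimetres and kilograms can be compared directly. The helper name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV = S / X-bar * 100%."""
    return sd / mean * 100


# Hypothetical summary statistics (the Example 3.6 results are not reproduced here)
height_cv = coefficient_of_variation(sd=5.0, mean=170.0)   # height in cm
weight_cv = coefficient_of_variation(sd=8.0, mean=64.0)    # weight in kg
print(f"height CV = {height_cv:.1f}%, weight CV = {weight_cv:.1f}%")
# height CV = 2.9%, weight CV = 12.5%
```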
Description of data from a normal distribution
Description of data from a skewed distribution