
1 Chapter-2 Statistical description of quantitative variable

2 Teaching contents: In this chapter we study descriptive techniques for quantitative variables. Section 1 Frequency distribution tables and frequency distribution graphs; Section 2 Measures of central tendency; Section 3 Measures of dispersion.

3 Teaching aims: To learn how to use frequency tables and graphs. To master the application of the different summary indexes.

4 Department of Health Statistics Section 1 Frequency distribution table and frequency distribution graph

5 Part 1 Frequency distribution table and graph of a qualitative variable. Part 2 Frequency distribution table and graph of a quantitative variable. Part 3 Usage of the frequency distribution graph.

6 [Example 1.1] University officials periodically review the distribution of undergraduate majors to help determine a fair allocation of resources, and obtained the following data. Table 1.1 The distribution of undergraduate majors

7 Fig 1.1 The distribution of undergraduate majors

8 [Example 1.2] The techniques will be illustrated using the Scottish Heart Health Study, but for simplicity we take only one variable, recorded on 50 subjects.

9 Table 1.2 Serum total cholesterol (mmol/L) of 50 subjects from the Scottish Heart Health Study
5.75 6.29 6.13 6.78 6.46
6.76 5.98 6.25 6.31 5.99
6.47 5.71 5.19 4.35 5.35
7.11 6.89 6.05 7.01 5.86
5.42 4.92 7.12 5.85 5.64
7.04 6.23 5.71 6.74 6.36
5.75 7.71 6.19 7.55 6.76
7.14 5.73 6.73 7.86 5.51
6.02 6.54 5.34 6.92 7.15
6.55 7.16 4.79 6.64 6.83

10 How should we describe the data in Table 1.2? We could list all the values one by one, but then it is hard for the reader to grasp how the 50 values are distributed. Instead, we summarize them with specific indexes, which save space and are easier for the reader to understand.

11 FREQUENCY DISTRIBUTION TABLE and FREQUENCY DISTRIBUTION GRAPH. Step 1: Find MIN and MAX, and compute the range. Step 2: Set up the class intervals. Step 3: Assign each value to one of the class intervals.

12 Step 1: MIN = 4.35, MAX = 7.86, RANGE = 7.86 - 4.35 = 3.51. The range is the difference between MAX and MIN.

13 Step 2: Divide the range by the approximate number of class intervals. Generally we want 7 to 15 class intervals, depending on the sample size: the larger the sample, the more class intervals.

14 Step 2: Suppose we want 7 class intervals. Then the interval width is 3.51 (range) / 7 ≈ 0.5, so we choose 0.5 as the interval width.

15 Step 2 (continued): Dividing the range by the desired number of intervals gives the interval width. Note: the first interval must contain MIN, and the last must contain MAX. Here the first interval starts at 4.0 and the last ends at 8.0, giving eight intervals of width 0.5.

16 Step 3: Construct the frequency distribution by keeping a tally of the number of measurements falling in each interval.

17 Step 3: Note that each class interval includes its lower limit (L) but not its upper limit (U). For example, a value of 5.5 belongs to the fourth group (5.5-6.0), not the third.

18 Table 1.3 Frequency distribution table for serum total cholesterol, listing the lower and upper limit of each interval. The percentage is the frequency divided by the sample size (50).
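The three steps above can be sketched in Python. This is a minimal sketch, not the procedure used to produce Table 1.3: it assumes intervals that start at 4.0 with width 0.5 (so the first contains MIN = 4.35 and the last contains MAX = 7.86), each including its lower limit but not its upper limit. The helper name `frequency_table` is hypothetical.

```python
# Serum total cholesterol (mmol/L) of 50 subjects, Table 1.2
CHOLESTEROL = [
    5.75, 6.29, 6.13, 6.78, 6.46, 6.76, 5.98, 6.25, 6.31, 5.99,
    6.47, 5.71, 5.19, 4.35, 5.35, 7.11, 6.89, 6.05, 7.01, 5.86,
    5.42, 4.92, 7.12, 5.85, 5.64, 7.04, 6.23, 5.71, 6.74, 6.36,
    5.75, 7.71, 6.19, 7.55, 6.76, 7.14, 5.73, 6.73, 7.86, 5.51,
    6.02, 6.54, 5.34, 6.92, 7.15, 6.55, 7.16, 4.79, 6.64, 6.83,
]


def frequency_table(data, start, width):
    """Tally data into intervals [lower, lower + width), beginning at `start`.

    Returns rows of (lower, upper, frequency, percent, cumulative, cumulative percent).
    """
    n = len(data)
    # Enough intervals for the last one to include MAX
    k = int((max(data) - start) // width) + 1
    counts = [0] * k
    for x in data:
        # Lower limit included, upper limit excluded
        counts[int((x - start) // width)] += 1
    rows, cumulative = [], 0
    for i, f in enumerate(counts):
        lower = start + i * width
        cumulative += f
        rows.append((lower, lower + width, f, 100 * f / n, cumulative, 100 * cumulative / n))
    return rows


# Step 1: MIN, MAX and range
print(f"MIN = {min(CHOLESTEROL)}, MAX = {max(CHOLESTEROL)}, "
      f"RANGE = {max(CHOLESTEROL) - min(CHOLESTEROL):.2f}")
# Steps 2 and 3: width 0.5 starting at 4.0, then tally
for lower, upper, f, pct, cum, cum_pct in frequency_table(CHOLESTEROL, 4.0, 0.5):
    print(f"{lower:.1f}-{upper:.1f}  f = {f:2d}  {pct:5.1f}%  cumulative = {cum:2d} ({cum_pct:5.1f}%)")
```

Under these assumptions the tally gives frequencies 1, 2, 4, 11, 11, 11, 7 and 3 for the eight intervals from 4.0 to 8.0, and the cumulative frequency reaches 18 at the end of the 5.5-6.0 interval. Floor division on floats is safe here because 0.5 and the interval limits are exact in binary floating point; with widths such as 0.1, values lying on a limit may need rounding first.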

19 Fig 1.2 Frequency distribution graph for serum total cholesterol

20 The difference

21 Usage of the frequency distribution graph. Usage 1: To describe the characteristics of the frequency distribution. From Table 1.3 and Fig 1.2 we can see that the serum total cholesterol of most subjects (37 of the 50) lies between 5.0 and 7.0 mmol/L, and relatively few values fall outside this range.

22 How do we describe the characteristics of a distribution? By its central tendency and by its dispersion.

23 Describing how data are distributed: positively skewed, negatively skewed, symmetric.

24 Table 2 Mercury concentration of hair in 238 healthy people (axes: mercury concentration of hair; number of people). Positively skewed.

25 Table 3 Myoglobin concentration in blood serum of 101 normal people (axes: myoglobin concentration in blood serum; number of people). Negatively skewed.

26 Usage 2: The frequency distribution makes outliers (values that are too large or too small) easy to spot. For instance, if all serum total cholesterol values lie between 4.0 and 8.0 mmol/L and one value is 28 (implausibly large), we call it an outlier and should check whether it was recorded correctly. Usage 3: It is a way of presenting the data.

27 Section 2 Measures of central tendency

28 Measures of central tendency: 1 arithmetic mean, 2 geometric mean, 3 median and percentiles, 4 mode. Central tendency reflects the average level of a series of measurements.

29 The arithmetic mean [Definition] The arithmetic mean, also called the mean, is the sum of the measurements divided by the total number of measurements.

30 [Symbols] The population mean is denoted by the Greek letter μ (read "mu") and the sample mean by the symbol $\bar{X}$ (read "X-bar"). [Sample mean] $\bar{X} = \frac{\sum X}{n}$, where n is the total number of observations, X is a particular value, and Σ (read "sigma") indicates summation. [Population mean] $\mu = \frac{\sum X}{N}$, where N is the number of values in the population.

31 [Example 2.1] The mean score on a given test can be found for an entire class. Take a look at this American History class:

32 [Solution] We find the mean score by adding all the scores together and dividing by 10 (the number of scores).
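As a sketch, the sample-mean formula can be applied to the 50 serum cholesterol values of Table 1.2, since the class scores of Example 2.1 are not given here. The function name `sample_mean` is hypothetical; Python's standard `statistics.mean` is used only as a cross-check.

```python
import statistics

# Serum total cholesterol (mmol/L), Table 1.2
CHOLESTEROL = [
    5.75, 6.29, 6.13, 6.78, 6.46, 6.76, 5.98, 6.25, 6.31, 5.99,
    6.47, 5.71, 5.19, 4.35, 5.35, 7.11, 6.89, 6.05, 7.01, 5.86,
    5.42, 4.92, 7.12, 5.85, 5.64, 7.04, 6.23, 5.71, 6.74, 6.36,
    5.75, 7.71, 6.19, 7.55, 6.76, 7.14, 5.73, 6.73, 7.86, 5.51,
    6.02, 6.54, 5.34, 6.92, 7.15, 6.55, 7.16, 4.79, 6.64, 6.83,
]


def sample_mean(values):
    """Arithmetic mean: the sum of the measurements divided by their number."""
    return sum(values) / len(values)


x_bar = sample_mean(CHOLESTEROL)
print(f"n = {len(CHOLESTEROL)}, mean = {x_bar:.2f} mmol/L")  # n = 50, mean = 6.29 mmol/L
# The standard library agrees up to floating-point rounding
assert abs(x_bar - statistics.mean(CHOLESTEROL)) < 1e-9
```

The 50 values sum to 314.59, so the mean is about 6.29 mmol/L. Every value enters the sum, which is why a single extreme value can shift the mean noticeably.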

33 [Properties of the arithmetic mean] All the values are used in computing the mean. The mean is easily affected by the largest or smallest values.

34 [Notice] The mean should be used only with homogeneous data. For example, we can compute the mean height of ten-year-old boys, but it is not meaningful to calculate the mean height of boys aged 1 to 14 years. The mean is appropriate mainly when the distribution is symmetric, in particular normal.

35 The mean can be used.

36 Geometric mean [Definition] The geometric mean is the nth root of the product of the n numbers. [Symbol] G

37 [Formula] $G = \sqrt[n]{X_1 X_2 \cdots X_n}$, or equivalently, using logarithms, $G = \lg^{-1}\left(\frac{\sum \lg X}{n}\right)$.

38 [Example 2.3] The serum antibody titers of six patients are 1:10, 1:20, 1:40, 1:80, 1:80, 1:160. Calculate the geometric mean.

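39 [Solution] Let X be the reciprocal of each antibody titer (10, 20, 40, 80, 80, 160) and lg X its logarithm; n = 6 is the sample size and $\lg^{-1}$ denotes the inverse logarithm. Then $G = \lg^{-1}\left(\frac{\lg 10 + \lg 20 + \lg 40 + \lg 80 + \lg 80 + \lg 160}{6}\right) = \lg^{-1}\left(\frac{9.913}{6}\right) = \lg^{-1}(1.652) \approx 45$. So the geometric mean titer is 1:45.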
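The log-based calculation can be checked with a short Python sketch. It assumes, as in the solution, that X is the reciprocal of each titer; `geometric_mean_log` is a hypothetical helper, and `statistics.geometric_mean` (Python 3.8+) serves as an independent check.

```python
import math
import statistics


def geometric_mean_log(values):
    """G = lg^-1( sum(lg X) / n ), using base-10 logarithms."""
    mean_lg = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_lg


# Reciprocals of the titers 1:10, 1:20, 1:40, 1:80, 1:80, 1:160
reciprocals = [10, 20, 40, 80, 80, 160]
G = geometric_mean_log(reciprocals)
print(f"G = {G:.1f}, so the geometric mean titer is about 1:{round(G)}")  # G = 44.9, ... 1:45
assert math.isclose(G, statistics.geometric_mean(reciprocals))
```

Working on the log scale turns the product of n numbers into a sum, which is why the slide's method uses lg X and then takes the inverse logarithm.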

40 [Usage of G] The geometric mean is often used for data in geometric proportion, where each value is a constant multiple of the previous one, such as titers 1:2, 1:4, 1:8, 1:16, 1:32.

41 Median [Definition] The median, also called the 50th percentile, is the middle value of the observations when they are arranged in ascending order.

42 [Formula] When n is odd, the median is the middle value of the data arranged in ascending order: $M = X_{\left(\frac{n+1}{2}\right)}$. When n is even, the median is the mean of the two middle values: $M = \frac{1}{2}\left[X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right]$.

43 [Example 2.5] Each of 7 children in the second grade was given a reading aptitude test; the scores were as shown below. 95 86 64 81 75 76 69 Determine the median test score.

44 [Solution] First, arrange the scores in ascending order: 64 69 75 76 81 86 95. There are 7 measurements, and the fourth is the middle value, so the median is 76; equivalently, by the formula, $M = X_{(4)} = 76$.

45 [Example 2.6] An experiment was conducted to measure the effectiveness of a new procedure for pruning grapes. Ten workers were each assigned the task of pruning an acre of grapes, and the productivity of each person, measured in worker-hours/acre, was recorded: 4.4 4.9 3.8 5.2 4.7 4.6 5.4 3.8 4.0 4.3 Determine the median productivity for the group.

46 [Solution] Arrange the data in ascending order: 3.8 3.8 4.0 4.3 4.4 4.6 4.7 4.9 5.2 5.4. Since n = 10 is even, compute the mean of the 5th and 6th values: M = (4.4 + 4.6)/2 = 4.5 worker-hours/acre.

47 [Exercise] Exercise capacity (in seconds) was determined for each of 11 patients being treated for chronic heart failure: 906 684 897 1320 1200 882 711 837 1008 1170 1056. Determine the median and the mean. Answer: mean ≈ 970, median = 906.
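The odd and even cases of the median formula can be sketched in Python and checked against Examples 2.5 and 2.6 and the exercise above. The function name `median` is a hypothetical helper; Python's standard library also provides `statistics.median`, which follows the same odd/even rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # X_((n+1)/2) in 1-based notation
    return (s[mid - 1] + s[mid]) / 2   # mean of X_(n/2) and X_(n/2 + 1)


reading = [95, 86, 64, 81, 75, 76, 69]                         # Example 2.5
pruning = [4.4, 4.9, 3.8, 5.2, 4.7, 4.6, 5.4, 3.8, 4.0, 4.3]   # Example 2.6
exercise = [906, 684, 897, 1320, 1200, 882, 711, 837, 1008, 1170, 1056]

print(median(reading))                                          # 76
print(median(pruning))                                          # 4.5
print(median(exercise), round(sum(exercise) / len(exercise)))   # 906 970
```

Note that Python lists are 0-based, so the 1-based position (n + 1)/2 of the formula becomes index n // 2 in the code.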

48 When the sample size is very large, or when the data are grouped, we can use another formula to compute the median (P50). [Figure: a scale running from Min (P0) to Max (P100); the percentile Px has x% of the observations below it and (100 - x)% above it; the median M is P50.]

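49 $P_x = L + \frac{i}{f_x}\left(n \cdot x\% - \sum f_L\right)$, and for the median $M = P_{50} = L + \frac{i}{f_m}\left(\frac{n}{2} - \sum f_L\right)$, where $f_x$ ($f_m$) is the frequency of the group containing $P_x$ (the median), i is the interval width, L is the lower limit of that group, and $\sum f_L$ is the cumulative frequency of all groups below it.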

50 [Example 2.7] Determine the median in Example 1.2.

51 [Table: frequency distribution for Example 1.2, with the lower and upper limit of each interval]

52 To determine which interval contains the median, find the first interval for which the cumulative relative frequency reaches 0.50. That interval contains the median.

53 For these data, the interval from 6.0 to 6.5 is the first interval for which the cumulative frequency reaches 0.50, as shown in column 6 of the table. So this interval contains the median, with L = 6.0, $f_m$ = 11, n = 50, i = 0.5 and $\sum f_L$ = 18. Therefore $M = 6.0 + \frac{0.5}{11}\left(\frac{50}{2} - 18\right) \approx 6.32$ mmol/L.
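The grouped-data formula can be sketched as follows. The frequencies are an assumption: they come from tallying Table 1.2 into 0.5-wide intervals starting at 4.0, which agrees with $f_m$ = 11 and $\sum f_L$ = 18 on this slide. The helper name `grouped_percentile` is hypothetical.

```python
def grouped_percentile(x, lowers, freqs, width):
    """P_x = L + (i / f_x) * (n * x% - sum f_L) for grouped data."""
    n = sum(freqs)
    target = n * x / 100
    cum_below = 0                      # sum f_L: cumulative frequency below the group
    for lower, f in zip(lowers, freqs):
        if cum_below + f >= target:    # first group whose cumulative frequency reaches n * x%
            return lower + width / f * (target - cum_below)
        cum_below += f
    raise ValueError("percentile lies outside the table")


# Assumed Table 1.3 frequencies for [4.0, 4.5), [4.5, 5.0), ..., [7.5, 8.0)
lowers = [4.0 + 0.5 * k for k in range(8)]
freqs = [1, 2, 4, 11, 11, 11, 7, 3]
for x in (25, 50, 75):
    print(f"P{x} = {grouped_percentile(x, lowers, freqs, 0.5):.3f}")
# P25 = 5.750, P50 = 6.318, P75 = 6.886
```

With x = 25 and x = 75 the same function gives P25 = 5.75 and P75 ≈ 6.89, the quantities asked for in the exercise on slide 54.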

54 [Exercise] Calculate P25 and P75 in Example 1.2.

55 [Properties of the median] It is not affected by extreme values. It is the best index when one or both ends of the distribution contain values that are not exactly known.

56 [Exercise] A doctor measured the incubation period (days) of an infectious disease in 10 patients, with the following results: 6, 13, 5, 9, 12, 10, 8, 11, 8, >14. Calculate the average incubation period.

57 [Answer] The right end of the distribution has no exact value, so we should use the median. First, sort the data from smallest to largest: 5 6 8 8 9 10 11 12 13 >14. With n = 10, the median is the mean of the 5th and 6th values, 9 and 10, which is 9.5. So the average incubation period is 9.5 days.

58 [Usage of the median] The median can be used for any quantitative variable: not only for normally distributed data, but also for skewed data or data containing some undetermined values. For symmetric data, the mean and the median are theoretically equal.

59 Mode [Definition] The mode of a set of measurements is the measurement that occurs most often (with the highest frequency).

60 [Example 2.8] Find the mode of the English scores of 9 undergraduates: 76 87 69 76 85 80 79 81 83. The value 76 occurs twice and every other value once, so the mode is 76.

61 The mode is the observed value that occurs most often. In some cases there is more than one mode.

62 [Example 2.9] Find the mode of the heights (m) of 10 boys: 1.45, 1.50, 1.32, 1.37, 1.45, 1.60, 1.48, 1.41, 1.35, 1.50. Both 1.45 and 1.50 occur twice, so there are two modes: 1.45 and 1.50.
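Finding every mode amounts to counting frequencies and keeping all values tied for the highest count. A minimal sketch using the standard `collections.Counter`, with a hypothetical helper name `modes`; Python 3.8+ also offers `statistics.multimode`, which returns the modes in order of first appearance rather than sorted.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


scores = [76, 87, 69, 76, 85, 80, 79, 81, 83]                          # Example 2.8
heights = [1.45, 1.50, 1.32, 1.37, 1.45, 1.60, 1.48, 1.41, 1.35, 1.50]  # Example 2.9
print(modes(scores))   # [76]
print(modes(heights))  # [1.45, 1.5]
```

If every value occurs exactly once, this sketch returns all of them; whether such data are said to have a mode is a matter of convention.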

63 Summary: In a normal distribution, the mean, median, and mode are identical. For normal distributions, the mean is the most efficient measure, since it reflects all of the measurements.


65 Section 3 Measures of dispersion

66 Central tendency reflects the average level of a quantitative variable. But knowing the central tendency alone is not enough; we should also describe the variation of the observations.

67 Group A: 3 4 5 6 7. Group B: 1 3 5 7 9. Mean of group A = (3+4+5+6+7)/5 = 5; mean of group B = (1+3+5+7+9)/5 = 5. The two means are equal, but the dispersions of the two groups differ.

68 Measures of dispersion: 1 range, 2 interquartile range, 3 variance or standard deviation, 4 coefficient of variation. Dispersion reflects the degree of variability among measurements.

69 [Definition] The range is the difference between MAX and MIN.

70 [Example 3.1] Determine the range of the following data set: 1, 6, 2, 3, 9, 7, 5. [Solution 3.1] RANGE = 9 - 1 = 8.
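A minimal sketch of the range, applied to Example 3.1 and to groups A and B from slide 67, which share the mean 5 but differ in spread. The name `data_range` is hypothetical, chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = MAX - MIN."""
    return max(values) - min(values)


print(data_range([1, 6, 2, 3, 9, 7, 5]))        # 8

group_a = [3, 4, 5, 6, 7]
group_b = [1, 3, 5, 7, 9]
print(data_range(group_a), data_range(group_b))  # 4 8
```

Equal means (5 in both groups) but ranges of 4 and 8 show why a measure of dispersion is needed alongside central tendency.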

71 Merit of the range: It is the simplest measure of data variability. Limitations of the range: It is the least informative measure, since it reflects only the difference between MAX and MIN, and it is easily affected by extreme values.

72 [Definition] The interquartile range is the distance between the third quartile Q3 (P75) and the first quartile Q1 (P25); the interval between them contains the middle 50 percent of the observations. Interquartile range = Q3 - Q1.

73 [Example 3.2] Calculate the IQR for Example 1.2 using the following table.

74 [Table: frequency distribution for Example 1.2, with the lower and upper limit of each interval]

75 [Solution 3.2] First, calculate P25 and P75 with the grouped-data percentile formula: P25 = 5.5 + (0.5/11)(12.5 - 7) = 5.75 and P75 = 6.5 + (0.5/11)(37.5 - 29) ≈ 6.89. So IQR = 6.89 - 5.75 = 1.14.

76 [Properties] Although the IQR (Q) is more sensitive than the range to how the data pile up around the midpoint, it is still not sufficient for our purpose: it reflects only the variability of the middle 50% of the measurements, and it is limited in interpreting the variability of a single set of measurements.

77 [Definition] The population variance of a set of N measurements X1, X2, …, XN with arithmetic mean μ is the sum of the squared deviations divided by N: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$.

78 [Definition] The sample variance of a set of n measurements X1, X2, …, Xn with sample mean $\bar{X}$ is the sum of the squared deviations divided by n - 1.

79 $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$, where $\bar{X}$ is the sample mean, $(X - \bar{X})^2$ is the squared deviation, and n - 1 is the degrees of freedom.

80 [Example 3.3] The time between an electric light stimulus and a bar press to avoid a shock was recorded for each of five conditioned rats. Use the data below to compute the sample variance. Shock avoidance times (seconds): 5, 4, 3, 1, 3

81 [Solution 3.3] The sample mean is 16/5 = 3.2. The deviations and the squared deviations are shown below.
Xi       Xi - X̄     (Xi - X̄)²
5         1.8         3.24
4         0.8         0.64
3        -0.2         0.04
1        -2.2         4.84
3        -0.2         0.04
Total 16  0           8.80

82 [Solution 3.3] Using the total of the squared-deviation column, the sample variance is $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{8.80}{5 - 1} = 2.2$ seconds².
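The deviation table and the division by n - 1 can be reproduced in a few lines. This is a sketch with a hypothetical helper `sample_variance`; the standard `statistics.variance` (which also divides by n - 1) is used only as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


times = [5, 4, 3, 1, 3]   # shock avoidance times (seconds), Example 3.3
mean = sum(times) / len(times)
for x in times:
    # X, deviation X - mean, squared deviation
    print(f"{x}  {x - mean:5.1f}  {(x - mean) ** 2:5.2f}")
s2 = sample_variance(times)
print(f"S^2 = {s2:.2f}")  # S^2 = 2.20
assert math.isclose(s2, statistics.variance(times))
```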

83 [Properties] All values are used in the calculation. Because the deviations are squared, the variance is strongly influenced by extreme values. Its units are hard to interpret, since they are the square of the original units.

84 [Definition] The standard deviation is the positive square root of the variance. [Symbols] Population standard deviation: σ; sample standard deviation: S.

85 [Example 3.4] Calculate the sample standard deviation in Example 3.3. [Solution 3.4] $S = \sqrt{S^2} = \sqrt{2.2} \approx 1.48$ seconds.
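Python's standard `statistics` module provides both versions of the standard deviation. The sketch below reproduces S = √2.2 for Example 3.3 and, only for contrast, shows the population formula (divisor n) applied to the same five values, which here are treated as a sample.

```python
import math
import statistics

times = [5, 4, 3, 1, 3]           # Example 3.3
s = statistics.stdev(times)       # sample SD S: divisor n - 1
sigma = statistics.pstdev(times)  # population-style SD: divisor n
print(f"S = {s:.2f} seconds")                      # S = 1.48 seconds
print(f"divisor n instead of n - 1: {sigma:.2f}")  # 1.33
assert math.isclose(s, math.sqrt(2.2))
```

Dividing by n underestimates the spread when the values are a sample, which is why S uses the n - 1 degrees of freedom.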

86 [Properties] It uses all of the observations and is the most commonly used measure of the variability of a quantitative variable. It is best suited to data that are normally (or approximately normally) distributed.

87 [Definition] The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage: $CV = \frac{S}{\bar{X}} \times 100\%$.
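The heights and weights of Example 3.6 are not listed here, so as a sketch the CV formula is applied to the shock-avoidance times of Example 3.3 instead; `coefficient_of_variation` is a hypothetical helper built on the standard `statistics.stdev` and `statistics.mean`. Because S and the mean share the same units, the CV is unitless, which is what allows variables such as height (cm) and weight (kg) to be compared.

```python
import statistics


def coefficient_of_variation(values):
    """CV = S / mean * 100, using the sample standard deviation S."""
    return statistics.stdev(values) / statistics.mean(values) * 100


times = [5, 4, 3, 1, 3]   # Example 3.3: S ≈ 1.48 s, mean = 3.2 s
cv = coefficient_of_variation(times)
print(f"CV = {cv:.1f}%")  # CV = 46.4%
```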

88 [Usage] (1) Comparing the variability of measurements in different units, such as height (cm) and weight (kg). (2) Comparing groups whose means differ greatly, one very small and the other very large, such as the weights of elephants and of infants.

89 [Example 3.6] A doctor measured the heights and weights of 50 people, with the results shown. Which shows larger variability, height or weight?

90 [Solution 3.6] So the variability of weight is much larger.



93 Description of data from a normal distribution. Description of data from a skewed distribution.


