Virtual University of Pakistan Lecture No. 7 Statistics and Probability by Miss Saleha Naghmi Habibullah.


1 Virtual University of Pakistan Lecture No. 7 Statistics and Probability by Miss Saleha Naghmi Habibullah

2 IN THE LAST LECTURE, YOU LEARNT:
• Stem and Leaf Plot
• Dot Plot
• The concept of Central Tendency
• Mode

3

4 Mode: $\text{Mode} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$, where $l$ = lower class boundary of the modal class, $f_m$ = frequency of the modal class, $f_1$ = frequency of the class preceding the modal class, $f_2$ = frequency of the class following the modal class, and $h$ = length of the class interval of the modal class.

5 Hence, we obtained:

6 Mode = 37.825

7 In general, it was noted that for most frequency distributions the mode lies somewhere in the middle of the distribution, and hence is eligible to be called a measure of central tendency.

8 Example The following table contains the ages of 50 managers of child-care centers in five cities of a developed country.

9 Ages of a sample of managers of urban child-care centers:
42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54
Convert these data into a frequency distribution and find the modal age.

10 Solution: Following the steps involved in constructing a frequency distribution, we obtain:

11 Frequency Distribution of Child-Care Managers' Ages
Class Interval    Frequency
20 – 29           6
30 – 39           18
40 – 49           11
50 – 59           11
60 – 69           3
70 – 79           1
Total             50
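The tallying step can be sketched in Python. This is a minimal sketch that assumes whole-number ages and classes of width 10 starting at 20, exactly as in the table above. The function name `frequency_distribution` is our own and is not part of the lecture.

```python
from collections import Counter

# The 50 ages from the table above, entered row by row.
ages = [
    42, 26, 32, 34, 57,
    30, 58, 37, 50, 30,
    53, 40, 30, 47, 49,
    50, 40, 32, 31, 40,
    52, 28, 23, 35, 25,
    30, 36, 32, 26, 50,
    55, 30, 58, 64, 52,
    49, 33, 43, 46, 32,
    61, 31, 30, 40, 60,
    74, 37, 29, 43, 54,
]


def frequency_distribution(data, first_lower, width, n_classes):
    """Tally whole-number data into equal-width classes.

    Returns a list of (lower limit, upper limit, frequency) tuples.
    """
    # Index of the class each value belongs to: 0 for 20-29, 1 for 30-39, ...
    counts = Counter((x - first_lower) // width for x in data)
    if any(not 0 <= k < n_classes for k in counts):
        raise ValueError("some values fall outside the chosen classes")
    table = []
    for k in range(n_classes):
        lower = first_lower + k * width
        table.append((lower, lower + width - 1, counts[k]))  # Counter gives 0 if absent
    return table


table = frequency_distribution(ages, first_lower=20, width=10, n_classes=6)
for lower, upper, f in table:
    print(f"{lower} - {upper}: {f}")
print("Total:", sum(f for _, _, f in table))
```

The printed frequencies (6, 18, 11, 11, 3, 1) reproduce the table, with a total of 50.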

12 Mode: $\text{Mode} = l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$, where $l$ = lower class boundary of the modal class, $f_m$ = frequency of the modal class, $f_1$ = frequency of the class preceding the modal class, $f_2$ = frequency of the class following the modal class, and $h$ = length of the class interval of the modal class.

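13 Hence, with the modal class 30 – 39 (lower class boundary 29.5, $f_m = 18$, $f_1 = 6$, $f_2 = 11$, $h = 10$), the mode is given by
$\text{Mode} = 29.5 + \dfrac{18 - 6}{(18 - 6) + (18 - 11)} \times 10 = 29.5 + \dfrac{12}{19} \times 10 \approx 35.8$ years.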

14 [Figure: histogram of the ages of managers (X-axis, class boundaries 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5) against the number of managers (Y-axis, 0 to 20), with Mode = 35.8 marked.]
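The mode formula can be applied mechanically to a frequency table. The sketch below is hypothetical: it assumes consecutive classes with whole-number limits, so each class boundary lies 0.5 beyond its limit. It also assumes a missing neighbouring class (before the first class or after the last) has frequency 0. The name `grouped_mode` is our own.

```python
def grouped_mode(table):
    """Mode of a grouped distribution: l + (f_m - f_1) / ((f_m - f_1) + (f_m - f_2)) * h.

    `table` holds (lower limit, upper limit, frequency) for consecutive classes
    with whole-number limits (assumption: boundaries lie 0.5 outside the limits).
    """
    # Modal class = class with the highest frequency (the first one if tied).
    i = max(range(len(table)), key=lambda k: table[k][2])
    lower, upper, f_m = table[i]
    f_1 = table[i - 1][2] if i > 0 else 0               # class preceding the modal class
    f_2 = table[i + 1][2] if i + 1 < len(table) else 0  # class following the modal class
    l = lower - 0.5                                     # lower class boundary
    h = upper - lower + 1                               # length of the class interval
    return l + (f_m - f_1) / ((f_m - f_1) + (f_m - f_2)) * h


managers = [(20, 29, 6), (30, 39, 18), (40, 49, 11), (50, 59, 11), (60, 69, 3), (70, 79, 1)]
print(round(grouped_mode(managers), 2))  # 35.82
```

Rounded to one decimal place this gives the 35.8 years shown on the histogram.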

15 DESIRABLE PROPERTIES OF THE MODE The mode is easily understood and easily ascertained in the case of a discrete frequency distribution. It is not affected by a few very high or very low values. The question arises, “When should we use the mode?” The answer is that the mode is a valuable concept in certain situations, such as the one described below:

16 Suppose the manager of a men’s clothing store is asked about the average size of hats sold. He will probably think not of the arithmetic or geometric mean size, or indeed the median size. Instead, he will in all likelihood quote the particular size that is sold most often. This average is of far more use to him as a businessman than the arithmetic mean, geometric mean or median: the modal size of any item of clothing is the size that he must stock in the greatest quantity and variety compared with other sizes. Sometimes, however, a frequency distribution contains two modes, in which case it is called a bi-modal distribution, as shown below: EXAMPLE

17 [Figure: THE BI-MODAL FREQUENCY DISTRIBUTION, a frequency curve of f against X.]
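For raw discrete data, the mode (or modes) can be found by counting. This is a sketch using Python's standard `statistics.multimode`, which returns every value tied for the highest frequency. The hat sizes and the second data-set are hypothetical illustrations, not data from the lecture. The labelling rule is also our own simplification. It calls a distribution bi-modal only when exactly two values tie for the top frequency, which is narrower than the two-peaked curve pictured above.

```python
from statistics import multimode


def describe_modes(observations):
    """Return the modal value(s) and a rough label for the distribution.

    Only values tied for the highest frequency are counted, so a curve with two
    peaks of unequal height would still be labelled uni-modal here.
    """
    modes = multimode(observations)        # all values with the highest frequency
    if len(modes) > 1 and len(modes) == len(set(observations)):
        return modes, "non-modal"          # every value occurs equally often
    if len(modes) == 1:
        return modes, "uni-modal"
    if len(modes) == 2:
        return modes, "bi-modal"
    return modes, "multi-modal"


# Hypothetical hat sizes sold in one week (illustrative only, not lecture data).
hats_sold = ["7", "7 1/8", "7 1/4", "7 1/4", "7 1/4", "7 3/8", "7 3/8", "7 1/2"]
print(describe_modes(hats_sold))                        # (['7 1/4'], 'uni-modal')

# Hypothetical measurements with two equally common values.
print(describe_modes([2, 3, 3, 3, 4, 5, 6, 6, 6, 7]))   # ([3, 6], 'bi-modal')
```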

18 THE ARITHMETIC MEAN The arithmetic mean is the statistician’s term for what the layman knows as the average. It can be thought of as that value of the variable series which is numerically MOST representative of the whole series. “The arithmetic mean or simply the mean is a value obtained by dividing the sum of all the observations by their number.”

19 $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$, where $n$ represents the number of observations in the sample, $X_i$ is the $i$th observation in the sample ($i = 1, 2, 3, \dots, n$), and $\bar{X}$ represents the mean of the sample. For simplicity, the above formula can be written as $\bar{X} = \dfrac{\sum X}{n}$. (In other words, it is not necessary to insert the subscript $i$.)

20 EXAMPLE Information regarding the receipts of a news agent for seven days of a particular week is given below: Mean sales per day in this week = £259.85/7 = £37.12 (to the nearest penny).

21 Interpretation: The mean, £37.12, is the amount (in pounds sterling) that would have been received on each day if the week's total receipts had been spread equally over the seven days. In the case of a frequency distribution, the individual observations are not available. To calculate the approximate value of the mean, the observations in each class are therefore assumed to be identical with the class mid-point $X_i$, i.e. the class-mark. (This is based on the assumption that the observations in the group are evenly scattered between the two extremes of the class interval.) The mid-point of every class is known as its class-mark; in other words, the mid-point of a class ‘marks’ that class.

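22 FREQUENCY DISTRIBUTION In the case of a frequency distribution, the arithmetic mean is defined as:
$\bar{X} = \dfrac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$, where $X_i$ is the class-mark of the $i$th class, $f_i$ is its frequency, and $k$ is the number of classes.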

23 For simplicity, the above formula can be written as $\bar{X} = \dfrac{\sum fX}{\sum f}$ (the subscript $i$ can be dropped). EPA MILEAGE RATINGS OF 30 CARS OF A CERTAIN MODEL

24 CLASS-MARK (MID-POINT): The mid-point of each class is obtained by adding the two limits of the class and dividing the sum by 2. Hence, in this example, the mid-points are computed as follows: (30.0 + 32.9)/2 = 31.45, (33.0 + 35.9)/2 = 34.45, and so on.
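The class-mark and the frequency-distribution mean $\bar{X} = \sum fX / \sum f$ can be sketched as follows. The function names are our own. This transcript gives the EPA limits only for the first two classes and does not give the EPA frequencies. The sketch therefore checks those two class-marks and then applies the grouped-mean formula to the child-care managers' age distribution from earlier in this lecture.

```python
def class_mark(lower_limit, upper_limit):
    """Mid-point of a class: (lower limit + upper limit) / 2."""
    return (lower_limit + upper_limit) / 2


def grouped_mean(table):
    """Approximate mean of a frequency distribution, sum(f * X) / sum(f),
    where X is the class-mark of each (lower, upper, f) class."""
    sum_f = sum(f for _, _, f in table)
    sum_fx = sum(f * class_mark(lower, upper) for lower, upper, f in table)
    return sum_fx / sum_f


# Class-marks of the first two EPA mileage classes quoted in the text.
print(round(class_mark(30.0, 32.9), 2))   # 31.45
print(round(class_mark(33.0, 35.9), 2))   # 34.45

# The formula applied to the child-care managers' age distribution.
managers = [(20, 29, 6), (30, 39, 18), (40, 49, 11), (50, 59, 11), (60, 69, 3), (70, 79, 1)]
print(grouped_mean(managers))             # 42.5
```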

25 Applying the formula, we obtain:

26 GROUPING ERROR “Grouping error” refers to the error introduced by assuming that all the values falling in a class are equal to the mid-point of the class interval. In reality, it is highly improbable that all the values lying in a class are equal to its mid-point. This is why the mean calculated from a frequency distribution does not, in general, give exactly the same answer as the mean computed from the raw data. Grouping error also arises in the computation of many other descriptive measures, such as the geometric mean, harmonic mean, mean deviation and standard deviation.
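The 30 raw EPA ratings are not reproduced in this transcript. The grouping error can instead be seen on the 50 child-care managers' ages given in full earlier in the lecture. The sketch assumes the same classes of width 10 starting at 20. The helper `class_mark_of` is hypothetical. It replaces each age by the mid-point of its class, which is exactly the assumption described above.

```python
# The 50 ages of child-care managers, as listed earlier in the lecture.
ages = [
    42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
    53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
    52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
    55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
    61, 31, 30, 40, 60, 74, 37, 29, 43, 54,
]


def class_mark_of(x, first_lower=20, width=10):
    """Class-mark of the whole-number class (e.g. 30-39) that contains x."""
    lower = first_lower + (x - first_lower) // width * width
    upper = lower + width - 1
    return (lower + upper) / 2


raw_mean = sum(ages) / len(ages)                       # true mean of the raw data
# Grouped mean: every value replaced by the mid-point of its class.
grouped_mean = sum(class_mark_of(x) for x in ages) / len(ages)

print(raw_mean)                                        # 41.32
print(grouped_mean)                                    # 42.5
print(round(grouped_mean - raw_mean, 2))               # 1.18 (the grouping error)
```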

27 But experience has shown that, in the calculation of the arithmetic mean, this error is usually small and seldom serious. Typically, only a slight difference occurs between the true answer obtained from the raw data and the answer obtained from the data grouped in the form of a frequency distribution. In this example, if we calculate the arithmetic mean directly from the 30 EPA mileage ratings, we obtain:

28 The arithmetic mean is predominantly used as a measure of central tendency. The question is, “Why is the arithmetic mean known as a measure of central tendency?” The answer is that the value we have just obtained, i.e. 37.85, falls more or less in the centre of our frequency distribution. Mean = 37.85

29 DESIRABLE PROPERTIES OF THE ARITHMETIC MEAN It is the best understood average in statistics. It is relatively easy to calculate. It takes into account every value in the series. But there is one limitation to the use of the arithmetic mean. As we are aware, every value in a data-set is included in the calculation of the mean, whether the value is high or low. Where there are a few very high or very low values in the series, they can drag the arithmetic mean towards them, and this may make the mean unrepresentative. Let us consider an example:

30 Example of the Case Where the Arithmetic Mean Is Not a Proper Representative of the Data: Suppose one walks down the main street of a large city centre and counts the number of floors in each building. Suppose the following answers are obtained: 5, 4, 3, 4, 5, 4, 3, 4, 5, 20, 5, 6, 32, 8, 27. The mean number of floors is 135/15 = 9, even though 12 of the 15 buildings have 8 floors or fewer. The three skyscraper blocks have a disproportionate effect on the arithmetic mean.
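A short Python sketch makes the pull of the skyscrapers visible. It uses only the 15 floor counts from the example. Treating "fewer than 20 floors" as the cut-off for "not a skyscraper" is our own assumption for illustration.

```python
floors = [5, 4, 3, 4, 5, 4, 3, 4, 5, 20, 5, 6, 32, 8, 27]

mean_floors = sum(floors) / len(floors)
print(mean_floors)                                  # 9.0
print(sum(1 for x in floors if x <= 8))             # 12 buildings with 8 floors or fewer
print(sum(1 for x in floors if x > mean_floors))    # only 3 buildings lie above the mean

# Mean of the 12 buildings that are not skyscrapers (assumed: fewer than 20 floors).
others = [x for x in floors if x < 20]
print(round(sum(others) / len(others), 2))          # 4.67
```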

31 EXAMPLE Suppose that in a particular high school there are 100 freshmen, 80 sophomores, 70 juniors and 50 seniors. Suppose also that on a given day 15% of the freshmen, 5% of the sophomores, 10% of the juniors and 2% of the seniors are absent. The problem is: what percentage of the students is absent for the school as a whole on that particular day? A student is likely to attempt to find the answer by adding the percentages and dividing by 4, i.e. $\dfrac{15 + 5 + 10 + 2}{4} = \dfrac{32}{4} = 8\%$.

32 As we have already noted, 15% of the freshmen are absent on this particular day. Since there are 100 freshmen in the school, the number of freshmen who are absent is also 15. The sophomores, however, number 80 in all, so if 5% of them are absent on this particular day, the number of sophomores who are absent is only 4. Similarly, 10% of the 70 juniors (7 students) and 2% of the 50 seniors (1 student) are absent, so that 15 + 4 + 7 + 1 = 27 students are absent altogether.

33 Dividing the total number of students who are absent by the total number of students enrolled in the school, and multiplying by 100, we obtain:
$\dfrac{15 + 4 + 7 + 1}{100 + 80 + 70 + 50} \times 100 = \dfrac{27}{300} \times 100 = 9\%$

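34 In this example, the number of students enrolled in each category acts as the weight for the percentage of absentees in that category, i.e. $\bar{X}_w = \dfrac{\sum wX}{\sum w}$, where $w$ is the number of students enrolled and $X$ is the percentage absent in each category.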

35 WEIGHTED MEAN In this example, the weighted mean is equal to:
$\bar{X}_w = \dfrac{100(15) + 80(5) + 70(10) + 50(2)}{100 + 80 + 70 + 50} = \dfrac{2700}{300} = 9\%$
An important point to note here is the criterion for assigning weights. Weights can be assigned in a number of ways, depending on the situation and the problem domain. In the example that we have just considered, greater weights are assigned to larger groups.
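A minimal Python sketch of the weighted mean $\sum wX / \sum w$, using the enrolment figures as weights. It contrasts the result with the naive unweighted average of the four percentages. The function name `weighted_mean` is our own.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


enrolled = [100, 80, 70, 50]    # freshmen, sophomores, juniors, seniors
pct_absent = [15, 5, 10, 2]     # percentage absent in each group

print(sum(pct_absent) / len(pct_absent))     # 8.0  (unweighted, misleading)
print(weighted_mean(pct_absent, enrolled))   # 9.0  (27 of 300 students)
```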

36 MEDIAN The median is the middle value of the series when the variable values are placed in order of magnitude. The median is defined as a value which divides a set of data into two halves, one half comprising observations greater than it and the other half observations smaller than it. More precisely, the median is a value at or below which 50% of the data lie. In many series, the median can be ascertained by inspection. For instance, in the example of the number of floors considered earlier, the data were:

37 EXAMPLE: The number of floors in the buildings at the centre of a city: 5, 4, 3, 4, 5, 4, 3, 4, 5, 20, 5, 6, 32, 8, 27. Arranging these values in ascending order, we obtain 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 8, 20, 27, 32. Picking out the middle (8th) value, we obtain a median of 5.

38 Interpretation: The median number of floors is 5. Apart from the median building itself, 7 of the 15 buildings have 5 floors or fewer and 7 have 5 floors or more. We noticed earlier that the arithmetic mean was pulled towards the few extremely high values in the series and hence became unrepresentative. The median of 5 is much more representative of this series.
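The "order the data and pick the middle value" rule can be written out directly. This is a sketch of the textbook definition. Python's standard `statistics.median` gives the same results; the hand-written version just exposes the odd/even cases.

```python
def median(data):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


floors = [5, 4, 3, 4, 5, 4, 3, 4, 5, 20, 5, 6, 32, 8, 27]
print(sorted(floors))               # [3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 8, 20, 27, 32]
print(median(floors))               # 5
print(sum(floors) / len(floors))    # 9.0, pulled up by the three skyscrapers
```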

39 EXAMPLE

40

41

42 Example of a Discrete Frequency Distribution: Comprehensive School
No. of Pupils per Class    No. of Classes
23                         1
24                         0
25                         1
26                         3
27                         6
28                         9
29                         8
30                         10
31                         7

43 EXAMPLE OF A DISCRETE FREQUENCY DISTRIBUTION Comprehensive School:

44 In this school there are 45 classes in all, so the median is the class size below which there are 22 classes and above which there are also 22 classes. In other words, we must find the 23rd class in an ordered list. We can simply count down, noting that there is 1 class of 23 children, 2 classes with up to 25 children, and 5 classes with up to 26 children. Proceeding in this manner, we find that 20 classes contain up to 28 children, whereas 28 classes contain up to 29 children. This means that the 23rd class, the one we are looking for, contains exactly 29 children.

45 Raw Data:
23, 25, 26, 26, 26, 27, 27, 27, 27, 27,
27, 28, 28, 28, 28, 28, 28, 28, 28, 28,
29, 29, 29, 29, 29, 29, 29, 29, 30, 30,
30, 30, 30, 30, 30, 30, 30, 30, 31, 31,
31, 31, 31, 31, 31
Median = 23rd value

46 Comprehensive School: median

47 Median number of pupils per class = 29. This means that 29 is the middle class size. In other words, apart from the median class itself, 22 classes contain 29 or fewer children and 22 classes contain 29 or more children.
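The "count down" procedure is a walk through the cumulative frequencies. This Python sketch finds the median of a discrete frequency distribution without writing out the 45 raw values. It takes the median position as the $((n+1)/2)$th value for odd $n$ and averages the two middle values for even $n$. The function names are our own.

```python
from itertools import accumulate

pupils_per_class = [23, 24, 25, 26, 27, 28, 29, 30, 31]
number_of_classes = [1, 0, 1, 3, 6, 9, 8, 10, 7]


def value_at(position, values, freqs):
    """Return the position-th observation (1-based) of the ordered data,
    using cumulative frequencies instead of listing every observation."""
    for value, cum_f in zip(values, accumulate(freqs)):
        if cum_f >= position:
            return value
    raise ValueError("position exceeds the number of observations")


def discrete_median(values, freqs):
    n = sum(freqs)
    if n % 2 == 1:
        return value_at((n + 1) // 2, values, freqs)
    return (value_at(n // 2, values, freqs) + value_at(n // 2 + 1, values, freqs)) / 2


print(list(accumulate(number_of_classes)))                   # [1, 1, 2, 5, 11, 20, 28, 38, 45]
print(discrete_median(pupils_per_class, number_of_classes))  # 29
```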

48 Example: The following table displays the annual attendance figures, in millions of visitors, of 32 U.S. public zoological parks:

49 Attendance figures of 32 zoos (in millions):
0.6 1.4 1.3 0.6 0.9 1.0 1.2 0.9
0.2 1.4 0.3 2.7 0.5 0.4 6.0 0.1
2.0 1.6 1.1 0.3 1.3 0.6 1.3 1.5
1.4 0.7 1.0 0.6 0.4 0.8 0.3 0.9
Source: The World Almanac and Book of Facts, Funk & Wagnalls, 1995.

50 For these data, measures of location can yield such information as the average attendance of zoos, the middle attendance figure and the most frequently occurring figure.

51 Compute the mean, median and mode for the attendance figures listed in the above table.

52 Solution 1. Computation of the Mean: For these data we have $n = 32$ and $\sum X = 41.6$. Hence, the mean is $\bar{X} = \dfrac{\sum X}{n} = \dfrac{41.6}{32} = 1.3$ million.

53 2. Computation of the Median: Step-1 Arrange the data in an ordered array 0.3 0.4 0.5 0.6 0.6 0.6 0.6 0.6 0.7 0.8 0.9 0.9 0.9 1.0 1.0 1.0 1.1 1.2 1.3 1.3 1.3 1.4 1.4 1.4 1.5 1.6 2.0 2.0 2.7 3.0 3.0 4.0

54 Step-2 In order to compute the median, the first point to note is that in this example we are dealing with an even number of values, namely 32.

55 We therefore compute the average of the two middle values of the ordered data-set, i.e. of the 16th and 17th values. Here, we discuss another way of arriving at the same result.

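56 As there are $n = 32$ values, we can say that the median is located at the $\left(\frac{n+1}{2}\right)^{\text{th}} = \left(\frac{32+1}{2}\right)^{\text{th}} = 16.5^{\text{th}}$ value, i.e. halfway between the 16th and 17th values.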

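57 [Figure: the 15th, 16th, 17th and 18th values of the ordered data marked along the X-axis, with the “16.5th value” lying midway between the 16th and 17th values.]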

58 Hence, we have Median $= \dfrac{\text{16th value} + \text{17th value}}{2} = \dfrac{1.0 + 1.1}{2} = 1.05$ million.

59 3. Computation of the Mode: By inspecting the attendance figures, we find that 0.6 occurs five times, whereas every other figure occurs less often. Hence, Mode = 0.6 million.

60 Conclusion
• The mean, or average, attendance at these 32 zoological parks is 1.3 million.
• The median, or middle, attendance figure is 1.05 million.
• The mode, i.e. the most frequently occurring attendance figure, is 0.6 million.
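The three measures can be checked in Python. This sketch works from the ordered array given in Step-1, since that is the data-set on which the worked solution is based. It uses the standard-library `statistics.mean` and `statistics.multimode`. It also includes a hand-written `median_by_position` (our own name) that follows the $((n+1)/2)$th-value rule, where a fractional position such as 16.5 means halfway between two neighbours.

```python
import math
import statistics

# Ordered array of the 32 attendance figures (millions), as given in Step-1.
attendance = [
    0.3, 0.4, 0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.7, 0.8, 0.9, 0.9, 0.9, 1.0, 1.0, 1.0,
    1.1, 1.2, 1.3, 1.3, 1.3, 1.4, 1.4, 1.4, 1.5, 1.6, 2.0, 2.0, 2.7, 3.0, 3.0, 4.0,
]


def median_by_position(data):
    """Median as the ((n + 1) / 2)th value; a fractional position such as 16.5
    means halfway between the 16th and 17th values."""
    s = sorted(data)
    pos = (len(s) + 1) / 2
    below = s[math.floor(pos) - 1]   # convert 1-based position to 0-based index
    above = s[math.ceil(pos) - 1]
    return (below + above) / 2


print(len(attendance))                            # 32
print(round(statistics.mean(attendance), 2))      # 1.3
print(round(median_by_position(attendance), 2))   # 1.05
print(statistics.multimode(attendance))           # [0.6]
```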

61 IN TODAY’S LECTURE, YOU LEARNT:
• The importance of the mode
• The non-modal and the bi-modal situation
• The (simple) arithmetic mean
• The weighted arithmetic mean
• The median (in the case of raw data and in the case of the frequency distribution of a discrete variable)

62 IN THE NEXT LECTURE, YOU WILL LEARN:
• Computation of the median in the case of the frequency distribution of a continuous variable
• The empirical relation between the mean, the median and the mode

