DESCRIPTIVE MEASURES.

Presentation on theme: "DESCRIPTIVE MEASURES."— Presentation transcript:

1 DESCRIPTIVE MEASURES

2 DESCRIPTIVE MEASURES Frequency distributions and graphs may be considered a first kind of summarization. However, they are not helpful when we need to describe verbally the main features of a data set. Numerical summaries are extremely useful in understanding and communicating the most important characteristics of a data set.

3 DESCRIPTIVE MEASURES Example: frequency distributions and graphs can help us to graph data on family incomes. However, we may want to know the income of a “typical” family, the spread of the distribution of incomes, or the location of a family with a particular income.
[Figure: distribution of family incomes along an Income axis, marking the center, the spread, and the position of a particular family ($56,260)]

4 TYPE OF DESCRIPTIVE MEASURES
Such questions can be answered using summary measures. Included among these are:
Measures of Central Tendency
Measures of Position
Measures of Dispersion

5 MEASURES OF CENTRAL TENDENCIES

6 Measures of Central Tendencies
The simplest and most extreme kind of summary is to reduce the entire group of observations to a single value that best represents the data set. This single-value summary should be a value that is typical of the observations in the group (a population or a sample). Measures of central tendency are measures of the location of the middle or the center of a distribution. The definition of "middle" or "center" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures.

7 Measures of Central Tendencies
Mode: for every kind of variable, both qualitative (“nominal” and “ordinal”) and quantitative.
Median: for qualitative “ordinal” variables and quantitative variables.
Mean: only for quantitative variables.
Nominal variables: the values do not have any quantitative meaning and there is no ordering relationship between them. E.g.: gender.
Ordinal variables: there is an order relationship between values, but the difference between two successive modalities is not quantifiable. E.g.: a three-point rating scale measuring customer satisfaction (“Not Satisfied”, “Satisfied”, “Very Satisfied”).

8 Mode Definition: Mode is a French word that means fashion. In statistics, the mode represents the most common value in a data set. The mode is the value that occurs with the highest frequency in a data set.

9 Mode: examples
Stress on Job    Frequency (ni)
Very             10
Somewhat         14   <- mode (highest frequency)
None              6

Vehicles Owned   Number of Households (ni)
0                 2
1                18   <- mode (highest frequency)
2                11
3                 4
4                 3
5                 2

10 Mode - Grouped data: example
When quantitative variables are grouped in classes, the mode is defined as the class interval where most observations lie. This is called the modal-class interval.
MODAL-CLASS INTERVAL: the class interval that occurs with the highest frequency in a data set.
Weekly Earnings (dollars)   Number of Employees (ni)
400 -| 600                  14
600 -| 800                  22
800 -| 1000                 49   <- modal-class interval (highest frequency)
1000 -| 1200                20
1200 -| 1400                 9
1400 -| 1600                 6

11 Mode: other features One advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative. The mode is rarely used as a measure of central tendency for numeric variables. However, for categorical variables the mode is more useful, because the mean (and, for nominal variables, the median) does not make sense. A data set may have no mode, one mode, or several modes:
A data set with only one mode is called unimodal.
A data set with two modes is called bimodal.
A data set with more than two modes is called multimodal.
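
As a rough illustration of the unimodal, bimodal, and "no mode" cases, the following Python sketch returns every value that attains the highest frequency. The function name `modes` is hypothetical, and treating a data set in which every value occurs exactly once as having no mode is an assumed convention, not something the slides spell out.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, in first-seen order.

    Assumed convention: if every value occurs exactly once (and there is
    more than one distinct value), the data set is treated as having no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # no value repeats: no mode under the assumed convention
    return [value for value, count in counts.items() if count == top]


# Stress-on-job frequencies from the mode example: Very 10, Somewhat 14, None 6.
stress = ["Very"] * 10 + ["Somewhat"] * 14 + ["None"] * 6
print(modes(stress))            # ['Somewhat']  (unimodal)
print(modes([1, 1, 2, 2, 3]))   # [1, 2]        (bimodal)
print(modes([1, 2, 3]))         # []            (no mode)
```

Because the function only counts occurrences, it works for qualitative labels as well as numbers.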

12 Median Definition The median is the value of the middle term in a data set that has been ranked in increasing order. In other words, the median divides a ranked data set into two equal parts. The calculation of the median consists of the following two steps:
1. Rank the data set in increasing order.
2. Find the middle term in a data set with n values. The value of this term is the median.

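13 Median The position of the middle term in a data set with n values is obtained as follows:
\[ \text{Position of the middle term} = \frac{n+1}{2} \]
If n is odd, this position is a whole number and the median is the value of that term. If n is even, the position falls halfway between two terms and the median is the average of the two middle values. Thus, we can redefine the median as follows:
Median = value of the \( \frac{n+1}{2} \)th term in a ranked data set if n is odd
Median = average of the values of the \( \frac{n}{2} \)th and \( \left(\frac{n}{2}+1\right) \)th terms in a ranked data set if n is even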
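
The two cases (n odd, n even) can be sketched in a few lines of Python. This is a minimal illustration for numeric data only; the function name `median` and the sample values are hypothetical and not taken from the slides.

```python
def median(values):
    """Median of numeric data: rank the values, then take the middle term."""
    ranked = sorted(values)          # step 1: rank in increasing order
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the (n + 1)/2-th term, which is index mid when counting from 0
        return ranked[mid]
    # n even: average of the (n/2)-th and (n/2 + 1)-th terms
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median([10, 4, 7]))      # 7    (n = 3, third term is not needed: middle is the 2nd)
print(median([4, 1, 3, 2]))    # 2.5  (n = 4, average of the 2nd and 3rd terms)
```

For ordinal (non-numeric) data the even case cannot be averaged, so this sketch does not apply there.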

14 Median: example 1 The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership: [data values not preserved in the transcript]. We rank the given data in increasing order and find the position of the middle term (n = 5 is odd): \( \frac{n+1}{2} = \frac{5+1}{2} = 3 \).

15 Median: example 1 Therefore, the median is the value of the third term in the ranked data. The median weight loss for this sample of five members of this health club is 8 pounds.

16 Median: example 2 The table lists the total revenue for the 12 top-grossing North American concert tours of all time.
Tour                        Artist                   Total Revenue (millions of dollars)
Steel Wheels, 1989          The Rolling Stones        98.0
Magic Summer, 1990          New Kids on the Block     74.1
Voodoo Lounge, 1994         The Rolling Stones       121.2
The Division Bell, 1994     Pink Floyd               103.5
Hell Freezes Over, 1994     The Eagles                79.4
Bridges to Babylon, 1997    The Rolling Stones        89.3
Popmart, 1997               U2                        79.9
Twenty-Four Seven, 2000     Tina Turner               80.2
No Strings Attached, 2000   ‘N-Sync                   76.4
Elevation, 2001             U2                       109.7
Popodyssey, 2001            ‘N-Sync                   86.8
Black and Blue, 2001        The Backstreet Boys       82.1

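17 Median: example 2 We rank the given data in increasing order as follows:
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
We find the position of the middle term (n = 12 is even): \( \frac{n+1}{2} = \frac{12+1}{2} = 6.5 \).
The median is given by the mean of the sixth and the seventh values in the ranked data: (82.1 + 86.8) / 2 = 84.45, that is, $84.45 million.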

18 Median: example 3 The following data come from a questionnaire item on the time involvement of 11 scientists in the 'perception and identification of research problems': Great, Very low or nil, Very great, Very great, Great, Very great, Medium, Low, Great, Medium, Medium.
We rank the given data in increasing order of involvement as follows: Very low or nil, Low, Medium, Medium, Medium, Great, Great, Great, Very great, Very great, Very great.
The position of the middle term (n = 11 is odd) is (11 + 1) / 2 = 6, so the median is the sixth term: Great.

19 Median: frequency distribution
In order to find the median using frequency distributions, you must calculate the cumulative frequency distribution. Then, the first value with a cumulative frequency greater than or equal to the position of the middle value is the median. If the position of the middle value is exactly 0.5 more than the cumulative frequency of the previous value, then the median is the midpoint between the two values.

20 Median: example 4
Stress on Job   Frequency (ni)   Cumulative frequency
None             6                6
Somewhat        14               20   <- first value with cumulative frequency ≥ 15.5: median
Very            10               30
The position of the middle term is (30 + 1) / 2 = 15.5, so the median is "Somewhat".

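21 Median: example 5
Vehicles Owned   Number of Households (ni)   Cumulative frequency
0                 2                           2
1                18                           2 + 18 = 20
2                11                           20 + 11 = 31   <- first value with cumulative frequency ≥ 20.5
3                 4                           31 + 4 = 35
4                 3                           35 + 3 = 38
5                 2                           38 + 2 = 40
The position of the middle term is (40 + 1) / 2 = 20.5, which is 0.5 more than the previous cumulative frequency (20), so the median is the midpoint between 1 and 2, that is, 1.5.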
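
The cumulative-frequency rule can be sketched in Python as follows. The helper name `middle_values` is hypothetical; it returns a tuple with the one or two middle values so that it also works for ordinal categories, where a midpoint cannot be computed. The two data sets are the stress-on-job and vehicles-owned distributions from examples 4 and 5.

```python
def middle_values(values, frequencies):
    """Return the middle value(s) of a frequency distribution.

    `values` must be listed in increasing (or ordinal) order and
    `frequencies` holds the matching counts n_i. The result is a tuple with
    one value, or two values when the middle position falls between them.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("empty frequency distribution")
    position = (n + 1) / 2            # position of the middle term
    cumulative = 0                    # cumulative frequency before this value
    previous_value = None
    for value, freq in zip(values, frequencies):
        if freq == 0:
            continue                  # value never observed, skip it
        if cumulative + freq >= position:
            # Exactly 0.5 past the previous cumulative frequency:
            # the middle falls between the previous value and this one.
            if previous_value is not None and position - cumulative == 0.5:
                return (previous_value, value)
            return (value,)
        cumulative += freq
        previous_value = value
    raise ValueError("frequencies must be non-negative counts")


print(middle_values(["None", "Somewhat", "Very"], [6, 14, 10]))   # ('Somewhat',)

pair = middle_values([0, 1, 2, 3, 4, 5], [2, 18, 11, 4, 3, 2])
print(pair)                        # (1, 2)
print(sum(pair) / len(pair))       # 1.5
```

Returning the pair instead of the midpoint leaves the caller free to average numeric values or to report both categories for ordinal data.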

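22 Mean Definition The mean, also called the arithmetic mean, is the most frequently used measure of central tendency. It is obtained by dividing the sum of all values by the number of values in the data set. The mean calculated for sample data is denoted by \( \bar{x} \) and the mean calculated for the population is denoted by \( \mu \):
\[ \bar{x} = \frac{\sum x}{n}, \qquad \mu = \frac{\sum x}{N} \]
where n is the sample size and N is the population size.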

23 Mean: example 1 The following data are the 2002 total payrolls of 5 Major League Baseball (MLB) teams.
MLB Team                2002 Total Payroll (millions of dollars)
Anaheim Angels           62
Atlanta Braves           93
New York Yankees        126
St. Louis Cardinals      75
Tampa Bay Devil Rays     34
Mean = (62 + 93 + 126 + 75 + 34) / 5 = 390 / 5 = 78, that is, $78 million.

24 Mean: example 2 The following are the ages of all 8 employees of a small company.

25 Mean: frequency distribution
To find the mean of a frequency distribution, multiply each value by its frequency and add the products up. Then divide by the total number of elements in your data set:
\[ \bar{x} = \frac{\sum_i x_i n_i}{n}, \qquad n = \sum_i n_i \]

26 Mean: example 3
Vehicles Owned (xi)   Number of Households (ni)   xi*ni
0                      2                          0*2 = 0
1                     18                          1*18 = 18
2                     11                          2*11 = 22
3                      4                          3*4 = 12
4                      3                          4*3 = 12
5                      2                          5*2 = 10
Sum                   40                          74
Mean = 74 / 40 = 1.85 vehicles per household.
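
The same calculation can be checked with a short Python sketch. The function name `mean_from_frequencies` is hypothetical; the data are the vehicles-owned distribution from this example.

```python
def mean_from_frequencies(values, frequencies):
    """Mean of a frequency distribution: sum of x_i * n_i divided by n."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("empty frequency distribution")
    weighted_sum = sum(x * f for x, f in zip(values, frequencies))
    return weighted_sum / n


vehicles = [0, 1, 2, 3, 4, 5]
households = [2, 18, 11, 4, 3, 2]
print(mean_from_frequencies(vehicles, households))   # 1.85
```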

27 Mean: frequency distribution with classes
When data are organized in classes, we do not know the values of the individual observations. In these cases the mean is computed as follows:
\[ \bar{x} = \frac{\sum_i m_i n_i}{n} \]
where mi is the midpoint of each class interval and ni is the frequency of that class.

28 Mean: example 4 The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company.
Number of orders (x)   Number of days (ni)   mi               mi*ni
10 – 12                 4                    (10+12)/2 = 11    44
13 – 15                12                    14               168
16 – 18                20                    17               340
19 – 21                14                    20               280
Sum                    n = 50                                 ∑m*n = 832
Mean = 832 / 50 = 16.64 orders per day.
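
A minimal Python sketch of the class-midpoint calculation, using the mail-order data from this example. The function name `grouped_mean` and the representation of each class as a `(lower, upper)` pair are assumptions made for illustration.

```python
def grouped_mean(classes, frequencies):
    """Approximate mean of grouped data using class midpoints m_i."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("empty frequency distribution")
    total = 0.0
    for (lower, upper), freq in zip(classes, frequencies):
        midpoint = (lower + upper) / 2     # m_i
        total += midpoint * freq           # m_i * n_i
    return total / n


order_classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
days = [4, 12, 20, 14]
print(grouped_mean(order_classes, days))   # 16.64
```

The result is only an approximation of the true mean, because every observation in a class is replaced by the class midpoint.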

29 Mode, Median, Mean: relationships
Mode: one advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative. The mode is rarely used as a measure of central tendency for numeric variables.
Median: the median can be computed for data that are at least qualitative ordinal. The advantage of the median is that it is not influenced by outliers, so when outliers are present it is preferred over the mean.
Mean: the mean can be calculated only for quantitative data. The mean is influenced by outliers.
OUTLIERS or EXTREME VALUES: values that are very small or very large relative to the majority of the values in a data set.

30 Median, Mean, outlier: example 1
The following table shows the 2000 populations (in thousands) of the five Pacific states.
State        Population (thousands)
Washington    5894
Oregon        3421
Alaska         627
Hawaii        1212
California   33872   <- outlier
Mean without California = (5894 + 3421 + 627 + 1212) / 4 = 2788.5 thousand
Mean with California = (5894 + 3421 + 627 + 1212 + 33872) / 5 = 9005.2 thousand
Median with California = 3421 thousand
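
The effect of the outlier can be reproduced with Python's standard `statistics` module. This sketch assumes the California figure is 33,872 thousand (the transcript prints it as "33.872"); the variable names are illustrative.

```python
from statistics import mean, median

# 2000 populations of the five Pacific states, in thousands.
# Assumption: California is 33,872 thousand.
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,
}

with_ca = list(populations.values())
without_ca = [pop for state, pop in populations.items() if state != "California"]

print(mean(without_ca))     # 2788.5
print(mean(with_ca))        # 9005.2
print(median(without_ca))   # 2316.5
print(median(with_ca))      # 3421
```

Adding California more than triples the mean, while the median moves by a much smaller amount, which is why the median is preferred when outliers are present.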

31 Mode, Median, Mean: relationships
For a symmetric histogram and frequency curve with one peak the values of the mean, median, and mode are identical, and they lie at the center of the distribution. For a histogram and a frequency curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two. If a histogram and a distribution curve are skewed to the left, the value of the mean is the smallest and that of the mode is the largest, with the value of the median lying between these two.

32 MEASURES OF POSITION

33 MEASURES OF POSITION Definition
A measure of position determines the position of a single value in relation to other values in a sample or a population data set. There are many measures of position; we will see quartiles and percentiles.

34 QUARTILES
QUARTILES: Quartiles are three summary measures that divide a ranked data set into four equal parts.
FIRST QUARTILE (Q1): The first quartile is the value of the middle term among the observations that are less than the median.
SECOND QUARTILE (Q2): The second quartile is the same as the median of a data set.
THIRD QUARTILE (Q3): The third quartile is the value of the middle term among the observations that are greater than the median.

35 QUARTILES
[Figure: a ranked data set divided by Q1, Q2, and Q3 into four portions of 25% each]
Each of these portions contains 25% of the observations of a data set arranged in increasing order. Approximately 25% of the values in a ranked data set are less than Q1 and about 75% are greater than Q1. The second quartile, Q2, divides a ranked data set into two equal parts (median). Approximately 75% of the values in a ranked data set are less than Q3 and about 25% are greater than Q3.

36 QUARTILES: example 1 This example uses the total revenue (in millions of dollars) for the 12 top-grossing North American concert tours of all time, as listed in the table of Median: example 2.

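37 QUARTILES: example 1 The revenues (in millions of dollars) ranked in increasing order:
Values less than the median: 74.1 76.4 79.4 79.9 80.2 82.1
Values greater than the median: 86.8 89.3 98.0 103.5 109.7 121.2
Q1 = (79.4 + 79.9) / 2 = 79.65
Q2 = (82.1 + 86.8) / 2 = 84.45 (also the median)
Q3 = (98.0 + 103.5) / 2 = 100.75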
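
The quartile definition used here (the median of each half of the ranked data) can be sketched in Python as below. The function names are hypothetical, and excluding the middle term from both halves when n is odd is an assumption based on the wording "less than" and "greater than" the median; other quartile conventions give slightly different values.

```python
def _median_of_ranked(ranked):
    """Median of an already ranked, non-empty list of numbers."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves definition."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("at least two observations are needed")
    half = n // 2
    lower = ranked[:half]                                   # terms below the median
    upper = ranked[half:] if n % 2 == 0 else ranked[half + 1:]  # terms above it
    return (_median_of_ranked(lower),
            _median_of_ranked(ranked),
            _median_of_ranked(upper))


# Concert-tour revenues in millions of dollars.
revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]
q1, q2, q3 = quartiles(revenues)
print(round(q1, 2), round(q2, 2), round(q3, 2))   # 79.65 84.45 100.75
```

The values are rounded before printing because sums such as 79.4 + 79.9 are not exactly representable in binary floating point.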

38 QUARTILES: example 2 The following are the ages of nine employees of an insurance company: [data values not preserved in the transcript; the slide marked the values less than the median, from which Q1 is found].

39 PERCENTILES Percentiles are the summary measures that divide a ranked data set into 100 equal parts. Each (ranked) data set has 99 percentiles that divide it into 100 equal parts. The kth percentile is denoted by Pk.
[Figure: a ranked data set divided into 100 portions of 1% each]
Each of these portions contains 1% of the observations of a data set arranged in increasing order.

40 PERCENTILES Calculating Percentiles
The (approximate) value of the kth percentile, denoted by Pk, is
\[ P_k = \text{value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set} \]
where k denotes the number of the percentile and n represents the sample size.

41 PERCENTILES: example Refer to the data on revenues (in millions of dollars) for the 12 top-grossing North American concert tours of all time, listed in the table of Median: example 2.

42 PERCENTILES: example Find the value of the 42nd percentile
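The data arranged in increasing order are as follows: 74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
The position of the 42nd percentile is \( \frac{kn}{100} = \frac{42 \times 12}{100} = 5.04 \), that is, the 5.04th term, which is approximated by the 5th term.
P42 = 42nd percentile = 80.2 = $80.2 million
Thus, approximately 42% of the revenues in the given data are equal to or less than $80.2 million and 58% are greater than $80.2 million.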
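
A small Python sketch of the approximate percentile rule, applied to the same revenues. The function name `approx_percentile` is hypothetical, and rounding the position kn/100 to the nearest whole term (halves rounded up, result kept between 1 and n) is an assumption: the example only shows that position 5.04 is approximated by the 5th term.

```python
import math


def approx_percentile(values, k):
    """Approximate k-th percentile: value of the (k*n/100)-th ranked term."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("empty data set")
    position = k * n / 100                 # e.g. 42 * 12 / 100 = 5.04
    term = math.floor(position + 0.5)      # assumed rule: nearest whole term
    term = max(1, min(n, term))            # keep the term inside 1..n
    return ranked[term - 1]                # convert to a 0-based index


revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]
print(approx_percentile(revenues, 42))     # 80.2
```

Statistical packages usually interpolate between neighbouring terms instead, so their percentiles can differ slightly from this textbook approximation.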

