
Measures of Central Tendency Measures of Variability


1 Measures of Central Tendency Measures of Variability
The University of the West Indies, School of Education. Introduction to Statistics, Lessons 3 and 4: Measures of Central Tendency and Measures of Variability. Dr. Madgerie Jameson, UWI School of Education, 5/12/2019.

2 Topics
Measures of central tendency: mean, median, mode. Measures of position: percentiles, quartiles. Measures of variability: range, variance, standard deviation, standard score.

3 OBJECTIVES By the end of the session you will be able to:
Compute the mean, median and mode of a given set of scores. Compute the inter-percentile range of a given set of scores. Explain the importance of variability in a data set. Compute the range, standard deviation and variance of a data set. Compare and contrast standard deviation and variance. Compute and interpret the z score. Describe the normal curve.

4 Summary Measures: Describing Data Numerically
Central tendency: arithmetic mean, median, mode. Quartiles. Variation: range, interquartile range, variance, standard deviation, coefficient of variation. Shape: skewness.

5 Measures of Central Tendency
Mean, mode and median.

6 Measures of Central Tendency
Overview: the arithmetic mean; the median (the midpoint of the ranked values); the mode (the most frequently observed value).

7 Mean The most common type of average: the sum of all the values in the group divided by the number of values. Population mean: $\mu = \frac{\sum X}{N}$. Sample mean: $\bar{x} = \frac{\sum X}{n}$. Frequency mean: $\bar{x} = \frac{\sum fX}{\sum f}$.

8 Computing the Mean Compute the average number of shoppers at three different Pennywise locations.
Location: number of monthly customers
Trincity: 2150
Tunapuna: 1534
Port of Spain: 3564

9 $\bar{x} = \frac{2150 + 1534 + 3564}{3} = \frac{7248}{3} = 2416$ customers per month.
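The following is a minimal Python sketch that checks the arithmetic above with the sample-mean formula. The names `customers` and `arithmetic_mean` are illustrative choices and do not come from the slides.

```python
# Monthly customers at the three Pennywise locations (figures from the slide).
customers = {
    "Trincity": 2150,
    "Tunapuna": 1534,
    "Port of Spain": 3564,
}


def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


average_customers = arithmetic_mean(customers.values())
print(f"Average monthly customers: {average_customers:.0f}")  # 2416
```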

10 Note The mean is sometimes represented by the letter M and is also called the typical average, or most central score. In the formulas, lowercase n represents the sample size and uppercase N represents the population size.

11 Computing the mean from a grouped frequency distribution
List all the values in the sample for which the mean is being computed. List the frequency with which each value occurs. Multiply each value by its frequency. Sum the products. Divide this sum by the total frequency.

12 Example Here is a table that shows the values and frequencies on a General Paper test for 100 Form 3 students:
Value | Frequency | Value × frequency
97 | 4 | 388
94 | 11 | 1034
92 | 12 | 1104
91 | 21 | 1911
90 | 30 | 2700
89 | 12 | 1068
78 | 9 | 702
60 | 1 | 60
Total | 100 | 8967

13 The weighted mean is $\bar{x} = \frac{\sum fX}{\sum f} = \frac{8967}{100} = 89.67$.
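The steps for a frequency distribution translate directly into code. The sketch below stores the rows of the General Paper table as (value, frequency) pairs, which reproduce the slide's totals of 100 students and 8967. The helper name `frequency_mean` is hypothetical.

```python
# (value, frequency) pairs for the 100 Form 3 students in the table above.
scores = [
    (97, 4),
    (94, 11),
    (92, 12),
    (91, 21),
    (90, 30),
    (89, 12),
    (78, 9),
    (60, 1),
]


def frequency_mean(pairs):
    """Mean of a frequency distribution: sum(value * frequency) / sum(frequency)."""
    total_frequency = sum(frequency for _, frequency in pairs)
    sum_of_products = sum(value * frequency for value, frequency in pairs)
    return sum_of_products / total_frequency


print(frequency_mean(scores))
```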

14 Median
Defined as the midpoint in a set of scores. There is no standard formula for computing it, but the following steps are useful. List all values in order of magnitude, from highest to lowest or from lowest to highest. Then find the middle score.

15 Example
Here are the incomes from five different households: $…, $25,500, $32,456, $… and $37,668. Here is the list ordered from highest to lowest: $…, $54,365, $37,668, $32,456 and $25,500. There are five values, so the middle value, $37,668, is the median.

16 When there is an even number of values
Suppose there is an even number of values, for example because a sixth value, $34,500, is added to the list: $…, $54,365, $37,668, $34,500, $32,456 and $25,500. With an even number of values, the median is the mean of the two middle values, $37,668 and $34,500: ($37,668 + $34,500)/2 = $36,084. That is the median for the set of six values. If the two middle values are the same, the median is that value. For example, for 9, 7, 6, 6, 5, 4 the median is 6.
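The median rule for odd and even counts is sketched below. The transcript does not record the highest household income, so the code uses 60000 as a **hypothetical placeholder**. Any value above $54,365 gives the same medians.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# HYPOTHETICAL: the highest household income is missing from the transcript;
# 60000 is a placeholder (any value above 54365 gives the same medians).
top_income = 60000
five_incomes = [top_income, 54365, 37668, 32456, 25500]
six_incomes = five_incomes + [34500]

print(median(five_incomes))        # 37668
print(median(six_incomes))         # 36084.0
print(median([9, 7, 6, 6, 5, 4]))  # 6.0
```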

17 Percentile Points If you know about medians, you should know about percentile points. They give the percentage of cases equal to and below a certain point: the kth percentile is the value that has k% of the values below it. For example, a score at the 75th percentile is at or above 75% of the other scores in the distribution. The median is the 50th percentile because it is the point below which 50% of the cases in the distribution fall.

18 Quartiles Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% | Q1 | 25% | Q2 | 25% | Q3 | 25%). The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger. Q2 is the same as the median (50% are smaller, 50% are larger). Only 25% of the observations are greater than the third quartile, Q3.

19 Quartile Formulas (Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.) Find a quartile by determining the value in the appropriate position in the ranked data, where n is the number of observed values. First quartile position: Q1 = (n + 1)/4. Second quartile position: Q2 = (n + 1)/2 (the median position). Third quartile position: Q3 = 3(n + 1)/4.

20 Quartiles Example: Find the first quartile
Sample data in ordered array (n = 9). Q1 is in position (9 + 1)/4 = 2.5 of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5. Q1 and Q3 are measures of non-central location; Q2, the median, is a measure of central tendency.
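The position rule can be sketched as follows. A whole-number position picks that ranked value, and a position such as 2.5 lands halfway between the 2nd and 3rd values. For other fractional positions the sketch interpolates linearly. That is an **assumed** convention, because textbooks differ. The slide's nine data values are not in the transcript, so the data below are **hypothetical**.

```python
import math


def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in ranked data.

    Whole-number positions return that value; 2.5 returns the value halfway
    between the 2nd and 3rd values. Other fractions are interpolated
    linearly (an assumed convention).
    """
    lower = math.floor(position)
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    fraction = position - lower
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the position k(n + 1)/4."""
    ordered = sorted(values)
    position = k * (len(ordered) + 1) / 4
    return value_at_position(ordered, position)


# HYPOTHETICAL data (n = 9): the slide's values are not in the transcript.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 6.0 12.0 16.0
```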

21 Quartiles
The quartiles divide the data into 4 equal regions. Q1: the first quartile is the 25th percentile. Q2: the second quartile is the 50th percentile, or the median. Q3: the third quartile is the 75th percentile. Q4: the fourth quartile is the 100th percentile.

22 Interquartile Range (IQR)
The distance between the 75th percentile and the 25th percentile. It is the range of the middle 50% of the data set, so it is not affected by extreme values or outliers. The 25th percentile is the value in position .25(n + 1), where n is the number of values. The 75th percentile is the value in position .75(n + 1).

23 Example Compute the IQR for the following data
18, 33, 58, 67, 73, 93 and 147. The 25th and 75th percentiles are in positions .25(7 + 1) = 2 and .75(7 + 1) = 6, that is, the 2nd and 6th observations, 33 and 93 respectively. IQR = 93 – 33 = 60.
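Python's standard library (3.8 and later) provides `statistics.quantiles`. Its default `method="exclusive"` places cut points using n + 1 positions, which matches the .25(n + 1) and .75(n + 1) rule on this slide, and it interpolates when a position is fractional. The sketch below uses it to check the example. The wrapper name `interquartile_range` is hypothetical.

```python
import statistics


def interquartile_range(values):
    """IQR = 75th percentile - 25th percentile, positions .25(n+1) and .75(n+1)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [18, 33, 58, 67, 73, 93, 147]
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q3, interquartile_range(data))  # 33.0 93.0 60.0
```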

24 The Mode
The value that occurs most frequently in a data set. To compute the mode, list each value in the distribution only once and tally the number of times each value occurs. The value that occurs most often is the mode.

25 Mode A measure of central tendency: the value that occurs most often.
Not affected by extreme values. Used for either numerical or categorical data. There may be no mode, or there may be several modes. (Slide examples: one data set with no mode; one with mode = 9.)

26 Example A survey of the marital status of 300 students produced the following distribution:
Marital status: frequency
Single: 140
Married: 100
Divorced: 60
The mode is the value that occurs most frequently, which in this example is single students.

27 Bimodal If every value in a distribution occurs the same number of times, there is no mode. If two or more values share the highest frequency, the distribution is multimodal. A data set with exactly two modes is bimodal.
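The tally method and the rules on these slides are sketched below. The mode is the most frequent value. If every distinct value occurs equally often, there is no mode. Ties for the highest frequency give several modes. The function name `modes` is hypothetical. Python's `statistics.multimode` also returns all most-common values, but it does not apply the "no mode" rule: for 4, 7, 9 it returns all three values.

```python
from collections import Counter


def modes(values):
    """Return every mode of the data, or [] when there is no mode.

    The mode is the most frequent value; if every distinct value occurs the
    same number of times there is no mode; ties for the highest frequency
    give several modes (bimodal, multimodal).
    """
    counts = Counter(values)  # tally: each distinct value listed once
    if not counts:
        return []
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every value occurs equally often: no mode
    return [value for value, c in counts.items() if c == highest]


marital_status = ["Single"] * 140 + ["Married"] * 100 + ["Divorced"] * 60
print(modes(marital_status))    # ['Single']
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal)
print(modes([4, 7, 9]))         # []      (no mode)
```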

28 Example
Hairstyle: frequency
Extensions: 45
Chemically treated: 45
Natural: 12
Bald: 7
In this example the distribution is bimodal because extensions and chemically treated hair occur with the same, highest frequency.

29 Review Example Five houses on a hill by the beach
(Statistics for Managers Using Microsoft Excel, 4e.) House prices: $2,000,000, $500,000, $300,000, $100,000 and $100,000.

30 Review Example: Summary Statistics
House prices: $2,000,000, $500,000, $300,000, $100,000, $100,000. Sum: $3,000,000. Mean: $3,000,000/5 = $600,000. Median: middle value of the ranked data = $300,000. Mode: most frequent value = $100,000.

31 Which measure of location is the “best”?
The mean is generally used unless extreme values (outliers) exist. In that case the median is often used, because it is not sensitive to extreme values. Example: median home prices may be reported for a region because the median is less sensitive to outliers.
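Python's `statistics` module can reproduce the house-price summary. The sketch below also shows why the median is preferred when outliers exist. Dropping the $2,000,000 house moves the mean from $600,000 to $250,000, but it moves the median only from $300,000 to $200,000.

```python
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print("mean:", statistics.mean(prices))      # 600000
print("median:", statistics.median(prices))  # 300000
print("mode:", statistics.mode(prices))      # 100000

# Drop the $2,000,000 outlier to see which measure moves more.
without_outlier = prices[1:]
print("mean without outlier:", statistics.mean(without_outlier))      # 250000
print("median without outlier:", statistics.median(without_outlier))  # 200000.0
```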

32 Practice time Compute the mean, mode and median for the following data set. Identify the scores at Q1 and Q3, and calculate the IQR for the data set.

33 Reading Scores 31 32 43 42 24 34 25 44 23 36 41 28 14 21 17 13 26 12 52

34 Lunch Time

35 Measures of Variation
Measures of variation: range, interquartile range, variance, standard deviation, coefficient of variation. Measures of variation give information on the spread or variability of the data values. (Figure: same center, different variation.)

36 Measures of variability
Measures of variability describe the spread between scores in a data set. Example: Student A: 10, 12, 15, 18 and 20. Student B: 2, 8, 15, 22 and 28. Both sets have the same mean of 15, but Student B's scores are more varied than Student A's.

37 Range
The simplest measure of variation: the difference between the largest and the smallest observations, $\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$. Example: Range = … = 13.

38 Disadvantages of the Range
Ignores the way in which data are distributed: two differently distributed data sets can both have Range = … = 5. Sensitive to outliers: for 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 the Range = 5 – 1 = 4, but for 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 the Range = 120 – 1 = 119.

39 Range A simple measure of variability. Recap: r = h – l, where r is the range, h the highest score and l the lowest score. Range for Student A = 20 – 10 = 10. Range for Student B = 28 – 2 = 26. Disadvantage: not reliable when using large samples.
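The range is simple to compute, which is also why a single outlier distorts it. The sketch below takes Student B's scores as 2, 8, 15, 22 and 28, the five scores consistent with the stated mean of 15 and range of 26. The outlier lists come from slide 38. The function name `value_range` is hypothetical.

```python
def value_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)


student_a = [10, 12, 15, 18, 20]
student_b = [2, 8, 15, 22, 28]
print(value_range(student_a), value_range(student_b))  # 10 26

# Sensitivity to a single outlier (data from slide 38).
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
print(value_range(base + [5]), value_range(base + [120]))  # 4 119
```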

40 Variance The average of the squared deviations from the mean.
Formulas for computing variance: population variance $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$; sample variance $s^2 = \frac{\sum (X - \bar{x})^2}{n - 1}$.

41 Formula Applied Step 1: Compute the mean of the sample.
Step 2: Subtract the mean from each score. Step 3: Square each deviation from the mean. Step 4: Add up the squared deviations from the mean. This number is called the sum of squares (SS).

42 Formula Applied (continued) Step 5: Compute the sample variance by dividing the sum of squares by (n − 1): $s^2 = \frac{SS}{n - 1}$. Practice: page 5 of the learning module.
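The five steps map one-to-one onto the lines of the sketch below, which uses Student A's scores from slide 36. The `sample` flag is an illustrative addition that switches the divisor between n − 1 (sample) and N (population). For Student A the sum of squares is 68, so s² = 68/4 = 17.

```python
def variance(scores, sample=True):
    """Variance via the slide's five steps; divides SS by n - 1 (sample) or N (population)."""
    n = len(scores)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough scores to compute a variance")
    mean = sum(scores) / n                            # Step 1: the mean
    deviations = [x - mean for x in scores]           # Step 2: subtract the mean
    squared = [d * d for d in deviations]             # Step 3: square each deviation
    sum_of_squares = sum(squared)                     # Step 4: SS
    return sum_of_squares / (n - 1 if sample else n)  # Step 5: s^2 (or sigma^2)


student_a = [10, 12, 15, 18, 20]
print(variance(student_a))                # 17.0
print(variance(student_a, sample=False))  # 13.6
```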

43 Formula for computing Standard Deviation
Describes variability in terms of the average distance from the mean. Population SD: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$. Sample SD: $s = \sqrt{\frac{\sum (X - \bar{x})^2}{n - 1}}$.
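Python's standard library implements both versions of the formula. `statistics.stdev` divides by n − 1 (sample SD), and `statistics.pstdev` divides by N (population SD). The sketch applies them to the Student A and Student B scores from slide 36. Student B's larger SD matches the observation that Student B's scores are more varied.

```python
import statistics

student_a = [10, 12, 15, 18, 20]
student_b = [2, 8, 15, 22, 28]

# Sample SD: square root of SS / (n - 1).
sd_a = statistics.stdev(student_a)
sd_b = statistics.stdev(student_b)
# Population SD: square root of SS / N.
population_sd_a = statistics.pstdev(student_a)

print(f"Student A: s = {sd_a:.2f}")                  # 4.12
print(f"Student B: s = {sd_b:.2f}")                  # 10.44
print(f"Student A: sigma = {population_sd_a:.2f}")   # 3.69
```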

44 Calculation Example: Sample Standard Deviation
Sample data ($X_i$), n = …; mean $\bar{x} = 16$. The standard deviation is a measure of the "average" scatter around the mean.

45 Measuring variation (figure: small standard deviation vs. large standard deviation)

46 Comparing Standard Deviations
Data A: mean = 15.5, S = 3.338. Data B: mean = 15.5, S = 0.926. Data C: mean = 15.5, S = 4.570.

47 Advantages of Variance and Standard Deviation
Each value in the data set is used in the calculation. Values far from the mean are given extra weight, because deviations from the mean are squared.

48 The Normal Curve Shows a distribution of values where the mean, median and mode are equal to each other. Characteristics of the normal curve: it is symmetric; it extends to ± infinity; the total area under the curve is 1; it is completely specified by two parameters, the mean and the standard deviation. Empirical Rule: a handy quick estimate of the spread of the data, given the mean and standard deviation of the data set.

49 The Empirical Rule If the data distribution is bell-shaped, then the interval $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample. In other words, the Empirical Rule states that about 68% of the data values lie within one standard deviation of the mean.

50 The Empirical Rule (continued)
The interval $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample. The interval $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample.
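The Empirical Rule percentages are rounded areas under the normal curve. Python's `statistics.NormalDist` (3.8 and later) can compute those areas exactly from its cumulative distribution function. The helper name `share_within` is hypothetical. The sketch uses the standard normal curve, and the same shares hold for any mean and standard deviation.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Area under the normal curve between mean - k*sd and mean + k*sd."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")  # 68.3%, 95.4%, 99.7%
```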

51 Skewness The quality of a distribution that describes its lack of symmetry, or lopsidedness: the tail of the distribution is longer at one end. Positively skewed: the right tail of the distribution is longer than the left. Negatively skewed: the left tail of the distribution is longer than the right.


53 Kurtosis Kurtosis is how flat or peaked a distribution appears.
Platykurtic refers to a distribution that is relatively flat compared to the bell curve (more dispersed). Leptokurtic refers to a distribution that is relatively peaked compared to the normal, or bell, curve. Mesokurtic refers to the bell curve itself.


55 Standard score or z score
The z score standardises any distribution so that the mean is equal to zero and the standard deviation is equal to one. Formula for the z score: $z = \frac{X - \mu}{\sigma}$ (for a sample, $z = \frac{X - \bar{x}}{s}$).
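The sketch below applies the z-score formula to Student A's scores from slide 36, using the sample SD by default. The function name `z_scores` and its `sample` flag are illustrative. It also checks the property stated above: the standardised scores have mean 0 and standard deviation 1. For example, Student A's score of 20 lies about 1.21 standard deviations above the mean.

```python
import statistics


def z_scores(scores, sample=True):
    """Standardise scores: z = (X - mean) / SD (sample SD unless sample=False)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores) if sample else statistics.pstdev(scores)
    return [(x - mean) / sd for x in scores]


student_a = [10, 12, 15, 18, 20]
z = z_scores(student_a)
print([round(value, 2) for value in z])  # [-1.21, -0.73, 0.0, 0.73, 1.21]
```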

56 Summary Measures of variability enhance the ways we describe data: they describe the spread of scores around the average. The normal curve shows a distribution of values where the mean, median and mode are equal to each other. The z score standardises any distribution so that the mean is equal to zero and the standard deviation is equal to one.

57 References
Berenson (2004). Statistics for Managers Using Microsoft Excel (4th ed.). Prentice-Hall, Inc.
Best, J. W. & Kahn, J. V. (2006). Research in Education. Boston: Pearson.
Phillips, J. L. (2000). How to Think About Statistics. New York.
Salkind, N. J. (2008). Statistics for People Who (Think They) Hate Statistics. CA: Sage.

