Sections 3-1 & 3-2 Measures of Center.

1 Sections 3-1 & 3-2 Measures of Center

2 Concepts In this chapter, we will revisit the concepts of Center, Variation, and Distribution in more detail.

3 Notation
∑ means add all values in a set
x is a data value
n is the number of values in a sample
N is the number of values in a population

4 Definition Measure of Center
Tells you something about the general center of the data. But there are different ways to measure this, which give very different information. Four Main Measures of Center: Mean Median Mode Midrange

5 Definition Arithmetic Mean (usually just called Mean)
The average, obtained by adding the values and dividing the total by the number of values. Notation: $\bar{x}$ (x-bar) is the sample mean; $\mu$ (mu) is the population mean.

6 Round-off Rule for Mean, Median, and Midrange
Round to one more decimal place than is present in the original set of values. If a number comes out even and you need more decimal places, add 0’s on the end. Even in cases where it seems odd (like having half a person), to be accurate a mean should not be rounded to a whole number.

7 Trimmed Mean Sometimes, to avoid the effect of extremely high or extremely low values (outliers), we will find a trimmed mean. A percentage is chosen, and we delete that percentage from both the top and the bottom. Then we find the mean of what is left.

8 Example Find the 5% trimmed mean of the following test scores:
99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85, 80, 79, 76, 72, 67, 66, 45

9 Solution Find the 5% trimmed mean of the following test scores: 99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85, 80, 79, 76, 72, 67, 66, 45 1. There are 19 scores. Find 5% of 19: 19(0.05) = 0.95, round to 1. 2. Remove the 1 highest and 1 lowest score to get: 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85, 80, 79, 76, 72, 67, 66 3. Find the mean of these 17 scores, and the trimmed mean is 83.5.
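The trimming steps above can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name `trimmed_mean` is hypothetical, and it assumes the number of values to drop is rounded to the nearest whole number, as the solution does with 0.95.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` percent of the values from each end."""
    ordered = sorted(values)
    # How many values to drop from each end, rounded to the nearest whole
    # number (19 * 0.05 = 0.95, which rounds to 1, as on the slide).
    k = int(len(ordered) * percent / 100 + 0.5)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

scores = [99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85,
          80, 79, 76, 72, 67, 66, 45]
result = trimmed_mean(scores, 5)
print(round(result, 1))  # 83.5
```

Rounding to one decimal place at the end follows the round-off rule, since the original scores are whole numbers.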

10 Weighted Mean
A mean in which some values count more than others (many professors use weighting of different categories to figure grades). $\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

11 Example Suppose your syllabus states that in this class, Homework is worth 10%, Quizzes are worth 20%, Exams are worth 50%, and the Final is worth 20%. You have the following scores: Homework: total of 130 out of 130 possible points Quizzes, each out of 20: 18, 17, 20, 19, 18, 16 Exams, each out of 100: 78, 75, 72 Final: 104 out of 150 possible Find the weighted mean.

12 Solution First, find the average for each category:
Homework: total of 130 out of 130 possible points = 100%
Quizzes, each out of 20: 18, 17, 20, 19, 18, 16 = 108/120 = 90%
Exams, each out of 100: 78, 75, 72 = 225/300 = 75%
Final: 104 out of 150 possible = 69.3%

13 Solution, continued Multiply each category score by what it is worth, then divide by the total of all the weights. It does not matter whether you use the percents as they are or change them to decimals, as long as you do the same thing when you add up the total for the bottom. Homework is worth 10%, Quizzes are worth 20%, Exams are worth 50%, and the Final is worth 20%: $\bar{x} = \frac{10(100) + 20(90) + 50(75) + 20(69.3)}{10 + 20 + 50 + 20} = \frac{7936}{100} \approx 79.4$ So the weighted mean is 79.4, in this case 79.4%.
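As a check on the arithmetic, the same weighted mean can be computed in a few lines of Python. This is only a sketch; the names `weighted_mean`, `category_percents`, and `weights` are made up for the illustration.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Category averages as percents: homework, quizzes, exams, final.
category_percents = [130 / 130 * 100, 108 / 120 * 100,
                     225 / 300 * 100, 104 / 150 * 100]
# Weights from the syllabus, left as percents (they total 100).
weights = [10, 20, 50, 20]

grade = weighted_mean(category_percents, weights)
print(round(grade, 1))  # 79.4
```

Using weights of 0.10, 0.20, 0.50, and 0.20 instead gives the same result, because the function divides by the sum of whatever weights it is given.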

14 Mean of a Frequency Distribution
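A mean of a frequency distribution is similar to a weighted mean. It can be estimated by using the midpoint of a class as the value for each item in that class. $\bar{x} = \frac{\sum (f \cdot x)}{n}$, where f is the frequency of a class, x is the class midpoint, and n is the total frequency.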

15 Example Suppose you are given the following frequency distribution for the ages of members in a particular social club. You are not given the original data. Estimate the mean age. Age Frequency

16 Solution To estimate the mean age, start by finding the midpoint of each class. Also find the total frequency. Age Midpoint Frequency Total Frequency = 28

17 Solution, Continued Now, as an estimate, we can assume that each person in a class has the midpoint age, and use this to find the mean.
Midpoint × Frequency = Total age for the class
24.5 × 4 people = 98
34.5 × 8 people = 276
44.5 × 9 people = 400.5
54.5 × 7 people = 381.5
Total Frequency = 28, Total Age = 1156
Estimated mean age = 1156 / 28 ≈ 41.3 years old
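The estimate can be reproduced with a short Python sketch. The class limits did not survive in this transcript, so the midpoints below are an assumption recovered by dividing each class total by its frequency (98 / 4 = 24.5, and so on); the function name is hypothetical.

```python
def frequency_mean(midpoints, frequencies):
    """Estimate the mean of grouped data from class midpoints and frequencies."""
    total = sum(f * x for x, f in zip(midpoints, frequencies))
    n = sum(frequencies)
    return total / n

# Midpoints recovered from the class totals on the slide (an assumption).
midpoints = [24.5, 34.5, 44.5, 54.5]
frequencies = [4, 8, 9, 7]

estimate = frequency_mean(midpoints, frequencies)
print(sum(frequencies))    # 28
print(round(estimate, 1))  # 41.3
```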

18 Definitions Median
the center when the data values are put in order; denoted by $\tilde{x}$ (pronounced ‘x-tilde’); is not affected by an extreme value the way a mean can be

19 Finding the Median If you have an odd number of values, take the one in the middle If you have an even number of values, average the two in the middle

20 Examples
1. Find the median for the test scores: 86, 52, 73, 82, 79. Put them in order: 52, 73, 79, 82, 86. There is an odd number of values, so the median is the one in the middle: 79. Round one decimal place further than the originals, so 79.0.
2. Find the median for the test scores: 100, 99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85, 80, 79, 76, 72, 67, 66, 45. There is an even number of values (20), so the median is the average of the middle two (the 10th and 11th): (87 + 85) / 2 = 86.0
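Both cases of the median rule (odd and even counts) fit in one small function. The following Python sketch is illustrative only; the names are not from the slides.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])               # odd count: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle two

odd_scores = [86, 52, 73, 82, 79]
even_scores = [100, 99, 98, 92, 91, 91, 90, 88, 87, 87,
               85, 85, 85, 80, 79, 76, 72, 67, 66, 45]

print(median(odd_scores))   # 79.0
print(median(even_scores))  # 86.0
```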

21 Calculator: TI–83 or TI-84 First we will talk about entering data into the calculator. Then we will use the calculator to put data in order (in case it is given to you all mixed up), which is helpful for finding the median or a trimmed mean. Then we will be using a function in your calculator to find the mean and median (and eventually lots of other stuff) without even having to sort the list. For the examples, let’s use these test scores: 86, 52, 73, 82, 79

22 Calculator Enter Data Press STAT Choose 1: Edit
Enter the list of data values under L1

23 Calculator Putting Data in Order
Press STAT Choose 2: SortA( This sorts in Ascending order Enter the list you want it to sort L1 is 2nd, 1; L2 is 2nd, 2; etc. Go back to STAT, 1: Edit Look at your list—it is now sorted.

24 Calculator 1-Var Stats
Press STAT. Arrow over to CALC. Select 1: 1-Var Stats. A screen will pop up that says 1-Var Stats and has a blinking cursor at the end of it. It is waiting for you to clarify which set of data or list. If your data is all under L1, you can just hit ENTER now. If you put your data under a different column, you need to type in which list before hitting ENTER. (L2 is 2nd, 2; etc.)

25 1-Var Stats 1-Var Stats will give you a screen of summary statistics. You can hit the down arrow to see more information. There are three important values for now: $\bar{x}$ is the sample mean, n is the number of data values you entered, and Med (toward the bottom; use the down arrow to reach it) is the median. Note that the calculator does not round correctly, but you should. You should be able to find these values by hand and on the calculator.

26 1-Var Stats for Frequency Distributions
Press STAT, 1: Edit. Enter the class midpoints under L1 and the frequencies under L2. Press STAT, CALC. Select 1: 1-Var Stats. You need to tell it to use both columns: type L1 (2nd, 1), then press the comma button (over the 7), then type L2 (2nd, 2), then ENTER.

27 Definitions Midrange
the value midway between the highest and lowest values in the original data set. $\text{Midrange} = \frac{\text{highest score} + \text{lowest score}}{2}$

28 Definitions Mode
The value that occurs most often in the list; denoted by M. The mode is not always unique. You can have one, two (bimodal), or more. If there is a tie for the value that occurs most often, then the mode is a list of all values in the tie. If none of the values repeat, there is NO mode. For a frequency distribution, the mode would be the class with the highest frequency.
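The midrange and mode definitions can be sketched in Python as follows. The function names are hypothetical, and `modes` returns a list so that ties and the "no mode" case are both representable; the data are the 19 test scores from the trimmed mean example.

```python
from collections import Counter

def midrange(values):
    """Value midway between the highest and lowest values."""
    return (max(values) + min(values)) / 2

def modes(values):
    """All values tied for most frequent; empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

scores = [99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85,
          80, 79, 76, 72, 67, 66, 45]
print(midrange(scores))        # 72.0
print(modes(scores))           # [85]
print(modes([1, 1, 2, 2, 3]))  # [1, 2] (bimodal)
print(modes([1, 2, 3]))        # [] (no mode)
```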

29 Distributions The distribution is the shape when a frequency distribution or other data is represented visually. (Histogram, polygon, etc.) Symmetric Data is symmetric if the left half of its histogram is roughly a mirror image of its right half. (Need not be perfect.) Skewed Data is skewed if it is not symmetric and if it extends more to one side than the other.

30 Distributions

31 Section 3-3 Measures of Variation

32 In the next slide, imagine waiting in line at two different banks. Old Bank has three separate lines, and the next person in each line has a different wait time. New Bank has one line, with different wait times for the next three people in line. The mean wait times are the same, but which bank would you rather go to? New Bank spreads the wait times out more evenly, so most people would prefer to go there.

33 Variation
Old Bank: 3 min, 1 min, 14 min
New Bank: 4 min, 7 min, 7 min

34 While both banks have the same average wait, at Old Bank you run the risk of picking the line with the 14 minute wait. To avoid this risk, most people would prefer to choose New Bank, because its wait times are more consistent. This shows that measures of center are not the only important issue in analyzing data. We are also interested in consistency, or variation.

35 3 Types of Variation Range Standard Deviation Variance
Variation is an EXTREMELY important topic in statistics. We will be emphasizing range and standard deviation more than variance.

36 Round-off Rule for Measures of Variation
Like mean and median, all measures of variation get rounded to one decimal place more than the original data values.

37 Definition The range is the difference between the highest value and the lowest value. It is easy to compute, but gives only limited information.

38 Range
Old Bank: 3 min, 1 min, 14 min; range = 14 – 1 = 13 min
New Bank: 4 min, 7 min, 7 min; range = 7 – 4 = 3 min

39 Standard Deviation
Standard deviation measures how far the different data values tend to be from the mean. We will be finding this on the calculator, not by hand. (Though it can be helpful to look at the formula in the book to see exactly what is happening.) Notation: s is the sample standard deviation; σ (sigma) is the population standard deviation.

40 Calculator Standard Deviation
We will be learning how to find standard deviation on the calculator instead of memorizing formulas. To do this, you do exactly the same thing you did for finding the mean and median. (See earlier notes for how to do this for a frequency distribution.) STAT, 1: Edit, enter data under L1. STAT → CALC, 1: 1-Var Stats. The calculator uses Sx to stand for s, the sample standard deviation. (But the real symbol is s.) If this is the whole population, look at σx, the calculator’s notation for the population standard deviation. (We rarely use this.) Try this for the sample of 3 times at Old Bank, and then again for New Bank.

41 Results Old Bank s = 7.0 min New Bank s = 1.7 min
New Bank has a smaller standard deviation, and therefore is the preferable bank to wait in line at. In general, smaller standard deviations are better, because they indicate less variation in values. In other words, we expect values to be close to the mean more often.
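For readers without a TI calculator, the same results can be checked with Python's standard `statistics` module. This is a sketch under one assumption: the six wait times from the Variation slide are split as 3, 1, 14 for Old Bank and 4, 7, 7 for New Bank, which is the split that reproduces the equal means and the reported values of s.

```python
import statistics

# Wait times in minutes. Assumption: this split of the six values is the one
# that matches the slide's equal means and standard deviations.
old_bank = [3, 1, 14]
new_bank = [4, 7, 7]

def summarize(waits):
    """Return (mean, range, sample standard deviation) for a list of values."""
    mean = statistics.mean(waits)
    spread = max(waits) - min(waits)
    s = statistics.stdev(waits)  # sample standard deviation (divides by n - 1)
    return mean, spread, s

for name, waits in [("Old Bank", old_bank), ("New Bank", new_bank)]:
    mean, spread, s = summarize(waits)
    print(f"{name}: mean = {mean:.1f}, range = {spread}, s = {s:.1f}")
```

`statistics.stdev` corresponds to the calculator's Sx; `statistics.pstdev` would correspond to σx.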

42 Things to Remember Standard deviation measures how spread out the data is—how far the data is from the mean, on average The value of the standard deviation is usually positive, sometimes 0, and never negative. Extreme values (outliers) can have a big effect on standard deviation The units (labels) for standard deviation are the same as the units of the original data values

43 Range Rule of Thumb This rule helps us to make sense out of standard deviation. It states that at least 75% of the data (95% in some cases) is within 2 standard deviations away from the mean. Thus, values farther than that are considered unusual. minimum usual value = mean – 2(standard deviation) maximum usual value = mean + 2(standard deviation)

44 Example We found that New Bank had a mean wait time of 6.0 minutes, with a standard deviation of 1.7 minutes. Find the usual range and interpret it.

45 Solution We found that New Bank had a mean wait time of 6.0 minutes, with a standard deviation of 1.7 minutes. Find the usual range and interpret it.
mean + 2(s) = 6.0 + 2(1.7) = 9.4
mean – 2(s) = 6.0 – 2(1.7) = 2.6
We would expect a usual wait at New Bank to be anywhere between 2.6 and 9.4 minutes.
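The range rule of thumb is a two-line calculation, sketched here in Python with a hypothetical helper name. The `is_unusual` check is an illustrative addition that applies the same "farther than 2 standard deviations" criterion to a single value.

```python
def usual_range(mean, sd):
    """Minimum and maximum usual values: mean minus/plus 2 standard deviations."""
    return mean - 2 * sd, mean + 2 * sd

def is_unusual(value, mean, sd):
    """True if the value lies outside the usual range."""
    low, high = usual_range(mean, sd)
    return value < low or value > high

low, high = usual_range(6.0, 1.7)
print(round(low, 1), round(high, 1))  # 2.6 9.4
print(is_unusual(14, 6.0, 1.7))       # True: a 14-minute wait would be unusual
print(is_unusual(7, 6.0, 1.7))        # False
```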

46 Definition Empirical Rule
For data sets having a symmetrical, bell-shaped distribution, we can be even more specific about percentages that fall within certain ranges. About 68% of all values fall within 1 standard deviation of the mean About 95% of all values fall within 2 standard deviations of the mean (This is the one we use the most!) About 99.7% of all values fall within 3 standard deviations of the mean

47 The Empirical Rule

48 The Empirical Rule

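49 The Empirical Rule [Figure: bell curve. On each side of the mean, 2.35% of values fall between 2 and 3 standard deviations away, and 0.15% fall more than 3 standard deviations away.]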

50 Definition The coefficient of variation (or CV) for a set of sample or population data, expressed as a percent, describes the standard deviation relative to the mean. Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$ Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$ This tells you what percent of the mean the standard deviation is.

51 Coefficient of Variation
The coefficient of variation expresses the standard deviation as a percent of the mean. This helps you determine whether the standard deviation is very large when compared to the mean. For example, a standard deviation of 100 is very large if the mean is only 10. The coefficient of variation is 1000%, which is an enormous variation. But a standard deviation of 100 is not so big if the mean is 10,000. The coefficient of variation is 1%, which is not a big variation at all.
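The two comparisons above can be verified with a one-line function. This Python sketch uses a hypothetical function name and assumes a nonzero mean.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percent of the mean (mean must be nonzero)."""
    return sd / mean * 100

print(round(coefficient_of_variation(100, 10), 1))      # 1000.0
print(round(coefficient_of_variation(100, 10_000), 1))  # 1.0
```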

52 In this class, we will not work with variance very much. It becomes much more important, however, in advanced statistics classes.

53 Section 3-4 Measures of Relative Standing and Boxplots

54 Concepts In this section, we talk about comparing different data values.

55 Definition z-score (or standard score)
the number of standard deviations that a given value x is above or below the mean. This allows you to compare two values from different data sets.

56 How To Find Z-Scores
Sample: $z = \frac{x - \bar{x}}{s}$ Population: $z = \frac{x - \mu}{\sigma}$
Note: Whenever a value is less than the mean, its corresponding z-score is negative. Round z-scores to 2 decimal places (always).

57 Interpreting Z-Scores
Note: Whenever a value is less than the mean, its corresponding z-score is negative. Using the range rule of thumb:
Ordinary values: z-score between –2 and 2
Unusual values: z-score < –2 or z-score > 2

58 Example NBA superstar Michael Jordan is 78 in. tall and WNBA basketball player Rebecca Lobo is 76 in. tall. Which player is relatively taller? Does Jordan’s height among men exceed Lobo’s height among women? Men have heights with a population mean of 69.0 in. and a pop. standard deviation of 2.8 in.; Women have heights with a pop. mean of 63.6 and a pop. standard deviation of 2.5 in.

59 Solution Find the z-score for each:
Michael Jordan: $z = \frac{78 - 69.0}{2.8} \approx 3.21$
Rebecca Lobo: $z = \frac{76 - 63.6}{2.5} = 4.96$
Rebecca Lobo is taller as compared to other women than Michael Jordan is when compared to other men because her z-score is higher. Rebecca is 4.96 standard deviations above the mean for women, and Michael Jordan is only 3.21 standard deviations above the mean for men. Both are considered unusually tall.
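The comparison can be reproduced in Python. This is a small sketch with a hypothetical `z_score` helper; it rounds to 2 decimal places, as the round-off rule for z-scores requires.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return round((x - mean) / sd, 2)

jordan = z_score(78, 69.0, 2.8)  # men: mean 69.0 in., sd 2.8 in.
lobo = z_score(76, 63.6, 2.5)    # women: mean 63.6 in., sd 2.5 in.

print(jordan)  # 3.21
print(lobo)    # 4.96
print(abs(jordan) > 2, abs(lobo) > 2)  # True True: both heights are unusual
```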

60 Percentiles You often hear about percentiles for standardized tests or height/weight of children. There are 99 percentiles, denoted P1, P2, ..., P99, which partition the data into 100 groups.

61 Finding the Percentile of a Given Score
$\text{Percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$ Round to the nearest whole number.

62 Example Use the heights of MAT 108 students listed in the table below. Find the percentile corresponding to a height of 65 in. Interpret the meaning of this number. 64 67 71 72 76 74 69 63 62 68 73 60 66 65 70

63 Solution First, we need to arrange the data in order. This can be done using your calculator. (See instructions in 3-2 PowerPoint.) 60 62 63 64 65 66 67 68 69 70 71 72 73 74 76

64 Solution Use the heights of MAT 108 students listed in the table. Find the percentile corresponding to a height of 65 in. Interpret the meaning of this number. We need to count the numbers that are less than 65 in the table. There are nine data values less than 65. How many total data values? There are 34 total values. Plug into the equation: $\frac{9}{34} \cdot 100 \approx 26$ Interpretation: The height of 65 inches is approximately the 26th percentile, meaning approximately 26% of the heights fall below 65 inches, or approximately 74% of the heights fall above 65 inches.
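The percentile formula can be sketched in Python. Only part of the 34-value height table survives in this transcript, so the check below uses the counts stated in the solution (9 of 34); the `sample` list is hypothetical and only exercises the function.

```python
def percentile_of(values, x):
    """Percent of values strictly less than x, rounded to a whole number."""
    below = sum(1 for v in values if v < x)
    # Note: Python's round() sends exact .5 cases to the nearest even number.
    return round(below / len(values) * 100)

# The solution's counts: 9 of the 34 heights are less than 65 in.
print(round(9 / 34 * 100))  # 26

# Hypothetical small data set, used only to exercise the function.
sample = [60, 62, 63, 64, 65, 66, 67, 68, 69, 70]
print(percentile_of(sample, 65))  # 40
```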

65 Converting from the kth Percentile to the Corresponding Data Value
Notation: n is the total number of values in the data set; k is the percentile being used; L is the locator that gives the position of a value; Pk is the kth percentile. $L = \frac{k}{100} \cdot n$ If you get a decimal, round up to the next whole number. If you get a whole number, locate the number in this position in the list and average it with the next value in the list.

66 Example Use the heights of MAT 108 students (in inches) listed in the table below. Find P60 and P50. 60 62 63 64 65 66 67 68 69 70 71 72 73 74 76

67 Solution For P60: Use the formula for L: $L = \frac{60}{100} \cdot 34 = 20.4$, which rounds up to 21.
Find the 21st number in the list, so P60 = 69. So the 60th percentile is a height of 69 in. About 60% of people in the class are shorter than 69 in., and about 40% are taller.

68 Solution (continued) For P50: Use the formula for L: $L = \frac{50}{100} \cdot 34 = 17$, a whole number.
Find the 17th number in the list, and average it with the 18th so P50 = ( )/2 = 68. (Like finding the median of a data set with an even number of values.)
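The locator rule can be sketched in Python as below. The function names are hypothetical, k is assumed to be a whole number from 1 to 99, and integer arithmetic is used so that "whole number" is tested exactly. Because the full height table is not available here, the demonstration uses the 20 test scores from the median example.

```python
def locator(k, n):
    """Return (position, average_with_next) for the kth percentile of n values."""
    if (k * n) % 100 == 0:
        return k * n // 100, True   # whole number: average with the next value
    return k * n // 100 + 1, False  # decimal: round up to the next whole number

def percentile_value(values, k):
    """Data value at the kth percentile, using the locator rule."""
    ordered = sorted(values)
    position, average_with_next = locator(k, len(ordered))
    if average_with_next:
        return (ordered[position - 1] + ordered[position]) / 2
    return ordered[position - 1]

print(locator(60, 34))  # (21, False): L = 20.4 rounds up to 21
print(locator(50, 34))  # (17, True): L = 17, average the 17th and 18th

scores = [100, 99, 98, 92, 91, 91, 90, 88, 87, 87,
          85, 85, 85, 80, 79, 76, 72, 67, 66, 45]
print(percentile_value(scores, 50))  # 86.0, the same as the median
```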

69 Quartiles Q1, Q2, Q3 divide values into four equal parts, each holding 25% of the values: (minimum) | 25% | Q1 | 25% | Q2 (median) | 25% | Q3 | 25% | (maximum)
Note: quartiles divide the scores so the same number of scores fall into each part; the cutoffs will not necessarily be evenly spaced

70 Definition Q1 (First Quartile) separates the bottom 25% of sorted values from the top 75%. Q2 (Second Quartile) same as the median; separates the bottom 50% of sorted values from the top 50%. Q3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%.

71 Quartile/Percentile Connection
Q1 (1st quartile) = P25 (25th Percentile) Q2 (2nd quartile) = P50 (50th Percentile) Q3 (3rd quartile) = P75 (75th Percentile)

72 Definitions For a set of data, the 5-number summary consists of the minimum value; the first quartile Q1; the median (or second quartile Q2); the third quartile, Q3; and the maximum value. Quick tip: 1-Var Stats gives a list of these values if you use the down arrow to scroll down.

73 Definitions Interquartile Range (IQR) = Q3 – Q1
The range of the middle 50%. The concepts of quartiles and IQR allow us to be more specific about outliers. Outliers are values more than 1.5 IQR’s above Q3 or below Q1: less than Q1 – 1.5(IQR) or greater than Q3 + 1.5(IQR)

74 Example Find the 5-number summary and IQR for the following test score data, using your calculator. Identify any outliers. 45, 66, 67, 72, 75, 76, 79, 80, 85, 85, 85, 87, 87, 88, 90, 91, 91, 92, 98, 99, 100

75 Solution The 5-Number Summary for the test score data would be: min = 45, Q1 = 75.5, Q2 = 85, Q3 = 91, max = 100. The Interquartile Range would be IQR = 91 – 75.5 = 15.5. (The middle 50% of the data falls within a 15.5 point range. It can be informative to know if the middle 50% falls within a large or small range of values.)

76 Solution Outliers would be less than
Q1 – 1.5(IQR) = 75.5 – 1.5(15.5) = 52.25 or greater than Q3 + 1.5(IQR) = 91 + 1.5(15.5) = 114.25. The only outlier in this case would be the score of 45.
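The 5-number summary, IQR, and outlier fences for this example can be reproduced in Python. Quartile conventions differ between tools; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd), which reproduces the values in the solution. All names are hypothetical.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value itself when the count is odd.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

scores = [45, 66, 67, 72, 75, 76, 79, 80, 85, 85, 85,
          87, 87, 88, 90, 91, 91, 92, 98, 99, 100]
minimum, q1, q2, q3, maximum = five_number_summary(scores)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < low_fence or x > high_fence]

print(minimum, q1, q2, q3, maximum)  # 45 75.5 85 91.0 100
print(iqr, low_fence, high_fence)    # 15.5 52.25 114.25
print(outliers)                      # [45]
```

The smallest score that is not an outlier, `min(x for x in scores if x >= low_fence)`, is 66, which is where the lower whisker of a modified boxplot would end.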

77 Definitions Boxplot (or box-and-whisker)
a graph that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3 These can be drawn either horizontally or vertically.

78 Example Boxplot for the test score data: 45 (min), 75.5 (Q1), 85 (Q2, median), 91 (Q3), 100 (max); range = 55 (length of line), IQR = 15.5 (width of box). The width of the whole line is the range of the data, and the width of the box is the interquartile range. The line at the middle of the box indicates the median.

79 Definitions Modified Boxplot
a boxplot with lines that don’t extend to the outliers; the outliers are marked with a dot or asterisk instead. For the test score data: 45 (min/outlier), 66 (smallest non-outlier value), 75.5 (Q1), 85 (Q2, median), 91 (Q3), 100 (max)

80 Analyzing Boxplots We use boxplots to analyze and compare center (median), variation (range and IQR), distribution, and outliers of data sets. If we have the original data, we can also compare means and standard deviations.

81 Example These are box and whisker plots for test scores from two different classes. Center: The first class has a higher median. (Look at the line inside the box. Think of left to right as an x-axis. The first median has a higher x-value than the second.) Variation: The second class has a much wider range than the first (because the line is longer). The second class also has a wider IQR (because its box is wider). Distribution: First class is skewed left, second is slightly skewed right. Outliers: There don’t appear to be any, though a modified boxplot would make this more obvious.

