Choosing the “Best Average”



Presentation on theme: "Choosing the “Best Average”"— Presentation transcript:

1 Choosing the “Best Average”
The shape of your data and the existence of any outliers may help you choose the best average:

2 This distribution of stolen bases is skewed right, with a median of 5, as noted on the histogram.
It does not seem plausible that the balancing point (mean) is also 5. Because the distribution is stretched out to the right, the mean must be greater than 5. Think of all the extreme values that will pull the mean up.
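A small Python sketch can make the "pull" of the right tail concrete. The stolen-base counts below are hypothetical (they are not the data behind the histogram); they are chosen only so that the median is 5 and the distribution is stretched out to the right.

```python
from statistics import mean, median

# Hypothetical stolen-base counts, skewed right (NOT the data from the slide).
stolen_bases = [0, 1, 2, 3, 4, 5, 5, 6, 8, 12, 20, 42]

center_median = median(stolen_bases)  # middle of the sorted values
center_mean = mean(stolen_bases)      # balancing point

print("median:", center_median)
print("mean:  ", center_mean)

# The few large values (12, 20, 42) drag the mean above the median.
print("mean is greater than median:", center_mean > center_median)
```

With a right-skewed distribution like this one, the median is usually the more representative "typical" value.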

3 Being able to identify the shape and center of a distribution is a great start. However, two distributions can have the same shape and center but look quite different. Here are dotplots that show 100 performances by two different bowlers. Both distributions are unimodal and symmetric, with centers around the same value. However, it is important to compare the spread (variability) of the distributions.

4 Measuring Variability
In sports, it is important to measure variability because it shows the consistency of an athlete or team. For example, if the distribution of an athlete’s performances has little variability, he or she is very consistent. There are several ways to measure the variability of a distribution. We will focus on the range and the interquartile range.

5 Range The range of a distribution is the distance between the minimum value and the maximum value. Examples: The range of AL runs = 255 runs. The range of NL runs = 218 runs. We have some evidence there is less variability in the NL distribution. Range can be a bit deceptive if there is an unusually high or unusually low value in a distribution. For this reason, we often use a second measure of variability called the…

6 Interquartile Range The interquartile range (IQR) is a single number that measures the range of the middle half of the distribution, ignoring the values in the lowest quarter of the distribution and the values in the highest quarter of the distribution.

7 In order to calculate the IQR, we first have to find the quartiles of the distribution, which are the values that divide the distribution into four groups of roughly the same size.
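Here is a minimal Python sketch of finding the quartiles and the IQR. It assumes one common classroom convention, which the slide does not spell out: Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half, leaving out the overall median when the number of values is odd. The function name `quartiles` and the data are made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # lowest half of the values
    upper = data[len(data) - half:]   # highest half (skips the middle value if n is odd)
    return median(lower), median(data), median(upper)

# Hypothetical data set with 8 values.
data = [1, 3, 4, 6, 7, 8, 10, 12]

q1, med, q3 = quartiles(data)
iqr = q3 - q1  # range of the middle half of the distribution

print("Q1:", q1, "Median:", med, "Q3:", q3, "IQR:", iqr)
```

Other software may use a slightly different quartile rule, so small differences from these values are possible.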

8 The IQR for the NL distribution is smaller than the IQR for the AL distribution. Therefore, we have evidence that there is less variability in the NL distribution.

9 Unusually large or small values can have a big impact on measures like the mean and range.
Suppose we were going to calculate the mean salary and the range of salaries for the students in this classroom. Let’s say Adam Sandler finds out he is one class short of graduating high school, and that class happens to be Algebra I. He moves to LaGrange and transfers into this class. What effect would his salary have on the mean? On the range? What effect would it have on the median? On the IQR?

10 Outliers An outlier is any value that falls outside the pattern of the rest of the data (an unusually high or unusually low value in a distribution). Outliers can have a big effect on summary statistics, such as the mean and range.

11 Here are Tennessee Titans running back Chris Johnson’s yards for each rush during a game against the Houston Texans on September 20, 2009. Do there appear to be any outliers? The mean is brought up greatly by the two outliers. However, the median is relatively unaffected.

12 A measure of center or spread is resistant if it isn’t influenced by outliers.
Median and interquartile range are resistant to outliers. Mean and range are not resistant to outliers.
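The idea of resistance can be checked with a quick Python sketch. All salaries below are hypothetical, including the one very large salary added at the end, and the IQR uses the median-of-halves convention (an assumption, since the slides do not name a quartile rule).

```python
from statistics import mean, median

def iqr(values):
    """IQR using the median-of-halves convention (an assumed convention)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(upper) - median(lower)

def summary(values):
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "median": median(values),
        "iqr": iqr(values),
    }

# Hypothetical yearly salaries for a class, then the same class plus one
# hypothetical very large salary.
classroom = [0, 0, 0, 1000, 2000, 3000, 4000, 5000]
with_star = classroom + [20_000_000]

before = summary(classroom)
after = summary(with_star)

for name in before:
    print(f"{name:>6}: before = {before[name]:,.1f}   after = {after[name]:,.1f}")
```

The mean and range jump by millions, while the median and IQR move only slightly, which is what "resistant" means here.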

13 As a rule of thumb, an observation is an outlier if it lies more than 1.5 IQRs below the first quartile or more than 1.5 IQRs above the third quartile. Let’s practice using Chris Johnson’s 16 rushing attempts from the September 20, 2009 game against the Texans.
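The 1.5 IQR rule can be sketched in Python as follows. The 16 rushing distances are hypothetical stand-ins (the actual game data is not reproduced in this transcript), the quartiles use the median-of-halves convention (an assumption), and `find_outliers` is an illustrative name.

```python
from statistics import median

def find_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])               # median of the lower half
    q3 = median(data[len(data) - half:])   # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical yards gained on 16 rushing attempts (NOT the real game data).
rushes = [-3, 0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 8, 10, 48, 80]

low, high, outliers = find_outliers(rushes)
print("Values below", low, "or above", high, "are outliers:", outliers)
```

For this made-up data, Q1 is 2 and Q3 is 7, so the IQR is 5 and the cutoffs sit 7.5 yards below Q1 and 7.5 yards above Q3.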

14

15

16

17 Comparing Distributions
When asked to compare two distributions, you must address four points: the shape, the outliers, the center, and the spread. Think of the acronym SOCS to help you remember what to address.

18 The shape of a distribution may be difficult to determine from a boxplot.
Try comparing the distance from the median to the minimum and maximum values to determine if a distribution is skewed or roughly symmetric. You will not be able to tell if a distribution is unimodal from looking at a boxplot.
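The comparison described above can be sketched in Python. The function name `shape_hint` and the 25% tolerance for calling the two distances "about the same" are assumptions made for illustration; the slide does not give a numeric cutoff.

```python
def shape_hint(minimum, med, maximum, tolerance=0.25):
    """Guess skewness from a boxplot by comparing median-to-min and median-to-max."""
    left = med - minimum    # how far the data stretches below the median
    right = maximum - med   # how far the data stretches above the median
    longer = max(left, right)
    # Treat the two sides as equal if they differ by at most `tolerance`
    # (an assumed cutoff) of the longer side.
    if longer == 0 or abs(left - right) / longer <= tolerance:
        return "roughly symmetric"
    return "skewed left" if left > right else "skewed right"

# Hypothetical (minimum, median, maximum) values read off three boxplots.
print(shape_hint(0, 5, 42))        # long right side
print(shape_hint(600, 780, 850))   # long left side
print(shape_hint(600, 700, 805))   # sides about equal
```

This is only a hint about skewness; as noted, it cannot tell you how many peaks the distribution has.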

19 Here are boxplots for the number of runs scored in the AL and in the NL during 2008. (Note: the plots are on the same scale for comparison purposes.) Let’s compare using our four points.

20 Shape The AL distribution is skewed slightly left (the left half of the distribution appears more spread out). The NL distribution is approximately symmetric.

21 Outliers Neither distribution contains an outlier.

22 Center Typically, teams score more runs in the AL because the median for the AL distribution is higher than the median for the NL distribution.

23 Spread The AL distribution is slightly more spread out because it has both a larger range and larger IQR. This indicates there is more variability among AL teams and more consistency among NL teams.

24 Using the TI-84 to Make Graphs and Calculate Summary Statistics
As fun as it is to calculate everything by hand, the TI-84 calculator can do many of our calculations for us. The calculator can create boxplots, histograms, and calculate summary statistics.

25 Boxplot Let’s use our 2008 run data. Here are the numbers:
AL runs scored: NL runs scored:

26 The first thing we have to do is store this data as a list.
Press STAT and choose the first option, EDIT. Enter the 14 AL data values in L1 and the 16 NL values in L2.

27 Now we are going to set up the boxplot.
Exit back into the home screen. Then press STAT PLOT (2nd, then Y=). Choose Plot1. Then, turn Plot1 on. Scroll to Type and choose the boxplot icon (with outliers). It is the first option in the second row. Enter L1 for Xlist. Enter 1 for Freq. Choose a mark for outliers.

28 Now we will display the graph.
Press ZOOM. Then select option 9: ZOOMSTAT. Press enter. Press TRACE and scroll around to see different statistics for the distribution.

29 To see the boxplot for the NL distribution at the same time:
Go back into STAT PLOT and turn on Plot2. Repeat the steps, but enter L2 for Xlist. To do this, scroll down to Xlist. Then press 2nd-2 (you will see the L2 button on top of the number 2).

30 Histogram Note: We can only view one histogram at a time.
Start by pressing STAT PLOT. We want to turn on Plot 1. Make sure no other plots are turned on. Once in Plot1, change Type to Histogram. Enter L1 for Xlist. Keep Freq at 1.

31 To display the graph, press ZOOM and select option 9: ZOOMSTAT.
Press TRACE to see the class boundaries and frequencies.

32 To change the boundaries, press WINDOW.
Xmin defines where the first class begins and Xscl defines the class width. Xmax, Ymin, and Ymax define how big the window will be. To have classes of size 50 starting at 600, set Xmin to 600 and Xscl to 50.
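To see what the class boundaries do, here is a rough Python sketch (not calculator output) that counts how many values fall in each class for a given Xmin, Xscl, and Xmax. The run totals are hypothetical, the function name `class_frequencies` is illustrative, and the sketch assumes each class includes its left boundary but not its right one.

```python
import math

def class_frequencies(values, xmin, xscl, xmax):
    """Count values in classes of width xscl, starting at xmin and covering up to xmax."""
    n_classes = math.ceil((xmax - xmin) / xscl)
    counts = [0] * n_classes
    for v in values:
        # Assumed convention: a class runs from its left boundary (included)
        # up to its right boundary (excluded).
        if xmin <= v < xmin + n_classes * xscl:
            counts[int((v - xmin) // xscl)] += 1
    return [(xmin + i * xscl, xmin + (i + 1) * xscl, c) for i, c in enumerate(counts)]

# Hypothetical season run totals for 14 teams (NOT the real 2008 data).
runs = [610, 645, 690, 705, 740, 750, 760, 775, 780, 795, 810, 820, 845, 905]

table = class_frequencies(runs, xmin=600, xscl=50, xmax=950)
for low, high, count in table:
    print(f"{low} to <{high}: {'*' * count} ({count})")
```

Changing `xscl` to 100 or `xmin` to 550 shows how the same data can produce a different-looking histogram.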

33 Calculating Summary Statistics
Make sure your run data is still stored in lists. Press STAT, scroll to the CALC menu, and choose the first option, 1:1-Var Stats. Next, press 2nd-1 to indicate you want the statistics for L1. Then press enter.

34 Here is the information given. Scroll down for additional information.
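For comparison, the same kind of summary can be sketched in Python. The function name `one_var_stats`, the dictionary keys, and the data are all illustrative, and the quartiles use the median-of-halves convention (an assumption); this is not the calculator's own code.

```python
from statistics import mean, median, stdev, pstdev

def one_var_stats(values):
    """Return summary statistics similar to a one-variable statistics screen."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    return {
        "mean": mean(data),
        "sum_x": sum(data),
        "sum_x_squared": sum(x * x for x in data),
        "sample_sd": stdev(data),        # divides by n - 1
        "population_sd": pstdev(data),   # divides by n
        "n": n,
        "min": data[0],
        "q1": median(data[:half]),       # median of the lower half
        "median": median(data),
        "q3": median(data[n - half:]),   # median of the upper half
        "max": data[-1],
    }

# Hypothetical data, small enough to check by hand.
stats = one_var_stats([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name:>14}: {value}")
```

The last five entries (min, Q1, median, Q3, max) are the five-number summary used to draw a boxplot.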

35 To get the data for the NL distribution, repeat the process using L2.

36 One iPad app that can calculate summary statistics for us is called Bstatistics Lite.
Download it!!! When entering data, observations have to be separated with commas.

