1
Chapter 3 Describing Data Using Numerical Measures
Business Statistics: A Decision-Making Approach 7th Edition Chapter 3 Describing Data Using Numerical Measures
2
Chapter Goals After completing this chapter, you should be able to:
Compute and interpret the mean, median, and mode for a set of data
Compute the range, variance, and standard deviation and know what these values mean
Construct and interpret a box and whisker graph
Compute and explain the coefficient of variation and z scores
Use numerical measures along with graphs, charts, and tables to describe data
3
Chapter Topics
Measures of Center and Location: mean, median, mode
Other Measures of Location: weighted mean, percentiles, quartiles
Measures of Variation: range, interquartile range, variance and standard deviation, coefficient of variation
Using the mean and standard deviation together: coefficient of variation, z-scores
4
Summary Measures: Describing Data Numerically
Center and Location: Mean, Median, Mode, Weighted Mean
Other Measures of Location: Percentiles, Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
5
Measures of Center and Location
Mean, Median, Mode, Weighted Mean
6
Mean (Average)
The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)
Slide examples: Mean = 3; Mean = 4
7
Median
In an ordered array, the median is the “middle” number, i.e., the number that splits the distribution in half
The median is not affected by extreme values
Useful when data are highly skewed
Slide examples: Median = 3; Median = 3
8
Which measure of location is the “best”?
The mean is generally used, unless extreme values (outliers) exist. Then the median is often used, since it is not sensitive to extreme values. Example: Realtors report “median home prices” rather than average prices, because the median is less sensitive to outliers.
9
Shape of a Distribution based on Mean and Median
Describes how data is distributed: symmetric or skewed
Left-Skewed: Mean < Median (longer tail extends to left)
Symmetric: Mean = Median
Right-Skewed: Median < Mean (longer tail extends to right)
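The mean-versus-median comparison above can be turned into a rough check. The following is a minimal sketch only: the function name `describe_shape`, its `tolerance` parameter, and the three data sets are assumptions made for illustration, and the comparison is a rule of thumb rather than a formal test of skewness.

```python
from statistics import mean, median


def describe_shape(values, tolerance=0.0):
    """Label a data set by comparing its mean with its median.

    `tolerance` is a hypothetical knob: differences no larger than it
    are treated as "mean equals median".
    """
    avg = mean(values)
    mid = median(values)
    if abs(avg - mid) <= tolerance:
        return "symmetric"
    # A mean pulled above the median suggests a longer right tail.
    return "right-skewed" if avg > mid else "left-skewed"


# Hypothetical data sets, not taken from the slides.
print(describe_shape([1, 2, 3, 4, 5]))    # mean 3, median 3
print(describe_shape([1, 2, 3, 4, 10]))   # mean 4, median 3
print(describe_shape([1, 7, 8, 9, 10]))   # mean 7, median 8
```

Equal mean and median do not guarantee a symmetric distribution, so a label from this sketch is best read alongside a histogram or box plot.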
10
Mode
The value that is repeated most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
Slide examples: Mode = 5; No Mode
11
Implication
Five houses on a hill by the beach
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
12
Implication
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (Sum: $3,000,000)
Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
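The three measures for the house-price example can be checked with Python's standard `statistics` module. This is a sketch under one assumption: the five prices below are the values consistent with the slide's stated sum ($3,000,000), median ($300,000), and mode ($100,000).

```python
from statistics import mean, median, mode

# Five house prices consistent with the slide's stated sum, median, and mode
# (treated here as an assumption about the original data).
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

average_price = mean(house_prices)     # sum of values / number of values
middle_price = median(house_prices)    # middle value of the ranked data
most_common_price = mode(house_prices) # most frequent value

print(f"Mean:   ${average_price:,.0f}")
print(f"Median: ${middle_price:,.0f}")
print(f"Mode:   ${most_common_price:,.0f}")
```

The single $2,000,000 house pulls the mean to twice the median, which is why the median is usually the more representative figure for data like these.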
13
Practice From the class website, download “describing numerical data using Excel.” Try the following: Mean vs. Median Hockey Shoes
14
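Weighted Mean
Used when values are grouped by frequency or relative importance (e.g., GPA)
Example: Sample of 26 Repair Projects
Days to Complete   Frequency
5                  4
6                  12
7                  8
8                  2
Weighted Mean Days to Complete: $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4)(5) + (12)(6) + (8)(7) + (2)(8)}{4 + 12 + 8 + 2} = \frac{164}{26} \approx 6.31$ days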
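The weighted mean for the repair-project example can be sketched as follows. The pairing of days with frequencies (5 days: 4 projects, 6: 12, 7: 8, 8: 2) is a reading of the slide's table that matches the stated sample of 26 projects; the variable names are illustrative.

```python
# Days to complete -> number of repair projects (the weights).
# Assumed reading of the slide's table; the frequencies sum to 26.
frequency_by_days = {5: 4, 6: 12, 7: 8, 8: 2}

total_projects = sum(frequency_by_days.values())
weighted_sum = sum(days * freq for days, freq in frequency_by_days.items())

# Weighted mean = sum(weight * value) / sum(weights)
weighted_mean_days = weighted_sum / total_projects

print(f"{weighted_sum} / {total_projects} = {weighted_mean_days:.2f} days")
```

Each "days to complete" value counts once per project, so the result is the same as averaging all 26 individual completion times.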
15
Other Location Measures
Other Measures of Location: Percentiles, Quartiles
A percentile is a measure that tells us what percent of the total frequency scored at or below that measure. For example, an SAT score at the 90th percentile is as high as or higher than the scores of 90% of the students who took the test.
When statistical data is ordered sequentially, it can be divided into 4 groups of data. The values that separate the groups are called quartiles.
Example 3-8 on p. 99
16
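Percentiles
The pth percentile in an ordered array of n values is the value in the ith position, where $i = \frac{p}{100}\,n$
If i is not an integer, round up to the next higher integer value.
If i is an integer, the pth percentile is the average of the values in positions i and i+1.
Example 1: Find the 60th percentile in an ordered array of 19 values. $i = \frac{60}{100}(19) = 11.4$, so use the value in the i = 12th position.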
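The location rule on this slide can be sketched in a few lines. The sketch assumes the position is computed as i = (p/100)·n, which matches the 12th position quoted for p = 60 and n = 19. The function name `percentile_by_position` and the sample arrays are hypothetical. Other tools, including Excel and Python's `statistics.quantiles`, interpolate differently and may return slightly different values.

```python
def percentile_by_position(ordered_values, p):
    """Return the p-th percentile of an ascending list using the slide's rule.

    Assumes p is a whole number with 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100 (exclusive)")
    n = len(ordered_values)
    # i = (p / 100) * n, split into whole part and remainder so that
    # floating-point rounding cannot misclassify an integer position.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1
        # (positions are 1-based, list indexes are 0-based).
        return (ordered_values[whole - 1] + ordered_values[whole]) / 2
    # i is not an integer: round up to position whole + 1 (index `whole`).
    return ordered_values[whole]


# Hypothetical ordered array of 19 values: 10, 20, ..., 190.
nineteen_values = list(range(10, 200, 10))
print(percentile_by_position(nineteen_values, 60))  # value in the 12th position

# Hypothetical ordered array of 10 values: i = 5 is an integer for p = 50.
ten_values = list(range(1, 11))
print(percentile_by_position(ten_values, 50))       # average of positions 5 and 6
```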
17
Quartiles
Quartiles split the ranked data into 4 equal groups of 25% each, separated by Q1, Q2, and Q3
Note that the second quartile (the 50th percentile) is the median
18
Quartiles
Example: Find the first quartile (25th percentile)
Sample Data in Ordered Array: (n = 9)
Q1 = 25th percentile, so find i: $i = \frac{25}{100}(9) = 2.25$, so round up and use the value in the 3rd position: Q1 = 13
19
Practice Using the “describing numerical data using Excel” file, try the following: Salary
20
Box and Whisker Plot
Statistics assumes that your data points (the numbers in your list) are clustered around some central value. The "box" in the box-and-whisker plot contains and highlights the middle half of the data points. The plot is useful for quickly finding outliers: data points out of line with the rest of the data set. It can be shown in either vertical or horizontal format.
Example 3-9 on p. 100
21
Box and Whisker Plot
Outliers are plotted outside the calculated limits, and the center box extends from Q1 to Q3.
Figure labels: Outliers (*), Lower Limit, 1st Quartile, Median, 3rd Quartile, Upper Limit; 25% of the data lies in each of the four segments
The lower limit is Q1 – 1.5 (Q3 – Q1)
The upper limit is Q3 + 1.5 (Q3 – Q1)
22
Box-and-Whisker Plot Example
Below is a Box-and-Whisker plot for the example data (plot marks: Min, Q1, Q2, Q3, Max, and * for the outlier)
This data is right skewed, as the plot depicts
Upper limit = Q3 + 1.5 (Q3 – Q1) = 6 + 1.5 (6 – 2) = 12
27 is above the upper limit so is shown as an outlier
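The outlier check in this example can be sketched as below, using the 1.5 × (Q3 – Q1) rule with Q1 = 2 and Q3 = 6 as in the slide's arithmetic. The helper names are hypothetical, and the sample list is invented for illustration except for the value 27, which the slide identifies as an outlier.

```python
def whisker_limits(q1, q3):
    """Return (lower_limit, upper_limit) for a box-and-whisker plot."""
    iqr = q3 - q1  # width of the box
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the whisker limits."""
    lower_limit, upper_limit = whisker_limits(q1, q3)
    return [v for v in values if v < lower_limit or v > upper_limit]


lower_limit, upper_limit = whisker_limits(q1=2, q3=6)
print(lower_limit, upper_limit)

# Hypothetical values; only 27 comes from the slide.
sample_values = [0, 3, 5, 11, 27]
outliers = find_outliers(sample_values, q1=2, q3=6)
print(outliers)
```

The quartiles are passed in rather than computed, because different quartile conventions can shift Q1 and Q3 slightly and therefore move the limits.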
23
Distribution Shape and Box and Whisker Plot
Figure: three box-and-whisker plots, Left-Skewed, Symmetric, and Right-Skewed, each marked with Q1, Q2, Q3
24
Practice Using Free Online Tool
Using the data set on the website, develop a Box-and-Whisker Plot (free online tool).
25
Coefficient of Variation
Measures of Variation
Variation: Range, Interquartile Range, Variance (Population Variance, Sample Variance), Standard Deviation (Population Standard Deviation, Sample Standard Deviation), Coefficient of Variation
26
Variation Measures of variation give information on the spread of the data values. Same center, different variation
27
Range (refer to Measures of Variability)
Simplest measure of variation
Difference between the largest and the smallest observations: Range = $x_{\text{maximum}} - x_{\text{minimum}}$
Example: Range = 13
28
Disadvantages of the Range
Ignores the way in which data are distributed (slide examples: Range = 5 and Range = 5)
Sensitive to outliers (extreme values)
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 – 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 – 1 = 119
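The sensitivity of the range to a single extreme value can be checked directly with the two data sets listed on this slide. The function name `value_range` is an illustrative choice; the median is added only as a point of comparison.

```python
from statistics import median


def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# The two data sets from the slide; they differ only in the last value.
without_extreme = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                   2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_extreme = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 120]

print(value_range(without_extreme), median(without_extreme))
print(value_range(with_extreme), median(with_extreme))
```

Changing one observation from 5 to 120 moves the range from 4 to 119, while the median of both data sets stays the same.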
29
Interquartile Range (refer to Measures of Variability)
Reduces the range's susceptibility to extreme values and can eliminate some outlier problems
Eliminate some high- and low-valued observations and calculate the range from the remaining values.
Interquartile range = 3rd quartile – 1st quartile
Example 3-10 on p. 108
30
Interquartile Range Example
Figure: box plot marking X minimum, Q1, Median (Q2), Q3, and X maximum, with 25% of the data in each segment
Interquartile range = 57 – 30 = 27
31
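Variance
Average of squared deviations of values from the mean
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$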
32
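Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$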
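The difference between the population and sample versions can be sketched with Python's standard `statistics` module, where `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by n – 1. The data values below are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Hypothetical observations (not from the slides); their mean is 16.
observations = [10, 12, 14, 15, 17, 18, 18, 24]

center = mean(observations)
squared_deviations = sum((x - center) ** 2 for x in observations)

# Manual calculation: same numerator, different denominators.
population_variance = squared_deviations / len(observations)    # divide by N
sample_variance = squared_deviations / (len(observations) - 1)  # divide by n - 1

print(population_variance, pvariance(observations))
print(sample_variance, variance(observations))

# Standard deviation is the square root of the variance,
# so it is in the same units as the data.
print(pstdev(observations), stdev(observations))
```

When the data are a sample rather than the whole population, the n – 1 version is the usual choice; it yields a slightly larger value.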
33
Comparing Standard Deviations
Same mean, but different standard deviations:
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
34
Practice Try the “Variance Vs. Standard Deviation” link on the class website. Using the “describing numerical data using Excel” file, try the following: Std.
35
Coefficient of Variation
The coefficient of variation (CV) measures the relative variation for distributions with different means.
In finance, CV measures the relative risk of a stock portfolio.
It is usually expressed as a percentage.
Population: $CV = \frac{\sigma}{\mu} \times 100\%$
Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
36
Comparing Coefficients of Variation
Stock A: Average price last year = $50, Standard deviation = $5, so CV = (5/50) × 100% = 10%
Stock B: Average price last year = $100, Standard deviation = $5, so CV = (5/100) × 100% = 5%
Both stocks have the same standard deviation, but stock B is less variable relative to its price
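The stock comparison can be reproduced with a short sketch. It assumes the usual definition of the coefficient of variation, standard deviation divided by mean, expressed as a percentage, and takes Stock B's standard deviation as $5 because the slide says both stocks share the same standard deviation. The function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean_value


cv_stock_a = coefficient_of_variation(std_dev=5, mean_value=50)
cv_stock_b = coefficient_of_variation(std_dev=5, mean_value=100)

print(f"Stock A: CV = {cv_stock_a:.0f}%")
print(f"Stock B: CV = {cv_stock_b:.0f}%")
```

Because the CV is unit-free, it allows comparison between series measured on different scales, which the raw standard deviation does not.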
37
The Empirical Rule
If the data distribution is bell-shaped, then the interval $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
38
The Empirical Rule
$\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
$\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
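The percentages in the empirical rule can be checked against the normal distribution, which is the idealized bell shape the rule assumes. This sketch uses `statistics.NormalDist` from the Python standard library; treating "bell-shaped" as exactly normal is an assumption, and real data will only match approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The same shares hold for any mean and standard deviation, because the intervals are defined in units of the standard deviation.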
39
Practice: Getting Descriptive Statistics Using Excel
Descriptive Statistics are easy to obtain from Microsoft Excel
Use menu choice: Data / data analysis / descriptive statistics
Enter details in dialog box
40
Open Excel and then enter the data
Select: Data / data analysis / descriptive statistics
41
Using Excel
Enter dialog box details
Check box for summary statistics
Click OK
42
Excel output
Microsoft Excel descriptive statistics output, using the house price data
How to read “Skewness”?
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
43
Chapter Summary
Described measures of center and location: mean, median, mode, weighted mean
Discussed percentiles and quartiles
Created Box and Whisker Plots
Illustrated distribution shapes: symmetric, skewed
44
Chapter Summary (continued)
Described measures of variation: range, interquartile range, variance, standard deviation, coefficient of variation
Discussed Tchebysheff’s Theorem
Calculated standardized data values