Numerical Descriptive Measures


Presentation on theme: "Numerical Descriptive Measures"— Presentation transcript:

1 Numerical Descriptive Measures

2 Motivation
What is the “average consumer” exactly?
Why is it that if the average yield on an investment (e.g. a mutual fund) is 28%, I’ve lost money?
At Morning Brew people arrive on average 1 per min and it typically takes 1 min to serve them. Why is it that if I staff the register with 1 person, people complain that the lines are too long and often leave before purchasing something?
If I plan our payables based on average daily receivables of $7,000, why have I gone bankrupt?
Why do students always want to know what the average on the exam was?

3 Summary Measures
Describing Data Numerically
Central Tendency: Arithmetic Mean, Median, Mode
Quartiles
Variation: Range, Interquartile Range, Standard Deviation, Coefficient of Variation
Shape: Skewness, Kurtosis

4 Measures of Central Tendency
Overview
Arithmetic Mean: “balance point” of the data. Usually not in the data set.
Median: midpoint of the ranked values. In an ordered array, the median is the “middle” number (50% above, 50% below). May be in the data set.
Mode: most frequently observed value. There may be several modes or none, especially for continuous data. When it exists, it is always in the data set.

5 Which One to Use?
Mean is generally used, unless extreme values (outliers) exist.
Median is often used, since the median is not sensitive to extreme values.
Mode is rarely used, because there may be no mode or there may be several modes.
Example with an extreme value (…. 500): Mean = 58, Median = 3, Mode = 2, 4
Example without the extreme value: Mean = 3, Median = 3, Mode = 2, 4
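How an extreme value moves the mean but barely moves the median or mode can be sketched in Python. The data below are hypothetical (they are not the slide's data set, whose values are not fully recoverable); `statistics` is Python's standard library module, and `multimode` assumes Python 3.8 or later.

```python
import statistics

# Hypothetical data, NOT the slide's data set.
base = [1, 2, 2, 3, 4, 4, 5]
with_outlier = base + [500]  # same data plus one extreme value


def summarize(values):
    """Return (mean, median, list of modes) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.multimode(values),  # every most-frequent value
    )


base_summary = summarize(base)
outlier_summary = summarize(with_outlier)

print("without outlier:", base_summary)
print("with outlier:   ", outlier_summary)
```

In this sketch the single extreme value pulls the mean far above every other observation, while the median shifts by half a unit and the modes do not change.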

6 Quartiles
Quartiles split the ranked data into 4 segments with an equal number of values per segment: 25% | 25% | 25% | 25%, separated by Q1 (25th percentile), Q2 (50th percentile), and Q3 (75th percentile).
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
Q2 is the same as the median (50% are smaller, 50% are larger).
Only 25% of the observations are greater than the third quartile, Q3.
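A minimal sketch of computing quartiles, assuming Python's standard `statistics.quantiles` with `method="exclusive"` (the (n + 1)·p positioning rule). The slides do not say which quartile convention they use, and other conventions can give slightly different values; the data are hypothetical.

```python
import statistics

# Hypothetical ranked data with 11 values.
data = list(range(1, 12))  # 1, 2, ..., 11

# n=4 asks for the three cut points that split the data into quarters.
# method="exclusive" is an assumption; the slides do not name a convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
print("Q2 equals the median:", q2 == statistics.median(data))
```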

7 Class Exercise: Tendency & Histograms
Given: mode 5, median 4.9, Q1 4.4, Q2 4.9, Q3 5.1, Q4 6, mean
Place the mean, median, and mode on the histogram. What do you see?

8 Example
Median home prices are usually reported for a region, because the median is less sensitive to outliers.
Example: Five houses on a hill by the beach
House Prices: $2,000,000, $500,000, $300,000, $100,000, $100,000
Sum: $3,000,000
Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
Think about this: Which average best helps you decide what to offer for a house? How about setting a selling price? What other considerations might there be?
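The three averages for the five houses can be checked with a short sketch. It assumes the five prices are $2,000,000, $500,000, $300,000, $100,000, and $100,000, which are the only values consistent with the stated sum, median, and mode.

```python
import statistics

# Assumed prices: the only five values consistent with the stated
# sum ($3,000,000), median ($300,000), and mode ($100,000).
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

total = sum(prices)
mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
mode_price = statistics.mode(prices)

print(f"sum    = ${total:,}")
print(f"mean   = ${mean_price:,}")
print(f"median = ${median_price:,}")
print(f"mode   = ${mode_price:,}")
```

The one $2,000,000 house accounts for two thirds of the total, which is why the mean sits above four of the five prices.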

9 What’s The Difference? Mean $600,000

10 Measures of Variation
Measures of variation give information on the spread or variability of the data values.
Variation: Range, Interquartile Range, Standard Deviation, Coefficient of Variation
Figure: two distributions with the same center but different variation (one more variable, one less variable).

11 Range and Interquartile Range
Range = Xmaximum – Xminimum
Simplest measure of variation. Sensitive to outliers.
Example: Range = 70 – 12 = 58
Interquartile Range = Q3 – Q1
Measures the spread of the middle 50% of the data. Eliminates the outlier problem.
Example: Interquartile range = 57 – 30 = 27
Figure: box-and-whisker plot marking Xminimum, Q1, Median (Q2), Q3, and Xmaximum, with 25% of the values in each segment.

12 Disadvantages of the Range
Range ignores the way in which data are distributed:
Range = 7, IQR = 11 – 8 = 3
Range = 7, IQR = 11 – 10 = 1
Range is also sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,5: Range = 5 – 1 = 4, IQR = 2 – 1 = 1
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,120: Range = 120 – 1 = 119, IQR = 2 – 1 = 1
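The outlier comparison can be reproduced with the two 24-value data sets listed on the slide. This is a sketch that assumes the (n + 1)·p quartile rule (`method="exclusive"` in Python's `statistics.quantiles`); the helper names `value_range` and `iqr` are illustrative.

```python
import statistics


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range = Q3 - Q1 (exclusive quartile rule, assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


# The two data sets from the slide: identical except for the last value.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [120]

for name, values in [("last value 5", without_outlier),
                     ("last value 120", with_outlier)]:
    print(name, "range =", value_range(values), "IQR =", iqr(values))
```

Changing one value from 5 to 120 changes the range a great deal, while the IQR is unchanged because it depends only on the middle half of the ranked data.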

13 Standard Deviation
Most commonly used measure of variation
Each value in the data set is used in the calculation
Shows variation from the mean
Values far from the mean are given extra weight (because deviations from the mean are squared)
Has the same units as the original data
Sample standard deviation: S = sqrt( Σ (Xi – X̄)² / (n – 1) )
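A sketch of the sample standard deviation computed step by step: square each deviation from the mean, sum, divide by n – 1, and take the square root. The data set is hypothetical, and the result is compared against `statistics.stdev` from Python's standard library, which uses the same n – 1 divisor.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    # Squaring gives extra weight to values far from the mean.
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1))


# Hypothetical data: mean is 16, sum of squared deviations is 130.
data = [10, 12, 14, 15, 17, 18, 18, 24]
s = sample_std(data)

print("manual S         =", round(s, 4))
print("statistics.stdev =", round(statistics.stdev(data), 4))
```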

14 Coefficient of Variation
Measures relative variation
Always expressed as a percentage (%)
Shows variation relative to the mean: CV = (S / X̄) · 100%
Can be used to compare two or more sets of data measured in different units

15 Comparing Variation: Standard Deviations
Data A: Mean = 15.5, SD = 3.338
Data B: Mean = 15.5, SD = 0.926
Data C: Mean = 15.5, SD = 4.567

16 Comparing Variation: Coefficient of Variation
Stock A: Average price last year = $50, Standard deviation = $5, CV = ($5/$50) · 100% = 10%
Stock B: Average price last year = $100, Standard deviation = $5, CV = ($5/$100) · 100% = 5%
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
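The stock comparison can be worked through with a small sketch. It takes the coefficient of variation to be the standard deviation divided by the mean, expressed as a percentage, and uses $5 as the standard deviation of both stocks, since the slide says the two are the same. The function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev * 100 / mean


# Both stocks have a standard deviation of $5 (stated as equal on the slide).
cv_a = coefficient_of_variation(5, 50)    # Stock A: mean price $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: mean price $100

print(f"Stock A CV = {cv_a}%")
print(f"Stock B CV = {cv_b}%")
```

Because the units cancel, the same function could compare, say, prices in dollars with weights in kilograms.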

17 Shape of a Distribution
Describes how data are distributed
Measures of shape: symmetric or skewed
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean

18 Normal or Bell-shaped Curve
Mean and standard deviation define what a normal curve looks like.
Example: Mean: μ = 0, Standard Deviation: σ = 15, IQR = 20
Exactly normal: Skewness = 0, Kurtosis = 0
Normal (OK): -1 < SK < +1 and -1 < K < +1
Right skewed: SK > 0. Left skewed: SK < 0.
Peaked: K > 0. Flat: K < 0.
Figure labels: Mean, Mode, Median (at the center of the curve); Box-and-Whisker; Approx. 50%, Approx. 68%, Approx. 95%, Almost 100%
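The signs of skewness (SK) and kurtosis (K) can be illustrated with a sketch based on central moments. This is one common definition, assumed here for illustration: skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2² – 3, so that a normal curve scores 0 on both. Spreadsheet and statistics packages often apply sample-size adjustments, so their numbers can differ from these. All data are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis


# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]       # long tail to the right
left_skewed = [-10, -2, -1, -1, -1]   # long tail to the left

for name, values in [("symmetric", symmetric),
                     ("right skewed", right_skewed),
                     ("left skewed", left_skewed)]:
    sk, k = skewness_and_kurtosis(values)
    print(f"{name}: SK = {sk:.3f}, K = {k:.3f}")
```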

19 The Empirical Rule
If the data distribution is approximately bell-shaped, then the interval μ ± 1σ contains about 68% of the values in the population or the sample.

20 The Empirical Rule
The interval μ ± 2σ contains about 95% of the values in the population or the sample.
The interval μ ± 3σ contains about 99.7% of the values in the population or the sample.
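The 68%, 95%, and 99.7% figures can be checked against an exact normal curve. This sketch assumes Python 3.8 or later for `statistics.NormalDist`, and borrows μ = 0 and σ = 15 from the earlier normal-curve example; the percentages are the same for any μ and σ.

```python
from statistics import NormalDist

mu, sigma = 0, 15
curve = NormalDist(mu, sigma)


def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return curve.cdf(mu + k * sigma) - curve.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Real data that are only approximately bell-shaped will match these shares only approximately.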

21 Exploratory Data Analysis
5-number summary: Minimum -- Q1 -- Median -- Q3 -- Maximum, with 25% of the values in each of the four segments.
Box-and-Whisker Plot: a graphical display of the 5-number summary. It shows the central tendency, variation, and shape of the numerical variable.
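A minimal sketch of a 5-number summary, the values a box-and-whisker plot is drawn from. It assumes the (n + 1)·p quartile rule (`method="exclusive"` in `statistics.quantiles`), which the slides do not confirm; the data and the function name are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical data: the integers 1 through 11.
summary = five_number_summary(list(range(1, 12)))
print(summary)
```

In a box-and-whisker plot the box spans Q1 to Q3 with a line at the median, and the whiskers reach to the minimum and maximum.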

22 Shape and B-n-W Plot
Figure: box-and-whisker plots for Left-Skewed, Symmetric, and Right-Skewed distributions, each marked with Q1, Q2, Q3.

23 Shape and B-n-W Plot Cont’d
Left-Skewed Symmetric Right-Skewed Peaked Flat

