4B-1. Descriptive Statistics (Part 2): Standardized Data, Percentiles and Quartiles, Box Plots.

Presentation transcript:

4B-1

Descriptive Statistics (Part 2): Standardized Data, Percentiles and Quartiles, Box Plots. Chapter 4B. McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, Inc. All rights reserved.

4B-3 Standardized Data: Chebyshev's Theorem. For any population with mean μ and standard deviation σ, the percentage of observations that lie within k standard deviations of the mean must be at least 100[1 – 1/k²] (for k > 1). Developed by mathematicians Jules Bienaymé and Pafnuty Chebyshev.

4B-4 Standardized Data: Chebyshev's Theorem. For k = 2 standard deviations, 100[1 – 1/2²] = 75%, so at least 75.0% will lie within μ ± 2σ. For k = 3 standard deviations, 100[1 – 1/3²] = 88.9%, so at least 88.9% will lie within μ ± 3σ. Although applicable to any data set, these limits tend to be too wide to be useful.
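
The two bounds above can be checked with a few lines of Python. This is only a sketch: the function name chebyshev_lower_bound is made up for illustration and does not come from the slides.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the formula gives 0% or less, which says nothing useful.
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1f}% of observations")
```

The bound holds for any distribution, which is why it is so much looser than the normal-distribution percentages.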

4B-5 Standardized Data: The Empirical Rule. The normal or Gaussian distribution was named for Karl Gauss. The normal distribution is symmetric and is also known as the bell-shaped curve. The Empirical Rule states that for data from a normal distribution, we expect that for
k = 1 about 68.26% will lie within μ ± 1σ
k = 2 about 95.44% will lie within μ ± 2σ
k = 3 about 99.73% will lie within μ ± 3σ
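
As a rough check, the empirical-rule percentages can be reproduced from the standard normal distribution with Python's standard library. The helper name share_within is hypothetical. The computed values round to 68.27%, 95.45% and 99.73%, so the slide's 68.26% and 95.44% appear to differ only in the last digit, presumably from table rounding.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Proportion of a normal population lying within k standard deviations of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


for k in (1, 2, 3):
    print(f"k = {k}: about {100 * share_within(k):.2f}%")
```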

4B-6 Standardized Data: The Empirical Rule. Distance from the mean is measured in terms of the number of standard deviations. Data values outside μ ± 3σ are rare. Note: no upper bound is given.

4B-7 Standardized Data: Example: Exam Scores. If 80 students take an exam, how many will score within 2 standard deviations of the mean? Assuming exam scores follow a normal distribution, the empirical rule states that about 95.44% will lie within μ ± 2σ, so 95.44% × 80 ≈ 76 students will score within ±2σ of μ. How many students will score more than 2 standard deviations from the mean?
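
The arithmetic for this example, including the follow-up question, can be sketched as follows. The variable names are illustrative, and the 95.44% figure is the one quoted on the slide.

```python
n_students = 80
share_within_2sd = 0.9544  # empirical-rule figure quoted on the slide

expected_within = n_students * share_within_2sd   # 76.352
expected_beyond = n_students - expected_within    # 3.648

print(f"within 2 standard deviations: about {expected_within:.0f} students")
print(f"beyond 2 standard deviations: about {expected_beyond:.0f} students")
```

Under the normality assumption, roughly 4 of the 80 students would be expected to score more than 2 standard deviations from the mean.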

4B-8 Standardized Data: Unusual Observations. Unusual observations are those that lie beyond μ ± 2σ. Outliers are observations that lie beyond μ ± 3σ.

4B-9 Standardized Data: Unusual Observations. For example, the P/E ratio data contain several large data values. Are they unusual or outliers?

4B-10 Standardized Data: The Empirical Rule. If the sample came from a normal distribution, then the Empirical Rule states
x̄ ± 1s = 22.72 ± 1(14.08) = (8.6, 36.8)
x̄ ± 2s = 22.72 ± 2(14.08) = (−5.4, 50.9)
x̄ ± 3s = 22.72 ± 3(14.08) = (−19.5, 65.0)

4B-11 Standardized Data: The Empirical Rule. [Figure: P/E ratio data with regions labeled Unusual and Outliers.] Are there any unusual values or outliers?

4B-12 Standardized Data: Defining a Standardized Variable. A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean.
Standardization formula for a population: zᵢ = (xᵢ – μ)/σ
Standardization formula for a sample: zᵢ = (xᵢ – x̄)/s

4B-13 Standardized Data: Defining a Standardized Variable. zᵢ tells how far away the observation is from the mean. For example, for the P/E data, the first value x₁ = 7. The associated z value is z₁ = (x₁ – x̄)/s = (7 – 22.72)/14.08 = −1.12.

4B-14 Standardized Data: Defining a Standardized Variable. A negative z value means the observation is below the mean. Positive z means the observation is above the mean. For x₆₈ = 91, z₆₈ = (91 – 22.72)/14.08 = 4.85.
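
A minimal sketch of standardization for the two P/E values discussed above. The standard deviation 14.08 and the result z = 4.85 come from the slides. The sample mean of 22.72 is an assumption chosen to be consistent with those figures. The function names and the labels "typical", "unusual" and "outlier" are illustrative and follow the 2σ and 3σ cutoffs given earlier.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def classify(z: float) -> str:
    """Label an observation using the 2-sigma and 3-sigma cutoffs."""
    if abs(z) > 3:
        return "outlier"
    if abs(z) > 2:
        return "unusual"
    return "typical"


SAMPLE_MEAN = 22.72  # assumed; consistent with z = 4.85 for x = 91
SAMPLE_SD = 14.08    # from the slides

for x in (7, 91):
    z = z_score(x, SAMPLE_MEAN, SAMPLE_SD)
    print(f"x = {x}: z = {z:.2f} ({classify(z)})")
```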

4B-15 Standardized Data: Defining a Standardized Variable. Here are the standardized z values for the P/E data: [table of z values not captured in the transcript]. What do you conclude for these four values?

4B-16 Standardized Data: Defining a Standardized Variable. In Excel, use =STANDARDIZE(x, Mean, StDev) to calculate a standardized z value for an observation x. MegaStat calculates standardized values as well as checks for outliers.

4B-17 Standardized Data: Outliers. What do we do with outliers in a data set? If due to erroneous data, then discard. An outrageous observation (one completely outside of an expected range) is certainly invalid. Recognize unusual data points and outliers and their potential impact on your study. Research books and articles on how to handle outliers.

4B-18 Standardized Data: Estimating Sigma. For a normal distribution, nearly all values span a range of about 6σ (from μ – 3σ to μ + 3σ). If you know the range R (high – low), you can estimate the standard deviation as σ = R/6. Useful for approximating the standard deviation when only R is known. This estimate depends on the assumption of normality.
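
A quick sketch of the R/6 estimate. The function name is hypothetical, and the inputs 7 and 91 are the first and last sorted P/E values quoted in this deck, assumed here to be the sample minimum and maximum.

```python
def estimate_sigma_from_range(low: float, high: float) -> float:
    """Rough standard deviation estimate, sigma = R / 6, assuming roughly normal data."""
    return (high - low) / 6


estimated_sigma = estimate_sigma_from_range(7, 91)
print(f"estimated sigma = {estimated_sigma:.2f}")
```

For these inputs the estimate is 14.0, close to the sample standard deviation of 14.08 used elsewhere in the deck. That agreement should not be expected in general, especially for skewed data.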

4B-19 Percentiles and Quartiles: Percentiles. Percentiles divide the sorted data into 100 groups. For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you. Deciles divide the data into 10 groups. Quintiles divide the data into 5 groups. Quartiles divide the data into 4 groups.

4B-20 Percentiles and Quartiles: Percentiles. Percentiles are used to establish benchmarks for comparison purposes (e.g., health care, manufacturing and banking industries use 5, 25, 50, 75 and 90 percentiles). Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios. Percentiles are used in employee merit evaluation and salary benchmarking.

4B-21 Percentiles and Quartiles: Quartiles. Quartiles are scale points that divide the sorted data into four groups of approximately equal size. The three values that separate the four groups are called Q₁, Q₂, and Q₃, respectively.
Lower 25% | Q₁ | Second 25% | Q₂ | Third 25% | Q₃ | Upper 25%

4B-22 Percentiles and Quartiles: Quartiles. The second quartile Q₂ is the median, an important indicator of central tendency.
Lower 50% | Q₂ | Upper 50%
Q₁ and Q₃ measure dispersion since the interquartile range Q₃ – Q₁ measures the degree of spread in the middle 50 percent of data values.
Lower 25% | Q₁ | Middle 50% | Q₃ | Upper 25%

4B-23 Percentiles and Quartiles: Quartiles. The first quartile Q₁ is the median of the data values below Q₂, and the third quartile Q₃ is the median of the data values above Q₂.
Lower 25% | Q₁ | Second 25% | Q₂ | Third 25% | Q₃ | Upper 25%
For the first half of the data, 50% lie above and 50% below Q₁. For the second half of the data, 50% lie above and 50% below Q₃.

4B-24 Percentiles and Quartiles: Quartiles. Depending on n, the quartiles Q₁, Q₂, and Q₃ may be members of the data set or may lie between two of the sorted data values.

4B-25 Percentiles and Quartiles: Method of Medians. For small data sets, find quartiles using the method of medians:
Step 1. Sort the observations.
Step 2. Find the median Q₂.
Step 3. Find the median of the data values that lie below Q₂.
Step 4. Find the median of the data values that lie above Q₂.
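
The four steps above can be sketched in Python as follows. The function name is made up for illustration. One assumption is made explicit: "below" and "above" Q₂ are read positionally, so for an odd number of observations the middle value is left out of both halves. The example inputs are data sets A and B from the caution slide in this deck.

```python
from statistics import median


def quartiles_by_medians(data):
    """Return (Q1, Q2, Q3) using the method of medians (needs at least 2 values)."""
    xs = sorted(data)                      # Step 1: sort the observations
    n = len(xs)
    half = n // 2
    lower = xs[:half]                      # values positioned below the median
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]  # values positioned above it
    q2 = median(xs)                        # Step 2: median of all values
    q1 = median(lower)                     # Step 3: median of the lower half
    q3 = median(upper)                     # Step 4: median of the upper half
    return q1, q2, q3


data_set_a = [1, 2, 4, 4, 8, 8, 8, 8]
data_set_b = [0, 3, 3, 6, 6, 6, 10, 15]
print(quartiles_by_medians(data_set_a))
print(quartiles_by_medians(data_set_b))
```

Both data sets give Q₁ = 3, Q₂ = 6 and Q₃ = 8, matching the values on the caution slide.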

4B-26 Percentiles and Quartiles: Excel Quartiles. Use Excel function =QUARTILE(Array, k) to return the kth quartile. Excel treats quartiles as a special case of percentiles. For example, to calculate Q₃, use =QUARTILE(Array, 3) or =PERCENTILE(Array, 0.75). Excel calculates the quartile positions as:
Position of Q₁ = 0.25n + 0.75
Position of Q₂ = 0.50n + 0.50
Position of Q₃ = 0.75n + 0.25

4B-27 Percentiles and Quartiles: Example: P/E Ratios and Quartiles. Consider the P/E ratios for 68 stocks in a portfolio. Use quartiles to define benchmarks for stocks that are low-priced (bottom quartile) or high-priced (top quartile).

4B-28 Percentiles and Quartiles: Example: P/E Ratios and Quartiles. Using Excel's method of interpolation, the quartile positions are:
Q₁: position = 0.25(68) + 0.75 = 17.75; interpolate between X₁₇ and X₁₈
Q₂: position = 0.50(68) + 0.50 = 34.50; interpolate between X₃₄ and X₃₅
Q₃: position = 0.75(68) + 0.25 = 51.25; interpolate between X₅₁ and X₅₂

4B-29 Percentiles and Quartiles: Example: P/E Ratios and Quartiles. The quartiles are:
First (Q₁): Q₁ = X₁₇ + 0.75(X₁₈ – X₁₇) = 14 + 0.75(14 – 14) = 14
Second (Q₂): Q₂ = X₃₄ + 0.50(X₃₅ – X₃₄) = 19 + 0.50(19 – 19) = 19
Third (Q₃): Q₃ = X₅₁ + 0.25(X₅₂ – X₅₁) = 26 + 0.25(26 – 26) = 26
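
The interpolation can be sketched in Python. This assumes the position rule p·n + (1 – p) for the quartile at proportion p, which gives 17.75, 34.50 and 51.25 for n = 68. The function names are hypothetical, and the code illustrates the calculation rather than reproducing Excel's implementation. Because the 68 P/E values are not in the transcript, the demonstration uses data set A from the caution slide.

```python
def quartile_position(n: int, q: int) -> float:
    """1-based position of quartile q (1, 2 or 3) in n sorted values."""
    p = q / 4
    return p * n + (1 - p)


def interpolated_quartile(sorted_values, q: int) -> float:
    """Linear interpolation between the two sorted values that bracket the position."""
    n = len(sorted_values)
    position = quartile_position(n, q)
    lower_index = int(position)                     # 1-based index of the lower neighbor
    fraction = position - lower_index               # how far to move toward the upper neighbor
    lower = sorted_values[lower_index - 1]
    upper = sorted_values[min(lower_index, n - 1)]  # guard for the single-value case
    return lower + fraction * (upper - lower)


print([quartile_position(68, q) for q in (1, 2, 3)])

data_set_a = sorted([1, 2, 4, 4, 8, 8, 8, 8])
print([interpolated_quartile(data_set_a, q) for q in (1, 2, 3)])
```

For data set A this gives Q₁ = 3.5, whereas the method of medians gives Q₁ = 3, a small example of how the two techniques can differ slightly.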

4B-30 Percentiles and Quartiles: Example: P/E Ratios and Quartiles. So, to summarize:
Lower 25% of P/E Ratios | Q₁ = 14 | Second 25% of P/E Ratios | Q₂ = 19 | Third 25% of P/E Ratios | Q₃ = 26 | Upper 25% of P/E Ratios
These quartiles express central tendency and dispersion. What is the interquartile range? Because of clustering of identical data values, these quartiles do not provide clean cut points between groups of observations.

4B-31 Percentiles and Quartiles: Tip. Whether you use the method of medians or Excel, your quartiles will be about the same. Small differences in calculation techniques typically do not lead to different conclusions in business applications.

4B-32 Percentiles and Quartiles: Caution. Quartiles generally resist outliers. However, quartiles do not provide clean cut points in the sorted data, especially in small samples with repeating data values.
Data set A: 1, 2, 4, 4, 8, 8, 8, 8 (Q₁ = 3, Q₂ = 6, Q₃ = 8)
Data set B: 0, 3, 3, 6, 6, 6, 10, 15 (Q₁ = 3, Q₂ = 6, Q₃ = 8)
Although they have identical quartiles, these two data sets are not similar. The quartiles do not represent either data set well.

4B-33 Percentiles and Quartiles: Dispersion Using Quartiles. Some robust measures of central tendency and dispersion using quartiles are:
Statistic: Midhinge
Formula: (Q₁ + Q₃)/2
Excel: =0.5*(QUARTILE(Data,1)+QUARTILE(Data,3))
Pro: Robust to presence of extreme data values.
Con: Less familiar to most people.

4B-34 Percentiles and Quartiles: Dispersion Using Quartiles.
Statistic: Midspread
Formula: Q₃ – Q₁
Excel: =QUARTILE(Data,3)-QUARTILE(Data,1)
Pro: Stable when extreme data values exist.
Con: Ignores magnitude of extreme data values.
Statistic: Coefficient of quartile variation (CQV)
Formula: 100 × (Q₃ – Q₁)/(Q₃ + Q₁)
Excel: None
Pro: Relative variation in percent so we can compare data sets.
Con: Less familiar to non-statisticians.

4B-35 Percentiles and Quartiles: Midhinge. The mean of the first and third quartiles. For the 68 P/E ratios, Midhinge = (Q₁ + Q₃)/2 = (14 + 26)/2 = 20. A robust measure of central tendency since quartiles ignore extreme values.

4B-36 Percentiles and Quartiles: Midspread (Interquartile Range). A robust measure of dispersion. For the 68 P/E ratios, Midspread = Q₃ – Q₁ = 26 – 14 = 12.

4B-37 Percentiles and Quartiles: Coefficient of Quartile Variation (CQV). Measures relative dispersion by expressing the midspread as a percent of the quartile sum Q₃ + Q₁ (twice the midhinge). For the 68 P/E ratios, CQV = 100 × (Q₃ – Q₁)/(Q₃ + Q₁) = 100 × (26 – 14)/(26 + 14) = 30.0%. Similar to the CV, CQV can be used to compare data sets measured in different units or with different means.
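
The three quartile-based statistics can be computed together, as in the sketch below. The function names are illustrative. The CQV formula 100 × (Q₃ – Q₁)/(Q₃ + Q₁) is the common textbook definition and is an assumption here, since the slide's own formula did not survive in the transcript. The inputs Q₁ = 14 and Q₃ = 26 are the P/E quartiles from the example.

```python
def midhinge(q1: float, q3: float) -> float:
    """Mean of the first and third quartiles (robust center)."""
    return (q1 + q3) / 2


def midspread(q1: float, q3: float) -> float:
    """Interquartile range Q3 - Q1 (robust spread)."""
    return q3 - q1


def coefficient_of_quartile_variation(q1: float, q3: float) -> float:
    """Relative spread in percent; assumes the definition 100 * (Q3 - Q1) / (Q3 + Q1)."""
    return 100 * (q3 - q1) / (q3 + q1)


q1, q3 = 14, 26
print(f"midhinge  = {midhinge(q1, q3)}")
print(f"midspread = {midspread(q1, q3)}")
print(f"CQV       = {coefficient_of_quartile_variation(q1, q3):.1f}%")
```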

4B-38 Box Plots. A useful tool of exploratory data analysis (EDA). Also called a box-and-whisker plot. Based on a five-number summary: Xmin, Q₁, Q₂, Q₃, Xmax. Consider the five-number summary for the 68 P/E ratios: Xmin = 7, Q₁ = 14, Q₂ = 19, Q₃ = 26, Xmax = 91.

4B-39 Box Plots. [Figure: box plot labeled with Minimum, Q₁, Median (Q₂), Q₃, Maximum, Box, and Whiskers. Annotations: Right-skewed; Center of Box is Midhinge.]

4B-40 Box Plots: Fences and Unusual Data Values. Use quartiles to detect unusual data points. The cutoff points are called fences and can be found using the following formulas:
Lower fence: inner Q₁ – 1.5(Q₃ – Q₁); outer Q₁ – 3.0(Q₃ – Q₁)
Upper fence: inner Q₃ + 1.5(Q₃ – Q₁); outer Q₃ + 3.0(Q₃ – Q₁)
Values outside the inner fences are unusual while those outside the outer fences are outliers.

4B-41 Box Plots: Fences and Unusual Data Values. For example, consider the P/E ratio data:
Lower fence: inner 14 – 1.5(26 – 14) = −4; outer 14 – 3.0(26 – 14) = −22
Upper fence: inner 26 + 1.5(26 – 14) = +44; outer 26 + 3.0(26 – 14) = +62
Ignore the lower fence since it is negative and P/E ratios are only positive.
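
The fence calculation can be sketched as below, using the P/E quartiles Q₁ = 14 and Q₃ = 26. The function names and the labels are illustrative. Of the test values, only 91 is a P/E value quoted in this deck; 20 and 50 are made-up inputs.

```python
def fences(q1: float, q3: float):
    """Return ((inner_low, inner_high), (outer_low, outer_high)) from the quartiles."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return inner, outer


def classify_by_fences(x: float, q1: float, q3: float) -> str:
    """Outside the outer fences: outlier; outside only the inner fences: unusual."""
    (inner_low, inner_high), (outer_low, outer_high) = fences(q1, q3)
    if x < outer_low or x > outer_high:
        return "outlier"
    if x < inner_low or x > inner_high:
        return "unusual"
    return "typical"


print(fences(14, 26))
for x in (20, 50, 91):  # 20 and 50 are hypothetical; 91 is the largest P/E value quoted
    print(x, classify_by_fences(x, 14, 26))
```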

4B-42 Box Plots: Fences and Unusual Data Values. Truncate the whisker at the fences and display unusual values and outliers as dots. [Figure: box plot with the Inner Fence and Outer Fence marked; points beyond the inner fence are labeled Unusual and points beyond the outer fence are labeled Outliers.] Based on these fences, there are three unusual P/E values and two outliers.

4B-43 Grouped Data: Nature of Grouped Data. Although some information is lost, grouped data are easier to display than raw data. When bin limits are given, the mean and standard deviation can be estimated. Accuracy of grouped estimates depends on:
- the number of bins
- distribution of data within bins
- bin frequencies