2.5: Numerical Measures of Variability (Spread)

The mean, median and mode give us an idea of the central tendency, or where the “middle” of the data is. Variability gives us an idea of how spread out the data are around that middle.
McClave, Statistics, 11th ed. Chapter 2: Methods for Describing Sets of Data

2.5: Numerical Measures of Variability
The measures of variability covered here are the range, the variance, the standard deviation (SD), and the interquartile range (IQR).

Range
The range is equal to the largest value minus the smallest value in the data set. It is easy to compute but not very informative, because it considers only two observations (the smallest and the largest).

Range (example)
Data (number of vacation days in a company): 22, 17, 15, 16, 14, 20, 25, 11, 26, 14, 23, 21, 13, 15, 15, 28, 30, 20, 14, 33, 27, 28, 15, 22, 16, 12, 25, 31, 19, 23, 26, 21, 11, 18, 17
Range = largest - smallest = 33 - 11 = 22
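As a small sketch of the range calculation above, the snippet below recomputes it for the vacation-day data. The helper name `data_range` is an illustrative choice, not something from the slides.

```python
# Vacation-day data copied from the slide.
vacation_days = [22, 17, 15, 16, 14, 20, 25, 11, 26, 14, 23, 21,
                 13, 15, 15, 28, 30, 20, 14, 33, 27, 28, 15, 22,
                 16, 12, 25, 31, 19, 23, 26, 21, 11, 18, 17]


def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


print(data_range(vacation_days))  # 33 - 11 = 22
```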

Sample Variance
The sample variance, $s^2$, for a sample of $n$ measurements is equal to the sum of the squared distances from the mean, divided by $(n - 1)$: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.

Sample Standard Deviation (SD)
The sample standard deviation, $s$, for a sample of $n$ measurements is equal to the square root of the sample variance: $s = \sqrt{s^2}$.

Example 1
Say a small data set consists of the measurements 1, 3, 5, and 3. With only a few numbers, the definition formula and the calculation formula take about the same time, but for data with many numbers the calculation formula is easier to use.
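The worked arithmetic for Example 1 did not survive in this transcript, so the following is a sketch rather than the slide's own working. It computes the sample variance both from the definition and from what is assumed here to be the "calculation formula", namely the usual shortcut $s^2 = \left(\sum x_i^2 - (\sum x_i)^2 / n\right) / (n - 1)$. The variable names are illustrative.

```python
import math

data = [1, 3, 5, 3]
n = len(data)
mean = sum(data) / n  # (1 + 3 + 5 + 3) / 4 = 3.0

# Definition formula: sum of squared distances from the mean, over n - 1.
var_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut formula (assumed to be the slide's "calculation formula").
var_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

sd = math.sqrt(var_definition)

print(var_definition, var_shortcut, sd)
```

Both formulas give $8/3 \approx 2.67$ for the variance, so the standard deviation is about 1.63.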

Example 2
Data (number of vacation days in a company): 22, 17, 15, 16, 14, 20, 25, 11, 26, 14, 23, 21, 13, 15, 15, 28, 30, 20, 14, 33, 27, 28, 15, 22, 16, 12, 25, 31, 19, 23, 26, 21, 11, 18, 17
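The numerical answer for Example 2 is not in the transcript. One way to obtain it, assuming Python's standard `statistics` module is acceptable, is sketched below. `statistics.variance` and `statistics.stdev` use the $n - 1$ divisor, which matches the sample definitions above.

```python
import statistics

vacation_days = [22, 17, 15, 16, 14, 20, 25, 11, 26, 14, 23, 21,
                 13, 15, 15, 28, 30, 20, 14, 33, 27, 28, 15, 22,
                 16, 12, 25, 31, 19, 23, 26, 21, 11, 18, 17]

mean = statistics.mean(vacation_days)    # sample mean
s2 = statistics.variance(vacation_days)  # sample variance (n - 1 divisor)
s = statistics.stdev(vacation_days)      # sample standard deviation

print(f"mean = {mean:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
```

For these 35 values the mean is about 20.09, the sample variance about 36.90, and the sample standard deviation about 6.07.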

2.5: Numerical Measures of Variability
As before, Greek letters are used for populations and Roman letters for samples: $s^2$ = sample variance, $s$ = sample standard deviation, $\sigma^2$ = population variance, $\sigma$ = population standard deviation.

2.6: Interpreting the Standard Deviation
Chebyshev’s Rule and the Empirical Rule both tell us something about where the data will be relative to the mean.

Chebyshev’s Rule
Valid for any data set. For any number $k > 1$, at least $(1 - 1/k^2) \times 100\%$ of the observations will lie within $k$ standard deviations of the mean.

k    k^2    1/k^2     (1 - 1/k^2) × 100%
2    4      0.25      75%
3    9      0.11      89%
4    16     0.0625    93.75%
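The bound in Chebyshev’s Rule is simple enough to tabulate directly. The sketch below reproduces the rows of the table; the function name `chebyshev_bound` is an illustrative choice.

```python
def chebyshev_bound(k):
    """Minimum percentage of observations within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule requires k > 1")
    return (1 - 1 / k**2) * 100


# Print k, k^2, 1/k^2 and the bound for the values used on the slide.
for k in (2, 3, 4):
    print(k, k**2, round(1 / k**2, 4), f"{chebyshev_bound(k):.2f}%")
```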

Chebyshev’s Rule (example)
Data (number of vacation days in a company): 22, 17, 15, 16, 14, 20, 25, 11, 26, 14, 23, 21, 13, 15, 15, 28, 30, 20, 14, 33, 27, 28, 15, 22, 16, 12, 25, 31, 19, 23, 26, 21, 11, 18, 17

Empirical Rule
Useful for bell-shaped, symmetrical distributions. For a perfectly symmetrical and bell-shaped distribution, ~68% of the measurements will be within the range $\bar{x} \pm s$, ~95% will be within the range $\bar{x} \pm 2s$, and ~99.7% will be within the range $\bar{x} \pm 3s$. If the distribution is not perfectly bell-shaped and symmetrical, the values are approximations.

Empirical Rule (example)
Data (number of vacation days in a company): 22, 17, 15, 16, 14, 20, 25, 11, 26, 14, 23, 21, 13, 15, 15, 28, 30, 20, 14, 33, 27, 28, 15, 22, 16, 12, 25, 31, 19, 23, 26, 21, 11, 18, 17
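The slides that worked through these two examples are missing from the transcript. As a hedged reconstruction, the sketch below counts what percentage of the vacation-day data actually falls within 1, 2, and 3 sample standard deviations of the mean. It then prints those percentages next to the Chebyshev lower bound and the Empirical Rule approximation. The helper `percent_within` is an illustrative name.

```python
import statistics

vacation_days = [22, 17, 15, 16, 14, 20, 25, 11, 26, 14, 23, 21,
                 13, 15, 15, 28, 30, 20, 14, 33, 27, 28, 15, 22,
                 16, 12, 25, 31, 19, 23, 26, 21, 11, 18, 17]

mean = statistics.mean(vacation_days)
s = statistics.stdev(vacation_days)


def percent_within(values, center, spread, k):
    """Percentage of values inside [center - k*spread, center + k*spread]."""
    lo, hi = center - k * spread, center + k * spread
    inside = sum(1 for x in values if lo <= x <= hi)
    return 100 * inside / len(values)


empirical = {1: 68.0, 2: 95.0, 3: 99.7}
for k in (1, 2, 3):
    observed = percent_within(vacation_days, mean, s, k)
    # Chebyshev's Rule says nothing useful for k = 1.
    chebyshev = f"{(1 - 1 / k**2) * 100:.1f}%" if k > 1 else "n/a"
    print(f"k={k}: observed {observed:.1f}%, "
          f"Chebyshev at least {chebyshev}, empirical ~{empirical[k]}%")
```

The observed percentages always have to meet the Chebyshev bound. They only need to be close to the Empirical Rule values if the data are roughly bell-shaped.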

Empirical Rule (Example 3)
Hummingbirds beat their wings in flight an average of 55 times per second. Assume the standard deviation is 10, and that the distribution is symmetrical and bell-shaped. Approximately what percentage of hummingbirds beat their wings between 45 and 65 times per second? Between 55 and 65? Less than 45?

Empirical Rule (Example 3): between 45 and 65 beats per second
Since 45 and 65 are exactly one standard deviation below and above the mean, the empirical rule says that about 68% of the hummingbirds will be in this range.

Empirical Rule (Example 3): between 55 and 65 beats per second
This range of numbers is from the mean to one standard deviation above it, or one-half of the range from 45 to 65. So, about one-half of 68%, or 34%, of the hummingbirds will be in this range.

Empirical Rule (Example 3): fewer than 45 beats per second
Half of the entire data set lies below the mean, and by symmetry ~34% lie between 45 and 55 (between one standard deviation below the mean and the mean), which means ~50% - 34% = 16% are below 45.
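The three answers in Example 3 can be cross-checked against an exact normal curve. Note that this assumes the wingbeat rates are exactly normally distributed, which is stronger than the slide's "symmetrical and bell-shaped". `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist

# Assumption: an exact normal model with the slide's mean and SD.
wingbeats = NormalDist(mu=55, sigma=10)

between_45_65 = wingbeats.cdf(65) - wingbeats.cdf(45)
between_55_65 = wingbeats.cdf(65) - wingbeats.cdf(55)
below_45 = wingbeats.cdf(45)

print(f"45 to 65: {between_45_65:.1%}")
print(f"55 to 65: {between_55_65:.1%}")
print(f"below 45: {below_45:.1%}")
```

The normal-curve values (about 68.3%, 34.1%, and 15.9%) agree with the rounded Empirical Rule answers of 68%, 34%, and 16%.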

2.7: Numerical Measures of Relative Standing: Percentiles
For any (large) set of n measurements (arranged in ascending order), the (100 × p)th percentile is a number such that at least 100p% of the measurements fall at or below that number and at least 100(1 - p)% fall at or above it.

Percentiles
Finding percentiles is similar to finding the median: the median is the 50th percentile. If you are in the 50th percentile for the GRE, half of the test-takers scored like you or better and half scored like you or worse. If you are in the 75th percentile, three-quarters of the test-takers scored like you or worse. If you are in the 90th percentile, only 10% of all the test-takers scored like you or better.

Finding Percentiles
1. Order the data from smallest to largest.
2. Find k = n × p.
(a) If k is an integer, the (100 × p)th percentile is the average of the kth and (k+1)th values.
(b) If k is not an integer, round it up to the next integer, say q. Then the (100 × p)th percentile is the qth value.

Finding Percentiles (example)
Data (number of vacation days in a company), ordered: 11, 11, 12, 13, 14, 14, 14, 15, 15, 15, 15, 16, 16, 17, 17, 18, 19, 20, 20, 21, 21, 22, 22, 23, 23, 25, 25, 26, 26, 27, 28, 28, 30, 31, 33
(i) Find the 80th percentile. k = n × p = 35 × 0.80 = 28 is an integer, so the 80th percentile is the average of the 28th and 29th values: (26 + 26)/2 = 26.
Check: 29 out of 35 values (~83% of the data) are ≤ 26, which satisfies "at least 80%", and 8 out of 35 values (~23% of the data) are ≥ 26, which satisfies "at least 20%".

Finding Percentiles (example, continued)
(ii) Find the 25th percentile (first quartile or lower quartile, QL). k = n × p = 35 × 0.25 = 8.75 is not an integer, so the 25th percentile is the 9th value: 25th percentile = 15.
(iii) Find the 75th percentile (third quartile or upper quartile, QU). k = n × p = 35 × 0.75 = 26.25 is not an integer, so the 75th percentile is the 27th value: 75th percentile = 25.

Interquartile Range (IQR)
The IQR is another measure of spread or variability. It is the difference between the third quartile and the first quartile, that is, IQR = 75th percentile - 25th percentile = QU - QL.
Ex: for the number of vacation days of 35 employees, IQR = 25 - 15 = 10.
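The two-step percentile rule and the IQR can be sketched in a few lines. The function below takes the percentile as a whole number (for example 80 rather than 0.80) so that k = n × p can be tested for "is an integer" with exact integer arithmetic. That interface, and the name `percentile`, are choices made for this sketch and are not from the slides.

```python
def percentile(values, percent):
    """Percentile by the slide's rule; percent is an integer from 1 to 99."""
    ordered = sorted(values)
    n = len(ordered)
    # k = n * percent / 100, split into whole part and remainder.
    k, remainder = divmod(n * percent, 100)
    if remainder == 0:
        # k is an integer: average the kth and (k+1)th values (1-based).
        return (ordered[k - 1] + ordered[k]) / 2
    # k is not an integer: round up to q = k + 1 and take the qth value.
    return ordered[k]


vacation_days = [22, 17, 15, 16, 14, 20, 25, 11, 26, 14, 23, 21,
                 13, 15, 15, 28, 30, 20, 14, 33, 27, 28, 15, 22,
                 16, 12, 25, 31, 19, 23, 26, 21, 11, 18, 17]

p80 = percentile(vacation_days, 80)
q_lower = percentile(vacation_days, 25)
q_upper = percentile(vacation_days, 75)
iqr = q_upper - q_lower

print(p80, q_lower, q_upper, iqr)
```

Statistical software often uses other percentile conventions (for example, linear interpolation), so library results can differ slightly from this rule.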

2.7: Numerical Measures of Relative Standing: z-score
The z-score tells us how many standard deviations above or below the mean a particular measurement is. Sample z-score: $z = \frac{x - \bar{x}}{s}$. Population z-score: $z = \frac{x - \mu}{\sigma}$.

2.6: Interpreting the Standard Deviation
Hummingbirds beat their wings in flight an average of 55 times per second. Assume the standard deviation is 10, and that the distribution is symmetrical and bell-shaped. An individual hummingbird is measured with 75 beats per second. What is this bird’s z-score? Here z = (75 - 55)/10 = 2, so this bird is 2 standard deviations above the mean.

2.6: Interpreting the Standard Deviation
Since ~95% of all the measurements will be within 2 standard deviations of the mean, only ~5% will be more than 2 standard deviations from the mean. About half of this 5% will be far below the mean, leaving only about 2.5% of the measurements at least 2 standard deviations above the mean.
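The z-score for the 75-beats-per-second hummingbird, and the "about 2.5%" upper-tail figure, can be checked as follows. The exact-normal comparison is an added assumption. The slides only use the Empirical Rule approximation.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z = z_score(75, mean=55, sd=10)

# Empirical Rule: ~95% within 2 SDs, so about half of the other 5% lies above.
empirical_upper_tail = (100 - 95) / 2

# Assumption: exact standard normal curve, for comparison only.
normal_upper_tail = 100 * (1 - NormalDist().cdf(z))

print(z, empirical_upper_tail, round(normal_upper_tail, 2))
```

The exact normal tail is a little under 2.5% (roughly 2.3%), because "95% within 2 standard deviations" is itself a rounded figure.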

2.7: Numerical Measures of Relative Standing
Z-scores are related to the empirical rule. For a perfectly symmetrical and bell-shaped distribution, ~68% of the measurements will have z-scores between -1 and 1, ~95% will have z-scores between -2 and 2, and ~99.7% will have z-scores between -3 and 3.

2.8: Methods for Determining Outliers
An outlier is a measurement that is unusually large or small relative to the other values. Three possible causes: (1) an observation, recording, or data entry error; (2) the item is from a different population; (3) a rare, chance event.

2.8: Methods for Determining Outliers
The box plot is a graph representing information about certain percentiles for a data set, and it can be used to identify outliers.

2.8: Methods for Determining Outliers
[Figure: box plot, with labels for the lower quartile (QL), the median, the upper quartile (QU), the minimum value inside the inner fence, and the maximum value inside the inner fence.]

2.8: Methods for Determining Outliers
Interquartile Range (IQR) = QU - QL

2.8: Methods for Determining Outliers
Left Outer Fence = QL - 3(IQR)
Left Inner Fence = QL - 1.5(IQR)
Right Inner Fence = QU + 1.5(IQR)
Right Outer Fence = QU + 3(IQR)
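As an illustration of the fence formulas, the sketch below applies them to the vacation-day data, using the quartiles found earlier in the deck (QL = 15, QU = 25). The function name `fences` and the dictionary keys are illustrative.

```python
def fences(q_lower, q_upper):
    """Inner and outer box-plot fences from the lower and upper quartiles."""
    iqr = q_upper - q_lower
    return {
        "left_outer": q_lower - 3 * iqr,
        "left_inner": q_lower - 1.5 * iqr,
        "right_inner": q_upper + 1.5 * iqr,
        "right_outer": q_upper + 3 * iqr,
    }


vacation_days = [22, 17, 15, 16, 14, 20, 25, 11, 26, 14, 23, 21,
                 13, 15, 15, 28, 30, 20, 14, 33, 27, 28, 15, 22,
                 16, 12, 25, 31, 19, 23, 26, 21, 11, 18, 17]

f = fences(q_lower=15, q_upper=25)

# Values beyond the inner fences are the candidates for outliers.
beyond_inner = [x for x in vacation_days
                if x < f["left_inner"] or x > f["right_inner"]]

print(f, beyond_inner)
```

With IQR = 10 the inner fences are 0 and 40 and the outer fences are -15 and 55. All 35 values lie between 11 and 33, so none fall outside the inner fences.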

2.8: Methods for Determining Outliers
Outliers and z-scores: for a symmetrical, bell-shaped distribution, the chance that a z-score is between -3 and +3 is over 99%. Any measurement with |z| > 3 is considered an outlier.

2.8: Methods for Determining Outliers
Outliers and z-scores: here are the descriptive statistics for the games won at the All-Star break, except one team had its total wins for 2006 recorded. That team, with 104 wins recorded, had a z-score of (104 - 45.68)/12.11 = 4.82. That’s a very unlikely result, which isn’t surprising given what we know about the observation.

# of Wins
n                            30
Mean                         45.68
Sample Variance              146.69
Sample Standard Deviation    12.11
Minimum                      25
Maximum                      104

Robustness to Outliers
Ex: consider the data 2, 7, 5, 6, 4, 2, 5, 1, 5, 6.
Mean = 4.3, Median = 5, Mode = 5, Range = 6, Variance = 4.01, SD = 2.0, IQR = 3.25
Ex: what if the data were 2, 7, 5, 6, 4, 2, 5, 1, 5, 100? Then
Mean = 13.7, Median = 5, Mode = 5, Range = 99, Variance = 923.12, SD = 30.38, IQR = 3.25

Robustness to Outliers
Hence the mean, range, variance, and standard deviation are highly affected by outliers (or extreme values), while the median, mode, and IQR are unchanged in this example; that is, they are robust to outliers.
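The comparison above can be reproduced with Python's standard `statistics` module, as sketched below. One detail is an observation about the numbers and is not stated on the slides: the IQR of 3.25 matches quartiles computed by linear interpolation (`method="inclusive"`), and it differs from what the round-up percentile rule shown earlier would give. The helper name `summarize` is illustrative.

```python
import statistics


def summarize(values):
    """Center and spread measures used on the robustness slides."""
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }


original = summarize([2, 7, 5, 6, 4, 2, 5, 1, 5, 6])
with_outlier = summarize([2, 7, 5, 6, 4, 2, 5, 1, 5, 100])

for name in original:
    print(f"{name:>8}: {original[name]:.2f} -> {with_outlier[name]:.2f}")
```

Replacing a single 6 with 100 moves the mean, range, variance, and standard deviation substantially, while the median, mode, and IQR stay the same.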

2.9: Graphing Bivariate Relationships
A scattergram (or scatterplot) shows the relationship between two quantitative variables.

2.9: Graphing Bivariate Relationships
If there is no linear relationship between the variables, the scatterplot may look like a cloud, a horizontal line, or a more complex curve.
Source: Quantitative Environmental Learning Project http://www.seattlecentral.org/qelp/index.html

2.10: Distorting the Truth with Deceptive Statistics
Distortions to watch for: stretching the axis (and the truth). Is the "average" the center, and is it the mean, median, or mode? Is the average relevant? What about the spread?