Chapter 4 Describing Data (Ⅱ): Numerical Measures


Topics covered: mean (average) indicators and deviation indicators.

Dispersion refers to the spread or variability in the data. Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.

Deviation indicators: measures of variability
Measures of central location fail to tell the whole story about the distribution. A question of interest remains unanswered: how typical is the average value of all the measurements in the data set? In other words, how spread out are the measurements about the average value?

Observe two hypothetical data sets.
Low-variability data set: the average value provides a good representation of the values in the data set.
High-variability data set: the same average value does not provide as good a representation of the values in the data set as before.

1. The range
The range of a set of measurements is the difference between the largest and smallest measurements. Its major advantage is the ease with which it can be computed. Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points: how are the measurements spread out between the smallest and the largest?

Example: for the current year's return on equity of the 25 companies in an investor's portfolio, the highest value is 22.1 and the lowest value is -8.1.
Range = highest value - lowest value = 22.1 - (-8.1) = 30.2

2. Mean deviation
The mean deviation is the arithmetic mean of the absolute values of the deviations from the arithmetic mean: $MD = \frac{\sum |X - \bar{X}|}{n}$.

The main features of the mean deviation are:
(1) All values are used in the calculation.
(2) It is not unduly influenced by large or small values.
(3) The absolute values are generally difficult to work with.
Example: The weights of a sample of crates containing books for the bookstore (in pounds) are 103, 97, 101, 106, 103. Find the mean deviation.
The mean is $\bar{X} = 510/5 = 102$. The absolute deviations are 1, 5, 1, 4, 1, so $MD = 12/5 = 2.4$ pounds.
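The crate example can be checked with a few lines of Python. This is a minimal sketch; the helper name `mean_deviation` is chosen here for illustration and does not come from the slides.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


# Crate weights in pounds, taken from the example above.
crate_weights = [103, 97, 101, 106, 103]

data_range = max(crate_weights) - min(crate_weights)  # largest minus smallest
md = mean_deviation(crate_weights)

print(f"range = {data_range} pounds, mean deviation = {md:.1f} pounds")
```

Note that the range uses only the two extreme values, while the mean deviation uses every observation.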

Self-review
The weights of containers being shipped to Hong Kong are (in thousands of pounds): 95 103 105 110 104 105 112 90
(1) What is the range of the weights?
(2) Compute the arithmetic mean weight.
(3) Compute the mean deviation of the weights.
Solution:
(1) 22 thousand pounds, found by 112 - 90
(2) 103 thousand pounds, found by 824/8
(3) MD = 42/8 = 5.25 thousand pounds

3. The variance
The variance is the arithmetic mean of the squared deviations from the mean. This measure of dispersion reflects the values of all the measurements.
The variance of a population of N measurements x1, x2, …, xN having a mean $\mu$ is $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$.
The variance of a sample of n measurements x1, x2, …, xn having a mean $\bar{x}$ is $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.

The major characteristics of the population variance are:
(1) All values are used in the calculation.
(2) It is influenced by extreme values, because each deviation is squared.
(3) The units are awkward: the square of the original units.

Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are much more dispersed than those in A.
Deviations in A: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2; sum = 0
Deviations in B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6; sum = 0
The sum of deviations is zero in both cases; therefore, another measure is needed. The sum of squared deviations is used in calculating the variance.

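Let us calculate the variance of the two populations: $\sigma_A^2 = (4 + 1 + 0 + 1 + 4)/5 = 2$ and $\sigma_B^2 = (36 + 9 + 0 + 9 + 36)/5 = 18$.
Why not use the sum of squared deviations as a measure to compare the dispersion of data sets instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases. The difficulty is that the sum also grows with the number of measurements, so dividing by the number of measurements makes data sets of different sizes comparable.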
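The comparison of populations A and B can be reproduced as follows. This is a small sketch; `population_variance` is a name introduced here for illustration, and it divides by N as in the population formula.

```python
def population_variance(values):
    """Mean of the squared deviations from the population mean (divides by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]

for name, pop in (("A", pop_a), ("B", pop_b)):
    mu = sum(pop) / len(pop)
    # The plain deviations always cancel out, which is why they are squared.
    sum_dev = sum(x - mu for x in pop)
    print(f"Population {name}: mean = {mu}, sum of deviations = {sum_dev}, "
          f"variance = {population_variance(pop)}")
```

Both populations have the same mean and a zero sum of deviations, yet the variance separates them clearly.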

Example:
x_i    mean    x_i - mean    (x_i - mean)^2
46     44        2              4
54     44       10            100
42     44       -2              4
46     44        2              4
32     44      -12            144
Sum of squared deviations = 256

4. The standard deviation
It is the square root of the variance of the measurements.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Example
Rates of return over the past 10 years for two mutual funds are shown below. Which one has a higher level of risk?
Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05
Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4
Solution
Let us use the Excel printout that is run from the “Descriptive statistics” sub-menu (use file Xm04-10).

Fund A should be considered riskier because its standard deviation is larger.
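The slide relies on an Excel printout that is not reproduced in the transcript. As a rough substitute, the sample standard deviations can be computed with Python's standard `statistics` module, using the rates of return exactly as listed above. The variable names are illustrative.

```python
import statistics

# Rates of return (%) over the past 10 years, as listed in the example.
fund_a = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
fund_b = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

# statistics.stdev is the sample standard deviation (divides by n - 1).
sd_a = statistics.stdev(fund_a)
sd_b = statistics.stdev(fund_b)

print(f"Fund A: mean = {statistics.mean(fund_a):.2f}, s = {sd_a:.2f}")
print(f"Fund B: mean = {statistics.mean(fund_b):.2f}, s = {sd_b:.2f}")
print("Riskier fund:", "A" if sd_a > sd_b else "B")
```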

Chebyshev's theorem: For any set of observations, whatever the shape of the distribution, the proportion of the values that lie within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any constant greater than 1.
Example: In a score distribution, the arithmetic mean is 71.54 and the standard deviation is 7.51. At least what percent of the scores lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean?
Solution: About 92%, found by $1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} = 0.92$.
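Chebyshev's lower bound, $1 - 1/k^2$, is easy to tabulate. The sketch below uses a hypothetical helper name, `chebyshev_lower_bound`, and rejects values of k that are not greater than 1, since the theorem is stated only for k > 1.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is stated for k > 1")
    return 1 - 1 / k ** 2


# Score example from above: mean 71.54, standard deviation 7.51, k = 3.5.
mean, sd, k = 71.54, 7.51, 3.5
bound = chebyshev_lower_bound(k)
low, high = mean - k * sd, mean + k * sd

print(f"At least {bound:.1%} of the scores lie between {low:.2f} and {high:.2f}")
```

Because the bound holds for any distribution, it is usually much more conservative than the empirical rule for bell-shaped data.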

Empirical Rule: For any symmetrical, bell-shaped distribution:
About 68% of the observations will lie within 1s of the mean.
About 95% of the observations will lie within 2s of the mean.
Virtually all (99.7%) of the observations will lie within 3s of the mean.

Interpreting the standard deviation
(1) The standard deviation can be used to compare the variability of several distributions and to make a statement about the general shape of a distribution.
(2) The empirical rule

[Figure: bell-shaped curve showing 68%, 95%, and 99.7% of the area within μ ± 1σ, μ ± 2σ, and μ ± 3σ]

Example
The durations of 30 long-distance telephone calls are shown next. Check the empirical rule for this set of measurements.
Solution
First check whether the histogram has an approximate mound shape.

Calculate the mean and the standard deviation: Mean = 10.26; Standard deviation = 4.29.
Calculate the intervals:
Interval         Empirical rule    Actual percentage
5.97, 14.55      68%               70%
1.68, 18.84      95%               96.7%
-2.61, 23.13     99.7%             100%
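The three intervals follow directly from the reported mean and standard deviation. The sketch below recomputes them; the raw call durations are not part of the transcript, so the actual percentages are not recomputed here.

```python
mean, sd = 10.26, 4.29  # telephone call durations, from the example above

# Share of observations the empirical rule expects within k standard deviations.
empirical_rule = {1: "68%", 2: "95%", 3: "99.7%"}

intervals = {}
for k, expected in empirical_rule.items():
    low = round(mean - k * sd, 2)
    high = round(mean + k * sd, 2)
    intervals[k] = (low, high)
    print(f"mean ± {k}s: ({low}, {high})  expected share ≈ {expected}")
```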

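Other conclusions
By the empirical rule, approximately 95% of the area under a mound-shaped histogram lies between $\bar{x} - 2s$ and $\bar{x} + 2s$. Since about 95% of all the measurements fall within two standard deviations of the mean, the range spans roughly four standard deviations, so $s \approx \text{range}/4$.
For the telephone call duration problem the range is 19.5 - 2.3 = 17.2 minutes, which gives $s \approx 17.2/4 = 4.3$, close to the computed value of 4.29.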

Self-review
The Pitney Pipe Company is one of several domestic manufacturers of PVC pipe. The quality control department sampled 600 10-foot lengths. At a point 1 foot from the end of the pipe they measured the outside diameter. The mean was 14.0 inches and the standard deviation 0.1 inches.
(1) If the shape of the distribution is not known, at least what percent of the observations will be between 13.85 inches and 14.15 inches?
(2) If we assume that the distribution of diameters is symmetrical and bell-shaped, about 95% of the observations will be between what two values?
Solution:
(1) At least 55.6%, found by $k = (14.15 - 14.0)/0.1 = 1.5$ and $1 - \frac{1}{(1.5)^2} = 1 - 0.444 = 0.556$
(2) 13.8 and 14.2, found by $14.0 \pm 2(0.1)$

Self-review
(1) The weights of the contents of several small aspirin sample bottles are (in grams): 4, 2, 5, 4, 5, 2, and 6. What is the sample variance?
(2) The room rates for a sample of 10 motels are (in $): 101, 97, 103, 110, 78, 87, 101, 80, 106, 88. What is the sample variance?
Solution:
(1) 2.33, found by $s^2 = \frac{14}{7 - 1}$
(2) 123.66, found by $s^2 = \frac{1112.9}{10 - 1}$
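Both answers can be verified with `statistics.variance` from the Python standard library, which divides the sum of squared deviations by n - 1. The variable names below are illustrative.

```python
import statistics

aspirin_grams = [4, 2, 5, 4, 5, 2, 6]
motel_rates = [101, 97, 103, 110, 78, 87, 101, 80, 106, 88]

# statistics.variance computes the sample variance (denominator n - 1).
var_aspirin = statistics.variance(aspirin_grams)
var_motel = statistics.variance(motel_rates)

print(f"aspirin bottles: s^2 = {var_aspirin:.2f}")
print(f"motel rates:     s^2 = {var_motel:.2f}")
```

For a population rather than a sample, `statistics.pvariance` divides by N instead.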

Can we say that a standard deviation of US $120 for a distribution of annual incomes is greater than a standard deviation of 4.5 days for a distribution of absences from work?
Can we say that the annual income distribution of the top executives, with a standard deviation of US $1200, is more dispersed than that of the unskilled employees, with a standard deviation of US $120?
A direct comparison of standard deviations is not meaningful when:
1. The data are in different units (such as dollars and days absent).
2. The data are in the same units, but the means are far apart (such as the incomes of the top executives and those of the unskilled employees).

Further qualities of variance: for data grouped into classes, one component refers to the variance within a class and the other refers to the variance among the classes.
Example (rows as listed, with each class mean and class variance):
Class   Value   Deviation   Squared deviation   Class mean   Class variance
A       0.6     -0.1        0.01                0.7          0.005
A       0.8      0.1        0.01
B       1.1     -0.5        0.25                1.6          0.092
B       1.5     -0.1        0.01
B       1.8      0.2        0.04
B       2.0      0.4        0.16
C       3.0     -1          1                   4            1.167
C       3.5     -0.5        0.25
C       5.5      1.5        2.25

5. The coefficient of variation (relative dispersion)
The coefficient of variation of a set of measurements is the standard deviation divided by the mean value. This coefficient provides a proportionate measure of variation. Analogous relative measures are the coefficient of range and the coefficient of mean deviation.

The coefficient of standard deviation
Population: $CV = \sigma / \mu$
Sample: $cv = s / \bar{x}$
A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
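The point about a standard deviation of 10 can be made concrete by computing the coefficient of variation for both means. This is a minimal sketch with an illustrative function name.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a fraction of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return sd / mean


cv_small_mean = coefficient_of_variation(10, 100)
cv_large_mean = coefficient_of_variation(10, 500)

print(f"sd = 10, mean = 100: CV = {cv_small_mean:.0%}")
print(f"sd = 10, mean = 500: CV = {cv_large_mean:.0%}")
```

Because the units cancel, the coefficient can also compare data measured in different units, such as dollars and days.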

Z scores
A z score is the deviation of a value from the mean divided by the standard deviation: $z = \frac{x - \bar{x}}{s}$. The larger the absolute value of the z score, the farther the value is from the mean. It is useful in identifying extreme values: an extreme value or outlier is located far away from the mean.
Example (mean = 39.6, s = 6.77):
Time (in minutes)   Z score
39                  -0.09
29                  -1.57
43                   0.50
……
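The z scores in the example can be recomputed from the reported mean and standard deviation. The sketch below uses only the three times shown; the rest of the data set is not in the transcript.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


mean, sd = 39.6, 6.77   # reported for the full data set in the example
times = [39, 29, 43]    # the three times (in minutes) listed above

z_scores = [round(z_score(t, mean, sd), 2) for t in times]
for t, z in zip(times, z_scores):
    print(f"time = {t} min, z = {z:+.2f}")
```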

Self-review
The variation in the annual incomes of executives in Nash Inc. is to be compared with the variation in incomes of unskilled employees. For a sample of executives, $\bar{x}$ = … and $s$ = …. For a sample of unskilled employees, the annual income $\bar{x}$ = … and $s$ = …. Compare the relative dispersion in the two groups.
Solution: There is no difference in the relative dispersion of the two groups.

Proportion
The fraction, ratio, or percent indicating the part of the sample or the population having a particular trait of interest.
N1: Yes (have some kind of attribute or property)
N2: No (do not have that attribute or property)
N = N1 + N2
$P = N_1 / N$, $Q = N_2 / N$

Quartiles

The location of the P-th percentile is $L_p = (n + 1)\frac{P}{100}$, where n is the number of observations. The quartiles are the 25th, 50th, and 75th percentiles.

Example: stock prices on twelve consecutive days for a major publicly traded company, sorted from lowest to highest:
Position   1   2   3   4   5   6   7   8   9  10  11  12
Price     69  78  79  82  83  84  85  86  88  91  92  96
Using the twelve stock prices, we can find the median and the 25th and 75th percentiles as follows:
Q1, 25th percentile: price at position 3.25 = 79 + .25(82 - 79) = 79.75
Q2, 50th percentile (median): price at position 6.50 = 84 + .5(85 - 84) = 84.50
Q3, 75th percentile: price at position 9.75 = 88 + .75(91 - 88) = 90.25
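The location rule $L_p = (n + 1)P/100$ with linear interpolation between neighbouring observations can be sketched as below. The function name `percentile_by_location` is an illustrative choice, and clamping locations that fall outside the data to the smallest or largest value is an assumption of this sketch, not something the slides specify.

```python
def percentile_by_location(values, p):
    """P-th percentile using the location L_p = (n + 1) * p / 100."""
    xs = sorted(values)
    n = len(xs)
    location = (n + 1) * p / 100   # 1-based position in the sorted data
    lower = int(location)          # whole part: position of the lower neighbour
    fraction = location - lower    # fractional part used for interpolation
    if lower < 1:                  # assumption: clamp to the smallest value
        return xs[0]
    if lower >= n:                 # assumption: clamp to the largest value
        return xs[-1]
    return xs[lower - 1] + fraction * (xs[lower] - xs[lower - 1])


prices = [69, 78, 79, 82, 83, 84, 85, 86, 88, 91, 92, 96]

q1 = percentile_by_location(prices, 25)
q2 = percentile_by_location(prices, 50)
q3 = percentile_by_location(prices, 75)
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
```

Python's `statistics.quantiles(prices, n=4)` with its default "exclusive" method appears to follow the same (n + 1) convention, so it can serve as a cross-check.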

The interquartile range is the distance between the third quartile Q3 and the first quartile Q1. This distance will include the middle 50 percent of the observations.
Interquartile range = Q3 - Q1

For a set of observations the third quartile is 24 and the first quartile is 10. What is the interquartile range? The interquartile range is 24 - 10 = 14. Fifty percent of the observations will occur between 10 and 24.

Self-review
The quality control department of the Plainsville Peanut Company is responsible for checking the weight of the 8-ounce jar of peanut butter. The weights of a sample of nine jars produced last hour are: 7.69 7.72 7.80 7.86 7.90 7.94 7.97 8.06 8.09
(1) What is the median weight?
(2) Determine the weights corresponding to the first and the third quartiles.
(3) Determine the interquartile range.
Solution:
(1) 7.90
(2) Q1 = 7.76, Q3 = 8.015
(3) Q3 - Q1 = 8.015 - 7.76 = 0.255