1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.

St. Edward’s University
Presentation transcript:

Slide 2. Chapter 3: Descriptive Statistics: Numerical Measures
- Measures of Location
- Measures of Variability
- Measures of Distribution Shape, Relative Location, and Detecting Outliers
- Measures of Association Between Two Variables
- Weighted Mean

Slide 3. Measures of Location
- Mean
- Median
- Mode
- Percentiles
- Quartiles
If the measures are computed for data from a sample, they are called sample statistics. If the measures are computed for data from a population, they are called population parameters. A sample statistic is referred to as the point estimator of the corresponding population parameter.

Slide 4. Mean
The mean of a data set is the average of all the data values. The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

Slide 5. Sample Mean
$\bar{x} = \frac{\sum x_i}{n}$
where $\sum x_i$ is the sum of the values of the $n$ observations and $n$ is the number of observations in the sample.

Slide 6. Population Mean
$\mu = \frac{\sum x_i}{N}$
where $\sum x_i$ is the sum of the values of the $N$ observations and $N$ is the number of observations in the population.
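As a minimal sketch of the mean calculation, the snippet below divides the sum of the observations by their count. The data values are hypothetical and do not come from the slides.

```python
from statistics import mean

# Hypothetical sample of five observations (not taken from the slides).
sample = [46, 54, 42, 46, 32]

# Sample mean: sum of the n observations divided by n.
x_bar = sum(sample) / len(sample)

# The standard library applies the same formula.
library_mean = mean(sample)

print(x_bar, library_mean)
```

The same arithmetic gives the population mean when the list holds every member of the population; only the symbols ($\mu$, $N$) change.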

Slide 7. Median
The median of a data set is the value in the middle when the data items are arranged in ascending order. Whenever a data set has extreme values, the median is the preferred measure of central location. The median is the measure of location most often reported for annual income and property value data. A few extremely large incomes or property values can inflate the mean.

Slide 8. Median
For an odd number of observations, arranged in ascending order, the median is the middle value. Median = 26
[Slide data not preserved in the transcript.]

Slide 9. Median
For an even number of observations (here, 8 observations in ascending order), the median is the average of the middle two values.
[Slide data not preserved in the transcript; only the values 27 and 28 survive.]
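The odd and even cases can be sketched in a few lines. The function name `median_sorted` and both data sets are illustrative assumptions, not part of the original slides.

```python
from statistics import median

def median_sorted(values):
    """Median by the middle-value rule: sort, then take the middle
    value (odd n) or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [46, 54, 42, 46, 32]                # hypothetical, 5 observations
even_data = [26, 18, 27, 12, 14, 27, 30, 19]   # hypothetical, 8 observations

print(median_sorted(odd_data), median(odd_data))
print(median_sorted(even_data), median(even_data))
```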

Slide 10. Mean vs. Median
- The mean IS affected by outliers (extreme observations).
- The median is resistant to outliers: an extreme observation changes it little or not at all.
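A small illustration of this contrast, using hypothetical income figures (in thousands) that are not from the slides: adding one extreme value moves the mean a long way while the median barely shifts.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]     # hypothetical values
with_outlier = incomes + [400]     # the same data plus one extreme value

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```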

Slide 11. Mode
The mode of a data set is the value that occurs with greatest frequency. The greatest frequency can occur at two or more different values. If the data have exactly two modes, the data are bimodal. If the data have more than two modes, the data are multimodal.
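One way to sketch this classification is with `statistics.multimode`, which returns every value tied for the greatest frequency. The helper name `describe_modes` and the label "unimodal" are assumptions added for illustration; note that a data set with no repeated values would report every value as a mode.

```python
from statistics import multimode

def describe_modes(values):
    """Return the modes and a label following the slide's terminology."""
    modes = multimode(values)
    if len(modes) == 1:
        label = "unimodal"      # label not used on the slide; added here
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

print(describe_modes([1, 2, 2, 3]))
print(describe_modes([1, 1, 2, 2, 3]))
print(describe_modes([1, 1, 2, 2, 3, 3]))
```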

Slide 12. Percentiles
A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. Admission test scores for colleges and universities are frequently reported in terms of percentiles.

Slide 13. Percentiles
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

Slide 14. Percentiles
1. Arrange the data in ascending order.
2. Compute index i, the position of the pth percentile: i = (p/100)n.
3. (a) If i is not an integer, round up. The pth percentile is the value in the ith position.
   (b) If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
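The procedure above can be sketched as follows. The function name `percentile_textbook` and the data are hypothetical, and the sketch assumes p is a whole number strictly between 0 and 100 so that the integer test on i can be done without floating-point rounding.

```python
def percentile_textbook(values, p):
    """pth percentile by the slide's procedure (p an integer, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                 # step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)    # step 2: i = (p/100) n
    if remainder:                            # i is not an integer: round up
        return ordered[whole]                # position whole + 1 (1-based)
    # i is an integer: average the values in positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

data = [26, 18, 27, 12, 14, 27, 30, 19]      # hypothetical, n = 8

print(percentile_textbook(data, 25))   # i = 2, average of positions 2 and 3
print(percentile_textbook(data, 50))   # i = 4, average of positions 4 and 5
print(percentile_textbook(data, 90))   # i = 7.2, rounds up to position 8
```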

Slide 15. Note on Excel's Percentile Function
The formula that Excel uses is different from the one used in the textbook. To locate the pth percentile, Excel uses the position
$L_p = (p/100)\,n + (1 - p/100)$
Once the position is identified:
1. If $L_p$ is a whole number (e.g. 12), Excel's result will be the same as the textbook's.
2. If $L_p$ is not a whole number (e.g. 12.3), Excel's result can differ from the textbook's.
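The sketch below computes a percentile from the position $L_p$ given on the slide. The slide does not say what Excel does with a fractional position; linear interpolation between the two neighbouring values is assumed here, and the function name and data are hypothetical.

```python
import math

def percentile_excel_style(values, p):
    """Percentile at position L_p = (p/100) n + (1 - p/100), with linear
    interpolation between neighbours (interpolation is an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    position = (p / 100) * n + (1 - p / 100)   # 1-based, may be fractional
    lower = math.floor(position)
    fraction = position - lower
    if lower >= n:                              # p = 100: the largest value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

data = [26, 18, 27, 12, 14, 27, 30, 19]         # hypothetical, n = 8

print(percentile_excel_style(data, 25))   # L_p = 2.75
print(percentile_excel_style(data, 50))   # L_p = 4.5
```

With these data the 25th percentile comes out as 17.0, whereas the round-up rule on the previous slide gives 16.0; the 50th percentile is 22.5 under both conventions.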

Slide 16. Quartiles
Quartiles are specific percentiles.
- First Quartile = 25th Percentile
- Second Quartile = 50th Percentile = Median
- Third Quartile = 75th Percentile

Slide 17. Measures of Variability
It is often desirable to consider measures of variability (dispersion), as well as measures of location. For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Slide 18. Measures of Variability
- Range
- Interquartile Range
- Variance
- Standard Deviation
- Coefficient of Variation

Slide 19. Range
The range of a data set is the difference between the largest and smallest data values. It is the simplest measure of variability. It is very sensitive to the smallest and largest data values.

Slide 20. Interquartile Range
The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data. Because it ignores the smallest and largest quarters of the data, it overcomes the range's sensitivity to extreme data values.
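A short sketch of both measures follows, on hypothetical data. The quartiles come from `statistics.quantiles` with `method="inclusive"`, which interpolates between observations; this is a different convention from the round-up percentile rule on the slides, so the quartile values can differ slightly from a hand calculation.

```python
from statistics import quantiles

data = [26, 18, 27, 12, 14, 27, 30, 19]   # hypothetical values

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles by the library's interpolating ("inclusive") convention.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```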

Slide 21. Variance
The variance is a measure of variability that utilizes all the data. It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

Slide 22. Variance
The variance is the average of the squared differences between each data value and the mean. The variance is computed as follows:
For a sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
For a population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
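As a rough sketch, the snippet below computes both versions of the variance by hand and checks them against the standard library. The data are hypothetical; the only difference between the two results is the divisor (n - 1 for a sample, N for a population).

```python
import math
from statistics import pvariance, variance

data = [46, 54, 42, 46, 32]   # hypothetical values

n = len(data)
x_bar = sum(data) / n

# Sum of squared differences between each value and the mean.
sum_sq = sum((x - x_bar) ** 2 for x in data)

sample_var = sum_sq / (n - 1)     # data treated as a sample
population_var = sum_sq / n       # data treated as a whole population

# statistics.variance and statistics.pvariance use the same divisors.
assert math.isclose(sample_var, variance(data))
assert math.isclose(population_var, pvariance(data))

print(sample_var, population_var)
```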

Slide 23. Standard Deviation
The standard deviation of a data set is the positive square root of the variance. It is measured in the same units as the data, making it more easily interpreted than the variance.

Slide 24. Standard Deviation
The standard deviation is computed as follows:
For a sample: $s = \sqrt{s^2}$
For a population: $\sigma = \sqrt{\sigma^2}$

Slide 25. Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean. The coefficient of variation is computed as follows:
For a sample: $\left(\frac{s}{\bar{x}} \times 100\right)\%$
For a population: $\left(\frac{\sigma}{\mu} \times 100\right)\%$
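A minimal sketch of the sample standard deviation and the coefficient of variation, using hypothetical data that are treated as a sample:

```python
from statistics import mean, stdev

data = [46, 54, 42, 46, 32]   # hypothetical values

x_bar = mean(data)
s = stdev(data)               # positive square root of the sample variance

# Standard deviation expressed as a percentage of the mean.
cv_percent = s / x_bar * 100

print(f"s = {s}, CV = {cv_percent:.2f}%")
```

For population data, `statistics.pstdev` would take the place of `stdev`.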

Slide 26. Measures of Distribution Shape, Relative Location, and Detecting Outliers
- Distribution Shape
- z-Scores
- Chebyshev's Theorem
- Empirical Rule
- Detecting Outliers

Slide 27. Distribution Shape: Skewness
- An important measure of the shape of a distribution is called skewness.
- The formula for computing skewness for a data set is somewhat complex, but skewness can be easily computed using statistical software.
- Excel's SKEW function can be used to compute the skewness of a data set.
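The slides do not give the skewness formula. As an illustrative sketch, the function below uses the adjusted sample skewness, $\frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$, which is the formula documented for Excel's SKEW function; the function name and data sets are hypothetical.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are required")
    x_bar = mean(values)
    s = stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

symmetric = [1, 2, 3, 4, 5]          # hypothetical: no skew
right_tail = [1, 2, 3, 4, 10]        # hypothetical: long right tail
left_tail = [-10, -4, -3, -2, -1]    # hypothetical: long left tail

for name, values in [("symmetric", symmetric),
                     ("right tail", right_tail),
                     ("left tail", left_tail)]:
    print(name, round(sample_skewness(values), 3))
```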

Slide 28. Distribution Shape: Skewness
- Symmetric (not skewed)
Skewness is zero. Mean and median are equal.
[Figure: relative frequency histogram, Skewness = 0]

Slide 29. Distribution Shape: Skewness
- Moderately Skewed Left
Skewness is negative. Mean will usually be less than the median.
[Figure: relative frequency histogram, Skewness = -.31]

Slide 30. Distribution Shape: Skewness
- Moderately Skewed Right
Skewness is positive. Mean will usually be more than the median.
[Figure: relative frequency histogram, Skewness = .31]

Slide 31. z-Scores
The z-score is often called the standardized value. It denotes the number of standard deviations a data value $x_i$ is from the mean.
$z_i = \frac{x_i - \bar{x}}{s}$

Slide 32. z-Scores
An observation's z-score is a measure of the relative location of the observation in a data set.
- A data value less than the sample mean will have a z-score less than zero.
- A data value greater than the sample mean will have a z-score greater than zero.
- A data value equal to the sample mean will have a z-score of zero.
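A brief sketch of the z-score calculation on hypothetical sample data: each value's distance from the sample mean is divided by the sample standard deviation.

```python
from statistics import mean, stdev

data = [46, 54, 42, 46, 32]   # hypothetical sample

x_bar = mean(data)
s = stdev(data)

# Number of standard deviations each value lies from the mean.
z_scores = [(x - x_bar) / s for x in data]

for x, z in zip(data, z_scores):
    print(x, round(z, 2))
```

Values below the mean of 44 get negative z-scores and values above it get positive ones, as the bullet points describe.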

Slide 33. Chebyshev's Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.

Slide 34. Chebyshev's Theorem
- At least 75% of the data values must be within z = 2 standard deviations of the mean.
- At least 89% of the data values must be within z = 3 standard deviations of the mean.
- At least 94% of the data values must be within z = 4 standard deviations of the mean.
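The three percentages follow directly from the bound $1 - 1/z^2$. A minimal sketch (the function name is hypothetical):

```python
def chebyshev_lower_bound(z):
    """Minimum share of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4):
    print(z, f"{chebyshev_lower_bound(z):.2%}")
```

The exact bounds are 75%, about 88.9%, and 93.75%; the slide rounds the last two to 89% and 94%.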

Slide 35. Empirical Rule
For data having a bell-shaped distribution:
- 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
- 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
- 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.

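Slide 36. Empirical Rule
[Figure: normal curve over x with the axis marked at $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - 1\sigma$, $\mu$, $\mu + 1\sigma$, $\mu + 2\sigma$, $\mu + 3\sigma$; the central areas are labeled 68.26%, 95.44%, and 99.72%.]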
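These shares can be checked against the normal distribution directly, since the probability that a normal variable falls within k standard deviations of its mean is $\operatorname{erf}(k/\sqrt{2})$. The sketch below is illustrative and the function name is hypothetical. The slide's figures differ in the second decimal place, presumably because they were built from a normal table rounded to four decimals.

```python
from math import erf, sqrt

def share_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, f"{share_within(k):.2%}")
```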

Slide 37. Detecting Outliers
An outlier is an unusually small or unusually large value in a data set. A data value with a z-score less than -3 or greater than +3 might be considered an outlier. It might be:
- an incorrectly recorded data value
- a data value that was incorrectly included in the data set
- a correctly recorded data value that belongs in the data set
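A sketch of the z-score screen on hypothetical data follows; the function name and the default threshold argument are illustrative. Flagged values are only candidates, and each still needs the review described in the list above.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(values)
    s = stdev(values)
    return [x for x in values if abs((x - x_bar) / s) > threshold]

# Hypothetical data: fifteen values near 50 and one extreme value.
data = [48, 50, 51, 49, 52, 50, 47, 53, 50, 49, 51, 50, 48, 52, 50, 95]

print(flag_outliers(data))
```

In small samples no z-score can be very large (its magnitude is at most $(n-1)/\sqrt{n}$), so the screen only becomes meaningful once the data set has more than about ten observations.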

Slide 38. Measures of Association Between Two Variables
- Covariance
- Correlation Coefficient

Slide 39. Covariance
The covariance is a measure of the linear association between two variables.
- Positive values indicate a positive relationship.
- Negative values indicate a negative relationship.

Slide 40. Covariance
The covariance is computed as follows:
For samples: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
For populations: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$

Slide 41. Correlation Coefficient
The coefficient can take on values between -1 and +1.
- Values near -1 indicate a strong negative linear relationship.
- Values near +1 indicate a strong positive linear relationship.

Slide 42. Correlation Coefficient
The correlation coefficient is computed as follows:
For samples: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
For populations: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$

Slide 43. Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation. Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

Slide 44. Covariance and Correlation Coefficient
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.
[Table: Average Driving Distance (yds.) and Average 18-Hole Score; data values not preserved in the transcript.]
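Since the golfer's data did not survive in the transcript, the sketch below uses made-up distances and scores purely to show the mechanics of the sample covariance and sample correlation coefficient. The function names are hypothetical as well.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, over n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = sqrt(sample_covariance(xs, xs))   # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical data, NOT the values from the slide.
distance = [250, 260, 270, 280, 290]   # average driving distance (yds.)
score = [73, 71, 70, 69, 67]           # average 18-hole score

s_xy = sample_covariance(distance, score)
r_xy = sample_correlation(distance, score)
print(s_xy, round(r_xy, 4))
```

With these invented numbers the covariance is negative and the correlation is close to -1, which would indicate that longer drives go with lower scores.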

Slide 45. Weighted Mean
When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean. In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade. When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Slide 46. Weighted Mean
$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
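A minimal sketch of the weighted mean, applied to a GPA with hypothetical grades and credit hours (the function name is also an assumption):

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

grade_points = [4, 3, 2]    # hypothetical grades: A, B, C
credit_hours = [3, 4, 3]    # hypothetical credit hours earned for each grade

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)
```

When every weight is equal, the result reduces to the ordinary mean.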

Slide 47. In-class empirical exercises