Slide © 2003 South-Western/Thomson Learning™
Chapter 3
Descriptive Statistics: Numerical Methods
- Measures of Variability
- Measures of Relative Location and Detecting Outliers
- Exploratory Data Analysis
- Measures of Association Between Two Variables
Measures of Variability
- It is often desirable to consider measures of variability (dispersion) as well as measures of location.
- For example, in choosing between supplier A and supplier B, we might consider not only the average delivery time for each, but also the variability in delivery time for each.
- Measures of variability:
  - Range
  - Interquartile Range
  - Variance
  - Standard Deviation
  - Coefficient of Variation
Measures of Variation
- Range
- Interquartile Range
- Variance
  - Population Variance
  - Sample Variance
- Standard Deviation
  - Population Standard Deviation
  - Sample Standard Deviation
- Coefficient of Variation
Variation
- Measures of variation give information on the spread or variability of the data values.
[Figure: Same center, different variation]
Range
- Simplest measure of variation
- Difference between the largest and the smallest observations: Range = $x_{\max} - x_{\min}$
- Example: Range = 13
[Figure: example data set with Range = 13]
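As a quick sketch in Python (the function name `data_range` is hypothetical), the range needs only the largest and smallest observations:

```python
def data_range(values):
    """Return the range: largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Illustrative values taken from the apartment-rent five-number summary;
# values between the extremes do not affect the range.
rents = [425, 445, 475, 525, 615]
print(data_range(rents))  # 190
```

Because only two observations enter the calculation, a single extreme value changes the range directly, which is one reason the interquartile range is also used.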
Example: Apartment Rents
- Range = largest value - smallest value
- Range = 615 - 425 = 190
Interquartile Range
- The interquartile range (IQR) of a data set is the difference between the third quartile and the first quartile.
- It is the range of the middle 50% of the data.
Example: Apartment Rents
- Interquartile Range
  - 3rd Quartile ($Q_3$) = 525
  - 1st Quartile ($Q_1$) = 445
  - Interquartile Range = $Q_3 - Q_1 = 525 - 445 = 80$
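Quartiles can be computed with several conventions, and software packages do not all agree. The sketch below assumes one common textbook rule for the p-th percentile: compute i = (p/100)n; if i is not an integer, round up and take that position; if it is, average positions i and i + 1. The helper names `percentile` and `interquartile_range` are hypothetical.

```python
import math


def percentile(values, p):
    """p-th percentile (0 < p < 100) under the assumed textbook rule:
    i = (p/100) * n; if i is not an integer, round up and take the i-th
    smallest value; if i is an integer, average the i-th and (i+1)-th
    smallest values (positions are 1-based)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)
    if not data:
        raise ValueError("percentile is undefined for an empty data set")
    n = len(data)
    i = p * n / 100
    if i != int(i):
        return data[math.ceil(i) - 1]  # 1-based position -> 0-based index
    i = int(i)
    return (data[i - 1] + data[i]) / 2


def interquartile_range(values):
    """IQR = Q3 - Q1, the range of the middle 50% of the data."""
    return percentile(values, 75) - percentile(values, 25)


sample = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data
print(percentile(sample, 25), percentile(sample, 75))  # 2.5 6.5
print(interquartile_range(sample))  # 4.0
```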
Variance
- The variance is a measure of variability that utilizes all the data.
- It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).
Variance
- The variance is essentially the average of the squared differences between each data value and the mean; for a sample, the sum of squared differences is divided by $n - 1$ rather than $n$.
- If the data set is a sample, the variance is denoted by $s^2$:
  $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
- If the data set is a population, the variance is denoted by $\sigma^2$:
  $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
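A minimal Python sketch of the two variance formulas (function names are hypothetical; the data are made up for illustration):

```python
def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / N"""
    N = len(values)
    if N == 0:
        raise ValueError("population variance needs at least one value")
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)"""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical values: mean = 5, sum of squared deviations = 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))         # 4.0
print(round(sample_variance(data), 4))   # 4.5714
```

Dividing by $n - 1$ rather than $n$ makes the sample variance an unbiased estimator of the population variance.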
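Variance for Grouped Data
- Sample Data: $s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
- Population Data: $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$
- Here $f_i$ is the frequency of class $i$ and $M_i$ is the midpoint of class $i$.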
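The grouped formulas treat every observation in a class as if it were located at the class midpoint, so they approximate the variance of the raw data. A sketch in Python, assuming the grouped mean is $\sum f_i M_i / \sum f_i$; the function names and the frequency distribution are hypothetical:

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f_i * M_i) / sum(f_i)."""
    return sum(f * m for m, f in zip(midpoints, freqs)) / sum(freqs)


def grouped_sample_variance(midpoints, freqs):
    """s^2 = sum(f_i * (M_i - x_bar)^2) / (n - 1), with n = sum(f_i)."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("need a total frequency of at least two")
    x_bar = grouped_mean(midpoints, freqs)
    return sum(f * (m - x_bar) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)


def grouped_population_variance(midpoints, freqs):
    """sigma^2 = sum(f_i * (M_i - mu)^2) / N, with N = sum(f_i)."""
    N = sum(freqs)
    mu = grouped_mean(midpoints, freqs)
    return sum(f * (m - mu) ** 2 for m, f in zip(midpoints, freqs)) / N


# Hypothetical frequency distribution with class midpoints 5, 15, 25
midpoints = [5, 15, 25]
freqs = [2, 3, 5]
print(grouped_mean(midpoints, freqs))                        # 18.0
print(grouped_population_variance(midpoints, freqs))         # 61.0
print(round(grouped_sample_variance(midpoints, freqs), 2))   # 67.78
```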
Standard Deviation
- Most commonly used measure of variation
- Shows variation about the mean
- The standard deviation of a data set is the positive square root of the variance.
- If the data set is a sample, the standard deviation is denoted $s$: $s = \sqrt{s^2}$
- If the data set is a population, the standard deviation is denoted $\sigma$ (sigma): $\sigma = \sqrt{\sigma^2}$
Calculation Example: Sample Standard Deviation
- Sample Data ($x_i$): [Figure: sample data values]
- $n = 8$, Mean $= \bar{x} = 16$
- $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
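The calculation can be sketched in Python as follows. The eight values below are hypothetical, chosen only so that $n = 8$ and the mean is 16 as on the slide; they are not necessarily the slide's data, and `sample_std_dev` is a hypothetical name.

```python
import math


def sample_std_dev(values):
    """s = sqrt( sum((x_i - x_bar)^2) / (n - 1) )"""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    sum_sq_dev = sum((x - x_bar) ** 2 for x in values)  # here: 130
    return math.sqrt(sum_sq_dev / (n - 1))


data = [10, 12, 14, 15, 17, 18, 18, 24]  # hypothetical: n = 8, mean = 16
print(sum(data) / len(data))            # 16.0
print(round(sample_std_dev(data), 4))   # 4.3095
```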
Coefficient of Variation
- Measures relative variation
- Expressed as a percentage (%)
- Shows variation relative to the mean
- Used to compare the variability of two or more data sets, including sets measured in different units
- Population: $CV = \frac{\sigma}{\mu} \times 100\%$
- Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
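A short Python sketch using the standard `statistics` module; `coefficient_of_variation` is a hypothetical name, and the lengths are made-up data. Converting the same measurements from inches to centimeters leaves the CV unchanged, because the unit factor cancels between the standard deviation and the mean.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """CV = (standard deviation / mean) * 100, in percent.
    Uses s and x_bar when sample=True, sigma and mu otherwise."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean * 100


# Hypothetical lengths in inches, and the same lengths in centimeters
inches = [60, 64, 66, 70, 72]
cm = [x * 2.54 for x in inches]
print(round(coefficient_of_variation(inches), 2))
print(round(coefficient_of_variation(cm), 2))  # same CV: the unit factor cancels
```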
Example: Apartment Rents
- Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
- Standard Deviation: $s = \sqrt{s^2}$
- Coefficient of Variation: $\frac{s}{\bar{x}} \times 100\%$
Measures of Relative Location and Detecting Outliers
- z-Scores
- Detecting Outliers
z-Scores
- The z-score is often called the standardized value: $z_i = \frac{x_i - \bar{x}}{s}$
- It denotes the number of standard deviations a data value $x_i$ is from the mean.
- A data value less than the sample mean will have a z-score less than zero.
- A data value greater than the sample mean will have a z-score greater than zero.
- A data value equal to the sample mean will have a z-score of zero.
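The three sign rules above can be checked with a small Python sketch; `z_scores` is a hypothetical name and the sample is made up (mean = 5).

```python
import statistics


def z_scores(values):
    """Standardized values z_i = (x_i - x_bar) / s for a sample."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample; mean = 5
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 2))  # negative below 5, zero at 5, positive above 5
```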
Example: Apartment Rents
- z-Score of Smallest Value (425): $z = \frac{425 - \bar{x}}{s}$
[Table: Standardized Values for Apartment Rents]
Detecting Outliers
- An outlier is an unusually small or unusually large value in a data set.
- A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
- An outlier might be an incorrectly recorded data value.
- An outlier might be a data value that was incorrectly included in the data set.
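A sketch of the $|z| > 3$ rule of thumb in Python; `flag_outliers` is a hypothetical name and the readings are made up, with one suspiciously large value.

```python
import statistics


def flag_outliers(values, threshold=3.0):
    """Return the values whose sample z-score satisfies |z| > threshold."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [x for x in values if abs((x - x_bar) / s) > threshold]


# Hypothetical data: twenty typical values and one very large one
readings = [10, 11] * 10 + [100]
print(flag_outliers(readings))  # [100]
```

With the sample standard deviation, no z-score can exceed $(n - 1)/\sqrt{n}$ in absolute value, so for $n \le 10$ this rule cannot flag any value.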
Example: Apartment Rents
- Detecting Outliers
  - The most extreme z-scores are those of the smallest value (425) and the largest value (615).
  - Using $|z| > 3$ as the criterion for an outlier, there are no outliers in this data set.
[Table: Standardized Values for Apartment Rents]
Exploratory Data Analysis
- Five-Number Summary
Five-Number Summary
- Smallest Value
- First Quartile
- Median
- Third Quartile
- Largest Value
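A Python sketch of the five-number summary using `statistics.quantiles` with `method="inclusive"`. That method interpolates between neighboring values, so its quartiles can differ slightly from hand calculations with a textbook percentile rule (for the data below, a rule that averages neighboring positions gives $Q_1 = 2.5$ rather than 2.75). The function name is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest).
    Quartiles use statistics.quantiles(..., method="inclusive")."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.75, 4.5, 6.25, 8)
```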
Example: Apartment Rents
- Five-Number Summary
  - Smallest Value = 425
  - First Quartile = 445
  - Median = 475
  - Third Quartile = 525
  - Largest Value = 615
Measures of Association between Two Variables
- Covariance
- Correlation Coefficient
Covariance
- The covariance is a measure of the linear association between two variables.
- Positive values indicate a positive relationship.
- Negative values indicate a negative relationship.
Covariance
- If the data sets are samples, the covariance is denoted by $s_{xy}$:
  $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
- If the data sets are populations, the covariance is denoted by $\sigma_{xy}$:
  $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
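A Python sketch of both covariance formulas, using hypothetical paired data; the function names are also hypothetical. The sign of the result follows the direction of the relationship.

```python
def sample_covariance(xs, ys):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)"""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def population_covariance(xs, ys):
    """sigma_xy = sum((x_i - mu_x)(y_i - mu_y)) / N"""
    N = len(xs)
    if N != len(ys) or N == 0:
        raise ValueError("need two paired data sets of equal, nonzero length")
    mu_x = sum(xs) / N
    mu_y = sum(ys) / N
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / N


x = [1, 2, 3, 4, 5]          # hypothetical paired data
y_up = [2, 4, 6, 8, 10]      # increases with x
y_down = [10, 8, 6, 4, 2]    # decreases as x increases
print(sample_covariance(x, y_up))      # 5.0
print(sample_covariance(x, y_down))    # -5.0
print(population_covariance(x, y_up))  # 4.0
```

The covariance depends on the units of $x$ and $y$, so its magnitude is hard to interpret on its own; the correlation coefficient rescales it to lie between -1 and +1.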
Correlation Coefficient
- The coefficient can take on values between -1 and +1, inclusive.
- Values near -1 indicate a strong negative linear relationship.
- Values near +1 indicate a strong positive linear relationship.
- If the data sets are samples, the coefficient is $r_{xy} = \frac{s_{xy}}{s_x s_y}$.
- If the data sets are populations, the coefficient is $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$.
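A Python sketch of the sample correlation coefficient $r_{xy} = s_{xy}/(s_x s_y)$, using hypothetical paired data; `correlation_coefficient` is a hypothetical name. The $(n - 1)$ divisors in $s_{xy}$, $s_x$, and $s_y$ cancel, so the code works with the raw sums of deviation products and squares.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson r_xy = s_xy / (s_x * s_y), computed from deviation sums
    because the (n - 1) divisors cancel."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # hypothetical paired data
print(correlation_coefficient(x, [2, 4, 6, 8, 10]))           # 1.0
print(correlation_coefficient(x, [10, 8, 6, 4, 2]))           # -1.0
print(round(correlation_coefficient(x, [2, 1, 4, 3, 5]), 2))  # 0.8
```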