Five-Number Summary: (1) Smallest Value, (2) First Quartile, (3) Median, (4) Third Quartile, (5) Largest Value
Five-Number Summary: Smallest Value = 425, First Quartile = 445, Median = 475, Third Quartile = 525, Largest Value = 615
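As a rough illustration of how a five-number summary might be computed, here is a minimal Python sketch. The function name `five_number_summary` and the data are hypothetical (this is not the apartment rent sample), and the "inclusive" quartile convention used here is only one of several, so hand-computed quartiles may differ slightly.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for at least two values."""
    data = sorted(values)
    # Assumption: method="inclusive" is one of several quartile conventions;
    # textbook hand methods can give slightly different Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Hypothetical data, not the apartment rent sample.
sample = [7, 1, 9, 3, 5, 2, 8, 4, 6]
print(five_number_summary(sample))  # (1, 3.0, 5.0, 7.0, 9)
```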
Box Plot: A box is drawn with its ends located at the first and third quartiles. A vertical line is drawn in the box at the location of the median (second quartile). [Figure: box on an axis running from 375 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525 marked.]
Box Plot: Limits are located (not drawn) using the interquartile range (IQR). Data outside these limits are considered outliers. The location of each outlier is shown with the symbol *.
Box Plot: The interquartile range is IQR = Q3 - Q1 = 525 - 445 = 80. The lower limit is located 1.5(IQR) below Q1. Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325. The upper limit is located 1.5(IQR) above Q3. Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645. There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
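The limit rule can be sketched in a few lines of Python. The function names `box_plot_limits` and `find_outliers` are hypothetical, and the inputs are the quartiles and extreme values from the five-number summary (Q1 = 445, Q3 = 525, smallest 425, largest 615).

```python
def box_plot_limits(q1, q3, k=1.5):
    """Return (lower_limit, upper_limit), located k * IQR beyond the quartiles."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the box plot limits."""
    lower, upper = box_plot_limits(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Quartiles and extreme values taken from the five-number summary.
lower, upper = box_plot_limits(445, 525)
print(lower, upper)
print(find_outliers([425, 615], 445, 525))  # an empty list means no outliers
```

Checking only the smallest and largest values is enough here: if neither extreme falls outside the limits, no other value can.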
Box Plot: Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits. [Figure: box plot on an axis running from 375 to 625, with whiskers extending to the smallest value inside the limits (425) and the largest value inside the limits (615).]
Measures of Association Between Two Variables: covariance; correlation coefficient
Covariance: Covariance is a measure of the linear association between two variables. A positive value indicates a positive correlation between the variables. A negative value indicates a negative correlation between the variables.
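To compute a covariance for variables x and y: For populations: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$. For samples: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$.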
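A minimal Python sketch of the two covariance computations follows: the sum of the products of the deviations from the means, divided by N for a population or by n - 1 for a sample. The function names and the data are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def _sum_of_cross_products(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over paired observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def population_covariance(xs, ys):
    """Covariance when the data are the whole population (divide by N)."""
    return _sum_of_cross_products(xs, ys) / len(xs)

def sample_covariance(xs, ys):
    """Covariance when the data are a sample (divide by n - 1)."""
    return _sum_of_cross_products(xs, ys) / (len(xs) - 1)

# Hypothetical data: y falls as x rises, so the covariance is negative.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
print(population_covariance(xs, ys))  # -4.0
print(sample_covariance(xs, ys))      # -5.0
```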
[Figure: scatter diagram of the sample (n = 299), divided into quadrants I, II, III, and IV.]
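If the majority of the sample points are located in quadrants II and IV, the variables are negatively correlated, as they are in this case. In those quadrants the deviations of x and y from their means have opposite signs, so the products of the deviations are negative. Thus the covariance will have a negative sign.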
The (Pearson) Correlation Coefficient: A covariance will tell you whether two variables are positively or negatively correlated, but it will not tell you the degree of correlation. Moreover, the covariance is sensitive to the unit of measurement. The correlation coefficient does not suffer from these defects.
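The (Pearson) Correlation Coefficient: For populations: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$. For samples: $r_{xy} = \frac{s_{xy}}{s_x s_y}$. Note that: $-1 \le r_{xy} \le 1$.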
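The sketch below computes a sample correlation coefficient as the covariance divided by the product of the two standard deviations, and illustrates the point about units: rescaling x changes the covariance but not the correlation. The function name `sample_correlation` and the data are hypothetical.

```python
import math

def sample_correlation(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The (n - 1) divisors of the covariance and of the two standard
    # deviations cancel, so only the raw sums are needed.
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = sample_correlation(xs, ys)

# Re-expressing x in different units (for example meters to centimeters)
# multiplies the covariance by 100 but leaves the correlation unchanged.
r_rescaled = sample_correlation([100 * x for x in xs], ys)
print(r, r_rescaled)
```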
I have 7 hours per week for exercise
Example: Golf Stats: A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score. Average Driving Distance (yds.): 277.6, 259.5, 269.1, 267.0, 255.6, 272.9. Average 18-Hole Score: 69, 71, 70
Using Excel to Compute the Covariance and Correlation Coefficient: [Figures: formula worksheet and value worksheet.]
The Weighted Mean and Working with Grouped Data: mean for grouped data; variance for grouped data; standard deviation for grouped data
GPA Example: A grade point average is a weighted mean. That is, 4-hour courses are weighted more than 3-hour courses when computing a GPA.
The Weighted Mean: $\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$, where $x_i$ is the value of observation $i$ and $w_i$ is the weight attached to observation $i$.
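Example: Raw Materials
Purchase   Cost per Pound ($)   Number of Pounds
1          3.00                 1200
2          3.40                 500
3          2.80                 2750
4          2.90                 1000
5          3.25                 800
Let x1 = 3.00, x2 = 3.40, x3 = 2.80, x4 = 2.90, and x5 = 3.25. Let w1 = 1200, w2 = 500, w3 = 2750, w4 = 1000, and w5 = 800. Thus: weighted mean = [1200(3.00) + 500(3.40) + 2750(2.80) + 1000(2.90) + 800(3.25)] / (1200 + 500 + 2750 + 1000 + 800) = 18,500/6,250 = 2.96, that is, $2.96 per pound.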
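The raw materials computation can be checked with a short Python sketch. The function name `weighted_mean` is hypothetical; the costs and weights are the five purchases listed above.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

costs = [3.00, 3.40, 2.80, 2.90, 3.25]   # cost per pound ($)
pounds = [1200, 500, 2750, 1000, 800]    # number of pounds (the weights)

print(round(weighted_mean(costs, pounds), 2))  # 2.96
print(round(sum(costs) / len(costs), 2))       # 3.07, the unweighted mean
```

The unweighted mean of the five prices is higher because it ignores that the largest purchase was made at the lowest price.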
Grouped Data The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data. To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class. We compute a weighted mean of the class midpoints using the class frequencies as weights. Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.
Mean for Grouped Data: For populations: $\mu = \frac{\sum f_i M_i}{N}$. For samples: $\bar{x} = \frac{\sum f_i M_i}{n}$. Where $f_i$ is the frequency of class $i$ and $M_i$ is the midpoint of class $i$.
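A minimal sketch of the grouped-data mean in Python: each class midpoint is weighted by its class frequency. The function name `grouped_mean` and the three-class frequency distribution are hypothetical and are not the apartment rent data.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate the mean by weighting class midpoints by class frequencies."""
    if len(midpoints) != len(frequencies):
        raise ValueError("midpoints and frequencies must have the same length")
    n = sum(frequencies)  # total number of observations
    return sum(f * m for m, f in zip(midpoints, frequencies)) / n

# Hypothetical classes 10-19, 20-29, 30-39 with their midpoints and counts.
midpoints = [14.5, 24.5, 34.5]
frequencies = [2, 5, 3]
print(grouped_mean(midpoints, frequencies))  # 25.5
```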
Example: Apartment Rents Given below is the previous sample of monthly rents for 70 studio apartments, presented here as grouped data in the form of a frequency distribution.
Sample Mean for Grouped Data This approximation differs by $2.41 from the actual sample mean of $490.80.
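Variance for Grouped Data: For populations: $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$. For samples: $s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$.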
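The grouped-data variance and standard deviation can be sketched the same way: squared deviations of the class midpoints from the grouped mean, weighted by the class frequencies. The function names and the frequency distribution below are hypothetical, not the apartment rent data.

```python
import math

def grouped_variance(midpoints, frequencies, sample=True):
    """Approximate variance from class midpoints weighted by frequencies."""
    if len(midpoints) != len(frequencies):
        raise ValueError("midpoints and frequencies must have the same length")
    n = sum(frequencies)
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    squared = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
    # Divide by n - 1 for a sample, by N for a population.
    return squared / (n - 1) if sample else squared / n

def grouped_std_dev(midpoints, frequencies, sample=True):
    """Standard deviation is the square root of the grouped variance."""
    return math.sqrt(grouped_variance(midpoints, frequencies, sample))

# Hypothetical classes 10-19, 20-29, 30-39 with their midpoints and counts.
midpoints = [14.5, 24.5, 34.5]
frequencies = [2, 5, 3]
print(grouped_variance(midpoints, frequencies))                # sample
print(grouped_variance(midpoints, frequencies, sample=False))  # population
print(grouped_std_dev(midpoints, frequencies, sample=False))
```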
Sample Variance and Standard Deviation for Grouped Data: Sample variance: s² = 208,234.29/(70 - 1) = 3,017.89. Sample standard deviation: s = √3,017.89 = 54.94. This approximation differs by only $0.20 from the actual standard deviation of $54.74.