Published by Alison Anissa Simon. Modified over 9 years ago.
1
Slides prepared by John S. Loucks, St. Edward’s University
© 2002 South-Western/Thomson Learning
2
Chapter 3, Part B
Descriptive Statistics: Numerical Methods
- Measures of Relative Location and Detecting Outliers
- Exploratory Data Analysis
- Measures of Association Between Two Variables
- The Weighted Mean and Working with Grouped Data
3
Measures of Relative Location and Detecting Outliers
- z-Scores
- Chebyshev’s Theorem
- Empirical Rule
- Detecting Outliers
4
z-Scores
- The z-score is often called the standardized value.
- It denotes the number of standard deviations a data value $x_i$ is from the mean.
- A data value less than the sample mean will have a z-score less than zero.
- A data value greater than the sample mean will have a z-score greater than zero.
- A data value equal to the sample mean will have a z-score of zero.
5
Example: Apartment Rents
- z-Score of Smallest Value (425)
[Table: Standardized Values for Apartment Rents]
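The worked calculation for this slide did not survive extraction. Under the usual definition $z_i = (x_i - \bar{x})/s$, it can be sketched as follows. The sketch assumes the sample mean (490.80) and standard deviation (54.74) quoted on later slides, and the function name `z_score` is illustrative rather than taken from the deck.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Sample statistics quoted on later slides (assumed to apply here).
RENT_MEAN = 490.80
RENT_STD = 54.74

smallest = z_score(425, RENT_MEAN, RENT_STD)  # smallest rent in the sample
largest = z_score(615, RENT_MEAN, RENT_STD)   # largest rent in the sample

print(round(smallest, 2), round(largest, 2))
```

Rounded to two decimals, these agree with the most extreme z-scores reported on the outlier slide (-1.20 and 2.27).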
6
Chebyshev’s Theorem
At least $(1 - 1/k^2)$ of the items in any data set will be within $k$ standard deviations of the mean, where $k$ is any value greater than 1.
- At least 75% of the items must be within $k = 2$ standard deviations of the mean.
- At least 89% of the items must be within $k = 3$ standard deviations of the mean.
- At least 94% of the items must be within $k = 4$ standard deviations of the mean.
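The three percentages above follow directly from the bound $1 - 1/k^2$. A minimal Python sketch of that arithmetic (the function name `chebyshev_lower_bound` is an illustrative choice, not part of the slides):

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of items within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    # The slide rounds these fractions to whole percentages.
    print(k, f"{chebyshev_lower_bound(k):.1%}")
```

The bound holds for any data set, which is why it is weaker than the figures given by the empirical rule for bell-shaped data.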
7
Example: Apartment Rents
- Chebyshev’s Theorem
Let $k = 1.5$ with $\bar{x} = 490.80$ and $s = 54.74$.
At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - k(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + k(s) = 490.80 + 1.5(54.74) = 573$
8
Example: Apartment Rents
- Chebyshev’s Theorem (continued)
Actually, 86% of the rent values are between 409 and 573.
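The interval in this example can be reproduced with a few lines of Python. This is a sketch using the mean, standard deviation and $k$ stated in the example; the helper name `chebyshev_interval` is hypothetical.

```python
def chebyshev_interval(mean, std_dev, k):
    """Interval mean +/- k standard deviations, as (low, high)."""
    half_width = k * std_dev
    return mean - half_width, mean + half_width


k = 1.5
low, high = chebyshev_interval(490.80, 54.74, k)
guaranteed = 1 - 1 / k**2   # minimum fraction inside the interval
observed = 0.86             # fraction reported on the slide

print(round(low), round(high))
print(f"guaranteed at least {guaranteed:.0%}, observed {observed:.0%}")
```

The observed share is well above the guaranteed minimum, which is expected: the theorem gives a floor that applies to every data set, not an estimate for this one.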
9
Empirical Rule
For data having a bell-shaped distribution:
- Approximately 68% of the data values will be within one standard deviation of the mean.
10
Empirical Rule
For data having a bell-shaped distribution:
- Approximately 95% of the data values will be within two standard deviations of the mean.
11
Empirical Rule
For data having a bell-shaped distribution:
- Almost all (99.7%) of the items will be within three standard deviations of the mean.
12
Example: Apartment Rents
- Empirical Rule
Interval                             % in Interval
Within +/- 1s  (436.06 to 545.54)    48/70 = 69%
Within +/- 2s  (381.32 to 600.28)    68/70 = 97%
Within +/- 3s  (326.58 to 655.02)    70/70 = 100%
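The intervals and percentages in this table can be checked with a short Python sketch. It assumes the sample mean 490.80 and standard deviation 54.74 from the Chebyshev example and takes the counts (48, 68 and 70 of 70) from the table; the names used are illustrative.

```python
RENT_MEAN = 490.80
RENT_STD = 54.74
N_RENTS = 70

# Counts of rents inside each interval, as reported in the table.
counts_within = {1: 48, 2: 68, 3: 70}
# Percentages the empirical rule predicts for bell-shaped data.
empirical_rule = {1: 0.68, 2: 0.95, 3: 0.997}


def k_sigma_interval(mean, std_dev, k):
    """Interval mean +/- k standard deviations, as (low, high)."""
    return mean - k * std_dev, mean + k * std_dev


for k, count in counts_within.items():
    low, high = k_sigma_interval(RENT_MEAN, RENT_STD, k)
    observed = count / N_RENTS
    print(f"+/- {k}s: {low:.2f} to {high:.2f}, "
          f"observed {observed:.0%}, rule {empirical_rule[k]:.1%}")
```

The observed shares sit close to the rule's 68%, 95% and 99.7%, which suggests the rent data are roughly bell-shaped.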
13
Detecting Outliers
- An outlier is an unusually small or unusually large value in a data set.
- A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
- It might be an incorrectly recorded data value.
- It might be a data value that was incorrectly included in the data set.
- It might be a correctly recorded data value that belongs in the data set!
14
Example: Apartment Rents
- Detecting Outliers
The most extreme z-scores are -1.20 and 2.27.
Using $|z| > 3$ as the criterion for an outlier, there are no outliers in this data set.
[Table: Standardized Values for Apartment Rents]
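The $|z| > 3$ criterion can be sketched in Python as below. The full rent sample is not reproduced in this transcript, so the data in the sketch are hypothetical and chosen only to show one flagged value; the function name `find_outliers` is likewise an assumption.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=3.0):
    """Return the values whose |z-score| exceeds the threshold."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 divisor)
    return [x for x in values if abs((x - centre) / spread) > threshold]


# Hypothetical data, NOT the apartment rents from the slides.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 50]
print(find_outliers(sample))

# The extreme z-scores reported on the slide fall inside the threshold.
print(any(abs(z) > 3 for z in (-1.20, 2.27)))
```

A flagged value is only a candidate for review. As the preceding slide notes, it may still be a correctly recorded value that belongs in the data set.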
15
Exploratory Data Analysis
- Five-Number Summary
- Box Plot
16
Five-Number Summary
- Smallest Value
- First Quartile
- Median
- Third Quartile
- Largest Value
17
Example: Apartment Rents
- Five-Number Summary
Lowest Value = 425
First Quartile = 450
Median = 475
Third Quartile = 525
Largest Value = 615
18
Box Plot
- A box is drawn with its ends located at the first and third quartiles.
- A vertical line is drawn in the box at the location of the median.
- Limits are located (not drawn) using the interquartile range (IQR).
  The lower limit is located 1.5(IQR) below $Q_1$.
  The upper limit is located 1.5(IQR) above $Q_3$.
  Data outside these limits are considered outliers.
… continued
19
Box Plot (Continued)
- Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
- The location of each outlier is shown with the symbol *.
20
Example: Apartment Rents
- Box Plot
Lower Limit: $Q_1$ - 1.5(IQR) = 450 - 1.5(75) = 337.5
Upper Limit: $Q_3$ + 1.5(IQR) = 525 + 1.5(75) = 637.5
There are no outliers.
[Box plot of the rents on an axis running from 375 to 625]
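The limit calculation for the box plot can be sketched in Python using the quartiles and extreme values from the five-number summary. The function name `box_plot_limits` is illustrative.

```python
def box_plot_limits(q1, q3, multiplier=1.5):
    """Lower and upper limits located multiplier * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


# Quartiles and extremes from the five-number summary.
lower, upper = box_plot_limits(450, 525)
smallest, largest = 425, 615

# If both extremes are inside the limits, no value can be an outlier.
has_outliers = smallest < lower or largest > upper
print(lower, upper, has_outliers)
```

Because the smallest and largest rents both lie inside the limits, the whiskers extend to 425 and 615 and no points are marked with *.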
21
Measures of Association Between Two Variables
- Covariance
- Correlation Coefficient
22
Covariance
- The covariance is a measure of the linear association between two variables.
- Positive values indicate a positive relationship.
- Negative values indicate a negative relationship.
23
Covariance
- If the data sets are samples, the covariance is denoted by $s_{xy}$.
- If the data sets are populations, the covariance is denoted by $\sigma_{xy}$.
24
Correlation Coefficient
- The coefficient can take on values between -1 and +1.
- Values near -1 indicate a strong negative linear relationship.
- Values near +1 indicate a strong positive linear relationship.
- If the data sets are samples, the coefficient is $r_{xy}$.
- If the data sets are populations, the coefficient is $\rho_{xy}$.
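The formulas for these two measures are not reproduced in this transcript. A minimal Python sketch under the standard sample definitions, $s_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})/(n - 1)$ and $r_{xy} = s_{xy}/(s_x s_y)$, is given below. The data and function names are hypothetical and serve only as illustration.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sample covariance s_xy with the n - 1 divisor."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample correlation r_xy = s_xy / (s_x * s_y)."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself is s_x^2
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical paired observations, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y), round(sample_correlation(x, y), 4))
```

The covariance depends on the units of both variables, whereas the correlation coefficient is unit-free and always lies between -1 and +1, which makes it easier to interpret.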
25
The Weighted Mean and Working with Grouped Data
- Weighted Mean
- Mean for Grouped Data
- Variance for Grouped Data
- Standard Deviation for Grouped Data
26
Weighted Mean
- When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
- In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
- When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.
27
Weighted Mean
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where:
$x_i$ = value of observation $i$
$w_i$ = weight for observation $i$
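The GPA case mentioned earlier makes a convenient illustration of the weighted mean formula. The sketch below uses hypothetical grades and credit hours; `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)
```

With equal weights the function reduces to the ordinary mean, so the simple mean can be seen as a special case.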
28
Grouped Data
- The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data.
- To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
- We compute a weighted mean of the class midpoints using the class frequencies as weights.
- Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.
29
Mean for Grouped Data
- Sample Data: $\bar{x} = \frac{\sum f_i M_i}{n}$
- Population Data: $\mu = \frac{\sum f_i M_i}{N}$
where:
$f_i$ = frequency of class $i$
$M_i$ = midpoint of class $i$
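The grouped-data mean is a weighted mean of the class midpoints $M_i$ with the class frequencies $f_i$ as weights. A minimal Python sketch follows. The frequency distribution in it is hypothetical (the deck's own rent table is not reproduced in this transcript), and the function name is an assumption.

```python
def grouped_mean(classes):
    """Approximate mean from (lower limit, upper limit, frequency) classes."""
    n = sum(freq for _, _, freq in classes)
    # Each class is represented by its midpoint, weighted by its frequency.
    weighted_sum = sum(freq * (lower + upper) / 2
                       for lower, upper, freq in classes)
    return weighted_sum / n


# Hypothetical frequency distribution, NOT the apartment rent data.
distribution = [
    (0, 9, 4),     # midpoint 4.5
    (10, 19, 10),  # midpoint 14.5
    (20, 29, 6),   # midpoint 24.5
]
print(grouped_mean(distribution))
```

The result is an approximation because every item in a class is treated as if it equalled the class midpoint.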
30
Example: Apartment Rents
Given below is the previous sample of monthly rents for one-bedroom apartments presented here as grouped data in the form of a frequency distribution.
31
Example: Apartment Rents
- Mean for Grouped Data
This approximation differs by $2.41 from the actual sample mean of $490.80.
32
Variance for Grouped Data
- Sample Data: $s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
- Population Data: $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$
33
Example: Apartment Rents
- Variance for Grouped Data
- Standard Deviation for Grouped Data
This approximation differs by only $0.20 from the actual standard deviation of $54.74.
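The grouped variance and standard deviation weight each squared deviation of a class midpoint from the grouped mean by the class frequency, dividing by $n - 1$ for a sample or $N$ for a population. The Python sketch below uses hypothetical midpoints and frequencies, since the rent frequency table is not reproduced in this transcript; the function name and the `sample` flag are assumptions.

```python
from math import sqrt


def grouped_variance(midpoints, freqs, sample=True):
    """Approximate variance from class midpoints and class frequencies."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    squared_dev = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))
    # Sample data divide by n - 1, population data by N.
    return squared_dev / (n - 1 if sample else n)


# Hypothetical classes, NOT the apartment rent data.
midpoints = [4.5, 14.5, 24.5]
freqs = [4, 10, 6]

sample_var = grouped_variance(midpoints, freqs)
sample_std = sqrt(sample_var)
population_var = grouped_variance(midpoints, freqs, sample=False)
print(round(sample_var, 2), round(sample_std, 2), population_var)
```

As with the grouped mean, these are approximations whose accuracy depends on how well the midpoints represent the items in each class.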
34
End of Chapter 3, Part B