Chapter 3, Part 1 Descriptive Statistics II: Numerical Methods

Name: Chapter 3, Part 1 Descriptive Statistics II: Numerical Methods
Uploaded: 2017-12-15T10:42:36+00:00
Duration: PTM22S39
Description: Chapter 3, Part 1 Descriptive Statistics II: Numerical Methods

Measures of Location
Measures of Variability

Measures of Location: Mean, Median, Mode, Percentiles

Example: Apartment Rents
Given below are monthly rent values ($) for a sample of 70 one-bedroom apartments in a particular city. The data are presented in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Mean
The mean of a data set is the average of all the data values. If the data are from a sample, the mean is denoted by \(\bar{x}\) (x-bar). If the data are from a population, the mean is denoted by \(\mu\) (mu).
\[ \bar{x} = \frac{\sum x_i}{n} \qquad \mu = \frac{\sum x_i}{N} \]

Mean
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80 \]
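The sample mean can be checked with a few lines of Python. This is only a sketch: the `COUNTS` table and the variable names are illustrative, not part of the slides. The table just re-encodes the 70 listed rents as (value, count) pairs.

```python
# (rent, count) pairs re-encoding the 70 rents listed in the example
COUNTS = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
rents = [value for value, count in COUNTS for _ in range(count)]

n = len(rents)        # sample size
total = sum(rents)    # sum of the x_i
x_bar = total / n     # sample mean
print(n, total, round(x_bar, 2))
```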

Trimmed Mean
With n = 70, a 5% trimmed mean removes .05(70) = 3.5, rounded up to 4, values from each end of the ordered data set, leaving 62 values.
5% trimmed mean = 30,206/62 = 487.19
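A trimmed mean along these lines might be sketched as follows. The helper name `trimmed_mean` is hypothetical, and rounding the trim count up simply mirrors the 3.5 to 4 step in the slide; other texts and libraries use different rounding conventions.

```python
import math

# (rent, count) pairs re-encoding the 70 rents listed in the example
COUNTS = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
rents = [value for value, count in COUNTS for _ in range(count)]

def trimmed_mean(values, percent):
    """Mean after dropping `percent` percent of the values from each end."""
    data = sorted(values)
    # Number of values to drop per end, rounded up as in the slide (3.5 -> 4).
    k = math.ceil(percent * len(data) / 100)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

print(round(trimmed_mean(rents, 5), 2))
```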

Median The median of a data set is the value in the middle when the data items are arranged in ascending order. If there is an odd number of items, the median is the value of the middle item. If there is an even number of items, the median is the average of the values for the middle two items.

Median
Median = 50th percentile. i = (p/100)n = (50/100)70 = 35. Because i is an integer, average the 35th and 36th data values: Median = (475 + 475)/2 = 475

Mode The mode of a data set is the value that occurs with greatest frequency.

Mode
450 occurred most frequently (7 times). Mode = 450
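Both results can be confirmed with Python's standard `statistics` module. The sketch below assumes the same illustrative `COUNTS` encoding of the 70 rents; `collections.Counter` is used only to show how often the mode occurs.

```python
from collections import Counter
from statistics import median, mode

# (rent, count) pairs re-encoding the 70 rents listed in the example
COUNTS = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
rents = [value for value, count in COUNTS for _ in range(count)]

med = median(rents)              # averages the two middle values when n is even
most_common = mode(rents)        # value with the greatest frequency
times = Counter(rents)[most_common]
print(med, most_common, times)
```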

Percentiles
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
Arrange the data in ascending order.
Compute index i, the position of the pth percentile: i = (p/100)n
If i is not an integer, round up. The pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.

90th Percentile
i = (p/100)n = (90/100)70 = 63. Because i is an integer, average the 63rd and 64th data values: 90th Percentile = (580 + 590)/2 = 585

Quartiles Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile

Third Quartile
Third quartile = 75th percentile. i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53. Third quartile = 525 (the 53rd data value)
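The index rule used for the percentiles and quartiles above can be sketched in Python. The function name `percentile` and the restriction to whole-number p strictly between 0 and 100 are assumptions made for this illustration. Library routines such as those in NumPy use interpolation rules by default, so their results can differ from this textbook procedure.

```python
# (rent, count) pairs re-encoding the 70 rents listed in the example
COUNTS = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
rents = [value for value, count in COUNTS for _ in range(count)]

def percentile(values, p):
    """p-th percentile by the index rule i = (p/100)n, for integer 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)
    # Integer arithmetic avoids floating-point surprises when testing
    # whether i = p * n / 100 is a whole number.
    whole, remainder = divmod(p * len(data), 100)
    if remainder:                       # i is not an integer: round up
        return data[whole]              # position whole + 1, zero-based index whole
    # i is an integer: average positions i and i + 1
    return (data[whole - 1] + data[whole]) / 2

print(percentile(rents, 50), percentile(rents, 75), percentile(rents, 90))
```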

Measures of Variability
Range, Variance, Standard Deviation, Coefficient of Variation

Range The range of a data set is the difference between the largest and smallest data values. It is the simplest measure of dispersion. It is very sensitive to the smallest and largest data values.

Range
Range = largest value - smallest value = 615 - 425 = 190

Variance
The variance is the average of the squared differences between each data value and the mean. If the data set is a sample, the variance is denoted by \(s^2\). If the data set is a population, the variance is denoted by \(\sigma^2\).
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \qquad \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]

Standard Deviation
The standard deviation of a data set is the positive square root of the variance. It is measured in the same units as the data, making it more easily comparable to the mean. If the data set is a sample, the standard deviation is denoted \(s\). If the data set is a population, the standard deviation is denoted \(\sigma\) (sigma).
\[ s = \sqrt{s^2} \qquad \sigma = \sqrt{\sigma^2} \]

Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean.
If the data set is a sample, the coefficient of variation is computed as follows: \(\left(\frac{s}{\bar{x}} \times 100\right)\%\)
If the data set is a population, the coefficient of variation is computed as follows: \(\left(\frac{\sigma}{\mu} \times 100\right)\%\)

Variance: \[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16 \]
Standard Deviation: \[ s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74 \]
Coefficient of Variation: \[ \left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\% \]
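The four measures of variability for the rent sample can be recomputed directly from the definitions. The sketch below uses the illustrative `COUNTS` encoding of the 70 rents and spells out the n - 1 divisor rather than relying on a library call.

```python
import math

# (rent, count) pairs re-encoding the 70 rents listed in the example
COUNTS = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
rents = [value for value, count in COUNTS for _ in range(count)]

n = len(rents)
x_bar = sum(rents) / n
data_range = max(rents) - min(rents)                  # largest - smallest
s2 = sum((x - x_bar) ** 2 for x in rents) / (n - 1)   # sample variance
s = math.sqrt(s2)                                     # sample standard deviation
cv = s / x_bar * 100                                  # coefficient of variation, in %
print(data_range, round(s2, 2), round(s, 2), round(cv, 2))
```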

Measures of Relative Location and Locating Outliers: z-Scores, Chebyshev’s Theorem, The Empirical Rule, Detecting Outliers

z-Scores
The z-score is often called the standardized value. It denotes the number of standard deviations a data value \(x_i\) is from the mean.
\[ z_i = \frac{x_i - \bar{x}}{s} \]
A data value less than the sample mean will have a z-score less than zero. A data value greater than the sample mean will have a z-score greater than zero. A data value equal to the sample mean will have a z-score of zero.

z-Score of Smallest Value (425)
\[ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 \]
Standardized Values for Apartment Rents (one entry per distinct rent value, in ascending order)
-1.20 -1.11 -1.02 -0.93 -0.84 -0.75 -0.56 -0.47 -0.38
-0.34 -0.29 -0.20 -0.11 -0.01 0.17 0.35 0.44 0.62
0.81 1.06 1.08 1.45 1.54 1.63 1.81 1.99 2.27
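A short Python sketch of the standardization follows. It uses `statistics.mean` and `statistics.stdev` (the sample standard deviation) on the illustrative `COUNTS` encoding of the rents. Because it works with unrounded values of the mean and standard deviation, individual z-scores could in principle differ in the last digit from hand calculations based on 490.80 and 54.74.

```python
from statistics import mean, stdev

# (rent, count) pairs re-encoding the 70 rents listed in the example
COUNTS = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
rents = [value for value, count in COUNTS for _ in range(count)]

x_bar = mean(rents)
s = stdev(rents)                        # sample standard deviation (n - 1 divisor)
z = [(x - x_bar) / s for x in rents]    # one z-score per apartment

print(round(z[0], 2), round(z[-1], 2))  # smallest and largest rent
```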

Chebyshev’s Theorem
At least \(1 - 1/k^2\) of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1.
At least 75% of the items must be within k = 2 standard deviations of the mean.
At least 89% of the items must be within k = 3 standard deviations of the mean.
At least 94% of the items must be within k = 4 standard deviations of the mean.

Chebyshev’s Theorem
Let k = 1.5 with \(\bar{x}\) = 490.80 and s = 54.74.
At least \(1 - 1/(1.5)^2\) = 1 - 0.44 = 0.56, or 56%, of the rent values must be between
\(\bar{x} - k(s)\) = 490.80 - 1.5(54.74) = 409 and \(\bar{x} + k(s)\) = 490.80 + 1.5(54.74) = 573

Chebyshev’s Theorem (continued) Actually, 86% of the rent values are between 409 and 573.
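The gap between the guaranteed 56% and the observed 86% can be checked numerically. This is a sketch using the illustrative `COUNTS` encoding of the rents and the unrounded sample mean and standard deviation.

```python
from statistics import mean, stdev

# (rent, count) pairs re-encoding the 70 rents listed in the example
COUNTS = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
rents = [value for value, count in COUNTS for _ in range(count)]

k = 1.5
x_bar, s = mean(rents), stdev(rents)
bound = 1 - 1 / k ** 2                        # Chebyshev's lower bound on the share
lower, upper = x_bar - k * s, x_bar + k * s   # interval of +/- k standard deviations
inside = sum(lower <= x <= upper for x in rents)

print(f"guaranteed: at least {bound:.0%}")
print(f"observed: {inside}/{len(rents)} = {inside / len(rents):.0%}")
```

The theorem holds for any data set, so its bound is conservative; the observed share is usually well above it.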

Empirical Rule For data having a bell-shaped distribution:
Approximately 68% of the data values will be within one standard deviation of the mean.
Approximately 95% of the data values will be within two standard deviations of the mean.
Almost all of the items (about 99.7%) will be within three standard deviations of the mean.

Empirical Rule
Interval          Range                 % in Interval
Within +/- 1s     436.06 to 545.54      48/70 = 69%
Within +/- 2s     381.32 to 600.28      68/70 = 97%
Within +/- 3s     326.58 to 655.02      70/70 = 100%
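The interval counts for the rent data can be tallied with a short loop. This sketch assumes the illustrative `COUNTS` encoding of the 70 rents and uses the unrounded mean and sample standard deviation; `within` is a hypothetical name for the result table.

```python
from statistics import mean, stdev

# (rent, count) pairs re-encoding the 70 rents listed in the example
COUNTS = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
rents = [value for value, count in COUNTS for _ in range(count)]

x_bar, s = mean(rents), stdev(rents)
within = {}                                   # k -> number of rents within k std devs
for k in (1, 2, 3):
    lower, upper = x_bar - k * s, x_bar + k * s
    within[k] = sum(lower <= x <= upper for x in rents)
    print(f"+/- {k}s: {within[k]}/{len(rents)} = {within[k] / len(rents):.0%}")
```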

Detecting Outliers
An outlier is an unusually small or unusually large value in a data set. A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
It might be an incorrectly recorded data value.
It might be a data value that was incorrectly included in the data set.
It might be a correctly recorded data value that belongs in the data set!

Detecting Outliers
The most extreme z-scores are -1.20 and 2.27. Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
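The |z| > 3 screen is easy to express in code. The sketch below applies it to the illustrative `COUNTS` encoding of the rents; the cutoff of 3 is the criterion from the slide, not a universal rule.

```python
from statistics import mean, stdev

# (rent, count) pairs re-encoding the 70 rents listed in the example
COUNTS = [(425, 1), (430, 2), (435, 5), (440, 5), (445, 5), (450, 7), (460, 3),
          (465, 3), (470, 2), (472, 1), (475, 3), (480, 4), (485, 1), (490, 3),
          (500, 4), (510, 2), (515, 1), (525, 3), (535, 1), (549, 1), (550, 1),
          (570, 2), (575, 2), (580, 1), (590, 1), (600, 4), (615, 2)]
rents = [value for value, count in COUNTS for _ in range(count)]

x_bar, s = mean(rents), stdev(rents)
# Flag any rent whose z-score is more than 3 standard deviations from the mean.
outliers = [x for x in rents if abs((x - x_bar) / s) > 3]
print(outliers)
```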

Measures of Association Between Two Variables
Working with Grouped Data

Measures of Association Between Two Variables
Covariance, Correlation Coefficient

Covariance
Positive values indicate a positive relationship. Negative values indicate a negative relationship.
If the data sets are samples, the covariance is denoted by \(s_{xy}\). If the data sets are populations, the covariance is denoted by \(\sigma_{xy}\).

Correlation Coefficient
The coefficient can take on values between -1 and +1. Values near -1 indicate a strong negative linear relationship. Values near +1 indicate a strong positive linear relationship.
If the data sets are samples, the coefficient is denoted by \(r_{xy}\). If the data sets are populations, the coefficient is denoted by \(\rho_{xy}\).
\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]
where \(s_x\) and \(s_y\) are the sample standard deviations of the two variables.
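The slides give no paired data for these two measures, so the sketch below uses a small made-up data set (`x` and `y` are hypothetical) purely to illustrate the standard sample formulas: the covariance as the sum of cross-deviations divided by n - 1, and the correlation as the covariance divided by the product of the two sample standard deviations.

```python
import math

# Hypothetical paired observations, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of cross-deviations divided by n - 1
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation coefficient
r_xy = s_xy / (s_x * s_y)
print(round(s_xy, 2), round(r_xy, 4))
```

Unlike the covariance, the correlation coefficient does not depend on the units of measurement, which is why it is easier to compare across data sets.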

Mean for Grouped Data
Sample Data: \[ \bar{x} = \frac{\sum f_i M_i}{n} \]
Population Data: \[ \mu = \frac{\sum f_i M_i}{N} \]
where \(f_i\) = frequency of class i and \(M_i\) = midpoint of class i

Given below is the previous sample of monthly rents for one-bedroom apartments presented as grouped data in the form of a frequency distribution.

Example: Apartment Rent
Mean for Grouped Data
This approximation differs by $2.41 from the actual sample mean of $490.80.

Variance for Grouped Data
Sample Data: \[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]
Population Data: \[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]

Sample Variance for Grouped Data: \(s^2 = 3{,}017.89\)
Sample Standard Deviation for Grouped Data: \(s = \sqrt{3{,}017.89} = 54.94\)
This approximation differs by only $.20 from the actual standard deviation of $54.74.
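The grouped-data formulas can be sketched in Python as below. The frequency table itself did not survive in this transcript, so the `CLASSES` table is an assumption: it was obtained by tallying the 70 listed rents into classes of width 20 starting at 420. Under that assumption the results agree with the figures quoted above.

```python
import math

# Assumed frequency distribution: (lower limit, upper limit, frequency).
# Built by tallying the 70 listed rents; the class limits are an assumption.
CLASSES = [(420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
           (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
           (580, 599, 2), (600, 619, 6)]

n = sum(f for _, _, f in CLASSES)
# Pair each class midpoint M_i with its frequency f_i.
mids = [((low + high) / 2, f) for low, high, f in CLASSES]

x_bar_g = sum(f * m for m, f in mids) / n                        # grouped mean
s2_g = sum(f * (m - x_bar_g) ** 2 for m, f in mids) / (n - 1)    # grouped variance
s_g = math.sqrt(s2_g)                                            # grouped std dev
print(round(x_bar_g, 2), round(s2_g, 2), round(s_g, 2))
```

Treating every value in a class as if it sat at the class midpoint is what makes these results approximations of the ungrouped statistics.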
