Descriptive Statistics: Numerical Measures


1 Descriptive Statistics: Numerical Measures
Measures of Shape of a Distribution, Relative Location, and Outliers
Measures of Association between Two Variables
Weighted Mean and Grouped Data

2 Shape of a Distribution, Relative Location, and Outliers
z-Scores (Standardized Values)
Chebyshev’s Theorem
Empirical Rule
Detecting Outliers

3 Distribution Shape: Skewness
An important measure of the shape of a distribution is called skewness. The formula for computing the skewness of a data set is somewhat complex.

4 Distribution Shape: Skewness
Skewness (S) is a measure of the asymmetry of a probability distribution.
S = 0: the distribution is symmetric
S > 0: the distribution is right (positively) skewed
S < 0: the distribution is left (negatively) skewed
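The slides do not give the skewness formula. As a minimal sketch, the Python below uses one common definition, the adjusted sample skewness n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations. That choice of formula, the function name `sample_skewness`, and the three small data sets are assumptions made for illustration only.

```python
import statistics


def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


# Hypothetical data sets, chosen only to show the sign of S.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]          # long tail to the right
left_skewed = [-x for x in right_skewed]    # mirror image: long tail to the left

print(sample_skewness(symmetric))     # approximately zero
print(sample_skewness(right_skewed))  # positive
print(sample_skewness(left_skewed))   # negative
```

Mirroring a data set flips the sign of S, which matches the left/right interpretation given on the slide.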

5 Distribution Shape: Skewness
Symmetric (not skewed): skewness is zero, and the mean and median are equal.
[Figure: relative frequency histogram of a symmetric distribution, skewness = 0]

6 Distribution Shape: Skewness
Moderately skewed left: skewness is negative, and the mean will usually be less than the median.
[Figure: relative frequency histogram of a moderately left-skewed distribution]

7 Distribution Shape: Skewness
Highly skewed right: skewness is positive (often above 1.0), and the mean will usually be more than the median.
[Figure: relative frequency histogram of a highly right-skewed distribution, skewness = 1.25]

8 Distribution Shape: Skewness
Example: Apartment Rents
The monthly rents for 70 randomly selected efficiency apartments are listed in ascending order on the next slide.

9 Distribution Shape: Skewness

10 Distribution Shape: Skewness
[Figure: relative frequency histogram of the apartment rents, skewness = .92]

11 z-Score (Standardized Value)
The z-score is a measure of the relative location of an observation in a data set. It denotes the number of standard deviations a given data value x_i lies from the mean:
\[ z_i = \frac{x_i - \bar{x}}{s} \]
For this reason, the z-score is also called a standardized value.

12 z-Score (Standardized Value)
A data value less than the sample mean will always have a z-score less than zero. A data value greater than the sample mean will always have a z-score greater than zero. A data value equal to the sample mean will always have a z-score of zero.

13 Recall: Rent for 70 Efficiency Apartments
To convert these data into z-scores, we first compute the mean and the standard deviation of the data, then apply the z-score formula to each value. For the apartment rents, the sample mean is 490.80 and the sample standard deviation is s = 54.74.

14 Standardized Values for Apartment Rents
z-score of the smallest value (425):
\[ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 \]
[Table: Standardized Values for Apartment Rents]
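The standardization step can be sketched in a few lines of Python. The mean (490.80), standard deviation (54.74), and the smallest and largest rents (425 and 615) are taken from the slides; the function name `z_score` is an illustrative choice.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


SAMPLE_MEAN = 490.80   # sample mean of the 70 rents, as quoted on the slides
SAMPLE_STD = 54.74     # sample standard deviation, as quoted on the slides

z_smallest = z_score(425, SAMPLE_MEAN, SAMPLE_STD)
z_largest = z_score(615, SAMPLE_MEAN, SAMPLE_STD)

print(round(z_smallest, 2))  # about -1.20: below the mean, so negative
print(round(z_largest, 2))   # about 2.27: above the mean, so positive
```

A value equal to the mean gives a z-score of exactly zero, which is the third case listed on the earlier slide.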

15 Chebyshev’s Theorem
Chebyshev’s theorem gives the minimum proportion of observations in any data set that lie within a given number of standard deviations of the mean.

16 The theorem states that, for any data set:
At least 75% of the data values must be within z = 2 standard deviations of the mean.
At least 89% of the data values must be within z = 3 standard deviations of the mean.
At least 94% of the data values must be within z = 4 standard deviations of the mean.
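The three percentages on this slide are instances of the general Chebyshev bound: for any z greater than 1, at least 1 - 1/z^2 of the data values lie within z standard deviations of the mean. A minimal Python sketch (the function name is an illustrative choice):

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("the bound is only informative for z > 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    # Prints the bound as a percentage: 75%, about 89%, about 94%.
    print(z, f"{chebyshev_lower_bound(z):.2%}")
```

Because the bound holds for any distribution, it is deliberately conservative; bell-shaped data concentrate much more tightly around the mean.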

17 Empirical Rule
For data with a bell-shaped distribution:
68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
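These coverage figures can be checked numerically. For a normal random variable, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). The sketch below uses only Python's standard library. The slide's 68.26%, 95.44%, and 99.72% appear to come from doubling four-decimal table entries, so direct evaluation differs from them by about 0.01 percentage point.

```python
import math


def normal_coverage(k):
    """P(|Z| < k) for a standard normal Z, i.e. the share within k std. deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Roughly 68.27%, 95.45%, and 99.73%.
    print(k, f"{normal_coverage(k):.2%}")
```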

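18 Empirical Rule
[Figure: normal curve over x, centered at μ, showing 68.26% of values between μ – 1σ and μ + 1σ, 95.44% within two standard deviations, and 99.72% between μ – 3σ and μ + 3σ]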

19 Z-Scores allow us to Detect Outliers
An outlier is an unusually small or unusually large value in a data set. It might be:
an incorrectly recorded data value
a data value that was incorrectly included in the data set
a correctly recorded data value that is simply an unusual occurrence
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

20 Standardized Values for Apartment Rents
Detecting Outliers
The most extreme z-scores are -1.20 and 2.27. Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
[Table: Standardized Values for Apartment Rents]
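A short Python sketch of the |z| > 3 screen follows. The apartment rents contain no outliers, so the example uses a small hypothetical data set (`readings`) with one deliberately extreme value; the function name and the data are assumptions for illustration.

```python
import statistics


def flag_outliers(data, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    if s == 0:
        return []  # all values identical, so nothing can be an outlier
    return [x for x in data if abs((x - mean) / s) > threshold]


# Hypothetical data: twenty values near 50 plus one extreme value.
readings = [48, 49, 50, 51, 52] * 4 + [120]
print(flag_outliers(readings))  # only the value 120 is flagged
```

In very small samples no z-score can reach 3, because a single extreme value also inflates the standard deviation. The screen therefore only makes sense with a reasonable number of observations.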

21 Exploratory Data Analysis
Five-Number Summary and Box Plot

22 Five-Number Summary
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value

23 Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

24 Box Plot
A box is drawn with its ends located at the first and third quartiles. A vertical line is drawn in the box at the location of the median (second quartile).
[Figure: box plot on an axis from 375 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525]

25 Box Plot
Lower and upper limits are located (not drawn) using the interquartile range (IQR). Data outside these limits are considered outliers. The location of each outlier is shown with the symbol *.

26 Box Plot
The lower limit is located 1.5(IQR) below Q1, where IQR = Q3 - Q1 = 525 - 445 = 80.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
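The limit calculation can be sketched in Python using the quartiles from the five-number summary (Q1 = 445, Q3 = 525) and the smallest and largest rents (425 and 615). The function name `box_plot_limits` and the parameter `k` are illustrative choices.

```python
def box_plot_limits(q1, q3, k=1.5):
    """Lower and upper box-plot limits, located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = box_plot_limits(445, 525)
print(lower, upper)  # limits computed from IQR = 525 - 445 = 80

# The smallest and largest rents both fall inside the limits.
smallest, largest = 425, 615
has_outliers = smallest < lower or largest > upper
print(has_outliers)
```

Checking only the two extreme values is enough here: if neither the minimum nor the maximum lies outside the limits, no other value can.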

27 Box Plot
Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
[Figure: box plot on an axis from 375 to 625, with whiskers extending to the smallest value inside the limits (425) and the largest value inside the limits (615)]

28 Measures of Association between Two Variables
Covariance
Correlation Coefficient

29 Covariance

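30 Covariance
The covariance between two random variables (X and Y) is computed as follows:
for samples: \[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]
for populations: \[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \]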

31 Covariance
The covariance is a measure of the direction of movement and linear association between two variables. Positive values indicate a positive relationship. Negative values indicate a negative relationship.

32 Covariance
Use the following data on the age and weight of five randomly selected UMD students to compute the covariance between age and weight.

Age    Weight
17     160
25     150
20     130
19     145
21     155
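One way to work this exercise is sketched below in Python, using the sample covariance (sum of cross-deviations divided by n - 1) and the five age and weight pairs from the slide. The function name `sample_covariance` is an illustrative choice.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)


ages = [17, 25, 20, 19, 21]
weights = [160, 150, 130, 145, 155]

# Means are 20.4 and 148; the cross-deviations sum to about -16,
# so the sample covariance is about -16 / 4 = -4.0.
print(sample_covariance(ages, weights))
```

The negative sign indicates that, in this small sample, weight tends to fall slightly as age rises. The size of the covariance depends on the units of both variables, which is the motivation for the correlation coefficient.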

33 Correlation Coefficient

34 Correlation Coefficient
Correlation is a measure of the degree of linear association between two variables. However, it does not indicate causation. That is, just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

35 Correlation Coefficient
The correlation coefficient is computed as follows:
for samples: \[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]
for populations: \[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \]

36 Correlation Coefficient
The coefficient can take on values between -1 and +1. Values near -1 indicate a strong negative linear relationship. Values near +1 indicate a strong positive linear relationship.

37 Correlation
Use the following data on the age and weight of five randomly selected UMD students to compute the correlation between age and weight, and comment on the direction and extent of the association observed.

Age    Weight
17     160
25     150
20     130
19     145
21     155
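A Python sketch of this exercise follows. It computes the sample correlation as the sample covariance divided by the product of the two sample standard deviations. The function name `sample_correlation` is an illustrative choice; the data are the five pairs from the slide.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation: covariance divided by the product of the std. deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / (s_x * s_y)


ages = [17, 25, 20, 19, 21]
weights = [160, 150, 130, 145, 155]

r = sample_correlation(ages, weights)
print(round(r, 3))  # about -0.117
```

A value this close to zero suggests only a very weak negative linear association between age and weight in this sample.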

38 Correlation Coefficient
The correlation coefficient is computed as follows:
for samples: \[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]

39 Covariance and Correlation Coefficient
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)    Average 18-Hole Score
277.6                              69
259.5                              71
269.1                              70
267.0                              70
255.6                              71
272.9                              69

40 Covariance and Correlation Coefficient
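x           y       (xi – x̄)    (yi – ȳ)    (xi – x̄)(yi – ȳ)
277.6       69       10.65       -1.0        -10.65
259.5       71       -7.45        1.0         -7.45
269.1       70        2.15        0.0          0.00
267.0       70        0.05        0.0          0.00
255.6       71      -11.35        1.0        -11.35
272.9       69        5.95       -1.0         -5.95
Average     267.0 (x)    70.0 (y)
Std. Dev.   8.2192 (x)   .8944 (y)
Total of (xi – x̄)(yi – ȳ): -35.40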

41 Covariance and Correlation Coefficient
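Sample Covariance
\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08 \]
Sample Correlation Coefficient
\[ r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631 \]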

42 Mean, Variance, and Standard Deviation of Grouped Data
Weighted Mean
Mean, Variance, and Standard Deviation of Grouped Data

43 Weighted Mean Example: Semester GPA
You are taking five courses. The following table shows the credit hours associated with each course and your grades. Compute your GPA for the semester.

Courses       Credit Hours    Grade
Calculus      4               B
Psychology    3               A
Marketing                     C
Economics                     D
Stat          2

44 Weighted Mean
\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]
where:
x_i = value of observation i
w_i = weight for observation i

45 Weighted Mean Example: Semester GPA
You are taking five courses. The following table shows the credit hours associated with each course and your grades. Compute your GPA for the semester.

Courses       Credit Hours (Wi)    Grade (G)    Wi × G
Calculus      4                    B(3)         12
Psychology    3                    A(4)
Marketing                          C(2)         6
Economics                          D(1)
Stat          2                                 8
Total         13                                41

46 Weighted Mean
\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]
where:
x_i = value of observation i
w_i = weight for observation i

47 Weighted Mean
When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean. When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.
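A minimal Python sketch of a weighted mean follows. The purchase data (cost per pound weighted by pounds bought) are hypothetical and chosen only to show how weights change the result; they do not come from the slides. The function name `weighted_mean` is likewise an illustrative choice.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equally long")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical example: five purchases of a raw material.
cost_per_pound = [3.00, 3.40, 2.80, 2.90, 3.25]
pounds = [1200, 500, 2750, 1000, 800]

# Weighted: 18,500 / 6,250, about 2.96 per pound.
print(round(weighted_mean(cost_per_pound, pounds), 2))
# Unweighted, for comparison: 15.35 / 5, about 3.07 per pound.
print(round(sum(cost_per_pound) / len(cost_per_pound), 2))
```

The weighted figure is lower because the largest purchase was made at the lowest price, which the simple average ignores.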

48 Working with Grouped Data

49 Given below is a sample of monthly rents for 70 efficiency apartments.

50 Grouped Data

51 Computing the Mean and Variance of Grouped Data
To compute the mean from grouped data, we treat the midpoint of each class as though it were the mean of all items in the class, and compute a weighted mean of the class midpoints using the class frequencies as weights. Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

52 Mean for Grouped Data
Sample data: \[ \bar{x} = \frac{\sum f_i M_i}{n} \]
Population data: \[ \mu = \frac{\sum f_i M_i}{N} \]
where:
f_i = frequency of class i
M_i = midpoint of class i

53 Sample Mean for Grouped Data
Given below is a sample of monthly rents for 70 efficiency apartments as grouped data, in the form of a frequency distribution.

54 Sample Mean for Grouped Data
This approximation differs by $2.41 from the actual sample mean of $490.80.

55 Variance for Grouped Data
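For sample data: \[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]
For population data: \[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]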

56 Sample Variance for Grouped Data

57 Sample Standard Deviation
Sample Variance for Grouped Data
Sample Variance
\[ s^2 = 208{,}234.29/(70 - 1) = 3{,}017.89 \]
Sample Standard Deviation
\[ s = \sqrt{3{,}017.89} = 54.94 \]
This approximation differs by only $.20 from the actual standard deviation of $54.74.
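The grouped-data calculation can be sketched in Python. The frequency table for the rents did not survive in this transcript, so the class midpoints and frequencies below are an assumption. They were chosen because they total 70 observations and reproduce the figures quoted above (a sum of squared deviations of 208,234.29 and a sample variance of 3,017.89). The function name `grouped_sample_stats` is also an illustrative choice.

```python
import math

# ASSUMED frequency distribution: (class midpoint M_i, class frequency f_i).
# Not taken from the slides; chosen to be consistent with their quoted results.
rent_classes = [
    (429.5, 8), (449.5, 17), (469.5, 12), (489.5, 8), (509.5, 7),
    (529.5, 4), (549.5, 2), (569.5, 4), (589.5, 2), (609.5, 6),
]


def grouped_sample_stats(classes):
    """Return (n, mean, sum of squared deviations, variance, std. dev.) for grouped data."""
    n = sum(f for _, f in classes)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(f * m for m, f in classes) / n            # frequencies act as weights
    sum_sq = sum(f * (m - mean) ** 2 for m, f in classes)
    variance = sum_sq / (n - 1)                          # sample variance
    return n, mean, sum_sq, variance, math.sqrt(variance)


n, mean, sum_sq, variance, std_dev = grouped_sample_stats(rent_classes)
print(n)                   # 70 observations in total
print(round(mean, 2))      # about 493.21
print(round(variance, 2))  # about 3017.89
print(round(std_dev, 2))   # about 54.94
```

Both results are approximations, because every rent in a class is treated as if it sat at the class midpoint. That is why they differ slightly from the ungrouped mean and standard deviation.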

