Rules of Data Dispersion
Using the mean and standard deviation, we can find the percentage of all observations that fall within a given interval about the mean. There are 2 rules:
i. Empirical Rule
ii. Chebyshev’s Theorem (IMPORTANT TERM: AT LEAST)
Empirical Rule
Applicable to a symmetric, bell-shaped distribution (normal distribution). There are 3 rules:
i. About 68% of the observations lie in the interval (mean ± SD)
ii. About 95% of the observations lie in the interval (mean ± 2SD)
iii. About 99.7% of the observations lie in the interval (mean ± 3SD)
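The 68 / 95 / 99.7 figures are rounded values of the normal distribution's coverage within 1, 2 and 3 standard deviations of the mean. A minimal sketch that recomputes them from the error function; the helper name `normal_coverage` is only illustrative:

```python
import math


def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean ± {k}SD: {normal_coverage(k) * 100:.1f}%")
# mean ± 1SD: 68.3%
# mean ± 2SD: 95.4%
# mean ± 3SD: 99.7%
```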
Example: 95% of students at a school are between 1.1 m and 1.7 m tall. Assuming the heights are normally distributed, calculate the mean and standard deviation.
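One way to work this example, assuming the 95% interval is exactly (mean ± 2SD): the mean is the midpoint of the interval, and the interval's width spans 4 standard deviations. The function name below is hypothetical:

```python
def mean_sd_from_95_interval(low: float, high: float) -> tuple[float, float]:
    """Recover mean and SD, assuming [low, high] is the mean ± 2SD interval."""
    mean = (low + high) / 2   # midpoint of the interval
    sd = (high - low) / 4     # the interval is 4 SDs wide (2 on each side)
    return mean, sd


mean, sd = mean_sd_from_95_interval(1.1, 1.7)
print(f"mean = {mean:.2f} m, SD = {sd:.2f} m")
# mean = 1.40 m, SD = 0.15 m
```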
Example: The age distribution of a sample of 5000 persons is bell-shaped, with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.
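A sketch of the reasoning for this example: express each endpoint as a number of standard deviations from the mean, then look up the matching empirical-rule percentage. The lookup table and function name are illustrative, and the head count at the end is an extra step the question does not ask for:

```python
# Empirical-rule percentages for mean ± k SD (k = 1, 2, 3).
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_percentage(mean: float, sd: float, low: float, high: float) -> float:
    """Approximate percentage in [low, high] for a bell-shaped distribution."""
    k_low = (mean - low) / sd    # SDs below the mean
    k_high = (high - mean) / sd  # SDs above the mean
    if k_low != k_high or k_low not in EMPIRICAL_RULE:
        raise ValueError("interval is not mean ± 1, 2 or 3 SD")
    return EMPIRICAL_RULE[k_low]


pct = empirical_percentage(mean=40, sd=12, low=16, high=64)
print(pct)                      # 95.0, since 16 and 64 are 40 ± 2(12)
print(round(5000 * pct / 100))  # 4750 persons out of the sample of 5000
```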
Chebyshev’s Theorem
Applicable to any distribution, including distributions that are not normal. At least (1 − 1/k²) × 100% of the observations lie within k standard deviations of the mean, where k is any number greater than 1 (k > 1).
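Chebyshev's bound says that at least 1 − 1/k² of the observations lie within k standard deviations of the mean. A minimal sketch that evaluates this bound for a few values of k; the function name is illustrative:

```python
def chebyshev_min_fraction(k: float) -> float:
    """Minimum fraction of observations within k SDs of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_min_fraction(k) * 100:.1f}%")
# k = 1.5: at least 55.6%
# k = 2: at least 75.0%
# k = 3: at least 88.9%
```

These are lower bounds, which is why the wording is "at least": for a normal distribution the actual coverage at k = 2 is about 95%, well above the guaranteed 75%.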
Chebyshev’s Theorem
Example: Assuming that the weights of students in this class are not normally distributed, find the minimum percentage of students whose weights fall within 2 SD of the mean.
Example: Consider a distribution of test scores that is badly skewed to the right, with a sample mean of 80 and a sample standard deviation of 5. If k = 2, at least what percentage of the data falls within k standard deviations of the mean?
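A worked sketch of this example, using the bound 1 − 1/k² and the interval mean ± k·SD. The function name is hypothetical:

```python
def chebyshev_interval(mean: float, sd: float, k: float) -> tuple[float, float, float]:
    """Return (low, high, minimum percentage) for the interval mean ± k*SD."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    low = mean - k * sd
    high = mean + k * sd
    min_pct = (1 - 1 / k**2) * 100  # Chebyshev's lower bound, as a percentage
    return low, high, min_pct


low, high, min_pct = chebyshev_interval(mean=80, sd=5, k=2)
print(f"At least {min_pct:.0f}% of scores lie between {low} and {high}")
# At least 75% of scores lie between 70 and 90
```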
Measures of Position
Used to describe the relative position of a data value within the entire data set.
i. z-scores
ii. Percentiles
iii. Quartiles
iv. Outliers
Quartiles
Divide a sorted data set into fourths, or four equal parts. The three cut points are Q1, Q2 (the median) and Q3.
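A short sketch of computing quartiles with Python's standard library. The data set is made up for illustration, and `statistics.quantiles` uses its default "exclusive" method here; other textbooks and software use slightly different quartile conventions, so results can differ on other data:

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # 4.0 7.0 10.0
```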
Boxplot
Outliers
Extreme observations. They can occur because of errors in measuring a variable, errors during data entry, or errors in sampling.
Checking for outliers by using quartiles
Step 1: Determine the first and third quartiles of the data (Q1 and Q3).
Step 2: Compute the interquartile range, IQR = Q3 − Q1.
Step 3: Determine the fences. Fences serve as cutoff points for determining outliers. The usual choice is lower fence = Q1 − 1.5(IQR) and upper fence = Q3 + 1.5(IQR).
Step 4: If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
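The four steps can be sketched as follows. This assumes the common 1.5 × IQR convention for the fences and uses `statistics.quantiles` with its default method for the quartiles; the data set and function name are hypothetical:

```python
import statistics


def find_outliers(data: list[float]) -> tuple[float, float, list[float]]:
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    # Step 1: first and third quartiles.
    q1, _, q3 = statistics.quantiles(data, n=4)
    # Step 2: interquartile range.
    iqr = q3 - q1
    # Step 3: fences (assumed convention: 1.5 * IQR beyond the quartiles).
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Step 4: values outside the fences are flagged as outliers.
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers


# Hypothetical data with one suspiciously large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 40]
lower, upper, outliers = find_outliers(data)
print(lower, upper, outliers)  # -5.0 19.0 [40]
```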