Published by Jacob Harmon. Modified over 9 years ago.
1
Descriptive Statistics-IV (Measures of Variation)
QSCI 381 – Lecture 6 (Larson and Farber, Sect 2.4)
2
Deviation, Variance and Standard Deviation-I
The deviation of a data entry $x_i$ in a population data set is the difference between $x_i$ and the population mean $\mu$, i.e. $x_i - \mu$. The sum of the deviations over all entries is zero. The population variance is the sum of the squared deviations over all entries, divided by the number of entries $N$: $\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$, where $\sigma$ is the Greek letter sigma.
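As a rough illustration of these definitions, the sketch below computes the deviations for a small made-up population and checks that they sum to zero. The data values and variable names are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical population of five measurements (illustrative values only).
population = [2.0, 4.0, 4.0, 5.0, 10.0]

N = len(population)
mu = sum(population) / N                    # population mean
deviations = [x - mu for x in population]   # x_i - mu for each entry
sum_of_deviations = sum(deviations)         # zero, up to floating-point rounding

# Population variance: mean of the squared deviations (divide by N).
population_variance = sum(d ** 2 for d in deviations) / N

print(mu, sum_of_deviations, population_variance)
```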
3
Deviation, Variance and Standard Deviation-II
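The population standard deviation is the square root of the population variance, i.e.: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_i (x_i - \mu)^2}{N}}$.
Note: these quantities relate to the population and not a sample from the population.
Note: the standard deviation is sometimes loosely referred to as the standard error, although strictly the standard error is the standard deviation of an estimate (such as a sample mean).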
4
The Sample variance and Standard Deviation
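The sample variance and the sample standard deviation of a data set with n entries are given by: $s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$ and $s = \sqrt{s^2}$, where $\bar{x}$ is the sample mean.
Note the division by n - 1 rather than N or n.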
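The difference between dividing by N and dividing by n - 1 can be seen with Python's standard `statistics` module, which provides both versions. This is only a sketch on made-up data; the values are hypothetical.

```python
import statistics

# Hypothetical data values, used only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]

pop_var = statistics.pvariance(data)   # treats data as a population: divides by N
samp_var = statistics.variance(data)   # treats data as a sample: divides by n - 1
pop_sd = statistics.pstdev(data)       # square root of pop_var
samp_sd = statistics.stdev(data)       # square root of samp_var

print(pop_var, samp_var, pop_sd, samp_sd)
```

For the same data the sample versions are always the larger of the two, because the same sum of squares is divided by a smaller number.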
5
Calculating Standard Deviations
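Step (population; sample):
1. Find the mean: $\mu = \frac{\sum x_i}{N}$; $\bar{x} = \frac{\sum x_i}{n}$
2. Find the deviation for each entry: $x_i - \mu$; $x_i - \bar{x}$
3. Square each deviation: $(x_i - \mu)^2$; $(x_i - \bar{x})^2$
4. Add to get the sum of squares (SSx): $\sum (x_i - \mu)^2$; $\sum (x_i - \bar{x})^2$
5. Divide by N or (n - 1) to get the variance: $\sigma^2 = \frac{SS_x}{N}$; $s^2 = \frac{SS_x}{n - 1}$
6. Take the square root to get the standard deviation: $\sigma = \sqrt{\sigma^2}$; $s = \sqrt{s^2}$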
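The steps above can be sketched as a single function. The function name `standard_deviation` and its `sample` flag are hypothetical, introduced here only to show where the population and sample calculations differ.

```python
import math

def standard_deviation(values, sample=True):
    """Follow the step-by-step recipe; `sample` selects n - 1 instead of N."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data for the requested standard deviation")
    mean = sum(values) / n                         # find the mean
    deviations = [x - mean for x in values]        # deviation for each entry
    squared = [d ** 2 for d in deviations]         # square each deviation
    ss_x = sum(squared)                            # sum of squares (SSx)
    variance = ss_x / (n - 1 if sample else n)     # divide by (n - 1) or N
    return math.sqrt(variance)                     # take the square root

# Hypothetical data, illustrative only.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(standard_deviation(values, sample=True))
print(standard_deviation(values, sample=False))
```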
6
Example
Find the standard deviation of the following bowhead lengths (in m): (8.5, 8.4, 13.8, 9.3, 9.7).
Key question (before doing anything): is this a sample or a population?
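The slide leaves the sample-or-population question open, so the worked arithmetic below shows both cases side by side. It is a sketch of the calculation, not necessarily the answer given in the lecture; values are rounded for display.

```latex
\begin{align*}
\text{mean} &= \frac{8.5 + 8.4 + 13.8 + 9.3 + 9.7}{5} = \frac{49.7}{5} = 9.94 \\
\text{deviations} &= -1.44,\; -1.54,\; 3.86,\; -0.64,\; -0.24 \\
SS_x &= 2.0736 + 2.3716 + 14.8996 + 0.4096 + 0.0576 = 19.812 \\
\text{if a population:}\quad \sigma &= \sqrt{\frac{19.812}{5}} = \sqrt{3.9624} \approx 1.99 \text{ m} \\
\text{if a sample:}\quad s &= \sqrt{\frac{19.812}{4}} = \sqrt{4.953} \approx 2.23 \text{ m}
\end{align*}
```

Five measured whales are most naturally read as a sample from a larger population, in which case the second value would apply.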
7
Formulae in EXCEL
Calculating means: =AVERAGE(A1:A10)
Calculating standard deviations: =STDEV(A1:A10). This calculates the sample and not the population standard deviation! (=STDEVP(A1:A10) gives the population standard deviation.)
8
Standard Deviations-I
[Figure: three example data sets with SD = 0, SD = 2.1 and SD = 5.3]
9
Standard Deviations-II (Symmetric Bell-shaped distributions)
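For symmetric, bell-shaped distributions, about 68% of the data lie within 1 standard deviation of the mean (34% on each side), about 95% within 2 standard deviations (a further 13.5% on each side), and about 99.7% within 3 standard deviations.
Chebychev's Theorem: The proportion of the data lying within k standard deviations (k > 1) of the mean is at least $1 - 1/k^2$. This holds for any distribution.
k = 2: proportion at least 75%
k = 3: proportion at least 88.9%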
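To compare the two rules numerically, the sketch below evaluates Chebychev's lower bound alongside the proportion for a normal distribution. Treating "bell-shaped" as exactly normal is an assumption made here for illustration, and both function names are hypothetical.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum proportion within k standard deviations, for any distribution."""
    if k <= 1:
        raise ValueError("Chebychev's theorem requires k > 1")
    return 1.0 - 1.0 / k ** 2

def normal_proportion_within(k):
    """Proportion within k standard deviations, ASSUMING a normal distribution."""
    return math.erf(k / math.sqrt(2.0))

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), normal_proportion_within(k))
```

The Chebychev bound is much weaker than the bell-shaped figures because it has to hold for every possible distribution.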
10
Standard Deviations-III (Grouped data)
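The (sample) standard deviation of a frequency distribution is: $s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2 f_i}{n - 1}}$, where $f_i$ is the frequency of bin i, $n = \sum_i f_i$ and $\bar{x} = \frac{\sum_i x_i f_i}{n}$.
Note: where the frequency distribution consists of bins that are ranges, $x_i$ should be the midpoint of bin i (be careful of the first and last bins).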
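A minimal sketch of the grouped-data calculation follows, assuming the sample (n - 1) form and bins represented by their midpoints. The function name, bin midpoints and frequencies are all hypothetical.

```python
import math

def grouped_sample_sd(midpoints, frequencies):
    """Sample standard deviation of a frequency distribution (divides by n - 1)."""
    n = sum(frequencies)                                             # total count
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n    # weighted mean
    ss_x = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))
    return math.sqrt(ss_x / (n - 1))

# Hypothetical bins 0-2, 2-4 and 4-6, represented by their midpoints.
midpoints = [1.0, 3.0, 5.0]
frequencies = [2, 5, 3]
grouped_sd = grouped_sample_sd(midpoints, frequencies)
print(grouped_sd)
```

Using midpoints treats every entry in a bin as if it sat at the bin centre, so the result is an approximation to the standard deviation of the raw data.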
11
Standard Deviations-IV (The shortcut formula)
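The formula from this slide did not survive in the transcript. A commonly used shortcut, assumed here to be the one intended, computes the sum of squares without first finding each deviation: $SS_x = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$. The sketch below checks that it agrees with the definition; the function names are hypothetical, and the lengths are the five bowhead values from the example slide.

```python
def sum_of_squares_definition(values):
    """SSx from the definition: sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def sum_of_squares_shortcut(values):
    """SSx from the assumed shortcut: sum(x^2) - (sum(x))^2 / n."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]   # bowhead lengths (m) from the example
ss_definition = sum_of_squares_definition(lengths)
ss_shortcut = sum_of_squares_shortcut(lengths)
print(ss_definition, ss_shortcut)
```

The shortcut is convenient by hand, but on a computer it can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.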
12
The Coefficient of Variation
The coefficient of variation (CV) is the standard deviation divided by the mean, often expressed as a percentage: $CV = \frac{s}{\bar{x}} \times 100\%$. The coefficient of variation is dimensionless and can be used to compare among data sets based on different units.
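To illustrate that the coefficient of variation does not depend on units, the sketch below computes it for the bowhead lengths in metres and again in centimetres. Using the sample standard deviation is an assumption here, and the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (as a fraction, not a %)."""
    return statistics.stdev(values) / statistics.mean(values)

lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7]          # bowhead lengths in metres
lengths_cm = [100.0 * x for x in lengths_m]     # the same lengths in centimetres

cv_m = coefficient_of_variation(lengths_m)
cv_cm = coefficient_of_variation(lengths_cm)
print(cv_m, cv_cm)
```

Both the standard deviation and the mean scale by the same factor of 100, so the ratio is unchanged.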
13
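Z-Scores
The standard (or Z) score is calculated using the equation: $z = \frac{x - \mu}{\sigma}$, i.e. the number of standard deviations that a value $x$ lies above or below the mean.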
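As a small sketch of the calculation, the code below converts the bowhead lengths to z-scores. Treating the five lengths as a complete population (so that $\mu$ and $\sigma$ are known) is an assumption made only to keep the example self-contained.

```python
import statistics

def z_scores(values, mu, sigma):
    """Standard score of each value: (x - mu) / sigma."""
    return [(x - mu) / sigma for x in values]

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]   # bowhead lengths (m)
mu = statistics.mean(lengths)          # ASSUMPTION: data treated as a population
sigma = statistics.pstdev(lengths)     # population standard deviation

z = z_scores(lengths, mu, sigma)
print(z)
```

The z-scores have mean 0 and standard deviation 1 by construction; the 13.8 m whale lies a little under two standard deviations above the mean.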
14
Outliers-I
Outliers can lead to misinterpretation of results. They can arise because of data errors (e.g. typing measurements in cm rather than in m) or because of unusual events. There are several rules for identifying outliers, for example (Q1, Q2 and Q3 are the first, second and third quartiles):
Outliers: < Q2 - 6(Q2 - Q1) or > Q2 + 6(Q3 - Q2)
Strays: < Q2 - 3(Q2 - Q1) or > Q2 + 3(Q3 - Q2)
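The stray and outlier rules can be sketched as follows. The quartiles come from Python's `statistics.quantiles` with its default method, which is an assumption: quartile conventions differ between textbooks and packages, so the thresholds may not match a hand calculation exactly. The function name is hypothetical.

```python
import statistics

def classify_extremes(values):
    """Label values as 'stray' or 'outlier' using the quartile-based rules."""
    q1, q2, q3 = statistics.quantiles(values, n=4)   # default quartile method
    lower_spread = q2 - q1
    upper_spread = q3 - q2
    flagged = []
    for x in values:
        if x < q2 - 6 * lower_spread or x > q2 + 6 * upper_spread:
            flagged.append((x, "outlier"))
        elif x < q2 - 3 * lower_spread or x > q2 + 3 * upper_spread:
            flagged.append((x, "stray"))
    return flagged

# Bowhead lengths (m) with an extra, suspicious length of 1 added.
lengths = [8.5, 8.4, 13.8, 9.3, 9.7, 1.0]
flagged = classify_extremes(lengths)
print(flagged)
```

With this quartile convention the added length of 1 is flagged as a stray rather than an outlier, partly because the extreme value itself pulls Q1 downwards in such a small data set.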
15
Outliers-II
Strays and outliers should be indicated on box and whisker plots. Consider the data set of bowhead lengths, except that a length of 1 is added!
[Figure: box and whisker plot of the bowhead lengths; axis: Length (m), ticks at 5, 10, 15]
16
Review of Symbols in this Lecture
17
Summary
We use descriptive statistics to "get a feel for the data" (also called "exploratory data analysis"). In general, we are using statistics from the sample to learn something about the population.