Published by Cameron Stevens
1
Lecture 4 Dustin Lueker
2
The population distribution for a continuous variable is usually represented by a smooth curve
  ◦ Like a histogram whose bins get finer and finer
  ◦ Similar to using smaller and smaller rectangles to calculate the area under a curve when learning how to integrate
Symmetric distributions
  ◦ Bell-shaped
  ◦ U-shaped
  ◦ Uniform
Skewed (not symmetric) distributions
  ◦ Left-skewed
  ◦ Right-skewed
STA 291 Summer 2010 Lecture 4
3
Center of the data
  ◦ Mean
  ◦ Median
  ◦ Mode
Dispersion of the data (sometimes referred to as spread)
  ◦ Variance, standard deviation
  ◦ Interquartile range
  ◦ Range
4
Mean
  ◦ Arithmetic average
Median
  ◦ Midpoint of the observations when they are arranged in order, from smallest to largest
Mode
  ◦ Most frequently occurring value
5
Sample size: n
Observations: $x_1, x_2, \ldots, x_n$
Sample mean ("x-bar"): $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
6
Population size: N
Observations: $x_1, x_2, \ldots, x_N$
Population mean ("mu"): $\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$
Note: This is for a finite population of size N
7
The mean requires numerical values
  ◦ Only appropriate for quantitative data
  ◦ Does not make sense to compute the mean for nominal variables
  ◦ Can be calculated for ordinal variables, but this does not always make sense, so be careful when using the mean on ordinal variables
Example: "Weather" (on an ordinal scale)
  ◦ Sun = 1, Partly Cloudy = 2, Cloudy = 3, Rain = 4, Thunderstorm = 5
  ◦ Mean (average) weather = 2.8
Another example: "GPA = 3.8" is also a mean of observations measured on an ordinal scale
8
The mean is the center of gravity (balance point) of the data set
  ◦ The sum of the distances from the mean for values above the mean equals the sum of the distances from the mean for values below it
  ◦ Example: 3 + 2 + 2 = 3 + 4
9
Mean
  ◦ Sum of the observations divided by the number of observations
Example
  ◦ {7, 12, 11, 18}
  ◦ Mean = (7 + 12 + 11 + 18) / 4 = 48 / 4 = 12
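The definition translates directly into code. A minimal Python sketch; the function name `mean` is our own choice, not something from the lecture:

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

data = [7, 12, 11, 18]
print(mean(data))  # 48 / 4 = 12.0
```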
10
The mean is highly influenced by outliers
  ◦ Outliers are data points that are far from the rest of the data
  ◦ Example: monthly income for five people
      1,000   2,000   3,000   4,000   100,000
      Average monthly income = (1,000 + 2,000 + 3,000 + 4,000 + 100,000) / 5 = 22,000
      What is the problem with using the average to describe this data set?
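One way to see the problem is to count how many people actually earn at least the average. A small Python sketch using the five incomes from the example:

```python
incomes = [1_000, 2_000, 3_000, 4_000, 100_000]

avg = sum(incomes) / len(incomes)
print(avg)  # 22000.0

# How many of the five people earn at least the "average" income?
at_or_above = sum(1 for x in incomes if x >= avg)
print(at_or_above)  # 1: only the outlier
```

Four of the five people earn far less than the mean, so 22,000 is not a "typical" income here.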
11
The median is the measurement that falls in the middle of the ordered sample
When the sample size n is odd, there is a single middle value
  ◦ It has the ordered index (n+1)/2
      The ordered index is where a value falls when the sample is listed from smallest to largest
      An index of 2 means the second smallest value
  ◦ Example: 1.7, 4.6, 5.7, 6.1, 8.3
      n = 5, (n+1)/2 = 6/2 = 3, index = 3
      Median = 3rd smallest observation = 5.7
12
When the sample size n is even, average the two middle values
  ◦ Example: 3, 5, 6, 9
      n = 4, (n+1)/2 = 5/2 = 2.5, index = 2.5
      Median = midpoint between the 2nd and 3rd smallest observations = (5 + 6)/2 = 5.5
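The (n+1)/2 ordered-index rule for both the odd and even cases can be sketched in Python as follows. The function name `median_by_index` is our own; in practice the standard library's `statistics.median` does the same job.

```python
import math

def median_by_index(values):
    """Median using the 1-based ordered index (n + 1) / 2."""
    ordered = sorted(values)                 # smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    index = (n + 1) / 2                      # 3.0 when n = 5, 2.5 when n = 4
    below = ordered[math.floor(index) - 1]   # convert 1-based index to 0-based
    above = ordered[math.ceil(index) - 1]
    return (below + above) / 2               # same value twice when n is odd

print(median_by_index([6.1, 1.7, 8.3, 4.6, 5.7]))  # 5.7
print(median_by_index([9, 3, 6, 5]))               # 5.5
```

The inputs are deliberately unsorted to show that sorting is part of the procedure.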
13
For skewed distributions, the median is often a more appropriate measure of central tendency than the mean
  ◦ The median usually better describes a "typical value" when the sample distribution is highly skewed
Example
  ◦ Monthly income for five people: 1,000   2,000   3,000   4,000   100,000
  ◦ Median monthly income: 3,000
  ◦ Why is the median better to use with this data than the mean?
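Python's standard `statistics` module computes both measures, which makes the contrast on the skewed income data easy to check:

```python
import statistics

incomes = [1_000, 2_000, 3_000, 4_000, 100_000]

mean_income = statistics.mean(incomes)      # pulled up by the 100,000 outlier
median_income = statistics.median(incomes)  # middle of the ordered values
print(mean_income, median_income)           # 22000 3000
```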
14
Mode: most frequent value
Notation: subscripted variables
  ◦ n = number of units in the sample
  ◦ N = number of units in the population
  ◦ x = variable to be measured
  ◦ $x_i$ = measurement of the ith unit
Mean: arithmetic average
Median: midpoint of the observations when they are arranged in increasing order
15
Example: Highest Degree Completed

Highest Degree                                        Frequency  Percentage
Not a high school graduate                               38,012        21.4
High school only                                         65,291        36.8
Some college, no degree                                  33,191        18.7
Associate, Bachelor, Master, Doctorate, Professional     41,124        23.2
Total                                                   177,618       100.0
16
n = 177,618, so (n+1)/2 = 88,809.5
Median = midpoint between the 88,809th smallest and 88,810th smallest observations
  ◦ The first 38,012 observations are "Not a high school graduate," and the cumulative count through "High school only" is 38,012 + 65,291 = 103,303
  ◦ So both observations are in the category "High school only," which is the median
The mean would not make sense here since the variable is ordinal
Median
  ◦ Can be used for interval data and for ordinal data
  ◦ Cannot be used for nominal data because the observations cannot be ordered on a scale
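For grouped ordinal data, the median category can be found by walking the cumulative frequencies. A Python sketch using the Highest Degree Completed counts; the helper name `category_at_rank` is hypothetical, introduced only for this illustration:

```python
# Categories in their natural (ordinal) order, with frequencies from the example
table = [
    ("Not a high school graduate", 38_012),
    ("High school only", 65_291),
    ("Some college, no degree", 33_191),
    ("Associate, Bachelor, Master, Doctorate, Professional", 41_124),
]

def category_at_rank(rank, ordered_table):
    """Return the category holding the rank-th smallest observation (1-based)."""
    cumulative = 0
    for category, frequency in ordered_table:
        cumulative += frequency
        if rank <= cumulative:
            return category
    raise ValueError("rank is larger than the number of observations")

n = sum(frequency for _, frequency in table)
# Floor and ceiling of (n + 1) / 2; they coincide when n is odd
lower_rank, upper_rank = (n + 1) // 2, (n + 2) // 2
lower = category_at_rank(lower_rank, table)
upper = category_at_rank(upper_rank, table)
print(n, lower_rank, upper_rank)  # 177618 88809 88810
print(lower, "|", upper)          # High school only | High school only
```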
17
Mean
  ◦ Interval data with an approximately symmetric distribution
Median
  ◦ Interval data
  ◦ Ordinal data
The mean is sensitive to outliers; the median is not
18
Symmetric distribution
  ◦ Mean = Median
Skewed distribution
  ◦ The mean is pulled in the direction in which the distribution is skewed (toward the longer tail)
19
While the median is often better than the mean for skewed distributions, it has one large disadvantage
  ◦ It is insensitive to changes within the lower or upper half of the data
  ◦ Example: 1, 2, 3, 4, 5 and 1, 2, 3, 100, 100 both have median 3, even though their means are 3 and 41.2
  ◦ Sometimes the mean is more informative even when the distribution is skewed
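A quick Python check of the two data sets in the example, using the standard `statistics` module:

```python
import statistics

a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 100, 100]   # only the upper half changed

print(statistics.median(a), statistics.median(b))  # 3 3
print(statistics.mean(a), statistics.mean(b))      # 3 41.2
```

The median cannot tell the two data sets apart, while the mean reflects the change in the upper half.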
20
[Figure: Keeneland Sales]
21
The deviation of the ith observation $x_i$ from the sample mean $\bar{x}$ is the difference between them, $x_i - \bar{x}$
  ◦ The sum of all deviations is zero
  ◦ Therefore, we use either the sum of the absolute deviations or the sum of the squared deviations as a measure of variation
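The following Python sketch illustrates why raw deviations cannot measure spread, using the observations 1, 3, 4, 7, 10 (the same values that appear in the variance worksheet):

```python
data = [1, 3, 4, 7, 10]
x_bar = sum(data) / len(data)            # 5.0

deviations = [x - x_bar for x in data]   # [-4.0, -2.0, -1.0, 2.0, 5.0]
print(sum(deviations))                   # 0.0: positives and negatives cancel
print(sum(abs(d) for d in deviations))   # 14.0: sum of absolute deviations
print(sum(d ** 2 for d in deviations))   # 50.0: sum of squared deviations
```

With floating-point data whose mean is not exactly representable, the first sum may come out as a tiny nonzero number (on the order of 1e-15) instead of exactly 0.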
22
The variance of n observations is the sum of the squared deviations divided by n-1:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
23
Example: observations 1, 3, 4, 7, 10 (mean = 25/5 = 5)

Observation   Mean   Deviation   Squared Deviation
1             5      -4          16
3             5      -2          4
4             5      -1          1
7             5       2          4
10            5       5          25

Sum of the squared deviations = 50
n-1 = 4
Sum of the squared deviations / (n-1) = 50/4 = 12.5
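The worksheet can be reproduced row by row in Python; the column layout below is our own formatting choice:

```python
data = [1, 3, 4, 7, 10]          # observations from the worksheet
n = len(data)
x_bar = sum(data) / n            # 25 / 5 = 5.0

print(f"{'Observation':>11} {'Mean':>5} {'Deviation':>9} {'Squared':>8}")
sum_sq = 0.0
for x in data:
    deviation = x - x_bar
    squared = deviation ** 2
    sum_sq += squared
    print(f"{x:>11} {x_bar:>5.1f} {deviation:>9.1f} {squared:>8.1f}")

variance = sum_sq / (n - 1)      # divide by n - 1, not n
print("Sum of squared deviations:", sum_sq)   # 50.0
print("n - 1:", n - 1)                         # 4
print("Variance:", variance)                   # 12.5
```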
24
The variance is approximately the average of the squared deviations (it divides by n-1 rather than n)
  ◦ "Average squared distance from the mean"
Unit
  ◦ Square of the unit of the original data
  ◦ Difficult to interpret
Solution
  ◦ Take the square root of the variance; the result has the same unit as the original data
  ◦ This is the standard deviation, $s = \sqrt{s^2}$
25
s ≥ 0
  ◦ s = 0 only when all observations are the same
If data are collected for the whole population instead of a sample, then n-1 is replaced by N
s is sensitive to outliers
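Two of these properties can be checked with `statistics.stdev` from Python's standard library. The constant data set and the outlier value 100 are made up purely for illustration:

```python
import statistics

same = [4, 4, 4, 4]                  # hypothetical: all observations identical
print(statistics.stdev(same))        # 0.0

base = [1, 3, 4, 7, 10]
with_outlier = [1, 3, 4, 7, 100]     # hypothetical: last value replaced by an outlier
print(statistics.stdev(base))          # about 3.54
print(statistics.stdev(with_outlier))  # about 43.1
```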
26
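Sample
  ◦ Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
  ◦ Standard deviation: $s = \sqrt{s^2}$
Population
  ◦ Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
  ◦ Standard deviation: $\sigma = \sqrt{\sigma^2}$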
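The only difference between the two formulas is the divisor. A Python sketch; the function names are our own, and the five observations are treated as a whole population purely for illustration:

```python
import math
import statistics

def sample_variance(xs):
    """s^2: squared deviations from x-bar, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def population_variance(xs):
    """sigma^2: squared deviations from mu, divided by N."""
    N = len(xs)
    if N == 0:
        raise ValueError("population variance needs at least one observation")
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

data = [1, 3, 4, 7, 10]
s2, sigma2 = sample_variance(data), population_variance(data)
print(s2, math.sqrt(s2))            # 12.5, about 3.54
print(sigma2, math.sqrt(sigma2))    # 10.0, about 3.16

# Cross-check against the standard library
print(s2 == statistics.variance(data), sigma2 == statistics.pvariance(data))  # True True
```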
27
The population mean and population standard deviation are denoted by the Greek letters μ (mu) and σ (sigma)
  ◦ They are unknown constants that we would like to estimate
The sample mean and sample standard deviation are denoted by $\bar{x}$ and s
  ◦ They are random variables, because their values vary according to the random sample that has been selected
28
If the data are approximately symmetric and bell-shaped, then (empirical rule)
  ◦ About 68% of the observations are within one standard deviation of the mean
  ◦ About 95% of the observations are within two standard deviations of the mean
  ◦ About 99.7% of the observations are within three standard deviations of the mean
29
Scores on a standardized test are scaled so they have a bell-shaped distribution with a mean of 1000 and a standard deviation of 150
  ◦ About 68% of the scores are between 850 and 1150
  ◦ About 95% of the scores are between 700 and 1300
  ◦ If you have a score above 1300, you are in the top 2.5% (1300 is two standard deviations above the mean, and the remaining 5% splits evenly between the two tails)
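The interval arithmetic of the empirical rule can be sketched in Python. The comparison with `statistics.NormalDist` assumes the scores follow an exactly normal curve, which the example only states approximately ("bell-shaped"):

```python
from statistics import NormalDist

def empirical_rule_intervals(mean, sd):
    """Return (mean - k*sd, mean + k*sd) for k = 1, 2, 3 (the 68-95-99.7 rule)."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

intervals = empirical_rule_intervals(1000, 150)
print(intervals[1])   # (850, 1150): about 68% of scores
print(intervals[2])   # (700, 1300): about 95% of scores

# 1300 = mean + 2 sd; the ~5% outside (700, 1300) splits evenly between the tails
top_share_rule = (100 - 95) / 2          # 2.5 percent

# Exact normal-curve tail above 1300, assuming exact normality
top_share_exact = 100 * (1 - NormalDist(mu=1000, sigma=150).cdf(1300))
print(top_share_rule, round(top_share_exact, 2))   # 2.5 2.28
```

The small gap between 2.5% and about 2.28% comes from the rule rounding the two-standard-deviation coverage (about 95.45%) to 95%.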