STA 291 Summer 2008 Lecture 4 Dustin Lueker.

Presentation transcript:

STA 291 Summer 2008 Lecture 4 Dustin Lueker

Population Distribution
- The population distribution for a continuous variable is usually represented by a smooth curve
  - Like a histogram that gets finer and finer
  - Similar to the idea of using smaller and smaller rectangles to calculate the area under a curve when learning how to integrate
- Symmetric distributions
  - Bell-shaped
  - U-shaped
  - Uniform
- Distributions that are not symmetric
  - Left-skewed
  - Right-skewed

Good Graphics
- Present large data sets concisely and coherently
- Can replace a thousand words and still be clearly understood and comprehended
- Encourage the viewer to compare two or more variables
- Do not replace substance by form
- Do not distort what the data reveal

Bad Graphics
- Don’t have a scale on the axis
- Have a misleading caption
- Distort by using absolute values where relative/proportional values are more appropriate
- Distort by stretching/shrinking the vertical or horizontal axis
- Use bar charts with bars of unequal width

Summarizing Data Numerically
- Center of the data
  - Mean
  - Median
  - Mode
- Dispersion of the data (sometimes referred to as spread)
  - Variance, standard deviation
  - Interquartile range
  - Range

Measures of Central Tendency
- Mean: arithmetic average
- Median: midpoint of the observations when they are arranged in order, smallest to largest
- Mode: most frequently occurring value

Sample Mean
- Sample size n
- Observations x1, x2, …, xn
- Sample mean “x-bar”: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Population Mean
- Population size N
- Observations x1, x2, …, xN
- Population mean “mu”: $\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$
- Note: This is for a finite population of size N

Mean
- Requires numerical values
  - Only appropriate for quantitative data
  - Does not make sense to compute the mean for nominal variables
- Can be calculated for ordinal variables, but this does not always make sense
  - Be careful when using the mean on ordinal variables
  - Example: “Weather” (on an ordinal scale)
    - Sun=1, Partly Cloudy=2, Cloudy=3, Rain=4, Thunderstorm=5
    - Mean (average) weather = 2.8
  - Another example: “GPA = 3.8” is also a mean of observations measured on an ordinal scale

Mean (Average)
- Sum of the observations divided by the number of observations
- Example: {7, 12, 11, 18}
  - Mean = (7 + 12 + 11 + 18)/4 = 48/4 = 12
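The definition on this slide can be checked with a minimal Python sketch. The function name `mean` is chosen here for illustration only and is not part of the lecture.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


sample = [7, 12, 11, 18]
# Sum is 48 and there are 4 observations, so the mean is 48 / 4.
sample_mean = mean(sample)
print(sample_mean)
```

The same function applies to a finite population: pass all N values instead of a sample of n.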

Mean
- Highly influenced by outliers
  - Outliers are data points that are far from the rest of the data
- Not representative of a typical observation if the distribution of the data is highly skewed
- Example: monthly income for five people
  - 1,000 2,000 3,000 4,000 100,000
  - Average monthly income = 110,000/5 = 22,000
  - Not representative of a typical observation

Median
- Measurement that falls in the middle of the ordered sample
- When the sample size n is odd, there is a middle value
  - It has the ordered index (n+1)/2
  - The ordered index is where that value falls when the sample is listed from smallest to largest
  - An index of 2 means the second smallest value
- Example: 1.7, 4.6, 5.7, 6.1, 8.3
  - n=5, (n+1)/2 = 6/2 = 3, index = 3
  - Median = 3rd smallest observation = 5.7

Median
- When the sample size n is even, average the two middle values
- Example: 3, 5, 6, 9
  - n=4, (n+1)/2 = 5/2 = 2.5, index = 2.5
  - Median = midpoint between the 2nd and 3rd smallest observations = (5+6)/2 = 5.5
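The ordered-index rule for odd and even sample sizes can be sketched in Python as follows. This is an illustrative implementation under the slide's definition; the function name `median` is an assumption, not something the lecture provides.

```python
def median(values):
    """Median via the ordered index (n + 1) / 2, counted from 1."""
    ordered = sorted(values)          # smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    index = (n + 1) / 2               # e.g. 3.0 for n = 5, 2.5 for n = 4
    lower = ordered[int(index) - 1]   # value at the floor of the index
    if n % 2 == 1:
        return lower                  # odd n: a single middle value
    upper = ordered[int(index)]       # even n: the next value up
    return (lower + upper) / 2        # midpoint of the two middle values


odd_median = median([1.7, 4.6, 5.7, 6.1, 8.3])   # index 3 -> 5.7
even_median = median([3, 5, 6, 9])               # index 2.5 -> (5 + 6) / 2
print(odd_median, even_median)
```

The data are sorted inside the function, so the input does not need to be ordered in advance.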

Mean and Median
- For skewed distributions, the median is often a more appropriate measure of central tendency than the mean
  - The median usually better describes a “typical value” when the sample distribution is highly skewed
- Example: monthly income for five people
  - 1,000 2,000 3,000 4,000 100,000
  - Median monthly income: 3,000 (the 3rd smallest of the five observations)
  - Does this better describe a “typical value” in the data set than the mean of 22,000?
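The income example can be reproduced with Python's standard `statistics` module. The sketch below also drops the largest income to show how much a single outlier moves the mean; that second comparison is an added illustration, not part of the slide.

```python
import statistics

incomes = [1000, 2000, 3000, 4000, 100000]

# Mean and median of the full data set.
income_mean = statistics.mean(incomes)       # 110,000 / 5
income_median = statistics.median(incomes)   # 3rd smallest of 5 values

# Same statistics with the largest income removed (illustration only).
without_outlier = sorted(incomes)[:-1]
mean_no_outlier = statistics.mean(without_outlier)
median_no_outlier = statistics.median(without_outlier)

print(income_mean, income_median)
print(mean_no_outlier, median_no_outlier)
```

Removing one value changes the mean from 22,000 to 2,500, while the median only moves from 3,000 to 2,500.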

Measures of Central Tendency
- Mean: arithmetic average
- Median: midpoint of the observations when they are arranged in increasing order
- Mode: most frequent value
- Notation: subscripted variables
  - n = # of units in the sample
  - N = # of units in the population
  - x = variable to be measured
  - xi = measurement of the ith unit

Mean and Median
- The trimmed mean is a compromise between the median and the mean
- Calculating the trimmed mean
  - Order the data from smallest to largest
  - Delete a selected number of values from each end of the ordered list
  - Find the mean of the remaining values
- The trimming percentage is the percentage of values that have been deleted from each end of the ordered list
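The three steps for the trimmed mean can be sketched in Python. The function name `trimmed_mean` and its parameter `n_trim` (the number of values deleted from each end) are assumptions made for this illustration, and the income data are reused from the earlier slides.

```python
def trimmed_mean(values, n_trim):
    """Mean after deleting n_trim values from each end of the ordered list."""
    ordered = sorted(values)                       # step 1: order the data
    if n_trim < 0 or 2 * n_trim >= len(ordered):
        raise ValueError("n_trim must leave at least one observation")
    kept = ordered[n_trim:len(ordered) - n_trim]   # step 2: delete from each end
    return sum(kept) / len(kept)                   # step 3: mean of the rest


incomes = [1000, 2000, 3000, 4000, 100000]
# Deleting 1 of 5 values from each end is a 20% trimming percentage.
trimmed = trimmed_mean(incomes, 1)                 # mean of 2000, 3000, 4000
trimming_percentage = 100 * 1 / len(incomes)
print(trimmed, trimming_percentage)
```

With `n_trim=0` the function reduces to the ordinary mean, and trimming as much as possible from each end approaches the median.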

Median for Grouped or Ordinal Data
Example: Highest Degree Completed

Highest Degree                                          Frequency   Percentage
Not a high school graduate                                 38,012         21.4
High school only                                           65,291         36.8
Some college, no degree                                    33,191         18.7
Associate, Bachelor, Master, Doctorate, Professional       41,124         23.2
Total                                                     177,618          100

Calculate the Median
- n = 177,618
- (n+1)/2 = 88,809.5
- Median = midpoint between the 88,809th smallest and 88,810th smallest observations
  - Both are in the category “High school only”
- The mean wouldn’t make sense here since the variable is only ordinal
- Median
  - Can be used for interval data and for ordinal data
  - Cannot be used for nominal data because the observations cannot be ordered on a scale
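For grouped ordinal data, the median category can be located by accumulating frequencies until the middle ordered index is reached. The sketch below applies that idea to the degree table; the function name `median_category` and the list-of-pairs layout are assumptions made for illustration.

```python
def median_category(freq_table):
    """Return the categories holding the two middle observations.

    freq_table is a list of (category, frequency) pairs in rank order.
    For odd n both returned categories refer to the same observation.
    """
    n = sum(freq for _, freq in freq_table)
    low_index = (n + 1) // 2      # ordered index just below (n + 1) / 2
    high_index = n // 2 + 1       # ordered index just above (n + 1) / 2

    def category_at(index):
        cumulative = 0
        for category, freq in freq_table:
            cumulative += freq
            if index <= cumulative:
                return category

    return category_at(low_index), category_at(high_index)


degrees = [
    ("Not a high school graduate", 38012),
    ("High school only", 65291),
    ("Some college, no degree", 33191),
    ("Associate, Bachelor, Master, Doctorate, Professional", 41124),
]
# n = 177,618, so the middle observations are the 88,809th and 88,810th.
middle = median_category(degrees)
print(middle)
```

The cumulative count reaches 38,012 after the first category and 103,303 after the second, so both middle observations fall in “High school only”.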

Mean vs. Median
- Mean
  - Interval data with an approximately symmetric distribution
- Median
  - Interval data
  - Ordinal data
- The mean is sensitive to outliers; the median is not

Mean vs. Median

Observations          Median   Mean
1, 2, 3, 4, 5              3      3
1, 2, 3, 4, 100            3     22
3, 3, 3, 3, 3              3      3
1, 2, 3, 100, 100          3   41.2
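The median and mean for each of these four data sets can be computed with Python's standard `statistics` module. This is a small illustrative check; the variable names are chosen here and are not from the lecture.

```python
import statistics

data_sets = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 100],
    [3, 3, 3, 3, 3],
    [1, 2, 3, 100, 100],
]

# One (median, mean) pair per data set, in the same order as the table.
summary = [(statistics.median(d), statistics.mean(d)) for d in data_sets]

for observations, (med, avg) in zip(data_sets, summary):
    print(observations, "median:", med, "mean:", avg)
```

All four data sets share the same median, while the mean ranges from 3 to 41.2 as the large values change.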

Mean vs. Median
- Symmetric distribution
  - Mean = Median
- Skewed distribution
  - The mean lies more toward the direction in which the distribution is skewed

Median
- Disadvantage
  - Insensitive to changes within the lower or upper half of the data
- Example
  - 1, 2, 3, 4, 5
  - 1, 2, 3, 100, 100
- Sometimes, the mean is more informative even when the distribution is skewed