Statistics 200
Lecture #3, Tuesday, August 30, 2016
Textbook: Sections 2.4 through 2.6
Objectives (all relating to quantitative variables):
• Recognize and interpret two plots:
  – Histogram
  – Boxplot
• Calculate and interpret two measures of center:
  – Mean
  – Median
• Calculate and interpret the five-number summary
• Recognize and understand the effects of outliers and skewness
Motivating example
A group of students was randomly assigned to one of two classes: one class was taught by teacher A and the other by teacher B. At the end of the semester, all students took the same exam. Investigate whether there is any difference in exam scores between the two teachers.
Exam scores: 53 72 35 47 64 66 13 6 35 42 45 59 58 69 53 67 57 53 62 95 74 2 61 84 88 65 69 76 53 71 71 87 98 83 81 73 75
Summarizing Quantitative Variables
The distribution of a quantitative variable is the overall pattern of how often the possible values occur. Four key aspects of a distribution are:
• Location: center, average
• Spread: variability
• Shape: symmetric, bell-shaped, skewed
• Outliers
Let's begin with shape, which is best seen in a visual summary.
Visual summaries for quantitative variables
• Histogram: a chart of the data that shows how many observations fall in each equally spaced interval.
  – Usually uses 6 to 15 intervals.
  – Can use frequency or relative frequency.
• Boxplot
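A minimal Python sketch of the counting behind a histogram, assuming intervals of width 10 starting at 0 (10 intervals, inside the 6 to 15 guideline). It uses the 37 exam scores from the motivating example pooled together, because the notes list them without marking which teacher each score belongs to; `histogram_counts` and `relative_frequencies` are hypothetical helper names, not a library API.

```python
# Exam scores from the motivating example, pooled across both teachers
# (the notes do not mark which teacher each score belongs to).
SCORES = [53, 72, 35, 47, 64, 66, 13, 6, 35, 42, 45, 59, 58, 69, 53, 67, 57, 53,
          62, 95, 74, 2, 61, 84, 88, 65, 69, 76, 53, 71, 71, 87, 98, 83, 81, 73, 75]


def histogram_counts(data, start, width, n_bins):
    """Count observations in equally spaced intervals (hypothetical helper).

    Interval k covers [start + k*width, start + (k+1)*width). A value equal to
    the right edge of the last interval is counted in the last interval;
    values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)
        if k == n_bins and x == start + n_bins * width:
            k = n_bins - 1  # include the right endpoint in the last bin
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def relative_frequencies(counts):
    """Convert counts to proportions of the total."""
    total = sum(counts)
    return [c / total for c in counts]


counts = histogram_counts(SCORES, start=0, width=10, n_bins=10)
for k, (c, rf) in enumerate(zip(counts, relative_frequencies(counts))):
    low = 10 * k
    print(f"[{low:3d}, {low + 10:3d}): {c:2d}  ({rf:.1%})")
```

For these scores the counts come out as [2, 1, 0, 2, 3, 7, 8, 7, 5, 2]; dividing each by 37 gives the relative frequencies. Drawing the bars is left to a plotting library.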
Histograms (figure): Teacher A scores; Teacher B scores
Outlier: an individual value that is unusual compared to the bulk of the other values. (Figure: an outlier is marked.)
Example: Considering study hours per week, what percent of the students spend:
• at most 3 hours?
• at least 11 hours?
• between 5 and 9 hours?
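The study-hours data behind this example are not included in these notes, so the Python sketch below answers the same three kinds of question for the pooled exam scores from the motivating example. It assumes "at most" and "at least" include the endpoint and "between" includes both ends; the three functions are hypothetical helpers. With only a histogram in hand, you would instead add up the bars that cover the range.

```python
# Pooled exam scores from the motivating example (stand-in data).
SCORES = [53, 72, 35, 47, 64, 66, 13, 6, 35, 42, 45, 59, 58, 69, 53, 67, 57, 53,
          62, 95, 74, 2, 61, 84, 88, 65, 69, 76, 53, 71, 71, 87, 98, 83, 81, 73, 75]


def percent_at_most(data, value):
    """Percent of observations less than or equal to value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)


def percent_at_least(data, value):
    """Percent of observations greater than or equal to value."""
    return 100 * sum(1 for x in data if x >= value) / len(data)


def percent_between(data, low, high):
    """Percent of observations with low <= x <= high (both ends included)."""
    return 100 * sum(1 for x in data if low <= x <= high) / len(data)


print(f"at most 50:        {percent_at_most(SCORES, 50):.1f}%")
print(f"at least 90:       {percent_at_least(SCORES, 90):.1f}%")
print(f"between 60 and 79: {percent_between(SCORES, 60, 79):.1f}%")
```

For the pooled scores: at most 50 is 8 of 37 (about 21.6%), at least 90 is 2 of 37 (about 5.4%), and between 60 and 79 is 15 of 37 (about 40.5%).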
Shapes of distributions
• Symmetric: the shape of the data is similar on both sides of the center. Bell-shaped is a special case of symmetric.
• Skewed: values are more spread out on one side than the other.
  – Left-skewed: lower values are more spread out than higher values.
  – Right-skewed: higher values are more spread out than lower values.
Shape examples: symmetric
Question: What is the fastest you have ever driven a car? Shape: symmetric
Shape examples: right-skewed and left-skewed
Question: How many coins are you carrying? Shape: right-skewed
Question: What is your grade point average? Shape: left-skewed
Breakdown of descriptive statistical methods: quantitative data
• Graphs: histogram (did one)
• Numbers (statistics): measures of center (do now)
Quantitative Data: Measures of Center
• Mean: average of all numbers. Symbol for the sample mean: x̄ ("x-bar"). Its value is sensitive to outliers.
• Median: middle observation of the ordered data. Its value is resistant to outliers.
• Mode: observation that occurs most frequently (we don't really use it in this course).
Example: Center and outliers
Sample 1 (n = 5): 1 4 8 12 7
Sample 2 (n = 5): 1 4 8 12 30
Mean
  Sample 1: (1 + 4 + 8 + 12 + 7)/5 = 32/5 = 6.4
  Sample 2: (1 + 4 + 8 + 12 + 30)/5 = 55/5 = 11
Ordered data and median
  Sample 1: 1 4 7 8 12, median = 7
  Sample 2: 1 4 8 12 30, median = 8
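The two calculations can be checked with a short Python sketch. `mean` and `median` are written out by hand to mirror the definitions (Python's `statistics` module also provides both); the even-n branch of `median`, which averages the two middle values, goes beyond this slide, where both samples have n = 5.

```python
def mean(values):
    """Sample mean: the sum of all observations divided by n."""
    return sum(values) / len(values)


def median(values):
    """Middle observation of the ordered data.

    For an even number of observations, average the two middle ones.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


sample_1 = [1, 4, 8, 12, 7]
sample_2 = [1, 4, 8, 12, 30]  # sample 1 with the 7 replaced by 30

for name, s in [("Sample 1", sample_1), ("Sample 2", sample_2)]:
    print(name, "mean =", mean(s), "median =", median(s))
```

Replacing 7 with 30 moves the mean from 6.4 to 11 but moves the median only from 7 to 8, which is the sensitive-versus-resistant contrast.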
Sensitive vs. resistant statistics
• Sensitive statistic: calculated using ALL observations; affected by skewness and/or unusual observations. Example: mean.
• Resistant statistic: depends on only some observations (for the median, the middle one or two of the ordered data); not affected much by outliers. Example: median.
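Examples:
• Fastest speed driven: mean = 94.8 mph, median = 95 mph (symmetric: mean ≈ median)
• Coins carried: mean = 17.3 coins, median = 9 coins (right-skewed: mean > median)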
Work-together question: Which is most likely true of the salaries ($) in a company that employs:
1. 20 factory workers and 2 very highly paid executives?
2. 2 factory workers and 20 very highly paid executives?
For each company, choose one: mean > median, mean < median, or mean ≈ median.
Percentiles
A percentile tells us what percent of the data lies below a specific value: for example, about 90% of the observations lie below the 90th percentile.
What is the value (in study hours/week) of the:
• 5th percentile?
• 90th percentile?
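Textbooks and software use slightly different rules for computing percentiles. The Python sketch below assumes the rule used by default in `statistics.quantiles` (Python 3.8+), which interpolates at position (n + 1) × p/100 of the ordered data; `percentile` is a hypothetical wrapper, and the pooled exam scores from the motivating example stand in for the study-hours data, which are not in these notes.

```python
import statistics

# Pooled exam scores from the motivating example (stand-in data).
SCORES = [53, 72, 35, 47, 64, 66, 13, 6, 35, 42, 45, 59, 58, 69, 53, 67, 57, 53,
          62, 95, 74, 2, 61, 84, 88, 65, 69, 76, 53, 71, 71, 87, 98, 83, 81, 73, 75]


def percentile(data, p):
    """Value of the p-th percentile for whole-number p from 1 to 99.

    statistics.quantiles(data, n=100) returns the 99 cut points between
    percentiles; its default 'exclusive' method interpolates at position
    (n + 1) * p / 100 of the sorted data.
    """
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    return statistics.quantiles(data, n=100)[p - 1]


print("5th percentile:", percentile(SCORES, 5))
print("90th percentile:", percentile(SCORES, 90))
```

For the pooled scores this rule gives about 5.6 for the 5th percentile and 87.2 for the 90th; another rule could give slightly different values.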
Percentiles of interest
• 25th percentile: lower quartile (QL), also called the first quartile (Q1)
• 50th percentile: median, also called the second quartile (Q2)
• 75th percentile: upper quartile (QU), also called the third quartile (Q3)
We use quartiles for the five-number summary, a numerical method for summarizing quantitative data:
• Smallest value (min)
• Lower or first quartile (Q1)
• Median
• Upper or third quartile (Q3)
• Largest value (max)
Example: 5-number summary
Descriptive Statistics: Fastest_Speed
Variable       N   Minimum   Q1   Median   Q3    Maximum
Fastest_Speed  20  45        90   95       100   135
Five-number summary:
Min = 45, Q1 (25th percentile) = 90, Median (50th percentile) = 95, Q3 (75th percentile) = 100, Max = 135
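The raw Fastest_Speed values are not in these notes, so this Python sketch computes a five-number summary for the pooled exam scores from the motivating example instead. Quartiles come from `statistics.quantiles` (Python 3.8+) with its default method; the software that produced the output above may use a slightly different quartile rule, so Q1 and Q3 can differ a little on other data. `five_number_summary` is a hypothetical helper name.

```python
import statistics

# Pooled exam scores from the motivating example (stand-in data).
SCORES = [53, 72, 35, 47, 64, 66, 13, 6, 35, 42, 45, 59, 58, 69, 53, 67, 57, 53,
          62, 95, 74, 2, 61, 84, 88, 65, 69, 76, 53, 71, 71, 87, 98, 83, 81, 73, 75]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    The three quartiles are the cut points from statistics.quantiles(data, n=4).
    """
    q1, med, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, med, q3, max(data)


labels = ("Min", "Q1", "Median", "Q3", "Max")
for label, value in zip(labels, five_number_summary(SCORES)):
    print(f"{label:>6}: {value}")
```

For the pooled scores the summary is Min = 2, Q1 = 53, Median = 65, Q3 = 74.5, Max = 98.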
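Another look: 5-number summary
The 5-number summary divides the data into 4 quarters, each containing about 25% of the observations: min to Q1, Q1 to median, median to Q3, and Q3 to max.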
Min 45, Q1 90, Median 95, Q3 100, Max 135
Approximately what percent of the fastest speeds:
• are at least 100 mph?
• are at most 90 mph?
Min 45, Q1 90, Median 95, Q3 100, Max 135
Approximately what percent of the fastest speeds lie:
• between 90 and 100 mph?
• at most 95 mph or at least 100 mph?
Visual summaries for quantitative variables
• Histogram: a chart of the data that shows how many observations fall in each equally spaced interval.
  – Usually uses 6 to 15 intervals.
  – Can use frequency or relative frequency.
• Boxplot: a visualization of the 5-number summary.
  – Shows Q1, the median, and Q3 as lines around and through a middle box.
  – Identifies outliers.
Boxplots: examples
• Fastest speed: Max 135 mph, Q3 100 mph, Median 95 mph, Q1 90 mph, Min 45 mph
• Coins carried: Max 80 coins, Q3 25 coins, Median 9 coins, Q1 5 coins, Min 0 coins
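The slides say a boxplot identifies outliers but do not state the rule. A common convention, assumed here, flags values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 − Q1. The Python sketch applies it to the quartiles and extremes of the two boxplot examples; `outlier_fences` and `flag_outliers` are hypothetical helpers.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences under the k x IQR rule (assumed k = 1.5;
    the slides do not state which rule their boxplots use)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return [x for x in values if x < low or x > high]


# Quartiles and extremes from the fastest-speed and coins boxplots.
print("Fastest speed fences:", outlier_fences(q1=90, q3=100))
print("Coins fences:", outlier_fences(q1=5, q3=25))
print("Flagged speed extremes:", flag_outliers([45, 135], q1=90, q3=100))
print("Flagged coin extremes:", flag_outliers([0, 80], q1=5, q3=25))
```

Under this rule the fastest-speed fences are 75 and 115 mph, so both 45 and 135 mph would be flagged; the coins fences are −25 and 55, so 80 coins would be flagged and 0 would not. Plots drawn with a different rule could flag different points.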
Boxplot shows the same shape as the histogram: symmetric
Boxplot shows the same shape as the histogram: right-skewed
Boxplot shows the same shape as the histogram: left-skewed
Link measures of center to shape
Another example: parties per month (figure with the outliers marked)
Parties per month, without the outliers
Median = 4.5: about 50% of the students surveyed partied fewer than 4.5 times per month. The distribution is right-skewed, so mean > median.
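The link between center and shape can be written as a rough rule of thumb, sketched below in Python. The tolerance `rel_tol` (treating a mean within 5% of the median as "about equal") is a hypothetical choice, and `shape_hint` is a hypothetical helper. The rule is no substitute for looking at the histogram or boxplot: the mean and median can be close for distributions that are not symmetric.

```python
def shape_hint(mean, median, rel_tol=0.05):
    """Rough guess at shape from the mean-median gap (rule of thumb, not a test).

    rel_tol is a hypothetical threshold: a gap of at most rel_tol * |median|
    is treated as mean ≈ median.
    """
    if abs(mean - median) <= rel_tol * abs(median):
        return "roughly symmetric (mean ≈ median)"
    if mean > median:
        return "right-skewed (mean > median)"
    return "left-skewed (mean < median)"


print(shape_hint(94.8, 95))  # fastest speed driven
print(shape_hint(17.3, 9))   # coins carried
```

With the earlier examples, 94.8 vs. 95 mph comes out roughly symmetric and 17.3 vs. 9 coins comes out right-skewed, matching the shapes seen in their plots.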
Consider the variables Party and Year
• Response variable, Party: How many parties do you attend in a month? (quantitative)
• Explanatory variable, Year: What year are you in school? (categorical, ordinal)
Explore the relationship with side-by-side boxplots
Which year has the highest median? The largest box? The most outliers? Do we observe a trend?
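Comparing a quantitative response across categories starts with summarizing each group separately, which is what side-by-side boxplots display. The Python sketch below computes each group's median; the responses are hypothetical values made up for illustration, not the survey data behind these slides, and `medians_by_group` is a hypothetical helper. The same grouping would apply to the teacher A versus teacher B scores in the motivating example.

```python
import statistics
from collections import defaultdict


def medians_by_group(pairs):
    """Group (category, value) pairs and return each group's median,
    the value marked by the middle line of each side-by-side boxplot."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {category: statistics.median(values) for category, values in groups.items()}


# Hypothetical (year, parties per month) responses; NOT the survey data from the slides.
responses = [("Freshman", 2), ("Freshman", 5), ("Freshman", 3),
             ("Sophomore", 4), ("Sophomore", 6),
             ("Senior", 1), ("Senior", 8)]

for year, med in medians_by_group(responses).items():
    print(year, med)
```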
Review: If you understood today's lecture, you should be able to meet these objectives (all relating to quantitative variables):
• Recognize and interpret two plots:
  – Histogram
  – Boxplot
• Calculate and interpret two measures of center:
  – Mean
  – Median
• Calculate and interpret the five-number summary
• Recognize and understand the effects of outliers and skewness