Lecture #3 Tuesday, August 30, 2016 Textbook: Sections 2.4 through 2.6


Statistics 200, Lecture #3
Objectives (all relating to quantitative variables):
• Recognize and interpret two plots:
  – Histogram
  – Boxplot
• Calculate and interpret two measures of center:
  – Mean
  – Median
• Calculate and interpret the five-number summary
• Recognize and understand the effects of outliers & skewness

Motivating example
A group of students was randomly assigned to one of two classes. One class was taught by teacher A and the other by teacher B. At the end of the semester, all students took the same exam. Investigate whether there is any difference in exam scores between the two teachers.
Exam scores: 53 72 35 47 64 66 13 6 35 42 45 59 58 69 53 67 57 53 62 95 74 2 61 84 88 65 69 76 53 71 71 87 98 83 81 73 75

Summarizing Quantitative Variables
The distribution of a quantitative variable is the overall pattern of how often the possible values occur. Four key aspects of the distribution are:
• Location: center, average
• Spread: variability
• Shape: symmetric, bell, skew
• Outliers
Let's begin with the shape, which is best seen with a visual summary.

Visual summaries for quantitative variables: Histogram and Boxplot
Histogram: A chart of the data that shows how many observations are in each equally spaced interval.
• Usually use 6-15 intervals
• Can use frequency or relative frequency
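The counting behind a histogram can be sketched in a few lines of Python. This is only an illustration: it pools all the exam scores listed on the "Motivating example" slide (the transcript does not say which scores belong to which teacher), and the function name `histogram_counts`, the range 0 to 100, and the choice of 10 intervals are assumptions made for the example.

```python
# Pooled exam scores as listed on the "Motivating example" slide.
scores = [53, 72, 35, 47, 64, 66, 13, 6, 35, 42, 45, 59, 58, 69, 53, 67, 57,
          53, 62, 95, 74, 2, 61, 84, 88, 65, 69, 76, 53, 71, 71, 87, 98, 83,
          81, 73, 75]

def histogram_counts(values, low, high, n_bins):
    """Count observations in n_bins equally spaced intervals on [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A value equal to the top edge is placed in the last interval.
        idx = min(int((v - low) // width), n_bins - 1)
        counts[idx] += 1
    return counts

# Assumed layout: 10 intervals of width 10 (within the usual 6-15 range).
counts = histogram_counts(scores, low=0, high=100, n_bins=10)
rel_freq = [c / len(scores) for c in counts]  # relative frequency per interval

for i, (c, r) in enumerate(zip(counts, rel_freq)):
    print(f"{10 * i:3d} to {10 * i + 10:3d}: {'#' * c:<8} freq={c} rel={r:.2f}")
```

The same counts drive both versions of the chart: plotting `counts` gives a frequency histogram, and plotting `rel_freq` gives a relative frequency histogram with the same shape.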

Histograms
[Figure: histogram of Teacher A scores and histogram of Teacher B scores]

Outlier: An individual value that is unusual compared to the bulk of the other values.

Example
When considering study hours/week, what percent of the students spend:
• at most 3 hours?
• at least 11 hours?
• between 5 and 9 hours?

Shapes of distributions
• Symmetric: The shape of the data is similar on both sides of the center.
  – Bell-shaped is a special case of symmetric.
• Skewed: Values are more spread out on one side than the other.
  – Left-skewed: lower values are more spread out than higher values.
  – Right-skewed: higher values are more spread out than lower values.

Shape Examples: Symmetric
Question: What is the fastest you have ever driven a car?

Shape Examples: Right-skewed and Left-skewed
• Right-skewed. Question: How many coins are you carrying?
• Left-skewed. Question: What is your grade point average?

Breakdown of Descriptive Statistical Methods: Quantitative Data
• Graphs (did one: histogram)
• Numbers, i.e. statistics (do now: measures of center)

Quantitative Data: Measures of Center
• Mean: average of all numbers
  – Symbol for sample mean: x̄
  – Value is sensitive to outliers
• Median: middle observation of ordered data
  – Value is resistant to outliers
• Mode: observation that occurs most frequently
  – Don't really use in this course

Example: Center and outliers
Sample 1 (n = 5): 1, 4, 8, 12, 7
Sample 2 (n = 5): 1, 4, 8, 12, 30
Mean:
• Sample 1: (1 + 4 + 8 + 12 + 7)/5 = 32/5 = 6.4
• Sample 2: (1 + 4 + 8 + 12 + 30)/5 = 55/5 = 11
Ordered data and median:
• Sample 1: 1, 4, 7, 8, 12, so Median = 7
• Sample 2: 1, 4, 8, 12, 30, so Median = 8
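The arithmetic in this example can be checked with a short Python sketch. It uses the two samples from the slide and the standard library's `statistics` module; the variable names are chosen only for illustration.

```python
from statistics import mean, median

sample_1 = [1, 4, 8, 12, 7]
sample_2 = [1, 4, 8, 12, 30]  # same values, but 7 is replaced by the outlier 30

for name, data in [("Sample 1", sample_1), ("Sample 2", sample_2)]:
    # median() sorts the data internally; sorted() is shown only for display.
    print(name, "ordered:", sorted(data),
          "mean:", mean(data), "median:", median(data))
```

Replacing one value with an outlier moves the mean from 6.4 to 11, while the median only moves from 7 to 8.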

Sensitive vs. Resistant statistics
Sensitive statistic:
• Calculated using ALL observations
• Affected by skewness and/or unusual observations
• Example: Mean
Resistant statistic:
• Calculated using only some observations
• Not affected much by outliers
• Example: Median

Examples:
• Fastest speed driven: mean = 94.8 mph, median = 95 mph
• Coins carried: mean = 17.3 coins, median = 9 coins

Work together question: Which is most likely true when considering salaries ($) in a company that employs:
1. 20 factory workers and 2 very highly paid executives: one would find with the salaries that the:
   • mean > median
   • mean < median
   • mean ≈ median
2. 2 factory workers and 20 very highly paid executives: one would find with the salaries that the: (same three choices)
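One way to explore this question is to try it with made-up numbers. In the sketch below, the salary figures (40,000 for a factory worker, 1,000,000 for an executive) are hypothetical and chosen only to make the gap large; they do not come from the lecture.

```python
from statistics import mean, median

WORKER, EXECUTIVE = 40_000, 1_000_000  # hypothetical salaries in dollars

company_1 = [WORKER] * 20 + [EXECUTIVE] * 2   # a few very high values
company_2 = [WORKER] * 2 + [EXECUTIVE] * 20   # a few very low values

for name, salaries in [("Company 1", company_1), ("Company 2", company_2)]:
    print(f"{name}: mean = {mean(salaries):,.0f}, median = {median(salaries):,.0f}")
```

With these assumed values, the mean is pulled toward the small group of unusual salaries in each company, while the median stays with the majority.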

Percentiles
A percentile tells us how much of the data is below a specific value.
What is the value (in study hours/week) for the:
• 5th percentile?
• 90th percentile?
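A rough Python sketch of the idea follows. The study-hours data are hypothetical (the lecture's data set is not in the transcript), and the "nearest rank" rule with "at or below" counting is only one of several percentile conventions; statistical software often interpolates and may give slightly different values.

```python
import math

# Hypothetical study hours per week for 20 students (not from the lecture).
study_hours = [2, 3, 4, 5, 5, 6, 7, 8, 8, 9,
               10, 10, 11, 12, 14, 15, 16, 18, 20, 25]

def percent_at_or_below(values, x):
    """Percent of observations that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

def percentile_nearest_rank(values, p):
    """Smallest observation with at least p percent of the data at or below it."""
    data = sorted(values)
    rank = max(1, math.ceil(p * len(data) / 100))  # 1-based position
    return data[rank - 1]

print("5th percentile: ", percentile_nearest_rank(study_hours, 5))
print("90th percentile:", percentile_nearest_rank(study_hours, 90))
print("Percent at most 3 hours:", percent_at_or_below(study_hours, 3))
```

The two functions are inverses of a sort: one goes from a percent to a value, the other from a value to a percent.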

Percentiles of Interest
• 25th percentile: Lower Quartile (QL), also called First Quartile (Q1)
• 50th percentile: Second Quartile (Q2), also called the Median
• 75th percentile: Upper Quartile (QU), also called Third Quartile (Q3)

We use quartiles for the Five Number Summary, a numerical method for summarizing quantitative data:
• smallest number (min)
• lower or first quartile
• median
• upper or third quartile
• largest number (max)

Example: 5-Number summary
Descriptive Statistics: Fastest_Speed
Variable        N   Minimum  Q1  Median  Q3   Maximum
Fastest_Speed   20  45       90  95      100  135
Fill in the five-number summary:
Percentile            25th  50th    75th
Statistic        Min  Q1    Median  Q3    Max
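For readers who want to compute a five-number summary by hand or in code, here is a minimal Python sketch. The raw speeds are not in the transcript, so the ten values below are made up, chosen so that the result matches the summary on the slide. Quartiles are found as the medians of the lower and upper halves of the ordered data, which is one common textbook convention; software packages may use other rules and give slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # observations below the median position
    upper = data[half + n % 2:]  # observations above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical fastest speeds in mph (not the lecture's data).
speeds = [45, 80, 90, 90, 95, 95, 100, 100, 110, 135]
print(five_number_summary(speeds))
```

When the number of observations is odd, the middle observation is left out of both halves in this convention.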

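Another look: 5-number summary
The 5-number summary divides your data into 4 quarters, each holding approximately 25% of the observations: min to Q1, Q1 to median, median to Q3, and Q3 to max.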

Min  Q1  Median  Q3   Max
45   90  95      100  135
Approximately what percent of the fastest speeds:
• are at least 100 mph?
• are at most 90 mph?

Min  Q1  Median  Q3   Max
45   90  95      100  135
Approximately what percent of the fastest speeds lie:
• between 90 and 100 mph?
• at most 95 mph or at least 100 mph?
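Questions like these reduce to counting quarters, since each gap between neighboring values of the five-number summary holds roughly 25% of the data. The Python sketch below shows that arithmetic for the Fastest_Speed summary; the helper name `approx_percent_between` is an assumption for illustration, and the results are approximations, not exact counts.

```python
# Five-number summary of Fastest_Speed from the slide (mph).
summary = {"min": 45, "q1": 90, "median": 95, "q3": 100, "max": 135}
ORDER = ["min", "q1", "median", "q3", "max"]

def approx_percent_between(lower, upper):
    """Approximate percent of observations between two summary points."""
    quarters = ORDER.index(upper) - ORDER.index(lower)
    return 25 * quarters  # each quarter holds about 25% of the data

at_least_100 = approx_percent_between("q3", "max")
at_most_90 = approx_percent_between("min", "q1")
between_90_and_100 = approx_percent_between("q1", "q3")
# "At most 95 or at least 100" covers every quarter except median-to-Q3.
at_most_95_or_at_least_100 = (approx_percent_between("min", "median")
                              + approx_percent_between("q3", "max"))

print(f"at least {summary['q3']} mph: about {at_least_100}%")
print(f"at most {summary['q1']} mph: about {at_most_90}%")
print(f"between {summary['q1']} and {summary['q3']} mph: about {between_90_and_100}%")
print(f"at most {summary['median']} or at least {summary['q3']} mph: "
      f"about {at_most_95_or_at_least_100}%")
```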

Visual summaries for quantitative variables: Histogram and Boxplot
Histogram: A chart of the data that shows how many observations are in each equally spaced interval.
• Usually use 6-15 intervals
• Can use frequency or relative frequency
Boxplot: Visualization of the 5-number summary.
• Shows Q1, Median, Q3 as lines around and through a middle box
• Identifies outliers

Boxplots: Examples
Fastest speed: Max 135 mph, Q3 100 mph, Median 95 mph, Q1 90 mph, Min 45 mph
Coins carried: Max 80 coins, Q3 25 coins, Median 9 coins, Q1 5 coins, Min 0 coins
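The slides do not say how a boxplot decides which points to flag as outliers. A widely used convention, assumed here and not taken from the lecture, is the 1.5 × IQR rule: any value more than 1.5 interquartile ranges below Q1 or above Q3 is flagged. A minimal Python sketch using the quartiles from the two examples (the function name `outlier_fences` is hypothetical):

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper cutoffs under the (assumed) 1.5 x IQR convention."""
    iqr = q3 - q1  # interquartile range: the width of the box
    return q1 - k * iqr, q3 + k * iqr

coins_low, coins_high = outlier_fences(q1=5, q3=25)
speed_low, speed_high = outlier_fences(q1=90, q3=100)

# Largest and smallest values shown in the two examples.
print("Coins fences:", (coins_low, coins_high), "| 80 coins flagged:", 80 > coins_high)
print("Speed fences:", (speed_low, speed_high),
      "| 45 mph flagged:", 45 < speed_low, "| 135 mph flagged:", 135 > speed_high)
```

Under this assumed rule, the largest coin count and both the slowest and fastest speeds would be drawn as individual outlier points rather than as part of the whiskers.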

Boxplot shows same shape as histogram: Symmetric

Boxplot shows same shape as histogram: Right-skewed

Boxplot shows same shape as histogram: Left-skewed

Link measures of center to shape

Another example: Parties per month (the plot shows outliers)

Parties per month, without the outliers

Median: 50% of students surveyed partied fewer than 4.5 times per month.
Right-skewed → mean > median

Consider the variables Party and Year
• Party (response variable; quantitative): How many parties do you attend in a month?
• Year (explanatory variable; categorical, ordinal): What year are you in school?

Explore relationship with boxplot

Which year has highest median? Largest box? Most outliers? Do we observe a trend?

Review: If you understood today's lecture, you should be able to solve
Objectives (all relating to quantitative variables):
• Recognize and interpret two plots:
  – Histogram
  – Boxplot
• Calculate and interpret two measures of center:
  – Mean
  – Median
• Calculate and interpret the five-number summary
• Recognize and understand the effects of outliers & skewness