STAT 250 Dr. Kari Lock Morgan

STAT 250 Dr. Kari Lock Morgan. Describing Data II, Sections 2.3, 2.4, 2.5: one quantitative variable (2.3, 2.4); one quantitative by one categorical (2.4); two quantitative (2.5).

The 95% Rule: The standard deviation for hours of sleep per night is closest to: ½ / 1 / 2 / 4 / I have no idea

z-score: The z-score for a data value, x, is $z = \frac{x - \bar{x}}{s}$ for sample data, and $z = \frac{x - \mu}{\sigma}$ for population data. The z-score measures the number of standard deviations a value lies away from the mean.

z-score: A z-score puts values on a common scale. A z-score is the number of standard deviations a value falls from the mean. For symmetric, bell-shaped distributions, about 95% of all z-scores fall between −2 and 2, so z-scores beyond these values can be considered extreme.
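
A minimal Python sketch of the z-score formula and the ±2 rule of thumb. The helper names `z_score` and `is_extreme` are illustrative, not course software, and the 95% check assumes an exactly normal distribution.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies from the mean: (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def is_extreme(z: float, cutoff: float = 2.0) -> bool:
    """Flag z-scores beyond +/- cutoff (the slide's rule of thumb uses 2)."""
    return abs(z) > cutoff


# For an exactly normal distribution, the share of z-scores between -2 and 2
# is about 0.954, which the 95% rule rounds to 95%.
share_within_2 = NormalDist().cdf(2) - NormalDist().cdf(-2)
print(round(share_within_2, 3))
```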

z-score: Which is better, an ACT score of 28 or a combined SAT score of 2100? ACT: μ = 21, σ = 5. SAT: μ = 1500, σ = 325. Assume ACT and SAT scores have approximately bell-shaped distributions. ACT score of 28 / SAT score of 2100 / I don’t know
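
One way to carry out this comparison, sketched in Python: standardize each score with the mean and standard deviation given on the slide, then compare the z-scores. This assumes, as the slide does, that both score distributions are roughly bell-shaped.

```python
# Population parameters given on the slide
ACT_MEAN, ACT_SD = 21, 5
SAT_MEAN, SAT_SD = 1500, 325


def z_score(x, mean, sd):
    """Standardize x: how many SDs it lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_act = z_score(28, ACT_MEAN, ACT_SD)    # (28 - 21) / 5 = 1.4
z_sat = z_score(2100, SAT_MEAN, SAT_SD)  # (2100 - 1500) / 325 ≈ 1.85

# The larger z-score is the relatively more impressive score
better = "SAT 2100" if z_sat > z_act else "ACT 28"
print(f"ACT z = {z_act:.2f}, SAT z = {z_sat:.2f}; higher relative score: {better}")
```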

Other Measures of Location: Maximum = largest data value. Minimum = smallest data value. Quartiles (with m = the median): Q1 = median of the values below m; Q3 = median of the values above m.
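
A sketch of the quartile definition above in Python, assuming that when n is odd the median itself is left out of both halves (the slide does not say). The function name is hypothetical; R's `summary()` and Minitab use their own interpolation rules, so their quartiles can differ slightly on small samples.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), with Q1 and Q3 taken as the
    medians of the values below and above the overall median.
    Assumption: for odd n, the middle value is excluded from both halves."""
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]               # values below m
    upper = values[half + (n % 2):]     # values above m
    return (values[0], median(lower), median(values), median(upper), values[-1])


print(five_number_summary([7, 1, 3, 2, 6, 5, 4]))  # odd n
print(five_number_summary(range(1, 9)))             # even n
```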

Five Number Summary Five Number Summary: Min Max Q1 Q3 m 25% Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics

Five Number Summary:
> summary(study_hours)
   Min. 1st Qu.  Median 3rd Qu.    Max.
   2.00   10.00   15.00   20.00   69.00
The distribution of the number of hours spent studying each week is: Symmetric / Right-skewed / Left-skewed / Impossible to tell
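
A quick sketch of how to read shape from this output: each of the four intervals between consecutive summary values holds roughly 25% of the data, so comparing their widths shows which tail is stretched. The numbers are copied from the R output; comparing tail widths is an informal heuristic, not a formal test of skewness.

```python
# Five-number summary from the R output on the slide
summary = {"Min": 2.0, "Q1": 10.0, "Median": 15.0, "Q3": 20.0, "Max": 69.0}

# Widths of the four intervals; each holds roughly 25% of the data,
# so a much wider interval signals a stretched tail.
gaps = {
    "Min to Q1": summary["Q1"] - summary["Min"],
    "Q1 to Median": summary["Median"] - summary["Q1"],
    "Median to Q3": summary["Q3"] - summary["Median"],
    "Q3 to Max": summary["Max"] - summary["Q3"],
}
for label, width in gaps.items():
    print(f"{label:>13}: {width:5.1f}")

# Middle half is balanced (5 and 5), but the top quarter spans 49 hours
# versus 8 for the bottom quarter: a long right tail.
long_right_tail = gaps["Q3 to Max"] > gaps["Min to Q1"]
```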

Percentiles: The Pth percentile is the value that is greater than P% of the data. We already used z-scores to determine whether an SAT score of 2100 or an ACT score of 28 is better. We could also have used percentiles: an ACT score of 28 is at the 91st percentile, and an SAT score of 2100 is at the 97th percentile.
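
Percentiles can be computed directly from data, or approximated from a normal model when only the mean and standard deviation are known. The sketch below does both; `percentile_rank` is a hypothetical helper following the slide's definition, and the normal approximation assumes bell-shaped score distributions. Under that assumption the ACT score lands near the 92nd percentile and the SAT score near the 97th, close to the values on the slide.

```python
from statistics import NormalDist


def percentile_rank(data, x):
    """Percent of data values strictly less than x, following the slide's
    wording: the Pth percentile is greater than P% of the data."""
    values = list(data)
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Normal-model approximation using the means and SDs from the z-score slide
act_pct = 100 * NormalDist(mu=21, sigma=5).cdf(28)
sat_pct = 100 * NormalDist(mu=1500, sigma=325).cdf(2100)
print(f"ACT 28: about {act_pct:.1f}%, SAT 2100: about {sat_pct:.1f}% of scores below")
```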

Five Number Summary Five Number Summary: Min Max Q1 Q3 m 25% 0th percentile 25th percentile 50th percentile 75th percentile 100th percentile

Measures of Spread: Range = Max − Min. Interquartile Range (IQR) = Q3 − Q1. Is the range resistant to outliers? Yes / No. Is the IQR resistant to outliers?

Comparing Statistics: Measures of Center: mean (not resistant), median (resistant). Measures of Spread: standard deviation (not resistant), IQR (resistant), range (not resistant). Most often, we use the mean and the standard deviation, because they are calculated from all the data values and so use all the available information.
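
To see resistance in action, this sketch computes each statistic on a small made-up sample and again after replacing its largest value with an extreme outlier. The data are hypothetical, and the quartiles use the median-of-halves rule from the Other Measures of Location slide. The median and IQR stay put, while the mean, standard deviation, and range all move.

```python
from statistics import mean, median, stdev


def center_and_spread(data):
    """Mean, median, SD, range, and IQR for a sample (n >= 2)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    q1 = median(values[:half])            # median of values below m
    q3 = median(values[half + n % 2:])    # median of values above m
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "range": values[-1] - values[0],
        "IQR": q3 - q1,
    }


# Hypothetical sample, then the same sample with its largest value
# replaced by an extreme outlier.
clean = [4, 6, 7, 8, 9, 10, 12]
with_outlier = [4, 6, 7, 8, 9, 10, 120]

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    stats = center_and_spread(data)
    print(name, {k: round(v, 2) for k, v in stats.items()})
```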

Boxplot: The box runs from Q1 to Q3 and contains the middle 50% of the data, with a line at the median. Lines (“whiskers”) extend from each quartile to the most extreme value that is not an outlier. Minitab: Graph -> Boxplot -> One Y -> Simple

Boxplot: Outliers* are plotted as individual points beyond the whiskers. *For boxplots, outliers are defined as any point more than 1.5 IQRs beyond the quartiles (although you don’t have to know that).
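
The 1.5 × IQR rule can be sketched as follows; the function names are hypothetical and `k = 1.5` is the multiplier quoted on the slide. Applied to the study-hours summary (Q1 = 10, Q3 = 20), the upper fence is 35, so the maximum of 69 would be drawn as an outlier.

```python
def boxplot_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences: k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(data, q1, q3, k=1.5):
    """Return the outliers and the whisker ends (the most extreme values
    that are not outliers), given the quartiles of the data."""
    low, high = boxplot_fences(q1, q3, k)
    outliers = [v for v in data if v < low or v > high]
    inside = [v for v in data if low <= v <= high]
    whiskers = (min(inside), max(inside)) if inside else None
    return outliers, whiskers


# Quartiles from the study_hours summary: Q1 = 10, Q3 = 20
low, high = boxplot_fences(10, 20)
print(low, high)   # -5.0 35.0, so the maximum of 69 lies beyond the upper fence
```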

Boxplot: This boxplot shows a distribution that is: Symmetric / Left-skewed / Right-skewed

Summary: One Quantitative Variable. Summary statistics: center (mean, median); spread (standard deviation, range, IQR); five-number summary; percentiles. Visualization: dotplot, histogram, boxplot. Other concepts: shape (symmetric, skewed, bell-shaped); outliers and resistance; z-scores.

Quantitative and Categorical Relationships: Here we are interested in a quantitative variable broken down by categorical groups.

Side-by-Side Boxplots Minitab: Graph -> Boxplot -> One Y -> With Groups

Stacked Dotplots Minitab: Graph -> Dotplot -> One Y -> With Groups

Overlaid Histograms Minitab: Graph -> Histogram -> With Groups

Quantitative Statistics by a Categorical Variable: Any of the statistics we use for a quantitative variable can be computed separately for each level of a categorical variable. Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics -> By variables

Difference in Means: Often, when comparing a quantitative variable across two categories, we compute the difference in means: $\bar{x}_F - \bar{x}_M = 25.586 - 24.466 = 1.12$
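
Computing group means and their difference takes only a few lines. In this sketch the records are invented for illustration (the slide's underlying data are not reproduced here), and F and M are simply the group labels used in the slide's notation.

```python
from collections import defaultdict
from statistics import mean


def group_means(pairs):
    """Mean of a quantitative variable for each level of a categorical one.
    `pairs` is an iterable of (group, value) tuples."""
    groups = defaultdict(list)
    for group, value in pairs:
        groups[group].append(value)
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical (group, value) records, for illustration only
records = [("F", 24.0), ("F", 27.0), ("M", 23.5), ("M", 25.5), ("F", 26.0)]
means = group_means(records)
diff = means["F"] - means["M"]   # x̄_F - x̄_M
print(means, round(diff, 3))
```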

Summary: One Quantitative and One Categorical Variable. Summary statistics: any summary statistics for quantitative variables, broken down by groups; difference in means. Visualization: side-by-side graphs.

Two Quantitative Variables: Summary statistic: correlation. Visualization: scatterplot.

Scatterplot: A scatterplot is a graph of the relationship between two quantitative variables. Minitab: Graph -> Scatterplot -> Simple

Direction of Association: A positive association means that values of one variable tend to be higher when values of the other variable are higher. A negative association means that values of one variable tend to be lower when values of the other variable are higher. Two variables are not associated if knowing the value of one variable does not give you any information about the value of the other variable.

Exploring Associations: In the states data, explore the associations between obesity rate and the following variables: PhysicalActivity (% doing physical activity in the past month); Smokers (% who smoke); Population (state population, in millions); HouseholdIncome (mean household income, in $); McCainVote (% voting for McCain in the 2008 election); IQ (mean IQ score). Make your initial guesses…

Associations Minitab: Graph -> Scatterplot -> Simple -> Multiple Graphs -> In separate panels of the same graph

Correlation: The correlation is a measure of the strength and direction of linear association between two quantitative variables. Sample correlation: r. Population correlation: ρ (“rho”). Minitab: Stat -> Basic Statistics -> Correlation
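
A from-scratch sketch of the sample correlation r, using the usual formula r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). Python 3.10+ also provides `statistics.correlation`, which computes the same quantity; the hand-written version just makes the formula visible. It assumes neither variable is constant, since the denominator would then be zero.

```python
from math import sqrt


def correlation(x, y):
    """Sample (Pearson) correlation r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length, n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)                   # spread of x
    syy = sum((b - my) ** 2 for b in y)                   # spread of y
    return sxy / sqrt(sxx * syy)


print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))   # perfect positive line
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))   # perfect negative line
```

The sign of r gives the direction of the association, and its size (between −1 and 1) gives how tightly the points follow a straight line.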

Correlation: What are the properties of correlation?

Correlation

Correlation Guessing Game http://www.istics.net/Correlations/ Enter PennState for the group ID.

Correlation: NFL Teams (scatterplot, r = 0.43)

Testosterone Levels and Time: What is the correlation between testosterone levels and hour of the day? Positive / Negative / About 0. Are testosterone level and hour of the day associated? Yes / No

TVs and Life Expectancy

Correlation Cautions: Correlation can be heavily affected by outliers. Always plot your data! r = 0 means no linear association; the variables could still be associated in some other way. Always plot your data! Correlation does not imply causation!

Summary: Two Quantitative Variables. Summary statistic: correlation. Visualization: scatterplot.

Lots of Scatterplots Minitab: Graph -> Matrix Plot

3 Variables: Adding a Categorical Variable to a Scatterplot Minitab: Graph -> Scatterplot -> With Groups

3 Variables: Adding a Quantitative Variable to a Scatterplot Minitab: Graph -> Bubble Plot -> Simple

Four Variables!: Adding a categorical and a quantitative variable to a scatterplot Minitab: Graph -> Bubble Plot -> With Groups

Scatterplot with Histograms/Boxplots/Dotplots Minitab: Graph -> Marginal Plot

To Do: Read Sections 2.4 and 2.5. Do Homework 2.2, 2.3, 2.4, 2.5 (due Friday, 2/6).