Describing Data: Two Variables

Slides:



Advertisements
Similar presentations
Chapter 3, Numerical Descriptive Measures
Advertisements

Describing Quantitative Variables
DESCRIBING DISTRIBUTION NUMERICALLY
C. D. Toliver AP Statistics
Chapter 2 Exploring Data with Graphs and Numerical Summaries
Descriptive Measures MARE 250 Dr. Jason Turner.
Describing Data: One Variable
Agresti/Franklin Statistics, 1 of 63  Section 2.4 How Can We Describe the Spread of Quantitative Data?
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/6/12 Describing Data: One Variable SECTIONS 2.1, 2.2, 2.3, 2.4 One categorical.
Chapter 3 Describing Data Using Numerical Measures
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Describing Data: One Quantitative Variable
Chap 3-1 EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE FALL 2008 Chapter 3 Describing Data: Numerical.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1 Statistics for Business and Economics 7 th Edition Chapter 2 Describing Data:
Sullivan – Statistics: Informed Decisions Using Data – 2 nd Edition – Chapter 3 Introduction – Slide 1 of 3 Topic 16 Numerically Summarizing Data- Averages.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
MEASURES OF SPREAD – VARIABILITY- DIVERSITY- VARIATION-DISPERSION
Basic Business Statistics 10th Edition
Chap 3-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 3 Describing Data: Numerical Statistics for Business and Economics.
The Five-Number Summary And Boxplots. Chapter 3 – Section 5 ●Learning objectives  Compute the five-number summary  Draw and interpret boxplots 1 2.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Agresti/Franklin Statistics, 1 of 63 Chapter 2 Exploring Data with Graphs and Numerical Summaries Learn …. The Different Types of Data The Use of Graphs.
STAT 211 – 019 Dan Piett West Virginia University Lecture 2.
AP Statistics Chapters 0 & 1 Review. Variables fall into two main categories: A categorical, or qualitative, variable places an individual into one of.
Numerical Descriptive Measures
Describing distributions with numbers
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
M08-Numerical Summaries 2 1  Department of ISM, University of Alabama, Lesson Objectives  Learn what percentiles are and how to calculate quartiles.
STAT 250 Dr. Kari Lock Morgan
Numerical Descriptive Techniques
Confidence Intervals I 2/1/12 Correlation (continued) Population parameter versus sample statistic Uncertainty in estimates Sampling distribution Confidence.
1.3: Describing Quantitative Data with Numbers
Applied Quantitative Analysis and Practices LECTURE#08 By Dr. Osman Sadiq Paracha.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/11/12 Describing Data: Two Variables SECTIONS 2.1, 2.4, 2.5 Two categorical.
STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter.
Section 1 Topic 31 Summarising metric data: Median, IQR, and boxplots.
Describing distributions with numbers
Lecture 3 Describing Data Using Numerical Measures.
Applied Quantitative Analysis and Practices LECTURE#09 By Dr. Osman Sadiq Paracha.
Lecture 5 Dustin Lueker. 2 Mode - Most frequent value. Notation: Subscripted variables n = # of units in the sample N = # of units in the population x.
Measures of Dispersion How far the data is spread out.
Chap 3-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 3 Describing Data Using Numerical.
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures Business Statistics, A First Course.
Chapter 3 Descriptive Statistics II: Additional Descriptive Measures and Data Displays.
Chapter 8 Making Sense of Data in Six Sigma and Lean
Chapter 3 Looking at Data: Distributions Chapter Three
Basic Business Statistics Chapter 3: Numerical Descriptive Measures Assoc. Prof. Dr. Mustafa Yüzükırmızı.
Review BPS chapter 1 Picturing Distributions with Graphs What is Statistics ? Individuals and variables Two types of data: categorical and quantitative.
Chapter 5 Describing Distributions Numerically.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Describing Data: One Quantitative Variable SECTIONS 2.2, 2.3 One quantitative.
Plan for Today: Chapter 11: Displaying Distributions with Graphs Chapter 12: Describing Distributions with Numbers.
STAT 101: Day 5 Descriptive Statistics II 1/30/12 One Quantitative Variable (continued) Quantitative with a Categorical Variable Two Quantitative Variables.
Chapter 5 Describing Distributions Numerically Describing a Quantitative Variable using Percentiles Percentile –A given percent of the observations are.
Activity: Car Correlations Consumer Reports’ data from a sample of n=109 car models We’ll explore the following associations: (a) Weight vs. City MPG (b)
Describe Quantitative Data with Numbers. Mean The most common measure of center is the ordinary arithmetic average, or mean.
Chapter 2 Describing Data: Numerical
Descriptive Statistics ( )
Lecture #3 Tuesday, August 30, 2016 Textbook: Sections 2.4 through 2.6
Describing Data: Two Variables
EXPLORATORY DATA ANALYSIS and DESCRIPTIVE STATISTICS
Box and Whisker Plots Algebra 2.
One Quantitative Variable: Measures of Spread
Means & Medians.
Exploratory Data Analysis
Chapter 1: Exploring Data
Measures of Position Section 3.3.
Honors Statistics Review Chapters 4 - 5
MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019.
Presentation transcript:

Describing Data: Two Variables STAT 250 Dr. Kari Lock Morgan Describing Data: Two Variables SECTIONS 2.4, 2.5 One quantitative variable (2.4) One quantitative and one categorical (2.4) Two quantitative (2.5)

z-score Which is better, an ACT score of 28 or a combined SAT score of 2100? ACT: μ = 21, σ = 5 SAT: μ = 1500, σ = 325 Assume ACT and SAT scores have approximately bell-shaped distributions ACT score of 28 SAT score of 2100 I don’t know

Honeybee Waggle Dances https://www.youtube.com/watch?v=-7ijI-g4jHg

Honeybee Waggle Dance Honeybee scouts investigate new home or food source options; the scouts communicate the information to the hive with a “waggle dance” Scientists took bees to an island with only two possible options for nesting: one of very high quality and one of low quality. They recorded Quality of nesting site Distance to nesting site Number of waggle dance circuits performed Duration of waggle dance B Seeley, T., Honeybee Democracy, Princeton University Press, Princeton, NJ, 2010, p. 128

Questions of the Day How many circuits of the waggle dance do honey bees do? How is this related to quality of a nesting site? How is duration of the dance related to distance to a nesting site?

Other Measures of Location Maximum = largest data value Minimum = smallest data value Quartiles: Q1 = median of the values below m. Q3 = median of the values above m.

Five Number Summary Five Number Summary: Min Max Q1 Q3 m 25% Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics

Five Number Summary The distribution of number of circuits is Symmetric Right-skewed Left-skewed Impossible to tell

The Pth percentile is the value which is greater than P% of the data We already used z-scores to determine whether an SAT score of 2100 or an ACT score of 28 is better We could also have used percentiles: ACT score of 28: 91st percentile SAT score of 2100: 97th percentile

Five Number Summary Five Number Summary: Min Max Q1 Q3 m 25% 0th percentile 25th percentile 50th percentile 75th percentile 100th percentile

Measures of Spread Range = Max – Min Interquartile Range (IQR) = Q3 – Q1 Is the range resistant to outliers? Yes No Is the IQR resistant to outliers?

Comparing Statistics Measures of Center: Measures of Spread: Mean (not resistant) Median (resistant) Measures of Spread: Standard deviation (not resistant) IQR (resistant) Range (not resistant) Most often, we use the mean and the standard deviation, because they are calculated based on all the data values, so use all the available information

Boxplot Middle 50% of data Median *For boxplots, outliers are defined as any point more than 1.5 IQRs beyond the quartiles (although you don’t have to know that) Outlier Outlier Lines (“whiskers”) extend from each quartile to the most extreme value that is not an outlier Middle 50% of data Q3 Median Q1 Minitab: Graph -> Boxplot -> One Y -> Simple

Boxplot This boxplot shows a distribution that is Symmetric Left-skewed Right-skewed

One Quantitative and One Categorical How is number of waggle circuits related to the quality of the nesting site? Two variables One quantitative (number of circuits) One categorical (quality – low or high) Can do anything for one quantitative variable, broken down by categorical groups

Side-by-Side Boxplots Minitab: Graph -> Boxplot -> One Y -> With Groups

Stacked Dotplots Minitab: Graph -> Dotplot -> One Y -> With Groups

Overlaid Histograms Minitab: Graph -> Histogram -> With Groups

Quantitative Statistics by a Categorical Variable Any of the statistics we use for a quantitative variable can be looked at separately for each level of a categorical variable Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics -> By variables

Difference in Means Often, when comparing a quantitative variable across two categories, we compute the difference in means Honeybees perform 60.5 circuits more, on average, for the high quality site as opposed to the low quality site.

Association? Does there appear to be an association between number of waggle circuits and quality of potential nesting site? Yes No

Summary: One Quantitative and One Categorical Summary Statistics Any summary statistics for quantitative variables, broken down by groups Difference in means Visualization Side-by-side graphs

Two Quantitative Variables How is duration of the dance related to distance to a nesting site? Two quantitative variables Summary Statistics: correlation Visualization: scatterplot

Scatterplot A scatterplot is the graph of the relationship between two quantitative variables. Minitab: Graph -> Scatterplot -> Simple

Direction of Association A positive association means that values of one variable tend to be higher when values of the other variable are higher A negative association means that values of one variable tend to be lower when values of the other variable are higher Two variables are not associated if knowing the value of one variable does not give you any information about the value of the other variable

Correlation The correlation is a measure of the strength and direction of linear association between two quantitative variables Sample correlation: r Population correlation: ρ (“rho”) r = 0.994 for duration of dance and distance to site Minitab: Stat -> Basic Statistics -> Correlation

Correlation -1 ≤ r ≤ 1 The sign indicates the direction of association positive association: r > 0 negative association: r < 0 no linear association: r ≠ 0 The closer r is to ±1, the stronger the linear association r has no units and does not depend on the units of measurement The correlation between X and Y is the same as the correlation between Y and X

Correlation Guessing Game http://www.istics.net/Correlations/ Enter PennState for the group ID. Highest scorer in the class by the first exam gets one extra credit point on Exam 1!

Correlation NFL Teams r = 0.43

Correlation Cautions Correlation can be heavily affected by outliers. Always plot your data!

Testosterone Levels and Time What is the correlation between testosterone levels and hour of the day? Positive Negative About 0 Are testosterone level and hour of the day associated? Yes No

Correlation Cautions Correlation can be heavily affected by outliers. Always plot your data! r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data!

TVs and Life Expectancy

Correlation Cautions Correlation can be heavily affected by outliers. Always plot your data! r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data! Correlation does not imply causation!

Summary: Two Quantitative Variables Summary Statistics: correlation Visualization: scatterplot

To Do Read Sections 2.4 and 2.5 Do HW 2.2, 2.3, 2.4, 2.5 (due Friday, 9/18)