STAT 101: Day 5 Descriptive Statistics II 1/30/12 One Quantitative Variable (continued) Quantitative with a Categorical Variable Two Quantitative Variables.

Presentation transcript:

STAT 101: Day 5 Descriptive Statistics II 1/30/12 One Quantitative Variable (continued) Quantitative with a Categorical Variable Two Quantitative Variables Section 2.3, 2.4, 2.5 Professor Kari Lock Morgan Duke University

Clicker Registration To register your clicker, just press the letter that appears next to your name, then press the second letter that appears next to your name

What are The Odds That Stats Would Be This Popular? - New York Times, 1/26/12 There are billions of bytes generated daily, not just from the Internet but also from sciences like genetics and astronomy. Companies like Google and Facebook, as well as product marketers, risk analysts, spies, natural philosophers and gamblers are all scouring the info, desperate to find a new angle on what makes us and the world tick. … What no one has are enough people to figure out the valuable patterns that lie inside the data. …

Measures of Center m = $1,250,000 (median), x̄ = $2,210,000 (mean). The mean is “pulled” in the direction of skewness.

Standard Deviation The sample standard deviation, s, measures the spread of a distribution. The larger s is, the more spread out the distribution is. Standard deviation is always ≥ 0. R: sd()
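As a minimal sketch of what `sd()` computes: the sample standard deviation is the square root of the sum of squared deviations from the mean, divided by n - 1. The vector `sleep_hours` below is made-up data for illustration, not the class survey.

```r
# Hypothetical data: hours of sleep for 8 students (not from the class survey)
sleep_hours <- c(6, 7, 7, 8, 8, 8, 9, 11)

n <- length(sleep_hours)
x_bar <- mean(sleep_hours)

# Sample standard deviation from its definition: sqrt(sum((x - mean)^2) / (n - 1))
s_manual <- sqrt(sum((sleep_hours - x_bar)^2) / (n - 1))

# Built-in function referenced on the slide
s_builtin <- sd(sleep_hours)

print(c(manual = s_manual, builtin = s_builtin))
```

The two values should agree, since `sd()` also divides by n - 1.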

Standard Deviation Both of these distributions are bell-shaped

The 95% Rule If a distribution is symmetric and bell-shaped, then approximately 95% of the data values will lie within 2 standard deviations of the mean
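One way to check the 95% rule is by simulation. The sketch below assumes normally distributed data generated with `rnorm()`; the mean, standard deviation, and seed are arbitrary choices for illustration, not values from the course.

```r
set.seed(101)  # arbitrary seed so the simulation is reproducible

# Simulated bell-shaped data (an assumption for illustration, not course data)
x <- rnorm(10000, mean = 7, sd = 1)

lower <- mean(x) - 2 * sd(x)
upper <- mean(x) + 2 * sd(x)

# Proportion of values within 2 standard deviations of the mean
prop_within <- mean(x >= lower & x <= upper)
print(prop_within)
```

The printed proportion should land close to 0.95. For a clearly skewed sample, the rule need not hold.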

The 95% Rule The standard deviation for hours of sleep per night is closest to (a) ½ (b) 1 (c) 2 (d) 4 (e) I have no idea

z-score A z-score is a unit-free measure of the extremity of a data point. It tells us how many standard deviations away from the mean a value is. Values farther from 0 are more extreme. For symmetric, bell-shaped distributions, about 95% of all z-scores fall between -2 and 2.

z-score Which is better, an ACT score of 28 or a combined SAT score of 2100? ACT: mean = 21, sd = 5. SAT: mean = 1500, sd = 325. Assume ACT scores and SAT scores have approximately symmetric and bell-shaped distributions. (a) ACT score of 28 (b) SAT score of 2100 (c) I don’t know
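The comparison can be sketched in R using the usual definition z = (x - mean) / sd, which matches the description of "how many standard deviations away from the mean a value is". The helper `z_score` is a hypothetical name introduced here; the means and standard deviations are the ones given on the slide.

```r
# z-score: number of standard deviations a value lies from the mean
# (z_score is a hypothetical helper, not a built-in R function)
z_score <- function(x, mean, sd) (x - mean) / sd

z_act <- z_score(28, mean = 21, sd = 5)        # 1.4
z_sat <- z_score(2100, mean = 1500, sd = 325)  # about 1.85

print(c(ACT = z_act, SAT = z_sat))
```

The score with the larger z-score is the more extreme one relative to its own distribution.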

Other Measures of Location Maximum = largest data value. Minimum = smallest data value. Quartiles: Q1 = median of the values below m; Q3 = median of the values above m.

Five Number Summary Five Number Summary: Min, Q1, m, Q3, Max. Each of the four intervals between consecutive values contains about 25% of the data. R: summary()
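A small sketch of the five-number summary in R, using made-up data. Besides `summary()`, base R also has `fivenum()`, which returns exactly five values. With an even number of observations, its quartiles ("hinges") coincide with the medians of the lower and upper halves.

```r
# Hypothetical data: weekly study hours for 8 students (made-up values)
study_hours <- c(2, 4, 5, 6, 7, 8, 9, 30)

# fivenum() returns Min, lower hinge (Q1), Median, upper hinge (Q3), Max
five <- fivenum(study_hours)
print(five)

# summary() reports the same kind of information plus the mean
print(summary(study_hours))
```

`summary()` interpolates quartiles with a different convention, so its 1st and 3rd quartiles may differ slightly from a hand calculation on small data sets.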

Percentile The Pth percentile is the value of a quantitative variable which is greater than P percent of the data. We already used z-scores to determine whether an SAT score of 2100 or an ACT score of 28 is better. We could also have used percentiles: – ACT score of 28: 91st percentile – SAT score of 2100: 97th percentile
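As a rough check on these percentiles, a z-score can be converted to an approximate percentile with `pnorm()`. This sketch assumes the scores are exactly normal, which is stronger than the "approximately symmetric and bell-shaped" assumption on the slides, so the results should only be close to the quoted percentiles.

```r
# z-scores from the ACT/SAT example on the slides
z_act <- (28 - 21) / 5
z_sat <- (2100 - 1500) / 325

# Under a normal model, pnorm(z) is the proportion of values below z
pct_act <- 100 * pnorm(z_act)
pct_sat <- 100 * pnorm(z_sat)

print(round(c(ACT = pct_act, SAT = pct_sat), 1))
```

Both values come out within about a percentage point of the 91st and 97th percentiles quoted above.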

Five Number Summary Five Number Summary: Min (0th percentile), Q1 (25th percentile), m (50th percentile), Q3 (75th percentile), Max (100th percentile). Each interval between consecutive values contains about 25% of the data.

Five Number Summary The distribution of number of hours you spend studying each week is (a) Symmetric (b) Right-skewed (c) Left-skewed (d) Impossible to tell > summary(study_hours) Min. 1st Qu. Median 3rd Qu. Max

Measures of Spread Range = Max – Min. Interquartile Range (IQR) = Q3 – Q1. Is the range resistant to outliers? (a) Yes (b) No Is the IQR resistant to outliers? (a) Yes (b) No
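One way to explore the resistance question is to compute both measures on a small data set with and without an extreme value. The data below are made up for illustration. `IQR()` uses R's default quantile convention, which can differ slightly from the hand definition of quartiles.

```r
# Hypothetical data: the second vector replaces the largest value with an extreme one
x_clean   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
x_outlier <- c(1, 2, 3, 4, 5, 6, 7, 8, 90)

# Range = Max - Min
range_clean   <- diff(range(x_clean))
range_outlier <- diff(range(x_outlier))

# Interquartile range = Q3 - Q1 (R's IQR() uses its default quantile convention)
iqr_clean   <- IQR(x_clean)
iqr_outlier <- IQR(x_outlier)

print(c(range_clean = range_clean, range_outlier = range_outlier))
print(c(iqr_clean = iqr_clean, iqr_outlier = iqr_outlier))
```

In this example the range jumps from 8 to 89, while the IQR is unchanged.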

Outliers Outliers can be informally identified by looking at a plot, but one rule of thumb for identifying outliers is data values more than 1.5 IQRs beyond the quartiles. A data value is an outlier if it is smaller than Q1 – 1.5(IQR) or larger than Q3 + 1.5(IQR).
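The rule of thumb can be sketched directly in R. The data are made up, and the quartiles are computed as the slides define them (medians of the values below and above m) rather than with R's `quantile()`, which interpolates differently.

```r
# Hypothetical data: weekly study hours for 9 students (made-up values)
study_hours <- c(2, 4, 5, 5, 6, 7, 8, 9, 30)

# Quartiles as defined on the slides: medians of the values below and above m
m  <- median(study_hours)
q1 <- median(study_hours[study_hours < m])
q3 <- median(study_hours[study_hours > m])
iqr <- q3 - q1

# Rule-of-thumb fences: 1.5 IQRs beyond the quartiles
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

outliers <- study_hours[study_hours < lower_fence | study_hours > upper_fence]
print(outliers)
```

Here Q1 = 4.5 and Q3 = 8.5, so the fences are -1.5 and 14.5, and only the value 30 is flagged.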

Boxplot The box spans Q1 to Q3, with a line at the median. Lines (“whiskers”) extend from each quartile to the most extreme value that is not an outlier. Outliers are plotted individually. R: boxplot(study_hours, ylab="Hours spent studying")
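To see the numbers behind a boxplot without drawing it, base R provides `boxplot.stats()`. The sketch below uses made-up data. R's hinges follow its own convention and may differ slightly from quartiles computed by hand.

```r
# Hypothetical data: weekly study hours for 9 students (made-up values)
study_hours <- c(2, 4, 5, 5, 6, 7, 8, 9, 30)

# boxplot.stats() returns the numbers a boxplot is drawn from
bs <- boxplot.stats(study_hours)

# $stats: lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(bs$stats)

# $out: points beyond the whiskers, drawn individually as outliers
print(bs$out)
```

The upper whisker stops at 9, the most extreme value that is not an outlier, and 30 is reported separately.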

Boxplot Which boxplot goes with the histogram of waiting times for the bus? (a) (b)(c)

Summary: One Quantitative Variable Summary Statistics – Center: mean, median – Spread: standard deviation, range, IQR – Percentiles – 5 number summary Visualization – Dotplot – Histogram – Boxplot Other concepts – Shape: symmetric, skewed, bell-shaped – Outliers, resistance – z-scores

Quantitative and Categorical Relationships Boxplots are particularly useful for comparing distributions of a quantitative variable across different levels of a categorical variable

Side-by-Side Boxplots Do students whose parents had more of an education have higher GPAs? R: boxplot(gpa~parent_degree, ylab="GPA", xlab="Parents' Highest Degree")
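A self-contained version of this call might look like the sketch below. The data frame `students` and all of its values are hypothetical. Only the variable names `gpa` and `parent_degree` come from the slide.

```r
# Hypothetical data frame; the variable names mirror the slide, the values are made up
students <- data.frame(
  gpa = c(3.1, 3.4, 2.9, 3.6, 3.3, 3.8, 3.5, 3.0, 3.7),
  parent_degree = factor(rep(c("High school", "College", "Graduate"), each = 3),
                         levels = c("High school", "College", "Graduate"))
)

# One box per level of the categorical variable
bp <- boxplot(gpa ~ parent_degree, data = students,
              ylab = "GPA", xlab = "Parents' Highest Degree")

# The returned object records the group labels and the number of cases per group
print(bp$names)
print(bp$n)
```

Setting the factor levels explicitly controls the left-to-right order of the boxes, which otherwise defaults to alphabetical.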

Side-by-Side Boxplots Does GPA differ by major?

Side-by-Side Boxplots Do students who’ve had AP statistics do better in STAT 101? NO!

Side-by-Side Boxplots

Quantitative Statistics by a Categorical Variable Any of the statistics we use for a quantitative variable can be looked at separately for each level of a categorical variable Mean hours per week spent studying by major:
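In R, one common way to compute a statistic separately for each level of a categorical variable is `tapply()`. The majors and hours below are made up for illustration and are not the class results.

```r
# Hypothetical data; majors and hours are made up for illustration
major <- c("Econ", "Econ", "Biology", "Biology", "Biology", "Math", "Math")
study_hours <- c(10, 14, 18, 20, 22, 12, 16)

# Mean study hours separately for each level of the categorical variable
mean_by_major <- tapply(study_hours, major, mean)
print(mean_by_major)

# The same pattern works for any statistic, e.g. the standard deviation
sd_by_major <- tapply(study_hours, major, sd)
print(sd_by_major)
```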

Summary: One Quantitative and One Categorical Summary Statistics – Any summary statistics for quantitative variables, broken down by each level of the categorical variable Visualization – Side-by-side boxplots

Scatterplot A scatterplot is a graph of the relationship between two quantitative variables. Each dot represents one case. R: plot(study_hours, gpa)

Direction of Association A positive association means that values of one variable tend to be higher when values of the other variable are higher. A negative association means that values of one variable tend to be lower when values of the other variable are higher. Two variables are not associated if knowing the value of one variable does not give you any information about the value of the other variable.

Cars Data - Handout Quantitative Variables: – Weight (pounds) – City MPG – Fuel capacity (gallons) – Page number (in Consumer Reports) – Time to go ¼ mile (in seconds) – Acceleration time from 0 to 60 mph Relationships – Weight vs. CityMPG – Weight vs. FuelCapacity – PageNum vs. Fuel Capacity – Weight vs. QtrMile – Acc060 vs. QtrMile – CityMPG vs. QtrMile

Correlation The sample correlation, r, measures the strength and direction of linear association between two quantitative variables. s_X: sample standard deviation of X. s_Y: sample standard deviation of Y. R: cor(X,Y)
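The formula on this slide did not survive in the transcript. A standard definition that uses both standard deviations is r = (1/(n-1)) Σ ((x_i - x̄)/s_X)((y_i - ȳ)/s_Y); it is assumed here to be what the slide showed. The sketch below checks this against `cor()` on made-up car data.

```r
# Hypothetical data: car weight (pounds) and city MPG for 6 cars (made-up values)
weight   <- c(2500, 2800, 3200, 3500, 4000, 4500)
city_mpg <- c(32, 29, 25, 22, 19, 16)

n <- length(weight)

# Convert each variable to z-scores, then sum the products and divide by n - 1
z_x <- (weight - mean(weight)) / sd(weight)
z_y <- (city_mpg - mean(city_mpg)) / sd(city_mpg)
r_manual <- sum(z_x * z_y) / (n - 1)

# Built-in function referenced on the slide
r_builtin <- cor(weight, city_mpg)

print(c(manual = r_manual, builtin = r_builtin))
```

The two values should agree, and r is negative here because heavier cars in this made-up sample have lower MPG.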

Car Correlations What are the properties of correlation? (-.91) (.89) (-.08) (-.45) (.99) (.51)

Correlation -1 ≤ r ≤ 1. Positive association: r > 0. Negative association: r < 0. No linear association: r ≈ 0. The closer r is to ±1, the stronger the linear association. r does not depend on the units of measurement. The correlation between X and Y is the same as the correlation between Y and X.
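Two of these properties, unit invariance and symmetry, are easy to check numerically. The sketch below uses made-up values. The pounds-to-kilograms factor is the standard conversion.

```r
# Hypothetical data (made-up values)
weight_lb <- c(2500, 2800, 3200, 3500, 4000, 4500)
city_mpg  <- c(32, 29, 25, 22, 19, 16)

r <- cor(weight_lb, city_mpg)

# Changing units (pounds to kilograms) leaves r unchanged
weight_kg <- weight_lb * 0.45359237
r_kg <- cor(weight_kg, city_mpg)

# Swapping the roles of X and Y leaves r unchanged
r_swapped <- cor(city_mpg, weight_lb)

print(c(r = r, r_kg = r_kg, r_swapped = r_swapped))
```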