Stat 31, Section 1, Last Time Distributions (how are data “spread out”?) Visual Display: Histograms Binwidth is critical Bivariate display: scatterplot.

Presentation transcript:

Stat 31, Section 1, Last Time Distributions (how are data “spread out”?) Visual Display: Histograms Binwidth is critical Bivariate display: scatterplot Course Organization & Website https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31sec1Home.html

Exploratory Data Analysis 4 “Time Plots”, i.e. “Time Series”: Idea: when time structure is important, plot the variable (vertical axis) as a function of time (horizontal axis) Often useful to “connect the dots”

Class Time Series Example Monthly Airline Passenger Numbers https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg5Done.xls Increasing Trend (long term growth, over years) Increasing Variation (appears proportional to trend) “Seasonal Effect” - 12 Month Cycle (Peak in summer, less in winter)

Airline Passengers Example Interesting variation: log transformation Stabilizes variation Since log of product is sum Shows changing variation prop’l to trend Log10 is “most interpretable” (log10(1000) = 3, …) Generally useful trick (there are others)
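The point that a log transformation stabilizes variation when the variation is proportional to the trend can be sketched in a few lines of Python. The numbers here are hypothetical (they are not the airline data): each "year" is a trend level multiplied by the same seasonal factors, so the raw seasonal swing grows with the level while the log10 swing stays constant, because the log of a product is a sum.

```python
import math

# Hypothetical multiplicative series: trend level * seasonal factor.
# These values are made up for illustration; they are NOT the airline data.
trend_levels = [100, 200, 400, 800]
seasonal_factors = [0.8, 1.0, 1.25]  # low, typical, and peak season


def swing(values):
    """Return max minus min, a crude measure of within-year variation."""
    return max(values) - min(values)


raw_swings = []
log_swings = []
for level in trend_levels:
    year = [level * factor for factor in seasonal_factors]
    raw_swings.append(swing(year))
    # log10(level * factor) = log10(level) + log10(factor),
    # so the swing on the log scale no longer depends on the level.
    log_swings.append(swing([math.log10(v) for v in year]))

print("raw swings:", raw_swings)
print("log10 swings:", log_swings)
```

On the raw scale the swing doubles every time the level doubles; on the log10 scale it is the same for every level (up to rounding).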

Airline Passengers Example A look under the hood https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg5Raw.xls Use Chart Wizard Chart Type: Line (or could do XY) Use subtype for points & lines Use menu for first log10 Although could just type it in Drag down to repeat for whole column

Time Series HW HW: 1.36, 1.37 Use EXCEL

Exploratory Data Analysis 5 Numerical Summaries of Quant. Variables: Idea: Summarize distributional information (“center”, “spread”, “skewed”) In Text, Sec. 1.2 for data $x_1, x_2, \ldots, x_n$ (subscripts allow “indexing numbers” in list)

Numerical Summaries “Centers” (note there are several) “Mean” = Average = $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $\Sigma$ is the Greek letter “Sigma”, for “sum” In EXCEL, use “AVERAGE” function

Numerical Summaries of Center “Median” = Value in middle (of sorted list) Unsorted E.g: 3, 1, 27, 2, 0 (is 27 “in middle”? no) Sorted E.g: 0, 1, 2, 3, 27 (2 is a better “middle”!) EXCEL: use function “MEDIAN”
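The five-value example on this slide can be checked with Python's standard `statistics` module. This is only a sketch of the idea (sort, then take the middle value); the variable names are illustrative.

```python
import statistics

unsorted_values = [3, 1, 27, 2, 0]

# The middle position of the UNSORTED list holds 27, which is not a sensible "middle".
middle_of_unsorted = unsorted_values[len(unsorted_values) // 2]

# Sorting first gives 0, 1, 2, 3, 27, whose middle entry is the median.
sorted_values = sorted(unsorted_values)
middle_of_sorted = sorted_values[len(sorted_values) // 2]

median_value = statistics.median(unsorted_values)  # sorts internally
mean_value = statistics.mean(unsorted_values)

print("middle of unsorted list:", middle_of_unsorted)
print("middle of sorted list:  ", middle_of_sorted)
print("median:", median_value, " mean:", mean_value)
```

With an odd number of values the median is the single middle entry of the sorted list; with an even number, `statistics.median` averages the two middle entries.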

Difference Betw’n Mean & Median Symmetric Distribution: Essentially no difference Right Skewed: median $M$ splits the area 50% / 50%; mean $\bar{x}$ is bigger since it “feels tails more strongly”

Difference Betw’n Mean & Median Outliers (unusual values): Nice Web Example: http://www.stat.sc.edu/~west/applets/box.html Mean feels outliers much more strongly Leaves “range of most of data” Good notion of “center”? (perhaps not) Median affected very minimally Robustness Terminology: Median is “resistant to the effect of outliers”
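A small numerical sketch of resistance, using hypothetical data (a tight cluster, then the same cluster with one outlier added): the mean moves a long way toward the outlier, while the median barely changes.

```python
import statistics

# Hypothetical data, chosen only to illustrate the effect of one outlier.
bulk = [1, 2, 3, 4, 5]
with_outlier = bulk + [100]

mean_shift = statistics.mean(with_outlier) - statistics.mean(bulk)
median_shift = statistics.median(with_outlier) - statistics.median(bulk)

print("mean:   %.2f -> %.2f" % (statistics.mean(bulk), statistics.mean(with_outlier)))
print("median: %.2f -> %.2f" % (statistics.median(bulk), statistics.median(with_outlier)))
```

After adding the outlier, the mean sits above five of the six data values, so it no longer lies in the “range of most of data”; the median stays inside the cluster.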

Difference Betw’n Mean & Median A more flexible web example: http://www.ruf.rice.edu/~lane/stat_sim/descriptive/index.html Get various dist’ns, by manipulating bar heights See Mean, Median and more Similar for symmetric distributions Very different when skewed “Big Gap”, can make median jump a lot But mean is less sensitive (more “continuous”)

Numerical Centerpoint HW HW: 1.49 a (but make histograms), b Use EXCEL

Numerical Summaries (cont.) “Spreads” (again there are several) 1. Range = biggest - smallest Problems: Feels only “outliers” Not “bulk of data” Very non-resistant to outliers

Numerical Summaries of Spread Variance = $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ = “average squared distance to $\bar{x}$” EXCEL: VAR Drawback: units are wrong, e.g. for $x_i$ in feet, $s^2$ is in square feet

Numerical Summaries of Spread Standard Deviation $s = \sqrt{s^2}$ EXCEL: STDEV Scale is right (same units as the data) But not resistant to outliers Will use quite a lot later (for reasons described later)
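The variance and standard deviation can be computed step by step as a check on the definitions. This sketch assumes the sample versions with divisor n - 1 (the convention used by EXCEL's VAR and STDEV and by Python's `statistics.variance` and `statistics.stdev`); the data values are hypothetical lengths in feet.

```python
import math
import statistics

# Hypothetical measurements, in feet.
lengths_ft = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(lengths_ft)
mean_ft = sum(lengths_ft) / n

# Squared distance of each value from the mean (units: square feet).
squared_devs = [(x - mean_ft) ** 2 for x in lengths_ft]

# Sample variance: sum of squared deviations divided by n - 1 (assumed convention).
variance_sqft = sum(squared_devs) / (n - 1)

# Standard deviation: square root of the variance, back in feet.
sd_ft = math.sqrt(variance_sqft)

print("variance (sq ft):", variance_sqft)
print("std. dev. (ft):  ", sd_ft)
print("library check:   ", statistics.variance(lengths_ft), statistics.stdev(lengths_ft))
```

Taking the square root is what puts the measure of spread back on the scale of the data.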

Interactive View of S. D. Revisit flexible web example: http://www.ruf.rice.edu/~lane/stat_sim/descriptive/index.html Note SD range centered at mean Can put SD “right near middle” (densely packed data) Can put SD at “edges of data” (U shaped data) Can put SD “outside of data” (big spike + outlier) But generally “sensible measure of spread”

Variance – S. D. HW HW: for both data sets in 1.49, find the: Standard Deviation (26.4, 32.9) Use EXCEL

Numerical Summaries of Spread Interquartile Range = IQR Based on “quartiles”, Q1 and Q3 (idea: shows where are 25% & 75% “through the data”) Q1, Q2 = median, and Q3 cut the sorted data into four parts, each holding 25% of the values IQR = Q3 – Q1

Quartiles Example Revisit flexible web example: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg6Done.xls Right skewness gives: Median < Mean (mean “feels farther points more strongly”) Q1 near median Q3 quite far (makes sense from histogram)

Quartiles Example A look under the hood: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg6Raw.xls Can compute as separate functions for each Or use: Tools → Data Analysis → Descriptive Stats Which gives many other measures as well Use “k-th largest & smallest” to get quartiles

5 Number Summary 1. Minimum 2. Q1 - 1st Quartile 3. Median 4. Q3 - 3rd Quartile 5. Maximum Summarize Information About: Center - from 3 Spread - from 2 & 4 (maybe 1 & 5) Skewness - from 2, 3 & 4 Outliers - from 1 & 5

5 Number Summary How to Compute? EXCEL function QUARTILE https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg6Done.xls “One stop shopping” IQR seems to need explicit calculation
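For readers working outside EXCEL, a five-number summary can be sketched with Python's standard library. The helper name `five_number_summary` and the data are hypothetical. The `method="inclusive"` setting is an assumption, chosen because it treats the minimum and maximum as the 0% and 100% points; quartile conventions differ between packages, so results may not match other software exactly.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Quartiles use the 'inclusive' interpolation method (an assumed convention).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical data for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

summary = five_number_summary(data)
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1, computed explicitly

print("min, Q1, median, Q3, max:", summary)
print("IQR:", iqr)
```

As on the slide, the IQR is not returned directly and has to be computed from Q3 and Q1.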

Rule for Defining “Outliers” Caution: There are many of these Textbook version: Above Q3 + 1.5 * IQR Below Q1 – 1.5 * IQR For stamps data: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg6Done.xls No outliers at “low end” Some at “high end”
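The textbook 1.5 * IQR rule can be sketched as a short function. The name `flag_outliers` and the data are hypothetical (this is not the stamps data), and the quartile method is an assumed convention; a different quartile rule can move the fences slightly.

```python
import statistics


def flag_outliers(values):
    """Return the values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical right-skewed data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

outliers = flag_outliers(data)
print("flagged as outliers:", outliers)
```

Here Q1 = 3 and Q3 = 7, so the fences are at -3 and 13: nothing is flagged at the low end, and only the value 30 is flagged at the high end.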

Box Plot Additional Visual Display Device Again legacy from pencil & paper days Not supported in EXCEL We will skip

5 Number Sum. & Outliers HW 1.49 c, d 1.46 and add: (d) How much does the mean change if you omit Montana and Wyoming?