Stor 155, Section 2, Last Time Distributions (how are data “spread out”?) Visual Display: Histograms –Binwidth is critical Time Plots = Time Series Course.

Presentation transcript:

Stor 155, Section 2, Last Time Distributions (how are data “spread out”?) Visual Display: Histograms –Binwidth is critical Time Plots = Time Series Course Organization & Website

Reading In Textbook Approximate Reading for Today’s Material: Pages Approximate Reading for Next Class: Pages 64-83

And now for something completely different Is this class too “monotone”? Easier to understand? Calm environment enhances learning? Or does it induce somnolence? What is “somnolence”? Google definition: Sleepiness, a condition of semiconsciousness approaching coma.

And now for something completely different An experiment: Pull out any coins you have with you How many of you have: –>= 1 penny? –>= 1 nickel? –>= 1 dime? –>= 1 quarter? Choose most frequent denomination

And now for something completely different Collect data (into Spreadsheet): Years stamped on coins (chosen denomination) Many as person has Enter into spreadsheet Look at “distribution” using histogram

And now for something completely different Predicted Answer –From Text Book, Problem 1.32 Distribution is Left Skewed Works out as predicted? Why? Note: most skewed dist’ns seem to be: Right Skewed

Exploratory Data Analysis 4 Numerical Summaries of Quant. Variables: Idea: Summarize distributional information (“center”, “spread”, “skewed”) In Text, Sec. 1.2 for data $x_1, x_2, \ldots, x_n$ (subscripts allow “indexing numbers” in list)

Numerical Summaries A. “Centers” (note there are several) 1. “Mean” = Average = $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ ($\Sigma$ = Greek letter “Sigma”, for “sum”) In EXCEL, use “AVERAGE” function

Numerical Summaries of Center 2. “Median” = Value in middle (of sorted list) Unsorted E.g.: Sorted E.g.: “in middle”? (no) 2 better “middle”! EXCEL: use function “MEDIAN”

Difference Betw’n Mean & Median Symmetric Distribution: Essentially no difference Right Skewed Distribution: median M splits the area 50% / 50%; mean $\bar{x}$ is bigger since it “feels tails more strongly”

Difference Betw’n Mean & Median Outliers (unusual values): Simple Web Example: Mean feels outliers much more strongly –Leaves “range of most of data” –Good notion of “center”? (perhaps not) Median affected very minimally –Robustness Terminology: Median is “resistant to the effect of outliers”

Difference Betw’n Mean & Median A richer web example: Publisher’s Web Site: Statistical Applets: Mean & Median For Symmetric distributions: –Both are same Add an outlier: –Mean feels it much more strongly –Implication for “bad data”: can be very bad Two Clusters: –Median jumps more quickly –Mean more stable (better?)

Computation using Excel Some Toy Examples: Compute Using Excel Functions Mean feels location of data on number line Median feels location of data in sorted list Median breaks tie by averaging center points
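The toy-example idea can also be sketched outside Excel. The Python snippet below is only an illustration: the data values are hypothetical (not the ones used in class), and `statistics.mean` / `statistics.median` stand in for the Excel functions AVERAGE and MEDIAN.

```python
from statistics import mean, median

# Hypothetical toy data, not taken from the slides.
data = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]   # largest value replaced by an outlier
even_count = [1, 2, 3, 4]         # even number of values: two center points

mean_plain = mean(data)                 # location on the number line
median_plain = median(data)             # location in the sorted list
mean_outlier = mean(with_outlier)       # pulled toward the outlier
median_outlier = median(with_outlier)   # middle of the sorted list is unchanged
median_even = median(even_count)        # tie broken by averaging 2 and 3

print(mean_plain, median_plain)
print(mean_outlier, median_outlier)
print(median_even)
```

Replacing a single value moves the mean from 3 to 12 while the median stays at 3, which is the “resistance to outliers” described earlier.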

Numerical Centerpoint HW HW: 1.46 a, 1.47, 1.49 Use EXCEL

And now for something completely different Check out this small quick movie clip:

And now for something completely different Suggestions for other things to show here are very welcome…. Movie Clips… Music… Jokes… Cartoons… …

Numerical Summaries (cont.) B. “Spreads” (again there are several) 1. Range = biggest - smallest Problems: Feels only “outliers” Not “bulk of data” Very non-resistant to outliers

Numerical Summaries of Spread 2. Variance = $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ = “average squared distance to $\bar{x}$” EXCEL: VAR Drawback: units are wrong, e.g. for $x_i$ in feet, $s^2$ is in square feet

Numerical Summaries of Spread 3. Standard Deviation $s = \sqrt{s^2}$ EXCEL: STDEV Scale is right But not resistant to outliers Will use quite a lot later (for reasons described later)
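As a rough check on the variance and standard deviation definitions, here is a minimal Python sketch. It assumes the sample (divide by n - 1) convention, which is what Excel's VAR and STDEV use; the data list `heights_ft` is hypothetical.

```python
from math import isclose, sqrt
from statistics import mean, stdev, variance

# Hypothetical measurements in feet (not course data).
heights_ft = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(heights_ft)
xbar = mean(heights_ft)

# Variance "by hand": sum of squared distances to the mean, divided by n - 1.
# Its units are square feet.
s2_manual = sum((x - xbar) ** 2 for x in heights_ft) / (n - 1)

# Standard deviation: square root of the variance, so back in feet.
s_manual = sqrt(s2_manual)

# The library functions use the same n - 1 convention.
s2_library = variance(heights_ft)
s_library = stdev(heights_ft)

print(s2_manual, s2_library)
print(s_manual, s_library)
```

The hand computation and the library calls agree, and taking the square root is what restores the original units.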

Interactive View of S. D. Interesting web example (manipulate histogram): Note SD range centered at mean Can put SD “right near middle” (densely packed data) Can put SD at “edges of data” (U shaped data) Can put SD “outside of data” (big spike + outlier) But generally “sensible measure of spread”

Variance – S. D. HW C3:For the data set in 1.46 (i.e. 1.37), find the: i.Variance (1620) ii.Standard Deviation (40.2) Use EXCEL

Numerical Summaries of Spread 4. Interquartile Range = IQR Based on “quartiles”, Q1 and Q3 (idea: shows where are 25% & 75% “through the data”) 25% 25% Q1 Q2 = median Q3 IQR = Q3 – Q1

Quartiles Example Revisit Hidalgo Stamp Thickness example: stat-or.unc.edu/webspace/postscript/marron/Teaching/stor /Stor155Eg6Done.xls Right skewness gives: –Median < Mean (mean “feels farther points more strongly”) –Q1 near median –Q3 quite far (makes sense from histogram)

Quartiles Example A look under the hood: Can compute as separate functions for each Or use: Tools → Data Analysis → Descriptive Stats Which gives many other measures as well Use “k-th largest & smallest” to get quartiles

5 Number Summary 1.Minimum 2.Q1 - 1 st Quartile 3.Median 4.Q3 - 3 rd Quartile 5.Maximum Summarize Information About: a)Center-from 3 b)Spread-from 2 & 4 (maybe 1 & 5) c)Skewness-from 2, 3 & 4 d)Outliers-from 1 & 5

5 Number Summary How to Compute? EXCEL function QUARTILE “One stop shopping” IQR seems to need explicit calculation
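For readers working outside Excel, the five-number summary and the explicit IQR calculation might be sketched as follows. This is an illustration under assumptions: the helper `five_number_summary` and the data are hypothetical, and `method="inclusive"` is chosen because it interpolates between sorted values in the way Excel's QUARTILE is understood to; textbook hand rules for quartiles can give slightly different values.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Hypothetical data, not from the course spreadsheet.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

summary = five_number_summary(data)
minimum, q1, med, q3, maximum = summary

# The IQR is not returned by quantiles(), so compute it explicitly.
iqr = q3 - q1

print(summary)
print(iqr)
```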

Rule for Defining “Outliers” Caution: There are many of these Textbook version: Above Q3 + 1.5 * IQR Below Q1 – 1.5 * IQR For stamps data: –No outliers at “low end” –Some at “high end”
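The 1.5 * IQR rule can be sketched in a few lines of Python. The function name `outlier_fences` and the data are hypothetical, and the quartiles use the interpolating `"inclusive"` method, so the cutoffs may differ slightly from a hand calculation with the textbook's quartile rule.

```python
from statistics import quantiles

def outlier_fences(values):
    """Return (low, high) cutoffs: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

low, high = outlier_fences(data)

# Flag anything below the low cutoff or above the high cutoff.
outliers = [x for x in data if x < low or x > high]

print(low, high)
print(outliers)
```

Here nothing falls below the low cutoff and one value exceeds the high cutoff, the same pattern the slide reports for the stamps data.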

5 Number Sum. & Outliers HW 1.43

Box Plot Additional Visual Display Device Again legacy from pencil & paper days Not supported in EXCEL So we won’t do Main use: comparing populations –Example: Figure from text

Box Plot Main use: comparing populations –Example: Figure from text Want to do this? Find better software package than Excel

And now for something completely different Recall Distribution of majors of students in this course:

And now for something completely different How about a business manager joke? How many managers does it take to replace a light bulb? Two. One to find out if it needs changing, and one to tell an employee to change it. Source:

Linear Transformations Idea: What happens to data & summaries, when data are: “shifted and scaled” i.e. “panned and zoomed” Math: $y_i = a x_i + b$ Scaled by $a$ Shifted by $b$

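Linear Transformations Effect on linear summaries (for $y_i = a x_i + b$): Centerpoints, $\bar{x}$ and $M$, “follow data”: $\bar{y} = a\bar{x} + b$, $M_y = a M_x + b$. Spreads, $s$ and IQR, “feel scale, not shift”: $s_y = |a|\, s_x$, $\mathrm{IQR}_y = |a|\, \mathrm{IQR}_x$.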
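A quick numerical check of how summaries respond to shifting and scaling, written as a Python sketch. The scale `a`, shift `b`, and data `x` are arbitrary hypothetical choices; a negative `a` is used to show that spreads pick up the absolute value of the scale.

```python
from math import isclose
from statistics import mean, median, stdev

# Hypothetical scale and shift.
a, b = -2.0, 3.0

# Hypothetical data and its linear transformation y = a * x + b.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [a * xi + b for xi in x]

# Centerpoints "follow data": they are shifted and scaled like the data.
mean_follows = isclose(mean(y), a * mean(x) + b)
median_follows = isclose(median(y), a * median(x) + b)

# Spreads "feel scale, not shift": multiplied by |a|, unaffected by b.
sd_scales = isclose(stdev(y), abs(a) * stdev(x))

print(mean_follows, median_follows, sd_scales)
```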

Most Useful Linear Transfo. “Standardization” Goal: put data sets on “common scale” Approach: 1.Subtract Mean, to “center at 0” 2.Divide by S.D., to “give common SD = 1”

Standardization Result is called “z-score”: $z_i = \frac{x_i - \bar{x}}{s}$ Note that $x_i = \bar{x} + z_i s$ Thus $z_i$ is interpreted as: “number of SDs from the mean”

Standardization Example Next time: work in Excel command: STANDARDIZE

Standardization Example Buffalo Snowfall Data: Standardized data have same (EXCEL default) histogram shape as raw data. (Since axes and bin edges just follow the transformation) i.e. “shape” doesn’t depend on “scaling”

Standardization Example A look under the hood: 1. Compute AVERAGE and SD 2. Standardize by: a. Create Formula in cell B2 b. Drag downwards c. Keep Mean and SD cells fixed using “$” signs 3. Check stand’d data have mean 0 & SD 1 note that “8.247E-16 = 0” (up to rounding error)
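The spreadsheet steps above can be mirrored in a short Python sketch. The helper `standardize` and the data are hypothetical (these are not the Buffalo snowfall values), and the sample standard deviation is assumed, matching Excel's STDEV.

```python
from math import isclose
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the sample SD."""
    xbar = mean(values)   # step 1: compute the average ...
    s = stdev(values)     # ... and the standard deviation
    # Step 2: the same mean and SD are used for every value, which is what
    # fixing the cells with "$" achieves when a formula is dragged down.
    return [(x - xbar) / s for x in values]

# Hypothetical data.
data = [25.0, 39.8, 46.7, 53.5, 63.6, 71.8, 79.0, 85.5, 110.5]

z = standardize(data)

# Step 3: check that the standardized data have mean 0 and SD 1.
# The mean is usually a tiny number rather than exactly 0, due to rounding.
z_mean = mean(z)
z_sd = stdev(z)

print(z_mean, z_sd)
```

A result such as 8.247E-16 for the mean is this kind of rounding error, which is why the slide treats it as 0.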

Standardization HW C4:For data in 1.17, use EXCEL to: a.Give the list of standardized scores b.Give the Z-score for: (i)the mean (0) (ii)the median (-0.223) (iii)the smallest (-1.21) (iv)the largest (2.77) 1.59a, 1.73