Published by Dwain Stokes. Modified over 9 years ago.
1
Stat 31, Section 1, Last Time Distributions (how are data “spread out”?) Visual Display: Histograms Binwidth is critical Bivariate display: scatterplot Course Organization & Website
2
Exploratory Data Analysis 4
“Time Plots”, i.e. “Time Series”: Idea: when time structure is important, plot variable as a function of time (vertical axis: variable, horizontal axis: time) Often useful to “connect the dots”
3
Class Time Series Example
Monthly Airline Passenger Numbers Increasing Trend (long term growth, over years) Increasing Variation (appears proportional to trend) “Seasonal Effect” Month Cycle (Peak in summer, less in winter)
4
Airline Passengers Example
Interesting variation: log transformation Stabilizes variation Since log of product is sum Shows changing variation prop’l to trend Log10 is “most interpretable” (log10(1000) = 3, …) Generally useful trick (there are others)
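A minimal sketch of why the log transformation stabilizes variation, using a made-up series (not the airline data) in which a repeating seasonal factor multiplies a growing trend. The names `seasonal`, `trend` and `yearly_spreads` are illustrative assumptions only.

```python
import math

# Hypothetical series: seasonal factor TIMES trend level, so the raw
# variation is proportional to the trend (as on the slide).
seasonal = [0.8, 1.0, 1.3, 0.9]                  # made-up within-year pattern
trend = [100 * 2 ** year for year in range(4)]   # made-up yearly levels


def yearly_spreads(transform):
    """Within-year range (max - min) of the transformed values, per year."""
    spreads = []
    for level in trend:
        values = [transform(level * s) for s in seasonal]
        spreads.append(max(values) - min(values))
    return spreads


raw_spreads = yearly_spreads(lambda v: v)   # grows with the trend
log_spreads = yearly_spreads(math.log10)    # roughly constant

print("raw spreads:  ", raw_spreads)
print("log10 spreads:", [round(s, 4) for s in log_spreads])
```

Because log10(level × seasonal) = log10(level) + log10(seasonal), the seasonal swing becomes an additive term of the same size every year, which is the "log of product is sum" point.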
5
Airline Passengers Example
A look under the hood Use Chart Wizard Chart Type: Line (or could do XY) Use subtype for points & lines Use menu for first log10 Although could just type it in Drag down to repeat for whole column
6
Time Series HW HW: 1.36, 1.37 Use EXCEL
7
Exploratory Data Analysis 5
Numerical Summaries of Quant. Variables: Idea: Summarize distributional information (“center”, “spread”, “skewed”) In Text, Sec. 1.2 For data $x_1, x_2, \ldots, x_n$ (subscripts allow “indexing numbers” in list)
8
Numerical Summaries “Centers” (note there are several)
“Mean” = Average = $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ ($\Sigma$ = Greek letter “Sigma”, for “sum”) In EXCEL, use “AVERAGE” function
9
Numerical Summaries of Center
“Median” = Value in middle (of sorted list) Unsorted e.g.: 27 “in middle”? (no) Sorted e.g.: 2 is a better “middle”! EXCEL: use function “MEDIAN”
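As a small illustration of the two centers, here is a sketch in Python using the standard `statistics` module. The list `data` is made up for this example (the slide's own numbers did not survive); it is chosen so that 27 sits in the middle position of the unsorted list while 2 is the middle of the sorted list.

```python
import statistics

# Hypothetical data, NOT the slide's original list.
data = [3, 1, 27, 2, 0]

mean_value = statistics.mean(data)       # sum of values divided by n
median_value = statistics.median(data)   # middle value of the SORTED list

print("sorted:", sorted(data))
print("mean:  ", mean_value)
print("median:", median_value)
```

The median function sorts internally, so the order in which the values are typed does not matter.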
10
Difference Betw’n Mean & Median
Symmetric Distribution: Essentially no difference Right Skewed: median M splits the area 50% / 50%, mean $\bar{x}$ is bigger since it “feels tails more strongly”
11
Difference Betw’n Mean & Median
Outliers (unusual values): Nice Web Example: Mean feels outliers much more strongly Leaves “range of most of data” Good notion of “center”? (perhaps not) Median affected very minimally Robustness Terminology: Median is “resistant to the effect of outliers”
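The resistance of the median can be checked numerically. The sketch below uses made-up values (`base` and the outlier 100 are assumptions for illustration, not the web example's data) and compares how far each center moves when one unusual value is added.

```python
import statistics

# Hypothetical data: a tight cluster, then the same cluster plus one outlier.
base = [10, 12, 13, 15, 16]
with_outlier = base + [100]

mean_shift = statistics.mean(with_outlier) - statistics.mean(base)
median_shift = statistics.median(with_outlier) - statistics.median(base)

print("mean moves by:  ", round(mean_shift, 2))
print("median moves by:", median_shift)
```

The mean is pulled well outside the range of most of the data, while the median only steps to a neighboring value.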
12
Difference Betw’n Mean & Median
A more flexible web example: Get various dist’ns, by manipulating bar heights See Mean, Median and more Similar for symmetric distributions Very different when skewed “Big Gap”, can make median jump a lot But mean is less sensitive (more “continuous”)
13
Numerical Centerpoint HW
HW: a (but make histograms), b Use EXCEL
14
Numerical Summaries (cont.)
“Spreads” (again there are several) 1. Range = biggest − smallest Problems: Feels only “outliers” Not “bulk of data” Very non-resistant to outliers
15
Numerical Summaries of Spread
Variance = $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ = “average squared distance to $\bar{x}$” EXCEL: VAR Drawback: units are wrong, e.g. for $x_i$ in feet, $s^2$ is in square feet
16
Numerical Summaries of Spread
Standard Deviation $s = \sqrt{s^2}$ EXCEL: STDEV Scale is right But not resistant to outliers Will use quite a lot later (for reasons described later)
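A minimal sketch of the variance and standard deviation calculations, assuming the sample (n − 1 divisor) versions, which is what EXCEL's VAR and STDEV compute. The list `heights_ft` is made-up data for illustration.

```python
import math
import statistics

# Hypothetical measurements in feet.
heights_ft = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(heights_ft)
mean = sum(heights_ft) / n

# Variance "by hand": squared distances to the mean, summed, divided by n - 1.
variance_by_hand = sum((x - mean) ** 2 for x in heights_ft) / (n - 1)

# Library equivalents (sample versions, n - 1 divisor).
variance = statistics.variance(heights_ft)   # units: square feet
std_dev = statistics.stdev(heights_ft)       # units: feet

print("variance (sq ft):", round(variance, 3))
print("std dev (ft):    ", round(std_dev, 3))
```

Taking the square root is what puts the spread back on the original scale of the data.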
17
Interactive View of S. D. Revisit flexible web example:
Note SD range centered at mean Can put SD “right near middle” (densely packed data) Can put SD at “edges of data” (U shaped data) Can put SD “outside of data” (big spike + outlier) But generally “sensible measure of spread”
18
Variance – S. D. HW HW: for both data sets in 1.49, find the:
Standard Deviation (26.4, ) Use EXCEL
19
Numerical Summaries of Spread
Interquartile Range = IQR Based on “quartiles”, Q1 and Q3 (idea: shows the points 25% & 75% of the way “through the data”) 25% | 25% | 25% | 25%, split at Q1, Q2 = median, Q3 IQR = Q3 – Q1
20
Quartiles Example Revisit flexible web example:
Right skewness gives: Median < Mean (mean “feels farther points more strongly”) Q1 near median Q3 quite far (makes sense from histogram)
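The pattern for right-skewed data can be seen on a small made-up list. This is a sketch only: `skewed` is hypothetical data, and quartile conventions differ between tools; `method="inclusive"` is used here on the assumption that it matches the interpolation used by EXCEL's QUARTILE.

```python
import statistics

# Hypothetical right-skewed data: bunched at small values, long upper tail.
skewed = [1, 2, 2, 3, 3, 4, 6, 9, 15]

mean = statistics.mean(skewed)
q1, median, q3 = statistics.quantiles(skewed, n=4, method="inclusive")

print("mean:", mean, " median:", median)
print("median - Q1:", median - q1)   # small gap: Q1 near median
print("Q3 - median:", q3 - median)   # large gap: Q3 quite far
```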
21
Quartiles Example
A look under the hood: Can compute as separate functions for each Or use: Tools → Data Analysis → Descriptive Stats Which gives many other measures as well Use “k-th largest & smallest” to get quartiles
22
5 Number Summary
1. Minimum 2. Q1 - 1st Quartile 3. Median 4. Q3 - 3rd Quartile 5. Maximum Summarize Information About: Center - from 3 Spread - from 2 & 4 (maybe 1 & 5) Skewness - from 2, 3 & 4 Outliers - from 1 & 5
23
5 Number Summary How to Compute?
EXCEL function QUARTILE “One stop shopping” IQR seems to need explicit calculation
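For readers working outside EXCEL, the same "one stop" calculation can be sketched in Python. The helper `five_number_summary` and the list `values` are hypothetical names and data for illustration; `method="inclusive"` is an assumption chosen to mirror EXCEL's QUARTILE interpolation, and other tools may give slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Hypothetical data, deliberately unsorted.
values = [7, 1, 5, 3, 9]

summary = five_number_summary(values)
iqr = summary[3] - summary[1]   # IQR still needs an explicit subtraction

print("five number summary:", summary)
print("IQR:", iqr)
```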
24
Rule for Defining “Outliers”
Caution: There are many of these Textbook version: Above Q3 + 1.5 * IQR Below Q1 – 1.5 * IQR For stamps data: No outliers at “low end” Some at “high end”
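The textbook 1.5 × IQR rule can be sketched as a short function. Everything here is illustrative: `find_outliers` is a hypothetical helper, `sample` is made-up data (not the stamps data), and the quartiles use `method="inclusive"` on the assumption that it matches EXCEL's QUARTILE.

```python
import statistics


def find_outliers(values, k=1.5):
    """Split off values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    low = [v for v in values if v < low_fence]
    high = [v for v in values if v > high_fence]
    return low, high


# Hypothetical data with one unusually large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]

low_outliers, high_outliers = find_outliers(sample)
print("low-end outliers: ", low_outliers)
print("high-end outliers:", high_outliers)
```

Since there are many versions of this rule, the multiplier `k` is left as a parameter rather than fixed at 1.5.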
25
Box Plot Additional Visual Display Device
Again legacy from pencil & paper days Not supported in EXCEL We will skip
26
5 Number Sum. & Outliers HW
1.49 c, d and add: (d) How much does the mean change if you omit Montana and Wyoming?