Published by Jayson Terry. Modified over 9 years ago.
1
Chapter 16: Exploratory data analysis: numerical summaries
CIS 2033
Based on the textbook: A Modern Introduction to Probability and Statistics, 2007
Instructor: Dr. Longin Jan Latecki
Slides: QUINCY R WALKER
2
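16.1 The Center of the Data Set
Center of the data = sample mean or sample median
Mean: $\bar{x}_n = \frac{x_1 + x_2 + \cdots + x_n}{n}$, where $n$ is the sample size
Median: the middle value of the sorted data (the average of the two middle values when $n$ is even)
Example: the sample mean of the following data is 44.7 (the sample median is 42):
43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41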
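As a quick check of the example, the sketch below recomputes the sample mean and the sample median with Python's standard `statistics` module. The variable names are illustrative only and are not part of the slides.

```python
from statistics import mean, median

# Data from the example on the slide.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]

sample_mean = mean(data)      # sum of the observations divided by n
sample_median = median(data)  # middle value of the sorted observations

print(f"n             = {len(data)}")
print(f"sample mean   = {sample_mean:.1f}")   # 44.7
print(f"sample median = {sample_median}")     # 42
```

The two values of 58 pull the mean above the median, which is why both measures of center are worth reporting.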
3
Outliers
An outlier is an observation that is numerically distant from the rest of the data.
4
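Variability in a Data Set
Sample variance: $s_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$
Sample standard deviation: $s_n = \sqrt{s_n^2}$
where $n$ = number of observations in the sample and $\bar{x}_n$ = sample mean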
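A minimal sketch of the variance and standard deviation, applied to the sample from the 16.1 example. It assumes the usual sample-variance convention with divisor n - 1; if the course uses divisor n instead, only the marked line changes.

```python
from math import sqrt

# Sample from the 16.1 example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]

n = len(data)
x_bar = sum(data) / n  # sample mean

# Assumption: divisor n - 1 (sample variance); use n for the population form.
sample_variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# The standard deviation is the square root of the variance.
sample_sd = sqrt(sample_variance)

print(f"sample variance           = {sample_variance:.3f}")
print(f"sample standard deviation = {sample_sd:.3f}")
```

With this convention the result agrees with `statistics.variance(data)` from the standard library.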
5
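Variability cont.
Median of absolute deviations (MAD): the median of the absolute deviations of a sample from its median.
$\mathrm{MAD}(x_1, \ldots, x_n) = \mathrm{Med}(|x_1 - \mathrm{Med}_n|, \ldots, |x_n - \mathrm{Med}_n|)$
$\mathrm{Med}_n$ = median of the sample
Absolute deviation: the absolute value of the distance of a point $x_i$ in the dataset from the median, $|x_i - \mathrm{Med}_n|$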
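The definition translates directly into two median computations. The sketch below applies it to the sample from the 16.1 example; the variable names are illustrative.

```python
from statistics import median

# Sample from the 16.1 example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]

med = median(data)                                # median of the sample
abs_deviations = [abs(x - med) for x in data]     # distance of each point from the median
mad = median(abs_deviations)                      # median of those distances

print(f"median = {med}")   # 42
print(f"MAD    = {mad}")   # 1
```

The two observations at 58 have absolute deviation 16, yet the MAD stays at 1, which illustrates how little this measure reacts to a few extreme values.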
6
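Empirical quantiles
The order statistics consist of the same elements as the original dataset $x_1, x_2, x_3, \ldots, x_n$, but in ascending order.
Denote by $x_{(k)}$ the $k$th element in the ordered list. Then: $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$
The $p$th quantile of a distribution is $F^{-1}(p)$, where $F$ is the cumulative distribution function; the empirical quantile $q_n(p)$ is the corresponding value computed from the order statistics.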
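The slide's own formula for the empirical quantile did not survive, so the sketch below assumes one common convention: interpolate linearly between neighbouring order statistics at position p(n + 1). The function name `empirical_quantile` is hypothetical, and other interpolation rules exist that give slightly different values.

```python
from math import floor

def empirical_quantile(data, p):
    """Empirical p-th quantile by linear interpolation between order statistics.

    Assumed rule: k = floor(p * (n + 1)), alpha = p * (n + 1) - k,
    q_n(p) = x_(k) + alpha * (x_(k+1) - x_(k)), with 1-based order statistics.
    """
    xs = sorted(data)          # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    position = p * (n + 1)
    k = floor(position)
    alpha = position - k
    # The rule needs x_(k), and x_(k+1) whenever alpha > 0, to exist.
    if k < 1 or k > n or (k == n and alpha > 0):
        raise ValueError("p is outside the range this rule supports for this n")
    if alpha == 0:
        return xs[k - 1]       # lands exactly on an order statistic
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])

# Sample from the 16.1 example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]
print(empirical_quantile(data, 0.5))   # 42, the sample median
```

For this sample n + 1 = 12, so p = 0.25, 0.5 and 0.75 land exactly on the 3rd, 6th and 9th order statistics and no interpolation is needed.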
7
Quartiles
Lower quartile: $q_n(0.25)$
Median (middle quartile): $q_n(0.50)$
Upper quartile: $q_n(0.75)$
Interquartile range (IQR): $\mathrm{IQR} = q_n(0.75) - q_n(0.25)$
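The quartiles and the IQR can be obtained with `statistics.quantiles` from the Python standard library (Python 3.8 or later). The sketch uses `method="exclusive"`, which places the quartiles at positions p(n + 1) in the sorted data; that choice is an assumption, since the slides do not state which interpolation rule is intended.

```python
from statistics import quantiles

# Sample from the 16.1 example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]

# n=4 asks for the three cut points that split the data into quarters.
lower_q, middle_q, upper_q = quantiles(data, n=4, method="exclusive")
iqr = upper_q - lower_q

print(f"lower quartile  q_n(0.25) = {lower_q}")    # 41.0
print(f"median          q_n(0.50) = {middle_q}")   # 42.0
print(f"upper quartile  q_n(0.75) = {upper_q}")    # 43.0
print(f"IQR                       = {iqr}")        # 2.0
```

Note that the IQR is 2 even though the sample ranges from 41 to 58: like the MAD, it is barely affected by the two large observations.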
8
The box-and-whisker plot
Advantages:
  Good representation of statistical data
  Shows quartiles, median and outliers
Disadvantages:
  Poor graphical display of the dataset
  Histogram and kernel density estimate are more informative displays of a single dataset
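To make concrete what a boxplot displays, the sketch below computes the numbers behind one. It assumes the widely used 1.5 x IQR rule for whiskers and outliers, which the slide does not state, and the helper name `boxplot_summary` is hypothetical.

```python
from statistics import quantiles

def boxplot_summary(data, whisker_factor=1.5):
    """Return the values a box-and-whisker plot would draw.

    Assumption: points farther than whisker_factor * IQR from the box
    are flagged as outliers; whiskers end at the most extreme points
    that are not flagged.
    """
    xs = sorted(data)
    q1, med, q3 = quantiles(xs, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Sample from the 16.1 example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]
summary = boxplot_summary(data)
for name, value in summary.items():
    print(f"{name:12s} {value}")
```

For this sample the box runs from 41 to 43, the whiskers coincide with the box edges, and the two observations at 58 are flagged as outliers.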
9
Using boxplots to compare several datasets
Boxplots become useful when we want to compare several datasets in a single, simple graphical display.