Exploratory data analysis: numerical summaries

CIS. Based on the textbook: F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester, A Modern Introduction to Probability and Statistics: Understanding Why and How.
Presentation transcript:

Exploratory data analysis: numerical summaries
CIS 2033. Based on Textbook: A Modern Introduction to Probability and Statistics. 2007
Slides: Quincy R. Walker. Modified by the instructor: Dr. Longin Jan Latecki
Chapter 16: Exploratory data analysis: numerical summaries

16.1 The Center of the Data Set
Center of the data = sample mean: $\bar{x}_n = \frac{x_1 + x_2 + \cdots + x_n}{n}$, where n is the sample size.
Example: the sample mean of the following data is 44.7: 43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41
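As a quick check of the example, the sample mean can be computed directly from its definition. This is a minimal sketch; the function name `sample_mean` is our own choice and not from the slides.

```python
# Data from the slide's example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]


def sample_mean(xs):
    """Return the sample mean (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)


mean = sample_mean(data)
print(round(mean, 1))  # 44.7, matching the value on the slide
```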

Outliers
An outlier is an observation that is numerically distant from the rest of the data. The sample median is more robust than the sample mean in the presence of outliers.
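The robustness claim can be illustrated with the data from the sample-mean example. The sketch below assumes, for illustration only, that the two values of 58 are the outliers (the slides do not say so) and compares how far the mean and the median move when they are dropped.

```python
import statistics

# Data from the sample-mean example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]

# Assumption for illustration: treat the two 58s as the outliers.
without_extremes = [x for x in data if x != 58]

# How much each summary changes when the extreme values are included.
mean_shift = statistics.mean(data) - statistics.mean(without_extremes)
median_shift = statistics.median(data) - statistics.median(without_extremes)

print(f"mean shifts by {mean_shift:.2f}, median shifts by {median_shift}")
```

The mean moves by almost 3 units, while the median moves by only 1.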

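Variability in a Data Set
Variance: $s_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$
Standard deviation: $s_n = \sqrt{s_n^2}$
Here n is the number of observations in the sample and $\bar{x}_n$ is the sample mean. Why we choose the factor 1/(n−1) instead of 1/n will be explained later (in Chapter 19).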
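A minimal sketch of the sample variance and standard deviation with the 1/(n−1) factor, using the data from the sample-mean example. The function names are our own; the standard library's `statistics.variance` and `statistics.stdev` use the same n−1 convention and serve as a cross-check.

```python
import math
import statistics

# Data from the sample-mean example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


print(sample_variance(data), statistics.variance(data))
print(sample_std(data), statistics.stdev(data))
```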

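Variability cont.
$\mathrm{Med}_n$ = median of the sample $x_1, \ldots, x_n$.
Absolute deviation: the absolute value of the distance of a point $x_i$ in the data set from the median, $|x_i - \mathrm{Med}_n|$.
Median of absolute deviations (MAD): the median of the absolute deviations of the sample, $\mathrm{MAD}(x_1, \ldots, x_n) = \mathrm{Med}(|x_1 - \mathrm{Med}_n|, \ldots, |x_n - \mathrm{Med}_n|)$.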
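The MAD can be computed in two passes: take the median, then take the median of the absolute deviations from it. The sketch below uses the data from the sample-mean example; the function name `mad` is our own.

```python
import statistics

# Data from the sample-mean example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]


def mad(xs):
    """Median of the absolute deviations from the sample median."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)


print(mad(data))  # 1
```

For this sample the MAD is 1, far smaller than the standard deviation (about 6.6), because the two values of 58 barely influence it.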

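Empirical quantiles
The order statistics consist of the same elements as the original dataset $x_1, x_2, x_3, \ldots, x_n$, but in ascending order. Denote by $x_{(k)}$ the kth element in the ordered list. Then, following the textbook's definition, the pth empirical quantile for 0 < p < 1 is $q_n(p) = x_{(k)} + \alpha \, (x_{(k+1)} - x_{(k)})$, where $k = \lfloor p(n+1) \rfloor$ and $\alpha = p(n+1) - k$.
The pth empirical quantile corresponds to the pth quantile of a cdf, $F^{\mathrm{inv}}(p)$, where F is the cumulative distribution function of the data.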
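A minimal sketch of an empirical quantile computed from the order statistics. It assumes the interpolation rule q_n(p) = x_(k) + α (x_(k+1) − x_(k)) with k = ⌊p(n+1)⌋ and α = p(n+1) − k, which is, to the best of our reading, the textbook's convention; other books and software packages use slightly different rules. The function name `empirical_quantile` is our own.

```python
import math


def empirical_quantile(xs, p):
    """p-th empirical quantile under the assumed p(n+1) interpolation rule."""
    ordered = sorted(xs)          # order statistics x_(1) <= ... <= x_(n)
    n = len(ordered)
    position = p * (n + 1)
    k = math.floor(position)
    alpha = position - k
    # The rule needs x_(k), and x_(k+1) whenever alpha > 0, to exist.
    if k < 1 or k > n or (k == n and alpha > 0):
        raise ValueError("p(n+1) must lie between 1 and n for this rule")
    lower = ordered[k - 1]        # x_(k); Python lists are 0-based
    if alpha == 0:
        return lower
    return lower + alpha * (ordered[k] - lower)


data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]
print(empirical_quantile(data, 0.5))  # 42, the sample median
```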

Quartiles
Lower quartile: $q_n(0.25)$
Median (middle quartile): $q_n(0.50)$
Upper quartile: $q_n(0.75)$
Interquartile range (IQR): $\mathrm{IQR} = q_n(0.75) - q_n(0.25)$
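The three quartiles and the IQR can be obtained with Python's standard library. This sketch uses `statistics.quantiles` with `method="exclusive"`, which places the pth quantile at position p(n+1) in the ordered sample; treating that as the same convention as the slides' q_n is an assumption on our part. The data are those of the sample-mean example.

```python
import statistics

# Data from the sample-mean example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]

# n=4 asks for the three cut points that split the data into quarters.
lower_q, median, upper_q = statistics.quantiles(data, n=4, method="exclusive")
iqr = upper_q - lower_q

print(lower_q, median, upper_q, iqr)
```

For this sample the quartiles are 41, 42, and 43, so the IQR is 2.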

The box-and-whisker plot
Advantages: a good summary of statistical data; shows the quartiles, the median, and outliers.
Disadvantages: a poor graphical display of a single dataset; a histogram and a kernel density estimate are more informative displays of a single dataset.
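The numbers behind a box-and-whisker plot can be sketched as follows. The slides do not state how outliers are flagged; this example assumes the common convention that points more than 1.5 × IQR beyond the quartiles are outliers and that the whiskers extend to the most extreme points within those fences. The function name `boxplot_stats` and the returned keys are our own.

```python
import statistics


def boxplot_stats(xs, whisker_factor=1.5):
    """Box, whisker ends, and outliers under the assumed 1.5 * IQR rule."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in xs if x < low_fence or x > high_fence),
    }


# Data from the sample-mean example.
data = [43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41]
stats = boxplot_stats(data)
print(stats)
```

With this convention the two values of 58 fall outside the upper fence (46) and would be drawn as separate outlier points.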

Using boxplots to compare several datasets
Boxplots become useful if we want to compare several sets of data in a simple graphical display.