1 Empirical and probability distributions 0.4 exploratory data analysis.



2 Exploratory data analysis
Given an unknown distribution, we often take a sample to explore its characteristics.
Stem-and-leaf display
Order the n observations in the sample from smallest to largest.
Example: 50 test scores on a statistics examination.
[Table: Stem-and-leaf display; columns: Stems, Leaves, Frequency, Depths]
[Table: Ordered stem-and-leaf display; columns: Stems, Leaves, Frequency, Depths]
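A minimal Python sketch of an ordered stem-and-leaf display, assuming non-negative integer scores with the tens digit as the stem and the units digit as the leaf. The function name `stem_and_leaf` and the scores are hypothetical; they are not the 50 exam scores from the slide.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return an ordered stem-and-leaf display as a list of text rows.

    Assumes non-negative integers: stem = tens digit(s), leaf = units digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting orders the leaves within each stem
        leaves[v // 10].append(v % 10)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        stem_leaves = leaves.get(stem, [])    # empty stems are still shown
        leaf_str = " ".join(str(d) for d in stem_leaves)
        rows.append(f"{stem} | {leaf_str:<12} ({len(stem_leaves)})")
    return rows

# Hypothetical scores, NOT the data from the slide.
scores = [93, 77, 67, 72, 52, 83, 66, 84, 59, 63, 75, 97, 84, 73, 81]
for row in stem_and_leaf(scores):
    print(row)
```

Each row shows the stem, its ordered leaves, and the frequency in parentheses; the depths column of the slide's table is left out of this sketch.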

3 Order statistics of the sample
Order statistics of the 50 exam scores.
The order statistics make the sample percentiles easy to compute.
For $0 < 1/(n+1) \le p \le n/(n+1) < 1$, the (100p)th sample percentile is defined as
the $(n+1)p$th order statistic, if $(n+1)p$ is an integer;
or the linear interpolation between $y_r$ and $y_{r+1}$, namely $y_r + t\,(y_{r+1} - y_r)$, if $(n+1)p = r + t$ with $r$ an integer and $t$ a proper fraction.

4 For p = 1/2: (n+1)p = 25.5, so the 50th percentile is $y_{25} + 0.5\,(y_{26} - y_{25})$.
For p = 1/4: (n+1)p = 12.75, so the 25th percentile is $y_{12} + 0.75\,(y_{13} - y_{12})$.
For p = 3/4: (n+1)p = 38.25, so the 75th percentile is $y_{38} + 0.25\,(y_{39} - y_{38})$.
The 50th percentile is called the median of the sample.
The 25th, 50th, and 75th percentiles are the first, second, and third quartiles of the sample.
The 10th, 20th, …, and 90th percentiles are the deciles of the sample.
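The (n+1)p rule can be sketched in Python as follows. The function name `sample_percentile` is hypothetical, and the data (the integers 1 to 50) are chosen only so that n = 50 as on the slide; they are not the exam scores.

```python
def sample_percentile(data, p):
    """(100p)th sample percentile by the (n+1)p rule with linear interpolation."""
    y = sorted(data)                      # order statistics y_1 <= ... <= y_n
    n = len(y)
    if not 1 / (n + 1) <= p <= n / (n + 1):
        raise ValueError("p must satisfy 1/(n+1) <= p <= n/(n+1)")
    pos = round((n + 1) * p, 9)           # pos = r + t; rounding removes float noise
    r = int(pos)                          # integer part r
    t = pos - r                           # proper fraction t, 0 <= t < 1
    if t == 0:
        return y[r - 1]                   # exactly y_r (Python lists are 0-indexed)
    return y[r - 1] + t * (y[r] - y[r - 1])

# Assumed data: 1, 2, ..., 50, so that y_r = r and the percentile equals (n+1)p.
data = list(range(1, 51))
median = sample_percentile(data, 1 / 2)   # position 25.5
q1 = sample_percentile(data, 1 / 4)       # position 12.75
q3 = sample_percentile(data, 3 / 4)       # position 38.25
print(q1, median, q3)
```

With this particular data set each percentile coincides with its position (n+1)p, which makes the interpolation easy to check by hand.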

5 Five-number summary
The five-number summary of a data set consists of the minimum, the first quartile $q_1$, the median, the third quartile $q_3$, and the maximum.
The interquartile range is IQR = $q_3 - q_1$.
A box-and-whisker diagram (box plot) displays the five-number summary.
Ex0.4-2: $y_1 = 34$, $q_1 = 58.75$, $q_2 = m = 71$, $q_3 = 81.25$, $y_{50} = 97$.
The distribution is slightly skewed to the left.
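As a hedged illustration, the five-number summary can be computed with Python's standard library. `statistics.quantiles` with `method="exclusive"` interpolates at position (n+1)p, which matches the percentile definition used in these slides. The helper name `five_number_summary` and the data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (min, q1, median, q3, max) using the (n+1)p quartile rule."""
    # method="exclusive" interpolates at position (n + 1) * p.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)

# Assumed data, not the exam scores from the slide.
data = list(range(1, 51))
low, q1, med, q3, high = five_number_summary(data)
print(low, q1, med, q3, high)
print("IQR =", q3 - q1)
```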

6 Ex0.4-5: IQR = 13.5 - 2 = 11.5.
Inner fences: at a distance of 1.5 × 11.5 = 17.25 from the quartiles.
Outer fences: at a distance of 3 × 11.5 = 34.5 from the quartiles.
The two suspected outliers are marked with an asterisk (*).
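A small Python sketch of the fence arithmetic, using the quartiles implied by the slide ($q_1 = 2$, $q_3 = 13.5$). It assumes the common box-plot convention that fences are measured outward from the quartiles, that values between the inner and outer fences are suspected outliers, and that values beyond the outer fences are outliers; the function names and the test values passed to `classify` are hypothetical.

```python
def fences(q1, q3):
    """Return (iqr, inner_fences, outer_fences) for given quartiles."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return iqr, inner, outer

def classify(value, q1, q3):
    """Label a value relative to the fences (assumed convention, see lead-in)."""
    _, inner, outer = fences(q1, q3)
    if value < outer[0] or value > outer[1]:
        return "outlier"
    if value < inner[0] or value > inner[1]:
        return "suspected outlier"
    return "inside inner fences"

iqr, inner, outer = fences(2, 13.5)
print(iqr, inner, outer)          # 11.5 (-15.25, 30.75) (-32.5, 48.0)
print(classify(35, 2, 13.5))      # a hypothetical observation between the fences
```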

7 Some functions of two or more order statistics
Middle
Midrange = average of the extremes = $(y_1 + y_n)/2$
Trimean = $(q_1 + 2q_2 + q_3)/4$
Spread
Range = difference of the extremes = $y_n - y_1$
Interquartile range = difference of the third and first quartiles = $q_3 - q_1$ (= IQR)
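For illustration, these four measures can be evaluated in Python from the five-number summary quoted in Ex0.4-2 ($y_1 = 34$, $q_1 = 58.75$, $q_2 = 71$, $q_3 = 81.25$, $y_{50} = 97$). The function names are hypothetical.

```python
def midrange(y_min, y_max):
    """Average of the extremes."""
    return (y_min + y_max) / 2

def trimean(q1, q2, q3):
    """Weighted average of the quartiles, with the median counted twice."""
    return (q1 + 2 * q2 + q3) / 4

def sample_range(y_min, y_max):
    """Difference of the extremes."""
    return y_max - y_min

def iqr(q1, q3):
    """Difference of the third and first quartiles."""
    return q3 - q1

# Five-number summary from Ex0.4-2.
y1, q1, q2, q3, y50 = 34, 58.75, 71, 81.25, 97
print(midrange(y1, y50))      # 65.5
print(trimean(q1, q2, q3))    # 70.5
print(sample_range(y1, y50))  # 63
print(iqr(q1, q3))            # 22.5
```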

8 0.5 Graphical comparisons of data sets
A stem-and-leaf display that places two data sets on either side of shared stems is also called a back-to-back stem-and-leaf display.
It is used to compare the characteristics of two populations of data.
Ex0.5-1: the hardness results for Furnace 10 and Furnace 14.
[Table: back-to-back stem-and-leaf display; columns: Depths, Furnace 10 leaves, Stems, Furnace 14 leaves, Depths]

9 Ex0.5-2: five-number summaries (minimum, $q_1$, median, $q_3$, maximum) for the two furnaces.
F10: (46, 47, 48, 49, 51)
F14: (36, 40.5, 43, 46.5, 51)
Comparisons of three or more sets of data are also possible.

10 Quantile-quantile (q-q) plot
For two sets of data, ordered as $x_1 \le x_2 \le \dots \le x_n$ and $y_1 \le y_2 \le \dots \le y_n$:
$x_r$ and $y_r$ are called the quantiles of order $r/(n+1)$; they are the $100[r/(n+1)]$th percentiles of their samples.
In a q-q plot, the quantiles of one sample are plotted against the corresponding quantiles of the other sample.
If both samples were the same, the points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ would lie on a straight line with slope 1 and intercept 0.

11 If the first sample's mean is shifted over by d units, the intercept is $-d$.
The first sample has greater variability than the second if the slope is less than 1.
If the slope increases, the variability of the first sample decreases.
For instance, the first sample is skewed to the left.
Ex0.5-4: the average hourly number of misfeeding leads.
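The effect of a shift or a change in spread on the q-q line can be checked numerically. The sketch below pairs the order statistics of two equal-size samples and fits a least-squares line, with the first sample on the horizontal axis. The function names and all data are hypothetical, and the least-squares fit is only one convenient way to read off slope and intercept.

```python
def qq_points(first, second):
    """Pair the order statistics (x_r, y_r) of two equal-size samples."""
    if len(first) != len(second):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(first), sorted(second)))

def fit_line(points):
    """Least-squares slope and intercept of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

second = [3.1, 4.7, 5.0, 6.2, 7.9, 8.4]        # hypothetical second sample

shifted = [v + 2 for v in second]              # first sample shifted by d = 2
print(fit_line(qq_points(shifted, second)))    # slope near 1, intercept near -2

stretched = [2 * v for v in second]            # first sample with twice the spread
print(fit_line(qq_points(stretched, second)))  # slope near 0.5
```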

12 Probability density and mass functions: probability density functions
The relative frequency (or density) histogram h(x) associated with n observations of a random variable X of the continuous type is a nonnegative function.
The area between its graph and the x-axis is 1.
As n increases, the lengths of the class intervals can approach 0.
Then h(x) approaches some function f(x) that describes the true probabilities.

13 Probability density function (p.d.f.)
(a) $f(x) > 0$, $x \in S$
(b) $\int_S f(x)\,dx = 1$
(c) The probability of the event $a < X < b$ is $P(a < X < b) = \int_a^b f(x)\,dx$
The corresponding distribution of probability is said to be of the continuous type.

14 Ex0.7-1: For a balanced spinner, the result of a spin is a random variable X whose space is $S = \{x : 0 \le x < 1\}$.
Because the spinner is balanced, X has the p.d.f. $f(x) = 1$, $0 \le x < 1$.
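A simulation sketch of the balanced spinner, assuming Python's `random.Random.random()` as a stand-in for a spin. It builds a density histogram h(x), checks that its total area is 1, and shows the bar heights settling near f(x) = 1. The function name, seed, sample size, and number of class intervals are arbitrary choices.

```python
import random

def density_histogram(sample, bins, lo=0.0, hi=1.0):
    """Heights h(x) of a relative frequency (density) histogram on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        k = min(int((x - lo) / width), bins - 1)   # index of the class interval
        counts[k] += 1
    n = len(sample)
    # height = relative frequency / class width, so the total area is 1
    return [c / (n * width) for c in counts], width

rng = random.Random(0)                       # fixed seed for repeatability
spins = [rng.random() for _ in range(100_000)]
heights, width = density_histogram(spins, bins=10)

area = sum(h * width for h in heights)
print("area =", area)                        # 1 up to rounding
print([round(h, 3) for h in heights])        # each height close to f(x) = 1
```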

15 Probability mass function (p.m.f.)
(a) $f(x) > 0$, $x \in S$
(b) $\sum_{x \in S} f(x) = 1$
(c) $P(X = u_i) = f(u_i)$, $i = 1, 2, \dots, k$
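Properties (a) to (c) can be checked mechanically for a small discrete distribution. The sketch below uses a fair six-sided die, which is an assumed example that does not appear in the slides, and exact fractions to avoid rounding error; `is_pmf` is a hypothetical helper.

```python
from fractions import Fraction

def is_pmf(f):
    """Check (a) f(x) > 0 on S and (b) the values sum to 1."""
    return all(p > 0 for p in f.values()) and sum(f.values()) == 1

# Assumed example: a fair die with space S = {1, ..., 6}.
die = {u: Fraction(1, 6) for u in range(1, 7)}

print(is_pmf(die))                                 # True
print(die[3])                                      # (c) P(X = 3) = f(3) = 1/6
print(sum(p for u, p in die.items() if u <= 2))    # P(X <= 2) = 1/3
```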

16

17 Percentiles from a p.d.f.
The (100p)th percentile is a number $\pi_p$ such that the area under f(x) to the left of $\pi_p$ is p: $p = \int_{-\infty}^{\pi_p} f(x)\,dx$.
The 50th percentile $\pi_{0.5}$ is called the median, $m = \pi_{0.5}$.
The 25th and 75th percentiles are called the first and third quartiles: $q_1 = \pi_{0.25}$ and $q_3 = \pi_{0.75}$ (the median $m = q_2 = \pi_{0.5}$ is the second quartile).
In the discrete case, percentiles are often harder to pin down because each point in the space S has positive probability.

18 Ex0.7-6: The largest value, Y, of two spins of the balanced spinner has the p.d.f. $f(y) = 2y$, $0 \le y < 1$. Find the median.
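One way to work this example is to integrate the p.d.f. and solve the percentile equation. The derivation below is a sketch that follows from the stated p.d.f.; the symbol $F$ for the area to the left of $y$ is introduced here and is not used on the slides.

```latex
F(y) = \int_0^{y} 2t \, dt = y^2, \qquad 0 \le y < 1,
\qquad \text{so} \qquad \pi_p^2 = p \;\Rightarrow\; \pi_p = \sqrt{p}.

m = \pi_{0.5} = \sqrt{1/2} \approx 0.7071, \qquad
q_1 = \pi_{0.25} = 0.5, \qquad
q_3 = \pi_{0.75} = \sqrt{0.75} \approx 0.8660.
```

The median exceeds 1/2 because the larger of two spins tends to fall in the upper part of the interval.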

19 Q-q plot for model evaluation
To examine how close a theoretical model is to the real distribution:
Coarse check:
Compute the mean $\mu$ and the variance $\sigma^2$ of the theoretical model.
Perform a random experiment and compute the mean $\bar{x}$ and the variance $s^2$ of the observed data.
Compare these values.
Finer check:
Construct the quantile-quantile (q-q) plot.

20 The (100p)th percentile of a distribution is often called the quantile of order p.
The percentile $\pi_p$ of a theoretical distribution is its quantile of order p.
Empirically, sort the n observations $\{x_1, x_2, \dots, x_n\}$ into the order statistics $\{y_1, y_2, \dots, y_n\}$ with $y_1 \le y_2 \le \dots \le y_n$.
$y_r$ is the sample quantile of order $r/(n+1)$, that is, the $100r/(n+1)$th percentile.
Plot the points $(y_r, \pi_p)$, where $p = r/(n+1)$, $r = 1, \dots, n$.
If the points lie close to a line of slope 1, then $y_r \approx \pi_p$.
Otherwise, the theoretical model is not a good one.

21 Ex0.7-7: using the p.d.f. $f(x) = 1$, $0 \le x < 1$, as a model for random numbers.
From f(x), compute $\mu = 1/2$, $\sigma^2 = 1/12$, and $\sigma = \sqrt{1/12} \approx 0.2887$.
Pick the first 19 random numbers from Table IX in the Appendix (page 665).
Sort them in ascending order.
Compute $\bar{x}$ and $s$.
Compare these means and standard deviations.
Construct the q-q plot.
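The procedure of this example can be sketched in Python. The 19 values below are pseudo-random draws from `random.Random`, used as a stand-in because Table IX is not reproduced in this transcript, so the printed $\bar{x}$ and $s$ will not match the textbook's. For the uniform model the percentile is simply $\pi_p = p$.

```python
import random
import statistics

n = 19
rng = random.Random(1)                         # stand-in draws, NOT Table IX
y = sorted(rng.random() for _ in range(n))     # order statistics y_1 <= ... <= y_n

# Coarse check: compare model moments with sample moments.
mu, sigma = 0.5, (1 / 12) ** 0.5               # mean and s.d. of f(x) = 1 on [0, 1)
xbar, s = statistics.mean(y), statistics.stdev(y)
print(f"mu = {mu}, xbar = {xbar:.4f}")
print(f"sigma = {sigma:.4f}, s = {s:.4f}")

# Finer check: q-q pairs (y_r, pi_p) with p = r/(n+1) and pi_p = p for this model.
qq = [(y[r - 1], r / (n + 1)) for r in range(1, n + 1)]
for y_r, pi_p in qq:
    print(f"{y_r:.4f}  {pi_p:.2f}")
```

Plotting the pairs in `qq` and comparing them with a line of slope 1 gives the visual check described on the previous slide.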