Continuous Statistical Distributions: A Practical Guide for Detection, Description and Sense Making Unit 3
Continuous Statistical Distribution
Describes the behavior of a continuous random variable.
The probability that the continuous random variable falls within any range of values is described by a probability density function (pdf): the probability is the area under the pdf over that range, and the probability of any single exact value is zero.
Continuous pdfs can be symmetric or asymmetric (skewed).
Goals: definition of continuous distributions; probability density function, cumulative distribution function, descriptive statistics, histograms, probability plots, and mixture distributions; visualization of data structure with probability plots.
Continuous pdf shapes
Descriptive Statistics
Central Tendency
Mean (arithmetic mean or average)
Median: observation separating the upper from the lower half (50%) of the data set
Mode: observation that occurs most frequently in a data set
Dispersion
Standard deviation
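As a minimal sketch of these descriptive statistics, the Python standard library `statistics` module can compute all four. The sample values below are hypothetical and made up only for illustration.

```python
import statistics

# Hypothetical right-skewed sample, made up for illustration only.
sample = [4.0, 5.0, 5.0, 6.0, 7.0, 9.0, 13.0]

mean = statistics.mean(sample)      # arithmetic mean
median = statistics.median(sample)  # middle observation of the sorted data
mode = statistics.mode(sample)      # most frequently occurring observation
sd = statistics.stdev(sample)       # sample standard deviation (N-1 denominator)

print(mean, median, mode, sd)
```

In this skewed example the mean lies above the median, which lies above the mode; for a symmetric sample the three would be close together.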
Examples include: Lognormal, Gamma, Chi-square, Weibull, Exponential, F and Extreme Value
Gaussian probability density function and cumulative distribution function, µ = 10; σ = 1 (blue), 2 (green), and 3 (red)
Gaussian probability density function and cumulative distribution function, σ = 2; µ = 10 (blue), 12 (green), and 14 (red)
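The curves in these figures can be evaluated numerically. The sketch below uses `statistics.NormalDist` from the Python standard library with the µ and σ values from the captions; the helper name `gaussian_summary` is hypothetical.

```python
from statistics import NormalDist

def gaussian_summary(mu, sigma):
    """Return (pdf height at the mean, probability within one sd of the mean)."""
    dist = NormalDist(mu=mu, sigma=sigma)
    peak = dist.pdf(mu)  # larger sigma gives a lower, wider curve
    within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
    return peak, within_one_sd

# Same mean, increasing spread (values from the first caption).
for sigma in (1.0, 2.0, 3.0):
    print(sigma, gaussian_summary(10.0, sigma))

# Same spread, shifted mean (values from the second caption).
for mu in (10.0, 12.0, 14.0):
    print(mu, NormalDist(mu, 2.0).cdf(12.0))
```

Changing σ changes the peak height and width of the pdf, while changing µ only shifts the curves along the x-axis.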
Gaussian data: Working with Random Samples (DATA). Histogram (visualize ‘pdf of data sample’)
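A histogram can be sketched without any plotting library by counting observations per bin. The sample below is simulated Gaussian data; the parameters (µ = 10, σ = 2, 1000 observations), the seed, and the helper names are assumptions for illustration, not the data behind the slide.

```python
import random

def histogram_counts(data, lo, hi, n_bins):
    """Count observations falling in n_bins equal-width bins over [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if lo <= x < hi:
            # min() guards against a rounding edge case just below hi
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    return counts

def histogram_density(counts, n_total, width):
    """Scale counts so that bar areas sum to about 1, like a pdf."""
    return [c / (n_total * width) for c in counts]

rng = random.Random(0)  # fixed seed, only so the sketch is repeatable
data = [rng.gauss(10.0, 2.0) for _ in range(1000)]
counts = histogram_counts(data, lo=0.0, hi=20.0, n_bins=20)
density = histogram_density(counts, len(data), width=1.0)
```

Dividing each count by N times the bin width is what lets the histogram be compared directly with a pdf curve.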
Gaussian data: Working with Random Samples. Empirical Cumulative Distribution Functions. Bold line: ECDF for all samples, 1000 observations.
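An empirical cumulative distribution function steps up by 1/N at each sorted observation. A minimal sketch, with a hypothetical helper name `ecdf`:

```python
def ecdf(data):
    """Return (x, fraction of observations <= x) pairs for the sorted data."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

# Tiny made-up sample, for illustration only.
print(ecdf([3, 1, 2]))
```

Plotting these pairs as a step function gives the ECDF; with more observations the steps become finer and the curve approaches the underlying cumulative distribution function.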
Gaussian data: Working with Random Samples. Probability Plot: Equal Percentiles re: Hypothetical Distribution
Normal Probability Plot: Equal Percentiles re: Normal (Gaussian) Distribution – IN EXCEL
For the x-axis, sort (or rank) the data sample observations in ascending order (from smallest to largest).
For the y-axis, make a corresponding array of probability values, (i-0.5)/N, where N is the sample size and i=1,2,3,…,N.
Then make an array that is ‘NORMSINV()’ of these probability values: the quantile expected for each ranked observation under a unit normal (mean=0, sd=1) distribution. ‘NORMINV()’ can be used instead for other means and sd.
Plot the sorted data (x-axis) versus the y-axis points.
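The same recipe can be sketched outside Excel. Here `NormalDist().inv_cdf` from the Python standard library plays the role of ‘NORMSINV()’; the helper name `normal_plot_points` is hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each sorted observation with its unit-normal quantile."""
    xs = sorted(data)                       # x-axis: data in ascending order
    n = len(xs)
    unit_normal = NormalDist()              # mean=0, sd=1
    zs = [unit_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(xs, zs))                # (x, y) points for a scatter plot

# Tiny made-up sample, for illustration only.
for x, z in normal_plot_points([3.0, 1.0, 2.0]):
    print(x, round(z, 4))
```

If the data are Gaussian, the resulting points fall close to a straight line.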
Probability Plot: Equal Percentiles re: other distributions – IN EXCEL
For the x-axis, sort (or rank) the data sample observations in ascending order (from smallest to largest).
For the y-axis, construct the probability array (i-0.5)/N, where N is the sample size and i=1,2,3,…,N, and apply the inverse cumulative distribution function of the hypothesized distribution:
Chi-square distribution: ‘CHIINV()’
Gamma distribution: ‘GAMMAINV()’
Beta distribution: ‘BETAINV()’
F distribution: ‘FINV()’
Note that ‘CHIINV()’ and ‘FINV()’ take right-tail probabilities, so pass 1-(i-0.5)/N to keep the y-axis values in ascending order.
Make a scatter plot of the corresponding points.
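The procedure generalizes to any distribution whose inverse cumulative distribution function is available. The sketch below takes that inverse as an argument; the helper names are hypothetical, and the example uses the chi-square distribution with 2 degrees of freedom only because its inverse has a simple closed form (it is an exponential distribution with mean 2).

```python
import math

def probability_plot_points(data, inv_cdf):
    """Pair sorted observations with quantiles of a hypothesized distribution."""
    xs = sorted(data)
    n = len(xs)
    return [(x, inv_cdf((i - 0.5) / n)) for i, x in enumerate(xs, start=1)]

def chi2_2df_inv_cdf(p):
    """Left-tail inverse CDF of chi-square with 2 df: F(x) = 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - p)

# Tiny made-up sample, for illustration only.
points = probability_plot_points([0.4, 2.9, 1.1, 1.6], chi2_2df_inv_cdf)
print(points)
```

Because the inverse used here is left-tailed, the y-values come out in ascending order and pair directly with the sorted data.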
Gaussian data: Working with Random Samples. Probability Plot re: Unit Normal Distribution. Bold line: plot for all samples, 1000 observations. Slope estimates 1/SD.
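The statement that the slope estimates 1/SD follows from z = (x - mean)/SD: with data on the x-axis and unit-normal quantiles on the y-axis, the fitted line has slope 1/SD and crosses z = 0 at the mean. A sketch on simulated data, where the parameters (mean 10, SD 2, 1000 observations), the seed, and the helper name `fit_line` are assumptions for illustration:

```python
import random
from statistics import NormalDist, mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(1)  # fixed seed, only so the sketch is repeatable
xs = sorted(rng.gauss(10.0, 2.0) for _ in range(1000))
n = len(xs)
zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

slope, intercept = fit_line(xs, zs)
sd_estimate = 1.0 / slope            # slope estimates 1/SD
mean_estimate = -intercept / slope   # x value where the line crosses z = 0
print(sd_estimate, mean_estimate)
```

The estimates should land near the simulated SD of 2 and mean of 10, with some sampling variation.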
Working with Random Samples (DATA). Histogram (visualize ‘pdf of data sample’)
Gaussian data: Working with Random Samples. Empirical Cumulative Distribution Functions
Working with Random Samples. Probability Plot re: Unit Normal Distribution
Hazard Plots – IN EXCEL
For the x-axis, sort (or rank) the data sample observations in ascending order (from smallest to largest).
For the y-axis, calculate the ‘Cumulative Hazard’:
For each observation, enter 1/(reverse rank order).
For the smallest of N observations, enter 1/N.
For the second smallest, enter 1/(N-1), and so on.
Cumulative Hazard is the cumulative sum of these values up to and including each observation. E.g., for the third smallest observation, the cumulative hazard is 1/N+1/(N-1)+1/(N-2).
Make a scatter plot of the corresponding points.
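The cumulative hazard steps above can be sketched as a short function. The helper name `cumulative_hazard_points` is hypothetical.

```python
def cumulative_hazard_points(data):
    """Pair each sorted observation with its cumulative hazard."""
    xs = sorted(data)
    n = len(xs)
    points = []
    total = 0.0
    for i, x in enumerate(xs):
        # The (i+1)-th smallest observation has reverse rank n - i,
        # so the smallest contributes 1/n, the next 1/(n-1), and so on.
        total += 1.0 / (n - i)
        points.append((x, total))
    return points

# Tiny made-up sample, for illustration only.
print(cumulative_hazard_points([5.0, 1.0, 3.0]))
```

For exponentially distributed data the cumulative hazard grows in proportion to x, so these points fall close to a straight line through the origin.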
Working with Random Samples. Probability Plot re: Cumulative Hazard (unit exponential distribution)
Sample Probability-Probability (P-P) and Quantile-Quantile (Q-Q) Plots: Scatter Plot of Equal Percentiles or Quantiles of Two Samples – IN EXCEL
For the x-axis, sort (or rank) the first data sample observations in ascending order (from smallest to largest).
For the y-axis, sort (or rank) the second data sample observations in ascending order.
Make a scatter plot of the corresponding points (the i-th smallest of the first sample against the i-th smallest of the second; this direct pairing requires samples of equal size).
If the samples are from the same distribution, the plot is approximately linear and follows the line y = x.
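A two-sample quantile-quantile comparison reduces to sorting both samples and pairing them by rank. The sketch below assumes equal sample sizes, as the direct pairing requires; the helper name `qq_points` is hypothetical.

```python
def qq_points(sample_a, sample_b):
    """Pair the i-th smallest of sample_a with the i-th smallest of sample_b."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this simple pairing assumes equal sample sizes")
    return list(zip(sorted(sample_a), sorted(sample_b)))

# Tiny made-up samples, for illustration only.
print(qq_points([3, 1, 2], [30, 10, 20]))
```

In this made-up example the points lie on a straight line with slope 10 rather than on y = x, which is the pattern expected when two samples share a distributional shape but differ in scale.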
Working with Random Samples. Probability Plots: Are they identically distributed?
Working with Random Samples. Probability Plot re: Cumulative Hazard (unit exponential distribution)
Mixture Distributions
[Figures: component distributions combined with + and = signs to form a mixture; one panel labeled ‘Mixture 2’]
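The slides show mixtures only as figures. As a sketch of the standard definition, a mixture pdf is a weighted sum of component pdfs with weights that sum to 1. The two Gaussian components, their parameters, and the weights below are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def mixture_pdf(x, components):
    """Weighted sum of component pdfs; weights are assumed to sum to 1."""
    return sum(w * dist.pdf(x) for w, dist in components)

def mixture_cdf(x, components):
    """Weighted sum of component cdfs, using the same weights."""
    return sum(w * dist.cdf(x) for w, dist in components)

# Hypothetical two-component mixture: (weight, distribution) pairs.
components = [
    (0.6, NormalDist(mu=10.0, sigma=1.0)),
    (0.4, NormalDist(mu=14.0, sigma=2.0)),
]

for x in (8.0, 10.0, 12.0, 14.0, 16.0):
    print(x, mixture_pdf(x, components), mixture_cdf(x, components))
```

On a normal probability plot, data from such a mixture would typically bend away from a single straight line, which is one way the plots help reveal mixed populations.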
Call Center Data: Call Frequency
Call Center Data: Call Frequency. Mean ± S.D.: 10:09 hr ± 9 min; 14:58 hr ± 34 min
Call Center Data: Call Frequency. 10:09 hr ± 9 min; 10:04 hr ± 11 min; 14:58 hr ± 34 min; 14:58 hr ± 15 min
Call Center Data: Interval Between Calls
Call Center Data: Call Service Times
Goals: definition of continuous distributions; probability density function, cumulative distribution function, descriptive statistics, histograms, probability plots, and mixture distributions; visualization of data structure with probability plots.