Selecting Input Probability Distributions
Introduction
Part of modeling: deciding what probability distributions to use as input to the simulation for:
–Interarrival times
–Service/machining times
–Demand/batch sizes
–Machine up/down times
Inappropriate input distributions can lead to incorrect output and bad decisions.
Given observed data on the input quantities, we can use them in different ways.

Data Usage
Trace-driven: use actual data values to drive the simulation
  –Pros: valid vis-à-vis the real world; direct
  –Cons: not generalizable
Empirical distribution: use data values to define a “connect-the-dots” distribution (several specific ways)
  –Pros: fairly valid; simple; fairly direct
  –Cons: may limit the range of generated variates (depending on form)
Fitted “standard” distribution: use data to fit a classical distribution (exponential, uniform, Poisson, etc.)
  –Pros: generalizable, fills in “holes” in the data
  –Cons: may not be valid; may be difficult

Parameterization of Distributions - 1
There are alternative ways to parameterize most distributions. Typically, parameters can be classified as one of:
–Location parameter γ (also called shift parameter): specifies an abscissa (x-axis) location point of a distribution’s range of values, often some kind of midpoint of the distribution
  –Example: μ for the normal distribution
  –As γ changes, the distribution just shifts left or right without changing its spread or shape
  –If X has location parameter 0, then X + γ has location parameter γ

Parameterization of Distributions - 2
–Scale parameter β: determines the scale, or units of measurement, or spread, of a distribution
  –Examples: σ for the normal distribution, β for the exponential distribution
  –As β changes, the distribution is compressed or expanded without changing its shape
  –If X has scale parameter 1, then βX has scale parameter β

Parameterization of Distributions - 3
–Shape parameter α: determines, separately from location and scale, the basic form or shape of a distribution
  –Examples: the normal and exponential distributions have no shape parameter; α is the shape parameter of the gamma and Weibull distributions
  –A distribution may have more than one shape parameter (the beta distribution has two)
  –A change in the shape parameter(s) alters a distribution’s shape more fundamentally than changes in the scale or location parameters

Continuous and Discrete Distributions
The textbook gives a compendium of 13 continuous and 6 discrete distributions, with details on:
–Possible applications
–Density and distribution functions (where applicable)
–Parameter definitions and ranges
–Range of possible values
–Mean, variance, mode
–Maximum-likelihood estimator formula or method
–General comments, including relationships to other distributions
–Plots of densities

Summary Measures from Moments
Mean and variance
–The coefficient of variation is a measure of variability relative to the mean: CV(X) = σ_X / μ_X
Higher moments also give useful information
–The skewness coefficient gives information about the shape
–The kurtosis coefficient gives information about the tail weight (likelihood of extreme values)

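As a rough illustration of these summary measures, the sketch below estimates them from a sample using plain moment formulas that divide by n throughout. The function name `summary_measures` and the sample values are made up for illustration; they do not come from the slides.

```python
import math

def summary_measures(data):
    """Moment-based summary measures of a sample (dividing by n throughout)."""
    n = len(data)
    mean = sum(data) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    sd = math.sqrt(m2)
    return {
        "mean": mean,
        "variance": m2,
        "cv": sd / mean,            # variability relative to the mean
        "skewness": m3 / sd ** 3,   # 0 for a perfectly symmetric sample
        "kurtosis": m4 / m2 ** 2,   # about 3 for normally distributed data
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
for name, value in summary_measures(sample).items():
    print(f"{name}: {value:.4f}")
```

Other estimators (for example, dividing the variance by n – 1) are equally common; the choice here is only to keep the formulas short.
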
Example
Find:
–Mean
–Variance
–Coefficient of variation
–Median
–Skewness coefficient

Exponential Expo(β)
[Figure: Expo(1) density function]

Exponential: Properties
–The coefficient of variation is a measure of variability relative to the mean: CV(X) = σ_X / μ_X
–The exponential distribution’s coefficient of variation is 1 (unless it is shifted)
–The density function is monotonically decreasing (at an exponential rate)
–Times between events are most likely to be small, but can be large with small probability
–Skewness = 2, kurtosis (tail weight) = 9

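The CV property can be checked numerically. The sketch below is only an illustration: it assumes β is the mean of Expo(β), generates variates by the inverse-transform method (a technique chosen here, not stated on the slide), and uses an arbitrary seed, mean, and shift.

```python
import math
import random

def expo_variate(beta, rng):
    """Inverse-transform sample from Expo(beta), where beta is the mean."""
    u = rng.random()                    # U in [0, 1)
    return -beta * math.log(1.0 - u)    # 1 - u is in (0, 1], so the log is defined

rng = random.Random(12345)              # fixed seed for reproducibility
beta = 2.0                              # illustrative mean
xs = [expo_variate(beta, rng) for _ in range(100_000)]

mean = sum(xs) / len(xs)
sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
cv = sd / mean
print(f"mean = {mean:.3f}, CV = {cv:.3f}")   # CV should be close to 1

# A shift changes the mean but not the standard deviation, so the CV drops
shifted_cv = sd / (mean + 5.0)
print(f"CV after shifting by 5 = {shifted_cv:.3f}")
```
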
Poisson(λ)
[Figure: Poisson(λ) mass function, annotated “Bimodal: two modes”]

Poisson: Properties
–Counts the number of events over time
–If arrivals occur according to a Poisson process with rate λ, times between arrivals are exponential with mean 1/λ
–Its coefficient of variation is 1/√λ
–Events (e.g., customer arrivals) are generated by a large potential population in which each customer chooses to arrive in a given small interval with a very small probability
–Examples: number of outbreaks of war over time, number of goals scored in World Cup games

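The link between exponential interarrival times and Poisson counts can be sketched as follows. This is an illustrative simulation, not part of the original slides: the function name `poisson_counts`, the rate, the seed, and the number of intervals are all assumptions.

```python
import math
import random

def poisson_counts(lam, n_intervals, rng):
    """Count events per unit interval when interarrival times are exponential with mean 1/lam."""
    counts = [0] * n_intervals
    t = rng.expovariate(lam)          # time of the first event
    while t < n_intervals:
        counts[int(t)] += 1           # event falls in interval [k, k + 1)
        t += rng.expovariate(lam)     # add the next exponential interarrival time
    return counts

rng = random.Random(2024)             # fixed seed for reproducibility
lam = 4.0                             # illustrative arrival rate
counts = poisson_counts(lam, 50_000, rng)

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
cv = math.sqrt(var) / mean
print(f"mean count = {mean:.3f}, CV = {cv:.3f}, 1/sqrt(lam) = {1 / math.sqrt(lam):.3f}")
```

The sample mean of the counts should be near λ and the sample CV near 1/√λ, up to sampling error.
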
Normal Distribution: Properties
–Supported by the Central Limit Theorem: appropriate when the random variable is a sum of many small random variables (e.g., total consumer demand)
–It is symmetric (skewness = 0, mean = median)
–Kurtosis = 3
–It is usually not appropriate for modeling times between events, because it can take negative values

Gamma Distribution: Properties
–Shape parameter α > 0, scale parameter β > 0
–A special case: a sum of independent exponential random variables (α = 1 corresponds to exponential(β))
–In general, skewness is positive
–The CV is less than one if the shape parameter α > 1
[Figure: gamma densities, panels “Scale = 1, shape = 2” and “Scale = 1, shape = 20”]

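The special case and the CV claim can be illustrated together. The sketch below assumes an integer shape k, builds each variate as a sum of k independent exponentials with mean β, and compares the sample CV with 1/√k (which follows from the gamma mean kβ and variance kβ²). The helper names, seed, and sample sizes are hypothetical.

```python
import math
import random

def erlang_variate(k, beta, rng):
    """Sum of k independent exponentials with mean beta: gamma with shape k, scale beta."""
    return sum(rng.expovariate(1.0 / beta) for _ in range(k))

def sample_cv(xs):
    """Sample coefficient of variation (standard deviation divided by mean)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(var) / mean

rng = random.Random(7)                # fixed seed for reproducibility
for k in (1, 2, 20):
    xs = [erlang_variate(k, 1.0, rng) for _ in range(20_000)]
    print(f"shape = {k:2d}: sample CV = {sample_cv(xs):.3f}, 1/sqrt(shape) = {1 / math.sqrt(k):.3f}")
```
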
Weibull Distribution: Properties
–Shape parameter α > 0, scale parameter β > 0
–Very versatile
[Figure: Weibull densities, panels “Scale = 1, shape = 1.5” and “Scale = 1, shape = 10”]

Lognormal Distribution: Properties
–Y = ln(X) is Normal(μ, σ²)
–Models the product of several independent random factors (X = X1 X2 … Xn)
–Very versatile: like the gamma and Weibull distributions, but can have a spike near zero
[Figure: lognormal densities, panels “Scale = 1, shape = 0.5” and “Scale = 2, shape = 0.1”]

Empirical Distributions
–There may be no standard distribution that fits the data adequately: use the observed data themselves to specify an empirical distribution directly
–There are many different ways to specify empirical distributions, resulting in different distributions with different properties

Continuous Empirical Distributions
If the original individual data points are available (i.e., the data are not grouped):
–Sort the data X1, X2, ..., Xn into increasing order: X(i) is the ith smallest
–Define F(X(i)) = (i – 1)/(n – 1), approximately (for large n) the proportion of the data less than X(i), and interpolate linearly between observed data points:
  F(x) = 0 if x < X(1)
  F(x) = (i – 1)/(n – 1) + (x – X(i)) / [(n – 1)(X(i+1) – X(i))] if X(i) ≤ x < X(i+1), for i = 1, 2, ..., n – 1
  F(x) = 1 if x ≥ X(n)

Continuous Empirical Distributions
–F rises most steeply over regions where observations are dense, as desired
–Sample: 3, 5, 6, 7, 9, 12
–F(3) = 0, F(5) = 1/5, F(6) = 2/5, F(7) = 3/5, F(9) = 4/5, F(12) = 1

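The piecewise-linear F for the sample 3, 5, 6, 7, 9, 12 can be sketched in code as follows. This is a minimal illustration that assumes distinct data values; the function names are hypothetical, and generating variates by inverting F is one possible use, not something the slide prescribes.

```python
import bisect
import random

def empirical_cdf(sorted_data, x):
    """Piecewise-linear empirical F(x) with F(X_(i)) = (i - 1)/(n - 1)."""
    n = len(sorted_data)
    if x < sorted_data[0]:
        return 0.0
    if x >= sorted_data[-1]:
        return 1.0
    i = bisect.bisect_right(sorted_data, x)      # 1-based i with X_(i) <= x < X_(i+1)
    lo, hi = sorted_data[i - 1], sorted_data[i]
    return (i - 1) / (n - 1) + (x - lo) / ((n - 1) * (hi - lo))

def empirical_variate(sorted_data, u):
    """Invert F: map u in [0, 1) to a value between the smallest and largest observation."""
    n = len(sorted_data)
    p = (n - 1) * u
    i = min(int(p), n - 2)                       # 0-based left endpoint, guarded against rounding
    lo, hi = sorted_data[i], sorted_data[i + 1]
    return lo + (p - i) * (hi - lo)

data = sorted([3, 5, 6, 7, 9, 12])               # sample from the slide
print([empirical_cdf(data, x) for x in data])    # F at the observed points

rng = random.Random(1)                           # arbitrary seed
print([round(empirical_variate(data, rng.random()), 2) for _ in range(5)])
```

Every generated value falls between 3 and 12, because F is 0 below the smallest observation and 1 at the largest.
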
Continuous Empirical Distributions
Potential disadvantages:
–Generated data will always lie within the range of the observed data
–The expected value of this distribution is not the sample mean
There are other ways to define continuous empirical distributions, including putting an exponential tail on the right to make the range infinite on the right.
If only grouped data are available:
–Individual data values are unknown; only the counts of observations in adjacent intervals are known
–Define an empirical distribution function G(x) with properties similar to those of F(x) above for individual data points

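The second disadvantage listed for ungrouped data can be checked directly on the sample 3, 5, 6, 7, 9, 12. Under the piecewise-linear F, each gap between adjacent sorted observations carries probability 1/(n – 1) spread uniformly, so the distribution's mean is the average of the gap midpoints. The function name `empirical_mean` below is hypothetical.

```python
def empirical_mean(sorted_data):
    """Mean of the piecewise-linear empirical distribution.

    Each gap [X_(i), X_(i+1)] carries probability 1/(n - 1), spread uniformly,
    so the mean is the average of the gap midpoints.
    """
    n = len(sorted_data)
    midpoints = [(a + b) / 2 for a, b in zip(sorted_data, sorted_data[1:])]
    return sum(midpoints) / (n - 1)

data = [3, 5, 6, 7, 9, 12]
print(empirical_mean(data))        # 6.9
print(sum(data) / len(data))       # 7.0
```

The two values differ because the endpoints X(1) and X(n) each contribute to only one midpoint, while interior points contribute to two.
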
Discrete Empirical Distributions
If the original individual data points are available (i.e., the data are not grouped):
–For each possible value x, define p(x) = proportion of the data values that are equal to x
If only grouped data are available:
–Define a probability mass function such that the sum of the p(x)’s for the x’s in an interval is equal to the proportion of the data in that interval
–The allocation of the p(x)’s among the x’s in an interval is arbitrary

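For ungrouped data, the definition of p(x) translates almost directly into code. The sketch below is an illustration only: the data values are made-up batch sizes, and the helper names are hypothetical.

```python
import random
from collections import Counter

def discrete_empirical_pmf(data):
    """p(x) = proportion of the data values equal to x."""
    n = len(data)
    return {x: count / n for x, count in sorted(Counter(data).items())}

def discrete_variate(pmf, rng):
    """Draw one value from the pmf, using the probabilities as weights."""
    values = list(pmf)
    return rng.choices(values, weights=[pmf[v] for v in values], k=1)[0]

demand = [1, 2, 2, 3, 3, 3, 4, 2, 1, 3]   # hypothetical batch sizes
pmf = discrete_empirical_pmf(demand)
print(pmf)

rng = random.Random(0)                     # arbitrary seed
print([discrete_variate(pmf, rng) for _ in range(10)])
```

Only values that actually appear in the data can be generated, which mirrors the range limitation of the continuous case.
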