Selecting Input Probability Distribution

Introduction
need to specify probability distributions of random inputs
–processing times at a specific machine
–interarrival times of customers/pieces
–demand size
evaluate data sets (if available)
failure to choose the correct distribution can affect the accuracy of the model's results!
|| WS 2008 || Dr. Verena Schmid || PR KFK PM/SCM/TL Praktikum Simulation I

Assessing Sample Independence
–correlation plot
–scatter diagram

Assessing Sample Independence
important assumption
–observations are supposed to be independent
graphical techniques for informally assessing whether data are independent
–correlation plot
–scatter diagram

correlation plot
graph of the sample correlations ρ̂_j
–ρ̂_j is an estimate of the true correlation ρ_j between two observations that are j observations apart in time
–if the observations X_1, X_2, …, X_n are independent, then ρ_j = 0 for j = 1, 2, …, n-1
–the estimates won't be exactly zero even if the X_i's are independent, since each ρ̂_j is an observation of a random variable
–if the estimates differ from 0 by a significant amount, this is strong evidence that the X_i's are not independent
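The values behind a correlation plot can be computed directly. The following is a minimal sketch, assuming the common estimator ρ̂_j = Ĉ_j / S², where Ĉ_j averages the products of deviations of observations j apart and S² is the sample variance; the slide does not state which estimator it uses, and the function name `lag_correlations` and both test sequences are made up for illustration.

```python
from statistics import fmean


def lag_correlations(xs, max_lag):
    """Return [rho_hat_1, ..., rho_hat_max_lag] for the sequence xs."""
    n = len(xs)
    mean = fmean(xs)
    # Sample variance with divisor n - 1.
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    rhos = []
    for j in range(1, max_lag + 1):
        # Average product of deviations for observations that are j apart.
        cov = sum((xs[i] - mean) * (xs[i + j] - mean) for i in range(n - j)) / (n - j)
        rhos.append(cov / var)
    return rhos


# Hypothetical data: a steadily increasing sequence and a strictly alternating one.
trend = [float(i) for i in range(1, 21)]
alternating = [1.0, -1.0] * 10

print(lag_correlations(trend, 3))        # clearly positive at lag 1
print(lag_correlations(alternating, 3))  # clearly negative at lag 1
```

Plotting the returned values against j gives the correlation plot; values far from zero at small lags suggest the observations are not independent.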

[figure: correlation plot (example)]

[figure: correlation plot (example)]

scatter diagram
plot of the pairs (X_i, X_{i+1})
–if the X_i's are independent, one would expect the points (X_i, X_{i+1}) to be scattered randomly throughout the first quadrant of the plane
–nature of the scattering depends on the underlying distribution of the X_i's
–if the X_i's are positively (negatively) correlated, the points will tend to lie along a line with positive (negative) slope
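The pairs for a scatter diagram are easy to build, and the slope of a least-squares line through them gives a rough numeric counterpart to "points lie along a line with positive (negative) slope". This is only an illustrative sketch; the helper names and the two sequences are assumptions, and the slide itself relies on visual inspection rather than a fitted line.

```python
from statistics import fmean


def lagged_pairs(xs):
    """Return the points (X_i, X_{i+1}) used in the scatter diagram."""
    return list(zip(xs[:-1], xs[1:]))


def least_squares_slope(pairs):
    """Slope of the least-squares line through the given (x, y) points."""
    mx = fmean([x for x, _ in pairs])
    my = fmean([y for _, y in pairs])
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


# Hypothetical sequences: one positively and one negatively correlated.
rising = [1.0, 2.0, 3.0, 4.0, 5.0]
zigzag = [1.0, 5.0, 1.0, 5.0, 1.0]

print(lagged_pairs(rising), least_squares_slope(lagged_pairs(rising)))
print(lagged_pairs(zigzag), least_squares_slope(lagged_pairs(zigzag)))
```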

[figure: scatter diagram (example)]

[figure: scatter diagram (example 2)]

Specifying Distribution
–useful distributions
–use values directly
–define empirical distribution
–fit theoretical distribution

useful probability distribution
parameters of continuous distributions
–location parameter γ
  x-axis location, usually the midpoint (mean for normal distribution) or lower endpoint
  also called "shift" parameter
  changes in γ shift the distribution left or right without changing it otherwise
–scale parameter β
  determines scale (unit) of measurement
  standard deviation σ for normal distribution
  changes in β compress or expand the associated distribution without altering its basic form

useful probability distribution
parameters of continuous distributions
–shape parameter α
  determines basic form or shape of a distribution within the general family of distributions of interest
  a change in α generally alters a distribution's properties (skewness) more fundamentally than a change in location or scale
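The difference between location/scale and shape can be checked numerically: shifting by γ and scaling by β changes the mean and standard deviation but leaves the skewness untouched. The sketch below assumes a small made-up data set and a moment-based skewness measure; the names `skewness`, `gamma_shift` and `beta_scale` are illustrative only.

```python
from statistics import fmean, pstdev


def skewness(xs):
    """Moment-based skewness: mean of the cubed standardized deviations."""
    m = fmean(xs)
    s = pstdev(xs)
    return fmean([((x - m) / s) ** 3 for x in xs])


# Hypothetical right-skewed observations.
data = [0.2, 0.5, 0.9, 1.4, 2.0, 3.1, 5.6]

# Apply a location shift (gamma) and a scale change (beta): Y = gamma + beta * X.
gamma_shift, beta_scale = 10.0, 3.0
transformed = [gamma_shift + beta_scale * x for x in data]

print(fmean(data), pstdev(data), skewness(data))
print(fmean(transformed), pstdev(transformed), skewness(transformed))
```

The mean moves to γ + β times the old mean and the standard deviation is multiplied by β, while the two skewness values agree up to rounding.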

Approaches to specify distribution
if data collection on an input random variable is possible
–use data values directly in simulation (trace driven)
  only reproduces what happened
  seldom enough data to make all simulation runs
  useful for model validation
–define empirical distribution
  (for continuous data) at least any value between min and max can be generated
  no values outside the observed range can be generated
  may have irregularities
–fit to theoretical distribution
  preferred method
  easy to change

Uniform U(a, b)
application
–used as a "first" model for a quantity that is felt to be randomly varying between a and b, but about which little else is known

exponential distribution exp(λ)
application
–interarrival times of entities to a system that occur at a constant rate
–time to failure of a piece of equipment
parameters
–scale parameter λ > 0

gamma(k, θ)
application
–time to complete some task (customer service, machine repair)
parameters
–shape parameter k > 0
–scale parameter θ > 0

weibull(k, λ)
application
–time to complete some task, time to failure of a piece of equipment
–used as a rough model in absence of data
parameters
–shape parameter k > 0, scale parameter λ > 0

normal N(μ, σ²)
application
–errors of various types
–quantities that are the sum of a large number of other quantities
parameters
–location parameter μ

triangular(a, b, m)
application
–used as a rough model in absence of data
parameters
–a, b, m are real numbers (a < m < b)
–location parameter a
–scale parameter b-a
–shape parameter m

poisson(λ)
application
–number of events that occur in an interval of time when events are occurring at a constant rate
–number of items demanded from inventory
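For experimenting with the distributions listed above, Python's standard `random` module can generate variates from most of them. This is only a sketch: all parameter values are arbitrary, the argument order of each call is noted in a comment because it differs between functions, and the Poisson generator is a hypothetical helper that counts exponential interarrival times within one time unit (the standard library has no Poisson generator).

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def poisson_variate(rng, lam):
    """Count events in one time unit when interarrival times are exponential with rate lam."""
    count = 0
    t = rng.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


samples = {
    "uniform": rng.uniform(2.0, 5.0),             # U(a=2, b=5)
    "exponential": rng.expovariate(1.0 / 4.0),    # argument is the rate, here 1/mean with mean 4
    "gamma": rng.gammavariate(2.0, 3.0),          # (shape k, scale theta)
    "weibull": rng.weibullvariate(2.0, 1.5),      # (scale lambda, shape k): scale comes first
    "normal": rng.normalvariate(10.0, 2.0),       # (mu, sigma): standard deviation, not variance
    "triangular": rng.triangular(1.0, 6.0, 2.0),  # (low a, high b, mode m)
    "poisson": poisson_variate(rng, 3.0),         # rate 3 events per time unit
}

for name, value in samples.items():
    print(f"{name:12s} {value}")
```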

Empirical Distributions
use observed data themselves to specify distribution directly
–generate random variables from empirical distribution
–(if no theoretical distribution can be fitted)
define a continuous piecewise-linear distribution function
–sort the X_j's into increasing order
–X_(i) denotes the i-th smallest value of all X_j's

Empirical Distribution (example)
observations: X_1 = 3, X_2 = 8, X_3 = 18, X_4 = 10, X_5 = 13, X_6 = 6
sorted observations: X_(1) = 3, X_(2) = 6, X_(3) = 8, X_(4) = 10, X_(5) = 13, X_(6) = 18
distribution function at the observations: F(X_(i)) = (i-1)/(n-1)
–F(X_(1)) = F(3) = 0/5 = 0
–F(X_(2)) = F(6) = 1/5
–F(X_(3)) = F(8) = 2/5
–etc.
F(X) if X_(i) ≤ X < X_(i+1)
–F(X) = (i-1)/(n-1) + (X - X_(i)) / ((n-1)(X_(i+1) - X_(i)))
F(12) = ?
–interval: X_(4) ≤ 12 < X_(5) (n = 6, i = 4)
–F(12) = 3/5 + 2/(5·3) = 11/15 ≈ 0.73
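The piecewise-linear distribution function and its inverse can be sketched in a few lines. The code below assumes distinct observations and uses the six data values from the example; the function names `empirical_cdf` and `empirical_variate` are illustrative, not part of any library.

```python
import random


def empirical_cdf(sorted_xs, x):
    """Piecewise-linear F(x) with F(X_(i)) = (i-1)/(n-1); assumes distinct values."""
    n = len(sorted_xs)
    if x < sorted_xs[0]:
        return 0.0
    if x >= sorted_xs[-1]:
        return 1.0
    for i in range(1, n):  # i is the 1-based index of the left endpoint X_(i)
        lo, hi = sorted_xs[i - 1], sorted_xs[i]
        if lo <= x < hi:
            return (i - 1) / (n - 1) + (x - lo) / ((n - 1) * (hi - lo))


def empirical_variate(sorted_xs, u):
    """Invert F at u in [0, 1] (inverse-transform method)."""
    n = len(sorted_xs)
    p = (n - 1) * u
    i = min(int(p), n - 2)  # 0-based index of the left endpoint
    return sorted_xs[i] + (p - i) * (sorted_xs[i + 1] - sorted_xs[i])


observations = sorted([3, 8, 18, 10, 13, 6])
print(empirical_cdf(observations, 12))  # 3/5 + 2/15 = 11/15

rng = random.Random(0)
print([round(empirical_variate(observations, rng.random()), 2) for _ in range(5)])
```

Every generated value lies between the smallest and largest observation, which is the limitation of empirical distributions noted earlier.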

[figure: empirical distribution (example)]

Necessary Steps for fitting a theoretical distribution
hypothesize family
–summary statistics
–histogram
–quantile summary & box plots
estimate parameters
how representative is fitted distribution?
–Chi-Square Goodness-of-Fit Test
–Kolmogorov-Smirnov Test

Hypothesizing families of distributions
first step in selecting a particular input distribution
–decide which general family appears to be appropriate
prior knowledge might be helpful
–service times should never be generated from a normal distribution. Why? (a normal random variable can take on negative values)
approaches
–summary statistics
–histograms
–quantile summaries and box plots

Summary Statistics
some distributions are characterized at least partially by functions of their true parameters
sample estimates
–estimate for range
  minimum X_(1)
  maximum X_(n)
–measure of central tendency
  mean μ
  median x_0.5

Summary Statistics (cont.)
sample estimates
–measure of variability
  variance σ²
  coefficient of variation cv
–measure of symmetry
  skewness
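The summary statistics named on these two slides can be computed with the standard library. This is a sketch under stated assumptions: the slides do not show the estimator formulas, so the code uses the sample variance with divisor n-1, cv as standard deviation over mean, and a skewness estimate of the third central moment divided by the variance to the power 3/2. The data are the six observations from the empirical-distribution example.

```python
from statistics import fmean, median, variance


def summary_statistics(xs):
    """Return range, central tendency, variability and symmetry estimates."""
    n = len(xs)
    mean = fmean(xs)
    var = variance(xs)  # sample variance, divisor n - 1
    std = var ** 0.5
    third = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return {
        "min": min(xs),
        "max": max(xs),
        "mean": mean,
        "median": median(xs),
        "variance": var,
        "cv": std / mean,                 # coefficient of variation
        "skewness": third / var ** 1.5,   # > 0 suggests a right-skewed distribution
    }


stats = summary_statistics([3, 8, 18, 10, 13, 6])
for name, value in stats.items():
    print(f"{name:9s} {value}")
```

A mean noticeably larger than the median together with positive skewness points toward a right-skewed family; a cv close to 1 is often taken as a hint toward the exponential distribution.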

Histograms
graphical estimate of the density function corresponding to the distribution of the data
–density functions tend to have recognizable shapes in many cases
–a graphical estimate of a density should provide a good clue to the distributions that might be tried as a model for the data

Histograms
how to
–break up the range of values into k disjoint adjacent intervals of the same width: [b_0, b_1), [b_1, b_2), …, [b_{k-1}, b_k), with Δb = b_j - b_{j-1}
–you might want to throw out a few extremely large or small X_i's to avoid getting an unwieldy-looking histogram plot
–let h_j be the proportion of X_i's that are in the j-th interval [b_{j-1}, b_j)
–hint: try several values of Δb and choose the smallest one that gives a "smooth" histogram

Histogram (example)
create 1000 random variates ~ N(0,1)
–create histogram
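The histogram recipe and the N(0,1) example can be reproduced with a short script. The sketch below makes several assumptions that are not in the slides: a fixed seed, the range [-4, 4), a width Δb = 0.5 (so k = 16), and a text rendering instead of a plot. Values outside the range are simply not counted, which corresponds to throwing out extreme observations.

```python
import random


def histogram_proportions(xs, b0, width, k):
    """Return h_1..h_k: the proportion of xs in [b0 + (j-1)*width, b0 + j*width)."""
    counts = [0] * k
    for x in xs:
        j = int((x - b0) // width)  # 0-based interval index
        if 0 <= j < k:
            counts[j] += 1
    n = len(xs)
    return [c / n for c in counts]


rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
h = histogram_proportions(xs, b0=-4.0, width=0.5, k=16)

for j, hj in enumerate(h):
    left = -4.0 + 0.5 * j
    print(f"[{left:5.1f}, {left + 0.5:5.1f}) {'*' * round(200 * hj)}")
```

Rerunning with a different `width` shows the trade-off behind the hint: very small widths give a ragged picture, very large ones hide the bell shape.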

Quantile Summaries
useful for determining whether the underlying probability density function is skewed to the right or left
–let F(x) be the distribution function of a continuous random variable
–the q-quantile of F(x) is the number x_q such that F(x_q) = q
  median x_0.5
  lower/upper quartiles x_0.25 / x_0.75
  lower/upper octiles x_0.125 / x_0.875

Quantile Summaries

Quantile    Depth                 Sample value(s)      Midpoint
Median      i = (n+1)/2           X_(i)                X_(i)
Quartiles   j = (floor(i)+1)/2    X_(j), X_(n-j+1)     [X_(j) + X_(n-j+1)]/2
Octiles     k = (floor(j)+1)/2    X_(k), X_(n-k+1)     [X_(k) + X_(n-k+1)]/2
Extremes    1                     X_(1), X_(n)         [X_(1) + X_(n)]/2

–if the underlying distribution of the X_i's is symmetric, then the midpoints should be approximately equal
–if the underlying distribution is skewed to the right (left), then the midpoints should be increasing (decreasing)
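The quantile-summary table translates almost line by line into code. One point the table leaves open is what to do when a depth is not an integer (for example i = 3.5 when n = 6); the sketch below assumes the usual convention of averaging the two neighbouring order statistics. The function names and the data (the six observations from the empirical-distribution example) are chosen for illustration.

```python
import math


def order_stat(sorted_xs, depth):
    """X_(depth); a half-integer depth averages the two neighbouring order statistics."""
    lo = math.floor(depth)
    hi = math.ceil(depth)
    return (sorted_xs[lo - 1] + sorted_xs[hi - 1]) / 2


def quantile_summary(xs):
    """Return rows (name, depth, lower value, upper value, midpoint)."""
    s = sorted(xs)
    n = len(s)
    i = (n + 1) / 2
    j = (math.floor(i) + 1) / 2
    k = (math.floor(j) + 1) / 2
    rows = []
    for name, depth in (("median", i), ("quartiles", j), ("octiles", k), ("extremes", 1)):
        low = order_stat(s, depth)
        high = order_stat(s, n - depth + 1)
        rows.append((name, depth, low, high, (low + high) / 2))
    return rows


rows = quantile_summary([3, 8, 18, 10, 13, 6])
for row in rows:
    print(row)
midpoints = [row[4] for row in rows]
```

For these data the midpoints are 9, 9.5, 10 and 10.5, an increasing sequence, which by the rule above suggests a distribution skewed to the right.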

Box Plots (example)
graphical representation of the quantile summary
–fifty percent of the observations fall within the horizontal boundaries of the box [x_0.25, x_0.75]

Estimation of Parameters
after one or more candidate families of distributions have been hypothesized, we must somehow specify the values of their parameters in order to have completely specified distributions for possible use in simulation
maximum-likelihood estimators (MLEs)
–estimator = numerical function of the data
–unknown parameter θ
–hypothesized density function f_θ(x)
–likelihood function L(θ) = f_θ(X_1) f_θ(X_2) ⋯ f_θ(X_n)
–the estimator is the value of θ that maximizes L(θ) over all permissible values of θ

Estimation of Parameters (example)
exponential distribution with unknown parameter β (θ = β)
–f_β(x) = (1/β) e^(-x/β) for x ≥ 0
–likelihood function L(β)
–we seek the value of β that maximizes L(β) over all β > 0
–easier to work with its logarithm (maximize l(β) = ln L(β) instead of L(β))
–maximize: set the derivative equal to zero and solve for β
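Carrying out the steps for the exponential density gives a closed-form answer, which the sketch below states in comments and checks numerically. With f_β(x) = (1/β) e^(-x/β), the log-likelihood is l(β) = -n ln β - (1/β) Σ X_i, and setting its derivative to zero yields the sample mean as the MLE. The data values and function names are made up for illustration.

```python
import math
from statistics import fmean


def log_likelihood(beta, xs):
    """l(beta) = ln L(beta) = -n ln(beta) - (1/beta) * sum(x_i)."""
    return -len(xs) * math.log(beta) - sum(xs) / beta


def exponential_mle(xs):
    """dl/dbeta = -n/beta + sum(x_i)/beta^2 = 0  =>  beta_hat = sample mean."""
    return fmean(xs)


# Hypothetical service times.
times = [0.8, 2.3, 0.4, 5.1, 1.7, 3.0, 0.9, 1.2]
beta_hat = exponential_mle(times)

print("beta_hat =", beta_hat)
for beta in (0.8 * beta_hat, beta_hat, 1.2 * beta_hat):
    print(f"l({beta:.3f}) = {log_likelihood(beta, times):.4f}")
```

The log-likelihood at `beta_hat` is larger than at the neighbouring values, as expected for the maximizer.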

Goodness-of-Fit Tests
statistical hypothesis tests used to assess formally whether the observations X_1, X_2, …, X_n are an independent sample from a particular distribution with distribution function F̂
H_0: the X_i's are IID random variables with distribution function F̂
be careful: failure to reject H_0 should not be interpreted as "accepting H_0 as being true"
we'll concentrate on two different tests
–chi-square test
–Kolmogorov-Smirnov test

Chi-Square Goodness-of-Fit Test
more formal comparison of a histogram with the fitted density or mass function
how to
–divide the range into k adjacent intervals [a_0, a_1), [a_1, a_2), …, [a_{k-1}, a_k)
  how to choose number and size of intervals? → equiprobable
–determine N_j (number of X_i's in the j-th interval [a_{j-1}, a_j))
–compute p_j (expected proportion of the X_i's that would fall in the j-th interval if we were sampling from the fitted distribution)
–determine the test statistic χ² = Σ_j (N_j - n p_j)² / (n p_j), summed over j = 1, …, k, and reject H_0 if it is too large

Chi-Square Goodness-of-Fit Test (cont.)
case 1: all parameters of the fitted distribution are known
–if H_0 is true, χ² converges in distribution (as n → ∞) to a chi-square distribution with k-1 degrees of freedom
–for large n, a test with approximate level α is obtained by rejecting H_0 if χ² > χ²_{k-1,1-α}
–χ²_{k-1,1-α}: upper 1-α critical point for a chi-square distribution with k-1 df

Chi-Square Goodness-of-Fit Test (cont.)
case 2: m parameters had to be estimated to specify the fitted distribution
–if H_0 is true, then as n → ∞ the distribution function of χ² converges to a distribution function that lies between the distribution functions of chi-square distributions with k-1 and k-m-1 degrees of freedom
–χ²_{1-α}: the upper 1-α critical point of the asymptotic distribution of χ² (in general not known); it lies between χ²_{k-m-1,1-α} and χ²_{k-1,1-α}
–reject H_0 if χ² > χ²_{k-1,1-α}
–do not reject H_0 if χ² < χ²_{k-m-1,1-α}
–ambiguous situation if χ²_{k-m-1,1-α} ≤ χ² ≤ χ²_{k-1,1-α}
recommendation: reject H_0 only if χ² > χ²_{k-1,1-α} (conservative)
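The chi-square procedure with equiprobable intervals can be sketched for a fitted exponential distribution. Everything specific here is an assumption made for illustration: the simulated data, the choice k = 5, and the statistic χ² = Σ (N_j - n p_j)² / (n p_j) with p_j = 1/k. Because one parameter (m = 1) is estimated, this is case 2, so the statistic is compared with the tabulated 0.95 critical points for k-1 = 4 df (9.488) and k-m-1 = 3 df (7.815).

```python
import math
import random
from statistics import fmean


def chi_square_exponential(xs, k):
    """Chi-square statistic for an exponential fit using k equiprobable intervals."""
    n = len(xs)
    beta = fmean(xs)  # MLE of the exponential mean (m = 1 estimated parameter)
    # Endpoints a_j = F^{-1}(j/k) = -beta * ln(1 - j/k) for j < k; a_k = infinity.
    edges = [-beta * math.log(1 - j / k) for j in range(k)] + [math.inf]
    counts = [0] * k
    for x in xs:
        for j in range(k):
            if edges[j] <= x < edges[j + 1]:
                counts[j] += 1
                break
    expected = n / k  # n * p_j with p_j = 1/k
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return chi2, counts


rng = random.Random(7)
data = [rng.expovariate(1.0 / 4.0) for _ in range(200)]  # hypothetical sample, mean 4
chi2, counts = chi_square_exponential(data, k=5)

upper, lower = 9.488, 7.815  # chi-square 0.95 critical points for 4 and 3 df
print("counts:", counts, "chi2:", round(chi2, 3))
if chi2 > upper:
    print("reject H0")
elif chi2 < lower:
    print("do not reject H0")
else:
    print("ambiguous region; conservative recommendation: do not reject")
```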

Kolmogorov-Smirnov Goodness-of-Fit Test
compares an empirical distribution function with the distribution function of the hypothesized distribution
–not necessary to group data
–valid for any sample size n
–tends to be more powerful than chi-square tests
–but: only valid if all parameters of the hypothesized distribution are known and the distribution is continuous

Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
compute test statistic
–define the empirical distribution function F_n(x) = (number of X_i's ≤ x) / n
–the test statistic D_n corresponds to the largest (vertical) distance between F_n(x) and the hypothesized distribution function F̂(x)
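Because F_n is a step function, the largest vertical distance can only occur at an observation, just before or just after a jump. The sketch below computes D_n that way for any hypothesized distribution function passed in as `cdf`; the function name, the three-point sample and the uniform and exponential examples are assumptions for illustration.

```python
import math


def ks_statistic(xs, cdf):
    """D_n = largest vertical distance between the empirical CDF of xs and cdf."""
    s = sorted(xs)
    n = len(s)
    # Distance just after each jump (F_n above the hypothesized function) ...
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(s))
    # ... and just before each jump (F_n below the hypothesized function).
    d_minus = max(cdf(x) - i / n for i, x in enumerate(s))
    return max(d_plus, d_minus)


def uniform01_cdf(x):
    return min(max(x, 0.0), 1.0)


def exponential_cdf(mean):
    return lambda x: 1.0 - math.exp(-x / mean) if x >= 0 else 0.0


d_uniform = ks_statistic([0.1, 0.4, 0.7], uniform01_cdf)
print(d_uniform)  # largest gap is 1 - 0.7 at the last observation
print(ks_statistic([0.8, 2.3, 0.4, 5.1, 1.7, 3.0], exponential_cdf(2.0)))
```

The resulting D_n is then adjusted and compared with a critical value as described for the individual cases.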

Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
case 1: all parameters of the hypothesized distribution function F̂ are known
–distribution of D_n does not depend on F̂ (if F̂ is continuous)
–reject H_0 if the adjusted test statistic exceeds the critical point c_{1-α}
–c_{1-α} (does not depend on n) is tabulated against 1-α [table values not preserved]

Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
case 2: hypothesized distribution is N(μ, σ²) with both μ and σ² unknown (estimated)
–the estimated distribution function uses the estimated μ and σ²
–D_n is calculated the same way as in case 1, but with different critical points
–reject H_0 if the adjusted test statistic exceeds the critical point c'_{1-α}
–c'_{1-α} (does not depend on n) is tabulated against 1-α [table values not preserved]

Kolmogorov-Smirnov Goodness-of-Fit Test (cont.)
case 3: hypothesized distribution is exponential (exp(λ)) with λ unknown (estimated from the data)
–the estimated distribution function uses the estimated λ
–reject H_0 if the adjusted test statistic exceeds the critical point c''_{1-α}
–c''_{1-α} (does not depend on n) is tabulated against 1-α [table values not preserved]