K. Salah. Input Analysis. Ref: Law & Kelton, Chapter 6.

Presentation transcript:

Slide 1: Driving simulation models
Stochastic simulation models use random variables to represent inputs such as inter-arrival times, service times, probabilities of storms, and the proportion of ATM customers making a deposit. We need to know the distribution family and parameters of each of these random variables. Some methods to do this:
– Collect real data and feed it directly to the simulation model. This is called trace-driven simulation.
– Collect real data, build an empirical distribution of the data, and sample from this distribution in the simulation model.
– Collect real data, fit a theoretical distribution to the data, and sample from that distribution in the simulation model.
We will examine the last two of these methods.

Slide 2: Why Is This an Issue?
[Figure: graph of service times (length of stay, LOS) for a hospital in Ontario, N = 546.]
How would you model a patient's length of stay?

Slide 3: Option 1: Trace
We could attempt a trace. In this scheme, we hold the observed list of LOS values in a file. When we generate the first patient, we assign him or her an LOS of 4; the 2nd patient, 1; the 3rd, 7; and so on.
– Traces have the advantage of being simple and of reproducing the observed behaviour exactly.
– Traces don't allow us to generate values outside of our sample.
– Our sample is usually of limited size, so we may not have observed the system in all of its states.

Slide 4: Option 2: Empirical Distribution
A second idea is to use an empirical distribution to model LOS. We use the LOS and its cumulative frequency as input to our model: the LOS represents x, and the cumulative frequency represents F(x). For instance, assume we pick a random number (say 0.62) as F(x). This corresponds to an x between 3 and 4 (about 3.3 by interpolation).
– Empirical distributions may, however, be based on a small sample and may have irregularities.
– The empirical distribution cannot generate an x value outside the range [lowest observed value, highest observed value].
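
The interpolation step can be sketched in Python as inverse-transform sampling from a piecewise-linear empirical CDF. The (x, F(x)) pairs below are hypothetical values, not the hospital data from the slides; they were chosen only so that F(x) = 0.62 falls between x = 3 and x = 4, as in the example. The function names are illustrative as well.

```python
import bisect
import random

# Hypothetical (x, F(x)) pairs; NOT the hospital data from the slides.
LOS_VALUES = [0, 1, 2, 3, 4, 7, 14]
CUM_FREQ = [0.0, 0.15, 0.38, 0.56, 0.76, 0.93, 1.0]


def empirical_inverse(u, xs=LOS_VALUES, cdf=CUM_FREQ):
    """Return x such that the piecewise-linear empirical CDF equals u."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    # Index of the first tabulated cumulative frequency that is >= u.
    j = bisect.bisect_left(cdf, u)
    if j == 0:
        return float(xs[0])
    x0, x1 = xs[j - 1], xs[j]
    f0, f1 = cdf[j - 1], cdf[j]
    # Linear interpolation between the two bracketing table entries.
    return x0 + (u - f0) * (x1 - x0) / (f1 - f0)


def sample_los(rng=random):
    """Draw one LOS value by feeding a U(0,1) number through the inverse CDF."""
    return empirical_inverse(rng.random())


print(round(empirical_inverse(0.62), 2))  # interpolates between x = 3 and x = 4
```

Every value returned lies between the smallest and largest tabulated x, which is the range limitation noted in the slide.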

Slide 5: Option 3: Fitted Distribution

Slide 6: Why Theoretical Distributions?
– Theoretical distributions “smooth out” the irregularities that may be present in traces and empirical distributions.
– They give the simulation the ability to generate a wider range of values, which lets us test extreme conditions.
– There may be a compelling reason to use a particular theoretical distribution.
– Theoretical distributions are a compact way to represent very large datasets.
– They are easy to change, which makes them very practical.

Slide 7: Describing Distributions
A probability distribution is described by its family and its parameters. The choice of family is usually made by examining the density function, because this tends to have a characteristic shape for each distribution family. Distribution parameters are of three types:
– location parameter(s)
– scale parameter(s)
– shape parameter(s)
[Figures: density f(x) plotted against x for three values of each type of parameter.]

Slide 8: Examples: continuous distributions
Uniform distribution (a is the location parameter; b − a is the scale parameter)
[Figure: density f(x), constant between a and b.]
Uses: A first model for a quantity when only its bounds a and b are known. Essential for the generation of other distributions.

Slide 9: Examples: continuous distributions
Exponential distribution (one scale parameter, β)
[Figure: density f(x).]
Uses: Inter-arrival times.
Notes: Special case of both the Weibull and the gamma distributions (shape α = 1, scale β). If X_1, X_2, …, X_m are independent expo(β) random variables, then X_1 + X_2 + … + X_m is distributed as an m-Erlang, i.e. gamma(m, β).
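
The Erlang relationship in the notes can be checked empirically. The sketch below, using only Python's standard `random` module, sums m exponential draws and compares the sample mean with direct gamma(m, β) draws. The seed, the sample size, and the values m = 3, β = 2 are arbitrary choices for illustration. Note that `random.expovariate` takes the rate 1/β, while `random.gammavariate` takes shape and scale.

```python
import random


def erlang_variate(m, beta, rng=random):
    """Sum m independent expo(beta) draws, where beta is the mean of each draw."""
    # random.expovariate expects the rate (1 / beta), not the mean.
    return sum(rng.expovariate(1.0 / beta) for _ in range(m))


def sample_mean(draw, n):
    """Average n values produced by the zero-argument callable `draw`."""
    return sum(draw() for _ in range(n)) / n


rng = random.Random(12345)   # arbitrary seed, for repeatability only
m, beta, n = 3, 2.0, 20000   # illustrative values

mean_of_sums = sample_mean(lambda: erlang_variate(m, beta, rng), n)
mean_of_gamma = sample_mean(lambda: rng.gammavariate(m, beta), n)

# Both sample means should be close to the theoretical mean m * beta.
print(mean_of_sums, mean_of_gamma, m * beta)
```

Matching means are only a sanity check, not a proof that the two distributions coincide.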

Slide 10: Examples: continuous distributions...
Gamma distribution (one scale parameter, one shape parameter)
[Figure: densities f(x) for one parameter set to 1, 2, and 3, with the other parameter fixed at 1.]
Uses: Task completion time.
Notes: For a positive integer m, gamma(m, β) is an m-Erlang. If X_1, X_2, …, X_n are independent gamma(α_i, β) random variables, then X_1 + X_2 + … + X_n is distributed as gamma(α_1 + α_2 + … + α_n, β).

Slide 11: Examples: continuous distributions...
Weibull distribution (one scale parameter, one shape parameter)
[Figure: densities f(x) for one parameter set to 1 and 2, with the other parameter fixed at 1.]
Uses: Task completion time; equipment failure time.
Notes: The expo(β) and the Weibull(1, β) are the same distribution.

Slide 12: Examples: continuous distributions...
Normal distribution (one location parameter, one scale parameter)
[Figure: density f(x).]
Uses: Errors from a set point; quantities that are the sum of a large number of other quantities.

Slide 13: Examples: discrete distribution
Bernoulli distribution
[Figure: mass function p(x), with p(0) = 1 − p and p(1) = p.]
Uses: Outcome of an experiment that either succeeds or fails.
Notes: If X_1, X_2, …, X_t are independent Bernoulli trials with the same p, then X_1 + X_2 + … + X_t is binomially distributed with parameters (t, p).

Slide 14: Examples: discrete distribution
Binomial distribution
[Figure: mass function p(x) for t = 5, p = 0.5.]
Uses: Number of defectives in a batch of size t.
Notes: If X_1, X_2, …, X_m are independent and distributed bin(t_i, p), then X_1 + X_2 + … + X_m is binomially distributed with parameters (t_1 + t_2 + … + t_m, p). The binomial distribution is symmetric only if p = 0.5.

Slide 15: Poisson; Pareto

Slide 16: Selecting an Input Distribution - Family
– The 1st step in any input distribution fitting procedure is to hypothesize a family (e.g. exponential).
– Prior knowledge about the distribution and its use in the simulation can provide useful clues. For example, the normal shouldn’t be used for service times, since negative values can be returned.
– Mostly, we will use heuristics to settle on a distribution family.

Slide 17: Summary Statistics

Slide 18: Summary Example
Consider the following sample:
– The data are continuous.
– The coefficient of variation (CV) suggests the distribution is NOT exponential.
– The skew suggests a left skewed family.
Conclusion: Gamma? Weibull? Beta?
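
The two statistics used above can be computed with the standard library. This is a minimal sketch: the function name and the sample values are made up for illustration, and the skewness is the simple moment-based estimate, which may differ slightly from the bias-corrected version a statistics package reports. For reference, an exponential distribution has CV = 1 and skewness = 2, so sample values far from these argue against the exponential family.

```python
import math
import statistics


def summary_statistics(data):
    """Return mean, sample variance, CV, and moment-based skewness of data."""
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)      # sample variance, n - 1 divisor
    cv = math.sqrt(variance) / mean           # coefficient of variation
    # Moment-based skewness: third central moment over (second)^(3/2).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    skewness = m3 / m2 ** 1.5
    return {"n": n, "mean": mean, "variance": variance,
            "cv": cv, "skewness": skewness}


# Hypothetical sample, not the data from the slides.
sample = [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 7.0, 12.0]
print(summary_statistics(sample))
```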

Slide 19: Draw a Histogram
Law & Kelton suggest drawing a histogram and using the plot to “eyeball” the family. They suggest trying several plots with varying bar widths, then picking a bar width such that the plot is neither too boxy nor too scraggly.

Slides 20–22: Histogram Examples I, II, and III
[Figures: histograms of the same data drawn with different bar widths.]

Slide 23: Sturges’s Rule
Select k (the number of bins) as k = Int(1 + log_2 n) = Int(1 + 3.322 log_10 n), where n is the number of samples.
We will guess at a gamma distribution.
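
A short sketch of the rule, together with a plain bin-counting helper for trying different bar widths, is given below. It assumes the usual statement of Sturges's rule, k = floor(1 + log2 n); the helper names are illustrative.

```python
import math


def sturges_bins(n):
    """Number of histogram bins suggested by Sturges's rule."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.floor(1 + math.log2(n))


def histogram_counts(data, k):
    """Count observations in k equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all observations are identical")
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The maximum value would land in bin k, so clamp it into the last bin.
        j = min(int((x - lo) / width), k - 1)
        counts[j] += 1
    return lo, width, counts


print(sturges_bins(546))  # bin count for a sample of the size shown on slide 2
```

Calling `histogram_counts` with several values of k around the Sturges suggestion is one way to compare plots that look too boxy or too scraggly.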

Slide 24: Parameter Estimation
To estimate parameters for our distribution, we use the maximum likelihood estimators (MLEs) for the selected distribution. For a gamma we’ll use the following approximation:
1. Calculate T: $T = \left[\ln \bar{X}(n) - \frac{1}{n}\sum_{i=1}^{n} \ln X_i\right]^{-1}$
2. Use Table 6.20 to obtain α from T.
3. Calculate β: $\hat{\beta} = \bar{X}(n) / \hat{\alpha}$
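
When the table is not at hand, a closed-form approximation to the gamma shape MLE can stand in for the lookup. The sketch below is a swapped-in technique, not the table method from the slide: with s = ln(mean) − mean(ln x), which is 1/T in the slide's notation, it uses the approximation α ≈ (3 − s + sqrt((s − 3)² + 24 s)) / (12 s), and then sets β = mean / α. The function name is illustrative.

```python
import math


def gamma_fit_approx(data):
    """Approximate gamma MLEs (shape alpha, scale beta) without a table lookup."""
    if any(x <= 0 for x in data):
        raise ValueError("gamma data must be strictly positive")
    n = len(data)
    mean = sum(data) / n
    # s = ln(sample mean) - mean of logs; this is 1/T in the slide's notation.
    s = math.log(mean) - sum(math.log(x) for x in data) / n
    if s <= 0:
        raise ValueError("data have no spread; shape cannot be estimated")
    # Closed-form approximation to the shape MLE (an assumption of this sketch).
    alpha = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    # Given alpha, the scale estimate makes the fitted mean match the sample mean.
    beta = mean / alpha
    return alpha, beta


alpha_hat, beta_hat = gamma_fit_approx([1.0, 2.0, 3.0, 4.0])
print(alpha_hat, beta_hat)
```

The approximation is meant as a quick estimate; a statistics package or the table should be preferred when precise MLEs are needed.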

Slide 25: A Note on MLEs
Maximum likelihood estimation is a method for determining the parameters of a hypothesized distribution. We assume that we have collected n IID observations X_1, X_2, …, X_n. We define the likelihood function as the joint density (or mass) of the sample, viewed as a function of θ: $L(\theta) = f_\theta(X_1)\, f_\theta(X_2) \cdots f_\theta(X_n)$. The MLE is simply the value of θ that maximizes L(θ) over all permissible values of θ. In simple terms, the MLE is the parameter value under which the observed data are most likely.

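Slide 26: MLE for an Exponential Distribution
We want to estimate β. For the exponential density $f_\beta(x) = \frac{1}{\beta} e^{-x/\beta}$, the likelihood is $L(\beta) = \prod_{i=1}^{n} \frac{1}{\beta} e^{-X_i/\beta} = \beta^{-n} \exp\left(-\frac{1}{\beta}\sum_{i=1}^{n} X_i\right)$. So the problem is to maximize the right-hand side of this equation. To make things simpler we maximize $\ln L(\beta) = -n \ln \beta - \frac{1}{\beta}\sum_{i=1}^{n} X_i$. Take the 1st derivative and set it to 0: $-\frac{n}{\beta} + \frac{1}{\beta^2}\sum_{i=1}^{n} X_i = 0$. Solving for β gives $\hat{\beta} = \bar{X}(n)$, the sample mean.
In general, MLEs are difficult to calculate in closed form for most distributions. See Chapter 6 of Law and Kelton.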
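
The result can be checked numerically: the exponential log-likelihood, ln L(β) = −n ln β − (1/β) Σ X_i, should be largest at the sample mean. The sketch below uses a tiny made-up sample and illustrative function names.

```python
import math


def exp_log_likelihood(beta, data):
    """Log-likelihood of an exponential distribution with mean beta."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return -len(data) * math.log(beta) - sum(data) / beta


def exp_mle(data):
    """Closed-form MLE of beta: the sample mean."""
    return sum(data) / len(data)


sample = [1.0, 2.0, 6.0]          # hypothetical observations
beta_hat = exp_mle(sample)

# The log-likelihood at beta_hat should beat nearby candidate values.
for candidate in (beta_hat - 0.1, beta_hat, beta_hat + 0.1):
    print(candidate, exp_log_likelihood(candidate, sample))
```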

Slide 27: Determining the “Goodness” of the Model
As you might imagine, determining whether our hypothesized model is “good” is fraught with difficulties. Law and Kelton suggest both heuristic and analytical tests to determine “goodness”.
Heuristic tests:
– Density/histogram overplots.
– Frequency comparisons.
– Distribution function difference plots.
– Probability-probability plots (P-P plots).
Analytical tests:
– Chi-square tests.
– Kolmogorov-Smirnov tests.

Slide 28: Frequency Comparisons
[Figure: cumulative distribution functions.]

Slide 29: Distribution Function Difference Plot
Plot the difference between the CDF of the hypothesized distribution and the CDF observed in the data. If the fit is perfect, this plot is a horizontal line at 0. L&K suggest the difference should be less than 0.10 in absolute value at all points.
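
The plotted quantity can be sketched as follows. The code evaluates a fitted CDF minus the empirical CDF at each sorted observation, taking the empirical CDF at the i-th smallest value to be i/n. An exponential CDF is used as the fitted distribution only because it has a closed form in the standard library; the sample values and β = 1 are hypothetical.

```python
import math


def exp_cdf(x, beta):
    """CDF of an exponential distribution with mean beta."""
    return 1.0 - math.exp(-x / beta) if x > 0 else 0.0


def cdf_differences(data, fitted_cdf):
    """Return (x, fitted CDF minus empirical CDF) at each sorted observation."""
    xs = sorted(data)
    n = len(xs)
    # The empirical CDF at the i-th smallest observation is i / n.
    return [(x, fitted_cdf(x) - i / n) for i, x in enumerate(xs, start=1)]


def within_tolerance(differences, tol=0.10):
    """Check the 0.10 rule of thumb on the absolute differences."""
    return all(abs(d) < tol for _, d in differences)


sample = [0.2, 0.5, 0.9, 1.4, 2.6]   # hypothetical observations
diffs = cdf_differences(sample, lambda x: exp_cdf(x, 1.0))
print(diffs, within_tolerance(diffs))
```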

Slide 30: P-P Plot
Note: We are plotting one distribution function against the other (fitted versus empirical), evaluated at the observations.

Slide 31: Q-Q Plot
L&K note that Q-Q plots tend to emphasize errors in the tails. Our fitted distribution doesn’t look appropriate in the upper tail.
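
The points behind both plots can be generated with a few lines of Python. This sketch assumes the plotting positions q_i = (i − 0.5)/n for the i-th smallest observation, which is one common convention; the function names are illustrative. A P-P plot pairs q_i with the fitted CDF at the observation, and a Q-Q plot pairs the observation with the fitted quantile at q_i. In both cases a good fit gives points near the 45-degree line.

```python
def plotting_positions(n):
    """Assumed plotting positions q_i = (i - 0.5) / n, for i = 1..n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def pp_points(data, fitted_cdf):
    """P-P plot points: (empirical probability, fitted probability)."""
    xs = sorted(data)
    qs = plotting_positions(len(xs))
    return [(q, fitted_cdf(x)) for q, x in zip(qs, xs)]


def qq_points(data, fitted_inverse_cdf):
    """Q-Q plot points: (sample quantile, fitted quantile)."""
    xs = sorted(data)
    qs = plotting_positions(len(xs))
    return [(x, fitted_inverse_cdf(q)) for q, x in zip(qs, xs)]


# Toy check against a U(0, 4) model, whose CDF is x / 4 and inverse is 4 * q.
print(pp_points([2.0, 1.0], lambda x: x / 4))
print(qq_points([2.0, 1.0], lambda q: 4 * q))
```

Because the Q-Q plot works on the scale of the data, large observations in the tail can move far from the line even when the corresponding probabilities differ only slightly.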

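Slide 32: Analytic Test – Chi-Square Test
1. Divide the range into k adjacent intervals [a_0, a_1), [a_1, a_2), …, [a_{k−1}, a_k).
2. Let N_j = number of X_i’s in the jth interval.
3. Determine the proportion p_j of X_i’s that should fall in the jth interval under the fitted distribution:
– For a continuous R.V.: $p_j = \int_{a_{j-1}}^{a_j} \hat{f}(x)\,dx$
– For a discrete R.V.: $p_j = \sum_{a_{j-1} \le x_i < a_j} \hat{p}(x_i)$
Here p_j is the total probability that the random variable takes a value in the jth interval, P(a_{j−1} ≤ X < a_j). For continuous data the equal-probability approach is recommended: the intervals are chosen so that the p_j’s are all equal.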

Slide 33: Analytic Test – Chi-Square Test
1. The test statistic is $\chi_0^2 = \sum_{j=1}^{k} \frac{(N_j - n p_j)^2}{n p_j}$, where n is the sample size and E_j = n p_j is the expected count in the jth interval.
2. A number of authors suggest that the intervals be grouped such that E_j ≥ 5 in all cases.
3. H_0: X conforms to the assumed distribution. H_1: X does not conform to the assumed distribution. Reject H_0 if $\chi_0^2 > \chi^2_{k-1,\,1-\alpha}$.
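
The procedure can be sketched for a fitted exponential distribution with equal-probability intervals. Several things here are assumptions of the sketch: the exponential family (chosen because its quantiles have a closed form), the function names, and the critical value, which the caller must supply from a chi-square table. The interval endpoints are a_j = −β ln(1 − j/k), so every interval has p_j = 1/k and expected count n/k.

```python
import bisect
import math


def chi_square_statistic(data, beta, k):
    """Chi-square statistic for an expo(beta) fit with k equal-probability bins."""
    n = len(data)
    # Interior endpoints a_1 .. a_{k-1}; a_0 = 0 and a_k = infinity are implicit.
    edges = [-beta * math.log(1.0 - j / k) for j in range(1, k)]
    counts = [0] * k
    for x in data:
        # Number of interior endpoints <= x is the index of x's interval.
        counts[bisect.bisect_right(edges, x)] += 1
    expected = n / k                      # E_j = n * p_j with p_j = 1 / k
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, counts


def reject_h0(statistic, critical_value):
    """Reject H0 when the statistic exceeds the tabulated critical value."""
    return statistic > critical_value


sample = [0.1, 0.3, 0.4, 0.8, 1.1, 1.9]   # hypothetical observations
stat, counts = chi_square_statistic(sample, beta=1.0, k=2)
print(stat, counts, reject_h0(stat, 3.841))  # 3.841: 1 d.f., alpha = 0.05
```

The toy sample is far too small to satisfy the E_j ≥ 5 guideline; it only illustrates the arithmetic.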

Slide 34: Example

Slides 35–36: Example 6.15
[Table: interval endpoints a_j, observed counts, and expected counts; k is the number of intervals.]
The test statistic is less than the critical value.

Slides 37–38: Example 6.16
Determine the intervals so that the p_j’s are more or less equal.
The test statistic is less than the critical value.

Slide 39: P Value
[Figure: chi-square density with k − 1 d.f., marking alpha, the critical value, and the calculated test statistic.]
The P value is the cumulative probability to the right of the test statistic. If P value > alpha, don’t reject H_0.
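
For an even number of degrees of freedom the right-tail probability has a simple closed form, P(χ² > x) = e^{−x/2} Σ_{i=0}^{d/2−1} (x/2)^i / i!, which allows a standard-library sketch of the P value calculation. The restriction to even d.f. is a limitation of this sketch; for general d.f. a statistics library would be the usual choice. The function names are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Right-tail probability of a chi-square variable; even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0          # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def decision(p_value, alpha):
    """Apply the rule from the slide: reject only when P value <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


p = chi2_sf_even_df(3.2, 4)   # hypothetical statistic with 4 d.f.
print(p, decision(p, 0.05))
```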

Slide 40: The Kolmogorov-Smirnov (K-S) Test
– The K-S test compares the empirical distribution function F_n(x) with the hypothesized (fitted) distribution function F^(x).
– The K-S test is somewhat stronger than the chi-square test.
– This test doesn’t require that we group our samples into intervals of a minimum size.
– We define the test statistic D_n as the largest gap between the two functions over all x: $D_n = \sup_x \left| F_n(x) - \hat{F}(x) \right|$.

Slide 41: K-S Example
Let’s say we had collected 10 samples (0.09, 0.23, 0.24, 0.26, 0.36, 0.38, 0.55, 0.62, 0.65, and 0.76) and have developed an empirical CDF. We want to test the hypothesis that our sample is drawn from a U(0,1) distribution, for which f(x) = 1 and F^(x) = x on [0, 1].

Slide 42: K-S Example
[Figure: F^(x) and F_n(x) plotted together.]

Slide 43: K-S Example
D_n is simply the largest of the gaps between F^(x) and F_n(x). Remember that F_n(x) is a step function, so we need to find both D_n^- and D_n^+ at every discontinuity.

Slide 44: K-S Example
[Table with columns: Sample, x, F^(x) (fitted distribution), F_n(x) (empirical), D_n^-, D_n^+.]
Our D_n is the largest value in the D_n^- and D_n^+ columns. In this case D_n = 0.25.
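
The calculation for the ten samples of this example can be reproduced in a few lines. The sketch takes D_n^+ as the largest value of i/n − F^(x_(i)) and D_n^- as the largest value of F^(x_(i)) − (i − 1)/n over the sorted observations, with F^(x) = x for the U(0,1) hypothesis. The function name is illustrative.

```python
def ks_statistic(data, fitted_cdf):
    """Return (D_n minus, D_n plus, D_n) for data against a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    # Gap just after each jump of the empirical step function.
    d_plus = max(i / n - fitted_cdf(x) for i, x in enumerate(xs, start=1))
    # Gap just before each jump of the empirical step function.
    d_minus = max(fitted_cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return d_minus, d_plus, max(d_minus, d_plus)


samples = [0.09, 0.23, 0.24, 0.26, 0.36, 0.38, 0.55, 0.62, 0.65, 0.76]
d_minus, d_plus, d_n = ks_statistic(samples, lambda x: x)  # U(0,1): F(x) = x
print(round(d_minus, 2), round(d_plus, 2), round(d_n, 2))  # 0.13 0.25 0.25
```

The largest gap occurs at x = 0.65, where the empirical CDF has already reached 0.9.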

Slide 45: K-S Test
The value of D_n, when appropriately adjusted (see next slide), is found to be less than the critical point (1.224) for the chosen α. Thus we cannot reject H_0.

Slide 46: Adjusted Critical Values
Law and Kelton present critical values for the K-S test that are non-standard. They use a compressed table based on work from Stevens (1962), in which D_n is multiplied by an adjustment factor that depends on the case:
– All parameters in F^(x) known
– Normal distribution
– Exponential distribution
For tables of critical values, see L&K.
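
For the all-parameters-known case, the adjustment can be sketched as below. The factor (√n + 0.12 + 0.11/√n) is not given in this transcript; it is the commonly quoted modification for that case and is stated here as an assumption to be verified against the table in L&K. The function names are illustrative, and the critical point 1.224 is the one quoted on slide 45.

```python
import math


def adjusted_ks_all_known(d_n, n):
    """Adjusted K-S statistic for the all-parameters-known case (assumed factor)."""
    root_n = math.sqrt(n)
    # Assumed adjustment factor; verify against the table in L&K before use.
    return (root_n + 0.12 + 0.11 / root_n) * d_n


def cannot_reject(adjusted_statistic, critical_point):
    """H0 is not rejected when the adjusted statistic is below the critical point."""
    return adjusted_statistic < critical_point


adjusted = adjusted_ks_all_known(0.25, 10)   # D_n and n from the K-S example
print(round(adjusted, 4), cannot_reject(adjusted, 1.224))
```

With D_n = 0.25 and n = 10 the adjusted statistic falls below 1.224, which agrees with the conclusion on slide 45 that H_0 cannot be rejected.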