Graduate Program in Engineering and Technology Management


Graduate Program in Engineering and Technology Management. Input Modeling (Simulation-4). Aslı Sencer

Steps of input modeling
1. Collect data from the real system of interest. This requires substantial time and effort; use expert opinion when sufficient data are not available.
2. Identify a probability distribution to represent the input process: draw frequency distributions and histograms, then choose a family of theoretical distributions.
3. Estimate the parameters of the selected distribution.
4. Apply goodness-of-fit tests (chi-square tests, Kolmogorov-Smirnov tests) to evaluate the chosen distribution and its parameters.
If the tests reject the fit, choose a new theoretical distribution and go to step 3! If all theoretical distributions fail, then either use an empirical distribution or recollect data.

Step 1: Data collection involves many difficulties
- Nonhomogeneous interarrival time distribution: the distribution changes with the time of day, the day of the week, etc. You can't merge all these data for distribution fitting!
- Two arrival processes might be dependent, like the demand for washing machines and dryers. You shouldn't treat them separately!
- The start and end of service durations might not be clear. You should split the service into well-defined processes!
- Machines may break down randomly. You should collect data for up and down times!

Step 2.1: Identify the Probability Distribution. Histogram with discrete data (raw data: observed number of arrivals per period, values 0 to 11)

Arrivals per period | Frequency
0  | 12
1  | 10
2  | 19
3  | 17
4  | 10
5  | 8
6  | 7
7  | 5
8  | 5
9  | 3
10 | 3
11 | 1

Step 2.1: Identify the Probability Distribution. Histogram with continuous data

Raw data, component life (days):
79.919 3.081 0.062 1.961 5.845 3.027 6.505 0.021 0.013 0.123
6.769 59.899 1.192 34.760 5.009 18.387 0.141 43.565 24.420 0.433
144.695 2.663 17.967 0.091 9.003 0.941 0.878 3.148 2.157 7.579
0.624 5.380 3.371 7.078 23.960 0.590 1.928 0.300 0.002 0.543
7.004 31.764 1.005 1.147 0.219 3.217 14.382 1.008 2.336 4.562

Component life (days) | Frequency
[0-3)   | 23
[3-6)   | 10
[6-9)   | 5
[9-12)  | 1
[15-18) | 2
Further class intervals shown on the slide: [12-15), [18-21), [21-24), [24-27), [27-30), [30-33), [33-36), ..., [42-45), [57-60), [78-81), [144-147).

Step 2.2: Selecting the family of distributions. The purpose of preparing a histogram is to infer a known pdf or pmf. This theoretical distribution is then used to generate random variates, such as interarrival times and service times, during simulation runs. The exponential, normal and Poisson distributions are frequently encountered and are not difficult to analyze. The beta, gamma and Weibull families additionally provide a wide variety of shapes.

Applications of Exponential Distribution: Used to model the time between independent events, like arrivals or breakdowns. Inappropriate for modeling process delay times.

Applications of Poisson Distribution: Discrete distribution, used to model the number of independent events occurring per unit time, e.g. batch sizes of customers and items. If the time between successive events is exponential, then the number of events in a fixed time interval is Poisson.
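The link between exponential gaps and Poisson counts can be checked with a small simulation. This is only a sketch: the rate, the seed and the number of periods are arbitrary choices for illustration, not values prescribed by the slides. For a Poisson count, the mean and the variance per period should both come out close to the rate.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable (arbitrary choice)
rate = 3.64              # assumed mean number of events per unit time
n_periods = 20_000       # assumed number of unit-length periods to simulate

# Generate event times with exponential gaps and count events in each period.
counts = [0] * n_periods
t = rng.expovariate(rate)
while t < n_periods:
    counts[int(t)] += 1
    t += rng.expovariate(rate)

mean_count = sum(counts) / n_periods
var_count = sum((c - mean_count) ** 2 for c in counts) / (n_periods - 1)
print(f"mean = {mean_count:.3f}, variance = {var_count:.3f}")
```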

Applications of Beta Distribution: Often used as a rough model in the absence of data. Represents random proportions. A beta sample X on [0,1] can be transformed into a scaled beta sample on [a,b] by Y = a + (b-a)X.
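A minimal sketch of the scaling Y = a + (b-a)X, using Python's standard `random.betavariate`. The bounds and shape parameters below are hypothetical values chosen only for illustration.

```python
import random

rng = random.Random(7)        # arbitrary seed
a, b = 2.0, 10.0              # hypothetical lower and upper bounds
shape1, shape2 = 2.0, 5.0     # hypothetical beta shape parameters

# X is a standard beta sample on [0, 1]; Y is the scaled sample on [a, b].
xs = [rng.betavariate(shape1, shape2) for _ in range(1000)]
ys = [a + (b - a) * x for x in xs]
print(f"min Y = {min(ys):.3f}, max Y = {max(ys):.3f}")
```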

Applications of Erlang Distribution: Used to represent the time required to complete a task that can be represented as the sum of k exponentially distributed durations. For large k, the Erlang approaches the normal distribution. For k=1, the Erlang is the exponential distribution with rate=1/β. It is a special case of the gamma distribution in which α, the shape parameter of the gamma distribution, is k.
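The "sum of k exponential durations" description can be sketched directly. The values of k and β below are hypothetical; the helper name `erlang_sample` is introduced here for illustration only. The sample mean should land near kβ.

```python
import random

rng = random.Random(3)   # arbitrary seed
k = 4                    # hypothetical number of exponential phases
beta = 2.5               # hypothetical mean of each phase

def erlang_sample(k, beta, rng):
    """Draw one Erlang variate as the sum of k exponential durations with mean beta."""
    return sum(rng.expovariate(1.0 / beta) for _ in range(k))

samples = [erlang_sample(k, beta, rng) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)
print(f"sample mean = {sample_mean:.3f}, k*beta = {k * beta}")
```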

Applications of Gamma Distribution: Used to represent the time required to complete a task. Same as the Erlang distribution when the shape parameter α is an integer.

Applications of Johnson Distribution: Its flexible domain, which can be bounded or unbounded, allows it to fit many data sets. If δ>0, the domain is bounded. If δ<0, the domain is unbounded.

Applications of Lognormal Distribution: Used to represent quantities that are the product of a large number of random quantities. Used to represent task times that are skewed to the right. If X ~ LOGN( ), then ln X ~ NORM(μ,σ).

Applications of Weibull Distribution: Widely used in reliability models to represent lifetimes. If the system consists of a large number of parts that fail independently, the time between successive failures can be Weibull. Used to model nonnegative task times that are skewed to the left. It reduces to the exponential distribution when its shape parameter equals 1.
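The exponential special case can be verified from the cdfs. This sketch assumes the common parameterization F(x) = 1 - exp(-(x/scale)^shape); the names `scale` and `shape` and the test points are choices made here, not notation from the slides.

```python
import math

def weibull_cdf(x, scale, shape):
    """Weibull cdf under the assumed parameterization 1 - exp(-(x/scale)**shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape))

def exponential_cdf(x, mean):
    """Exponential cdf with the given mean."""
    return 1.0 - math.exp(-x / mean)

scale = 94.7                      # arbitrary illustrative scale
xs = [0.5, 10.0, 100.0, 500.0]    # arbitrary evaluation points

# With shape = 1 the two cdfs should coincide at every point.
max_gap = max(abs(weibull_cdf(x, scale, 1.0) - exponential_cdf(x, scale)) for x in xs)
print(f"largest cdf difference = {max_gap:.2e}")
```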

Applications of Continuous Empirical Distribution: Used to incorporate empirical data as an alternative to a theoretical distribution when the data have multiple modes, significant outliers, etc.

Applications of Discrete Empirical Distribution: Used for discrete assignments such as job type, visitation sequence or batch size.

Step 3: Estimate the parameters of the selected distribution. A theoretical distribution is specified by its parameters, which are defined on the whole population data. Ex: Let V, W, X, Y, Z be random variables. Then
- V ~ N(µ, σ²), where µ is the mean and σ² is the variance
- W ~ Poisson(λ), where λ is the mean
- X ~ Exponential(β), where β is the mean
- Y ~ Triangular(a, m, b), where a, m, b are the minimum, mode and maximum of the data
- Z ~ Uniform(a, b), where a and b are the minimum and maximum of the data
These parameters are estimated by using point estimators defined on the sample data.

Step 3: Estimate the parameters of the selected distribution. The sample mean and the sample variance are the point estimators for the population mean and the population variance. Let X_i, i=1,2,...,n, be iid random variables (the raw data are known). Then the sample mean $\bar{X}$ and the sample variance $s^2$ are calculated as

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad s^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$

These formulas apply to both the discrete raw data and the continuous raw data shown in Step 2.1.
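As an illustration, the sample mean and sample variance of the 50 component-life observations from Step 2.1 can be computed with Python's standard `statistics` module. The variable names are chosen here; `statistics.variance` uses the n-1 divisor, and the last lines compare it with the usual shortcut form of the sample variance.

```python
import statistics

# Component life (days): the 50 continuous raw observations from Step 2.1.
life = [
    79.919, 3.081, 0.062, 1.961, 5.845, 3.027, 6.505, 0.021, 0.013, 0.123,
    6.769, 59.899, 1.192, 34.760, 5.009, 18.387, 0.141, 43.565, 24.420, 0.433,
    144.695, 2.663, 17.967, 0.091, 9.003, 0.941, 0.878, 3.148, 2.157, 7.579,
    0.624, 5.380, 3.371, 7.078, 23.960, 0.590, 1.928, 0.300, 0.002, 0.543,
    7.004, 31.764, 1.005, 1.147, 0.219, 3.217, 14.382, 1.008, 2.336, 4.562,
]

n = len(life)
xbar = statistics.mean(life)      # sample mean
s2 = statistics.variance(life)    # sample variance with n-1 divisor

# Shortcut form: (sum of squares - n * xbar^2) / (n - 1).
s2_shortcut = (sum(x * x for x in life) - n * xbar ** 2) / (n - 1)
print(f"n = {n}, mean = {xbar:.3f}, variance = {s2:.3f}")
```

Since the exponential distribution is specified by its mean, the sample mean is the natural estimate of β if an exponential model is considered for these data.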

Step 3: Estimate the parameters of the selected distribution. If the data are discrete and have been grouped in a frequency distribution, i.e., the raw data are not known, then

$\bar{X} = \frac{1}{n} \sum_{j=1}^{k} f_j X_j, \qquad s^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1}$

where k is the number of distinct values of X and f_j, j=1,2,...,k, is the observed frequency of the value X_j of X.

Arrivals per period | Frequency
0  | 12
1  | 10
2  | 19
3  | 17
4  | 10
5  | 8
6  | 7
7  | 5
8  | 5
9  | 3
10 | 3
11 | 1
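A short sketch of the grouped-data estimators applied to the arrivals-per-period frequency table. The frequencies are an assumption in the sense that they are read back from the flattened slide table; they total 100 observations and give the mean 3.64 used later in the Poisson example.

```python
# Assumption: frequencies as reconstructed from the slide's frequency table.
freq = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
        6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}

n = sum(freq.values())                                  # total number of periods
xbar = sum(f * x for x, f in freq.items()) / n          # grouped sample mean
sum_sq = sum(f * x * x for x, f in freq.items())        # sum of f_j * X_j^2
s2 = (sum_sq - n * xbar ** 2) / (n - 1)                 # grouped sample variance
print(f"n = {n}, mean = {xbar:.2f}, variance = {s2:.2f}")
```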

Step 3: Estimate the parameters of the selected distribution. If the data are discrete or continuous and have been grouped in class intervals, i.e., the raw data are not known, then

$\bar{X} \approx \frac{1}{n} \sum_{j=1}^{c} f_j m_j, \qquad s^2 \approx \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1}$

where f_j, j=1,2,...,c, is the observed frequency of the jth class interval and m_j is the midpoint of the jth interval.

Component life (days) | Frequency
[0-3)   | 23
[3-6)   | 10
[6-9)   | 5
[15-18) | 2
[21-24) | 1
Further class intervals shown on the slide: [9-12), [12-15), [18-21), [24-27), [27-30), [30-33), [33-36), ..., [42-45), [57-60), [78-81), [144-147).

Step 3: Estimate the parameters of the selected distribution. The minimum, mode (i.e., the data value with the highest frequency) and maximum of the population data are estimated from the sample data as

$\hat{a} = \min_i X_i, \qquad \hat{m} = X_t, \qquad \hat{b} = \max_i X_i$

where X_t is the data value that has the highest frequency.
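A minimal sketch of these three estimators, using `collections.Counter` to find the most frequent value. The sample below is hypothetical and exists only to show the mechanics.

```python
from collections import Counter

# Hypothetical discrete sample, used only for illustration.
sample = [4, 7, 5, 7, 9, 6, 7, 5, 8, 7, 3, 6]

a_hat = min(sample)                             # estimate of the minimum
b_hat = max(sample)                             # estimate of the maximum
m_hat = Counter(sample).most_common(1)[0][0]    # value with the highest frequency
print(a_hat, m_hat, b_hat)
```

For continuous data, where exact repeats are rare, the mode would typically be taken from the histogram class with the highest frequency instead.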

Step 4: Goodness of fit test. Goodness of fit tests (GFTs) provide helpful guidance for evaluating the suitability of the selected input model as a simulation input. GFTs check the discrepancy between the empirical and the selected theoretical distribution to decide whether or not the sample was taken from that theoretical distribution. The role of the sample size, n: If n is small, GFTs are unlikely to reject any theoretical distribution, since the discrepancy is attributed to sampling error! If n is large, then GFTs are likely to reject almost all distributions.

Step 4: Goodness of fit tests - Chi square test. The chi-square test is valid for large sample sizes and for both discrete and continuous distributional assumptions when the parameters are estimated by maximum likelihood. Hypothesis test:
Ho: The random variable X conforms to the theoretical distribution with the estimated parameters.
Ha: The random variable does NOT conform to the theoretical distribution with the estimated parameters.
We need a test statistic to either reject or fail to reject Ho. This test statistic should measure the discrepancy between the theoretical and the empirical distribution. If the test statistic is high, then Ho is rejected. Otherwise we fail to reject Ho! (Hence we accept Ho.)

Step 4: Goodness of fit tests - Chi square test. Test statistic: Arrange the n observations into a set of k class intervals or cells. The test statistic is given by

$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$

where O_i is the observed frequency in the ith class interval and E_i is the expected frequency in the ith class interval,

$E_i = n p_i$

where p_i is the theoretical probability associated with the ith class, i.e., p_i = P(random variable X belongs to the ith class).

Step 4: Goodness of fit tests - Chi square test. Recommendations for the number of class intervals for continuous data: It is suggested that $E_i \ge 5$. If the expected frequency of a class is smaller, that class should be combined with the adjacent classes. The corresponding O_i values should be combined in the same way, and k should be reduced by one for every combined cell.

Sample size, n | Number of class intervals, k
20   | Do not use chi-square test
50   | 5 to 10
100  | 10 to 20
>100 | √n to n/5
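One way to combine cells with small expected frequencies is a left-to-right greedy pass, sketched below. The function name `merge_small_cells`, the greedy rule and the threshold default are choices made for this illustration (the slides do not prescribe a specific merging order). The expected counts are illustrative; they equal 100 times the Poisson probabilities used in the example later in these slides.

```python
def merge_small_cells(expected, min_expected=5.0):
    """Group adjacent cell indices so that each group's expected count is >= min_expected."""
    groups, current, total = [], [], 0.0
    for i, e in enumerate(expected):
        current.append(i)
        total += e
        if total >= min_expected:       # group is large enough, close it
            groups.append(current)
            current, total = [], 0.0
    if current:                         # leftover tail below the threshold
        if groups:
            groups[-1].extend(current)  # fold it into the previous group
        else:
            groups.append(current)
    return groups

# Illustrative expected frequencies for cells 0, 1, ..., 10, and "11 or more".
expected = [2.6, 9.6, 17.4, 21.1, 19.2, 14.0, 8.5, 4.4, 2.0, 0.8, 0.3, 0.1]
groups = merge_small_cells(expected)
print(groups)
```

The same index groups should then be applied to the observed frequencies, and k becomes the number of groups.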

Step 4: Goodness of fit tests - Chi square test. Evaluation: Let α = P(rejecting Ho when it is true) be the significance level, e.g. 5%. If the p-value of the test statistic is less than α, reject Ho and the distribution; otherwise, fail to reject Ho. The test statistic $\chi_0^2$ approximately follows the chi-square distribution with k-s-1 degrees of freedom, where s is the number of estimated parameters. Equivalently, reject Ho if $\chi_0^2 > \chi^2_{\alpha,\,k-s-1}$ and fail to reject Ho otherwise.

Chi-square distribution table: critical values $\chi^2_{\alpha,\,k-s-1}$ by degrees of freedom (k-s-1) and significance level α.

Step 4: GFT - chi square test. Ex: Poisson distribution. Consider the discrete data we analyzed in step 2.
Ho: # arrivals, X ~ Poisson(λ=3.64)
Ha: otherwise
λ is the mean rate of arrivals, estimated as 3.64. The following probabilities are found by using the pmf:
P(0)=0.026   P(6)=0.085
P(1)=0.096   P(7)=0.044
P(2)=0.174   P(8)=0.020
P(3)=0.211   P(9)=0.008
P(4)=0.192   P(10)=0.003
P(5)=0.140   P(≥11)=0.001
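The listed probabilities can be reproduced from the Poisson pmf, p(x) = e^{-λ} λ^x / x!, with λ = 3.64. The sketch below uses only the standard library; the helper name `poisson_pmf` is introduced here, and the last cell is taken as the tail probability of 11 or more arrivals.

```python
import math

lam = 3.64  # estimated mean number of arrivals per period

def poisson_pmf(x, lam):
    """Poisson probability mass function e^(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

probs = [poisson_pmf(x, lam) for x in range(11)]   # P(0), ..., P(10)
tail = 1.0 - sum(probs)                            # probability of 11 or more

for x, p in enumerate(probs):
    print(f"P({x}) = {p:.3f}")
print(f"P(>=11) = {tail:.3f}")
```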

Step 4: GFT - chi square test. Ex: Poisson distribution. Calculation of the chi-square test statistic $\chi_0^2$ with k-s-1 = 7-1-1 = 5 degrees of freedom and α=0.05. The test statistic exceeds the critical value $\chi^2_{0.05,5} = 11.07$. So, Ho is rejected!
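The calculation can be sketched as follows. Two things here are assumptions rather than content of the slide: the observed frequencies are read back from the flattened arrivals-per-period table, and the cell grouping (cells 0 and 1 combined, cells 7 and above combined) is one grouping that satisfies the expected-frequency rule and yields the k = 7 classes quoted above. The probabilities are the rounded values listed in the previous slide, and 11.07 is the tabulated chi-square critical value for α = 0.05 and 5 degrees of freedom.

```python
# Assumption: observed frequencies for 0, 1, ..., 10 and "11 or more" arrivals.
observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
# Rounded Poisson(3.64) probabilities for the same cells, as listed on the slide.
probs = [0.026, 0.096, 0.174, 0.211, 0.192, 0.140,
         0.085, 0.044, 0.020, 0.008, 0.003, 0.001]
# Assumed grouping so that every expected frequency is at least 5.
groups = [[0, 1], [2], [3], [4], [5], [6], [7, 8, 9, 10, 11]]

n = sum(observed)
chi2 = 0.0
for g in groups:
    o = sum(observed[i] for i in g)        # combined observed frequency
    e = n * sum(probs[i] for i in g)       # combined expected frequency n * p
    chi2 += (o - e) ** 2 / e

k = len(groups)          # number of classes after combining
s = 1                    # one estimated parameter (lambda)
dof = k - s - 1
critical = 11.07         # chi-square table value for alpha = 0.05, 5 dof
reject = chi2 > critical
print(f"chi2 = {chi2:.2f}, dof = {dof}, reject Ho: {reject}")
```

Most of the statistic comes from the first and last combined cells, where the observed counts are well above what a Poisson model with this mean would predict.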

Step 4: GFT - chi square test. Ex: Arena Input Analyzer

Distribution Summary
  Distribution: Normal
  Expression: NORM(225, 89)
  Square Error: 0.037778
Chi Square Test
  Number of intervals = 12
  Degrees of freedom = 9
  Test Statistic = 1.22e+004
  Corresponding p-value < 0.005
Data Summary
  Number of Data Points = 27009
  Min Data Value = 1
  Max Data Value = 1.88e+003
  Sample Mean = 225
  Sample Std Dev = 89
Histogram Summary
  Histogram Range = 0.999 to 1.88e+003
  Number of Intervals = 40

Reject Normal distribution at 5% significance level!

Fit All Summary
  Function     | Sq Error
  Normal       | 0.0506
  Gamma        | 0.0625
  Beta         | 0.0639
  Erlang       | 0.0673
  Weibull      | 0.079
  Lognormal    | 0.0926
  Exponential  | 0.286
  Triangular   | 0.311
  Uniform      | 0.36

Step 4: GFT - chi square test. Ex: Arena Input Analyzer

Distribution Summary
  Distribution: Lognormal
  Expression: 2 + LOGN(145, 67.9)
  Square Error: 0.000271
Chi Square Test
  Number of intervals = 4
  Degrees of freedom = 1
  Test Statistic = 207
  Corresponding p-value < 0.005
Data Summary
  Number of Data Points = 21547
  Min Data Value = 2
  Max Data Value = 6.01e+003
  Sample Mean = 146
  Sample Std Dev = 79.5
Histogram Summary
  Histogram Range = 2 to 6.01e+003
  Number of Intervals = 40

Reject Lognormal distribution at 5% significance level!

Step 4: GFT - chi square test. Ex: Arena Input Analyzer

Distribution Summary
  Distribution: Weibull
  Expression: 0.999 + WEIB(94.7, 0.928)
  Square Error: 0.002688
Chi Square Test
  Number of intervals = 20
  Degrees of freedom = 17
  Test Statistic = 838
  Corresponding p-value < 0.005
Data Summary
  Number of Data Points = 12418
  Min Data Value = 1
  Max Data Value = 1.47e+003
  Sample Mean = 108
  Sample Std Dev = 135
Histogram Summary
  Histogram Range = 0.999 to 1.47e+003
  Number of Intervals = 40

Reject Weibull distribution at 5% significance level!

Step 4: Goodness of fit tests - Drawbacks of the chi-square GFT
- The chi-square test uses parameter estimates obtained from the sample, which decreases the degrees of freedom.
- For continuous distributions, the chi-square test requires the data to be placed in class intervals. These classes are arbitrary, and their choice affects the value of the chi-square test statistic.
- The distribution of the chi-square test statistic is known only approximately, and the power of the test (the probability of rejecting an incorrect theoretical distribution) is sometimes low.
Hence other GFTs are also needed!

Step 4: Goodness of fit tests - Kolmogorov-Smirnov test. Useful when the sample size is small and when no parameters are estimated from the sample data. Compares the cdf of the theoretical distribution, F(x), with the empirical cdf, S_N(x), of the sample of N observations. Hypothesis test:
Ho: Data follow the selected distribution.
Ha: Data do NOT follow the selected distribution.
Test statistic: the largest deviation, D, between F(x) and S_N(x).

Step 4: Goodness of fit tests - Kolmogorov-Smirnov test. Steps of the K-S test:
1. Rank the data so that $X_{(1)} \le X_{(2)} \le \dots \le X_{(N)}$.
2. Calculate the maximum discrepancy D between F and S_N, $D = \max_x |F(x) - S_N(x)|$.

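Step 4: Goodness of fit tests - Kolmogorov-Smirnov test. If F is discrete, D is evaluated over the distinct data values x_j: $D = \max_j |F(x_j) - S_N(x_j)|$. If F is continuous, $D = \max(D^+, D^-)$, where $D^+ = \max_{1 \le i \le N} \{ i/N - F(X_{(i)}) \}$ and $D^- = \max_{1 \le i \le N} \{ F(X_{(i)}) - (i-1)/N \}$.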

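Step 4: Goodness of fit tests - Kolmogorov-Smirnov test. Evaluation: If D exceeds the critical value $D_{\alpha,N}$ from the K-S table, reject Ho; otherwise, fail to reject Ho.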

Step 4: Goodness of fit tests - Example: Kolmogorov-Smirnov test. Consider the data: 0.44, 0.81, 0.14, 0.05, 0.93
Ho: Data are uniform between (0,1)
Ha: otherwise

i                    | 1    | 2    | 3    | 4    | 5
X(i)                 | 0.05 | 0.14 | 0.44 | 0.81 | 0.93
i/N                  | 0.20 | 0.40 | 0.60 | 0.80 | 1.00
i/N - X(i)           | 0.15 | 0.26 | 0.16 | -    | 0.07
X(i) - (i-1)/N       | 0.05 | -    | 0.04 | 0.21 | 0.13

Since D = 0.26 < $D_{0.05,5}$ = 0.565, Ho is not rejected! Data are uniform between (0,1).
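The example can be reproduced in a few lines. This is a sketch for a continuous F, using the one-sided deviations i/N - F(X(i)) and F(X(i)) - (i-1)/N; the function name `uniform_cdf` is introduced here, and 0.565 is the critical value quoted in the example for N = 5.

```python
data = [0.44, 0.81, 0.14, 0.05, 0.93]

def uniform_cdf(x):
    """Cdf of the uniform distribution on (0, 1), valid for 0 <= x <= 1."""
    return x

ranked = sorted(data)
n = len(ranked)

# enumerate() is zero-based, so i + 1 is the rank and i is (rank - 1).
d_plus = max((i + 1) / n - uniform_cdf(x) for i, x in enumerate(ranked))
d_minus = max(uniform_cdf(x) - i / n for i, x in enumerate(ranked))
d = max(d_plus, d_minus)

critical = 0.565            # critical value quoted in the example
reject = d > critical
print(f"D+ = {d_plus:.2f}, D- = {d_minus:.2f}, D = {d:.2f}, reject Ho: {reject}")
```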