
Input Analysis 1

 Initial steps of the simulation study have been completed.  Through a verbal description and/or flow chart of the system operation to be simulated, the random components of the system have been identified.  The needed data have been identified and collected.  For these random components, input analysis covers how to specify the distributions to be used in the simulation model. 2

Example 3

Identifying Random System Components  Examples  The amount of variability/randomness is relative to the system. E.g., Machine processing time – not fixed but low variability. 4

Identifying Random System Components  The concept of relative variability for a random variable X, as measured by the coefficient of variation ( CV(X) ) may help. 5

Identifying Random System Components  Consider the single server model with a utilization = Both the service times and interarrival times have a CV=1. The average number in queue = 57 jobs.  If the service time CV=0  If the service time CV =

Identifying Random System Components 7

 If the system contains many different random components and data collection resources are limited, focus on the random components with higher CVs. 8
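As a concrete sketch of the CV comparison above, the CV can be estimated from collected data as the sample standard deviation divided by the sample mean. The data values below are made up for illustration: a low-variability machine process vs. a high-variability repair process.

```python
import statistics

def coefficient_of_variation(data):
    """Estimate CV(X) = sigma/mu as sample standard deviation / sample mean."""
    return statistics.stdev(data) / statistics.mean(data)

# Hypothetical data: machine processing times are nearly fixed,
# repair times are highly variable.
machine_times = [9.8, 10.1, 10.0, 9.9, 10.2]
repair_times = [2.0, 15.0, 1.0, 30.0, 5.0]

print(coefficient_of_variation(machine_times))  # small CV
print(coefficient_of_variation(repair_times))   # CV above 1
```

With limited data-collection resources, the component with the larger estimated CV would be the priority.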

Collecting Data  Collect data that corresponds to what will be simulated.  More is better.  Examples Customer/job arrivals.  Collect the number of customer arrivals or collect customer interarrival times? Job processing times.  What does the time between jobs include? 9

Collecting Data  Through observation of the system  From historical data Must be cautious about using data from databases without a complete understanding of what is included in the data Will often require additional processing 10

Collecting Data Through Observation - Example  Simulation of vehicle traffic at an intersection. Arrivals of cars to the intersection – what will you collect? 11

Collecting Data Through Observation - Planning  Create your data collection scenario  Consider What data will be recorded  Once data is collected  12

Input Analysis  Assume you have “raw” data collected for some random component of the system.  Example: Service time data (minutes)  13

Input Analysis  The random component of the system will be represented in the simulation model as observations from a probability distribution.  14

Input Analysis  This process is called “fitting a distribution”. Fitting a distribution - Selecting and justifying the proper distribution to use in the simulation to represent the random system component.  In general there are two choices for the type of distribution fitted. 15

Input Analysis  When to use empirical distributions. 16

Input Analysis  Empirical distributions Generates observations that match the percentage of different observations in the collected data.  Example – Discrete case 17

Input Analysis  Good features of empirical distributions. Simulated distribution represents actual data collected over a specific period of time, which is good for validation purposes, if performance measures for the same period of time are known.  Bad features of empirical distributions. The data collected only reflects observations over the time period in which it was collected, and may not accurately represent what could occur. 18

Input Analysis  Arena Discrete empirical distribution 19 DISC(0.25, 1, 0.5, 1.25, 0.8, 1.75, 1.0, 2)

Input Analysis  Arena Continuous empirical distribution 20 CONT(0.25, 1, 0.5, 1.25, 0.8, 1.75, 1.0, 2)

Fitting Theoretical Distributions  General procedure 1.Guess/hypothesize an appropriate theoretical distribution with specific parameters. 2.Apply a statistical test to the hypothesis that the observed data are observations from the theoretical distribution from step 1. 3.Repeat if necessary. Software available to quickly perform fitting. 21

Fitting Theoretical Distributions  Step 1 – Suggesting a distribution. Use process knowledge/theory. Descriptive statistics. Histograms.  Process knowledge/Theory Inspections, large population, inspection sample size =n. Parts processed between failures. 22

Fitting Theoretical Distributions  Process knowledge/Theory Customer arrivals to a system from a large population. Normal distribution. 23

Fitting Theoretical Distributions  Descriptive statistics Coefficient of variation (for X continuous) – CV(X)  Different theoretical distributions have different values or ranges of possible values. X~Exponential : CV= 1. X~U(a,b) : X~Lognormal : 0 < CV < ∞. 24

Fitting Theoretical Distributions  Descriptive statistics Lexis ratio (for X discrete = Var(X)/E(X)) denoted τ(X)  Different theoretical distributions have different values or ranges of possible values. X~Geometric(p) : τ(X)= 1/p. X~Binomial(n,p) : τ(X)= 1-p. X~Poisson : τ(X)= 1. 25

Fitting Theoretical Distributions  When data has been collected, the CV or lexis ratio can be estimated from the data using the sample variance/standard deviation, and sample mean. 26

Fitting Theoretical Distributions  Skewness – A measure of symmetry. 27

Fitting Theoretical Distributions  Histograms – A graph of the frequency of observations within predefined, non- overlapping intervals.  It is an estimate of the shape of a random variables probability mass function or density function.  Density functions/probability mass functions tend to have recognizable shapes. 28

Fitting Theoretical Distributions 29

Fitting Theoretical Distributions  Making a histogram 1.Break up the range of values covered by the data into k disjoint intervals [b 0,b 1 ), [b 1,b 2 ),…,[b k-1,b k ). All intervals should be the same width Δb except for the first and last intervals (if necessary). 2.For j = 1,2,…,k, let q j = the proportion of the observations in the jth interval defined in 1. 3.Histogram value at x, h(x) = 0 if x < b 0, q j if b j-1 ≤x<b j, and 0 if x ≥ b k. 30

Fitting Theoretical Distributions  Making histograms Histograms may be changed subjectively by changing the intervals and Δb (the interval size). Guidelines (Hines, et al. 2003)  Keep the number of intervals between 5 and 20.  Set the number of intervals to be about the square root of the number of observations. 31

In-class Exercise  For data collected from different random components of a system. Estimate the CV. Construct a histogram. Suggest a theoretical distribution to represent the random component. 32

In-class Exercise 33

In-class Exercise 34

Parameter Estimation  After suggesting a possible theoretical distribution to represent a random system component, the parameters for that distribution must be estimated from the data ( x 1, x 2, …, x n ).  Different distributions may have a different number of parameters.  Estimating parameters from data for some distributions is intuitive. 35

Parameter Estimation  Attention will be restricted to maximum likelihood estimators or MLEs. Good statistical properties (needed as part of a goodness of fit test).  Examples of MLEs X~Exponential(λ)  λ = 1/E[X]  The MLE for λ is 36

Parameter Estimation  Examples of MLEs X~Uniform(a,b)  X~Normal(μ,σ 2 )  37

Parameter Estimation  Not all MLEs for distribution parameters are so intuitive.  Example: X~Lognormal(σ,μ) where σ>0 is called a shape parameter and μ Є (-∞, ∞) is called a scale parameter. 38

Statistical Goodness of Fit Tests  A distribution has been suggested to represent a random component of a system.  The data has been use to compute estimates (MLE) of the distribution parameters.  Statistical goodness of fit tests determine whether there is enough evidence to say that the data is not well represented by the suggested distribution. 39

Statistical Goodness of Fit Tests  Let be the suggested distribution with parameters θ obtained from the data using MLE estimators.  The statistical goodness of fit tests will evaluate the null hypothesis 40

Statistical Goodness of Fit Tests  Three tests will be reviewed. Chi-square goodness of fit test. Kolmogorov-Smirnov test. Anderson-Darling test. 41

Chi-square Goodness of Fit Test  Intuitively think of this test as a formal way to compare a modified histogram to a theoretical distribution. 42

Chi-square Goodness of Fit Test  Test procedure 1.Divide the entire range of the hypothesized distribution into k adjacent intervals – (a 0,a 1 ), [a 1,a 2 ),…,[a k-1,a k ) where a 0 could be -∞, and a k could be + ∞. 43

Chi-square Goodness of Fit Test  Test procedure 2.Tabulate the number of observations falling in each interval – (a 0,a 1 ), [a 1,a 2 ),…,[a k-1,a k ). Let, N j = The number of observations falling in the jth interval [a j-1,a j ). ∑ N j = n : the total number of observations. 3.Calculate the expected number of observations in each interval from the suggested distribution and estimated parameters. Discrete Continuous 44

Chi-square Goodness of Fit Test  Discrete 45

Chi-square Goodness of Fit Test  Continuous 46

Chi-square Goodness of Fit Test  Test procedure 4.Calculate the test statistic TS 47

Chi-square Goodness of Fit Test  If the distribution tested is a good fit, should TS be large or small?  How do we know what large or small is? What do we compare TS to? What is the sampling distribution of the test statistic. 48

Chi-square Goodness of Fit Test  Result – If MLE’s are used to estimate the m parameters of a suggested distribution, then if H 0 is true, as n→∞, the distribution of TS (the sampling distribution) converges to a Chi-square distribution that lies between two Chi- square distributions with k-1 and k-m-1 degrees of freedom.  If represents the value of the Chi-square distribution ( df = degrees of freedom) such that of the area under the Chi-square distribution lies to the right of this value, then for the corresponding value for the distribution of TS 49

Chi-square Goodness of Fit Test 50 Density Function Plots for the Chi-Square Distribution

Chi-square Goodness of Fit Test  To be conservative, reject H 0 if  Needed values from the chi-square distribution can be found using the CHIINV function in Excel. 51

Chi-square Goodness of Fit Test  As with histograms, there are no precise answers for how to create intervals. 52

Example 53

Example 54

In-class Exercise 55

In-class Exercise 56

In-class Exercise 57

Kolmogorov-Smirnov (K-S) Test  Pros Does not require partitioning of the data into intervals. When the parameters of a distribution are known, it is a more powerful test than the chi-square test (less probability of a type II error). 58

Kolmogorov-Smirnov (K-S) Test  Cons Usually the parameters of a distribution are estimated.  Critical value (those values to compare to a test statistic) have been estimated for some hypothesized distributions using Monte Carlo simulation. Normal Exponential Others (see Law and Kelton reference)  Difficult to use in the discrete case (ARENA Input Analyzer will generate no K-S test results). 59

Kolmogorov-Smirnov (K-S) Test  The parameters known case is used when the distribution parameters are estimated, and estimated critical values are not available. This is conservative – The type I error is smaller than specified. 60

Kolmogorov-Smirnov (K-S) Test  The K-S test uses the empirical probability distribution function generated from the data as a basis for computing a test statistic. Suppose there are n data observations collected, x 1,…,x n. The empirical distribution function F n (x) is This is a right continuous step function. 61

Example 62

Kolmogorov-Smirnov (K-S) Test  Intuitive description of the K-S test – Let be the distribution function of the suggested distribution. Overlay on F n (x) and find the maximum absolute difference.  K-S test statistic D n: 63

Kolmogorov-Smirnov (K-S) Test 64

Kolmogorov-Smirnov (K-S) Test  Test – Reject if 65

In-class Exercise 68 Conduct a K-S test to test the hypothesis that the data are observations from a U(0,10) distribution.

Anderson-Darling Test  Think of as a weighted K-S test.  More weight given to differences when the hypothesized probabilities are low – similar to percentage differences vs. absolute differences 69

Anderson-Darling Test  Critical values have been estimated for some hypothesized distributions 70

Goodness of Fit Tests  If a p-value is computed Low p-values (< 0.05) indicate a poor fit. 71

Assessing Independence  It has been assumed that all observations of data for some random system component are independent.  The chi-square, K-S, and A-D tests are testing for independent identically distributed random variables.  Most discrete event simulation models and software assume random components are independent. 72

Assessing Independence  When presenting Monte Carlo simulation we talked about correlation. Two or more random components of a simulation (e.g., engineering economic calculation) may be correlated.  E.g., Repair costs in year one and year two.  There was no simulated time component in these simulations.  In discrete event simulation correlation may occur, Between different random components. Within the same random component. 73

Assessing Independence  Examples Different random components.  Within the same random component.  74

Recognizing Non-independence  Method 1 – Correlation estimates and hypothesis testing. E.g., Rank correlation estimates.  Method 2 – Correlation plot.  Method 3 – Scatter diagram. 75

Recognizing Non-independence  Let x 1, x 2,…, x n be collected data listed in the time order of collection.  Sample correlation  As j approaches n this estimate is very poor since the number of values used for the estimate is small. 76

Recognizing Non-independence  Scatter plot. Plot of ( x i, x i+1 ) or ( x i,y i ). Visual subjective interpretation. 77

Recognizing Non-independence  Correlation plot – Applicable when examining correlation within a single random component. Plot C j vs. j. 78

Example  Wait in queue. Correlated? Single server system – Utilization =

Example – Correlation Estimates 80

Example – Scatter Diagram 81

Example Scatter Diagram – Zero Correlation 82

Recognizing Non-independence  If you determine that correlations exist, what do you do? 84

What to do with Minimal Information?  Suppose you cannot collect data or that none is available.  Suggested no data distributions. 85

What to do with Minimal Information?  Getting information about distribution parameters. 86

Non-stationary Arrival Processes  A very common feature of many systems (in particular service systems) is that the arrival rate (e.g., customers) varies as a function of time. The arrival process is called non-stationary. 87

Non-stationary Arrival Processes  A practical method to approximate a non- stationary arrival process is to use a “piecewise constant” rate function.  Over predefined time intervals assume a constant (and different) arrival rate. 88

Non-stationary Arrival Processes  Example – A very common arrival process used in simulation is the Poisson process. 89

Non-stationary Arrival Processes  Implementation of a non-stationary Poisson process. Common but incorrect approach.  Have the mean interarrival time (or interarrival rates) be represented with a variable. Change the value of this variable for each interval. 90

Non-stationary Arrival Processes  What’s wrong with this approach? 91

Non-stationary Arrival Processes  One correct approach for implementing a non- stationary Poisson Process is called thinning. 92

Non-stationary Arrival Processes 93

Non-stationary Arrival Processes  Arena implements a correct non- stationary Poisson arrival process. 94

Non-stationary Arrival Processes 95