Learning Simio, Chapter 10: Analyzing Input Data
Outline
Working with various types of data.
Fitting distributions to data.
Summary of common distributions.
Modeling customer arrivals.
Modeling task times.
Sensitivity of results to data.
In this chapter we will discuss input data and its role in your model. We will discuss some common distributions and their appropriate use.
Model Input Data
A model has both structure and input data.
Both the model structure and the input data have a significant impact on the results.
Obtaining good data is often one of the most problematic aspects of a modeling project.
Typical Data Cases
No data exists.
Data exists in the wrong form.
Lots of good data exists.
No data exists
Consider using the Triangular or Pert distribution (minimum, mode, maximum) for activity times.
Hypothesize distributions based on the underlying processes, and make educated guesses for the parameters.
Run experiments to test the sensitivity of the results to the parameters.
Don’t use a mean in place of a distribution.
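As an illustration of turning a (minimum, mode, maximum) estimate into a distribution, the sketch below samples both a Triangular and a Pert with Python's standard `random` module. It assumes the classic Pert form (a Beta rescaled to [minimum, maximum] with the "lambda = 4" shape parameters); Simio's own Pert implementation is not shown here, and the 2/5/12-minute estimates are hypothetical.

```python
import random
import statistics

def sample_triangular(rng, minimum, mode, maximum):
    # random.triangular takes (low, high, mode) in that order.
    return rng.triangular(minimum, maximum, mode)

def sample_pert(rng, minimum, mode, maximum):
    # Classic Pert (assumed form): a Beta rescaled to [minimum, maximum],
    # with shape parameters derived from the mode ("lambda = 4").
    span = maximum - minimum
    alpha = 1 + 4 * (mode - minimum) / span
    beta = 1 + 4 * (maximum - mode) / span
    return minimum + span * rng.betavariate(alpha, beta)

rng = random.Random(42)
lo, ml, hi = 2.0, 5.0, 12.0   # hypothetical expert estimates, in minutes
tri = [sample_triangular(rng, lo, ml, hi) for _ in range(100_000)]
pert = [sample_pert(rng, lo, ml, hi) for _ in range(100_000)]

# Theoretical means: Triangular (min + mode + max) / 3, Pert (min + 4*mode + max) / 6.
print(statistics.fmean(tri), (lo + ml + hi) / 3)
print(statistics.fmean(pert), (lo + 4 * ml + hi) / 6)
```

With the same three values, the Pert puts less weight on the long right tail than the Triangular, so its mean sits closer to the mode.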
Data exists in the wrong form
Data observed from a different real-world process than the one being modeled, for example:
Time between failures was recorded, but the failures are actually count based.
Time to repair was recorded, but repairs are resource constrained, so the observed times include waiting for the repair resource.
Data recorded only during a “slow time” or a “busy time.”
Values from multiple processes with no discriminating information (e.g., repair times recorded without noting the type of stoppage).
Use the data that does exist to make intelligent guesses for the required data.
Lots of data exists
If a large amount of data is available, an empirical distribution may be used; however, a theoretical distribution is preferred because it is compact, fast, and easy to change.
If possible, hypothesize a distribution based on the underlying process (combining data and theory).
Use goodness-of-fit software to test the hypothesis and estimate the parameters.
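For reference, one common way to build a continuous empirical distribution is to let the CDF rise linearly between consecutive sorted observations. The sketch below assumes that piecewise-linear construction (other empirical forms exist, including simply resampling the raw values), and the observations are hypothetical.

```python
import random

def make_empirical_sampler(data, rng):
    """Continuous, piecewise-linear empirical distribution: the CDF rises
    linearly between consecutive sorted observations."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    def sample():
        # Spread U(0,1) over the n - 1 gaps between order statistics,
        # then interpolate linearly inside the selected gap.
        p = rng.random() * (n - 1)
        i = min(int(p), n - 2)   # guard against floating-point rounding up to n - 1
        frac = p - i
        return xs[i] + frac * (xs[i + 1] - xs[i])

    return sample

observed = [3.1, 4.7, 2.2, 5.9, 3.8, 4.1, 6.5, 2.9]   # hypothetical observations
draw = make_empirical_sampler(observed, random.Random(1))
samples = [draw() for _ in range(10_000)]
print(min(samples), max(samples))   # always within [min(observed), max(observed)]
```

Note that this sampler can never produce a value outside the observed range, which is one practical limitation of empirical distributions compared with a fitted theoretical one.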
Data Fitting Procedure
Assess the IID assumptions: the observations should be independent and identically distributed.
Use software to view the data using a histogram.
Hypothesize a distribution family/form.
Use software to estimate the distribution parameters and assess the quality of fit.
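A quick, informal check of the independence assumption is the sample lag-1 autocorrelation of the observations in collection order: values near 0 are consistent with (but do not prove) independence, and as a rough rule of thumb values outside about ±2/√n suggest dependence. The sketch below contrasts an IID series with a deliberately dependent one; both data sets are synthetic.

```python
import random
import statistics

def lag_autocorrelation(xs, k=1):
    """Sample lag-k autocorrelation of a series in its original order."""
    n = len(xs)
    mean = statistics.fmean(xs)
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + k] - mean) for i in range(n - k))
    return cov / var

rng = random.Random(7)
iid = [rng.expovariate(1.0) for _ in range(5_000)]

# A deliberately dependent (AR(1)-style) series, for contrast.
dependent = [0.0]
for _ in range(4_999):
    dependent.append(0.8 * dependent[-1] + rng.gauss(0.0, 1.0))

print(round(lag_autocorrelation(iid), 3))        # close to 0
print(round(lag_autocorrelation(dependent), 3))  # close to 0.8
```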
Sample Data Sets
Suppose that you have a set of observed values of the phenomenon for which you’re developing an input model. One of our “first steps” is to use a frequency histogram to get an idea of the probability mass/density function and the “general shape” of the distribution.
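The frequency histogram itself is simple to compute. The sketch below uses equal-width bins over the data range and prints a text histogram; the data are synthetic, and the choice of 12 bins is arbitrary. In practice it is worth trying several bin counts, because the apparent shape can change with the bin width.

```python
import random

def frequency_histogram(values, bins):
    """Counts of values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; a histogram is not informative")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land in index `bins`; fold it into the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    edges = [lo + j * width for j in range(bins + 1)]
    return edges, counts

rng = random.Random(9)
data = [rng.expovariate(1 / 4.0) for _ in range(500)]   # hypothetical observations
edges, counts = frequency_histogram(data, bins=12)
for j, c in enumerate(counts):
    print(f"[{edges[j]:6.2f}, {edges[j + 1]:6.2f})  {'*' * (c // 5)} {c}")
```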
Common Distributions
Binomial – Models the number of successes in n trials, when the trials are independent with common success probability, p; for example, the number of defective computer chips found in a lot of n chips.
Negative Binomial – Models the number of trials required to achieve k successes; for example, the number of computer chips that we must inspect to find 4 defective chips.
Poisson – Models the number of independent events that occur in a fixed amount of time or space; for example, the number of customers that arrive to a store during 1 hour, or the number of defects found in 30 square meters of sheet metal.
Normal – Models the distribution of a process that can be thought of as the sum of a number of component processes; for example, a time to assemble a product that is the sum of the times required for each assembly operation.
Lognormal – Models the distribution of a process that can be thought of as the product of a number of component processes; for example, the rate of return on an investment, when interest is compounded, is the product of the returns for a number of periods.
Banks et al., pp
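To make the Binomial and Negative Binomial definitions concrete, the sketch below simulates the chip-inspection examples directly from their definitions and compares the sample means with the known means n·p and k/p. The defect probability of 0.05 and the lot size of 100 are hypothetical.

```python
import random
import statistics

def defectives_in_lot(rng, n, p):
    """Binomial(n, p): number of defective chips in a lot of n chips."""
    return sum(1 for _ in range(n) if rng.random() < p)

def chips_until_k_defective(rng, k, p):
    """Negative Binomial (number-of-trials form): chips inspected until the k-th defective."""
    inspected = defective = 0
    while defective < k:
        inspected += 1
        if rng.random() < p:
            defective += 1
    return inspected

rng = random.Random(3)
p = 0.05   # hypothetical probability that a chip is defective
lots = [defectives_in_lot(rng, 100, p) for _ in range(10_000)]
hunts = [chips_until_k_defective(rng, 4, p) for _ in range(10_000)]
print(statistics.fmean(lots))    # theory: n * p = 5
print(statistics.fmean(hunts))   # theory: k / p = 80
```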
Common Distributions
Exponential – Models the time between independent events, or a process time that is memoryless; for example, the times between the arrivals from a large population of potential customers who act independently of each other. The exponential is a highly variable distribution (its standard deviation equals its mean); it is sometimes overused because it often leads to mathematically tractable models. Recall that if the times between events are independent and exponentially distributed, then the number of events in a fixed period of time is Poisson.
Gamma – An extremely flexible distribution used to model nonnegative random variables (it can be shifted away from 0 by adding a constant).
Beta – An extremely flexible distribution used to model bounded random variables. The beta can be shifted away from 0 by adding a constant and can be given a range larger than [0, 1] by multiplying by a constant.
Erlang – Models processes that can be viewed as the sum of several independent, exponentially distributed processes with the same mean; for example, a computer network fails when a computer and two backup computers, brought into service one after another, have all failed, and each has a time to failure (TTF) that is exponentially distributed.
Banks et al., pp
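The Erlang's "sum of exponentials" interpretation can be checked numerically. The sketch below adds k = 3 independent exponential times and compares the result with Python's `random.gammavariate(alpha, beta)`, which uses alpha as the shape and beta as the scale, so an Erlang is the integer-shape case. The 1000-hour mean TTF is hypothetical. The theoretical mean is k times the mean of each stage, and the coefficient of variation is 1/√k.

```python
import random
import statistics

def erlang_as_sum(rng, k, mean_each):
    """Sum of k independent exponential times, each with mean `mean_each`."""
    return sum(rng.expovariate(1.0 / mean_each) for _ in range(k))

rng = random.Random(11)
k, mean_ttf = 3, 1000.0   # hypothetical: 3 units used in sequence, 1000 h mean TTF each
summed = [erlang_as_sum(rng, k, mean_ttf) for _ in range(50_000)]
# Gamma with integer shape k and scale mean_ttf; should match the sum in distribution.
direct = [rng.gammavariate(k, mean_ttf) for _ in range(50_000)]

for name, xs in (("sum of exponentials", summed), ("gammavariate", direct)):
    m = statistics.fmean(xs)
    cv = statistics.stdev(xs) / m
    print(f"{name}: mean={m:.0f} (theory {k * mean_ttf:.0f}), "
          f"CV={cv:.3f} (theory {1 / k ** 0.5:.3f})")
```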
Common Distributions
Weibull – Models the time to failure for components; for example, the time to failure for a disk drive. The exponential is a special case of the Weibull.
Discrete or Continuous Uniform – Models complete uncertainty: all outcomes are equally likely. This distribution is often used inappropriately when there are no data.
Triangular – Models a process for which only the minimum, most likely, and maximum values of the distribution are known; for example, the minimum, most likely, and maximum time required to test a product. This model is a marked improvement over the uniform distribution [in many cases].
Pert – A special case of the Beta with minimum, most likely, and maximum values. The Pert provides a “smooth” alternative to the triangular in the absence of data.
Empirical – Samples from the distribution of the actual data collected; often used when no theoretical distribution seems appropriate.
Banks et al., pp
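The statement that the exponential is a special case of the Weibull can be seen by setting the Weibull shape parameter to 1. The sketch below uses Python's `random.weibullvariate(alpha, beta)`, where alpha is the scale and beta the shape, and compares sample coefficients of variation with the theoretical value √(Γ(1 + 2/β) / Γ(1 + 1/β)² − 1). Shape 1 gives CV = 1 (the exponential), shapes below 1 are more variable, and shapes above 1 are less variable. The 500-hour scale is hypothetical.

```python
import math
import random
import statistics

def weibull_cv(shape):
    """Theoretical coefficient of variation (std dev / mean) of a Weibull."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return math.sqrt(g2 / g1 ** 2 - 1)

rng = random.Random(13)
scale = 500.0   # hypothetical characteristic life, in hours
results = {}
for shape in (0.5, 1.0, 2.0):
    # random.weibullvariate(alpha, beta): alpha = scale, beta = shape.
    xs = [rng.weibullvariate(scale, shape) for _ in range(50_000)]
    results[shape] = statistics.stdev(xs) / statistics.fmean(xs)
    print(f"shape={shape}: sample CV={results[shape]:.3f}, theory={weibull_cv(shape):.3f}")
```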
Goodness-of-fit (GOF) Tests
Statistical hypothesis tests that are used to assess formally whether the observations X1, X2, …, Xn constitute an independent sample from a particular distribution function.
Hypothesis H0: The Xi’s are IID random variables with the specified distribution function.
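One widely used GOF statistic is the Kolmogorov-Smirnov D_n, the largest vertical gap between the empirical CDF and the fitted CDF. The sketch below computes D_n for an exponential whose mean is estimated from the same data (the exponential MLE is the sample mean), once on exponential data and once on uniform data. It only computes the statistic: the standard KS critical values assume fully specified parameters, so when the mean is estimated they are conservative, and adjusted critical values are normally used for that case. Both data sets are synthetic.

```python
import math
import random

def ks_exponential(data):
    """Kolmogorov-Smirnov statistic D_n comparing the data with an exponential
    whose mean is estimated from the same data (the exponential MLE)."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 1.0 - math.exp(-x / mean)            # fitted CDF at the i-th order statistic
        d = max(d, i / n - f, f - (i - 1) / n)   # largest gap above and below the empirical CDF
    return d

rng = random.Random(21)
from_exponential = [rng.expovariate(1.0) for _ in range(500)]
from_uniform = [rng.uniform(0.0, 2.0) for _ in range(500)]
d_exp = ks_exponential(from_exponential)
d_unif = ks_exponential(from_uniform)
print(f"D_n, exponential data: {d_exp:.3f}; uniform data: {d_unif:.3f}")
```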
GOF Test Considerations
Failure to reject the null hypothesis should not be interpreted as “accepting H0 as being true.”
GOF tests are not very powerful for small-to-moderate sample sizes.
Also, when n is large, the tests will often reject H0, since even minute differences will be detected.
Some GOF Software Options
General packages: EasyFit
Simulation-specific packages: Stat::Fit, ExpertFit
Modeling Arrivals
If arrivals occur one at a time, independently and at random with a constant rate, they follow a Poisson process: the number of arrivals in a fixed time is Poisson, and the time between arrivals is exponential.
In some cases the arrival rate varies over time; Simio supports piecewise-constant (stepwise) arrival rates using a Rate Table.
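To show what a time-varying rate means for the arrival stream, the sketch below generates a non-stationary Poisson process whose rate is constant within each period, using thinning: candidate arrivals are drawn at the maximum rate and each is kept with probability rate(t) / max rate. This illustrates the underlying process only; it is not Simio's Rate Table implementation, and the three-period rate table and 8-hour day are hypothetical. The expected number of arrivals in a period is its rate times its length.

```python
import bisect
import random

def nspp_arrivals(rng, rate_table, horizon):
    """Arrival times in [0, horizon) for a non-stationary Poisson process whose
    rate is constant within each period of `rate_table`, generated by thinning.

    rate_table: (start_time, rate) pairs sorted by start_time, first start at 0.
    """
    starts = [start for start, _ in rate_table]
    rates = [rate for _, rate in rate_table]
    lam_max = max(rates)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(lam_max)   # candidate event from a rate-lam_max Poisson process
        if t >= horizon:
            return arrivals
        rate_now = rates[bisect.bisect_right(starts, t) - 1]
        if rng.random() < rate_now / lam_max:   # keep with probability rate(t) / lam_max
            arrivals.append(t)

# Hypothetical 8-hour day: 10/h for hours 0-2, 30/h for hours 2-4, 15/h for hours 4-8.
table = [(0.0, 10.0), (2.0, 30.0), (4.0, 15.0)]
rng = random.Random(8)
days = [nspp_arrivals(rng, table, 8.0) for _ in range(2_000)]
mean_total = sum(len(d) for d in days) / len(days)
mean_peak = sum(sum(1 for t in d if 2.0 <= t < 4.0) for d in days) / len(days)
print(mean_total, mean_peak)   # expected values: 10*2 + 30*2 + 15*4 = 140 and 30*2 = 60
```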
Modeling Task Times
Use a distribution whose range is >= 0 (e.g., not the Normal or JohnsonUB, which can produce negative values).
In the absence of data, the Triangular and Pert are possible choices.
With supporting data, the Gamma, LogNormal, Weibull, LogLogistic, Beta, Pearson5, Pearson6, and JohnsonSB are possible choices.
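As an example of fitting one of the nonnegative candidates, the lognormal has closed-form maximum-likelihood estimates: μ and σ are the mean and the (divide-by-n) standard deviation of ln(x). The sketch below fits those parameters to synthetic task times and then samples from the fitted model; every draw is positive, unlike a Normal. The fitted model's mean is exp(μ + σ²/2).

```python
import math
import random
import statistics

def fit_lognormal_mle(data):
    """MLE for a lognormal: mu and sigma are the mean and population std dev of ln(x)."""
    if any(x <= 0 for x in data):
        raise ValueError("lognormal requires strictly positive observations")
    logs = [math.log(x) for x in data]
    return statistics.fmean(logs), statistics.pstdev(logs)

rng = random.Random(5)
# Synthetic task-time observations (minutes); real data would come from the system.
observed = [rng.lognormvariate(1.5, 0.4) for _ in range(2_000)]
mu, sigma = fit_lognormal_mle(observed)
print(f"mu={mu:.3f}, sigma={sigma:.3f}, fitted mean={math.exp(mu + sigma ** 2 / 2):.2f} min")

# Sample new task times from the fitted model.
fitted_draws = [rng.lognormvariate(mu, sigma) for _ in range(5)]
```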
Figure: Gamma, Lognormal, Weibull
Determining what data is critical
Some data may have a dominant impact on performance.
The variability of an input is often more important than its mean.
Run scenarios specifically designed to determine the sensitivity of the model to the data inputs.
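A small experiment shows why variability can matter more than the mean, and why a mean should not replace a distribution. The sketch below simulates a hypothetical single-server FIFO queue with Poisson arrivals (utilization 0.8) using Lindley's recursion, once with exponential service times and once with a constant service time of the same mean. By the Pollaczek-Khinchine formula, the mean wait in queue is λE[S²] / (2(1 − ρ)), which gives 4.0 for exponential service and 2.0 for constant service here. Replacing the service distribution by its mean roughly halves the predicted wait.

```python
import random

def average_wait(rng, n, mean_interarrival, draw_service):
    """Average time in queue for a FIFO single-server queue with Poisson arrivals,
    via Lindley's recursion: W[next] = max(0, W[current] + S[current] - A[next])."""
    w = total = 0.0
    for _ in range(n):
        total += w                                     # wait of the current customer
        s = draw_service(rng)                          # its service time
        a = rng.expovariate(1.0 / mean_interarrival)   # gap until the next arrival
        w = max(0.0, w + s - a)
    return total / n

n, mean_gap = 200_000, 1.25   # hypothetical: utilization rho = 1.0 / 1.25 = 0.8
exp_wait = average_wait(random.Random(1), n, mean_gap, lambda r: r.expovariate(1.0))
const_wait = average_wait(random.Random(2), n, mean_gap, lambda r: 1.0)
print(f"exponential service: {exp_wait:.2f}   constant service (same mean): {const_wait:.2f}")
```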
References
Leemis, L., “Input Modeling Techniques for Discrete-Event Simulations,” Proceedings of the 2001 Winter Simulation Conference, Washington, DC, December 2001. (Theoretical)
Vincent, S., “Input Data Analysis,” in Handbook of Simulation, edited by J. Banks, John Wiley & Sons, Inc., New York, NY, pp , 1998. (Practical)
Banks et al., Chapter 9, “Input Modeling.”
Law, Chapter 6, “Selecting Input Probability Distributions.”
The Banks and Law chapters focus primarily on “fitting” distributions from “historical” data.
Summary
Distributions are the primary method for capturing variability in the system.
Never use a mean in place of a distribution for a random component.
When data exists, hypothesize a distribution, estimate its parameters, and test the fit using goodness-of-fit software.
In the absence of data, use appropriate distributions:
Arrivals – exponential time between arrivals, or a non-stationary Poisson process.
Activities – Triangular or Pert.
Use the model to determine the critical data elements.