1
Lecture 6 Data Collection and Parameter Estimation
2
Input Modeling
In real-world simulation applications, determining appropriate distributions for input data is a major task from the standpoint of time and resource requirements. Faulty models of the inputs will lead to outputs whose interpretation may give rise to misleading recommendations.
Steps to develop a useful model for input data:
1. Collect data from the real system of interest
2. Identify a probability distribution to represent the input process
3
Input Modeling (cont’)
3. Choose parameters that determine a specific instance of the distribution family
4. Evaluate the chosen distribution and the associated parameters for goodness-of-fit
4
Data Collection
- Plan your data collection process
- Always look for ways to collect data efficiently and accurately (equipment, barcoding, receipts, personnel, video, etc.)
- Collect only data that is useful for your project
5
Identifying the Distribution
HISTOGRAMS
1. Divide the range of the data into intervals
2. Label the horizontal axis to conform to the intervals selected
3. Determine the frequency of occurrences within each interval
4. Label the vertical axis so that the total occurrences can be plotted for each interval
5. Plot the frequencies on the vertical axis
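The interval and frequency steps of the histogram procedure can be sketched in Python as follows. This is a minimal illustration, assuming equal-width intervals; the function name `histogram` and the sample data are hypothetical and not part of the lecture.

```python
def histogram(data, num_intervals):
    """Split the data range into equal-width intervals and count occurrences."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all observations are identical; no range to divide")
    width = (hi - lo) / num_intervals
    counts = [0] * num_intervals
    for x in data:
        # The maximum value is placed in the last interval.
        idx = min(int((x - lo) / width), num_intervals - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_intervals + 1)]
    return edges, counts


# Hypothetical sample, used only to show the output shape.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram(sample, 4)

# Text rendering: one row per interval, one '*' per observation.
for left, right, c in zip(edges, edges[1:], counts):
    print(f"[{left:4.1f}, {right:4.1f})  {'*' * c}")
```

The number of intervals strongly affects the shape of the plot, so it is usually worth trying a few values before settling on a distribution family.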
6
Identifying the Distribution (cont’)
SELECTING THE FAMILY OF DISTRIBUTIONS
- Check whether the histogram drawn from your data resembles a known statistical distribution
- Use the physical basis of the process (e.g. its usage, whether it is discrete or continuous) as a guide
- Use software
- The exponential, normal, and Poisson distributions are frequently encountered and are not difficult to analyze from a computational standpoint
7
Identifying the Distribution (cont’)
QUANTILE-QUANTILE PLOTS
- Evaluate the fit of the chosen distribution(s)
- Compare the observed values with the values derived from the chosen distribution
- The closer the plotted points lie to a straight line, the better the fit

Sample data (20 observations):
 99.79   99.56  100.17  100.33
100.26  100.41   99.98   99.83
100.23  100.27  100.02  100.47
 99.55   99.62   99.65   99.82
 99.96   99.90  100.06   99.85
8
Identifying the Distribution (cont’)

Observed values (sorted in increasing order)
 j   Value     j   Value     j   Value     j   Value
 1   99.55     6   99.82    11   99.98    16  100.26
 2   99.56     7   99.83    12  100.02    17  100.27
 3   99.62     8   99.85    13  100.06    18  100.33
 4   99.65     9   99.90    14  100.17    19  100.41
 5   99.79    10   99.96    15  100.23    20  100.47

Estimated values (from the chosen distribution)
 j   Value     j   Value     j   Value     j   Value
 1   99.43     6   99.82    11  100.01    16  100.20
 2   99.58     7   99.86    12  100.04    17  100.25
 3   99.66     8   99.90    13  100.08    18  100.32
 4   99.73     9   99.94    14  100.12    19  100.40
 5   99.78    10   99.97    15  100.16    20  100.55
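The estimated values in the table can be approximately reproduced with the short Python sketch below. It rests on two assumptions that the slides do not state: the chosen distribution is a normal fitted with the sample mean and sample standard deviation, and the j-th estimated value is the quantile at probability (j - 0.5)/n. The function name `normal_qq_points` is hypothetical.

```python
from statistics import NormalDist, mean, stdev

# The 20 observations from the quantile-quantile slide.
observed = [
    99.79, 99.56, 100.17, 100.33,
    100.26, 100.41, 99.98, 99.83,
    100.23, 100.27, 100.02, 100.47,
    99.55, 99.62, 99.65, 99.82,
    99.96, 99.90, 100.06, 99.85,
]


def normal_qq_points(data):
    """Pair each ordered observation with the matching fitted normal quantile."""
    n = len(data)
    fitted = NormalDist(mu=mean(data), sigma=stdev(data))
    ordered = sorted(data)
    # Assumed plotting position: (j - 0.5) / n for j = 1..n.
    estimated = [fitted.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(ordered, estimated))


points = normal_qq_points(observed)
for j, (y, q) in enumerate(points, start=1):
    print(f"{j:2d}  observed {y:7.2f}  estimated {q:7.2f}")
```

Plotting the observed value against the estimated value for each j gives the q-q plot; small differences from the table in the second decimal place come from rounding of the fitted parameters.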
9
Parameter Estimation
Sample Mean and Sample Variance
- Calculate the sample mean ($\bar{X}$) and sample variance ($S^2$) from the collected data
- Based on the distribution chosen, convert the sample mean and variance into the parameter(s) used by that distribution

Distribution   Parameter(s)        Suggested Estimator(s)
Poisson        $\alpha$            $\hat{\alpha} = \bar{X}$
Exponential    $\lambda$           $\hat{\lambda} = 1/\bar{X}$
Normal         $\mu$, $\sigma^2$   $\hat{\mu} = \bar{X}$, $\hat{\sigma}^2 = S^2$
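The conversion from sample statistics to distribution parameters can be sketched in Python as below. The function name `suggested_estimators`, the dictionary keys, and the sample data are illustrative assumptions; the estimators themselves are the standard ones for these three families (sample mean for the Poisson mean, reciprocal of the sample mean for the exponential rate, sample mean and sample variance for the normal).

```python
from statistics import mean, variance


def suggested_estimators(data, family):
    """Return parameter estimates for the chosen family from sample statistics."""
    x_bar = mean(data)
    if family == "poisson":
        return {"alpha": x_bar}
    if family == "exponential":
        return {"lambda": 1.0 / x_bar}
    if family == "normal":
        # statistics.variance is the sample variance (divides by n - 1).
        return {"mu": x_bar, "sigma2": variance(data)}
    raise ValueError(f"unsupported family: {family}")


# Hypothetical observations, used only to show the calls.
sample = [2, 4, 6]
print(suggested_estimators(sample, "poisson"))
print(suggested_estimators(sample, "exponential"))
print(suggested_estimators(sample, "normal"))
```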
10
Goodness-of-Fit Tests
- Provide helpful (quantitative) guidance for evaluating the suitability of a potential input model
- Are best suited to data sets with large sample sizes
- Use tabulated critical values to decide whether to accept or reject the hypothesized distribution
11
Goodness-of-Fit Tests (cont’)
Chi-Square Test
- This test is applied to test the hypothesis that a random sample of size n of the random variable X follows a specific distributional form
- The test is valid for large sample sizes, for both discrete and continuous distributional assumptions
- The test statistic, computed over the k class intervals, is
  $$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
  where $O_i$ is the observed frequency in the $i$th class interval and $E_i$ is the expected frequency in that class interval
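As a minimal sketch, the chi-square statistic (the sum over class intervals of (O_i - E_i)^2 / E_i) can be computed in Python as follows. The function name `chi_square_statistic` and the frequencies are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def chi_square_statistic(observed, expected):
    """Sum (O_i - E_i)^2 / E_i over all class intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies for three class intervals.
observed_freq = [10, 20, 30]
expected_freq = [20, 20, 20]
chi2 = chi_square_statistic(observed_freq, expected_freq)
print(f"chi-square statistic = {chi2:.2f}")
```

The statistic is then compared with a tabulated critical value; a larger statistic means a poorer fit.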
12
Goodness-of-Fit Tests (cont’)
Example 9.13 (Poisson Assumption)
$H_0$: the random variable is Poisson distributed
$H_1$: the random variable is not Poisson distributed
For $\hat{\alpha} = 3.64$, the probabilities associated with the various values of x, and the expected frequencies $E(x) = n\,p(x)$ with n = 100, are:

  x    P(x)    E(x)
  0   0.026    2.6
  1   0.096    9.6
  2   0.174   17.4
  3   0.211   21.1
  4   0.192   19.2
  5   0.140   14.0
  6   0.085    8.5
  7   0.044    4.4
  8   0.020    2.0
  9   0.008    0.8
 10   0.003    0.3
 11   0.001    0.1

$H_0$ is rejected at the 0.05 level of significance.
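The probabilities and expected frequencies in this example can be checked with a few lines of Python. This is a sketch assuming the standard Poisson probability mass function and n = 100 observations (the value implied by the expected frequencies); the names `poisson_pmf`, `alpha_hat`, and `n` are illustrative.

```python
import math


def poisson_pmf(x, alpha):
    """Poisson probability of exactly x events when the mean is alpha."""
    return math.exp(-alpha) * alpha ** x / math.factorial(x)


alpha_hat = 3.64  # estimated mean used in the example
n = 100           # assumed sample size, implied by E(x) = n * p(x)

probs = [poisson_pmf(x, alpha_hat) for x in range(12)]
expected = [n * p for p in probs]

for x, (p, e) in enumerate(zip(probs, expected)):
    print(f"x = {x:2d}  P(x) = {p:.3f}  E(x) = {e:5.1f}")
```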
13
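Goodness-of-Fit Tests (cont’)
Classes with small expected frequencies are combined (x = 0 and 1; x = 7 through 11) before the statistic is computed.

 x_i    Observed frequency, O_i    Expected frequency, E_i    (O_i - E_i)^2 / E_i
  0     12  } 22                    2.6  } 12.2                7.87
  1     10  }                       9.6  }
  2     19                         17.4                        0.15
  3     17                         21.1                        0.80
  4     10                         19.2                        4.41
  5      8                         14.0                        2.57
  6      7                          8.5                        0.26
  7      5  } 17                    4.4  }  7.6               11.62
  8      5  }                       2.0  }
  9      3  }                       0.8  }
 10      3  }                       0.3  }
 11      1  }                       0.1  }
Total  100                        100.0                       27.68

With 7 classes after combining and one parameter estimated from the data, the test has 7 - 1 - 1 = 5 degrees of freedom. Since $\chi_0^2 = 27.68$ exceeds the tabulated critical value $\chi^2_{0.05,5} = 11.1$, $H_0$ is rejected.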
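The combining of classes and the resulting statistic can be reproduced with the Python sketch below. The pooling rule (merge adjacent classes left to right until the expected frequency is at least 5, then fold any leftover tail into the last class) is an assumption; it happens to give the same grouping as the table, but the slides do not say which rule was used. The critical value 11.07 is the standard tabulated chi-square value for 5 degrees of freedom at the 0.05 level, assuming one parameter was estimated from the data.

```python
def pool_classes(observed, expected, min_expected=5.0):
    """Merge adjacent classes until each pooled expected frequency >= min_expected."""
    pooled = []
    o_acc, e_acc = 0, 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            pooled.append((o_acc, e_acc))
            o_acc, e_acc = 0, 0.0
    if o_acc > 0 or e_acc > 0:
        # Leftover tail classes are folded into the last pooled class.
        if pooled:
            last_o, last_e = pooled.pop()
            pooled.append((last_o + o_acc, last_e + e_acc))
        else:
            pooled.append((o_acc, e_acc))
    return pooled


# Frequencies for x = 0..11, taken from the table.
observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
expected = [2.6, 9.6, 17.4, 21.1, 19.2, 14.0, 8.5, 4.4, 2.0, 0.8, 0.3, 0.1]

pooled = pool_classes(observed, expected)
chi2 = sum((o - e) ** 2 / e for o, e in pooled)

critical_value = 11.07  # chi-square table, 5 degrees of freedom, 0.05 level
print(f"classes after pooling: {len(pooled)}")
print(f"chi-square statistic: {chi2:.2f}")
print("reject H0" if chi2 > critical_value else "fail to reject H0")
```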
14
Goodness-of-Fit Tests (cont’)
Example 9.14 (Exponential Assumption)
$H_0$: the random variable is exponentially distributed
$H_1$: the random variable is not exponentially distributed
Let k = 8; then each interval will have probability p = 0.125
15
Goodness-of-Fit Tests (cont’)

Class interval       O_i   Probability, p_i                          E_i    (O_i - E_i)^2 / E_i
[0, 1.590)           19    P(X ≤ 1.590) - P(X ≤ 0) = 0.125           6.25   26.01
[1.590, 3.425)       10    P(X ≤ 3.425) - P(X ≤ 1.590) = 0.125       6.25    2.25
[3.425, 5.595)        3    P(X ≤ 5.595) - P(X ≤ 3.425) = 0.125       6.25    1.69
[5.595, 8.252)        6    P(X ≤ 8.252) - P(X ≤ 5.595) = 0.125       6.25    0.01
[8.252, 11.677)       1    P(X ≤ 11.677) - P(X ≤ 8.252) = 0.125      6.25    4.41
[11.677, 16.503)      1    P(X ≤ 16.503) - P(X ≤ 11.677) = 0.125     6.25    4.41
[16.503, 24.755)      4    P(X ≤ 24.755) - P(X ≤ 16.503) = 0.125     6.25    0.81
[24.755, ∞)           6    P(X < ∞) - P(X ≤ 24.755) = 0.125          6.25    0.01
Total                50    1.000                                     50     39.6

$H_0$ is rejected at the 0.05 level of significance.
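The equal-probability class intervals and the test statistic for this example can be sketched in Python as follows. The rate 0.084 is an assumption: the slides do not state it, and it is used here only because it reproduces the interval endpoints in the table to within rounding. The endpoints come from inverting the exponential CDF, a_i = -ln(1 - i*p) / rate. The function name `equal_probability_endpoints` is hypothetical.

```python
import math


def equal_probability_endpoints(rate, k):
    """Endpoints a_0..a_k that split an exponential(rate) into k equal-probability classes."""
    p = 1.0 / k
    inner = [-math.log(1.0 - i * p) / rate for i in range(1, k)]
    return [0.0] + inner + [math.inf]


rate = 0.084  # assumed rate; chosen to match the table's endpoints
k = 8
endpoints = equal_probability_endpoints(rate, k)

observed = [19, 10, 3, 6, 1, 1, 4, 6]  # observed frequencies from the table
n = sum(observed)
expected = n / k                        # every class has the same expected count

chi2 = sum((o - expected) ** 2 / expected for o in observed)

for left, right, o in zip(endpoints, endpoints[1:], observed):
    print(f"[{left:7.3f}, {right:7.3f})  O = {o:2d}  E = {expected}")
print(f"chi-square statistic = {chi2:.1f}")
```

Equal-probability classes make every expected frequency the same, so no classes need to be combined as long as n/k is at least 5.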
16
Selecting Input Models without Data
- Engineering data: a product or process has performance ratings provided by the manufacturer (for example, a laser printer can produce 4 pages/minute)
- Expert opinion: talk to people who are experienced with the process or similar processes
- Physical or conventional limitations: most real processes have physical limits on performance (for example, computer data entry cannot be faster than a person can type)
- The nature of the process: use it to select the family of distributions