Barry L. Nelson Northwestern University

Presentation transcript:

…adapted from… The MORE Plot: Displaying Measures Of Risk & Error from Simulation Output. Barry L. Nelson, Northwestern University

The risk myth: No one will understand
The answer goes something like this: Remember statistics class? I didn’t think so. You can argue about whether statistics is just hard or whether the way we teach it makes it hard, but either way the main messages get lost. [READ DILBERT…] For simulation, I think the most important message that gets lost is the difference between risk and error. But I can make you an expert in 5 minutes, then show you why it is so important.

Risk for the masses
[Plot labels: Likely, Unlikely; 273, 199, 378]
Suppose we have run a simulation and one of the outputs is the number of barrels, in thousands, of a particular chemical that we need annually. This number depends on a complex host of things: demand for our product, yield loss, etc. We might be interested in how much to stock or in whether we should pay for an option to get more at a fixed price later in the year. We get a histogram because we simulated yearly need for this chemical, and simulated many years, just like we simulated many games of Jai Alai.
Just like in Jai Alai, there are at least two questions: How many barrels should we expect to use, and have we done enough simulation to really answer that question? Humans love to average, so let’s drop in the sample average. Clearly we could need much more or much less than the average, so let’s also mark off a big chunk of the possible need and label it in an easy-to-understand way. Right away we get what I believe is the important insight: that the future is pretty variable and our needs can fall within a wide range. In baseball, a player’s batting average last year is a meaningful historical statistic. But a simulation is not trying to create history; most often it is trying to say something about what will happen in the future and whether we can live with that, and the average doesn’t tell us.
But have we done enough simulation to be confident in making any decision yet? As a final embellishment, let’s put a measure of error on each of those arrowheads. These say that we are highly confident the arrowhead belongs SOMEWHERE in each interval; we just aren’t sure where. Do you think we have done enough simulation to make a decision yet? So just like in the Jai Alai simulation, let’s do some more runs.

Risk for the masses
[Plot labels: Likely, Unlikely; 273, 199, 378; 5th percentile of the observed data; 95th percentile of the observed data]

Risk for the masses
[Plot labels: Likely, Unlikely; 273, 199, 378; 5th percentile of the observed data; 95th percentile of the observed data; Confidence interval (95%) for the 5th percentile; Confidence interval (95%) for the 95th percentile]
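The numbers behind a plot like this can be sketched in a few lines. The following is a minimal illustration, not the author's implementation: the gamma-distributed "annual need" data, the function name `more_summary`, and the normal-approximation interval for the percentiles are all assumptions made for the example.

```python
import math
import random
import statistics


def more_summary(data, lo=0.05, hi=0.95, conf=0.95):
    """Mean, a 'likely' range, and a confidence interval on each (a sketch)."""
    x = sorted(data)
    n = len(x)
    z = statistics.NormalDist().inv_cdf(0.5 + conf / 2)

    def order_stat(p):
        # p-th sample percentile read off as an order statistic (1-based, clamped)
        k = min(n, max(1, round(n * p)))
        return x[k - 1]

    def percentile_ci(p):
        # Assumed method: normal approximation on the probability scale,
        # then look up the observed percentiles at the two endpoints.
        half = z * math.sqrt(p * (1 - p) / (n - 1))
        return order_stat(p - half), order_stat(p + half)

    mean = statistics.fmean(x)
    half_mean = z * statistics.stdev(x) / math.sqrt(n)
    return {
        "mean": mean,
        "mean_ci": (mean - half_mean, mean + half_mean),  # measure of error
        "likely": (order_stat(lo), order_stat(hi)),       # measure of risk
        "lo_ci": percentile_ci(lo),                       # measure of error
        "hi_ci": percentile_ci(hi),                       # measure of error
    }


# Hypothetical simulation output: annual need in thousands of barrels.
rng = random.Random(1)
need = [rng.gammavariate(25, 11) for _ in range(500)]
summary = more_summary(need)
for name, value in summary.items():
    print(name, value)
```

The "likely" range is the measure of risk; the three intervals are measures of error that indicate whether enough simulation has been done.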

Nelson’s Method: What does this really produce?
Build the confidence interval for the b-th percentile using the OBSERVED b1-th (lower endpoint) and b2-th (upper endpoint) percentiles.

Taking a Look
n        lower     sample    upper     sample
100      0.0071    1         0.0929    9
500      0.0309    15        0.0691    35
1000     0.0365    36        0.0635    64
2000     0.0404    81        0.0596    119
10000    0.0457    457       0.0543    543
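The slide does not state the formula behind these numbers. They are consistent with b = 0.05 and a normal approximation b ± z·sqrt(b(1 − b)/(n − 1)) at 95% confidence, with "sample" being n times each endpoint rounded to the nearest position in the sorted data. The sketch below rests on that inference; the function name is hypothetical.

```python
import math
import statistics


def percentile_ci_positions(n, b=0.05, conf=0.95):
    """Endpoints (b1, b2) on the probability scale and their sorted-sample positions.

    Assumed formula, inferred from the table rather than stated in the slides.
    """
    z = statistics.NormalDist().inv_cdf(0.5 + conf / 2)
    half = z * math.sqrt(b * (1 - b) / (n - 1))
    lower, upper = b - half, b + half
    # Clamp so the positions always refer to real observations.
    return lower, max(1, round(n * lower)), upper, min(n, round(n * upper))


for n in (100, 500, 1000, 2000, 10000):
    lower, k_lo, upper, k_hi = percentile_ci_positions(n)
    print(f"{n:>6} {lower:.4f} {k_lo:>4} {upper:.4f} {k_hi:>4}")
```

Under these assumptions, with n = 1000 the interval for the 5th percentile runs from the 36th to the 64th smallest observation.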

As we simulate more…
[Plot labels: 190, 280, 380; annotations: “Don’t move much”, “Interval widths shrink”]
Here is the result we get if we run the simulation for many more years. Notice that uncertainty about the future does not disappear; you can’t simulate away risk. But we do improve our estimate of future uncertainty by running the simulation longer. With this information we can balance the various costs associated with the decision and do something rational. “Use MOE to get MOR” means use measures of error to get measures of risk. The big box is a measure of future risk, and that is probably what you need to support your decision. The little intervals are measures of error; they tell us if we have done enough simulation. BTW, nothing beyond Stat 100 was used to build this plot.
Now my research colleagues in the audience are very nervous. They have questions like: What did you assume about the data? What if I am only interested in upside risk? What if I want to change the definition of “likely”? All valid questions that entirely miss the point: If a plot like this were our default, our starting point in displaying simulation output, then it would encourage people to consider risk and help them decide when they have a good estimate of it.
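A small experiment can illustrate the two annotations. This is a sketch under assumptions: the gamma-distributed data stand in for real simulation output, and the percentile interval uses a normal approximation that the slides do not spell out.

```python
import math
import random
import statistics


def estimate_and_width(data, b=0.95, conf=0.95):
    """95th-percentile estimate and the width of its confidence interval."""
    x = sorted(data)
    n = len(x)
    z = statistics.NormalDist().inv_cdf(0.5 + conf / 2)
    half = z * math.sqrt(b * (1 - b) / (n - 1))  # assumed approximation

    def order_stat(p):
        return x[min(n, max(1, round(n * p))) - 1]

    return order_stat(b), order_stat(b + half) - order_stat(b - half)


rng = random.Random(7)
results = {}
for n in (100, 1000, 10000):
    # Hypothetical output: annual need in thousands of barrels.
    need = [rng.gammavariate(25, 11) for _ in range(n)]
    results[n] = estimate_and_width(need)
    print(f"n={n:>5}  95th percentile ~ {results[n][0]:.0f}  CI width ~ {results[n][1]:.1f}")
```

The percentile estimate stays in the same neighborhood as n grows, because the risk is a property of the system. Only the interval width, the measure of error, shrinks.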

What happens when you simulate more?
Let’s do a little bit more practice before looking at an example. Here are results from a simulation of an order fulfillment system looking at the time from order receipt to delivery, which I have called “cycle time.” We want to decide how long to promise when we take orders on our web site so that we have very little chance of being late. And just like in the eye doctor’s office, I will ask you if each chart is better or worse. The first plot is the initial simulation we ran. Have we run long enough yet? What did you look at to decide a promise date? How about now? Now? What would you promise?
I hope I have convinced you that the idea that no one can understand risk, and how it relates to how long we ran the simulation, is a myth. But I haven’t answered the original question: How much risk could an IE miss if an IE did miss risk?

What sells?
Here is one thing I learned about the magazine business: Who is on the cover matters for some titles. If, for instance, you get this cover [Aniston], then the magazines fly out of the pockets. But this cover [Fowler], not so much. And because it is the same cover at all stores, pretty much the same thing happens at all stores for these kinds of titles. Thus there can be big system-wide swings in sales. Where does that get reflected in the long-run average profit?
Here are simulation results for two titles with the same weekly demand distribution, except that one of them does not have this common cover effect while the other does. You are now all experts at looking at risk, but just to help a bit more I’ll drop in a highlight at 0 profit. Both titles would end up with the same weekly stocking quantity, as they should, since their long-run average profits store by store are the same. But there is a lot more cash flow risk when there is a common cover effect. If I am unprepared for these big swings, if they are unexpected, then a few bad weeks might cause me to quickly abandon my “optimal” stocking policy thinking it must be wrong. And that could be a big mistake, particularly if I use an ad hoc fix that results in an unknown loss of potential profit over the long run. That is a perfectly good simulation gone bad because we did not also measure risk.

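The common cover effect can be mimicked with a toy model. Everything here is hypothetical (store count, demand parameters, prices, and the way the shared shock is mixed in); it is not the simulation from the talk. The construction keeps each store's weekly demand distribution the same in both cases and changes only how strongly stores move together.

```python
import random
import statistics


def weekly_total_profit(rng, common_share, stores=50, mean=100.0, sd=20.0,
                        stock=110, price=5.0, cost=2.0):
    """Total profit across stores for one week (unsold copies are worthless)."""
    cover = rng.gauss(0, 1)  # cover appeal, shared by every store this week
    total = 0.0
    for _ in range(stores):
        local = rng.gauss(0, 1)  # store-specific noise
        # Mixing weights keep the per-store variance at 1 for any common_share.
        shock = common_share ** 0.5 * cover + (1 - common_share) ** 0.5 * local
        demand = max(0.0, mean + sd * shock)
        total += price * min(demand, stock) - cost * stock
    return total


def simulate(common_share, weeks=2000, seed=11):
    rng = random.Random(seed)
    return [weekly_total_profit(rng, common_share) for _ in range(weeks)]


independent = simulate(0.0)  # no common cover effect
common = simulate(0.5)       # half of the demand variance comes from the cover
for name, profits in (("no cover effect", independent), ("common cover effect", common)):
    print(f"{name:>20}: mean {statistics.fmean(profits):.0f}, "
          f"std dev {statistics.stdev(profits):.0f}")
```

In this sketch the two average weekly profits are close, so an average-based stocking rule would treat the titles alike, while the week-to-week spread of total profit is several times larger with the shared shock.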

What is our estimate of the likelihood that the next observation falls within the endpoints? (An interval used this way is called a prediction interval.) It may be worth noting that SLAM used to display the entire empirical CDF. @Risk has a default display somewhat like this, except with no CIs on the mean and percentiles.
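One way to probe that question empirically is to generate fresh observations and count how many land between the observed 5th and 95th percentiles. The sketch below assumes gamma-distributed output and a simple order-statistic percentile rule; both are illustrative choices.

```python
import random


def percentile(sorted_x, p):
    """p-th sample percentile as an order statistic (1-based position, clamped)."""
    n = len(sorted_x)
    return sorted_x[min(n, max(1, round(n * p))) - 1]


rng = random.Random(3)
# Hypothetical simulation output used to set the endpoints.
observed = sorted(rng.gammavariate(25, 11) for _ in range(2000))
low, high = percentile(observed, 0.05), percentile(observed, 0.95)

# Fresh runs stand in for "the next observation".
fresh = [rng.gammavariate(25, 11) for _ in range(20000)]
coverage = sum(low <= value <= high for value in fresh) / len(fresh)
print(f"[{low:.0f}, {high:.0f}] covers {coverage:.3f} of fresh observations")
```

The coverage should be near 90% but not exactly 90%, because the endpoints are themselves estimates.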

CONCLUSIONS
MORE displays the difference between a CI on the mean and a prediction interval (for subsequent observations).
MORE shows the effects of simulation sample size on predictors.
Precise probabilistic statements about the values calculated are elusive.