Choosing a Probability Distribution
Water Resource Risk Analysis, Davis, CA, 2009

Probability x Consequence
Quantitative risk assessment requires you to use probability.
Sometimes you will estimate the probability of an event.
Sometimes you will use distributions to:
– Describe data
– Model variability
– Represent our uncertainty
What distribution do you use?

Probability: Language of Random Variables
Constants
Variables
– Some things vary predictably
– Some things vary unpredictably: random variables
A random variable can also be something that is known, just not known by us.

Checklist for Choosing a Distribution from Data
1. Can you use your data?
2. Understand your variable
   a) Source of data
   b) Continuous/discrete
   c) Bounded/unbounded
   d) Meaningful parameters
   e) Univariate/multivariate
   f) 1st or 2nd order
3. Look at your data: plot it
4. Use theory
5. Calculate statistics
6. Use previous experience
7. Distribution fitting
8. Expert opinion
9. Sensitivity analysis

First! Do you have data?
If so, do you need a distribution, or can you just use your data?
The answer depends on the question(s) you’re trying to answer as well as on your data.

Use Data
If your data are representative of the population germane to your problem, use them.
One problem could be bounding the data:
– What are the true minimum and maximum?
Any dataset can be converted into a:
– Cumulative distribution function
– General density function

Fitting an Empirical Distribution to Data
Applies if the data are continuous and reasonably extensive:
– You may have to estimate the minimum and maximum
– Rank the data x(i) in ascending order
– Calculate the percentile for each value
– Use the data and percentiles to create a cumulative distribution function
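The ranking-and-percentile steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any particular risk tool: it assumes the i-th ranked value gets percentile i/(n+1) (one common plotting-position choice), pins the analyst's estimated minimum and maximum to cumulative probabilities 0 and 1, and interpolates linearly between points. All function names are hypothetical.

```python
import bisect
import random


def empirical_cdf_points(data, minimum, maximum):
    """Build (value, cumulative probability) pairs for an empirical CDF.

    Assumption: the i-th ranked value (1-based) gets percentile i / (n + 1);
    `minimum` and `maximum` are estimated bounds pinned to 0 and 1.
    """
    ranked = sorted(data)  # rank data x(i) in ascending order
    n = len(ranked)
    if minimum > ranked[0] or maximum < ranked[-1]:
        raise ValueError("estimated bounds must enclose the data")
    points = [(minimum, 0.0)]
    points += [(x, i / (n + 1)) for i, x in enumerate(ranked, start=1)]
    points.append((maximum, 1.0))
    return points


def empirical_quantile(points, u):
    """Invert the piecewise-linear CDF at probability u, with 0 <= u < 1."""
    probs = [p for _, p in points]
    k = bisect.bisect_right(probs, u)  # first point whose probability exceeds u
    (x0, p0), (x1, p1) = points[k - 1], points[k]
    return x0 + (u - p0) / (p1 - p0) * (x1 - x0)


def sample_empirical(points, size, seed=None):
    """Draw `size` values from the empirical distribution (inverse transform)."""
    rng = random.Random(seed)
    return [empirical_quantile(points, rng.random()) for _ in range(size)]
```

Because sampling interpolates between the ranked points, it never produces values outside the estimated minimum and maximum, which is why bounding the data matters.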

When You Can’t Use Your Data
Given the wide variety of distributions, it is not always easy to select the most appropriate one.
– Results can be very sensitive to the choice of distribution
Using the wrong assumption in a model can produce incorrect results.
Incorrect results can lead to poor decisions.
Poor decisions can lead to undesirable outcomes.

Understand Your Data
What is the source of the data?
– Experiments
– Observation
– Surveys
– Computer databases
– Literature searches
– Simulations
– Test case
The source of the data may affect your decision whether to use it.
Understand your variable

Type of Variable?
Is your variable discrete or continuous? Do not overlook this!
– Discrete distributions: the variable takes one of a set of identifiable values, each of which has a calculable probability of occurrence
– Continuous distributions: the variable can take any value within a defined range
Discrete examples: barges in a tow; houses in a floodplain; people at a meeting; results of a diagnostic test; casualties per year; relocations and acquisitions
Continuous examples: average number of barges per tow; weight of an adult striped bass; sensitivity or specificity of a diagnostic test; transit time; expected annual damages; duration of a storm; shoreline eroded; sediment loads

What Values Are Possible?
Is your variable bounded or unbounded?
– Bounded: the value is confined to lie between two determined values
– Unbounded: the value theoretically extends from minus infinity to plus infinity
– Partially bounded: the value is constrained at one end (truncated distributions)
Use a distribution that matches.
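For a partially bounded variable, one way to respect the bound is to truncate a distribution. The sketch below assumes a normal distribution truncated by inverse-transform sampling with Python's `statistics.NormalDist`; it is one illustrative technique among several, and the function name is hypothetical.

```python
import random
from statistics import NormalDist


def sample_truncated_normal(mu, sigma, lower, upper, size, seed=None):
    """Sample a normal(mu, sigma) truncated to [lower, upper].

    Inverse-transform method: draw u uniformly between F(lower) and F(upper),
    then map it back through the normal inverse CDF.
    Pass float("-inf") or float("inf") for an open end.
    """
    dist = NormalDist(mu, sigma)
    lo, hi = dist.cdf(lower), dist.cdf(upper)
    if not lo < hi:
        raise ValueError("truncation interval has no probability mass")
    rng = random.Random(seed)
    out = []
    while len(out) < size:
        u = lo + (hi - lo) * rng.random()
        if 0.0 < u < 1.0:  # inv_cdf is undefined at exactly 0 or 1
            x = dist.inv_cdf(u)
            out.append(min(max(x, lower), upper))  # guard against round-off
    return out
```

For example, `sample_truncated_normal(5, 4, 0.0, float("inf"), 1000)` would describe a hypothetical load that cannot be negative; truncating the left tail pulls the sample mean above 5.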

Continuous Distribution Examples
– Unbounded: Normal, t, Logistic
– Left bounded: Chi-square, Exponential, Gamma, Lognormal, Weibull
– Bounded: Beta, Cumulative, General/histogram, Pert, Uniform, Triangle

Discrete Distribution Examples
– Unbounded: none
– Left bounded: Poisson, Negative binomial, Geometric
– Bounded: Binomial, Hypergeometric, Discrete, Discrete uniform

Are There Parameters?
Does your variable have parameters that are meaningful?
– Parametric: shape is determined by the mathematics describing a conceptual probability model
  Requires a greater knowledge of the underlying
– Non-parametric: empirical distributions for which the mathematics is defined by the shape required
  Intuitively easy to understand
  Flexible and therefore useful

Choose a Parametric Distribution If:
– Theory supports the choice
– The distribution has proven accurate for modelling your specific variable (even without theory)
– The distribution matches any observed data well
– You need a distribution with a tail extending beyond the observed minimum or maximum

Choose a Non-Parametric Distribution If:
– Theory is lacking
– There is no commonly used model
– Data are severely limited
– Knowledge is limited to general beliefs and some evidence

Parametric and Non-Parametric
– Parametric: Normal, Lognormal, Exponential, Poisson, Binomial, Gamma
– Non-parametric: Uniform, Pert, Triangular, Cumulative

Is It Dependent on Other Variables?
Univariate and multivariate distributions:
– Univariate: describes a single parameter or variable that is not probabilistically linked to any other in the model
– Multivariate: describes several parameters that are probabilistically linked in some way
Engineering relationships are often multivariate.
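When two inputs are probabilistically linked, sampling them independently can misstate the output. The sketch below covers one simple multivariate case only, assuming both variables are normal and linearly correlated; the function names, means, standard deviations and correlation are hypothetical, and other dependence structures need other methods.

```python
import math
import random


def correlated_normals(mu1, sd1, mu2, sd2, rho, size, seed=None):
    """Draw (x, y) pairs from a bivariate normal with correlation rho.

    Standard two-variable construction: z2 = rho*z1 + sqrt(1 - rho^2)*e,
    where z1 and e are independent standard normals.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be in [-1, 1]")
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        z1 = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * e
        pairs.append((mu1 + sd1 * z1, mu2 + sd2 * z2))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)
```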

Do You Know the Parameters?
First- or second-order distribution:
– First order: a probability distribution with precisely known parameters, e.g., N(100, 10)
– Second order: a probability distribution with some uncertainty about its parameters, e.g., N(m, s)
  Example: Risknormal(risktriang(90,100,103),riskuniform(8,11))
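The parameter distributions in the formula above can be mimicked in Python with two nested loops, a common way to keep uncertainty about the parameters (outer loop) separate from variability given those parameters (inner loop). This is a sketch under assumptions, not a reproduction of how the spreadsheet formula is evaluated; the function name and the choice of the 95th percentile as the tracked output are illustrative.

```python
import random
import statistics


def second_order_normal(n_outer, n_inner, seed=None):
    """Two-dimensional Monte Carlo for a normal with uncertain parameters.

    Outer loop: sample the uncertain mean from triangular(90, 100, 103)
    and the uncertain SD from uniform(8, 11).
    Inner loop: sample variability given those parameters.
    Returns one (mean, sd, 95th percentile) tuple per outer iteration.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_outer):
        mean = rng.triangular(90, 103, 100)  # Python's order is (low, high, mode)
        sd = rng.uniform(8, 11)
        draws = [rng.gauss(mean, sd) for _ in range(n_inner)]
        p95 = statistics.quantiles(draws, n=20)[-1]  # last of 19 cut points
        results.append((mean, sd, p95))
    return results
```

The spread of the 95th percentiles across outer iterations shows how much parameter uncertainty widens what can be said about that output; with a first-order N(100, 10) there would be a single value instead.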

Checklist for Choosing a Distribution (continued)
3. Look at your data: plot it
4. Use theory
5. Calculate statistics
6. Use previous experience
7. Distribution fitting
8. Expert opinion
9. Sensitivity analysis

Plot: Old Faithful Eruptions
What do your data look like?
You could calculate the mean and SD and assume the data are normal.
Beware, danger lurks: always plot your data.

Which Distribution?
Examine your plot.
Look for the distinctive shapes of specific distributions:
– Single peaks
– Symmetry
– Positive skew
– Negative values
Gamma, Weibull, and beta are useful and flexible forms.

Theory-Based Choice
Theory is the most compelling reason for a choice.
Formal theory
– Central limit theorem
Theoretical knowledge of the variable
– Behavior
– Math: range
Informal theory
– Sums tend to be normal, products lognormal
– Study-specific knowledge
– Your best documented thoughts on the subject
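The informal rule that products tend to be lognormal follows from the central limit theorem: the log of a product is a sum of logs, and sums tend toward normal. A small simulation can illustrate this, assuming independent uniform(0.5, 1.5) factors (an arbitrary illustrative choice); the function names are hypothetical.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def product_of_factors(n_factors, size, seed=None):
    """Simulate products of independent uniform(0.5, 1.5) factors."""
    rng = random.Random(seed)
    products = []
    for _ in range(size):
        p = 1.0
        for _ in range(n_factors):
            p *= rng.uniform(0.5, 1.5)
        products.append(p)
    return products


products = product_of_factors(10, 20000, seed=0)
logs = [math.log(p) for p in products]  # log of a product = sum of logs
```

The products come out strongly right-skewed, while their logs are close to symmetric, which is the signature of an approximately lognormal variable.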

Calculate Statistics
Summary statistics may provide clues:
– Normal: low coefficient of variation; mean equals median
– Exponential: positive skew; mean equals standard deviation
Consider outliers.
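A sketch of computing these clues with Python's `statistics` module. What counts as a "low" coefficient of variation or a mean "close" to the median is left to judgment; the function name and dictionary keys are hypothetical.

```python
import statistics


def summary_clues(data):
    """Summary statistics that can hint at a distribution's shape.

    Slide heuristics: a normal has a low coefficient of variation and
    mean equal to median; an exponential is positively skewed with mean
    equal to standard deviation.
    """
    mean = statistics.fmean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return {
        "mean": mean,
        "median": median,
        "sd": sd,
        "cv": sd / mean,             # coefficient of variation
        "skewness": m3 / m2 ** 1.5,  # moment-based g1
        "mean_over_sd": mean / sd,   # near 1 suggests exponential-like data
    }
```

For the small illustrative list [2, 4, 4, 4, 5, 5, 7, 9], the mean (5.0) exceeds the median (4.5) and the skewness is about 0.66, both hints of positive skew.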

Outliers
Extreme observations can drastically influence a probability model.
There is no prescriptive method for addressing them.
If an observation is an error, remove it.
If not, what is the data point telling you?
– What about your world-view is inconsistent with this result?
– Should you reconsider your perspective?
– What possible explanations have you not yet considered?

Outliers (cont.)
Your explanation must be correct, not merely plausible.
– Consensus is a poor measure of truth
If you must keep an outlier and can't explain it:
– Use conventional practices and live with the skewed consequences
– Choose methods less sensitive to such extreme observations (Gumbel, Weibull)

Previous Experience
Have you dealt with this issue successfully before?
What did other analyses or risk assessments use?
What does the literature reveal?

Goodness of Fit
Provides statistical evidence to test the hypothesis that your data could have come from a specific distribution.
H0: these data come from an “x” distribution.
A small test statistic and a large p-value mean you cannot reject H0.
It is another piece of evidence, not a determining factor.

GOF Tests
Chi-Square Test
– Most common; works for discrete and continuous data
– Data are divided into a number of cells, each with at least five expected observations
– Usually requires 50 observations or more
Kolmogorov-Smirnov Test
– More suitable for small samples than chi-square
– Better at detecting misfit near the middle of the distribution than in the tails
Anderson-Darling Test
– Weights differences between the theoretical and empirical distributions more heavily at the tails than at the midrange
– Desirable when a better fit at the extreme tails of the distribution is desired
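The chi-square rules above can be sketched as a small function. It assumes the hypothesized cell probabilities have already been computed and returns only the statistic and degrees of freedom; the p-value would come from a chi-square table or, if SciPy is available, `scipy.stats.chi2.sf(stat, df)`. The function name and the example counts are hypothetical.

```python
def chi_square_gof(observed, probabilities, n_estimated_params=0, min_expected=5.0):
    """Chi-square goodness-of-fit statistic for binned counts.

    observed: counts per cell; probabilities: hypothesized cell
    probabilities summing to 1. Returns (statistic, degrees_of_freedom).
    """
    if len(observed) != len(probabilities):
        raise ValueError("need one probability per cell")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probabilities]
    if min(expected) < min_expected:
        raise ValueError("merge cells: each expected count should be at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One df is lost to the total; one more per parameter estimated from data.
    df = len(observed) - 1 - n_estimated_params
    return stat, df
```

For illustrative counts [8, 12, 9, 11, 10, 10] against six equally likely cells, the statistic is 1.0 with 5 degrees of freedom, a small statistic consistent with not rejecting H0.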

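Kolmogorov-Smirnov Statistic
[Figure: empirical CDF of the data (blue) vs. the true/hypothetical CDF (red)]
Find the biggest difference between the two curves; this is the K-S statistic.
The critical value is the largest difference consistent with your:
– n (sample size)
– α (significance level)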
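Finding the biggest difference can be sketched directly. The critical values used here are the standard large-sample approximations c(α)/√n; they assume the hypothesized distribution is fully specified in advance and are too lenient when its parameters were estimated from the same data. The function and constant names are hypothetical.

```python
import math


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D: largest gap between the empirical CDF and `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x: check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Asymptotic coefficients c(alpha); the critical value is c(alpha) / sqrt(n).
KS_COEFFICIENTS = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def ks_critical_value(n, alpha=0.05):
    return KS_COEFFICIENTS[alpha] / math.sqrt(n)
```

To test against a hypothesized normal, `statistics.NormalDist(100, 10).cdf` could be passed as `cdf`; H0 is rejected at level α if D exceeds `ks_critical_value(n, alpha)`.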

No Data Available
Modelers must resort to judgment.
Knowledge of distributions is valuable in this situation.

Defining Distributions with Expert Opinion
Expert opinion is needed when:
– Data were never collected
– Data are too expensive or impossible to collect
– Past data are irrelevant
– Opinion is needed to fill holes in sparse data
– The area of inquiry is new, a unique situation that has never existed before

What Experts Estimate
– The distribution itself: judgment about the distribution of values in the population, e.g., the population is normal
– Parameters of the distribution, e.g., the mean is x and the standard deviation is y
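Experts often find it easier to give a minimum, most likely value, and maximum than a mean and standard deviation. Pert, listed earlier among the bounded distributions, is one common way to turn such estimates into a distribution. The sketch below uses the standard PERT beta parameterization; the function names are hypothetical, and the 90/100/103 values are borrowed from the second-order example purely for illustration.

```python
import random


def pert_params(minimum, most_likely, maximum):
    """Beta shape parameters (alpha, beta) for a standard PERT distribution."""
    if not (minimum < maximum and minimum <= most_likely <= maximum):
        raise ValueError("need minimum <= most_likely <= maximum and minimum < maximum")
    span = maximum - minimum
    alpha = 1 + 4 * (most_likely - minimum) / span
    beta = 1 + 4 * (maximum - most_likely) / span
    return alpha, beta


def pert_mean(minimum, most_likely, maximum):
    """PERT mean: the most likely value gets four times the weight."""
    return (minimum + 4 * most_likely + maximum) / 6


def sample_pert(minimum, most_likely, maximum, size, seed=None):
    """Sample a PERT by rescaling a beta(alpha, beta) draw to [minimum, maximum]."""
    alpha, beta = pert_params(minimum, most_likely, maximum)
    rng = random.Random(seed)
    span = maximum - minimum
    return [minimum + span * rng.betavariate(alpha, beta) for _ in range(size)]
```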

Modeling Techniques
– Disaggregation (reduction)
– Subjective probability elicitation
– PDF or CDF
– Parametric or non-parametric distributions

Elicitation Techniques Needed
The literature shows we do not assess subjective probabilities well, in part because of the heuristics we use:
– Representativeness
– Availability
– Anchoring and adjustment
There are methods to counteract these heuristics and to elicit expert knowledge.

Sensitivity Analysis
Unsure which is the best distribution? Try several:
– If there is no difference, you are free to use any one
– Significant differences mean doing more work
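Trying several distributions can be sketched as running the same model with different candidate inputs matched on mean and standard deviation. Here the "model" is a hypothetical stand-in, the probability that the input exceeds a threshold, and the candidates (normal, lognormal, gamma) and the N(100, 10) values are illustrative assumptions.

```python
import math
import random


def candidate_samplers(mean, sd):
    """Three candidate input distributions with the same mean and SD."""
    s2 = math.log(1 + (sd / mean) ** 2)  # lognormal log-scale variance
    mu_log = math.log(mean) - s2 / 2     # lognormal log-scale mean
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean  # gamma: mean = shape*scale
    return {
        "normal": lambda rng: rng.gauss(mean, sd),
        "lognormal": lambda rng: rng.lognormvariate(mu_log, math.sqrt(s2)),
        "gamma": lambda rng: rng.gammavariate(shape, scale),
    }


def exceedance_by_input(threshold, mean, sd, size, seed=0):
    """Estimate P(X > threshold) under each candidate input distribution."""
    results = {}
    for name, draw in candidate_samplers(mean, sd).items():
        rng = random.Random(seed)
        hits = sum(draw(rng) > threshold for _ in range(size))
        results[name] = hits / size
    return results
```

With mean 100, SD 10 and a threshold of 125, the lognormal's heavier right tail gives a noticeably larger exceedance probability than the normal; a difference of that kind is the signal that more work is needed.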

Take Away Points
– Choosing the best distribution is where most new risk assessors feel least comfortable.
– Choice of distribution matters.
– Distributions come from data and expert opinion.
– Distribution fitting should never be the basis for distribution choice.

Charles Yoe, Ph.D.
Questions?