Choosing A Distribution. Risk Analysis for Water Resources Planning and Management, Institute for Water Resources, May 2008.

Presentation transcript:


Probability x Consequence Quantitative risk assessment requires you to use probability distributions to Describe data Model variability Represent our uncertainty What distribution do I use?

First! Do you have data? If so, do you need a distribution or can you just use your data? Answer depends on the question(s) you’re trying to answer as well as your data

Use Data If your data are representative of the population germane to your problem use them One problem could be bounding data What are the true min & max? Any dataset can be converted into a Cumulative distribution function General density function

Fitting Empirical Distribution to Data If continuous & reasonably extensive * May have to estimate minimum & maximum * Rank data x(i) in ascending order * Calculate the percentile for each value * Use data and percentiles to create cumulative distribution function
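
The ranking and percentile steps can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's procedure: the plotting position i / (n + 1), the function names, and the flow values are all assumptions made for the example, and the bounds stand in for the analyst's estimated minimum and maximum.

```python
import random

def empirical_cdf(data, minimum=None, maximum=None):
    """Return sorted (value, cumulative probability) points.

    Assumption: the percentile of the i-th ranked value is i / (n + 1);
    other plotting positions are equally defensible.
    """
    xs = sorted(data)                      # rank data in ascending order
    n = len(xs)
    points = [(x, i / (n + 1)) for i, x in enumerate(xs, start=1)]
    # Analyst-estimated bounds anchor the curve at probabilities 0 and 1.
    if minimum is not None:
        points.insert(0, (minimum, 0.0))
    if maximum is not None:
        points.append((maximum, 1.0))
    return points

def sample_from_cdf(points, u):
    """Invert the piecewise-linear CDF at cumulative probability u."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= u <= p1:
            if p1 == p0:
                return x0
            return x0 + (x1 - x0) * (u - p0) / (p1 - p0)
    raise ValueError("u lies outside the range covered by the CDF points")

flows = [12.1, 9.8, 15.3, 11.0, 13.7]      # hypothetical observations
cdf_points = empirical_cdf(flows, minimum=8.0, maximum=18.0)
rng = random.Random(1)
draws = [sample_from_cdf(cdf_points, rng.random()) for _ in range(3)]
```

Without the estimated bounds the curve only covers probabilities between 1/(n+1) and n/(n+1), which is one reason the slide says the minimum and maximum may have to be estimated.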

When You Need a Distribution Have some data but not enough for empirical distribution A probability model is needed Many random events do follow standard probability models Not everything follows a model

Dilemma Given wide variety of distributions it is not always easy to select the most appropriate one Results can be very sensitive to distribution choice Using wrong assumption in a model can produce incorrect results Incorrect results can lead to poor decisions Poor decisions can lead to undesirable outcomes
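
A small sketch of how sensitive a result can be to the distribution choice: a normal and a lognormal distribution are given the same mean and standard deviation, and the probability of exceeding a high threshold is compared. The numbers (mean 100, SD 30, threshold 190) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def lognormal_params(mean, sd):
    """Log-space mu and sigma of a lognormal with the given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

mean, sd, threshold = 100.0, 30.0, 190.0   # hypothetical values

# Exceedance probability if the variable is assumed normal.
p_normal = 1.0 - NormalDist(mean, sd).cdf(threshold)

# Same mean and SD, but assumed lognormal: ln(X) is normal(mu, sigma).
mu, sigma = lognormal_params(mean, sd)
p_lognormal = 1.0 - NormalDist(mu, sigma).cdf(math.log(threshold))

ratio = p_lognormal / p_normal
print(f"normal: {p_normal:.5f}  lognormal: {p_lognormal:.5f}  ratio: {ratio:.1f}")
```

Both models agree on the first two moments, yet the lognormal assigns a noticeably larger probability to the extreme event, which is exactly the kind of difference that can change a risk-based decision.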

First, Understand Your Data 1. Is your variable discrete or continuous? Do not overlook this! Discrete distributions: the variable takes one of a set of identifiable values, each of which has a calculable probability of occurrence. Continuous distributions: the variable can take any value within a defined range.

What Values Are Possible? 2. Is your variable bounded or unbounded? Bounded-value confined to lie between two determined values Unbounded-value theoretically extends from minus infinity to plus infinity Partially bounded-constrained at one end (truncated distributions) Use a distribution that matches

Are There Parameters 3. Does your variable have parameters that are meaningful? Parametric--model-based distributions, for which the shape is determined by the mathematics describing a conceptual probability model Require a greater knowledge of the underlying Non-parametric—empirical distributions for which the mathematics is defined by the shape required Intuitively easy to understand Flexible and therefore useful

Is It Dependent on Other Variables 4. Univariate and multivariate distributions Univariate--describes a single parameter or variable that is not probabilistically linked to any other in the model Multivariate--describe several parameters that are probabilistically linked in some way

Do You Know the Parameters? 5. First or second order distribution. First order: a probability distribution with precisely known parameters (N(100,10)). Second order: a probability distribution with some uncertainty about its parameters (N(μ, σ)).
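
The difference between the two orders can be sketched as a one-level versus a two-level simulation. This is an illustration only: reading N(100,10) as mean 100 and standard deviation 10 is an assumption, and the distributions placed on the uncertain mean and SD are hypothetical.

```python
import random
from statistics import stdev

def first_order_sample(rng, n):
    """Parameters known exactly: N(100, 10), with 10 read as the SD."""
    return [rng.gauss(100.0, 10.0) for _ in range(n)]

def second_order_sample(rng, n_outer, n_inner):
    """Outer loop draws the uncertain parameters, inner loop the variable."""
    results = []
    for _ in range(n_outer):
        mu = rng.gauss(100.0, 5.0)       # hypothetical uncertainty about the mean
        sigma = rng.uniform(8.0, 12.0)   # hypothetical uncertainty about the SD
        results.append([rng.gauss(mu, sigma) for _ in range(n_inner)])
    return results

rng = random.Random(0)
first = first_order_sample(rng, 2000)
second = second_order_sample(rng, n_outer=200, n_inner=200)

# Pooling the inner samples shows the extra spread from parameter uncertainty.
pooled = [x for inner in second for x in inner]
print(f"first-order SD: {stdev(first):.2f}  pooled second-order SD: {stdev(pooled):.2f}")
```

Keeping the inner samples separate, rather than pooling them, preserves the distinction between variability (inner loop) and uncertainty about the parameters (outer loop).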

Continuous Distribution Examples. Unbounded: Normal, t, Logistic. Left bounded: Chi-square, Exponential, Gamma, Lognormal, Weibull. Bounded: Beta, Cumulative, General/histogram, Pert, Uniform, Triangle.

Discrete Distribution Examples. Unbounded: none. Left bounded: Poisson, Negative binomial, Geometric. Bounded: Binomial, Hypergeometric, Discrete, Discrete Uniform.

Parametric and Non-Parametric: Normal, Lognormal, Exponential, Poisson, Binomial, Gamma, Uniform, Pert, Triangular, Cumulative.

2 General Approaches to Choosing Distributions Choose the math (parametric) Choose the shape (nonparametric) Empirical data exist No data are available

Choose Parametric Distribution If Theory supports choice Distribution proven accurate for modelling your specific variable (without theory) Distribution matches observed data well Need distribution with tail extending beyond the observed minimum or maximum

Choose Non-Parametric Distribution If Theory is lacking There is no commonly used model Data are severely limited Knowledge is limited to general beliefs and some evidence

What is source of data? Experiments Observation Surveys Computer databases Literature searches Simulations Test case The source of the data may affect your decision to use it or not.

Checklist for Choosing a Distribution From Some Data 1. Understand your variable (preceding) 2. Look at your data—plot it 3. Use theory 4. Calculate statistics 5. Use previous experience 6. Distribution fitting 7. Expert opinion 8. Sensitivity analysis

Plot--Old Faithful Eruptions. Find this distribution! You could fit data to this Mean & SD and assume it's normal. Beware, danger lurks. Always plot your data.

Which Distribution? Examine a histogram Look for distinctive shapes of specific distributions Single peaks Symmetry Positive skew Negative values Gamma, Weibull, beta are useful and flexible forms

Which Distribution? Summary statistics can provide clues Normal has low coefficient of variation and equal mean and median Exponential has positive skew and equal mean and standard deviation Go to RiskView and check this out.
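
The clues on this slide can be checked with a few summary statistics. The sketch below is illustrative: the function name is hypothetical, the skewness uses the adjusted Fisher-Pearson formula (one common choice), and the sample is simulated rather than real data.

```python
import random
from statistics import mean, median, stdev

def summary_clues(data):
    """Summary statistics that hint at a distribution's shape.

    Assumes the mean is non-zero (needed for the coefficient of variation)
    and that there are at least three observations.
    """
    m, s, n = mean(data), stdev(data), len(data)
    # Adjusted Fisher-Pearson sample skewness.
    skew = sum(((x - m) / s) ** 3 for x in data) * n / ((n - 1) * (n - 2))
    return {"mean": m, "median": median(data), "sd": s, "cv": s / m, "skew": skew}

# Simulated exponential sample: expect positive skew and mean close to SD.
rng = random.Random(42)
exp_sample = [rng.expovariate(0.5) for _ in range(5000)]
exp_clues = summary_clues(exp_sample)

# Symmetric data: mean equals median and skew is zero.
sym_clues = summary_clues([1.0, 2.0, 3.0, 4.0, 5.0])
```

These statistics only narrow the candidates; they do not replace plotting the data.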

Outliers Extreme observations can drastically influence a probability model No prescriptive method for addressing them If observation is an error remove it If not what is data point telling you? What about your world-view is inconsistent with this result? Should you reconsider your perspective? What possible explanations have you not yet considered?

Outliers (cont) Your explanation must be correct, not merely plausible Consensus is poor measure of truth If you must keep it and can't explain it Use conventional practices and live with skewed consequences Choose methods less sensitive to such extreme observations (Gumbel, Weibull)

Goodness of Fit. Provides statistical evidence to test a hypothesis about the nature of the distribution. H0: these data come from an “x” distribution. A small test statistic and a large p-value are “desirable” for not rejecting H0. Another piece of evidence, not a determining factor.

Chi-Square Test. Most common: discrete & continuous. Tests H0 that sample data come from a specific distribution versus the alternative Ha that they do not. Non-parametric and one-sided. Data are divided into a number of cells, each cell with at least five observations. Usually 50 observations or more.
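
A minimal sketch of the chi-square statistic for a fitted normal distribution follows. Several choices here are assumptions made for the example: the cells are equiprobable under the fitted distribution, the "at least five" rule is applied to the expected count per cell, and the data are simulated. The standard library has no chi-square CDF, so the sketch stops at the statistic and degrees of freedom, which would then be compared with a tabulated critical value.

```python
import bisect
import random
from statistics import NormalDist, mean, stdev

def chi_square_normal(data, cells=10):
    """Chi-square statistic for H0: the data come from a normal distribution."""
    n = len(data)
    expected = n / cells                 # same expected count in every cell
    if expected < 5:
        raise ValueError("need an expected count of at least five per cell")
    fitted = NormalDist(mean(data), stdev(data))
    # Interior cell edges at the 1/cells, 2/cells, ... quantiles.
    edges = [fitted.inv_cdf(k / cells) for k in range(1, cells)]
    observed = [0] * cells
    for x in data:
        observed[bisect.bisect_right(edges, x)] += 1
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    dof = cells - 1 - 2                  # two parameters estimated from the data
    return statistic, dof, observed

rng = random.Random(7)
sample = [rng.gauss(50.0, 8.0) for _ in range(200)]   # simulated data
statistic, dof, observed = chi_square_normal(sample)
print(f"chi-square = {statistic:.2f} on {dof} degrees of freedom")
```

The result depends on how the cells are drawn, which is one reason the test is treated as supporting evidence rather than a deciding factor.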

Kolmogorov-Smirnov Test. Tests H0 that continuous sample data come from a specific distribution versus the alternative Ha that they do not. More suitable for small samples than Chi-Square. Sort data in ascending order and find the greatest difference between the empirical cumulative value for each ranked observation and that observation’s theoretical counterpart. Better fit for means than tails. A value less than 0.03 indicates a good fit to the hypothesized distribution.

Kolmogorov-Smirnov Statistic
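
The usual definition of the statistic is the largest absolute gap between the empirical CDF of the sorted sample and the hypothesized CDF. A minimal sketch, with a hypothetical function name and toy data, could look like this:

```python
def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and a hypothesized CDF.

    `cdf` is any callable returning the hypothesized cumulative probability.
    """
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x: check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Toy check against a uniform(0, 1) hypothesis, whose CDF is F(x) = x.
d_uniform = ks_statistic([0.25, 0.5, 0.75], lambda x: x)
```

Checking both sides of each jump matters because the largest gap can occur just before an observation as well as at it.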

Anderson-Darling Test. Similar to K-S: continuous variables. Weights differences between theoretical and empirical distributions at their tails more heavily than at their midranges. Desirable when a better fit at the extreme tails of the distribution is desired. A value less than 1.5 is usually a good fit.
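
The tail weighting shows up directly in the standard formula for the statistic, which sums logarithms of the hypothesized CDF at the ranked observations. The sketch below is an illustration with hypothetical names and toy data; the small clipping constant is an assumption added to keep the logarithms finite.

```python
import math

def anderson_darling(data, cdf):
    """Anderson-Darling statistic A^2 for a fully specified hypothesized CDF."""
    xs = sorted(data)
    n = len(xs)
    eps = 1e-12   # keep the logs finite if cdf returns exactly 0 or 1
    f = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    # Pair the i-th smallest value with the i-th largest: f[i-1] and f[n-i].
    total = sum(
        (2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

uniform_cdf = lambda x: min(max(x, 0.0), 1.0)          # hypothesized uniform(0, 1)
well_spread = [(i - 0.5) / 10 for i in range(1, 11)]   # evenly spaced toy data
upper_tail = [0.90 + 0.01 * i for i in range(10)]      # toy data bunched in one tail

a2_good = anderson_darling(well_spread, uniform_cdf)
a2_bad = anderson_darling(upper_tail, uniform_cdf)
```

Because log F and log(1 - F) grow large in magnitude near 0 and 1, mismatches in the tails dominate the sum, which is why the test suits problems where the extremes matter.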

No Data Available Modelers must resort to judgment Knowledge of distributions is valuable in this situation

Defining Distributions w/ Expert Opinion Data never collected Data too expensive or impossible Past data irrelevant Opinion needed to fill holes in sparse data New area of inquiry, unique situation that never existed

What Experts Estimate The distribution itself Judgment about distribution of value in population E.g. population is normal Parameters of the distribution E.g. mean is x and standard deviation is y

Modeling Techniques Disaggregation (Reduction) Subjective Probability Elicitation * PDF or CDF * Parametric or Non-parametric distributions

Take Away Points. Choosing the best distribution is where most new risk assessors feel least comfortable. Choice of distribution matters. Distributions come from data and expert opinion. Distribution fitting should never be the sole basis for distribution choice.