
Metrics, observations, and biases in quantitative assessment of seismic hazard model predictions Edward Brooks 1, Seth Stein 1, Bruce D. Spencer 2, Antonella Peresan 3,4 1 Department of Earth & Planetary Sciences and Institute for Policy Research, Northwestern University, Evanston, Illinois, USA 2 Department of Statistics and Institute for Policy Research, Northwestern University, Evanston, Illinois, USA 3 Department of Mathematics and Geosciences, University of Trieste, Italy 4 SAND Group, ICTP, Trieste, Italy CSNI Workshop on Testing PSHA Results and Benefit of Bayesian Techniques for Seismic Hazard Assessment, Pavia, Italy (4-6 February 2015)

 What’s going wrong with existing maps?  How can we improve forecasts?  How can we quantify their uncertainties?  How can we measure their performance?  How do we know when to update them?  How good do they have to be to be useful?  How do we make sensible policy given forecasts’ limitations? Forecasting ground shaking: many maps… and many questions

Assessing the performance of seismic hazard maps Stein S., Geller R. and Liu M. (2011). Bad assumptions or bad luck: why earthquake hazard maps need objective testing. Seism. Res. Lett., 82(5), September – October 2011

Conclusions: the comparison between observations and predictions can provide only limited constraints on probabilistic seismic hazard estimates. This is particularly true for ground accelerations above 0.1g (relevant for structural damage). Assessing the performance of seismic hazard maps

Jordan, Marzocchi, Michael & Gerstenberger (2014), SRL 85, no. 5. Assessing the performance of seismic hazard maps

Geller (2011) argued that “all of Japan is at risk from earthquakes, and the present state of seismological science does not allow us to reliably differentiate the risk level in particular geographic areas,” so a map showing uniform hazard would be preferable to the existing maps. How should we test this idea?

How good a baseball player was Babe Ruth? The answer depends on the metric used. In many seasons Ruth led the league both in home runs and in the number of times he struck out. By one metric he did very well; by another, very poorly.

From users’ perspective, what specifically should hazard maps seek to accomplish? Different users likely want different things. How do we measure how well maps meet users’ requirements? No agreed way yet…

Lessons from meteorology Weather forecasts are routinely evaluated to assess how well their predictions matched what actually occurred: "it is difficult to establish well-defined goals for any project designed to enhance forecasting performance without an unambiguous definition of what constitutes a good forecast." (Murphy, 1993) Information about how a forecast performs is crucial in determining how best to use it. The better a weather forecast has worked to date, the more we factor it into our daily plans.

Choosing appropriate metrics is crucial in assessing the performance of forecasts. Silver (2012) shows that TV weather forecasts have a "wet bias" – predicting more rain than actually occurs – probably because forecasters feel that customers accept unexpectedly sunny weather but are annoyed by unexpected rain.

From users’ perspective, what specifically should hazard maps seek to accomplish? How do we measure how well they do it? How much can we improve them? How can we quantify their large uncertainties?

How to measure map performance? Implicit probabilistic map criterion: after an appropriate time, predicted shaking should have been exceeded at only a fraction p of sites. Define the fractional site exceedance metric M0(f, p) = |f – p|, where f is the fraction of sites at which observed shaking exceeded the predicted value. An ideal map has M0 = 0.
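The fractional site exceedance metric can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation; the site values and the threshold p below are made up.

```python
# Sketch of the fractional site exceedance metric M0(f, p) = |f - p|,
# where f is the fraction of sites whose observed shaking exceeded the
# map's predicted value. All data below are illustrative.

def fractional_site_exceedance(observed, predicted, p):
    """M0 = |f - p| for per-site observed and predicted shaking."""
    exceeded = sum(o > s for o, s in zip(observed, predicted))
    f = exceeded / len(observed)
    return abs(f - p)

# Four sites, design exceedance probability p = 0.25:
observed  = [0.30, 0.10, 0.05, 0.20]   # peak shaking (g), hypothetical
predicted = [0.25, 0.20, 0.10, 0.30]   # map predictions, hypothetical
print(fractional_site_exceedance(observed, predicted, 0.25))  # 0.0: "ideal" map
```

Here exactly one site in four exceeded its prediction, matching p = 0.25, so M0 = 0.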

Fractional site exceedance is a useful metric but only tells part of the story. Both maps are successful (M0 = 0), but this map exposed some sites to much greater shaking than predicted. This situation could reflect faults that had larger earthquakes than assumed.

Fractional site exceedance is a useful metric but only tells part of the story. All these maps are successful (M0 = 0), but this map significantly overpredicted shaking, which could arise from overestimating the magnitude of the largest earthquakes.

Other metrics can provide additional information beyond the fractional site exceedance M0. The squared misfit to the data, M1(s, x) = Σi (xi – si)² / N, measures how well the predicted shaking si compares to the highest observed xi. From a purely seismological view, M1 tells us more than M0 about how well a map performed.
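The squared misfit M1 is a mean of per-site squared differences. A minimal sketch, with hypothetical shaking values:

```python
# Sketch of the squared misfit M1(s, x) = sum_i (x_i - s_i)^2 / N,
# comparing predicted shaking s_i to the highest observed shaking x_i.
# Values below are illustrative, not real data.

def squared_misfit(s, x):
    n = len(s)
    return sum((xi - si) ** 2 for si, xi in zip(s, x)) / n

s = [0.25, 0.20, 0.10]   # map predictions (g)
x = [0.30, 0.10, 0.20]   # maximum observed shaking (g)
print(squared_misfit(s, x))  # ~0.0075
```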

Other metrics can provide additional information beyond the fractional site exceedance M0. Because underprediction does potentially more harm than overprediction, we could weight underprediction more heavily: the asymmetric squared misfit M2(s, x) = Σi wi (xi – si)² / N, with wi = a for (xi – si) > 0 and wi = b for (xi – si) ≤ 0. This is more useful for hazard mitigation than M1.
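The asymmetric misfit M2 differs from M1 only in the per-site weight. A sketch follows; the weight values a = 2, b = 1 are our assumptions, since the slides leave them unspecified:

```python
# Sketch of the asymmetric squared misfit M2(s, x): underprediction
# (x_i - s_i > 0) weighted by a, overprediction by b. The choice
# a = 2, b = 1 is illustrative only.

def asym_squared_misfit(s, x, a=2.0, b=1.0):
    n = len(s)
    return sum((a if xi - si > 0 else b) * (xi - si) ** 2
               for si, xi in zip(s, x)) / n

s = [0.20, 0.20]
x = [0.30, 0.10]          # one underprediction, one overprediction
print(asym_squared_misfit(s, x))  # underprediction term counts double
```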

Other metrics can provide additional information beyond the fractional site exceedance M0 Shaking-weighted asymmetric squared misfit We could use larger weights for areas predicted to be the most hazardous, so the map is judged most on how it does there.

Other metrics can provide additional information beyond the fractional site exceedance M0 Exposure-weighted asymmetric squared misfit We could use larger weights for areas with the largest exposure of people or property, so the map is judged most on how it does there.
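Both the shaking-weighted and exposure-weighted variants amount to adding a per-site weight to M2. A minimal sketch; normalizing by the sum of the weights is our assumption, and the exposure values are hypothetical:

```python
# Sketch of a weighted asymmetric misfit: each site carries a weight w_i,
# e.g. its predicted hazard level or its exposed population, so the map
# is judged most where it matters most. Normalization by sum(w) is an
# assumption; the slides leave this open.

def weighted_asym_misfit(s, x, site_w, a=2.0, b=1.0):
    num = sum(wi * (a if xi - si > 0 else b) * (xi - si) ** 2
              for si, xi, wi in zip(s, x, site_w))
    return num / sum(site_w)

s = [0.20, 0.20]
x = [0.30, 0.10]
exposure = [3.0, 1.0]     # relative population exposed (illustrative)
print(weighted_asym_misfit(s, x, exposure))  # dominated by the heavy site
```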

Although no single metric fully characterizes map performance, using several metrics can provide valuable insight for assessing and improving hazard maps

Comparing maps could be done via the skill score SS(s,r,x) = 1 - M(s,x) / M(r,x) where M is any of the metrics, x is the maximum observed shaking, s is the map prediction, and r is the prediction of a reference map produced using a selected null hypothesis (e.g. uniform hazard). The skill score would be positive if the map's predictions did better than those of the map made with the null hypothesis, and negative if they did worse. We could assess how well maps have done after a certain time, and whether successive generations of maps do better.
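The skill score can be sketched with M1 as the metric and a uniform map as the reference. All shaking values below are hypothetical:

```python
# Sketch of the skill score SS(s, r, x) = 1 - M(s, x) / M(r, x), using the
# squared misfit as M and a uniform-hazard map as the reference r.
# Values are illustrative, not real observations.

def squared_misfit(s, x):
    return sum((xi - si) ** 2 for si, xi in zip(s, x)) / len(s)

def skill_score(metric, s, r, x):
    return 1.0 - metric(s, x) / metric(r, x)

x = [0.30, 0.10, 0.20]          # maximum observed shaking
s = [0.28, 0.12, 0.18]          # candidate hazard map
r = [0.20, 0.20, 0.20]          # uniform-hazard reference map
print(skill_score(squared_misfit, s, r, x))  # positive: beats the reference
```

A positive score means the candidate map outperformed the null-hypothesis map under the chosen metric; a negative score means it did worse.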

Nekrasova et al., BC – 2002 AD

One possible space-time sampling bias…  The probabilistic map with 2% probability of exceedance in 50 years (i.e. ground shaking expected at least once in 2475 years) significantly overestimates the shaking reported over a comparable time span (about 2200 years).  The deterministic map, which is not associated with a specific time span, also tends to overestimate the ground shaking with respect to past earthquakes. The historical catalog is thought to be incomplete (Stucchi et al., 2004) and may underestimate the largest shaking due to space-time sampling bias.

Dependence of seismic hazard estimates on the time span of the input catalog: NDSHA map. Intensity differences between the NDSHA map obtained for the entire catalog (TOTAL) and the maps obtained for 500-year catalog time intervals: a) TOTAL – [1000, 1500) and b) TOTAL – [1500, 2000).

Dependence of seismic hazard estimates on the time span of the input catalog: NDSHA map. Intensity differences between the NDSHA map obtained for the entire catalog (TOTAL) and the maps obtained, considering the seismogenic nodes, for the time intervals: a) TOTAL – [1000, 1500) and b) TOTAL – [1500, 2000).

Options after an earthquake yields shaking larger than anticipated: Either regard the high shaking as a low-probability event allowed by the map, or – as usually done – accept that the high shaking was not simply a low-probability event and revise the map.

No formal or objective criteria are used to decide whether and how to change a map. Done via BOGSAT (“Bunch Of Guys Sitting Around Table”). Challenge: a new map that better describes the past may or may not better predict the future.

Deciding whether to remake a map is like deciding, after a coin has come up heads a number of times, whether to continue assuming that the coin is fair and the run is a low-probability event, or to change to a model in which the coin is assumed to be biased. Changing the model may describe the future worse.

Bayes’ Rule – how much to change depends on one’s confidence in the prior model: Revised (posterior) probability model ∝ Likelihood of observations given the prior model × Prior probability model. If you were confident that the coin was fair, you would probably not change your model. If you were given the coin at a magic show, your confidence would be lower and you would be more likely to change your model.
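The coin analogy can be made concrete with a conjugate Beta-Binomial update. This is an illustrative sketch; the prior strengths and the run of heads below are our assumptions, not from the slides:

```python
# With a Beta(alpha, beta) prior on P(heads), observing h heads in n flips
# gives the posterior Beta(alpha + h, beta + n - h). The posterior mean
# shows how prior confidence controls how much the model changes.

def posterior_mean(alpha, beta, heads, flips):
    return (alpha + heads) / (alpha + beta + flips)

heads, flips = 8, 10            # a suspicious run of heads (hypothetical)
trusted    = posterior_mean(50, 50, heads, flips)  # confident coin is fair
magic_show = posterior_mean(1, 1, heads, flips)    # little prior confidence
print(trusted)     # ~0.53: the strong prior barely moves off 0.5
print(magic_show)  # 0.75: the data dominate the weak prior
```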

Assume Poisson earthquake recurrence with rate λ = 1/T = 1/50 = 0.02 events per year. This estimate is assumed (prior) to have mean μ and standard deviation σ. If an earthquake occurs after only 1 year, the updated forecast, described by the posterior mean, differs more from the initial forecast (prior mean) when the uncertainty in the prior distribution is larger. The less confidence we have in the prior model, the more a new datum can change it.
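This update can be sketched with a conjugate Gamma prior on the Poisson rate, matched to the prior mean μ and standard deviation σ. A minimal illustration; the specific σ values below are assumptions:

```python
# Gamma prior on the Poisson rate lambda, matched to a prior mean mu and
# standard deviation sigma (shape a = (mu/sigma)^2, rate b = mu/sigma^2).
# After observing `events` earthquakes in `years`, the posterior is
# Gamma(a + events, b + years), whose mean is the updated forecast.
# The sigma values used below are illustrative.

def updated_rate(mu, sigma, events=1, years=1.0):
    a = (mu / sigma) ** 2
    b = mu / sigma ** 2
    return (a + events) / (b + years)

mu = 1 / 50                     # prior: 0.02 events per year
print(updated_rate(mu, sigma=0.001))  # tight prior: stays near 0.02
print(updated_rate(mu, sigma=0.02))   # loose prior: pulled well above 0.02
```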

 We need agreed ways of assessing how well hazard maps performed and thus whether one map performed better than another.  This information is crucial to tell how much confidence to have in using them for very expensive policy decisions.  Although no single metric alone fully characterizes map behavior, using several metrics can provide useful insight for comparing and improving maps.  Deciding when and how to revise hazard maps should combine BOGSAT – subjective judgement given limited information – and Bayes – ideas about parameter uncertainty. Conclusions

Challenge U.S. Meteorologists (Hirschberg et al., 2011) have adopted a goal of “routinely providing the nation with comprehensive, skillful, reliable, sharp, and useful information about the uncertainty of hydrometeorological forecasts.” Although seismologists have a tougher challenge and a longer way to go, we should try to do the same for earthquake hazards.

Bayes and BOGSAT: Issues in When and How to Revise Earthquake Hazard Maps Seth Stein, Bruce D. Spencer, Edward Brooks Seismological Research Letters, Volume 86. What to do after an earthquake yielding shaking larger than anticipated?

 Japanese seismic-hazard maps before and after the 2011 Tohoku earthquake. The predicted hazard has been increased both along the east coast, where the 2011 earthquake occurred, and on the west coast. ( ‑ shis.bosai.go.jp/map/?lang=en)  Comparison of successive Italian hazard maps. The 1999 map was updated to reflect the 2002 Molise earthquake, and the 2006 map will likely be updated after the 2012 Emilia earthquake (Stein et al., 2013).

How to measure map performance? Implicit probabilistic map criterion: after an appropriate time, predicted shaking should have been exceeded at only a fraction p of sites. Define the fractional site exceedance metric M0(f, p) = |f – p|, where f is the fraction of sites at which observed shaking exceeded the predicted value. An ideal map has M0 = 0.

Italian maps, which predicted the expected shaking in the next 500 years, require updating within a decade.

Japanese maps required updating within a decade. No formal or objective criteria are used to decide when & what to change. Done via BOGSAT (“Bunch Of Guys Sitting Around Table”). Probability of intensity 6-lower in 30 yrs.