Bayesian Spatial Modeling of Extreme Precipitation Return Levels Daniel COOLEY, Douglas NYCHKA, and Philippe NAVEAU (2007, JASA)


Background
July 28, 1997: a rainstorm in Fort Collins, Colorado
– killed five people
– caused $250 million in damage
1976: Big Thompson flood near Loveland, Colorado
– killed 145 people
1965: South Platte flood
– $600 million in damages around Denver

Extreme precipitation events
Understanding their frequency and intensity is important for public safety and long-term planning
Challenges
– limited temporal records
– extrapolating the distributions to locations where observations are not available
Data
– precipitation amounts at some stations
– possibly some other covariates

Measure of extreme events
Return level
– The r-year return level is the quantile that has probability 1/r of being exceeded in a particular year: P(X > t_r) = 1/r
Precipitation return levels
– given in the context of the duration of the precipitation event
– The r-year return level of a d-hour (e.g., 6- or 24-hour) duration interval is reported.
– The standard levels for the NWS’s most recent data products are quite extensive, with duration intervals ranging from 5 minutes to 60 days and with return levels for 2–500 years.
This article focuses on providing return level estimates for daily precipitation (24 hours)

Most recent precipitation atlas for Colorado
Produced in 1973
The atlas provides point estimates of 2-, 5-, 10-, 25-, 50-, and 100-year return levels for duration intervals of 6 and 24 hours.
Shortcoming
– it does not provide uncertainty measures for its point estimates

Extreme value theory (EVT)
Statistical models for the tail of a probability distribution
Univariate case: generalized extreme value (GEV) distribution
– Given iid continuous data Z_1, Z_2, ..., Z_n and letting M_n = max(Z_1, Z_2, ..., Z_n), it is known that if the normalized distribution of M_n converges as n → ∞, then it converges to a GEV distribution
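
The convergence statement above can be checked in a case where everything is available in closed form. The sketch below is an illustration, not something taken from the slides: it assumes Exponential(1) data, for which M_n - log(n) converges to the Gumbel distribution (the GEV with shape 0). The function names and the (mu, sigma, xi) parametrization are assumptions made for this example.

```python
import math

def gev_cdf(z, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function; xi = 0 gives the Gumbel case."""
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0)
        # or above the upper endpoint (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def exp_max_cdf(x, n):
    """Exact CDF of M_n - log(n) for n iid Exponential(1) variables."""
    p = 1.0 - math.exp(-x) / n
    return max(p, 0.0) ** n

def max_gap(n):
    """Largest difference from the Gumbel limit on a grid of x values."""
    grid = [k / 10 for k in range(-20, 60)]
    return max(abs(exp_max_cdf(x, n) - gev_cdf(x)) for x in grid)

# The gap to the limiting GEV shrinks as the block size n grows.
for n in (10, 100, 10000):
    print(n, max_gap(n))
```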

Generalized Pareto distribution (GPD)
Using the maxima only disregards other extreme data that could provide additional information.
GPD
– based on the exceedances above a threshold
– Exceedances (the amounts by which observations exceed a threshold u) should approximately follow a GPD as u becomes large and the sample size increases

GPD
Tail of the distribution
Scale parameter
Shape parameter controls the tail
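
The slide's formulas were not preserved in this transcript. As a sketch, the standard GPD tail is P(Z - u > y | Z > u) = (1 + xi*y/sigma)^(-1/xi), with scale sigma > 0 and shape xi. The symbols sigma and xi and the function names below are assumptions for illustration. A negative shape gives a bounded tail, a shape of zero gives an exponential tail, and a positive shape gives a heavy tail.

```python
import math

def gpd_sf(y, sigma, xi):
    """Tail probability P(Z - u > y | Z > u) for an exceedance amount y."""
    if y < 0.0:
        return 1.0
    if abs(xi) < 1e-12:
        return math.exp(-y / sigma)  # exponential tail in the xi -> 0 limit
    t = 1.0 + xi * y / sigma
    # t <= 0 can only happen for xi < 0, beyond the upper endpoint.
    return t ** (-1.0 / xi) if t > 0.0 else 0.0

def gpd_quantile(p, sigma, xi):
    """Exceedance amount y with P(Z - u <= y | Z > u) = p, for 0 <= p < 1."""
    if abs(xi) < 1e-12:
        return -sigma * math.log1p(-p)
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)

# Same scale, three shapes: the tail probability at y = 2 differs by orders
# of magnitude. The values 0.5 and +/-0.2 are hypothetical.
for xi in (-0.2, 0.0, 0.2):
    print(xi, gpd_sf(2.0, 0.5, xi))
```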

More EVT
Exceedance rate
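
The exceedance rate matters because it links the GPD to return levels. The slide's equation is missing from the transcript, so the sketch below uses the standard threshold-model expression z_r = u + (sigma/xi) * [(r * n_y * zeta)^xi - 1], where zeta = P(Z > u) and n_y is the number of observations per year. The symbol names are assumptions. The threshold (0.55 inches) and the 2% rate come from later slides, 214 is the number of days from April 1 to October 31, and the scale and shape values are purely hypothetical.

```python
import math

def gpd_return_level(r, u, sigma, xi, zeta, n_per_year):
    """r-year return level from a GPD above threshold u.

    zeta is the probability that a single observation exceeds u;
    n_per_year is the number of observations per year.
    """
    m = r * n_per_year * zeta  # expected number of exceedances in r years
    if m <= 1.0:
        raise ValueError("return level lies below the threshold")
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m)
    return u + sigma / xi * (m ** xi - 1.0)

# Hypothetical scale and shape; threshold and rate as quoted in the slides.
for r in (2, 25, 100):
    print(r, gpd_return_level(r, u=0.55, sigma=0.4, xi=0.1,
                              zeta=0.02, n_per_year=214))
```

By construction, the level returned is exceeded on average once every r years: zeta times the GPD tail probability at z_r - u equals 1/(r * n_y).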

Extremes of spatial data
Weather describes the state of the atmosphere at a given time
– Extreme weather events can be modeled by theory on the dependence of extreme observations
Climate at a given location is the distribution of weather over a long period of time
– Climatological quantities, such as return levels, and their spatial dependence must be modeled outside of the framework above
– How does the distribution of precipitation vary over space?

Goal
Let Z(x) denote the total precipitation for a given period of time (e.g., 24 hours) at location x.
The goal is to provide inference for the probability P(Z(x) > z + u) for all locations x in a particular domain and for u large
– Given this function, one can compute return levels and other summary measures
– The aim is to produce a return level map with measures of uncertainty

Basic idea
In the GPD model, we add a spatial component by considering all parameters to be functions of a location x in the study area.
We assume that the values of these parameters result from a latent spatial process that characterizes the extreme precipitation and arises from climatological and orographic effects.
The dependence of the parameters characterizes the similarity of climate at different locations

A Bayesian study
A study of 24-hour precipitation extremes for the Front Range region of Colorado
– Estimate potential flooding
– April 1 – October 31
– 75% of Colorado’s population lives in this area

Study Region

Data
56 weather stations
Daily total precipitation amounts
– 21 stations have over 50 years of data
– 14 stations have less than 20 years of data
– All stations have some missing values
Covariates
– Elevation
– Mean precipitation (MSP)
– Remark: covariate information is needed for the entire region in order to interpolate over the study region and produce a precipitation map

Boulder Station

Data Precision
Boulder Station
– prior to 1971, precipitation was recorded to the nearest 1/100th of an inch (0.25 mm)
– after 1971, it was recorded to the nearest 1/10th of an inch (2.5 mm)
All but three stations similarly switched their level of precision around 1970
Low-precision data are a discretization of the high-precision data

Treatment of discretization
The true value is assumed to be uniformly distributed around the observed value
– What is the effect of such an assumption?
Adjust the likelihood
– d is the length of the interval
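
The adjusted likelihood itself is not preserved in the transcript. One common way to handle rounded data, shown here only as an assumed sketch and not necessarily the authors' exact expression, is to replace the density at the recorded value by the average density over the rounding interval of length d. As d shrinks, this reduces to the ordinary GPD density. All function names and parameter values are hypothetical.

```python
import math

def gpd_cdf(y, sigma, xi):
    """P(Z - u <= y | Z > u) for an exceedance amount y."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma)
    t = 1.0 + xi * y / sigma
    return 1.0 - t ** (-1.0 / xi) if t > 0.0 else 1.0

def gpd_logpdf(y, sigma, xi):
    """Log-density of the GPD at an exceedance amount y >= 0."""
    if abs(xi) < 1e-12:
        return -math.log(sigma) - y / sigma
    t = 1.0 + xi * y / sigma
    if t <= 0.0:
        return -math.inf
    return -math.log(sigma) - (1.0 / xi + 1.0) * math.log(t)

def interval_loglik(y_obs, d, sigma, xi):
    """Log of the average density over the rounding interval around y_obs."""
    lo = max(y_obs - d / 2.0, 0.0)
    hi = y_obs + d / 2.0
    prob = gpd_cdf(hi, sigma, xi) - gpd_cdf(lo, sigma, xi)
    return math.log(prob / (hi - lo)) if prob > 0.0 else -math.inf

# A recorded exceedance of 0.65 inches at the two precisions from the slides.
for d in (0.1, 0.01):
    print(d, interval_loglik(0.65, d, sigma=0.4, xi=0.1),
          gpd_logpdf(0.65, 0.4, 0.1))
```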

How to choose the threshold u?
Bias-variance trade-off
– If u is large, the exceedance distribution is closer to a GPD (lower bias)
– If u is large, fewer data can be used (higher variance)
Finally, the threshold is taken as 0.55 inches
– a threshold sensitivity analysis of model runs indicates that the shape parameter is more consistently estimated above this threshold
– 7789 exceedances (2% of the original data)

Residual dependence
Assumption
– the precipitation observations are conditionally independent spatially and temporally given the stations’ parameters
– the spatial dependence is accounted for in the stations’ parameters
This conditional independence may not be true, though.

Temporal independence
Temporal dependence
– When dependence is short range and extremes do not occur in clusters, the maxima still converge to a GEV in distribution
– If a station had consecutive days that exceeded the threshold, we declustered the data by keeping only the highest measurement
– Declustering did not change the results much
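
The declustering rule described above (keep only the highest measurement in each run of consecutive exceedance days) can be sketched as follows. The record format, a list of (day index, amount) pairs sorted by day, and the sample values are assumptions for illustration. A missing day breaks a run.

```python
def decluster(daily, u):
    """Keep the largest value from each run of consecutive days above u.

    daily is a list of (day_index, amount) pairs sorted by day_index.
    Returns the (day_index, amount) pair of each cluster maximum.
    """
    kept = []
    run_best = None  # largest exceedance in the current run
    prev_day = None  # day of the most recent exceedance
    for day, amount in daily:
        if amount <= u:
            continue
        if run_best is not None and day == prev_day + 1:
            # Same run: keep whichever day has the larger amount.
            if amount > run_best[1]:
                run_best = (day, amount)
        else:
            # A new run starts; store the maximum of the previous one.
            if run_best is not None:
                kept.append(run_best)
            run_best = (day, amount)
        prev_day = day
    if run_best is not None:
        kept.append(run_best)
    return kept

# Hypothetical daily record in inches; day 7 is missing.
record = [(1, 0.2), (2, 0.7), (3, 1.1), (4, 0.6), (5, 0.1),
          (6, 0.9), (8, 0.8), (9, 0.6)]
print(decluster(record, u=0.55))
```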

Spatial dependence
The authors tested for spatial dependence in the annual maximum residuals of the stations
– there was a low level of dependence between stations within 24 km (15 miles) of one another and no detectable dependence beyond this distance
– there are very few stations within this distance that record data for the same time period

Seasonal effects
Restricting the analysis to the nonwinter months reduces seasonality
Inspecting the data from several sites showed no obvious seasonal effect

Model for Threshold Exceedance
Hierarchical model
– Layer 1: data at each station
– Layer 2: the latent process that drives the climatological extreme precipitation for the region
– Layer 3: the prior distributions of the parameters that control the latent process

Data layer for return level
A GPD distribution
Reparametrization
Let Z_k(x) be the kth recorded precipitation amount at location x
Density
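
The reparametrized density is missing from the transcript. A later slide refers to the log-transformed GPD scale parameter, so the sketch below assumes the reparametrization phi = log(sigma), which lets the scale be modeled on the whole real line. Under that assumption, the log-density of an exceedance amount y is -phi - (1/xi + 1) * log(1 + xi * y * exp(-phi)). The function names and data values are hypothetical.

```python
import math

def gpd_logpdf_phi(y, phi, xi):
    """GPD log-density of an exceedance amount y, with phi = log(scale)."""
    scaled = y * math.exp(-phi)
    if abs(xi) < 1e-12:
        return -phi - scaled
    t = 1.0 + xi * scaled
    if t <= 0.0:
        return -math.inf  # y lies outside the support
    return -phi - (1.0 / xi + 1.0) * math.log(t)

def station_loglik(exceedances, phi, xi):
    """Data-layer log-likelihood for one station, assuming independence."""
    return sum(gpd_logpdf_phi(y, phi, xi) for y in exceedances)

# Hypothetical exceedance amounts (inches above the threshold).
amounts = [0.05, 0.31, 0.12, 0.90, 0.44]
print(station_loglik(amounts, phi=math.log(0.4), xi=0.1))
```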

Process layer
A structure that relates the parameters of the data layer to the orography and climatology of the region.
Spatial (longitude/latitude) space → climate (elevation/MSP) space
– Stations are sparse in the spatial space
– Stations far apart spatially can be close in the climate space
– MSP: mean precipitation

Scale parameter : A Gaussian process with
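
The mean and covariance of this Gaussian process are not preserved in the transcript. As an assumed illustration, the sketch below uses an exponential covariance, beta0 * exp(-beta1 * distance), with distance measured in climate space rather than in longitude/latitude. The station names, coordinates, and the values of beta0 and beta1 are all hypothetical.

```python
import math

def exp_cov(a, b, beta0, beta1):
    """Exponential covariance between two points (an assumed form)."""
    dist = math.hypot(a[0] - b[0], a[1] - b[1])
    return beta0 * math.exp(-beta1 * dist)

# Hypothetical stations: name -> (scaled elevation, scaled MSP).
climate = {
    "mountain_north": (2.9, 1.1),
    "mountain_south": (3.0, 1.0),
    "plains": (1.5, 0.6),
}
beta0, beta1 = 0.04, 1.5  # hypothetical sill and inverse range

names = list(climate)
cov = [[exp_cov(climate[a], climate[b], beta0, beta1) for b in names]
       for a in names]
for name, row in zip(names, cov):
    print(name, [round(v, 4) for v in row])
```

The two mountain stations may be far apart on the map, yet they are strongly correlated here because their climate coordinates are close. That is the motivation for working in climate space.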

Shape parameter
Three alternatives are considered:
– A single value for the entire study region with a Unif(−∞, ∞) prior
– Two values: one for the mountain stations and one for the plains stations
– A Gaussian process with structure similar to that of the scale parameter

Process layer

Priors
Prior independence
Regression parameters: noninformative
Spatial parameters
– Noninformative priors lead to an improper posterior
– Informative priors based on MLEs
Shape parameter

Priors

Model for Exceedance Rate
To compute the return level, we need both the GPD parameters and the exceedance rate
Assume each station’s number of exceedances is binomial with a station-specific probability parameter
Apply a logit transformation to the probability parameter
Model the logit-transformed parameter as a Gaussian process
Similar prior specification
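
The binomial data layer with a logit-transformed probability can be sketched as follows. The function names and the station counts (200 exceedances in 10,000 observations, matching the roughly 2% rate quoted earlier) are hypothetical. The Gaussian process on the logit scale is not modeled here.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map a real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def binom_loglik(n_exceed, n_obs, logit_zeta):
    """Binomial log-likelihood of a station's exceedance count."""
    log_zeta = -math.log1p(math.exp(-logit_zeta))        # log(zeta)
    log_one_minus = -math.log1p(math.exp(logit_zeta))    # log(1 - zeta)
    log_choose = (math.lgamma(n_obs + 1) - math.lgamma(n_exceed + 1)
                  - math.lgamma(n_obs - n_exceed + 1))
    return (log_choose + n_exceed * log_zeta
            + (n_obs - n_exceed) * log_one_minus)

# Hypothetical station: 200 exceedances in 10,000 daily observations.
for zeta in (0.01, 0.02, 0.03):
    print(zeta, binom_loglik(200, 10000, logit(zeta)))
```

The log-likelihood peaks at the empirical rate of 0.02. In the hierarchical model, the Gaussian process prior would pull each station's logit toward those of stations with similar climate.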

MCMC
Metropolis within Gibbs
– Proposal distributions are obtained using a normal approximation or a random walk
– Three parallel chains
– Each chain has 20,000 iterations
– 2,000 burn-in steps
– Test for convergence: Gelman’s R-hat < 1.05
Draws are used to perform spatial interpolation and inference
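
A heavily simplified sketch of the sampling scheme follows. It is not the authors' sampler. It assumes a single station, flat priors on phi = log(scale) and on the shape xi, simulated data, random-walk proposals, and much shorter chains than the 20,000 iterations used in the study. Each parameter is updated in turn with a Metropolis step (Metropolis within Gibbs), three chains are run from dispersed starting points, and the Gelman-Rubin R-hat statistic is computed from the post-burn-in draws. All names, step sizes, and data values are hypothetical.

```python
import math
import random

def loglik(ys, phi, xi):
    """GPD log-likelihood with phi = log(scale)."""
    inv_scale = math.exp(-phi)
    total = 0.0
    for y in ys:
        if abs(xi) < 1e-9:
            total += -phi - y * inv_scale
            continue
        t = 1.0 + xi * y * inv_scale
        if t <= 0.0:
            return -math.inf
        total += -phi - (1.0 / xi + 1.0) * math.log(t)
    return total

def accept(rng, log_ratio):
    """Metropolis acceptance for a symmetric proposal."""
    return rng.random() < math.exp(min(0.0, log_ratio))

def run_chain(ys, n_iter, seed, steps=(0.15, 0.10)):
    rng = random.Random(seed)
    phi, xi = rng.uniform(-1.5, 0.0), rng.uniform(0.0, 0.4)  # dispersed start
    cur = loglik(ys, phi, xi)
    draws = []
    for _ in range(n_iter):
        prop = phi + rng.gauss(0.0, steps[0])  # update phi given xi
        new = loglik(ys, prop, xi)
        if accept(rng, new - cur):
            phi, cur = prop, new
        prop = xi + rng.gauss(0.0, steps[1])   # update xi given phi
        new = loglik(ys, phi, prop)
        if accept(rng, new - cur):
            xi, cur = prop, new
        draws.append((phi, xi))
    return draws

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    return math.sqrt(((n - 1) / n * within + between / n) / within)

# Simulated exceedances from a GPD with scale 0.4 and shape 0.1.
data_rng = random.Random(0)
ys = [0.4 / 0.1 * ((1.0 - data_rng.random()) ** -0.1 - 1.0) for _ in range(150)]

chains = [run_chain(ys, 2500, seed)[500:] for seed in (1, 2, 3)]
rhat_phi = gelman_rubin([[d[0] for d in c] for c in chains])
rhat_xi = gelman_rubin([[d[1] for d in c] for c in chains])
print("R-hat for phi:", rhat_phi, "R-hat for xi:", rhat_xi)
```

Values of R-hat close to 1 indicate that the chains agree with one another. The slides use 1.05 as the cutoff.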

Point estimate for log-transformed GPD scale parameter

Point estimate for 25-year return level for daily precipitation

0.025 and 0.975 quantiles of the 25-year return level
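
Uncertainty maps of this kind come directly from the posterior draws: compute the return level for every draw, then read off quantiles. The sketch below does this at a single location. The draws are random stand-ins, not output from the authors' model, and the function names and numeric values other than the 0.55-inch threshold and 214-day season are hypothetical.

```python
import math
import random

def return_level(r, u, phi, xi, zeta, n_per_year):
    """r-year return level with phi = log(scale)."""
    m = r * n_per_year * zeta
    if abs(xi) < 1e-12:
        return u + math.exp(phi) * math.log(m)
    return u + math.exp(phi) / xi * (m ** xi - 1.0)

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

# Stand-in posterior draws of (phi, xi) for one location.
rng = random.Random(42)
draws = [(rng.gauss(-0.9, 0.1), rng.gauss(0.1, 0.05)) for _ in range(5000)]

levels = sorted(return_level(25, 0.55, phi, xi, 0.02, 214)
                for phi, xi in draws)
lower, median, upper = (quantile(levels, q) for q in (0.025, 0.5, 0.975))
print("25-year return level (inches):", lower, median, upper)
```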

Sensitivity analysis
Sensitivity of the inference to the prior
Ran Model 7 with an alternative prior
– Original prior: Unif[6/7, 12]
– Alternative prior: Unif[0.214, 6]
– The posterior of the parameter is sensitive to the prior
– But the product is less sensitive, and it is what is important for interpolation

Conclusions
A Bayesian analysis for spatial extremes
– Model for exceedances
– Model for threshold exceedance rate parameter
By performing the spatial analysis on locations defined by climatological coordinates, the authors were able to better model regional differences for this geographically diverse study area.
The analysis produces a map of return levels with features not well shown by the 1973 atlas
– an east–west region of higher return levels north of the Palmer Divide
– a region of lower return levels around Greeley
– region-wide uncertainty measures