Some tricks for estimating space-time point process models.


Some tricks for estimating space-time point process models. Rick Paik Schoenberg, UCLA.
1. Nonparametric estimation.
2. Parametric estimation of clustering.
3. Separable estimation for covariates.

1. Nonparametric estimation of clustering. Estimate the rate λ(t,x,y,m) = μ(x,y,m) + ∑ g(t-t_i, x-x_i, y-y_i; m_i), from Marsan and Lengliné (2008).

1. Nonparametric estimation. Marsan, D. and Lengliné, O. (2008). Extending earthquakes' reach through cascading. Science 319, 1076-1079.

λ(t,x,y,m) = μ(x,y,m) + ∑ g(t-t_i, x-x_i, y-y_i; m_i).
The idea in Marsan and Lengliné (2008) is to use a kind of E-M algorithm:
0. Start with some naive estimate of μ and g.
1. Using this estimate, compute w_ij = P(earthquake i triggered earthquake j), and w_0j = P(earthquake j was a mainshock).
2. Using w_ij and w_0j, estimate μ and g by MLE.
Repeat steps 1 and 2 until convergence.
With μ constant, the estimate of g in step 2 is simply
g(Δt, Δr, m) = ∑_A w_ij / [N_m S δt],
where A is the set of pairs of points whose inter-event time is ≈ Δt (i.e. t_j - t_i ∈ (Δt - δt, Δt + δt)), whose inter-event distance is ≈ Δr, and with m_i ≈ m; N_m is the number of earthquakes with m_i ≈ m, and S is the area of the annulus with radii Δr ± δr.
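To make the weights concrete, here is a minimal sketch of one E-M pass for a purely temporal simplification (no space or magnitude); the function and its histogram binning are illustrative assumptions, not code from Marsan and Lengliné:

em_pass <- function(t, T, mu, g_hat, breaks) {
  ## One E-M pass for sorted event times t on [0, T].
  ## g_hat: current triggering-rate function; breaks: histogram bins for g.
  n <- length(t)
  w <- matrix(0, n, n)   # w[i, j] = P(event i triggered event j)
  w0 <- numeric(n)       # w0[j]   = P(event j is a background event)
  for (j in 1:n) {
    gi <- numeric(n)
    if (j > 1) gi[1:(j - 1)] <- g_hat(t[j] - t[1:(j - 1)])
    tot <- mu + sum(gi)  # current intensity at t[j]
    w[, j] <- gi / tot
    w0[j] <- mu / tot
  }
  ## M-step: histogram estimate of g, and updated background rate.
  dt <- outer(t, t, function(a, b) b - a)  # dt[i, j] = t[j] - t[i]
  ok <- upper.tri(dt)                      # pairs with i < j
  wsum <- tapply(w[ok], cut(dt[ok], breaks), sum)
  wsum[is.na(wsum)] <- 0                   # empty bins get weight 0
  g_new <- wsum / (n * diff(breaks))       # each of n events can trigger
  list(mu = sum(w0) / T, g = g_new)
}
## Iterate to convergence, rebuilding g_hat from g_new each time, e.g.
## g_hat <- stepfun(breaks, c(0, g_new, 0)), then call em_pass again.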

λ(t,x,y,m) = μ(x,y,m) + ∑ g(t-t_i, x-x_i, y-y_i; m_i). This nonparametric estimate can then be used as a guide to come up with a parametric form for the triggering function g (Fox et al. 2015).

2. Parametric estimation of clustering. The Hawkes process (Hawkes 1971) is a useful form for modeling clustered processes, where λ(t,x,y,m) = μ(x,y,m) + ∑ g(t-t_i, x-x_i, y-y_i; m_i). An example is the Epidemic-Type Aftershock Sequence (ETAS) model of Ogata (1988, 1998), which has been used for earthquakes as well as invasive species (Balderama et al. 2012) and crime (Mohler et al. 2011). With ETAS, μ(x,y,m) = μ(x,y) f(m), and g takes a parametric form such as those shown on the following slides.

Parametric point process estimation is easy?
a. Point processes can generally be estimated very well by maximum likelihood estimation (MLE). MLEs are generally consistent, asymptotically normal, and efficient: Fisher (1925), Cramer (1946), Ogata (1978), Le Cam (1990).
b. Maximizing a function is fast and easy, after roughly 350 years of work since Newton (1669). Every computing package has a strong quasi-Newton minimization routine, such as optim() in R (e.g., with method = "BFGS"). For instance, suppose the function you want to minimize is f(a,b) = (a - 4.19)^4 + (b - 7.23)^4:

f <- function(p) (p[1] - 4.19)^4 + (p[2] - 7.23)^4  # function to minimize
p <- c(1, 1)                                        # initial guess at (a, b)
est <- optim(p, f)$par                              # est should be near (4.19, 7.23)

c. The likelihood function is fairly simple. In practice, it is convenient to compute the log-likelihood and maximize that (equivalently, minimize the negative log-likelihood):
log(likelihood) = ∑ log{λ(t_i, x_i, y_i)} - ∫∫∫ λ(t,x,y) dt dx dy.
For ETAS (Ogata 1998), λ(t,x,y) = µ r(x,y) + ∑ g(t-t_j, x-x_j, y-y_j; M_j), where
g(t,x,y,M) = K (t+c)^-p e^{a(M-M0)} (x^2 + y^2 + d)^-q, or
g(t,x,y,M) = K (t+c)^-p {(x^2 + y^2) e^{-a(M-M0)} + d}^-q.

Parametric point process estimation is easy? ETAS (Ogata 1998): λ(t,x,y) = µ r(x,y) + ∑ g(t-t_j, x-x_j, y-y_j; M_j), where
g(t,x,y,M) = K (t+c)^-p e^{a(M-M0)} (x^2 + y^2 + d)^-q, or
g(t,x,y,M) = K (t+c)^-p {(x^2 + y^2) e^{-a(M-M0)} + d}^-q.
To make the temporal and spatial parts of g densities, I suggest writing g as
g(t,x,y,M) = {K (p-1) c^{p-1} (q-1) d^{q-1} / π} (t+c)^-p e^{a(M-M0)} (x^2 + y^2 + d)^-q, or
g(t,x,y,M) = {K (p-1) c^{p-1} (q-1) d^{q-1} / π} (t+c)^-p {(x^2 + y^2) e^{-a(M-M0)} + d}^-q.
Two reasons for this:
1) It is easy to see how to replace g by another density.
2) For each earthquake (t_j, x_j, y_j, M_j), ∫∫∫ g(t-t_j, x-x_j, y-y_j; M_j) dt dx dy = K e^{a(M_j-M0)}.
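As a sanity check, here is a small R sketch (with arbitrary parameter values) verifying numerically that the first normalized form integrates to K e^{a(M-M0)}:

## Normalized ETAS triggering density (first form above).
g_etas <- function(t, x, y, M, K, c, p, d, q, a, M0) {
  K * (p - 1) * c^(p - 1) * (q - 1) * d^(q - 1) / pi *
    (t + c)^(-p) * exp(a * (M - M0)) * (x^2 + y^2 + d)^(-q)
}
## Arbitrary illustrative parameter values:
K <- 0.5; c <- 0.01; p <- 1.5; d <- 10; q <- 1.8; a <- 1; M0 <- 3; M <- 5
## Temporal part integrates to 1:
integrate(function(t) (p - 1) * c^(p - 1) * (t + c)^(-p), 0, Inf)$value
## Spatial part (in polar coordinates) integrates to 1:
integrate(function(r) (q - 1) * d^(q - 1) / pi * (r^2 + d)^(-q) * 2 * pi * r,
          0, Inf)$value
## So the full space-time integral of g equals K * exp(a * (M - M0)):
K * exp(a * (M - M0))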

Actually, parametric point process estimation is hard! Main obstacles:
A. Small problems with optimization routines: extreme flatness of the likelihood, local maxima, choosing a starting value.
B. The integral term in the log-likelihood,
log(likelihood) = ∑ log{λ(t_i, x_i, y_i)} - ∫∫∫ λ(t,x,y) dt dx dy.
The sum is easy, but the integral term is extremely difficult to compute; it is by far the hardest part of estimation (Ogata 1998, Harte 2010).
Ogata (1998): divide the space around each point into quadrants and integrate over them.
PtProcess (Harte 2010): uses optim() or nlm(), and the user must compute the integral somehow. Numerical approximation is slow and is a poor approximation for some values of θ: 1,000 integration computations × 1,000 events × 1,000 function calls = 1 billion computations.
Problem B contributes to reluctance to repeat the estimation and check on problem A.

Suggestions (Schoenberg 2013):
1. Write the triggering function as a density in time and space.
2. Approximate the integral term as K ∑ e^{a(M_j-M0)}.
3. Given data from time 0 to time T, estimate ETAS progressively, using data up to time T/100, then 2T/100, ..., and assess convergence.

What is behind this integral approximation? Recall that
log(likelihood) = ∑ log{λ(t_i, x_i, y_i)} - ∫∫∫ λ(t,x,y) dt dx dy,
where λ(t,x,y) = µ r(x,y) + ∑ g(t-t_j, x-x_j, y-y_j; M_j).
After writing g as a density, ∫∫∫ g(t-t_j, x-x_j, y-y_j; M_j) dt dx dy = K e^{a(M_j-M0)}, if the integral were over infinite time and space. Technically, in evaluating ∫∫∫ λ(t,x,y) dt dx dy, the integral is only over the time window [0,T] and some spatial region S. I suggest ignoring this and approximating ∫∫∫ g(t-t_j, x-x_j, y-y_j; M_j) dt dx dy ≈ K e^{a(M_j-M0)}. Thus
∫∫∫ λ(t,x,y) dt dx dy = µT + ∫∫∫ ∑ g(t-t_j, x-x_j, y-y_j; M_j) dt dx dy
= µT + ∑ ∫∫∫ g(t-t_j, x-x_j, y-y_j; M_j) dt dx dy
≈ µT + K ∑ e^{a(M_j-M0)},
which does not depend on c, p, q, or d, and is extremely easy to compute.
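Putting this together, here is a sketch of the resulting approximate log-likelihood in R, assuming the normalized g_etas() above, a data frame dat with columns t, x, y, M, a spatially uniform background on a region scaled to unit area, and an illustrative magnitude cutoff M0 = 3; these choices are assumptions for the example:

negloglik <- function(theta, dat, T, M0 = 3) {
  mu <- theta[1]; K <- theta[2]; c <- theta[3]; p <- theta[4]
  d <- theta[5]; q <- theta[6]; a <- theta[7]
  n <- nrow(dat)
  lam <- numeric(n)
  for (i in 1:n) {
    prev <- which(dat$t < dat$t[i])   # only earlier events can trigger event i
    trig <- 0
    if (length(prev) > 0)
      trig <- sum(g_etas(dat$t[i] - dat$t[prev], dat$x[i] - dat$x[prev],
                         dat$y[i] - dat$y[prev], dat$M[prev],
                         K, c, p, d, q, a, M0))
    lam[i] <- mu + trig               # uniform background, unit-area region
  }
  ## Integral term approximated by mu*T + K * sum(exp(a*(M_j - M0))),
  ## which is free of c, p, q, and d:
  -(sum(log(lam)) - (mu * T + K * sum(exp(a * (dat$M - M0)))))
}
## fit <- optim(theta0, negloglik, dat = catalog, T = T_end)
## (In practice one would optimize log-parameters to keep them positive.)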

Simulation and example. 100 ETAS processes, estimated my way.

Simulation and example. % error in µ, from 100 ETAS simulations.

Simulation and example. CA earthquakes from 14 months after Hector Mine, M ≥ 3, from SCSN / SCEDC, as discussed in Ogata, Y., Jones, L. M. and Toda, S. (2003).

Simulation and example. Hector Mine data. Convergence of the ETAS parameters is evident after ~200 days. Note that this repeated estimation over different time windows is easy if one uses the integral trick. It also seems to lead to more stable estimates, since a spurious local maximum is unlikely to appear repeatedly across all of these estimations.

Simulation and example. Hector Mine data. ETAS parameter estimates as ratios of final estimates.

Parametric point process estimation: suggestions (Schoenberg 2013).
1. Write the triggering function as a density in time and space.
2. Approximate the integral term as K ∑ e^{a(M_j-M0)}.
3. Given data from time 0 to time T, estimate ETAS progressively, using data up to time T/100, then 2T/100, ..., and assess convergence (a sketch of this scheme follows).
It's fast, easy, and works great: a huge reduction in run time, and an enormous reduction in programming.
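A sketch of the progressive scheme in step 3, assuming the negloglik() sketch above; the number of windows and the warm-starting are illustrative choices:

fit_progressive <- function(dat, T, theta0, steps = 100) {
  ## Refit on expanding windows [0, k*T/steps] and watch estimates stabilize.
  ests <- matrix(NA, steps, length(theta0))
  for (k in 1:steps) {
    Tk <- k * T / steps
    sub <- dat[dat$t <= Tk, ]
    if (nrow(sub) < 10) next                    # too few events to fit yet
    fit <- optim(theta0, negloglik, dat = sub, T = Tk)
    theta0 <- fit$par                           # warm-start the next window
    ests[k, ] <- fit$par
  }
  ests
}
## Plot each parameter as a ratio of its final estimate, as in the slides:
## matplot(sweep(ests, 2, ests[nrow(ests), ], "/"), type = "l")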

3. Separable point process estimation for covariates: for example, when modeling wildfire incidence as a function of fuel moisture, precipitation, temperature, windspeed, ....

Burning Index (BI). NFDRS: Spread Component (SC) and Energy Release Component (ERC), each based on dozens of equations.
BI = [10.96 × SC × ERC]^0.46.
Uses daily weather variables, drought index, and vegetation info; human interactions are excluded. What does it predict: flame length? Area per fire? Number of fires? Total burn area?
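For concreteness, the BI formula as a one-line R computation; the SC and ERC values below are made up:

SC <- 14; ERC <- 60               # hypothetical spread and energy components
BI <- (10.96 * SC * ERC)^0.46     # Burning Index for this hypothetical day
BI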

Some BI equations (from Pyne et al. 1996):
Rate of spread: R = I_R ξ (1 + φ_w + φ_s) / (ρ_b ε Q_ig).
Oven-dry bulk density: ρ_b = w_0 / δ.
Reaction intensity: I_R = Γ′ w_n h η_M η_s.
Effective heating number: ε = exp(-138/σ).
Optimum reaction velocity: Γ′ = Γ′_max (β/β_op)^A exp[A(1 - β/β_op)].
Maximum reaction velocity: Γ′_max = σ^1.5 (495 + 0.0594 σ^1.5)^-1.
Optimum packing ratio: β_op = 3.348 σ^-0.8189. A = 133 σ^-0.7913.
Moisture damping coef.: η_M = 1 - 2.59 M_f/M_x + 5.11 (M_f/M_x)^2 - 3.52 (M_f/M_x)^3.
Mineral damping coef.: η_s = 0.174 S_e^-0.19 (max = 1.0).
Propagating flux ratio: ξ = (192 + 0.2595 σ)^-1 exp[(0.792 + 0.681 σ^0.5)(β + 0.1)].
Wind factor: φ_w = C U^B (β/β_op)^-E. C = 7.47 exp(-0.133 σ^0.55). B = 0.02526 σ^0.54. E = 0.715 exp(-3.59 × 10^-4 σ).
Net fuel loading: w_n = w_0 (1 - S_T).
Heat of preignition: Q_ig = 250 + 1116 M_f.
Slope factor: φ_s = 5.275 β^-0.3 (tan φ)^2.
Packing ratio: β = ρ_b / ρ_p.
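A direct R transcription of the chain of equations above may help make the dependencies visible; the input fuel and weather values in the example call are made-up placeholders, not calibrated NFDRS fuel-model values:

## Rothermel-style rate of spread, transcribed from the equations above.
spread_rate <- function(sigma, w0, delta, Mf, Mx, ST, Se, h, rho_p, U, tan_phi) {
  rho_b   <- w0 / delta                              # oven-dry bulk density
  beta    <- rho_b / rho_p                           # packing ratio
  beta_op <- 3.348 * sigma^-0.8189                   # optimum packing ratio
  A       <- 133 * sigma^-0.7913
  G_max   <- sigma^1.5 / (495 + 0.0594 * sigma^1.5)  # max reaction velocity
  G       <- G_max * (beta/beta_op)^A * exp(A * (1 - beta/beta_op))
  eta_M   <- 1 - 2.59*(Mf/Mx) + 5.11*(Mf/Mx)^2 - 3.52*(Mf/Mx)^3  # moisture damping
  eta_s   <- min(0.174 * Se^-0.19, 1)                # mineral damping, capped at 1
  w_n     <- w0 * (1 - ST)                           # net fuel loading
  I_R     <- G * w_n * h * eta_M * eta_s             # reaction intensity
  xi      <- exp((0.792 + 0.681*sqrt(sigma)) * (beta + 0.1)) / (192 + 0.2595*sigma)
  C <- 7.47 * exp(-0.133 * sigma^0.55)
  B <- 0.02526 * sigma^0.54
  E <- 0.715 * exp(-3.59e-4 * sigma)
  phi_w   <- C * U^B * (beta/beta_op)^-E             # wind factor
  phi_s   <- 5.275 * beta^-0.3 * tan_phi^2           # slope factor
  Q_ig    <- 250 + 1116 * Mf                         # heat of preignition
  eps     <- exp(-138 / sigma)                       # effective heating number
  I_R * xi * (1 + phi_w + phi_s) / (rho_b * eps * Q_ig)
}
## Made-up inputs, for illustration only:
spread_rate(sigma = 1700, w0 = 0.03, delta = 1, Mf = 0.08, Mx = 0.3,
            ST = 0.055, Se = 0.01, h = 8000, rho_p = 32, U = 400, tan_phi = 0.2)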

Good news for BI: BI is positively associated with wildfire occurrence, with positive correlations with the number of fires, daily area burned, and area per fire. It properly emphasizes windspeed and relative humidity.

Some problems with BI: the correlations are low.
Corr(BI, area burned) = 0.09
Corr(BI, # of fires) = 0.13
Corr(BI, area per fire) = 0.076
Corr(date, area burned) = 0.06
Corr(windspeed, area burned) = 0.159
BI is too high in winter (especially Dec and Jan) and too low in fall (especially Sept and Oct).

Separable estimation for point processes. Consider λ(t, x_1, …, x_k; θ). [For fires, x_1 = location, x_2 = area.]

Separable estimation for point processes. Consider λ(t, x_1, …, x_k; θ). [For fires, x_1 = location, x_2 = area.]
Say λ is multiplicative in mark x_j if
λ(t, x_1, …, x_k; θ) = θ_0 λ_j(t, x_j; θ_j) λ_{-j}(t, x_{-j}; θ_{-j}),
where x_{-j} = (x_1, …, x_{j-1}, x_{j+1}, …, x_k), and similarly for θ_{-j} and λ_{-j}.
If λ is multiplicative in x_j and one of the following holds, then θ̂_j, the partial MLE, is consistent (Schoenberg 2015):
∫ λ_{-j}(t, x_{-j}; θ_{-j}) dμ_{-j} = γ, for all θ_{-j};
∫ λ_j(t, x_j; θ_j) dμ_j = γ, for all θ_j;
∫ λ̂(t, x; θ) dμ = ∫ λ̃_j(t, x_j; θ_j) dμ_j = γ, for all θ.

Individual covariates: Suppose λ is multiplicative, λ_j(t, x_j; θ_j) = f_1[X(t,x_j); β_1] f_2[Y(t,x_j); β_2], X and Y are independent, and the log-likelihood is differentiable w.r.t. β_1; then the partial MLE of β_1 is consistent.
Suppose instead that λ is multiplicative and the jth component is additive: λ_j(t, x_j; θ_j) = f_1[X(t,x_j); β_1] + f_2[Y(t,x_j); β_2]. If f_2 is small [∫ f_2(Y; β_2)^2 / f_1(X; β_1) dμ / T →p 0], then the partial MLE β̂_1 is consistent.
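As an illustration of a partial MLE in practice, a minimal sketch assuming a loglinear single-covariate component λ_j(t; β) = exp(β_1 + β_2 X(t)) and a covariate series sampled on a regular grid; both are assumptions for the example:

## Partial negative log-likelihood for one covariate, fit on its own.
## X_events: covariate values at the observed event times;
## X_grid: covariate sampled on a fine regular grid over [0, T].
partial_negloglik <- function(beta, X_events, X_grid, T) {
  term1 <- sum(beta[1] + beta[2] * X_events)          # sum of log-intensities
  term2 <- mean(exp(beta[1] + beta[2] * X_grid)) * T  # approximate integral term
  -(term1 - term2)
}
## beta_hat <- optim(c(0, 0), partial_negloglik,
##                   X_events = X_at_events, X_grid = X_series, T = T_end)$par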

Impact: model building; model evaluation / dimension reduction; excluded variables.

[Figure: scatterplot of daily burn area (sq m) against a covariate; r = 0.16.]

[Figure: burn area (sq m) vs. temperature (°F).]

Model construction. Wildfire incidence seems roughly multiplicative (only marginally significant in the separability test): windspeed; RH, temperature, precipitation; tapered Pareto size distribution f; smooth spatial background μ.
[*] λ(t,x,a) = f(a) μ(x) β_1 exp(β_2 RH + β_3 WS) (β_4 + β_5 Temp) max{β_6 - β_7 Precip, β_8}.
Relative AICs (AIC_Poisson - AIC_model, so higher is better):
RH: 262.9 | BI: 302.7 | Model [*]: 601.1
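As a sketch, model [*]'s intensity transcribed directly into R; f_a, mu_x, and beta stand in for the fitted size density, spatial background, and coefficients, all hypothetical placeholders here:

## Intensity of model [*] at given covariate values.
lambda_star <- function(RH, WS, Temp, Precip, f_a, mu_x, beta) {
  f_a * mu_x * beta[1] * exp(beta[2] * RH + beta[3] * WS) *
    (beta[4] + beta[5] * Temp) * pmax(beta[6] - beta[7] * Precip, beta[8])
}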

Comparison of predictive efficacy (false alarms per year vs. % of fires correctly alarmed):
BI threshold 150: 32 false alarms/yr; BI alarms 22.3% of fires, Model [*] alarms 34.1%.
BI threshold 200: 13 false alarms/yr; BI alarms 8.2% of fires, Model [*] alarms 15.1%.

Earthquake weather example (Schoenberg 2015). [Figure: fitted model with temperature vs. model without temperature.]

Conclusions:
1. Start with nonparametric estimates of the triggering function and background rate.
2. For a parametric model with clustering, write the triggering function g as a density times K, so that you can estimate by MLE while essentially approximating the integral of g by K.
3. Build initial point process models by examining each covariate individually.