Estimating ETAS

1. Straightforward aspects.
2. Main obstacles.
3. A trick to simplify things enormously.
4. Simulations and examples.

1. Point process estimation (or inversion) is easy.

a) Point processes can generally be estimated very well by maximum likelihood estimation (MLE). MLEs are generally consistent, asymptotically normal, and efficient. Fisher (1925), Cramér (1946), Ogata (1978), Le Cam (1990).

b) Maximizing a function is fast and easy: 400 years of work on this since Newton (1669). Every computer package has a strong quasi-Newton optimization routine, such as optim() in R. For instance, suppose the function you want to minimize is f(a,b) = (a − 4.19)^4 + (b − 7.23)^4:

f = function(p) (p[1] - 4.19)^4 + (p[2] - 7.23)^4  ## function to minimize
p = c(1, 1)                                        ## initial guess at (a, b)
est = optim(p, f)                                  ## est$par is near (4.19, 7.23)

c) The likelihood function is fairly simple. In practice it is a little easier to compute the log-likelihood and minimize its negative (optim() minimizes by default):

log(likelihood) = ∑ log{λ(t_i, x_i, y_i)} − ∫∫∫ λ(t, x, y) dt dx dy.

For ETAS (Ogata 1998),

λ(t, x, y) = µ ρ(x, y) + ∑ g(t − t_j, x − x_j, y − y_j; M_j),

where

g(t, x, y; M) = K (t + c)^(−p) e^(a(M − M₀)) (x² + y² + d)^(−q)
or
g(t, x, y; M) = K (t + c)^(−p) {(x² + y²) e^(−a(M − M₀)) + d}^(−q).
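
As a concrete illustration, here is a minimal R sketch of the conditional intensity λ at a point, using the first form of g above. The catalog vectors tt, xx, yy, mm (times, locations, magnitudes) and a uniform background density rho are assumptions of this sketch, not part of the slides:

lambda = function(t, x, y, tt, xx, yy, mm, mu, K, c, p, a, d, q, M0, rho) {
  past = which(tt < t)                  ## only earlier events can trigger
  g = K * (t - tt[past] + c)^(-p) *
      exp(a * (mm[past] - M0)) *
      ((x - xx[past])^2 + (y - yy[past])^2 + d)^(-q)
  mu * rho + sum(g)                     ## background plus triggered intensity
}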

1. Point process estimation (or inversion) is easy (continued).

For ETAS (Ogata 1998),

λ(t, x, y) = µ ρ(x, y) + ∑ g(t − t_j, x − x_j, y − y_j; M_j),

where g(t, x, y; M) = K (t + c)^(−p) e^(a(M − M₀)) (x² + y² + d)^(−q)
or K (t + c)^(−p) {(x² + y²) e^(−a(M − M₀)) + d}^(−q).

To make the temporal and spatial parts of g densities, I suggest writing g as

g(t, x, y; M) = {K (p − 1) c^(p−1) (q − 1) d^(q−1) / π} (t + c)^(−p) e^(a(M − M₀)) (x² + y² + d)^(−q)
or
g(t, x, y; M) = {K (p − 1) c^(p−1) (q − 1) d^(q−1) / π} (t + c)^(−p) {(x² + y²) e^(−a(M − M₀)) + d}^(−q).

Two reasons for this:
1) It is easy to see how to replace g by another density.
2) For each earthquake (t_j, x_j, y_j, M_j),
∫∫∫ g(t − t_j, x − x_j, y − y_j; M_j) dt dx dy = K e^(a(M_j − M₀)).
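
As a quick numerical sanity check (not part of the slides), the temporal and spatial factors of the first form of g should each integrate to 1, so the full triple integral over infinite time and space reduces to K e^(a(M − M₀)). The parameter values below are arbitrary, and c0 stands for the parameter c:

c0 = 0.01; p = 1.2; d = 0.015; q = 1.5    ## arbitrary values with p > 1, q > 1
temporal = integrate(function(t) (p - 1) * c0^(p - 1) * (t + c0)^(-p),
                     lower = 0, upper = Inf)$value
spatial = integrate(function(r) (q - 1) * d^(q - 1) / pi *
                    (r^2 + d)^(-q) * 2 * pi * r,   ## radial (polar) form
                    lower = 0, upper = Inf)$value
c(temporal, spatial)                      ## both should be ~1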

2. Point process estimation is hard! Main obstacles:

a) Small problems with optimization routines: extreme flatness, local maxima, choosing a starting value.

b) The integral term in the log-likelihood,

log(likelihood) = ∑ log{λ(t_i, x_i, y_i)} − ∫∫∫ λ(t, x, y) dt dx dy.

The sum is easy, but the integral term is extremely difficult to compute; it is by far the hardest part of estimation (Ogata 1998, Harte 2010).

Ogata (1998): divide the space around each point into quadrants and integrate over them.
Harte (2010): PtProcess uses optim() or nlm(), and the user must calculate the integral somehow.

Numerical approximation is slow and a poor approximation for some values of p, q, etc.: 1,000 computations × 1,000 events × 1,000 function calls = 1 billion computations, as the naive sketch below illustrates.

Problem b) contributes to a reluctance to repeat the estimation and check on a).
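
To see where the cost comes from, here is a deliberately naive grid approximation of the integral term (a sketch for illustration only, not Ogata's or Harte's method). It assumes catalog vectors tt, xx, yy, mm on a window [0, Tmax] × S, and a triggering function g(t, x, y, m, theta) that returns 0 for t < 0:

integral_term = function(tt, xx, yy, mm, g, theta, Tmax, S,
                         nt = 100, nxy = 100) {
  tg = seq(0, Tmax, length.out = nt)              ## time grid
  xg = seq(S$xmin, S$xmax, length.out = nxy)      ## spatial grid
  yg = seq(S$ymin, S$ymax, length.out = nxy)
  cell = diff(tg)[1] * diff(xg)[1] * diff(yg)[1]  ## volume of one grid cell
  total = 0
  for (j in seq_along(tt))                        ## one full pass per event...
    for (t in tg) for (x in xg) for (y in yg)     ## ...over the whole grid
      total = total + g(t - tt[j], x - xx[j], y - yy[j], mm[j], theta) * cell
  total  ## the background contributes mu * Tmax on top of this
}

Every call the optimizer makes to the likelihood repeats this entire triple loop, which is where the billion-computation figure above comes from.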

3. A suggested trick.

Recall that

log(likelihood) = ∑ log{λ(t_i, x_i, y_i)} − ∫∫∫ λ(t, x, y) dt dx dy,

where λ(t, x, y) = µ ρ(x, y) + ∑ g(t − t_j, x − x_j, y − y_j; M_j).

After normalizing g, ∫∫∫ g(t − t_j, x − x_j, y − y_j; M_j) dt dx dy = K e^(a(M_j − M₀)) if the integral were over infinite time and space. Technically, in evaluating ∫∫∫ λ(t, x, y) dt dx dy, the integral is only over the time window [0, T] and some spatial region S. I suggest ignoring this and approximating

∫∫∫ g(t − t_j, x − x_j, y − y_j; M_j) dt dx dy ≈ K e^(a(M_j − M₀)).

Thus

∫∫∫ λ(t, x, y) dt dx dy = µT + ∫∫∫ ∑ g(t − t_j, x − x_j, y − y_j; M_j) dt dx dy
= µT + ∑ ∫∫∫ g(t − t_j, x − x_j, y − y_j; M_j) dt dx dy
≈ µT + K ∑ e^(a(M_j − M₀)),

which does not depend on c, p, q, or d, and is extremely easy to compute.
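
Putting the pieces together, here is a minimal R sketch of the resulting approximate negative log-likelihood, under the same assumed catalog vectors tt, xx, yy, mm and uniform background rho as in the earlier sketch. The transformed parameter scale (to keep p, q > 1 and the rest positive) is an assumption of the sketch, not part of the slides:

neg_loglik = function(theta, tt, xx, yy, mm, Tmax, M0, rho) {
  mu = exp(theta[1]); K = exp(theta[2]); c1 = exp(theta[3])
  p = 1 + exp(theta[4])                      ## p > 1 so the time density is proper
  a = exp(theta[5]); d = exp(theta[6])
  q = 1 + exp(theta[7])                      ## q > 1 likewise
  lam = sapply(seq_along(tt), function(i) {  ## intensity at each observed event
    past = which(tt < tt[i])
    g = K * (p - 1) * c1^(p - 1) * (q - 1) * d^(q - 1) / pi *
        (tt[i] - tt[past] + c1)^(-p) * exp(a * (mm[past] - M0)) *
        ((xx[i] - xx[past])^2 + (yy[i] - yy[past])^2 + d)^(-q)
    mu * rho + sum(g)
  })
  integral = mu * Tmax + K * sum(exp(a * (mm - M0)))  ## the suggested approximation
  -(sum(log(lam)) - integral)
}
est = optim(rep(0, 7), neg_loglik, tt = tt, xx = xx, yy = yy, mm = mm,
            Tmax = Tmax, M0 = M0, rho = rho)   ## est$par is on the transformed scale

Each likelihood call now costs roughly n²/2 pairwise terms for the sum plus one pass over the magnitudes for the integral, instead of a three-dimensional numerical integral per event.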

4. Simulation and example. [Figure: 100 simulated ETAS processes, estimated my way.]

4. Simulation and example. [Figure: % error in µ, from 100 ETAS simulations.]

4. Simulation and example. [Figure: CA earthquakes from the 14 months after Hector Mine, M ≥ 3, from SCSN/SCEDC, as discussed in Ogata, Y., Jones, L. M. and Toda, S. (2003).]

4. Simulation and example. Hector Mine data. Convergence of the ETAS parameter estimates is evident after ~200 days. Note that this repeated estimation over different time windows is easy if one uses the integral trick, and it seems to lead to more stable estimates, since a small local maximum is unlikely to appear repeatedly in all of these estimations.

4. Simulation and example. Hector Mine data. [Figure: ETAS parameter estimates as ratios of the final estimates.]

Suggestions.

1. Write the triggering function g as a density in time and space.
2. Approximate the integral term in the log-likelihood as µT + K ∑ e^(a(M_j − M₀)).
3. Given data from time 0 to time T, estimate ETAS using data up to time T/100, then 2T/100, etc., and assess convergence, as sketched below.

Conclusions. It is fast, easy, and works great: a huge reduction in run time and an enormous reduction in programming.
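
A minimal sketch of suggestion 3, where fit_etas() is a hypothetical wrapper assumed to run optim() on the approximate likelihood sketched earlier:

windows = Tmax * (1:100) / 100   ## T/100, 2T/100, ..., T
fits = lapply(windows, function(Tk) {
  keep = tt <= Tk                ## events observed by time Tk
  fit_etas(tt[keep], xx[keep], yy[keep], mm[keep],  ## hypothetical wrapper
           Tmax = Tk, M0 = M0)
})
## plotting each estimate against window length (e.g., as a ratio of the
## final estimate, as in the figure above) makes convergence easy to assess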