Stability and accuracy of the EM methodology

In general, the EM methodology yields parameter estimates that are extremely close to those obtained by direct maximization of the log-likelihood function. In addition, the EM methodology is extremely robust and depends very little on the starting values. The figure on the left shows histograms for the six parameters of the triggering function, based on 100 estimations of the ETAS model with random starting values drawn from a uniform distribution between one third and three times the true parameter value. The histograms show that the EM estimates of the ETAS parameters are largely insensitive to the starting values.

Note that the robustness of the EM methodology allows for simulation studies in which the standard errors of the estimates can also be assessed. The theoretical properties of the estimators in the "direct ML" context can be quite difficult to derive and are usually only asymptotic; the validity of these asymptotic properties for a given (limited) data set is mostly unclear. The figure on the left shows such a simulation study: ETAS processes were simulated and re-estimated 100 times. The histograms show that the estimates are very close to the 'true' values and give a sense of the standard errors involved in this estimation procedure. This shows that the EM methodology is not only robust but also very accurate.
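As a minimal sketch of the starting-value scheme described above (in Python, rather than the R used for the poster's computations), the random starting values can be drawn as follows. The parameter values for K_0 and a are illustrative placeholders, not values from the poster:

```python
import numpy as np

rng = np.random.default_rng(0)

# 'True' triggering parameters: c, p, d, q are the poster's values;
# K0 and a are illustrative placeholders.
theta_true = {"K0": 0.5, "a": 1.0, "c": 0.01, "p": 1.5, "d": 0.015, "q": 1.8}

def random_start(theta, rng):
    """Draw each starting value uniformly between one third and three
    times the true parameter value, as in the robustness study above."""
    return {k: rng.uniform(v / 3.0, 3.0 * v) for k, v in theta.items()}

starts = [random_start(theta_true, rng) for _ in range(100)]
```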
Estimation of spatial-temporal point process models using the (stochastic) EM algorithm and its application to California earthquakes
Alejandro Veen and Frederic Paik Schoenberg

ETAS estimation using the (stochastic) Expectation Maximization (EM) algorithm

The ETAS model can be viewed as an incomplete data problem:
1. If we knew the "missing" data, all model parameters could be estimated easily (M-step).
2. If we knew the model parameters, we could stochastically reconstruct the "missing" data, or compute expectations (stochastic reconstruction step or E-step).

The E-step (or stochastic reconstruction step) is performed using the current parameter vector θ_s (at step s of the algorithm) to compute the probability vector below (the triggering function is denoted g). The M-step updates θ.

Stochastic EM
1. Stochastic reconstruction step: stochastically reconstruct the missing data using θ_s.
2. Maximization step: θ_{s+1} = argmax_θ loglik(θ | observable data, reconstructed missing data).

The stochastic EM algorithm may be more intuitive, as it actually reconstructs the part of the data that is "missing". Using θ_s, the probability that a given earthquake i was triggered by a preceding earthquake j (or the probability that earthquake i is a background event) can be computed. This probability vector is used to randomly assign earthquake i to a triggering "parent" earthquake, or to classify it as a background event. Note that this idea is similar to what is called "stochastic reconstruction" in Zhuang, Ogata, and Vere-Jones (2004).
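A minimal Python sketch of the E-step probability vector and the stochastic parent assignment described above. The function and argument names are illustrative; `mu` is the background rate at earthquake i's location and `g` is the triggering function:

```python
import numpy as np

def triggering_probs(i, t, x, y, m, mu, g):
    """E-step for earthquake i: probability that it is a background event
    (index 0) or was triggered by each preceding earthquake j < i
    (index j + 1), computed under the current parameters."""
    rates = [mu] + [g(t[i] - t[j], x[i] - x[j], y[i] - y[j], m[j])
                    for j in range(i)]
    rates = np.array(rates)
    return rates / rates.sum()

def sample_parent(probs, rng):
    """Stochastic reconstruction step: randomly assign earthquake i to a
    parent (0 means background) according to the probability vector."""
    return rng.choice(len(probs), p=probs)
```

In the non-stochastic EM, the same probability vector is used directly as weights in the expected log-likelihood instead of being sampled from.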
EM
1. Expectation step: using θ_s, compute the expectation of the sufficient statistics of the complete data (observable and reconstructed missing).
2. Maximization step: θ_{s+1} = argmax_θ E_{θ_s}[ loglik(θ | observable data, reconstructed missing data) ].

While less intuitive, the non-stochastic version of the EM algorithm has considerable advantages. Here, the probability vector is used to compute the expected log-likelihood function, thus eliminating the random fluctuations of the stochastic EM. It can be shown that the EM algorithm provides results that are asymptotically equivalent to direct maximization of the log-likelihood function.

Maximization of the partial log-likelihood function

From a theoretical point of view, full-information maximization of the log-likelihood function is preferable, as it is an efficient estimation procedure. In practice, however, with limited data sets and limited computational resources, maximizing the partial log-likelihood based on the triggering function is not only computationally less expensive but often also more accurate, especially if 'nice' expressions for the partial derivatives exist. Asymptotically, both approaches yield the same results.

To illustrate the computational advantages of the EM algorithm using the partial log-likelihood, recall that direct maximization of the log-likelihood function requires all 7 parameters to be estimated at the same time. Our proposed methodology breaks the estimation down into separate steps:
1. Find μ
2. Find c and p
3. Find d and q
4. Find a
5. Find K_0

Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant No. We thank Yan Kagan, Ilya Zaliapin, and Yingnian Wu for helpful comments, and the Southern California Earthquake Center for its generosity in sharing its data.
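The alternation of E-step and sequential M-step updates can be sketched as a generic loop. This is only a skeleton, assuming hypothetical `e_step` and `update` callables standing in for the poster's sub-steps (μ; c and p; d and q; a; K_0), not the authors' actual implementation:

```python
def em_fit(theta, data, e_step, updates, n_iter=50):
    """Generic EM-style loop: compute expected triggering probabilities,
    then apply each parameter update in sequence, and repeat.
    `e_step(theta, data)` and each `update(theta, probs, data)` are
    user-supplied callables (illustrative names)."""
    for _ in range(n_iter):
        probs = e_step(theta, data)   # expectation / reconstruction step
        for update in updates:        # sequential M-step: mu, (c,p), (d,q), a, K0
            theta = update(theta, probs, data)
    return theta
```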
All computations have been performed using R.

References
Dempster, A., Laird, N., and Rubin, D. (1977), Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, 39(1), 1-38.
Ogata, Y. (1988), Statistical models for earthquake occurrences and residual analysis for point processes, Journal of the American Statistical Association, 83, 9-27.
Ogata, Y. (1998), Space-time point-process models for earthquake occurrences, Annals of the Institute of Statistical Mathematics, 50, 379-402.
Sornette, D. and Werner, M.J., Apparent clustering and apparent background earthquakes biased by undetected seismicity, J. Geophys. Res.
Zhuang, J., Ogata, Y., and Vere-Jones, D. (2002), Stochastic declustering of space-time earthquake occurrences, Journal of the American Statistical Association, 97(458), 369-380.
Zhuang, J., Ogata, Y., and Vere-Jones, D. (2004), Analyzing earthquake clustering features by using stochastic reconstruction, J. Geophys. Res.

UCLA Department of Statistics, 8125 Math Sciences Bldg., Los Angeles, CA, USA

Epidemic-type aftershock sequence (ETAS) model

Introduced by Ogata (1988), the ETAS model has become the standard point process model for earthquake occurrences. A range of temporal and spatial-temporal specifications exist with varying degrees of complexity. In this work, the following spatial-temporal ETAS model from Ogata (1998, p. 384) is used:

λ(x,y,t) = μ(x,y) + Σ_{i: t_i < t} K_0 e^{a(m_i − m_0)} / [ (t − t_i + c)^p ((x − x_i)^2 + (y − y_i)^2 + d)^q ],

where λ(x,y,t) is the conditional intensity of the process at location (x,y) and time t. The background intensity is denoted μ(x,y), and the earthquake occurrences i are indexed in time order, so that t_i ≤ t_{i+1}. The parameters of the triggering function are K_0, a, c, p, d, and q, and only earthquakes with magnitudes not smaller than m_0 are included in the data set.

ETAS estimation using Maximum Likelihood

Estimation is usually performed using the Maximum Likelihood (ML) method.
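A minimal Python sketch of the conditional intensity above. For simplicity it assumes a constant background rate `mu` (the poster's μ(x,y) is spatially varying); `events` holds rows (t_i, x_i, y_i, m_i):

```python
import numpy as np

def etas_intensity(x, y, t, events, mu, K0, a, c, p, d, q, m0):
    """Conditional intensity lambda(x, y, t) of the spatial-temporal ETAS
    model: constant background rate plus the triggering contributions of
    all earthquakes that occurred before time t."""
    lam = mu
    for ti, xi, yi, mi in events:
        if ti < t:
            g = (K0 * np.exp(a * (mi - m0))
                 / ((t - ti + c) ** p
                    * ((x - xi) ** 2 + (y - yi) ** 2 + d) ** q))
            lam += g
    return lam
```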
Closed-form solutions for the maximum rarely exist, as the log-likelihood function is typically highly non-linear. The ML approach therefore employs numerical maximization algorithms (or rather minimization algorithms, with the negative log-likelihood (nll) as the objective function). The use of ML is backed by extensive theoretical work: ML estimators are usually (asymptotically) unbiased, and the procedure allows for full-information estimation. On the other hand, some specifications of the ETAS model have become quite complex, which brings challenges when employing ML. For instance, reasonable starting values are needed, and the algorithms can be quite slow for complex models and large data sets, as the log-likelihood function can be flat or multimodal (or both). Moreover, due to the non-linearity of the nll and the lack of simple derivatives, all parameters have to be estimated at once.

These figures show the log-likelihood of the model λ(x,y,t), varying two parameters at a time and using the 'true' parameter values for the other parameters (see table on the right). The 'true' parameter values (red dot) are largely based on discussions with seismologists. Background events of a homogeneous Poisson process are simulated over a period of about 20 years and an area of 8°×5°, which is roughly the size of Southern California. Magnitudes are simulated from a truncated exponential distribution with values between 2 and 8.

The formula of the log-likelihood is

loglik(θ) = Σ_{i=1}^{N} log λ(x_i, y_i, t_i; θ) − ∫_S λ(x, y, t; θ) dx dy dt,

where θ is the parameter vector (i.e. θ = (μ, K_0, a, c, p, d, q)), N is the number of earthquakes in the data set, and S is the space-time window in which earthquakes occur (i.e. S = [x_l, x_r] × [y_l, y_r] × [t_l, t_r]). Note that only two parameters can be shown at once in each of these pictures, whereas the log-likelihood function takes values over a 7-dimensional parameter space and all 7 parameters have to be estimated at once.
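The magnitude simulation described above can be sketched by inverse-transform sampling from the truncated exponential distribution on [2, 8]. The rate parameter `beta` is an assumed Gutenberg-Richter-type value for illustration, not one given in the poster:

```python
import numpy as np

def truncated_exp_magnitudes(n, beta, m_lo=2.0, m_hi=8.0, rng=None):
    """Simulate n magnitudes from an exponential distribution with rate
    beta, truncated to [m_lo, m_hi], via the analytically inverted CDF."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(size=n)
    z = 1.0 - np.exp(-beta * (m_hi - m_lo))  # normalizing mass on [m_lo, m_hi]
    return m_lo - np.log(1.0 - u * z) / beta
```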
Moreover, the log-likelihood function is relatively flat, and there are "interaction effects": for instance, d=0.08 and q=2.3, while much farther from the 'true' parameters d=0.015 and q=1.8, yield a higher log-likelihood value than, say, d=0.014 and q=1.9.

"True" parameter values:
μ(x,y): —
K_0: —
a: —
c: 0.01
p: 1.5
d: 0.015
q: 1.8

[Diagram: observable data + "missing" data = complete data]

The expected partial log-likelihood function is not as flat and has fewer "interactions" than the full-information likelihood function above. Thus, it is easier to estimate the parameters.

Estimation of K_0
The figure on the left shows that the EM algorithm (blue) performs better when estimating K_0 than a direct maximization of the log-likelihood function (orange). Using the same starting values (black circles), the EM results coincide with one another and are much closer to the true value (red) than the results of the ML procedure.

Results from Southern California
The picture on the left shows a data set compiled by the Southern California Earthquake Center (SCEC). We focus here on the spatial locations of a subset of the SCEC data occurring between 1/01/1984 and 06/17/2004 in a rectangular area around Los Angeles, California, between longitudes -122° and -114° and latitudes 32° and 37° (approximately 733 km by 556 km). The data set consists of the 6,796 earthquakes with magnitude not smaller than 3.0. The estimation results using the EM methodology are given on the right:

Estimated parameter values:
μ(x,y): —
K_0: —
a: —
c: —
p: 1.215
d: 4.833e-05
q: 1.467