1 Estimation & Inference for Point Processes
1. MLE
2. K-function & variants
3. Residual methods
4. Separable estimation
5. Separability tests

2 Maximum Likelihood Estimation: For a space-time point process N, the log-likelihood function is given by
log L = ∫₀ᵀ ∫_S log λ(t,x) dN − ∫₀ᵀ ∫_S λ(t,x) dt dx.
Why? Consider the case where N is a Poisson process observed in time only. [Sketch: timeline with points at t₁, t₂, ..., t₆ in [0, T].]
L = P(points at t₁, t₂, t₃, ..., tₙ, and no others in [0,T])
= P(pt at t₁) × P(pt at t₂) × ... × P(pt at tₙ) × P{no others in [0,T]}
= λ(t₁) × λ(t₂) × ... × λ(tₙ) × P{no others in [0,t₁)} × ... × P{no others in [tₙ,T]}
= λ(t₁) × ... × λ(tₙ) × exp{−∫₀^{t₁} λ(u) du} × ... × exp{−∫_{tₙ}^T λ(u) du}
= ∏ λ(tᵢ) × exp{−∫₀ᵀ λ(u) du}.
So log L = ∑ log λ(tᵢ) − ∫₀ᵀ λ(u) du.
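As a quick sanity check on this formula (my sketch, not from the slides), the temporal Poisson log-likelihood can be evaluated numerically; the rate function and event times below are made up for illustration.

```python
import numpy as np
from scipy.integrate import quad

def poisson_loglik(times, rate, T):
    """log L = sum_i log rate(t_i) - integral_0^T rate(u) du."""
    sum_log = sum(np.log(rate(t)) for t in times)
    integral, _ = quad(rate, 0.0, T)  # numerical integral of the rate
    return sum_log - integral

rate = lambda t: 1.0 + 0.5 * t          # hypothetical increasing rate
times = [0.7, 1.9, 2.3, 3.1, 4.8, 5.5]  # made-up event times in [0, 6]
print(poisson_loglik(times, rate, T=6.0))
```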

3 log L = ∫₀ᵀ ∫_S log λ(t,x) dN − ∫₀ᵀ ∫_S λ(t,x) dt dx.
Here λ(t,x) is the conditional intensity. The case where the Papangelou intensity λₚ(t,x) is used instead is called the pseudo-likelihood.
When λ depends on parameters θ, so does L:
log L(θ) = ∫₀ᵀ ∫_S log λ(t,x; θ) dN − ∫₀ᵀ ∫_S λ(t,x; θ) dt dx.
Maximum Likelihood Estimation (MLE): find the value of θ that maximizes L(θ). (In practice, by finding the value that minimizes −log L(θ).)
Example: stationary Poisson process with rate λ(t,x) = μ.
log L(μ) = ∫₀ᵀ ∫_S log λ(t,x) dN − ∫₀ᵀ ∫_S λ(t,x) dt dx = n log(μ) − μST.
d log L(μ)/dμ = n/μ − ST, which = 0 when μ = n/(ST).
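In practice the minimization of −log L(θ) is done numerically. A minimal sketch (mine, not the slides') for the stationary Poisson example, where the answer should agree with the closed form μ̂ = n/(ST):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglik(mu, n, S, T):
    # For lambda(t,x) = mu: log L = n log(mu) - mu*S*T.
    return -(n * np.log(mu) - mu * S * T)

n, S, T = 120, 10.0, 5.0  # made-up point count, spatial area, time horizon
res = minimize_scalar(neg_loglik, bounds=(1e-9, 100.0), args=(n, S, T),
                      method="bounded")
print(res.x, n / (S * T))  # numerical MLE vs. closed form n/(ST)
```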

4 Under somewhat general conditions, θ̂ is consistent, asymptotically normal, and asymptotically efficient (see e.g. Ogata 1978, Rathbun 1994). Similarly for pseudo-likelihoods (Baddeley 2001).
Important counter-examples: λ(t) = α + βt, and λ(t) = exp{α + βt} (for β < 0).
Other problems with MLE: Bias can be substantial, e.g. Matérn I, where θ̂ = min{||(xᵢ,yᵢ) − (xⱼ,yⱼ)||}. Optimization is tricky: it requires an initial parameter estimate and a tolerance threshold; it can fail to converge; it can converge to a local maximum, etc.
Nevertheless, MLE and pseudo-MLE are the only commonly used methods for fitting point process models.
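A common guard against the convergence issues above (an illustration of standard practice, not a method from the slides) is to restart the optimizer from several initial values and keep the best result:

```python
import numpy as np
from scipy.optimize import minimize

def best_of_restarts(neg_loglik, starts, **kwargs):
    """Run the optimizer from several starting points; return the best fit."""
    fits = [minimize(neg_loglik, x0, **kwargs) for x0 in starts]
    converged = [f for f in fits if f.success]
    return min(converged or fits, key=lambda f: f.fun)

# Toy bimodal objective standing in for a -log L surface with local optima.
neg_loglik = lambda x: (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]
starts = [np.array([s]) for s in (-2.0, -0.5, 0.5, 2.0)]
print(best_of_restarts(neg_loglik, starts, method="Nelder-Mead").x)
```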

5 K-function & Variations: The usual K-function, for spatial processes only (Ripley 1978):
Assume the null hypothesis that N is stationary Poisson, with constant rate λ.
K(h) = (1/λ) E[# of pts within distance h of a given pt], estimated via
K̂(h) = (1/λ̂) [∑∑_{i≠j} I(||(xᵢ,yᵢ) − (xⱼ,yⱼ)|| ≤ h) / n], where λ̂ = n/S.
Under the null hypothesis, K(h) = (1/λ)[λπh²] = πh².
Higher K indicates more clustering; lower K indicates inhibition.
Centered version: L(h) = √[K(h)/π] − h. L > 0 indicates clustering, L < 0 indicates inhibition.
Version based on nearest neighbors only (J-function): J(h) = [1 − G(h)] / [1 − F(h)], where G(h) = Pr{nearest neighbor of a given point is within distance h} and F(h) is the corresponding empty-space probability.
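A bare-bones version of this estimator (my sketch; it omits the edge corrections any serious analysis would use):

```python
import numpy as np

def ripley_k(xy, area, h):
    """K_hat(h) = (1/lam_hat) * sum_{i != j} I(d_ij <= h) / n, lam_hat = n/area.
    No edge correction; for illustration only."""
    xy = np.asarray(xy)
    n = len(xy)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)  # exclude i == j pairs
    lam_hat = n / area
    return (d <= h).sum() / n / lam_hat

rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(200, 2))  # simulated homogeneous points
print(ripley_k(xy, area=100.0, h=1.0), np.pi * 1.0 ** 2)  # compare to pi*h^2
```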

6 [figure slide]

7 K-function & Variations: Weighted K-function (Baddeley, Møller and Waagepetersen 2002; Veen 2006):
Null hypothesis is a general conditional intensity λ(x,y).
Weight each point (xᵢ,yᵢ) by a factor of wᵢ = λ(xᵢ,yᵢ)⁻¹. Instead of the estimated K-function
K̂(h) = S ∑∑_{i≠j} I(||(xᵢ,yᵢ) − (xⱼ,yⱼ)|| ≤ h) / n²,
use K̂_w(h) = S ∑∑_{i≠j} wᵢwⱼ I(||(xᵢ,yᵢ) − (xⱼ,yⱼ)|| ≤ h) / n², where wᵢ = λ(xᵢ,yᵢ)⁻¹.
Asymptotically normal, under certain regularity conditions (Veen 2006).
Centered version: L̂_w(h) = √[K̂_w(h)/π] − h, for R².
L̂_w(h) > 0 indicates more weight in clusters within distance h than expected according to the model for λ(x,y) ==> λ(x,y) is too low in clusters; that is, the model does not adequately capture the clustering in the data.
L̂_w(h) < 0 indicates λ(x,y) is too high, for points within distance h; the model overestimates the clustering in the data (or underestimates inhibition).
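The weighted version is the same computation with the reciprocal model intensities as pair weights (again my sketch, same caveats):

```python
import numpy as np

def weighted_k(xy, lam_model, area, h):
    """K_w_hat(h) = S * sum_{i != j} w_i w_j I(d_ij <= h) / n^2,
    with w_i = 1 / lam_model(x_i, y_i)."""
    xy = np.asarray(xy)
    n = len(xy)
    w = 1.0 / np.array([lam_model(x, y) for x, y in xy])
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    return area * ((w[:, None] * w[None, :]) * (d <= h)).sum() / n ** 2

def centered_l(k_hat, h):
    """L_w_hat(h) = sqrt(K_w_hat(h)/pi) - h; > 0 suggests unmodeled clustering."""
    return np.sqrt(k_hat / np.pi) - h

# Usage: k = weighted_k(xy, lam_model, area, h); l = centered_l(k, h)
```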

8 These statistics can be used for estimation as well as testing: given a class of models with parameter θ to be estimated, choose the value of θ that minimizes some distance between the observed estimate K̂(h) and the theoretical function K(h; θ) (Guan 2007). Similarly for other statistics such as K̂_w(h) (Veen 2006).
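This is often called minimum-contrast estimation. A toy sketch (the quadratic distance and the one-parameter model here are my assumptions, not taken from Guan 2007):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def min_contrast(k_obs, k_theo, hs):
    """Pick theta minimizing the summed squared distance between the
    empirical K and the model K(h; theta) over a grid of lags hs."""
    obj = lambda theta: np.sum((k_obs(hs) - k_theo(hs, theta)) ** 2)
    return minimize_scalar(obj, bounds=(1e-6, 10.0), method="bounded").x

hs = np.linspace(0.1, 2.0, 20)
k_obs = lambda h: 1.3 * np.pi * h ** 2            # stand-in empirical curve
k_theo = lambda h, theta: theta * np.pi * h ** 2  # toy model; theta=1 is CSR
print(min_contrast(k_obs, k_theo, hs))            # recovers ~1.3
```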

9 [figure slide]

10 Model: λ(x,y; θ) = θμ(x,y) + (1 − θ). [Figure; horizontal axis: h (km).]

11 3) How else can we tell how well a given point process model fits?
a) Likelihood statistics (LR, AIC, BIC). [For instance, AIC = −2 log L(θ̂) + 2p.] Overly simplistic; not graphical.
b) Other tests: TTT, Khmaladze (Andersen et al. 1993); Cramér–von Mises, K-S test (Heinrich 1991); higher-moment and spectral tests (Davies 1977).
c) Integrated residual plots (Baddeley et al. 2005): plot N(Aᵢ) − Ĉ(Aᵢ) over various areas Aᵢ (see the sketch below). Useful for the mean, but questionable power; fine-scale interactions are not inspected.
d) Rescaling, thinning (Meyer 1971; Schoenberg 1999, 2003).
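A minimal version of the integrated-residual computation in item (c) (my sketch; the cell integrals are approximated by the midpoint rule):

```python
import numpy as np

def integrated_residuals(xy, lam_fit, xedges, yedges):
    """Raw residuals N(A_i) - C_hat(A_i) on a rectangular grid of cells A_i,
    where C_hat is the fitted intensity integrated over each cell."""
    xy = np.asarray(xy)
    N, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=[xedges, yedges])
    xc = 0.5 * (xedges[:-1] + xedges[1:])  # cell midpoints
    yc = 0.5 * (yedges[:-1] + yedges[1:])
    cell = np.diff(xedges)[:, None] * np.diff(yedges)[None, :]  # cell areas
    C = np.array([[lam_fit(x, y) for y in yc] for x in xc]) * cell
    return N - C
```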

12 For multi-dimensional point processes:
* Stretch/compress one dimension according to λ̂, keeping the others fixed.
* The transformed process is Poisson with rate 1 iff λ̂ = λ almost everywhere.
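In one dimension this is the classical random time change: map each point through the estimated compensator, τᵢ = ∫₀^{tᵢ} λ̂(u) du. A sketch (mine), under the assumption that λ̂ is a function of time only:

```python
import numpy as np
from scipy.integrate import quad

def rescale_times(times, lam_hat):
    """tau_i = integral_0^{t_i} lam_hat(u) du. If lam_hat is correct,
    the taus form a rate-1 Poisson process (Exp(1) gaps)."""
    return np.array([quad(lam_hat, 0.0, t)[0] for t in times])

lam_hat = lambda t: 2.0 + np.sin(t)            # hypothetical fitted rate
taus = rescale_times([0.4, 1.1, 1.8, 2.9], lam_hat)
gaps = np.diff(np.concatenate([[0.0], taus]))  # should look Exp(1)
print(gaps)
```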

13 Problems with multi-dimensional residual analysis:
* Irregular boundary, plotting.
* Points in the transformed space can be hard to interpret.
* For highly clustered processes: boundary effects, loss of power.
Possible solutions: truncation, horizontal rescaling.
Thinning: Suppose inf λ̂(xᵢ,yᵢ) = b. Keep each point (xᵢ,yᵢ) in the original dataset with probability b / λ̂(xᵢ,yᵢ). This yields a different residual process, on the same scale as the data. Can repeat many times --> many Poisson processes (but not quite independent!).
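Thinned residuals take only a few lines (a sketch under the stated setup, with b the minimum fitted intensity over the observed points):

```python
import numpy as np

def thin_points(xy, lam_hat, rng=None):
    """Keep each point with probability b / lam_hat(x_i, y_i), b = min intensity.
    Under a correct model, the retained points are homogeneous Poisson.
    Re-running gives many (not quite independent) residual processes."""
    rng = rng or np.random.default_rng()
    xy = np.asarray(xy)
    lam = np.array([lam_hat(x, y) for x, y in xy])
    b = lam.min()
    keep = rng.uniform(size=len(xy)) < b / lam
    return xy[keep]
```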

14–20 [figure slides]

21 Conditional intensity λ(t, x₁, ..., x_k; θ): [e.g. x₁ = location, x₂ = size.]
Separability for Point Processes:
Say λ is multiplicative in mark x_j if
λ(t, x₁, ..., x_k; θ) = θ₀ λ_j(t, x_j; θ_j) λ_{-j}(t, x_{-j}; θ_{-j}),
where x_{-j} = (x₁, ..., x_{j-1}, x_{j+1}, ..., x_k), and similarly for θ_{-j} and λ_{-j}.
If λ is multiplicative in x_j and one of the following holds, then θ̂_j, the partial MLE, = θ̃_j, the MLE:
∫_S λ_{-j}(t, x_{-j}; θ_{-j}) dμ_{-j} = γ, for all θ_{-j};
∫_S λ_j(t, x_j; θ_j) dμ_j = γ, for all θ_j;
∫_S λ(t, x; θ) dμ = ∫_S λ_j(t, x_j; θ_j) dμ_j = γ, for all θ.

22 Individual Covariates:
Suppose λ is multiplicative, and λ_j(t, x_j; θ_j) = f₁[X(t,x_j); θ₁] f₂[Y(t,x_j); θ₂].
If H(x,y) = H₁(x) H₂(y), where H, H₁, H₂ are the empirical d.f.s, and if the log-likelihood is differentiable w.r.t. θ₁, then the partial MLE of θ₁ = the MLE of θ₁. (Note: not true for additive models!)
Suppose λ is multiplicative and the jth component is additive: λ_j(t, x_j; θ_j) = f₁[X(t,x_j); θ₁] + f₂[Y(t,x_j); θ₂].
If f₁ and f₂ are continuous and f₂ is small [i.e., ∫_S f₂(Y; θ₂)² / f₁(X; θ̃₁) dμ →_p 0], then the partial MLE θ̂₁ is consistent.

23 Impact:
* Model building.
* Model evaluation / dimension reduction.
* Excluded variables.

24 Model Construction
For example, for Los Angeles County wildfires:
Relative humidity R(t), windspeed W(t), precipitation P(t), aggregated rainfall over the previous 60 days A(t;60), temperature T(t), date D(t).
Tapered Pareto size distribution g, smooth spatial background μ.
λ(t,x,a) = β₁ exp{β₂R(t) + β₃W(t) + β₄P(t) + β₅A(t;60) + β₆T(t) + β₇[β₈ − D(t)]²} μ(x) g(a).
Estimating each of these components separately might be somewhat reasonable, at least as a first attempt, if the interactions are not too extreme.
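Written out as code, the conditional intensity above might look like the following (a sketch; the covariate callables and parameter indexing are my placeholders):

```python
import numpy as np

def cond_intensity(t, x, a, beta, covs, mu, g):
    """lambda(t,x,a) = b1 exp{b2 R + b3 W + b4 P + b5 A60 + b6 T + b7 (b8 - D)^2} mu(x) g(a).
    beta: sequence (b1..b8); covs: dict of callables R, W, P, A60, T, D of time."""
    b = beta
    expo = (b[1] * covs["R"](t) + b[2] * covs["W"](t) + b[3] * covs["P"](t)
            + b[4] * covs["A60"](t) + b[5] * covs["T"](t)
            + b[6] * (b[7] - covs["D"](t)) ** 2)
    return b[0] * np.exp(expo) * mu(x) * g(a)
```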

25 [figure; r = 0.16 (sq m)]

26 Testing separability in marked point processes:
Construct non-separable and separable kernel estimates of λ, by smoothing over all coordinates simultaneously or separately, and then compare the two estimates (Schoenberg 2004).
May also consider:
S₅ = mean absolute difference between the two estimates at the observed points.
S₆ = maximum absolute difference between the two estimates at the observed points.
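A rough version of this comparison (my sketch; it assumes Gaussian kernels and a two-coordinate process, and scales densities to intensities by n):

```python
import numpy as np
from scipy.stats import gaussian_kde

def separability_stats(tx):
    """Compare a joint (non-separable) kernel intensity estimate with the
    product of marginal (separable) estimates at the observed points.
    Returns (S5, S6): mean and max absolute difference."""
    tx = np.asarray(tx)
    n = len(tx)
    t, x = tx[:, 0], tx[:, 1]
    joint = n * gaussian_kde(tx.T)(tx.T)               # smooth both coords at once
    sep = n * gaussian_kde(t)(t) * gaussian_kde(x)(x)  # smooth each coord alone
    diff = np.abs(joint - sep)
    return diff.mean(), diff.max()

rng = np.random.default_rng(1)
tx = rng.uniform(0, 1, size=(300, 2))  # simulated (time, mark) pairs
print(separability_stats(tx))
```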

27 [figure slide]

28 S₃ seems to be the most powerful for large-scale non-separability:
[figure]

29 However, S₃ may not be ideal for Hawkes processes, and all of these statistics are terrible for inhibition processes:
[figure]

30 For Hawkes & inhibition processes, rescaling according to the separable estimate and then looking at the L-function seems much more powerful:
[figure]

31 Los Angeles County Wildfire Example:
[figure]

32 Statistics like S₃ indicate separability, but the L-function after rescaling shows some clustering:
[figure]

33 Summary:
1) MLE: maximize log L(θ) = ∫₀ᵀ ∫_S log λ(t,x; θ) dN − ∫₀ᵀ ∫_S λ(t,x; θ) dt dx.
2) Estimated K-function: K̂(h) = S ∑∑_{i≠j} I(||(xᵢ,yᵢ) − (xⱼ,yⱼ)|| ≤ h) / n²; L̂(h) = √[K̂(h)/π] − h.
Weighted version: K̂_w(h) = S ∑∑_{i≠j} wᵢwⱼ I(||(xᵢ,yᵢ) − (xⱼ,yⱼ)|| ≤ h) / n², where wᵢ = λ(xᵢ,yᵢ)⁻¹.
3) Residuals: integrated residuals [N(Aᵢ) − Ĉ(Aᵢ)]; rescaled residuals [stretch one coordinate according to ∫ λ(x,y) dμ]; thinned residuals [keep each point with probability b / λ̂(xᵢ,yᵢ)].
4) Separability: when one coordinate can be estimated individually. Convenient, and sometimes results in estimates similar to the global MLEs.
5) A separability test is S₃; an alternative is L(h) after rescaling according to the separable kernel intensity estimate.
Next time: applications to models for earthquakes and wildfires.