1 Estimation & Inference for Point Processes

1. MLE
2. K-function & variants
3. Residual methods
4. Separable estimation
5. Separability tests

2 Maximum Likelihood Estimation:

For a space-time point process N, the log-likelihood function is given by

log L = ∫₀ᵀ ∫_S log λ(t,x) dN − ∫₀ᵀ ∫_S λ(t,x) dt dx.

Why? Consider the case where N is a Poisson process observed in time only:

-------x-x-----------x-----------x----------x---x---------------
0      t₁ t₂         t₃          t₄         t₅  t₆              T

L = P(points at t₁, t₂, t₃, …, tₙ, and no others in [0,T])
  = P(pt at t₁) × P(pt at t₂) × … × P(pt at tₙ) × P{no others in [0,T]}
  = λ(t₁) × λ(t₂) × … × λ(tₙ) × P{no others in [0,t₁)} × … × P{no others in (tₙ,T]}
  = λ(t₁) × … × λ(tₙ) × exp{−∫₀^t₁ λ(u) du} × … × exp{−∫_tₙ^T λ(u) du}
  = ∏ λ(tᵢ) × exp{−∫₀ᵀ λ(u) du}.

So log L = ∑ log λ(tᵢ) − ∫₀ᵀ λ(u) du.
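
The temporal case translates directly into code. Below is a minimal sketch of this log-likelihood for a purely temporal Poisson process; the linear rate function and the event times are hypothetical examples, not from the slides.

```python
import numpy as np
from scipy.integrate import quad

def poisson_loglik(lam, times, T):
    """log L = sum_i log lam(t_i) - integral_0^T lam(u) du."""
    log_term = np.sum(np.log([lam(t) for t in times]))
    integral, _ = quad(lam, 0.0, T)   # numerical integral of the rate over [0, T]
    return log_term - integral

# Hypothetical example: linear rate lam(t) = 2 + 0.1 t on [0, 10]
times = [0.8, 2.3, 2.9, 5.1, 7.4, 9.0]
print(poisson_loglik(lambda t: 2.0 + 0.1 * t, times, T=10.0))
```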

3 log L = ∫₀ᵀ ∫_S log λ(t,x) dN − ∫₀ᵀ ∫_S λ(t,x) dt dx.

Here λ(t,x) is the conditional intensity. The case where the Papangelou intensity λₚ(t,x) is used instead is called the pseudo-likelihood.

When λ depends on parameters θ, so does L:

log L(θ) = ∫₀ᵀ ∫_S log λ(t,x; θ) dN − ∫₀ᵀ ∫_S λ(t,x; θ) dt dx.

Maximum Likelihood Estimation (MLE): find the value of θ that maximizes L(θ). (In practice, by finding the value that minimizes −log L(θ).)

Example: stationary Poisson process with rate λ(t,x) = α.
log L(α) = ∫₀ᵀ ∫_S log λ(t,x) dN − ∫₀ᵀ ∫_S λ(t,x) dt dx = n log(α) − αST.
d log L(α)/dα = n/α − ST, which = 0 when α = n/(ST).
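
In practice one minimizes −log L(θ) numerically. A minimal sketch for the stationary Poisson example, checking the numerical optimum against the closed form n/(ST); the numbers are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Stationary Poisson on [0, T] x S: log L(alpha) = n log(alpha) - alpha * S * T
n, S, T = 120, 4.0, 10.0

def neg_loglik(alpha):
    return -(n * np.log(alpha) - alpha * S * T)

res = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, n / (S * T))  # numerical minimizer agrees with the closed form
```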

4 Under somewhat general conditions, θ̂ is consistent, asymptotically normal, and asymptotically efficient (see e.g. Ogata 1978, Rathbun 1994). Similarly for pseudo-likelihoods (Baddeley 2001).

Important counter-examples: λ(t) = α + βt; λ(t) = exp{α + βt}, for β < 0.

Other problems with MLE:
Bias can be substantial, e.g. Matérn I, θ̂ = min{||(xᵢ,yᵢ) − (xⱼ,yⱼ)||}.
Optimization is tricky: it requires an initial parameter estimate and a tolerance threshold; it can fail to converge; it can converge to a local maximum, etc.

Nevertheless, MLE and pseudo-MLE are the only commonly-used methods for fitting point process models.

5 K-function & Variations:

Usual K-function, for spatial processes only (Ripley 1978):
Assume the null hypothesis that N is stationary Poisson, with constant rate λ.
K(h) = (1/λ) E[# of pts within distance h of a given pt].
Estimated via K̂(h) = (1/λ̂) [∑∑_{i≠j} I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n], where λ̂ = n/S.
Under the null hypothesis, K(h) = (1/λ) E[λπh²] = πh².
Higher K indicates more clustering; lower K indicates inhibition.

Centered version: L(h) = √[K(h)/π] − h. L > 0 indicates clustering, L < 0 indicates inhibition.

Version based on nearest neighbors only (J-function):
J(h) ~ (1/λ) Pr{nearest neighbor of a given point is within distance h}.
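
A minimal sketch of the K̂ and centered L estimates (no edge correction, which a careful analysis would add); the uniform test points are a hypothetical stand-in for data:

```python
import numpy as np
from scipy.spatial.distance import pdist

def ripley_K(points, area, h_values):
    """K-hat(h) = (S / n^2) * #{ordered pairs i != j with d_ij <= h}."""
    n = len(points)
    d = pdist(points)  # each unordered pair once, hence the factor of 2 below
    return np.array([area * 2.0 * np.sum(d <= h) / n**2 for h in h_values])

rng = np.random.default_rng(0)
pts = rng.uniform(size=(200, 2))            # hypothetical CSR-like data, unit square
hs = np.linspace(0.01, 0.2, 20)
K = ripley_K(pts, area=1.0, h_values=hs)
L = np.sqrt(K / np.pi) - hs                 # centered version; near 0 under the null
```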


7 K-function & Variations:

Weighted K-function (Baddeley, Møller and Waagepetersen 2002; Veen 2006):
Null hypothesis is a general conditional intensity λ(x,y).
Weight each point (xᵢ,yᵢ) by a factor of wᵢ = λ(xᵢ,yᵢ)⁻¹.
Ordinary estimated K-function: K̂(h) = S ∑∑_{i≠j} I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n².
Weighted version: K̂_w(h) = S ∑∑_{i≠j} wᵢwⱼ I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n², where wᵢ = λ(xᵢ,yᵢ)⁻¹.
Asymptotically normal, under certain regularity conditions (Veen 2006).

Centered version: L̂_w(h) = √[K̂_w(h)/π] − h, for R².

L̂_w(h) > 0 indicates more weight in clusters within distance h than expected under the model for λ(x,y), i.e. λ(x,y) is too low in clusters: the model does not adequately capture the clustering in the data.
L̂_w(h) < 0 indicates λ(x,y) is too high, for points within distance h: the model over-estimates the clustering in the data (or under-estimates inhibition).
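
A sketch of the weighted estimator, assuming a fitted intensity function lam_hat is available (hypothetical here):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def weighted_K(points, lam_hat, area, h_values):
    """K_w-hat(h) = (S / n^2) * sum_{i != j} w_i w_j I(d_ij <= h),
    with weights w_i = 1 / lam_hat(x_i, y_i)."""
    n = len(points)
    w = 1.0 / np.array([lam_hat(x, y) for x, y in points])
    d = squareform(pdist(points))           # full n x n distance matrix
    ww = np.outer(w, w)
    np.fill_diagonal(ww, 0.0)               # exclude the i == j terms
    return np.array([area * np.sum(ww[d <= h]) / n**2 for h in h_values])
```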

8 These statistics can be used for estimation as well as testing:
Given a class of models with parameter θ to be estimated, choose the value of θ that minimizes some distance between the observed estimate K̂(h) and the theoretical function K(h; θ) (Guan 2007). Similarly for other statistics such as K̂_w(h) (Veen 2006).
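
This is minimum-contrast estimation. A toy sketch, in which the theoretical form K(h; θ) = πh² + θ and the "observed" curve are purely hypothetical stand-ins:

```python
import numpy as np
from scipy.optimize import minimize_scalar

hs = np.linspace(0.01, 0.2, 20)
K_obs = np.pi * hs**2 + 0.003               # stand-in for an observed K-hat curve

def contrast(theta):
    """Squared L2 distance between observed and theoretical K over the h grid."""
    return np.sum((K_obs - (np.pi * hs**2 + theta)) ** 2)

theta_hat = minimize_scalar(contrast, bounds=(0.0, 1.0), method="bounded").x
```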


10 Model: λ(x,y; θ) = θ μ(x,y) + (1 − θ).

[Figure: results plotted against h (km).]

11 3) How else can we tell how well a given point process model fits?

a) Likelihood statistics (LR, AIC, BIC). [For instance, AIC = −2 log L(θ̂) + 2p.]
Overly simplistic. Not graphical.

b) Other tests:
TTT, Khmaladze (Andersen et al. 1993).
Cramér–von Mises, K-S test (Heinrich 1991).
Higher moment and spectral tests (Davies 1977).

c) Integrated residual plots (Baddeley et al. 2005):
Plot N(Aᵢ) − C(Aᵢ) over various areas Aᵢ, where C(Aᵢ) is the fitted intensity integrated over Aᵢ. Useful for the mean, but questionable power; fine-scale interactions are not inspected. (See the sketch below.)

d) Rescaling, thinning (Meyer 1971; Schoenberg 1999, 2003).
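
A minimal sketch of integrated residuals on a quadrat grid, assuming a fitted intensity lam_hat (the constant intensity and the points below are hypothetical):

```python
import numpy as np

def integrated_residuals(points, lam_hat, nx=4, ny=4, extent=(0, 1, 0, 1)):
    """Raw residuals N(A_ij) - C(A_ij) on an nx-by-ny quadrat grid."""
    x0, x1, y0, y1 = extent
    xs, ys = np.linspace(x0, x1, nx + 1), np.linspace(y0, y1, ny + 1)
    resid = np.zeros((nx, ny))
    for i in range(nx):
        for j in range(ny):
            count = sum(xs[i] <= x < xs[i + 1] and ys[j] <= y < ys[j + 1]
                        for x, y in points)
            area = (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
            # C(A_ij): crude midpoint approximation of the intensity integral
            C = lam_hat((xs[i] + xs[i + 1]) / 2, (ys[j] + ys[j + 1]) / 2) * area
            resid[i, j] = count - C
    return resid

pts = [(0.2, 0.3), (0.7, 0.8), (0.5, 0.1)]
print(integrated_residuals(pts, lambda x, y: 3.0))
```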

12 Rescaled residuals:

For multi-dimensional point processes, stretch/compress one dimension according to λ̂, keeping the others fixed. The transformed process is Poisson with rate 1 iff λ̂ = λ almost everywhere.
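
In the purely temporal case this is the familiar time-rescaling: map each tᵢ to τᵢ = ∫₀^tᵢ λ̂(u) du. A sketch, with a hypothetical fitted rate:

```python
import numpy as np
from scipy.integrate import quad

def rescale(times, lam_hat):
    """tau_i = integral_0^{t_i} lam_hat(u) du; under a correct model the tau_i
    form a rate-1 Poisson process, so the gaps are i.i.d. Exponential(1)."""
    return np.array([quad(lam_hat, 0.0, t)[0] for t in times])

taus = rescale([0.8, 2.3, 2.9, 5.1], lambda u: 2.0 + 0.1 * u)
gaps = np.diff(taus)   # compare against the Exponential(1) distribution
```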

13 Problems with multi-dimensional residual analysis:
* Irregular boundary, plotting.
* Points in the transformed space can be hard to interpret.
* For highly clustered processes: boundary effects, loss of power.

Possible solutions: truncation, horizontal rescaling.

Thinning: suppose inf λ̂(xᵢ,yᵢ) = b. Keep each point (xᵢ,yᵢ) in the original dataset with probability b / λ̂(xᵢ,yᵢ). This yields a different residual process, on the same scale as the data. Can repeat many times --> many Poisson processes (but not quite independent!).
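
A sketch of thinned residuals; lam_hat is again a hypothetical fitted intensity:

```python
import numpy as np

def thin(points, lam_hat, rng=None):
    """Keep each point with probability b / lam_hat(x, y).  Under a correct model
    the retained points form (roughly) a homogeneous Poisson process of rate b."""
    rng = rng or np.random.default_rng()
    lam = np.array([lam_hat(x, y) for x, y in points])
    b = lam.min()   # stand-in for inf lam_hat over the space, as on the slide
    keep = rng.uniform(size=len(points)) < b / lam
    return [p for p, k in zip(points, keep) if k]
```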


21 Separability for Point Processes:

Conditional intensity λ(t, x₁, …, xₖ; θ). [e.g. x₁ = location, x₂ = size.]

Say λ is multiplicative in mark xⱼ if
λ(t, x₁, …, xₖ; θ) = λ₀ λⱼ(t, xⱼ; θⱼ) λ₋ⱼ(t, x₋ⱼ; θ₋ⱼ),
where x₋ⱼ = (x₁, …, xⱼ₋₁, xⱼ₊₁, …, xₖ), and similarly for θ₋ⱼ and λ₋ⱼ.

If λ is multiplicative in xⱼ and one of the following holds, then θ̃ⱼ, the partial MLE, = θ̂ⱼ, the MLE:

∫ λ₋ⱼ(t, x₋ⱼ; θ₋ⱼ) dμ₋ⱼ = γ (a constant), for all θ₋ⱼ.
∫ λⱼ(t, xⱼ; θⱼ) dμⱼ = γ, for all θⱼ.
∫ λ̂ⱼ(t, x; θ̂) dμ = ∫ λ̃ⱼ(t, xⱼ; θ̃ⱼ) dμⱼ = γ, for all θ.

22 Individual Covariates:

Suppose λ is multiplicative, and λⱼ(t, xⱼ; θⱼ) = f₁[X(t,xⱼ); θ₁] f₂[Y(t,xⱼ); θ₂].
If H(x,y) = H₁(x) H₂(y), where H, H₁, H₂ are the empirical d.f.s, and if the log-likelihood is differentiable w.r.t. θ₁, then the partial MLE of θ₁ = MLE of θ₁.
(Note: not true for additive models!)

Suppose λ is multiplicative and the jth component is additive:
λⱼ(t, xⱼ; θⱼ) = f₁[X(t,xⱼ); θ₁] + f₂[Y(t,xⱼ); θ₂].
If f₁ and f₂ are continuous and f₂ is small [∫ f₂(Y; θ₂)² / f₁(X; θ̃₁) dμ →ₚ 0], then the partial MLE θ̃₁ is consistent.

23 Impact:
Model building.
Model evaluation / dimension reduction.
Excluded variables.

24 Model Construction

For example, for Los Angeles County wildfires:
Covariates: relative humidity R(t), windspeed W(t), precipitation P(t), aggregated rainfall over the previous 60 days A(t;60), temperature T(t), date D(t).
Tapered Pareto size distribution g, smooth spatial background μ.

λ(t,x,a) = β₁ exp{β₂R(t) + β₃W(t) + β₄P(t) + β₅A(t;60) + β₆T(t) + β₇[β₈ − D(t)]²} μ(x) g(a).

Estimating each of these components separately might be somewhat reasonable, as a first attempt at least, if the interactions are not too extreme. (A sketch follows.)
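
A minimal sketch of evaluating an intensity of this form; the covariate functions and component choices passed in are hypothetical stand-ins, not the fitted Los Angeles model:

```python
import numpy as np

def intensity(t, x, a, beta, R, W, P, A, T, D, mu, g):
    """lambda(t,x,a) = b1 exp{b2 R + b3 W + b4 P + b5 A + b6 T + b7 (b8 - D)^2} mu(x) g(a)."""
    b1, b2, b3, b4, b5, b6, b7, b8 = beta
    return (b1 * np.exp(b2 * R(t) + b3 * W(t) + b4 * P(t) + b5 * A(t, 60)
                        + b6 * T(t) + b7 * (b8 - D(t)) ** 2) * mu(x) * g(a))
```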

25 [Figure; r = 0.16 (sq m).]

26 Testing separability in marked point processes:

Construct non-separable and separable kernel estimates of λ by smoothing over all coordinates simultaneously or separately, and then compare the two estimates (Schoenberg 2004).

May also consider:
S₅ = mean absolute difference at the observed points.
S₆ = maximum absolute difference at the observed points.
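
A sketch of this comparison for a process with one mark m, using Gaussian kernel density estimates as stand-ins for the kernel intensity estimates; the S₅/S₆-style statistics are computed at the observed points, and the uniform data are hypothetical:

```python
import numpy as np
from scipy.stats import gaussian_kde

def separability_stats(t, m):
    """Compare a joint kernel estimate in (t, m) with the product of the
    marginal estimates, at the observed points (mean and max abs. difference)."""
    n = len(t)
    pts = np.vstack([t, m])
    joint = n * gaussian_kde(pts)(pts)                   # non-separable estimate
    separable = n * gaussian_kde(t)(t) * gaussian_kde(m)(m)
    diff = np.abs(joint - separable)
    return diff.mean(), diff.max()                       # S5-like, S6-like

rng = np.random.default_rng(1)
print(separability_stats(rng.uniform(0, 10, 200), rng.uniform(0, 1, 200)))
```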


28 S₃ seems to be the most powerful for large-scale non-separability:

29 However, S₃ may not be ideal for Hawkes processes, and all these statistics are terrible for inhibition processes:

30 For Hawkes & inhibition processes, rescaling according to the separable estimate and then looking at the L-function seems much more powerful:

31 Los Angeles County Wildfire Example:

32 Statistics like S₃ indicate separability, but the L-function after rescaling shows some clustering:

33 Summary:

1) MLE: maximize log L(θ) = ∫₀ᵀ ∫_S log λ(t,x; θ) dN − ∫₀ᵀ ∫_S λ(t,x; θ) dt dx.

2) Estimated K-function: K̂(h) = S ∑∑_{i≠j} I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n²; L̂(h) = √[K̂(h)/π] − h.
Weighted version: K̂_w(h) = S ∑∑_{i≠j} wᵢwⱼ I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n², where wᵢ = λ(xᵢ,yᵢ)⁻¹.

3) Residuals:
Integrated residuals: N(Aᵢ) − C(Aᵢ).
Rescaled residuals: stretch one coordinate according to ∫ λ̂(x,y) dμ.
Thinned residuals: keep each point with probability b / λ̂(xᵢ,yᵢ).

4) Separability: when one coordinate can be estimated individually. Convenient, and sometimes results in estimates similar to global MLEs.

5) Separability tests compare separable and non-separable kernel estimates (e.g. S₅, S₆); an alternative is L(h) after rescaling according to the separable kernel intensity estimate.

Next time: applications to models for earthquakes and wildfires.

