Slide 1: Estimation & Inference for Point Processes
1. MLE
2. K-function & variants
3. Residual methods
4. Separable estimation
5. Separability tests
Slide 2
Maximum Likelihood Estimation: For a space-time point process N, the log-likelihood function is given by

log L = ∫₀ᵀ ∫_S log λ(t,x) dN − ∫₀ᵀ ∫_S λ(t,x) dt dx.

Why? Consider the case where N is a Poisson process observed in time only:

0 ----x--x---------x---------x------x---x------ T
     t₁  t₂        t₃        t₄     t₅  t₆

L = P(points at t₁, t₂, t₃, …, tₙ, and no others in [0,T])
  = P(pt at t₁) × P(pt at t₂) × … × P(pt at tₙ) × P{no others in [0,T]}
  = λ(t₁) × λ(t₂) × … × λ(tₙ) × P{no others in [0,t₁)} × … × P{no others in (tₙ,T]}
  = λ(t₁) × … × λ(tₙ) × exp{−∫₀^{t₁} λ(u) du} × … × exp{−∫_{tₙ}^{T} λ(u) du}
  = ∏ λ(tᵢ) × exp{−∫₀ᵀ λ(u) du}.

So log L = ∑ log λ(tᵢ) − ∫₀ᵀ λ(u) du.
Slide 3
log L = ∫₀ᵀ ∫_S log λ(t,x) dN − ∫₀ᵀ ∫_S λ(t,x) dt dx.

Here λ(t,x) is the conditional intensity. The case where the Papangelou intensity λ_p(t,x) is used instead is called the pseudo-likelihood.

When λ depends on parameters θ, so does L:

log L(θ) = ∫₀ᵀ ∫_S log λ(t,x; θ) dN − ∫₀ᵀ ∫_S λ(t,x; θ) dt dx.

Maximum Likelihood Estimation (MLE): Find the value of θ that maximizes L(θ). (In practice, by finding the value that minimizes −log L(θ).)

Example: stationary Poisson process with rate λ(t,x) = α.
log L(α) = ∫₀ᵀ ∫_S log λ(t,x) dN − ∫₀ᵀ ∫_S λ(t,x) dt dx = n log(α) − αST.
d log L(α)/dα = n/α − ST, which = 0 when α̂ = n/(ST).
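A minimal numerical sketch of this recipe in Python (illustrative, not from the slides): minimize −log L(α) for the stationary Poisson example, here observed in time only on [0, T], so that the closed form α̂ = n/T is available as a check. All names and values are hypothetical.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
T = 100.0
# simulate a stationary Poisson process on [0, T] with true rate 2
times = np.sort(rng.uniform(0.0, T, size=rng.poisson(2.0 * T)))

def neg_log_lik(rate):
    # -log L(rate) = -(n log(rate) - rate * T); invalid outside rate > 0
    return np.inf if rate <= 0 else -(len(times) * np.log(rate) - rate * T)

fit = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(fit.x, len(times) / T)  # numerical MLE vs. closed-form n/T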
Slide 4
Under somewhat general conditions, θ̂ is consistent, asymptotically normal, and asymptotically efficient (see e.g. Ogata 1978, Rathbun 1994). Similarly for pseudo-likelihoods (Baddeley 2001).

Important counter-examples: λ(t) = α + βt, and λ(t) = exp{α + βt} (for β < 0).

Other problems with MLE:
Bias can be substantial. e.g. Matérn I, where θ̂ = min_{i≠j} ||(xᵢ,yᵢ) − (xⱼ,yⱼ)||.
Optimization is tricky: requires an initial parameter estimate and a tolerance threshold; can fail to converge; can converge to a local maximum, etc.

Nevertheless, MLE and pseudo-MLE are the only commonly-used methods for fitting point process models.
Slide 5
K-function & Variations:

Usual K-function, for spatial processes only (Ripley 1978):
Assume the null hypothesis that N is stationary Poisson, with constant rate λ.
K(h) = (1/λ) E[# of pts within distance h of a given pt].
Estimated via K̂(h) = (1/λ̂) [∑∑_{i≠j} I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n], where λ̂ = n/|S|.
Under the null hypothesis, K(h) = (1/λ)[λπh²] = πh².
Higher K indicates more clustering; lower K indicates inhibition.

Centered version: L(h) = √[K(h)/π] − h. L > 0 indicates clustering; L < 0 indicates inhibition.

Version based on nearest neighbors only (J-function):
J(h) ~ (1/λ) Pr{nearest neighbor of a given point is within distance h}
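As a sketch (assuming a rectangular window and ignoring the edge corrections that Ripley's estimator would normally include), the K estimate above can be computed directly:

import numpy as np

def k_hat(points, area, h):
    # K_hat(h) = (1/lam_hat) * [sum_{i!=j} I(d_ij <= h)] / n, with lam_hat = n/|S|
    n = len(points)
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    close = (d <= h) & ~np.eye(n, dtype=bool)  # exclude self-pairs i == j
    lam_hat = n / area
    return close.sum() / (lam_hat * n)

rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 1.0, size=(200, 2))   # stationary Poisson-like sample on the unit square
print(k_hat(pts, 1.0, 0.1), np.pi * 0.1**2)  # should be near pi h^2 under the null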
Slide 7
K-function & Variations:

Weighted K-function (Baddeley, Møller and Waagepetersen 2002; Veen 2006):
Null hypothesis is a general conditional intensity λ(x,y).
Weight each point (xᵢ,yᵢ) by a factor of wᵢ = λ(xᵢ,yᵢ)⁻¹.

Ordinary estimated K-function: K̂(h) = |S| ∑∑_{i≠j} I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n².
Weighted version: K̂_w(h) = |S| ∑∑_{i≠j} wᵢwⱼ I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n², where wᵢ = λ(xᵢ,yᵢ)⁻¹.
Asymptotically normal, under certain regularity conditions (Veen 2006).

Centered version: L̂_w(h) = √[K̂_w(h)/π] − h, for R².

L̂_w(h) > 0 indicates more weight in clusters within h than expected according to the model for λ(x,y), i.e. λ(x,y) is too low in clusters. That is, the model does not adequately capture the clustering in the data.
L̂_w(h) < 0 indicates λ(x,y) is too high, for points within distance h. The model over-estimates the clustering in the data (or under-estimates inhibition).
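A companion sketch of the weighted estimate, with λ supplied as precomputed values at the observed points (the intensity model itself is whatever null hypothesis is being tested, so it is left to the caller):

import numpy as np

def k_w_hat(points, lam_vals, area, h):
    # K_w_hat(h) = |S| * sum_{i!=j} w_i w_j I(d_ij <= h) / n^2, w_i = lam(x_i,y_i)^{-1}
    n = len(points)
    w = 1.0 / lam_vals
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    close = (d <= h) & ~np.eye(n, dtype=bool)
    return area * (np.outer(w, w) * close).sum() / n**2

def l_w_hat(points, lam_vals, area, h):
    # centered version: L_w_hat(h) = sqrt(K_w_hat(h)/pi) - h
    return np.sqrt(k_w_hat(points, lam_vals, area, h) / np.pi) - h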
Slide 8
These statistics can be used for estimation as well as testing: given a class of models with parameter θ to be estimated, choose the value of θ that minimizes some distance between the observed estimate K̂(h) and the theoretical function K(h; θ) [Guan 2007]. Similarly for other statistics such as K̂_w(h) [Veen 2006].
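A toy sketch of this minimum-contrast idea; the parametric form K(h; θ) = θπh² below is purely illustrative, not a model from the slides:

import numpy as np
from scipy.optimize import minimize_scalar

def contrast(theta, hs, k_obs):
    # squared distance between observed K_hat and hypothetical model K(h; theta) = theta*pi*h^2
    return np.sum((k_obs - theta * np.pi * hs**2) ** 2)

hs = np.linspace(0.01, 0.2, 20)     # grid of distances h
k_obs = 1.3 * np.pi * hs**2         # stand-in for K_hat evaluated on the grid
fit = minimize_scalar(lambda th: contrast(th, hs, k_obs), bounds=(0.1, 5.0), method="bounded")
print(fit.x)                        # recovers theta close to 1.3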
Slide 10
Model: λ(x,y; α) = α μ(x,y) + (1 − α). [Figure: plotted against h (km).]
Slide 11
3) How else can we tell how well a given point process model fits?

a) Likelihood statistics (LR, AIC, BIC). [For instance, AIC = −2 log L(θ̂) + 2p.] Overly simplistic. Not graphical.

b) Other tests:
   TTT, Khmaladze (Andersen et al. 1993)
   Cramér-von Mises, K-S test (Heinrich 1991)
   Higher moment and spectral tests (Davies 1977)

c) Integrated residual plots (Baddeley et al. 2005): plot N(Aᵢ) − C(Aᵢ), where C(Aᵢ) = ∫_{Aᵢ} λ̂, over various areas Aᵢ (see the sketch after this list). Useful for the mean, but questionable power. Fine-scale interactions not inspected.

d) Rescaling, thinning (Meyer 1971; Schoenberg 1999, 2003)
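A sketch of item (c) for a rectangular window, with λ̂ supplied as a vectorized function and C(Aᵢ) approximated by the midpoint rule on each grid cell; grid, window, and λ̂ are all placeholders:

import numpy as np

def integrated_residuals(points, lam_hat, nx=10, ny=10, x1=1.0, y1=1.0):
    # N(A_i): counts of points in each grid cell A_i of [0,x1] x [0,y1]
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                  bins=[nx, ny], range=[[0, x1], [0, y1]])
    dx, dy = x1 / nx, y1 / ny
    xc = (np.arange(nx) + 0.5) * dx   # cell-center coordinates
    yc = (np.arange(ny) + 0.5) * dy
    C = lam_hat(xc[:, None], yc[None, :]) * dx * dy  # C(A_i) by the midpoint rule
    return counts - C                 # one residual N(A_i) - C(A_i) per cell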
Slide 12
Rescaled residuals, for multi-dimensional point processes:
Stretch/compress one dimension according to λ̂, keeping the others fixed.
The transformed process is Poisson with rate 1 iff λ̂ = λ almost everywhere.
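In the purely temporal case this is the familiar time change tᵢ → Λ(tᵢ) = ∫₀^{tᵢ} λ̂(u) du. A minimal sketch, where λ̂ is any user-supplied estimate and the example rate is made up:

import numpy as np
from scipy.integrate import quad

def rescale_times(times, lam_hat):
    # map each t_i to Lambda(t_i) = integral of lam_hat over [0, t_i]
    return np.array([quad(lam_hat, 0.0, t)[0] for t in times])

# e.g. gaps = np.diff(rescale_times(times, lambda t: 2.0 + 0.1 * t))
# If lam_hat = lam a.e., the gaps should look like independent Exp(1) draws.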
Slide 13
Problems with multi-dimensional residual analysis:
* Irregular boundary, plotting.
* Points in the transformed space can be hard to interpret.
* For highly clustered processes: boundary effects, loss of power.
Possible solutions: truncation, horizontal rescaling.

Thinning: Suppose inf_i λ(xᵢ,yᵢ) = b. Keep each point (xᵢ,yᵢ) in the original dataset with probability b / λ(xᵢ,yᵢ). This yields a different residual process, on the same scale as the data. Can repeat many times --> many Poisson processes (but not quite independent!).
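A sketch of the thinning step, taking b as the minimum of λ̂ over the observed points as on the slide (in practice one might instead use the infimum over the whole space):

import numpy as np

def thin(points, lam_vals, rng):
    # keep point i with probability b / lam(x_i, y_i), where b = min_i lam(x_i, y_i)
    b = lam_vals.min()
    keep = rng.uniform(size=len(points)) < b / lam_vals
    return points[keep]

# Repeating with fresh random draws gives many approximately Poisson
# residual processes (not mutually independent, as noted above).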
Slide 21
Separability for Point Processes:
Conditional intensity λ(t, x₁, …, xₖ; θ) [e.g. x₁ = location, x₂ = size].

Say λ is multiplicative in mark xⱼ if
λ(t, x₁, …, xₖ; θ) = θ₀ λⱼ(t, xⱼ; θⱼ) λ₋ⱼ(t, x₋ⱼ; θ₋ⱼ),
where x₋ⱼ = (x₁, …, xⱼ₋₁, xⱼ₊₁, …, xₖ), and similarly for θ₋ⱼ and λ₋ⱼ.

If λ is multiplicative in xⱼ and one of the following holds, then θ̃ⱼ, the partial MLE, = θ̂ⱼ, the MLE:

∫_S λ₋ⱼ(t, x₋ⱼ; θ₋ⱼ) dμ₋ⱼ = γ, a constant, for all θ₋ⱼ.
∫_S λⱼ(t, xⱼ; θⱼ) dμⱼ = γ, for all θⱼ.
∫_S λⱼ(t, x; θ) dμ = ∫_S λⱼ(t, xⱼ; θⱼ) dμⱼ = γ, for all θ.
Slide 22
Individual Covariates:
Suppose λ is multiplicative, and λⱼ(t, xⱼ; θⱼ) = f₁[X(t,xⱼ); θ₁] f₂[Y(t,xⱼ); θ₂].
If H(x,y) = H₁(x) H₂(y), where H, H₁, H₂ are the empirical d.f.s, and if the log-likelihood is differentiable w.r.t. θ₁, then the partial MLE of θ₁ = MLE of θ₁. (Note: not true for additive models!)

Suppose λ is multiplicative and the jth component is additive:
λⱼ(t, xⱼ; θⱼ) = f₁[X(t,xⱼ); θ₁] + f₂[Y(t,xⱼ); θ₂].
If f₁ and f₂ are continuous and f₂ is small [∫_S f₂(Y; θ₂)² / f₁(X; θ̃₁) dμ →p 0], then the partial MLE θ̃₁ is consistent.
Slide 23
Impact:
* Model building.
* Model evaluation / dimension reduction.
* Excluded variables.
Slide 24
Model Construction:
For example, for Los Angeles County wildfires, with covariates R(t) = relative humidity, W(t) = windspeed, P(t) = precipitation, A(t;60) = aggregated rainfall over the previous 60 days, T(t) = temperature, D(t) = date; tapered Pareto size distribution g, smooth spatial background μ:

λ(t,x,a) = β₁ exp{β₂ R(t) + β₃ W(t) + β₄ P(t) + β₅ A(t;60) + β₆ T(t) + β₇ [β₈ − D(t)]²} μ(x) g(a).

Estimating each of these components separately might be somewhat reasonable, as a first attempt at least, if the interactions are not too extreme.
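A sketch of evaluating such a separable intensity; the covariate functions, the background μ, the size density g, and the β values are all user-supplied placeholders rather than fitted quantities:

import numpy as np

def cond_intensity(t, x, a, beta, covs, mu, g):
    # covs = (R, W, P, A60, Temp, D): callables returning each covariate at time t
    R, W, P, A60, Temp, D = (f(t) for f in covs)
    temporal = beta[0] * np.exp(beta[1] * R + beta[2] * W + beta[3] * P
                                + beta[4] * A60 + beta[5] * Temp
                                + beta[6] * (beta[7] - D) ** 2)
    return temporal * mu(x) * g(a)  # separable product: time x space x size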
Slide 25
[Figure: r = 0.16 (sq m).]
Slide 26
Testing separability in marked point processes:
Construct non-separable and separable kernel estimates of λ by smoothing over all coordinates simultaneously or separately, then compare the two estimates (Schoenberg 2004).
May also consider:
S₅ = mean absolute difference between the two estimates at the observed points.
S₆ = maximum absolute difference at the observed points.
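A sketch of the S₅/S₆ comparison for a two-coordinate process, using Gaussian kernel estimates; the default bandwidths and the n-scaling that turns density estimates into intensity estimates are illustrative assumptions:

import numpy as np
from scipy.stats import gaussian_kde

def separability_stats(tx):
    # tx: n x 2 array of points (e.g. time and one mark)
    n = len(tx)
    lam_joint = n * gaussian_kde(tx.T)(tx.T)        # smooth both coordinates at once
    lam_sep = (n * gaussian_kde(tx[:, 0])(tx[:, 0])
                 * gaussian_kde(tx[:, 1])(tx[:, 1]))  # product of marginal smooths
    diff = np.abs(lam_joint - lam_sep)
    return diff.mean(), diff.max()                  # S_5 and S_6 analogues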
Slide 28
S₃ seems to be most powerful for large-scale non-separability: [Figure]
Slide 29
However, S₃ may not be ideal for Hawkes processes, and all of these statistics are terrible for inhibition processes: [Figure]
Slide 30
For Hawkes & inhibition processes, rescaling according to the separable estimate and then looking at the L-function seems much more powerful: [Figure]
Slide 31
Los Angeles County Wildfire Example: [Figure]
Slide 32
Statistics like S₃ indicate separability, but the L-function after rescaling shows some clustering: [Figure]
Slide 33
Summary:

1) MLE: maximize log L(θ) = ∫₀ᵀ ∫_S log λ(t,x; θ) dN − ∫₀ᵀ ∫_S λ(t,x; θ) dt dx.

2) Estimated K-function: K̂(h) = |S| ∑∑_{i≠j} I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n²; L̂(h) = √[K̂(h)/π] − h.
Weighted version: K̂_w(h) = |S| ∑∑_{i≠j} wᵢwⱼ I(|(xᵢ,yᵢ) − (xⱼ,yⱼ)| ≤ h) / n², where wᵢ = λ(xᵢ,yᵢ)⁻¹.

3) Residuals:
Integrated residuals [N(Aᵢ) − C(Aᵢ)].
Rescaled residuals [stretch one coordinate according to ∫ λ̂(x,y) dy].
Thinned residuals [keep each pt with prob. b / λ̂(xᵢ,yᵢ)].

4) Separability: when one coordinate can be estimated individually. Convenient, and sometimes results in estimates similar to global MLEs.

5) Separability tests: statistics such as S₃; an alternative is L(h) after rescaling according to the separable kernel intensity estimate.

Next time: applications to models for earthquakes and wildfires.