1 Some Current Problems in Point Process Research: 1. Prototype point processes 2. Non-simple point processes 3. Voronoi diagrams
2 Global Earthquake Data: Local earthquake catalogs tend to have problems, esp. missing data. 1977: Harvard (global) catalog created. Considered the most complete, with the best-understood errors. A collection of aftershock sequences: Harvard Catalog, 1/1/77 to 3/1/03. Shallow events only (depth < 70 km). Mainshocks: Mw 7.5 to 8.0. Aftershocks: Mw > 5.5, within 100 km, days - 2 yrs. No Mw ≥ 7.5 within 200 km in previous 2 yrs. No Mw ≥ 8.0 within 400 km within 4 yrs (Molchan et al., 1997). 49 mainshocks, avg aftershocks, SD = 4.3.
3
4 1. Prototypes. Some motivating questions: A) What does a typical aftershock sequence look like? B) How can we tell if a particular sequence is an outlier? C) How can we group aftershock sequences into clusters based on the similarity of their features?
5 A) What does the typical aftershock sequence look like? e.g. What is typically observed after an eq. of Mw ? Modified Omori law: aftershock rate K/(t+c)^p.
6 A) What does the typical aftershock sequence look like? e.g. What is typically observed after an eq. of Mw ? Modified Omori law: aftershock rate K/(t+c)^p. May desire a prototype: a point pattern of min. distance to those observed. Requires a distance between point patterns.
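The modified Omori rate can be evaluated and integrated in closed form. The sketch below is only an illustration: the function names and the parameter values K = 10, c = 1, p = 1 are assumptions, not values fitted to the Harvard catalog.

```python
import math

def omori_rate(t, K, c, p):
    """Modified Omori aftershock rate K / (t + c)**p at time t after the mainshock."""
    return K / (t + c) ** p

def omori_expected_count(t1, t2, K, c, p):
    """Expected number of aftershocks in [t1, t2]: the integral of the rate."""
    if p == 1.0:
        # The antiderivative is K * log(t + c) when p = 1.
        return K * (math.log(t2 + c) - math.log(t1 + c))
    # Otherwise the antiderivative is K * (t + c)**(1 - p) / (1 - p).
    return K * ((t2 + c) ** (1 - p) - (t1 + c) ** (1 - p)) / (1 - p)

# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
rate_at_zero = omori_rate(0.0, K=10.0, c=1.0, p=1.0)
first_day = omori_expected_count(0.0, 1.0, K=10.0, c=1.0, p=1.0)
```

With these assumed values the rate at t = 0 is K/c = 10, and the expected count over the first time unit is 10 log 2.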
7
8 Victor-Purpura (1997) distance. Given two point patterns A and B: Match each point in A to the nearest point in B and record the horizontal distance moved (penalty p_m = 1 per unit moved). Delete excess points (with penalty p_a each).
9 Considerations
10 Calculating the distance between two point patterns: Reduces to which points are kept and which are removed. Mutual nearest neighbors within 2p_a/p_m are automatically kept. A point > 2p_a/p_m from its nearest neighbor is automatically removed.
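For one-dimensional patterns (e.g. aftershock times), a distance of this kind can be computed by dynamic programming over the sorted points. This is a minimal sketch, assuming that removing a point from either pattern costs p_a and that moving a point costs p_m per unit. The function name vp_distance and the default penalties are illustrative.

```python
def vp_distance(a, b, p_a=1.0, p_m=1.0):
    """Victor-Purpura style distance between two 1-D point patterns.

    Assumes removing a point from either pattern costs p_a and moving
    a point by d units costs p_m * d.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # cost[i][j]: cheapest way to reconcile the first i points of a
    # with the first j points of b.
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * p_a
    for j in range(1, m + 1):
        cost[0][j] = j * p_a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j] + p_a,                                  # remove a[i-1]
                cost[i][j - 1] + p_a,                                  # remove b[j-1]
                cost[i - 1][j - 1] + p_m * abs(a[i - 1] - b[j - 1]),   # keep and move
            )
    return cost[n][m]
```

In this sketch a pair of points is only ever matched when moving is cheaper than removing both, i.e. when they are closer than 2p_a/p_m, which is the threshold stated above.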
11 Prototype Point Pattern Defined such that the sum of distances from the prototype to all observed point patterns in the data set is minimized. Represents a “typical” observation.
12 Some properties of the prototype Prototype is not necessarily unique. There exists a prototype pattern composed entirely of points in the dataset. In fact, a prototype can be found such that each point it contains is the median of its associated points in distance calculations.
13 Uses: Data summary, outlier identification, clustering, …
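As a rough illustration of these uses, the sketch below picks the observed pattern with the smallest total distance to all the others (a medoid) and flags the pattern with the largest total as a candidate outlier. Note the swap: a medoid is restricted to whole observed patterns, so it is only a crude stand-in for the prototype defined above, which may combine points from different patterns. The function names and the toy data are hypothetical.

```python
def vp_distance(a, b, p_a=1.0, p_m=1.0):
    """1-D Victor-Purpura style distance (removal costs p_a, moving costs p_m per unit)."""
    a, b = sorted(a), sorted(b)
    cost = [[(i + j) * p_a if i == 0 or j == 0 else 0.0
             for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost[i][j] = min(cost[i - 1][j] + p_a,
                             cost[i][j - 1] + p_a,
                             cost[i - 1][j - 1] + p_m * abs(a[i - 1] - b[j - 1]))
    return cost[-1][-1]

def medoid(patterns, p_a=1.0, p_m=1.0):
    """Return (index of the medoid pattern, total distance of each pattern to all others)."""
    totals = [sum(vp_distance(p, q, p_a, p_m) for q in patterns) for p in patterns]
    best = min(range(len(patterns)), key=totals.__getitem__)
    return best, totals

# Toy sequences of event times (hypothetical data).
patterns = [[1.0, 2.0], [1.1, 2.1], [1.0, 2.0, 9.0], [5.0]]
best, totals = medoid(patterns)
outlier = max(range(len(patterns)), key=totals.__getitem__)
```

Here the first pattern has the smallest total distance and the single-point pattern has the largest, so it would be the first candidate to inspect as an outlier.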
14
15
16
17 Clusters of aftershock sequences: distance of each aftershock sequence to the prototypes for time and magnitude.
18 Cluster Map
19 With multidimensional point processes (time, Mw, location): No simple sequential pairing. Mutual nearest neighbors are kept. There exists a prototype consisting only of points whose coordinates are medians of coordinates of associated pts.
20
21
22
23 2. Non-simple point processes. Simple point processes are characterized by the conditional intensity, λ(t). But what about non-simple point processes?
24 Two types of simplicity, for multi-dimensional point processes: 1) Completely simple: No two points overlap exactly, i.e. no two share the same triple (t,x,a). 2) Simple ground process: No two points at exactly the same time. Multi-dimensional & marked point processes are only uniquely characterized by the conditional intensity λ(t,x,m) if they have simple ground process. Example: a Poisson process with intensity λ = 2, and a Poisson process with intensity λ = 1 but with each point doubled. Both have the same conditional intensity! (λ = 2)
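The doubled-Poisson example can be checked by simulation. The following is a minimal sketch using only the standard library. The horizon of 1000 time units, the seed, and the function names are arbitrary choices made for illustration.

```python
import random

def poisson_times(rate, horizon, rng):
    """Event times of a homogeneous Poisson process on [0, horizon)."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)  # exponential gaps between events
    return times

def doubled(times):
    """Replace each point by two points at exactly the same time."""
    return sorted(times + times)

rng = random.Random(0)  # fixed seed so the run is repeatable
horizon = 1000.0
simple = poisson_times(2.0, horizon, rng)               # intensity 2
nonsimple = doubled(poisson_times(1.0, horizon, rng))   # intensity 1, each point doubled

rate_simple = len(simple) / horizon
rate_nonsimple = len(nonsimple) / horizon
has_ties = len(set(nonsimple)) < len(nonsimple)
```

Both empirical rates should come out close to 2 points per unit time, yet only the second pattern contains simultaneous points, so the rate alone cannot tell the two processes apart.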
25 Multi-dimensional & marked point processes are only uniquely characterized by the conditional intensity λ(t,x,m) if they have simple ground process. How can one model non-simple point processes? [Figure, two panels of marks m_1, m_2 against time t.] Panel 1: 2 independent Poisson processes, each with intensity λ = 1; λ(t, m_1) = λ(t, m_2) = 1. Panel 2: A Poisson process with λ = 1, and an exact copy; λ(t, m_1) = λ(t, m_2) = 1.
26 How can one model non-simple marked point processes? [Figure: points against time t, first with marks m_1, m_2, m_3, then with marks m_1, m_2, m_3, m_12, m_13, m_23.] Consider an extended mark space, consisting of pairs (and triplets, quadruplets, etc.) of marks: Z = {m_1, m_2, m_3, m_12 = {m_1, m_2}, m_13, m_23}. The resulting point process will have simple ground process (Daley & Vere-Jones 1988, p208). The conditional intensity λ'(t, m) of the resulting process can be written in terms of the original conditional intensity λ(t, m). For instance, λ(t, m_2) = λ'(t, m_2) + λ'(t, m_12) + λ'(t, m_23). Can have models where λ'(t, m_ij) = λ'(t, m_i) λ'(t, m_j) (Schoenberg 2005).
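The bookkeeping between the extended and original mark spaces can be illustrated with a small sketch. Everything here is an assumption made for illustration: the constant intensities 0.5, 0.3, 0.2 for the single marks, the restriction to pairs, and the function name original_intensity. The product form for pairs is the kind of model mentioned above.

```python
from itertools import combinations

def original_intensity(mark, extended):
    """Intensity of `mark` in the original process: the sum of the extended
    intensities over every extended mark (a set of marks) containing it."""
    return sum(rate for marks, rate in extended.items() if mark in marks)

# Hypothetical constant extended intensities for the single marks.
single = {"m1": 0.5, "m2": 0.3, "m3": 0.2}

# Extended mark space: singletons plus unordered pairs.
extended = {frozenset([m]): rate for m, rate in single.items()}
for mi, mj in combinations(single, 2):
    # Product-form model for simultaneous pairs (an assumed choice).
    extended[frozenset([mi, mj])] = single[mi] * single[mj]

lam_m2 = original_intensity("m2", extended)
```

With these assumed numbers, the original intensity of m_2 is 0.3 + 0.5·0.3 + 0.3·0.2 = 0.51, which is the three-term sum written out above.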
27 3. Voronoi Tessellations. Given a collection of points p_1, p_2, …, divide the space into cells C_1, C_2, …, such that cell C_i consists of all locations closer to p_i than to any of the other points p_j. C_i = {x : ||x - p_i|| < ||x - p_j|| for all j ≠ i}.
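The cell definition translates directly into a nearest-point rule. The sketch below assigns a location to its cell and approximates cell areas on a rectangular window by counting grid-cell centres. It is a brute-force illustration, not an exact tessellation algorithm. The function names, the window, and the grid resolution are assumptions.

```python
import math

def voronoi_cell_index(x, points):
    """Index i of the cell C_i containing x, i.e. the nearest point p_i.
    Ties (locations on a cell boundary) go to the lowest index."""
    return min(range(len(points)), key=lambda i: math.dist(x, points[i]))

def approximate_cell_areas(points, xmin, xmax, ymin, ymax, n=100):
    """Approximate the area of each cell inside the window by assigning
    the centre of each of n*n grid rectangles to its nearest point."""
    dx, dy = (xmax - xmin) / n, (ymax - ymin) / n
    areas = [0.0] * len(points)
    for ix in range(n):
        for iy in range(n):
            centre = (xmin + (ix + 0.5) * dx, ymin + (iy + 0.5) * dy)
            areas[voronoi_cell_index(centre, points)] += dx * dy
    return areas

# Two points placed symmetrically in the unit square (hypothetical data).
points = [(0.25, 0.5), (0.75, 0.5)]
areas = approximate_cell_areas(points, 0.0, 1.0, 0.0, 1.0)
```

For the two symmetric points the window splits along the line x = 0.5, so each approximate area is one half.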
28
29
30
31
32
33 Southern California Earthquake Center (SCEC) data Lat: (733 km) Lon: (-114, -112) (556 km) Time: (1/1/ /6/2004) Mag: Mo > 2. n = Errors in the catalog: Missing earthquakes, esp in clusters. Discrimination problems. Location & projection errors.
34 Many models for the cell characteristics were fitted: Frechet, gamma, lognormal, exponential, Pareto, tapered Pareto. Pareto: F(x) = 1 - (a/x)^β. Tapered Pareto: F(x) = 1 - (a/x)^β exp((a-x)/θ).
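The two distribution functions can be compared numerically. In this sketch the exponent is called beta and the taper scale theta. These names, and the values a = 1, beta = 1.5, theta = 10 used below, are assumptions for illustration, not fitted estimates.

```python
import math

def pareto_cdf(x, a, beta):
    """Pareto distribution function 1 - (a/x)**beta for x >= a, else 0."""
    return 0.0 if x < a else 1.0 - (a / x) ** beta

def tapered_pareto_cdf(x, a, beta, theta):
    """Tapered Pareto distribution function
    1 - (a/x)**beta * exp((a - x)/theta) for x >= a, else 0."""
    return 0.0 if x < a else 1.0 - (a / x) ** beta * math.exp((a - x) / theta)

# Hypothetical parameter values.
a, beta, theta = 1.0, 1.5, 10.0
xs = [1.0, 2.0, 5.0, 20.0, 100.0]
pareto_tail = [1.0 - pareto_cdf(x, a, beta) for x in xs]
tapered_tail = [1.0 - tapered_pareto_cdf(x, a, beta, theta) for x in xs]
```

The exponential factor leaves the distribution close to the Pareto for x well below theta and makes the upper tail decay much faster for x well above theta, so tapered_tail falls below pareto_tail at every x > a.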
35 Q-Q plots for the tapered Pareto: F(x) = 1 - (a/x)^β exp((a-x)/θ). Panels: cell area; cell perimeter.
36 Summary and Open Questions: 1) Prototypes may be useful data summaries for point processes, and for clustering, identifying outliers, etc. Prototypes for particular models? Applications? 2) Non-simple point processes can be viewed as simple on an extended mark space. More non-simple models? Applications? 3) Cell sizes in Voronoi tessellations of earthquake data seem to follow a tapered Pareto distribution (like many other features of earthquakes). Why? What is the theoretical cell size distribution for a particular model? Other applications?