1
Improvement of Likelihood Model Testing
Tetsuto Himeno (NIPR) and Kazuyoshi Nanjo (Tokyo University)
1-2 November 2010 (Monday-Tuesday)
“Earthquake Forecast Systems Based on Seismicity of Japan: Toward Constructing Base-line Models of Earthquake Forecasting”
Kyoto University, Uji Campus, Uji Obaku Plaza
2
The purpose of this study
To evaluate the validity of an earthquake forecasting model, the L-, N-, S- and R-tests are often used. However, these tests have some problems. For example, a model can score well on the L-test even when it scores badly on the N-test. We therefore investigate the problems of the L-test and suggest some new tests. We compare these tests by numerical simulation and show the validity of the suggested methods.
3
The earthquake forecasting model
In earthquake forecasting models, the region under consideration is divided into small regions (bins). For each bin, the model gives the expected number of earthquakes in the target period. Once the observed numbers of earthquakes are obtained, we examine whether the observations fit the forecasting model.
4
L-test
Let {b1, b2, ・・・, bm}, {λ1, λ2, ・・・, λm} and {n1, n2, ・・・, nm} be the bins in the target region, the expected numbers of earthquakes and the observed numbers of earthquakes, respectively. To evaluate this model, let
$$L = \prod_{i=1}^{m} \frac{\lambda_i^{n_i} e^{-\lambda_i}}{n_i!}.$$
In the L-test, we obtain the empirical distribution of L from random samples generated by a multivariate Poisson distribution. Using the empirical distribution, we obtain the p-value of L.
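The simulation-based p-value described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes L is the joint likelihood of independent Poisson counts (evaluated on the log scale, which leaves the ranking and hence the p-value unchanged), and the function names, the Knuth-style sampler and the four-bin `rates` are all hypothetical.

```python
import math
import random

def poisson_log_likelihood(rates, counts):
    """Joint log-likelihood of independent Poisson counts, one per bin."""
    return sum(-lam + n * math.log(lam) - math.lgamma(n + 1)
               for lam, n in zip(rates, counts))

def sample_poisson(lam, rng):
    """Draw one Poisson variate by Knuth's multiplication method (suited to small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def l_test_p_value(rates, counts, n_sim=2000, seed=0):
    """Fraction of simulated catalogs whose log-likelihood is <= the observed one."""
    rng = random.Random(seed)
    observed = poisson_log_likelihood(rates, counts)
    hits = 0
    for _ in range(n_sim):
        simulated = [sample_poisson(lam, rng) for lam in rates]
        if poisson_log_likelihood(rates, simulated) <= observed:
            hits += 1
    return hits / n_sim

# Hypothetical four-bin forecast (not taken from the slides).
rates = [0.9, 0.5, 0.2, 1.4]
print("empty catalog :", l_test_p_value(rates, [0, 0, 0, 0]))
print("busy catalog  :", l_test_p_value(rates, [3, 2, 1, 4]))
```

Under this convention a high p-value means the observed catalog is at least as likely as most simulated ones, so the empty catalog scores far better than the busy one even though the forecast expects three earthquakes in total.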
5
The problem of the L-test (1)
In the L-test, a high p-value is taken to mean that the forecasting model is better. But the mode (i.e. the maximal point) of L is {[λ1], [λ2], ・・・, [λm]}, where [x] denotes the largest integer not exceeding x. For example, if λi = 0.9, then ni = 0 is the maximal point. When the number of bins is large, the expectation falls below one in many bins, so ni = 0 is the maximal point in many bins. The p-value is therefore high when the total number of observed earthquakes is lower than predicted, which is undesirable.
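A small numerical check of this point, assuming independent Poisson counts per bin. The helper names and the uniform 1,000-bin forecast are hypothetical and chosen only for illustration.

```python
import math

def poisson_log_pmf(lam, n):
    """Log of the Poisson probability of observing n events when lam are expected."""
    return -lam + n * math.log(lam) - math.lgamma(n + 1)

def poisson_mode(lam, n_max=50):
    """Most probable count for a non-integer expectation lam, found by direct search."""
    return max(range(n_max + 1), key=lambda n: poisson_log_pmf(lam, n))

# The mode equals the floor of the expectation (non-integer lam).
for lam in (0.2, 0.9, 1.7, 3.4):
    print(lam, poisson_mode(lam), math.floor(lam))

# Hypothetical forecast: 1,000 bins with 0.03 expected earthquakes each (30 in total).
rates = [0.03] * 1000
empty = [0] * 1000                      # no earthquake at all
typical = [1] * 30 + [0] * 970          # 30 earthquakes, matching the expected total

def joint_log_likelihood(rates, counts):
    return sum(poisson_log_pmf(lam, n) for lam, n in zip(rates, counts))

print("empty  :", joint_log_likelihood(rates, empty))
print("typical:", joint_log_likelihood(rates, typical))
```

In this sketch the empty catalog has the highest joint likelihood of any catalog, although the forecast expects 30 earthquakes, which is the behaviour the slide criticises.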
6
The suggested method (1)
To correct the maximal point, we suggest a test L1 whose maximal point is {[λ1+0.5], [λ2+0.5], ・・・, [λm+0.5]}, that is, each λi rounded to the nearest integer. The maximal point is therefore the integer point nearest to {λ1, λ2, ・・・, λm}, which is natural for comparing the prediction with the observation.
7
The property of the L-test
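The L-test statistic can be factored as
$$L = \underbrace{\frac{\lambda^{n} e^{-\lambda}}{n!}}_{\text{Poisson probability of the total number of earthquakes}} \times \underbrace{\frac{n!}{n_1!\,n_2!\cdots n_m!}\prod_{i=1}^{m}\left(\frac{\lambda_i}{\lambda}\right)^{n_i}}_{\text{multinomial probability of the positions of earthquakes}}$$
where n=n1+n2+・・・+nm and λ=λ1+λ2+・・・+λm.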
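The factorization can be verified numerically. The sketch below assumes L is the joint probability of independent Poisson counts and works on the log scale; the function names and the example `rates` and `counts` are hypothetical.

```python
import math

def log_joint_poisson(rates, counts):
    """Log joint probability of independent Poisson counts, one per bin."""
    return sum(-lam + n * math.log(lam) - math.lgamma(n + 1)
               for lam, n in zip(rates, counts))

def log_poisson_total(rates, counts):
    """Log Poisson probability of the total count n given the total expectation."""
    lam, n = sum(rates), sum(counts)
    return -lam + n * math.log(lam) - math.lgamma(n + 1)

def log_multinomial_positions(rates, counts):
    """Log multinomial probability of the bin counts given n, with cell probabilities lam_i / lam."""
    lam, n = sum(rates), sum(counts)
    log_coefficient = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    return log_coefficient + sum(k * math.log(r / lam) for r, k in zip(rates, counts))

# Hypothetical forecast and observation.
rates = [0.9, 0.5, 0.2, 1.4, 2.3]
counts = [1, 0, 0, 2, 3]

lhs = log_joint_poisson(rates, counts)
rhs = log_poisson_total(rates, counts) + log_multinomial_positions(rates, counts)
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding, which is the identity that lets the number term and the position term be treated separately.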
8
The problem of the L-test (2)
In the L-test, if the number of bins is large, then λi/λ becomes very small, so the probability of the multinomial distribution also becomes very small. Therefore, when the total number of earthquakes is large, the value of L is small. The multinomial distribution needs to be normalized.
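To give a sense of scale, the sketch below assumes a uniform spatial distribution over m equally likely bins and evaluates the multinomial probability that n earthquakes fall in n distinct bins, which for n ≤ m is the most probable outcome under that assumption. The function name is hypothetical; 40,000 is the number of 0.1×0.1 bins in a [0,20]×[0,20] area, and 30, 150 and 255 are close to the expected totals quoted for the simulations.

```python
import math

def log_prob_all_distinct(n, m):
    """Log multinomial probability that n events land in n distinct bins
    out of m equally likely bins (requires n <= m): n! / m**n."""
    return math.lgamma(n + 1) - n * math.log(m)

# More bins: the largest multinomial probability shrinks.
for m in (100, 1600, 40000):
    print("bins", m, "log-probability", round(log_prob_all_distinct(30, m), 1))

# More earthquakes at a fixed number of bins: it shrinks further.
for n in (30, 150, 255):
    print("events", n, "log-probability", round(log_prob_all_distinct(n, 40000), 1))
```

Even the best possible spatial outcome has a vanishingly small probability here, so without normalization the position term dominates L whenever bins or earthquakes are numerous.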
9
The suggested method (2)
We do not use the multinomial distribution; instead, we approximate the density function for the positions of earthquakes. We define the test as follows: where B=b1∪b2∪・・・∪bm and |X| denotes the area of X. Let f(x,y) be the density function for the positions of earthquakes. Then under the condition |B|=1.
10
The suggested method (3)
For the normalization, we use the expectation under the spatial distribution {λ1, λ2, ・・・, λm}. Then we obtain where
11
The suggested method (4)
L2 and L3 may be better under the assumption that the spatial distribution is {λ1, λ2, ・・・, λm}. But if earthquakes are concentrated at points with high expectation, L2 and L3 may not be better. So, we normalize as follows: where Cn denotes the maximum probability of the multinomial distribution with sample size n and expectation {λ1, λ2, ・・・, λm}.
12
The suggested method (5)
For L4, Cn is difficult to derive without numerical iteration. So, we approximate Cn as where Γ(x) denotes the gamma function. Using Cn, we define
13
The numerical simulation
In this simulation, we derive the p-values of the L-test and the suggested tests for four forecasting models using four sets of observation data. The four sets of observation data are:
(i) There is no earthquake.
(ii) The total number of earthquakes is about the expectation under the spatial distribution of the forecasting model.
(iii) The total number of earthquakes is 50 larger than the expectation under the spatial distribution of the forecasting model.
(iv) The earthquakes are concentrated at points with high expectation, while the total number is equal to the expectation.
14
The situation of the numerical simulation
We assume that the area under consideration is [0,20]×[0,20], divided into bins of size 0.1×0.1. We run 10,000 simulations to derive the empirical distribution of each test function. For the observation data (ii)~(iv), we obtain the expected p-value from 1,000 simulated data sets.
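The grid implied by this setup can be sketched as below. The constants come from the text; the row-major `bin_index` helper and the uniform forecast with 30 expected earthquakes in total are hypothetical and serve only to show how small the per-bin expectation becomes.

```python
SIDE = 20.0      # the area is [0, SIDE] x [0, SIDE]
BIN = 0.1        # each bin is BIN x BIN

BINS_PER_AXIS = round(SIDE / BIN)       # 200 bins along each axis
N_BINS = BINS_PER_AXIS ** 2             # 40,000 bins in total

def bin_index(x, y):
    """Row-major index of the bin containing (x, y); the upper edges go to the last bin."""
    col = min(int(x / BIN), BINS_PER_AXIS - 1)
    row = min(int(y / BIN), BINS_PER_AXIS - 1)
    return row * BINS_PER_AXIS + col

# Hypothetical uniform forecast with 30 expected earthquakes in total.
total_expected = 30.0
rate_per_bin = total_expected / N_BINS

print("bins:", N_BINS)
print("expected earthquakes per bin:", rate_per_bin)
print("bin of (0.05, 0.05):", bin_index(0.05, 0.05))
print("bin of (19.95, 19.95):", bin_index(19.95, 19.95))
```

With 40,000 bins, any forecast whose total expectation is in the tens or hundreds has almost every per-bin expectation far below one, which is the regime where the maximal point of L is the empty catalog.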
15
Simulation (1)
The forecasting model: the expectation function is
Figure: the observation (iv)
The expected total number of earthquakes: 30.03
p-values by observation (columns: L, L1, L2, L3, L4, L5; some cells are missing in the source)
(i) 1.000 0.000 0.875 0.030
(ii) 0.491 0.495 0.526 0.522 0.506 0.520
(iii) 0.845 0.028
(iv) 0.989 0.496
16
Simulation (2)
The forecasting model: the expectation function is
Figure: the observation (iv)
The expected total number of earthquakes: 150.16
p-values by observation (columns: L, L1, L2, L3, L4, L5; some cells are missing in the source)
(i) 1.000 0.012 0.000
(ii) 0.507 0.516 0.512 0.514
(iii) 0.005 0.898 0.386 0.135
(iv) 0.999
17
Simulation (3)
The forecasting model: the expectation function is
Figure: the observation (iv)
The expected total number of earthquakes: 255.49
p-values by observation (columns: L, L1, L2, L3, L4, L5; some cells are missing in the source)
(i) 1.000 0.980 0.000
(ii) 0.513 0.535 0.536 0.529 0.525
(iii) 0.005 0.054 0.420 0.300 0.058 0.064
(iv)
18
Summary
・L1 cannot reject the data when the number of earthquakes is less than the expectation.
・When earthquakes are slightly concentrated at points with high expectation, the model cannot be rejected by any test except L2.
・When earthquakes are considerably concentrated at points with high expectation, the model cannot be rejected by L2 and L3, which have no combination term.
・L4 and L5 are mostly equal. However, L4 tends to perform poorly when the total expected number of earthquakes is low.
・Since L5 is the most suitable for most models and most observations, we recommend L5.
19
Thank you