
STIFF: A Forecasting Framework for Spatio-Temporal Data Zhigang Li, Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist.


1 STIFF: A Forecasting Framework for Spatio-Temporal Data Zhigang Li, Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Dallas, Texas USA

2 Our goal (May 6, 2002, Li & Dunham, PAKDD)
- In this paper, we present a novel forecasting framework for spatio-temporal data that considers both the spatial and the temporal characteristics of the data to obtain a more appropriate result.

3 Presentation Outline
- Motivation
- Prior Research
- Our Approach: STIFF, combining two approaches (time series analysis and ANNs) to achieve better results
- Performance
- Future Work

4 Why
- Many application fields require spatio-temporal forecasting: river hydrology, biological patterns, housing price research, rainfall distribution, waste monitoring, fishery, hotel pickup rate, etc.
- In spatio-temporal forecasting, both spatial and temporal properties, as well as their mutual correlation, are taken into account.

5 What work has been done
- [Jothityangkoon, Sivapalan, and Viney, 2000]
  - Rainfall forecasting
  - Hidden Markov Model
  - De-aggregates high-level data to a lower level
  - Large error
- [Pokrajac and Obradovic, 2001]
  - The current event is assumed to be impacted only by its immediate temporal ancestors.

6 More related research
- [Cressie and Majure, 1997]
  - Modeled livestock waste in a river basin
  - Condensed time into a “three day area of influence”
  - “large variation of the predicted values”
- [Deutsch et al., 1986]; [Kelly et al., 1998]; [Pfeifer et al., 1990]
  - Extended time series analysis with a spatial correlation taken from a simple distance matrix.
  - Relying on a pure distance measurement alone is too arbitrary.

7 Flood Forecasting (Our Motivating Application)
- Catchment
- Many different types of sensors
- Predict at one sensor location
- Water level or flow rate
- May not be interested in the actual predicted value

8 Our approach: Problem definition
- Δ = {α_0, α_1, α_2, …, α_n} is the research field, composed of n + 1 spatially separated subcomponents, each named α_i.
- WLOG, α_0 is assumed to be the target location where forecasting is to be carried out.
- For each α_i in Δ there are j observations, equally spaced in time, denoted by Л_i = {α_{i,1}, α_{i,2}, α_{i,3}, …, α_{i,j}}.

9 Problem definition (Cont.)
- Given Δ = {α_0, α_1, α_2, …, α_n}, Л = {Л_1, Л_2, …, Л_n}, the observation length j, and the number of look-ahead steps ι, we want to find the best possible forecasting relationship ƒ, defined as follows (formula not captured in this transcript).

10 Our approach: Algorithm sketch
1) Describe the forecasting problem according to the problem definition.
2) Build a time series (ARIMA) model for each α_i. Name the forecast from the α_0 time series model ƒ_T.
3) Construct and train an ANN to capture the spatial correlation and influence on the target subcomponent α_0. Name the forecast from the neural network ƒ_S.
4) Combine ƒ_T and ƒ_S via a statistical regression mechanism.

11 Time Series Data Transformation
- Convert non-stationary data to stationary data, preventing skewness as much as possible.
- Box and Cox proposed a family of transformations, the Box-Cox transformation: y(λ) = (y^λ - 1)/λ for λ ≠ 0, and y(λ) = ln y for λ = 0.
- The key is to determine the right value of λ, which fixes the appropriate transformation. For example, λ = 0 gives the log transformation and λ = 0.5 gives the square root (up to a linear rescaling). But how should λ be chosen?
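The Box-Cox family described on this slide can be sketched in a few lines of Python. This is only an illustration of the standard transformation and its inverse; the function names and the sample flow values are hypothetical and do not come from the presentation.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return math.log(y)            # limiting case lam -> 0 is the log
    return (y ** lam - 1.0) / lam     # general power-transform case

def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if abs(lam) < 1e-12:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Hypothetical flow values (m^3/s), for illustration only.
flows = [3.2, 4.1, 18.7, 5.0]
logged = [box_cox(v, 0.0) for v in flows]   # lam = 0: log transform
rooted = [box_cox(v, 0.5) for v in flows]   # lam = 0.5: rescaled square root
```

Forecasts made on the transformed scale would be mapped back with `inverse_box_cox` before being compared with observations.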

12 Data transformation (cont’d)
- Box and Cox proposed a large-sample maximum-likelihood approach.
- Wei proposed to use the λ that minimizes a criterion (formula not captured in this transcript).
- The former requires much computation, while the latter may cause problems because it does not consider the difference from the real observations.
- We therefore propose the following way to determine λ.

13 Time series model
- A time series model is chosen because of its proven capability to describe and capture temporal dependency.
- Our work focuses on the ARIMA technique, which can be embodied in the following formula.
- Roughly speaking, the model-building process can be divided into three main steps:
  - Model identification
  - Parameter estimation
  - Diagnostic checking
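To make the ARIMA idea concrete, here is a minimal sketch of one very small member of the family, ARIMA(1,1,0) without a constant term: the series is differenced once, an AR(1) coefficient is estimated on the differences by least squares, and forecasts are accumulated back to the original scale. This is not the model identified in the paper; the order, the estimation shortcut, and all names are assumptions chosen for brevity, and the identification and diagnostic-checking steps are omitted.

```python
def difference(series):
    """First-order differencing: d[t] = series[t+1] - series[t]."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(diffs):
    """Least-squares AR(1) coefficient (no intercept): d[t] ~ phi * d[t-1]."""
    num = sum(x * y for x, y in zip(diffs, diffs[1:]))
    den = sum(x * x for x in diffs[:-1])
    return num / den if den else 0.0

def forecast_arima110(series, steps):
    """Forecast `steps` values ahead with an ARIMA(1,1,0) sketch."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # AR(1) forecast of the next difference
        level += last_diff            # integrate back to the original scale
        forecasts.append(level)
    return forecasts
```

In practice a statistics package would handle order selection and parameter estimation; the sketch only shows how differencing and autoregression fit together.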

14 Find the spatial influence
- Spatial influence is normally much harder to find than its temporal counterpart.
- There is no precise way to convert a spatial measurement into the change in value it may cause.
- Time has only 1 dimension, while space has 3 (or 2) dimensions.
- A simple “distance” measure is not enough; other factors are important.

15 Artificial Neural Network (ANN)
- Why is an ANN used to find the spatial influence?
  - It is a “black-box”, non-linear technique for finding hidden patterns.
  - Like the human brain, it can self-adjust and learn automatically even if the problem is not very well defined.
- Practice proves its usefulness
  - [See, 1997] found ANNs especially useful in “… situations where the underlying physical relationships are not fully understood …”

16 ANN Construction
- Simple 3-layer back-propagation MLP
- One input node for each sensor value except α_0.
- The actual input is shifted by the predicted time lag.
- The number of neurons in the hidden layer has to be decided by experiment.
- The output layer has only one neuron, which corresponds to the target subcomponent α_0.
- We also employ a pruning strategy to make the ANN structure as simple as possible without harming its efficacy much.
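The lag-shifted inputs mentioned on this slide can be illustrated with a small helper that aligns each upstream sensor with the target. It is a sketch under assumptions: the station names, the lag values, and the data layout (one list of equally spaced readings per station) are hypothetical.

```python
def build_training_rows(upstream, target, lags):
    """Pair lag-shifted upstream readings with the target reading.

    upstream: dict mapping station name -> list of readings
    target:   list of readings at the target location
    lags:     dict mapping station name -> lag in time steps
    The row for time t uses upstream[s][t - lags[s]] for every station s.
    """
    stations = sorted(upstream)
    start = max(lags[s] for s in stations)   # first t with all lags available
    rows = []
    for t in range(start, len(target)):
        inputs = [upstream[s][t - lags[s]] for s in stations]
        rows.append((inputs, target[t]))
    return stations, rows

# Hypothetical example: station "a" leads the target by 1 step, "b" by 2.
stations, rows = build_training_rows(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    [100, 200, 300, 400],
    {"a": 1, "b": 2},
)
```

Each row holds one input vector for the network (one value per upstream station) together with the target value it should predict.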

17 Integrate the two forecasts
- We have two forecasts so far at the target subcomponent α_0: ƒ_T, from the time series model, and ƒ_S, from the ANN. We may
  - either dynamically select one of the two as the current forecast,
  - or fuse them, since they contribute to the overall forecast from two different aspects. (This is the option taken in the paper.)
- The two forecasts are integrated via a very simple linear regression mechanism. More advanced alternatives could be used instead and may give better results.
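A linear-regression fusion of this kind can be sketched as an ordinary least-squares fit of the observed values on the two forecasts plus an intercept. The code below is an illustration only: it solves the normal equations directly, and the function names and the toy numbers are assumptions, not the authors' implementation.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_fusion(f_t, f_s, observed):
    """Least-squares fit of observed ~ x1 * f_t + x2 * f_s + c."""
    rows = [[t, s, 1.0] for t, s in zip(f_t, f_s)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, observed)) for i in range(3)]
    return solve_linear(xtx, xty)                    # [x1, x2, c]

def fuse(coeffs, t, s):
    """Combine one time series forecast t and one ANN forecast s."""
    x1, x2, c = coeffs
    return x1 * t + x2 * s + c

# Toy data: the observations are an exact linear mix of the two forecasts.
f_t = [1.0, 2.0, 3.0, 4.0, 5.0]
f_s = [2.0, 1.0, 4.0, 3.0, 6.0]
observed = [0.3 * t + 0.6 * s + 2.0 for t, s in zip(f_t, f_s)]
coeffs = fit_fusion(f_t, f_s, observed)
```

The coefficients would be fitted on a training period and then applied to new pairs of forecasts with `fuse`.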

18 A case study (National River Flow Archive – Great Britain)
- We present a practical case study to demonstrate how the framework works.
- We conduct spatio-temporal forecasting of the river water flow rate (m³/s) at the outlet gauging station 28010. The basin is shown as follows.
- The target station is 28010; its sibling stations lie upstream.
- Derwent Catchment
- Daily mean flow values

19 Data transformation
- Checking the water flow rate data at station 28010 shows that the data is not very stable. Abrupt changes are obvious and present roughly 25% of the time.
- We therefore first transform the data according to the approach proposed earlier.
- We empirically vary λ from -1.0 to 1.0 in steps of 0.1. It turns out that λ = 0.0 is (relatively) the best. In other words, we log-transform the original water flow rate data.
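The empirical search over λ can be sketched as a simple grid search. The selection criterion used in the paper is not captured in this transcript, so the sketch below substitutes a stand-in score, the absolute sample skewness of the transformed data; that choice, the function names, and the sample data are all assumptions for illustration.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def abs_skewness(values):
    """Absolute sample skewness (stand-in criterion, not the paper's)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return abs(m3) / m2 ** 1.5 if m2 > 0 else 0.0

def choose_lambda(data, score=abs_skewness):
    """Try lam = -1.0, -0.9, ..., 1.0 and keep the lowest-scoring value."""
    grid = [round(-1.0 + 0.1 * k, 1) for k in range(21)]
    return min(grid, key=lambda lam: score([box_cox(v, lam) for v in data]))

# Synthetic data that is symmetric on the log scale, for illustration only.
sample = [math.exp(u) for u in (-2.0, -1.0, 0.0, 1.0, 2.0)]
best = choose_lambda(sample)
```

Any other criterion can be passed as `score`, as long as it maps a list of transformed values to a number to be minimized.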

20 Actual Flow at Derwent (figure)

21 Case Study ANN
- 6 input nodes
- 1 output node
- 6 hidden nodes, a number chosen by experimentation
- Links pruned based on river topology
- Input lag times based on the expected flow lag time
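A 6-6-1 network with pruned links can be sketched as a small MLP whose input-to-hidden weights are multiplied by a 0/1 mask, so that pruned links stay at zero during back-propagation. Everything specific here is an assumption: the mask pattern does not reflect the Derwent river topology, the data is synthetic, and the tanh hidden units, linear output, and learning rate are illustrative choices.

```python
import math
import random

def init_mlp(mask, seed=0):
    """Create a one-hidden-layer network; mask[j][i] = 0 prunes link i -> j."""
    rng = random.Random(seed)
    n_hidden, n_in = len(mask), len(mask[0])
    return {
        "w1": [[rng.uniform(-0.5, 0.5) * mask[j][i] for i in range(n_in)]
               for j in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(net, x):
    """Return hidden activations (tanh) and the single linear output."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    y = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, y

def train_step(net, mask, x, target, lr=0.05):
    """One back-propagation update on a single example (squared error)."""
    hidden, y = forward(net, x)
    err = y - target
    for j, h in enumerate(hidden):
        grad_z = err * net["w2"][j] * (1.0 - h * h)  # uses w2 before its update
        net["w2"][j] -= lr * err * h
        net["b1"][j] -= lr * grad_z
        for i, v in enumerate(x):
            net["w1"][j][i] -= lr * grad_z * v * mask[j][i]  # pruned links stay 0
    net["b2"] -= lr * err

def mean_loss(net, data):
    return sum(0.5 * (forward(net, x)[1] - t) ** 2 for x, t in data) / len(data)

# Hypothetical pruning mask: 6 hidden nodes x 6 inputs, a third of links removed.
mask = [[1 if (i + j) % 3 else 0 for i in range(6)] for j in range(6)]
rng = random.Random(1)
inputs = [[rng.random() for _ in range(6)] for _ in range(40)]
data = [(x, sum(x) / 6.0) for x in inputs]           # synthetic target

net = init_mlp(mask)
before = mean_loss(net, data)
for _ in range(200):
    for x, t in data:
        train_step(net, mask, x, t)
after = mean_loss(net, data)
```

Masking the gradient as well as the initial weights is what keeps a pruned link out of the model for the whole of training.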

22 Building models
- Following the framework specification, we build a time series model from the dataset collected at each gauging station.
- An ANN is then constructed, with the spatially induced pruning strategy applied to remove as many unnecessary links as possible while sacrificing little forecasting accuracy.
- The final overall spatio-temporal forecast is then generated by this simple regression:

23 STIFF Model (diagram; labels: 70, 23, 43, 11, 55, 48, ƒ_S, ƒ_T): x_1 ƒ_T + x_2 ƒ_S + C

24 Performance Analysis
- Compared STIFF to a pure time series model (C_TS) and a pure ANN (C_ANN)
- Data starting at 10/01/75
- 30, 60, 120 days
- Normalized Absolute Ratio Error (NARE)
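The slides name NARE as the error measure but do not give its formula. The sketch below assumes one plausible reading, the sum of absolute forecast errors divided by the sum of absolute observed values, and also splits that ratio into an overestimation part and an underestimation part. Both the formula and the split are assumptions and should be checked against the paper before use.

```python
def nare(observed, predicted):
    """Assumed NARE: sum(|obs - pred|) / sum(|obs|), split into over and under."""
    total = sum(abs(o) for o in observed)
    if total == 0:
        raise ValueError("observed values sum to zero; ratio is undefined")
    pairs = list(zip(observed, predicted))
    over = sum(p - o for o, p in pairs if p > o)    # forecast too high
    under = sum(o - p for o, p in pairs if p < o)   # forecast too low
    return {
        "nare": (over + under) / total,
        "over": over / total,
        "under": under / total,
    }

# Toy numbers, for illustration only.
result = nare([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
```

Under this reading, a forecast that is balanced between over- and underestimation has similar `over` and `under` shares.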

25 Forecasting result
- The forecasting comparison, measured in NARE, is outlined in the following table. The other two models, built to the best of our knowledge, are used for comparison with STIFF.
- Here “Over” means overestimation and “Under” means underestimation.

26 Result 30 Days

27 Conclusion
- STIFF has better forecast accuracy than the single time series model and the ANN model, and is more balanced (over- vs. underestimation).
- Compared with other related work, it avoids oversimplification.
- It does not have the large variation problem.
- STIFF requires much human intervention and interpretation.
- STIFF is promising for future research.

28 Future work
- Extend to multivariate forecasting (so far, STIFF can only deal with univariate forecasting)
- Use more sophisticated fusing techniques
- Test on more flood data
- Compare to other techniques
- Examine different ANN structures
- Extend to other application domains
- …
