Published by Grant Baron. Modified over 9 years ago.
1
F4: Large Scale Automated Forecasting Using Fractals. Deepayan Chakrabarti and Christos Faloutsos. CIKM 2002.
2
Outline: Introduction/Motivation; Survey and Lag Plots; Exact Problem Formulation; Proposed Method (Fractal Dimensions Background, Our Method); Results; Conclusions.
3
General Problem Definition: Given a time series {x_t}, predict its future course, that is, x_{t+1}, x_{t+2}, ... [Figure: value versus time, with the future portion marked "?".]
4
Motivation: Traditional fields: financial data analysis; physiological data and elderly care; weather and environmental studies. Sensor networks (MEMS, "SmartDust"): long or "infinite" series; no human intervention, so the forecaster must work as a "black box".
5
Outline: the next section is Survey and Lag Plots.
6
How to forecast? ARIMA, but it assumes linearity. Neural networks, but they have a large number of parameters and long training times [Wan/1993, Mozer/1993]. Hidden Markov Models, but they cost O(N^2) in the number of nodes N, and fixing N is itself a problem [Ge+/2000]. Lag plots.
7
Lag Plots: Plot x_t against x_{t-1}. For a new point, find its nearest neighbors in the plot (4-NN in the figure) and interpolate between them to get the final prediction. Open questions: Q0: which interpolation method? Q1: lag = ? Q2: k = ?
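A minimal sketch of the nearest-neighbor step on a lag plot, in plain Python. It substitutes a simple average of the neighbors' successors for the SVD interpolation cited in the talk, and the function names `delay_vectors` and `knn_forecast` are illustrative rather than taken from the slides.

```python
import math

def delay_vectors(series, lag):
    """Pair each delay vector (x[t-lag], ..., x[t-1]) with its successor x[t]."""
    return [(tuple(series[t - lag:t]), series[t])
            for t in range(lag, len(series))]

def knn_forecast(series, lag, k):
    """Predict the next value by averaging the successors of the k stored
    delay vectors closest (Euclidean distance) to the latest delay vector.
    Plain averaging is an assumption made here for brevity."""
    pairs = delay_vectors(series, lag)
    query = tuple(series[-lag:])
    nearest = sorted(pairs, key=lambda p: math.dist(p[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Usage: a strictly periodic toy series, so the neighbors agree exactly.
toy = [0, 1, 2, 3] * 5
print(knn_forecast(toy, lag=2, k=3))
```

The brute-force sort costs O(N log N) per query; a spatial index would be the natural replacement for long series.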
8
Q0: Interpolation: Use SVD (state of the art) [Sauer/1993]. [Figure: lag plot of x_t versus x_{t-1}.]
9
Why Lag Plots? They are based on Takens' Theorem [Takens/1981], which says that delay vectors can be used for predictive purposes.
10
Inside Theory (extra slide): Example: the Lotka-Volterra equations, ΔH/Δt = rH - aHP and ΔP/Δt = bHP - mP, where H is the density of prey and P is the density of predators. Suppose only H(t) is observed; the internal state is (H, P).
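To make the "only H(t) is observed" setting concrete, the sketch below steps the two difference equations forward with a fixed time step and returns only the prey density. The function name, the time step `dt`, and any parameter values passed in are assumptions for illustration; the slides give no numeric values.

```python
def lotka_volterra_prey(h0, p0, r, a, b, m, dt, steps):
    """Iterate dH = (r*H - a*H*P)*dt and dP = (b*H*P - m*P)*dt.
    The internal state is (H, P), but only H is recorded, mimicking
    a sensor that sees the prey and not the predators."""
    h, p = h0, p0
    observed = [h]
    for _ in range(steps):
        dh = (r * h - a * h * p) * dt
        dp = (b * h * p - m * p) * dt
        h, p = h + dh, p + dp
        observed.append(h)
    return observed

# Usage with made-up parameters: a partially observed series of prey densities.
series = lotka_volterra_prey(h0=10.0, p0=5.0, r=0.1, a=0.02,
                             b=0.01, m=0.1, dt=0.1, steps=1000)
```

A forecaster built on delay vectors of `series` never sees P, which is the situation Takens' Theorem addresses.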
11
Outline: the next section is Exact Problem Formulation.
12
Problem at hand: Given {x_1, x_2, ..., x_N}, automatically set the parameters L(opt) (from Q1) and k(opt) (from Q2), in time linear in N, to minimize the Normalized Mean Squared Error (NMSE) of forecasting.
13
Previous work/Alternatives: Manual setting, but infeasible [Sauer/1992]. Cross-validation, but slow: leave-one-out cross-validation takes ~O(N^2 log N) or more. "False Nearest Neighbors", but unstable [Abarbanel/1996].
14
Outline: the next section is Proposed Method.
15
Intuition: The Logistic Parabola: x_t = a x_{t-1} (1 - x_{t-1}) + noise. [Figures: x(t) versus time; lag plot of X(t) versus X(t-1).] Intrinsic dimensionality ≈ degrees of freedom ≈ information about X_t given X_{t-1}.
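A small generator for the logistic parabola can make the lag-plot intuition easy to reproduce. This is a sketch under stated assumptions: the noise is taken to be Gaussian, values are clamped to [0, 1] so noise cannot push the map out of its domain, and the name `logistic_series` and its parameters are hypothetical.

```python
import random

def logistic_series(a, x0, n, noise_std=0.0, seed=0):
    """Generate n values of x_t = a * x_{t-1} * (1 - x_{t-1}) + noise.
    Gaussian noise and the clamp to [0, 1] are assumptions of this sketch."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n - 1):
        x = a * xs[-1] * (1 - xs[-1]) + rng.gauss(0.0, noise_std)
        xs.append(min(1.0, max(0.0, x)))
    return xs

# Usage: points (x_{t-1}, x_t) of this series trace a noisy parabola.
series = logistic_series(a=3.9, x0=0.3, n=500, noise_std=0.01)
lag_plot_points = list(zip(series[:-1], series[1:]))
```

Although the series looks erratic over time, the points in `lag_plot_points` lie close to a one-dimensional curve, which is the sense in which x_{t-1} carries nearly all the information about x_t.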
16
Intuition: [Figures: lag plots of x(t) versus x(t-1) and of x(t) versus x(t-2), and a plot over x(t-2), x(t-1), and x(t).]
17
Intuition: To find L(opt), go further back in time (i.e., consider X_{t-2}, X_{t-3}, and so on) until no more information is gained about X_t.
18
Outline: the next section is Fractal Dimensions Background.
19
Fractal Dimensions: FD = intrinsic dimensionality. [Figure: a point set whose "embedding" dimensionality is 3 and whose intrinsic dimensionality is 1.]
20
Fractal Dimensions: FD = intrinsic dimensionality [Belussi/1995]. [Figure: log(# pairs) versus log(r).] Points to note: FD can be a non-integer, and there are fast methods to compute it.
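One common reading of the log-log plot is that the fractal dimension is the slope of log(# pairs within distance r) against log(r). The sketch below estimates that slope by brute force in O(N^2) time, so it is not the fast method the slide refers to; the function name `correlation_dimension` and the choice of radii are assumptions.

```python
import math
from itertools import combinations

def correlation_dimension(points, radii):
    """Least-squares slope of log(pair count within r) versus log(r).
    Radii that capture no pairs are skipped; at least two usable
    radii are needed for the fit."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    xs, ys = [], []
    for r in radii:
        count = sum(1 for d in dists if d <= r)
        if count > 0:
            xs.append(math.log(r))
            ys.append(math.log(count))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Usage: points on a straight line embedded in 3-D space.
line = [(float(i), 0.0, 0.0) for i in range(200)]
fd = correlation_dimension(line, radii=[2, 4, 8, 16])
```

For the line above the estimate comes out close to 1 even though the points have three coordinates, matching the embedding-versus-intrinsic distinction on the previous slide.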
21
Outline: the next section is Our Method.
22
Q1: Finding L(opt): Use fractal dimensions to find the optimal lag length L(opt). [Figure: fractal dimension versus lag (L), annotated with epsilon, L(opt), and f.]
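The slide does not spell out the stopping rule, so the following is only one plausible reading: take the smallest lag after which the fractal dimension grows by less than epsilon, and call the dimension at that lag f. The function name `choose_lag` and the sample numbers are hypothetical.

```python
def choose_lag(fd_by_lag, epsilon):
    """fd_by_lag[i] is the fractal dimension measured at lag i + 1.
    Return (L_opt, f): the first lag whose successor adds less than
    epsilon to the fractal dimension, and the dimension at that lag.
    If the curve never flattens, fall back to the largest lag tried."""
    for i in range(len(fd_by_lag) - 1):
        if fd_by_lag[i + 1] - fd_by_lag[i] < epsilon:
            return i + 1, fd_by_lag[i]
    return len(fd_by_lag), fd_by_lag[-1]

# Usage with made-up measurements for lags 1..5.
l_opt, f = choose_lag([0.9, 1.6, 1.95, 1.97, 1.98], epsilon=0.05)
```

With these illustrative values the curve flattens after lag 3, so the call selects lag 3 and f = 1.95.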
23
Q2: Finding k(opt): Conjecture: k(opt) ~ O(f). We choose k(opt) = 2*f + 1.
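Because f can be a non-integer, the formula needs a rounding step before it yields a neighbor count. The slide does not say which rounding is used; the sketch below rounds up, which is an assumption, and the name `k_from_fd` is hypothetical.

```python
import math

def k_from_fd(f):
    """Neighbor count from the fractal dimension f via k = 2*f + 1.
    Rounding up to the next integer is an assumption of this sketch."""
    return math.ceil(2 * f + 1)

# Usage: a fractal dimension just under 2 gives five neighbors.
print(k_from_fd(1.95))  # 5
```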
24
Outline: the next section is Results.
25
Datasets: Logistic Parabola: x_t = a x_{t-1} (1 - x_{t-1}) + noise. Models a population of flies [R. May/1976]. [Figure: value versus time.]
26
Datasets (continued): LORENZ: models convection currents in the air. [Figure: value versus time.]
27
Datasets (continued): LASER: fluctuations in a laser over time (from the Santa Fe Time Series Competition, 1992). [Figure: value versus time.] Error measure: NMSE = Σ(predicted - true)^2 / σ^2.
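A short sketch of the error measure. The slide writes the numerator as a plain sum; the version below divides both the squared error and the variance by the number of points, which is the usual convention for a "mean" squared error and is assumed here. The variance σ^2 is taken over the true values, also an assumption.

```python
def nmse(predicted, true):
    """Mean squared prediction error divided by the variance of the
    true values. A score of 1.0 matches always predicting the mean;
    0.0 is a perfect forecast."""
    n = len(true)
    mean = sum(true) / n
    variance = sum((v - mean) ** 2 for v in true) / n
    mse = sum((p - v) ** 2 for p, v in zip(predicted, true)) / n
    return mse / variance

# Usage: predicting the mean everywhere scores exactly 1.0.
print(nmse([2, 2, 2], [1, 2, 3]))  # 1.0
```

Normalizing by the variance makes scores comparable across the three datasets, whose value ranges differ.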
28
Logistic Parabola: The FD vs. L plot flattens out at L(opt) = 1. [Figures: value versus timesteps; FD versus lag.]
29
Logistic Parabola: [Figure: value versus timesteps, with the point marked where our prediction begins.]
30
Logistic Parabola: [Figure: value versus timesteps, comparing the prediction to the correct values.]
31
Logistic Parabola: Our L(opt) = 1, which exactly minimizes NMSE. [Figure: NMSE and FD versus lag.]
32
LORENZ: L(opt) = 5. [Figures: value versus timesteps; FD versus lag.]
33
LORENZ: [Figure: value versus timesteps, with the point marked where our prediction begins.]
34
LORENZ: [Figure: value versus timesteps, comparing the prediction to the correct values.]
35
LORENZ: L(opt) = 5, and NMSE is also optimal at lag = 5. [Figure: NMSE and FD versus lag.]
36
Laser: L(opt) = 7. [Figures: value versus timesteps; FD versus lag.]
37
Laser: [Figure: value versus timesteps, with the point marked where our prediction starts.]
38
Laser: [Figure: value versus timesteps, comparing the prediction to the correct values.]
39
Laser: L(opt) = 7, and the corresponding NMSE is close to optimal. [Figure: NMSE and FD versus lag.]
40
Speed and Scalability: Preprocessing is linear in N, and proportional to the time taken to calculate the FD.
41
Outline: the next section is Conclusions.
42
Conclusions: Our method automatically sets the parameters L(opt) (answers Q1) and k(opt) (answers Q2), in time linear in N.
43
Conclusions: Black-box non-linear time series forecasting. Fractal dimensions give a fast, automated method to set all parameters. So, given any time series, we can automatically build a prediction system. This is useful in a sensor network setting.
44
Snapshot (extra slide): http://snapdragon.cald.cs.cmu.edu/TSP
45
Future Work (extra slide): Feature selection; multi-sequence prediction.
46
Discussion: Some other problems (extra slide): Given x_1, x_2, ..., x_N, L(opt), and k(opt): How to forecast? How to find the k(opt) nearest neighbors quickly?
47
Motivation (extra slide): Forecasting also allows us to find outliers (anything that doesn't match our prediction) and to find patterns (if different circumstances lead to similar predictions, they may be related).
48
Motivation (Examples) (extra slide): Traditional: EEGs (patterns of electromagnetic impulses in the brain); intensity variations of white dwarf stars; highway usage over time. Sensors: "Active Disks" for forecasting / prefetching / buffering; "Smart House" sensors that monitor the situation in a house; volcano monitoring.
49
General Method (extra slide): Store all the delay vectors {x_{t-1}, ..., x_{t-L(opt)}} and the corresponding prediction x_t. Find the latest delay vector (L(opt) = ?). Find its nearest neighbors (k(opt) = ?). Interpolate. [Figure: lag plot of x_t versus x_{t-1}.]
50
Intuition (extra slide): The FD vs. L plot does flatten out at L(opt) = 1. [Figure: fractal dimension versus lag.]
51
Inside Theory (extra slide): The internal state may be unobserved, but the delay vector space is a faithful reconstruction of the internal system state. So prediction in delay vector space is as good as prediction in state space.
52
Fractal Dimensions (extra slide): Many real-world datasets have a fractional intrinsic dimension. There exist fast (O(N)) methods to calculate the fractal dimension of a cloud of points [Belussi/1995].
53
Speed and Scalability (extra slide): Preprocessing time varies as L(opt)^2.