June 2003: Neural Computation for Time Series
Neural Computation and Applications in Time Series and Signal Processing
Georg Dorffner
Dept. of Medical Cybernetics and Artificial Intelligence, University of Vienna, and Austrian Research Institute for Artificial Intelligence

Neural Computation
- Originally biologically motivated (information processing in the brain)
- Simple mathematical model of the neuron, assembled into a neural network
- Large number of simple "units"
- Massively parallel (in theory)
- Complexity through the interplay of many simple elements
- Strong relationship to methods from statistics
- Suitable for pattern recognition

A Unit
- Propagation rule:
  – Weighted sum
  – Euclidean distance
- Transfer function f:
  – Threshold function (McCulloch & Pitts)
  – Linear function
  – Sigmoid function
  – Gaussian function
[Figure: a unit (neuron) j with weights w_1, w_2, ..., w_i, (net) input x_j, transfer function f, and activation/output y_j]
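
The unit described above can be sketched in a few lines of Python. This is only an illustration: the function names, the example vectors, and the unit-width Gaussian are assumptions, not taken from the slides.

```python
import numpy as np

# Propagation rules: how inputs x and weights w are combined into a net input.
def weighted_sum(x, w):
    return float(np.dot(w, x))

def euclidean_distance(x, w):
    return float(np.linalg.norm(x - w))

# Transfer functions f: how the net input is turned into the activation.
def threshold(a):
    return 1.0 if a >= 0 else 0.0

def linear(a):
    return a

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gaussian(a):
    return float(np.exp(-a ** 2))  # assumed unit width

def unit_output(x, w, propagate, transfer):
    """Activation y = f(net input) of a single unit."""
    return transfer(propagate(x, w))

x = np.array([1.0, 2.0])       # hypothetical inputs
w = np.array([0.5, -0.25])     # hypothetical weights

# MLP-style unit: weighted sum + sigmoid (net input is 0.0 here, so y = 0.5)
y_sigmoid = unit_output(x, w, weighted_sum, sigmoid)
# RBFN-style unit: Euclidean distance + Gaussian
y_rbf = unit_output(x, w, euclidean_distance, gaussian)
```

Pairing the weighted sum with a sigmoid gives the typical MLP hidden unit; pairing the Euclidean distance with a Gaussian gives the typical RBFN hidden unit.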

Multilayer Perceptron (MLP), Radial Basis Function Network (RBFN)
- 2 (or more) layers (= layers of connections)
- Input units
- Hidden units (typically nonlinear)
- Output units (typically linear)
- MLP: hidden units use a weighted sum with a sigmoid
- RBFN: hidden units use a Euclidean distance with a Gaussian
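
The formulas on this slide were lost in extraction. As a stand-in, the two network types are commonly written as follows for one hidden layer and linear outputs. The symbols (hidden-to-output weights v_{jk}, biases b_j and c_k, widths \sigma_j) are assumed notation, not necessarily the author's.

```latex
\begin{align*}
\text{MLP:}\quad  y_k &= \sum_{j} v_{jk}\, f\!\Big(\sum_{i} w_{ij}\, x_i + b_j\Big) + c_k,
  \qquad f(a) = \frac{1}{1 + e^{-a}} \\
\text{RBFN:}\quad y_k &= \sum_{j} v_{jk}\, \exp\!\Big(-\frac{\lVert \mathbf{x} - \mathbf{w}_j \rVert^2}{2\sigma_j^2}\Big) + c_k
\end{align*}
```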

MLP as Universal Function Approximator
- E.g.: 1 input, 1 output, 5 hidden units
- An MLP can approximate arbitrary functions (Hornik et al. 1990) through superposition of weighted sigmoids
  – The bias moves each sigmoid; the weights stretch and mirror it
- The same holds for RBFNs

Training (Model Estimation)
- Typical error function: summed squared error between network output and target, summed over all outputs and all patterns
- "Backpropagation" (application of the chain rule): the gradient is the product of a contribution of the error function and a contribution of the network
- Iterative optimisation based on the gradient (gradient descent, conjugate gradient, quasi-Newton)
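
A minimal sketch of this training procedure, assuming a 1-5-1 network with tanh hidden units, a factor 1/2 in the summed squared error, and plain gradient descent. The toy target (a sine curve), the learning rate, and all names are illustrative assumptions.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    H = np.tanh(X @ W1 + b1)   # hidden activations (nonlinear)
    Y = H @ W2 + b2            # outputs (linear)
    return H, Y

def sse(Y, T):
    """Summed squared error over all patterns and all outputs (with factor 1/2)."""
    return 0.5 * np.sum((Y - T) ** 2)

def gradients(X, T, W1, b1, W2, b2):
    """Backpropagation: chain rule applied layer by layer."""
    H, Y = forward(X, W1, b1, W2, b2)
    dY = Y - T                           # contribution of the error function
    dW2 = H.T @ dY
    db2 = dY.sum(axis=0)
    dA = (dY @ W2.T) * (1.0 - H ** 2)    # contribution of the network (tanh')
    dW1 = X.T @ dA
    db1 = dA.sum(axis=0)
    return [dW1, db1, dW2, db2]

rng = np.random.default_rng(0)
X = np.linspace(-2.0, 2.0, 40).reshape(-1, 1)   # toy inputs
T = np.sin(X)                                   # toy targets
params = [rng.normal(scale=0.5, size=(1, 5)), np.zeros(5),
          rng.normal(scale=0.5, size=(5, 1)), np.zeros(1)]

initial_error = sse(forward(X, *params)[1], T)
eta = 0.001                                     # assumed learning rate
for _ in range(3000):
    grads = gradients(X, T, *params)
    params = [p - eta * g for p, g in zip(params, grads)]
final_error = sse(forward(X, *params)[1], T)
```

Conjugate gradient or quasi-Newton methods would reuse the same gradient and differ only in how the update step is chosen.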

Recurrent Perceptrons
- Recurrent connection = feedback loop
- From the hidden layer ("Elman") or the output layer ("Jordan")
- Learning: "backpropagation through time"
[Figure: input layer, state or context layer, copy connection]
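
The copy mechanism of an Elman-style network can be sketched as a single forward step. The layer sizes, random weights, and input values below are arbitrary assumptions; training by backpropagation through time is not shown.

```python
import numpy as np

def elman_step(x, context, W_in, W_ctx, W_out):
    """One time step: the hidden layer sees the input and the previous hidden state."""
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    output = W_out @ hidden
    return output, hidden   # hidden is copied into the context layer

rng = np.random.default_rng(0)
W_in = rng.normal(size=(3, 1))    # input -> hidden
W_ctx = rng.normal(size=(3, 3))   # context -> hidden (the feedback loop)
W_out = rng.normal(size=(1, 3))   # hidden -> output

context = np.zeros(3)             # empty context before the first input
outputs = []
for value in [0.1, 0.5, -0.3]:    # hypothetical input sequence
    y, context = elman_step(np.array([value]), context, W_in, W_ctx, W_out)
    outputs.append(float(y[0]))
```

A Jordan-style network would copy the output rather than the hidden activations into the context layer.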

Time series processing
- Given: time-dependent observables
- Scalar: univariate; vector: multivariate
- Time series (minutes to days); signals (milliseconds to seconds)
- Typical tasks:
  – Forecasting
  – Noise modeling
  – Pattern recognition
  – Modeling
  – Filtering
  – Source separation

Examples
- Standard & Poor's; preprocessed: returns
- Sunspots; preprocessed: de-seasoned

Autoregressive models
- Forecasting: making use of past information to predict (estimate) the future
- AR: past information = past observations
- Each observation = expected value given the past observations + noise ("random shock")
- Best forecast: the expected value

Linear AR models
- Most common case: the expected value is a linear combination of the past p observations
- Simplest form: random walk
  – A nontrivial forecast is impossible
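
As an illustration, a linear AR(p) model x_t = a_1 x_{t-1} + ... + a_p x_{t-p} + e_t can be estimated by ordinary least squares. The helper names and the simulated AR(1) series below are assumptions made for this sketch.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares estimate of a_1..a_p in x_t = sum_i a_i x_{t-i} + e_t."""
    n = len(series)
    # Column i holds x_{t-i} for t = p .. n-1
    X = np.column_stack([series[p - i:n - i] for i in range(1, p + 1)])
    y = series[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def forecast_ar(series, coeffs):
    """One-step forecast: the expected value given the last p observations."""
    p = len(coeffs)
    recent = series[-1:-p - 1:-1]   # x_{t-1}, x_{t-2}, ..., x_{t-p}
    return float(np.dot(coeffs, recent))

# Simulated AR(1) series with a_1 = 0.8 (toy data)
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.normal()

a_hat = fit_ar(x, p=1)
# Random walk (a_1 = 1): the forecast is just the last observed value
random_walk_forecast = forecast_ar(x, np.array([1.0]))
```

The last line shows why the random walk allows no nontrivial forecast: the best estimate of the next value is simply the current one.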

MLP as NAR
- A neural network can approximate a nonlinear AR (NAR) model
- Input: "time window" or "time delay" of past observations
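
The "time window" input can be built with a small helper that turns a series into (past window, next value) training pairs. The function name and the toy series are assumptions for illustration.

```python
import numpy as np

def time_window(series, p):
    """Return (inputs, targets): each input row holds the p values preceding its target."""
    series = np.asarray(series, dtype=float)
    inputs = np.array([series[t - p:t] for t in range(p, len(series))])
    targets = series[p:]
    return inputs, targets

inputs, targets = time_window([1, 2, 3, 4, 5], p=2)
# inputs  -> [[1, 2], [2, 3], [3, 4]]
# targets -> [3, 4, 5]
```

Each row of `inputs` would be presented to the network's input units, with the matching entry of `targets` as the training target.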

Noise modeling
- Regression is density estimation of the joint density of target (= future) and input (= past) (Bishop 1995)
- Likelihood: for each pattern, a distribution with expected value F(x_i)

Gaussian noise
- Likelihood: for each pattern, a Gaussian density around the expected value F(x_i)
- Maximization = minimization of -log L (constant terms can be dropped, incl. p(x))
- Corresponds to the summed squared error (typical backpropagation)
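
The equations on these two slides were lost in extraction. The following is a standard reconstruction of the argument, assuming N independent patterns (x_i, t_i) and Gaussian noise with a constant standard deviation \sigma. The notation is an assumption, not necessarily the author's.

```latex
\begin{align*}
p(t, x) &= p(t \mid x)\, p(x) \\
L &= \prod_{i=1}^{N} p(t_i \mid x_i)\, p(x_i) \\
p(t_i \mid x_i) &= \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{\big(t_i - F(x_i)\big)^2}{2\sigma^2}\right) \\
-\log L &= \frac{1}{2\sigma^2} \sum_{i=1}^{N} \big(t_i - F(x_i)\big)^2
  + N \log\!\big(\sqrt{2\pi}\,\sigma\big) - \sum_{i=1}^{N} \log p(x_i)
\end{align*}
```

Only the first term depends on the network F, so minimizing -log L with respect to the weights is the same as minimizing the summed squared error.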

Complex noise models
- Assumption: the noise follows an arbitrary distribution D
- The parameters of D are time-dependent (dependent on the past)
- Likelihood: expressed through the probability density function for D

Heteroskedastic time series
- Assumption: the noise is Gaussian with time-dependent variance
- ARCH model
- An MLP is a nonlinear ARCH model (when applied to returns/residuals)
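
A sketch of the linear ARCH(p) case, assuming the usual form sigma_t^2 = a0 + sum_i a_i r_{t-i}^2 and zero-mean Gaussian noise. The coefficients and return values are made up. A nonlinear ARCH model would replace `arch_variance` with a network that outputs a positive variance.

```python
import numpy as np

def arch_variance(returns, a0, a):
    """sigma_t^2 = a0 + sum_i a[i-1] * r_{t-i}^2, for t = p .. len(returns) - 1."""
    p = len(a)
    r2 = np.asarray(returns, dtype=float) ** 2
    # r2[t - p:t][::-1] is r_{t-1}^2, r_{t-2}^2, ..., r_{t-p}^2
    return np.array([a0 + np.dot(a, r2[t - p:t][::-1]) for t in range(p, len(r2))])

def gaussian_nll(returns, variances):
    """Negative log-likelihood of zero-mean Gaussian noise with per-step variance."""
    r = np.asarray(returns, dtype=float)
    return 0.5 * np.sum(np.log(2 * np.pi * variances) + r ** 2 / variances)

returns = np.array([0.1, -0.2, 0.05, 0.3])              # toy returns
variances = arch_variance(returns, a0=0.01, a=np.array([0.5]))
nll = gaussian_nll(returns[1:], variances)              # error function to minimize
```

Unlike the constant-variance case, the variance term can no longer be dropped from the error function, because it now depends on the model parameters.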

Non-Gaussian noise
- Other parametric pdfs (e.g. t-distribution)
- Mixture of Gaussians (mixture density network, Bishop 1994)
- Network with 3k outputs for k Gaussians (mixing weight, mean, and variance each), or separate networks
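
One way to turn the 3k raw outputs into valid mixture parameters is sketched below: a softmax for the mixing weights and an exponential for the standard deviations. This follows the usual mixture density network recipe, but the output ordering and all names are assumptions.

```python
import numpy as np

def mixture_params(outputs, k):
    """Split 3k raw network outputs into mixing weights, means, and std devs."""
    z_pi, mu, z_sigma = outputs[:k], outputs[k:2 * k], outputs[2 * k:]
    e = np.exp(z_pi - np.max(z_pi))
    pi = e / e.sum()           # softmax: weights are positive and sum to 1
    sigma = np.exp(z_sigma)    # exponential keeps std devs positive
    return pi, mu, sigma

def mixture_density(t, pi, mu, sigma):
    """Density of target t under the Gaussian mixture."""
    comps = np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return float(np.dot(pi, comps))

outputs = np.array([0.0, 0.0, -1.0, 1.0, 0.0, 0.0])   # hypothetical outputs, k = 2
pi, mu, sigma = mixture_params(outputs, k=2)
nll = -np.log(mixture_density(0.5, pi, mu, sigma))    # per-pattern error term
```

Summing `nll` over all patterns gives the error function that replaces the summed squared error during training.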

Identifiability problem
- Mixture models (like neural networks) are not identifiable (parameters cannot be interpreted)
- No distinction between model and noise, e.g. for the sunspot data
- Models have to be treated with care

Recurrent networks: Moving Average
- Second model class: moving average (MA) models
- Past information: random shocks
- Recurrent (Jordan) network: nonlinear MA
- However, convergence is not guaranteed
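
For reference, the textbook form of a linear MA(q) model and a nonlinear counterpart is given below. The slide's own equations were not preserved, so the symbols are assumed.

```latex
\begin{align*}
\text{MA}(q):\quad  x_t &= \sum_{i=1}^{q} b_i\, \varepsilon_{t-i} + \varepsilon_t \\
\text{nonlinear MA:}\quad x_t &= F(\varepsilon_{t-1}, \dots, \varepsilon_{t-q}) + \varepsilon_t \\
\text{estimated shocks:}\quad \hat{\varepsilon}_t &= x_t - \hat{x}_t
\end{align*}
```

The shocks are not observed directly. They have to be estimated as the difference between observation and forecast, which is why the network's own output must be fed back, as in a Jordan network.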

GARCH
- Extension of ARCH: the variance also depends on past variance estimates
- Explains "volatility clustering"
- A neural network can again be a nonlinear version
- Using past estimates: recurrent network
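
A sketch of the linear GARCH(1,1) recursion, assuming the common form sigma_t^2 = a0 + a1 r_{t-1}^2 + b1 sigma_{t-1}^2. The coefficients, the starting variance, and the returns are made-up values.

```python
import numpy as np

def garch_variance(returns, a0, a1, b1, initial_variance):
    """GARCH(1,1): each variance depends on the last return and the last variance estimate."""
    r = np.asarray(returns, dtype=float)
    variances = np.empty(len(r))
    variances[0] = initial_variance
    for t in range(1, len(r)):
        # The dependence on variances[t - 1] is the recurrent part
        variances[t] = a0 + a1 * r[t - 1] ** 2 + b1 * variances[t - 1]
    return variances

v = garch_variance([0.1, -0.2, 0.05], a0=0.01, a1=0.1, b1=0.8, initial_variance=0.02)
# v -> [0.02, 0.027, 0.0356]
```

Because each estimate feeds into the next, a neural version needs a recurrent connection from the variance output back to the input.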

State space models
- Observables depend on a (hidden) time-variant state
- Strong relationship to recurrent (Elman) networks
- Nonlinear version only with additional hidden layers
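
A generic linear state space model, and the corresponding Elman-style recurrence, can be written as follows. The matrices A, B, C, W, V and the noise terms are assumed notation, since the slide's equations were not preserved.

```latex
\begin{align*}
\text{state transition:}\quad  \mathbf{s}_t &= A\, \mathbf{s}_{t-1} + B\, \boldsymbol{\varepsilon}_t \\
\text{observation:}\quad       \mathbf{x}_t &= C\, \mathbf{s}_t + \boldsymbol{\eta}_t \\
\text{Elman network:}\quad     \mathbf{s}_t &= f\big(W\, \mathbf{s}_{t-1} + V\, \mathbf{x}_{t-1}\big),
  \qquad \hat{\mathbf{x}}_t = C\, \mathbf{s}_t
\end{align*}
```

In this reading, the hidden (context) layer of the Elman network plays the role of the state.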

Symbolic time series
- Examples:
  – DNA
  – Text
  – Quantised time series (e.g. "up" and "down")
- Past information: the past p symbols determine a probability distribution over the next symbol (from a finite alphabet)
- Markov chains
- Problem: long substrings are rare
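
A p-th order Markov chain can be estimated by counting which symbol follows each context of p symbols. The function name and the toy "up"/"down" sequence are assumptions for illustration.

```python
from collections import Counter, defaultdict

def estimate_markov(symbols, p):
    """Estimate P(next symbol | past p symbols) from relative frequencies."""
    counts = defaultdict(Counter)
    for t in range(p, len(symbols)):
        context = tuple(symbols[t - p:t])
        counts[context][symbols[t]] += 1
    return {
        ctx: {s: n / sum(c.values()) for s, n in c.items()}
        for ctx, c in counts.items()
    }

# Quantised toy series: "u" = up, "d" = down
model = estimate_markov("uuduudud", p=2)
# model[("u", "u")] -> {"d": 1.0}
# model[("d", "u")] -> {"u": 0.5, "d": 0.5}
```

The number of possible contexts grows as (alphabet size)^p, so for large p most contexts occur rarely or never. This is the "long substrings are rare" problem.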

Fractal prediction machines
- Similar subsequences are mapped to points close in space
- Clustering = extraction of a stochastic automaton
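
The mapping of subsequences to points can be illustrated with a chaos-game style iteration, in which each symbol pulls the current point towards its own corner of the unit hypercube. This is a sketch under assumptions: the contraction factor k = 0.5, the four-symbol alphabet, and the corner assignment are all chosen for illustration, and the slide does not spell out the exact construction.

```python
import numpy as np

def chaos_game_point(symbols, corners, k=0.5):
    """Map a symbol sequence to a point; recent symbols dominate the position."""
    dim = len(next(iter(corners.values())))
    x = np.full(dim, 0.5)   # start in the centre of the unit hypercube
    for s in symbols:
        x = k * x + (1 - k) * np.asarray(corners[s], dtype=float)
    return x

# Hypothetical 4-symbol alphabet, one corner of the unit square per symbol
corners = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}

p1 = chaos_game_point("abcdab", corners)
p2 = chaos_game_point("ddcdab", corners)   # shares the suffix "cdab" with p1
p3 = chaos_game_point("abcdad", corners)   # differs in the most recent symbol

dist_shared_suffix = float(np.linalg.norm(p1 - p2))
dist_different_end = float(np.linalg.norm(p1 - p3))
```

Sequences with a long common suffix end up close together, so clustering the points groups histories that are similar in their recent past. Each cluster can then serve as a state of a stochastic automaton.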

Relationship to recurrent network
- Network of 2nd order

Other topics
- Filtering: corresponds to ARMA models
  – NN as nonlinear filters
- Source separation: independent component analysis
- Relationship to stochastic automata

Practical considerations
- Stationarity is an important issue
- Preprocessing (trends, seasonalities)
- N-fold cross-validation, time-wise (the validation set must come after the training set)
- Mean and standard deviation over the folds: model selection
- Data split in temporal order: train, validation, test
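
One possible time-wise splitting scheme is sketched below: the training set grows from fold to fold and each validation block lies strictly after it. The expanding-window design, the function name, and the sizes are assumptions. The slide only requires that validation data follow the training data.

```python
def timewise_splits(n_samples, n_folds, val_size):
    """Yield (train_idx, val_idx); each validation block follows its training data."""
    first_val_start = n_samples - n_folds * val_size
    if first_val_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        val_start = first_val_start + fold * val_size
        yield range(0, val_start), range(val_start, val_start + val_size)

splits = [(list(tr), list(va)) for tr, va in timewise_splits(10, n_folds=3, val_size=2)]
# fold 0: train [0..3], validation [4, 5]
# fold 1: train [0..5], validation [6, 7]
# fold 2: train [0..7], validation [8, 9]
```

The validation errors of the folds can then be summarised by their mean and standard deviation to compare candidate models. A final test set would be held out after the last validation block.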

Summary
- Neural networks are powerful semi-parametric models for nonlinear dependencies
- They can be considered nonlinear extensions of classical time series and signal processing techniques
- Applying semi-parametric models to noise modeling adds another interesting facet
- Models must be treated with care; a lot of data is necessary