Bridging the Gap between Applications and Tools: Modeling Multivariate Time Series X Liu, S Swift & A Tucker Department of Computer Science Birkbeck College.


1 Bridging the Gap between Applications and Tools: Modeling Multivariate Time Series
X Liu, S Swift & A Tucker
Department of Computer Science, Birkbeck College, University of London

4 MTS Applications at Birkbeck
Screening
Forecasting
Explanation

5 Forecasting
Predicting Visual Field Deterioration of Glaucoma Patients
Function Prediction for Novel Proteins from Multiple Sequence/Structure Data

6 Explanation
Input (observations):
t - 0: Tail Gas Flow in_state 0
t - 3: Reboiler Temperature in_state 1
Output (explanation):
t - 7: Top Temperature in_state 0 with probability=0.92
t - 54: Feed Rate in_state 1 with probability=0.71
t - 75: Reactor Temperature in_state 0 with probability=0.65

7 The Gaps
Screening: Automatic / Semi-Automatic Analysis of Outliers
Forecasting: Analysing Short Multivariate Time Series
Explanation: Coping with Huge Search Spaces

8 The Problem - What/Why/How
Short-Term Forecasting of Visual Field Progression Using a Statistical MTS Model
The Vector Auto-Regressive Process - VAR(P)
There Could be Problems if the MTS is Short
A Modified Genetic Algorithm (GA) can be Used: VARGA
The Prediction of Visual Field Deterioration Plays an Important Role in the Management of the Condition

9 Background - The Dataset
The interval between tests is about 6 months
Typically, 76 points are measured
The number of tests can range between 10 and 44
Values range between 60 (very good) and 0 (blind)
[Figure: map of the 76 numbered visual field test points (right eye), marking the points used in this paper and the usual position of the blind spot]

10 Background - The VAR Process
Vector Auto-Regressive Process of Order P, VAR(P): $x(t) = \sum_{i=1}^{P} A_i \, x(t-i) + \varepsilon(t)$
$x(t)$: VF Test for Data Points at Time t ($K \times 1$)
$A_i$: Parameter Matrix at Lag i ($K \times K$)
$x(t-i)$: VF Test for Data Points at lag i from t ($K \times 1$)
$\varepsilon(t)$: Observational Noise at time t ($K \times 1$)
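
As a minimal sketch of how a VAR(P) model produces a one-step-ahead forecast, the following sums each parameter matrix applied to the correspondingly lagged observation, with the noise term set to zero. The function name `var_forecast` and the toy matrices are illustrative assumptions, not part of the original slides.

```python
def var_forecast(history, params):
    """One-step-ahead VAR(P) forecast with the noise term set to zero.

    history: list of observations (each a list of K values), oldest first.
    params:  list of P matrices (K x K nested lists); params[i - 1] is A_i.
    """
    k = len(history[-1])
    forecast = [0.0] * k
    for lag, a in enumerate(params, start=1):
        past = history[-lag]  # observation at lag `lag` from the forecast time
        for row in range(k):
            forecast[row] += sum(a[row][col] * past[col] for col in range(k))
    return forecast

# Toy example with K = 2 variables and P = 2 lags (values are illustrative only).
A1 = [[0.5, 0.0], [0.0, 0.5]]
A2 = [[0.25, 0.0], [0.0, 0.25]]
series = [[4.0, 8.0], [2.0, 6.0]]
prediction = var_forecast(series, [A1, A2])
```

With K variables and P lags there are P × K × K parameters to estimate, which hints at why a short series of tests can be a problem for conventional fitting.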

11 The Genetic Algorithm
Generate a Population of random Chromosomes (Solutions)
Repeat for a number of Generations
Cross Over the current Population
Mutate the current Population
Select the Fittest for the next Population
Loop
The best solution to the problem is the Chromosome in the last generation which has the highest Fitness
“A Search/Optimisation method that solves a problem through maintaining and improving a population of suitable candidate solutions using biological metaphors”
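
The loop above can be sketched in a few lines. This is a toy illustration under stated assumptions rather than the authors' implementation: chromosomes are lists of bits, only the children are mutated, selection keeps the fittest of parents and children, and the name `run_ga` and all default parameter values are hypothetical.

```python
import random

def run_ga(fitness, n_bits, pop_size=20, generations=30, mp=0.05, seed=0):
    """Toy GA over bit lists; returns the fittest chromosome of the last generation."""
    rng = random.Random(seed)
    # Generate a population of random chromosomes (candidate solutions).
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Cross over: pair shuffled parents and swap their tails at a random point.
        rng.shuffle(population)
        children = []
        for a, b in zip(population[0::2], population[1::2]):
            x = rng.randint(1, n_bits - 1)
            children.append(a[:x] + b[x:])
            children.append(b[:x] + a[x:])
        # Mutate: each gene of each child inverts with probability mp.
        children = [[1 - g if rng.random() < mp else g for g in child]
                    for child in children]
        # Select the fittest of parents and children for the next population.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

# Toy fitness: the number of 1 bits (higher is fitter).
best = run_ga(fitness=sum, n_bits=16)
```

Because the fittest individuals always survive selection in this sketch, the best fitness never decreases from one generation to the next.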

12 GAs - Chromosome Example
X: 0-127, encoded as 0000000-1111111
Y: 0-31, encoded as 00000-11111
Chromosome: 0000000.00000-1111111.11111
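
The encoding in this example can be sketched as follows: X takes 7 bits, Y takes 5 bits, and the chromosome is their concatenation. The function names `encode` and `decode` are hypothetical.

```python
def encode(x, y):
    """Pack X (0..127) into 7 bits and Y (0..31) into 5 bits, X first."""
    if not (0 <= x <= 127 and 0 <= y <= 31):
        raise ValueError("x must be in 0..127 and y in 0..31")
    return format(x, "07b") + format(y, "05b")

def decode(chromosome):
    """Split a 12-bit chromosome back into its (X, Y) pair."""
    return int(chromosome[:7], 2), int(chromosome[7:], 2)

chromosome = encode(100, 9)   # "1100100" followed by "01001"
x, y = decode(chromosome)
```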

13 GAs - Mutation
Each Bit (gene) of a Chromosome is Given a Chance MP of inverting
A ‘1’ becomes a ‘0’, and a ‘0’ becomes a ‘1’
Example: 01101101 becomes 00101111 (the second and seventh bits are inverted)
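
A minimal sketch of bit-flip mutation, assuming chromosomes are strings of '0' and '1'. `mutate` applies the per-gene chance MP, while `flip_bits` (a hypothetical helper) inverts chosen positions so that the slide's example can be reproduced deterministically.

```python
import random

def invert(bit):
    """A '1' becomes a '0', and a '0' becomes a '1'."""
    return "1" if bit == "0" else "0"

def mutate(chromosome, mp, rng=random):
    """Invert each gene independently with probability mp."""
    return "".join(invert(b) if rng.random() < mp else b for b in chromosome)

def flip_bits(chromosome, positions):
    """Invert the genes at the given 0-based positions."""
    return "".join(invert(b) if i in positions else b
                   for i, b in enumerate(chromosome))

# The slide's example: the second and seventh bits (0-based positions 1 and 6) invert.
mutated = flip_bits("01101101", {1, 6})
```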

14 GAs - Crossover (2)
Parents: A = 01011101, B = 11101010
Crossover point: X = 4
Children: C = 01011010, D = 11101101
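
The example corresponds to one-point crossover: each child keeps the first X genes of one parent and takes the remainder from the other. A short sketch, with the function name `one_point_crossover` being an assumption:

```python
def one_point_crossover(a, b, x):
    """Swap the tails of two equal-length chromosomes after position x."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    return a[:x] + b[x:], b[:x] + a[x:]

# Parents A and B with crossover point X = 4, as on the slide.
c, d = one_point_crossover("01011101", "11101010", 4)
```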

15 VARGA - Representation
Chromosome: A_1 A_2 ... A_m ... A_p
A_1: a_111 … a_1ij … a_1KK
A_2: a_211 … a_2ij … a_2KK
A_m: a_m11 … a_mij … a_mKK
A_p: a_p11 … a_pij … a_pKK
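
One way to read this representation is as a flat list of genes holding the P parameter matrices one after another. The sketch below assumes lag 1 comes first and that each matrix is stored row by row; both orderings, and the name `decode_chromosome`, are assumptions made for illustration.

```python
def decode_chromosome(genes, k):
    """Split a flat gene list into P matrices of size K x K (row-major, lag 1 first)."""
    size = k * k
    if len(genes) % size != 0:
        raise ValueError("gene count must be a multiple of K * K")
    p = len(genes) // size  # the order P is implied by the chromosome length
    return [
        [genes[m * size + i * k: m * size + (i + 1) * k] for i in range(k)]
        for m in range(p)
    ]

# Eight genes with K = 2 decode into P = 2 matrices.
matrices = decode_chromosome(list(range(8)), k=2)
```

Under this layout, changing the order P simply changes the chromosome length by a multiple of K × K genes.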

16 VARGA - The Genetic Algorithm
GA With Extra Mutation
Order Mutation After Gene Mutation
Parents and Children Mutate (Both)
Genes are Bound Natural Numbers
Fitness is -ve Forecast Error
Minimisation Problem - Roulette Wheel
Run for EACH Patient
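
Roulette wheel selection needs non-negative weights, so a minimisation problem has to map forecast errors to weights in some way. The slides do not say how VARGA does this; the sketch below assumes a simple shift so that the worst error gets a tiny weight. The name `roulette_select` and the toy error values are hypothetical.

```python
import random

def roulette_select(population, errors, n, rng=random):
    """Draw n chromosomes, favouring those with a lower forecast error."""
    worst = max(errors)
    # Shift so the largest error gets a tiny weight and smaller errors get more.
    weights = [worst - e + 1e-9 for e in errors]
    return rng.choices(population, weights=weights, k=n)

rng = random.Random(1)
picks = roulette_select(["low_error", "high_error"], [0.5, 10.0], 1000, rng)
```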

17 Evaluation - Methods for Comparison
SPlus: Yule-Walker Equations, AIC and Whittle's Recursion, N  K(P+1), Standard Package
Holt-Winters Univariate Forecasting Method, Is the Data Univariate? (GA Solution)
Pure Noise Model, VAR(0), Worst Case Forecast, (Non-Differenced = 0)
54 out of the Possible 82 Patients' VF Records Could not be Used: SPlus Implementation

18 Results - Graph Comparison
The Lower the Score - the Better
Score is the One Step Ahead Forecast Error

19 Results - Table Summary
Average = The Average One Step Forecast Error For the 28 Patients (Both GA’s Fitness)
The Lower - The Better

20 Conclusion - Results
VARGA Has a Better Performance
VARGA Can Model Short MTS
The Visual Field Data is Definitely Multivariate
Data Has a High Proportion of Noise

21 Conclusion - Remarks
Non-Linear Methods and Transformations
Performance Enhancements for the GA
Improve Crossover
Irregularly Spaced Methods
Space-Time Series Methods
Time Dependent Relationships Between Variables

22 Generating Explanations in MTS
Useful to know probable explanations for a given set of observations within a time series
E.g. Oil Refinery: ‘Why has a temperature become high whilst a pressure has fallen below a certain value?’
Possible paradigm which facilitates Explanation is the Bayesian Network
Evolutionary Methods to learn BNs
Extend work to Dynamic Bayesian Networks

23 Dynamic Bayesian Networks
Static BNs repeated over t time slices
Contemporaneous / Non-Contemporaneous Links
Used for Prediction / Diagnosis within dynamic systems

24 Assumptions - 1
Assume all variables take at least one time slice to impose an effect on another.
The more frequently a system generates data, the more likely this will be true.
Contemporaneous Links can be excluded from the DBN
The variables at time t will be considered independent of one another

25 Representation
P pairs of the form (ParentVar, TimeLag)
Each pair represents a link from a node at a previous time slice to the node in question at time t.
Examples:
Variable 1: { (1,1); (2,2); (0,3)}
Variable 4: { (4,1); (2,5)}
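
The pair representation maps naturally onto a dictionary from each variable to its list of (ParentVar, TimeLag) links. The sketch below uses the two example variables from the slide; the helper `parent_values` and the toy series are assumptions added to show how the lagged parents of a node would be looked up.

```python
# Structure from the slide's examples: variable -> list of (ParentVar, TimeLag).
dbn_structure = {
    1: [(1, 1), (2, 2), (0, 3)],
    4: [(4, 1), (2, 5)],
}

def parent_values(series, variable, t, structure):
    """Return the lagged parent values of `variable` at time t.

    series[var][time] holds the (discretised) state of each variable.
    """
    return [series[parent][t - lag] for parent, lag in structure[variable]]

# Every lag is at least 1, so there are no contemporaneous links.
min_lag = min(lag for links in dbn_structure.values() for _, lag in links)

# Toy series: variable v at time s has the value 10 * v + s (illustrative only).
toy_series = {v: [10 * v + s for s in range(10)] for v in range(5)}
parents_of_1 = parent_values(toy_series, 1, 6, dbn_structure)
```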

26 Search Space
Given the first assumption and proposed representation the Search Space for each variable will be:

27
Structure Search: Evolutionary Algorithms, Hill Climbing etc.
Parameter Calculation given structure
Dynamic Bayesian Network Library for Different Operating States
Multivariate Time Series
Explanation Algorithm (e.g. using Stochastic Simulation)
User Algorithm

28 Generating Synthetic Data (1) (2)

29 Oil Refinery Data
Data recorded every minute
Hundreds of variables
Selected 11 interrelated variables
Discretised each variable into k states
Large Time Lags (up to 120 minutes between some variables)
Different Operating States
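
The slides do not state which discretisation policy was used. As one hedged illustration, equal-width binning maps each reading to one of k states numbered 0 to k - 1; the function name `discretise` and the sample readings are hypothetical.

```python
def discretise(values, k):
    """Equal-width binning of a numeric series into states 0 .. k - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant series has a single state
    width = (hi - lo) / k
    # The maximum value would fall into bin k, so clamp it to the last state.
    return [min(int((v - lo) / width), k - 1) for v in values]

states = discretise([0.0, 2.5, 5.0, 7.5, 10.0], k=2)
```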

30 Results
[Figure labels: SOT, FF, TGF, TT, RinT]

31 Explanations - using Stochastic Simulation

33 Explanation
Input (observations):
t - 0: Tail Gas Flow in_state 0
t - 3: Reboiler Temperature in_state 1
Output (explanation):
t - 7: Top Temperature in_state 0 with probability=0.92
t - 54: Feed Rate in_state 1 with probability=0.71
t - 75: Reactor Temperature in_state 0 with probability=0.65

34 Future Work
Exploring the use of different searches and metrics
Improving accuracy (e.g. different discretisation policies, continuous DBNs)
Using the library of DBNs in order to quickly classify the current state of a system
Automatically Detecting Changing Dependency Structure

35 Acknowledgements
BBSRC
BP-AMOCO
British Council for Prevention of Blindness
EPSRC
Honeywell Hi-Spec Solutions
Honeywell Technology Center
Institute of Ophthalmology
Moorfields Eye Hospital
MRC

36 Intelligent Data Analysis
X Liu
Department of Computer Science, Birkbeck College, University of London

37 Intelligent Data Analysis
An interdisciplinary study concerned with effective analysis of data
Intelligent application of data analytic tools
Application of “intelligent” data analytic tools

38 IDA Requires
Careful thinking at every stage of an analysis process (strategic aspects)
Intelligent application of relevant domain knowledge
Assessment and selection of appropriate analysis methods

39 IDA Conferences
IDA-95, Baden-Baden
IDA-97, London
IDA-99, Amsterdam
IDA-2001, Lisbon

40 IDA in Medicine and Pharmacology
IDAMAP-96, Budapest
IDAMAP-97, Nagoya
IDAMAP-98, Brighton
IDAMAP-99, Washington DC
IDAMAP-2000, Berlin

41 Other IDA Activities
IDA Journal (Elsevier 1997)
Journal Special Issues (1997 -)
Introductory Books (Springer 1999)
The Dagstuhl Seminar (Germany 2000)
European Summer School (Italy 2000)
Special Sessions at Conferences

42 Concluding Remarks
Strategies for data analysis and mining
Strategies for human-computer collaboration in IDA
Principles for exploring and analysing “big data”
Benchmarking interesting real-world data-sets as well as computational methods
A long term interdisciplinary effort

43 The Screening Architecture

44 Results from a GP Clinic

