1 A Framework for Modelling Short, High-Dimensional Multivariate Time Series: Preliminary Results in Virus Gene Expression Data Analysis
Paul Kellam (1), Xiaohui Liu (2), Nigel Martin (3), Christine Orengo (4), Stephen Swift (2), Allan Tucker (2)
(1) Dept of Immunology and Molecular Pathology, UCL, UK
(2) Dept of Information Systems and Computing, Brunel University, UK
(3) Dept of Computer Science, Birkbeck College, London, WC1E 7HX, UK
(4) Dept of Biochemistry and Molecular Biology, UCL, WC1E 6BT, UK
2 Framework
[Figure: framework diagram with stages Expression Data, Clustering Algorithms, Cluster Fusion, Model Building, and labels Clusters, Robust Clusters, Forecasts, Explanations]
3 Clustering Algorithms
Hierarchical
The Grouping Genetic Algorithm
K-Means
The Self Organising Map
4 Cluster Fusion (1)
Construct Agreement Matrix
[Figure: Cluster Method 1, Cluster Method 2, ..., Cluster Method N; Clusterfusion]
5 The Agreement Matrix
[Figure: agreement matrix F, with axes labelled "From Gene" and "To Gene"]
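The slides do not spell out how the entries of the agreement matrix are computed. A minimal sketch, assuming each entry counts how many clustering methods place the two genes in the same cluster, could look like this. The function name, the diagonal convention, and the toy labels are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def agreement_matrix(clusterings):
    """Count, for every pair of genes, how many clusterings co-cluster them.

    clusterings[m][i] is the cluster label that method m gives to gene i.
    Labels only need to be consistent within one method, not across methods.
    """
    n_genes = len(clusterings[0])
    matrix = [[0] * n_genes for _ in range(n_genes)]
    for labels in clusterings:
        if len(labels) != n_genes:
            raise ValueError("every clustering must label the same genes")
        for i, j in combinations(range(n_genes), 2):
            if labels[i] == labels[j]:
                matrix[i][j] += 1
                matrix[j][i] += 1
    # Assumed convention: a gene agrees with itself under every method.
    for i in range(n_genes):
        matrix[i][i] = len(clusterings)
    return matrix


# Hypothetical toy input: three methods, five genes.
toy_clusterings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
toy_agreement = agreement_matrix(toy_clusterings)
```

In this toy input, genes 0 and 1 are grouped together by all three methods, so their entry equals the number of methods, while genes 0 and 3 are never grouped together.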
6 Viral Gene Expression Data
Kaposi's Sarcoma-Associated Human Herpesvirus 8 (HHV8)
106 viral and human genes
Induced with 12-O-Tetradecanoylphorbol 13-Acetate (TPA)
13 measurements over time
Normalised expression levels
7 Evaluation
Compare cluster similarity using Weighted-Kappa
Compare clusters against biological domain knowledge
Clusterfusion
8 Weighted-Kappa Results
Hx: Hierarchical Clustering with x Clusters
Kx: K-Means Clustering with x Clusters
Sx: Self Organising Map with x Clusters
Gx: Grouping Genetic Algorithm with x Clusters
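The slides do not give the weighting scheme behind the Weighted-Kappa comparison. As a simplified illustration only, the sketch below computes plain (unweighted) Cohen's kappa over gene pairs, treating "same cluster" versus "different cluster" as the two categories. The function name and the toy labels are hypothetical, and this is not claimed to be the authors' exact metric.

```python
from itertools import combinations


def pairwise_kappa(labels_a, labels_b):
    """Cohen's kappa on pairwise co-membership for two clusterings of the same genes."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same genes")
    if len(labels_a) < 2:
        raise ValueError("need at least two genes to form a pair")
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    total = both + only_a + only_b + neither
    observed = (both + neither) / total
    p_same_a = (both + only_a) / total
    p_same_b = (both + only_b) / total
    # Agreement expected by chance if the two clusterings were independent.
    expected = p_same_a * p_same_b + (1 - p_same_a) * (1 - p_same_b)
    if expected == 1.0:
        return 1.0  # degenerate case: both clusterings are trivially identical
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels: the same partition under different label names,
# and a partition that cuts across it.
kappa_same = pairwise_kappa([0, 0, 1, 1], [1, 1, 0, 0])
kappa_crossed = pairwise_kappa([0, 0, 1, 1], [0, 1, 0, 1])
```

Working on gene pairs makes the score independent of how each method names its clusters, which matters when comparing, say, an H-type result against a K-type result.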
9 Domain Knowledge Results
10 Clusterfusion Results
48 out of 106 genes unassigned
Mostly pairs or triples
Only 3 of feature 2 are present!
Although there are some interesting results, e.g. unknown function genes placed with those of known function
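One way to see why many genes can end up unassigned and why the surviving groups are small: if a robust cluster is taken to mean a set of genes that every method groups together, any single disagreement splits a group. The sketch below makes that assumption explicit. It is not the Clusterfusion algorithm itself, whose details the slides do not give, and the names and toy labels are hypothetical.

```python
from collections import defaultdict


def robust_clusters(clusterings):
    """Group genes that share a cluster under every method (assumed definition).

    Returns (clusters, unassigned): clusters holds groups of two or more genes,
    unassigned holds genes that no other gene matches under all methods.
    """
    n_genes = len(clusterings[0])
    groups = defaultdict(list)
    for gene in range(n_genes):
        # The tuple of labels across methods identifies a full-agreement group.
        signature = tuple(labels[gene] for labels in clusterings)
        groups[signature].append(gene)
    clusters = [members for members in groups.values() if len(members) > 1]
    unassigned = [members[0] for members in groups.values() if len(members) == 1]
    return clusters, unassigned


# Hypothetical toy input: three methods, five genes.
toy_methods = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
toy_robust, toy_unassigned = robust_clusters(toy_methods)
```

Here only genes 0 and 1 survive as a robust pair; the other three genes are each split off by at least one method.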
11 Modelling
We have focussed on the Dynamic Bayesian Network
Models a temporal domain probabilistically
Consists of a graphical representation and conditional probability distributions
Facilitates the combining of expert knowledge and data
Models can be queried to investigate the relationships discovered from data
Requires data discretisation
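The slides say the model requires discretised data but do not state the method used. A minimal sketch, assuming equal-frequency binning of one gene's expression series into a small number of states, is shown below. The function name, the default of three states, and the toy values are assumptions made for illustration.

```python
def discretise_equal_frequency(series, n_states=3):
    """Map each value to a state in 0..n_states-1 according to its rank.

    Ties are broken by position in the series, because the sort is stable.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    order = sorted(range(len(series)), key=lambda idx: series[idx])
    states = [0] * len(series)
    for rank, idx in enumerate(order):
        # Integer arithmetic spreads the ranks as evenly as possible over states.
        states[idx] = rank * n_states // len(series)
    return states


# Hypothetical normalised expression values for one gene.
toy_expression = [0.1, 0.9, 0.5, 0.3, 0.7, 1.1]
toy_states = discretise_equal_frequency(toy_expression, n_states=3)
```

With only 13 measurements per gene, a small number of states keeps each conditional probability table estimable, which is one reason a coarse binning like this is a plausible choice.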
12 Dynamic Bayesian Networks
[Figure: DBN over genes g0 to g4 across time slices t-5, t-4, t-3, t-2, t-1, t; axes labelled "Genes" and "Time Lag"]
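To illustrate what a single lagged link in such a network encodes, the sketch below estimates a conditional probability table for a child gene at time t given one parent gene at time t minus a chosen lag, using simple relative frequencies. It assumes the series are already discretised and that the parent and lag are known. Learning which links and lags to include is a separate problem that this sketch does not address, and all names and data here are hypothetical.

```python
from collections import Counter, defaultdict


def lagged_cpt(parent_states, child_states, lag):
    """Estimate P(child at t | parent at t - lag) by counting co-occurrences."""
    if len(parent_states) != len(child_states):
        raise ValueError("both series must cover the same time points")
    if lag < 0 or lag >= len(child_states):
        raise ValueError("lag must be between 0 and the series length minus 1")
    counts = defaultdict(Counter)
    for t in range(lag, len(child_states)):
        counts[parent_states[t - lag]][child_states[t]] += 1
    cpt = {}
    for parent_value, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[parent_value] = {
            child_value: n / total for child_value, n in child_counts.items()
        }
    return cpt


# Hypothetical discretised series for two genes over six time points.
toy_g0 = [0, 1, 1, 0, 1, 0]
toy_g1 = [0, 0, 1, 1, 0, 0]
toy_cpt = lagged_cpt(toy_g0, toy_g1, lag=1)
```

Note that a lag of k leaves only (series length minus k) usable samples, so with 13 time points the larger lags in the figure are estimated from very little data.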
13 Modelling Results
Example DBNs (compact representation without lags included):
14 Forecast Results
15 Explanation
Apply inference given observations about certain nodes:
Insert observations into DBN
Apply inference back in time
Construct explanations using posterior probabilities
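The steps above can be illustrated on the smallest possible case: one parent node at an earlier time slice and one observed child node. The sketch below applies Bayes' rule to obtain the posterior over the parent's state given the observation. A full DBN would need a general inference algorithm, which the slides do not specify. The function name, prior, and conditional probabilities are hypothetical numbers, not values from the presentation.

```python
def posterior_over_cause(prior, cpt, observed_effect):
    """Return P(cause | effect = observed_effect) for a single parent-child link.

    prior[c] is P(cause = c); cpt[c][e] is P(effect = e | cause = c).
    """
    unnormalised = {
        cause: prior[cause] * cpt[cause].get(observed_effect, 0.0)
        for cause in prior
    }
    evidence = sum(unnormalised.values())
    if evidence == 0:
        raise ValueError("the observation has zero probability under the model")
    return {cause: weight / evidence for cause, weight in unnormalised.items()}


# Hypothetical two-state example: states are labelled 1 and 2.
toy_prior = {1: 0.5, 2: 0.5}
toy_link = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.2, 2: 0.8}}
toy_posterior = posterior_over_cause(toy_prior, toy_link, observed_effect=2)
```

Repeating this step link by link, moving to earlier time slices, gives a chain of most probable earlier states, which is the kind of explanation the slide describes.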
16 Explanation - Results
An example explanation using a discovered DBN
P(C7 is 2) = 1.000
P(H8 is 2) = 0.999
P(B12 is 2) = 0.884
P(C7 is 1) = 1.000
P(B6 is 2) = 0.568
P(A7 is 1) = 0.510
P(B12 is 1) = 0.440
[Figure residue: 122 121]
17 Conclusions
Modelling gene expression data is a challenging task
Introduced a framework for modelling such data
Encouraging preliminary results when applied to viral gene expression data
More rigorous testing on different datasets
18 Acknowledgements
Biotechnology and Biological Sciences Research Council (BBSRC), UK
The Engineering and Physical Sciences Research Council (EPSRC), UK