
Making Time: Pseudo Time-Series for the Temporal Analysis of Cross-Section Data
Emma Peeling, Allan Tucker
Centre for Intelligent Data Analysis, Brunel University West London

Cross-Section Data
Studies often involve data sampled from a cross-section of a population, especially in biological and medical studies: for example, collecting medical information on patients suffering from a particular disease and on healthy controls. Essentially, these studies show a “snapshot” of the disease process.

Cross-Section Data
Many processes are inherently temporal in nature. Previously healthy people can develop a disease over time, passing through different stages of severity. If we want to model the development of such processes, we usually require longitudinal data.

Cross-Section vs Longitudinal
[Figure: a longitudinal study compared with a cross-section study, relative to disease onset]

Pseudo Time-Series Models
In this presentation we explore ordering data based upon Minimum Spanning Trees and PQ-Trees (Rifkin et al. 2000), treating this ordered data as a “pseudo time-series”, and using pseudo time-series to build temporal models. We test the approach with a dynamic Bayesian network model for classifying medical data and gene expression data.

Multi-Dimensional Scaling
Multi-dimensional scaling (MDS) can be used to visualise the distances between data points and the discovered pathways. Here we use classical MDS, which is metric-based; the distance used is Euclidean distance.
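A minimal sketch of classical MDS in Python with NumPy, assuming the samples are the rows of a numeric matrix X. It squares the pairwise Euclidean distances, double-centres them and keeps the top eigenvectors; the function name and the random data are illustrative, not taken from the slides.

```python
import numpy as np

def classical_mds(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Embed the rows of X in k dimensions with classical (Torgerson) MDS."""
    # Squared Euclidean distances between all pairs of samples.
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    n = D2.shape[0]
    # Double-centre: B = -1/2 * J D^2 J, with J = I - (1/n) * ones.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # The top-k eigenpairs of the symmetric matrix B give the coordinates.
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]
    vals = np.maximum(vals[idx], 0.0)  # clip tiny negative eigenvalues
    return vecs[:, idx] * np.sqrt(vals)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))   # 30 hypothetical samples, 10 features
Y = classical_mds(X, k=3)       # 3-D view, as in the slides' plots
print(Y.shape)                  # (30, 3)
```

With k equal to the rank of the centred data, the embedding reproduces the original Euclidean distances exactly; with k = 2 or 3 it gives a low-dimensional view that approximates them.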

Minimum Spanning Tree
A minimum spanning tree (MST) connects all the nodes of a weighted graph using links whose total weight is minimal.
[Figure: a weighted graph and its MST]
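The slides do not say how the MST is computed. As one possibility, here is Prim's algorithm on a dense distance matrix, such as the Euclidean distances between samples; the function name and the toy data are hypothetical, and a library routine such as SciPy's `scipy.sparse.csgraph.minimum_spanning_tree` could be used instead.

```python
import numpy as np

def minimum_spanning_tree(D: np.ndarray) -> list[tuple[int, int]]:
    """Prim's algorithm on a dense, symmetric distance matrix D (n x n).

    Returns the n - 1 edges (i, j) of a minimum spanning tree of the complete
    weighted graph whose edge weights are the entries of D.
    """
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()               # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree node at the other end of that link
    edges = []
    for _ in range(n - 1):
        # Pick the non-tree node with the cheapest link into the tree.
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        # Links via the newly added node j may now be cheaper.
        closer = D[j] < best
        best = np.where(closer, D[j], best)
        parent = np.where(closer, j, parent)
    return edges

x = np.array([0.0, 5.0, 1.0, 3.0])   # four samples with one feature each
D = np.abs(x[:, None] - x[None, :])   # Euclidean distance in 1-D
edges = minimum_spanning_tree(D)
print(edges)                          # [(0, 2), (2, 3), (3, 1)]
print(sum(D[i, j] for i, j in edges)) # 5.0
```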

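PQ-Tree
PQ-trees are used to encode partial orderings on variables: a single tree represents the whole set of leaf orderings it permits.
P-nodes: the children can be arranged in any order.
Q-nodes: the order of the children is fixed and can only be reversed.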
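To make the P-node and Q-node rules concrete, the sketch below enumerates every leaf ordering that a small PQ-tree permits. The nested-tuple encoding and the function names are assumptions for illustration. A P-node with k children alone allows k! orderings, so exhaustive enumeration is only practical for tiny trees.

```python
from itertools import permutations
from typing import Iterator, Union

# Hypothetical encoding: a leaf is a sample id (int); an internal node is
# ("P", [children]) or ("Q", [children]).
Node = Union[int, tuple]

def orderings(node: Node) -> Iterator[tuple]:
    """Yield every leaf ordering that the PQ-tree rooted at `node` permits."""
    if isinstance(node, int):
        yield (node,)
        return
    kind, children = node
    if kind == "P":
        arrangements = list(permutations(children))  # children in any order
    elif kind == "Q":
        arrangements = [tuple(children), tuple(reversed(children))]  # as given, or reversed
    else:
        raise ValueError(f"unknown node type {kind!r}")
    for arrangement in arrangements:
        yield from _concatenate(arrangement)

def _concatenate(children: tuple) -> Iterator[tuple]:
    """Yield every left-to-right concatenation of one permitted ordering per child."""
    if not children:
        yield ()
        return
    for head in orderings(children[0]):
        for tail in _concatenate(children[1:]):
            yield head + tail

tree = ("Q", [1, ("P", [2, 3]), 4])
for order in orderings(tree):
    print(order)
# (1, 2, 3, 4)
# (1, 3, 2, 4)
# (4, 2, 3, 1)
# (4, 3, 2, 1)
```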

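Dynamic Bayesian Network Classifiers
Dynamic Bayesian network classifiers (DBNCs) are used to calculate P(C | X_t, X_{t-1}), the probability of class C given the observations at the current and previous time steps. Here, we use a DBNC to model the pseudo time-series in order to classify the data.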
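The slides do not give the DBNC structure, so the following is a hypothetical, minimal discrete sketch. It assumes each feature at time t has two parents, the class C and the same feature at t-1, with C independent of X_{t-1}; then P(C | x_t, x_{t-1}) is proportional to P(C) times the product over features of P(x_t[i] | C, x_{t-1}[i]). It further assumes the features are already discretised, that consecutive samples in the pseudo time-series form the (x_{t-1}, x_t) pairs, and that each pair takes the class of its later sample. The class name and toy data are illustrative.

```python
import math
from collections import Counter, defaultdict

class PseudoTimeDBNC:
    """Toy discrete DBN classifier over samples in pseudo-time order (a sketch).

    Assumed structure per feature i: C -> X_t[i] <- X_{t-1}[i], with C independent
    of X_{t-1}, so P(C | x_t, x_{t-1}) is proportional to
    P(C) * prod_i P(x_t[i] | C, x_{t-1}[i]).
    """

    def __init__(self, n_values: int, alpha: float = 1.0):
        self.n_values = n_values  # each discretised feature takes values 0..n_values-1
        self.alpha = alpha        # Laplace smoothing pseudo-count

    def fit(self, X, y):
        """X: discretised samples already sorted into pseudo-time order; y: their labels."""
        self.classes = sorted(set(y))
        self.prior = Counter(y)
        self.counts = defaultdict(int)  # (feature, class, prev value, value) -> count
        self.totals = defaultdict(int)  # (feature, class, prev value) -> count
        for t in range(1, len(X)):
            for i, (prev, cur) in enumerate(zip(X[t - 1], X[t])):
                self.counts[(i, y[t], prev, cur)] += 1
                self.totals[(i, y[t], prev)] += 1
        return self

    def predict_proba(self, x_t, x_prev):
        """Return {class: P(class | x_t, x_prev)} for one consecutive pair of samples."""
        n = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            score = math.log((self.prior[c] + self.alpha) / (n + self.alpha * len(self.classes)))
            for i, (prev, cur) in enumerate(zip(x_prev, x_t)):
                num = self.counts.get((i, c, prev, cur), 0) + self.alpha
                den = self.totals.get((i, c, prev), 0) + self.alpha * self.n_values
                score += math.log(num / den)
            log_scores[c] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}

# One discretised feature (values 0..2); samples listed in pseudo-time order.
X = [[0], [0], [1], [1], [2], [2]]
y = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]
model = PseudoTimeDBNC(n_values=3).fit(X, y)
print(model.predict_proba(x_t=[2], x_prev=[2]))  # disease 0.6, healthy 0.4 (up to rounding)
```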

Pseudo Time-Series Models
In summary:
1: Input: cross-section data.
2: Construct a weighted graph and its MST.
3: Construct a PQ-tree from the MST.
4: Derive the pseudo time-series from the PQ-tree, using a hill-climb search on the P-nodes to minimise the sequence length.
5: Build a DBNC model using the pseudo-temporal ordering of the samples.
6: Output: a temporal model of the cross-section data.
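Step 4 can be sketched as a local search over P-node child orders. This sketch assumes that "sequence length" means the total Euclidean distance between consecutive samples in the ordering, that a move swaps two children of one P-node, and that the first improving move is accepted; the list-based tree encoding and the function names are hypothetical. Q-nodes are left untouched, following the slide.

```python
import numpy as np

# Hypothetical encoding: a leaf is a sample index (int); an internal node is
# a mutable list ["P" | "Q", [children]].

def frontier(node):
    """Leaf order read left to right from the PQ-tree."""
    if isinstance(node, int):
        return [node]
    return [leaf for child in node[1] for leaf in frontier(child)]

def sequence_length(order, X):
    """Total Euclidean distance travelled when visiting samples X[order] in turn."""
    steps = np.diff(X[order], axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())

def p_nodes(node):
    """All P-nodes in the tree (these are the only nodes the search rearranges)."""
    if isinstance(node, int):
        return []
    found = [node] if node[0] == "P" else []
    for child in node[1]:
        found += p_nodes(child)
    return found

def hill_climb(tree, X):
    """Swap pairs of P-node children while the sequence length keeps dropping.

    Mutates `tree` in place and returns (ordering, length); like any hill-climb,
    it may stop at a local optimum.
    """
    best = sequence_length(frontier(tree), X)
    improved = True
    while improved:
        improved = False
        for node in p_nodes(tree):
            kids = node[1]
            for i in range(len(kids)):
                for j in range(i + 1, len(kids)):
                    kids[i], kids[j] = kids[j], kids[i]
                    length = sequence_length(frontier(tree), X)
                    if length < best - 1e-12:
                        best, improved = length, True
                    else:
                        kids[i], kids[j] = kids[j], kids[i]  # undo the swap
    return frontier(tree), best

X = np.arange(5, dtype=float).reshape(-1, 1)  # five samples on a line
tree = ["Q", [0, ["P", [3, 1, 2]], 4]]
print(sequence_length(frontier(tree), X))     # 8.0 before the search
print(hill_climb(tree, X))                    # ([0, 1, 2, 3, 4], 4.0)
```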

The Datasets
B-cell microarray data: 3 classes of B-cell data from a number of patients, pre-ordered by an expert into a pseudo time-series.
Visual field (VF) test data: one large cross-section study of healthy and glaucomatous eyes, plus one longitudinal study used for testing the models.

B-Cell: MDS & Pseudo Time-Series
[Figure: MDS plots showing the discovered path through the B-cell data in 3D, and the classification of the B-cell data in 2D]

B-Cell Accuracy
[Figure: mean accuracy and variance over repeated cross-validation]
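A sketch of the evaluation protocol behind such a plot, under stated assumptions: plain (unstratified) k-fold splits reshuffled on each repeat, with a 1-nearest-neighbour rule standing in for the DBNC. The function names, fold count, number of repeats and toy data are hypothetical.

```python
import numpy as np

def repeated_kfold_accuracy(X, y, fit_predict, k=5, repeats=10, seed=0):
    """Mean and variance of accuracy over `repeats` reshuffled k-fold splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(len(y)), k)
        for f in range(k):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(k) if g != f])
            pred = fit_predict(X[train], y[train], X[test])
            scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores)), float(np.var(scores))

def one_nn(X_train, y_train, X_test):
    """Stand-in classifier: label of the nearest training sample (Euclidean)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
    return y_train[np.argmin(d, axis=1)]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
mean_acc, var_acc = repeated_kfold_accuracy(X, y, one_nn, k=5, repeats=3)
print(mean_acc, var_acc)  # 1.0 0.0 for these well-separated toy clusters
```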

Expert Knowledge
Orderings of the B-cell samples, compared by sequence length:
Biologist: 1-26
PQ-tree: 1-6, 7, 9, 8, 11, 10, 12-18, 26, 19, 21, 20, 22-25
PQ-tree and hill-climb: 1-18, 26, 19-25
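The orderings above are written as compact ranges. The sketch below expands them and counts how many sample pairs each ordering places in the opposite order to the biologist's (the Kendall tau distance). This is an additional, assumed comparison for illustration, not the sequence-length measure used on the slide; the function names are hypothetical.

```python
def expand(spec: str) -> list[int]:
    """Expand a compact ordering such as '1-6,7,9,8' into [1, 2, 3, 4, 5, 6, 7, 9, 8]."""
    order = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(v) for v in part.split("-"))
            order.extend(range(lo, hi + 1))
        else:
            order.append(int(part))
    return order

def kendall_tau_distance(order, reference):
    """Number of sample pairs whose relative order differs from `reference` (O(n^2))."""
    pos = {s: i for i, s in enumerate(reference)}
    ranks = [pos[s] for s in order]
    n = len(ranks)
    return sum(1 for i in range(n) for j in range(i + 1, n) if ranks[i] > ranks[j])

orderings = {
    "Biologist": "1-26",
    "PQ-tree": "1-6,7,9,8,11,10,12-18,26,19,21,20,22-25",
    "PQ-tree and hill-climb": "1-18,26,19-25",
}
expert = expand(orderings["Biologist"])
for name, spec in orderings.items():
    order = expand(spec)
    assert sorted(order) == expert  # every ordering is a permutation of the same samples
    print(f"{name}: {kendall_tau_distance(order, expert)} pairwise disagreements")
# Biologist: 0 pairwise disagreements
# PQ-tree: 10 pairwise disagreements
# PQ-tree and hill-climb: 7 pairwise disagreements
```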

Visual Field: MDS & Pseudo Time-Series
[Figure: MDS plots showing the path found for the VF data in 3D, and the classification of the VF data in 2D]

VF Accuracy
[Figure: mean accuracy and variance over repeated train/test runs]

Related Work
Semi-supervised methods: some data points are labelled with classes, and these are used to assist the classification of the others in an incremental manner. A pseudo multivariate time-series (MTS), by contrast, imposes an order on the data as well as a distance between data points, which allows future states to be predicted.

Conclusions
Cross-section data usually captures a snapshot of a process, whereas longitudinal data is usually needed to model its temporal nature. Here we use ordering methods to create pseudo time-series models. Early results on medical and biological data are promising.

Future Work
Dealing with outliers in the data space; multiple trajectories (e.g. in the VF data); normalisation (rather than discretisation); combining a number of longitudinal and cross-section studies.

Multiple Trajectories

Acknowledgements
Thanks to David Garway-Heath, Moorfields Eye Hospital, London, and Paul Kellam, University College London.