Analyzing Time Series Gene Expression Data


Analyzing Time Series Gene Expression Data
Ziv Bar-Joseph, Center for Automated Learning and Discovery, Carnegie Mellon University
06/2004

Expression Experiments
- Time series: multiple arrays at various temporal intervals.
- Static: a snapshot of the activity in the cell.

Abundance of time series expression datasets
- Over 30% of the 170 papers perform time series experiments.
- A total of 220 time series datasets.
- More arrays are used for time series than for static expression experiments.

Unique features of time series expression experiments
- Autocorrelation between successive points.
- Can identify the complete set of acting genes.
- Allows causality to be inferred.

Time Series Examples: Development
- Development of fruit flies [Arbeitman, Science 02].

Time Series Examples (cont.)
- Function: infectious diseases, response to external stimulus.
- Interactions and systems: transcription factor knockouts.

Time Series Examples: Systems
- The cell cycle system in yeast [Simon et al., Cell 01].

Computational challenges
- Biological Networks: dynamic regulatory networks, information fusion.
- Pattern Recognition: function, response programs, clustering temporal signals.
- Data Analysis: alignment, differentially expressed genes, continuous representation, missing values.
- Experimental Design: transcription, decay rates, sampling rates, duration.

Sampling Rates
- Non-uniform.
- Differ between experiments.

Cell Cycle Datasets
Dataset | Method of arrest | Duration | Cell cycle length | Sampling | Repeats
alpha (Spellman 98) | alpha mating factor | 0-119m | 64m | every 7 minutes | 1
cdc15 (Spellman 98) | temp. sensitive cdc15 | 10-290m | 112m | every 20m for 1 hr, every 10m for 3 hr, every 20m for final hr |
cdc28 (Cho98) | temp. sensitive cdc28 | 0-160m | 85m | every 10 minutes |
fkh1/fkh2 knockout (Zhu00) | | 0-215m | 105m | every 15m until 165m, then after 45m | 2
yox1/yhp1 knockout (Pramila02) | | 0-120m | 60m | |

Networks | Pattern Recognition | Data Analysis | Experimental Design

Representing time series expression data
We are capturing a continuous process with a few samples. We need a way to convert the samples for each gene into an expression profile. Some simple techniques:
- Linear interpolation
- Spline interpolation
- Functional assignment

Standard interpolation
With missing values and noise, linear interpolation fails to reproduce an accurate representation of the expression profile.
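As a rough illustration of the baseline this slide criticizes, the sketch below linearly interpolates one gene's profile onto arbitrary query times while skipping missing measurements. The function name `linear_interpolate`, the use of `None` for a missing value, and the sample data are assumptions made for this example only; they do not come from the talk.

```python
from bisect import bisect_left


def linear_interpolate(times, values, query_times):
    """Linearly interpolate an expression profile at query_times.

    times: strictly increasing sample times (may be non-uniform).
    values: expression values; None marks a missing measurement.
    """
    # Drop missing measurements so they do not anchor any line segment.
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two observed points")
    ts = [t for t, _ in known]
    vs = [v for _, v in known]

    result = []
    for q in query_times:
        if q <= ts[0]:
            result.append(vs[0])       # clamp before the first sample
        elif q >= ts[-1]:
            result.append(vs[-1])      # clamp after the last sample
        else:
            hi = bisect_left(ts, q)    # ts[hi - 1] < q <= ts[hi]
            lo = hi - 1
            frac = (q - ts[lo]) / (ts[hi] - ts[lo])
            result.append(vs[lo] + frac * (vs[hi] - vs[lo]))
    return result


# Hypothetical non-uniformly sampled profile with one missing value (at t = 50).
times = [10, 30, 50, 70, 80, 90]
values = [0.0, 1.0, None, 3.0, 2.0, 1.0]
profile = linear_interpolate(times, values, [10, 20, 50, 75, 90])
```

The missing point at t = 50 is simply bridged by the straight line between its neighbours, so any noise in those two neighbours is copied directly into the estimate. That is the weakness the slide points to.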

Splines
Instead of linear interpolation, we can use splines: piecewise polynomials. Still, they will overfit when faced with missing values and noise.

The power of co-expression
We can modify our splines to take into account the fact that many genes are co-expressed.

Avoiding overfitting
- Require that for each gene in class j, the gene-specific term γ satisfies γ ~ N(0, Γ_j).
- Add a noise term.

Class Assignment
- In some cases the biological classes are known in advance.
- The algorithm can be modified and combined with a Gaussian mixture algorithm to cluster the continuous representation of the expression data.
- Unlike previous algorithms, it works on the continuous representation of the expression profile.
- It performs much better than k-means on non-uniformly sampled data.

Missing values

Interpolation

Alignment
Difference in the timing of similar biological processes (figure labels: FKH1; cdc15, alpha, cdc28).

Continuous Alignment
- Using the estimated splines, we continuously align two expression datasets by minimizing a global error function.
- Can be solved using numeric methods.
- A hard problem due to the spline segments and the different durations.
- We also have an algorithm for removing genes that should not align.
RECOMB 2002
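To make "minimizing a global error function" concrete, here is a minimal sketch that aligns two continuous curves by searching for a time stretch and shift. It is not the method from the talk: it assumes a linear time warp, uses a coarse grid search as a stand-in for the numeric optimizer, uses a single curve per dataset instead of many genes' splines, and ignores the differing durations the slide mentions. The names `alignment_error` and `align` and both sinusoidal curves are hypothetical.

```python
import math


def alignment_error(f, g, stretch, shift, t_start, t_end, steps=200):
    """Mean squared difference between f(t) and g(stretch * t + shift)."""
    total = 0.0
    for k in range(steps + 1):
        t = t_start + (t_end - t_start) * k / steps
        diff = f(t) - g(stretch * t + shift)
        total += diff * diff
    return total / (steps + 1)


def align(f, g, t_start, t_end, stretches, shifts):
    """Grid search for the (stretch, shift) pair with the smallest error."""
    best = None
    for stretch in stretches:
        for shift in shifts:
            err = alignment_error(f, g, stretch, shift, t_start, t_end)
            if best is None or err < best[0]:
                best = (err, stretch, shift)
    return best


# Hypothetical curves: a 64-minute cycle starting at t = 0 and a 112-minute
# cycle starting at t = 10, loosely echoing the alpha and cdc15 designs.
def reference(t):
    return math.sin(2 * math.pi * t / 64.0)


def other(s):
    return math.sin(2 * math.pi * (s - 10.0) / 112.0)


err, stretch, shift = align(
    reference, other, 0.0, 64.0,
    stretches=[1.0, 1.25, 1.5, 1.75, 2.0],
    shifts=[0.0, 5.0, 10.0, 15.0, 20.0],
)
```

On this toy input the search recovers a stretch of 1.75 (112/64) and a shift of 10 minutes. A real implementation would sum the error over many genes and restrict it to the overlapping time interval.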

Identifying differentially expressed genes
- Hard to perform a manual comparison (figure labels: Wild Type, Knockout).
- Sampling rates and different timing prevent direct comparison.
- Local vs. global methods: using the original samples is infeasible since they are not on the same scale; sampling the curves might lead to errors since it does not take systematic biases into account.
- Need a global measure and a probability distribution.
Zhu et al., Nature 2000

Using Global Error to Determine Significance
Key idea: combine an individual noise model with a global error (area between curves) that correctly captures the temporal difference between the two profiles.
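The global error term can be sketched as a numerical integral of the absolute difference between two continuous profiles. This covers only the "area between curves" part; the individual noise model and the significance computation are not reproduced here. The function name `area_between` and the two toy profiles are assumptions for illustration.

```python
def area_between(f, g, t_start, t_end, steps=1000):
    """Trapezoid-rule estimate of the integral of |f(t) - g(t)|."""
    h = (t_end - t_start) / steps
    total = 0.0
    prev = abs(f(t_start) - g(t_start))
    for k in range(1, steps + 1):
        t = t_start + k * h
        cur = abs(f(t) - g(t))
        total += 0.5 * (prev + cur) * h
        prev = cur
    return total


# Hypothetical profiles: the wild type rises linearly, the knockout stays flat.
def wild_type(t):
    return 1.0 + 0.5 * t


def knockout(t):
    return 1.0


area = area_between(wild_type, knockout, 0.0, 4.0)
```

Because the measure integrates over the whole time course, it does not depend on where either experiment happened to be sampled, unlike a point-by-point comparison.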

Comparing the continuous representation (figure labels: WT, Knockout)

Enrichment for the Cell Cycle Factors

Overcoming population effects
- Microarray experiments profile a population of cells.
- Initially the cells are synchronized, but they lose their synchronization over time.
- Need to compensate for synchronization loss in order to recover single-cell values.
(figure label: Smc3, observed values)

Networks | Pattern Recognition | Individual Gene | Experimental Design

Pattern recognition and clustering
- Identifying relationships between genes based on expression profiles.
- Handling non-uniform sampling rates.
- Determining relationships between clusters.

Time Shifted and Inverted Profiles
Qian et al., Journal of Molecular Biology 2001

Results
- Simultaneous expression profile relationships
- Inverted expression profile relationships
- Time-delayed expression profile relationships
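The three relationship types listed above can be illustrated with a simple lagged-correlation check. This is a deliberately simplified stand-in and not the scoring used in the cited paper: it assumes uniformly sampled profiles of equal length and picks the lag with the largest absolute Pearson correlation. The names `pearson` and `best_relationship` and the example profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_relationship(x, y, max_lag=2):
    """Return (kind, lag, |r|) for the strongest relationship found.

    A lag of k compares x[t] with y[t + k], i.e. y follows x by k samples.
    """
    best = None
    for lag in range(max_lag + 1):
        r = pearson(x[: len(x) - lag], y[lag:])
        if lag == 0:
            kind = "simultaneous" if r >= 0 else "inverted"
        else:
            kind = "time-delayed" if r >= 0 else "inverted time-delayed"
        if best is None or abs(r) > best[2]:
            best = (kind, lag, abs(r))
    return best


# Hypothetical profiles: `delayed` repeats `gene` one sample later,
# and `flipped` is the mirror image of `other_gene`.
gene = [0, 1, 2, 1, 0, 1, 2, 1]
delayed = [5, 0, 1, 2, 1, 0, 1, 2]
other_gene = [1, 2, 4, 3, 5, 2]
flipped = [-v for v in other_gene]

delayed_result = best_relationship(gene, delayed)
inverted_result = best_relationship(other_gene, flipped, max_lag=1)
```

For strictly periodic profiles a half-period delay is indistinguishable from an inversion, so a check like this needs a tie-breaking rule before it is used on cell cycle data.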

Hierarchical clustering
- For n leaves there are n-1 internal nodes.
- Each flip of an internal node creates a new linear ordering.
- There are 2^(n-1) possible linear orderings of the leaves of the tree.

Determine Relations Between Clusters
Optimal leaf ordering: selects the ordering that maximizes the sum of the similarities of adjacent leaves in the clustering tree.
(figure panels: Initial structure, Permuted, Hierarchical clustering)
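The objective of optimal leaf ordering can be shown on a tiny tree by brute force: enumerate every ordering reachable by flipping internal nodes and keep the one with the highest sum of adjacent similarities. This exhaustive search is only an illustration of the objective and is feasible only for very small trees; it is not the efficient algorithm behind the results in the talk. The nested-tuple tree encoding, the function names, and the similarity function are assumptions for this example.

```python
def orderings(tree):
    """Yield every leaf ordering reachable by flipping internal nodes.

    A tree is either a leaf (any non-tuple value) or a pair (left, right).
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for left_order in orderings(left):
        for right_order in orderings(right):
            yield left_order + right_order   # node as given
            yield right_order + left_order   # node flipped


def ordering_score(order, similarity):
    """Sum of similarities between adjacent leaves."""
    return sum(similarity(a, b) for a, b in zip(order, order[1:]))


def optimal_ordering(tree, similarity):
    """Brute-force search for the best-scoring ordering."""
    return max(orderings(tree), key=lambda o: ordering_score(o, similarity))


# Hypothetical 5-leaf tree; leaves that are close in value count as similar.
tree = ((2, 1), ((5, 4), 3))


def similarity(a, b):
    return -abs(a - b)


all_orders = list(orderings(tree))
best = optimal_ordering(tree, similarity)
```

With 5 leaves the enumeration visits 2^(5-1) = 16 orderings, and the best one places the leaves in monotone order. The tree itself is unchanged, only the left-to-right layout differs.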

Results – Synthetic Data (figure panels: Input, Hierarchical clustering, Optimal ordering)

24 cell cycle experiments

Short Time Series
- 60% of the time series datasets are short (<7 time points).
- Over 40000 signals are measured, the data is very noisy, and experiments are compared across all time points.
- Most clustering algorithms will miss small sets and, in addition, cannot be used to compare datasets.

Taking advantage of the small number of points

Networks | Pattern Recognition | Individual Gene | Experimental Design

Systems Biology
- Different types of data provide partial information about the activity in the cell.
- By integrating these data sources we can obtain a better picture of the activity in the cell.
- There is a lot of current interest, though relatively few methods construct temporal models.

Dynamic Bayesian Networks
- Bayesian networks are graphical models which can account for the stochasticity in the data.
- They can be extended to handle time series data (dynamic Bayesian networks).
- So far they have been used for small-scale modeling.

Modeling tryptophan metabolism in E. coli
Ong et al., Bioinformatics 2002

Genetic RegulAtory Modules (GRAM)
- Gene module: a set of genes that are co-regulated and co-expressed.
- Functional module: a collection of gene modules with related function.

Assembly of the Cell Cycle Transcriptional Regulatory Network
- Blue boxes: gene modules.
- We combine GRAM with our continuous alignment algorithms to construct a dynamic model for a sub-network.

Assembly of the Cell Cycle Transcriptional Regulatory Network
- Blue boxes: gene modules.
- Individual regulators: ovals, connected to their modules.
- Dashed line: extends from the module encoding a regulator to the regulator protein oval.

Comparing the Continuous Representation (figure labels: WT, Knockout)

Summary
- Time series expression data can be used to answer important biological questions.
- Pros: autocorrelation, allows for causal inference, provides a better view of cellular activity.
- Cons: large number of signals but small number of time points, noise, lack of repeats.
- By using methods specifically developed for this data we can overcome the above problems and take advantage of its unique properties.

Want to know more?
Z. Bar-Joseph, "Analyzing time series gene expression data," Bioinformatics, in press.
www.cs.cmu.edu/~zivbj
zivbj@cs.cmu.edu