Shortest Path Analysis and 2nd-Order Analysis Ming-Chih Kao U of M Medical School


Other major contributors in these projects: Xianghong Jasmine Zhou, Assistant Professor of Biological Sciences, USC; Wing Hung Wong, Professor of Statistics and of Health Research and Policy, Stanford University

Gene Expression Profiling

KEGG

Transitive Functional Annotation By Shortest Path Analysis of Gene Expression Data

Shortest Path Analysis: Transitive Co-Expression [figure: genes a, b, c, d]

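Shortest Path Analysis: Gene Expression Similarity Graph [figure: graph on genes a, b, c, d, e, f, g]. Genes a and b are joined by an edge when their expression correlation satisfies $c(a,b) > 0.6$; the edge length is $d(a,b) = (1 - |c(a,b)|)^{6} \times 10^{5}$.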
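The graph construction on this slide can be sketched as follows. This is a minimal illustration, assuming Pearson correlation as the similarity measure c(a,b) and reading the edge length as (1 - |c|)^6 × 10^5; the function names and the toy expression profiles are hypothetical and not taken from the slides.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumed similarity measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def edge_length(c, power=6, scale=1e5):
    """Edge length d = (1 - |c|)^power * scale, following the slide's formula."""
    return (1.0 - abs(c)) ** power * scale


def similarity_graph(profiles, threshold=0.6):
    """Map each gene pair with correlation above the threshold to its edge length."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        c = pearson(profiles[g1], profiles[g2])
        if c > threshold:
            edges[(g1, g2)] = edge_length(c)
    return edges


# Toy profiles (invented): a and b are strongly correlated, c is not.
profiles = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.1, 2.1, 2.9, 4.2, 5.1],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = similarity_graph(profiles)
print(edges)
```

Raising (1 - |c|) to a high power makes strongly correlated pairs much shorter than moderately correlated ones, so a path through several tight links can be shorter than one weak direct link.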

Shortest Path Analysis: The Shortest Path Distance from a to d. Paths shown in the graph on genes a, b, c, d, e, f, g: a → g → d (68); a → b → c → d (59); a → f → e → d; a → g (84) → c → d. Use Dijkstra's shortest path algorithm; time complexity = $\Theta(N^{2} + E)$.
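A minimal sketch of the shortest-path step follows. It uses a binary-heap variant of Dijkstra's algorithm, which runs in O((N + E) log N) rather than the array-based bound quoted on the slide. The individual edge lengths are invented for illustration, chosen only so that the totals of a → g → d and a → b → c → d come to the 68 and 59 shown on the slide.

```python
import heapq


def undirected(edges):
    """Build an adjacency map from {(u, v): length} for an undirected graph."""
    graph = {}
    for (u, v), w in edges.items():
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra(graph, source):
    """Return (distance, predecessor) maps for all nodes reachable from source."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def path_to(prev, source, target):
    """Walk the predecessor map back from target to source."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical edge lengths (not from the slides).
toy_edges = {
    ("a", "g"): 30, ("g", "d"): 38,
    ("a", "b"): 20, ("b", "c"): 19, ("c", "d"): 20,
    ("a", "f"): 30, ("f", "e"): 30, ("e", "d"): 30,
}
dist, prev = dijkstra(undirected(toy_edges), "a")
shortest = path_to(prev, "a", "d")
print(shortest, dist["d"])
```

In this toy graph, genes b and c are the transitive genes on the shortest path between the anchor genes a and d.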

Shortest Path Analysis: Validation. Question: Are genes on the shortest path really involved in the same biological process? Approach: use genes with known functions to construct shortest paths, then check how many transitive genes have the same function as the two anchor genes. Data: the Rosetta microarray compendium, which includes 300 deletion and drug treatment experiments of S. cerevisiae.

Shortest Path Analysis Validation: Scheme

Shortest Path Analysis Validation: Functional Similarity [figure: GO Biological Process tree with example terms (cell cycle), (mitosis), (metaphase); genes a, b, c, d, e placed at different nodes; match levels Level 0 and Level 1]

Shortest Path Analysis Validation: Results The percentages of L0- and L1-match transitive genes in the three cellular compartments. Values shown above the bars are the numbers of genes.

Shortest Path Analysis Related Genes Not Co-Expressed Are Found

Shortest Path Analysis Comparison with Hierarchical Clustering

Shortest Path Analysis Gene Function Prediction: Scheme

Shortest Path Analysis: Summary
- Predicted functions for 246 unknown yeast genes and found that a significant number are supported by evidence other than the data we used
- Proposed a hypothesis regarding the fundamental nature of expression relationships
- Verified that genes on the same shortest path are likely to be involved in the same biological process
- Comparison with hierarchical clustering reveals the specificity of the SP approach

Integrating Cross-Platform Microarray Data by Second-order Analysis: Functional Annotation and Network Reconstruction

2nd-Order Analysis: Current Challenges in Microarray Data Analysis
1. How to effectively combine the expression data sets generated with different technology/laboratory platforms?
2. How to identify functionally related genes without co-expression pattern?
3. How to identify transcription cascades?

2nd-Order Analysis: Multiple Microarray Technology Platforms [figure: microarray platforms]

2nd-Order Analysis: Public Microarray Data Sources
Stanford and NCBI databases:
- 788 microarray experiments (61 datasets) for yeast
- 348 experiments (15 datasets) for worm
- 736 (44 datasets) for Arabidopsis thaliana
- 1,553 (20 datasets) for mouse
- 4,135 (90 datasets) for human

[Figure: diagram with Transcription Factor 1, Transcription Factor 2, Transcription Factor 3 and gene1 to gene7, labeled "Amplification of signal", with two "?" marks]

[Figure: expression correlation across experimental groups; first-order correlation and second-order correlation]
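One plausible reading of the two terms, sketched below under stated assumptions: the first-order correlation of a gene pair (a doublet) is its expression correlation within one experimental group, and the second-order correlation of two doublets is the correlation between their first-order correlation profiles across groups. The function names, gene labels g1 to g4, and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def first_order_profile(groups, g1, g2):
    """First-order correlation of one gene pair, computed within each group."""
    return [pearson(group[g1], group[g2]) for group in groups]


def second_order_correlation(groups, pair1, pair2):
    """Correlation between the first-order profiles of two gene pairs."""
    return pearson(
        first_order_profile(groups, *pair1),
        first_order_profile(groups, *pair2),
    )


# Three invented experimental groups; each maps a gene to its expression profile.
groups = [
    {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 2, 4], "g4": [1, 3, 2, 4]},
    {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1], "g3": [1, 3, 2, 4], "g4": [4, 2, 3, 1]},
    {"g1": [1, 2, 3, 4], "g2": [1, -1, -1, 1], "g3": [1, 3, 2, 4], "g4": [1, -1, -1, 1]},
]
profile_12 = first_order_profile(groups, "g1", "g2")
profile_34 = first_order_profile(groups, "g3", "g4")
second_order = second_order_correlation(groups, ("g1", "g2"), ("g3", "g4"))
print(profile_12, profile_34, second_order)
```

Because each first-order correlation is computed inside a single group, the profiles can be compared across groups measured on different platforms without putting the raw expression values on a common scale.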

2nd-Order Analysis: An Example. Regulation of Cell Cycle: POG1-MPT5 and SDA1-CDC5 [figure panels: Expression of POG1-MPT5; Expression of SDA1-CDC5; Expression Correlation of POG1-MPT5 and SDA1-CDC5; axis: experimental groups, labeled Chromatin Silencing, Amino Acid Starvation, Gamma Radiation, Protein Metabolism, DNA Damage, Heat, Steady]

2nd-Order Analysis: Validation
Goal: group functionally related genes that may not exhibit similar expression patterns.
Data:
- Stanford Microarray Database (cDNA array)
- NCBI GEO Database (Affymetrix array)
- Rosetta Compendium (cDNA array)
Together these give 39 experimental groups subjected to different types of perturbations, such as cell cycle, heat shock, osmotic pressure, starvation, zinc, nitrogen depletion, etc.

2nd-Order Analysis Validation: Scheme. 43 functional classes; 2,429 genes; 5,142 doublets; 278,799 quadruplets (homogeneous quadruplets: 84%; heterogeneous quadruplets: 16%).

2nd-Order Analysis Validation: Comparison

2nd-Order Analysis Validation: Results
- 2nd-order analysis groups functionally related genes
- The derived quadruplets give rise to a set of 2,597 distinct and novel gene pairs
- 97% of the 2,597 pairs are missed by the standard methods
Reasons for the poor performance of the 1st-order method:
- Inter-dataset variations
- Cross-doublet gene pairs need not show high expression correlation
- Sensitivity to gene pairs which are only co-expressed in a subset of the data sets

[Figure: one graph on genes a, b, c, d, e, f per experimental group: Cell Cycle, Heat Shock, Starvation, Nitrogen Depletion, Radiation, Osmotic Pressure]

2nd-Order Analysis Interaction Modules

2nd-Order Analysis Interaction Modules: Leave-one-out Cross Validation. For each gene occurring in the 100 tightest and most stable clusters of known genes, we masked its function, made a prediction based on our 2-step procedure, and compared the predicted function with its true function. Results: we made predictions for 179 doublets, of which 163 were correct, a 91% success ratio.
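The leave-one-out loop can be sketched as below. The slides do not describe the 2-step prediction procedure, so a simple majority vote over the other members of the same cluster stands in for it here; that predictor, the function names, and the toy clusters are all assumptions made for illustration. The last lines only re-check the arithmetic of the reported success ratio.

```python
from collections import Counter


def predict_by_majority(item, cluster, labels):
    """Stand-in predictor (assumption): most common label among the other members."""
    votes = Counter(labels[m] for m in cluster if m != item)
    return votes.most_common(1)[0][0] if votes else None


def leave_one_out(clusters, labels):
    """Mask each item's label in turn, predict it, and count correct predictions."""
    correct = total = 0
    for cluster in clusters:
        for item in cluster:
            predicted = predict_by_majority(item, cluster, labels)
            if predicted is None:
                continue  # no other members to vote, so no prediction is made
            total += 1
            correct += predicted == labels[item]
    return correct, total


# Invented clusters and labels.
clusters = [["g1", "g2", "g3"], ["g4", "g5", "g6", "g7"]]
labels = {
    "g1": "mitosis", "g2": "mitosis", "g3": "mitosis",
    "g4": "cation transport", "g5": "cation transport",
    "g6": "cation transport", "g7": "mitosis",
}
correct, total = leave_one_out(clusters, labels)
print(correct, total)

# Arithmetic check of the figures reported on the slide.
reported_ratio = 163 / 179
print(round(100 * reported_ratio))
```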

2nd-Order Analysis Interaction Modules: Functional Prediction
79 functions of 69 unknown yeast genes involved in diverse biological processes
Experimental studies in the literature and in our laboratory:
- YLR183C in “mitosis”: regulation of G1/S transition
- YLL051C in “cation transport”: ferric-chelate reductase activity and iron-regulated expression

2nd-Order Analysis Frequently Occurring Tight Clusters Transcription Factors

2nd-Order Analysis Frequently Occurring TCs with 2nd-Order Correlation

[Figure: relevance networks; Transcription Factor Set 1 and Transcription Factor Set 2; cooperativity]

Three types of transcription cascades

2nd-Order Analysis ChIP-Chip

2nd-Order Analysis: Transcription Module Results
- 60 transcription modules identified
- 34 pairs showed high 2nd-order correlation
- 29% ($P < 10^{-5}$) of those module pairs are participants in transcription cascades:
  - 2 pairs in Type I cascades
  - 8 pairs in Type II cascades
  - 3 pairs in Type III cascades
- These transcription cascades interconnect into a partial cellular regulatory network

2nd-Order Analysis: Leu3 and Met4 Transcription Cascade [figure panels: Avg. Expression, Leu3 module vs. Met4 module; Avg. Expression Correlation, Leu3 module vs. Met4 module]

2nd-Order Analysis Hierarchical clustering of transcriptional modules

2nd-Order Analysis: Assigning transcription factors to pathways
For an unknown transcription factor in a module cluster, we can annotate its function by integrating evidence along two dimensions:
- the functions of known genes in its target module
- the functions of known transcription factors regulating other modules in the same cluster

2nd-Order Analysis: Summary
We developed a framework to integrate many microarray data sets in a platform-independent way, and investigated its properties and applications:
- Group together functionally related genes without direct expression similarity
- Cluster the functional interactions into modules and functionally annotate unknown genes
- Reveal the cooperativity in the regulatory network and reconstruct transcription cascades