Integrating Cross-Platform Microarray Data by Second-order Analysis: Functional Annotation and Network Reconstruction
Ming-Chih Kao, PhD
University of Michigan Medical School
mckao@med.umich.edu

Xianghong Jasmine Zhou, Assistant Professor of Biological Sciences, USC
Wing Hung Wong, Professor of Statistics and of Health Research and Policy, Stanford University

2nd-Order Analysis Current Challenges in Microarray Data Analysis
1. How can expression data sets generated with different technology or laboratory platforms be combined effectively?
2. How can functionally related genes be identified when they show no co-expression pattern?
3. How can transcription cascades be identified?

2nd-Order Analysis Multiple Microarray Technology Platforms
[Figure: microarray platforms]

2nd-Order Analysis Public Microarray Data Sources
Organism         Experiments   Datasets
S. cerevisiae            788         61
C. elegans               348         15
A. thaliana              736         44
M. musculus            1,553         20
H. sapiens             4,135         90

[Figure: Transcription Factors 1, 2, and 3 and gene1 through gene7, labeled "Amplification of signal", with two "?" marks]

[Figure: first-order correlation and second-order correlation; axes: experimental groups, expression correlation]

2nd-Order Analysis An Example
Regulation of Cell Cycle: POG1-MPT5 and SDA1-CDC5
[Figure: expression of POG1-MPT5, expression of SDA1-CDC5, and expression correlation of POG1-MPT5 and SDA1-CDC5 across experimental groups: Chromatin Silencing, Amino Acid Starvation, Gamma Radiation, Protein Metabolism, DNA Damage, Heat Steady]
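The idea behind the example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the first-order step is a Pearson correlation of a gene pair (doublet) within each data set, and that the second-order step is a Pearson correlation between two doublets' correlation profiles across data sets. The function names, the toy genes A to D, and the toy expression vectors are all hypothetical.

```python
import numpy as np

def correlation_profile(datasets, gene_a, gene_b):
    """First-order step: correlate one gene pair within each data set."""
    profile = []
    for data in datasets:
        # data maps a gene name to its 1-D expression vector in this data set
        r = np.corrcoef(data[gene_a], data[gene_b])[0, 1]
        profile.append(r)
    return np.array(profile)

def second_order_correlation(datasets, doublet_1, doublet_2):
    """Second-order step: correlate two doublets' correlation profiles."""
    p1 = correlation_profile(datasets, *doublet_1)
    p2 = correlation_profile(datasets, *doublet_2)
    return np.corrcoef(p1, p2)[0, 1]

# Toy data (hypothetical): three zero-mean, mutually orthogonal patterns.
u = np.array([1.0, -1.0, 1.0, -1.0])
v = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, -1.0, -1.0, 1.0])

# In each data set, B follows A and D follows C with the same strength (a, b),
# while A and C themselves stay uncorrelated.
weights = [(1.0, 0.2), (0.5, 1.0), (-1.0, 0.5), (0.1, 1.0)]
toy_datasets = [
    {"A": u, "B": a * u + b * v, "C": w, "D": a * w + b * v}
    for a, b in weights
]

so = second_order_correlation(toy_datasets, ("A", "B"), ("C", "D"))
cross = correlation_profile(toy_datasets, "A", "C")
print("second-order correlation of (A,B) and (C,D):", so)
print("first-order correlation of A and C per data set:", cross)
```

In this toy construction the two doublets switch their internal correlation on and off together, so their second-order correlation is close to 1, even though genes A and C show no first-order correlation in any data set. Because only correlations are compared across data sets, the raw expression values never need to be on a common scale.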

2nd-Order Analysis Validation
Can the method group functionally related genes that may not exhibit similar expression patterns?
Data
- Stanford Microarray Database (cDNA array)
- NCBI GEO Database (Affymetrix array)
- Rosetta Compendium (cDNA array)
39 experimental groups subjected to different types of perturbations, such as cell cycle, heat shock, osmotic pressure, starvation, zinc, nitrogen depletion, etc.

2nd-Order Analysis Validation: Scheme
43 functional classes
2,429 genes
5,142 doublets
278,799 quadruplets
- Homogeneous quadruplets: 84%
- Heterogeneous quadruplets: 16%
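The split into homogeneous and heterogeneous quadruplets can be sketched as a simple tally. The slide does not spell out the criterion, so the sketch below assumes that a quadruplet (a pair of doublets) counts as homogeneous when all four genes share at least one functional class. The function name, gene names, and class assignments are hypothetical toy values.

```python
from collections import Counter

def classify_quadruplets(quadruplets, gene_classes):
    """Tally quadruplets as homogeneous or heterogeneous.

    Assumption: a quadruplet is homogeneous when the classes shared
    within doublet 1 overlap the classes shared within doublet 2.
    """
    counts = Counter()
    for doublet_1, doublet_2 in quadruplets:
        shared_1 = gene_classes[doublet_1[0]] & gene_classes[doublet_1[1]]
        shared_2 = gene_classes[doublet_2[0]] & gene_classes[doublet_2[1]]
        label = "homogeneous" if shared_1 & shared_2 else "heterogeneous"
        counts[label] += 1
    return counts

# Toy annotation (hypothetical): each gene maps to a set of functional classes.
gene_classes = {
    "g1": {"cell cycle"},
    "g2": {"cell cycle"},
    "g3": {"cell cycle"},
    "g4": {"cell cycle", "DNA damage"},
    "g5": {"heat shock"},
    "g6": {"heat shock"},
}
quadruplets = [
    (("g1", "g2"), ("g3", "g4")),
    (("g1", "g2"), ("g5", "g6")),
    (("g3", "g4"), ("g5", "g6")),
]

counts = classify_quadruplets(quadruplets, gene_classes)
total = sum(counts.values())
for label in ("homogeneous", "heterogeneous"):
    print(label, counts[label], f"{counts[label] / total:.0%}")
```

Sets are used for the class annotation because a gene may belong to more than one functional class.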

2nd-Order Analysis Validation: Comparison

2nd-Order Analysis Validation: Results
2nd-order analysis groups functionally related genes
- The derived quadruplets give rise to a set of 2,597 distinct and novel gene pairs
- 97% of the 2,597 pairs are missed by the standard methods
Reasons for the poor performance of the 1st-order method
- Inter-dataset variations
- Cross-doublet gene pairs need not show high expression correlation
- Sensitivity to gene pairs which are only co-expressed in a subset of the data sets

[Figure: graphs over nodes a to f, one per experimental group: Cell Cycle, Heat Shock, Starvation, Nitrogen Depletion, Radiation, Osmotic Pressure]

2nd-Order Analysis Interaction Modules

2nd-Order Analysis Interaction Modules: Leave-one-out Cross Validation
For each gene that occurred in the 100 tightest and most stable clusters of known genes, we masked its function, made a prediction with our 2-step procedure, and compared the predicted function with the true function.
We made predictions for 179 doublets, of which 163 were correct: a 91% success ratio.
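The masking loop can be sketched as follows. The slides do not describe the 2-step prediction procedure, so this sketch swaps in a plain majority vote among the other members of a cluster as a stand-in predictor. The function name, cluster contents, and gene names are hypothetical; only the two function labels are borrowed from the slides.

```python
from collections import Counter

def leave_one_out_accuracy(clusters, known_function):
    """Mask each annotated gene in turn and predict its function.

    Stand-in predictor (an assumption, not the authors' 2-step procedure):
    the most common function among the gene's annotated cluster co-members.
    """
    correct = 0
    total = 0
    for members in clusters:
        for gene in members:
            if gene not in known_function:
                continue
            # The masked gene's own annotation is excluded from the vote.
            others = [known_function[g] for g in members
                      if g != gene and g in known_function]
            if not others:
                continue
            predicted = Counter(others).most_common(1)[0][0]
            total += 1
            if predicted == known_function[gene]:
                correct += 1
    return correct, total

# Toy clusters and annotations (hypothetical).
clusters = [["g1", "g2", "g3"], ["g4", "g5", "g6", "g7"]]
known_function = {
    "g1": "mitosis", "g2": "mitosis", "g3": "mitosis",
    "g4": "cation transport", "g5": "cation transport",
    "g6": "cation transport", "g7": "mitosis",
}

correct, total = leave_one_out_accuracy(clusters, known_function)
print(f"{correct} of {total} predictions correct ({correct / total:.0%})")
```

The same ratio applied to the reported counts, 163 correct out of 179, gives about 91%.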

2nd-Order Analysis Interaction Modules: Functional Prediction
79 functions of 69 unknown yeast genes involved in diverse biological processes
Experimental studies in the literature and in our laboratory:
- YLR183C in “mitosis”: regulation of G1/S transition
- YLL051C in “cation transport”: ferric-chelate reductase activity and iron-regulated expression

2nd-Order Analysis Frequently Occurring Tight Clusters Transcription Factors

2nd-Order Analysis Frequently Occurring TCs with 2nd-Order Correlation

[Figure: Transcription Factor Set 1, Transcription Factor Set 2, Cooperativity]

3 types of transcription cascades

2nd-Order Analysis ChIP-Chip

2nd-Order Analysis Transcription Module Results
60 transcription modules identified
34 pairs showed high 2nd-order correlation
29% (P < 10^-5) of those module pairs are participants in transcription cascades
- 2 pairs in Type I cascades
- 8 pairs in Type II cascades
- 3 pairs in Type III cascades
These transcription cascades inter-connect into a partial cellular regulatory network

2nd-Order Analysis Leu3 and Met4 Transcription Cascade
[Figure: average expression, Leu3 module vs. Met4 module; average expression correlation, Leu3 module vs. Met4 module]

2nd-Order Analysis Hierarchical clustering of transcriptional modules

2nd-Order Analysis Assigning transcription factor to pathways
For an unknown transcription factor in a module cluster, we can annotate its function by integrating 2 types of evidence:
- the functions of known genes in its target module
- the functions of known transcription factors regulating other modules in the same cluster

2nd-Order Analysis Summary
We developed a framework to integrate many microarray data sets in a platform-independent way, and investigated its properties and applications:
- Group together functionally related genes without direct expression similarity
- Cluster the functional interactions into modules and derive functional annotations for unknown genes
- Reveal the cooperativity in the regulatory network and reconstruct transcription cascades
