Network integration and function prediction: Putting it all together

Presentation transcript:

Network integration and function prediction: Putting it all together Curtis Huttenhower 04-13-11 Harvard School of Public Health Department of Biostatistics

Outline: Functional network integration; Network meta-analysis; Bayes nets and LR; The human genome, tissues, and disease; Network meta-analysis; Pathogens and MTb; Quantifying progress in yeast; Networks to pathways; Functional mapping: networks of networks; Hierarchical integration; Pathway prediction; Regulatory network integration; Network motifs

A computational definition of functional genomics
[Diagram: prior knowledge links Gene → Function; genomic data link Gene → Data → Function]

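A framework for functional genomics
100Ms of gene pairs (G1-G2 related, G4-G9 related, G3-G6 unrelated, G7-G8 unrelated, G2-G5 unknown, …) are scored across 1Ks of datasets.
For each dataset, the frequency of each observed value (high vs. low correlation, colocalized vs. not colocalized, similar vs. dissimilar) is compared between related (+) and unrelated (-) pairs.
Combining the datasets gives a probability for each unknown pair, e.g. P(G2-G5|Data) = 0.85.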
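
The combination step on this slide can be illustrated with a naive Bayes calculation. This is only a minimal sketch, assuming conditionally independent datasets; the function name, the prior, and all likelihood values are hypothetical and are not taken from the slides.

```python
from math import prod

def posterior_functional_relationship(prior, likelihoods):
    """Naive Bayes posterior that a gene pair is functionally related.

    prior: P(related) before seeing any data (hypothetical value).
    likelihoods: one (P(obs | related), P(obs | unrelated)) tuple per
        dataset, for the discretized value observed in that dataset.
    """
    p_related = prior * prod(l_rel for l_rel, _ in likelihoods)
    p_unrelated = (1.0 - prior) * prod(l_unrel for _, l_unrel in likelihoods)
    return p_related / (p_related + p_unrelated)

# Hypothetical evidence for one gene pair across three datasets.
evidence = [
    (0.7, 0.2),  # high expression correlation
    (0.6, 0.1),  # colocalized
    (0.8, 0.5),  # high sequence similarity
]
posterior = posterior_functional_relationship(prior=0.05, likelihoods=evidence)
print(f"P(related | data) = {posterior:.2f}")
```

In practice the per-dataset likelihoods would be estimated by counting how often each discretized value occurs among known related and unrelated gene pairs, which is what the frequency histograms on the slide depict.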

MEFIT: A Framework for Functional Genomics
[Diagram labels: Functional Relationship; Biological Context (functional area, tissue, disease, …); datasets: Golub 1999, Butte 2000, Whitfield 2002, Hansen 1998]

Functional network prediction and analysis: HEFalMp
Currently includes data from 30,000 human experimental results (15,000 expression conditions + 15,000 diverse others), analyzed for 200 biological functions and 150 diseases.
[Figure labels: global interaction network; metabolism network; signaling network; gut community network]

HEFalMp: Predicting human gene function

HEFalMp: Predicting human genetic interactions

HEFalMp: Analyzing human genomic data

HEFalMp: Understanding human disease

Validating Human Predictions
With Erin Haley, Hilary Coller
Autophagy: 5½ of 7 predictions currently confirmed.
[Figure labels: predicted novel autophagy proteins; Luciferase (negative control); ATG5 (positive control); LAMP2; RAB11A; not starved; starved (autophagic)]

Meta-analysis for unsupervised functional data integration
(Huttenhower 2006; Hibbs 2007; Evangelou 2007)
- Simple regression: all datasets are equally accurate
- Random effects: variation within and among datasets and interactions
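
The contrast between the two models can be sketched for a single gene pair whose expression correlation has been measured in several datasets. This is an illustrative sketch only: the Fisher z-transform and the DerSimonian-Laird estimator are standard meta-analysis tools chosen here as assumptions, not necessarily the exact procedure of the cited papers, and the correlations and sample sizes are hypothetical.

```python
import math

def fisher_z(r):
    """Map a correlation in (-1, 1) to an approximately normal z-score."""
    return math.atanh(r)

def combine_simple(zs):
    """Unweighted mean: treats all datasets as equally accurate."""
    return sum(zs) / len(zs)

def combine_random_effects(zs, ns):
    """DerSimonian-Laird random-effects mean of Fisher z-scores.

    The within-dataset variance of a Fisher z is about 1 / (n - 3), so
    every n must exceed 3; tau2 estimates the extra variance among datasets.
    """
    w = [n - 3 for n in ns]  # inverse within-dataset variances
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(zs) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (1.0 / wi + tau2) for wi in w]
    return sum(wi * zi for wi, zi in zip(w_star, zs)) / sum(w_star)

# Hypothetical correlations of one gene pair in three datasets.
correlations = [0.9, 0.2, 0.6]
sample_sizes = [12, 40, 25]  # conditions per dataset
zs = [fisher_z(r) for r in correlations]

simple_r = math.tanh(combine_simple(zs))
random_r = math.tanh(combine_random_effects(zs, sample_sizes))
print(f"simple: {simple_r:.3f}, random effects: {random_r:.3f}")
```

The random-effects weights shrink toward equality as the among-dataset variance grows, so a single large dataset cannot dominate when the datasets disagree.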

Unsupervised data integration: TB virulence and ESX-1 secretion
With Sarah Fortune
Graphle: http://huttenhower.sph.harvard.edu/graphle/

Predicting gene function
[Figure: predicted relationships between genes, colored by confidence from high to low, around a set of cell cycle genes]
These edges provide a measure of how likely a gene is to specifically participate in the process of interest.
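
One simple way to turn such edges into a per-gene score is to compare a gene's average edge weight to the known process genes against its average edge weight to the whole genome. The ratio used below is an assumption made for illustration, not the scoring formula from the talk; gene names and weights are hypothetical.

```python
def mean_weight(gene, others, weights):
    """Average edge weight between `gene` and each gene in `others`."""
    others = [g for g in others if g != gene]
    if not others:
        return 0.0
    total = sum(weights.get(frozenset((gene, g)), 0.0) for g in others)
    return total / len(others)

def process_specificity(gene, process_genes, all_genes, weights):
    """Connectivity to the process relative to genomic background."""
    background = mean_weight(gene, all_genes, weights)
    if background == 0.0:
        return 0.0
    return mean_weight(gene, process_genes, weights) / background

# Hypothetical network: A and B are known process genes.
all_genes = ["A", "B", "C", "D", "E"]
process_genes = ["A", "B"]
weights = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("A", "D")): 0.1,
    frozenset(("C", "D")): 0.1,
    frozenset(("C", "E")): 0.1,
    frozenset(("D", "E")): 0.2,
}

score_c = process_specificity("C", process_genes, all_genes, weights)
score_d = process_specificity("D", process_genes, all_genes, weights)
print(f"C: {score_c:.2f}, D: {score_d:.2f}")
```

A score above 1 means the gene is more strongly connected to the process than to the genome at large, which is the sense of "specifically" on the slide.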

Comprehensive validation of computational predictions
With David Hess, Amy Caudy
- Inputs: genomic data and prior knowledge
- Computational predictions of gene function: MEFIT; SPELL (Hibbs et al 2007); bioPIXIE (Myers et al 2005)
- Genes predicted to function in mitochondrion organization and biogenesis
- Laboratory experiments: petite frequency, growth curves, confocal microscopy
- Retraining: new known functions for correctly predicted genes

Evaluating the performance of computational predictions
Genes involved in mitochondrion organization and biogenesis:
- 106 original GO annotations
- 135 under-annotations
- 82 novel confirmations, first iteration
- 17 novel confirmations, second iteration
340 total: >3x previously known genes in ~5 person-months

Evaluating the performance of computational predictions
Genes involved in mitochondrion organization and biogenesis:
- 106 original GO annotations
- 95 under-annotations
- 40 confirmed under-annotations
- 80 novel confirmations, first iteration
- 17 novel confirmations, second iteration
340 total: >3x previously known genes in ~5 person-months
Computational predictions from large collections of genomic data can be accurate despite incomplete or misleading gold standards, and they continue to improve as additional data are incorporated.

Functional mapping: mining integrated networks
[Figure: predicted relationships between genes, colored by confidence from high to low]
- Within one process (e.g. chemotaxis), the strength of these relationships indicates how cohesive the process is.
- Between two processes (e.g. chemotaxis and flagellar assembly), the strength of these relationships indicates how associated the two processes are.
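
Both quantities can be sketched as average edge weights normalized by the genomic background. This is a deliberate simplification assumed for illustration; the published functional mapping statistics may differ, and the gene names and weights below are hypothetical.

```python
from itertools import combinations, product

def mean_pair_weight(pairs, weights):
    """Average edge weight over an iterable of gene pairs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(weights.get(frozenset(p), 0.0) for p in pairs) / len(pairs)

def cohesiveness(process, all_genes, weights):
    """Within-process connectivity relative to the genomic background."""
    baseline = mean_pair_weight(combinations(all_genes, 2), weights)
    return mean_pair_weight(combinations(process, 2), weights) / baseline

def association(proc_a, proc_b, all_genes, weights):
    """Between-process connectivity relative to the genomic background."""
    baseline = mean_pair_weight(combinations(all_genes, 2), weights)
    between = ((a, b) for a, b in product(proc_a, proc_b) if a != b)
    return mean_pair_weight(between, weights) / baseline

# Hypothetical six-gene network; every unlisted pair has weight 0.1.
genes = ["che1", "che2", "fla1", "fla2", "x1", "x2"]
weights = {frozenset(p): 0.1 for p in combinations(genes, 2)}
weights.update({
    frozenset(("che1", "che2")): 0.9,
    frozenset(("fla1", "fla2")): 0.8,
    frozenset(("che1", "fla1")): 0.6,
    frozenset(("che1", "fla2")): 0.5,
    frozenset(("che2", "fla1")): 0.7,
    frozenset(("che2", "fla2")): 0.6,
})
chemotaxis = ["che1", "che2"]
flagellar = ["fla1", "fla2"]

chemotaxis_cohesiveness = cohesiveness(chemotaxis, genes, weights)
chemotaxis_flagellar = association(chemotaxis, flagellar, genes, weights)
chemotaxis_unrelated = association(chemotaxis, ["x1", "x2"], genes, weights)
```

Values above 1 indicate a process, or a pair of processes, that is more tightly connected than a random set of genes, which corresponds to the "baseline (genomic background)" entry in the map legend.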

Functional mapping: Associations among processes
Processes shown: Hydrogen Transport, Electron Transport, Cellular Respiration, Protein Processing, Peptide Metabolism, Cell Redox Homeostasis, Aldehyde Metabolism, Energy Reserve Metabolism, Vacuolar Protein Catabolism, Negative Regulation of Protein Metabolism, Organelle Fusion, Protein Depolymerization, Organelle Inheritance
Legend:
- Edges: associations between processes (moderately strong to very strong)
- Nodes: cohesiveness of processes (below baseline; baseline (genomic background); very cohesive)
- Borders: data coverage of processes (sparsely covered to well covered)

How do functional interactions become pathways?
Data types: gene expression, physical PPIs, genetic interactions, colocalization, sequence, protein domains, regulatory binding sites, …

Simultaneous inference of physical, genetic, regulatory, and functional networks
With Chris Park, Olga Troyanskaya
Functional genomic data → functional interactions, regulatory interactions, post-transcriptional regulation, phosphorylation, metabolic interactions, protein complexes

Learning a compendium of interaction networks Train one SVM per interaction type Resolve consistency using hierarchical Bayes net

Learning a compendium of interaction networks
Both presence/absence and directionality of interactions are accurately inferred.
[Figure scale: AUC from 0.5 to 1.0]

Using network compendia to predict complete pathways
With David Hess
- Additional 20 novel synthetic lethality predictions tested, 14 confirmed (>100x better than random)
- Syn. let. predictions chosen from hubs in DNA topological change, isolated pairs in protein biosynthesis
- Adr1: known carbon metabolism TF activator with many poorly characterized regulatory inputs
- Snf1 is a primary glucose-responsive regulator (kinase, repressor), but the mechanism of downstream regulation isn’t known
- Cmk2 is a calmodulin-dependent kinase with known involvement in the glucose response
- Glc7 is a protein phosphatase that’s known to be post-translationally regulated by Snf1
- We predict specific mechanisms for these regulatory interactions and order them into a pathway based on synthetic alleviation
- Gph1 is a glycogen phosphorylase that mobilizes glycogen for initial processing into glucose
- Both Adr1 and Gph1 are known to be cAMP dependent
- Our only metabolic interaction prediction hypothesizes coregulation by metabolite dependence
[Figure legend: confirmed; unconfirmed]

Interactive aligned network viewer – http://function. princeton Graphle

Human Regulatory Networks
Quiescence: reversible exit from the cell cycle
[Heat map: 6,829 genes in clusters I to X; axes: serum starved (hrs), serum re-stimulated (hrs); cluster annotations: development, cholesterol, protein localization, cell cycle, RNA processing, metabolism]
FIRE (Elemento et al. 2007) motifs: Elk-1, Sp1, NF-Y, YY1
- Of only five regulators found, four have generic cell cycle/proliferation targets
- Just five basic regulators for ~7,000 genes?
- These motifs only appear upstream of ~half of the genes

COALESCE: Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction
Inputs: gene expression; DNA sequence (5’ UTR, 3’ UTR, upstream flank, downstream flank); nucleosome positions; evolutionary conservation
Steps:
- Create a new module
- Feature selection (tests for differential expression/frequency): identify conditions where genes coexpress; identify motifs enriched in genes’ sequences
- Bayesian integration: select genes based on conditions and motifs
- Subtract mean from all data
Output: regulatory modules (coregulated genes, conditions where they’re coregulated, putative regulating motifs)

COALESCE: Selecting Coexpressed Conditions
For each gene expression condition:
- Compare distributions of values for genes in the module versus genes not in the module
- If significantly different, include the condition
Preserving data structure: if multiple conditions derive from the same dataset, they can be included/excluded as a unit (for example, time course vs. deletion collection).
- Test using a multivariate z-test
- Precalculate the covariance matrix; still very efficient
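
The per-condition test can be sketched as a two-sample z-test. This sketch is univariate and tests each condition on its own; it does not implement the multivariate test with a precalculated covariance matrix that the slide mentions for grouped conditions. The function names, the significance threshold, and the expression values are hypothetical.

```python
import math
from statistics import mean, variance

def z_test(in_values, out_values):
    """Two-sample z statistic and two-sided p-value (needs >= 2 values each)."""
    se = math.sqrt(variance(in_values) / len(in_values)
                   + variance(out_values) / len(out_values))
    if se == 0.0:
        return 0.0, 1.0
    z = (mean(in_values) - mean(out_values)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

def select_conditions(expression, module_genes, alpha=0.05):
    """Keep conditions whose values differ inside vs. outside the module.

    expression: {condition: {gene: value}}
    """
    selected = []
    for condition, values in expression.items():
        inside = [v for g, v in values.items() if g in module_genes]
        outside = [v for g, v in values.items() if g not in module_genes]
        _, p = z_test(inside, outside)
        if p < alpha:
            selected.append(condition)
    return selected

# Hypothetical expression values for seven genes in two conditions.
expression = {
    "heat_shock": {"g1": 2.1, "g2": 1.9, "g3": 2.3,
                   "g4": -0.1, "g5": 0.2, "g6": 0.0, "g7": -0.2},
    "control": {"g1": 0.1, "g2": -0.2, "g3": 0.0,
                "g4": 0.0, "g5": 0.1, "g6": -0.1, "g7": 0.05},
}
module = {"g1", "g2", "g3"}
selected = select_conditions(expression, module)
print(selected)
```

With many conditions per genome-scale run, a real implementation would also need some correction for multiple testing, which is omitted here.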

COALESCE: Selecting Significant Motifs
COALESCE looks for three kinds of motifs:
- K-mers (e.g. ACGACGT)
- Reverse complement pairs (e.g. ACGACAT | ATGTCGT)
- Probabilistic Suffix Trees (PSTs)
For every possible motif:
- Compare distributions of values for genes in the module versus genes not in the module
- If significantly different, include the motif
This can distinguish flanks from UTRs.
Fast! Efficient enough to search coding sequence (e.g. exons/introns).
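
The first two motif types can be sketched with plain k-mer counting, where a reverse complement pair is counted under one shared key. This is an illustrative sketch, not COALESCE's implementation; PSTs are not covered, and the example sequence is made up.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    """Shared key for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))

def kmer_counts(seq, k, merge_strands=False):
    """Count k-mers in seq; optionally merge reverse complement pairs."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[canonical(kmer) if merge_strands else kmer] += 1
    return counts

# Made-up sequence containing ACGACAT and its reverse complement ATGTCGT.
sequence = "ACGACATTTATGTCGT"
plain = kmer_counts(sequence, 7)
merged = kmer_counts(sequence, 7, merge_strands=True)
print(plain["ACGACAT"], plain["ATGTCGT"], merged["ACGACAT"])
```

Per-gene counts like these, computed separately for each sequence region, would then be compared between module and non-module genes in the same way as expression conditions.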

COALESCE: Selecting Probable Genes
For each gene in the genome, for each significant condition and each significant motif:
- What’s the probability the gene came from the module’s distribution?
- What’s the probability that it came from outside the module?
The probability of a gene being in the module given some data…
Distributions of each feature in and out of the developing module are observed from the data.
Prior is used to stabilize module convergence; genes already in the module are more likely to stay there next iteration.
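
The membership decision can be sketched as a naive Bayes posterior over the selected features. The Gaussian feature model, the feature names, the parameter values, and the two prior values are all assumptions made for this illustration; the slide does not give the exact formula.

```python
import math

def log_gaussian(x, mu, sigma):
    """Log density of a normal distribution with mean mu and s.d. sigma."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma * math.sqrt(2 * math.pi)))

def membership_probability(features, in_params, out_params, prior_in):
    """P(gene in module | features), assuming independent features.

    features: {feature_name: observed value for this gene}
    in_params / out_params: {feature_name: (mean, s.d.)} estimated from
        genes currently inside / outside the module.
    prior_in: prior probability of membership (higher for current members).
    """
    log_odds = math.log(prior_in) - math.log(1.0 - prior_in)
    for name, x in features.items():
        log_odds += log_gaussian(x, *in_params[name])
        log_odds -= log_gaussian(x, *out_params[name])
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

# Hypothetical gene with one selected condition and one selected motif.
features = {"heat_shock": 1.2, "motif_ACGACAT": 0.5}
in_params = {"heat_shock": (2.0, 1.0), "motif_ACGACAT": (0.8, 0.4)}
out_params = {"heat_shock": (0.0, 1.0), "motif_ACGACAT": (0.2, 0.4)}

p_stay = membership_probability(features, in_params, out_params, prior_in=0.7)
p_join = membership_probability(features, in_params, out_params, prior_in=0.3)
print(f"current member: {p_stay:.2f}, non-member: {p_join:.2f}")
```

With identical data, the gene that is already a member gets the higher posterior, which is the stabilizing effect of the prior described on the slide.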

COALESCE: Integrating Additional Data Types
Additional data types: nucleosome placement (N), evolutionary conservation (C)

Gene  N    C
G1    2.5  0.0
G2    0.6  0.5
G3    1.2  0.9
…

- Can be included as additional datasets and feature selected just like expression conditions/motifs.
- Or can be used as a prior or weight on the values of individual motifs (sequence shown: TCCGGTAGAACTACTGGTATTGTTTTGGATTCCGGTGATG).

COALESCE Results: S. cerevisiae Modules
The haystack: ~6,000 genes, ~2,200 conditions
A needle: 100 genes, 80 conditions

COALESCE Results: S. cerevisiae Modules
- 54 genes, 144 conditions: conjugation
- 112 genes, 82 conditions: mitosis and DNA replication
- 33 genes, 434 conditions: budding
[Figure labels: Ste12; Stb1/Swi6; Swi5; 266; 1612; 284]

COALESCE Results: S. cerevisiae Modules
- 126 genes, 660 conditions: glycolysis, iron and phosphate transport, amino acid metabolism…
- 50 genes, 775 conditions: iron transport
- 11 genes, 844 conditions: phosphate transport
[Figure labels: Aft1/2; Helix-Loop-Helix Tye7/Cbf1/Pho4; Pho4; 174; 175; 176]

COALESCE Results: S. cerevisiae Modules
- 72 genes, 319 conditions: mitochondrial translation
[Figure labels: Puf3; 822]
…plus more ribosome clusters than you can shake a stick at!

COALESCE Results: Yeast TF/Target Accuracy

COALESCE Results: TF/Targets Influenced by Supporting Data
[Figure categories: decreased by addl. data; improved by conservation; improved only by both; improved by any addl. data, mainly conservation]

COALESCE Results: Yeast Clustering Accuracy
~2,200 yeast conditions
Recapitulation of known biology from Gene Ontology
Modules from other organisms:
- C. elegans: up in larvae, down in adults; GATA in 5’ flank, miR-788 seed in 3’ UTR
- M. musculus: up in callosal and motor neurons; ASCL1 in 5’ flank, unch. sequences underenriched in 3’ UTR
- H. sapiens: up in normal muscle, down in diabetic; AAGGGGC (zf?) and enriched in 5’ flank

COALESCE: Coregulated Quiescence Modules
Predicts regulatory modules from genomic data:
- Coregulated genes
- Conditions under which coregulation occurs
- Putative regulatory motifs
5 quiescence-related microarray datasets, 60 conditions:
- Quiescence program (Coller et al. 2006)
- Adenoviral infection (Miller et al. 2007)
- let-7 response (Legesse-Miller et al. unpub.)
- Contact inhibition (Scarino et al. unpub.)
- Serum withdrawal (Legesse-Miller et al. unpub.)

COALESCE: Coregulated Quiescence Modules
- Up during quiescence entry, down during quiescence exit: many known related (proliferation) motifs: Pax4, Staf, NFKB1, Gfi, ESR1, Runx1, Su(H)
- Down with let-7 exposure: let-7 motifs predicted in 3’ UTR (UACCUC)
- Down during quiescence entry, enriched for transport/trafficking: miR-297 motif predicted in 3’ UTR (CACATAC)
- Down during quiescence entry, up during quiescence exit, down with adenoviral infection: specific predicted uncharacterized reverse complement motif

Network Motifs
Motifs shown: feedback, positive auto-regulation, negative auto-regulation, coherent feed-forward, incoherent feed-forward, bi-fan
Functional annotations shown: memory, delay, speed + stability, filter, pulse, WGD and evolvability
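
Two of the motifs named on this slide, auto-regulation and feed-forward loops, can be found in a signed directed network with a few lines of code. This is a minimal sketch: the edge representation and the toy network are assumptions, and it only enumerates motif instances without testing whether they are enriched relative to randomized networks.

```python
def find_feed_forward_loops(edges):
    """Find X -> Y -> Z with a direct X -> Z edge and classify each loop.

    edges: {(source, target): sign}, sign +1 (activation) or -1 (repression).
    A loop is coherent when the direct path X -> Z has the same overall
    sign as the indirect path X -> Y -> Z, and incoherent otherwise.
    """
    targets = {}
    for source, target in edges:
        targets.setdefault(source, set()).add(target)
    loops = []
    for x in sorted(targets):
        for y in sorted(targets[x]):
            if y == x:
                continue
            for z in sorted(targets.get(y, ())):
                if z in (x, y) or z not in targets[x]:
                    continue
                direct = edges[(x, z)]
                indirect = edges[(x, y)] * edges[(y, z)]
                kind = "coherent" if direct == indirect else "incoherent"
                loops.append((x, y, z, kind))
    return loops

def find_autoregulators(edges):
    """Map each self-regulating node to 'positive' or 'negative'."""
    return {source: ("positive" if sign > 0 else "negative")
            for (source, target), sign in edges.items() if source == target}

# Toy signed regulatory network (hypothetical).
edges = {
    ("X", "Y"): +1, ("Y", "Z"): +1, ("X", "Z"): +1,  # coherent FFL
    ("A", "B"): +1, ("B", "C"): -1, ("A", "C"): +1,  # incoherent FFL
    ("R", "R"): -1,                                  # negative auto-regulation
}
loops = find_feed_forward_loops(edges)
autoregulators = find_autoregulators(edges)
print(loops, autoregulators)
```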

From Milo, et al., Science, 2002 March 1, 2010

1:1 Lewis Carroll Map
"… And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!" "Have you used it much?" I enquired. "It has never been spread out, yet," said Mein Herr: "the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well."
Sylvie and Bruno Concluded, by Lewis Carroll, 1893.