A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans Insuk Lee1,4, Ben Lehner2,3,4, Catriona Crombie2,

Slides:



Advertisements
Similar presentations
Yinyin Yuan and Chang-Tsun Li Computer Science Department
Advertisements

Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"
Zhen Shi June 2, 2010 Journal Club. Introduction Most disease-causing mutations are thought to confer radical changes to proteins (Wang and Moult, 2001;
Using phylogenetic profiles to predict protein function and localization As discussed by Catherine Grasso.
Integrating Cross-Platform Microarray Data by Second-order Analysis: Functional Annotation and Network Reconstruction Ming-Chih Kao, PhD University of.
D ISCOVERING REGULATORY AND SIGNALLING CIRCUITS IN MOLECULAR INTERACTION NETWORK Ideker Bioinformatics 2002 Presented by: Omrit Zemach April Seminar.
Synthetic lethal analysis of Caenorhabditis elegans posterior embryonic patterning genes identifies conserved genetic interactions L Ryan Baugh, Joanne.
Global Mapping of the Yeast Genetic Interaction Network Tong et. al, Science, Feb 2004 Presented by Bowen Cui.
Gene prioritization through genomic data fusion Aerts et. al., Nature Biotechnology, 24, , 2006 November 21st, 2008 ENDEAVOUR
Identification of C. elegans sensory ray genes using whole- genome expression profiling Douglas S. Portman and Scott W. Emmons.
Bi-correlation clustering algorithm for determining a set of co- regulated genes BIOINFORMATICS vol. 25 no Anindya Bhattacharya and Rajat K. De.
Genome-wide prediction and characterization of interactions between transcription factors in S. cerevisiae Speaker: Chunhui Cai.
Microarrays Dr Peter Smooker,
CISC667, F05, Lec26, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Genetic networks and gene expression data.
Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.
Regulatory networks 10/29/07. Definition of a module Module here has broader meanings than before. A functional module is a discrete entity whose function.
Experimental and computational assessment of conditionally essential genes in E. coli Chao WANG, Oct
Predicting protein functions from redundancies in large-scale protein interaction networks Speaker: Chun-hui CAI
Evidence for dynamically organized modularity in the yeast protein- protein interaction network Han, et al
Biological networks Construction and Analysis. Recap Gene regulatory networks –Transcription Factors: special proteins that function as “keys” to the.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Comparative Expression Moran Yassour +=. Goal Build a multi-species gene-coexpression network Find functions of unknown genes Discover how the genes.
Protein Interactions and Disease Audry Kang 7/15/2013.
Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
DEMO CSE fall. What is GeneMANIA GeneMANIA finds other genes that are related to a set of input genes, using a very large set of functional.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
MicroRNA Targets Prediction and Analysis. Small RNAs play important roles The Nobel Prize in Physiology or Medicine for 2006 Andrew Z. Fire and Craig.
MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts Igor Ulitsky and Ron Shamir Identification.
Gene Set Enrichment Analysis (GSEA)
Networks and Interactions Boo Virk v1.0.
Network Biology Presentation by: Ansuman sahoo 10th semester
How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the.
Kristen Horstmann, Tessa Morris, and Lucia Ramirez Loyola Marymount University March 24, 2015 BIOL398-04: Biomathematical Modeling Lee, T. I., Rinaldi,
Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.
HUMAN-MOUSE CONSERVED COEXPRESSION NETWORKS PREDICT CANDIDATE DISEASE GENES Ala U., Piro R., Grassi E., Damasco C., Silengo L., Brunner H., Provero P.
Abstract Background: In this work, a candidate gene prioritization method is described, and based on protein-protein interaction network (PPIN) analysis.
Proteome and interactome Bioinformatics.
Ch. 21 Genomes and their Evolution. New approaches have accelerated the pace of genome sequencing The human genome project began in 1990, using a three-stage.
Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae Speaker: Chunhui Cai.
Supplementary Figure S1 eQTL prior model modified from previous approaches to Bayesian gene regulatory network modeling. Detailed description is provided.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
1 Having genome data allows collection of other ‘omic’ datasets Systems biology takes a different perspective on the entire dataset, often from a Network.
Protein and RNA Families
CSCE555 Bioinformatics Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu.
I. Prolinks: a database of protein functional linkage derived from coevolution II. STRING: known and predicted protein-protein associations, integrated.
While gene expression data is widely available describing mRNA levels in different cancer cells lines, the molecular regulatory mechanisms responsible.
Bioinformatics and Computational Biology
Computational Approaches for Biomarker Discovery SubbaLakshmiswetha Patchamatla.
Functional prediction methods. The usual troubles of the molecular and cellular biology labs What are the functions of a previously non characterized.
Introduction to biological molecular networks
DNAmRNAProtein Small molecules Environment Regulatory RNA How a cell is wired The dynamics of such interactions emerge as cellular processes and functions.
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metabolic Network Science, Vol 292, Issue 5518, , 4 May 2001.
Shortest Path Analysis and 2nd-Order Analysis Ming-Chih Kao U of M Medical School
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
Case Study: Characterizing Diseased States from Expression/Regulation Data Tuck et al., BMC Bioinformatics, 2006.
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS) LECTURE 13 ANALYSIS OF THE TRANSCRIPTOME.
Determinants of Mutation Outcome Predicting mutation outcome from early stochastic variation in genetic interaction partners.
Computer Science and Engineering PhD in Computer Science Monday, November 07, :00 a.m. – 11:00 a.m. Swearingen Conference Room 3A75 Network Based.
Network Analysis Goal: to turn a list of genes/proteins/metabolites into a network to capture insights about the biological system 1.Types of high-throughput.
Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment Raja Jothi, Teresa.
The TDR Targets Database Prioritizing potential drug targets in complete genomes.
A Multi-stage Approach to Detect Gene-gene Interactions Associated with Multiple Correlated Phenotypes Zhou Xiangdong,Keith Chan, Danhong Zhu Department.
Networks and Interactions
Joshua M. Stuart, Eran Segal, Daphne Koller, Stuart K. Kim
CISC 841 Bioinformatics (Spring 2006) Inference of Biological Networks
Presented by Meeyoung Park
SEG5010 Presentation Zhou Lanjun.
Bioinformatics, Vol.17 Suppl.1 (ISMB 2001) Weekly Lab. Seminar
Network-Based Coverage of Mutational Profiles Reveals Cancer Genes
Presentation transcript:

A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans Insuk Lee1,4, Ben Lehner2,3,4, Catriona Crombie2, Wendy Wong2, Andrew G Fraser2 & Edward M Marcotte1

Abstract The fundamental aim of genetics is to understand how an organism's phenotype is determined by its genotype, and implicit in this is predicting how changes in DNA sequence alter phenotypes. A single network covering all the genes of an organism might guide such predictions down to the level of individual cells and tissues. To validate this approach, we computationally generated a network covering most C. elegans genes and tested its predictive capacity. Connectivity within this network predicts essentiality, identifying this relationship as an evolutionarily conserved biological principle. Critically, the network makes tissue-specific predictions—we accurately identify genes for most systematically assayed loss-of- function phenotypes, which span diverse cellular and developmental processes. Using the network, we identify 16 genes whose inactivation suppresses defects in the retinoblastoma tumor suppressor pathway, and we successfully predict that the dystrophin complex modulates EGF signaling. We conclude that an analogous network for human genes might be similarly predictive and thus facilitate identification of disease genes and rational therapeutic targets.

The Niche The central goal of genetics “In coming decades, the number of individual human genomes sequenced will grow enormously, and the key emerging problem will be to correlate identified genomic variation to phenotypic variation in health and disease.” “However, our present ability to predict the outcome of an inherited change in the activity of any single human gene is negligible.” ?? “A key goal of this network is that it should predict the phenotypic consequences of perturbing genes.”

Current flow chart Gene function annotation Integrated network functional genomics, proteomics and comparative genomics Works in unicellular organism (centrality and lethality) How about metazoan? Tissue, pleiotropy of gene Prediction of consequence of gene perturbation

Gene annotation integrated network - linkages between genes indicate their likelihood of being involved in the same biological processes (A)KEGG pathway annotations or (B) protein subcellular locations A Probabilistic Functional Network of Yeast Genes. Science 26 November 2004

Constructing a proteome-scale gene network for C. elegans DNA microarray measurements of the expression of C. elegans mRNAs (Supplementary Table 1 online), assays of physical and/or genetic interactions (Supplementary Table 2 online) among C. elegans, fly, human, and yeast proteins, literature-mined C. elegans gene associations, functional associations of yeast orthologs (we term such conserved functional linkages 'associalogs'), estimates of the coinheritance of C. elegans genes across bacterial genomes, and the operon structures of bacterial and/or archaeal homologs of C. elegans genes.

Dilemma in using these datasets a naïve union of these datasets generates a large but error-prone network with poor predictive capacity. although finding overlaps between multiple datasets identifies high-confidence linkages, it generates a low-coverage network that excludes much high-quality data.

Amazing success

How does it happen The network extends considerably beyond previously described associations: 83,946 links in the core network (74%) neither derive from literature-mined relationships nor overlap with known Gene Ontology pathway relationships.

Datasets expression data from the Stanford Microarray Database - selected 6 sets encompassing 220 DNA microarray experiments and 635 additional array experiments previously published - significant correlation between the extent of mRNA co- expression and functional associations between genes genome-wide yeast two-hybrid interactions between C. elegans proteins and the associated literature-derived protein-protein interactions from the Worm Interactome database Genetic interactions from WormBase (derived from >1,000 primary publications) Human protein interactions were collected from existing literature- derived databases, as well as large-scale yeast two-hybrid analysis, then transferred by orthology to C. elegans via orthologs defined using INPARANOID fly yeast two-hybrid interactions yeast functional gene network Interactions were assigned confidence scores before integration

Datesets cont’d comparative genomics linkages from the analysis of 133 genomes (117 bacteria and 16 archaea) using the methods of phylogenetic profiling and gene neighbors. linkages from co-citation of C. elegans gene names in ~7000 Medline abstracts that included the word “elegans”.

Integration of datasets Estimation of the extent that each dataset links genes known to share biological functions as determined from Gene Ontology (GO) annotations. Evidence codes: CC, co-citation; CX, co-expression; DM, fly interolog; GN, gene neighbor; GT, genetic interaction; HS, human interolog; PG, phylogenetic profiles; SC, yeast associalog; WI, worm protein interactome version 5.

Common scoring scheme the log likelihood score (LLS) scheme: the functional coupling between each pair of genes, defined as the likelihood of participating in the same pathway. where P(L|E) and P(¬L|E) are the frequencies of linkages (L) observed in the given experiment (E) between annotated genes operating in the same pathway and in different pathways, respectively, while P(L) and P(¬L) represent the prior expectations (i.e., the total frequency of linkages between all annotated C. elegans genes operating in the same pathway and operating in different pathways, respectively). the relative merits of each dataset is used to prior to integration weighted according to their scores

two primary reference pathway sets to evaluate and integrate datasets (training) The C. elegans Gene Ontology (GO) annotation from WormBase ~ 786,056 gene pairs sharing annotation ▫gold-standard positive functional linkages, we selected genes sharing GO "biological process" annotation terms from levels 2 through 10 of the GO hierarchy. (with 5 exclusions) ▫gold-standard negative linkages, we selected pairs of genes from this set that did not share annotation terms KEGG database annotations ~ 9,406 5,069 pairs shared between the two reference sets COG An additional test set (KEGG minus GO) was created by removing all GO pairs from the KEGG set. Other test sets. Testing sets

Reference and benchmark sets The Gene Ontology (GO) annotation from WormBase. Levels 2 ~ 10 from “biological process” hierarchy are used. (exclude top 5 high coverage terms) ,056 gene pairs KEGG database, provides metabolic and regulatory pathway annotations (exclude top 3 most abundant pathway terms). - 9,406 pairs (5,069 common with GO) COG 12 categories

LLS scheme cont’d bootstrapping for all LLS evaluations (claimed to be superior to cross validation especially for small set.) Each linkage has a probability of 1-1/n of not being sampled, resulting in ~63.2% of the data in the training set and ~36.8% in the test set (7). The overall LLS is the weighted average of results on the two sets, equal to 0.632*LLStest + ( )*LLStrain, calculated as the average over 10 repeated sampling trials.

LLS scheme cont’d – regression is used for continuous scores in some datesets Only positive correlation is used Bacteria profile performs best

the weighted sum (WS) of individual scores – integrating all data sets T, representing a LLS threshold for all data sets being integrated. D, a parameter for the overall degree of independence among the data sets. Determined by the (linear) decay rate of the weight for secondary evidence. It ranges from 1 to +∞ and captures the relative independence of the data sets, low values of D indicating more independence among data sets and higher values indicating less. i is the order index of the data sets after rank-ordering the n remaining LLS scores descending in magnitude. D, T are chosen by systematically testing values of D and T in order to maximize overall performance (area under a plot of LLS versus gene pairs incorporated in the network) on the Gene Ontology benchmark, selecting a single value of D and of T for all gene pairs being integrated using these datasets.

integration composite

The final network has a total of 384,700 linkages between 16,113 C. elegans proteins, covering ~82 % of C. elegans proteome; all gene pairs have a higher likelihood of belonging to the same pathway than random chance. To define a model with high confidence and reasonable proteome coverage, we applied a likelihood threshold, keeping only gene pairs linked with a likelihood of being in the same pathway of at least 1.5 fold better than random chance. Using this threshold, we defined the core network, containing 113,829 linkages for 12,357 worm proteins (~63% of the worm proteome). Final network

Basics

Linkages in Wormnet tend to connect genes expressed in the same tissue.

Basic evaluations Core all

first tested whether the network could predict gene essentiality A. Correlating to a whole genome RNAi study. (embryonic lethal, sterile, larval lethal, sterile progeny, adult lethal are considered essential.) B. Excluding yeast-derived linkage (which reported this correlation by barabasi) After removing all yeast orthologs Barabasi Pearson r = 0.75 C. the subset of 6,924 genes with mouse orthologs. RNAi from mouse embryonic + perinatal lethality is essential.

Essentiality of genes appears to ‘diffuse’ across the network.  (Left) Based on RNAi phenotype, we categorized genes into two classes, embryonic lethal (emb) and nonembryonic lethal, and plot the % of genes that are emb at 1, 2, and 3 hops from each emb gene (0 hops corresponds to 100%). We find that the probability of being embryonic lethal decays with increasing distance from other embryonic lethal genes in the network.  (Right) For the cases where essential genes are linked, we also examined the penetrance of the embryonic lethal RNAi phenotype as it diffuses through the network. We measured the mean % embryonic lethality for lethal genes linked by 1, 2, and 3 hops to a gene with 100% penetrance. The mean penetrance of lethality appears to decay with increasing distance from the 100% penetrant embryonic lethal genes.

Whether A single network can predict diverse phenotypes 'guilt by association' approach “If genes that share any given loss-of-function phenotype associated tightly together, this would indicate that Wormnet has the capability to identify additional genes sharing loss-of- function phenotypes with previously studied genes.” And reversely, if tightly linked, weather the partner has similar phenotype.

Among the 43 tested phenotypes, we found (A) 29 strongly predictable phenotypes, (B) 10 moderately or weakly predictable phenotypes, and (C) 4 predictable at no better than random levels. leave-one-out prediction method: For a given phenotype, genes conferring a specific RNAi phenotype as the “seed” set. Each gene in the worm proteome was rank-ordered by the sum of its linkage log likelihood scores to this seed set (omitting each seed gene). FN and FP are calculated as a function of RANK, ROC curve is used to evaluate the performance.

Genes sharing loss-of-function phenotypes are tightly linked in the network These results demonstrate that a single gene network can predict effects of gene perturbation for diverse aspects of animal biology, and that it is not essential to construct a specialized subnetwork for each particular process. Look at the reverse sentence: genes tightly linked in the network are sharing loss-of-function phenotypes.

whether we could identify previously unknown genes that modulate pathways relevant to human disease and then experimentally validate these predictions ectopic vulva synMuv A synMuv B Lin-15 A;B strain

The dystrophin complex modulates EGF-Ras-MAPK signaling (b) Inactivation of DAPC components by RNAi can suppress the induction of ectopic vulvae by a gain-of-function Ras/let-60 gene. (c) Mutations in the dys-1 gene enhance the larval lethal phenotype of let- 60(RNAi) Known: function of EGF signaling is as an inductive signal during vulval development. The genetic interaction suggests that the DAPC positively regulates EGF signaling during vulva induction.

usage the network predicts diverse cellular, developmental and physiological processes with great specificity. Newly identifies (annotates) relations of gene - pathway by clustered nodes. Identifies interaction between pathways.

Their future development Adding more data for the rest ~20% genes Adding Transcriptional analyses of individual tissues and mutant strains – tissue specificity. “Seed” set for particular disease to identify candidate genes. Human network DONE.