How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the information that exists.

Presentation transcript:

How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the information that exists

June 1979: 2 relevant papers
- S. Brenner (Genetics 1974): The genetics of Caenorhabditis elegans
- J. Sulston & R. Horvitz (Developmental Biology 1977): Post-embryonic cell lineages of the nematode, Caenorhabditis elegans
Jan 2008: >200,000 relevant papers

Prioritizing high resolution genetic interaction tests by knowledge mining
1. Full text information retrieval
2. Predicting Gene Interactions from information available in public databases
Hans-Michael Muller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan
Weiwei Zhong

Textpresso Literature Search Engine
Scientists spend more time skimming for information than reading papers. Much of the information lies in details hidden in the full text, which are neither in the abstract nor captured in MeSH terms. We designed Textpresso to do automated skimming for researchers and database curators. The output can be used for more sophisticated Natural Language Processing.

Can we do better than PubMed and Google Scholar?
[Comparison table. Columns: Full Text, Sentence, Ontology. Rows: PubMed, Google Scholar, Textpresso. Ontology entries: (-), MeSH, Taxonomy, Gene Ontology, Customized, Neuroscience Information Framework]

Categories are “bags of words”
- PATHWAY: precursor, upstream, cascade, descendants
- GENE: FOXO, HOXA1, pax2, PKD1
- Drosophila anatomy: denticle, wing, MP2 neuron
- Reporter Genes: GFP, EGFP, YFP, lacZ, CFP, Green Fluorescent Protein, reporter gene, dsRed, mCherry

Individual sentences in full text are marked up with Categories
ARTICLE TEXT: egl-38 regulates lin-3 transcription in vulF in L3 larvae
TEXTPRESSO CATEGORIES: egl-38 (gene), regulates (regulation), lin-3 (gene), transcription (process), vulF (anatomy), L3 larvae (life stage)
Automatically mark up the whole corpus of papers with terms of categories, and index for rapid searching
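The markup-and-index idea on this slide can be illustrated with a minimal sketch. It is not Textpresso's implementation: the tiny lexicon `CATEGORY_TERMS` and the functions `mark_up` and `build_index` are hypothetical names, populated only with the terms from the slide's example sentence.

```python
import re

# Hypothetical mini-lexicon ("bags of words"), taken from the slide's example sentence.
CATEGORY_TERMS = {
    "gene": ["egl-38", "lin-3"],
    "regulation": ["regulates"],
    "process": ["transcription"],
    "anatomy": ["vulF"],
    "life stage": ["L3 larvae"],
}

def mark_up(sentence):
    """Return (start, term, category) for every lexicon term found, sorted by position."""
    hits = []
    for category, terms in CATEGORY_TERMS.items():
        for term in terms:
            # Guard both sides so "lin-3" does not match inside "lin-31".
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            for m in re.finditer(pattern, sentence):
                hits.append((m.start(), term, category))
    return sorted(hits)

def build_index(sentences):
    """Inverted index: category -> set of sentence ids that contain a term of it."""
    index = {}
    for i, s in enumerate(sentences):
        for _, _, category in mark_up(s):
            index.setdefault(category, set()).add(i)
    return index

sentence = "egl-38 regulates lin-3 transcription in vulF in L3 larvae"
for _, term, category in mark_up(sentence):
    print(f"{term} -> {category}")
```

With such an index, a query like "sentences that mention a gene and an anatomy term" becomes a set intersection of `index["gene"]` and `index["anatomy"]`.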

What Arabidopsis genes are expressed in the meristem based on reporter genes?
14,930 A.t. papers: www.textpresso.org/arabidopsis

Is a nicotinic receptor associated with Drugs of Abuse other than nicotine? 15,786 papers

The problem with clever fly names
Gene name: abbreviation
- forager: for
- ascute: as
- wee: we
- Washed eye: We
Train system to recognize gene names by context
Use italics from PDF
~70% ~85%
Michael Müller, Arun Rangarajan

What reporter genes have been used with Drosophila genes to study human disease?
20,099 full-text fly papers: www.textpresso.org/fly

Database curation: e.g. Gene-Gene Interactions
- Find all sentences that contain ≥2 gene names and ≥1 association or regulation word: 26,000 sentences out of articles
- Simple interface to “check off” sentences: 100 sentences per hour
- Output into database
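The sentence filter described on this slide (at least two gene names plus at least one association or regulation word) could look roughly like the sketch below. The word lists `GENE_NAMES` and `ASSOCIATION_WORDS` and the function `is_candidate` are hypothetical stand-ins for the real category lexicons, and counting distinct gene names is an assumption, since the slide does not say whether repeats count.

```python
import re

# Hypothetical word lists; a real system would use the full category lexicons.
GENE_NAMES = {"egl-38", "lin-3", "let-23", "let-60", "tax-6"}
ASSOCIATION_WORDS = {"regulates", "interacts", "binds", "suppresses", "activates"}

# Tokens may contain hyphens and dots so that names like "egl-38" stay whole.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]*")

def is_candidate(sentence, min_genes=2, min_assoc=1):
    """True if the sentence has >= min_genes distinct gene names
    and >= min_assoc association/regulation words."""
    tokens = [t.rstrip(".-") for t in TOKEN.findall(sentence)]
    genes = {t for t in tokens if t in GENE_NAMES}  # assumption: distinct names
    assoc = [t for t in tokens if t.lower() in ASSOCIATION_WORDS]
    return len(genes) >= min_genes and len(assoc) >= min_assoc

sentences = [
    "egl-38 regulates lin-3 transcription in vulF.",
    "lin-3 is expressed in the anchor cell.",
    "tax-6 suppresses let-60(gf).",
]
candidates = [s for s in sentences if is_candidate(s)]
print(candidates)
```

Only the candidate sentences would then be shown to a curator to accept or reject, which is what keeps the manual step down to checking off sentences.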

Prioritizing high resolution genetic interaction tests by knowledge mining
1. Full text information retrieval
2. Predicting Gene Interactions from information available in public databases
Hans-Michael Muller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan
Weiwei Zhong

Training Set
Training set
- 4775 Positive Interactions
  - Genetic, Literature curation (1909)
  - Yeast two-hybrid screen (2933)
- 3296 Negative Genetic Interactions
  - cis doubles in genetic mapping
Benchmark
- 5515 Positives: KEGG database
- 5000 Negatives: Randomly selected

Algorithm
[Flow diagram. Steps: Ortholog mapping, Scoring, Score integration]
- worm gene pair: worm score from GO, expression, phenotype, microarray
- fly orthologs: fly score from interaction, GO, expression, phenotype, microarray
- yeast orthologs: yeast score from interaction, GO, localization, phenotype, microarray
- fly score, worm score and yeast score are integrated into a total score

Scoring and score integration
Likelihood ratio: $L = \dfrac{p(v \mid \mathrm{pos})}{p(v \mid \mathrm{neg})}$
- $p(v \mid \mathrm{pos})$: probability of the predictor having value $v$ if two genes interact
- $p(v \mid \mathrm{neg})$: probability of the predictor having value $v$ if two genes do not interact
[Plot: C. elegans expression; L versus term usage (% of annotated genes associated with the term)]
Score integration: sum the logs of the L's, $\sum_{i=1}^{n} \log L_i$
- $n$: number of predictors
- $L_i$: likelihood ratio of each predictor
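As a rough illustration of the scoring described on this slide, the sketch below estimates a likelihood ratio for each predictor value from positive and negative training pairs and then sums the logs. The function names, the toy data and the pseudocount smoothing are all assumptions made for the example; the slide does not say how the probabilities were estimated or how the sum is turned into a final score.

```python
import math
from collections import Counter

def likelihood_ratios(pos_values, neg_values, pseudocount=1.0):
    """L(v) = p(v | pos) / p(v | neg), estimated from training-set counts.
    The pseudocount (an assumption) avoids division by zero for unseen values."""
    pos, neg = Counter(pos_values), Counter(neg_values)
    values = set(pos) | set(neg)
    k = len(values)
    ratios = {}
    for v in values:
        p_pos = (pos[v] + pseudocount) / (len(pos_values) + pseudocount * k)
        p_neg = (neg[v] + pseudocount) / (len(neg_values) + pseudocount * k)
        ratios[v] = p_pos / p_neg
    return ratios

def integrated_score(observed, tables):
    """Sum of log L_i over the predictors that have an observed, known value."""
    return sum(
        math.log(tables[name][v])
        for name, v in observed.items()
        if v in tables[name]
    )

# Toy training data (hypothetical): one value per gene pair and predictor.
tables = {
    "expression": likelihood_ratios(["same"] * 3 + ["diff"], ["same"] + ["diff"] * 3),
    "phenotype": likelihood_ratios(["shared"] * 3 + ["none"], ["shared"] + ["none"] * 3),
}
score = integrated_score({"expression": "same", "phenotype": "shared"}, tables)
print(round(score, 3))
```

Summing logs is equivalent to multiplying the individual likelihood ratios, which treats the predictors as independent; a pair with no informative evidence contributes a score near zero.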

[Figure: network of lin-3, let-23, sem-5, sos-1, let-60, lin-45, mek-2, mpk-1, lip-1, ksr-1, gap-1; legend: v1.6, v1.4 & v1.6]

Testing let-60 ras Interactors
87 genes have score >0.9; 17 confirmed from literature
Inactivating genes on a gain-of-function (gf) let-60 mutant by RNAi
Assay vulva precursor cell (VPC) induction
[Table. Columns: WT%, Muv%, average, N. Rows: let-60(gf); let-60(gf); tax-6(RNAi). Values not recovered]
[Figure panels: N2; let-60(gf); let-60(gf); tax-6(RNAi). Labels: not Multivulva, strong Multivulva, weak Multivulva]

let-60(gf) VPC Induction Under Various RNAi
12 hits (p<0.05) in 49 genes; 1 hit in 26 randomly selected genes
Combined with literature, 29/66 (44%) predictions confirmed
[Plot: VPC induction index for Score > 0.9 and Score < 0.6; significance markers p<0.01, p<0.05]

let-60 ras interactors (suppressors)
- tax-6: calcineurin
- csn-5: COP-9 signalosome
- qua-1: hedgehog-related protein
- C01G8.9: SWI/SNF-related (eyelid)
- C05D10.3: ABC transporter (white)
- pfa-3: profilin
- nhr-4: transcription factor

C. elegans Interactions
Input: 4,726 known interactions among 2,713 genes
Predict: additional 18,863, for a total of 23,589 interactions among 4,408 genes

for Drosophila

D. melanogaster interactions
Input: 4,180 known interactions among 1,262 genes
Predict: 13,126, for a total of 17,306 interactions among 6,044 genes

Automated, Quantitative Phenotyping
Chris Cronin: movement analysis, BMC-Genetics 2005
generative graphics, locomotion, plate demographics (Weiwei Zhong), morphology, sexual behavior
E. Fontaine, A. Whittaker, Joel Burdick

Prioritizing high resolution genetic interaction tests by knowledge mining
1. Full text information retrieval
2. Predicting Gene Interactions from information available in public databases
Hans-Michael Muller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan
Weiwei Zhong