Published by Ophelia Jackson. Modified over 9 years ago.
1
How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? At a minimum, we need to use the information that already exists.
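The ~200 million figure follows from counting unordered pairs among ~20,000 genes. A minimal sketch of that arithmetic (the function name is only illustrative):

```python
from math import comb


def pairwise_count(n_genes: int) -> int:
    """Number of unordered gene pairs among n_genes genes: n * (n - 1) / 2."""
    return comb(n_genes, 2)


pairs = pairwise_count(20_000)
print(f"{pairs:,}")  # 199,990,000, i.e. roughly 200 million
```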
2
June 1979: 2 relevant papers
S. Brenner (Genetics 1974), The genetics of Caenorhabditis elegans
J. Sulston & R. Horvitz (Developmental Biology 1977), Post-embryonic cell lineages of the nematode, Caenorhabditis elegans
Jan 2008: >200,000 relevant papers
3
Prioritizing high resolution genetic interaction tests by knowledge mining
1. Full text information retrieval: Hans-Michael Muller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan
2. Predicting gene interactions from information available in public databases: Weiwei Zhong
4
Textpresso Literature Search Engine
www.textpresso.org
Scientists spend more time skimming for information than reading papers. Much of that information is detail hidden in the full text, neither stated in the abstract nor captured by MeSH terms. We designed Textpresso to automate skimming for researchers and database curators. Its output can feed more sophisticated Natural Language Processing.
5
Can we do better than PubMed and Google Scholar?
[Table: PubMed, Google Scholar, and Textpresso compared by Full Text, Sentence, and Ontology. Ratings shown: (-), +, ++, -, --. Ontologies shown: MeSH, Taxonomy, Gene Ontology, Customized, Neuroscience Information Framework.]
6
Categories are “bags of words”
PATHWAY: precursor, upstream, cascade, descendants
GENE: FOXO, HOXA1, pax2, PKD1
Drosophila anatomy: denticle, wing, MP2 neuron
Reporter Genes: GFP, EGFP, YFP, lacZ, CFP, Green Fluorescent Protein, reporter gene, dsRed, mCherry
7
Individual sentences in full text are marked up with categories
Article text: egl-38 regulates lin-3 transcription in vulF in L3 larvae
Textpresso categories: egl-38 (gene), regulates (regulation), lin-3 (gene), transcription (process), vulF (anatomy), L3 larvae (life stage)
Automatically mark up the whole corpus of papers with category terms, and index it for rapid searching.
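The markup-and-index idea can be sketched as a lexicon lookup over each sentence. This is only an illustration, not Textpresso's implementation: the tiny `CATEGORIES` lexicon, `mark_up`, and `build_index` are hypothetical names, and real category lexicons are far larger.

```python
import re
from collections import defaultdict

# Hypothetical miniature lexicons, built from the example sentence on the slide.
CATEGORIES = {
    "gene": {"egl-38", "lin-3"},
    "regulation": {"regulates"},
    "process": {"transcription"},
    "anatomy": {"vulF"},
    "life stage": {"L3 larvae"},
}


def mark_up(sentence):
    """Return (term, category) pairs for every lexicon term found, in text order."""
    hits = []
    for category, terms in CATEGORIES.items():
        for term in terms:
            # Match whole terms only: no word character or hyphen on either side.
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            for match in re.finditer(pattern, sentence):
                hits.append((match.start(), term, category))
    return [(term, category) for _, term, category in sorted(hits)]


def build_index(sentences):
    """Map each category to the set of sentence indices that contain it."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for _, category in mark_up(sentence):
            index[category].add(i)
    return index


sentence = "egl-38 regulates lin-3 transcription in vulF in L3 larvae"
for term, category in mark_up(sentence):
    print(f"{term}: {category}")
```

With an index of this kind, a query such as "sentences with a gene and an anatomy term" becomes a set intersection over sentence indices rather than a scan of the full text.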
8
What Arabidopsis genes are expressed in the meristem based on reporter genes?
14,930 A. thaliana papers
www.textpresso.org/arabidopsis
9
Is a nicotinic receptor associated with Drugs of Abuse other than nicotine?
15,786 papers
www.textpresso.org/neuroscience
10
The problem with clever fly names
Gene name: abbreviation
forager: for
ascute: as
wee: we
Washed eye: We
Train system to recognize gene names by context: ~70%
Use italics from PDF: ~85%
Michael Müller, Arun Rangarajan
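The italics cue can be illustrated with a toy filter: an ambiguous symbol such as "for" or "we" counts as a gene mention only when the PDF renders it in italics. This is a sketch under assumptions, not the trained system described on the slide; the `Token` type and its `italic` flag are hypothetical stand-ins for font information extracted from a PDF.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    italic: bool  # assumed to come from PDF font information


# Abbreviations from the slide that collide with common English words.
AMBIGUOUS_GENE_SYMBOLS = {"for", "as", "we", "We"}


def gene_mentions(tokens):
    """Keep ambiguous symbols only when they are italicized."""
    return [t.text for t in tokens if t.text in AMBIGUOUS_GENE_SYMBOLS and t.italic]


# "we tested for in larvae", where only the second "for"-like token is italic.
tokens = [
    Token("we", False),
    Token("tested", False),
    Token("for", True),
    Token("in", False),
    Token("larvae", False),
]
print(gene_mentions(tokens))  # ['for']
```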
11
What reporter genes have been used with Drosophila genes to study human disease?
20,099 full-text fly papers
www.textpresso.org/fly
12
Database curation: e.g. Gene-Gene Interactions
Find all sentences that contain ≥2 gene names and ≥1 association or regulation word: 26,000 sentences out of 4,400 articles
Simple interface to “check off” sentences: 100 sentences per hour
Output into database
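The sentence filter described above can be sketched as a simple predicate. The lexicons here are hypothetical placeholders, and treating "≥2 gene names" as two distinct names is an assumption, since the slide does not say whether repeats count.

```python
import re

# Hypothetical lexicons for illustration only.
GENE_NAMES = {"let-60", "lin-3", "egl-38", "tax-6"}
ASSOCIATION_WORDS = {"regulates", "interacts", "suppresses", "binds"}

# Words that may contain internal hyphens, such as "let-60".
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_candidate(sentence):
    """True if the sentence has >=2 distinct gene names and >=1 association word."""
    tokens = TOKEN.findall(sentence)
    genes = {t for t in tokens if t in GENE_NAMES}
    has_association = any(t.lower() in ASSOCIATION_WORDS for t in tokens)
    return len(genes) >= 2 and has_association


sentences = [
    "egl-38 regulates lin-3 transcription.",
    "let-60 is expressed in vulF.",
    "tax-6 and let-60 were both studied.",
]
candidates = [s for s in sentences if is_candidate(s)]
print(candidates)  # only the first sentence passes
```

At the quoted rate of 100 sentences per hour, checking all 26,000 candidate sentences corresponds to about 260 curator-hours.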
13
Prioritizing high resolution genetic interaction tests by knowledge mining
1. Full text information retrieval: Hans-Michael Muller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan
2. Predicting gene interactions from information available in public databases: Weiwei Zhong
14
Training Set
Training set
4775 positive interactions: genetic, literature curation (1909); yeast two-hybrid screen (2933)
3296 negative genetic interactions: cis doubles in genetic mapping
Benchmark
5515 positives: KEGG database
5000 negatives: randomly selected
15
Algorithm
Ortholog mapping: worm gene pair; fly orthologs; yeast orthologs
Scoring: worm score; fly score; yeast score
Score integration: total score
Predictor sets shown in the diagram:
interaction, GO, expression, phenotype, microarray
GO, expression, phenotype, microarray
interaction, GO, localization, phenotype, microarray
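The three stages in the diagram (ortholog mapping, per-species scoring, score integration) can be sketched as follows. Everything concrete here is a placeholder: the gene identifiers, the ortholog tables, and the scores are made up, and adding the species scores together is an assumption about how the total is formed.

```python
# Hypothetical ortholog tables: worm gene -> ortholog in another species.
ORTHOLOGS = {
    "fly": {"w1": "f1", "w2": "f2"},
    "yeast": {"w1": "y1"},  # w2 has no yeast ortholog in this toy example
}

# Hypothetical per-species scores for gene pairs (made-up numbers).
SPECIES_SCORES = {
    "worm": {frozenset(("w1", "w2")): 1.2},
    "fly": {frozenset(("f1", "f2")): 0.8},
    "yeast": {},
}


def map_pair(pair, species):
    """Map a worm gene pair to its ortholog pair, or None if either gene is unmapped."""
    table = ORTHOLOGS[species]
    a, b = pair
    if a in table and b in table:
        return frozenset((table[a], table[b]))
    return None


def total_score(pair):
    """Integrate worm, fly, and yeast scores for one worm gene pair (assumed additive)."""
    score = SPECIES_SCORES["worm"].get(frozenset(pair), 0.0)
    for species in ("fly", "yeast"):
        mapped = map_pair(pair, species)
        if mapped is not None:
            score += SPECIES_SCORES[species].get(mapped, 0.0)
    return score


print(total_score(("w1", "w2")))
```

A pair with no ortholog pair in a given species simply contributes nothing from that species, so evidence from other organisms can add to a worm pair's score without being required.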
16
Scoring and score integration
p(v | pos): probability of the predictor having value v if two genes interact
p(v | neg): probability of the predictor having value v if two genes do not interact
Likelihood ratio: $L = \dfrac{p(v \mid \text{pos})}{p(v \mid \text{neg})}$
[Plot: C. elegans expression; L versus term usage (% of annotated genes associated with the term)]
n: number of predictors
L_i: likelihood ratio of each predictor
Sum the logs of the L's: $\sum_{i=1}^{n} \log L_i$
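A minimal sketch of this scoring step, assuming each predictor takes discrete values and that the probabilities are estimated as frequencies in the positive and negative training sets. The pseudocount smoothing is an added assumption to avoid division by zero, and the example counts are made up.

```python
import math
from collections import Counter


def likelihood_ratios(pos_values, neg_values, pseudocount=1.0):
    """Estimate L(v) = p(v | pos) / p(v | neg) for each observed predictor value v."""
    pos, neg = Counter(pos_values), Counter(neg_values)
    values = set(pos) | set(neg)
    k = len(values)
    ratios = {}
    for v in values:
        # Smoothed frequency estimates (the smoothing is an assumption of this sketch).
        p_pos = (pos[v] + pseudocount) / (len(pos_values) + pseudocount * k)
        p_neg = (neg[v] + pseudocount) / (len(neg_values) + pseudocount * k)
        ratios[v] = p_pos / p_neg
    return ratios


def integrated_score(ratios_per_predictor):
    """Sum the logs of the likelihood ratios L_1..L_n of the n predictors."""
    return sum(math.log(L) for L in ratios_per_predictor)


# Made-up training data for one predictor with two possible values.
pos_values = ["shared"] * 30 + ["not_shared"] * 70
neg_values = ["shared"] * 5 + ["not_shared"] * 95
ratios = likelihood_ratios(pos_values, neg_values)
print(ratios["shared"], ratios["not_shared"])
print(integrated_score([ratios["shared"], 2.0]))
```

A ratio above 1 means the value is more common among interacting pairs, so its log contributes positively to the sum; a ratio below 1 contributes negatively.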
20
[Figure: network of lin-3, let-23, sem-5, sos-1, let-60, lin-45, mek-2, mpk-1, lip-1, ksr-1, gap-1. Legend: v1.6; v1.4 & v1.6]
21
Testing let-60 ras Interactors
87 genes have score >0.9; 17 confirmed from literature
Inactivating genes on a gain-of-function (gf) let-60 mutant by RNAi
Assay vulva precursor cell (VPC) induction

Strain                     WT%   Muv%   average
N2                         100   0      3.0
let-60(gf)                 0     100    4.3
let-60(gf); tax-6(RNAi)    40    60     3.4

[Images: N2 (not Multivulva); let-60(gf) (strong Multivulva); let-60(gf); tax-6(RNAi) (weak Multivulva)]
22
let-60(gf) VPC Induction Under Various RNAi
12 hits (p<0.05) in 49 genes; 1 hit in 26 randomly selected genes
Combined with literature, 29/66 (44%) predictions confirmed
[Plot: VPC induction index for genes with Score > 0.9 and Score < 0.6; markers for p < 0.01 and p < 0.05]
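The counts on this slide can be turned into hit rates directly. The sketch below also compares the two groups with a one-sided Fisher's exact test; that comparison is an illustration added here, not an analysis reported in the talk, and the function names are hypothetical.

```python
from math import comb


def hit_rate(hits, tested):
    """Fraction of tested genes that scored as hits."""
    return hits / tested


def fisher_one_sided(hits_a, n_a, hits_b, n_b):
    """P(group A has >= hits_a hits) under the hypergeometric null of no enrichment."""
    total = n_a + n_b
    total_hits = hits_a + hits_b
    upper = min(total_hits, n_a)
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_a - k)
        for k in range(hits_a, upper + 1)
    )
    return tail / comb(total, n_a)


print(f"predicted genes: {hit_rate(12, 49):.1%}")    # 24.5%
print(f"random genes: {hit_rate(1, 26):.1%}")        # 3.8%
print(f"with literature: {hit_rate(29, 66):.1%}")    # 43.9%
print(f"one-sided p: {fisher_one_sided(12, 49, 1, 26):.3f}")
```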
23
let-60 ras interactors (suppressors)
tax-6: calcineurin
csn-5: COP-9 signalosome
qua-1: hedgehog-related protein
C01G8.9: SWI/SNF-related (eyelid)
C05D10.3: ABC transporter (white)
pfa-3: profilin
nhr-4: transcription factor
24
C. elegans Interactions
Input: 4,726 known interactions among 2,713 genes
Predict an additional 18,863, for a total of 23,589 interactions among 4,408 genes
25
for Drosophila
27
D. melanogaster Interactions
Input: 4,180 known interactions among 1,262 genes
Predict an additional 13,126, for a total of 17,306 interactions among 6,044 genes
28
Automated, Quantitative Phenotyping
Chris Cronin: movement analysis, BMC Genetics 2005
generative graphics
locomotion
plate demographics (Weiwei Zhong)
morphology
sexual behavior
E. Fontaine, A. Whittaker, Joel Burdick
29
Prioritizing high resolution genetic interaction tests by knowledge mining
1. Full text information retrieval: Hans-Michael Muller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan
2. Predicting gene interactions from information available in public databases: Weiwei Zhong