
1 How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? At a minimum, we need to use the information that already exists.
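The ~200 million figure is the number of unordered gene pairs, n(n - 1)/2. A minimal check, counting each pair once and excluding a gene paired with itself:

```python
from math import comb


def pairwise_interactions(n_genes: int) -> int:
    """Number of unordered gene pairs, excluding self-pairs: n choose 2."""
    return comb(n_genes, 2)


pairs = pairwise_interactions(20_000)
print(f"{pairs:,}")  # 199,990,000, i.e. ~200 million
```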

2 June 1979: 2 relevant papers
S. Brenner (Genetics 1974), "The genetics of Caenorhabditis elegans"
J. Sulston & R. Horvitz (Developmental Biology 1977), "Post-embryonic cell lineages of the nematode, Caenorhabditis elegans"
Jan 2008: >200,000 relevant papers

3 Prioritizing high-resolution genetic interaction tests by knowledge mining
1. Full text information retrieval
2. Predicting gene interactions from information available in public databases
Hans-Michael Muller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan, Weiwei Zhong

4 Textpresso Literature Search Engine (www.textpresso.org)
Scientists spend more time skimming for information than reading papers. Much of the information lies in details hidden in the full text, which appear neither in the abstract nor in MeSH terms. We designed Textpresso to do automated skimming for researchers and database curators. Its output can also be used as input for more sophisticated natural language processing.

5 Can we do better than PubMed and Google Scholar?
[Table: PubMed, Google Scholar and Textpresso compared on full-text search, sentence-level search and ontology use (MeSH, Taxonomy, Gene Ontology, customized ontologies, Neuroscience Information Framework)]

6 Categories are “bags of words”
GENE: FOXO, HOXA1, pax2, PKD1
PATHWAY: precursor, upstream, cascade, descendants
Drosophila anatomy: denticle, wing, MP2 neuron
Reporter Genes: GFP, EGFP, YFP, lacZ, CFP, Green Fluorescent Protein, reporter gene, dsRed, mCherry
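Because each category is just a bag of words, tagging a sentence reduces to looking up which category terms it contains. A minimal sketch, assuming a hypothetical `CATEGORIES` dictionary built from the example terms on this slide and simple case-sensitive whole-word matching; Textpresso's actual matching rules are not described in the talk.

```python
import re

# Hypothetical category lexicons, built from the example terms on the slide.
CATEGORIES: dict[str, set[str]] = {
    "gene": {"FOXO", "HOXA1", "pax2", "PKD1"},
    "pathway": {"precursor", "upstream", "cascade", "descendants"},
    "drosophila_anatomy": {"denticle", "wing", "MP2 neuron"},
    "reporter_gene": {"GFP", "EGFP", "YFP", "lacZ", "CFP",
                      "Green Fluorescent Protein", "reporter gene",
                      "dsRed", "mCherry"},
}


def tag_sentence(sentence: str) -> dict[str, list[str]]:
    """Return, for each category, the sorted lexicon terms found in the sentence."""
    hits: dict[str, list[str]] = {}
    for category, terms in CATEGORIES.items():
        found = sorted(
            term for term in terms
            # Whole-word match only, so "GFP" does not fire inside "EGFP".
            if re.search(rf"(?<!\w){re.escape(term)}(?!\w)", sentence)
        )
        if found:
            hits[category] = found
    return hits


print(tag_sentence("FOXO acts upstream of the EGFP reporter gene in the wing"))
```

Multi-word terms such as "MP2 neuron" work the same way, since each term is escaped and matched as a literal phrase.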

7 Individual sentences in full text are marked up with categories
Article text: egl-38 regulates lin-3 transcription in vulF in L3 larvae
Textpresso categories: egl-38 (gene), regulates (regulation), lin-3 (gene), transcription (process), vulF (anatomy), L3 larvae (life stage)
Automatically mark up the whole corpus of papers with the terms of each category, and index it for rapid searching
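Indexing the marked-up corpus means a query such as "gene AND regulation AND anatomy" becomes a set intersection instead of a rescan of every paper. A minimal sketch of an inverted index from category to sentence IDs, assuming hypothetical, already-tagged sentences (the second one is invented for illustration); Textpresso's real index structure is not described here.

```python
from collections import defaultdict

# Hypothetical pre-tagged sentences: (sentence id, text, {category: matched terms}).
TAGGED = [
    (0, "egl-38 regulates lin-3 transcription in vulF in L3 larvae",
     {"gene": ["egl-38", "lin-3"], "regulation": ["regulates"],
      "process": ["transcription"], "anatomy": ["vulF"],
      "life stage": ["L3 larvae"]}),
    (1, "lin-3 is expressed in the anchor cell",
     {"gene": ["lin-3"], "anatomy": ["anchor cell"]}),
]


def build_index(tagged):
    """Map each category to the set of sentence ids containing one of its terms."""
    index = defaultdict(set)
    for sid, _text, categories in tagged:
        for category in categories:
            index[category].add(sid)
    return index


def search(index, *categories):
    """Return ids of sentences that match every requested category."""
    sets = [index.get(c, set()) for c in categories]
    return set.intersection(*sets) if sets else set()


index = build_index(TAGGED)
print(search(index, "gene", "regulation", "anatomy"))  # {0}
```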

8 What Arabidopsis genes are expressed in the meristem, based on reporter genes?
14,930 A. thaliana papers: www.textpresso.org/arabidopsis

9 Is a nicotinic receptor associated with Drugs of Abuse other than nicotine? www.textpresso.org/neuroscience 15,786 papers

10 The problem with clever fly names
Gene name | Abbreviation
forager | for
ascute | as
wee | we
Washed eye | We
Train the system to recognize gene names by context: ~70%
Use italics from the PDF: ~85%
Michael Müller, Arun Rangarajan

11 What reporter genes have been used with Drosophila genes to study human disease?
20,099 full-text fly papers: www.textpresso.org/fly

12 Database curation, e.g. gene-gene interactions
Find all sentences that contain ≥2 gene names and ≥1 association or regulation word: 26,000 sentences out of 4,400 articles
Simple interface to “check off” sentences: 100 sentences per hour
Output into database
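The curation filter is a simple conjunction over category hits, and the quoted checking rate gives a rough effort estimate (26,000 sentences at 100 per hour is 260 hours). A minimal sketch, assuming tiny hypothetical word lists and a hypothetical regular expression for C. elegans-style gene names (e.g. lin-3, egl-38); the real category lexicons are much larger.

```python
import re

# Hypothetical, tiny lexicons; a real system would use full category term lists.
REGULATION_WORDS = {"regulates", "activates", "inhibits", "represses"}
ASSOCIATION_WORDS = {"interacts", "binds", "associates"}
# Assumed pattern for C. elegans-style gene names such as lin-3 or egl-38.
GENE_PATTERN = re.compile(r"\b[a-z]{3,4}-\d+\b")


def is_candidate(sentence: str) -> bool:
    """True if the sentence has >=2 distinct gene names and >=1 association/regulation word."""
    genes = set(GENE_PATTERN.findall(sentence))
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return len(genes) >= 2 and bool(words & (REGULATION_WORDS | ASSOCIATION_WORDS))


sentences = [
    "egl-38 regulates lin-3 transcription in vulF",
    "lin-3 is expressed in the anchor cell",
    "sem-5 binds let-23 in vitro",
]
candidates = [s for s in sentences if is_candidate(s)]

# Back-of-the-envelope curation effort at the quoted checking rate.
hours = 26_000 / 100  # 26,000 candidate sentences at 100 sentences per hour
print(len(candidates), hours)  # 2 260.0
```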


14 Training set
Training set:
 4,775 positive interactions
  Genetic, literature curation (1,909)
  Yeast two-hybrid screen (2,933)
 3,296 negative genetic interactions: cis doubles in genetic mapping
Benchmark:
 5,515 positives: KEGG database
 5,000 negatives: randomly selected

15 Algorithm
[Figure: a worm gene pair is mapped to its fly and yeast orthologs (ortholog mapping); worm, fly and yeast scores are computed from interaction, GO, expression, localization, phenotype and microarray data (scoring); the three scores are combined into a total score (score integration)]

16 Scoring and score integration
p(v | pos): probability of the predictor having value v if two genes interact
p(v | neg): probability of the predictor having value v if two genes do not interact
Likelihood ratio: $L = \frac{p(v \mid \text{pos})}{p(v \mid \text{neg})}$
[Plot: likelihood ratio L for C. elegans expression vs. term usage (% of annotated genes associated with the term)]
Score integration: sum the logs of the likelihood ratios, $S = \sum_{i=1}^{n} \log L_i$, where n is the number of predictors and $L_i$ is the likelihood ratio of each predictor
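Summing log likelihood ratios treats the predictors as independent given the class, which is the naive Bayes assumption. A minimal sketch of the scoring and integration steps, assuming hypothetical counts for two binary predictors and add-one (Laplace) smoothing to avoid division by zero; the talk does not say how the probabilities were estimated or binned.

```python
import math


def likelihood_ratio(v, counts_pos, counts_neg, pseudocount=1.0):
    """Estimate L = p(v | pos) / p(v | neg) for one predictor value v.

    counts_pos / counts_neg map each observed predictor value to how often it
    occurs among interacting / non-interacting training pairs. The add-one
    smoothing is an assumption, not taken from the talk.
    """
    k = len(set(counts_pos) | set(counts_neg))  # number of distinct values
    p_pos = (counts_pos.get(v, 0) + pseudocount) / (sum(counts_pos.values()) + pseudocount * k)
    p_neg = (counts_neg.get(v, 0) + pseudocount) / (sum(counts_neg.values()) + pseudocount * k)
    return p_pos / p_neg


def total_score(ratios):
    """Integrate n predictors by summing the logs of their likelihood ratios L_i."""
    return sum(math.log(L) for L in ratios)


# Hypothetical counts for a predictor such as "the two genes share a GO term".
go_pos = {"yes": 60, "no": 40}
go_neg = {"yes": 10, "no": 90}
# Hypothetical counts for a predictor such as "co-expressed in microarrays".
mx_pos = {"yes": 30, "no": 70}
mx_neg = {"yes": 30, "no": 70}

L_go = likelihood_ratio("yes", go_pos, go_neg)  # 61/11, about 5.5
L_mx = likelihood_ratio("yes", mx_pos, mx_neg)  # 1.0: this predictor is uninformative
score = total_score([L_go, L_mx])
print(round(L_go, 2), round(L_mx, 2), round(score, 2))
```

An uninformative predictor contributes log 1 = 0 to the total, so adding weak data sources does not distort the score.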


20 [Figure: predicted interaction network for lin-3, let-23, sem-5, sos-1, let-60, lin-45, mek-2, mpk-1, lip-1, ksr-1 and gap-1; legend: v1.6, v1.4 & v1.6]

21 Testing let-60 ras interactors
87 genes have score > 0.9; 17 confirmed from the literature
Inactivate genes in a gain-of-function (gf) let-60 mutant by RNAi
Assay vulval precursor cell (VPC) induction
Genotype | Wild-type % | Muv (Multivulva) % | Average VPC induction
N2 | 100 | 0 | 3.0
let-60(gf) | 0 | 100 | 4.3
let-60(gf); tax-6(RNAi) | 40 | 60 | 3.4
[Images: N2, not Multivulva; let-60(gf), strong Multivulva; let-60(gf); tax-6(RNAi), weak Multivulva]

22 let-60(gf) VPC induction under various RNAi treatments
12 hits (p < 0.05) in 49 genes; 1 hit in 26 randomly selected genes
Combined with the literature, 29/66 (44%) of predictions confirmed
[Plot: VPC induction index for each RNAi treatment, grouped by score > 0.9 and score < 0.6; significance marked at p < 0.01 and p < 0.05]
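The slide does not say whether the difference between 12/49 hits and 1/26 hits was itself tested. As an illustration only, a one-sided Fisher's exact test (our choice, written with `math.comb` as a hypergeometric tail sum) asks how likely at least 12 of the 13 total hits would fall in the 49-gene group if hits were distributed at random; the block also checks the 29/66 ≈ 44% figure.

```python
from math import comb


def fisher_one_sided(hits_a, n_a, hits_b, n_b):
    """One-sided Fisher's exact test: P(at least hits_a hits in group A),
    given the total number of hits across both groups (hypergeometric tail)."""
    total_hits = hits_a + hits_b
    total = n_a + n_b
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_a - k)
        for k in range(hits_a, min(total_hits, n_a) + 1)
    )
    return tail / comb(total, n_a)


p = fisher_one_sided(12, 49, 1, 26)  # tested genes vs. randomly selected genes
confirmed = 29 / 66                  # predictions confirmed, including the literature
print(f"p = {p:.3f}, confirmed = {confirmed:.0%}")
```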

23 let-60 ras interactors (suppressors)
tax-6: calcineurin
csn-5: COP-9 signalosome
qua-1: hedgehog-related protein
C01G8.9: SWI/SNF-related (eyelid)
C05D10.3: ABC transporter (white)
pfa-3: profilin
nhr-4: transcription factor

24 C. elegans interactions
Input: 4,726 known interactions among 2,713 genes
Predicted: an additional 18,863, for a total of 23,589 interactions among 4,408 genes

25 for Drosophila


27 D. melanogaster interactions
Input: 4,180 known interactions among 1,262 genes
Predicted: an additional 13,126, for a total of 17,306 interactions among 6,044 genes

28 Automated, quantitative phenotyping
Movement analysis: Chris Cronin (BMC Genetics 2005)
Generative graphics
Locomotion
Plate demographics (Weiwei Zhong)
Morphology
Sexual behavior
E. Fontaine, A. Whittaker, Joel Burdick


