
Common evidence network: Investigating Medline co-citations of candidate disease genes. Copyright 2004 UBC Bioinformatics Centre.


1 Common evidence network: Investigating Medline co-citations of candidate disease genes
Alison Meynert – December 15, 2004
Supervisors: Francis Ouellette (UBiC) and Anne Condon (UBC Computer Science)
UBC Bioinformatics Centre. Copyright 2004 UBC Bioinformatics Centre.

2 http://bioinformatics.ubc.ca
[Figure: gene labels CREBBP, NCOA6, EP400, NCOR2, NCOA3, EP300, MECP2, Huntingtin, TBP]

3 What kind of evidence links genes?
- Medline co-citation
- Entry in a protein-protein interaction database
- Part of the same pathway (e.g. KEGG)
- Similar co-expression patterns
- Shared overrepresentation of Gene Ontology terms

4 Which genes are we interested in?
- GeMS: Genomic Mutational Signatures Project
  – Identification of novel disease-related genes via signature sequence mutations
  – Focus: genes encoding poly-glutamine tracts

5 Medline XML citation example

6 How do we identify citations of interest?
1. For genes of interest, build a list of gene names and synonyms, including accession numbers, from NCBI LocusLink/Gene
2. Parse Medline XML files and identify sections of interest (e.g. abstract, gene symbols)
3. Break text into words where required
4. Look for matches to gene synonyms
5. Output the PMID, matched synonym, primary gene symbol, and name of XML element
6. Load the results into a MySQL database
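
Steps 2 to 5 could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synonym table, the sample citation, and the function name `find_matches` are all hypothetical, and the real synonym list would come from NCBI LocusLink/Gene. The element names (`MedlineCitation`, `PMID`, `ArticleTitle`, `AbstractText`, `GeneSymbol`) follow the Medline XML format.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical synonym table (lower-cased synonym -> label for the primary gene).
# Illustrative only; the talk builds this list from NCBI LocusLink/Gene.
SYNONYMS = {"hd": "Huntingtin", "huntingtin": "Huntingtin", "tbp": "TBP"}

# XML elements treated as "sections of interest" in this sketch.
SECTIONS = ["ArticleTitle", "AbstractText", "GeneSymbol"]

# Made-up citation for demonstration; not a real Medline record.
SAMPLE_XML = """
<MedlineCitationSet>
  <MedlineCitation>
    <PMID>0000001</PMID>
    <Article>
      <ArticleTitle>Huntingtin interacts with TBP.</ArticleTitle>
      <Abstract><AbstractText>We studied HD patients.</AbstractText></Abstract>
    </Article>
  </MedlineCitation>
</MedlineCitationSet>
"""

def find_matches(xml_text):
    """Return (PMID, matched word, gene label, XML element name) tuples."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")
        for section in SECTIONS:
            for element in citation.iter(section):
                text = "".join(element.itertext())
                # Step 3: break the text into alphanumeric words.
                for word in re.findall(r"[A-Za-z0-9]+", text):
                    # Step 4: case-insensitive lookup against the synonym table.
                    gene = SYNONYMS.get(word.lower())
                    if gene is not None:
                        # Step 5: record where the match was found.
                        matches.append((pmid, word, gene, section))
    return matches

for row in find_matches(SAMPLE_XML):
    print(row)
```

Word-level matching like this misses multi-word synonyms and splits hyphenated symbols, so a fuller version would need phrase matching as well. Note also that the "HD" hit in the sample abstract is exactly the kind of ambiguous match that needs false-positive filtering.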

7 What about false positive matches?
- Many gene names/symbols are ambiguous
  – Huntingtin: HD = Hodgkin Disease
  – Androgen Receptor: KD = dissociation constant
- Main method: examine MeSH headings associated with citations where an ambiguous gene synonym has been matched
  – Some headings are immediately indicative of a false positive match
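
The MeSH-based check could look something like the sketch below. It assumes a hand-built table that maps each ambiguous synonym to MeSH headings that signal a false positive; that table, its single entry, and the function name `is_likely_false_positive` are hypothetical and only illustrate the idea.

```python
# Hypothetical table: ambiguous synonym -> MeSH headings that indicate
# the synonym is being used in its non-gene sense.
FALSE_POSITIVE_HEADINGS = {
    "HD": {"Hodgkin Disease"},
}

def is_likely_false_positive(synonym, mesh_headings):
    """Return True if any MeSH heading on the citation flags this synonym."""
    blocked = FALSE_POSITIVE_HEADINGS.get(synonym, set())
    # A shared heading means the citation is probably about the other meaning.
    return not blocked.isdisjoint(mesh_headings)

print(is_likely_false_positive("HD", ["Hodgkin Disease", "Humans"]))
print(is_likely_false_positive("HD", ["Huntington Disease"]))
```

A synonym with no entry in the table is never flagged, so unambiguous synonyms pass through unchanged.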

8 Pruning the MeSH ontology tree

9 Current results

Match count   Not false positives   XML location of match   Method of analysis
113,706       93,031                Abstract                Automated, filter by associated MeSH headings
12,475        7,818                 Title
5,537         4,719                 Chemical
484           401                   Gene symbol
12,267        6,355                 MeSH heading            Manually verified
88            14                    Keyword
365           365                   Accession               Trusted as true positives
144,922       112,703               All locations
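
As a quick check on these figures, the per-location counts can be summed and the retained fraction computed. The counts below are copied from the slide; the variable and function names are illustrative, and the Accession row is assumed to be 365 in both columns, since those matches are trusted as true positives and the totals agree with that reading.

```python
# (match count, not false positives) per XML location, as reported on the slide.
COUNTS = {
    "Abstract": (113706, 93031),
    "Title": (12475, 7818),
    "Chemical": (5537, 4719),
    "Gene symbol": (484, 401),
    "MeSH heading": (12267, 6355),
    "Keyword": (88, 14),
    "Accession": (365, 365),  # assumed equal: trusted as true positives
}

def retained_fraction(matches, kept):
    """Fraction of matches that were not judged to be false positives."""
    return kept / matches

total_matches = sum(m for m, _ in COUNTS.values())
total_kept = sum(k for _, k in COUNTS.values())

for location, (matches, kept) in COUNTS.items():
    print(f"{location}: {retained_fraction(matches, kept):.1%} retained")
print(f"All locations: {total_kept} of {total_matches} retained")
```

The totals reproduce the "All locations" row (144,922 matches, 112,703 retained), which is roughly 78% of matches surviving the filter.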

10 Next steps/further work
- False positive filtering:
  – Finish filtering based on MeSH heading association with ambiguous synonym matches
  – Add filtering based on phrases that are not associated with a particular MeSH heading
  – If required, additional natural language processing
- Run synonym matching code on all genes in NCBI LocusLink/Gene on our citation dataset
- Provide weighting for different evidence sources
- Visualize and analyze co-citation network
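
One plausible way to turn the filtered matches into a co-citation network is to count, for each pair of genes, how many citations mention both. The sketch below assumes the matches are available as (PMID, gene) pairs; the function name `cocitation_edges` and the sample data are hypothetical, and the talk does not specify how edges would actually be weighted.

```python
from collections import defaultdict
from itertools import combinations

def cocitation_edges(matches):
    """Map each gene pair to the number of citations mentioning both genes."""
    genes_by_pmid = defaultdict(set)
    for pmid, gene in matches:
        genes_by_pmid[pmid].add(gene)

    edges = defaultdict(int)
    for genes in genes_by_pmid.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(genes), 2):
            edges[(a, b)] += 1
    return dict(edges)

# Made-up matches for demonstration only.
sample = [
    ("1", "CREBBP"), ("1", "EP300"),
    ("2", "CREBBP"), ("2", "EP300"), ("2", "TBP"),
    ("3", "TBP"),
]
print(cocitation_edges(sample))
```

The resulting edge counts could serve as a starting point for the evidence weighting mentioned above, for example by combining them with scores from other evidence sources.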

11 [Figure: gene labels CREBBP, NCOA6, EP400, NCOR2, NCOA3, EP300, MECP2, Huntingtin, TBP]

12 Acknowledgements
Supervisors
- Anne Condon
- Francis Ouellette
Labs
- UBiC/CMMT Bioinformatics Lab
- UBC Beta Lab
GeMS Project
- Stefanie Butland
- Yong Huang
- Soo Sen Lee
- George Yang
- Blair Leavitt
- Carri-Lyn Mead
Alison's research is supported by NSERC, Simon Fraser University, and the CIHR/Michael Smith Foundation Bioinformatics Training Program for Health Research.

