UBC Bioinformatics Centre Copyright 2004 UBC Bioinformatics Centre Common evidence network: Investigating Medline co-citations of candidate disease genes.

UBC Bioinformatics Centre Copyright 2004 UBC Bioinformatics Centre Common evidence network: Investigating Medline co-citations of candidate disease genes Alison Meynert – December 15, 2004 Supervisors: Francis Ouellette (UBiC) and Anne Condon (UBC Computer Science)

http://bioinformatics.ubc.ca CREBBP NCOA6 EP400 NCOR2 NCOA3 EP300 MECP2 Huntingtin TBP

http://bioinformatics.ubc.ca What kind of evidence links genes? Medline co-citation Entry in a protein-protein interaction database Part of the same pathway (i.e. KEGG) Similar co-expression patterns Shared overrepresentation of Gene Ontology terms

http://bioinformatics.ubc.ca Which genes are we interested in? GeMS: Genomic Mutational Signatures Project –Identification of novel disease-related genes via signature sequence mutations –Focus: genes encoding poly-glutamine tracts

http://bioinformatics.ubc.ca Medline XML citation example

http://bioinformatics.ubc.ca How do we identify citations of interest? 1.For genes of interest, build a list of gene names and synonyms, including accession numbers, from NCBI LocusLink/Gene 2.Parse Medline XML files and identify sections of interest (i.e. abstract, gene symbols) 3.Break text into words where required 4.Look for matches to gene synonyms 5.Output the PMID, matched synonym, primary gene symbol, and name of XML element 6.Results are loaded into a MySQL database

http://bioinformatics.ubc.ca What about false positive matches? Many gene names/symbols are ambiguous –Huntingtin: HD = Hodgkin Disease –Androgen Receptor: KD = disassociation constant Main method: examine MeSH headings associated with citations where an ambiguous gene synonym has been matched –Some headings are immediately indicative of a false positive match

http://bioinformatics.ubc.ca Pruning the MeSH ontology tree

http://bioinformatics.ubc.ca Current results Match count Not false positives XML location of match Method of analysis 113,70693,031AbstractAutomated, filter by associated MeSH headings 12,4757,818Title 5,5374,719Chemical 484401Gene symbol 12,2676,355Mesh headingManually verified 8814Keyword 365 AccessionTrusted as true positives 144,922112,703All locations

http://bioinformatics.ubc.ca Next steps/further work False positive filtering: –Finish filtering based on MeSH heading association with ambiguous synonym matches –Add filtering based on phrases that are not associated with a particular MeSH heading –If required, additional natural language processing Run synonym matching code on all genes in NCBI LocusLink/Gene on our citation dataset Provide weighting for different evidence sources Visualize and analyze co-citation network

http://bioinformatics.ubc.ca CREBBP NCOA6 EP400 NCOR2 NCOA3 EP300 MECP2 Huntingtin TBP

http://bioinformatics.ubc.ca Acknowledgements Supervisors -Anne Condon -Francis Ouellette Labs -UBiC/CMMT Bioinformatics Lab -UBC Beta Lab GeMS Project -Stefanie Butland -Yong Huang -Soo Sen Lee -George Yang -Blair Leavitt -Carri-Lyn Mead Alison’s research is supported by NSERC, Simon Fraser University, and the CIHR/Michael Smith Foundation Bioinformatics Training Program for Health Research.

UBC Bioinformatics Centre Copyright 2004 UBC Bioinformatics Centre Common evidence network: Investigating Medline co-citations of candidate disease genes.

Similar presentations

Presentation on theme: "UBC Bioinformatics Centre Copyright 2004 UBC Bioinformatics Centre Common evidence network: Investigating Medline co-citations of candidate disease genes."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

UBC Bioinformatics Centre Copyright 2004 UBC Bioinformatics Centre Common evidence network: Investigating Medline co-citations of candidate disease genes.

Similar presentations

Presentation on theme: "UBC Bioinformatics Centre Copyright 2004 UBC Bioinformatics Centre Common evidence network: Investigating Medline co-citations of candidate disease genes."— Presentation transcript:

Similar presentations

About project

Feedback