1
GenoMesh: A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships & networks
Yongqun “Oliver” He
Unit for Laboratory Animal Medicine
Department of Microbiology and Immunology
Center for Computational Medicine and Bioinformatics
Comprehensive Cancer Center
University of Michigan Medical School
Ann Arbor, MI 48109
2
Outline
Background
Development & evaluation of GenoMesh algorithm
GenoMesh web system and features
Usages of GenoMesh
Reference: Zuoshuang Xiang, Tingting Qin, Zhaohui S. Qin, and Yongqun He. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks. BMC Systems Biology. 2013, 7(Suppl 3):S9. (InCoB 2013)
3
MEDLINE/PubMed and MeSH
MEDLINE: citations and abstracts from the biomedical literature
PubMed: free access to MEDLINE; new articles added daily
  Currently > 20 million articles
MeSH: Medical Subject Headings
  Controlled vocabulary for indexing articles for PubMed
  16 top-level hierarchies
  2013: 26,853 MeSH descriptors
[Figures: growth of MEDLINE; example of a MeSH tree]
4
Gene-gene Interaction Literature Mining
Two general strategies:
1. Gene co-occurrence
  Two genes are related if they appear in the same article, particularly in the same title, abstract, or sentence
  Example program: PubGene
  Limitation: unable to predict unknown relations
2. Inferring gene relatedness from common linkage to keywords (e.g. GO, MeSH)
  Advantage: can predict new gene-gene interactions
  Example programs: ARROWSMITH, MeSHmap
  No genome-wide MeSH-based approach reported before
  Different methods, often not optimized
GenoMesh: genome-wide and MeSH-based
  Applied to E. coli (well studied) and Brucella (less studied)
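The difference between the two strategies can be sketched in a few lines of Python. This is only an illustration on invented toy data: the gene names, article IDs, and term assignments below are hypothetical, and the functions are not taken from PubGene, ARROWSMITH, MeSHmap, or GenoMesh.

```python
from itertools import combinations

# Hypothetical toy corpus: article ID -> genes mentioned / MeSH terms assigned.
article_genes = {
    "a1": {"geneA", "geneB"},
    "a2": {"geneA"},
    "a3": {"geneC"},
}
article_mesh = {
    "a1": {"Iron", "Gene Expression Regulation, Bacterial"},
    "a2": {"Iron", "RNA, Bacterial"},
    "a3": {"Iron", "Superoxide Dismutase"},
}

def cooccurring_pairs(article_genes):
    """Strategy 1: two genes are linked if some article mentions both."""
    pairs = set()
    for genes in article_genes.values():
        pairs.update(combinations(sorted(genes), 2))
    return pairs

def shared_term_pairs(article_genes, article_mesh):
    """Strategy 2: two genes are linked if their articles share MeSH terms."""
    terms = {}
    for article, genes in article_genes.items():
        for gene in genes:
            terms.setdefault(gene, set()).update(article_mesh.get(article, set()))
    shared = {}
    for g1, g2 in combinations(sorted(terms), 2):
        common = terms[g1] & terms[g2]
        if common:
            shared[(g1, g2)] = common
    return shared

explicit = cooccurring_pairs(article_genes)
keyword_linked = shared_term_pairs(article_genes, article_mesh)
# Pairs linked through shared terms but never mentioned together.
implicit = set(keyword_linked) - explicit
```

In this toy corpus geneA and geneC never appear in the same article, so co-occurrence cannot link them, while the shared term "Iron" can.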
5
MeSH-based Prediction of Gene-gene Interaction
Example: two E. coli genes, hfq and sodB
  hfq: 137 papers
  sodB: 97 papers
Each paper is associated with a list of MeSH terms
Some MeSH terms are shared by the two groups of papers
hfq and sodB are therefore predicted to be associated
[Figure generated by GenoMesh. Red line: co-occurrence; grey line: no co-occurrence]
6
GenoMesh algorithm
Pipeline:
  Literature for E. coli, Brucella
  Preprocessing
  Gene-article matrix
  Gene-MeSH matrix
  Gene-gene dissimilarity matrix
  Clustering, network
MeSH term weighting: each term is weighted according to whether it is highly or rarely used
  TF-IDF: term frequency-inverse document frequency
Six scores tested to measure gene-gene dissimilarity:
  Cosine coefficient
  Euclidean distance
  ….
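The pipeline steps (gene-article matrix, gene-MeSH matrix, term weighting, gene-gene dissimilarity) might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the GenoMesh implementation: the slides do not give the exact TF-IDF formula, so a common textbook form is assumed with each gene treated as one "document", and all gene names, article IDs, and function names are hypothetical.

```python
import math
from collections import Counter

def gene_mesh_counts(gene_articles, article_mesh):
    """Gene-MeSH matrix: gene -> Counter(term -> number of the gene's articles)."""
    counts = {}
    for gene, articles in gene_articles.items():
        row = Counter()
        for article in articles:
            row.update(article_mesh.get(article, ()))
        counts[gene] = row
    return counts

def tfidf_weights(counts):
    """Assumed TF-IDF form: (term share within gene) * log(N genes / genes with term)."""
    n_genes = len(counts)
    doc_freq = Counter()
    for row in counts.values():
        doc_freq.update(row.keys())
    weighted = {}
    for gene, row in counts.items():
        total = sum(row.values())
        weighted[gene] = {
            term: (n / total) * math.log(n_genes / doc_freq[term])
            for term, n in row.items()
        }
    return weighted

def cosine_dissimilarity(u, v):
    """1 - cosine coefficient of two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no usable terms: treat as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical toy input: gene -> its articles, article -> its MeSH terms.
gene_articles = {"g1": ["a1", "a2"], "g2": ["a2", "a3"], "g3": ["a4"]}
article_mesh = {
    "a1": ["Iron", "RNA, Bacterial"],
    "a2": ["Iron"],
    "a3": ["RNA, Bacterial"],
    "a4": ["Flagella"],
}

weights = tfidf_weights(gene_mesh_counts(gene_articles, article_mesh))
d12 = cosine_dissimilarity(weights["g1"], weights["g2"])
d13 = cosine_dissimilarity(weights["g1"], weights["g3"])
```

Here g1 and g2 share terms and end up close together, while g1 and g3 share nothing and get the maximum dissimilarity of 1.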
7
Receiver operating characteristic
What’s the best combination of MeSH term weighting and dissimilarity score calculation?
Gold standard data for evaluation: RegulonDB, an E. coli gene regulation database
Evaluation method: receiver operating characteristic (ROC) curve analysis
The winners are:
  Square root weighting
  Cosine coefficient similarity calculation
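One way to compare weighting/score combinations against a gold standard is the area under the ROC curve (AUC). The sketch below computes AUC directly from its rank interpretation; the scores and labels are invented for illustration, and in the setting described on the slide the labels would come from RegulonDB. The combination names in the dictionary are placeholders, not reported results.

```python
def roc_auc(similarities, labels):
    """AUC as the probability that a true pair outranks a false pair (ties count 0.5)."""
    pos = [s for s, y in zip(similarities, labels) if y]
    neg = [s for s, y in zip(similarities, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_from_dissimilarity(dissimilarities, labels):
    """Low dissimilarity should indicate a true relation, so flip the sign."""
    return roc_auc([-d for d in dissimilarities], labels)

# Hypothetical gold-standard labels for four gene pairs (1 = known relation).
labels = [1, 1, 0, 0]

# Hypothetical dissimilarity scores from two weighting/score combinations.
candidates = {
    "combination_A": [0.1, 0.2, 0.8, 0.9],
    "combination_B": [0.1, 0.8, 0.2, 0.9],
}
aucs = {name: auc_from_dissimilarity(d, labels) for name, d in candidates.items()}
best = max(aucs, key=aucs.get)
```

The combination with the highest AUC separates known relations from unrelated pairs best on the gold standard.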
8
Normal Distribution observed using dissimilarity scores of random networks
The distribution of the gene-gene dissimilarities from randomly selected groups of E. coli genes approximates a normal distribution with the peak in the range of
9
GenoMesh able to predict implicit gene-gene interactions
Top 10 E. coli gene pairs predicted using literature data from before 2004 and verified using literature data published afterwards
All proven valid
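This kind of time-split check can be sketched as follows. It is an assumed reading of the slide, not the authors' procedure: a predicted pair counts as "implicit" if the two genes never co-occur in an article before the cutoff year, and as "verified" if they co-occur in an article from the cutoff year onward. The article records and field names are hypothetical.

```python
from itertools import combinations

def cooccurring(articles):
    """Set of sorted gene pairs that appear together in at least one article."""
    pairs = set()
    for article in articles:
        pairs.update(combinations(sorted(article["genes"]), 2))
    return pairs

def validate_predictions(predicted_pairs, articles, cutoff_year):
    """Map each implicit predicted pair to True if later literature links it."""
    before = cooccurring(a for a in articles if a["year"] < cutoff_year)
    after = cooccurring(a for a in articles if a["year"] >= cutoff_year)
    results = {}
    for pair in predicted_pairs:
        key = tuple(sorted(pair))
        if key not in before:  # keep only pairs unknown before the cutoff
            results[key] = key in after
    return results

# Hypothetical article records.
articles = [
    {"year": 2001, "genes": {"g1", "g2"}},
    {"year": 2002, "genes": {"g3"}},
    {"year": 2006, "genes": {"g1", "g3"}},
]
predicted = [("g1", "g3"), ("g1", "g2"), ("g3", "g2")]
outcome = validate_predictions(predicted, articles, cutoff_year=2004)
```

The pair already co-occurring before the cutoff is excluded, since it would not count as an implicit prediction.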
10
GenoMesh clusters genes of E. coli flagella biogenesis & Brucella Type IV secretion system
A: 32 E. coli flagellar genes clustered
B: 6 E. coli flagellar genes clustered
8 Brucella virB genes clustered
Figure 3. Clusters of E. coli genes involved in E. coli flagella biogenesis. (A) Thirty-two E. coli flagellar genes were clustered together; (B) Six E. coli flagellar genes were clustered together. The neighbour branch of the six-gene branch includes five E. coli genes.
A cluster of Brucella genes that includes 8 virB genes
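Grouping genes from a gene-gene dissimilarity matrix can be illustrated with a small agglomerative clustering routine. The slides do not say which clustering method GenoMesh uses, so average linkage with a cut-off threshold is assumed here purely for illustration; the gene labels and dissimilarity values are invented.

```python
def average_linkage_clusters(genes, dissim, threshold):
    """Merge the closest clusters until the smallest average dissimilarity exceeds threshold."""
    def d(a, b):
        return dissim[(a, b)] if (a, b) in dissim else dissim[(b, a)]

    def linkage(c1, c2):
        return sum(d(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average dissimilarity.
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical dissimilarities: f* genes are mutually close, v* genes are
# mutually close, and every f-v pair is far apart.
genes = ["f1", "f2", "f3", "v1", "v2"]
dissim = {
    ("f1", "f2"): 0.10, ("f1", "f3"): 0.20, ("f2", "f3"): 0.15,
    ("v1", "v2"): 0.10,
}
for f in ("f1", "f2", "f3"):
    for v in ("v1", "v2"):
        dissim[(f, v)] = 0.90

clusters = average_linkage_clusters(genes, dissim, threshold=0.5)
```

With these values the routine returns two groups, one per family of mutually close genes.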
11
GenoMesh analysis of 31 E. coli pathways containing at least 10 genes from EcoCyc
All 31 pathways have significant Z-values and p-values
So GenoMesh can be used to study gene interaction networks
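A Z-value and p-value for a gene group can be obtained by comparing its mean pairwise dissimilarity with that of randomly selected groups of the same size, whose distribution was described earlier as approximately normal. The sketch below is an assumed procedure, not the published one: the number of random groups, the one-sided lower-tail p-value, and the toy dissimilarity function are all illustrative choices.

```python
import math
import random
from itertools import combinations
from statistics import mean, stdev

def mean_pairwise(genes, d):
    """Mean dissimilarity over all gene pairs in a group."""
    return mean(d(a, b) for a, b in combinations(genes, 2))

def z_and_p(group, all_genes, d, n_random=1000, seed=0):
    """Z-value of a group against random groups, with a lower-tail normal p-value."""
    rng = random.Random(seed)
    observed = mean_pairwise(group, d)
    null = [
        mean_pairwise(rng.sample(all_genes, len(group)), d)
        for _ in range(n_random)
    ]
    z = (observed - mean(null)) / stdev(null)
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z) under a normal null
    return z, p

# Hypothetical universe of 30 genes; the first 5 form a tight "pathway".
all_genes = [f"g{i}" for i in range(30)]
pathway = all_genes[:5]

def toy_dissimilarity(a, b):
    # Invented values: pathway members are close, everything else is far.
    return 0.1 if a in pathway and b in pathway else 0.9

z, p = z_and_p(pathway, all_genes, toy_dissimilarity)
```

A strongly negative Z-value with a small p-value indicates that the group's genes are much closer to each other in the literature than random genes are.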
12
GenoMesh Web Site http://genomesh.hegroup.org
13
Analysis of the term “Neutrophil Activation” from the GenoMesh MeSHBrowse website
This is a GenoMesh MeSHBrowse example
14
GenoMesh predicts new Brucella gene-gene interactions by comparing with homologous E. coli results
Homologous E. coli and Brucella genes & associated genes
15
Summary
The GenoMesh genome-wide MeSH-based literature mining algorithm and web system were developed and evaluated
GenoMesh:
  predicts implicit gene-gene interactions
  clusters genes based on associations
  generates gene interaction networks
Discussion
More pathogens will be included
Also applicable to human and other eukaryotes
Host-pathogen interactions
16
Acknowledgements
He Lab at the University of Michigan (UM), Ann Arbor, MI, USA
  Zuoshuang Xiang
  Tingting Qin
Emory School of Medicine, Atlanta, GA, USA
  Zhaohui Steve Qin
Funding:
  NIH-NIAID Grant 1R01AI081062
  A pilot grant at the UM Center for Computational Medicine and Bioinformatics (CCMB)