Download presentation
Presentation is loading. Please wait.
1
Enriching Bio-ontologies with Non-hierarchical Relations Workshop on bio-ontologies October 28, 2005 Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA
2
2 Acknowledgments u Marc Aubry UMR 6061 CNRS, Rennes, France u Anita Burgun University of Rennes, France
3
3 Gene Ontology u Annotate gene products u Coverage l Molecular functions l Cellular components l Biological processes u Explicit relations to other terms within the same hierarchy u No (explicit) relations l To terms across hierarchies l To concepts from other biological ontologies
4
4 Gene Ontology Molecular functions Cellular components Biological processes BP:metal ion transport MF:metal ion transporter activity
5
5 Motivation u Richer ontology l Beyond hierarchies u Easier to maintain l Explicit dependence relations u More consistent annotations l Quality assurance l Assisted curation
6
6 Related work u Ontologizing GO l GONG u Identifying relations among GO terms across hierarchies l Lexical approach l Non-lexical approaches u Identifying relations between GO terms and OBO terms l ChEBI u Representing relations among GO terms and between GO terms and OBO terms l Obol u See also: [Ogren & al., PSB 2004-2005] [Bodenreider & al., PSB 2005] [Wroe & al., PSB 2003] [Mungall, CFG 2005] [Burgun & al., SMBM 2005] [Bada & al., 2004], [Kumar & al., 2004], [Dolan & al., 2005]
7
Non-lexical approaches to identifying associative relations in the Gene Ontology
8
8 GO and annotation databases u 5 model organisms l FlyBase l GOA-Human l MGI l SGD l WormBase Brca1GO:0000793IDA Brca1GO:0003677IEA Brca1GO:0003684IDA Brca1GO:0003723ISS Brca1GO:0004553ISS Brca1GO:0005515 ISS, TAS Brca1GO:0005622IEA Brca1GO:0005634 IDA, ISS Brca1… 270,000 gene-term associations
9
9 Three non-lexical approaches All based on annotation databases Similarity in the vector space model Similarity in the vector space model Statistical analysis of co-occurring GO terms Statistical analysis of co-occurring GO terms Association rule mining Association rule mining
10
10 Similarity in the vector space model Annotation database GO terms Genes g1g1 g2g2 …gngn t1t1 t2t2 … tntn GO terms Genes g1g1 g2g2 … gngn t1t1 t2t2 …tntn
11
11 Similarity in the vector space model GO terms t1t1 t2t2 … tntn t1t1 t2t2 …tntn Similarity matrix Sim(t i,t j ) = t i · t j GO terms Genes g1g1 g2g2 …gngn t1t1 t2t2 … tntn tjtj titi
12
12 Analysis of co-occurring GO terms Annotation database GO terms Genes g1g1 g2g2 … gngn t1t1 t2t2 …tntn g2g2 t2t2 t7t7 t9t9 … t3t3 t7t7 t9t9 g5g5 t5t5 t 2 -t 7 1 t 2 -t 9 1 t 7 -t 9 2 … t5t5t5t51 t7t7t7t72 t9t9t9t92 …
13
13 Analysis of co-occurring GO terms u Statistical analysis: test independence l Likelihood ratio test (G 2 ) l Chi-square test (Pearson’s χ 2 ) u Example from GOA (22,720 annotations) l C0006955 [BP]Freq. = 588 l C0008009 [MF]Freq. = 53 presentabsentTotalpresent46542588 absent721,58322,132 total5322,12522,720 GO:0008009 immune response GO:0006955chemokineactivity Co-oc. = 46 G 2 = 298.7 p < 0.000
14
14 Association rule mining Annotation database GO terms Genes g1g1 g2g2 … gngn t1t1 t2t2 …tntn g2g2 t2t2 t7t7 t9t9 transaction apriori Rules: t 1 => t 2 Confidence: >.9 Support:.05
15
15 Examples of associations u Vector space model l l MF:ice binding l l BP:response to freezing u u Co-occurring terms l l MF:chromatin binding l l CC:nuclear chromatin u u Association rule mining l l MF:carboxypeptidase A activity l l BP:peptolysis and peptidolysis
16
16 Associations identified VSMCOCARMLEX MF-CC499893362917 MF-BP305716285772523 CC-BP76010473292053 Total4316356812685493 7665 by at least one approach
17
17 Associations identified VSM VSMCOCARMLEX MF-CC499893362917 MF-BP305716285772523 CC-BP76010473292053 Total4316356812685493 MF:ice binding BP:response to freezing
18
18 Associations identified COC VSMCOCARMLEX MF-CC499893362917 MF-BP305716285772523 CC-BP76010473292053 Total4316356812685493 MF:chromatin binding CC:nuclear chromatin
19
19 Associations identified ARM VSMCOCARMLEX MF-CC499893362917 MF-BP305716285772523 CC-BP76010473292053 Total4316356812685493 MF:carboxypeptidase A activity BP:peptolysis and peptidolysis
20
20 Limited overlap among approaches u Lexical vs. non-lexical u Among non-lexical 76655493 230 VSM COC ARM 3689 2587 453 305309 121 201
21
Linking the Gene Ontology to other biological ontologies
22
22 Related domains u Organisms u Cell types u Physical entities l Gross anatomy l Molecules u Functions l Organism functions l Cell functions u Pathology cytosolic ribosome (sensu Eukaryota) T-cell activation brain development transferrin receptor activity visual perception T-cell activation regulation of blood pressure
23
23 GO and other domains [adapted from B. Smith] Physical entity FunctionProcess Organism Gross anatomy Organism functions Organism processes Cell Cellular components Cellular functions Cellular processes MoleculeMolecules Molecular functions Molecular processes
24
24 GO and other domains (revisited) [adapted from B. Smith] Resolution Physical whole Physical part FunctionProcess Organism Organism components Organism functions Organism processes Cell Cellular components Cellular functions Cellular processes Molecule Molecular components Molecular functions Molecular processes
25
25 Biological ontologies (OBO)
26
26 GO and other domains (revisited) [adapted from B. Smith] Resolution Physical whole Physical part FunctionProcess Organism Organism components Organism functions Organism processes Cell Cellular components Cellular functions Cellular processes Molecule Molecular components Molecular functions Molecular processes
27
27 Integrating biological ontologies Cellular components Molecular functions Molecule Biologocal processes Organism Other OBO ontologies Other OBO ontologies Cell
28
Linking GO to ChEBI [Burgun & al., SMBM 2005]
29
29 ChEBI u Member of the OBO family u Ontology of Chemical Entities of Biological Interest l Atom l Molecule l Ion l Radical u 10,516 entities l 27,097 terms [Dec. 22, 2005]
30
30 Methods u Every ChEBI term searched in every GO term u Maximize precision l Ignored ChEBI terms of 3 characters or less l Proper substring u Maximize recall l Case insensitive matches l Normalized ChEBI names (generated singular forms from plurals)
31
31 Examples u u iron [CHEBI:18248] u u uronic acid [CHEBI:27252] u u carbon [CHEBI:27594] BPiron ion transport [GO:0006826] MFiron superoxide dismutase activity [GO:0008382] CCvanadium-iron nitrogenase complex [GO:0016613] BPresponse to carbon dioxide [GO:0010037] MFcarbon-carbon lyase activity [GO:0016830] BPuronic acid metabolism [GO:0006063] MFuronic acid transporter activity [GO:0015133]
32
32 Quantitative results u 2,700 ChEBI entities (27%) identified in some GO term u 9,431 GO terms (55%) include some ChEBI entity in their names 10,516 entities17,250 terms 20,497 associations
33
33 Generalization MFCCBP CHEBI FMA Cell types REXFIX Mouse Pathology cell membrane viscosity enzymatic reaction Leydic cell tumor iron deposition cerebellar aplasia
34
Conclusions
35
35 Conclusions (1) u Links across OBO ontologies need to be made explicit l Between GO terms across GO hierarchies l Between GO terms and OBO terms l Between terms across OBO ontologies u Automatic approaches l Effective (GO-GO, GO-ChEBI) l At least to bootstrap the process l Needs to be refined
36
36 Conclusions (2) u Affordable relations l Computer-intensive, not labor-intensive u Methods must be combined l Cross-validation l Redundancy as a surrogate for reliability l Relations identified specifically by one approach n False positives n Specific strength of a particular method u Requires (some) manual curation l Biologists must be involved
37
37 References u Bodenreider O, Aubry M, Burgun A. Non-lexical approaches to identifying associative relations in the Gene Ontology. In: Altman RB, Dunker AK, Hunter L, Jung TA, Klein TE, editors. Pacific Symposium on Biocomputing 2005: World Scientific; 2005. p. 91-102. http://mor.nlm.nih.gov/pubs/pdf/2005-psb-ob.pdf http://mor.nlm.nih.gov/pubs/pdf/2005-psb-ob.pdf u Burgun A, Bodenreider O. An ontology of chemical entities helps identify dependence relations among Gene Ontology terms. Proceedings of the First International Symposium on Semantic Mining in Biomedicine (SMBM-2005) Electronic proceedings: CEUR-WS/Vol-148 http://mor.nlm.nih.gov/pubs/pdf/2005-smbm-ab.pdf http://mor.nlm.nih.gov/pubs/pdf/2005-smbm-ab.pdf
38
Medical Ontology Research Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA Contact:Web:olivier@nlm.nih.govmor.nlm.nih.gov
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.