Presentation is loading. Please wait.

Presentation is loading. Please wait.

Enriching Bio-ontologies with Non-hierarchical Relations Workshop on bio-ontologies October 28, 2005 Olivier Bodenreider Lister Hill National Center for.

Similar presentations


Presentation on theme: "Enriching Bio-ontologies with Non-hierarchical Relations Workshop on bio-ontologies October 28, 2005 Olivier Bodenreider Lister Hill National Center for."— Presentation transcript:

1 Enriching Bio-ontologies with Non-hierarchical Relations Workshop on bio-ontologies October 28, 2005 Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA

2 2 Acknowledgments u Marc Aubry UMR 6061 CNRS, Rennes, France u Anita Burgun University of Rennes, France

3 3 Gene Ontology u Annotate gene products u Coverage l Molecular functions l Cellular components l Biological processes u Explicit relations to other terms within the same hierarchy u No (explicit) relations l To terms across hierarchies l To concepts from other biological ontologies

4 4 Gene Ontology Molecular functions Cellular components Biological processes BP:metal ion transport MF:metal ion transporter activity

5 5 Motivation u Richer ontology l Beyond hierarchies u Easier to maintain l Explicit dependence relations u More consistent annotations l Quality assurance l Assisted curation

6 6 Related work u Ontologizing GO l GONG u Identifying relations among GO terms across hierarchies l Lexical approach l Non-lexical approaches u Identifying relations between GO terms and OBO terms l ChEBI u Representing relations among GO terms and between GO terms and OBO terms l Obol u See also: [Ogren & al., PSB 2004-2005] [Bodenreider & al., PSB 2005] [Wroe & al., PSB 2003] [Mungall, CFG 2005] [Burgun & al., SMBM 2005] [Bada & al., 2004], [Kumar & al., 2004], [Dolan & al., 2005]

7 Non-lexical approaches to identifying associative relations in the Gene Ontology

8 8 GO and annotation databases u 5 model organisms l FlyBase l GOA-Human l MGI l SGD l WormBase Brca1GO:0000793IDA Brca1GO:0003677IEA Brca1GO:0003684IDA Brca1GO:0003723ISS Brca1GO:0004553ISS Brca1GO:0005515 ISS, TAS Brca1GO:0005622IEA Brca1GO:0005634 IDA, ISS Brca1… 270,000 gene-term associations

9 9 Three non-lexical approaches All based on annotation databases Similarity in the vector space model  Similarity in the vector space model Statistical analysis of co-occurring GO terms  Statistical analysis of co-occurring GO terms Association rule mining  Association rule mining

10 10 Similarity in the vector space model Annotation database GO terms Genes g1g1 g2g2 …gngn t1t1 t2t2 … tntn GO terms Genes g1g1 g2g2 … gngn t1t1 t2t2 …tntn 

11 11 Similarity in the vector space model GO terms t1t1 t2t2 … tntn t1t1 t2t2 …tntn Similarity matrix Sim(t i,t j ) = t i · t j  GO terms Genes g1g1 g2g2 …gngn t1t1 t2t2 … tntn tjtj titi 

12 12 Analysis of co-occurring GO terms  Annotation database GO terms Genes g1g1 g2g2 … gngn t1t1 t2t2 …tntn g2g2 t2t2 t7t7 t9t9 … t3t3 t7t7 t9t9 g5g5 t5t5 t 2 -t 7 1 t 2 -t 9 1 t 7 -t 9 2 … t5t5t5t51 t7t7t7t72 t9t9t9t92 …

13 13 Analysis of co-occurring GO terms u Statistical analysis: test independence l Likelihood ratio test (G 2 ) l Chi-square test (Pearson’s χ 2 ) u Example from GOA (22,720 annotations) l C0006955 [BP]Freq. = 588 l C0008009 [MF]Freq. = 53 presentabsentTotalpresent46542588 absent721,58322,132 total5322,12522,720 GO:0008009 immune response GO:0006955chemokineactivity Co-oc. = 46 G 2 = 298.7 p < 0.000

14 14 Association rule mining  Annotation database GO terms Genes g1g1 g2g2 … gngn t1t1 t2t2 …tntn g2g2 t2t2 t7t7 t9t9 transaction apriori Rules: t 1 => t 2 Confidence: >.9 Support:.05

15 15 Examples of associations u Vector space model l l MF:ice binding l l BP:response to freezing u u Co-occurring terms l l MF:chromatin binding l l CC:nuclear chromatin u u Association rule mining l l MF:carboxypeptidase A activity l l BP:peptolysis and peptidolysis

16 16 Associations identified VSMCOCARMLEX MF-CC499893362917 MF-BP305716285772523 CC-BP76010473292053 Total4316356812685493 7665 by at least one approach

17 17 Associations identified VSM VSMCOCARMLEX MF-CC499893362917 MF-BP305716285772523 CC-BP76010473292053 Total4316356812685493 MF:ice binding BP:response to freezing

18 18 Associations identified COC VSMCOCARMLEX MF-CC499893362917 MF-BP305716285772523 CC-BP76010473292053 Total4316356812685493 MF:chromatin binding CC:nuclear chromatin

19 19 Associations identified ARM VSMCOCARMLEX MF-CC499893362917 MF-BP305716285772523 CC-BP76010473292053 Total4316356812685493 MF:carboxypeptidase A activity BP:peptolysis and peptidolysis

20 20 Limited overlap among approaches u Lexical vs. non-lexical u Among non-lexical 76655493 230 VSM COC ARM 3689 2587 453 305309 121 201

21 Linking the Gene Ontology to other biological ontologies

22 22 Related domains u Organisms u Cell types u Physical entities l Gross anatomy l Molecules u Functions l Organism functions l Cell functions u Pathology cytosolic ribosome (sensu Eukaryota) T-cell activation brain development transferrin receptor activity visual perception T-cell activation regulation of blood pressure

23 23 GO and other domains [adapted from B. Smith] Physical entity FunctionProcess Organism Gross anatomy Organism functions Organism processes Cell Cellular components Cellular functions Cellular processes MoleculeMolecules Molecular functions Molecular processes

24 24 GO and other domains (revisited) [adapted from B. Smith] Resolution Physical whole Physical part FunctionProcess Organism Organism components Organism functions Organism processes Cell Cellular components Cellular functions Cellular processes Molecule Molecular components Molecular functions Molecular processes

25 25 Biological ontologies (OBO)

26 26 GO and other domains (revisited) [adapted from B. Smith] Resolution Physical whole Physical part FunctionProcess Organism Organism components Organism functions Organism processes Cell Cellular components Cellular functions Cellular processes Molecule Molecular components Molecular functions Molecular processes

27 27 Integrating biological ontologies Cellular components Molecular functions Molecule Biologocal processes Organism Other OBO ontologies Other OBO ontologies Cell

28 Linking GO to ChEBI [Burgun & al., SMBM 2005]

29 29 ChEBI u Member of the OBO family u Ontology of Chemical Entities of Biological Interest l Atom l Molecule l Ion l Radical u 10,516 entities l 27,097 terms [Dec. 22, 2005]

30 30 Methods u Every ChEBI term searched in every GO term u Maximize precision l Ignored ChEBI terms of 3 characters or less l Proper substring u Maximize recall l Case insensitive matches l Normalized ChEBI names (generated singular forms from plurals)

31 31 Examples u u iron [CHEBI:18248] u u uronic acid [CHEBI:27252] u u carbon [CHEBI:27594] BPiron ion transport [GO:0006826] MFiron superoxide dismutase activity [GO:0008382] CCvanadium-iron nitrogenase complex [GO:0016613] BPresponse to carbon dioxide [GO:0010037] MFcarbon-carbon lyase activity [GO:0016830] BPuronic acid metabolism [GO:0006063] MFuronic acid transporter activity [GO:0015133]

32 32 Quantitative results u 2,700 ChEBI entities (27%) identified in some GO term u 9,431 GO terms (55%) include some ChEBI entity in their names 10,516 entities17,250 terms 20,497 associations

33 33 Generalization MFCCBP CHEBI FMA Cell types REXFIX Mouse Pathology cell membrane viscosity enzymatic reaction Leydic cell tumor iron deposition cerebellar aplasia

34 Conclusions

35 35 Conclusions (1) u Links across OBO ontologies need to be made explicit l Between GO terms across GO hierarchies l Between GO terms and OBO terms l Between terms across OBO ontologies u Automatic approaches l Effective (GO-GO, GO-ChEBI) l At least to bootstrap the process l Needs to be refined

36 36 Conclusions (2) u Affordable relations l Computer-intensive, not labor-intensive u Methods must be combined l Cross-validation l Redundancy as a surrogate for reliability l Relations identified specifically by one approach n False positives n Specific strength of a particular method u Requires (some) manual curation l Biologists must be involved

37 37 References u Bodenreider O, Aubry M, Burgun A. Non-lexical approaches to identifying associative relations in the Gene Ontology. In: Altman RB, Dunker AK, Hunter L, Jung TA, Klein TE, editors. Pacific Symposium on Biocomputing 2005: World Scientific; 2005. p. 91-102. http://mor.nlm.nih.gov/pubs/pdf/2005-psb-ob.pdf http://mor.nlm.nih.gov/pubs/pdf/2005-psb-ob.pdf u Burgun A, Bodenreider O. An ontology of chemical entities helps identify dependence relations among Gene Ontology terms. Proceedings of the First International Symposium on Semantic Mining in Biomedicine (SMBM-2005) Electronic proceedings: CEUR-WS/Vol-148 http://mor.nlm.nih.gov/pubs/pdf/2005-smbm-ab.pdf http://mor.nlm.nih.gov/pubs/pdf/2005-smbm-ab.pdf

38 Medical Ontology Research Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA Contact:Web:olivier@nlm.nih.govmor.nlm.nih.gov


Download ppt "Enriching Bio-ontologies with Non-hierarchical Relations Workshop on bio-ontologies October 28, 2005 Olivier Bodenreider Lister Hill National Center for."

Similar presentations


Ads by Google