1 M APPING C OMPOSITION FOR M ATCHING L ARGE L IFE S CIENCE O NTOLOGIES A NIKA G ROSS, M ICHAEL H ARTUNG, T ORALF K IRSTEN, E RHARD R AHM 29 TH J ULY 2011, ICBO, B UFFALO
2 MeSH GALEN O NTOLOGIES Multiple interrelated ontologies in a domain (e.g. anatomy) UMLS SNOMED NCI Thesaurus Mouse Anatomy FMA Identify overlapping information between ontologies Create mappings
3 O NTOLOGY M ATCHING Manual creation of mappings between very large ontologies is too labor-intensive Semi-automatic generation of semantic correspondences (linguistic, structural, instance-based ontology matching) →Interrelation of ontologies →Integration of heterogeneous information sources Matching Mapping sim(O 1.a, O 2.b) = 0.8 sim(O 1.a, O 2.c) = 0.5 sim(O 1.c, O 2.c) = 1.0 further input, e.g. instances, dictionary … O1O1 O2O2
4 ? Indirect composition-based matching Via intermediate ontology (IO): important hub ontology, synonym dictionary, … C OMPOSING MA_ UBERON: NCI_C32239 Synonym: Atlas Name: atlas Name: C1 Vertebra Name: cervical vertebra 1Synonym: cervical vertebra 1 Synonym: C1 vertebra →Find new correspondences via composition →Reuse existing mappings to →Increase match quality →Save computation time IO O1O1 O2O2
5 Composition-based ontology matching approach, reuse of previously determined mappings →composeMatch Optional match step to improve composition-based match quality → extendMatch Use of ontology and mapping operators: compose, match and extract Evaluation: indirect matching of MA and NCIT using large intermediate ontologies (UMLS, FMA, Uberon, RadLex) C ONTRIBUTIONS
6 Use mappings to intermediate ontologies IO 1, …, IO k to indirectly match O 1 and O 2 Reduce matching effort by reusing mappings to IO → very fast composition I NDIRECT M ATCHING... IO 1 IO 2 IO k O1O1 O2O2... O1O1 O2O2 OnOn HO O new →IO should have a significant overlap with O 1 and O 2 →IO 1, …, IO k may complement each other →Centralized hub HO →many mappings to other ontologies →O new aligned with any O i via HO
7 O1O1 IO 1 O2O2 occ = 1: CM O1,O2 = {(a,a),(b,b),(c,c)} occ = 2: CM O1,O2 = {(a,a)} Input: Two ontologies O 1 and O 2, list of intermediate ontologies IO 1 … IO k, occurrence count occ Output: Composed mapping CM O1,O2 (1) C OMPOSE M ATCH a bc de a b gh a bc d f a ic IO 2 MapList empty for each IO i IO do M O1,IOi getMapping(O 1, IO i ) return merge (MapList, occ) MapList.add( compose (M O1,IOi, M IOi,O2 )) M IOi,O2 getMapping(IO i, O 2 ) end for MapList (c,c ), (a,a) (a,a), (b,b)
8 //Direct Mapping DM ΔO1ΔO2 match (ΔO 1, ΔO 2 ) EM O1,O2 (b,b) (a,a) (c,c) (d,d) EM O1,O2 merge ({CM O1,O2, DM ΔO1ΔO2 }, 1) return EM O1,O2 Input: Two ontologies O 1 and O 2, composed mapping CM O1,O2 Output: Extended Mapping EM O1,O2 (2) E XTEND M ATCH O1O1 O2O2 a bc de a bc d f ΔO 1 extract (O 1, CM O1,O2 ) ΔO 2 extract (O 2, inverse (CM O1,O2 )) CM O1,O2 (b,b) (a,a) (c,c) DM O1,O2 (d,d)
9 E VALUATION S ETUP Match problem Adult Mouse Anatomy (MA) NCI Thesaurus Anatomy part (NCIT) Preprocessing Normalization Linguistic Matcher (Name, synonyms, Trigram t = 0.8) Selection & Postprocessing Uberon UMLS MA NCIT RadLex FMA Gold standard ~1500 correspondences Precompute mappings using a match strategy ~5000 ~88,000 ~30,800 ~81,000 ~2,700~3,300 |concepts|
10 M APPING S TATISTICS 80% of MA48% of NCIT MA NCIT Uberon Is there a good coverage of MA and NCIT by intermediate ontologies? High overlap of covered MA and NCIT concepts → promising for composition-based match results
11 OAEI A NATOMY T RACK AgrMaker 87.7%
12 Direct match result compared to composeMatch via each hub Additional matching of unmatched parts (extendMatch) E VALUATION 88.2% 86% Uberon & UMLS → best evaluated intermediate ontologies Intermediate Ontology IO
13 Combination of the four composed mappings Correspondences have to occur in at least 1,…,4 mappings E VALUATION union(occ=1) F-Measure90.2 Precision92.7 Recall87.8 Higher occurrences → Recall ↓ extendMatch → Recall ↑
14 Composition-based approach for indirect matching of large life science ontologies (composeMatch, extendMatch) Reuse mappings for improved match efficiency and quality (>90%) Evaluated several intermediate ontologies →Uberon and UMLS: very effective, suited as hub ontologies in the anatomy domain C ONCLUSIONS
15 F UTURE W ORK Investigate composition-based ontology matching for further domains Study the impact of additional mappings Determined by structural matching Existing mappings from BioPortal
16 M APPING C OMPOSITION FOR M ATCHING L ARGE L IFE S CIENCE O NTOLOGIES ? F UNDING German Research Foundation Grant RA497/18-1 “E VOLUTION OF O NTOLOGIES AND M APPINGS ”
17 S. Zhang, O.Bodenreider "Alignment of multiple ontologies of anatomy: Deriving indirect mappings from direct mappings to a reference." AMIA Annual Symposium (2005) A. Tordai, A. Ghazvinian, J. van Ossenbruggen, M.A. Musen, N.F. Noy "Lost in Translation? Empirical Analysis of Mapping Compositions for Large Ontologies." Proc. Ontology Matching ISWC (2010) Our work: Use multiple complementing intermediate ontologies + an additional extendMatch to improve recall of compose R ELATED W ORK
18 NameSynonym Matcher B ACKUP Synonym1 Name Synonym1 Synonym2 Concept MA Concept NCIT sim = max(sim 1,sim 2,...)