Published by Joella Preston. Modified over 9 years ago.
1
Speeding Up Batch Alignment of Large Ontologies Using MapReduce
Uthayasanker Thayasivam and Prashant Doshi
Dept. of Computer Science, University of Georgia
2
Introduction
Ontology: formalizes the knowledge of a domain by defining concepts and the properties that relate them
3
Introduction: Ontology Alignment
6
Problem Definition: Ontology Alignment
The ontology alignment problem: find a set of correspondences between two ontologies, O1 and O2.
7
Ontology Alignment Challenges
- Improving the alignment quality: structural & lexical disparity
- Improving the alignment efficiency: quickly producing a quality alignment
- Improving the scalability: ontology sizes (efficiency / quality), resources (efficiency / quality)
8
Space of Alignments
Evaluating an alignment: match matrix over the Cartesian product of entities (rows x1, ..., x|V1|; columns y1, ..., y|V2|)

          y1        y2        ...   y|V2|
x1        m11       m12       ...   m1|V2|
x2        m21       m22       ...   m2|V2|
...       ...       ...       ...   ...
x|V1|     m|V1|1    m|V1|2    ...   m|V1||V2|

Alignment space size: many-to-many, one-to-many, one-to-one
9
Space of Alignments
The match matrix of the previous slide, shown as a bipartite graph between the entities x1, ..., x|V1| and y1, ..., y|V2|
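The size of the alignment space can be illustrated with a small calculation. This is a sketch under stated assumptions: the size formulas on the slide did not survive extraction, so the two counts below are standard combinatorics (every cell of a binary match matrix on or off; partial one-to-one pairings), not necessarily the expressions the authors showed. The function names are hypothetical.

```python
from math import comb, factorial


def many_to_many_space(n1: int, n2: int) -> int:
    """Count binary n1 x n2 match matrices: each of the n1*n2 cells is 0 or 1."""
    return 2 ** (n1 * n2)


def one_to_one_space(n1: int, n2: int) -> int:
    """Count partial one-to-one alignments.

    Assumption: an entity may stay unmatched. For each k, pick k entities on
    each side and pair them in k! ways.
    """
    return sum(
        comb(n1, k) * comb(n2, k) * factorial(k)
        for k in range(min(n1, n2) + 1)
    )


if __name__ == "__main__":
    for n1, n2 in [(2, 2), (5, 5), (10, 10)]:
        print(n1, n2, many_to_many_space(n1, n2), one_to_one_space(n1, n2))
```

Even for ten entities per ontology the many-to-many space has 2^100 candidates, which is why exhaustive search over alignments is not an option for large ontologies.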
10
Large Ontology Matching
- Reduction of alignment space
  - Early pruning of dissimilar element pairs: aflood (Hanif and Masaki ‘09)
  - Partition based matching: Falcon-AO (Jian et al. ‘05)
- Parallel matching
  - MapPSO (Bock and Hettenhausen ‘10)
  - VDoc+ (Zhang ‘12)
[Figure: O1 and O2 with partitions P11, P12, P13 and P21, P22, P23; 4 blocks]
11
Batch Alignment of Large Ontologies
- Scalability is challenging
  - OAEI 2012, Very Large Biomedical Ontology Track: 8 out of 21 tools completed
- Ontology repositories (e.g., NCBO at Stanford)
  - Batch alignment of ontologies
  - New ontologies posted
  - Ontologies get updated
- Approach allows any alignment algorithm to be utilized on a MapReduce architecture
12
Contributions: Batch Alignment of Large Ontologies
- General & novel approach to speed up batch alignment of large ontologies using MapReduce
- No impact on alignment quality for some algorithms
- Benefits ontology repositories
13
MapReduce Framework
14
[Figure: MapReduce data flow of Key -> Value pairs to output values]
Key identifies a subproblem
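The key -> value flow can be sketched with a tiny in-memory stand-in for MapReduce. This is not Hadoop and not the authors' code: `run_mapreduce`, `mapper`, `reducer`, and the record layout are hypothetical names chosen for illustration. The only idea taken from the slide is that the key identifies a subproblem, so all values sharing a key reach the same reduce call.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Map every record, group the emitted values by key, then reduce per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": collect values per key
    return {key: reducer(key, values) for key, values in groups.items()}


def mapper(record):
    """Assumed record layout: (subproblem_key, entity_name)."""
    subproblem_key, entity = record
    yield subproblem_key, entity


def reducer(key, values):
    """Placeholder reduce step: gather the entities of one subproblem."""
    return sorted(values)


if __name__ == "__main__":
    records = [("p1", "Author"), ("p2", "Paper"), ("p1", "Writer")]
    print(run_mapreduce(records, mapper, reducer))
```

In a real deployment the reduce step would run an alignment algorithm on the grouped subproblem instead of sorting names.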
15
MapReduce Framework
[Figure: ontologies O1 and O2 with parts O11, O21, O31 and O12, O22]
19
Mapper & Reducer Algorithms
20
Identifying Alignment Subproblems
- Approach: Hamdi et al. 2010
- Identify anchors: entity pairs with identical names or labels
- Cluster concepts around the anchors, using structural neighborhood
- Entities from one cluster are predominantly in correspondence with entities in one other cluster
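The anchor and clustering steps can be sketched as follows. This is a simplified stand-in, not the algorithm of Hamdi et al.: it assumes each ontology is given as a dict of entity id -> label plus an adjacency dict, treats labels as identical after lowercasing and trimming, and grows clusters by breadth-first search from the anchors. All names and data layouts are hypothetical.

```python
from collections import deque


def find_anchors(labels1, labels2):
    """Return entity pairs (e1, e2) whose normalized labels are identical."""
    index = {}
    for e2, label in labels2.items():
        index.setdefault(label.strip().lower(), []).append(e2)
    return [
        (e1, e2)
        for e1, label in labels1.items()
        for e2 in index.get(label.strip().lower(), [])
    ]


def cluster_around(seeds, neighbors):
    """Assign each reachable entity to the seed that reaches it first (BFS)."""
    owner = {seed: seed for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in owner:
                owner[nxt] = owner[node]
                queue.append(nxt)
    return owner


if __name__ == "__main__":
    labels1 = {"a1": "Person", "a2": "Paper", "a3": "Reviewer"}
    labels2 = {"b1": "person", "b2": "Article"}
    neighbors1 = {"a1": ["a3"], "a3": ["a1"], "a2": []}
    anchors = find_anchors(labels1, labels2)
    print(anchors)
    print(cluster_around([e1 for e1, _ in anchors], neighbors1))
```

Clusters that grow from the two sides of the same anchor then form one alignment subproblem; entities not reachable from any anchor (here `a2`) would need a separate policy, which this sketch leaves open.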
21
Merging Subproblem Alignments
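The slide text does not say how the subproblem alignments are merged, so the following is only one plausible sketch. It assumes each partial alignment is a dict mapping an entity pair to a confidence value, takes the union of all correspondences, and keeps the highest confidence when the same pair appears in more than one subproblem. `merge_alignments` is a hypothetical name.

```python
def merge_alignments(partials):
    """Union partial alignments; on duplicate pairs keep the highest confidence."""
    merged = {}
    for partial in partials:
        for pair, confidence in partial.items():
            if confidence > merged.get(pair, float("-inf")):
                merged[pair] = confidence
    return merged


if __name__ == "__main__":
    partials = [
        {("a", "x"): 0.9},
        {("a", "x"): 0.7, ("b", "y"): 0.8},
    ]
    print(merge_alignments(partials))
```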
22
Performance Evaluation
- Datasets
  - Conference track from OAEI (120 pairs)
  - Large ontologies from OAEI (SNOMED, NCI,... 5 pairs)
  - New biomedical ontology testbed (50 pairs from NCBO)
- Algorithms: Falcon-AO, Optima+, LogMap, YAM++
- Compare F-measure & runtime
  - Default setup on a single node
  - MapReduce setup using Hadoop (12 nodes each with 24 2GB & 2GHz Intel Xeon processors)
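For reference, precision, recall, and F-measure of an alignment against a reference alignment can be computed as below. This is the standard definition, shown as a minimal sketch; representing an alignment as a set of entity pairs and the function name `precision_recall_f` are assumptions made for illustration.

```python
def precision_recall_f(found, reference):
    """Return (precision, recall, F-measure) of `found` against `reference`."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    found = {("a", "x"), ("b", "y"), ("c", "z")}
    reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
    print(precision_recall_f(found, reference))
```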
23
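Results – 3 Datasets
Speedup per algorithm; columns: Conference, Large OAEI, Biomedical [column boundaries lost in extraction, digits shown as extracted]
Falcon    2155
LogMap    9165
Optima+   1164110
YAM++     4227
[Figure panels: Conference, Large OAEI, Biomedical]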
24
Results – Large OAEI ontologies
Conference track: no partitioning, no change in output

P = precision, R = recall, F = F-measure (all in %)

Ontology pair   MapRed./Def.   MapReduce   MapRed./Def.   MapReduce   Default     Default
                Falcon-AO      LogMap      Optima+        YAM++       LogMap      YAM++
                P  R  F        P  R  F     P  R  F        P  R  F     P  R  F     P  R  F
mouse, human    73 74 73       96 75 84    78 73 76       95 77 85    92 85 88    94 86 90
STW, TheSoz     57 50 53       57 51 54    18 40 25       55 52 53    69 64 67    60 75 66
fma, nci        95 81 88       95 83 89    96 83 89       97 84 90    95 86 90    98 85 91
fma, snomed     85 63 72       85 63 72    84 61 71       86 63 73    97 66 78    97 70 81
snomed, nci     69 58 63       67 58 62    70 58 63       71 58 64    90 64 75    95 60 74

Other datasets
- LogMap & YAM++: tradeoff is in the alignment quality
- Falcon-AO & Optima+: no change in output
25
Speedup with # of nodes in the Hadoop cluster
26
Discussion
- First inter-matcher parallelization approach, especially using MapReduce
- Exhibits significant speedup for batch alignment
  - Some algorithms may see a small reduction in alignment quality due to the partitioning
- Significant speedup for a single ontology pair: Falcon-AO, Optima+ & YAM++
- Any alignment algorithm can fit in our framework
27
Thank you
Questions?
28
Parallel Alignment of Large Ontologies on a Computing Cluster
- Current divide-and-conquer approaches
  - Heavily rely on structure
  - Size-based partitioning techniques are not effective
- Current parallel matching algorithms
  - Parallelize the process within the algorithms
  - Do not support a multi-node cluster architecture