Using Partial Reference Alignments to Align Ontologies

Presentation transcript:

Using Partial Reference Alignments to Align Ontologies
Patrick Lambrix, Qiang Liu
Linköpings Universitet

Ontology Alignment
Many ontologies have been developed. Many of them have overlapping information.
Use of multiple ontologies: it is important to know the inter-ontology relationships.
Many ontologies have already been developed. In the biomedical area, for example, a large number are available, such as GO and the ontologies in OBO, and many of them contain overlapping information. In practice we often want to use several ontologies together. First, applications may need ontologies from different areas, or from different views on one area. Second, in bottom-up creation of ontologies, builders may want to create a new ontology from existing ones or combine knowledge from small ontologies. In all these cases the ontologies used often overlap, so it is important to know the relations between terms from different ontologies.

Ontology Alignment
SIGNAL-ONTOLOGY (SigO)
Immune Response
  i- Allergic Response
  i- Antigen Processing and Presentation
  i- B Cell Activation
  i- B Cell Development
  i- Complement Signaling (synonym: complement activation)
  i- Cytokine Response
  i- Immune Suppression
  i- Inflammation
  i- Intestinal Immunity
  i- Leukotriene Response
    i- Leukotriene Metabolism
  i- Natural Killer Cell Response
  i- T Cell Activation
  i- T Cell Development
  i- T Cell Selection in Thymus
GENE ONTOLOGY (GO)
immune response
  i- acute-phase response
  i- anaphylaxis
  i- antigen presentation
  i- antigen processing
  i- cellular defense response
  i- cytokine metabolism
  i- cytokine biosynthesis (synonym: cytokine production)
  …
  p- regulation of cytokine biosynthesis
  i- B-cell activation
  i- B-cell differentiation
  i- B-cell proliferation
  i- T-cell activation
  i- activation of natural killer cell activity
Here is an example of ontology alignment. Both ontologies are about “immune response”. One is a small part of GO, which has become a standard in the biomedical area. The other is a small fragment of SigO.

Ontology Alignment (the same SigO and GO fragments, now annotated with equivalent concepts, equivalent relations and is-a relations)
Ontology alignment determines the correspondences between terms in different ontologies. We define the relations between the terms by specifying the equivalent concepts and equivalent relations; for example, “immune response” in GO is equivalent to “Immune Response” in SigO. New is-a relations between concepts are also introduced; for example, “antigen presentation” is-a “Antigen Processing and Presentation”.

Ontology Alignment Framework

Partial Reference Alignment
New settings for ontology alignment:
  Portals with mappings
  Iterative ontology alignment
  Anatomy track, task 4 in OAEI 2008
In all these cases some correct mappings between terms in different ontologies are given or have been obtained.
A partial reference alignment (PRA) is a subset of all correct mappings.

Partial Reference Alignment Research Problem: Can we use PRAs to obtain higher quality mapping suggestions in ontology alignment?

Partial Reference Alignment Research Problem: Can we use PRAs in the different parts of the framework to obtain higher quality mapping suggestions in ontology alignment?

Outline
Background and evaluation setup
  SAMBO and SAMBOdtf
  Test cases and evaluation measures
Algorithms and evaluations
  Use of PRA in the preprocessing step
  Use of PRA in the matcher
  Use of PRA in the filter step
  Influence of size of PRA
Conclusion and future work
Here is the outline. As you can see, I will divide my presentation into two parts.

SAMBO (1)
SAMBO (System for Aligning and Merging Biomedical Ontologies)
Phase I:
  Matchers
  Weighted sum combination of matcher results
  Single threshold filtering

SAMBO (2) Phase II:

SAMBOdtf (1)
What is SAMBOdtf? SAMBO with Double Threshold Filtering.
Observation: for single threshold filtering, the higher the threshold, the more often the suggestions are correct, but the fewer correct mappings are found.

SAMBOdtf (2)
Idea: use two thresholds.
(i) Pairs with a similarity value equal to or higher than the upper threshold are retained as mapping suggestions.
(ii) Pairs with a similarity value between the lower and upper thresholds are retained as suggestions only if they are ’reasonable’ with respect to the structure of the ontologies and the mapping suggestions retained in step (i). Otherwise they are discarded.
(iii) Pairs with a similarity value lower than the lower threshold are discarded.
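
The three cases above can be sketched in a few lines of Python. This is only an illustration, not the SAMBOdtf implementation: the function name `double_threshold_filter` and the `is_reasonable` callback are hypothetical, and the structural check itself is left to the caller because the slide does not spell it out at this point.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (term in ontology 1, term in ontology 2)


def double_threshold_filter(
    similarities: Dict[Pair, float],
    lower: float,
    upper: float,
    is_reasonable: Callable[[Pair, List[Pair]], bool],
) -> List[Pair]:
    """Hypothetical sketch of double threshold filtering."""
    # (i) Pairs at or above the upper threshold are always retained.
    retained = [p for p, s in similarities.items() if s >= upper]
    # (ii) Pairs between the thresholds are kept only if the caller-supplied
    # structural check accepts them, given the pairs retained in step (i).
    middle = [p for p, s in similarities.items() if lower <= s < upper]
    accepted = [p for p in middle if is_reasonable(p, retained)]
    # (iii) Pairs below the lower threshold are never collected, so they
    # are discarded implicitly.
    return retained + accepted
```

Passing the check as a callback keeps the threshold logic separate from whatever structural test is used for the pairs in the middle band.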

SAMBOdtf (3)
1. Given two ontologies.
2. Calculate similarity values between their concepts.
3. Use the suggestions above the upper threshold to partition the ontologies into mappable groups, using is-a. (For mapping suggestions (A,A’) and (B,B’): A is-a B iff A’ is-a B’.)
4. The final mapping suggestions consist of 1) pairs with a similarity value above the upper threshold and 2) pairs of concepts with a similarity value between the two thresholds for which the concepts belong to related mappable groups.

SAMBOdtf (4)
Sometimes the suggestions with similarity values higher than or equal to the upper threshold do not satisfy the structural requirement. In that case, we need to find a consistent group, in which for each pair of suggestions (A, A’) and (B, B’): A is-a B iff A’ is-a B’.
Example (consistent group): 5 is-a 2, but not C is-a B.
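
The consistency condition can be checked pair by pair. The sketch below is an assumption-laden illustration: the slides do not say how the consistent group is searched for, so a simple greedy pass is used here, and the is-a relations are assumed to be given as sets of (descendant, ancestor) pairs that are already transitively closed. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (term in ontology 1, term in ontology 2)


def consistent(m1: Pair, m2: Pair, isa1: Set[Pair], isa2: Set[Pair]) -> bool:
    """True if A is-a B iff A' is-a B', checked in both directions."""
    (a, a2), (b, b2) = m1, m2
    forward = ((a, b) in isa1) == ((a2, b2) in isa2)
    backward = ((b, a) in isa1) == ((b2, a2) in isa2)
    return forward and backward


def greedy_consistent_group(
    mappings: List[Pair], isa1: Set[Pair], isa2: Set[Pair]
) -> List[Pair]:
    """Greedy pass (an assumption, not the method from the slides):
    add a mapping only if it is consistent with everything kept so far."""
    group: List[Pair] = []
    for m in mappings:
        if all(consistent(m, g, isa1, isa2) for g in group):
            group.append(m)
    return group
```

With the situation from the example, where 5 is-a 2 holds in one ontology but C is-a B does not hold in the other, the mappings (2, B) and (5, C) cannot both be in the group, so the greedy pass keeps whichever comes first.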

Baseline Systems (SAMBO and SAMBOdtf for OAEI 2008)
Removal of Phase II: no user involvement. As there is no user to choose between different suggestions regarding a specific term, a term appears in at most one mapping suggestion.
Matchers:
  TermWN: string matching with WordNet
  UMLSKSearch: uses UMLS
Combination: maximum-based strategy
Filters: single/double threshold filtering

Test cases
Behavior, Defense: Gene Ontology – Signal Ontology
Nose, Ear, Eye: Adult Mouse Anatomy – MeSH
Anatomy: Adult Mouse Anatomy – NCI anatomy

Evaluation
Precision: number of correct suggestions divided by number of suggestions
Recall: number of correct suggestions divided by number of correct mappings
Recall-PRA: number of correct suggestions not in PRA divided by number of correct mappings not in PRA
F-measure: harmonic mean of precision and recall
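
The four measures translate directly into set operations. A minimal sketch follows; the function name `evaluate` and the convention of returning 0.0 when a denominator is empty are assumptions made for the illustration.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def evaluate(
    suggestions: Iterable[Pair], reference: Iterable[Pair], pra: Iterable[Pair]
) -> Dict[str, float]:
    """Compute precision, recall, recall-PRA and F-measure from sets of mappings."""
    suggestions, reference, pra = set(suggestions), set(reference), set(pra)
    correct = suggestions & reference  # correct suggestions

    precision = len(correct) / len(suggestions) if suggestions else 0.0
    recall = len(correct) / len(reference) if reference else 0.0

    # Recall-PRA ignores the mappings that were already given in the PRA.
    remaining = reference - pra
    recall_pra = len(correct - pra) / len(remaining) if remaining else 0.0

    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "recall_pra": recall_pra,
        "f_measure": f_measure,
    }
```

Recall-PRA is the measure that shows whether a strategy finds anything beyond what the PRA already contained.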

Algorithms

1. Use of PRA in the preprocessing step

Use of PRA in the preprocessing step
Intuition: during the preprocessing step, use the mappings in the PRA to partition the ontologies into mappable groups.
Methods: mgPRA, mgfPRA

Use of PRA in the preprocessing step
mgPRA (Mappable Groups with PRA)
Strategy:
  Find a consistent group in the PRA.
  Partition the ontologies into mappable groups before aligning.
Example:

Use of PRA in the preprocessing step Partition Results

Use of PRA in the preprocessing step
mgfPRA (Mappable Groups and Fixing with PRA)
Strategy:
  ‘Fix’ the missing structural relationships, making the whole PRA a consistent group.
  Then partition the ontologies into mappable groups.
Example:

Use of PRA in the preprocessing step Partition Results

Use of PRA in the preprocessing step
Result Analysis
For threshold 0.4, there are no conclusive results. (Threshold 0.4 means flexible string matching.)
For thresholds 0.6 and 0.8, mgPRA and mgfPRA almost always have equal or higher precision than SAMBO.
mgPRA almost always has equal or higher recall than SAMBO.
mgfPRA almost always has equal or lower recall than SAMBO and mgPRA.
Similarity values do not change (i.e. they are the same in SAMBO as in mgPRA/mgfPRA), so recall is not expected to be raised, while precision is expected to be raised.

Use of PRA in the preprocessing step Why does mgfPRA perform worse than mgPRA? Incorrect use of the structural relation. For instance, in dataset nose, one source ontology uses the structural relation to define both is-a and part-of. ‘Fixing’ the ontology may therefore be wrong. For instance, the mapping (nose, nose) may lead to introducing is-a relations between nose and its parts.

2. Use of PRA in the matcher

Use of PRA in a matcher
Observation: some correct mappings share a similar linguistic pattern.
Examples from the PRA of Anatomy:
  (lumbar vertebra 5, l5 vertebra) and (thoracic vertebra 11, t11 vertebra)
  (forebrain, fore brain) and (gallbladder, gall bladder)
  (stomach body, body stomach) and (stomach fundus, fundus stomach)
The linguistic similarity vectors for (lumbar vertebra 5, l5 vertebra) and (thoracic vertebra 11, t11 vertebra) are similar.

Use of PRA in a matcher
pmPRA (Pattern Matcher with PRA)
Intuition: mapping suggestions with a linguistic similarity vector close to the linguistic similarity vector of a PRA mapping are more likely to be correct suggestions.
Strategy:
  Compute a linguistic similarity vector for each PRA mapping.
  For each mapping suggestion, augment its similarity value according to the number of PRA mappings within its neighborhood.
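
A rough sketch of this strategy is given below. The slides do not define the components of the linguistic similarity vector, the size of the neighborhood, or the augmentation formula, so everything specific here is an assumption: the vectors are taken as given, the neighborhood is a Euclidean ball of radius `radius`, and each PRA neighbor adds a fixed `boost`, capped at 1.0. The name `pm_pra` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]
Vector = Sequence[float]


def pm_pra(
    suggestions: Dict[Pair, Tuple[float, Vector]],
    pra_vectors: List[Vector],
    radius: float = 0.1,   # assumed neighborhood size
    boost: float = 0.05,   # assumed increment per PRA neighbor
) -> Dict[Pair, float]:
    """Raise the similarity of suggestions whose vector lies near PRA vectors."""
    augmented: Dict[Pair, float] = {}
    for pair, (similarity, vector) in suggestions.items():
        # Count PRA mappings whose vector falls inside the neighborhood.
        neighbors = sum(
            1 for pv in pra_vectors if math.dist(vector, pv) <= radius
        )
        # Additive boost, capped so the result stays a valid similarity.
        augmented[pair] = min(1.0, similarity + boost * neighbors)
    return augmented
```

A suggestion with no PRA mapping in its neighborhood keeps its original similarity value, so the matcher can only add suggestions above a threshold, never remove them.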

Use of PRA in a matcher
Result Analysis
For the small datasets, the correct suggested mappings already had high similarity values, and the missed correct mappings had no shared linguistic pattern with PRA mappings.
For the Anatomy dataset, pmPRA has lower or equal precision. Recall increased for high thresholds and decreased for low thresholds. New correct mappings were found. For low thresholds, new wrong mappings were also found.

3. Use of PRA in the filter step

Use of PRA in the filter step
fPRA (Filter with PRA)
Strategy: implant the PRA mappings in the final result. Any suggestion that contradicts a PRA mapping is filtered out.
dtfPRA (Double Threshold Filter with PRA)
Similar to SAMBOdtf: use a consistent group in the PRA to filter the suggestions between the upper and lower thresholds.
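
fPRA can be illustrated as follows. The slide does not define what counts as a contradiction; this sketch assumes, in line with the baseline rule that a term appears in at most one mapping suggestion, that a suggestion contradicts the PRA when it reuses a term that a PRA mapping already maps. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (term in ontology 1, term in ontology 2)


def filter_with_pra(suggestions: Iterable[Pair], pra: Iterable[Pair]) -> List[Pair]:
    """Implant PRA mappings and drop suggestions that clash with them."""
    pra_set = set(pra)
    mapped_sources = {a for a, _ in pra_set}
    mapped_targets = {b for _, b in pra_set}

    kept = [
        s
        for s in suggestions
        # Assumed contradiction test: the suggestion reuses a term that
        # the PRA already maps. Suggestions that are themselves PRA
        # mappings are skipped here and re-added below.
        if s not in pra_set
        and s[0] not in mapped_sources
        and s[1] not in mapped_targets
    ]
    # PRA mappings are always part of the final result.
    return sorted(pra_set) + kept
```

Because the PRA mappings are correct by definition, implanting them cannot lower precision or recall, which fits the observation that fPRA is never worse than SAMBO.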

Use of PRA in the filter step
pfPRA (Pattern Filter with PRA)
Strategy:
  Cluster all suggestions according to their linguistic similarity vectors using the expectation-maximization algorithm.
  Assign every PRA mapping to the cluster with the nearest cluster center.

Use of PRA in the filter step
Strategy (continued):
  For each cluster, calculate the average distance (AvgDis) of the PRA mappings to their cluster center.
  Finally, only suggestions whose distance to the cluster center is smaller than or equal to AvgDis are kept. All others are discarded.
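
The filtering part of pfPRA can be sketched once the cluster centers are known. The sketch below makes several assumptions that are not in the slides: the expectation-maximization clustering is taken to have run already and is represented only by its `centers`; each suggestion is attached to its nearest center rather than to its EM cluster assignment; distances are Euclidean; and suggestions in a cluster that received no PRA mapping are discarded because AvgDis is undefined there. All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]
Vector = Sequence[float]


def nearest(vector: Vector, centers: List[Vector]) -> int:
    """Index of the closest cluster center."""
    return min(range(len(centers)), key=lambda i: math.dist(vector, centers[i]))


def pattern_filter(
    suggestions: Dict[Pair, Vector],
    pra_vectors: List[Vector],
    centers: List[Vector],
) -> List[Pair]:
    # Assign every PRA mapping to the cluster with the nearest center and
    # record its distance to that center.
    distances = defaultdict(list)
    for pv in pra_vectors:
        c = nearest(pv, centers)
        distances[c].append(math.dist(pv, centers[c]))

    # AvgDis per cluster: average distance of its PRA mappings to the center.
    avg_dis = {c: sum(d) / len(d) for c, d in distances.items()}

    kept: List[Pair] = []
    for pair, vector in suggestions.items():
        c = nearest(vector, centers)
        # Keep only suggestions at most AvgDis away from their center.
        # Clusters without PRA mappings keep nothing (an assumption).
        if c in avg_dis and math.dist(vector, centers[c]) <= avg_dis[c]:
            kept.append(pair)
    return kept
```

The filter only ever removes suggestions, which fits the reported pattern of higher precision and lower or equal recall.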

Use of PRA in the filter step (1) Result Analysis fPRA always has equal or higher precision and recall than SAMBO. pfPRA always has equal or higher precision than fPRA and SAMBO. pfPRA always has equal or lower recall than SAMBO. Some correct suggestions are filtered out because they have no similar linguistic pattern to PRA mappings.

Use of PRA in the filter step (2) Result Analysis dtfPRA always has equal or higher recall than SAMBOdtf. For lower threshold 0.6, dtfPRA always has equal or higher precision than SAMBOdtf. For lower threshold 0.4, dtfPRA always has equal or higher precision than SAMBOdtf, except for dataset ear and eye. For dataset ear and eye, the consistent group of dtfPRA is much smaller than the consistent group of SAMBOdtf.

4. Influence of size of PRA

Use of PRA-Full vs PRA-Half
Result Analysis
For a larger PRA:
  For all strategies, the recall is higher.
  For the preprocessing strategies and pmPRA: when the threshold is low, the precision is lower; when the threshold is high, the precision is higher.
  For the filtering strategies, the precision is always equal or higher.
For a larger PRA, more constraints need to be satisfied and thus fewer suggestions are generated (both correct and wrong).

Lessons learned
PRA in preprocessing leads to fewer suggestions, in most cases to an improvement in precision and in some cases to an improvement in recall.
Use the linguistic pattern matcher mainly to find new suggestions.
Always use the filter with PRA. The other filter approaches work well when the structure of the source ontologies is well-defined and complete.
The difference between the PRA-based algorithms and SAMBO/SAMBOdtf is not large:
  SAMBO/SAMBOdtf already do well on the test cases.
  Anatomy case: all new correct mappings are non-trivial.

Future Work Improve current strategies, and test on other ontologies. Investigate combinations and interactions of these strategies. Develop an iterative ontology alignment framework.