Published by Sydney Porter. Modified over 9 years ago.
1
Intelligent Database Systems Lab
Presenter: BEI-YI JIANG
Authors: UNIVERSITÉ CATHOLIQUE DE LOUVAIN, BELGIUM
2012. ASSOCIATION FOR COMPUTING MACHINERY
A Study of Hybrid Similarity Measures for Semantic Relation Extraction
2
Outline
– Motivation
– Objectives
– Methodology
– Experiments
– Conclusions
– Comments
3
Motivation
– The quality of the relations provided by existing extractors is still lower than that of manually constructed relations.
– Most studies still do not take the whole range of existing measures into account and combine different methods only sporadically.
4
Objectives
– To develop new relation extraction methods.
– The approach is a systematic analysis of 16 baseline measures and of their combinations, using 8 fusion methods and 3 techniques for selecting the combination set.
5
Methodology
[Figure: overview of the method. Recoverable labels: norm function, similarity scores, knn function.]
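The figure on this slide did not survive, so only its labels (norm function, similarity scores, knn function) are known. A minimal sketch of how such a pipeline could fit together is given here. It assumes min-max scaling as the norm function and a "keep the k most similar terms" rule as the knn function. Both choices, the function names, and the toy scores are illustrative assumptions, not the authors' definitions.

```python
def normalize(matrix):
    """Min-max rescale every score of a square similarity matrix into [0, 1].

    Assumption: the slide only says "norm function"; min-max is one option.
    """
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def knn_relations(terms, matrix, k):
    """For each term, keep its k most similar other terms as relations."""
    relations = set()
    for i, term in enumerate(terms):
        neighbours = sorted(
            (j for j in range(len(terms)) if j != i),
            key=lambda j: matrix[i][j],
            reverse=True,
        )[:k]
        for j in neighbours:
            relations.add((term, terms[j]))
    return relations


# Hypothetical toy input: four terms and made-up similarity scores.
terms = ["car", "vehicle", "banana", "fruit"]
scores = [
    [10.0, 8.0, 1.0, 2.0],
    [8.0, 10.0, 1.0, 3.0],
    [1.0, 1.0, 10.0, 9.0],
    [2.0, 3.0, 9.0, 10.0],
]
extracted = knn_relations(terms, normalize(scores), k=1)
```

With k=1 each term contributes exactly one directed pair, so the toy input yields four relations.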
6
Methodology - Single Similarity Measures
Measures Based on a Semantic Network (5)
– exploit the lengths of the shortest paths between terms in a network
– use the probability of terms derived from a corpus
– Wu and Palmer, Leacock and Chodorow, Resnik, Jiang and Conrath, and Lin
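As an illustration of a path-based network measure, the sketch below computes the Wu and Palmer score, 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common ancestor of the two terms. The taxonomy is a hypothetical toy hierarchy, not WordNet, and the root is assumed to have depth 1.

```python
# Hypothetical toy taxonomy: each node maps to its parent (None for the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("no common ancestor")


def wu_palmer(a, b):
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

In this toy hierarchy "dog" and "cat" share the ancestor "animal" and therefore score higher than "dog" and "oak", which meet only at the root.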
7
Methodology - Single Similarity Measures
Web-based Measures (3)
– Web search engines
– rely on the number of times the terms co-occur in the documents
– Normalized Google Distance (NGD)
– Measures of Semantic Relatedness (MSR)
– YAHOO!, BING, GOOGLE over the domain wikipedia.org
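The co-occurrence idea behind NGD can be sketched from hit counts alone. The function below follows the standard Normalized Google Distance formula. The argument names and the example counts are hypothetical, and the code does not query any search engine.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from document counts.

    fx, fy : number of documents containing term x / term y
    fxy    : number of documents containing both terms
    n      : total number of indexed documents
    Smaller values mean the terms are more closely related.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # terms never (co-)occur: treat as maximally distant
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Made-up counts, for illustration only.
always_together = ngd(1000, 1000, 1000, 10**6)
rarely_together = ngd(1000, 1000, 10, 10**6)
```

NGD is a distance, so it has to be inverted or otherwise transformed before it can be fused with similarity scores; the slides do not say which transformation was used.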
8
Methodology - Single Similarity Measures
Corpus-based Measures (5)
– Distributional Measures
  › Bag-of-words Distributional Analysis (BDA)
  › Syntactic Distributional Analysis (SDA)
– Pattern-based Measure
  › PatternWiki
– Other Corpus-based Measures
  › Latent Semantic Analysis (LSA)
  › Normalized Google Distance (NGD)
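A bag-of-words distributional measure can be sketched as follows: each term is represented by counts of the words around it, and two terms are compared through their count vectors. The window size, the use of cosine similarity, and the three-sentence corpus are assumptions made for illustration; the slide does not specify them.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each token to a Counter of the words seen within `window` positions."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            vectors.setdefault(token, Counter()).update(left + right)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]
vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
```

Here "cat" and "dog" share two context words ("the", "drinks") while "cat" and "car" share only one, so the first pair scores higher.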
9
Methodology - Single Similarity Measures
Definition-based Measures (3)
– WktWiki
– Gloss Vectors
– Extended Lesk
10
Methodology - Hybrid Similarity Measures
Combination Methods
– Input: a set of similarity matrices {S1, ..., SK} produced by K single measures
– Output: a combined similarity matrix Scmb
  › 1. Mean
  › 2. Mean-Nnz
  › 3. Mean-Zscore
  › 4. Median
  › 5. Max
  › 6. Rank Fusion
  › 7. Relation Fusion
  › 8. Logit
11
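Methodology - Hybrid Similarity Measures
Combination Methods
– Mean. A mean of K pairwise similarity scores: $S_{cmb} = \frac{1}{K} \sum_{k=1}^{K} S_k$
– Mean-Nnz. A mean of those pairwise similarity scores which have a non-zero value: $s_{ij}^{cmb} = \frac{1}{|\{k : s_{ij}^{k} \neq 0\}|} \sum_{k=1}^{K} s_{ij}^{k}$, where $s_{ij}^{k}$ is the score of the k-th measure for the term pair (i, j)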
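The two definitions above differ only in the denominator, which a small sketch makes concrete. The function names and the three 2x2 matrices are hypothetical; matrices are assumed to be square and of equal size.

```python
def mean_fusion(matrices):
    """Entry-wise mean over all K similarity matrices."""
    k = len(matrices)
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


def mean_nnz_fusion(matrices):
    """Entry-wise mean over only those measures that give a non-zero score."""
    n = len(matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            nonzero = [m[i][j] for m in matrices if m[i][j] != 0]
            if nonzero:  # stays 0.0 when every measure returns zero
                combined[i][j] = sum(nonzero) / len(nonzero)
    return combined


# Made-up scores: the second measure has no information about the pair (0, 1).
s1 = [[1.0, 0.6], [0.6, 1.0]]
s2 = [[1.0, 0.0], [0.0, 1.0]]
s3 = [[1.0, 0.3], [0.3, 1.0]]
mean_scores = mean_fusion([s1, s2, s3])
nnz_scores = mean_nnz_fusion([s1, s2, s3])
```

For the pair (0, 1) the plain mean divides 0.9 by 3, while Mean-Nnz divides it by 2, so a measure that simply lacks coverage does not drag the combined score down.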
12
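Methodology - Hybrid Similarity Measures
Combination Methods
– Mean-Zscore. A mean of K similarity scores transformed into Z-scores: $S_{cmb} = \frac{1}{K} \sum_{k=1}^{K} \frac{S_k - \mu_k}{\sigma_k}$, where $\mu_k$ and $\sigma_k$ are the mean and standard deviation of the scores of the k-th measure
– Median. A median of K pairwise similarities: $s_{ij}^{cmb} = \mathrm{median}(s_{ij}^{1}, \dots, s_{ij}^{K})$, where $s_{ij}^{k}$ is the score of the k-th measure for the term pair (i, j)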
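Z-scoring matters when the single measures live on different scales. The sketch below standardizes each matrix over all of its entries before averaging, and also shows the entry-wise median. Using the population standard deviation is an assumption, as are the function names and the toy matrices.

```python
import statistics


def zscore(matrix):
    """Standardize all entries of one matrix: (value - mean) / std.

    Assumption: population standard deviation over every entry of the matrix.
    """
    flat = [v for row in matrix for v in row]
    mu = statistics.fmean(flat)
    sigma = statistics.pstdev(flat)
    if sigma == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - mu) / sigma for v in row] for row in matrix]


def mean_zscore_fusion(matrices):
    """Entry-wise mean of the standardized matrices."""
    standardized = [zscore(m) for m in matrices]
    k, n = len(matrices), len(matrices[0])
    return [[sum(z[i][j] for z in standardized) / k for j in range(n)]
            for i in range(n)]


def median_fusion(matrices):
    """Entry-wise median over the K matrices."""
    n = len(matrices[0])
    return [[statistics.median(m[i][j] for m in matrices) for j in range(n)]
            for i in range(n)]


# Made-up scores on three very different scales.
s1 = [[1.0, 0.2], [0.2, 1.0]]
s2 = [[100.0, 50.0], [50.0, 100.0]]
s3 = [[1.0, 0.9], [0.9, 1.0]]
z_scores = mean_zscore_fusion([s1, s2, s3])
med_scores = median_fusion([s1, s2, s3])
```

After standardization all three toy measures agree that the off-diagonal pair sits one standard deviation below the mean, even though their raw values (0.2, 50, 0.9) are not comparable.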
13
Methodology - Hybrid Similarity Measures
Combination Methods
– Max. A maximum of K pairwise similarities: $s_{ij}^{cmb} = \max(s_{ij}^{1}, \dots, s_{ij}^{K})$, where $s_{ij}^{k}$ is the score of the k-th measure for the term pair (i, j)
– Rank Fusion.
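Max fusion follows directly from its definition. The slide gives no formula for Rank Fusion, so the second function below is only one plausible reading, stated as an assumption: each measure ranks the neighbours of a term (rank 1 is the most similar), and the combined value is the mean rank, where lower means closer. All names and scores are hypothetical.

```python
def max_fusion(matrices):
    """Entry-wise maximum over the K similarity matrices."""
    n = len(matrices[0])
    return [[max(m[i][j] for m in matrices) for j in range(n)]
            for i in range(n)]


def row_ranks(matrix):
    """Rank the other terms in each row; rank 1 is the most similar.

    The diagonal (a term compared with itself) is left at rank 0.
    """
    n = len(matrix)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: matrix[i][j], reverse=True)
        for position, j in enumerate(order, start=1):
            ranks[i][j] = position
    return ranks


def rank_fusion(matrices):
    """Assumed reading of Rank Fusion: mean of per-measure neighbour ranks."""
    ranked = [row_ranks(m) for m in matrices]
    k, n = len(matrices), len(matrices[0])
    return [[sum(r[i][j] for r in ranked) / k for j in range(n)]
            for i in range(n)]


# Made-up scores for three terms and two measures.
s1 = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]]
s2 = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.2], [0.5, 0.2, 1.0]]
max_scores = max_fusion([s1, s2])
mean_ranks = rank_fusion([s1, s2])
```

Working on ranks instead of raw scores sidesteps the scale problem entirely, at the cost of discarding how large the gaps between neighbours are.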
14
Methodology - Hybrid Similarity Measures
Combination Methods
– Relation Fusion.
– Logit.
15
Methodology - Hybrid Similarity Measures
Combination Sets
– Expert choice of measures
– Forward stepwise procedure
– Logistic regression
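The slides name a forward stepwise procedure without describing it. A generic greedy version, offered here as an assumption and not as the authors' exact procedure, starts from an empty set and repeatedly adds the measure that most improves an evaluation score, stopping when nothing helps. The `score` callback, the measure names, and the gain values are all hypothetical.

```python
def forward_stepwise(candidates, score):
    """Greedy forward selection.

    candidates : list of measure names
    score      : callable taking a list of names and returning a float
                 (for example, extraction quality of their fused measure)
    Returns the selected names and the score they reach.
    """
    selected = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        trial_scores = {c: score(selected + [c]) for c in remaining}
        winner = max(trial_scores, key=trial_scores.get)
        if trial_scores[winner] <= best:
            break  # no remaining candidate improves the current set
        selected.append(winner)
        remaining.remove(winner)
        best = trial_scores[winner]
    return selected, best


# Made-up scoring function: each measure adds a fixed gain, each extra
# measure costs 0.1. These numbers are illustrative, not experimental results.
TOY_GAIN = {"m1": 0.5, "m2": 0.3, "m3": 0.05}


def toy_score(subset):
    return sum(TOY_GAIN[name] for name in subset) - 0.1 * len(subset)


chosen, chosen_score = forward_stepwise(["m1", "m2", "m3"], toy_score)
```

On the toy scorer the procedure picks m1, then m2, and stops because adding m3 would lower the score.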
16
Experiments
Evaluation
– Human Judgements Datasets
  › MC, RG, WordSim353
– Semantic Relations Datasets
  › BLESS, SN
17
Experiments
18
Experiments
19
Conclusions
– The results have shown that the hybrid measures outperform the single measures on all datasets.
– A combination of 15 baseline corpus-, web-, network-, and dictionary-based measures with Logistic Regression provided the best results.
20
Comments
Advantages
– higher performance
Applications