Published by Stuart Barrett. Modified over 9 years ago.
1
Intelligent Database Systems Lab
Computing semantic relatedness using Wikipedia features
Authors: Mohamed Ali Hadj Taieb*, Mohamed Ben Aouicha, Abdelmajid Ben Hamadou
2013. KBS
Presenter: YAN-SHOU SIE
2
Outline
– Motivation
– Objectives
– Methodology
– Experiments
– Conclusions
– Comments
3
Motivation
Measuring semantic relatedness is a critical task in many domains, such as psychology, biology, linguistics, cognitive science and artificial intelligence.
4
Objectives
– Recent approaches have exploited Wikipedia as a large semantic resource and have shown good performance.
– We propose a novel system for computing semantic relatedness between words.
5
Methodology
Our semantic relatedness computing system
– Filtering the Wikipedia category graph
– Pre-processing
» Filtering article content
» Porter stemming
» Weighting article stems
» Providing a Category Semantic Depiction (CSD)
6
Methodology
Filtering Wikipedia category graph
[Figure: Different steps performed to generate the Category Semantic Depiction]
7
Methodology
Filtering Wikipedia category graph
– First: clean meta-categories
» We remove all nodes whose labels contain any of the following strings: Wikipedia, wikiproject, lists, mediawiki, template, user, portal, categories, articles, pages, stub and album
– Second: remove orphan nodes, keeping only the category Contents as root
» The maximum depth goes from 291 to 221
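The two filtering steps can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the graph is assumed to be a plain dictionary from a category label to its child labels, the match is assumed to be case-insensitive, and `filter_category_graph` and `is_meta` are hypothetical names.

```python
from collections import deque

# Strings that mark meta-categories, taken from the slide.
META_STRINGS = (
    "wikipedia", "wikiproject", "lists", "mediawiki", "template", "user",
    "portal", "categories", "articles", "pages", "stub", "album",
)


def is_meta(label):
    """True if the label contains any meta string (case-insensitive: an assumption)."""
    lowered = label.lower()
    return any(s in lowered for s in META_STRINGS)


def filter_category_graph(children, root="Contents"):
    """Keep only non-meta categories that are reachable from the root.

    children: dict mapping a category label to a list of child labels
    (an assumed representation of the category graph).
    """
    kept = {}
    queue = deque([root])
    seen = {root}
    while queue:
        node = queue.popleft()
        kept[node] = []
        for child in children.get(node, []):
            if is_meta(child):
                continue  # first step: drop meta-categories
            kept[node].append(child)
            if child not in seen:
                seen.add(child)
                queue.append(child)
    # Second step: nodes never reached from the root (orphans) are absent from kept.
    return kept
```

A breadth-first walk from Contents handles both steps at once: meta-categories are skipped when encountered, and anything that is no longer connected to the root is never visited.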
8
Methodology
Pre-processing
– Filtering article content
» Remove HTML tags, infoboxes, language translations, hyperlinks...
– Porter stemming
» A stop list is applied to eliminate words that make no contribution.
– Weighting article stems
– Providing a Category Semantic Depiction (CSD)
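A rough sketch of the pre-processing chain follows. Several parts are assumptions made only for illustration: the stop list is a tiny sample, `toy_stem` is a placeholder suffix stripper and not the Porter algorithm (a real Porter stemmer, such as `nltk.stem.PorterStemmer`, could be swapped in), wiki links are reduced to their visible label, and the weight is a plain relative frequency because the slides do not state the weighting scheme.

```python
import re
from collections import Counter

# Illustrative stop list only; the stop list used by the authors is not given.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "is", "in", "to"})


def strip_markup(text):
    """Remove HTML tags and reduce [[target|label]] wiki links to their label."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)


def toy_stem(word):
    """Placeholder stemmer (NOT Porter): strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def weight_stems(text):
    """Return stem -> relative frequency (an assumed weighting scheme)."""
    tokens = re.findall(r"[a-z]+", strip_markup(text).lower())
    stems = [toy_stem(t) for t in tokens if t not in STOP_WORDS]
    counts = Counter(stems)
    total = sum(counts.values())
    return {stem: n / total for stem, n in counts.items()} if total else {}
```

For instance, `weight_stems("<b>Cats</b> and the [[Felidae|cat]] family")` yields weights for the two stems `cat` and `family` only.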
9
Methodology
Semantic relatedness computing system architecture
– Extraction categories algorithm
» WordNet: resolve the disambiguation pages problem
» Step 1: extract all outLinks
» Step 2: find links containing a disambiguation tag in parentheses
» Step 3: extract the categories of the first two links
» Final: take the categories of the article assigned to the first link that exists in the ordered set
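One possible reading of these steps, sketched as code. The slide is terse, so this is an interpretation rather than the authors' algorithm: the out-links are assumed to arrive as an ordered list of titles, `article_categories` is a hypothetical title-to-categories lookup, and the role WordNet plays is not modelled.

```python
import re

# Titles carrying a parenthetical disambiguation tag, e.g. "Jaguar (animal)".
TAGGED_TITLE = re.compile(r"^.+\s\((.+)\)$")


def categories_for_disambiguation(out_links, article_categories, max_links=2):
    """Pick categories for a word whose page is a disambiguation page.

    out_links: ordered titles linked from the disambiguation page (step 1).
    article_categories: dict title -> list of categories (hypothetical lookup).
    """
    # Step 2: keep only links with a disambiguation tag in parentheses.
    tagged = [title for title in out_links if TAGGED_TITLE.match(title)]
    # Step 3: restrict attention to the first two tagged links.
    candidates = tagged[:max_links]
    # Final: use the first candidate that actually exists as an article.
    for title in candidates:
        if title in article_categories:
            return article_categories[title]
    return []
```

Returning an empty list when no candidate exists is a design choice of this sketch; the slides do not say what happens in that case.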
10
Methodology
Semantic relatedness computing system architecture
– Semantic relatedness computing
11
Methodology
Evaluating semantic relatedness measures
– Comparison with human judgments
» Pearson product-moment correlation coefficient
» Spearman rank order correlation coefficient
– Datasets
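Both coefficients compare a list of system scores with the matching list of human ratings. The sketch below uses the standard definitions and only the standard library. The function names are illustrative, and giving tied values their average rank is a common convention assumed here, not something stated in the slides.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank order correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson is sensitive to the actual score values, while Spearman only looks at how the two lists order the word pairs, which is why relatedness measures are often reported with both.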
12
Experiments
Our semantic relatedness computing system modules using Wikipedia features
– Basic system
– First module
– Second module
– Third module
– Fourth module
13
Experiments
Basic system
14
Experiments
First module: simple patterns
15
Experiments
Second module: Wikipedia pages
16
Experiments
Third module: enrichment using category neighbors in the WCG (Wikipedia category graph)
17
Experiments
Fourth module: category enrichment using the WCG and redirects
18
Experiments
Application of the SR measure on other datasets
– Datasets RG-65 and MC-30
– The verbal dataset YP-130
Solving word choice problems
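A word choice problem gives a target word and several candidates, and a relatedness measure can answer it by picking the candidate with the highest score. The sketch below assumes that setup; `solve_word_choice` is a hypothetical helper and `relatedness` stands for any scoring function, not specifically the system described in these slides.

```python
def solve_word_choice(target, candidates, relatedness):
    """Return the candidate most related to the target word.

    relatedness: any callable (word_a, word_b) -> float; assumed here.
    Ties go to the earliest candidate, because max() keeps the first maximum.
    """
    return max(candidates, key=lambda candidate: relatedness(target, candidate))
```

Passing the scoring function as an argument keeps the selection step independent of how relatedness is computed, so different measures can be compared on the same questions.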
19
Conclusions
The resulting system performs well and sometimes outperforms the ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches.
20
Comments
Advantages
– Wikipedia provides a large amount of semantic relationship information, which is of great help to related work on measuring semantic relations.
Applications
– cognitive science
– artificial intelligence