Intelligent Database Systems Lab
Computing semantic relatedness using Wikipedia features
Authors: Mohamed Ali Hadj Taieb*, Mohamed Ben Aouicha, Abdelmajid Ben Hamadou
KBS
Presenter: YAN-SHOU SIE
Outline
– Motivation
– Objectives
– Methodology
– Experiments
– Conclusions
– Comments
Motivation
– Measuring semantic relatedness is a critical task in many domains, such as psychology, biology, linguistics, cognitive science and artificial intelligence.
Objectives
– We propose a novel system for computing semantic relatedness between words.
– Recent approaches that exploit Wikipedia as a huge semantic resource have shown good performance.
Methodology
Our semantic relatedness computing system
– Filtering Wikipedia category graph
– Pre-processing
  » Filtering article content
  » Porter stemming
  » Weighting article stems
  » Providing a Category Semantic Depiction (CSD)
Methodology
Filtering Wikipedia category graph
[Figure: Different steps performed to generate the Category Semantic Depiction]
Methodology
Filtering Wikipedia category graph
– First: clean meta-categories
  » We remove all nodes whose labels contain any of the following strings: Wikipedia, wikiproject, lists, mediawiki, template, user, portal, categories, articles, pages, stub and album
– Second: remove orphan nodes, keeping only the category Contents as root
  » The maximum depth decreases from 291 to 221
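The two filtering steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `children` mapping (parent category to subcategories) is a hypothetical in-memory representation, matching is assumed to be case-insensitive, and "orphan" is read here as "not reachable from the root Contents".

```python
from collections import deque

# Substrings listed on the slide; matching is assumed to be case-insensitive.
META_STRINGS = ("wikipedia", "wikiproject", "lists", "mediawiki", "template",
                "user", "portal", "categories", "articles", "pages", "stub",
                "album")


def is_meta(label):
    """True if the category label contains any of the meta-category strings."""
    lowered = label.lower()
    return any(s in lowered for s in META_STRINGS)


def filter_category_graph(children, root="Contents"):
    """Return a cleaned copy of a parent -> subcategories mapping (hypothetical format)."""
    # First step: drop meta-categories, both as nodes and as edge targets.
    kept = {
        parent: [c for c in subs if not is_meta(c)]
        for parent, subs in children.items()
        if not is_meta(parent)
    }

    # Second step: keep only nodes reachable from the root (breadth-first search),
    # which discards orphan nodes and everything below them.
    reachable = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in kept.get(node, []):
            if child not in reachable:
                reachable.add(child)
                queue.append(child)

    return {node: list(kept.get(node, [])) for node in reachable}
```

On a toy graph, a category such as "Science stubs" is dropped in the first step, and a subtree that no longer hangs under Contents is dropped in the second.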
Methodology
Pre-processing
– Filtering article content
  » Remove HTML tags, infoboxes, language translations, hyperlinks...
– Porter stemming
  » A stop list is applied to eliminate words that contribute no meaning.
– Weighting article stems
– Providing a Category Semantic Depiction (CSD)
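A rough sketch of the first three pre-processing steps is given below. Several parts are assumptions made for illustration only: the stop list is a toy one, `toy_stem` is a crude stand-in for a real Porter stemmer, the markup patterns are simplified (no nested templates), and the weight is a plain relative term frequency because the slides do not state the weighting scheme.

```python
import re
from collections import Counter

# Toy stop list (assumption); a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "are"}


def strip_markup(text):
    """Remove {{...}} templates (e.g. infoboxes), HTML tags and link syntax."""
    text = re.sub(r"\{\{.*?\}\}", " ", text, flags=re.DOTALL)
    text = re.sub(r"<[^>]+>", " ", text)
    # Keep only the visible label of [[target|label]] or [[target]] links.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    return text


def toy_stem(word):
    """Placeholder for Porter stemming: strips a few common suffixes only."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def weighted_stems(text):
    """Map each stem to its relative frequency in the filtered article text."""
    tokens = re.findall(r"[a-z]+", strip_markup(text).lower())
    stems = [toy_stem(t) for t in tokens if t not in STOP_WORDS]
    counts = Counter(stems)
    total = sum(counts.values())
    return {stem: n / total for stem, n in counts.items()} if total else {}
```

In practice the placeholder stemmer would be replaced by an actual Porter stemmer implementation, and the weighting by whatever scheme the system uses.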
Methodology
Semantic relatedness computing system architecture
– Category extraction algorithm
– WordNet: resolving the disambiguation pages problem
  » Step 1: extract all outLinks
  » Step 2: find links containing a disambiguation tag in parentheses
  » Step 3: extract the categories of the first two links
  » Final: take the categories of the article assigned to the first link existing in the ordered set
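One possible reading of the steps above, as a small sketch. The inputs are hypothetical: `outlinks` is assumed to be the list of link titles of a disambiguation page in page order, and `categories_of` a mapping from article title to its categories. The role WordNet plays in the original system is not modeled here.

```python
import re

# A link title with a disambiguation tag in parentheses, e.g. "Mercury (planet)".
TAG_PATTERN = re.compile(r"^.+\s\((.+)\)$")


def categories_for_ambiguous_word(outlinks, categories_of):
    """Pick the categories of the first usable tagged link (illustrative only)."""
    # Step 1: `outlinks` already holds all outgoing links, in page order.
    # Step 2: keep the links that carry a disambiguation tag in parentheses.
    tagged = [link for link in outlinks if TAG_PATTERN.match(link)]
    # Step 3: consider only the first two tagged links.
    candidates = tagged[:2]
    # Final: return the categories of the first candidate that has an article.
    for link in candidates:
        if link in categories_of:
            return categories_of[link]
    return []
```

With toy data in which the first tagged link has no article, the function falls back to the second tagged link and returns its categories.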
Methodology
Semantic relatedness computing system architecture
– Semantic relatedness computing
Methodology
Evaluating semantic relatedness measures
– Comparison with human judgments
  » Pearson product-moment correlation coefficient
  » Spearman rank order correlation coefficient
– Datasets
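For reference, the two correlation coefficients can be computed as in the sketch below, which uses only the standard library. The function names are illustrative; Spearman is computed as the Pearson correlation of average ranks, so tied scores are handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A typical use is `spearman(human_scores, system_scores)`, where both lists hold one score per word pair in the same order. Both functions divide by zero if one list is constant, which a production version would need to guard against.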
Experiments
Our semantic relatedness computing system modules using Wikipedia features
– Basic system
– First module
– Second module
– Third module
– Fourth module
Experiments
Basic system
Experiments
First module: simple patterns
Experiments
Second module: Wikipedia pages
Experiments
Third module: enrichment using category neighbors in the WCG
Experiments
Fourth module: category enrichment using the WCG and redirects
Experiments
Application of the SR measure on other datasets
– Datasets RG-65 and MC-30
– The verbal dataset YP-130
– Solving word choice problems
Conclusions
– The resulting system performs well and sometimes outperforms the ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches.
Comments
– Advantages
  » Wikipedia supplies a large amount of semantic relationship information, which is of great help to related work on measuring semantic relations.
– Applications
  » Cognitive science
  » Artificial intelligence