Intelligent Database Systems Lab. Presenter: Chang, Chun-Chih. Authors: Christos Bouras, Vassilis Tsogkas. 2012, KBS. A clustering technique for news articles using WordNet.
Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments
Motivation: Document clustering is a powerful technique that has been widely used. However, it still suffers from problems such as synonymy, ambiguity, and the lack of descriptive content labels for the generated clusters.
Objectives: We propose enhancing the standard k-means algorithm with external knowledge from WordNet hypernyms. The proposed method significantly improves k-means and also generates useful, high-quality cluster labels.
Methodology - Framework
Methodology - Euclidean Distance & City-block Distance
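The formulas from this slide did not survive in the text, so the following is only a minimal sketch of the two measures, assuming the plain textbook definitions without any normalization. The function names and the sample vectors are illustrative and not taken from the paper.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences (assumed textbook form)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block_distance(x, y):
    """Sum of absolute coordinate differences (also called Manhattan distance)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical term-weight vectors for two articles.
doc_a = [3.0, 0.0, 1.0]
doc_b = [0.0, 4.0, 1.0]
print(euclidean_distance(doc_a, doc_b))   # 5.0
print(city_block_distance(doc_a, doc_b))  # 7.0
```

Euclidean distance penalizes one large difference more than several small ones, while city-block distance treats all coordinate differences linearly.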
Methodology - Pearson
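As a rough illustration, a Pearson-based distance is commonly taken as one minus the Pearson correlation coefficient. The sketch below assumes that convention; the slide's own formula is not available here, and the function name is hypothetical.

```python
import math


def pearson_distance(x, y):
    """1 - r, where r is the Pearson correlation (assumed convention)."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Perfectly correlated vectors give a distance near 0,
# perfectly anti-correlated vectors give a distance near 2.
print(pearson_distance([1, 2, 3], [2, 4, 6]))
print(pearson_distance([1, 2, 3], [3, 2, 1]))
```

Under this convention the distance ranges from 0 to 2 and ignores differences in scale and offset between the two vectors.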
Methodology - Cosine Distance
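A minimal sketch of cosine distance follows, assuming the common definition of one minus the cosine of the angle between two term-weight vectors. The function name and inputs are illustrative only.

```python
import math


def cosine_distance(x, y):
    """1 - cos(angle between x and y), assuming the usual definition."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


# Orthogonal vectors (no shared terms) are at distance 1;
# vectors pointing in the same direction are at distance close to 0.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))
```

Because only the angle matters, a long article and a short article with the same term proportions are treated as near-identical.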
Methodology - Spearman-rank Distance
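One way to sketch a Spearman-rank distance is to replace each value by its rank and then take one minus the Pearson correlation of the ranks. This is an assumed formulation, with ties given their average rank; the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_correlation(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def spearman_distance(x, y):
    """1 - Spearman rank correlation (assumed convention)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    return 1.0 - pearson_correlation(average_ranks(x), average_ranks(y))


# Any monotonically increasing relationship gives a distance near 0.
print(spearman_distance([1, 2, 3, 4], [1, 4, 9, 16]))
```

Working on ranks makes the measure insensitive to outliers in the raw term weights.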
Methodology - Kendall Distance
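A Kendall-based distance can be sketched as one minus Kendall's tau. The version below assumes the simple tau-a variant, which counts concordant and discordant pairs and makes no tie correction; the slide may have used a different variant.

```python
from itertools import combinations


def kendall_distance(x, y):
    """1 - tau-a, where tau-a = (concordant - discordant) / total pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in either vector count as neither (tau-a assumption).
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    return 1.0 - tau


# Identical orderings give 0.0; fully reversed orderings give 2.0.
print(kendall_distance([1, 2, 3], [1, 2, 3]))
print(kendall_distance([1, 2, 3], [3, 2, 1]))
```

The pairwise loop is quadratic in the vector length, which matters for large vocabularies.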
Methodology - Comparison of various methods (figure panels: Euclidean, City-block, Cosine, Kendall, Spearman, Pearson)
Methodology - Heuristic function. For example, for ‘fruit’: d=9, f=2, then W= . For ‘edible fruit’: d=7, f=1, then W=0.8915. For ‘food’: d=5, f=1, then W=0.6534.
Methodology - Enriching news articles using WordNet hypernyms
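The enrichment step can be pictured roughly as follows. This sketch makes several assumptions: the hypernym map is a hand-made toy table standing in for WordNet lookups, candidates are ranked by plain frequency instead of the heuristic weight W, and `top_k` is a hypothetical parameter.

```python
from collections import Counter

# Toy stand-in for WordNet hypernym lookups (not real WordNet output).
TOY_HYPERNYMS = {
    "apple": ["edible fruit", "fruit", "food"],
    "banana": ["edible fruit", "fruit", "food"],
    "bread": ["baked goods", "food"],
}


def enrich_keywords(keywords, hypernym_map, top_k=2):
    """Append the top_k most frequent hypernyms to an article's keywords."""
    counts = Counter()
    for word in keywords:
        counts.update(hypernym_map.get(word, []))
    # Sort by descending count, then alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [h for h, _ in ranked[:top_k] if h not in keywords]
    return list(keywords) + extra


print(enrich_keywords(["apple", "banana", "bread"], TOY_HYPERNYMS))
# ['apple', 'banana', 'bread', 'food', 'edible fruit']
```

Two articles that mention different fruits would then share the added hypernyms, which is how enrichment can reduce the synonymy problem named in the motivation.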
Methodology - Labeling clusters using WordNet hypernyms
Methodology - News article clustering using W-k means
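The details of W-k means are not reproduced in this text. As a point of reference, the sketch below shows only the standard k-means loop that it builds on, assuming Euclidean distance, article vectors that have already been enriched, and a deterministic initialization from the first k points chosen for reproducibility. All names are illustrative.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def k_means(points, k, max_iter=100):
    """Plain k-means; returns (cluster index per point, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: seed centroids with the first k points (deterministic).
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment, centroids


# Hypothetical 2-D article vectors forming two obvious groups.
points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
labels, centers = k_means(points, k=2)
print(labels)   # [0, 1, 0, 1]
print(centers)  # [[0.0, 0.5], [10.0, 10.5]]
```

Swapping `euclidean` for another distance function is the natural hook for comparing the measures listed on the earlier slides.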
Experiments
Experiments (figure: results with WordNet use vs. without WordNet use)
Conclusions: Of the many similarity measures tested, k-means with Euclidean and cosine distance produced the best results. We have also presented a novel algorithmic approach for enhancing the k-means algorithm using knowledge from an external database, WordNet.
Comments. Advantages: the resulting labels have high precision. Applications: news clustering, cluster labeling.