Published by Penelope Wilkerson. Modified over 8 years ago.
1
Intelligent Database Systems Lab
Semantic Smoothing for Text Clustering
Presenter: BEI-YI JIANG
Authors: JAMAL A. NASIR, IRAKLIS VARLAMIS, ASIM KARIM, GEORGE TSATSARONIS
2013. KNOWLEDGE-BASED SYSTEMS
2
Outline
– Motivation
– Objectives
– Methodology
– Experiments
– Conclusions
– Comments
3
Motivation
The Vector Space Model (VSM) assumes independence between the vocabulary terms and ignores all the conceptual relations that potentially exist between them.
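The limitation can be seen on a toy example. The sketch below is not from the slides: the vocabulary, the documents, and the helper names `cosine` and `to_vector` are all hypothetical. It uses plain term counts and shows that two documents using different words for the same concept get a VSM similarity of zero.

```python
import math

# Hypothetical toy vocabulary; each term is its own independent dimension in the VSM.
VOCABULARY = ["car", "automobile", "engine", "banana"]


def to_vector(tokens):
    """Represent a document as raw term counts over VOCABULARY."""
    return [tokens.count(term) for term in VOCABULARY]


def cosine(u, v):
    """Cosine similarity between two term vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two documents that share the literal term "engine".
sim_shared_term = cosine(to_vector(["car", "engine"]),
                         to_vector(["automobile", "engine"]))

# Two documents about the same concept that share no literal term.
sim_synonyms = cosine(to_vector(["car"]), to_vector(["automobile"]))

print(sim_shared_term, sim_synonyms)
```

Because "car" and "automobile" occupy orthogonal axes, the second pair scores zero even though the terms are near-synonyms.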
4
Objectives
To increase the importance of core words by considering the relations between terms, and in parallel to reduce the contribution of general terms, leading to better text clustering results.
5
Methodology
1. Document representations
1.1 The Vector Space Model (VSM)
1.2 The Generalized Vector Space Model (GVSM)
2. Relatedness measure
2.1 Omiotis
2.2 Wikipedia-based relatedness
2.3 Average of Omiotis and Wikipedia-based relatedness
2.4 Pointwise mutual information
3. Document clustering
3.1 Clustering algorithms
3.2 Algorithms complexity
3.3 Clustering criterion functions
4. A GVSM-based semantic kernel: S-VSM
5. Top-k S-VSM
6
Methodology
The Vector Space Model (VSM)
7
Methodology
The Generalized Vector Space Model (GVSM)
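The formulas on this slide did not survive extraction. As a rough illustration only, the general GVSM idea is to replace the plain inner product with one weighted by a term-to-term relatedness matrix, so that related terms contribute to document similarity. In the sketch below the matrix values, the vocabulary order, and the function names `gvsm_inner` and `gvsm_cosine` are assumptions chosen for the example, not values from the paper.

```python
import math


def gvsm_inner(u, v, relatedness):
    """Inner product where term i of u interacts with term j of v,
    weighted by relatedness[i][j]."""
    return sum(u[i] * relatedness[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))


def gvsm_cosine(u, v, relatedness):
    """Cosine-style normalisation of the relatedness-weighted inner product."""
    denom = math.sqrt(gvsm_inner(u, u, relatedness) * gvsm_inner(v, v, relatedness))
    return gvsm_inner(u, v, relatedness) / denom if denom > 0 else 0.0


# Assumed vocabulary order: ["car", "automobile", "banana"].
# Hypothetical relatedness scores in [0, 1]; the diagonal is 1.
RELATEDNESS = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
IDENTITY = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_car = [1, 0, 0]
doc_automobile = [0, 1, 0]

sim_gvsm = gvsm_cosine(doc_car, doc_automobile, RELATEDNESS)
sim_vsm = gvsm_cosine(doc_car, doc_automobile, IDENTITY)
print(sim_gvsm, sim_vsm)
```

With the identity matrix the measure reduces to the ordinary VSM cosine, which is why the second value is zero.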
8
Methodology
– Omiotis
– Wikipedia-based relatedness
– Average of Omiotis and Wikipedia-based relatedness
9
Methodology
Pointwise mutual information
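The slide's formula is missing, so the following is a sketch of the standard definition, PMI(a, b) = log2(P(a, b) / (P(a) P(b))). It assumes probabilities are estimated from document-level co-occurrence; the exact estimation used in the paper may differ. The corpus and the function name `pmi` are hypothetical.

```python
import math


def pmi(term_a, term_b, documents):
    """Pointwise mutual information of two terms, estimated from the
    fraction of documents containing each term and both terms."""
    n = len(documents)
    count_a = sum(1 for doc in documents if term_a in doc)
    count_b = sum(1 for doc in documents if term_b in doc)
    count_ab = sum(1 for doc in documents if term_a in doc and term_b in doc)
    if count_ab == 0:
        # The terms never co-occur, so the log of zero is taken as -infinity.
        return float("-inf")
    p_a, p_b, p_ab = count_a / n, count_b / n, count_ab / n
    return math.log2(p_ab / (p_a * p_b))


# Hypothetical corpus: each document is a set of terms.
CORPUS = [
    {"car", "engine"},
    {"car", "engine", "road"},
    {"banana", "fruit"},
    {"banana", "road"},
]

pmi_car_engine = pmi("car", "engine", CORPUS)   # co-occur more often than chance
pmi_car_road = pmi("car", "road", CORPUS)       # co-occur exactly at chance level
pmi_car_banana = pmi("car", "banana", CORPUS)   # never co-occur
print(pmi_car_engine, pmi_car_road, pmi_car_banana)
```

Positive values indicate terms that co-occur more often than independence would predict, which is what makes PMI usable as a corpus-based relatedness score.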
14
Experiments
15
Experiments
– Vector similarity
– Evaluation measures
16
Experiments
Evaluation measures
– Purity
– Entropy
– Error rate
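The definitions behind these measures are not recoverable from the slide. The sketch below uses the commonly cited forms of purity and entropy for comparing clusters against gold class labels, with entropy normalised by the logarithm of the number of classes; that normalisation is an assumption. Error rate is left out because its definition is not given here. All function names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def group_labels(cluster_ids, class_labels):
    """Collect the gold class labels that fall into each cluster."""
    clusters = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, class_labels):
        clusters[cluster_id].append(label)
    return clusters


def purity(cluster_ids, class_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    clusters = group_labels(cluster_ids, class_labels)
    majority_total = sum(max(Counter(labels).values())
                         for labels in clusters.values())
    return majority_total / len(class_labels)


def entropy(cluster_ids, class_labels):
    """Size-weighted average of per-cluster class entropy, scaled to [0, 1]."""
    num_classes = len(set(class_labels))
    if num_classes < 2:
        return 0.0
    clusters = group_labels(cluster_ids, class_labels)
    total = len(class_labels)
    result = 0.0
    for labels in clusters.values():
        size = len(labels)
        cluster_entropy = -sum((count / size) * math.log(count / size)
                               for count in Counter(labels).values())
        result += (size / total) * cluster_entropy / math.log(num_classes)
    return result


# Hypothetical clustering of six documents into two clusters.
cluster_ids = [0, 0, 0, 1, 1, 1]
class_labels = ["a", "a", "b", "b", "b", "b"]

purity_score = purity(cluster_ids, class_labels)
entropy_score = entropy(cluster_ids, class_labels)
print(purity_score, entropy_score)
```

Higher purity and lower entropy both indicate clusters that align more closely with the gold classes.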
24
Conclusions
The evaluation results demonstrated that S-VSM outperforms VSM in most of the combinations and compares favorably to GVSM. To further reduce the complexity of S-VSM, the authors introduced an extension of it, the top-k S-VSM.
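The slides do not spell out how top-k S-VSM is built. One plausible reading, shown here purely as an assumption, is that each term keeps only its k most related terms and every other entry of the relatedness matrix is set to zero, which makes the matrix sparse and cheaper to apply. The function name `top_k_relatedness` and the matrix values are hypothetical.

```python
def top_k_relatedness(relatedness, k):
    """Return a copy of the relatedness matrix where each row keeps its
    diagonal entry and its k largest off-diagonal entries; the rest become 0."""
    pruned = []
    for i, row in enumerate(relatedness):
        # Rank the other terms by how related they are to term i.
        others = sorted((j for j in range(len(row)) if j != i),
                        key=lambda j: row[j], reverse=True)
        keep = set(others[:k]) | {i}
        pruned.append([row[j] if j in keep else 0.0 for j in range(len(row))])
    return pruned


# Hypothetical relatedness scores for a four-term vocabulary.
RELATEDNESS = [
    [1.0, 0.9, 0.4, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.4, 0.3, 1.0, 0.0],
    [0.1, 0.2, 0.0, 1.0],
]

pruned = top_k_relatedness(RELATEDNESS, k=1)
for row in pruned:
    print(row)
```

Pruning row by row can leave the matrix asymmetric, so a real implementation would need to decide whether to symmetrise it afterwards.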
25
Comments
Advantages
– It offers a very flexible kernel that can be applied within any domain or with any language.
– The S-VSM performs much better than the VSM in the task of text clustering.
– It is very efficient in terms of time and space complexity.
Applications
– Text clustering
– Semantic smoothing kernels