1 A New Evolving Tree for Text Document Clustering and Visualization
Wui Lee Chang (1), Kai Meng Tay (1*), Chee Peng Lim (2)
(1) Faculty of Engineering, Universiti Malaysia Sarawak, Malaysia
(2) Centre for Intelligent Systems Research, Deakin University, Australia
(*) kmtay@feng.unimas.my
WSC 17 (2012)

2 Presentation Outline
– Introduction
– Problem Statements
– Motivations and Objectives
– Preliminary: Evolving Tree
– A General Application Framework for Evolving Systems
– The Proposed Procedure
– Experimental Results
– Concluding Remarks

3 Introduction: Clustering
Clustering groups a set of data into clusters according to their similarity. Examples are the Self-Organizing Map (SOM), k-means and fuzzy c-means.
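To make "grouping by similarity" concrete, here is a minimal sketch of k-means, one of the algorithms named above, in plain Python. The function name `kmeans`, the initialisation (the first k points) and the fixed iteration count are illustrative choices, not taken from the slides.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Lloyd-style k-means; centroids start at the first k points (an illustrative choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


data = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
centroids, labels = kmeans(data, k=2)
print(labels)  # [0, 0, 1, 1]
```

Note that k must be fixed in advance; choosing it up front is the same kind of difficulty as choosing a SOM map size before learning.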

4 Introduction: Textual Document Clustering
Textual document clustering groups a set of text documents according to their similarity, which eases information retrieval. Examples:
– naive Bayes-based document classification [21],
– WEBSOM [22], and
– support vector machine-based classification of imbalanced text documents [23].
[21] Lewis, D.: Naïve Bayes at forty: The independence assumption in information retrieval. In: ECML (1998)
[22] Azcarraga, A.P., Yap, T.J., Tan, J., Chua, T.S.: Evaluating keyword selection methods for WEBSOM text archives. IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 3, pp. 380–383 (2004)
[23] Liu, T., Loh, H.T., Sun, A.: Imbalanced text classification: A term weighting approach. Expert Systems with Applications, vol. 36, pp. 690–701 (2009)

5 Problem Statement 1: Traditional textual document clustering uses off-line learning.
– Weakness: the model must be re-trained whenever a new document is fed in.
– An adaptive or evolving model can be an alternative to these traditional methods.
– An evolving model increases learning flexibility.
– WEBSOM, for instance, focuses on off-line learning.

6 Problem Statement 2: For SOM (or WEBSOM),
– it is difficult to determine the map size before learning [19];
– the map size also affects the learning time [19].
[19] Pakkanen, J., Iivarinen, J., Oja, E.: The Evolving Tree – Analysis and Applications. IEEE Transactions on Neural Networks, vol. 17, no. 3, pp. 591–603 (2006)

7 Motivations and Objectives
– To construct an adaptive textual document clustering tool based on the Evolving Tree (ETree).
– To apply a general application framework for evolving systems [24].
– To analyze the adaptive behavior of the proposed method using articles from UNIMAS ENCON 2008.
[24] Lughofer, E.: Evolving Fuzzy Systems – Methodologies, Advanced Concepts and Applications. Ed. 1, Springer (2011)

8 Preliminary: Evolving Tree (ETree)
– ETree forms a tree structure that contains a root node, trunk nodes and leaf nodes.
– The root node is the first node created in the tree.
– Trunk nodes connect the root to the leaf nodes.
– Leaf nodes are the clusters formed.
– The tree can expand hierarchically, so its size scales with the data.
– The hierarchical structure helps keep computational complexity under control.
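The node roles above map naturally onto a small recursive data structure. The sketch below is our own illustration, not the authors' code: the class `Node` and the helpers `leaves` and `depth` are hypothetical names, and the per-node `hits` counter is included because ETree-style growth counts how often a leaf wins.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity equality: two nodes with equal weights are still distinct
class Node:
    """One ETree node; 'weight' is its prototype vector, 'hits' counts BMU wins."""
    weight: List[float]
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)  # avoid recursive repr
    hits: int = 0

    @property
    def is_leaf(self) -> bool:
        return not self.children

    def add_child(self, weight: List[float]) -> "Node":
        child = Node(weight=list(weight), parent=self)
        self.children.append(child)
        return child


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes, i.e. the clusters."""
    if node.is_leaf:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def depth(node: Node) -> int:
    """Number of layers from this node down to its deepest leaf."""
    return 1 + max((depth(c) for c in node.children), default=0)


# Root (layer 1) -> two trunk nodes (layer 2) -> four leaf nodes (layer 3).
root = Node(weight=[0.5, 0.5])
for w in ([0.2, 0.2], [0.8, 0.8]):
    trunk = root.add_child(w)
    trunk.add_child([w[0] - 0.1, w[1]])
    trunk.add_child([w[0] + 0.1, w[1]])

print(len(leaves(root)), depth(root))  # 4 3
```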

9 Preliminary: Evolving Tree (ETree)

10 Preliminary: Evolving Tree (ETree): The Learning Algorithm
– Finding the best-matching unit (BMU).
– Updating the leaf nodes.
– Expanding the tree.

11 Preliminary: Evolving Tree (ETree): Finding the BMU
[Figure: search for the BMU down the tree depth, Layers 1 to 4]
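Finding the BMU in a tree can be sketched as a greedy top-down search: starting at the root, move to the child whose weight vector is closest to the input, layer by layer, until a leaf is reached. This minimal sketch assumes Euclidean distance (text applications often use cosine similarity instead); `Node` and `find_bmu` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    weight: List[float]
    children: List["Node"] = field(default_factory=list)


def find_bmu(root: Node, x: List[float]) -> Node:
    """Greedy descent: at each layer follow the child nearest to x until a leaf is reached."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: math.dist(c.weight, x))
    return node


root = Node([0.5, 0.5], [
    Node([0.2, 0.2], [Node([0.1, 0.2]), Node([0.3, 0.2])]),
    Node([0.8, 0.8], [Node([0.7, 0.8]), Node([0.9, 0.8])]),
])
bmu = find_bmu(root, [0.75, 0.8])
print(bmu.weight)  # [0.7, 0.8]
```

Because only one path is followed, the number of comparisons grows with the tree depth rather than with the number of leaves. The trade-off is that the leaf found is not guaranteed to be the globally nearest one.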

12 Preliminary: Evolving Tree (ETree): Updating Leaf Nodes

13 [Figure: the BMU and nodes labelled 1, 2 and 3]
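The leaf update can be sketched with the SOM-style rule used by ETree [19], as we understand it: every leaf i moves towards the input x by w_i <- w_i + alpha * exp(-d(c, i)^2 / (2 sigma^2)) * (x - w_i), where d(c, i) is the tree distance between the BMU c and leaf i. Taking d as the number of edges on the path between the two leaves, and keeping alpha and sigma fixed instead of decaying over time, are simplifying assumptions here; all names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality, so 'in' and 'index' compare node objects
class Node:
    weight: List[float]
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, weight: List[float]) -> "Node":
        child = Node(list(weight), parent=self)
        self.children.append(child)
        return child


def path_to_root(node: Optional[Node]) -> List[Node]:
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def tree_distance(a: Node, b: Node) -> int:
    """Edges on the path between a and b, via their lowest common ancestor."""
    ancestors_a = path_to_root(a)
    for steps_b, node in enumerate(path_to_root(b)):
        if node in ancestors_a:
            return ancestors_a.index(node) + steps_b
    raise ValueError("nodes are not in the same tree")


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]


def update_leaves(root: Node, bmu: Node, x: List[float], alpha: float = 0.5, sigma: float = 1.0) -> None:
    """Move every leaf towards x, weighted by a Gaussian of its tree distance to the BMU."""
    for leaf in leaves(root):
        d = tree_distance(bmu, leaf)
        h = alpha * math.exp(-d * d / (2 * sigma * sigma))
        leaf.weight = [w + h * (xi - w) for w, xi in zip(leaf.weight, x)]


root = Node([0.5])
a, b = root.add([0.0]), root.add([1.0])
update_leaves(root, bmu=a, x=[2.0])
print(a.weight, round(b.weight[0], 4))  # [1.0] 1.0677
```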

14 A General Application framework for Evolving Systems [24] [24] Lughofer, E.: Evolving Fuzzy Systems – Methodologies, Advanced Concepts and Applications. Ed.1, Springer (2011)

15 The Proposed Procedure
[Figure: flowchart with the stages "Fetching on-line article", "Updating terms of articles", "ETree" and "Refining trained model"]
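The flowchart suggests that, as each on-line article is fetched, the set of terms is updated before the trained ETree is refined. One way to do this without re-learning from scratch is to grow the vocabulary incrementally and zero-pad existing prototype vectors for newly seen terms. This is our assumption; the slide gives no details, and `Vocabulary` and `pad` are hypothetical names.

```python
from typing import Dict, List


class Vocabulary:
    """Hypothetical incremental term index: each new term gets the next free column."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}

    def update(self, terms: List[str]) -> List[str]:
        """Register unseen terms; return the ones that were new."""
        new_terms = []
        for term in terms:
            if term not in self.index:
                self.index[term] = len(self.index)
                new_terms.append(term)
        return new_terms

    def vectorize(self, terms: List[str]) -> List[float]:
        """Raw term-frequency vector over the current vocabulary."""
        vec = [0.0] * len(self.index)
        for term in terms:
            if term in self.index:
                vec[self.index[term]] += 1.0
        return vec


def pad(vector: List[float], size: int) -> List[float]:
    """Extend an existing prototype with zeros for terms it has never seen."""
    return vector + [0.0] * (size - len(vector))


vocab = Vocabulary()
vocab.update(["evolving", "tree"])
prototype = vocab.vectorize(["evolving", "tree", "tree"])  # [1.0, 2.0]
vocab.update(["clustering", "tree"])                       # only "clustering" is new
prototype = pad(prototype, len(vocab.index))               # [1.0, 2.0, 0.0]
doc = vocab.vectorize(["clustering", "tree"])              # [0.0, 1.0, 1.0]
```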

16 The Proposed Procedure: Preprocessing Text
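Text preprocessing typically means lowercasing, tokenizing, removing stop words and reducing words to a common stem. The slide does not list the exact steps, so this is a minimal sketch under that assumption. The tiny stop-word list and the `crude_stem` suffix stripper are illustrative stand-ins (a real system would use a full stop list and a proper stemmer such as Porter's).

```python
import re
from typing import List

# A tiny illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def crude_stem(token: str) -> str:
    """Very rough suffix stripping, standing in for a proper stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    tokens = re.findall(r"[a-z]+", text.lower())         # lowercase, keep alphabetic runs
    tokens = [t for t in tokens if t not in STOP_WORDS]  # drop stop words
    return [crude_stem(t) for t in tokens]               # reduce to (rough) stems


print(preprocess("Clustering of Text Documents is evolving."))
# ['cluster', 'text', 'document', 'evolv']
```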

17 The Proposed Procedure: Term Weighting
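The slide does not state which weighting scheme is used. A common choice is tf-idf, where a term's count in a document is scaled by log(N / df), with N the number of documents and df the number of documents containing the term. The sketch below assumes that scheme purely for illustration; `tf_idf` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight = (count of term in doc) * log(N / number of docs containing the term)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return weighted


docs = [["evolv", "tree", "tree"], ["tree", "cluster"], ["cluster", "text"]]
weights = tf_idf(docs)
```

A term that occurs in every document receives weight 0 under this scheme, since it does not help to tell documents apart.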

18 The Proposed Procedure: Similarity Match Histogram
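The slide gives no formula for the similarity match histogram. One plausible reading, offered only as an assumption, is to summarize a cluster's pairwise cosine similarities as a histogram and to track the share of pairs at or above a threshold. A new article could then be matched to a cluster if it does not lower that share much. The bin count, the threshold and the helper names below are arbitrary illustrative choices.

```python
import math
from itertools import combinations
from typing import Dict, List

Vec = Dict[str, float]  # sparse term -> weight vector


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def similarity_histogram(docs: List[Vec], bins: int = 4) -> List[int]:
    """Count pairwise similarities (in [0, 1] for non-negative weights) per equal-width bin."""
    hist = [0] * bins
    for a, b in combinations(docs, 2):
        s = cosine(a, b)
        hist[min(int(s * bins), bins - 1)] += 1  # a similarity of 1.0 goes into the last bin
    return hist


def histogram_ratio(docs: List[Vec], threshold: float = 0.5) -> float:
    """Fraction of document pairs whose similarity is at least 'threshold'."""
    sims = [cosine(a, b) for a, b in combinations(docs, 2)]
    return sum(s >= threshold for s in sims) / len(sims) if sims else 1.0


cluster = [{"tree": 1.0, "evolv": 1.0}, {"tree": 1.0}, {"text": 1.0}]
print(similarity_histogram(cluster))  # [2, 0, 1, 0]
```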

19

20 The Proposed Procedure: Expanding Tree
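The slide title gives no details of the proposed expansion step. As a reference point, the sketch below follows the splitting idea of the original ETree [19] as we understand it. Each time a leaf is the BMU, its hit counter is incremented. Once the counter reaches a threshold theta, the leaf gets eta children initialized from its own weight vector, and it becomes a trunk node. The comparison used, the default theta and eta values and the function name are our assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    weight: List[float]
    children: List["Node"] = field(default_factory=list)
    hits: int = 0


def record_hit_and_maybe_split(bmu: Node, theta: int = 3, eta: int = 2) -> bool:
    """Count a BMU win; once hits reach theta, give the leaf eta children.

    The children start as copies of the parent's weight vector and drift apart
    through later leaf updates. Returns True if a split happened.
    """
    bmu.hits += 1
    if bmu.children or bmu.hits < theta:
        return False
    bmu.children = [Node(weight=list(bmu.weight)) for _ in range(eta)]
    return True


leaf = Node([0.3, 0.7])
splits = [record_hit_and_maybe_split(leaf) for _ in range(3)]
print(splits, len(leaf.children))  # [False, False, True] 2
```

Growing the tree only where inputs concentrate is what lets the number of clusters follow the data instead of being fixed before learning.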

21 Experimental results: Observation

22 [Figure legend: root node, trunk node, leaf node]

23 Experimental Results: Complexity Control
[Figure: time (s) against article label]

24 Number of Clusters, Tree Size, Tree Depth: 1014278 15594 20352

25 Concluding Remarks
With the proposed approach, articles from ENCON 2008 could be clustered and visualized as a tree structure. In short, the proposed approach constitutes a new decision-support tool for conference organizers. The proposed procedure could also be applied to a larger number of articles, although the computational complexity is expected to increase.

26 Future Work

27 Thank you for your attention

