Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology: Extending the Growing Hierarchal SOM for Clustering Documents


1 Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
Extending the Growing Hierarchal SOM for Clustering Documents in Graphs domain
Presenter: Cheng-Hui Chen
Authors: Mahmoud F. Hussin, Mahmoud R. Farra and Yasser El-Sonbaty
IJCNN, 2008

2 Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments

3 Motivation
• The variants of the SOM are limited by the fact that they use only the vector space model (VSM) to represent documents.
  ─ The VSM does not represent any relation between the words.
  ─ The VSM has a high space complexity.
• Sentences are broken down into their individual components without any representation of the sentence structure.
[Figure: the words "river", "rafting", and "mild"]

4 Objectives
• Represent documents as graphs, which capture the salient features of the data: vertices represent words and edges represent the relations between them.
• Decrease the space complexity compared to the VSM.
• Extend the GHSOM to work in the graph domain to enhance the quality of the clusters.
[Figure: graph over the words "rafting", "river", and "mild"]

5 Methodology

6 Enhance the DIG to work with G-GHSOM
• The Document Index Graph (DIG) model
  ─ The DIG represents the documents and is exploited in document clustering.
• Example (the document table of the word "river" is shown)
  ─ River rafting. (doc1)
  ─ Mild river rafting. (doc2)
  ─ River fishing. (doc3)
[Figure: document table for the word "river"]
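The idea of a graph in which vertices are words and edges link related words can be sketched for the three example documents above. This is a minimal illustration, not the DIG structure from the paper: the function name `build_word_graph`, the choice of linking only consecutive words, and the use of sets of document ids on vertices and edges are all assumptions made for the example.

```python
from collections import defaultdict


def build_word_graph(docs):
    """Build a word graph (hypothetical helper, not the paper's DIG).

    vertices: word -> set of ids of the documents containing that word
    edges:    (word, next_word) -> set of ids of the documents in which
              the two words appear consecutively
    """
    vertices = defaultdict(set)
    edges = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed tokenization: lowercase, drop periods, split on spaces.
        words = text.lower().replace(".", "").split()
        for word in words:
            vertices[word].add(doc_id)
        for first, second in zip(words, words[1:]):
            edges[(first, second)].add(doc_id)
    return vertices, edges


docs = {
    "doc1": "River rafting.",
    "doc2": "mild river rafting.",
    "doc3": "River fishing.",
}
vertices, edges = build_word_graph(docs)

# Documents that contain the word "river"
print(sorted(vertices["river"]))
# Documents that contain the phrase "river rafting"
print(sorted(edges[("river", "rafting")]))
```

Unlike a bag-of-words vector, the edge `("river", "rafting")` records word order, so documents that share a phrase can be found by following shared edges.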

7 Enhance the DIG to work with G-GHSOM

8 Enhance the DIG to work with G-GHSOM
• Single-word similarity measure
• Similarity measure between two document vectors
• The total similarity integrates these measures
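The formulas on this slide did not survive in the transcript, so the following is only a sketch of how a word-level score and a phrase-level score might be integrated into one total. Everything specific here is an assumption: cosine similarity over term counts stands in for the single-word measure, a Jaccard overlap of consecutive word pairs stands in for a phrase-based measure, and the blend weight `alpha` is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenization: lowercase, drop periods, split on spaces.
    return text.lower().replace(".", "").split()


def word_similarity(text_a, text_b):
    """Cosine similarity of term-count vectors (assumed single-word measure)."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def phrase_similarity(text_a, text_b):
    """Jaccard overlap of consecutive word pairs (assumed phrase measure)."""
    wa, wb = tokenize(text_a), tokenize(text_b)
    pairs_a, pairs_b = set(zip(wa, wa[1:])), set(zip(wb, wb[1:]))
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def total_similarity(text_a, text_b, alpha=0.5):
    """Weighted blend of both scores; alpha is a hypothetical parameter."""
    return alpha * phrase_similarity(text_a, text_b) + (
        1 - alpha
    ) * word_similarity(text_a, text_b)


print(total_similarity("River rafting.", "mild river rafting."))
```

With `alpha` close to 1 the shared phrases dominate the score, and with `alpha` close to 0 it reduces to the single-word measure.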

9 …ment of the G-GHSOM to work with graph
• Neuron initialization
  ─ Detecting the matching list to calculate the phrase-based similarity.

10 …ment of the G-GHSOM to work with graph

11 …ment of the G-GHSOM to work with graph

12 Experiments
• Data sets
  ─ Reuters news articles (RNA)
  ─ University of Waterloo and Canadian Web sites (UW-CAN)

13 Experiments

14 Conclusions
• The GHSOM is extended to a new graph-based GHSOM (G-GHSOM) to enhance the quality of document clustering.
  ─ G-GHSOM works successfully in the graph domain and achieves better clustering quality than TGHSOM in document clustering.
• The DIG model is enhanced to work with the GHSOM algorithm.

15 Comments
• Advantages
  ─ Enhances the quality of document clustering
• Applications
  ─ SOM
  ─ Clustering

