Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology
Visualizing Ontology Components through Self-Organizing Maps
Advisor: Dr. Hsu
Reporter: Wen-Hsiang Hu
Author: D Elliman and JRG Pulido
2002 IEEE, Proceedings of the Sixth International Conference on Information Visualisation
Outline
─ Motivation
─ Objective
─ Introduction
─ Methods (SOM -> Ontology)
─ Results
─ Conclusions
─ Personal Opinion
Motivation
Some sites are an overgrown wilderness in which it is difficult to find anything of interest, even if it is known to be hidden there somewhere. It would be useful to be able to construct some representation of the information in the site.
Objective
We describe an approach for constructing an ontology for such web sites.
Methods
Our system will organize the knowledge extracted (figure 1) from a digital archive, e.g. a digital library, or the web itself, as follows:
─ Set of objects (entities)
─ Set of functions (is-a)
─ Set of relations (has, part-of)
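The three component sets above could be held in a small container like the following. This is only an illustrative sketch: the class name `Ontology`, its methods, and the single-parent is-a mapping are assumptions made here, not the authors' data model, and the example terms in the usage note are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical container for entities, is-a links, and has/part-of relations."""
    entities: set[str] = field(default_factory=set)
    # Assumption: each entity has at most one parent (child -> parent).
    is_a: dict[str, str] = field(default_factory=dict)
    # Each relation is stored as (subject, kind, object), kind in {"has", "part-of"}.
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_is_a(self, child: str, parent: str) -> None:
        self.entities.update((child, parent))
        self.is_a[child] = parent

    def add_relation(self, subject: str, kind: str, obj: str) -> None:
        if kind not in ("has", "part-of"):
            raise ValueError(f"unsupported relation: {kind}")
        self.entities.update((subject, obj))
        self.relations.add((subject, kind, obj))

    def ancestors(self, entity: str) -> list[str]:
        """Follow is-a links upward; stops if a cycle is detected."""
        chain: list[str] = []
        while entity in self.is_a and self.is_a[entity] not in chain:
            entity = self.is_a[entity]
            chain.append(entity)
        return chain
```

For example, after `o.add_is_a("dove", "bird")` and `o.add_is_a("bird", "animal")`, calling `o.ancestors("dove")` walks the is-a chain from the entity up to its most general ancestor.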
The Algorithm (1/2)
The major steps of our approach are as follows:
1. Retrieve hyperlinks from a predefined digital archive. For our analysis we have only retrieved local ones, e.g. cs.nott.ac.uk.
2. Preprocess each hyperlink. For each file, the following is done:
   a) Remove HTML tags.
   b) Remove words by using a stoplist, e.g. the, by, but, and the like. Common words that carry little information are pruned from the files.
   c) For each remaining word, the following is done:
      i. Its stem is obtained, e.g. play is the stem of the words plays, playing, played.
      ii. A weighted value is assigned to it by using tf x idf.
      iii. A vector space is created for each file.
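The preprocessing in step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stoplist is a tiny placeholder, `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's, and the weight used is the common variant tf x log(N / df), which the slides do not specify.

```python
import html
import math
import re
from collections import Counter

# Assumed, deliberately tiny stoplist; a real one would be much longer.
STOPLIST = {"the", "by", "but", "and", "a", "an", "of", "to", "in", "is"}


def strip_tags(page: str) -> str:
    """Step 2a: remove HTML tags and decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", page))


def crude_stem(word: str) -> str:
    """Step 2c-i: toy suffix stripper (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(page: str) -> list[str]:
    """Steps 2a-2c-i: strip tags, drop stoplist words, stem the rest."""
    words = re.findall(r"[a-z]+", strip_tags(page).lower())
    return [crude_stem(w) for w in words if w not in STOPLIST]


def tfidf_vectors(pages: list[str]) -> tuple[list[str], list[list[float]]]:
    """Steps 2c-ii and 2c-iii: one tf x idf vector per file, over a shared vocabulary."""
    docs = [Counter(preprocess(p)) for p in pages]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = [[d[t] * math.log(n / df[t]) for t in vocab] for d in docs]
    return vocab, vectors
```

With this weighting, a stem that occurs in every file gets weight 0, so only terms that discriminate between files contribute to the vectors.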
The Algorithm (2/2)
3. Produce a document space
   ─ By using the vector spaces of the previous step, a document space is created.
4. Construct the SOM
   ─ By using the document space of the previous step, the SOM is trained.
5. Create the Ontology
   ─ Once the SOM is done, ontology components can be visualized. This produced results that are often surprisingly close to the user’s intuitive expectation.
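Step 4 can be illustrated with a small, generic SOM trainer in plain Python. The slides do not give the training parameters, so everything here is an assumption: the function names `train_som` and `best_matching_unit`, the Gaussian neighbourhood, the linear decay of learning rate and radius, and the default 4x4 grid (chosen to match the animal experiment).

```python
import math
import random


def best_matching_unit(grid, v):
    """Return (row, col) of the cell whose weight vector is closest to v."""
    best, best_pos = float("inf"), (0, 0)
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, v))
            if d < best:
                best, best_pos = d, (r, c)
    return best_pos


def train_som(vectors, rows=4, cols=4, epochs=200, seed=0):
    """Train a rows x cols map; returns grid[r][c] as a weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Random initial weights in [0, 1) for every cell.
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            br, bc = best_matching_unit(grid, v)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood centred on the best matching unit.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma * sigma))
                    w = grid[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])
    return grid
```

After training, calling `best_matching_unit(grid, v)` for each document vector assigns it to a map cell; documents that share a cell or fall in neighbouring cells are the candidates for the clusters that step 5 visualizes.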
Results-Classifying Animals (1/2)
The animal dataset (table 1) is presented by means of an HTML page. Our approach uses a 4x4 SOM and presents the same data by using colored areas.
[Table 1: entities and their attributes]
Results-Classifying Animals (2/2)
Small animals with feathers, big animals with hooves, and the ones with four legs and hair are also clustered together.
Results-Classifying Digital Archives (1/2)
For our second analysis, we used a set of web pages of the Computer Science Department at Nottingham University.
─ We can readily identify people within the domain, their roles, modules that are taught, and research interests of the members of the school.
─ Terms like ieee, confer(ence), proceed(ings), workshop, journal, spring(er), and even the location of the school (wollaton, jubil(e), campu(s), nottingham), and how to reach it (driv(e), rout(e), map, direc(tion), guid(e)) are clustered together.
─ Further subcategories are also visualized. For instance, within the Image Processing and Interpretation Research Group we found terms like text, vision, ai, colour, recognition, and image grouped together (figure 4).
Conclusion
An ontology is a form of knowledge representation that can be used to give a sense of order to unstructured digital sources. Ontology components can be visualized through Self-Organizing Maps.
Personal Opinion
Application
─ Apply ontology to information retrieval (IR)