Published by Madeleine Roxanne Rose, modified over 9 years ago
1
Knowledge management system based on Latent Semantic Analysis
2
08.06.2007
Authors
Zvonimir Radoš, Siemens d.d., Županijska 21, 31000 Osijek, Croatia, zvonimir.rados@siemens.com
Franjo Jović, Josip Job, Faculty of Electrical Engineering, University of Josip Juraj Strossmayer, Kneza Trpimira 2b, Osijek, Croatia, franjo.jovic@etfos.hr, josip.job@etfos.hr
3
Presentation overview
- Identification of the problem
- Proposed solution (the big picture)
- What is LSA?
- Characteristics of the system
- Categories
- User profiles and agents
- System lifecycle
- Conclusion
- Questions
4
Identification of the problem
Facts
- A huge amount of information is already stored in electronic form (documents, web pages, e-mails...), and new documents are created every day.
- Organizations need a fast, cheap and reliable way to find targeted information.
Problem
- How should the documents within an organization be organized and searched?
Solution
A knowledge management system that can:
- store new documents
- perform queries on existing documents
- classify documents
- organize people into communities based on their affinities
5
Proposed solution
Pitfalls in the design of a KM system
- Documents are stored using simple keywords, and querying relies on exact matches.
- Categories are rigid and inflexible.
- Systems usually lack a process that guarantees valid and up-to-date internal “knowledge”.
Proposed KM system based on LSA and ontologies
- vectors in a semantic space are used instead of keywords
- categories are defined as URIs in the Semantic Web
- users are described by profiles and agents (also using ontologies)
- the system has processes that refresh its internal “knowledge”
6
Latent Semantic Analysis
- The basis is a term-document matrix that records how often each term occurs in each document.
- The initial semantic space (a few hundred dimensions) is transformed into a latent semantic space (~100 dimensions) using singular value decomposition (SVD).
- SVD creates new dimensions in which every document and term receives its own projection.
- The similarity of two projections is measured by the angle between their semantic vectors (in practice, by its cosine).
Deerwester, Harshman, Dumais, Landauer
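The LSA steps above can be illustrated with a minimal NumPy sketch. The four-document corpus, raw term counts and k = 2 latent dimensions are illustrative assumptions, not values from the slides; document coordinates are taken as the rows of V_k Σ_k, and similarity is the cosine of the angle between vectors.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Count how often each term occurs in each document (rows = terms, columns = documents)."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            A[index[word], j] += 1
    return A, vocab


def latent_semantic_space(A, k):
    """Truncated SVD: keep only the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def cosine(u, v):
    """Similarity as the cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


docs = [
    "car engine repair",
    "automobile engine maintenance",
    "stock market prices",
    "market prices fall",
]
A, vocab = build_term_document_matrix(docs)
U_k, s_k, Vt_k = latent_semantic_space(A, k=2)
doc_vectors = Vt_k.T * s_k  # document coordinates: rows of V_k * Sigma_k

print(f"raw term space, doc 0 vs doc 1: {cosine(A[:, 0], A[:, 1]):.2f}")
print(f"latent space,   doc 0 vs doc 1: {cosine(doc_vectors[0], doc_vectors[1]):.2f}")
print(f"latent space,   doc 0 vs doc 2: {cosine(doc_vectors[0], doc_vectors[2]):.2f}")
```

The two car documents share only the term "engine" (raw cosine 1/3), yet in the two-dimensional latent space they point in the same direction, while the car and finance documents end up orthogonal.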
7
KMS based on LSA
Description
- the core of the system is a semantic space into which documents, categories and queries are projected
- documents are matched to queries and categories according to their projections in the semantic space
- the semantic space is adapted periodically
- users can insert documents and perform queries
- users can define their profiles and create agents
- moderators define categories using ontologies
Benefits
- matching is based on similarity of meaning rather than on exact keywords (synonymy, polysemy)
- categories are easily modified by a moderator (on the system's initiative)
- no predefined knowledge is required (only categories)
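Matching a query to documents through their projections might look like the following minimal sketch, assuming a toy corpus, raw term counts and k = 2 (illustrative choices, not the authors' settings). Since A = U Σ V^T, the columns of U_k^T A equal the document coordinates Σ_k V_k^T, so a query with term-count vector q can be projected as U_k^T q and compared with each document by cosine. The helper names `project_query` and `rank` are hypothetical.

```python
import numpy as np

docs = [
    "car engine repair",
    "automobile engine maintenance",
    "stock market prices",
    "market prices fall",
]

# Term-document count matrix: rows = terms, columns = documents.
vocab = sorted({w for d in docs for w in d.split()})
index = {t: i for i, t in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1

# Truncated SVD with k latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]
doc_vectors = Vt[:k, :].T * s[:k]  # row j equals U_k.T @ A[:, j]


def project_query(query):
    """Map a free-text query into the latent space with U_k^T; unknown words are ignored."""
    q = np.zeros(len(vocab))
    for w in query.lower().split():
        if w in index:
            q[index[w]] += 1
    return U_k.T @ q


def rank(query):
    """Return (cosine, document) pairs, most similar first."""
    q_hat = project_query(query)
    q_norm = np.linalg.norm(q_hat)
    if q_norm == 0:  # no known terms: nothing can be matched
        return []
    scores = doc_vectors @ q_hat / (np.linalg.norm(doc_vectors, axis=1) * q_norm)
    order = np.argsort(-scores)
    return [(float(scores[j]), docs[j]) for j in order]


for score, doc in rank("car"):
    print(f"{score:+.2f}  {doc}")
```

The query "car" scores "automobile engine maintenance" as highly as "car engine repair" although that document never contains the word "car", which is the synonymy benefit listed on the slide.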
8
Categories
- Categories are represented as URIs in the Semantic Web.
- Their structure is used to classify documents.
- They are defined by one or more semantic vectors (this affects recall).
- Every category has a threshold (this affects precision).
- They are maintained by moderators.
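One way the recall and precision remarks could play out is sketched below, assuming a hypothetical `Category` record and an assumed rule: a document belongs to a category when its cosine with the nearest of the category's semantic vectors reaches the threshold. The example.org URI and the 2-D latent coordinates are made up for illustration.

```python
from dataclasses import dataclass

import numpy as np


def cosine(u, v):
    """Cosine of the angle between two latent-space vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


@dataclass
class Category:
    """Hypothetical category record: a Semantic Web URI, semantic vectors and a threshold."""
    uri: str
    vectors: list      # more vectors cover more of the space -> more documents match (recall)
    threshold: float   # a higher threshold admits only closer documents (precision)

    def score(self, doc_vector):
        # Assumed rule: a document is as close to the category as to its nearest vector.
        return max(cosine(v, doc_vector) for v in self.vectors)

    def matches(self, doc_vector):
        return self.score(doc_vector) >= self.threshold


def classify(doc_vector, categories):
    """URIs of every category whose threshold the document reaches."""
    return [c.uri for c in categories if c.matches(doc_vector)]


URI = "http://example.org/kms/category/energy"  # hypothetical URI
doc = np.array([0.28, 0.96])                     # made-up latent coordinates

narrow = Category(URI, [np.array([1.0, 0.0])], threshold=0.9)
broad = Category(URI, [np.array([1.0, 0.0]), np.array([0.0, 1.0])], threshold=0.9)
strict = Category(URI, [np.array([1.0, 0.0]), np.array([0.0, 1.0])], threshold=0.99)

for name, category in [("narrow", narrow), ("broad", broad), ("strict", strict)]:
    print(name, round(category.score(doc), 2), classify(doc, [category]))
```

Adding a second vector lets the document in (higher recall); raising the threshold on the same vectors shuts it out again (higher precision).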
9
User profiles and agents
Users are also described using ontologies:
- users can define their profiles by subscribing to certain categories
- profiles are used to create communities
- users can create agents defined by semantic vectors; when a new document is added, the agents notify their users (by e-mail)
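The agent behaviour could be sketched as follows, assuming a hypothetical `Agent` record with an owner's e-mail address, a semantic vector and an assumed cut-off of 0.8 (not a value from the slides). The function only collects the notifications; actual e-mail delivery is left out, and the addresses and vectors are made up.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Agent:
    """Hypothetical agent: the owner's e-mail address and a semantic vector of interest."""
    owner_email: str
    vector: np.ndarray
    threshold: float = 0.8  # assumed cut-off, not a value from the slides


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def on_document_added(title, doc_vector, agents):
    """Called after a new document is projected; returns (e-mail, message) pairs to send.

    Sending the e-mails themselves is outside this sketch.
    """
    notifications = []
    for agent in agents:
        similarity = cosine(agent.vector, doc_vector)
        if similarity >= agent.threshold:
            notifications.append(
                (agent.owner_email, f"New document '{title}' (similarity {similarity:.2f})"))
    return notifications


agents = [
    Agent("ana@example.com", np.array([0.0, 1.0])),
    Agent("marko@example.com", np.array([1.0, 0.0])),
]
for email, message in on_document_added("Engine diagnostics guide", np.array([0.1, 0.95]), agents):
    print(email, "<-", message)
```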
10
Life of the system
Lifecycle of such a system:
Installation phase
1. The system is installed.
2. An initial set of documents is used to create the semantic space.
3. Categories are defined (hierarchy, semantic vectors...).
Active phase
1. New documents are added and projected into the semantic space.
2. Users are created with their profiles and agents.
3. Queries are performed.
4. The semantic space is updated periodically.
No built-in knowledge => a “cheap” off-the-shelf system
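The lifecycle above can be sketched as a small class under an assumed update policy that the slides do not specify: the SVD is recomputed after every `rebuild_every` new documents, and documents added in between are projected with the current U_k without touching the decomposition. `SemanticSpace`, `rebuild_every` and the method names are hypothetical.

```python
import numpy as np


class SemanticSpace:
    """Sketch of the lifecycle: build from an initial corpus, project new documents,
    and periodically recompute the SVD. Class and parameter names are hypothetical."""

    def __init__(self, initial_docs, k=2, rebuild_every=100):
        self.k = k
        self.rebuild_every = rebuild_every  # assumed policy: rebuild after N additions
        self.docs = list(initial_docs)
        self._rebuild()                     # installation phase, step 2

    def _term_counts(self, text):
        counts = np.zeros(len(self.vocab))
        for w in text.lower().split():
            if w in self.index:             # words unseen at the last rebuild are ignored
                counts[self.index[w]] += 1
        return counts

    def _rebuild(self):
        """Periodic update: recompute vocabulary, term-document matrix and truncated SVD."""
        self.vocab = sorted({w for d in self.docs for w in d.lower().split()})
        self.index = {t: i for i, t in enumerate(self.vocab)}
        A = np.column_stack([self._term_counts(d) for d in self.docs])
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        self.U_k = U[:, :self.k]
        self.vectors = [self.project(d) for d in self.docs]
        self.pending = 0

    def project(self, text):
        """Latent coordinates of a text: U_k^T times its term counts."""
        return self.U_k.T @ self._term_counts(text)

    def add_document(self, text):
        """Active phase, step 1: project a new document without recomputing the SVD."""
        self.docs.append(text)
        self.vectors.append(self.project(text))
        self.pending += 1
        if self.pending >= self.rebuild_every:
            self._rebuild()


space = SemanticSpace(["car engine repair", "stock market prices"], k=2, rebuild_every=2)
space.add_document("automobile engine maintenance")  # projected; new words ignored for now
print(len(space.vocab), space.pending)
space.add_document("market prices fall")             # second addition triggers a rebuild
print(len(space.vocab), space.pending)
```

Between rebuilds, new documents are cheap to add but words such as "automobile" are invisible to the space until the next periodic update.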
11
Conclusion
PROS
- Easy installation: no predefined knowledge is needed.
- Easy maintenance: the moderator can redefine categories, and the system updates itself periodically.
- Semantically based approach: the meaning of words is considered rather than simple keyword matching.
CONS
- SVD is time- and CPU-demanding (folding-in is an alternative).
- It is not clear how the moderator could influence the creation of the latent semantic space => he cannot “tune” it!
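The folding-in alternative mentioned under CONS can be written in standard LSA notation (added here for illustration, not taken from the slides). With A the m×n term-document matrix and its truncated SVD as below, a new document with term-count vector d receives latent coordinates d̂, which are appended as a new row of V_k:

```latex
A \approx U_k \Sigma_k V_k^{T}, \qquad
\hat{d} = \Sigma_k^{-1} U_k^{T} d
```

This costs a single k×m matrix-vector product, O(mk), whereas recomputing a dense SVD of the whole matrix costs on the order of mn·min(m, n). Because U_k and Σ_k stay fixed, terms that first appear in new documents are ignored and the space gradually drifts from the one a fresh SVD would produce, so periodic recomputation is still needed.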