A Semantic Model of Selective Dissemination of Information for Digital Libraries. © Copyright 2008 STI INNSBRUCK, www.sti-innsbruck.at

A Semantic Model of Selective Dissemination of Information for Digital Libraries
Authors: J. M. Morales-del-Castillo¹, R. Pedraza-Jiménez², A. A. Ruiz³, E. Peis⁴, and E. Herrera-Viedma⁵

Basic Idea
– Develop a multi-agent Selective Dissemination of Information (SDI) platform capable of generating alerts and recommendations of documents for users, according to their personal profiles
– Apply Semantic Web technologies to achieve more efficient information management and to improve agent-agent and user-agent communication

SDI Components
Thesaurus – Organizes the most relevant concepts in a specific domain by defining semantic relations between them.
User profiles – Structured representations that contain the personal data, interests and preferences of users.
RSS feeds – Used as “current awareness bulletins” to generate personalized bibliographic alerts.
Recommendation log file – Each document in the repository has an associated log file that lists the evaluations assigned to that resource by different users.

Thesaurus
The creation of a thesaurus includes four phases:
Pre-processing of documents – Prepare the documents for parameterization by removing the elements regarded as superfluous, in 3 stages:
  – Eliminate all the tags (HTML, XML, etc.)
  – Standardize the words in the document, which includes removing articles, determiners, auxiliary verbs, conjunctions, prepositions, …
  – Stem all the remaining terms using the WordNet algorithm (Morphy)
Parameterizing the selected terms – The final terms are quantified by assigning weights obtained by applying the term frequency – inverse document frequency (tf-idf) scheme
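The pre-processing and weighting phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the stopword list, the regular expressions, the function names and the sample documents are all hypothetical, the stemming stage (Morphy) is left out, and the tf-idf variant used here (relative term frequency times the natural log of N/df) is one common choice among several.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (articles, conjunctions, ...).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to", "for"}


def preprocess(raw):
    """Strip markup tags, lowercase, tokenize and drop stopwords."""
    text = re.sub(r"<[^>]+>", " ", raw)          # stage 1: eliminate tags
    tokens = re.findall(r"[a-z]+", text.lower())  # crude word tokenizer
    return [t for t in tokens if t not in STOPWORDS]  # stage 2: standardize


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical sample documents.
docs = [
    "<p>Digital libraries and the semantic web</p>",
    "<p>Semantic agents for digital libraries</p>",
    "<p>Fuzzy agents</p>",
]
weights = tf_idf(docs)
```

In this toy collection, "web" appears in only one document, so it receives a higher weight in the first document than "digital", which appears in two.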

Thesaurus
Conceptualizing their lexical stems – The meaning associated with each term (lemma) is extracted by looking it up in WordNet, which returns a group of synsets associated with each word (including hyponyms and hypernyms)
Generating a lattice or graph that shows the relations between the identified concepts
  – Formal concept analysis techniques are used to find relations in the generated groups, where each node in the graph represents a descriptor (namely, a group of synonymous terms)
  – Documents are clustered according to their terms (and synonyms), including links to those with which they have a relation (hyponymy or hypernymy)
Once the thesaurus has been obtained by identifying its terms and the underlying relations between them, it is represented using the SKOS vocabulary.
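To make the formal concept analysis step more concrete, the sketch below enumerates the formal concepts of a small document-term context. It is a brute-force illustration only: the slides do not say which FCA algorithm is used, and the function names and the sample context are hypothetical. A formal concept is a pair (extent, intent) in which the extent is exactly the set of documents sharing every term of the intent, and the intent is exactly the set of terms shared by every document of the extent.

```python
from itertools import combinations


def common_terms(docs, context, all_terms):
    """Terms shared by every document in `docs` (all terms if `docs` is empty)."""
    result = set(all_terms)
    for d in docs:
        result &= context[d]
    return frozenset(result)


def docs_with(terms, context):
    """Documents that contain every term in `terms`."""
    return frozenset(d for d, ts in context.items() if terms <= ts)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    Exponential in the number of documents, so only suitable for toy inputs.
    """
    all_terms = set().union(*context.values())
    objects = list(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_terms(subset, context, all_terms)
            extent = docs_with(intent, context)
            concepts.add((extent, intent))
    return concepts


# Hypothetical context: which descriptor occurs in which document.
context = {
    "d1": {"library", "semantic"},
    "d2": {"library", "agent"},
    "d3": {"library", "semantic", "agent"},
}
concepts = formal_concepts(context)
```

Ordering the resulting concepts by inclusion of their extents gives the lattice. In this toy context, the concept whose intent is {library} sits at the top and the concept containing only d3 sits at the bottom.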

User profiles
Defined with the Friend of a Friend (FOAF) vocabulary (generated at registration time)
– Contain personal data, interests and preferences of users
2 parts:
– Public profile: data related to the user's identity and affiliation
– Private profile: user interests and preferences about the topic of the alerts he or she wishes to receive
Users must specify the keywords and concepts that best define their information needs.
These keywords are then compared with the concepts in the thesaurus; if there is an exact match, the entered term is returned; otherwise, the lexically most similar term is returned.
The returned term is suggested to the user and added to his or her preferences if it satisfies the user's expectations.
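The keyword matching step can be illustrated as follows. The slides do not specify how lexical similarity is measured, so this sketch assumes the similarity ratio from Python's standard `difflib` module. The function name `suggest_concept` and the sample concept list are hypothetical.

```python
import difflib


def suggest_concept(keyword, concepts):
    """Return the exact match if present, else the lexically closest concept.

    Returns None when the thesaurus has no concepts at all.
    """
    # Compare case-insensitively but return the concept as stored.
    normalized = {c.lower(): c for c in concepts}
    key = keyword.strip().lower()
    if key in normalized:
        return normalized[key]
    # cutoff=0.0 means the best candidate is always returned, however distant.
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=0.0)
    return normalized[close[0]] if close else None


# Hypothetical thesaurus concepts.
concepts = ["Digital libraries", "Information retrieval", "Semantic Web"]
exact = suggest_concept("semantic web", concepts)
fuzzy = suggest_concept("digital librarys", concepts)
```

The returned term is only a suggestion: as described above, it is added to the profile only if the user accepts it. A stricter `cutoff` could be used to avoid proposing terms that are too dissimilar.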

Profile and RSS feeds generation process

Alert generation process

Questions

References
1. J. M. Morales-del-Castillo: Assistant Professor of Information Science, Library and Information Science Department, University of Granada, Spain.
2. R. Pedraza-Jiménez: Assistant Professor of Information Science, Journalism and Audiovisual Communication Department, Pompeu Fabra University, Barcelona, Spain.
3. A. A. Ruíz: Full Professor of Information Science, Library and Information Science Department, University of Granada.
4. E. Peis: Full Professor of Information Science, Library and Information Science Department, University of Granada.
5. E. Herrera-Viedma: Senior Lecturer in Computer Science, Computer Science and Artificial Intelligence Department, University of Granada.