Information access through phrase browsing (Acceso a la información mediante exploración de sintagmas). Anselmo Peñas, Julio Gonzalo and Felisa Verdejo, Dpto. Lenguajes y Sistemas Informáticos, UNED.

Presentation transcript:

Information access through phrase browsing (Acceso a la información mediante exploración de sintagmas)
Anselmo Peñas, Julio Gonzalo and Felisa Verdejo
Dpto. Lenguajes y Sistemas Informáticos, UNED
III Jornadas de Bibliotecas Digitales, El Escorial, 2002

2 Overview
- Motivation: problems in query formulation
- Hand-crafted approaches
  - Controlled vocabularies
- Automatic approaches
  - Pure string processing
  - Automatic terminology extraction
- Website Term Browser
- Conclusions

3 Making information needs precise
- Help users express and refine their information needs
  - Vague need: the user doesn't know exactly what he is looking for
  - Broad need: compile or summarize pieces of information around a topic
- Users develop strategies without system assistance
[Diagram labels: Information need; Search engine; Docs.; Document ranking; Refinement; Query formulation]

4 Language barriers
- Help users overcome language barriers
  - Specific domain terminology: find appropriate wording
  - Translinguality: information available only in a foreign language
  - Natural language characteristics: lexical ambiguity, terminology variation
[Diagram labels: Information need; Search engine; Docs.; Query formulation; Document ranking; Refinement]

5 General approaches
[Diagram labels: Terminology; Controlled vocabularies indexing & browsing; Information Retrieval]

6

7

8 Controlled vocabularies: problems
- Construction & management (high cost)
- Indexing
  - Manual keyword assignment
  - Errors in automatic keyword assignment
- Domain specific
  - A new domain needs a new thesaurus
- Specialist oriented (users must know the preferred descriptors)
  - Less specialized audiences get poorer results

9 General approaches
[Diagram labels: Terminology; Controlled vocabularies indexing & browsing; String Processing; Free text indexing; Information Retrieval]

10 Free text searching
- Does it help users express and refine their information needs?
- Does it help users overcome language barriers?
[Screenshot label: Search]

11 General approaches
[Diagram labels: Terminology; Controlled vocabularies indexing & browsing; String Processing; Free text indexing; Information Retrieval; Phrase indexing & browsing (Phind); Keyphrase navigation (Phrasier)]

12 "Keyphrase" navigation (Jones 1999)
- Automatic extraction and assignment of 10 "keyphrases" to each document (KEA, Frank 1999)
- Navigation between documents that share "keyphrases"
- Problems
  - No translinguality
  - No terminology variation
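
The navigation idea on this slide (moving between documents that share keyphrases) can be illustrated with a small inverted index. This is only a minimal sketch under stated assumptions: the function names, the toy data in `DOCS`, and the ranking by number of shared keyphrases are hypothetical and are not taken from Phrasier or KEA.

```python
from collections import defaultdict


def build_keyphrase_index(doc_keyphrases):
    """Map each (lowercased) keyphrase to the set of documents it is assigned to."""
    index = defaultdict(set)
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            index[phrase.lower()].add(doc_id)
    return index


def related_documents(doc_id, doc_keyphrases, index):
    """Rank the other documents by how many keyphrases they share with doc_id."""
    shared = defaultdict(set)
    for phrase in doc_keyphrases[doc_id]:
        key = phrase.lower()
        for other in index.get(key, ()):
            if other != doc_id:
                shared[other].add(key)
    # Most shared keyphrases first; ties broken by document id.
    return sorted(shared.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical toy collection: document id -> assigned keyphrases.
DOCS = {
    "d1": ["digital libraries", "information retrieval", "thesaurus"],
    "d2": ["information retrieval", "query formulation"],
    "d3": ["digital libraries", "information retrieval"],
}

INDEX = build_keyphrase_index(DOCS)
NEIGHBOURS = related_documents("d1", DOCS, INDEX)
```

Because matching is done on exact keyphrase strings, the sketch shares the two problems listed on the slide: a phrase in another language or a variant wording of the same term produces no link.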

13 Problems
- No translinguality
- No terminology variation

14 Objectives
- Develop a model
  - to help users express and refine their information needs
  - to help users overcome language barriers
- Bring the collection terminology to users
  - Morpho-syntactic, semantic & translingual variations
  - Without the need for thesaurus construction
- Establish an appropriate evaluation framework
[Diagram label: Website Term Browser]

15 Proposed approach
[Diagram labels: Natural Language Processing; Disambiguation; Conceptual indexing; Terminology; Controlled vocabularies indexing & browsing; String Processing; Free text indexing; Information Retrieval; Phrase indexing & browsing (Phind); Keyphrase navigation (Phrasier); Automatic Terminology Extraction; Terminology Retrieval & Term browsing (WTB)]

16 Terminology Retrieval
- From Automatic Terminology Extraction...
  - Obtain lists of terms relevant for a specific domain
  - Term extraction, term weighting, term selection
- ... to Terminology Retrieval
  - Retrieve terms relevant for an information need
  - The user query points to the relevant terms
  - No truncation of terminology lists
  - Favor recall by relaxing term extraction patterns
- ... & Browsing
  - Navigate through relevant terminology
  - Access information from retrieved terms
  - Bridge the gap between query and collection vocabularies
  - Cross-language
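
The shift from term extraction to terminology retrieval can be sketched as follows. This is a hedged illustration, not the WTB implementation: it assumes a deliberately relaxed extraction pattern (any word n-gram that does not start or end with a stopword, standing in for the part-of-speech patterns a real extractor would use), keeps every candidate instead of truncating the list, and lets the query select the relevant terms. `STOPWORDS`, `candidate_terms`, `build_term_index`, and `retrieve_terms` are hypothetical names.

```python
import re
from collections import Counter

# Toy stopword list (an assumption made for this sketch).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def candidate_terms(text, max_len=3):
    """Relaxed extraction: every n-gram (n <= max_len) not bounded by a stopword.

    The sketch ignores sentence boundaries, so some n-grams may span them.
    """
    words = re.findall(r"[a-z]+", text.lower())
    terms = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            terms.append(" ".join(gram))
    return terms


def build_term_index(docs):
    """Count every candidate term in the collection; nothing is filtered out."""
    counts = Counter()
    for text in docs.values():
        counts.update(candidate_terms(text))
    return counts


def retrieve_terms(query, term_counts):
    """Return all terms sharing a content word with the query, most frequent first."""
    query_words = set(re.findall(r"[a-z]+", query.lower())) - STOPWORDS
    hits = [(term, count) for term, count in term_counts.items()
            if query_words & set(term.split())]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical two-document collection.
COLLECTION = {
    "d1": "Retrieval of information in digital libraries",
    "d2": "information retrieval systems",
}
TERM_INDEX = build_term_index(COLLECTION)
RESULTS = retrieve_terms("retrieval", TERM_INDEX)
```

The design choice mirrors the slide: weighting and selection are postponed until query time, so a low-frequency phrase such as "retrieval of information" is still available for browsing when the user asks for it.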

17 [Screenshot labels: Query in Spanish; Hierarchy of terms; Catalan; English; Spanish; Ranking of documents]

18 [Screenshot labels: Translingual variation; Morpho-syntactic variations (permutation, insertion); Semantic variations]
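
The two morpho-syntactic variation types named on this slide can be illustrated with simple word-level checks. This is a rough sketch under stated assumptions, not the detection method used in WTB: it compares content words only, uses a toy stopword list, and does no stemming or lemmatization. `content_words`, `is_permutation`, and `is_insertion` are hypothetical names.

```python
# Toy stopword list (an assumption made for this sketch).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def content_words(term):
    """Lowercase the term and drop stopwords, keeping word order."""
    return [w for w in term.lower().split() if w not in STOPWORDS]


def is_permutation(a, b):
    """Same content words in a different order, e.g. a noun phrase and its 'of' form."""
    wa, wb = content_words(a), content_words(b)
    return wa != wb and sorted(wa) == sorted(wb)


def is_insertion(base, variant):
    """Variant keeps the content words of base in order and adds at least one more."""
    wb, wv = content_words(base), content_words(variant)
    if len(wv) <= len(wb):
        return False
    remaining = iter(wv)
    # Subsequence test: each base word must appear after the previous match.
    return all(word in remaining for word in wb)
```

For example, under these definitions "retrieval of information" counts as a permutation of "information retrieval", and "information and text retrieval" as an insertion variant of it.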

19 Usefulness of Term Browsing

                                    All queries   1-word queries   >1-word queries
First action after query
  Explore document from Google      42%           47%              39%
  Explore term                      51%           45%              55%
Source of last document explored
  Google ranking                    50%           57%              46%
  Explore term                      44%           38%              47%

2000 session logs in UNED.es comparing:
- Use of term area from WTB
- Use of document area from Google

20 Conclusions
- Browsing of phrases and terminology
  - User-oriented approach
  - Interaction over terminological information
    - Intermediate way between free-text searching and thesaurus-guided searching
    - Without the need for thesaurus construction
- Website Term Browser
  - Brings the collection terminology to users
    - Morpho-syntactic & semantic variations
    - Translinguality
- Evaluation
  - Users appreciate Term Browsing
  - WTB phrasal information can substantially complement the document ranking provided by search engines