
1 Information access through phrase browsing (Acceso a la información mediante exploración de sintagmas)
Anselmo Peñas, Julio Gonzalo and Felisa Verdejo
Dpto. Lenguajes y Sistemas Informáticos, UNED
III Jornadas de Bibliotecas Digitales, El Escorial, 2002

2 Overview
Motivation: problems in query formulation
Hand-crafted approaches
  – Controlled vocabularies
Automatic approaches
  – Pure string processing
  – Automatic terminology extraction
Website Term Browser
Conclusions

3 Precise information needs
Help users express and refine their information needs
  – Vague need: the user doesn't know exactly what they are looking for
  – Broad need: compile or summarize pieces of information around a topic
Users develop strategies without system assistance
[Diagram labels: Information need; Query formulation; Search engine; Docs.; Document ranking; Refinement]

4 Language barriers
Help users to overcome language barriers
  – Specific domain terminology: find appropriate wording
  – Translinguality: information available only in a foreign language
  – Natural language characteristics: lexical ambiguity, terminology variation
[Diagram labels: Information need; Query formulation; Search engine; Docs.; Document ranking; Refinement]

5 General approaches
[Diagram labels: Terminology; Controlled vocabularies indexing & browsing; Information Retrieval]

8 Controlled vocabularies: problems
Construction & management (high cost)
Indexing
  – Manual keyword assignment
  – Errors in automatic keyword assignment
Domain specific
  – A new domain needs a new thesaurus
Specialist oriented (users must know the preferred descriptors)
  – Less specialized audiences get poorer results

9 General approaches
[Diagram labels: Terminology; Controlled vocabularies indexing & browsing; String Processing; Free text indexing; Information Retrieval]

10 Free text searching
Does it help users express and refine their information needs?
Does it help users overcome language barriers?
Search

11 General approaches
[Diagram labels: Terminology; Controlled vocabularies indexing & browsing; String Processing; Free text indexing; Information Retrieval; Phrase indexing & browsing (Phind); Keyphrase navigation (Phrasier)]

12 “Keyphrase” navigation (Jones 1999)
Automatic extraction and assignment of 10 “keyphrases” to each document (KEA, Frank 1999)
Navigation between documents that share “keyphrases”
Problems
  – No translinguality
  – No terminology variation
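The navigation idea on this slide (moving between documents that share keyphrases) can be illustrated with a small inverted index. This is only a sketch under stated assumptions: the document identifiers and keyphrases below are invented toy data, the keyphrases are given by hand rather than extracted with KEA, and the function names `build_index` and `linked_documents` are hypothetical, not part of Phrasier.

```python
from collections import defaultdict

# Hypothetical toy data: each document already has its assigned keyphrases.
doc_keyphrases = {
    "doc1": {"digital library", "information retrieval"},
    "doc2": {"information retrieval", "query formulation"},
    "doc3": {"term extraction"},
}

def build_index(doc_keyphrases):
    """Map each keyphrase to the set of documents it was assigned to."""
    index = defaultdict(set)
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            index[phrase].add(doc_id)
    return index

def linked_documents(doc_id, doc_keyphrases, index):
    """Return {other_doc: shared keyphrases} for documents reachable from doc_id."""
    links = defaultdict(set)
    for phrase in doc_keyphrases[doc_id]:
        for other in index[phrase]:
            if other != doc_id:
                links[other].add(phrase)
    return dict(links)

index = build_index(doc_keyphrases)
links = linked_documents("doc1", doc_keyphrases, index)
```

Because links are built from exact keyphrase strings, a document indexed under a variant wording or in another language is never reached, which is the limitation listed under Problems.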

13 Problems
  – No translinguality
  – No terminology variation

14 Objectives
Develop a model
  – to help users express and refine their information needs
  – to help users overcome language barriers
Bring the collection terminology to users
Morpho-syntactic, semantic & translingual variations
Without the need for thesaurus construction
Establish an appropriate evaluation framework
Website Term Browser

15 Proposed approach
[Diagram labels: Natural Language Processing; Disambiguation; Conceptual indexing; Terminology; Controlled vocabularies indexing & browsing; String Processing; Free text indexing; Information Retrieval; Phrase indexing & browsing (Phind); Keyphrase navigation (Phrasier); Automatic Terminology Extraction; Terminology Retrieval & Term browsing (WTB)]

16 Terminology Retrieval
From Automatic Terminology Extraction...
  – Obtain lists of terms relevant for a specific domain
  – Term extraction, term weighting, term selection
... to Terminology Retrieval
  – Retrieve terms relevant for an information need
  – The user query points to the relevant terms
  – No truncation of terminology lists
  – Favor recall by relaxing term extraction patterns
... & Browsing
  – Navigate through relevant terminology
  – Access information from retrieved terms
  – Bridge the gap between query and collection vocabularies
  – Cross-language
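The shift from terminology extraction to terminology retrieval can be sketched as follows. This is a minimal illustration, not the WTB implementation: the slides do not give the extraction patterns, so a deliberately loose stand-in is used here (any run of up to three consecutive non-stopword tokens counts as a candidate term), the stopword list and sample sentences are invented, and the names `candidate_terms`, `build_term_index` and `retrieve_terms` are hypothetical. The point it shows is that every candidate is kept, and the query, rather than a weight cutoff, decides which terms are shown.

```python
import re
from collections import Counter

# Illustrative stopword list only (an assumption, not from the slides).
STOPWORDS = {"the", "of", "a", "an", "in", "for", "and", "to", "is"}

def candidate_terms(text, max_len=3):
    """Yield every run of 1..max_len consecutive non-stopword tokens.

    Stopwords and punctuation break a run. This loose pattern stands in
    for 'relaxed term extraction patterns' and favors recall.
    """
    tokens = re.findall(r"[a-z]+|[.,;:!?]", text.lower())
    runs, current = [], []
    for tok in tokens:
        if tok in STOPWORDS or not tok.isalpha():
            if current:
                runs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        runs.append(current)
    for run in runs:
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                yield " ".join(run[i:i + n])

def build_term_index(docs, max_len=3):
    """Count all candidate terms in the collection; nothing is truncated."""
    counts = Counter()
    for text in docs:
        counts.update(candidate_terms(text, max_len))
    return counts

def retrieve_terms(query, term_counts):
    """Return all indexed terms sharing a word with the query, most frequent first."""
    query_words = set(re.findall(r"[a-z]+", query.lower())) - STOPWORDS
    hits = [(t, c) for t, c in term_counts.items() if query_words & set(t.split())]
    return sorted(hits, key=lambda tc: (-tc[1], tc[0]))

docs = [
    "Term extraction for digital libraries.",
    "Automatic term extraction and term weighting.",
]
term_counts = build_term_index(docs)
results = retrieve_terms("term", term_counts)
```

Ordering by raw frequency is just one simple choice for the sketch; the slides do not say how retrieved terms are ranked or organized beyond the term hierarchy shown on the next slide.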

17 [Screenshot labels: Query in Spanish; Hierarchy of terms (Catalan, English, Spanish); Ranking of documents]

18 [Screenshot labels: Translingual variation; Morpho-syntactic variations (permutation, insertion); Semantic variations]
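The two morpho-syntactic variation types named on this slide, permutation and insertion, can be illustrated with a simple matcher. This is a sketch under loud assumptions: it ignores word order to allow permutations, tolerates a bounded number of extra content words to allow insertions, and uses a crude plural-stripping rule in place of real lemmatization. It is not how WTB detects variants; the stopword list and the names `normalize`, `content_words` and `is_variant` are hypothetical.

```python
import re
from collections import Counter

# Illustrative function-word list only (an assumption, not from the slides).
STOPWORDS = {"of", "the", "and", "for", "in", "a"}

def normalize(word):
    """Very crude stand-in for lemmatization: lowercase and strip a final 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def content_words(phrase):
    """Normalized non-stopword tokens of a phrase, in order."""
    return [
        normalize(w)
        for w in re.findall(r"[A-Za-z]+", phrase)
        if w.lower() not in STOPWORDS
    ]

def is_variant(term, phrase, max_insertions=1):
    """True if phrase contains every word of term, in any order (permutation),
    with at most max_insertions extra content words (insertion)."""
    term_bag = Counter(content_words(term))
    phrase_bag = Counter(content_words(phrase))
    missing = term_bag - phrase_bag
    extra = sum((phrase_bag - term_bag).values())
    return not missing and extra <= max_insertions

permutation = is_variant("information retrieval", "retrieval of information")
insertion = is_variant("information retrieval", "information and document retrieval")
unrelated = is_variant("information retrieval", "retrieval systems")
```

Here `permutation` and `insertion` are both true and `unrelated` is false. Semantic and translingual variations need lexical resources (synonyms, bilingual dictionaries) and are outside what this sketch covers.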

19 Usefulness of Term Browsing
                                     All queries   1-word queries   >1-word queries
First action after query
  Explore document from Google       42%           47%              39%
  Explore term                       51%           45%              55%
Source of last document explored
  Google ranking                     50%           57%              46%
  Explore term                       44%           38%              47%
2000 session logs at UNED.es, comparing:
  – Use of the term area from WTB
  – Use of the document area from Google

20 Conclusions
Browsing of phrases and terminology
User-oriented approach
Interaction over terminological information
  – An intermediate way between free searching and thesaurus-guided searching
  – Without the need for thesaurus construction
Website Term Browser
Brings the collection terminology to users
  – Morpho-syntactic & semantic variations
  – Translinguality
Evaluation
Users appreciate Term Browsing
WTB phrasal information can substantially complement the document ranking provided by search engines

