Published by Debra Brown. Modified over 9 years ago.
20th Century Esfinge (Sphinx) solving the riddles at CLEF 2005

Luís Costa
Luis.costa@sintef.no
Linguateca / SINTEF ICT
PB 124, Blindern
NO-0314 Oslo, Norway
http://www.linguateca.pt

Esfinge overview

Esfinge is a general domain question answering system. Its starting point was the architecture described in Brill, Eric. ‘Processing Natural Language without Natural Language Processing’, in A. Gelbukh (ed.), CICLing 2003, LNCS 2588, Springer-Verlag Berlin Heidelberg, 2003, pp. 360-9:
- Exploiting the redundancy that exists on the Web.
- Exploiting the fact that Portuguese is one of the most used languages on the Web.

The system is available on the Web and participated at CLEF 2004 and 2005. Two strategies were tested:
- Strategy 1: searching for the answers on the Web and using the CLEF document collection to confirm them.
- Strategy 2: searching for the answers only in the CLEF document collection.

Additional experiments using Strategy 1 were performed after error analysis and system debugging (Post-CLEF).

What was new at CLEF 2005?

- Use of the named entity recognizer SIEMES (detection of humans, countries, settlements, geographical locations, dates and quantities).
- A list of uninteresting websites (jokes, blogs, etc.).
- A Brazilian Portuguese document collection was available.
- Use of the stemmer Lingua::PT::Stemmer for the generalization of search patterns.
- Filtering of “undesired answers”. A list of these answers was built from the logs of last year’s participation and from tests performed afterwards.
- Searching for longer answers: the system does not stop when it finds an acceptable answer. Instead, it keeps searching for longer acceptable answers that contain it.
- Participation in the EN-PT multilingual task.
- Correction of problems detected last year.

Esfinge’s performance

Task             Experiment   # questions   # right   % right
CLEF 2005 PT-PT  Strategy 1   200           48*       24%
CLEF 2005 PT-PT  Strategy 2   200           43        22%
CLEF 2005 PT-PT  Post-CLEF    200           61        31%
CLEF 2005 EN-PT  Strategy 1   200           25        13%
CLEF 2004 PT-PT  Strategy 1   199           30        15%
CLEF 2004 PT-PT  Strategy 2   199           22        11%
CLEF 2004 PT-PT  Post-CLEF    199           55        28%

* Two further right answers were found after the official results were released.

In both participations, the runs using the Web (Strategy 1) scored slightly better than the runs using only the CLEF document collection (Strategy 2). With Strategy 2, however, the results for questions of type People and Date are better than the results for the other question types, and also better than the results for the same question types with Strategy 1. This suggests that both strategies are still worth experimenting with and studying further.

The analysis of the individual modules shows that the NER system helps mainly in questions of type “People”, “Quantity” and “Date”, while the morphological analyser is more influential in questions of type “Which X”, “Who was ” and “What is”. The results show that Esfinge improved compared to last year: the results are better with both this year’s and last year’s questions.

Esfinge on the Web

http://www.linguateca.pt/Esfinge/

[Flowchart: how Esfinge processes a question under Strategy 1 and Strategy 2. The layout was lost in extraction; the recoverable labels are listed here.]
- Steps: Question reformulation module (Answer Pattern 1 ... Answer Pattern n); Submission of answer patterns to Google; Passage extraction from CLEF document collection; Stem patterns (Stemmed Pattern 1 ... Stemmed Pattern n); SIEMES NER; Passages; N-gram harvesting; N-grams; Filters (A+B+C+D); Filters (B+C+D).
- Decision points: Doc.s found? Q. pattern enables use of NER? Any N-grams?
- Labelled branches: No => Stem patterns; Yes => SIEMES NER.
- Outcomes: Answer = best scored N-gram; Answer = NIL.

Filters:
- A: Interesting PoS
- B: Answer contained in question
- C: Undesired answer
- D: Supporting document
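
The N-gram harvesting step and filters B and C can be illustrated with a minimal Python sketch. This is not Esfinge's actual code. The function names, the plain frequency score, the tie-break in favour of longer N-grams, and the reading of filter B as "every word of the N-gram already occurs in the question" are all assumptions made for illustration. Filters A (interesting PoS) and D (supporting document) are left out because they need a tagger and the document collection.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (accents preserved)."""
    return re.findall(r"\w+", text.lower())


def harvest_ngrams(passages, max_n=3):
    """Count every word N-gram of 1..max_n words across all passages."""
    counts = Counter()
    for passage in passages:
        words = tokenize(passage)
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def passes_filters(ngram, question, undesired):
    """Apply hypothetical versions of filter B and filter C."""
    question_words = set(tokenize(question))
    # Filter B (assumed reading): the candidate only repeats question words.
    if all(word in question_words for word in ngram.split()):
        return False
    # Filter C: the candidate is on the list of undesired answers.
    if ngram in undesired:
        return False
    return True


def best_answer(question, passages, undesired=frozenset()):
    """Return the best-scored surviving N-gram, or "NIL" if none survives."""
    counts = harvest_ngrams(passages)
    candidates = [(count, ngram) for ngram, count in counts.items()
                  if passes_filters(ngram, question, undesired)]
    if not candidates:
        return "NIL"
    # Assumed scoring: highest count wins; ties go to the longer N-gram.
    return max(candidates, key=lambda c: (c[0], len(c[1].split())))[1]


question = "Quem foi o primeiro presidente do Brasil?"
passages = [
    "Deodoro da Fonseca foi o primeiro presidente do Brasil",
    "O marechal Deodoro da Fonseca proclamou a República",
    "Deodoro da Fonseca governou até 1891",
]
print(best_answer(question, passages))  # prints: deodoro da fonseca
```

Preferring the longer N-gram on a tie loosely mirrors the poster's "searching for longer answers" idea: "deodoro", "da fonseca" and "deodoro da fonseca" all occur three times, and the longest one is returned.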