Published by Steven Melvin Webb
1
Interactive Probabilistic Search for GikiCLEF
Ray R Larson
School of Information
University of California, Berkeley
2
September 21, 2007
CLEF 2009 -- Corfu, Greece
GikiCLEF Task
Perform QA-style retrieval for complex questions with geographic elements.
The task specifications were not really clear (to me, at least), since only Wikipedia article titles that WERE answers were acceptable, and not articles that CONTAINED answers if the article “type” was wrong.
3
GikiCLEF Task
Questions were VERY complex, such as:
Which countries have the white, green and red colors in their national flag?
Which authors were born in and write about the Bohemian Forest?
What Belgians won the Ronde van Vlaanderen exactly twice?
List the left side tributaries of the Po river
4
Approach to GikiCLEF
We had no idea how to handle many of these questions.
So we decided to devote our participation to exploring approaches via an interactive interface to the Cheshire II system.
We wanted to see which techniques would be effective (and which would not) in suggesting documents with relevant content.
At least until we realized that relevant content was not the same as a relevant answer.
5
Adapting Cheshire II for GikiCLEF
For this task we created an interactive version of the database, which included:
Multiple indexes, and a re-implementation of the links in the Wikipedia corpus as title searches that retrieve both identical and similar titles
Cross-searching between the different language corpora, since some parts of some queries were better searched in specific languages
For translation we relied on semi-intelligent translations (me) and occasionally on Babelfish.
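The slide does not say how the link-as-title-search was implemented inside Cheshire II. As a minimal sketch, assuming an in-memory list of article titles and using Python's standard `difflib` as a stand-in similarity measure (not Cheshire II's own matching), a link could be resolved to the identical title plus near-variant titles. The function name `title_search`, its parameters, and the example titles are hypothetical.

```python
import difflib


def title_search(link_text, titles, n=5, cutoff=0.6):
    """Resolve a link target as a title search (hypothetical sketch).

    Returns (exact, similar): titles identical to the link text (ignoring
    case), then the remaining titles among the n best difflib matches whose
    similarity ratio is at least `cutoff`.
    """
    key = link_text.strip().casefold()
    exact = [t for t in titles if t.casefold() == key]
    # Map case-folded titles back to their original spelling.
    folded = {t.casefold(): t for t in titles}
    close = difflib.get_close_matches(key, list(folded), n=n, cutoff=cutoff)
    similar = [folded[c] for c in close if folded[c] not in exact]
    return exact, similar


titles = ["Po River", "Po (river)", "Potenza", "Dora Baltea"]
exact, similar = title_search("po river", titles)
print(exact)    # ['Po River']
print(similar)  # ['Po (river)']
```

Lowering `cutoff` brings back more distant variants (such as differently disambiguated titles); any real threshold would have to be tuned against the collection.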
9
Not all topics are so simple
Some topics require multiple background searches to help determine possible answers.
E.g., searching for the football teams of all South American countries requires that you know all the South American countries.
Wikipedia List pages are often available for these kinds of questions, but they are not themselves considered relevant in this task.
For example…
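The two-step pattern described here (a background search to enumerate the entities, then a search for each entity, with List pages used but never returned) might be sketched as follows. The toy corpus (a deliberately partial list of countries), its titles, the function names, and the convention that List pages are recognised by a "List of" title prefix are all illustrative assumptions; a real run would query the search indexes rather than match titles.

```python
# Hypothetical toy corpus: article title -> article text.
CORPUS = {
    "List of South American countries": "Argentina\nBrazil\nChile\nUruguay",
    "Argentina national football team": "...",
    "Brazil national football team": "...",
    "Uruguay national football team": "...",
    "Italy national football team": "...",
    "List of Argentina national football team players": "...",
}


def background_search(corpus, list_title):
    """Step 1: enumerate candidate entities from a background List page."""
    return corpus[list_title].splitlines()


def answer_search(corpus, entities, phrase):
    """Step 2: find articles about each entity whose title contains the phrase.

    List pages help with step 1 but are never returned as answers.
    """
    answers = []
    for title in corpus:
        if title.startswith("List of"):
            continue  # not an acceptable answer in this task
        if phrase in title and any(title.startswith(e + " ") for e in entities):
            answers.append(title)
    return answers


countries = background_search(CORPUS, "List of South American countries")
teams = answer_search(CORPUS, countries, "national football team")
print(teams)
# ['Argentina national football team', 'Brazil national football team',
#  'Uruguay national football team']
```

Chile drops out only because the toy corpus has no page for it, and the Italian team is excluded because Italy never appears in the background list.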
12
NOT RELEVANT ANSWER?
13
RELEVANT, but there is no way to verify this without the flag page, or without images and computer vision analysis
14
Multilingual Search
I conducted the English search first and identified what I thought were relevant items.
Then I did the same with each other language, using the English results as a guide but remaining open to new relevant items not in the English collection.
I usually relied on translation approximations and cognates; e.g., Bulgarian had enough similarities to Russian for me to get a sense of the meaning.
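The "English results as a guide, but open to new items" rule can be expressed as a small merge step. This is a sketch under stated assumptions: the mapping from another language's titles to English titles is hypothetical (it could, for instance, be built from interlanguage links), and the Italian and English titles below are invented for illustration.

```python
def merge_with_english_guide(english_hits, lang_hits, to_english):
    """Split one language's hits into confirmed and new candidates.

    english_hits: titles judged relevant in the English search.
    lang_hits: titles found by the search in another language.
    to_english: hypothetical mapping from that language's titles to English
        titles. A hit is "confirmed" if it maps to an English result; every
        other hit is kept as a "new" candidate rather than discarded.
    """
    confirmed, new = [], []
    for title in lang_hits:
        if to_english.get(title) in english_hits:
            confirmed.append(title)
        else:
            new.append(title)
    return confirmed, new


english = {"Dora Baltea", "Ticino (river)"}
italian = ["Dora Baltea", "Ticino (fiume)", "Orco (torrente)"]
mapping = {"Dora Baltea": "Dora Baltea", "Ticino (fiume)": "Ticino (river)"}
confirmed, new = merge_with_english_guide(english, italian, mapping)
print(confirmed)  # ['Dora Baltea', 'Ticino (fiume)']
print(new)        # ['Orco (torrente)']
```

The "new" list is exactly the set of items a human searcher would still need to check by hand, using translation approximations or cognates.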
15
Search Methods
Basic ranked search was not very effective on its own.
Ranked search with all terms required (a Boolean constraint) was more effective.
Sometimes the best approach was a simple Boolean exact match on names, but searching across languages may require ranked approximate searches too.
We definitely need a type classification index for pages that can be used to constrain results.
We probably also need to use link indexes more often (to find pages of the correct type that link to the relevant page).
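To make the contrast between these search modes concrete, here is a toy sketch. It assumes a hypothetical six-page collection with a hand-assigned page type and outgoing links for each page, and it uses a plain smoothed tf-idf score as a stand-in for Cheshire II's probabilistic ranking. The type filter and link lookup play the roles of the type classification and link indexes mentioned above; none of these function names come from Cheshire II.

```python
import math
from collections import Counter

# Hypothetical pages: title -> (page type, text, outgoing links).
PAGES = {
    "Italy": ("country", "italy country europe rome", ["Flag of Italy"]),
    "Flag of Italy": ("flag", "green white red vertical tricolour", []),
    "Ireland": ("country", "ireland country europe dublin", ["Flag of Ireland"]),
    "Flag of Ireland": ("flag", "green white orange vertical tricolour", []),
    "Mexico": ("country", "mexico country america", ["Flag of Mexico"]),
    "Flag of Mexico": ("flag", "green white red tricolour eagle", []),
}


def idf(term):
    """Smoothed inverse document frequency over the toy collection."""
    df = sum(1 for _, text, _ in PAGES.values() if term in text.split())
    return math.log(1 + len(PAGES) / df) if df else 0.0


def ranked(query, require_all=False, page_type=None):
    """tf-idf ranked search, optionally with an AND constraint and a type filter."""
    terms = query.lower().split()
    results = []
    for title, (ptype, text, _) in PAGES.items():
        if page_type is not None and ptype != page_type:
            continue  # type-classification constraint
        tf = Counter(text.split())
        if require_all and not all(tf[t] for t in terms):
            continue  # Boolean constraint: every query term must occur
        score = sum(tf[t] * idf(t) for t in terms)
        if score > 0:
            results.append((title, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


def exact_title(name):
    """Boolean exact match on a page name."""
    return [t for t in PAGES if t == name]


def linking_pages(targets, page_type):
    """Link index: pages of the wanted type that link to any target page."""
    return [
        title
        for title, (ptype, _, links) in PAGES.items()
        if ptype == page_type and any(link in targets for link in links)
    ]


query = "white green red"
print(ranked(query))  # all three flag pages; the Irish flag (no "red") ranks last
flags = {t for t, _ in ranked(query, require_all=True)}
print(sorted(flags))  # ['Flag of Italy', 'Flag of Mexico']
print(ranked(query, page_type="country"))  # []: country pages lack the color terms
print(linking_pages(flags, "country"))  # ['Italy', 'Mexico']
```

In this toy collection the color terms occur only on flag pages, so a type constraint alone returns nothing; following links from the matching flag pages back to country pages recovers answers of the right type.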
16
Limitations
Interactive search was a very slow process, often taking several hours per question.
The short test period fell during a family vacation.
It is more fun to go out to dinner at a nice restaurant than to try to translate Bulgarian.
17
Limitations
As a result of the preceding constraints, I was able to complete only 22 of the 50 topics, but each included all languages whenever possible.
In spite of this (and since scoring penalizes wrong answers as well as rewarding correct ones…), I managed to score pretty well.
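Why a partial run can still score well depends on how wrong answers are penalized. As an illustration only, assume a per-language score of the form C × (C / N), where C is the number of correct answers and N the total number of answers returned; this form is an assumption here, so check the official GikiCLEF guidelines before relying on it. The run sizes below are hypothetical, not actual GikiCLEF results.

```python
def precision_weighted_score(correct, returned):
    """Assumed score: correct answers times precision, i.e. C * (C / N)."""
    if returned == 0:
        return 0.0
    return correct * (correct / returned)


# Hypothetical runs, not actual GikiCLEF results:
cautious = precision_weighted_score(10, 12)    # 10 right out of 12 returned
aggressive = precision_weighted_score(14, 40)  # 14 right out of 40 returned
print(round(cautious, 2), round(aggressive, 2))  # 8.33 4.9
```

Under a measure of this kind, every wrong answer lowers precision, so a careful run that answers fewer topics can outscore a broader run with more correct but also many more incorrect answers.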
18
Results from GikiCLEF Site
19
Conclusions
The interactive approach revealed some search strategies that might be exploited automatically.
Automatic approaches that rely only on conventional IR techniques will probably continue to lag behind the “knowledge-based” approaches used for this task.
The trick for the future will be to combine the two effectively.