Interactive Probabilistic Search for GikiCLEF. Ray R Larson, School of Information, University of California, Berkeley.

1 Interactive Probabilistic Search for GikiCLEF
Ray R Larson, School of Information, University of California, Berkeley

2 September 21, 2007 CLEF 2009 -- Corfu, Greece GikiCLEF Task
- Perform QA style retrieval for complex questions with geographic elements
- Task specifications were not really clear (to me at least) since only Wikipedia article titles that WERE answers were acceptable, and not articles that CONTAINED answers if the article “type” was wrong

3 GikiCLEF Task
- Questions were VERY complex, such as:
  - Which countries have the white, green and red colors in their national flag?
  - Which authors were born in and write about the Bohemian Forest?
  - What Belgians won the Ronde van Vlaanderen exactly twice?
  - List the left side tributaries of the Po river

4 Approach to GikiCLEF
- We had no idea about how to handle many of these questions
- So, we decided to devote our participation to exploring approaches via an interactive interface to the Cheshire II system
- We wanted to see what techniques would be effective (and which not) in suggesting documents with relevant content
  - At least until we realized that relevant content was not a relevant answer

5 Adapting Cheshire II for GikiCLEF
- For this task we created an interactive version of the database which included:
  - Multiple indexes and re-implementation of links in the Wikipedia corpus as title searches, to retrieve both identical and similar titles
  - Cross-searching between the different language corpora
    - Some parts of some queries were better searched in specific languages
    - Relied on semi-intelligent translations (me) and occasionally Babelfish
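
The idea of following a link by running a title search can be sketched as below. This is only a minimal illustration: the toy title list, the `resolve_link` and `normalize` names, and the use of `difflib` string similarity for "similar titles" are all assumptions made here, not a description of how Cheshire II indexes or matches titles.

```python
import difflib

def normalize(title):
    """Lowercase and treat underscores as spaces (hypothetical normalization)."""
    return " ".join(title.replace("_", " ").lower().split())

def resolve_link(link_target, titles, cutoff=0.6, limit=5):
    """Return (exact, similar): titles identical to the link target after
    normalization, and other titles ranked by string similarity."""
    wanted = normalize(link_target)
    by_norm = {}
    for t in titles:
        by_norm.setdefault(normalize(t), []).append(t)
    exact = by_norm.get(wanted, [])
    # Only titles that are not exact matches compete as "similar" titles.
    candidates = [n for n in by_norm if n != wanted]
    close = difflib.get_close_matches(wanted, candidates, n=limit, cutoff=cutoff)
    similar = [t for n in close for t in by_norm[n]]
    return exact, similar

# Illustrative titles only, not taken from the actual corpus.
TITLES = ["Po (river)", "Po River", "Ticino (river)", "Bohemian Forest"]
exact, similar = resolve_link("po_river", TITLES)
```

Returning the exact and similar matches separately lets an interactive searcher look at identical titles first and fall back to near matches only when needed.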

9 Not all topics are so simple
- Some topics require multiple background searches to help determine possible answers
  - E.g. Searching for the football teams of all South American countries requires that you know all South American Countries
- Often Wikipedia List pages are available for these kinds of questions -- but are not themselves considered relevant in this task
- For example…
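
The two-step pattern described on this slide (a background search to find the entities, then one search per entity for pages of the right type) might look roughly like the following. The in-memory `PAGES` corpus, the page-type labels, and the `search` and `two_stage` helpers are hypothetical stand-ins, not part of the interface that was actually used.

```python
# Toy corpus: title -> (page type, text). Entries are illustrative only.
PAGES = {
    "List of South American countries": ("list", "Argentina Brazil Chile"),
    "Argentina national football team": ("team", "football team of Argentina"),
    "Brazil national football team": ("team", "football team of Brazil"),
    "Chile national football team": ("team", "football team of Chile"),
    "Brazil": ("country", "country in South America"),
}

def search(terms, page_type=None):
    """Boolean AND over title and text tokens, optionally limited to one page type."""
    wanted = {t.lower() for t in terms}
    hits = []
    for title, (ptype, text) in PAGES.items():
        tokens = set((title + " " + text).lower().split())
        if wanted <= tokens and (page_type is None or ptype == page_type):
            hits.append(title)
    return sorted(hits)

def two_stage(background_terms, answer_terms, answer_type):
    # Stage 1: the background search. List pages supply the entities,
    # but are never returned as answers themselves.
    entities = []
    for title in search(background_terms, page_type="list"):
        entities.extend(PAGES[title][1].split())
    # Stage 2: one constrained search per entity found in stage 1.
    answers = []
    for entity in entities:
        answers.extend(search(answer_terms + [entity], page_type=answer_type))
    return answers

teams = two_stage(["South", "American", "countries"], ["football", "team"], "team")
```

Here the list page drives the second round of searches without appearing in `teams`, which mirrors the rule that list pages were not themselves acceptable answers.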

12 NOT RELEVANT ANSWER?

13 RELEVANT - But no way to verify without the flag page or images and computer vision analysis

14 Multilingual Search
- Conducted English search first and identified what I thought were relevant items
- Then did the same with each other language using the English results as a guide, but open to new relevant items not in the English collections
- Usually relied on translation approximations and cognates
  - E.g. Bulgarian had enough similarities to Russian to be able to get a sense of the meaning

15 Search Methods
- Basic ranked search was not very effective on its own
- Using ranked search with all terms required (boolean constraint) was more effective
- Sometimes the best approach was a simple Boolean exact match on names
  - But when going across languages may need ranked approximate searches too
- Definitely need a type classification index for pages that can be used to constrain results
- Probably need to use link indexes more often too (to find pages of the correct type that link to the relevant page)
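
The difference between plain ranked search and ranked search with all terms required can be sketched as follows. A simple TF-IDF sum is swapped in as the ranking function purely for illustration; it is not the probabilistic ranking that Cheshire II uses, and the `DOCS` collection and `ranked_search` function are hypothetical.

```python
import math
from collections import Counter

# Toy collection, made up for this example.
DOCS = {
    "d1": "flag of italy green white red tricolour",
    "d2": "flag of hungary red white green stripes",
    "d3": "flag of japan white with red disc",
    "d4": "green party red white red red",
}

def tokenize(text):
    return text.lower().split()

def ranked_search(query, docs, require_all=False):
    """Score docs by a simple TF-IDF sum. With require_all=True, keep only
    docs that contain every query term (the Boolean constraint)."""
    terms = tokenize(query)
    tfs = {d: Counter(tokenize(t)) for d, t in docs.items()}
    n = len(docs)
    df = {t: sum(1 for tf in tfs.values() if t in tf) for t in terms}
    results = []
    for d, tf in tfs.items():
        if require_all and not all(t in tf for t in terms):
            continue
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in terms if df[t])
        if score > 0:
            results.append((d, score))
    # Highest score first; ties broken by document id for stable output.
    return sorted(results, key=lambda r: (-r[1], r[0]))

free = ranked_search("flag white green red", DOCS)
strict = ranked_search("flag white green red", DOCS, require_all=True)
```

In this toy collection the unconstrained search ranks `d4` first because it repeats "red", even though it never mentions a flag; requiring all terms leaves only `d1` and `d2`.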

16 Limitations
- Interactive search was a very slow process
  - Often took several hours per question
- The short test period fell during a family vacation
- It is more fun to go out to dinner at a nice restaurant than to try to translate Bulgarian

17 Limitations
- As a result of the preceding constraints I was only able to complete 22 of 50 topics
  - But each included all languages whenever possible
- In spite of this (and since scoring penalizes wrong answers as well as rewarding correct ones…) managed to score pretty well
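
The slides do not give the scoring formula. As a rough illustration of how a measure that penalizes wrong answers can favor a small, careful run, the sketch below assumes a per-language score of correct² / returned (precision times the number of correct answers). Both the formula and the counts are assumptions for this example, not results from the task.

```python
def language_score(correct, returned):
    """Assumed score: precision times number correct, i.e. correct**2 / returned."""
    if returned == 0:
        return 0.0
    return correct * correct / returned

# Hypothetical runs: a short careful run versus a longer, noisier one.
careful = language_score(correct=10, returned=12)
broad = language_score(correct=12, returned=40)
```

Under this assumed measure the careful run scores higher despite finding fewer correct answers, because most of what it returned was right.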

18 Results from GikiCLEF Site

19 Conclusions
- The interactive approach showed some search strategies that might be exploited automatically
- Automatic approaches that rely only on conventional IR techniques will probably continue to lag the “knowledge-based” approaches used for this task
- The trick for the future will be trying to effectively combine the two

