Interactive Probabilistic Search for GikiCLEF
Ray R Larson
School of Information, University of California, Berkeley

September 21, 2007 CLEF Corfu, Greece

GikiCLEF Task
- Perform QA-style retrieval for complex questions with geographic elements
- Task specifications were not really clear (to me at least), since only Wikipedia article titles that WERE answers were acceptable, and not articles that CONTAINED answers if the article "type" was wrong

GikiCLEF Task
- Questions were VERY complex, such as:
  - Which countries have the white, green and red colors in their national flag?
  - Which authors were born in and write about the Bohemian Forest?
  - What Belgians won the Ronde van Vlaanderen exactly twice?
  - List the left side tributaries of the Po river

Approach to GikiCLEF
- We had no idea how to handle many of these questions
- So we decided to devote our participation to exploring approaches via an interactive interface to the Cheshire II system
- We wanted to see which techniques would be effective (and which not) in suggesting documents with relevant content
  - At least until we realized that relevant content was not a relevant answer

Adapting Cheshire II for GikiCLEF
- For this task we created an interactive version of the database, which included:
  - Multiple indexes, and a re-implementation of links in the Wikipedia corpus as title searches that retrieve both identical and similar titles
  - Cross-searching between the different language corpora
- Some parts of some queries were better searched in specific languages
- Relied on semi-intelligent translations (me) and occasionally Babelfish

Not all topics are so simple
- Some topics require multiple background searches to help determine possible answers
  - E.g., searching for the football teams of all South American countries requires that you know all South American countries
- Often Wikipedia list pages are available for these kinds of questions, but they are not themselves considered relevant in this task
- For example…
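The two-step pattern described on this slide (a background search to enumerate entities, then one search per entity) can be sketched roughly as follows. This is a minimal illustration, not the Cheshire II interface: the title index, the link index, the page types, and both helper functions are hypothetical stand-ins invented for the example.

```python
# Toy title index standing in for a Wikipedia collection (illustrative only).
# Maps page title -> assumed page type.
TITLES = {
    "List of South American countries": "list",
    "Argentina": "country",
    "Brazil": "country",
    "Argentina national football team": "team",
    "Brazil national football team": "team",
    "France national football team": "team",
}

# Hypothetical link index: page title -> titles it links to.
LINKS = {
    "List of South American countries": ["Argentina", "Brazil"],
}


def background_entities(list_page, wanted_type):
    """Step 1: use a list page only to enumerate candidate entities."""
    return [t for t in LINKS.get(list_page, []) if TITLES.get(t) == wanted_type]


def answers_for(entities, suffix, answer_type):
    """Step 2: run one exact title search per entity found in step 1."""
    found = []
    for entity in entities:
        candidate = f"{entity} {suffix}"
        if TITLES.get(candidate) == answer_type:
            found.append(candidate)
    return found


countries = background_entities("List of South American countries", "country")
teams = answers_for(countries, "national football team", "team")
```

The list page drives the search but never appears in `teams`, which mirrors the constraint that list pages are useful background yet are not accepted as answers.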

NOT RELEVANT ANSWER?

RELEVANT - But no way to verify without the flag page, or images and computer vision analysis

Multilingual Search
- Conducted the English search first and identified what I thought were relevant items
- Then did the same with each other language, using the English results as a guide but staying open to new relevant items not in the English collection
- Usually relied on translation approximations and cognates
  - E.g., Bulgarian had enough similarities to Russian to get a sense of the meaning

Search Methods
- Basic ranked search was not very effective on its own
- Using ranked search with all terms required (Boolean constraint) was more effective
- Sometimes the best approach was a simple Boolean exact match on names
  - But when going across languages, ranked approximate searches may be needed too
- Definitely need a type classification index for pages that can be used to constrain results
- Probably need to use link indexes more often too (to find pages of the correct type that link to the relevant page)
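As a rough sketch of the ideas on this slide, the snippet below ranks pages while optionally requiring every query term (the Boolean constraint) and optionally filtering by page type. It is an assumption-laden toy: a simple TF-IDF-style score stands in for the system's probabilistic ranking, and the corpus, the page types, and the `ranked_search` function are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: page title -> (page type, text). Not real task data.
PAGES = {
    "Italy": ("country", "italy flag green white red tricolour"),
    "Flag of Italy": ("flag", "flag italy green white red vertical bands"),
    "Po (river)": ("river", "po river italy tributaries left right"),
    "Hungary": ("country", "hungary flag red white green horizontal"),
}


def tokenize(text):
    return text.lower().split()


def ranked_search(query, require_all=False, page_type=None):
    """Rank pages by a simple TF-IDF-style score (a stand-in scoring function).

    require_all=True adds the Boolean AND constraint: a page must contain
    every query term. page_type restricts results to a single page type.
    """
    terms = tokenize(query)
    docs = {title: Counter(tokenize(text)) for title, (_, text) in PAGES.items()}
    n_docs = len(docs)
    # Document frequency of each query term.
    df = {t: sum(1 for tf in docs.values() if t in tf) for t in terms}
    results = []
    for title, tf in docs.items():
        if page_type is not None and PAGES[title][0] != page_type:
            continue
        if require_all and not all(t in tf for t in terms):
            continue
        score = sum(tf[t] * math.log(1 + n_docs / df[t]) for t in terms if df[t])
        if score > 0:
            results.append((title, score))
    # Highest score first; ties broken alphabetically by title.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))
```

For example, `ranked_search("italy flag")` returns every page that mentions either term, `ranked_search("italy flag", require_all=True)` keeps only pages containing both, and adding `page_type="country"` narrows that further to pages of the wanted type.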

Limitations
- Interactive search was a very slow process
  - Often took several hours per question
- The short test period fell during a family vacation
  - It is more fun to go out to dinner at a nice restaurant than to try to translate Bulgarian

Limitations
- As a result of the preceding constraints I was only able to complete 22 of 50 topics
  - But each included all languages whenever possible
- In spite of this (and since scoring penalizes wrong answers as well as rewarding correct ones…) I managed to score pretty well
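To illustrate why answering fewer topics carefully need not hurt under a measure that penalizes wrong answers, here is a small sketch. It assumes a precision-weighted score of the form C*C/N per language (C correct answers out of N returned), summed over languages. That formula is an assumption for illustration and is not stated on the slide, and the function names are hypothetical.

```python
def language_score(correct, returned):
    """Precision-weighted score: correct * (correct / returned).

    Assumed form of the measure; returns 0.0 when nothing was returned.
    """
    if returned == 0:
        return 0.0
    return correct * correct / returned


def total_score(per_language):
    """Sum the per-language scores.

    per_language maps a language code to a (correct, returned) pair.
    """
    return sum(language_score(c, n) for c, n in per_language.values())
```

Under this assumed measure, returning 10 answers that are all correct scores 10.0, while padding the same 10 correct answers with 10 wrong ones halves the score to 5.0.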

Results from GikiCLEF Site

Conclusions
- The interactive approach showed some search strategies that might be exploited automatically
- Automatic approaches that rely only on conventional IR techniques will probably continue to lag the "knowledge-based" approaches used for this task
- The trick for the future will be trying to effectively combine the two