Towards Natural Question-Guided Search Alexander Kotov ChengXiang Zhai University of Illinois at Urbana-Champaign.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,
Search in Source Code Based on Identifying Popular Fragments Eduard Kuric and Mária Bieliková Faculty of Informatics and Information.
SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon Presented by CHARANYA VENKATESH KUMAR.
WSCD INTRODUCTION  Query suggestion has often been described as the process of making a user query resemble more closely the documents it is expected.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Dialogue – Driven Intranet Search Suma Adindla School of Computer Science & Electronic Engineering 8th LANGUAGE & COMPUTATION DAY 2009.
Information Retrieval in Practice
Search Engines and Information Retrieval
Information Retrieval Ling573 NLP Systems and Applications April 26, 2011.
Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,
1 CS 430 / INFO 430 Information Retrieval Lecture 8 Query Refinement: Relevance Feedback Information Filtering.
April 22, Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Doerre, Peter Gerstl, Roland Seiffert IBM Germany, August 1999 Presenter:
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
1 CS 430: Information Discovery Lecture 20 The User in the Loop.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.
Basic IR Concepts & Techniques ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Scalable Text Mining with Sparse Generative Models
Overview of Search Engines
1 Prototype Hierarchy Based Clustering for the Categorization and Navigation of Web Collections Zhao-Yan Ming, Kai Wang and Tat-Seng Chua School of Computing,
On the Use of Regular Expressions for Searching Text Charles L.A. Clarke and Gordon V. Cormack Fast Text Searching.
Search Engines and Information Retrieval Chapter 1.
Probabilistic Model for Definitional Question Answering Kyoung-Soo Han, Young-In Song, and Hae-Chang Rim Korea University SIGIR 2006.
SEEKING STATEMENT-SUPPORTING TOP-K WITNESSES Date: 2012/03/12 Source: Steffen Metzger (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling Koh 1.
Evaluation Experiments and Experience from the Perspective of Interactive Information Retrieval Ross Wilkinson Mingfang Wu ICT Centre CSIRO, Australia.
IR Evaluation Evaluate what? –user satisfaction on specific task –speed –presentation (interface) issue –etc. My focus today: –comparative performance.
GLOSSARY COMPILATION Alex Kotov (akotov2) Hanna Zhong (hzhong) Hoa Nguyen (hnguyen4) Zhenyu Yang (zyang2)
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
Querying Structured Text in an XML Database By Xuemei Luo.
April 14, 2003Hang Cui, Ji-Rong Wen and Tat- Seng Chua 1 Hierarchical Indexing and Flexible Element Retrieval for Structured Document Hang Cui School of.
21/11/2002 The Integration of Lexical Knowledge and External Resources for QA Hui YANG, Tat-Seng Chua Pris, School of Computing.
Context-Sensitive Information Retrieval Using Implicit Feedback Xuehua Shen : department of Computer Science University of Illinois at Urbana-Champaign.
Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.
Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session Summarized.
IR Theory: Relevance Feedback. Relevance Feedback: Example  Initial Results Search Engine2.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
Vector Space Models.
Semi-Automatic Image Annotation Liu Wenyin, Susan Dumais, Yanfeng Sun, HongJiang Zhang, Mary Czerwinski and Brent Field Microsoft Research.
Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning Rong Jin, Joyce Y. Chai Michigan State University Luo Si Carnegie.
Information Retrieval
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Comparing Document Segmentation for Passage Retrieval in Question Answering Jorg Tiedemann University of Groningen presented by: Moy’awiah Al-Shannaq
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Post-Ranking query suggestion by diversifying search Chao Wang.
Active Feedback in Ad Hoc IR Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Generating Query Substitutions Alicia Wood. What is the problem to be solved?
Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China.
Relevance Models and Answer Granularity for Question Answering W. Bruce Croft and James Allan CIIR University of Massachusetts, Amherst.
Chapter. 3: Retrieval Evaluation 1/2/2016Dr. Almetwally Mostafa 1.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
A Framework to Predict the Quality of Answers with Non-Textual Features Jiwoon Jeon, W. Bruce Croft(University of Massachusetts-Amherst) Joon Ho Lee (Soongsil.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Query Refinement and Relevance Feedback.
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
Facilitating Document Annotation Using Content and Querying Value.
WHY IS THIS HAPPENING IN THE PROGRAM? Session 5 Options for Further Investigation & Information Flow.
CS791 - Technologies of Google Spring A Web­based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
Enterprise Track: Thread-based Retrieval Enterprise Track: Thread-based Retrieval Yejun Wu and Douglas W. Oard Goal Explore -- document expansion.
LEARNING IN A PAIRWISE TERM-TERM PROXIMITY FRAMEWORK FOR INFORMATION RETRIEVAL Ronan Cummins, Colm O’Riordan (SIGIR’09) Speaker : Yi-Ling Tai Date : 2010/03/15.
Information Retrieval in Practice
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Information Retrieval and Web Design
Presentation transcript:

Towards Natural Question-Guided Search Alexander Kotov ChengXiang Zhai University of Illinois at Urbana-Champaign WWW 2010, Raleigh April 30, 2010

Motivation keyword search: “john kennedy” Information need is not completely specified Asking a questions is the most natural way to express information need Information need usually corresponds to an underlying question Accurate determination of the underlying question may save user effort substantially How can we do that? Idea: guess the information need by automatically generating and presenting clarification questions and short answers to them. 1

Motivation Clicking on the answer to a question redirects a user to the page where the answer is contained for more information 2

Motivation Clicking on a question brings up both a new result list and a list of questions reflecting the user feedback 3

Benefits Questions correspond to particular facts or aspects of a keyword query topic Questions allow ad hoc exploration of query topics without any prior knowledge about them More natural way of engaging users into interactive feedback Questions are shortcut to the answers or can progressively guide users to the answers they are looking for Questions can always be answered 4

Question-guided search How to index the right content? How to use the index to generate questions? How to rank generated questions? How to interpret and utilize the question-based user feedback? Problem: How to automatically generate such clarification questions? 5

Roadmap Indexing Question generation Ranking Feedback Evaluation

Syntactic Parsing (Minipar) found Mary solution a to the problem subj:person obj det mod pcomp-n Mary found a solution to the problem. headmodifier det Dependency parsing helps to avoid language variability (“After thinking for a while, Mary found a solution to the difficult problem”) 6

Syntactic pattern Smaller number of semantic patterns, compared to surface patterns Patterns are manually specified in the configuration file (we used 32 patterns) Indexing: searching document collection for occurrences of syntactic patterns (tree automata). semantic label syntactic label verb subj:pers pred mod pc:loc Main idea: locate instances of syntactic patterns during indexing and convert those instances into questions for a given query 7 <question focus=“2" text="Where {1:term} {2:stem} {3:term}?" /> <question focus=“2" text="Where {1:term} {2:stem} {3:term}?" />

Examples of patterns 0:verb 1:s (person) 2:mod 3:pc (loc) 4:mod 5:pc (date) Wilson lived in Columbia, South Carolina, the state capital, from , where his father was professor at the Columbia Theological Seminary. Voight is the father of actress Angelina Jolie (Angelina Jolie Voight is her birth name) and actor James Haven 0:verb 1:s (person) 2:pred 3:mod 4:pc (person)

Indexing Kennedy was born at 83 Beals Street in Brookline, Massachusetts on Tuesday, May 29, 1917 at 3:00pm, the second son of Joseph Kennedy and Rose Fitzgerald. 2 verb subj:pers pred mod pcomp:loc dictionary idterm 1kennedy 2was 3be 4born 5in 6brookline 7massachusetts instances iidsiddidpidstidtidslidslpos Terms filling the slots of pattern instances Pattern instances 9

Big picture Index collection patterns instances query questions 10

Roadmap Motivation Indexing Question generation Ranking Feedback Evaluation

Question generation Kennedy was born at 83 Beals Street in Brookline, Massachusetts on Tuesday, May 29, 1917 at 3:00pm, the second son of Joseph Kennedy and Rose Fitzgerald verb subj:pers pred mod pcomp:loc Question templates: 1.Who [1:term] [3:term] [4:term] [5:stem]? 2.Where [1:term][2:stem][3:term]? Question templates: 1.Who [1:term] [3:term] [4:term] [5:stem]? 2.Where [1:term][2:stem][3:term]? 2 Questions: 1.Who was born in Brookline, Massachusetts ? 2.Where was Kennedy born ? Questions: 1.Who was born in Brookline, Massachusetts ? 2.Where was Kennedy born ?

Roadmap Motivation Indexing Question generation Ranking Feedback Evaluation

Ranking QT : number of query terms that occur both in the query and the question (questions containing more query terms are more relevant) PM : number of query terms that occur both in the query and the slots of the pattern instance, from which the question was generated (specific vs. general patterns) DS : the retrieval score of the query with respect to the document that contains the instance (allows to incorporate standard retrieval methods) Each ranking heuristic is function, that maps a question into a real number. 12

Ranking Ranking function is a binary function on question pairs such that if, Given a query, which matches 3 pattern instances,, and generates 6 questions 13

Question generation algorithm search index for instances of syntactic patterns matching query terms locate question templates containing query terms for each instance instantiate the slots of question templates containing query terms compute ranking heuristics determine the position of a generated question in the ranked list 14

Roadmap Motivation Indexing Question generation Ranking Feedback Evaluation

Feedback Kennedy was born at 83 Beals Street in Brookline, Massachusetts on Tuesday, May 29, 1917 at 3:00pm, the second son of Joseph Kennedy and Rose Fitzgerald o Original query: “john kennedy” o Clicked question: Where was Kennedy born? o Expanded query: “john kennedy born brookline massachusetts” o New query brings up questions about brookline massachusetts Feedback cycle: clicking on a question generates an expanded query, which is resubmitted to the system and retrieves a new set of questions 15

Sample session 1 16

Sample session 1 17

Sample session 2 18

Sample session 2 19

Sample session 2 20

Roadmap Motivation Indexing Question generation Ranking Feedback Evaluation

Dataset: 3000 most viewed articles in 2009 combined with the biographic articles about the famous Americans Total of articles ~300 Mb 30 top-ranked questions presented for evaluation for each query Users could submit their own queries or click on the list of pre-defined ones Users were asked to judge well-formedness, interestingness and relevance of questions 21

Evaluation Feedback questions are the most interesting ones; Overall click-through rate >3% (users clicked on at least one question); Relevance is the highest for normal queries. 22

Evaluation Question based feedback aggressively refines the information need by bringing up a small number of highly relevant questions to the top of the question list; Question feedback improves the ranking by bringing the highly relevant and interesting questions to the first 3-4 positions in the ranked list. 23

Evaluation Relevance of questions is strongly correlated with interestingness Users mostly click on medium length (3,4,5-word) questions and find such medium length questions to be more interesting 24

Evaluation Compared distribution of clicked, interesting and relevant questions across question groups, determined by the head word Users found factual questions (what, who) to be more interesting than questions about time and location 25

Concluding remarks New framework for interactive search Proposed methods for all components of the retrieval process: indexing of syntactic patterns, questions generation based on templates, ranking and feedback Implemented the idea in a prototype working with a subset of Wikipedia Experimental results show the promise of the proposed framework 26

Future work Aggregation of semantically related questions and alternative question presentation interface Methods for automatic induction of syntactic patterns and interesting question templates from text collections Alternative question ranking functions (learning-to-rank) Using external resources (knowledge bases) to generate refinement questions independent of document collections 27

Thank you!