Query Processing: Query Formulation
Ling573 NLP Systems and Applications
April 14, 2011

Roadmap
- Motivation: retrieval gaps
- Query formulation: question series
- Query reformulation: AskMSR patterns
- MULDER parse-based formulation
- Classic query expansion
  - Semantic resources
  - Pseudo-relevance feedback

Retrieval Gaps
- Goal: based on the question, retrieve the documents/passages that best capture the answer
- Problem: mismatches in lexical choice and sentence structure
  - Q: How tall is Mt. Everest?
    A: The height of Everest is…
  - Q: When did the first American president take office?
    A: George Washington was inaugurated in…

Query Formulation
- Goals: overcome lexical gaps & structural differences
  - To enhance basic retrieval matching
  - To improve target sentence identification
- Issues & approaches:
  - Differences in word forms: morphological analysis
  - Differences in lexical choice: query expansion
  - Differences in structure

Query Formulation
- Convert the question into a form suitable for IR
- Strategy depends on the document collection:
  - Web (or similarly large collection): 'stop structure' removal
    - Delete function words, question words, even low-content verbs
  - Corporate sites (or similarly small collections): query expansion
    - Can't count on document diversity to recover word variation
    - Add morphological variants; use WordNet as a thesaurus
  - Reformulate as a declarative, rule-based:
    - Where is X located? -> X is located in
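
A minimal sketch of the two web-style steps above – 'stop structure' removal and one rule-based declarative rewrite. The stopword list and the single rule are illustrative assumptions, not the actual course system's code.

```python
# Sketch of 'stop structure' removal plus one declarative rewrite rule.
# Word list and rule are illustrative only.
import re
from typing import Optional

STOP_STRUCTURE = {
    "what", "who", "when", "where", "which", "how",
    "is", "are", "was", "were", "do", "does", "did",
    "the", "a", "an", "of", "in", "to", "for",
}

def strip_stop_structure(question: str) -> str:
    """Keep only content terms (for large, redundant collections like the web)."""
    tokens = re.findall(r"\w+", question.lower())
    return " ".join(t for t in tokens if t not in STOP_STRUCTURE)

def rule_based_declarative(question: str) -> Optional[str]:
    """'Where is X located?' -> 'X is located in' (the rule from the slide)."""
    m = re.match(r"where is (.+?) located\??$", question.strip(), re.IGNORECASE)
    return m.group(1) + " is located in" if m else None

print(strip_stop_structure("Where is the Louvre Museum located?"))
# -> louvre museum located
print(rule_based_declarative("Where is the Louvre Museum located?"))
# -> the Louvre Museum is located in
```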

Question Series
- TREC 2003–…
- Target: PERS, ORG, …
- Assessors create a series of questions about the target
- Intended to model interactive Q/A, but often stilted
- Introduces pronouns and anaphora

Handling Question Series
- Given a target and a series, how do we deal with reference?
- Shallowest approach – concatenation: add the 'target' to the question
- Shallow approach – replacement: replace all pronouns with the target
- Least shallow approach: heuristic reference resolution
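
A rough sketch of the two shallow strategies, assuming a hand-picked pronoun list; this is not the TREC systems' actual code.

```python
# Concatenation vs. pronoun replacement for question series.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "his", "hers", "its", "their", "theirs"}

def concatenate(target: str, question: str) -> str:
    """Shallowest approach: just prepend the target to the question."""
    return target + " " + question

def replace_pronouns(target: str, question: str) -> str:
    """Shallow approach: swap every pronoun for the target."""
    words = question.split()
    out = [target if w.lower().strip("?.,") in PRONOUNS else w for w in words]
    return " ".join(out)

target = "Nirvana"
print(concatenate(target, "What is their biggest hit?"))
# -> Nirvana What is their biggest hit?
print(replace_pronouns(target, "What is their biggest hit?"))
# -> What is Nirvana biggest hit?   (loses the possessive)
print(replace_pronouns(target, "When was the band formed?"))
# -> unchanged: 'the band' is a definite NP, not a pronoun, so replacement misses it
```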

Question Series Results
- No clear winning strategy
- All questions are largely about the target
  - So no big win for anaphora resolution
  - If using bag-of-words features in search, works fine
- The 'replacement' strategy can be problematic
  - E.g., target = Nirvana: "What is their biggest hit?" "When was the band formed?"
  - Wouldn't replace 'the band'
- Most teams concatenate

AskMSR
Shallow processing for QA (Dumais et al. 2002, Lin 2007)

Intuition
- Redundancy is useful!
  - If similar strings appear in many candidate answers, they are likely to be the solution
  - Even if we can't find obvious answer strings
- Q: How many times did Bjorn Borg win Wimbledon?
  - Bjorn Borg blah blah blah Wimbledon blah 5 blah
  - Wimbledon blah blah blah Bjorn Borg blah 37 blah
  - blah Bjorn Borg blah blah 5 blah blah Wimbledon
  - 5 blah blah Wimbledon blah blah Bjorn Borg
- Probably 5
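
A toy illustration of the redundancy intuition using the Borg/Wimbledon snippets above; the digits-only candidate filter is an assumption made for this example.

```python
# Count candidate strings (here, bare numbers) across retrieved snippets
# and keep the most frequent one.
import re
from collections import Counter

snippets = [
    "Bjorn Borg blah blah blah Wimbledon blah 5 blah",
    "Wimbledon blah blah blah Bjorn Borg blah 37 blah",
    "blah Bjorn Borg blah blah 5 blah blah Wimbledon",
    "5 blah blah Wimbledon blah blah Bjorn Borg",
]

def most_redundant_number(snips):
    counts = Counter()
    for s in snips:
        counts.update(re.findall(r"\b\d+\b", s))
    return counts.most_common(1)[0] if counts else None

print(most_redundant_number(snippets))  # -> ('5', 3): "probably 5"
```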

Query Reformulation
- Identify the question type: e.g. Who, When, Where, …
- Create question-type-specific rewrite rules
  - Hypothesis: the wording of the question is similar to the wording of the answer
  - For 'where' queries, move 'is' to all possible positions:
    Where is the Louvre Museum located? =>
    Is the Louvre Museum located / The is Louvre Museum located / The Louvre Museum is located / etc.
- Create a type-specific answer type (Person, Date, Loc)
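
A small sketch of the "move the auxiliary to every possible position" rewrite; tokenization and wh-word handling are simplified.

```python
# Generate rewrites by placing the auxiliary at every position, hoping one
# rewrite matches the wording of an answer sentence.
def move_aux_rewrites(question: str, aux: str = "is"):
    tokens = question.rstrip("?").split()
    rest = [t for t in tokens if t.lower() not in {aux, "where"}]
    return [" ".join(rest[:i] + [aux] + rest[i:]) for i in range(len(rest) + 1)]

for rewrite in move_aux_rewrites("Where is the Louvre Museum located?"):
    print(rewrite)
# is the Louvre Museum located
# the is Louvre Museum located
# the Louvre is Museum located
# the Louvre Museum is located
# the Louvre Museum located is
```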

Query Form Generation
Three query forms:
- Initial baseline query
- Exact reformulation: weighted 5 times higher
  - Attempts to anticipate the location of the answer
  - Extracted using surface patterns:
    "When was the telephone invented?" -> "the telephone was invented ?x"
  - Generated by ~12 pattern-matching rules over terms and POS, e.g.:
    wh-word did A verb B -> A verb+ed B ?x (general)
    Where is A? -> A is located in ?x (specific)
- Inexact reformulation: bag of words
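
A sketch of generating the three query forms; the two rewrite rules and the weight of 5 follow the slide, while everything else (stop list, rule syntax) is illustrative.

```python
# Baseline query, exact pattern-based reformulation (weighted 5x),
# and a bag-of-words backoff.
import re

REWRITE_RULES = [
    (r"^when was (.+) (\w+ed)\??$", r"\1 was \2"),        # "the telephone was invented ?x"
    (r"^where is (.+?)\??$",        r"\1 is located in"), # "A is located in ?x"
]

STOP = {"when", "where", "was", "is", "the", "did"}

def query_forms(question: str):
    forms = [(question, 1.0)]                                  # baseline
    for pattern, template in REWRITE_RULES:
        m = re.match(pattern, question.strip(), re.IGNORECASE)
        if m:
            forms.append((m.expand(template), 5.0))            # exact reformulation
    bag = " ".join(w for w in re.findall(r"\w+", question.lower()) if w not in STOP)
    forms.append((bag, 1.0))                                   # inexact bag-of-words
    return forms

for form, weight in query_forms("When was the telephone invented?"):
    print(weight, form)
# 1.0 When was the telephone invented?
# 5.0 the telephone was invented
# 1.0 telephone invented
```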

Query Reformulation Examples

Deeper Processing for Query Formulation
- MULDER (Kwok, Etzioni, & Weld)
  - Converts the question into multiple search queries
    - Forms which match the target
    - Vary the specificity of the query:
      - Most general: bag of keywords
      - Most specific: partial/full phrases
  - Employs full parsing augmented with morphology
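
A sketch of producing MULDER-style query variants at different levels of specificity; real MULDER uses a full parser, and the crude phrase chunking here is only a stand-in.

```python
# Query variants from most specific (full quoted phrase) to most general
# (bag of keywords).
import re

QWORDS = {"who", "what", "when", "where", "which", "how",
          "was", "is", "are", "the", "a", "an", "in", "of"}

def query_variants(question: str):
    tokens = re.findall(r"\w+", question)
    content = [t for t in tokens if t.lower() not in QWORDS]
    full_phrase = '"' + question.rstrip("?") + '"'                       # most specific
    partial = '"' + " ".join(content[:2]) + '" ' + " ".join(content[2:]) # partial phrase
    bag = " ".join(content)                                              # most general
    return [full_phrase, partial, bag]

for q in query_variants("Who was the first American in space?"):
    print(q)
# "Who was the first American in space"
# "first American" space
# first American space
```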

Syntax for Query Formulation
- Parse-based transformations: apply transformational grammar rules to questions
- Example rules:
  - Subject-auxiliary movement:
    Q: Who was the first American in space?
    Alt: "was the first American…"; "the first American in space was"
  - Subject-verb movement: Who shot JFK? => "shot JFK"
  - Etc.
- Morphology-based transformation:
  - Verb conversion: do-aux + v-inf => conjugated verb

Machine Learning Approaches
- Diverse approaches: assume annotated query logs, annotated question sets, or matched query/snippet pairs
- Learn question paraphrases (MSRA)
  - Improve QA by setting question sites
  - Improve search by generating alternate question forms
- Question reformulation as machine translation
  - Given question logs and click-through snippets
  - Train a machine learning model to transform Q -> A

Query Expansion
- Basic idea: improve matching by adding words with similar meaning or a similar topic to the query
- Alternative strategies:
  - Use a fixed lexical resource, e.g. WordNet
  - Use information from the document collection: pseudo-relevance feedback

WordNet-Based Expansion
- In Information Retrieval settings, a mixed history
  - Helped, hurt, or had no effect
  - With long queries & long documents: no effect or a bad effect
- Some recent positive results on short queries
  - E.g. Fang (2008):
    - Contrasts different WordNet and thesaurus similarity measures
    - Adds semantically similar terms to the query
    - Additional weight factor based on the similarity score
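
A hedged sketch of WordNet-based expansion in the spirit of Fang (2008), using NLTK's WordNet interface; the down-weighting factor and the cap on new terms per word are assumptions, not values from the paper.

```python
# Add WordNet synonyms of each query term with a lower weight.
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def expand_with_wordnet(terms, weight=0.5, max_new=3):
    expanded = {t: 1.0 for t in terms}   # original terms keep full weight
    for t in terms:
        added = 0
        for synset in wn.synsets(t):
            for lemma in synset.lemma_names():
                lemma = lemma.lower().replace("_", " ")
                if lemma not in expanded and added < max_new:
                    expanded[lemma] = weight
                    added += 1
    return expanded

print(expand_with_wordnet(["tall", "height"]))
```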

Similarity Measures
- Definition similarity: S_def(t1, t2)
  - Word overlap between the glosses of all synsets of the two terms
  - Divided by the total number of words in all the synsets' glosses
- Relation similarity: terms get a value if they are synonyms, hypernyms, hyponyms, holonyms, or meronyms
- Term similarity score from Lin's thesaurus
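
A rough implementation of the definition-similarity idea (gloss overlap normalized by total gloss length); the exact normalization in the original work may differ, so treat this as a sketch.

```python
# Definition similarity S_def(t1, t2): gloss word overlap / total gloss words.
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def definition_similarity(t1: str, t2: str) -> float:
    gloss1 = " ".join(s.definition() for s in wn.synsets(t1)).lower().split()
    gloss2 = " ".join(s.definition() for s in wn.synsets(t2)).lower().split()
    if not gloss1 or not gloss2:
        return 0.0
    overlap = len(set(gloss1) & set(gloss2))
    return overlap / (len(gloss1) + len(gloss2))

print(definition_similarity("height", "tall"))   # nonzero: the glosses share words
```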

Results
- Definition similarity yields significant improvements
  - Allows matching across POS
  - More fine-grained weighting than binary relations

Collection-Based Query Expansion
- Xu & Croft '97 (classic)
- Thesaurus expansion is problematic: often ineffective
- Issues:
  - Coverage: many words – especially named entities – are missing from WordNet
  - Domain mismatch: fixed resources are 'general' or derived from some other domain
    - May not match the current search collection
    - cat/dog vs. cat/more/ls
- Use collection-based evidence instead: global or local

Global Analysis
- Identifies word co-occurrence in the whole collection
  - Applied to expand the current query
  - Context can differentiate/group concepts
- Create an index of concepts:
  - Concepts = noun phrases (1-3 nouns long)
  - Representation: context = words in a fixed-length window, 1-3 sentences
  - Each concept is linked to the documents its context words come from
- Use the query to retrieve the 30 highest-ranked concepts; add them to the query
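
A toy sketch of the global-analysis idea; real concepts are 1-3 word noun phrases, while single words stand in here, and the tiny corpus, window size, and scoring are all illustrative assumptions.

```python
# Build a context profile per "concept" over the whole collection, then
# rank concepts against the query and add the top ones.
import re
from collections import Counter, defaultdict

corpus = [
    "mount everest is the highest mountain and its height is 8848 metres",
    "the height of everest was first measured by a survey team",
]

STOP = {"the", "is", "was", "of", "and", "its", "a", "by", "first"}

def build_context_index(docs, window=3):
    contexts = defaultdict(Counter)
    for doc in docs:
        toks = re.findall(r"\w+", doc.lower())
        for i, tok in enumerate(toks):
            ctx = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            contexts[tok].update(ctx)
    return contexts

def top_concepts(query_terms, contexts, k=3):
    scores = {c: sum(ctx[q] for q in query_terms)
              for c, ctx in contexts.items()
              if c not in query_terms and c not in STOP}
    return sorted(scores, key=scores.get, reverse=True)[:k]

contexts = build_context_index(corpus)
print(top_concepts(["everest", "height"], contexts))
# -> ['mount', 'highest', 'mountain']
```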

Local Analysis
- A.k.a. local feedback or pseudo-relevance feedback
- Use the query to retrieve documents
- Select informative terms from highly ranked documents
- Add those terms to the query
- Specifically:
  - Add the 50 most frequent terms and the 10 most frequent 'phrases' (bigrams without stopwords)
  - Reweight terms
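
A minimal pseudo-relevance-feedback sketch; retrieval itself is stubbed out with two hand-written "top documents", the 50-term/10-phrase counts follow the slide, and the reweighting step is omitted.

```python
# Take the top-ranked documents for the query, extract the most frequent
# terms and stopword-free bigrams, and append them to the query.
import re
from collections import Counter

STOPWORDS = {"the", "of", "is", "in", "a", "an", "was", "and", "by", "to"}

def feedback_terms(top_docs, n_terms=50, n_phrases=10):
    terms, phrases = Counter(), Counter()
    for doc in top_docs:
        toks = [t for t in re.findall(r"\w+", doc.lower()) if t not in STOPWORDS]
        terms.update(toks)
        phrases.update(zip(toks, toks[1:]))   # bigrams after stopword removal
    return ([t for t, _ in terms.most_common(n_terms)],
            [" ".join(p) for p, _ in phrases.most_common(n_phrases)])

def expand_query(query, top_docs):
    terms, phrases = feedback_terms(top_docs)
    return query.split() + terms + phrases    # reweighting step not shown

top_docs = ["the height of everest is 8848 metres",
            "everest height was first measured by a survey team"]
print(expand_query("how tall is everest", top_docs))
```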

Local Context Analysis
- Mixes the two previous approaches
  - Use the query to retrieve the top n passages (300 words)
  - Select the top m ranked concepts (noun sequences)
  - Add to the query and reweight
- Relatively efficient
- Applies local search constraints

Experimental Contrasts
- Improvements over the baseline:
  - Local Context Analysis: +23.5% (relative)
  - Local Analysis: +20.5%
  - Global Analysis: +7.8%
- LCA is best and most stable across data sets
  - Better term selection than global analysis
- All approaches have fairly high variance
  - Help some queries, hurt others
- Also sensitive to the number of terms added and the number of documents used

Example expansions from Global Analysis, Local Analysis, and LCA for the query: "What are the different techniques used to create self-induced hypnosis?"

Deliverable #2 Discussion
- More training data available
- Test data released
- Requirements
- Deliverable
- Reports