Shallow & Deep QA Systems Ling 573 NLP Systems and Applications April 9, 2013.

Announcement
Thursday's class will be pre-recorded and made available through the Adobe Connect recording, which will be linked before the regular Thursday class time. Please post any questions to the GoPost.

Roadmap
Two extremes in QA systems:
Redundancy-based QA: Aranea
LCC's PowerAnswer-2
Deliverable #2

Redundancy-based QA
AskMSR (2001, 2002); Aranea (Lin, 2007)

Redundancy-based QA
Systems exploit statistical regularity to find "easy" answers to factoid questions on the Web.
When did Alaska become a state?
(1) Alaska became a state on January 3, 1959.
(2) Alaska was admitted to the Union on January 3, 1959.
Who killed Abraham Lincoln?
(1) John Wilkes Booth killed Abraham Lincoln.
(2) John Wilkes Booth altered history with a bullet. He will forever be known as the man who ended Abraham Lincoln's life.
A fixed text collection may only contain a phrasing like (2), but the Web will contain almost any phrasing.

Redundancy & Answers
How does redundancy help find answers?
Typical approach: answer type matching, e.g. with NER, but this relies on a large knowledge base.
Redundancy approach: the answer should have high correlation with the query terms and be present in many passages; uses n-gram generation and processing.
In 'easy' passages, simple string matching is effective.

Redundancy Approaches
AskMSR (2001): Lenient: 0.43, Rank: 6/36; Strict: 0.35, Rank: 9/36
Aranea (2002, 2003): Lenient: 45%, Rank: 5; Strict: 30%, Rank: 6-8
Concordia (2007): Strict: 25%, Rank: 5
Many systems incorporate some redundancy, for answer validation and answer reranking.
LCC: a huge knowledge-based system, which redundancy still improved.

Intuition
Redundancy is useful! If similar strings appear in many candidate answers, they are likely to be the solution, even if we can't find an obvious answer string.
Q: How many times did Bjorn Borg win Wimbledon?
Bjorn Borg blah blah blah Wimbledon blah 5 blah
Wimbledon blah blah blah Bjorn Borg blah 37 blah.
blah Bjorn Borg blah blah 5 blah blah Wimbledon
5 blah blah Wimbledon blah blah Bjorn Borg.
Answer: probably 5.
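
As a rough illustration of this intuition (not the actual AskMSR/Aranea code), here is a minimal n-gram voting sketch in Python; the snippet text and the question-term list are made up for the example:

```python
from collections import Counter
import re

# Toy snippets standing in for web search results (made up for illustration).
snippets = [
    "Bjorn Borg blah blah blah Wimbledon blah 5 blah",
    "Wimbledon blah blah blah Bjorn Borg blah 37 blah",
    "blah Bjorn Borg blah blah 5 blah blah Wimbledon",
    "5 blah blah Wimbledon blah blah Bjorn Borg",
]
question_terms = {"how", "many", "times", "did", "bjorn", "borg", "win", "wimbledon"}

votes = Counter()
for snippet in snippets:
    for token in re.findall(r"\w+", snippet.lower()):
        if token not in question_terms and token != "blah":
            votes[token] += 1          # each occurrence is one vote

print(votes.most_common(3))            # '5' wins with 3 votes; '37' gets 1
```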

Query Reformulation
Identify the question type, e.g. Who, When, Where, ...
Create question-type-specific rewrite rules.
Hypothesis: the wording of the question is similar to the wording of the answer.
For 'where' queries, move 'is' to all possible positions:
Where is the Louvre Museum located? =>
"Is the Louvre Museum located", "The is Louvre Museum located", "The Louvre Museum is located", etc.
Also assign a type-specific answer type (Person, Date, Location).
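
A minimal sketch of the 'is'-movement rewrite for a 'where' question; the helper name and the simple-question assumption are mine, not the system's implementation:

```python
def is_movement_rewrites(question: str) -> list[str]:
    """Move the copula 'is' to every position in a 'Where is ...?' question,
    producing candidate declarative phrasings of the answer."""
    tokens = question.rstrip("?").split()
    # Drop the wh-word and pull out 'is' (assumes the simple 'Where is X ...?' shape).
    if len(tokens) < 3 or tokens[0].lower() != "where" or tokens[1].lower() != "is":
        return [question]
    rest = tokens[2:]
    rewrites = []
    for i in range(len(rest) + 1):
        candidate = rest[:i] + ["is"] + rest[i:]
        rewrites.append(" ".join(candidate))
    return rewrites

print(is_movement_rewrites("Where is the Louvre Museum located?"))
# ['is the Louvre Museum located', 'the is Louvre Museum located',
#  'the Louvre is Museum located', 'the Louvre Museum is located',
#  'the Louvre Museum located is']
```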

Query Form Generation
Three query forms:
Initial baseline query.
Exact reformulation: weighted 5 times higher; attempts to anticipate the location of the answer so it can be extracted with surface patterns, e.g. "When was the telephone invented?" -> "the telephone was invented ?x". These reformulations are generated by ~12 pattern-matching rules over terms and POS tags, e.g. wh-word did A verb B -> A verb+ed B ?x (general); Where is A? -> A is located in ?x (specific).
Inexact reformulation: bag-of-words query.
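
A hedged sketch of what such pattern-matching reformulation rules might look like; the regexes and the rule set here are illustrative stand-ins, not the system's actual ~12 rules:

```python
import re

# Each rule: (question pattern, template for the exact reformulation).
# '?x' marks where the answer is expected to appear.
REWRITE_RULES = [
    # "When was X invented?" -> "X was invented ?x"
    (re.compile(r"^When was (?P<A>.+) (?P<V>\w+ed)\?$", re.I), "{A} was {V} ?x"),
    # "Where is X?" -> "X is located in ?x"
    (re.compile(r"^Where is (?P<A>.+)\?$", re.I), "{A} is located in ?x"),
    # "Who VERBed X?" -> "?x VERBed X"
    (re.compile(r"^Who (?P<V>\w+ed) (?P<B>.+)\?$", re.I), "?x {V} {B}"),
]

def exact_reformulations(question: str) -> list[str]:
    """Return exact reformulations for the first matching rule (weighted 5x downstream)."""
    for pattern, template in REWRITE_RULES:
        m = pattern.match(question)
        if m:
            return [template.format(**m.groupdict())]
    return []   # fall back to the baseline / bag-of-words query

print(exact_reformulations("When was the telephone invented?"))
# ['the telephone was invented ?x']
```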

Query Reformulation Examples

Redundancy-based Answer Extraction
Prior processing: question formulation, web search, retrieval of the top 100 snippets.
N-gram stages: generation, voting, filtering, combining, scoring, reranking.

N-gram Generation & Voting
N-gram generation from unique snippets: approximate chunking without syntax; all uni-, bi-, tri-, and tetra-grams (Concordia added 5-grams after prior errors).
Score is based on the source query: exact reformulations count 5x, other queries 1x.
N-gram voting: collate the n-grams; each n-gram gets the sum of the scores of its occurrences.
What would be highest ranked without further processing? The most frequent strings: question terms and stopwords.
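
A simplified sketch of weighted n-gram generation and voting; the 5x/1x weights follow the slide, while the data structures and example snippets are my own illustration:

```python
from collections import defaultdict

def ngrams(tokens, max_n=4):
    """All uni- through tetra-grams of a token list."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def vote(snippets_with_weights):
    """snippets_with_weights: list of (snippet_text, weight),
    weight = 5 for snippets retrieved by an exact reformulation, 1 otherwise.
    Each n-gram accumulates the sum of the weights of its occurrences."""
    scores = defaultdict(float)
    for text, weight in snippets_with_weights:
        tokens = text.lower().split()
        for gram in ngrams(tokens):
            scores[gram] += weight
    return scores

scores = vote([("the telephone was invented by Alexander Graham Bell", 5),
               ("Alexander Graham Bell invented the telephone", 1)])
print(scores[("alexander", "graham", "bell")])   # 6.0
```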

N-gram Filtering
Filtering throws out 'blatant' errors. Should it be conservative or aggressive? Conservative, since a filtered-out error can't be recovered.
Question-type-neutral filters: exclude n-grams that begin or end with a stopword; exclude n-grams that contain words from the question, except 'focus words' such as units.
Question-type-specific filters: 'how far', 'how fast': exclude if there is no numeric term; 'who', 'where': exclude if not a named entity (first and last words capitalized).
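
A hedged Python sketch of these filters; the stopword list, focus-word handling, and the capitalization-based NE heuristic are illustrative placeholders rather than the system's actual resources:

```python
STOPWORDS = {"the", "a", "an", "of", "in", "was", "is", "to"}   # toy list

def type_neutral_ok(gram, question_terms, focus_words=frozenset()):
    """Keep an n-gram only if it passes the question-type-neutral filters."""
    if gram[0].lower() in STOPWORDS or gram[-1].lower() in STOPWORDS:
        return False                                   # begins/ends with stopword
    for tok in gram:
        if tok.lower() in question_terms and tok.lower() not in focus_words:
            return False                               # repeats a question word
    return True

def type_specific_ok(gram, qtype):
    """Question-type-specific filters."""
    if qtype in ("how far", "how fast"):
        return any(any(ch.isdigit() for ch in tok) for tok in gram)   # needs a numeric
    if qtype in ("who", "where"):
        return gram[0][0].isupper() and gram[-1][0].isupper()         # crude NE check
    return True

gram = ("John", "Wilkes", "Booth")
print(type_neutral_ok(gram, {"who", "killed", "abraham", "lincoln"}),
      type_specific_ok(gram, "who"))    # True True
```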

N-gram Filtering
Closed-class filters: exclude candidates that are not members of an enumerable list, e.g. for 'what year' questions the candidate must be an acceptable calendar year.
Example after filtering: Who was the first person to run a sub-four-minute mile?

N-gram Filtering
Impact of different filters: highly significant differences when the system is run with subsets of the filters.
No filters: performance drops 70%.
Type-neutral filters only: drops 15%.
Type-neutral & type-specific filters: drops 5%.

N-gram Combining
Does the current scoring favor longer or shorter spans?
E.g. "Roger" or "Bannister" or "Roger Bannister" or "Mr. ...": "Bannister" alone is probably ranked highest, since it occurs everywhere "Roger Bannister" does, plus more. Generally, though, good answers are longer (up to a point).
Update the score: S_c += Σ S_t, where t ranges over the unigrams in candidate c.
Possible issues: bad longer units such as "Roger Bannister was" (blocked by the filters); also, the update only increments scores, so long bad spans still rank lower.
This combining step improves results significantly.
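
A minimal sketch of this combining step, assuming unigram scores from the voting stage; the function and variable names and the example scores are mine:

```python
def combine(scores):
    """For every multi-word candidate c, add the scores of its unigrams:
    S_c += sum of S_t for each unigram t in c."""
    combined = dict(scores)
    for gram in scores:
        if len(gram) > 1:
            combined[gram] += sum(scores.get((tok,), 0.0) for tok in gram)
    return combined

scores = {("bannister",): 9.0, ("roger",): 6.0, ("roger", "bannister"): 6.0}
print(combine(scores)[("roger", "bannister")])   # 6.0 + 6.0 + 9.0 = 21.0
```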

N-gram Scoring
Not all terms are created equal: answers are usually highly specific, and non-units should be dispreferred.
Solution: IDF-based scoring: S_c = S_c * average_unigram_idf.
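
A sketch of the IDF reweighting, assuming an idf lookup (e.g. from collection document frequencies) is available; the idf values below are invented for illustration:

```python
def idf_rescore(combined, idf, default_idf=1.0):
    """S_c = S_c * average unigram IDF of the candidate's terms."""
    rescored = {}
    for gram, score in combined.items():
        avg_idf = sum(idf.get(tok, default_idf) for tok in gram) / len(gram)
        rescored[gram] = score * avg_idf
    return rescored

idf = {"roger": 4.2, "bannister": 7.5, "the": 0.1}      # made-up values
print(idf_rescore({("roger", "bannister"): 21.0}, idf)[("roger", "bannister")])
# 21.0 * (4.2 + 7.5) / 2 = 122.85
```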

N-gram Reranking
Promote the best answer candidates:
Filter out any answers not found in at least two snippets.
Use answer-type-specific forms to boost matching candidates, e.g. 'where' questions boost 'city, state' patterns.
Gives a small improvement, depending on the answer type.
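
A rough sketch of this reranking; the two-snippet threshold follows the slide, while the 'city, state' regex and the boost factor are assumptions of mine:

```python
import re

CITY_STATE = re.compile(r"^[A-Z][a-z]+, [A-Z][a-z]+$")   # e.g. "Seattle, Washington"

def rerank(candidates, snippet_counts, qtype, boost=2.0):
    """candidates: {answer_string: score}; snippet_counts: {answer_string: #snippets}.
    Drop answers seen in fewer than two snippets; boost type-matching answers."""
    reranked = {}
    for answer, score in candidates.items():
        if snippet_counts.get(answer, 0) < 2:
            continue                                      # not supported by >= 2 snippets
        if qtype == "where" and CITY_STATE.match(answer):
            score *= boost                                # answer-type-specific boost
        reranked[answer] = score
    return sorted(reranked.items(), key=lambda kv: -kv[1])

print(rerank({"Agra, India": 10.0, "Taj Mahal": 12.0},
             {"Agra, India": 3, "Taj Mahal": 1}, "where"))
# [('Agra, India', 20.0)]  -- 'Taj Mahal' dropped (only one snippet)
```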

Summary
Redundancy-based approaches leverage the scale of web search, take advantage of the presence of 'easy' answers on the web, and exploit the statistical association of question and answer text.
Increasingly adopted: they are good performers independently for QA, and they provide significant improvements in other systems, especially for answer filtering.
They do require some form of 'answer projection': mapping web information back to a TREC document.
Aranea download:

Deliverable #2: Due 4/19
Baseline end-to-end Q/A system: redundancy-based with answer projection, which can also be viewed as retrieval with web-based boosting.
Implementation: the main components are a basic redundancy approach and a basic retrieval approach (IR is covered next lecture).

Data
Questions: XML-formatted questions and question series.
Answers: answer 'patterns' with evidence documents.
Training/Devtest/Evaltest splits: Training: through 2005; Devtest: 2006; Held-out: …
The data will be in the /dropbox directory on patas.
Documents: AQUAINT news corpus data with minimal markup.