Answer Extraction
Ling573 NLP Systems and Applications
May 16, 2013
Roadmap
- Deliverable 3 discussion: what worked
- Deliverable 4
- Answer extraction: learning answer patterns
- Answer extraction: classification and ranking
- Noisy channel approaches
Reminder: Rob Chambers speech tech talk & networking event
This evening, 6:00pm, Johnson 203
Speech Technology and Mobile Applications: Speech in Windows Phone
Deliverable #3: Document & Passage Retrieval
What was tried: query processing
Deliverable #3: Question Answering
What was tried: focus on question processing, especially question classification
- Data: Li & Roth, TREC (given or hand-tagged)
- Features: unigrams, POS, NER, head chunks, semantic info
- Classifiers: MaxEnt, SVM (+ confidence)
- Accuracies: mid-80%s
Application:
- Filtering: restrict results to answers with a compatible class
- Boosting: upweight compatible answers
- Gazetteers, heuristics, NER
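A minimal sketch of this kind of question classifier (illustrative only: toy data, unigram features, and plain logistic regression standing in for MaxEnt; the student systems' feature sets were richer):

    # Toy question-type classifier: unigram features + MaxEnt-style model.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Stand-ins for Li & Roth-style (question, class) pairs; real training
    # data would be the labeled TREC question set mentioned above.
    train_questions = ["When did Elvis Presley die ?",
                       "Where are the Rockies ?",
                       "Who wrote Hamlet ?"]
    train_labels = ["NUM:date", "LOC:other", "HUM:ind"]

    clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(train_questions, train_labels)
    print(clf.predict(["When was LBJ born ?"]))  # with real data: a date-type class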
Question Processing
What was tried: question reformulation
- Target handling: replacement of pronouns, overlapping NPs, etc.
- Per-qtype reformulations, with backoff to bag-of-words
- Inflection generation + irregular verb handling
- Variations of exact phrases
What was tried: assorted clean-ups and speedups
- Search result caching
- Search result cleanup, de-duplication
- Google vs. Bing
- Code refactoring
What worked
- Target integration: most variants helped
- Query reformulation: type-specific
- Qtype boosting, in some cases
- Caching, for speed and analysis
Results
- Major improvements over the D2 baseline
- Most lenient results approach or exceed 0.1 MRR; current best: ~0.34
- Strict results improve, but less than lenient
Deliverable #4
Answer extraction/refinement: fine-grained passages
- Lengths not to exceed 100 and 250 characters
- Evaluate on 2006 devtest
- Final held-out evaltest from 2007, released later; no tuning allowed
Deliverable #4
Any other refinements across the system:
- Question processing
- Retrieval (Web or AQUAINT)
- Answer processing
- Whatever you like, to improve final scores
Plug: error analysis
- Look at training and devtest data: what causes failures?
- Are the answers in any of the retrieved docs (Web/TREC)? If not, why?
- Are answers retrieved but not highly ranked?
Last Plugs
- Tonight, 6pm, JHN 102: Jay Waltmunson, Speech Tech and Mobile (UW Ling Ph.D.); presentation and networking
- Tomorrow, 3:30, PCAR 291: UW/MS Symposium: Hoifung Poon (MSR), Semantic Parsing; Chloe Kiddon (UW), Knowledge Extraction w/TML
Answer Extraction
- Pattern-based extraction review
- Learning answer reranking I
- Noisy channel answer extraction
- Learning answer reranking II
Answer Selection by Pattern
- Identify question types and terms
- Filter retrieved passages, replace qterm by tag
- Try to match patterns and answer spans
- Discard duplicates and sort by pattern precision
Pattern Sets
WHY-FAMOUS:
  1.0   <ANSWER> <NAME> called
  1.0   laureate <ANSWER> <NAME>
  1.0   by the <ANSWER> , <NAME> ,
  1.0   <NAME> - the <ANSWER> of
  1.0   <NAME> was the <ANSWER> of
BIRTHYEAR:
  1.0   <NAME> ( <ANSWER> - )
  0.85  <NAME> was born on <ANSWER> ,
  0.6   <NAME> was born in <ANSWER>
  0.59  <NAME> was born <ANSWER>
  0.53  <ANSWER> <NAME> was born
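A minimal sketch of applying precision-ranked surface patterns like the BIRTHYEAR set above (the regex translation and the pattern subset are illustrative assumptions, not the original system's code):

    import re

    # (precision, pattern) pairs in the style of the BIRTHYEAR set;
    # <ANSWER> becomes a capture group, <NAME> the question term.
    BIRTHYEAR_PATTERNS = [
        (1.00, r"{name} \( (?P<ans>\d{{4}}) -"),
        (0.85, r"{name} was born on (?P<ans>[^,]+) ,"),
        (0.60, r"{name} was born in (?P<ans>\w+)"),
    ]

    def extract_answers(name, passages):
        candidates = []
        for prec, template in BIRTHYEAR_PATTERNS:
            regex = re.compile(template.format(name=re.escape(name)))
            for passage in passages:
                for m in regex.finditer(passage):
                    candidates.append((prec, m.group("ans").strip()))
        # discard duplicates, keep the highest-precision hit for each answer
        best = {}
        for prec, ans in candidates:
            best[ans] = max(prec, best.get(ans, 0.0))
        return sorted(best.items(), key=lambda kv: -kv[1])

    print(extract_answers("Mozart", ["Mozart ( 1756 - 1791 ) was a composer ."]))
    # -> [('1756', 1.0)]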
Results: improves, though better with web data
Limitations & Extensions
- Where are the Rockies? "..with the Rockies in the background"
  Should restrict to semantic / NE type
- London, which ..., lies on the River Thames
  <QTERM> word* lies on <ANSWER>
  Wildcards impractical; long-distance dependencies not practical
- Less of an issue in Web search: the Web is highly redundant, with many local dependencies
  Many systems (e.g., LCC) use the Web to validate answers
Limitations & Extensions
- When was LBJ born? "Tower lost to Sen. LBJ, who ran for both the..."
  Requires information about answer length, type, and logical distance (1-2 chunks)
- Also: can only handle single continuous qterms; ignores case; needs to handle canonicalization, e.g. of names/dates
Integrating Patterns II
Fundamental problem: what if there's no pattern?
No pattern -> no answer!
More robust solution: not JUST patterns
- Integrate with machine learning: MaxEnt
- Re-ranking approach
Answering with MaxEnt
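For reference, the generic conditional MaxEnt form behind this kind of re-ranking (the exact normalization in the original system may differ):

    P(a \mid q) = \frac{\exp\big(\sum_i \lambda_i f_i(a, q)\big)}{\sum_{a'} \exp\big(\sum_i \lambda_i f_i(a', q)\big)}

where the f_i are feature functions over a candidate answer a and question q (see the next slide) and the lambda_i are learned weights; candidates are ranked by this probability.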
Feature Functions
- Pattern fired: binary feature
- Answer frequency / redundancy factor: # times the answer appears in retrieval results
- Answer type match (binary)
- Question word absent (binary): no question words in the answer span
- Word match: sum of ITF of words matching between question and sentence
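A rough sketch of computing this feature set for one candidate (the function and feature names, and the ITF lookup table, are illustrative assumptions):

    def answer_features(question, answer, sentence, answer_counts,
                        pattern_fired, type_match, itf):
        """Feature vector for one (question, candidate-answer) pair."""
        q_words = set(question.lower().split())
        a_words = set(answer.lower().split())
        s_words = set(sentence.lower().split())
        return {
            "pattern_fired": float(pattern_fired),               # binary
            "redundancy": float(answer_counts.get(answer, 0)),   # count in retrieval results
            "type_match": float(type_match),                     # binary
            "qword_absent": float(not (q_words & a_words)),      # binary
            # sum of inverse term frequencies of words shared by question and sentence
            "word_match": sum(itf.get(w, 0.0) for w in q_words & s_words),
        }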
Training & Testing
- Trained on NIST QA questions: train on TREC 8-9; cross-validation on TREC-10
- 5000 candidate answers per question
- Positive examples: NIST answer pattern matches; negative examples: pattern doesn't match
- Test on TREC-2003: MRR 28.6%; 35.6% exact answer in top 5
Noisy Channel QA
Employed for speech, POS tagging, MT, summarization, etc.
Intuition: the question is a noisy representation of the answer
Basic approach: given a corpus of (Q, S_A) pairs, train P(Q | S_A)
Find the sentence S_i with answer A_ij that maximizes P(Q | S_i, A_ij)
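In symbols, the selection criterion just described:

    \hat{S}, \hat{A} = \arg\max_{S_i, A_{ij}} P(Q \mid S_i, A_{ij})

i.e., over candidate answer sentences S_i and candidate answers A_ij within them, pick the pair that best "generates" the question under the trained channel model.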
QA Noisy Channel
A: "Presley died of heart disease at Graceland in 1977, and.."
Q: "When did Elvis Presley die?"
Goal: align parts of the answer parse tree to the question, mark candidate answers, and find the highest-probability answer
Approach
Alignment issue: answer sentences are longer than questions
- Minimize the length gap: represent the answer as a mix of word / syntactic / semantic / NE units
- Create a 'cut' through the parse tree: every word, or one of its ancestors, is in the cut, and only one element lies on the path from root to word
Example:
  Presley died of heart disease at Graceland in 1977, and..
  -> Presley died PP PP in DATE, and..
  Q: When did Elvis Presley die?
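A minimal sketch of one way to compute such a cut over an nltk-style parse (the reduction rules here are simplified assumptions; the real system also maps some nodes to semantic/NE labels like DATE):

    from nltk import Tree

    def cut(tree, keep_words):
        """Flatten a parse to a 'cut': keep surface words in keep_words,
        otherwise replace the subtree by its label (one element per
        root-to-word path)."""
        if isinstance(tree, str):                  # leaf: a surface word
            return [tree]
        if not (set(tree.leaves()) & keep_words):  # no kept word below: collapse
            return [tree.label()]
        out = []
        for child in tree:
            out.extend(cut(child, keep_words))
        return out

    parse = Tree.fromstring(
        "(S (NP (NNP Presley)) (VP (VBD died)"
        " (PP (IN of) (NP (NN heart) (NN disease)))"
        " (PP (IN at) (NP (NNP Graceland)))"
        " (PP (IN in) (NP (CD 1977)))))")
    print(cut(parse, keep_words={"Presley", "died", "in"}))
    # -> ['Presley', 'died', 'PP', 'PP', 'in', 'NP']
    # (the slide's cut further reduces the final NP to the NE label DATE)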
Approach (cont'd)
Assign one element in the cut to be the 'Answer'
Issue: the cut STILL may not be the same length as Q
Solution (typical MT): assign each element a fertility
- 0: delete the word; >1: repeat the word that many times
Replace answer words with question words based on the alignment
Permute the result to match the original question
Everything except the cut is computed with off-the-shelf MT code
Schematic: assume cut and answer guess are all equally likely
Training Sample Generation
Given question and answer sentences:
- Parse the answer sentence
- Create a cut such that:
  - Words in both Q & A are preserved
  - The answer is reduced to an 'A_' syntactic/semantic class label
  - Nodes with no surface children are reduced to their syntactic class
  - All other nodes keep their surface form
Data: 20K TREC QA pairs; 6.5K web question pairs
Selecting Answers
For any candidate answer sentence:
- Do the same cut process
- Generate all candidate answer nodes: syntactic/semantic nodes in the tree
- Filter bad candidate answers: stopwords, question words
- Create cuts with each answer candidate annotated
- Select the one with the highest probability under the model
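A small sketch of the "bad candidate" filter described above (the stopword list is an illustrative subset, and the function name is an assumption):

    STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to"}  # illustrative

    def is_bad_candidate(candidate_words, question_words):
        """Reject candidates that are all stopwords or that repeat question words."""
        cand = {w.lower() for w in candidate_words}
        qset = {w.lower() for w in question_words}
        return cand <= STOPWORDS or bool(cand & qset)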
Example Answer Cuts
Q: When did Elvis Presley die?
S_A1: Presley died A_PP PP PP, and ...
S_A2: Presley died PP A_PP PP, and ...
S_A3: Presley died PP PP in A_DATE, and ...
Results: MRR 24.8%; 31.2% in top 5
Error Analysis
Component-specific errors:
- Patterns: some question types work better with patterns, typically specific NE categories (NAM, LOC, ORG, ...); bad if 'vague'
- Stats-based: no restrictions on answer type; frequently returns 'it'
- Both patterns and stats: 'blatant' errors: select 'bad' strings (esp. pronouns) if they fit the position/pattern
Combining Units
Linear sum of weights?
Problematic: misses the components' different strengths/weaknesses
Learning! (of course): MaxEnt re-ranking, linear
Feature Functions (48 in total)
- Component-specific: scores and ranks from different modules: patterns, stats, IR, even Q-A word overlap
- Redundancy-specific: # times a candidate answer appears (log, sqrt variants)
- Qtype-specific: some components are better for certain types (type + module features)
- Blatant 'errors': no pronouns; 'when' answers that are NOT a day of week
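A rough sketch of assembling such a combined feature vector (the feature names and the pronoun / day-of-week checks are illustrative assumptions):

    import math

    PRONOUNS = {"it", "he", "she", "they", "him", "her", "them"}
    DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday"}

    def combined_features(candidate, qtype, module_scores, module_ranks, count):
        feats = {}
        # component-specific: scores and ranks from each module (patterns, stats, IR, ...)
        for mod, score in module_scores.items():
            feats[mod + "_score"] = score
            feats[mod + "_rank"] = float(module_ranks[mod])
        # redundancy-specific: how often the candidate appears (log, sqrt variants)
        feats["redundancy_log"] = math.log(1 + count)
        feats["redundancy_sqrt"] = math.sqrt(count)
        # qtype-specific: pair the question type with each module's score
        for mod, score in module_scores.items():
            feats["qtype=%s+%s" % (qtype, mod)] = score
        # 'blatant error' indicators
        words = {w.lower() for w in candidate.split()}
        feats["is_pronoun"] = float(words <= PRONOUNS)
        feats["when_not_dow"] = float(qtype == "when" and not (words & DAYS))
        return feats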
Experiments
- Per-module reranking: use redundancy, qtype, blatant-error, and the module's own features
- Combined reranking: all features (after feature selection, down to 31)
Results (exact answer in top 5):
- Patterns: 35.6% -> 43.1%
- Stats: 31.2% -> 41%
- Manual/knowledge-based: 57%
- Combined: 57%+