Question Classification II

Slides:

Advertisements

Similar presentations

Sequence Classification: Chunking Shallow Processing Techniques for NLP Ling570 November 28, 2011.

Advertisements

CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.

Max-Margin Matching for Semantic Role Labeling David Vickrey James Connor Daphne Koller Stanford University.

Made with OpenOffice.org 1 Sentiment Classification using Word Sub-Sequences and Dependency Sub-Trees Pacific-Asia Knowledge Discovery and Data Mining.

Shallow Parsing CS 4705 Julia Hirschberg 1. Shallow or Partial Parsing Sometimes we don’t need a complete parse tree –Information extraction –Question.

Question Classification (cont’d) Ling573 NLP Systems & Applications April 12, 2011.

Jiho Han Ronny (Dowon) Ko.  Objective: automatically generate the summary of review extracting the strength/weakness of the product  Use NLP techniques.

Shallow Processing: Summary Shallow Processing Techniques for NLP Ling570 December 7, 2011.

CS Word Sense Disambiguation. 2 Overview A problem for semantic attachment approaches: what happens when a given lexeme has multiple ‘meanings’?

Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.

Feature Screening Concept: A greedy feature selection method. Rank features and discard those whose ranking criterions are below the threshold. Problem:

Course Summary LING 572 Fei Xia 03/06/07. Outline Problem description General approach ML algorithms Important concepts Assignments What’s next?

The classification problem (Recap from LING570) LING 572 Fei Xia, Dan Jinguji Week 1: 1/10/08 1.

STRUCTURED PERCEPTRON Alice Lai and Shi Zhi. Presentation Outline Introduction to Structured Perceptron ILP-CRF Model Averaged Perceptron Latent Variable.

Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews K. Dave et al, WWW 2003, citations Presented by Sarah.

COMP423: Intelligent Agent Text Representation. Menu – Bag of words – Phrase – Semantics – Bag of concepts – Semantic distance between two words.

Authors: Ting Wang, Yaoyong Li, Kalina Bontcheva, Hamish Cunningham, Ji Wang Presented by: Khalifeh Al-Jadda Automatic Extraction of Hierarchical Relations.

Chapter 10: Compilers and Language Translation Invitation to Computer Science, Java Version, Third Edition.

Ling 570 Day 17: Named Entity Recognition Chunking.

Complex Linguistic Features for Text Classification: A Comprehensive Study Alessandro Moschitti and Roberto Basili University of Texas at Dallas, University.

Image Classification 영상분류

A Language Independent Method for Question Classification COLING 2004.

Date: 2014/02/25 Author: Aliaksei Severyn, Massimo Nicosia, Aleessandro Moschitti Source: CIKM’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Building.

AQUAINT Workshop – June 2003 Improved Semantic Role Parsing Kadri Hacioglu, Sameer Pradhan, Valerie Krugler, Steven Bethard, Ashley Thornton, Wayne Ward,

CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.

CS774. Markov Random Field : Theory and Application Lecture 19 Kyomin Jung KAIST Nov

Mining Binary Constraints in Feature Models: A Classification-based Approach Yi Li.

Enhanced Answer Type Inference from Questions using Sequential Models Vijay Krishnan Sujatha Das Soumen Chakrabarti IIT Bombay.

CPSC 503 Computational Linguistics

Deep Learning for Efficient Discriminative Parsing Niranjan Balasubramanian September 2 nd, 2015 Slides based on Ronan Collobert’s Paper and video from.

Support vector machine LING 572 Fei Xia Week 8: 2/23/2010 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A 1.

Supertagging CMSC Natural Language Processing January 31, 2006.

4. Relationship Extraction Part 4 of Information Extraction Sunita Sarawagi 9/7/2012CS 652, Peter Lindes1.

11 Project, Part 3. Outline Basics of supervised learning using Naïve Bayes (using a simpler example) Features for the project 2.

Question Classification using Support Vector Machine Dell Zhang National University of Singapore Wee Sun Lee National University of Singapore SIGIR2003.

Information Extraction Entity Extraction: Statistical Methods Sunita Sarawagi.

Overview of Statistical NLP IR Group Meeting March 7, 2006.

Dan Roth University of Illinois, Urbana-Champaign 7 Sequential Models Tutorial on Machine Learning in Natural.

Relation Extraction (RE) via Supervised Classification See: Jurafsky & Martin SLP book, Chapter 22 Exploring Various Knowledge in Relation Extraction.

Mastering the Pipeline CSCI-GA.2590 Ralph Grishman NYU.

Question Classification Ling573 NLP Systems and Applications April 25, 2013.

Natural Language Processing Information Extraction Jim Martin (slightly modified by Jason Baldridge)

Comp 411 Principles of Programming Languages Lecture 3 Parsing

Energy models and Deep Belief Networks

MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.

Ling573 NLP Systems and Applications May 16, 2013

Tokenizer and Sentence Splitter CSCI-GA.2591

CSSE463: Image Recognition Day 11

Relation Extraction CSCI-GA.2591

CIS 700 Advanced Machine Learning Structured Machine Learning: Theory and Applications in Natural Language Processing Shyam Upadhyay Department of.

CSCI 5832 Natural Language Processing

Are End-to-end Systems the Ultimate Solutions for NLP?

Probabilistic and Lexicalized Parsing

In summary C1={skin} C2={~skin} Given x=[R,G,B], is it skin or ~skin?

Machine Learning in Natural Language Processing

Question Classification

CSSE463: Image Recognition Day 11

Text Categorization Assigning documents to a fixed set of categories

COSC 4335: Other Classification Techniques

Word embeddings based mapping

CSCI 5832 Natural Language Processing

Overfitting and Underfitting

Support Vector Machines and Kernels

CSCI 5832 Natural Language Processing

Lecture 18 Bottom-Up Parsing or Shift-Reduce Parsing

CSSE463: Image Recognition Day 11

Chapter 10: Compilers and Language Translation

CSSE463: Image Recognition Day 11

MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.

Extracting Why Text Segment from Web Based on Grammar-gram

Presentation transcript:

Question Classification II Ling573 NLP Systems and Applications April 30, 2013

Roadmap Question classification variations: Question series SVM classifiers Sequence classifiers Sense information improvements Question series

Question Classification with Support Vector Machines Hacioglu & Ward 2003 Same taxonomy, training, test data as Li & Roth

Question Classification with Support Vector Machines Hacioglu & Ward 200 Same taxonomy, training, test data as Li & Roth Approach: Shallow processing Simpler features Strong discriminative classifiers

Question Classification with Support Vector Machines Hacioglu & Ward 2003 Same taxonomy, training, test data as Li & Roth Approach: Shallow processing Simpler features Strong discriminative classifiers

Features & Processing Contrast: (Li & Roth) POS, chunk info; NE tagging; other sense info

Features & Processing Contrast: (Li & Roth) Preprocessing: POS, chunk info; NE tagging; other sense info Preprocessing: Only letters, convert to lower case, stopped, stemmed

Features & Processing Contrast: (Li & Roth) Preprocessing: Terms: POS, chunk info; NE tagging; other sense info Preprocessing: Only letters, convert to lower case, stopped, stemmed Terms: Most informative 2000 word N-grams Identifinder NE tags (7 or 9 tags)

Classification & Results Employs support vector machines for classification Best results: Bi-gram, 7 NE classes

Classification & Results Employs support vector machines for classification Best results: Bi-gram, 7 NE classes Better than Li & Roth w/POS+chunk, but no semantics

Classification & Results Employs support vector machines for classification Best results: Bi-gram, 7 NE classes Better than Li & Roth w/POS+chunk, but no semantics Fewer NE categories better More categories, more errors

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax Who wrote Hamlet?

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax Who wrote Hamlet?

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax Who wrote Hamlet? How many dogs pull a sled at Iditarod?

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax Who wrote Hamlet? How many dogs pull a sled at Iditarod?

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax Who wrote Hamlet? How many dogs pull a sled at Iditarod? How much does a rhino weigh?

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax Who wrote Hamlet? How many dogs pull a sled at Iditarod? How much does a rhino weigh?

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax Who wrote Hamlet? How many dogs pull a sled at Iditarod? How much does a rhino weigh? Single contiguous span of tokens

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax Who wrote Hamlet? How many dogs pull a sled at Iditarod? How much does a rhino weigh? Single contiguous span of tokens

Enhanced Answer Type Inference … Using Sequential Models Krishnan, Das, and Chakrabarti 2005 Improves QC with CRF extraction of ‘informer spans’ Intuition: Humans identify Atype from few tokens w/little syntax Who wrote Hamlet? How many dogs pull a sled at Iditarod? How much does a rhino weigh? Single contiguous span of tokens Who is the CEO of IBM?

Informer Spans as Features Sensitive to question structure What is Bill Clinton’s wife’s profession?

Informer Spans as Features Sensitive to question structure What is Bill Clinton’s wife’s profession?

Informer Spans as Features Sensitive to question structure What is Bill Clinton’s wife’s profession? Idea: Augment Q classifier word ngrams w/IS info

Informer Spans as Features Sensitive to question structure What is Bill Clinton’s wife’s profession? Idea: Augment Q classifier word ngrams w/IS info Informer span features: IS ngrams

Informer Spans as Features Sensitive to question structure What is Bill Clinton’s wife’s profession? Idea: Augment Q classifier word ngrams w/IS info Informer span features: IS ngrams Informer ngrams hypernyms: Generalize over words or compounds

Informer Spans as Features Sensitive to question structure What is Bill Clinton’s wife’s profession? Idea: Augment Q classifier word ngrams w/IS info Informer span features: IS ngrams Informer ngrams hypernyms: Generalize over words or compounds WSD?

Informer Spans as Features Sensitive to question structure What is Bill Clinton’s wife’s profession? Idea: Augment Q classifier word ngrams w/IS info Informer span features: IS ngrams Informer ngrams hypernyms: Generalize over words or compounds WSD? No

Effect of Informer Spans Classifier: Linear SVM + multiclass

Effect of Informer Spans Classifier: Linear SVM + multiclass Notable improvement for IS hypernyms

Effect of Informer Spans Classifier: Linear SVM + multiclass Notable improvement for IS hypernyms Better than all hypernyms – filter sources of noise Biggest improvements for ‘what’, ‘which’ questions

Perfect vs CRF Informer Spans

Recognizing Informer Spans Idea: contiguous spans, syntactically governed

Recognizing Informer Spans Idea: contiguous spans, syntactically governed Use sequential learner w/syntactic information

Recognizing Informer Spans Idea: contiguous spans, syntactically governed Use sequential learner w/syntactic information Tag spans with B(egin),I(nside),O(outside) Employ syntax to capture long range factors

Recognizing Informer Spans Idea: contiguous spans, syntactically governed Use sequential learner w/syntactic information Tag spans with B(egin),I(nside),O(outside) Employ syntax to capture long range factors Matrix of features derived from parse tree

Recognizing Informer Spans Idea: contiguous spans, syntactically governed Use sequential learner w/syntactic information Tag spans with B(egin),I(nside),O(outside) Employ syntax to capture long range factors Matrix of features derived from parse tree Cell:x[i,l], i is position, l is depth in parse tree, only 2 Values: Tag: POS, constituent label in the position Num: number of preceding chunks with same tag

Parser Output Parse

Parse Tabulation Encoding and table:

CRF Indicator Features Cell: IsTag, IsNum: e.g. y4 = 1 and x[4,2].tag=NP Also, IsPrevTag, IsNextTag

CRF Indicator Features Cell: IsTag, IsNum: e.g. y4 = 1 and x[4,2].tag=NP Also, IsPrevTag, IsNextTag Edge: IsEdge: (u,v) , yi-1=u and yi=v IsBegin, IsEnd

CRF Indicator Features Cell: IsTag, IsNum: e.g. y4 = 1 and x[4,2].tag=NP Also, IsPrevTag, IsNextTag Edge: IsEdge: (u,v) , yi-1=u and yi=v IsBegin, IsEnd All features improve

CRF Indicator Features Cell: IsTag, IsNum: e.g. y4 = 1 and x[4,2].tag=NP Also, IsPrevTag, IsNextTag Edge: IsEdge: (u,v) , yi-1=u and yi=v IsBegin, IsEnd All features improve Question accuracy: Oracle: 88%; CRF: 86.2%

Question Classification Using Headwords and Their Hypernyms Huang, Thint, and Qin 2008 Questions: Why didn’t WordNet/Hypernym features help in L&R?

Question Classification Using Headwords and Their Hypernyms Huang, Thint, and Qin 2008 Questions: Why didn’t WordNet/Hypernym features help in L&R? Best results in L&R - ~200,000 feats; ~700 active Can we do as well with fewer features?

Question Classification Using Headwords and Their Hypernyms Huang, Thint, and Qin 2008 Questions: Why didn’t WordNet/Hypernym features help in L&R? Best results in L&R - ~200,000 feats; ~700 active Can we do as well with fewer features? Approach: Refine features:

Question Classification Using Headwords and Their Hypernyms Huang, Thint, and Qin 2008 Questions: Why didn’t WordNet/Hypernym features help in L&R? Best results in L&R - ~200,000 feats; ~700 active Can we do as well with fewer features? Approach: Refine features: Restrict use of WordNet to headwords

Question Classification Using Headwords and Their Hypernyms Huang, Thint, and Qin 2008 Questions: Why didn’t WordNet/Hypernym features help in L&R? Best results in L&R - ~200,000 feats; ~700 active Can we do as well with fewer features? Approach: Refine features: Restrict use of WordNet to headwords Employ WSD techniques SVM, MaxEnt classifiers

Head Word Features Head words: Chunks and spans can be noisy

Head Word Features Head words: Chunks and spans can be noisy E.g. Bought a share in which baseball team?

Head Word Features Head words: Chunks and spans can be noisy E.g. Bought a share in which baseball team? Type: HUM: group (not ENTY:sport) Head word is more specific

Head Word Features Head words: Chunks and spans can be noisy E.g. Bought a share in which baseball team? Type: HUM: group (not ENTY:sport) Head word is more specific Employ rules over parse trees to extract head words

Head Word Features Head words: Chunks and spans can be noisy E.g. Bought a share in which baseball team? Type: HUM: group (not ENTY:sport) Head word is more specific Employ rules over parse trees to extract head words Issue: vague heads E.g. What is the proper name for a female walrus? Head = ‘name’?

Head Word Features Head words: Chunks and spans can be noisy E.g. Bought a share in which baseball team? Type: HUM: group (not ENTY:sport) Head word is more specific Employ rules over parse trees to extract head words Issue: vague heads E.g. What is the proper name for a female walrus? Head = ‘name’? Apply fix patterns to extract sub-head (e.g. walrus)

Head Word Features Head words: Chunks and spans can be noisy E.g. Bought a share in which baseball team? Type: HUM: group (not ENTY:sport) Head word is more specific Employ rules over parse trees to extract head words Issue: vague heads E.g. What is the proper name for a female walrus? Head = ‘name’? Apply fix patterns to extract sub-head (e.g. walrus) Also, simple regexp for other feature type E.g. ‘what is’ cue to definition type

WordNet Features Hypernyms: Enable generalization: dog->..->animal Can generate noise: also

WordNet Features Hypernyms: Enable generalization: dog->..->animal Can generate noise: also dog ->…-> person

WordNet Features Hypernyms: Adding low noise hypernyms Enable generalization: dog->..->animal Can generate noise: also dog ->…-> person Adding low noise hypernyms Which senses?

WordNet Features Hypernyms: Adding low noise hypernyms Enable generalization: dog->..->animal Can generate noise: also dog ->…-> person Adding low noise hypernyms Which senses? Restrict to matching WordNet POS

WordNet Features Hypernyms: Adding low noise hypernyms Enable generalization: dog->..->animal Can generate noise: also dog ->…-> person Adding low noise hypernyms Which senses? Restrict to matching WordNet POS Which word senses?

WordNet Features Hypernyms: Adding low noise hypernyms Enable generalization: dog->..->animal Can generate noise: also dog ->…-> person Adding low noise hypernyms Which senses? Restrict to matching WordNet POS Which word senses? Use Lesk algorithm: overlap b/t question & WN gloss

WordNet Features Hypernyms: Adding low noise hypernyms Enable generalization: dog->..->animal Can generate noise: also dog ->…-> person Adding low noise hypernyms Which senses? Restrict to matching WordNet POS Which word senses? Use Lesk algorithm: overlap b/t question & WN gloss How deep?

WordNet Features Hypernyms: Adding low noise hypernyms Enable generalization: dog->..->animal Can generate noise: also dog ->…-> person Adding low noise hypernyms Which senses? Restrict to matching WordNet POS Which word senses? Use Lesk algorithm: overlap b/t question & WN gloss How deep? Based on validation set: 6

WordNet Features Hypernyms: Adding low noise hypernyms Enable generalization: dog->..->animal Can generate noise: also dog ->…-> person Adding low noise hypernyms Which senses? Restrict to matching WordNet POS Which word senses? Use Lesk algorithm: overlap b/t question & WN gloss How deep? Based on validation set: 6 Q Type similarity: compute similarity b/t headword & type Use type as feature

Other Features Question wh-word: What,which,who,where,when,how,why, and rest

Other Features Question wh-word: N-grams: uni-,bi-,tri-grams What,which,who,where,when,how,why, and rest N-grams: uni-,bi-,tri-grams

Other Features Question wh-word: N-grams: uni-,bi-,tri-grams What,which,who,where,when,how,why, and rest N-grams: uni-,bi-,tri-grams Word shape: Case features: all upper, all lower, mixed, all digit, other

Results Per feature-type results:

Results: Incremental Additive improvement:

Error Analysis Inherent ambiguity: What is mad cow disease? ENT: disease or DESC:def

Error Analysis Inherent ambiguity: Inconsistent labeling: What is mad cow disease? ENT: disease or DESC:def Inconsistent labeling: What is the population of Kansas? NUM: other What is the population of Arcadia, FL ?

Error Analysis Inherent ambiguity: Inconsistent labeling: Parser error What is mad cow disease? ENT: disease or DESC:def Inconsistent labeling: What is the population of Kansas? NUM: other What is the population of Arcadia, FL ? NUM:count Parser error

Question Classification: Summary Issue: Integrating rich features/deeper processing

Question Classification: Summary Issue: Integrating rich features/deeper processing Errors in processing introduce noise

Question Classification: Summary Issue: Integrating rich features/deeper processing Errors in processing introduce noise Noise in added features increases error

Question Classification: Summary Issue: Integrating rich features/deeper processing Errors in processing introduce noise Noise in added features increases error Large numbers of features can be problematic for training

Question Classification: Summary Issue: Integrating rich features/deeper processing Errors in processing introduce noise Noise in added features increases error Large numbers of features can be problematic for training Alternative solutions:

Question Classification: Summary Issue: Integrating rich features/deeper processing Errors in processing introduce noise Noise in added features increases error Large numbers of features can be problematic for training Alternative solutions: Use more accurate shallow processing, better classifier

Question Classification: Summary Issue: Integrating rich features/deeper processing Errors in processing introduce noise Noise in added features increases error Large numbers of features can be problematic for training Alternative solutions: Use more accurate shallow processing, better classifier Restrict addition of features to

Question Classification: Summary Issue: Integrating rich features/deeper processing Errors in processing introduce noise Noise in added features increases error Large numbers of features can be problematic for training Alternative solutions: Use more accurate shallow processing, better classifier Restrict addition of features to Informer spans Headwords

Question Classification: Summary Issue: Integrating rich features/deeper processing Errors in processing introduce noise Noise in added features increases error Large numbers of features can be problematic for training Alternative solutions: Use more accurate shallow processing, better classifier Restrict addition of features to Informer spans Headwords Filter features to be added