Automatic classification for implicit discourse relations
Lin Ziheng

PDTB and discourse relations

- Explicit relations
  - Arg1: The bill intends to restrict the RTC to Treasury borrowings only,
  - Arg2: unless the agency receives specific congressional authorization. (Alternative) (wsj_2200)
- Implicit relations
  - Arg1: The loss of more customers is the latest in a string of problems.
  - Arg2: [for instance] Church's Fried Chicken Inc. and Popeye's Famous Fried Chicken Inc., which have merged, are still troubled by overlapping restaurant locations. (Instantiation) (wsj_2225)

PDTB and discourse relations (2)

- PDTB hierarchy of relation classes, types, and subtypes:
  - EXPANSION: Conjunction; Instantiation; Restatement (specification, equivalence, generalization); Alternative (conjunctive, disjunctive, chosen alternative); Exception; List
  - COMPARISON: Contrast (juxtaposition, opposition); Pragmatic Contrast; Concession (expectation, contra-expectation); Pragmatic Concession
  - CONTINGENCY: Cause (reason, result); Pragmatic Cause (justification); Condition (hypothetical, general, unreal present, unreal past, factual present, factual past); Pragmatic Condition (relevance, implicit assertion)
  - TEMPORAL: Synchronous; Asynchronous (precedence, succession)

PDTB and discourse relations (3)

- Level-2 relation types, on the implicit dataset from the training sections (sections 2–21):

| Level-1 class | Level-2 type | Training instances | % | Adjusted % |
|---|---|---|---|---|
| TEMPORAL | Asynchronous | | | |
| | Synchrony | | | |
| CONTINGENCY | Cause | | | |
| | Pragmatic Cause | | | |
| | Condition | 1 | 0.01 | |
| | Pragmatic Condition | 1 | 0.01 | |
| COMPARISON | Contrast | | | |
| | Pragmatic Contrast | 4 | 0.03 | |
| | Concession | | | |
| | Pragmatic Concession | | | |
| EXPANSION | Conjunction | | | |
| | Instantiation | | | |
| | Restatement | | | |
| | Alternative | | | |
| | Exception | 2 | 0.01 | |
| | List | | | |

- Remove Condition, Pragmatic Condition, Pragmatic Contrast, Pragmatic Concession, and Exception, each of which has only a handful of training instances
- 11 relation types remain
- Dominant types: Cause, Conjunction, Restatement

Contextual features

- Shared argument example:
  - Arg1: Tokyu Department Store advanced 260 to …
  - Arg2: [and] Tokyu Corp. was up 150 at … (List) (wsj_0374)
- Fully embedded argument example:
  - Arg1: Tokyu Department Store advanced 260 to … Tokyu Corp. was up 150 at …
  - Arg2: [and] Tokyu Construction gained 170 to 1610. (List) (wsj_0374)

[Figure: two span diagrams. Shared argument: r1.Arg2 coincides with r2.Arg1. Fully embedded argument: relation r1 lies entirely inside r2.Arg1.]

Contextual features (2)

- For each relation curr, look at the two surrounding relations prev and next, giving a total of six binary features (see the sketch below)
- Shared argument:
  1. prev.Arg2 = curr.Arg1
  2. curr.Arg2 = next.Arg1
- Fully embedded argument:
  1. prev embedded in curr.Arg1
  2. next embedded in curr.Arg2
  3. curr embedded in prev.Arg2
  4. curr embedded in next.Arg1
- The first figure on the previous slide illustrates the shared-argument case with curr = r2; the second figure illustrates the fully embedded case with curr = r2
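A minimal sketch of how these six binary features could be computed. The (start, end) span representation, dictionary keys, and helper names are illustrative assumptions; the slides do not prescribe an implementation.

```python
# Sketch: the six contextual features for a relation `curr`, given its
# neighbours `prev` and `nxt` (either may be None at a text boundary).
# Each relation is assumed to be a dict mapping "Arg1"/"Arg2" to
# (start, end) token-offset spans -- an illustrative representation.

def embedded(inner, outer):
    """True iff span `inner` lies entirely within span `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def relation_span(r):
    """Span covering a whole relation, i.e. both of its arguments."""
    return (min(r["Arg1"][0], r["Arg2"][0]), max(r["Arg1"][1], r["Arg2"][1]))

def contextual_features(prev, curr, nxt):
    f = {}
    # Shared-argument features
    f["prev.Arg2=curr.Arg1"] = prev is not None and prev["Arg2"] == curr["Arg1"]
    f["curr.Arg2=next.Arg1"] = nxt is not None and curr["Arg2"] == nxt["Arg1"]
    # Fully-embedded-argument features
    f["prev_in_curr.Arg1"] = prev is not None and embedded(relation_span(prev), curr["Arg1"])
    f["next_in_curr.Arg2"] = nxt is not None and embedded(relation_span(nxt), curr["Arg2"])
    f["curr_in_prev.Arg2"] = prev is not None and embedded(relation_span(curr), prev["Arg2"])
    f["curr_in_next.Arg1"] = nxt is not None and embedded(relation_span(curr), nxt["Arg1"])
    return f
```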

Syntactic Features

- Arg1: "The HUD budget has dropped by more than 70% since 1980," argues Mr. Colton.
- Arg2: [so] "We've taken more than our fair share." (Cause) (wsj_2227)

Syntactic Features (2)

- Collect all production rules (see the sketch below):
  - Ignore function tags, such as -TPC, -SBJ, -EXT
- From Arg1: S → NP VP, NP → DT NNP NN, VP → VBZ VP, VP → VBN PP PP, PP → IN NP, NP → QP NN, QP → JJ IN CD, NP → CD, DT → The, NNP → HUD, NN → budget, VBZ → has, VBN → dropped, IN → by, JJ → more, IN → than, CD → 70, NN → %, IN → since, CD → 1980
- From Arg2: S → `` NP VP ., NP → PRP, VP → VBP VP, VP → VBN NP, NP → NP PP, NP → JJR, PP → IN NP, NP → PRP$ JJ NN, `` → ``, PRP → We, VBP → 've, VBN → taken, JJR → more, IN → than, PRP$ → our, JJ → fair, NN → share, . → .
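One way to collect these rules, sketched with NLTK's `Tree` class (an assumed tooling choice; the slides do not name a parser or library). The sample tree is an abbreviated parse of Arg2, omitting the quotation mark and final period for brevity.

```python
from nltk.tree import Tree

def strip_function_tags(label):
    """Reduce e.g. 'NP-SBJ-1' or 'NP=2' to 'NP'; leave other labels intact."""
    return label.split("-")[0].split("=")[0] or label

def production_rules(tree):
    """Collect every production in the tree, including POS -> word rules."""
    rules = set()
    for sub in tree.subtrees():
        lhs = strip_function_tags(sub.label())
        if isinstance(sub[0], Tree):      # internal node: children are labels
            rhs = " ".join(strip_function_tags(c.label()) for c in sub)
        else:                             # preterminal: the child is the word
            rhs = sub[0]
        rules.add(f"{lhs} -> {rhs}")
    return rules

# Abbreviated parse of Arg2 ("We've taken more than our fair share")
arg2 = Tree.fromstring(
    "(S (NP (PRP We)) (VP (VBP 've) (VP (VBN taken) (NP (NP (JJR more)) "
    "(PP (IN than) (NP (PRP$ our) (JJ fair) (NN share)))))))")
print(sorted(production_rules(arg2)))
```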

Dependency features

Dependency features (2)

- Collect all words with the dependency types of their dependents (sketched below)
- From Arg1: budget ← det nn, dropped ← nsubj aux prep prep, by ← pobj, than ← advmod, 70 ← quantmod, % ← num, since ← pobj, argues ← ccomp nsubj, Colton ← nn
- From Arg2: taken ← nsubj aux dobj, more ← prep, than ← pobj, share ← poss amod
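A sketch of collecting these features, assuming the dependency parse is available as (head word, relation, dependent word) triples; the triple format mirrors common parser output but is an assumption here, and the example triples are written by hand.

```python
from collections import defaultdict

def dependency_features(triples):
    """For each head word, emit one feature listing its dependents' relation types."""
    deps = defaultdict(list)
    for head, rel, _dependent in triples:
        deps[head].append(rel)
    return {f"{head} <- {' '.join(rels)}" for head, rels in deps.items()}

# Hand-written triples for Arg2 ("We've taken more than our fair share")
arg2_triples = [
    ("taken", "nsubj", "We"), ("taken", "aux", "'ve"), ("taken", "dobj", "more"),
    ("more", "prep", "than"), ("than", "pobj", "share"),
    ("share", "poss", "our"), ("share", "amod", "fair"),
]
# Yields features such as 'taken <- nsubj aux dobj' and 'share <- poss amod'
print(dependency_features(arg2_triples))
```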

Lexical features

- Collect all word pairs from Arg1 and Arg2, i.e., all (w_i, w_j) where w_i is a word from Arg1 and w_j is a word from Arg2 (see the sketch below)
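A minimal sketch: the word-pair features are simply the Cartesian product of the two argument token lists, with each pair acting as one binary feature.

```python
from itertools import product

def word_pair_features(arg1_tokens, arg2_tokens):
    """One binary feature per (Arg1 word, Arg2 word) pair."""
    return {f"{w1}|{w2}" for w1, w2 in product(arg1_tokens, arg2_tokens)}

# Four features: budget|taken, budget|share, dropped|taken, dropped|share
print(word_pair_features(["budget", "dropped"], ["taken", "share"]))
```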

Experiments

- Classifier: OpenNLP MaxEnt
- Training data: sections 2–21
- Test data: section 23
- Use mutual information (MI) to rank features for production rules, dependency rules, and word pairs separately (one standard formulation is sketched below)
- Majority baseline: 26.1%, with all instances classified as Cause
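The slides do not give the MI formula, so the following is one standard formulation for scoring a binary feature against the relation labels, offered as an illustrative assumption rather than the original implementation.

```python
import math
from collections import Counter

def mutual_information(instances, feature):
    """MI between a binary feature and the class label.

    instances: list of (feature_set, label) pairs from the training data.
    """
    n = len(instances)
    joint = Counter(((feature in feats), label) for feats, label in instances)
    f_marg = Counter((feature in feats) for feats, _ in instances)
    l_marg = Counter(label for _, label in instances)
    mi = 0.0
    for (f, l), count in joint.items():
        p_joint = count / n
        p_indep = (f_marg[f] / n) * (l_marg[l] / n)
        mi += p_joint * math.log(p_joint / p_indep)
    return mi

def top_k_features(instances, candidates, k):
    """Rank candidate features by MI and keep the top k."""
    return sorted(candidates, key=lambda f: mutual_information(instances, f),
                  reverse=True)[:k]
```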

Experiments (2)

- Use the contextual features plus one other feature class:
  - context + production rules
  - context + dependency rules
  - context + word pairs

Experiments (3)

- With large numbers of features:
  - context + all production rules: 36.68%
  - context + all dependency rules: 27.94%
  - context + 10,000 word pairs: 35.25%

Experiments (4)

- Combining all feature classes gives an accuracy of 40.21%
- An ablation over the numbers of production rules, dependency rules, and word pairs, with and without the contextual features, shows that all feature classes contribute to the performance

[Table: ablation results varying production rules, dependency rules, word pairs, and context, with the resulting accuracies]