Automatic classification for implicit discourse relations Lin Ziheng.


1 Automatic classification for implicit discourse relations Lin Ziheng

2 PDTB and discourse relations
Explicit relations:
Arg1: The bill intends to restrict the RTC to Treasury borrowings only,
Arg2: unless the agency receives specific congressional authorization. (Alternative) (wsj_2200)
Implicit relations:
Arg1: The loss of more customers is the latest in a string of problems.
Arg2: [for instance] Church's Fried Chicken Inc. and Popeye's Famous Fried Chicken Inc., which have merged, are still troubled by overlapping restaurant locations. (Instantiation) (wsj_2225)

3 PDTB and discourse relations (2)
PDTB hierarchy of relation classes, types and subtypes:
TEMPORAL: Synchronous; Asynchronous (precedence, succession)
CONTINGENCY: Cause (reason, result); Pragmatic Cause (justification); Condition (hypothetical, general, unreal present, unreal past, factual present, factual past); Pragmatic Condition (relevance, implicit assertion)
COMPARISON: Contrast (juxtaposition, opposition); Pragmatic Contrast; Concession (expectation, contra-expectation); Pragmatic Concession
EXPANSION: Conjunction; Instantiation; Restatement (specification, equivalence, generalization); Alternative (conjunctive, disjunctive, chosen alternative); Exception; List

4 PDTB and discourse relations (3)
Level-2 relation types, on the implicit dataset from the training sections (sec. 2 - 21):

Level-1 class | Level-2 type          | Training instances | %     | Adjusted %
TEMPORAL      | Asynchronous          | 583                | 4.36  |
TEMPORAL      | Synchrony             | 213                | 1.59  |
CONTINGENCY   | Cause                 | 3426               | 25.61 | 25.63
CONTINGENCY   | Pragmatic Cause       | 69                 | 0.52  |
CONTINGENCY   | Condition             | 1                  | 0.01  |
CONTINGENCY   | Pragmatic Condition   | 1                  | 0.01  |
COMPARISON    | Contrast              | 1656               | 12.38 | 12.39
COMPARISON    | Pragmatic Contrast    | 4                  | 0.03  |
COMPARISON    | Concession            | 196                | 1.47  |
COMPARISON    | Pragmatic Concession  | 1                  | 0.01  |
EXPANSION     | Conjunction           | 2974               | 22.24 | 22.25
EXPANSION     | Instantiation         | 1176               | 8.79  | 8.8
EXPANSION     | Restatement           | 2570               | 19.21 | 19.23
EXPANSION     | Alternative           | 158                | 1.18  |
EXPANSION     | Exception             | 2                  | 0.01  |
EXPANSION     | List                  | 345                | 2.58  |
Total: 13375; Adjusted total: 13366

Remove Condition, Pragmatic Condition, Pragmatic Contrast, Pragmatic Concession and Exception; 11 relation types remain.
Dominating types: Cause, Conjunction, Restatement

5 Contextual features
Arg1: Tokyu Department Store advanced 260 to 2410.
Arg2: [and] Tokyu Corp. was up 150 at 2890. (List) (wsj_0374)

Arg1: Tokyu Department Store advanced 260 to 2410. Tokyu Corp. was up 150 at 2890.
Arg2: [and] Tokyu Construction gained 170 to 1610. (List) (wsj_0374)

[Figure: two configurations of neighbouring relations r1 and r2: a shared argument (r1.Arg2 = r2.Arg1) and a fully embedded argument (r1 embedded inside r2.Arg1)]

6 Contextual features (2)
For each relation curr, look at the surrounding two relations prev and next, giving a total of six features.
Shared argument:
1. prev.Arg2 = curr.Arg1
2. curr.Arg2 = next.Arg1
Fully embedded argument:
1. prev embedded in curr.Arg1
2. next embedded in curr.Arg2
3. curr embedded in prev.Arg2
4. curr embedded in next.Arg1
(First figure on the previous slide: curr = r2. Second figure on the previous slide: curr = r2.)
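The six binary contextual features can be sketched as follows. The span-based encoding and all names here are illustrative assumptions, not the author's implementation: each relation is a dict whose arguments and overall extent are (start, end) token spans.

```python
# Sketch of the six binary contextual features for a relation `curr`,
# given its neighbours `prev` and `next_`. Relations and arguments are
# modelled as (start, end) token spans; names are illustrative only.

def embedded(inner, outer):
    """True if span `inner` lies fully inside span `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def contextual_features(prev, curr, next_):
    """Each relation is a dict with 'arg1', 'arg2' and 'span' entries."""
    return {
        # shared-argument features
        "prev_arg2_eq_curr_arg1": prev["arg2"] == curr["arg1"],
        "curr_arg2_eq_next_arg1": curr["arg2"] == next_["arg1"],
        # fully-embedded-argument features
        "prev_in_curr_arg1": embedded(prev["span"], curr["arg1"]),
        "next_in_curr_arg2": embedded(next_["span"], curr["arg2"]),
        "curr_in_prev_arg2": embedded(curr["span"], prev["arg2"]),
        "curr_in_next_arg1": embedded(curr["span"], next_["arg1"]),
    }
```

With a span representation like this, the shared-argument case on the previous slide is simply span equality, and the embedded case is span containment.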

7 Syntactic features
Arg1: "The HUD budget has dropped by more than 70% since 1980," argues Mr. Colton.
Arg2: [so] "We've taken more than our fair share. (Cause) (wsj_2227)

8 Syntactic features (2)
Collect all production rules; ignore function tags, such as -TPC, -SBJ, -EXT.
From Arg1: S → NP VP, NP → DT NNP NN, VP → VBZ VP, VP → VBN PP PP, PP → IN NP, NP → QP NN, QP → JJ IN CD, NP → CD, DT → The, NNP → HUD, NN → budget, VBZ → has, VBN → dropped, IN → by, JJ → more, IN → than, CD → 70, NN → %, IN → since, CD → 1980
From Arg2: S → `` NP VP ., NP → PRP, VP → VBP VP, VP → VBN NP, NP → NP PP, NP → JJR, PP → IN NP, NP → PRP$ JJ NN, `` → ``, PRP → We, VBP → 've, VBN → taken, JJR → more, IN → than, PRP$ → our, JJ → fair, NN → share, . → .
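As a sketch, the rule collection can be implemented by walking the parse tree. The nested-tuple tree encoding and the function names below are assumptions for illustration, not the author's code:

```python
# Sketch of production-rule extraction from a constituent parse.
# A tree is a nested tuple (label, child1, child2, ...); leaves are
# plain word strings. Function tags (-SBJ, -TPC, ...) are stripped
# from node labels as on the slide.

def strip_function_tags(label):
    """'NP-SBJ' -> 'NP'; leave non-alphabetic labels (``, ., $) alone."""
    return label.split("-")[0] if label[0].isalpha() else label

def production_rules(tree):
    """Yield rules like 'NP -> DT NN' and lexical rules like 'DT -> The'."""
    label, *children = tree
    rhs = [c if isinstance(c, str) else strip_function_tags(c[0])
           for c in children]
    yield f"{strip_function_tags(label)} -> {' '.join(rhs)}"
    for c in children:
        if not isinstance(c, str):
            yield from production_rules(c)

# A fragment of the Arg1 parse from the slide:
tree = ("S",
        ("NP-SBJ", ("DT", "The"), ("NN", "budget")),
        ("VP", ("VBZ", "has"), ("VP", ("VBN", "dropped"))))
rules = list(production_rules(tree))
# rules include 'S -> NP VP', 'NP -> DT NN', 'DT -> The'
```

Each relation instance then gets one binary feature per distinct rule string seen in Arg1 or Arg2.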

9 Dependency features

10 Dependency features (2)
Collect all words with dependency types from their dependents.
From Arg1: budget ← det nn, dropped ← nsubj aux prep prep, by ← pobj, than ← advmod, 70 ← quantmod, % ← num, since ← pobj, argues ← ccomp nsubj, Colton ← nn
From Arg2: taken ← nsubj aux dobj, more ← prep, than ← pobj, share ← poss amod
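A sketch of this collection step, assuming the dependency parse is available as (head, relation, dependent) triples; the triple representation and names are illustrative, not the author's implementation:

```python
# Sketch: collect, for each head word, the dependency types of its
# dependents, mirroring features like 'dropped <- nsubj aux prep prep'.
from collections import defaultdict

def dependency_features(triples):
    """triples: iterable of (head_word, relation, dependent_word)."""
    deps = defaultdict(list)
    for head, rel, _dep in triples:
        deps[head].append(rel)
    # One feature string per head word and its list of dependent types.
    return {f"{head} <- {' '.join(rels)}" for head, rels in deps.items()}

# Part of the Arg1 parse from the slide:
triples = [
    ("dropped", "nsubj", "budget"),
    ("dropped", "aux", "has"),
    ("dropped", "prep", "by"),
    ("dropped", "prep", "since"),
    ("budget", "det", "The"),
    ("budget", "nn", "HUD"),
]
feats = dependency_features(triples)
```

Note that only the dependency types are kept, not the dependent words themselves, matching the examples above.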

11 Lexical features
Collect all word pairs from Arg1 and Arg2, i.e., all (w_i, w_j) where w_i is a word from Arg1 and w_j is a word from Arg2.
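A minimal sketch of the word-pair extraction; whitespace tokenisation is assumed here purely for illustration:

```python
# Sketch of word-pair features: the cross product of Arg1 tokens
# and Arg2 tokens.
from itertools import product

def word_pairs(arg1, arg2):
    """Return the set of (w_i, w_j) pairs, w_i from Arg1, w_j from Arg2."""
    return {(w1, w2) for w1, w2 in product(arg1.split(), arg2.split())}

pairs = word_pairs("the loss of customers", "restaurants are troubled")
# 4 tokens x 3 tokens = 12 pairs, e.g. ('loss', 'troubled')
```

The cross product grows quickly, which is why the experiments below rank and prune these features rather than use them all.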

12 Experiments
Classifier: OpenNLP MaxEnt
Training data: sections 2 - 21
Test data: section 23
Use mutual information (MI) to rank features for production rules, dependency rules and word pairs separately.
Majority baseline: 26.1%, where all instances are classified as Cause.
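One way the MI ranking can be sketched: score each binary feature by its mutual information with the class label over the training instances, then keep the top-ranked features per feature class. This is an illustrative implementation under those assumptions, not the author's code:

```python
# Sketch of mutual-information feature ranking. Each training instance
# is (feature_set, label); MI is computed between a feature's
# presence/absence and the class label.
import math
from collections import Counter

def mi_rank(instances):
    """Return feature names sorted by MI with the label, highest first."""
    n = len(instances)
    label_count = Counter(label for _, label in instances)
    feat_count = Counter()
    joint = Counter()
    for feats, label in instances:
        for f in feats:
            feat_count[f] += 1
            joint[(f, label)] += 1
    scores = {}
    for f, nf in feat_count.items():
        mi = 0.0
        for label, nl in label_count.items():
            # Sum over X in {feature present, feature absent}.
            for present, nxy in ((True, joint[(f, label)]),
                                 (False, nl - joint[(f, label)])):
                if nxy == 0:
                    continue  # 0 * log(...) contributes nothing
                px = nf / n if present else 1 - nf / n
                pxy = nxy / n
                mi += pxy * math.log(pxy / (px * (nl / n)))
        scores[f] = mi
    return sorted(scores, key=scores.get, reverse=True)
```

A feature that perfectly predicts the label gets maximal MI, while one that occurs uniformly across classes scores zero and is pruned first.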

13 Experiments (2)
Use contextual features plus one other feature class:
context + production rules
context + dependency rules
context + word pairs

14 Experiments (3)
With large numbers of features:
context + all production rules: 36.68%
context + all dependency rules: 27.94%
context + 10,000 word pairs: 35.25%

15 Experiments (4)
Combining all feature classes gives an accuracy of 40.21%. The following shows that all feature classes contribute to the performance:

Production rules | Dependency rules | Word pairs | Context | Accuracy
                 |                  | 20000      | No      | 37.5979
                 |                  | 20000      | Yes     | 38.3812
2000             |                  |            | Yes     | 39.9478
200              | 150              | 200        | Yes     | 40.2089

