1 Discourse Connective Argument Identification with Connective Specific Rankers ► Paper by Robert Elwell and Jason Baldridge ► Presented by Alexander Shoulson, University of Pennsylvania ► Feb. 26, 2011
2 Overview ► Introduction ► Problem Motivation ► Model ► Type-based Approach ► Feature Engineering ► Results and Analysis ► Conclusions
3 Introduction ► Task: identify the ARG1 and ARG2 spans of each discourse connective ► Uses PDTB 1.0 ► Gold-standard connectives are already given ► Follows Wellner and Pustejovsky (W&P) (2007) ► Their work uses maximum-entropy rankers and treats every connective as the same class ► E&B argue that this mixes conflicting information ► Propose treating classes of connectives differently ► One model for each
4 Which classes? (1/3) ► Subordinating conjunctions Drug makers shouldn’t be able to duck liability because people couldn’t identify precisely which identical drug was used.
5 Which classes? (2/3) ► Subordinating conjunctions ► Coordinating conjunctions Choose 203 business executives, including, perhaps, someone from your own staff, and put them out on the streets, to be deprived for one month of their homes, families, and income.
6 Which classes? (3/3) ► Subordinating conjunctions ► Coordinating conjunctions ► Adverbial connectives France’s second-largest government-owned insurance company, Assurances Generales de France, is building its own Navigation Mixte stake, currently thought to be between 8% and 10%. Analysts said they don’t think it is contemplating a takeover, however, and its officials couldn’t be reached.
7 Class Relevance (1/2) ► Why split by classes? ► Different classes behave differently ► Adverbials, for instance, prefer more distant arguments ► Why split by these classes? ► Behavior based on syntactic type (Knott 1996) ► Subordinating conjunctions, coordinating conjunctions, discourse adverbials, prepositional phrases, phrases taking sentence complements ► Only some connectives have structural links to their arguments, while others have anaphoric links
8 Class Relevance (2/2) ► Examples of connectives of each class: ► Coordinating: and, or, but, yet, then ► Subordinating: because, when, since, even though, except when ► Other (adverbial): afterwards, previously, nonetheless, actually, again
9 Exploiting PDTB (1/2) ► Overlapping arguments/connectives John loves Barolo. He ordered three cases of the ’97. But he had to cancel the order because then he discovered he was broke.
10 Exploiting PDTB (2/2) ► Why does this matter? ► Can use for feature engineering ► Include features that state: ► Previous and following connectives ► Whether or not there is an overlap of candidates
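Aside: a minimal sketch of what such neighbor-and-overlap features could look like, assuming connectives sorted by document position and candidate argument spans given as (start, end) offsets; every name here is illustrative, not from the paper.

```python
# Hypothetical sketch (not the paper's code): context features drawn from
# neighboring connectives and span overlap between candidate arguments.
# `connectives` is a list of connective strings sorted by document position;
# `spans` is a parallel list of (start, end) offsets for each candidate span.

def neighbor_features(i, connectives, spans):
    feats = {}
    if i > 0:
        feats["prev_conn"] = connectives[i - 1]        # previous connective
    if i + 1 < len(connectives):
        feats["next_conn"] = connectives[i + 1]        # following connective
    start, end = spans[i]
    # does this candidate span overlap any other connective's candidate span?
    feats["overlaps_other"] = any(
        not (end < s or e < start)                     # interval intersection
        for j, (s, e) in enumerate(spans) if j != i
    )
    return feats
```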
11 Model Overview (1/2) ► Two stages: ► Identify heads of candidate arguments ► Select the best candidate ► Some restrictions (sketched below): ► Select candidates (following W&P) only within ten steps of the connective ► Stay within the same sentence as the connective for ARG2
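Aside: a rough sketch of the candidate restrictions above; representing heads as (position, sentence_index) pairs is an assumption made purely for illustration.

```python
# Hypothetical sketch (not the paper's code) of the candidate restrictions:
# ARG1 heads come from within ten steps of the connective; ARG2 heads must
# stay in the connective's own sentence.

MAX_STEPS = 10

def arg1_candidates(heads, conn_pos):
    """Heads within ten steps before (or at) the connective's position."""
    return [(pos, sent) for pos, sent in heads
            if 0 <= conn_pos - pos <= MAX_STEPS]

def arg2_candidates(heads, conn_sent):
    """Heads restricted to the connective's own sentence."""
    return [(pos, sent) for pos, sent in heads if sent == conn_sent]
```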
12 Model Overview (2/2) ► Use a maximum-entropy ranker ► Why maximum entropy? ► Accurate ► No feature-independence assumption (unlike Naïve Bayes) ► Handles overlapping features well ► Fast to train (just a set of weights) ► Why a ranker rather than a classifier? ► A classifier marks every likely candidate ► Gives no indication of which candidate is best ► A ranker assigns a likelihood to every candidate ► Can select the most likely one (see the sketch below)
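Aside: a small sketch of the ranker's selection step under these assumptions, with a hand-rolled softmax over a weighted feature sum; all names are hypothetical.

```python
# Hypothetical sketch (not the paper's code): a maximum-entropy ranker scores
# every candidate and selects the single most likely one, unlike a classifier
# that accepts or rejects each candidate independently.
import math

def rank(candidates, weights, featurize):
    """Softmax over weighted feature sums; returns (best candidate, prob)."""
    scores = [sum(weights.get(name, 0.0) * value
                  for name, value in featurize(c).items())
              for c in candidates]
    z = sum(math.exp(s) for s in scores)               # partition function
    probs = [math.exp(s) / z for s in scores]          # one prob per candidate
    best = max(range(len(candidates)), key=lambda k: probs[k])
    return candidates[best], probs[best]
```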
13 Ranking Model Formula
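A plausible form of the ranking model, written as a standard maximum-entropy ranker over the candidate set A for connective c; the feature functions f_i and weights λ_i follow the usual maxent notation and are assumed here:

$$
p(a \mid c, A) \;=\; \frac{\exp\!\big(\sum_i \lambda_i f_i(a, c)\big)}{\sum_{a' \in A} \exp\!\big(\sum_i \lambda_i f_i(a', c)\big)},
\qquad
\hat{a} \;=\; \arg\max_{a \in A} \, p(a \mid c, A)
$$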
14 Three Core Sets of Models (1/3) ► Generalized Connective Model ► Treats all of the connectives as the same class ► Same as W&P ► We will refer to this as GC
15 Three Core Sets of Models (2/3) ► Generalized Connective Model (GC) ► Connective-Specific Models ► Train one model for each connective ► Captures nuanced word-specific information ► Comes at the cost of data (connectives are sparse) ► Could encounter unseen connectives during testing ► Back off to the GC model ► Refer to these models as SC
16 Three Core Sets of Models (3/3) ► Generalized Connective Model (GC) ► Connective-Specific Models (SC) ► Type-Specific Models ► Uses the three types discussed previously ► Subordinating ► Coordinating ► Adverbial ► Determine connective types a priori from a dictionary ► Refer to these models as TC
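Aside: a minimal sketch of the resulting model-selection logic, combining the SC backoff from the previous slide with the a priori type dictionary; the dictionary entries (mirroring slide 8) and all function names are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): pick the most specific model
# available for a connective, backing off SC -> TC -> GC.

CONNECTIVE_TYPE = {
    "and": "coordinating", "but": "coordinating", "then": "coordinating",
    "because": "subordinating", "when": "subordinating",
    "however": "adverbial", "nonetheless": "adverbial",
}

def select_model(connective, sc_models, tc_models, gc_model):
    """Return SC if trained for this connective, else TC by type, else GC."""
    if connective in sc_models:                # seen in training: use SC
        return sc_models[connective]
    ctype = CONNECTIVE_TYPE.get(connective)    # a priori dictionary lookup
    if ctype in tc_models:                     # known type: use TC
        return tc_models[ctype]
    return gc_model                            # unseen connective: back off
```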
18 Model Interpolation (1/3) ► Use weights to combine all three core model types ► Two steps: ► Combine TC and GC (into TG) ► Combine SC with TG (into SGT) ► Recall: GC = one model overall (general connective); TC = one model per type (type-specific); SC = one model per connective (connective-specific)
19 Model Interpolation (2/3)
20 Model Interpolation (3/3)
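A plausible reconstruction of the interpolation formulas behind these two slides, assuming simple linear mixtures with tuned weights λ and β:

$$
p_{\mathrm{TG}}(a \mid c) \;=\; \lambda\, p_{\mathrm{TC}}(a \mid c) \;+\; (1 - \lambda)\, p_{\mathrm{GC}}(a \mid c)
$$
$$
p_{\mathrm{SGT}}(a \mid c) \;=\; \beta\, p_{\mathrm{SC}}(a \mid c) \;+\; (1 - \beta)\, p_{\mathrm{TG}}(a \mid c)
$$

In this form, TG blends type-specific and general evidence, and SGT then mixes in the connective-specific model where one exists.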
21 Feature Engineering (1/4) ► Use all features from W&P, plus the following:
22 Feature Engineering (2/4) ► These introduce a higher level of context sensitivity ► Allow inference based on surrounding connectives
23 Feature Engineering (3/4) ► In particular, these deal with discourse in quotes ► As opposed to the scope of the entire document ► Offer some degree of attribution detection
24 Feature Engineering (4/4) ► Introduces features from the morpha stemmer ► Discourages selection of head words that are immediate constituents of other connectives
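Aside: a hedged sketch pulling the four feature-engineering slides together. morpha is the real stemmer named above, stood in for here by a generic `stem` callable; the document and candidate attributes are illustrative, not the paper's API.

```python
# Hypothetical sketch (not the paper's code) of the added feature templates:
# quote scope, morpha-style stems, and a flag discouraging heads that sit
# inside another connective. `stem` stands in for the morpha stemmer.

def extra_features(cand, connective, doc, stem):
    return {
        "head_stem": stem(cand.head_word),              # morphological stem
        "conn_stem": stem(connective.text),
        # context sensitivity: candidate and connective inside the same quote,
        # rather than in the scope of the entire document
        "same_quote": doc.in_same_quote(cand, connective),
        # penalize heads that are immediate constituents of other connectives
        "head_in_other_conn": any(
            c is not connective and c.contains(cand.head_word)
            for c in doc.connectives
        ),
    }
```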
25 Results (1/3) ► Accuracy on gold-standard parses: ► Observe that TC and SC outperform GC ► GC-ALL = W&P-Base + new features ► W&P reranks by the combined likelihood of ARG1 and ARG2 ► This model does not (as presented), but could ► The per-argument differences suggest that different feature sets might be useful for ARG1 and ARG2 ► TC-ALL is still too coarse-grained for ARG2
29 Results (2/3) ► Accuracy comparison by parse source: Bikel parser vs. gold standard ► This model doesn't suffer as much as W&P on automatic parses ► Possibly because its features are less dependent on syntax
31 Results (3/3) ► Accuracy on gold-standard parses by type: ► One score, 67.5, stands out. Why is it so high? ► Adverbials are tricky ► Sparse, but don't behave like the two other types ► GC treats them too much like subordinating and coordinating connectives ► SC doesn't have enough data for each adverbial (usually one each per document) ► So treating them as a class yields the best results
33 Conclusions ► Improvement over W&P (77.8% vs. 74.2%) ► How? ► Richer model ► Interpolation of different connective classes ► Exploits qualities unique to each class and connective ► Additional features ► Morphology (morpha stemmer) ► Context sensitivity ► Awareness of other connectives ► Ways to further improve? ► Reranking à la W&P ► Different feature sets for ARG1 and ARG2 ► Use PDTB 2.0
34 References ► Robert Elwell and Jason Baldridge. 2008. Discourse Connective Argument Identification with Connective Specific Rankers. In Proceedings of the 2008 IEEE International Conference on Semantic Computing (ICSC '08). IEEE Computer Society, Washington, DC, USA.