Automatic recognition of discourse relations Lecture 3
Can RST analysis be done automatically? In the papers we’ll read, the question is really about local rhetorical relations. Part of the problem is the availability of training data for automatic labelling. Manual annotation is slow and expensive. Lots of data can be cleverly collected, but is it appropriate? (SL’07 paper)
ME’02 Discourse relations are often signaled by cue phrases. CONTRAST: but. EXPLANATION-EVIDENCE: because. But not always. In a manually annotated corpus, 25% of contrast and explanation-evidence relations were marked explicitly by a cue phrase. Mary liked the play, John hated it. He wakes up early every morning. There is a construction site opposite his building.
Cleverly labeling data through patterns with cue phrases CONTRAST [BOS…EOS][BOS But…EOS] [BOS…][but…EOS] [BOS…][although…EOS] [BOS Although…,][…EOS] CAUSE-EXPLANATION [BOS…][because…EOS] [BOS Because…,][…EOS] [BOS…EOS][Thus,…EOS]
Extraction patterns CONDITION [BOS If…,][…EOS] [BOS If…][then…EOS] [BOS…][if…EOS] ELABORATION [BOS…EOS][BOS…for example…EOS] [BOS…][which…EOS] NO-RELATION-SAME-TEXT NO-RELATION-DIFF-TEXT
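The cue-phrase patterns above can be approximated with regular expressions. The following is a minimal sketch, not the extraction code used in ME’02: the three regexes, the names PATTERNS and extract, and the sentence-level handling are illustrative assumptions, and only single-sentence patterns are covered. Each match yields a relation label and two spans with the cue phrase removed.

```python
import re

# Hypothetical, simplified versions of three single-sentence patterns
# from the slides. The cue phrase itself is dropped from both spans.
PATTERNS = [
    # [BOS ...][but ... EOS] -> CONTRAST
    ("CONTRAST",
     re.compile(r"^(?P<left>[^.!?]+?),?\s+but\s+(?P<right>[^.!?]+)[.!?]$",
                re.IGNORECASE)),
    # [BOS ...][because ... EOS] -> CAUSE-EXPLANATION
    ("CAUSE-EXPLANATION",
     re.compile(r"^(?P<left>[^.!?]+?),?\s+because\s+(?P<right>[^.!?]+)[.!?]$",
                re.IGNORECASE)),
    # [BOS If ...,][... EOS] -> CONDITION
    ("CONDITION",
     re.compile(r"^if\s+(?P<left>[^,.!?]+),\s*(?P<right>[^.!?]+)[.!?]$",
                re.IGNORECASE)),
]


def extract(sentence):
    """Return (relation, span1, span2) for the first matching pattern, else None."""
    text = sentence.strip()
    for label, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return label, match.group("left").strip(), match.group("right").strip()
    return None
```

Calling extract on a sentence without any of these cue phrases returns None; such sentences would simply not be collected as training examples.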
Main idea: pairs of words can trigger a given relation. John is good in math and sciences. Paul fails almost every class he takes. Embargo / legally. Features for classification: the Cartesian product of the words in the two text spans being annotated.
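As an illustration of the feature set, here is a small sketch that builds the Cartesian product of the words in two spans. Lowercasing and whitespace tokenization are simplifying assumptions, and word_pairs is a hypothetical helper name.

```python
from itertools import product


def word_pairs(span1, span2):
    """All (w1, w2) pairs with w1 from the first span and w2 from the second."""
    left = span1.lower().split()
    right = span2.lower().split()
    return list(product(left, right))
```

For the spans "John is good" and "Paul fails", this yields six pairs, one of which pairs "good" with "fails".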
Probability of word pairs given a relation: log P(W1, W2 | RLk) + log P(RLk). Classification results are well above the baseline. Using only content words did not seem to be very helpful. The model does not perform that well on manually annotated examples.
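The scoring rule above can be sketched as a Naive Bayes classifier over word pairs, where log P(W1, W2 | RLk) is computed as a sum of log probabilities of the individual pairs. This is a toy sketch, not the ME’02 implementation: the class name WordPairNB, the add-alpha smoothing, and the whitespace tokenization are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product


class WordPairNB:
    """Toy Naive Bayes over word pairs; smoothing scheme is an assumption."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.pair_counts = defaultdict(Counter)  # relation -> pair -> count
        self.relation_counts = Counter()         # relation -> number of examples
        self.vocab = set()                       # all pairs seen in training

    @staticmethod
    def pairs(span1, span2):
        return list(product(span1.lower().split(), span2.lower().split()))

    def train(self, examples):
        """examples: iterable of (relation, span1, span2) triples."""
        for relation, span1, span2 in examples:
            self.relation_counts[relation] += 1
            for pair in self.pairs(span1, span2):
                self.pair_counts[relation][pair] += 1
                self.vocab.add(pair)

    def score(self, relation, span1, span2):
        """log P(W1, W2 | relation) + log P(relation)."""
        total = sum(self.relation_counts.values())
        log_prior = math.log(self.relation_counts[relation] / total)
        counts = self.pair_counts[relation]
        # One extra slot in the denominator reserves mass for unseen pairs.
        denom = sum(counts.values()) + self.alpha * (len(self.vocab) + 1)
        log_lik = sum(math.log((counts[p] + self.alpha) / denom)
                      for p in self.pairs(span1, span2))
        return log_lik + log_prior

    def classify(self, span1, span2):
        return max(self.relation_counts,
                   key=lambda r: self.score(r, span1, span2))
```

Usage: call train on a list of (relation, span1, span2) triples, then classify on a new pair of spans to get the highest-scoring relation.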
Discussion Would be interesting to see the list of the most informative word-pairs per relation Is there an intrinsic difference in clauses explicitly marked for a relation compared to those where the relation is implicit?
B-GMR’07: Offer several improvements over ME’02. Tokenizing and stemming: improves accuracy, reduces model size. Vocabulary size limit/minimum frequency: using the 6,400 most frequent words is best. Using a stoplist: performance deteriorates (as in the original ME’02 paper!). Topic segmentation for better example collection.
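A vocabulary limit of this kind could be applied before building word pairs. The sketch below is an assumption about how such a cutoff might look, not the B-GMR’07 code: build_vocab and filter_span are hypothetical helpers, and tokenization is plain whitespace splitting with no stemming.

```python
from collections import Counter


def build_vocab(spans, max_size=6400, min_freq=1):
    """Keep the max_size most frequent words that occur at least min_freq times."""
    counts = Counter(word for span in spans for word in span.lower().split())
    return {word for word, count in counts.most_common(max_size)
            if count >= min_freq}


def filter_span(span, vocab):
    """Drop words outside the vocabulary before forming word pairs."""
    return [word for word in span.lower().split() if word in vocab]
```

Restricting the vocabulary shrinks the number of possible word pairs quadratically, which is one plausible reason it reduces model size.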
SL’07 Using automatically labeled examples to classify rhetorical relations Is it a good idea? The answer is no, as already hinted by the other papers
Two classifiers: a word-pair based Naïve Bayes model, and a multi-feature (41) BoosTexter model. Features: positional, length, lexical, POS, temporal, cohesion (pronouns and ellipsis).
Explicit note here, not in the previous papers The distribution of different relations in the automatically extracted corpus does not reflect the true distribution In all studies data is downsampled
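Downsampling can be sketched as follows. The slides do not say what target distribution the studies used, so sampling every relation down to the size of the rarest one (a uniform distribution) is an assumption here, as is the helper name downsample.

```python
import random
from collections import defaultdict


def downsample(examples, seed=0):
    """Balance (relation, span1, span2) examples to the rarest relation's count."""
    by_relation = defaultdict(list)
    for example in examples:
        by_relation[example[0]].append(example)
    target = min(len(group) for group in by_relation.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for relation in sorted(by_relation):
        balanced.extend(rng.sample(by_relation[relation], target))
    return balanced
```

A balanced training set makes the prior term uninformative, so any skew in the automatically extracted corpus no longer drives the classifier's decisions.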
Testing on explicit relations Results deteriorate for both machine learning approaches Still better than random Natural data does not seem suitable for training Do not generalize well to examples which occur naturally without unambiguous discourse markers
Training on manually labeled, unmarked data. Less training data is available. Worse for the Naïve Bayes classifier. Good for the BoosTexter model. Why? Semantic redundancy between discourse markers and the context they appear in?
Using the Penn Discourse Treebank. Implicit relations: performance is not that good. Explicit relations: performance is closer to that on the automatically collected test set. Cheap data collection for this task is probably not that good an idea after all!