1
The Resolution of Speculation and Negation
Thomas McMahon
2
Introduction Paper: "Speculation and Negation: Rules, Rankers, and the Role of Syntax", written by Erik Velldal, Lilja Øvrelid, Jonathon Read, and Stephan Oepen. Focus: classification of cues and resolution of scope in speculative text. The techniques developed for speculation are then applied to negation resolution with minimal change.
3
Topic Choice I chose this topic because I believed that the techniques used in the paper could also be applied to my own project (with some modifications). Additionally, since a portion of the paper discusses combining machine learning with manually defined rules, I was interested in how such a combination could be implemented.
4
Domain: Biomedical Texts
In 2008, a corpus of biomedical texts known as the BioScope corpus was released. Among other things, the corpus includes annotations for both speculation and negation. Since its release, interest in the resolution of speculation and negation has grown, particularly for biomedical texts. This paper is no different, using the BioScope corpus as training and test data for its models.
5
What is Speculation? In NLP, speculation refers to uncertainty in text: a speculative statement could be either true or false. For example, in "He could have gone to the store", the person referred to as "He" may or may not have actually gone to the store. The word "could" is known as a speculation cue, and in this case the scope of the speculation is the entire sentence.
6
Components of Speculation Resolution
There are two major tasks in speculation resolution: cue recognition and scope resolution. Consider a more complex example: "He went to the store, and he may have returned." Only a portion of the sentence is speculative ("he may have returned"). Scope resolution is the task of recognizing these speculation bounds.
7
Speculation: Cue Identification
High level: classify every token in a given text as either "cue" or "non-cue". Problems: Some cues are composed of multiple tokens: "These results indicate that there is a correlation." Almost all words that act as speculation cues also function as non-cues. Cue: "They appeared to be talking." Non-cue: "They appeared within five minutes of the act."
8
Cue Identification Evaluation
There are three levels at which cue identification is measured in this paper:
Sentence-level: whether or not a sentence contains speculation.
Token-level: whether or not a given token is a speculation cue.
Cue-level: whether or not a complete cue has been classified correctly.
9
Cue Identification: Initial Approach
A binary token classifier is used to classify every word as either a "cue" or a "non-cue", referred to as "word-by-word" (WbW) classification. Implemented using a Support Vector Machine (SVM). Achieves an F1 of 79.82 on the test data in the cue-level evaluation. Unable to identify cues unseen in the training data.
10
Filtered Approach The second attempt filters the words being classified so that only words that occur as cues in the training data are considered. This reduces the complexity of the model and the set of relevant training data. The filtered model cannot resolve unseen cues, but neither could the unfiltered approach: a 10-fold split showed that, on average, 90% of the cues seen in the test data also occur in the training data.
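A minimal sketch of the filtering step in Python (the function and variable names are illustrative, not from the authors' code): only tokens observed as cues in training are passed to the classifier, and everything else is labeled "non-cue" outright.

```python
def build_cue_lexicon(training_sentences):
    """Collect every word form that occurs at least once as a cue.

    training_sentences: iterable of (tokens, labels) pairs, where
    labels[i] is "cue" or "non-cue" for tokens[i].
    """
    lexicon = set()
    for tokens, labels in training_sentences:
        for token, label in zip(tokens, labels):
            if label == "cue":
                lexicon.add(token.lower())
    return lexicon

def candidate_positions(tokens, lexicon):
    """Return the positions the classifier must decide on; all other
    tokens are labeled "non-cue" without consulting the model."""
    return [i for i, t in enumerate(tokens) if t.lower() in lexicon]
```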
11
Classifier A global classifier for all cues is trained, rather than a unique classifier for each cue: most cues appear too infrequently to train individual classifiers, and most multi-word cues contain highly frequent function words, which are filtered out of the training data. Once again, an SVM is used for classification.
12
Support Vector Machine
A machine learning model used to classify new data based on training data. It learns a hyperplane that divides instances into different classes. The data must be represented by features that position each instance on one side of the hyperplane or the other.
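As a toy illustration, the following sketch trains a linear SVM with scikit-learn on two hand-made feature dictionaries (the paper's actual toolkit, features, and hyperparameters are not reproduced here):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Two toy training instances with categorical context features.
train_feats = [{"lemma-1": "results", "lemma+1": "that"},
               {"lemma-1": "they", "lemma+1": "within"}]
train_labels = ["cue", "non-cue"]

vec = DictVectorizer()
X = vec.fit_transform(train_feats)   # map feature dicts to vectors
clf = LinearSVC()                    # learns a separating hyperplane
clf.fit(X, train_labels)

test = vec.transform([{"lemma-1": "data", "lemma+1": "that"}])
print(clf.predict(test))             # e.g. ['cue']
```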
13
Model Features After testing various feature configurations, it was decided that n-grams should be used rather than syntactic context. The features used are: lemma forms of neighbors, three positions left and right of the potential cue; surface forms of neighbors, two positions right of the potential cue. The cue itself is disregarded, for generality.
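A sketch of this feature extraction, assuming lemmas have already been computed (the exact feature templates in the released system may differ):

```python
def cue_features(lemmas, surfaces, i):
    """Context features for the token at position i."""
    feats = {}
    for offset in range(-3, 4):          # lemmas, 3 left and 3 right
        if offset == 0:
            continue                     # the cue itself is disregarded
        j = i + offset
        if 0 <= j < len(lemmas):
            feats[f"lemma{offset:+d}"] = lemmas[j]
    for offset in (1, 2):                # surface forms, 2 to the right
        j = i + offset
        if j < len(surfaces):
            feats[f"surface{offset:+d}"] = surfaces[j]
    return feats
```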
14
Multiword Cues (MWC) Heuristics
As each token is classified individually, it is possible that one portion of a MWC is classified correctly while the other is not. Since the majority of MWCs are composed of a few highly frequent n-grams, heuristics can be applied to help capture these instances. The most frequent MWC is "indicate that": if "indicate" is classified as a cue and "that" follows, "that" is automatically considered a cue as well.
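A minimal sketch of that single heuristic as a post-processing pass over the classifier's token labels (a simplification of the full pattern set listed on the next slide):

```python
def apply_indicate_that(tokens, labels):
    """If an "indicate" form is labeled a cue and "that" immediately
    follows, mark "that" as part of the cue too."""
    for i, (tok, lab) in enumerate(zip(tokens, labels)):
        if (lab == "cue" and tok.lower().startswith("indicate")
                and i + 1 < len(tokens)
                and tokens[i + 1].lower() == "that"):
            labels[i + 1] = "cue"
    return labels
```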
15
MWC Heuristic List:
cannot {be}? exclude
either .+ or
indicate that
may ,? or may not
no {evidence | proof | understood | exclude}
raise the .* {possibility | question | issue | hypothesis}
whether or not
16
Performance of Classifiers
The table below shows the results of each classifier on the test data; the figures are for the cue-level evaluation. The baseline classifier simply labels each token "cue" or "non-cue" according to whichever label that token carried more frequently in the training data.

Classifier | F1 (Training Data) | F1 (Test Data)
Baseline | 85.57 | 73.80
WbW Classifier | 88.02 | 79.82
Filtered Model | 89.11 | 80.80
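The baseline amounts to a per-token majority vote, sketched below (the handling of tokens never seen in training is an assumption; defaulting to "non-cue" is the natural choice, since most tokens are not cues):

```python
from collections import Counter, defaultdict

def train_baseline(training_sentences):
    """For each token, remember its most frequent training label."""
    counts = defaultdict(Counter)
    for tokens, labels in training_sentences:
        for token, label in zip(tokens, labels):
            counts[token.lower()][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def baseline_predict(tokens, majority):
    # Unseen tokens default to "non-cue" (an assumption, not from the paper).
    return [majority.get(t.lower(), "non-cue") for t in tokens]
```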
17
Error Analysis 74% of errors are false negatives. Most of these are due to function-word cues, which are highly ambiguous; others are caused by tokens that appear only once as a cue in the training data. 26% of errors are false positives. Due to annotation errors in the corpus, 60% of these should actually be considered true positives.
18
Speculation Scope Resolution
The task of finding how much of a given sentence is affected by a speculation cue. It only applies to sentences that contain a speculation cue. As the corpus documentation itself states, the scope of speculation correlates heavily with the syntactic structure of the sentence.
19
Scope Resolution Techniques
Two different approaches are used, based on differing views of syntactic analysis: dependency parses and constituent parses. The dependency parse is used for a manually defined, rule-based approach; the constituent parse is used for a machine learning approach.
20
Running Example The following sentence will be used (as it is in the paper) to demonstrate the effects of the different techniques: "The unknown amino acid may be used by these species." Here "may" is the speculation cue (red in the original slides), and the speculation scope (blue) covers the entire sentence.
21
Rule-Based Approach The speculative sentence is parsed into a dependency tree with additional syntactic information; MaltParser is used for this purpose. The speculation scope defaults to starting at the cue and continuing to the end of the sentence. The scope is then modified based on the PoS tag of the speculation cue.
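A sketch of this default-plus-adjustment scheme (heavily simplified: the dispatch-by-PoS-tag structure and names are assumptions, and the real rules consult the full MaltParser dependency analysis):

```python
def default_scope(tokens, cue_index):
    """Scope runs from the cue to the end of the sentence."""
    return (cue_index, len(tokens) - 1)

def resolve_scope(tokens, pos_tags, cue_index, rules):
    """rules maps a cue PoS tag (e.g. "MD" for modals) to a function
    that adjusts the (start, end) token span of the scope."""
    start, end = default_scope(tokens, cue_index)
    rule = rules.get(pos_tags[cue_index])
    if rule is not None:
        start, end = rule(tokens, pos_tags, cue_index, start, end)
    return (start, end)
```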
22
Example In the dependency parse of the example sentence, the cue "may" is recognized as a modal verb.
23
Example The default scope is set to begin at the cue and run to the end of the sentence, covering "may be used by these species".
24
Example When the cue is a modal verb, the subject is included in the scope if the verb it governs ("used") is recognized as either a passive or a raising verb. The scope therefore expands to cover the entire sentence: "The unknown amino acid may be used by these species."
25
Chart of Rules The chart below shows the 10 rules used to define the scope of speculation in this approach. The rules are based both on the instructions given to the annotators of the corpus and on observations from inspecting examples.
26
Results of Rule-Based Approach
The approach is able to replicate the exact scope of speculation 72.3% of the time. Of the 185 failures, expert analysis shows that 85 of the errors were genuine system errors, 22 were caused by annotation errors, and the cause of the remaining 7 is up for debate. From these findings, the conclusion is that this approach fails on scopes that correspond more closely to a constituent parse.
27
Data-Driven Approach This approach uses constituent parses of a sentence, rather than a dependency parse. The constituent parse is built using the LinGO English Resource Grammar (ERG). After the constituent tree is built, candidate scopes are selected for ranking; candidate scopes consist of any ancestor of the cue word in the parse. Ranking is done using a ranking SVM.
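A sketch of candidate extraction, using an illustrative tree type of my own (the real system operates on ERG derivation trees): every constituent dominating the cue's leaf node becomes a candidate.

```python
class Constituent:
    """Minimal constituent-tree node (illustrative, not the ERG API)."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def candidate_scopes(cue_leaf):
    """Walk upward from the cue's leaf, collecting every ancestor."""
    candidates, node = [], cue_leaf.parent
    while node is not None:
        candidates.append(node)
        node = node.parent
    return candidates
```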
28
Example Constituent Parse
Below is an example constituent tree produced using the ERG.
29
Example Constituent Parse
From the cue, three candidate scopes are identified.
30
Processing Training Data
Training data is processed by first parsing all sentences that contain speculation cues into constituent trees. If a candidate scope corresponds to the actual annotated scope, it is labeled "correct"; all other candidate scopes are labeled "incorrect".
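Sketched below, with each candidate reduced to its (start, end) token span for comparison against the gold annotation (this span-based representation is an assumption for illustration):

```python
def label_candidates(candidate_spans, gold_scope):
    """Pair each candidate span with True ("correct") only if it
    exactly matches the annotated scope."""
    return [(span, span == gold_scope) for span in candidate_spans]

# Example: three candidates, gold scope is tokens 4..8.
print(label_candidates([(4, 5), (4, 8), (0, 8)], (4, 8)))
# -> [((4, 5), False), ((4, 8), True), ((0, 8), False)]
```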
31
Slackening Heuristics
There are times when a candidate scope nearly matches the actual scope, but is missing certain tokens or punctuation, or includes too much. Certain heuristics are applied to handle recurring instances of this problem.
32
Candidate Scope Features
The path from the cue to the candidate constituent: in our example, v_vp_mdl-p_le -> hd-cmp_u_c -> sb-hd_mc_c is the path from the cue up to the topmost constituent. A generalized version uses just the cue and the constituent: (may, sb-hd_mc_c). Surface properties, such as the position of the cue in the constituent. Linguistic phenomena discovered in the rule-based approach.
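A sketch of the path feature, reusing the illustrative Constituent class from the earlier candidate-extraction sketch:

```python
def path_features(cue_leaf, candidate, cue_form):
    """Node labels from the cue up to the candidate constituent,
    plus the generalized (cue, constituent) pair."""
    labels, node = [], cue_leaf
    while node is not candidate:
        labels.append(node.label)
        node = node.parent
    labels.append(candidate.label)
    return {"path": "->".join(labels),
            "cue+const": f"({cue_form}, {candidate.label})"}
```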
33
Results of Two Approaches
The following table shows the performance of both the rule-based and the data-driven approach. These results are based on gold-standard cues, rather than the output of the cue classifier.

Approach | F1 (Training Data) | F1 (Test Data)
Rule-Based | 73.40 | 66.60
Data-Driven | 73.61 | 58.37
34
Hybrid Approach Both approaches perform fairly well. Though the rules outperform the ranker, analysis of each approach and its errors shows that they tend to fail in different situations. Because of this, a hybrid approach is attempted, hoping to leverage the advantages of both.
35
Combining Approaches The hybrid approach works as follows:
If an ERG parse is available for the given sentence, information from the dependency rules is fed into the constituent ranker: a feature is added recording whether a candidate scope matches the scope predicted by the dependency rules. Otherwise, the dependency rules alone are used.
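A sketch of that agreement feature, assuming each candidate is represented by a feature dict and a token span (per-candidate marking is my reading of the setup):

```python
def add_rule_agreement(candidates, rule_scope):
    """candidates: list of (feature_dict, (start, end)) pairs.
    Mark each candidate with whether it matches the span that the
    dependency rules predicted."""
    for feats, span in candidates:
        feats["matches-rule-scope"] = (span == rule_scope)
    return candidates
```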
36
Hybrid Results The following table shows the results for all three approaches; the hybrid approach outperforms the other two individually.

Approach | F1 (Training Data) | F1 (Test Data)
Rules | 73.40 | 66.60
Ranker | 73.61 | 58.37
Hybrid | 78.69 | 69.60
37
Overall Speculation Performance
The following table shows the performance of the three approaches when using their own cue classifier. The system outperforms the next leading end-to-end performer, though not significantly.

Approach | F1 (Training Data) | F1 (Test Data)
Rule-Based | 68.61 | 56.48
Data-Driven | 68.64 | 49.52
Hybrid | 73.11 | 59.41
38
Negation Negation, like speculation, consists of a cue word and a scope. Detecting negation is an important component of sentiment analysis. As negation and speculation are treated similarly in the corpus, the approaches used for speculation are ported to negation, with slight modifications.
39
Negation Cue Detection
Similar to speculation, 82% of the negation cues that occur in the training data also have non-cue uses. The following modifications were made to the cue classifier to deal with negation: the feature set was changed, and MWC heuristics were added.
40
Classifier Modifications
The negation classifier uses lemmas two positions left and right of the focus word, and the surface form one position to the right. The same method for resolving MWC occurrences is used here, but with a different set of rules:
rather than
{can|could} not
no longer
instead of
with the * exception of
neither * nor
{no(t?)|neither} * nor
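For illustration, one of these patterns ("neither * nor") rendered as a regular expression over a sentence string (the real heuristics operate on classified tokens; this translation is my own):

```python
import re

# "neither ... nor" with arbitrary material in between.
NEITHER_NOR = re.compile(r"\bneither\b.*\bnor\b", re.IGNORECASE)

sentence = "The drug affected neither growth nor survival."
print(bool(NEITHER_NOR.search(sentence)))   # -> True
```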
41
Negation Cue Results The table below shows the F1 scores for the negation cue classifier; the same WbW classifier is used for comparison. Once again, the measures here are for the cue-level evaluation.

Classifier | F1 (Training Data) | F1 (Test Data)
WbW Classifier | 94.37 | 84.84
Filtered Approach | 96.45 | 90.33
42
Cue Classifier Error Analysis
Whereas the speculation classifier's errors mostly consisted of false negatives (74%), the negation classifier's errors mostly consist of false positives (85%). The large number of false positives is caused by two highly ambiguous tokens: "not" and "absence".
43
Adapting Rule-Based Approach
The rule-based approach for speculation contains many rules based on the instructions given to the annotators of the corpus. The same instructions applied to the negation annotation, so the same rules are used for the negation scope resolver. Additional rules are also defined to deal with negation-specific situations.
44
Negation Specific Rules
The table below shows the 6 rules added for negation.
45
Adapting Data-Driven Approach
An additional heuristic was added for slackening scope. Linguistic features were added to record the presence of adverb cues with verbal heads.
46
Results of Scope Resolution - Negation
The table below shows the results for both approaches individually, followed by the performance of the same hybrid approach as described before.

Approach | F1 (Training Data) | F1 (Test Data)
Rules | 70.91 | 65.59
Ranker | 68.35 | 60.90
Hybrid | 74.35 | 70.21
47
Overall Performance of Negation Resolution
The table below shows the performance of scope resolution combined with the cues provided by the system's own classifier; only results for the hybrid approach are given in this case. Morante et al.'s system, being purely machine-learned, takes a heavier hit when evaluated on the test data.

Approach | F1 (Training Data) | F1 (Test Data)
Morante et al. | 65.79 | 40.72
Hybrid | 71.05 | 62.98
48
Conclusion The system achieves state-of-the-art performance in end-to-end evaluation for both speculation and negation. Its performance is comparable to other top-performing systems, in some respects achieving the best results published at the time. The approach is linguistically informed, rather than relying on purely statistical methods.