1
Learning Subjective Nouns using Extraction Pattern Bootstrapping
Ellen Riloff, Janyce Wiebe, Theresa Wilson
Presenter: Gabriel Nicolae
2
Subjectivity – the Annotation Scheme
http://www.cs.pitt.edu/~wiebe/pubs/ardasummer02/
Goal: to identify and characterize expressions of private states in a sentence.
Private state = opinions, evaluations, emotions, and speculations.
Annotators also judge the strength of each private state: low, medium, high, or extreme.
Gold standard: a sentence is subjective if it contains at least one private-state expression of medium or higher strength; all other sentences are objective.
Example: "The time has come, gentlemen, for Sharon, the assassin, to realize that injustice cannot last long."
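The gold-standard rule can be stated as a small Python function. The input format (a list of strength labels, one per annotated private-state expression in the sentence) and the function name are hypothetical, chosen only for illustration.

```python
# Strength levels in increasing order, as listed in the annotation scheme.
STRENGTHS = ["low", "medium", "high", "extreme"]


def is_subjective(expression_strengths):
    """Return True if the sentence counts as subjective in the gold standard.

    expression_strengths: strength labels of the private-state expressions
    annotated in one sentence (hypothetical input format).
    """
    threshold = STRENGTHS.index("medium")
    return any(STRENGTHS.index(s) >= threshold for s in expression_strengths)


print(is_subjective(["low", "high"]))  # True: one expression is medium or stronger
print(is_subjective(["low"]))          # False: only a low-strength expression
print(is_subjective([]))               # False: no private states at all
```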
3
Using Extraction Patterns to Learn Subjective Nouns – Meta-Bootstrapping (1/2) (Riloff and Jones 1999)
Mutual bootstrapping:
Begin with a small set of seed words that represent a targeted semantic category (e.g., 10 words that represent LOCATIONS) and an unannotated corpus.
Produce thousands of extraction patterns for the entire corpus (e.g., “<subject> was hired”).
Compute a score for each pattern based on the number of seed words among its extractions.
Select the best pattern; all of its extracted noun phrases are labeled as the target semantic category.
Re-score the extraction patterns using the original seed words plus the newly labeled words, and repeat.
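A minimal Python sketch of the mutual bootstrapping loop follows. It assumes the patterns have already been generated and applied, so `extractions` maps each pattern string to the set of noun phrases it extracts; the toy patterns and words are hypothetical. The pattern score is simply the raw count of known category words among a pattern's extractions, as described above; the published algorithm may use a more refined scoring function.

```python
def mutual_bootstrap(extractions, seeds, iterations):
    """Sketch of mutual bootstrapping.

    extractions: dict mapping each extraction pattern to the set of
                 noun phrases it extracts from the corpus
    seeds: initial set of words for the target semantic category
    iterations: maximum number of patterns to select
    Returns (lexicon, selected_patterns).
    """
    lexicon = set(seeds)
    selected = []
    for _ in range(iterations):
        candidates = [p for p in extractions if p not in selected]
        if not candidates:
            break
        # Re-score every remaining pattern against the current lexicon
        # (original seeds plus all newly labeled words).
        best = max(candidates, key=lambda p: len(extractions[p] & lexicon))
        if not extractions[best] & lexicon:
            break  # no remaining pattern extracts any known category word
        selected.append(best)
        # Every noun phrase extracted by the best pattern joins the category.
        lexicon |= extractions[best]
    return lexicon, selected


# Hypothetical toy data for the LOCATIONS example.
extractions = {
    "<subject> was hired": {"smith", "jones", "company"},
    "offices in <np>": {"paris", "london", "tokyo"},
    "travelled to <np>": {"paris", "berlin"},
}
lexicon, selected = mutual_bootstrap(extractions, {"paris", "london"}, iterations=3)
print(sorted(lexicon))  # ['berlin', 'london', 'paris', 'tokyo']
print(selected)         # ['offices in <np>', 'travelled to <np>']
```

On this toy input, "<subject> was hired" is never selected because none of its extractions is a known location, while "travelled to <np>" only becomes eligible after "paris" is already in the lexicon.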
4
Using Extraction Patterns to Learn Subjective Nouns – Meta-Bootstrapping (2/2)
Meta-bootstrapping:
After a run of ordinary mutual bootstrapping, all nouns that were put into the semantic dictionary are re-evaluated.
Each noun is assigned a score based on how many different patterns extracted it.
Only the 5 best nouns are allowed to remain in the dictionary; the others are discarded.
Mutual bootstrapping is then restarted.
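The outer meta-bootstrapping loop can be sketched on top of a mutual bootstrapping inner loop. This is a minimal sketch under several assumptions: the inner loop scores patterns by a raw count of known words; "how many different patterns extracted it" is taken to mean how many of the patterns selected in that run extracted the noun; seed words are never discarded; and ties are broken alphabetically. The toy data is hypothetical, and `keep=1` is used in the example only so the filtering is visible on such a small input (the slide's value is 5).

```python
from collections import Counter


def mutual_bootstrap(extractions, seeds, iterations):
    """Inner loop: raw-count mutual bootstrapping (sketch)."""
    lexicon, selected = set(seeds), []
    for _ in range(iterations):
        candidates = [p for p in extractions if p not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda p: len(extractions[p] & lexicon))
        if not extractions[best] & lexicon:
            break
        selected.append(best)
        lexicon |= extractions[best]
    return lexicon, selected


def meta_bootstrap(extractions, seeds, outer_iterations, inner_iterations, keep=5):
    """Outer loop: keep only the `keep` best new nouns after each inner run."""
    permanent = set(seeds)
    for _ in range(outer_iterations):
        # Restart mutual bootstrapping from the current permanent dictionary.
        lexicon, selected = mutual_bootstrap(extractions, permanent, inner_iterations)
        new_words = lexicon - permanent
        # Score each new noun by how many different selected patterns extracted it.
        score = Counter(w for p in selected for w in extractions[p] if w in new_words)
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(new_words, key=lambda w: (-score[w], w))
        permanent |= set(ranked[:keep])  # the remaining new nouns are discarded
    return permanent


extractions = {
    "offices in <np>": {"paris", "london", "tokyo", "madrid"},
    "travelled to <np>": {"paris", "tokyo", "berlin"},
    "<subject> was hired": {"smith", "jones"},
}
seeds = {"paris", "london"}
print(sorted(meta_bootstrap(extractions, seeds, 1, 3, keep=1)))
# ['london', 'paris', 'tokyo']
print(sorted(meta_bootstrap(extractions, seeds, 2, 3, keep=1)))
# ['berlin', 'london', 'paris', 'tokyo']
```

In the first outer iteration "tokyo" survives because two selected patterns extracted it, whereas "madrid" and "berlin" were each extracted by only one.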
5
Using Extraction Patterns to Learn Subjective Nouns – Basilisk (Thelen and Riloff 2002)
Begin with an unannotated text corpus and a small set of seed words for a semantic category.
Bootstrapping:
Basilisk automatically generates a set of extraction patterns for the corpus and scores each pattern based on the number of seed words among its extractions.
The best patterns are placed in the Pattern Pool.
All nouns extracted by a pattern in the Pattern Pool are placed in the Candidate Word Pool.
Basilisk scores each noun based on the set of patterns that extracted it and their collective association with the seed words.
The top 10 nouns are labeled as the targeted semantic class and are added to the dictionary.
Repeat the bootstrapping process.
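A minimal sketch of the Basilisk loop in Python. The pattern score (a raw count of known words) and the word score (the average of log2(known + 1) over the pool patterns that extracted the word) are assumptions meant to capture "collective association"; the published scoring functions may differ in detail. `pool_size` is a hypothetical parameter, and the toy data is invented.

```python
import math


def basilisk(extractions, seeds, iterations, pool_size=20, words_per_iter=10):
    """Sketch of Basilisk-style bootstrapping.

    extractions: dict mapping each pattern to the set of nouns it extracts
    pool_size: hypothetical Pattern Pool size
    words_per_iter: nouns added per iteration (10 on the slide)
    """
    lexicon = set(seeds)

    def known(p):
        # Number of known category words among the pattern's extractions.
        return len(extractions[p] & lexicon)

    def word_score(w, pattern_pool):
        # Average log-association of the pool patterns that extracted w.
        pats = [p for p in pattern_pool if w in extractions[p]]
        return sum(math.log2(known(p) + 1) for p in pats) / len(pats)

    for _ in range(iterations):
        # Pattern Pool: best-scoring patterns that extract at least one known word.
        ranked = sorted(extractions, key=known, reverse=True)
        pattern_pool = [p for p in ranked[:pool_size] if known(p) > 0]
        # Candidate Word Pool: every unlabeled noun extracted by a pool pattern.
        candidates = set().union(*(extractions[p] for p in pattern_pool)) - lexicon
        if not candidates:
            break
        best = sorted(candidates, key=lambda w: (-word_score(w, pattern_pool), w))
        lexicon |= set(best[:words_per_iter])
    return lexicon


extractions = {
    "offices in <np>": {"paris", "london", "tokyo", "madrid"},
    "travelled to <np>": {"paris", "tokyo", "berlin"},
    "<subject> was hired": {"smith", "jones"},
}
seeds = {"paris", "london"}
print(sorted(basilisk(extractions, seeds, iterations=1, words_per_iter=1)))
# ['london', 'madrid', 'paris']
print(sorted(basilisk(extractions, seeds, iterations=2, words_per_iter=1)))
# ['london', 'madrid', 'paris', 'tokyo']
```

Because this word score averages over patterns rather than summing, "madrid" (extracted only by the strongest pattern) outranks "tokyo" (extracted by a strong and a weak pattern) on the first iteration.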
6
Using Extraction Patterns to Learn Subjective Nouns – Experimental Results
The graph tracks the accuracy as bootstrapping progressed. Accuracy was high during the initial iterations but tapered off as bootstrapping continued.
After 20 words, both algorithms were 95% accurate.
After 100 words, Basilisk was 75% accurate and MetaBoot 81%.
After 1000 words, Basilisk was 53% accurate and MetaBoot 28%.
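Assuming that "accuracy after N words" means the fraction of the first N learned nouns that human judges accepted as correct, the reported figures translate into counts: 95% after 20 words is 19 correct words, and 28% after 1000 words is 280 correct words. A small sketch of that computation, with a hypothetical judgment list:

```python
def accuracy_at(judgments, n):
    """Fraction of the first n learned words judged correct.

    judgments: booleans in the order the words were learned
    (hypothetical human judgments of each learned noun).
    """
    if n <= 0 or n > len(judgments):
        raise ValueError("n must be between 1 and len(judgments)")
    return sum(judgments[:n]) / n


# Hypothetical run: the 20th learned word is the first one judged wrong.
judgments = [True] * 19 + [False]
print(accuracy_at(judgments, 20))  # 0.95
```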
7
Creating Subjectivity Classifiers – Subjective Noun Features
Naïve Bayes classifier using the nouns as features.
Sets:
BA-Strong: the set of StrongSubjective nouns generated by Basilisk
BA-Weak: the set of WeakSubjective nouns generated by Basilisk
MB-Strong: the set of StrongSubjective nouns generated by Meta-Bootstrapping
MB-Weak: the set of WeakSubjective nouns generated by Meta-Bootstrapping
For each set, a three-valued feature: the presence of 0, 1, or ≥2 words from that set.
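A minimal sketch of the three-valued set features feeding a Naïve Bayes classifier, written from scratch so the block is self-contained. The word sets, training sentences, and whitespace tokenization are hypothetical, two sets stand in for the four listed above, and add-one smoothing over the three feature values is an assumption the slide does not specify.

```python
import math
from collections import Counter, defaultdict


def count_feature(tokens, word_set):
    """Three-valued feature: 0, 1, or 2 (standing for >= 2) words from word_set."""
    return min(sum(1 for t in tokens if t in word_set), 2)


def featurize(tokens, word_sets):
    return tuple(count_feature(tokens, s) for s in word_sets)


class CategoricalNB:
    """Naive Bayes over discrete features with add-alpha (Laplace) smoothing."""

    def __init__(self, n_values=3, alpha=1.0):
        self.n_values = n_values  # each feature takes values 0, 1, 2
        self.alpha = alpha

    def fit(self, X, y):
        self.prior = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (class, feature index) -> value counts
        for x, c in zip(X, y):
            for i, v in enumerate(x):
                self.counts[(c, i)][v] += 1
        return self

    def log_posterior(self, x, c):
        lp = math.log(self.prior[c] / self.n)
        for i, v in enumerate(x):
            num = self.counts[(c, i)][v] + self.alpha
            den = self.prior[c] + self.n_values * self.alpha
            lp += math.log(num / den)
        return lp

    def predict(self, x):
        return max(sorted(self.prior), key=lambda c: self.log_posterior(x, c))


# Two hypothetical noun sets stand in for BA-Strong, BA-Weak, MB-Strong, MB-Weak.
word_sets = [{"assassin", "injustice", "outrage"}, {"concern"}]
train = [
    ("the assassin spread injustice", "subj"),
    ("injustice cannot last", "subj"),
    ("the meeting starts at noon", "obj"),
    ("the report was published today", "obj"),
]
X = [featurize(s.split(), word_sets) for s, _ in train]
y = [label for _, label in train]
clf = CategoricalNB().fit(X, y)
print(clf.predict(featurize("what an outrage".split(), word_sets)))        # subj
print(clf.predict(featurize("the meeting was today".split(), word_sets)))  # obj
```

Capping the count at 2 keeps every feature to three values, so the smoothing denominator stays small and fixed regardless of sentence length.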
8
Creating Subjectivity Classifiers – Previously Established Features (Wiebe, Bruce, O’Hara 1999)
Sets:
subjStems: a set of stems positively correlated with the subjective training examples
objStems: a set of stems positively correlated with the objective training examples
For each set, a three-valued feature: the presence of 0, 1, or ≥2 members of the set.
A binary feature for each of the following: presence in the sentence of a pronoun, an adjective, a cardinal number, a modal other than "will", and an adverb other than "not".
Additional features from other researchers.
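The set-based and part-of-speech features above can be sketched as follows. This assumes the sentence is already tagged with Penn Treebank tags and already stemmed (the tagger and stemmer are not specified on the slide); the feature names and stem sets are hypothetical, and the "other features from other researchers" are not modeled.

```python
def three_valued(items, item_set):
    """0, 1, or 2 (standing for >= 2) members of item_set among items."""
    return min(sum(1 for x in items if x in item_set), 2)


def wbo_features(tagged, stems, subj_stems, obj_stems):
    """Sketch of the previously established features for one sentence.

    tagged: list of (word, Penn Treebank tag) pairs (assumed tag set)
    stems: stems of the sentence's words (stemmer not specified here)
    """
    pairs = [(w.lower(), t) for w, t in tagged]
    return {
        "subjStems": three_valued(stems, subj_stems),
        "objStems": three_valued(stems, obj_stems),
        "pronoun": any(t in ("PRP", "PRP$") for _, t in pairs),
        "adjective": any(t.startswith("JJ") for _, t in pairs),
        "cardinal_number": any(t == "CD" for _, t in pairs),
        "modal_not_will": any(t == "MD" and w != "will" for w, t in pairs),
        "adverb_not_not": any(t.startswith("RB") and w != "not" for w, t in pairs),
    }


subj_stems, obj_stems = {"angry"}, {"report"}  # hypothetical stem sets
f = wbo_features(
    [("She", "PRP"), ("might", "MD"), ("be", "VB"), ("very", "RB"), ("angry", "JJ")],
    ["she", "might", "be", "very", "angry"],
    subj_stems,
    obj_stems,
)
print(f["subjStems"], f["modal_not_will"], f["adverb_not_not"])  # 1 True True
```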
9
Creating Subjectivity Classifiers – Discourse Features
subjClues = all the sets defined above except objStems
Four ClueRate features:
ClueRate_subj for the previous and the following sentence
ClueRate_obj for the previous and the following sentence
In addition, a feature for sentence length.
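A sketch of the discourse features, under stated assumptions: ClueRate is taken to be the share of a sentence's tokens that belong to any of the given clue sets, a missing neighbour (at the start or end of a document) gets a rate of 0, and values are left continuous. A Naïve Bayes classifier over discrete features would need these values binned, and the binning is not given here. All clue sets and sentences are hypothetical.

```python
def clue_rate(tokens, clue_sets):
    """Assumed definition: share of tokens that belong to any of clue_sets."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if any(t in s for s in clue_sets))
    return hits / len(tokens)


def discourse_features(sentences, i, subj_clues, obj_clues):
    """Discourse features for sentence i (sentences are token lists).

    subj_clues: list of clue sets (all sets except objStems)
    obj_clues: list containing the objStems set
    """
    prev = sentences[i - 1] if i > 0 else []
    nxt = sentences[i + 1] if i + 1 < len(sentences) else []
    return {
        "ClueRate_subj_prev": clue_rate(prev, subj_clues),
        "ClueRate_subj_next": clue_rate(nxt, subj_clues),
        "ClueRate_obj_prev": clue_rate(prev, obj_clues),
        "ClueRate_obj_next": clue_rate(nxt, obj_clues),
        "length": len(sentences[i]),
    }


sentences = [
    ["what", "an", "outrage"],
    ["the", "vote", "was", "held"],
    ["prices", "rose", "today"],
]
subj_clues = [{"outrage"}, {"awful"}]  # hypothetical subjClues sets
obj_clues = [{"prices", "today"}]      # hypothetical objStems set
print(discourse_features(sentences, 1, subj_clues, obj_clues))
```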
10
Creating Subjectivity Classifiers – Classification Results
Results of Naïve Bayes classifiers trained with different combinations of features:
Using both the WBO features (the previously established features of Wiebe, Bruce, and O’Hara) and the SubjNoun features achieves better performance than either set alone.
The best results are achieved with all the features combined.
A higher-precision classifier can be obtained by labeling a sentence as subjective if it contains any of the StrongSubjective nouns: 87% precision, 26% recall.
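The high-precision rule and the precision/recall computation behind figures like these can be sketched as follows. The StrongSubjective noun set and the labeled sentences are hypothetical toy data, so the numbers it produces have nothing to do with the reported 87% precision and 26% recall, which come from the paper's own evaluation data.

```python
def strong_noun_rule(tokens, strong_nouns):
    """Label a sentence subjective if it contains any StrongSubjective noun."""
    return any(t in strong_nouns for t in tokens)


def precision_recall(predicted, gold):
    """Precision and recall for the subjective class (True = subjective)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


strong_nouns = {"assassin", "outrage"}  # hypothetical StrongSubjective nouns
data = [
    ("the assassin must be stopped", True),
    ("injustice cannot last long", True),
    ("what an outrage", True),
    ("the assassin was arrested today", False),
    ("prices rose by two percent", False),
]
predicted = [strong_noun_rule(s.split(), strong_nouns) for s, _ in data]
gold = [label for _, label in data]
print(precision_recall(predicted, gold))
```

The rule trades recall for precision by construction: every sentence without a StrongSubjective noun is labeled objective, so subjective sentences expressed through other words (like the second toy sentence) are missed.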