Published by Elwin Long. Modified over 9 years ago.
1
From Words to Senses: A Case Study of Subjectivity Recognition
Authors: Fangzhong Su & Katja Markert (University of Leeds, UK)
Source: COLING 2008
Reporter: Yong-Xiang Chen
2
Background & Motivation
Subjectivity analysis determines whether a language unit expresses subjectivity
– i.e., a private state, opinion or attitude
– and, if so, what polarity it expresses
Many words are subjectivity-ambiguous
– They have both subjective and objective senses
– Example: two senses of the word "positive"
  having a positive electric charge (objective)
  involving advantage or good (subjective)
– Annotating words independently of sense or domain does not capture such distinctions
3
Goal & Advantage
Determine the subjectivity of word senses
– Avoid costly annotation in the training step
– Evaluate how useful existing resources are, even though they are not tailored towards word senses
Increase the usability of subjectivity lexica
– Allow fine-grained senses to be grouped into higher-level classes based on subjectivity/objectivity
Improve the WSD task
– For subjectivity-ambiguous words
4
Related work
Esuli and Sebastiani (2006)
– Determine the polarity of word senses in WordNet
– Training set: expand a small, manually determined seed set of WordNet senses
– Use the resulting larger training set for supervised classification
Wiebe and Mihalcea (2006)
– Label word senses in WordNet as subjective or objective
– The method relies on distributional similarity, computed with the help of an independent, large, manually annotated opinion corpus (MPQA)
5
Subjectivity vs. Polarity
In this study, polarity is not treated as an indicator of the subjectivity of a sense
– Most subjective senses have a relatively clear polarity
– But polarity can be attached to objective words/senses as well
– Example: tuberculosis (objective, negative)
6
Annotation for subjectivity and polarity of word senses
Annotate the Micro-WNOp corpus as the test set
– It contains 1,105 WordNet synsets
Subjectivity
– subjective (S), objective (O), both (B)
– (B): a WordNet synset contains both opinionated and objective expressions
Polarity
– positive (P), negative (N), varied (V)
– (V): a sense's polarity varies strongly with the context
  Example: "uncompromising" can be positive or negative depending on what a person is uncompromising about
7 subcategories
– O:NoPol, O:P, O:N, S:P, S:N, S:V, and B
7
Annotation scheme
Manually annotate polarity for subjective senses, as well as for objective senses that carry a strong positive or negative association
– Subjectivity annotation supports finding and analysing directly expressed opinions
– Polarity annotation supports either classifying these opinions further or extracting objective words with a strong polarity association
8
High Agreement
Between the two annotators, the overall agreement using all 7 categories is 84.6%, with a kappa of 0.77
The high agreement is due to
– annotating senses instead of words
– sense descriptions providing more information
– splitting subjectivity and polarity annotation, which made the task clearer
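The slide does not say which kappa statistic was used. Assuming it is Cohen's kappa for two annotators, the computation can be sketched in Python as below; the function name and the toy labels are illustrative, not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of both annotators' label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy example using two of the 7 categories.
print(cohens_kappa(["S:P", "S:P", "O:NoPol", "O:NoPol"],
                   ["S:P", "O:NoPol", "O:NoPol", "O:NoPol"]))  # 0.5
```

If the reported figure is indeed Cohen's kappa, 84.6% observed agreement and kappa = 0.77 imply a chance agreement of roughly (0.846 - 0.77) / (1 - 0.77) ≈ 0.33.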
9
Gold Standard
Since the focus is on subjectivity, the labels are merged into S, O and B
The Micro-WNOp corpus includes 298 different words
– 97 (32.5%) of them are subjectivity-ambiguous
All senses labelled B were excluded from Micro-WNOp for testing the automatic algorithms
– resulting in a final set of 1,061 senses: 703 objective and 358 subjective
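The merging and filtering step can be sketched in Python as follows, assuming each sense carries exactly one of the 7 fine-grained labels. The sense identifiers are hypothetical placeholders, not Micro-WNOp IDs.

```python
from collections import Counter

# Merge the 7 fine-grained annotation categories into S, O and B.
FINE_TO_COARSE = {
    "O:NoPol": "O", "O:P": "O", "O:N": "O",
    "S:P": "S", "S:N": "S", "S:V": "S",
    "B": "B",
}


def build_gold_standard(annotations):
    """Map {sense_id: fine_label} to {sense_id: 'S' | 'O'}, dropping senses labelled B."""
    coarse = {sense: FINE_TO_COARSE[label] for sense, label in annotations.items()}
    return {sense: label for sense, label in coarse.items() if label != "B"}


toy = {"sense_1": "O:P", "sense_2": "S:P", "sense_3": "O:N", "sense_4": "B"}
gold = build_gold_standard(toy)
print(gold)                    # {'sense_1': 'O', 'sense_2': 'S', 'sense_3': 'O'}
print(Counter(gold.values()))  # Counter({'O': 2, 'S': 1})
```

On the full corpus, 1,061 of the 1,105 synsets remain, so 44 senses carried the label B.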
10
Algorithms
1. Standard Supervised Approach
2. Sentence Collections: Movie
3. Sentence Collections: MPQA
4. Word Lists: General Inquirer
5. Word Lists: Subjectivity List
11
Standard Supervised Approach
10-fold cross-validation for training and testing on the annotated Micro-WNOp corpus
Apply a Naive Bayes classifier
Three types of features:
– Lexical features: unigrams in the glosses (bag-of-words) and the WordNet synsets
– Part-of-speech features
– Relation features
  Employ 8 relations: antonym, similar-to, derived-from, attribute, also-see, direct-hyponym, direct-hypernym, and extended-antonym
  Each relation R leads to 2 features, describing for a sense A how many links of type R it has to synsets in the subjective and in the objective training set
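The relation features can be sketched in Python as below. This assumes a hypothetical adjacency map from each sense to (relation, target sense) pairs; in practice these links would be read from WordNet. Because the counts refer to the current training sets, they presumably have to be recomputed for each cross-validation fold, so that test senses never count as labelled neighbours.

```python
from typing import Dict, List, Set, Tuple

# The 8 relations named on the slide; the string names are illustrative.
RELATIONS = (
    "antonym", "similar_to", "derived_from", "attribute",
    "also_see", "direct_hyponym", "direct_hypernym", "extended_antonym",
)


def relation_features(
    sense: str,
    links: Dict[str, List[Tuple[str, str]]],
    subjective_train: Set[str],
    objective_train: Set[str],
) -> Dict[str, int]:
    """Return 2 counts per relation: links from `sense` into each training set."""
    features = {}
    for rel in RELATIONS:
        features[rel + "_subj"] = 0
        features[rel + "_obj"] = 0
    for rel, target in links.get(sense, []):
        if rel not in RELATIONS:
            continue  # only the 8 relations above contribute features
        if target in subjective_train:
            features[rel + "_subj"] += 1
        elif target in objective_train:
            features[rel + "_obj"] += 1
        # Targets outside both training sets (e.g. test senses) are not counted.
    return features


links = {"sense_a": [("antonym", "sense_b"), ("similar_to", "sense_c"),
                     ("similar_to", "sense_d"), ("hypernym", "sense_b")]}
feats = relation_features("sense_a", links, {"sense_b", "sense_c"}, {"sense_d"})
print(len(feats), feats["antonym_subj"], feats["similar_to_obj"])  # 16 1 1
```

The resulting 16 counts would be appended to the lexical and part-of-speech features before training the Naive Bayes classifier.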
12
Sentence Collections Approach
Cast word sense subjectivity classification as a sentence classification task
– Take the gloss that WordNet provides for each sense as the sentence to be classified
– In theory, any collection of annotated sentences can be fed in as training data
1. Movie-domain Subjectivity Data Set (Movie)
– 5,000 subjective and 5,000 objective sentences
2. MPQA Corpus
– News articles manually annotated at the phrase level
– 6,127 subjective and 4,985 objective sentences
Use a Naive Bayes algorithm with lexical unigram features
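The sentence-collection approach can be sketched as a multinomial Naive Bayes classifier over unigrams that is trained on labelled sentences and then applied to WordNet glosses. The tokenizer, the add-one smoothing and the toy sentences are assumptions for illustration; the slide only states that Naive Bayes with lexical unigram features was used, and a library implementation such as scikit-learn's MultinomialNB would serve equally well.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class UnigramNaiveBayes:
    """Multinomial Naive Bayes over unigram counts with add-one smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                smoothed = (self.word_counts[label][word] + 1) / (self.total_words[label] + v)
                score += math.log(smoothed)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training sentences standing in for Movie or MPQA data.
sentences = ["a wonderful and moving film", "i loved this brilliant story",
             "the film was released in 2001", "the story takes place in paris"]
labels = ["S", "S", "O", "O"]
nb = UnigramNaiveBayes().fit(sentences, labels)
print(nb.predict("a brilliant and wonderful performance"))  # S
print(nb.predict("the film was released in paris"))         # O
```

To label a word sense, its gloss would be passed to `predict` in place of these toy test strings.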
13
Word Lists Approach
General Inquirer (GI)
– Concentrates on word polarity
– Assume that both positive and negative words in the GI list are subjectivity clues
– 1,915 positive, 2,291 negative and 7,582 no-polarity words
Subjectivity clues list (SL)
– Centers on subjectivity and provides part of speech, subjectivity strength and prior polarity
– 8,000 subjective words
Neither list includes word sense information, so neither can be used directly
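Turning the two word lists into clue sets can be sketched in Python as below. The SL parser assumes the whitespace-separated key=value lines (type=..., word1=..., pos1=...) of the distributed subjectivity lexicon file; the GI input is a hypothetical list of (word, polarity) pairs, since the GI's own file layout differs. Keeping the strongest entry when a word appears with several parts of speech is a design assumption, and the example entries are illustrative, not copied from either lexicon.

```python
def sl_clues(lines):
    """Map each SL word to its strongest subjectivity type ('strongsubj' or 'weaksubj')."""
    clues = {}
    for line in lines:
        if not line.strip():
            continue
        fields = dict(item.split("=", 1) for item in line.split() if "=" in item)
        word, strength = fields["word1"], fields["type"]
        # A word may be listed under several parts of speech; keep the strongest
        # entry, since the word list carries no sense information anyway.
        if strength == "strongsubj" or word not in clues:
            clues[word] = strength
    return clues


def gi_clues(entries):
    """Treat every positive or negative GI word as a subjectivity clue."""
    return {word for word, polarity in entries if polarity in ("positive", "negative")}


sl = sl_clues([
    "type=weaksubj len=1 word1=good pos1=noun stemmed1=n priorpolarity=positive",
    "type=strongsubj len=1 word1=good pos1=adj stemmed1=n priorpolarity=positive",
    "type=weaksubj len=1 word1=advantage pos1=noun stemmed1=n priorpolarity=positive",
])
print(sl)  # {'good': 'strongsubj', 'advantage': 'weaksubj'}
gi = gi_clues([("good", "positive"), ("bad", "negative"), ("table", "none")])
print(sorted(gi))  # ['bad', 'good']
```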
14
Unsupervised algorithm
Treat the occurrence of subjective words in a gloss as an indication that the sense as a whole is subjective
Adopt a rule-based unsupervised algorithm
Compute a subjectivity score S for each WordNet synset
– Sum the weights of all subjectivity clues in its gloss
– GI: every subjectivity clue has weight 1
– SL: strongly subjective clues have weight 2, weakly subjective clues weight 1
1. If S is greater than or equal to a threshold T, the synset is classified as subjective
– Best thresholds: T = 2 for SL and T = 4 for GI
2. Use two thresholds as rules to divide all synsets into subjective/objective training sets
– Best thresholds: SL: T1 = 4 and T2 = 2; GI: T1 = 3 and T2 = 1
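A minimal Python sketch of the two scoring rules, under stated assumptions: every occurrence of a clue in the gloss adds its weight, and for rule 2 a synset enters the subjective training set when S >= T1 and the objective one when S < T2, with the rest left unlabelled (the slide does not spell out the exact comparisons). The clue weights and sense IDs are hypothetical; the two glosses are the "positive" senses from the motivation slide.

```python
import re


def tokenize(gloss):
    return re.findall(r"[a-z']+", gloss.lower())


def subjectivity_score(gloss, weights):
    """S = sum of clue weights over the gloss tokens (each occurrence counts)."""
    return sum(weights.get(token, 0) for token in tokenize(gloss))


def classify(gloss, weights, threshold):
    """Rule 1: subjective if S >= T, objective otherwise."""
    return "S" if subjectivity_score(gloss, weights) >= threshold else "O"


def split_training_data(glosses, weights, t1, t2):
    """Rule 2 (assumed reading): S >= t1 -> subjective, S < t2 -> objective, else skipped."""
    subjective, objective = [], []
    for sense, gloss in glosses.items():
        score = subjectivity_score(gloss, weights)
        if score >= t1:
            subjective.append(sense)
        elif score < t2:
            objective.append(sense)
    return subjective, objective


# SL-style weights: 2 for strongly, 1 for weakly subjective clues (toy entries).
sl_weights = {"good": 2, "advantage": 1}
glosses = {
    "positive_1": "having a positive electric charge",
    "positive_2": "involving advantage or good",
}
print([classify(g, sl_weights, threshold=2) for g in glosses.values()])  # ['O', 'S']
print(split_training_data(glosses, sl_weights, t1=4, t2=2))  # ([], ['positive_1'])
```

With T1 = 4 and T2 = 2, the second sense scores 3 and falls between the thresholds, so it is left out of the automatically labelled training data even though rule 1 with T = 2 calls it subjective.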
15
Experiments and Evaluation
16
Discussion
For the three starred (*) methods, using the additional features yields a small but consistent improvement
Why does SL always greatly outperform GI?
– The GI lexicon is annotated for polarity, not subjectivity
  It includes words that we see as objective but that carry a strong positive or negative association
– The GI lexicon does not operate with a clearly expressed polarity definition, which leads to conflicting annotations
– GI contains fewer features
– GI contains many fewer subjective clues
17
Discussion
The results of using the sentence datasets are not satisfactory
– The subjectivity definition in the Movie corpus does not seem to match ours
  We define a word sense or a sentence as subjective if it expresses a private state (e.g., emotion, opinion, sentiment)
  In the Movie dataset, the "objective" sentences rarely contain opinions about the movie, but do contain other opinionated content, for example about the characters
18
Comparison to Prior Approaches
vs. SentiWordNet
– A sense is treated as subjective if the sum of its positive and negative scores in SentiWordNet is greater than or equal to 0.5, and as objective otherwise
– SentiWordNet achieves 75.3% accuracy on Micro-WNOp
– CV* and SL* perform slightly better than SentiWordNet
vs. Wiebe and Mihalcea (2006)
– Their test data is not publicly available, so only their reported figures can be compared
– Precision = 48.9% and recall = 60% for subjective senses
– Our best method, SL*, has a precision of 66% at about the same recall
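The SentiWordNet baseline rule and the per-class precision/recall used in the comparison can be sketched in Python as follows. The scores and labels in the example are toy values, not SentiWordNet entries or results from the paper.

```python
def sentiwordnet_label(pos_score, neg_score, threshold=0.5):
    """Subjective if the positive and negative scores sum to at least the threshold."""
    return "S" if pos_score + neg_score >= threshold else "O"


def precision_recall(predicted, gold, target="S"):
    """Precision and recall for one class (here: subjective senses)."""
    tp = sum(p == target and g == target for p, g in zip(predicted, gold))
    n_pred = sum(p == target for p in predicted)
    n_gold = sum(g == target for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


scores = [(0.25, 0.25), (0.125, 0.25), (0.0, 0.625), (0.0, 0.0)]
predicted = [sentiwordnet_label(p, n) for p, n in scores]
print(predicted)  # ['S', 'O', 'S', 'O']
print(precision_recall(predicted, ["S", "S", "O", "O"]))  # (0.5, 0.5)
```

With NLTK's SentiWordNet corpus reader, a synset's two scores are available via `senti_synset(...).pos_score()` and `.neg_score()`; NLTK ships SentiWordNet 3.0, which may differ from the version used in the paper.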
19
Conclusion
Proposed different ways of extracting training data and clue sets
The effectiveness of the resulting algorithms depends on the definitions of subjectivity used by the different resources
At least one of the proposed methods performed on a par with a supervised classifier
– It is therefore possible to avoid any manual annotation for the subjectivity classification of word senses