1
Predicting the Semantic Orientation of Adjectives
Vasileios Hatzivassiloglou and Kathleen R. McKeown
Presenter: Gabriel Nicolae
2
Introduction
Orientation/polarity = direction of deviation from the norm
Near synonyms: simple vs. simplistic
Antonyms: hot vs. cold
3
Introduction
In linguistic constructs such as conjunctions, the choices of arguments and connective are mutually constrained.
The tax proposal was
  simple and well-received
  simplistic but well-received
  *simplistic and well-received
by the public.
(* marks the semantically anomalous variant)
4
Exceptions
5
Goals
Automatically identify antonyms
Distinguish near synonyms
How?
  By retrieving semantic orientation information using indirect information collected from a large corpus
Why?
  Dictionaries and similar sources (thesauri, WordNet) do not explicitly include semantic orientation information
  They lack links between antonyms and synonyms when these depend on the domain of the discourse
6
Overview of their approach
Correlation between indicators and semantic orientation
Direct indicators: affixes (in-, un-)
  mostly negative
  exceptions: independent, unbiased
Indirect indicators: conjunctions
  conjoined adjectives are usually of the same orientation for most connectives
  the situation is reversed for but
  from corpus: fair and legitimate, corrupt and brutal
  vs. semantically anomalous: fair and brutal, corrupt and legitimate
7
General algorithm
1. Extract conjunctions of adjectives and morphological relations
2. Label each pair of conjoined adjectives as being of the same or different orientation using a log-linear regression model
3. Separate the adjectives into two subsets of different orientation using a clustering algorithm
4. Label the group with the higher average frequency as positive
8
Data collection
Corpus: 21-million-word 1987 Wall Street Journal corpus
Training data: a set of adjectives with predetermined (hand-annotated) orientation labels (+ or -)
  1,336 adjectives (657 +, 679 -)
  The training set was validated by four other people
  500 adjectives: 89.15% agreement
Test data:
  15,048 conjunction tokens
  9,296 distinct pairs of conjoined adjectives (types)
9
Data collection (cont.)
Each conjunction token is classified according to three variables:
  conjunction used: and, or, but, either-or, neither-nor
  type of modification: attributive, predicative, appositive, resultative
  number of the modified noun: singular, plural
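The three classification variables can be pictured as a small record type. The sketch below is illustrative only: the names `ConjunctionToken`, `Connective`, `Modification`, `Number`, and `count_by_category` are assumptions made for this example, not identifiers from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Connective(Enum):
    AND = "and"
    OR = "or"
    BUT = "but"
    EITHER_OR = "either-or"
    NEITHER_NOR = "neither-nor"


class Modification(Enum):
    ATTRIBUTIVE = "attributive"
    PREDICATIVE = "predicative"
    APPOSITIVE = "appositive"
    RESULTATIVE = "resultative"


class Number(Enum):
    SINGULAR = "singular"
    PLURAL = "plural"


@dataclass(frozen=True)
class ConjunctionToken:
    """One observed conjunction of two adjectives (hypothetical record type)."""
    adj1: str
    adj2: str
    connective: Connective
    modification: Modification
    number: Number

    def pair(self):
        # Order-independent adjective pair, i.e. the conjunction "type".
        return tuple(sorted((self.adj1, self.adj2)))


def count_by_category(tokens):
    """Count tokens per (pair, connective, modification, number) category."""
    counts = Counter()
    for t in tokens:
        counts[(t.pair(), t.connective, t.modification, t.number)] += 1
    return counts
```

Counts of this kind are what a per-pair feature vector could be built from; how the paper actually encodes them is not specified here.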
10
Validation of the conjunction hypothesis
Results
Their conjunction hypothesis is validated overall and for almost all individual cases
There are small differences in the behavior of conjunctions between linguistic environments (as represented by the three attributes)
Conjoined antonyms appear far more frequently than expected by chance in conjunctions other than but
11
Prediction of link type
Baseline 1: always guess that a link is of the same-orientation type => 77.84% accuracy
Baseline 2: Baseline 1 + but exhibits the opposite pattern => 80.82% accuracy
Morphological relationships:
  Adjectives related in form almost always have different semantic orientations
  Highly accurate (97.06%), but applies only to 1,336 labeled adjectives (891,780 possible pairs)
  E.g. adequate-inadequate, thoughtful-thoughtless
Baseline 1 + Morphology => 78.86% accuracy
Baseline 2 + Morphology => 81.75% accuracy
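The baselines and the morphology rule can be sketched as a few lines of rule-based code. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, and the morphology check only covers the in-/un- prefixes and the -ful/-less alternation seen in the examples above. The last line checks the pair count: 1,336 adjectives give C(1336, 2) = 891,780 unordered pairs.

```python
from math import comb

# Assumption: only the prefixes named on the slides are checked.
NEGATING_PREFIXES = ("in", "un")


def morphologically_related(a, b):
    """True if the two adjectives differ by a negating prefix or by -ful/-less."""
    for x, y in ((a, b), (b, a)):
        if any(y == prefix + x for prefix in NEGATING_PREFIXES):
            return True
        if x.endswith("ful") and y.endswith("less") and x[:-3] == y[:-4]:
            return True
    return False


def predict_link(adj1, adj2, connective, use_but_rule=True, use_morphology=True):
    """Return 'same' or 'different' for a pair of conjoined adjectives.

    use_but_rule=False corresponds to Baseline 1, True to Baseline 2.
    """
    if use_morphology and morphologically_related(adj1, adj2):
        return "different"
    if use_but_rule and connective == "but":
        return "different"
    return "same"


print(predict_link("adequate", "inadequate", "and"))  # different
print(predict_link("fair", "legitimate", "and"))      # same
print(comb(1336, 2))                                  # 891780
```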
12
Prediction of link type (cont.)
Log-linear regression model
  x: the vector of the observed counts in the various conjunction categories
  w: the vector of weights to be learned
  y: the response of the system
Using the method of iterative stepwise refinement, they selected 9 predictor variables from all 90 possible predictor variables.
Small improvement: 80.97% accuracy (82.05% accuracy using Morphology), but now each prediction is rated between 0 and 1
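The model's formula is not reproduced in this transcript. A common form for a response bounded between 0 and 1, assumed in the sketch below, is the logistic function applied to the weighted sum of the counts: η = w·x, y = 1 / (1 + e^(-η)). The function name `link_score` and the example weights are made up for illustration; real weights would be learned from the training data.

```python
import math


def link_score(w, x):
    """Logistic response y in (0, 1) for weights w and observed counts x.

    Assumption: y = 1 / (1 + exp(-eta)) with eta = w . x.
    """
    eta = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical weights for two count features, e.g. "and" count and "but" count.
w = [0.8, -1.5]
print(link_score(w, [3, 0]))  # pair seen three times with "and": score above 0.5
print(link_score(w, [0, 2]))  # pair seen twice with "but": score below 0.5
```

If y is read as an estimate of how likely the two adjectives share an orientation (an interpretation assumed here), then 1 - y can serve as a dissimilarity value for the clustering step.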
13
Clustering
Input: a graph of adjectives connected by dissimilarity links
  Small dissimilarity value => same-orientation link
  High dissimilarity value => different-orientation link
Method used: apply an iterative optimization procedure on each connected component, based on the exchange method, a non-hierarchical clustering algorithm
Idea: find the partition P such that the objective function Φ is minimized
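The following is a minimal sketch of an exchange-style local search on one connected component. Several things are assumptions made for the example: the objective is taken to be the sum, over the two clusters, of the within-cluster dissimilarities divided by the cluster size; missing links are given a neutral dissimilarity of 0.5; and the search greedily moves one adjective at a time. The dissimilarity values are invented toy numbers.

```python
from itertools import combinations


def phi(clusters, d, default=0.5):
    """Assumed objective: sum over clusters of (within-cluster dissimilarity) / size."""
    total = 0.0
    for c in clusters:
        if c:
            within = sum(d.get(frozenset(p), default)
                         for p in combinations(sorted(c), 2))
            total += within / len(c)
    return total


def exchange_partition(nodes, d):
    """Greedy single-node exchange between two clusters until phi stops improving."""
    nodes = sorted(nodes)
    c1 = set(nodes[::2])          # arbitrary starting partition
    c2 = set(nodes) - c1
    best = phi((c1, c2), d)
    while True:
        best_move, best_val = None, best
        for n in nodes:
            src, dst = (c1, c2) if n in c1 else (c2, c1)
            if len(src) == 1:
                continue          # keep both clusters non-empty
            src.remove(n)
            dst.add(n)
            val = phi((c1, c2), d)
            dst.remove(n)         # undo the trial move
            src.add(n)
            if val < best_val - 1e-12:
                best_move, best_val = n, val
        if best_move is None:
            return c1, c2
        src, dst = (c1, c2) if best_move in c1 else (c2, c1)
        src.remove(best_move)
        dst.add(best_move)
        best = best_val


# Toy dissimilarities (invented): low = same orientation, high = different.
d = {
    frozenset(("fair", "legitimate")): 0.1,
    frozenset(("corrupt", "brutal")): 0.1,
    frozenset(("fair", "corrupt")): 0.9,
    frozenset(("legitimate", "brutal")): 0.9,
    frozenset(("fair", "brutal")): 0.8,
    frozenset(("legitimate", "corrupt")): 0.8,
}
print(exchange_partition(["fair", "legitimate", "corrupt", "brutal"], d))
```

A local search like this can stall in a local minimum, so in practice it would likely be restarted from several starting partitions.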
14
Labeling the clusters as + or -
In oppositions of gradable adjectives where one member is semantically unmarked, the unmarked member is the more frequent one about 81% of the time
Unmarked => positive orientation, almost always
So, label as positive the group that has the higher average word frequency.
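The labeling rule amounts to comparing two averages. A minimal sketch follows, with a hypothetical function name and invented frequency counts; words missing from the frequency table are counted as 0, which is an assumption of this example.

```python
def label_clusters(cluster_a, cluster_b, freq):
    """Label the cluster with the higher average word frequency as positive."""
    def average_frequency(cluster):
        return sum(freq.get(word, 0) for word in cluster) / len(cluster)

    if average_frequency(cluster_a) >= average_frequency(cluster_b):
        return {"positive": set(cluster_a), "negative": set(cluster_b)}
    return {"positive": set(cluster_b), "negative": set(cluster_a)}


# Invented corpus frequencies for illustration only.
freq = {"fair": 120, "legitimate": 80, "corrupt": 40, "brutal": 30}
labels = label_clusters({"corrupt", "brutal"}, {"fair", "legitimate"}, freq)
print(labels["positive"])  # the group containing fair and legitimate
```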
15
Graph connectivity and performance
They tested how graph connectivity affects the overall performance