Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews Peter D. Turney ACL-2002
3 Overview
- Unsupervised learning algorithm for classifying reviews as recommended or not recommended
- The classification is based on the semantic orientation of the phrases in the review that contain adjectives or adverbs
4 Algorithm
- Input: review
- Identify phrases that contain adjectives or adverbs by using a part-of-speech tagger
- Estimate the semantic orientation of each phrase
- Assign a class to the given review based on the average semantic orientation of its phrases
- Output: classification (recommended or not recommended)
5 Step 1
- Apply Brill’s part-of-speech tagger to the review
- Adjectives are good indicators of subjective sentences, but in isolation their orientation depends on context: unpredictable steering (negative) / unpredictable plot (positive)
- Extract two consecutive words: one is an adjective or adverb, the other provides the context

   First Word          Second Word             Third Word (not extracted)
1. JJ                  NN or NNS               Anything
2. RB, RBR, or RBS     JJ                      Not NN nor NNS
3. JJ                  JJ                      Not NN nor NNS
4. NN or NNS           JJ                      Not NN nor NNS
5. RB, RBR, or RBS     VB, VBD, VBN, or VBG    Anything
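The five tag patterns can be sketched as a small filter over already-tagged text. This is only an illustration, not Turney's implementation: it assumes the review arrives as (word, Penn Treebank tag) pairs from some tagger, it takes pattern 3 to be JJ followed by JJ as in Turney's paper, and the names `matches`, `extract_phrases`, and `tagged_review` are hypothetical.

```python
# Sketch of two-word phrase extraction; tagging is assumed to be done elsewhere.
NOUN = {"NN", "NNS"}
ADVERB = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}


def matches(first, second, third):
    """Return True if the three tags fit one of the five extraction patterns."""
    third_not_noun = third not in NOUN  # also True at the end of the review (third is None)
    if first == "JJ" and second in NOUN:
        return True                      # pattern 1
    if first in ADVERB and second == "JJ":
        return third_not_noun            # pattern 2
    if first == "JJ" and second == "JJ":
        return third_not_noun            # pattern 3
    if first in NOUN and second == "JJ":
        return third_not_noun            # pattern 4
    if first in ADVERB and second in VERB:
        return True                      # pattern 5
    return False


def extract_phrases(tagged):
    """Collect every pair of consecutive words whose tags match a pattern."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        if matches(t1, t2, t3):
            phrases.append(f"{w1} {w2}")
    return phrases


# Hand-tagged toy input (made up for illustration).
tagged_review = [
    ("the", "DT"), ("bank", "NN"), ("has", "VBZ"), ("low", "JJ"), ("fees", "NNS"),
    ("and", "CC"), ("is", "VBZ"), ("inconveniently", "RB"), ("located", "VBN"),
]
print(extract_phrases(tagged_review))
```

The third word is only inspected, never extracted, which is why the function returns two-word strings.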
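6 Step 2
- Estimate the semantic orientation of the extracted phrases using PMI-IR (Turney, 2001)
- Pointwise Mutual Information (Church and Hanks, 1989):
  $$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}$$
- Semantic Orientation:
  $$\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{“excellent”}) - \mathrm{PMI}(phrase, \text{“poor”})$$
- PMI-IR estimates PMI by issuing queries to a search engine (AltaVista, ~350 million pages):
  $$\mathrm{SO}(phrase) = \log_2 \frac{\mathrm{hits}(phrase \text{ NEAR “excellent”}) \cdot \mathrm{hits}(\text{“poor”})}{\mathrm{hits}(phrase \text{ NEAR “poor”}) \cdot \mathrm{hits}(\text{“excellent”})}$$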
7 Step 2 – continued
- Added 0.01 to the hit counts to avoid division by zero
- If hits(phrase NEAR “excellent”) and hits(phrase NEAR “poor”) are both ≤ 4, the phrase is eliminated
- Added “AND (NOT host:epinions)” to the queries so that pages from the Epinions website are excluded
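A minimal sketch of how a semantic orientation score could be computed from hit counts, following the formula in Turney's paper: SO(phrase) = log2( hits(phrase NEAR "excellent") · hits("poor") / ( hits(phrase NEAR "poor") · hits("excellent") ) ). The function name, its parameters, and the hit counts in the example are hypothetical; no search engine is queried, and the ≤ 4 cut-off is taken from the slide as stated.

```python
import math


def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor,
                         smoothing=0.01, min_hits=4):
    """Return the SO of a phrase from hit counts, or None if the phrase is eliminated.

    near_excellent / near_poor: hits for the phrase NEAR "excellent" / "poor".
    hits_excellent / hits_poor: hits for the reference words alone.
    """
    # Eliminate phrases that are too rare near both reference words.
    if near_excellent <= min_hits and near_poor <= min_hits:
        return None
    # 0.01 is added to the NEAR counts so neither side of the ratio can be zero.
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Made-up hit counts, for illustration only.
so = semantic_orientation(near_excellent=80, near_poor=20,
                          hits_excellent=1_000_000, hits_poor=1_000_000)
print(so)  # positive: the phrase co-occurs more often with "excellent"
```

Returning `None` for eliminated phrases keeps them out of the later average instead of counting them as neutral.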
8 Step 3
- Calculate the average semantic orientation of the phrases in the given review
- If the average is positive, then the review is classified as recommended
- If the average is negative, then the review is classified as not recommended

Phrase                   POS tags   SO
direct deposit           JJ NN      1.288
local branch             JJ NN      0.421
small part               JJ NN      0.053
online service           JJ NN      2.780
well other               RB JJ      0.237
low fees                 JJ NNS     0.333
…
true service             JJ NN
other bank               JJ NN
inconveniently located   RB VBN
Average Semantic Orientation        0.322
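The decision rule of Step 3 is short enough to sketch directly. The function name is hypothetical, and treating an average of exactly zero as not recommended is an assumption, since the slide only covers the positive and negative cases.

```python
def classify_review(phrase_scores):
    """Average the phrase SO values and map the sign of the average to a class."""
    if not phrase_scores:
        raise ValueError("review has no scored phrases")
    average = sum(phrase_scores) / len(phrase_scores)
    # Assumption: an average of exactly 0 falls on the "not recommended" side.
    label = "recommended" if average > 0 else "not recommended"
    return label, average


# Only the six SO values that survive on the slide; the full review has more
# phrases, so this average is not expected to reproduce the reported 0.322.
label, average = classify_review([1.288, 0.421, 0.053, 2.780, 0.237, 0.333])
print(label, round(average, 3))
```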
9 Experiments
- 410 reviews from Epinions
  - 170 (41%) not recommended
  - 240 (59%) recommended
- Average phrases per review: 26
- Baseline accuracy: 59%

Domain                Accuracy   Correlation
Automobiles           84.00%
Banks                 80.00%
Movies                65.83%
Travel Destinations   70.53%
All                   74.39%     0.5174
10 Discussion
- What makes the movies hard to classify?
  - The average SO tends to classify recommended movies as not recommended
  - Evil characters make good movies
- The whole is not necessarily the sum of the parts
  - Good beaches do not necessarily add up to a good vacation
  - But good automobile parts usually add up to a good automobile
11 Applications
- Summary statistics for search engines
- Summarization of reviews
  - Pick out the sentence with the highest positive/negative semantic orientation given a positive/negative review
- Filtering “flames” for newsgroups
  - When the semantic orientation drops below a threshold, the message might be a potential flame
Thumbs up? Sentiment Classification using Machine Learning Techniques Bo Pang, Lillian Lee and Shivakumar Vaithyanathan EMNLP-2002
14 Overview
- Consider the problem of classifying documents by overall sentiment
- Three machine learning methods, compared against human-generated lists of words:
  - Naïve Bayes
  - Maximum Entropy
  - Support Vector Machines
15 Experimental Data
- Movie-review domain
- Source: Internet Movie Database (IMDb)
- Star or numerical ratings converted into positive, negative, or neutral
  - No need to hand-label the data for training or testing
- Maximum of 20 reviews per author per sentiment category
  - 752 negative reviews
  - 1301 positive reviews
  - 144 reviewers
16 List of Words Baseline
- Maybe there are certain words that people tend to use to express strong sentiments
- Classification is done by counting the number of positive and negative words in the document
- Random-choice baseline: 50%
17 Machine Learning Methods
- Bag-of-features framework:
  - {f_1, …, f_m}: predefined set of m features
  - n_i(d) = number of times f_i occurs in document d
- Naïve Bayes:
  $$P_{NB}(c \mid d) = \frac{P(c) \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}}{P(d)}$$
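As a rough illustration of the bag-of-features framework, the sketch below trains a multinomial Naïve Bayes classifier on the counts n_i(d) and picks the class c that maximizes P(c) · Π P(f_i | c)^{n_i(d)} (the denominator P(d) is the same for every class and can be dropped). Add-one smoothing, the function names, and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels, features):
    """Estimate log P(c) and log P(f_i | c) with add-one smoothing."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, c in zip(docs, labels):
        counts[c].update(t for t in tokens if t in features)
    log_likelihood = {}
    for c in classes:
        total = sum(counts[c].values()) + len(features)
        log_likelihood[c] = {f: math.log((counts[c][f] + 1) / total) for f in features}
    return prior, log_likelihood


def classify(tokens, prior, log_likelihood):
    """Return the class with the highest log P(c) + sum_i n_i(d) * log P(f_i | c)."""
    n = Counter(tokens)  # n[f] plays the role of n_i(d)

    def score(c):
        return prior[c] + sum(n[f] * lp for f, lp in log_likelihood[c].items())

    return max(prior, key=score)


# Toy training data, made up for illustration.
docs = [["great", "fun", "great"], ["boring", "bad"],
        ["fun", "great"], ["bad", "plot", "boring"]]
labels = ["pos", "neg", "pos", "neg"]
features = {"great", "fun", "boring", "bad"}

prior, log_likelihood = train_naive_bayes(docs, labels, features)
print(classify(["great", "fun"], prior, log_likelihood))
```

Working in log space avoids underflow when a document contains many features.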
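18 Machine Learning Methods – continued
- Maximum Entropy:
  $$P_{ME}(c \mid d) = \frac{1}{Z(d)} \exp\Big(\sum_i \lambda_{i,c} F_{i,c}(d, c)\Big)$$
  where F_{i,c} is a feature/class function:
  $$F_{i,c}(d, c') = \begin{cases} 1 & \text{if } n_i(d) > 0 \text{ and } c' = c \\ 0 & \text{otherwise} \end{cases}$$
- Support vector machines: find the hyperplane that maximizes the margin. The solution of the constrained optimization problem can be written as
  $$\vec{w} = \sum_j \alpha_j c_j \vec{d}_j, \quad \alpha_j \ge 0$$
  where c_j ∈ {1, −1} is the correct class of document d_j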
19 Evaluation
- 700 positive-sentiment and 700 negative-sentiment documents
- 3 equal-sized folds
- The tag “NOT_” was added to every word between a negation word (“not”, “isn’t”, “didn’t”) and the first punctuation mark that follows it
  - “good” is the opposite of “not very good”
- Features:
  - 16,165 unigrams appearing at least 4 times in the 1400-document corpus
  - The 16,165 most frequently occurring bigrams in the same data
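The negation tagging described above can be sketched as a single pass over a tokenized sentence. The negation list contains only the three words named on the slide, and the punctuation set and function name are assumptions for the example.

```python
import re

NEGATIONS = {"not", "isn't", "didn't"}   # only the examples given on the slide
PUNCTUATION = re.compile(r"^[.,;:!?]$")  # assumed set of clause-ending marks


def add_negation_tags(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation mark."""
    tagged, negating = [], False
    for token in tokens:
        if PUNCTUATION.match(token):
            negating = False             # punctuation closes the negated span
            tagged.append(token)
        elif negating:
            tagged.append("NOT_" + token)
        else:
            tagged.append(token)
            if token.lower() in NEGATIONS:
                negating = True          # start tagging from the next token
    return tagged


print(add_negation_tags(["this", "is", "not", "very", "good", ",", "but", "fun"]))
```

After tagging, “good” and “NOT_good” are distinct unigram features, so a classifier can learn opposite weights for them.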
20 Results POS information added to differentiate between: “I love this movie” and “This is a love story”
21 Conclusion
- Results produced by the machine learning techniques are better than the human-generated baselines
  - SVMs tend to do the best
  - Unigram presence information is the most effective (frequency vs. presence)
- “Thwarted expectation”: many words are indicative of the opposite sentiment to that of the entire review
  - Some form of discourse analysis is necessary
Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status Simone Teufel and Marc Moens CL-2002
24 Overview
- Summarization of scientific articles: restore the discourse context of extracted material by adding the rhetorical status of each sentence in the document
- Gold standard data for summaries, consisting of computational linguistics articles annotated with the rhetorical status and relevance of each sentence
- Supervised learning algorithm that classifies sentences into 7 rhetorical categories
25 Why?
- Knowledge about the rhetorical status of a sentence enables tailoring the summaries to the user’s expertise and task
  - Nonexpert summary: background information and the general purpose of the paper
  - Expert summary: no background; instead, differences between this approach and similar ones
- Contrasts or complementarity among articles can be expressed
26 Rhetorical Status
- Generalizations about the nature of scientific texts + information to enable the construction of better summaries
- Problem structure: problems (research goals), solutions (methods), and results
- Intellectual attribution: what the new contribution is, as opposed to previous work and background (generally accepted statements)
- Scientific argumentation
- Attitude toward other people’s work: rival approach, prior approach with a fault, or an approach contributing parts of the authors’ own solution
27 Metadiscourse and Agentivity
- Metadiscourse is an aspect of scientific argumentation and a way of expressing attitude toward previous work
  - “we argue that”, “in contrast to common belief, we”
- Agent roles in argumentation: rivals, contributors of part of the solution (they), the entire research community, or the authors of the paper (we)
28 Citations and Relatedness
- Just knowing that an article cites another is often not enough
- One needs to read the context of the citation to understand the relation between the articles
  - Article cited negatively or contrastively
  - Article cited positively, or in which the authors state that their own work originates from the cited work
29 Rhetorical Annotation Scheme
- Only one category assigned to each full sentence
- Nonoverlapping, nonhierarchical scheme
- The rhetorical status is determined on the basis of the global context of the paper
30 Relevance
- Select important content from text
- Highly subjective, hence low human agreement
- A sentence is considered relevant if it describes the research goal or states a difference from a rival approach
- Other definitions: a sentence is relevant if it shows a high level of similarity with a sentence in the abstract
31 Corpus
- 80 conference articles
  - Association for Computational Linguistics (ACL)
  - European Chapter of the Association for Computational Linguistics (EACL)
  - Applied Natural Language Processing (ANLP)
  - International Joint Conference on Artificial Intelligence (IJCAI)
  - International Conference on Computational Linguistics (COLING)
- XML markup added
32 The Gold Standard
- 3 task-trained annotators
- 17 pages of guidelines
- 20 hours of training
- No communication between annotators
- Evaluation measures of the annotation:
  - Stability
  - Reproducibility
33 Results of Annotation
- Kappa coefficient K (Siegel and Castellan, 1988):
  $$K = \frac{P(A) - P(E)}{1 - P(E)}$$
  where P(A) = pairwise agreement and P(E) = random agreement
- Stability: K = .82, .81, .76 (N = 1,220 and k = 2)
- Reproducibility: K = .71
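The kappa coefficient K = (P(A) − P(E)) / (1 − P(E)) can be computed from per-sentence labels as sketched below. The sketch assumes that P(E) is derived from the category distribution pooled over all annotators, which is the usual reading of the Siegel and Castellan statistic; the function name and the toy labels are made up for illustration.

```python
from collections import Counter


def kappa(annotations):
    """Kappa for N items, each labeled by the same number k of annotators.

    annotations: list of N lists, each holding the k category labels of one item.
    """
    n_items = len(annotations)
    k = len(annotations[0])
    pooled = Counter()
    agreement = 0.0
    for labels in annotations:
        counts = Counter(labels)
        pooled.update(counts)
        # Proportion of annotator pairs that agree on this item.
        agreement += sum(c * (c - 1) for c in counts.values()) / (k * (k - 1))
    p_a = agreement / n_items
    # Chance agreement from the pooled category proportions (assumption, see lead-in).
    total = n_items * k
    p_e = sum((c / total) ** 2 for c in pooled.values())
    return (p_a - p_e) / (1 - p_e)


# Four sentences, two annotators (k = 2); labels are toy data.
toy = [["AIM", "AIM"], ["CONTRAST", "CONTRAST"],
       ["CONTRAST", "OTHER"], ["OTHER", "OTHER"]]
print(kappa(toy))
```

Here P(A) = 0.75 and P(E) = 0.34375, so raw agreement of 75% corresponds to a noticeably lower K once chance agreement is discounted.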
34 The System Supervised machine learning Naïve Bayes
35 Features
- Absolute location of a sentence
  - Limitations of the authors’ own method can be expected to be found toward the end, while limitations of other researchers’ work are discussed in the introduction
36 Features – continued
- Section structure: relative and absolute position of the sentence within its section
  - First, last, second or third, second-last or third-last, or else somewhere in the first, second, or last third of the section
- Paragraph structure: relative position of the sentence within its paragraph
  - Initial, medial, or final
37 Features – continued
- Headlines: type of headline of the current section
  - Introduction, Implementation, Example, Conclusion, Result, Evaluation, Solution, Experiment, Discussion, Method, Problems, Related Work, Data, Further Work, Problem Statement, or Non-Prototypical
- Sentence length
  - Longer or shorter than 12 words (threshold)
38 Features – continued
- Title word contents: does the sentence contain words also occurring in the title?
- TF*IDF word contents
  - High values go to words that occur frequently in one document but rarely in the overall collection of documents
  - Do the 18 highest-scoring TF*IDF words belong to the sentence?
- Verb syntax: voice, tense, and modal linguistic features
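One way the TF*IDF feature could be computed is sketched below. The slide does not give the exact weighting, so the variant used here (term count times log(N / document frequency)), the reading of the feature as "the sentence contains at least one of the top-scoring words", and all names and toy documents are assumptions.

```python
import math
from collections import Counter


def top_tfidf_words(documents, doc_index, n_top=18):
    """Return the n_top highest-scoring TF*IDF words of one document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))      # count each word once per document
    tf = Counter(documents[doc_index])
    scores = {w: count * math.log(n_docs / doc_freq[w]) for w, count in tf.items()}
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:n_top])


def has_tfidf_word(sentence_tokens, top_words):
    """Binary feature: does the sentence contain any top-scoring word?"""
    return any(t in top_words for t in sentence_tokens)


# Toy collection of three tokenized "articles".
documents = [
    ["parser", "parser", "grammar", "the"],
    ["the", "corpus", "tagger"],
    ["the", "grammar", "corpus"],
]
top = top_tfidf_words(documents, doc_index=0, n_top=1)
print(top, has_tfidf_word(["the", "parser", "fails"], top))
```

A word such as “the” that appears in every document gets a score of zero, which is the behavior the slide describes for words common across the collection.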
39 Features – continued
- Citation
  - Citation (self), citation (other), author name, or none, plus the location of the citation in the sentence (beginning, middle, or end)
- History: most probable previous category
  - AIM tends to follow CONTRAST
  - Calculated as a second-pass process during training
40 Features – continued
- Formulaic expressions: list of phrases described by regular expressions, divided into 18 classes, comprising a total of 644 patterns
  - Clustering prevents data sparseness
41 Features – continued
- Agent: 13 types, 167 patterns
  - The placeholder WORK_NOUN can be replaced by any of a set of 37 nouns, including theory, method, prototype, algorithm
  - Agent classes with a distribution very similar to the overall distribution of target categories were excluded
42 Features – continued
- Action: 365 verbs clustered into 20 classes based on semantic concepts such as similarity and contrast
  - PRESENTATION_ACTIONs: present, report, state
  - RESEARCH_ACTIONs: analyze, conduct, define, and observe
  - Negation is considered
43 System Evaluation 10-fold-cross-validation
44 Feature Impact The most distinctive single feature is Location, followed by SegAgent, Citations, Headlines, Agent and Formulaic
