1
Sentiment Analysis: An Overview of Concepts and Selected Techniques
2
Terms
Sentiment
- A thought, view, or attitude, especially one based mainly on emotion instead of reason
Sentiment analysis
- Also known as opinion mining
- The use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from typically unstructured text
Notes:
1. Subjective vs. objective information
2. Essentially the same as other information retrieval tasks, but with some additional challenges, as we will see
3
Motivation
Consumer information
- Product reviews
Marketing
- Consumer attitudes
- Trends
Politics
- Politicians want to know voters’ views
- Voters want to know politicians’ stances and who else supports them
Social
- Find like-minded individuals or communities
Notes:
- Review information from blogs, newsgroups, etc.
- Consumer attitudes towards the company’s products and competitors’ products
- Politics: can form the basis of policy decisions
4
Problem
Which features to use?
- Words (unigrams)
- Phrases/n-grams
- Sentences
How to interpret features for sentiment detection?
- Bag of words (IR)
- Annotated lexicons (WordNet, SentiWordNet)
- Syntactic patterns
- Paragraph structure
Notes:
- Lead-in: these problems are similar to other IR tasks. We have a body of text and need to know how to classify it.
- Granularity: most research has used unigrams (single words); some research shows that k-length n-grams work best
- WordNet: contains a large lexicon with relationships (synonymy, antonymy, etc.)
- Syntactic patterns: indirect negation, setup/contradiction
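To make the feature choices above concrete, here is a minimal sketch of a bag-of-features extractor that counts unigrams and bigrams. The function names and the whitespace tokenizer are assumptions made for illustration; they do not come from the slides or from any particular toolkit.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_features(text, max_n=2):
    """Count all 1-grams up to max_n-grams, ignoring where they occur."""
    # Assumption: lowercasing and whitespace splitting stand in for a real tokenizer.
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

features = bag_of_features("not a good movie")
```

With unigrams alone, "not a good movie" and "a good movie" share the feature ("good",), which is one reason longer n-grams or syntactic patterns are considered.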
5
Challenges
Harder than topical classification, for which bag-of-words features perform well
Must consider other features due to:
- Subtlety of sentiment expression
  - irony
  - expression of sentiment using neutral words
- Domain/context dependence
  - words/phrases can mean different things in different contexts and domains
- Effect of syntax on semantics
Notes:
- “[it] avoids all cliches and predictability found in Hollywood movies”: “avoids” reverses the polarity of “cliches” and “predictability”
- Thwarted expectation: “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up”
- “unpredictable”: good for a movie plot, bad for car steering
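The polarity-reversal problem can be approximated with a simple heuristic, similar in spirit to the negation tagging described in Pang et al. (2002): mark every word between a reversing word and the next punctuation mark. The sketch below is only an illustration; the REVERSERS list, the NOT_ prefix handling, and the tokenizer regex are assumptions, and a heuristic like this will not catch irony or thwarted expectation.

```python
import re

# Hypothetical, deliberately tiny list of polarity-reversing words.
REVERSERS = {"not", "no", "never", "avoids"}
PUNCTUATION = set(".,!?;")

def mark_negation(text):
    """Prefix NOT_ to tokens that follow a reverser, up to the next punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,!?;]", text.lower())
    marked, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False          # punctuation ends the negated span
            marked.append(tok)
        elif tok in REVERSERS:
            negated = True           # start marking the following words
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if negated else tok)
    return marked

tagged = mark_negation("It avoids all cliches and predictability, which is good.")
```

Here "cliches" and "predictability" become NOT_cliches and NOT_predictability, so a classifier can learn separate weights for the reversed forms, while "good" after the comma is left unmarked.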
6
Approaches
Machine learning
- Naïve Bayes
- Maximum Entropy Classifier
- SVM
- Markov Blanket Classifier
  - Accounts for conditional feature dependencies
  - Allowed reduction of discriminating features from thousands of words to about 20 (movie review domain)
Unsupervised methods
- Use lexicons
- Assume pairwise independent features
Notes:
Machine learning
- Strengths: performs fairly well within a given domain with sufficient training data
- Weaknesses: tends to overfit the training data in a given domain, so learning is hard to transfer to other domains; needs training data
Unsupervised
- Strengths: domain independent (prior polarity); may aid machine learning techniques
- Weaknesses: when used alone, does not perform as well as machine learning within a given domain
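As an illustration of the simplest machine learning approach listed, the following is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing. The training sentences, function names, and the choice to ignore unseen words are assumptions for the example, not details taken from the slides.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns the counts needed for prediction."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)       # log prior
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for tok in tokens:
            if tok in vocab:  # assumption: words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set.
train = [
    ("great fun great cast".split(), "pos"),
    ("brilliant plot".split(), "pos"),
    ("dull plot bad cast".split(), "neg"),
    ("bad boring".split(), "neg"),
]
model = train_nb(train)
```

Each word contributes to the score independently of the others, which is the independence assumption that classifiers such as the Markov Blanket Classifier try to relax.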
7
LingPipe Polarity Classifier
First eliminate objective sentences, then use remaining sentences to classify document polarity (reduce noise)
8
LingPipe Polarity Classifier
- Uses unigram features extracted from movie review data
- Assumes that adjacent sentences are likely to have similar subjective-objective (SO) polarity
- Uses a min-cut algorithm to efficiently extract subjective sentences
9
LingPipe Polarity Classifier
Graph for classifying three items.
- Document with three sentences (Y, M, N), which are the nodes in the graph
- Assign weights for each node’s (sentence’s) preference for being in each of two classes (positive or negative)
- Assign weights for each node’s (sentence’s) preference for being in the same class as adjacent nodes
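The objective behind the graph can be sketched in a few lines. A partition pays for each node's preference for the class it was not given, plus the association weight of every pair that ends up split. The weights below are illustrative values chosen for this example, and the search is a brute-force enumeration of all partitions rather than the efficient max-flow/min-cut computation a real system would use.

```python
from itertools import combinations

# Hypothetical individual preferences of each sentence for class 1 and class 2.
ind1 = {"Y": 0.8, "M": 0.5, "N": 0.1}
ind2 = {"Y": 0.2, "M": 0.5, "N": 0.9}
# Hypothetical association weights: preference for two sentences sharing a class.
assoc = {("Y", "M"): 1.0, ("M", "N"): 0.2, ("Y", "N"): 0.1}

def cut_cost(c1):
    """Cost of putting the nodes in c1 into class 1 and all others into class 2."""
    c2 = set(ind1) - set(c1)
    cost = sum(ind2[x] for x in c1) + sum(ind1[x] for x in c2)
    # Add the association weight of every pair separated by the partition.
    cost += sum(w for (a, b), w in assoc.items() if (a in c1) != (b in c1))
    return cost

def best_partition():
    """Try every subset as class 1 and return the cheapest one (exponential time)."""
    nodes = sorted(ind1)
    candidates = [set(c) for r in range(len(nodes) + 1)
                  for c in combinations(nodes, r)]
    return min(candidates, key=cut_cost)

class1 = best_partition()
```

With these weights, M has no individual preference, but its strong association with Y pulls it into the same class as Y. Enumeration is only workable for a handful of nodes, which is why the minimum-cut formulation matters for real documents.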
10
LingPipe Polarity Classifier
- As accurate as the baseline, but uses only 22% of the content in the test data (on average)
- Metrics suggest properties of movie review structure
- Also shows the performance of different classifiers
11
SentiWordNet
- Based on WordNet “synsets”
- Ternary classifier: positive, negative, and neutral scores for each synset
- Provides a means of gauging sentiment for a text
Notes:
- WordNet: lexical resource developed at Princeton
- A synset represents a distinct semantic concept and contains a set of synonymous words
12
SentiWordNet: Construction
Created training sets of synsets, Lp and Ln
- Start with a small number of synsets with fundamentally positive or negative semantics, e.g., “nice” and “nasty”
- Use WordNet relations, e.g., direct antonymy, similarity, derived-from, to expand Lp and Ln over K iterations
- Lo (objective) is the set of synsets not in Lp or Ln
Trained classifiers on the training set
- Rocchio and SVM
- Use four values of K to create eight classifiers with different precision/recall characteristics
- As K increases, precision (P) decreases and recall (R) increases
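The seed expansion step can be sketched as follows. This is a toy illustration, not the SentiWordNet implementation: it uses plain words instead of synsets, the relation tables are hypothetical, and it assumes that similarity-like relations preserve polarity while antonymy flips it.

```python
def expand_seeds(pos_seeds, neg_seeds, same, opposite, k):
    """Grow Lp and Ln for k iterations using polarity-preserving and polarity-flipping relations."""
    lp, ln = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        # Terms related to Lp by 'same' or to Ln by 'opposite' are treated as positive.
        new_p = ({t for s in lp for t in same.get(s, ())}
                 | {t for s in ln for t in opposite.get(s, ())})
        # Terms related to Ln by 'same' or to Lp by 'opposite' are treated as negative.
        new_n = ({t for s in ln for t in same.get(s, ())}
                 | {t for s in lp for t in opposite.get(s, ())})
        lp |= new_p
        ln |= new_n
    return lp, ln

# Hypothetical relation tables standing in for WordNet relations.
same = {"nice": ["pleasant"], "pleasant": ["agreeable"], "nasty": ["awful"]}
opposite = {"nice": ["nasty"], "pleasant": ["unpleasant"]}

lp1, ln1 = expand_seeds({"nice"}, {"nasty"}, same, opposite, k=1)
lp2, ln2 = expand_seeds({"nice"}, {"nasty"}, same, opposite, k=2)
```

Each extra iteration reaches terms further from the seeds, so the sets grow (higher recall) while the added labels become less reliable (lower precision). The sketch does not resolve a term that lands in both sets, and Lo would simply be every remaining term.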
13
SentiWordNet: Results
- 24.6% of synsets have Objective < 1.0: many terms are classified with some degree of subjectivity
- 10.45% have Objective <= 0.5
- 0.56% have Objective <= 0.125: only a few terms are classified as definitively subjective
- Difficult (if not impossible) to accurately assess performance
14
SentiWordNet: How to use it
Use scores to select features (+/-)
- e.g., Zhang and Zhang (2006) used words in the corpus with a subjectivity score of 0.5 or greater
Combine positive/negative/objective scores to calculate a document-level score
- e.g., Devitt and Ahmad (2007) combined polarity scores with a WordNet-based graph representation of documents to create predictive metrics
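Both uses can be sketched with a small stand-in lexicon. Everything here is an assumption for illustration: the LEXICON values are made up, words are scored directly (SentiWordNet scores synsets, so a real system must first map words to synsets), subjectivity is taken as positive plus negative (that is, 1 minus the objective score), and averaging is just one possible way to combine scores. It is not the method of either cited paper.

```python
# Hypothetical (positive, negative) scores per word; objective = 1 - positive - negative.
LEXICON = {
    "good": (0.75, 0.0),
    "brilliant": (0.875, 0.0),
    "dull": (0.0, 0.625),
    "plot": (0.0, 0.0),
}

def subjective_words(tokens, threshold=0.5):
    """Feature selection: keep words whose subjectivity (pos + neg) meets the threshold."""
    return [t for t in tokens if sum(LEXICON.get(t, (0.0, 0.0))) >= threshold]

def document_score(tokens):
    """Document-level score: mean of (pos - neg) over words found in the lexicon."""
    scores = [LEXICON[t][0] - LEXICON[t][1] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

tokens = "a brilliant plot but dull acting".split()
selected = subjective_words(tokens)
score = document_score(tokens)
```

A score above zero suggests positive polarity and below zero negative; words missing from the lexicon are ignored, which is one source of the domain-coverage problems noted earlier.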
15
References
http://www.answers.com/sentiment, 9/22/08
B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” in Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.
A. Esuli and F. Sebastiani, “SentiWordNet: A publicly available lexical resource for opinion mining,” in Proc. of LREC, Conf. on Language Resources and Evaluation, 2006.
E. Zhang and Y. Zhang, “UCSC on TREC 2006 Blog Opinion Mining,” TREC 2006 Blog Track, Opinion Retrieval Task.
A. Devitt and K. Ahmad, “Sentiment polarity identification in financial news: A cohesion-based approach,” ACL 2007.
B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” in Proc. 42nd Annual Meeting of the Association for Computational Linguistics, p. 271-es, July 21-26, 2004.