Published by Stanley Carpenter. Modified over 9 years ago.
1
Sentiment Analysis & Opinion Mining Lecture Two: March 3, 2011 Aditya M Joshi M Tech3, CSE IIT Bombay {adityaj@cse.iitb.ac.in}
2
Sentiment analysis (SA) (RECAP)
Task of tagging text with orientation of opinion
Subjective: "This is a good movie." "This is a bad movie."
Objective: "The movie is set in Australia."
3
Challenges of SA (RECAP)
– Domain dependent: Sentiment of a word is w.r.t. the domain. Example: ‘unpredictable’ is negative for the steering of a car, positive for a movie review.
– Sarcasm: Sarcasm uses words of one polarity to represent another polarity. Example: “The perfume is so amazing that I suggest you wear it with your windows shut”
– Thwarted expressions: The sentences/words that contradict the overall sentiment of the set are in the majority. Example: “The actors are good, the music is brilliant and appealing. Yet, the movie fails to strike a chord.”
– Negation: “I did not like the movie.” “Not only is the movie boring, it is also the biggest waste of producer’s money.” “Notwithstanding the pressure of the public, let me admit that I have loved the movie.”
– Implicit polarity: “The camera of the mobile phone is less than one mega-pixel – quite uncommon for a phone of today.”
– Time-bounded: “This phone allows me to send SMS.” “This phone has a touch-screen.”
4
How much opinion? Chart created using : www.technorati.com/chart/ RECAP
5
Using ML for NLP (RECAP)
Documents represented as feature vectors for classifiers
– Features: unigrams, etc.
– Models: SVM, NB, etc.
Example: “The movie is set in Australia. The movie is good.”
The: 2, movie: 2, is: 2, set: 1, in: 1, Australia: 1, good: 1
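The unigram feature vector on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `unigram_counts` and the whitespace tokenization with punctuation stripping are assumptions made for illustration, not something the lecture specifies.

```python
from collections import Counter

def unigram_counts(sentences):
    """Count whitespace-separated tokens after stripping surrounding punctuation."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence.split():
            counts[token.strip(".,!?")] += 1
    return counts

doc = ["The movie is set in Australia.", "The movie is good."]
features = unigram_counts(doc)
print(dict(features))
```

A classifier such as NB or SVM would then consume these counts as one row of a document-term matrix.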
6
Support vector machines (RECAP)
Basic idea: “Maximum separating-margin classifier”
– Separating hyperplane
– Margin
– Support vectors
7
Results Compared to list-based classifiers (58-69%) RECAP
8
Outline
Lecture 1
Motivation & Introduction
– Challenges of SA: Why SA is non-trivial
– Variants of SA: What forms does it exist in?
– Opinion on the web: Is doing SA really worth it?
Classifiers for SA
– Fundamentals of supervised approaches
– Standard ML techniques
– Comparing different classifiers for SA
Lecture 2
Approaches to SA
– Resources for SA: SentiWordNet
– Subjectivity detection: Separating the opinion from facts
– Adjectives for SA: Adjectives are great!
– Subject-based SA: Who defeated whom?
Applications
9
Resources for SA
SentiWordNet – WordNet synsets marked with three types of scores: positive, negative, objective
Example: “I am feeling happy.”
10
Seed-set expansion in SWN
– Seed words: Lp (positive) and Ln (negative)
– The seed sets are expanded through the also-see and antonymy relations
– The sets at the end of the kth step are called Tr(k,p) and Tr(k,n)
– Tr(k,o) is the set of synsets present in neither Tr(k,p) nor Tr(k,n)
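The expansion step can be sketched as follows. This is an illustrative toy, not the SentiWordNet implementation: it works on plain words instead of synsets, and the relation tables `ALSO_SEE` and `ANTONYMS`, the vocabulary `VOCAB`, and the function name are all hypothetical. The assumption made here is that also-see preserves orientation while antonymy flips it.

```python
def expand_seed_sets(pos_seeds, neg_seeds, also_see, antonyms, k):
    """Run k expansion steps and return (Tr(k,p), Tr(k,n))."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        new_pos, new_neg = set(pos), set(neg)
        for word in pos:
            new_pos.update(also_see.get(word, ()))  # same orientation
            new_neg.update(antonyms.get(word, ()))  # opposite orientation
        for word in neg:
            new_neg.update(also_see.get(word, ()))
            new_pos.update(antonyms.get(word, ()))
        pos, neg = new_pos, new_neg
    return pos, neg

# Hypothetical toy relations standing in for WordNet.
ALSO_SEE = {"good": ["nice"], "nice": ["pleasant"], "bad": ["awful"]}
ANTONYMS = {"good": ["bad"], "nice": ["nasty"], "bad": ["good"]}
VOCAB = {"good", "nice", "pleasant", "bad", "awful", "nasty", "wooden"}

tr_p, tr_n = expand_seed_sets({"good"}, {"bad"}, ALSO_SEE, ANTONYMS, k=2)
tr_o = VOCAB - tr_p - tr_n  # everything reached by neither expansion
print(sorted(tr_p), sorted(tr_n), sorted(tr_o))
```

Larger k gives more training data at the price of noisier labels, which is why several values of k are useful when building the classifiers.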
11
Building SentiWordNet
– Classifier alternatives used: Rocchio (BowPackage) & SVM (LibSVM)
– Different training data based on expansion
– POS-NOPOS and NEG-NONEG classification
– Total eight classifiers, for different combinations of k and classifiers
– Synsets not in the expanded seed set are used as test synsets
– Score is the average of the scores returned by the classifiers
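One way to read the averaging step is sketched below. The rule for combining the two binary decisions into a three-way label, and the names `ternary_label` and `synset_scores`, are assumptions for illustration; the slide only states that the final score is an average over the eight classifiers.

```python
def ternary_label(is_positive, is_negative):
    """Combine a POS-NOPOS decision and a NEG-NONEG decision into one label."""
    if is_positive and not is_negative:
        return "positive"
    if is_negative and not is_positive:
        return "negative"
    return "objective"  # both fired or neither fired

def synset_scores(committee_votes):
    """committee_votes: one (POS-NOPOS, NEG-NONEG) boolean pair per classifier."""
    labels = [ternary_label(p, n) for p, n in committee_votes]
    return {name: labels.count(name) / len(labels)
            for name in ("positive", "negative", "objective")}

# Hypothetical votes from eight classifiers for a single synset.
votes = [(True, False)] * 6 + [(False, False)] * 2
scores = synset_scores(votes)
print(scores)
```

With this reading the three scores of a synset always sum to 1.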
13
Subjectivity detection Aim: To extract subjective portions of text Algorithm used: Minimum cut algorithm
14
Constructing the graph
Why graphs? To model item-specific and pairwise information independently.
Nodes and edges?
– Nodes: Sentences of the document, plus source & sink. Source & sink represent the two classes of sentences.
– Edges: Weighted with either of the two scores
Individual scores: Prediction whether the sentence is subjective or not. Ind_sub(s_i) = [formula not preserved]
Association scores: Prediction whether two sentences should have the same subjectivity level
– T: Threshold, the maximum distance up to which sentences may be considered proximal
– f: The decaying function
– i, j: Position numbers
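An association score built from the symbols T, f, i and j might look like the sketch below. The exact formula is not given on the slide, so the form used here (a scaled decay inside the threshold, zero outside it), the scale constant `c`, and the inverse-square choice for f are assumptions.

```python
def inverse_square(distance):
    """One possible decaying function f."""
    return 1.0 / (distance * distance)

def assoc_score(i, j, T, f, c=1.0):
    """Association between the sentences at positions i and j.

    Returns c * f(|j - i|) when the sentences are within T positions of
    each other, and 0 otherwise.
    """
    distance = abs(j - i)
    if distance == 0 or distance > T:
        return 0.0
    return c * f(distance)

print(assoc_score(1, 2, T=3, f=inverse_square))
print(assoc_score(1, 3, T=3, f=inverse_square))
print(assoc_score(1, 5, T=3, f=inverse_square))
```

Nearby sentences get a strong tie and are therefore pushed toward the same subjectivity label, while distant ones are left independent.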
15
Constructing the graph
– Build an undirected graph G with vertices {v_1, v_2, …, s, t} (the sentences plus s and t)
– Add edges (s, v_i), each with weight ind_1(x_i)
– Add edges (t, v_i), each with weight ind_2(x_i)
– Add edges (v_i, v_k) with weight assoc(v_i, v_k)
Partition cost:
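A form of the partition cost that is consistent with the edge weights listed on this slide is given below. It is a reconstruction under the assumption that C_1 is the set of sentences left on the source side (class 1) and C_2 the set on the sink side (class 2); the symbols C_1 and C_2 do not appear in the transcript.

```latex
\[
\operatorname{cost}(C_1, C_2)
  = \sum_{x \in C_1} \mathrm{ind}_2(x)
  + \sum_{x \in C_2} \mathrm{ind}_1(x)
  + \sum_{\substack{x_i \in C_1 \\ x_k \in C_2}} \mathrm{assoc}(x_i, x_k)
\]
```

Each term corresponds to one kind of edge that the cut severs: the (t, v_i) edges of sentences placed in C_1, the (s, v_i) edges of sentences placed in C_2, and the association edges that cross between the two sides.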
16
Example Sample cuts:
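To make the idea of sample cuts concrete, the sketch below enumerates every partition of a three-sentence document and reports the cheapest one. All numbers are hypothetical, chosen only for illustration, and exhaustive enumeration is used in place of a real minimum cut algorithm, which is only feasible for tiny inputs. The cost follows the usual min-cut formulation: sentences in C1 pay their ind_2 score, sentences in C2 pay their ind_1 score, and every association edge crossing the cut is paid in full.

```python
from itertools import combinations

# Hypothetical scores for three sentences Y, M and N.
IND1 = {"Y": 0.8, "M": 0.5, "N": 0.1}                # preference for class 1
IND2 = {name: 1.0 - score for name, score in IND1.items()}
ASSOC = {("Y", "M"): 1.0, ("Y", "N"): 0.1, ("M", "N"): 0.2}

def partition_cost(c1):
    """Cost of putting the sentences in c1 into class 1 and the rest into class 2."""
    c2 = set(IND1) - set(c1)
    individual = sum(IND2[x] for x in c1) + sum(IND1[x] for x in c2)
    crossing = sum(w for (a, b), w in ASSOC.items() if (a in c1) != (b in c1))
    return individual + crossing

def all_cuts():
    """Yield (C1, cost) for every possible choice of C1."""
    names = sorted(IND1)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            yield frozenset(subset), partition_cost(subset)

for c1, cost in all_cuts():
    print(sorted(c1), round(cost, 2))

best_c1, best_cost = min(all_cuts(), key=lambda pair: pair[1])
```

In this toy case the strong association between Y and M pulls the undecided sentence M into the same class as Y.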
17
Results (1/2)
Pipeline: Document → Subjectivity detector → Subjective sentences → POLARITY CLASSIFIER (Objective sentences are discarded)
– Naïve Bayes, no extraction: 82.8%
– Naïve Bayes, subjective extraction: 86.4%
– Naïve Bayes, ‘flipped experiment’ (classifying on the objective sentences instead): 71%
18
Results (2/2)
20
Adjectives for SA
Many adjectives have high sentiment value
– A ‘beautiful’ bag
– A ‘wooden’ bench
– An ‘embarrassing’ performance
– A ‘nice wooden’ bench
– A ‘wooden nice’ bench
One idea is to attach this polarity information to the adjectives in WordNet
21
Setup
– Two anchor words (extremes of the polarity spectrum) were chosen: ‘excellent’ and ‘poor’
– PMI of each adjective with respect to these anchor words is calculated
– Polarity Score(W) = PMI(W, excellent) – PMI(W, poor)
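The polarity score can be computed directly from occurrence and co-occurrence counts. The sketch below uses the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))); the counts are hypothetical and the function names are assumptions made for illustration.

```python
import math

def pmi(joint, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts."""
    return math.log2((joint / total) / ((count_x / total) * (count_y / total)))

def polarity_score(word, counts, cooc, total):
    """Polarity Score(W) = PMI(W, excellent) - PMI(W, poor)."""
    with_excellent = pmi(cooc[(word, "excellent")], counts[word], counts["excellent"], total)
    with_poor = pmi(cooc[(word, "poor")], counts[word], counts["poor"], total)
    return with_excellent - with_poor

# Hypothetical corpus statistics.
COUNTS = {"beautiful": 200, "embarrassing": 100, "excellent": 1000, "poor": 1000}
COOC = {
    ("beautiful", "excellent"): 40, ("beautiful", "poor"): 5,
    ("embarrassing", "excellent"): 2, ("embarrassing", "poor"): 16,
}
TOTAL = 100000

for adjective in ("beautiful", "embarrassing"):
    print(adjective, round(polarity_score(adjective, COUNTS, COOC, TOTAL), 2))
```

A zero co-occurrence count would make the logarithm undefined, so a real system would need some form of smoothing.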
22
Experimentation K-means clustering algorithm used on the basis of polarity scores The clusters contain words with similar polarities These words can be linked using an ‘isopolarity link’ in WordNet
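Clustering on a single polarity score is one-dimensional k-means. The sketch below is a plain implementation of Lloyd's algorithm with a deterministic initialization; the scores in `SCORES` are hypothetical, and the initialization scheme is an assumption chosen so that the example is reproducible.

```python
def kmeans_1d(values, k, max_iter=100):
    """Return a cluster index for every value (Lloyd's algorithm in one dimension)."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced picks from the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    assignment = []
    for _ in range(max_iter):
        # Assign each value to its nearest centroid.
        assignment = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Recompute each centroid as the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assignment

# Hypothetical polarity scores.
SCORES = {"superb": 2.1, "lovely": 1.8, "nice": 1.5, "wooden": 0.1,
          "plain": -0.1, "dull": -1.6, "awful": -2.0, "dreadful": -2.3}

labels = kmeans_1d(list(SCORES.values()), k=3)
clusters = {}
for word, label in zip(SCORES, labels):
    clusters.setdefault(label, []).append(word)
print(clusters)
```

The words that end up in the same cluster are the candidates for an ‘isopolarity link’.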
23
Results
– Three clusters were seen
– The majority of words had negative polarity scores
– Obscure words, i.e. the ones that are not very common, were filtered out using a familiarity count of 3
– The work also reports an improvement when the scores are used as feature values
25
Subject-based SA The horse bolted. The movie lacks a good story.
26
Lexicon
– Pattern “subj. bolt”: b VB bolt subj
– Pattern “subj. lack obj.”: b VB lack obj ~subj
In the second entry, obj is the argument that sends the sentiment (subj./obj.) and ~subj is the argument that receives the sentiment (subj./obj.)
27
Lexicon
– Also allows ‘\S+’ characters, similar to regular expressions
– E.g. “to put \S+ to risk”: the favorability of the subject depends on the favorability of ‘\S+’.
28
Example
Sentence and matching lexicon entry:
– “The movie lacks a good story.”: G JJ good obj.
– “The movie lacks \S+.”: B VB lack obj ~subj.
Steps:
1) Consider a context window of up to five words
2) Shallow parse the sentence
3) Calculate the sentiment value step by step, based on the lexicon and by adding ‘\S+’ characters at each step
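The transfer of sentiment in this example can be sketched as below. Several things here are assumptions: the shallow parse is taken as given (the `chunks` dictionary with a lemmatized verb is hypothetical), the ‘~’ in the entry for ‘lack’ is read as "with polarity inverted", and the data structures `ADJECTIVES` and `VERBS` are simplified stand-ins for the lexicon.

```python
# "G JJ good": a favorable adjective.
ADJECTIVES = {"good": +1}
# "B VB lack obj ~subj": sentiment of the object is sent, inverted, to the subject.
VERBS = {"lack": {"source": "obj", "target": "subj", "invert": True}}

def phrase_polarity(words):
    """Sign (+1, 0 or -1) of the summed adjective polarities in a phrase."""
    total = sum(ADJECTIVES.get(word.lower(), 0) for word in words)
    return (total > 0) - (total < 0)

def transfer_sentiment(chunks):
    """Return {receiving argument: polarity} for one shallow-parsed clause."""
    entry = VERBS.get(chunks["verb"])
    if entry is None:
        return {}
    polarity = phrase_polarity(chunks[entry["source"]])
    if entry["invert"]:
        polarity = -polarity
    return {entry["target"]: polarity}

# "The movie lacks a good story." after a hypothetical shallow parse.
chunks = {"subj": ["The", "movie"], "verb": "lack", "obj": ["a", "good", "story"]}
result = transfer_sentiment(chunks)
print(result)
```

The object phrase ‘a good story’ is favorable, and ‘lack’ passes the opposite of that to the subject, so ‘the movie’ ends up unfavorable.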
29
Results
Corpus             Description          Precision   Recall
Benchmark corpus   Mixed statements     94.3%       28%
Open Test corpus   Reviews of a camera  94%         24%
30
Outline
Lecture 1
Motivation & Introduction
– Challenges of SA: Why SA is non-trivial
– Variants of SA: What forms does it exist in?
– Opinion on the web: Is doing SA really worth it?
Classifiers for SA
– Fundamentals of supervised approaches
– Standard ML techniques
– Comparing different classifiers for SA
Lecture 2
Approaches to SA
– Resources for SA: SentiWordNet
– Subjectivity detection: Separating the opinion from facts
– Adjectives for SA: Adjectives are great!
– Subject-based SA: Who defeated whom?
Applications
– Cross-lingual SA
– Cross-domain SA
– Opinion Spam
– SA for tweets
31
Cross-lingual SA
– Multilingual content on the internet is growing
– How can the sentiment it carries be identified?
– Can we take the help of the ‘rich cousin’, English?
Diagram: English document → Sentiment Analysis System; Hindi document → Sentiment Analysis System → Sentiment Label
32
Alternatives to Cross-lingual SA
Strategies for SA for a target language:
– Use a corpus in the target language
– Translate to a ‘rich’ source language
– Develop resources for the target language
34
Domain-dependence of words
‘deadly’
– It was one deadly match!
– There are some deadly poisonous snakes in the jungles of the Amazon.
35
General Approach Retain the ‘common-to-all-domain’ words Learn only the ‘special domain’ words Domain differences can be substantial
37
Opinion spam: A side-effect of UGC
– Reviews contain rich user opinions on products and services
– Anyone can write anything on the Web: no quality control
– Result: Low quality reviews, review spam / opinion spam
– Incentives: Positive opinion -> Financial gain for the organization
38
Different types of spam reviews
Type 1 (untruthful opinions)
– Giving undeserving reviews to some target objects in order to promote/demote the object
– Hyper spam: undeserving positive reviews
– Defaming spam: malicious negative reviews
– DUPLICATES
Type 2 (reviews on brands only)
– No comment on the product
– Comments on brands, manufacturers or sellers of the product
Type 3 (non-reviews)
– Advertisements
– Other irrelevant reviews containing no opinions, e.g. questions, answers and random text
Examples:
– “Although you should not expect prompt shippin. (It took 3 weeks and several e-mails before I received my order.) I would order again from this merchant, just because the price was right” - http://www.pricegrabber.com
– “It’s from nikon, what more you want..”
Reference: [Jindal et al, 2008]
40
Challenges with tweets Ill-formed – Spelling mistakes – Informal words/emoticons – Extensions of words (‘happppyyyyy’) Vague topics www.clia.iitb.ac.in:8080/TwitterApp/index.jap
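Word extensions such as ‘happppyyyyy’ are commonly handled by capping runs of a repeated character before feature extraction. The sketch below shows one such normalization; the function name and the cap of two repeats are assumptions, and the lecture does not say how its own system treats these tokens.

```python
import re

def squeeze_elongation(token, max_repeat=2):
    """Shorten any run of the same character longer than max_repeat to max_repeat."""
    pattern = re.compile(r"(.)\1{%d,}" % max_repeat)
    return pattern.sub(lambda match: match.group(1) * max_repeat, token)

print(squeeze_elongation("happppyyyyy"))
print(squeeze_elongation("good"))
```

Capping at two rather than one keeps legitimate double letters intact, at the cost of leaving forms like ‘happyy’ that still need a dictionary lookup or spelling correction.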
41
SOME ACTUAL APPLICATIONS
Mood analysis
– Real-time updating of moods w.r.t. a topic
– Snapshot: MoodViews
42
Semantic search Sentiment search API by Evri Claims to allow deeper answers like “who”, “why”
43
A zeitgeist Understanding the ‘climate’ Snapshot: Twitscoop
44
… and many more