Sentiment Analysis & Opinion Mining Lecture Two: March 3, 2011 Aditya M Joshi M Tech3, CSE IIT Bombay

1 Sentiment Analysis & Opinion Mining Lecture Two: March 3, 2011 Aditya M Joshi M Tech3, CSE IIT Bombay {adityaj@cse.iitb.ac.in}

2 Sentiment analysis (SA) (RECAP)
Task of tagging text with the orientation of opinion
– Subjective: “This is a good movie.” “This is a bad movie.”
– Objective: “The movie is set in Australia.”

3 Challenges of SA (RECAP)
– Domain dependent: Sentiment of a word is w.r.t. the domain. Example: ‘unpredictable’ for the steering of a car vs. for a movie review.
– Sarcasm: Sarcasm uses words of one polarity to represent another polarity. Example: “The perfume is so amazing that I suggest you wear it with your windows shut”
– Thwarted expressions: The sentences/words that contradict the overall sentiment of the set are in the majority. Example: “The actors are good, the music is brilliant and appealing. Yet, the movie fails to strike a chord.”
– Negation: “I did not like the movie.” “Not only is the movie boring, it is also the biggest waste of producer’s money.” “Not withstanding the pressure of the public, let me admit that I have loved the movie.”
– Implicit polarity: “The camera of the mobile phone is less than one mega-pixel – quite uncommon for a phone of today.”
– Time-bounded: “This phone allows me to send SMS.” “This phone has a touch-screen.”

4 How much opinion? (RECAP)
[Chart created using www.technorati.com/chart/]

5 Using ML for NLP (RECAP)
Documents are represented as feature vectors for classifiers
– Features: unigrams, etc.
– Models: SVM, NB, etc.
Example: “The movie is set in Australia. The movie is good.”
Unigram counts: The: 2, movie: 2, is: 2, set: 1, in: 1, Australia: 1, good: 1
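
The unigram counts on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `unigram_counts` and the letters-only tokenizer are assumptions made for illustration, and case is kept as-is so that the output matches the slide (lowercasing is a common extra step).

```python
import re
from collections import Counter

def unigram_counts(text):
    """Count unigrams; tokens are maximal runs of letters (an assumed, simplistic tokenizer)."""
    return Counter(re.findall(r"[A-Za-z]+", text))

document = "The movie is set in Australia. The movie is good."
counts = unigram_counts(document)

# A fixed vocabulary order turns the counts into a feature vector for a classifier.
vocabulary = ["The", "movie", "is", "set", "in", "Australia", "good"]
feature_vector = [counts[word] for word in vocabulary]
print(feature_vector)
```

The resulting vector is what a classifier such as SVM or NB would receive as input for this document.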

6 Support vector machines (RECAP)
Basic idea: “Maximum separating-margin classifier”
– Separating hyperplane
– Margin
– Support vectors
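
For reference, the textbook hard-margin formulation behind these terms can be written as follows. The symbols $w$, $b$, $x_i$ and $y_i \in \{-1, +1\}$ are the usual notation and do not appear on the slide.

```latex
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all } i
```

The separating hyperplane is $w \cdot x + b = 0$, the margin is $2 / \lVert w \rVert$, and the support vectors are the training points for which the constraint holds with equality.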

7 Results Compared to list-based classifiers (58-69%) RECAP

8 Outline (Lecture 1 and Lecture 2)
Motivation & Introduction
– Challenges of SA: Why SA is non-trivial
– Variants of SA: What forms does it exist in?
– Opinion on the web: Is doing SA really worth it?
Classifiers for SA
– Fundamentals of supervised approaches
– Standard ML techniques
– Comparing different classifiers for SA
Approaches to SA
– Resources for SA: SentiWordNet
– Subjectivity detection: Separating the opinion from facts
– Adjectives for SA: Adjectives are great!
– Subject-based SA: Who defeated whom?
Applications

9 Resources for SA
SentiWordNet
– WordNet synsets marked with three types of scores: positive, negative, objective
Example: “I am feeling happy.”

10 Seed-set expansion in SWN
[Figure: the seed words form the sets Lp and Ln, which are expanded along ‘also-see’ and ‘antonymy’ relations]
– The sets at the end of the kth step are called Tr(k,p) and Tr(k,n)
– Tr(k,o) is the set of synsets that are not present in Tr(k,p) or Tr(k,n)
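
The expansion can be sketched in Python as below. This is a toy illustration under stated assumptions, not the SentiWordNet code: words stand in for synsets, the relation tables `also_see` and `antonyms` are hypothetical, and the rule that ‘also-see’ keeps the polarity while ‘antonymy’ flips it is inferred from the figure labels.

```python
def expand(seeds_pos, seeds_neg, also_see, antonyms, steps):
    """Return (Tr(k,p), Tr(k,n)) after `steps` rounds of expansion."""
    tr_p, tr_n = set(seeds_pos), set(seeds_neg)
    for _ in range(steps):
        new_p, new_n = set(tr_p), set(tr_n)
        for term in tr_p:
            new_p.update(also_see.get(term, ()))   # same polarity
            new_n.update(antonyms.get(term, ()))   # opposite polarity
        for term in tr_n:
            new_n.update(also_see.get(term, ()))
            new_p.update(antonyms.get(term, ()))
        tr_p, tr_n = new_p, new_n
    return tr_p, tr_n

# Hypothetical toy relations (directed, for simplicity).
also_see = {"good": ["nice"], "nice": ["pleasant"], "bad": ["awful"]}
antonyms = {"nice": ["nasty"], "awful": ["wonderful"]}
vocabulary = {"good", "nice", "pleasant", "wonderful", "bad", "awful", "nasty", "wooden"}

tr_p, tr_n = expand({"good"}, {"bad"}, also_see, antonyms, steps=2)
tr_o = vocabulary - tr_p - tr_n   # Tr(k,o): everything not reached by either set
print(sorted(tr_p), sorted(tr_n), sorted(tr_o))
```

Larger values of k give larger but noisier training sets, which is presumably why several values of k are combined when the classifiers are built.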

11 Building SentiWordNet
– Classifier alternatives used: Rocchio (BowPackage) & SVM (LibSVM)
– Different training data based on expansion
– POS-NOPOS and NEG-NONEG classification
– Total eight classifiers
  – For different combinations of k and classifiers
– Synsets not in the expanded seed set are used as test synsets
  – Score is the average of the scores returned by the classifiers

13 Subjectivity detection Aim: To extract subjective portions of text Algorithm used: Minimum cut algorithm

14 Constructing the graph
Why graphs? To model item-specific and pairwise information independently.
Nodes and edges?
– Nodes: Sentences of the document, plus a source and a sink
– The source and sink represent the two classes of sentences
– Edges: Weighted with either of the two scores
Individual scores Ind_sub(s_i): Prediction whether the sentence is subjective or not
Association scores: Prediction whether two sentences should have the same subjectivity level
– T: Threshold, the maximum distance up to which sentences may be considered proximal
– f: The decaying function
– i, j: Position numbers
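
The association-score formula itself is missing from the transcript. One form consistent with the symbols T, f, i and j listed above is the following; the constant scaling factor $c$ is an assumption and does not appear on the slide.

```latex
\mathrm{assoc}(s_i, s_j) =
\begin{cases}
f(j - i) \cdot c & \text{if } (j - i) \le T \\
0 & \text{otherwise}
\end{cases}
```

Here $f$ decays with distance, so nearby sentences are tied together more strongly than distant ones, and sentences more than $T$ positions apart are not linked at all.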

15 Constructing the graph
– Build an undirected graph G with vertices {v_1, v_2, …, s, t} (the sentences plus s and t)
– Add edges (s, v_i), each with weight ind_1(x_i)
– Add edges (t, v_i), each with weight ind_2(x_i)
– Add edges (v_i, v_k) with weight assoc(v_i, v_k)
Partition cost:
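
The partition-cost formula did not survive in the transcript. Given the edge weights above, the standard minimum-cut cost for splitting the sentences into classes $C_1$ (source side) and $C_2$ (sink side) would be the following; the set names $C_1$ and $C_2$ are notation introduced here.

```latex
\mathrm{cost}(C_1, C_2) =
\sum_{x \in C_1} \mathrm{ind}_2(x)
+ \sum_{x \in C_2} \mathrm{ind}_1(x)
+ \sum_{x_i \in C_1,\; x_k \in C_2} \mathrm{assoc}(x_i, x_k)
```

Each term is the weight of an edge that the cut severs: a sentence placed in one class pays its individual score for the other class, and every pair of sentences that is separated pays its association score.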

16 Example Sample cuts:
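
Since the figure with the sample cuts is not part of the transcript, here is a small stand-in example. It is a sketch under stated assumptions: the three sentences and all scores are made up, and the cuts are enumerated by brute force, which is exponential in the number of sentences. A real system would use a max-flow/min-cut solver instead.

```python
from itertools import product

def partition_cost(labels, ind_sub, ind_obj, assoc):
    """Total weight of the edges cut by a given subjective/objective labelling."""
    cost = 0.0
    for i, label in enumerate(labels):
        # A sentence placed in one class pays its individual score for the other class.
        cost += ind_obj[i] if label == "sub" else ind_sub[i]
    for (i, k), weight in assoc.items():
        # Separating two associated sentences pays their association score.
        if labels[i] != labels[k]:
            cost += weight
    return cost

def best_partition(ind_sub, ind_obj, assoc):
    """Try every labelling and return the cheapest one (brute force)."""
    candidates = product(("sub", "obj"), repeat=len(ind_sub))
    return min(candidates, key=lambda labels: partition_cost(labels, ind_sub, ind_obj, assoc))

# Hypothetical scores for three sentences.
ind_sub = [0.8, 0.5, 0.1]
ind_obj = [0.2, 0.5, 0.9]
assoc = {(0, 1): 1.0, (1, 2): 0.1, (0, 2): 0.2}

best = best_partition(ind_sub, ind_obj, assoc)
best_cost = partition_cost(best, ind_sub, ind_obj, assoc)
print(best, round(best_cost, 2))
```

In this toy case the second sentence is undecided on its own (0.5 vs. 0.5), but its strong association with the first sentence pulls it into the subjective class.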

17 Results (1/2)
[Figure labels: Document, Subjectivity detector, Subjective, Objective, POLARITY CLASSIFIER]
– Naïve Bayes, no extraction: 82.8%
– Naïve Bayes, subjective extraction: 86.4%
– Naïve Bayes, ‘flipped experiment’: 71%

18 Results (2/2)

20 Adjectives for SA
Many adjectives have high sentiment value
– A ‘beautiful’ bag
– A ‘wooden’ bench
– An ‘embarrassing’ performance
– A ‘nice wooden’ bench
– A ‘wooden nice’ bench
An idea would be to augment the adjectives in WordNet with this polarity information

21 Setup
– Two anchor words (extremes of the polarity spectrum) were chosen: ‘excellent’ and ‘poor’
– The PMI of each adjective with respect to these anchor words is calculated
– Polarity Score(W) = PMI(W, excellent) – PMI(W, poor)
[Figure: PMI of a word with ‘excellent’ and with ‘poor’]
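
The polarity score can be computed from co-occurrence counts as sketched below. The counts are invented for illustration, the helper names `pmi` and `polarity_score` are hypothetical, and no smoothing is applied, so a zero co-occurrence count would raise an error.

```python
import math

def pmi(joint_count, count_w, count_anchor, total):
    """Pointwise mutual information (in bits) estimated from raw counts."""
    p_joint = joint_count / total
    p_w = count_w / total
    p_anchor = count_anchor / total
    return math.log2(p_joint / (p_w * p_anchor))

def polarity_score(word, counts, joint, total):
    """Polarity Score(W) = PMI(W, excellent) - PMI(W, poor)."""
    with_excellent = pmi(joint[(word, "excellent")], counts[word], counts["excellent"], total)
    with_poor = pmi(joint[(word, "poor")], counts[word], counts["poor"], total)
    return with_excellent - with_poor

# Hypothetical corpus statistics.
total = 1000
counts = {"excellent": 100, "poor": 100, "superb": 50, "shoddy": 40}
joint = {
    ("superb", "excellent"): 20, ("superb", "poor"): 5,
    ("shoddy", "excellent"): 2, ("shoddy", "poor"): 16,
}

superb_score = polarity_score("superb", counts, joint, total)
shoddy_score = polarity_score("shoddy", counts, joint, total)
print(round(superb_score, 2), round(shoddy_score, 2))
```

A positive score means the word co-occurs more with ‘excellent’ than with ‘poor’; a negative score means the opposite.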

22 Experimentation
– K-means clustering algorithm used on the basis of polarity scores
– The clusters contain words with similar polarities
– These words can be linked using an ‘isopolarity link’ in WordNet
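
Clustering words by a single polarity score is one-dimensional k-means. The sketch below assumes three clusters, hand-picked initial centroids and invented scores; none of these values come from the slides.

```python
def kmeans_1d(scores, centroids, max_iterations=20):
    """Cluster words by polarity score around the given initial centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for word, score in scores.items():
            # Assign each word to the nearest centroid.
            nearest = min(range(len(centroids)), key=lambda c: abs(score - centroids[c]))
            clusters[nearest].append(word)
        # Move each centroid to the mean of its members; empty clusters stay in place.
        updated = [
            sum(scores[w] for w in members) / len(members) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return clusters, centroids

# Hypothetical polarity scores.
scores = {"superb": 2.0, "great": 1.6, "average": 0.1,
          "plain": -0.2, "shoddy": -3.0, "awful": -2.5}
clusters, centroids = kmeans_1d(scores, centroids=[-2.0, 0.0, 2.0])
print(clusters)
```

Words that end up in the same cluster are the candidates for an ‘isopolarity link’.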

23 Results
– Three clusters were seen
– The majority of words had negative polarity scores
– The obscure words were removed by selecting adjectives with a familiarity count of 3 – the ones that are not very common
– The work also reports an improvement when the scores are used as feature values

25 Subject-based SA The horse bolted. The movie lacks a good story.

26 Lexicon
– Pattern “subj. bolt”: b VB bolt subj
– Pattern “subj. lack obj.”: b VB lack obj ~subj
The arguments of an entry mark:
– the argument that sends the sentiment (subj./obj.)
– the argument that receives the sentiment (subj./obj.)

27 Lexicon
– Also allows ‘\S+’ characters
– Similar to regular expressions
– E.g. “to put \S+ to risk”
  – The favorability of the subject depends on the favorability of ‘\S+’.

28 Example
Sentence: “The movie lacks a good story.”
Lexicon:
– G JJ good obj. (applies to “a good story”)
– B VB lack obj ~subj. (applies to “The movie lacks \S+.”)
Steps:
1) Consider a context window of up to five words
2) Shallow parse the sentence
3) Calculate the sentiment value step by step, based on the lexicon and by adding ‘\S+’ characters at each step
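
The example can be traced with a tiny sketch. Everything here is an assumption made for illustration: the dictionaries `ADJECTIVES` and `VERBS` are a hypothetical encoding of the lexicon entries, the sentence is assumed to be already shallow-parsed into verb and object, and ‘~subj’ is read as “the subject receives the opposite of the object’s sentiment”.

```python
# Hypothetical encoding of the lexicon entries from the slides.
ADJECTIVES = {"good": +1}            # "G JJ good obj": the modified phrase is favorable
VERBS = {
    "bolt": ("fixed", -1),           # "b VB bolt subj": the subject is unfavorable
    "lack": ("from_object", -1),     # "b VB lack obj ~subj": subject gets the opposite of obj
}

def phrase_sentiment(words):
    """Sentiment of a phrase: sum of the polarities of its known adjectives."""
    return sum(ADJECTIVES.get(word.lower(), 0) for word in words)

def subject_sentiment(verb, object_words=()):
    """Sentiment received by the subject of `verb`."""
    kind, sign = VERBS[verb]
    if kind == "fixed":
        return sign
    # Transfer: evaluate the object phrase first, then flip or keep its sign.
    return sign * phrase_sentiment(object_words)

# "The movie lacks a good story." -> the movie is unfavorable
movie = subject_sentiment("lack", ["a", "good", "story"])
# "The horse bolted." -> the horse is unfavorable
horse = subject_sentiment("bolt")
print(movie, horse)
```

Evaluating the object phrase first and then the verb mirrors step 3: once “a good story” has been scored, it can be treated as an opaque ‘\S+’ when the verb entry is applied.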

29 Results
Corpus              Description            Precision   Recall
Benchmark corpus    Mixed statements       94.3%       28%
Open Test corpus    Reviews of a camera    94%         24%

30 Outline (Lecture 1 and Lecture 2)
Motivation & Introduction
– Challenges of SA: Why SA is non-trivial
– Variants of SA: What forms does it exist in?
– Opinion on the web: Is doing SA really worth it?
Classifiers for SA
– Fundamentals of supervised approaches
– Standard ML techniques
– Comparing different classifiers for SA
Approaches to SA
– Resources for SA: SentiWordNet
– Subjectivity detection: Separating the opinion from facts
– Adjectives for SA: Adjectives are great!
– Subject-based SA: Who defeated whom?
Applications
– Cross-lingual SA
– Cross-domain SA
– Opinion Spam
– SA for tweets

31 Cross-lingual SA
[Figure labels: Hindi document, English document, Sentiment Analysis System (twice), Sentiment Label]
– Multilingual content on the internet is growing
– How can the sentiment it carries be identified?
– Can we take the help of the ‘rich cousin’, English?

32 Alternatives to Cross-lingual SA
Strategies for SA for the target language:
– Use a corpus in the target language
– Translate to a ‘rich’ source language
– Develop resources for the target language

34 Domain-dependence of words
‘deadly’
– It was one deadly match!
– There are some deadly poisonous snakes in the jungles of the Amazon.

35 General Approach
– Retain the ‘common-to-all-domain’ words
– Learn only the ‘special domain’ words
– Domain differences can be substantial

37 Opinion spam: A side-effect of UGC (user-generated content)
– Reviews contain rich user opinions on products and services
– Anyone can write anything on the Web
  – No quality control
– Result: Low-quality reviews, review spam / opinion spam
– Incentives: Positive opinion -> Financial gain for the organization

38 Different types of spam reviews
Type 1 (untruthful opinions)
– Giving undeserving reviews to some target objects in order to promote/demote the object
– Hyper spam: undeserving positive reviews
– Defaming spam: malicious negative reviews
– DUPLICATES
Type 2 (reviews on brands only)
– No comment on the product
– Comments on brands, manufacturers or sellers of the product
Type 3 (non-reviews)
– Advertisements
– Other irrelevant reviews containing no opinions, e.g. questions, answers and random text
Example reviews:
– “Although you should not expect prompt shippin. (It took 3 weeks and several e-mails before I received my order.) I would order again from this merchant, just because the price was right” - http://www.pricegrabber.com
– “It’s from nikon, what more you want..”
Reference: [Jindal et al, 2008]

40 Challenges with tweets
– Ill-formed
  – Spelling mistakes
  – Informal words/emoticons
  – Extensions of words (‘happppyyyyy’)
– Vague topics
www.clia.iitb.ac.in:8080/TwitterApp/index.jap
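
Extended words such as ‘happppyyyyy’ are often normalized before feature extraction. The sketch below shows one common heuristic, which is an assumption here and not necessarily what the system behind the slide does: any character repeated three or more times is collapsed to two occurrences.

```python
import re

# A character followed by two or more copies of itself.
ELONGATION = re.compile(r"(.)\1{2,}")

def squeeze_repeats(token):
    """Collapse runs of three or more identical characters down to two."""
    return ELONGATION.sub(r"\1\1", token)

print(squeeze_repeats("happppyyyyy"))
print(squeeze_repeats("good"))
```

Collapsing to two characters rather than one keeps legitimate double letters (‘good’, ‘happy’) intact; mapping the result (‘happyy’) to a dictionary word would need a further spelling-correction step.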

41 Mood analysis (SOME ACTUAL APPLICATIONS)
Real-time updating of moods with respect to a topic
Snapshot: MoodViews

42 Semantic search
Sentiment search API by Evri
Claims to allow answers to deeper questions like “who” and “why”

43 A zeitgeist Understanding the ‘climate’ Snapshot: Twitscoop

44 … and many more

