1
Arpit Maheshwari, Pankhil Chheda, Pratik Desai
2
Contents
1. Introduction and Basic Definitions
2. Applications
3. Challenges
4. Problem Formulation and Key Concepts
5. Popular Approaches
   5.1 Polarity Classification: ML Techniques
   5.2 Subjectivity Detection: Learning Extraction Patterns
   5.3 Sentiment Analysis: Using Minimum Cuts for Subjectivity
   5.4 Sentiment Analysis: A New Approach
6. Publicly Available Resources
7. References
3
1. Introduction and Basic Definitions
Sentiment analysis: determining the attitude/opinion/perspective of an author on a particular subject.
It differs from other NLP tasks in important ways (emphasized in Part 3: Challenges).
4
Importance
Consumer reviews play an important role in shaping our attitude towards something unfamiliar, e.g. product reviews, restaurant reviews, etc.
There is a need/curiosity to know the prevalent point of view, e.g. to know which party is likely to win the coming elections.
Companies are anxious to understand how their products and services are perceived.
5
Key Terms
Sentiment/opinion
Subjectivity
Subjectivity analysis
Sentiment analysis / opinion mining: taken broadly to mean the computational treatment of opinion, sentiment, and subjectivity in text
6
2. Applications
2.1 Applications to Review-Related Websites
2.2 Applications in Business
2.3 Applications Across Different Domains
7
3. General Challenges
Comparison with a similar-looking NLP task: topic-based categorization.
Sentiment classification appears easier at first glance, but coming up with the right set of keywords is less trivial than one might initially think.
8
General Challenges (contd.)
Machine learning techniques based on unigram models can achieve over 80% accuracy, much better than the roughly 60% obtained with hand-picked keywords.
Compared to topic, sentiment is often expressed in a more subtle manner.
Subjectivity is an inherent problem in sentiment analysis.
9
Contd.
Somewhat in contrast with topic-based text categorization, order effects can completely overwhelm frequency effects.
e.g. "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."
10
4. Problem Formulation and Key Concepts
Classification: given a piece of opinionated text, determine the opinion/mood/sentiment from a set of values (binary, e.g. positive/negative, or on a scale).
Text summarization: reducing the length of the text (subjective and/or objective) while keeping the useful pieces of information/opinions.
11
We will cover the problem of classification in greater detail.
After covering the issues in brief, we turn to certain popular approaches found in the literature.
Finally, we summarize the approaches mentioned, followed by a list of publicly available resources relevant to sentiment analysis.
12
Subjectivity
Issue: the input text need not be completely opinionated.
Subjective vs. objective text: is the distinction clear?
e.g. consider the difference between "the battery lasts 2 hours" and "the battery only lasts 2 hours".
13
Subjectivity (contd.)
Problem: subjectivity classification has itself been found to be more difficult than polarity classification.
An improvement in the former implies an improvement in the latter. (Why so?)
14
5. Popular Approaches
5.1 Polarity Classification: ML Techniques
5.2 Subjectivity Detection: Learning Extraction Patterns
5.3 Sentiment Analysis: Using Minimum Cuts for Subjectivity
5.4 Sentiment Analysis: A New Approach
15
5.1 Polarity Classification: ML Techniques
Sentiment can be expressed more subtly than topic, which is often detectable by keywords.
Three machine learning techniques for the task (common to many NLP tasks):
1. Naïve Bayes
2. Maximum entropy
3. SVM (Support Vector Machine)
16
Common Framework
Bag of features: employed for all three methods.
{f_1, f_2, ..., f_m}: a predefined set of m features that can appear in a document, e.g. the word "stinks", the bigram "hats off".
D = (n_1(d), n_2(d), ..., n_m(d)): the number of times document d contains each feature.
The vector D represents the document d.
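A minimal sketch of this bag-of-features representation; the feature list and example document below are illustrative, not taken from the slides.

```python
# Bag-of-features sketch: the predefined features f_1..f_m and the example
# document are made up for illustration.
features = ["stinks", "hats off", "brilliant"]

def doc_vector(text, features):
    """Return (n_1(d), ..., n_m(d)): how often each feature occurs in d."""
    return [text.count(f) for f in features]

d = "hats off to the cast, nothing about this film stinks"
print(doc_vector(d, features))   # -> [1, 1, 0]
```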
17
Naïve Bayes Approach
c: category; d: document to be classified.
Goal: find c* = argmax_c P(c|d).
Bayes' rule: P(c|d) = P(c) P(d|c) / P(d).
Assumption: the f_i's are conditionally independent given d's class, so
P(c|d) = P(c) * prod_i P(f_i|c)^(n_i(d)) / P(d).
Training method: estimation of P(c) and P(f_i|c).
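A hedged sketch of the Naïve Bayes decision rule above, computed in log space; the prior and likelihood tables are toy numbers, not trained estimates.

```python
import math

# Toy P(c) and P(f_i|c) tables; real values would be estimated from training data.
priors = {"pos": 0.5, "neg": 0.5}
likelihood = {
    "pos": {"brilliant": 0.30, "stinks": 0.05, "hats off": 0.20},
    "neg": {"brilliant": 0.10, "stinks": 0.40, "hats off": 0.05},
}

def classify(counts):
    """counts: feature -> n_i(d). Returns c* = argmax_c [log P(c) + sum_i n_i(d) log P(f_i|c)]."""
    def log_score(c):
        return math.log(priors[c]) + sum(
            n * math.log(likelihood[c][f]) for f, n in counts.items())
    return max(priors, key=log_score)

print(classify({"stinks": 1, "brilliant": 1}))   # -> "neg" with these toy numbers
```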
18
Maximum Entropy Model
P(c|d) = (1/Z(d)) * exp( sum_i lambda_{i,c} * F_{i,c}(d, c) )
Z(d): normalization constant.
Feature/class function: F_{i,c}(d, c') = 1 if n_i(d) > 0 and c' = c, and 0 otherwise.
lambda_{i,c}: feature-weight parameters.
For details of a generic maximum entropy model as used in NLP, refer to Berger et al., 1996.
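A small sketch of the conditional maximum entropy model above, with made-up weights lambda_{i,c}; in practice the weights are fit from training data (e.g. by iterative scaling), which is omitted here.

```python
import math

weights = {                     # lambda_{i,c}: toy values, normally learned
    ("brilliant", "pos"): 1.2, ("brilliant", "neg"): -0.4,
    ("stinks", "pos"): -1.0,   ("stinks", "neg"): 1.5,
}

def p_class_given_doc(counts, classes=("pos", "neg")):
    """counts: feature -> n_i(d). Returns P(c|d) for each class c."""
    scores = {
        c: math.exp(sum(weights.get((f, c), 0.0)        # F_{i,c}(d, c) = 1 iff n_i(d) > 0
                        for f, n in counts.items() if n > 0))
        for c in classes
    }
    z = sum(scores.values())                            # Z(d)
    return {c: s / z for c, s in scores.items()}

print(p_class_given_doc({"stinks": 2}))                 # most mass on "neg"
```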
19
SVM: large-margin classifiers.
The classifier finds the hyperplane that separates the two classes with maximum margin; a document is classified by the side of the hyperplane on which its feature vector falls.
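A minimal sketch of using a linear SVM over bag-of-features vectors with scikit-learn; the library, the two toy training documents, and the parameter choices are assumptions for illustration (the original experiments used a different SVM implementation and far more data).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

docs = ["hats off , a brilliant film", "this film stinks"]   # toy training data
labels = ["pos", "neg"]

vectorizer = CountVectorizer(ngram_range=(1, 2))   # unigram + bigram bag of features
X = vectorizer.fit_transform(docs)

clf = LinearSVC()              # learns a maximum-margin separating hyperplane
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["a brilliant plot"])))   # -> ['pos']
```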
20
Results (Pang et al., 2002). Note: baseline results ranged from 50% to 69%.
21
A Conclusion
The accuracy achieved by ML methods in sentiment analysis is lower than in topic-based categorization; we need to look for novel approaches.
A common phenomenon in the documents was the "thwarted expectations" narrative: the author sets up a deliberate contrast to the earlier discussion.
"The whole is not necessarily the sum of the parts" – Turney, 2002
22
5.2 Subjectivity Detection: Learning Extraction Patterns
Earlier resources contain lists of subjective words.
However, subjective language can be exhibited by a staggering variety of words and phrases, so subjectivity learning systems must be trained on extremely large text collections.
23
Riloff and Wiebe, 2003
Two salient points:
1. Exploring the use of bootstrapping methods to allow subjectivity classifiers to learn from a collection of unannotated texts
2. Using extraction patterns to represent subjective expressions; these patterns are linguistically richer and more flexible than single words or N-grams
24
Extraction Patterns
Consider a subjective sentence like "His kid always drives me up the wall".
A possible abstraction: drives up the wall.
The extraction pattern so formed contributes to the sentence being subjective.
Other examples: agree with; is out of his mind.
25
Schematic representation
26
High-Precision Subjectivity Classifiers
These classifiers use lists of lexical items that have been shown in previous work to be good subjectivity clues.
The subjectivity clues are divided into strongly subjective and weakly subjective ones.
The high-precision subjective classifier labels a sentence as subjective if it contains two or more of the strongly subjective clues. On a manually annotated test set, this classifier achieves 91.5% precision and 31.9% recall.
27
Contd.
The high-precision objective classifier labels a sentence as objective if there are no strongly subjective clues and at most one weakly subjective clue in the current, previous, and next sentences combined (82.6% precision and 16.4% recall).
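A hedged sketch of the two rule-based classifiers just described; the strong and weak clue sets below are tiny illustrative stand-ins for the real subjectivity lexicons.

```python
# Tiny stand-in clue lists; the real lexicons are much larger.
STRONG = {"outrage", "hate", "brilliant", "terrible"}
WEAK = {"feeling", "seem", "maybe"}

def count_clues(sentence, clues):
    return sum(1 for w in sentence.lower().split() if w in clues)

def label_sentences(sentences):
    """Return 'subjective', 'objective', or None (left unlabeled) per sentence."""
    labels = []
    for i, s in enumerate(sentences):
        window = sentences[max(0, i - 1): i + 2]          # previous, current, next
        strong_here = count_clues(s, STRONG)
        strong_win = sum(count_clues(t, STRONG) for t in window)
        weak_win = sum(count_clues(t, WEAK) for t in window)
        if strong_here >= 2:                              # two or more strong clues
            labels.append("subjective")
        elif strong_win == 0 and weak_win <= 1:           # no strong, at most one weak nearby
            labels.append("objective")
        else:
            labels.append(None)
    return labels

print(label_sentences(["The plot seemed fine .", "A brilliant , brilliant film !"]))
```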
28
Learning Subjective Extraction Patterns
Choose extraction patterns for which freq(pattern) > T1 and Pr(subjective | pattern) > T2, where T1 and T2 are threshold values.
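A small sketch of this selection rule; the pattern names, counts, and threshold values below are illustrative, not taken from the paper.

```python
def select_patterns(freq, subj_freq, t1=5, t2=0.95):
    """freq[p]: total occurrences of pattern p; subj_freq[p]: occurrences in subjective sentences."""
    return [p for p in freq
            if freq[p] > t1 and subj_freq.get(p, 0) / freq[p] > t2]

freq = {"<subj> drives up the wall": 8, "<subj> was hired": 40}
subj_freq = {"<subj> drives up the wall": 8, "<subj> was hired": 4}
print(select_patterns(freq, subj_freq))   # only the first pattern passes both thresholds
```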
29
Evaluation of Learned Patterns
Precision ranges from 71% to 85%; hence, the extraction patterns so learned are effective at recognizing subjective expressions.
30
Evaluation of the Bootstrapping Process
The extraction patterns so learned form a subjectivity detector in their own right; they can also be used to enhance the high-precision subjectivity classifiers.
When incorporated, the following results were observed.
31
5.3 Sentiment Analysis: Minimum Cuts
Subjectivity summarization based on minimum cuts.
Basic strategy:
1. Label the sentences in the document as either subjective or objective, discarding the latter
2. Apply a standard machine-learning classifier to the resulting extract
32
Schematic representation
33
Context and Subjectivity Detection
Earlier subjectivity detectors considered sentences in isolation.
Context dependency: nearby sentences should receive similar subjectivity tags.
This is implemented in an elegant fashion using the graph-theoretic notion of a minimum cut.
34
Cut-Based Classification
Two types of information:
Individual scores ind_j(x_i): non-negative estimates of each x_i's preference for being in class C_j, based on the features of x_i alone
Association scores assoc(x_i, x_k): non-negative estimates of how important it is that x_i and x_k be in the same class
Optimization problem: minimize the partition cost
cost(C_1, C_2) = sum_{x in C_1} ind_2(x) + sum_{x in C_2} ind_1(x) + sum_{x_i in C_1, x_k in C_2} assoc(x_i, x_k)
35
Graphical Formulation
Problem: this seems intractable owing to the exponential number of possible partitions.
However, by building a graph with a source node for C_1, a sink node for C_2, individual scores as capacities on source/sink edges, and association scores as capacities on sentence-sentence edges, one can use maximum-flow algorithms with polynomial asymptotic running times to compute the minimum-cost cut exactly.
36
Possible choices for the association score:
assoc(s_i, s_j) = c * f(j - i) if (j - i) < T (a threshold), and 0 otherwise,
where f is a decreasing function, e.g. f(d) = 1, f(d) = e^(1-d), f(d) = 1/d^2, etc.
Choices for the individual score:
The default polarity classifier (discussed earlier), trained on a dataset of subjective + objective sentences
Our suggestion: use the extraction-pattern-based subjectivity detector discussed earlier
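A minimal sketch of solving the cut-based formulation with a max-flow library (networkx here, an assumption; the slides do not prescribe an implementation). Sentences attach to a source (subjective) and sink (objective) with their individual scores as capacities, and to each other with their association scores.

```python
import networkx as nx

def min_cut_labels(ind_subj, ind_obj, assoc):
    """ind_subj, ind_obj: per-sentence individual scores; assoc: dict (i, k) -> association score."""
    G = nx.DiGraph()
    n = len(ind_subj)
    for i in range(n):
        G.add_edge("src", i, capacity=ind_subj[i])   # paid if sentence i ends up labeled objective
        G.add_edge(i, "sink", capacity=ind_obj[i])   # paid if sentence i ends up labeled subjective
    for (i, k), a in assoc.items():                  # association edges, both directions
        G.add_edge(i, k, capacity=a)
        G.add_edge(k, i, capacity=a)
    _, (subj_side, _) = nx.minimum_cut(G, "src", "sink")
    return ["subjective" if i in subj_side else "objective" for i in range(n)]

# Sentence 1 individually prefers "objective" (0.4 vs 0.6) but its strong tie to
# sentence 0 pulls it to the subjective side of the minimum cut.
print(min_cut_labels([0.9, 0.4, 0.1], [0.1, 0.6, 0.9], {(0, 1): 0.5, (1, 2): 0.1}))
```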
37
Why subjective sentences matter the most
38
1. How the extracts score over a full review
2. Why context matters in subjectivity detection too
39
5.4 A New Approach: A. Agarwal, P. Bhattacharyya
Two salient points:
1. Using WordNet synonymy graphs to determine the weight of an adjective
2. Using the minimum-cut technique to exploit the relationship/similarity between documents
40
An SVM is employed as the polarity classifier.
The evaluative strength of an adjective is determined using the WordNet synonymy graph.
d(w_i, w_j) = distance between the two words on the synonymy graph.
The resulting weights lie in the range [-1, 1] and are used in place of the standard binary values in the SVM feature vectors.
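A heavily hedged sketch of distance-based adjective weights: the synonymy graph is a tiny hand-built stand-in for WordNet, and the normalization (d_neg - d_pos) / (d_neg + d_pos) is only an assumed way to keep the weight in [-1, 1]; the paper's exact scheme may differ.

```python
import networkx as nx

# Hand-built toy synonymy graph standing in for WordNet.
syn = nx.Graph()
syn.add_edges_from([
    ("good", "great"), ("great", "brilliant"),
    ("bad", "awful"), ("awful", "terrible"),
    ("good", "fine"), ("fine", "mediocre"), ("mediocre", "bad"),
])

def adjective_weight(w, pos_seed="good", neg_seed="bad"):
    """Assumed normalization: closer to 'good' -> weight near +1, closer to 'bad' -> near -1."""
    d_pos = nx.shortest_path_length(syn, w, pos_seed)
    d_neg = nx.shortest_path_length(syn, w, neg_seed)
    return (d_neg - d_pos) / (d_neg + d_pos)

print(adjective_weight("brilliant"), adjective_weight("terrible"))   # ~0.43, ~-0.43
```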
41
The similarity between sentences' subjectivity statuses was exploited via the association score assoc(s_i, s_j).
A similar relationship exists between documents that should receive the same polarity.
Assign a mutual similarity coefficient (MSC) to each pair of documents, defined in terms of:
f_k: the k-th feature
F_i(f_k): a function that takes the value 1 if the k-th feature is present in the i-th document and 0 otherwise
42
s_max: the largest number of common features between any two documents
s_min: the smallest number of common features between any two documents
Documents are no longer classified in isolation: the minimum-cut technique is applied using the MSC (as the association score between documents) and the individual score of each document.
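A sketch of computing a mutual similarity coefficient under the assumption that it is the number of shared features between two documents, rescaled by s_min and s_max; the slide's exact formula is not reproduced here, so treat this as illustrative.

```python
def msc(doc_i, doc_j, s_min, s_max):
    """doc_i, doc_j: sets of features present in each document (F_i(f_k) = 1 iff f_k in doc_i)."""
    shared = len(doc_i & doc_j)                 # number of features common to both documents
    return (shared - s_min) / (s_max - s_min)   # assumed rescaling to [0, 1]

docs = [{"brilliant", "plot"}, {"brilliant", "stinks"}, {"boring", "stinks"}]
pair_counts = [len(a & b) for a in docs for b in docs if a is not b]
s_min, s_max = min(pair_counts), max(pair_counts)
print(msc(docs[0], docs[1], s_min, s_max))      # -> 1.0 for this toy collection
```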
43
Schematic Representation
44
Conclusive Summary
45
Publicly Available Resources
An annotated list of datasets:
Blog06
Congressional floor-debate transcripts: http://www.cs.cornell.edu/home/llee/data/convote.htm
Cornell movie-review datasets: http://www.cs.cornell.edu/people/pabo/movie-review-data/
Customer review datasets: http://www.cs.uic.edu/~liub/FBS/CustomerReviewData.zip
Economining: http://economining.stern.nyu.edu/datasets.html
French sentences: http://www.psor.ucl.ac.be/personal/yb/Resource.html
MPQA Corpus: http://www.cs.pitt.edu/mpqa/databaserelease/
Multiple-aspect restaurant reviews: http://people.csail.mit.edu/bsnyder/naacl07
46
Contd.
Multi-Domain Sentiment Dataset: http://www.cis.upenn.edu/~mdredze/datasets/sentiment/
Review-search results sets: http://www.cs.cornell.edu/home/llee/data/search-subj.html
List of other useful resources:
General Inquirer: http://www.wjh.harvard.edu/~inquirer/
NTU Sentiment Dictionary [registration required]: http://nlg18.csie.ntu.edu.tw:8080/opinion/userform.js
OpinionFinder's Subjectivity Lexicon: http://www.cs.pitt.edu/mpqa/
SentiWordnet: http://sentiwordnet.isti.cnr.it/
Taboada and Grieve's Turney adjective list [available through the Yahoo! sentimentAI group]
47
References
B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.
E. Riloff and J. Wiebe, "Learning extraction patterns for subjective expressions," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2003.
B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," in Proceedings of the Association for Computational Linguistics (ACL), pp. 271–278, 2004.
A. Agarwal and P. Bhattacharyya, "Sentiment analysis: A new approach for effective use of linguistic knowledge and exploiting similarities in a set of documents to be classified," in Proceedings of the International Conference on Natural Language Processing (ICON 2005), IIT Kanpur, India, December 2005.