1
Automatic Sentiment Analysis in On-line Text
Erik Boiy, Pieter Hens, Koen Deschacht, Marie-Francine Moens
CS & ICRI, Katholieke Universiteit Leuven
2
Introduction
Goal: determine the sentiment of a person towards a topic
Practical use
- Customer feedback
- Marketing research
- Monitoring newsgroups and forums (flame detection)
- Augmentation of search engines (e.g. Opinmind.com)
Opportunity
- Blogs
- Forums
- Review sites
Noisy texts
3
Overview
- Introduction
- Emotions
- Machine learning (ML) techniques
- Challenges
- Experiments, results & discussion
- Conclusion & future work
4
Concepts of emotions
“Sentiments are either emotions, or they are judgements or ideas prompted or coloured by emotions”
An emotion
- Is usually caused by a person consciously or unconsciously evaluating an event, which is called appraisal in psychology
- Gives priority to one or a few kinds of actions, to which it lends a sense of urgency
5
Emotions in written text
- Appraisal (evaluation), e.g. It was an amazing show.
- Direct expressions, e.g. I am delighted of the final results.
- Elements of actions, e.g. I was grinning the whole way through it and laughing out loud more than once.
6
Overview
- Introduction
- Emotions
- Machine learning (ML) techniques
- Challenges
- Experiments, results & discussion
- Conclusion & future work
7
ML: Document representation (1)
Feature extraction
- Features are used to represent a document as a vector
- Values in the vector indicate the frequency or presence of the feature at the corresponding index in a dictionary
- The dictionary consists of all features encountered in the training documents
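The vector representation described on this slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the whitespace tokenizer, the function names and the toy documents are not from the presentation, and features that were never seen in training are simply ignored.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_dictionary(training_docs):
    """Map every feature seen in the training documents to a vector index."""
    dictionary = {}
    for doc in training_docs:
        for token in tokenize(doc):
            if token not in dictionary:
                dictionary[token] = len(dictionary)
    return dictionary


def vectorize(doc, dictionary, binary=False):
    """Return a frequency vector, or a presence (0/1) vector if binary=True."""
    counts = Counter(tokenize(doc))
    vector = [0] * len(dictionary)
    for token, count in counts.items():
        index = dictionary.get(token)
        if index is not None:  # features unseen in training are ignored
            vector[index] = 1 if binary else count
    return vector


# Toy training documents, invented for illustration.
train = ["a great great show", "not worth it"]
dictionary = build_dictionary(train)
freq = vectorize("great show great fun", dictionary)
pres = vectorize("great show great fun", dictionary, binary=True)
```

Here `freq` holds counts and `pres` holds 0/1 flags over the same dictionary, which corresponds to the "frequency or presence" choice mentioned above.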
8
ML: Document representation (2)
- Unigrams: all words
- N-grams: all sequences of N successive words (N = 1: unigrams, N = 2: bigrams, N = 3: trigrams), e.g. the bigrams I love, not worth, returned it
- Lemmas: basic dictionary form of all words, e.g. cars -> car, was -> be, better -> good
- Opinion words: use only words from a pre-defined list as features
- Adjectives: use only adjectives (about 7.5% of the text)
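As a small illustration of the N-gram features, the sketch below extracts all runs of N successive tokens from a tokenized sentence. The helper name `ngrams` and the example sentence are assumptions made for this example, not part of the presentation.

```python
def ngrams(tokens, n):
    """Return all sequences of n successive tokens, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Example sentence invented for illustration.
tokens = "not worth the money".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

A sentence of k tokens yields k - N + 1 N-grams, so the feature dictionary tends to grow quickly as N increases.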
9
ML: Document representation (3)
- Stopword removal, using a list of determiners, prepositions, possessive pronouns, ...
- Negation tagging of each word following a negation until the first punctuation mark, e.g. I don't like this movie. -> I don't NOT_like NOT_this NOT_movie.
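The two preprocessing steps on this slide might look roughly as follows. This is a sketch only: the `NEGATIONS` and `STOPWORDS` lists, the regular-expression tokenizer and the function names are illustrative assumptions, and the presentation does not say in which order the two steps were applied.

```python
import re

# Illustrative word lists; the lists actually used are not given in the slides.
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "didn't", "isn't", "wasn't"}
STOPWORDS = {"the", "a", "an", "this", "of", "in", "my"}
PUNCTUATION = set(".,!?;:")


def negation_tag(text):
    """Prefix NOT_ to every word after a negation, up to the next punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,!?;:]", text)
    tagged = []
    negated = False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False  # punctuation closes the negation scope
            tagged.append(token)
        elif negated:
            tagged.append("NOT_" + token)
        else:
            tagged.append(token)
            if token.lower() in NEGATIONS:
                negated = True
    return tagged


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


tagged = negation_tag("I don't like this movie.")
filtered = remove_stopwords(["I", "like", "this", "movie"])
```

With these assumptions, `negation_tag` reproduces the tagging shown in the slide's example sentence.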
10
ML: Techniques
Classifiers successful for text classification
- Support Vector Machines (SVM)
- Naive Bayes Multinomial (NBM)
- Maximum Entropy (Maxent)
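To make one of these classifiers concrete, here is a minimal pure-Python sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is a textbook formulation, not the authors' implementation; the class name, the smoothing parameter `alpha` and the toy training data are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Textbook multinomial Naive Bayes over token lists (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in zip(docs, labels):
            self.feature_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                if token in self.vocabulary:  # skip features unseen in training
                    score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration.
train_docs = [["amazing", "show"], ["great", "acting"],
              ["boring", "plot"], ["not", "worth", "it"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = MultinomialNB().fit(train_docs, train_labels)
prediction = model.predict(["amazing", "acting"])
```

Training only requires counting features per class, which is in line with the speed advantage reported for NBM later in the presentation.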
11
Challenges (1)
Topic-sentiment relation
- e.g. Competing with the vastly superior Casino Royale for the same action-movie audience, Deja Vu will likely be brushed aside and quickly forgotten.
- e.g. A Good Year is a well-acted well-written well-directed movie but it just wasnt my cup of tea.
Topic-neutral text
- e.g. In the movie Bond can start to untangle a terror network if he wins this big poker game at Casino Royale in Montenegro.
12
Challenges (2)
Cross-domain classification
- Training (and testing) was done on a mixture of movie and car reviews
Text quality
- e.g. Nothing but a French kiss-off Search Recent Archives Web for (rm) else • • • • • • • • • • • • • • • • ONLINE EXTRAS SITE SERVICES Movie Listings Friday Nov 10 2006 Posted on Fri Nov. 10 2006 MOVIE REVIEW A Good Year a flat bouquet Nothing but a French kiss- off Gladiator collaborators seem defeated by light-weight love story.By ROBERT W.
13
Overview
- Introduction
- Emotions
- Machine learning (ML) techniques
- Challenges
- Experiments, results & discussion
- Conclusion & future work
14
Corpora
Pang and Lee's movie review corpus
- 1000 positive and 1000 negative reviews
- Reviews mix objective and subjective information
- Often used in the literature
Our blog corpus
- 759 positive, 205 negative and 3527 neutral sentences
- Gathered from blogs, discussion boards and other websites
- Extended with reviews from the Customer Review Datasets corpus by Hu and Liu to balance the positive and negative classes
15
Evaluation measures
- Accuracy
- Precision
- Recall
Other
- Speed
- Available resources
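The formulas for precision and recall did not survive in this transcript. The sketch below therefore uses the standard definitions, which may differ in notation from the original slide: accuracy is the fraction of correctly classified items, precision for a class is the fraction of items predicted as that class that truly belong to it, and recall is the fraction of items of that class that were found. The function name and the toy labels are assumptions made for this example.

```python
def evaluate(gold, predicted, target):
    """Return (accuracy, precision, recall), the latter two for the class `target`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == target and g == target)  # true positives
    fp = sum(1 for g, p in pairs if p == target and g != target)  # false positives
    fn = sum(1 for g, p in pairs if p != target and g == target)  # false negatives
    correct = sum(1 for g, p in pairs if g == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels, invented for illustration.
gold = ["pos", "pos", "neg", "neu", "neu", "neg"]
pred = ["pos", "neu", "neg", "neu", "pos", "neg"]
accuracy, precision, recall = evaluate(gold, pred, target="pos")
```

Per-class precision and recall are especially informative for a corpus with a large neutral class, where accuracy alone can look good even if few positive or negative sentences are found.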
16
Results (1)
Pang and Lee's movie review corpus
N-grams
+ easy to extract
+ require no special tools
− large feature vector size
NBM
+ fast
17
Results (2)
Our blog corpus
- The baseline approach: uses basic ML techniques as described earlier
- Our latest approach: achieves considerable improvements over the baseline
18
Conclusion & future work
- Detection of the topic-sentiment relation is far from perfect
- Dirty texts make the task even more difficult
- Lack of training examples