Automatic Sentiment Analysis in On-line Text
Erik Boiy, Pieter Hens, Koen Deschacht, Marie-Francine Moens
CS & ICRI, Katholieke Universiteit Leuven

Introduction
Goal: determine the sentiment of a person towards a topic
Practical use
- Customer feedback
- Marketing research
- Monitoring newsgroups and forums (flame detection)
- Augmentation of search engines (e.g. Opinmind.com)
Opportunity
- Blogs
- Forums
- Review sites
Noisy texts

Overview
- Introduction
-> Emotions
- Machine learning (ML) techniques
- Challenges
- Experiments, results & discussion
- Conclusion & future work

Concepts of emotions
“Sentiments are either emotions, or they are judgements or ideas prompted or coloured by emotions”
An emotion
- Is usually caused by a person consciously or unconsciously evaluating an event, which is called appraisal in psychology
- Gives priority to one or a few kinds of action, to which it lends a sense of urgency

Emotions in written text
Appraisal: evaluation
- e.g. It was an amazing show.
Direct expressions
- e.g. I am delighted with the final results.
Elements of actions
- e.g. I was grinning the whole way through it and laughing out loud more than once.

Overview
- Introduction
- Emotions
-> Machine learning (ML) techniques
- Challenges
- Experiments, results & discussion
- Conclusion & future work

ML: Document representation (1)
Feature extraction
- Features are used to represent a document as a vector
- Values in the vector indicate frequency or presence of the feature at the corresponding index in a dictionary
- The dictionary consists of all features encountered in the training documents
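The feature extraction step on this slide can be illustrated with a minimal sketch. It assumes lowercase whitespace tokens as features; the function names `build_dictionary` and `vectorize` and the toy sentences are hypothetical and do not come from the presentation.

```python
from collections import Counter

def build_dictionary(training_docs):
    """Collect every feature (here: lowercase whitespace token) seen in training."""
    features = sorted({tok for doc in training_docs for tok in doc.lower().split()})
    return {feat: idx for idx, feat in enumerate(features)}

def vectorize(doc, dictionary, binary=False):
    """Map a document to per-index counts, or to 0/1 presence if binary=True."""
    counts = Counter(tok for tok in doc.lower().split() if tok in dictionary)
    vec = [0] * len(dictionary)
    for feat, n in counts.items():
        vec[dictionary[feat]] = 1 if binary else n
    return vec

# Toy training set (made up for illustration).
train = ["great great show", "boring show"]
dictionary = build_dictionary(train)

# "cast" never occurred in training, so it has no index and is ignored.
freq_vec = vectorize("great show great cast", dictionary)
pres_vec = vectorize("great show great cast", dictionary, binary=True)
```

Whether frequency or presence works better is an empirical question; the sketch only shows that both come from the same dictionary lookup.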

ML: Document representation (2)
Unigrams: all words
N-grams: all sequences of N successive words
- N = 1: unigrams, N = 2: bigrams, N = 3: trigrams
- e.g. bigrams: I love, not worth, returned it
Lemmas: basic dictionary form of all words
- e.g. cars -> car, was -> be, better -> good
Opinion words: use only words from a pre-defined list as features
Adjectives: use only adjectives (about 7.5% of the text)
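As a small sketch of the n-gram features listed on this slide, the helper below slides a window of N tokens over a tokenized sentence. The name `ngrams` and the example sentence are assumptions made for illustration.

```python
def ngrams(tokens, n):
    """Return all runs of n successive tokens, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical example sentence, split on whitespace.
tokens = "it was not worth it".split()

unigrams = ngrams(tokens, 1)  # N = 1
bigrams = ngrams(tokens, 2)   # N = 2
trigrams = ngrams(tokens, 3)  # N = 3
```

A sentence shorter than N yields an empty list, so no special case is needed for short inputs.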

ML: Document representation (3)
Stopword removal
- Words from a list of determiners, prepositions, possessive pronouns, ...
Negation tagging
- Applied to each word following a negation, up to the first punctuation mark
- e.g. I don't like this movie. -> I don't NOT_like NOT_this NOT_movie.
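The two preprocessing steps on this slide can be sketched as follows. The negation cues (`not`, `no`, `never`, and any token ending in `n't`), the punctuation set, and the short stopword list are assumptions chosen for illustration; the slides do not say which lists were used or in which order the steps were applied.

```python
import re

NEGATIONS = {"not", "no", "never"}          # assumed negation cues
STOPWORDS = {"the", "a", "an", "of", "in", "my", "this"}  # illustrative list only
PUNCTUATION = {".", ",", ";", ":", "!", "?"}

def tag_negation(text):
    """Prefix NOT_ to every word after a negation, up to the next punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,;:!?]", text)
    tagged, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False          # punctuation closes the negation scope
            tagged.append(tok)
        elif negated:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
            if tok.lower() in NEGATIONS or tok.lower().endswith("n't"):
                negated = True       # following words fall inside the scope
    return tagged

def remove_stopwords(tokens):
    """Drop every token that appears in the stopword list (case-insensitive)."""
    return [tok for tok in tokens if tok.lower() not in STOPWORDS]

tagged = tag_negation("I don't like this movie.")
filtered = remove_stopwords(["the", "plot", "of", "this", "movie"])
```

On the slide's own example, `tagged` holds the tokens `I`, `don't`, `NOT_like`, `NOT_this`, `NOT_movie`, followed by the closing period as a separate token.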

ML: Techniques
Classifiers successful for text classification
- Support Vector Machines (SVM)
- Naive Bayes Multinomial (NBM)
- Maximum Entropy (Maxent)
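To make one of the listed classifiers concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is a textbook formulation, not the authors' implementation; the function names, the smoothing choice, and the four toy training documents are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_nbm(docs, labels):
    """Estimate log priors and add-one smoothed log likelihoods from token lists."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(token_counts[label].values())
        log_prior = math.log(n_docs / len(docs))
        log_like = {
            tok: math.log((token_counts[label][tok] + 1) / (total + len(vocab)))
            for tok in vocab
        }
        model[label] = (log_prior, log_like)
    return model

def classify(model, doc):
    """Return the class with the highest log score; unseen tokens are skipped."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[tok] for tok in doc if tok in log_like)
    return max(model, key=score)

# Toy training data (made up for illustration).
train_docs = [["amazing", "show"], ["great", "acting"],
              ["boring", "plot"], ["awful", "show"]]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_nbm(train_docs, train_labels)
prediction = classify(model, ["amazing", "acting"])
```

Training only counts tokens per class and classification only sums log probabilities, so both steps are linear in the amount of text.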

Challenges (1)
Topic-sentiment relation
- e.g. Competing with the vastly superior Casino Royale for the same action-movie audience, Deja Vu will likely be brushed aside and quickly forgotten.
- e.g. A Good Year is a well-acted well-written well-directed movie but it just wasnt my cup of tea.
Topic-neutral text
- e.g. In the movie Bond can start to untangle a terror network if he wins this big poker game at Casino Royale in Montenegro.

Challenges (2)
Cross-domain classification
- Training (and testing) was done on a mixture of movie and car reviews
Text quality
- e.g. Nothing but a French kiss-off Search Recent Archives Web for (rm) else • • • • • • • • • • • • • • • • ONLINE EXTRAS SITE SERVICES Movie Listings Friday Nov 10 2006 Posted on Fri Nov. 10 2006 MOVIE REVIEW A Good Year a flat bouquet Nothing but a French kiss- off Gladiator collaborators seem defeated by light-weight love story.By ROBERT W.

Overview
- Introduction
- Emotions
- Machine learning (ML) techniques
- Challenges
-> Experiments, results & discussion
- Conclusion & future work

Corpora
Pang and Lee's movie review corpus
- 1000 positive and 1000 negative reviews
- Reviews mix objective and subjective information
- Often used in the literature
Our blog corpus
- 759 positive, 205 negative and 3527 neutral sentences
- Gathered from blogs, discussion boards and other websites
- Extended with reviews from the Customer Review Datasets corpus by Hu and Liu to balance positive and negative

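Evaluation measures
Accuracy: proportion of all examples that are classified correctly
Precision: proportion of the examples assigned to a class that actually belong to it
Recall: proportion of the examples belonging to a class that are assigned to it
Other
- Speed
- Available resources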
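The three measures can be computed from gold and predicted labels as in the sketch below. It uses the standard per-class definitions of precision and recall; the function name `evaluate`, the convention of returning 0.0 when a denominator is zero, and the toy labels are assumptions.

```python
def evaluate(gold, predicted, target):
    """Return (accuracy, precision, recall), the latter two for the class `target`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == target and g == target)  # true positives
    fp = sum(1 for g, p in pairs if p == target and g != target)  # false positives
    fn = sum(1 for g, p in pairs if p != target and g == target)  # false negatives
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention for 0/0
    recall = tp / (tp + fn) if tp + fn else 0.0     # assumed convention for 0/0
    return accuracy, precision, recall

# Toy labels (made up for illustration): positive, negative and neutral.
gold = ["pos", "pos", "neg", "neu", "neg"]
pred = ["pos", "neg", "neg", "neu", "pos"]

acc, prec, rec = evaluate(gold, pred, "pos")
```

On a corpus dominated by one class, accuracy can look high even if a minority class is rarely recovered, so per-class precision and recall give a more informative picture there.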

Results (1)
Pang and Lee's movie review corpus
N-grams
+ easy to extract
+ require no special tools
− large feature vector size
NBM
+ fast

Results (2)
Our blog corpus
The baseline approach: uses basic ML techniques as described earlier
Our latest approach: achieves considerable improvements over the baseline

Conclusion & future work
- Detection of the topic-sentiment relation is far from perfect
- Dirty texts make the task even more difficult
- Lack of training examples
