Published by Bernard Brooks. Modified over 8 years ago.
Emotion Tagging – A Comparative Study on Bengali and English Blogs
Dipankar Das and Sivaji Bandyopadhyay
Department of Computer Science & Engineering, Jadavpur University, Kolkata-700032, India
ICON 2009, 14/12/2009

Outline
Motivation
Resources
Word Level Tagging
- Baseline Model
- Morphology
- CRF based Model
Sentence Level Tagging
Evaluation
Conclusion

Motivation (1/3)
In psychology and common use, emotion is an aspect of a person's mental state of being, normally based in or tied to the person's internal (physical) and external (social) sensory feeling (Zhang et al., 2008).

Motivation (2/3)
Natural Language Processing (NLP) tasks
- Tracking users' emotion (products, events, politics)
- Customer relationship management
- Question Answering (QA) systems
- Modern Information Retrieval (IR) systems

Motivation (3/3)
Blogs
- Communicative and informative repository of text-based emotional content in the Web 2.0 (Lin et al., 2007)
- Online diary of the bloggers
- Blog posts annotated by other bloggers
- Large data suitable for machine learning
Recognition of emotion from written text

Resources (1/4)
Bengali blog
- Web blog archive (www.amarblog.com)
- 14 different comic-related topics and user comments
- 1200 sentences
English blog
- Saima Aman and Stan Szpakowicz. 2007. Identifying Expressions of Emotion in Text. In V. Matoušek and P. Mautner (Eds.): TSD 2007, LNAI 4629, pp. 196–205
- 1200 sentences

Resources (2/4)
English sentiment lexicon
- SentiWordNet (Esuli et al., 2006)
- WordNet Affect lists (WAL) (Strapparava et al., 2004)
Updating of WAL
- Inadequate number of emotion word entries
- Synsets retrieved from English SentiWordNet
- WAL updated with these synsets

Resources (3/4)
No sentiment lexicon in Bengali
Translation of both SentiWordNet and the WordNet Affect lists into Bengali
- Using Bengali synsets (English to Bengali bilingual synset dictionary being developed as part of the English to Indian Languages Machine Translation (EILMT) project, a TDIL project undertaken by a consortium of different premier institutes and sponsored by MCIT, Govt. of India)
WAL (termed the Emotion List)

Resources (4/4)
A knowledge base for emoticons

Word Level Tagging
Semi-automatic annotation
- An emotion tag is assigned to a word with the help of the Emotion List
- Other, non-emotional words are tagged as neutral
- Stemming process
- Verified by linguists
700 sentences for training; 300 and 200 sentences as development and test sets

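The annotation step above can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: the entries in `EMOTION_LIST`, the `annotate` function, and the pluggable `stem` argument are all hypothetical names introduced here, and the output would still need the linguist verification the slide mentions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy Emotion List (word -> emotion tag); the real list is
# derived from the WordNet Affect lists and is much larger.
EMOTION_LIST: Dict[str, str] = {
    "happy": "happy",
    "cry": "sad",
    "afraid": "fear",
}


def annotate(
    tokens: List[str],
    emotion_list: Dict[str, str],
    stem: Callable[[str], str] = lambda w: w,
) -> List[Tuple[str, str]]:
    """Tag each token with its emotion tag, or 'neutral' if it is not listed.

    The token is lower-cased and passed through `stem` before the lookup.
    """
    return [
        (tok, emotion_list.get(stem(tok.lower()), "neutral")) for tok in tokens
    ]


tagged = annotate(["I", "am", "Happy", "today"], EMOTION_LIST)
```

Here `tagged` pairs every token with either an emotion tag or `neutral`, which is the form a human annotator would then check.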
Baseline Model
Identify word-level emotion tagging accuracies for each emotion class
No prior knowledge of word features is incorporated for any word
Six separate modules for the six emotion classes
Words are passed through the six separate modules
Each word is tagged with the emotion tag of the emotion class in which that word appears

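One possible reading of the six-module baseline is sketched below. The per-class word sets in `CLASS_WORDS` are hypothetical, and the "first matching module wins" rule is an assumption; the slides do not say how a word listed under more than one class is handled.

```python
from typing import Callable, Dict, List, Optional, Set

EMOTION_CLASSES = ["happy", "sad", "anger", "disgust", "fear", "surprise"]

# Hypothetical per-class word sets, for illustration only.
CLASS_WORDS: Dict[str, Set[str]] = {
    "happy": {"happy", "delight"},
    "sad": {"sad", "cry"},
    "anger": {"angry"},
    "disgust": {"disgust"},
    "fear": {"afraid"},
    "surprise": {"amazed"},
}

Module = Callable[[str], Optional[str]]


def make_module(emotion: str, words: Set[str]) -> Module:
    """Build one module that recognises words of a single emotion class."""
    def module(token: str) -> Optional[str]:
        return emotion if token.lower() in words else None
    return module


MODULES: List[Module] = [make_module(e, CLASS_WORDS[e]) for e in EMOTION_CLASSES]


def baseline_tag(token: str, modules: List[Module] = MODULES) -> str:
    """Pass the token through every module; assumed rule: first match wins."""
    for module in modules:
        tag = module(token)
        if tag is not None:
            return tag
    return "neutral"
```

Keeping one module per class makes it straightforward to report accuracy separately for each emotion class, which is what the baseline is used for.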
Morphology
Minimize errors in recognizing emotional words
Bengali, like other Indian languages, is morphologically very rich
Different suffixes (e.g. for verbs, the features are Tense, Aspect, and Person)
The stemmer uses a suffix list to identify the stem form
For English, the Porter stemmer (Porter, 1997)
3.65% and 6.03% improvement over the baseline system in average accuracy on the Bengali and English test sets

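A suffix-list stemmer of the kind described above might look like the sketch below. The suffix list, the longest-match-first rule, and the `min_stem_len` guard are assumptions made for illustration; English-like suffixes stand in for the Bengali suffix list, which is not given in the slides, and this is not the Porter algorithm.

```python
from typing import Iterable

# Illustrative suffixes only; the actual Bengali suffix list is not shown.
TOY_SUFFIXES = ["ing", "ed", "s"]


def strip_suffix(word: str, suffixes: Iterable[str], min_stem_len: int = 3) -> str:
    """Remove the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no suffix applies, so the word is its own stem


stem = strip_suffix("crying", TOY_SUFFIXES)
```

The stem returned here is what would be looked up in the Emotion List, so inflected forms of an emotion word can still be recognised.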
Baseline vs. Morphology (Result)

CRF based Model (1/4)
10 active features (Das and Bandyopadhyay, 2009a)
- POS information (adjective, verb, noun, adverb)
- First sentence in a topic
- SentiWordNet emotion word (delight…)
- Reduplication (so-so, good-good..)
- Question words (what, why…)
- Colloquial / foreign words
- Special punctuation symbols (!, @, ?..)
- Quoted sentence (“you are 2 good man”)
- Sentence length (>= 8, < 15)
- Emoticons
Different unigram and bigram context features (word level as well as POS tag level) and their combinations

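The feature list above can be turned into per-token feature dictionaries, the input form many CRF toolkits accept. The sketch below is an assumption-laden illustration: the function name, the feature keys, and the small word sets are hypothetical, the colloquial/foreign-word feature is omitted because it needs an external resource, and the length bounds follow the ">= 8, < 15" given in the feature list.

```python
from typing import Dict, List, Set, Tuple

QUESTION_WORDS = {"what", "why", "who", "when", "where", "how"}  # illustrative
SPECIAL_PUNCT = set("!@?")
EMOTICONS = {":)", ":(", ":D"}  # illustrative stand-in for the emoticon knowledge base


def token_features(
    sent: List[Tuple[str, str]],  # (word, POS tag) pairs for one sentence
    i: int,
    emotion_words: Set[str],
    first_in_topic: bool,
    quoted: bool,
) -> Dict[str, object]:
    """Build the feature dictionary for token i of the sentence."""
    word, pos = sent[i]
    lower = word.lower()
    parts = lower.split("-")
    feats: Dict[str, object] = {
        "word": lower,
        "pos": pos,
        "first_sentence_in_topic": first_in_topic,
        "emotion_word": lower in emotion_words,
        # Reduplication such as "so-so" or "good-good".
        "reduplication": len(parts) == 2 and parts[0] == parts[1] and parts[0] != "",
        "question_word": lower in QUESTION_WORDS,
        "special_punct": any(ch in SPECIAL_PUNCT for ch in word),
        "quoted_sentence": quoted,
        "sentence_length_8_to_14": 8 <= len(sent) < 15,
        "emoticon": word in EMOTICONS,
    }
    if i > 0:
        # Unigram and bigram context at word level and POS tag level.
        prev_word, prev_pos = sent[i - 1]
        feats["prev_word"] = prev_word.lower()
        feats["prev_pos"] = prev_pos
        feats["word_bigram"] = prev_word.lower() + "_" + lower
        feats["pos_bigram"] = prev_pos + "_" + pos
    else:
        feats["BOS"] = True  # beginning of sentence
    return feats


example = [("so-so", "JJ"), ("!", "SYM")]
first = token_features(example, 0, {"delight"}, first_in_topic=True, quoted=False)
second = token_features(example, 1, {"delight"}, first_in_topic=True, quoted=False)
```

A list of such dictionaries per sentence, paired with the word-level emotion tags, is what a CRF trainer would consume.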
CRF based Model (2/4)
Feature analysis
- Frequencies
- Combination of multiple features vs. single feature
- Some features play a passive role (e.g. "First sentence in a topic", a phenomenon specific to the English blog corpus) but are active for topic, user comment, or title sentences of the Bengali blog
- Special punctuation symbols (!, @, ? etc.), their frequencies and attachments give 3% and 6% improvement for Bengali and English
- Length of a sentence (> eight and < fifteen words per sentence)
- Each feature is added only if its inclusion along with the pre-selected features improves accuracy
- Accuracy improvement of 20.83% for Bengali and 24.33% for English over the baseline model

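The rule "add a feature only if it improves accuracy together with the pre-selected features" resembles greedy forward selection, sketched below. The `evaluate` callback is hypothetical: in practice it would train the CRF with the given features and return development-set accuracy. The toy scorer and its numbers are made up purely to exercise the loop and are not results from the study.

```python
from typing import Callable, Dict, List, Sequence


def forward_select(
    candidates: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> List[str]:
    """Keep each candidate feature only if it raises the score of the current set."""
    selected: List[str] = []
    best = evaluate(selected)
    for feat in candidates:
        score = evaluate(selected + [feat])
        if score > best:
            selected.append(feat)
            best = score
    return selected


# Toy stand-in for "train and measure accuracy"; the gains are invented.
TOY_GAINS: Dict[str, float] = {"pos": 0.1, "noise": -0.02, "punct": 0.05}


def toy_evaluate(features: List[str]) -> float:
    return 0.5 + sum(TOY_GAINS[f] for f in features)


chosen = forward_select(["pos", "noise", "punct"], toy_evaluate)
```

Note that the result of this greedy procedure depends on the order in which candidate features are tried.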
CRF based Model (3/4)

CRF based Model (4/4)

Sentence Level Tagging (1/2)
Sense_Tag_Weight (STW)
- Select the six basic words “happy”, “sad”, “anger”, “disgust”, “fear” and “surprise” as seed words for the six emotions
- Retrieve the positive and negative scores from English SentiWordNet for each synset in which each seed word appears
- The average retrieved score is fixed as the Sense_Tag_Weight (STW) of that particular emotion tag

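The averaging step can be sketched as below. The slides do not state how the positive and negative scores of a synset are combined before averaging, so the `combine` parameter (defaulting to the larger of the two) is an assumption, and the score pairs used are hypothetical placeholders rather than real SentiWordNet values.

```python
from typing import Callable, List, Tuple


def sense_tag_weight(
    synset_scores: List[Tuple[float, float]],  # (positive, negative) per synset
    combine: Callable[[float, float], float] = max,  # assumed combination rule
) -> float:
    """Average the combined score over all synsets containing the seed word."""
    if not synset_scores:
        return 0.0
    return sum(combine(p, n) for p, n in synset_scores) / len(synset_scores)


# Hypothetical score pairs for the synsets of one seed word.
stw_example = sense_tag_weight([(0.75, 0.0), (0.5, 0.25)])
```

Running this once per seed word would give one STW value for each of the six emotion tags.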
Sentence Level Tagging (2/2)
Sense_Weight_Score (SWS) for each emotion tag
- $SWS_i = \dfrac{STW_i \cdot N_i}{\sum_{j=1}^{7} STW_j \cdot N_j}$, for $i \in \{1, \dots, 7\}$
- $SWS_i$ is the sentence-level Sense_Weight_Score for emotion tag $i$
- $N_i$ is the number of occurrences of that emotion tag in the sentence
- Sentence-level emotion tag: $SET = \arg\max_{i = 1, \dots, 7} SWS_i$
- A sentence is of neutral type if $SWS_i$ is zero for all emotion tags $i$
Post-processing for handling negative words (Das and Bandyopadhyay, 2009b)

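The scoring rule above can be written out as a short sketch. The function names and the STW values in the example are hypothetical, the code scores whichever tags appear in the `stw` dictionary rather than a fixed set of seven, and the negation post-processing is not covered.

```python
from collections import Counter
from typing import Dict, List


def sense_weight_scores(tag_counts: Dict[str, int], stw: Dict[str, float]) -> Dict[str, float]:
    """SWS_i = STW_i * N_i / sum_j(STW_j * N_j); all zeros if the sum is zero."""
    total = sum(stw[t] * tag_counts.get(t, 0) for t in stw)
    if total == 0:
        return {t: 0.0 for t in stw}
    return {t: stw[t] * tag_counts.get(t, 0) / total for t in stw}


def sentence_emotion_tag(word_tags: List[str], stw: Dict[str, float]) -> str:
    """Return the tag with the highest SWS, or 'neutral' if every score is zero."""
    counts = Counter(t for t in word_tags if t in stw)
    scores = sense_weight_scores(counts, stw)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


# Hypothetical STW values for two emotion tags.
TOY_STW = {"happy": 0.5, "sad": 0.25}
tag = sentence_emotion_tag(["neutral", "happy", "sad", "sad", "sad"], TOY_STW)
```

In this example the three "sad" words outweigh the single "happy" word even though "happy" has the larger weight, because the score multiplies the weight by the tag count.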
Evaluation (1/2)
Accuracies
- Computed by counting the number of sentences whose system-assigned emotion tag matches the emotion tag of their emotion class

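The counting described above amounts to a plain accuracy computation, sketched here under the assumption that accuracy is the matched count divided by the total number of sentences. The function name and the toy labels are hypothetical and are not data from the study.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of sentences whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy labels for four sentences.
acc = accuracy(["happy", "sad", "fear", "sad"], ["happy", "sad", "sad", "sad"])
```

Restricting both sequences to the sentences of one emotion class gives the per-class accuracy in the same way.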
Evaluation (2/2)
Loss in accuracies
- Frequent use of metaphoric words in blogs
Bengali blogs were collected from comic articles
Emotions such as “happy”, “sad”, and “surprise” are present in sufficient numbers in the blog corpus
An adequate number of training examples for a particular emotion tag improves the accuracy of that tag

Conclusion
Handling of metaphors
Phrase level analysis concerning genre of corpus
Document level emotion identification
More emotion annotated data
- To improve the performance
- Suitable for machine learning approach

Thank you

Questions?