Case Study: Sentiment Analysis
What is Sentiment Analysis?
Introduction Resources Evaluation Conclusions
Context: the rise of online commerce; the growth of user-generated content (forums, discussion groups, blogs, social media, review websites, aggregation sites); the large volume, high velocity and vast variety of unstructured data; customer empowerment; the increasing impact of online word-of-mouth.

Online commerce, user-generated content and online word-of-mouth are all preconditions that make it necessary for companies to automatically extract, analyse and summarise not only factual information, but also the opinions and emotions freely expressed on the web. Appropriate management of online corporate reputation requires careful monitoring of these new digital environments, which strengthen the influence of stakeholders and support decision-making processes.
About the Sentiment Analysis
Sentiment analysis is the computational treatment of opinions, sentiments and emotions freely expressed in texts. It is also called opinion mining, subjectivity analysis, or appraisal extraction. Specific research challenges include: sentiment and subjectivity classification; feature-based sentiment analysis; sentiment analysis of comparative sentences; opinion search and retrieval; opinion spam detection; and the extraction of opinion holders, opinion features and opinion targets. For all these reasons, software is required that can transform unstructured texts into structured data that can be stored and queried in database tables. In the present work we focus on sentiment polarity classification, in particular on document-level sentiment classification: classifying an opinionated document as expressing a positive or negative opinion on an object.
About an opinion: an opinion can be represented as a quintuple (o_j, f_jk, oo_ijkl, h_i, t_l), where
o_j is the object about which the opinion is expressed;
f_jk represents the feature(s) of the object;
oo_ijkl is the (positive or negative) opinion orientation;
h_i is the opinion holder who expresses the opinion;
t_l is the time when the opinion is expressed.
Opinions can be a positive or negative view, attitude, or appraisal about a topic, stated by an opinion holder. Because the time is structured information, we did not take it into account in this work.
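The quintuple above can be sketched as a small record type; this is a minimal illustration, not a prescribed encoding, and all field names and the example values are hypothetical.

```python
from dataclasses import dataclass

# An opinion as the quintuple (o_j, f_jk, oo_ijkl, h_i, t_l) from the slide.
@dataclass
class Opinion:
    obj: str          # o_j: the object the opinion is about
    feature: str      # f_jk: the feature of the object being evaluated
    orientation: int  # oo_ijkl: +1 for positive, -1 for negative
    holder: str       # h_i: who expresses the opinion
    time: str         # t_l: when the opinion is expressed

# Illustrative instance (values invented for the example).
op = Opinion("iPhone", "battery", -1, "user42", "2013-05-01")
```

As the slide notes, the time slot is structured data and can simply be ignored by a classifier that works on the other four slots.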
Positive or negative movie review?
unbelievably disappointing
Full of zany characters and richly applied satire, and some great plot twists
this is the greatest screwball comedy ever filmed
It was pathetic. The worst part about it was the boxing scenes.
Positive or negative camera review?
Positive or negative printer review?
Sentiment analysis has many other names
Opinion extraction
Opinion mining
Sentiment mining
Subjectivity analysis
Why sentiment analysis?
Movie: is this review positive or negative?
Products: what do people think about the new iPhone?
Public sentiment: how is consumer confidence? Is despair increasing?
Politics: what do people think about this candidate or issue?
Prediction: predict election outcomes or market trends from sentiment
Scherer Typology of Affective States
Emotion: brief organically synchronized … evaluation of a major event (angry, sad, joyful, fearful, ashamed, proud, elated)
Mood: diffuse non-caused low-intensity long-duration change in subjective feeling (cheerful, gloomy, irritable, listless, depressed, buoyant)
Interpersonal stances: affective stance toward another person in a specific interaction (friendly, flirtatious, distant, cold, warm, supportive, contemptuous)
Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons (liking, loving, hating, valuing, desiring)
Personality traits: stable personality dispositions and typical behavior tendencies (nervous, anxious, reckless, morose, hostile, jealous)
Scherer, Klaus R. Emotion as a Multicomponent Process: A model and some cross-cultural data. In P. Shaver, ed., Review of Personality and Social Psych 5:
Sentiment analysis is the detection of attitudes
Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons (liking, loving, hating, valuing, desiring)
Sentiment analysis is the detection of attitudes
Holder (source) of attitude
Target (aspect) of attitude
Type of attitude: from a set of types (like, love, hate, value, desire, etc.), or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
Text containing the attitude: a sentence or an entire document
Sentiment analysis is the detection of attitudes
Simplest task: is the attitude of this text positive or negative?
More complex: rank the attitude of this text from 1 to 5
Advanced: detect the target, source, or complex attitude types
Problems: What makes reviews hard to classify?
Subtlety. A perfume review in Perfumes: The Guide: "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut." Dorothy Parker on Katharine Hepburn: "She runs the gamut of emotions from A to B."
Sentiment Polarity Classification
Task: classification of opinionated documents as expressing a positive or negative opinion about an object. The whole document is the basic information unit, and its Semantic Orientation (SO) is calculated on that basis.
Method: a lexicon-based approach, based on the assumption that the text's Semantic Orientation comes from the Semantic Orientations of the words and phrases it contains. Grounding this task on a lexicon means hypothesising that the polarities of opinion words, or rather of the sentences and phrases in which they occur, can be taken as indicators of the polarity of the document that contains them.
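The lexicon-based hypothesis above can be sketched in a few lines: sum the prior polarities of the opinion words a document contains and read the sign as its Semantic Orientation. This is a minimal sketch; the toy lexicon values are illustrative, not taken from the resources described later.

```python
# Toy prior-polarity lexicon (illustrative values, not SentIta's).
LEXICON = {"great": 2, "good": 1, "bad": -1, "awful": -2}

def semantic_orientation(text):
    # Document SO = sum of the prior polarities of its opinion words.
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

def classify(text):
    # Positive SO -> positive document, negative SO -> negative document.
    so = semantic_orientation(text)
    return "positive" if so > 0 else "negative" if so < 0 else "neutral"
```

A real system, as the following slides show, must also handle context (negation, intensification, modality), which is why the simple sum is only a starting point.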
Sentiment Lexicons
Lexicon-based Approaches
Methods to automatically build a dictionary: Latent Semantic Analysis [Landauer and Dumais, 1997]; bootstrapping algorithms [Riloff, Wiebe and Wilson, 2003]; graph propagation algorithms applied on the web [Velikovich et al., 2010; Kaji and Kitsuregawa, 2007]; distributional similarity [Wiebe, 2000]; conjunctions (e.g. "and" or "but") or morphological relations between adjectives [Hatzivassiloglou and McKeown, 1997]; context coherency [Kanayama and Nasukawa, 2006]; word similarity [Mohammad, Dorr and Dunne, 2009]; Pointwise Mutual Information (PMI) based on seed words [Turney, 2002; Turney and Littman, 2003; Rao and Ravichandran, 2009; Velikovich et al., 2010; Gamon and Aue, 2005].

Semantic Orientation indicators: adjectives [Hatzivassiloglou and McKeown, 1997; Hu and Liu, 2004; Taboada, Anthony and Voll, 2006]; adverbs [Benamara et al., 2007]; nouns [Vermeij, 2005; Riloff, Wiebe and Wilson, 2003]; and verbs [Neviarouskaya, Prendinger and Ishizuka, 2009].

In the literature, the most commonly used SO indicators are adjectives or adjective phrases, but the use of adverbs, nouns and verbs has recently become common as well. Hand-built lexicons are definitely more accurate than automatically built ones. Nevertheless, manually drawing up a dictionary is a strongly time-consuming activity, which is why the literature contains a large number of studies on the automatic creation and propagation of polarity lexicons.
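The seed-word PMI method cited above [Turney, 2002] can be sketched as follows: the Semantic Orientation of a phrase is its PMI with a positive seed word (e.g. "excellent") minus its PMI with a negative one (e.g. "poor"), estimated from co-occurrence counts. All counts in the example are invented for illustration.

```python
import math

def pmi(hits_xy, hits_x, hits_y, n):
    # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), from co-occurrence counts
    # over a collection of n documents (or n search hits).
    return math.log2((hits_xy / n) / ((hits_x / n) * (hits_y / n)))

def so_pmi(hits_with_pos, hits_with_neg, hits_phrase,
           hits_pos_seed, hits_neg_seed, n):
    # SO(phrase) = PMI(phrase, positive seed) - PMI(phrase, negative seed)
    return (pmi(hits_with_pos, hits_phrase, hits_pos_seed, n)
            - pmi(hits_with_neg, hits_phrase, hits_neg_seed, n))
```

A phrase that co-occurs with "excellent" more often than chance, and with "poor" less often, gets a positive SO; in practice smoothing is added to avoid zero counts.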
The General Inquirer
Home page: http://www.wjh.harvard.edu/~inquirer
List of Categories:
Spreadsheet:
Categories: Positiv (1915 words) and Negativ (2291 words); Strong vs Weak, Active vs Passive, Overstated vs Understated; Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc.
Free for research use.
Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, Daniel M. Ogilvie. The General Inquirer: A Computer Approach to Content Analysis. MIT Press.
LIWC (Linguistic Inquiry and Word Count)
Home page:
2300 words, >70 classes
Affective Processes: negative emotion (bad, weird, hate, problem, tough); positive emotion (love, nice, sweet)
Cognitive Processes: tentative (maybe, perhaps, guess); inhibition (block, constraint)
Pronouns; negation (no, never); quantifiers (few, many)
$30 or $90 fee
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC. Austin, TX
MPQA Subjectivity Cues Lexicon
Home page:
6885 words from 8221 lemmas: 2718 positive, 4912 negative
Each word annotated for intensity (strong, weak)
GNU GPL
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
Bing Liu Opinion Lexicon
Bing Liu's Page on Opinion Mining
6786 words: 2006 positive, 4783 negative
Minqing Hu and Bing Liu (2004). Mining and Summarizing Customer Reviews. Proc. of ACM SIGKDD-2004.
SentIta
A semi-automatic lexicon for the determination of individual words' Prior Polarity:
Hand-tagged Nooj dictionary of adjectives (5000+ entries);
Automatically created Nooj dictionary of adverbs, derived from the adjective one (3500+ entries);
Hand-made dictionary of verbs of sentiment derived from the Psychological Predicate classes 41, 42 and 43 (650+ entries);
Hand-tagged dictionary of sentiment nouns, manually derived through the nominalization of the just-mentioned verb classes (1300+ entries);
Manually evaluated list of frozen sentences that contain adjectives (500 entries).
Adjectives of Sentiment
Lexical Resources: Adjectives of Sentiment

Evaluation scale:
Adjective | Translation | Tag | Description | Score
meraviglioso | wonderful | +POS+FORTE | Strongly positive | +3
divertente | funny | +POS | Positive | +2
accettabile | acceptable | +POS+DEB | Weakly positive | +1
insapore | flavourless | +NEG+DEB | Weakly negative | -1
cafone | bumpkin | +NEG | Negative | -2
disastroso | disastrous | +NEG+FORTE | Strongly negative | -3

Strength scale:
Adjective | Translation | Tag | Description | Score
straripante | overflowing | +FORTE | Strong | +1
episodico | episodic | +DEB | Weak | -1

In order to obtain two separate scales for the evaluation of strength and polarity, every entry of the sentiment lexicon has been weighted by combining four tags: +POS (positive), +NEG (negative), +FORTE (intense) and +DEB (weak). This way we created an evaluation scale from -3 to +3 and a strength scale from -1 to +1.
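The mapping from tag combinations to the -3..+3 evaluation scale can be sketched directly; this is a minimal illustration of the scale described above, not the actual NooJ mechanism.

```python
def evaluation_score(tags):
    # +POS/+NEG give the polarity; +FORTE/+DEB push it to the strong or
    # weak end of the -3..+3 evaluation scale (as in the slide's table).
    if "POS" in tags:
        return 3 if "FORTE" in tags else (1 if "DEB" in tags else 2)
    if "NEG" in tags:
        return -3 if "FORTE" in tags else (-1 if "DEB" in tags else -2)
    return 0  # no polarity tag: neutral

# meraviglioso,A+POS+FORTE -> +3; insapore,A+NEG+DEB -> -1
```
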
Adverbs of Sentiment
e.g. meraviglioso,A+POS+FORTE → meravigliosamente,AVV+POS+FORTE
As anticipated, thanks to the morphological grammar shown here, it has been possible to derive the dictionary of sentiment adverbs from the adjective one. All the adverbs contained in the Italian dictionary of simple words were put into a Nooj text, and the above-mentioned grammar was used to quickly populate the new dictionary by extracting the adverbs ending with the suffix -mente and making them inherit the adjectives' polarity. The produced list of sentiment adverbs was then manually checked, in order to correct the grammar's mistakes and to add the Prior Polarity to the adverbs that did not end with the suffix used in the grammar.
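The derivation step can be sketched as follows. This is a simplified stand-in for the NooJ morphological grammar: it handles only the regular -o → -amente and -e → -emente patterns (irregular formations such as facile → facilmente are not covered), and the toy adjective dictionary is illustrative.

```python
# Toy adjective dictionary: lemma -> polarity tags (from the slide's example).
ADJ_DICT = {"meraviglioso": "+POS+FORTE", "episodico": "+DEB"}

def derive_adverb(adj):
    # Regular Italian -mente formation from the feminine form, simplified:
    # meraviglioso -> meravigliosamente; only -o and -e endings handled.
    if adj.endswith("o"):
        return adj[:-1] + "amente"
    if adj.endswith("e"):
        return adj[:-1] + "emente"
    return None

# The derived adverb inherits the adjective's polarity tags.
ADV_DICT = {derive_adverb(a): tags for a, tags in ADJ_DICT.items()}
```

As the slide notes, the automatically produced list still needs manual checking, both for derivation mistakes and for adverbs whose polarity differs from their base adjective.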
Psychological Semantic Predicates
Lexical Resources: Verbs and Nouns of Sentiment

Class | Psychological Semantic Predicates | Nominalizations
41 | angosciare,V+FLX=V4+NEG+FORTE+41 "to anguish" | angoscia,N+FLX=N45+NEG+FORTE+41 "anguish"
42 | piacere,V+FLX=V37+POS+42 "to like" | piacere,N+FLX=N5+POS+42 "pleasure"; piacevolezza,N+FLX=N41+POS+42 "pleasantness"
43 | amare,V+FLX=V3+POS+FORTE+43 "to love" | amorevolezza,N+FLX=N41+POS+43 "kindness"; amore,N+FLX=N41+POS+FORTE+43 "love"; innamoramento,N+FLX=N41+POS+43 "infatuation"
43B | biasimare,V+FLX=V3+NEG+43B "to blame" | biasimo,N+FLX=N5+NEG+43B "blame"

The verbs chosen for our sentiment lexicon are the Psychological Semantic Predicates belonging to the Italian Lexicon-Grammar classes 41, 42, 43 and 43B. The nominalizations of these predicates have been used to manually build the sentiment dictionary of nouns.
N0 essere (Agg + Ppass) Prep C1
Lexical Resources: Frozen sentences of Sentiment

N0 Agg come C1: Il pavimento è lucido come uno specchio "You can see your face in the floor" (Intensely positive)
N0 essere (Agg + Ppass) Prep C1: Max è matto da legare "Max is so crazy he should be locked up" (Intensely negative)
N0 essere Agg e Agg: Max è bello e fritto "Max is cooked"
C0 essere Agg (come C1 + E): La coscienza è sporca ↔ Mary ha la coscienza sporca "The conscience is guilty ↔ Mary has a guilty conscience" (Negative)
N0 essere C1 Agg: Mary è una gatta morta "Mary is a cock tease" (Weakly negative)

In the end, 500+ Italian frozen sentences containing adjectives have been evaluated and then formalised with a dictionary-grammar pair. It is interesting to notice that 84% of the idioms have a clear SO, while just 36% of the adjectives they contain are polarised.
Frozen sentences of Sentiment
Lexical Resources: Frozen sentences of Sentiment

Intensification (+2 → +3): Mary è bella[A+POS] come il sole. "Mary is as beautiful as the sun." (Intensely positive sentence)
Polarization (0 → -2): Mary è bianca[A+NEUTRAL] come un cadavere. "Mary is as white as a dead body." (Mary is pale.) (Negative sentence)
Switching (+2 → -2): Mary è agile[A+POS] come una gatta di piombo. "Mary is as agile as a lead cat." (Mary is not agile.)

Among the idioms considered are the comparative frozen sentences of the type N0 Agg come C1, which usually intensify the polarity of the adjective of sentiment they contain, as in the first example. An idiom of this sort can also be polarised when the adjective it contains is neutral, or can even reverse its polarity, as exemplified in the third sentence.
Frozen sentences of Sentiment
An extract of the more complex grammar that recognises and automatically evaluates Sentiment Idioms is reported in this slide.
Grammatical Resources
Contextual Valence Shifter Grammar
In order to put our sentiment words in context and find the Semantic Orientation of sentences written in natural language, a grammar net that computes the polarity of the opinion lexicon has been built with Nooj. Adjectives, adverbs, nouns and verbs are treated separately in four dedicated metanodes. A fifth metanode is dedicated to domain-independent sentiment expressions that are not built around specific sentiment words, but must be considered opinion indicators as well.
Grammatical Resources
Contextual Valence Shifter Grammar
Sentiment pattern extraction and annotation are performed using six different metanodes, which are enclosed in every metanode of the main graph. In this work, metanodes work as "boxes" for the sentiment expressions: expressions receive the same label if they are embedded in the same sentiment box. We considered as Contextual Valence Shifters: Negation, Intensification, Modality and Comparison.
Using metanodes as polarity boxes
Grammatical Resources
Contextual Valence Shifters: Negation
Negative operators: non "not"; mica, per niente, affatto "not at all"
Negative quantifiers: nessuno "nobody"; niente, nulla "nothing"
Lexical negation: senza "without"; mancanza di, assenza di, carenza di "lack of"

Negation switching (+2 → -2): La Citroen non[Negative_Operator] produce auto valide[A+POS] "Citroen does not produce efficient cars" (Negative)
Negation shifting (+3 → -1): Grafica non[Negative_Operator] proprio spettacolare[A+POS+FORTE] "Not quite spectacular graphics" (Weakly negative)
Negation and intensification (+2 → +3): Personale alla reception non[Negative_Operator] sempre[AVV+FORTE] gentile[A+POS]. "Not always kind desk clerks."

As exemplified in these sentences, extracted from the sentiment corpus that will be described in Section X, negation indicators do not always change a sentence's polarity into its positive or negative counterpart; they often have the effect of increasing or decreasing the sentence score. That is why we prefer to talk about valence "shifting" rather than "switching". We avoided the commonly used, but complex (and often misleading) mathematical calculations on word sentiment scores; instead, we put into the appropriate "box" the patterns built by combining all these negation indicators with the sentiment words.
Benamara, F., Chardon, B., Mathieu, Y., Popescu, V., & Asher, N. (2012). How do negation and modality impact on opinions? In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics. Association for Computational Linguistics.
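The shifting-vs-switching distinction can be sketched as a lookup table rather than a multiplication by -1. The shift values below are illustrative, loosely following the slide's examples (+3 → -1, +2 → -2); the actual resource encodes this by placing each pattern in a polarity "box" in the grammar, not by arithmetic.

```python
# Illustrative shift table: negated score -> shifted score.
# Note +3 does NOT go to -3: strong positives under negation become
# only weakly negative ("non proprio spettacolare"), hence "shifting".
NEGATION_SHIFT = {3: -1, 2: -2, 1: -1, -1: 1, -2: 2, -3: 1}

def shift(score, negated):
    # Return the score of a sentiment pattern, shifted if a negation
    # indicator (operator, quantifier, or lexical negation) is present.
    return NEGATION_SHIFT[score] if negated else score
```
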
Grammatical Resources
Contextual Valence Shifters: Intensification
Adverb-Adjective (-2 → -1): Parzialmente[AVV+DEB] deludente[A+NEG] anche il reparto degli attori. "The cast is also partially disappointing." (Weakly negative)
Adjective-Noun (-2 → -3): Ciò che ne deriva (...) è una terribile[A+NEG] confusione[N+NEG] narrativa. "What comes from it is a terrible narrative chaos." (Intensely negative)
Adverb-Verb (+2 → +3): Alla guida ci si diverte[V+POS] molto[AVV+FORTE]. "In the driver's seat you have a lot of fun." (Intensely positive)
Adverb-Adverb: Ne sono rimasta molto[AVV+FORTE] favorevolmente[AVV+POS] colpita "I have been very favourably impressed by it" (Intensely positive)

In order to take intensification into account, we combined in the grammar the words belonging to the strength scale with the sentiment words listed in the evaluation scale. In general, adverbs intensify or attenuate adjectives, verbs and other adverbs, while adjectives modify the intensity of nouns.
Grammatical Resources
Contextual Valence Shifters: Intensification
Repetition (+2 → +3): Hotel meraviglioso[A+POS+FORTE], lussuoso[A+POS] e impeccabile[A+POS+FORTE] "Wonderful, luxurious, flawless hotel" (Intensely positive)
Superlative: Questo smartphone ha un bellissimo[A+POS+SUP] display "This smartphone has a wonderful display"
False intensifiers/downtoners:
(0 → -2) Samsung S4 è troppo[AVV+FORTE] delicato[A+NEUTRAL] "The Samsung S4 is too delicate" (Negative)
(0 → -1) I personaggi sono poco[AVV+DEB] delineati[A+NEUTRAL] "The characters are poorly outlined" (Weakly negative)

The repetition of more than one negative or positive word, or the use of absolute superlative affixes, affects the words' Prior Polarity. A word that at first glance seems to be an intensifier, but at a deeper analysis reveals its negative attitude, is troppo "too much", which in our corpus turns 84% of its patterns into negative expressions. It works as an intensifier only when it occurs with positive adjectives; that is why it does not appear in the dictionaries and has been used only in the positive boxes of the adjective metanode. A similar thing happens with poco "not much" when it appears with positive adjectives: in 86% of its occurrences, it converts the patterns into weakly negative expressions. For these reasons, both have been excluded from the intensifier list and treated as lexicon-independent sentiment indicators.
Kennedy, A., & Inkpen, D. (2006). Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2).
Polanyi, L., & Zaenen, A. (2006). Contextual valence shifters. In Computing Attitude and Affect in Text: Theory and Applications (pp. 1-10). Springer Netherlands.
Grammatical Resources
Contextual Valence Shifters: Modality
Modal verbs and imperfect tense (+2 → -1): Poteva[Modal+IM] essere una trama interessante[A+POS] "It could have been an interesting plot" (Weakly negative)
Modal verbs and conditional tense (0 → -1): Non avrei[V+C] dovuto[Modal+PP] buttare via i miei soldi "I should not have thrown my money away" (Negative)

Modal verbs occurring in the imperfect tense can turn a sentence into a weakly negative one when combined with positive words. Conditional sentences also have a particular impact on opinion polarity, especially with modal verbs: in such cases, as exemplified in the sentence above, the sentence polarity is always negative in our corpus.
Grammatical Resources
Contextual Valence Shifters: Comparison
Comparative frozen sentences, N0 Agg come C1 (+2 → +3): Mary è bella[A+POS] come il sole. "Mary is as beautiful as the sun." (Intensely positive)
Comparative sentences (0 → +2): L'S3 è complessivamente superiore all'Iphone5 "The S3 is on the whole superior to the iPhone 5" (Positive)
Comparative superlative sentences: Il suo motore era anche il più brioso[A+POS] "Its engine was also the liveliest"; (0 → -3) Un film peggiore di qualsiasi telefilm. "A movie worse than any television series" (Intensely negative)

As far as comparative sentences are concerned, in this work we considered the already mentioned comparative frozen sentences; simple comparative sentences involving the expressions meglio di, migliore di "better than", peggio di, peggiore di "worse than", superiore a "superior to", inferiore a "inferior to"; and the comparative superlative, which confers the highest polarity score on the first term of the comparison, so that it always increases the strength of the opinion. Thus, its polarity can be -3 or +3.
Grammatical Resources
Domain-independent Sentiment Expressions
valerne la pena[POS] "to be worthwhile"
essere (dotato + fornito + provvisto) di[POS] "to be equipped with"
grazie a[POS] "thanks to"
essere un (aspetto + nota + cosa + lato) negativo[NEG] "to be a negative side"
non essere niente di che[POCONEG] "to be nothing special"
tradire le (aspettative + attese + promesse)[NEG] "not to live up to one's expectations"

In order to reach high levels of recall, the lexicon-based patterns require the support of lexicon-independent expressions. This is where the importance of finite-state automata becomes clear: without them it would be really difficult and uneconomical to give the machine concise instructions to correctly recognise and evaluate these kinds of opinionated sentences, which can reach high levels of variability.
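The role of finite-state patterns here can be illustrated with a regular expression (a textual stand-in for a NooJ graph): one compact pattern covers many surface variants of a lexicon-independent expression. The pattern below is a rough sketch for the "essere (dotato + fornito + provvisto) di" family only, with simplified agreement handling.

```python
import re

# One finite-state pattern for "dotato/fornito/provvisto di" and their
# gender/number variants (-o/-a/-i/-e). Sketch only; the real resource
# is a NooJ grammar, not a regex.
EQUIPPED = re.compile(r"\b(?:dotat|fornit|provvist)[oaie]\s+di\b")

def matches_equipped(sentence):
    # True if the sentence contains a variant of "to be equipped with".
    return bool(EQUIPPED.search(sentence.lower()))
```

Encoding each of the slide's expressions this way keeps the instruction set concise while still matching highly variable input.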
Experiment and results
Corpus of Customer Reviews

 | Cars | Smartphones | Books | Movies | Hotels | Videogames | TOT
Negative documents | 50 | 50 | 50 | 50 | 50 | 50 | 300
Positive documents | 50 | 50 | 50 | 50 | 50 | 50 | 300
Text files | 20 | 20 | 20 | 20 | 20 | 20 | 120
Word forms | 17163 | 19226 | 8903 | 37213 | 12553 | 5597 | 101655
Tokens | 21663 | 24979 | 10845 | 45397 | 16230 | 7070 | 126184

Cars: ; Smartphones: alatest.it; Books: ; Movies: ; Hotels: it.hotels.com; Videogames: .

The dataset used to evaluate our tools has been built from Italian opinionated texts in the form of users' reviews and comments. It contains 600 text units and covers six different domains, for each of which different e-commerce and opinion websites have been exploited. Each domain contains 50 positive and 50 negative texts.
Experiment and results
Corpus of Customer Reviews
Positive opinions: 4 or 5 stars
Negative opinions: 1 or 2 stars
We did not read the reviews in advance; we derived their polarity from the stars selected by the Opinion Holders. Positive opinions are those that received 4 or 5 stars from the users; negative ones had 1 or 2 stars.
Experiment and results
DOXA: a Nooj-based Opinion Classifier
Using the command-line program noojapply.exe, we built a prototype written in Java with which users can automatically apply our resources to any kind of opinionated text. With it, we sum up the values corresponding to every sentiment expression and then normalise the result by the total number of sentiment expressions contained in the review. Doxa compares this value with the stars that the Opinion Holder gave to the review and provides statistics about the opinions expressed in every domain. In this slide we report the analysis performed on the domain of hotel reviews.
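The scoring step described above (sum the expression values, normalise by their number, read off the polarity) can be sketched as follows. This is an illustrative sketch, not the prototype's actual Java code; function names are invented.

```python
def document_score(expression_scores):
    # Average of the sentiment-expression values found in one review:
    # sum of values, normalised by the number of expressions.
    if not expression_scores:
        return 0.0
    return sum(expression_scores) / len(expression_scores)

def polarity(score):
    # Sign of the normalised score gives the document polarity,
    # which Doxa then compares against the review's star rating.
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

For a review containing expressions scored +3, +2 and -1, the normalised score is positive, so the document is classified as a positive opinion.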
Experiment and results
Evaluation of the Sentence-level performance

PRECISION* (%) | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average
A | 88 | 90 | 79 | 87,5 | 91,5 | 83,5 | 86,6
ADV | 80,4 | 75,8 | 87,9 | 92,3 | 92,0 | 50* | 79,7
N | 81,8 | 85,7 | 82,8 | 77,8 | 85,3 | 85,7* | 83,2
V | 88,2* | 57,1* | 84,8 | 89,5 | 100* | | 79,5
D-ind | 78,1 | 76,7 | 90,0 | 94,7 | 78,4 | | 82,5

*P = TP / (TP + FP)

In the next slides we present the results of the Nooj output, produced by applying the sentiment resources described in the previous paragraphs to our corpus of customer reviews. Because our lexical and grammatical resources are not domain-specific, we observed their interaction with every single part of the corpus, which is composed of several domains, each characterised by its own peculiarities. Moreover, in order to verify the performance of every part of speech, we checked the Precision by applying separately every single metanode (A, ADV, N, V, D-ind) of the main graph of the sentiment grammar. The values marked by asterisks are reported for completeness, but they are not really significant because of the small number of concordances on which they have been calculated.
Experiment and results
Evaluation of the Sentence-level performance

Not pertinent matches (%) | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average
A | -5 | -5,5 | -28 | -10,5 | -1 | -1,5 | -8,6
ADV | -2,2 | | -29,7 | -7,7 | | | -6,6
N | -11,4 | -14,3 | -39,5 | -14,8 | -5,9 | -14,3 | -16,7
V | | | -17,6 | -15,8 | | | -5,6
D-ind | -13,3 | -6,7 | -2,5 | -5,3 | -6,1 | -5,4 |
-4,0 -25,6 -11,1 -1,9 -4,2 -8,7

Sometimes polarised sentences and phrases are not opinion indicators, especially in reviews of movies and books, where polarised sentences can simply refer to the plot.
Le belle case sono dimora della degenerazione più bieca[MOLTONEG] "Pretty houses host the grimmest degeneracy"
Apprezzo[V+POS] molto[AVV+FORTE] Amazon per questo "I really appreciate Amazon for this"
In many cases, just evaluating the polarity and the intensity of the expressions contained in a document is not enough to determine whether such expressions truly contribute to identifying the polarity and the intensity of the document itself. In this work we chose to exclude sentences like the first one from the correct matches, because it is not an opinion: it refers to the plot of a movie. We instead considered it worthwhile to include the second one, because of its strong influence on the numerical value that the opinion holder confers on his own opinion.
We chose to present these two sets of results separately because this makes it possible to distinguish the domains that suffer more from this problem (movies, books) from those that are less affected by the "pertinence" problem (hotels, smartphones).
Experiment and results
Evaluation of the Sentence-level performance

PRECISION (%) | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average
A | 80,0 | 84,5 | 51,0 | 77,0 | 90,5 | 82,0 | 77,5
ADV | 78,2 | 75,8 | 58,2 | 84,6 | 92,0 | 50,0 | 73,1
N | 70,4 | 71,4 | 43,3 | 63,0 | 79,4 | | 66,5
V | 88,2 | 57,1 | 67,2 | 73,7 | 100,0 | | 73,9
D-ind | 79,3 | 83,5 | 64,8 | 70,0 | 87,5 | 89,4 | 79,1
Sentence-level | 79,2 | 74,5 | 56,9 | 81,3 | 78,6 | | 74,0

Adjective patterns cover 81% of the total number of occurrences (almost 5000 matches), while adverbs, nouns and verbs reach, respectively, 4%, 6% and 2%. The remaining 7% is covered by the domain-independent expressions, which in any case contribute to reaching satisfactory levels of recall. This table corrects the results reported above by excluding from the correct matches those that do not contribute to the identification of the correct document SO.
Experiment and results
Evaluation of the Sentence-level performance

RECALL* (%) | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average
Sentence-level | 72,7 | 79,6 | 64,8 | 65,7 | 72,1 | 58,8 | 69,0

*R = TP / (TP + FN)

A summary of the sentence-level performance of our lexical and grammatical resources, including Recall, is given in this slide.
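The two measures used in these tables are the standard ones; as a quick reference, with counts of true positives (TP), false positives (FP) and false negatives (FN):

```python
def precision(tp, fp):
    # P = TP / (TP + FP): fraction of extracted matches that are correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # R = TP / (TP + FN): fraction of gold expressions actually extracted.
    return tp / (tp + fn)
```
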
Experiment and results: evaluation of the document-level performance

Document-level (%) ("—" = cell not recoverable):

|               | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average |
|---------------|------|-------------|--------|-------|--------|------------|---------|
| PRECISION POL | 71.0 | 72.0        | 63.0   | 74.0  | 91.0   | —          | —       |
| PRECISION INT | 32.0 | 45.0        | 25.0   | 33.0  | 49.0   | 34.0       | 36.3    |
| RECALL POL    | 100  | 98.6        | —      | 96.1  | 98.9   | 91.2       | 97.5    |

POL: True Positives are the documents correctly classified by Doxa, with a polarity attribution that corresponds to the one specified by the Opinion Holder.
INT: True Positives are the documents that received from Doxa exactly the same stars specified by the Opinion Holder.

As far as the document-level performance is concerned, we calculated precision twice: in the first case we count as true positives the reviews correctly classified by Doxa on the basis of their polarity; in the second, the documents that received from our tool exactly the same stars specified by the Opinion Holder. As we can see, the latter seems to have a very low precision…
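The POL and INT criteria can be made concrete as two comparison functions over star ratings. A sketch under one assumption the slide does not state: that 1-2 stars map to negative, 4-5 to positive, and 3 to neutral (the exact threshold Doxa uses is not given here):

```python
def stars_to_polarity(stars: int) -> str:
    """Assumed star-to-polarity mapping (illustrative thresholds)."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return "neutral"

def is_tp_pol(predicted_stars: int, gold_stars: int) -> bool:
    """POL: correct if the predicted polarity matches the Opinion
    Holder's polarity, even when the star values differ."""
    return stars_to_polarity(predicted_stars) == stars_to_polarity(gold_stars)

def is_tp_int(predicted_stars: int, gold_stars: int) -> bool:
    """INT: correct only with exactly the same number of stars."""
    return predicted_stars == gold_stars

# A 5-star prediction for a 4-star review: a hit under POL, a miss under INT.
print(is_tp_pol(5, 4), is_tp_int(5, 4))  # True False
```

Since INT demands an exact match on a five-point scale, it is necessarily much stricter than POL, which is consistent with the far lower INT precision in the table.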
Experiment and results: evaluation of the document-level performance

Corrections. Polarity switching: 4 stars → 2 stars
"Honestly the game is pretty boring and repetitive; the missions all seem the same, and the online version is pretty perplexing: because of the size of the map and the small number of players per session, it is really difficult to meet enemies."

Reshaping. Polarity shifting: 4 stars → 3 stars
"Everything really nice, except for the closet door in the room, which had a large scrape (absolutely not fitting with the 5-star furnishings!). Another disappointment: the SPA is not equipped with a steam bath. Finally: the receptionists are not always kind and willing."

…but a deeper analysis reveals that it is quite common for Opinion Holders to write review texts that do not perfectly correspond to the stars they specified. This increases the importance of a tool like Doxa, which does not stop its analysis at the structured data but enters the semantic dimension of texts written in natural language.
Experiment and results: evaluation of the Doxa performance

Sentence-level (%):

|             | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average |
|-------------|------|-------------|--------|-------|--------|------------|---------|
| PRECISION*  | 79.2 | 74.5        | 56.9   | 73.7  | 81.3   | 78.6       | 74.0    |
| RECALL      | 72.7 | 79.6        | 64.8   | 65.7  | 72.1   | 58.8       | 69.0    |
| F-measure** | 75.8 | 77.0        | 60.6   | 69.5  | 76.4   | 67.3       | 71.4    |

Document-level (%) ("—" = cell not recoverable):

|             | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average |
|-------------|------|-------------|--------|-------|--------|------------|---------|
| PRECISION   | 71.0 | 72.0        | 63.0   | 74.0  | 91.0   | —          | —       |
| RECALL      | 100  | 98.6        | —      | 96.1  | 98.9   | 91.2       | 97.5    |
| F-measure** | 83.0 | 83.2        | 77.3   | 83.6  | 94.8   | 80.5       | 84.1    |

A summary of the results obtained by the tool is given here. Even though the sentence-level Recall is not very high, the Recall achieved in the document-level analysis is more than satisfactory. Taking the F-measure into account, the best results were achieved on the smartphone domain in the sentence-level task and on the hotel dataset in the document-level task.

*Average values
**F-measure = 2 * (P * R) / (P + R)
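The F-measure in the footnote is the harmonic mean of precision and recall; for instance, the smartphone sentence-level cell follows from P = 74.5% and R = 79.6%. A minimal check:

```python
def f_measure(p: float, r: float) -> float:
    """F = 2*P*R / (P + R), the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Smartphones, sentence level: P = 74.5 %, R = 79.6 % -> F ≈ 77.0 %
print(f"{f_measure(0.745, 0.796):.1%}")  # 77.0%
```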
What we must improve:

Bigger dictionary of verbs and nouns
I bottoncini per tirare su e giù i finestrini si indeboliscono e si logorano
"The little buttons that pull the windows up and down become weak and wear out" (Negative)
Ho provato una fortissima antipatia per il personaggio di Bella
"I felt a very strong dislike for the character of Bella" (Intensely negative)

Sentiment dictionary of bad words
È un gioco di merda!
"It is a crappy game!"

Sentiment dictionary of multiword expressions
Ci sono una miriade di giri di parole e parti inutili
"There is a mass of roundabout expressions and useless sections"

Domain-dependent expressions
L'albergo è situato in zona Monti vicinissimo alla fermata metro Cavour
"The hotel is situated in the Monti area, really close to the Cavour subway stop" (Positive)
Il telefono si spegne all'improvviso
"The phone turns itself off all of a sudden"

We conclude this presentation by anticipating the future lines of action of our research: we will enlarge the dictionary of verbs and nouns; we will build a sentiment dictionary of bad words; we will provide the dictionary of multiword expressions with sentiment annotations; and we will build a grammar of sentiment expressions that is specific to each domain.
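The dictionary entries above pair polarized words with intensifiers, as in apprezzo[V+POS] molto[AVV+FORTE] earlier in the deck. A toy sketch of that kind of lookup; the mini-lexicon, the intensifier weights and the score scale are all invented for illustration and are not Doxa's actual resources:

```python
# Invented mini-lexicon: polarized lemmas with a base score, and
# intensifiers that boost an adjacent sentiment word.
LEXICON = {"apprezzo": 2, "gradevole": 2, "antipatia": -2, "merda": -3}
INTENSIFIERS = {"molto": 1.5, "fortissima": 2.0}

def score(sentence: str) -> float:
    """Sum lexicon scores, boosting a sentiment word when an
    intensifier immediately precedes or follows it."""
    words = [w.strip(".,!?()'\"").lower() for w in sentence.split()]
    total = 0.0
    for i, w in enumerate(words):
        if w in LEXICON:
            boost = 1.0
            for j in (i - 1, i + 1):  # look at both neighbours
                if 0 <= j < len(words) and words[j] in INTENSIFIERS:
                    boost = INTENSIFIERS[words[j]]
            total += LEXICON[w] * boost
    return total

print(score("Apprezzo molto Amazon per questo"))     # 3.0
print(score("Ho provato una fortissima antipatia"))  # -4.0
```

Enlarging the dictionaries, as planned above, would amount to growing LEXICON and its multiword and domain-specific counterparts.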
What we, perhaps, cannot improve:

Irony: +2 → -2
La ripresa è degna[A+POS] di un trattore con aratro inserito.
"The pickup is worthy of a tractor with its plough engaged" (Negative)
E quel tocco di piccante (...) è gradevole[A+POS] quanto lo sarebbe una spruzzata di pepe su un gelato alla panna.
"And that touch of piquancy (...) is as pleasant as a sprinkling of pepper on a cream-flavoured ice cream"

Stereotypes: +2 → -1
La nuova Fiat 500 è consigliabile[A+POS] molto di più ad una ragazza.
"The new Fiat 500 is much more advisable for a girl" (Weakly negative)
0 → -1
Un gioco per bambini di 12 anni.
"A game for 12-year-old children"

Irony and cultural stereotypes remain an open problem for NLP in general and for sentiment analysis in particular. For the moment we have decided to set them aside, but we do not exclude that in the near future we will try to face these challenges as well.
Thank you
Annibale Elia, Daniela Guglielmo, Alessandro Maisto, Serena Pelosi