Analysis of sentiment syntagma using dependency tree Serge B. Potemkin Moscow State University

potemkin@philol.msu.ru

Terms
Sentiment
◦ A thought, view, or attitude, especially one based mainly on emotion instead of reason
Sentiment Analysis (opinion mining)
◦ Use of natural language processing (NLP) and computational techniques for extraction or classification of sentiment from (unstructured) text

What for?
Consumer information
◦ Product reviews
◦ Consumer attitudes
◦ Trends
Politics
◦ Politicians want to know voters’ views
◦ Voters want to know politicians’ intentions and who else supports them
Social
◦ Find like-minded individuals or communities
Financial
◦ Predict market trends given the current opinions

Features
Which features to use?
◦ Words (unigrams)
◦ Phrases/n-grams
◦ Sentences
How to interpret features for sentiment detection?
◦ Bag of words
◦ Annotated lexicons (WordNet, SentiWordNet)
◦ Syntactic patterns
◦ Paragraph structure

Challenges
Harder than topical classification, for which bag-of-words features perform well
Must consider other features due to…
◦ Ambiguity of sentiment expression
  - irony
  - expression of sentiment using neutral words
  - … many others
◦ Domain/context dependence
  - words/phrases can mean different things in different contexts and domains
◦ Effect of syntax on semantics

Formal description
The semantic orientation of a sentence is expressed by a ternary predicate: O(subject, object, sentiment), where sentiment ∈ {bad, neutral, good}
◦ i.e., the subject of assessment considers the object of assessment to be good or bad (or neutral = not a sentiment)

Sentiment expression in NL
Predicate O may be expressed explicitly: (Vania likes Masha). Only surface syntactic analysis is needed to determine its semantic orientation (SO): Vania (subj) likes (sentiment) Masha (obj).
The common case is quite different: (Vania suffers from Masha’s absence). Both "suffers" and "absence" are negative, but the sense is equivalent to that of the first sentence.

Bag of words vs. syntagma analysis
Bag of words (counting the positive and negative words) gives good results for large texts.
Syntagma = a phrase forming a syntactic unit, say modifier (X) + keyword (Y), i.e. adjective+noun or adverb+verb.
Signature of a syntagma: SO = sgn(X,Y,neg/0/pos).
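The bag-of-words baseline mentioned above can be sketched as follows. This is a minimal illustration only: the two word lists are a toy lexicon assumed for the example, and the function name `bag_of_words_so` is hypothetical.

```python
# Toy polarity lexicon: an assumption for illustration, not taken from the slides.
POSITIVE = {"good", "pleasant", "fine", "best"}
NEGATIVE = {"bad", "banal", "uninteresting", "rubbish"}


def bag_of_words_so(text: str) -> str:
    """Return 'pos', 'neg' or '0' from the balance of positive and negative words."""
    tokens = text.lower().split()
    n_pos = sum(token in POSITIVE for token in tokens)
    n_neg = sum(token in NEGATIVE for token in tokens)
    if n_pos > n_neg:
        return "pos"
    if n_neg > n_pos:
        return "neg"
    return "0"
```

On a two-word syntagma such as "fine rubbish" the counts cancel and the baseline returns '0', which is one reason to look at the modifier+keyword pair as a unit.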

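SO Calculus
∀X,Y.[sgn(X,Y,pos) ← dep(mod,X,Y), sgn(X,pos), sgn(Y,pos)]. (a)
i.e. if X and Y are both positive, then X+Y is positive
∀X,Y,Z.[sgn(X,Y,Z) ← dep(mod,X,Y), sgn(X,0), sgn(Y,Z)]. (b)
i.e. if X is neutral, then X+Y takes the orientation of Y
∀X,Y,Z.[sgn(X,Y,Z) ← dep(mod,X,Y), sgn(X,Z), sgn(Y,0)]. (c)
i.e. if Y is neutral, then X+Y takes the orientation of X (e.g. X pos., Y neut. gives X+Y pos.)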
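Rules (a) to (c) can be read as a small lookup function. The sketch below assumes that the caller has already established the dependency dep(mod,X,Y) and passes only the orientations of the modifier and the keyword, encoded as 'pos', '0' and 'neg'; the function name `syntagma_so` is hypothetical.

```python
from typing import Optional


def syntagma_so(mod_so: str, key_so: str) -> Optional[str]:
    """Orientation of modifier+keyword under rules (a)-(c); None if no rule applies."""
    if mod_so == "pos" and key_so == "pos":
        return "pos"      # rule (a): both constituents positive
    if mod_so == "0":
        return key_so     # rule (b): neutral modifier, the keyword decides
    if key_so == "0":
        return mod_so     # rule (c): neutral keyword, the modifier decides
    return None           # mixed and neg+neg pairs are not covered by (a)-(c)
```

Returning None for the uncovered combinations keeps the sketch honest: the three rules say nothing about pairs whose constituents disagree or are both negative.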

Different orientation of syntagma constituent words
sgn(безумная,радость,pos) = sgn(mad,happiness,pos),
sgn(бешеный,успех,pos) = sgn(furious,success,pos),
sgn(солидный,ущерб,neg) = sgn(considerable,damage,neg),
sgn(хороший,нагоняй,neg) = sgn(good,scolding,neg).
[Kustova, 1]

Ambiguous cases
sgn(худой,мир,?) = sgn(bad,peace,?), sgn(добрая,война,?) = sgn(good,war,?)
The expression "a bad peace is better than a good war" establishes an order relation "better" between its two attributive constructions, but one can assume that both are bad, i.e., sgn(bad,peace,neg), sgn(good,war,neg). In some other context, "good war" could be perceived as a positive phenomenon.

Double negative
The logical rule of double negation:
* ∀X,Y.[sgn(X,Y,pos) ← dep(mod,X,Y), sgn(X,neg), sgn(Y,neg)].
fails in NL:
◦ weak opponent, impotent aggressor, toothless criticism (neut.)
◦ bitter sorrow, blatant outrage, brutal torture (neg.)
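The failure of the starred rule can be checked mechanically against the examples above. In this sketch the observed labels are the ones given on the slide, both constituents of every pair are taken to be negative as the slide presents them, and the names `starred_rule` and `OBSERVED` are hypothetical.

```python
from typing import Optional

# Syntagmas and observed orientations as listed on the slide ('0' = neutral).
OBSERVED = {
    ("weak", "opponent"): "0",
    ("impotent", "aggressor"): "0",
    ("toothless", "criticism"): "0",
    ("bitter", "sorrow"): "neg",
    ("blatant", "outrage"): "neg",
    ("brutal", "torture"): "neg",
}


def starred_rule(mod_so: str, key_so: str) -> Optional[str]:
    """The rejected double-negation rule: neg + neg would give pos."""
    if mod_so == "neg" and key_so == "neg":
        return "pos"
    return None


# Both words of every listed pair are treated as negative, so the rule always predicts 'pos'.
mismatches = [pair for pair, observed in OBSERVED.items()
              if starred_rule("neg", "neg") != observed]
```

Every listed pair ends up in `mismatches`, since none of them is observed as positive.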

Syntagma evaluation
Methods:
◦ expert evaluation by several independent experts [Osgood,2], who are asked to mark up the SO of isolated words and syntagmas, assigning each a label {pos/0/neg}
◦ corpus techniques, performed on a sentiment-annotated corpus [Zagibalov,3], SentiWordNet

SentiWordNet
Based on WordNet “synsets”
◦ http://wordnet.princeton.edu/
Ternary classifier
◦ Positive, negative, and neutral scores for each synset
Provides means of gauging sentiment for a text

SentiWordNet: Construction
Created training sets of synsets, Lp and Ln
◦ Start with a small number of synsets with fundamentally positive or negative semantics, e.g., “nice” and “nasty”
◦ Use WordNet relations, e.g., direct antonymy, similarity, derived-from, to expand Lp and Ln over K iterations
◦ Lo (objective) is the set of synsets not in Lp or Ln
Trained classifiers on training set
◦ Rocchio and SVM
◦ Use four values of K to create eight classifiers with different precision/recall characteristics
◦ As K increases, precision (P) decreases and recall (R) increases
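The seed-expansion step can be illustrated on a toy graph. This is a sketch under stated assumptions, not the actual SentiWordNet procedure: plain words stand in for synsets, the `SIMILAR` and `ANTONYM` tables are invented for the example instead of being read from WordNet, and classifier training is left out.

```python
# Toy relation tables standing in for WordNet relations (illustrative assumptions).
SIMILAR = {
    "nice": {"pleasant"},
    "pleasant": {"agreeable"},
    "nasty": {"awful"},
    "awful": {"dreadful"},
}
ANTONYM = {
    "nice": {"nasty"},
    "nasty": {"nice"},
    "pleasant": {"unpleasant"},
    "unpleasant": {"pleasant"},
}


def expand_seeds(pos_seeds, neg_seeds, k):
    """Grow the positive set Lp and the negative set Ln over k iterations."""
    lp, ln = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        # Similarity keeps the polarity; antonymy flips it.
        new_p = ({t for s in lp for t in SIMILAR.get(s, ())}
                 | {t for s in ln for t in ANTONYM.get(s, ())})
        new_n = ({t for s in ln for t in SIMILAR.get(s, ())}
                 | {t for s in lp for t in ANTONYM.get(s, ())})
        lp |= new_p
        ln |= new_n
    return lp, ln


lp2, ln2 = expand_seeds({"nice"}, {"nasty"}, k=2)
```

Each additional iteration reaches terms further from the seeds, which matches the slide's remark that larger K trades precision for recall.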

SentiWordNet: Results
24.6% of synsets with Objective<1.0
◦ Many terms are classified with some degree of subjectivity
10.45% with Objective<=0.5
0.56% with Objective<=0.125
◦ Only a few terms are classified as definitively subjective
Difficult (if not impossible) to accurately assess performance

Corpus-based method
Sentiment-annotated corpora (English and Russian) of approx. 1500 short utterances concerning popular books. Each utterance contains 1 to 15 sentences and is marked with a label {neg / pos}.

Corpus processing
- Stemming and determination of the morphological features of each word (without morphological disambiguation);
- Parsing to obtain the dependency tree of each sentence [Potemkin, 4];
- Joining the particle "no/not" to the associated word (not understand => not_understand);
- Selection of modifier + key word constructions (adjective+noun, adverb+verb);
- Counting the number of occurrences of each key word = nverb;

Corpus processing (continued)
- Counting the number of occurrences in positively labeled utterances = nvp and in negatively labeled utterances = nvn;
- Calculation of the normalized assessment factor for each key word: kv = (nvp - nvn) / nverb;
- The same calculations for each modifier, giving the normalized assessment factor kd, and for each syntagma in the corpus, giving the normalized assessment factor ks.
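The negation-joining and counting steps can be sketched as below. Several simplifications are assumed: the particle is attached to the next token instead of to its dependency-tree head, the modifier+keyword pairs are supplied ready-made instead of being extracted by a parser, and the tiny `corpus` is hypothetical data used only to exercise the formula. The names `join_negation` and `assessment_factors` are likewise hypothetical.

```python
from collections import Counter


def join_negation(tokens):
    """Attach 'no'/'not' to the following token: not understand -> not_understand."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] in ("no", "not") and i + 1 < len(tokens):
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def assessment_factors(observations):
    """Map each item to (n_pos - n_neg) / n_total over (item, label) observations."""
    total, pos, neg = Counter(), Counter(), Counter()
    for item, label in observations:
        total[item] += 1
        if label == "pos":
            pos[item] += 1
        elif label == "neg":
            neg[item] += 1
    return {item: (pos[item] - neg[item]) / n for item, n in total.items()}


# Hypothetical (modifier, key word) pairs with the label of the utterance they occur in.
corpus = [
    (("pleasant", "book"), "pos"),
    (("pleasant", "book"), "pos"),
    (("uninteresting", "book"), "neg"),
    (("good", "intentions"), "neg"),
]
kv = assessment_factors((key, label) for (mod, key), label in corpus)  # key words
kd = assessment_factors((mod, label) for (mod, key), label in corpus)  # modifiers
ks = assessment_factors(corpus)                                        # whole syntagmas
```

The same function serves all three factors because kv, kd and ks differ only in what is counted: the key word, the modifier, or the pair.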

Assessment thresholds
Assessment factors ks ∈ [-1, 1]:
ks ∈ [-1, -0.6) = neg;
ks ∈ [-0.6, 0.6] = 0;
ks ∈ (0.6, 1] = pos
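The thresholds translate directly into a small mapping function. The function name `label_from_factor` and the `threshold` parameter are hypothetical; the value 0.6 and the interval boundaries are those given above.

```python
def label_from_factor(ks: float, threshold: float = 0.6) -> str:
    """Map a normalized assessment factor in [-1, 1] to 'neg', '0' or 'pos'."""
    if not -1.0 <= ks <= 1.0:
        raise ValueError("assessment factor must lie in [-1, 1]")
    if ks < -threshold:
        return "neg"   # [-1, -0.6)
    if ks > threshold:
        return "pos"   # (0.6, 1]
    return "0"         # [-0.6, 0.6], both boundaries included
```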

Table of syntagma signatures
Each cell gives one example syntagma with a negative signature and one with a positive signature.

         | neg-key                      | 0-key                    | pos-key
neg-mod  | neg: not_palatable demagogy  | neg: uninteresting book  | neg: banal action-film
         | pos: defeated enemy          | pos: forgotten kingdoms  | pos: secondary pleasure
0-mod    | neg: star fever              | neg: unexpected level    | neg: late success
         | pos: imminent defeat         | pos: only book           | pos: continuous growth
pos-mod  | neg: happy end               | neg: good intentions     | neg: sweet honey
         | pos: fine rubbish            | pos: pleasant book       | pos: best masterpiece

Histogram of syntagma distribution over the texts

Histogram of the 1st word of syntagma distribution

Histogram of the 2nd word of syntagma distribution

Conclusion
The report presents considerations for determining the sentiment of a syntagma from the signatures of its constituent words, for structures such as adjective+noun and adverb+verb. Logical formulas specifying the calculation of semantic orientation are listed. An experiment on a sentiment-annotated corpus was performed. Further research will address predicative syntagmas of the type subject + verb + object.

References
[1] http://dict.ruslang.ru/magn.php
[2] Charles E. Osgood, George Suci, & Percy Tannenbaum, The Measurement of Meaning. University of Illinois Press, 1957.
[3] http://www.informatics.sussex.ac.uk/users/tz21/
[4] http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-476/paper6.pdf
