The Languages of Emotion and Financial News. Ann Devitt, Khurshid Ahmad. Trinity College Dublin, Wednesday 18 February 2009.
Sentiment and the Markets
Specialised Language of Financial News: “Global chipmakers, battling slower technology demand, are betting size matters as they pin their hopes for future growth on small and easy to carry mobile devices such as netbooks and smartphones.” (Bloomberg.com, 18/2/09)
Engle and Ng (1993) Asymmetry Curve
Outline: current psychological theory of emotion; evaluation of lexical “emotion” resources; corpus analysis of language of “emotion”.
Cognitive Theory of Emotion: Categorical. Ekman (1975).
Cognitive Theory of Emotion: Dimensions. Osgood / Russell: Evaluation, Activity, Potency. Mehrabian PAD: Pleasure, Activation, Dominance.
Cognitive Theory of Emotion: Watson and Tellegen (1985).
Lexical Resource Evaluation. Resources compared: SentiWordNet, Whissel Dictionary of Affect, WordNet Affect (WNA), General Inquirer.
Lexical Resource Evaluation: SentiWordNet. Table columns: Word, PositiveVal, NegativeVal; example rows: Happy, Sad. […] terms. Evaluation dimension scale. Low average: Pos=0.18, Neg=0.23. More extreme Neg values. Error-prone: rude (pos 0.875), gladsome (neg 0.875).
Lexical Resource Evaluation: General Inquirer. Example entries: ECSTATIC (Pos, Pleasure), SORROWFUL (Neg, Pain). Hand-coded, content analysis basis. 8641 terms. 184 binary categories (including MAB dimensions). Negative > Positive, Active > Passive, Strong > Weak.
Lexical Resource Evaluation: Whissel Dictionary of Affect. Table columns: Word, Eval, Activ, Imag; example rows: great, disastrous. Corpus selection, hand-coded. 8742 terms. Dimensional representation on a 1-3 scale: Evaluation, Activation, Imagery.
Lexical Resource Evaluation: WordNet Affect. Table columns: Word, Binary Features; example rows: Loneliness (cognitive state, emotion), Happiness (cognitive state, emotion). 5432 terms. Domains of emotional experience. No polarity. Short-term: Mood, Manner. Long-term: Attribute, Trait.
Lexical Resource Evaluation: Lexical Overlap. Are the lexica consistent? Are they mutually exclusive? Measures: Dice, Jaccard, and asymmetric coefficients.
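The overlap measures named on this slide can be sketched as follows. This is a minimal illustration on two invented toy word sets, not the authors' implementation. The slide does not define its "asymmetric coefficients"; the version here (share of one lexicon's terms found in the other) is an assumption.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def asymmetric(a: set, b: set) -> float:
    """Assumed asymmetric coefficient: share of A's terms also in B."""
    return len(a & b) / len(a) if a else 0.0


# Hypothetical toy lexica, for illustration only.
lexicon_a = {"happy", "sad", "great", "rude"}
lexicon_b = {"happy", "sad", "disastrous"}

print("Jaccard:", jaccard(lexicon_a, lexicon_b))
print("Dice:", dice(lexicon_a, lexicon_b))
print("A in B:", asymmetric(lexicon_a, lexicon_b))
print("B in A:", asymmetric(lexicon_b, lexicon_a))
```

The asymmetric measure is reported in both directions because a small lexicon can be almost fully contained in a large one while covering only a fraction of it.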
Lexical Resource Evaluation: Lexical Overlap. 1. Statistically significant agreement for polarity assignment (chi-square test). 2. Very weak correlation for activation features.
Lexical Resource Evaluation: Lexical Overlap. 1. Weak correlation of SentiWordNet (SWN) with Whissel evaluation. 2. No correlation with Whissel activation dimension. 3. SWN positive scores are negatively correlated with imageability.
Lexical Resource Evaluation: Lexical Overlap. 1. SWN tends to negative for short-term WNA features. 2. SWN tends to positive for long-term WNA features.
Lexical Resource Evaluation: Lexical Overlap. WNA feature division:
Short-term | Long-term
Negative | Positive
Physical | Cognitive
More active | Less active
Internal | External
Less abstract | More concrete
Lexical Resource Evaluation: Some conclusions. The lexica are quite consistent and can be used in combination. SentiWN: largely unexplored territory.
Corpus analysis of language of “emotion”: General Language
Emotion in General Language: Corpus Study Aims. Does “emotion” constitute a distinct sub-language? Is there a polarity bias in General Language (the Pollyanna Hypothesis of Boucher and Osgood)? What is the impact of using different lexica?
Corpus Analysis: The Data. BNC: 100 million words; balanced, broad corpus.
Corpus Analysis: Methodology. Is emotion a distinct sub-language? Examine distribution type; examine distribution spread; bootstrap sampling distribution.
Corpus Analysis: Distribution Type. Zipfian for both: BNC = Emotion Lexica.
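One common way to check whether a frequency list looks Zipfian is to fit a straight line to log(frequency) against log(rank) and see whether the slope is close to -1. The slide does not say how the distribution type was established, so the sketch below is only an assumed approach, run on synthetic frequencies rather than BNC data.

```python
import math


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two non-zero frequencies")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, idealised Zipfian frequencies (frequency proportional to 1/rank).
ideal = [1000 / rank for rank in range(1, 51)]
print("slope:", zipf_slope(ideal))
```

Comparing the slope obtained for the whole corpus vocabulary with the slope for each lexicon's terms would give a rough sense of whether both follow the same distribution type.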
Corpus Analysis: Distribution shape. Comparison of means: Student's t-test. BNC ≠ Emotion Lexica (p<0.001). Different sample means: lexicon terms are 5-30 times more frequent than general language. Assumptions of test?
Corpus Analysis: Bootstrap Sampling Distribution. Are sentiment-bearing terms a statistically distinct and highly frequent subset of English? 1000 random samples of terms from BNC; sample size = size of sentiment lexicon. H0: the observed sample mean falls within 95% of the bootstrap random sampling distribution of means.
Corpus Analysis: Bootstrap Sampling Distribution, results. For all lexica: mean term frequency of lexicon well outside 95%. Sentiment lexica are not representative of BNC (p<0.05).
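The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the population of term frequencies and the lexicon mean are invented stand-ins, and drawing terms without replacement (so that each random "lexicon" consists of distinct terms) is an assumption.

```python
import random
from statistics import mean


def sampling_distribution(population_freqs, sample_size, n_samples=1000, seed=0):
    """Sorted means of random term samples drawn from the corpus vocabulary."""
    rng = random.Random(seed)
    return sorted(
        mean(rng.sample(population_freqs, sample_size)) for _ in range(n_samples)
    )


def central_interval(sorted_means, coverage=0.95):
    """Bounds of the central `coverage` share of the sampled means."""
    n = len(sorted_means)
    k = round(n * (1 - coverage) / 2)  # number of means cut from each tail
    return sorted_means[k], sorted_means[n - 1 - k]


def is_outside(observed_mean, interval):
    """True if the observed lexicon mean falls outside the interval (reject H0)."""
    low, high = interval
    return observed_mean < low or observed_mean > high


# Hypothetical stand-ins: one frequency per vocabulary term, and a lexicon mean.
corpus_freqs = list(range(1, 1001))
lexicon_size = 50
lexicon_mean = 900.0

means = sampling_distribution(corpus_freqs, lexicon_size)
interval = central_interval(means)
print("95% interval of random-sample means:", interval)
print("lexicon mean outside interval:", is_outside(lexicon_mean, interval))
```

Matching the sample size to the lexicon size, as the slide specifies, keeps the spread of the random-sample means comparable to what a lexicon of that size would show by chance.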
Corpus Analysis: Sentiment Features. Is there a polarity bias in General Language? Positive polarity bias, statistically significant for all lexica (χ² test of independence).
Corpus Analysis: Sentiment Features. Is there a polarity bias in General Language when you include intensity of polarity? Positive polarity bias, statistically significant for all lexica: χ² = 158.5, df=1, p < […] for General Inquirer; χ² = 63.6, df=1, p < […] for Whissel.
Corpus Analysis: Some conclusions. Sentiment-bearing terms are a distinct subset of English. Positive polarity bias in BNC. General Inquirer and Whissel: low coverage and high frequency. SentiWordNet: wide coverage and much lower frequency.
Corpus analysis of language of “emotion”: Comparative
Comparative Corpus Analysis: Aims. Examine affective term use. Identify statistically different distributions. Is there a dominant feature/polarity?
Comparative Corpus Analysis: The Data. Financial Language: 2 million words; on-line financial news (Reuters, CNN, Bloomberg); newspapers. General Language: BNC, 100 million words; balanced, broad corpus.
Comparative Corpus Analysis: The Data. BNC sub-corpora: Imaginative written English, 16 million words; Informative written English, 70 million words.
Comparative Corpus Analysis: Methodology. Compare proportions of sentiment features with a χ² test of independence. $H_0: \pi_{\text{FinCorpus}} = \pi_{\text{BNC}}$
Comparative Corpus Analysis: Methodology. Statistical significance of different proportions: χ² > […], p >= […]. Features: 41 lexicon sentiment features from 4 lexica. Frequency per million words.
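The comparison of proportions can be sketched as a Pearson χ² test on a 2x2 table (feature terms vs. all other tokens, in each corpus). This is a minimal sketch under stated assumptions: no continuity correction is applied, the feature counts are invented, and only the corpus sizes (2 million and 100 million words) come from the slides. With one degree of freedom the p-value can be computed from the standard library as erfc(sqrt(χ²/2)).

```python
import math


def per_million(count, corpus_size):
    """Normalise a raw count to a frequency per million words."""
    return count * 1_000_000 / corpus_size


def chi_square_2x2(hits_a, size_a, hits_b, size_b):
    """Pearson chi-square (df=1) for H0: equal feature proportions in A and B."""
    observed = [[hits_a, size_a - hits_a], [hits_b, size_b - hits_b]]
    row_totals = [size_a, size_b]
    col_totals = [hits_a + hits_b, size_a + size_b - hits_a - hits_b]
    total = size_a + size_b
    if 0 in col_totals or 0 in row_totals:
        raise ValueError("expected counts must be non-zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, df=1
    return chi2, p_value


# Hypothetical counts of one sentiment feature in each corpus.
fin_hits, fin_size = 9_000, 2_000_000
bnc_hits, bnc_size = 300_000, 100_000_000

print("per million:", per_million(fin_hits, fin_size), per_million(bnc_hits, bnc_size))
print("chi2, p:", chi_square_2x2(fin_hits, fin_size, bnc_hits, bnc_size))
```

Running the test once per feature, as the slide's 41 features imply, means many tests are performed; whether any correction for multiple comparisons was applied is not stated in the slides.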
Comparative Corpus Analysis: Financial Corpus. With respect to Imaginative: more affective terms. With respect to Informative: many more affective terms. With respect to BNC: dependent on feature type. Distributions are statistically distinct.
Comparative Corpus Analysis: Positive GI Features
Comparative Corpus Analysis: Negative GI Features
Some conclusions. Lexical resources for sentiment are consistent. Financial news is a sub-language: affective content is statistically distinct relative to general language. Text polarity is asymmetric, with a positive skew. Different skews for different domains.
Something to think about. If different language varieties and domains have distinct use of sentiment terms and their own polarity bias, individual sentiment values are not informative. So what do we need?
Thank You!