Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Wednesday 18 February 2009 The Languages of Emotion and Financial News Ann Devitt Khurshid.

Slides:



Advertisements
Similar presentations
SADC Course in Statistics Revision on tests for proportions using CAST (Session 18)
Advertisements

AP Statistics Course Review.
Exploiting Discourse Structure for Sentiment Analysis of Text OR 2013 Alexander Hogenboom In collaboration with Flavius Frasincar, Uzay Kaymak, and Franciska.
Pollyanna Gonçalves (UFMG, Brazil) Matheus Araújo (UFMG, Brazil) Fabrício Benevenuto (UFMG, Brazil) Meeyoung Cha (KAIST, Korea) Comparing and Combining.
On the Importance of Text Analysis for Stock Price Prediction
LEDIR : An Unsupervised Algorithm for Learning Directionality of Inference Rules Advisor: Hsin-His Chen Reporter: Chi-Hsin Yu Date: From EMNLP.
Statistics II: An Overview of Statistics. Outline for Statistics II Lecture: SPSS Syntax – Some examples. Normal Distribution Curve. Sampling Distribution.
CHI-SQUARE TEST OF INDEPENDENCE
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
Effect of Violations of Normality Edgell and Noon, 1984 On the Correlation Coefficient t-Test.
Self-organizing Conceptual Map and Taxonomy of Adjectives Noriko Tomuro, DePaul University Kyoko Kanzaki, NICT Japan Hitoshi Isahara, NICT Japan April.
Data Analysis Statistics. Levels of Measurement Nominal – Categorical; no implied rankings among the categories. Also includes written observations and.
Robert delMas (Univ. of Minnesota, USA) Ann Ooms (Kingston College, UK) Joan Garfield (Univ. of Minnesota, USA) Beth Chance (Cal Poly State Univ., USA)
Test Validity S-005. Validity of measurement Reliability refers to consistency –Are we getting something stable over time? –Internally consistent? Validity.
Statistical hypothesis testing – Inferential statistics II. Testing for associations.
1 of 27 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2013, Michael Kalsher Michael J. Kalsher Department of Cognitive Science Adv. Experimental.
Chapter 2: The Research Enterprise in Psychology
Fundamentals of Statistical Analysis DR. SUREJ P JOHN.
CENTRE FOR INNOVATION, RESEARCH AND COMPETENCE IN THE LEARNING ECONOMY Session 2: Basic techniques for innovation data analysis. Part I: Statistical inferences.
Applying Science Towards Understanding Behavior in Organizations Chapters 2 & 3.
Chapter 2 The Research Enterprise in Psychology. n Basic assumption: events are governed by some lawful order  Goals: Measurement and description Understanding.
Fall 2013 Lecture 5: Chapter 5 Statistical Analysis of Data …yes the “S” word.
Research and Statistics AP Psychology. Questions: ► Why do scientists conduct research?  answer answer.
For testing significance of patterns in qualitative data Test statistic is based on counts that represent the number of items that fall in each category.
Statistics 11 Correlations Definitions: A correlation is measure of association between two quantitative variables with respect to a single individual.
1 Sentiment Polarity Identification in Financial News: A Cohesion-based Approach Author:Ann Devitt Khurshid Ahmad (School of Computer Science & Statistics,
Date: 2013/8/27 Author: Shinya Tanaka, Adam Jatowt, Makoto P. Kato, Katsumi Tanaka Source: WSDM’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Estimating.
Lecture 5: Chapter 5: Part I: pg Statistical Analysis of Data …yes the “S” word.
Keeping it Cool: Emotional Biases in the English Lexicon Amy Beth Warriner & Victor Kuperman.
Final review - statistics Spring 03 Also, see final review - research design.
Linear correlation and linear regression + summary of tests
Appraisal and Its Application to Counseling COUN 550 Saint Joseph College For Class # 3 Copyright © 2005 by R. Halstead. All rights reserved.
Copyright  2003 by Dr. Gallimore, Wright State University Department of Biomedical, Industrial Engineering & Human Factors Engineering Human Factors Research.
How Useful are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San.
Chapter 16 Data Analysis: Testing for Associations.
Going from data to analysis Dr. Nancy Mayo. Getting it right Research is about getting the right answer, not just an answer An answer is easy The right.
Social Tag Prediction Paul Heymann, Daniel Ramage, and Hector Garcia- Molina Stanford University SIGIR 2008.
The Topology of WordNet: some metrics Ann Devitt and Carl Vogel Computational Linguistics Group Trinity College Dublin, Ireland.
Chapter 6: Analyzing and Interpreting Quantitative Data
Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Chapter 15 The Chi-Square Statistic: Tests for Goodness of Fit and Independence PowerPoint Lecture Slides Essentials of Statistics for the Behavioral.
Chi Square & Correlation
From Words to Senses: A Case Study of Subjectivity Recognition Author: Fangzhong Su & Katja Markert (University of Leeds, UK) Source: COLING 2008 Reporter:
Lexical Affect Sensing: Are Affect Dictionaries Necessary to Analyze Affect? Alexander Osherenko, Elisabeth André University of Augsburg.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
Jump to first page Inferring Sample Findings to the Population and Testing for Differences.
Data Analysis. Qualitative vs. Quantitative Data collection methods can be roughly divided into two groups. It is essential to understand the difference.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Central Bank of Egypt Basic statistics. Central Bank of Egypt 2 Index I.Measures of Central Tendency II.Measures of variability of distribution III.Covariance.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
Review of Power of a Test
Inference about the slope parameter and correlation
Chapter 2: The Research Enterprise in Psychology
Causality, Null Hypothesis Testing, and Bivariate Analysis
Lecture8 Test forcomparison of proportion
Sentiment analysis algorithms and applications: A survey
Review 1. Describing variables.
Grey Sentiment Analysis

Test Validity.
Psychology 202a Advanced Psychological Statistics
Statistical significance & the Normal Curve
Statistical Evaluation
Chapter 10 Analyzing the Association Between Categorical Variables
PSYB07 Review Questions: Set 4
Analyzing the Association Between Categorical Variables
Statistics II: An Overview of Statistics
CHI SQUARE (χ2) Dangerous Curves Ahead!.
Presentation transcript:

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Wednesday 18 February 2009 The Languages of Emotion and Financial News Ann Devitt Khurshid Ahmad

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Sentiment and the Markets

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Sentiment and the Markets

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Global chipmakers, battling slower technology demand, are betting size matters as they pin their hopes for future growth on small and easy to carry mobile devices such as netbooks and smartphones. Specialised Language of Financial News Bloomberg.com, 18/2/09

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Global chipmakers, battling slower technology demand, are betting size matters as they pin their hopes for future growth on small and easy to carry mobile devices such as netbooks and smartphones. Specialised Language of Financial News Bloomberg.com, 18/2/09

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Global chipmakers, battling slower technology demand, are betting size matters as they pin their hopes for future growth on small and easy to carry mobile devices such as netbooks and smartphones. Specialised Language of Financial News Bloomberg.com, 18/2/09

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Global chipmakers, battling slower technology demand, are betting size matters as they pin their hopes for future growth on small and easy to carry mobile devices such as netbooks and smartphones. Specialised Language of Financial News Bloomberg.com, 18/2/09

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Sentiment and the Markets

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Sentiment and the Markets

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Engle Ng (1993) Asymmetry Curve

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Outline  Current psychological theory of emotion  Evaluation of lexical “emotion” resources  Corpus analysis of language of “emotion”

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Outline  Current psychological theory of emotion  Evaluation of lexical “emotion” resources  Corpus analysis of language of “emotion”

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Cognitive Theory of Emotion: Categorical Ekman (1975)

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Cognitive Theory of Emotion: Dimensions  Osgood / Russell  Evaluation  Activity  Potency  Mehabrian PAD  Pleasure  Activation  Dominance

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Cognitive Theory of Emotion Watson and Tellegen (1985)

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Outline  Current psychological theory of emotion  Evaluation of lexical “emotion” resources  Corpus analysis of language of “emotion”

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation SentiWordNet Whissel WNA General Inquirer

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Whissel Lexical Resource Evaluation Senti WordNet SentiWordNet WNA General Inquirer

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation Senti WordNet WordPositiveValNegativeVal Happy Sad  terms  Evaluation dimension scale:  Low average: Pos=0.18, Neg=0.23  More extreme Neg values  Error-prone: rude (pos 0.875), gladsome (neg 0.875)

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation General Inquirer SentiWordNet Whissel WNA General Inquirer

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation General Inquirer ECSTATIC Pos Pleasure SORROWFULNeg Pain  Hand-coded, content analysis basis  8641 terms  184 binary categories (including MAB dimensions)  Negative > Positive  Active > Passive  Strong > Weak

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation Whissel Dictionary of Affect SentiWordNet Whissel WNA General Inquirer

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation Whissel Dictionary of Affect WordEvalActivImag great disastrous  Corpus selection, hand-coded  8742 terms  Dimensional representation: 1-3 scale  Evaluation, Activation, Imagery

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation WordNet Affect SentiWordNet Whissel WNA General Inquirer

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation WordNet Affect WordBinaryFeatures Lonelinesscognitive state, emotion Happinesscognitive state, emotion  5432 terms  Domains of emotional experience  No Polarity  Short-term: Mood, Manner  Long-term: Attribute, Trait

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation Lexical Overlap  Are the lexica consistent?  Are they mutually exclusive?  Dice, Jaccard, Asymmetric coefficients

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Whissel Lexical Resource Evaluation Lexical Overlap SentiWordNet WNA General Inquirer

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin SentiWordNet Lexical Resource Evaluation Lexical Overlap General Inquirer Whissel WNA 1.Statistically significant agreement for Polarity Assignment (Chi square test) 2.Very weak correlation for activation features.

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin SentiWordNet Lexical Resource Evaluation Lexical Overlap General Inquirer Whissel WNA 1.Weak correlation of SWN with Whissel evaluation 2. No correlation with Whissel activation dimension 3. SWN positive negatively correlated with imageability

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin SentiWordNet Lexical Resource Evaluation Lexical Overlap WNA General Inquirer Whissel 1.SWN tends to negative for short term WNA features 2.SWN tends to positive for long- term WNA features

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Whissel Lexical Resource Evaluation Lexical Overlap SentiWordNet WNA General Inquirer

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation Lexical Overlap  WNA feature division: Short-termLong-term NegativePositive PhysicalCognitive More activeLess active InternalExternal Less abstractMore concrete

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Lexical Resource Evaluation Some conclusions  The lexica:  Are quite consistent  Can be used in combination  SentiWN: Largely unexplored territory

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Outline  Current psychological theory of emotion  Evaluation of lexical “emotion” resources  Corpus analysis of language of “emotion” General Language

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Emotion in General Language Corpus Study Aims  Does “emotion” constitute a distinct sub- language?  Is there a polarity bias in General Language? (the Polyanna Hypothesis of Boucher and Osgood)  What is the impact of using different lexica?

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Corpus Analysis The Data BNC  100 million words  Balanced, broad corpus

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Corpus Analysis Methodology Is emotion a distinct sub-language?  Examine distribution type  Examine distribution spread  Bootstrap sampling distribution

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Corpus Analysis Distribution Type  Zipfian: BNC=Emotion Lexica

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Corpus Analysis Distribution shape  Comparison of means: student t-test BNC ≠ Emotion Lexica (p<0.000)  Different sample means  5-30 times more frequent than gen. language  Assumptions of test?

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Corpus Analysis Bootstrap Sampling Distribution  Are sentiment-bearing terms a statistically distinct and highly frequent subset of English?  1000 random samples of terms from BNC  Sample size = size of sentiment lexicon  H0: Observed sample falls inside within 95% of bootstrap random sampling distribution of means

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Corpus Analysis Bootstrap Sampling Distribution  Are sentiment-bearing terms a statistically distinct and highly frequent subset of English?  For all lexica:  Mean term frequency of lexicon well outside 95%  Sentiment lexica are not representative of BNC (p<0.05)

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Corpus Analysis Sentiment Features  Is there a polarity bias in General Language?  Positive polarity bias  Statistically significant for all lexica (χ2 test of independence)

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Corpus Analysis Sentiment Features  Is there a polarity bias in General Language when you include intensity of polarity?  Positive polarity bias  Statistically significant for all lexica  χ2 = 158.5, df=1, p< for General Inquirer  χ2 = 63.6, df=1, p< for Whissel

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Corpus Analysis Some conclusions  Sentiment-bearing terms are a distinct subset of English  Positive polarity bias in BNC  General Inquirer and Whissel:  Low coverage and high frequency  SentiwordNet:  Wide coverage and much lower frequency

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Outline  Current psychological theory of emotion  Evaluation of lexical “emotion” resources  Corpus analysis of language of “emotion” Comparative

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis Aims  Examine affective term use  Identify statistically different distributions  Is there a dominant feature/polarity?

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis The Data  Financial Language  2 million words  On-line financial news: Reuters, CNN, Bloomberg Newspapers  General Language BNC  100 million words  Balanced, broad corpus

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis The Data  BNC sub-corpora  Imaginative written English  16 million words  Informative written English  70 million words

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis Methodology  Compare proportions of Sentiment Features  χ 2 Test of Independence  H 0 : π FinCorpus = π BNC

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis Methodology  Statistical significance of different proportion  χ 2 >  p >=  Features:  41 Lexicon Sentiment Features from 4 lexica  Frequency per million words

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis Financial Corpus  WRT Imaginative: More affective terms  WRT Informative: Many more affective terms  WRT BNC:  Dependent on feature type  Distributions are statistically distinct

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis Positive GI Features

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis Positive GI Features

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis Negative GI Features

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis Negative GI Features

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Comparative Corpus Analysis Negative GI Features

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Some conclusions  Lexical resources for sentiment are consistent  Financial news is a sub-language:  Affective content is statistically distinct relative to general language  Text polarity is asymmetric, positive skew  Different skews for different domains

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Something to think about  If different language varieties and domains have distinct use of sentiment terms and their own polarity bias:  Individual sentiment values are not informative  So what do we need??

Wednesday 18 February 2009 Ann Devitt Trinity College Dublin Thank You!