Textual sentiment in finance


Textual sentiment in finance
- Text sources used in textual sentiment
- Some sentiment methods
David Ling, 17-7-2017

Textual sentiment in finance - Sources
Many previous works (Kearney and Liu list about 40 papers)
Sources for stock prediction: 3 main types (Kearney and Liu 2014)
- Corporation-expressed sentiment
  - Corporate annual reports, earnings press releases, earnings conference calls
  - Li (2006), Feldman et al. (2008), Li (2010), Loughran and McDonald (2011), …
  - Twitter (Ranco et al. 2015)
- Media-expressed sentiment
  - Newspapers: the Wall Street Journal and the New York Times (Tetlock 2007)
  - Reuters NewsScope Sentiment Engine
- Internet-expressed sentiment
  - Messages posted on Yahoo! Finance (Antweiler 2004)

Textual sentiment in finance - Sources
Comments
- Corporate reports:
  - Directly related to the firm (firm-specific)
  - Not ideal for time-series modeling (low frequency: quarterly or annual)
  - Available from HKEX, WRDS
- News:
  - Usually hindsight rather than foresight
  - More frequent, suitable for weekly or daily prediction
- Online comments and messages:
  - Little new information (incremental to public news)
  - Noisy and less reliable

Textual sentiment in finance - Sources
Hong Kong company annual reports are available on HKEX
- Hong Kong is one of the world's financial centers
- The data can be used for initial trials
- Easy to acquire and understand; the companies are more familiar (compared to WRDS)
- Annual report links have been extracted for all companies (urllib + BeautifulSoup)
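The link-extraction step could look roughly like the sketch below. It is an illustration only: the slide names urllib and BeautifulSoup, but this version swaps in the standard library's html.parser so that it runs without third-party packages. The HTML snippet, the example.com base URL and the rule "keep links ending in .pdf" are all assumptions, not the actual HKEX page layout.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect href targets of <a> tags that point to PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative paths against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical listing page. A real page would first be fetched, e.g. with
# urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<ul>
  <li><a href="/docs/2016/annual_report.pdf">Annual Report 2016</a></li>
  <li><a href="/news/notice.htm">Notice</a></li>
  <li><a href="https://example.com/docs/2015/AR.PDF">Annual Report 2015</a></li>
</ul>
"""

links = extract_pdf_links(SAMPLE_HTML, "https://example.com/listing")
for link in links:
    print(link)
```

Keeping the parsing in a function that takes an HTML string makes it easy to check against saved pages before running it over every listed company.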

Textual sentiment in finance - Methods
- Dictionary-based
  - Detects keywords from a user-defined word list (bag-of-words)
  - Commonly used dictionaries:
    - Harvard psychosociological dictionary
    - Loughran and McDonald’s positive and negative financial word lists (Loughran and McDonald 2011)
    - DICTION (software)
- Machine learning
  - Neural network (Reuters NewsScope Sentiment Engine)
  - SVM (Ranco et al. 2015)
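A minimal sketch of the dictionary-based idea, assuming two tiny made-up word lists (real dictionaries contain hundreds or thousands of entries) and a simple net-tone score, (positive - negative) / total words. That score is one common choice, not something specified in the slides.

```python
import re
from collections import Counter

# Hypothetical miniature word lists, for illustration only.
POSITIVE = {"gain", "improve", "strong", "profit"}
NEGATIVE = {"loss", "decline", "weak", "impairment"}


def dictionary_sentiment(text):
    """Return (positive count, negative count, net tone) for a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    total = len(tokens)
    # Net tone: share of positive words minus share of negative words.
    tone = (pos - neg) / total if total else 0.0
    return pos, neg, tone


pos, neg, tone = dictionary_sentiment(
    "Profit rose on strong sales despite a small impairment."
)
print(pos, neg, round(tone, 3))
```

Because the method only counts matches, the choice of word list drives the result.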

Dictionary-based: Harvard psychosociological dictionary
- Available as HTML on the official website
- Example categories: positive, negative, strong, weak, active, passive
- Entries are tagged with part of speech: noun, verb, adjective

Dictionary-based: Loughran and McDonald’s positive and negative financial word lists
- Built from a financial point of view (more accurate for financial text)
- Categories: positive, negative, uncertainty, litigious, strong modal, and weak modal
- Word lists were downloaded from the official site
- WRDS provides word-list counts

Dictionary-based: DICTION
Dictionary-based text-analysis software using 33 dictionaries
- Certainty – resoluteness, inflexibility, and completeness and a tendency to speak ex cathedra.
- Activity – movement, change, the implementation of ideas and the avoidance of inertia.
- Optimism – endorsing some person, group, concept or event, or highlighting their positive entailments.
- Realism – describing tangible, immediate, recognizable matters that affect people’s everyday lives.
- Commonality – highlighting the agreed-upon values of a group and rejecting idiosyncratic modes of engagement.

ML: The Effects of Twitter Sentiment on Stock Price Returns (Ranco et al. 2015)
- Supervised learning using a Support Vector Machine (SVM)
- 15 months of Twitter volume and sentiment data about 30 listed companies (e.g. McDonald’s, Visa, Coca-Cola)
- Over 100,000 tweets were labeled by 10 financial experts with three sentiment labels: negative, neutral, or positive
- Preprocessing: tokenization, lemmatization, n-gram construction
  - Lemmatization examples: had -> have, takes -> take
- Bag-of-words with unigrams and bigrams as the feature set, weighted by Term Frequency Inverse Document Frequency (TFIDF)
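The preprocessing chain (tokenization, lemmatization, n-gram construction) can be sketched as follows. This is not the paper's pipeline: the regular-expression tokenizer and the two-entry lemma table are stand-ins, chosen so the example runs without an NLP library, and the sample sentence is made up.

```python
import re

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"had": "have", "takes": "take"}


def preprocess(text):
    """Lowercase, split into word tokens, and map each token to its lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens]


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("Visa takes the lead")
features = ngrams(tokens, 1) + ngrams(tokens, 2)
print(features)
```

The combined unigram and bigram list is the bag-of-words feature set that is then weighted by TFIDF.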

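ML: The Effects of Twitter Sentiment on Stock Price Returns (Ranco et al. 2015)
Corpus:
- Document 1: “cat sat mat”
- Document 2: “cat hate cat”
Unigram and bigram feature vector, with term frequency
  $\mathrm{TF} = \frac{\text{count}}{\text{document length}}$
(document length counted in n-grams of the same order: 3 unigrams or 2 bigrams per document), weighted by
  $\mathrm{IDF} = \log\left(\frac{\text{no. of documents}}{\text{no. of documents with that term}}\right)$
- E.g. IDF(cat) = log(2/2) = 0 (no extra information)
- E.g. IDF(hate) = log(2/1) ≈ 0.3 (base-10 logarithm)
Term frequency:
              cat   sat   mat   hate  cat sat  sat mat  cat hate  hate cat
  Document 1  1/3   1/3   1/3   0     1/2      1/2      0         0
  Document 2  2/3   0     0     1/3   0        0        1/2       1/2
TFIDF:
              cat   sat   mat   hate  cat sat  sat mat  cat hate  hate cat
  Document 1  0     0.1   0.1   0     0.15     0.15     0         0
  Document 2  0     0     0     0.1   0        0        0.15      0.15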
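The worked example can be checked with a few lines of Python. The sketch assumes a base-10 logarithm (implied by log(2/1) ≈ 0.3) and normalizes unigram counts by the number of unigrams and bigram counts by the number of bigrams (implied by the 1/3 and 1/2 entries). Function and variable names are illustrative.

```python
import math

docs = {"Document 1": "cat sat mat", "Document 2": "cat hate cat"}
vocab = ["cat", "sat", "mat", "hate",
         "cat sat", "sat mat", "cat hate", "hate cat"]


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_freq(tokens):
    """TF per feature: count divided by the number of n-grams of that order."""
    tf = {}
    for n in (1, 2):
        grams = ngrams(tokens, n)
        for g in grams:
            tf[g] = tf.get(g, 0) + 1 / len(grams)
    return tf


tfs = {name: term_freq(text.split()) for name, text in docs.items()}

# IDF = log10(number of documents / number of documents containing the term)
idf = {t: math.log10(len(docs) / sum(t in tf for tf in tfs.values()))
       for t in vocab}

tfidf = {name: [round(tf.get(t, 0) * idf[t], 2) for t in vocab]
         for name, tf in tfs.items()}

for name, vector in tfidf.items():
    print(name, vector)
```

The term "cat" appears in both documents, so its IDF is zero and it drops out of both vectors even though it is the most frequent word.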

ML: The Effects of Twitter Sentiment on Stock Price Returns (Ranco et al. 2015)
- Many other TFIDF variants exist
- Term weighting “has an enormous impact on the effectiveness of a retrieval system” (Jurafsky and Martin 2009, p. 771)
- Notation: $n_t$ is the number of documents containing the term; $N$ is the total number of documents in the corpus
- [Table of TFIDF weighting variants, from Wikipedia]

ML: The Effects of Twitter Sentiment on Stock Price Returns (Ranco et al. 2015)
Feature vectors:
  X1 = [0, 0.1, 0.1, 0, 0.15, 0.15, 0, 0]
  X2 = [0, 0, 0, 0.1, 0, 0, 0.15, 0.15]
SVM loss function (find the weight vector $W$ that minimizes it):
  $\phi(W) = \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\ 1 - y_i (W \cdot X_i - \text{bias})\bigr) + \text{regularization term}$
- $n$: number of data points; $y_i = \pm 1$ (belongs to the class or not)
SVM probability (one against all):
  $p(\text{class} \mid X) = \frac{\exp(W_{\text{class}} \cdot X + \text{bias}_{\text{class}})}{\sum_{\text{class}'} \exp(W_{\text{class}'} \cdot X + \text{bias}_{\text{class}'})}$
The loss is roughly the distance of a misclassified point from the separating line.
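To make the two formulas concrete, here is a small sketch that evaluates them on the feature vectors X1 and X2. Several things are assumptions for illustration: the regularization term is taken to be an L2 penalty lam * ||W||^2, the labels y = [+1, -1] are arbitrary, and the weight vectors are hand-picked rather than trained.

```python
import math

X1 = [0, 0.1, 0.1, 0, 0.15, 0.15, 0, 0]
X2 = [0, 0, 0, 0.1, 0, 0, 0.15, 0.15]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hinge_loss(W, bias, X, y, lam=0.0):
    """Mean hinge loss plus an assumed L2 penalty lam * ||W||^2."""
    data_term = sum(
        max(0.0, 1 - yi * (dot(W, xi) - bias)) for xi, yi in zip(X, y)
    ) / len(X)
    return data_term + lam * dot(W, W)


def class_probabilities(weights, biases, x):
    """Softmax over the per-class scores W_class . x + bias_class."""
    scores = {c: dot(w, x) + biases[c] for c, w in weights.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}


X, y = [X1, X2], [1, -1]  # hypothetical labels

# With all-zero weights every point sits inside the margin: loss = 1.
loss_zero = hinge_loss([0] * 8, 0.0, X, y)

# Hand-picked weights that separate the two points with margin 1: loss = 0.
W_sep = [0, 5, 5, -10, 0, 0, 0, 0]
loss_sep = hinge_loss(W_sep, 0.0, X, y)

# One weight vector per class (one against all); values are hypothetical.
weights = {
    "positive": W_sep,
    "neutral": [0] * 8,
    "negative": [-w for w in W_sep],
}
biases = {"positive": 0.0, "neutral": 0.0, "negative": 0.0}
probs = class_probabilities(weights, biases, X1)
print(loss_zero, loss_sep, probs)
```

The hinge term is zero for a point only once it is on the correct side with a margin of at least 1, which is why the all-zero weight vector still incurs a loss of 1 per point.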

ML: Reuters NewsScope Sentiment Engine
- Commercial product (not free)
- 3-layer neural network (the features it uses could not be determined at the time of writing)
- Output: positive, negative, or neutral
- Each news item is tagged with the related company
- Uses Reuters global news and scans across 35,000 companies in real time

References
- Kearney, C. and Liu, S., Textual sentiment in finance: A survey of methods and models, International Review of Financial Analysis, Volume 33, Pages 171-185 (May 2014)
- Loughran, T. and McDonald, B., When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks, The Journal of Finance, Vol. LXVI, No. 1 (2011)
- Ranco, G., Aleksovski, D., Caldarelli, G., Grčar, M. and Mozetič, I., The Effects of Twitter Sentiment on Stock Price Returns, PLoS ONE, 10(9): e0138441 (2015)