Towards Cross-Language Sentiment Analysis through Universal Star Ratings KMO 2012 Malissa Bal Erasmus University Rotterdam Flavius.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

ECG Signal processing (2)
An Introduction of Support Vector Machine
RCQ-ACS: RDF Chain Query Optimization Using an Ant Colony System WI 2012 Alexander Hogenboom Erasmus University Rotterdam Ewout Niewenhuijse.
Polarity Analysis of Texts using Discourse Structure CIKM 2011 Bas Heerschop Erasmus University Rotterdam Frank Goossen Erasmus.
An Introduction to the new course: Language and Literature A1.2.
Linking Named Entity in Tweets with Knowledge Base via User Interest Modeling Date : 2014/01/22 Author : Wei Shen, Jianyong Wang, Ping Luo, Min Wang Source.
Learning Semantic Information Extraction Rules from News The Dutch-Belgian Database Day 2013 (DBDBD 2013) Frederik Hogenboom Erasmus.
An Introduction of Support Vector Machine
Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle
A Linguistic Approach for Semantic Web Service Discovery International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) July 13, 2012 Jordy.
Exploiting Discourse Structure for Sentiment Analysis of Text OR 2013 Alexander Hogenboom In collaboration with Flavius Frasincar, Uzay Kaymak, and Franciska.
Connecting Customer Relationship Management Systems to Social Networks 7th International Conference on Knowledge Management, Services, and Cloud Computing.
Sentiment Analysis An Overview of Concepts and Selected Techniques.
Determining Negation Scope and Strength in Sentiment Analysis SMC 2011 Paul van Iterson Erasmus School of Economics Erasmus University Rotterdam
Exploiting Emoticons in Sentiment Analysis SAC 2013 Daniella Bal Erasmus University Rotterdam Flavius Frasincar Erasmus University.
Joint Sentiment/Topic Model for Sentiment Analysis Chenghua Lin & Yulan He CIKM09.
Software Quality Ranking: Bringing Order to Software Modules in Testing Fei Xing Michael R. Lyu Ping Guo.
Erasmus University Rotterdam Frederik HogenboomEconometric Institute School of Economics Flavius Frasincar.
Combining Prosodic and Text Features for Segmentation of Mandarin Broadcast News Gina-Anne Levow University of Chicago SIGHAN July 25, 2004.
April 22, Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Doerre, Peter Gerstl, Roland Seiffert IBM Germany, August 1999 Presenter:
RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms BNAIC 2009 Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak Erasmus.
Semantic text features from small world graphs Jure Leskovec, IJS + CMU John Shawe-Taylor, Southampton.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
 Manmatha MetaSearch R. Manmatha, Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts, Amherst.
Automatically Annotating Web Pages Using Google Rich Snippets 11th Dutch-Belgian Information Retrieval Workshop (DIR 2011) February 4, 2011 Frederik Hogenboom.
Optimizing RDF Chain Queries using Genetic Algorithms DBDBD 2010 Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak Erasmus University.
Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011) September.
An Overview of Event Extraction from Text Workhop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE'11) October 23,
Lecture outline Support vector machines. Support Vector Machines Find a linear hyperplane (decision boundary) that will separate the data.
Analyzing Sentiment in a Large Set of Web Data while Accounting for Negation AWIC 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam.
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Sentiment Analysis with a Multilingual Pipeline 12th International Conference on Web Information System Engineering (WISE 2011) October 13, 2011 Daniëlla.
Towards Ad-hoc Situation Determination Graham Thomson, Paddy Nixon and Sotirios Terzis.
(ACM KDD 09’) Prem Melville, Wojciech Gryc, Richard D. Lawrence
Erasmus University Rotterdam Introduction Nowadays, emerging news on economic events such as acquisitions has a substantial impact on the financial markets.
Erasmus University Rotterdam Introduction With the vast amount of information available on the Web, there is an increasing need to structure Web data in.
A News-Based Approach for Computing Historical Value-at-Risk International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) Frederik Hogenboom.
A New Approach for Cross- Language Plagiarism Analysis Rafael Corezola Pereira, Viviane P. Moreira, and Renata Galante Universidade Federal do Rio Grande.
Welcome to English III Wednesday Week Determine the meaning of unknown words using textual clues.
Sentiment Analysis of Social Media Content using N-Gram Graphs Authors: Fotis Aisopos, George Papadakis, Theordora Varvarigou Presenter: Konstantinos Tserpes.
COMPUTER-ASSISTED PLAGIARISM DETECTION PRESENTER: CSCI 6530 STUDENT.
Enhanced Infrastructure for Creation & Collection of Translation Resources Zhiyi Song, Stephanie Strassel (speaker), Gary Krug, Kazuaki Maeda.
A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source:
1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Instance Filtering for Entity Recognition Advisor : Dr.
On Learning Parsimonious Models for Extracting Consumer Opinions International Conference on System Sciences 2005 Xue Bai and Rema Padman The John Heinz.
*Erasmus University Rotterdam P.O. Box 1738, NL-3000 DR Rotterdam, the Netherlands † Teezir BV Wilhelminapark 46, NL-3581 NL, Utrecht, the Netherlands.
Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm Chen, Yi-wen( 陳憶文 ) Graduate Institute of Computer Science & Information Engineering.
Introduction Use machine learning and various classifying techniques to be able to create an algorithm that can decipher between spam and ham s. .
Indirect Supervision Protocols for Learning in Natural Language Processing II. Learning by Inventing Binary Labels This work is supported by DARPA funding.
How Useful are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San.
Semantics-Based News Recommendation with SF-IDF+ International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013) June 13, 2013 Marnix Moerland.
Erasmus University Rotterdam Introduction Content-based news recommendation is traditionally performed using the cosine similarity and TF-IDF weighting.
Lexico-semantic Patterns for Information Extraction from Text The International Conference on Operations Research 2013 (OR 2013) Frederik Hogenboom
Information Transfer through Online Summarizing and Translation Technology Sanja Seljan*, Ksenija Klasnić**, Mara Stojanac*, Barbara Pešorda*, Nives Mikelić.
Classification Course web page: vision.cis.udel.edu/~cv May 14, 2003  Lecture 34.
Semantics-Based News Recommendation International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012) June 14, 2012 Michel Capelle
A Generation Model to Unify Topic Relevance and Lexicon-based Sentiment for Opinion Retrieval Min Zhang, Xinyao Ye Tsinghua University SIGIR
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
CS791 - Technologies of Google Spring A Web­based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
A Survey on Automatic Text Summarization Dipanjan Das André F. T. Martins Tolga Çekiç
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Jonatas Wehrmann, Willian Becker, Henry E. L. Cagnini, and Rodrigo C
Kim Schouten, Flavius Frasincar, and Rommert Dekker
Aspect-Based Sentiment Analysis Using Lexico-Semantic Patterns
Erasmus University Rotterdam
A Unifying View on Instance Selection
iSRD Spam Review Detection with Imbalanced Data Distributions
Presentation transcript:

Towards Cross-Language Sentiment Analysis through Universal Star Ratings KMO 2012 Malissa Bal Erasmus University Rotterdam Flavius Frasincar Erasmus University Rotterdam Daniella Bal Erasmus University Rotterdam Alexander Hogenboom Erasmus University Rotterdam July 12, 2012

Introduction (1) The Web offers an overwhelming amount of multi- lingual textual data, containing traces of sentiment Information monitoring tools for tracking sentiment are crucial for today’s businesses KMO

Introduction (2) Language-specific sentiment analysis typically produces sentiment scores Sentiment scores are incomparable across languages as they originate from language-specific phenomena Star ratings are universal classifications of people’s intended sentiment and thus enable comparison As star ratings are not always available, we need a mapping from language-specific scores of sentiment conveyed by words to universal star ratings KMO

Cross-Language Sentiment Analysis (1) Sentiment analysis is typically focused on determining the polarity of natural language text Applications exist in summarizing reviews, determining a general mood (consumer confidence, politics) State-of-the-art approaches classify polarity of natural language text by analyzing vector representations using, e.g., machine learning techniques Alternative approaches are lexicon-based, which renders them robust across domains and texts and enables linguistic analysis at a deeper level Challenge: other languages require other approaches KMO

Cross-Language Sentiment Analysis (2) Typical solutions: –Machine translation into reference language –Lexicon creation for new languages –Construction of new sentiment analysis frameworks Most approaches yield incomparable sentiment scores Sentiment-carrying text is an artifact of the use of language rather than a culture-free analytical tool Capturing the relation between the use of language and universal star ratings reflecting intended sentiment is crucial for cross-language sentiment analysis KMO

From Sentiment Scores to Star Ratings (1) Assumptions: –Star ratings are universal classifications of intended sentiment –Classes are defined on an ordinal scale –Higher language-specific sentiment scores are associated with higher star ratings We model the relation between sentiment scores and star ratings as a monotonically increasing step function We thus construct language-specific sentiment maps, where subsets of texts with (largely) similar star ratings are separated by boundaries KMO

From Sentiment Scores to Star Ratings (2) The challenge is to find optimal set of boundaries, minimizing the total number of misclassifications Boundaries must be non-overlapping and ordered Non-trivial optimization can be performed by means of: –Greedy algorithms –Machine learning –Brute force KMO

Evaluation (1) Exploration of how sentiment scores can be converted into universal star ratings and how such mappings differ across languages (10-fold cross-validated) Data set of short movie reviews in Dutch (1,759) and English (46,315), rated by their authors, with normally distributed ratings over five star classes Sentiment analysis by means of an existing lexicon- based multi-lingual framework, yielding language- specific sentiment scores in the interval [-1.5, 1.5] Paired observations of sentiment scores and star ratings can be used to construct sentiment maps by means of a brute force approach (step size 0.1) KMO

Evaluation (2) KMO

Evaluation (3) Sentiment map for Dutch: Sentiment map for English: Dutch sentiment map yields a star rating classification accuracy for Dutch documents of about 20% Star rating classification accuracy for English documents equals about 40% when using the English sentiment map KMO

Conclusions We propose to compare the sentiment conveyed by words in unstructured text across languages through universal star ratings for intended sentiment The way in which words in natural language text reveal people's intended sentiment differs across languages Language-specific sentiment scores can separate universal classes of intended sentiment from one another only to a limited extent KMO

Future Work Allow for a non-linear relation between sentiment scores and star ratings Drop the monotonicity constraint Assess other proxies for star ratings, e.g., emoticons or rhetorical structure KMO

Questions? Alexander Hogenboom Erasmus School of Economics Erasmus University Rotterdam P.O. Box 1738, NL-3000 DR Rotterdam, the Netherlands KMO