Every Term Has Sentiment: Learning from Emoticon Evidences for Chinese Microblog Sentiment Analysis Jiang Fei State Key Laboratory.

Presentation transcript:

Every Term Has Sentiment: Learning from Emoticon Evidences for Chinese Microblog Sentiment Analysis Jiang Fei State Key Laboratory of Intelligent Technology and Systems Department of Computer Science and Technology Tsinghua University

Outline
- Introduction
- Main work
  - Sentiment lexicon construction
  - Feature extraction
  - Classification
- Experiments
- Conclusion
- Future work

Introduction
Objective
- Automatic sentiment lexicon construction.
- Document-level classification: positive, negative and neutral.
Existing problems and solutions
- Limited coverage of human-constructed sentiment lexicons (solution: automatic lexicon construction).
- Lack of labeled data (solution: use emoticon signals, or noisy data provided by some websites).
Our contribution
- No need for a large amount of neutral corpora
- Using proper emoticons
- Every word has potential sentiment
- Multiple views of features

Main work
- Sentiment lexicon construction based on emoticons
- Feature extraction based on sentiment lexicon
- Sentiment classification

Investigation on emoticons
Statistics of quantity distribution
- Posts with emoticons: ~32%
- With one emoticon: ~18%
- With more than one emoticon: ~14%

Investigation on emoticons Statistics of sentiment distribution

Approach I: Label Propagation
Based on our previous work: Cui et al., "Emotion tokens: bridging the gap among multilingual twitter sentiment analysis", AIRS'11 (2011)

Approach II: Frequency Statistics for Sufficient Corpus
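The formula for this approach did not survive in the transcript. As a rough illustration only, a frequency-based lexicon could count how often each term co-occurs with positive versus negative emoticons. The seed emoticon sets, the function name `build_lexicon`, the skipping of mixed-signal posts, and the add-one smoothing below are all assumptions, not the authors' definitions.

```python
from collections import Counter

# Hypothetical emoticon seeds; the slides do not list the emoticons actually used.
POSITIVE_EMOTICONS = {"[哈哈]", "[赞]"}
NEGATIVE_EMOTICONS = {"[鄙视]", "[怒]"}


def build_lexicon(posts):
    """posts: iterable of token lists (words and emoticons already segmented).

    Returns {term: (positive_score, negative_score)}.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens in posts:
        has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
        has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
        if has_pos == has_neg:
            continue  # no emoticon signal, or a mixed one: skip (assumption)
        target = pos_counts if has_pos else neg_counts
        for term in set(tokens):  # count each term once per post
            target[term] += 1

    lexicon = {}
    for term in set(pos_counts) | set(neg_counts):
        p, n = pos_counts[term], neg_counts[term]
        pos_score = (p + 1) / (p + n + 2)  # add-one smoothing (assumption)
        lexicon[term] = (pos_score, 1 - pos_score)
    return lexicon
```

Under this sketch every term that appears in an emoticon-bearing post receives a score pair, which is one way to read "every word has potential sentiment".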

OOV/phrase extraction
- Word segmentation
- n-gram: concatenate adjacent words
- To reduce computational complexity, n <= 4
- Compute two metrics
- Motivation: 说真的,这款手机太次了,不给力! ("Honestly, this phone is really poor, it just doesn't deliver!")
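The candidate-generation step can be sketched as follows: adjacent segmented words are concatenated into n-grams of up to four words and counted. The function name `candidate_phrases` is hypothetical, and the two metrics used to filter candidates are not given in the transcript, so they are not implemented here.

```python
from collections import Counter


def candidate_phrases(segmented_posts, max_n=4):
    """Count candidate OOV terms/phrases built from adjacent words.

    segmented_posts: iterable of word lists produced by a word segmenter.
    max_n: maximum number of adjacent words to concatenate (n <= 4 on the slide).
    """
    counts = Counter()
    for words in segmented_posts:
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                counts["".join(words[i:i + n])] += 1
    return counts
```

For instance, if a segmenter splits 不给力 into single characters, the trigram recovers it as one candidate term.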

Sentiment lexicon construction 60,000 words/OOVs/phrases/emoticons in total

Feature extraction
Microblog structure features
- Number of mention labels (@user)
- Number of URLs
- Number of hashtags
- …
Sentence structure features
- Number of ";"
- Number of "%"
- Existence of continuous serial numbers
- …
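A minimal sketch of how the countable structure features could be extracted with regular expressions. The patterns are assumptions: mentions are taken to be `@name`, hashtags are taken to follow the `#topic#` form common on Chinese microblogs, and both half-width and full-width punctuation are counted. The serial-number feature is left out because the slide does not define it.

```python
import re


def structure_features(text):
    """Return simple count features for one microblog post."""
    return {
        "mentions": len(re.findall(r"@[\w\-]+", text)),
        "urls": len(re.findall(r"https?://\S+", text)),
        "hashtags": len(re.findall(r"#[^#\s]+#", text)),
        # Count both half-width and full-width forms (assumption).
        "semicolons": text.count(";") + text.count("；"),
        "percent_signs": text.count("%") + text.count("％"),
    }
```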

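Feature extraction
- Word segmentation/part-of-speech tagging
- Negations
  - Constructed a negation list
  - A negation word modifies the first v/a/p (verb/adjective/preposition) after it
  - Invalidation window
- Greedy longest match
Example
- Original post: 这位先生,您真是站着说话不腰疼 [鄙视] (roughly: "Sir, it is easy for you to talk when you are not the one bearing the burden [contempt emoticon]")
- Segmentation and POS tagging: 这/rzv 位/q 先生/noun ,/wd 您/rr 真/d 是/vshi 站/n 着/uzhe 说/v 话/n 不/d 腰/n 疼/v [鄙视]
- Candidate terms: 真、是、站、着、说、话、不、腰、疼、您、真是、站着、说话、腰疼、这位、先生
- Result: 这位, 先生, 您, 真是, 站着, 说话, 腰疼 (-1), [鄙视]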
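The example on this slide can be reproduced with a small sketch of greedy longest match followed by negation handling. Several details are assumptions: the negation list, the `negatable` set standing in for a v/a/p part-of-speech check, the reading of "invalidation window" as a fixed number of tokens after which a negation expires, and the function names themselves.

```python
NEGATIONS = {"不", "没"}  # hypothetical negation list


def greedy_longest_match(chars, lexicon, max_len=4):
    """Merge single characters into the longest lexicon entries, left to right."""
    tokens, i = [], 0
    while i < len(chars):
        for n in range(min(max_len, len(chars) - i), 0, -1):
            piece = "".join(chars[i:i + n])
            if n == 1 or piece in lexicon:  # fall back to a single character
                tokens.append(piece)
                i += n
                break
    return tokens


def apply_negation(tokens, negatable, window=3):
    """Return (token, polarity) pairs.

    A negation word is dropped and flips the first negatable token that
    follows it within `window` tokens; otherwise it expires.
    """
    result, pending = [], 0
    for tok in tokens:
        if tok in NEGATIONS:
            pending = window
            continue
        if pending and tok in negatable:
            result.append((tok, -1))
            pending = 0
        else:
            result.append((tok, 1))
            pending = max(0, pending - 1)
    return result
```

With the lexicon {这位, 先生, 真是, 站着, 说话, 腰疼} and 腰疼 marked as negatable, the sentence 这位先生您真是站着说话不腰疼 yields the same term sequence as the slide, with 腰疼 carrying polarity -1.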

Feature extraction
- Sentiment lexicon features: (Maximum, Product) of (positive, negative) score of words/phrases
- Emoticon features: (Maximum, Product) of (positive, negative) score of emoticons
- MDA (modified by degree adverb) features: (Maximum, Product) of (positive, negative) score of MDA
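Each feature group combines two aggregations with two polarities, giving four values per group. A minimal sketch, assuming a lexicon that maps each term to a (positive, negative) score pair; the function name, the skipping of unknown terms, and the all-zero fallback are assumptions.

```python
from math import prod


def lexicon_features(terms, lexicon):
    """Return [max_pos, prod_pos, max_neg, prod_neg] for one post.

    terms: tokens of the post (words/phrases, emoticons, or MDA terms).
    lexicon: {term: (positive_score, negative_score)}.
    """
    scores = [lexicon[t] for t in terms if t in lexicon]
    if not scores:
        return [0.0, 0.0, 0.0, 0.0]  # no evidence in the post (assumption)
    pos = [p for p, _ in scores]
    neg = [n for _, n in scores]
    return [max(pos), prod(pos), max(neg), prod(neg)]
```

The same function can be applied separately to the word/phrase, emoticon, and MDA term lists to build the three feature groups.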

Sentiment classification with SVM
- One-stage three-class classification (libsvm)
- Two-stage two-class classification (hierarchical)
  - neutral vs. non-neutral
  - positive vs. negative
- Two-stage two-class classification (parallel)
  - positive vs. non-positive
  - negative vs. non-negative
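The two two-stage schemes differ only in how binary decisions are combined. The sketch below takes the binary classifiers as plain callables, so any SVM implementation could be plugged in. The function names and, in particular, the tie-breaking rule for the parallel scheme are assumptions; the slides do not say how conflicting outputs were resolved.

```python
def hierarchical(is_non_neutral, is_positive, x):
    """Stage 1: neutral vs. non-neutral. Stage 2: positive vs. negative."""
    if not is_non_neutral(x):
        return "neutral"
    return "positive" if is_positive(x) else "negative"


def parallel(pos_score, neg_score, x):
    """Run positive-vs-rest and negative-vs-rest classifiers side by side.

    pos_score / neg_score return signed decision values; a value > 0 means
    the sample is judged positive / negative respectively.
    """
    p, n = pos_score(x), neg_score(x)
    if p > 0 and n <= 0:
        return "positive"
    if n > 0 and p <= 0:
        return "negative"
    if p <= 0 and n <= 0:
        return "neutral"
    # Both classifiers fired: pick the larger margin (assumption).
    return "positive" if p >= n else "negative"
```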

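Experiments – Lexicon construction
- Define lexicon error rate as [formula not preserved in this transcript]
- Explanation: the definition involves the frequency of a word and the degree of sentiment bias of a word.
- Labeled words from 《学生褒贬义词典》 (Positive and Negative Words Dictionary for Students)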

Experiments – Lexicon construction

Experiments – Sentiment classification
Dataset
- NLP&CC 2013 evaluation, task II, sample data
Preprocess
- positive (happiness, like)
- negative (sadness, anger, disgust)
- neutral (none)
- fear and surprise discarded
Size
- 968 for each class, a balanced set
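The preprocessing amounts to a label mapping followed by class balancing. A minimal sketch: the mapping follows the slide, while the balancing by seeded random downsampling and the function names are assumptions, since the slides do not say how the 968 samples per class were chosen.

```python
import random

EMOTION_TO_POLARITY = {
    "happiness": "positive",
    "like": "positive",
    "sadness": "negative",
    "anger": "negative",
    "disgust": "negative",
    "none": "neutral",
}  # "fear" and "surprise" are absent on purpose: those samples are discarded


def to_polarity(samples):
    """samples: list of (text, emotion). Drops emotions without a mapping."""
    return [(text, EMOTION_TO_POLARITY[e])
            for text, e in samples if e in EMOTION_TO_POLARITY]


def balance(samples, per_class, seed=0):
    """Randomly downsample each class to at most per_class items (assumption)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in samples:
        by_class.setdefault(label, []).append((text, label))
    balanced = []
    for items in by_class.values():
        balanced.extend(rng.sample(items, min(per_class, len(items))))
    return balanced
```

Calling `balance(to_polarity(samples), per_class=968)` would give a set shaped like the one described on the slide, provided each class has at least 968 samples.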

Experiments – Sentiment classification

Experiments – Sentiment classification
- Method Ⅰ: our lexicon replaced with "情感词汇本体" (the affective lexicon ontology; Xu et al., 2008)
- Method Ⅱ: Barbosa and Feng [2010]
- Our model performs almost the best (within 0.1%) in the related task of COAE 2013

Conclusion
Sentiment lexicon construction
- Different strength of emoticon signals
- Every term has potential sentiment
- No need for a large amount of neutral corpus
Sentiment features
- Different, multiple views of a microblog's characteristics

Future work
- A large amount of noisy neutral corpora may help, e.g. the output of the current classifier
- Syntactic/semantic features
- Relations between words (e.g. skip-gram)

References
- Barbosa, L., Feng, J.: Robust sentiment detection on twitter from biased and noisy data. In: Coling 2010: Posters. pp. 36–44. Beijing, China (2010)
- Cui, A., Zhang, M., Liu, Y., Ma, S.: Emotion tokens: bridging the gap among multilingual twitter sentiment analysis. In: Proceedings of the 7th Asia conference on Information Retrieval Technology. pp. 238–249. AIRS'11 (2011)
- Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of LREC (2010)
- Zhang, W., Liu, J., Guo, X.: Positive and Negative Words Dictionary for Students. Encyclopedia of China Publishing House (2004)
- Chang, C.C., Lin, C.J.: Libsvm: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (May 2011)
- Xu, L., Lin, H., Pan, Y., Ren, H., Chen, J.: Constructing the affective lexicon ontology. Journal of the China Society for Scientific and Technical Information 27(2), 180–185 (2008)

Thanks!