VISA: A VIsual Sentiment Analysis System Sept. 2012 Dongxu Duan 1 Weihong Qian 1 Shimei Pan 2 Lei Shi 3 Chuang Lin 4 1 IBM Research — China 2 IBM T. J.

Presentation transcript:

VISA: A VIsual Sentiment Analysis System
Sept. 2012
Dongxu Duan (1), Weihong Qian (1), Shimei Pan (2), Lei Shi (3), Chuang Lin (4)
(1) IBM Research - China; (2) IBM T. J. Watson Research Center; (3) Institute of Software, Chinese Academy of Sciences; (4) Tsinghua University

What is Sentiment Analysis
– Sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials (from Wikipedia)
– A survey of sentiment analysis work by Pang and Lee in 2008: “Opinion mining and sentiment analysis”, cited 1189 times in Google Scholar, including 326 references
– Probably the earliest study:

Motivation
The truth: sentiment analysis is becoming ever more important
– Corporate
 * Brand analysis, sales campaign design, etc.
 * Crisis relationship management
– Government
 * As we all know...
Observations:
– Sentiment analysis technologies are becoming deeper and more versatile:
 * Aspect-oriented analysis, domain-specific lexicon expansion, MT technology
– Average users still rely on rather simple sentiment results
 * It is hard for them (even domain experts) to understand sophisticated SA results
– There is a big gap, and huge potential, for sentiment visualization (visual opinion mining)

Agenda
– Related Works
– Research Problem and Challenges
– Sentiment-Tuple based Data Model
– VISA System Framework
– Visualization Optimizations
– Cases
– User Studies
– Summary

Basic Sentiment Representation
– Raw text/table or simple visualization

Brand Association Map

COBRA (COrporate Brand and Reputation Analysis) Behal et al. (HCI 2009)

Opinion Observer Liu et al. (KDD 2005); Liu et al. (IW3C2 2005)

Visual Sentiment Analysis of RSS News Feeds Wanner et al. (VISSW 2009)

Pulse: Mining Customer Opinions from Free Text Gamon et al. (IDA 2005)

Visualizing Sentiments in Financial Texts Ahmad and Almas (IV2005)

Visual Analysis of Conflicting Opinions Chen et al. (VAST 2006)

Who Votes For What? A Visual Query Language for Opinion Data Draper and Riesenfeld (Vis 2008)

Visual Opinion Analysis of Customer Feedback Data Oelke et al. (VAST 2009)
Figure panels: Summary Report of printers; Scatterplot of customer reviews on printers; Circular Correlation Map

OpinionSeer: Interactive Visualization of Hotel Customer Feedback Wu et al. (InfoVis 2010)

Taking the Pulse of the Web: Assessing Sentiment on Topics in Online Media Brew et al. (WebSci 2010)

Understanding Text Corpora with Multiple Facets Shi et al. (VAST 2010)

Research Problem
Can we design a sentiment visualization system that:
– Shows how the sentiment evolves over time (trend)
– Visualizes both the sentiment analysis results and the structured facet data, e.g. the profile of the reviewer (facet)
– Rather than only showing which document or feature tends to be positive or negative, also demonstrates how the positives/negatives are described in documents (context)
Most existing sentiment visualizations fail to meet all the requirements simultaneously
– Our VISA design is based on the TIARA prototype, which already brings together most features (trend, context, facet switching)

Retrospect on TIARA Visualization (Emergency Room Record)

Challenges for TIARA Sentiment Visualization
Failure of the document trend visualization
– Binary/ternary/scored classification of document-level sentiments will drop valuable pieces
 * Example: “BUT: It has BED BUGS and they BITE me!!!”
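The loss described on this slide can be illustrated with a minimal Python sketch. The review content, the +1/-1 polarity encoding, and the names `document_label` and `hidden` are assumptions made for illustration; they are not taken from VISA or TIARA.

```python
def document_label(polarities):
    """Collapse per-opinion polarities (+1 / -1) into one ternary label."""
    score = sum(polarities)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical review: three positive opinions and one negative one.
review = [("location", +1), ("staff", +1), ("price", +1), ("bed", -1)]

label = document_label(p for _, p in review)
sign = {"positive": 1, "negative": -1, "neutral": 0}[label]

# Opinions whose polarity disagrees with the document label are invisible
# in a trend that only plots one label per document.
hidden = [feature for feature, p in review if p != sign]
```

In this toy case the document counts as positive, so the complaint about the bed never reaches a document-level trend line.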

Challenges for TIARA Sentiment Visualization
Keyword Summarization
– The content visualized consists of keywords summarized from all the text, which does not echo the sentiment-centric design
Structured Facet
– Sentiment-aware facet associations and distributions
– Spatial (location) information
Comparison
– Categorical and temporal comparison, as well as sentiment comparison
Compatibility with sentiment analysis engines
– Consumability of all kinds of sentiment analysis results

Sentiment Tuple
{aspect, feature, opinion, polarity}
– Aspect: a sub-topic shared by some documents, e.g. in a hotel review: the room, the view, or the service
– Feature: the specific object the users are commenting on: an entity, person, location, or abstract concept
– Opinion: a particular word or phrase describing a feature
– Polarity: the polarity of the opinion word/phrase in the context
Diagram: the Sentiment Analysis Model emits tuples of the form aspect: feature: opinion: polarity, which are aggregated into pairs such as { “view”, + }
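A minimal Python sketch of the tuple and its aggregation may make the data model more concrete. The class name `SentimentTuple`, the integer polarity encoding, the `aggregate` helper, and the sample tuples are all assumptions for illustration; the slides do not specify how VISA stores or aggregates tuples internally.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentTuple:
    aspect: str    # sub-topic, e.g. "room"
    feature: str   # object being commented on, e.g. "view"
    opinion: str   # word or phrase describing the feature
    polarity: int  # assumed encoding: +1 positive, -1 negative, 0 neutral


def aggregate(tuples):
    """Count how often each (feature, polarity) pair occurs."""
    return Counter((t.feature, t.polarity) for t in tuples)


# Hypothetical tuples, as a sentiment analysis model might emit them.
tuples = [
    SentimentTuple("room", "view", "stunning", +1),
    SentimentTuple("room", "view", "great", +1),
    SentimentTuple("room", "bed", "dirty", -1),
]
counts = aggregate(tuples)
```

Here `counts[("view", 1)]` plays the role of the aggregated { “view”, + } pair from the diagram.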

Keyword Summarization (TIARA)
– A set of topics {T_1, …, T_i, …, T_N}
– A set of keywords {W_1, …, W_j, …, W_M}
– A set of topic probabilities {…, P(T_i | D_k), …}, where D_k is the kth document in the collection
– A set of word probabilities {…, P(W_j | T_i), …}
– Rank the topics to present the most valuable ones first
– Select a keyword subset for each time segment for the content summary: {…}_{t-1}, {…, W_j, …}_t, {…}_{t+1}

VISA Sentiment Keyword Summarization
– A set of aspects/hotels {C_1, …, C_i, …, C_N}
– A set of sentiment keywords (opinions/features) {W_1, …, W_j, …, W_M}
– A set of topic probabilities {…, P(T_i | D_k), …}, where D_k is the kth document in the collection
– A set of word probabilities {…, P(W_j | T_i), …}
– Let the user select whether to compare aspects of one hotel or one aspect across several hotels
– Select a keyword subset for each time segment for the sentiment summary: {…}_{t-1}, {…, W_j, …}_t, {…}_{t+1}
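The per-segment keyword selection can be sketched in Python as follows. The slides do not state the ranking criterion, so this sketch substitutes plain frequency counts with alphabetical tie-breaking; the function name `top_keywords`, the record layout, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def top_keywords(records, k=2):
    """Pick the k most frequent keywords per (category, time segment).

    records: iterable of (category, segment, keyword) triples, where a
    category stands for an aspect or a hotel.
    """
    counts = defaultdict(Counter)
    for category, segment, keyword in records:
        counts[(category, segment)][keyword] += 1
    return {
        key: [w for w, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for key, c in counts.items()
    }


# Hypothetical sentiment keywords extracted from reviews of one aspect.
records = [
    ("room", "2011-06", "clean"),
    ("room", "2011-06", "clean"),
    ("room", "2011-06", "small"),
    ("room", "2011-06", "noisy"),
    ("room", "2011-07", "noisy"),
]
summary = top_keywords(records, k=2)
```

Swapping the frequency count for a probability-based score would only change how the per-segment counter is filled, not the selection step.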

VISA Mashup Visualization
Panels: Sentiment Tuple Trend, Facet Correlations, Sentiment Snippets, Search, Sentiment-Centric Document Ranking, Filters

VISA Sentiment Visualization Framework
Offline:
– Document pre-processing
– Sentiment analysis
– Meta data parsing
– Indexing
Online:
– Data retrieval
– Visualization
– Interactions

Offline Analysis Raw Data Reader Extractor StatisticManager Dictionary IndexWriter Index Meta Data Sentiment Data Segment Extractor Sentence Extractor Text Extractor Entity Policy Filter OpenNLP Sentiment Entity Class No/Not aspect: feature: opinion: polarity Data Analysis Framework

Offline Analysis Raw Data Reader 3rd Party Sentiment Analysis Framework IndexWriter Index Meta Data Sentiment Data aspect: feature: opinion: polarity

Data Server Query Parser Data Retrieval Lucene Hermes Index HttpServlet VISA Data Adapter

Sentiment Trend Optimizations
– Sentiment-tuple based negative/positive/(neutral) trends
– Y axis: sentiment value (positive and negative); X axis: time
– Time-sensitive feature/opinion words
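A tuple-based trend of this kind can be sketched in Python by counting positive and negative tuples per time bin. The input layout, the monthly bins, and the name `sentiment_trend` are assumptions for illustration; how VISA actually computes the sentiment value on the Y axis is not given in the slides.

```python
from collections import defaultdict


def sentiment_trend(observations):
    """Return [(time_bin, positive_count, negative_count)] sorted by time.

    observations: iterable of (time_bin, polarity), polarity in {+1, -1, 0}.
    Neutral tuples keep their time bin on the axis but add to neither count.
    """
    positive = defaultdict(int)
    negative = defaultdict(int)
    bins = set()
    for time_bin, polarity in observations:
        bins.add(time_bin)
        if polarity > 0:
            positive[time_bin] += 1
        elif polarity < 0:
            negative[time_bin] += 1
    return [(b, positive[b], negative[b]) for b in sorted(bins)]


# Hypothetical tuple polarities, binned by month.
observations = [
    ("2011-06", +1), ("2011-06", -1), ("2011-06", +1),
    ("2011-07", -1), ("2011-07", 0),
]
trend = sentiment_trend(observations)
```

Plotting the positive counts above the time axis and the negative counts below it gives the two-sided trend the slide describes.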

Sentiment-Centric Interactions

Case Study ---- Summarizing Hotel Reviews
Initial view

Case Study ---- Summarizing Hotel Reviews
Switch to the “Family” type only (the user is traveling in this type)

Case Study ---- Summarizing Hotel Reviews
Click on the “Free” sentiment word (want to enjoy the free time or the free breakfast?)
It’s a 30-minute distance from the harbor!

Case Study ---- Summarizing Hotel Reviews
For two selected hotels:
– Drill down to the “cleanliness” and “room” aspects
– Switch to the negative sentiments

Case Study ---- Summarizing Hotel Reviews
Comparing the recent reviews

Case Study ---- NFL on Twitter
Crawling tweets from Twitter on the topic of the National Football League (NFL), from 03/2011 to 08/2011 (when the famous lockout happened)
– tweets from users, with an average length of 16.8 words
Tweet collection pre-processing:
– Classify into 5 content topics: “season play”, “player draft”, “lockout bad”, “lockout end” and “football return”
– Categorize according to the subject of the sentiments: 32 NFL teams, by manually creating a relevant subject keyword list for each team (full/nick name, city, stadium, head, owner and super stars)
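The subject categorization step can be sketched in Python as a keyword lookup. The keyword lists below are short hypothetical examples covering two teams, not the authors' manually created lists, and the whole-word matching rule and the name `subjects` are assumptions.

```python
import re

# Hypothetical, heavily shortened keyword lists (nickname and city only).
TEAM_KEYWORDS = {
    "Green Bay Packers": ["packers", "green bay"],
    "Pittsburgh Steelers": ["steelers", "pittsburgh"],
}


def subjects(tweet, team_keywords=TEAM_KEYWORDS):
    """Return every team whose keyword list matches the tweet text."""
    text = tweet.lower()
    return [
        team
        for team, keywords in team_keywords.items()
        if any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords)
    ]


one_team = subjects("Packers fans are tired of the same news")
two_teams = subjects("Steelers vote no while Green Bay waits")
no_team = subjects("The lockout is finally over")
```

A tweet can match several teams, in which case it would contribute to each matching subject in a subject-comparing view.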

Case Study ---- NFL on Twitter
Overview of sentiments on content topics
– Reaches a peak in July, when the new CBA was signed

Case Study ---- NFL on Twitter
Subject-comparing view on 4 NFL teams
– “Green Bay Packers”, “Pittsburgh Steelers”, “New York Jets”, “New England Patriots”
– A very large red “CBA” for the Steelers: the only team to vote “NO” on the CBA
– “Brett Favre” for the Packers: the former NFL all-star quarterback of the Packers, who has claimed several times that he would return. The fans are tired of such news.

User Study ---- Setup
Subject
– VISA system with all functionalities
– TripAdvisor.com
– A plain text editor with search function
Data
– HK hotel cases with 3 hotels’ reviews
– Both structured (ratings) and unstructured (review comments) data inputs
User
– 12 users (7 male, 5 female), age 26~35
– Each is given a gift as an incentive
Task
– T1: look up specific sentiment-related information about a hotel (e.g. traveler’s ratings)
– T2: summarize opinions on a general aspect of a hotel (e.g. the view of a hotel)
Procedure
– Within-subject design: each user performs all tasks with all the systems
– Record user demographics, time of completion, satisfaction, and answers to open-ended questions
Screenshots: TripAdvisor, Text Editor, VISA

User Study ---- Objective Results
Three metrics: elapsed time (in minutes), task completion rate and task correctness.
Significant advantages of VISA over the compared systems (t-test significance p < 0.004~0.034)

User Study ---- Subjective Results
Three metrics: usefulness, usability and satisfaction.

User Study ---- Open Surveys
Why VISA is considered better than the baseline systems:
– “mash-up visualizations” and “rich interactions”
– “Mash-up visualizations provide more information and it’s quite intuitive”, “rich interactions make it easy to search what I want to know”
Improvements to VISA:
– “it now needs some learning efforts to use VISA”, “It could introduce better UI design and richer interactions”

Summary
We have presented the VISA system for generic sentiment visualization
– The backend core is the new sentiment-tuple definition, together with the faceted data model
– In visualization, we introduce several critical optimizations over TIARA for sentiment visualization scenarios: sentiment-tuple based trending, sentiment keywords, comparison, sentiment in document context, interactions
– Evaluated with two real-life case studies
– Conducted a formal user study comparing VISA with two baseline systems, demonstrating a clear advantage

Thank You
Merci, Grazie, Gracias, Obrigado, Danke