Opinion Mining and Sentiment Analysis

Opinion Mining and Sentiment Analysis Slides from Bing Liu and Ronan Feldman

Introduction Two main types of textual information: facts and opinions. Note: factual statements can imply opinions too. Most current text information processing methods (e.g., web search, text mining) work with factual information. Sentiment analysis or opinion mining is the computational study of opinions, sentiments and emotions expressed in text. Why opinion mining now? Mainly because of the Web; huge volumes of opinionated text.

Introduction – user-generated media Importance of opinions: opinions are useful; when making a decision, we want to hear others’ opinions. In the past, individuals got opinions from friends and family; businesses used surveys, focus groups, consultants … Word-of-mouth on the Web User-generated media: one can express opinions on anything in reviews, forums, discussion groups, blogs ... Opinions of global scale: no longer limited to individuals (one’s circle of friends) or businesses (small-scale surveys, tiny focus groups, etc.).

Sentiment analysis applications Businesses and organizations: benchmark products and services; market intelligence. Businesses spend a huge amount of money to find consumer opinions using consultants, surveys, focus groups, etc. Individuals: make decisions to purchase products or to use services; find public opinions about political candidates and issues. Ad placement, e.g. in social media: place an ad if one praises a product; place an ad from a competitor if one criticizes a product. Opinion retrieval: provide general search for opinions.

A Fascinating Problem! Intellectually challenging & major applications. A popular research topic in recent years in NLP and Web data mining; 20-60 companies in the USA alone. It touches every aspect of NLP and yet is restricted and confined. Little research in NLP/linguistics in the past. Potentially a major technology from NLP, but “not yet” and not easy! Data sourcing and data integration are hard too!

Abstract Problem Statement It consists of two parts: Opinion definition: what is an opinion? Opinion summarization: opinions are subjective. An opinion from a single person (unless a VIP) is often not sufficient for action. We need opinions from many people, and thus opinion summarization.

An Example Review “I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …” What do we see? Opinions, targets of opinions, and opinion holders

Entity and aspect/feature level Id: Abc123 on 5-1-2008 “I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear too. It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive …” What do we see? Opinion targets: entities and their features Sentiments: positive and negative Opinion holders: persons who hold the opinions Time: when opinions were expressed

Two main types of opinions Regular opinions: Sentiment/opinion expressions on some target entities Direct opinions: “The touch screen is really cool” Indirect opinions: “After taking the drug, my pain has gone” Comparative opinions: Comparisons of more than one entity E.g., “iPhone is better than Blackberry” We focus on regular opinions first, and just call them opinions.

Entity and Aspect (Liu, Web Data Mining book, 2006) Definition (entity): An entity e is a product, person, event, organization, or topic. e is represented as a hierarchy of components, sub-components, and so on. Each node represents a component and is associated with a set of attributes of the component. An opinion can be expressed on any node or attribute of the node. For simplicity, we use the term aspects (features) to represent both components and attributes.

Opinion definition (Liu, a Ch. in NLP handbook) An opinion is a quintuple (oj, fjk, soijkl, hi, tl), where oj is a target object. fjk is a feature of the object oj. soijkl is the sentiment value of the opinion of the opinion holder hi on feature fjk of object oj at time tl. soijkl is +ve, -ve, or neu, or a more granular rating. hi is an opinion holder. tl is the time when the opinion is expressed.

Our example in quintuples (iPhone, GENERAL, +, Abc123, 5-1-2008) (iPhone, touch_screen, +, Abc123, 5-1-2008)
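A minimal sketch of how such quintuples might be represented in code (Python, with illustrative field names mirroring the (object, feature, sentiment, holder, time) definition above):

```python
from typing import NamedTuple

class Opinion(NamedTuple):
    """One opinion quintuple (oj, fjk, soijkl, hi, tl)."""
    entity: str      # target object oj
    aspect: str      # feature/aspect fjk ("GENERAL" = the entity as a whole)
    sentiment: str   # "+", "-", or "neu" (could also be a numeric rating)
    holder: str      # opinion holder hi
    time: str        # time tl when the opinion was expressed

# The two example quintuples from the iPhone review above.
opinions = [
    Opinion("iPhone", "GENERAL", "+", "Abc123", "5-1-2008"),
    Opinion("iPhone", "touch_screen", "+", "Abc123", "5-1-2008"),
]

for op in opinions:
    print(op)
```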

Alternative terminology Entity is also called object. Aspect is also called feature, attribute, facet, etc Opinion holder is also called opinion source

Structure the unstructured Goal: Given an opinionated document, discover all quintuples (oj, fjk, soijkl, hi, tl), i.e., mine the five corresponding pieces of information in each quintuple; or solve some simpler problems, e.g., classify the sentiment of the entire document. With the quintuples, Unstructured Text → Structured Data. Traditional data and visualization tools can be used to slice, dice and visualize the results in all kinds of ways, enabling qualitative and quantitative analysis.

Sentiment Classification: doc-level (Pang et al. 2002 and Turney 2002) Classify a document (e.g., a review) based on the overall sentiment expressed by the opinion holder. Classes: positive or negative (and neutral). In the model (oj, fjk, soijkl, hi, tl), it assumes that each document focuses on a single object and contains opinions from a single opinion holder, and it considers the opinion on the object oj as a whole (i.e., oj = fjk).
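A hedged sketch of document-level sentiment classification with a standard bag-of-words pipeline (scikit-learn is assumed; the tiny inline training set is purely illustrative, real systems train on thousands of labeled reviews):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled reviews; a real corpus (e.g., movie or product reviews) is needed in practice.
train_docs = [
    "such a nice phone, the touch screen is really cool",
    "the voice quality is clear and the battery lasts",
    "terrible phone, so difficult to type with its tiny keys",
    "the screen is easily scratched and the phone is too expensive",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words (TF-IDF) features + a linear classifier: the classic doc-level setup.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_docs, train_labels)

print(clf.predict(["the battery life was not long but the screen was great"]))
```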

Subjectivity Sentence subjectivity: An objective sentence presents some factual information, while a subjective sentence expresses some personal opinions, beliefs, views, feelings, or emotions. Not the same as emotion

Subjectivity Analysis (Wiebe et al 2004) Sentence-level sentiment analysis has two tasks: Subjectivity classification: subjective or objective. Objective: e.g., “I bought an iPhone a few days ago.” Subjective: e.g., “It is such a nice phone.” Sentiment classification: for subjective sentences or clauses, classify positive or negative. Positive: “It is such a nice phone.” However (Liu, chapter in NLP handbook): subjective sentences ≠ +ve or –ve opinions, e.g., “I think he came yesterday.” And objective sentences ≠ no opinion; some imply a –ve opinion, e.g., “My phone broke on the second day.”

Rational and emotional evaluations Rational evaluation: Many evaluation/opinion sentences express no emotion E.g. “The voice on this phone is clear” Emotional evaluation E.g. “I love this phone” “The voice on this phone is crystal clear” (?) Some emotion sentences express no (positive or negative) opinion/sentiment E.g. “I am so surprised to see you”

Feature-Based Sentiment Analysis Sentiment classification at the document and sentence (or clause) levels is not sufficient; it does not tell what people like and/or dislike. A positive opinion on an object does not mean that the opinion holder likes everything about it, and a negative opinion does not mean that the holder dislikes everything. Objective: discovering all quintuples (oj, fjk, soijkl, hi, tl). With all quintuples, all kinds of analyses become possible.

Opinion Summary With a lot of opinions, a summary is necessary. It is a multi-document summarization task. For factual texts, summarization is to select the most important facts and present them in a sensible order while avoiding repetition (1 fact = any number of the same fact). For opinion documents it is different, because opinions have a quantitative side and have targets (1 opinion ≠ a number of opinions). An aspect-based summary is more suitable; quintuples form the basis for opinion summarization.

Feature-Based Opinion Summary (Hu & Liu, KDD-2004) Source review: “I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …” Feature-based summary (opinion holders omitted): Feature 1: touch screen. Positive: 212 (“The touch screen was really cool.” “The touch screen was so easy to use and can do amazing things.” …) Negative: 6 (“The screen is easily scratched.” “I have a lot of difficulty in removing finger marks from the touch screen.”) Feature 2: battery life …
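Continuing the illustrative Opinion representation from earlier, a small sketch of how such a feature-based summary could be assembled by grouping extracted (aspect, sentiment, sentence) triples (the triples below are made up):

```python
from collections import defaultdict

# (aspect, sentiment, supporting sentence) triples, as they might come out of
# the quintuple-extraction step; the sentences echo the example review.
extracted = [
    ("touch screen", "+", "The touch screen was really cool."),
    ("touch screen", "-", "The screen is easily scratched."),
    ("battery life", "-", "The battery life was not long."),
    ("voice quality", "+", "The voice quality was clear too."),
]

summary = defaultdict(lambda: {"+": [], "-": []})
for aspect, sentiment, sentence in extracted:
    summary[aspect][sentiment].append(sentence)

for aspect, polar in summary.items():
    print(f"Feature: {aspect}")
    print(f"  Positive: {len(polar['+'])}", *polar['+'], sep="\n    ")
    print(f"  Negative: {len(polar['-'])}", *polar['-'], sep="\n    ")
```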

Visual Comparison (Liu et al. WWW-2005) Bar charts: a summary of reviews of Cell Phone 1 on a +/– axis for the aspects Voice, Screen, Size, Weight and Battery, and a side-by-side comparison of the reviews of Cell Phone 1 and Cell Phone 2 on the same aspects.

Aspect-based opinion summary

Google Product Search

Comparing 3 GPSs on different features Each bar shows the proportion of +ve opinion

Demo 1: Detail opinion sentences You can click on any bar to see the opinion sentences. Here are negative opinion sentences on the maps feature of Garmin. The pie chart gives the proportions of opinions.

# of feature mentions People talked more about prices than other features. They are quite positive about price, but not about maps and software.

Aggregate opinion trend More complaints in July–Aug and in Oct–Dec!

Sentiment Analysis is Challenging! “This past Saturday, I bought a Nokia phone and my girlfriend bought a Motorola phone with Bluetooth. We called each other when we got home. The voice on my phone was not so clear, worse than my previous phone. The battery life was long. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday.”

Sentiment Analysis Requires solving several IE problems In (oj, fjk, soijkl, hi, tl): oj, a target object: named entity extraction; fjk, a feature of oj: information extraction; soijkl, the sentiment: sentiment determination; hi, an opinion holder: information/data extraction; tl, the time: data extraction. Plus co-reference resolution, relation extraction, synonym matching (voice = sound quality), … None of them is a solved problem!

Easier and harder problems Tweets from Twitter are probably the easiest: short and thus usually straight to the point. Stocktwits are much harder! (more on that later) Reviews are next: entities are given (almost) and there is little noise. Discussions, comments, and blogs are hard: multiple entities, comparisons, noise, sarcasm, etc. Extracting entities and aspects, and determining sentiments/opinions about them, are hard; combining them is harder.

Extraction of competing objects The user first gives a few objects/products as seeds, e.g., BMW and Ford. The system then identifies other competing objects from the opinion corpus. The problem can be tackled with PU learning (Learning from positive and unlabeled examples) (Liu et al 2002, 2003). See (Li et al. ACL-2010)
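A deliberately simplified flavor of the PU-learning step (scikit-learn assumed; the seed and candidate “context documents” are toy stand-ins for the sentences that mention each entity in the opinion corpus, and a real PU learner is considerably more careful about treating the unlabeled set as negative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Context "documents" for candidate entities: seeds (positive) and unlabeled candidates.
seed_contexts = {
    "BMW":  "great driving experience sporty sedan german engineering",
    "Ford": "affordable truck reliable american pickup sedan",
}
candidate_contexts = {
    "Toyota": "reliable sedan fuel efficient family car",
    "Nikon":  "sharp lens great camera for photography",
}

vec = CountVectorizer()
X = vec.fit_transform(list(seed_contexts.values()) + list(candidate_contexts.values()))
# Step 1 of a naive PU scheme: positives = seeds, unlabeled provisionally treated as negative.
y = [1] * len(seed_contexts) + [0] * len(candidate_contexts)
clf = MultinomialNB().fit(X, y)

# Step 2: re-score the unlabeled candidates; high-probability ones are likely competitors.
probs = clf.predict_proba(X[len(seed_contexts):])[:, 1]
for name, p in zip(candidate_contexts, probs):
    print(f"{name}: P(competitor) = {p:.2f}")
```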

Feature/Aspect-based sentiment analysis

Aspect-based sentiment analysis Much of the research is based on online reviews For reviews, aspect-based sentiment analysis is easier because the entity (i.e., product name) is usually known Reviewers simply express positive and negative opinions on different aspects of the entity. For blogs, forum discussions, etc., it is harder: both entity and aspects of entity are unknown, there may also be many comparisons, and there is also a lot of irrelevant information.

Find entities Although similar, this is somewhat different from traditional named entity recognition (NER). E.g., to study opinions on phones, given Motorola and Nokia, find all other phone brands and models in a corpus, e.g., Samsung, Moto.

Feature/Aspect extraction Extraction may use: frequent nouns and noun phrases, sometimes limited to a set known to be related to the entity of interest, or filtered using part discriminators (e.g., for a scanner entity, “of scanner”, “scanner has”); opinion and target relations, via proximity or syntactic dependency; standard IE methods, rule-based or supervised learning, often HMMs or CRFs (as in standard IE).
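A minimal sketch of the frequent-noun heuristic for finding candidate aspects (NLTK assumed; the review list and the cut-off are illustrative only):

```python
from collections import Counter
import nltk

# One-time resources; newer NLTK releases may instead need "punkt_tab" /
# "averaged_perceptron_tagger_eng".
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

reviews = [
    "The touch screen was really cool and the voice quality was clear.",
    "The battery life was not long, but the screen was bright.",
    "Great screen, poor battery life, average voice quality.",
]

noun_counts = Counter()
for review in reviews:
    tokens = nltk.word_tokenize(review.lower())
    for word, tag in nltk.pos_tag(tokens):
        if tag.startswith("NN"):          # keep nouns (NN, NNS, NNP, ...)
            noun_counts[word] += 1

# Frequent nouns are candidate aspects; a real system would also merge
# noun phrases ("battery life") and prune candidates that are not aspects.
print(noun_counts.most_common(5))
```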

Double Propagation Proposed in (Qiu et al. IJCAI-2009). Like co-training, it exploits the dependency relations between opinions and features to extract features. Opinion words modify object features, e.g., “This camera takes great pictures”. The algorithm bootstraps using a set of seed opinion words (no feature input) to extract features (and also opinion words).

The DP method DP is a bootstrapping method. Input: a set of seed opinion words; no aspect seeds needed. Based on the dependency tree (Tesnière 1959). Example: “This phone has good screen”.
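A highly simplified sketch of the double-propagation loop. The published method walks dependency relations between opinion words and aspects; here plain adjacency (an opinion adjective followed by a noun-like word) and an “X and Y” conjunction pattern stand in for those relations, so this is only a flavor of the idea, not the algorithm itself:

```python
import re

sentences = [
    "This phone has a good screen",
    "The screen is good and amazing",
    "The amazing camera drains the battery",
]

opinion_words = {"good"}                     # seed opinion words; no aspect seeds
aspects = set()
STOP = {"a", "the", "is", "and", "but", "has"}

changed = True
while changed:                               # bootstrap until nothing new is found
    changed = False
    for s in sentences:
        tokens = re.findall(r"[a-z]+", s.lower())
        # crude stand-in for "opinion word modifies aspect": adjacency
        for i, tok in enumerate(tokens[:-1]):
            nxt = tokens[i + 1]
            if tok in opinion_words and nxt not in opinion_words \
                    and nxt not in aspects and nxt not in STOP:
                aspects.add(nxt)
                changed = True
        # crude stand-in for conjoined opinion words: "X and Y"
        for a, b in re.findall(r"([a-z]+) and ([a-z]+)", s.lower()):
            if a in opinion_words and b not in opinion_words:
                opinion_words.add(b)
                changed = True

print("opinion words:", opinion_words)       # {'good', 'amazing'}
print("aspects:", aspects)                   # {'screen', 'camera'}
```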

Rules from dependency grammar

Group synonym features (Zhai et al. 2010) Features that are domain synonyms should be grouped together. Many techniques can be used to deal with the problem, e.g., Topic modeling, distributional similarity, etc We proposed a semi-supervised learning method Z. Zhai, B. Liu, H. Xu and P. Jia. Grouping Product Features Using Semi-Supervised Learning with Soft-Constraints. COLING-2010.

Coreference resolution (Ding and Liu 2010) Different from traditional coreference resolution; important to resolve objects and features. E.g., “I bought a Canon S500 camera yesterday. It looked beautiful. I took a few photos last night. They were amazing.” Some specific characteristics of opinions can be exploited for better accuracy. See X. Ding and B. Liu, Resolving Object and Attribute Coreference in Opinion Mining. COLING-2010.

Coreference Resolution by Parse Analysis and Similarity Clustering (Attardi et al. 2010) best performing at SemEval 2010 Identifies mentions from dependency trees Uses eager classifier to cluster mentions Positive and negative instances are created by pairing each mention with each of the preceding ones Features extracted from pairs of mentions: Lexical: Same, Prefix, Suffix, Acronym … Distance: Edit, mention, token, sentence Syntax: same HeadPoS, pair of HeadPos Pair of counts of mention occurrences Same NE type For pronouns: type, pair of genders, pair of numbers

Accuracy at SemEval 2010 (Mention / CEAF / B3): Catalan 82.7 / 57.1 / 64.6; German 59.2 / 49.5 / 50.7; English 73.9 / 57.3 / 61.3; Spanish 83.1 / 59.3 / 66.0.

Coreference resolution Method of (Lee, Peirsman et al. 2011), which was the best-performing in CoNLL-2011 Based on locating all noun phrases, identifying their properties, and then clustering them in several deterministic iterations (called sieves), starting with the highest-confidence rules and moving to lower-confidence higher-recall ones. Eager approach: any matching noun phrases with matching properties are immediately clustered together.

Identify opinion orientation For each feature, we identify the sentiment or opinion orientation expressed by a reviewer. Almost all approaches make use of opinion words and phrases. But notice (simplistically): some opinion words have context-independent orientations, e.g., “great”; other opinion words have context-dependent orientations, e.g., “small”. There are many ways to use opinion words. Machine learning methods for sentiment classification at the sentence and clause levels are also applicable.

Aggregation of opinion words (Ding and Liu, 2008) Input: a pair (f, s), where f is a product feature and s is a sentence that contains f. Output: whether the opinion on f in s is positive, negative, or neutral. Two steps: Step 1: split the sentence if needed based on BUT words (but, except that, etc). Step 2: work on the segment sf containing f. Let the set of opinion words in sf be w1, ..., wn. Sum up their orientations (1, -1, 0), and assign the orientation to (f, s) accordingly. In (Ding et al., WSDM-08), step 2 is changed to a distance-weighted sum, score(f, s) = Σi wi.o / d(wi, f), with better results, where wi.o is the opinion orientation of wi and d(wi, f) is the distance from f to wi.
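A small sketch of the distance-weighted scoring in step 2 (the tiny lexicon, the whitespace tokenizer, and the zero threshold are illustrative assumptions, and the BUT-clause splitting of step 1 is skipped):

```python
def feature_orientation(sentence, feature, lexicon):
    """Distance-weighted sum of opinion-word orientations around `feature`."""
    tokens = sentence.lower().replace(",", " ").split()
    f_pos = tokens.index(feature)                   # position of the feature word
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in lexicon and i != f_pos:
            score += lexicon[tok] / abs(i - f_pos)  # wi.o / d(wi, f)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

lexicon = {"great": 1, "clear": 1, "terrible": -1, "poor": -1}
print(feature_orientation(
    "the voice quality was clear but the battery was poor", "voice", lexicon))
# -> 'positive': "clear" is closer to "voice" than "poor" is
```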

Basic Opinion Rules (Liu, Ch. in NLP handbook) Opinions are governed by some rules, e.g., Neg → Negative; Pos → Positive; Negation Neg → Positive; Negation Pos → Negative; Desired value range → Positive; Below or above the desired value range → Negative.

Basic Opinion Rules (Liu, Ch. in NLP handbook) Decreased Neg → Positive; Decreased Pos → Negative; Increased Neg → Negative; Increased Pos → Positive; Consume resource → Negative; Produce resource → Positive; Consume waste → Positive; Produce waste → Negative.
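These rules lend themselves to a small lookup table; the sketch below (hypothetical naming and contexts) shows how the negation and decrease/increase rules flip or preserve a base polarity:

```python
# Base polarity of an opinion expression: +1 (Pos) or -1 (Neg).
# Each rule maps a context to a transformation of that polarity.
RULES = {
    "plain":     lambda p: p,     # Neg -> Negative, Pos -> Positive
    "negation":  lambda p: -p,    # Negation Neg -> Positive, Negation Pos -> Negative
    "decreased": lambda p: -p,    # Decreased Neg -> Positive, Decreased Pos -> Negative
    "increased": lambda p: p,     # Increased Neg -> Negative, Increased Pos -> Positive
}

def apply_rule(context, base_polarity):
    return RULES[context](base_polarity)

# "This drug reduced my pain": 'pain' is Neg, context is 'decreased' -> Positive
print(apply_rule("decreased", -1))   # 1
# "not good": 'good' is Pos, context is 'negation' -> Negative
print(apply_rule("negation", +1))    # -1
```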

Two Main Types of Opinions Direct Opinions: direct sentiment expressions on some target objects, e.g., products, events, topics, persons. E.g., “the picture quality of this camera is great.” (many are much more complex). Comparative Opinions: Comparisons expressing similarities or differences of more than one object. Usually stating an ordering or preference. E.g., “car x is cheaper than car y.”

Comparative Opinions (Jindal and Liu, 2006) Gradable comparisons: Non-equal gradable: relations of the type greater or less than. Ex: “optics of camera A is better than that of camera B”. Equative: relations of the type equal to. Ex: “camera A and camera B both come in 7MP”. Superlative: relations of the type greater or less than all others. Ex: “camera A is the cheapest camera available in market”.

Mining Comparative Opinions Objective: Given an opinionated document d, extract comparative opinions (O1, O2, F, po, h, t), where O1 and O2 are the object sets being compared based on their shared features F, po is the preferred object set of the opinion holder h, and t is the time when the comparative opinion is expressed. Note: these are not positive or negative opinions.

Sentiment Lexicon

Sentiment (opinion) lexicon Sentiment lexicon: lists of words and expressions used to express people’s subjective feelings and sentiments/opinions. Not just individual words, but also phrases and idioms, e.g. “costs an arm and a leg”. Many sentiment lexica can be found on the web. They often have thousands of terms, and are quite useful.

Sentiment lexicon Sentiment words or phrases (also called polar words, opinion-bearing words, etc). E.g., positive: beautiful, wonderful, good, amazing; negative: bad, poor, terrible, cost an arm and a leg. Many of them are context dependent, not just application-domain dependent. Three main ways to compile such lists: Manual approach: not a bad idea for a one-time effort. Corpus-based approach. Dictionary-based approach.

Corpus vs dictionary-based methods Corpus-based approaches often use a double propagation between opinion words and the items they modify, and require a large corpus to get good coverage. Dictionary-based methods typically use WordNet’s synsets and hierarchies to acquire opinion words, but usually do not give domain- or context-dependent meanings.

Corpus-based approaches Rely on syntactic patterns in large corpora (Hatzivassiloglou and McKeown, 1997; Turney, 2002; Yu and Hatzivassiloglou, 2003; Kanayama and Nasukawa, 2006; Ding, Liu and Yu, 2008). Can find domain-dependent orientations (positive, negative, or neutral). (Turney, 2002) and (Yu and Hatzivassiloglou, 2003) are similar: they assign opinion orientations (polarities) to words/phrases. (Yu and Hatzivassiloglou, 2003) is slightly different from (Turney, 2002): it uses more seed words (rather than two) and the log-likelihood ratio (rather than PMI).
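Turney (2002) scores a phrase by how much more strongly it associates with a positive seed (“excellent”) than a negative one (“poor”). A toy sketch of that semantic-orientation idea over a tiny in-memory corpus (the original work used web hit counts, and the smoothing constant here is an arbitrary assumption):

```python
import math

corpus = [
    "the screen is excellent and very sharp",
    "sharp pictures excellent lens",
    "poor battery and poor build quality",
    "the battery is poor",
]

def count(word):
    return sum(doc.split().count(word) for doc in corpus)

def cooccur(w1, w2):
    # crude co-occurrence: both words appear in the same sentence
    return sum(1 for doc in corpus if w1 in doc.split() and w2 in doc.split())

def pmi(w1, w2, eps=0.01):
    n = len(corpus)
    return math.log2(((cooccur(w1, w2) + eps) / n) /
                     (((count(w1) + eps) / n) * ((count(w2) + eps) / n)))

def semantic_orientation(phrase):
    # SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")
    return pmi(phrase, "excellent") - pmi(phrase, "poor")

print(semantic_orientation("sharp"))    # > 0: leans positive in this toy corpus
print(semantic_orientation("battery"))  # < 0: leans negative here
```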

The Double Propagation method The DP method can also use the dependency between opinions and aspects to extract new opinion words. Based on dependency relations: knowing an aspect, one can find the opinion word that modifies it, e.g., “The rooms are spacious”; knowing some opinion words, one can find more opinion words, e.g., “The rooms are spacious and beautiful”. Jijkoun, de Rijke and Weerkamp (2010) did similarly.

Opinions implied by objective terms Most opinion words are adjectives and adverbs, e.g., good, bad, etc There are also many subjective and opinion verbs and nouns, e.g., hate (VB), love (VB), crap (NN). But objective nouns can imply opinions too E.g. “After sleeping on the mattress for one month, a body impression has formed in the middle” How to discover such nouns in a domain or context?

Pruning For an aspect with an implied opinion, it has a fixed opinion, either +ve or -ve, but not both. We find two direct modification relations using a dependency parser. Type 1: O → O-Dep → A, e.g., “This TV has good picture quality”. Type 2: O → O-Dep → H ← A-Dep ← A, e.g., “The springs of the mattress are bad”. If an aspect has mixed opinions based on the two dependency relations, prune it.
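A hedged sketch of checking the Type 1 relation (an opinion word directly modifying an aspect) with a dependency parser; spaCy and its small English model are assumed to be installed, and the two-word opinion lexicon is a toy:

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
opinion_words = {"good": 1, "bad": -1}

def type1_pairs(sentence):
    """Yield (opinion_word, aspect, polarity) for adjectives directly modifying a noun."""
    doc = nlp(sentence)
    for tok in doc:
        if tok.dep_ == "amod" and tok.lemma_.lower() in opinion_words:
            yield tok.text, tok.head.text, opinion_words[tok.lemma_.lower()]

print(list(type1_pairs("This TV has good picture quality")))
# e.g., [('good', 'quality', 1)]
```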

Dictionary-based methods Typically use WordNet’s synsets and hierarchies to acquire opinion words. Start with a small seed set of opinion words. Bootstrap the set by searching for synonyms and antonyms in WordNet iteratively (Hu and Liu, 2004; Kim and Hovy, 2004; Valitutti, Strapparava and Stock, 2004; Mohammad, Dunne and Dorr, 2009). Kamps et al. (2004) proposed a WordNet distance method to determine the sentiment orientation of a given adjective.
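A hedged sketch of the dictionary-based bootstrapping idea using WordNet through NLTK (one expansion pass, no sense disambiguation, and no antonym cross-population, all of which the cited methods do handle):

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)   # one-time download of the WordNet data

def expand(seeds):
    """One bootstrapping pass: add WordNet synonyms of every seed word."""
    expanded = set(seeds)
    for word in seeds:
        for synset in wn.synsets(word):   # a real system would restrict to adjective senses
            for lemma in synset.lemmas():
                expanded.add(lemma.name().replace("_", " "))
    return expanded

positive = expand({"good", "nice", "amazing"})
negative = expand({"bad", "poor", "terrible"})

# The full method iterates, uses antonyms to cross-populate the two sets,
# and resolves conflicts manually or with a classifier.
print(sorted(positive)[:10])
print(sorted(negative)[:10])
```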

Semi-supervised learning (Esuli and Sebastiani, 2005) Use supervised learning. Given two seed sets: positive set P, negative set N. The two seed sets are then expanded using synonym and antonym relations in an online dictionary to generate the expanded sets P’ and N’; P’ and N’ form the training sets. Using all the glosses in a dictionary for each term in P’ ∪ N’ and converting them to vectors, build a binary classifier. This approach underlies SentiWordNet.

Which approach to use? Both corpus- and dictionary-based approaches are needed. A dictionary usually does not give domain- or context-dependent meanings; a corpus is needed for that. With a corpus-based approach it is hard to find a very large set of opinion words; a dictionary is good for that. In practice, corpus, dictionary and manual approaches are all needed.

Spam Detection

Opinion Spam Detection (Jindal and Liu, 2007, 2008) Fake/untruthful reviews: Write undeserving positive reviews for some target objects in order to promote them. Write unfair or malicious negative reviews for some target objects to damage their reputations. Increasing number of customers wary of fake reviews (biased reviews, paid reviews)

An Example Practice of Review Spam Belkin International, Inc.: a top networking and peripherals manufacturer (sales ~ $500 million in 2008). Posted an ad for writing fake reviews on amazon.com (65 cents per review), Jan 2009.

Experiments with Amazon Reviews (June 2006) 5.8 million reviews, 1.2 million products and 2.1 million reviewers. A review has 8 parts: <Product ID> <Reviewer ID> <Rating> <Date> <Review Title> <Review Body> <Number of Helpful Feedbacks> <Number of Feedbacks>. Industry-manufactured products (“mProducts”), e.g. electronics, computers, accessories, etc.: 228K reviews, 36K products and 165K reviewers.

Some Tentative Results Negative outlier reviews tend to be heavily spammed Those reviews that are the only reviews of some products are likely to be spammed Top-ranked reviewers are more likely to be spammers Spam reviews can get good helpful feedbacks and non-spam reviews can get bad feedbacks

Meeting Social Sciences Extract and analyze political opinions: candidates and issues. Compare opinions across cultures and languages: comparing opinions of people from different countries on the same issue or topic, e.g., Internet diplomacy. Opinion spam (fake opinions): what are the social, cultural, and economic aspects of it? Opinion propagation in social contexts. How opinions on the Web influence the real world: are they correlated? Emotion analysis in social contexts & virtual worlds.

WebSays + Tiscali 17/1/2013

Summary We briefly defined the sentiment analysis problem. Direct opinions: focused on feature-level analysis. Comparative opinions: different types of comparisons. Opinion spam detection: fake reviews; currently working with Google (Google research award). A lot of applications. Technical challenges are still huge, but I am quite optimistic. Interested in collaboration with social scientists: opinions and related issues are inherently social.

References B. Liu, “Sentiment Analysis and Subjectivity.” A Chapter in Handbook of Natural Language Processing, 2nd Edition, 2010. http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html