
1 Opinion Analysis
Sudeshna Sarkar, IIT Kharagpur

2 Introduction – facts and opinions
There are two main types of information on the Web: facts and opinions. Current search engines search for facts (and assume they are true), and facts can be expressed with topic keywords. Search engines do not search for opinions, because opinions are hard to express with a few keywords, e.g., "What do people think of Motorola cell phones?" The current search ranking strategy is not appropriate for opinion retrieval/search.

3 Overview
Motivation; definitions; coarse-grained vs. fine-grained opinion analysis; opinion lexicons; approaches to document-level opinion analysis (lexicon-based, supervised learning, and mixed approaches); approaches to fine-grained opinion analysis (rule-based and learning); opinion mining work at IIT Kharagpur.

4 Opinion Mining
Search for and aggregate opinions from online sources. Many reviews contain both positive and negative sentences. Many products are liked by some people and disliked by others, so there must be different reasons. Therefore, identify the different features/aspects of the target and the opinions on each of them separately.

5 Why do opinion analysis?
Opinion search: extract examples of particular types of positive or negative statements on some topic. Opinion question answering: What is the reaction to the Left Front's stand on the nuclear deal? Is support diminishing for the UPA government? Product review mining: which features of the "Mr Coffee programmable coffee maker" do users like, and which do they dislike (Microsoft Live)? Review classification. Tracking sentiment toward topics over time: follow the ups and downs of aggregate attitudes to a brand or product.

6 Introduction – Applications
Businesses and organizations: product and service benchmarking, and market intelligence. Businesses spend a huge amount of money (on consultants, surveys, focus groups, etc.) to find consumer sentiments and opinions. Individuals: interested in others' opinions when purchasing a product or using a service, when finding opinions on political topics, and in many other decision-making tasks. Ad placement: placing ads in user-generated content, e.g., place an ad when someone praises a product, or place an ad from a competitor if someone criticizes a product. Opinion retrieval/search: providing general search for opinions.

7 Question Answering
Opinion question answering. Q: What is the international reaction to the reelection of Robert Mugabe as President of Zimbabwe? A: African observers generally approved of his victory, while Western governments denounced it.

8 Opinion search (Liu, Web Data Mining book, 2007)
Can you search for opinions as conveniently as in general Web search? Whenever you need to make a decision, you may want some opinions from others. Wouldn't it be nice if you could find them on a search system instantly, by issuing queries such as opinions: "Motorola cell phones" or comparisons: "Motorola vs. Nokia"? This cannot be done yet!

9 Typical opinion search queries
Find the opinion of a person or organization (the opinion holder) on a particular object or a feature of an object, e.g., what is Bill Clinton's opinion on abortion? Find positive and/or negative opinions on a particular object (or some features of the object), e.g., customer opinions on a digital camera, or public opinions on a political topic. Find how opinions on an object change over time. How does object A compare with object B, e.g., Gmail vs. Yahoo Mail?

10 Find the opinion of a person on X
In some cases, a general search engine can handle this by using suitable keywords, e.g., "Bill Clinton's opinion on abortion". Reason: one person or organization usually has only one opinion on a particular topic, and that opinion is likely contained in a single document. Thus, a good keyword query may be sufficient.

11 Find opinions on an object X
We use product reviews as an example. Searching for opinions in product reviews is different from general Web search, e.g., searching for opinions on "Motorola RAZR V3". General Web search for a fact: rank pages according to some authority and relevance scores, and the user views the first page (if the search is perfect), because one fact = multiple facts. Opinion search: ranking is desirable. However, reading only the top-ranked review is dangerous, because it is the opinion of only one person: one opinion ≠ multiple opinions.

12 Search opinions (contd)
Ranking, option 1: produce two rankings, one of positive opinions and one of negative opinions, plus some kind of summary of both, e.g., the number of each. Option 2: produce one ranking, but the top (say 30) reviews should reflect the natural distribution of all reviews (assuming there is no spam), i.e., they should have the right balance of positive and negative reviews. Questions: Should the user read all the top reviews? Or should the system prepare a summary of the reviews?
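Option 2 can be sketched by filling the top k slots in proportion to the overall positive/negative split. This is a minimal Python sketch. It assumes each review already carries a relevance score and a polarity label from some upstream ranker and classifier. The `Review` type and `balanced_top_k` function are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    score: float    # relevance/authority score from the ranker (assumed given)
    positive: bool  # polarity label (assumed given by a classifier)


def balanced_top_k(reviews, k=30):
    """Pick up to k reviews whose positive/negative mix mirrors the whole set."""
    if not reviews:
        return []
    k = min(k, len(reviews))
    pos = sorted((r for r in reviews if r.positive), key=lambda r: r.score, reverse=True)
    neg = sorted((r for r in reviews if not r.positive), key=lambda r: r.score, reverse=True)
    # Allocate slots in proportion to the natural distribution of polarities.
    n_pos = round(k * len(pos) / len(reviews))
    chosen = pos[:n_pos] + neg[:k - n_pos]
    # Present the chosen reviews as one list ordered by score.
    return sorted(chosen, key=lambda r: r.score, reverse=True)
```

For example, with six positive and four negative reviews and k = 5, the function returns the three best positive and the two best negative reviews.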

13 User generated content
Word of mouth on the Web: review sites, blogs, online forums, shopping comparison sites, and user reviews. Mining the opinions expressed in user-generated content is a challenging task, and it is useful to individual consumers and companies.

14 Motivation for Consumer
I want to buy a camera. Which model should I pick? Ask my friends, or use the Internet. "CEA-CNET Study: Tech-Savvy Consumers Use Internet to Research Products Before Buying Them", Wireless News, November 2007. "Seventy Percent of Consumers Use Internet to Research Consumer Packaged Goods, According to Prospectiv Survey", Market Wire, January 2008.

15 Businesses
Identify opinions about products to help position/adapt products. Much product feedback is Web-based, provided online by customers/critics through websites, discussion boards, mailing lists, blogs, and CRM portals. Market research is becoming unwieldy, because sources are heterogeneous and multilingual in nature.

16 Facts vs Opinions
"An opinion is a person's ideas and thoughts towards something. It is an assessment, judgment or evaluation of something. An opinion is not a fact, because opinions are either not falsifiable, or the opinion has not been proven or verified...." (en.wikipedia.org/wiki/Opinion). Subjectivity: the linguistic expression of somebody's emotions, sentiments, evaluations, opinions, beliefs, speculations, etc. Polarity: positive or negative, e.g., "This camera is awesome." vs. "The movie is too long and boring." Opinions also vary in strength.

17 Levels of opinion analysis
Opinion analysis ranges from coarse-grained to fine-grained. Document level (a whole document or review): subjective vs. objective, and sentiment classification as positive, negative or neutral. Sentence level and expression level: Task 1 is identifying subjective/opinionated sentences (or clauses/phrases), with the classes objective and subjective (opinionated). Task 2 is sentiment classification of sentences, with the classes positive, negative and neutral. However, a document or sentence may contain multiple opinions on more than one topic, from one or more opinion holders.

18 Lexicon Development
Lexicons can be developed manually, semi-automatically, or fully automatically. Find relevant words, phrases, and patterns that can be used to express subjectivity. Determine the polarity of subjective expressions.

19 Opinion Words
An opinion lexicon containing lists of positive and negative words and phrases is very useful for opinion mining at different levels. Positive: beautiful, wonderful, good, amazing. Negative: bad, poor, terrible, cost someone an arm and a leg. How can such a list be compiled? With dictionary-based approaches, or with corpus-based approaches (supervised or semi-supervised). But some opinion words are context-independent (e.g., good), while others are context-dependent (e.g., long).

20 Hand-created lists
Manually create lists of opinion words appropriate for the domain, recording each sentiment term with its polarity and strength. These approaches, while interesting, are labor-intensive, error-prone, and costly to maintain.
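A hand-created list can be stored as a mapping from each sentiment term to its polarity and strength. The entries below are illustrative assumptions, and so is the 1 to 3 strength scale. `score_text` is a hypothetical helper that simply sums the signed strengths of the terms it finds.

```python
# Hypothetical hand-built lexicon: term -> (polarity, strength).
# Polarity is +1 or -1; strength uses an assumed 1..3 scale.
LEXICON = {
    "good": (+1, 2),
    "wonderful": (+1, 3),
    "beautiful": (+1, 2),
    "bad": (-1, 2),
    "poor": (-1, 2),
    "terrible": (-1, 3),
}


def score_text(text: str, lexicon=LEXICON) -> int:
    """Sum the signed strengths of lexicon terms in a whitespace-tokenised text."""
    total = 0
    for token in text.lower().split():
        token = token.strip(".,!?;:\"'")  # drop surrounding punctuation
        if token in lexicon:
            polarity, strength = lexicon[token]
            total += polarity * strength
    return total
```

For example, "The screen is wonderful, but the battery is poor." scores 3 - 2 = 1. Note that this scorer ignores negation ("not good") and context-dependent words such as "long".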

21 Dictionary-based approaches
Start from a set of seed opinion words, and use WordNet's synsets and hierarchies to acquire further opinion words: use the seeds to search for synonyms and antonyms in WordNet (e.g., Hu and Liu, 2004).
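Seed expansion can be sketched as a breadth-first search in which synonyms inherit a word's orientation and antonyms flip it. To keep the sketch self-contained, tiny hand-written `SYNONYMS`/`ANTONYMS` tables stand in for WordNet; a real system would query WordNet itself, e.g., through NLTK. The tables and `expand_seeds` are hypothetical.

```python
from collections import deque

# Toy stand-ins for WordNet synonym/antonym relations (assumed, not real WordNet data).
SYNONYMS = {
    "good": ["fine", "nice"],
    "nice": ["pleasant"],
    "bad": ["poor"],
    "poor": ["inferior"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def expand_seeds(seeds: dict[str, int]) -> dict[str, int]:
    """Propagate orientation (+1/-1) from seed words through synonym/antonym links."""
    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        sign = orientation[word]
        neighbours = [(w, sign) for w in SYNONYMS.get(word, [])]
        neighbours += [(w, -sign) for w in ANTONYMS.get(word, [])]
        for other, other_sign in neighbours:
            if other not in orientation:  # first label wins; conflicts are ignored
                orientation[other] = other_sign
                queue.append(other)
    return orientation
```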

22 Dictionary-based approaches
Use additional information (e.g., glosses) and learning from WordNet (Andreevskaia and Bergler, 2006; Esuli and Sebastiani, 2005).

23 Dictionary-based approaches
Advantage: good for finding a large number of opinion words. Weakness: does not find context-dependent opinion words, e.g., small, long, fast.

24 Corpus-based approaches
Rely on syntactic rules and co-occurrence patterns to extract opinion words from large corpora, using a list of seed words, a large domain corpus, and machine learning. Advantage: this approach can find domain-dependent (corpus-dependent) opinion words.

25 How to identify subjective terms?
Assume that contexts are coherent. Statistical association: if words of the same orientation tend to co-occur, then the presence of one makes the other more probable; use statistical measures of association to capture this interdependence. Assume that alternatives are similarly subjective.

26 Corpus-based approaches (contd)
Conjunctions: conjoined adjectives usually have the same orientation (Hatzivassiloglou and McKeown 1997), e.g., "This car is beautiful and spacious." The method:
1. Start with seed words.
2. Use conjunctions to find adjectives with similar orientations.
3. Use log-linear regression to aggregate information from various conjunctions.
4. Use hierarchical clustering on a graph representation of adjective similarities to find two groups of the same orientation.
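A much-simplified version of this idea treats each conjunction as an edge. "and" means "same orientation" and "but" means "opposite orientation", and labels are propagated outward from seed adjectives. This sketch replaces Hatzivassiloglou and McKeown's log-linear model and clustering with plain label propagation. The input format and the `propagate` function are assumptions made for illustration.

```python
from collections import defaultdict, deque


def propagate(pairs, seeds):
    """pairs: (adj1, conjunction, adj2) triples; seeds: {adjective: +1 or -1}.

    'and' links are treated as same orientation, 'but' links as opposite;
    other conjunctions are ignored. Returns orientations for every adjective
    reachable from a seed.
    """
    graph = defaultdict(list)
    for a, conj, b in pairs:
        relation = 1 if conj == "and" else -1 if conj == "but" else 0
        if relation:
            graph[a].append((b, relation))
            graph[b].append((a, relation))

    labels = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for other, relation in graph[word]:
            if other not in labels:
                labels[other] = labels[word] * relation
                queue.append(other)
    return labels
```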

27 [Figure: graph of adjectives: nice, handsome, terrible, comfortable, painful, expensive, fun, scenic, slow]

28 Growing contextual opinion words [Ding, Liu, Wu]
Intra-sentence conjunction rule: the opinions on both sides of "and" tend to be the same (and the same tends to hold for two consecutive sentences), e.g., "This camera takes great pictures and has a long battery life." With a "but"-like clause, the opinions tend to be of opposite polarity. Context is important: "long battery life" vs. "long time to focus". Grow the set by applying various conjunctive rules. As the system sees more reviews, verify the results using those conjunctive rules. Keep only the opinions the system is confident about, as controlled by a confidence limit.
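The same rules can handle context-dependent words if they label (aspect, word) pairs instead of bare words and accumulate votes across reviews. In this hypothetical sketch, each observation records that a known opinion word and an unknown word were joined by "and" or "but" while describing an aspect. A pair is kept only if it has at least `min_votes` observations and the share of agreeing votes reaches `confidence`. The default values 0.8 and 2 are assumptions, not figures from Ding, Liu and Wu.

```python
from collections import Counter, defaultdict


def infer_context_opinions(observations, known, confidence=0.8, min_votes=2):
    """observations: (aspect, known_word, conjunction, new_word) tuples.
    known: {opinion word: +1 or -1}.
    Returns {(aspect, new_word): orientation} for confident pairs only.
    """
    votes = defaultdict(Counter)
    for aspect, known_word, conj, new_word in observations:
        if known_word not in known or conj not in ("and", "but"):
            continue
        # 'and' keeps the known word's polarity; 'but' flips it.
        sign = known[known_word] if conj == "and" else -known[known_word]
        votes[(aspect, new_word)][sign] += 1

    result = {}
    for pair, counter in votes.items():
        total = sum(counter.values())
        sign, count = counter.most_common(1)[0]
        if total >= min_votes and count / total >= confidence:
            result[pair] = sign
    return result
```

With this scheme, "long" can end up positive for "battery life" and negative for "focus time", which a single global lexicon entry cannot express.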

29 Semantic Orientation by Association
Labeled semantic orientation of words: Pwords = {good, nice, excellent, positive, fortunate, correct, superior}; Nwords = {bad, nasty, poor, negative, unfortunate, wrong, inferior}. There are various approaches to calculating the semantic association of two words: Pointwise Mutual Information (PMI) [Church and Hanks 1989], Latent Semantic Indexing (LSI) [Dumais et al. 1990], and Likelihood Ratios [Dunning 1993].
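PMI compares how often two words actually co-occur with how often they would co-occur if they were independent, so positive values indicate association. Its standard form is given below. The second line gives the orientation of a word w as its association with the Pwords minus its association with the Nwords, as in Turney and Littman (2003). The probabilities are estimated from co-occurrence counts in whatever corpus is available.

```latex
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)}
\qquad
\mathrm{SO\text{-}PMI}(w) = \sum_{p \in \mathrm{Pwords}} \mathrm{PMI}(w, p) \;-\; \sum_{n \in \mathrm{Nwords}} \mathrm{PMI}(w, n)
```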

30 Turney 2002; Turney & Littman 2003
Determine the semantic orientation of each extracted phrase based on its association with seven positive and seven negative seed words.
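This is a minimal sketch of SO-PMI. A word's orientation is its summed PMI with the seven positive seed words minus its summed PMI with the seven negative ones. The original work relied on search-engine hit counts. Here, as an assumption, co-occurrence is counted per document in a small in-memory corpus, with add-0.01 smoothing to avoid log 0. `so_pmi` is a hypothetical name.

```python
import math

PWORDS = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NWORDS = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]


def so_pmi(word, documents, pwords=PWORDS, nwords=NWORDS, smoothing=0.01):
    """Sum of PMI(word, p) over pwords minus sum of PMI(word, n) over nwords.

    Probabilities are estimated from document-level co-occurrence counts.
    """
    docs = [set(doc.lower().split()) for doc in documents]
    n = len(docs)

    def p(*words):
        # Smoothed fraction of documents containing all the given words.
        hits = sum(all(w in d for w in words) for d in docs)
        return (hits + smoothing) / n

    def pmi(a, b):
        return math.log2(p(a, b) / (p(a) * p(b)))

    return sum(pmi(word, w) for w in pwords) - sum(pmi(word, w) for w in nwords)
```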

31 Weakly supervised learning (Gamon and Aue 2005)
1. Start from a given list of seed words (seed words 1).
2. Get more seed words (seed words 2): words with low PMI with the seeds at the sentence level.
3. Get the semantic orientation of seed words 2 by PMI at the document level.
4. Get the semantic orientation of all words by PMI with all seed words.
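The four steps can be sketched as follows. This is not Gamon and Aue's implementation. Several details are assumptions, labelled here and in the code: the `min_count` frequency filter, the zero threshold for "low" sentence-level PMI, the add-0.01 smoothing, and the use of document-level PMI in step 4. `gamon_aue_sketch` is a hypothetical name.

```python
import math


def pmi(a, b, units, smoothing=0.01):
    """PMI of words a and b, where each unit (sentence or document) is a set of words."""
    n = len(units)
    pa = (sum(a in u for u in units) + smoothing) / n
    pb = (sum(b in u for u in units) + smoothing) / n
    pab = (sum(a in u and b in u for u in units) + smoothing) / n
    return math.log2(pab / (pa * pb))


def orientation(word, seeds, units):
    """Sum of PMI with positive seeds minus sum of PMI with negative seeds."""
    return sum(sign * pmi(word, seed, units) for seed, sign in seeds.items())


def gamon_aue_sketch(documents, seeds1, min_count=2):
    """documents: list of documents, each a list of sentence strings.
    seeds1: {word: +1 or -1}. Returns (seeds2, scores for every word).
    """
    sentences = [set(s.lower().split()) for doc in documents for s in doc]
    docs = [set().union(*(set(s.lower().split()) for s in doc)) for doc in documents]
    vocab = set().union(*sentences)

    # Step 2: frequent non-seed words whose average sentence-level PMI with the
    # seeds is negative (threshold 0 is an assumption) become seed words 2.
    candidates = [w for w in vocab - set(seeds1)
                  if sum(w in s for s in sentences) >= min_count]
    low = [w for w in candidates
           if sum(pmi(w, s, sentences) for s in seeds1) / len(seeds1) < 0]

    # Step 3: orient seed words 2 by document-level PMI with seed words 1.
    seeds2 = {w: 1 if orientation(w, seeds1, docs) > 0 else -1 for w in low}

    # Step 4: score every word by (assumed document-level) PMI with all seeds.
    all_seeds = {**seeds1, **seeds2}
    scores = {w: orientation(w, all_seeds, docs) for w in vocab}
    return seeds2, scores
```

In the test corpus below, "sturdy" and "clunky" never share a sentence with "good" or "bad", but they do share documents with them. That pattern is what lets step 3 assign them orientations.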

32 Document level opinion analysis
Polarity classification: classify documents (e.g., reviews) based on the overall sentiment expressed by their authors. Approaches: use an opinion lexicon; knowledge engineering; supervised learning techniques; classification using the Web as a corpus; semi-supervised learning.

33 Knowledge Engineering
Make use of lists of sentiment terms, and manually create analysis components based on cognitive linguistic theory: a parser, a feature structure representation, etc.

34 Supervised polarity classifier
Requirements: a labeled database of opinions, e.g., ratings downloaded from Amazon.com, epinions.com, etc. Build a binary opinion classifier from positive and negative ratings: merge 1 and 2 stars into negative, and 3, 4 and 5 stars into positive. Use a thresholded SVM, maximum entropy, naïve Bayes, etc.
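This is a minimal sketch of the pipeline. It maps star ratings to labels using the rule above (1 to 2 stars negative, 3 to 5 positive) and trains a multinomial naïve Bayes classifier with add-one smoothing. The classifier is written from scratch so that it needs only the standard library. The function names and toy reviews are hypothetical; a real system would use a library implementation and far more data.

```python
import math
from collections import Counter


def stars_to_label(stars: int) -> str:
    """Merge 1-2 stars into 'neg' and 3-5 stars into 'pos'."""
    return "neg" if stars <= 2 else "pos"


def train_nb(examples):
    """examples: (text, stars) pairs. Returns priors, per-class word counts, vocabulary."""
    class_docs = Counter()
    word_counts = {"pos": Counter(), "neg": Counter()}
    for text, stars in examples:
        label = stars_to_label(stars)
        class_docs[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    total = sum(class_docs.values())
    priors = {c: class_docs[c] / total for c in word_counts}
    return priors, word_counts, vocab


def classify_nb(text, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    priors, word_counts, vocab = model
    best, best_score = None, -math.inf
    for c, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = math.log(priors[c]) if priors[c] > 0 else -math.inf
        for w in text.lower().split():
            if w in vocab:  # ignore words never seen in training
                score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best
```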

35 Supervised Training
1. Obtain labeled sentences: positive, neutral, negative.
2. Extract features: words, n-grams, multi-word expressions, feature generalization [Kim & Hovy 2007].
3. Feature values: binary/frequency.
4. Run a training algorithm on the features to obtain a classifier.
5. [Optional] Do feature selection (using the log-likelihood ratio).
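Steps 2, 3 and 5 can be sketched as follows. The code extracts unigram and bigram features with binary or frequency values. It then ranks features by Dunning's log-likelihood ratio (G²), computed from a 2x2 table of feature presence against class. The function names are hypothetical. Counting documents rather than tokens when building the table is one of several reasonable choices, and is an assumption here.

```python
import math
from collections import Counter


def extract_features(text, binary=True, n_max=2):
    """Unigram..n_max-gram features with binary presence or raw frequency values."""
    tokens = text.lower().split()
    grams = Counter(
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    )
    return {g: 1 for g in grams} if binary else dict(grams)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 table: k11 = present & class, k12 = present & other,
    k21 = absent & class, k22 = absent & other."""
    table = [[k11, k12], [k21, k22]]
    total = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:
                expected = rows[i] * cols[j] / total
                g2 += observed * math.log(observed / expected)
    return 2 * g2


def rank_features(docs, labels, positive="pos"):
    """Rank features by G^2 of document presence against the given class."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    df_pos, df_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        # Binary features, so update() adds document frequencies.
        (df_pos if y == positive else df_neg).update(extract_features(text))
    scores = {}
    for f in set(df_pos) | set(df_neg):
        a, b = df_pos[f], df_neg[f]
        scores[f] = log_likelihood_ratio(a, b, n_pos - a, n_neg - b)
    return sorted(scores, key=scores.get, reverse=True)
```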

36 Semi-supervised approaches
Fully supervised techniques require a large amount of labeled data for the given domain. Semi-supervised systems use a small amount of domain knowledge: starting from a small set of seed words, they use a domain corpus to get domain-relevant opinion words, as discussed earlier.

37 Semi-supervised approach (Gamon & Aue 2005)
1. Obtain opinion words by a semi-supervised approach.
2. Given a domain corpus, label data using the average semantic orientation.
3. Train a classifier on the labeled data.
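Step 2 can be sketched as follows. Each document is labelled by the average semantic orientation of the lexicon words it contains. Documents with no lexicon words, or with an average inside an assumed `margin`, are left unlabelled. The lexicon scores, the margin and `label_by_average_so` are hypothetical. The resulting labelled set could then be used to train any supervised classifier (step 3).

```python
def label_by_average_so(documents, so_lexicon, margin=0.1):
    """Return (document, label) pairs for documents whose average SO is clearly
    positive or negative; other documents are skipped."""
    labelled = []
    for doc in documents:
        scores = [so_lexicon[w] for w in doc.lower().split() if w in so_lexicon]
        if not scores:
            continue  # no opinion words: leave unlabelled
        avg = sum(scores) / len(scores)
        if avg > margin:
            labelled.append((doc, "pos"))
        elif avg < -margin:
            labelled.append((doc, "neg"))
    return labelled
```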

