Automatic Hierarchy Discovery and Opinion Mining of Political Blogs Amit Goyal Kristi McBurnie November 28, 2007.

Presentation transcript:

Automatic Hierarchy Discovery and Opinion Mining of Political Blogs Amit Goyal Kristi McBurnie November 28, 2007

Outline: Introduction; Previous Work; Our Approach; Example; Challenges and Future Work; Milestones; Conclusion

Introduction The Web contains a wealth of opinions about products, politics, and more, in newsgroup posts, review sites, and elsewhere. Our interest: mining opinions expressed in user-generated content

Applications Businesses and Organizations  Market Intelligence: A huge amount of money is spent to find consumer sentiments and opinions (opinion polls, surveys) Individuals interested in others’ opinions when  Purchasing a product  Finding opinions on political topics  Using a service etc. Smart Ads  Place an ad when one praises a product  Place an ad from a competitor if one criticizes a product Opinion Search  Provide search for opinions  Give me opinions on “gmail”  Give me comparisons between “gmail vs yahoomail”

Types of opinions Direct Opinions: sentiment expressions on objects. E.g. policies, politicians, movies, products  E.g. “I find myself in support of the Senate Judiciary Committee, which approved legislation that clears the way for millions of undocumented workers to continue working in America and seek citizenship.” Comparisons: relations expressing similarities or differences of more than one object.  E.g. “I think Bush will beat Kerry in the presidential elections” or “The lens quality of Camera A is better than Camera B”

Problem Statement Given an object and a collection of reviews on it, the task is  Identification of features  Making a hierarchy of features  Sentiment Analysis: Determining the orientation and strength  Providing a visualization (summary)

Previous Work Mainly focused on product and movie reviews Feature Extraction  Opinion Observer (Hu and Liu, 2004)  Opine (Popescu and Etzioni, 2005)  Red Opal (Scaffidi, 2007) Hierarchical Discovery  To be filled by kristi

Previous Work Opinion Observer By Bing Liu and Minqing Hu Feature Extraction  Identify Nouns using POS tagging  Identify Noun phrases by Association Rule Mining  Compactness pruning, redundancy pruning  Opinion word extraction  Infrequent feature identification 72% precision and 80% recall

Previous Work OPINE Feature Extraction  First, extract nouns and noun phrases and retain those with frequency greater than some threshold  Evaluate each noun phrase by computing the PMI (point-wise mutual information) scores between the phrase and meronymy discriminators associated with the product class, e.g. “of scanner”, “scanner has”, “scanner come with” etc. for the Scanner class: PMI(f,d) = Hits(d+f) / (Hits(d) * Hits(f))  Then, the PMI scores are converted to binary features for a Naïve Bayes classifier, which outputs a probability associated with each fact  Compared to Hu and Liu's work, 22% better precision and 3% lower recall
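The PMI formula on this slide can be sketched in a few lines of Python. This is only an illustration of the ratio as written on the slide, not OPINE's implementation: the function name `pmi` and all hit counts below are hypothetical, made up for the example.

```python
def pmi(hits_joint: int, hits_discriminator: int, hits_feature: int) -> float:
    """PMI(f, d) = Hits(d + f) / (Hits(d) * Hits(f)), as written on the slide."""
    if hits_discriminator == 0 or hits_feature == 0:
        return 0.0  # assumption: treat an unseen phrase as having no association
    return hits_joint / (hits_discriminator * hits_feature)


# Hypothetical hit counts for the discriminator "scanner has" (illustration only).
hits = {
    "scanner has": 1000,
    "cover": 5000,
    "scanner has cover": 50,
    "weather": 20000,
    "scanner has weather": 1,
}

cover_score = pmi(hits["scanner has cover"], hits["scanner has"], hits["cover"])
weather_score = pmi(hits["scanner has weather"], hits["scanner has"], hits["weather"])

# A candidate that co-occurs with the discriminator more often than its overall
# frequency suggests gets the higher score.
print(cover_score > weather_score)
```

With these made-up counts, "cover" scores higher than "weather", which is the behaviour the discriminator test is meant to capture: true parts of a scanner co-occur with "scanner has" more than unrelated nouns do.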

Previous Work Red Opal 3 components:  Feature Extractor  Product Scorer  User Interface Performs better than Opinion Observer

Previous Work Red Opal Feature Extraction  POS tagging, takes noun and noun phrases as potential features  Use lemma frequency to rank the features Product Scoring: Score of feature f of product p  o(r,f) is the number of occurrences of feature f in review r  w(r,f) is the weight of feature f in review r

Previous Work Clustering Conceptual clustering  CLUSTER/2 Places object descriptions and attributes together to obtain domain-dependent goals  COBWEB Favours classes that maximize the information that can be predicted from knowledge of class membership Hierarchical clustering  BIRCH Hierarchically cluster elements in a dataset Level of clustering quality = level in the hierarchy

Previous Work Hierarchy Discovery Han and Fu formally define a hierarchy as “A sequence of mapping from a set of lower-level concepts to their higher-level correspondences”  DBLearn automatically discovered a hierarchy of concepts for the purpose of data mining, e.g. birthplace may have the following hierarchy: city, province, country Foreman et al.  Trains categorizers and automatically constructs a hierarchy of categories using human trainers  Good GUI  Difficult for novice users and hard to optimize

Previous Work Hierarchy Discovery Sanderson and Croft  Automatically develop hierarchy in web documents  Organize extracted words/phrases using subsumption  No clustering or training techniques Yang and Lee  Hierarchies of web directories  Text mining to discover relationships between documents and between words Cluster them into document and word maps

Previous Work Sentiment Analysis Esuli and Sebastiani  3 stages: Determine subjective/objective polarity Determine positive/negative polarity Determine strength of the positive/negative polarity  Uses SentiWordNet to assign 3 scores to each word (objectivity, positivity, negativity)

Previous Work Sentiment Analysis Pang and Lee  Only subjective sections of the movie review  Machine learning techniques Pair-wise relations between extracts to build an undirected graph Minimum cut  Efficient and results in higher accuracy rates Agarwal and Bhattacharyya:  SVM classifier  Determine strength of polarity of subjective adjectives in good vs bad classification based on WordNet’s synonymy graph  Applied cut-based graph similar to Pang et al  Reached accuracies of 84%-95.6%

Our proposal Apply feature extraction and opinion mining in political domain Applications in political domain:  Automatic opinion polls  Identification of local/global issues in elections  Target campaigning in elections  Impact of speech Output: Objects are politicians Categories are political organizations Topic may be policies, issues etc In this project, we focus mainly on feature extraction and their hierarchy discovery

Our Approach Observations Two kinds of opinions:  Direct – talks about a single object  Comparison – talks about multiple objects Two kinds of information  Facts (objective)  Opinions (subjective) Sentiment Analysis can be done only on subjective information Although features occur in both categories, subjective sentences are noisy

Comparison to product domain
                 Product Domain                   Political Domain
Category         Product Category (e.g. Camera)   Political Organizations (e.g. Democrats)
Object           Product (e.g. Camera A)          Leaders (e.g. Bush)
Features/Topics  Properties (e.g. lens)           Policies (e.g. Immigration)

Our Approach [Diagram: hierarchy tree with four levels. Categories: Political Organizations; Objects: Politician; Features: Policy; Sub-features: Sub-policy]

Our Approach Perform feature extraction Split into objective and subjective phrases Hierarchy discovery on features from objective sentences Sentiment analysis on features from subjective sentences

Our Approach Feature Extraction Extract the features  Extract nouns from POS tagging  Extract noun phrases from Association Rule Mining  Pruning  Rank the features based on lemma frequency Identify the subjectivity of all sentences  Mine the opinion words (adjectives)  Use key phrases dictionary (e.g. “can you believe”, “I think”, “I recommend” etc)  Visual differences – factual data is often represented in quotes
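The subjectivity step above can be illustrated with a rough Python sketch. It only covers two of the cues listed on the slide (opinion words and the key-phrase dictionary). The key phrases are the ones quoted on the slide; the opinion-word set and the function name `is_subjective` are assumptions made for this example, and a real system would mine opinion words from the corpus rather than hard-code them.

```python
import re

# Key phrases quoted on the slide.
KEY_PHRASES = ("can you believe", "i think", "i recommend")
# Hypothetical toy lexicon; the proposal mines opinion words (adjectives) instead.
OPINION_WORDS = {"absurd"}


def is_subjective(sentence: str) -> bool:
    """Flag a sentence as subjective if it has a key phrase or an opinion word."""
    text = sentence.lower()
    for phrase in KEY_PHRASES:
        # Word boundaries avoid matching inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            return True
    words = set(re.findall(r"[a-z']+", text))
    return bool(words & OPINION_WORDS)


first = ("The economic cost of the war in Iraq is estimated to total "
         "$1.3 trillion, according to a new report.")
second = ("I think this is an absurd amount of money to be spending on "
          "killing people and freeing oil fields.")

print(is_subjective(first), is_subjective(second))
```

This kind of rule is deliberately crude: opinion words also appear in objective sentences, so the output is a first filter rather than a final classification.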

Our Approach Hierarchy Discovery 3 approaches:  Subsumption (Sanderson and Croft): look at every pair of terms and apply subsumption. X subsumes Y if the documents in which Y occurs are a proper subset of the documents in which X occurs, i.e. P(X|Y) = 1 and P(Y|X) < 1  Clustering  Use DBpedia and/or YAGO
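The subsumption test can be sketched directly from the two conditions on this slide. The code below is a minimal illustration using the strict form stated here, P(X|Y) = 1 and P(Y|X) < 1; the term-to-document index `term_docs` and the helper names are hypothetical.

```python
from itertools import permutations


def subsumes(docs_x: set, docs_y: set) -> bool:
    """X subsumes Y when P(X|Y) = 1 and P(Y|X) < 1 (document co-occurrence)."""
    if not docs_x or not docs_y:
        return False
    both = docs_x & docs_y
    p_x_given_y = len(both) / len(docs_y)
    p_y_given_x = len(both) / len(docs_x)
    return p_x_given_y == 1.0 and p_y_given_x < 1.0


def subsumption_pairs(term_docs: dict) -> list:
    """Check every ordered pair of terms and keep (parent, child) pairs."""
    return [
        (x, y)
        for x, y in permutations(term_docs, 2)
        if subsumes(term_docs[x], term_docs[y])
    ]


# Hypothetical index: term -> ids of the documents in which it occurs.
term_docs = {
    "war in Iraq": {1, 2, 3, 4},
    "economic cost": {2, 3},
    "immigration": {5, 6},
}

pairs = subsumption_pairs(term_docs)
print(pairs)
```

Here "war in Iraq" subsumes "economic cost" because every document mentioning the cost also mentions the war, but not the other way round. Checking all ordered pairs is quadratic in the number of terms, which is acceptable for a pruned feature list.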

Our Approach Hierarchy Discovery 3 approaches:  Subsumption  Clustering (Yang and Lee): cluster phrases by co-occurrence using an unsupervised learning algorithm  SOM networks  Organizes phrases into a 2D map of neurons  According to similarity of vectors 3 Steps:  Training process  Assigning phrases to a neuron  Labelling process  Use DBpedia and/or YAGO

Our Approach Hierarchy Discovery 3 approaches:  Subsumption  Clustering Find a group of dominating clusters (neurons) Make these as superclusters and put neighbours one level down Repeat for lower level of hierarchy under each subcluster  Use DBpedia and/or YAGO

Our Approach Hierarchy Discovery 3 approaches:  Subsumption  Clustering  Use DBpedia and/or YAGO DBpedia provides 3 classification schemes:  Wikipedia categories  YAGO classification  WordNet Synset Links

Our Approach Hierarchy Discovery 3 approaches:  Subsumption  Clustering  Use DBpedia and/or YAGO

Our Approach Hierarchy Discovery [Diagram: hierarchy tree with four levels. Category: Political Organizations; Objects: Politician; Features: Policy; Sub-features: Sub-policy]

Our Approach Sentiment Analysis 2 ways to approach this:  Subjective phrases What does the public think about each policy  Objective phrases What is the policy Rank parties on each policy on a scale from right-wing to left-wing

Our Approach Sentiment Analysis Subjective phrases  What does the public think about each policy  Cut-based classification (Pang and Lee)  Individual scores  Association scores  Partition cost A cut (S,T) of G is a partition of its nodes into sets S = {s} ∪ S’ and T = {t} ∪ T’, where s is not contained in S’ and t is not contained in T’. Its cost, cost(S,T), is the sum of the weights of all edges crossing from S to T. A minimum cut of G is one of minimum cost.
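The cut definition above can be made concrete with a small Python sketch. It assumes the usual construction for cut-based classification: the source s stands for one class, the sink t for the other, individual scores become edges to s and t, and association scores become edges between items. The minimum cut is found with a plain breadth-first augmenting-path max-flow, a standard technique swapped in here for illustration. All scores, sentence labels, and function names are hypothetical.

```python
from collections import deque

SOURCE, SINK = "<source>", "<sink>"  # hypothetical node names for s and t


def min_cut(capacity, source, sink):
    """Return (cost, S): cut cost and the source side of a minimum s-t cut."""
    # Residual graph; reverse edges start at 0 unless an edge already exists.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)

    total = 0.0
    while True:
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, set(parent)  # nodes still reachable from s form S
        # Walk back from the sink and push the bottleneck flow along the path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= flow
            residual[v][u] += flow
        total += flow


def classify(ind_subj, ind_obj, assoc):
    """Build the graph from individual and association scores, then cut it."""
    cap = {SOURCE: {}}
    for x in ind_subj:
        cap[SOURCE][x] = ind_subj[x]            # cut if x lands on the t side
        cap.setdefault(x, {})[SINK] = ind_obj[x]  # cut if x lands on the s side
    for (a, b), w in assoc.items():             # undirected association edge
        cap.setdefault(a, {})[b] = w
        cap.setdefault(b, {})[a] = w
    cost, s_side = min_cut(cap, SOURCE, SINK)
    return cost, s_side - {SOURCE}


# Hypothetical scores for three sentences A, B, C.
ind_subj = {"A": 0.8, "B": 0.5, "C": 0.1}
ind_obj = {"A": 0.2, "B": 0.5, "C": 0.9}
assoc = {("A", "B"): 1.0, ("B", "C"): 0.1}

cost, subjective = classify(ind_subj, ind_obj, assoc)
print(round(cost, 3), sorted(subjective))
```

In this toy input sentence B has no preference on its own (0.5 versus 0.5), but its strong association with A pulls it onto the subjective side, which is the effect the association scores are meant to have.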

Our Approach Sentiment Analysis Subjective phrases  What does the public think about each policy  Agarwal and Bhattacharyya Determine adjective strength Cut-based classification between sentences (Pang and Lee) Cut-based classification between documents  Improved accuracy

Our Approach Sentiment Analysis Objective phrases  What is the policy  Rank parties on each policy on a scale from right-wing to left-wing  Definition of polarity would be left/right, using a comparison of left-wing and right-wing policies/ideals, instead of the traditional positive/negative using the ideal words ‘poor’ and ‘excellent’ [Scale: Left-wing (Liberal) to Right-wing (Conservative)]

Example Ideal case: “The economic cost of the war in Iraq is estimated to total $1.3 trillion – roughly double the amount the White House has requested thus far, according to a new report by Democrats on Congress’ Joint Economic Committee. I think this is an absurd amount of money to be spending on killing people and freeing oil fields.” Political Organization = Republicans; Politician = George Bush; Topic = War in Iraq; Sub-topic = cost; Opinion words = absurd, killing, freeing; Polarity = negative

Example Feature Extraction (same passage as in the ideal case): Noun phrases: economic cost, war in Iraq, amount, report, amount, money, people, oil fields. Proper nouns: White House, Democrats on Congress’ Joint Economic Committee. Frequent features: economic cost, war in Iraq, money, oil fields, White House

Example Identification of Subjectivity (same passage as in the ideal case): Opinion words: think, absurd. The 1st sentence is objective, and the 2nd is subjective. Interesting features: economic cost, war in Iraq

Example Hierarchy Discovery, step 1 (same passage as in the ideal case): Identification of category/object for proper nouns using DBpedia. Category = Republicans; Object = George Bush

Example [Diagram: hierarchy tree. Category: Republican; Object: George W. Bush; Features: Policy; Sub-features: Sub-policy]

Example Hierarchy Discovery, step 2 (same passage as in the ideal case): Identification of policy hierarchy using subsumption and clustering. Policies are derived from interesting features  economic cost, war in Iraq

Example [Diagram: hierarchy tree. Category: Republican; Object: George W. Bush; Feature: War in Iraq; Sub-feature: Economic Cost]

Example Sentiment Analysis (same passage as in the ideal case): The opinion is the subjective (2nd) sentence. Polar words: absurd, spending, killing, freeing. Polarity: Negative

Challenges: Difficult to distinguish between objective and subjective information; Opinion words also occur in objective sentences; Identification of spam blogs; Identification of implicit features; Mapping politician to the policy in comparison blogs; Deciding on a distance measurement for clustering

Future Work Implementation of algorithms Summarization of opinions  Visualization Refinements

Milestones: Decide on domain; Read previous works; Decide on an approach that is best for the domain; Write up an example to illustrate it; Challenges and future work; Presentation; Write the paper

Questions?

Previous Work OPINE (Backup Slide) Overall Process

Previous Work Opinion Observer (Backup Slide) By Bing Liu and Minqing Hu
