An Effective Statistical Approach to Blog Post Opinion Retrieval
Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (CIKM 2008)

2 Introduction
- Blogs have recently emerged as a new grassroots publishing medium.
- A key feature that distinguishes blog content from other Web content is its subjective nature.
- Bloggers tend to express opinions and comments about given targets, such as persons, organizations, or products.

3 Introduction
- Under the TREC opinion finding task, only a handful of groups achieved an improvement over their baseline, using techniques such as NLP or SVM classifiers.
- The proposed approaches either involve considerable manual effort in collecting evidence for opinions, or yield little improvement over a baseline that includes no opinion finding feature.

4 Introduction
- This paper proposes a statistical, lightweight, automatic dictionary-based approach.
- Despite its apparent simplicity, the approach provides statistically significant improvements over robust baselines, including the best TREC baseline run, without any manual effort.

5 The Statistical Dictionary-based Approach to Opinion Retrieval
1. Automatically generates a dictionary from the collection without requiring manual effort.
2. Assigns a weight to each term in the dictionary, which represents how opinionated the term is.
3. Assigns an opinion score to each document in the collection, using the top-weighted terms from the dictionary as a query.
4. Appropriately combines the opinion score with the initial relevance score produced by the retrieval baseline.

6 Dictionary Generation
- To derive the dictionary, we filter out terms that are too frequent or too rare in the collection.
- A term that appears too many times in the collection probably carries too little information, and a term that appears too few times is probably too specific. In either case it cannot be generalized across queries as an indicator of opinion.

7 Dictionary Generation
- We first rank all terms in the collection by their within-collection frequencies, in descending order.
- The terms whose ranks fall in the range (s·#terms, u·#terms) are selected for the dictionary.
- We apply s = [value missing from the transcript] and u = [value missing from the transcript].
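
The rank-based filter described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_dictionary`, the toy corpus, the alphabetical tie-breaking, and the values of `s` and `u` in the usage lines are all hypothetical and chosen only so the example runs.

```python
from collections import Counter

def build_dictionary(documents, s, u):
    """Keep terms whose frequency rank lies in the open range (s*#terms, u*#terms).

    documents: iterable of token lists (assumed already stemmed and stopword-filtered).
    s, u: fractions of the vocabulary size, 0 <= s < u <= 1 (hypothetical values).
    """
    freq = Counter(tok for doc in documents for tok in doc)
    # Rank by within-collection frequency, descending; ties broken alphabetically (an assumption)
    ranked = sorted(freq, key=lambda t: (-freq[t], t))
    n_terms = len(ranked)
    lo, hi = s * n_terms, u * n_terms
    # Ranks are 1-based; terms outside (lo, hi) are the too-frequent and too-rare ones
    return [t for rank, t in enumerate(ranked, start=1) if lo < rank < hi]

# Toy usage with made-up thresholds
docs = [["good", "blog", "post"], ["bad", "blog", "post"], ["good", "post", "rare"]]
print(build_dictionary(docs, s=0.2, u=0.8))  # ['blog', 'good']
```

On the toy corpus the most frequent term ("post") and the two rarest terms are dropped, leaving the mid-frequency band.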


9 Term Weighting
D(Rel): the relevant document set.
D(opRel): the opinionated relevant document set.
- For each term t in the opinion term dictionary, we measure w_opn(t), the divergence of the term's distribution in D(opRel) from that in D(Rel).
- This divergence value measures how much a term stands out in the opinionated documents, compared with all relevant documents.
- The higher the divergence, the more opinionated the term.

10 Term Weighting
- A commonly used measure for term weighting is the KL divergence from a term's distribution in a document set to its distribution in the whole collection.
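
The formula for this slide did not survive in the transcript. As a hedged sketch, the usual per-term KL contribution, p_fg · log2(p_fg / p_bg), could be computed as below. The function name `kl_term_weight`, its argument names, and the choice of maximum-likelihood estimates are assumptions for illustration; in the setting of the talk the foreground set would be D(opRel) and the background set D(Rel).

```python
import math

def kl_term_weight(tf_fg, len_fg, tf_bg, len_bg):
    """Per-term KL contribution p_fg * log2(p_fg / p_bg).

    tf_fg, len_fg: term frequency and total token count in the foreground set.
    tf_bg, len_bg: the same counts in the background set (tf_bg assumed > 0).
    """
    if tf_fg == 0:
        return 0.0  # a term absent from the foreground contributes nothing
    p_fg = tf_fg / len_fg
    p_bg = tf_bg / len_bg
    return p_fg * math.log2(p_fg / p_bg)

# A term twice as likely in the foreground as in the background
print(kl_term_weight(tf_fg=8, len_fg=64, tf_bg=16, len_bg=256))  # 0.125
```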

11 Term Weighting
- The KL divergence measure considers only the divergence of one distribution from the other, and ignores how frequently a term occurs in the opinionated documents.
- The weights of the terms in the opinion dictionary might therefore be biased towards terms that have high KL divergence values but carry little information in the opinionated document set D(opRel).

12 Term Weighting
- Another method is the Bo1 term weighting model, which measures how informative a term is in the set D(opRel) against D(Rel), with λ = tf_rel / N_rel.
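
Only the definition of λ survives on this slide. The sketch below uses the standard Bo1 (Bose-Einstein) form from the Divergence from Randomness framework, w(t) = tf_x · log2((1 + λ) / λ) + log2(1 + λ). Reading `tf_x` as the term's frequency in D(opRel), `tf_rel` as its frequency in D(Rel), and `n_rel` as the number of documents in D(Rel) is an assumption, as is the function name `bo1_weight`.

```python
import math

def bo1_weight(tf_x, tf_rel, n_rel):
    """Standard Bo1 weight: tf_x * log2((1 + lam) / lam) + log2(1 + lam).

    tf_x:   term frequency in the opinionated relevant set (assumed meaning).
    tf_rel: term frequency in the relevant set, must be > 0.
    n_rel:  number of documents in the relevant set, must be > 0.
    """
    lam = tf_rel / n_rel  # lambda = tf_rel / N_rel, as given on the slide
    return tf_x * math.log2((1 + lam) / lam) + math.log2(1 + lam)

# With lam = 1 both logarithms equal 1, so the weight is tf_x + 1
print(bo1_weight(tf_x=3, tf_rel=4, n_rel=4))  # 4.0
```

Unlike a pure ratio of probabilities, this weight grows with `tf_x`, so a term must actually occur often in the opinionated documents to score highly.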

13 Generating the Opinion Score
- We take the X top-weighted terms from the opinion dictionary (X = 100 in the experiments) and submit them to the retrieval system as a query Q_opn.
- The retrieval system assigns a relevance score to each document in the collection.
- This relevance score reflects the extent to which the top-weighted opinionated terms are informative in the document, capturing the overall opinionated nature of the document.
- This is called the opinion score: Score(d, Q_opn).
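
The query construction on this slide can be sketched as below. `top_terms` mirrors the "take the X top-weighted terms" step; `opinion_score` is only a stand-in scorer (the fraction of document tokens that are query terms) so that the example is self-contained. Both names and the toy weights are hypothetical, and a real system would apply its own document weighting model to Q_opn instead.

```python
import heapq
from collections import Counter

def top_terms(weights, x=100):
    """Return the x highest-weighted dictionary terms, i.e. the query Q_opn."""
    return heapq.nlargest(x, weights, key=weights.get)

def opinion_score(doc_tokens, q_opn):
    """Stand-in for Score(d, Q_opn): share of document tokens that are query terms.

    This is NOT the retrieval model used in the talk; it only illustrates
    that the score is computed per document against the opinion query.
    """
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in q_opn) / len(doc_tokens)

# Toy usage with made-up term weights
weights = {"love": 2.5, "hate": 2.0, "table": 0.1}
q_opn = top_terms(weights, x=2)
print(q_opn)                                                                # ['love', 'hate']
print(opinion_score(["i", "love", "this", "hate", "that", "love"], q_opn))  # 0.5
```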

14 Score Combination
1. Linear combination
2. Logarithmic combination
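
The two combination formulas are not preserved in the transcript, so the sketch below shows only one plausible shape for each and should not be read as the authors' equations. The linear form is assumed to be a weighted interpolation with parameter `a`; the logarithmic form is assumed to add a boost of k / (-log2 p) to the relevance score, where `p_opn` is the opinion score normalised to a probability. The function names and argument names are hypothetical; the values a = 0.25 and k = 250 in the usage lines are those reported on the evaluation slide.

```python
import math

def linear_combination(score_rel, score_opn, a):
    """Assumed form: (1 - a) * relevance score + a * opinion score."""
    return (1 - a) * score_rel + a * score_opn

def log_combination(score_rel, p_opn, k):
    """Assumed form: relevance score + k / (-log2(p_opn)), for 0 < p_opn < 1.

    A larger p_opn gives a smaller -log2(p_opn) and hence a larger boost.
    """
    return score_rel + k / (-math.log2(p_opn))

print(linear_combination(score_rel=8.0, score_opn=4.0, a=0.25))  # 7.0
print(log_combination(score_rel=10.0, p_opn=0.25, k=250))        # 135.0
```

Either way, the final ranking is produced by re-sorting the baseline's documents on the combined score.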

15 Experiment: Data
- Dataset: the Blog06 collection.
- We use the permalinks, which are the blog posts and their associated comments.
- Each term is stemmed using Porter's English stemmer, and standard English stopwords are removed.

16 Experiment: Baseline
- InLB document weighting model, with b = 0.2337.

17 Experiment: External Opinion Dictionary
- We also manually compile a dictionary from various external linguistic resources.
- The dictionary contains approximately 12,000 English words, mostly adjectives, adverbs and nouns, which are supposed to be subjective.
- In this paper, we call the manually edited dictionary the external dictionary, and the automatically derived one the internal dictionary.

18 Experiment: External Opinion Dictionary

19 Experiment: Evaluation

20 Experiment: Evaluation
- Use the Bo1 term weighting method.
- Set a = 0.25 and k = 250.

21 Conclusions and Future Work
- This paper has proposed an effective and practical approach to retrieving opinionated blog posts without the need for manual effort.
- The automatically generated internal dictionary provides retrieval performance as good as that of an external dictionary manually compiled from various linguistic resources.

22 Conclusions and Future Work
In the future:
1. Extend the work to detecting the polarity or orientation of the retrieved opinionated documents.
2. Study the connection between the opinion finding task and question answering.
   - Example: extracting the opinionated sentences within a blog post about a given target.