
Opinion Summarization Using Entity Features and Probabilistic Sentence Coherence Optimization
(UIUC at TAC 2008 Opinion Summarization Pilot)
Nov 19, 2008
Hyun Duk Kim, Dae Hoon Park, V.G.Vinod Vydiswaran, ChengXiang Zhai
Department of Computer Science, University of Illinois, Urbana-Champaign

Research Questions
1. Can we improve sentence retrieval by assigning more weight to entity terms?
2. Can we optimize the coherence of a summary using a statistical coherence model?

General Approach

[Pipeline diagram] Target 1001 with questions Q1, Q2, … (as Query 1, Query 2) and documents
→ Step 1: Sentence Retrieval → relevant sentences
→ Step 2: Sentence Filtering → opinionated relevant sentences
→ Step 3: Sentence Organization → opinion summary

Step 1: Sentence Retrieval

[Diagram] Target 1001 (Question i) and documents are fed to the Indri toolkit, which returns relevant sentences. Two settings are compared:
- Uniform term weighting
- Non-uniform weighting: named entity: 10, noun phrase: 2, others: 1
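The non-uniform weighting can be sketched as building an Indri-style #weight query in which named-entity terms get weight 10, noun-phrase terms 2, and all other terms 1. The tagger below is a hypothetical stand-in (a real NE/NP tagger would be used); the example entity and phrase sets are assumptions for illustration only.

```python
# Sketch: build a weighted Indri query string from tagged query terms.

def tag_terms(terms):
    """Hypothetical tagger: maps each term to 'NE', 'NP', or 'OTHER'."""
    named_entities = {"zhai"}          # assumed example named entity
    noun_phrases = {"summarization"}   # assumed example noun phrase
    tags = {}
    for t in terms:
        if t.lower() in named_entities:
            tags[t] = "NE"
        elif t.lower() in noun_phrases:
            tags[t] = "NP"
        else:
            tags[t] = "OTHER"
    return tags

WEIGHTS = {"NE": 10.0, "NP": 2.0, "OTHER": 1.0}

def weighted_indri_query(terms):
    """Emit an Indri #weight query with per-term weights."""
    tags = tag_terms(terms)
    parts = ["%.1f %s" % (WEIGHTS[tags[t]], t) for t in terms]
    return "#weight( " + " ".join(parts) + " )"

print(weighted_indri_query(["Zhai", "opinion", "summarization"]))
# -> #weight( 10.0 Zhai 1.0 opinion 2.0 summarization )
```

Indri's #weight operator does accept per-term weights in this form; the specific weight values 10/2/1 are the ones reported on the slide.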

Step 2: Sentence Filtering

[Diagram] From the relevant sentences (S1', S2', S3', …), apply three passes:
- Keep only opinionated sentences
- Keep only sentences with the same polarity
- Remove redundancy
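The three passes above can be sketched as a single filter, assuming each sentence carries precomputed `opinionated` and `polarity` labels from a classifier. The redundancy check here uses simple word-overlap (Jaccard) with an assumed threshold; the slide does not specify the exact redundancy measure.

```python
# Sketch of the three filtering passes over retrieved sentences.

def jaccard(a, b):
    """Word-overlap similarity between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def filter_sentences(sentences, target_polarity, redundancy_threshold=0.8):
    kept = []
    for s in sentences:
        if not s["opinionated"]:
            continue  # keep only opinionated sentences
        if s["polarity"] != target_polarity:
            continue  # keep only the target polarity
        if any(jaccard(s["text"], k["text"]) >= redundancy_threshold for k in kept):
            continue  # drop near-duplicates of already-kept sentences
        kept.append(s)
    return kept
```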

Step 3: Summary Organization (Method 1: Polarity Ordering)
- Paragraph structure by question and polarity
- Add guiding phrases, e.g.: "The first question is … Following are positive opinions… Following are negative opinions… The second question is … Following are mixed opinions… …"

Step 3: Summary Organization (Method 2: Statistical Coherence Optimization)

[Diagram] Reorder S1, S2, S3, …, Sn into S1', S2', S3', …, Sn'.
- Coherence function: c(Si, Sj)
- Use a greedy algorithm to order sentences so as to maximize the total score c(S1', S2') + c(S2', S3') + … + c(Sn-1', Sn')
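A minimal sketch of the greedy ordering step: try each sentence as a starting point, repeatedly append the remaining sentence with the highest coherence to the last one chosen, and keep the best chain found. The slide only says "a greedy algorithm", so this particular greedy variant is an assumption; the coherence function is passed in as a parameter.

```python
# Sketch: greedily order sentences to maximize the sum of adjacent coherences.

def greedy_order(sentences, c):
    """Return an ordering with a high total sum of c(s_i, s_{i+1})."""
    best_chain, best_score = None, float("-inf")
    for start in sentences:
        chain = [start]
        rest = [s for s in sentences if s != start]
        score = 0.0
        while rest:
            # pick the remaining sentence most coherent with the last one
            nxt = max(rest, key=lambda s: c(chain[-1], s))
            score += c(chain[-1], nxt)
            chain.append(nxt)
            rest.remove(nxt)
        if score > best_score:
            best_chain, best_score = chain, score
    return best_chain
```

With a toy coherence that prefers similar items, e.g. `c = lambda a, b: -abs(a - b)`, `greedy_order([1, 5, 3], c)` yields `[1, 3, 5]`.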

Probabilistic Coherence Function (idea similar to [Lapata 03])

[Diagram: grid of word pairs (u, v) between the words of Sentence 1 and Sentence 2, e.g. P(u,v) = ( )/(3*4+1.0)]
- Average coherence probability (pointwise mutual information) over all word combinations of adjacent sentences
- Train on the original documents


Submissions: UIUC1, UIUC2

Same three-step pipeline; the two runs differ at each step:
- Step 1 (Sentence Retrieval) — UIUC1: non-uniform weighting; UIUC2: uniform weighting
- Step 2 (Sentence Filtering) — UIUC1: aggressive polarity filtering; UIUC2: conservative filtering
- Step 3 (Sentence Organization) — UIUC1: polarity ordering; UIUC2: statistical ordering

Evaluation

[Ranking table] Rank among runs without answer-snippet (total: 19 runs), for UIUC1 (Polarity) and UIUC2 (Coherence), on: F-Score, Grammaticality, Non-redundancy, Structure/Coherence, Fluency/Readability, Responsiveness.

Evaluation

[Same ranking table, annotated with the component behind each result] Rank among runs without answer-snippet (total: 19 runs), for UIUC1 (Polarity) and UIUC2 (Coherence). Annotations: NE/NP retrieval, polarity filtering; polarity ordering; statistical ordering; nothing.

Evaluation of Named Entity Weighting

[Same retrieval diagram as Step 1: Target 1001 (Question i) and documents fed to the Indri toolkit; uniform term weighting vs. non-uniform weighting (named entity: 10, noun phrase: 2, others: 1)]
Assume a sentence is relevant iff similarity(sentence, nugget description) > threshold.
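The relevance assumption above can be sketched directly, assuming cosine similarity over raw word counts (the slide does not specify the similarity measure or the threshold value used):

```python
# Sketch: count a sentence as relevant iff its similarity to the nugget
# description exceeds a threshold. Cosine over word counts is an assumption.
from collections import Counter
import math

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_relevant(sentence, nugget, threshold=0.3):
    """Relevance proxy: similarity(sentence, nugget description) > threshold."""
    return cosine(sentence, nugget) > threshold
```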

Effectiveness of Entity Weighting

[Charts comparing retrieval performance under uniform and non-uniform weighting]

Polarity Module

Polarity module performance evaluated on the sentiment corpus [Hu&Liu 04, Hu&Liu 04b]. Classification results over Positive, Negative, NonOpinionated, and Mixed sentences (unit: # of sentences):
- Exact Match: Positive 1363/3105 ≈ 0.44; Negative /1591 = 0.26; Total = 0.38
- Exact Opposite: Positive 383/3105 ≈ 0.12; Negative /1591 = 0.23; Total = 0.16

Coherence Optimization

Evaluation method:
- Basic assumption: the sentence order of the original document is coherent.
- Among the given target documents, use 70% as the training set and 30% as the test set.
- Measurement: strict pair matching = # of correct adjacent sentence pairs / # of total adjacent sentence pairs
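The strict pair-matching measure can be sketched as follows: count how many adjacent pairs in the predicted order are also adjacent, in the same direction, in the original order.

```python
# Sketch: strict pair matching between an original and a predicted ordering.

def pair_match_score(original, predicted):
    """Fraction of predicted adjacent pairs that match the original order."""
    gold_pairs = set(zip(original, original[1:]))
    pred_pairs = list(zip(predicted, predicted[1:]))
    correct = sum(1 for p in pred_pairs if p in gold_pairs)
    return correct / len(pred_pairs)

# e.g. original [1, 2, 3, 4], predicted [2, 3, 1, 4]: predicted pairs are
# (2,3), (3,1), (1,4); only (2,3) appears in the original, so score = 1/3.
```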

Probabilistic Coherence Function

Average coherence probability over all word combinations, with candidate pair statistics:
- Strict joint probability
- Pointwise mutual information with smoothing

Probabilistic Coherence Function 18 Mutual information where, N = c(u,v)+c(not u, v)+c(u, not v)+c(not u, not v) For unseen pairs, p(u,v)=0.5*MIN(seen pairs in training)

Coherence Optimization Test

- Pointwise mutual information effectively penalizes common words.

[Table] Training-word selection conditions, each scored under strict joint probability, mutual information, and pointwise mutual information:
- No omission
- Omitted stopwords
- Omitted frequent words (counts > 33, > 11, > 6, > 2)
- Omitted rare words (counts < 33, < 14, < 6, < 2)

Coherence Optimization Test

Top-ranked p(u,v) under strict joint probability: a lot of stopwords are top-ranked. The highest-probability (u, v) pairs all involve stopwords ("the", "to", "of", "and", "a"), with probabilities on the order of 1.75E-003.

Coherence Optimization Test

- Pointwise mutual information was better than joint probability and plain mutual information.
- Eliminating common words and very rare words improved performance.

[Table] Coherence scores for: baseline random order; strict joint probability; mutual information; pointwise mutual information (UIUC2); omitted stopwords; omitted non-stopwords; omitted 95% least frequent words (counts < 33); omitted 90% (counts < 14); omitted 80% (counts < 6); omitted 60% (counts < 2).

Conclusions
- Limited improvement in retrieval performance from named-entity and noun-phrase weighting
- Need for a good polarity classification module
- Potential to improve the statistical sentence-ordering module with different coherence functions and word selection

Thank you
Hyun Duk Kim, University of Illinois at Urbana-Champaign