LexPageRank: Prestige in Multi-Document Text Summarization Gunes Erkan, Dragomir R. Radev (EMNLP 2004)

Introduction  Text summarization is the process of automatically creating a compressed version of a given text that provides useful information for the user.  Multi-document text summarization is to produce a summary of multiple documents about the same, but unspecified topic.  Our approach is to assess the centrality of each sentence in a cluster and include the most important ones in the summary.

Sentence centrality and centroid-based summarization  Extractive summarization produces summaries by choosing a subset of the sentences in the original documents.  This process can be viewed as choosing the most central sentences in a multi-document cluster.  Centrality of a sentence is often defined in terms of the centrality of the words that it contains.

Sentence centrality and centroid-based summarization  In centroid-based summarization, the sentences that contain more words from the centroid of the cluster (words whose frequency*IDF scores are above a predefined threshold) are considered central.  Formally, the centroid score of a sentence is the cosine of the angle between the centroid vector of the whole cluster and the individual centroid of the sentence.
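As a minimal sketch of this score (not the actual MEAD implementation), the centroid score can be computed as the cosine between a sentence's tf*idf vector and the cluster centroid. The `centroid` and `idf` mappings are assumed precomputed; the names are illustrative:

```python
import math
from collections import Counter

def centroid_score(sentence_tokens, centroid, idf):
    """Cosine between a sentence's tf*idf vector and the cluster centroid.

    `centroid` maps word -> centroid weight (assumed precomputed over the
    cluster, with low-weight words already dropped); `idf` maps word -> IDF.
    """
    tf = Counter(sentence_tokens)
    vec = {w: tf[w] * idf.get(w, 0.0) for w in tf}
    dot = sum(vec[w] * centroid.get(w, 0.0) for w in vec)
    norm_s = math.sqrt(sum(v * v for v in vec.values()))
    norm_c = math.sqrt(sum(v * v for v in centroid.values()))
    if norm_s == 0 or norm_c == 0:
        return 0.0
    return dot / (norm_s * norm_c)
```

A sentence sharing many high-IDF centroid words scores near 1; a sentence with no centroid words scores 0.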

Prestige-based sentence centrality  We propose a new method to measure sentence centrality based on prestige in social networks.  A cluster of documents can be viewed as a network of sentences that are related to each other.  The sentences that are similar to many of the other sentences in a cluster are more central to the topic.  A cluster may be represented by a cosine similarity matrix where each entry in the matrix is the similarity between the corresponding sentence pair.
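The similarity matrix described above can be sketched as follows, using an idf-modified cosine over tokenized sentences (a simplified illustration; tokenization and IDF estimation are assumed done elsewhere):

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse word->weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(sentences, idf):
    """Pairwise cosine matrix over tf*idf vectors of tokenized sentences."""
    vecs = []
    for toks in sentences:
        tf = Counter(toks)
        vecs.append({w: tf[w] * idf.get(w, 0.0) for w in tf})
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
```

Each entry of the resulting matrix is the similarity between the corresponding sentence pair; the diagonal is 1 for non-empty sentences.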

Prestige-based sentence centrality

Degree centrality  In a cluster of related documents, many of the sentences are expected to be somewhat similar to each other.  We can eliminate some low values in this matrix by defining a threshold so that the cluster can be viewed as a graph, where each sentence of the cluster is a node, and significantly similar sentences are connected to each other.

Degree centrality

 We define degree centrality as the degree of each node in the similarity graph.  The choice of cosine threshold dramatically influences the interpretation of centrality.  A threshold that is too low may mistakenly count weak similarities, while one that is too high may discard many of the genuine similarity relations in a cluster.
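Given a similarity matrix, degree centrality reduces to counting each sentence's above-threshold neighbors. A minimal sketch (the 0.1 default mirrors the best-performing threshold in the experiments):

```python
def degree_centrality(sim, threshold=0.1):
    """Degree of each node in the thresholded similarity graph.

    An edge connects sentences i and j (i != j) when their cosine
    similarity exceeds `threshold`.
    """
    n = len(sim)
    return [sum(1 for j in range(n) if j != i and sim[i][j] > threshold)
            for i in range(n)]
```

Raising the threshold prunes edges, so the same matrix yields lower degrees, illustrating the information loss discussed above.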

Degree centrality

LexPageRank  When computing degree centrality, we have treated each edge as a vote to determine the overall prestige value of each node.  Assume that the unrelated document contains some sentences that are very prestigious considering only the votes in that document.  These sentences will get artificially high centrality scores by the local votes from a specific set of sentences.

LexPageRank  In PageRank, the score of a page is determined by the number of pages that link to that page, as well as by the individual scores of the linking pages.

 This method can be directly applied to the cosine similarity graph to find the most prestigious sentences in a document cluster.  We use PageRank to weight each vote so that a vote coming from a more prestigious sentence carries greater weight in the centrality of a sentence.
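A minimal power-iteration sketch of this idea, assuming the thresholded similarity graph is given as an undirected adjacency map (`adj`); the damping value `d=0.15` is illustrative, following the convention where each node receives a uniform d/N plus (1 - d) times its neighbors' redistributed scores:

```python
def lexpagerank(adj, d=0.15, tol=1e-8, max_iter=1000):
    """Power iteration for prestige scores on an undirected sentence graph.

    `adj` maps each node to the set of its neighbors in the thresholded
    cosine similarity graph.  Each node's score is d/N plus (1 - d) times
    the sum of its neighbors' scores, each divided by the neighbor's degree.
    """
    n = len(adj)
    p = {u: 1.0 / n for u in adj}
    for _ in range(max_iter):
        new = {u: d / n + (1 - d) * sum(p[v] / len(adj[v]) for v in adj[u])
               for u in adj}
        if max(abs(new[u] - p[u]) for u in adj) < tol:
            return new
        p = new
    return p
```

On a symmetric graph (e.g. a triangle of three mutually similar sentences) every sentence receives the same prestige, and the scores sum to 1.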

LexPageRank

LexPageRank  The graph-based centrality approach we have introduced has several advantages: 1. It accounts for information subsumption among sentences. 2. It prevents unnaturally high IDF scores from boosting the score of a sentence that is unrelated to the topic.

Experiment - data  We used DUC 2004 data in our experiments.  The generic summarization tasks in DUC 2004 (Task 2 and Tasks 4a/4b) are appropriate for the purpose of testing our new feature, LexPageRank.

Experiment - ROUGE  ROUGE is a recall-based metric for fixed-length summaries which is based on n-gram co-occurrence.  We show three of the ROUGE metrics in our experiment results: ROUGE-1 (unigram-based), ROUGE-2 (bigram-based), and ROUGE-W (based on the longest common subsequence, weighted by its length).  We produced 665-byte summaries for each cluster and computed ROUGE scores against human summaries.
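The recall flavor of ROUGE-N can be sketched as clipped n-gram overlap divided by the total number of reference n-grams. This is a simplified illustration, not the official ROUGE scorer (which also handles stemming, stopword options, and jackknifing):

```python
from collections import Counter

def rouge_n_recall(candidate, references, n=1):
    """Simplified ROUGE-N recall of a candidate against reference summaries.

    Clipped n-gram matches are summed over all references and divided by
    the total count of reference n-grams.
    """
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate)
    matched = total = 0
    for ref in references:
        r = ngrams(ref)
        total += sum(r.values())
        matched += sum(min(c, r[g]) for g, c in cand.items() if g in r)
    return matched / total if total else 0.0
```

Because it is recall-based, a candidate covering all reference unigrams scores 1.0 regardless of extra words it contains.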

Experiment - MEAD  The MEAD summarizer consists of three components: 1. the feature extractor: each sentence in the input document is converted into a feature vector using the user-defined features; 2. the combiner: converts the feature vector into a scalar value; 3. the reranker: the scores of sentences that appear in related pairs are adjusted upward or downward based on the type of relation between the sentences in the pair.
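The combiner stage can be sketched as a linear combination of a sentence's feature values; the feature names and weights below are illustrative, not MEAD's defaults:

```python
def combine_features(features, weights):
    """Combiner stage: linear combination of per-sentence feature values.

    `features` maps feature name -> value for one sentence; `weights`
    are user-chosen coefficients (illustrative placeholders).
    """
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())
```

In this setup, plugging in Degree or LexPageRank simply means adding one more named feature with its own weight.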

Results and discussion  We include two baselines for each data set: random indicates a method where we have picked random sentences from the cluster to produce a summary. lead-based is using only the Position feature without any centrality method.

Results for Task 2

Results for Task 4

Results and discussion  The results provide strong evidence that Degree and LexPageRank are better than Centroid in multi-document text summarization.  The effect of threshold: most of the top ROUGE scores belong to the runs with the threshold 0.1, and the runs with threshold 0.3 are worse than the others most of the time.  This is due to the information loss in the similarity graphs as we move to higher thresholds.

Conclusion  Constructing the similarity graph of sentences provides us with a better view of important sentences.  We have introduced two different methods, Degree and LexPageRank, for computing prestige in similarity graphs.  The results of applying these methods on extractive summarization is quite promising.