LexPageRank: Prestige in Multi-Document Text Summarization. Gunes Erkan and Dragomir R. Radev, Department of EECS, School of Information, University of Michigan.

Presentation transcript:

LexPageRank: Prestige in Multi-Document Text Summarization. Gunes Erkan and Dragomir R. Radev, Department of EECS, School of Information, University of Michigan. ACL 2004

2/22 Abstract
This paper considers an approach for computing sentence importance based on the concept of eigenvector centrality (prestige), called LexPageRank.
In this model, a sentence connectivity matrix is constructed based on cosine similarity.
Experimental results on DUC 2004 show that this approach outperforms centroid-based summarization and is quite successful compared to other summarization systems.

3/22 Introduction
Text summarization is the process of automatically creating a compressed version of a given text that provides useful information for the user.
The approach taken here is to assess the centrality of each sentence in a cluster and include the most important ones in the summary.
– Two new measures of centrality, Degree and LexPageRank, are introduced, inspired by the concept of prestige in social networks.

4/22 Sentence centrality and centroid-based summarization
Extractive summarization produces summaries by choosing a subset of the sentences in the original documents.
Centrality of a sentence is often defined in terms of the centrality of the words that it contains.
The centroid of a cluster is a pseudo-document which consists of words that have frequency*IDF scores above a predefined threshold.
In centroid-based summarization (Radev et al., 2000), the sentences that contain more words from the centroid of the cluster are considered central.
– Centroid-based summarization has given promising results in the past.
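The centroid idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `idf` dictionary, the `default_idf` value, the whitespace tokenisation, and the choice to score a sentence by summing the centroid weights of its distinct words are all hypothetical and are not taken from the slides.

```python
from collections import Counter

def build_centroid(cluster_sentences, idf, threshold, default_idf=1.0):
    """Keep the words whose frequency*IDF score exceeds the threshold."""
    tf = Counter(w for s in cluster_sentences for w in s.lower().split())
    scores = {w: f * idf.get(w, default_idf) for w, f in tf.items()}
    return {w: v for w, v in scores.items() if v > threshold}

def centroid_score(sentence, centroid):
    """Sum the centroid weights of the distinct centroid words in a sentence."""
    return sum(centroid.get(w, 0.0) for w in set(sentence.lower().split()))

# Hypothetical cluster and IDF values, for illustration only.
cluster = [
    "the storm hit the coast",
    "the storm damaged homes",
    "officials said the storm weakened",
]
idf = {"the": 0.1, "storm": 2.0, "coast": 3.0}

centroid = build_centroid(cluster, idf, threshold=2.0)
scores = [centroid_score(s, centroid) for s in cluster]
```

With these toy values only "storm" and "coast" survive the threshold, so the first sentence, which contains both, scores highest.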

5/22 Prestige-based sentence centrality
We hypothesize that the sentences that are similar to many of the other sentences in a cluster are more central (or prestigious) to the topic.
There are two issues:
– How to define similarity between two sentences: cosine similarity
– How to compute the overall prestige of a sentence given its similarity to other sentences: degree centrality, or eigenvector centrality and LexPageRank

6/22 Prestige-based sentence centrality A cluster may be represented by a cosine similarity matrix
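A minimal sketch of how such a matrix could be built is given here. The slides do not specify the term weighting, so this version assumes raw term counts and whitespace tokenisation; the example sentences are made up.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(sentences):
    """Return the symmetric matrix of pairwise cosine similarities."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    return [[cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]

# Made-up sentences, for illustration only.
sentences = [
    "the storm hit the coast",
    "the storm damaged the coast",
    "stocks fell on monday",
]
sim = similarity_matrix(sentences)
```

Each row and column corresponds to one sentence in the cluster; the diagonal is 1 because every sentence is identical to itself.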

7/22 Prestige-based sentence centrality Most of the entries in the similarity matrix are nonzero

8/22 Prestige-based sentence centrality
Degree centrality
– Since we are interested in significant similarities in the matrix, we can eliminate some low values by defining a threshold, so that the cluster can be viewed as an undirected graph.
– We define degree centrality as the degree of each node in the similarity graph.
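Thresholding and degree counting can be sketched as follows. The matrix values and the threshold of 0.2 are hypothetical, and excluding a sentence's similarity to itself from its degree is an assumption of this sketch.

```python
def degree_centrality(sim, threshold):
    """For each sentence, count the other sentences whose similarity exceeds the threshold."""
    n = len(sim)
    return [
        sum(1 for j in range(n) if j != i and sim[i][j] > threshold)
        for i in range(n)
    ]

# Hypothetical 4-sentence cosine similarity matrix (illustrative values only).
sim = [
    [1.00, 0.45, 0.30, 0.05],
    [0.45, 1.00, 0.25, 0.10],
    [0.30, 0.25, 1.00, 0.02],
    [0.05, 0.10, 0.02, 1.00],
]
degrees = degree_centrality(sim, threshold=0.2)
```

Raising the threshold removes edges and lowers the degrees, so the choice of threshold directly controls how dense the similarity graph is.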

9/22 Prestige-based sentence centrality

10/22 Prestige-based sentence centrality

11/22 Prestige-based sentence centrality
Issue for degree centrality
– Several unwanted sentences can vote for each other and raise their prestige.
– This situation can be avoided by considering where the votes come from and taking the prestige of the voting node into account when weighting each node.
Eigenvector centrality and LexPageRank
– PageRank (Page et al., 1998) is a method proposed for assigning a prestige score to each page on the web, independent of a specific query.
– The score depends on the number of pages that link to that page as well as the individual scores of the linking pages.

12/22 Prestige-based sentence centrality
The PageRank of page A
T_1, …, T_n: pages that link to page A; d: damping factor; C(T_i): the number of outgoing links from page T_i
This recursively defined value can be computed by forming the binary adjacency matrix of the web, normalizing this matrix so that each row sums to 1, and finding the principal eigenvector of the normalized matrix.
The PageRank of the i-th page equals the i-th entry in the eigenvector.
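The formula shown on this slide did not survive in the transcript. Using the symbols defined on the slide, the standard formulation from Page et al. reads as follows; this is a reconstruction from that definition, not text recovered from the slide itself.

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```

Each linking page T_i splits its own score evenly among its C(T_i) outgoing links, which is why the votes of highly ranked pages count for more.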

13/22 Prestige-based sentence centrality
This method can be easily applied to the cosine similarity graph to find the most prestigious sentences in a document.
We call this new measure of sentence centrality LexPageRank.
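One way to apply the recursive PageRank definition to a thresholded similarity graph is simple fixed-point iteration, sketched here. The function names, the similarity values, the threshold of 0.2, and the default damping factor d = 0.85 are all assumptions of this sketch rather than settings taken from the slides.

```python
def threshold_graph(sim, threshold):
    """Turn a similarity matrix into a binary undirected adjacency matrix."""
    n = len(sim)
    return [
        [1 if i != j and sim[i][j] > threshold else 0 for j in range(n)]
        for i in range(n)
    ]

def lex_pagerank(adjacency, d=0.85, tol=1e-8, max_iter=1000):
    """Iterate PR(i) = (1 - d) + d * sum over neighbours j of PR(j) / C(j)."""
    n = len(adjacency)
    out_degree = [sum(row) for row in adjacency]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            votes = sum(
                scores[j] / out_degree[j]
                for j in range(n)
                if adjacency[j][i] and out_degree[j] > 0
            )
            new_scores.append((1 - d) + d * votes)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores

# Hypothetical similarities: sentence 1 is similar to both of the others.
sim = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.4],
    [0.1, 0.4, 1.0],
]
scores = lex_pagerank(threshold_graph(sim, 0.2))
```

In this toy graph the middle sentence is linked to both others and ends up with the highest score, while the two outer sentences tie.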

14/22 Prestige-based sentence centrality damping factor = 1

15/22 Prestige-based sentence centrality
Advantages over Centroid
– It accounts for information subsumption among sentences.
– It prevents unnaturally high IDF scores from boosting the score of a sentence that is unrelated to the topic.

16/22 Experiments on DUC 2004 data
DUC 2004 data was used in our experiments.
Task 2 involves summarization of 50 TDT English clusters.
Task 4 is to produce summaries of machine translation output (in English) of 24 Arabic TDT documents.
The recall-based ROUGE measure is adopted, and 665-byte summaries are produced for each cluster.
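To make "recall-based" concrete, the following is a simplified sketch of unigram recall in the spirit of ROUGE-1. It assumes a single reference summary and whitespace tokenisation; the official ROUGE package supports multiple references and further options that are not modelled here, and the example texts are made up.

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate (clipped counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

# Made-up summaries, for illustration only.
reference = "the storm hit the coast"
candidate = "the storm damaged the coast"
recall = rouge_1_recall(candidate, reference)
```

Because the denominator is the length of the reference, a longer candidate can only help recall, which is why the evaluation fixes the summary length.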

17/22 Experiments on DUC 2004 data
MEAD summarization toolkit
– Extractive multi-document summarization
– Consists of three components
Feature extractor (document -> feature vector)
– Centroid, Position and Length
Combiner (feature vector -> scalar value)
Reranker (the scores are adjusted upward or downward)
– MMR (Maximal Marginal Relevance), CSIS (Cross-Sentence Information Subsumption)
weight Threshold
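The combiner step (feature vector -> scalar value) can be pictured as a weighted sum of feature values, sketched here. This is not MEAD's actual code or configuration format: the function name, the weights, and the feature values are hypothetical, and the Length feature is left out for brevity.

```python
def combine(features, weights):
    """Collapse one sentence's feature vector into a single score via a weighted sum."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical feature values for three sentences (illustrative only).
sentence_features = [
    {"Centroid": 0.90, "Position": 1.0},
    {"Centroid": 0.60, "Position": 0.5},
    {"Centroid": 0.95, "Position": 0.2},
]
# Hypothetical weights; a real system would tune or configure these.
weights = {"Centroid": 1.0, "Position": 0.5}

scores = [combine(f, weights) for f in sentence_features]
# Sentence indices ordered from highest to lowest combined score.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

A reranker would then walk this ranking and adjust scores up or down, for example to penalise sentences that repeat information already selected.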

18/22 Experiments on DUC 2004 data Centroid

19/22 Experiments on DUC 2004 data

20/22 Experiments on DUC 2004 data

21/22 Experiments on DUC 2004 data

22/22 Conclusions
A novel approach to defining sentence centrality, based on graph-based prestige scoring of sentences.
We have introduced two different methods, Degree and LexPageRank, for computing prestige in a similarity graph.
The experimental results are quite promising.
Even the simplest approach, degree centrality, is a good enough heuristic to perform better than lead-based and centroid-based summaries.