Graph-based Text Summarization

Presentation transcript:

Graph-based Text Summarization Lin Ziheng NUS WING Group Meeting

Aims: Build a graph that models the development (for writers) and consumption (for readers) of ideas in text through time. Use rhetorical relations to help recognize the important sentences in text.

Random Walk: Depends on current state. Convergence.
Transition matrix over states 1 to 5 (row = current state, column = next state):
     1    2    3    4    5
1    0    0.4  0.6  0    0
2    0    0    1    0    0
3    0    0    0    0    1
4    0.1  0    0.5  0    0.4
5    0    0.2  0    0.8  0
Google PageRank: 0 < d < 1, usually d = 0.85
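The PageRank formula itself is not preserved in this transcript, so the following is only a minimal sketch of how the random walk on this slide could be iterated to convergence. It assumes the common normalized form r = (1 - d)/n + d * (r P), with P the five-state transition matrix from the slide and d = 0.85; the function name `pagerank` and the parameters `tol` and `max_iter` are illustrative, not from the talk.

```python
# Row-stochastic transition matrix from the slide (row = current state).
P = [
    [0.0, 0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.1, 0.0, 0.5, 0.0, 0.4],
    [0.0, 0.2, 0.0, 0.8, 0.0],
]


def pagerank(P, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for r = (1 - d)/n + d * (r P) (assumed form)."""
    n = len(P)
    r = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        r_next = [
            (1.0 - d) / n + d * sum(r[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


ranks = pagerank(P)
for state, score in enumerate(ranks, start=1):
    print(f"state {state}: {score:.4f}")
```

Because every row of P sums to 1, the scores stay a probability distribution at each step, and the damping factor d < 1 guarantees that the iteration converges.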

Citation Network: New papers can cite old papers. Old papers are not updated. [Figure: a new paper and old papers]

The Internet: A new page must have at least one incoming link and may link to existing pages. Old pages can update their links. [Figure: a new page and old pages]

Graph-based summarization: LexRank. Nodes = sentences. Edges = cosine similarity. Fully connected. Undirected.
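As a rough illustration of the graph this slide describes, the sketch below builds a symmetric sentence-similarity matrix. It is a simplification under stated assumptions: sentences are tokenized by whitespace and compared with raw term-count cosine rather than any particular weighting scheme, and the names `cosine`, `similarity_graph`, and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_graph(sentences):
    """Return a symmetric weight matrix W with W[i][j] = cosine(s_i, s_j)."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Undirected graph: store the same weight in both directions.
            W[i][j] = W[j][i] = cosine(vecs[i], vecs[j])
    return W


sentences = [
    "graphs model sentences as nodes",
    "edges connect similar sentences",
    "pagerank ranks nodes in graphs",
]
W = similarity_graph(sentences)
```

Every pair of sentences gets an edge weight, so the graph is fully connected in form; pairs that share no words simply carry weight 0.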

Graph-based summarization: TextRank. Nodes = sentences. Edges = similarity. Backward links. Directed. [Figure: nodes s1, s2, s3, s4, labeled as new sentence and old sentences]

Writing/Reading Process Assumption: Readers read from the beginning towards the end. Writers write from the beginning towards the end.

Blog Network

Building Graph. Out degree: proportional to how long the sentence stays in the graph (e.g., 1st: 3, 2nd: 2, 3rd: 1). In degree: importance. Edges: cosine, co-occurrence, longest common subsequence, etc.
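One way to obtain the out-degree pattern on this slide (1st: 3, 2nd: 2, 3rd: 1) is sketched below. The update rule is an assumption, not taken from the slides: at each timestep one new sentence per document enters the graph, and then every sentence already in the graph adds one outgoing edge to the most similar sentence it has not yet linked to. Word-overlap (Jaccard) similarity stands in for the edge measures the slide lists, and `jaccard` and `build_timestamped_graph` are hypothetical names.

```python
def jaccard(a, b):
    """Word-overlap similarity, a stand-in for cosine, co-occurrence, etc."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def build_timestamped_graph(docs, sim=jaccard):
    """docs: list of documents, each a list of sentence strings.

    Returns (nodes, edges), where nodes are (doc_index, sentence_index)
    pairs and edges is a set of directed (source, target) node pairs.
    """
    nodes, edges = [], set()
    steps = max(len(d) for d in docs)
    for t in range(steps):
        # One new sentence per document enters the graph at timestep t.
        nodes.extend((di, t) for di, d in enumerate(docs) if t < len(d))
        new_edges = []
        for u in nodes:
            # Each sentence in the graph emits one new edge per timestep.
            candidates = [v for v in nodes if v != u and (u, v) not in edges]
            if candidates:
                best = max(
                    candidates,
                    key=lambda v: sim(docs[u[0]][u[1]], docs[v[0]][v[1]]),
                )
                new_edges.append((u, best))
        edges.update(new_edges)
    return nodes, edges


docs = [
    ["the storm hit the coast", "thousands lost power", "repairs began monday"],
    ["a storm struck the coast", "power was lost", "crews began repairs"],
    ["coast hit by storm", "many homes lost power", "repairs will take weeks"],
]
nodes, edges = build_timestamped_graph(docs)
out_degree = {n: sum(1 for s, _ in edges if s == n) for n in nodes}
in_degree = {n: sum(1 for _, t in edges if t == n) for n in nodes}
```

With three documents of three sentences each, a sentence that enters at the first timestep emits three edges, the second two, and the third one, while in-degree accumulates on the sentences that others find most similar.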

[Figure: doc1, doc2, doc3]

Sentence Extraction. Options: in degree, or run PageRank (unbiased, or biased towards the query).
Example in-degree per sentence (dN = document, sN = sentence):
      d1   d2   d3
s1    2    3    3
s2    1    4    0
s3    4    1    0
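The in-degree option can be illustrated with the nine example values on this slide, read here as per-sentence in-degrees (an interpretation, though their total of 18 matches three documents each contributing 3 + 2 + 1 outgoing edges). The sketch simply picks the k highest-scoring sentences; the alphabetical tie-break and the name `top_sentences` are assumptions for illustration.

```python
# Example values from the slide, keyed by document and sentence.
in_degree = {
    "d1s1": 2, "d2s1": 3, "d3s1": 3,
    "d1s2": 1, "d2s2": 4, "d3s2": 0,
    "d1s3": 4, "d2s3": 1, "d3s3": 0,
}


def top_sentences(scores, k):
    """Return the k highest-scoring sentence ids (ties broken by id)."""
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]


summary = top_sentences(in_degree, 3)
print(summary)  # ['d1s3', 'd2s2', 'd2s1']
```

The same function works unchanged if the scores come from PageRank instead of in-degree, since it only needs a score per sentence.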

Evaluation 1
Dataset: DUC'04 Task 2
Column headers as extracted (their alignment with the ten value columns is not preserved): in degree; pagerank; LexRank; t = 1; t = 0.9; t = 0.7; t = 0.5; t = 0.3; t = 0.2; t = 0.1; node start; rank=1; rank=cosine
ROUGE-1 R avg  0.3602626  0.3570972  0.3504308  0.3528222  0.3570688  0.3623468  0.3563554  0.3614034  0.3627054  0.3588200
ROUGE-1 P avg  0.4002014  0.3963142  0.3972844  0.3951052  0.3946780  0.3926562  0.3903444  0.3915736  0.3881828  0.3914098
ROUGE-1 F avg  0.3778134  0.3744442  0.3713722  0.3716442  0.3735710  0.3756256  0.3714446  0.3745956  0.3739770  0.3733106
ROUGE-2 R avg  0.0899096  0.0912164  0.0895932  0.0893618  0.0891864  0.0900720  0.0876280  0.0879926  0.0902010  0.0894858
ROUGE-2 P avg  0.1008002  0.1017430  0.1019968  0.1007348  0.0994878  0.0988606  0.0968300  0.0963064  0.0967940  0.0978864
ROUGE-2 F avg  0.0946892  0.0958694  0.0951186  0.0944158  0.0937042  0.0939454  0.0917214  0.0916382  0.0931182  0.0932100
ROUGE-L R avg  0.3104592  0.3091442  0.3030530  0.3051274  0.3089822  0.3141960  0.3079836  0.3123842  0.3162914  0.3123360
ROUGE-L P avg  0.3448404  0.3431476  0.3434746  0.3414388  0.3410792  0.3402590  0.3372768  0.3384562  0.3385456  0.3407320
ROUGE-L F avg  0.3255642  0.3241896  0.3211276  0.3212942  0.3230670  0.3256226  0.3209940  0.3237910  0.3261406  0.3249710

Evaluation 2
Dataset: DUC'06
A dash marks a cell with no value in the transcript.
Unbiased / Biased   Rearranging doc length   # outlinks per sentence per timestep   ROUGE-2   ROUGE-SU4
Unbiased            no                       1                                      0.07563   0.12738
-                   yes                      -                                      0.07042   0.12416
-                   -                        -                                      0.07513   0.13106
-                   -                        -                                      0.07752   0.13290
-                   -                        2                                      0.08181   0.13733
-                   -                        5                                      0.07789   0.13412
-                   -                        10                                     0.07799   0.13153

Conclusion from Evaluation 2: DUC'06 is query-based, so biased PageRank gives better results. Rearranging doc length is not necessary if there is no extremely long document in the cluster. The number of outlinks is important: different numbers of outlinks give different inlink densities. We need to look at how the dimension of the graph (D * L) is related to the inlink density: F(D, L) => #outlinks.