1
Graph-based Text Summarization
Lin Ziheng NUS WING Group Meeting
2
Aims
Build a graph that models the development (for writers) and consumption (for readers) of ideas in text through time
Use rhetorical relations to help recognize the important sentences in text
3
Random Walk
Depends on current state
Convergence
Google PageRank: 0 < d < 1, usually d = 0.85
[Figure: example graph with nodes 1 to 5]
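The random walk with damping factor d can be sketched as a power iteration. The formula on the slide was not recoverable, so this sketch assumes the common normalized form, PR(u) = (1 - d)/N + d * sum over in-neighbours v of PR(v)/outdegree(v). The function name `pagerank` and the five-node example graph are hypothetical; the edges shown on the slide are unknown.

```python
def pagerank(out_links, d=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on a dict {node: [nodes it links to]}."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new = {u: (1.0 - d) / n + d * dangling / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            for v in targets:
                # Each node passes an equal share of its rank along every outlink.
                new[v] += d * rank[u] / len(targets)
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical five-node graph (assumption: not the graph from the slide).
example = {1: [2, 3], 2: [3], 3: [1], 4: [3, 5], 5: [4]}
scores = pagerank(example)
```

Because each step uses only the current rank vector, the walk depends only on the current state, and with 0 < d < 1 the iteration converges to a single rank vector that sums to 1.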
4
Citation Network
New papers can cite old papers
Old papers are not updated
[Figure: a new paper linking to old papers]
5
The Internet
A new page must have at least one incoming link and may link to existing pages
Old pages can update their links
[Figure: a new page and old pages]
6
Graph-based summarization: LexRank
Nodes = sentences
Edges = cosine similarity
Fully connected
Undirected
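A minimal sketch of such a fully connected, undirected sentence graph follows. It is a simplification: it uses raw term counts rather than the tf-idf weighting that LexRank applies, and the names `cosine` and `similarity_graph` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_graph(sentences):
    """Return {(i, j): weight} for every unordered sentence pair i < j."""
    bags = [Counter(s.lower().split()) for s in sentences]
    return {
        (i, j): cosine(bags[i], bags[j])
        for i in range(len(bags))
        for j in range(i + 1, len(bags))
    }


sentences = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply",
]
graph = similarity_graph(sentences)
```

Storing each pair once under the key (i, j) with i < j reflects that the edges are undirected.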
7
Graph-based summarization: TextRank
Nodes = sentences
Edges = similarity
Backward links
Directed
[Figure: a new sentence linking back to old sentences; nodes s1, s2, s3, s4]
8
Writing/Reading Process
Assumption
Readers read from the beginning towards the end
Writers write from the beginning towards the end
9
Blog Network
10
Building Graph
Out degree: proportional to how long the sentence stays in the graph (e.g., 1st: 3, 2nd: 2, 3rd: 1)
In degree: importance
Edges: cosine, co-occurrence, longest common subsequence, etc.
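One way to read this construction is sketched below. It rests on assumptions not stated on the slide: at timestep t the t-th sentence of every document enters the graph, and at every timestep each sentence already in the graph adds a fixed number of outlinks to the most similar sentences it does not yet link to. The names `build_graph` and `word_overlap` and the example documents are hypothetical, and word overlap stands in for whichever edge measure is chosen.

```python
def word_overlap(a, b):
    """Jaccard overlap of word sets; a stand-in for cosine, co-occurrence, or LCS."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def build_graph(docs, similarity, outlinks_per_step=1):
    """Grow a directed sentence graph one timestep at a time (assumed procedure).

    docs: list of documents, each a list of sentences.
    Returns {(doc_idx, sent_idx): set of nodes it links to}.
    """
    edges, text = {}, {}
    for t in range(max(len(doc) for doc in docs)):
        # Timestep t: the t-th sentence of every document enters the graph.
        for d, doc in enumerate(docs):
            if t < len(doc):
                edges[(d, t)] = set()
                text[(d, t)] = doc[t]
        # Every sentence now in the graph adds links to its most similar
        # sentences that it does not link to yet.
        for node in list(edges):
            candidates = [v for v in edges if v != node and v not in edges[node]]
            candidates.sort(key=lambda v: similarity(text[node], text[v]), reverse=True)
            edges[node].update(candidates[:outlinks_per_step])
    return edges


docs = [
    ["a storm hit the coast", "homes lost power", "crews restored power"],
    ["the storm hit the coast on monday", "thousands of homes lost power", "schools closed"],
    ["a large storm reached the coast", "power lines fell", "repairs continue"],
]
graph = build_graph(docs, word_overlap)
out_degree = {node: len(targets) for node, targets in graph.items()}
in_degree = {node: 0 for node in graph}
for targets in graph.values():
    for v in targets:
        in_degree[v] += 1
```

With one outlink per timestep and three sentences per document, first sentences end with out degree 3, second sentences with 2, and third sentences with 1, which matches the example on the slide. In degree then varies with how often a sentence is chosen as a link target.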
11
[Figure: graph over three documents: doc1, doc2, doc3]
12
Sentence Extraction
In degree
Run PageRank
    Unbiased
    Biased towards query
Per-sentence values shown on the slide (document d, sentence s):
        d1   d2   d3
s1      2    3    3
s2      1    4    0
s3      4    1    0
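The difference between unbiased and query-biased PageRank can be sketched as a change in the random jump distribution. This is a minimal sketch under assumptions: the bias weights would come from some sentence-to-query similarity, which the slides do not specify, and the name `biased_pagerank` and the toy graph are hypothetical. Every link target must also appear as a key of `out_links`.

```python
def biased_pagerank(out_links, bias, d=0.85, iters=100):
    """PageRank whose random jump follows `bias` instead of a uniform distribution.

    out_links: {node: [nodes it links to]}; bias: {node: non-negative weight}.
    """
    nodes = list(out_links)
    total = sum(bias.get(u, 0.0) for u in nodes)
    # Fall back to the unbiased (uniform) jump when no node has any weight.
    jump = {u: (bias.get(u, 0.0) / total if total else 1.0 / len(nodes)) for u in nodes}
    rank = dict(jump)
    for _ in range(iters):
        # Rank stuck on nodes without outlinks is redistributed by the jump.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new = {u: (1.0 - d + d * dangling) * jump[u] for u in nodes}
        for u in nodes:
            for v in out_links[u]:
                new[v] += d * rank[u] / len(out_links[u])
        rank = new
    return rank


# Hypothetical graph and query weights, for illustration only.
links = {"s1": ["s2"], "s2": ["s1"], "s3": ["s1"], "s4": ["s2"]}
unbiased = biased_pagerank(links, {u: 1.0 for u in links})
biased = biased_pagerank(links, {"s3": 1.0, "s4": 0.2})
```

In the unbiased run s1 and s2 score the same. In the biased run s1 overtakes s2 because it is linked from s3, the sentence with the larger query weight.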
13
Evaluation 1
Dataset: DUC’04 task 2
[Table: scores not recovered. Systems and settings: in degree; pagerank; LexRank; t = 1, 0.9, 0.7, 0.5, 0.3, 0.2, 0.1; node start rank = 1, rank = cosine. Metrics: ROUGE-1, ROUGE-2, and ROUGE-L, each with R avg, P avg, and F avg]
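For reference, the recall (R), precision (P), and F scores behind ROUGE-1 and ROUGE-2 can be sketched as clipped n-gram overlap. This is a simplified illustration, not the official ROUGE toolkit: it uses a single reference, whitespace tokenization, and no stemming. The name `rouge_n` and the example strings are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f1) from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * recall * precision / (recall + precision) if overlap else 0.0
    return recall, precision, f1


r, p, f = rouge_n("the storm hit the coast", "a storm hit the coast on monday", n=1)
```

Here four of the seven reference unigrams are matched, so recall is 4/7, precision is 4/5, and the F score is 2/3.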
14
Evaluation 2
Dataset: DUC’06
[Table: scores not recovered. Columns: Unbiased / Biased; Rearranging doc length (no, yes); # outlinks per sentence per timestep (1, 2, 5, 10); ROUGE-2; ROUGE-SU4]
15
Conclusion from Evaluation 2
DUC’06 is query-based, so biased PageRank gives better results.
Rearranging document length is not necessary if there is no extremely long document in the cluster.
The number of outlinks is important: different numbers of outlinks give different inlink densities.
We need to look at how the dimension of the graph (D * L) is related to the inlink density: F(D, L) => #outlinks.