LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
Presented by Nick Janus
Background
- Text summarization: intuitively, what sentences do we want?
- Extractive summarization
  - Chooses a subset of the original document's sentences
  - Better results
- Abstractive summarization
  - Complicated: requires semantic inference and language generation
  - Often uses extractive summarization as a pre-processor
- This presentation is a mix of both!
Problem Statement
- Multi-document text summarization: the documents mostly share an unknown topic
- A cluster of documents is represented by a network; nodes near the center are more salient to the topic
- How are edges defined? How is centrality computed?
- Clustering documents by topic is often noisy
Degree Centrality
- Top-degree nodes are the most important
- Use a bag-of-words model with N words: each sentence/node is encoded as an N-dimensional vector, giving a cosine covariance matrix
- Cosine similarity is used to calculate edge weights
- A threshold is used to eliminate insignificant relationships
- Results in an undirected graph
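As a rough illustration of this step, here is a minimal Python sketch: it scores sentence pairs with an idf-modified cosine (following the paper's definition) and counts neighbors above a threshold. The function names, default threshold, and tokenization are illustrative assumptions, not part of the original slides.

```python
import math
from collections import Counter

def idf_modified_cosine(x, y, idf):
    """idf-modified cosine similarity between two tokenized sentences."""
    tf_x, tf_y = Counter(x), Counter(y)
    num = sum(tf_x[w] * tf_y[w] * idf.get(w, 0.0) ** 2 for w in tf_x if w in tf_y)
    den_x = math.sqrt(sum((tf_x[w] * idf.get(w, 0.0)) ** 2 for w in tf_x))
    den_y = math.sqrt(sum((tf_y[w] * idf.get(w, 0.0)) ** 2 for w in tf_y))
    return num / (den_x * den_y) if den_x and den_y else 0.0

def degree_centrality(sentences, idf, threshold=0.1):
    """Degree of each sentence: how many other sentences are similar to it
    above the threshold (insignificant edges are dropped)."""
    degrees = [0] * len(sentences)
    for i, s_i in enumerate(sentences):
        for j, s_j in enumerate(sentences):
            if i != j and idf_modified_cosine(s_i, s_j, idf) > threshold:
                degrees[i] += 1
    return degrees
```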
Degree Centrality Example
- The threshold has a considerable impact on graph structure and ranking.
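A tiny self-contained example of that sensitivity: thresholding the same similarity matrix at a few values changes both the number of edges and the degree ranking. The matrix values and thresholds below are invented for illustration.

```python
import numpy as np

# Made-up symmetric cosine-similarity matrix for 4 sentences.
sim = np.array([
    [1.00, 0.45, 0.02, 0.30],
    [0.45, 1.00, 0.15, 0.08],
    [0.02, 0.15, 1.00, 0.12],
    [0.30, 0.08, 0.12, 1.00],
])

for t in (0.1, 0.2, 0.3):
    adj = (sim > t) & ~np.eye(len(sim), dtype=bool)  # drop self-loops
    print(f"threshold={t}: edges={adj.sum() // 2}, degrees={adj.sum(axis=0)}")
# threshold=0.1 keeps 4 edges; threshold=0.3 keeps only 1.
```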
LexRank with threshold
- So far, all nodes hold equal votes
- More important sentences should have greater centrality p(u), which they pass on to the sentences they vote for
- The sentence vectors, row-normalized, form a stochastic matrix (a Markov chain over sentences)
- For the chain to converge to a stationary distribution, it must be irreducible and aperiodic
- A damping factor d guarantees this, giving us PageRank:
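The formula on this slide did not survive extraction; a reconstruction of the threshold-based LexRank score, as given in the paper for N sentences and damping factor d, is roughly:

\[
p(u) = \frac{d}{N} + (1 - d)\sum_{v \in \operatorname{adj}(u)} \frac{p(v)}{\deg(v)}
\]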
LexRank with threshold (cont.)
- So how do we calculate LexRank? Algorithm redux:
  1. Get the same cosine covariance matrix as before
  2. Binarize the values of the matrix with the threshold
  3. Normalize each value by its node's degree
  4. Apply the power method until the matrix converges
- The power method returns an eigenvector which contains the scores of all the sentences
- The transition kernel [dU + (1 − d)B] of the resulting Markov chain is a mixture of two kernels U and B, where U is a square matrix with all elements equal to 1/N. A random walker on this Markov chain chooses one of the adjacent states of the current state with probability 1 − d, or jumps to any state in the graph, including the current state, with probability d.
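A minimal power-method sketch of the steps above; the damping constant, tolerance, and function name are assumptions chosen for illustration.

```python
import numpy as np

def lexrank_with_threshold(sim, threshold=0.1, d=0.15, tol=1e-6):
    """Threshold-based LexRank scores via the power method.

    sim: square cosine similarity matrix between sentences.
    Returns the stationary distribution, one score per sentence.
    """
    n = len(sim)
    # 1) Binarize the similarity matrix with the threshold
    #    (self-loops kept so no row is all zeros).
    adj = (sim > threshold).astype(float)
    np.fill_diagonal(adj, 1.0)
    # 2) Normalize each row by the node's degree -> row-stochastic matrix B.
    B = adj / adj.sum(axis=1, keepdims=True)
    # 3) Mix with the uniform kernel U = 1/n so the chain is
    #    irreducible and aperiodic: M = dU + (1 - d)B.
    M = d / n + (1 - d) * B
    # 4) Power method: iterate p <- M^T p until it stops changing.
    p = np.full(n, 1.0 / n)
    while True:
        p_next = M.T @ p
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
```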
Continuous LexRank
- What's the problem with LexRank? The threshold discretizes the similarity values, throwing out information about how strongly sentences are related
- Instead, keep the raw cosine similarity values and normalize them to form a stochastic matrix:
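The normalized formula on this slide is also missing; a reconstruction of the continuous LexRank score from the paper, where the threshold is dropped and edges keep their cosine weights, is roughly:

\[
p(u) = \frac{d}{N} + (1 - d)\sum_{v \in \operatorname{adj}(u)}
\frac{\cos(u, v)}{\sum_{z \in \operatorname{adj}(v)} \cos(z, v)}\, p(v)
\]

where cos denotes the idf-modified cosine from before.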
Performance
- Baseline: centroid-based summarizer, which compares sentences with a centroid meta-sentence containing high-idf-scoring words from the document
- Evaluation setting:
  - Implemented with the MEAD summarization toolkit
  - Document Understanding Conference (DUC) data sets: model summaries and document clusters
  - ROUGE metric: measures unigram co-occurrence
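For intuition about the metric, here is a simplified sketch of ROUGE-1 recall (unigram co-occurrence between a candidate summary and a model summary); real ROUGE handles stemming, stopwords, and multiple references, which are omitted here.

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """Fraction of reference unigrams that also occur in the candidate,
    with counts clipped to the candidate's counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[w]) for w, count in ref.items())
    return overlap / sum(ref.values()) if ref else 0.0

print(rouge_1_recall("the cat sat on the mat", "the cat lay on a mat"))  # ~0.67
```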
DUC Data Sets (curated sets)
Noisy Data Sets (17% noisy documents added)
Summing Up
- Centrality methods may be more resilient when dealing with noisy document sets
- Multi-document case
- Difficult evaluation
- Better performance
- Relation to PageRank