CS224N: Query-Focused Multi-Document Summarization. Surabhi Gupta, Mayukh Bhaowal, Konstantin Davydov
Problem: We are given a set of documents for a particular query. Goal: create a summary that best answers the query. First step: find the sentences in the input documents that are relevant to the query. Second step: construct a summary from these sentences.
Sentence Weighting: We go through all the sentences and compute a weight for each sentence j. The weight is computed using either raw term frequency or TFIDF (term frequency-inverse document frequency).
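The weight formula itself is not given in the text of the slide above, so the following is only a minimal sketch of one plausible scheme, not the authors' formula. It assumes a sentence is scored by the counts of query terms it contains, optionally scaled by a smoothed IDF and normalized by sentence length. The names `tokenize`, `idf_table`, and `sentence_weight` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(documents):
    """Smoothed IDF per term: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1.0
            for term, count in df.items()}


def sentence_weight(sentence, query, idf, use_idf=True):
    """Score a sentence by the query terms it contains.

    With use_idf=False this is a plain frequency weight; with use_idf=True
    each query term's count is scaled by its IDF. The sum is divided by the
    sentence length so that long sentences are not favored automatically.
    """
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    tf = Counter(tokens)
    score = 0.0
    for term in set(tokenize(query)):
        scale = idf.get(term, 0.0) if use_idf else 1.0
        score += tf[term] * scale
    return score / len(tokens)
```

Calling `sentence_weight(s, query, idf_table(docs))` for every sentence `s` in the input set would give the per-sentence weights used in the later steps. Passing `use_idf=False` gives the frequency-only variant.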
Clustering: With 25-50 documents per query, the input contains redundancy. Cluster the sentences based on similarity, measured either by unigrams or by sentence alignment. Put the “best” sentence from each cluster in the summary.
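The clustering step can be sketched as follows for the unigram case. The slide does not name the similarity measure or the clustering algorithm, so this sketch assumes Jaccard overlap of unigram sets, a greedy single-pass clustering, and a hypothetical threshold of 0.5. "Best" is taken to mean the highest sentence weight, which is also an assumption.

```python
import re


def unigrams(sentence):
    """Set of lowercase word tokens in the sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_sentences(sentences, threshold=0.5):
    """Greedy single-pass clustering over sentence indices.

    Each sentence joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for i, sent in enumerate(sentences):
        grams = unigrams(sent)
        for cluster in clusters:
            if jaccard(grams, unigrams(sentences[cluster[0]])) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def pick_best(sentences, weights, clusters):
    """Return the highest-weighted sentence from each cluster."""
    return [sentences[max(cluster, key=lambda i: weights[i])]
            for cluster in clusters]
```

A sentence-alignment variant would replace `jaccard` with an alignment score and leave the rest unchanged.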
Results using ROUGE-1: TFIDF with C2 performs best (38.8%; the best DUC system scored 45.85%). C1: clustering using unigrams. C2: clustering using sentence alignment.
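For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a system summary and a reference summary. The sketch below computes the recall form against a single reference. It is a simplification: the official ROUGE toolkit also supports multiple references, stemming, and stopword removal, and the slide does not say which settings were used. The function name `rouge_1_recall` is hypothetical.

```python
import re
from collections import Counter


def rouge_1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate.

    Counts are clipped: a reference word that occurs twice is matched
    twice only if the candidate also contains it at least twice.
    """
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[term]) for term, count in ref.items())
    return overlap / total
```

For example, the candidate "the cat sat" against the reference "the cat sat on the mat" matches 3 of the 6 reference unigrams, so the recall is 0.5.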
Query Expansion: Try to expand the query by adding more words that are relevant to the original query. Train a logistic regression model with the following features: WordNet similarity, part of speech, and location within the document. Results were not satisfactory, but we plan to use better features such as co-occurrence with query terms and distributional similarity.
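The classifier described above can be illustrated with a small logistic regression trained by batch gradient descent. This is a sketch only: the feature encoding (a WordNet similarity score, a noun indicator, and a relative position in the document) and the toy training rows are made up for illustration and are not the authors' data or results. All names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Probability that a candidate word is a good expansion term."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(x, weights, bias) - y
            for k, v in enumerate(x):
                grad_w[k] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy rows, invented for illustration only:
# [wordnet_similarity, is_noun, relative_position_in_document]
rows = [[0.9, 1, 0.1], [0.8, 1, 0.2], [0.7, 0, 0.1],
        [0.2, 0, 0.9], [0.1, 1, 0.8], [0.3, 0, 0.7]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = add the word to the query

weights, bias = train_logistic_regression(rows, labels)
```

Candidate words whose `predict_proba` exceeds a chosen cutoff would be appended to the query. Features such as co-occurrence with query terms could be added as extra columns without changing the training code.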