Automatic summarization
Dragomir R. Radev, University of Michigan
Outline
– What is summarization?
– Genres of summarization (single-document, multi-document, query-based, etc.)
– Extractive vs. non-extractive summarization
– Evaluation metrics
– Current systems: Marcu/Knight; MEAD/Lemur; NewsInEssence/NewsBlaster
– What is possible and what is not
Goal of summarization
– Preserve the "most important information" in a document
– Make use of redundancy in text
– Maximize information density
– Compression ratio = |S| / |D|; retention ratio = i(S) / i(D), where S is the summary, D the source document, and i(·) a measure of information content
– Goal: i(S) / i(D) > |S| / |D|
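To make the two ratios concrete, here is a toy sketch (not from the slides): |·| is taken to be word count, and i(·) is approximated, purely for illustration, by the number of distinct non-stopword tokens.

```python
def compression_ratio(summary: str, document: str) -> float:
    """|S| / |D|, with length measured in word tokens."""
    return len(summary.split()) / len(document.split())

def retention_ratio(summary: str, document: str,
                    stopwords=frozenset({"the", "a", "of", "in", "on"})) -> float:
    """i(S) / i(D), with i(.) crudely approximated (an assumption,
    not the slides' definition) as the number of distinct content words."""
    info = lambda text: len({w.lower() for w in text.split()} - stopwords)
    return info(summary) / info(document)

doc = "The cat sat on the mat . The mat the cat sat on was red ."
summ = "The cat sat on the red mat ."
# A useful summary keeps retention_ratio above compression_ratio.
print(compression_ratio(summ, doc), retention_ratio(summ, doc))  # 0.5  0.833...
```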
Sentence-extraction-based (SE) summarization
– A classification problem: decide, for each sentence, whether it belongs in the summary
– An approximation to full abstractive summarization
Typical approaches to SE summarization
– Manually selected features: position, overlap with the query, cue words, structural information, overlap with the centroid
– Reranking: maximal marginal relevance (MMR) [Carbonell & Goldstein 1998]; a toy sketch follows
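A minimal sketch of MMR reranking (my own toy code, not Carbonell & Goldstein's implementation): at each step, select the candidate maximizing λ · sim(candidate, query) − (1 − λ) · max over already-selected sentences of sim(candidate, selected), trading relevance off against redundancy.

```python
from math import sqrt

def cosine(x: dict, y: dict) -> float:
    """Cosine similarity between sparse term-frequency vectors."""
    dot = sum(x[t] * y.get(t, 0.0) for t in x)
    nx = sqrt(sum(v * v for v in x.values()))
    ny = sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0

def mmr(candidates: dict, query: dict, k: int = 3, lam: float = 0.7) -> list:
    """Greedy MMR: `candidates` maps sentence ids to term vectors;
    returns k ids balancing query relevance against redundancy."""
    selected, pool = [], dict(candidates)
    while pool and len(selected) < k:
        def score(sid):
            redundancy = max((cosine(pool[sid], candidates[s]) for s in selected),
                             default=0.0)
            return lam * cosine(pool[sid], query) - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        del pool[best]
    return selected
```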
Non-SE summarization
– Discourse-based [Marcu 1997]
– Lexical chains [Barzilay & Elhadad 1997]
– Template-based [Radev & McKeown 1998]
Evaluation metrics
– Intrinsic measures: precision and recall; kappa; relative utility [Radev et al. 2000]; similarity measures (cosine, overlap, BLEU)
– Extrinsic measures: classification accuracy; informativeness for question answering; relevance correlation
Precision and recall
With two judges' extracts J1 and J2, each evaluated against the other as the reference:
– Precision(J1) = |J1 ∩ J2| / |J1| = Recall(J2)
– Precision(J2) = |J1 ∩ J2| / |J2| = Recall(J1)
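With extracts represented as sets of sentence ids, both measures reduce to set arithmetic (toy sketch):

```python
def precision_recall(system: set, reference: set) -> tuple:
    """Precision = |sys ∩ ref| / |sys|; Recall = |sys ∩ ref| / |ref|."""
    hits = len(system & reference)
    return hits / len(system), hits / len(reference)

j1, j2 = {1, 3, 5, 7}, {1, 2, 5}
p, r = precision_recall(j1, reference=j2)
print(p, r)  # 0.5 0.666... (note: Precision(J1) equals Recall(J2))
```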
Kappa
– N: number of items (index i)
– n: number of categories (index j)
– k: number of annotators
– κ = (P_obs − P_exp) / (1 − P_exp), where P_obs is the mean per-item agreement and P_exp the agreement expected by chance
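Assuming the slide's notation refers to a Fleiss-style kappa (a guess based on the variables N, n, and k), a compact sketch:

```python
def fleiss_kappa(counts: list) -> float:
    """counts[i][j] = number of annotators assigning item i to category j.
    Every row must sum to k, the number of annotators."""
    N, n = len(counts), len(counts[0])
    k = sum(counts[0])
    # Observed agreement per item, averaged over the N items.
    p_obs = sum((sum(c * c for c in row) - k) / (k * (k - 1))
                for row in counts) / N
    # Chance agreement from the marginal category proportions.
    p_j = [sum(row[j] for row in counts) / (N * k) for j in range(n)]
    p_exp = sum(p * p for p in p_j)
    return (p_obs - p_exp) / (1 - p_exp)
```

κ = 1 indicates perfect agreement; κ = 0 indicates agreement at chance level.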
Similarity measures
– Cosine: cos(X, Y) = Σ_i x_i y_i / (√(Σ_i x_i²) · √(Σ_i y_i²))
– Overlap: |X ∩ Y| / min(|X|, |Y|) (the overlap coefficient; other normalizations exist)
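The same two measures in code (a toy sketch over bag-of-words representations; the overlap variant shown is the overlap coefficient):

```python
from collections import Counter
from math import sqrt

def cosine_sim(a: str, b: str) -> float:
    """Cosine over term-frequency vectors of the two texts."""
    x, y = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(x[t] * y[t] for t in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0

def overlap_sim(a: str, b: str) -> float:
    """Overlap coefficient over the two texts' word sets."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / min(len(x), len(y)) if x and y else 0.0
```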
Relevance correlation (RC)
– How well does a retrieval engine's relevance ranking survive when full documents are replaced by their summaries? RC is the correlation between the two sets of relevance scores [Radev et al.]
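A sketch of the idea (my reading of the measure, not the reference implementation): run the same query against full documents and against their summaries, then correlate the two score lists.

```python
from math import sqrt

def pearson(xs: list, ys: list) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical relevance scores for the same documents under one query,
# retrieved once on full texts and once on summaries.
scores_full = [0.9, 0.4, 0.7, 0.1]
scores_summary = [0.8, 0.5, 0.6, 0.2]
print(pearson(scores_full, scores_summary))  # RC for this query
```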
Properties of evaluation metrics
Case study
– Multi-document, news, user-centered
– NewsInEssence [HLT 2001]
– NewsBlaster [HLT 2002]
Web resources
Generative probabilistic models for summarization
Wessel Kraaij, TNO TPD
Summarization architecture
What do human summarizers do?
– A: Start from scratch: analyze, transform, synthesize (top-down)
– B: Select material and revise: "cut-and-paste summarization" (Jing & McKeown 1999)
Automatic systems:
– Extraction: selection of material
– Revision: reduction, combination, syntactic transformation, paraphrasing, generalization, sentence reordering
Complexity increases along the spectrum from extracts to abstracts.
Required knowledge
Examples of generative models in summarization systems
– Sentence selection
– Sentence / document reduction
– Headline generation
Ex. 1: Sentence selection
– Conroy et al. (DUC 2001): HMM at the sentence level; each state has an associated feature vector (position, length, number of content terms); compute the probability of being a summary sentence
– Kraaij et al. (DUC 2001): rank sentences according to their posterior probability given a mixture model (a schematic sketch follows)
+ Grammaticality is OK
– Lacks aggregation, generalization, and multi-document summarization (MDS)
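The following is only a schematic stand-in for these systems (neither Conroy's HMM nor Kraaij's exact mixture): sentences are ranked by their log-odds of being drawn from a "summary" unigram model rather than a background model, which equals the posterior log-odds under equal class priors.

```python
from math import log

def unigram(probs: dict, vocab_size: int, token: str) -> float:
    # Uniform fallback for unseen tokens (a hypothetical smoothing choice).
    return probs.get(token, 1.0 / vocab_size)

def rank_sentences(sentences: list, p_summary: dict, p_background: dict,
                   vocab_size: int = 10_000) -> list:
    """Rank sentences by posterior log-odds of the 'summary' class,
    assuming equal priors, so the prior term cancels."""
    def log_odds(sentence):
        return sum(log(unigram(p_summary, vocab_size, w))
                   - log(unigram(p_background, vocab_size, w))
                   for w in sentence.lower().split())
    return sorted(sentences, key=log_odds, reverse=True)
```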
Ex. 2: Sentence reduction
Knight & Marcu (AAAI 2000)
– Compression: delete substrings in an informed way (based on the parse tree)
– Required: a PCFG parser and a tree-aligned training corpus
– Channel model: a probabilistic model of parse-tree expansion
– Results: much better than the NP baseline
+ Tight control over grammaticality
+ Mimics revision operations performed by humans
(A grossly simplified sketch of the noisy-channel objective follows.)
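The real model scores parse-tree expansions; in the sketch below the "channel" is reduced to a per-deleted-word penalty and the source model to a caller-supplied placeholder log-probability. The objective, though, is the noisy-channel one: pick the compression c maximizing log P(c) + log P(full | c).

```python
from itertools import combinations

def best_compression(words: list, source_logprob, delete_logpenalty: float = -2.0,
                     max_deletions: int = 2) -> list:
    """Enumerate compressions that drop up to `max_deletions` words and
    return the argmax of source score + channel score.
    `source_logprob(seq)` is a placeholder for a real language model."""
    best, best_score = words, source_logprob(words)
    for d in range(1, max_deletions + 1):
        for drop in combinations(range(len(words)), d):
            cand = [w for i, w in enumerate(words) if i not in drop]
            score = source_logprob(cand) + d * delete_logpenalty
            if score > best_score:
                best, best_score = cand, score
    return best
```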
Daumé & Marcu (ACL 2002)
– Document compression with a noisy channel, based on syntactic structure and discourse structure (an extension of the Knight & Marcu model)
– Required: discourse and syntactic parsers, plus a training corpus in which the EDUs (elementary discourse units) of summaries are aligned with the documents
– Cannot handle interesting document lengths (due to computational complexity)
Ex. 3: Headline generation
Berger & Mittal (SIGIR 2000)
– Input: web pages (often not running text)
– Trigram language model
– IBM Model 1-like channel model: choose a length, draw a word from the source model, and replace it with a similar word, under an independence assumption
– Trained on the Open Directory
+ Non-extractive
– Grammaticality and coherence are disappointing: indicative summaries only
Zajic, Dorr & Schwartz (DUC 2002)
– Headline generation from a full story: maximize P(S | H) P(H) over headlines H for story S
– Channel model based on an HMM combining a bigram model of headline words with a unigram model of story words; bigram language model for P(H)
– Decoding parameters (length, position, strings) are crucial for producing good results
+ Good results in fluency and accuracy
(A crude stand-in for the decoder is sketched below.)
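The actual system uses HMM decoding with tuned length, position, and string parameters; the toy stand-in below just enumerates word subsequences of the story's opening window, scoring each as bigram-LM fluency (the supplied `bigram_logprob`, a placeholder) plus how well its words cover the story's unigram distribution.

```python
from collections import Counter
from itertools import combinations
from math import log

def best_headline(story_words: list, bigram_logprob, max_len: int = 8,
                  window: int = 15) -> list:
    """Approximate argmax_H of log P(H) + a toy channel term
    sum_w log P_story(w), over subsequences of the opening window."""
    counts = Counter(story_words)
    total = sum(counts.values())
    p_story = {w: c / total for w, c in counts.items()}
    head = story_words[:window]
    best, best_score = None, float("-inf")
    for n in range(2, max_len + 1):
        for idx in combinations(range(len(head)), n):
            h = [head[i] for i in idx]
            score = bigram_logprob(h) + sum(log(p_story[w]) for w in h)
            if score > best_score:
                best, best_score = h, score
    return best
```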
Conclusions
– Fluent headlines are within reach of simple generative models
– High-quality summaries (coverage, grammaticality, coherence) require higher-level symbolic representations
– The cut-and-paste metaphor divides the work into manageable sub-problems
– The noisy-channel method is effective, but not always efficient
Open issues
– Audience (user model)
– Types of source documents
– Dealing with redundancy
– Information ordering (e.g., temporal)
– Coherent text
– Cross-lingual summarization (Norbert Fuhr)
– Using summaries to improve IR (or CLIR): relevance correlation
– Language models for text generation
– Possibly not a well-defined problem (low inter-judge agreement)
– Develop models with more linguistic structure
– Develop integrated models, e.g., by using priors (Rosenfeld)
– Build efficient implementations
– Evaluation: define a manageable task