1 I256: Applied Natural Language Processing. Marti Hearst, Oct 2, 2006

2 Contents (from lecture notes by Nachum Dershowitz & Dan Cohen)
– Introduction and Applications
– Types of summarization tasks
– Basic paradigms
– Single document summarization
– Evaluation methods

3 Introduction
The problem: information overload
– 4 billion URLs indexed by Google
– 200 TB of data on the Web [Lyman and Varian 03]
– Information is created every day in enormous amounts
One solution: summarization. Abstracts
– promote current awareness
– save reading time
– facilitate selection
– facilitate literature searches
– aid in the preparation of reviews
But what is an abstract?

4 Introduction
– Abstract: a brief but accurate representation of the contents of a document.
– Goal: take an information source, extract the most important content from it, and present it to the user in a condensed form and in a manner sensitive to the user’s needs.
– Compression: the ratio of the length of the summary to the length of the source.

5 History
– The problem has been addressed since the 50s [Luhn 58].
– Numerous methods are currently being suggested.
– Most methods still rely on algorithms from the 50s-70s.
– The problem is still hard, yet there are some applications: MS Word AutoSummarize, and www.newsinessence.com by Dragomir Radev’s research group.

7 MS Word AutoSummarize

8 Applications
– Abstracts for scientific and other articles
– News summarization (mostly multiple-document summarization)
– Classification of articles and other written data
– Web pages for search engines
– Web access from PDAs and cell phones
– Question answering and data gathering

9 Types of Summaries
– Indicative vs. informative
  – Informative: a substitute for the entire document
  – Indicative: gives an idea of what is there
– Background: does the reader have the needed prior knowledge? Expert reader vs. novice reader
– Query-based or general
  – Query-based: built around a query or a form to be filled; the questions posed should be answered
  – General: general-purpose summarization

10 Types of Summaries (input)
– Single document vs. multiple documents
– Domain-specific (e.g., chemistry) or general
– Genre-specific (e.g., newspaper items) or general

11 Types of Summaries (output)
– Extract vs. abstract
  – Extracts: representative paragraphs/sentences/phrases/words, fragments of the original text
  – Abstracts: a concise summary of the central subjects in the document
  – Research shows that sometimes readers prefer extracts!
– Language chosen for the summary
– Format of the resulting summary (table/paragraph/key words)

12 Methods
– Quantitative heuristics, manually scored
– Machine-learning-based statistical scoring methods
– Higher semantic/syntactic structures
– Network (graph) based methods
– Other methods (rhetorical analysis, lexical chains, co-reference chains)
– AI methods

13 Quantitative Heuristics
General method: score each entity (sentence, word); combine the scores; choose the best sentence(s).
Scoring techniques:
– Word frequencies throughout the text (Luhn 58)
– Position in the text (Edmundson 69; Lin & Hovy 97)
– Title method (Edmundson 69)
– Cue phrases in sentences (Edmundson 69)

14 Using Word Frequencies (Luhn 58)
– The very first work in automated summarization.
– Assumptions:
  – Frequent words indicate the topic
  – “Frequent” is measured relative to the corpus frequency
  – Clusters of frequent words indicate a summarizing sentence
– Stemming based on similar prefix characters
– Very common words and very rare words are ignored

15 Ranked Word Frequency (Zipf’s curve)

16 Word Frequencies (Luhn 58)
– Find consecutive sequences of high-weight keywords, allowing a certain number of gaps of low-weight terms.
– Sentences with the highest sum of cluster weights are chosen.
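
Below is a minimal Python sketch of this cluster-scoring idea. The stopword list, frequency cut-offs, and maximum gap size are illustrative assumptions, not Luhn's exact parameters.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "it"}

def significant_words(sentences, low=2, high_ratio=0.1):
    """Words that are frequent in the document, ignoring very common and very rare terms."""
    words = [w.lower() for s in sentences for w in s.split()]
    counts = Counter(w for w in words if w not in STOPWORDS)
    high = max(low + 1, int(high_ratio * len(words)))
    return {w for w, c in counts.items() if low <= c <= high}

def luhn_score(sentence, sig, max_gap=4):
    """Score = (significant words in the best cluster)^2 / cluster length."""
    tokens = [w.lower() for w in sentence.split()]
    positions = [i for i, w in enumerate(tokens) if w in sig]
    if not positions:
        return 0.0
    best, start, count = 0.0, positions[0], 1
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev <= max_gap:          # still inside the same cluster
            count += 1
        else:                              # gap too large: close the cluster
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = cur, 1
    best = max(best, count ** 2 / (positions[-1] - start + 1))
    return best

def summarize(sentences, n=3):
    sig = significant_words(sentences)
    return sorted(sentences, key=lambda s: luhn_score(s, sig), reverse=True)[:n]
```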

17 Position in the Text (Edmundson 69)
– Claim: important sentences occur in specific positions.
– “Lead-based” summary: the inverse of position in the document works well for news.
– Important information occurs in specific sections of the document (introduction/conclusion).
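
A minimal sketch of a lead-biased position score, assuming earlier sentences matter more (as in news); the linear decay is an illustrative choice, not Edmundson's exact scheme.

```python
def position_scores(sentences):
    """Earlier sentences get higher weight: 1.0 for the lead sentence, decreasing after."""
    n = len(sentences)
    return [(n - i) / n for i in range(n)]
```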

18 Title Method (Edmundson 69)
– Claim: the title of a document indicates its content
  – Unless editors are being cute
  – Usually not true for novels
  – What about blogs…?
– Words in the title help find relevant content:
  – Create a list of title words, removing “stop words”
  – Use those as keywords to find important sentences (for example, with Luhn’s method)
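
A minimal sketch of the title method; the stopword list is an illustrative placeholder.

```python
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "with"}

def title_keywords(title):
    """Title words with stop words removed."""
    return {w.lower() for w in title.split() if w.lower() not in STOPWORDS}

def title_score(sentence, keywords):
    """Number of title keywords that appear in the sentence."""
    return len({w.lower() for w in sentence.split()} & keywords)
```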

19 Cue Phrases Method (Edmundson 69)
– Claim: important sentences contain cue words/indicative phrases:
  – “The main aim of the present paper is to describe…” (IND)
  – “The purpose of this article is to review…” (IND)
  – “In this report, we outline…” (IND)
  – “Our investigation has shown that…” (INF)
– Some words are considered bonus, others stigma:
  – Bonus: comparatives, superlatives, conclusive expressions, etc.
  – Stigma: negatives, pronouns, etc.
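
A minimal sketch of cue-phrase scoring; the bonus/stigma lists and weights are illustrative placeholders, not Edmundson's actual dictionaries.

```python
BONUS = {"the main aim", "the purpose of this", "in this report",
         "we conclude", "significant", "best"}            # illustrative only
STIGMA = {"hardly", "impossible", "perhaps", "believe"}   # illustrative only

def cue_score(sentence, bonus_w=1.0, stigma_w=-1.0):
    """Reward sentences containing bonus phrases, penalize those with stigma words."""
    text = sentence.lower()
    return (bonus_w * sum(p in text for p in BONUS) +
            stigma_w * sum(p in text for p in STIGMA))
```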

20 Feature Combination (Edmundson 69)
– Linear combination of 4 features: title, cue, keyword, position.
– The weights are adjusted using training data with any minimization technique.
– Evaluated on a corpus of 200 chemistry articles, 100 to 3900 words long.
– Judges were told to extract 25% of the sentences, maximizing coherence and minimizing redundancy.
– Features:
  – Position (sensitive to the types of headings for sections)
  – Cue
  – Title
  – Keyword
– Best results obtained with cue + title + position.
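
A minimal sketch of the linear combination; the per-sentence feature scores are assumed to be computed elsewhere (e.g., by the methods sketched above), and the weights here are arbitrary illustrative values rather than trained ones.

```python
def combined_score(f, w_cue=1.0, w_title=1.0, w_key=1.0, w_pos=1.0):
    """f: dict with 'cue', 'title', 'keyword', and 'position' scores for one sentence."""
    return (w_cue * f["cue"] + w_title * f["title"] +
            w_key * f["keyword"] + w_pos * f["position"])

def extract(sentences, features, ratio=0.25):
    """sentences: list of str; features: parallel list of feature dicts.
    Keeps the top `ratio` of sentences, returned in document order."""
    k = max(1, int(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: combined_score(features[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```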

21 Bayesian Classifier (Kupiec et al. 95)
– Statistical learning method.
– Feature set:
  – Sentence length: |S| > 5
  – Fixed phrases: 26 manually chosen
  – Paragraph: sentence position in paragraph
  – Thematic words
  – Uppercase words: not common acronyms
– Training label (binary): whether the sentence is included in the manual extract.
– Corpus: 188 document + summary pairs from scientific journals.
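
A minimal sketch of how these five features might be computed for one sentence; the fixed-phrase list, length cut-off, and thematic-word approximation are illustrative assumptions, not the exact resources used by Kupiec et al.

```python
from collections import Counter

FIXED_PHRASES = {"in conclusion", "this paper", "we show", "in summary"}  # illustrative
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "we"}

def thematic_words(document_sentences, top_k=10):
    """Approximate thematic words as the most frequent non-stopword terms."""
    counts = Counter(w.lower() for s in document_sentences for w in s.split()
                     if w.lower() not in STOPWORDS)
    return {w for w, _ in counts.most_common(top_k)}

def kupiec_features(sentence, paragraph_position, thematic, min_len=5):
    """paragraph_position: e.g. 'initial', 'medial', or 'final'."""
    tokens = sentence.split()
    return {
        "length":    len(tokens) > min_len,                               # length cut-off
        "phrase":    any(p in sentence.lower() for p in FIXED_PHRASES),   # fixed phrases
        "paragraph": paragraph_position,                                  # position in paragraph
        "thematic":  any(t.lower() in thematic for t in tokens),          # thematic words
        "uppercase": any(t.isupper() and len(t) > 1 for t in tokens),     # uppercase words
    }
```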

22 Bayesian Classifier (Kupiec et al. 95)
– Uses a Bayesian classifier.
– Assumes statistical independence of the features (formula below).
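
The formulas on the original slide did not survive the transcript; the standard formulation from Kupiec et al., with s a sentence, S the set of summary sentences, and F_1, ..., F_k the feature values, is:

```latex
P(s \in S \mid F_1, \ldots, F_k)
  = \frac{P(F_1, \ldots, F_k \mid s \in S)\, P(s \in S)}{P(F_1, \ldots, F_k)}
  \approx \frac{\prod_{j=1}^{k} P(F_j \mid s \in S)\; P(s \in S)}{\prod_{j=1}^{k} P(F_j)}
```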

23 Bayesian Classifier (Kupiec et al. 95)
– Each probability is estimated empirically from a corpus.
– Higher-probability sentences are chosen to be in the summary.
– Performance: for 25% summaries, 84% precision.
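
A minimal sketch of estimating the terms by counting over a labeled corpus; the add-alpha smoothing is an illustrative assumption.

```python
from collections import Counter

def estimate_probabilities(examples, alpha=1.0):
    """examples: list of (feature_dict, in_summary_bool) built from a labeled corpus.
    Returns the prior P(s in S) and estimators for P(F_j = v | s in S) and P(F_j = v)."""
    prior = sum(y for _, y in examples) / len(examples)
    joint, marginal, n_pos = Counter(), Counter(), 0
    for features, in_summary in examples:
        n_pos += in_summary
        for name, value in features.items():
            marginal[(name, value)] += 1
            if in_summary:
                joint[(name, value)] += 1

    def p_given_summary(name, value):
        return (joint[(name, value)] + alpha) / (n_pos + 2 * alpha)

    def p_marginal(name, value):
        return (marginal[(name, value)] + alpha) / (len(examples) + 2 * alpha)

    return prior, p_given_summary, p_marginal
```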

24 Evaluation Methods
When a manual summary is available:
1. Choose a granularity (clause; sentence; paragraph).
2. Create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match).
3. Measure the similarity of each unit in the new summary to the most similar unit(s) in the manual summary.
4. Measure recall and precision.
Otherwise:
1. Intrinsic: how good is the summary as a summary?
2. Extrinsic: how well does the summary help the user?
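
A minimal sketch of the "manual summary available" case at sentence granularity with exact-match similarity; other granularities or overlap measures would slot into the same structure.

```python
def precision_recall(system_extract, manual_extract):
    """Both arguments are lists of sentence strings; matching is exact."""
    system, manual = set(system_extract), set(manual_extract)
    matched = system & manual
    precision = len(matched) / len(system) if system else 0.0
    recall = len(matched) / len(manual) if manual else 0.0
    return precision, recall
```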

25 Intrinsic Measures
– Intrinsic measures (glass-box): how good is the summary as a summary?
– Problem: how do you measure the goodness of a summary?
– Studies: compare to an ideal (Edmundson 69; Kupiec et al. 95; Salton et al. 97; Marcu 97) or supply criteria such as fluency, informativeness, coverage, etc. (Brandow et al. 95).
– The summary is evaluated on its own or by comparing it with the source:
  – Is the text cohesive and coherent?
  – Does it contain the main topics of the document?
  – Are important topics omitted?

26 Extrinsic Measures
– Extrinsic measures (black-box): how well does the summary help a user with a task?
– Problem: does summary quality correlate with performance?
– Studies: GMAT tests (Morris et al. 92); news analysis (Miike et al. 94); IR (Mani and Bloedorn 97); text categorization (SUMMAC 98; Sundheim 98).
– Evaluation in a specific task:
  – Can the summary be used instead of the document?
  – Can the document be classified by reading the summary?
  – Can we answer questions by reading the summary?

27 The Document Understanding Conference (DUC)
– This is really the text summarization competition; started in 2001.
– Task and evaluation (for 2001-2004):
  – Various target sizes were used (10-400 words).
  – Both single- and multiple-document summaries were assessed.
  – Summaries were manually judged for both content and readability.
  – Each peer (human or automatic) summary was compared against a single model summary
    – using SEE (http://www.isi.edu/~cyl/SEE/), which estimates the percentage of information in the model that was covered in the peer.
  – Also used ROUGE (Lin ’04) in 2004:
    – Recall-Oriented Understudy for Gisting Evaluation
    – Uses counts of n-gram overlap between the candidate and a gold-standard summary; assumes fixed-length summaries.
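
A minimal sketch of ROUGE-N recall against a single gold-standard summary; the real ROUGE toolkit adds stemming, stopword handling, multiple references, and other variants.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=2):
    """Clipped n-gram recall: overlapping n-grams / n-grams in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```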

28 The Document Understanding Conference (DUC)
– Made a big change in 2005.
– An extrinsic evaluation was proposed but rejected (write a natural-disaster summary).
– Instead: a complex question-focused summarization task that required summarizers to piece together information from multiple documents to answer a question or set of questions posed in a DUC topic.
– Also indicated a desired granularity of information.

29 The Document Understanding Conference (DUC)
Evaluation metrics for the new task:
– Grammaticality
– Non-redundancy
– Referential clarity
– Focus
– Structure and coherence
– Responsiveness (content-based evaluation)
This was a difficult task to do well in.

30 Let’s Make a Summarizer!
– Each person (or pair) writes code for one small part of the problem, using Kupiec et al.’s method.
– We’ll combine the parts in class.

31 Next Time
– More on Bayesian classification
– Other summarization approaches (Marcu paper)
– Multi-document summarization (Goldstein et al. paper)
– In-class summarizer!

