Slide 1: Methods for Automatic Evaluation of Sentence Extract Summaries
G. Ravindra, N. Balakrishnan, K. R. Ramakrishnan
Supercomputer Education & Research Center and Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
Slide 2: Agenda
- Introduction to text summarization: need for summarization, types of summaries
- Evaluating extract summaries: challenges in manual and automatic evaluation
- Fuzzy summary evaluation (FuSE)
- Complexity scores
Slide 3: What is Text Summarization?
- Reductive transformation of source text into summary text by content generalization and/or selection
- Loss of information: what can be lost and what must not be lost, how much can be lost, and what the size of the summary should be
- Types of summaries: extracts and abstracts
- Influence of genre on the performance of a summarization algorithm, e.g. newswire stories favor sentence-position-based selection
Slide 4: Need for Summarization
- Explosive growth in the availability of digital textual data: books in digital libraries, mailing-list archives, on-line news portals
- Duplication of textual segments across books, e.g. ten introductory books on quantum physics share many paragraphs that are syntactically different but semantically the same
- Hand-held devices: small screens and limited memory, low-power devices and hence limited processing capability, e.g. streaming a book from a digital library to a hand-held device
- Information is produced faster than it can be consumed
Slide 5: Types of Summaries
- Extracts: text selection (e.g. paragraphs from books, sentences from editorials, phrases from e-mails), typically produced with statistical techniques
- Abstracts: text selection followed by generalization; need linguistic processing, e.g. converting a sentence into a phrase
- Generic summaries: independent of genre
- Indicative summaries: give a general idea of the topics discussed in the text being summarized
- Informational summaries: serve as a surrogate for the original text
Slide 6: Evaluating Extract Summaries
- Manual evaluation: human judges score a summary on a well-defined scale against well-defined criteria
- Subject to each judge's understanding of the subject and dependent on the judge's opinions; guidelines constrain those opinions
- Individual judges' scores are combined to produce the final score
- Re-evaluation may yield different scores
- Poses logistic problems for researchers
Slide 7: Automatic Evaluation
- Machine-based evaluation: consistent over multiple runs, fast, avoids logistic problems, suitable for researchers experimenting with new algorithms
- Flip side: not as accurate as human evaluation, so it should be used as a precursor to a detailed human evaluation
- Must handle various sentence constructs and linguistic variants algorithmically
Slide 8: Fuzzy Summary Evaluation (FuSE)
- Proposes the use of fuzzy set union to quantify the similarity of two extract summaries
- Similarity is evaluated between the reference (human-generated) summary and the candidate (machine-generated) summary
- Each sentence is treated as a fuzzy set: every sentence in the reference summary has a membership grade in every sentence of the candidate summary
- The membership grade of a reference-summary sentence in the candidate summary is the union of its membership grades across all candidate-summary sentences
- Membership grades are combined to compute an f-score value
- The membership grade of one sentence in another is the Hamming distance between the two sentences, computed over collocations
Slide 9: Fuzzy F-score
(Equation slide.) Defines the candidate-summary sentence set, the reference-summary sentence set, the union function, the membership grade of a candidate sentence in a reference sentence, and the resulting fuzzy precision and fuzzy recall.
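The equations on this slide did not survive extraction. The following is a minimal reconstruction based on the labels above and the description on the FuSE slide; the symbols C, R, mu, and the union operator are my notation and an assumed formulation, not necessarily the authors' exact definitions.

```latex
% Assumed notation (not necessarily the authors'):
%   C = candidate summary, R = reference summary,
%   \mu_{s}(t) = membership grade of sentence t in sentence s,
%   \bigsqcup   = the chosen fuzzy union (S-norm) operator.
\mu_C(s_r) = \bigsqcup_{s_c \in C} \mu_{s_c}(s_r), \qquad
\mu_R(s_c) = \bigsqcup_{s_r \in R} \mu_{s_r}(s_c)
\\[6pt]
P_f = \frac{1}{|C|} \sum_{s_c \in C} \mu_R(s_c), \qquad
R_f = \frac{1}{|R|} \sum_{s_r \in R} \mu_C(s_r), \qquad
F   = \frac{2\, P_f R_f}{P_f + R_f}
```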
Slide 10: Choice of Union Operator
- Proposes the use of Frank's S-norm operator
- Allows partial matches to be combined non-linearly
- The membership grade of a sentence in a summary depends on the sentence's length
- This automatically builds a brevity bonus into the scheme
Slide 11: Frank's S-norm Operator
(Equation slide.) Labels on the slide: damping coefficient, mean of the non-zero membership grades for a sentence, sentence length, length of the longest sentence.
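The operator itself did not extract. Below is a small sketch of the standard Frank t-conorm (S-norm) that the slide names; the way the base is tied to sentence length and mean membership grade is a purely illustrative stand-in, not the authors' damping formula.

```python
import math

def frank_s_norm(a: float, b: float, s: float) -> float:
    """Standard Frank t-conorm (fuzzy union) with base s (s > 0, s != 1)."""
    if s <= 0 or s == 1:
        raise ValueError("Frank base s must be positive and != 1")
    return 1.0 - math.log(1.0 + (s ** (1.0 - a) - 1.0) * (s ** (1.0 - b) - 1.0) / (s - 1.0), s)

def combine_grades(grades, s: float) -> float:
    """Fold a sentence's membership grades with the Frank S-norm."""
    result = 0.0  # 0 is the identity element of any t-conorm
    for g in grades:
        result = frank_s_norm(result, g, s)
    return result

def damping_base(sentence_len: int, max_len: int, mean_grade: float) -> float:
    """Hypothetical length-dependent base, only to illustrate that the slide ties
    the damping coefficient to sentence length and mean non-zero grade."""
    ratio = sentence_len / max_len
    return min(0.999999, max(1e-6, ratio * mean_grade))  # keep s in (0, 1)
```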
Slide 12: Characteristics of Frank's base (figure slide)
Slide 13: Performance of FuSE for various sentence lengths (figure slide)
Slide 14: Dictionary-Enhanced Fuzzy Summary Evaluation (DeFuSE)
- FuSE does not capture sentence similarity based on synonymy and hypernymy
- Identifying synonymous words makes evaluation more accurate
- Identifying hypernymous word relationships allows "gross information" to be considered during evaluation
- Note: very deep hypernymy trees can cause topic drift and hence improper evaluation
Slide 15: Use of WordNet (figure slide)
Slide 16: Example: Use of Hypernymy
Sentence 1: HURRICANE GILBERT DEVASTATED DOMINICAN REPUBLIC AND PARTS OF CUBA
Generalized: (PHYSICAL PHENOMENON) GILBERT (DESTROY, RUIN) (REGION) AND PARTS OF (REGION)
Sentence 2: TROPICAL STORM GILBERT DESTROYED PARTS OF HAVANA
Generalized: TROPICAL (PHYSICAL PHENOMENON) GILBERT DESTROYED PARTS OF (REGION)
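A small sketch of how hypernym generalization like the above could be obtained with NLTK's WordNet interface. NLTK and the one-level climb are my choices for illustration; the slides name WordNet but not a specific toolkit or depth.

```python
# Requires: pip install nltk, then nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def generalize(word: str, depth: int = 1) -> str:
    """Replace a word with the lemma of a hypernym `depth` levels up.
    Keep depth small to avoid the topic drift warned about on the DeFuSE slide."""
    synsets = wn.synsets(word)
    if not synsets:
        return word  # words missing from WordNet are left untouched
    synset = synsets[0]  # naive sense choice; a real system needs disambiguation
    for _ in range(depth):
        hypernyms = synset.hypernyms()
        if not hypernyms:
            break
        synset = hypernyms[0]
    return synset.lemmas()[0].name().replace("_", " ").upper()

# Climbing further up from "hurricane" eventually reaches PHYSICAL PHENOMENON,
# as in the substitutions shown on this slide.
print(generalize("hurricane", depth=1))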
Slide 17: Complexity Score
- Attempts to quantify a summarization algorithm by the difficulty of generating a summary of a particular accuracy
- Generating a 9-sentence summary from a 10-sentence document is very easy: an algorithm that randomly selects 9 sentences has a worst-case accuracy of 90%, and a complicated AI+NLP-based algorithm cannot do any better
- If a 2-sentence summary is to be generated from a 10-sentence document, there are C(10,2) = 45 possible candidates, of which only one is accurate
Slide 18: Computing Complexity Score
(Equation slide.) Gives the probability of generating a summary of length m1 containing l1 accurate sentences, when the human summary has h sentences and the document being summarized has n sentences.
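The formula itself did not extract. Under random sentence selection, the probability described above is hypergeometric; the reconstruction below is my assumption based on the slide's wording, not a verbatim copy of the authors' equation.

```latex
% Assumed form: probability that a random m_1-sentence extract of an
% n-sentence document contains exactly l_1 of the h human-chosen sentences.
P(l_1 \mid m_1, h, n) \;=\; \frac{\binom{h}{l_1}\,\binom{n-h}{m_1-l_1}}{\binom{n}{m_1}}
```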
Slide 19: Complexity Score (cont.)
(Equation slide.) To compare two summaries of equal length, the performance of one relative to the baseline is given by the expression on this slide.
Slide 20: Complexity Score (cont.)
The complexity of generating a 10% extract containing 12 correct sentences is higher than that of generating a 30% extract containing 12 correct sentences. (A numeric sketch follows below.)
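A small numeric sketch of that comparison, using the hypergeometric probability assumed earlier; the document size and human-summary length (n = 120, h = 12) are made-up numbers chosen so that 10% and 30% extracts contain 12 and 36 sentences respectively.

```python
from math import comb

def p_random_extract(l1: int, m1: int, h: int, n: int) -> float:
    """Probability that a random m1-sentence extract of an n-sentence document
    contains exactly l1 of the h sentences chosen by the human summarizer
    (hypergeometric; assumed form of the slide's complexity-score probability)."""
    return comb(h, l1) * comb(n - h, m1 - l1) / comb(n, m1)

n, h, correct = 120, 12, 12                              # hypothetical figures
p_10pct = p_random_extract(correct, m1=12, h=h, n=n)     # 10% extract, 12 correct
p_30pct = p_random_extract(correct, m1=36, h=h, n=n)     # 30% extract, 12 correct
print(p_10pct, p_30pct)  # the 10% case is far less likely by chance, i.e. more complex
```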
Slide 21: Conclusion
- Summary evaluation is as complicated as summary generation
- Fuzzy schemes are well suited to evaluating extract summaries
- Using synonymy and hypernymy relations improves evaluation accuracy
- The complexity score is a new way of looking at summary evaluation