Text Specificity and Impact on Quality of News Summaries
Annie Louis & Ani Nenkova
University of Pennsylvania
June 24, 2011

Specificity: the amount of detail in a text
• Texts are a mix of general and specific sentences
• Recently, we have developed a classifier that can distinguish general vs. specific sentences
• The notion of specificity could be useful for a number of applications; in this work, we consider automatic summarization
• Summaries cannot include all specific content because of the space constraint
• Goal: understand the role of general/specific content in summaries and how it impacts quality

Example general and specific sentences
• Seismologists said the volcano had plenty of built-up magma and even more severe eruptions could come later. [overview]
• The volcano's activity -- measured by seismometers detecting slight earthquakes in its molten rock plumbing system -- is increasing in a way that suggests a large eruption is imminent, Lipman said. [details]

Prior studies of general-specific content in summaries
• Humans use generalization and specification of source sentences to create abstract sentences [Jing & McKeown (2000)]
• One generation task is to fuse information from a key (general) sentence and a specific sentence on the same topic to create an abstract sentence [Wan et al. (2008)]
• Subtitles of news broadcasts are often generalized compared to the original text [Marsi et al. (2010)]

Overview of our study
Quantitative analysis of specificity in inputs and summaries using a general/specific classifier:
1. Human abstracts have much more general content than system extracts
2. The amount of specific content is related to the content quality of system summaries (more general ~ better)
3. Preliminary study of the properties of summary-worthy general sentences

Data: DUC 2002
• Generic multi-document summarization task
• 59 input sets, each with 5 to 15 news documents
• 3 types of 200-word summaries, with manually assigned content and linguistic quality scores:
  1. Human abstracts (2 assessors × 59 inputs)
  2. Human extracts (2 assessors × 59 inputs)
  3. System extracts (9 systems × 59 inputs)

General vs. specific sentence classifier: prior work [Louis & Nenkova (2011)]
• Operates at the sentence level
• Features:
  1. Words
  2. Named entities, numbers
  3. Likelihood under a language model
  4. Word specificity
  5. Adjectives/adverbs, length of phrases
  6. Polar words
  7. Sentence length
• Training: binary (general or specific); logistic regression, so a probability for each class can be obtained
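A minimal sketch of how such a classifier could be set up, assuming labeled sentences are available; the three features below are simplified stand-ins for the full feature set above, and the training sentences and labels are hypothetical toy examples:

```python
# Minimal sketch of a general/specific sentence classifier in the spirit of
# Louis & Nenkova (2011). The three features below are simplified stand-ins
# for the full feature set on the slide; the training data is a hypothetical
# toy example (1 = specific, 0 = general).
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(sentence):
    tokens = sentence.split()
    return [
        len(tokens),                                     # sentence length
        sum(t[0].isupper() for t in tokens[1:]),         # crude named-entity proxy
        sum(bool(re.search(r"\d", t)) for t in tokens),  # count of numeric tokens
    ]

sentences = [
    "The volcano's activity is increasing in a way that suggests "
    "a large eruption is imminent, Lipman said.",
    "Even more severe eruptions could come later.",
]
labels = [1, 0]  # specific, general

X = np.array([features(s) for s in sentences])
clf = LogisticRegression().fit(X, labels)

# predict_proba gives the confidence of the "specific" class;
# the specificity scores on the following slides build on this.
print(clf.predict_proba(X)[:, 1])
```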

Classification performance
• 75% accurate, validated against human annotations
• 90% accurate on examples with high annotator agreement
• The predicted probability is indicative of annotator agreement on the class: sentences with high agreement ~ high-confidence predictions

Computing specificity for a text
• Sentences in a summary are of varying length, so we compute a score at the word level: "the average specificity of words in the text"
• [Diagram: sentences S1, S2, S3 with words w11 ... w33; each word receives the classifier's confidence of belonging to the specific class, and the specificity score of the text is the average of these confidences over all tokens]
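A minimal sketch of this scoring scheme; word_confidence is a hypothetical stand-in for the step that assigns each token the classifier's confidence of being specific:

```python
# Sketch of the word-level specificity score: each token receives the
# classifier's confidence of belonging to the specific class, and the text
# score is the average over all tokens. word_confidence is a hypothetical
# stand-in for that per-word confidence step.
def text_specificity(sentences, word_confidence):
    scores = [word_confidence(w) for s in sentences for w in s.split()]
    return sum(scores) / len(scores)

# Example with a dummy confidence function that treats longer words as more specific.
dummy = lambda w: min(len(w) / 10.0, 1.0)
print(text_specificity(["The volcano erupted.", "Seismometers detected slight earthquakes."], dummy))
```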

Average specificity of different types of summaries
[Scale from general to specific: Human abstracts (0.62), Inputs (0.65), Human extracts (0.72), System extracts (0.74)]
1. More general content is preferred in abstracts
2. Simply the process of extraction makes summaries more specific
3. System summaries are overly specific
Is the difference related to summary quality?

Analysis of system summaries: specificity and quality
1. Content quality: importance of the content included in the summary (more general ~ better)
2. Linguistic quality: how well-written the summary is perceived to be (more specific ~ better)
3. Quality of general/specific summaries: when a summary is intended to be general or specific

1. Specificity and content quality
• Coverage score: manually judged at NIST; similarity to a human summary
• Significant negative correlation between coverage and specificity
• More specific ~ decreased content quality
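For concreteness, this is how such a correlation could be computed, assuming per-summary coverage and specificity scores are available; all numbers below are made-up placeholders, not the study's data:

```python
# How the correlation on this slide could be computed, given per-summary
# coverage and specificity scores. All numbers are made-up placeholders.
from scipy.stats import pearsonr

coverage    = [0.40, 0.35, 0.22, 0.50, 0.18]  # hypothetical NIST coverage scores
specificity = [0.70, 0.72, 0.80, 0.65, 0.83]  # hypothetical summary specificity
r, p = pearsonr(coverage, specificity)
print(f"r = {r:.2f}, p = {p:.3f}")  # a negative r would match the slide's finding
```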

But the correlation is not very high
• Specificity relates to the realization of content, which is different from the importance of the content
• Content quality = content importance + appropriate specificity level
• Content importance can be measured with ROUGE scores: n-gram overlap between a system summary and human summaries, the standard evaluation of automatic summaries

System summary quality: specificity as one of the predictors
• Coverage score ~ ROUGE-2 (bigrams) + specificity
• Linear regression; weights for the predictors in the regression model:

  Predictor     Mean β   Significance (hypothesis β = 0)
  (Intercept)   …        …e-11
  ROUGE-2       …        < 2.0e-16
  Specificity   …        …e-05

• Is the combination a better predictor than ROUGE alone?
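A minimal sketch of the regression above (coverage ~ ROUGE-2 + specificity), fit with ordinary least squares; the score arrays are made-up placeholders, not the study's data:

```python
# Sketch of the regression on this slide: coverage ~ ROUGE-2 + specificity,
# fit with ordinary least squares. All scores are made-up placeholders.
import numpy as np
import statsmodels.api as sm

rouge2      = np.array([0.08, 0.11, 0.05, 0.13, 0.07])  # hypothetical ROUGE-2 scores
specificity = np.array([0.75, 0.70, 0.82, 0.68, 0.78])
coverage    = np.array([0.30, 0.42, 0.20, 0.50, 0.28])

X = sm.add_constant(np.column_stack([rouge2, specificity]))
model = sm.OLS(coverage, X).fit()
print(model.params)   # intercept, ROUGE-2 weight, specificity weight
print(model.pvalues)  # tests of the hypothesis beta = 0, as in the table above
```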

2. Specificity and linguistic quality
• Used different data: TAC 2009 (DUC 2002 only reported the number of errors, and these were specified as a range, e.g. 1-5 errors)
• TAC 2009 linguistic quality score: manually judged on a 1-10 scale; combines different aspects: coherence, referential clarity, grammaticality, redundancy

System summaries: what is the average specificity in different score categories?

  Ling score     No. summaries
  Poor (1, 2)    202
  Mediocre (5)   400
  Best (9, 10)   79

[Chart: average specificity of the summaries in each score category]
• More general ~ lower score! General content is useful but needs proper context
• Example: a summary that starts "We are quite a ways from that, actually." As ice and snow at the poles melt, … has low specificity but a linguistic quality score of 1

3. Specificity and quality of general/specific summaries
• DUC 2005: general-specific summary task: create general summaries for some inputs, specific summaries for others
• How is specificity related to the scores of these summaries?

System summaries: correlation between specificity and content scores
(Content scores were measured using the pyramid method)

  Summary type   Pearson correlation
  General        …
  Specific       0.18*

• Further hints that specificity alone is not predictive of summary quality
• Once a summary is general, the level of generality is no longer predictive of quality

Analysis of general sentences in human summaries
1. The generalization operation performed in human abstracts: frequency of operations, amount of deletion
2. How general sentences are used in human extracts: position, type of sentence

Data for analysing the generalization operation
• Aligned pairs of abstract and source sentences conveying the same content; the traditional data used for compression experiments
• Ziff-Davis corpus: the sentence pairs used in Galley & McKeown (2007); any number of deletions, up to 7 substitutions
• Only 25% of abstract sentences can be mapped, but this is enough to observe the trends

Generalization operation in human abstracts

  Transition   No. pairs   % pairs
  S → S        …           …
  S → G        …           …
  G → G        …           …
  G → S        …           …

• One-third of all transformations are specific-to-general
• Human abstracts involve a lot of generalization

How do specific sentences get converted to general ones?

  Transition   Orig. length   New/orig. length   Avg. deletions (words)
  S → G        …              …                  …
  S → S        …              …                  …
  G → G        …              …                  …
  G → S        …              …                  …

• Humans choose long sentences and compress them heavily!
• A measure of generality would be useful to guide compression; currently only importance and grammaticality are used

Use of general sentences in human extracts  Details of Maxwell’s death were sketchy.  Folksy was an understatement.  “Long live democracy!”  Instead it sank like the Bismarck.  Example use of a general sentence in a summary … With Tower’s qualifications for the job, the nominations should have sailed through with flying colors. [Specific] Instead it sank like the Bismarck. [General] … …

Simple categorization
The 75 most general sentences according to classifier confidence:

  Type             Proportion
  First sentence   6 (8%)
  Last sentence    13 (17%)
  Attributions     14 (18%)
  Comparisons      4 (5%)

• General sentences are used as topic/emphasis sentences

Conclusion
• General sentences are useful content for summaries: people use them for emphasis and to introduce topics
• They can improve content quality: choosing good general sentences, or generating them, will be an interesting task
• But linguistic quality should also be considered: general sentences are difficult to understand out of context, so content planning should consider the order of general content

Thank you

Histogram of specificity scores