The PYTHY Summarization System: Microsoft Research at DUC 2007


The PYTHY Summarization System: Microsoft Research at DUC 2007
Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi, Hisami Suzuki, and Lucy Vanderwende
Microsoft Research, April 26, 2007

DUC Main Task Results
- Automatic evaluations (30 participants): rank and score for ROUGE-2 and ROUGE-SU4
- Human evaluations: Pyramid rank 1 (tied), Content rank 5 (tied)
- PYTHY did well on both kinds of measures

Overview of PYTHY
- Linear sentence ranking model
- Learns to rank sentences based on:
  - ROUGE scores against model summaries
  - Semantic Content Unit (SCU) weights of sentences selected by past peers
- Considers simplified sentences alongside original sentences

PYTHY Training (diagram): Docs -> Sentences and Simplified Sentences -> Feature inventory; Targets (ROUGE Oracle, Pyramid/SCU, ROUGE x 2) -> Ranking/Training -> Model

PYTHY Testing (diagram): Docs -> Sentences and Simplified Sentences -> Feature inventory -> Model -> Search with Dynamic Scoring -> Summary

Sentence Simplification
- Extension of the simplification method used for DUC 2006
- Provides sentence alternatives rather than deterministically simplifying a sentence
- Uses syntax-based heuristic rules
- Simplified sentences are evaluated alongside the originals
In DUC 2007:
- Average new candidates generated: 1.38 per sentence
- Simplified sentences were generated for 61% of all sentences
- Simplified sentences in the final output: 60%

Sentence-Level Features
- SumFocus features: SumBasic (Nenkova et al. 2006) + task focus
  - Cluster frequency and topic frequency; only these were used in the MSR DUC 2006 system
- Other content-word unigrams: headline frequency
- Sentence length features (binary)
- Sentence position features (real-valued and binary)
- N-grams (bigrams, skip bigrams, multiword phrases)
- All tokens (topic and cluster frequency)
- Simplified sentences (binary indicator and ratio of relative length)
- Inverse document frequency (idf)
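Two of these frequency features can be illustrated concretely. The sketch below is a minimal, hypothetical rendering of a cluster-frequency feature and a topic-frequency feature; the tokenization, the normalization, and the function names are assumptions, not taken from the PYTHY implementation.

```python
from collections import Counter

def cluster_frequency_feature(sentence, cluster_counts, cluster_total):
    """Average probability of the sentence's words under the document cluster."""
    words = sentence.lower().split()
    return sum(cluster_counts[w] / cluster_total for w in words) / max(len(words), 1)

def topic_frequency_feature(sentence, topic_words):
    """Fraction of the sentence's words that also appear in the topic/query."""
    words = sentence.lower().split()
    return sum(w in topic_words for w in words) / max(len(words), 1)

# Hypothetical usage on a tiny two-document cluster and a topic statement.
docs = ["the summit discussed climate policy", "climate policy talks stalled"]
cluster_counts = Counter(w for d in docs for w in d.lower().split())
cluster_total = sum(cluster_counts.values())
topic_words = set("describe the climate policy negotiations".split())
sent = "climate policy talks continued at the summit"
print(cluster_frequency_feature(sent, cluster_counts, cluster_total))
print(topic_frequency_feature(sent, topic_words))
```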

Pairwise Ranking
- Define preferences for sentence pairs, using human summaries and SCU weights
- Log-linear ranking objective used in training [Ofer et al. 03], [Burges et al. 05]
- Maximize the probability of choosing the better sentence from each pair of comparable sentences
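The objective itself is not spelled out on the slide; the following is a minimal sketch of a log-linear pairwise ranking loss in the spirit of the cited work, with a toy gradient-descent update. The feature vectors, learning rate, and function names are illustrative assumptions.

```python
import numpy as np

def pairwise_logistic_loss(w, x_better, x_worse):
    """Negative log-probability of preferring the better sentence:
    P(better > worse) = sigmoid(w . (x_better - x_worse))."""
    margin = w @ (x_better - x_worse)
    return np.logaddexp(0.0, -margin)   # log(1 + exp(-margin)), computed stably

def gradient(w, x_better, x_worse):
    """Gradient of the loss with respect to the weight vector w."""
    diff = x_better - x_worse
    p_wrong = 1.0 / (1.0 + np.exp(w @ diff))   # probability of the wrong ordering
    return -p_wrong * diff

# Toy usage: three features per sentence, one preference pair.
w = np.zeros(3)
x_better = np.array([0.8, 0.1, 1.0])   # hypothetical feature vectors
x_worse = np.array([0.2, 0.4, 0.0])
for _ in range(100):                    # simple gradient-descent updates
    w -= 0.1 * gradient(w, x_better, x_worse)
print(pairwise_logistic_loss(w, x_better, x_worse))
```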

ROUGE Oracle Metric
- Find an oracle extractive summary: the summary with the highest average of ROUGE-2 and ROUGE-SU4 scores
- All sentences in the oracle are considered "better" than any sentence not in the oracle
- An approximate greedy search is used to find the oracle summary
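A greedy approximation of the oracle search could look like the sketch below. The `rouge_avg` scorer (average of ROUGE-2 and ROUGE-SU4 against the model summaries) and the word budget are assumed to be supplied; neither is part of the transcript.

```python
def greedy_oracle(sentences, rouge_avg, max_words=250):
    """Approximate greedy search for an extractive 'oracle' summary:
    repeatedly add the sentence that most improves the average of
    ROUGE-2 and ROUGE-SU4 against the model summaries.

    rouge_avg(list_of_sentences) -> float is assumed to be provided."""
    oracle, remaining = [], list(sentences)
    best_score = float("-inf")
    while remaining and sum(len(s.split()) for s in oracle) < max_words:
        score, best = max(((rouge_avg(oracle + [s]), s) for s in remaining),
                          key=lambda t: t[0])
        if score <= best_score:      # no remaining sentence improves the summary
            break
        oracle.append(best)
        remaining.remove(best)
        best_score = score
    return oracle   # these sentences outrank all non-oracle sentences in training pairs
```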

Pyramid-Derived Metric
- University of Ottawa SCU-annotated corpus (Copeck et al. 2006)
- Some sentences in the 2005 and 2006 document collections are known to contain certain SCUs, or known not to contain any SCUs
- A sentence's score is the sum of the weights of all its SCUs; for un-annotated sentences the score is undefined
- A sentence pair is constructed for training: s1 > s2 iff w(s1) > w(s2)
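Constructing preference pairs from the SCU annotations might look like this sketch. The data layout and function names are hypothetical, but the rule follows the slide: a pair is formed only when both sentences are annotated and their SCU-weight sums differ.

```python
from itertools import combinations

def scu_score(scu_weights):
    """Sum of the weights of the SCUs a sentence contains;
    None marks an un-annotated sentence (score undefined)."""
    return None if scu_weights is None else sum(scu_weights)

def scu_training_pairs(annotated):
    """annotated: list of (sentence, scu_weights-or-None) tuples.
    Emit (better, worse) pairs with s1 > s2 iff w(s1) > w(s2)."""
    pairs = []
    for (s1, w1), (s2, w2) in combinations(annotated, 2):
        v1, v2 = scu_score(w1), scu_score(w2)
        if v1 is None or v2 is None or v1 == v2:
            continue            # undefined or tied scores give no preference
        pairs.append((s1, s2) if v1 > v2 else (s2, s1))
    return pairs

# Hypothetical example: the lists are SCU weights from the annotated corpus.
data = [("Sentence A", [3, 2]), ("Sentence B", [1]), ("Sentence C", None)]
print(scu_training_pairs(data))   # [('Sentence A', 'Sentence B')]
```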

Model Frequency Metrics
- Based on unigram and skip-bigram frequency
- Computed for content words only
- Sentence s_i is "better" than s_j if its words occur more frequently in the model summaries
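The exact comparison formula did not survive the transcript, so the sketch below shows one plausible instantiation: the average model-summary frequency of a sentence's content words (the skip-bigram variant is analogous and omitted). The stopword list, tokenization, and function names are placeholders.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was"}  # placeholder list

def content_words(sentence):
    return [t for t in sentence.lower().split() if t not in STOPWORDS]

def model_frequencies(model_summaries):
    """Unigram counts of content words over the human (model) summaries."""
    counts = Counter()
    for summary in model_summaries:
        counts.update(content_words(summary))
    return counts

def freq_score(sentence, counts):
    """Average model-summary frequency of the sentence's content words."""
    words = content_words(sentence)
    return sum(counts[w] for w in words) / len(words) if words else 0.0

def is_better(s_i, s_j, counts):
    """One plausible reading of the slide: s_i is preferred when its
    content words are more frequent in the model summaries."""
    return freq_score(s_i, counts) > freq_score(s_j, counts)
```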

Combining Multiple Metrics
- From the ROUGE oracle: all sentences in the oracle summary are better than all other sentences
- From the SCU annotations: sentences with higher average SCU weights are better
- From model frequency: sentences with words occurring in the model summaries are better
- Combined loss: add the losses from all metrics
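Combining the metrics amounts to pooling the preference pairs each one generates and summing their pairwise losses, as in this small sketch; the `loss_fn` argument would be a pairwise loss such as the log-linear one sketched earlier, and the names are illustrative.

```python
def combined_loss(w, pair_sets, loss_fn):
    """Sum the pairwise losses contributed by each target metric.

    pair_sets: dict mapping a metric name ("oracle", "scu", "model_freq")
               to a list of (x_better, x_worse) feature-vector pairs.
    loss_fn:   a pairwise loss taking (w, x_better, x_worse)."""
    total = 0.0
    for pairs in pair_sets.values():
        for x_better, x_worse in pairs:
            total += loss_fn(w, x_better, x_worse)
    return total
```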

PYTHY Testing (diagram, revisited): Docs -> Sentences and Simplified Sentences -> Feature inventory -> Model -> Search with Dynamic Scoring -> Summary

Dynamic Sentence Scoring
- Eliminate redundancy by re-weighting
- Similar to SumBasic (Nenkova et al. 2006): features are re-weighted given previously selected sentences
- Discounts apply to features that decompose into word frequency estimates
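A SumBasic-style version of this re-weighting is sketched below: once a sentence is selected, the probabilities of its words are squared, so frequency-decomposable scores of redundant sentences fall. The squaring rule is the published SumBasic heuristic; how PYTHY parameterizes its discounting is not stated on the slide.

```python
from collections import Counter

def sumbasic_like_selection(sentences, summary_words=100):
    """Greedy selection with SumBasic-style re-weighting (Nenkova et al. 2006):
    word probabilities are squared once the word appears in the summary,
    which discounts frequency-based scores of redundant sentences."""
    tokens = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokens for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    chosen, length = [], 0
    remaining = list(range(len(sentences)))
    while remaining and length < summary_words:
        # Score each sentence by the average current probability of its words.
        best = max(remaining,
                   key=lambda i: sum(prob[w] for w in tokens[i]) / max(len(tokens[i]), 1))
        chosen.append(sentences[best])
        length += len(tokens[best])
        remaining.remove(best)
        for w in set(tokens[best]):      # discount words already covered
            prob[w] = prob[w] ** 2
    return chosen
```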

Search
- The search constructs partial summaries and scores them
- The score of a summary does not decompose into an independent sum of sentence scores
- Global dependencies make exact search hard
- Multiple beams are used, one for each length of partial summary [McDonald 2007]
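A minimal sketch of the multiple-beam idea, keeping one beam per partial-summary length, is shown below. The `score_summary` function stands in for PYTHY's non-decomposable summary score, and the beam width and length limit are arbitrary assumptions.

```python
import itertools

def beam_search(sentences, score_summary, max_sents=4, beam_width=5):
    """Multiple-beam search: one beam per summary length (number of sentences),
    since the summary score does not decompose over individual sentences.

    score_summary(tuple_of_sentence_indices) -> float is assumed to be given."""
    beams = {0: [tuple()]}                     # length -> list of partial summaries
    for length in range(max_sents):
        candidates = set()
        for partial in beams[length]:
            for i in range(len(sentences)):
                if i not in partial:
                    candidates.add(tuple(sorted(partial + (i,))))
        ranked = sorted(candidates, key=score_summary, reverse=True)
        beams[length + 1] = ranked[:beam_width]
    # Return the best non-empty partial summary over all lengths.
    all_partials = itertools.chain.from_iterable(beams.values())
    best = max((p for p in all_partials if p), key=score_summary)
    return [sentences[i] for i in best]
```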

Impact of Sentence Simplification
- Columns: No Simplified (R-2, R-SU4) vs. Simplified (R-2, R-SU4); rows: SumFocus, PYTHY
- Trained on 2005 data, tested on 2006 data

Evaluating the Metrics
- Columns: Criterion, Num Pairs, Train Acc, Content Only (R-2, R-SU4), All Words (R-2, R-SU4)
- Rows: Oracle (941K pairs), SCUs (430K pairs), Model Freq. (6.3M pairs), All (7.7M pairs)
- Trained on 2005 data, tested on 2006 data; includes simplified sentences

Update Summarization Pilot
- SVM novelty classifier trained on the TREC 2002 and 2003 novelty tracks
- Systems compared (ROUGE-2, ROUGE-SU4): PYTHY + Novelty (1), PYTHY + Novelty (.5), PYTHY + Novelty (.1), PYTHY, SumFocus

Summary and Future Work
Summary:
- Combination of different target metrics for training
- Many sentence features
- Pairwise ranking function
- Dynamic scoring
Future work:
- Boost robustness: the system is sensitive to cluster properties (e.g., size)
- Improve the grammatical quality of simplified sentences
- Reconcile novelty and (ir)relevance
- Learn features over whole summaries rather than individual sentences

Thank You