Published by Wilfred Miller. Modified over 8 years ago.
1
The PYTHY Summarization System: Microsoft Research at DUC 2007
Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi, Hisami Suzuki, and Lucy Vanderwende
Microsoft Research
April 26, 2007
2
DUC Main Task Results
Did pretty well on both measures.

Automatic Evaluations (30 participants)
Criterion    Rank   Score
ROUGE-2      2      0.12028
ROUGE-SU4    3      0.17074

Human Evaluations
Criterion    Rank
Pyramid      1=
Content      5=
3
Overview of PYTHY
- Linear sentence ranking model
- Learns to rank sentences based on:
  - ROUGE scores against model summaries
  - Semantic Content Unit (SCU) weights of sentences selected by past peers
- Considers simplified sentences alongside original sentences
4
PYTHY Training
[Diagram. Components: Docs, Sentences, Simplified Sentences, Feature inventory, Targets (ROUGE Oracle, Pyramid/SCU, ROUGE X 2), Ranking/Training, Model]
5
PYTHY Testing
[Diagram. Components: Docs, Sentences, Simplified Sentences, Feature inventory, Model, Search, Dynamic Scoring, Summary]
6
Sentence Simplification
- Extension of the simplification method used for DUC06
- Provides sentence alternatives, rather than deterministically simplifying a sentence
- Uses syntax-based heuristic rules
- Simplified sentences are evaluated alongside the originals
- In DUC 2007:
  - Average new candidates generated: 1.38 per sentence
  - Simplified sentences generated: 61% of all sentences
  - Simplified sentences in final output: 60%
7
Sentence-Level Features
- SumFocus features: SumBasic (Nenkova et al. 2006) + task focus
  - cluster frequency and topic frequency
  - only these were used in MSR DUC06
- Other content word unigrams: headline frequency
- Sentence length features (binary features)
- Sentence position features (real-valued and binary)
- N-grams (bigrams, skip bigrams, multiword phrases)
- All tokens (topic and cluster frequency)
- Simplified sentences (binary and ratio of relative length)
- Inverse document frequency (idf)
8
Pairwise Ranking
- Define preferences for sentence pairs
  - Defined using human summaries and SCU weights
- Log-linear ranking objective used in training
  - Maximize the probability of choosing the better sentence from each pair of comparable sentences
- [Ofer et al. 03], [Burges et al. 05]
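The pairwise objective can be illustrated with a small sketch. It assumes the common logistic form, in which the probability of preferring one sentence over another is the sigmoid of their score difference; the slides do not give the exact parameterization, so the function names, the plain gradient-descent trainer, and the toy feature vectors below are all illustrative assumptions.

```python
import math

def score(weights, features):
    """Linear sentence score: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))

def pair_log_loss(weights, better, worse):
    """Negative log-probability of choosing `better` over `worse`.

    Assumes P(better > worse) = sigmoid(score(better) - score(worse)).
    """
    margin = score(weights, better) - score(weights, worse)
    # log(1 + exp(-margin)), arranged to avoid overflow for large |margin|
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

def train(pairs, dim, lr=0.1, epochs=200):
    """Plain gradient descent on the mean pairwise loss (illustrative only)."""
    weights = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for better, worse in pairs:
            margin = score(weights, better) - score(weights, worse)
            # Probability the model currently assigns to the wrong ordering
            p_wrong = 1.0 / (1.0 + math.exp(margin))
            for k in range(dim):
                grad[k] -= p_wrong * (better[k] - worse[k])
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

# Toy (better, worse) feature-vector pairs; the values are made up.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([2.0, 1.0], [1.0, 2.0])]
weights = train(pairs, dim=2)
```

After training, each preferred sentence in the toy data scores higher than its partner, and its pairwise loss falls below log 2, the loss of an uninformed model.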
9
ROUGE Oracle Metric
- Find an oracle extractive summary
  - the summary with the highest average ROUGE-2 and ROUGE-SU4 scores
- All sentences in the oracle are considered “better” than any sentence not in the oracle
- Approximate greedy search used for finding the oracle summary
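A greedy oracle search of this kind might look like the sketch below. To stay short it scores candidates with clipped bigram recall only, a simplified stand-in for the average of ROUGE-2 and ROUGE-SU4 named on the slide; the function names, the word budget, and the toy data are assumptions.

```python
from collections import Counter

def bigrams(tokens):
    """Multiset of adjacent token pairs in one sentence."""
    return Counter(zip(tokens, tokens[1:]))

def bigram_recall(candidate_sents, model_summaries):
    """Average clipped bigram recall of the candidate against each model."""
    cand = Counter()
    for sent in candidate_sents:
        cand += bigrams(sent)
    total = 0.0
    for model in model_summaries:
        ref = Counter()
        for sent in model:
            ref += bigrams(sent)
        overlap = sum((cand & ref).values())  # clipped counts
        total += overlap / max(sum(ref.values()), 1)
    return total / len(model_summaries)

def greedy_oracle(sentences, model_summaries, max_words):
    """Repeatedly add the sentence that most improves the score, within budget."""
    chosen, used, best = [], 0, 0.0
    while True:
        best_idx = None
        for i, sent in enumerate(sentences):
            if i in chosen or used + len(sent) > max_words:
                continue
            trial = [sentences[j] for j in chosen] + [sent]
            s = bigram_recall(trial, model_summaries)
            if s > best:
                best, best_idx = s, i
        if best_idx is None:  # no remaining sentence improves the score
            return chosen, best
        chosen.append(best_idx)
        used += len(sentences[best_idx])

# Toy data: tokenized source sentences and one tokenized model summary.
sentences = [["the", "cat", "sat"],
             ["dogs", "bark", "loudly"],
             ["the", "cat", "sat", "down"]]
models = [[["the", "cat", "sat", "down"]]]
oracle, oracle_score = greedy_oracle(sentences, models, max_words=4)
```

Sentences whose indices end up in `oracle` would then be treated as “better” than all other sentences when building training pairs.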
10
Pyramid-Derived Metric
- University of Ottawa SCU-annotated corpus (Copeck et al. 06)
- Some sentences in the 05 & 06 document collections are:
  - known to contain certain SCUs
  - known not to contain any SCUs
- Sentence score is the sum of the weights of all SCUs
  - for un-annotated sentences, the score is undefined
- A sentence pair is constructed for training: s_1 > s_2 iff w(s_1) > w(s_2)
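The pair construction rule can be sketched as follows. The data layout is an assumption made for illustration: a list of SCU ids for an annotated sentence, an empty list for a sentence known to contain no SCUs, and None for an un-annotated sentence.

```python
from itertools import combinations

def scu_score(scu_ids, scu_weights):
    """Sum of SCU weights; None when the sentence is un-annotated."""
    if scu_ids is None:  # score undefined, sentence is skipped
        return None
    return sum(scu_weights[s] for s in scu_ids)  # empty list gives 0

def preference_pairs(annotations, scu_weights):
    """Return (better, worse) sentence-id pairs with w(better) > w(worse)."""
    scored = {}
    for sid, ids in annotations.items():
        w = scu_score(ids, scu_weights)
        if w is not None:
            scored[sid] = w
    pairs = []
    for a, b in combinations(sorted(scored), 2):
        if scored[a] > scored[b]:
            pairs.append((a, b))
        elif scored[b] > scored[a]:
            pairs.append((b, a))
        # equal scores yield no preference
    return pairs

# Hypothetical annotations and weights.
scu_weights = {"scu_a": 3, "scu_b": 1}
annotations = {
    "s1": ["scu_a", "scu_b"],  # w = 4
    "s2": [],                  # known to contain no SCUs, w = 0
    "s3": None,                # un-annotated, excluded
    "s4": ["scu_b"],           # w = 1
}
pairs = preference_pairs(annotations, scu_weights)
```

Here `s3` never appears in a pair because its score is undefined, while `s2` does, since a sentence known to contain no SCUs has a well-defined score of zero.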
11
Model Frequency Metrics
- Based on unigram and skip bigram frequency
- Computed for content words only
- Sentence s_i is “better” than s_j if [equation not preserved in this transcript]
12
Combining multiple metrics
- From ROUGE oracle: all sentences in the oracle summary are better than other sentences
- From SCU annotations: sentences with higher average SCU weights are better
- From model frequency: sentences with words occurring in the models are better
- Combined loss: adding the losses according to all metrics
13
PYTHY Testing
[Diagram. Components: Docs, Sentences, Simplified Sentences, Feature inventory, Model, Search, Dynamic Scoring, Summary]
14
Dynamic Sentence Scoring
- Eliminate redundancy by re-weighting
  - Similar to SumBasic (Nenkova et al. 2006), re-weighting given previously selected sentences
  - Discounts for features that decompose into word frequency estimates
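For intuition, here is a sketch of the SumBasic-style update that the slide compares against: once a sentence is selected, the probability of each word it contains is squared, so later sentences that repeat those words score lower. PYTHY's own discounts apply to learned features and are not specified on the slide; this sketch shows only the SumBasic rule, and the toy sentences are made up.

```python
from collections import Counter

def word_probs(sentences):
    """Unigram probabilities over all tokens in the input."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sentence_score(sentence, probs):
    """Average word probability, as in SumBasic."""
    return sum(probs[w] for w in sentence) / len(sentence)

def select(sentences, n):
    """Pick n sentences, re-weighting word probabilities after each pick."""
    probs = word_probs(sentences)
    remaining = list(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < n:
        best = max(remaining, key=lambda i: sentence_score(sentences[i], probs))
        chosen.append(best)
        remaining.remove(best)
        # Discount words already covered: p(w) <- p(w)^2
        for w in set(sentences[best]):
            probs[w] = probs[w] ** 2
    return chosen

sentences = [["storm", "hits", "coast"],
             ["storm", "hits", "coast", "again"],
             ["rescue", "teams", "arrive"]]
picked = select(sentences, 2)
```

With static scores the second pick would be the near-duplicate sentence 1; with the discount, the selection moves to sentence 2, which covers new words.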
15
Search
- The search constructs partial summaries and scores them
  - The score of a summary does not decompose into an independent sum of sentence scores
  - Global dependencies make exact search hard
- Used multiple beams for each length of partial summaries [McDonald 2007]
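One way to picture a search with a separate beam per summary length is the sketch below. The scoring function is a hypothetical stand-in that counts each distinct word once, which makes the summary score non-decomposable; the beam size, word weights, and names are assumptions rather than details taken from the talk.

```python
def summary_score(indices, sentences, word_weight):
    """Hypothetical non-decomposable score: each distinct word counts once."""
    covered = set()
    for i in indices:
        covered.update(sentences[i])
    return sum(word_weight.get(w, 0.0) for w in covered)

def beam_search(sentences, word_weight, max_words, beam_size=3):
    """Keep a separate beam for every partial-summary length (in words)."""
    beams = {0: [(0.0, ())]}  # beams[n]: best summaries using exactly n words
    for length in range(max_words + 1):
        for _, indices in beams.get(length, []):
            for i, sent in enumerate(sentences):
                new_len = length + len(sent)
                if not sent or i in indices or new_len > max_words:
                    continue
                new = tuple(sorted(indices + (i,)))
                cand = (summary_score(new, sentences, word_weight), new)
                bucket = beams.setdefault(new_len, [])
                if cand not in bucket:
                    bucket.append(cand)
                    bucket.sort(reverse=True)
                    del bucket[beam_size:]  # prune to the beam size
    # Best hypothesis over all lengths: (score, sentence indices)
    return max(h for hyps in beams.values() for h in hyps)

sentences = [["storm", "hits", "coast"],
             ["storm", "hits", "coast", "again"],
             ["rescue", "teams", "arrive"]]
word_weight = {"storm": 2.0, "hits": 1.0, "coast": 1.0, "again": 0.5,
               "rescue": 1.5, "teams": 1.0, "arrive": 1.0}
best_score, best_indices = beam_search(sentences, word_weight, max_words=7)
```

Bucketing hypotheses by length means a short partial summary competes only against others of the same size, so it is not pruned merely because longer hypotheses have had more chances to accumulate score.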
16
Impact of Sentence Simplification
              No Simplified       Simplified
              R-2      R-SU4      R-2      R-SU4
SumFocus      0.078    0.132      0.078    0.134
PYTHY         0.089    0.140      0.096    0.147
Trained on 05 data, tested on 06 data
19
Evaluating the Metrics
                                       Content Only        All Words
Criterion      Num Pairs   Train Acc   R-2      R-SU4      R-2      R-SU4
Oracle         941K        93.1        0.076    0.107      0.093    0.143
SCUs           430K        62.0        0.078    0.108      0.086    0.134
Model Freq.    6.3M        96.9        0.076    0.106      0.096    0.147
All            7.7M        94.2        0.076    0.107      0.096    0.147
Trained on 05 data, tested on 06 data. Includes simplified sentences.
21
Update Summarization Pilot
SVM novelty classifier trained on TREC 02 & 03 novelty track
                        ROUGE-2    ROUGE-SU4
PYTHY + Novelty (1)     0.07135    0.11164
PYTHY + Novelty (.5)    0.07879    0.12929
PYTHY + Novelty (.1)    0.08721    0.12958
PYTHY                   0.08686    0.12876
SumFocus                0.07002    0.11033
22
Summary and Future Work
Summary
- Combination of different target metrics for training
- Many sentence features
- Pair-wise ranking function
- Dynamic scoring
Future work
- Boost robustness
  - Sensitive to cluster properties (e.g., size)
- Improve grammatical quality of simplified sentences
- Reconcile novelty and (ir)relevance
- Learn features over whole summaries rather than individual sentences
23
Thank You