1
A Novel Relational Learning-to-Rank Approach for Topic-focused Multi-Document Summarization Yadong Zhu, Yanyan Lan, Jiafeng Guo, Pan Du, Xueqi Cheng Institute of Computing Technology, CAS
2
Outline Motivation Our Approach Experiments Conclusion
3
Outline Motivation Our Approach Experiments Conclusion
4
Motivation
Summarization: extractive summarization approaches
–Experience-based
–Clustering-based
–Classification-based
–Ranking-based
TMDS: extract the most important aspects of the given topic
5
Motivation
TMDS must account for relevance, salience, and diversity
–A multi-criteria ranking problem
6
Motivation
Previous ranking approaches
–Metric-based: rely on heuristically predefined metrics
–Graph-based: pairwise relationships are only implicitly captured; high computational complexity
7
Learning-to-rank techniques
Advantages
–The capability of combining a large number of features
–Automatic learning based on the training data
–Widely used in commercial search engines
Disadvantages
–Ignore relations among candidate objects
For TMDS (relevance, salience, diversity), content information alone is not enough: LTR + relationship?
8
Outline Motivation Our Approach Experiments Conclusion
9
Our Approach
Relational Learning-to-Rank approach (R-LTR)
–Considers both the content of individual objects and the relations among objects
Formalization
–Four key components: input space, output space, ranking function f, loss function L
–Difference from conventional LTR: relations among candidate objects are taken into account
10
Challenges for R-LTR
–How to define the ranking function
–How to define the loss function
11
Definition of Ranking function
–Inspired by how human beings extract a summary: a sequential ranking process
12
Definition of Ranking function
–The relational ranking function combines a content-based score with a relation-based score
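A sketch of what this combination could look like, with notation assumed here rather than taken from the slides: the score of candidate sentence i adds a weighted content term and a weighted relation term toward the already selected set S,

f(x_i, R_i) \;=\; \omega_c^{\top} x_i \;+\; \omega_r^{\top} h(R_i, S)

where x_i are the content features of sentence i, R_i its relation features, h(·) aggregates them against S, and \omega_c, \omega_r are the weight vectors to be learned.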
13
Definition of Loss Function
–Model the generation of the summary in a sequential way, following the sequential ranking process
–Loss function: the likelihood loss of the generation probability
14
Definition of Loss Function
–The generation probability is defined with the Plackett-Luce model
–Learning maximizes the sum of the likelihood over the training summaries
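A sketch of the Plackett-Luce generation probability for a ground-truth selection order s = (s(1), …, s(m)) over the candidate sentences, under the ranking function sketched above (notation assumed):

P(s \mid X, R) \;=\; \prod_{i=1}^{m} \frac{\exp\!\big(f(x_{s(i)}, R_{s(i)})\big)}{\sum_{j \in C_i} \exp\!\big(f(x_j, R_j)\big)},
\qquad
L \;=\; -\log P(s \mid X, R)

where C_i is the set of sentences not yet selected before step i; learning then maximizes the log-likelihood summed over the training summaries.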
15
Learning
–An unconstrained optimization problem, solved with stochastic gradient descent
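A minimal sketch of stochastic gradient descent on the Plackett-Luce negative log-likelihood above. All names (feature matrices, relation features, ground-truth orders) are hypothetical placeholders, not the authors' code, and the relation features are kept fixed per sentence to keep the sketch short.

import numpy as np

def score(w_c, w_r, x, h):
    """Content score plus relation score for a single candidate sentence."""
    return w_c @ x + w_r @ h

def nll_gradient(w_c, w_r, X, H, order):
    """Gradient of the Plackett-Luce negative log-likelihood for one
    training summary.  X: content features (n, d_c); H: relation features
    (n, d_r); order: indices of the sentences in the ground-truth
    selection order.  In the full sequential model the relation features
    would be recomputed against the growing selected set."""
    g_c, g_r = np.zeros_like(w_c), np.zeros_like(w_r)
    remaining = list(range(len(X)))
    for sel in order:
        scores = np.array([score(w_c, w_r, X[j], H[j]) for j in remaining])
        p = np.exp(scores - scores.max())
        p /= p.sum()
        # Expected feature vector under the model minus the observed one.
        g_c += (p[:, None] * X[remaining]).sum(axis=0) - X[sel]
        g_r += (p[:, None] * H[remaining]).sum(axis=0) - H[sel]
        remaining.remove(sel)
    return g_c, g_r

def train_sgd(data, d_c, d_r, lr=0.01, epochs=50, seed=0):
    """data: list of (X, H, order) triples, one per training summary."""
    rng = np.random.default_rng(seed)
    w_c, w_r = np.zeros(d_c), np.zeros(d_r)
    for _ in range(epochs):
        for k in rng.permutation(len(data)):
            X, H, order = data[k]
            g_c, g_r = nll_gradient(w_c, w_r, X, H, order)
            w_c -= lr * g_c
            w_r -= lr * g_r
    return w_c, w_r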
16
Prediction
–Sequential prediction process: sentences are selected one by one using the learned ranking function, given those already selected
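A sketch of the sequential prediction step under the same assumed notation; w_c, w_r and X are NumPy arrays from training, and relation_feats is a hypothetical helper, not part of the original work.

def predict_summary(w_c, w_r, X, relation_feats, budget):
    """Sequential prediction: at each step pick the unselected sentence
    with the highest content + relation score given the sentences already
    selected.  relation_feats(i, selected) -> relation feature vector of
    sentence i toward the selected set (hypothetical helper)."""
    selected, candidates = [], set(range(len(X)))
    while candidates and len(selected) < budget:
        best = max(candidates,
                   key=lambda i: w_c @ X[i] + w_r @ relation_feats(i, selected))
        selected.append(best)
        candidates.remove(best)
    return selected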
17
Outline Motivation Our Approach Experiments Conclusion
18
Experiments
Dataset: set A of TAC2008 and TAC2009
Data processing
–Indri toolkit (version 5.2)
–Porter stemmer and stopword removal (sketched below)
Training
–Training data: the manual summarization results on TAC2009
–4-fold cross-validation on TAC2009
–Train the model on TAC2009, test on TAC2008
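Indri handles indexing and retrieval; for the stemming and stopword step, a minimal sketch assuming NLTK's PorterStemmer and stopword list as a stand-in for whatever tooling was actually used (resource names may differ slightly across NLTK versions).

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)      # tokenizer model
nltk.download("stopwords", quiet=True)  # English stopword list

STOP = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(sentence):
    """Tokenize, lowercase, drop stopwords and punctuation, Porter-stem."""
    tokens = nltk.word_tokenize(sentence.lower())
    return [STEMMER.stem(t) for t in tokens if t.isalpha() and t not in STOP]

print(preprocess("The Glendale train crash occurred in January 2005."))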
19
Feature Vectors
Content-based features
–Weighting features: VSM, BM25, LM, …
–Term dependency features: MRF
–Length
–Pos
–…
Relation-based features (two of them sketched below)
–Cosine diversity
–Jaccard diversity
–Subtopic diversity
–Document-level co-occurrence
–…
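As an illustration of two of the relation-based features, a sketch of cosine and Jaccard diversity between a candidate sentence and the already selected text, using plain term-frequency vectors; these are common definitions, not necessarily the exact ones used in the paper.

import math
from collections import Counter

def cosine_diversity(candidate_tokens, selected_tokens):
    """1 - cosine similarity between term-frequency vectors."""
    a, b = Counter(candidate_tokens), Counter(selected_tokens)
    dot = sum(cnt * b[t] for t, cnt in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - (dot / norm if norm else 0.0)

def jaccard_diversity(candidate_tokens, selected_tokens):
    """1 - Jaccard overlap of the two token sets."""
    a, b = set(candidate_tokens), set(selected_tokens)
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 0.0)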
20
Baseline Methods
Experience-based
–Leading sentence selection (LEAD)
Metric-based
–Maximal Marginal Relevance (MMR) (SIGIR'98), criterion recalled below
Graph-based
–Personalized PageRank (PPR) (WWW'02)
–Manifold Ranking (MR) (NIPS'04)
–DivRank (DR) (KDD'10)
–GrassHopper (GH) (HLT-NAACL'07)
–Supervised Lazy Random Walk (SuperLazy) (ICDM'11)
Classification-based
–Support Vector Machine (SVM)
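For reference, the MMR criterion from the SIGIR'98 paper is usually stated as follows; the exact similarity functions and the setting of λ in these experiments are not given here.

\mathrm{MMR} \;=\; \arg\max_{s_i \in C \setminus S} \Big[ \lambda\, \mathrm{Sim}_1(s_i, q) \;-\; (1-\lambda) \max_{s_j \in S} \mathrm{Sim}_2(s_i, s_j) \Big]

where q is the topic query, C the candidate sentences, and S those already selected.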
21
Evaluation
Quantitative evaluation
–Evaluation metrics: ROUGE-1, ROUGE-2, ROUGE-SU4
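For orientation, a simplified sketch of ROUGE-N as clipped n-gram recall against the reference summaries; the official ROUGE toolkit additionally handles stemming, jackknifing, and the skip-bigram ROUGE-SU4, so this is not the evaluation script itself.

from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n=1):
    """Simplified ROUGE-N: clipped n-gram recall of a candidate summary
    against one or more reference summaries (all given as token lists)."""
    cand = ngrams(candidate, n)
    overlap = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        overlap += sum(min(c, ref_counts[g]) for g, c in cand.items()
                       if g in ref_counts)
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0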
22
Quantitative Evaluation
TAC2009
–Our approaches outperform all the baseline methods on all the ROUGE evaluation metrics, and all of these improvements are statistically significant
23
Quantitative Evaluation
TAC2008
–The results are consistent with those obtained on TAC2009
–Good generalization ability
24
Qualitative Evaluation
Randomly chosen topic from the TAC2009 dataset
–"glendale–describe the glendale train crash, the cause of the crash, and the arrest and trial of the man accused of causing it."
Reference summaries
–Four manual summaries provided by NIST
Generated summaries
–From our approach and some representative baseline methods
25
Reference Summaries Four manual summaries provided by NIST
26
Generated summaries Summaries generated by some representative methods
27
Statistics of Fact Coverage The fact coverage statistics of the different methods
28
Outline Motivation Our Approach Experiments Conclusion
29
Contributions
–Propose a novel relational learning-to-rank framework for the TMDS task
–R-LTR can be easily extended to other fields, such as diverse ranking in Web search
–Conducted experimental validation from both quantitative and qualitative aspects
Future work
–Test our approach on other types of datasets, such as the short texts prevalent in social media
30
Thanks! zhuyadong@software.ict.ac.cn