
1 A Novel Relational Learning-to-Rank Approach for Topic-focused Multi-Document Summarization Yadong Zhu, Yanyan Lan, Jiafeng Guo, Pan Du, Xueqi Cheng Institute of Computing Technology, CAS

2 Outline Motivation Our Approach Experiments Conclusion

3 Outline Motivation Our Approach Experiments Conclusion

4 Motivation Summarization: extractive summarization methods include experience-based, clustering-based, classification-based, and ranking-based approaches. TMDS (topic-focused multi-document summarization): capture the most important aspects of the given topic.

5 Motivation TMDS must balance relevance, salience, and diversity: a multi-criteria ranking problem.

6 Motivation Previous ranking approaches –Metric-based: rely on heuristically predefined metrics –Graph-based: pairwise relationships are captured only implicitly, with high computational complexity

7 Learning-to-rank techniques Advantages –The capability of combining a large number of features –Automatic learning based on the training data –Widely used in commercial search engines Disadvantages –Ignore relations among candidate objects. For TMDS, which involves relevance, salience, and diversity, can learning-to-rank over content information be combined with relationships among sentences?

8 Outline Motivation Our Approach Experiments Conclusion

9 Our Approach Relational learning-to-rank approach (R-LTR) –Considers both the content of individual objects and the relations among objects. Formalization –Four key components: input space, output space, ranking function f, loss function L –The key difference from conventional learning-to-rank: relations among objects are modeled explicitly (see the notational sketch below).
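A notational sketch of the formalization, with symbols chosen here for illustration since the slide's own notation is not in the transcript: for each topic the input couples per-sentence content features with pairwise relation features, the output is a selection order over the sentences, and learning minimizes the loss of the ranking function over the training topics.

    \text{Input: } X = \{x_1, \dots, x_n\},\ x_i \in \mathbb{R}^{d_c}; \quad R \in \mathbb{R}^{n \times n \times d_r} \ \text{(pairwise relation features)}
    \text{Output: a ranking } \pi \text{ over the } n \text{ sentences}; \quad \min_f \sum_{\text{topics}} L\big(f; X, R, \pi^{*}\big)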

10 Challenges for R-LTR –How to define the ranking function –How to define the loss function

11 Definition of Ranking Function Mimic how human beings extract a summary: a sequential ranking process.

12 Definition of Ranking Function The relational ranking function combines a content-based score with a relation-based score.
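A plausible form of the relational ranking function, with symbols chosen here for illustration (the exact parameterization on the slide is not reproduced in the transcript): x_i are the content features of sentence i, S the set of already-selected sentences, h_R a relation summary between x_i and S, and omega_c, omega_r learned weight vectors.

    f(x_i, S, R) = \omega_c^{\top} x_i + \omega_r^{\top} h_R(x_i, S)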

13 Definition of Loss Function Model the generation of the summary in a sequential way; the loss function is defined as the likelihood loss of the generation probability.

14 Definition of Loss Function The generation probability is defined with the Plackett-Luce model; training maximizes the sum of the likelihood functions over the training data.
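A hedged sketch of the sequential likelihood, assuming the standard Plackett-Luce form with step-dependent scores: at step j the score of the selected sentence is normalized over the candidates C_j not yet selected, with S_{j-1} the set of sentences chosen before step j, and the loss is the negative log of this probability.

    P(\pi \mid X, R) = \prod_{j=1}^{m} \frac{\exp\!\big(f(x_{\pi(j)}, S_{j-1}, R)\big)}{\sum_{k \in C_j} \exp\!\big(f(x_k, S_{j-1}, R)\big)}, \qquad L = -\log P(\pi \mid X, R)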

15 Learning Unconstrained optimization problem –Stochastic Gradient Descent
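A minimal, self-contained sketch of the learning loop under the assumptions above (linear scores over concatenated content and relation features, sequential Plackett-Luce likelihood); the summation used to aggregate relation features and the hyperparameters are illustrative, not taken from the slides.

    import numpy as np

    def step_feature(x, rel, i, selected):
        # content features of sentence i plus relation features against the
        # already-selected set (aggregated by summation; an assumption here)
        rel_part = rel[i, selected].sum(axis=0) if selected else np.zeros(rel.shape[-1])
        return np.concatenate([x[i], rel_part])

    def neg_log_likelihood_grad(w, x, rel, ref_order):
        # gradient of the sequential Plackett-Luce negative log-likelihood for one
        # training summary; ref_order lists reference sentences in selection order
        grad = np.zeros_like(w)
        selected = []
        remaining = list(range(x.shape[0]))
        for i in ref_order:
            feats = np.array([step_feature(x, rel, j, selected) for j in remaining])
            scores = feats @ w
            p = np.exp(scores - scores.max())
            p /= p.sum()
            grad -= feats[remaining.index(i)] - p @ feats   # -(phi_i - E_p[phi])
            selected.append(i)
            remaining.remove(i)
        return grad

    def sgd_train(topics, dim, lr=0.01, epochs=50, seed=0):
        # topics: list of (x, rel, ref_order) tuples, one per training topic;
        # dim = number of content features + number of relation features
        rng = np.random.default_rng(seed)
        w = np.zeros(dim)
        for _ in range(epochs):
            for t in rng.permutation(len(topics)):   # stochastic pass over topics
                x, rel, ref_order = topics[t]
                w -= lr * neg_log_likelihood_grad(w, x, rel, ref_order)
        return w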

16 Prediction Sequential Prediction Process
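A minimal sketch of the sequential prediction process, reusing the hypothetical step_feature helper from the training sketch above; greedy selection and a sentence-count budget are assumptions, since the slide only names the process.

    import numpy as np

    def predict_summary(w, x, rel, budget):
        # greedily pick, at each step, the remaining sentence with the highest
        # relational ranking score given the sentences already selected
        selected = []
        remaining = list(range(x.shape[0]))
        while remaining and len(selected) < budget:
            scores = [float(step_feature(x, rel, j, selected) @ w) for j in remaining]
            best = remaining[int(np.argmax(scores))]
            selected.append(best)
            remaining.remove(best)
        return selected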

17 Outline Motivation Our Approach Experiments Conclusion

18 Experiments Dataset: set A in TAC2008 and TAC2009 Data Processing –Indri toolkit (version 5.2) –Porter stemmer and stopword removal Training –Training data: manual summarization results on TAC2009 –4-fold cross-validation for TAC2009 –Train model on TAC2009, test on TAC2008

19 Feature Vectors Content-based features –Weighting features: VSM, BM25, LM, … –Term dependency features: MRF –Length –Pos –… Relation-based features –Cosine diversity –Jaccard diversity –Subtopic diversity –Document-level co-occurrence –…
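Two of the listed relation-based features, sketched with their standard definitions; the slide does not spell out the exact formulas used, so these are assumptions.

    import numpy as np

    def cosine_diversity(u, v):
        # 1 - cosine similarity between two term-weight vectors
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return 1.0 if denom == 0 else 1.0 - float(u @ v) / denom

    def jaccard_diversity(terms_a, terms_b):
        # 1 - Jaccard coefficient between two sentences' term sets
        a, b = set(terms_a), set(terms_b)
        union = a | b
        return 1.0 if not union else 1.0 - len(a & b) / len(union)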

20 Baseline Methods Experience-based –Leading sentence selection (LEAD) Metric-based –Maximal Marginal Relevance (MMR) (SIGIR’98) Graph-based –Personalized PageRank (PPR) (WWW’02) –Manifold Ranking (MR) (NIPS’04) –DivRank (DR) (KDD’10) –GrassHopper (GH) (HLT-NAACL’07) –Supervised Lazy random walk (SuperLazy) (ICDM’11) Classification-based –Support Vector Machine (SVM)
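For reference, the standard MMR criterion from Carbonell and Goldstein (SIGIR'98), which the metric-based baseline above follows: it trades off relevance to the query Q against maximal similarity to the already-selected set S via a parameter lambda.

    \mathrm{MMR} = \arg\max_{d_i \in D \setminus S} \Big[ \lambda\, \mathrm{Sim}_1(d_i, Q) - (1-\lambda) \max_{d_j \in S} \mathrm{Sim}_2(d_i, d_j) \Big]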

21 Evaluation Quantitative Evaluation –Evaluation metrics: ROUGE-1, ROUGE-2, ROUGE-SU4

22 Quantitative Evaluation TAC2009 –Our approaches outperform all the baseline methods in all the ROUGE evaluation metrics, and all these improvements are statistically significant

23 Quantitative Evaluation TAC2008 –The results are consistent with those obtained on TAC2009 –Good generalization ability

24 Qualitative Evaluation Randomly choose a topic from the TAC2009 dataset –“glendale–describe the glendale train crash, the cause of the crash, and the arrest and trial of the man accused of causing it.” Reference summaries –Four manual summaries provided by NIST Generated summaries –Our approach and some representative baseline methods

25 Reference Summaries Four manual summaries provided by NIST

26 Generated summaries Summaries generated by some representative methods

27 Statistics of Fact Coverage The statistics of the fact coverage for the different methods

28 Outline Motivation Our Approach Experiments Conclusion

29 Contributions –Propose a novel relational learning-to-rank framework for the TMDS task. –The R-LTR framework can be easily extended to other fields, such as diverse ranking in Web search. –Complete experimental validation from both quantitative and qualitative aspects. Future work –Test our approach on other types of datasets, such as the prevalent short texts in social media.

30 Thanks! zhuyadong@software.ict.ac.cn

