1
A Novel Relational Learning-to-Rank Approach for Topic-focused Multi-Document Summarization Yadong Zhu, Yanyan Lan, Jiafeng Guo, Pan Du, Xueqi Cheng Institute of Computing Technology, CAS
2
Outline Motivation Our Approach Experiments Conclusion
3
Outline Motivation Our Approach Experiments Conclusion
4
Motivation
Summarization: extractive summarization approaches
–Experience-based
–Clustering-based
–Classification-based
–Ranking-based
TMDS: extract the most important aspects of the given topic
5
Motivation
TMDS must account for relevance, salience, and diversity
–A multi-criteria ranking problem
6
Motivation
Previous ranking approaches
–Metric-based: rely on heuristically predefined metrics
–Graph-based: pairwise relationships are only implicitly captured; high computational complexity
7
Learning-to-rank techniques
Advantages
–The capability of combining a large number of features
–Automatic learning based on the training data
–Widely used in commercial search engines
Disadvantages
–Ignore relations among candidate objects
For TMDS (relevance, salience, diversity), content information alone is not enough: LTR + relationship?
8
Outline Motivation Our Approach Experiments Conclusion
9
Our Approach
Relational Learning-to-Rank approach (R-LTR)
–Considers both the content of individual objects and the relations among objects
Formalization
–Four key components: input space, output space, ranking function f, loss function L
–Difference from conventional LTR: relations among candidate objects are taken into account
10
Challenges for R-LTR
–How to define the ranking function
–How to define the loss function
11
Definition of Ranking function
–Inspired by how human beings extract a summary: a sequential ranking process
12
Definition of Ranking function
–The relational ranking function combines a content-based score with a relation-based score
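A sketch of what this combination could look like, with notation assumed here rather than taken from the slides: the score of candidate sentence i adds a weighted content term and a weighted relation term toward the already selected set S,

f(x_i, R_i) \;=\; \omega_c^{\top} x_i \;+\; \omega_r^{\top} h(R_i, S)

where x_i are the content features of sentence i, R_i its relation features, h(·) aggregates them against S, and \omega_c, \omega_r are the weight vectors to be learned.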
13
Definition of Loss Function
–Model the generation of the summary in a sequential way, following the sequential ranking process
–Loss function: the likelihood loss of the generation probability
14
Definition of Loss Function
–The generation probability is defined with the Plackett-Luce model
–Learning maximizes the sum of the likelihood over the training summaries
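A sketch of the Plackett-Luce generation probability for a ground-truth selection order s = (s(1), …, s(m)) over the candidate sentences, under the ranking function sketched above (notation assumed):

P(s \mid X, R) \;=\; \prod_{i=1}^{m} \frac{\exp\!\big(f(x_{s(i)}, R_{s(i)})\big)}{\sum_{j \in C_i} \exp\!\big(f(x_j, R_j)\big)},
\qquad
L \;=\; -\log P(s \mid X, R)

where C_i is the set of sentences not yet selected before step i; learning then maximizes the log-likelihood summed over the training summaries.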
15
Learning
–An unconstrained optimization problem, solved with stochastic gradient descent
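A minimal sketch of stochastic gradient descent on the Plackett-Luce negative log-likelihood above. All names (feature matrices, relation features, ground-truth orders) are hypothetical placeholders, not the authors' code, and the relation features are kept fixed per sentence to keep the sketch short.

import numpy as np

def score(w_c, w_r, x, h):
    """Content score plus relation score for a single candidate sentence."""
    return w_c @ x + w_r @ h

def nll_gradient(w_c, w_r, X, H, order):
    """Gradient of the Plackett-Luce negative log-likelihood for one
    training summary.  X: content features (n, d_c); H: relation features
    (n, d_r); order: indices of the sentences in the ground-truth
    selection order.  In the full sequential model the relation features
    would be recomputed against the growing selected set."""
    g_c, g_r = np.zeros_like(w_c), np.zeros_like(w_r)
    remaining = list(range(len(X)))
    for sel in order:
        scores = np.array([score(w_c, w_r, X[j], H[j]) for j in remaining])
        p = np.exp(scores - scores.max())
        p /= p.sum()
        # Expected feature vector under the model minus the observed one.
        g_c += (p[:, None] * X[remaining]).sum(axis=0) - X[sel]
        g_r += (p[:, None] * H[remaining]).sum(axis=0) - H[sel]
        remaining.remove(sel)
    return g_c, g_r

def train_sgd(data, d_c, d_r, lr=0.01, epochs=50, seed=0):
    """data: list of (X, H, order) triples, one per training summary."""
    rng = np.random.default_rng(seed)
    w_c, w_r = np.zeros(d_c), np.zeros(d_r)
    for _ in range(epochs):
        for k in rng.permutation(len(data)):
            X, H, order = data[k]
            g_c, g_r = nll_gradient(w_c, w_r, X, H, order)
            w_c -= lr * g_c
            w_r -= lr * g_r
    return w_c, w_r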
16
Prediction
–Sequential prediction process: sentences are selected one by one using the learned ranking function, given those already selected
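A sketch of the sequential prediction step under the same assumed notation; w_c, w_r and X are NumPy arrays from training, and relation_feats is a hypothetical helper, not part of the original work.

def predict_summary(w_c, w_r, X, relation_feats, budget):
    """Sequential prediction: at each step pick the unselected sentence
    with the highest content + relation score given the sentences already
    selected.  relation_feats(i, selected) -> relation feature vector of
    sentence i toward the selected set (hypothetical helper)."""
    selected, candidates = [], set(range(len(X)))
    while candidates and len(selected) < budget:
        best = max(candidates,
                   key=lambda i: w_c @ X[i] + w_r @ relation_feats(i, selected))
        selected.append(best)
        candidates.remove(best)
    return selected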
17
Outline Motivation Our Approach Experiments Conclusion
18
Experiments
Dataset: set A of TAC2008 and TAC2009
Data processing
–Indri toolkit (version 5.2)
–Porter stemmer and stopword removal (sketched below)
Training
–Training data: the manual summarization results on TAC2009
–4-fold cross-validation on TAC2009
–Train the model on TAC2009, test on TAC2008
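Indri handles indexing and retrieval; for the stemming and stopword step, a minimal sketch assuming NLTK's PorterStemmer and stopword list as a stand-in for whatever tooling was actually used (resource names may differ slightly across NLTK versions).

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)      # tokenizer model
nltk.download("stopwords", quiet=True)  # English stopword list

STOP = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(sentence):
    """Tokenize, lowercase, drop stopwords and punctuation, Porter-stem."""
    tokens = nltk.word_tokenize(sentence.lower())
    return [STEMMER.stem(t) for t in tokens if t.isalpha() and t not in STOP]

print(preprocess("The Glendale train crash occurred in January 2005."))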
19
Feature Vectors
Content-based features
–Weighting features: VSM, BM25, LM, …
–Term dependency features: MRF
–Length
–Pos
–…
Relation-based features (two of them sketched below)
–Cosine diversity
–Jaccard diversity
–Subtopic diversity
–Document-level co-occurrence
–…
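As an illustration of two of the relation-based features, a sketch of cosine and Jaccard diversity between a candidate sentence and the already selected text, using plain term-frequency vectors; these are common definitions, not necessarily the exact ones used in the paper.

import math
from collections import Counter

def cosine_diversity(candidate_tokens, selected_tokens):
    """1 - cosine similarity between term-frequency vectors."""
    a, b = Counter(candidate_tokens), Counter(selected_tokens)
    dot = sum(cnt * b[t] for t, cnt in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - (dot / norm if norm else 0.0)

def jaccard_diversity(candidate_tokens, selected_tokens):
    """1 - Jaccard overlap of the two token sets."""
    a, b = set(candidate_tokens), set(selected_tokens)
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 0.0)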
20
Baseline Methods
Experience-based
–Leading sentence selection (LEAD)
Metric-based
–Maximal Marginal Relevance (MMR) (SIGIR'98), criterion recalled below
Graph-based
–Personalized PageRank (PPR) (WWW'02)
–Manifold Ranking (MR) (NIPS'04)
–DivRank (DR) (KDD'10)
–GrassHopper (GH) (HLT-NAACL'07)
–Supervised Lazy Random Walk (SuperLazy) (ICDM'11)
Classification-based
–Support Vector Machine (SVM)
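For reference, the MMR criterion from the SIGIR'98 paper is usually stated as follows; the exact similarity functions and the setting of λ in these experiments are not given here.

\mathrm{MMR} \;=\; \arg\max_{s_i \in C \setminus S} \Big[ \lambda\, \mathrm{Sim}_1(s_i, q) \;-\; (1-\lambda) \max_{s_j \in S} \mathrm{Sim}_2(s_i, s_j) \Big]

where q is the topic query, C the candidate sentences, and S those already selected.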
21
Evaluation
Quantitative evaluation
–Evaluation metrics: ROUGE-1, ROUGE-2, ROUGE-SU4
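For orientation, a simplified sketch of ROUGE-N as clipped n-gram recall against the reference summaries; the official ROUGE toolkit additionally handles stemming, jackknifing, and the skip-bigram ROUGE-SU4, so this is not the evaluation script itself.

from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n=1):
    """Simplified ROUGE-N: clipped n-gram recall of a candidate summary
    against one or more reference summaries (all given as token lists)."""
    cand = ngrams(candidate, n)
    overlap = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        overlap += sum(min(c, ref_counts[g]) for g, c in cand.items()
                       if g in ref_counts)
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0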
22
Quantitative Evaluation
TAC2009
–Our approaches outperform all the baseline methods on all the ROUGE evaluation metrics, and all of these improvements are statistically significant
23
Quantitative Evaluation
TAC2008
–The results are consistent with those obtained on TAC2009
–Good generalization ability
24
Qualitative Evaluation
Randomly chosen topic from the TAC2009 dataset
–"glendale–describe the glendale train crash, the cause of the crash, and the arrest and trial of the man accused of causing it."
Reference summaries
–Four manual summaries provided by NIST
Generated summaries
–From our approach and some representative baseline methods
25
Reference Summaries Four manual summaries provided by NIST
26
Generated summaries Summaries generated by some representative methods
27
Statistics of Fact Coverage The fact coverage statistics of the different methods
28
Outline Motivation Our Approach Experiments Conclusion
29
Contributions
–Propose a novel relational learning-to-rank framework for the TMDS task
–R-LTR can be easily extended to other fields, such as diverse ranking in Web search
–Conducted experimental validation from both quantitative and qualitative aspects
Future work
–Test our approach on other types of datasets, such as the short texts prevalent in social media
30
Thanks! zhuyadong@software.ict.ac.cn