Document Summarization using Conditional Random Fields
Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, Zheng Chen
IJCAI 2007
Presenter: Hao-Chin Chang
Department of Computer Science & Information Engineering, National Taiwan Normal University
2011/09/05
Outline
Introduction
CRF-based Summarization
Experiments and Results
Conclusion and Future Work
Introduction (1/2)
Text document summarization has attracted much attention since the original work by Luhn (1958)
–It benefits other text mining tasks such as document classification [Shen 2004]
–It helps readers catch the main points of a long document with less effort
Summarization tasks can be grouped into different categories
–Input: single-document vs. multi-document summarization
–Purpose: generic vs. query-oriented summary [Goldstein 1999]
–Output [Mani 1999]: extractive vs. abstractive summary
Introduction (2/2)
Extractive document summarization
–Supervised algorithms: treat summarization as a two-class classification problem and classify each sentence individually, without leveraging the relationship among sentences
–Unsupervised algorithms: use heuristic rules to directly select the most informative sentences into a summary; such rules are hard to generalize
Conditional Random Fields (CRF)
–Avoid both disadvantages by casting summarization as a sequence labeling problem instead of a simple classification problem
–Unlike HMM-style generative models, which in many situations fail to predict the label sequence given the observation sequence, because they inappropriately use a generative joint model P(D, S) to solve a discriminative conditional problem when the observations are given
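To make the sequence-labeling framing concrete, here is a minimal sketch using the third-party sklearn-crfsuite package (not the toolkit used in the paper); the features, toy data, and label scheme are illustrative placeholders, not the paper's setup.

```python
# Minimal sketch: each document is one sequence of sentences, and each
# sentence gets a "1" (summary) or "0" (non-summary) label.
# Uses the third-party sklearn-crfsuite package; features are placeholders.
import sklearn_crfsuite

def sent2features(sent, idx, n_sents):
    words = sent.split()
    return {
        "position": (idx + 1) / n_sents,               # relative position in document
        "length": len(words),                          # sentence length
        "has_upper": any(w.isupper() for w in words),  # contains an upper-case word
    }

def doc2features(doc):
    # One feature dict per sentence; the whole document is one sequence.
    return [sent2features(s, i, len(doc)) for i, s in enumerate(doc)]

# Toy training data: one document whose first sentence is summary-worthy.
train_docs = [["CRFs label whole sequences.", "A SIDE remark.", "More detail follows."]]
train_labels = [["1", "0", "0"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit([doc2features(d) for d in train_docs], train_labels)
print(crf.predict([doc2features(train_docs[0])]))  # e.g. [['1', '0', '0']]
```

Because the whole document is one sequence, the transition weights let the model exploit label dependencies between adjacent sentences, which a per-sentence classifier cannot do.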
CRF-based Summarization (1/3)
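For reference, the standard linear-chain CRF (Lafferty et al., 2001) that the paper's model instantiates, with X the document's sentence sequence and Y the per-sentence summary/non-summary labels:

\[
P(Y \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, X, t) \Big),
\qquad
Z(X) = \sum_{Y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, X, t) \Big)
\]

where the f_k are feature functions over adjacent labels and the observed sentences, the λ_k are their weights, and Z(X) normalizes over all possible label sequences.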
CRF-based Summarization (2/3)
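Training fits the weights λ by maximizing the conditional log-likelihood of the N labeled documents; a Gaussian prior (assumed here, as is common practice) keeps the weights small:

\[
\mathcal{L}(\lambda) = \sum_{i=1}^{N} \log P_\lambda\big(Y^{(i)} \mid X^{(i)}\big) - \sum_{k} \frac{\lambda_k^2}{2\sigma^2}
\]

This objective is concave, so gradient-based optimizers such as L-BFGS converge to the global optimum.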
CRF-based Summarization (3/3)
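Continuing the earlier sketch (reusing crf and doc2features), one way to turn the trained model into a fixed-length extract is to rank sentences by the marginal probability of the summary label; this ranking step is an assumption of the sketch, not necessarily the paper's exact selection procedure.

```python
# Rank sentences by P(y_t = "1" | X) and keep the top k in document order.
# Reuses `crf` and `doc2features` from the earlier sketch; k is illustrative.
def extract_summary(crf, doc, k=3):
    marginals = crf.predict_marginals([doc2features(doc)])[0]  # one dict per sentence
    scores = [m.get("1", 0.0) for m in marginals]
    top = sorted(range(len(doc)), key=lambda i: scores[i], reverse=True)[:k]
    return [doc[i] for i in sorted(top)]                       # original order
```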
Experiment (1/5)
Basic Features
–Position
–Thematic words: the most frequent words
–Upper-case words: words the authors want to emphasize
–Similarity to neighboring sentences
Complex Features
–LSA score
–HITS score: the document must be treated as a graph
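A sketch of how the basic features might be computed per sentence; the top-10 thematic-word cutoff and the bag-of-words cosine similarity are assumptions, not the paper's exact definitions.

```python
# Illustrative computation of the basic features (not the paper's exact code).
from collections import Counter

def basic_features(doc):
    """doc: list of sentences (strings). Returns one feature dict per sentence."""
    tokenized = [s.lower().split() for s in doc]
    freq = Counter(w for words in tokenized for w in words)
    thematic = {w for w, _ in freq.most_common(10)}  # assumed cutoff: top-10 words

    def cosine(a, b):
        va, vb = Counter(a), Counter(b)
        num = sum(va[w] * vb[w] for w in va)
        den = (sum(c * c for c in va.values()) ** 0.5) * (sum(c * c for c in vb.values()) ** 0.5)
        return num / den if den else 0.0

    feats = []
    for i, words in enumerate(tokenized):
        feats.append({
            "position": (i + 1) / len(doc),                       # relative position
            "n_thematic": sum(w in thematic for w in words),      # thematic words
            "n_upper": sum(w.isupper() for w in doc[i].split()),  # upper-case words
            "sim_prev": cosine(words, tokenized[i - 1]) if i else 0.0,
            "sim_next": cosine(words, tokenized[i + 1]) if i + 1 < len(doc) else 0.0,
        })
    return feats
```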
Experiment (2/5)
147 document-summary pairs from the Document Understanding Conference (DUC) 2001
Supervised methods
–Naive Bayes (NB)
–Logistic Regression (LR)
–Support Vector Machine (SVM)
–Hidden Markov Model (HMM)
–Conditional Random Fields (CRF)
Unsupervised methods
–RANDOM: select sentences randomly from the document
–LEAD: select the lead sentence of each paragraph
–LSA
–HITS: a graph-based ranking algorithm
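The RANDOM and LEAD baselines are simple enough to state directly; a sketch, assuming paragraphs are given as lists of sentences:

```python
import random

def random_baseline(sentences, k, seed=0):
    """RANDOM: pick k sentences uniformly at random, kept in document order."""
    rng = random.Random(seed)
    picked = rng.sample(range(len(sentences)), k)
    return [sentences[i] for i in sorted(picked)]

def lead_baseline(paragraphs):
    """LEAD: take the first sentence of each paragraph."""
    return [para[0] for para in paragraphs if para]
```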
Experiment (3/5)
RANDOM is the worst method and CRF is the best
HMM and LR improve the performance compared with NB, due to the advantage of leveraging sequential information
CRF makes a further improvement of 8.4% and 11.1% over both HMM and LR in terms of ROUGE-2 and F1
CRF outperforms HITS by 5.3% and 5.7% in terms of ROUGE-2 and F1
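For reference, the two reported metrics in their standard forms (ROUGE as defined by Lin, 2004; these formulas are background, not taken from the slides):

\[
\mathrm{ROUGE\text{-}2} = \frac{\sum_{S \in \mathrm{Refs}} \sum_{b \in S} \mathrm{Count}_{\mathrm{match}}(b)}{\sum_{S \in \mathrm{Refs}} \sum_{b \in S} \mathrm{Count}(b)},
\qquad
F_1 = \frac{2PR}{P + R}
\]

where b ranges over the bigrams of the reference summaries, Count_match counts bigrams co-occurring in the candidate summary, and P and R are the precision and recall of the extracted sentences.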
Experiment (4/5)
CRF is still the best method; it improves the ROUGE-2 and F1 values achieved by the best baselines by more than 7.1% and 8.8%
Compared with the best unsupervised method, HITS, the CRF based on both kinds of features improves the performance by 12.1% and 13.9% in terms of ROUGE-2 and F1
We also compared CRF with the linear combination method used to combine the results of LSA, HITS, and CRF based only on the basic features; the best result we can obtain on DUC01 is … and … in terms of ROUGE-2 and F1
Experiment (5/5)
10-fold cross-validation procedure, where one fold is used for training and the other nine folds for testing
We can obtain more precise model parameters with more training data
The advantage of the CRF-based method over the other four supervised methods is clearly larger when the size of the training data is small
The joint distribution learned by HMM is not particularly relevant to the task of inferring the class labels
The bad performance of NB, LR, and SVM comes from overfitting with a small amount of training data
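A sketch of the inverted cross-validation split described above, using scikit-learn's KFold; docs is a stand-in for the 147 DUC 2001 documents.

```python
# Inverted 10-fold split: the single held-out fold (~15 of 147 documents)
# trains the model and the remaining nine folds (~132 documents) test it.
from sklearn.model_selection import KFold

docs = list(range(147))
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for rest_idx, fold_idx in kf.split(docs):
    train_ids = [docs[i] for i in fold_idx]  # one fold (~15 docs) for training
    test_ids = [docs[i] for i in rest_idx]   # nine folds (~132 docs) for testing
    # fit the summarizer on train_ids and evaluate ROUGE-2 / F1 on test_ids
```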
Conclusion
We provided a framework to consider all available features, including the interactions between sentences
We plan to exploit more features, especially linguistic features not covered in this paper, such as rhetorical structures