Summarizing Email Conversations with Clue Words
Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou
Department of Computer Science, Univ. of British Columbia
2 Motivation
– Email overloading: 40~60 emails per day, or even more…
– Email as a personal information repository
– Email summarization can be helpful; two examples: meetings, and accessing emails from mobile devices
3 Outline
– Characteristics of emails
– Related work
– Our summarization approach
– Experimental results
– Conclusions and future work
4 Characteristics of Emails
– Conversation structure
  Context related: replies to previous messages (>60%)
– Hidden emails
  A hidden email is an email quoted by at least one email in a folder but not present itself in the same folder
– Writing style
  Short length, informal writing, multiple authors, etc.
[Figure: example conversation, emails m1–m4 quoting fragments A–G at increasing quotation depths]
5 Requirements for Email Summarization
– Conversation structure: context information is provided
– Information completeness: include hidden emails as well as existing messages
– Informative summarization: cover the core points of the discussion; serve as a replacement for the original emails
6 Outline
– Characteristics of emails
– Related work
– Our summarization approach
– Experimental results
– Conclusions and future work
7 Related Work
– Multi-Document Summarization (MDS)
  Extractive: MEAD, MMR-MD
  Abstractive/generative: MultiGen, SEA
– Email summarization
  Single-email summarization (Muresan et al.)
  Summarizing email threads by sentence selection (Rambow et al. and Wan et al.)
8 Related Work: comparison with our method
– Hidden emails: handled only by our method
– Conversation structure: thread structure (e.g., Rambow et al., Wan et al.); quotation analysis (our method)
– Informative summary: sentence selection (MEAD & MMR-MD, Muresan et al., Rambow et al., Wan et al., our method); language generation (MultiGen, SEA)
9 Outline
– Characteristics of emails
– Related work
– Our summarization approach: fragment quotation graph; ClueWordSummarizer (CWS)
– Experimental results
– Conclusions and future work
10 Framework
– Input: a set of emails
– Output: summaries
– Process:
  Discover and represent conversations as fragment quotation graphs
  ClueWordSummarizer generates summaries
11 Conversation Structure: Fragment Quotation Graph
– Complications of email conversations:
  Header information (e.g., subject, in-reply-to, and references) is not accurate enough
  Quotation is a good indication of conversation structure (Yeh et al.); selective quotations reflect the conversation in detail
– Assumption: quotation implies conversation
– Build a fragment quotation graph to capture the conversation
12 Fragment Quotation Graph
– Create nodes: compare quotations and new messages (e.g., fragments a, b, c, d, e, f, g, h, i, j)
– Create edges: between neighbouring quotations
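The two steps above can be sketched in Python. This is a minimal illustration, assuming each email has already been split into fragments tagged with their quotation depth (number of leading `>` marks); the function name and fragment labels are illustrative, not from the paper.

```python
def build_quotation_graph(emails):
    """Build a fragment quotation graph.

    Each email is a list of (depth, fragment_id) pairs in textual order,
    where depth is the number of leading '>' marks.  Identical fragments
    across emails collapse into one node; an edge (a, b) means fragment a
    replies to the neighbouring quoted fragment b.
    """
    nodes = set()
    edges = set()
    for email in emails:
        for _, frag in email:
            nodes.add(frag)
        # Neighbouring fragments: new text at depth d adjacent to a
        # quotation at depth d+1 is assumed to reply to that quotation.
        for (d1, f1), (d2, f2) in zip(email, email[1:]):
            if d1 + 1 == d2:      # f1 replies to quoted f2
                edges.add((f1, f2))
            elif d2 + 1 == d1:    # f2 replies to quoted f1
                edges.add((f2, f1))
    return nodes, edges

# Toy conversation: m2 quotes fragment "a" of m1 and replies with "b".
m1 = [(0, "a")]
m2 = [(1, "a"), (0, "b")]
nodes, edges = build_quotation_graph([m1, m2])
```

The edge direction chosen here (replying fragment points to the quoted fragment) is one of two equally reasonable conventions for the sketch.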
13 Outline
– Characteristics of emails
– Related work
– Our summarization approach: fragment quotation graph; ClueWordSummarizer (CWS)
– Experimental results
– Conclusions and future work
14 ClueWordSummarizer
– Clue words in the fragment quotation graph
  A clue word in a node (fragment) F is a word that also appears, in a semantically similar form, in a parent or a child node of F in the fragment quotation graph
15 ClueWordSummarizer
– Three types of clue words:
  Root/stem: settle vs. settlement
  Synonym/antonym: war vs. peace
  Loose semantic meaning: Friday vs. deadline
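Of the three types, the root/stem match lends itself to a short sketch. The illustrative Python below uses a crude suffix-stripping stemmer as a stand-in for a real stemmer (e.g., Porter) and does not cover the synonym/antonym or loose-semantic cases; all names are hypothetical.

```python
def crude_stem(word):
    """Very rough stemmer stand-in: strip a few common English suffixes.
    A real system would use a proper stemmer such as Porter's."""
    for suffix in ("ments", "ment", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def is_clue_word(word, neighbour_words):
    """Root/stem variant of clue-word matching: `word` is a clue word
    if some word in a parent or child fragment shares its stem."""
    stem = crude_stem(word.lower())
    return any(crude_stem(w.lower()) == stem for w in neighbour_words)
```

For instance, "settle" matches "settlement" (shared stem), while the "war"/"peace" pair would need a thesaurus lookup that this sketch omits.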
16 ClueWordSummarizer
– ClueScore(CW): for a word CW in a sentence S of a fragment F, the number of times CW (in a semantically similar form) appears in the parent and child nodes of F
– E.g., ClueScore(discussed, a) = 1; ClueScore(settle, b) = 2
17 ClueWordSummarizer
– For each conversation, rank all of the sentences by their ClueScores
– Select the top-k sentences as the summary
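The scoring and selection steps above can be sketched as follows. This is a simplified illustration in which clue-word matching is reduced to exact word counts (the paper also matches stems and semantically similar forms), and `fragments` is a hypothetical data structure mapping each fragment to its sentences and the words of its parent/child fragments.

```python
def clue_score(sentence, neighbour_words):
    """ClueScore of a sentence: for each of its words, count the word's
    occurrences in the parent and child fragments (exact match only,
    for simplicity)."""
    return sum(neighbour_words.count(w) for w in sentence)

def cws_summary(fragments, k):
    """Rank every sentence in the conversation by ClueScore and return
    the top-k as the summary.  `fragments` maps fragment id ->
    (sentences, neighbour_words), where each sentence is a word list."""
    scored = []
    for frag, (sentences, neighbours) in fragments.items():
        for sent in sentences:
            scored.append((clue_score(sent, neighbours), frag, sent))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [sent for _, _, sent in scored[:k]]
```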
18 Outline
– Characteristics of emails
– Related work
– Our summarization approach
– Results: user study; empirical experiments
– Conclusions and future work
19 Result 1: User Study
– Objectives: build a gold standard; study how humans summarize email conversations
– Setup:
  Dataset: 20 conversations from the Enron dataset
  Human reviewers: 25 grads/undergrads at UBC
  Each sentence is evaluated by 5 different human reviewers
  Reviewers select important sentences and mark the essentially important ones
– Gold standard: 4 selections, of which at least 2 are essentially important
  88 "gold" sentences out of the 20 conversations (12%)
20 Result 1: User Study
– Information completeness
  18% of gold sentences come from hidden emails
  Hidden emails carry crucial information as well
– Significance of clue words
  Clue words appear more frequently in the 88 gold sentences
  Average ratio of ClueScore in gold sentences to ClueScore in non-gold sentences: 3.9
21 Result 2: Empirical Experiments
– RIPPER: a machine learning classifier
  Classifies whether a sentence is in the summary or not
  14 features (Rambow et al.): linguistic and email-specific
  Sentence/conversation-level training; 10-fold cross validation
– CWS & MEAD: the same summary length (2%) as that of RIPPER
22 Result 2: Empirical Experiments (CWS vs. MEAD)
– sumLen = 15%: CWS has a higher accuracy
– P-values reported for precision, recall, and F-measure (values shown on the slide)
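The precision, recall, and F-measure used in this comparison can be computed for an extractive summary against the gold-standard sentences as below; a standard sketch, with sentences represented by hypothetical ids.

```python
def prf(selected, gold):
    """Precision, recall, and F-measure of an extracted summary
    (a set of sentence ids) against the gold-standard sentence set."""
    tp = len(selected & gold)                      # correctly selected
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```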
23 Result 2: Empirical Experiments (CWS vs. MEAD)
– CWS has a higher accuracy when sumLen <= 30%
– MEAD is more accurate when sumLen is 40% and higher
– Clue words are significant in important sentences
24 Result 2: Empirical Experiments (Fragment quotation graph)
25 Outline
– Characteristics of emails
– Related work
– Our conversation-based approach
– Results
– Conclusions and future work
26 Conclusions and Future Work
– Conclusions:
  The conversation structure is important and deserves more attention
  Fragment quotation graph
  Clue words and ClueWordSummarizer (CWS)
  Empirical evaluation: clue words frequently appear in important sentences; CWS is accurate
27 Future Work
– Refine the fragment quotation graph
– User studies on different datasets
– Try other ML classifiers
– Integrate CWS with other methods
Thank you! Questions?