Summarizing Email Conversations with Clue Words
Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou (WWW '07)
Advisor: Dr. Koh Jia-Ling
Speaker: Tu Yi-Lang
Date :

Outline
- Introduction
- Building The Fragment Quotation Graph
  - Creating nodes
  - Creating edges
- Summarization Methods
  - CWS
  - MEAD
  - RIPPER
- Result 1: User Study
- Result 2: Evaluation Of CWS
- Conclusion

Introduction
- With the ever increasing popularity of email, email overload becomes a major problem for many users.
- This paper proposes a different form of email support: summarization.
- The goal is to provide a concise, informative summary of the emails contained in a folder, saving the user from browsing through each email one by one.

Introduction
- The summary is intended to be multi-granularity, in that the user can specify the size of the concise summary.
- Email summarization can also be valuable for users reading email on mobile devices; given the small screen size of handheld devices, efforts have been made to re-design the user interface.

Introduction
- Many emails are asynchronous responses to previous messages, and as such they constitute a conversation, which may be hard to reconstruct in detail.
- A conversation may involve many users, many of whom may have different writing styles.
- A hidden email is an email quoted by at least one email in the folder but not itself present in the user's folders; hidden emails may carry important information that deserves to be part of the summary.

Building The Fragment Quotation Graph
- Here we assume that if one email quotes another email, they belong to the same conversation.
- A fragment quotation graph is used to represent conversations: the graph G = (V, E) is a directed graph, where each node u ∈ V is a text unit (fragment) in the email folder, and an edge (u, v) means node u is in reply to node v.

Building The Fragment Quotation Graph
- Identifying quoted and new fragments:
  - We assume that there exist one or more quotation markers (e.g., ">") that are used as a prefix of every quoted line.
  - Quotation depth: the number of quotation markers ">" in the prefix; it reflects the number of times that this line has been quoted since the original message containing it was sent.

Building The Fragment Quotation Graph
- Identifying quoted and new fragments: (cont.)
  - Quoted fragment: a maximally contiguous block of quoted lines having the same quotation depth.
  - New fragment: a maximally contiguous block of lines that are not prefixed by the quotation markers.
  - An email can be viewed as an alternating sequence of quoted and new fragments.
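A minimal sketch of this fragment-splitting step, assuming ">" is the only quotation marker; the function name and the (depth, text) return shape are illustrative, not from the paper:

```python
import re

def split_fragments(body):
    """Split an email body into maximal contiguous runs of lines that
    share one quotation depth (the number of leading ">" markers).
    Depth-0 runs are new fragments; depth >= 1 runs are quoted ones."""
    fragments, current, depth = [], [], None
    for line in body.splitlines():
        lead = re.match(r"^[>\s]*", line).group()   # leading markers + spaces
        d = lead.count(">")                         # quotation depth
        if current and d != depth:                  # depth changed: close the run
            fragments.append((depth, "\n".join(current)))
            current = []
        depth = d
        current.append(line[len(lead):] if d else line)
    if current:
        fragments.append((depth, "\n".join(current)))
    return fragments

msg = "Fine, see you then.\n> Can we meet Friday?\n>> Project deadline is near."
print(split_fragments(msg))
# [(0, 'Fine, see you then.'), (1, 'Can we meet Friday?'), (2, 'Project deadline is near.')]
```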

Building The Fragment Quotation Graph
- Creating nodes:
  - Given an email folder Fdr = {M1, ..., Mn}, the first step is to identify the distinct fragments, each of which will be represented as a node in the graph.
  - Quoted and new fragments from all emails in Fdr are matched against each other to identify overlaps.
  - There is an overlap between two fragments if they share a common substring that is sufficiently long (given an overlap threshold).
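A sketch of the overlap test using Python's difflib to find the longest common substring; the 40-character threshold is an illustrative value, not the paper's setting. Fragments that overlap would then be collapsed into a single node:

```python
from difflib import SequenceMatcher

def fragments_overlap(frag_a, frag_b, threshold=40):
    """True if the two fragments share a common substring of at least
    `threshold` characters; overlapping fragments map to one node of
    the fragment quotation graph."""
    m = SequenceMatcher(None, frag_a, frag_b)
    match = m.find_longest_match(0, len(frag_a), 0, len(frag_b))
    return match.size >= threshold
```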

Building The Fragment Quotation Graph
(Slide figure: an example email folder and the fragment quotation graph built from it.)

Building The Fragment Quotation Graph
- Creating edges:
  - We assume that any new fragment is a potential reply to the neighboring quotations, i.e., the quoted fragments immediately preceding or following it.
  - Because of possible differences in quotation depth, a quoted block may contain multiple fragments.
  - In the general situation where a quoted block QSp precedes a new block NS, which is then followed by a quoted block QSf, we create an edge (v, u) for each fragment u ∈ (QSp ∪ QSf) and v ∈ NS.

Building The Fragment Quotation Graph
- Creating edges: (cont.)
  - For a hidden fragment, additional edges are created within the quoted block, following the same neighboring-quotation assumption.
  - The minimum equivalent graph, which is transitively equivalent to the original graph, is used as the fragment quotation graph.
  - In the fragment quotation graph, each conversation is reflected as a weakly connected component.
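A small sketch of the edge-creation rule under the neighboring-quotation assumption. The block representation, one email as an ordered list of ('new' | 'quoted', [fragment ids]) pairs, is an assumption of this sketch:

```python
def add_reply_edges(email_blocks, edges):
    """For every new block NS, add an edge (v, u) from each fragment v
    in NS to each fragment u in the neighboring quoted blocks: QSp
    (immediately preceding) and QSf (immediately following). A quoted
    block may hold several fragments of different depths."""
    for i, (kind, block) in enumerate(email_blocks):
        if kind != "new":
            continue
        neighbors = []
        if i > 0 and email_blocks[i - 1][0] == "quoted":
            neighbors.extend(email_blocks[i - 1][1])       # QSp
        if i + 1 < len(email_blocks) and email_blocks[i + 1][0] == "quoted":
            neighbors.extend(email_blocks[i + 1][1])       # QSf
        for v in block:
            for u in neighbors:
                edges.add((v, u))                          # v replies to u
    return edges
```

The redundant edges could afterwards be removed with a transitive reduction (e.g., networkx.transitive_reduction on a DAG) to obtain the minimum equivalent graph the slide mentions.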

Summarization Methods
- Clue words:
  - A clue word in a node (fragment) F is a word that also appears, in a semantically similar form, in a parent or a child node of F in the fragment quotation graph.
  - In this paper, only stemming is applied to the identification of clue words: Porter's stemming algorithm computes the stem of each word, and the stems are used to judge reoccurrence.

Summarization Methods
- Fragments (a) and (b) are two adjacent nodes, with (b) as the parent node of (a).

Summarization Methods
- Clue words: (cont.)
  - Three major kinds of reoccurrence are observed (a detection sketch follows this list):
    1. The same root (stem) with different forms, e.g., "settle" vs. "settlement" and "discuss" vs. "discussed", as in the example above.
    2. Synonyms/antonyms, or words with similar/contrary meanings, e.g., "talk" vs. "discuss" and "peace" vs. "war".
    3. Words with a looser semantic link, e.g., "deadline" with "Friday morning".
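Since the paper only uses the first kind (stem reoccurrence), here is a minimal sketch of stem-based clue-word detection; it assumes NLTK's PorterStemmer is available, and the helper names are illustrative:

```python
import re
from nltk.stem import PorterStemmer   # assumes NLTK is installed

_stemmer = PorterStemmer()

def _stems(text):
    return {_stemmer.stem(w) for w in re.findall(r"[a-z]+", text.lower())}

def clue_words(fragment, neighbor_fragments):
    """Words of `fragment` whose Porter stem reappears in any parent or
    child fragment; e.g. "discussed" matches "discuss" because both
    stem to "discuss"."""
    neighbor_stems = set().union(*(_stems(n) for n in neighbor_fragments))
    return {w for w in re.findall(r"[a-z]+", fragment.lower())
            if _stemmer.stem(w) in neighbor_stems}
```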

Summarization Methods
- Algorithm CWS:
  - Algorithm ClueWordSummarizer (CWS) uses clue words as the main feature for summarization.
  - The assumption is that words which reoccur between parent and child nodes are more likely to be relevant and important to the conversation.

Summarization Methods

Summarization Methods
- Algorithm CWS: (cont.)
  - To evaluate the significance of a clue word CW in fragment F, its frequency in the neighboring nodes is summed:
    ClueScore(CW, F) = Σ_{node ∈ parent(F) ∪ child(F)} freq(CW, node)
  - At the sentence level, the score of a sentence s is the sum over the clue words it contains:
    ClueScore(s) = Σ_{CW ∈ s} ClueScore(CW, F)

Summarization Methods
(Slide example: the clue-word scores of a sample sentence, e.g. 1 + 2, summing to ClueScore(s) = 3.)
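A compact sketch of the scoring and selection steps, assuming sentences and fragments are pre-tokenized; `stem` is any stemming function, such as the Porter stemmer above:

```python
from collections import Counter

def clue_score_sentence(sentence_words, neighbor_fragments, stem):
    """ClueScore(s): for each word of the sentence, add the frequency
    of its stem in the parent and child fragments of the fragment that
    contains s (non-clue words contribute zero)."""
    neighbor_counts = Counter()
    for frag_words in neighbor_fragments:          # parent(F) ∪ child(F)
        neighbor_counts.update(stem(w) for w in frag_words)
    return sum(neighbor_counts[stem(w)] for w in sentence_words)

def cws_summary(sentences, scores, k):
    """Rank sentences by ClueScore and keep the top k (the user-chosen
    summary size), preserving the original order in the output."""
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```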

Summarization Methods
- MEAD:
  - A centroid-based multi-document summarizer.
  - MEAD computes the centroid of all emails in one conversation; a centroid is a vector of the words' average TF-IDF values over all documents.
  - MEAD compares each sentence s with the centroid and assigns it a score: the sum of the centroid values of the words shared by sentence s and the centroid.

Summarization Methods
- MEAD: (cont.)
  - All sentences are ranked by their MEADScore, and the top ones are included in the summary.
  - Compared with MEAD, CWS may appear to use more "local" features, while MEAD may capture more "global" salience related to the whole conversation.
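A rough sketch of centroid scoring in the spirit of MEAD; this is not the authors' implementation (the real MEAD also uses positional and length features):

```python
import math
from collections import Counter

def mead_scores(docs, sentences):
    """docs: each email of the conversation as a list of tokens;
    sentences: candidate sentences as token lists. The centroid holds
    each word's average TF-IDF over the emails; a sentence's score is
    the sum of the centroid values of the words it shares with it."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                           # document frequency
    centroid = Counter()
    for doc in docs:
        for w, f in Counter(doc).items():
            centroid[w] += f * math.log(n / df[w]) / n  # average TF-IDF
    return [sum(centroid[w] for w in set(s)) for s in sentences]
```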

Summarization Methods
- Hybrid ClueScore with MEADScore:
  - ClueScore and MEADScore tend to represent different aspects of the importance of a sentence, so it is natural to combine the two methods.
  - The linear combination of ClueScore and MEADScore is used:
    LinearClueMEAD(s) = α · ClueScore(s) + (1 − α) · MEADScore(s), where α ∈ [0, 1].
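A direct transcription of the combination formula; the min-max normalization is an assumption added here so the two score scales are comparable (the slide does not specify one):

```python
def linear_clue_mead(clue_scores, mead_scores, alpha=0.5):
    """LinearClueMEAD(s) = alpha * ClueScore(s) + (1 - alpha) * MEADScore(s).
    Min-max normalization (an assumption of this sketch) puts the two
    score lists on a comparable scale before mixing."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    c, m = norm(clue_scores), norm(mead_scores)
    return [alpha * ci + (1 - alpha) * mi for ci, mi in zip(c, m)]
```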

Summarization Methods
- RIPPER:
  - The RIPPER system is used to induce a classifier that determines whether a given sentence should be included in the summary.
  - RIPPER is trained on a corpus in which each sentence is described by a set of 14 features and annotated with the correct classification (i.e., whether it should be included in the summary or not):
    - 8 "basic" linguistic features
    - 2 "basic+" features
    - 4 "basic++" features
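RIPPER itself is a rule learner with no scikit-learn implementation; as a plainly-labeled stand-in, here is a sketch that trains a decision tree over the same 14-feature representation (feature extraction omitted):

```python
from sklearn.tree import DecisionTreeClassifier

def train_sentence_classifier(X, y):
    """X: one row per sentence with its 14 linguistic features;
    y: 1 if the sentence belongs in the summary, else 0. A decision
    tree is only an accessible stand-in for the RIPPER rule learner
    the paper actually uses (RIPPER has no scikit-learn port)."""
    clf = DecisionTreeClassifier(max_depth=4, class_weight="balanced")
    return clf.fit(X, y)
```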

Result 1: User Study
- Dataset setup:
  - 20 conversations were collected from the Enron dataset as the testbed, and 25 human summarizers were recruited to review them.
  - The threads can be divided into two types: single chains and thread hierarchies (trees).
  - 4 single chains and 16 trees were randomly selected.

Result 1: User Study
- Dataset setup: (cont.)
  - Each summarizer reviewed 4 distinct conversations in one hour, and each conversation was reviewed by 5 different human summarizers.
  - The generated summary contained about 30% of the original sentences.
  - The human summarizers were asked to classify each selected sentence as either essential or optional.

Result 1: User Study
- Result of the user study:
  - A GSValue is assigned to each sentence to evaluate its importance according to the human summarizers' selections: each essential selection scores 3 and each optional selection scores 1.
  - Out of the 741 sentences in the 20 conversations, 88 are overall essential sentences, which is about 12% of all sentences.
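The GSValue arithmetic as a tiny worked sketch (with 5 reviewers, the value ranges from 0 to 15):

```python
def gs_value(selections):
    """selections: the five annotators' labels for one sentence, each
    'essential', 'optional', or None (not selected).
    GSValue = 3 per essential selection + 1 per optional selection."""
    return sum(3 if s == "essential" else (1 if s == "optional" else 0)
               for s in selections)

# Two essential picks and one optional pick: 3 + 3 + 1 = 7.
assert gs_value(["essential", "essential", "optional", None, None]) == 7
```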

Result 1: User Study
- Result of the user study: (cont.)
  - Sorting all sentences by their GSValue, about 17% of the top-30% sentences are from hidden emails; among the 88 overall essential sentences, about 18% are from hidden emails.
  - Comparing the average ClueScore of the overall essential sentences in the gold standard with the average of all other sentences shows that the essential sentences have a clearly higher average ClueScore.

Result 2: Evaluation Of CWS
- Comparing CWS with MEAD and RIPPER:
  - Sentence-level training.
  - Conversation-level training.

Result 2: Evaluation Of CWS
- Comparing CWS with MEAD:
(Slide figure: accuracy comparison of CWS and MEAD.)

Result 2: Evaluation Of CWS
- Hybrid methods:
(Slide figure: accuracy of the hybrid LinearClueMEAD method as the weight on CWS varies up to 100% CWS.)

Conclusion
- This paper studies how to generate accurate email summaries and builds a novel structure, the fragment quotation graph, to represent the conversation structure.
- The experiments with the Enron dataset not only indicate that hidden emails have to be considered for summarization, but also show that CWS generates more accurate summaries than the other methods.