Summarizing Email Conversations with Clue Words Giuseppe Carenini Raymond T. Ng Xiaodong Zhou Department of Computer Science Univ. of British Columbia.

Presentation transcript:

Summarizing Email Conversations with Clue Words Giuseppe Carenini Raymond T. Ng Xiaodong Zhou Department of Computer Science Univ. of British Columbia

2 Motivations of Email Summarization Email overloading – 40~60 emails per day or even more… – Email as a personal information repository. Email summarization can be helpful – Two examples: meetings; accessing emails from mobile devices.

3 Outline Characteristics of email; Related work; Our summarization approach; Experimental results; Conclusions and future work

4 Characteristics of Email Conversation structure – Context related: replies to previous messages (>60%). Hidden emails – A hidden email is an email quoted by at least one email in a folder but not present itself in the same folder. Writing style – Short length, informal writing, multiple authors, etc. (Example figure: emails m1–m4 quoting each other's fragments A–G.)
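The hidden-email notion reduces to a set-difference check over quotation data. Below is a minimal sketch; the data layout and function name are assumptions for illustration, not the paper's implementation.

```python
def find_hidden_fragments(emails):
    """Return quoted fragments whose originating email is missing from the folder.

    Each email is assumed to be a dict with two keys:
      'new_fragments'    - text the sender wrote in this message
      'quoted_fragments' - text quoted from earlier messages
    Fragments are assumed to be normalized strings, so equality is meaningful.
    """
    authored = set()  # fragments that appear somewhere as newly written text
    quoted = set()    # fragments that appear somewhere inside a quotation
    for email in emails:
        authored.update(email["new_fragments"])
        quoted.update(email["quoted_fragments"])
    # Hidden: quoted by at least one email in the folder, but never present
    # as original (non-quoted) content of any email in the same folder.
    return quoted - authored
```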

5 Requirements for Email Summarization Conversation structure – Context information is provided. Information completeness – Include hidden emails as well as existing messages. Informative summarization – Cover the core points of the discussion. – Replacement of the original emails.

6 Outline Characteristics of email; Related work; Our summarization approach; Result; Conclusions and future work

7 Related Work Multi-Document Summarization (MDS) – Extractive: MEAD, MMR-MD. – Abstractive/Generative: MultiGen, SEA. Email summarization – Single email summarization (Muresan et al.) – Summarizing email threads by sentence selection (Rambow et al. and Wan et al.)

8 Related Work Comparison of MDS methods (MEAD & MMR-MD, MultiGen, SEA), email summarization methods (Muresan et al., Rambow et al., Wan et al.), and our method along three dimensions: coverage of hidden emails, how conversation structure is captured (thread structure vs. quotation analysis), and how an informative summary is produced (sentence selection vs. language generation).

9 Outline Characteristics of email; Related work; Our summarization approach – Fragment quotation graph – ClueWordSummarizer (CWS); Result; Conclusions and future work

10 Framework Input: a set of emails Output: summaries Process: – Discover and represent email conversations as fragment quotation graphs – ClueWordSummarizer generates the summaries.

11 Conversation Structure - Fragment Quotation Graph Complications of email conversation: – Header information E.g., subject, in-reply-to, and references; not accurate enough. – Quotation A good indication of the conversation (Yeh et al.); selective quotations reflect the conversation in detail. – Assumption: quotation implies conversation → build a fragment quotation graph to represent the conversation.

12 Fragment Quotation Graph Create nodes – Compare quotations and new messages to identify distinct fragments: a, b, c, d, e, f, g, h, i, j. Create edges – Connect neighbouring quotations (a fragment and the quoted fragments adjacent to it).
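A minimal sketch of this construction, assuming each email has already been segmented into an ordered list of fragments tagged with quotation depth (0 = new text, 1+ = quoted). Fragment matching and the edge rule are simplified relative to the paper.

```python
import networkx as nx

def build_fragment_quotation_graph(emails):
    """Nodes are distinct fragments; edges point from a replying fragment
    to the neighbouring quoted fragment it appears to respond to."""
    g = nx.DiGraph()
    for email in emails:
        fragments = email["fragments"]  # ordered [(text, quotation_depth), ...]
        for text, _ in fragments:
            g.add_node(text)
        # Simplified neighbouring-quotation rule: of two adjacent fragments,
        # the shallower one is assumed to reply to the one quoted one level deeper.
        for (text, depth), (nxt, nxt_depth) in zip(fragments, fragments[1:]):
            if nxt_depth == depth + 1:
                g.add_edge(text, nxt)   # text replies to nxt
            elif depth == nxt_depth + 1:
                g.add_edge(nxt, text)   # nxt replies to text
    return g
```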

13 Outline Characteristics of email; Related work; Our summarization approach – Fragment quotation graph – ClueWordSummarizer (CWS); Result; Conclusions and future work

14 ClueWordSummarizer Clue words in the fragment quotation graph – A clue word in a node (fragment) F is a word that also appears, in a semantically similar form, in a parent or a child node of F in the fragment quotation graph.

15 ClueWordSummarizer Three types of clue words – Root/stem: settle vs. settlement – Synonym/antonym: war vs. peace – Loose semantic meaning: Friday vs. deadline
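A hedged sketch of how the first two types could be detected with NLTK (stemming plus a WordNet synonym/antonym lookup); the looser third type (Friday vs. deadline) would need a distributional or knowledge-based similarity measure and is not covered here. The function names are illustrative, not the paper's.

```python
from nltk.stem import PorterStemmer
from nltk.corpus import wordnet  # requires nltk.download("wordnet") once

stemmer = PorterStemmer()

def same_root(w1, w2):
    s1, s2 = stemmer.stem(w1), stemmer.stem(w2)
    # Stems of related forms are not always identical ("settle" -> "settl",
    # "settlement" -> "settlement"), so also accept a prefix match on
    # reasonably long stems.
    if s1 == s2:
        return True
    return min(len(s1), len(s2)) >= 4 and (s1.startswith(s2) or s2.startswith(s1))

def related_in_wordnet(w1, w2):
    # Synonym or antonym in any WordNet sense of w1 (e.g., war vs. peace).
    for syn in wordnet.synsets(w1):
        for lemma in syn.lemmas():
            if lemma.name() == w2:
                return True
            if any(ant.name() == w2 for ant in lemma.antonyms()):
                return True
    return False

def is_clue_match(w1, w2):
    w1, w2 = w1.lower(), w2.lower()
    return w1 == w2 or same_root(w1, w2) or related_in_wordnet(w1, w2)
```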

16 ClueWordSummarizer 1. ClueScore(CW) – For a word CW in a sentence S of a fragment F, ClueScore(CW, F) counts the occurrences of CW (in semantically similar forms) in the parent and child nodes of F. – E.g., ClueScore(discussed, a) = 1; ClueScore(settle, b) = 2.

17 ClueWordSummarizer 2. ClueScore(S) – The ClueScore of a sentence S is the sum of the ClueScores of its clue words. 3. For each conversation, rank all of the sentences based on their ClueScores. 4. Select the top-k sentences as the summary.
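A minimal end-to-end sketch of CWS over the simplified graph from the earlier sketch (a directed graph whose nodes are fragment texts). Sentence splitting and tokenization are deliberately naive, and the `similar` predicate is pluggable (exact match by default; `is_clue_match` above could be passed in).

```python
import re

def clue_score(word, fragment, graph, similar):
    """ClueScore of `word` in `fragment`: number of semantically similar
    word occurrences in the parent and child fragments of `fragment`."""
    neighbours = list(graph.predecessors(fragment)) + list(graph.successors(fragment))
    return sum(
        1
        for node in neighbours
        for other in re.findall(r"[a-z]+", node.lower())
        if similar(word, other)
    )

def clue_word_summarizer(graph, k, similar=lambda a, b: a == b):
    """Score every sentence by the summed ClueScores of its words,
    rank all sentences in the conversation, and return the top-k."""
    scored = []
    for fragment in graph.nodes:
        for sentence in re.split(r"(?<=[.!?])\s+", fragment):
            words = re.findall(r"[a-z]+", sentence.lower())
            score = sum(clue_score(w, fragment, graph, similar) for w in words)
            scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]
```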

18 Outline Characteristics of email; Related work; Our summarization approach; Result – User study – Empirical experiments; Conclusions and future work

19 Result 1: User Study Objective: – Build a gold standard – Understand how humans summarize email conversations Setup – Dataset: 20 email conversations from the Enron dataset – Human reviewers: 25 grads/undergrads at UBC – Each sentence is evaluated by 5 different human reviewers. – Reviewers select important sentences and mark the crucially important ones. Gold standard – 4 selections and at least 2 marked as essentially important. – 88 “gold” sentences out of the 20 conversations (12%).

20 Result 1: User Study Information completeness – 18% of the gold sentences come from hidden emails. – Hidden emails carry crucial information as well. Significance of clue words – Clue words appear more frequently in the 88 gold sentences. – Average ratio of ClueScore in gold sentences to ClueScore in non-gold sentences ≈ 3.9.

21 Result 2: Empirical Experiments RIPPER – A machine learning classifier → classifies each sentence as in the summary or not. – 14 features (Rambow et al.): linguistic and email-specific. – Sentence/conversation level training; 10-fold cross validation. CWS & MEAD – Use the same summary length (2%) as that of RIPPER.
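The slides report precision, recall, and F-measure against the user-study gold standard; a small sketch of how such sentence-level overlap scores could be computed for each system's selected sentences (names are illustrative).

```python
def precision_recall_f1(selected, gold):
    """Compare a system summary (selected sentences) against the gold-standard
    sentences of the same conversation."""
    selected, gold = set(selected), set(gold)
    true_positives = len(selected & gold)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```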

22 Result 2: Empirical Experiments (CWS vs. MEAD) With sumLen = 15%, CWS has a higher accuracy. P-values: – (precision) – (recall) – (F-measure)

23 Result 2: Empirical Experiments (CWS vs. MEAD) CWS has a higher accuracy when sumLen <= 30%. MEAD is more accurate when sumLen is 40% and higher. Clue words are significant in important sentences.

24 Result 2: Empirical Experiments (Fragment quotation graph)

25 Outline Characteristics of email; Related work; Our conversation-based approach; Result; Conclusions and future work

26 Conclusions and Future Work Conclusions – The conversation structure is important and deserves more attention. – Fragment quotation graph – Clue words and ClueWordSummarizer – Empirical evaluation: clue words frequently appear in important sentences; CWS is accurate.

27 Future Work Refine the fragment quotation graph User study on different datasets Try other ML classifiers Integrate CWS with other methods …

Thank you! Questions?