Ranking Users for Intelligent Message Addressing

Ranking Users for Intelligent Message Addressing Vitor R. Carvalho and William Cohen Carnegie Mellon University Glasgow, April 2nd 2008

Outline
Intelligent Message Addressing
Models
Data & Experiments
Email Auto-completion
Mozilla Thunderbird Extension*
Learning to Rank Results*

Example of recipient suggestions while composing a message:
Ramesh Nallapati <ramesh@cs.cmu.edu> [Add] William Cohen <wcohen@cs.cmu.edu> [Add] Akiko Matsui <akiko@cs.cmu.edu> [Add] Yifen Huang <hyfen@andrew.cmu.edu> [Add]

Ramesh Nallapati <ramesh@cs.cmu.edu> [Add] Akiko Matsui <akiko@cs.cmu.edu> [Add] Yifen Huang <hyfen@andrew.cmu.edu> [Add]

einat <einat@cs.cmu.edu> [Add] Ramesh Nallapati <ramesh@cs.cmu.edu> [Add] Jon Elsas <jelsas@cs.cmu.edu> [Add] Andrew Arnold <aard@andrew.cmu.edu> [Add]

Ramesh Nallapati <ramesh@cs.cmu.edu> [Add] Jon Elsas <jelsas@cs.cmu.edu> [Add] Andrew Arnold <aard@andrew.cmu.edu> [Add]

Tom Mitchell <tom@cs.cmu.edu> [Add] Andrew Arnold <aard@andrew.cmu.edu> [Add] Jon Elsas <jelsas@cs.cmu.edu> [Add] Frank Lin <frank@cs.cmu.edu> [Add]

The Task: Intelligent Message Addressing
Predicting likely recipients of email messages given: (1) the contents of the message being composed, (2) other recipients already specified, and (3) a few initial letters of the intended recipient's contact (intelligent auto-completion).

What for?
Identifying people related to specific topics (or who have specific relevant skills). Relation to Expert Finding: email message ↔ (long) query; email addresses ↔ experts.
Improved email address auto-completion.
Preventing high-cost management errors: people simply forget to add important recipients [Dom et al, 03; Campbell et al, 03], particularly in large corporations, causing costly misunderstandings, communication delays, and missed opportunities.

How Frequent Are These Errors?
Grep for "forgot", "sorry" or "accident" in the Enron email corpus, half a million real email messages from a large corporation:
"Sorry, I forgot to CC you his final offer"
"Oops, I forgot to send it to Vince."
"Adding John to the discussion… (sorry John)"
"Sorry... missed your name on the cc: list!"
More frequent than expected: at least 9.27% of the users forgot to add a desired email recipient, and at least 20.52% of the users were not included as recipients (even though they were intended) in at least one received message. These figures are lower bounds.
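The keyword scan above is straightforward to reproduce. A minimal sketch in Python, assuming a hypothetical layout with one plain-text file per Enron message (the directory name and the pattern are illustrative, not taken from the talk):

```python
import os
import re

CORPUS_DIR = "enron_messages"   # hypothetical: one plain-text file per message
PATTERN = re.compile(r"\b(forgot|sorry|accident)", re.IGNORECASE)

hits, total = 0, 0
for name in os.listdir(CORPUS_DIR):
    total += 1
    with open(os.path.join(CORPUS_DIR, name), errors="ignore") as f:
        if PATTERN.search(f.read()):
            hits += 1

# Keyword matches only give a lower bound: each hit still needs inspection
# to confirm an actual "forgot to add a recipient" case.
print(f"{hits} of {total} messages mention forgot/sorry/accident")
```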

Two Ranking Tasks
TO+CC+BCC prediction: predict all recipients of the message.
CC+BCC prediction: predict additional recipients, given the ones already specified.

Models
Non-textual models: Frequency only; Recency only.
Expert Finding models [Balog et al, 2006]: M1 (Candidate Model); M2 (Document Model).
Rocchio (TFIDF).
K-Nearest Neighbors (KNN).
Rank aggregation of the above.

Non-Textual Models
Frequency model: rank candidates by the total number of messages addressed to them in the training set.
Recency model: exponential decay on chronologically ordered messages, so recently contacted users rank higher.
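A minimal sketch of the two non-textual baselines, assuming the user's sent messages are reduced to chronologically ordered recipient lists; the exact exponential-decay form and the parameter name below are assumptions rather than the talk's formulation:

```python
import math
from collections import defaultdict

def frequency_scores(messages):
    """messages: chronologically ordered list of recipient lists (sent messages)."""
    scores = defaultdict(float)
    for recipients in messages:
        for ca in recipients:
            scores[ca] += 1.0        # rank by total number of messages sent to ca
    return scores

def recency_scores(messages, beta=100.0):
    """Exponential decay over the chronological position of each message,
    so recently contacted users receive higher scores."""
    scores = defaultdict(float)
    n = len(messages)
    for i, recipients in enumerate(messages):
        weight = math.exp(-(n - 1 - i) / beta)   # the most recent message gets weight 1
        for ca in recipients:
            scores[ca] += weight
    return scores
```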

Expert Search Models
M1: Candidate Model [Balog et al, 2006]
M2: Document Model [Balog et al, 2006]
f(doc, ca) is estimated as user-centric (UC) or document-centric (DC).
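The formulas on this slide are not in the transcript. The standard forms from Balog et al. (2006), written here with candidates ca as contacts and documents d as the user's previous messages, are roughly:

```latex
% Model 1 (candidate model): build a term distribution per candidate, then score the query.
P(q \mid \theta_{ca}) = \prod_{t \in q}
  \Big( (1-\lambda) \sum_{d} P(t \mid d)\, f(d, ca) + \lambda\, P(t) \Big)^{n(t,q)}

% Model 2 (document model): score each document against the query, then aggregate per candidate.
P(q \mid ca) = \sum_{d} P(q \mid d)\, f(d, ca)

% f(d, ca) is the document-candidate association, estimated user-centrically (UC)
% or document-centrically (DC), as noted above.
```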

Other Models
Rocchio (TFIDF) [Joachims, 1997; Salton & Buckley, 1988]
K-Nearest Neighbors (KNN) [Yang & Liu, 1999]
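A minimal sketch of the KNN recipient ranker, assuming messages are already TF-IDF vectors and using cosine similarity; K = 30 follows the parameter slide, while the function names and aggregation details are illustrative:

```python
from collections import defaultdict
import numpy as np

def knn_rank_recipients(query_vec, train_vecs, train_recipients, k=30):
    """Rank candidate recipients for the message being composed.

    query_vec:        1-D numpy array (e.g. TF-IDF of the new message)
    train_vecs:       2-D numpy array, one row per previously sent message
    train_recipients: list of recipient lists, aligned with the rows of train_vecs
    """
    # Cosine similarity between the new message and every training message.
    qn = np.linalg.norm(query_vec) + 1e-12
    dn = np.linalg.norm(train_vecs, axis=1) + 1e-12
    sims = (train_vecs @ query_vec) / (dn * qn)

    # The K most similar messages vote for their recipients, weighted by similarity.
    scores = defaultdict(float)
    for idx in np.argsort(-sims)[:k]:
        for ca in train_recipients[idx]:
            scores[ca] += float(sims[idx])

    return sorted(scores.items(), key=lambda item: -item[1])
```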

Model Parameters
Chosen from preliminary tests:
Recency b = 100 (from [10, 20, 50, 100, 200, 500])
KNN K = 30 (from [3, 5, 10, 20, 30, 40, 50, 100])
Rocchio's b = 0 (from [0, 0.1, 0.25, 0.5])

Data: Enron Email Collection
Some good reasons to use it:
Large: half a million messages
Natural work-related email, not mailing lists
Public and free
Different roles: managers, assistants, etc.
Unfortunately:
No clear message thread information
No complete Address Book information; no first/last/full names for many recipients

Enron Data Preprocessing
Set up a realistic temporal split (per user): for each user, the 10% most recent sent messages are used as the test set.
36 users; all users had their Address Books (AB) extracted.
Collection statistics were reported separately for the TOCCBCC and CCBCC tasks.

Enron Data Preprocessing
Bag-of-words representation: messages were represented as the union of the BOW of the body and the BOW of the subject.
Removed inconsistencies and repeated messages.
Disambiguated several Enron addresses.
Stop words removed; no stemming.
Self-addressed messages were removed.

Threading
No explicit thread information in Enron, so we try to reconstruct it: build a "Message Thread Set" MTS(msg), the set of messages with the same "subject" as the current one.
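A minimal sketch of the thread reconstruction; the subject normalization (lowercasing, stripping "Re:"/"Fw:" prefixes) is an assumption, since the slide only says that messages share the same subject:

```python
import re
from collections import defaultdict

def normalize_subject(subject):
    # Assumed normalization: strip reply/forward prefixes, collapse whitespace, lowercase.
    s = re.sub(r"^\s*((re|fw|fwd)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", s).strip().lower()

def message_thread_sets(messages):
    """messages: list of dicts with at least a 'subject' key.
    Returns, for each message index, the indices of the other messages
    sharing its (normalized) subject, i.e. MTS(msg)."""
    by_subject = defaultdict(list)
    for i, msg in enumerate(messages):
        by_subject[normalize_subject(msg["subject"])].append(i)
    return {i: [j for j in by_subject[normalize_subject(msg["subject"])] if j != i]
            for i, msg in enumerate(messages)}
```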

Results

Rank Aggregation
Rankings combined by Reciprocal Rank:
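The formula image is not in the transcript; a standard reciprocal-rank data-fusion score consistent with the description would be:

```latex
\mathrm{score}(ca) = \sum_{r \,\in\, \text{base rankings}} \frac{1}{\mathrm{rank}_r(ca)}
```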

Rank Aggregation Results

Observations
"Threading" improves MAP for all models.
KNN seems to be the best choice overall: a document model with focus on a few top documents.
The Data Fusion method for rank aggregation improved performance significantly, since the base systems make different types of mistakes.

Intelligent Email Auto-completion (results on the TOCCBCC and CCBCC tasks)
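Intelligent auto-completion reranks the contacts matching the typed prefix by a model score instead of, say, alphabetical order. A minimal sketch, assuming a precomputed contact-to-score map from any of the models above (matching the prefix against both the name and the address is an assumption):

```python
def autocomplete(prefix, address_book, model_scores, max_suggestions=5):
    """address_book: iterable of (name, email) pairs for the current user.
    model_scores:  dict mapping an email address to its ranking score."""
    prefix = prefix.lower()
    matches = [(name, email) for name, email in address_book
               if name.lower().startswith(prefix) or email.lower().startswith(prefix)]
    # Order the prefix matches by model score rather than alphabetically.
    matches.sort(key=lambda pair: model_scores.get(pair[1], 0.0), reverse=True)
    return matches[:max_suggestions]
```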

Mozilla Thunderbird extension (Cut Once). Suggestions: click to add.

Mozilla Thunderbird extension (Cut Once)
Interested? Just Google "mozilla extension carnegie mellon".
A user study using Cut Once observed users adopting a write-then-address behavior instead.

Can we do better ranking?
Learning to Rank: machine learning to improve ranking via a feature-based ranking function.
Many recently proposed methods:
RankSVM [Joachims, KDD-02]
ListNet [Cao et al., ICML-07]
RankBoost [Freund et al, 2003]
Perceptron variations, online and scalable [Elsas, Carvalho & Carbonell, WSDM-08]

Learning to Rank Recipients
Use the base ranking scores as features: combine the textual feature (KNN score) with "network" features (frequency score, recency score, and co-occurrence features). A sketch follows below.
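A minimal sketch of learning to rank recipients with a pairwise ranking perceptron over these features (one row of base-ranker scores per candidate); the update rule is the generic pairwise form, not necessarily the exact online variant cited on the previous slide:

```python
import numpy as np

def train_rank_perceptron(queries, epochs=10, lr=1.0):
    """queries: list of (features, labels) pairs, one per training message, where
    features is an (n_candidates x n_features) array of base-ranker scores
    (KNN, frequency, recency, co-occurrence) and labels is a 0/1 array marking
    the true recipients."""
    w = np.zeros(queries[0][0].shape[1])
    for _ in range(epochs):
        for features, labels in queries:
            scores = features @ w
            for i in np.where(labels == 1)[0]:        # true recipients
                for j in np.where(labels == 0)[0]:    # other address-book entries
                    if scores[i] <= scores[j]:        # misordered pair -> perceptron update
                        w += lr * (features[i] - features[j])
                        scores = features @ w
    return w

def rank_candidates(features, w):
    """Return candidate indices ordered from best to worst under the learned weights."""
    return np.argsort(-(features @ w))
```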

Learning to Rank Recipients: Results

Conclusions
Problem: predicting recipients of email messages. Useful for email auto-completion, finding related people, and managing addressing errors.
Evidence from a large email collection; two subtasks: TOCCBCC and CCBCC.
Various models: KNN is the best model in general; rank aggregation improved performance.
Improvements in email auto-completion.
Thunderbird extension (Cut Once)*
Promising results on learning to rank recipients*

Thank you

Comments (Thanks, reviewers!)
No account for email structural info (body ≠ subject ≠ quoted text).
Identifying named entities ("Dear Mr. X", etc.): implicitly done, but could be better; Enron did not provide many first/last names.
Fair estimation of f(doc, ca) on email? This might explain the weaker performance of the M2 models.