Modeling Intention in Email: Speech Acts, Information Leaks and User Ranking Methods. Vitor R. Carvalho, Carnegie Mellon University.


Modeling Intention in Email: Speech Acts, Information Leaks and User Ranking Methods. Vitor R. Carvalho, Carnegie Mellon University

2 Other Research Interests. More details at:
- Info Extraction & Sequential Learning: IJCAI-05 (Stacked Sequential Learning), CEAS-04 (Extract Signatures and Reply Lines in Email). Example: "Richard Stallman, founder of the Free Software Foundation, declared today…"
- Scalable Online & Stacked Algorithms: KDD-06 (Single-pass Online Learning), NIPS-07 WEML (Online Stacked Graphical Learning)
- Keywords for Advertising & Implicit Queries: WWW-06 (Finding Advertising Keywords in Web Pages), CEAS-05 (Implicit Queries for Email)

3 Outline
1. Motivation
2. Email Speech Acts: modeling textual intention in email messages
3. Intelligent Email Addressing: preventing information leaks; ranking potential recipients
4. Fine-tuning Ranking Models: ranking in two optimization steps

4 Why Email?
- The most successful e-communication application. A great tool for collaboration, especially across different time zones. Very cheap, fast, convenient and robust. It just works.
- Increasingly popular: the Clinton administration left 32 million emails to the National Archives; the Bush administration is expected to leave more than 100 million in 2009.
- Visible impact: office workers in the U.S. spend at least 25% of the day on email, not counting handheld use [Shipley & Schwalbe, 2007]

5 Email is hard to manage
- People get overwhelmed: costly interruptions with serious impacts on work productivity; it is increasingly difficult to manage requests, negotiate shared tasks and keep track of different commitments.
- People make horrible mistakes: "I accidentally sent that message to the wrong person", "Oops, I forgot to CC you his final offer", "Oops, did I just hit reply-to-all?"
[Dabbish & Kraut, CSCW-2006] [Belloti et al., HCI-2005]

6 Outline
1. Motivation √
2. Email Speech Acts: modeling textual intention in email messages
3. Intelligent Email Addressing: preventing information leaks; ranking potential recipients
4. Fine-tuning Ranking Models: ranking in two optimization steps

7 Example
From: Benjamin Han
To: Vitor Carvalho
Subject: LTI Student Research Symposium
"Hey Vitor, When exactly is the LTI SRS submission deadline? [Request - Information] Also, don't forget to ask Eric about the SRS webpage. [Reminder - Action/Task] Thanks. Ben"
Prioritize email by "intention". Help keep track of your tasks: pending requests, commitments, reminders, answers, etc. Better integration with to-do lists.

8 [Screenshot: task-creation interface. "Add Task: follow up on 'request for screen shots'", with a reminder "___ days before" and time/date candidates: "next Wed" (12/5/07), "end of the week" (11/30/07), "Sunday" (12/2/07), or other.]

9 Classifying Email into Acts [Cohen, Carvalho & Mitchell, EMNLP-04]
- An email act is described as a verb-noun pair (e.g., propose meeting, request information); not all pairs make sense.
- One single message may contain multiple acts.
- The taxonomy tries to describe commonly observed behaviors, rather than all possible speech acts in English, and also includes non-linguistic usage of email (e.g., delivery of files).
[Diagram: verb and noun taxonomies]

10 Data & Features
- Data: Carnegie Mellon MBA students competition, a semester-long project for CMU MBA students. 277 students in total, divided into 50 teams (4 to 6 students per team); rich in task negotiation.
- Email messages from 5 teams were manually labeled. One of the teams was double-labeled, and the inter-annotator agreement ranges from 0.72 to 0.83 (Kappa) for the most frequent acts.
- Features: n-grams (1-gram, 2-gram, 3-gram, 4-gram and 5-gram).
- Pre-processing: remove signature files and quoted lines (in-reply-to) [Jangada package]; entity normalization and substitution patterns: "Sunday"…"Monday" → [day], [number]:[number] → [hour], "me, her, him, us or them" → [me], "after, before, or during" → [time], etc.
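The substitution patterns above can be sketched with a few regular expressions. This is a minimal illustration: `normalize_entities` is a hypothetical name, and the patterns are approximations of those used in the actual pre-processing, not the exact Jangada/Ciranda rules.

```python
import re

def normalize_entities(text):
    """Apply entity-normalization patterns like those on the slide.

    A sketch: weekday names -> [day], clock times -> [hour],
    personal pronouns -> [me], temporal prepositions -> [time].
    """
    text = re.sub(r'\b(Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)\b',
                  '[day]', text, flags=re.IGNORECASE)
    text = re.sub(r'\b\d{1,2}:\d{2}\b', '[hour]', text)          # e.g. 10:30
    text = re.sub(r'\b(me|her|him|us|them)\b', '[me]', text, flags=re.IGNORECASE)
    text = re.sub(r'\b(after|before|during)\b', '[time]', text, flags=re.IGNORECASE)
    return text
```

Normalizing this way lets n-gram features generalize across specific dates, times and names ("meet [day] at [hour]" fires regardless of which weekday was mentioned).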

11 Error Rate for Various Email Acts
5-fold cross-validation over the labeled emails, SVM with linear kernel [Carvalho & Cohen, HLT-ACTS-06] [Cohen, Carvalho & Mitchell, EMNLP-04]. [Results chart]

12 Best Features. Ciranda: a Java package for email speech act classification. [Table of most predictive features per act]

13 Idea: Predicting Acts from Surrounding Acts [Carvalho & Cohen, SIGIR-05]
[Example of a thread sequence, messages labeled Deliver, Request, Commit, Propose, …]
- An act has little or no correlation with other acts of the same message.
- There is strong correlation between the acts of the previous and next messages.
- Both context and content have predictive value for act classification. Context: a collective classification problem.

14 Collective Classification with Dependency Networks (DN) [Heckerman et al., JMLR-00] [Neville & Jensen, JMLR-07]
In DNs, the full joint probability distribution is approximated with a set of conditional distributions that can be learned independently. The conditional probabilities are calculated for each node given its Markov blanket; here, a message's Markov blanket contains its parent and child messages in the thread (with acts such as Request, Commit, Deliver, …).
Inference: temperature-driven Gibbs sampling [Carvalho & Cohen, SIGIR-05]
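A minimal sketch of temperature-driven Gibbs sampling over a thread. The real system learns the conditional distributions from data; here `gibbs_collective`, the toy content scores, and the relational preference table are all hypothetical, for illustration only.

```python
import math
import random

ACTS = ["Request", "Commit", "Deliver"]

def gibbs_collective(content_scores, edges, iters=200, seed=0):
    """Resample each message's act given its Markov blanket (parent/child
    messages) plus a per-message content score, while annealing the
    temperature. content_scores: list of dicts act -> score.
    edges: (parent, child) index pairs in the thread."""
    rng = random.Random(seed)
    n = len(content_scores)
    labels = [rng.choice(ACTS) for _ in range(n)]
    # Toy relational preference: a Request tends to be answered by a
    # Commit or Deliver in the child message (an illustrative assumption).
    link_bonus = {("Request", "Commit"): 1.0, ("Request", "Deliver"): 1.0}
    neighbors = {}
    for p, c in edges:
        neighbors.setdefault(c, []).append(("parent", p))
        neighbors.setdefault(p, []).append(("child", c))
    for t in range(iters):
        temp = max(0.1, 2.0 * (1 - t / iters))   # annealing schedule
        for i in range(n):
            weights = []
            for act in ACTS:
                s = content_scores[i].get(act, 0.0)
                for role, j in neighbors.get(i, []):
                    pair = (labels[j], act) if role == "parent" else (act, labels[j])
                    s += link_bonus.get(pair, 0.0)
                weights.append(math.exp(s / temp))
            r = rng.random() * sum(weights)
            for act, w in zip(ACTS, weights):
                r -= w
                if r <= 0:
                    labels[i] = act
                    break
    return labels
```

As the temperature drops, the sampler concentrates on high-probability joint labelings, which is how the relational signal (parent/child acts) sharpens the content-only predictions.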

15 Act-by-Act Comparative Results
Kappa values with and without collective classification, averaged over the four team test sets in the leave-one-team-out experiment. Modest improvements over the baseline, and only on acts related to negotiation: Request, Commit, Propose, Meet, Commissive, etc.

16 Applications of Email Acts
- Iterative learning of email tasks and acts [Kushmerick & Khousainov, IJCAI-05]
- Predicting social roles and group leadership [Leusky, SIGIR-04] [Carvalho, Wu & Cohen, CEAS-07]
- Detecting focus in threaded discussions [Feng et al., HLT/NAACL-06]
- Automatically classifying emails into activities [Dredze, Lau & Kushmerick, IUI-06]

17 Outline
1. Motivation √
2. Email Speech Acts √: modeling textual intention in email messages
3. Intelligent Email Addressing: preventing information leaks; ranking potential recipients
4. Fine-tuning User Ranking Models: ranking in two optimization steps

18 - 21 [Slides 18 to 21 are screenshots with no transcript text]

22 Preventing Email Info Leaks [Carvalho & Cohen, SDM-07]
Email leak: a message accidentally sent to the wrong person. Common causes:
1. Similar first or last names, aliases, etc.
2. Aggressive auto-completion of addresses
3. Typos
4. Keyboard settings
Disastrous consequences: expensive lawsuits, brand reputation damage, negotiation setbacks, etc.

23 Preventing Email Info Leaks: Method [Carvalho & Cohen, SDM-07]
1. Create simulated/artificial recipients.
2. Build a model for P(recipients | message): train a classifier on real data to detect synthetically created outliers (added to the true recipient list). Features: textual (subject, body) and network features (frequencies, co-occurrences, etc.).
3. Rank potential outliers: detect the outlier and warn the user based on confidence.

24 Preventing Email Info Leaks [Carvalho & Cohen, SDM-07]
[Diagram: candidate recipients ranked from most likely to least likely outlier]
P(rec_t) = probability that recipient t is an outlier given the message text and the other recipients in the message.

25 Simulating Leaks
On each trial, one of the message's recipients is randomly chosen and an outlier is generated according to one of several options: frequent typos, same/similar last names, identical/similar first names, aggressive auto-completion of addresses, etc. We adopted the 3g-address criteria; else, randomly select an address book entry, or generate a random address NOT in the address book.
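One way to implement a 3g-address-style leak generator is to pick a true recipient at random and look for a confusable address sharing a character trigram with it, with random fallbacks. This is only an approximation of the criteria in the slides; `simulate_leak` and its fallback order are hypothetical.

```python
import random

def char_3grams(s):
    """Set of character trigrams of an address string."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def simulate_leak(recipients, address_book, seed=0):
    """Sketch of 3g-address outlier generation: choose a true recipient,
    then return an address (outside the true recipient list) sharing a
    character trigram with it. Fall back to a random address-book entry,
    then to a random synthetic address."""
    rng = random.Random(seed)
    target = rng.choice(recipients)
    grams = char_3grams(target)
    candidates = [a for a in address_book
                  if a not in recipients and char_3grams(a) & grams]
    if candidates:
        return rng.choice(candidates)
    others = [a for a in address_book if a not in recipients]
    if others:
        return rng.choice(others)
    return "user%d@example.com" % rng.randrange(10 ** 6)
```

The trigram overlap captures exactly the confusions listed above: typos, similar names, and aggressive auto-completion all tend to produce addresses that share character trigrams with the intended one.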

26 Data and Baselines
Enron email dataset, with a realistic setting: for each user, the ~10% most recent sent messages were used as the test set; some basic preprocessing.
Baseline methods (textual similarity, common baselines in IR):
- Rocchio/TF-IDF Centroid [1971]: create a TF-IDF centroid for each user in the address book; for testing, rank users by the cosine similarity between the test message and each centroid.
- KNN-30 [Yang & Chute, 1994]: given a test message, get the 30 most similar messages in the training set; rank each user by the sum of similarities over that 30-message set.
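The Rocchio baseline can be sketched as follows, a minimal bag-of-words implementation; the exact TF-IDF weighting used in the experiments may differ, and the function names are illustrative.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Simple TF-IDF vectors over tokenized documents (a sketch)."""
    df = Counter()
    for d in docs:
        df.update(set(d))
    n = len(docs)
    return [{w: tf * math.log((n + 1) / (df[w] + 1))
             for w, tf in Counter(d).items()} for d in docs]

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rocchio_rank(train_msgs, train_users, test_msg):
    """Rank users by cosine similarity between the test message and each
    user's TF-IDF centroid (sum of that user's message vectors)."""
    vecs = tfidf_vectors(train_msgs + [test_msg])
    test_vec = vecs[-1]
    centroids = {}
    for vec, user in zip(vecs[:-1], train_users):
        c = centroids.setdefault(user, Counter())
        for w, x in vec.items():
            c[w] += x
    scores = {u: cosine(test_vec, c) for u, c in centroids.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

KNN-30 follows the same vector machinery: instead of centroids, score each user by the summed similarity of the 30 training messages nearest to the test message.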

27 Enron Data Preprocessing
- ISI version of Enron: repeated messages and inconsistencies removed.
- Main Enron addresses disambiguated (list provided by Corrada-Emmanuel from UMass).
- Bag-of-words: messages represented as the union of the BOW of the body and the BOW of the subject; some stop words removed.
- Self-addressed messages were removed.

28 Leak Results 1
Average accuracy over 10 trials; on each trial, a different set of outliers is generated. [Chart of baseline accuracies (Rocchio, KNN)]

29 Using Network Features
1. Frequency features: number of received, sent and sent+received messages (from this user).
2. Co-occurrence features: number of times a user co-occurred with all other recipients.
3. Auto features: for each recipient R, find Rm (the address with the maximum score from the 3g-address list of R), then use score(R) - score(Rm) as a feature.

30 Using Network Features (cont.)
Combine the network features with the text-based feature (the text-only scores) using perceptron-based reranking, trained on simulated leaks.

31 Leak Results [Carvalho & Cohen, SDM-07]

32 Finding Real Leaks in Enron
How can we find them? Grep for "mistake", "sorry" or "accident" (note: the message must be from one of the Enron users). Found 2 good cases:
1. germany-c/sent/930: the message has 20 recipients ("Sorry. Sent this to you by mistake.")
2. kitchen-l/sent items/497: 44 recipients ("I accidentally sent you this reminder")
Prediction results: the proposed algorithm was able to find these two leaks.

33 Not the only problem when addressing emails…

34 Sometimes we just… forget an intended recipient
- Particularly in large organizations, it is not uncommon to forget to CC an important collaborator: a manager, a colleague, a contractor, an intern, etc.
- More frequent than expected (from the Enron collection): at least 9.27% of the users have forgotten to add a desired recipient, and at least 20.52% of the users were not included as recipients (even though they were intended recipients) in at least one received message.
- The cost of such errors in task management can be high: communication delays, missed deadlines, wasted opportunities, costly misunderstandings, task delays.
[Carvalho & Cohen, ECIR-2008]

35 Data and Features
- Two ranking problems: predicting TO+CC+BCC and predicting CC+BCC. Labeled data is easy to obtain.
- Features:
  - Textual: Rocchio (TF-IDF) and KNN.
  - Network (from headers): Frequency (number of messages received and/or sent from/to this user); Recency (how often a particular user was addressed in the last 100 messages); Co-occurrence (number of times a user co-occurred with all other recipients; "co-occur" means two recipients were addressed in the same message in the training set).
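The three network feature families can be sketched from a chronological log of sent messages. `network_features` is a hypothetical helper and the exact definitions (normalizations, window handling) in the paper may differ.

```python
from collections import Counter
from itertools import combinations

def network_features(sent_log, candidate, other_recipients, recency_window=100):
    """Compute (frequency, recency, co-occurrence) for `candidate`.

    sent_log: chronological list of recipient lists from the user's sent mail.
    frequency: number of sent messages addressed to the candidate.
    recency: fraction of the last `recency_window` messages addressing them.
    co-occurrence: how often the candidate appeared together with each of
    the other recipients already on the message being composed.
    """
    freq = sum(candidate in rcpts for rcpts in sent_log)
    recent = sent_log[-recency_window:]
    recency = sum(candidate in rcpts for rcpts in recent) / max(len(recent), 1)
    cooc = Counter()
    for rcpts in sent_log:
        for a, b in combinations(sorted(set(rcpts)), 2):
            cooc[(a, b)] += 1
    co = sum(cooc[tuple(sorted((candidate, r)))] for r in other_recipients)
    return freq, recency, co
```

These scores can then be combined with the textual (Rocchio/KNN) scores by a learned ranker, as in the reranking setup described earlier.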

36 Recipient Recommendation: 36 Enron users, ~1267 queries per user on average [Carvalho & Cohen, ECIR-08]. [Results chart]

37 Intelligent Auto-completion: TO+CC+BCC and CC+BCC tasks [Carvalho & Cohen, ECIR-08]. [Results chart]

38 Mozilla Thunderbird plug-in (Cut Once)
- Leak warnings: hit x to remove a recipient.
- Suggestions: hit + to add a recipient.
- Timer: the message is sent after 10 seconds by default.

39 Outline
1. Motivation √
2. Email Speech Acts √: modeling textual intention in email messages
3. Intelligent Email Addressing √: preventing information leaks; ranking potential recipients
4. Fine-tuning User Ranking Models: ranking in two optimization steps

40 Recipient Recommendation 36 Enron users

41 Learning to Rank
Can we do better ranking? Learning to rank: machine learning to improve ranking. Many recently proposed methods: RankSVM [Joachims, KDD-02], ListNet [Cao et al., ICML-07], RankBoost [Freund et al., 2003], and perceptron variations (online, scalable). Here: learning to rank using 2 optimization steps, in a pairwise-based ranking framework [Elsas, Carvalho & Carbonell, WSDM-08].

42 Pairwise-based Ranking
Given a query q and documents d_1, d_2, …, d_T, the goal is to induce a ranking function f(d) such that f(d_i) > f(d_j) whenever d_i should be ranked above d_j. We assume a linear function f(d) = w · d, so the constraints become w · (d_i - d_j) > 0 for every preference pair.

43 Ranking with Perceptrons [Collins, 2002; Gao et al., 2005]
- Nice convergence and mistake bounds (a bound on the number of misranks).
- Online, fast and scalable.
- Many variants: voting, averaging, committee, pocket, etc. General update rule on a misranked pair (d_i, d_j): w ← w + (d_i - d_j).
- Here: averaged version of the perceptron [Elsas, Carvalho & Carbonell, 2008].
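A sketch of the averaged ranking perceptron under the pairwise framework, assuming the standard additive update on misranked pairs (the exact variant used in the thesis may differ):

```python
def averaged_rank_perceptron(pairs, n_features, epochs=10):
    """Averaged perceptron for pairwise ranking.

    pairs: list of (x_pos, x_neg) feature vectors where x_pos should
    outrank x_neg. On a misrank (w scores x_pos no higher than x_neg),
    update w += x_pos - x_neg; return the average of the weight vectors
    over all steps, which is more stable than the final w.
    """
    w = [0.0] * n_features
    w_sum = [0.0] * n_features
    steps = 0
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            margin = sum(wi * (p - q) for wi, p, q in zip(w, x_pos, x_neg))
            if margin <= 0:   # misrank: apply the update rule
                w = [wi + p - q for wi, p, q in zip(w, x_pos, x_neg)]
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            steps += 1
    return [s / steps for s in w_sum]
```

Because each update touches only one pair, the learner is online and scales linearly in the number of preference pairs.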

44 RankSVM [Joachims, KDD-02] [Herbrich et al., 2000]
Minimizing the number of misranks (hinge-loss approximation):
min_w (1/2)||w||^2 + C Σ ξ_ij, subject to w · (d_i - d_j) ≥ 1 - ξ_ij and ξ_ij ≥ 0 for each preference pair (d_i above d_j).
Equivalent to maximizing AUC; a lower bound on MAP, MRR, etc. [Elsas, Carvalho & Carbonell, WSDM-08]

45 - 47 Loss Function. [Plots of candidate pairwise loss functions; the sigmoid-style loss is robust to outliers]

48 Fine-tuning Ranking Models
Start from a base ranking model (e.g., RankSVM, Perceptron, ListNet, etc.), then fine-tune it with SigmoidRank: minimize (a very close approximation of) the empirical error, i.e., the number of misranks. The objective is non-convex, but robust to outliers (label noise); the result is the final model.

49 Learning SigmoidRank Loss Learning with Gradient Descent
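A minimal reconstruction of the SigmoidRank idea: replace the 0/1 misrank count with a sigmoid of the pairwise margin and fine-tune the base ranker's weights by gradient descent. The σ parameter controls the steepness of the sigmoid; the exact loss and optimizer details in the thesis may differ.

```python
import math

def sigmoid_rank_loss(w, pairs, sigma=1.0):
    """Smooth approximation of the number of misranks: each pair
    (x_pos, x_neg) contributes 1/(1 + exp(sigma * margin)), which is near 1
    on a misrank (negative margin) and near 0 on a correct ranking."""
    loss = 0.0
    for x_pos, x_neg in pairs:
        margin = sum(wi * (p - q) for wi, p, q in zip(w, x_pos, x_neg))
        loss += 1.0 / (1.0 + math.exp(sigma * margin))
    return loss

def fine_tune(w, pairs, sigma=1.0, lr=0.5, steps=50):
    """Gradient descent on the sigmoid rank loss, starting from a base
    ranker's weight vector w. d/dw of each term is -sigma*s*(1-s)*(x_pos-x_neg)
    where s is that pair's sigmoid value."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x_pos, x_neg in pairs:
            d = [p - q for p, q in zip(x_pos, x_neg)]
            margin = sum(wi * di for wi, di in zip(w, d))
            s = 1.0 / (1.0 + math.exp(sigma * margin))
            for i, di in enumerate(d):
                grad[i] += -sigma * s * (1.0 - s) * di
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w
```

The factor s*(1-s) vanishes for pairs with very large (positive or negative) margins, which is why this loss is robust to outliers: a badly mislabeled pair stops pulling on the weights, unlike under a convex hinge loss.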

50 Recipient Results: 36 Enron users, ~1267 queries per user on average. [Table of relative improvements: 12.7%, 1.69%, -0.09%; 13.2%, 0.96%, 2.07%; significance: p=0.74, p<0.01, p=0.06, p<0.01]

51 Recipient Results: 36 Enron users. [Table of relative improvements: 26.7%, 1.22%, 0.67%; 15.9%, 0.77%, 4.51%; significance: p<0.01, p=0.55, p<0.01]

52 Recipient Results: 36 Enron users. [Table of relative improvements: 8.71%, 1.78%, -0.2%; 12.9%, 2.68%, 1.53%]

53 Recipient Results: 36 Enron users; MAP values, CC+BCC task. [Results chart]

54 Set Expansion (SEAL) Results [Wang & Cohen, ICDM-2007] (18 features, ~120/60 train/test splits, ~half relevant). [Table of relative improvements: 1.76%, 0.46%, 3.22%; 2.68%, 0.11%, 3.20%; 1.94%, 0.44%, 1.81%]

55 LETOR Results [Liu et al., SIGIR-LR4IR 2007] (#queries/#features: 106/25, 50/44, 75/44). [Table of relative improvements: 41.8%, 0.38%, -0.2%; 261%, 7.86%, 19.5%; 21.5%, 2.51%, 2.02%]

56 Learning Curve (TO+CC+BCC task, Enron user lokay-m): a few steps to convergence; the base model provides a good starting point.

57 Sigma Parameter

58 Conclusions
- Email acts: managing and tracking commitments, requests, etc. (semi-)automatically.
- Preventing users' mistakes: leaks (accidentally adding non-intended recipients) and recipient prediction (forgetting intended recipients); Mozilla Thunderbird extension.
- Ranking in two optimization steps: a closer approximation to minimizing the number of misranks (empirical risk minimization framework); robust to outliers (when compared to convex losses); fine-tunes any base learner in a few steps from a good starting point.

59 Related Work 1
Email acts:
- Speech Act Theory [Austin, 1962; Searle, 1969]
- Email classification: spam, foldering, etc.
- Dialog acts for speech recognition, machine translation, and other dialog-based systems [Stolcke et al., 2000] [Levin et al., 03]: typically one act per utterance (or sentence), and more fine-grained taxonomies with a larger number of acts.
- Email is a new domain. Winograd's Coordinator (1987): users manually annotated email with intent.
- Related applications: focus of messages in threads/discussions [Feng et al., 2006], action-item discovery [Bennett & Carbonell, 2005], activity classification [Dredze et al., 2006], task-focused summarization [Corston-Oliver et al., 2004], predicting social roles [Leusky, 2004], etc.

60 Related Work 2
Email leaks: [Boufaden et al., 2005] proposed a privacy-enforcement system to monitor specific privacy breaches (student names, student grades, IDs).
Recipient recommendation: [Pal & McCallum, 2006] (the CC prediction problem), [Dredze et al., 2008] (recipient prediction based on summary keywords).
Expert search in email: [Dom et al., 2003], [Campbell et al., 2003], [Balog & de Rijke, 2006], [Balog et al., 2006], [Soboroff, Craswell & de Vries (TREC Enterprise track)].

61 Related Work 3
Ranking in two optimization steps: [Perez-Cruz et al., 2003] proposed a similar idea in the SVM classification context (empirical risk minimization); see also [Xu, Crammer & Schuurmans, 2006], [Krause & Singer, 2004], [Zhan & Shen, 2005], etc.
SVMs robust to outliers and label noise: [Collobert et al., 2006], [Liu et al., 2005] (the convexity tradeoff).

62 Thank you.