Intelligent Email: Reply and Attachment Prediction
  Mark Dredze, Tova Brooks, Josh Carroll, Joshua Magarick, John Blitzer, Fernando Pereira
  Presented by Nareg Torosian
What’s the use?
  Whittaker & Sidner’s “email overload”
    Task management
    Personal archiving
    Asynchronous communication
  Assist overwhelmed users
  Support enhanced interface
Intelligent? How?
  Prediction tasks treated as binary classification problems
  Each message represented as a binary vector, where each dimension represents a feature
  Learning performed with logistic regression
  System evaluated using F1, the harmonic mean of precision and recall
  Single-user (adaptive) and cross-user (adaptable) settings
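The pipeline on this slide (binary feature vectors, logistic regression, F1) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the three toy features, the training data, the learning rate and the epoch count are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, epochs=200, lr=0.5):
    """Fit weights and a bias by stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log loss with respect to the score
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x, threshold=0.5):
    score = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return 1 if score >= threshold else 0

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data (assumed): [has_question_mark, sender_in_address_book, has_attachment]
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 1]]
y = [1, 1, 0, 0, 1, 0]  # 1 = needs reply

w, b = train_logistic_regression(X, y)
preds = [predict(w, b, x) for x in X]
print(f1_score(y, preds))
```

F1 is used rather than accuracy because the positive class is rare in both tasks, so a classifier that always answers "no" would look accurate while being useless.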
Reply prediction
  Indicate which messages require reply
  Allow user to manage these messages
Reply prediction features
  Relational features
    Based on user profile
    # of sent and received messages, address book, email address and domain
    “I appear in the CC list”, “I frequently reply to this user”, etc.
    200 in Dredze et al.’s experiment
  Document features
    Presence of question marks and question words
    TF-IDF (term frequency – inverse document frequency) scores
    Presence of attachments
    14,800 in Dredze et al.’s experiment
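A few of the features listed above can be sketched as a function that turns one message into binary features. Everything named here is hypothetical: the message and profile dictionaries, the question-word list, and the threshold of three past replies are assumptions for illustration, and TF-IDF scores are left out to keep the sketch short.

```python
import re

# Assumed question-word list; the source does not give the one actually used.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which",
                  "can", "could", "would"}

def reply_features(message, profile):
    """Return a dict of binary features for one incoming message."""
    tokens = set(re.findall(r"[a-z']+", message["body"].lower()))
    sender = message["from"].lower()
    domain = sender.rsplit("@", 1)[-1]
    return {
        # Document features: computed from the message alone
        "has_question_mark": int("?" in message["body"]),
        "has_question_word": int(bool(tokens & QUESTION_WORDS)),
        "has_attachment": int(bool(message["attachments"])),
        # Relational features: computed against the user's profile
        "sender_in_address_book": int(sender in profile["address_book"]),
        "sender_in_my_domain": int(domain == profile["domain"]),
        "i_am_in_cc": int(profile["address"] in message["cc"]),
        # Assumed threshold: three or more past replies counts as "frequent"
        "i_often_reply_to_sender": int(profile["reply_counts"].get(sender, 0) >= 3),
    }

profile = {
    "address": "me@example.com",
    "domain": "example.com",
    "address_book": {"alice@example.com"},
    "reply_counts": {"alice@example.com": 5},
}
message = {
    "from": "Alice@example.com",
    "cc": ["me@example.com"],
    "body": "Can you send the report?",
    "attachments": [],
}
features = reply_features(message, profile)
print(features)
```

The split mirrors the slide: document features need only the message text, while relational features need the user's history, which is why only the latter can transfer between users.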
The grand experiment
  Evaluated on 4 user mailboxes
  Users manually tagged messages as either needs reply or does not need reply
    “It is not surprising that overwhelmed users acknowledge that a message did require their reply even though they failed to do so; classifiers trained on actual user reply behavior are thus very poor.”
  2,391 total emails, excluding spam
  80/20 train/test split
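An 80/20 split can be sketched as below. The slide does not say whether the split was random or chronological, or whether it was made per mailbox, so the seeded random shuffle and the use of the pooled message count are assumptions for illustration only.

```python
import random

def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of the items and cut it into train and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Message ids stand in for the 2,391 tagged messages.
train, test = train_test_split(range(2391))
print(len(train), len(test))  # 1912 479
```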
The single-user results
The cross-user results
  Only relational features were effective, so others omitted
Attachment prediction
  “See attachment…hey, wait a minute…”
  Possible UI considerations
    Document sidebar
    Alert user before sending
    Indicate which messages need attachments
Attachment prediction features
  Relational features
    Based on user profile
    # of sent and received messages, # of attachments, email address and domain
    Conjunctions between volume of messages/attachments and TO/CC fields
    72 in Dredze et al.’s experiment
  Document features
    Presence and placement of “attach”
    Presence of attachments
    39,308 in Dredze et al.’s experiment
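The “presence and placement of attach” feature can be sketched as follows. The slide does not say how placement was encoded, so the first-line and last-line indicators, the regular expression, and the function name are assumptions for illustration.

```python
import re

# Matches "attach", "attached", "attachment", and so on, in any letter case.
ATTACH_RE = re.compile(r"\battach\w*", re.IGNORECASE)

def attach_features(body):
    """Return binary features for whether and where the body mentions 'attach'."""
    lines = [ln for ln in body.splitlines() if ln.strip()]  # skip blank lines
    hits = [i for i, ln in enumerate(lines) if ATTACH_RE.search(ln)]
    if not hits:
        return {"mentions_attach": 0, "attach_in_first_line": 0, "attach_in_last_line": 0}
    return {
        "mentions_attach": 1,
        "attach_in_first_line": int(hits[0] == 0),
        "attach_in_last_line": int(hits[-1] == len(lines) - 1),
    }

print(attach_features("Hi Bob,\nPlease see the attached report.\nThanks"))
```

A UI that alerts the user before sending could run such features through the trained classifier whenever a draft has no file attached.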
The grander experiment
  Evaluated on publicly available Enron corpus
    150 users and 250,000 emails
    Lots of cleanup needed
  Users manually tagged messages as needs attachment
  Only popular document formats
  Forwarded messages excluded
  Subset of 15,000 messages from 144 users
    1,020 with attachments
  10-fold cross validation
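10-fold cross validation can be sketched as an index generator: each fold serves once as the test set while the other nine are used for training. This is a plain, unshuffled sketch under assumed names; with only 1,020 of 15,000 messages positive (about 7%), a real run would likely shuffle or stratify the folds, which the slide does not specify.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_items))
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional item when k does not divide n.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(15000, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 10 13500 1500
```

The reported score would then be an aggregate of the per-fold F1 values, so every message is used for testing exactly once.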
The results
GUEPs and CDs
  GUEPs
    Mental model
    Improvement
    Consistency
  CDs
    Premature commitment
    Hidden dependencies
    Abstraction
    Consistency
    Provisionality