Published by Alyson Williams. Modified over 9 years ago.
1
Community Discovery and Profiling with Social Messages
Wenjun Zhou, Hongxia Jin, Yan Liu
KDD 2012, Beijing, China
2
Background
  Statistics show that email is ubiquitous in the workplace
    Use email daily or several times each week: 97%
    Consider email “essential” for their everyday work: 71%
    (Institute for the Future; Bowes 2000)
    US workers average 49 minutes a day managing email, and 25% spend more than one hour per day on that task (Gartner 2001)
  With the availability of additional online social media, information overload has become problematic
3
http://blog.stephenwolfram.com/2012/03/the-personal-analytics-of-my-life/
6
Motivation
  Help users boost productivity
    Summarize their work areas automatically
    Keep track of past and ongoing collaborations
    Prioritize work-related tasks
7
Problem Formulation
  Given: a user’s emails
  Find: the user’s work profile, i.e. a set of work areas
  Constraints:
    Unsupervised (or semi-supervised later on)
    Effectiveness in providing insights
    Computational efficiency
  Example work areas (keywords; people):
    Teaching: class, homework, score; Alice, Bob, Charlie
    Research: email, mining, data, paper; Hongxia, Yan
    Advising: meeting, report, draft; Dane, Ellen, Flint
    Grants: project, proposal, grant, due; Sarah, Tim
8
Traditional Community Finding
9
Community (i.e. Work Area)
  Two aspects
    people (whom you collaborate with)
    task (what you collaborate on)
10
The Email Data
11
Data Preprocessing
  People (email accounts)
    Disregarded roles; only considered occurrence
  Content (subject + body)
    Removed punctuation and stop words
    Stemmed words
    Converted documents into bags of words
  Unused
    Replicated messages
    Time stamps
    Attachments
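The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the stop-word list, the crude suffix stripper `naive_stem` (a stand-in for a real stemmer such as Porter's), and the helper names `bag_of_words` and `participants` are all assumptions. Reading "roles" as sender, To, and Cc is likewise an assumption.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption: the slides do not name the list used).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def naive_stem(word):
    """Crude suffix stripper standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(subject, body):
    """Lowercase subject + body, drop punctuation and stop words, stem, and count."""
    text = f"{subject} {body}".lower()
    tokens = re.findall(r"[a-z]+", text)  # keeps letters only, so punctuation is dropped
    return Counter(naive_stem(t) for t in tokens if t not in STOP_WORDS)


def participants(sender, to=(), cc=()):
    """Collect email accounts by occurrence only, ignoring the role each one played."""
    return {addr.strip().lower() for addr in (sender, *to, *cc)}


bow = bag_of_words("Homework scores", "The homework is graded.")
people = participants("Alice@example.org", to=["bob@example.org"], cc=["alice@example.org"])
```

Each message then reduces to a pair (set of accounts, word counts), which is the kind of input a topic-style model consumes.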
12
Topic Models: A Bayesian Approach
  Assume:
    a topic is a unique distribution of words
    a document has a mixture of topics
    documents are generated by sampling from topics and words
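The three assumptions above can be made concrete with a small sampling sketch. The topic names and words reuse the example work areas from the problem formulation; the probabilities and the function name `generate_document` are made up for illustration, and the distributions are fixed by hand rather than learned.

```python
import random


def generate_document(topic_word, doc_topic, length, rng):
    """For each word slot: draw a topic from the document's mixture,
    then draw a word from that topic's word distribution."""
    topics = list(doc_topic)
    words = []
    for _ in range(length):
        z = rng.choices(topics, weights=[doc_topic[t] for t in topics])[0]
        vocab = list(topic_word[z])
        w = rng.choices(vocab, weights=[topic_word[z][v] for v in vocab])[0]
        words.append((z, w))
    return words


# Each topic is a distribution over words (illustrative numbers, not fitted values).
topic_word = {
    "teaching": {"class": 0.5, "homework": 0.3, "score": 0.2},
    "research": {"paper": 0.4, "data": 0.3, "mining": 0.3},
}
# The document is a mixture of topics.
doc_topic = {"teaching": 0.7, "research": 0.3}

doc = generate_document(topic_word, doc_topic, 8, random.Random(0))
```

Inference runs this story in reverse: given only the observed words, it estimates the topic distributions and the per-document mixtures.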
13
Latent Dirichlet Allocation (Blei et al., 2003)
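For reference, the generative process of the commonly used smoothed variant of LDA can be written as follows. The symbols are the conventional ones and are an assumption here, since the slide text itself gives no notation: K topics, D documents, N_d words in document d, Dirichlet hyperparameters alpha and beta.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
z_{d,n} &\sim \mathrm{Multinomial}(\theta_d), & n &= 1, \dots, N_d \\
w_{d,n} &\sim \mathrm{Multinomial}(\phi_{z_{d,n}}), & n &= 1, \dots, N_d
\end{align*}

p(w, z, \theta, \phi \mid \alpha, \beta)
  = \prod_{k=1}^{K} p(\phi_k \mid \beta)
    \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \phi_{z_{d,n}})
```

Here phi_k is the word distribution of topic k and theta_d is the topic mixture of document d, matching the assumptions listed for topic models.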
14
COllaborator COMmunity Profiling Model (COCOMP)
15
Enron Emails
17
Social Messages
20
Summary
  COCOMP: a latent community model
    Each social media document corresponds to a sharing activity within a community.
    A community is represented by a list of top participants and an associated list of topics.
    Experiments on email and social media datasets demonstrate interesting results.
  Future work
    Different sources of data for the same user
    Evolution over time with incremental learning
    Scalable inference with user feedback
21
Thank You!