KDD 2012, Beijing, China
Community Discovery and Profiling with Social Messages
Wenjun Zhou, Hongxia Jin, Yan Liu
Background
Statistics show that email is ubiquitous in the workplace.
With the availability of additional online social media, information overload has become problematic.
  Use email daily or several times each week: 97%
  Consider email "essential" for their everyday work: 71%
  Source: Institute for the Future (Bowes 2000)
  US workers average 49 minutes a day managing email, and 25% spend more than one hour per day on that task (Gartner 2001).
personal-analytics-of-my-life/
Motivation
Help users boost productivity:
  Summarize their work areas automatically
  Keep track of past and ongoing collaborations
  Prioritize work-related tasks
Problem Formulation
Given: a user's emails
Find: the user's work profile, i.e., a set of work areas
Constraints:
  Unsupervised (or semi-supervised later on)
  Effectiveness in providing insights
  Computational efficiency
Example work profile (work area: keywords; people):
  Teaching: class, homework, score; Alice, Bob, Charlie
  Research: email, mining, data, paper; Hongxia, Yan
  Advising: meeting, report, draft; Dane, Ellen, Flint
  Grants: project, proposal, grant, due; Sarah, Tim
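One way to picture the target output is as a small data structure: a profile holding several work areas, each with its keywords and people. The sketch below is only an illustration; the class and method names (`WorkArea`, `WorkProfile`, `areas_for_person`) are hypothetical and do not come from the talk, and the values are a few of the slide's example work areas.

```python
from dataclasses import dataclass, field


@dataclass
class WorkArea:
    """One work area: a label, its typical keywords, and the people involved."""
    label: str
    keywords: list[str] = field(default_factory=list)
    people: list[str] = field(default_factory=list)


@dataclass
class WorkProfile:
    """A user's work profile: a set of work areas."""
    areas: list[WorkArea] = field(default_factory=list)

    def areas_for_person(self, name: str) -> list[str]:
        """Return the labels of the work areas in which `name` appears."""
        return [a.label for a in self.areas if name in a.people]


# Example values copied from the slide's illustration.
profile = WorkProfile(areas=[
    WorkArea("Teaching", ["class", "homework", "score"], ["Alice", "Bob", "Charlie"]),
    WorkArea("Advising", ["meeting", "report", "draft"], ["Dane", "Ellen", "Flint"]),
    WorkArea("Grants", ["project", "proposal", "grant", "due"], ["Sarah", "Tim"]),
])
```

A lookup such as `profile.areas_for_person("Sarah")` then answers "which work areas do I share with this person?", which is the kind of insight the profile is meant to provide.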
Traditional Community Finding
Community (i.e., Work Area)
Two aspects:
  People (whom you collaborate with)
  Task (what you collaborate on)
The Data
Data Preprocessing
People (email accounts):
  Roles are disregarded; only occurrence is considered.
Content (subject + body):
  Punctuation and stop words are removed.
  Words are stemmed.
  Documents are converted into bags of words.
Unused:
  Replicated messages
  Time stamps
  Attachments
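The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list is a tiny made-up sample, `crude_stem` is a hypothetical stand-in for a real stemmer (the slides do not say which one was used), and the example message and addresses are invented.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the actual list is not given).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "for", "on", "in", "please"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(subject, body, accounts):
    """Return (set of people, bag of words) for one message."""
    # People: roles are disregarded, so only the occurrence of an account matters.
    people = {a.strip().lower() for a in accounts}
    # Content: subject + body, lowercased, with punctuation and digits dropped.
    tokens = re.findall(r"[a-z]+", f"{subject} {body}".lower())
    words = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return people, Counter(words)


people, bag = preprocess(
    subject="Grant proposal due",
    body="Please review the proposals and the reports.",
    accounts=["Sarah@example.com", "tim@example.com", "sarah@example.com"],
)
```

Here `people` is a set, so an account that appears several times in one message counts once, and `bag` maps each stemmed word to its count in the message.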
Topic Models: A Bayesian Approach
Assume:
  A topic is a unique distribution of words.
  A document has a mixture of topics.
  Documents are generated by sampling from topics and words.
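The three assumptions can be made concrete with a toy generative sketch: draw a topic mixture for the document, then for each word position pick a topic and then a word from that topic. The topic vocabularies echo the slide's examples, but the probabilities, the prior `alpha`, and the function names are made up for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document from a mixture of topics.

    topics: list of dicts mapping word -> probability (one dict per topic).
    alpha:  Dirichlet prior over topic proportions (hypothetical values below).
    """
    theta = sample_dirichlet(alpha, rng)  # this document's topic mixture
    words = []
    for _ in range(length):
        # Pick a topic according to the document's mixture.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Pick a word according to that topic's word distribution.
        vocab = list(topics[z])
        w = rng.choices(vocab, weights=[topics[z][v] for v in vocab])[0]
        words.append(w)
    return theta, words


# Two toy topics; the probabilities are invented for this sketch.
topics = [
    {"class": 0.5, "homework": 0.3, "score": 0.2},
    {"project": 0.4, "proposal": 0.3, "grant": 0.3},
]
rng = random.Random(0)
theta, doc = generate_document(topics, alpha=[0.5, 0.5], length=8, rng=rng)
```

Inference runs this story in reverse: given only the observed words, it estimates the topics and each document's mixture.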
Latent Dirichlet Allocation (Blei et al., 2003)
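For reference, the standard LDA formulation can be written as a joint distribution over one document's topic mixture, topic assignments, and words. The notation below follows Blei et al. (2003) rather than anything shown on this slide: \alpha is the Dirichlet prior, \beta holds the per-topic word distributions, \theta is the document's topic mixture, and z_n and w_n are the topic and word at position n of an N-word document.

```latex
\theta \sim \mathrm{Dirichlet}(\alpha), \qquad
z_n \mid \theta \sim \mathrm{Multinomial}(\theta), \qquad
w_n \mid z_n, \beta \sim \mathrm{Multinomial}(\beta_{z_n})

p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```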
COllaborator COMmunity Profiling Model (COCOMP)
Enron Emails
Social Messages
Summary
COCOMP: a latent community model
  Each social media document corresponds to a sharing activity within a community.
  A community is represented by a list of top participants and an associated list of topics.
  Experiments on email and social media datasets demonstrate interesting results.
Future work
  Different sources of data for the same user
  Evolution over time with incremental learning
  Scalable inference with user feedback
Thank You!