KDD 2012, Beijing, China Community Discovery and Profiling with Social Messages Wenjun Zhou Hongxia Jin Yan Liu.

1 Community Discovery and Profiling with Social Messages
Wenjun Zhou, Hongxia Jin, Yan Liu
KDD 2012, Beijing, China

2 Background
Statistics show that email is ubiquitous in the workplace:
  Use email daily or several times each week: 97%
  Consider email “essential” for their everyday work: 71% (Institute for the Future; Bowes 2000)
  US workers average 49 minutes a day managing email, and 25% spend more than one hour per day on that task (Gartner 2001).
With the availability of additional online social media, information overload has become problematic.

3 http://blog.stephenwolfram.com/2012/03/the-personal-analytics-of-my-life/

6 Motivation
Help users boost productivity:
  Summarize their work areas automatically
  Keep track of past and ongoing collaborations
  Prioritize work-related tasks

7 Problem Formulation
Given: a user’s emails
Find: the user’s work profile, i.e. a set of work areas
Constraints:
  Unsupervised (or semi-supervised later on)
  Effectiveness in providing insights
  Computational efficiency
Example work profile:
  Work area | Keywords                      | People
  Teaching  | class, homework, score        | Alice, Bob, Charlie
  Research  | email, mining, data, paper    | Hongxia, Yan
  Advising  | meeting, report, draft        | Dane, Ellen, Flint
  Grants    | project, proposal, grant, due | Sarah, Tim

8 Traditional Community Finding

9 Community (i.e. Work Area)
Two aspects:
  people (whom you collaborate with)
  task (what you collaborate on)

10 The Email Data

11 Data Preprocessing
People (email accounts):
  Roles are disregarded; only occurrence is considered
Content (subject + body):
  Punctuation and stop words are removed
  Words are stemmed
  Documents are converted into bags of words
Unused:
  Duplicate messages
  Time-stamps
  Attachments
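The preprocessing steps on slide 11 can be sketched roughly as follows. This is only an illustration under stated assumptions: the message field names (`sender`, `to`, `cc`, `subject`, `body`), the stop-word list, and the `crude_stem` helper are hypothetical and do not come from the slides; a real pipeline would likely use an established stemmer such as Porter's.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the slides do not give one).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
    "are", "be", "we", "you", "i", "it", "this", "that", "with", "at", "by",
}

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(message):
    """Return (people, bag_of_words) for one message given as a dict."""
    # People: ignore roles (sender/to/cc) and record only who occurs.
    people = set()
    for field in ("sender", "to", "cc"):
        value = message.get(field, [])
        if isinstance(value, str):
            value = [value]
        people.update(addr.strip().lower() for addr in value)

    # Content: subject + body, punctuation dropped by keeping letter runs only.
    text = message.get("subject", "") + " " + message.get("body", "")
    tokens = re.findall(r"[a-z]+", text.lower())

    # Remove stop words, stem, and count occurrences (bag of words).
    words = Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)
    return people, words
```

Time-stamps and attachments are simply never read here, which mirrors the "Unused" items on the slide; removing duplicate messages would be a separate step before calling `preprocess`.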

12 Topic Models: A Bayesian Approach
Assume:
  a topic is a distribution over words
  a document is a mixture of topics
  each word of a document is generated by first sampling a topic from the document's mixture, then sampling a word from that topic
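The generative assumption on slide 12 can be illustrated with a small sampling sketch. The topic names and keywords are borrowed from the example on slide 7, but the probabilities, the `TOPICS` table, and the `generate_document` function are made up for illustration; this shows the sampling idea only, not inference.

```python
import random

# Hypothetical topics: each one is a probability distribution over words.
TOPICS = {
    "teaching": {"class": 0.5, "homework": 0.3, "score": 0.2},
    "research": {"email": 0.2, "mining": 0.3, "data": 0.3, "paper": 0.2},
}

def generate_document(topic_mixture, topics, n_words, rng):
    """Sample n_words (topic, word) pairs from a document's topic mixture."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[t] for t in topic_names]
    doc = []
    for _ in range(n_words):
        # Step 1: pick a topic according to the document's mixture.
        topic = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        # Step 2: pick a word according to that topic's word distribution.
        vocab = list(topics[topic])
        word_weights = [topics[topic][w] for w in vocab]
        word = rng.choices(vocab, weights=word_weights, k=1)[0]
        doc.append((topic, word))
    return doc

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    mixture = {"teaching": 0.7, "research": 0.3}
    for topic, word in generate_document(mixture, TOPICS, 8, rng):
        print(topic, word)
```

Topic modeling runs this story in reverse: given only the observed words, it estimates the topics and each document's mixture.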

13 Latent Dirichlet Allocation (Blei et al., 2003)
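The slide's figure is not available in this transcript. For reference, the generative process of LDA in its commonly used smoothed form can be written as below. The notation (K topics, D documents, hyperparameters α and β, topic-word distributions φ, document-topic mixtures θ) follows a widespread convention and is an assumption here, not taken from the slide.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Multinomial}(\phi_{z_{d,n}}), & n &= 1,\dots,N_d
\end{align*}
```

Here z_{d,n} is the topic assigned to the n-th word of document d, and w_{d,n} is the observed word.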

14 COllaborator COMmunity Profiling Model (COCOMP)

15 Enron Emails

17 Social Messages

20 Summary
COCOMP: a latent community model
  Each social media document corresponds to a sharing activity within a community.
  A community is represented by a list of top participants and an associated list of topics.
  Experiments on email and social media datasets demonstrate interesting results.
Future work:
  Different sources of data with the same user
  Evolution over time with incremental learning
  Scalable inference with user feedback

21 Thank You!

