
Mining Social Network for Personalized Email Prioritization
Shinjae Yoo, Yiming Yang, Frank Lin, and Il-Chul Moon
Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Outline
- Problem Description
- Approaches
- Experiments
- Contributions

Problem Description
- Email overload is a severe problem.
- Identifying the importance of each email can alleviate email overload.
- Challenges:
  - We have no access to other people's emails and labels.
  - Personalized labeling is time-consuming, so the training data is sparse.
  - The same message may have different priority labels for different recipients.
- We therefore leverage each user's social network to make the most of the sparse training data.

Outline
- Problem Description
- Approaches
  - Social Clustering
  - Social Importance
  - Semi-supervised Importance Propagation
- Experiments
- Conclusion and Future Work

Social Clustering – Motivation
- Personal email inbox:
  - Contains many unlabeled emails.
  - Raises no privacy issue, since each user mines only their own inbox.
- Observations:
  - The sender can be an important signal.
  - Some senders do not appear in the training set at all, or appear in very few instances.
  - We therefore need to generalize over senders.
- Idea: find similar senders in the user's social network.

Social Clustering – Contact Network
- Personal contact network G = (V, E), whose vertices are agents/persons.
- The entire network is constructed from the user's personal inbox.
[Figure: example personal contact network of agents/persons]
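The slide does not say how edges are formed. As a minimal sketch, assuming that two contacts are linked whenever they appear on the same message (as sender, To, or Cc recipient), and that the inbox owner is left out so the graph links the contacts themselves, the network could be built as below. The function name `build_contact_network` and the message dictionary layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_contact_network(messages, owner):
    """Build an undirected, weighted contact graph G = (V, E) from one inbox.

    ASSUMPTION: two contacts share an edge when they appear on the same
    message (From, To, or Cc); the weight counts such messages. The inbox
    owner is excluded so that the graph links the contacts themselves.
    """
    vertices = set()
    weights = defaultdict(int)
    for msg in messages:
        people = {msg["from"], *msg.get("to", []), *msg.get("cc", [])}
        people.discard(owner)
        vertices |= people
        # Sorting gives each undirected edge a single canonical key.
        for u, v in combinations(sorted(people), 2):
            weights[(u, v)] += 1
    return vertices, dict(weights)

inbox = [  # hypothetical messages
    {"from": "alice", "to": ["me"], "cc": ["bob"]},
    {"from": "bob", "to": ["me"]},
    {"from": "carol", "to": ["me", "alice"]},
]
V, E = build_contact_network(inbox, owner="me")
print(sorted(V))  # ['alice', 'bob', 'carol']
print(E)          # {('alice', 'bob'): 1, ('alice', 'carol'): 1}
```

Weighting edges by co-occurrence count keeps frequent contacts strongly tied; a plain 0/1 graph would also work for the clustering step.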

Social Clustering – Newman Clustering
- Newman clustering algorithm [Newman, 04]
  - Finds social cliques, i.e., cohesive social groups.
  - Based on edge betweenness: the number of shortest paths that pass through an edge, divided by the total number of shortest paths.
  - Repeatedly drops the edge with the highest edge betweenness.
  - Produces a hard clustering: each person belongs to exactly one group.
[Figure: example graph split into Group A and Group B]
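A minimal, dependency-free sketch of betweenness-based divisive clustering in the spirit of [Newman, 04], for an unweighted, undirected graph stored as a neighbour map. Edge betweenness is computed with Brandes' accumulation (each shortest s-t path contributes 1/sigma_st to every edge it uses) and is recomputed after every removal. Stopping at a fixed number of clusters (`n_clusters`) is an assumption made for illustration; a modularity-based stopping rule is another common choice. All function names are hypothetical.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes' method).

    adj maps each node to the set of its neighbours. For every pair (s, t),
    each shortest s-t path adds 1 / sigma_st to every edge on it.
    """
    eb = defaultdict(float)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and remember predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, splitting dependency over edges.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair (s, t) was visited from both ends.
    return {e: b / 2.0 for e, b in eb.items()}

def connected_components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj, n_clusters):
    """Hard clustering: drop the highest-betweenness edge until n_clusters groups appear."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while len(connected_components(adj)) < n_clusters:
        eb = edge_betweenness(adj)  # recomputed after every removal
        if not eb:
            break
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return connected_components(adj)

# Hypothetical example: two triangles joined by the bridge 3-4.
adj = defaultdict(set)
for a, b in [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]:
    adj[a].add(b)
    adj[b].add(a)
adj = dict(adj)
print(edge_betweenness(adj)[frozenset((3, 4))])  # 9.0
print(girvan_newman(adj, 2))                      # [{1, 2, 3}, {4, 5, 6}]
```

The bridge carries all 9 shortest paths between the two triangles, so it is the first edge dropped, leaving the two groups as the hard clustering.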

Social Clustering – Validations
- Clusters are coherent!

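Social Clustering – Feature Incorporation
- Extended vector space:
  - text: the email's text feature vector
  - social network: the social-clustering feature vector
  - combined: the text vector extended with the social-network vector
- The combined vector space is used as an enriched feature set for the email prioritizer.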
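The slide's vector formulas were lost in extraction. As a hedged sketch, assuming the social-network part is a one-hot indicator of the sender's Newman cluster, extending the text vector could look like this; the function `combined_features` and its arguments are hypothetical.

```python
def combined_features(text_vec, cluster_id, n_clusters):
    """Extend a text feature vector with a one-hot social-cluster block.

    ASSUMPTION: the social-network features are a one-hot indicator of the
    sender's Newman cluster; cluster_id is None for senders outside the network.
    """
    social_vec = [0.0] * n_clusters
    if cluster_id is not None:
        social_vec[cluster_id] = 1.0
    return list(text_vec) + social_vec

x = combined_features([0.2, 0.0, 0.7], cluster_id=1, n_clusters=3)
print(x)  # [0.2, 0.0, 0.7, 0.0, 1.0, 0.0]
```

Senders with no cluster get an all-zero social block, so the prioritizer falls back on the text features for them, while senders that are rare in training but share a cluster with frequent ones share the same social features.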

Social Importance – Motivations
- Social importance: a person at the center of a cluster might be more important than others.
- Betweenness:
  - Edge betweenness is used for Newman clustering.
  - Vertex betweenness measures how much a person acts as a communication bottleneck in the social network, i.e., a contact point between parts of the network; such a person might be important.
- Other kinds of social importance metrics may be useful too.
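Vertex betweenness can be computed with Brandes' algorithm. The sketch below assumes an unweighted, undirected contact graph stored as a neighbour map; `vertex_betweenness` is a hypothetical helper name. In the example, `bob` is the only contact point between the others, so he is the communication bottleneck.

```python
from collections import deque

def vertex_betweenness(adj):
    """Vertex betweenness of an unweighted, undirected graph (Brandes' algorithm).

    adj maps each node to the set of its neighbours. A node's score is the
    sum, over pairs (s, t) not involving it, of the fraction of shortest
    s-t paths that pass through it.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: shortest-path counts and predecessor lists.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair counted twice

# Hypothetical star: bob is the only contact point between the others.
adj = {"ann": {"bob"}, "bob": {"ann", "cat", "dan"}, "cat": {"bob"}, "dan": {"bob"}}
print(vertex_betweenness(adj))  # {'ann': 0.0, 'bob': 3.0, 'cat': 0.0, 'dan': 0.0}
```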

Social Importance – Metrics
- Degree (in, out, total) [Wasserman and Faust, 94]
- Clique Counts (ClqCnt) [Wasserman and Faust, 94]: the number of clique subgraphs that contain node $v$
- Betweenness (BetCent) [Freeman, 77]
- HITS Authority (Authority) [Kleinberg, 99]: $W^\top W\, r = \lambda r$, where $W$ is the adjacency matrix of the network, $\lambda$ is the greatest eigenvalue, and $r$ is the corresponding eigenvector of authority scores; similar to PageRank scores
- Neighborhood Connectivity ("Clustering Coefficient", ClustCoef) [Boykin and Roychowdhury, 05]: measures the connectivity among the neighbors of node $v$
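A hedged, dependency-free sketch of three of the listed metrics: in/out/total degree on a directed graph; the local clustering coefficient $2e_v / (k_v(k_v-1))$ on an undirected graph, where $v$ has $k_v$ neighbours with $e_v$ links among them; and HITS authority scores by power iteration on $W^\top W$, with $W$ the directed adjacency matrix ($W_{uw} = 1$ when $u$ sends to $w$). Clique counts are omitted because the slide does not define which cliques are counted. All function names and example graphs are hypothetical.

```python
import math

def degrees(out_adj):
    """In-, out-, and total degree of every node in a directed graph."""
    out_deg = {v: len(nbrs) for v, nbrs in out_adj.items()}
    in_deg = {v: 0 for v in out_adj}
    for nbrs in out_adj.values():
        for w in nbrs:
            in_deg[w] += 1
    total = {v: in_deg[v] + out_deg[v] for v in out_adj}
    return in_deg, out_deg, total

def clustering_coefficient(adj, v):
    """Share of pairs of v's neighbours that are linked (undirected graph)."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def hits_authority(out_adj, iters=100):
    """Authority scores r: power iteration for the top eigenvector of W^T W.

    W[u][w] = 1 when u -> w. Each step computes hubs h = W r, then
    authorities r = W^T h, and rescales r to unit length.
    """
    nodes = list(out_adj)
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        hub = {u: sum(auth[w] for w in out_adj[u]) for u in nodes}
        new = {v: 0.0 for v in nodes}
        for u in nodes:
            for w in out_adj[u]:
                new[w] += hub[u]
        norm = math.sqrt(sum(x * x for x in new.values())) or 1.0
        auth = {v: x / norm for v, x in new.items()}
    return auth

# Hypothetical directed "who emails whom" graph.
out_adj = {"a": {"c"}, "b": {"c", "d"}, "c": set(), "d": set()}
print(degrees(out_adj)[0])       # {'a': 0, 'b': 0, 'c': 2, 'd': 1}
auth = hits_authority(out_adj)   # c ranks above d; a and b get 0
und = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(clustering_coefficient(und, 1))  # 0.333... (1 of 3 neighbour pairs linked)
```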

Social Importance – Validations
- Correlation coefficients with priority levels
[Figure: correlation coefficients of the social importance metrics with priority levels]
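The slide does not say which correlation coefficient is used. Assuming Pearson's coefficient between a per-email social importance score (for example, the sender's betweenness) and the email's priority level, the check could be sketched as follows; `pearson` and the sample numbers are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between importance scores xs and priority levels ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # Undefined for constant input; 0.0 is returned by convention here.
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical sender-betweenness scores and the priority levels of their emails.
print(pearson([0.1, 0.4, 0.35, 0.8], [1, 3, 2, 5]))
```

A rank correlation (Spearman) would be a reasonable alternative, since priority levels are ordinal.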

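SIP – Motivations
- Semi-supervised Importance Propagation (SIP)
- Can we propagate importance labels?
- Bipartite graph between agents/persons and emails; labels exist only on emails.
[Figure: bipartite graph of agents/persons and emails; a few emails carry importance labels (e.g., 4, 3, 2) and the rest are unlabeled (?)]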

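SIP – Email Network ($N$ persons, $M$ emails)
- $A$: sender-to-email matrix ($N \times M$)
- $B^\top$: email-to-recipient matrix ($M \times N$)
- $x_k$: $k$-th importance labels for the emails ($M \times 1$)
- $y_k = B x_k$ ($N \times 1$)
[Figure: bipartite graph of agents/persons and emails with partially labeled emails]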
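A small sketch of the email network matrices, assuming persons and emails are indexed in list order and that $x_k$ is a 0/1 indicator of which emails carry importance level $k$ (that reading of $x_k$, and the names `email_matrices` and `propagate`, are assumptions). Here `B[i][j] = 1` when person `i` received email `j`, so $B$ is $N \times M$ and its transpose is the $M \times N$ email-to-recipient matrix.

```python
def email_matrices(emails, people):
    """Sender matrix A and recipient matrix B, both N x M (persons x emails).

    A[i][j] = 1 if person i sent email j; B[i][j] = 1 if person i received
    email j, so the transpose of B is the M x N email-to-recipient matrix.
    """
    index = {p: i for i, p in enumerate(people)}
    n, m = len(people), len(emails)
    A = [[0] * m for _ in range(n)]
    B = [[0] * m for _ in range(n)]
    for j, mail in enumerate(emails):
        A[index[mail["from"]]][j] = 1
        for r in mail["to"]:
            B[index[r]][j] = 1
    return A, B

def propagate(B, x):
    """y = B x: each person accumulates the labels of the emails they received."""
    return [sum(b * xj for b, xj in zip(row, x)) for row in B]

people = ["me", "alice", "bob"]            # hypothetical
emails = [{"from": "alice", "to": ["me", "bob"]},
          {"from": "bob", "to": ["me"]},
          {"from": "me", "to": ["alice"]}]
A, B = email_matrices(emails, people)
x_k = [1, 1, 0]                            # ASSUMPTION: emails 0 and 1 carry level k
print(propagate(B, x_k))                   # [2, 0, 1]
```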

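SIP – Algorithm
- Problems with the propagation above:
  - The propagation matrix $C$ may not be irreducible.
  - The stationary scores are insensitive to the initial labels $x_k$ (not personalized).
- Fix: apply Personalized PageRank with $x_k$ as the personalization vector.
  - Normalize $x_k$, and column-wise normalize $C$ to obtain $C'$.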
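The slide's formulas did not survive extraction, so the following is a generic Personalized PageRank sketch rather than the authors' exact SIP update. It assumes $C$ is the symmetric $(N+M) \times (N+M)$ adjacency of the person–email bipartite graph built from $S = A + B$ (a person is linked to every email they sent or received), $C'$ is $C$ with each column normalized to sum to 1, and the personalization vector places the normalized labels $x_k$ on the email nodes. The update is $r \leftarrow (1-\alpha) C' r + \alpha v$ with a hypothetical restart probability $\alpha = 0.15$; `bipartite_matrix` and `personalized_pagerank` are hypothetical names.

```python
def bipartite_matrix(S):
    """Symmetric (N+M) x (N+M) adjacency [[0, S], [S^T, 0]] for an N x M matrix S."""
    n, m = len(S), len(S[0])
    C = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(m):
            C[i][n + j] = C[n + j][i] = float(S[i][j])
    return C

def personalized_pagerank(C, v, alpha=0.15, iters=200):
    """Iterate r <- (1 - alpha) C' r + alpha v, with C' the column-normalized C."""
    size = len(C)
    col = [sum(C[i][j] for i in range(size)) for j in range(size)]
    Cn = [[C[i][j] / col[j] if col[j] else 0.0 for j in range(size)]
          for i in range(size)]
    total = sum(v)
    v = [x / total for x in v]  # normalize the personalization vector
    r = v[:]
    for _ in range(iters):
        r = [(1 - alpha) * sum(Cn[i][j] * r[j] for j in range(size)) + alpha * v[i]
             for i in range(size)]
    return r

# ASSUMPTION: hypothetical S = A + B for persons [me, alice, bob] and three emails.
S = [[1, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
x_k = [1, 1, 0]                      # labels on the emails only
r = personalized_pagerank(bipartite_matrix(S), [0, 0, 0] + x_k)
print(r[:3])                         # person scores; me > bob > alice here
```

Because every node in the example has at least one edge, $C'$ is column-stochastic and $r$ keeps summing to 1. The restart term makes the chain irreducible, and the scores now depend on which emails were labeled.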

Outline
- Problem Description
- Approaches
- Experiments
- Contributions

Experiments – Data Collection
- Collected data:
  - 25 subjects were recruited from Carnegie Mellon University.
  - 7 users submitted more than 200 emails each: 1 faculty member, 2 staff members, and 4 students.
[Figure: training and testing periods along a time axis]

Experiments – Metrics
- Mean Absolute Error (MAE): $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i|$, where $y_i$ is the true and $\hat{y}_i$ the predicted priority level of test email $i$.
  - An MAE of 1.0 means that, on average, the prediction deviates from the truth by one priority level.
  - MAE accounts for the size of each error: with five importance levels it ranges from 0 to 4, so confusing 1 with 5 costs 4, while confusing 4 with 5 costs only 1.
- Micro-MAE: pool the test instances from all users into a joint test set and compute a single MAE.
- Macro-MAE: compute each user's MAE first, then average the per-user MAEs.
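The micro/macro distinction matters when users contribute different numbers of test emails. A minimal sketch with hypothetical per-user data (true and predicted priority levels on a 1 to 5 scale); the function names are illustrative.

```python
def mae(y_true, y_pred):
    """Mean absolute difference between true and predicted priority levels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def micro_mae(per_user):
    """Pool every user's test instances into one joint test set, then compute MAE."""
    all_true = [t for ys, _ in per_user.values() for t in ys]
    all_pred = [p for _, ps in per_user.values() for p in ps]
    return mae(all_true, all_pred)

def macro_mae(per_user):
    """Compute MAE per user first, then average the per-user values."""
    return sum(mae(ys, ps) for ys, ps in per_user.values()) / len(per_user)

per_user = {                              # hypothetical (true, predicted) levels
    "u1": ([5, 4, 1, 3], [5, 3, 2, 3]),   # errors 0, 1, 1, 0 -> MAE 0.5
    "u2": ([2, 2], [4, 2]),               # errors 2, 0 -> MAE 1.0
}
print(micro_mae(per_user))  # 4 / 6, about 0.667
print(macro_mae(per_user))  # (0.5 + 1.0) / 2 = 0.75
```

Micro-MAE weights each email equally, so users with many test emails dominate; macro-MAE weights each user equally.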

Experiments – Setups
- Features: four subsets
  - Basic Features (BF): from, to, cc, title, body
  - Newman Clustering (NC)
  - Social Importance (SI)
  - Semi-supervised Importance Propagation (SIP)
- The training data is randomly shuffled ten times.
- Linear SVM
  - 10-fold cross-validation for parameter tuning
  - The regularization parameter is tuned over $[10^{-3}, 10^{3}]$.
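The tuning step can be sketched without committing to a particular SVM library. The sketch assumes the grid $10^{-3}, 10^{-2}, \dots, 10^{3}$ (integer powers of ten), and `cv_error` is a hypothetical callback that would train a linear SVM with regularization parameter `C` on the training indices and return its validation error (for example, MAE). `kfold_indices` and `tune_C` are likewise hypothetical names.

```python
import random

# Regularization grid 10^-3, ..., 10^3 (ASSUMPTION: integer exponents only).
C_GRID = [10.0 ** e for e in range(-3, 4)]

def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def tune_C(n, cv_error, k=10, seed=0):
    """Return the C in C_GRID with the lowest mean validation error over k folds.

    cv_error(C, train_idx, val_idx) is a hypothetical callback that would train
    a linear SVM with regularization parameter C and return its validation error.
    """
    folds = kfold_indices(n, k, seed)
    best_C, best_err = None, float("inf")
    for C in C_GRID:
        errs = []
        for f, val in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            errs.append(cv_error(C, train, val))
        mean_err = sum(errs) / k
        if mean_err < best_err:
            best_C, best_err = C, mean_err
    return best_C

# Toy error surface that is smallest at C = 10.
print(tune_C(50, lambda C, train, val: abs(C - 10.0)))  # 10.0
```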

Experiments – Results

Contributions
- The first study on personalized email prioritization
  - Using statistical classification and clustering
  - Based on fine-grained personal judgments from multiple users
- Enriched representation through each user's personal social network
  - Social Clustering
  - Social Importance Estimation
  - Semi-supervised Importance Propagation
- Fully personalized methodology
  - Technical development and evaluation
