Published by Aleesha Gibson. Modified over 8 years ago.
1
Mining Social Network for Personalized Email Prioritization
Shinjae Yoo, Yiming Yang, Frank Lin, and Il-Chul Moon
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
2
Outline
  Problem Description
  Approaches
  Experiments
  Contributions
3
Problem Description
  Email overload is a severe problem
  Identifying the importance of each email will alleviate email overload
  Challenges (the causes of sparse training data)
    No access to other people's emails and labels
    Personalized labeling is time-consuming
    The same message may have different priority labels for different recipients
  We want to make better use of the sparse training data by using each user's social network
4
Outline
  Problem Description
  Approaches
    Social Clustering
    Social Importance
    Semi-supervised Importance Propagation
  Experiments
  Conclusion and Future Work
5
Social Clustering – Motivation
  Personal email inbox
    Lots of unlabeled emails
    No privacy issue
  Observations
    The sender can be important
    Some senders do not appear in the training set at all, or appear in only a few instances
    We need to generalize over senders
  Let's find similar senders from the social network
6
Social Clustering – Contact Network
  Personal contact network G = (V, E)
  The whole network is constructed from the personal inbox
  [Figure: example contact network; each node is an agent/person]
7
Social Clustering – Newman Clustering
  Newman clustering algorithm [Newman, 04]
    Finds social cliques or cohesive social groups
    Based on edge betweenness
      The number of shortest paths that go through the edge / the total number of shortest paths
    Drops edges starting from the highest edge betweenness
    Hard clustering
  [Figure: example graph split into Group A and Group B]
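The edge-removal idea on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the authors' implementation: the function names (`edge_betweenness`, `components`, `newman_split`), the toy graph, and the stopping rule (stop once a requested number of groups exists) are all hypothetical. The betweenness here is the unnormalized count of shortest paths through each edge, which ranks edges the same way as the ratio on the slide.

```python
from collections import deque


def edge_betweenness(adj):
    """Unnormalized edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge on a shortest path.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    # Every pair of endpoints was counted once from each side.
    return {edge: total / 2.0 for edge, total in score.items()}


def components(adj):
    """Connected components of the graph, as a list of node sets."""
    seen, groups = set(), []
    for s in adj:
        if s in seen:
            continue
        group, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v in group:
                continue
            group.add(v)
            stack.extend(adj[v] - group)
        seen |= group
        groups.append(group)
    return groups


def newman_split(edges, n_groups):
    """Remove the highest-betweenness edge until n_groups components exist."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while len(components(adj)) < n_groups:
        scores = edge_betweenness(adj)
        if not scores:  # no edges left to remove
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy contact network (assumed data): two triangles joined by the bridge 3-4.
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
groups = newman_split(toy_edges, n_groups=2)
```

In the toy graph the bridge 3-4 carries every shortest path between the two triangles, so it is removed first and each triangle becomes one hard cluster.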
8
Social Clustering – Validations
  Clusters are coherent!
9
Social Clustering – Feature Incorporation
  Extended vector space
    A text vector, a social network vector, and their combination [formulas not preserved]
  The combined vector space is used as an enriched feature set for the email prioritizer
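One plausible way to build such a combined vector is to append a cluster indicator for the sender to the text features. The sketch below assumes a one-hot encoding of the sender's cluster; the slide's actual formulas are not available, so the encoding, the function name `combined_features`, and the sample data are assumptions made for illustration.

```python
def combined_features(text_vec, sender, sender_cluster, n_clusters):
    """Concatenate text features with a one-hot sender-cluster indicator.

    sender_cluster maps a sender address to a cluster index (assumed to come
    from a prior clustering step). Unknown senders get an all-zero social part.
    """
    social_vec = [0.0] * n_clusters
    cluster = sender_cluster.get(sender)
    if cluster is not None:
        social_vec[cluster] = 1.0
    return list(text_vec) + social_vec


# Hypothetical data: a 3-term text vector and two sender clusters.
clusters = {"alice@example.org": 1, "bob@example.org": 0}
known = combined_features([0.5, 0.0, 1.0], "alice@example.org", clusters, 2)
unknown = combined_features([0.5, 0.0, 1.0], "carol@example.org", clusters, 2)
```

With this encoding, two senders in the same cluster share a feature even when one of them never appears in the labeled training set, which is the generalization the motivation slide asks for.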
10
Social Importance – Motivations
  Social importance
    A person in the center of a cluster might be more important than others
  Betweenness
    Edge betweenness is used for Newman clustering
    Vertex betweenness
      The degree to which a person is a communication bottleneck in the social network
      Such people are contact points in the network
      They might be important people
  We may try other kinds of social importance metrics too
11
Social Importance – Metrics
  Degree (in, out, total) [Wasserman and Faust, 94]
  Clique Counts (ClqCnt) [Wasserman and Faust, 94]
    The number of clique sub-graphs that contain a node v
  Betweenness (BetCent) [Freeman, 77]
  HITS Authority (Authority) [Kleinberg, 99]
    λ: the greatest eigenvalue
    r: the eigenvector, similar to PageRank scores
  Neighborhood Connectivity ("Clustering Coefficient", ClustCoef) [Boykin and Roychowdhury, 05]
    Measures the connectivity among the neighbors of a node v
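Two of the simpler metrics, total degree and the clustering coefficient, can be computed directly from an adjacency map. The sketch below assumes an undirected contact graph stored as a dict of neighbor sets; the function names and the toy graph are illustrative, and the standard local clustering coefficient is used as a stand-in for the slide's neighborhood connectivity.

```python
from itertools import combinations


def total_degree(adj, v):
    """Number of distinct contacts of node v in an undirected graph."""
    return len(adj[v])


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs to connect
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Toy contact graph (assumed data): 1-2-3 form a triangle, 4 hangs off node 1.
contacts = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
degrees = {v: total_degree(contacts, v) for v in contacts}
coefficients = {v: clustering_coefficient(contacts, v) for v in contacts}
```

Node 1 has the highest degree but a low coefficient because only one of its three neighbor pairs is connected, which shows that the metrics capture different notions of importance.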
12
Social Importance – Validations
  Correlation coefficients with priority levels
13
SIP – Motivations
  Semi-supervised Importance Propagation (SIP)
  Can we propagate importance labels?
  Bipartite graph; labels only on emails
  [Figure: bipartite graph of agents/persons and emails; a few emails carry importance labels (4, 3, 2), the rest are unlabeled (?)]
14
SIP – Email Network
  $A$: sender to emails ($N \times M$)
  $B^T$: email to recipients ($M \times N$)
  $x_k$: $k$-th importance labels for emails ($M \times 1$)
  $y_k = B x_k$ ($N \times 1$)
  [Figure: bipartite graph of agents/persons and emails]
15
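SIP – Algorithm
  Problems of the above propagation
    [symbol not preserved] may not be irreducible
    [symbol not preserved] is insensitive to [symbol not preserved] (not personalized)
  Apply Personalized PageRank with [formula not preserved]
    Normalize [symbol not preserved] and column-wise normalize C: C'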
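The propagation step can be sketched with plain Python lists. Because the slide's formulas did not survive, this is an illustration under stated assumptions rather than the authors' algorithm: it assumes the email-to-email matrix is built as C = AᵀB from the sender matrix A and the recipient matrix B, that C is column-normalized, and that the personalization vector p comes from the labeled emails. The function names, the damping value `alpha`, and the toy data are all hypothetical.

```python
def email_to_email(A, B):
    """Assumed construction C = A^T B (M x M): C[i][j] > 0 when the sender
    of email i is a recipient of email j."""
    n, m = len(A), len(A[0])
    return [[sum(A[p][i] * B[p][j] for p in range(n)) for j in range(m)]
            for i in range(m)]


def column_normalize(C):
    """Scale each non-zero column of C so that it sums to 1."""
    m = len(C)
    sums = [sum(C[i][j] for i in range(m)) for j in range(m)]
    return [[C[i][j] / sums[j] if sums[j] else 0.0 for j in range(m)]
            for i in range(m)]


def personalized_pagerank(C_norm, p, alpha=0.15, iters=100):
    """Power iteration for x = (1 - alpha) * C' x + alpha * p.

    p holds the known importance labels (0 for unlabeled emails) and is
    rescaled to sum to 1. Mass lost on all-zero columns is returned to p.
    """
    m = len(p)
    total = sum(p)
    p = [value / total for value in p]
    x = p[:]
    for _ in range(iters):
        walked = [sum(C_norm[i][j] * x[j] for j in range(m)) for i in range(m)]
        lost = 1.0 - sum(walked)
        x = [(1 - alpha) * (walked[i] + lost * p[i]) + alpha * p[i]
             for i in range(m)]
    return x


# Toy data (assumed): person n sent email n; person 0 received email 1 and
# person 1 received email 0; email 2 has no recipient inside the network.
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
B = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
C = email_to_email(A, B)
scores = personalized_pagerank(column_normalize(C), p=[1.0, 0.0, 0.0])
```

Only email 0 is labeled here, yet email 1 receives a positive score because it is linked to email 0 through shared people, while the unconnected email 2 stays at zero. The restart term `alpha * p` keeps the result tied to the user's own labels, which is the personalization the slide calls for.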
16
Outline
  Problem Description
  Approaches
  Experiments
  Contributions
17
Experiments – Data Collection
  Collected data
    25 subjects were recruited from Carnegie Mellon University
    7 users submitted more than 200 emails each
      1 faculty member, 2 staff members, 4 students
  [Figure: timeline; training data precedes testing data]
18
Experiments – Metrics
  Mean Absolute Error (MAE)
    An MAE of 1.0 means that, on average, the prediction deviates from the truth by one priority level
    MAE accounts for the size of each error
      With five importance levels it ranges from 0 to 4
      Example: 1 vs. 5 (error of 4) compared with 4 vs. 5 (error of 1)
  Micro-MAE
    Pool the test instances from all users into a joint test set
  Macro-MAE
    Compute the MAE of each user first, then average the per-user MAEs
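The difference between the two averages is easy to see in code. The sketch below follows the definitions on this slide; the function names and the sample (truth, prediction) pairs are made up for illustration.

```python
def mae(pairs):
    """Mean absolute error over (true_level, predicted_level) pairs."""
    return sum(abs(truth - pred) for truth, pred in pairs) / len(pairs)


def micro_mae(per_user):
    """Pool every user's test instances into one set, then compute MAE."""
    pooled = [pair for pairs in per_user.values() for pair in pairs]
    return mae(pooled)


def macro_mae(per_user):
    """Compute MAE per user, then average the per-user values."""
    return sum(mae(pairs) for pairs in per_user.values()) / len(per_user)


# Hypothetical predictions on a five-level scale.
results = {
    "user_a": [(5, 1), (3, 3), (2, 2), (4, 4)],  # one large error of 4
    "user_b": [(5, 4), (1, 1)],                  # one small error of 1
}
micro = micro_mae(results)
macro = macro_mae(results)
```

Micro-MAE weights users by how many test emails they contribute, while Macro-MAE gives each user equal weight, so the two values differ whenever users have unequal test set sizes.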
19
Experiments – Setups
  Features: four subsets
    Basic Features (BF): from, to, cc, title, body
    Newman Clustering (NC)
    Social Importance (SI)
    Semi-supervised Importance Propagation (SIP)
  Ten random shufflings of the training data
  Linear SVM
    10-fold cross-validation for parameter tuning
    Regularization parameter tuned over $[10^{-3}, 10^{3}]$
20
Experiments – Results
21
Contributions
  The first study on personalized email prioritization
    Using statistical classification and clustering
    Based on fine-grained personal judgments from multiple users
  Enriched representation through the personal social network
    Social Clustering
    Social Importance Estimation
    Semi-supervised Importance Propagation
  Fully personalized methodology
    Technical development and evaluation