Published by Melissa Lawson. Modified over 9 years ago.
1
Mining Social Networks for Personalized Email Prioritization
Shinjae Yoo, Yiming Yang, Frank Lin, Il-Chul Moon [KDD ’09]
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2009/08/25
2
Outline
– Introduction
– Social Clustering
– Measuring Social Importance
– Semi-supervised Importance Propagation
– Experiments
– Conclusions and Future Work
3
Introduction
Email
– One of the most prevalent personal and business communication tools
– Asynchronous
Processing a large volume of email messages of differing importance is a burden!
4
Introduction
Information overload problem
– Need to develop systems that automatically learn personal priorities for each user
    – Identify personally interesting messages
    – Identify important messages that need the user's attention
5
Introduction
Many statistical learning techniques have been studied in support of email-based prediction tasks
– Spam identification, folder recommendation, recipient reminding, action-item identification, social group analysis
But personalized email prioritization
– Remains an under-explored problem
– Mainly due to privacy issues in collecting personal data
6
Introduction
This paper
– Creates a new collection of anonymized personal email data with importance levels
– Proposes a fully personalized methodology for technical development and evaluation
– Develops a supervised classification framework
    – For modeling personal priorities over messages and predicting importance levels for new messages
8
Motivation
Sender information
– One of the most indicative features
– Messages sent by members of the same group tend to share similar priority levels
– Capturing sender groups would be informative for predicting the importance of messages
If a sender does not have any labeled instances
– Based on unsupervised clustering, infer that sender's importance from other group members
9
Personalized Social Network
For each user, a personalized social network is constructed using the email data of that user
– Practicality
– Personalization
Email contact network
– Represented by a graph G = (V, E)
    – V: email contacts (users)
    – E: message sending among users, unweighted (E_ij = 1 if there is a message from user i to user j, E_ij = 0 otherwise)
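The graph definition on this slide can be sketched in a few lines of Python. This is only an illustration: the message records with "from" and "to" keys and the function name build_contact_graph are hypothetical, not taken from the paper.

```python
def build_contact_graph(messages):
    """Return (contacts, E) with E[i][j] = 1 if contact i sent a message to contact j."""
    # V: every address that appears as a sender or a recipient
    contacts = sorted(
        {m["from"] for m in messages} | {r for m in messages for r in m["to"]}
    )
    index = {c: n for n, c in enumerate(contacts)}
    size = len(contacts)
    E = [[0] * size for _ in range(size)]
    for m in messages:
        for r in m["to"]:
            # Unweighted edge: repeated messages between the same pair still give 1
            E[index[m["from"]]][index[r]] = 1
    return contacts, E


# Hypothetical mailbox of one user
messages = [
    {"from": "alice", "to": ["bob", "carol"]},
    {"from": "bob", "to": ["alice"]},
    {"from": "alice", "to": ["bob"]},
]
contacts, E = build_contact_graph(messages)
```

Here the second message from alice to bob leaves E unchanged, which is what "un-weighted" means for this graph.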
10
Clustering
Newman Clustering
– Has been used successfully to find social structures
– Defines edge-betweenness
    – A high score means that the link is a crucial connection between the boundary nodes of two clusters
– Deleting links with high edge-betweenness scores leaves disconnected components, which are taken as clusters
[Figure: example network with nodes labeled A, B, C, D, E, F, G, H, I, J, L, R]
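A minimal sketch of the delete-the-highest-betweenness-link idea, using only the standard library. Several things here are assumptions rather than the authors' setup: edges are treated as undirected, the loop stops at a hypothetical target number of clusters, and the function names are invented for illustration.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an undirected, unweighted graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Push path dependencies back from the farthest nodes toward s
        delta = {v: 0.0 for v in adj}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[frozenset((u, v))] += share
                delta[u] += share
    # Every node pair is visited from both ends, so scores are doubled;
    # the ranking of edges is unaffected.
    return score


def components(adj):
    """Connected components of the graph as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in part:
                continue
            part.add(u)
            stack.extend(adj[u] - part)
        seen |= part
        parts.append(part)
    return parts


def newman_clusters(edges, target):
    """Remove the highest-betweenness edge until `target` components exist."""
    adj = {}
    for u, v in edges:  # assumes no self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while len(components(adj)) < target and any(adj.values()):
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge C-D
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
clusters = newman_clusters(edges, target=2)
```

In the example the bridge C-D carries every shortest path between the two triangles, so it is removed first and the triangles become the two clusters.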
12
Measuring Social Importance
Link relations provide useful information about the centrality of each contact
13
Measuring Social Importance
– In-degree centrality
– Out-degree centrality
– Total-degree centrality
[Figure: example graph with nodes A, B, C, D, E]
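The three degree measures can be read directly off the adjacency matrix. The sketch below returns raw counts and takes total degree as in-degree plus out-degree; both choices are assumptions, since the slide does not say whether the values are normalized.

```python
def degree_centralities(E):
    """Return (in_degree, out_degree, total_degree) lists for adjacency matrix E.

    E[i][j] = 1 means contact i sent a message to contact j.
    """
    n = len(E)
    out_deg = [sum(row) for row in E]                           # row sums
    in_deg = [sum(E[i][j] for i in range(n)) for j in range(n)]  # column sums
    total = [a + b for a, b in zip(in_deg, out_deg)]
    return in_deg, out_deg, total


# Contact 0 writes to contacts 1 and 2; contact 1 writes back to contact 0
E_example = [[0, 1, 1],
             [1, 0, 0],
             [0, 0, 0]]
in_deg, out_deg, total = degree_centralities(E_example)
```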
14
Measuring Social Importance
Clustering Coefficient
– Measures connectivity among the neighbors of a node
Clique Count
– Clique: fully connected sub-graph
– A large clique count for node v means
    – It connects to large and well-connected sub-graphs
    – It is located in the center of those sub-graphs
[Figure: example graph with nodes A, B, C, D, E, F]
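Both measures can be illustrated on a small undirected graph. The sketch assumes the local clustering coefficient (linked neighbor pairs divided by possible neighbor pairs) and counts every clique of at least three nodes that contains v, by brute force; the slide does not state which exact variants the authors used.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def clique_count(adj, v, min_size=3):
    """Number of cliques with at least min_size nodes that contain v.

    Brute force over subsets of v's neighbors, so only suitable for small graphs.
    """
    nbrs = sorted(adj[v])
    count = 0
    for size in range(min_size - 1, len(nbrs) + 1):
        for group in combinations(nbrs, size):
            # v is adjacent to every member, so only the members need checking
            if all(b in adj[a] for a, b in combinations(group, 2)):
                count += 1
    return count


# Undirected example: triangles A-B-C and A-C-D share the edge A-C
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
}
cc_a = clustering_coefficient(adj, "A")
cliques_a = clique_count(adj, "A")
```

Node A has three neighbors of which two pairs (B-C and C-D) are linked, and it sits in two triangles.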
15
Measuring Social Importance
Betweenness centrality
– Share of the shortest paths between other pairs of nodes that go through node i
    – σ_jk: number of shortest paths between j and k
    – σ_jk(i): number of shortest paths between j and k that go through i
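The formula itself did not survive in this transcript. With the two quantities defined on the slide, the standard (unnormalized) definition of betweenness centrality would read as follows; whether the authors applied an additional normalization is not recoverable here, so treat this as an assumption.

```latex
C_B(i) = \sum_{\substack{j \neq k \\ j \neq i,\; k \neq i}} \frac{\sigma_{jk}(i)}{\sigma_{jk}}
```

For a directed graph with N nodes, dividing by (N-1)(N-2) is a common way to scale the sum into the range [0, 1], which would match the slide's wording of a percentage.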
16
Measuring Social Importance
HITS Authority
– Hyperlink-Induced Topic Search, also known as Hubs and Authorities
– Measures the global importance of a node
– Definition: with the N-by-N adjacency matrix X, the authority scores are the principal eigenvector r of X^T X, where r satisfies (X^T X) r = λ r and λ is the largest eigenvalue
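The principal eigenvector in the HITS definition can be approximated by power iteration. The sketch below alternates the hub and authority updates, which is equivalent to repeatedly multiplying by X^T X; the iteration count and the sum-to-one scaling are illustrative choices, not values from the paper.

```python
def hits_authority(X, iterations=100):
    """Approximate HITS authority scores for adjacency matrix X by power iteration."""
    n = len(X)
    a = [1.0] * n
    for _ in range(iterations):
        # Hub update h = X a, then authority update a = X^T h (together: a = X^T X a)
        h = [sum(X[i][j] * a[j] for j in range(n)) for i in range(n)]
        a = [sum(X[i][j] * h[i] for i in range(n)) for j in range(n)]
        total = sum(a)
        if total == 0:
            return a  # graph without edges: all scores stay zero
        a = [value / total for value in a]  # rescale so the scores sum to 1
    return a


# Contact 0 writes to contacts 1 and 2; contact 1 writes back to contact 0
X = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]
authority = hits_authority(X)
```

In this example contacts 1 and 2 share the same strong hub (contact 0), so they end up with equal authority while contact 0's share shrinks toward zero.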
17
Measuring Social Importance
PCC Analysis
– Pearson Correlation Coefficient
– Compute the PCC of each social metric with the human-labeled importance levels of email messages
– Indicates how useful each metric is for predicting the importance of email messages
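As a reminder of what the PCC computes, here is a small standard-library sketch. The pairing of one metric value (for example, the sender's in-degree) with one importance label per message is an assumption about how the analysis was set up, and the numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant sequence has no defined correlation; report 0
    return cov / (spread_x * spread_y)


# Hypothetical data: sender in-degree per message vs. labeled importance (1 to 5)
sender_in_degree = [3, 1, 2, 5]
importance = [4, 2, 3, 5]
r = pearson(sender_in_degree, importance)
```

A value near +1 would suggest the metric rises with importance, a value near 0 that it carries little linear signal.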
19
Semi-supervised Importance Propagation
Semi-supervised Importance Propagation (SIP)
– Propagates the importance values of labeled email messages (the training examples) to other messages and the corresponding contact persons
20
SIP Algorithm
Use a bipartite graph to represent the interactions between email contacts and email messages
Let N = number of email contacts, M = number of messages
Use two matrices to represent the two types of edges: matrix A (N by M) and matrix B (N by M)
– A_i,j = 1 if person i sent message j, and A_i,j = 0 otherwise
– B_i,j = 1 if person i received message j, and B_i,j = 0 otherwise
21
SIP Algorithm
Treat each importance label (1 to 5) as a category
Use a vector x_k (M by 1) to indicate which messages carry label k
– x_k,i = 1 if message i belongs to category k, x_k,i = 0 otherwise
Importance propagates from messages to persons (receivers) through matrix B
Importance propagates from persons (senders) to messages through matrix A
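The two propagation formulas are missing from this transcript. One reading that is consistent with the matrix shapes (A and B are N by M, x_k is M by 1) is y_k = B x_k for messages to receivers and x_k = A^T y_k for senders to messages. The sketch below implements that reading; it is an assumption, not a confirmed transcription of the slide, and the function names are invented.

```python
def messages_to_receivers(B, x):
    """y = B x: each person adds up the category-k messages they received."""
    return [sum(b * xi for b, xi in zip(row, x)) for row in B]


def senders_to_messages(A, y):
    """x = A^T y: each message takes over the score of the person who sent it."""
    n, m = len(A), len(A[0])
    return [sum(A[i][j] * y[i] for i in range(n)) for j in range(m)]


# Hypothetical mailbox: 3 persons (rows), 3 messages (columns)
A = [[1, 0, 1],   # person 0 sent messages 0 and 2
     [0, 1, 0],   # person 1 sent message 1
     [0, 0, 0]]   # person 2 sent nothing
B = [[0, 1, 0],   # person 0 received message 1
     [1, 0, 1],   # person 1 received messages 0 and 2
     [1, 0, 0]]   # person 2 received message 0
x_k = [1, 0, 0]   # only message 0 is labeled with category k

y_k = messages_to_receivers(B, x_k)
x_next = senders_to_messages(A, y_k)
```

Under this reading, the label on message 0 reaches its receivers (persons 1 and 2), and from there the unlabeled message 1, which person 1 sent, picks up a score.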
22
Propagation Example
[Figure: propagation example in which nodes labeled 4, 3, and 2 pass importance to unlabeled (?) nodes. Panels: "Messages to persons (receivers)" and "Persons (senders) to messages"]
23
SIP Algorithm
At each time step t, the importance values of the contact persons are updated by applying the two propagation steps in sequence
24
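SIP Algorithm
The person-importance vector at step t is a linear transformation of the starting vector
If the update matrix C is irreducible and t is large, the vector stabilizes at the principal eigenvector of C
– Irreducibility is not always guaranteed
– Even when C is irreducible, its principal eigenvector is insensitive to the starting vector, so it does not reflect the labeled importance values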
25
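SIP Algorithm
A linear interpolation
– Define a per-category importance vector and normalize it by the sum of its entries
– Define an importance-sensitive matrix whose columns are all identical to that normalized vector
– Normalize matrix C to C'
– Interpolate linearly between C' and the importance-sensitive matrix, with weight α ∈ [0, 1], to obtain E_k
– E_k is irreducible and importance-sensitive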
26
SIP Algorithm
Finally,
– The SIP method is defined iteratively: y_k at step t+1 is E_k applied to y_k at step t
– Because E_k is irreducible, y_k stabilizes when t is large
– y_k holds the expected importance score of each person after iterative SIP
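The complete iteration can be sketched as below. Because the formulas on the SIP slides were lost, every concrete choice here is an assumption: C = B A^T as the person-to-person update matrix, column normalization for C', the normalized vector B x_k as the importance vector, E_k = α C' + (1 - α) V_k with V_k holding that vector in every column, a uniform starting vector, and the values of alpha and steps.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def normalize(v):
    """Scale v so its entries sum to 1 (an all-zero vector is returned unchanged)."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)


def sip_scores(A, B, x_k, alpha=0.85, steps=50):
    """Iterate y <- E_k y and return the person scores for category k."""
    n, m = len(A), len(A[0])
    # Assumed update matrix C = B A^T: C[i][j] counts messages sent by j, received by i
    C = [[sum(B[i][q] * A[j][q] for q in range(m)) for j in range(n)]
         for i in range(n)]
    # Assumed normalization C': divide every column by its sum
    col = [sum(C[i][j] for i in range(n)) for j in range(n)]
    C_norm = [[C[i][j] / col[j] if col[j] else 0.0 for j in range(n)]
              for i in range(n)]
    # Assumed importance vector: labeled importance received by each person
    v = normalize(matvec(B, x_k))
    # E_k = alpha * C' + (1 - alpha) * V_k, where every column of V_k equals v
    E = [[alpha * C_norm[i][j] + (1 - alpha) * v[i] for j in range(n)]
         for i in range(n)]
    y = [1.0 / n] * n  # uniform start
    for _ in range(steps):
        y = normalize(matvec(E, y))
    return y


# Hypothetical mailbox: 3 persons (rows), 3 messages (columns)
A = [[1, 0, 1], [0, 1, 0], [0, 0, 0]]  # who sent which message
B = [[0, 1, 0], [1, 0, 1], [1, 0, 0]]  # who received which message
x_k = [1, 0, 0]                        # message 0 carries label k
scores = sip_scores(A, B, x_k)
```

The interpolation term keeps the scores tied to the labeled messages: persons 1 and 2 both received the labeled message, and person 1 ranks higher because of an extra link from person 0.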
28
Experiments
Data
– Recruited 25 experimental subjects
– Each subject was asked to label non-spam messages
Preprocessing
– Email address canonicalization
– Word tokenization and stemming
    – Stop words were not removed from the title and body text
29
Experiments
Features
– Basic features are the tokens in the from, to, cc, title, and body text fields, represented as a v-dimensional vector
– Social-network-based features
    – An m-dimensional sub-vector represents the Newman clustering (NC) features
    – A 7-dimensional sub-vector represents the social importance (SI) metrics
    – A 5-dimensional sub-vector represents the five SIP scores per user
30
Experiments
Classifiers
– Five linear SVM classifiers predict the importance level of each email message
– The standard SVM-light software package is used
Metric
– N = number of messages
– y_i = the true importance level of message i
– ŷ_i = the predicted importance level for that message
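The metric's formula is missing from this transcript. One metric that fits the three definitions on the slide is the mean absolute error (MAE), the average of |y_i - ŷ_i| over the N messages; treating MAE as the slide's metric is an assumption. A minimal sketch with made-up labels:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted importance levels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels on the 1-to-5 importance scale
true_levels = [5, 3, 1, 4]
predicted_levels = [4, 3, 2, 2]
mae = mean_absolute_error(true_levels, predicted_levels)
```

With ordinal labels from 1 to 5, a measure of this kind penalizes a prediction of 2 for a true level of 5 more than a prediction of 4, which plain accuracy would not.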
31
Experiments
32
Conclusions and Future Work
Future work
– Collect more data from a larger number of users over a longer time period
– Comparative study of different clustering algorithms and graph-mining techniques with respect to effectiveness