Mining Social Networks for Personalized Email Prioritization Shinjae Yoo, Yiming Yang, Frank Lin, Il-Chul Moon [KDD ’09] Advisor: Dr. Koh Jia-Ling Reporter:

Presentation transcript:

Mining Social Networks for Personalized Email Prioritization
Shinjae Yoo, Yiming Yang, Frank Lin, Il-Chul Moon [KDD ’09]
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2009/08/25

Outline
– Introduction
– Social Clustering
– Measuring Social Importance
– Semi-supervised Importance Propagation
– Experiments
– Conclusions and Future Work

Introduction
Email
– One of the most prevalent personal and business communication tools
– Asynchronous
Processing a large volume of messages of differing importance is a burden.

Introduction
Information overload problem
– Systems are needed that automatically learn personal priorities for each user
  – Identify personally interesting emails
  – Identify important messages that need the user’s attention

Introduction
Many statistical learning techniques have been studied in support of email-based prediction tasks
– Spam identification, folder recommendation, recipient reminding, action-item identification, social group analysis
But personalized email prioritization
– Remains an under-explored problem
– Mainly due to privacy issues in collecting personal email data

Introduction
This paper
– Creates a new collection of anonymized personal email data with importance levels
– Proposes a fully personalized methodology for technical development and evaluation
– Develops a supervised classification framework
  – For modeling personal priorities over email messages and predicting importance levels for new messages

Motivation
Sender information
– One of the most indicative features
– Messages sent by members of the same group tend to share a similar priority level
– Capturing sender groups is therefore informative for predicting the importance of messages
If a sender does not have any labeled instances
– Based on unsupervised clustering, infer that sender’s importance from other group members

Personalized Social Network
For each user, a personalized social network is constructed from the email data of that user
– Practicality
– Personalization
Email contact network
– Represented by a graph G = (V, E)
  – V: contacts (users)
  – E: message sending among users, unweighted (E_ij = 1 if there is a message from user i to user j, E_ij = 0 otherwise)
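
The contact network described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the input format (a list of sender and recipient-list pairs), the function name `build_contact_graph`, and the choice to skip self-addressed mail are all assumptions.

```python
def build_contact_graph(messages):
    """Return (contacts, edges) for one user's mailbox.

    messages: iterable of (sender, recipients) pairs (hypothetical input format).
    edges holds (i, j) pairs, meaning E_ij = 1; absent pairs mean E_ij = 0.
    """
    contacts, edges = set(), set()
    for sender, recipients in messages:
        contacts.add(sender)
        for recipient in recipients:
            contacts.add(recipient)
            if recipient != sender:  # assumption: ignore self-addressed mail
                edges.add((sender, recipient))  # unweighted: repeats collapse
    return contacts, edges


# Toy mailbox with made-up addresses
mailbox = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("alice", ["bob"]),  # repeated pair, still a single edge
]
contacts, edges = build_contact_graph(mailbox)
```

Storing edges as a set of ordered pairs keeps the graph unweighted and directed, which matches the E_ij definition above.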

Clustering
Newman Clustering
– Has been used successfully to find social structures
– Defines edge betweenness
  – A link with a high score is a crucial connection between the boundary nodes of two clusters
– Deleting links with high edge-betweenness scores leaves disconnected components, which are taken as the clusters
(Figure: example network with nodes A, B, C, D, E, F, G, H, I, J, L, R)
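
The delete-the-highest-betweenness-link idea can be sketched as follows. This is a plain-Python illustration of the general Girvan-Newman procedure on an undirected graph, not the implementation used in the paper; the stopping rule (stop once `k` components exist) and all function names are assumptions.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate path shares back toward s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score  # each pair is counted from both ends; only the ranking is used


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def newman_clusters(adj, k):
    """Remove the highest-betweenness edge until k components exist (assumed stop rule)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while len(components(adj)) < k:
        score = edge_betweenness(adj)
        if not score:
            break
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy graph: two triangles joined by the bridge c-d
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
clusters = newman_clusters(graph, 2)
```

On the toy graph the bridge c-d carries every shortest path between the two triangles, so it is removed first and the two triangles remain as clusters.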

Measuring Social Importance
Link relations provide useful information about the centrality of each contact

Measuring Social Importance
– In-degree centrality
– Out-degree centrality
– Total-degree centrality
(Figure: example graph with nodes A, B, C, D, E)
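
The three degree measures can be computed directly from the directed edge set. A small sketch, assuming raw (unnormalized) counts and taking total degree as in-degree plus out-degree; the slide does not state either choice, and the function name is illustrative.

```python
def degree_centralities(contacts, edges):
    """Return (in_degree, out_degree, total_degree) dictionaries keyed by contact."""
    in_deg = {c: 0 for c in contacts}
    out_deg = {c: 0 for c in contacts}
    for sender, recipient in edges:
        out_deg[sender] += 1   # messages go out from the sender
        in_deg[recipient] += 1  # and come in to the recipient
    # Assumption: total degree is the plain sum of the two counts.
    total_deg = {c: in_deg[c] + out_deg[c] for c in contacts}
    return in_deg, out_deg, total_deg


# Toy directed graph: A -> B, A -> C, B -> A
contacts = {"A", "B", "C"}
edges = {("A", "B"), ("A", "C"), ("B", "A")}
in_deg, out_deg, total_deg = degree_centralities(contacts, edges)
```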

Measuring Social Importance
Clustering Coefficient
– Measures connectivity among the neighborhood of the node
Clique Count
– Clique: fully connected sub-graph
– A large clique count for node v means
  – It connects to large and well-connected sub-graphs
  – It is located in the center of those sub-graphs
(Figure: example graph with nodes A, B, C, D, E, F)
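
Both measures can be illustrated on an undirected graph stored as a dictionary of neighbor sets. The sketch below uses the common local clustering coefficient and counts maximal cliques found by a basic Bron-Kerbosch search; counting only cliques of at least three nodes (`min_size=3`) is an assumption, as the slide does not give the exact definition used.

```python
def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_count(adj, v, min_size=3):
    """Number of maximal cliques of at least min_size nodes that contain v."""
    return sum(1 for c in maximal_cliques(adj) if v in c and len(c) >= min_size)


# Toy graph: triangles a-b-c and c-d-e share node c
graph = {
    "a": {"b", "c"}, "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"}, "e": {"c", "d"},
}
cc_a = clustering_coefficient(graph, "a")
cc_c = clustering_coefficient(graph, "c")
cliques_c = clique_count(graph, "c")
```

Node c sits in both triangles, so it has the higher clique count, while its clustering coefficient is lower than a's because its neighbors from the two triangles are not linked to each other.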

Measuring Social Importance
Betweenness centrality
– Percentage of the shortest paths between pairs of other nodes that go through node i
– σ_jk: number of shortest paths between j and k
– σ_jk(i): number of shortest paths between j and k that go through i
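
With the two quantities defined on this slide, betweenness centrality is usually written as the sum below. This is the standard textbook form, given here as a likely reading of the slide; the symbol C_B and the absence of a normalization constant are assumptions.

```latex
C_B(i) = \sum_{j \neq i \neq k} \frac{\sigma_{jk}(i)}{\sigma_{jk}}
```

Each term is the share of shortest paths between j and k that pass through i, so a node lying on every shortest path between many pairs receives a high score.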

Measuring Social Importance
HITS Authority
– HITS: Hyperlink-Induced Topic Search, also known as Hubs and Authorities
– Measures the global importance of a node
– Definition: given the N-by-N adjacency matrix X, the authority scores can be calculated by finding the principal eigenvector r of the matrix X^T X, where r satisfies (X^T X) r = λ r and λ is the largest eigenvalue
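
A principal eigenvector of this kind is commonly approximated by power iteration. The sketch below follows the standard HITS authority update (authority = X^T X applied repeatedly, then normalized); the function name, the iteration count, and normalizing by the sum are illustrative choices, not details taken from the paper.

```python
def hits_authority(X, iterations=100):
    """Approximate the principal eigenvector of X^T X by power iteration.

    X[i][j] = 1 if there is a link from node i to node j, else 0.
    """
    n = len(X)
    r = [1.0 / n] * n
    for _ in range(iterations):
        # Hub step: hub[i] sums the authority of the nodes that i points to.
        hub = [sum(X[i][j] * r[j] for j in range(n)) for i in range(n)]
        # Authority step: new[j] sums the hub scores of the nodes pointing to j.
        new = [sum(X[i][j] * hub[i] for i in range(n)) for j in range(n)]
        norm = sum(new)
        if norm == 0:
            return r  # graph without links: keep the uniform vector
        r = [value / norm for value in new]
    return r


# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2
X = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
authority = hits_authority(X)
```

Node 2 is pointed to by both other nodes and ends up with the highest authority; node 0 receives no links and scores zero.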

Measuring Social Importance
PCC Analysis
– PCC: Pearson Correlation Coefficient
– Compute the PCC of each social metric with the human-labeled importance levels of messages
– Indicates how useful each metric is for predicting the importance of messages
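
The PCC between one social metric and the labeled importance levels can be computed as below. The function is the standard Pearson formula; the sample values are made up for illustration and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant sequence carries no correlation signal
    return cov / (spread_x * spread_y)


# Made-up example: sender in-degree per message vs. labeled importance (1 to 5)
sender_in_degree = [1, 2, 3, 4]
importance = [2, 3, 4, 5]
pcc = pearson(sender_in_degree, importance)
```

A value near 1 or -1 suggests the metric tracks importance closely; a value near 0 suggests it is of little use on its own.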

Semi-supervised Importance Propagation
Semi-supervised Importance Propagation (SIP)
– Propagates the importance values of labeled messages (the training examples) to other messages and the corresponding contact persons

SIP Algorithm
Use a bipartite graph to represent the interactions between contacts and messages
– Let N = number of contacts, M = number of messages
– Two matrices represent the two types of edges: matrix A (N by M) and matrix B (N by M)
  – A_i,j = 1 if person i sent message j, and A_i,j = 0 otherwise
  – B_i,j = 1 if person i received message j, and B_i,j = 0 otherwise

SIP Algorithm
– Treat each importance label (1~5) as a category
– Use a vector x_k (M by 1) to indicate the labels of messages
  – x_k,i = 1 if message i belongs to category k, x_k,i = 0 otherwise
– Importance is propagated from messages to persons (receivers)
– Importance is propagated from persons (senders) to messages
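
The slide's two propagation formulas are not available in this transcript. One dimensionally consistent reading, given that A and B are N by M and x_k is M by 1, is that receivers collect importance as B x_k and that senders pass it on to their messages as A^T y_k. The sketch below implements that reading; both formulas and all names are assumptions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def messages_to_receivers(B, x_k):
    """Assumed form y_k = B x_k: each person sums the labels of received messages."""
    return matvec(B, x_k)


def senders_to_messages(A, y_k):
    """Assumed form A^T y_k: each message takes the score of its sender."""
    return matvec(transpose(A), y_k)


# Toy data: 3 persons, 2 messages.
# Message 0: sent by person 0, received by persons 1 and 2.
# Message 1: sent by person 1, received by person 2.
A = [[1, 0], [0, 1], [0, 0]]  # A[i][j] = 1 if person i sent message j
B = [[0, 0], [1, 0], [1, 1]]  # B[i][j] = 1 if person i received message j
x_k = [1, 0]                  # only message 0 is labeled with category k

person_scores = messages_to_receivers(B, x_k)
message_scores = senders_to_messages(A, person_scores)
```

In the toy data, message 1 has no label of its own but picks up a score because its sender received a message labeled with category k.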

Propagation Example
(Figure: importance propagating from messages to persons (receivers) and from persons (senders) to messages)

SIP Algorithm
The importance values for contact persons are updated at each time step (t)

SIP Algorithm
– The importance vector y_k at step t+1 is a linear transformation, by a matrix C, of y_k at step t
– If C is irreducible and t is large, y_k stabilizes at the principal eigenvector of C
  – The irreducible property is not always guaranteed
  – When it does hold, the principal eigenvector is insensitive to the starting vector

SIP Algorithm
A linear interpolation
– Define a vector and normalize it by the sum of its elements
– Define an importance-sensitive matrix whose columns are identical, each column being equal to that normalized vector
– Normalize matrix C to C’
– Interpolate between the two matrices with a weight α ∈ [0, 1] to obtain E_k
– E_k is irreducible and importance-sensitive

SIP Algorithm
Finally,
– The SIP method is defined iteratively using E_k
– Because E_k is irreducible, y_k stabilizes when t is large
– y_k consists of the expected importance score of each person after iterative SIP
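
The iteration can be sketched end to end, but the transcript does not preserve the slide formulas, so every concrete form below is an assumption: C = B A^T, column normalization for C’, the normalized B x_k as the importance-sensitive column, E_k = α C’ + (1 − α) U_k, and the default α = 0.85. The structure mirrors personalized PageRank and is offered only as a plausible reading.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sip_scores(A, B, x_k, alpha=0.85, iterations=100):
    """Iterate y <- E_k y under the assumed form E_k = alpha*C' + (1-alpha)*U_k."""
    n, m = len(A), len(x_k)
    # Assumed C = B A^T: C[i][j] counts messages sent by j and received by i.
    C = [[sum(B[i][q] * A[j][q] for q in range(m)) for j in range(n)]
         for i in range(n)]
    # Assumed normalization: each nonzero column of C' sums to 1.
    col_sums = [sum(C[i][j] for i in range(n)) for j in range(n)]
    C_norm = [[C[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
              for i in range(n)]
    # Assumed importance-sensitive vector: B x_k normalized by its sum.
    v = matvec(B, x_k)
    total = sum(v)
    if total == 0:
        return [0.0] * n  # no labeled message of category k reached anyone
    v = [value / total for value in v]
    y = list(v)
    for _ in range(iterations):
        propagated = matvec(C_norm, y)
        mass = sum(y)
        # U_k has every column equal to v, so U_k y = v * sum(y).
        y = [alpha * propagated[i] + (1 - alpha) * v[i] * mass for i in range(n)]
        norm = sum(y)
        y = [value / norm for value in y]
    return y


# Toy data: person 0 sends message 0 to persons 1 and 2; person 1 sends message 1 to person 2.
A = [[1, 0], [0, 1], [0, 0]]
B = [[0, 0], [1, 0], [1, 1]]
scores = sip_scores(A, B, x_k=[1, 0])
```

The (1 − α) term keeps the result tied to the labeled messages of category k, which is the role the slides assign to the importance-sensitive matrix.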

Experiments
Data
– Recruited 25 experimental subjects
– Each subject was asked to label non-spam messages
Preprocessing
– Email address canonicalization
– Word tokenization and stemming; stop words were not removed from the title and body text

Experiments
Features
– Basic features are the tokens in the from, to, cc, title, and body text fields, represented by a v-dimensional vector
– Social-network-based features
  – An m-dimensional sub-vector represents the Newman Clustering (NC) features
  – A 7-dimensional sub-vector represents the social importance (SI) features
  – A 5-dimensional sub-vector represents the five SIP scores per user

Experiments
Classifiers
– Five linear SVM classifiers predict the importance level of each message
– The standard SVM light software package is used
Metric
– N = number of messages
– y_i = the true importance level of message i
– ŷ_i = the predicted importance level for that message
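
The transcript lists the metric's symbols but not its formula or name. A natural fit for these symbols is mean absolute error, MAE = (1/N) Σ |y_i − ŷ_i|; the sketch below assumes that reading, and the sample labels are made up.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted importance levels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up example on the 1 to 5 importance scale
true_levels = [5, 3, 1, 4]
predicted_levels = [4, 3, 2, 2]
mae = mean_absolute_error(true_levels, predicted_levels)
```

Under this metric, predicting level 4 for a level-5 message costs less than predicting level 1, which suits ordered importance levels better than plain accuracy.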

Experiments
(Figure: experimental results)

Conclusions and Future Work
Future work
– Collect more data from a larger number of users over a longer time period
– Comparative study of different clustering algorithms and graph-mining techniques with respect to effectiveness