Page 1 Inferring Relevant Social Networks from Interpersonal Communication Munmun De Choudhury, Winter Mason, Jake Hofman and Duncan Watts WWW ’10 Summarized.

Page 1
Inferring Relevant Social Networks from Interpersonal Communication
Munmun De Choudhury, Winter Mason, Jake Hofman and Duncan Watts, WWW ’10
Summarized and presented by Kim Chungrim

Page 2 Contents
Introduction
Motivation
Inferring Social Networks
– Dataset
– Constructing Thresholded Networks
Network Descriptive Statistics
– Network-level features
– Node-level features
Network-based Prediction
– Node Status / Gender
– Future Communication / Community Detection
Discussion / Conclusion

Page 3 Introduction
The rapidly growing volume of electronic communication data has greatly benefited social network analysis. However, social network analysts face two problems:
– Inference problem: the "real" social ties are not directly observable and hence must be inferred from observed events
– Relevance problem: there is no one "true" social network, but rather many such networks, each corresponding to a different definition of a tie and each relevant to different social processes
Depending on the definition of an edge, a network can have different meanings:
1) An edge exists between i and j if either has communicated with the other at least once in the past year
2) An edge exists if each has communicated with the other at least once in the past week
3) An edge exists if each has communicated with the other at least once per week for the past year
Which of these networks is the "relevant" one depends on the research question of interest.
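
The three edge definitions above can be made concrete with a minimal sketch, assuming a hypothetical message log of (sender, recipient, timestamp) tuples. The function names are illustrative, and "past year" and "every week" are approximated here as 365 days and 52 consecutive 7-day windows.

```python
from datetime import datetime, timedelta

# Assumed input: a list of (sender, recipient, timestamp) tuples (hypothetical format).

def _directed_pairs(msgs, start, end):
    """Directed (sender, recipient) pairs with a message in the window (start, end]."""
    return {(s, r) for s, r, t in msgs if start < t <= end and s != r}

def _mutual(pairs):
    """Undirected edges where messages flowed in both directions."""
    return {frozenset(p) for p in pairs if (p[1], p[0]) in pairs}

def edges_any_past_year(msgs, now):
    """Definition 1: either person messaged the other at least once in the past year."""
    return {frozenset(p) for p in _directed_pairs(msgs, now - timedelta(days=365), now)}

def edges_mutual_past_week(msgs, now):
    """Definition 2: each person messaged the other at least once in the past week."""
    return _mutual(_directed_pairs(msgs, now - timedelta(days=7), now))

def edges_mutual_every_week(msgs, now, weeks=52):
    """Definition 3: each person messaged the other in every one of the past `weeks` weeks."""
    result = None
    for k in range(weeks):
        end = now - timedelta(days=7 * k)
        weekly = _mutual(_directed_pairs(msgs, end - timedelta(days=7), end))
        result = weekly if result is None else result & weekly
    return result if result is not None else set()

# Toy usage with hypothetical data.
now = datetime(2010, 1, 1)
msgs = [
    ("a", "b", now - timedelta(days=2)),
    ("b", "a", now - timedelta(days=3)),
    ("a", "c", now - timedelta(days=200)),
]
year_net = edges_any_past_year(msgs, now)        # a-b and a-c
week_net = edges_mutual_past_week(msgs, now)     # a-b only
steady_net = edges_mutual_every_week(msgs, now)  # empty: a-b is not active every week
```

The definitions are nested: every edge of the third network is in the second, and every edge of the second is in the first, so the same log yields progressively sparser networks.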

Page 4 Motivation
Define a minimum threshold on the edges of a network
Infer networks for various definitions of the threshold on a tie
Study the impact of different thresholded networks on:
– Descriptive statistics
– The ability of the network to predict node characteristics

Page 5 Inferring Social Networks - Datasets
University
– A compiled registry of all email associated with individuals at a large university
– Duration: 2 years (6 trimesters)
– Number of users: 19,817
– Number of emails: 1.09M
– Emails involving non-university domains are disregarded
– A node contains information about a person: id, gender, position, etc.
Enron
– A repository of the emails exchanged internally among Enron employees
– Duration: 4 years
– Number of users: 4,736
– Number of emails: 1.06M
– A node contains information about a person: id, position, etc.
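
The step of disregarding emails that involve non-university domains can be sketched as a filter, assuming each email is a dict with hypothetical "from" and "to" fields and a placeholder domain `univ.edu` (the real domain is not named in the slides). Whether the study dropped a whole message or only its external recipients is not stated; this sketch drops the whole message.

```python
# Placeholder domain; the actual university domain is not given in the slides.
UNIVERSITY_DOMAIN = "univ.edu"

def is_internal(address, domain=UNIVERSITY_DOMAIN):
    """True if the address belongs to `domain` or one of its subdomains."""
    host = address.rsplit("@", 1)[-1].lower()
    return host == domain or host.endswith("." + domain)

def keep_internal(emails):
    """Keep only emails whose sender and all recipients are internal (assumed rule)."""
    return [
        e for e in emails
        if is_internal(e["from"]) and all(is_internal(r) for r in e["to"])
    ]

# Toy usage with hypothetical addresses.
emails = [
    {"from": "alice@univ.edu", "to": ["bob@cs.univ.edu"]},
    {"from": "alice@univ.edu", "to": ["carol@gmail.com"]},
]
kept = keep_internal(emails)  # only the first email survives
```

Matching on the ".univ.edu" suffix rather than a bare substring keeps departmental subdomains while rejecting look-alike hosts such as "notuniv.edu".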

Page 6 Inferring Social Networks - Constructing Thresholded Networks
Edge definition
– Edge weight: the geometric mean of the annualized rates of messages exchanged in each direction, w_ij = sqrt(r_ij · r_ji)
Edge threshold
– A minimum of τ emails between each pair of individuals over a period of time T
– A social graph G(V, E; τ) such that (i, j) ∈ E if and only if w_ij ≥ τ
– A family of networks: {G(τ_1), G(τ_2), …, G(τ_n)}
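
A minimal sketch of the edge weighting and thresholding, assuming messages arrive as (sender, recipient) pairs collected over a known number of years. The threshold is called `tau` here and the helper names are hypothetical. Pairs that communicated in only one direction get a geometric mean of zero and are dropped.

```python
import math
from collections import Counter

def edge_weights(messages, period_years):
    """Geometric mean of the annualized message rates in both directions, per unordered pair."""
    counts = Counter((s, r) for s, r in messages if s != r)
    weights = {}
    for i, j in {tuple(sorted(key)) for key in counts}:
        rate_ij = counts[(i, j)] / period_years  # Counter returns 0 for missing keys
        rate_ji = counts[(j, i)] / period_years
        w = math.sqrt(rate_ij * rate_ji)
        if w > 0:  # one-directional pairs have weight 0 and are dropped
            weights[(i, j)] = w
    return weights

def thresholded_graph(weights, tau):
    """G(V, E; tau): keep only edges whose weight is at least tau."""
    return {pair for pair, w in weights.items() if w >= tau}

def network_family(weights, taus):
    """The family {G(tau_1), ..., G(tau_n)}, keyed by threshold."""
    return {tau: thresholded_graph(weights, tau) for tau in taus}

# Toy usage: two years of hypothetical traffic.
messages = [("a", "b")] * 8 + [("b", "a")] * 2 + [("a", "c")] * 10
weights = edge_weights(messages, period_years=2)  # {('a', 'b'): 2.0}
family = network_family(weights, taus=[1, 2, 3])
```

The geometric mean rewards reciprocity: a pair exchanging 4 and 1 messages per year gets weight 2.0, while a pair with 10 messages in one direction and none back gets 0.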

Page 7 Network Descriptive Statistics – Network Level Features
Network density:
– Number of edges
– Number of connected nodes
– Number of components
– Relative sizes of components
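
These network-level statistics can be computed for each thresholded network in plain Python, assuming the network is a set of undirected (u, v) edges. Here "connected nodes" is read as nodes with at least one edge, and relative component sizes as fractions of the connected nodes; both readings are assumptions, since the slide does not define them.

```python
from collections import defaultdict

def components(edges):
    """Connected components of an undirected graph given as (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

def network_stats(edges):
    """Network-level features of one thresholded network."""
    comps = sorted(components(edges), key=len, reverse=True)
    n_connected = sum(len(c) for c in comps)
    return {
        "edges": len(edges),
        "connected_nodes": n_connected,
        "components": len(comps),
        "relative_sizes": [len(c) / n_connected for c in comps],
    }

stats = network_stats({("a", "b"), ("b", "c"), ("d", "e")})
```

Running this over every network in the threshold family shows how the graph fragments as the threshold rises.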

Page 8 Network Descriptive Statistics – Network Level Features

Page 9 Network Descriptive Statistics – Node Level Features
Reach of a node:
– Node degree: the number of neighbors of the node
– Average neighbor degree: the average degree over all of a node's neighbors
– Size of two-hop neighborhood: the count of all of the node's neighbors plus all of the node's neighbors' neighbors
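
A sketch of the three reach features on an undirected adjacency-set representation (an assumed data structure). For the two-hop neighborhood, nodes reachable by several paths are counted once and the focal node itself is excluded; the slide does not say how duplicates are handled, so that is an assumption.

```python
from collections import defaultdict

def adjacency(edges):
    """Undirected adjacency sets from (u, v) edges, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def reach_features(adj, node):
    """Degree, average neighbor degree, and two-hop neighborhood size of `node`."""
    nbrs = adj.get(node, set())
    degree = len(nbrs)
    avg_nbr_degree = sum(len(adj[n]) for n in nbrs) / degree if degree else 0.0
    two_hop = set(nbrs)
    for n in nbrs:
        two_hop |= adj[n]   # add neighbors' neighbors
    two_hop.discard(node)   # the node is not part of its own neighborhood
    return {
        "degree": degree,
        "avg_neighbor_degree": avg_nbr_degree,
        "two_hop_size": len(two_hop),
    }

adj = adjacency([("a", "b"), ("a", "c"), ("c", "d")])
features_a = reach_features(adj, "a")  # degree 2, average neighbor degree 1.5, two-hop size 3
```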

Page 10 Network Descriptive Statistics – Node Level Features
Closure of the ego network:
– Embeddedness
– Normalized clustering coefficient
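
The slide names these closure measures without defining them, so the following is only a sketch under stated assumptions: a node's embeddedness is taken as the mean number of common neighbors over its ties (edge embeddedness averaged per node), and the clustering coefficient is the standard local one. The slide's normalization is not specified, so the sketch returns the unnormalized local coefficient.

```python
from collections import defaultdict
from itertools import combinations

def adjacency(edges):
    """Undirected adjacency sets from (u, v) edges, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def embeddedness(adj, node):
    """Assumed definition: mean number of common neighbors over the node's ties."""
    nbrs = adj.get(node, set())
    if not nbrs:
        return 0.0
    return sum(len(nbrs & adj[n]) for n in nbrs) / len(nbrs)

def clustering(adj, node):
    """Local clustering coefficient: share of neighbor pairs that are themselves linked."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

# Triangle a-b-c plus a pendant node d attached to a.
adj = adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
emb_a = embeddedness(adj, "a")  # (1 + 1 + 0) / 3
cc_a = clustering(adj, "a")     # 1 linked pair out of 3
```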

Page 11 Network Descriptive Statistics – Node Level Features
Extent to which a node "bridges" communities:
– Network constraint [Burt ’04]
– Number of ego components: the number of connected components that remain when the focal node and its incident edges are removed
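
A sketch of both bridging measures on an unweighted, undirected graph. Burt's constraint is computed as c_i = Σ_j (p_ij + Σ_q p_iq · p_qj)², with j and q ranging over i's neighbors and, as an unweighted simplification, p_ij = 1/deg(i) for each neighbor j. Ego components are counted among the focal node's neighbors after the node is removed, which is one reading of the slide's definition.

```python
from collections import defaultdict

def adjacency(edges):
    """Undirected adjacency sets from (u, v) edges, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def burt_constraint(adj, node):
    """Burt's network constraint with unweighted proportional ties p_ij = 1/deg(i)."""
    nbrs = adj.get(node, set())
    if not nbrs:
        return float("nan")  # undefined for isolated nodes

    def p(i, j):
        return 1.0 / len(adj[i]) if j in adj[i] else 0.0

    total = 0.0
    for j in nbrs:
        indirect = sum(p(node, q) * p(q, j) for q in nbrs if q != j)
        total += (p(node, j) + indirect) ** 2
    return total

def ego_components(adj, node):
    """Connected components among the node's neighbors once the node is removed."""
    nbrs = adj.get(node, set())
    seen, count = set(), 0
    for start in nbrs:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u] & nbrs:  # stay inside the ego network
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count

star = adjacency([("a", "x"), ("a", "y"), ("a", "z")])
k3 = adjacency([("a", "b"), ("b", "c"), ("a", "c")])
tri_plus = adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
c_star = burt_constraint(star, "a")  # 3 * (1/3)**2 = 1/3: a brokers between x, y, z
c_k3 = burt_constraint(k3, "a")      # 2 * (1/2 + 1/4)**2 = 1.125: fully closed
ego_a = ego_components(tri_plus, "a")  # {b, c} and {d}
```

Low constraint and many ego components both indicate a node that bridges otherwise separate groups.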

Page 12 Network-based Prediction
The characteristics of a network depend on the threshold
Which network should be chosen for a given experiment?
Experiments to find the right threshold for various research interests:
– Predicting node status/gender
– Predicting future communication activity
– Community detection
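
The threshold search can be sketched as a sweep: build the network at each candidate threshold, score it on the task of interest, and keep the best. The scoring function used here (F1 against a hypothetical set of "true" ties) is only a stand-in for the prediction tasks listed above, and all names and data are illustrative.

```python
def best_threshold(taus, build_network, evaluate):
    """Return the threshold with the highest task score, plus all scores."""
    scores = {tau: evaluate(build_network(tau)) for tau in taus}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical edge weights and a hypothetical set of "true" ties.
weights = {("a", "b"): 5.0, ("a", "c"): 3.0, ("b", "c"): 0.5, ("c", "d"): 0.2}
true_ties = {("a", "b"), ("a", "c")}

def build_network(tau):
    return {edge for edge, w in weights.items() if w >= tau}

def f1_score(network):
    """Stand-in task score: F1 of recovered edges against the true ties."""
    hits = len(network & true_ties)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(network), hits / len(true_ties)
    return 2 * precision * recall / (precision + recall)

best_tau, scores = best_threshold([0, 1, 4, 6], build_network, f1_score)
```

In this toy case the score rises and then falls as the threshold increases: too low keeps noisy edges, too high deletes real ties, and the peak sits in between at tau = 1.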

Page 13 Prediction Tasks: Node Status/Gender
Given a feature vector for each node i, a feature matrix is built from the feature vectors of all nodes, and a vector of the status/gender attribute of each node i is constructed.
The feature matrix and the attribute vector are split into a training set and a test set:
– Training set: 90% of the data
– Test set: 10% of the data
Using an SVM with a Gaussian RBF kernel, the parameters and kernel width are learned with 10-fold cross-validation.
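
A sketch of this setup with scikit-learn, assuming synthetic data from `make_classification` in place of the real node-feature matrix and status/gender labels. The parameter grid values and the added feature scaling are choices made here, not taken from the slides. In scikit-learn the RBF kernel width is controlled by `gamma` (larger `gamma` means a narrower kernel).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: rows = nodes, columns = network features, y = binary attribute.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 90% training / 10% test split, stratified on the label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)

# Tune the penalty C and the RBF kernel width (gamma) with 10-fold cross-validation.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}
search = GridSearchCV(model, grid, cv=10)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Repeating this fit once per thresholded network, with features recomputed each time, gives one test accuracy per threshold.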

Page 14 Prediction Tasks: Future Communication / Community Detection
Future communication: a feature vector is built from the activity of node j from time t0 to tm, and the target is the activity of node i at time tl. The model expresses node i's communication activity as a function of this feature vector.
– The best-fit regression coefficients are used to predict future node activity
Community detection: fit a stochastic block model to each thresholded network using variational Bayes inference [Hofman et al. 2008]
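
The regression step can be sketched with ordinary least squares under assumptions the slide does not spell out. Here node i's future activity is modeled as a linear function of an intercept, its own past activity, and the mean past activity of its neighbors j. The feature choice and helper names are hypothetical, and the stochastic block model step is not shown.

```python
import numpy as np

def activity_features(past_activity, adj, node):
    """Hypothetical features: intercept, own past activity, mean past activity of neighbors."""
    nbrs = adj.get(node, set())
    nbr_mean = float(np.mean([past_activity[j] for j in nbrs])) if nbrs else 0.0
    return [1.0, past_activity[node], nbr_mean]

def fit_activity_model(past_activity, future_activity, adj):
    """Least-squares regression coefficients for future activity."""
    nodes = sorted(future_activity)
    X = np.array([activity_features(past_activity, adj, n) for n in nodes])
    y = np.array([future_activity[n] for n in nodes])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_activity(coef, past_activity, adj, node):
    return float(np.dot(coef, activity_features(past_activity, adj, node)))

# Toy path graph a-b-c-d; future activity generated as 1 + 0.5*own + 2*neighbor mean.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
past = {"a": 2.0, "b": 4.0, "c": 0.0, "d": 6.0}
future = {"a": 10.0, "b": 5.0, "c": 11.0, "d": 4.0}
coef = fit_activity_model(past, future, adj)  # recovers roughly [1.0, 0.5, 2.0]
```

Because the toy targets are exactly linear in the features, least squares recovers the generating coefficients; on real data the fit quality would be compared across thresholded networks.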

Page 15 Experimental Results – University Dataset

Page 16 Experimental Results – Enron Dataset

Page 17 Conclusion
It is hard to find the optimal threshold:
– Accuracy is maximized at a non-obvious point
– Still, accuracy improves by 30% over the unthresholded network
– Deleting edges removes noise
The optimal threshold falls at a consistent value:
– For different prediction tasks
– For different datasets

Page 18 Summary / Discussion / Future work
Network inference procedures assume ad hoc edge filtering
Introduced a threshold on edges and a family of networks to find an optimal threshold for a given prediction task
– Prediction accuracies peak in a non-obvious yet relatively narrow threshold range
Tested on too few datasets: not enough to support a solid conclusion
Apply the method to a variety of networks
Test various thresholds for more research interests