Social Network Analysis


Social Network Analysis Shan Zhong

Data: From Facebook, we obtained profile and network data for 10 ego-networks, consisting of 193 circles and 4,039 users. The 10 ego users were asked to manually identify all the circles to which their friends belonged. The Facebook data have been anonymized by replacing the Facebook-internal ID of each user with a new value.

There are many models available. Some consider only the edges (links) between users, some consider only user profiles, and some are mixed. My models consider only the edges between users, i.e., only the network structure, and we pretend we do not have the true social circles for each user. I tried several models and chose two to present (illustrated on person 1, using structural cohesion): 1. Baseline model: hierarchical clustering. 2. Mixed Membership Stochastic Blockmodels (E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, Journal of Machine Learning Research, 2008).

1. Hierarchical clustering with an agglomerative, Euclidean-distance, max-linkage approach. Hierarchical clustering seeks to build a hierarchy of clusters. The "bottom-up" (agglomerative) approach starts with each observation in its own cluster and repeatedly merges the two closest clusters as one moves up the hierarchy. Distance metric: Euclidean distance, i.e., the "ordinary" straight-line distance between two points in Euclidean space. Distance between two circles (max linkage): the maximum distance over all pairs of users, one from each circle. The idea is to first define the distance between nodes, then the distance between circles; then, for each given number of circles, there is a smallest total circle distance. A minimal sketch of this baseline is shown below.
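A minimal R sketch of this baseline. The slides do not say which node features are used, so treating each row of the (hypothetical) 0/1 adjacency matrix adj as that node's feature vector is an illustrative assumption, not the author's exact setup:

# Agglomerative hierarchical clustering: Euclidean distance, max (complete) linkage.
# 'adj' is a toy N x N 0/1 adjacency matrix; row p stands in for node p's features.
set.seed(1)
adj <- matrix(rbinom(100, 1, 0.3), nrow = 10)
d   <- dist(adj, method = "euclidean")    # pairwise node distances
hc  <- hclust(d, method = "complete")     # circle distance = max over node pairs
circles <- cutree(hc, k = 3)              # cut the hierarchy into k circles
table(circles)                            # circle sizes

With complete (max) linkage, hclust merges, at each step, the two clusters whose farthest-apart members are closest, which matches the circle-distance definition above.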

For User 1, the computation time does not vary much across the different ways of defining the distance.

Mixed Membership Stochastic Blockmodels. Still using User 1 to explain: pick any entry of the adjacency matrix above; it corresponds to a pair of nodes (users), p (the row) and q (the column). Assume there are k circles (blocks) in total and, to start, that each node belongs to exactly one circle; encode this as an indicator vector Zp (or Zq) for each node. Example: Zp = c(0, 0, 1, 0) if p belongs to the third circle and there are 4 circles in total. B is a k x k matrix that describes the relationship between circles: B(g, h) is the probability of a link (friendship) between a user from group g and a user from group h. So if p is in block g and q is in block h, the probability of observing an interaction from p to q is B(g, h). The relation between p and q is then Y(p, q) ~ Bernoulli( t(Zp) B Zq ), where t() denotes transpose. For example, if p is in block 3 and q is in block 2, then P(Y(p, q) = 1) = B(3, 2).

Assume there are K circles (blocks) and N users (nodes). Y(p, q) is binary: 0 or 1. Because our data are undirected, we do not need to worry about the direction of Z. A small generative sketch of this model follows.
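A small R sketch of the generative process just described, using the single-membership simplification from the previous slide; all names (K, N, B, z, Z, Y) are illustrative, not the author's code:

# Generative sketch: Y(p, q) ~ Bernoulli( t(Zp) B Zq ), one block per node.
set.seed(1)
K <- 4; N <- 20
B <- matrix(runif(K * K, 0, 0.1), K, K)   # off-diagonal: sparse links across blocks
diag(B) <- 0.8                            # diagonal: dense links within a block
z <- sample(K, N, replace = TRUE)         # block assignment of each user
Z <- diag(K)[z, ]                         # indicator vectors Zp as rows of Z
P <- Z %*% B %*% t(Z)                     # P[p, q] = B(z[p], z[q])
Y <- matrix(rbinom(N * N, 1, P), N, N)    # observed links

The full mixed membership model of Airoldi et al. (2008) replaces the fixed assignment z with per-node membership vectors drawn from a Dirichlet(alpha), so each user can belong partly to several circles.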

I chose K = 9 to reduce the computation time. Following Airoldi et al. (2008), K can be selected with an approximate BIC, 2 log p(Y) - |α, B| log |Y|, where |α, B| is the number of hyper-parameters in the model and |Y| is the number of positive relations observed.
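As a sketch, the selection rule above could be computed like this, where loglik would come from the fitted model (the fitting itself is not shown here):

# Approximate BIC from Airoldi et al. (2008); compare across candidate K.
bic_mmsb <- function(loglik, n_hyper, n_pos_links) {
  2 * loglik - n_hyper * log(n_pos_links)
}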

Observations and thoughts: The mixed membership model takes about one hour to run, while the baseline method finishes within a second. The mixed membership model provides more information, such as the predicted probability that user p belongs to block (circle) k. These models are good enough to help users identify their friendship circles.
References:
http://www.analytictech.com/e-net/pdwhandout.pdf
https://uc-r.github.io/hc_clustering
http://i.stanford.edu/~julian/pdfs/nips2012.pdf
https://courses.washington.edu/b572/TALKS/2014/TedWestling-2.pdf
http://www.stat.cmu.edu/~brian/905-2009/all-papers/airoldi08a.pdf
https://cran.r-project.org/web/packages/mixedMem/vignettes/mixedMem.pdf
Thank you! Shan Zhong, April 24th, 2019