Danny Hendler Advanced Topics in on-line Social Networks Analysis Social networks analysis seminar Introductory lecture Danny Hendler, Ben-Gurion University CS20225921, Advanced Topics in On-Line Social Networks Analysis
Seminar requirements: Select a paper and notify me by Tuesday, November 8, 2016. Study the paper well and prepare a good presentation. Meet with me to receive feedback at least 1 week before your presentation. Give the seminar talk. Participate in at least 80% of the seminar talks. Recommended reading: “Networks, Crowds, and Markets: Reasoning About a Highly Connected World”, Easley & Kleinberg, 2010 (available online); “Social Media Mining: An Introduction”, Zafarani, Abbasi & Liu, 2014 (available online). Papers list: to be published soon.
Seminar schedule: 3/11/16 Introductory lecture #1. 8/11/16 Papers list published, paper assignment period starts. 10/11/16 Introductory lecture #2. 13/11/16 Paper assignment period ends. 15/11/16 Paper assignments published. 17/11/16 Student talks start, followed by 10 more weeks of student talks until the semester ends.
Talk outline: Social network concepts. Properties of social networks: small-world phenomenon, power-law distribution, community structure. Community detection: Newman & Girvan algorithm, Clique Percolation Method (CPM).
Social networks: What is a social network? A network represented by a graph, where nodes represent actors and edges represent interactions or relationships.
Social networks: an example. [Figure: an example network.] The network has a giant component. Some nodes are very active, others less so.
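The ideas on these slides (a graph of actors, a giant component, more and less active nodes) can be illustrated with a minimal sketch. The friendship graph below is hypothetical and is not the network shown in the lecture figure; "activity" is measured here simply as node degree, which is an assumption.

```python
from collections import deque

# Hypothetical undirected friendship graph: node -> set of neighbours.
graph = {
    "ann": {"bob", "cat", "dan", "eve"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
    "eve": {"ann"},
    "fay": {"gus"},
    "gus": {"fay"},
    "hal": set(),
}

def connected_components(adj):
    """Return the connected components of an undirected graph as sets of nodes."""
    seen = set()
    comps = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp = set()
        queue = deque([start])
        while queue:
            u = queue.popleft()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

components = connected_components(graph)
giant = max(components, key=len)            # the largest component
degree = {node: len(neigh) for node, neigh in graph.items()}
most_active = max(degree, key=degree.get)   # node with the most edges
```

In this toy graph the largest component contains five of the eight nodes, and one node ("ann") accounts for most of the edges.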
Types of online social media
Top 20 USA websites (source: Alexa report, October 2015)
 1 Google.com       11 Craigslist.org
 2 Facebook.com     12 Netflix.com
 3 Amazon.com       13 Live.com
 4 Youtube.com      14 Bing.com
 5 Yahoo.com        15 Linkedin.com
 6 Wikipedia.org    16 Pinterest.com
 7 Ebay.com         17 Espn.go.com
 8 Twitter.com      18 Imgur.com
 9 Go.com           19 Tumblr.com
10 Reddit.com       20 Chase.com
25% social network sites; 25% additional sites with social network aspects.
Knowledge we may gain: Identifying romantic ties in Facebook. (*) Backstrom & Kleinberg. Romantic partnerships and the dispersion of social ties: a network analysis of relationship status on Facebook. CSCW 2014, pp. 831-841.
Knowledge we may gain: Web structure. (*) Broder et al. Graph structure in the Web. WWW 2000, pp. 309-320.
Knowledge we may gain: Dynamics of viral marketing. (*) Leskovec et al. The dynamics of viral marketing. Transactions on the Web, 2007.
Knowledge we may gain: Identify “key players”, collaborations. Paul Erdős, 1913-1996: “A mathematician is a machine for turning coffee into theorems.”
Knowledge we may gain: Identify “key players”, collaborations. Bacon number, Erdős number. Paul Erdős's Bacon number is 5: Paul Erdős and Ronald Graham appeared in N Is a Number: A Portrait of Paul Erdős. Ronald Graham and Merce Cunningham appeared in Great Genius and Profound Stupidity. Merce Cunningham and Dennis Hopper appeared in John Cage: The Revenge of the Dead Indians. Dennis Hopper and Chris Penn appeared in True Romance. Chris Penn and Kevin Bacon appeared in Footloose. Source: Wikipedia.
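A Bacon number (or an Erdős number) is a shortest-path distance in a co-appearance (or co-authorship) graph, so it can be computed with breadth-first search. The sketch below uses only the five co-appearances listed on the slide; the function names are illustrative, not from the lecture.

```python
from collections import deque
from itertools import combinations

# Co-appearances listed on the slide: film -> people who appeared in it.
films = {
    "N Is a Number: A Portrait of Paul Erdős": ["Paul Erdős", "Ronald Graham"],
    "Great Genius and Profound Stupidity": ["Ronald Graham", "Merce Cunningham"],
    "John Cage: The Revenge of the Dead Indians": ["Merce Cunningham", "Dennis Hopper"],
    "True Romance": ["Dennis Hopper", "Chris Penn"],
    "Footloose": ["Chris Penn", "Kevin Bacon"],
}

def build_graph(films):
    """Connect every two people who appeared in the same film."""
    adj = {}
    for cast in films.values():
        for a, b in combinations(cast, 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def distance(adj, source, target):
    """Breadth-first search; returns the number of hops, or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

adj = build_graph(films)
bacon_number = distance(adj, "Kevin Bacon", "Paul Erdős")  # 5 on this chain
```

On a full film database the same search would run over many more edges, and a shorter chain might exist; the value here reflects only the chain quoted on the slide.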
Properties of social networks
Milgram's small-world phenomenon experiment (1967). Six degrees of separation: “I read somewhere that everybody on this planet is separated by only six other people. Six degrees of separation between us and everyone else on this planet.” (*) Milgram decided to check if this is the case… (*) John Guare. Six Degrees of Separation: A Play. Vintage Books, 1990.
Milgram's experiment. Budget: $680! A set of “starters” all try to forward a letter to a single “target” person. Starters are notified of the target's name, address and occupation. Each must forward the letter to someone known on a “first-name basis”. Image taken from Wikipedia.
Milgram's experiment: results. 64 chains arrived. Median length: 6. Source: “Networks, Crowds, and Markets”, D. Easley & J. Kleinberg (book is online).
A slightly more modern example (2008): shortest paths in the Microsoft instant messenger network.
Average path length in real-world networks. Source: “Social Media Mining: An Introduction”, R. Zafarani, M. A. Abbasi & H. Liu (book is online).
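The average path length reported for real-world networks is the mean shortest-path distance over pairs of connected nodes. A minimal sketch of that computation follows; the 4-node path graph is a hypothetical example, and running a breadth-first search from every node only scales to small graphs.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over all connected pairs of distinct nodes."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical example: a path 1 - 2 - 3 - 4.
path_graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
apl = average_path_length(path_graph)  # (1+2+3+1+2+1) / 6
```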
Properties of social networks
A matter of popularity… As a function of k: what fraction of Web pages have k in-links? Approximately 1/k^2.1 (*). (*) Broder et al. Graph structure in the Web. WWW 2000, pp. 309-320.
The power-law distribution, a.k.a. long-tail distribution or scale-free distribution. Most nodes have low degrees; few nodes have extremely high degrees. [Plot: fraction of nodes vs. degree.]
Web pages in-degree: log-log scale. [Plot.]
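A power law a/k^c becomes a straight line on a log-log plot, because log(a/k^c) = log a - c log k, so the slope of the line is -c. The sketch below checks this numerically with an ordinary least-squares fit on synthetic values; the constant a = 0.5 is an arbitrary assumption, and real degree data would be noisy rather than exact.

```python
import math

def loglog_slope(ks, fractions):
    """Least-squares slope of log(fraction) against log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(f) for f in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

ks = range(1, 1001)
a, c = 0.5, 2.1                      # a is hypothetical; c = 2.1 as on the slide
fractions = [a / k ** c for k in ks]
slope = loglog_slope(ks, fractions)  # close to -2.1
```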
Some more examples: friendship network in Flickr; friendship network in YouTube. [Plots.]
Why is popularity power-law?
A simple game… Procedure for creating Web page j ∈ {1, 2, …, N}: Choose a page i<j uniformly at random. With probability p, create a link to page i. With probability 1-p, create a link to the page pointed to by page i. As a function of k: what fraction of Web pages have k in-links? Approximately 1/k^c, where c → 2 as p → 0.
Rich get richer… Procedure for creating Web page j ∈ {1, 2, …, N}: Choose a page i<j uniformly at random. With probability p, create a link to page i. With probability 1-p, create a link to the page pointed to by page i. This is equivalent to: With probability p, choose a page i<j uniformly and create a link to page i. With probability 1-p, choose a page i<j with probability proportional to page i's number of incoming links and create a link to i.
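The copying procedure can be simulated directly. The sketch below is one possible reading of it, with two assumptions that the slide does not spell out: each new page creates exactly one link, and when the chosen page i has no outgoing link (page 1), the new page links to i itself.

```python
import random
from collections import Counter

def copy_model(n, p, seed=0):
    """Simulate the copying procedure; returns link[j] = page that j points to."""
    rng = random.Random(seed)
    link = {}                              # page 1 has no outgoing link
    for j in range(2, n + 1):
        i = rng.randint(1, j - 1)          # choose an earlier page uniformly
        if rng.random() < p or i not in link:
            link[j] = i                    # link to page i itself
        else:
            link[j] = link[i]              # copy the link of page i
    return link

link = copy_model(n=10_000, p=0.2)
in_degree = Counter(link.values())         # page -> number of in-links
# histogram[k] = number of pages with exactly k in-links
histogram = Counter(in_degree[page] for page in range(1, 10_001))
```

Copying the link of a uniformly chosen page selects a target with probability proportional to its in-degree, which is why the two formulations on the slide coincide. Plotting `histogram` on log-log axes is one way to inspect the resulting distribution.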
What behaves like a power law? Fraction of telephone numbers that receive k calls per day (~ a/k^2). Fraction of books bought by k people (~ a/k^3). Fraction of papers receiving k citations (~ a/k^3). Fraction of cities with population k. … (*) Broder et al. Graph structure in the Web. WWW 2000, pp. 309-320.
The situation in random graphs: Nodes are connected at random. Node degrees follow a binomial distribution. The probability of “very popular” nodes is practically 0.
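To see how small "practically 0" is, one can evaluate the binomial tail directly. The sketch below assumes a random graph in which each of the n-1 possible edges of a node exists independently with probability p; the numbers (1000 nodes, mean degree 10, threshold 100) are hypothetical.

```python
import math

def binomial_pmf(n, k, p):
    """P[X = k] for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def prob_degree_at_least(n_nodes, p, k_min):
    """Probability that a given node has degree >= k_min (requires 0 < p < 1)."""
    trials = n_nodes - 1                   # possible neighbours of one node
    return sum(binomial_pmf(trials, k, p) for k in range(k_min, trials + 1))

n_nodes = 1000
p = 10 / (n_nodes - 1)                     # mean degree 10
tail = prob_degree_at_least(n_nodes, p, 100)
```

With these numbers the probability of a node reaching ten times the mean degree is vanishingly small, whereas a power-law distribution assigns such degrees a small but non-negligible fraction of nodes.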
Communities (a.k.a. clusters/modules). Community structure: the organization of vertices in clusters, with “many” edges joining vertices of the same community and “relatively few” edges joining vertices of different communities. Communities often represent sets of actors sharing similar properties or roles.
Community detection
Why is community detection important? A community “summarizes” a group of actors and is relatively easy to visualize and understand. Partitioning into communities reveals high-level domain structure. It may reveal important properties without compromising individuals' privacy.
Community detection applications: Clustering web clients with geographical proximity and similar access patterns, for cache server positioning [Krishnamurthy & Wang, SIGCOMM 2000]. Clustering customers with similar interests, for recommendation systems [Reddy et al., DNIS 2002]. Analysing structural positions: identifying central actors and inter-community mediators. Following political trends. Detecting malicious actors (e.g. spammers). …
Community detection: Newman & Girvan algorithm
“Edge-betweenness” based detection. A divisive method (as opposed to agglomerative methods). Look for an edge that is most “between” pairs of nodes, i.e. responsible for connecting many pairs. Remove the edge and recalculate. Newman and Girvan. Finding and evaluating community structure in networks, 2003.
Shortest-path betweenness: Compute all-pairs shortest paths. For each edge, compute the number of such paths it belongs to. Remove a maximum-weight edge. Repeat until no edges remain (more on this later).
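The four steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: betweenness is accumulated with a breadth-first search from every node, each unordered pair of nodes is counted once, ties for the maximum are broken arbitrarily, and the example graph (two triangles joined by a bridge) is hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness per edge; equal shortest paths share credit equally."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:                       # BFS counting shortest paths (sigma)
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for v in reversed(order):          # push path credit back toward s
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1 + delta[v])
                bet[frozenset((u, v))] += share
                delta[u] += share
    return {e: b / 2 for e, b in bet.items()}  # each pair was seen from both ends

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Remove max-betweenness edges until none remain; record each new partition."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    levels, n_comps = [], len(components(adj))
    while True:
        bet = edge_betweenness(adj)        # recalculated after every removal
        if not bet:
            return levels
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_comps:
            n_comps = len(comps)
            levels.append(comps)

# Hypothetical example: triangles {1,2,3} and {4,5,6} joined by the bridge 3-4.
two_triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
                 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
levels = girvan_newman(two_triangles)
```

The bridge carries all nine shortest paths between the two triangles, so it is removed first and `levels[0]` holds the two triangles; later entries split them further, down to single nodes. Read from the last entry back to the first, `levels` lists the successive levels of a dendrogram.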
Shortest-path betweenness: an example. [Figure sequence: a 9-node graph (nodes 1-9). At each step an edge of maximum betweenness is removed and betweenness is recalculated. Betweenness values shown in successive steps: 24, 9, 3, and then 1 for the remaining steps.]
Shortest-path betweenness: an example. What if there are several shortest paths? [Figure: a 5-node example (nodes 1-5) showing a fractional betweenness value of 2.5.]
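When a pair of nodes has several shortest paths, a common convention (used by Newman and Girvan) is to give each of the paths an equal share of that pair's credit, which is how fractional values arise. The brute-force sketch below enumerates every shortest path explicitly; it is meant for tiny graphs only, and the 4-cycle example is hypothetical, not the graph from the slide.

```python
from collections import deque
from itertools import combinations

def all_shortest_paths(adj, s, t):
    """Every shortest path from s to t, each as a list of nodes."""
    dist, preds = {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], preds[v] = dist[u] + 1, []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                preds[v].append(u)
    if t not in dist:
        return []
    paths = []

    def walk(node, suffix):                # follow predecessors back to s
        if node == s:
            paths.append([s] + suffix)
            return
        for p in preds[node]:
            walk(p, [node] + suffix)

    walk(t, [])
    return paths

def edge_betweenness_bruteforce(adj):
    """Each node pair contributes 1/(number of shortest paths) to every edge on each path."""
    bet = {}
    for s, t in combinations(sorted(adj), 2):
        paths = all_shortest_paths(adj, s, t)
        for path in paths:
            for u, v in zip(path, path[1:]):
                edge = frozenset((u, v))
                bet[edge] = bet.get(edge, 0.0) + 1 / len(paths)
    return bet

# Hypothetical 4-cycle 1-2-3-4-1: opposite corners have two shortest paths each.
square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
bet = edge_betweenness_bruteforce(square)
```

In the 4-cycle every edge gets 1 from its own endpoints plus 0.5 from each of the two opposite-corner pairs, for a total of 2.0.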
Dendrograms (hierarchical trees). A dendrogram illustrates the output of hierarchical clustering algorithms. Leaves represent graph nodes; the top represents the original graph. As we move down the tree, larger communities are partitioned into smaller ones. [Figure: dendrogram over leaves 1-9.]
Shortest-path betweenness: an example. [Figure sequence: the same 9-node example, shown with the dendrogram over leaves 1-9 that is built as edges are removed. Betweenness values shown in successive steps: 24, 9, 3, and then 1 for the remaining steps.]
Evaluation: computer-generated networks. A large number of graphs with 128 nodes and 4 communities of 32 nodes each. Probability p_in for intra-community edges; probability p_ext for inter-community edges. Both are chosen such that the expected vertex degree is 16.
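The constraint on the two probabilities can be made explicit: each node has 31 possible neighbours inside its community and 96 outside, so the expected degree is 31 p_in + 96 p_ext = 16. The generator below is a sketch under that reading; the split of the expected degree into z_in intra-community and 16 - z_in inter-community edges, and the value z_in = 12, are assumptions for illustration.

```python
import random

def planted_partition(z_in, n=128, groups=4, expected_degree=16, seed=0):
    """Random graph with equal-size communities and a fixed expected degree."""
    size = n // groups                     # 32 nodes per community
    z_out = expected_degree - z_in         # expected inter-community degree
    p_in = z_in / (size - 1)               # 31 possible intra-community neighbours
    p_ext = z_out / (n - size)             # 96 possible inter-community neighbours
    rng = random.Random(seed)
    community = {v: v // size for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if community[u] == community[v] else p_ext
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, community, p_in, p_ext

adj, community, p_in, p_ext = planted_partition(z_in=12)
mean_degree = sum(len(neigh) for neigh in adj.values()) / len(adj)
```

Lowering `z_in` toward `z_out` blurs the communities and makes the detection task harder, which is how such benchmarks are typically varied.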
Results (for 64-node networks): z_in=6, z_out=2. [Figure.]
Evaluation: the Zachary karate club. [Figure.]
Results on the Zachary club network. [Figure panels: shortest-path betweenness; shortest-path betweenness without recalculation.] The 2-community partition found with shortest-path betweenness missed just a single person! Recalculation of betweenness is essential.
Quality functions. Hierarchical clustering algorithms create numerous partitions. In general, we do not know how many communities we should seek. How will we know that our clustering is “good”? We need a quality function.
The modularity quality function. There are no communities in random graphs: all edges have equal probabilities. Check how far the intra-community and inter-community densities are from those you would expect in a random graph with identical nodes and the same degree distribution. Newman and Girvan. Finding and evaluating community structure in networks, 2003.
The modularity quality function: $$Q = \frac{1}{2m} \sum_{v,w} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \delta(c_v, c_w)$$ where Q is the modularity value, m is the number of edges, A is the graph adjacency matrix, k_v and k_w are the degrees of the node pair, k_v k_w / 2m is the probability of an edge if degrees are set and edges are placed at random, and δ(c_v, c_w) is the in-same-cluster indicator variable. Clauset, Newman and Moore. Finding community structure in very large networks, 2004.
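A direct, unoptimized evaluation of modularity may help make the terms concrete. The sketch below uses the form given by Clauset, Newman and Moore, Q = (1/2m) Σ_{v,w} [A_vw - k_v k_w / 2m] δ(c_v, c_w), and assumes an undirected simple graph with at least one edge; the two-triangle example graph is hypothetical.

```python
def modularity(adj, community):
    """Modularity Q of a partition; community maps each node to a label."""
    m = sum(len(neigh) for neigh in adj.values()) / 2   # number of edges
    q = 0.0
    for v in adj:
        for w in adj:
            if community[v] != community[w]:            # delta(c_v, c_w) = 0
                continue
            a_vw = 1 if w in adj[v] else 0              # adjacency matrix entry
            expected = len(adj[v]) * len(adj[w]) / (2 * m)
            q += a_vw - expected
    return q / (2 * m)

# Hypothetical example: triangles {1,2,3} and {4,5,6} joined by the bridge 3-4.
two_triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
                 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
by_triangle = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
all_together = {v: "a" for v in two_triangles}

q_split = modularity(two_triangles, by_triangle)    # 5/14
q_single = modularity(two_triangles, all_together)  # 0
```

Putting every node in one community always gives Q = 0, while the natural split of this graph scores 5/14, so the quality function prefers the split.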
Computer-generated networks: modularity. Modularity is maximized at the correct partition. [Figure.]
Zachary club network: modularity. One of two local maxima is at the correct partition. [Figure.]
Community detection: Clique Percolation Method (CPM)
Clique Percolation Method (CPM). Input: a parameter k and a network. Procedure: Find all cliques of size k in the given network. Construct a clique graph, in which two cliques are adjacent if they share k-1 nodes. Each connected component in the clique graph forms a community. Slide based on “Social Media Mining: An Introduction”, R. Zafarani, M. A. Abbasi & H. Liu.
Clique Percolation Method: an example. Cliques of size 3: {1, 2, 3}, {3, 4, 5}, {4, 5, 6}, {4, 5, 7}, {4, 6, 7}, {5, 6, 7}, {6, 7, 8}, {8, 9, 10}. Communities: {1, 2, 3}, {8, 9, 10}, {3, 4, 5, 6, 7, 8}. Reveals overlapping community structure. Slide based on “Social Media Mining: An Introduction”, R. Zafarani, M. A. Abbasi & H. Liu.
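The CPM procedure can be sketched in a few lines. The edge list below is an assumption: it is reconstructed as the union of the edges of the eight listed cliques, since the original figure is not available. Cliques are found by brute force over all k-subsets, which is only reasonable for small graphs.

```python
from itertools import combinations

def clique_percolation(adj, k):
    """Communities = unions of k-cliques connected through shared (k-1)-node overlaps."""
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))     # union-find over the clique graph

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:   # adjacent cliques
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)

# Edges reconstructed from the cliques listed on the slide (an assumption).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (4, 6), (4, 7),
         (5, 6), (5, 7), (6, 7), (6, 8), (7, 8), (8, 9), (8, 10), (9, 10)]
adj = {v: set() for v in range(1, 11)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

communities = clique_percolation(adj, k=3)
```

Nodes 3 and 8 each appear in two of the resulting communities, which is the overlap that partition-based methods cannot express.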