Published by Bryan Black, modified over 9 years ago
1
Social Network Analysis
2
Outline
• Background of social networks
  – Definition, examples and properties
• Data in social networks
  – Data creation, flow and storage
• Analytic tasks in social networks
  – Problems, solutions and examples
• Summary
3
What is a Social Network?
• A definition from Wikipedia
  – A social network is a social structure made up of a set of social actors (such as individuals or organizations) and a set of dyadic ties between these actors.
  – Social network analysis: analyze the structure of the whole network, identify local and global patterns, locate influential entities, and examine network dynamics.
4
Social Network Representation
• Graph representation
• Matrix representation
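Both representations can be sketched in a few lines of Python. The four actors and their ties below are hypothetical; the graph representation is stored as an adjacency list (a dict of neighbor sets) and the matrix representation as a symmetric 0/1 adjacency matrix, assuming undirected ties.

```python
# Hypothetical undirected ties between four actors (illustration only).
edges = [("Ann", "Bob"), ("Bob", "Cai"), ("Cai", "Ann"), ("Cai", "Dee")]


def to_adjacency_list(edges):
    """Graph representation: map each actor to the set of actors tied to it."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)  # undirected: record both directions
    return adj


def to_adjacency_matrix(edges):
    """Matrix representation: A[i][j] = 1 if actors i and j are tied, else 0."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = 1
        A[index[v]][index[u]] = 1  # symmetric because ties are undirected
    return nodes, A


adj = to_adjacency_list(edges)
nodes, A = to_adjacency_matrix(edges)
for name, row in zip(nodes, A):
    print(name, row)
```

The adjacency list needs memory proportional to the number of ties, while the matrix always needs n × n entries, so the list is the usual choice for large, sparse social networks.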
5
Social Network: Examples
7
The Scale and Growth of Social Networks
• Facebook statistics
  – 829 million daily active users on average in June 2014
  – 1.32 billion monthly active users as of June 30, 2014
  – 81.7% of daily active users are outside the U.S. and Canada
  – 22% increase in Facebook users from 2012 to 2013
• Facebook activities (every 20 minutes on Facebook)
  – 1 million links shared
  – 2 million friends requested
  – 3 million messages sent
Sources: http://newsroom.fb.com/company-info/ http://www.statisticbrain.com/facebook-statistics/
8
Visualizing Friendships on Facebook
9
The Scale and Growth of Social Networks
• Twitter statistics
  – 271 million monthly active users in 2014
  – 135,000 new users signing up every day
  – 78% of Twitter active users are on mobile
  – 77% of accounts are outside the U.S.
• Twitter activities
  – 500 million Tweets are sent per day
  – 9,100 Tweets are sent per second
Sources: https://about.twitter.com/company http://www.statisticbrain.com/twitter-statistics/
10
A Tweet Map of America
11
Properties of Large-Scale Social Networks
• Scale-free distributions
• Small-world effect
• Strong community structure
12
Scale-free Distributions
• The degree distribution in large-scale networks often follows a power law; that is, the fraction p(x) of nodes in the network having x connections to other nodes behaves, for large values of x, as $p(x) \propto x^{-\alpha}$, where $\alpha$ is a constant exponent
• A.k.a. long-tail distribution, scale-free distribution
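A minimal sketch of how p(x) can be measured from an edge list. The toy edge list is hypothetical and far too small to exhibit a power law; the point is only the counting. Nodes without any edge never appear in an edge list, so this sketch does not count them.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {x: p(x)}: the fraction of nodes with exactly x connections."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)  # only nodes that appear in at least one edge
    counts = Counter(degree.values())  # how many nodes have each degree
    return {x: c / n for x, c in sorted(counts.items())}


# Hypothetical toy graph: one hub (node 0) and several low-degree nodes.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (3, 4)]
p = degree_distribution(edges)
print(p)
```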
13
Log-log Plot
• A power-law distribution becomes a straight line when plotted on a log-log scale, since $\log p(x) = -\alpha \log x + \text{const}$
[Figures: Friendship Network in Flickr; Friendship Network in YouTube]
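Because the log-log plot turns a power law into a line with slope −α, a rough estimate of α is the least-squares slope of log p(x) against log x. The sketch below assumes synthetic points lying exactly on p(x) = 0.5·x^(−2.5); on real degree data this fit is only a quick diagnostic, and maximum-likelihood estimators are generally preferred.

```python
import math


def fit_power_law_exponent(xs, ps):
    """Least-squares slope of log p(x) vs. log x; returns alpha = -slope."""
    lx = [math.log(x) for x in xs]
    lp = [math.log(p) for p in ps]
    mx = sum(lx) / len(lx)
    mp = sum(lp) / len(lp)
    cov = sum((a - mx) * (b - mp) for a, b in zip(lx, lp))
    var = sum((a - mx) ** 2 for a in lx)
    return -cov / var


# Synthetic data on p(x) = 0.5 * x^(-2.5); real data would be noisier.
xs = [1, 2, 4, 8, 16, 32]
ps = [0.5 * x ** -2.5 for x in xs]
alpha = fit_power_law_exponent(xs, ps)
print(round(alpha, 6))  # 2.5
```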
14
Small-world Effect
• “Six Degrees of Separation”
• A famous experiment conducted by Travers and Milgram (1969)
  – Subjects were asked to send a chain letter to an acquaintance in order to reach a target person
  – The average path length was around 5.5
• Verified on a planetary-scale IM network of 180 million users (Leskovec and Horvitz 2008)
  – The average path length is 6.6
• Facebook users (721 million) were separated by 4.74 degrees on average as of May 2011.
15
Diameter
• Measures used to calibrate the small-world effect
  – Diameter: the longest shortest-path distance in a network
  – Average shortest path length
• Example
  – The shortest distance between node 1 and node 9 is 4.
  – The diameter of the network is 5, corresponding to the shortest distance between nodes 2 and 9.
[Figure panels: Shortest Path; The Longest Shortest Path]
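Both measures can be computed with breadth-first search (BFS) from every node. The edge list is an assumption: the figure is not available, so it is the 9-node example graph reconstructed from the cliques and degrees given on the clique, CPM and degree-centrality slides. With it, BFS reproduces the distances stated above.

```python
from collections import deque

# Assumed edge list, reconstructed from the cliques and degrees on later slides.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8), (7, 9)]


def build_adj(edges):
    """Undirected adjacency sets: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


adj = build_adj(EDGES)
all_dist = {v: bfs_distances(adj, v) for v in adj}
pairs = [(u, v) for u in adj for v in adj if u < v]  # graph is connected
diameter = max(all_dist[u][v] for u, v in pairs)
avg_len = sum(all_dist[u][v] for u, v in pairs) / len(pairs)
print(all_dist[1][9], all_dist[2][9], diameter, round(avg_len, 3))  # 4 5 5 2.111
```

Running one BFS per node costs O(n·(n + m)) overall, which is fine for small graphs; planetary-scale studies such as the ones cited above rely on sampling or approximation instead.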
16
Community Structure
• Community: people in a group interact with each other more frequently than with those outside the group
• Friends of a friend are likely to be friends as well
• Measured by the clustering coefficient:
  – density of connections among one’s friends
17
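Clustering Coefficient
• $C_i = \frac{k_i}{d_i(d_i-1)/2}$, where $d_i$ is the degree of node $i$, $N_i$ is its set of neighbors, and $k_i$ is the number of edges among the nodes in $N_i$
• $d_6 = 4$, $N_6 = \{4, 5, 7, 8\}$
• $k_6 = 4$, from $e(4,5)$, $e(5,7)$, $e(5,8)$, $e(7,8)$
• $C_6 = 4/(4 \cdot 3/2) = 2/3$
• Average clustering coefficient: $C = (C_1 + C_2 + \dots + C_n)/n$
• $C = 0.61$ for the network on the left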
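The calculation can be checked with a short sketch. It assumes an edge list reconstructed from the cliques and degrees given on the later slides (the figure is not reproduced here), and the common convention that a node with fewer than two neighbors has $C_i = 0$; with that convention the average matches the stated C = 0.61.

```python
from itertools import combinations

# Assumed edge list, reconstructed from the cliques and degrees on later slides.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8), (7, 9)]


def build_adj(edges):
    """Undirected adjacency sets: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, v):
    """C_v = k_v / (d_v (d_v - 1) / 2); taken as 0 when d_v < 2."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0
    # k_v: number of edges between pairs of v's neighbors
    k = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return k / (d * (d - 1) / 2)


adj = build_adj(EDGES)
c6 = local_clustering(adj, 6)
avg = sum(local_clustering(adj, v) for v in adj) / len(adj)
print(round(c6, 4), round(avg, 2))  # 0.6667 0.61
```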
18
Data in Social Networks
• Data creation
• Data flow
• Data storage
19
Data Creation in Social Networks
• User profiles and relationships
• User-generated content
  – Text (blogs, microblogs, messages, reviews, etc.): 500 million tweets are sent per day.
  – Images, audio, and video: 100 hours of video are uploaded to YouTube every minute.
20
Distinction from Content in Traditional Media (Newspaper, TV, etc.)
• Inexpensive to generate and publish
• Widely accessible
• Varying quality
• Rich user interaction
21
Data Flow Architecture at Facebook
• Hadoop: a distributed file system and MapReduce platform
• Scribe: a distributed, scalable data bus that aggregates logs from web servers
• Hive: a data warehousing framework for reporting, querying and analysis
• Federated MySQL: contains all Facebook site-related data
[Thusoo et al., SIGMOD’10]
22
Data Storage at Facebook
• The production cluster usually has to hold only one month’s worth of data
• The ad hoc cluster needs to hold all historical data, so that measures, models and hypotheses can be tested on it
• Data is compressed with gzip, achieving a compression factor of 6-7
23
Cold Data Storage
• Facebook uses 10,000 Blu-ray discs to store a petabyte (= 1,000,000 GB) of ‘cold’ data that hardly ever needs to be accessed, including duplicates of its users’ photos and videos that Facebook keeps for backup purposes.
• Compared with its current cold-storage system, which uses hard disk drives, the Blu-ray system reduces costs by 50% and energy use by 80%.
24
Server Racks in Facebook’s Data Center
25
Data Analytic Tasks in Social Networks
• Community detection
• Friend recommendation
• Importance of nodes
• Influence propagation
• Event detection
26
Community Detection
27
What is a Community?
• Community: formed by individuals such that those within a group interact with each other more frequently than with those outside the group
  – a.k.a. group, cluster, cohesive subgroup, or module in different contexts
• Two types of groups in social networks
  – Explicit groups: formed by user subscriptions
  – Implicit groups: implicitly formed by social interactions
28
Community Example [McAuley and Leskovec, NIPS’2012]
29
Subjectivity of Community Definition
[Figure panels: Each component is a community; A densely-knit community]
• The definition of a community can be subjective.
30
Community Detection
• Community detection: discovering groups in a network where individuals’ group memberships are not explicitly given
• Some social media sites allow people to join groups; is it still necessary to extract groups based on network topology?
  – Not all sites provide a community platform
  – Not all people want to make the effort to join groups
  – Groups can change dynamically
31
Community Detection based on Cliques
• Clique: a maximal complete subgraph in which all nodes are adjacent to each other
• In a clique of size k, each node has degree >= k-1 (for example, node 7 with degree 4)
• Nodes with degree < k-1 cannot be included in the clique (for example, node 9 with degree 1)
[Figure: nodes 5, 6, 7 and 8 form a clique of size 4]
32
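Maximum Clique Example
• To find a clique of size greater than 3 (i.e., at least 4), repeatedly remove all nodes with degree < 4-1 = 3, i.e., degree <= 2
  – Step 1. Remove nodes 2 and 9
  – Step 2. Remove nodes 1 and 3
  – Step 3. Remove node 4
• The remaining nodes 5, 6, 7 and 8 form the maximum clique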
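The pruning steps can be sketched as repeatedly deleting every node whose degree is below k − 1. Surviving the pruning is only a necessary condition, so the sketch also checks whether the survivors form a complete subgraph. The edge list is an assumption, reconstructed from the degrees and cliques stated on these slides; the function names are illustrative.

```python
# Assumed edge list, reconstructed from the degrees and cliques on these slides.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8), (7, 9)]


def build_adj(edges):
    """Undirected adjacency sets: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def prune_for_clique(adj, k):
    """Repeatedly delete nodes with degree < k - 1. Returns the survivors and
    the nodes removed in each round; any k-clique lies among the survivors."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    rounds = []
    while True:
        low = sorted(v for v, nbrs in adj.items() if len(nbrs) < k - 1)
        if not low:
            return set(adj), rounds
        rounds.append(low)
        for v in low:
            for w in adj.pop(v):
                if w in adj:  # w may already have been removed this round
                    adj[w].discard(v)


def is_clique(adj, nodes):
    """True if every pair of distinct nodes in `nodes` is adjacent."""
    return all(b in adj[a] for a in nodes for b in nodes if a != b)


adj = build_adj(EDGES)
survivors, rounds = prune_for_clique(adj, 4)
print(rounds)                                        # [[2, 9], [1, 3], [4]]
print(sorted(survivors), is_clique(adj, survivors))  # [5, 6, 7, 8] True
```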
33
Clique Percolation Method (CPM)
• The clique is a very strict definition and is unstable
• Cliques are normally used as a core or seed to find larger communities
• CPM is such a method for finding overlapping communities
  – Input: a parameter k and a network
  – Procedure:
    1. Find all cliques of size k in the given network
    2. Construct a clique graph, in which two cliques are adjacent if they share k-1 nodes
    3. Each connected component in the clique graph forms a community
34
CPM Example (k = 3)
• Cliques of size 3: {1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 6, 8}, {5, 7, 8}, {6, 7, 8}
• Communities: {1, 2, 3, 4} and {4, 5, 6, 7, 8} (node 4 belongs to both)
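A brute-force sketch of the CPM procedure, under the assumption that the example network has the edge list below (reconstructed from the cliques and degrees on these slides). It enumerates k-cliques by testing every k-subset, which is only practical for tiny graphs, and uses union-find to merge cliques that share k − 1 nodes.

```python
from itertools import combinations

# Assumed edge list, reconstructed from the cliques and degrees on these slides.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8), (7, 9)]


def build_adj(edges):
    """Undirected adjacency sets: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """All complete subgraphs on k nodes (brute force over k-subsets)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(b in adj[a] for a, b in combinations(c, 2))]


def clique_percolation(adj, k):
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))  # union-find over cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Clique graph: two k-cliques are adjacent if they share k - 1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    # Each connected component's union of nodes is one community.
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return sorted(sorted(g) for g in groups.values())


adj = build_adj(EDGES)
print(clique_percolation(adj, 3))  # [[1, 2, 3, 4], [4, 5, 6, 7, 8]]
```

Node 9 lies in no triangle, so it is not assigned to any community, and node 4 appears in both, which is the overlap CPM is designed to allow.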
35
Friend Recommendation
36
Friend Recommendation Example
37
What is Friend Recommendation?
• Given a snapshot of a social network, can we recommend new friendships among its members that are likely to occur in the near future?
  – a.k.a. link prediction
• Observation: users do not form friendships at random with all other users. Instead, they tend to prefer other users who are “close” to them.
38
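Popular Link Prediction Heuristics
Heuristic                 Score(x, y)
Shortest path distance    $-\,\mathrm{dist}(x, y)$ (negated length of the shortest path between x and y)
Common neighbors          $|\Gamma(x) \cap \Gamma(y)|$
Adamic/Adar               $\sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|}$
where $\Gamma(x)$ denotes the set of neighbors of node x.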
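The three heuristics can be sketched directly from their definitions. The edge list is an assumption (the 9-node graph reconstructed from the clique and centrality slides), and the candidate pairs are chosen only for illustration.

```python
import math
from collections import deque

# Assumed edge list, reconstructed from the clique and centrality slides.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8), (7, 9)]


def build_adj(edges):
    """Undirected adjacency sets: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_score(adj, x, y):
    """Negated BFS distance between x and y (closer pairs score higher)."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return -dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return -math.inf  # y is unreachable from x


def common_neighbors(adj, x, y):
    return len(adj[x] & adj[y])


def adamic_adar(adj, x, y):
    # A shared neighbor z has degree >= 2, so log|Gamma(z)| > 0.
    return sum(1 / math.log(len(adj[z])) for z in adj[x] & adj[y])


adj = build_adj(EDGES)
for x, y in [(2, 4), (4, 7), (1, 5)]:
    print((x, y), shortest_path_score(adj, x, y),
          common_neighbors(adj, x, y), round(adamic_adar(adj, x, y), 3))
```

All three candidate pairs are two hops apart, yet Adamic/Adar ranks (2, 4) above (4, 7): their shared neighbors 1 and 3 have lower degree than nodes 5 and 6, so each counts for more.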
39
Link Prediction Heuristics Example
40
Link Prediction Accuracy
[Figure: link prediction accuracy* of Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007; Sarkar, 2010
• The number of paths matters, not the length
• For large dense graphs, common neighbors are enough
• Differentiating between different degrees is important
• In sparse graphs, paths of length 3 or more help in prediction
41
Importance of Nodes
42
• Not all nodes are equally important
• Find the most important nodes (influential entities) in a network
• Commonly used measures
  – Degree centrality
  – Closeness centrality
43
Degree Centrality
• The importance of a node is determined by the number of nodes adjacent to it
  – The larger the degree, the more important the node is
  – Only a small number of nodes have high degrees in many real-life networks
• Degree centrality: $C_D(v_i) = d_i$
• Normalized degree centrality: $C'_D(v_i) = \frac{d_i}{n-1}$, where $n$ is the number of nodes
44
Degree Centrality Example
• Which node is the most important in the network?

Node   Degree centrality   Normalized degree centrality
1      3                   3/8
2      2                   2/8
3      3                   3/8
4      4                   4/8
5      4                   4/8
6      4                   4/8
7      4                   4/8
8      3                   3/8
9      1                   1/8

• Nodes 4, 5, 6 and 7 tie for the highest degree centrality
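The table can be reproduced with a few lines of Python. The edge list is an assumption: it is the example graph reconstructed from the degrees in this table and the cliques on the CPM slide, since the figure itself is not available.

```python
# Assumed edge list, reconstructed from this table and the CPM slide.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8), (7, 9)]


def build_adj(edges):
    """Undirected adjacency sets: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


adj = build_adj(EDGES)
n = len(adj)
degree = {v: len(adj[v]) for v in sorted(adj)}           # C_D(v_i) = d_i
normalized = {v: d / (n - 1) for v, d in degree.items()}  # d_i / (n - 1)
top = [v for v, d in degree.items() if d == max(degree.values())]

for v in degree:
    print(v, degree[v], f"{degree[v]}/{n - 1}")
print("highest degree centrality:", top)
```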
45
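Closeness Centrality
• “Central” nodes are important, as they can reach the whole network more quickly than non-central nodes
• Importance is measured by how close a node is to all other nodes
• Average distance: $D_{avg}(v_i) = \frac{1}{n-1} \sum_{j \neq i} g(v_i, v_j)$, where $g(v_i, v_j)$ is the shortest path distance between $v_i$ and $v_j$
• Closeness centrality: $C_C(v_i) = \frac{1}{D_{avg}(v_i)} = \frac{n-1}{\sum_{j \neq i} g(v_i, v_j)}$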
46
Closeness Centrality Example
• Node 4 is more central than node 3
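A sketch of closeness centrality as $(n-1)$ divided by the sum of BFS distances, assuming a connected graph. The edge list is an assumption reconstructed from the degree and clique slides; exact fractions are used so the result can be compared with hand calculations.

```python
from collections import deque
from fractions import Fraction

# Assumed edge list, reconstructed from the degree and clique slides.
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8), (7, 9)]


def build_adj(edges):
    """Undirected adjacency sets: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness_centrality(adj, v):
    """C_C(v) = (n - 1) / sum of shortest-path distances from v.
    Assumes the graph is connected, so BFS reaches every node."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return Fraction(len(adj) - 1, sum(dist.values()))


adj = build_adj(EDGES)
for v in (3, 4):
    c = closeness_centrality(adj, v)
    print(v, c, round(float(c), 3))
```

With this edge list, node 4 gets 8/13 ≈ 0.62 and node 3 gets 8/17 ≈ 0.47, consistent with node 4 being more central.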
47
Summary
• In this lecture, we introduced
  – social networks, examples and their properties
  – data creation, flow and storage in social networks
  – social network analysis tasks, applications and case studies
48
References
• Lei Tang and Huan Liu. Community Detection and Mining in Social Media. Morgan & Claypool, September 2010.