Published by Mervin Richards. Modified over 9 years ago.
1
Microsoft Instant Messenger Communication Network: How does the world communicate?
Jure Leskovec (jure@cs.cmu.edu), Machine Learning Department, http://www.cs.cmu.edu/~jure. Joint work with Eric Horvitz, Microsoft Research.
2
Networks: Why?
Today, large on-line systems leave detailed records of social activity: on-line communities (MySpace, Facebook); email, blogging, and instant messaging; on-line publication repositories (arXiv, MedLine). Emerging behavior (needs lots of data): the actions of individual nodes are independent, but global patterns and regularities emerge.
3
The Largest Social Network
What is the largest social network in the world (that we can relatively easily obtain)? For the first time, we had a chance to look at the complete (anonymized) communication of the whole planet, using the Microsoft MSN instant messenger network.
4
Instant Messaging
[Figure: contact (buddy) list and messaging window]
5
Instant Messaging as a Network
[Figure labels: Buddy; Conversation]
6
IM – Phenomena at planetary scale
Observe social phenomena at planetary scale: How does communication change with user demographics (distance, age, sex)? How does geography affect communication? What is the structure of the communication network?
7
Communication data: the record of communication
Presence data: user status events (login, status change). Communication data: who talks to whom. Demographics data: user age, sex, location.
8
Data description: Presence
Events: login, logout; whether this is the first-ever login; add/remove/block buddy; add unregistered buddy (invite new user); change of status (busy, away, BRB, idle, …). For each event: user ID, time.
9
Data description: Communication
For every conversation (session) we have a list of the users who participated in it. There can be multiple people per conversation. For each conversation and each user: user ID, time joined, time left, number of messages sent, number of messages received.
10
Data description: Demographics
For every user (self-reported): age, gender, location (country, ZIP), language, IP address (we can do a reverse geo-IP lookup).
11
Data collection
Log size: 150 GB/day. Just copying over the network takes 8 to 10 h; parsing and processing takes another 4 to 6 h. After parsing and compressing: ~45 GB/day. Data were collected for 30 days of June 2006. Total: 1.3 TB of compressed data.
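The totals on this slide can be cross-checked with a little arithmetic. The sketch below only multiplies out the quoted figures; it assumes decimal units (1 TB = 1000 GB), which the slide does not specify, and the variable names are illustrative.

```python
# Figures quoted on the slide.
RAW_GB_PER_DAY = 150         # raw log volume per day
COMPRESSED_GB_PER_DAY = 45   # after parsing and compressing (approximate)
DAYS = 30                    # June 2006

total_compressed_gb = COMPRESSED_GB_PER_DAY * DAYS
# Assumption: decimal units, 1 TB = 1000 GB.
total_compressed_tb = total_compressed_gb / 1000
compression_ratio = RAW_GB_PER_DAY / COMPRESSED_GB_PER_DAY

print(f"compressed total: {total_compressed_gb} GB = {total_compressed_tb:.2f} TB")
print(f"raw-to-compressed ratio: {compression_ratio:.1f}x")
```

The product, 1350 GB, agrees with the roughly 1.3 TB reported once the "~" on the daily figure is taken into account.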
12
Network: Conversations
[Figure label: Conversation]
13
Data statistics
Activity over June 2006 (30 days): 245 million users logged in; 180 million users engaged in conversations; 17.5 million new accounts activated; more than 30 billion conversations.
14
Data statistics per day
Activity on June 1, 2006: 1 billion conversations; 93 million users logged in; 65 million different users talked (exchanged messages); 1.5 million invitations for new accounts sent.
15
User characteristics: age
16
Age pyramid: MSN vs. the world
17
Conversation: Who talks to whom?
Cross-gender edges: 300 million male-male and 235 million female-female edges, against 640 million female-male edges.
18
Number of people per conversation
The maximum number of people talking simultaneously is 20, but a conversation can involve more people in total.
19
Conversation duration
Most conversations are short.
20
Conversations: number of messages
Sessions between fewer people run out of steam.
21
Time between conversations
Individuals are highly diverse. What is the probability of logging in to the system after t minutes? A power law with exponent 1.5. Task queuing model [Barabasi]. My email and Darwin's and Einstein's letters follow the same pattern.
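To make the "power law with exponent 1.5" statement concrete, here is a minimal sketch that draws synthetic waiting times from a continuous power law and recovers the exponent with the standard maximum-likelihood estimator. It uses no MSN data; the lower cutoff `x_min = 1.0`, the sample size, and the function names are assumptions made for illustration.

```python
import math
import random

def sample_power_law(alpha, x_min, n, rng):
    """Draw n samples from p(x) ~ x**(-alpha), x >= x_min, by inverse-transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def estimate_exponent(samples, x_min):
    """Maximum-likelihood estimate of alpha for a continuous power law."""
    log_sum = sum(math.log(x / x_min) for x in samples)
    return 1.0 + len(samples) / log_sum

rng = random.Random(0)  # fixed seed so the run is repeatable
gaps = sample_power_law(alpha=1.5, x_min=1.0, n=100_000, rng=rng)
alpha_hat = estimate_exponent(gaps, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
```

On real inter-event times, the choice of `x_min` matters, and a power-law fit should be compared against alternatives before being trusted.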
22
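Age: Number of conversations
[Figure: axis label "User self-reported age"; color scale from Low to High]
23
Age: Total conversation duration
[Figure: axis label "User self-reported age"; color scale from Low to High]
24
Age: Messages per conversation
[Figure: axis label "User self-reported age"; color scale from Low to High]
25
Age: Messages per unit time
[Figure: axis label "User self-reported age"; color scale from Low to High]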
26
Who talks to whom: Number of conversations
27
Who talks to whom: Conversation duration
28
Geography and communication
Count the number of users logging in from a particular location on the earth.
29
How is Europe talking
Logins from Europe
30
Users per geo location
Blue circles have more than 1 million logins.
31
Users per capita
Fraction of population using MSN: Iceland: 35%; Spain: 28%; Netherlands, Canada, Sweden, Norway: 26%; France, UK: 18%; USA, Brazil: 8%.
32
Communication heat map
For each conversation between geo points (A, B), we increase the intensity on the line between A and B.
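The accumulation rule for the heat map can be sketched as follows. This is not the authors' rendering code: it assumes that geo points have already been mapped to integer grid cells and walks each segment in equal steps, and the function names are hypothetical.

```python
from collections import Counter

def cells_on_segment(a, b):
    """Grid cells visited when walking in equal steps from cell a to cell b."""
    (x0, y0), (x1, y1) = a, b
    steps = max(abs(x1 - x0), abs(y1 - y0))
    if steps == 0:
        return [a]
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(steps + 1)
    ]

def heat_map(conversations):
    """Add one unit of intensity to every cell on the line of each conversation."""
    intensity = Counter()
    for a, b in conversations:
        for cell in cells_on_segment(a, b):
            intensity[cell] += 1
    return intensity

# Toy input: two conversations along one diagonal, one along the other.
conversations = [((0, 0), (3, 3)), ((0, 3), (3, 0)), ((0, 0), (3, 3))]
intensity = heat_map(conversations)
print(sorted(intensity.items()))
```

A map-quality version would draw great-circle arcs on a projection rather than straight grid lines; the counting logic stays the same.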
33
Homophily ("birds of a feather flock together"): Age vs. Age
[Figure panels: Correlation; Probability]
34
Per country statistics
On a particular typical day…

Country   # of logins   # of users   # of messages   Messages per user
USA        38,319,363   13,261,337     412,729,278   31.12
Brazil     20,582,613    7,864,424     467,972,522   59.50
France     19,163,131    6,475,858     518,931,785   80.13
Unknown    18,444,352    6,872,347     191,167,085   27.81
Spain      16,868,549    6,140,895     503,759,240   82.03
UK         16,659,009    5,724,826     487,018,470   85.07
Canada     14,558,692    5,021,185     160,249,686   31.91
China      14,225,163    5,314,463     101,003,729   19.00
Turkey     13,619,789    4,696,555     353,540,475   75.27
Mexico     10,756,989    4,359,932     209,195,100   47.98

Note that global usage and market share statistics are higher if we accumulate data over longer time periods.
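The last column of the per-country table is the number of messages divided by the number of users. A short check on three rows copied from the table (the choice of rows and the variable names are illustrative):

```python
# (country, users, messages) copied from three rows of the per-country table.
rows = [
    ("USA", 13_261_337, 412_729_278),
    ("France", 6_475_858, 518_931_785),
    ("UK", 5_724_826, 487_018_470),
]

# Messages per user = messages / users.
per_user = {country: messages / users for country, users, messages in rows}

# Rank countries from most to least talkative per user.
ranked = sorted(per_user.items(), key=lambda item: item[1], reverse=True)
for country, value in ranked:
    print(f"{country:8s}{value:8.2f}")
```

Ranking by this ratio rather than by raw volume is what moves smaller countries to the top of the per-user comparison.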
35
Per typical user per country
On a typical day, an MSN user from a country …

Country       Logins on a particular day   Users on a particular day   Messages sent   Messages per user
Slovenia                         364,988                     130,884      15,919,892   121.63
Malta                            122,846                      41,829       4,993,316   119.37
Hungary                        1,214,268                     427,320      47,623,604   111.45
Bosnia                           105,584                      35,689       3,254,170   91.18
Reunion                          100,335                      33,399       3,041,635   91.07
Gibraltar                         19,096                       6,452         581,195   90.08
UK                            16,659,009                   5,724,826     487,018,470   85.07
Macedonia                        126,729                      43,754       3,669,977   83.88
Netherlands                    7,399,160                   2,696,669     221,300,210   82.06
Spain                         16,868,549                   6,140,895     503,759,240   82.03

Note that global usage and market share numbers are higher if we accumulate data over longer time periods.
36
What about Slovenia (per capita)?

Statistic                  Number       Rank (per capita)
Conversations inside       19,868,886   22
Conversation to outside    7,868,483    48
Total conversations        27,737,369   29
Avg. time inside           309.49       147
Avg. time outside          314.39       80
Avg. time inside (pct.)    0.4960
Messages sent inside       9.78         32
Messages sent outside      9.46         19
Messages inside (pct.)     0.5083
37
Who is Slovenia talking to?

Rank   Target country   Pairs of people   Number of conversations   Avg. time per conv.   Avg. # of messages
1      Slovenia               1,341,250                19,868,886                 309.4                 9.78
2      USA                       61,794                   922,527                 303.4                 9.14
3      Spain                     27,650                   310,357                 289.4                 7.97
4      UK                        14,709                   204,335                 325.4                 9.02
5      Germany                    9,047                   129,551                 350.3                10.20
6      Bosnia                     9,956                   114,509                 385.9                14.62
7      Yugoslavia                 8,194                   104,270                 381.7                12.55
8      Italy                      8,612                   100,698                 358.8                 9.89
9      Croatia                    6,838                    84,362                 359.0                11.00
10     Turkey                    10,763                    77,651                 292.4                 8.08
11     Albania                    9,517                    76,440                 320.7                10.88
12     Sweden                     5,083                    69,019                 306.9                 8.34
13     Netherlands                5,061                    68,287                 315.9                 8.87
14     Canada                     5,003                    60,617                 301.8                 7.38
38
Instant Messaging as a Network
[Figure label: Buddy]
39
IM Communication Network
Buddy graph: 240 million people (people who logged in during June '06); 9.1 billion edges (friendship links). Communication graph: there is an edge if the users exchanged at least one message in June 2006; 180 million people; 1.3 billion edges; 30 billion conversations.
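As a rough illustration of how a communication graph could be derived from per-session records, the sketch below links two users whenever they shared a session in which both sent at least one message. That reading of "exchanged at least one message" is an assumption, and the `Participation` record and its field names are hypothetical, not the actual log schema.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Participation:
    """One user's part in one conversation (hypothetical record layout)."""
    session_id: int
    user_id: int
    messages_sent: int

def communication_edges(records):
    """Undirected edges between users who both sent a message in a shared session."""
    active_by_session = {}
    for rec in records:
        if rec.messages_sent > 0:
            active_by_session.setdefault(rec.session_id, set()).add(rec.user_id)
    edges = set()
    for users in active_by_session.values():
        # Sorting gives each undirected edge one canonical (small, large) form.
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges

records = [
    Participation(1, 10, 3), Participation(1, 11, 2),
    Participation(2, 10, 1), Participation(2, 12, 0),  # user 12 sent nothing
    Participation(3, 11, 4), Participation(3, 12, 1), Participation(3, 13, 2),
]
edges = communication_edges(records)
print(sorted(edges))
```

Storing edges as a set collapses repeated conversations between the same pair, which is why the number of edges is far smaller than the number of conversations.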
40
Buddy network: Number of buddies
Buddy graph: 240 million nodes, 9.1 billion edges (~40 buddies per user).
41
Communication Network: Degree
Number of people a user talks to in a month
42
Network: Small-world
6 degrees of separation [Milgram, 1960s]. Average distance: 5.5. 90% of nodes can be reached in < 8 hops.

Hops   Nodes
1      10
2      78
3      396
4      8648
5      3299252
6      28395849
7      79059497
8      52995778
9      10321008
10     1955007
11     518410
12     149945
13     44616
14     13740
15     4476
16     1542
17     536
18     167
19     71
20     29
21     16
22     10
23     3
24     2
25     3
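Summary statistics can be read off a hop table like the one on this slide with a few lines of code. The table looks like the result of a breadth-first search from a single starting node (an assumption; the slide does not say), so the numbers computed here describe that one search and need not equal the network-wide figures quoted on the slide.

```python
# Hop counts copied from the table on this slide: hops -> number of nodes.
nodes_at_hop = {
    1: 10, 2: 78, 3: 396, 4: 8648, 5: 3299252,
    6: 28395849, 7: 79059497, 8: 52995778, 9: 10321008, 10: 1955007,
    11: 518410, 12: 149945, 13: 44616, 14: 13740, 15: 4476,
    16: 1542, 17: 536, 18: 167, 19: 71, 20: 29,
    21: 16, 22: 10, 23: 3, 24: 2, 25: 3,
}

total = sum(nodes_at_hop.values())
mean_hops = sum(hops * count for hops, count in nodes_at_hop.items()) / total

def hops_to_cover(fraction):
    """Smallest hop count h such that at least `fraction` of nodes lie within h hops."""
    reached = 0
    for hops in sorted(nodes_at_hop):
        reached += nodes_at_hop[hops]
        if reached >= fraction * total:
            return hops
    return max(nodes_at_hop)

print(f"nodes reached: {total}")
print(f"mean hops from this start node: {mean_hops:.2f}")
print(f"hops needed to cover 90%: {hops_to_cover(0.9)}")
```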
43
Network: Searchability
Milgram's experiment showed that (1) short paths exist in networks and (2) humans are able to find them. Assume the following setting: nodes are scattered on a plane; given a starting node u, we want to reach a target node v. Algorithm: always navigate to the neighbor that is geographically closest to the target node v. Surprise: geo-routing finds the short paths (for an appropriate distance measure).
[Figure: nodes u and v]
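The greedy rule described on this slide can be sketched as below. The sketch assumes plain Euclidean distance on the plane and adds a stopping rule of its own (give up when no neighbor is closer to the target than the current node); the slide specifies neither, and the toy graph is made up.

```python
import math

def greedy_geo_route(positions, neighbors, source, target):
    """Repeatedly step to the neighbor geographically closest to the target.

    Returns the path as a list of nodes, or None if the walk gets stuck at a
    node none of whose neighbors is closer to the target than the node itself.
    """
    def dist(a, b):
        return math.dist(positions[a], positions[b])

    path = [source]
    current = source
    while current != target:
        best = min(neighbors[current], key=lambda n: dist(n, target), default=None)
        if best is None or dist(best, target) >= dist(current, target):
            return None  # dead end: greedy routing cannot make progress
        path.append(best)
        current = best
    return path

# Toy network: coordinates on a plane plus an adjacency list.
positions = {"u": (0, 0), "a": (1, 0), "b": (0, 2), "c": (2, 1), "v": (3, 1), "z": (5, 5)}
neighbors = {
    "u": ["a", "b"], "a": ["u", "c"], "b": ["u"],
    "c": ["a", "v"], "v": ["c"], "z": [],
}
path = greedy_geo_route(positions, neighbors, "u", "v")
print(path)
```

Because every step strictly reduces the distance to the target, the walk always terminates; whether it succeeds depends on the network, which is the empirical point of the slide.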
44
Communication network: Clustering
How many triangles are closed? Clustering normally decays as k^-1. The communication network is highly clustered: k^-0.37.
[Figure labels: High clustering; Low clustering]
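The quantity behind this slide is the local clustering coefficient averaged over nodes of the same degree k. A minimal sketch on a made-up four-node graph (the adjacency-set representation and function names are assumptions):

```python
from collections import defaultdict
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no pairs of neighbors, so no triangle can be closed
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)

def clustering_by_degree(adj):
    """Average local clustering coefficient for each degree k."""
    buckets = defaultdict(list)
    for node in adj:
        buckets[len(adj[node])].append(local_clustering(adj, node))
    return {k: sum(values) / len(values) for k, values in sorted(buckets.items())}

# Toy undirected graph: a triangle 1-2-3 with a pendant node 4 attached to 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
by_degree = clustering_by_degree(adj)
print(by_degree)
```

Plotting these averages against k on log-log axes and fitting a line gives the decay exponent that the slide compares with -1.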
45
Communication Network Connectivity
46
k-Cores decomposition
What is the structure of the core of the network?
47
k-Cores: core of the network
People with k < 20 are the periphery. The core is composed of 79 people, each having 68 edges among them.
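A k-core is what remains after repeatedly deleting nodes with fewer than k neighbors. The sketch below computes each node's core number by peeling minimum-degree nodes; it is a simple quadratic-time version for illustration on a made-up graph, not the procedure used on the full network.

```python
def core_numbers(adj):
    """Core number of every node, found by repeatedly peeling a minimum-degree node."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])  # the core level never decreases while peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core

def k_core(adj, k):
    """Nodes that belong to the k-core."""
    return {node for node, c in core_numbers(adj).items() if c >= k}

# Toy graph: a 4-clique (1-4) with a two-node tail 5-6 hanging off node 1.
adj = {
    1: {2, 3, 4, 5}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3},
    5: {1, 6}, 6: {5},
}
core = core_numbers(adj)
print(core)
```

In this toy graph the clique is the 3-core and the tail is periphery, mirroring the core-versus-periphery split described on the slide.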
48
Network robustness
We delete nodes (in some order) and observe how the network falls apart, tracking the number of edges deleted and the size of the largest connected component.
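The experiment on this slide can be sketched as a loop that removes nodes one at a time and recomputes the largest connected component. The star graph and the two deletion orders below are made up for illustration, and recomputing components from scratch at every step is only practical for small graphs.

```python
def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search over the surviving nodes
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best

def robustness_curve(adj, order):
    """Largest-component size after deleting the first 0, 1, 2, ... nodes of `order`."""
    removed = set()
    curve = [largest_component_size(adj, removed)]
    for node in order:
        removed.add(node)
        curve.append(largest_component_size(adj, removed))
    return curve

# Toy star graph: hub 0 connected to leaves 1..5.
adj = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
hub_first = robustness_curve(adj, [0])
leaf_first = robustness_curve(adj, [1, 2])
print(hub_first, leaf_first)
```

Comparing a targeted order (highest degree first) with a random order is the usual way to contrast fragility under attack with robustness to random failure.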
49
Robustness: Nodes vs. Edges
50
Robustness: Connectivity
51
Conclusion
A first look at a planetary-scale social network: the largest social network analyzed. Strong presence of homophily: people who communicate share attributes. Well connected: in only a few hops one can reach most of the network. Very robust: many (random) people can be removed and the network is still connected.
52
References
Leskovec and Horvitz: Worldwide Buzz: Planetary-Scale Views on an Instant-Messaging Network, 2007. http://www.cs.cmu.edu/~jure