Published by Edward Walsh. Modified over 9 years ago.
1
IBM Research, Network Server Systems Software © 2006 IBM Corporation
Characterizing Instant Messaging Traffic in an Enterprise Network
Lei Guo, The Ohio State University
Mentor: Zhen Xiao; manager: John Tracey
Autumn 2006 intern presentation
2
Instant messaging
Quick response
User presence service
Interactive communication
Multitasking
Private chat
Enterprise cooperation
AIM: 53 M users
MSN: 29 M users
Jabber: 13.5 M users
SameTime: 15 M users
Skype: 7 M
QQ: 20 M
Peak online users
3
Challenges of IM measurements
No large-scale measurement study has characterized IM traffic so far
No server logs
–In contrast to Web and streaming media servers
Difficulty of online packet analysis
User privacy concerns
4
Our Objective and Methodology
First large-scale IM traffic measurement
–IM system design and optimization
–Experimental basis for IM workload generation
–Security in IM networks
Online IM traffic parser that protects privacy-related user information
–Packet-level workloads of AIM and MSN Messenger (identified by port number)
–Packet headers of Yahoo and GTalk/Jabber (identified by port number)
–Nearly one month in a large enterprise network with thousands of employees
–More than 20,000 user conversations by 469 AIM users and 408 MSN users
5
IM Sniffer
[Diagram components: Ethernet packets, network interface, OS kernel, pcap library, online packet reconstructor (IP packets, IM packet 1, IM packet 2), AIM packet parser, MSN packet parser, protect user privacy information, dump data (pcap format file), offline analysis]
Protocols:
–MSNP
–AIM protocol: Classic (OSCAR); Triton (new, N/A), 10% AIM traffic
Privacy protection by MD5 hash: "lguo@us.ibm.com: hello, how are you doing" is stored as "4d347c1b: e51c49a1043fc"
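The privacy step on this slide replaces user IDs and message text with MD5 hashes before anything is written to disk. The following is a minimal sketch of that idea, not the sniffer's actual code: the function names are hypothetical, and the truncation lengths (8 and 13 hex characters) are only assumed from the example shown on the slide. Whether the real tool truncates or salts the digest is not stated.

```python
import hashlib


def md5_token(text: str, length: int) -> str:
    """Return the first `length` hex characters of the MD5 digest of `text`."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:length]


def anonymize(sender: str, message: str) -> str:
    """Replace the sender ID and the message body with hash tokens.

    The same sender always maps to the same token, so per-user statistics
    can still be computed offline without storing the real identity.
    Token lengths are assumptions taken from the slide's example.
    """
    return f"{md5_token(sender, 8)}: {md5_token(message, 13)}"


if __name__ == "__main__":
    print(anonymize("user@example.com", "hello, how are you doing"))
```

Because the mapping is deterministic, conversations by the same (hashed) user can still be grouped during offline analysis.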
6
Instant messaging in AIM
Steps: authentication, redirection, user-to-user chat, multi-user chat, P2P communication
Servers: authentication server, BOS server, chat room server
Other services: email server, buddy icon server, …
P2P: voice/video chat, file transfer
7
Instant messaging in MSN Messenger
Servers: dispatch server, notification server, switchboard server, MSN Passport server
Other services: email server, …
P2P: voice/video chat, file transfer
8
Outline
Overview of IM traffic
Online activity of IM users
Characterizing IM servers
Analysis of IM traffic
Conclusion
9
Overview of IM traffic
For most IM systems, the traffic volume a client receives from IM servers is much greater than the volume it sends.
[Figures: traffic volume (MB); number of packets with TCP payload (×10^6)]
10
IM servers in our workloads
The number of IM servers is very large
[Figures: total number of server IPs collected; cumulative number of server IPs collected over time]
11
IM TCP connections
The percentage of failed TCP requests is non-trivial
[Figures: number of TCP requests; failed TCP requests (%)]
12
IM traffic rate
IM traffic is highly bursty, with many spikes
8.9 Kbps on average
[Figures: IM traffic rate sampled per minute; IM traffic rate sampled per hour]
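A per-minute or per-hour traffic rate like the one plotted here can be obtained by binning packet sizes by timestamp. This is a minimal sketch under stated assumptions: packets are given as (timestamp in seconds, size in bytes) pairs, 1 Kbps is taken as 1000 bits per second, and the function name is hypothetical rather than part of the sniffer.

```python
from collections import defaultdict


def traffic_rate_kbps(packets, bin_seconds=60):
    """Return {bin_start_seconds: average rate in Kbps} for the given packets.

    packets: iterable of (timestamp_seconds, size_bytes) pairs.
    bin_seconds: 60 for per-minute sampling, 3600 for per-hour sampling.
    """
    bytes_per_bin = defaultdict(int)
    for timestamp, size in packets:
        bin_start = int(timestamp // bin_seconds) * bin_seconds
        bytes_per_bin[bin_start] += size
    # bytes -> bits -> kilobits, then divide by the bin length in seconds
    return {
        start: total * 8 / 1000 / bin_seconds
        for start, total in sorted(bytes_per_bin.items())
    }


if __name__ == "__main__":
    sample = [(0, 7500), (30, 7500), (60, 750)]
    print(traffic_rate_kbps(sample))
```

Bins with no packets are simply absent from the result; a plotting step would need to fill them with zero.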
13
IM traffic rate
Each spike is caused by a very small number of TCP connections (typically one or two) carrying voice/video chat or file transfers
[Figures: hourly traffic rate of AIM; hourly traffic rate of MSN]
14
IM traffic rate
GTalk traffic rate has clear diurnal and weekly patterns, because voice/video chat and file transfers are used less
[Figures: hourly traffic rate of Yahoo; hourly traffic rate of GTalk]
15
Summary of IM traffic overview
The traffic volume a client receives from IM servers is much greater than the volume it sends (Yahoo is an exception)
A large number of servers are used for IM services
The failure ratio of IM TCP connections is non-trivial
IM traffic is highly bursty due to voice/video chat and file transfers
16
Outline
Overview of IM traffic
Online activity of IM users
Characterizing IM servers
Analysis of IM traffic
Conclusion
17
Online session and chat conversation: AIM
Online session duration
–Login time to logout/disconnect time
–Duration of TCP connection to BOS server
Conversation
–All messages are forwarded by the BOS server
–Messages of different conversations are interleaved in one TCP connection
–5-minute threshold on message inter-arrival time to identify a conversation
[Diagram: users A, B, and C connected through the BOS server; gaps of > 5 min separate the conversations AB1, AB2, and AC1]
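The 5-minute rule can be sketched as follows: messages between the same pair of users are grouped, and a gap longer than the threshold starts a new conversation. This is an illustrative sketch, not the authors' parser; the input format (timestamp, user, user) and the function name are assumptions.

```python
from collections import defaultdict

GAP_SECONDS = 5 * 60  # 5-minute inter-arrival threshold from the slide


def split_conversations(messages, gap=GAP_SECONDS):
    """Group messages into conversations per user pair.

    messages: iterable of (timestamp_seconds, user_a, user_b).
    Returns {frozenset({user_a, user_b}): [[t1, t2, ...], [t5, ...], ...]},
    where each inner list holds the timestamps of one conversation.
    """
    by_pair = defaultdict(list)
    for timestamp, a, b in messages:
        # Direction does not matter: A->B and B->A belong to the same pair
        by_pair[frozenset((a, b))].append(timestamp)

    conversations = {}
    for pair, stamps in by_pair.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for timestamp in stamps[1:]:
            if timestamp - groups[-1][-1] > gap:
                groups.append([timestamp])  # idle too long: new conversation
            else:
                groups[-1].append(timestamp)
        conversations[pair] = groups
    return conversations


if __name__ == "__main__":
    msgs = [(0, "A", "B"), (50, "A", "C"), (100, "B", "A"), (500, "A", "B")]
    for pair, groups in split_conversations(msgs).items():
        print(sorted(pair), groups)
```

With the sample input, the A and B messages split into two conversations (the 400-second gap exceeds the threshold) while A and C form one, mirroring the AB1, AB2, AC1 labels in the diagram.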
18
Online session and chat conversation: MSN
Online session duration
–Login time to logout/disconnect time
–Duration of TCP connection to notification server
Conversation
–Each conversation is forwarded by a new switchboard server
–The connection is closed automatically if idle > 5 min
–Conversations without chat messages are removed
[Diagram: notification server; switchboard server]
19
Online activity of AIM users
Clear diurnal and weekly patterns
Peak time about 2:00 PM
# of chat conversations << # of online users
[Figures: number of online users (annotated: 120 users); number of simultaneous chat conversations (annotated: 12 chat conversations)]
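Counts of simultaneous online users or simultaneous chat conversations can be derived from a list of (start, end) intervals with a sweep over start and end events. The sketch below is only an illustration of that counting technique; the function name and the input format are assumptions, and it reports just the peak rather than the full time series plotted on the slide.

```python
def peak_concurrency(intervals):
    """Return (peak_count, time_of_peak) for a list of (start, end) intervals.

    An interval that ends at time t is treated as finished before another
    interval starting at the same t, so back-to-back sessions do not overlap.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # session or conversation begins
        events.append((end, -1))    # session or conversation ends
    # Tuples sort by time first; at equal times -1 (end) sorts before +1 (start)
    events.sort()

    current = 0
    peak = 0
    peak_time = None
    for time, delta in events:
        current += delta
        if current > peak:
            peak, peak_time = current, time
    return peak, peak_time


if __name__ == "__main__":
    sessions = [(0, 10), (5, 15), (10, 20)]
    print(peak_concurrency(sessions))
```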
20
Online activity of MSN users
Clear diurnal and weekly patterns
Peak time about 2:00 PM (lunch break)
# of chat conversations << # of online users
[Figures: number of online users (annotated: 90 users); number of simultaneous chat conversations (annotated: 14 chat conversations)]
21
Number of conversations per user
Users are idle most of the time
Few users chat simultaneously with two buddies
[Figures: AIM (average: 0.058); MSN (average: 0.075)]
22
Distribution of user online duration
A Weibull distribution has been reported by a P2P study (IMC 2006)
Complementary cumulative distribution: $P(X > x) = \exp[-(x/x_0)^c]$
$\log(-\log P) = \log[(x/x_0)^c] = c \log x - c \log x_0$
A Weibull distribution would appear as a straight line on this scale; the measured data do not fit well
[Figures: AIM; MSN]
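The linearization on this slide suggests a simple check: plot log(-log P) against log x and fit a straight line, whose slope is c and whose intercept is -c log x_0. The sketch below does this with ordinary least squares using only the standard library. It is an illustration of the transform, not the fitting procedure used in the study; the function name and input format are assumptions.

```python
import math


def weibull_fit(xs, ccdf):
    """Estimate (c, x0) of P(X > x) = exp[-(x/x0)^c] by linear regression.

    xs: sample values (x > 0).
    ccdf: empirical P(X > x) at each x, strictly between 0 and 1.
    Uses log(-log P) = c*log(x) - c*log(x0).
    """
    points = [
        (math.log(x), math.log(-math.log(p)))
        for x, p in zip(xs, ccdf)
        if x > 0 and 0 < p < 1
    ]
    n = len(points)
    mean_u = sum(u for u, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in points)
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in points)
    c = sxy / sxx                      # slope of the fitted line
    intercept = mean_y - c * mean_u    # equals -c * log(x0)
    x0 = math.exp(-intercept / c)
    return c, x0


if __name__ == "__main__":
    xs = [1, 10, 100, 1000]
    ps = [math.exp(-((x / 100.0) ** 0.7)) for x in xs]  # synthetic Weibull data
    print(weibull_fit(xs, ps))
```

Large residuals around the fitted line would indicate, as the slide reports for online durations, that the data are not well described by a Weibull distribution.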
23
Online duration of IM user sessions
Two-mode distribution
10 hours: the divide between long and short online durations
[Figures: CDF; CCDF]
24
Online activity of AIM users
Login events: peak time about 9:00 AM
Logout events: peak time about 5:00 PM
25
Online activity of MSN users
Login events: peak time about 9:00 AM
Logout events: peak time about 5:00 PM
26
The 10-hour divide of online duration
Online time of roughly 10 hours: some employees work longer than 8 hours
Online time longer than 10 hours: users do not turn off their computers when leaving work
[Figures: login events and logout events for AIM and MSN, with the 10-hour divide marked]
27
Number of online days
Not a heavy-tailed distribution; it shows user activity from another perspective
–Inactive users: online occasionally
–Active users: online every weekday
–Random users: online sporadically
[Figures: AIM; MSN]
28
Summary of user online activity
The number of online users and the number of simultaneous chat conversations have clear diurnal and weekly patterns
Users are idle for most of their online time: # of chat conversations << # of online users
User online duration does not follow a Weibull distribution
–Most user sessions: login and logout events are closely tied to working hours
–Long user sessions (> 10 hours): users do not turn off their computers when they leave work
–Two-mode online duration distribution
Users can be classified into three categories based on their online days: actively online, inactively online, and sporadically online
29
Outline
Overview of IM traffic
Online activity of IM users
Characterizing IM servers
Analysis of IM traffic
Conclusion
30
Characterizing IM servers
Server response time measurement
Purpose: a first step toward understanding the server load from the client side
CRT: client-perceived response time
SRT: server response time of MSNP commands
RTT: packet round-trip time (obtained from the TCP handshake)
[Diagram: IM client, sniffer, and IM server, with the CRT, SRT, and RTT intervals marked]
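One plausible way to separate server processing time from network delay at a passive sniffer is to subtract the handshake RTT from the delay observed between a command and its reply. The slide does not spell out the exact formula, so the sketch below is an assumption: RTT is taken as the SYN to SYN-ACK interval seen at the sniffer, and SRT as the command-to-reply interval minus that RTT. Function names are hypothetical.

```python
def handshake_rtt(syn_time, synack_time):
    """Round-trip time between sniffer and server, from the TCP handshake.

    Assumption: the sniffer timestamps the outgoing SYN and the returning SYN-ACK.
    """
    return synack_time - syn_time


def server_response_time(command_time, reply_time, rtt):
    """Estimate server response time for one command.

    Assumption: the observed command-to-reply delay is the network round
    trip plus server processing, so processing is the delay minus the RTT.
    The result is clamped at zero to absorb timestamp jitter.
    """
    return max(0.0, (reply_time - command_time) - rtt)


if __name__ == "__main__":
    rtt = handshake_rtt(10.000, 10.040)
    print(server_response_time(11.00, 11.25, rtt))
```

All timestamps are in seconds. A handshake-based RTT is a single sample per connection, which may be why the next slide restricts the measurement to the first command of each TCP connection.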
31
MSN server response time
Response time for the first MSNP command of a TCP connection
–RTT is still accurate
–Reflects the server load
Some commands are answered only after a long latency
[Figures: dispatch server; notification server; switchboard server]
32
Outline
Overview of IM traffic
Online activity of IM users
Characterizing IM servers
Analysis of IM traffic
Conclusion
33
Message level analysis of IM traffic
Inbound traffic >> outbound traffic
# of msgs: chat < hint < presence (AIM hint msgs are small because OSCAR is a binary protocol)
MSN has more binary msgs for user icons and voice/video chats
[Figures: AIM; MSN]
34
Size of chat messages
AIM: messages are in HTML format (not extracted online)
MSN: the format is described in the message header and is easy to remove
MSN: 90% of messages are smaller than 50 bytes
[Figures: CDF (semi-log scale), annotated: < 50 bytes; CCDF (log-log scale)]
35
Number of messages in a user conversation
Most conversations have a small number of messages
The number of msgs in a conversation
–Not power law
–Follows Weibull distribution approximately
[Figures: CDF (semi-log scale), annotated: 2540 90%; CCDF (Weibull scale)]
36
Number of messages in a user conversation
Weibull fitting results
[Figures: AIM; MSN]
37
Number of conversations by a user
Most users have a small number of conversations
Number of user conversations
–Not power law
–Follows Weibull distribution approximately
[Figures: CDF (semi-log scale); CCDF (Weibull scale)]
38
Distribution of MSN conversation duration
Most conversations are short (< 200 sec)
The MSN client disconnects from the switchboard (SB) server after a long idle time
39
IM social network: number of users contacted
Users in buddy list
–Contact list packets may be lost or cannot be completely parsed by the IM sniffer
Users chatted with
–IM spammers
MSN: Weibull; AIM: a rough fit
[Figures: rank (log-log scale); CCDF (Weibull scale)]
40
Number of buddies an IM user chats with
A user contacts only a small portion of the buddies in their contact list
MSN users are more active?
–Not sure, since AIM Triton users are not counted
MSN: a user chats with 5.5 buddies (about 25%) on average
AIM: a user chats with 1.9 buddies (about 7%) on average
41
Concluding remarks
IM sniffer and measurement
–Packet level
–User privacy protection
IM traffic characterization
–Diurnal and weekly patterns of IM traffic
–The traffic volume a client receives is much greater than the volume it sends
–Chat msgs account for only a small percentage of total msgs
–Online activity of IM users
–Messages in conversations: Weibull
–Conversations of users: Weibull
–Social network: roughly Weibull
42
Future work
Implement IM sniffer in Linux kernel
–For heavy workload collection
Larger scale measurement in Cornell University
–Larger user population, dominated by students
Collect SameTime workload on the server side
–Understand IM servers better
–How IM is used in work cooperation: a global map of the IM user social network