1. Characterizing Instant Messaging Traffic in an Enterprise Network
IBM Research, Network Server Systems Software, © 2006 IBM Corporation
Lei Guo, the Ohio State University
Mentor: Zhen Xiao; manager: John Tracey
2006 autumn intern presentation

2. Instant messaging
- Quick response
- User presence service
- Interactive communication
- Multitasking
- Private chat
- Enterprise cooperation
Users: AIM 53 M; MSN 29 M; Jabber 13.5 M; SameTime 15 M. Peak online users: Skype 7 M; QQ 20 M.

3. Challenges of IM measurements
- No large-scale measurement study characterizing IM traffic exists so far
- No server logs are available, in contrast to Web and streaming media servers
- Online packet analysis is difficult
- User privacy concerns

4. Our objective and methodology
- First large-scale IM traffic measurement, as a basis for:
  - IM system design and optimization
  - IM workload generation (experimental basis)
  - Security in IM networks
- Online IM traffic parser that protects user privacy information
  - Packet-level workloads of AIM and MSN Messenger (identified by port number)
  - Packet headers only for Yahoo and GTalk/Jabber (identified by port number)
  - Nearly one month in a large enterprise network with thousands of employees
  - More than 20,000 user conversations by 469 AIM users and 408 MSN users

5. IM sniffer
- Protocols: MSNP (MSN) and the AIM protocols
  - Classic: OSCAR
  - Triton: new, not parsed (about 10% of AIM traffic)
- Pipeline: Ethernet packets pass through the network interface, the OS kernel, and the pcap library to an online packet reconstructor, which rebuilds IM packets from IP packets for the AIM and MSN packet parsers; the parsed data is dumped to pcap-format files for offline analysis
- User privacy information is protected with an MD5 hash: "lguo@us.ibm.com: hello, how are you doing" is stored as "4d347c1b: e51c49a1043fc"
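A minimal sketch of the privacy step, assuming user identifiers and message bodies are replaced by MD5 hex digests as in the slide's example. The helper names `anonymize` and `anonymize_message` are hypothetical, the case/space normalization of screen names is an assumption, and keeping the body length (useful for message-size statistics) is an illustrative choice rather than a documented feature of the sniffer.

```python
import hashlib


def anonymize(user_id: str) -> str:
    """Replace a user identifier with its MD5 hex digest (hypothetical helper)."""
    # Assumption: identifiers are compared case-insensitively with spaces ignored,
    # so "L Guo" and "lguo" map to the same pseudonym.
    normalized = "".join(user_id.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def anonymize_message(sender: str, text: str) -> tuple:
    """Return (hashed sender, hashed body, body length in bytes)."""
    body = text.encode("utf-8")
    # The body hash hides content; the length survives for size distributions.
    return anonymize(sender), hashlib.md5(body).hexdigest(), len(body)
```

An unsalted MD5 of a known screen name can be recovered by hashing candidate names; a keyed hash such as `hmac.new(key, name, hashlib.sha256)` would resist that, at the cost of managing the key.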

6. Instant messaging in AIM
- Authentication
- Redirection
- User-to-user chat
- Multi-user chat
- P2P communication
[Figure: authentication server, BOS server, chat room server, P2P voice/video chat and file transfer, and other services (email server, buddy icon server, …)]

7. Instant messaging in MSN Messenger
[Figure: dispatch server, notification server, switchboard server, MSN Passport server, P2P voice/video chat and file transfer, and other services (email server, …)]

8. Outline
- Overview of IM traffic
- Online activity of IM users
- Characterizing IM servers
- Analysis of IM traffic
- Conclusion

9. Overview of IM traffic
For most IM systems, the traffic volume a client receives from IM servers is much greater than the volume it sends.
[Figure: traffic volume (MB); number of packets with TCP payload (×10^6)]

10. IM servers in our workloads
The number of IM servers is very large.
[Figure: total number of server IPs collected; cumulative number of server IPs collected over time]

11. IM TCP connections
The percentage of failed TCP requests is non-trivial.
[Figure: number of TCP requests; failed TCP requests (%)]

12. IM traffic rate
IM traffic is highly bursty, with many spikes; the average rate is 8.9 Kbps.
[Figure: IM traffic rate sampled per minute; IM traffic rate sampled per hour]
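Rate series like these can be produced by binning packet sizes by timestamp. The sketch below assumes the sniffer yields `(timestamp_seconds, payload_bytes)` pairs; the function name `rate_per_interval` is hypothetical, and bins with no packets are simply absent (rate zero).

```python
from collections import defaultdict


def rate_per_interval(packets, interval=60.0):
    """Average bit rate (bits/s) per fixed-width time bin.

    packets: iterable of (timestamp_seconds, payload_bytes).
    Returns {bin_start_seconds: bits_per_second}, ordered by bin start.
    """
    bins = defaultdict(int)
    for ts, size in packets:
        bins[int(ts // interval)] += size  # total bytes seen in this bin
    # Convert bytes per bin into bits per second over the bin width.
    return {b * interval: total * 8 / interval for b, total in sorted(bins.items())}
```

Calling it with `interval=3600` gives the hourly series; a single large file transfer inflates one bin, which is why the per-minute series shows sharper spikes than the hourly one.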

13. IM traffic rate
Each spike is caused by a very small number of TCP connections (typically one or two) carrying voice/video chat or file transfers.
[Figure: hourly traffic rate of AIM; hourly traffic rate of MSN]

14. IM traffic rate
The GTalk traffic rate has clear diurnal and weekly patterns because voice/video chat and file transfers are used less.
[Figure: hourly traffic rate of Yahoo; hourly traffic rate of GTalk]

15. Summary of IM traffic overview
- The traffic volume a client receives from IM servers is much greater than the volume it sends (Yahoo is an exception)
- A large number of servers are used for IM services
- The failure ratio of IM TCP connections is non-trivial
- IM traffic is highly bursty due to voice/video chat and file transfers

16. Outline
- Overview of IM traffic
- Online activity of IM users
- Characterizing IM servers
- Analysis of IM traffic
- Conclusion

17. Online session and chat conversation: AIM
- Online session duration
  - From login time to logout/disconnect time
  - Measured as the duration of the TCP connection to the BOS server
- Conversation
  - All messages are forwarded by the BOS server
  - Messages of different conversations are interleaved in one TCP connection
  - A 5-minute threshold on message inter-arrival time separates conversations
[Figure: user A chats with B and C through the BOS server; a gap of more than 5 minutes splits the A-B messages into conversations AB1 and AB2, alongside conversation AC1]
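The 5-minute rule can be sketched as follows, assuming each parsed chat message is reduced to `(timestamp_seconds, user_a, user_b)` and that a conversation is identified by the unordered pair of users. The function `split_conversations` is hypothetical; it reproduces the AB1/AB2/AC1 example from the figure, not necessarily the study's exact code.

```python
from collections import defaultdict

IDLE_GAP = 5 * 60  # seconds: the 5-minute inter-arrival threshold


def split_conversations(messages, gap=IDLE_GAP):
    """Group chat messages into conversations.

    messages: iterable of (timestamp_seconds, user_a, user_b).
    A new conversation starts when the same pair is silent for more than gap.
    Returns a list of (frozenset_pair, [timestamps...]).
    """
    by_pair = defaultdict(list)
    for ts, a, b in messages:
        # Direction-independent key: A->B and B->A belong to one conversation.
        by_pair[frozenset((a, b))].append(ts)

    conversations = []
    for pair, times in by_pair.items():
        times.sort()
        current = [times[0]]
        for t in times[1:]:
            if t - current[-1] > gap:
                conversations.append((pair, current))  # idle too long: close it
                current = [t]
            else:
                current.append(t)
        conversations.append((pair, current))
    return conversations
```

Keying by pair is what untangles the interleaving: messages from different buddies share one BOS connection but never share a key.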

18. Online session and chat conversation: MSN
- Online session duration
  - From login time to logout/disconnect time
  - Measured as the duration of the TCP connection to the notification server
- Conversation
  - Each conversation is relayed through a new switchboard server connection
  - The connection is closed automatically if idle for more than 5 minutes
  - Conversations without chat messages are removed
[Figure: client connected to a notification server and a switchboard server]

19. Online activity of AIM users
- Clear diurnal and weekly patterns; peak time is about 2:00 PM
- Number of chat conversations << number of online users (about 120 users vs. 12 chat conversations)
[Figure: number of online users; number of simultaneous chat conversations]

20. Online activity of MSN users
- Clear diurnal and weekly patterns; peak time is about 2:00 PM (lunch break)
- Number of chat conversations << number of online users (about 90 users vs. 14 chat conversations)
[Figure: number of online users; number of simultaneous chat conversations]

21. Number of conversations per user
- Users are idle most of the time
- Few users chat with two buddies simultaneously
- Average: AIM 0.058, MSN 0.075

22. Distribution of user online duration (AIM, MSN)
- A Weibull distribution has been reported by a P2P study (IMC 2006)
- Complementary cumulative distribution: $P(X > x) = \exp[-(x/x_0)^c]$
- Taking logarithms twice: $\log(-\ln P) = \log[(x/x_0)^c] = c\log x - c\log x_0$, so Weibull data form a straight line on this scale
- The measured online durations do not form a straight line: the Weibull fit is poor
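The linearization on this slide suggests a simple check: plot $\ln(-\ln P)$ against $\ln x$ and fit a line whose slope is $c$ and whose intercept is $-c\ln x_0$. The sketch below does an ordinary least-squares fit, assuming the plotting position $P(X > x_{(i)}) \approx 1 - i/(n+1)$ for the $i$-th smallest of $n$ samples. The function name is hypothetical, and this is one common fitting method, not necessarily the one used in the study.

```python
import math


def fit_weibull_ccdf(samples):
    """Fit P(X > x) = exp[-(x/x0)^c] by linear regression on the Weibull plot.

    Returns (c, x0). Needs at least two distinct positive samples.
    """
    xs = sorted(s for s in samples if s > 0)  # log x requires x > 0
    n = len(xs)
    pts = []
    for i, x in enumerate(xs, start=1):
        p = 1.0 - i / (n + 1)  # empirical CCDF, kept strictly inside (0, 1)
        pts.append((math.log(x), math.log(-math.log(p))))

    mx = sum(u for u, _ in pts) / n
    my = sum(v for _, v in pts) / n
    sxx = sum((u - mx) ** 2 for u, _ in pts)
    sxy = sum((u - mx) * (v - my) for u, v in pts)
    c = sxy / sxx                 # slope of the straight line
    intercept = my - c * mx       # equals -c * log(x0)
    x0 = math.exp(-intercept / c)
    return c, x0
```

Large or systematic residuals from the fitted line, as reported here for online durations, indicate that a Weibull model is inappropriate. The same check applies to the number of messages per conversation and the number of conversations per user, which later slides report as approximately Weibull.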

23. Online duration of IM user sessions
- Two-mode distribution
- 10 hours is the divide between long and short online durations
[Figure: CDF; CCDF]

24. Online activity of AIM users
Login events peak at about 9:00 AM; logout events peak at about 5:00 PM.
[Figure: login events; logout events]

25. Online activity of MSN users
Login events peak at about 9:00 AM; logout events peak at about 5:00 PM.
[Figure: login events; logout events]

26. The 10-hour divide of online duration (AIM, MSN)
- Online time of roughly 10 hours: some employees work longer than 8 hours
- Online time longer than 10 hours: users do not turn off their computers when leaving work
[Figure: login and logout events around the 10-hour mark]

27. Number of online days (AIM, MSN)
- Not a heavy-tailed distribution; it shows user activity from another perspective:
  - Inactive users: online occasionally
  - Active users: online every weekday
  - Random users: online sporadically
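The three categories can be expressed as a simple rule on each user's count of distinct online days. The cut-offs below (`active_frac`, `inactive_max`) are hypothetical values chosen for illustration only; the slide does not state how users were assigned to categories.

```python
def classify_by_online_days(days_online: int, weekdays_in_trace: int,
                            active_frac: float = 0.9, inactive_max: int = 3) -> str:
    """Bucket a user by the number of distinct days they were seen online.

    active_frac and inactive_max are hypothetical thresholds, not study values.
    """
    if days_online >= active_frac * weekdays_in_trace:
        return "active"    # online on (nearly) every weekday
    if days_online <= inactive_max:
        return "inactive"  # online only occasionally
    return "random"        # online sporadically, in between
```

Because the distribution is not heavy-tailed, the result is fairly robust to the exact cut-offs: most users sit near one of the extremes.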

28. Summary of user online activity
- The numbers of online users and of simultaneous chat conversations have clear diurnal and weekly patterns
- Users are idle for most of their online time: # of chat conversations << # of online users
- User online duration does not follow a Weibull distribution
  - For most user sessions, login and logout events are closely tied to working hours
  - Long user sessions (> 10 hours): users do not turn off their computers when they leave work
  - The online duration distribution is two-mode
- Users can be classified into three categories based on their online days: actively online, inactively online, and sporadically online

29. Outline
- Overview of IM traffic
- Online activity of IM users
- Characterizing IM servers
- Analysis of IM traffic
- Conclusion

30. Characterizing IM servers: server response time measurement
Purpose: a first step toward understanding server load from the client side.
- CRT: client-perceived response time
- SRT: server response time of MSNP commands
- RTT: packet round-trip time (obtained from the TCP handshake)
[Figure: IM client, sniffer, and IM server, with CRT, RTT, and SRT marked]
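A minimal sketch of how SRT could be derived from timestamps seen at the sniffer, assuming the delay between a command and its reply is roughly RTT plus server processing time, and that RTT is taken from the SYN to SYN/ACK interval. The `FlowTimes` record and `server_response_time` function are hypothetical, and clamping negative values to zero is an illustrative choice for handling network jitter.

```python
from dataclasses import dataclass


@dataclass
class FlowTimes:
    """Sniffer timestamps (seconds) for one TCP connection (hypothetical record)."""
    syn: float          # client SYN passes the sniffer
    syn_ack: float      # server SYN/ACK passes the sniffer
    first_cmd: float    # first MSNP command from the client passes the sniffer
    first_reply: float  # server's reply to that command passes the sniffer


def server_response_time(f: FlowTimes) -> float:
    """Estimate SRT as (reply delay seen at the sniffer) minus handshake RTT."""
    rtt = f.syn_ack - f.syn                 # sniffer <-> server round trip
    observed = f.first_reply - f.first_cmd  # approximately RTT + server time
    return max(observed - rtt, 0.0)         # jitter can make the difference negative
```

Using only the first command of each connection keeps the handshake RTT fresh, which matches the next slide's note that the RTT is still accurate at that point.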

31. MSN server response time
- Response time for the first MSNP command of a TCP connection
  - The RTT from the handshake is still accurate
  - Reflects the server load
- Some commands are answered with long latency
[Figure: dispatch server; notification server; switchboard server]

32. Outline
- Overview of IM traffic
- Online activity of IM users
- Characterizing IM servers
- Analysis of IM traffic
- Conclusion

33. Message-level analysis of IM traffic (AIM, MSN)
- Inbound traffic >> outbound traffic
- Number of messages: chat < hint < presence (AIM hint messages are small because OSCAR is a binary protocol)
- MSN has more binary messages, for user icons and voice/video chats
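For MSN, one plausible way to assign messages to these categories is by MSNP command name and, for `MSG` commands, by the `Content-Type` header. The mapping below is an assumption for illustration: `NLN`/`ILN`/`FLN`/`CHG` as presence commands, `text/x-msmsgscontrol` as the typing hint, and `application/x-msnmsgrp2p` as the binary P2P payload. It is not necessarily the parser's actual rule set.

```python
# Assumed presence-related MSNP commands (status changes and notifications).
PRESENCE_CMDS = {"NLN", "ILN", "FLN", "CHG"}


def classify_msnp(command: str, content_type: str = "") -> str:
    """Map an MSNP command (and MSG content type) to a message category.

    Categories follow the slide (chat, hint, presence, binary); the mapping
    itself is an assumption.
    """
    if command in PRESENCE_CMDS:
        return "presence"
    if command == "MSG":
        # Drop parameters such as "; charset=UTF-8" before comparing.
        ct = content_type.split(";")[0].strip().lower()
        if ct == "text/plain":
            return "chat"
        if ct == "text/x-msmsgscontrol":       # typing notification
            return "hint"
        if ct == "application/x-msnmsgrp2p":   # P2P payloads such as user icons
            return "binary"
    return "other"
```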

34. Size of chat messages
- AIM: messages are in HTML format (not extracted online)
- MSN: the format is described in the message header and is easy to remove
- MSN: 90% of messages are smaller than 50 bytes
[Figure: CDF (semi-log scale) and CCDF (log-log scale) of message size, with the 50-byte mark]
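A sketch of removing the MSN format header before measuring message size, assuming the `MSG` payload carries MIME-style header lines separated from the text by the first blank line (CRLF CRLF). Both helper names are hypothetical; `fraction_below` shows how a statistic such as "90% of messages are smaller than 50 bytes" would be computed from the stripped sizes.

```python
def msn_chat_body(payload: bytes) -> bytes:
    """Return the chat text of an MSN MSG payload without its header block.

    Assumes headers end at the first CRLF CRLF; payloads without one are
    returned unchanged.
    """
    _head, sep, body = payload.partition(b"\r\n\r\n")
    return body if sep else payload


def fraction_below(sizes, limit=50):
    """Fraction of message sizes strictly smaller than limit bytes."""
    return sum(1 for s in sizes if s < limit) / len(sizes)
```

Stripping the header matters because MSN format headers (font, color, charset) can be longer than the short chat text itself, which would otherwise dominate the size distribution.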

35. Number of messages in a user conversation
- Most conversations have a small number of messages
- The number of messages in a conversation
  - Does not follow a power law
  - Approximately follows a Weibull distribution
[Figure: CDF (semi-log scale) with the 90% level marked; CCDF (Weibull scale)]

36. Number of messages in a user conversation: Weibull fitting results (AIM, MSN)

37. Number of conversations by a user
- Most users have a small number of conversations
- The number of conversations per user
  - Does not follow a power law
  - Approximately follows a Weibull distribution
[Figure: CDF (semi-log scale); CCDF (Weibull scale)]

38. Distribution of MSN conversation duration
- Most conversations are short (< 200 sec)
- The MSN client disconnects from the switchboard (SB) server after a long idle time

39. IM social network: number of users contacted
- Users in the buddy list
  - Contact-list packets may be lost or not completely parsed by the IM sniffer
- Users chatted with
  - Includes IM spammers
- MSN: Weibull; AIM: a rough fit
[Figure: rank (log-log scale); CCDF (Weibull scale)]

40. Number of buddies an IM user chats with
- A user contacts only a small portion of the buddies in its contact list
  - MSN: a user chats with 5.5 buddies on average (about 25%)
  - AIM: a user chats with 1.9 buddies on average (about 7%)
- Are MSN users more active? Not certain, since AIM Triton users are not counted

41. Concluding remarks
- IM sniffer and measurement
  - Packet level
  - User privacy protection
- IM traffic characterization
  - Diurnal and weekly patterns of IM traffic
  - The traffic volume a client receives is much greater than the volume it sends
  - Chat messages account for only a small percentage of all messages
  - Online activity of IM users
  - Messages in conversations: Weibull
  - Conversations of users: Weibull
  - Social network: roughly Weibull

42. Future work
- Implement the IM sniffer in the Linux kernel
  - For collecting heavy workloads
- Larger-scale measurement at Cornell University
  - Larger user population, dominated by students
- Collect SameTime workloads on the server side
  - Understand IM servers better
  - How IM is used in work cooperation: a global map of the IM user social network
