IBM Research, Network Server Systems Software © 2006 IBM Corporation Characterizing Instant Messaging Traffic in an Enterprise Network Lei Guo, the Ohio.

Presentation transcript:

IBM Research, Network Server Systems Software © 2006 IBM Corporation Characterizing Instant Messaging Traffic in an Enterprise Network Lei Guo, the Ohio State University Mentor: Zhen Xiao, manager: John Tracey 2006 autumn intern presentation

Instant messaging
• Quick response
• User presence service
• Interactive communication
• Multitasking
• Private chat
• Enterprise cooperation
[Figure labels: AIM: 53 M users; MSN: 29 M users; Jabber: 13.5 M users; SameTime: 15 M users; Skype: 7 M, QQ: 20 M (peak online users)]

Challenges of IM measurements
• No large-scale measurement study on IM traffic characterization so far
• No server logs
  – In contrast to Web and streaming media servers
• Difficulty of online packet analysis
• User privacy concerns

Our Objective and Methodology
• First large-scale IM traffic measurement
  – IM system design and optimization
  – Experimental basis for IM workload generation
  – Security in IM networks
• Online IM traffic parser that protects privacy-related user information
  – Packet-level workloads of AIM and MSN Messenger (identified by port number)
  – Packet headers of Yahoo and GTalk/Jabber (identified by port number)
  – Nearly one month in a large enterprise network with thousands of employees
  – More than 20,000 user conversations by 469 AIM users and 408 MSN users
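A minimal sketch of classifying traffic by port number, as the methodology describes. The slides do not list the ports used; the table below assumes the commonly used default server ports for each service (5190 for AIM/OSCAR, 1863 for MSNP, 5050 for Yahoo Messenger, 5222 for XMPP clients), and `classify_by_port` is a hypothetical helper name.

```python
from typing import Optional

# Assumed default server ports; the slides only say "by port number".
IM_PORTS = {
    5190: "AIM",
    1863: "MSN",
    5050: "Yahoo",
    5222: "GTalk/Jabber",
}


def classify_by_port(src_port: int, dst_port: int) -> Optional[str]:
    """Return the IM service whose assumed port matches either endpoint."""
    # Check the destination first (client-to-server packets), then the
    # source (server-to-client packets).
    for port in (dst_port, src_port):
        service = IM_PORTS.get(port)
        if service is not None:
            return service
    return None
```

Port-based classification is cheap enough to run online, but it misses clients that fall back to other ports or tunnel over HTTP, so it should be read as an approximation.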

IM Sniffer
• MSNP
• AIM protocol
  – Classic: OSCAR
  – Triton: new, N/A (10% AIM traffic)
[Diagram labels: network interface; OS kernel; pcap library; online packet reconstructor (Ethernet packets, IP packets, IM packet 1, IM packet 2); AIM packet parser; MSN packet parser; offline analysis; dump data: pcap format file. Protect user privacy information: "hello, how are you doing" → MD5 hash → 4d347c1b: e51c49a1043fc]
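A small sketch of the privacy step shown in the sniffer diagram, where chat text is replaced by an MD5 hash before being stored. The record layout, the optional salt, and the function names are assumptions for illustration; the slides only show that message text goes through an MD5 hash.

```python
import hashlib


def anonymize_text(text: str, salt: bytes = b"") -> str:
    """Return the MD5 hex digest of the (optionally salted) text."""
    return hashlib.md5(salt + text.encode("utf-8")).hexdigest()


def anonymize_message(sender: str, receiver: str, body: str,
                      salt: bytes = b"") -> dict:
    """Build a record that keeps only hashed identifiers and sizes.

    The body length is kept so that message-size statistics remain
    possible after the text itself has been discarded.
    """
    return {
        "sender": anonymize_text(sender, salt),
        "receiver": anonymize_text(receiver, salt),
        "body_md5": anonymize_text(body, salt),
        "body_len": len(body.encode("utf-8")),
    }
```

Because the same input always hashes to the same digest, hashed user names can still be used to count conversations per user and to build the contact graph.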

Instant messaging in AIM
• Authentication
• Redirection
• User-to-user chat
• Multi-user chat
• P2P communication
[Diagram labels: Authentication server; BOS server; Chat room server; P2P voice/video chat, file transferring; server; Buddy icon server; … Other services]

Instant messaging in MSN Messenger
[Diagram labels: Switchboard server; Dispatch server; Notification server; P2P voice/video chat, file transferring; MSN passport server; server; … Other services]

Outline
• Overview of IM traffic
• Online activity of IM users
• Characterizing IM servers
• Analysis of IM traffic
• Conclusion

Overview of IM traffic
For most IM systems, the traffic volume a client receives from IM servers is much greater than the volume it sends.
[Figures: traffic volume (MB); number of packets with TCP payload (×10^6)]

IM servers in our workloads
The number of IM servers is very large.
[Figures: total number of server IPs collected; cumulative number of server IPs collected over time]

IM TCP connections
The percentage of failed TCP requests is non-trivial.
[Figures: number of TCP requests; failed TCP requests (%)]

IM traffic rate
IM traffic is highly bursty, with many spikes. The average rate is 8.9 Kbps.
[Figures: IM traffic rate (sampled per minute); IM traffic rate (sampled per hour)]
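A sketch of how a per-minute or per-hour traffic rate like the one plotted here could be computed from captured packets. The input format (timestamp in seconds, size in bytes) and the function name are assumptions, and Kbps is taken as 1000 bits per second.

```python
from collections import defaultdict


def rate_kbps_per_bin(packets, bin_seconds=60):
    """Return {bin_index: average rate in Kbps} for each non-empty bin.

    packets is an iterable of (timestamp_seconds, size_bytes) pairs.
    Use bin_seconds=3600 for the hourly view.
    """
    bytes_per_bin = defaultdict(int)
    for timestamp, size in packets:
        bytes_per_bin[int(timestamp // bin_seconds)] += size
    return {
        index: total * 8 / bin_seconds / 1000
        for index, total in sorted(bytes_per_bin.items())
    }
```

Shorter bins expose the spikes, while longer bins smooth them out, which is why the same trace can look very different in the two plots.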

IM traffic rate
Each spike comes from a very limited number of TCP connections (typically one or two) carrying voice/video chat or file transfers.
[Figures: hourly traffic rate of AIM; hourly traffic rate of MSN]

IM traffic rate
The GTalk traffic rate has clear diurnal and weekly patterns because voice/video chat and file transfers are used less.
[Figures: hourly traffic rate of Yahoo; hourly traffic rate of GTalk]

Summary of IM traffic overview
• The traffic volume a client receives from IM servers is much greater than the volume it sends (Yahoo is an exception)
• A large number of servers are used for IM services
• The failure ratio of IM TCP connections is non-trivial
• IM traffic is highly bursty due to voice/video chat and file transfers

Outline
• Overview of IM traffic
• Online activity of IM users
• Characterizing IM servers
• Analysis of IM traffic
• Conclusion

Online session and chat conversation: AIM
• Online session duration
  – Login time to logout/disconnect time
  – Duration of the TCP connection to the BOS server
• Conversation
  – All messages are forwarded by the BOS server
  – Messages of different conversations are interleaved in one TCP connection
  – A 5-minute threshold on message inter-arrival time identifies a conversation
[Diagram: users A, B, C and the BOS server; gaps > 5 min split the message stream into conversations AB1, AB2, AC1]
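A minimal sketch of the 5-minute rule for AIM: messages between the same pair of users are grouped, and a gap of more than 5 minutes starts a new conversation. The input format and the function name are assumptions for illustration.

```python
from collections import defaultdict

GAP_SECONDS = 5 * 60  # inter-arrival threshold from the slide


def split_conversations(messages):
    """Group messages into conversations per user pair.

    messages is an iterable of (timestamp_seconds, user_a, user_b).
    Returns {frozenset({user_a, user_b}): [[timestamps], ...]}, where each
    inner list is one conversation.
    """
    stamps_by_pair = defaultdict(list)
    for timestamp, user_a, user_b in messages:
        # Direction does not matter: A->B and B->A are the same pair.
        stamps_by_pair[frozenset((user_a, user_b))].append(timestamp)

    conversations = {}
    for pair, stamps in stamps_by_pair.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for timestamp in stamps[1:]:
            if timestamp - groups[-1][-1] > GAP_SECONDS:
                groups.append([timestamp])  # gap too long: new conversation
            else:
                groups[-1].append(timestamp)
        conversations[pair] = groups
    return conversations
```

With messages A-B at 0 s, 60 s, 500 s, 520 s and A-C at 100 s, this yields two A-B conversations and one A-C conversation, matching the AB1, AB2, AC1 labels in the diagram.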

Online session and chat conversation: MSN
• Online session duration
  – Login time to logout/disconnect time
  – Duration of the TCP connection to the notification server
• Conversation
  – Each conversation is forwarded by a new switchboard server
  – The connection is closed automatically if idle > 5 min
  – Conversations without chat messages are removed
[Diagram labels: Switchboard server; Notification server]

Online activity of AIM users
• Clear diurnal and weekly patterns; peak time about 2:00 PM
• # of chat conversations << # of online users
[Figures: number of online users (annotation: 120 users); number of simultaneous chat conversations (annotation: 12 chat conversations)]

Online activity of MSN users
• Clear diurnal and weekly patterns; peak time about 2:00 PM (lunch break)
• # of chat conversations << # of online users
[Figures: number of online users (annotation: 90 users); number of simultaneous chat conversations (annotation: 14 chat conversations)]

Number of conversations per user
• Users are idle most of the time
• Few users chat with two buddies simultaneously
[Figures: AIM; MSN. Annotations: "average:" (value not recoverable); "average: 0.075"]

Distribution of user online duration
• A Weibull distribution has been reported by a P2P study (IMC 2006)
• Complementary cumulative distribution: $P(X > x) = \exp[-(x/x_0)^c]$
• $\log(-\log P) = \log[(x/x_0)^c] = c \log x - c \log x_0$
• A Weibull distribution would appear as a straight line on these axes; the data do not fit one well
[Figures: AIM; MSN]
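A sketch of the Weibull-scale check described on this slide: plot log(-log P(X > x)) against log x and fit a straight line, whose slope is c and whose intercept is -c log x_0. The midpoint CCDF estimator and the function names are assumptions; the slides do not say how the fit was done.

```python
import math


def weibull_scale_points(samples):
    """Return (log x, log(-log CCDF)) pairs for positive samples."""
    xs = sorted(samples)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        # Midpoint estimate of P(X > x); keeps the value strictly in (0, 1).
        ccdf = (n - i - 0.5) / n
        if x > 0:
            points.append((math.log(x), math.log(-math.log(ccdf))))
    return points


def fit_weibull(samples):
    """Least-squares line through the Weibull-scale points.

    Returns (c, x0). Needs at least two distinct positive samples.
    """
    points = weibull_scale_points(samples)
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    c = sxy / sxx
    intercept = mean_y - c * mean_x  # equals -c * log(x0)
    return c, math.exp(-intercept / c)
```

If the points bend away from the fitted line, as the slide reports for online durations, the Weibull model is a poor description of the data even though the fit still returns numbers.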

Online duration of IM user sessions
• Two-mode distribution
• 10 hours: the divide between long online durations and short online durations
[Figures: CDF; CCDF]

Online activity of AIM users
[Figures: login events (peak time: about 9:00 AM); logout events (peak time: about 5:00 PM)]

Online activity of MSN users
[Figures: login events (peak time: about 9:00 AM); logout events (peak time: about 5:00 PM)]

The 10-hour divide of online duration
• Online time of roughly 10 hours: some employees work longer than 8 hours
• Online time longer than 10 hours: users do not turn off their computers when leaving work
[Figures: login events and logout events for AIM and MSN, with the 10-hour mark]

Number of online days
• Not a heavy-tailed distribution; it shows user activity from another perspective
  – Inactive users: online occasionally
  – Active users: online every weekday
  – Random users: online sporadically
[Figures: AIM; MSN]

Summary of user online activity
• The numbers of online users and simultaneous chat conversations have clear diurnal and weekly patterns
• Users are idle for most of their online time: # of chat conversations << # of online users
• User online duration does not follow a Weibull distribution
  – Most user sessions: login and logout events are closely tied to working hours
  – Long-duration user sessions (> 10 hours): users do not turn off their computers when they leave work
  – Two-mode online duration distribution
• Users can be classified into three categories based on their online days: actively online, inactively online, and sporadically online

Outline
• Overview of IM traffic
• Online activity of IM users
• Characterizing IM servers
• Analysis of IM traffic
• Conclusion

Characterizing IM servers
Server response time measurement
Purpose: a first step toward understanding the server load from the client side
• CRT: client-perceived response time
• SRT: server response time of MSNP commands
• RTT: packet round-trip time (obtained from the TCP handshake)
[Diagram: IM client, sniffer, and IM server, with CRT, SRT, and RTT marked]
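One possible way to turn these quantities into a number, offered only as a sketch: the slides do not state the formula. It assumes the sniffer timestamps the SYN and SYN-ACK of the TCP handshake to get the RTT between itself and the server, and then subtracts that RTT from the request-to-response gap it observes for an MSNP command. All function and parameter names are hypothetical.

```python
def handshake_rtt(syn_time: float, synack_time: float) -> float:
    """RTT between sniffer and server, from SYN and SYN-ACK timestamps."""
    return synack_time - syn_time


def server_response_time(request_time: float, response_time: float,
                         rtt: float) -> float:
    """Estimate SRT as the observed request/response gap minus the RTT.

    Assumption: the gap seen at the sniffer is network round trip plus
    server processing. The result is clamped at zero because RTT varies
    between the handshake and the later command exchange.
    """
    return max(0.0, (response_time - request_time) - rtt)
```

The handshake RTT is closest in time to the first command on a connection, so the estimate is likely most trustworthy for that first command.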

MSN server response time
• Response time for the first MSNP command of a TCP connection
  – RTT is still accurate
  – Reflects the server load
• Some commands are answered with a long latency
[Figures: dispatch server; notification server; switchboard server]

Outline
• Overview of IM traffic
• Online activity of IM users
• Characterizing IM servers
• Analysis of IM traffic
• Conclusion

Message level analysis of IM traffic
• Inbound traffic >> outbound traffic
• # of msgs: chat < hint < presence (AIM hint msgs are small because OSCAR is binary based)
• MSN has more binary msgs for user icons and voice/video chats
[Figures: AIM; MSN]

Size of chat messages
• AIM: messages are in HTML format (text not extracted online)
• MSN: the format is described in the message header and is easy to remove
• MSN: 90% of messages are smaller than 50 bytes
[Figures: CDF (semi-log scale); CCDF (log-log scale); annotation: < 50 bytes]

Number of messages in a user conversation
• Most conversations have a small number of messages
• The number of messages in a conversation
  – Does not follow a power law
  – Approximately follows a Weibull distribution
[Figures: CDF (semi-log scale); CCDF (Weibull scale)]

Number of messages in a user conversation
Weibull fitting results
[Figures: AIM; MSN]

Number of conversations by a user
• Most users have a small number of conversations
• The number of conversations per user
  – Does not follow a power law
  – Approximately follows a Weibull distribution
[Figures: CDF (semi-log scale); CCDF (Weibull scale)]

Distribution of MSN conversation duration
• Most conversations are short
• The MSN client disconnects from the SB server after a long idle time
[Figure annotation: < 200 sec]

IM social network: number of users contacted
• Users in buddy list
  – Contact list packets may be lost or may not be completely parsed by the IM sniffer
• Users chatted with
  – IM spammers
• MSN: Weibull; AIM: a little rough
[Figures: rank (log-log scale); CCDF (Weibull scale)]

Number of buddies an IM user chats with
• A user contacts only a small portion of the buddies in their contact list
• Are MSN users more active?
  – Not sure; AIM Triton users are not counted
[Figures: MSN: a user chats with 5.5 buddies (about 25%) on average; AIM: a user chats with 1.9 buddies (about 7%) on average]

Concluding remarks
• IM sniffer and measurement
  – Packet level
  – User privacy protection
• IM traffic characterization
  – Diurnal and weekly patterns of IM traffic
  – The traffic volume a client receives is much greater than the volume it sends
  – Chat msgs account for only a small percentage of total msgs
  – Online activity of IM users
  – Messages in conversations: Weibull
  – Conversations of users: Weibull
  – Social network: roughly Weibull

Future work
• Implement the IM sniffer in the Linux kernel
  – For heavy workload collection
• Larger-scale measurement at Cornell University
  – Larger user population, dominated by students
• Collect SameTime workload on the server side
  – Understand IM servers better
  – How IM is used in work cooperation: a global map of the IM user social network