SIGMETRICS'09 1 Inferring Undesirable Behavior from P2P Traffic Analysis Ruben Torres *, Mohammad Hajjat *, Sanjay Rao *, Marco Mellia †, Maurizio Munafo.


SIGMETRICS'09 1 Inferring Undesirable Behavior from P2P Traffic Analysis Ruben Torres *, Mohammad Hajjat *, Sanjay Rao *, Marco Mellia †, Maurizio Munafo † * Internet Systems Lab, Department of ECE, Purdue University, USA † Department of Electronics and Telecommunications, Politecnico di Torino, Italy

2 SIGMETRICS'09 Rapid Evolution of P2P Networks
Peer-to-peer (P2P) systems are huge and complex, with millions of participants.
- Over 60% of network traffic is due to P2P systems.
Used for many different applications.
- File sharing: BitTorrent, eMule.
- VoIP: Skype.
- Video streaming: PPLive.
Matured to the point that there are commercial offerings.

3 SIGMETRICS'09 Undesirable Behavior in P2P Networks
Most of the research is on P2P systems design and characterization.
Shift attention to the impact P2P systems may have on the Internet.
Our focus is on identifying undesirable behavior.
- Patterns not expected, not intended or unwanted by developers, users or network operators.
Potential for undesirable behavior due to:
- Millions of users.
- Completely distributed.
- Software bugs.
- Malicious clients.
- Security vulnerabilities.

4 SIGMETRICS'09 Our Contributions
One of the first works to show that undesirable behavior exists, is prevalent and significant.
- Evidence of DDoS attacks exploiting P2P clients.
- Significant waste of ISP resources.
- Impact on application/user performance.
Expose problems in the context of a traffic trace of a large ISP.
- More than 5 million customers.
One of the first systematic approaches to uncover undesirable behavior in P2P systems.

5 SIGMETRICS'09 Talk Outline Dataset Methodology Results

7 SIGMETRICS'09 Setup
Traces obtained from a large European ISP.
ISP provides ADSL (20/1 Mbps) or Fiber (10/10 Mbps).
Extensive usage of NATs in the ISP:
- Peering point (most clients in the ISP have private IP addresses).
- Home NAT.
Packet traces collected from a PoP within the ISP network.
- There are more than 2000 customers in the PoP.

8 SIGMETRICS'09 eMule Traffic is Predominant in the PoP
eMule is a popular P2P file sharing application.
Over an entire 3-month period:
- 60-70% of inbound traffic to the PoP is due to eMule.
- 95% of outbound traffic is due to eMule.
eMule consists of two networks:
- Kad: decentralized DHT-based network. UDP-based and mainly used for file search.
- ED2K: centralized tracker-based network. TCP-based and used for both search and data exchange.

9 SIGMETRICS'09 Systems Analyzed
1. Generic eMule, which we refer to as Kad.
2. Version of eMule customized to the ISP, which we refer to as KadU.
   - Modified version of Kad developed by users in the ISP.
   - Avoids performance problems caused by the NAT at the edge of the network.
   - Difference: KadU clients only contact other clients within the ISP.
These two systems are analyzed separately because they have different characteristics.
- e.g. performance of KadU clients is much better.

10 SIGMETRICS'09 High-Level Statistics of Dataset
Analyzed a 25-hour dataset.
478 KadU clients inside the PoP contact 229,000 KadU clients inside the ISP.
136 Kad clients inside the PoP contact more than 300,000 Kad clients in the Internet.
815,000 ED2K TCP connections.
More than 8 million Kad/KadU UDP flows.

11 SIGMETRICS'09 Traffic Classification and Samples Generation
[Diagram: Packet trace -> Per-flow classification using Tstat -> Per-host aggregation of flows (aggregate over a 5-minute period) -> Samples (metrics)]
Tstat is a passive sniffer with Deep Packet Inspection (DPI) capabilities.
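The aggregation step on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' tooling: the `Flow` record and its field names are hypothetical (real Tstat logs carry many more fields), and the only detail taken from the slide is the 5-minute, per-host grouping.

```python
from collections import defaultdict
from typing import NamedTuple

WINDOW_SECONDS = 5 * 60  # aggregation period named on the slide


class Flow(NamedTuple):
    # Hypothetical flow record, assumed for illustration only.
    start: float    # flow start time, seconds since trace start
    src_host: str   # host inside the PoP that owns the flow
    dst_ip: str
    proto: str      # "TCP" or "UDP"
    app: str        # label assigned by the classifier, e.g. "Kad"


def group_into_samples(flows):
    """Group flows by (host, 5-minute window); each group is one sample."""
    samples = defaultdict(list)
    for f in flows:
        window = int(f.start // WINDOW_SECONDS)
        samples[(f.src_host, window)].append(f)
    return dict(samples)


# Made-up flows using documentation-range addresses.
flows = [
    Flow(10.0, "hostA", "192.0.2.1", "UDP", "Kad"),
    Flow(200.0, "hostA", "192.0.2.2", "UDP", "Kad"),
    Flow(310.0, "hostA", "192.0.2.1", "UDP", "Kad"),
    Flow(20.0, "hostB", "192.0.2.3", "TCP", "ED2K"),
]
samples = group_into_samples(flows)
```

Each key of `samples` identifies one host in one 5-minute window; metrics would then be computed per key.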

12 SIGMETRICS'09 Metrics
More than 50 metrics obtained from flow records.
- Consider both TCP and UDP flows.
- Consider if the flow initiator is inside or outside the PoP.
Examples:
- Flow: average flow duration.
- Data Transfer: bps sent, bps received.
- Destinations: number of distinct destination IP addresses.
- Failures: failure ratio [TCP only].
Choice of metric:
- Intuitively important.
- Used in the past in the context of P2P systems.
- Can capture specific behaviors of interest to us.
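As a rough illustration of the example metrics listed above, the sketch below computes them for one sample (one host in one 5-minute window). The dictionary keys (`duration`, `bytes_sent`, `bytes_received`, `dst_ip`, `proto`, `failed`) are assumptions made for this example, and the exact metric definitions used in the paper may differ.

```python
def sample_metrics(flows, window_seconds=300):
    """Compute a few per-sample metrics from a non-empty list of flow dicts."""
    if not flows:
        raise ValueError("a sample must contain at least one flow")
    tcp = [f for f in flows if f["proto"] == "TCP"]
    return {
        # Flow: average flow duration in seconds.
        "avg_flow_duration": sum(f["duration"] for f in flows) / len(flows),
        # Data transfer: bits per second over the aggregation window.
        "bps_sent": 8 * sum(f["bytes_sent"] for f in flows) / window_seconds,
        "bps_received": 8 * sum(f["bytes_received"] for f in flows) / window_seconds,
        # Destinations: number of distinct destination IP addresses.
        "distinct_destinations": len({f["dst_ip"] for f in flows}),
        # Failures: defined for TCP only; None when the sample has no TCP flows.
        "tcp_failure_ratio": (
            sum(1 for f in tcp if f["failed"]) / len(tcp) if tcp else None
        ),
    }


# Made-up sample with two TCP flows (one failed) and one UDP flow.
sample = [
    {"duration": 2.0, "bytes_sent": 3000, "bytes_received": 1500,
     "dst_ip": "192.0.2.1", "proto": "TCP", "failed": False},
    {"duration": 4.0, "bytes_sent": 4500, "bytes_received": 0,
     "dst_ip": "192.0.2.2", "proto": "TCP", "failed": True},
    {"duration": 3.0, "bytes_sent": 0, "bytes_received": 2250,
     "dst_ip": "192.0.2.1", "proto": "UDP", "failed": False},
]
metrics = sample_metrics(sample)
```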

13 SIGMETRICS'09 Challenges
Very little knowledge of what kinds of undesirable behavior may exist.
It is hard to clearly distinguish between normal and unwanted behavior.
- P2P traffic patterns are very heterogeneous across users.
Techniques relying on detecting abrupt changes may not work, since undesirable behavior can:
- Be exhibited by the majority of the samples.
- Last throughout the observation period, e.g. due to an implementation bug in the P2P system.

14 SIGMETRICS'09 Our Approach
We use clustering techniques and manual inspection to determine undesirable behavior.
Clustering:
- Tens of thousands of samples and more than 50 metrics.
- Clustering reduces the number of samples to study to a granularity of clusters.
Domain knowledge and manual inspection:
- Select regions of interest.
- Interpret the results.

15 SIGMETRICS'09 Clustering - DBScan
DBScan is a density-based clustering technique.
- Dense regions of points are considered a cluster.
- Low density regions are considered noise.
Parameter tuning and sensitivity discussed in the paper.
[Figure: number of samples vs. average packet size [bytes], with regions labeled Cluster1, Cluster2, Cluster3 and Noise.]
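To make the density idea concrete, here is a minimal, textbook-style DBSCAN over the values of a single metric, written in plain Python. It is a sketch under stated assumptions, not the authors' implementation: the `eps` and `min_pts` values and the packet sizes are invented, and the neighbor search is quadratic, which is fine for a toy example but not for tens of thousands of samples.

```python
def dbscan_1d(values, eps, min_pts):
    """Label each value with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(values)
    labels = [None] * n

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1          # provisionally noise; may become a border point
            continue
        cluster += 1                # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue             # already assigned; do not expand again
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:
                queue.extend(nbj)    # j is also a core point: keep expanding
    return labels


# Invented average packet sizes (bytes): three dense groups and one outlier.
packet_sizes = [60, 62, 61, 63, 400, 402, 401, 399, 1400, 900, 1398, 1401, 1399]
labels = dbscan_1d(packet_sizes, eps=5, min_pts=3)
```

With these made-up inputs the three dense groups come out as clusters 0, 1 and 2, and the isolated value 900 is labeled -1 (noise).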

16 SIGMETRICS'09 Selecting Regions of Interest - Metrics with more than One Cluster
Metrics with more than one cluster and noise.
- A cluster and/or noise are selected as interesting.
[Figure: number of samples vs. average packet size [bytes], with regions labeled Cluster1, Cluster2, Cluster3 and Noise; annotation: "clients only send control messages".]

17 SIGMETRICS'09 Selecting Regions of Interest - Metrics with One Cluster
Metrics with one cluster and noise.
- Noise is typically selected as interesting.
[Figure: number of samples vs. bits per second sent (x10^5); Cluster1: normal clients; noise: very active clients.]

18 SIGMETRICS'09 Correlating Interesting Samples
Once samples in interesting regions are identified, infer undesirable behavior.
Find the hosts that generate the interesting samples.
- If a few hosts, anomalous behavior is a property of the hosts.
- If many hosts, behavior is general to the application.
Find correlation across metrics.
- Rely on domain knowledge to identify this.
- Ongoing work exploring use of techniques like association rule mining.
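The "few hosts versus many hosts" check above could be approximated with a simple concentration test. The sketch below is only one possible reading: the `top_k` and `share` thresholds are hypothetical values chosen for the example, not numbers from the talk.

```python
from collections import Counter


def host_shares(interesting_samples):
    """Return (host, fraction of interesting samples) pairs, largest first.

    interesting_samples: list of (host, window) keys that fell in the
    interesting region of some metric.
    """
    counts = Counter(host for host, _ in interesting_samples)
    total = sum(counts.values())
    return [(host, count / total) for host, count in counts.most_common()]


def is_host_specific(interesting_samples, top_k=2, share=0.8):
    """True when the top_k hosts generate at least `share` of the samples."""
    counts = Counter(host for host, _ in interesting_samples)
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(top_k))
    return total > 0 and top / total >= share


# Made-up cases: two hosts dominate vs. samples spread over ten hosts.
few_hosts = [("h1", w) for w in range(6)] + [("h2", w) for w in range(3)] + [("h3", 0)]
many_hosts = [(f"h{i}", 0) for i in range(10)]
```

Here `is_host_specific(few_hosts)` suggests the behavior is a property of a couple of hosts, while `is_host_specific(many_hosts)` suggests it is general to the application.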

19 SIGMETRICS'09 Talk Outline Dataset Methodology Results  Generic Observations  Key Findings

20 SIGMETRICS'09 Preliminary Results
For Kad:
- Most metrics have one cluster and noise.
- 8 metrics have two clusters and noise.
- 2 metrics have three clusters and noise.
Similar results for KadU.
Sensitivity study.
- Night period and day period.
- One week trace.
- Obtained very similar results.

21 SIGMETRICS'09 Samples Distribution in the Interesting Region
[Figure: fraction of samples in the interesting region vs. fraction of hosts generating samples. Annotations: "A few hosts have abnormal behavior"; "Abnormality spread across many hosts" (circled); "Number of destination ports in range that receive a Kad flow".]

22 SIGMETRICS'09 Talk Outline Dataset Methodology Results  Generic Observations  Key Findings

23 SIGMETRICS'09 DDoS Attacks Exploiting Kad
Considered UDP flows classified as Kad with destination port in range
- > 50% of these flows are sent to port 53 (DNS).
- > 90% of these flows are unanswered.
The top destinations were reported to be under attack.
Unanswered UDP flows are those in which the flow destination never replies.
[Figure: fraction of unanswered UDP flows vs. port number; port 53 and port 4672 (the default Kad port) are marked.]
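The per-port view in this slide's figure can be sketched as below, using the slide's definition of an unanswered flow (the destination never replies). The input shape, a pair of destination port and number of packets seen from the destination, is an assumption made for the example.

```python
from collections import defaultdict


def unanswered_fraction_by_port(flows):
    """Map destination port -> fraction of UDP flows that got no reply.

    flows: iterable of (dst_port, packets_from_destination) pairs.
    A flow counts as unanswered when the destination sent zero packets.
    """
    total = defaultdict(int)
    unanswered = defaultdict(int)
    for dst_port, reply_packets in flows:
        total[dst_port] += 1
        if reply_packets == 0:
            unanswered[dst_port] += 1
    return {port: unanswered[port] / total[port] for port in total}


# Made-up flows: mostly silent port 53, partly answered port 4672.
udp_flows = [(53, 0), (53, 0), (53, 0), (53, 1), (4672, 3), (4672, 0)]
fractions = unanswered_fraction_by_port(udp_flows)
```

A port whose fraction is close to 1 while carrying traffic labeled as Kad is the kind of pattern the slide flags for port 53.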

24 SIGMETRICS'09 DDoS Attack Exploiting P2P Systems
Redirection attacks.
- Malicious clients inject fake membership information about a victim into the system.
- Innocent clients send normal protocol messages to the victim.
There has been some awareness of the problem in the research community - Belovin [2001], Ross [2006].
- They have shown the theoretical feasibility of the attack.
But our work is one of the first to show that these attacks are prevalent in the wild.

25 SIGMETRICS'09 Unnecessary P2P Traffic in KadU and Kad
[Figure, clustering panel: fraction of unanswered flows from total incoming UDP flows, with regions labeled Cluster1, Cluster2 and Noise. Cluster2: most incoming UDP flows are unanswered.]
[Figure, hosts panel: fraction of samples in the interesting region vs. fraction of hosts generating samples.]

26 SIGMETRICS'09 Unnecessary P2P Traffic in KadU and Kad
Large amount of wasted traffic:
- 28% of all UDP flows incoming to the PoP are unanswered.
  - 65% due to Kad and KadU.
- 30% of all TCP connections incoming to the PoP fail.
  - 50% due to KadU.
Due to two reasons:
- Stale membership information.
- Nodes behind NAT.
Staleness can be extremely long lived (e.g. tens of hours).

27 SIGMETRICS'09 Malicious P2P Trackers in the ED2K Network
Metric: average number of TCP connections per destination IP.
94% of interesting samples generated by two hosts.
Many short-lived connections to two trackers reported as malicious.
- Never responded to requests and closed the connections.
- Likely deployed by copyright agencies (e.g. RIAA, IFPI).
Similar findings by Banerjee [2008] and Siganos [2009].
[Figure: clustering of average connections per destination; Cluster1; Noise: clients contact the same destination more than once in 5 minutes.]
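One plausible way to compute the metric named on this slide, for a single host in one 5-minute window, is the number of TCP connections divided by the number of distinct destination IPs. The sketch below assumes that definition; the paper may define it differently.

```python
def avg_connections_per_destination(dst_ips):
    """Average number of TCP connections per distinct destination IP.

    dst_ips: one entry per TCP connection opened by a host in a window.
    """
    if not dst_ips:
        return 0.0
    return len(dst_ips) / len(set(dst_ips))


# Made-up windows: a client contacting each peer once, and a client that
# reconnects to the same destination nine times.
normal_window = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
retry_window = ["198.51.100.7"] * 9 + ["192.0.2.1"]
```

A value of 1.0 means each destination was contacted once; values well above 1 correspond to the noise region in the figure, where clients contact the same destination repeatedly within 5 minutes.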

28 SIGMETRICS'09 Generalizing to Other Systems
Findings in BitTorrent:
- A very significant amount of unnecessary P2P traffic is present, as in KadU.
Findings in Direct Connect:
- Possible DDoS attack exploiting DC++. Many TCP connections sent to port 80 of real web servers.
More findings in the paper.
Ongoing work studying traces from other networks.

29 SIGMETRICS'09 Summary
One of the first works to systematically study P2P traffic to identify undesirable behavior.
Shown various types of undesirable behavior of P2P systems in the wild:
- DDoS attacks on external servers exploiting the system.
- Wasted resources.
- Degraded performance of the P2P system (e.g. malicious trackers).
Shown the potential of a systematic approach to uncover this behavior.
Our initial analysis suggests that the results hold over a range of other P2P systems.

30 SIGMETRICS'09 Questions?

31 SIGMETRICS'09 Backup Slides

32 SIGMETRICS'09 Encrypted Traffic in eMule
[Figure: timeline of encrypted traffic in eMule, marking the eMule 0.49a release, the eMule 0.49b release and our trace collection.]

33 SIGMETRICS'09 Why DBScan?
Does not rely on an assumption about the shape of the cluster.
There is the concept of a noise region.
You don't need to know how many clusters you want ahead of time.
But, in principle, any technique can be used.
Just need a coarse way to cluster samples.

34 SIGMETRICS'09 DBScan - Parameter Sensitivity
We adjust the parameters to match our intuition of where the clusters should be if we manually look at each metric.
- Try to keep the noise region small but not too small (at most 6% of the samples in our study).
We have an automated way to get clusters.
- More details in the paper.

35 SIGMETRICS'09 Clustering for a single metric instead of multiple metrics
Cluster interpretation may be harder.
- Typical metric distribution is very skewed.
- Metric distributions have different supports.
Single-metric clustering still helps.
- Automatic way to get thresholds for the interesting region.
- First cut observations.
But this is a first step. Ongoing work on multi-metric analysis.

36 SIGMETRICS'09 Do you think you found all the behavior, or is there more?
We expect there is more (so there is more work to do).
- But we expect we have caught the first-order issues.
This is the first attempt in this direction.
We don't have an exhaustive list of undesirable behavior.
- There may be other behavior we could find when the application or network setup changes.
- For example, the buddy problem, which relates more to the architecture of Kad.

37 SIGMETRICS'09 How can you automate this and generalize it to different networks?
This is a first step, pointing to the importance of the problem.
Now that this is established, we could look at better ways to detect:
- Changes over time.
- Changes across networks.
- For a class of P2P systems, use the same list of undesirable behavior.