Marios Iliofotou (UC Riverside), Brian Gallagher (LLNL), Tina Eliassi-Rad (Rutgers University), Guowu Xi (UC Riverside), Michalis Faloutsos (UC Riverside). ACM.

Presentation transcript:

Profiling-by-Association: A Resilient Traffic Profiling Solution for the Internet Backbone
Marios Iliofotou (UC Riverside), Brian Gallagher (LLNL), Tina Eliassi-Rad (Rutgers University), Guowu Xi (UC Riverside), Michalis Faloutsos (UC Riverside)
ACM CoNEXT, December 1st, 2010

Profiling Internet traffic
Who is using my network, and for what?
Which applications are running in my network?
[Figure: the Internet connected to an Internet Service Provider (ISP), with an application breakdown of the traffic]
Application breakdown: assign traffic to different applications.
Why is this useful? Traffic engineering and network planning.

Profiling traffic is challenging
There is a gap between what network administrators want and what existing tools can provide.
[Figure: "What we want" versus "What we get with existing tools". Traffic profiling results using deep packet inspection; data are from a peering link between two ISPs in the US.]
We present a tool that:
- Profiles ALL the traffic
- Has high prediction accuracy (~90%)

Why is traffic profiling challenging?
Obfuscation at multiple levels: users and applications try to hide their traffic, e.g., peer-to-peer (P2P).
What existing profilers use, and how to evade them:
Level-1: Port numbers. Evasion: use random ports.
Level-2: Payload signatures. Evasion: encryption.
Level-3: Flow statistics. Evasion: payload padding.

Profiling end-hosts is more robust, but …
The more flows we see for a host, the easier it is to profile that host successfully [Kim et al. 2008].
Sensitive to partial visibility at the backbone: this significantly affects behavioral host-profiling solutions such as BLINC [Karagiannis et al. 2005].
Availability of information can be limited (e.g., P2P): Googling the Internet [Trestian et al. 2008] is easy for long-lived servers, hard for short-lived P2P IPs.
[Figure: a profiler observing a monitored link]
We need a tool that can profile traffic:
1. Even when ports, payload, and flows are obfuscated
2. At the backbone, where we have partial visibility
3. For P2P applications successfully, which is more challenging

Outline
Introduction
Profiling-by-Association (PBA) framework
Our PBA-based profiling algorithms
Experimental results
Conclusions

Not all traffic is hard to profile
It is easier to profile traffic from:
1. Popular servers (Web, email, DNS, etc.), e.g., using white lists or Googling the Internet [Trestian et al. 2008]
2. Some P2P hosts that do not hide their traffic
The default in many P2P clients is not to encrypt traffic. Some users keep these settings.

Connectivity does not lie
We can exploit the "social" interactions of hosts, e.g., P2P hosts tend to have many flows with other P2P hosts.
Graph representation of Internet traffic:
- Nodes = IP addresses
- Edges = TCP/UDP flows
[Figure: traffic graph from a real-world ISP in the US, with groups of hosts labeled P2P, SMTP (email), and online game]
Our two key observations:
1. It is easy to profile some IP hosts
2. Social interactions among hosts contain valuable information
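The graph representation on this slide (nodes are IP addresses, edges are TCP/UDP flows) can be sketched in a few lines. This is only an illustration: the function name `build_traffic_graph`, the `(src_ip, dst_ip)` flow format, and the choice to collapse repeated flows between the same pair into one undirected edge are assumptions, not details given in the talk.

```python
from collections import defaultdict


def build_traffic_graph(flows):
    """Build an undirected graph from (src_ip, dst_ip) flow records.

    Hypothetical helper: nodes are IP addresses, and an edge exists
    between two IPs if at least one flow was observed between them.
    """
    adjacency = defaultdict(set)
    for src_ip, dst_ip in flows:
        if src_ip == dst_ip:
            continue  # ignore self-loops, which carry no association
        adjacency[src_ip].add(dst_ip)
        adjacency[dst_ip].add(src_ip)
    return dict(adjacency)


# Toy flow records; the repeated flow collapses into a single edge.
flows = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.2", "10.0.0.3"),
    ("10.0.0.1", "10.0.0.2"),
]
graph = build_traffic_graph(flows)
```

Keeping only connectivity (and discarding ports and payload) mirrors the idea that associations alone carry the signal.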

Our approach: Profiling-by-Association
A systematic way of utilizing our observations.
Phase A, Seeding: network traffic (nodes = IP addresses, edges = TCP/UDP flows) plus initial knowledge.
Phase B, Inference: use ONLY connectivity (PBA) to produce profiled network traffic.
We no longer need ports, payload, or flow features.

Outline
Introduction
Profiling-by-Association (PBA) framework
Our PBA-based profiling algorithms
- NLC (neighboring link classifier)
- HYP (hyper-graph classifier)
- CLUST, CSEED, C+NLC (in the paper)
Experimental results
Conclusions

1) The neighboring link classifier (NLC)
Uses local structure of the graph.
Classify an edge using information from its neighbors.
[Figure: example of classifying an edge with endpoints ep1 and ep2 from the labels of neighboring edges (e.g., web)]

The basic steps of NLC
After seeding: 10% of edges labeled
After NLC round 1: 80% of edges labeled
After NLC round 2: 90% of edges labeled
After NLC round 3: 100% of edges labeled
Profiled by association: 90% of edges
[Figure: traffic graph with the known hosts marked]
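The round-by-round labeling described for NLC can be illustrated with a minimal sketch. The slides do not give the exact voting rule, so the majority vote over neighboring labeled edges used here is an assumption, as are the function name `nlc`, the edge-index representation, and the `max_rounds` parameter.

```python
from collections import Counter, defaultdict


def nlc(edges, seed_labels, max_rounds=10):
    """Iteratively label edges from their neighboring edges.

    edges: list of (u, v) host pairs.
    seed_labels: dict mapping an edge index to its known application.
    Assumed rule: an unlabeled edge takes the majority label among
    already-labeled edges that share one of its endpoints.
    """
    labels = dict(seed_labels)
    incident = defaultdict(list)  # host -> indices of edges touching it
    for index, (u, v) in enumerate(edges):
        incident[u].append(index)
        incident[v].append(index)

    for _ in range(max_rounds):
        updates = {}
        for index, (u, v) in enumerate(edges):
            if index in labels:
                continue
            votes = Counter(
                labels[other]
                for host in (u, v)
                for other in incident[host]
                if other != index and other in labels
            )
            if votes:
                updates[index] = votes.most_common(1)[0][0]
        if not updates:
            break  # nothing new was labeled in this round
        labels.update(updates)  # apply all of a round's labels at once
    return labels


# A chain a-b-c-d with only the first edge seeded as P2P.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
result = nlc(chain, {0: "p2p"})
```

Because labels are applied at the end of each round, the label spreads one hop per round, which matches the "more edges labeled after each round" picture on the slide.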

2) The HYP algorithm
Uses global structure of the graph.
Two main steps:
1. Graph clustering: use connectivity to identify communities
2. Exploit seeds: use knowledge about a few hosts to profile each community
Community: a group of nodes in a graph that are more densely connected internally than with the rest of the graph. (The Louvain method by Blondel et al. outperformed other methods.)
[Figure: communities labeled P2P, SMTP (email), and online game, seeded with known P2P hosts, known gamers, and known servers]

2) The HYP algorithm (cont.)
What if we have mixed clusters?
- Re-apply graph clustering to each such cluster
- Stop when we have a homogeneous cluster
How do we profile clusters with no seeds?
- HYPer-graph NLC
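The "exploit seeds" step of HYP, and the three cases it produces (homogeneous, mixed, and unseeded clusters), can be sketched as follows. The clustering itself is assumed to have been done already by some community detection method; the function name `profile_communities` and the status strings are hypothetical, and this sketch does not implement the re-clustering or the hyper-graph NLC step.

```python
from collections import Counter


def profile_communities(communities, seeds):
    """Assign a status and label to each community from its seed hosts.

    communities: list of sets of IP addresses (output of any clustering).
    seeds: dict mapping a known IP address to its application.
    Returns one (status, label) pair per community:
      "homogeneous" -> all seeds agree, label is that application
      "mixed"       -> seeds disagree, candidate for re-clustering
      "unseeded"    -> no seeds, needs another mechanism to be profiled
    """
    results = []
    for members in communities:
        seed_counts = Counter(seeds[ip] for ip in members if ip in seeds)
        if not seed_counts:
            results.append(("unseeded", None))
        elif len(seed_counts) == 1:
            results.append(("homogeneous", next(iter(seed_counts))))
        else:
            results.append(("mixed", None))
    return results


communities = [{"a", "b", "c"}, {"d", "e"}, {"f"}]
seeds = {"a": "p2p", "d": "web", "e": "game"}
statuses = profile_communities(communities, seeds)
```

In a fuller implementation, every host in a homogeneous community would inherit the community label, while mixed communities would be clustered again.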

Outline
Introduction
Profiling-by-Association (PBA) framework
Our PBA-based profiling algorithms
Experimental results
Conclusions

Evaluating on four backbone traces
Seeding configurations:
1. Randomly selected X% of IPs
2. Intentionally causing errors
3. Seeding using existing profilers: BLINC, Coral Reef (in the paper)
Evaluation:
- Averaged over 20 runs
- Small standard error
- Ground truth: using a payload classifier
Accuracy = [formula not captured in the transcript]
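The evaluation protocol on this slide (random seeding with X% of IPs, results averaged over 20 runs, standard error reported) can be sketched as below. The accuracy definition used here, the fraction of items whose predicted application matches the ground truth, is an assumption because the slide's formula is not available; the names `pick_seeds`, `accuracy`, `evaluate`, and the toy `majority_profiler` are hypothetical.

```python
import random
import statistics
from collections import Counter


def pick_seeds(ground_truth, fraction, rng):
    """Randomly reveal the true label of a fraction of the hosts."""
    hosts = sorted(ground_truth)
    count = max(1, round(fraction * len(hosts)))
    return {ip: ground_truth[ip] for ip in rng.sample(hosts, count)}


def accuracy(predicted, ground_truth):
    """Assumed metric: share of items whose prediction is correct."""
    correct = sum(
        1 for key, label in ground_truth.items() if predicted.get(key) == label
    )
    return correct / len(ground_truth)


def evaluate(profiler, ground_truth, fraction, runs=20, seed=0):
    """Return (mean accuracy, standard error) over repeated random seedings."""
    rng = random.Random(seed)
    scores = [
        accuracy(profiler(pick_seeds(ground_truth, fraction, rng)), ground_truth)
        for _ in range(runs)
    ]
    mean = statistics.mean(scores)
    stderr = statistics.stdev(scores) / len(scores) ** 0.5 if runs > 1 else 0.0
    return mean, stderr


truth = {"h1": "p2p", "h2": "p2p", "h3": "web", "h4": "p2p"}


def majority_profiler(seed_hosts):
    """Toy stand-in for a real profiler: predict the most common seed label."""
    top = Counter(seed_hosts.values()).most_common(1)[0][0]
    return {ip: top for ip in truth}


mean_accuracy, standard_error = evaluate(majority_profiler, truth, fraction=0.25)
```

Any profiler that maps a seed dictionary to predictions can be passed to `evaluate`, so the same harness could compare several methods under identical seedings.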

Comparing NLC and HYP on four traces
HYP is more robust to the specifics of a trace.
[Figure: accuracy of NLC and HYP on each trace, with 1% of hosts as seeds. Annotation on one trace: this trace has more hosts with multiple applications.]

Our methods are robust to deficient seeds
Results are from the BRAZ trace.
[Figure, "Few seeds": accuracy versus hosts as seeds]
[Figure, "Bad seeds" (40% with errors): accuracy versus hosts as seeds]
We can make up for bad seeds using more seeds.
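The "bad seeds" experiment, where a share of the seeds (40% on the slide) carry wrong labels, can be simulated with a small helper. How the errors were actually injected is not stated in the talk, so replacing the true label with a uniformly chosen different label is an assumption, and `corrupt_seeds` is a hypothetical name.

```python
import random


def corrupt_seeds(seeds, error_rate, labels, rng):
    """Return a copy of seeds in which a fraction of hosts are mislabeled.

    seeds: dict mapping IP address to its true application.
    error_rate: fraction of seeds to corrupt, e.g. 0.4 for 40%.
    labels: all possible applications (needs at least two entries).
    """
    corrupted = dict(seeds)
    bad_count = round(error_rate * len(seeds))
    for ip in rng.sample(sorted(seeds), bad_count):
        wrong_choices = [label for label in labels if label != seeds[ip]]
        corrupted[ip] = rng.choice(wrong_choices)
    return corrupted


clean = {"10.0.0.%d" % i: "p2p" for i in range(10)}
noisy = corrupt_seeds(clean, 0.4, ["p2p", "web", "dns"], random.Random(1))
```

Feeding `noisy` instead of `clean` into a profiler, and then growing the seed set, is one way to reproduce the "more seeds make up for bad seeds" comparison.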

Connectivity does not lie (except when it does)
Hosts may try to evade the PBA profilers by:
1. Eliminating their associations. This would defeat the very purpose of the application (e.g., P2P).
2. Confusing their associations. Open more connections towards other applications.
X = total links from known P2P hosts towards other applications. We add more such links.
[Figure: traffic graph with groups labeled P2P, SMTP (email), and online game]

HYP is robust to connectivity obfuscation
We increase the number of observed connections from P2P hosts towards other applications.
k = how many times more connections we add
[Figure: results as a function of k, with 20x and 200x marked. Results are from the BRAZ trace.]
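The obfuscation experiment can be sketched as adding extra links from P2P hosts towards hosts of other applications. Reading "k times more" as adding k × X new links, where X is the number of existing P2P-to-other links, is an interpretation, and picking the endpoints uniformly at random is an assumption; `obfuscate` and its parameters are hypothetical names.

```python
import random


def obfuscate(edges, host_app, k, rng, target="p2p"):
    """Add k * X decoy edges from target-application hosts to other hosts.

    edges: list of (u, v) host pairs; duplicates are allowed.
    host_app: dict mapping every host to its application.
    X is the number of existing edges that cross between the target
    application and any other application.
    """
    target_hosts = sorted(h for h, app in host_app.items() if app == target)
    other_hosts = sorted(h for h, app in host_app.items() if app != target)
    crossing = sum(
        1
        for u, v in edges
        if (host_app[u] == target) != (host_app[v] == target)
    )
    decoys = [
        (rng.choice(target_hosts), rng.choice(other_hosts))
        for _ in range(k * crossing)
    ]
    return edges + decoys


apps = {"a": "p2p", "b": "p2p", "c": "web", "d": "dns"}
original_edges = [("a", "b"), ("a", "c")]
noisy_edges = obfuscate(original_edges, apps, k=20, rng=random.Random(0))
```

Running a profiler on `noisy_edges` for increasing k gives the kind of robustness curve the slide refers to.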

Outline
Introduction
Profiling-by-Association (PBA) framework
Our profiling algorithms
Experimental results
Conclusions

NLC is susceptible to connectivity obfuscation
Level-1: Port numbers. Evasion: use random ports.
Level-2: Payload signatures. Evasion: encryption.
Level-3: Flow statistics. Evasion: payload padding.
Level-4: Local connectivity. Evasion: random connections to servers.
HYP is robust to all four levels of obfuscation.

Compared to the state-of-the-art
[Figure: comparison of HYP with existing profilers]

Conclusions
Users can change what they control: ports, payload, flow statistics, local connections.
Changing the global structure of connectivity is more challenging for evaders.
Our HYP algorithm shows robustness to all four levels of obfuscation (ports, payload, flow, connectivity).
Profiling by association is a powerful new approach for profiling Internet backbone traffic: ~90% accuracy with knowledge of only 1% of IP hosts.

Thank You! Questions/Discussion? This work was sponsored by: