Unconstrained Endpoint Profiling (Googling the Internet)

Presentation transcript:

Unconstrained Endpoint Profiling (Googling the Internet)
Ionut Trestian, Supranamaya Ranjan, Aleksandar Kuzmanovic, Antonio Nucci
Northwestern University, Narus Inc.
http://networks.cs.northwestern.edu http://www.narus.com

Introduction
Can we use Google for networking research?
A huge amount of endpoint information is available on the web
Can we systematically exploit search engines to harvest endpoint information available on the Internet?

Where Does the Information Come From?
Websites run logging software and display statistics
Some popular proxy services also display logs
Blacklists, banlists, and spamlists also have web interfaces
IP addresses of popular servers (e.g., gaming) are listed
Even P2P information is available on the Internet, since the first point of contact with a P2P swarm is a publicly available IP address
[Figure labels: Malicious, P2P, Servers, Clients]

Inferring Active IP Ranges in Target Networks
Problem: detecting active IP ranges
Needed for large-scale measurements, e.g., iPlane [OSDI'06]
Current approaches: active, passive
Our approach: IP addresses that generate hits on Google could suggest active ranges
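A minimal sketch of the idea on this slide, not the authors' implementation: given IP addresses that are assumed to have produced search hits, group them into prefixes and report the prefixes that look active. The function name, the /24 granularity, the `min_hits` threshold, and the sample addresses are all illustrative assumptions.

```python
import ipaddress
from collections import Counter

def active_prefixes(hit_ips, prefix_len=24, min_hits=1):
    """Group IPs that produced search hits into prefixes.

    prefix_len and min_hits are illustrative knobs, not values from the talk.
    """
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in hit_ips
    )
    # Keep only prefixes with enough distinct hits to be called "active".
    return sorted(net for net, n in counts.items() if n >= min_hits)

# Placeholder addresses, not data from the study.
hits = ["10.163.4.17", "10.163.4.200", "10.163.9.3"]
for net in active_prefixes(hits):
    print(net)
```

Raising `min_hits` trades coverage for confidence: a prefix with a single hit may be a stale web page rather than a live range.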

Detecting Application Usage Trends
Can we infer what applications people are using across the world without having access to network traces?

Traffic Classification
Problem: traffic classification
Current approaches: port-based, payload signatures, numerical and statistical, etc.
Our approach: use information about destination IP addresses available on the Internet

Methodology – Web Classifier and IP Tagging
[Figure: pipeline from an IP address (xxx.xxx.xxx.xxx) through search hits (URL, hit text), domain names and keywords, a website cache, rapid match, and IP tagging; example output: 58.61.33.40 – QQ Chat Server]
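The tagging step could be sketched roughly as follows. This is an illustration under stated assumptions, not the classifier from the talk: search hits are assumed to arrive as (URL, hit text) pairs, and the keyword table, class names, and voting rule are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical keyword-to-class table; the real system's vocabulary is not
# given on the slide.
KEYWORD_CLASSES = {
    "chat": "Chat server",
    "mail": "Mail server",
    "smtp": "Mail server",
    "game": "Gaming server",
    "halo": "Gaming server",
    "blacklist": "Malicious",
    "spam": "Malicious",
}

def tag_ip(hits):
    """hits: list of (url, hit_text) pairs returned for one IP address.

    Returns the class with the most keyword votes, or None if nothing matched.
    """
    votes = Counter()
    for url, text in hits:
        domain = urlparse(url).netloc.lower()
        # Treat domain labels and hit-text words as one bag of words.
        words = (domain.replace(".", " ") + " " + text.lower()).split()
        for word in words:
            for keyword, label in KEYWORD_CLASSES.items():
                if keyword in word:
                    votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None

example_hits = [
    ("http://chat.example.com/servers", "list of QQ chat servers"),
    ("http://example.org/stats", "top hosts"),
]
print(tag_ip(example_hits))
```

A simple vote count is only one possible rule; weighting the domain name more heavily than the hit text would be another reasonable choice.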

Evaluation – Ground Truth
France: no traces available
China: packet-level traces available
United States: sampled NetFlow available
Brazil: packet-level traces available

Inferring Active IP Ranges in Target Networks

Inferring Active IP Ranges in Target Networks
[Figure: actual endpoints from the trace vs. Google hits in the XXX.163.0.0/17 network range]
Overlap is around 77%
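The slide does not say how the overlap is measured. One plausible reading, sketched here purely as an assumption, is the fraction of prefixes containing trace endpoints that also contain at least one address with search hits. The /24 granularity and the sample addresses are illustrative.

```python
import ipaddress

def to_prefixes(ips, prefix_len=24):
    """Map a collection of IP strings to the set of prefixes they fall into."""
    return {
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False) for ip in ips
    }

def overlap_fraction(trace_ips, hit_ips, prefix_len=24):
    """Share of trace prefixes that also appear among the search-hit prefixes."""
    trace = to_prefixes(trace_ips, prefix_len)
    hits = to_prefixes(hit_ips, prefix_len)
    if not trace:
        return 0.0
    return len(trace & hits) / len(trace)

# Placeholder addresses, not data from the study.
trace_ips = ["10.163.1.5", "10.163.2.9", "10.163.3.7", "10.163.4.1"]
hit_ips = ["10.163.1.200", "10.163.2.1", "10.163.3.3", "10.163.99.9"]
print(overlap_fraction(trace_ips, hit_ips))
```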

Application Usage Trends

Correlation Between Network Traces and UEP

Traffic Classification

Traffic Classification
Tagged IP cache (holds a small % of the IP addresses seen):
165.124.182.169 Mail server
193.226.5.150 Website
68.87.195.25 Router
186.25.13.24 Halo server
Look at source and destination IP addresses and classify traffic
Is this scalable?
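A lookup against the tagged IP cache might look like the sketch below. The four cache entries are the ones shown on the slide; the function name, the "Unknown" fallback, and the choice to check the destination before the source are assumptions made for illustration.

```python
# Entries copied from the slide's example cache.
TAGGED_IP_CACHE = {
    "165.124.182.169": "Mail server",
    "193.226.5.150": "Website",
    "68.87.195.25": "Router",
    "186.25.13.24": "Halo server",
}

def classify_flow(src_ip, dst_ip, cache=TAGGED_IP_CACHE):
    """Classify a flow by whichever endpoint is found in the cache.

    Checking the destination first is an assumption, not something the
    slide specifies.
    """
    for ip in (dst_ip, src_ip):
        tag = cache.get(ip)
        if tag is not None:
            return tag
    return "Unknown"

print(classify_flow("10.0.0.1", "165.124.182.169"))
```

Because each lookup is a single dictionary access, the per-flow cost stays constant no matter how many flows are classified; the open question the slide raises is how large the cache itself has to be.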

Traffic Classification
5% of the destinations sink 95% of traffic
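The concentration claim can be checked on any flow log with a few lines. This sketch assumes flows are available as (destination IP, byte count) pairs; the function name and the synthetic data are illustrative and are not taken from the traces used in the talk.

```python
from collections import Counter

def traffic_share_of_top(flows, top_fraction=0.05):
    """flows: iterable of (dst_ip, byte_count).

    Returns the share of all bytes sunk by the top `top_fraction` of
    destinations (rounded down, but always at least one destination).
    """
    per_dst = Counter()
    for dst_ip, nbytes in flows:
        per_dst[dst_ip] += nbytes
    total = sum(per_dst.values())
    if total == 0:
        return 0.0
    k = max(1, int(top_fraction * len(per_dst)))
    top = sum(n for _, n in per_dst.most_common(k))
    return top / total

# Synthetic example: one heavy destination and 19 light ones.
flows = [("10.0.0.1", 3610)] + [(f"10.0.1.{i}", 10) for i in range(19)]
print(traffic_share_of_top(flows))
```

If a small share of destinations carries most of the bytes, a small tagged cache covering just those destinations can classify most of the traffic.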

BLINC vs. UEP
BLINC (SIGCOMM 2005):
Works "in the dark" (doesn't examine payload)
Uses "graphlets" to identify traffic patterns
Uses thresholds to further classify traffic
UEP (SIGCOMM 2008):
Uses information available on the web
Constructs a semantically rich endpoint database
Very flexible (can be used in a variety of scenarios)

BLINC vs. UEP (cont.)
[Figure: Traffic [%]]
UEP classifies twice as much traffic as BLINC
BLINC doesn't find some categories
UEP also provides better semantics: classes can be further divided into different services

Working with Sampled Traffic
Sampled data is considered to be poorer in information
However, ISPs consider it scalable to gather only sampled data
Each packet has a 1/(sampling rate) chance of being kept (Cisco NetFlow)
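The sampling rule stated on this slide can be simulated as below. This is a sketch of independent random per-packet sampling only; real NetFlow deployments may use other sampling modes, and the function name and seed parameter are illustrative.

```python
import random

def sample_packets(packets, sampling_rate, seed=None):
    """Keep each packet independently with probability 1/sampling_rate."""
    rng = random.Random(seed)
    keep_probability = 1.0 / sampling_rate
    return [p for p in packets if rng.random() < keep_probability]

packets = list(range(100_000))
kept = sample_packets(packets, sampling_rate=100, seed=0)
print(len(kept))  # roughly 1,000 of the 100,000 packets are expected to survive
```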

Working with Sampled Traffic
Most of the popular IP addresses are still in the trace
A quarter of the IP addresses are still in the trace at sampling rate 100
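A short worked formula suggests why popular endpoints tend to survive sampling. It assumes independent per-packet sampling at rate N and an endpoint that appears in n packets of the unsampled trace; both symbols are introduced here for illustration and do not appear on the slides.

```latex
P(\text{endpoint still in trace}) = 1 - \left(1 - \frac{1}{N}\right)^{n}
% With N = 100:
%   n = 1    gives  0.01
%   n = 100  gives  1 - 0.99^{100} \approx 0.63
%   n = 500  gives  1 - 0.99^{500} \approx 0.99
```

Under this model an endpoint seen in only a handful of packets almost always disappears, while one seen in hundreds of packets almost always remains.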

Working with Sampled Traffic
UEP maintains a large classification ratio even at higher sampling rates
When no sampling is done, UEP outperforms BLINC
BLINC stays in the dark: 2% at sampling rate 100
UEP retains high classification capabilities with sampled traffic

Endpoint Clustering
We clustered endpoints in order to single out common behavior
Please see the paper for detailed results
Real strength: we achieved similar results both when using the trace and when using only UEP

Conclusions
Key contribution: shift research focus from mining operational network traces to harnessing information that is already available on the web
Our approach can:
Predict application and protocol usage trends in arbitrary networks
Dramatically outperform classification tools
Retain high classification capabilities when dealing with sampled data