Unconstrained Endpoint Profiling (Googling the Internet)
Ionut Trestian, Supranamaya Ranjan, Aleksandar Kuzmanovic, Antonio Nucci
Northwestern University

Presentation transcript:

Unconstrained Endpoint Profiling (Googling the Internet)
Ionut Trestian, Supranamaya Ranjan, Aleksandar Kuzmanovic, Antonio Nucci
Northwestern University, Narus Inc.

I. Trestian, Unconstrained Endpoint Profiling (Googling the Internet)

Introduction
- Can we use Google for networking research?
- Can we systematically exploit search engines to harvest endpoint information available on the Internet?
- Huge amount of endpoint information available on the web

Where Does the Information Come From?
- Websites run logging software and display statistics
- Some popular proxy services also display logs
- IP addresses of popular servers (e.g., gaming) are listed
- Blacklists, banlists, and spamlists also have web interfaces
- Even P2P information is available on the Internet, since the first point of contact with a P2P swarm is a publicly available IP address
- Endpoint categories: Servers, Clients, P2P, Malicious

Inferring Active IP Ranges in Target Networks
- Problem: detecting active IP ranges
- Needed for large-scale measurements, e.g., iPlane [OSDI'06]
- Current approaches:
  - Active
  - Passive
- Our approach:
  - IP addresses that generate hits on Google could suggest active ranges

Detecting Application Usage Trends
- Can we infer what applications people are using across the world without having access to network traces?

Traffic Classification
- Problem: traffic classification
- Current approaches: port-based, payload signatures, numerical and statistical, etc.
- Our approach:
  - Use information about destination IP addresses available on the Internet

Methodology – Web Classifier and IP Tagging
[Figure: pipeline from an IP address (xxx.xxx.xxx.xxx) to search hits (URL, hit text), rapid match (domain names, keywords, website cache), and IP tagging; example tag: QQ Chat Server]
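
The tagging step on this slide can be sketched as simple keyword matching over search hits. This is only an illustration under stated assumptions, not the authors' classifier: the keyword table `KEYWORD_TAGS`, the function `tag_endpoint`, and the example URLs are hypothetical, and the search hits for an IP address are assumed to be already available as (URL, hit text) pairs.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical keyword table (an assumption for illustration only):
# maps a keyword found in a domain name or hit text to an endpoint tag.
KEYWORD_TAGS = {
    "chat": "chat server",
    "mail": "mail server",
    "blacklist": "malicious",
    "spam": "malicious",
    "torrent": "p2p",
    "game": "gaming server",
}


def tag_endpoint(hits):
    """Return candidate tags for one IP address, most frequent first.

    hits: list of (url, hit_text) pairs returned by a search for the IP.
    """
    votes = Counter()
    for url, text in hits:
        domain = urlparse(url).netloc.lower()
        haystack = f"{domain} {text.lower()}"
        # Each keyword found in the domain name or hit text casts one vote.
        for keyword, tag in KEYWORD_TAGS.items():
            if keyword in haystack:
                votes[tag] += 1
    return [tag for tag, _ in votes.most_common()]


example_hits = [
    ("http://qqchat.example.com/servers", "QQ Chat Server list"),
    ("http://stats.example.org/", "chat server uptime"),
]
print(tag_endpoint(example_hits))
```

A real system would need a much richer keyword set and some handling of conflicting tags; the vote count here is just one simple way to rank them.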

Evaluation – Ground Truth
- China: packet-level traces available
- Brazil: packet-level traces available
- United States: sampled NetFlow available
- France: no traces available

Inferring Active IP Ranges in Target Networks

Inferring Active IP Ranges in Target Networks
[Figure: Google hits vs. actual endpoints from trace in the XXX /17 network range]
- Overlap is around 77%
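
The slide does not say how the overlap is computed. As a rough sketch under one assumed definition (the fraction of address blocks seen active in the trace that also contain at least one IP address with search hits), it could be calculated as follows. The /24 block size and the function names `active_blocks` and `overlap` are assumptions made for illustration.

```python
import ipaddress


def active_blocks(ips, prefix_len=24):
    """Collapse individual IPv4 addresses into the set of enclosing blocks."""
    return {
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in ips
    }


def overlap(hit_ips, trace_ips, prefix_len=24):
    """Fraction of blocks active in the trace that also have search hits.

    This definition is an assumption; the slides only report the figure.
    """
    hit = active_blocks(hit_ips, prefix_len)
    trace = active_blocks(trace_ips, prefix_len)
    if not trace:
        return 0.0
    return len(hit & trace) / len(trace)


hit_ips = ["10.0.1.5", "10.0.2.9"]
trace_ips = ["10.0.1.77", "10.0.2.1", "10.0.3.3", "10.0.1.200"]
print(overlap(hit_ips, trace_ips))
```

Comparing at block granularity rather than per address reflects the slide's framing of "active ranges"; a per-address comparison would be a different, stricter measure.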

Application Usage Trends

Correlation Between Network Traces and UEP

Traffic Classification

Traffic Classification
- Is this scalable?
- Tagged IP cache (e.g., mail server, website, router, Halo server)
- Hold a small % of the IP addresses seen
- Look at source and destination IP addresses and classify traffic
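
A minimal sketch of classification with a tagged IP cache, assuming the cache is a plain in-memory mapping from IP address to tag. The cache contents, the addresses, and the name `classify_flow` are hypothetical; checking the destination before the source is also just one possible choice.

```python
# Hypothetical tagged IP cache (illustrative addresses and tags).
TAGGED_IP_CACHE = {
    "192.0.2.10": "mail server",
    "192.0.2.20": "website",
    "192.0.2.30": "Halo server",
}


def classify_flow(src_ip, dst_ip, cache=TAGGED_IP_CACHE):
    """Classify a flow by looking up its endpoints in the tagged cache.

    The destination is checked first, then the source; flows with no
    tagged endpoint stay unclassified.
    """
    for ip in (dst_ip, src_ip):
        tag = cache.get(ip)
        if tag is not None:
            return tag
    return "unknown"


print(classify_flow("198.51.100.7", "192.0.2.30"))
```

Because each lookup is a single dictionary access, the cost per flow does not grow with the size of the trace, which is the property the "Is this scalable?" question is concerned with.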

Traffic Classification
- 5% of the destinations sink 95% of traffic
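
The concentration figure above is what makes a small cache effective. As a sketch, assuming flow records are available as (destination IP, byte count) pairs, the share of traffic going to the top fraction of destinations can be measured like this. The function name `top_share` and the toy data are illustrative only.

```python
from collections import Counter


def top_share(flows, top_fraction=0.05):
    """Share of bytes sent to the most popular `top_fraction` of destinations.

    flows: iterable of (dst_ip, byte_count) pairs.
    """
    per_dst = Counter()
    for dst, nbytes in flows:
        per_dst[dst] += nbytes
    total = sum(per_dst.values())
    if total == 0:
        return 0.0
    # Always consider at least one destination.
    k = max(1, int(len(per_dst) * top_fraction))
    top = sum(n for _, n in per_dst.most_common(k))
    return top / total


# Toy data: one heavy destination and 19 light ones (20 destinations total).
flows = [("10.0.0.1", 81)] + [(f"10.0.1.{i}", 1) for i in range(19)]
print(top_share(flows))
```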

BLINC vs. UEP

BLINC (SIGCOMM 2005)
- Works "in the dark" (doesn't examine payload)
- Uses "graphlets" to identify traffic patterns
- Uses thresholds to further classify traffic

UEP (SIGCOMM 2008)
- Uses information available on the web
- Constructs a semantically rich endpoint database
- Very flexible (can be used in a variety of scenarios)

BLINC vs. UEP (cont.)
[Figure: Traffic [%], BLINC vs. UEP]
- BLINC doesn't find some categories
- UEP classifies twice as much traffic as BLINC
- UEP also provides better semantics: classes can be further divided into different services

Working with Sampled Traffic
- Each packet has a 1/(sampling rate) chance of being kept (Cisco NetFlow)
- Sampled data is considered to be poorer in information
- However, ISPs consider it scalable to gather only sampled data

Working with Sampled Traffic
- A quarter of the IP addresses are still in the trace at sampling rate 100
- Most of the popular IP addresses are still in the trace
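
The sampling model from these slides (each packet kept independently with probability 1/sampling rate) can be simulated in a few lines to see why popular addresses tend to survive. This is a sketch only: packets are assumed to be (source IP, destination IP) pairs, and `sample_packets` and `surviving_ip_fraction` are illustrative names.

```python
import random


def sample_packets(packets, sampling_rate, seed=None):
    """Keep each packet independently with probability 1/sampling_rate."""
    rng = random.Random(seed)
    keep_probability = 1.0 / sampling_rate
    return [pkt for pkt in packets if rng.random() < keep_probability]


def surviving_ip_fraction(packets, sampled):
    """Fraction of distinct destination IPs still present after sampling."""
    before = {dst for _, dst in packets}
    after = {dst for _, dst in sampled}
    return len(after) / len(before) if before else 0.0


# Toy trace: one popular destination plus 1000 destinations seen once each.
packets = [("10.0.0.1", "10.0.0.2")] * 1000 + [
    ("10.0.0.1", f"10.9.{i // 250}.{i % 250}") for i in range(1000)
]
sampled = sample_packets(packets, sampling_rate=100, seed=1)
print(len(sampled), surviving_ip_fraction(packets, sampled))
```

An address that appears in many packets is very likely to have at least one of them kept, while an address seen only once survives with probability 1/sampling rate.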

Working with Sampled Traffic
- When no sampling is done, UEP outperforms BLINC
- UEP maintains a large classification ratio even at higher sampling rates
- BLINC stays in the dark: 2% at sampling rate 100
- UEP retains high classification capabilities with sampled traffic

Endpoint Clustering
- Performed clustering of endpoints in order to cluster out common behavior
- Real strength: we managed to achieve similar results both by using the trace and by using only UEP
- Please see the paper for detailed results

Conclusions
- Key contribution:
  - Shift research focus from mining operational network traces to harnessing information that is already available on the web
- Our approach can:
  - Predict application and protocol usage trends in arbitrary networks
  - Dramatically outperform classification tools
  - Retain high classification capabilities when dealing with sampled data