An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data Mark Meiss Advanced Network Management Laboratory Indiana University.

Slides:



Advertisements
Similar presentations
U.S. Army Research Laboratory
Advertisements

Internal Assessment Your overall IB mark (the one sent to universities after the IB test) in any IB science course is based upon two kinds of assessments.
Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Research Directions Mark Crovella Boston University Computer Science.
Bro: A System for Detecting Network Intruders in Real-Time Vern Paxson Lawrence Berkeley National Laboratory,Berkeley, CA A stand-alone system for detecting.
Code-Red : a case study on the spread and victims of an Internet worm David Moore, Colleen Shannon, Jeffery Brown Jonghyun Kim.
Analysis and Modeling of Social Networks Foudalis Ilias.
Modeling Malware Spreading Dynamics Michele Garetto (Politecnico di Torino – Italy) Weibo Gong (University of Massachusetts – Amherst – MA) Don Towsley.
Detectability of Traffic Anomalies in Two Adjacent Networks Augustin Soule, Haakon Ringberg, Fernando Silveira, Jennifer Rexford, Christophe Diot.
A Survey on Tracking Methods for a Wireless Sensor Network Taylor Flagg, Beau Hollis & Francisco J. Garcia-Ascanio.
FLAME: A Flow-level Anomaly Modeling Engine
Walter Willinger AT&T Research Labs Reza Rejaie, Mojtaba Torkjazi, Masoud Valafar University of Oregon Mauro Maggioni Duke University HotMetrics’09, Seattle.
Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining.
Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.
The Universal Laws of Structural Dynamics in Large Graphs Dmitri Krioukov UCSD/CAIDA David Meyer & David Rideout UCSD/Math F. Papadopoulos, M. Kitsak,
On the Self-Similar Nature of Ethernet Traffic - Leland, et. Al Presented by Sumitra Ganesh.
A Framework for Classifying Denial of Service Attacks Alefiya Hussain, John Heidemann and Christos Papadopoulos presented by Nahur Fonseca NRG, June, 22.
Traffic Engineering With Traditional IP Routing Protocols
Web as Graph – Empirical Studies The Structure and Dynamics of Networks.
5/1/2006Sireesha/IDS1 Intrusion Detection Systems (A preliminary study) Sireesha Dasaraju CS526 - Advanced Internet Systems UCCS.
Vulnerabilities of Passive Internet Threat Monitors Yoichi Shinoda Japan Advanced Institute of Science and Technology Ko Ikai National Police Agency, Japan.
Graphs and Topology Yao Zhao. Background of Graph A graph is a pair G =(V,E) –Undirected graph and directed graph –Weighted graph and unweighted graph.
NetFlow Analyzer Drilldown to the root-QoS Product Overview.
1 TCP Traffic Analysis in cooperation with Motorola Todd DeSantis and David Loose Advisor: Professor Mark Claypool Co-Advisor: Professor Robert Kinicki.
10/21/20031 Framework For Classifying Denial of Service Attacks Alefiya Hussain, John Heidemann, Christos Papadopoulos Kavita Chada & Viji Avali CSCE 790.
On Distinguishing between Internet Power Law B Bu and Towsley Infocom 2002 Presented by.
Testing Intrusion Detection Systems: A Critic for the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory By.
Understanding Network Failures in Data Centers: Measurement, Analysis and Implications Phillipa Gill University of Toronto Navendu Jain & Nachiappan Nagappan.
Basic Concepts of Computer Networks
The Effects of Systemic Packets Loss on Aggregate TCP Flows Thomas J. Hacker May 8, 2002 Internet 2 Member Meeting.
The Erdös-Rényi models
Reading Report 14 Yin Chen 14 Apr 2004 Reference: Internet Service Performance: Data Analysis and Visualization, Cross-Industry Working Team, July, 2000.
A Statistical Anomaly Detection Technique based on Three Different Network Features Yuji Waizumi Tohoku Univ.
Scalable and Efficient Data Streaming Algorithms for Detecting Common Content in Internet Traffic Minho Sung Networking & Telecommunications Group College.
Estimating Bandwidth of Mobile Users Sept 2003 Rohit Kapoor CSD, UCLA.
MOJO: A Distributed Physical Layer Anomaly Detection System for WLANs Richard D. Gopaul CSCI 388.
Measurement and Modeling of Packet Loss in the Internet Maya Yajnik.
Comparison of Public End-to-End Bandwidth Estimation tools on High-Speed Links Alok Shriram, Margaret Murray, Young Hyun, Nevil Brownlee, Andre Broido,
Comparison of Public End-to-End Bandwidth Estimation tools on High- Speed Links Alok Shriram, Margaret Murray, Young Hyun, Nevil Brownlee, Andre Broido,
Emergence of Scaling and Assortative Mixing by Altruism Li Ping The Hong Kong PolyU
ICOM 6115: Computer Systems Performance Measurement and Evaluation August 11, 2006.
Scenario: Internet Attack Eunice Huang. What is DDoS? A denial-of-service attack (DoS attack) is an attempt to make a computer resource unavailable to.
Mapping Internet Sensors with Probe Response Attacks Authors: John Bethencourt, Jason Franklin, Mary Vernon Published At: Usenix Security Symposium, 2005.
Structural Mining of Large-Scale Behavioral Data from the Internet Thesis Defense Mark Meiss April 30, 2010.
Copyright © Allyn & Bacon 2007 Chapter 2 Research Methods This multimedia product and its contents are protected under copyright law. The following are.
Detecting the Long-Range Dependence in the Internet Traffic with Packet Trains Péter Hága, Gábor Vattay Department Of Physics of Complex Systems Eötvös.
Automating Analysis of Large-Scale Botnet Probing Events Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson* Lab for Internet and Security Technology (LIST)
1 A Framework for Measuring and Predicting the Impact of Routing Changes Ying Zhang Z. Morley Mao Jia Wang.
+ DO NOW. + Chapter 8 Estimating with Confidence 8.1Confidence Intervals: The Basics 8.2Estimating a Population Proportion 8.3Estimating a Population.
N. Hu (CMU)L. Li (Bell labs) Z. M. Mao. (U. Michigan) P. Steenkiste (CMU) J. Wang (AT&T) Infocom 2005 Presented By Mohammad Malli PhD student seminar Planete.
An Efficient Gigabit Ethernet Switch Model for Large-Scale Simulation Dong (Kevin) Jin.
Bradley Cowie Supervised by Barry Irwin Security and Networks Research Group Department of Computer Science Rhodes University DATA CLASSIFICATION FOR CLASSIFIER.
An Efficient Gigabit Ethernet Switch Model for Large-Scale Simulation Dong (Kevin) Jin.
Brief Announcement : Measuring Robustness of Superpeer Topologies Niloy Ganguly Department of Computer Science & Engineering Indian Institute of Technology,
CS526: Information Security Chris Clifton November 25, 2003 Intrusion Detection.
Explicit Allocation of Best-Effort Service Goal: Allocate different rates to different users during congestion Can charge different prices to different.
An Efficient Gigabit Ethernet Switch Model for Large-Scale Simulation Dong (Kevin) Jin.
1 Transport Layer: Basics Outline Intro to transport UDP Congestion control basics.
An Efficient Gigabit Ethernet Switch Model for Large-Scale Simulation Dong (Kevin) Jin.
2009/6/221 BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure- Independent Botnet Detection Reporter : Fong-Ruei, Li Machine.
#16 Application Measurement Presentation by Bobin John.
Network Anomaly Detection Using Autonomous System Flow Aggregates Thienne Johnson 1,2 and Loukas Lazos 1 1 Department of Electrical and Computer Engineering.
Unique Access Solutions OAM – an end-user perspective Presented by: Yaakov (J) Stein RAD Data Communications Ltd.
Distributed Network Monitoring in the Wisconsin Advanced Internet Lab Paul Barford Computer Science Department University of Wisconsin – Madison Spring,
© 2006 Andreas Haeberlen, MPI-SWS 1 Monarch: A Tool to Emulate Transport Protocol Flows over the Internet at Large Andreas Haeberlen MPI-SWS / Rice University.
Interaction and Animation on Geolocalization Based Network Topology by Engin Arslan.
Empirical analysis of Chinese airport network as a complex weighted network Methodology Section Presented by Di Li.
DDoS Attack Detection under SDN Context
Pong: Diagnosing Spatio-Temporal Internet Congestion Properties
Network Science: A Short Introduction i3 Workshop
Presentation transcript:

An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data Mark Meiss Advanced Network Management Laboratory Indiana University

Quick Overview Why this study? Existing work focuses on the effects of sampling on individual flows or distributions of flows. Open question: How are graph structures built from flow data affected?

Quick Overview Building graphs from flow data Basic graph properties Methodology Experiments Results Take-home message: Aggregation matters and is not your enemy.

Background “graph structures derived from network flow data”… ?

Basic network

Degree

Clustering Coefficient

Betweenness

Motifs

Weighted network

Strength

Directed network

Applications Modeling and prediction Anomaly detection Application classification Capacity planning Community identification (etc.)

Motivation So what does packet sampling have to do with this? Isn’t knowing p(sample) = 0.01 good enough?

Motivation

The distributions of degree and strength for large-scale network data generally obey a power law:

Motivation The exact value matters!

Methodology Internet2 / Abilene used as testbed Generate UDP traffic and analyze its traces in Abilene netflow-v5 data

Flow Generation Language (FGL) FGL is a scripting language for quick and easy traffic generation: println("Bias study #4 ( )"); println(); println("This FGL code will generate byte packets to each UDP port"); println("in the range on the hosts "); println(); x = proc(pkt) begin println("Emitting 100 of ", pkt); notate(pkt); emit(pkt, 100, 0.02); delay(0.10); end; port = range(10100, 10199); host = range(start:ip(" "), end:ip(" ")); xip = [ ip_header(src:ip(" "), ]; xudp = [ udp_header(src_port:0, ]; xpacket = size:128, data:"This is a test.") ]; output("bias-study-4.event");

Experiment #1 Note: p(sample) = Generate flows of lengths between 1 and 200 packets; find chance of detection.

Experiment #2 Try to recover a power law, gamma = 2. Send to each of 10 hosts: packet flows packet flows packet flows (etc.)

Experiment #3 Second attempt to recover gamma = 2: Send to each of 10 hosts: packet flows packet flows packet flows (etc.)

Experiment #4 Third attempt to recover gamma = 2: Send to each of 10 hosts: packet flows packet flows packet flows (etc.)

Result A preponderance of very small flows will lead to an overestimate of the exponent. All flows smaller than a critical threshold are statistically indistinguishable.

Result With sufficiently large flow size, a range of exponents can be recovered reliably.

Is this a problem? What if we don’t have sufficiently large flow size?

Aggregation Aggregation is necessary for accurate results! Flows repeat themselves. Coalescing flows with identical endpoints allows us to distinguish smaller flows.

Aggregation Failure to aggregate on the experiments described causes an over-estimate of about 0.2. This can make a large difference for modeling!

Conclusions Given appropriate aggregation, packet sampling does not affect the large-scale properties of graphs derived from flow data. The effectiveness of aggregation in mitigating small-flow effects depends on repeated activity.

Future Work Effects on other properties (clustering, centrality, spectral). Effects on network growth models (preferential attachment, etc.). Effects on traffic models (PageRank, other Markov models).

Thank you! Any questions or observations?