RelSamp: Preserving Application Structure in Sampled Flow Measurements Myungjin Lee, Mohammad Hajjat, Ramana Rao Kompella, Sanjay Rao.

A plethora of Internet applications
[Figure: the Internet, with a cycle of 1) Emergence of new applications, 2) Measure/Monitor, 3) Characterization]
- Objectives
  - Re-provision networks
  - Detect undesirable behaviors of applications
  - Prepare network better against major application trends

Monitoring applications at an edge
- Goal: Monitoring application behavior
  - Identify number of flows
  - Identify number of packets
- Current solution: Sampled NetFlow
  - Supported by most modern routers
- Key limitation: Application session structure gets distorted
  - Small # of flows per application session
  - Small # of packets per application session
[Figure: Enterprise Network, Edge Router running Sampled NetFlow, Internet]

Preserving application structure in flow measurements
- Benefit 1: Enables continuous monitoring of applications
  - Better understanding of communication patterns
  - Better understanding of characteristics (# of flows, packets)
- Benefit 2: Application classification becomes easier
  - Statistical machine learning techniques: SVM, C4.5, etc.
  - Social behavior-based classifier: BLINC
- Benefit 3: Detecting undesirable traffic patterns of an application

Contributions
- Introduce the notion of related sampling
  - Flows belonging to the same application session are sampled with higher probability
- Propose RelSamp architecture for realizing related sampling
  - Uses three stages of sampling to preserve application structure
- Show efficacy in preserving application structure
  - Captures more flows per application session
  - Significant increase in application classification accuracy

Related sampling
[Figure: application sessions App1, App2, App3 shown for the original application structure, Sampled NetFlow, and related sampling]
Key idea: Sample more flows from fewer application sessions

Realizing related sampling
- Question 1: How to sample an application session?
- Question 2: How to sample packets within an application session?

Defining application session
- A sequence of packets from an application on a given host with inter-arrival time ≤ τ seconds
  - Packets may belong to different flows to different destinations
- Example 1: BitTorrent connections to several destinations within a short span of time constitute an application session
- Example 2: Web connections from a browser several seconds apart constitute different application sessions
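The definition above can be illustrated offline with a small sketch. It assumes each packet already carries an application label (for example from ground truth), which a router does not have online; the tuple layout and the name `split_sessions` are hypothetical and not taken from the slides.

```python
from collections import defaultdict

def split_sessions(packets, tau):
    """Group packets into application sessions.

    packets: iterable of (timestamp, host, app, flow_key), sorted by timestamp.
             The app label is an assumption: it is only known offline.
    tau:     maximum inter-arrival gap in seconds within one session.
    Returns {(host, app): [session, ...]}, each session being a list of packets.
    """
    sessions = defaultdict(list)
    last_seen = {}
    for pkt in packets:
        ts, host, app, _flow_key = pkt
        key = (host, app)
        # A gap larger than tau (or a first packet) starts a new session.
        if key not in last_seen or ts - last_seen[key] > tau:
            sessions[key].append([])
        sessions[key][-1].append(pkt)
        last_seen[key] = ts
    return dict(sessions)

packets = [
    (0.0, "10.0.0.1", "bittorrent", "f1"),
    (0.4, "10.0.0.1", "bittorrent", "f2"),
    (0.9, "10.0.0.1", "bittorrent", "f3"),
    (1.0, "10.0.0.2", "web", "f4"),
    (9.0, "10.0.0.2", "web", "f5"),
]
result = split_sessions(packets, tau=1.0)
```

With `tau=1.0`, the three BitTorrent flows fall into one session, while the two web connections eight seconds apart form two sessions, mirroring Examples 1 and 2.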

Sampling an application session
- One possible approach: Similar to Sampled NetFlow
  - Sample packets with some probability
  - Create an application session record if no record exists
  - Update the application session record
- Problem: Hard to do in an online fashion
  - No application session identifier (like flow key)
  - Need to know all flows that constitute an application session
  - DPI-based techniques are both difficult and incomplete

Our approach: sampling hosts
- Observation: Host is a super-set of an application session
- Sample more flows from the same host
  - Flows originating at the same host close together in time typically belong to few application sessions
  - About 80% of hosts run fewer than 2 applications in our study
  - More details in the paper

RelSamp design
- Three-stage sampling process consisting of host, flow, and packet selection stages
- Host stage: hash-based sampling
  - No state maintained on a per-application basis
  - Many application sessions for a given host are possibly sampled
  - Change hash function periodically to track different hosts
- Flow and packet stages: random packet sampling
  - Controls fraction of flows sampled in an application session and packets sampled in a flow
- Post processing: Can separate flow records into application sessions using port-based/statistical classifiers

RelSamp architecture
[Figure: packets pass through three bias stages that feed Flow Memory; P_h, P_f, and P_p are tunable parameters]
- Host-level bias stage (P_h): H(SrcIP) is compared against a selection range within the hash space; P_h = selection range / hash space
- Flow-level bias stage (P_f): if (random no. ≤ P_f && no flow record) create a flow record
- Pkt-level bias stage (P_p): if (random no. ≤ P_p && flow record) update the flow record
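The three stages can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `RelSampSketch`, the use of SHA-256 as H, the salt standing in for the periodically changed hash function, and the choice to count the packet that creates a flow record are all assumptions.

```python
import hashlib
import random

HASH_SPACE = 2 ** 32  # assumed size of the hash space

def host_selected(src_ip, p_h, salt):
    """Host stage: H(SrcIP) must fall inside the selection range.

    The selection range is [0, p_h * HASH_SPACE), so that
    P_h = selection range / hash space. Changing `salt` stands in
    for changing the hash function periodically.
    """
    digest = hashlib.sha256(f"{salt}:{src_ip}".encode()).digest()
    h = int.from_bytes(digest[:4], "big")
    return h < p_h * HASH_SPACE

class RelSampSketch:
    def __init__(self, p_h, p_f, p_p, salt=0, rng=None):
        self.p_h, self.p_f, self.p_p = p_h, p_f, p_p
        self.salt = salt
        self.rng = rng or random.Random()
        self.flow_memory = {}  # flow_key -> sampled packet count

    def observe(self, src_ip, flow_key):
        """Process one packet; return True if it was recorded."""
        if not host_selected(src_ip, self.p_h, self.salt):
            return False
        if flow_key not in self.flow_memory:
            # Flow stage: create a record with probability P_f.
            if self.rng.random() <= self.p_f:
                self.flow_memory[flow_key] = 1  # assumption: count this packet
                return True
            return False
        # Packet stage: update an existing record with probability P_p.
        if self.rng.random() <= self.p_p:
            self.flow_memory[flow_key] += 1
            return True
        return False

flow = ("10.0.0.1", "192.0.2.7", 6881)
sampler = RelSampSketch(p_h=1.0, p_f=1.0, p_p=1.0)
for _ in range(3):
    sampler.observe("10.0.0.1", flow)
```

Because the host stage is a deterministic hash rather than a coin flip, every flow from a selected host reaches the flow stage, which is what biases sampling toward flows of the same application session.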

Exploring parametric space
- Router sampling budget P_e = f(P_h, P_f, P_p)
  - Trade-off between accuracy of flow statistics and # flows/application session
- Parameters can be tuned depending on
  - Objective
  - Network environment
- Examples of tuning parameters by objective
  - Application classification: low P_h, high P_f, low P_p
  - Application characterization: lower P_h, high P_f, high P_p
  - Flow statistics of all flows: P_h = P_f = P_p = P_e

Evaluation goals
- Application characterization
  - Question 1: Is RelSamp effective for sampling more flows in an application session?
  - Question 2: Can RelSamp estimate statistics of an application session?
- Application classification
  - Question 3: Is sampling more flows in an application session beneficial for application classification?

Experimental setup

Flows per application session
[Figure: CDF of #captured flows / #total flows in an app session; annotation: more flows per app session]
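The plotted metric can be computed as in the sketch below, assuming the flows of each application session are available as sets keyed by a session identifier. The function names and the toy data are hypothetical and do not reproduce any result from the talk.

```python
def capture_ratios(total_flows, captured_flows):
    """Per-session ratio of #captured flows to #total flows.

    total_flows, captured_flows: {session_id: set of flow keys}.
    Sessions with no flows are skipped to avoid division by zero.
    """
    return {
        sid: len(captured_flows.get(sid, set()) & flows) / len(flows)
        for sid, flows in total_flows.items()
        if flows
    }

def empirical_cdf(values):
    """Return (x, fraction of values <= x) pairs over the sorted values."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

total = {"s1": {"a", "b", "c", "d"}, "s2": {"e", "f"}}
captured = {"s1": {"a", "b"}, "s2": {"e", "f"}}
ratios = capture_ratios(total, captured)
cdf = empirical_cdf(ratios.values())
```

A sampler that preserves application structure shifts this CDF toward higher ratios, meaning more sessions have a large share of their flows captured.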

Accuracy of BLINC classifier
[Figure: accuracy (%) versus sampling rate; annotation: ~50% increase]
Note: classification results on flows using non-standard port

Related work
- Flow Sampling [ToN ’06]
  - Samples flows once flow record is created
- Flow Slices [IMC ’05]
  - Focuses on controlling router resources (CPU and memory)
- cSamp [NSDI ’08]
  - Supports sampling of all traffic by coordinating various vantage points in a network
- FlexSample [IMC ’08]
  - Supports monitoring of traffic subpopulations, but needs to maintain extra state for approximate checking of predicates

Summary
- Introduced the notion of related sampling
  - Samples related flows in the same application session with higher probability
- Proposed RelSamp architecture
  - Preserves application structure in sampled flow records
- Effective at preserving application session structure
  - 5-10x more flows per application session compared to Sampled NetFlow
  - Up to 50% higher classification accuracy than Sampled NetFlow

Thank you! Questions?

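Evaluation method of classification techniques
[Figure: a Packet Trace feeds a DPI-based Classifier (Tstat) that yields the Ground Truth, and feeds RelSamp, Sampled NetFlow, and Flow Sampling, which yield Flow Record1, Flow Record2, and Flow Record3; a Classification Algorithm (e.g., BLINC, SVM, C4.5) runs on the flow records and produces a Report]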
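The comparison step of this evaluation can be sketched as below, assuming both the classifier output and the DPI-derived ground truth are dictionaries from flow key to application label. The function name `score` and the sample labels are hypothetical.

```python
def score(predicted, ground_truth):
    """Compare classifier labels on sampled flow records with ground truth.

    predicted, ground_truth: {flow_key: application label}.
    Only flows present in both are scored.
    Returns (# of accurately classified flows, accuracy in percent).
    """
    scored = [k for k in predicted if k in ground_truth]
    correct = sum(1 for k in scored if predicted[k] == ground_truth[k])
    accuracy = 100.0 * correct / len(scored) if scored else 0.0
    return correct, accuracy

ground_truth = {"f1": "p2p", "f2": "web", "f3": "p2p", "f4": "mail"}
predicted = {"f1": "p2p", "f2": "p2p", "f3": "p2p"}  # f4 was not sampled
correct, accuracy = score(predicted, ground_truth)
```

Running the same scoring on the flow records of each sampler gives the per-sampler accuracy and the count of accurately classified flows that the comparison plots report.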

Comparison with other solutions using BLINC
[Figure: # of accurately classified flows versus sampling rate]
Note: classification results on flows using non-standard port