Lightweight Application Classification for Network Management

Presentation transcript:

Lightweight Application Classification for Network Management
Hongbo Jiang (Case Western Reserve University), Andrew W. Moore (University of Cambridge), Zihui Ge (Adverplex Inc.), Shudong Jin (Case Western Reserve University), Jia Wang (AT&T Labs - Research)
ACM SIGCOMM Workshop on Internet Network Management (INM), Kyoto, Japan, August 31, 2007

Why do Network Traffic Classification? Network planning; traffic engineering; accounting and billing; security profiling; …

Our Contribution: a lightweight application classification scheme based on NetFlow data; evaluation and sensitivity analysis covering trivial features, derivative features, training-set size, and packet sampling.

Flow-level Traffic Classification: previous traffic classification methods use features derived from streams of packets. They can achieve good accuracy (e.g., 95%) but have high complexity and cost. Flow-level statistics are commonly available (Cisco NetFlow, Juniper cflowd, Huawei NetStream, …), and sampling further reduces the cost.

Probabilistic Method (example diagram). In training: a training set supplies the object characteristics, the class of membership, and the prior to the probability box (values shown: Pr = .15, Pr = .33). In use: given the prior and the characteristics of an unknown object (?), the probability box outputs the probability of membership, i.e. an estimate of membership (value shown: Pr = .97).
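
The training and in-use phases in this diagram can be illustrated with a small sketch. The slide does not name the classifier, so the code below assumes a naive Bayes model over categorical features with Laplace smoothing; the function names, feature values, and class labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def train(samples, labels):
    """Training phase: count classes (for priors) and per-class feature values."""
    class_counts = Counter(labels)
    # value_counts[cls][feature_index][value] -> number of occurrences
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen for each feature index
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[y][i][v] += 1
            vocab[i].add(v)
    return class_counts, value_counts, vocab

def posterior(model, x):
    """In-use phase: estimate the probability of membership in each class."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # prior probability of the class
        for i, v in enumerate(x):
            # Laplace-smoothed likelihood of value v given the class
            p *= (value_counts[cls][i][v] + 1) / (n_cls + len(vocab[i]))
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical toy flows: (port range, packet-size bucket) -> application class
samples = [("low", "small"), ("low", "small"), ("low", "large"),
           ("high", "large"), ("high", "large"), ("high", "small")]
labels = ["mail", "mail", "mail", "p2p", "p2p", "p2p"]

model = train(samples, labels)
probs = posterior(model, ("low", "small"))
print(probs)
```

The returned dictionary plays the role of the "probability of membership" output in the diagram: one estimate per class, normalised to sum to one.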

Our Approach (cont.): features are ranked by importance using Symmetric Uncertainty, an entropy-based measure (see the paper and references therein for details). Ranked features allow for a sensitivity analysis and for the removal of irrelevant and redundant features.
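
As a rough illustration of entropy-based ranking, the sketch below uses the standard textbook definition SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)) for discrete values. The slides defer the details to the paper, so this is an assumption about the general technique rather than the authors' implementation; the helper names and toy data are hypothetical, and continuous features such as byte rate would first need to be discretised.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def symmetric_uncertainty(feature, labels):
    """SU = 2 * IG / (H(X) + H(Y)); 0 means unrelated, 1 means fully predictive."""
    h_x = entropy(feature)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        return 0.0
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    info_gain = max(0.0, h_x + h_y - h_xy)      # clamp floating-point noise
    return 2.0 * info_gain / (h_x + h_y)

def rank_features(columns, labels):
    """Return (feature name, SU score) pairs, most relevant first."""
    scores = {name: symmetric_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: six flows, three application classes
labels = ["web", "web", "mail", "mail", "ftp", "ftp"]
columns = {
    "low_port": [80, 80, 25, 25, 21, 21],          # determines the class
    "tos": [0, 0, 0, 0, 0, 0],                     # constant, no information
    "pkt_bucket": ["s", "l", "s", "l", "s", "l"],  # independent of the class
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Features scoring near zero are candidates for removal; redundancy between features could be checked the same way by computing the score between two feature columns.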

Evaluation. Dataset (not from AT&T!): full-duplex 1 Gbps access link; 1000 researchers. Data was hand-classified into a number of application classes, e.g. web-browsing, email, FTP, attack, P2P, … Focused on TCP/IP flows only: 800,000 simplex TCP/IP application-level flows (97% of traffic by byte volume). NetFlow generation: software simulation of the Cisco NetFlow v5 engine. Independent training and test sets, with flows randomly assigned to each.

Baseline and Derivative Features
Category: Baseline. Features: srcIP/dstIP, srcPort/dstPort, ToS, sTime/eTime, tcpFlag, bytes, packets. Accuracy: 88.3%
Category: Derivative + Baseline. Added features: Duration, pktSize, byteRate, pktRate, tcpFxxx (syn/ack/fin/rst/psh/urg). Accuracy: 89.1%
Category: Application + Baseline + Derivative. Added features: Low port, High port. Accuracy: 91.4%
Comparison: port-based 50-70%; packet-based 95%.
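
A minimal sketch of how the derivative and application features might be computed from baseline flow fields follows. The record layout, field names, zero-duration handling, and the reading of low/high port as the minimum/maximum of the two port numbers are assumptions made for illustration; only the TCP flag bit values are the standard ones from the TCP header.

```python
from dataclasses import dataclass

@dataclass
class FlowRecord:
    """Hypothetical subset of the baseline fields in a flow record."""
    src_port: int
    dst_port: int
    start_time: float  # seconds
    end_time: float    # seconds
    tcp_flags: int     # cumulative OR of the TCP flags seen in the flow
    bytes: int
    packets: int

# Standard TCP header flag bits
TCP_FLAG_BITS = {"fin": 0x01, "syn": 0x02, "rst": 0x04,
                 "psh": 0x08, "ack": 0x10, "urg": 0x20}

def derive_features(flow):
    """Compute derivative and port-based features from one flow record."""
    duration = max(flow.end_time - flow.start_time, 0.0)
    features = {
        "duration": duration,
        "pktSize": flow.bytes / flow.packets if flow.packets else 0.0,
        # Assumption: rates are reported as 0 for zero-duration flows
        "byteRate": flow.bytes / duration if duration > 0 else 0.0,
        "pktRate": flow.packets / duration if duration > 0 else 0.0,
        # Assumption: low/high port are the min/max of the two port numbers
        "lowPort": min(flow.src_port, flow.dst_port),
        "highPort": max(flow.src_port, flow.dst_port),
    }
    # One binary feature per TCP flag (tcpFsyn, tcpFack, ...)
    for name, bit in TCP_FLAG_BITS.items():
        features["tcpF" + name] = 1 if flow.tcp_flags & bit else 0
    return features

# Hypothetical web flow: SYN, ACK, PSH and FIN were seen (0x1B)
flow = FlowRecord(src_port=51234, dst_port=80, start_time=10.0,
                  end_time=12.0, tcp_flags=0x1B, bytes=3000, packets=6)
features = derive_features(flow)
print(features)
```

All of these values need only one pass over a single record, which is consistent with the slides describing them as trivially derived.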

Highly Relevant Features: refer to specific privileged services and protocols; differentiate email and FTP from web-browsing; compact features.

Reducing Feature Complexity. Figure annotations: Runtime: 600x (s); Runtime: 1x (s). Accuracy remains high even after removing irrelevant and redundant features.

Reducing Training Set Size: more features may lead to noise (insufficiently representative).

Impact of Packet Sampling. NetFlow characteristic: the observed flow count decreases as the sampling rate decreases. Packet sampling has little impact on accuracy.
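
The drop in observed flow count can be illustrated with a short calculation. The sketch assumes each packet is sampled independently with probability p, so a flow of n packets is seen with probability 1 - (1 - p)^n; real NetFlow deployments may instead use deterministic 1-in-N sampling, and the flow sizes below are made up for illustration.

```python
def detection_probability(num_packets, sampling_prob):
    """Probability that at least one packet of a flow is sampled."""
    return 1.0 - (1.0 - sampling_prob) ** num_packets

def expected_observed_flows(flow_sizes, sampling_prob):
    """Expected number of flows that appear in the sampled flow records."""
    return sum(detection_probability(n, sampling_prob) for n in flow_sizes)

# Hypothetical flow sizes in packets: short flows are the most likely to vanish
flow_sizes = [1, 2, 10, 100]

for p in (1.0, 0.1, 0.01):
    seen = expected_observed_flows(flow_sizes, p)
    print(f"sampling probability {p}: about {seen:.2f} of {len(flow_sizes)} flows observed")
```

Under this model small flows disappear first, so a sampled trace contains fewer flows overall and is biased towards long ones.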

Conclusion & Future Work. Application classification can be done with flow-level (NetFlow) information. Trivially derived features improve accuracy. Packet sampling has minimal impact. Future work: NetFlow v9? Other machine-learning methods?

Thanks