Application Identification in Information-Poor Environments
Charalampos Rotsos, 02/02/2010

Presentation transcript:

Application Identification in information-poor environments
Charalampos Rotsos
- What is application identification
- Current status
- My work
- Future plans
- Open questions

Why? E.g., class of service (CoS), security, performance analysis.

Taxonomy of application identification techniques
- Deep packet inspection: match the payload against well-known protocol signatures.
- Statistical analysis: extract network measurements (packet size, packet inter-arrival time) and search for patterns (ML, statistical analysis, etc.).
- Behavioral/graph analysis: find connection patterns and create features based on the connection graph.
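To make the deep packet inspection branch concrete, here is a minimal sketch of payload signature matching. The signature table is hypothetical and deliberately tiny, and `classify_payload` is an illustrative name, not something taken from the talk; production DPI engines rely on much larger, carefully maintained rule sets.

```python
import re

# Hypothetical signature table (an assumption for illustration only).
# Each pattern is matched against the first bytes of a flow's payload.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n"),
    "ssh": re.compile(rb"^SSH-\d\.\d+-"),
    "smtp": re.compile(rb"^220[ -].*SMTP", re.IGNORECASE),
}


def classify_payload(payload: bytes) -> str:
    """Return the name of the first signature that matches, else 'unknown'."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return name
    return "unknown"
```

Encrypted or obfuscated payloads fall through to "unknown", which is one reason the statistical and behavioral branches of the taxonomy exist.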

Statistical analysis
- Focused on flow features.
- Which features are high quality?
- Which features are computationally simple?
- Candidate features: packet size, inter-packet rate, TCP header information, flow duration.
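As an illustration of computationally simple flow features, the sketch below derives packet-size, inter-arrival and duration statistics from the first few packets of a flow. The `Packet` record, the `flow_features` function and the `first_n` parameter are assumptions made for this example, not the feature set used in the talk.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    size: int         # packet length in bytes


def flow_features(packets, first_n=5):
    """Compute simple per-flow features from the first `first_n` packets."""
    head = sorted(packets, key=lambda p: p.timestamp)[:first_n]
    if not head:
        raise ValueError("a flow needs at least one packet")
    sizes = [p.size for p in head]
    # Gaps between consecutive packets; empty for a single-packet flow.
    gaps = [b.timestamp - a.timestamp for a, b in zip(head, head[1:])]
    return {
        "mean_size": mean(sizes),
        "max_size": max(sizes),
        "mean_interarrival": mean(gaps) if gaps else 0.0,
        "duration": head[-1].timestamp - head[0].timestamp,
    }
```

All four values need only one pass over a handful of packets, so they are cheap enough to compute online; whether they are also discriminative is the open question the slide raises.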

Progress so far
The problem is solved:
- Five packets are sufficient to classify a flow.
- At least 90% accuracy is achieved on all classes.
But not really...
- The required features are difficult to extract.
- Identification accuracy.
- Temporal stability is awful.
- Technical issues:
  - Long-running connections are difficult to label.
  - What about new applications?
  - What about simplex flows?
  - Measuring is hard. Labelling traffic is EVEN harder.
  - It is hard to keep the classifier fast, lightweight and up to date, all things you need if this is part of your IDS.

Can we do better?
- Restate the problem.
- Use information that can be extracted from current networks (e.g., SNMP, NetFlow).
- Use better machine learning.
- Define models that bridge the gap between statistical and behavioral properties.

Better ML on NetFlow
- Semi-supervised learning on NetFlow data using Bayesian data analysis.
- Better performance than the Bayes classifier in Weka.
- Bayesian modeling provides good parameterization.
- Efficient reduction of the effect of the feature set's time dependence.
- Temporal and spatial decay.
- Difficult to balance a model that is both accurate and flexible.
- NetFlow doesn't provide clean separation of classes.

What is next?
Richer dataset:
- Aggregate flows for ports/hosts/networks.
- Increase dimensions by simple feature engineering.
Better mathematical models:
- Incorporate domain-specific knowledge.
- Inference diagram defined by the connection graph.
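One way to read "aggregate flows for ports/hosts" is to group flow records by server endpoint and derive a few extra dimensions from the totals. The sketch below assumes each record is a dict with hypothetical keys `src`, `dst`, `dst_port`, `bytes` and `packets`; real NetFlow exports carry more fields and use different names.

```python
from collections import defaultdict


def aggregate_by_endpoint(flows):
    """Aggregate flow records per (destination host, destination port)."""
    totals = defaultdict(
        lambda: {"flows": 0, "bytes": 0, "packets": 0, "peers": set()}
    )
    for flow in flows:
        entry = totals[(flow["dst"], flow["dst_port"])]
        entry["flows"] += 1
        entry["bytes"] += flow["bytes"]
        entry["packets"] += flow["packets"]
        entry["peers"].add(flow["src"])

    features = {}
    for key, entry in totals.items():
        features[key] = {
            "flows": entry["flows"],
            "bytes": entry["bytes"],
            "packets": entry["packets"],
            # Simple engineered dimensions derived from the raw totals.
            "distinct_peers": len(entry["peers"]),
            "mean_bytes_per_flow": entry["bytes"] / entry["flows"],
            "mean_packet_size": (
                entry["bytes"] / entry["packets"] if entry["packets"] else 0.0
            ),
        }
    return features
```

The same pattern works for aggregation per host or per network prefix by changing the grouping key.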

Inference diagram
[Diagram labels: Alice, Web Server, Bob, Web browser]
- The flows between Alice and the web server are correlated and correspond to the same application.
- The flows Alice-web server and Bob-web server also correspond to the same application.
- Research on application identification hasn't found a framework to accommodate these observations.

Inference diagram: more difficult
[Diagram labels: Alice (uses random ports), Bob (uses random ports), FTP server (port 22), web server (port 80), database server (port 1680)]
- Computers will run multiple applications in parallel.
- BUT applications on a particular server will always use a specific port.

A first approach!
A similar problem can be found in the case of node labeling:
- Aggregate flow records over some defined period.
- Use a Markov random field model for inference propagation.
- Apply approximate inference methods (Gibbs sampling, message passing).
- In the end, apply some engineering ideas to refine the results.
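The node-labeling idea above can be sketched with a very small pairwise Markov random field and Gibbs sampling. This is an illustration under stated assumptions, not the model from the talk: nodes stand for aggregated flows, an edge links two flows believed to be related (for example, sharing a server endpoint), the potential is a Potts-style term that rewards neighbours with equal labels, and `gibbs_label_nodes`, `coupling`, `sweeps` and `burn_in` are hypothetical names.

```python
import math
import random
from collections import Counter


def gibbs_label_nodes(edges, known, labels, coupling=2.0,
                      sweeps=200, burn_in=50, seed=0):
    """Estimate labels for unlabeled graph nodes by Gibbs sampling.

    edges:  iterable of (node, node) pairs
    known:  dict node -> label, held fixed during sampling
    labels: list of all possible labels
    """
    if sweeps <= burn_in:
        raise ValueError("sweeps must exceed burn_in")
    rng = random.Random(seed)
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    unknown = sorted(n for n in neighbours if n not in known)
    state = dict(known)
    for node in unknown:
        state[node] = rng.choice(labels)  # random initialisation

    counts = {node: Counter() for node in unknown}
    for sweep in range(sweeps):
        for node in unknown:
            # Conditional distribution of this node given its neighbours:
            # weight grows with the number of neighbours sharing the label.
            weights = [
                math.exp(coupling * sum(state[m] == lab for m in neighbours[node]))
                for lab in labels
            ]
            state[node] = rng.choices(labels, weights=weights)[0]
            if sweep >= burn_in:
                counts[node][state[node]] += 1

    # Report the most frequently sampled label for each unlabeled node.
    return {node: counts[node].most_common(1)[0][0] for node in unknown}
```

For example, with `edges = [("f1", "f2"), ("f4", "f5")]` and `known = {"f1": "web", "f4": "ftp"}`, the sampler propagates each known label to the neighbouring flow. Whether such a chain has converged on a real traffic graph is a separate question that this sketch does not address.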

Open problems
- Is the model a good approximation?
- What am I classifying, and for how long? Ports, hosts or networks?
- Is it possible to do multi-layer analysis?
- Are the approximation techniques converging?
- Turning the difficulty up to “Eleven”: compute the performance of an individual traffic flow within a VPN by monitoring alone.
Thank you!

Results

Graphical Model