Application Identification in Information-poor Environments Charalampos (Haris) Rotsos Computer Laboratory University of Cambridge

Slides:



Advertisements
Similar presentations
Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Advertisements

© 2008 Cisco Systems, Inc. All rights reserved.Cisco ConfidentialPresentation_ID 1 Chapter 8: Monitoring the Network Connecting Networks.
Code-Red : a case study on the spread and victims of an Internet worm David Moore, Colleen Shannon, Jeffery Brown Jonghyun Kim.
CPSC Network Layer4-1 IP addresses: how to get one? Q: How does a host get IP address? r hard-coded by system admin in a file m Windows: control-panel->network->configuration-
© 2006 Cisco Systems, Inc. All rights reserved.Cisco Public 1 Version 4.0 Implementing IP Addressing Services Accessing the WAN – Chapter 7.
Chapter 8 Managing Windows Server 2008 Network Services
Detectability of Traffic Anomalies in Two Adjacent Networks Augustin Soule, Haakon Ringberg, Fernando Silveira, Jennifer Rexford, Christophe Diot.
Centre de Comunicacions Avançades de Banda Ampla (CCABA) Universitat Politècnica de Catalunya (UPC) Identification of Network Applications based on Machine.
Marios Iliofotou (UC Riverside) Brian Gallagher (LLNL)Tina Eliassi-Rad (Rutgers University) Guowu Xi (UC Riverside)Michalis Faloutsos (UC Riverside) ACM.
 Firewalls and Application Level Gateways (ALGs)  Usually configured to protect from at least two types of attack ▪ Control sites which local users.
Application Identification in information-poor environments Charalampos Rotsos 02/02/20101 What is application identification Current status My work Future.
Internet Traffic Patterns Learning outcomes –Be aware of how information is transmitted on the Internet –Understand the concept of Internet traffic –Identify.
Wide-scale Botnet Detection and Characterization Anestis Karasaridis, Brian Rexroad, David Hoeflin.
Subnetting.
Big Data Analytics and Challenge Presented by Saurabh Rastogi Asst. Prof. in Maharaja Agrasen Institute of Technology B.Tech(IT), M.Tech(IT)
1 An Information Theoretic Approach to Network Trace Compression Y. Liu, D. Towsley, J. Weng and D. Goeckel.
Passive traffic measurement Capturing actual Internet packets in order to measure: –Packet sizes –Traffic volumes –Application utilisation –Resource utilisation.
Lecture Week 7 Implementing IP Addressing Services.
RelSamp: Preserving Application Structure in Sampled Flow Measurements Myungjin Lee, Mohammad Hajjat, Ramana Rao Kompella, Sanjay Rao.
INTRUSION DETECTION SYSTEMS Tristan Walters Rayce West.
Remote Monitoring and Desktop Management Week-7. SNMP designed for management of a limited range of devices and a limited range of functions Monitoring.
FIREWALL TECHNOLOGIES Tahani al jehani. Firewall benefits  A firewall functions as a choke point – all traffic in and out must pass through this single.
NetComm Wireless VPN Functionality Feature Spotlight.
DYNAMIC HOST CONFIGURATION PROTOCOL (DHCP) BY: SAMHITA KAW IS 373.
HTML Comprehensive Concepts and Techniques Intro Project Introduction to HTML.
Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee College of Computing, Georgia Institute of Technology USENIX Security '08 Presented by Lei Wu.
China Science & Technology Network Computer Emergency Response Team Botnet Detection and Network Security Alert Tao JING CSTCERT,CNIC.
Layering and the TCP/IP protocol Suite  The TCP/IP Protocol only contains 5 Layers in its networking Model  The Layers Are 1.Physical -> 1 in OSI 2.Network.
A fast identification method for P2P flow based on nodes connection degree LING XING, WEI-WEI ZHENG, JIAN-GUO MA, WEI- DONG MA Apperceiving Computing and.
© 2007 Cisco Systems, Inc. All rights reserved.Cisco Public ITE PC v4.0 Chapter 1 1 Troubleshooting Your Network Networking for Home and Small Businesses.
Computer Security: Principles and Practice First Edition by William Stallings and Lawrie Brown Lecture slides by Lawrie Brown Chapter 8 – Denial of Service.
 Collection of connected programs communicating with similar programs to perform tasks  Legal  IRC bots to moderate/administer channels  Origin of.
Chapter Overview Network Communications.
Implementing IP Addressing Services Accessing the WAN – Chapter 7.
Networking Functions of windows NT Sever
1 Chapter 12: VPN Connectivity in Remote Access Designs Designs That Include VPN Remote Access Essential VPN Remote Access Design Concepts Data Protection.
 An Internet Protocol address (IP address) is a numerical label assigned to each device (e.g., computer, printer) participating in a computer network.
C3 confidentiality classificationIntegrated M2M Terminals Introduction Vodafone MachineLink 3G v1.0 1 Vodafone MachineLink 3G VPN functionality Feature.
Network and Perimeter Security Paula Kiernan Senior Consultant Ward Solutions.
Management for IP-based Applications Mike Fisher BTexaCT Research
Probabilistic Graphical Models for Semi-Supervised Traffic Classification Rotsos Charalampos, Jurgen Van Gael, Andrew W. Moore, Zoubin Ghahramani Computer.
Vytautas Valancius, Nick Feamster, Akihiro Nakao, and Jennifer Rexford.
1 Chapter 3: Multiprotocol Network Design Designs That Include Multiple Protocols IPX Design Concepts AppleTalk Design Concepts SNA Design Concepts.
CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN.
Prepared by: Azara Prakash L.. Contents:-  Data Transmission  Introduction  Socket Description  Data Flow Diagram  Module Design Specification.
Module 1: Configuring Routing by Using Routing and Remote Access.
Chapter 13 The Internet.
Module 10: Providing Secure Access to Remote Offices.
Allocating IP Addressing by Using Dynamic Host Configuration Protocol.
© 2005 Cisco Systems, Inc. All rights reserved. BGP v3.2—3-1 Route Selection Using Policy Controls Using Multihomed BGP Networks.
Also known as hardware/physi cal address Customer Computer (Client) Internet Service Provider (ISP) MAC Address Each Computer has: Given by NIC card.
11 MAINTAINING A NETWORK INFRASTRUCTURE Chapter 9.
Unit 7: DHCP, APIPA and NTP. Static versus dynamic IP addressing Dynamic IP addresses can change each time you connect to the Internet, while static IP.
Chapters 4 & 5 Addressing Will go over Exam 1
Lec 2: Protocols.
Working at a Small-to-Medium Business or ISP – Chapter 7
Working at a Small-to-Medium Business or ISP – Chapter 7
Transport Protocols Relates to Lab 5. An overview of the transport protocols of the TCP/IP protocol suite. Also, a short discussion of UDP.
Working at a Small-to-Medium Business or ISP – Chapter 7
Chapter 8: Monitoring the Network
Chapter 4: Protecting the Organization
Allocating IP Addressing by Using Dynamic Host Configuration Protocol
Chapters 4 & 5 Addressing Will go over Exam 1
Chapters 4 & 5 Addressing Will go over Exam 1
Client/Server and Peer to Peer
DHCP: Dynamic Host Configuration Protocol
Computer Networks Protocols
Chapters 4 & 5 Addressing Will go over Exam 1
When Machine Learning Meets Security – Secure ML or Use ML to Secure sth.? ECE 693.
Presentation transcript:

Application Identification in Information-poor Environments Charalampos (Haris) Rotsos Computer Laboratory University of Cambridge

Overview Application Identification allows: New Services (QoS/QoE) Administration (SLA) Understanding But it is difficult in a large network because: VPN / Multihoming where can I monitor your data? Data (2+TB/day/University) Sophisticated Users and Complex Networks Encrypted Applications & Overlay Networks Internet

What is the problem How can I Identify the application class from a flow of packets? Can I do this with sampled and summarised flow records(Netflow)? – Available in most routers – ISPs collect this as standard and often have been for many years – 25Gb per day for a 1 st layer ISP (x000’s of routers)

Current technologies Can we fuse these different approaches to achieve better performance by reducing the effect of the disadvantages and keeping the advantages? IDS / Anomaly detection fast depends on protocol implementations specifications, task specific Deep packet inspectionaccurate results Full payload, fails on encryption & protocol changes Statistical analysis Flow granularity, can run online on fast link Requires diverse ground truth data for training Connection Pattern / BLINC low information requirement host granularity, fails to adjust on small protocol changes, complex design

First Approach Using ground truth flow records and machine learning discover patterns from : Flow statistics Connection Pattern Host behavior (roles) G.T. DATA FLOW RECORDS

First Problems Netflow records have 20 fields. Some of them have no value for the identification. Flow records are unclear about client - server role and simplex Hints: Extract more information from the context of the network. Infer extra fields by analyzing ground truth data. What extra statistics can make a difference?

Time and space variance Space and time problems An example of temporal decay in accuracy A model with 92% accuracy decays to 62-81% accuracy 18 months later A naïve example of spatial decay A model with near 100% accuracy for one site might achieve % Long-term fragility comes from changes in IP addresses coding as AS numbers and subnets help a little (but not much)

More Issues Netflow data tend to be able to describe the situation for short time While many servers are stabile for long periods, the heavy-tail is not... (p2p, keyloggers, botnets). Solutions: Mix in prior knowledge; diverse datasets Capture behavior with better Mach.-Learn. Semi supervised learning to automatic- update

Behavioural models What is important for a behavioural model? Can we describe it in a compact way? Difficult to build automatically

Summary NetFlow (flow summary) records are a rich source of data, fused with other network data we can build a useful Application Identification System Machine-learning works – at least in the short-term Stabile/useful models need continuous update Behavioural model hold promise too… THANK YOU

More Issues Capitalizing on the short-term/high-accuracy of NetFlow type data Servers can be alive for (very) long periods of time, but not for the applications you WANT to detect (p2p, evil) semi-supervised learning can boost knowledge Behavioral Models Challenge to build automatically