An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection Matt Mahoney Feb. 18, 2003.



Is the DARPA/Lincoln Labs IDS Evaluation Realistic? The most widely used intrusion detection evaluation data set; its data was used in the KDD Cup competition, which had 25 participants. Eight organizations submitted 18 systems to the 1999 evaluation. It tests host- and network-based IDSs, and both signature and anomaly detection, with 58 types of attacks (more than any other evaluation) against 4 target operating systems. Training and test data were released after the evaluation to encourage IDS development.

Problems with the LL Evaluation The background network data is synthetic. SAD (Simple Anomaly Detector) detects too many attacks. Compared with real traffic, the range of attribute values is too small and too static (TTL, TCP options, client addresses, ...). Injecting real traffic removes suspect detections from PHAD, ALAD, LERAD, NETAD, and SPADE.

1. Simple Anomaly Detector (SAD) Examines only inbound client TCP SYN packets, and only one byte of each packet. Trains on attack-free data (week 1 or 3). A value never seen in training is an anomaly. If there have been no anomalies in the previous 60 seconds, output an alarm with score 1.
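The detector above is small enough to sketch in full. A minimal sketch in Python (class and method names are my own; the actual SAD operated on sniffed packet traces):

```python
class SAD:
    """Simple Anomaly Detector: watches one byte of each inbound
    client TCP SYN packet and alarms on values never seen in training."""

    def __init__(self, quiet_period=60.0):
        self.seen = set()         # byte values observed in training
        self.quiet_period = quiet_period
        self.last_anomaly = None  # timestamp of the previous anomaly

    def train(self, byte_value):
        """Record a byte value from attack-free training traffic."""
        self.seen.add(byte_value)

    def test(self, byte_value, timestamp):
        """Return True to output an alarm with score 1.

        A novel value is an anomaly; it raises an alarm only if there
        were no anomalies in the previous 60 seconds."""
        if byte_value in self.seen:
            return False
        alarm = (self.last_anomaly is None or
                 timestamp - self.last_anomaly >= self.quiet_period)
        self.last_anomaly = timestamp
        return alarm
```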

DARPA/Lincoln Labs Evaluation Weeks 1 and 3: attack-free training data. Week 2: training data with 43 labeled attacks. Weeks 4 and 5: 201 test attacks. [Testbed diagram: attacks arrive from the Internet through a router; a sniffer captures traffic to the victim hosts: SunOS, Solaris, Linux, NT.]

SAD Evaluation Develop on weeks 1-2 (available in advance of the 1999 evaluation) to find good bytes. Train on week 3 (no attacks). Test on weeks 4-5 inside-sniffer traffic (177 visible attacks). Count detections and false alarms using the 1999 evaluation criteria.

SAD Results Variants (bytes) that do well: source IP address (any of 4 bytes), TTL, TCP options, IP packet size, TCP header size, TCP window size, source and destination ports. Variants that do well on weeks 1-2 (available in advance) usually do well on weeks 3-5 (evaluation). Very low false alarm rates. Most detections are not credible.

SAD vs Evaluation The top system in the 1999 evaluation, Expert 1, detects 85 of 169 visible attacks (50%) at 100 false alarms (10 per day) using a combination of host and network based signature and anomaly detection. SAD detects 79 of 177 visible attacks (45%) with 43 false alarms using the third byte of the source IP address.

1999 IDS Evaluation vs. SAD

SAD Detections by Source Address (that should have been missed)
DOS on public services: apache2, back, crashiis, ls_domain, neptune, warezclient, warezmaster
R2L on public services: guessftp, ncftp, netbus, netcat, phf, ppmacro, sendmail
U2R: anypw, eject, ffbconfig, perl, sechole, sqlattack, xterm, yaga

2. Comparison with Real Traffic Anomaly detection systems flag rare events (e.g. previously unseen addresses or ports). “Allowed” values are learned during training on attack-free traffic. Novel values in background traffic would cause false alarms. Are novel values more common in real traffic?

Measuring the Rate of Novel Values
r = number of values observed in training.
r1 = fraction of values seen exactly once (Good-Turing estimate of the probability that the next value is novel).
rh = fraction of values seen only in the second half of training.
rt = fraction of the training time needed to observe half of all values.
Larger values in real data would suggest a higher false alarm rate.
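These statistics can be computed directly from one attribute's training stream. A sketch, assuming values arrive in time order with a parallel list of timestamps (the function name is my own):

```python
def novelty_stats(values, timestamps):
    """Return (r, r1, rh, rt) for one attribute's training stream."""
    counts = {}      # value -> number of occurrences
    first_seen = {}  # value -> index of first occurrence
    for i, v in enumerate(values):
        counts[v] = counts.get(v, 0) + 1
        first_seen.setdefault(v, i)
    r = len(counts)                                     # distinct values
    r1 = sum(1 for c in counts.values() if c == 1) / r  # seen exactly once
    half = len(values) // 2
    rh = sum(1 for i in first_seen.values() if i >= half) / r  # 2nd half only
    # rt: fraction of training time elapsed when half the values had appeared
    order = sorted(first_seen.values())
    t_half = timestamps[order[(r + 1) // 2 - 1]]
    rt = (t_half - timestamps[0]) / (timestamps[-1] - timestamps[0])
    return r, r1, rh, rt
```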

Network Data for Comparison Simulated data: inside-sniffer traffic from weeks 1 and 3, filtered from 32M packets to 0.6M packets. Real data: collected Oct.-Dec. 2002, filtered from 100M packets to 1.6M. Both sets are filtered and rate-limited to extract the start of inbound client sessions (the NETAD filter, which passes most attacks).

Attributes measured Packet header fields (all filtered packets) for Ethernet, IP, TCP, UDP, ICMP. Inbound TCP SYN packet header fields. HTTP, SMTP, and SSH requests (other application protocols are not present in both sets).

Comparison results Synthetic attributes are too predictable: TTL, TOS, TCP options, TCP window size, HTTP, SMTP command formatting. Too few sources: Client addresses, HTTP user agents, ssh versions. Too “clean”: no checksum errors, fragmentation, garbage data in reserved fields, malformed commands.

TCP SYN Source Address
        Simulated   Real
r1      0%          45%
rh      3%          53%
rt      0.1%        49%
r1 ≈ rh ≈ rt ≈ 50% in the real data is consistent with a Zipf distribution and a constant growth rate of r.

Real Traffic is Less Predictable [Plot: r (number of values) vs. time for synthetic and real traffic; r keeps growing in the real traffic but levels off in the synthetic traffic.]

3. Injecting Real Traffic Mix equal durations of real traffic into weeks 3-5 (both sets filtered, 344 hours each). We expect r ≥ max(rSIM, rREAL), i.e. a realistic false alarm rate. Modify PHAD, ALAD, LERAD, NETAD, and SPADE so they cannot separate the two data sources. Test at 100 false alarms (10 per day) on 3 mixed sets. Compare the fraction of "legitimate" detections on simulated and mixed traffic for the median mixed result.

PHAD Models 34 packet header fields – Ethernet, IP, TCP, UDP, ICMP Global model (no rule antecedents) Only novel values are anomalous Anomaly score = tn/r where –t = time since last anomaly –n = number of training packets –r = number of allowed values No modifications needed
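The tn/r score for a single field can be sketched as follows (a hypothetical class; PHAD itself keeps 34 such models, one per header field):

```python
class PHADField:
    """One PHAD field model: global (no rule antecedents),
    and only never-seen values are anomalous."""

    def __init__(self):
        self.allowed = set()     # values seen in training
        self.n = 0               # number of training packets
        self.last_anomaly = 0.0  # time of this field's last anomaly

    def train(self, value):
        self.n += 1
        self.allowed.add(value)

    def score(self, value, timestamp):
        """Return tn/r for a novel value, 0 for an allowed one."""
        if value in self.allowed:
            return 0.0
        t = timestamp - self.last_anomaly  # time since last anomaly
        self.last_anomaly = timestamp
        return t * self.n / len(self.allowed)
```

Note how the t factor suppresses bursts: a flood of anomalies in quick succession scores little more than the first one.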

ALAD Models inbound TCP client requests – addresses, ports, flags, application keywords. Score = tn/r Conditioned on destination port/address. Modified to remove address conditions and protocols not present in real traffic (telnet, FTP).

LERAD Models inbound client TCP (addresses, ports, flags, 8 words in payload). Learns conditional rules with high n/r. Discards rules that generate false alarms in last 10% of training data. Modified to weight rules by fraction of real traffic. If port = 80 then word1 = GET, POST (n/r = 10000/2)

NETAD Models inbound client request packet bytes – IP, TCP, TCP SYN, HTTP, SMTP, FTP, telnet. Score = tn/r + ti/fi, allowing previously seen values. –ti = time since value i was last seen –fi = frequency of i in training. Modified to remove telnet and FTP.
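The extra ti/fi term is what lets a previously seen but rare value contribute to the score. As a sketch (a plain function with hypothetical parameter names):

```python
def netad_score(t, n, r, t_i, f_i):
    """NETAD-style score for one attribute value.
    t:   time since this attribute's last anomaly
    n:   number of training packets
    r:   number of allowed (training) values
    t_i: time since value i was last seen
    f_i: frequency of value i in training
    The tn/r term scores never-seen values (as in PHAD); t_i/f_i adds
    a score for values that were seen, but rarely and long ago."""
    return t * n / r + t_i / f_i
```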

SPADE (Hoagland) Models inbound TCP SYN. Score = 1/P(src IP, dest IP, dest port). Probability by counting. Always in training mode. Modified by randomly replacing real destination IP with one of 4 simulated targets.
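SPADE's count-based score can be sketched as follows (names are my own; the real SPADE is a Snort preprocessor with several probability modes):

```python
from collections import Counter

class SpadeSketch:
    """Score = 1 / P(src IP, dst IP, dst port), with the joint
    probability estimated by counting; always in training mode."""

    def __init__(self):
        self.counts = Counter()  # (src, dst, port) -> occurrences
        self.total = 0           # packets seen so far

    def score(self, src_ip, dst_ip, dst_port):
        key = (src_ip, dst_ip, dst_port)
        self.counts[key] += 1    # always training: count every packet
        self.total += 1
        # 1/P = total packets / packets with this (src, dst, port) tuple
        return self.total / self.counts[key]
```

Because it never stops training, rare tuples score high at first but decay as they recur.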

Criteria for Legitimate Detection Source address – target server must authenticate source. Destination address/port – attack must use or scan that address/port. Packet header field – attack must write/modify the packet header (probe or DOS). No U2R or Data attacks.

Mixed Traffic: Fewer Detections, but More are Legitimate [Chart: detections out of 177 at 100 false alarms, simulated vs. mixed traffic.]

Conclusions SAD suggests the presence of simulation artifacts and artificially low false alarm rates. The simulated traffic is too clean, too static, and too predictable. Injecting real traffic reduces suspect detections in all 5 systems tested.

Limitations and Future Work Only one real data source tested – may not generalize. Tests on real traffic cannot be replicated due to privacy concerns (root passwords in the data, etc). Each IDS must be analyzed and modified to prevent data separation. Is host data affected (BSM, audit logs)?

Limitations and Future Work Real data may contain unlabeled attacks. We found over 30 suspicious HTTP requests in our data (to a Solaris-based host). IIS exploit with double URL encoding (IDS evasion?):
GET /scripts/..%255c%255c../winnt/system32/cmd.exe?/c+dir
Probe for the Code Red backdoor:
GET /MSADC/root.exe?/c+dir HTTP/1.0

Further Reading An Analysis of the 1999 DARPA/Lincoln Laboratories Evaluation Data for Network Anomaly Detection By Matthew V. Mahoney and Philip K. Chan Dept. of Computer Sciences Technical Report CS