An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection Matthew V. Mahoney and Philip K. Chan

Data Mining for Computer Security Workshop at ICDM03, Melbourne, FL, Nov. 19, 2003

Outline
- DARPA/Lincoln Laboratory IDS evaluation (IDEVAL)
- Analyze IDEVAL with respect to network anomaly detection
- Propose a remedy for identified simulation artifacts
- Measure effects on anomaly detection algorithms

1999 IDEVAL (testbed diagram)
- Victim hosts: Solaris, SunOS, Linux, NT, behind a router
- Simulated Internet, with inside and outside sniffers
- 201 attacks
- Host data: BSM, audit logs, directory and file system dumps

Importance of 1999 IDEVAL
- Comprehensive
  – signature or anomaly
  – host or network
- Widely used (KDD Cup, etc.)
- Produced at great effort
- No comparable benchmarks are available
- Scientific investigation
  – reproducing results
  – comparing methods

1999 IDEVAL Results
Top 4 of 18 systems at 100 false alarms:

  System      Attacks detected / in spec
  Expert 1    85/169 (50%)
  Expert 2    81/173 (47%)
  Dmine       41/102 (40%)
  Forensics   15/27  (55%)

Partially Simulated Network Traffic
- tcpdump records sniffed traffic on a testbed network
- Attacks are "real" (mostly from publicly available scripts and programs)
- Normal user activity is generated from models intended to resemble that of military users

Related Work
- IDEVAL critique (McHugh, 2000): mostly based on the methodology of data generation and evaluation
  – did not include a "low-level" analysis of the background traffic
- Anomaly detection algorithms
  – network based: SPADE, ADAM, LERAD
  – host based: t-stide, instance-based

Problem Statement
- Does IDEVAL have simulation artifacts?
- If so, can we "fix" IDEVAL?
- Do simulation artifacts affect the evaluation of anomaly detection algorithms?

Simulation Artifacts?
Comparing two data sets:
  – IDEVAL: week 3
  – FIT: 623 hours of traffic from a university departmental server
Look for features with significant differences.

Number of Unique Values & % of Traffic (inbound client packets)

  Feature                 IDEVAL   FIT
  Client IP addresses     29       24,924
  HTTP user agents        5        807
  SSH client versions     1        32
  TCP SYN options         1        103
  TTL values              9        177
  Malformed SMTP          none     0.1%
  TCP checksum errors     none     0.02%
  IP fragmentation        none     0.45%
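To illustrate how a comparison like this can be gathered, the sketch below counts unique values of a few inbound-client attributes in a packet trace. It is a hypothetical reconstruction, not the authors' tooling; the capture file name and the SYN-based notion of an "inbound client packet" are assumptions.

```python
# Hypothetical sketch: count unique values of a few client-request attributes in a
# pcap trace. The file name and the SYN-based "client packet" test are assumptions.
from collections import defaultdict
from scapy.all import rdpcap, IP, TCP

unique = defaultdict(set)
for pkt in rdpcap("week3_inside.pcap"):                         # assumed capture file
    if IP in pkt and TCP in pkt and (pkt[TCP].flags & 0x02):    # TCP SYN = client request
        unique["client IP address"].add(pkt[IP].src)
        unique["TTL value"].add(pkt[IP].ttl)
        unique["TCP SYN options"].add(str(pkt[TCP].options))

for feature, values in sorted(unique.items()):
    print(f"{feature}: {len(values)} unique values")
```

Run once over IDEVAL week 3 and once over a real trace, the two outputs give the kind of side-by-side counts shown in the table above.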

Growth Rate in Feature Values (plot)
- Axes: number of values observed vs. time, for IDEVAL and FIT
- The FIT curve keeps growing over time; the IDEVAL curve levels off

Conditions for Simulation Artifacts
1. Are attributes easier to model in simulation (fewer values, distribution fixed over time)? Yes (to be shown next).
2. Do simulated attacks have idiosyncratic differences in easily modeled attributes? Not examined here.

Exploiting Simulation Artifacts
SAD – Simple Anomaly Detector
- Examines only one byte of each inbound TCP SYN packet (e.g. the TTL field)
- Training: record which of the 256 possible values occur at least once
- Testing: any value never seen in training signals an attack (maximum 1 alarm per minute)
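The detector on this slide is simple enough to sketch in a few lines. The code below is a hypothetical reconstruction from the slide's description (one monitored byte, a training set of observed values, at most one alarm per minute); the TTL choice, file names, and alarm format are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of SAD: watch one byte of each inbound TCP SYN packet, learn
# the set of values seen in training, and flag any unseen value in testing,
# rate-limited to one alarm per minute.
from scapy.all import rdpcap, IP, TCP

def ttl_byte(pkt):
    """The monitored byte; the slide uses the IP TTL field as its example."""
    return pkt[IP].ttl

def syn_packets(path):
    for pkt in rdpcap(path):
        if IP in pkt and TCP in pkt and (pkt[TCP].flags & 0x02):
            yield pkt

def train(path, byte_fn=ttl_byte):
    seen = set()
    for pkt in syn_packets(path):
        seen.add(byte_fn(pkt))                   # record every value observed in training
    return seen

def test(path, seen, byte_fn=ttl_byte):
    alarms, last_alarm = [], None
    for pkt in syn_packets(path):
        if byte_fn(pkt) not in seen:
            t = float(pkt.time)
            if last_alarm is None or t - last_alarm >= 60:   # at most 1 alarm per minute
                alarms.append((t, pkt[IP].src, byte_fn(pkt)))
                last_alarm = t
    return alarms

# Example usage (assumed file names):
# seen = train("week3_inside.pcap")
# for t, src, value in test("weeks4_5_inside.pcap", seen):
#     print(t, src, value)
```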

SAD IDEVAL Results
- Train on inside sniffer, week 3 (no attacks)
- Test on weeks 4-5 (177 in-spec attacks)
- SAD is competitive with the top 1999 results

  Packet byte examined     Attacks detected   False alarms
  IP source, third byte    79/177 (45%)       43
  IP source, fourth byte   71                 16
  TTL                      24                 4
  TCP header size          15                 2

Suspicious Detections
- Application-level attacks detected by low-level TCP anomalies (options, window size, header size)
- Detections by anomalous TTL: 126 or 253 in hostile traffic vs. 127 or 254 in normal traffic

Proposed Mitigation
1. Mix real background traffic into IDEVAL
2. Modify the IDS or the data so that the real traffic cannot be modeled independently of the IDEVAL traffic

Mixing Procedure
- Collect real traffic (preferably with similar protocols and traffic rate)
- Adjust timestamps to 1999 (IDEVAL) and interleave packets chronologically
- Map IP addresses of real local hosts to additional hosts on the LAN in IDEVAL (not necessary if higher-order address bytes are not used as attributes)
- Caveat: no internal traffic between the IDEVAL hosts and the real hosts
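A minimal sketch of the mixing step follows, assuming scapy, made-up file names, and an example address mapping; a real run would also need the protocol/rate matching and the caveat noted above.

```python
# Hypothetical sketch of the mixing procedure: shift the real trace's timestamps into
# the IDEVAL time frame, remap real local hosts onto spare IDEVAL LAN addresses, and
# interleave the packets chronologically. File names and addresses are assumptions.
from scapy.all import rdpcap, wrpcap, IP, TCP

ideval = rdpcap("ideval_week3_inside.pcap")      # assumed IDEVAL capture
real   = rdpcap("real_background.pcap")          # assumed real-traffic capture

# 1. Adjust timestamps so the real capture starts when the IDEVAL capture does.
offset = float(ideval[0].time) - float(real[0].time)
for pkt in real:
    pkt.time = float(pkt.time) + offset

# 2. Map real local hosts to additional (unused) hosts on the IDEVAL LAN.
addr_map = {"192.0.2.10": "172.16.114.200"}      # example mapping only
for pkt in real:
    if IP in pkt:
        pkt[IP].src = addr_map.get(pkt[IP].src, pkt[IP].src)
        pkt[IP].dst = addr_map.get(pkt[IP].dst, pkt[IP].dst)
        del pkt[IP].chksum                       # force checksum recomputation on write
        if TCP in pkt:
            del pkt[TCP].chksum

# 3. Interleave the two captures chronologically.
mixed = sorted(list(ideval) + list(real), key=lambda p: float(p.time))
wrpcap("mixed.pcap", mixed)
```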

IDS/Data Modifications
Necessary to prevent independent modeling of IDEVAL:
  – PHAD: no modifications needed
  – ALAD: remove destination IP as a conditional attribute
  – LERAD: verify that rules do not distinguish IDEVAL from FIT
  – NETAD: remove the IDEVAL telnet and FTP rules
  – SPADE: disguise FIT addresses as IDEVAL addresses

Evaluation Procedure
- 5 network anomaly detectors on IDEVAL and on mixed (IDEVAL + FIT) traffic
- Training: week 3
- Testing: weeks 4 & 5 (177 "in-spec" attacks)
- Evaluation criteria:
  – number of detections with at most 10 false alarms per day
  – percentage of "legitimate" detections (anomalies correspond to the nature of the attacks)

Criteria for Legitimate Detection
The anomaly must correspond to the nature of the attack:
  – Source address anomaly: the attack must be on a password-protected service (POP3, IMAP, SSH, etc.)
  – TCP/IP anomalies: the attack must be on the network or the TCP/IP stack (not an application server)
  – U2R and Data attacks: not legitimate
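These criteria amount to a small decision rule. The hypothetical encoding below follows the slide's wording; the category labels, parameter names, and the exact service list are illustrative assumptions.

```python
# Hypothetical encoding of the slide's legitimacy criteria; labels are illustrative.
PASSWORD_PROTECTED = {"pop3", "imap", "ssh"}     # slide says "etc.": list is not exhaustive

def is_legitimate(anomaly_type, attack_category, target="application"):
    """anomaly_type: 'source_address' or 'tcp_ip';
       attack_category: e.g. 'probe', 'dos', 'r2l', 'u2r', 'data';
       target: service name, or 'network'/'stack' for non-application targets."""
    if attack_category in {"u2r", "data"}:       # U2R and Data attacks: not legitimate
        return False
    if anomaly_type == "source_address":         # must hit a password-protected service
        return target in PASSWORD_PROTECTED
    if anomaly_type == "tcp_ip":                 # must attack the network or TCP/IP stack
        return target in {"network", "stack"}
    return False
```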

Mixed Traffic: Fewer Detections, but More are Legitimate
(Chart: detections out of 177 at 100 false alarms, per detector, on IDEVAL vs. mixed traffic)

Concluding Remarks
- Values of some IDEVAL attributes have small ranges and do not continue to grow over time
- Lack of "crud" (malformed or unusual but benign traffic) in IDEVAL
- Artifacts can be masked or removed by mixing in real traffic
- Anomaly detection models built from the mixed data achieved fewer detections, but a higher percentage of legitimate detections

Limitations
- Traffic injection requires careful analysis and possibly IDS modification to prevent independent modeling of the two sources
- Mixed traffic becomes proprietary; evaluations cannot be independently verified
- Protocols have evolved since 1999
- Our results do not apply to signature detection
- Our results may not apply to the remaining IDEVAL data (BSM, logs, file system)

Future Work
- We used one data set of real traffic from a university; next, analyze headers in publicly available data sets
- We analyzed features that can affect the evaluated algorithms; next, more features, for other anomaly detection algorithms

Final Thoughts
- Real data
  – Pros: real behavior in a real environment
  – Cons: cannot be released because of privacy concerns (so results cannot be reproduced or compared)
- Simulated data
  – Pros: can be released as a benchmark
  – Cons: simulating real behavior correctly is very difficult
- Mixed data: a way to bridge the gap

Tough Questions from John & Josh?