Carnegie Mellon University Software Engineering Institute Pyrite or gold? It takes more than a pick and shovel SEI/CERT -CyLab Carnegie Mellon University 20 August 2004 John McHugh, and a cast of thousands

Carnegie Mellon University Software Engineering Institute Pyrite or Gold?

Carnegie Mellon University Software Engineering Institute Failed promises Data mining and machine learning techniques have been proposed as a mechanism for detecting malicious activity by examining a number of data streams representing computer and network activity. Although some encouraging results have been obtained, most systems do not deliver in the field what they promised in the lab.

Carnegie Mellon University Software Engineering Institute Why is this? There are a number of reasons for this, but the most likely one is the failure of the developers of such systems to understand what the systems have learned and to relate it to the activity they are seeking to detect. –Put simply, there are too many serendipitous detections, or –The distinguishing behavior is insufficient to establish either necessary or sufficient conditions for maliciousness.

Carnegie Mellon University Software Engineering Institute Is this all? Probably –At this point, you should know what you need to do to fix the problem and I could go home, but, to drive home the point, I’m going to give some examples that may help to illustrate the problem.

Carnegie Mellon University Software Engineering Institute If a tree falls in the forest … After some years of looking at the problem of relating abnormal behavior to malicious behavior, I am starting to realize that this is probably not fruitful. –Are there any necessarily normal behaviors? –Are there any behaviors that are necessarily both abnormal and malicious? –Are there any descriptors that are sufficient to identify either?

Carnegie Mellon University Software Engineering Institute Normal Program Behavior A number of suggestions have been made for defining normal program behavior. –Execution traces under normal conditions –Execution traces by symbolic execution –Formal specifications –Domain and type enforcement We will consider the first two further

Carnegie Mellon University Software Engineering Institute Observed Execution Traces If we run a program long enough, in a “typical” environment, we should collect a definitive set of “normal” traces. Under some measure of similarity, abnormal traces may indicate intrusive activity. The work begun by Forrest is based on this approach.

Carnegie Mellon University Software Engineering Institute Possible Execution Traces In theory, it is possible to determine all possible execution traces for most useful programs. This could substitute for observation, but what if the traces admit malicious behavior? –We will see an example later. A trace ending with “exec” may or may not be normal.

Carnegie Mellon University Software Engineering Institute Problems System call traces, per se, are not a good basis for detection. STIDE represents normal by a set of fixed-length subtraces. –This is blind to minimal foreign subtraces longer than the length chosen. –There are ways to modify attacks so as to be missed (Tan, Maxion, Wagner).
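To make the windowing concrete, here is a minimal sketch of this fixed-length-window (stide-style) scheme; the traces and the window length k below are illustrative toy data, not the evaluation traces.

```python
# Minimal sketch of the stide idea: model "normal" as the set of all
# length-k sliding windows over training traces; at test time, flag any
# window not seen in training. Trace contents and k are illustrative.

def windows(trace, k):
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]

def train_stide(normal_traces, k):
    normal = set()
    for trace in normal_traces:
        normal.update(windows(trace, k))
    return normal

def anomalous_windows(test_trace, normal, k):
    return [w for w in windows(test_trace, k) if w not in normal]

# Illustrative use: a window that is too short relative to the minimal
# foreign subtrace of an attack will miss that attack entirely.
normal_traces = [["open", "read", "write", "close"],
                 ["open", "read", "read", "write", "close"]]
model = train_stide(normal_traces, k=2)
print(anomalous_windows(["open", "read", "close"], model, k=2))
```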

Carnegie Mellon University Software Engineering Institute Example #1 -- Hiding in Normal: Modifying the restore attack The restore attack manifests in the following sequence: write, read, write, munmap, exit. The read, write is due to the fact that the program used by restore (created by the attacker) failed to connect to the remote computer, causing restore to print an error and exit. Using the normal data as training data and this attack as test data, we find the following two minimal foreign sequences: write, read, write and write, munmap. Since this attack contains an MFS of size 2, stide will detect it with a window size of 2 or greater. We now need to find new “stide-friendly” means to achieve the same goal.
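The search for minimal foreign sequences can itself be sketched in a few lines. The training trace below is a toy stand-in chosen so that the output happens to match the two sequences named above; it is not the actual restore training data.

```python
# Sketch of finding minimal foreign sequences (MFS): the shortest
# contiguous subsequences of a test trace that never appear in the
# training data. Traces here are abbreviated and illustrative.

def contains(trace, seq):
    n = len(seq)
    return any(tuple(trace[i:i + n]) == seq for i in range(len(trace) - n + 1))

def minimal_foreign_sequences(test_trace, normal_traces):
    found = []
    for length in range(1, len(test_trace) + 1):
        for i in range(len(test_trace) - length + 1):
            seq = tuple(test_trace[i:i + length])
            if any(contains(t, seq) for t in normal_traces):
                continue
            # keep only sequences with no shorter foreign subsequence inside
            if not any(contains(list(seq), f) for f in found):
                found.append(seq)
    return found

normal = [["write", "read", "read", "write", "read", "munmap", "exit"]]  # toy stand-in
attack = ["write", "read", "write", "munmap", "exit"]
print(minimal_foreign_sequences(attack, normal))
# -> [('write', 'munmap'), ('write', 'read', 'write')]
```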

Carnegie Mellon University Software Engineering Institute Example #1 (cont) “Stide-friendly” Method What if the attacker's program did not fail to connect to a remote computer, but performed the expected operations of the program normally invoked by restore? We craft such a program to mimic the normal behavior of the program normally invoked by restore (e.g., the secure shell) while still giving the attacker root privileges. The attacker can modify the exploit to make the system program restore run our “normal-mimicking” program instead. Now rather than printing an error and exiting, restore executes successfully, as shown in the manifestation of this attack: write, read, read, read, ... This manifestation contains no minimal foreign sequences, yet it still bestows root privileges upon the attacker.

Carnegie Mellon University Software Engineering Institute Network data NIDS look at various representations of network data. –Whole packets –Packet headers –Network flows Ground truth (labeling) is difficult for real data; artificial data presents other problems. Let's look at the Lincoln data.

Carnegie Mellon University Software Engineering Institute Context and Background (with a hint of a research agenda) In the summer of 1999, shortly after I joined CERT, I became involved in the production of a report on the state of practice in intrusion detection. This led to an investigation of efforts by MIT’s Lincoln Laboratory to evaluate IDSs developed under DARPA’s research programs. Critiques of the 1998 Lincoln work were presented and published in several forums. The conclusion of these efforts is that evaluating IDSs is very difficult under the best of circumstances. The reasons range from fundamental lack of theory to complex logistics.

Carnegie Mellon University Software Engineering Institute Why 1998? At the time this effort began, the 1998 evaluation was complete and 1999 was about to begin. The public record associated with the 1998 effort indicated a number of problems and left many questions unanswered. It appeared likely that the 1999 evaluation would not address many of these issues, so many of the questions raised with respect to the 1998 evaluation would remain relevant. A recent analysis of the 1999 data shows it has similar problems.

Carnegie Mellon University Software Engineering Institute Data Characterization Artificial data was used for the evaluation. –Characterization of real environment incomplete, sample size unknown –The data is said to be based on typical traffic seen at Air Force bases. The sample may be as small as 1. –The abstractions used to define “typical” are not known except for word frequencies in traffic.

Carnegie Mellon University Software Engineering Institute Data Characterization Artificial data was used for the evaluation. –The criteria for background data generation are not completely known - No pathologies? –There is a lot of “crud” on the wire. Some of it looks intrusive, but apparently isn’t. This is not seen in the test data. –As far as we know, there is no probe data in the background data, but some things that look like probing are normal and others are not the result of malicious intent.

Carnegie Mellon University Software Engineering Institute Data Characterization Artificial data was used for the evaluation. –The test data set has not been validated. No pilot studies or data evaluations were performed. Investigators complained of having to debug test data while evaluating their systems. False alarm characteristics are critical to evaluation, but are not known to be realistic in the test data.

Carnegie Mellon University Software Engineering Institute Data Characterization Artificial data was used for the evaluation. –Data rates appear low: 10-50 Kb/sec at the border. Contrast with university sites of similar size shows a 10X or more difference. –The attack mix is unrealistic; the attack rate may be also. Probes play a minor role, the variety of attacks is limited (more in ’99), and attacks per day may be high.

Carnegie Mellon University Software Engineering Institute Other data issues Comparison of the Lincoln data with other similar sources produces interesting results. BSM data shows very different conditional entropy from real data observed by Forrest et al. (Lee, S&P 2001). Clustering attempts on tcpdump data produce “linear” clusters indicating low variability (Taylor, NSPW 2001). These are suspicious. Let's look in detail at the 1999 network data!
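For reference, a small sketch of the conditional-entropy measure alluded to above: the entropy of the next system call given a length-k prefix of the trace. The trace and k below are illustrative, not the BSM data.

```python
# Sketch of a conditional-entropy measure over audit data: H(X | Y),
# where Y is a length-k prefix of system calls and X is the call that
# follows it. Low values mean the next call is highly predictable.
import math
from collections import Counter

def conditional_entropy(trace, k):
    joint = Counter()   # counts of (prefix, next-call) pairs
    prefix = Counter()  # counts of prefixes alone
    for i in range(len(trace) - k):
        y = tuple(trace[i:i + k])
        x = trace[i + k]
        joint[(y, x)] += 1
        prefix[y] += 1
    total = sum(joint.values())
    h = 0.0
    for (y, x), c in joint.items():
        p_joint = c / total          # p(y, x)
        p_cond = c / prefix[y]       # p(x | y)
        h -= p_joint * math.log2(p_cond)
    return h

# Illustrative trace: mostly one repeated pattern with a little variation.
trace = ["open", "read", "write", "close"] * 50 + ["open", "read", "read", "close"] * 5
print(conditional_entropy(trace, k=2))
```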

Carnegie Mellon University Software Engineering Institute 1999 IDEVAL (testbed diagram): Solaris, SunOS, Linux, and NT victim hosts behind a router attached to a simulated Internet; inside and outside sniffers; 201 attacks; BSM audit logs, directory and file system dumps. Thanks to Mahoney and Chan (RAID 2003)

Carnegie Mellon University Software Engineering Institute 1999 IDEVAL Results Top 4 of 18 systems at 100 false alarms

System      Attacks detected / in spec
Expert 1    85/169 (50%)
Expert 2    81/173 (47%)
Dmine       41/102 (40%)
Forensics   15/27 (55%)

Thanks to Mahoney and Chan (RAID 2003)

Carnegie Mellon University Software Engineering Institute Problem Statement Does IDEVAL have simulation artifacts? If so, can we “fix” IDEVAL? Do simulation artifacts affect the evaluation of anomaly detection algorithms? Thanks to Mahoney and Chan (RAID 2003)

Carnegie Mellon University Software Engineering Institute Simulation Artifacts? Comparing two data sets: –IDEVAL: Week 3 –FIT: 623 hours of traffic from a university departmental server Look for features with significant differences Thanks to Mahoney and Chan (RAID 2003)

Carnegie Mellon University Software Engineering Institute # of Unique Values & % of Traffic

Inbound client packets   IDEVAL   FIT
Client IP addresses      29       24,924
HTTP user agents         5        807
SSH client versions      1        32
TCP SYN options          1        103
TTL values               9        177
Malformed SMTP           None     0.1%
TCP checksum errors      None     0.02%
IP fragmentation         None     0.45%

Thanks to Mahoney and Chan (RAID 2003)
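A sketch of how per-feature counts like these could be tabulated, assuming the inbound client SYN packets have already been parsed into simple records; the feature names and sample records below are invented for illustration, and pcap reading and filtering are omitted.

```python
# Count distinct values per feature across a set of parsed packets.
# Each packet is a dict of already-extracted header/application fields.
from collections import defaultdict

FEATURES = ["src_ip", "ttl", "tcp_options", "http_user_agent"]  # illustrative names

def unique_value_counts(packets):
    seen = defaultdict(set)
    for pkt in packets:
        for feat in FEATURES:
            if feat in pkt and pkt[feat] is not None:
                seen[feat].add(pkt[feat])
    return {feat: len(seen[feat]) for feat in FEATURES}

# Made-up records standing in for the two data sets.
ideval = [{"src_ip": "10.0.0.1", "ttl": 254, "tcp_options": ()}]
fit    = [{"src_ip": "192.0.2.10", "ttl": 117, "tcp_options": ("mss",)},
          {"src_ip": "192.0.2.11", "ttl": 49,  "tcp_options": ("mss", "sackOK")}]
print(unique_value_counts(ideval))
print(unique_value_counts(fit))
```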

Carnegie Mellon University Software Engineering Institute Growth Rate in Feature Values (figure: number of values observed vs. time, for the IDEVAL and FIT traffic) Thanks to Mahoney and Chan (RAID 2003)

Carnegie Mellon University Software Engineering Institute Conditions for Simulation Artifacts 1. Are attributes easier to model in simulation (fewer values, distribution fixed over time)? Yes (to be shown next). 2. Do simulated attacks have idiosyncratic differences in easily modeled attributes? Not examined here. Thanks to Mahoney and Chan (RAID 2003)

Carnegie Mellon University Software Engineering Institute Exploiting Simulation Artifacts SAD – Simple Anomaly Detector Examines only one byte of each inbound TCP SYN packet (e.g., the TTL field) Training: record which of the 256 possible values occur at least once Testing: any value never seen in training signals an attack (maximum 1 alarm per minute) Thanks to Mahoney and Chan (RAID 2003)
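A minimal sketch of SAD as described on this slide; packet capture and byte extraction are assumed to happen elsewhere, so each packet is reduced to a (timestamp, byte value) pair, and the training and test values are invented for illustration.

```python
# SAD sketch: remember which values of one monitored byte appear in
# training; in testing, raise at most one alarm per minute whenever a
# value never seen in training shows up.

def train_sad(training_packets):
    return {byte for _, byte in training_packets}

def test_sad(test_packets, seen, min_gap=60.0):
    alarms, last_alarm = [], float("-inf")
    for ts, byte in test_packets:
        if byte not in seen and ts - last_alarm >= min_gap:
            alarms.append((ts, byte))
            last_alarm = ts
    return alarms

# Illustrative use with the TTL field as the monitored byte.
train = [(t, 254) for t in range(1000)] + [(t, 127) for t in range(1000)]
test  = [(2000.0, 254), (2001.0, 253), (2002.0, 126), (2100.0, 126)]
print(test_sad(test, train_sad(train)))   # -> [(2001.0, 253), (2100.0, 126)]
```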

Carnegie Mellon University Software Engineering Institute SAD IDEVAL Results Train on inside sniffer week 3 (no attacks). Test on weeks 4-5 (177 in-spec attacks). SAD is competitive with top 1999 results.

Packet Byte Examined     Attacks Detected   False Alarms
IP source third byte     79/177 (45%)       43
IP source fourth byte    71                 16
TTL                      24                 4
TCP header size          15                 2

Thanks to Mahoney and Chan (RAID 2003)

Carnegie Mellon University Software Engineering Institute Suspicious Detections Application-level attacks detected by low-level TCP anomalies (options, window size, header size) Detections by anomalous TTL (126 or 253 in hostile traffic, 127 or 254 in normal traffic) Thanks to Mahoney and Chan (RAID 2003)

Carnegie Mellon University Software Engineering Institute Mahoney & Chan Mix Mahoney and Chan mixed the IDEVAL 99 data with FIT data, taking care to modify the mix (and IDS rules) to avoid developing independent models for each component. They then tested a number of anomaly detection algorithms on the mixed data.
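A sketch of only the basic mechanics of such a mix: shift one stream onto the other's clock and interleave by timestamp. The additional adjustments Mahoney and Chan describe, which keep a detector from simply learning two independent models (one per source), are not reproduced here, and the records are illustrative.

```python
# Interleave two time-stamped traces after aligning their start times.
import heapq

def mix_traces(trace_a, trace_b):
    """Each trace is a list of (timestamp, record) pairs sorted by time."""
    if not trace_a or not trace_b:
        return trace_a + trace_b
    offset = trace_a[0][0] - trace_b[0][0]            # align the start times
    shifted_b = [(ts + offset, rec) for ts, rec in trace_b]
    return list(heapq.merge(trace_a, shifted_b, key=lambda p: p[0]))

ideval = [(0.0, "ideval pkt 1"), (5.0, "ideval pkt 2")]       # illustrative records
fit    = [(1000.0, "fit pkt 1"), (1002.5, "fit pkt 2")]
print(mix_traces(ideval, fit))
```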

Carnegie Mellon University Software Engineering Institute Mixed Traffic: Fewer Detections, but More Are Legitimate (detections out of 177 at 100 false alarms) Thanks to Mahoney and Chan (RAID 2003)

Carnegie Mellon University Software Engineering Institute Effect on IDEVAL? If the results shown on the previous slides hold up for the IDEVAL 99 systems, the results of the DARPA program would be only about half as good as previously reported. It is known that a number of the systems evaluated during the 99 trials had difficulties when field deployments were attempted.

Carnegie Mellon University Software Engineering Institute Real data is complex We are working with data from a customer network to characterize complexity and normal behaviors. The following are a few illustrations:

Carnegie Mellon University Software Engineering Institute One Day of Inside to Outside

Carnegie Mellon University Software Engineering Institute The Week’s Out/In Surface

Carnegie Mellon University Software Engineering Institute Workstation?

Carnegie Mellon University Software Engineering Institute Mail Server?

Carnegie Mellon University Software Engineering Institute Web Server

Carnegie Mellon University Software Engineering Institute Scanner

Carnegie Mellon University Software Engineering Institute What does it mean? There is a forensic component to anomaly detection when it is used for detecting intrusions. In evaluating an anomaly detector used this way, it is incumbent on the evaluator to show that the features on which the detection is based are not serendipitous. –In addition to the above, I have reviewed numerous papers based on irrelevant detections.

Carnegie Mellon University Software Engineering Institute Consequences Artificial data such as the Lincoln data is useful, but: –Don't take the results too seriously. –Good results on artificial data should encourage you to see if the results hold up in the wild (if you can't do this, was the idea really good?) and should discourage you from seeking to publish, at least in the intrusion detection literature.

Carnegie Mellon University Software Engineering Institute Conclusions Anomalies are not necessarily benign or malicious; they are just different or rare. Just because an intrusion is associated with an anomalous feature does not mean that it has to be that way. If you don't understand the forensics of intrusions, you are likely to be part of the problem, not part of the solution.

Carnegie Mellon University Software Engineering Institute Pyrite or Gold?