Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory. By John McHugh. Presented by Hongyu Gao, Feb. 5, 2009
Outline: Lincoln Lab’s evaluation in 1998; critique of data generation; critique of the taxonomy; critique of the evaluation process; brief discussion of the 1999 evaluation; conclusion
The 1998 evaluation The most comprehensive evaluation of research on intrusion detection systems that has been performed to date
The 1998 evaluation cont’d Objective: “To provide unbiased measurement of current performance levels.” “To provide a common shared corpus of experimental data that is available to a wide range of researchers”
The 1998 evaluation, cont’d Simulated a typical air force base network
The 1998 evaluation, cont’d Collected synthetic traffic data
The 1998 evaluation, cont’d: Researchers tested their systems using the traffic. The Receiver Operating Characteristic (ROC) curve was used to present the results.
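As a minimal sketch of how one ROC point is obtained (the session counts below are invented for illustration, not taken from the evaluation):

```python
# Hypothetical session counts for illustration; an IDS is scored by
# sweeping an alert threshold and plotting one ROC point per setting.
def roc_point(detected_attacks, false_alarms, n_attacks, n_normal):
    """Return (false-positive rate, detection rate) for one threshold."""
    return (false_alarms / n_normal, detected_attacks / n_attacks)

# e.g. 80 of 100 attack sessions detected, 50 of 10,000 normal sessions flagged
fpr, tpr = roc_point(80, 50, 100, 10_000)
print(fpr, tpr)  # 0.005 0.8
```

Sweeping the threshold traces out the full curve; the critique that follows concerns both what goes into these counts and how the axes are defined.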
1. Critique of data generation: Both background (normal) and attack data are synthesized, said to represent traffic to and from a typical air force base. Such synthesized data is required to reflect system performance in realistic scenarios.
Critique of background data. Counter point 1: real traffic is not well-behaved. E.g., spontaneous packet storms that are indistinguishable from malicious flooding attempts are not considered in the background traffic.
Critique of background data, cont’d. Counter point 2: low average data rate.
Critique of background data, cont’d. Possible negative consequences: the system may produce far more false positives in realistic scenarios; the system may drop packets in realistic scenarios.
Critique of attack data. The distribution of attacks is not realistic: the number of attacks in each category (U2R: 114, R2L: 34, DoS: 99, Probing: 64) is of the same order.
Critique of attack data, cont’d. Possible negative consequence: the aggregate detection rate does not reflect the detection rate on real traffic.
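This concern can be made concrete with a small sketch. The per-category detection rates and the "realistic" attack mix below are invented purely for illustration:

```python
# Per-category detection rates (invented numbers for illustration).
rates = {"DoS": 0.9, "Probing": 0.8, "R2L": 0.3, "U2R": 0.2}

def aggregate_rate(mix, rates):
    """Overall detection rate when attacks arrive according to `mix`."""
    total = sum(mix.values())
    return sum(count / total * rates[cat] for cat, count in mix.items())

uniform_mix = {"DoS": 1, "Probing": 1, "R2L": 1, "U2R": 1}   # roughly the evaluation's mix
skewed_mix = {"DoS": 70, "Probing": 25, "R2L": 4, "U2R": 1}  # hypothetical real traffic

print(aggregate_rate(uniform_mix, rates))  # ≈ 0.55
print(aggregate_rate(skewed_mix, rates))   # ≈ 0.844, dominated by DoS
```

The per-category rates are identical in both runs, yet the aggregate number differs substantially: an aggregate score measured under the evaluation's mix says little about performance under a different mix.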
Critique of the simulated AFB network: not likely to be realistic. 4 real machines, 3 fixed attack targets, flat architecture. Possible negative consequences: an IDS can be tuned to look only at traffic targeting certain hosts; the flat architecture precludes the execution of “smurf” or other ICMP echo attacks.
2. Critique of the taxonomy. It is based on the attacker’s point of view: denial of service, remote to local, user to root, probing. This is not useful for describing what an IDS might see.
Critique of the taxonomy, cont’d. Alternative taxonomies: classify by protocol layer; classify by whether a completed protocol handshake is necessary; classify by severity of attack; many others…
3. Critique of the evaluation: the unit of evaluation. The session is used, but some traffic (e.g., messages originating from Ethernet hubs) is not part of any session. Is the “session” an appropriate unit?
3. Critique of the evaluation: scoring and ROC. What denominator should be used when computing the false-alarm rate?
Critique of the evaluation, cont’d. A non-standard variation of ROC is used, substituting false alarms per day for the x-axis. Possible problem: the number of false alarms per unit time may increase significantly as the data rate increases. Suggested alternatives: use the total number of alerts (both TP and FP) as the denominator, or use the standard ROC.
Evaluation of Snort
Evaluation of Snort, cont’d: poor performance on DoS and Probing; good performance on R2L and U2R. Conclusion on Snort: the results are not sufficient to draw any conclusion.
Critique of the evaluation, cont’d: the false alarm rate, a crucial concern. The designated maximum value (0.1%) is inconsistent with the maximum operator load set by Lincoln Lab (100 alerts/day).
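The inconsistency is simple arithmetic; the traffic volumes below are assumptions for illustration, not figures from the evaluation:

```python
# A 0.1% per-session false-alarm rate stays within a 100-alert/day
# operator budget only at or below 100,000 sessions/day
# (the session volumes here are illustrative assumptions).
def false_alarms_per_day(sessions_per_day, fp_rate=0.001):
    return sessions_per_day * fp_rate

print(false_alarms_per_day(100_000))    # 100.0: exactly at the operator budget
print(false_alarms_per_day(1_000_000))  # 1000.0: ten times over it
```

So the two limits coincide only at one specific traffic volume; at any realistic, higher volume a 0.1% false-alarm rate overwhelms a 100-alert/day operator.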
Critique of the evaluation, cont’d. Does the evaluation result really mean something? The ROC curve reflects only the ability to detect attacks against normal traffic. What does a good IDS consist of? Algorithm, reliability, good signatures, …
Brief discussion of the 1999 evaluation: it has some superficial improvements. Additional hosts and host types are added; new attacks are added. None of these addresses the flaws listed above.
Brief discussion of the 1999 evaluation, cont’d: the security policy is not clear. What is an attack and what is not? E.g., scans and probes.
Conclusion: The Lincoln Lab evaluation is a major and impressive effort. This paper criticizes the evaluation from several aspects.
Follow-up work: DETER, a testbed for network security technology. A public facility for medium-scale repeatable experiments in computer security, located at USC ISI and UC Berkeley, with 300 PC systems running Utah’s Emulab software. Experimenters can access DETER remotely to develop, configure, and manipulate collections of nodes and links with arbitrary network topologies. The current problem is that there is no realistic attack module or background-noise generator plugin for the framework; attack distribution remains a problem. PREDICT: a huge trace repository. It is not public, and there are several legal issues in working with it.
Follow-up work: KDD Cup. Its goal is to provide data sets from real-world problems to demonstrate the applicability of different knowledge discovery and machine learning techniques. The 1999 KDD intrusion detection contest uses a labelled version of the 1998 DARPA dataset, annotated with connection features. There are several problems with the KDD Cup data: recently, people have found that average TCP packet size is among the features best correlated with attacks, which clearly points out the dataset’s inefficacy.
Discussion: Can the aforementioned problems be addressed? Dataset, taxonomy, unit for analysis, approach to comparing IDSes, …
The End Thank you