1 An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection Matt Mahoney mmahoney@cs.fit.edu Feb. 18, 2003

2 Is the DARPA/Lincoln Labs IDS Evaluation Realistic? The most widely used intrusion detection evaluation data set. The 1998 data was used in the KDD Cup competition with 25 participants. 8 participating organizations submitted 18 systems to the 1999 evaluation. Tests host- or network-based IDS. Tests signature or anomaly detection. 58 types of attacks (more than any other evaluation). 4 target operating systems. Training and test data released after the evaluation to encourage IDS development.

3 Problems with the LL Evaluation Background network data is synthetic. SAD (Simple Anomaly Detector) detects too many attacks. Comparison with real traffic – the range of attribute values is too small and static (TTL, TCP options, client addresses…). Injecting real traffic removes suspect detections from PHAD, ALAD, LERAD, NETAD, and SPADE.

4 1. Simple Anomaly Detector (SAD) Examines only inbound client TCP SYN packets. Examines only one byte of the packet. Trains on attack-free data (week 1 or 3). A value never seen in training is an anomaly. If there have been no anomalies for 60 seconds, then output an alarm with score 1. Example: train on the byte sequence 001110111; in the test sequence 010203001323011, the 2s and 3s are anomalies, and alarms are limited to one per 60 seconds.
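A minimal sketch of this detector in Python, assuming packets arrive as (timestamp, raw bytes) pairs and that byte_index selects the single byte examined; the representation and function names are assumptions for illustration, not part of the original system:

```python
# Minimal sketch of SAD (Simple Anomaly Detector). Only inbound client
# TCP SYN packets would be passed in; byte_index picks the one byte to model
# (e.g. the third byte of the source IP address).

def train_sad(training_packets, byte_index):
    """Record the set of byte values seen at byte_index in attack-free traffic."""
    return {pkt[byte_index] for pkt in training_packets}

def run_sad(test_packets, seen, byte_index, quiet_period=60.0):
    """Yield (timestamp, score) alarms for values never seen in training,
    suppressing alarms within quiet_period seconds of the previous anomaly."""
    last_anomaly = float("-inf")
    for timestamp, pkt in test_packets:      # (time in seconds, raw bytes)
        value = pkt[byte_index]
        if value not in seen:                # novel value -> anomaly
            if timestamp - last_anomaly >= quiet_period:
                yield timestamp, 1.0         # alarm with score 1
            last_anomaly = timestamp
```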

5 DARPA/Lincoln Labs Evaluation Weeks 1 and 3: attack-free training data. Week 2: training data with 43 labeled attacks. Weeks 4 and 5: 201 test attacks. [Testbed diagram: attacks arrive from the Internet through a router; a sniffer captures traffic to the four target hosts running SunOS, Solaris, Linux, and NT.]

6 SAD Evaluation Develop on weeks 1-2 (available in advance of the 1999 evaluation) to find good bytes. Train on week 3 (no attacks). Test on weeks 4-5 inside sniffer traffic (177 visible attacks). Count detections and false alarms using the 1999 evaluation criteria.

7 SAD Results Variants (bytes) that do well: source IP address (any of 4 bytes), TTL, TCP options, IP packet size, TCP header size, TCP window size, source and destination ports. Variants that do well on weeks 1-2 (available in advance) usually do well on weeks 3-5 (evaluation). Very low false alarm rates. Most detections are not credible.

8 SAD vs. 1999 Evaluation The top system in the 1999 evaluation, Expert 1, detects 85 of 169 visible attacks (50%) at 100 false alarms (10 per day) using a combination of host and network based signature and anomaly detection. SAD detects 79 of 177 visible attacks (45%) with 43 false alarms using the third byte of the source IP address.

9 1999 IDS Evaluation vs. SAD

10 SAD Detections by Source Address (that should have been missed) DOS on public services: apache2, back, crashiis, ls_domain, neptune, warezclient, warezmaster. R2L on public services: guessftp, ncftp, netbus, netcat, phf, ppmacro, sendmail. U2R: anypw, eject, ffbconfig, perl, sechole, sqlattack, xterm, yaga.

11 2. Comparison with Real Traffic Anomaly detection systems flag rare events (e.g. previously unseen addresses or ports). “Allowed” values are learned during training on attack-free traffic. Novel values in background traffic would cause false alarms. Are novel values more common in real traffic?

12 Measuring the Rate of Novel Values r = number of values observed in training. r1 = fraction of values seen exactly once (Good-Turing probability estimate that the next value will be novel). rh = fraction of values seen only in the second half of training. rt = fraction of training time to observe half of all values. Larger values in real data would suggest a higher false alarm rate.
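A sketch of how these statistics could be computed, assuming the attribute's training values are available as a time-ordered list and using position in the list as a proxy for time; the function name and interface are illustrative:

```python
from collections import Counter

def novelty_stats(values):
    """Compute r, r1, rh, rt for a chronologically ordered list of training
    values (e.g. source addresses seen in attack-free traffic)."""
    counts = Counter(values)
    first_seen = {}                              # value -> index of first occurrence
    for i, v in enumerate(values):
        first_seen.setdefault(v, i)

    n = len(values)
    r = len(counts)                              # distinct values in training
    r1 = sum(1 for c in counts.values() if c == 1) / r        # seen exactly once
    rh = sum(1 for i in first_seen.values() if i >= n / 2) / r  # only in 2nd half
    # Approximate fraction of training time needed to observe half of all values,
    # taken as the position where the (r/2)-th distinct value first appears.
    half_index = sorted(first_seen.values())[r // 2]
    rt = half_index / n
    return r, r1, rh, rt
```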

13 Network Data for Comparison Simulated data: inside sniffer traffic from weeks 1 and 3, filtered from 32M packets to 0.6M packets. Real data: collected from www.cs.fit.edu, Oct.-Dec. 2002, filtered from 100M packets to 1.6M. Traffic is filtered and rate limited to extract the start of inbound client sessions (NETAD filter, which passes most attacks).

14 Attributes measured Packet header fields (all filtered packets) for Ethernet, IP, TCP, UDP, ICMP. Inbound TCP SYN packet header fields. HTTP, SMTP, and SSH requests (other application protocols are not present in both sets).

15 Comparison results Synthetic attributes are too predictable: TTL, TOS, TCP options, TCP window size, HTTP and SMTP command formatting. Too few sources: client addresses, HTTP user agents, ssh versions. Too "clean": no checksum errors, fragmentation, garbage data in reserved fields, or malformed commands.

16 TCP SYN Source Address

              Simulated    Real
Packets, n    50650        210297
r             29           24924
r1            0            45%
rh            3%           53%
rt            0.1%         49%

r1 ≈ rh ≈ rt ≈ 50% is consistent with a Zipf distribution and a constant growth rate of r.

17 Real Traffic is Less Predictable [Plot: r (number of values) vs. time; the real traffic curve keeps growing while the synthetic curve levels off.]

18 3. Injecting Real Traffic Mix equal durations of real traffic into weeks 3-5 (both sets filtered, 344 hours each). We expect r ≥ max(rSIM, rREAL) (realistic false alarm rate). Modify PHAD, ALAD, LERAD, NETAD, and SPADE not to separate data. Test at 100 false alarms (10 per day) on 3 mixed sets. Compare the fraction of "legitimate" detections on simulated and mixed traffic for the median mixed result.

19 PHAD Models 34 packet header fields – Ethernet, IP, TCP, UDP, ICMP. Global model (no rule antecedents). Only novel values are anomalous. Anomaly score = tn/r, where t = time since the last anomaly, n = number of training packets, and r = number of allowed values. No modifications needed.
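A sketch of the tn/r scoring for a single header field, under the assumption that field values are hashable and timestamps are in seconds; the class and method names are illustrative, not PHAD's actual interface:

```python
# Sketch of tn/r scoring for one packet-header field.

class FieldModel:
    def __init__(self):
        self.allowed = set()      # values seen in training
        self.n = 0                # number of training packets
        self.last_anomaly = 0.0   # time of the last anomaly for this field

    def train(self, value):
        self.allowed.add(value)
        self.n += 1

    def score(self, value, now):
        """Return tn/r if the value is novel, else 0."""
        if value in self.allowed:
            return 0.0
        t = now - self.last_anomaly       # time since last anomaly
        self.last_anomaly = now
        r = max(len(self.allowed), 1)     # number of allowed values
        return t * self.n / r
```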

20 ALAD Models inbound TCP client requests – addresses, ports, flags, application keywords. Score = tn/r Conditioned on destination port/address. Modified to remove address conditions and protocols not present in real traffic (telnet, FTP).

21 LERAD Models inbound client TCP (addresses, ports, flags, 8 words in payload). Learns conditional rules with high n/r. Discards rules that generate false alarms in the last 10% of training data. Modified to weight rules by the fraction of real traffic. Example rule: if port = 80 then word1 ∈ {GET, POST} (n/r = 10000/2).
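A sketch of how one such conditional rule might be checked and scored, reusing the tn/r form from the preceding slides; the dictionary record representation and the Rule class are assumptions for illustration:

```python
# Sketch of evaluating one LERAD-style conditional rule. A record is assumed
# to be a dict of attribute values, e.g. {"port": 80, "word1": "GET"}.

class Rule:
    def __init__(self, condition, attribute, allowed, n):
        self.condition = condition      # e.g. {"port": 80}
        self.attribute = attribute      # e.g. "word1"
        self.allowed = set(allowed)     # e.g. {"GET", "POST"}
        self.n = n                      # training records matching the condition
        self.last_anomaly = 0.0

    def score(self, record, now):
        if any(record.get(k) != v for k, v in self.condition.items()):
            return 0.0                  # rule does not apply to this record
        if record.get(self.attribute) in self.allowed:
            return 0.0                  # value is allowed, no anomaly
        t = now - self.last_anomaly
        self.last_anomaly = now
        return t * self.n / len(self.allowed)   # tn/r for a violated rule

# Example: the rule from the slide, with n/r = 10000/2.
http_rule = Rule({"port": 80}, "word1", {"GET", "POST"}, n=10000)
```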

22 NETAD Models inbound client request packet bytes – IP, TCP, TCP SYN, HTTP, SMTP, FTP, telnet. Score = tn/r + ti/fi, allowing previously seen values, where ti = time since value i was last seen and fi = frequency of i in training. Modified to remove telnet and FTP.
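A sketch of the tn/r + ti/fi score for one modeled byte, following the formula on this slide; the +1 smoothing to avoid division by zero and the per-byte class interface are assumptions:

```python
# Sketch of NETAD-style scoring for one byte position, assuming byte values
# 0-255 and timestamps in seconds. The tn/r term fires only for never-seen
# values; the ti/fi term also scores previously seen but rare values.

class ByteModel:
    def __init__(self):
        self.freq = {}               # value -> count in training (f_i)
        self.n = 0                   # number of training packets
        self.last_seen = {}          # value -> time last seen (for t_i)
        self.last_anomaly = 0.0

    def train(self, value):
        self.freq[value] = self.freq.get(value, 0) + 1
        self.n += 1

    def score(self, value, now):
        r = max(len(self.freq), 1)
        novel_term = 0.0
        if value not in self.freq:                  # never seen in training
            t = now - self.last_anomaly
            self.last_anomaly = now
            novel_term = t * self.n / r
        t_i = now - self.last_seen.get(value, 0.0)  # time since value last seen
        f_i = self.freq.get(value, 0) + 1           # +1 is an assumed smoothing
        self.last_seen[value] = now
        return novel_term + t_i / f_i
```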

23 SPADE (Hoagland) Models inbound TCP SYN. Score = 1/P(src IP, dest IP, dest port). Probability by counting. Always in training mode. Modified by randomly replacing real destination IP with one of 4 simulated targets.
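A sketch of probability-by-counting scoring in the spirit of SPADE; the add-one smoothing is an assumption to avoid a zero probability the first time a tuple is seen:

```python
from collections import Counter

# Sketch of SPADE-style scoring: estimate P(src IP, dest IP, dest port) by
# counting and report its reciprocal as the anomaly score. SPADE is always
# in training mode, so every packet also updates the counts.

class Spade:
    def __init__(self):
        self.joint = Counter()   # (src_ip, dst_ip, dst_port) -> count
        self.total = 0

    def score(self, src_ip, dst_ip, dst_port):
        key = (src_ip, dst_ip, dst_port)
        p = (self.joint[key] + 1) / (self.total + 1)   # estimated probability
        self.joint[key] += 1                           # always in training mode
        self.total += 1
        return 1.0 / p
```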

24 Criteria for Legitimate Detection Source address – target server must authenticate source. Destination address/port – attack must use or scan that address/port. Packet header field – attack must write/modify the packet header (probe or DOS). No U2R or Data attacks.

25 Mixed Traffic: Fewer Detections, but More are Legitimate [Chart: detections out of 177 at 100 false alarms for each system on simulated vs. mixed traffic.]

26 Conclusions SAD suggests the presence of simulation artifacts and artificially low false alarm rates. The simulated traffic is too clean, static and predictable. Injecting real traffic reduces suspect detections in all 5 systems tested.

27 Limitations and Future Work Only one real data source tested – may not generalize. Tests on real traffic cannot be replicated due to privacy concerns (root passwords in the data, etc.). Each IDS must be analyzed and modified to prevent data separation. Is host data affected (BSM, audit logs)?

28 Limitations and Future Work Real data may contain unlabeled attacks. We found over 30 suspicious HTTP requests in our data (to a Solaris-based host). IIS exploit with double URL encoding (IDS evasion?): GET /scripts/..%255c%255c../winnt/system32/cmd.exe?/c+dir Probe for the Code Red backdoor: GET /MSADC/root.exe?/c+dir HTTP/1.0

29 Further Reading An Analysis of the 1999 DARPA/Lincoln Laboratories Evaluation Data for Network Anomaly Detection By Matthew V. Mahoney and Philip K. Chan Dept. of Computer Sciences Technical Report CS-2003-02 http://cs.fit.edu/~mmahoney/paper7.pdf

