Published by Letitia Grant. Modified over 9 years ago.
1
Machine Learning for Network Anomaly Detection
Matt Mahoney
2
Network Anomaly Detection
Network – Monitors traffic to protect connected hosts
Anomaly – Models normal behavior to detect novel attacks (some false alarms)
Detection – Was there an attack?
3
Host Based Methods
Virus Scanners
File System Integrity Checkers (Tripwire, DERBI)
Audit Logs
System Call Monitoring – Self/Nonself (Forrest)
4
Network Based Methods
Firewalls
Signature Detection (SNORT, Bro)
Anomaly Detection (eBayes, NIDES, ADAM, SPADE)
5
User Modeling
Source address – unauthorized users of authenticated services (telnet, ssh, pop3, imap)
Destination address – IP scans
Destination port – port scans
6
Frequency Based Models
Used by SPADE, ADAM, NIDES, eBayes, etc.
Anomaly score = 1/P(event)
Event probabilities estimated by counting
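A minimal sketch of the counting idea, not the implementation used by SPADE, ADAM, NIDES, or eBayes. The class name FrequencyModel, the port-number events, and the choice to give never-seen events an infinite score are all assumptions made for illustration.

```python
from collections import Counter


class FrequencyModel:
    """Scores an event as 1 / P(event), with P(event) estimated by counting."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def train(self, events):
        # Count how often each event occurs in the training traffic.
        for event in events:
            self.counts[event] += 1
            self.total += 1

    def score(self, event):
        count = self.counts[event]
        if count == 0:
            # Assumption: an event never seen in training gets an unbounded score.
            return float("inf")
        # 1 / P(event) = total / count
        return self.total / count


# Hypothetical training data: destination ports of 100 connections.
model = FrequencyModel()
model.train([80] * 90 + [25] * 9 + [22])

common = model.score(80)    # frequent event, low score
rare = model.score(22)      # seen once in 100 events, high score
unseen = model.score(6667)  # never seen in training
```

Rare events score high simply because their estimated probability is small, which is why these models can raise alarms on traffic that is unusual but harmless.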
7
Attacks on Public Services
PHF – exploits a CGI script bug on older Apache web servers
GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd
8
Buffer Overflows
1988 Morris Worm – fingerd
2003 SQL Sapphire Worm
char buf[100]; gets(buf);
[Diagram: stack layout with labels buf, stack, Exploit code, Return Address, 0, 100]
9
TCP/IP Denial of Service Attacks
Teardrop – overlapping IP fragments
Ping of Death – IP fragments reassemble to > 64K
Dosnuke – urgent data in NetBIOS packet
Land – identical source and destination addresses
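The header conditions behind three of these attacks can be written as explicit checks. This is only an illustrative sketch: the function names and the (offset, length) representation of fragments are hypothetical, and the size check ignores the IP header length for simplicity. An anomaly detector would not hard-code such rules; it would flag these packets because the field values never occur in normal traffic.

```python
MAX_IP_DATAGRAM = 65535  # largest size expressible in the 16-bit IPv4 total length field


def is_land(src_ip, dst_ip):
    """Land: the packet claims the same source and destination address."""
    return src_ip == dst_ip


def exceeds_max_datagram(fragment_offset_units, payload_len):
    """Ping of Death: a fragment whose data would end past 65535 bytes.

    The IPv4 fragment offset field counts units of 8 bytes.
    """
    return fragment_offset_units * 8 + payload_len > MAX_IP_DATAGRAM


def has_overlap(fragments):
    """Teardrop: any two fragments of one datagram cover the same bytes.

    fragments is a list of (offset_bytes, length) pairs.
    """
    end = 0
    for offset, length in sorted(fragments):
        if offset < end:
            return True
        end = max(end, offset + length)
    return False


land = is_land("10.0.0.5", "10.0.0.5")
oversized = exceeds_max_datagram(8189, 1000)  # 65512 + 1000 bytes
overlapping = has_overlap([(0, 36), (24, 4)])
contiguous = has_overlap([(0, 36), (36, 4)])
```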
10
Protocol Modeling
Attacks exploit bugs
Bugs are most common in the least tested code
Most testing occurs after delivery
Therefore unusual data is more likely to be hostile
11
Protocol Models
PHAD, NETAD – Packet Headers (Ethernet, IP, TCP, UDP, ICMP)
ALAD, LERAD – Client TCP application payloads (HTTP, SMTP, FTP, …)
12
Time Based Models
Training and test phases
Values never seen in training are suspicious
Score = t/p = tn/r where
– t = time since last anomaly
– n = number of training examples
– r = number of allowed values
– p = r/n = fraction of values that are novel
13
Example tn/r
Training: 0000111000, n/r = 10/2
Testing: 01223
– 0: no score
– 1: no score
– 2: tn/r = 6 x 10/2 = 30
– 2: tn/r = 1 x 10/2 = 5
– 3: tn/r = 1 x 10/2 = 5
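The arithmetic of this example can be reproduced with a short sketch. The function name anomaly_score is hypothetical, and the t values (6, 1, 1) are copied from the slide rather than derived by the code. The allowed set is left unchanged during testing, which is why the second 2 still scores.

```python
def anomaly_score(t, n, r):
    """Score = t / p = t * n / r, where p = r / n."""
    return t * n / r


training = "0000111000"
n = len(training)        # 10 training examples
allowed = set(training)  # values seen in training: {"0", "1"}
r = len(allowed)         # 2 allowed values

# (test value, t) pairs; t is the time since the last anomaly as given
# on the slide. Values already seen in training receive no score.
test = [("0", None), ("1", None), ("2", 6), ("2", 1), ("3", 1)]

scores = []
for value, t in test:
    if value in allowed:
        scores.append(None)  # no score
    else:
        scores.append(anomaly_score(t, n, r))

print(scores)
```

A long quiet period before a novel value gives a large t and therefore a large score, while a burst of novel values is damped because t resets after each anomaly.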
14
PHAD – Fixed Rules
34 packet header fields
– Ethernet (address, protocol)
– IP (TOS, TTL, fragmentation, addresses)
– TCP (options, flags, port numbers)
– UDP (port numbers, checksum)
– ICMP (type, code, checksum)
Global model
15
LERAD – Learns Conditional Rules
Models inbound client TCP (addresses, ports, flags, 8 words in payload)
Learns conditional rules
If port = 80 then word1 = GET, POST (n/r = 10000/2)
16
LERAD Rule Learning
If word1 = GET then port = 80 (n/r = 2/1)
word1 = GET, HELO (n/r = 3/2)
If address = Marx then port = 80, 25 (n/r = 2/2)

Address  Port  Word1  Word2
Hume     80    GET    /
Marx     80    GET    /index.html
Marx     25    HELO   Pascal
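The n/r values for these three rules can be checked against the three-row table with a small sketch. The helper rule_stats and the dictionary representation of a rule are assumptions for illustration, not LERAD's actual data structures.

```python
rows = [
    {"address": "Hume", "port": 80, "word1": "GET", "word2": "/"},
    {"address": "Marx", "port": 80, "word1": "GET", "word2": "/index.html"},
    {"address": "Marx", "port": 25, "word1": "HELO", "word2": "Pascal"},
]


def rule_stats(rows, condition, target):
    """Return (n, allowed) for the rule 'if condition then target in allowed'.

    n is the number of rows satisfying the condition; allowed is the set of
    target values seen in those rows, so r = len(allowed).
    """
    matching = [
        row for row in rows
        if all(row[attr] == value for attr, value in condition.items())
    ]
    allowed = {row[target] for row in matching}
    return len(matching), allowed


# If word1 = GET then port = 80 (n/r = 2/1)
get_rule = rule_stats(rows, {"word1": "GET"}, "port")
# word1 = GET, HELO (n/r = 3/2); an empty condition matches every row
word_rule = rule_stats(rows, {}, "word1")
# If address = Marx then port = 80, 25 (n/r = 2/2)
marx_rule = rule_stats(rows, {"address": "Marx"}, "port")
```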
17
LERAD Rule Learning
Randomly pick rules based on matching attributes
Select nonoverlapping rules with high n/r on a sample
Train on full training set (new n/r)
Discard rules that discover novel values in last 10% of training (known false alarms)
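One plausible reading of the first step, sketched under stated assumptions: take two training tuples, find the attributes on which they agree, make the first one the rule's target, and add the rest as conditions one at a time. The function candidate_rules and its (condition, target, value) output format are hypothetical, and the attribute order is fixed here so the result is repeatable; in practice it would be shuffled.

```python
def candidate_rules(a, b, order):
    """Propose rules from the attributes on which tuples a and b agree.

    order lists the attribute names in the order they are considered
    (assumed to be random in practice). Returns (condition, target, value)
    triples, each read as 'if condition then target = value'.
    """
    matching = [attr for attr in order if a[attr] == b[attr]]
    if not matching:
        return []
    target = matching[0]
    condition = {}
    # The first rule has no condition; each further matching attribute
    # adds one more condition to the previous rule.
    rules = [(dict(condition), target, a[target])]
    for attr in matching[1:]:
        condition[attr] = a[attr]
        rules.append((dict(condition), target, a[target]))
    return rules


hume = {"address": "Hume", "port": 80, "word1": "GET", "word2": "/"}
marx = {"address": "Marx", "port": 80, "word1": "GET", "word2": "/index.html"}

proposed = candidate_rules(hume, marx, ["port", "word1", "address", "word2"])
for condition, target, value in proposed:
    print(condition, "->", target, "=", value)
```

Candidates produced this way would still need the later steps: pruning to nonoverlapping rules with high n/r, retraining on the full set, and discarding rules that see novel values at the end of training.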
18
DARPA/Lincoln Labs Evaluation
1 week of attack-free training data
2 weeks with 201 attacks
[Diagram labels: SunOS, Solaris, Linux, NT, Router, Internet, Sniffer, Attacks]
19
Attacks out of 201 Detected at 10 False Alarms per Day
20
Problems with Synthetic Traffic
Attributes are too predictable: TTL, TOS, TCP options, TCP window size, HTTP, SMTP command formatting
Too few sources: Client addresses, HTTP user agents, ssh versions
Too “clean”: no checksum errors, fragmentation, garbage data in reserved fields, malformed commands
21
Real Traffic is Less Predictable
[Plot: r (number of values) versus time, for synthetic and real traffic]
22
Mixed Traffic: Fewer Detections, but More are Legitimate
23
Project Status
Philip K. Chan – Project Leader
Gaurav Tandon – Applying LERAD to system call arguments
Rachna Vargiya – Application payload tokenization
Mohammad Arshad – Network traffic outlier analysis by clustering
24
Further Reading
Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks, by Matthew V. Mahoney and Philip K. Chan, Proc. KDD.
Network Traffic Anomaly Detection Based on Packet Bytes, by Matthew V. Mahoney, Proc. ACM-SAC.
http://cs.fit.edu/~mmahoney/dist/