1
Benchmarking Anomaly-based Detection Systems
Ashish Gupta
Network Security, May 2004
2
Overview
– The motivation for this paper: the Waldo example
– The approach
– Structure in data
– Generating the data and anomalies
– Injecting anomalies
– Results: training and testing method, scoring, presentation of results, the ROC curves (somewhat obvious)
3
Motivation
Does anomaly detection depend on the regularity/randomness of the data?
4
Where’s Waldo!
7
The aim
Hypothesis:
– Differences in data regularity affect anomaly detection
– Different environments have different regularity
Regularity:
– Is the data highly redundant or random?
– Example of the environment's effect:
010101010101010101010101
or
0100011000101000100100101
8
Consequences
– One IDS: different false alarm rates in different environments
– Need a custom system/training for each environment?
– Temporal effects: regularity may vary over time
9
Structure in data Measuring randomness
10
Measuring randomness
010101010101010101010101
or
0100011000101000100100101
Relative entropy + sequential dependence → conditional relative entropy
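A minimal sketch of how such a regularity measure could be computed, assuming it is the conditional entropy of the next symbol given the previous one, normalized by log2 of the alphabet size; the paper's exact conditional relative entropy formulation may differ, and the function names are illustrative:

```python
import math
from collections import Counter

def conditional_entropy(seq):
    """Entropy (bits) of the next symbol given the previous one, under a bigram model."""
    pair_counts = Counter(zip(seq, seq[1:]))
    prev_counts = Counter(seq[:-1])
    total = len(seq) - 1
    h = 0.0
    for (prev, _nxt), count in pair_counts.items():
        p_pair = count / total                        # P(prev, next)
        p_next_given_prev = count / prev_counts[prev]
        h -= p_pair * math.log2(p_next_given_prev)
    return h

def regularity_index(seq, alphabet_size):
    """Normalize by the entropy of a uniform source: ~0 = highly regular, ~1 = random."""
    max_h = math.log2(alphabet_size)
    return conditional_entropy(seq) / max_h if max_h > 0 else 0.0

print(regularity_index("01" * 50, 2))                    # ~0.0: perfectly predictable
print(regularity_index("0100011000101000100100101", 2))  # noticeably higher: more random
```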
11
The benchmark datasets
Three types:
– Training data (the background data)
– Anomalies
– Testing data (background + anomalies)
Generating the sequences:
– 5 sets, each set 11 files (for increasing regularity)
– Each set has a different alphabet size
– Alphabet size decides complexity
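A sketch of how background sequences of tunable regularity could be generated; the `regularity` knob, the cycling rule, and the concrete lengths are illustrative assumptions rather than the paper's exact procedure:

```python
import random

def generate_background(length, alphabet, regularity, seed=0):
    """Generate background data whose predictability is controlled by `regularity`:
    1.0 -> the next symbol always follows a fixed cycling rule (highly redundant),
    0.0 -> the next symbol is drawn uniformly at random."""
    rng = random.Random(seed)
    seq = [rng.choice(alphabet)]
    for _ in range(length - 1):
        if rng.random() < regularity:
            # deterministic rule: move to the next symbol in the alphabet, wrapping around
            seq.append(alphabet[(alphabet.index(seq[-1]) + 1) % len(alphabet)])
        else:
            seq.append(rng.choice(alphabet))
    return "".join(seq)

# one "set": 11 files of increasing regularity over a 2-symbol alphabet;
# other sets would repeat this with larger alphabets (higher complexity)
training_files = [generate_background(10_000, "AB", r / 10, seed=r) for r in range(11)]
```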
12
Anomaly generation
What's a surprise?
– Something different from the expected probability
Types:
– Juxtapositional: different arrangements of the data, e.g. 001001001001001001111
– Temporal: unexpected periodicities
– Other types?
13
Types in this paper
– Foreign symbol: AAABABBBABAB[C]BBABABBA
– Foreign n-gram: AAABABAABAABAAABB[B]BA
– Rare n-gram: AABBBABBBABBBABBBABBBABB[AA]
(brackets mark the injected anomaly)
14
Injecting anomalies
– Make sure anomalies constitute no more than 0.24% of the test data
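A sketch of foreign-symbol injection under the 0.24% cap (foreign n-grams and rare n-grams could be injected analogously); random placement and the function name are illustrative assumptions:

```python
import random

def inject_foreign_symbols(background, foreign_symbol, max_fraction=0.0024, seed=0):
    """Overwrite a few positions of the background with a symbol that never appears
    in training, keeping the anomaly fraction at or below the 0.24% cap, and return
    the positions as ground truth for later scoring."""
    rng = random.Random(seed)
    seq = list(background)
    n_anomalies = max(1, int(len(seq) * max_fraction))
    positions = sorted(rng.sample(range(len(seq)), n_anomalies))
    for pos in positions:
        seq[pos] = foreign_symbol
    return "".join(seq), positions

test_data, true_positions = inject_foreign_symbols("AB" * 5000, "C")
print(len(true_positions), "anomalies injected")   # 24 out of 10,000 symbols = 0.24%
```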
15
The experiments
The hypothesis is true
16
The hypothesis:
– The nature of the "normal" background noise affects signal detection
The anomaly detector:
– Detects anomalous subsequences
– Learning phase builds an n-gram probability table
– Unexpected event → anomaly!
– The anomaly threshold decides the level of surprise
17
Example of anomaly detection
Learned 3-gram probabilities:
AAA 0.12 | AAB 0.13 | ABA 0.20 | BAA 0.17 | BBB 0.15 | BBA 0.12
AAC → ANOMALY!
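A minimal sketch of such a detector, assuming surprise is scored as one minus the learned n-gram probability, so anything never seen in training gets the maximum score of 1; class and parameter names are illustrative:

```python
from collections import Counter

class NGramDetector:
    """Learn n-gram frequencies from training data, then score test n-grams by surprise."""

    def __init__(self, n=3):
        self.n = n
        self.probs = {}

    def train(self, seq):
        grams = [seq[i:i + self.n] for i in range(len(seq) - self.n + 1)]
        total = len(grams)
        self.probs = {g: c / total for g, c in Counter(grams).items()}

    def surprise(self, gram):
        """0 = very common in training; 1 = never seen (foreign symbol or foreign n-gram)."""
        return 1.0 - self.probs.get(gram, 0.0)

    def detect(self, seq, threshold=0.999):
        """Return the start positions of test n-grams whose surprise exceeds the threshold."""
        return [i for i in range(len(seq) - self.n + 1)
                if self.surprise(seq[i:i + self.n]) > threshold]

detector = NGramDetector(n=3)
detector.train("ABAB" * 300)
print(detector.detect("ABABABCABABA"))   # the foreign symbol 'C' makes nearby 3-grams anomalous
```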
18
Scoring
Event outcomes:
– Hits
– Misses
– False alarms
Threshold:
– Decides the level of surprise
– 0 = completely unsurprising, 1 = astonishing
– Needs to be calibrated
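One plausible way to tally these outcomes against the injected ground truth; this sketch counts a detection as a hit only when it lands exactly on an injected position, whereas a real evaluation might allow a tolerance window around each n-gram:

```python
def score(detected_positions, true_positions, total_positions):
    """Tally hits, misses and false alarms against the injected ground truth."""
    detected, truth = set(detected_positions), set(true_positions)
    hits = len(detected & truth)
    return {
        "hits": hits,
        "misses": len(truth - detected),
        "false_alarms": len(detected - truth),
        "detection_rate": hits / len(truth) if truth else 0.0,
        "false_alarm_rate": len(detected - truth) / max(1, total_positions - len(truth)),
    }

print(score(detected_positions=[4, 5, 6, 40], true_positions=[5, 60], total_positions=100))
# -> 1 hit, 1 miss, 3 false alarms
```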
19
Presentation of results
Two aspects are reported:
– % correct detections
– % false detections
The detector operates through a range of sensitivities:
– Higher sensitivity → more detections, but more false alarms?
– Need the right sensitivity
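A sketch of how sweeping the sensitivity (the surprise threshold) yields the points of an ROC curve; the per-position surprise scores and flags here are toy values, not results from the paper:

```python
def roc_points(surprises, true_flags, thresholds):
    """Sweep the surprise threshold and collect (false-alarm rate, detection rate) pairs;
    surprises[i] and true_flags[i] both refer to position i of the test sequence."""
    n_true = sum(true_flags)
    n_false = len(true_flags) - n_true
    points = []
    for t in thresholds:
        hits = sum(1 for s, f in zip(surprises, true_flags) if s > t and f)
        false_alarms = sum(1 for s, f in zip(surprises, true_flags) if s > t and not f)
        points.append((false_alarms / n_false, hits / n_true))
    return points

# toy scores: lowering the threshold (raising sensitivity) buys detections with false alarms
surprises  = [0.1, 0.9, 0.2, 1.0, 0.3, 0.95, 0.15]
true_flags = [0,   1,   0,   1,   0,   1,    0]
print(roc_points(surprises, true_flags, [0.99, 0.92, 0.5, 0.05]))
# approximately [(0.0, 0.33), (0.0, 0.67), (0.0, 1.0), (1.0, 1.0)]
```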
21
Interpretation
The ROC curves for different regularities do not overlap → regularity affects detection!
22
What does this mean?
Detection metrics are data dependent
You cannot say:
– "My XYZ product will flag 75% of anomalies with a 10% false alarm rate!"
– "Sir, are you sure?"
23
Real-world data
Regularity index of system calls for different users
24
Is this surprising? What about network traffic?
25
Conclusions
– Data structure → anomaly detection effectiveness
– Evaluation is data dependent
26
Conclusions
A change in regularity requires either:
– a different system, or
– changing the parameters
27
Quirks?
Assumes rather naïve detection systems:
– "Simple retraining will not suffice", yet an intelligent detector could take this into account
What really is an anomaly?
– If the data is highly irregular, won't randomness produce some anomalies by itself?
– An anomaly is a relative term, but here anomalies are generated independently