Intrusion Detection Using Sequences of System Calls
By S. Hofmeyr & S. Forrest
Overview
• Focus: privileged processes
• Discriminator: system call sequences
• Building a database: defining “normal”
• Detecting anomalies: how to measure
• Results: promising numbers
• Concerns: remaining doubts
• Extensions of research: Jones, Li & Lin
Inspiration
• Human immune system
• Recognition of self
• Rejection of nonself
• How would we describe “self” for a software system, or a program?
Focus and Motivation
• Focus on privileged processes
  • Exploitation can give a user root access
  • They provide a natural boundary
    • e.g., telnet daemon, login daemon
  • Privileged processes are easier to track
    • Specific, limited function
    • Stable over time
    • Contrast with the diversity of user actions
Where do we look?
• Need to distinguish when:
  • A privileged process runs normally
  • A privileged process exhibits an anomaly
• The discriminator is the observable entity used to distinguish between these two cases
• Use sequences of system calls as the discriminator, i.e., the signature
How much detail?
• The discriminator is sequences of system calls
  • Simple temporal ordering is chosen
  • Ignore parameters
  • Ignore specific timing information
  • Ignore everything else!
• Why? Work with assumptions that are as simple as possible
• Is it “enough”?
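Reducing a raw trace to the discriminator means dropping everything except the order of call names. Below is a minimal sketch. It assumes a hypothetical strace-like line format, `name(args) = ret`, optionally prefixed by a numeric pid and/or a timestamp. The function name `call_names` and the regular expression are illustrative only and are not part of the original work.

```python
import re

# Hypothetical line format (an assumption, loosely strace-like):
#   [pid] [timestamp] name(args) = ret
# Only the call name in front of the first "(" is kept. Arguments,
# return values and timing are deliberately discarded.
_CALL_NAME = re.compile(r"^\s*(?:\d+\s+)?(?:[\d:.]+\s+)?([A-Za-z_]\w*)\(")


def call_names(lines):
    """Reduce raw trace lines to a plain, ordered list of call names."""
    names = []
    for line in lines:
        match = _CALL_NAME.match(line)
        if match:  # lines that are not calls (e.g. exit notices) are skipped
            names.append(match.group(1))
    return names


raw = [
    'open("/etc/passwd", O_RDONLY) = 3',
    '1234 read(3, "root", 4096) = 4',
    "10:01:02.5 close(3) = 0",
    "+++ exited with 0 +++",
]
print(call_names(raw))  # ['open', 'read', 'close']
```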
Is it enough detail?
• Does the discriminator include enough detail for the hypothesis (that short system call sequences characterize the normal behavior of a privileged process, and that intrusions deviate from them) to hold?
  • The answer seems to be yes!
• Extra complication: because configuration and use vary from one system to another, the set of “normal” system call sequences will differ between systems
Design Decisions
• Remember the temporal ordering of calls
  • Not the total sequence, but sequences of length k
• What size should k be?
  • Long enough to detect anomalies, but as short as possible
  • Empirical observation: lengths of 6 to 10 are sufficient
• So “self” is a database of (unordered) short call sequences
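The sliding-window idea can be sketched in a few lines. This is an illustrative helper, not the authors' code: it slides a window of length k over a trace and keeps the distinct windows as tuples in a set. The set matches the slide's description of “self” as an unordered collection, so order between windows and frequencies are lost.

```python
def sequences(trace, k):
    """Return the set of distinct length-k windows of a call trace.

    Each window keeps the temporal order of its k calls. The set itself
    records only which windows occur, not where or how often.
    """
    # If the trace is shorter than k, the range is empty and so is the set.
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


trace = ["fopen", "fread", "strcmp", "fopen", "fread", "strcmp"]
print(sorted(sequences(trace, 3)))
# [('fopen', 'fread', 'strcmp'), ('fread', 'strcmp', 'fopen'),
#  ('strcmp', 'fopen', 'fread')]
```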
Building the “normal” database
• Synthetic
  • Assurance that the normal database contains no intrusions; reproducible
  • But does not reflect any particular real user activity
• Actual use
  • Necessary to generate from actual use in order to have a unique “self”
  • How long to accumulate? Is it clean?
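One rough way to approach “how long to accumulate?” is to track how many previously unseen sequences each new trace contributes. The sketch below assumes traces arrive one at a time as lists of call names. Treating a run of zero new sequences as a stopping signal is an illustrative heuristic, not a rule from the original work. It says nothing about whether the data is clean.

```python
def build_normal_db(traces, k):
    """Accumulate length-k sequences from a series of traces.

    Returns the database and the number of new sequences contributed by
    each trace. A long run of zeros suggests, but does not prove, that
    the database has stabilized.
    """
    db = set()
    new_per_trace = []
    for trace in traces:
        windows = {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}
        fresh = windows - db  # sequences not seen in any earlier trace
        db |= fresh
        new_per_trace.append(len(fresh))
    return db, new_per_trace


traces = [
    ["a", "b", "c", "a", "b"],
    ["b", "c", "a", "b", "c"],
    ["a", "b", "c", "a"],
]
db, growth = build_normal_db(traces, 3)
print(growth)  # [3, 0, 0]
```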
The normal database
• The database of normal sequences does not contain all legal sequences
  • If it did, anomalies would not be detected
  • Some rare sequences will not be used during database initialization
• The database is stored as a forest to save space
Signature Database Structure (length 3)
[Figure: the trace fopen fread strcmp fopen fread strcmp yields the length-3 sequences (fopen, fread, strcmp), (fread, strcmp, fopen) and (strcmp, fopen, fread), stored as a forest of three trees rooted at fopen, fread and strcmp]
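A forest of this shape can be modelled with nested dictionaries: each root is a first call, and sequences that share a prefix share a path. This is a minimal sketch, not the authors' data structure. Because every stored sequence has the same length k, a lookup that walks k levels from a root is an exact membership test.

```python
def build_forest(seqs):
    """Store equal-length call sequences as a forest of nested dicts."""
    forest = {}
    for seq in seqs:
        node = forest
        for call in seq:
            # Shared prefixes reuse existing nodes, which saves space.
            node = node.setdefault(call, {})
    return forest


def contains(forest, seq):
    """True if seq is stored as a full root-to-leaf path in the forest."""
    node = forest
    for call in seq:
        if call not in node:
            return False
        node = node[call]
    return True


seqs = [("fopen", "fread", "strcmp"),
        ("fread", "strcmp", "fopen"),
        ("strcmp", "fopen", "fread")]
forest = build_forest(seqs)
print(sorted(forest))  # ['fopen', 'fread', 'strcmp']
print(contains(forest, ("fread", "strcmp", "fopen")))  # True
```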
Derive Robust Signature Database
Detecting anomalies
• A call sequence not in the database is an anomalous sequence
• The strength of an anomalous sequence is measured by the Hamming distance to the closest normal sequence, called $d_{\min}$
• Any call trace containing an anomalous sequence is an anomalous trace
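Computing $d_{\min}$ amounts to counting mismatched positions against every normal sequence and keeping the smallest count. The following is a straightforward linear-scan sketch. The helper names are illustrative, and the original implementation may search the database differently.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def d_min(seq, normal_db):
    """Hamming distance from seq to its closest sequence in normal_db.

    A result of 0 means seq is itself in the database.
    """
    return min(hamming(seq, s) for s in normal_db)


normal_db = {("fopen", "fread", "strcmp"),
             ("fread", "strcmp", "fopen"),
             ("strcmp", "fopen", "fread")}
print(d_min(("fopen", "fread", "close"), normal_db))  # 1
```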
Detecting anomalies
• The strength of an anomalous trace is the maximum $d_{\min}$ over the trace, normalized by k (the length of sequences in the database):
  • $\hat{S}_A = \max\{d_{\min} \text{ values for the trace}\} / k$
  • The value is between 0 and 1
• By adjusting the threshold on $\hat{S}_A$, false positives can be reduced
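Putting the pieces together gives the normalized trace signal $\hat{S}_A$. In this sketch, windows found in the database contribute $d_{\min} = 0$. Only the other windows are scanned, and the largest distance is divided by k. The `is_intrusion` helper and its `threshold` parameter are hypothetical. They show where a tunable threshold would sit, not a value used in the original experiments.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def anomaly_signal(trace, normal_db, k):
    """Normalized anomaly signal: max d_min over the trace's windows, / k."""
    worst = 0
    for i in range(len(trace) - k + 1):
        window = tuple(trace[i:i + k])
        if window in normal_db:  # normal window, d_min = 0
            continue
        worst = max(worst, min(hamming(window, s) for s in normal_db))
    return worst / k


def is_intrusion(trace, normal_db, k, threshold):
    """Hypothetical alarm rule: flag the trace when the signal exceeds threshold."""
    return anomaly_signal(trace, normal_db, k) > threshold


normal = ["fopen", "fread", "strcmp"] * 3
k = 3
normal_db = {tuple(normal[i:i + k]) for i in range(len(normal) - k + 1)}
suspect = ["fopen", "fread", "strcmp", "fopen", "close", "strcmp"]
print(anomaly_signal(suspect, normal_db, k))  # one mismatch per window: 1/3
```

Raising the threshold trades fewer false positives for a greater risk of missing weak anomalies.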
Efficiency
• Complexity of computing $d_{\min}$
  • $O(k(R_A N + 1))$
    • k is the sequence length, $R_A$ is the ratio of anomalous to normal sequences, N is the number of sequences in the database
• $d_{\min}$ is calculated after every system call
  • The constant associated with this algorithm is very important
  • Not yet running in real time
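One way to read the bound involves two assumptions. First, each new window is looked up in the forest in about k steps. Second, $d_{\min}$ is found by a linear scan of all N stored sequences, each compared in k steps, and only for the fraction $R_A$ of windows that miss the database. Here $R_A$ is treated as approximately that fraction. Under these assumptions the per-call cost is:

```latex
\begin{aligned}
\text{work per system call} &\approx \underbrace{k}_{\text{forest lookup}}
  \;+\; \underbrace{R_A \, N \, k}_{\text{linear scan for } d_{\min}} \\
  &= k\,(R_A N + 1)
\end{aligned}
```

When N is large, the $R_A N$ term dominates, which is why the constant factor of the scan matters so much for real-time use.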
Results (synthetic)
• Sanity test: if different programs are not distinguishable, anomalies within one program certainly will not be either
• Programs are easy to distinguish: mismatches on well over 50% of the call sequences (and $\hat{S}_A \ge 0.6$)
• All intrusions (both attempted and successful) produced anomalies of varying strengths
Results (real environment)
• The conjecture of unique normal databases
  • Experiments in two configurations (at UNM and MIT) produced very different databases for the same program (lpr)
  • Is this typical?
Closing concerns
• False positives vs. false negatives
  • If forced to choose, UNM prefers false negatives, because other layers of defense can mitigate them
  • Saw 1 false positive per 100 print jobs (lpr)
    • Due to system problems
• Is $\hat{S}_A$ a good measure?
  • It could help generate false positives
  • A single extra system call might make $\hat{S}_A = 0.5$
Annex Material
Some UVa experiments
S. Li, Y. Lin, and A. Jones
Signature Length Has Little Effect
• Illustrated by two attacks on Apache
• Varied the sequence length from 2 to 30
• We chose length 10 to leave a margin of error
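A sequence-length experiment can be repeated with a small harness that rebuilds the normal database for each k and recomputes the normalized signal. The sketch below uses toy data with a single injected `exec` call, not the Apache traces. The `sweep` function is hypothetical. Lengths for which no normal window exists are skipped rather than scored.

```python
def windows(trace, k):
    """Distinct length-k windows of a trace, as tuples."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def signal(trace, db, k):
    """Normalized anomaly signal of trace against db (max d_min / k)."""
    worst = 0
    for w in windows(trace, k):
        if w not in db:
            worst = max(worst, min(hamming(w, s) for s in db))
    return worst / k


def sweep(normal_traces, test_trace, lengths):
    """Rebuild the database for each k and report the test trace's signal."""
    results = {}
    for k in lengths:
        db = set()
        for t in normal_traces:
            db |= windows(t, k)
        if db:  # skip k longer than every normal trace
            results[k] = signal(test_trace, db, k)
    return results


normal = [["open", "read", "write", "close"] * 5]
attack = ["open", "read", "write", "close", "open", "exec", "write", "close"] * 2
print(sweep(normal, attack, range(2, 9)))
```

In this toy case the injected call is flagged at every k, but the normalized value shrinks as k grows. Real traces need not behave this way.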
Effectiveness: Buffer Overflow
• Successfully detected buffer overflow attacks against wu-ftpd
• Works well because attacker code adds new sequences of library calls
[Table: #Mismatches, %Mismatches, and Normalized Anomaly Signal for the Stack Overwrite and Realpath Vulnerability attacks]
High normalized anomaly signals indicate attacks
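The three table columns can be computed together from one pass over a trace. This sketch assumes that mismatches are counted per window occurrence rather than per distinct sequence, and that the normalized anomaly signal is the $\hat{S}_A$ defined earlier. Both are assumptions, since the annex does not spell out its exact definitions. The function name `metrics` is illustrative.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def metrics(trace, normal_db, k):
    """Return (#mismatches, %mismatches, normalized anomaly signal)."""
    wins = [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]
    mismatched = [w for w in wins if w not in normal_db]
    pct = 100.0 * len(mismatched) / len(wins) if wins else 0.0
    # Largest d_min among mismatched windows (0 when there are none).
    worst = max((min(hamming(w, s) for s in normal_db) for w in mismatched),
                default=0)
    return len(mismatched), pct, worst / k


normal = ["a", "b", "c"] * 3
normal_db = {tuple(normal[i:i + 3]) for i in range(len(normal) - 2)}
print(metrics(normal, normal_db, 3))                          # (0, 0.0, 0.0)
print(metrics(["a", "b", "c", "a", "x", "c"], normal_db, 3))  # 2 of 4 windows miss
```

A normal run scores 0 on all three metrics, while injected code that produces new call sequences raises all three.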
Effectiveness: Denial of Service
• Simulated a DOS attack that uses up all available memory
• As the attack progresses, library calls requesting memory return abnormally and are re-issued
• The DOS attack caused the application to invoke a new library call, fsync
[Table for program vi: #Mismatches, %Mismatches, and Normalized Anomaly Signal. Normal Run: 0, 0, 0 (no intrusion detected). DOS Attack: high normalized anomaly signal indicates attack]