Published by Marian Riley. Modified over 8 years ago.
1
Kernel-Based Anomaly Detection
Andrew Arnold (aoa5)
2nd Annual Project Student Day, Columbia University, 4/26/01
Intrusion Detection Systems (IDS)
Machine Learning Group (ML)
2
Goal
If feasible, implement a kernel-based SVM that automatically separates IDS data into normal and anomalous distributions. Use these machines as a rule base for other modules.
3
What Is an Anomaly?
From the Greek for "uneven": probably from an- ("not") + homalos ("even"). (Dictionary.com)
In our domain, there are two definitions:
Semantic: something we don't want to happen, and want to know about when it does; abnormal, bad.
Literal: a statistical outlier.
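As a small illustration of the literal definition, the sketch below flags statistical outliers in a single numeric feature using the median/MAD "modified z-score". The function name robust_outliers, the sample data, and the 3.5 cut-off (a commonly quoted default) are assumptions for illustration, not something taken from these slides.

```python
import statistics

def robust_outliers(values, threshold=3.5):
    """Hypothetical helper: return values whose modified z-score exceeds threshold.

    Uses the median and the median absolute deviation (MAD), so one extreme
    value cannot inflate the spread and hide itself.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all; this simple rule cannot decide
    # 0.6745 rescales MAD so the score is comparable to a z-score for normal data.
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical connection durations in seconds; 95 is the statistical outlier.
durations = [10, 11, 9, 10, 12, 10, 11, 95]
print(robust_outliers(durations))  # [95]
```

A statistical outlier is not automatically an attack (the semantic definition), which is why the slides treat the two notions separately.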
4
What Is a Kernel?
Essentially a function that implicitly maps one data space into a higher-dimensional one: it returns the inner product of two points' images in that space without computing the images explicitly.
Data points that are not separable in n dimensions may be separable in higher, or even infinite, dimensionality.
Different kernels suit different data and domains. This is still an emerging field, with lots of trial and error and no hard-and-fast rules.
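A minimal sketch of the idea, assuming the degree-2 polynomial kernel k(x, y) = (x · y)^2 (one standard choice, not necessarily the kernel this project used): the kernel value equals an inner product in a 3-D feature space, and points that no line separates in 2-D become separable by a plane there. The points and helper names (phi, poly2_kernel, side_of_plane) are hypothetical.

```python
import math

def phi(x):
    """Explicit degree-2 feature map from R^2 to R^3."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, y):
    """k(x, y) = (x . y)^2: equals dot(phi(x), phi(y)) without building phi."""
    return dot(x, y) ** 2

# Hypothetical labeled points: +1 inside the unit circle, -1 outside.
# The -1 points surround the +1 points, so no straight line in 2-D separates them.
points = [
    ((0.2, 0.1), 1), ((-0.3, 0.4), 1),
    ((1.5, 0.0), -1), ((-1.5, 0.0), -1), ((0.0, 1.5), -1), ((0.0, -1.5), -1),
]

def side_of_plane(x):
    """In feature space z1 + z3 = x1^2 + x2^2, so the plane z1 + z3 = 1 separates the classes."""
    z = phi(x)
    return 1 if z[0] + z[2] < 1.0 else -1

for p, label in points:
    print(p, "label:", label, "plane says:", side_of_plane(p))
```

An SVM never needs phi itself; it only evaluates the kernel, which is what makes very high (or infinite) dimensional feature spaces practical.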
5
Why Use a Kernel?
Data spaces are complex and dynamic.
New attacks and network configurations appear too often to be tracked by constantly updated, hard-coded rules.
We need a more efficient and robust method of differentiating normal behavior from anomalies.
6
How
Use a kernel to "spread" the training data out so that trends and relationships can be seen more easily.
Once the data is sufficiently "flat," various algorithms can separate it into concentrations, densities, and clusters.
These clusters are our normal states.
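One simple way to turn kernel similarity into a density, sketched here under stated assumptions, is a Parzen-window (kernel density) score with a Gaussian RBF kernel: points in dense regions of the training data score high, isolated points score low. This is a simpler relative of the SVM approach, not the method described in these slides; the data, gamma, and the 0.1 threshold are hypothetical.

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: 1.0 for identical points, decaying toward 0 with distance."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def density_score(x, training, gamma=0.5):
    """Average kernel similarity of x to the training points (a density estimate up to a constant)."""
    return sum(rbf(x, t, gamma) for t in training) / len(training)

# Hypothetical 2-D feature vectors for "normal" traffic, clustered near (1, 1).
normal = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]

threshold = 0.1  # assumed cut-off; in practice it would be tuned on held-out data
for probe in [(1.0, 1.0), (5.0, 5.0)]:
    label = "normal" if density_score(probe, normal) >= threshold else "anomalous"
    print(probe, label)
```

The weakness of this approach is that every training point is kept and compared at test time; the SVM formulation instead keeps only the points on the boundary.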
7
How (cont.)
Separate our data into two regions:
Normal (within bounds)
Anomalous (outliers)
The data points that define the boundary between the normal and the anomalous are called support vectors, hence the name Support Vector Machines. These points "define" normalcy.
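A sketch of the one-class formulation described in "Estimating the Support of a High-Dimensional Distribution", using scikit-learn's OneClassSVM as a modern stand-in (it postdates this 2001 talk; the synthetic data and the gamma and nu values are assumptions). Training selects the support vectors that define the boundary; predict returns +1 for points inside the learned region and -1 for outliers.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical "normal" training data: 200 points from a 2-D standard Gaussian.
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu is an upper bound on the fraction of training points treated as outliers
# and a lower bound on the fraction of training points that become support vectors.
model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_train)

print("support vectors:", len(model.support_vectors_))

probes = np.array([[0.0, 0.0], [6.0, 6.0]])
print(model.predict(probes))            # +1 = normal (inside), -1 = anomalous (outside)
print(model.decision_function(probes))  # negative values fall outside the boundary
```

Raising nu tightens the boundary so more points are flagged, which tends to trade more false alarms for fewer missed anomalies.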
8
Sample SVM
9
Current Status / Plan of Action
Continue reading papers and books and attending ML meetings.
Build some available SVM packages and try to replicate the papers' results.
Expand the scope to IDS.
Train on IDS data, review and share findings, and modify as necessary. Repeat.
10
Resources
"Estimating the Support of a High-Dimensional Distribution", MSRD (recognizing numbers in USPS data)
Our very own Machine Learning group
http://www.kernel-machines.org/