1
Anomaly Detection in Data Science
One-class Classification with Privileged Information for Malware Detection
Pavel Erofeev, IITP RAS, Airbus Group Russia
2
Find the Panda
3
Anomaly Detection: Hadlum vs Hadlum
The birth of a child to Mrs. Hadlum happened 349 days after Mr. Hadlum left for military service.
The average human pregnancy period is 280 days (40 weeks).
Statistically, 349 days is an outlier.
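The Hadlum argument can be made concrete with a z-score. The slide gives only the mean (280 days) and the observed value (349 days); the standard deviation of 10 days below is an assumed value for illustration, not a figure from the presentation, and the 3-sigma threshold is a common convention rather than part of the original argument.

```python
import math

MEAN_DAYS = 280.0         # from the slide
OBSERVED_DAYS = 349.0     # from the slide
ASSUMED_STD_DAYS = 10.0   # ASSUMPTION: illustrative spread of pregnancy length

def z_score(x: float, mean: float, std: float) -> float:
    """Number of standard deviations between x and the mean."""
    return (x - mean) / std

def is_outlier(x: float, mean: float, std: float, threshold: float = 3.0) -> bool:
    """Flag x as a point anomaly when |z| exceeds the threshold (3-sigma rule)."""
    return abs(z_score(x, mean, std)) > threshold

z = z_score(OBSERVED_DAYS, MEAN_DAYS, ASSUMED_STD_DAYS)
# Upper-tail probability of a value at least this large under a Gaussian model
tail_prob = 0.5 * math.erfc(z / math.sqrt(2))
print(f"deviation = {OBSERVED_DAYS - MEAN_DAYS:.0f} days, z = {z:.1f}, P(Z >= z) = {tail_prob:.1e}")
print("outlier:", is_outlier(OBSERVED_DAYS, MEAN_DAYS, ASSUMED_STD_DAYS))
```

With a different assumed spread the z-score changes, but any plausible value of a few weeks or less still places 349 days far in the tail.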
4
“An outlier is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism.” (Hawkins, 1980)
5
Defining Anomaly Detection
Observations are represented digitally as vectors
The data are a mixture of “nominal” and “abnormal” points
Anomalous points are generated by a different generative process than the nominal points
6
Possible Settings in CS
Supervised (known attacks): training data are labeled as “nominal” or “anomaly”
Clean (zero-day attacks): training data are all “nominal”; test data may be contaminated with “anomaly” points
Unsupervised (unknown attacks): training data consist of a mixture of “nominal” and “anomaly” points
7
Real World Data Problems
Data are multivariate
There is usually more than one generating mechanism underlying the “normal” data
Anomalies may represent a different class of objects, so there are many of them
What counts as an anomaly is domain-specific
Normality evolves over time
8
Anomaly Taxonomy: Point Anomaly
9
Anomaly Taxonomy: Contextual Anomaly
10
Anomaly Taxonomy: Collective Anomaly
11
Taxonomy
12
Imbalanced classification
Normal data: many samples
Abnormal data: very few samples
Standard methods do not work as expected
Standard metrics do not apply
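To see why standard metrics mislead, consider a hypothetical test set of 990 normal and 10 abnormal samples, scored by a trivial classifier that always answers “normal”. The plain-Python sketch below (data and helper names are illustrative) shows that accuracy looks excellent while recall on the anomaly class is zero; balanced accuracy exposes the problem.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the anomaly label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0        # share of anomalies caught
    precision = tp / (tp + fp) if tp + fp else 0.0     # share of alarms that are real
    specificity = tn / (tn + fp) if tn + fp else 0.0   # share of normal points kept
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "balanced_accuracy": (recall + specificity) / 2,
    }

y_true = [0] * 990 + [1] * 10   # hypothetical imbalanced labels (1 = anomaly)
y_pred = [0] * 1000             # classifier that never raises an alarm
print(metrics(y_true, y_pred))
```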
13
Imbalanced classification
Class weights: proved not to be helpful in most cases
Resampling methods: oversampling (bootstrap, SMOTE, etc.) and undersampling
Open questions: how to choose which method to use, and how to choose the resampling parameter?
We compared several methods and proposed a meta-model that, on average, gives the best results [Papanov, Erofeev, Burnaev, 2015]
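The two oversampling families on the slide can be sketched in a few lines of NumPy. This is a minimal illustration of bootstrap oversampling and of the basic SMOTE idea (interpolating between a minority point and one of its nearest minority neighbours); it is not the meta-model from the cited paper, and the function names are hypothetical. The number of synthetic points `n_new` plays the role of the resampling parameter.

```python
import numpy as np

def bootstrap_oversample(X_min, n_new, rng):
    """Random oversampling: draw minority rows with replacement."""
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx]

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: new point = x + u * (neighbour - x), u ~ U(0, 1),
    where the neighbour is one of x's k nearest minority points.
    Requires at least two minority samples."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X_min)
    k = min(k, n - 1)
    # Squared Euclidean distances between all minority samples
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)      # which minority point to start from
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))               # interpolation factor per new point
    return X_min[base] + gap * (X_min[nn] - X_min[base])

rng = np.random.default_rng(0)
X_min = rng.normal(size=(10, 2))   # hypothetical minority (abnormal) class
X_boot = bootstrap_oversample(X_min, 40, rng)
X_syn = smote(X_min, 40, k=3, rng=rng)
print(X_boot.shape, X_syn.shape)
```

Bootstrap only repeats existing points, while SMOTE places new points on segments between neighbours, so the synthetic data stay inside the region spanned by the minority class.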
14
Statistics-based models
Assume a model of how normal data are generated (e.g., a Gaussian distribution)
PCA is commonly used to extract the linear combinations of features with the largest variance
PCA-based anomaly detection works well when features are highly correlated
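One common way to turn PCA into an anomaly detector is to score each point by its reconstruction error: the squared distance to the subspace spanned by the leading principal components. The NumPy sketch below assumes this reconstruction-error variant and a 99th-percentile threshold on training errors; both are illustrative choices, and the data are synthetic. The second test point has unremarkable individual coordinates but breaks the correlation, which is exactly the case where PCA helps.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the training mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def reconstruction_error(X, mean, components):
    """Squared distance from each row of X to the principal subspace."""
    Z = (X - mean) @ components.T      # coordinates in the principal subspace
    X_hat = Z @ components + mean      # projection back to the original space
    return np.sum((X - X_hat) ** 2, axis=1)

# Hypothetical, strongly correlated training data: x2 is roughly 2 * x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
X_train = np.column_stack([x1, 2 * x1 + rng.normal(scale=0.1, size=500)])

mean, components = fit_pca(X_train, n_components=1)
threshold = np.quantile(reconstruction_error(X_train, mean, components), 0.99)

X_test = np.array([[1.0, 2.0],     # follows the correlation
                   [1.0, -2.0]])   # breaks the correlation
errors = reconstruction_error(X_test, mean, components)
print(errors, errors > threshold)
```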
15
Density-based models: SVM-based and nearest-neighbour-based
How to choose the best kernel parameter?
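Two of the ideas on this slide can be sketched with NumPy: a nearest-neighbour anomaly score (distance to the k-th nearest training point) and a Gaussian kernel density estimate whose bandwidth is chosen by maximizing the leave-one-out log-likelihood. The bandwidth selection is one possible heuristic for the kernel-parameter question, not necessarily the approach used in the presentation; all names and the candidate grid are hypothetical. For a Gaussian kernel, bandwidth h corresponds to an RBF parameter gamma = 1 / (2 h^2).

```python
import numpy as np

def sq_dists(A, B):
    """Matrix of squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)

def knn_score(X_train, X_test, k=5):
    """Distance from each test point to its k-th nearest training point.
    If a test point also appears in X_train, its own zero distance counts."""
    d = np.sqrt(sq_dists(X_test, X_train))
    return np.sort(d, axis=1)[:, k - 1]

def loo_log_likelihood(X, h):
    """Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    n, dim = X.shape
    K = np.exp(-sq_dists(X, X) / (2 * h ** 2))
    np.fill_diagonal(K, 0.0)  # exclude each point from its own density estimate
    density = K.sum(axis=1) / ((n - 1) * (2 * np.pi * h ** 2) ** (dim / 2))
    return float(np.sum(np.log(density + 1e-300)))  # guard against log(0)

def choose_bandwidth(X, candidates):
    """Pick the candidate bandwidth with the highest LOO log-likelihood."""
    scores = [loo_log_likelihood(X, h) for h in candidates]
    return candidates[int(np.argmax(scores))]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))   # hypothetical nominal data
scores = knn_score(X_train, np.array([[0.0, 0.0], [6.0, 6.0]]), k=5)
best_h = choose_bandwidth(X_train, [0.01, 0.3, 10.0])
print(scores, best_h)
```

A very small bandwidth makes every held-out point look improbable, and a very large one flattens the density; the likelihood criterion penalizes both extremes.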
16
One-class SVM with Privileged Information
Evgeny Burnaev, Dmitry Smolyakov (Skoltech, IITP RAS)
17
One-Class SVM
18
One-Class SVM
19
One-Class SVM
20
One-Class SVM: Kernel Trick
21
Kernel Trick
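The kernel trick replaces an explicit feature map with a kernel function that returns the same inner product. The NumPy sketch below checks this for the homogeneous degree-2 polynomial kernel in two dimensions, whose feature map is known in closed form, and also defines the Gaussian (RBF) kernel, whose feature space is infinite-dimensional and therefore only usable through the kernel. Function names are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the kernel (x . y)^2 for 2-D inputs."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, y):
    """Same inner product, computed directly in the input space."""
    return float(np.dot(x, y)) ** 2

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(y)), poly2_kernel(x, y))   # equal up to rounding
print(rbf_kernel(x, y, gamma=0.5))
```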
22
Hyper-parameter Influence
23
Decision Functions
24
Learning with Privileged Info
Example: Image classification with textual description
25
Learning with Privileged Info
26
Learning with Privileged Info
27
Learning with Privileged Info
28
Microsoft Malware Classification Challenge
Kaggle.com competition data (2015)
29
Problem Description
9 malware families: Ramnit, Lollipop, Kelihos ver3, Vundo, Simda, Tracur, Kelihos ver1, Obfuscator.ACY, Gatak
Raw data:
Hexadecimal representation of the raw binary content
Metadata extracted from the binaries, including function calls, strings, etc.
30
Features
Original features: information from the binary files, such as
frequencies of bytes
number of different N-grams, etc.
Privileged features: information from the disassembled code, such as
frequencies of commands
number of calls to external DLLs
Bytecode as an image: features based on image texture, which are commonly used for image classification
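The byte-level features above can be sketched in plain Python. The parser assumes that each line of a hexadecimal dump holds an address followed by space-separated two-digit hex bytes, with `??` marking unreadable bytes; this is our assumption about the competition's dump format and should be checked against the actual files. All function names are hypothetical, and `bytes_as_image` only reshapes the bytes into a grayscale grid: the texture features computed on that grid are not implemented here.

```python
from collections import Counter

def parse_hex_dump(lines):
    """Turn lines like '00401000 56 8D 44 ...' into a list of byte values.
    ASSUMPTION: the first token is an address and '??' marks unknown bytes."""
    data = []
    for line in lines:
        for tok in line.split()[1:]:   # skip the address column
            if tok != "??":
                data.append(int(tok, 16))
    return data

def byte_frequencies(data):
    """Normalized histogram over the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]

def distinct_ngrams(data, n):
    """Number of different byte n-grams in the sequence."""
    return len({tuple(data[i:i + n]) for i in range(len(data) - n + 1)})

def bytes_as_image(data, width=16):
    """Reshape bytes into rows of `width` grayscale pixels (0..255);
    an incomplete last row is dropped."""
    rows = len(data) // width
    return [data[r * width:(r + 1) * width] for r in range(rows)]

SAMPLE = [  # hypothetical two-line dump in the assumed format
    "00401000 56 8D 44 24 08 50 8B F1 E8 1C 1B 00 00 C7 06 08",
    "00401010 BB 42 00 8B C6 5E C2 04 00 CC CC CC CC CC ?? ??",
]
data = parse_hex_dump(SAMPLE)
freqs = byte_frequencies(data)
print(len(data), distinct_ngrams(data, 1), distinct_ngrams(data, 2), len(bytes_as_image(data)))
```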
31
Features
32
Experimental Setup
33
Results
34
Thanks! Any questions?