Anomaly Detection in Data Science

1 Anomaly Detection in Data Science
One-class Classification with Privileged Information for Malware Detection
Pavel Erofeev, IITP RAS, Airbus Group Russia

2 Find the Panda

3 Anomaly Detection: Hadlum vs Hadlum
The birth of a child to Mrs. Hadlum happened 349 days after Mr. Hadlum left for military service.
The average human pregnancy period is 280 days (40 weeks).
Statistically, 349 days is an outlier.
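The arithmetic behind this claim can be sketched as a z-score. The 280-day mean and the 349-day observation come from the slide; the 13-day standard deviation is an assumed value chosen only for illustration and is not a figure from the talk.

```python
MEAN_DAYS = 280          # average pregnancy length from the slide (40 weeks)
OBSERVED_DAYS = 349      # Mrs. Hadlum's case from the slide
ASSUMED_STD_DAYS = 13.0  # assumption for illustration only, not from the talk


def z_score(value, mean, std):
    """Number of standard deviations between value and mean."""
    return (value - mean) / std


excess_days = OBSERVED_DAYS - MEAN_DAYS
z = z_score(OBSERVED_DAYS, MEAN_DAYS, ASSUMED_STD_DAYS)

print(f"excess over the mean: {excess_days} days")
print(f"z-score under the assumed spread: {z:.2f}")
```

Under any plausible spread of a couple of weeks, an excess of 69 days lies several standard deviations from the mean, which is what makes the observation suspicious.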

4 "An outlier is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism" (Hawkins, 1980)

5 Defining Anomaly Detection
Digital representation: vectors describing observations.
The data is a mixture of “nominal” and “abnormal” points.
Anomaly points are generated by a different generative process than the nominal points.

6 Possible Settings in CS
Supervised (known attacks): training data are labeled “nominal” or “anomaly”.
Clean (zero-day attacks): training data are all “nominal”; test data may be contaminated with “anomaly” points.
Unsupervised (unknown attacks): training data consist of a mixture of “nominal” and “anomaly” points.

7 Real World Data Problems
Data is multivariate.
There is usually more than one generating mechanism underlying the “normal” data.
Anomalies may represent a different class of objects, so there are many of them.
What counts as an anomaly is domain specific.
Normality evolves over time.

8 Anomaly Taxonomy Point Anomaly

9 Anomaly Taxonomy Contextual Anomaly

10 Anomaly Taxonomy Causal Anomaly

11 Taxonomy

12 Imbalanced classification
Normal data: a lot of samples.
Abnormal data: very few samples.
Standard methods do not work as expected.
Standard metrics do not apply.
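A small sketch of why standard metrics mislead on imbalanced data. The 990/10 class split and the "always predict nominal" classifier are made-up illustrations, not data from the talk.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the anomaly label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fp, fn, tn):
    # Fraction of true anomalies that were found; 0.0 if there are none.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data: 990 nominal points (0) and 10 anomalies (1).
y_true = [0] * 990 + [1] * 10
always_nominal = [0] * len(y_true)  # a classifier that never raises an alarm

counts = confusion_counts(y_true, always_nominal)
print("accuracy:", accuracy(*counts))
print("recall on anomalies:", recall(*counts))
```

The trivial classifier scores 99% accuracy while finding none of the anomalies, so accuracy alone says nothing about detection quality here.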

13 Imbalanced classification
Weights for classes: proved not to be helpful in most cases.
Resampling methods: oversampling (bootstrap, SMOTE, etc.) and undersampling.
How to choose which method to use?
How to choose the resampling parameter?
We compared several methods.
We proposed a meta-model that on average gives the best results [Papanov, Erofeev, Burnaev, 2015].
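A minimal sketch of the two simplest resampling schemes named above, bootstrap oversampling and random undersampling, for binary labels. The function names and the `target_ratio` parameter (desired minority-to-majority ratio) are hypothetical; SMOTE and the authors' meta-model are not shown.

```python
import random
from collections import Counter


def oversample_minority(samples, labels, minority, target_ratio, seed=0):
    """Bootstrap minority samples until minority/majority reaches target_ratio.

    Assumes binary labels: everything that is not `minority` is majority.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_count = len(labels) - len(minority_idx)
    needed = max(0, int(target_ratio * majority_count) - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(needed)]  # with replacement
    keep = list(range(len(labels))) + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]


def undersample_majority(samples, labels, minority, target_ratio, seed=0):
    """Randomly drop majority samples until minority/majority reaches target_ratio."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    keep_count = min(len(majority_idx), int(len(minority_idx) / target_ratio))
    keep = sorted(rng.sample(majority_idx, keep_count) + minority_idx)
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Hypothetical data: 95 nominal points (0) and 5 anomalies (1).
samples = list(range(100))
labels = [0] * 95 + [1] * 5

_, over_labels = oversample_minority(samples, labels, minority=1, target_ratio=0.5)
_, under_labels = undersample_majority(samples, labels, minority=1, target_ratio=0.5)
print("oversampled:", Counter(over_labels))
print("undersampled:", Counter(under_labels))
```

The `target_ratio` argument plays the role of the resampling parameter whose choice the slide asks about; nothing in this sketch says how to pick it.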

14 Statistics-based models
These models make an assumption about how normal data is generated (e.g. a Gaussian distribution).
PCA is commonly used to extract the combinations of features with the highest variance.
PCA-based anomaly detection works well in highly correlated environments.
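One common way to use PCA for anomaly detection is to score a point by its reconstruction error, that is, its distance from the subspace spanned by the leading components. The sketch below is a generic illustration in pure Python with a single component found by power iteration; the data (points near the line y = 2x) is invented, and this is not necessarily the variant used in the talk.

```python
import math


def fit_mean_and_covariance(rows):
    """Column means and sample covariance matrix of a list of points."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    return means, cov


def top_component(cov, iterations=200):
    """Leading eigenvector of a covariance matrix via power iteration."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(len(vec))) for row in cov]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def reconstruction_error(point, means, component):
    """Distance between a point and its projection onto the first component."""
    centered = [p - m for p, m in zip(point, means)]
    score = sum(c * w for c, w in zip(centered, component))
    residual = [c - score * w for c, w in zip(centered, component)]
    return math.sqrt(sum(r * r for r in residual))


# Hypothetical nominal data: two highly correlated features, y close to 2x.
nominal = [(float(x), 2.0 * x + (0.1 if x % 2 == 0 else -0.1)) for x in range(10)]
means, cov = fit_mean_and_covariance(nominal)
component = top_component(cov)

err_nominal = reconstruction_error((5.0, 10.0), means, component)
err_anomaly = reconstruction_error((5.0, 0.0), means, component)
print(f"follows the correlation: {err_nominal:.3f}")
print(f"breaks the correlation:  {err_anomaly:.3f}")
```

Both test points have feature values inside the range seen in training; only the second one violates the correlation between the features, which is why the method suits highly correlated data.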

15 Density-based models
SVM-based and nearest-neighbour-based models.
How to choose the best kernel parameter?
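A minimal sketch of a nearest-neighbour anomaly score: the distance from a query point to its k-th nearest neighbour in a reference set of nominal points. The grid data and the choice k = 3 are illustrative assumptions; k is a hyper-parameter that has to be chosen, much like the kernel parameter of an SVM-based model.

```python
import math


def knn_score(point, reference, k):
    """Distance from `point` to its k-th nearest neighbour in `reference`.

    Larger scores mean the point lies in a sparser region.
    """
    distances = sorted(math.dist(point, other) for other in reference)
    return distances[k - 1]


# Hypothetical nominal data: a 5 x 5 grid of points with integer coordinates.
reference = [(float(x), float(y)) for x in range(5) for y in range(5)]

score_inside = knn_score((2.5, 2.5), reference, k=3)
score_outside = knn_score((10.0, 10.0), reference, k=3)
print(f"inside the cloud:  {score_inside:.3f}")
print(f"far from the cloud: {score_outside:.3f}")
```

A threshold on this score then separates nominal points from anomalies; how to set it is a separate question not covered by this sketch.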

16 One-class SVM with Privileged Information
Evgeny Burnaev, Dmitry Smolyakov
Skoltech, IITP RAS

17 One-Class SVM

18 One-Class SVM

19 One-Class SVM

20 One-Class SVM Kernel Trick

21 Kernel Trick
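As a generic illustration of the kernel trick named in the slide title (not a reconstruction of the slide's own content): a kernel computes an inner product in a feature space without building the feature vectors. The sketch below checks this for the homogeneous degree-2 polynomial kernel in two dimensions; the function names are hypothetical.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel for 2-D inputs: (x . y) ** 2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def poly2_feature_map(x):
    """Explicit feature map whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


x, y = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly2_kernel(x, y)
via_features = dot(poly2_feature_map(x), poly2_feature_map(y))
print(via_kernel, via_features)
```

The two numbers agree up to rounding, but the kernel never constructs the 3-dimensional feature vectors; for kernels such as the Gaussian (RBF) kernel the feature space is infinite-dimensional and only the kernel form is practical.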

22 Hyper-parameter Influence

23 Decision Functions

24 Learning with Privileged Info
Example: Image classification with textual description

25 Learning with Privileged Info

26 Learning with Privileged Info

27 Learning with Privileged Info

28 Microsoft Malware Classification Challenge
Kaggle.com competition data (2015)

29 Problem Description
9 malware families: Ramnit, Lollipop, Kelihos ver3, Vundo, Simda, Tracur, Kelihos ver1, Obfuscator.ACY, Gatak.
Raw data:
  Hexadecimal representation of the raw binary content
  Meta-data extracted from the binaries, including function calls, strings, etc.

30 Features
Original features: information from binary files, such as
  Frequencies of bytes
  Number of different N-grams, etc.
Privileged features: information from the disassembled code, such as
  Frequencies of commands
  Number of calls to external DLLs
Bytecode as an image: features based on image texture, which are commonly used for image classification
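A minimal sketch of the two "original" features listed above, byte frequencies and the number of distinct N-grams, computed from a hexadecimal dump. The line layout (an address followed by two-digit hex bytes, with "??" marking unreadable bytes) and all function names are assumptions made for illustration, not details taken from the talk.

```python
from collections import Counter


def parse_hex_dump(text):
    """Turn lines of 'ADDRESS B0 B1 ...' into a list of byte values.

    The layout is an assumption; tokens that are not valid two-digit hex
    (for example '??') are skipped.
    """
    values = []
    for line in text.splitlines():
        tokens = line.split()
        for token in tokens[1:]:  # tokens[0] is assumed to be the address
            if len(token) != 2:
                continue
            try:
                values.append(int(token, 16))
            except ValueError:
                continue
    return values


def byte_frequencies(values):
    """Relative frequency of each byte value that occurs."""
    if not values:
        return {}
    counts = Counter(values)
    return {byte: count / len(values) for byte, count in counts.items()}


def distinct_ngrams(values, n):
    """Number of different length-n byte sequences.

    Simplification: skipped tokens are ignored, so an N-gram may span a gap.
    """
    return len({tuple(values[i:i + n]) for i in range(len(values) - n + 1)})


sample = "00401000 56 8D 44 24\n00401004 56 8D ?? 00\n"  # made-up dump
values = parse_hex_dump(sample)
frequencies = byte_frequencies(values)
print("bytes parsed:", len(values))
print("frequency of 0x56:", frequencies[0x56])
print("distinct 2-grams:", distinct_ngrams(values, 2))
```

The privileged features would be computed in the same spirit from the disassembly, but they are available only at training time, which is what makes them privileged.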

31 Features

32 Experimental Setup

33 Results

34 Thanks! Any questions?

