Anomaly Detection in Data Science

Name: Anomaly Detection in Data Science
Uploaded: 2018-01-15T04:30:38+00:00
Duration: PTM6S2
Channel: Luke Hines
Description: Anomaly Detection in Data Science

Anomaly Detection in Data Science
One-class Classification with Privileged Information for Malware Detection
Pavel Erofeev, IITP RAS, Airbus Group Russia

Find the Panda

Anomaly Detection: Hadlum vs Hadlum
The birth of a child to Mrs. Hadlum happened 349 days after Mr. Hadlum left for military service.
The average human pregnancy lasts 280 days (40 weeks).
Statistically, 349 days is an outlier: it is 69 days longer than the average.
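The reasoning in the Hadlum example is a simple deviation test: measure how many standard deviations the observation lies from the mean and flag it above a threshold. The slide gives only the 280-day mean; the 13-day standard deviation and the 3-sigma threshold below are hypothetical values chosen for illustration, not figures from the source.

```python
MEAN_DAYS = 280.0  # average pregnancy length, from the slide
SD_DAYS = 13.0     # HYPOTHETICAL standard deviation, for illustration only


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations by which x differs from the mean."""
    return (x - mean) / sd


def is_outlier(x: float, mean: float, sd: float, k: float = 3.0) -> bool:
    """Flag x when it lies more than k standard deviations from the mean."""
    return abs(z_score(x, mean, sd)) > k


z = z_score(349, MEAN_DAYS, SD_DAYS)
print(f"z = {z:.2f}, outlier: {is_outlier(349, MEAN_DAYS, SD_DAYS)}")
# prints "z = 5.31, outlier: True"
```

The conclusion is sensitive to the assumed spread: with a much larger standard deviation the same 69-day gap would no longer cross the threshold.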

An outlier is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism. (Hawkins, 1980)

Defining Anomaly Detection
Observations are represented digitally as vectors.
The data are a mixture of “nominal” and “abnormal” points.
Anomalous points are generated by a different generative process than the nominal points.

Possible Settings in CS
Supervised (known attacks): training data are labeled as “nominal” or “anomaly”.
Clean (zero-day attacks): training data are all “nominal”; test data may be contaminated with “anomaly” points.
Unsupervised (unknown attacks): training data consist of a mixture of “nominal” and “anomaly” points.

Real World Data Problems
Data are multivariate.
There is usually more than one generating mechanism underlying the “normal” data.
Anomalies may represent a different class of objects, so there are many of them.
What counts as an anomaly is domain-specific.
Normality evolves over time.

Anomaly Taxonomy: Point Anomaly

Anomaly Taxonomy: Contextual Anomaly

Anomaly Taxonomy: Causal Anomaly

Taxonomy

Imbalanced classification
Normal data: many samples.
Abnormal data: very few samples.
Standard methods do not work as expected.
Standard metrics do not apply.
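Why standard metrics mislead on imbalanced data can be shown with a toy example. The labels below are hypothetical (990 normal points, 10 anomalies): a model that never reports an anomaly reaches 99% accuracy yet has zero recall, while a model that catches most anomalies scores lower accuracy but is far more useful. Anomalies are treated as the positive class.

```python
def confusion(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) counts for a binary labeling."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def report(y_true, y_pred):
    """Accuracy plus precision/recall/F1 for the anomaly (positive) class."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels: 990 normal (0) followed by 10 anomalous (1) points.
y_true = [0] * 990 + [1] * 10

always_normal = [0] * 1000                              # never flags anything
detector = [0] * 970 + [1] * 20 + [0] * 2 + [1] * 8     # 20 false alarms, 8 of 10 anomalies caught

print(report(y_true, always_normal))
print(report(y_true, detector))
```

The first model reports accuracy 0.99 with recall 0.0; the second drops to accuracy 0.978 but reaches recall 0.8, which is why precision, recall, or F1 on the rare class are usually reported instead of accuracy.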

Imbalanced classification
Class weights: proved not to be helpful in most cases.
Resampling methods:
  Oversampling (bootstrap, SMOTE, etc.)
  Undersampling
How to choose which method to use? How to choose the resampling parameter?
We compared several methods and proposed a meta-model that on average gives the best results [Papanov, Erofeev, Burnaev, 2015].
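The two oversampling ideas named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the meta-model from the cited work: `random_oversample` is plain bootstrap resampling of minority rows, and `smote` follows the basic SMOTE idea (Chawla et al., 2002) of creating synthetic points on segments between a minority point and one of its k nearest minority neighbours. Function names and defaults are hypothetical; `n_new` is the resampling parameter the slide asks how to choose.

```python
import numpy as np


def random_oversample(X_min: np.ndarray, n_new: int, seed: int = 0) -> np.ndarray:
    """Bootstrap: draw n_new minority rows with replacement."""
    rng = np.random.default_rng(seed)
    return X_min[rng.integers(0, len(X_min), size=n_new)]


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Basic SMOTE sketch: interpolate between a random minority point and one
    of its k nearest minority neighbours. Needs at least 2 minority points."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances among minority samples.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)      # anchor point for each new sample
    pick = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))               # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


# Usage on hypothetical data: 10 minority points in 2-D, 40 synthetic ones.
rng = np.random.default_rng(42)
X_min = rng.normal(size=(10, 2))
boot = random_oversample(X_min, 40)
synth = smote(X_min, 40, k=3)
print(boot.shape, synth.shape)
```

Bootstrap only duplicates existing rows, while SMOTE adds new points between them; both are typically applied to the training split only, so evaluation still uses real anomalies.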

Statistics-based models
Statistics-based models assume a procedure that generates the normal data (e.g. a Gaussian distribution).
PCA is commonly used to extract the linear combinations of features with the highest variance.
PCA-based anomaly detection works well when features are highly correlated.
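A common way to turn PCA into an anomaly score is reconstruction error: project a point onto the top principal components learned from normal data and measure what is lost. When features are highly correlated, normal points lie close to the principal subspace and reconstruct well, while points that break the correlation do not. This is a minimal NumPy sketch; the function name, the number of components, and the toy data are assumptions for illustration.

```python
import numpy as np


def pca_anomaly_scores(X_train: np.ndarray, X_test: np.ndarray, n_components: int = 1) -> np.ndarray:
    """Squared reconstruction error of X_test after projecting onto the top
    principal components of X_train (higher = more anomalous)."""
    mu = X_train.mean(axis=0)
    # Rows of Vt are principal directions of the centred training data.
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    W = Vt[:n_components].T              # d x n_components orthonormal basis
    Z = (X_test - mu) @ W                # coordinates in the principal subspace
    X_hat = Z @ W.T + mu                 # reconstruction in the original space
    return ((X_test - X_hat) ** 2).sum(axis=1)


# Hypothetical correlated "normal" data: second feature is about twice the first.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X_train = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=200)])

# The first test point follows the correlation, the second breaks it.
X_test = np.array([[1.0, 2.0], [1.0, -2.0]])
scores = pca_anomaly_scores(X_train, X_test)
print(scores)
```

Neither test point is extreme in any single feature, so a per-feature threshold would miss the second one; the PCA residual catches it because it violates the correlation structure.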

Density-based models: SVM-based and nearest-neighbour-based
How to choose the best kernel parameter?
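The nearest-neighbour family can be illustrated with the distance to the k-th nearest training point: in sparse regions, which correspond to low density, that distance is large. This is a minimal sketch with hypothetical names and data; k plays a smoothing role similar to the kernel width in SVM-based models, and it raises the same tuning question.

```python
import numpy as np


def knn_scores(X_train: np.ndarray, X_test: np.ndarray, k: int = 5) -> np.ndarray:
    """Anomaly score = Euclidean distance to the k-th nearest training point.
    If X_test rows are also in X_train, each row's own zero distance counts
    as one neighbour, so use k + 1 in that case."""
    d = np.sqrt(((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1))
    d.sort(axis=1)
    return d[:, k - 1]


# Hypothetical nominal data and two query points: one central, one far away.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
X_test = np.array([[0.0, 0.0], [5.0, 5.0]])
scores = knn_scores(X_train, X_test, k=5)
print(scores)
```

The brute-force distance matrix is fine for small data; larger sets would normally use a spatial index instead.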

One-class SVM with Privileged Information
Evgeny Burnaev, Dmitry Smolyakov (Skoltech, IITP RAS)

One-Class SVM
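In the "clean" setting (training data all nominal), a one-class SVM learns a boundary around the training data and flags test points that fall outside it. The sketch below uses scikit-learn's `OneClassSVM` with an RBF kernel, assuming scikit-learn is installed; it is the standard one-class SVM, not the privileged-information variant presented in the talk. The data and the `nu`/`gamma` values are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))          # nominal training data only
X_test = np.array([[0.0, 0.0], [5.0, 5.0]])  # a central point and a distant one

# nu bounds the fraction of training points treated as outliers;
# gamma is the width parameter of the RBF kernel.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X_train)

labels = clf.predict(X_test)              # +1 = nominal, -1 = anomaly
scores = clf.decision_function(X_test)    # signed score; negative = outside the boundary
print(labels, scores)
```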

One-Class SVM: Kernel Trick

Kernel Trick
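The kernel trick replaces an explicit feature map with a kernel function that computes inner products in feature space directly. For 2-D inputs, the degree-2 polynomial kernel k(x, y) = (x · y)² equals the inner product of the explicit features φ(x) = (x₁², √2·x₁x₂, x₂²), which the sketch below checks numerically. It also computes the Gaussian RBF kernel, whose implicit feature space is infinite-dimensional, so only the kernel form is usable. Function names and data are illustrative.

```python
import numpy as np


def poly2_kernel(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return (X @ Y.T) ** 2


def poly2_features(X: np.ndarray) -> np.ndarray:
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def rbf_kernel(X: np.ndarray, Y: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

Phi = poly2_features(X)
K_trick = poly2_kernel(X, X)      # computed without building features
K_explicit = Phi @ Phi.T          # computed through the explicit feature map
print(np.allclose(K_trick, K_explicit))  # True

# Larger gamma -> narrower kernel -> off-diagonal similarities shrink toward 0.
K_wide = rbf_kernel(X, X, gamma=0.1)
K_narrow = rbf_kernel(X, X, gamma=10.0)
```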

Hyper-parameter Influence

Decision Functions
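The decision-function plots themselves were not preserved. For reference, the textbook one-class SVM of Schölkopf et al. (2001) separates the mapped data φ(x) from the origin with maximum margin; its decision function is a kernel expansion over the support vectors. This is the standard formulation, not necessarily the exact variant used in the talk.

```latex
\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{s.t.}\quad \langle w, \phi(x_i)\rangle \ge \rho - \xi_i,\qquad \xi_i \ge 0,

f(x) = \operatorname{sgn}\Big(\sum_{i=1}^{n}\alpha_i K(x_i, x) - \rho\Big),
\qquad 0 \le \alpha_i \le \frac{1}{\nu n},\qquad \sum_{i=1}^{n}\alpha_i = 1.
```

Points with f(x) = +1 are treated as nominal and points with f(x) = -1 as anomalies; only the x_i with α_i > 0 (the support vectors) contribute to the sum.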

Learning with Privileged Info
Example: Image classification with textual description

Learning with Privileged Info

Microsoft Malware Classification Challenge
Kaggle.com competition data (2015)

Problem Description
9 malware families: Ramnit, Lollipop, Kelihos ver3, Vundo, Simda, Tracur, Kelihos ver1, Obfuscator.ACY, Gatak.
Raw data:
  Hexadecimal representation of the raw binary content.
  Metadata extracted from the binaries, including function calls, strings, etc.

Features
Original features: information from binary files, such as
  Frequencies of bytes
  Number of different N-grams, etc.
Privileged features: information from code disassembly, such as
  Frequencies of commands
  Number of calls to external DLLs
Bytecode as an image: features based on image texture, which are commonly used for image classification.
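The original byte-level features can be sketched from a hexadecimal dump. This assumes a hypothetical line layout in which each line starts with an address followed by two-character hex bytes, with `??` marking unreadable bytes; the parser, function names, and sample lines are illustrative, not the authors' feature pipeline. `bytes_to_image` shows the basic "bytecode as an image" step (reshaping the byte stream into a grayscale grid); the texture features computed on that image are not shown.

```python
import numpy as np


def parse_hex_line(line: str) -> list:
    """Parse one dump line: first token assumed to be an address, the rest
    hex bytes. Tokens that are not valid hex bytes (e.g. '??') are skipped."""
    out = []
    for tok in line.split()[1:]:
        if len(tok) == 2:
            try:
                out.append(int(tok, 16))
            except ValueError:
                pass  # unreadable byte such as '??'
    return out


def byte_features(lines, n: int = 2):
    """Return (256-bin byte frequency vector, number of distinct byte n-grams).
    Skipped '??' bytes are simply dropped, so their neighbours become adjacent."""
    data = [b for line in lines for b in parse_hex_line(line)]
    counts = np.bincount(np.array(data, dtype=np.int64), minlength=256)
    freq = counts / max(len(data), 1)
    distinct_ngrams = {tuple(data[i:i + n]) for i in range(len(data) - n + 1)}
    return freq, len(distinct_ngrams), data


def bytes_to_image(data, width: int = 16) -> np.ndarray:
    """Reshape a byte stream into a 2-D uint8 grayscale image (tail truncated)."""
    rows = len(data) // width
    return np.array(data[: rows * width], dtype=np.uint8).reshape(rows, width)


# Hypothetical sample lines in the assumed format.
sample = [
    "00401000 56 8D 44 24 08 50 8B F1",
    "00401008 E8 1C ?? 00 00 C7 06 08",
]
freq, n_distinct, data = byte_features(sample)
img = bytes_to_image(data, width=4)
print(len(data), n_distinct, img.shape)
```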

Features

Experimental Setup

Results

Thanks! Any questions?
