Anomaly Detection in Data Science

Presentation transcript:

Anomaly Detection in Data Science
One-class Classification with Privileged Information for Malware Detection
Pavel Erofeev, IITP RAS, Airbus Group Russia

Find the Panda

Anomaly Detection: Hadlum vs Hadlum
- Mrs. Hadlum gave birth 349 days after Mr. Hadlum left for military service
- The average human pregnancy lasts 280 days (40 weeks)
- Statistically, 349 days is an outlier
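A minimal sketch of the arithmetic behind this slide. The 280-day mean and the 349-day observation come from the slide; the standard deviation is NOT given there, so `ASSUMED_STD_DAYS` is a purely hypothetical value used only to show how a z-score would be computed.

```python
def z_score(value, mean, std):
    """Number of standard deviations between value and mean."""
    return (value - mean) / std

MEAN_DAYS = 280          # average pregnancy length, from the slide
OBSERVED_DAYS = 349      # Hadlum case, from the slide
ASSUMED_STD_DAYS = 10.0  # hypothetical value, for illustration only

excess_days = OBSERVED_DAYS - MEAN_DAYS
z = z_score(OBSERVED_DAYS, MEAN_DAYS, ASSUMED_STD_DAYS)

print(f"{excess_days} days above the mean")
print(f"z-score under the assumed std: {z:.1f}")
```

Whatever spread is assumed, the observation lies 69 days above the mean; how extreme that is depends on the standard deviation actually used.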

An outlier is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism. (Hawkins, 1980)

Defining Anomaly Detection
- Observations are represented digitally as vectors
- The data is a mixture of “nominal” and “abnormal” points
- Anomalous points are generated by a different generative process than the nominal points

Possible Settings in CS
- Supervised (known attacks): training data is labeled “nominal” or “anomaly”
- Clean (zero-day attacks): training data is all “nominal”; test data may be contaminated with “anomaly” points
- Unsupervised (unknown attacks): training data is a mixture of “nominal” and “anomaly” points

Real World Data Problems
- Data is multivariate
- There is usually more than one generating mechanism underlying the “normal” data
- Anomalies may represent a different class of objects, so there are many of them
- The definition of what counts as an anomaly is domain specific
- Normality evolves in time

Anomaly Taxonomy Point Anomaly

Anomaly Taxonomy Contextual Anomaly

Anomaly Taxonomy Causal Anomaly

Taxonomy

Imbalanced classification
- Normal data: a lot of samples
- Abnormal data: very few samples
- Standard methods do not work as expected
- Standard metrics do not apply
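To illustrate why standard metrics can mislead on imbalanced data, here is a small sketch. The 990/10 class split and the "always predict normal" classifier are made-up illustrations, not data from the talk.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positives (anomalies) that were detected."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

# Hypothetical data: 990 normal points (0) and 10 anomalies (1).
y_true = [0] * 990 + [1] * 10
# A trivial classifier that labels everything as normal.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall on anomalies = {rec:.2f}")
```

The trivial classifier scores high on accuracy while detecting no anomalies at all, which is the failure mode the slide points at.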

Imbalanced classification
- Weights for classes: proved not to be helpful in most cases
- Resampling methods
  - Oversampling (Bootstrap, SMOTE, etc.)
  - Undersampling
- How to choose which method to use?
- How to choose the resampling parameter?
- We compared several methods
- We proposed a meta-model that on average gives the best results [Papanov, Erofeev, Burnaev, 2015]
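The two resampling families can be sketched as follows. This is plain random resampling, not SMOTE (which interpolates new minority points) and not the authors' meta-model. The function names and the `ratio` parameter (target minority-to-majority size ratio) are assumptions made for this illustration.

```python
import random

def random_oversample(majority, minority, ratio=1.0, seed=0):
    """Duplicate minority samples (drawn with replacement) until
    len(minority) is about ratio * len(majority)."""
    rng = random.Random(seed)
    target = int(ratio * len(majority))
    extra = max(0, target - len(minority))
    return majority, minority + [rng.choice(minority) for _ in range(extra)]

def random_undersample(majority, minority, ratio=1.0, seed=0):
    """Drop majority samples (without replacement) until
    len(minority) / len(majority) is about ratio."""
    rng = random.Random(seed)
    target = min(len(majority), int(len(minority) / ratio))
    return rng.sample(majority, target), minority

# Hypothetical toy data: 100 normal samples, 5 abnormal ones.
normal = list(range(100))
abnormal = [1000, 1001, 1002, 1003, 1004]

over_major, over_minor = random_oversample(normal, abnormal, ratio=1.0)
under_major, under_minor = random_undersample(normal, abnormal, ratio=1.0)
print(len(over_major), len(over_minor))    # sizes after oversampling
print(len(under_major), len(under_minor))  # sizes after undersampling
```

Here `ratio` plays the role of the resampling parameter whose choice the slide asks about.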

Statistics-based models
- Assume a generation procedure for the normal data (e.g. Gaussian distribution)
- PCA is commonly used to extract the highest-variance combinations of features in the data
- PCA-based anomaly detection works well in highly correlated environments
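A minimal sketch of the Gaussian assumption in one dimension: fit a mean and standard deviation on data taken to be normal, then flag points that lie too many standard deviations away. The training values and the threshold of 3 are illustrative assumptions; the PCA-based variant is not shown.

```python
import statistics

def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of the normal data."""
    return statistics.fmean(values), statistics.stdev(values)

def is_anomaly(x, mean, std, threshold=3.0):
    """Flag x if it lies more than `threshold` standard deviations from the mean."""
    return abs(x - mean) / std > threshold

# Hypothetical sensor readings assumed to be "nominal".
train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mean, std = fit_gaussian(train)

flags = [is_anomaly(x, mean, std) for x in [10.1, 12.5]]
print(f"mean={mean:.2f}, std={std:.2f}, flags={flags}")
```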

Density-based models
- SVM-based and nearest-neighbour-based
- How to choose the best kernel parameter?
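One common nearest-neighbour anomaly score, given here as an assumed example rather than the specific method used in the talk, is the distance to the k-th nearest training point: points in sparse regions get large scores. The toy points and `k=3` are hypothetical.

```python
import math

def knn_score(point, train, k=3):
    """Anomaly score: Euclidean distance from `point` to its k-th nearest
    neighbour in `train`. Larger means more isolated."""
    distances = sorted(math.dist(point, t) for t in train)
    return distances[k - 1]

# Hypothetical 2-D training set of nominal points.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]

near = knn_score((0.5, 0.4), train)  # inside the cluster
far = knn_score((5.0, 5.0), train)   # far from every training point
print(f"near={near:.3f}, far={far:.3f}")
```

A threshold on this score would then separate nominal from anomalous points; like the kernel parameter mentioned on the slide, `k` and the threshold have to be chosen somehow.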

One-class SVM with Privileged Information
Evgeny Burnaev, Dmitry Smolyakov
Skoltech, IITP RAS

One-Class SVM

One-Class SVM

One-Class SVM
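The formulas on these slides did not survive in the transcript. For reference, the standard one-class SVM formulation (Schölkopf et al.) is sketched below. The symbols w, ρ, ξ_i, ν, Φ and n follow that formulation and are assumptions here, not notation confirmed by the slides.

```latex
\min_{w,\,\xi,\,\rho}\quad
  \frac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i
  - \rho
\quad\text{s.t.}\quad
  \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\quad
  \xi_i \ge 0,\quad i = 1,\dots,n

f(x) = \operatorname{sign}\bigl(\langle w, \Phi(x)\rangle - \rho\bigr)
```

In this formulation ν ∈ (0, 1] is an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors.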

One-Class SVM: Kernel Trick

Kernel Trick
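The kernel trick replaces inner products in feature space with a kernel function evaluated on the original points. The slides do not say which kernel was used; the sketch below assumes the Gaussian (RBF) kernel as a common choice, with `gamma` as its width parameter.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def gram_matrix(points, gamma=1.0):
    """Matrix of pairwise kernel values K[i][j] = k(points[i], points[j])."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]

# Hypothetical 2-D points.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = gram_matrix(points, gamma=1.0)
for row in K:
    print([round(v, 4) for v in row])
```

A larger `gamma` makes the kernel more local, so the decision boundary of a kernel method hugs the training data more tightly.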

Hyper-parameter Influence

Decision Functions

Learning with Privileged Info
Example: image classification with textual description

Learning with Privileged Info

Learning with Privileged Info

Learning with Privileged Info

Microsoft Malware Classification Challenge
Kaggle.com competition data (2015)

Problem Description
- 9 malware families: Ramnit, Lollipop, Kelihos ver3, Vundo, Simda, Tracur, Kelihos ver1, Obfuscator.ACY, Gatak
- Raw data
  - Hexadecimal representation of the raw binary content
  - Meta-data extracted from the binaries, including function calls, strings, etc.

Features
- Original features: information from binary files, such as
  - Frequencies of bytes
  - Number of different N-grams, etc.
- Privileged features: information from the disassembled code, such as
  - Frequencies of commands
  - Number of calls to external DLLs
- Bytecode as an image
  - Features based on image texture, which are commonly used for image classification
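The byte-level features named on this slide can be sketched as below. This is an illustration only: the function names, the choice of 2-grams, and the six-byte sample are assumptions, and parsing of the competition's actual file format is not shown.

```python
from collections import Counter

def byte_frequencies(data: bytes):
    """Relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data)
    return [counts.get(b, 0) / total if total else 0.0 for b in range(256)]

def count_distinct_ngrams(data: bytes, n: int = 2):
    """Number of different byte n-grams that occur in the data."""
    return len({data[i:i + n] for i in range(len(data) - n + 1)})

# Hypothetical sample: six bytes given in hexadecimal form.
sample = bytes.fromhex("4D5A90004D5A")

freqs = byte_frequencies(sample)
distinct_bigrams = count_distinct_ngrams(sample, n=2)
print(f"freq(0x4D) = {freqs[0x4D]:.3f}, distinct 2-grams = {distinct_bigrams}")
```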

Features

Experimental Setup

Results

Thanks! Any questions? pavel.erofeev@phystech.edu