Anomaly Detection in Data Science

Presentation transcript:

Anomaly Detection in Data Science. One-class Classification with Privileged Information for Malware Detection. Pavel Erofeev, IITP RAS, Airbus Group Russia

Find the Panda

Anomaly Detection: Hadlum vs Hadlum. The birth of a child to Mrs. Hadlum happened 349 days after Mr. Hadlum left for military service. The average human pregnancy period is 280 days (40 weeks). Statistically, 349 days is an outlier: it exceeds the average by 69 days.

An outlier is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism. (Hawkins, 1980)

Defining Anomaly Detection. Observations are described by digital representation vectors. The data are a mixture of “nominal” and “abnormal” points. Anomaly points are generated by a different generative process than the nominal points.

Possible Settings in CS. Supervised (known attacks): training data are labeled “nominal” or “anomaly”. Clean (zero-day attacks): training data are all “nominal”; test data may be contaminated with “anomaly” points. Unsupervised (unknown attacks): training data consist of a mixture of “nominal” and “anomaly” points.

Real World Data Problems. Data is multivariate. There is usually more than one generating mechanism underlying the “normal” data. Anomalies may represent a different class of objects, so there are many of them. The definition of what counts as an anomaly is domain specific. Normality evolves in time.

Anomaly Taxonomy Point Anomaly

Anomaly Taxonomy Contextual Anomaly

Anomaly Taxonomy Causal Anomaly

Taxonomy

Imbalanced classification. Normal data: a lot of samples. Abnormal data: very few. Standard methods do not work as expected. Standard metrics do not apply.
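A minimal sketch of why a standard metric such as accuracy can mislead on imbalanced data. The label counts (990 nominal, 10 anomalies) and the helper names are hypothetical, chosen only for illustration; they do not come from the slides.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


# Hypothetical toy labels: 0 = nominal, 1 = anomaly.
y_true = [0] * 990 + [1] * 10
# A trivial "detector" that always predicts nominal.
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy = {accuracy:.2f}, recall on anomalies = {recall:.2f}")
```

The trivial detector scores high accuracy while finding no anomalies at all, which is why metrics computed on the rare class (recall, precision) are usually examined instead.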

Imbalanced classification. Weights for classes: proved not to be helpful in most cases. Resampling methods: oversampling (Bootstrap, SMOTE, etc.) and undersampling. How to choose which method to use? How to choose the resampling parameter? We compared several methods. We proposed a meta-model that on average gives the best results [Papanov, Erofeev, Burnaev, 2015].
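As an illustration of the simplest resampling method listed above, here is a sketch of random (bootstrap) oversampling of the minority class. It is not the meta-model from the cited work. The function name, the `ratio` parameter (target minority-to-majority size ratio, standing in for the "resampling parameter") and the toy data are assumptions made for this example.

```python
import random


def random_oversample(X, y, minority_label, ratio=1.0, seed=0):
    """Duplicate minority samples (drawn with replacement) until the
    minority class has about ratio * len(majority) samples."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    if not minority:
        raise ValueError("no minority samples to resample")
    target = round(ratio * len(majority))
    n_extra = max(0, target - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    indices = majority + minority + extra
    rng.shuffle(indices)
    return [X[i] for i in indices], [y[i] for i in indices]


# Hypothetical toy data: 95 nominal points (label 0), 5 anomalies (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

X_res, y_res = random_oversample(X, y, minority_label=1)
print(y_res.count(0), y_res.count(1))
```

Resampling should be applied to the training split only, so that duplicated points do not leak into the evaluation data.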

Statistics-based models. Assumption on the normal data generation procedure (e.g. Gaussian distribution). PCA is a method commonly used to extract the highest-variance combinations of features in data. PCA-based anomaly detection is good for highly correlated environments.
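A minimal sketch of the Gaussian assumption only (it does not cover PCA): fit a mean and standard deviation on nominal data, then flag points whose z-score exceeds a threshold. The training values below are made-up pregnancy durations in days, used only to echo the Hadlum example; the threshold of 3 is a common convention, not a value from the slides.

```python
import statistics


def fit_gaussian(train):
    """Estimate mean and sample standard deviation from nominal data."""
    return statistics.mean(train), statistics.stdev(train)


def z_score(x, mean, std):
    """Distance of x from the mean, in standard deviations."""
    return abs(x - mean) / std


def is_anomaly(x, mean, std, threshold=3.0):
    return z_score(x, mean, std) > threshold


# Hypothetical nominal training data (days); not real measurements.
train = [270, 275, 280, 285, 290, 278, 282, 268, 292, 280]
mean, std = fit_gaussian(train)

for days in (285, 349):
    print(days, round(z_score(days, mean, std), 2), is_anomaly(days, mean, std))
```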

Density-based models. SVM-based and nearest-neighbour-based. How to choose the best kernel parameter?
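One simple nearest-neighbour score, sketched under assumptions: the anomaly score of a point is its distance to the k-th nearest training point, so points in sparse regions get large scores. The slides do not say which nearest-neighbour variant was used; the function name, `k`, and the toy points are illustrative.

```python
import math


def knn_score(point, train, k=3):
    """Distance from `point` to its k-th nearest neighbour in `train`."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and len(train)")
    distances = sorted(math.dist(point, t) for t in train)
    return distances[k - 1]


# Hypothetical nominal training points clustered near the origin.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]

inlier_score = knn_score((0.5, 0.4), train)
outlier_score = knn_score((5.0, 5.0), train)
print(inlier_score, outlier_score)
```

A threshold on this score then separates nominal from anomalous points; choosing `k` and the threshold plays the same role as choosing the kernel parameter in the SVM-based approach.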

One-class SVM with Privileged Information. Evgeny Burnaev, Dmitry Smolyakov (Skoltech, IITP RAS)

One-Class SVM
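The formulas on this slide did not survive in the transcript. For orientation, the commonly used ν-formulation of the one-class SVM is sketched below; the notation ($w$, $\rho$, $\xi_i$, $\Phi$, $\nu$, $n$) is an assumption and may differ from the notation on the original slides.

```latex
\min_{w,\,\xi,\,\rho}\;
  \frac{1}{2}\lVert w \rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i
  - \rho
\quad\text{subject to}\quad
  \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\qquad
  \xi_i \ge 0,\qquad i = 1,\dots,n,

f(x) = \operatorname{sign}\bigl(\langle w, \Phi(x)\rangle - \rho\bigr).
```

Here $x_1,\dots,x_n$ are the training points, $\Phi$ is the feature map, and $\nu \in (0, 1]$ upper-bounds the fraction of training points treated as outliers. Points with $f(x) = -1$ are reported as anomalies.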

One-Class SVM Kernel Trick

Kernel Trick
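A small sketch of the idea behind the kernel trick: inner products in feature space are replaced by a kernel function evaluated on the original points. The Gaussian (RBF) kernel is used here as an assumption, since the transcript does not say which kernel the slides showed; `gamma` is the kernel parameter whose choice the earlier slide asks about.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, gamma=1.0):
    """Matrix of pairwise kernel values, used in place of inner products."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Hypothetical toy points.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]

for gamma in (0.1, 1.0):
    K = gram_matrix(points, gamma)
    print(gamma, [[round(v, 4) for v in row] for row in K])
```

A larger `gamma` makes the kernel value fall off faster with distance, which generally produces a tighter, more wiggly decision boundary; a smaller `gamma` produces a smoother one.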

Hyper-parameter Influence

Decision Functions

Learning with Privileged Info. Example: image classification with textual description.

Microsoft Malware Classification Challenge Kaggle.com competition data (2015)

Problem Description. 9 malware families: Ramnit, Lollipop, Kelihos ver3, Vundo, Simda, Tracur, Kelihos ver1, Obfuscator.ACY, Gatak. Raw data: hexadecimal representation of the raw binary content; meta-data extracted from the binaries, including function calls, strings, etc.

Features. Original features: information from binary files, such as frequencies of bytes, number of different N-grams, etc. Privileged features: information from disassembled code, such as frequencies of commands and number of calls to external DLLs. Bytecode as an image: features based on image texture, which is commonly used for image classification.
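A sketch of how the two original features named above might be computed from raw bytes. The function names, the choice of n = 2, and the five-byte sample are hypothetical; the actual feature extraction used in the experiments is not described in the transcript.

```python
from collections import Counter


def byte_frequencies(data: bytes):
    """Relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data)
    if total == 0:
        return [0.0] * 256
    return [counts.get(value, 0) / total for value in range(256)]


def count_distinct_ngrams(data: bytes, n: int = 2) -> int:
    """Number of different byte n-grams occurring in `data`."""
    return len({data[i:i + n] for i in range(len(data) - n + 1)})


# Hypothetical sample standing in for the hexadecimal dump of a binary.
sample = bytes.fromhex("00ff00ff90")

freqs = byte_frequencies(sample)
print(freqs[0x00], freqs[0xFF], freqs[0x90])
print(count_distinct_ngrams(sample, n=2))
```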

Experimental Setup

Results

Thanks! Any questions? pavel.erofeev@phystech.edu