Anomaly detection Problem motivation Machine Learning.

Presentation transcript:

Anomaly detection Problem motivation Machine Learning

Andrew Ng
Anomaly detection example. Aircraft engine features: x_1 = heat generated, x_2 = vibration intensity, … Dataset: {x^(1), x^(2), …, x^(m)} (plot of x_2, vibration, against x_1, heat). New engine: x_test.

Density estimation. Dataset: {x^(1), …, x^(m)}. Is x_test anomalous? Build a model p(x); flag x_test as anomalous if p(x_test) < ε. (Plot of vibration against heat.)

Anomaly detection example. Fraud detection: x^(i) = features of user i's activities. Model p(x) from data. Identify unusual users by checking which have p(x^(i)) < ε. Manufacturing. Monitoring computers in a data center: x^(i) = features of machine i; x_1 = memory use, x_2 = number of disk accesses/sec, x_3 = CPU load, x_4 = CPU load / network traffic, …

Anomaly detection Gaussian distribution Machine Learning

Gaussian (Normal) distribution. Say x ∈ ℝ. If x is distributed Gaussian with mean μ and variance σ², we write x ~ N(μ, σ²), with density p(x; μ, σ²) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)).

Gaussian distribution example (plots of p(x; μ, σ²) for several choices of μ and σ²).

Parameter estimation. Dataset: {x^(1), …, x^(m)}, x^(i) ∈ ℝ. μ = (1/m) ∑_{i=1}^{m} x^(i); σ² = (1/m) ∑_{i=1}^{m} (x^(i) − μ)².
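As a quick sketch (not part of the original slides), the two estimates above in NumPy; note the 1/m normalization rather than the unbiased 1/(m−1):

```python
import numpy as np

def estimate_gaussian(x):
    """Return (mu, sigma2) for a 1-D dataset x of m examples."""
    mu = np.mean(x)
    sigma2 = np.mean((x - mu) ** 2)  # maximum-likelihood estimate: 1/m, not 1/(m-1)
    return mu, sigma2

x = np.array([2.0, 4.0, 6.0])
mu, sigma2 = estimate_gaussian(x)
print(mu, sigma2)  # mu = 4.0, sigma2 = 8/3
```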

Anomaly detection Algorithm Machine Learning

Density estimation. Training set: {x^(1), …, x^(m)}. Each example is x ∈ ℝ^n. Model: p(x) = p(x_1; μ_1, σ_1²) p(x_2; μ_2, σ_2²) ⋯ p(x_n; μ_n, σ_n²) = ∏_{j=1}^{n} p(x_j; μ_j, σ_j²).

Anomaly detection algorithm.
1. Choose features x_j that you think might be indicative of anomalous examples.
2. Fit parameters μ_1, …, μ_n, σ_1², …, σ_n²: μ_j = (1/m) ∑_{i=1}^{m} x_j^(i); σ_j² = (1/m) ∑_{i=1}^{m} (x_j^(i) − μ_j)².
3. Given new example x, compute p(x) = ∏_{j=1}^{n} p(x_j; μ_j, σ_j²). Anomaly if p(x) < ε.
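The three steps above can be sketched in NumPy (a minimal illustration, not the lecture's own code; the synthetic training data and the value of ε are arbitrary choices for the example):

```python
import numpy as np

def fit(X):
    """Step 2: fit per-feature Gaussian parameters on an (m, n) training matrix."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean(axis=0)
    return mu, sigma2

def p(X, mu, sigma2):
    """Step 3: product over features of univariate Gaussian densities."""
    dens = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return dens.prod(axis=1)

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(1000, 2))  # assumed-normal examples
mu, sigma2 = fit(X_train)

eps = 1e-3
x_new = np.array([[6.0, 6.0]])       # far from the training data
print(p(x_new, mu, sigma2) < eps)    # low density: flagged as anomalous
```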

Anomaly detection example.

Anomaly detection Developing and evaluating an anomaly detection system Machine Learning

The importance of real-number evaluation. When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm. Assume we have some labeled data, of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous). Training set: x^(1), x^(2), …, x^(m) (assume normal, not anomalous, examples). Cross validation set: (x_cv^(1), y_cv^(1)), …, (x_cv^(m_cv), y_cv^(m_cv)). Test set: (x_test^(1), y_test^(1)), …, (x_test^(m_test), y_test^(m_test)).

Aircraft engines motivating example: 10,000 good (normal) engines, 20 flawed (anomalous) engines. Training set: 6000 good engines. CV: 2000 good engines (y = 0), 10 anomalous (y = 1). Test: 2000 good engines (y = 0), 10 anomalous (y = 1). Alternative (not recommended, since the cross validation and test sets overlap): Training set: 6000 good engines. CV: 4000 good engines (y = 0), 10 anomalous (y = 1). Test: the same 4000 good engines (y = 0) and 10 anomalous (y = 1).

Algorithm evaluation. Fit model p(x) on the training set {x^(1), …, x^(m)}. On a cross validation/test example x, predict y = 1 (anomaly) if p(x) < ε, and y = 0 (normal) if p(x) ≥ ε. Possible evaluation metrics: true positive, false positive, false negative, true negative counts; precision/recall; F1-score. (Classification accuracy is a poor metric here because the classes are very skewed.) Can also use the cross validation set to choose the parameter ε.
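One way to sketch the last point, choosing ε on the cross validation set by maximizing F1 (an illustrative implementation, not from the slides; the tiny CV set and the 1000-point threshold grid are assumptions of the example):

```python
import numpy as np

def f1(y_true, y_pred):
    """F1-score from true-positive, false-positive, false-negative counts."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def select_epsilon(p_cv, y_cv):
    """Scan candidate thresholds; keep the one with the best CV F1-score."""
    best_eps, best_f1 = 0.0, 0.0
    for candidate in np.linspace(p_cv.min(), p_cv.max(), 1000):
        score = f1(y_cv, (p_cv < candidate).astype(int))
        if score > best_f1:
            best_eps, best_f1 = candidate, score
    return best_eps, best_f1

# Toy CV set: low density should correspond to anomalous (y = 1) examples.
p_cv = np.array([0.3, 0.25, 0.28, 0.001, 0.002])
y_cv = np.array([0, 0, 0, 1, 1])
eps, score = select_epsilon(p_cv, y_cv)
print(eps, score)
```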

Anomaly detection Anomaly detection vs. supervised learning Machine Learning

Anomaly detection vs. supervised learning.
Anomaly detection: very small number of positive examples (y = 1) (0–20 is common); large number of negative (y = 0) examples; many different "types" of anomalies, so it is hard for any algorithm to learn from the positive examples what the anomalies look like, and future anomalies may look nothing like any of the anomalous examples we've seen so far.
Supervised learning: large number of positive and negative examples; enough positive examples for the algorithm to get a sense of what positive examples are like, and future positive examples are likely to be similar to the ones in the training set.

Anomaly detection vs. supervised learning.
Anomaly detection: fraud detection; manufacturing (e.g. aircraft engines); monitoring machines in a data center.
Supervised learning: email spam classification; weather prediction (sunny/rainy/etc.); cancer classification.

Anomaly detection Choosing what features to use Machine Learning

Non-Gaussian features: if a feature's histogram does not look roughly Gaussian, transform it (e.g. replace x with log(x), log(x + c), √x, or x^(1/3)) so that it looks more Gaussian.
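A small sketch of the effect (not from the lecture; the log-normal sample and the skewness statistic are choices made for illustration): a heavily right-skewed feature becomes close to symmetric after a log transform.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10000)  # heavily right-skewed feature

def skewness(v):
    """Sample skewness: 0 for a symmetric (e.g. Gaussian) distribution."""
    return np.mean((v - v.mean()) ** 3) / v.std() ** 3

print(skewness(x))          # large and positive: far from Gaussian
print(skewness(np.log(x)))  # near 0: roughly Gaussian after the transform
```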

Error analysis for anomaly detection. Want p(x) large for normal examples x and p(x) small for anomalous examples. Most common problem: p(x) is comparable (say, both large) for normal and anomalous examples. In that case, examine the missed anomaly and try to come up with a new feature that distinguishes it.

Monitoring computers in a data center. Choose features that might take on unusually large or small values in the event of an anomaly: x_1 = memory use of computer, x_2 = number of disk accesses/sec, x_3 = CPU load, x_4 = network traffic. Can also combine features, e.g. x_5 = (CPU load) / (network traffic), to catch machines whose CPU load is unusually high relative to their traffic.

Anomaly detection Multivariate Gaussian distribution Machine Learning

Motivating example: monitoring machines in a data center (plot of memory use against CPU load).

Multivariate Gaussian (Normal) distribution. x ∈ ℝ^n. Don't model p(x_1), p(x_2), etc. separately; model p(x) all in one go. Parameters: μ ∈ ℝ^n and Σ ∈ ℝ^{n×n} (covariance matrix). p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−½ (x − μ)^T Σ^{−1} (x − μ)).

Multivariate Gaussian (Normal) examples (a series of surface and contour plots of p(x; μ, Σ) for different choices of μ and Σ).

Anomaly detection Anomaly detection using the multivariate Gaussian distribution Machine Learning

Multivariate Gaussian (Normal) distribution. Parameters: μ ∈ ℝ^n, Σ ∈ ℝ^{n×n}. p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−½ (x − μ)^T Σ^{−1} (x − μ)). Parameter fitting: given training set {x^(1), …, x^(m)}, set μ = (1/m) ∑_{i=1}^{m} x^(i) and Σ = (1/m) ∑_{i=1}^{m} (x^(i) − μ)(x^(i) − μ)^T.

Anomaly detection with the multivariate Gaussian.
1. Fit model p(x) by setting μ = (1/m) ∑_{i=1}^{m} x^(i), Σ = (1/m) ∑_{i=1}^{m} (x^(i) − μ)(x^(i) − μ)^T.
2. Given a new example x, compute p(x; μ, Σ). Flag an anomaly if p(x) < ε.
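A minimal NumPy sketch of both steps (not from the lecture; the correlated synthetic data and ε are assumptions of the example). The test point has ordinary values in each coordinate but an unusual combination, which the full covariance matrix catches:

```python
import numpy as np

def fit_multivariate(X):
    """Step 1: maximum-likelihood mu and covariance on an (m, n) matrix."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]   # (1/m) * sum of (x - mu)(x - mu)^T
    return mu, Sigma

def p(X, mu, Sigma):
    """Step 2: multivariate Gaussian density for each row of X."""
    n = mu.size
    d = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * np.einsum('ij,jk,ik->i', d, inv, d))

rng = np.random.default_rng(0)
# Correlated features: high CPU load normally comes with high network traffic.
z = rng.normal(size=(2000, 1))
X = np.hstack([z, z + 0.1 * rng.normal(size=(2000, 1))])
mu, Sigma = fit_multivariate(X)

eps = 1e-3
x_odd = np.array([[1.5, -1.5]])  # each value ordinary, combination unusual
print(p(x_odd, mu, Sigma) < eps)
```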

Relationship to original model. Original model: p(x) = p(x_1; μ_1, σ_1²) × ⋯ × p(x_n; μ_n, σ_n²). This corresponds to a multivariate Gaussian p(x; μ, Σ) where Σ is diagonal: Σ = diag(σ_1², …, σ_n²), i.e. axis-aligned contours.

Original model vs. multivariate Gaussian.
Original model: manually create features to capture anomalies where x_1, x_2, … take unusual combinations of values; computationally cheaper (equivalently, scales better to large n); OK even if m (training set size) is small.
Multivariate Gaussian: automatically captures correlations between features; computationally more expensive; must have m > n, or else Σ is non-invertible.