ALADIN: Active Learning for Statistical Intrusion Detection. Jay Stokes, Microsoft Research; John Platt, Microsoft Research; Joseph Kravis, Microsoft Network Security; Michael Shilman, ChatterPop, Inc.

Presentation transcript:

ALADIN: Active Learning for Statistical Intrusion Detection
Jay Stokes, Microsoft Research
John Platt, Microsoft Research
Joseph Kravis, Microsoft Network Security
Michael Shilman, ChatterPop, Inc.
NIPS Workshop 2007 – Machine Learning in Adversarial Environments for Computer Security, 12/8/2007

Motivation
- Metadata of Microsoft’s external internet traffic is logged using ISA Server Firewall
  - ISA: Internet Security and Acceleration
- Up to 35 million log entries per day
- Security analysts must search for and identify new anomalies
  - Looking for new malware, bad PTP, etc.
- Can machine learning help?

Active Learning
- A human interactively provides labels for new samples
- Network traffic metadata is logged to SQL
- ALADIN evaluates and ranks samples
- Security analyst labels samples
- ALADIN reranks samples and repeats
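The rank, label, rerank cycle on this slide can be sketched as a small loop. This is a minimal illustration, not ALADIN's implementation: `active_learning_loop`, `fit_class_means`, and `closeness_to_boundary` are hypothetical names, the "analyst" is a plain function standing in for a human, and the toy model is a one-dimensional nearest-mean classifier chosen only to keep the example self-contained.

```python
def active_learning_loop(pool, oracle, fit, score, seed_ids, batch_size, rounds):
    """Repeatedly train, rank the unlabeled pool, and ask the oracle for labels.

    pool      -- list of samples
    oracle    -- function sample -> label (stands in for the security analyst)
    fit       -- function (samples, labels) -> model
    score     -- function (model, sample) -> float, higher means "label me first"
    """
    labels = {i: oracle(pool[i]) for i in seed_ids}
    for _ in range(rounds):
        ids = sorted(labels)
        model = fit([pool[i] for i in ids], [labels[i] for i in ids])
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        if not unlabeled:
            break
        # Rank the unlabeled samples and send the top of the list to the oracle.
        ranked = sorted(unlabeled, key=lambda i: score(model, pool[i]), reverse=True)
        for i in ranked[:batch_size]:
            labels[i] = oracle(pool[i])
    return labels


# Toy stand-ins (assumptions for illustration only).
def fit_class_means(xs, ys):
    """Model = mean of the 1-D samples seen for each class."""
    sums = {}
    for x, y in zip(xs, ys):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def closeness_to_boundary(means, x):
    """Higher score when x is nearly equidistant from the two nearest class means."""
    dists = sorted(abs(x - m) for m in means.values())
    if len(dists) < 2:
        return dists[0]  # only one class known: rank by distance from it
    return -(dists[1] - dists[0])


pool = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 0.5, 0.55]
analyst = lambda x: "a" if x < 0.52 else "b"
labels = active_learning_loop(pool, analyst, fit_class_means,
                              closeness_to_boundary,
                              seed_ids=[0, 4], batch_size=2, rounds=1)
```

With two seed labels and one round, the loop asks the stand-in analyst about the two samples nearest the midpoint between the class means, which is where the toy model is least sure.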

ALADIN
- Multiclass classifier for monitoring network traffic
- Goal: Minimize analyst labeling time
- Weights can be adaptively improved at user’s site

Choosing Samples for Labeling – Active Anomaly Detection
- Label only anomalies (Pelleg, Moore, NIPS04)
- Discover rare and interesting classes
- Multiclass model
  - Avoids the “Normal” vs. “Not Normal” problem, which leads to high error rates

Choosing Samples for Labeling – Active Learning
- Label only samples closest to the decision boundary (Almgren, Jonsson, CSFW04)
  - RBF SVM
- Ignore samples located away from the decision boundaries
- May not find new classes

ALADIN: Combines Active Anomaly Detection and Active Learning
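The slide does not say how the two selection strategies are merged into one labeling queue. As a purely hypothetical illustration, the sketch below interleaves a ranking by uncertainty with a ranking by anomaly score in round-robin order, skipping duplicates. `interleave_rankings` and its parameters are assumed names, and this is not claimed to be the scheme ALADIN uses.

```python
from itertools import chain, zip_longest


def interleave_rankings(by_uncertainty, by_anomaly, budget):
    """Merge two ranked lists of sample ids, alternating between them.

    by_uncertainty -- sample ids, most uncertain first
    by_anomaly     -- sample ids, most anomalous first
    budget         -- maximum number of ids to return
    """
    if budget <= 0:
        return []
    chosen = []
    seen = set()
    # zip_longest pads the shorter list with None, which is skipped below.
    merged = chain.from_iterable(zip_longest(by_uncertainty, by_anomaly))
    for sample_id in merged:
        if sample_id is None or sample_id in seen:
            continue
        seen.add(sample_id)
        chosen.append(sample_id)
        if len(chosen) == budget:
            break
    return chosen


queue = interleave_rankings([1, 2, 3], [3, 4], budget=4)
```

Alternating keeps both goals in play in every batch: samples near a decision boundary refine the existing classes, while highly anomalous samples give new classes a chance to surface.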

Classification Stage
- Discriminative Learning, Logistic Regression
  - Minimize cross entropy function
- Uncertainty Score
- Fast computation for interactive labeling
- Scales well
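The formulas for this slide did not survive in the transcript, so the following is a sketch under stated assumptions rather than the authors' definitions. It assumes a multiclass (softmax) logistic regression, the usual cross-entropy loss for one sample, and an uncertainty score defined as the margin between the two most probable classes, where a smaller margin means a more uncertain sample. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert per-class scores into probabilities (shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(weights, biases, x):
    """Linear score per class followed by softmax.

    weights -- one weight vector per class
    biases  -- one bias per class
    """
    logits = [sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(logits)


def cross_entropy(probs, true_class):
    """Loss for a single sample: negative log-probability of the true class."""
    return -math.log(probs[true_class])


def margin_uncertainty(probs):
    """Assumed uncertainty score: gap between the top two class probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second  # smaller gap = closer to a decision boundary


probs = class_probabilities([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, 2.0])
```

Both scores reduce to a handful of dot products and a sort per sample, which fits the slide's point about fast computation for interactive labeling.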

Modeling Stage
- Naïve Bayes model
- Training data
  - Labeled data
  - Predicted labels of the unlabeled data
- Anomaly Score
- Fast computation for interactive labeling
- Scales well
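The anomaly score formula is also missing from the transcript. One common choice, assumed here for illustration, is the negative log-likelihood of a sample under the naive Bayes model of its predicted class: the less likely the sample, the higher the score. The sketch uses Gaussian features and a variance floor, both of which are assumptions; `fit_gaussian_nb` and `anomaly_score` are hypothetical names. Following the slide, the `labels` passed to the fit step would mix analyst labels with the classifier's predicted labels for unlabeled samples.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels, var_floor=1e-3):
    """Per-class, per-feature mean and variance (features treated as independent)."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so constant features do not cause division by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / n, var_floor)
                     for col, m in zip(columns, means)]
        model[y] = (means, variances)
    return model


def anomaly_score(model, x, predicted_class):
    """Negative log-likelihood of x under its predicted class; higher = more anomalous."""
    means, variances = model[predicted_class]
    log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                  for v, m, var in zip(x, means, variances))
    return -log_lik


model = fit_gaussian_nb([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
                        ["normal"] * 4)
typical = anomaly_score(model, [0.5, 0.5], "normal")
unusual = anomaly_score(model, [5.0, 5.0], "normal")
```

A sample far from everything its predicted class has produced so far scores high, which is how a previously unseen class could rise to the top of the analyst's queue.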

Network Intrusion Detection Results
- KDD-Cup 99 data set
  - Provides oracle labels
  - 100K samples
- Use all features in the data
- Label 10 initial samples randomly
- 100 samples labeled per iteration

Results – Anomaly Detection

Results – Prediction Accuracy

FP/FN Per Class
True Label | Num Labeled Samples | True Predicted Label | TP Count | Incorrectly Predicted Label | FN Count | FP Rate | FN Rate
normal | 551 | normal | 55715 | satan | 3 | 4.12% | 0.20%
 |  |  |  | guess_passwd | 10 |  |
 |  |  |  | ipsweep | 67 |  |
 |  |  |  | back | 2 |  |
neptune | 57 | neptune |  |  |  |  |
smurf | 82 | smurf | 18904 | normal | 7 | 0.00% | 0.04%
back | 36 | back | 5 | normal |  |  | 99.75%
ipsweep | 58 | ipsweep | 675 | normal | 27 | 0.07% | 3.85%
satan | 49 | satan | 470 | normal | 20 | 0.00% | 4.08%
portsweep | 54 | portsweep | 223 | normal | 1 | 0.00% | 0.45%
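The slide does not define the two rates. The sketch below uses definitions that reproduce the ipsweep row, and these are an assumption: FN rate = FN / (TP + FN), and FP rate = FP / (all samples not in the class). It further assumes the data set holds exactly 100,000 samples and that the 67 normal samples predicted as ipsweep are the only ipsweep false positives. The function names are hypothetical.

```python
def fn_rate(tp, fn):
    """Fraction of a class's samples that were predicted as some other class."""
    return fn / (tp + fn)


def fp_rate(fp, tp, fn, total):
    """Fraction of samples outside the class that were predicted as the class."""
    return fp / (total - (tp + fn))


# ipsweep row: 675 correct, 27 predicted as normal, 67 normal samples predicted as ipsweep.
ipsweep_fn = round(100 * fn_rate(tp=675, fn=27), 2)
ipsweep_fp = round(100 * fp_rate(fp=67, tp=675, fn=27, total=100_000), 2)
```

Under these assumptions the two values come out as 3.85 and 0.07, matching the percentages listed for ipsweep.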

Malware Detection on Microsoft Network Logs
- Analyzed several daily log files
- Identified “5.exe” on the corporate network, which had not previously been identified
  - Trojan.Esteems.D: 5.exe monitors user Internet activity and private information, and sends stolen data to a hacker site
- Identified several other worms (NewApt Worm, Win32.Bropia.T, W32.MyDoom.B) and keyloggers (svchqs.exe)
  - All of which were currently logged
  - Some waiting to be labeled
  - All currently blocked by ISA firewall rules

Conclusions
- ALADIN discovers rare and interesting classes
- ALADIN maintains low classification error
- Scales due to fast learning with logistic regression and naïve Bayes
- Identifies network intrusion attacks
- Identifies malware via network traffic patterns
- Tech Report:
