Probabilistic Graphical Models for Semi-Supervised Traffic Classification Rotsos Charalampos, Jurgen Van Gael, Andrew W. Moore, Zoubin Ghahramani Computer Laboratory and Engineering Department, University of Cambridge

Traffic classification
Traffic classification is the problem of determining the application class of a network flow by inspecting its packets. Approaches have evolved from port-based classification, to pattern matching, to statistical analysis. It is useful as an input to other network functions:
- Security: fine-grained access control; a valuable dimension for analysis
- Network management: network planning, QoS
- Performance measurement: performance depends on the traffic class

Problem Space
So far, research has focused on packet-level measurements, with good results, but no system implementations have followed because the required measurements are difficult to obtain.
We focus on flow records instead. Existing research exhibits encouraging results, but the models are inflexible and generic.
Our approach: use modern ML techniques (Bayesian modeling, probabilistic graphical models) to develop a problem-specific model with well-defined parameters. Since flow records are sensitive to minor network changes, we use semi-supervised learning.

Outline
- Model presentation
- Results
- Related work
- Further development

Problem definition
N flows are extracted from a router, each having M features. Each flow is represented by a vector x_i with features x_ij, where 0 < j ≤ M and 0 < i ≤ N. Each flow has an application class c_i. Assume that L flows are labeled and U flows are unlabeled, with L + U = N. Define f(·) such that, if x_i ∈ U, then f(x_i | c_L, L) = c_i. Assume that flow records are generated without any sampling and that the features x_ij are independent given the class.
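As a compact restatement in the slide's notation (a sketch assuming the Naïve Bayes factorization introduced on the following slides):

```latex
\begin{aligned}
f(x_i \mid c_L, L)
  &= \arg\max_{k} \; p(c_i = k \mid x_i, c_L, L) \\
  &= \arg\max_{k} \; p(c_i = k \mid c_L, L)\,\prod_{j=1}^{M} p(x_{ij} \mid c_i = k, c_L, L),
\end{aligned}
```

where the product over features follows from the conditional-independence assumption above.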

Probabilistic Graphical Models
- Diagrammatic representations of probability distributions.
- Directed acyclic graphs represent conditional dependence among random variables.
- Easy to perform inference; simple graph manipulations can give us complex distributions.
Advantages: modularity, iterative design, a unifying framework.
Example factorization: P(a, b, c) = P(a) P(b | a) P(c | a, b)
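To make the factorization concrete, here is a minimal, self-contained sketch (illustrative values only; this is not the authors' implementation):

```python
# Joint of the three-node DAG a -> b, (a, b) -> c, stored as conditional tables.
# All probability values are made up for illustration.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
p_c_given_ab = {(0, 0, 0): 0.9, (1, 0, 0): 0.1,
                (0, 0, 1): 0.5, (1, 0, 1): 0.5,
                (0, 1, 0): 0.4, (1, 1, 0): 0.6,
                (0, 1, 1): 0.2, (1, 1, 1): 0.8}                      # key: (c, b, a)

def joint(a, b, c):
    """P(a, b, c) = P(a) * P(b | a) * P(c | a, b)."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_ab[(c, b, a)]

# The factorized joint still sums to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
assert abs(total - 1.0) < 1e-9
```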

Generative model
The graphical model is similar to a supervised Naïve Bayes model: φ is the parameter of the class distribution, and θ_kj is the parameter of the distribution of feature j for class k. Assume θ_kj ~ Dir(α_θ) and φ ~ Dir(α_φ), and use a Bayesian approach to calculate the parameter distributions.
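A minimal sketch of the resulting Dirichlet-multinomial Naïve Bayes, assuming discretized features and integer class labels. The function names, array shapes, and default α values are my own illustrative choices, not taken from the paper; with conjugate Dirichlet priors, the posterior predictive reduces to smoothed counts:

```python
import numpy as np

def fit_counts(X, y, n_classes, n_values):
    """Sufficient statistics: class counts and per-class feature-value counts.
    X: (N, M) integer-coded features in [0, n_values); y: (N,) labels in [0, n_classes)."""
    N, M = X.shape
    class_counts = np.bincount(y, minlength=n_classes).astype(float)
    feat_counts = np.zeros((n_classes, M, n_values))
    for k in range(n_classes):
        Xk = X[y == k]
        for j in range(M):
            feat_counts[k, j] = np.bincount(Xk[:, j], minlength=n_values)
    return class_counts, feat_counts

def log_posterior(x, class_counts, feat_counts, alpha_phi=1.0, alpha_theta=1.0):
    """Posterior predictive log p(c = k | x) up to an additive constant,
    integrating out phi ~ Dir(alpha_phi) and theta_kj ~ Dir(alpha_theta)."""
    n_classes, M, n_values = feat_counts.shape
    log_prob = np.log(class_counts + alpha_phi)          # class prior with Dirichlet smoothing
    for j in range(M):
        num = feat_counts[:, j, x[j]] + alpha_theta      # smoothed count of value x_j in class k
        den = class_counts + n_values * alpha_theta
        log_prob += np.log(num / den)                    # p(x_j | c = k)
    return log_prob

# Usage: c_hat = np.argmax(log_posterior(x_new, *fit_counts(X, y, K, V)))
```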

Semi-supervised learning
A hybrid of supervised and unsupervised learning: train using a labeled dataset, then extend the model by integrating newly labelled datapoints.
Advantages:
- Reduced training dataset.
- Increased accuracy when the model is correct.
- Highly configurable when used with Bayesian modeling.
Disadvantages:
- Computationally complex.

Semi-supervised graphical model
The cost of calculating the parameters grows exponentially as new unlabeled datapoints are added, so two approximations are used:
- Hard assignment: add each newly labelled datapoint to the class c_k with the highest posterior probability.
- Soft assignment: update the posterior for each parameter according to the predicted weight of the datapoint.
The class of a new flow is then defined by the resulting posterior over classes (a sketch of both schemes follows below).
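A minimal sketch of the two assignment schemes, reusing log_posterior from the sketch above. The loop structure and in-place count updates are my assumptions for illustration, not the authors' C# implementation:

```python
import numpy as np

def semi_supervised_update(X_u, class_counts, feat_counts,
                           alpha_phi=1.0, alpha_theta=1.0, hard=True):
    """One pass over unlabeled flows X_u, folding each into the sufficient statistics.
    hard=True:  add a full count to the MAP class (hard assignment).
    hard=False: spread fractional counts by the posterior weights (soft assignment)."""
    n_classes, M, n_values = feat_counts.shape
    for x in X_u:
        logp = log_posterior(x, class_counts, feat_counts, alpha_phi, alpha_theta)
        w = np.exp(logp - logp.max())
        w /= w.sum()                                   # normalized posterior over classes
        if hard:
            w = np.eye(n_classes)[np.argmax(w)]        # one-hot on the MAP class
        class_counts += w
        for j in range(M):
            feat_counts[:, j, x[j]] += w               # fractional (or full) counts
    return class_counts, feat_counts
```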

Outline
- Model presentation
- Results
- Related work
- Further development

Data
- 2-day trace from a research facility [Li et al, Computer Networks 2009]; approx. 6 million TCP flows.
- Ground truth established using the GTVS tool.
- NetFlow records exported using nProbe, with settings similar to a Tier-1 ISP.
- Model implemented in C#. We also used the Naïve Bayes with kernel estimation implementation from the Weka platform.
Feature set: srcIP/dstIP, srcPort/dstPort, IP ToS, start/end time, TCP flags, bytes, # packets, time length, avg. packet size, byte rate, packet rate, tcpF* (unique flags)

Application statistics
App            %      App           %      App             %
database       4.3    services      0.03   peer-to-peer    11.47
mail           2.5    Spam filter   0.48   web             72.33
ftp            6.25   streaming     0.31   vpn             0.1
im             0.6    voip          0.16   Remote access   0.61

Baseline comparison

Baseline comparison – Class accuracy

Dataset size

Model parameters

Outline
- Model presentation
- Results
- Related work
- Further development

Related work
There is a lot of work on traffic classification using machine learning:
- Survey paper [Nguyen et al, IEEE CST 2008] and method comparison [Kim et al, CoNEXT 2008].
- Semi-supervised learning applied to packet-level measurements in [Erman et al, Sigmetrics 2007].
Traffic classification using NetFlow data is quite recent:
- A first attempt using a Naïve Bayes classifier was introduced in [Jiang et al, INM 2007].
- An approach using a C4.5 classifier appears in [Carela-Español et al, technical report 2009].

Outline
- Model presentation
- Results
- Related work
- Further development

Further development
- Packet sampling: a difficult problem; multiple vantage points could simplify it.
- Adapt the model to the host-characterization problem: aggregate traffic at the host level and enrich the data dimensions.
- Incorporate graph-level information in the model: computer networks bear similarities to social networks.

Conclusion
- Flow records may be a good data primitive for traffic classification.
- Modeling with probabilistic graphical models is not very difficult.
- Semi-supervised learning is an effective concept, but it is not a one-solves-all solution.
- Our model achieves 5-10% better performance than a generic classifier and exhibits good stability over short timescales.
- Bayesian modeling and graphical models allow easy integration of domain knowledge and adaptation to the user's requirements.
- The model can be extended to achieve better results.
Thank you!

Dirichlet distribution
- A continuous multivariate distribution: a distribution over the probability distribution of K rival values of a random variable, with parameter vector α.
- The conjugate prior of the multinomial distribution and the multi-dimensional extension of the Beta distribution.
- The parameter α controls the mean, shape, and sparsity of θ.
- The symmetric Dirichlet Dir(α) is the special case where α_i = α for all i.
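A small illustration of how α controls sparsity, using NumPy's Dirichlet sampler (the α values below are arbitrary, chosen only to show the behavior):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 6  # number of rival values

# Small symmetric alpha -> samples concentrate mass on few components (sparse);
# large alpha -> samples spread mass nearly uniformly.
for alpha in (0.1, 1.0, 10.0):
    theta = rng.dirichlet(np.full(K, alpha))
    print(f"alpha={alpha:>4}: theta={np.round(theta, 3)}")
```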

Dirichlet process (Taken from Wikipedia)