Internet Traffic Classification Using Bayesian Analysis Techniques

Presentation by Umamaheswararao K

Overview
Statistical method
Uses supervised machine learning
Uses only flow records
Based on discriminators of the flows: port, inter-packet gap, etc.
Applies Naïve Bayesian techniques
Reasonably high accuracy

Machine Learned Classification
Deterministic approach: assigns each data point to one of several mutually exclusive classes
Probabilistic approach: assigns each flow a probability of belonging to each class
- The current technique falls into this category

Probabilistic Approach:
Can identify similar characteristics of flows after their probabilistic class assignment
Robust to measurement error
Provides a mechanism for quantifying class assignment probabilities
Available in many implementations

Terminology
Objects: entities to be classified; here, traffic flows, each a tuple of src/dst IP, protocol, and src/dst port
Discriminators: characteristics parameterizing the flow behaviour, such as flow duration, TCP port, etc.
- Here only complete TCP connections are considered
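
As an illustration of the terminology, a flow object and its discriminators might be represented as below. This is only a sketch: the class names, field names, discriminator names, and values are hypothetical, and the IP addresses are documentation addresses.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FlowKey:
    """The tuple identifying a traffic flow (field names are illustrative)."""
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


@dataclass
class Flow:
    """An object to be classified: its key plus its discriminator values."""
    key: FlowKey
    discriminators: Dict[str, float]


# Hypothetical example of one complete TCP connection.
flow = Flow(
    key=FlowKey("192.0.2.1", "198.51.100.7", "tcp", 51514, 80),
    discriminators={
        "duration_s": 1.8,
        "server_port": 80.0,
        "mean_interpacket_gap_s": 0.02,
    },
)
```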

Discriminators/Categories

Analysis Tools Naïve Bayesian Classifier

Bayes Tech: Contd..
Assumptions
Discriminators are independent
- In practice they may not be, e.g. TCP header length is proportional to packet length, or vice versa
Discriminator distribution is assumed to be normal (Gaussian)
- The actual distribution can be multimodal
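
The two assumptions can be made concrete with a small Gaussian Naïve Bayes sketch. This is a minimal illustration and not the presenter's implementation: the function names, the class labels, and the toy discriminator values are all made up. Each class gets a prior and a per-discriminator mean and variance, and the independence assumption shows up as a plain product of one-dimensional Gaussian densities.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate per-class prior and per-discriminator mean/variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)  # floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior(model, x):
    """Return P(class | x) per class, assuming independent Gaussian discriminators."""
    joint = {}
    for label, (prior, means, variances) in model.items():
        p = prior
        for xi, m, v in zip(x, means, variances):
            p *= gaussian_pdf(xi, m, v)  # independence: multiply the densities
        joint[label] = p
    total = sum(joint.values())
    return {label: p / total for label, p in joint.items()}


# Toy training flows: [duration in s, mean packet size in bytes] (made-up values).
X = [[0.5, 200.0], [0.7, 220.0], [0.6, 180.0],
     [30.0, 1400.0], [25.0, 1350.0], [35.0, 1450.0]]
y = ["WWW", "WWW", "WWW", "BULK", "BULK", "BULK"]

model = fit_gaussian_nb(X, y)
probs = posterior(model, [0.55, 210.0])
best = max(probs, key=probs.get)
```

With many discriminators the product of densities can underflow, so a practical version would likely sum log-densities instead.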

Example

Naïve Bayes: Kernel Estimation
Used when the discriminator distribution is not Gaussian
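
A rough sketch of the idea: instead of fitting one Gaussian per class and discriminator, the class-conditional density is estimated as an average of kernels centred on the training samples. The Gaussian kernel, the bandwidth value, and the sample values below are assumptions chosen for illustration only.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde_density(x, samples, bandwidth):
    """Kernel density estimate of f(x) from one class's training samples."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / bandwidth) for s in samples) / (n * bandwidth)


# Made-up bimodal values of one discriminator for one class.
samples = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
bandwidth = 0.3  # assumed; normally chosen by a bandwidth-selection rule

at_mode = kde_density(1.0, samples, bandwidth)
between_modes = kde_density(3.0, samples, bandwidth)

# A single Gaussian fitted to the same samples would peak at the mean,
# which lies between the two modes where there is no data at all.
mean = sum(samples) / len(samples)
```

Here the kernel estimate is high near 1.0 and near 5.0 and close to zero around 3.0, whereas a single fitted Gaussian would place its maximum near 3.0.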

Naïve Bayes vs Kernel

Discriminator selection
Remove irrelevant discriminators
- Cannot differentiate the class
- Same distribution for all classes
Remove redundant discriminators
- Highly correlated with another discriminator

Discriminator reduction:
Filter: uses characteristics of the training data to judge how relevant a discriminator is to the class
- Degree of correlation between discriminator and class
Wrapper: uses the results of a classifier to build the optimal set

FCBF
Fast Correlation-Based Filter for discriminator filtering
Two-stage process
- Identifying the relevance of a discriminator
- Identifying the redundancy of a discriminator with respect to other discriminators
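
The two stages can be sketched as follows. The slides do not say which correlation measure is used; FCBF as commonly described uses symmetric uncertainty on discrete values, and that is what this sketch assumes. The function names, the threshold, and the toy data are hypothetical, and continuous discriminators would need to be discretised first.

```python
import math
from collections import Counter


def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def fcbf(features, labels, threshold):
    """features: dict of name -> list of discrete values. Returns kept names."""
    # Stage 1: relevance of each discriminator to the class.
    relevance = {name: symmetric_uncertainty(col, labels)
                 for name, col in features.items()}
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)
    # Stage 2: drop a discriminator when an already-kept one is at least as
    # correlated with it as it is with the class.
    kept = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(features[k], features[name]) >= relevance[name]
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


labels = ["a", "a", "b", "b", "a", "b"]
features = {
    "port_bin": [0, 0, 1, 1, 0, 1],       # perfectly predicts the class
    "port_bin_copy": [0, 0, 1, 1, 0, 1],  # redundant duplicate of port_bin
    "noise": [0, 1, 0, 1, 1, 0],          # only weakly related to the class
}
selected = fcbf(features, labels, threshold=0.1)
```

In this toy run the weak discriminator is dropped in stage 1 as irrelevant and the duplicate is dropped in stage 2 as redundant, leaving a single discriminator.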

Results

Results: contd..
Accuracy: correctly classified flows / total number of flows
Trust: probability that a flow classified into some class is in fact from that class
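
Both metrics can be computed directly from the true and predicted labels. The sketch below follows the two definitions above; the function names and the label values are made up for illustration.

```python
from collections import Counter


def accuracy(true_labels, predicted):
    """Correctly classified flows / total number of flows."""
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    return correct / len(true_labels)


def trust(true_labels, predicted):
    """Per class: fraction of flows assigned to the class that truly belong to it."""
    assigned = Counter(predicted)
    correct = Counter(p for t, p in zip(true_labels, predicted) if t == p)
    return {c: correct[c] / assigned[c] for c in assigned}


# Made-up labels for five flows.
true_labels = ["WWW", "WWW", "MAIL", "MAIL", "WWW"]
predicted = ["WWW", "WWW", "WWW", "MAIL", "WWW"]

acc = accuracy(true_labels, predicted)
per_class = trust(true_labels, predicted)
```

Note that a class can have high trust while the overall accuracy is lower, and the reverse, which is why the two are reported separately.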

Naïve Bayes- Trust

Trust: Kernel est.

Results for new data set

Identification of discriminators

Strengths
Payload access not needed
High accuracy and trust with FCBF
Easily implementable
Single-flow based (a strength and a weakness)
Allows any categorization

Weaknesses
Bunch of them but then …?
Accuracy/trust depends mainly on how good the training set is
Trust of some classes is really poor
Works on single flows, but characterizing some traffic requires seeing many flows (e.g. attacks)
Temporal stability is not really good
Discriminators are dependent on network dynamics

Weaknesses: Contd…
Training is not automatic
Assumes discriminator independence
Gaussian distribution assumption is inaccurate

Future Work
A significantly new approach, hence can lead to many ideas
Spatial independence of traffic classification
Other items follow from the weaknesses section
