An Effective Similarity Metric for Application Traffic Classification
Jae Yoon Chung (1), Byungchul Park (1), Young J. Won (1), John Strassner (2), and James W. Hong (1, 2)
{dejavu94, fates, yjwon, johns, jwkhong}@postech.ac.kr
(1) Dept. of Computer Science and Engineering, POSTECH, Korea
(2) Division of IT Convergence Engineering, POSTECH, Korea
DPNM, POSTECH
NOMS 2010, April 20, 2010

Contents
- Introduction
- Related Work
- Research Goal
- Proposed Methodology
- Evaluation
- Conclusion and Future Work

Introduction
- Traffic classification for network management
  - Network planning
  - QoS management
  - Security
  - Etc.
- Diversity of today’s Internet traffic
  - New types of network applications
  - Increase of P2P traffic
  - Various techniques for avoiding detection
- Document classification and traffic classification
  - Document classification in natural language processing
  - Comparing packet payload vectors is analogous to document classification

Related Work
- Well-known port-based classification
  - Low complexity
  - Low accuracy (approximately 50~70%)
- Signature-based classification
  - High reliability
  - Searching for signatures is an exhaustive task
  - E.g., Snort, LASER
- Behavior-based classification
  - Focuses on traffic patterns and connection behaviors
  - Questionable accuracy
  - E.g., BLINC
- Machine learning-based classification
  - Utilizes statistical information
  - Consumes a large amount of computing resources
  - E.g., SVM, Bayesian network
- Similarity-based classification
  - Utilizes a document classification approach
  - Questionable scalability
  - E.g., flow similarity calculation [IPOM ‘09]

Summary of IPOM 2009
- Proposed a new traffic classification approach
  - Utilizes a document classification approach based on Cosine similarity calculation
  - Proposes a new packet representation using the Vector Space Model
  - Proposes a flow similarity calculation methodology that compares the packets in a flow sequentially
  - Methodology validated using real-world traffic on our campus backbone network
- Cannot classify flows in an asymmetric routing environment
- No comparison of Cosine similarity with other similarity metrics
  - Cosine similarity is a common similarity metric for human-document classification
  - High variation of the similarity value according to term frequency

Research Goals
- Propose a new traffic classification algorithm
  - Automation of the signature generation step
  - Generate an application vector, which is an alternative to a signature, using simple vector operations
  - Form groups according to traffic type and operation within single-application traffic
  - Accurate and feasible traffic classification algorithm
  - Classify application traffic using similarity calculation
  - Solve the asymmetric routing classification problem
- Validation using real-world network traffic to compare similarity metrics
  - Complexity analysis
- Compare three similarity metrics for traffic classification
  - Jaccard similarity – counts fragments of a signature
  - Cosine similarity – weights signatures highly
  - RBF similarity – Euclidean distance between packets

Proposed Methodology

Vector Space Modeling
- Vector Space Modeling
  - An algebraic model representing text documents as vectors
  - Widely used for document classification
  - Categorizes electronic documents based on their content (e.g., e-mail spam filtering)
- Document classification vs. traffic classification
  - Document classification: find the documents, among stored text documents, that satisfy certain information queries
  - Traffic classification: classify network traffic according to the type of application, based on traffic information

Payload Vector Conversion (1/2)
- Definition of a word in a payload
  - Payload data within an i-byte sliding window
  - |Word set| = 2^(8 × sliding window size)
- Definition of a payload vector
  - A term-frequency vector, as in NLP
  - Payload vector = [w_1 w_2 … w_n]^T

Payload Vector Conversion (2/2)
- Word
  - The word size is 2 bytes and the word set size is 2^16
  - The simplest case for representing the order of content in payloads

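The conversion described on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `payload_words` and `payload_vector` are hypothetical, the window is assumed to advance one byte at a time, and the vector is stored sparsely (word to count) rather than as a dense array of 2^16 entries.

```python
from collections import Counter


def payload_words(payload, window=2):
    """Return the words seen by a `window`-byte sliding window.

    Assumption: the window advances one byte at a time.
    """
    return [payload[i:i + window] for i in range(len(payload) - window + 1)]


def payload_vector(payload, window=2):
    """Sparse term-frequency vector: maps each word to its occurrence count."""
    return Counter(payload_words(payload, window))


# Toy payload (hypothetical); real input would be raw packet payload bytes.
vec = payload_vector(b"GET /a GET")
print(vec[b"GE"], len(vec))
```

With a 2-byte window, adjacent bytes end up in the same word, which is how the vector retains some information about the order of the payload content.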
Similarity Metrics for Traffic Classification
- Jaccard similarity
  - The size of the intersection of the sample sets X and Y divided by the size of their union
- Cosine similarity
  - Compares two n-dimensional vectors X and Y by finding the cosine of the angle between them
- RBF similarity
  - Radial basis function of the Euclidean distance between two vectors X and Y

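A rough sketch of the three metrics over sparse term-frequency vectors follows. Several details are assumptions, since the slide gives only verbal definitions: Jaccard is computed over the sets of distinct words, the RBF is taken to be the Gaussian form exp(-gamma * d^2), and `gamma` is a hypothetical parameter whose value here is arbitrary.

```python
import math
from collections import Counter


def jaccard_similarity(x, y):
    """|X ∩ Y| / |X ∪ Y| over the sets of distinct words (assumed variant)."""
    a, b = set(x), set(y)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_similarity(x, y):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(x[w] * y[w] for w in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def rbf_similarity(x, y, gamma=0.1):
    """Gaussian RBF of the Euclidean distance (assumed form; gamma is arbitrary)."""
    sq_dist = sum((x[w] - y[w]) ** 2 for w in x.keys() | y.keys())
    return math.exp(-gamma * sq_dist)


# Toy vectors (hypothetical word counts).
x = Counter({b"GE": 2, b"ET": 2, b"T ": 1})
y = Counter({b"GE": 1, b"ET": 1, b"PO": 1})
print(jaccard_similarity(x, y))
print(cosine_similarity(x, y))
print(rbf_similarity(x, y))
```

All three functions return values between 0 and 1 for non-negative count vectors. Only the RBF variant needs a tuning parameter.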
Application Vector Heuristics
- Application vector
  - Represents the typical packets generated by a target application, serving as the center (basis) of each cluster
- Application vector generator
  - Reads packets from the target application trace
  - Divides the packets into several types of clusters without any pre-processing
- [Figure: an application trace is fed to the application vector generator, which outputs application vectors 1 to 3; traffic clusters 1 and 2 are shown]

Application Vector Generation
- Unsupervised grouping within single-application traffic
  - Provides fine-grained classification
  - Classifies single-application traffic according to traffic type
- [Figure: packets 1 to 6 of one application's traffic grouped into Cluster 1 and Cluster 2, with Application vector 1 and Application vector 2]

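The slides do not specify the grouping algorithm, so the following is only one plausible sketch of unsupervised application vector generation. It assumes a simple one-pass scheme: each packet joins the most similar existing cluster if the Cosine similarity reaches a hypothetical `threshold`, and otherwise starts a new cluster. Each cluster center is kept as the sum of its members' vectors, a stand-in for the "simple vector operation" mentioned earlier in the talk.

```python
import math
from collections import Counter


def tf(payload, window=2):
    """Term-frequency vector over a byte-wise sliding window (assumed stride 1)."""
    return Counter(payload[i:i + window] for i in range(len(payload) - window + 1))


def cosine(x, y):
    dot = sum(x[w] * y[w] for w in x.keys() & y.keys())
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def generate_application_vectors(packet_vectors, threshold=0.5):
    """One-pass grouping (hypothetical scheme, not taken from the slides)."""
    centers = []
    for vec in packet_vectors:
        best_idx, best_sim = -1, 0.0
        for i, center in enumerate(centers):
            sim = cosine(vec, center)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Cosine ignores vector length, so a running sum acts like a mean.
            centers[best_idx] = centers[best_idx] + vec
        else:
            centers.append(Counter(vec))
    return centers


# Toy trace (hypothetical payloads) mixing two kinds of packets.
trace = [b"GET /index", b"GET /image",
         b"\x13BitTorrent protocol", b"\x13BitTorrent protocol ex"]
centers = generate_application_vectors([tf(p) for p in trace])
print(len(centers))
```

The result depends on packet order and on the threshold, which is one reason to treat this as an illustration of the idea rather than a recommended procedure.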
Two-stage Traffic Classification
- Packet-level clustering
  - Classifies single packets regardless of flow information
  - Compares payload vectors with application vectors by calculating a similarity value
  - Marks each packet with its application and priority
  - Allows permutation of the packet sequence
- Flow-level classification
  - Rearranges packets according to flow information
  - Ignores mis-clustered packets caused by protocol ambiguities (e.g., HTTP for Web vs. HTTP for P2P)

Two-stage Traffic Classification
- [Figure: backbone traffic containing BitTorrent and FileGuri traffic. In Stage 1, the packets of Flow 1 (F1 P1 to P4) and Flow 2 (F2 P1 to P4) are compared with Application Vectors 1 to 3 and placed into Clusters 1 to 3, labeled BitTorrent, FileGuri, and Melon, with one packet marked as mis-clustered. In Stage 2, the packets are regrouped by flow and the flows are labeled BitTorrent and FileGuri.]

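The two stages can be sketched as follows. This is a hedged illustration under several assumptions: Jaccard similarity over 2-byte word sets is used for packet labeling, `threshold` is a hypothetical cut-off, the flow label is chosen by majority vote (the slides say only that mis-clustered packets are ignored), and the per-packet priority mentioned on the slide is not modeled. The application names and payloads are toy examples.

```python
from collections import Counter, defaultdict


def words(payload, window=2):
    """Set of distinct words under a byte-wise sliding window (assumed stride 1)."""
    return {payload[i:i + window] for i in range(len(payload) - window + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_packet(payload, app_vectors, threshold=0.3):
    """Stage 1: label one packet without using any flow information."""
    best_app, best_sim = None, 0.0
    packet_words = words(payload)
    for app, signature in app_vectors.items():
        sim = jaccard(packet_words, signature)
        if sim > best_sim:
            best_app, best_sim = app, sim
    return best_app if best_sim >= threshold else None


def classify_flows(packets, app_vectors, threshold=0.3):
    """Stage 2: regroup labeled packets by flow and take a majority vote
    (assumed rule), so minority labels are treated as mis-clustered."""
    votes = defaultdict(Counter)
    for flow_id, payload in packets:
        label = classify_packet(payload, app_vectors, threshold)
        if label is not None:
            votes[flow_id][label] += 1
    return {flow: c.most_common(1)[0][0] for flow, c in votes.items()}


# Hypothetical application vectors and interleaved packets of two flows.
app_vectors = {
    "BitTorrent": words(b"\x13BitTorrent protocol"),
    "Web": words(b"GET / HTTP/1.1"),
}
packets = [
    ("F1", b"\x13BitTorrent protocol"),
    ("F2", b"GET /file HTTP/1.1"),
    ("F1", b"GET /announce HTTP/1.1"),  # HTTP used by P2P: labeled Web in stage 1
    ("F1", b"\x13BitTorrent protocol ex"),
    ("F2", b"GET /img HTTP/1.1"),
]
result = classify_flows(packets, app_vectors)
print(result)
```

Because stage 1 never looks at packet order or direction, the packets of a flow can arrive in any sequence, which is the property the talk relies on for asymmetric routing.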
Evaluation

Classifying Real-world Traffic
- Fixed-port applications
  - Traffic trace captured with an optical tap at one of the two Internet junctions at POSTECH
  - Ground-truth traffic: some active flows among the application traffic, distinguished by their use of active port numbers
  - Target applications: FileGuri, ClubBox, Melon, BigFile
- Untraceable-port applications
  - Traffic Measurement Agent (TMA)
  - Monitors the network interface of the host
  - Records log data (flow five-tuple, process name, packet count, etc.)
  - Target applications: eMule, BitTorrent
- [Figure: backbone traffic, target application traffic, and ground-truth traffic]

Classification Accuracy
- Classification accuracy comparison
  - Fixed-port applications: FileGuri, ClubBox, Melon, BigFile
  - Untraceable-port applications: eMule, BitTorrent
- Jaccard similarity
  - Reliable – counts common segments
- Cosine similarity
  - Emphasizes common segments – cannot distinguish ambiguous packets
- RBF similarity
  - The parameter is difficult to set – a guideline for setting it is needed
- BitTorrent traffic on the backbone network
  - Traffic over-classification by Cosine similarity
  - High false positive rate of Cosine similarity

Histogram of Similarity Values

CDF of Distance among Payload Vectors

Complexity Analysis

Conclusion and Future Work
- Developed a new traffic classification approach
  - Applies the document classification approach to traffic classification
  - Unsupervised classification to form clusters within single-application traffic
  - Two-stage classification algorithm to solve the asymmetric routing classification problem
  - Linear time complexity
- Compared three similarity metrics
  - Provides a guideline for selecting similarity metrics for traffic classification
  - Provides soft classification that represents similarity as a numerical value ranging from 0 to 1
- Future work
  - Enhance the unsupervised classification methodology for automated signature generation
  - Extract orthogonal application vectors to improve scalability