
A Hybrid Anomaly Detection Model using G-LDA Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I.Sumaiya Thaseen, Ch.Aswani Kumar. VIT University – Chennai.


1 A Hybrid Anomaly Detection Model using G-LDA Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I.Sumaiya Thaseen, Ch.Aswani Kumar. VIT University – Chennai

2 Typical IDS
Data Collection → Data Pre-Processing → Intrusion Identification → Response
This work focuses mainly on Intrusion Identification.

3 Architecture

4 Attribute Selection
“With more data, the simpler solution can be more accurate than the sophisticated solution.”
◦ The selection process is based on the means and modes of the numeric attributes.
◦ An attribute is kept when the mode values of anomaly and normal patterns contrast and the corresponding means are inclined towards those modes.
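The mean/mode criterion above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `select_attributes`, the tolerance `mean_tol`, and the toy values are hypothetical and do not come from the slides, which do not give the exact rule or thresholds.

```python
from statistics import mean, mode


def select_attributes(normal, anomaly, mean_tol=0.25):
    """Pick attributes whose class modes differ and whose class means stay near those modes.

    normal / anomaly: dict mapping attribute name -> list of numeric values.
    mean_tol is a hypothetical tolerance, not a value from the presentation.
    """
    selected = []
    for name in normal:
        n_vals, a_vals = normal[name], anomaly[name]
        n_mode, a_mode = mode(n_vals), mode(a_vals)
        # Contrast between the most common value in each class.
        modes_differ = n_mode != a_mode
        # Each class mean should lean towards its own mode.
        means_near_modes = (
            abs(mean(n_vals) - n_mode) <= mean_tol
            and abs(mean(a_vals) - a_mode) <= mean_tol
        )
        if modes_differ and means_near_modes:
            selected.append(name)
    return selected


# Made-up toy values, for illustration only.
normal = {"serror_rate": [0.0, 0.0, 0.0, 0.1], "duration": [0, 0, 0, 5]}
anomaly = {"serror_rate": [1.0, 1.0, 1.0, 0.9], "duration": [0, 0, 0, 7]}
chosen = select_attributes(normal, anomaly)
```

In this toy data `serror_rate` is kept because its modes differ between the classes, while `duration` is discarded because both classes share the same mode.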

5 Selected Attributes
◦ logged_in
◦ serror_rate
◦ srv_serror_rate
◦ same_srv_rate
◦ diff_srv_rate
◦ dst_host_serror_rate
◦ dst_host_srv_serror_rate
A strong contrast is visible between the trends of a selected attribute and a discarded one.

6 Training Set Selection (using LDA)
Latent Dirichlet Allocation is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.
◦ LDA is applied separately to anomaly and normal packets to obtain 200 sets of 10 packets each.
◦ Each set is dominated by a particular packet type.

7 Sample LDA Output
Topic 0th:
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly
0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly
0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly
Topic 1th:
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0.02,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0,0,anomaly
………………

8 Genetic Algorithm

9 Genetic Algorithm (Contd.)
◦ Applied on normal and anomaly packets separately
◦ A threshold value is used to assign a negative weight
◦ Run for 3 generations
◦ The top 3 values for anomaly and normal packets are used

10 Identifying the nature of an incoming packet
For each selected attribute value Fi in the incoming packet:
◦ If Fi ∈ Vi → Si = (A * Frequency of Fi in Anomaly) – (Frequency of Fi in Normal)
◦ Else → Si = 0
C = Σ Si
If C > 0
◦ Then Anomaly
◦ Else Normal
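The scoring rule on this slide can be written as a short function. This is a minimal sketch, not the authors' implementation: the name `classify`, the dictionary layout, the use of relative frequencies, and all numbers in the toy tables are assumptions made for illustration. `top_values` stands in for the sets Vi, and `weight` for the multiplier A.

```python
def classify(packet, anomaly_freq, normal_freq, top_values, weight=1.0):
    """Label a packet using C = sum(Si) over the selected attributes.

    packet:       dict attribute -> value (selected attributes only)
    anomaly_freq: dict attribute -> {value: frequency among anomaly packets}
    normal_freq:  dict attribute -> {value: frequency among normal packets}
    top_values:   dict attribute -> set of retained values (the Vi sets)
    weight:       the multiplier A applied to the anomaly frequency
    """
    score = 0.0
    for attr, value in packet.items():
        if value in top_values.get(attr, ()):
            # Si = A * freq in anomaly - freq in normal
            score += (weight * anomaly_freq[attr].get(value, 0.0)
                      - normal_freq[attr].get(value, 0.0))
        # Values outside Vi contribute Si = 0.
    return "anomaly" if score > 0 else "normal"


# Hypothetical frequency tables, invented for this example.
top_values = {"logged_in": {0, 1}, "serror_rate": {0.0, 1.0}}
anomaly_freq = {"logged_in": {0: 0.8, 1: 0.2}, "serror_rate": {0.0: 0.3, 1.0: 0.6}}
normal_freq = {"logged_in": {0: 0.3, 1: 0.7}, "serror_rate": {0.0: 0.9, 1.0: 0.05}}

suspicious = {"logged_in": 0, "serror_rate": 1.0}
ordinary = {"logged_in": 1, "serror_rate": 0.0}

label_suspicious = classify(suspicious, anomaly_freq, normal_freq, top_values)
label_ordinary = classify(ordinary, anomaly_freq, normal_freq, top_values)
label_ordinary_weighted = classify(ordinary, anomaly_freq, normal_freq, top_values, weight=4.0)
```

With these invented tables, raising `weight` from 1 to 4 flips the second packet from normal to anomaly, which illustrates how a larger A catches more anomalies at the cost of more false positives.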

11 Additional Weight
An additional weight is multiplied with the anomaly frequency.
Why?
◦ Generic anomalies have diverse values
◦ Normal packets, by contrast, contain values in a particular range
The weight sets the required trade-off between accuracy and the false positive rate.

12 Additional Weight

13 Results
◦ Tested against 50,000 anomaly and 50,000 normal packets from the KDD Cup ’99 dataset.
◦ 88.5% accuracy with a 6% false positive rate (FPR)
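The two reported figures imply an approximate confusion matrix, which the sketch below reconstructs. It assumes the standard definitions accuracy = (TP + TN) / total and FPR = FP / (FP + TN), which the slides do not state, and it treats the rounded percentages as exact, so the counts are only indicative. The function name `confusion_from_summary` is hypothetical.

```python
def confusion_from_summary(n_anomaly, n_normal, accuracy, fpr):
    """Back out approximate confusion-matrix counts from accuracy and FPR."""
    fp = round(fpr * n_normal)            # normal packets flagged as anomaly
    tn = n_normal - fp                    # normal packets correctly passed
    correct = round(accuracy * (n_anomaly + n_normal))
    tp = correct - tn                     # anomalies correctly flagged
    fn = n_anomaly - tp                   # anomalies missed
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


counts = confusion_from_summary(50_000, 50_000, accuracy=0.885, fpr=0.06)
# Share of anomaly packets that were detected, under the assumptions above.
detection_rate = counts["tp"] / (counts["tp"] + counts["fn"])
```

Under these assumptions roughly 3,000 normal packets are misclassified and about 83% of the anomaly packets are detected.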

14 Future Work
Focus on specific anomaly types
Better attribute selection algorithm?
◦ oneR
◦ Entropy based
◦ Chi-squared
◦ randomForest
Better classification technique?
◦ Clustering: Hierarchical, K-Means
◦ Decision Trees


18 QUESTIONS?

