Advanced Science and Technology Letters Vol.53 (AITS 2014), pp.429-433 A Network Intrusion Detection Method.


A Network Intrusion Detection Method Based on Improved K-means Algorithm

Meng Gao 1, Nihong Wang 1

1 Information and Computer Engineering College, Northeast Forestry University, Harbin, China

Meng Gao, female (1989- ), Ph.D., mainly engaged in forestry informatization and system security.
ISSN: ASTL Copyright © 2014 SERSC

Abstract. The K-means algorithm can be used for intrusion detection, and the selection of the initial cluster centers is one of the most important factors influencing clustering performance. The traditional method selects these centers with a certain degree of randomness. Information entropy is therefore introduced into the selection of cluster centers, and a fusion algorithm combining information entropy with K-means is proposed. The information entropy value measures the degree of similarity among records and helps choose the least similar records as cluster centers. Comparison results show that the detection ratio and false alarm ratio of the proposed method are better than those of the traditional K-means algorithm.

Keywords: K-means; Information entropy; Intrusion detection

1 Introduction

Network intrusion detection is a process comprising a series of actions: collecting data on network status and behavior from key nodes, analyzing these data, discovering abnormal behavior, and providing early warning [1-2]. Its purpose is to monitor network behavior and defend against network intrusion. Because intrusion behaviors are uncertain to some degree, it is of great significance to identify unknown behaviors by extracting hidden information from intrusion data [3-4].

Li Wenhua proposed a network intrusion detection model based on fuzzy c-means (FCM) clustering [5]. Zhang Guosuo proposed an improved FCM clustering algorithm that overcomes the limitations of traditional FCM on large datasets [6]. Reda M. Elbasiony used random forests and weighted k-means to build intrusion patterns and choose anomalous clusters [7]. Luo Min studied an unsupervised intrusion detection model based on the K-means algorithm [8]. Li Heling proposed an improved K-means algorithm and carried out experiments targeting the problem of uneven data distribution [9]. These studies focused on the size of the data that the algorithm can handle and ignored the kernel of the algorithm itself.

This paper uses the K-means algorithm to detect intrusion behaviors. Because the selection of the initial cluster centers is the key factor influencing the clustering result, information entropy is introduced to assist in determining the cluster centers. Experiments show that the improved fusion algorithm achieves a good detection ratio and false alarm ratio.

2 Algorithm Combining Information Entropy and K-means

2.1 K-means Algorithm

This paper uses Euclidean distance to measure the similarity among records and uses formula (1) to evaluate the clustering results:

$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \bar{x}_i \rVert^2, \qquad \bar{x}_i = \frac{1}{n_i} \sum_{x \in C_i} x \qquad (1)$$

where $E$ is the sum of the squared errors of all objects; $C_i$ is a data cluster; $x$ is a data object; $n_i$ is the number of objects in $C_i$; $k$ is the number of clusters. The smaller the value of $E$, the better the clustering. The symbol names here follow the standard form of this criterion.

Data clustering using the K-means algorithm can be described as follows:

1. Define the number of clusters $k$ to be finally generated;
2. Choose $k$ records as the initial cluster centers;
3. Divide the original data into the $k$ clusters, and recalculate the center of each cluster;
4. Break up the clustering result of the previous stage, put each object into the corresponding cluster according to the principle of minimum Euclidean distance, form the new clusters, and calculate the value of $E$ at the same time;
5. Repeat stage (4) until the new clusters are the same as the previous clusters.

The performance of the algorithm is mainly determined by stages (1) and (2). The cluster number is often determined according to the actual situation [10-11]; therefore, the selection of the initial cluster centers is the key factor influencing algorithm performance.

2.2 Information Entropy

Information entropy measures the uncertainty of a random variable: the larger it is, the more disordered the data; the smaller it is, the more ordered and similar the data [11-12]. When information entropy is used to evaluate clustering, the smaller the entropy, the more similar the data within a cluster and the better the clustering [13-14].
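As a small illustration of the ordered-versus-disordered reading of entropy, the following Python sketch computes the empirical Shannon entropy, $-\sum p \log p$, of a list of discrete values. The function name `shannon_entropy`, the base-2 logarithm, and the toy protocol labels are assumptions made for this example; the paper does not specify them.

```python
import math
from collections import Counter


def shannon_entropy(values, base=2.0):
    """Empirical Shannon entropy of a sequence of discrete values.

    Probabilities are estimated as relative frequencies (an assumption
    of this sketch; the paper does not say how they are obtained).
    """
    total = len(values)
    if total == 0:
        raise ValueError("need at least one value")
    entropy = 0.0
    for count in Counter(values).values():
        p = count / total
        entropy -= p * math.log(p, base)
    return entropy


# Hypothetical feature columns: one perfectly ordered, one mixed.
ordered = ["tcp"] * 8
mixed = ["tcp", "udp", "icmp", "tcp", "udp", "icmp", "tcp", "udp"]

print(shannon_entropy(ordered))  # a single repeated value carries no uncertainty
print(shannon_entropy(mixed))    # a mixed column has higher entropy
```

A column holding one repeated value has zero entropy, while a column that mixes several values scores higher, which matches the reading that lower entropy means more similar data.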

The information entropy of a random variable $X$ can be described, in the standard Shannon form, as:

$$H(X) = -\sum_{x \in S} p(x) \log p(x) \qquad (2)$$

where $S$ is the set of possible values of $X$ and $p(x)$ is the probability function of $X$.

2.3 Improved K-means Algorithm Based on Information Entropy

Assume that the sample space includes $n$ records $M_1, \dots, M_n$. First, calculate the information entropy value of each record. Then, starting from the first record, compare the value of the current record with those of the other records, and take the minimum value as the information entropy baseline of the current record. The comparison matrix is shown in Table 1.

Table 1. Comparison matrix of information entropy values

i \ j   1             2             3             ...   n             Baseline
1       E(M_1,M_1)    E(M_1,M_2)    E(M_1,M_3)    ...   E(M_1,M_n)    min E(M_1,M_j)
2       E(M_2,M_1)    E(M_2,M_2)    E(M_2,M_3)    ...   E(M_2,M_n)    min E(M_2,M_j)
3       E(M_3,M_1)    E(M_3,M_2)    E(M_3,M_3)    ...   E(M_3,M_n)    min E(M_3,M_j)
...
n       E(M_n,M_1)    E(M_n,M_2)    E(M_n,M_3)    ...   E(M_n,M_n)    min E(M_n,M_j)

Collect the baselines into the information entropy baseline set and sort it in descending order to obtain the ordered baseline set. The larger the information entropy value, the less similar the corresponding record is to the other records, and the more suitable it is as an initial cluster center. Given the cluster number $k$ determined in stage (1) of the K-means algorithm, choose the top-$k$ records by information entropy value in the ordered baseline set as the least similar records; these records serve as the initial cluster centers.

2.4 Network Intrusion Detection Algorithm Based on IE-K-means

The process of detecting network intrusion using the IE-K-means algorithm can be described as:

1. Define the number of clusters $k$ to be finally generated, and set the instance threshold of clusters;
2. Choose $k$ records as the initial cluster centers using the IE-K-means algorithm;
3. Calculate the Euclidean distance between each cluster center and the other records;

4. According to the minimum Euclidean distance, assign each record to its nearest cluster and generate the new clusters;
5. Recalculate the centers of the new clusters, and record the instance number of each cluster;
6. Break up the clustering result of the previous stage, and repeat stages (3)-(5) until the current clusters are the same as the previous clusters;
7. Record the center and the instance number of each generated cluster;
8. Compare the instance number of each cluster with the instance threshold and, according to the result, mark the cluster's center as the center of an abnormal cluster or as the center of a normal cluster;
9. When a new connection arrives, calculate the Euclidean distance between the new connection and each cluster center. If it is closer to the center of an abnormal cluster, mark the new connection as an intrusion; if it is closer to the center of a normal cluster, mark the new connection as normal.

3 Simulation Experiment and Analysis

The KDDCUP99 data set is used to verify the feasibility and effectiveness of the IE-K-means algorithm. 7200 DoS attack records are chosen, of which 5500 are used as training data for training the model and the other 1700 are used as testing data for testing the effectiveness of the intrusion detection model. The experiment adopts different cluster numbers: the training data are first clustered to obtain the cluster center set, and the testing data are then sent to the anomaly detection system for intrusion detection, with the detection ratio and false alarm ratio calculated for each data set. Experiment results are shown in Table 2. The results indicate that the network intrusion detection model based on IE-K-means is feasible, and that the improved algorithm is better than the traditional K-means algorithm in detection ratio and false alarm ratio across the different cluster numbers.

Table 2. Comparison experiment results
Columns: k; DetectRate/% and FalseDetectRate/% for the K-means algorithm and for the IE-K-means algorithm. The numeric entries are missing from this transcript.

4 Conclusions

According to the characteristics of network intrusion data, and aiming at the problems in current intrusion detection research, this paper proposes a network intrusion detection method based on a fusion algorithm combining information entropy and K-means. Experiment results show that the fusion algorithm improves the detection ratio and reduces the false alarm ratio compared with the traditional K-means algorithm. However, the implementation of the fusion algorithm did not consider execution efficiency, which requires further study.

Acknowledgments. This work is supported by the Special Fund for Scientific Research in the Public Interest ( ) and the Fundamental Research Funds for the Central Universities ( AB22).

References

1. Jonathan, J.D., Andrew, J.C.: Data Processing for anomaly based network intrusion detection: A review. Computers & Security. 30, (2013)
2. Liao, S.H., Chu, P.H., Hsiao, P.Y.: Data mining techniques and applications – A decade review from 2000 to [...]. Expert Systems with Applications. 39, (2012)
3. Mohammand, S.A., Hamid, M., Jafar, H.: Design and analysis of genetic fuzzy systems for intrusion detection in computer networks. Expert Systems with Applications. 38, (2011)
4. Chen, X.H.: Intrusion Detection Method Based on Data Mining Algorithm. Computer Engineering. 36, (2010)
5. Li, W.H.: Network Intrusion Detection Model Based on Clustering Analysis. Computer Engineering. 37, (2011)
6. Zhang, G.S., Zhou, C.M., Lei, Y.J.: Improved fuzzy C-means clustering algorithm and its application to intrusion detection. Journal of Computer Applications. 29, (2009)
7. Reda, M.E., Elsayed, A.S., Tarek, E.E., Mahmoud, M.F.: A hybrid network intrusion detection framework based on random forests and weighted k-means. Ain Shams Engineering Journal. 4, (2013)
8. Luo, M., Wang, L.N., Zhang, H.G.: An Unsupervised Clustering-Based Intrusion Detection Method. Acta Electronica Sinica. 31, (2003)
9. Li, H.L.: Study on Application of data mining in network intrusion detection. JiLin University, JiLin (2013)
10. Li, Y.: Application of K-means Clustering Algorithm in Intrusion Detection. Computer Engineering. 33, (2007)
11. Du, Q., Sun, M.: Intrusion detection system based on improved clustering algorithm. Computer Engineering and Applications. 47, (2011)
12. Ye, Z.W.: The Research of Intrusion Detection Algorithms Based on the Clustering of Information Entropy. Procedia Environmental Sciences. 12, (2012)
13. Feng, J., Sui, Y.F., Cao, C.G.: An Information entropy-based approach to outlier detection in rough sets. Expert Systems with Applications. 37, (2010)
14. Jin, C.X., Li, F.C., Li, Y.: A generalized fuzzy ID3 algorithm using generalized information entropy. Knowledge-Based Systems. 64, (2014)
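As a supplementary illustration of Sections 2.1, 2.3 and 2.4, the Python sketch below strings the stages together: baseline-driven selection of initial centers, K-means iteration with the sum-of-squared-errors criterion, threshold-based cluster labeling, and nearest-center classification of a new connection. It is a minimal sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. The pairwise measure E(M_i, M_j) is not defined in the text, so it is passed in as a matrix, and the demo substitutes plain Euclidean distance as a stand-in. The diagonal entry is skipped when taking each row minimum. The direction of the threshold test (small clusters treated as abnormal) is also an assumption, exposed as the parameter `small_is_abnormal`. The toy records are invented.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_initial_centers(records, pairwise, k):
    """Pick the k records with the largest baseline (Section 2.3).

    pairwise[i][j] stands in for E(M_i, M_j); how it is computed is an
    assumption left to the caller. The diagonal is skipped (assumption).
    """
    n = len(records)
    baselines = [min(pairwise[i][j] for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: baselines[i], reverse=True)
    return [list(records[i]) for i in order[:k]]


def assign(records, centers):
    """Index of the nearest center for every record."""
    return [min(range(len(centers)), key=lambda c: euclidean(r, centers[c]))
            for r in records]


def kmeans(records, centers, max_iter=100):
    """Reassign and recompute centers until the clusters stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(records, centers)
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(euclidean(r, centers[lab]) ** 2 for r, lab in zip(records, labels))
    return centers, labels, sse


def label_clusters(labels, k, threshold, small_is_abnormal=True):
    """Flag each cluster as abnormal (True) or normal (False).

    ASSUMPTION: clusters with fewer instances than the threshold are
    abnormal; the source does not preserve the actual inequality.
    """
    counts = [labels.count(c) for c in range(k)]
    return [(count < threshold) == small_is_abnormal for count in counts]


def classify(connection, centers, abnormal):
    """Label a new connection by its nearest cluster center."""
    nearest = min(range(len(centers)), key=lambda c: euclidean(connection, centers[c]))
    return "abnormal" if abnormal[nearest] else "normal"


# Invented toy data: four tightly grouped records and two remote ones.
records = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 12)]
# Stand-in for the entropy comparison matrix of Table 1 (assumption).
pairwise = [[euclidean(a, b) for b in records] for a in records]

initial = select_initial_centers(records, pairwise, k=2)
centers, labels, sse = kmeans(records, [c[:] for c in initial])
abnormal = label_clusters(labels, k=2, threshold=3)

print(centers, labels, sse)
print(classify((9, 9), centers, abnormal), classify((1, 2), centers, abnormal))
```

Taking the pairwise matrix as an argument keeps the seeding step independent of the unspecified entropy comparison: any measure that fills Table 1 can be supplied without changing the rest of the sketch.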