1
CYBERCRIME and Avoidance Techniques
Evaluating and Selecting Different Classifiers on KDD99 Dataset with Feature Reduction for Network Intrusion Detection
2
Contents
Introduction
Problem Statement
Objectives
Technical Requirements
Methodology
Results
Conclusion
Future Work
3
Cyber Crime and Types
Cyber crime is an act in which a computer is the tool for an unlawful act; it also covers conventional crimes that are modified by the use of computers.
Types of cyber crime:
Financial crimes
Theft of information contained in electronic form
Cyber pornography
E-mail bombing
Sale of illegal articles
Data diddling
Online gambling
Salami attacks
Intellectual Property crimes
Denial of Service attack
E-mail spoofing
Virus / worm attacks
Forgery
Logic bombs
Cyber Defamation
Trojan attacks
Cyber stalking
Unauthorized access to computer systems or networks
Internet time thefts
Theft of computer system
Web jacking
Physically damaging a computer system
4
Various Security Tools Used
5
Network Protection Techniques used
(Source: CII – PricewaterhouseCoopers)
6
Data Protection Techniques used by Companies
(Source: CII – PricewaterhouseCoopers)
7
Percentage Break-up of companies keeping Data Protection
(Source: “Cyber Crime . . . and Punishment? Archaic Laws Threaten Global Information”, McConnell International LLC)
8
Electronic Crime Targets
(Source: Hollis Stambaugh, David S. Beaupre, et al., “Electronic Crime Needs Assessment for State and Local Law Enforcement”, National Institute of Justice Research Report, U.S. Department of Justice)
9
Purposes of Data Mining Efforts in Departments and Agencies
10
Intrusion Detection Techniques
Intrusion Detection Systems can be of two types:
Network Intrusion Detection
Database Intrusion Detection
Different types of Intrusion Detection Techniques are:
Bayesian Networks
Neural Networks
Data Mining
11
An intrusion detection system (IDS) monitors networks, small or large, to detect intrusions and attackers.
There are two types of intrusion detection system:
NIDS (network-based IDS): part of the network, such as a device responsible for detection within the network.
HIDS (host-based IDS): installed on a host as software.
12
Intrusions pose a major risk to the information transmitted within organizations.
A high-performance IDS can monitor and prevent these intrusions on the network. We propose an optimized method, based on a study of the relevant work in this field (intrusion detection systems).
13
Objectives
Review and analyze the existing security threats in networks.
Analyze, characterize, compare, and design intrusion detection systems using soft computing techniques.
Propose a network intrusion detection system that detects threats on the network.
Reach high intrusion detection (classification) accuracy on the network using the Naïve Bayes algorithm.
14
Relevant Work
A literature survey of around 50 papers was carried out, and the papers were categorized into four groups: Pattern Matching, Genetic Algorithm, Signature Based, and Machine Learning. The topic of primary importance is Machine Learning, which was selected in this study for review, analysis, and application improvement.
15
Technical Requirements
Personal computer.
KDDcup99 dataset files.
KDDcup99 dataset converted to the ARFF file type.
WEKA to use the classifiers (training and testing data).
MATLAB for testing.
Statistica.
RapidMiner Studio.
MiKTeX for referencing.
Excel.
Word.
Notepad.
Access to ScienceDirect.
16
The Data Mining Techniques can be further classified into:
Misuse Detection
Anomaly Detection
Signature Based Analysis
Statistical or Data Mining Analysis
17
Methodology
Our approach is based on Machine Learning. Most of the relevant work uses the KDDcup99 dataset, so we downloaded it. WEKA could not open the original dataset files for testing, so we searched for and found a version of the dataset converted to the ARFF file type. We used this ARFF version and analyzed it to see how many attributes it contains. The full dataset is very large, so we used 10% of it, which contains enough attacks for our algorithm.
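Counting the attributes of an ARFF file comes down to reading its header. The sketch below is a minimal, standard-library illustration of that step, not the tool used in the study (the slides only mention WEKA). The tiny header string is hypothetical, and the parser assumes simple unquoted attribute names.

```python
def parse_arff_attributes(lines):
    """Return (name, type) pairs from the @attribute lines of an ARFF header."""
    attributes = []
    for raw in lines:
        line = raw.strip()
        lower = line.lower()
        if lower.startswith("@data"):
            break  # the header ends where the data section begins
        if lower.startswith("@attribute"):
            # Expected shape: @attribute <name> <type>
            # (assumption: names contain no spaces, so a plain split is enough)
            _, name, attr_type = line.split(None, 2)
            attributes.append((name.strip("'\""), attr_type))
    return attributes


# Hypothetical three-attribute header, for illustration only.
SAMPLE_HEADER = """@relation kdd_sample
@attribute duration numeric
@attribute protocol_type {tcp,udp,icmp}
@attribute class {normal,anomaly}
@data
0,tcp,normal
""".splitlines()

attrs = parse_arff_attributes(SAMPLE_HEADER)
print(len(attrs), "attributes declared")
for name, attr_type in attrs:
    print(name, "->", attr_type)
```

Run against a real ARFF file (for example `parse_arff_attributes(open(path))`, with `path` supplied by the reader), the same function reports how many attributes the file declares, including the class label.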
18
Methodology
The second part was a study of the dataset (10% of KDDcup99 in the ARFF file type). We found that it contains 41 attributes, and we studied the features that can be extracted from each attribute.
19
Methodology

| Name | Description | Type of the data |
|---|---|---|
| duration | How long the connection lasted (seconds). | Continuous |
| protocol_type | Type of protocol, such as TCP. | Discrete |
| service | What service was requested (e.g. http). | |
| src_bytes | The bytes transmitted from source to destination. | |
| dst_bytes | The bytes transmitted from destination to source. | |
| flag | Status of the connection. | |
| land | 1 for a connection to the same host. | |
| wrong_fragment | Total number of wrong fragments. | |
| urgent | Total number of urgent packets. | |
| hot | Total number of hot indicators. | |
| num_failed_logins | Total number of failed logins. | |
| logged_in | 1 if logged in, 0 if not. | |
| num_compromised | Total number of compromised conditions. | |
| root_shell | 1 if a root shell is obtained, 0 if not. | |
| su_attempted | 1 if the su root command has been used, 0 if not. | |
| num_root | Total number of root accesses. | |
| num_file_creations | Total number of file creation operations. | |
| num_shells | Total number of shell prompts. | |
| num_access_files | Total number of operations accessing and controlling files. | |
| num_outbound_cmds | Total number of outbound commands, such as in ftp sessions. | |
| is_host_login | 1 if the login is on the "hot" list, 0 if not. | |
| is_guest_login | 1 if the login is a guest login, 0 if not. | |
20
Methodology
The traffic features of TCP connections, recorded within a 2-second time window, are listed in the following table:

| Name | Description | Type of the data |
|---|---|---|
| count | Total number of connections to the same host within the 2 seconds. | Continuous |
| serror_rate | Percentage of the connections which have SYN errors. | |
| rerror_rate | Percentage of the connections which have REJ errors. | |
| same_srv_rate | Percentage of the connections to the same service. | |
| diff_srv_rate | Percentage of the connections to different services. | |
| srv_count | Total number of connections within the same connection link to the same service. | |
| srv_serror_rate | | |
| srv_rerror_rate | | |
| srv_diff_host_rate | Percentage of the connections to different hosts. | |
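The window-based features above can be illustrated with a short sketch. It is not the procedure used to build KDDcup99; it assumes a time-sorted list of connection records, a hypothetical `Conn` record type, and that the `S0` flag stands in for a "SYN error".

```python
from collections import namedtuple

# Hypothetical record type: timestamp (seconds), destination host, service, flag.
Conn = namedtuple("Conn", "time dst_host service flag")


def window_features(connections, index, window=2.0):
    """Compute count, srv_count and serror_rate for connections[index].

    `connections` must be sorted by time. Only connections seen in the
    `window` seconds up to and including the current one are considered.
    """
    current = connections[index]
    recent = [c for c in connections[: index + 1]
              if current.time - c.time <= window]
    same_host = [c for c in recent if c.dst_host == current.dst_host]
    same_srv = [c for c in recent if c.service == current.service]
    # Assumption: flag "S0" marks a SYN error.
    syn_errors = sum(1 for c in same_host if c.flag == "S0")
    return {
        "count": len(same_host),          # same destination host in the window
        "srv_count": len(same_srv),       # same service in the window
        "serror_rate": syn_errors / len(same_host),
    }


log = [
    Conn(0.0, "10.0.0.5", "http", "SF"),
    Conn(0.5, "10.0.0.5", "http", "S0"),
    Conn(1.0, "10.0.0.9", "ftp", "SF"),
    Conn(1.8, "10.0.0.5", "smtp", "S0"),
    Conn(4.5, "10.0.0.5", "http", "SF"),
]
features = window_features(log, 3)
print(features)
```

For the fourth record, three connections to the same host fall inside the window and two of them carry `S0`, so `count` is 3 and `serror_rate` is 2/3. The last record arrives after the window has emptied, so its `count` drops back to 1.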
21
Methodology
The attacks in this dataset are categorized into four groups:
DoS: Denial of service.
R2L: Unauthorized remote access.
U2R: Unauthorized access to root privileges.
Probing: Probing, such as port scanning.

| KDDcup99 Dataset | Total samples | DoS | Probe | R2L | U2R | Normal connections |
|---|---|---|---|---|---|---|
| Full Dataset | 4,898,430 | 3,883,370 | 41,102 | 1,126 | 52 | 972,780 |
| 10% of data | 494,020 | 391,458 | 4,107 | 1,126 | 52 | 97,277 |
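A quick arithmetic check of the class distribution is sketched below. It assumes that the R2L and U2R counts of the 10% subset are the same as in the full dataset (1,126 and 52); with that assumption the per-class counts add up exactly to the stated totals.

```python
full = {"DoS": 3_883_370, "Probe": 41_102, "R2L": 1_126,
        "U2R": 52, "Normal": 972_780}
# Assumption: R2L and U2R counts in the 10% subset equal the full-dataset values.
ten_percent = {"DoS": 391_458, "Probe": 4_107, "R2L": 1_126,
               "U2R": 52, "Normal": 97_277}


def shares(counts):
    """Return the total and each class's share of it in percent."""
    total = sum(counts.values())
    return total, {name: 100.0 * n / total for name, n in counts.items()}


full_total, full_shares = shares(full)
sub_total, sub_shares = shares(ten_percent)

print("full total:", full_total)   # 4898430, matching the table
print("10% total:", sub_total)     # 494020, matching the table
for name in full:
    print(f"{name:7s} full {full_shares[name]:6.3f}%   10% {sub_shares[name]:6.3f}%")
```

The shares make the class imbalance visible: DoS dominates both versions of the dataset, while U2R accounts for only a tiny fraction of the records.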
22
Methodology
The next part was data preprocessing, transformation, and reduction. Experiments were run on different classifiers. Feature reduction and discretization were applied to improve the results.
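Discretization turns a continuous attribute into a small number of bins. The slides do not say which discretization method was used, so the sketch below shows one common choice, equal-width binning, purely as an illustration; the sample values are made up.

```python
def equal_width_bins(values, n_bins=10):
    """Map each value to a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: everything in one bin
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Made-up src_bytes values with one large outlier.
src_bytes = [0, 181, 239, 235, 1032, 54540]
print(equal_width_bins(src_bytes, n_bins=4))  # [0, 0, 0, 0, 0, 3]
```

The output shows a known weakness of equal-width binning on skewed traffic features: one outlier stretches the range so that almost all values share a single bin. Supervised or frequency-based discretization is often preferred in that situation.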
24
Methodology
Naïve Bayes is selected and works as follows:
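The slide's own description of the algorithm is not part of this transcript. As a general illustration, a Naïve Bayes classifier picks the class c that maximizes P(c) multiplied by the product of P(x_i | c) over all attributes, treating the attributes as independent given the class. The sketch below implements this for discrete attributes with Laplace smoothing. It is not the WEKA implementation used in the study; the class name and the toy records are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Minimal Naive Bayes for discrete attributes with Laplace smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(feature_index, label)][value] -> number of occurrences
        self.value_counts = defaultdict(Counter)
        self.feature_values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.feature_values[i].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalized log P(c | row) for every class c."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)  # log prior P(c)
            for i, value in enumerate(row):
                count = self.value_counts[(i, label)][value]
                n_values = len(self.feature_values[i])
                # Laplace smoothing keeps unseen values from zeroing the product.
                score += math.log((count + 1) / (n_label + n_values))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Toy records: (protocol_type, service, flag). Values are illustrative only.
rows = [
    ("tcp", "http", "SF"), ("tcp", "http", "SF"), ("udp", "domain", "SF"),
    ("tcp", "private", "S0"), ("tcp", "private", "S0"), ("icmp", "ecr_i", "SF"),
]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("tcp", "private", "S0")))
print(model.predict(("tcp", "http", "SF")))
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied, which matters for a dataset with 41 attributes.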
25
Results
Testing different feature selection methods with the Naïve Bayes algorithm gave a better result, as shown in the table on the next slide. The accuracy obtained with discretization was 97.92%. Discretization combined with CfsSubsetEval for feature selection gave the best results for classifying the KDDcup99 dataset.
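Correlation-based feature selection (CFS), the idea behind WEKA's CfsSubsetEval, prefers subsets whose features correlate strongly with the class but weakly with each other. A subset of k features with mean feature-class correlation r_cf and mean feature-feature correlation r_ff is scored as k * r_cf / sqrt(k + k * (k - 1) * r_ff). The sketch below only evaluates this merit for given correlation values; the numbers are illustrative and not taken from the study.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a k-feature subset from its two mean correlations."""
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator


# Same relevance to the class, different redundancy among the features.
low_redundancy = cfs_merit(3, 0.6, 0.2)
high_redundancy = cfs_merit(3, 0.6, 0.8)
print(round(low_redundancy, 4), round(high_redundancy, 4))
```

With equal relevance, the subset with less redundant features scores higher, which is why this kind of selection can shrink a 41-attribute dataset to a handful of attributes without losing much predictive information.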
26
Results

| Data used/Paper | Feature selection tool | Algorithm used | Accuracy | Time taken to build the model |
|---|---|---|---|---|
| Mrutyunjaya Panda’s paper on KDD99: 65,525 records with the full 41 attributes | | Naïve Bayes | 94.9% | 1.89 s |
| Dr. Saurabh’s paper on KDD99: 62,986 records | Attributes reduced to 24 using CfsSubsetEval | | 97.55% | 6.81 s |
| | Attributes reduced to 10 using the FVBRM method (highest result) | | 97.78% | 9.42 s |
| Our results on KDD99: 268,187 records used after SMOTE and discretization | Attributes reduced to 10 using Random Forest | | 92.09% | 9.49 s |
| | Attributes reduced to 9 using CfsSubsetEval | | 97.92% | 3.35 s |
27
Conclusion
Data mining was the main part of this research and helped us find and classify the intrusions within network logs (the KDDcup99 dataset). IDSs mainly use classification methods to distinguish intrusions from normal connections.
28
Conclusion
29
Data Mining Techniques for Intrusion Detection
[Diagram: data mining process for intrusion detection, leading from INFORMATION to KNOWLEDGE. Labels: Sniffers and Sensors; Data Cleaning; Data Selection and Transformation; Transformed Database; Data Warehouse; Data Mining Ops; Discovery Modeling; Visualization; Human Analysis and Verification; Query Selection and Feed-Back Loop; Intrusion Detection; Intrusion Detection Systems and Multi-sensor Data Fusion]
30
THANK YOU