Published by Annabelle Baldwin. Modified over 9 years ago.
1
Implementation of Machine Learning and Chaos Combination for Improving Attack Detection Accuracy on Intrusion Detection System (IDS)
Bisyron Wahyudi, Kalamullah Ramli
Department of Electrical Engineering, Universitas Indonesia
2
Network Security
3
Background: IDS
The most important element in network security: IDS
Intrusion detection principles:
- Misuse detection (signature-based)
- Anomaly detection (statistics)
- Classification with machine learning (research)
4
Background: Problem
- Intrusion detection produces too many false alarms.
- New types of attack arise more and more often.
- An effective and adaptable detection method is required.
- Classification with machine learning gives the best results, but those results depend on the kernel function and its parameters, and on the network data attributes/features.
- There is no systematic theory for choosing the appropriate kernel and parameters.
5
Data Mining Approach for IDS
1. Capturing packets transferred on the network.
2. Extracting an extensive set of attributes/features from the network packet data that can describe a network connection or a host session.
3. Learning a model that can accurately describe the behavior of abnormal and normal activities by applying data mining techniques.
4. Detecting intrusions by using the learned models.
6
Data Mining Approach
Classification (Supervised)      Clustering (Unsupervised)
K Nearest Neighbor (K-NN)        K-Means
Naïve Bayes                      Hierarchical Clustering
Artificial Neural Network        DBSCAN
Support Vector Machine           Fuzzy C-Means
Fuzzy K-NN                       Self Organizing Map
7
Machine Learning
Model Development: Input Training Data (x, y) -> Learning Algorithm
Model Implementation: Input Test Data (x, ?) -> Output Test Data (x, y)
8
SVM Classification
9
Kernel Function
Kernel Name                      Definition of Function
Linear                           $K(x, y) = x \cdot y$
Polynomial                       $K(x, y) = (x \cdot y + c)^{d}$
Gaussian RBF                     $K(x, y) = \exp\left(-\lVert x - y \rVert^{2} / (2\sigma^{2})\right)$
Sigmoid (Tangent Hyperbolic)     $K(x, y) = \tanh\left(\sigma (x \cdot y) + c\right)$
Inverse Multiquadric             $K(x, y) = 1 / \sqrt{\lVert x - y \rVert^{2} + c}$
x and y: a pair of data points from the training dataset
σ, c, d > 0: constant parameters
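The kernel definitions on this slide can be written out as plain functions. The sketch below is illustrative only: the function names and default parameter values are assumptions, and the inverse multiquadric is read with the square root covering both the squared distance and c, which the flattened slide text does not make explicit.

```python
import math


def dot(x, y):
    """Inner product x . y of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear(x, y):
    return dot(x, y)


def polynomial(x, y, c=1.0, d=2):
    return (dot(x, y) + c) ** d


def gaussian_rbf(x, y, sigma=1.0):
    return math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))


def sigmoid(x, y, sigma=1.0, c=1.0):
    return math.tanh(sigma * dot(x, y) + c)


def inverse_multiquadric(x, y, c=1.0):
    # Assumption: the square root covers ||x - y||^2 + c.
    return 1.0 / math.sqrt(sq_dist(x, y) + c)
```

Each function takes two feature vectors of equal length; the parameters σ, c and d are the ones that the later slides propose to tune.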
10
SVM Performance
- How to choose the optimal/significant features of the input dataset.
- How to set the best kernel function and its parameters: σ, ε and C.
11
Chaos
- Three important dynamic properties: intrinsic stochasticity, ergodicity and regularity.
- Advantage of chaos: it can escape from local minima.
- Its powerful global searching ability makes it more efficient to obtain optimized parameters.
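The slides do not say which chaotic system is used. As an illustration of ergodicity, the sketch below assumes the logistic map with mu = 4, a common choice in chaos optimization; the function name, seed value and bin count are hypothetical.

```python
def logistic_map(x0, n, mu=4.0):
    """Return n successive iterates of x_{k+1} = mu * x_k * (1 - x_k)."""
    xs = []
    x = x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs


# Generate a chaotic sequence from an arbitrary seed in (0, 1).
seq = logistic_map(0.345, 1000)

# Rough ergodicity check: which of 10 equal-width bins of [0, 1] are visited?
bins_visited = {min(int(v * 10), 9) for v in seq}
```

Because the sequence keeps wandering over the whole interval instead of settling down, values drawn from it can be rescaled into a parameter range and used as candidate points for a global search.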
12
System Design
13
Methodology
Data Collection (KDDCUP ’99 DARPA Dataset) -> Data Preprocessing (Training Dataset, Test Dataset) -> Model Development -> Data Classification -> Predicted Intrusion Data
14
Data Preprocessing
Input: KDDCUP ’99 DARPA Dataset
- Dataset Transformation
- Dataset Normalization
- Range Discretization
- Format Conversion
- Dataset Division: Training & Test
Output: Training Dataset, Test Dataset
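The slide names the preprocessing steps without giving details. A minimal sketch of what such steps commonly look like follows; the integer encoding, min-max scaling, equal-width binning and the split by position are all assumptions, as are the function names.

```python
def encode_symbolic(values):
    """Transformation: map each distinct symbol (e.g. 'tcp') to an integer code."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return encoded, codes


def min_max_normalize(values):
    """Normalization: rescale numeric values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, n_bins):
    """Range discretization: map values in [0, 1] to bin indices 0..n_bins-1."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]


def split(records, train_fraction=0.7):
    """Dataset division: the first part for training, the rest for testing."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]
```

For example, `encode_symbolic` could be applied to the `protocol_type` column and `min_max_normalize` to `src_bytes` before the records are divided into training and test sets.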
15
Model Development
Input Training Data (x, y) -> Kernel Function Selection -> Parameter Selection with Chaos Optimization -> Learning Algorithm (SVM)
Model Implementation
Input Test Data (x, ?) -> Output Test Data (x, y)
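The parameter selection step is not detailed on the slide. One way it could work is sketched below: chaotic variables from a logistic map are rescaled into the search bounds of each parameter, and the best-scoring candidate is kept. The logistic map, the bounds, the seeds and `toy_objective` are all hypothetical; in a real system the objective would be something like the cross-validated accuracy of an SVM trained with the candidate parameters.

```python
import math


def logistic_step(x, mu=4.0):
    """One iteration of the logistic map."""
    return mu * x * (1.0 - x)


def chaos_search(objective, bounds, n_iter=500, seeds=(0.123, 0.456)):
    """Maximize objective(params) over box bounds using chaotic candidates."""
    state = list(seeds)  # one chaotic variable per parameter
    best_params, best_score = None, -math.inf
    for _ in range(n_iter):
        state = [logistic_step(s) for s in state]
        # Rescale each chaotic variable from [0, 1] into its parameter range.
        params = [lo + s * (hi - lo) for s, (lo, hi) in zip(state, bounds)]
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical stand-in for SVM validation accuracy, peaking at C=10, sigma=0.5.
    c, sigma = params
    return -((c - 10.0) ** 2) / 100.0 - (sigma - 0.5) ** 2


best, score = chaos_search(toy_objective, bounds=[(0.1, 100.0), (0.01, 2.0)])
```

Different seeds per parameter keep the two chaotic sequences from moving in lockstep, so the candidates spread over the two-dimensional search box.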
16
Features in KDD Cup
- Features 1-9: intrinsic features extracted from the packet header
- Features 10-22: content attributes of the packet, derived from expert knowledge
- Features 23-31: content attributes from the connections of the previous 2 seconds
- Features 32-41: machine traffic attributes derived from the previous 100 connections
- Payload feature: payload based on time (week)
17
Intrinsic Attributes
These attributes are extracted from the header area of the network packets.
18
Content Attributes
These attributes are extracted from the content area of the network packets, based on expert knowledge.
19
Time Traffic Attributes
To calculate these attributes, we considered the connections that occurred in the past 2 seconds.
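A 2-second sliding window of this kind can be sketched as follows. The example computes two counts modelled on the KDD Cup features `count` and `srv_count`; the record layout, the function name and the choice to exclude the current connection from its own counts are assumptions.

```python
from collections import deque


def time_window_counts(connections, window=2.0):
    """connections: (timestamp, dst_host, service) tuples sorted by timestamp.

    Returns one (count, srv_count) pair per connection, where count is the
    number of earlier connections in the window to the same host and
    srv_count is the number of earlier connections to the same service.
    """
    recent = deque()
    result = []
    for ts, host, service in connections:
        # Drop connections that are more than `window` seconds old.
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        count = sum(1 for _, h, _ in recent if h == host)
        srv_count = sum(1 for _, _, s in recent if s == service)
        result.append((count, srv_count))
        recent.append((ts, host, service))
    return result
```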
20
Machine Traffic Attributes
To calculate these attributes, we took into account the previous 100 connections.
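A window over the previous 100 connections can be kept with a bounded queue. The sketch below computes two counts modelled on `dst_host_count` and `dst_host_srv_count`; the record layout, the function name and the exact matching rules (same host, and same host plus same service) are assumptions for illustration.

```python
from collections import deque


def host_window_counts(connections, window=100):
    """connections: (dst_host, service) tuples in arrival order.

    Returns one (dst_host_count, dst_host_srv_count) pair per connection,
    counted over at most `window` preceding connections.
    """
    recent = deque(maxlen=window)  # the oldest entries fall out automatically
    result = []
    for host, service in connections:
        dst_host_count = sum(1 for h, _ in recent if h == host)
        dst_host_srv_count = sum(
            1 for h, s in recent if h == host and s == service
        )
        result.append((dst_host_count, dst_host_srv_count))
        recent.append((host, service))
    return result
```

Unlike the time-based window, this one is bounded by the number of connections, so it still captures slow scans that spread their probes over more than 2 seconds.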
21
Network Traffic Classification
22
Selected Features
Previous work by Mukkamala used eight features: src_bytes, dst_bytes, count, srv_count, dst_host_count, dst_host_srv_count, dst_host_same_src_port_rate, dst_host_srv_diff_host_rate.
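Restricting a connection record to a selected feature subset can be sketched as below. The record is assumed to be a dictionary keyed by KDD Cup feature name; `select_features` and the sample values are hypothetical.

```python
# The eight features attributed to Mukkamala on this slide.
MUKKAMALA_FEATURES = [
    "src_bytes",
    "dst_bytes",
    "count",
    "srv_count",
    "dst_host_count",
    "dst_host_srv_count",
    "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate",
]


def select_features(record, names):
    """Return the values of the named features, in the order given."""
    return [record[name] for name in names]


# Hypothetical record: the selected features plus one that is dropped.
sample = {name: 0 for name in MUKKAMALA_FEATURES}
sample["src_bytes"] = 181
sample["duration"] = 0
vector = select_features(sample, MUKKAMALA_FEATURES)
```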
23
Selected Features
Previous work by Natesan used 24 features: duration, protocol_type, service, flag, src_bytes, dst_bytes, hot, num_failed_logins, logged_in, num_compromised, root_shell, num_root, num_file_creations, num_shells, num_access_files, is_host_login, is_guest_login, count, serror_rate, rerror_rate, diff_srv_rate, dst_host_count, dst_host_diff_srv_rate, dst_host_srv_serror_rate.
24
Proposed Features
26
Data Pre-processing
27
Simulation Experiment
28
Simulation Process Design
29
Experiment Result
- Using the payload can improve the accuracy of the IDS in detecting R2L attacks.
- Using SVM with the RBF kernel, the detection accuracy reaches up to 98.2%.
- In the experiments, the best average detection across all feature sets was obtained with 28 features with payload.