Distributed Network Traffic Feature Extraction for a Real-time IDS

Distributed Network Traffic Feature Extraction for a Real-time IDS
Presented by: Dr. Ahmad Javaid
Co-authors: Ahmad Karimi, Quamar Niyaz, Dr. Weiqing Sun, and Dr. Vijay Devabhaktuni

Outline
- Introduction
- Tools Overview
- IDS Architecture
- Experimental Setup & Results
- Conclusion

Introduction
- An Intrusion Detection System (IDS) monitors the network for attacks.
- An IDS can be installed at different hierarchical layers of the network:
  - Backbone
  - Distribution
  - Access
- Modern networks carry huge amounts of traffic.

Introduction
- Challenge: the IDS has to monitor all incoming traffic.
- Solution: distributed systems, which provide:
  - Parallel processing
  - Huge disk space
  - Reliability
  - Scalability
- We emphasize efficient network traffic processing and feature extraction.

Notes:
1. The IDS has to monitor all the traffic within the same stipulated time interval.
2. A distributed system utilizes the resources of all the participating nodes.
3. The disk space of all the individual machines is available to the cluster.
4. The system is reliable because data is stored redundantly and there is no single point of failure.

Tools Overview
- Our proposed system involves the following stages:
  - Traffic collection
  - Data storage
  - Feature extraction
  - Traffic classification
- Different tools are used at different stages.

Tools Overview
Traffic collection: Netmap-libpcap
- Framework for high-speed packet I/O.
- Real-time traffic collection with negligible loss (less than 1%).
- Other tools used for comparison are Dumpcap and Tshark.
- Significant losses were observed when the packet rate exceeded 0.5 million packets per second.

Tools Overview
Data storage: HDFS
- Hadoop Distributed File System
- Provides scalable disk space
- Fault-tolerant
Feature extraction: Apache Spark
- Distributed, high-speed data processing framework
- In-memory processing, faster compared to others
- Each node processes blocks of data, hence parallel execution

Notes: Uses commodity machines. Can be expanded to virtually any number of machines. Fault-tolerant because of data redundancy (replication).
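The idea that each node processes its own blocks of data in parallel can be sketched in plain Python. This is only an illustration of the split, process, and merge pattern, not the Spark API and not the authors' code: a thread pool stands in for the cluster nodes, and the per-block work (counting packets per source IP) and the sample addresses are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Split a list of records into consecutive blocks of at most block_size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def count_by_source(block):
    """Per-block work: count packets per source IP (a hypothetical feature)."""
    return Counter(src for src, _dst in block)


def parallel_count(records, block_size=2, workers=3):
    """Process each block independently, then merge the partial results."""
    blocks = split_into_blocks(records, block_size)
    # The thread pool is a stand-in for worker nodes; each worker sees one block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_by_source, blocks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up (source IP, destination IP) pairs, for illustration only.
packets = [
    ("10.0.0.1", "10.0.0.9"),
    ("10.0.0.2", "10.0.0.9"),
    ("10.0.0.1", "10.0.0.9"),
    ("10.0.0.3", "10.0.0.8"),
    ("10.0.0.1", "10.0.0.8"),
]
counts = parallel_count(packets)
```

Because the per-block results are merged by simple addition, the final counts do not depend on how the records were split across blocks.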

IDS Architecture
- The IDS consists of:
  - Traffic collection
  - Traffic feature extraction
  - Traffic classification

Notes: Traffic collection has not been a focus so far. Future work will use the Spark MLlib library, which provides many machine learning algorithms such as naive Bayes, random forests, and decision trees.

IDS Architecture
Traffic collection
- Traffic is mirrored to a particular port.
- Every packet is copied to the IDS from there.
Traffic feature extraction
- Stores captured packets on HDFS.
- Extracts features using Spark.
- Sends extracted features to the monitoring system.

Experimental Setup and Results
- Six Spark nodes and an HDFS cluster
- Nodes hosted on a VMware ESXi host
- ESXi runs on a Supermicro SYS-6028RWTRT
- Each node assigned 4 vCPUs, 8 GB RAM, and 60 GB disk storage
- Performance evaluated by modelling the CAIDA DDoS attack dataset
- tcpreplay used to replay a dataset, with traffic collected at 5-minute intervals

Notes: The Supermicro server shipped with Intel(R) Xeon(R) @ 2.30 GHz, 96 GB RAM, and 20 CPU cores × 2.99 GHz.

Experimental Setup and Results
- CAIDA data collected for an hour at 5-minute intervals
- Maximum data observed in a time window was 2.9 GB
- We generated up to ≈3.0 GB in 5 minutes for TCP traffic
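To put these volumes in perspective, the short calculation below converts a per-window volume into an average data rate. It is a back-of-the-envelope sketch: it assumes 1 GB = 10^9 bytes and a uniform rate across the window, neither of which the slides state, and the function name is made up for illustration.

```python
def average_rate_mbps(volume_gb, window_minutes):
    """Average rate in Mbit/s for a volume collected over one window.

    Assumes 1 GB = 10**9 bytes and traffic spread evenly over the window.
    """
    bits = volume_gb * 1e9 * 8
    seconds = window_minutes * 60
    return bits / seconds / 1e6


# Figures taken from the slide: 2.9 GB (CAIDA maximum) and 3.0 GB (generated).
caida_peak_mbps = average_rate_mbps(2.9, 5)
generated_mbps = average_rate_mbps(3.0, 5)
```

Under these assumptions, 3.0 GB per 5-minute window corresponds to an average of 80 Mbit/s, and 2.9 GB to roughly 77 Mbit/s.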

Experimental Setup and Results
- Comparison across varying cluster and file sizes
- The 3 GB files took 3.17 ± 0.05 min on 6 nodes and 3.5 ± 0.1 min on 4 nodes
- 1 or 2 nodes took more than 5 min for the 2 GB and 3 GB files
- 1 GB files were processed within 5 min in all cases
- Threshold: 5 minutes
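The real-time condition behind the 5-minute threshold is that a window's features must be extracted before the next window finishes collecting. The sketch below checks that condition for the reported 3 GB timings. It uses only the mean values from the slide and ignores the stated uncertainties; the function names are illustrative.

```python
WINDOW_MIN = 5.0  # traffic collection interval from the slides, in minutes

# Reported mean extraction times (minutes) for 3 GB files, keyed by node count.
reported_3gb = {6: 3.17, 4: 3.5}


def keeps_up(extraction_min, window_min=WINDOW_MIN):
    """True if extraction finishes before the next window has been collected."""
    return extraction_min < window_min


def headroom(extraction_min, window_min=WINDOW_MIN):
    """Fraction of the window left over after extraction finishes."""
    return 1.0 - extraction_min / window_min


summary = {
    nodes: (keeps_up(minutes), round(headroom(minutes), 3))
    for nodes, minutes in reported_3gb.items()
}
```

With these figures, both the 6-node and 4-node clusters keep up, leaving about 37% and 30% of the window spare, respectively. Any run longer than 5 minutes, as reported for 1 or 2 nodes on the larger files, would fall behind.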

Experimental Setup and Results
- Current work focuses on TCP traffic feature extraction for TCP-based attack detection
- The following headers were collected:
  - Source IP
  - Destination IP
  - Source Port
  - Destination Port
  - IP Payload
  - TCP Flags
- Features are extracted using these headers

Notes: Extracted features are described in Table 1.
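A minimal sketch of how features might be derived from the six header fields listed above for one time window. Table 1 is not part of this transcript, so the features computed here (packet count, distinct sources, SYN-only packets, and payload bytes per destination) are illustrative assumptions rather than the authors' feature set. Treating "IP Payload" as a payload size in bytes is also an assumption, and all sample values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass

SYN = 0x02  # TCP SYN flag bit
ACK = 0x10  # TCP ACK flag bit


@dataclass(frozen=True)
class HeaderRecord:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payload_len: int  # assumption: "IP Payload" taken as payload size in bytes
    tcp_flags: int


def extract_features(records):
    """Aggregate illustrative per-destination features for one time window."""
    acc = defaultdict(lambda: {"packets": 0, "sources": set(), "syn_only": 0, "bytes": 0})
    for r in records:
        f = acc[r.dst_ip]
        f["packets"] += 1
        f["sources"].add(r.src_ip)
        f["bytes"] += r.payload_len
        # Count packets with SYN set and ACK clear (connection attempts).
        if (r.tcp_flags & SYN) and not (r.tcp_flags & ACK):
            f["syn_only"] += 1
    return {
        dst: {
            "packets": f["packets"],
            "distinct_sources": len(f["sources"]),
            "syn_only": f["syn_only"],
            "bytes": f["bytes"],
        }
        for dst, f in acc.items()
    }


# Made-up records for illustration only.
window = [
    HeaderRecord("10.0.0.1", "10.0.0.9", 40000, 80, 0, SYN),
    HeaderRecord("10.0.0.2", "10.0.0.9", 40001, 80, 0, SYN),
    HeaderRecord("10.0.0.1", "10.0.0.9", 40000, 80, 120, ACK),
    HeaderRecord("10.0.0.3", "10.0.0.8", 40002, 443, 0, SYN | ACK),
]
features = extract_features(window)
```

Each aggregate here can be computed per block and merged afterwards (sums add, source sets union), which is the property that makes this kind of extraction suitable for a distributed framework.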

Experimental Setup And Results Feature extraction output on Spark cluster

Conclusion
- Feature extraction time is less than the period of traffic generation
- Supports real-time evaluation for a fixed time interval
- Useful for networks with high traffic
- The current system can be implemented for small organizations
- The system may be applicable to larger organizations if the number of nodes in the cluster is increased

Notes: The system is useful for networks with high traffic because, on low-traffic networks, inter-node communication tends to neutralize the parallel processing benefit of Spark. It is well suited to huge header data files.

Thank you Questions?