MIDeA: A Multi-Parallel Intrusion Detection Architecture. Author: Giorgos Vasiliadis, Michalis Polychronakis, Sotiris Ioannidis. Publisher: CCS’11, October 2011.

Presentation transcript:

MIDeA: A Multi-Parallel Intrusion Detection Architecture. Author: Giorgos Vasiliadis, Michalis Polychronakis, Sotiris Ioannidis. Publisher: CCS’11, October 17–21, 2011, Chicago, Illinois, USA. Presenter: Ye-Zhi Chen. Date: 2012/05/09

Introduction MIDeA is a new model for network intrusion detection systems that combines commodity, general-purpose hardware components in a single-node design tailored for high-performance network traffic analysis.

Design Principles 1. Traffic is separated at the network flow level using existing, commodity solutions, without introducing any serialization point on the processing path. 2. Per-flow traffic processing is parallelized beyond simple per-flow load balancing across different CPU cores. 3. A pipelining scheme allows CPU and GPU execution to overlap, and consequently hides the added latencies.

Architecture

This design has a number of benefits: 1. It does not require any synchronization or locking mechanisms, since different cores process different data in isolation. 2. Having several smaller data structures (such as the TCP reassembly tables) instead of sharing a few large ones not only reduces the number of table look-ups required to find a matching element, but also reduces the size of the working set in each cache, increasing overall cache efficiency.
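
The following is a minimal sketch of this per-core layout (hypothetical names such as flow_key and NUM_WORKERS, not the authors' code): each worker thread owns a private flow table, so look-ups and insertions need no locking.

```c
/* Sketch: one private flow table per worker thread, so no locking is
 * needed. Names (flow_key, flow_entry, NUM_WORKERS) are illustrative. */
#include <stdint.h>
#include <string.h>

#define NUM_WORKERS   8
#define TABLE_BUCKETS 65536

typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
} flow_key;

typedef struct flow_entry {
    flow_key key;
    struct flow_entry *next;   /* per-bucket collision chain */
    /* reassembly state, counters, ... */
} flow_entry;

/* Each worker indexes only its own table: no sharing, no locks. */
static flow_entry *flow_tables[NUM_WORKERS][TABLE_BUCKETS];

static flow_entry *lookup(int worker, const flow_key *k, uint32_t hash)
{
    flow_entry *e = flow_tables[worker][hash % TABLE_BUCKETS];
    while (e && memcmp(&e->key, k, sizeof(*k)) != 0)
        e = e->next;
    return e;
}
```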

Architecture Packet Capturing: Multiqueue NICs. Receive-Side Scaling (RSS) is a Microsoft Scalable Networking initiative technology that enables receive processing to be balanced across multiple processors in the system while maintaining in-order delivery of the data. The Rx-queues are not shared between the CPU cores, eliminating the need for synchronization. To avoid costly packet copies and context switches between user and kernel space, we use the PF_RING network socket. The most efficient way to integrate a PF_RING socket with a multiqueue NIC is to dedicate a separate ring buffer to each available Rx-queue. Network packets of each Rx-queue are stored into a separate ring buffer and are pulled by the user-level process.
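
A rough sketch of one worker binding a PF_RING ring to a single Rx-queue and pulling packets from it. This assumes a recent PF_RING user-space API; the exact function signatures and the "device@queue" naming convention may differ between PF_RING versions.

```c
/* Sketch: one PF_RING ring per NIC Rx-queue, polled by one worker.
 * Assumes a recent PF_RING API; signatures may differ across versions. */
#include <pfring.h>
#include <stdio.h>

void poll_queue(const char *dev_at_queue)   /* e.g. "eth0@3" */
{
    pfring *ring = pfring_open(dev_at_queue, 1518 /* snaplen */,
                               PF_RING_PROMISC);
    if (ring == NULL) {
        perror("pfring_open");
        return;
    }
    pfring_enable_ring(ring);

    struct pfring_pkthdr hdr;
    u_char *pkt;
    for (;;) {
        /* Pull the next packet from this queue's ring (no copy into a
         * caller-supplied buffer when buffer_len is 0). */
        if (pfring_recv(ring, &pkt, 0, &hdr, 1 /* block */) > 0) {
            /* hand (pkt, hdr.caplen) to the flow reassembly stage */
        }
    }
}
```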

Architecture Load Balancing: Hash-based. The hash function, computed on the typical 5-tuple <ip_src, ip_dst, port_src, port_dst, protocol>, achieves a good distribution of flows among the different queues.
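
With RSS, the hash is computed by the NIC itself (typically a Toeplitz hash), so the code below is only a software illustration of the idea: combine the 5-tuple symmetrically, so that both directions of a connection land on the same queue, and reduce it to a queue index.

```c
/* Illustration only: RSS computes the hash in NIC hardware. This
 * software version just shows mapping a flow's 5-tuple to one of
 * NUM_QUEUES Rx-queues. */
#include <stdint.h>

#define NUM_QUEUES 8

static uint32_t five_tuple_hash(uint32_t ip_src, uint32_t ip_dst,
                                uint16_t port_src, uint16_t port_dst,
                                uint8_t proto)
{
    /* Symmetric combination: both directions of a connection hash to
     * the same queue, which stream reassembly needs. */
    uint32_t h = (ip_src ^ ip_dst)
               ^ ((uint32_t)(port_src ^ port_dst) << 16)
               ^ proto;
    h ^= h >> 16;
    h *= 0x45d9f3b;          /* arbitrary mixing constant */
    h ^= h >> 16;
    return h % NUM_QUEUES;
}
```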

Processing Engine Preprocessing: Flow reassembly: TCP packets are reassembled into TCP streams to build the entire application dialog before the streams are forwarded to the pattern matching engine.
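
A minimal, illustrative reassembly sketch (not the authors' code): in-order TCP payload is appended to a per-flow stream buffer; handling of out-of-order segments, retransmissions, and overlaps is omitted.

```c
/* Minimal reassembly sketch: append in-order TCP payload to a per-flow
 * buffer. Out-of-order segments would be queued until the gap is filled. */
#include <stdint.h>
#include <string.h>

#define STREAM_MAX 65536

typedef struct {
    uint32_t next_seq;             /* next expected sequence number  */
    uint32_t len;                  /* bytes assembled so far         */
    uint8_t  data[STREAM_MAX];     /* reassembled application stream */
} stream_state;

/* Returns 1 when the segment extended the stream in order. */
static int reassemble(stream_state *s, uint32_t seq,
                      const uint8_t *payload, uint32_t plen)
{
    if (seq != s->next_seq || s->len + plen > STREAM_MAX)
        return 0;                  /* out of order or overflow: not handled here */
    memcpy(s->data + s->len, payload, plen);
    s->len += plen;
    s->next_seq += plen;
    return 1;                      /* stream grew; ready for pattern matching */
}
```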

Processing Engine Parallel Multi-Pattern Engine: Unfortunately, in the CUDA runtime system, on which MIDeA is based, each CPU thread executes in its own CUDA context. In other words, a different memory space has to be allocated on the GPU for each process, since processes cannot share memory on the GPU device.
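
The multi-pattern search itself maps naturally to the GPU. The sketch below is a simplified DFA-style scan (in the spirit of Aho-Corasick, which the authors' earlier Gnort work uses): each GPU thread walks one reassembled stream through a state-transition table. The flat streams/offsets/lengths layout and the "high bit marks a final state" convention are assumptions for illustration, not the paper's exact data format.

```cuda
/* Simplified sketch of DFA-based multi-pattern matching on the GPU:
 * each thread scans one reassembled stream through a state table. */
#include <cuda_runtime.h>
#include <stdint.h>

#define ALPHABET 256

__global__ void dfa_match(const uint8_t *streams,     /* concatenated streams   */
                          const uint32_t *offsets,    /* start of each stream   */
                          const uint32_t *lengths,    /* length of each stream  */
                          const uint32_t *dfa,        /* [state][byte] table    */
                          uint32_t *match_counts,     /* one counter per stream */
                          int num_streams)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_streams) return;

    const uint8_t *p = streams + offsets[i];
    uint32_t state = 0, hits = 0;

    for (uint32_t j = 0; j < lengths[i]; j++) {
        state = dfa[state * ALPHABET + p[j]];
        /* Assumed convention: final states have the high bit set. */
        if (state & 0x80000000u) {
            hits++;
            state &= 0x7fffffffu;
        }
    }
    match_counts[i] = hits;
}
```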

Processing Engine

OPTIMIZATION GPU Memory Access: batching, page-locked memory, int4 accesses, texture memory.
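
Two of these optimizations can be sketched in isolation: page-locked (pinned) host buffers, which allow fast DMA transfers of each batch, and int4 loads, which let every thread fetch 16 bytes per memory transaction. Buffer sizes and names are illustrative, error checking is omitted, and the texture-memory part is left out (the texture-reference API of that CUDA era has since been deprecated).

```cuda
/* Sketch of two memory optimizations: page-locked host buffers for
 * faster DMA transfers, and int4 (16-byte) loads inside the kernel. */
#include <cuda_runtime.h>
#include <stdint.h>

#define BATCH_BYTES (16 * 1024 * 1024)

__global__ void scan_int4(const int4 *buf, size_t n_int4)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_int4) return;
    int4 v = buf[i];          /* one 16-byte load per thread */
    /* ... feed v.x, v.y, v.z, v.w (4 bytes each) to the matching code ... */
    (void)v;
}

int main(void)
{
    uint8_t *host_batch;      /* page-locked: DMA-able without staging copies */
    void *dev_batch;
    cudaHostAlloc((void **)&host_batch, BATCH_BYTES, cudaHostAllocDefault);
    cudaMalloc(&dev_batch, BATCH_BYTES);

    /* ... CPU fills host_batch with a batch of reassembled streams ... */

    cudaMemcpy(dev_batch, host_batch, BATCH_BYTES, cudaMemcpyHostToDevice);
    scan_int4<<<(BATCH_BYTES / 16 + 255) / 256, 256>>>((const int4 *)dev_batch,
                                                       BATCH_BYTES / 16);
    cudaDeviceSynchronize();

    cudaFree(dev_batch);
    cudaFreeHost(host_batch);
    return 0;
}
```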

OPTIMIZATION Pipelined Execution: CPU and GPU execution overlap by working on different batches of data concurrently (see the sketch below).
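
A minimal double-buffering sketch of the idea, assuming two pinned batch buffers and two CUDA streams: while the GPU asynchronously copies and scans batch k, the CPU fills the other buffer with batch k+1. fill_next_batch() and scan_batch() are stand-ins for MIDeA's capture/reassembly stage and pattern-matching kernel.

```cuda
/* Double-buffering sketch: while the GPU copies and scans batch k in one
 * stream, the CPU prepares batch k+1 in the other pinned buffer. */
#include <cuda_runtime.h>
#include <string.h>
#include <stdint.h>

#define BATCH_BYTES (4u * 1024 * 1024)
#define NUM_BATCHES 16            /* bounded demo instead of an endless loop */

__global__ void scan_batch(const uint8_t *buf, size_t len)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len && buf[i] == 0xFF) { /* placeholder per-byte "work" */ }
}

static size_t fill_next_batch(uint8_t *buf, size_t cap)
{
    memset(buf, 0, cap);          /* stand-in for capture + reassembly */
    return cap;
}

int main(void)
{
    uint8_t *host[2], *dev[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; b++) {
        cudaHostAlloc((void **)&host[b], BATCH_BYTES, cudaHostAllocDefault);
        cudaMalloc((void **)&dev[b], BATCH_BYTES);
        cudaStreamCreate(&stream[b]);
    }

    for (int k = 0; k < NUM_BATCHES; k++) {
        int b = k & 1;
        cudaStreamSynchronize(stream[b]);                    /* buffer free again? */
        size_t len = fill_next_batch(host[b], BATCH_BYTES);  /* CPU stage          */
        cudaMemcpyAsync(dev[b], host[b], len,                /* GPU stage, overlaps */
                        cudaMemcpyHostToDevice, stream[b]);  /* with next CPU fill  */
        scan_batch<<<(unsigned)((len + 255) / 256), 256, 0, stream[b]>>>(dev[b], len);
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; b++) {
        cudaStreamDestroy(stream[b]);
        cudaFree(dev[b]);
        cudaFreeHost(host[b]);
    }
    return 0;
}
```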

EXPERIMENTAL EVALUATION Hardware: Our base system has two processor sockets, each with one Intel Xeon E5520 quad-core CPU at 2.27GHz and 8192KB of L3 cache. Each socket has an integrated memory controller connected to memory via a memory bus. Two PCIe 2.0 ×16 slots are populated with two GeForce GTX480 graphics cards, and one PCIe 2.0 ×8 slot holds one Intel 82599EB 10GbE NIC. Each NVIDIA GeForce GTX 480 is equipped with 480 cores, organized in 15 multiprocessors, and 1.5GB of GDDR5 memory.

EXPERIMENTAL EVALUATION

Software: OS: Linux with the ioatdma and dca modules loaded. The ioatdma driver is required to support the QuickPath Interconnect architecture of recent Intel processors. DCA (Direct Cache Access) is a NIC technology that places incoming packets directly into the cache of the CPU core for immediate access by the application. The default rule set of Snort 2.8.6 is used, which consists of 8,192 rules comprising about 193,000 substrings for string searching.

EXPERIMENTAL EVALUATION