Authors: Danhua Guo, Guangdeng Liao, Laxmi N. Bhuyan, Bin Liu, Jianxun Jason Ding. Conf.: The 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems.

Presentation transcript:

A scalable multithreaded L7-filter design for multi-core servers
Authors: Danhua Guo, Guangdeng Liao, Laxmi N. Bhuyan, Bin Liu, Jianxun Jason Ding
Conf.: The 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '08)
Presenter: JHAO-YAN JIAN
Date: 2010/11/10

Introduction
Traditional packet classification makes decisions based on packet header information. However, many applications, such as P2P and HTTP, carry their identifying characteristics in the payload. The original L7-filter is a sequential DPI (Deep Packet Inspection) program that identifies the application protocol of a given connection. A traditional single-core server cannot deliver DPI at the rates of high-speed networks such as 10 Gigabit Ethernet. Multi-core architectures offer more processing power, but using their cores efficiently remains a challenge.
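The gap between header-based and payload-based classification can be illustrated with a minimal sketch. L7-filter identifies protocols by matching regular expressions against application data; the two patterns below are simplified, hypothetical signatures, not the real L7-filter pattern files, and the port table is likewise illustrative.

```python
import re

# Hypothetical, simplified signatures for illustration only;
# these are NOT the real L7-filter pattern files.
PAYLOAD_PATTERNS = {
    "http": re.compile(rb"^(get|post|head) [^ ]* http/1\.[01]", re.IGNORECASE),
    "bittorrent": re.compile(rb"^\x13BitTorrent protocol"),
}

# Port table a header-only classifier might rely on (illustrative).
PORT_TABLE = {80: "http", 6881: "bittorrent"}


def classify_by_header(dst_port: int) -> str:
    """Header-based decision: only the destination port is consulted."""
    return PORT_TABLE.get(dst_port, "unknown")


def classify_by_payload(payload: bytes) -> str:
    """Payload-based (DPI-style) decision: inspect application-layer bytes."""
    for proto, pattern in PAYLOAD_PATTERNS.items():
        if pattern.search(payload):
            return proto
    return "unknown"


# HTTP running on a non-standard port: the header view misses it.
request = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(classify_by_header(8080))      # unknown
print(classify_by_payload(request))  # http
```

The header view depends on port conventions that applications are free to ignore, whereas the payload view costs a pattern scan per connection, which is exactly the work the rest of the talk tries to parallelize.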

Introduction
Network traffic in the original L7-filter is captured by Netfilter, a set of hooks inside the Linux kernel that allows kernel modules to register callback functions with the network stack. Inside the kernel network stack, a series of operations is executed to build a connection buffer keyed by the 5-tuple connection information in the packet header. These operations include TCP/IP checksum verification, TCP/IP reassembly, and IP defragmentation. After this preprocessing stage, L7-filter sequentially matches the application-layer data of arriving packets in the same connection against the protocol database.
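A minimal user-space sketch of the connection buffer and the sequential scan, assuming that reassembly has already happened and that each call delivers in-order payload for one connection. The names `on_reassembled_packet`, `ConnectionBuffer`, and the two-entry `PROTOCOL_DB` are hypothetical illustrations, not L7-filter's own code or pattern set.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# 5-tuple: (src_ip, dst_ip, src_port, dst_port, l4_protocol)
FiveTuple = Tuple[str, str, int, int, str]

# Hypothetical protocol database, scanned in a fixed order (sequentially).
PROTOCOL_DB = [
    ("http", re.compile(rb"http/1\.[01]", re.IGNORECASE)),
    ("ssh", re.compile(rb"^SSH-[12]\.[0-9]")),
]


@dataclass
class ConnectionBuffer:
    data: bytearray = field(default_factory=bytearray)
    protocol: Optional[str] = None


buffers: Dict[FiveTuple, ConnectionBuffer] = {}


def on_reassembled_packet(key: FiveTuple, payload: bytes) -> Optional[str]:
    """Append reassembled payload to its connection buffer, then match sequentially."""
    conn = buffers.setdefault(key, ConnectionBuffer())
    if conn.protocol is not None:  # already identified: no further matching
        return conn.protocol
    conn.data.extend(payload)
    for name, pattern in PROTOCOL_DB:  # one pattern after another
        if pattern.search(conn.data):
            conn.protocol = name
            break
    return conn.protocol


# A pattern split across two packets is only found in the connection buffer.
key = ("10.0.0.1", "10.0.0.2", 40000, 8080, "tcp")
print(on_reassembled_packet(key, b"GET / HT"))    # None
print(on_reassembled_packet(key, b"TP/1.1\r\n"))  # http
```

Buffering per connection is what lets a signature that straddles a packet boundary still match; the price is that every unidentified connection is rescanned against the whole database as data arrives.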

Decoupling Linux L7-filter operations
Previous research from both academia and industry has shown that L7-filter performance is bounded by the cost of pattern matching. The authors therefore developed a decoupled model that separates packet-arrival handling from pattern matching, so that optimization can focus on the application-layer pattern matching operations. The L7-filter operations are then parallelized on top of a user-space version.
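The decoupled model is essentially a producer/consumer pipeline. The sketch below, a minimal illustration under stated assumptions, puts packet-arrival handling and matching in separate threads connected by a bounded queue; the function names, the `(conn_id, payload)` item format, and the plain substring test standing in for regex matching are all hypothetical simplifications. Because of CPython's global interpreter lock, it shows the structure of the decoupling, not a parallel speedup.

```python
import queue
import threading
from typing import Dict, List, Tuple

_STOP = object()  # sentinel: no more work for the matching stage


def packet_stage(packets: List[Tuple[int, bytes]], out_q: queue.Queue) -> None:
    """Packet-arrival handling only: forward (conn_id, payload) items downstream."""
    for item in packets:
        out_q.put(item)
    out_q.put(_STOP)


def matching_stage(in_q: queue.Queue, signature: bytes, results: Dict[int, bool]) -> None:
    """Pattern matching only: consume items and record a verdict per connection."""
    while True:
        item = in_q.get()
        if item is _STOP:
            break
        conn_id, payload = item
        if signature in payload:
            results[conn_id] = True
        else:
            results.setdefault(conn_id, False)


def run_pipeline(packets: List[Tuple[int, bytes]], signature: bytes) -> Dict[int, bool]:
    q: queue.Queue = queue.Queue(maxsize=64)  # bounded buffer between the stages
    results: Dict[int, bool] = {}
    producer = threading.Thread(target=packet_stage, args=(packets, q))
    consumer = threading.Thread(target=matching_stage, args=(q, signature, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


print(run_pipeline([(1, b"hello"), (2, b"GET / HTTP/1.1")], b"HTTP/"))
# {1: False, 2: True}
```

Once the two stages only share a queue, the matching side can be replicated without touching the packet-handling side, which is the property the later slides rely on.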

Modeling Single-Threaded L7-filter
The authors chose libnids as the user-space module. Libnids reads tcpdump trace files and emulates the behavior of the kernel network stack in user space, providing IP defragmentation, TCP stream assembly, and TCP port scan detection. The original online L7-filter is replaced by the combination of a Preprocessing Thread (PT) and a Matching Thread (MT). At any point during processing, a connection is in exactly one of three states: 1) MATCHED, 2) NO_MATCH, or 3) NO_MATCH_YET.
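The three connection states can be sketched as a small state machine. The slides list the states but not the transition rule, so this sketch assumes that MATCHED and NO_MATCH are final and that a connection falls to NO_MATCH after a hypothetical inspection budget `MAX_PACKETS`; that constant and the `Connection` class are illustrative, not taken from the paper.

```python
from enum import Enum, auto
from typing import Optional


class Status(Enum):
    NO_MATCH_YET = auto()  # still inspecting, no verdict yet
    MATCHED = auto()       # a protocol pattern matched
    NO_MATCH = auto()      # gave up: nothing matched within the budget


# Hypothetical inspection budget; the slides do not give a value.
MAX_PACKETS = 10


class Connection:
    def __init__(self) -> None:
        self.status = Status.NO_MATCH_YET
        self.packets_seen = 0
        self.protocol: Optional[str] = None

    def update(self, matched_protocol: Optional[str]) -> Status:
        """Advance the state after the MT has matched one more packet's data."""
        if self.status is not Status.NO_MATCH_YET:
            return self.status  # final states are sticky: no more matching
        self.packets_seen += 1
        if matched_protocol is not None:
            self.status = Status.MATCHED
            self.protocol = matched_protocol
        elif self.packets_seen >= MAX_PACKETS:
            self.status = Status.NO_MATCH
        return self.status
```

Only connections still in NO_MATCH_YET need to be handed to an MT, so a final verdict lets the PT stop forwarding that connection's packets for matching.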

Modeling Single-Threaded L7-filter

Parallelizing L7-filter at Connection Level
Once multiple MTs are created, each MT works at the granularity of a connection buffer. When a new packet is reassembled for a connection, randomly selecting a thread's non-empty runqueue introduces additional cache overhead, because packets of the same connection are copied to different cores. It also wastes thread resources. We believe that dispatching each independent thread to a dedicated core avoids scheduling overhead and reduces the cache misses caused by live migration of unbalanced workloads.
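The slides do not say how a connection is assigned to an MT. A minimal sketch, assuming a deterministic hash of the 5-tuple selects one private runqueue per MT (so all packets of a connection stay on the same core) and that each MT pins itself to a dedicated core through Linux CPU affinity. `Dispatcher` and `pin_calling_thread` are hypothetical names; `os.sched_setaffinity` and `os.sched_getaffinity` are real Python calls available on Linux.

```python
import os
import queue
import zlib
from typing import List, Tuple

FiveTuple = Tuple[str, str, int, int, str]


class Dispatcher:
    """Assigns every connection to one fixed matching-thread (MT) runqueue."""

    def __init__(self, num_mts: int) -> None:
        # One private runqueue per MT; MT i is meant to run on core i.
        self.runqueues: List[queue.Queue] = [queue.Queue() for _ in range(num_mts)]

    def mt_for(self, key: FiveTuple) -> int:
        # CRC32 of the 5-tuple is deterministic, so a connection never moves
        # between MTs (unlike picking a random runqueue per packet).
        return zlib.crc32(repr(key).encode()) % len(self.runqueues)

    def dispatch(self, key: FiveTuple, payload: bytes) -> int:
        idx = self.mt_for(key)
        self.runqueues[idx].put((key, payload))
        return idx


def pin_calling_thread(core: int) -> bool:
    """Best-effort pinning of the calling thread to one core (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    if core not in os.sched_getaffinity(0):
        return False
    os.sched_setaffinity(0, {core})  # on Linux, pid 0 means the calling thread
    return True
```

In use, MT `i` would call `pin_calling_thread(i)` once and then loop on `runqueues[i].get()`. Hashing keeps a connection's buffer warm in one core's cache; the trade-off is that a few heavy connections can leave the MTs unevenly loaded.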

Parallelizing L7-filter at Connection Level


Experiment Platform
The server has two CPU sockets, each holding a quad-core Xeon X GHz processor, and 16 GB of 667 MHz DDR2 SDRAM. Each socket has two 4 MB shared L2 caches. The Linux kernel is used as the default OS.

Throughput and Core Utilization
With 7 concurrent threads, system throughput is 51% higher than with naive OS scheduling. Throughput scales nearly linearly with the number of MTs, reaching a 6.5X speedup with 7 threads.
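The "near linear" claim can be quantified as parallel efficiency, the speedup divided by the thread count. This assumes the 6.5X speedup is measured against a single MT (the slide does not state the baseline); the 51% figure is a separate comparison against naive OS scheduling at the same thread count.

```latex
E = \frac{S}{p} = \frac{6.5}{7} \approx 0.93,
\qquad
T_{\text{dedicated cores}} = 1.51 \times T_{\text{OS scheduling}} \quad (p = 7)
```

An efficiency of about 93% means each additional MT contributes roughly 0.93 of a single MT's throughput at this scale.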

Cache Performance

A Life-of-Packet Analysis
