Published by Stanley Adams. Modified over 8 years ago.
1
Scalable High-Performance Parallel Design for NIDS on Many-Core Processors
Haiyang Jiang, Gaogang Xie, Kave Salamatian and Laurent Mathy
Note: which universities are the authors from?
2
4/27/2017
Outline
  Background & Motivation
  Our Approach
  Evaluation
  Conclusion
3
Network Intrusion Detection Systems
Signature-based NIDS is the de facto standard.
Deep Packet Inspection (DPI) is a crucial component of NIDS and consumes 70%-80% of the processing time.
Packets are decoded in the Packet Decoder. The Preprocessor module then handles the TCP/IP protocols, dealing with IP fragments and TCP reassembly.
4
Performance Challenges
Caused by growth in both traffic and ruleset size (Traffic ↑, Ruleset ↑).
Cycle budget of a 2.5 GHz CPU core at line rate:
  Link rate   Cycles available per byte
  1 Gbps      20
  10 Gbps     2
  40 Gbps     0.5
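The cycle figures on this slide are what one gets by dividing the clock rate by the link's byte rate, so they read most naturally as a budget per byte of traffic. A minimal sketch of that arithmetic follows; the function names and the 100-byte packet size are illustrative assumptions, and framing overhead is ignored.

```python
def cycles_per_byte(clock_hz: float, link_bps: float) -> float:
    """CPU cycles one core can spend per byte of traffic at full line rate."""
    bytes_per_second = link_bps / 8
    return clock_hz / bytes_per_second


def cycles_per_packet(clock_hz: float, link_bps: float, packet_bytes: int) -> float:
    """Per-packet budget for a given packet size (framing overhead ignored)."""
    return cycles_per_byte(clock_hz, link_bps) * packet_bytes


CLOCK_HZ = 2.5e9  # 2.5 GHz, as on the slide

for gbps in (1, 10, 40):
    per_byte = cycles_per_byte(CLOCK_HZ, gbps * 1e9)
    # 100-byte packets are a hypothetical size chosen only for illustration.
    per_packet = cycles_per_packet(CLOCK_HZ, gbps * 1e9, 100)
    print(f"{gbps} Gbps: {per_byte:g} cycles/byte, {per_packet:g} cycles per 100-byte packet")
```

Whatever the packet size, the budget shrinks linearly with link rate, which is the motivation for spreading the work over many cores.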
5
Many-core Processors
Beyond single-core processors, owing to their powerful parallelism.
Figure source: The Mother of All CPU Charts 2005/2006, Bert Töpelt, Daniel Schuhmann, Frank Völkel, Tom's Hardware Guide, Nov
6
The State of the Art: Many-core Processor-based NIDS
Higher flexibility and lower cost, but lower performance than other solutions.
Table: underlying platforms (TCAM, FPGA, GPU, many-core processor, network processor (NP)) compared by performance, flexibility and price. Surviving cell values: High, Low, Medium.
7
Limitations of Prior Art
Two kinds of parallel models for NIDS
Data parallelism
  Advantages: thread isolation
  Disadvantages: memory consumption, reference locality
Data parallelism is achieved when each processor performs the same task on different pieces of distributed data.
8
Limitations of Prior Art
Two kinds of parallel models for NIDS
Function parallelism
  Advantages: fine-grained, reference locality
  Disadvantages: stage contentions, message transfer among stages
Function parallelism focuses on distributing execution processes (threads) across different parallel computing nodes. The threads may execute the same or different code. In the general case, the threads communicate with one another as they work, usually by passing data from one thread to the next as part of a workflow.
9
Parallel Design Issues
Communication: coherence, cooperation and communications
Contention
Bottleneck
10
Features of Many-core Processors
Dozens of cores (TILERAGX with 36 cores)
Accelerated hardware modules
  mPIPE: packet capturing engine
  User Dynamic Network (UDN): on-chip communication network among cores
Figure: example many-core processor (TILERAGX 36)
Meeting notes ( :37): On-chip network. There are several networks: 1. cache access; 2. UDN, for passing messages.
11
Our Approach
Goal: high-performance, flexible, scalable, inexpensive
Two schemes
  Hybrid parallel scheme
  Hybrid load balancing scheme
Figure (axes: performance vs. flexibility)
  Hardware designs: inflexible, expensive, unscalable
  Software designs: flexible, inexpensive
  Goal: flexible, high performance, inexpensive, scalable
12
Hybrid Parallel Scheme
Combination of two models
  Data parallelism among Packet Processing Modules (PPMs)
  Function parallelism within each PPM
Packet Capture: receives packets and fills in the MSG.
Protocol parsing: extracts protocol information and updates the MSG accordingly.
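The combination described on this slide can be sketched with ordinary threads and queues: several identical PPMs run side by side (data parallelism), and inside each PPM a message passes through a pipeline of stages (function parallelism). This is only an illustration under stated assumptions, not the authors' implementation: the stage functions, the one-thread-per-stage layout, the byte-string "packets" and the CRC32 dispatch are all hypothetical stand-ins.

```python
import queue
import threading
import zlib

NUM_PPMS = 4  # assumption for the demo


def stage(work, inbox, outbox):
    """Run one pipeline stage until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            if outbox is not None:
                outbox.put(None)  # pass the shutdown signal downstream
            return
        result = work(item)
        if outbox is not None:
            outbox.put(result)


def capture(raw):
    """Packet Capture: wrap the raw packet in a fresh MSG."""
    return {"raw": raw}


def parse(msg):
    """Protocol parsing (toy rule): record protocol info in the MSG."""
    msg["proto"] = "tcp" if msg["raw"].startswith(b"T") else "other"
    return msg


def make_detect(alerts, ppm_id):
    def detect(msg):
        """Detection (toy rule): flag payloads containing b'attack'."""
        if b"attack" in msg["raw"]:
            alerts.append((ppm_id, msg["raw"]))
        return msg
    return detect


def start_ppm(ppm_id, alerts):
    """Function parallelism: one thread per stage inside a PPM."""
    q_in, q_parse, q_detect = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=stage, args=(capture, q_in, q_parse)),
        threading.Thread(target=stage, args=(parse, q_parse, q_detect)),
        threading.Thread(target=stage, args=(make_detect(alerts, ppm_id), q_detect, None)),
    ]
    for t in threads:
        t.start()
    return q_in, threads


def run(packets, num_ppms=NUM_PPMS):
    """Data parallelism: each flow is sent to exactly one of the PPMs."""
    alerts = []
    ppms = [start_ppm(i, alerts) for i in range(num_ppms)]
    for flow_id, payload in packets:
        ppms[zlib.crc32(flow_id) % num_ppms][0].put(payload)
    for q_in, _ in ppms:
        q_in.put(None)
    for _, threads in ppms:
        for t in threads:
            t.join()
    return alerts


demo = [(b"flow-a", b"T hello"), (b"flow-b", b"T attack now"), (b"flow-a", b"U attack too")]
print(sorted(payload for _, payload in run(demo)))
```

PPMs share nothing except the alert list, so they scale by replication, while the queues between stages are exactly the kind of inter-stage transfer the later slides try to make cheaper.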
13
Hybrid Parallel Scheme
Shared resource among PPMs: the Message (MSG) pool
Meeting notes ( :37): protocol processing
14
MSG Pool Contentions
Cause: the lock on the MSG pool
Solution: exploit mPIPE to access the MSG pool in parallel
  Each packet has an individual MSG structure
The lock on the MSG pool is eliminated because each raw packet has its own corresponding MSG. The capture buffer index is exposed by the mPIPE and sent along with the packet descriptors to the thread managing the packet.
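One plausible reading of this slide is that MSG slots are preallocated one-to-one with capture buffers, so the buffer index delivered with each packet descriptor selects the slot directly and no shared free list (and hence no lock) is needed. The sketch below illustrates that idea only; `Msg`, `MsgPool` and `on_packet` are hypothetical names, not mPIPE or Suricata APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    buffer_index: int            # index of the capture buffer this MSG belongs to
    length: int = 0
    fields: dict = field(default_factory=dict)


class MsgPool:
    """Fixed array of MSG slots, one per capture buffer (assumed layout)."""

    def __init__(self, num_buffers: int):
        self._slots = [Msg(i) for i in range(num_buffers)]

    def msg_for(self, buffer_index: int) -> Msg:
        # Plain indexing: nothing is allocated or freed, so there is no
        # shared free list to protect with a lock.
        return self._slots[buffer_index]


def on_packet(pool: MsgPool, buffer_index: int, raw: bytes) -> Msg:
    """Called by the thread that owns the packet; resets and fills its MSG."""
    msg = pool.msg_for(buffer_index)
    msg.length = len(raw)
    msg.fields.clear()
    return msg


pool = MsgPool(num_buffers=8)
print(on_packet(pool, 3, b"\x45\x00\x00\x28"))
```

Two threads can only touch the same slot if they hold the same capture buffer, which the capture engine's buffer ownership is assumed to rule out.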
15
MSG Propagation Contentions
Cause: MSG propagation among stages
Solution: exploit UDN to transfer MSGs (higher bandwidth and lower latency)

                            Bandwidth   Latency
UDN                         60 Tbps     (1 + core_hop) cycles
Shared-memory-based queue   170 Gbps    L1 hit: 2 cycles
                                        L2 hit: 11 cycles
                                        Remote L2 hit: 40 cycles
                                        Main memory: 80 cycles
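The UDN latency formula, (1 + core_hop) cycles, can be made concrete with a small calculation. The sketch assumes the 36 cores form a 6x6 mesh, that cores are numbered row by row, and that core_hop is the Manhattan distance between two tiles; none of these details is stated on the slide.

```python
MESH_WIDTH = 6  # assumption: 36 tiles arranged as a 6x6 grid


def tile_coords(core_id: int, width: int = MESH_WIDTH) -> tuple:
    """Map a core id to (x, y), assuming row-major numbering."""
    return core_id % width, core_id // width


def core_hops(src: int, dst: int, width: int = MESH_WIDTH) -> int:
    """Hop count between two tiles, taken as the Manhattan distance."""
    sx, sy = tile_coords(src, width)
    dx, dy = tile_coords(dst, width)
    return abs(sx - dx) + abs(sy - dy)


def udn_latency_cycles(src: int, dst: int, width: int = MESH_WIDTH) -> int:
    """Latency formula from the slide: (1 + core_hop) cycles."""
    return 1 + core_hops(src, dst, width)


print(udn_latency_cycles(0, 1))    # neighbouring tiles
print(udn_latency_cycles(0, 35))   # opposite corners of the mesh
```

Under these assumptions even the farthest pair of tiles costs no more than the slide's figure for a local L2 hit, which is why placing consecutive pipeline stages on nearby cores keeps message passing cheap.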
16
Hybrid Load Balancing Scheme
First level: PPMs
  Flow-based hashing for load balancing in mPIPE
Second level: Protocol Processing threads
  Flow-based hashing for load balancing in the pipeline
Third level: Detection Engine threads
  Ruleset Partition Balancing (RPB)
The first-level scheme dispatches incoming flows among the Packet Processing Modules in the system. The second-level scheme schedules incoming flows among the concurrent Protocol Processing threads. To guarantee that packets belonging to the same flow are handled by the same Protocol Processing thread, both the first and second levels use a flow-based scheme. The third level uses a Ruleset Partition Balancing (RPB) scheme to break the bottleneck.
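The first two levels both rely on flow-based hashing, which can be sketched as follows. This is a software stand-in for illustration: on the real platform the first level runs in the mPIPE hardware, and the CRC32 hash, the direction-independent key and the default counts (4 PPMs, 2 Protocol Processing threads) are assumptions chosen for the example.

```python
import zlib


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: str) -> bytes:
    """Build a key that is identical for both directions of a flow (assumed design)."""
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{low[0]}:{low[1]}|{high[0]}:{high[1]}|{proto}".encode()


def dispatch(key: bytes, num_ppms: int = 4, threads_per_ppm: int = 2) -> tuple:
    """Return (PPM index, Protocol Processing thread index) for a flow."""
    h = zlib.crc32(key)
    ppm = h % num_ppms                          # first level: choose the PPM
    thread = (h // num_ppms) % threads_per_ppm  # second level: choose the thread
    return ppm, thread


forward = flow_key("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
reverse = flow_key("10.0.0.2", 80, "10.0.0.1", 1234, "tcp")
print(dispatch(forward), dispatch(reverse))
```

Because the mapping depends only on the flow key, every packet of a flow reaches the same Protocol Processing thread, so per-flow state such as TCP reassembly never has to be shared between threads.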
17
Rule Partition Balancing (RPB)
Each engine works on a sub-ruleset
  Offline partition
  Small detection engine
Packet skipping
  If one engine finds an intrusion in a packet, the other engines can skip that packet.
See the details in our paper.
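The two ideas on this slide, sub-rulesets per engine and packet skipping, can be illustrated with a toy model. The slide defers the actual partitioning method to the paper, so the round-robin split below is only a placeholder; rules are plain byte strings matched by substring search, and the engines are run one after another here although the real ones run concurrently.

```python
def partition_rules(rules, num_engines):
    """Placeholder offline partition: deal the rules out round-robin."""
    return [rules[i::num_engines] for i in range(num_engines)]


class Packet:
    def __init__(self, payload: bytes):
        self.payload = payload
        self.alert = None  # set by the first engine that finds a match


def run_engine(sub_ruleset, packet) -> bool:
    """Scan a packet against one sub-ruleset.

    Returns True if the engine scanned the packet and False if it
    skipped it because another engine had already raised an alert.
    """
    if packet.alert is not None:
        return False  # packet skipping
    for rule in sub_ruleset:
        if rule in packet.payload:
            packet.alert = rule
            break
    return True


def inspect(packet, partitions):
    """Run every engine on the packet; report which engines actually scanned."""
    return [run_engine(sub, packet) for sub in partitions]


rules = [b"evil", b"worm", b"exploit", b"shell"]
partitions = partition_rules(rules, num_engines=2)
suspicious = Packet(b"GET /evil HTTP/1.1")
print(inspect(suspicious, partitions), suspicious.alert)
```

Each engine holds only a fraction of the rules, which keeps its matching structure small, and a packet that has already triggered an alert costs the remaining engines a single check.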
18
Optimal Thread Allocation for Each PPM
1.5 Mpps with 9 cores
  1 Packet Capture thread
  2 Protocol Processing threads
  6 Detection Engine threads
Meeting notes ( :46): 1. First look at the case of 1 Protocol Processing thread: as Detection Engine threads are added, throughput rises until it hits the bottleneck. 2. Then, as the number of Protocol Processing threads increases, growth is essentially linear.
19
Outline
  Background & Motivation
  Our Approach
  Evaluation
  Conclusion
20
Evaluation Platform
TILERAGX36 processor
  1.2 GHz × 36 cores
Suricata (open-source NIDS) implementation
Snort ruleset
  7571 rules
Synthetic traffic generator
21
Throughput (9 cores per PPM, 4 PPMs)
7.2 Gbps (100-byte packets)
22
Comparison
23
Throughput-Cost
17.40 Mbps/$
  8 times larger than MIDeA
  3 times larger than Kargus

Name              Throughput (Gbps)   Processor cost ($)   Throughput per dollar (Mbps/$)
MIDeA             3.2                 1138                 2.8
Kargus            19.0                3164                 6.0
Proposed design   11.0                650                  17.4
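The last column of the table is throughput divided by processor cost. The sketch below recomputes it from the rounded figures shown on the slide, assuming 1 Gbps = 1000 Mbps. With these inputs MIDeA and Kargus reproduce the slide's values, while the proposed design comes out slightly below 17.4 Mbps/$; the slide does not say which unrounded measurements its own figure is based on.

```python
# (throughput in Gbps, processor cost in dollars), copied from the slide
SYSTEMS = {
    "MIDeA": (3.2, 1138),
    "Kargus": (19.0, 3164),
    "Proposed design": (11.0, 650),
}


def mbps_per_dollar(throughput_gbps: float, cost_dollars: float) -> float:
    """Throughput per dollar, assuming 1 Gbps = 1000 Mbps."""
    return throughput_gbps * 1000 / cost_dollars


proposed = mbps_per_dollar(*SYSTEMS["Proposed design"])
for name, (gbps, cost) in SYSTEMS.items():
    value = mbps_per_dollar(gbps, cost)
    print(f"{name}: {value:.1f} Mbps/$ (proposed design is {proposed / value:.1f}x)")
```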
24
Conclusion
Two parallel designs
  Hybrid parallel scheme
  Hybrid load balancing scheme
NIDS evaluation on TILERAGX 36
  High throughput per dollar cost
25
Thank you!