Published by Myles Geoffrey Dawson. Modified over 9 years ago.
1
A Resource Efficient Content Inspection System for Next Generation Smart NICs
Karthikeyan Sabhanatarajan, Ann Gordon-Ross*
The Energy Efficient Internet Project
High-performance Computing & Simulation Research Lab
ECE Department, University of Florida, Gainesville
This work was supported by the U.S. National Science Foundation.
* Also affiliated with the NSF Center for High Performance Reconfigurable Computing
2
Introduction
The Internet has grown at an alarming rate: 305% between 2000 and 2008.
3
Introduction
Edge devices are left idle 75% of the time, with power management features disabled to maintain network connectivity.
4
Introduction
One way to save power on idle devices is power proxying:
- The idle PC is allowed to sleep.
- The PC delegates responsibility for handling network traffic to the NIC.
Additionally, NICs can enhance network security through network intrusion detection.
5
Introduction
Next-generation network interfaces, also known as Smart NICs (SNICs), are expected to take on increased network responsibility.
Key requirement: packet inspection. A packet consists of a header and a payload; header inspection examines the header, and content inspection examines the payload.
This presentation focuses on content inspection: the process of searching a packet's payload for occurrences of a known set of patterns called signatures.
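To make the definition concrete, here is a minimal Python sketch of content inspection as a brute-force search: at every payload offset, each signature is compared against the bytes that start there. The function name `inspect_payload` and the example payload and signatures are hypothetical. This naive baseline costs work proportional to payload length times total signature length, which is why faster software and hardware techniques are needed.

```python
def inspect_payload(payload: bytes, signatures: list[bytes]) -> list[tuple[int, bytes]]:
    """Return (offset, signature) for every signature occurrence in the payload.

    Brute force: at each offset, test every signature with bytes.startswith.
    Work is O(len(payload) * total signature length).
    """
    matches = []
    for offset in range(len(payload)):
        for sig in signatures:
            if payload.startswith(sig, offset):
                matches.append((offset, sig))
    return matches


if __name__ == "__main__":
    payload = b"GET /index.html?cmd=/bin/sh HTTP/1.1"
    print(inspect_payload(payload, [b"/bin/sh", b"cmd="]))
    # [(16, b'cmd='), (20, b'/bin/sh')]
```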
6
Motivation
Existing methodologies fall into software approaches (Boyer-Moore, Aho-Corasick, Wu-Manber) and hardware approaches (FPGAs, TCAMs, Bloom filters).
- Software techniques cannot support high-speed links with large signature sets.
- FPGAs exploit parallelism, but their price, area, and power are prohibitive for wide-scale deployments.
- Bloom filters are energy efficient, with moderate throughput. However, false positives require further inspection of the payload to confirm a match, which imposes parallelism limits (scalability).
- TCAMs are a popular option with O(1) lookup performance. However, existing implementations have prohibitive energy, price, and auxiliary data structure requirements: auxiliary data structures such as SRAM are used to store pattern combinations to help determine a pattern match.
7
Background – TCAM Methodology
Sample signature: ABCDEFGHABCDJKLMEFG
With a TCAM width of w = 4 bytes, the signature is split into 4-byte chunks: the first chunk (ABCD) is the prefix pattern, and the remaining chunks (EFGH, ABCD, JKLM, EFG*) are suffix patterns, where * is a "don't care" that pads the final partial chunk.
The TCAM stores each distinct chunk once: ABCD, EFGH, JKLM, EFG*.
TCAMs are attractive candidates for pattern matching due to their inherent simplicity, small lookup time, high throughput, high density, and scalability.
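The chunking and ternary lookup described above can be sketched in Python as follows. This assumes each signature is split into consecutive w-byte chunks, that a short final chunk is padded with a don't-care marker (`None` here), that identical chunks share one TCAM entry, and that a lookup returns the lowest matching address, mimicking a TCAM priority encoder. The function names are hypothetical.

```python
from typing import Optional

# One TCAM entry: a byte value per position, or None for a "don't care".
Entry = tuple[Optional[int], ...]


def chunk_signature(sig: bytes, w: int) -> list[Entry]:
    """Split a signature into w-byte chunks; pad the last chunk with don't cares."""
    chunks = []
    for start in range(0, len(sig), w):
        part: list[Optional[int]] = list(sig[start:start + w])
        part += [None] * (w - len(part))
        chunks.append(tuple(part))
    return chunks


def build_tcam(signatures: list[bytes], w: int) -> list[Entry]:
    """Store each distinct chunk once; the list index is the TCAM address."""
    tcam: list[Entry] = []
    for sig in signatures:
        for chunk in chunk_signature(sig, w):
            if chunk not in tcam:
                tcam.append(chunk)
    return tcam


def tcam_lookup(tcam: list[Entry], window: bytes) -> Optional[int]:
    """Ternary match: return the lowest matching address (priority encoder), else None."""
    for addr, entry in enumerate(tcam):
        if len(window) == len(entry) and all(
            e is None or e == b for e, b in zip(entry, window)
        ):
            return addr
    return None


if __name__ == "__main__":
    tcam = build_tcam([b"ABCDEFGHABCDJKLMEFG"], w=4)
    print(len(tcam))                    # 4 entries: ABCD, EFGH, JKLM, EFG*
    print(tcam_lookup(tcam, b"EFGU"))   # 3: EFG* matches via its don't care
```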
8
Background – TCAM Methodology
TCAM entries (w = 4): ABCD, EFGH, JKLM, EFG*. Example payload: ABCDEFGHJKLMEFGUI.
Auxiliary SRAM structures contain several pattern permutations used to identify valid patterns:
- Matching Table: stores the type of each matched pattern, i.e., prefix or suffix.
- Combined Pattern Table: stores the valid combinations of all possible prefix and suffix entries.
- Partial Hit List: records the index of the constructed prefix pattern (the matched index).
In the approach proposed by Lakshman et al., the auxiliary SRAM space requirement is O(N^2). Gao et al. reduced this requirement to O(N log N) by storing address permutations.
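How these tables cooperate can be sketched in Python. This is a simplified illustration in the spirit of the scheme above, not the exact table layouts of Lakshman et al. or Gao et al. It assumes TCAM hits arrive as a mapping from payload offset to matched address; the Matching Table records whether an address can act as a prefix and/or suffix, the Partial Hit List keeps partially constructed address sequences indexed by offset, and the Combined Pattern Table lists the address sequences of complete signatures. The table contents assume the sample signature with addresses 0 = ABCD, 1 = EFGH, 2 = JKLM, 3 = EFG*; all names are hypothetical.

```python
# Hypothetical tables for the sample signature ABCD|EFGH|ABCD|JKLM|EFG*,
# with TCAM addresses 0=ABCD, 1=EFGH, 2=JKLM, 3=EFG*.
MATCHING_TABLE = {0: {"prefix", "suffix"}, 1: {"suffix"}, 2: {"suffix"}, 3: {"suffix"}}
COMBINED_PATTERN_TABLE = {(0, 1, 0, 2, 3): "sample signature"}

# Leading runs of valid address sequences are the only partials worth keeping.
VALID_PARTIALS = {
    seq[:k] for seq in COMBINED_PATTERN_TABLE for k in range(1, len(seq) + 1)
}


def find_signatures(hits: dict[int, int], w: int) -> list[tuple[int, str]]:
    """hits maps payload offset -> TCAM address matched at that offset.

    Returns (start offset, signature name) for every complete match.
    """
    # Partial Hit List: offset -> [(start offset, address sequence so far)]
    partial_hit_list: dict[int, list[tuple[int, tuple[int, ...]]]] = {}
    results = []
    for offset in sorted(hits):
        addr = hits[offset]
        roles = MATCHING_TABLE.get(addr, set())
        candidates = []
        if "prefix" in roles:  # a prefix hit starts a new partial sequence
            candidates.append((offset, (addr,)))
        if "suffix" in roles:  # a suffix hit extends sequences recorded w bytes earlier
            for start, seq in partial_hit_list.get(offset - w, []):
                candidates.append((start, seq + (addr,)))
        kept = []
        for start, seq in candidates:
            if seq in COMBINED_PATTERN_TABLE:
                results.append((start, COMBINED_PATTERN_TABLE[seq]))
            if seq in VALID_PARTIALS:
                kept.append((start, seq))
        partial_hit_list[offset] = kept
    return results


if __name__ == "__main__":
    # Hits for the payload ABCDEFGHABCDJKLMEFGU (contains the signature).
    print(find_signatures({0: 0, 4: 1, 8: 0, 12: 2, 16: 3}, w=4))
    # Hits for the example payload ABCDEFGHJKLMEFGUI (invalid combination).
    print(find_signatures({0: 0, 4: 1, 8: 2, 12: 3}, w=4))
```

The second call returns no match: the sequence 0, 1, 2 is not a leading run of any stored combination, so the Partial Hit List drops it.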
9
Proposed Solution
TCAM techniques are:
- The simplest and fastest technique, with O(1) lookup.
- Able to match future link speeds of 10 Gbps.
- Highly scalable, with no parallelism limits.
- Able to accommodate signatures of varying lengths and different signature set sizes with ease.
However, they suffer from:
- Increased energy consumption
- Prohibitive price
- Increased auxiliary data structure requirements
These drawbacks make them unsuitable for wide-scale deployment in SNICs.
10
Proposed Solution
We propose a hybrid TCAM-based solution. Our technique addresses:
- Energy efficiency, through a partitioned architecture.
- Further reduction in power consumption, through caching that exploits network locality.
- Reduced auxiliary data structure requirements, using Bloom filters or software techniques.
The result is more suitable for wide-scale deployment due to its high energy efficiency and reduced memory requirements, and it easily meets the throughput requirements of high-speed links such as 1 Gbps and 10 Gbps.
11
Hybrid TCAM Methodology
Partition the single TCAM (w = 4) into a prefix TCAM (PTCAM) and a suffix TCAM (STCAM), and store each signature's chunks in the PTCAM and STCAM accordingly:
- PTCAM: ABCD (P0)
- STCAM: EFGH (S0), ABCD (S1), JKLM (S2), EFG* (S3)
The signature is then expressed as a permutation of PTCAM and STCAM addresses: P0 S0 S1 S2 S3. This permutation is stored in a Bloom filter or in software.
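The partitioning step can be sketched in Python. It assumes signatures are character strings, that the last chunk is padded with `*` as a don't-care marker (so a literal `*` in a signature is not handled), and that addresses are assigned in order of first appearance; a Python `set` stands in for the Bloom filter or software store. The class and method names are hypothetical.

```python
def chunks(sig: str, w: int) -> list[str]:
    """Split into w-character chunks; pad the last chunk with '*' (don't care)."""
    return [sig[i:i + w].ljust(w, "*") for i in range(0, len(sig), w)]


class HybridTCAM:
    """Behavioral model of the PTCAM/STCAM split (names are hypothetical)."""

    def __init__(self, w: int):
        self.w = w
        self.ptcam: list[str] = []  # prefix TCAM; list index = address
        self.stcam: list[str] = []  # suffix TCAM; list index = address
        self.permutations: set[tuple[str, ...]] = set()  # stand-in for Bloom filter/software

    @staticmethod
    def _address(table: list[str], chunk: str) -> int:
        """Return the chunk's address, appending it if it is not stored yet."""
        if chunk not in table:
            table.append(chunk)
        return table.index(chunk)

    def add_signature(self, sig: str) -> tuple[str, ...]:
        first, *rest = chunks(sig, self.w)  # first chunk -> PTCAM, rest -> STCAM
        perm = (f"P{self._address(self.ptcam, first)}",) + tuple(
            f"S{self._address(self.stcam, c)}" for c in rest
        )
        self.permutations.add(perm)
        return perm


if __name__ == "__main__":
    h = HybridTCAM(w=4)
    print(h.add_signature("ABCDEFGHABCDJKLMEFG"))  # ('P0', 'S0', 'S1', 'S2', 'S3')
    print(h.ptcam, h.stcam)  # ['ABCD'] ['EFGH', 'ABCD', 'JKLM', 'EFG*']
```

Entries are shared across signatures: adding "ABCDJKLM" afterward reuses both tables and yields the permutation P0 S2.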
12
Exploiting Signature Locality
Our experiments indicate that network traces exhibit sufficient locality. To reduce unwanted switching, we exploit this property by introducing a cache between the PTCAM and STCAM.
13
Hybrid TCAM Methodology
[Figure: PTCAM (ABCD) and STCAM (EFGH, ABCD, JKLM, EFG*), both with w = 4, plus a suffix cache and its cache controller ($ Ctrl).]
14
Hybrid TCAM Methodology
[Figure: PTCAM and STCAM (w = 4) with suffix cache and cache controller ($ Ctrl); an activator (right-shift register holding 1000), an enabler (Enable 0th..(w-1)th), and an enable buffer drive the cache, whose hit/miss outcome can pause shifting. The payload ABCDEFGHJKLMEFGUI is left-shifted into the system.]
- The payload is fed to the inspection system and shifted at a rate of 1 byte per clock cycle.
- The suffix cache is activated (w-1) clock cycles after a TCAM hit.
- A cache miss pauses shifting so that the suffix TCAM can be searched for the pattern.
- The cache controller ($ Ctrl) updates the suffix cache.
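A behavioral sketch of the suffix cache follows. The slides specify that the cache sits between the PTCAM and STCAM and, per the conclusions, uses random replacement; the fully associative organization, exact-match lookup (don't-care bits are ignored), capacity, and the `SuffixCache` name are assumptions. Each miss models a pause in shifting while the STCAM is searched, so `stcam_searches` counts the accesses that caching is meant to avoid.

```python
import random
from typing import Optional


class SuffixCache:
    """Fully associative suffix cache with random replacement (assumed organization)."""

    def __init__(self, stcam: list[str], capacity: int, seed: int = 0):
        self.stcam = stcam                  # STCAM contents; list index = address
        self.capacity = capacity
        self.entries: dict[str, int] = {}   # cached chunk -> STCAM address
        self.rng = random.Random(seed)      # seeded so runs are reproducible
        self.hits = 0
        self.stcam_searches = 0             # one per cache miss

    def lookup(self, chunk: str) -> Optional[int]:
        if chunk in self.entries:
            self.hits += 1
            return self.entries[chunk]
        # Miss: payload shifting pauses while the STCAM is searched.
        self.stcam_searches += 1
        addr = self.stcam.index(chunk) if chunk in self.stcam else None
        if addr is not None and self.capacity > 0:
            if len(self.entries) >= self.capacity:
                victim = self.rng.choice(list(self.entries))  # random replacement
                del self.entries[victim]
            self.entries[chunk] = addr
        return addr


if __name__ == "__main__":
    cache = SuffixCache(["EFGH", "ABCD", "JKLM", "EFG*"], capacity=2)
    for chunk in ["EFGH", "JKLM", "EFGH", "EFGH"]:
        cache.lookup(chunk)
    print(cache.hits, cache.stcam_searches)  # 2 2
```

Random replacement needs no recency bookkeeping, which keeps the controller simple; the trade-off is that it may evict a frequently used entry.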
15
Hybrid TCAM Methodology
[Figure: the same datapath; match results (11 P1, 01 S1, 00 …) are left-shifted out as the address sequence P1 S1 ….]
The collected address sequence is sent to the Bloom filter or software unit to verify the combination.
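A Bloom filter check of a collected address sequence can be sketched as below. The bit-array size, number of hash functions, use of salted SHA-256, and encoding of the sequence as a space-separated string are all assumptions for illustration, as are the names. A Bloom filter never misses a stored combination, but it can report false positives, so a positive answer may still need confirmation, as noted in the Motivation slide.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k salted SHA-256 hashes over an m-slot bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means present or a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


def encode(addresses: list[str]) -> str:
    """Serialize a PTCAM/STCAM address sequence, e.g. ['P0', 'S0'] -> 'P0 S0'."""
    return " ".join(addresses)


if __name__ == "__main__":
    bf = BloomFilter()
    bf.add(encode(["P0", "S0", "S1", "S2", "S3"]))  # store the signature's permutation
    print(bf.might_contain(encode(["P0", "S0", "S1", "S2", "S3"])))  # True
```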
16
Hybrid TCAM Methodology
[Figure: the same datapath, with a contention resolution unit that receives the match address and hit signals from both TCAMs.]
A contention resolution unit handles contention when identical patterns are stored in both the PTCAM and the STCAM: a PTCAM match is given preference over an STCAM match.
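The contention rule can be expressed as a small selection function. The slides state only the priority (a PTCAM match wins over an STCAM match); the function name and the `P`/`S` address labels are illustrative assumptions. With the sample tables, the window ABCD hits both PTCAM entry P0 and STCAM entry S1, and the unit reports P0.

```python
from typing import Optional


def resolve_contention(ptcam_addr: Optional[int], stcam_addr: Optional[int]) -> Optional[str]:
    """Choose which hit to report when a window may match both TCAMs.

    A PTCAM match takes precedence; an STCAM match is reported only when the
    PTCAM misses; None means neither TCAM matched.
    """
    if ptcam_addr is not None:
        return f"P{ptcam_addr}"
    if stcam_addr is not None:
        return f"S{stcam_addr}"
    return None


if __name__ == "__main__":
    print(resolve_contention(0, 1))  # P0: ABCD hits both, PTCAM wins
```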
17
Experimental Setup
- Packet traces: malicious traces from MIT Lincoln Laboratory (MIT-LL) and from the DEFCON Capture the Flag contest. No power proxying traces are available, as power proxying is ongoing research.
- A custom C-based simulator behaviorally simulates the entire system; packets are reassembled before being fed to the simulator.
- STCAM accesses are recorded to analyze the effect of caching.
- TCAM energy consumption is obtained from the TCAM modeling tool of Agarwal et al.
- SNORT and ClamAV are used as signature sets.
18
Results – Signature Distribution
ClamAV and SNORT rule sets:
- SNORT: smaller patterns (70% ≤ 4 bytes)
- ClamAV: medium-sized patterns (72% 100 bytes)
19
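Results – Effect of Partitioning on Size
Partitioning circumvents the TCAM's natural compression: a chunk that appears both as a prefix and as a suffix (such as ABCD in the sample signature) must be stored in both the PTCAM and the STCAM rather than once. However, the resulting increase in TCAM size is negligible.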
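The size effect can be illustrated by counting entries with and without partitioning. This sketch assumes string signatures, `*` padding for the don't-care tail, and deduplication within each TCAM; `tcam_sizes` is a hypothetical helper. For the sample signature, the single TCAM stores ABCD once, while the partitioned design stores it in both the PTCAM and the STCAM.

```python
def chunks(sig: str, w: int) -> list[str]:
    """Split into w-character chunks; pad the last chunk with '*' (don't care)."""
    return [sig[i:i + w].ljust(w, "*") for i in range(0, len(sig), w)]


def tcam_sizes(signatures: list[str], w: int) -> tuple[int, int, int]:
    """Return (single-TCAM entries, PTCAM entries, STCAM entries)."""
    single, ptcam, stcam = set(), set(), set()
    for sig in signatures:
        first, *rest = chunks(sig, w)
        single.update([first, *rest])  # one shared table: duplicates collapse
        ptcam.add(first)               # first chunk goes to the prefix TCAM
        stcam.update(rest)             # remaining chunks go to the suffix TCAM
    return len(single), len(ptcam), len(stcam)


if __name__ == "__main__":
    print(tcam_sizes(["ABCDEFGHABCDJKLMEFG"], 4))  # (4, 1, 4): 4 entries vs 1 + 4 = 5
```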
20
Results – EDP Reduction
Partitioning reduces the energy-delay product (EDP): two smaller TCAMs are faster than one large TCAM. EDP savings are higher for widths of 8 and 16 bytes.
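The energy-delay product is energy multiplied by delay, so a design that is both lower-energy and faster compounds its gains. A minimal sketch of the metric and of a fractional reduction follows; the per-search figures in the usage are hypothetical, not results from this work.

```python
def edp(energy: float, delay: float) -> float:
    """Energy-delay product for one search (units: energy x time, e.g. nJ*ns)."""
    return energy * delay


def edp_reduction(base_energy: float, base_delay: float,
                  new_energy: float, new_delay: float) -> float:
    """Fractional EDP reduction of a new design relative to a baseline."""
    return 1 - edp(new_energy, new_delay) / edp(base_energy, base_delay)


if __name__ == "__main__":
    # Hypothetical numbers: one large TCAM at 10 nJ and 4 ns per search,
    # a partitioned design at 6 nJ and 3 ns per search.
    print(round(edp_reduction(10, 4, 6, 3), 2))  # 0.55
```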
21
Results – Energy Savings
1. Energy reduction for a partitioned system compared to a non-partitioned system, versus TCAM width, for real-time traffic traces.
2. Energy savings range from 6% to 69% (SNORT) and from 6% to 87% (ClamAV).
3. Smaller TCAM widths give greater energy savings.
4. Accesses to wider TCAMs involve more “don’t care” bits.
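Why partitioning can save energy is easiest to see with a deliberately crude model. The sketch below assumes search energy scales linearly with the number of bits searched (entries times width), that the single TCAM and the PTCAM are searched once per payload byte, and that the STCAM is searched only when needed (after a prefix hit and a suffix-cache miss). This linear model is an assumption for illustration, not the Agarwal et al. modeling tool used in the experiments, and the numbers are hypothetical.

```python
def search_energy(entries: int, width_bytes: int, energy_per_bit: float = 1.0) -> float:
    """Assumed linear model: a search activates every bit of every entry."""
    return entries * width_bytes * 8 * energy_per_bit


def partition_savings(n_single: int, n_ptcam: int, n_stcam: int, width_bytes: int,
                      payload_bytes: int, stcam_searches: int) -> float:
    """Fractional energy saved by a PTCAM/STCAM split versus one TCAM."""
    e_single = payload_bytes * search_energy(n_single, width_bytes)
    e_part = (payload_bytes * search_energy(n_ptcam, width_bytes)
              + stcam_searches * search_energy(n_stcam, width_bytes))
    return 1 - e_part / e_single


if __name__ == "__main__":
    # Hypothetical sizes and counts, not measurements from this work.
    print(partition_savings(n_single=1000, n_ptcam=300, n_stcam=750,
                            width_bytes=4, payload_bytes=10_000,
                            stcam_searches=1_000))  # 0.625
```

Even though the two partitions hold more entries in total (1,050 versus 1,000 here), the model saves energy because the large STCAM is searched far less often than once per byte.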
22
Results – Effect of Caching on Hit Rate
1. Caching is analyzed for an STCAM width of 4 bytes.
2. Hit rates range from 28% to 88% for cache sizes of only 40 to 60 entries.
3. A cache containing 40 to 60 entries represents only 0.002% to 0.004%, respectively, of the STCAM entries.
23
Results – Effect of Caching on Energy Savings
Energy savings for a partitioned TCAM system (w = 4) with a suffix cache, compared to a partitioned TCAM system without one, for varying numbers of cache entries: 13% to 64% additional savings.
24
Conclusion
1. Developed an energy-efficient partitioned TCAM-based content inspection system for SNICs.
2. The design is energy- and throughput-aware.
3. Energy-delay product improvements of up to 62% compared to previous non-partitioned TCAM systems.
4. Up to 87% average energy savings compared to a non-partitioned TCAM system.
5. A simple cache with a random replacement policy further reduces energy consumption by up to 64% compared to a partitioned TCAM system without a cache.
6. Caching incurs a throughput reduction of 5.5%.
25
Future Work
1. Evaluating the proposed Bloom filter-based architecture
2. Improved caching techniques
3. Attack robustness to counter maliciously engineered packets
4. A pipelined architecture to hide cache misses and improve throughput