Deep Packet Inspection(DPI) Engineering for Enhanced Performance of Network Elements and Security Systems PIs: Dr. Anat Bremler-Barr (IDC) Dr. David.

Presentation transcript:

Deep Packet Inspection (DPI) Engineering for Enhanced Performance of Network Elements and Security Systems. PIs: Dr. Anat Bremler-Barr (IDC), Dr. David Hay (HUJI). www.deepness-lab.org

Deepness Lab was founded in November 2010. Our mission: Deep Packet Inspection (DPI) for next-generation network devices. Funding: a 5-year ERC Starting Grant (1M Euro); 3 years of Kabarnit, a Magnet program ($70K/year); a gift from Cisco ($75K). Main industry collaborations: Commtouch, Radware, Verint.

People. Faculty: Anat Bremler-Barr (IDC Herzliya), David Hay (The Hebrew University of Jerusalem). Postdocs: Shimrit Tzur-David, Yaron Koral. Ph.D. students: Liron Schiff (Tel Aviv University), Yotam Harchol (The Hebrew University of Jerusalem). Collaborators: Yehuda Afek (Tel Aviv University), Isaac Keslassy (Technion), Shir Landau-Feibish (Tel Aviv University). Past students: Victor Zigdon, M.Sc. (IDC Herzliya), Adam Mor, M.Sc. (IDC Herzliya).

People. Dr. Anat Bremler-Barr: Ph.D. with distinction, Tel-Aviv University, Israel (2001). Founder and chief scientist of Riverhead Networks (focused on distributed denial-of-service solutions; acquired by Cisco). Senior lecturer (assistant professor) with tenure at IDC. Dr. David Hay: Ph.D. from the Technion (2007). Post-doc at Columbia University, NY, USA and Politecnico di Torino. Previously also at IBM Research and Cisco San Jose. Senior lecturer (assistant professor) at the Hebrew U.

Deep Packet Inspection (DPI). DPI: identifying signatures (patterns or regular expressions) in the packets' payload. DPI is the main action taken to inspect traffic, and is therefore a critical component in next-generation networks: security, content filtering, traffic monitoring, load balancing, lawful interception, targeted advertising, data leakage prevention, application-aware routing, and more. High-speed DPI is challenging and quickly becomes the bottleneck of the entire packet inspection process, resulting in security holes and/or limited or ineffective functionality.

Impact. 66% of network equipment vendors define DPI as "a must have" technology today [Heavy Reading Survey, 2011]. The DPI market in 2011 was estimated at $550 million, with growth of 20% per year [Qosmos report, Heavy Reading, Dec. 2012].

Major Challenges. Scalability: rate (greater than 10 or even 100 Gbps), memory (handling thousands of signatures), power (reducing the high power consumption). Compressed traffic. Security of the NIDS itself: current solutions are vulnerable to Denial of Service attacks. DPI in Software Defined Networks: SDN needs to determine flows; DPI can help and will play a significant role. Signature extraction.

Classical Algorithms

Aho-Corasick Algorithm. Build a Deterministic Finite Automaton (DFA). Traverse the DFA, byte by byte. Accepting state → pattern found. Example pattern set: {E, BE, BD, BCD, CDBCAB, BCAA}. [Figure: DFA for the example pattern set; the string BCDBCAB is shown alongside it.] The standard algorithm for exact string matching in DPI is Aho-Corasick. Its idea is to build a Deterministic Finite Automaton such that traversing it on the input recognizes the patterns. Each state corresponds to the longest pattern prefix that is a suffix of the input read so far. To build the automaton, one builds a trie of the patterns and then completes it so that every state has a transition for every symbol of the alphabet. Under real-life traffic, the automaton usually remains in the upper levels, close to the root (about 10% of the states, 85% of the time).
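The construction described on this slide can be sketched in a few lines. This is a minimal illustration, not the lab's implementation: the function names `build_dfa` and `scan` are made up for the example, the alphabet is restricted to the symbols that occur in the patterns (any other byte returns to the root), and the input `BCDBCAB` is assumed to be the slide's example string.

```python
from collections import deque

def build_dfa(patterns):
    """Build a full Aho-Corasick DFA: goto[state][symbol] -> next state."""
    alphabet = sorted(set("".join(patterns)))
    goto, out = [{}], [[]]            # state 0 is the root
    for p in patterns:                # 1) trie of all patterns
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 states fail to the root
    for ch in alphabet:
        goto[0].setdefault(ch, 0)     # missing root edges loop on the root
    while queue:                      # 2) BFS: fill in all missing transitions
        s = queue.popleft()
        for ch in alphabet:
            if ch in goto[s]:         # real trie edge
                t = goto[s][ch]
                fail[t] = goto[fail[s]][ch]
                out[t] = out[t] + out[fail[t]]
                queue.append(t)
            else:                     # copy the transition of the failure state
                goto[s][ch] = goto[fail[s]][ch]
    return goto, out

def scan(text, goto, out):
    """Traverse the DFA symbol by symbol; report (end_index, pattern)."""
    state, matches = 0, []
    for i, ch in enumerate(text):
        state = goto[state].get(ch, 0)
        matches.extend((i, p) for p in out[state])
    return matches

PATTERNS = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]
GOTO, OUT = build_dfa(PATTERNS)
MATCHES = scan("BCDBCAB", GOTO, OUT)
```

On this pattern set the trie has 15 states, and the scan makes exactly one table lookup per input symbol, which is the property the full-table representation pays for in memory.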

Aho-Corasick Algorithm. [Table: DFA transition table with one row per state (S0, S1, ...) and one column per symbol.] Naïve implementation: represent the transition function in a table of |Σ|×|S| entries, where Σ is the alphabet and S is the set of states. Lookup time: one memory access per input symbol. Space: in reality, 70MB to gigabytes; Snort has 77K states, ClamAV over 1M. One way to represent this automaton is a table with (alphabet size) multiplied by (number of states) entries. The lookup time of this method looks constant, so it should give a constant throughput.
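A quick back-of-the-envelope check of the space figures, as a sketch. The state counts come from the slide; the 256-symbol byte alphabet and the 4-byte table entry are assumptions made only for this estimate.

```python
def table_bytes(num_states, alphabet_size=256, bytes_per_entry=4):
    """Size of a full |alphabet| x |states| transition table, in bytes.

    alphabet_size and bytes_per_entry are assumed values, not from the slides.
    """
    return num_states * alphabet_size * bytes_per_entry

SNORT_BYTES = table_bytes(77_000)        # 77K states
CLAMAV_BYTES = table_bytes(1_000_000)    # "over 1M" states, so a lower bound

print(f"Snort:  {SNORT_BYTES / 1e6:.1f} MB")
print(f"ClamAV: {CLAMAV_BYTES / 1e9:.2f} GB")
```

Under these assumptions the Snort table comes to roughly 79 MB and the ClamAV table to at least 1 GB, the same order of magnitude as the "70MB to gigabytes" quoted on the slide.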

Alternative Implementation. [Figure: the automaton for the example pattern set (states s0 to s14), with forward transitions and failure transitions marked.] A failure transition goes to the state that matches the longest suffix of the input so far. Lookup time: at most two memory accesses per input symbol (via amortized analysis). Space: at most the number of symbols in the pattern set; depends on the implementation. Let's take away the extra transitions that make the automaton so big. Instead, we add different transitions, called "failure transitions". They point to where we should go if we did not find a matching forward transition, and they are taken without consuming an input symbol. (A failure transition points to the state whose label is the longest proper suffix of the current state's label.) Why two memory accesses? Because failure transitions always go up the tree, so we can go up at most as far as we went down before.
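The failure-transition variant can be sketched as follows. The names `build_with_failures` and `scan_counting` are illustrative, and the access counter is a simplified cost model (one unit per forward-transition attempt, one per failure transition taken), not a measurement of real memory traffic.

```python
from collections import deque

def build_with_failures(patterns):
    """Trie plus failure links; only real trie edges are stored."""
    goto, out = [{}], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())       # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:  # climb until an edge on ch exists
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out

def scan_counting(text, goto, fail, out):
    """Scan text; return matches and the number of transition accesses."""
    state, matches, accesses = 0, [], 0
    for i, ch in enumerate(text):
        accesses += 1                     # forward-transition attempt
        while state and ch not in goto[state]:
            state = fail[state]           # failure transition, no input consumed
            accesses += 1
        state = goto[state].get(ch, 0)
        matches.extend((i, p) for p in out[state])
    return matches, accesses

PATTERNS = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]
GOTO, FAIL, OUT = build_with_failures(PATTERNS)
TEXT = "BCDBCAB"
MATCHES, ACCESSES = scan_counting(TEXT, GOTO, FAIL, OUT)
```

Each failure transition moves at least one level up the trie and each consumed symbol moves at most one level down, so the counter can never exceed twice the input length. On `BCDBCAB` only one failure transition is taken (from `BCD` to `CD`).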

Other Alternative: Compress the State Representation. [Figure: the example automaton (states s0 to s14) and one of its states stored in three ways. Lookup Table: one slot per symbol A, B, C, D, E; forward: 13, 6; failure: 7; match: False. Linear Encoded: size: 2; symbols: A, D; forward: 13, 6; failure: 7; match: False. Bitmap Encoded: a bitmap of length |Σ| over A, B, C, D, E; forward: 13, 6; failure: 7; match: False. Bits can be counted using the popcnt instruction.] How do we represent a state in this method? We want a smaller representation, to fit in cache, and fast lookup. These are the known methods. For the bitmap, mention "popcnt". Now, how can we make the automaton even smaller?
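A bitmap-encoded state lookup might look like the sketch below. It assumes the slide's example state maps A to 13 and D to 6 (inferred from the linear-encoded listing), uses a five-symbol alphabet for readability, and counts bits with `bin(...).count("1")` where hardware would use a popcnt instruction.

```python
ALPHABET = "ABCDE"   # toy alphabet; a real automaton would use all 256 byte values

def encode_bitmap(transitions):
    """Pack a sparse state into (bitmap, targets): bit i is set when symbol i
    has a forward transition; targets holds the next states in symbol order."""
    bitmap, targets = 0, []
    for i, ch in enumerate(ALPHABET):
        if ch in transitions:
            bitmap |= 1 << i
            targets.append(transitions[ch])
    return bitmap, targets

def lookup(bitmap, targets, ch):
    """Return the forward target for ch, or None if the caller must follow
    the failure transition instead."""
    i = ALPHABET.index(ch)
    if not (bitmap >> i) & 1:
        return None
    rank = bin(bitmap & ((1 << i) - 1)).count("1")  # set bits below position i
    return targets[rank]

# Assumed example state: forward A -> 13, D -> 6, failure -> 7.
BITMAP, TARGETS = encode_bitmap({"A": 13, "D": 6})
FAILURE = 7

next_state = lookup(BITMAP, TARGETS, "D")
if next_state is None:
    next_state = FAILURE
```

The state stores one bit per alphabet symbol plus one target per existing transition, so lookup stays constant-time while the per-state footprint shrinks from |Σ| entries to the number of real edges.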

The Boyer-Moore (BM) Algorithm. Shift-based single-pattern search. Main idea by example: shifts of size m or close to it occur most of the time, leading to a very fast algorithm. Shift table (m = 6), listed for the characters b, r, i, g, h, t: b: 5, r: 4, i: 3, g: 2, h: 1, otherwise: 6 (m).
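The shift idea can be illustrated with the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table. This is a swapped-in simpler variant, not full Boyer-Moore, and the pattern `bright` is an assumption inferred from the characters in the slide's shift table.

```python
def shift_table(pattern):
    """Bad-character shifts: distance from a character's last occurrence
    (excluding the final position) to the end of the pattern."""
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

def horspool_search(text, pattern):
    """Return (match positions, number of windows examined)."""
    m = len(pattern)
    table = shift_table(pattern)
    hits, windows, pos = [], 0, 0
    while pos + m <= len(text):
        windows += 1
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the table entry of the window's last character;
        # characters not in the table shift by the full pattern length m.
        pos += table.get(text[pos + m - 1], m)
    return hits, windows

PATTERN = "bright"                      # assumed example pattern
TABLE = shift_table(PATTERN)
HITS, WINDOWS = horspool_search("a bright light, brighter", PATTERN)
```

For this 24-character text only five windows are examined, because most shifts are of size m or close to it.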

Compressed Traffic

Compressed HTTP. 84.1% of the top 1,000 sites compress their traffic: a 19% increase in 8 months! Data compression is done by adding references to repeated data. There are two types of compression. Intra-response compression: the references point to bytes within the response (Gzip/Deflate). Inter-response/connection compression: the references point to bytes in a separate file, called a dictionary (Google's SDCH). There is a paper that handles the intra-response case (INFOCOM 2009). We exploit these repetitions to facilitate the DPI process.

Challenges Current security tools do not deal with compressed traffic due to the great challenges in time and space

Compressed Traffic: Space Challenge. Thousands of concurrent sessions. Compressed traffic needs 32KB of memory per session. [Figure: compressed traffic is unzipped into uncompressed traffic and passed to DPI; labels: Space, Time, 80%, 40%.] Contribution: Improve

Compressed Traffic: Time Challenge. General belief: decompression + pattern matching >> pattern matching. Our algorithms show how to accelerate the pattern matching using the compression information: decompression + pattern matching < pattern matching.

High-Level Idea. Compression is done by compressing repeated sequences of bytes. Store information about the pattern matching results → no need to fully perform pattern matching again on repeated sequences that were already scanned → x2-3 time reduction. The buffers needed for decompression are not used most of the time, and can therefore be kept in compressed form most of the time → x5 space reduction.

General Idea: Keep "compressed" buffer. Keep buffers in a "compressed" form. Uncompress the "active session" only. [Figure: a new packet arrives; only the active session's buffer is unzipped, while the other session buffers stay compressed.]

Results. Reduction of space by a factor of 5! Speedup by a factor of 2 or 3 (in GZIP and SDCH).

Experimental Results: DPI + Packing. Unzip entire session: avg. size = 170KB. Data points: SOP: 1.39, 5.17KB. SOP+ACCH: 0.64, 6.19KB. Naïve: 1.1, 29KB. ACCH: 0.36, 37.4KB.

The Other Side of the Coin: Acceleration by Identifying Repetitions in Uncompressed Traffic. There are repetitions in uncompressed HTTP traffic: entire files (e.g., images) and parts of files (e.g., HTML tags, JavaScript). We keep scanning the same thing again and again (and get the same scanning results). Identify frequently repeated data and store it in a dictionary. Perform DPI on the data once and remember the results. DPI is done by pattern matching with the Aho-Corasick algorithm; the result is the state. When encountering a repetition, recover the state without re-scanning. Delicate points need to be taken care of, so that we won't miss any pattern.
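One way to handle the "delicate points" is sketched below. It is an illustrative simplification, not the authors' scheme. For a dictionary chunk at least as long as the longest pattern, only the first (longest pattern length - 1) bytes are re-scanned from the live state, because only matches ending there can straddle the chunk boundary. Everything after that, including the final state, is taken from a cached scan. All function names are hypothetical.

```python
from collections import deque

def build_dfa(patterns):
    """Full Aho-Corasick DFA over the symbols that occur in the patterns."""
    alphabet = sorted(set("".join(patterns)))
    goto, out = [{}], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    for ch in alphabet:
        goto[0].setdefault(ch, 0)
    while queue:
        s = queue.popleft()
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = goto[fail[s]][ch]
                out[t] = out[t] + out[fail[t]]
                queue.append(t)
            else:
                goto[s][ch] = goto[fail[s]][ch]
    return goto, out

def scan(text, goto, out, state=0, base=0):
    """Scan from a given state; return (final state, [(position, pattern)])."""
    matches = []
    for i, ch in enumerate(text):
        state = goto[state].get(ch, 0)
        matches.extend((base + i, p) for p in out[state])
    return state, matches

def precompute(chunk, goto, out, max_len):
    """Scan a repeated chunk once and keep the context-independent results."""
    assert len(chunk) >= max_len, "chunk must be at least as long as the longest pattern"
    end_state, matches = scan(chunk, goto, out)
    safe = [(i, p) for i, p in matches if i >= max_len - 1]
    return {"chunk": chunk, "end_state": end_state, "safe": safe}

def replay(entry, goto, out, state, base, max_len):
    """Handle a repetition: re-scan only the first max_len - 1 bytes."""
    _, matches = scan(entry["chunk"][:max_len - 1], goto, out, state, base)
    matches += [(base + i, p) for i, p in entry["safe"]]
    return entry["end_state"], matches

PATTERNS = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]
MAX_LEN = max(len(p) for p in PATTERNS)
GOTO, OUT = build_dfa(PATTERNS)
CHUNK = "DBCABXBCDBCAAE"
ENTRY = precompute(CHUNK, GOTO, OUT, MAX_LEN)

# Traffic: 3 fresh bytes, then the known chunk, then 2 fresh bytes.
state, found = scan("XXC", GOTO, OUT)
state, part = replay(ENTRY, GOTO, OUT, state, 3, MAX_LEN)
found += part
state, part = scan("BD", GOTO, OUT, state, 3 + len(CHUNK))
found += part
FULL = scan("XXC" + CHUNK + "BD", GOTO, OUT)[1]
```

In the usage above the match `CDBCAB` starts in the fresh bytes and ends inside the chunk, so a cache that ignored the boundary would miss it. The replay scans 5 of the chunk's 14 bytes and still yields the same matches as a full scan.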

Securing the NIDS Itself

Complexity DoS Attack Over NIDS. Packets that are easy to craft but very hard to process. Two-step attack: 1. Kill the IPS/FW. 2. Sneak into the network. [Figure: attacker reaching the network through the Internet.]

Attack on Security Elements. Combined attack: a DDoS on the security element exposed the network, leading to theft of customers' information.

Attack on Snort, the most widely deployed IDS/IPS worldwide. [Figure label: heavy packets rate.]

OUR GOAL: A multi-core system architecture, which is robust against complexity DDoS attacks

System Throughput Over Time Reaction time can be smaller

System Architecture, Routine Mode: load balance between cores; heavy packets are detected. [Figure: NIC and processor chip with cores #1, #2, ..., #8, #9, #10, each with its own queue (Q).]

System Architecture, Alert Mode: dedicated cores for heavy packets. The other cores detect heavy packets and move them to the dedicated cores. [Figure: NIC and processor chip; cores #1, #2, ..., #8 each with a queue (Q); dedicated cores #9 and #10 with queues (Q) and elements marked B.]

Cloud solution The different cores are different (virtual) machines. Load balancing sends heavy packets to machines that run a special more efficient processing method. In SDN, this can be done even faster and easier.

DPI using TCAMs

TCAM: Ternary Content-Addressable Memory. De-facto solution for packet classification. Core component of an SDN switch. [Figure: a search key (0011101010101001110001110001110) is compared against ten ternary TCAM entries, numbered 0 to 9, such as 1110101010100101001********1111, 0011101010*********************, *************************001110 and an all-wildcard entry; the match lines feed an encoder, which outputs index 3, and the action for that entry (accept; other entries carry deny or log) is read from SRAM.]
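In software, the TCAM lookup in the figure behaves like the sketch below: every entry is compared against the key (hardware does this in parallel), and the lowest-index match selects the action. The search key is the one on the slide; the five-entry rule set and its actions are a shortened, hypothetical version of the figure's table, and `rule`, `ternary_match`, and `tcam_lookup` are illustrative names.

```python
KEY = "0011101010101001110001110001110"   # search key from the slide
WIDTH = len(KEY)

def rule(prefix="", suffix=""):
    """Build a ternary entry: fixed prefix/suffix bits, '*' (don't care) between."""
    return prefix + "*" * (WIDTH - len(prefix) - len(suffix)) + suffix

def ternary_match(entry, key):
    """An entry matches when every position is '*' or equals the key bit."""
    return len(entry) == len(key) and all(e in ("*", k) for e, k in zip(entry, key))

def tcam_lookup(entries, actions, key):
    """Emulate the priority encoder: the first (lowest-index) match wins."""
    for index, entry in enumerate(entries):
        if ternary_match(entry, key):
            return index, actions[index]
    return None

# Hypothetical, shortened rule set in priority order.
ENTRIES = [
    rule("1110101010100101001", "1111"),
    rule("0011101010"),
    rule(suffix="001110"),
    rule("0011101010101"),
    rule(),                                # all wildcards: default rule
]
ACTIONS = ["deny", "accept", "deny", "deny", "accept"]

RESULT = tcam_lookup(ENTRIES, ACTIONS, KEY)
ALL_MATCHING = [i for i, e in enumerate(ENTRIES) if ternary_match(e, KEY)]
```

Four of the five entries match this key, which is why entry order matters: the encoder reports only the first one.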

Some Challenges in Using TCAM. Reducing the number of entries → power consumption reduction. Dealing with ranges (how to encode the range [1-6]?). How to correct errors? More about it in the next slide. How to use it for non-traditional tasks: traditionally, TCAM is used for IP lookup and header classification (e.g., using 5-tuples).
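The range question has a standard baseline answer: split the range into maximal aligned power-of-two blocks, each of which is a single ternary prefix. The sketch below shows that textbook prefix expansion; it is not one of the more compact encodings studied in the research, and the 3-bit field width is an assumption for the example.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with ternary prefixes of `width` bits."""
    prefixes = []
    while lo <= hi:
        k = 0
        # Grow the block while it stays aligned at lo and inside the range.
        while (k < width and lo % (1 << (k + 1)) == 0
               and lo + (1 << (k + 1)) - 1 <= hi):
            k += 1
        fixed = format(lo >> k, "b").zfill(width - k) if k < width else ""
        prefixes.append(fixed + "*" * k)
        lo += 1 << k
    return prefixes

RANGE_1_6 = range_to_prefixes(1, 6, 3)
print(RANGE_1_6)
```

The range [1-6] needs four TCAM entries (001, 01*, 10*, 110) even though it is a single rule. This expansion is what makes ranges expensive in TCAM.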

Example: Error Correction in TCAM. In SRAM (or any regular memory): the input is an address (entry number) and the output is the content of that address, so one can apply an error detection/correction code to that content. In TCAM: even if the content seems OK, we can still have false-miss or indirect false-miss errors, so TCAM EDC/ECC is harder.

PEDS: Parallel Error Detection Scheme for TCAM Devices. Detects all errors using the built-in parallel lookup of the TCAM. The number of lookups is a function of the width of the TCAM word, not of the number of entries in the database, which is 3 orders of magnitude larger. Developed and patented in the DEEPNESS lab.

CompactDFA for DPI. Using TCAM to represent a huge DFA in a compact manner. Reducing the problem of pattern matching to IP lookup (a much easier problem). Each byte scanned → one TCAM lookup. This can be reduced using variable-stride traversal. Further performance boost with parallelism and pipelining.

DFA → CompactDFA. Longest Prefix Match. Snort: 73MB → 0.6MB. ClamAV: 1.5GB → 26MB. [Table: TCAM entries keyed by current state and symbol, with the next state stored in SRAM; states carry binary codes such as 0000 (s0), 0001 (s1), 0010 (s2), 0011 (s3), 0100 (s4), 0110 (s6), 1100 (s12), 1101 (s13).]

Signature Extraction

Current DDoS Attack. Armies of zombies → many sources. Hard to identify behaviorally. No known signatures. Zombies on innocent computers. Infrastructure-level DDoS attacks, server-level DDoS attacks, bandwidth-level DDoS attacks.

Automated Extraction of Signatures for Zero-day Internet Attacks. Input: a sample of attack traffic (high-volume attack) and a sample of normal traffic. Output: automatically find signatures that appear frequently only during the attack. Where the input is collected: in the mitigation apparatus (DDoS guard, firewall, anti-DDoS, etc.), or in the cloud, collecting data from several collectors. DDoS: power computation saving. The signatures are used by anti-DDoS devices and firewalls to stop the attack. Mitigation in minutes, which is good enough for these types of attacks.
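The input/output contract above can be illustrated with a deliberately naive sketch: count in how many packets each fixed-length substring occurs, and keep those that are frequent in the attack sample and rare in the normal sample. This is a stand-in for illustration, not the lab's extraction algorithm. The substring length, both thresholds, and the sample packets are all invented for the example.

```python
from collections import Counter

def kgram_presence(packets, k):
    """For each k-gram, count the number of packets that contain it."""
    counts = Counter()
    for packet in packets:
        counts.update({packet[i:i + k] for i in range(len(packet) - k + 1)})
    return counts

def extract_signatures(attack, normal, k=4, min_attack=0.8, max_normal=0.1):
    """k-grams present in >= min_attack of attack packets and
    in <= max_normal of normal packets (thresholds are assumed values)."""
    in_attack = kgram_presence(attack, k)
    in_normal = kgram_presence(normal, k)
    return sorted(
        gram for gram, count in in_attack.items()
        if count / len(attack) >= min_attack
        and in_normal[gram] / len(normal) <= max_normal
    )

# Invented sample traffic.
ATTACK = [
    "GET /?id=XYZQ1 HTTP/1.1",
    "GET /?id=XYZQ7 HTTP/1.1",
    "GET /?id=XYZQ9 HTTP/1.1",
]
NORMAL = [
    "GET /index.html HTTP/1.1",
    "GET /img/logo.png HTTP/1.1",
]

SIGNATURES = extract_signatures(ATTACK, NORMAL)
```

Substrings common to all traffic, such as `HTTP`, are filtered out by the normal sample, while attack-only substrings such as `XYZQ` survive as candidate signatures.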