Design principles for packet parsers

Slides:



Advertisements
Similar presentations
Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick.
Advertisements

A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems Authors: Seongwook Youn and Dennis McLeod Presenter:
Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation Author: C˘at˘alin Radu, C˘at˘alin Leordeanu, Valentin.
1 A Tree Based Router Search Engine Architecture With Single Port Memories Author: Baboescu, F.Baboescu, F. Tullsen, D.M. Rosu, G. Singh, S. Tullsen, D.M.Rosu,
1 Regular expression matching with input compression : a hardware design for use within network intrusion detection systems Department of Computer Science.
1 Route Table Partitioning and Load Balancing for Parallel Searching with TCAMs Department of Computer Science and Information Engineering National Cheng.
OpenFlow-Based Server Load Balancing GoneWild Author : Richard Wang, Dana Butnariu, Jennifer Rexford Publisher : Hot-ICE'11 Proceedings of the 11th USENIX.
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
Sampling Techniques to Accelerate Pattern Matching in Network Intrusion Detection Systems Author: Domenico Ficara, Gianni Antichi, Andrea Di Pietro, Stefano.
Fast forwarding table lookup exploiting GPU memory architecture Author : Youngjun Lee,Minseon Jeong,Sanghwan Lee,Eun-Jin Im Publisher : Information and.
Packet Classification Using Multi-Iteration RFC Author: Chun-Hui Tsai, Hung-Mao Chu, Pi-Chung Wang Publisher: COMPSACW, 2013 IEEE 37th Annual (Computer.
EQC16: An Optimized Packet Classification Algorithm For Large Rule-Sets Author: Uday Trivedi, Mohan Lal Jangir Publisher: 2014 International Conference.
StriD 2 FA: Scalable Regular Expression Matching for Deep Packet Inspection Author: Xiaofei Wang, Junchen Jiang, Yi Tang, Bin Liu, and Xiaojun Wang Publisher:
Regular Expression Matching for Reconfigurable Packet Inspection Authors: Jo˜ao Bispo, Ioannis Sourdis, Jo˜ao M.P. Cardoso and Stamatis Vassiliadis Publisher:
StrideBV: Single chip 400G+ packet classification Author: Thilan Ganegedara, Viktor K. Prasanna Publisher: HPSR 2012 Presenter: Chun-Sheng Hsueh Date:
Research on TCAM-based OpenFlow Switch Author: Fei Long, Zhigang Sun, Ziwen Zhang, Hui Chen, Longgen Liao Conference: 2012 International Conference on.
P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, D. Walker SIGCOMM CCR, 2014 Presented.
Memory-Efficient and Scalable Virtual Routers Using FPGA Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan,
Shadow MACs: Scalable Label- switching for Commodity Ethernet Author: Kanak Agarwal, John Carter, Eric Rozner and Colin Dixon Publisher: HotSDN 2014 Presenter:
Updating Designed for Fast IP Lookup Author : Natasa Maksic, Zoran Chicha and Aleksandra Smiljani´c Conference: IEEE High Performance Switching and Routing.
TFA: A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Yang Song and H. Jonathan Chao Publisher: ACM/IEEE.
A Fast Regular Expression Matching Engine for NIDS Applying Prediction Scheme Author: Lei Jiang, Qiong Dai, Qiu Tang, Jianlong Tan and Binxing Fang Publisher:
Range Enhanced Packet Classification Design on FPGA Author: Yeim-Kuan Chang, Chun-sheng Hsueh Publisher: IEEE Transactions on Emerging Topics in Computing.
PC-TRIO: A Power Efficient TACM Architecture for Packet Classifiers Author: Tania Banerjee, Sartaj Sahni, Gunasekaran Seetharaman Publisher: IEEE Computer.
LaFA Lookahead Finite Automata Scalable Regular Expression Detection Authors : Masanori Bando, N. Sertac Artan, H. Jonathan Chao Masanori Bando N. Sertac.
1 DESIGN AND EVALUATION OF A PIPELINED FORWARDING ENGINE Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan.
Scalable Multi-match Packet Classification Using TCAM and SRAM Author: Yu-Chieh Cheng, Pi-Chung Wang Publisher: IEEE Transactions on Computers (2015) Presenter:
400 Gb/s Programmable Packet Parsing on a Single FPGA Author: Michael Attig 、 Gordon Brebner Publisher: ANCS 2011 Presenter: Chun-Sheng Hsueh Date: 2013/03/27.
Chapter 3 Data Representation
Computer Organization
Computer Organization and Architecture + Networks
A DFA with Extended Character-Set for Fast Deep Packet Inspection
2018/6/26 An Energy-efficient TCAM-based Packet Classification with Decision-tree Mapping Author: Zhao Ruan, Xianfeng Li , Wenjun Li Publisher: 2013.
Lecture 12 Logic Design Review & HCL & Bomb Lab
Cache Memory Presentation I
Regular Expression Matching in Reconfigurable Hardware
2018/11/19 Source Routing with Protocol-oblivious Forwarding to Enable Efficient e-Health Data Transfer Author: Shengru Li, Daoyun Hu, Wenjian Fang and.
P4-to-VHDL: Automatic Generation of 100 Gbps Packet Parsers
Dynamic Packet-filtering in High-speed Networks Using NetFPGAs
SigMatch Fast and Scalable Multi-Pattern Matching
2018/12/10 Energy Efficient SDN Commodity Switch based Practical Flow Forwarding Method Author: Amer AlGhadhban and Basem Shihada Publisher: 2016 IEEE/IFIP.
Scalable Memory-Less Architecture for String Matching With FPGAs
2019/1/1 High Performance Intrusion Detection Using HTTP-Based Payload Aggregation 2017 IEEE 42nd Conference on Local Computer Networks (LCN) Author: Felix.
Lecture 3: Main Memory.
Implementing an OpenFlow Switch on the NetFPGA platform
Memory-Efficient Regular Expression Search Using State Merging
Virtual TCAM for Data Center Switches
EE 122: Lecture 7 Ion Stoica September 18, 2001.
Scalable Multi-Match Packet Classification Using TCAM and SRAM
A New String Matching Algorithm Based on Logical Indexing
Project proposal: Questions to answer
Computer Organization & Architecture 3416
2019/5/2 Using Path Label Routing in Wide Area Software-Defined Networks with OpenFlow ICNP = International Conference on Network Protocols Presenter:Hung-Yen.
Compact DFA Structure for Multiple Regular Expressions Matching
2019/5/5 A Flexible Wildcard-Pattern Matching Accelerator via Simultaneous Discrete Finite Automata Author: Hsiang-Jen Tsai, Chien-Chih Chen, Yin-Chi Peng,
2019/5/8 BitCoding Network Traffic Classification Through Encoded Bit Level Signatures Author: Neminath Hubballi, Mayank Swarnkar Publisher/Conference:
Pipelined Architecture for Multi-String Matching
Power-efficient range-match-based packet classification on FPGA
Large-scale Packet Classification on FPGA
OpenSec:Policy-Based Security Using Software-Defined Networking
Authors: A. Rasmussen, A. Kragelund, M. Berger, H. Wessing, S. Ruepp
A Hybrid IP Lookup Architecture with Fast Updates
2019/9/14 The Deep Learning Vision for Heterogeneous Network Traffic Control Proposal, Challenges, and Future Perspective Author: Nei Kato, Zubair Md.
A SRAM-based Architecture for Trie-based IP Lookup Using FPGA
Authors: Ding-Yuan Lee, Ching-Che Wang, An-Yeu Wu Publisher: 2019 VLSI
2019/10/19 Efficient Software Packet Processing on Heterogeneous and Asymmetric Hardware Architectures Author: Eva Papadogiannaki, Lazaros Koromilas, Giorgos.
MEET-IP Memory and Energy Efficient TCAM-based IP Lookup
Towards TCAM-based Scalable Virtual Routers
Packet Classification Using Binary Content Addressable Memory
2019/11/12 Efficient Measurement on Programmable Switches Using Probabilistic Recirculation Presenter:Hung-Yen Wang Authors:Ran Ben Basat, Xiaoqi Chen,
Presentation transcript:

Design principles for packet parsers 2019/7/18 Design principles for packet parsers Author: Glen Gibb, George Varghese, Mark Horowitz, Nick McKeown Publisher/Conf.: Architectures for Networking and Communications Systems, Oct 2013, pp. 13-24 Presenter: 林鈺航 Date: 2018/10/24 1 Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C. National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

Introduction All network devices must parse packet headers to decide how packets should be processed. In this paper, we describe trade-offs in parser design, identify design principles for switch and router designers. To parse a packet, a network device has to identify the headers in sequence before extracting and processing specific fields. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Introduction Regular expression matching work is not applicable: parsing processes a subset of each packet under the control of a parse graph, while regex matching scans the entire packet for matching expressions. The goal of this paper is to educate designers as to how parser design decisions impact area, speed, and power via a design space exploration, considering both hard-coded and reconfigurable designs. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Performance Requirements The parser must operate at line rate for worst-case traffic patterns in applications such as an Ethernet switch. Failure to parse at line rate causes packet loss when ingress buffers overflow. Many switches today operate with very aggressive packet ingress-to-egress latency targets. All elements of the switch, including the parser, must be designed to avoid excessive latency. This implies that packets should be parsed as they are received rather than buffering the entire packet (or header region) before commencing parsing. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Parsing Parsing is the process of identifying headers and extracting fields for processing by subsequent stages of the device. Parsing is inherently sequential: each header is identified by the preceding header, requiring headers to be identified in sequence. The structure of each header type is known by the parser, allowing it to identify the location of field(s) indicating the current header length and the next header type. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Parse Graph A parse graph expresses the header sequences recognized by a switch or seen within a network. Parse graphs are directed acyclic graphs, with vertices representing header types and edges specifying the sequential ordering of headers. The parse graph is the state machine that sequentially identifies the header sequence within a packet. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Parse Graph National Cheng Kung University CSIE Computer & Internet Architecture Lab

Abstract Parser Model The Header identification module identifies headers and informs the field extraction module of header type and location information. The Field extraction module extracts fields and sends them to the field buffer module. The Field buffer accumulates extracted fields, sending them to subsequent stages within the device once all fields are extracted. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Header Identification National Cheng Kung University CSIE Computer & Internet Architecture Lab

Field Extraction Chosen fields are extracted from identified headers. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Fixed Parser and Programmable Parser A fixed parser processes a single parse graph chosen at design-time. The chosen parse graph is a union of the parse graphs for each use case the switch is designed to support. A programmable parser uses a parse graph that is specified at run-time. We've chosen a state table driven approach for simplicity of understanding and implementation. State tables can be easily implemented in RAM and/or TCAM. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Header Identification – Fix Parser The State machine implements the chosen parse graph, and the Buffer stores received packet data prior to identification. The Header processors exist for each header type that reads the length and next-header fields, identifying the length of the header and the subsequent header type. The Sequence resolution element is a simple MUX. The MUX selects the processor output corresponding to the current header type. State machine transitions are determined by the MUX output. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Field Extraction – Fix Parser A Buffer stores data while awaiting header identification. The fields to extract for each header type are stored in a table. A simple State machine manages the extraction process: waiting for header identification, looking up identified header types in the extract table, and extracting the specified fields from the packet. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Programmable Parser The TCAM stores bit sequences that identify headers. The RAM stores next state information, fields to extract, and any other data needed during parsing. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Header identification - Programmable Parser The current state and a subset of bytes from the buffer are sent to the TCAM, which returns the first matching entry. The RAM entry corresponding to the matching TCAM entry is read and specifies the next state for the header identification module and the headers that were matched. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Parse Table Structure The parse table is a state transition table, Each edge in the parse graph is a state transition, thus each edge must be encoded as a parse table entry. Rather than using all packet data as input to the parse table, only fields indicating the header length and next header type are input. The lookup offsets indicate which fields to use as inputs to the parse table. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Efficient Table Generation The number of table entries can be reduced by encoding multiple transitions within a single entry, thereby identifying multiple headers. Combining multiple transitions within a table entry can be viewed as node clustering or merging. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Improving Table Generation Each subgraph is processed independently by the algorithm, resulting in different sets of entries being generated for some overlapping sections of sub-graphs, resulting in more entries than required. Suboptimal solutions occur only when multiple ingress edges lead to a node.  National Cheng Kung University CSIE Computer & Internet Architecture Lab

Design Principle We begin by describing a parser generator that allows exploration of the design space. We then present a number of design principles that we identified through our design space exploration. Generator parameters include the parse graph, the processing width, the type (fixed or programmable), the field buffer depth, and the size of the programmable TCAM/RAM. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Generator Operation - Fixed Parser Two parameters are of particular importance when generating a fixed parser: the parse graph and the processing width. The parse graph is specified as a text file containing a description of each header type. The field extract table is generated using the fields tagged for extraction in the parse graph description. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Generator Operation - Fixed Parser Header-specific processors are created by the generator for each header type. The generator partitions the parse graph using the target processing width,  each region indicates headers that may be seen within a single cycle. The shaded region could have included the first four bytes of the upper IPv4 header. However, the parser would then have required two IPv4 processors. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Generator Operation - Programmable Parser Parameters important in generating a programmable parser include the processing width, the parse table dimensions, the window size, and the number and size of parse table lookups. Parameters are used by the generator to determine component sizes and counts. For example, the window size determines the input buffer depth within the header identification component and the number of mux inputs used when extracting fields for lookup in the parse table. Similarly, the number of parse table inputs determines the number of muxes required to extract inputs. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Fixed Parser Design Principles Principle: the Processing Width of a Single Parser Instance Trades Area for Power A single parser instance's throughput is determined by r = w×f, where r is the rate or throughput, w is the processing width, and f is the clock frequency. Increasing the processing width increases area as additional resources are required to process the additional data. Power decreases because the clock frequency decreases (and decreases faster than the amount of logic increases). National Cheng Kung University CSIE Computer & Internet Architecture Lab

Fixed Parser Design Principles Principle: Use Fewer Faster Parsers When Aggregating Parser Instances The graph shows a small power advantage of using fewer faster parser over many slower parsers to achieve the desired aggregate throughput. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Fixed Parser Design Principles Principle: Field Buffers Dominate Area Field buffers dominate parser area, accounting for approximately two-thirds of total area. There is little flexibility in the design of the field buffer: it must be built from an array of registers to allow extracted fields to be sent in parallel to downstream components National Cheng Kung University CSIE Computer & Internet Architecture Lab

Fixed Parser Design Principles Principle: a Parse Graph's Extracted Bit Count Determines Parser Area (for a Fixed Processing width) This graph consists of three nodes but extracts the same number of bits as the big-union graph.  This principle follows from the previous principle: the number of extracted bits determines the field buffer depth, and the field buffer dominates total parser area, thus the number of extracted bits should approximately determine the total parser area. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Programmable Parser Design Principles Principle: Parser State Table and Field Buffer Area Are the Same Order of Magnitude Both parsers include 4 Kb field buffers for comparison, and the programmable parser includes a 256 ×40 b TCAM. It is important to note that a fixed parser is sized to accommodate the chosen parse graph, while a programmable parser must be sized to accommodate all expected parse graphs. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Programmable Parser Design Principles Principle: Minimize the Number of Parse Table Lookup Inputs Increasing the number of parse table lookup inputs allows more headers to be identified per cycle, potentially decreasing the total number of table entries. However, the cost of an additional lookup is paid by every entry in the table, regardless of the number of lookups required by the entry. National Cheng Kung University CSIE Computer & Internet Architecture Lab