Department of Computer Science and Engineering, Applied Research Laboratory
A Hardware Based TCP/IP Processing Engine
David V. Schuehler

Outline
Problem Statement and Motivation
Challenges
Description of Architecture
Traffic Analysis
Current Results and Future Work

Background
Transmission Control Protocol (TCP) provides a virtual bit pipe between two end nodes
  – Byte streams generated at the source are delivered to the destination
  – Connection-oriented protocol
  – Retransmission services
  – Flow control services
Internet Protocol (IP) provides message routing services
  – The datagram is the supported transmission unit
  – Unreliable
  – Connectionless
TCP is an important protocol
  – All interesting data on the Internet is transmitted via TCP

Problem Statement
Given an arbitrary network, design a solution which provides access to TCP stream content at various locations within the network
[Figure: network diagram labeled Core Router, Edge Router, A, B]

Objective
Reconstruct TCP data streams from individual network packets
  – Operate at Internet backbone data rates: OC-48 (2.5 Gbps) and above
  – Support millions of simultaneous flows
  – Maintain per-flow context information
  – Provide enhanced flow management services
  – Offer a simple client interface

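To make the OC-48 target concrete, the following back-of-the-envelope sketch estimates the per-packet time budget at line rate. It assumes a 40-byte minimum TCP/IP packet (20-byte IPv4 header plus 20-byte TCP header, no options or payload) and ignores SONET and link-layer framing overhead. The function names are illustrative and do not come from the design.

```python
LINK_RATE_BPS = 2.5e9     # nominal OC-48 rate quoted on the slide
MIN_PACKET_BYTES = 40     # assumption: bare IPv4 + TCP headers, no payload


def packets_per_second(link_rate_bps: float, packet_bytes: int) -> float:
    """Packets arriving per second when the link is saturated."""
    return link_rate_bps / (packet_bytes * 8)


def per_packet_budget_ns(link_rate_bps: float, packet_bytes: int) -> float:
    """Time available to process one packet at line rate, in nanoseconds."""
    return packet_bytes * 8 / link_rate_bps * 1e9


if __name__ == "__main__":
    pps = packets_per_second(LINK_RATE_BPS, MIN_PACKET_BYTES)
    budget = per_packet_budget_ns(LINK_RATE_BPS, MIN_PACKET_BYTES)
    print(f"{pps:,.0f} packets/s, {budget:.1f} ns per packet")
```

Under these assumptions the engine has roughly 128 ns per minimum-length packet, or about 7.8 million packets per second, for classification, state lookup, and state update.
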
Target Platform
Design must support implementation using logic and memory devices
Example platform: FPX card
  – Xilinx Virtex 2000E
  – 2 MB ZBT SRAM
  – 1 GB PC100 SDRAM (100 MHz)

Motivation
Most Internet traffic is TCP based
Network solutions will require access to TCP data
  – Virus detection and elimination
      – Viruses spread to machines worldwide
      – Consume computing resources
      – Reduce network throughput
  – Content filtering
      – 50% of traffic is spam
      – Corporate security
  – Content based routing
  – Extensible networking solutions
Stream reassembly is required
  – Processing packets separately provides insufficient coverage of network content

Related Work
Software based approaches
  – tcpdump & httpdump
  – Ethereal
  – Internet Protocol Scanning Engine
  – Packet Scope & BLT (AT&T)
  – Cluster based online monitoring
Hardware based approaches
  – TCP reassembly & state tracking (Georgia Tech)

Related Technologies
Load balancing systems
  – Content (cookie) based request routing
  – Delayed binding technique
  – Limited to scanning start of flow
Intrusion Detection Systems
  – Perform stream reassembly and content scanning
  – Traffic rates < 1 Gbps
TCP offload engines
  – Move TCP protocol processing to NIC
  – Targeting Gigabit NIC market
  – Intel, NEC, Adaptec, Lucent, and others

Challenges
Matching individual packets to flows
  – 96-bit exact match
  – High rate of insert & delete events
  – Operational environment requires high performance
Resequencing of out-of-order packets
  – Passive solution
      – Annotate sequence gap
      – Forward packet & store data for later delivery to monitor
  – Active solutions
      – Drop selected out-of-order packets
      – Buffer packet for later in-order transmission
Dealing with idle flows
Handling resource exhaustion
  – Drop packet
  – Ignore packet
  – Resource reclamation
Providing different levels of service
  – Selectable on a per-flow basis
Monitoring flows at core routers
  – Coordinating traffic amongst multiple nodes

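The 96-bit exact match is consistent with keying a flow on the IPv4 source address, destination address, source port, and destination port (32 + 32 + 16 + 16 bits). The slides do not spell out the key layout, so the field order below and the name `flow_key` are assumptions made purely for illustration.

```python
import ipaddress
import struct


def flow_key(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack an IPv4 4-tuple into a 12-byte (96-bit) exact-match key.

    Hypothetical layout: src address, dst address, src port, dst port,
    all in network byte order with no padding.
    """
    return struct.pack(
        "!4s4sHH",
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        src_port,
        dst_port,
    )


if __name__ == "__main__":
    forward = flow_key("10.0.0.1", "192.168.1.20", 49152, 80)
    reverse = flow_key("192.168.1.20", "10.0.0.1", 80, 49152)
    print(len(forward) * 8, "bits")
    print("directions share a key:", forward == reverse)
```

With this layout the two directions of a connection produce different keys, so a design that monitors bidirectional traffic would need either two lookups or a direction-independent ordering of the fields.
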
Challenges (cont.)
Processing packet fragments
  – Passive or active solution
  – Reassemble original IP frame or process fragments
  – Jumbo frames (9k packets)
Supporting large numbers of flows
  – Backbone links can carry millions of active flows
Maintaining per-flow context information
  – Larger per-flow records support more complex solutions
Providing enhanced flow manipulation features
  – Blocking and unblocking flows
  – Terminating flows and ensuring they are terminated
  – Supporting flow modification
Monitoring bidirectional traffic
  – Alter response traffic based on request traffic
Providing for advanced content manipulation
  – Altering previously processed data
  – Requires buffering
  – TCP slow start

TCP Processing System

TCP Protocol Processing Engine

State Store Manager Features
Simple interface
Supports multiple retrieval algorithms
512 MByte SDRAM module
64 bytes of state per flow
  – 32 bytes used by TCP Processing Engine
  – 32 bytes available for Application
8 million active flows supported

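The flow capacity follows directly from the memory size and the record size. The sketch below checks that arithmetic, assuming "512 MByte" means 512 × 2^20 bytes and that records are packed back to back. The constant names and the `record_offset` helper are illustrative, not part of the design.

```python
SDRAM_BYTES = 512 * 2**20   # assumption: 512 MByte = 512 * 2**20 bytes
RECORD_BYTES = 64           # per-flow state, from the slide
ENGINE_BYTES = 32           # used by the TCP Processing Engine
APP_BYTES = 32              # available to the application

# Number of fixed-size records that fit in the module.
MAX_FLOWS = SDRAM_BYTES // RECORD_BYTES


def record_offset(index: int) -> int:
    """Byte offset of a flow record, assuming records are stored contiguously."""
    if not 0 <= index < MAX_FLOWS:
        raise IndexError("flow index out of range")
    return index * RECORD_BYTES


if __name__ == "__main__":
    print(f"{MAX_FLOWS:,} flow records")
    print("last record starts at byte", record_offset(MAX_FLOWS - 1))
```

The result is 2^23 = 8,388,608 records, which matches the "8 million active flows" figure and means a 23-bit index is enough to address any record.
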
State Store Manager

Per-Flow State Store Record

Hash Implementation Tradeoffs
Unlimited hash entry chaining
  – Pro: Best option for fully monitoring all flows
  – Con: Poor worst case performance
      – Excessive time required to perform lookup
No hash entry chaining
  – Pro: Easy to implement
      – Fast
  – Con: Potential for incomplete monitoring of flows
Limited hash entry chaining
  – Pro: Bounded time to perform lookup
  – Con: Potential for incomplete monitoring of flows
      – Excessive time required to perform lookup

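The limited-chaining option can be sketched in software as a hash table whose buckets hold at most a fixed number of entries. This is only an illustration of the tradeoff, not the hardware implementation: Python's built-in `hash` stands in for whatever hash function the circuit uses, and the class and parameter names are hypothetical.

```python
class BoundedChainTable:
    """Hash table with a fixed cap on entries per bucket."""

    def __init__(self, num_buckets: int, max_chain: int):
        self.buckets = [[] for _ in range(num_buckets)]
        self.max_chain = max_chain

    def _bucket(self, key: bytes) -> list:
        return self.buckets[hash(key) % len(self.buckets)]

    def lookup(self, key: bytes):
        """Return the stored state, or None; scans at most max_chain entries."""
        for stored_key, state in self._bucket(key):
            if stored_key == key:
                return state
        return None

    def insert(self, key: bytes, state) -> bool:
        """Store or update a flow; return False if the bucket is full."""
        bucket = self._bucket(key)
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, state)
                return True
        if len(bucket) >= self.max_chain:
            return False  # this flow goes unmonitored
        bucket.append((key, state))
        return True


if __name__ == "__main__":
    table = BoundedChainTable(num_buckets=1, max_chain=2)
    for name in (b"flow-a", b"flow-b", b"flow-c"):
        print(name, "stored" if table.insert(name, {"next_seq": 0}) else "rejected")
```

Setting `max_chain` to 1 gives the no-chaining case, and removing the cap gives unlimited chaining, so the single parameter spans all three options on the slide.
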
Flow Classification Analysis
Analyze Internet backbone captures
  – Evaluate hashing functions
  – Detect traffic patterns
National Laboratory for Applied Network Research (NLANR)
  – Difficulty in retrieving data sets

Hash Table Analysis
Source: BWY (OC-3) | MEM (OC-3) | ADV (OC-3) | TXS (OC-3) | AIX (OC-12) | ANL (OC-3) | MEM (OC-3) | COS (OC-3)
Total Packets: 2,975k | 227,641 | 180,557 | 17,171 | 239,817 | 163,267 | 235,898 | 2,668k
TCP Packets: 2,247k (75%) | 13,232 (6%) | 179,166 (99%) | 1,858 (11%) | 80,307 (33%) | 117,806 (72%) | 23,649 (17%) | 2,445k (92%)
TCP Flows: 27, , ,7106, ,997
Cache Hits: 2,146k10,608142,0311, ,88021,2652,140k
Collisions (New-Old): 286 – – 665 – 100 – 015 – 3578 – 402 – – 1752
Table Usage: 17, ,313
Deepest Bucket:

Consecutive Small Packets
TCP data packets (0 < data length < 64)
Source: BWY (OC-3) | MEM (OC-3) | ADV (OC-3) | TXS (OC-3) | AIX (OC-12) | ANL (OC-3) | MEM (OC-3) | COS (OC-3)
Small Packets: 266,2391,3426, ,5637,5216,142116,711
Consecutive Small Pkts:
Min time between SP: 0 | 20us | 1us | 236us | 0 | 1us | 2us | 0
Max time between SP: 392us | 1.8ms | 9.4ms | 56ms | 3.5ms | 3.6ms | 4.5ms | 492us
Avg time between SP: 26us | 538us | 355us | 32ms | 349us | 452us | 260us | 29us

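A measurement of this kind can be sketched as a single pass over a capture. The slide does not define "consecutive" precisely, so the code below assumes that a run consists of back-to-back small packets and that any packet outside the range 0 < data length < 64 (including zero-length ACKs) ends the run. The function name and the `(timestamp_us, payload_len)` input format are hypothetical.

```python
def small_packet_gaps(packets, max_len=64):
    """Return time gaps (in the input's time unit) between back-to-back small packets.

    packets: iterable of (timestamp, tcp_payload_len) tuples in capture order.
    A packet is "small" when 0 < tcp_payload_len < max_len.
    """
    gaps = []
    prev_ts = None  # timestamp of the previous packet, if it was small
    for ts, payload_len in packets:
        if 0 < payload_len < max_len:
            if prev_ts is not None:
                gaps.append(ts - prev_ts)
            prev_ts = ts
        else:
            prev_ts = None  # a non-small packet breaks the run
    return gaps


if __name__ == "__main__":
    capture = [(0, 10), (5, 20), (9, 1000), (20, 30), (26, 63), (30, 64)]
    gaps = small_packet_gaps(capture)
    if gaps:
        print("min", min(gaps), "max", max(gaps), "avg", sum(gaps) / len(gaps))
```

The minimum, maximum, and average of the returned gaps correspond to the three timing rows reported per capture source, under the stated definition of a run.
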
Minimum Length Packet Processing
[Figure: FPX operational environment vs. current technology]

Average Length Packet Processing
[Figure: FPX operational environment vs. current technology]

Place & Route Results
Including Protocol Wrappers & Application
Number of BLOCKRAMs
  – 68 out of 160 (42%)
Number of SLICEs
  – 6579 out of 19200 (34%)
Minimum period: ns
Maximum frequency: MHz

TCP Processing Circuit Layout for Xilinx Virtex 2000E

Future Work
Multi-node coordinated monitoring
Bi-directional flow monitoring
Packet reordering schemes
Resource exhaustion & resource reclamation
Packet classification & lookup algorithms
Selectable per-flow monitoring
Performance enhancement
Memory contention prevention
Flow modification
Extensible networking solution integration
Worst case traffic loads
Traffic analysis

Architecture for a Hardware Based, TCP/IP Content Scanning System
David V. Schuehler