1
Embedded Transport Acceleration
Intel Xeon Processor as a Packet Processing Engine
Abhishek Mitra
Professor: Dr. Bhuyan
2
Why do we need a PPE (Packet Processing Engine) / TOE (TCP Offload Engine)?
The problem is TCP termination:
– Involves reconstructing a stream of coherent data from many independent packets
– Is a compute-intensive task
– Requires roughly 10 times the performance of TCP routing
– A 400-MHz MIPS CPU consumes all of its cycles trying to terminate a Fast Ethernet 100 Mbps channel
– A 200-MHz IXP1200 has similar TCP performance
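The figures on this slide imply a rough per-byte cycle budget. The arithmetic below is only a sketch: the clock rate and link speed come from the slide, while the 1500-byte frame size is an assumption added for illustration.

```python
# Cycle budget implied by the slide's figures (pure arithmetic, not a measurement).
CPU_HZ = 400_000_000      # 400-MHz MIPS CPU, from the slide
LINK_BPS = 100_000_000    # Fast Ethernet at 100 Mbps, from the slide
FRAME_BYTES = 1500        # assumption: a typical full-size Ethernet payload

# If the CPU is fully consumed at line rate, this is all it can spend per unit of data.
cycles_per_bit = CPU_HZ // LINK_BPS
cycles_per_byte = cycles_per_bit * 8
cycles_per_frame = cycles_per_byte * FRAME_BYTES

print(cycles_per_bit, cycles_per_byte, cycles_per_frame)
```

Read this way, a saturated 400-MHz CPU spends about 4 cycles per bit, or 32 cycles per byte, on termination.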
3
Packet Processing Engine
Computing and memory resources
– Necessary for communication processing
– Scalable (throughput)
– Extensible (newer protocols and applications)
– Programmable (changing standards)
Intel Xeon is extensible and programmable
– Future: multiple cores on a single chip
This is the particular reason ETA is being researched
4
ETA S/W Architecture
Host and Server Partitioning
– Host: general-purpose OS and application processes
– PPE: all communication-centric tasks are processed here
– Interface: asynchronous queues in cache-coherent, shared host memory
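One way to picture an asynchronous host/PPE queue is a fixed-size ring with one producer and one consumer. This is a minimal sketch, not the ETA implementation: the `RingQueue` class and its method names are hypothetical, and a real queue would live in cache-coherent shared memory with proper memory ordering.

```python
class RingQueue:
    """Fixed-size single-producer/single-consumer ring (hypothetical stand-in
    for an ETA host/PPE queue)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0  # next slot to read; advanced only by the consumer
        self.tail = 0  # next slot to fill; advanced only by the producer

    def post(self, item):
        """Producer side: return False when the ring is full instead of blocking."""
        if self.tail - self.head == len(self.slots):
            return False
        self.slots[self.tail % len(self.slots)] = item
        self.tail += 1
        return True

    def poll(self):
        """Consumer side: return None when the ring is empty instead of blocking."""
        if self.head == self.tail:
            return None
        item = self.slots[self.head % len(self.slots)]
        self.head += 1
        return item
```

Because each index is written by only one side, neither side has to wait on the other, which is the sense in which the queues are asynchronous.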
5
ETA S/W Stack
[Figure: software stack diagram; labels: NATIVE TCP/IP, ACCELERATED]
6
ETA host-engine interface
Set of queuing structures called DTIs (Direct Transport Interfaces)
– Based on InfiniBand and the VI Architecture
– DTIs also support TCP connection commands
– Buffer pools buffer TCP streams
– Parent DTIs listen for new TCP connections
– When the ETA host accepts a new connection, a child DTI is created to service the TCP session
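The parent/child relationship can be sketched as follows. The class names, fields, and methods here are assumptions made for illustration; the slides do not show the actual DTI layout or API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ChildDTI:
    """Hypothetical per-session DTI, created when the host accepts a connection."""
    peer: tuple                                   # (address, port) of the remote end
    rx: deque = field(default_factory=deque)      # data received for this session


@dataclass
class ParentDTI:
    """Hypothetical listening DTI for one local port."""
    port: int
    pending: deque = field(default_factory=deque)  # connections awaiting accept
    children: dict = field(default_factory=dict)   # peer -> ChildDTI

    def on_connection(self, peer):
        """PPE side: queue a newly arrived TCP connection for the host."""
        self.pending.append(peer)

    def accept(self):
        """Host side: accept the oldest pending connection and create its child DTI."""
        if not self.pending:
            return None
        peer = self.pending.popleft()
        child = ChildDTI(peer)
        self.children[peer] = child
        return child
```

For example, after `parent.on_connection(("10.0.0.1", 1234))`, a call to `parent.accept()` returns a new `ChildDTI` that services that session.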
7
DTI Structure
8
Direct Transport Interface
Send Queue, Receive Queue [host to PPE, and vice versa]
Event Queue [posts event notices to the host]
Doorbells [host writes signals directly to the ETA PPE]
Data buffers [the ETA PPE buffers data in these cases]
– Source / target buffers are not pre-conditioned
– The PPE receives TCP segments without receive descriptors on the receive queue
– TCP segments are out of order
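The out-of-order case can be illustrated with a small reassembly buffer. This is a simplified sketch under stated assumptions: `ReassemblyBuffer` is a hypothetical name, sequence numbers are plain byte offsets, and segments that start before the expected offset are dropped as duplicates rather than trimmed.

```python
class ReassemblyBuffer:
    """Hold early TCP segments until the gap before them is filled (simplified)."""

    def __init__(self, next_seq=0):
        self.next_seq = next_seq  # next byte offset expected in order
        self.held = {}            # seq -> payload for segments that arrived early

    def receive(self, seq, payload):
        """Return the bytes that become deliverable in order after this segment."""
        if seq < self.next_seq or seq in self.held:
            return b""            # assumption: treat as a duplicate and ignore it
        self.held[seq] = payload
        out = bytearray()
        # Drain every held segment that is now contiguous with the stream.
        while self.next_seq in self.held:
            chunk = self.held.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)
```

A segment that arrives early is held and yields nothing; once the missing segment arrives, both are delivered together in stream order.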
9
ETA PPE SW
ETA architecture: independent of PPE implementation
– The PPE can be a fixed device, a specialized engine, or a CPU
– An ETA-aware PPE must support DTI structures
– Executes packet processing functions on behalf of the host (termination of TCP/IP)
– Supports an interface to the network
10
The Prototype
Dual processor (Xeon CPUs)
– Host: CPU0
– PPE: CPU1
Establishes and terminates TCP/IP sessions on behalf of the host
– No special hardware developed
– Use of standard tools
– Gigabit Ethernet cards with modified drivers
– Shared-memory interface between host and PPE
11
SW Environment
Linux kernel 2.4
PPE SW is a loadable kernel module
– Supports DTI
– Affinity for one processor (CPU1)
– Never yields control of the processor, implying dedicated use of CPU1 as the PPE
– PPE polls NIC descriptors in shared memory
– DTI structures in shared host memory
– Host CPU and PPE communicate via doorbells
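The polling behaviour described above can be sketched as a loop that checks doorbells and NIC receive work on every pass. This is an illustration only: `ppe_poll` and its parameters are hypothetical, the data structures are ordinary Python objects rather than shared memory, and the loop is bounded by `max_iterations` so it terminates, whereas the real module never yields CPU1.

```python
from collections import deque


def ppe_poll(doorbells, nic_rx, send_queues, max_iterations):
    """Bounded stand-in for the PPE polling loop.

    doorbells:   dict mapping DTI id -> bool, set to True by the host
    nic_rx:      deque of received packets (stand-in for NIC descriptors)
    send_queues: dict mapping DTI id -> deque of host send requests
    Returns a log of (action, value) tuples in the order work was done.
    """
    log = []
    for _ in range(max_iterations):
        # 1. Check every doorbell; a rung bell means that DTI has new send work.
        for dti_id, rung in doorbells.items():
            if rung:
                doorbells[dti_id] = False
                while send_queues[dti_id]:
                    log.append(("tx", send_queues[dti_id].popleft()))
        # 2. Poll the NIC receive side instead of waiting for an interrupt.
        while nic_rx:
            log.append(("rx", nic_rx.popleft()))
    return log
```

Send work is picked up only for DTIs whose doorbell was rung, so a queue with pending entries but no doorbell is left untouched until the host signals it.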
12
Hardware Platform
The prototype can run on any Linux multiprocessor kernel
One server with five Ethernet links
Five clients are COTS servers running Linux and TTCP
13
ETA Test Environment
14
Measurement and Analysis
Comparison between ETA and a standard Linux dual-processor server
– ETA leaves more than 80% of the CPU idle
– Tx throughput increases considerably
– Receive performance is lower because ETA uses a memory-to-memory copy from the packet buffer to the destination buffer
15
Performance with HT (Hyper-Threading)
HT results in a ~50% increase in Tx performance
Receive performance is lower because ETA uses a memory-to-memory copy from the packet buffer to the destination buffer
ETA HT NoCopy: test path without the data copy, with enhanced Rx performance
16
Related Work
TOEs have been developed
– Devices attached to the server’s I/O subsystem
– Use separate specialized processing and memory resources
ETA uses the processing and memory resources of the server instead
EOP.