TCP Servers: Offloading TCP/IP Processing in Internet Servers

1 TCP Servers: Offloading TCP/IP Processing in Internet Servers
Liviu Iftode, Department of Computer Science, University of Maryland and Rutgers University

2 My Research: Network-Centric Systems
TCP Servers and Split-OS [NSF CAREER]
Migratory TCP and Service Continuations
Federated File Systems
Smart Messages [NSF ITR-2] and Spatial Programming for Networks of Embedded Systems

3 Networking and Performance
[Diagram: clients reach Internet servers over a TCP/IP WAN; the servers connect to storage over a storage area network (SAN), raising the questions: IP or not IP? TCP or not TCP?]
The transport-layer protocol must be efficient

4 The Scalability Problem
Apache web server on 1-way and 2-way 300 MHz Intel Pentium II SMPs, with clients repeatedly accessing a static 16 KB file

5 Breakdown of CPU Time for Apache

6 The TCP/IP Stack
The application enters the kernel through system calls.
SEND path: copy_from_application_buffers, TCP_send, IP_send, packet_scheduler, setup_DMA, packet_out
RECEIVE path: packet_in, hardware_interrupt_handler, software_interrupt_handler, IP_receive, TCP_receive, copy_to_application_buffers

7 Breakdown of CPU Time for Apache

8 Serialized Networking Actions
[Diagram: the same send path (copy_from_application_buffers, TCP_send, IP_send, packet_scheduler, setup_DMA, packet_out) and receive path (packet_in, hardware_interrupt_handler, software_interrupt_handler, IP_receive, TCP_receive, copy_to_application_buffers), with the protocol-processing steps marked as serialized operations.]

9 TCP/IP Processing is Very Expensive
Protocol processing can take up to 70% of the CPU cycles
  for the Apache web server on uniprocessors [Hu 97]
  can lead to Receive Livelock [Mogul 95]
Interrupt handling consumes a significant amount of time
  Soft Timers [Aron 99]
Serialization affects scalability

10 Outline
Motivation
TCP Offloading using TCP Server
TCP Server for SMP Servers
TCP Server for Cluster-based Servers
Prototype Evaluation

11 TCP Offloading Approach
Offload network processing from application hosts to dedicated processors/nodes/I-NICs
Reduce OS intrusion
  network interrupt handling
  context switches
  serialization in the networking stack
  cache and TLB pollution
Should adapt to changing load conditions
Software or hardware solution?

12 The TCP Server Idea
[Diagram: in a conventional server, the client communicates with a host whose OS runs both the application and the TCP/IP stack; with a TCP server, the application runs on the host processor and TCP/IP processing runs on a dedicated TCP server, the two connected by fast communication.]

13 TCP Server Performance Factors
Efficiency of the TCP server implementation
  event-based server, no interrupts
Efficiency of communication between host(s) and TCP server
  non-intrusive, low-overhead API
  asynchronous, zero-copy
Adaptiveness to load

14 TCP Servers for Multiprocessor Systems
[Diagram: a multiprocessor (SMP) server in which some CPUs run the application on the host OS and other CPUs run the TCP server; they communicate through shared memory, and clients connect over the network.]

15 TCP Servers for Clusters with Memory-to-Memory Interconnects
[Diagram: a cluster-based server in which application nodes (hosts) and TCP server nodes are connected by a memory-to-memory interconnect; clients connect to the server over the network.]

16 TCP Servers for Multiprocessor Servers

17 SMP-based Implementation
[Diagram: the IO APIC routes network and clock interrupts to the TCP server processor, while disk and other interrupts are delivered to the application processors running the host OS.]
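The prototype steers interrupts inside the kernel through the IO APIC. As a rough user-space analogue only (an assumption, not the prototype's code), a modern Linux system can pin a device's interrupts to a chosen CPU by writing its affinity mask; the IRQ number and mask below are examples.

```c
/* User-space approximation of interrupt partitioning on a modern Linux:
 * write a CPU mask to /proc/irq/<irq>/smp_affinity so the NIC's interrupts
 * are delivered to one chosen processor. Illustrative only; the actual
 * prototype programs the IO APIC inside the kernel. */
#include <stdio.h>

static int route_irq_to_cpu_mask(int irq, unsigned cpu_mask)
{
    char path[64];

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%x\n", cpu_mask);   /* e.g. 0x8 pins the IRQ to CPU 3 */
    return fclose(f);
}
```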

18 SMP-based Implementation (cont’d)
[Diagram: the application on the host OS enqueues send requests into a shared queue; the TCP server dequeues and executes them.]
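A minimal sketch of what such a shared queue could look like; the structure layout, field names, and fixed size are assumptions for illustration, not the kernel's actual data structures.

```c
/* Hypothetical single-producer/single-consumer request queue shared
 * between an application (host) CPU and the TCP server CPU. */
#include <stdatomic.h>
#include <stddef.h>

#define QUEUE_SIZE 1024

struct send_request {
    int    sock_fd;   /* connection the data belongs to */
    void  *buf;       /* pre-registered (pinned) buffer  */
    size_t len;       /* number of bytes to transmit     */
};

struct shared_queue {
    _Atomic size_t head;                  /* written by the host CPU       */
    _Atomic size_t tail;                  /* written by the TCP server CPU */
    struct send_request slots[QUEUE_SIZE];
};

/* Host side: enqueue a send request without taking a lock. */
static int enqueue_send(struct shared_queue *q, struct send_request r)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == QUEUE_SIZE)
        return -1;                        /* queue full */
    q->slots[head % QUEUE_SIZE] = r;
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 0;
}

/* TCP server side: dequeue a request; called from the dispatcher's
 * polling loop, so no interrupts are involved. */
static int dequeue_send(struct shared_queue *q, struct send_request *out)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail == head)
        return -1;                        /* queue empty */
    *out = q->slots[tail % QUEUE_SIZE];
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 0;
}
```

A lock-free ring of this kind lets the host and the TCP server exchange requests without interrupts or kernel locks on the critical path.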

19 TCP Server Event-Driven Architecture
Components: Dispatcher, Monitor, Send Handler, Receive Handler, Asynchronous Event Handler
[Diagram: requests arrive from the application processors through the shared queue and replies return to them; the Asynchronous Event Handler interacts with the NIC.]

20 Dispatcher
Kernel thread executing at the highest priority level in the kernel
Schedules the different handlers based on input from the Monitor
Executes an infinite loop and does not yield the processor
No other activity can execute on the TCP server processor (a sketch of the loop follows below)
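A rough sketch of the dispatcher's control flow under these constraints; the handler and monitor interfaces (monitor_next_hint and friends) are hypothetical names, not the prototype's API.

```c
/* Hypothetical dispatcher loop pinned to the TCP server CPU. */
enum hint { HINT_POLL_NIC, HINT_SEND, HINT_RECEIVE, HINT_IDLE };

extern enum hint monitor_next_hint(void);   /* assumed monitor interface  */
extern void aeh_poll_nic(void);             /* asynchronous event handler */
extern void send_handler_run(void);
extern void receive_handler_run(void);

static void dispatcher(void)
{
    for (;;) {                               /* never yields the processor */
        switch (monitor_next_hint()) {
        case HINT_POLL_NIC:
            aeh_poll_nic();                  /* highest-priority module */
            break;
        case HINT_SEND:
            send_handler_run();
            break;
        case HINT_RECEIVE:
            receive_handler_run();
            break;
        case HINT_IDLE:
            /* spin; nothing else runs on this CPU */
            break;
        }
    }
}
```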

21 Asynchronous Event Handler (AEH)
Handles asynchronous network events
Interacts with the NIC
Can be an interrupt service routine or a polling routine
Is a short-running thread
Has the highest priority among TCP server modules
The clock interrupt is used as a guaranteed trigger for the AEH when polling (a sketch follows below)
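A sketch of what a polling-mode AEH with a clock-tick fallback might look like; the NIC ring accessors and the hook name are assumptions, not driver code.

```c
/* Hypothetical polling-mode asynchronous event handler (AEH). */
extern int  nic_rx_ready(void);   /* packets waiting in the RX ring?        */
extern void nic_rx_next(void);    /* pull one packet and hand it to the     */
                                  /* receive path (IP_receive and above)    */

static volatile int aeh_kick;     /* set from the clock tick                */

/* Called from the clock interrupt so the AEH is guaranteed to run
 * periodically even when no NIC interrupts are taken. */
void clock_tick_hook(void)
{
    aeh_kick = 1;
}

/* Invoked by the dispatcher; bounded so each invocation stays short. */
void aeh_poll_nic(void)
{
    int budget = 64;

    aeh_kick = 0;
    while (budget-- > 0 && nic_rx_ready())
        nic_rx_next();
}
```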

22 Send and Receive Handlers
Scheduled in response to a request in the shared-memory queues
Run at the priority of the network protocol
Interact with the host processors

23 Monitor
Observes the state of the system queues and provides scheduling hints to the Dispatcher
Used for bookkeeping and dynamic load balancing
Scheduled periodically or when an exception occurs
  queue overflow or empty queue
  bad checksum for a network packet
  retransmissions on a connection
Can be used to reconfigure the set of TCP servers in response to load variation

24 TCP Servers for Cluster-based Servers

25 Cluster-based Implementation
[Diagram: the application on the host issues socket calls through a socket stub, which tunnels each socket request over VI channels to the TCP server; the TCP server dequeues and executes the socket request.]

26 TCP Server Architecture
Components: Eager Processor, Resource Manager, TCP/IP Provider, VI Connection Handler, Request Handler, Socket Call Processor
[Diagram: the TCP server sits between the SAN (toward the host) and the NIC facing the WAN.]

27 Sockets and VI Channels
Pool of VIs created at initialization
  avoids the cost of creating VIs in the critical path
Registered memory regions associated with each VI
  send and receive buffers associated with the socket
  also used to exchange control data
Socket mapped to a VI on the first socket operation
  all subsequent operations on the socket are tunneled through the same VI to the TCP server (a sketch follows below)
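A simplified sketch of the socket-to-VI mapping described above; vi_channel, the pool layout, and the fd-indexed table are illustrative stand-ins, not the VIA provider's real objects.

```c
/* Hypothetical mapping of application sockets onto a pre-created pool of
 * VI channels; a real implementation would use the VIA provider library
 * and handle arbitrary descriptor values. */
#include <stddef.h>

#define VI_POOL_SIZE 256
#define MAX_FDS      65536   /* simplification: bounded fd space */

struct vi_channel {
    int   in_use;
    void *send_buf;   /* registered (pinned) send region    */
    void *recv_buf;   /* registered (pinned) receive region */
};

static struct vi_channel vi_pool[VI_POOL_SIZE]; /* created at initialization */
static struct vi_channel *sock_to_vi[MAX_FDS];  /* socket fd -> VI mapping   */

/* On the first operation on a socket, bind it to a free VI from the pool;
 * all later operations reuse the same channel and its registered buffers. */
static struct vi_channel *vi_for_socket(int fd)
{
    if (sock_to_vi[fd] == NULL) {
        for (int i = 0; i < VI_POOL_SIZE; i++) {
            if (!vi_pool[i].in_use) {
                vi_pool[i].in_use = 1;
                sock_to_vi[fd] = &vi_pool[i];
                break;
            }
        }
    }
    return sock_to_vi[fd];   /* NULL if the pool is exhausted */
}
```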

28 Socket Call Processing
Host library intercepts the socket call
Socket call parameters are tunneled to the TCP server over a VI channel
TCP server performs the socket operation and returns the results to the host
Library returns control to the application immediately or when the socket call completes (asynchronous vs. synchronous processing); a sketch follows below
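A rough host-side sketch of the interception path for send(); the message layout and the tunnel_* helpers are hypothetical, and a real library would also handle errors, flow control, and buffer registration.

```c
/* Hypothetical host-side interception of send(): marshal the call and
 * tunnel it to the TCP server over the socket's VI channel. */
#include <stddef.h>
#include <sys/types.h>

enum sock_op { OP_SEND, OP_RECV, OP_ACCEPT, OP_CLOSE };

struct sock_call {
    enum sock_op op;
    int          fd;
    size_t       len;   /* payload travels in the registered buffer */
};

/* Assumed tunnel primitives over the socket's VI channel. */
extern int     tunnel_post(int fd, const struct sock_call *c,
                           const void *payload, size_t len);
extern ssize_t tunnel_wait_reply(int fd);

ssize_t tcpserver_send(int fd, const void *buf, size_t len, int synchronous)
{
    struct sock_call c = { .op = OP_SEND, .fd = fd, .len = len };

    if (tunnel_post(fd, &c, buf, len) < 0)
        return -1;
    if (!synchronous)
        return (ssize_t)len;          /* asynchronous: return immediately   */
    return tunnel_wait_reply(fd);     /* synchronous: wait for the result   */
}
```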

29 Design Issues for TCP Servers
Splitting of the TCP/IP processing
  where to split?
Asynchronous event handling
  interrupts or polling?
Asynchronous API
Event scheduling and resource allocation
Adaptation to different workloads

30 Prototypes and Evaluation

31 SMP-based Prototype
Modified the Linux SMP kernel on the Intel x86 platform to implement the TCP server
Most parts of the system are kernel modules, with small inline changes to the TCP stack, the software interrupt handlers, and the task structures
Instrumented the kernel using on-chip performance monitoring counters to profile the system (a sketch follows below)
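As an illustration of counter-based profiling, reading the Pentium II time-stamp counter around a code path looks roughly like this; it is a generic technique, not the prototype's actual instrumentation.

```c
/* Read the x86 time-stamp counter (rdtsc) so a code path can be bracketed
 * and its cycle count accumulated per subsystem. */
static inline unsigned long long read_tsc(void)
{
    unsigned int lo, hi;

    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}

/* Usage sketch:
 *   unsigned long long t0 = read_tsc();
 *   tcp_send_path();                       // region being profiled
 *   cycles_tcp_send += read_tsc() - t0;
 */
```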

32 Evaluation Testbed
Server: 4-way 550 MHz Intel Pentium II Xeon system with 1 GB DRAM and 1 MB on-chip L2 cache
Clients: 2-way 300 MHz Intel Pentium II systems with 512 MB RAM and 256 KB on-chip L2 cache
NIC: 3-Com 996-BT Gigabit Ethernet
Server application: Apache web server
Client program: sclients [Banga 97], trace-driven execution of clients

33 Trace Characteristics
Logs        Number of files   Average file size   Number of requests   Average reply size
Forth       11931             19.3 KB             400335               8.8 KB
Rutgers     18370             27.3 KB             498646               19.0 KB
Synthetic   128               16.0 KB             50000                -

34 Splitting TCP/IP Processing
[Diagram: the TCP/IP send and receive paths split between application processors and dedicated processors; C1 covers interrupt handling, C2 the lower part of the receive path, and C3 the lower part of the send path, while the system calls and application-buffer copies remain on the application processors.]

35 Implementations
Each implementation is named for the processing it offloads or avoids: C1 = interrupt processing, C2 = receive bottom, C3 = send bottom, S1 = avoiding interrupts
SMP_BASE, SMP_C1C2, SMP_C1C2S1, SMP_C1C2C3, SMP_C1C2C3S1

36 Throughput

37 CPU Utilization for Synthetic Trace

38 Throughput Using Synthetic Trace With Dynamic Content

39 Adapting TCP Servers to Changing Workloads
Monitor the queues; identify low and high water marks at which to change the size of the processor set
Execute a special handler for exceptional events (a sketch follows below)
  Queue length lower than the low water mark: set a flag which the dispatcher checks; the dispatcher sleeps if the flag is set; reroute the interrupts
  Queue length higher than the high water mark: wake up the dispatcher on the chosen processor
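A compact sketch of the water-mark policy; queue_len, park_dispatcher, and wake_dispatcher are assumed helper names standing in for the flag setting, interrupt rerouting, and wake-up described above, and the thresholds are examples.

```c
/* Hypothetical monitor check implementing the low/high water mark policy. */
#define LOW_WATER_MARK   16
#define HIGH_WATER_MARK 512

extern unsigned queue_len(void);        /* outstanding requests in the queue */
extern void park_dispatcher(int cpu);   /* set the flag so the dispatcher    */
                                        /* sleeps; reroute its interrupts    */
extern void wake_dispatcher(int cpu);   /* bring another TCP server CPU up   */

static int active_tcp_servers = 1;

void monitor_check_load(void)
{
    unsigned len = queue_len();

    if (len < LOW_WATER_MARK && active_tcp_servers > 1)
        park_dispatcher(--active_tcp_servers);   /* shrink the processor set */
    else if (len > HIGH_WATER_MARK)
        wake_dispatcher(active_tcp_servers++);   /* grow the processor set   */
}
```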

40 Load behaviour and dynamic reconfiguration

41 Throughput with Dynamic Reconfiguration

42 Cluster-based Prototype
User-space implementation (bypasses the host kernel)
Entire socket operation offloaded to the TCP Server; C1, C2, and C3 offloaded by default
Optimizations
  Asynchronous processing: AsyncSend
  Processing ahead: Eager Receive, Eager Accept
  Avoiding the data copy at the host using pre-registered buffers; requires a different API: MemNet (a sketch follows below)
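A speculative sketch of how an application might use a zero-copy, pre-registered-buffer API of the kind the slide calls MemNet; the mn_* names and signatures are assumptions, since the actual API is not shown here.

```c
/* Hypothetical zero-copy send path for a static file: the response is read
 * directly into a pre-registered buffer, so the host never copies it again
 * on the send path. */
#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>

extern void *mn_alloc(int fd, size_t len);            /* registered buffer   */
extern int   mn_send(int fd, void *buf, size_t len);  /* asynchronous send   */
extern int   mn_send_done(int fd);                    /* completion check    */

int serve_static_file(int conn_fd, const char *path, size_t len)
{
    void *buf = mn_alloc(conn_fd, len);
    if (!buf)
        return -1;

    int file = open(path, O_RDONLY);
    if (file < 0)
        return -1;
    ssize_t n = read(file, buf, len);   /* file data lands directly in the */
    close(file);                        /* registered buffer               */
    if (n < 0)
        return -1;

    if (mn_send(conn_fd, buf, (size_t)n) < 0)
        return -1;
    while (!mn_send_done(conn_fd))      /* AsyncSend would overlap other   */
        ;                               /* work; here we simply poll       */
    return 0;
}
```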

43 Implementations
The host-side techniques are coded in the names: H1 = kernel bypassing, H2 = asynchronous processing, H3 = avoiding host copies, S2 = processing ahead (C1-C3 denote the offloaded TCP/IP components, as in the SMP implementations)
Cluster_base, Cluster_C1C2C3H1, Cluster_C1C2C3H1H3, Cluster_C1C2C3H1H2H3, Cluster_C1C2C3H1H2H3S2

44 Evaluation Testbed
Host and TCP server: 2-way 300 MHz Intel Pentium II systems with 512 MB RAM and 256 KB on-chip L2 cache
Clients: 4-way 550 MHz Intel Pentium II Xeon systems with 1 GB DRAM and 1 MB on-chip L2 cache
NIC: 3-Com 996-BT Gigabit Ethernet
Server application: custom web server (flexibility in modifying the application to use our API)
Client program: httperf

45 Throughput with Synthetic Trace Using HTTP/1.0

46 CPU Utilization

47 Throughput with Synthetic Trace Using HTTP/1.1

48 Throughput with Real Trace (Forth) Using HTTP/1.0

49 Related Work
TCP Offloading Engines
Communication Services Platform (CSP): a system architecture for scalable cluster-based servers, using a VIA-based SAN to tunnel TCP/IP packets inside the cluster
Piglet: a vertical OS for multiprocessors
Queue Pair IP: a new end-point mechanism for inter-network processing, inspired by memory-to-memory communication

50 Conclusions
Offloading networking functionality to a set of dedicated TCP servers yields up to 30% performance improvement
Performance essentials
  TCP server architecture: event-driven, polling instead of interrupts, adaptive to load
  API: asynchronous, zero-copy

51 Future Work
TCP Server software distributions
Compare the TCP Server architecture with hardware-based offloading schemes
Use TCP Servers in storage networking

52 Acknowledgements My graduate students:
Murali Rangarajan, Aniruddha Bohra and Kalpana Banerjee

