Scalable Networking for Next-Generation Computing Platforms
Yoshio Turner*, Tim Brecht*‡, Greg Regnier§, Vikram Saletore§, John Janakiraman*, Brian Lynn*
* Hewlett Packard Laboratories   § Intel Corporation   ‡ University of Waterloo
14 Feb 2004 – SAN-3 Workshop, HPCA-10

Outline
– Motivation: enable applications to scale to next-generation network and I/O performance on standard computing platforms
– Proposed technology strategy:
  - Embedded Transport Acceleration (ETA)
  - Asynchronous I/O (AIO) programming model
– Web server application as the evaluation vehicle
– Evaluation plan
– Conclusions
Motivation: Next-Generation Platform Requirements
– Low-overhead packet and protocol processing for next-generation commodity interconnects (e.g., 10 GigE)
  - Current systems: performance impeded by interrupts, context switches, and data copies
  - Existing proposals:
    - TCP Offload Engines (TOE): special hardware; cost and time-to-market issues
    - RDMA: new protocol; requires support at both endpoints
– Increased I/O concurrency for high link utilization
  - I/O bandwidth is increasing
  - I/O latency is fixed or slowly decreasing toward a limit
  - A larger number of in-flight operations is needed to fill the pipe
Proposed Technology Strategy
– Embedded Transport Acceleration (ETA) architecture
  - Intel Labs project: prototype architecture dedicates one or more processors ("Packet Processing Engines", PPEs) to perform all network packet processing
  - Low-overhead processing: the PPE interacts with network interfaces and applications directly via cache-coherent shared memory (bypassing the OS kernel)
  - Application interface: VIA-style user-level communication
– Asynchronous I/O (AIO) programming model
  - Splits file/socket operations into two phases:
    - Post an I/O operation request (non-blocking call)
    - Asynchronously receive completion event information
  - High I/O concurrency even for a single-threaded application
  - Initial focus: ETA socket AIO (future extensions to file AIO)
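The two-phase split above can be modeled in a few lines. This is a minimal sketch, not the ETA/DUSI API: the `Engine`, `post`, and `Completion` names are hypothetical, and a Python thread stands in for the Packet Processing Engine.

```python
import queue
import threading

class Completion:
    """A completion event: which operation finished, and its result."""
    def __init__(self, op_id, result):
        self.op_id = op_id
        self.result = result

class Engine:
    """Stand-in for a PPE: consumes posted requests from a work queue
    and delivers completion events to an event queue asynchronously."""
    def __init__(self):
        self.work_q = queue.Queue()
        self.event_q = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def post(self, op_id, fn):
        # Phase 1: non-blocking post; returns immediately.
        self.work_q.put((op_id, fn))

    def _run(self):
        while True:
            op_id, fn = self.work_q.get()
            self.event_q.put(Completion(op_id, fn()))

engine = Engine()
engine.post(1, lambda: "data")   # phase 1: post the I/O request
done = engine.event_q.get()      # phase 2: retrieve the completion event
print(done.op_id, done.result)
```

Because the post returns immediately, a single thread can keep many operations in flight and drain completions later, which is the concurrency argument made above.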
Key Advantages
– Potentially enables Ethernet and TCP to approach the latency and throughput of System Area Networks
– Uses standard system processor/memory resources:
  - Automatically tracks semiconductor cost-performance trends
  - Leverages microarchitecture trends: multiple cores, hardware multithreading
  - Leverages standard software development environments, enabling rapid development
– Extensibility: fully programmable PPE to support evolving data center functionality
  - Unified IP-based fabric for all I/O
  - RDMA
– AIO increases network-centric application scalability
Overview of the ETA Architecture
– Partitioned server architecture:
  - Host: application execution
  - Packet Processing Engine (PPE): network packet processing
– Host–PPE Direct Transport Interface (DTI)
  - VIA/InfiniBand-like queuing structures in cache-coherent shared host memory (OS bypass)
  - Optimized for sockets/TCP
– Direct User Socket Interface (DUSI)
  - Thin software layer to support user-level applications
ETA Overview: Partitioned Architecture
[Figure: partitioned architecture diagram – host CPU(s) run kernel and user applications (legacy sockets or direct access) over the ETA host interface, iSCSI, and file system; shared memory connects them to the PPE, which runs TCP/IP and the driver and attaches to LAN, storage, and IPC network fabrics.]
ETA Overview: Direct Transport Interface (DTI) Queuing Structure
– Asynchronous socket operations: connect, accept, listen, etc.
– TCP buffering semantics: an anonymous buffer pool supports non-pre-posted or out-of-order receive packets
[Figure: DTI queuing structure in shared host memory between the host and the Packet Processing Engine – per-DTI Tx, Rx, and event queues, doorbells, data buffers, and the anonymous buffer pool.]
API for Asynchronous I/O (AIO)
– Layer a socket AIO API above the ETA architecture
  - Investigate the impact of AIO API features on application structure and performance
– Initial focus: ETA Direct User Socket Interface (DUSI) API
  - Provides asynchronous socket operations: connect, listen, accept, send, receive
– AIO examples:
  - File/socket: Windows AIO with completion ports, POSIX AIO
  - File I/O: Linux AIO (recently introduced)
  - Socket I/O with OS bypass: ETA DUSI, Open Group Sockets API Extensions
ETA Direct User Socket Interface (DUSI) AIO API
– Queuing structure setup for sockets:
  - One Direct Transport Interface (DTI) per socket
  - Event queues: created separately from DTIs
– Memory registration:
  - Pin user-space memory regions and provide address-translation information to ETA for zero-copy transfers
  - Provide access keys (protection tags)
– The application posts socket I/O operation requests to DTI Tx and Rx work queues
– The PPE delivers operation completion events to DTI event queues
– Both operation posting and event delivery are lightweight (no OS involvement)
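The setup flow above (create event queue, create a per-socket DTI bound to it, register memory, post work, reap completions) can be sketched as a toy model. All names here (`EventQueue`, `DTI`, `register_memory`, `dusi_socket`) are illustrative stand-ins, not the actual DUSI API.

```python
import queue

class EventQueue(queue.Queue):
    """Created separately from DTIs; many DTIs may share one."""
    pass

class DTI:
    """One DTI per socket: Tx/Rx work queues plus a bound event queue."""
    def __init__(self, event_q):
        self.tx_q, self.rx_q, self.event_q = [], [], event_q

registered = {}

def register_memory(buf, tag):
    # Stand-in for pinning a region and handing ETA its address
    # translation plus an access key (protection tag).
    registered[tag] = buf
    return tag

def dusi_socket(event_q):
    return DTI(event_q)

evq = EventQueue()
sock = dusi_socket(evq)                 # DTI created, bound to evq
tag = register_memory(bytearray(4096), tag=7)
sock.rx_q.append(("recv", tag))         # post a receive into the Rx work queue
# ...the PPE would later place a completion on the bound event queue:
evq.put(("recv_done", tag, 128))
ev = evq.get()
print(ev)
```

The point of the registration step is that the engine can DMA directly into the pinned, pre-translated buffer, so no copy is needed on the completion path.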
AIO Event Queue Binding
– AIO API design issue: assignment of events to event queues
  - Flexible binding lets applications separate or group events to facilitate operation scheduling
– DUSI: each DTI work queue can be bound at socket creation to any event queue
  - Allows separating or grouping events from different sockets
  - Allows separating events by type (transmit, receive)
– Alternative binding granularities:
  - Windows: per socket
  - Linux AIO and POSIX AIO: per operation
  - Open Group Sockets API Extensions: per operation type
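Per-work-queue binding is the most flexible of the granularities listed: it subsumes both "group by socket" and "separate by type". A small sketch, with hypothetical class names (this is a model of the binding idea, not the DUSI API):

```python
import queue

class Socket:
    def __init__(self, name, tx_event_q, rx_event_q):
        # Each work queue is bound at creation time to an arbitrary
        # event queue; queues may be shared across sockets.
        self.name = name
        self.tx_event_q = tx_event_q
        self.rx_event_q = rx_event_q

    def complete_rx(self, nbytes):
        self.rx_event_q.put((self.name, "rx", nbytes))

    def complete_tx(self, nbytes):
        self.tx_event_q.put((self.name, "tx", nbytes))

rx_events = queue.Queue()   # groups receive events from all sockets
tx_events = queue.Queue()   # transmit events kept separate by type

a = Socket("a", tx_events, rx_events)
b = Socket("b", tx_events, rx_events)
a.complete_rx(100)
b.complete_tx(200)
first_rx = rx_events.get()
first_tx = tx_events.get()
print(first_rx, first_tx)
```

With a per-socket binding (the Windows model) this separation by type would not be expressible; with a per-operation binding it would be expressible but would have to be restated on every post.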
Retrieving AIO Completion Events
– AIO API design issue: the application interface for retrieving events
– DUSI: lightweight mechanism bypassing the OS
  - Event queues in shared memory
  - Callbacks: similar to Windows
  - Event tags
– Application monitoring of multiple event queues:
  - Poll for events (acceptable for a small number of queues)
  - When no events are pending, block in the OS on multiple queues
    - This is the uncommon case in a busy server, so using an OS signaling mechanism is acceptable here
    - Also useful for simultaneous use of different AIO APIs
  - Race conditions: user-level responsibility
AIO for Files and Sockets
– File AIO support:
  - OS-based (e.g., Linux AIO, POSIX AIO)
  - Future: ETA support for file I/O (e.g., via iSCSI or DAFS)
– Unified application processing of file/socket events:
  - The ETA PPE and the OS kernel may both supply event queues
  - Blocking on event queues of different types is facilitated by the OS signaling mechanism (as in DUSI)
  - Unified event queues may be desirable, but require efficient coordination of ETA and OS access to the queues
– Support for zero-copy sendfile(): integration of ETA with OS management of the shared file buffer cache in system memory
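The "block on event queues of different types via an OS signaling mechanism" idea can be illustrated with standard primitives: below, a pipe plays the role of the signal that bridges a file-completion source into the same wait as a socket. This is a portability sketch using Python's `selectors` module, not the ETA mechanism itself.

```python
import os
import selectors
import socket
import threading

sel = selectors.DefaultSelector()

# Socket-side event source.
srv, cli = socket.socketpair()
sel.register(srv, selectors.EVENT_READ, data="socket")

# File-side completion, signaled through a pipe (an eventfd-like role).
r, w = os.pipe()
sel.register(r, selectors.EVENT_READ, data="file")

def file_worker():
    # Pretend a file read just completed, then signal completion.
    os.write(w, b"\x01")

threading.Thread(target=file_worker).start()
cli.send(b"hello")

seen = set()
while len(seen) < 2:                 # one unified wait over both sources
    for key, _ in sel.select(timeout=5):
        seen.add(key.data)
print(sorted(seen))
```

The same pattern lets a server combine a shared-memory event queue (polled, OS-bypass) with a kernel-supplied queue: poll first, and fall back to the unified OS wait only when both are empty.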
Initial Demonstration Vehicle: Web Server Application
– Plan: demonstrate the value of ETA/AIO for network-centric applications
– Initial target: a web server
  - A single request may require multiple I/Os
  - Stresses system resources (especially OS resources)
  - Must multiplex thousands to tens of thousands of concurrent connections
– Web server architecture alternatives:
  - SPED (single-process event-driven)
  - MP (multi-process) or MT (multi-threaded)
  - Hybrid approach: AMPED (asymmetric multi-process event-driven)
– The AIO model favors SPED for raw performance
The userver
– Open-source micro web server
– Extensive tracing and statistics facilities
– SPED model: run one process per host CPU
– Previous support for Unix non-blocking socket I/O and event notification via Linux epoll
– Modified to support socket AIO (eventually file AIO)
  - Generic AIO interface: can be mapped to a variety of underlying AIO APIs (DUSI, Linux AIO, etc.)
– Comparison: web server performance with and without the ETA engine
  - Standard Linux: processes share the file buffer cache, using sendfile() for zero-copy file transfer
  - ETA: mmap() files into a shared address space
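A SPED server like the userver is a single process driving all connections from one readiness loop. A minimal sketch of that loop follows, using Python's `selectors` module in place of epoll for portability; the userver itself is written in C and is considerably more elaborate.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
lsock = socket.socket()
lsock.bind(("127.0.0.1", 0))
lsock.listen()
lsock.setblocking(False)
sel.register(lsock, selectors.EVENT_READ, data=None)

def serve_once():
    # Single process, single loop: accept and serve without blocking.
    done = False
    while not done:
        for key, _ in sel.select(timeout=5):
            if key.data is None:             # listening socket ready
                conn, _ = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="conn")
            else:                            # connection readable
                key.fileobj.recv(4096)
                key.fileobj.send(b"HTTP/1.0 200 OK\r\n\r\nhi")
                sel.unregister(key.fileobj)
                key.fileobj.close()
                done = True

client = socket.create_connection(lsock.getsockname())
client.send(b"GET / HTTP/1.0\r\n\r\n")
serve_once()
resp = client.recv(4096)
print(resp)
```

With socket AIO the shape of the loop stays the same, but readiness events are replaced by completion events, so the server never issues an I/O call that can return "not ready".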
Web Server Event Scheduling
– Balance accepting new connections with processing of existing connections
– Scheduling:
  - Separate queues for accept(), read(), and write()/close() completion events
  - Process based on current queue lengths
– Early results with non-blocking I/O: the frequency of accepting new connections has a measurable impact on throughput
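The slide only says events are processed "based on current queue lengths"; one plausible instance of such a policy is to drain the longest queue first. A toy sketch of that interpretation (the policy choice here is illustrative, not taken from the paper):

```python
from collections import deque

# Three completion-event queues, as in the slide.
queues = {
    "accept": deque(["c1", "c2", "c3"]),
    "read": deque(["r1"]),
    "write_close": deque(["w1", "w2"]),
}

def next_event(queues):
    # Pick the queue with the most pending events (ties broken by
    # dictionary order), and pop its oldest event.
    name = max(queues, key=lambda n: len(queues[n]))
    return name, queues[name].popleft()

order = [next_event(queues)[0] for _ in range(3)]
print(order)
```

Biasing the policy for or against the accept queue is exactly the knob the early results refer to: accepting too eagerly floods the server with new work, while accepting too rarely leaves the link idle.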
Evaluation Plans
– Goal: evaluate the approach and compare it to design alternatives
– Construct a functional prototype of the proposed stack (Linux):
  - Extend the existing ETA prototype's kernel-level interface to user level with OS bypass (DUSI)
  - Extend the userver to use socket AIO, with a mapping layer to DUSI
  - Evaluate on a 10 GigE-based client/server setup using a SPECweb-type workload
– Current ETA prototype: promising kernel-level micro-benchmark performance
– Expectation: ETA + AIO will show significantly higher scalability than the existing Linux network implementation
Proposed Stack/Comparison
[Figure: proposed stack beside the comparison baseline. User level: the AIO userver with a mapping layer over the ETA Direct User Socket Interface (DUSI), and a sockets userver over the Linux sockets library. Kernel: TCP/UDP/raw IP and the ETA kernel agent; the ETA Packet Processing Engine runs the packet driver and network interfaces, with the DTI data path bypassing the kernel (control path through the kernel).]
Kernel-Level ETA Prototype
Evaluation Plans: Analyses and Comparisons
– Compare the proposed stack to a well-tuned conventional system: checksum offload, TCP segmentation offload, interrupt moderation (NAPI)
– Examine microarchitectural impacts: use VTune/OProfile to measure CPU, memory, and cache usage, interrupts, data copies, and context switches
– Compare to TOE
– Extend the analysis to application domains beyond the web server: e.g., storage, transaction processing
– Port a highly scalable user-level threading package (UC Berkeley's Capriccio project) to ETA
  - Benefit: a familiar threaded programming model with efficient AIO and OS bypass "under the hood"
Summary
– Proposed technology strategy combining ETA and AIO to enable industry-standard platforms to scale to next-generation network performance
– Cost-performance, time-to-market, and flexibility advantages over alternative approaches
– Ethernet/TCP can approach the performance of today's SANs – a step toward a unified data center I/O fabric based on commodity hardware
– Status:
  - Promising initial experimental results for kernel-level ETA
  - Prototype implementation of the proposed stack nearly complete
  - Testing environment setup based on 10 GigE
Backup Slides
[Figure: kernel-level ETA prototype testbed on off-the-shelf Linux servers – CPU 0 acts as the host (kernel test program, ETA host interface, kernel abstraction layer) and CPU 1 runs the ETA Packet Processing Engine software, sharing host memory, with five Gigabit NICs connected to test clients.]
[Figure: detailed sockets-provider stack. User level: sockets applications and services over a user sockets-provider switch selecting between the OSV user sockets provider and the ETA direct user sockets provider (user-level DTI, ETA user adaptation layer). Kernel level: kernel applications over a kernel sockets-provider switch selecting between the OS kernel sockets provider (TCP/UDP/raw IP, packet driver) and the ETA kernel sockets provider (kernel DTI, ETA kernel adaptation layer, ETA kernel agent). Both ETA paths reach the ETA Packet Processing Engine.]