Achieving 10 Gb/s Using Xen Para-virtualized Network Drivers
Kaushik Kumar Ram*, J. Renato Santos+, Yoshio Turner+, Alan L. Cox*, Scott Rixner*
+HP Labs, *Rice University
Xen Summit, February 25, 2009
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Xen PV Driver on 10 Gig Networks
Focus of this talk: RX throughput on a single TCP connection (netperf)
Network Packet Reception in Xen
[Diagram: receive path through the driver domain and the guest. The frontend posts a grant on the I/O channel; the NIC DMAs the incoming packet into driver domain memory and raises an IRQ; the physical driver hands the packet to the software bridge, which demuxes it to the backend; the backend grant-copies the data into the granted guest buffer and signals the frontend with an event; the frontend pushes the packet into the guest network stack.]
Mechanisms to reduce driver domain cost:
Use of multi-queue NIC
−Avoid data copy
−Packet demultiplexing in hardware
Grant reuse mechanism
−Reduce cost of grant operations
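The following is a minimal C sketch of the grant-copy receive step described above, using simplified stand-in types modeled loosely on Xen's GNTTABOP_copy interface. The struct layout and the wrapper names (grant_copy_hypercall, notify_frontend) are assumptions for illustration, not the actual netback code.

```c
/*
 * Minimal sketch of the conventional netback receive path, assuming
 * simplified stand-ins for Xen's grant-copy interface. Illustrative only.
 */
#include <stdint.h>

typedef uint32_t grant_ref_t;
typedef uint16_t domid_t;

/* Simplified stand-in for Xen's gnttab_copy descriptor. */
struct grant_copy_op {
    const void  *source;      /* packet data in driver domain memory */
    grant_ref_t  dest_gref;   /* grant the frontend posted earlier   */
    domid_t      dest_domid;  /* the receiving guest                 */
    uint16_t     len;
    int16_t      status;      /* filled in by the hypervisor         */
};

/* Hypothetical wrappers standing in for the hypercall and event channel. */
static int  grant_copy_hypercall(struct grant_copy_op *op) { op->status = 0; return 0; }
static void notify_frontend(domid_t guest)                 { (void)guest; }

/*
 * Deliver one packet that the software bridge has already demuxed to this
 * guest: one grant-copy hypercall per packet, then an event so the frontend
 * can push the data into the guest network stack.
 */
static int netback_rx_one(const void *pkt, uint16_t len,
                          grant_ref_t gref, domid_t guest)
{
    struct grant_copy_op op = {
        .source = pkt, .dest_gref = gref,
        .dest_domid = guest, .len = len,
    };

    if (grant_copy_hypercall(&op) != 0 || op.status != 0)
        return -1;              /* copy rejected by the hypervisor */

    notify_frontend(guest);     /* event over the I/O channel */
    return 0;
}
```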
Using Multi-Queue NICs
[Diagram: receive path with a multi-queue NIC. The frontend posts a grant on the I/O channel; the backend maps the guest buffer and posts it on that guest's RX queue in the device; the NIC demuxes the incoming packet by guest MAC address and DMAs it directly into the guest buffer; after the IRQ, the backend unmaps the buffer and sends an event; the frontend pushes the packet into the guest network stack.]
Advantages of multi-queue:
−Avoid data copy
−Avoid the software bridge (one RX queue per guest, demultiplexed by guest MAC address)
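As a rough illustration of the hardware demultiplexing described above, the sketch below shows in plain C the decision a multi-queue NIC effectively makes per packet: pick the RX queue whose configured MAC address matches the packet's destination MAC. The table and function names are assumptions; in reality this lookup happens in the device, not in software.

```c
/*
 * Conceptual sketch of per-guest RX queue selection by destination MAC.
 * Purely illustrative: the real demux is performed by the NIC hardware.
 */
#include <stdint.h>
#include <string.h>

#define MAX_GUEST_QUEUES 8

struct rx_queue {
    uint8_t guest_mac[6];   /* MAC address assigned to the guest          */
    int     in_use;
    /* ... descriptor ring of guest buffers posted by the backend ...     */
};

static struct rx_queue queues[MAX_GUEST_QUEUES];

/* Return the per-guest queue a packet belongs to, or -1 for the default queue. */
static int demux_by_mac(const uint8_t dest_mac[6])
{
    for (int q = 0; q < MAX_GUEST_QUEUES; q++)
        if (queues[q].in_use &&
            memcmp(queues[q].guest_mac, dest_mac, 6) == 0)
            return q;   /* DMA directly into this guest's posted buffers */
    return -1;          /* unknown MAC: fall back to the default queue   */
}
```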
Performance Impact of Multi-Queue
[Chart: driver domain CPU cost]
−Savings due to multi-queue: grant copy and bridge
−Most of the remaining cost: grant hypercalls (grant + Xen functions)
Using Grants with a Multi-Queue NIC
[Diagram: (1) the backend issues a grant map hypercall for the guest buffer; (2) the page is used for I/O; (3) the backend issues a grant unmap hypercall.]
−Multi-queue replaces one grant hypercall (copy) with two hypercalls (map/unmap)
−Grant hypercalls are expensive
−Map/unmap calls are made for every I/O operation
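A minimal sketch of the two-hypercall pattern the slide describes, assuming simplified stand-ins modeled loosely on Xen's GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref; the wrapper and field names are illustrative only.

```c
/* Sketch of the per-buffer map/unmap cost with a multi-queue NIC. */
#include <stdint.h>

typedef uint32_t grant_ref_t;
typedef uint16_t domid_t;
typedef uint32_t grant_handle_t;

struct grant_map_op {
    grant_ref_t    ref;        /* grant posted by the frontend       */
    domid_t        dom;        /* guest that owns the page           */
    uint64_t       bus_addr;   /* OUT: address usable for device DMA */
    grant_handle_t handle;     /* OUT: needed later to unmap         */
    int16_t        status;
};

/* Hypothetical hypercall wrappers (stubs for illustration). */
static int grant_map_hypercall(struct grant_map_op *op)    { op->status = 0; return 0; }
static int grant_unmap_hypercall(grant_handle_t h)         { (void)h; return 0; }
static int post_buffer_and_wait_for_dma(uint64_t bus_addr) { (void)bus_addr; return 0; }

/* Two hypercalls bracket every single receive buffer. */
static int receive_into_guest_buffer(grant_ref_t gref, domid_t guest)
{
    struct grant_map_op map = { .ref = gref, .dom = guest };

    if (grant_map_hypercall(&map) != 0 || map.status != 0)   /* hypercall #1 */
        return -1;

    post_buffer_and_wait_for_dma(map.bus_addr);   /* use the page for I/O */

    return grant_unmap_hypercall(map.handle);                /* hypercall #2 */
}
```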
Reducing Grant Cost
Grant reuse:
−Do not revoke the grant after the I/O completes
−Keep the buffer page in a pool of unused I/O pages
−Reuse already-granted pages from the buffer pool for future I/O operations
−Avoids map/unmap on every I/O
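A minimal sketch of such a buffer pool on the frontend side, assuming hypothetical helper names (alloc_page_for_io, grant_page_once); the point is that grant-related work happens only when the pool is empty, not on every packet.

```c
/* Sketch of a frontend I/O buffer pool whose grants stay valid across I/Os. */
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t grant_ref_t;

struct io_buffer {
    void             *page;     /* guest page used as an RX buffer */
    grant_ref_t       gref;     /* grant created once, then kept   */
    struct io_buffer *next;
};

static struct io_buffer *pool;  /* unused but still-granted I/O buffers */

/* Hypothetical helpers: allocate a page and grant it to the driver domain. */
static void       *alloc_page_for_io(void)     { return malloc(4096); }
static grant_ref_t grant_page_once(void *page) { (void)page; return 0; }

/* Take a buffer for a new I/O: reuse an existing grant whenever possible. */
static struct io_buffer *pool_get(void)
{
    if (pool) {                                 /* common case: no grant work */
        struct io_buffer *b = pool;
        pool = b->next;
        return b;
    }
    struct io_buffer *b = malloc(sizeof(*b));   /* pool empty: grant a new page */
    b->page = alloc_page_for_io();
    b->gref = grant_page_once(b->page);         /* only grant-hypercall path */
    b->next = NULL;
    return b;
}

/* I/O completed: keep the grant and just put the buffer back in the pool. */
static void pool_put(struct io_buffer *b)
{
    b->next = pool;
    pool = b;
}
```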
Revoking a Grant when the Page Is Mapped in the Driver Domain
−The guest may need to reclaim an I/O page for other use (e.g., memory pressure in the guest)
−The page must be unmapped in the driver domain before the guest kernel can use it, to preserve memory isolation (e.g., to protect against driver bugs)
−This requires a handshake between the frontend and the backend to revoke the grant
−The handshake may be slow, especially if the driver domain is not running
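A minimal sketch of that handshake from the frontend's point of view, with hypothetical message and helper names; the guest has to block until the backend has unmapped the page, which is what makes revocation slow when the driver domain is not scheduled.

```c
/* Sketch of the revoke handshake over the I/O channel. Names are hypothetical. */
#include <stdint.h>

typedef uint32_t grant_ref_t;

/* Hypothetical I/O-channel primitives (stubs for illustration). */
static void send_revoke_request(grant_ref_t gref) { (void)gref; }
static void wait_for_revoke_ack(grant_ref_t gref) { (void)gref; } /* may sleep */
static void end_foreign_access(grant_ref_t gref)  { (void)gref; }

/* Frontend side: reclaim an I/O page, e.g. under memory pressure. */
static void frontend_reclaim_page(grant_ref_t gref)
{
    send_revoke_request(gref);   /* ask the backend to unmap the page        */
    wait_for_revoke_ack(gref);   /* blocks until the driver domain runs      */
    end_foreign_access(gref);    /* now safe to reuse the page in the guest  */
}
```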
Approach to Avoid the Handshake when Revoking Grants
Observation: with a multi-queue NIC there is no need to map the guest page into the driver domain
−Software does not need to look at the packet header, since demux is performed in the device
−Only the page address is needed for the DMA operation
Approach: replace the grant map hypercall with a shared memory interface to the hypervisor
−A shared memory table provides the translation from a guest grant to a page address
−No need to unmap the page when the guest revokes a grant (no handshake)
Software I/O Translation Table (SIOTT)
[Diagram: the guest creates a grant for the buffer page and invokes the "set" hypercall, which validates the grant, pins the page, and updates the SIOTT; the grant is sent to the backend over the I/O channel; the backend sets the "use" flag, reads the page address from the SIOTT, and posts it to the NIC for DMA; after the I/O completes and the event is delivered, the backend resets "use"; to revoke, the guest checks "use" and invokes the "clear" hypercall.]
SIOTT: software I/O translation table
−Indexed by grant reference
−"pg" field: guest page address and permission
−"use" field: indicates whether the grant is in use by the driver domain
set/clear hypercalls
−Invoked by the guest
−set validates the grant, pins the page, and writes the page address to the SIOTT
−clear requires that "use" = 0
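The sketch below illustrates the table layout and the slide's "pg"/"use" semantics with simplified types. The entry layout, helper names, and single-address-space simplification are assumptions; the real table is shared memory maintained by the hypervisor through the set/clear hypercalls described above.

```c
/* Sketch of a software I/O translation table indexed by grant reference. */
#include <stdint.h>

#define SIOTT_ENTRIES    1024
#define SIOTT_PERM_MASK  0x3ULL         /* assumed: low bits hold the permission */

struct siott_entry {
    uint64_t pg;    /* guest page address plus permission bits ("pg" field) */
    uint8_t  use;   /* set by the backend while the grant is in use         */
};

static struct siott_entry siott[SIOTT_ENTRIES];   /* bounds checks omitted */

/* Backend side: obtain a DMA address without any map hypercall. */
static uint64_t siott_get_page_for_dma(uint32_t gref)
{
    siott[gref].use = 1;                        /* mark the grant as in use */
    return siott[gref].pg & ~SIOTT_PERM_MASK;   /* page address for DMA     */
}

/* Backend side: called after the I/O completes and the event is sent. */
static void siott_done_with_page(uint32_t gref)
{
    siott[gref].use = 0;
}

/* Guest side: the "clear" hypercall is only legal when "use" == 0. */
static int siott_can_clear(uint32_t gref)
{
    return siott[gref].use == 0;
}
```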
Grant Reuse: Avoid pin/unpin Hypercalls on Every I/O
[Diagram: the guest creates a grant and invokes the "set" hypercall once (validate, pin, and update the SIOTT); the page is used for I/O; when the I/O completes, the buffer is returned to an I/O buffer pool and the grant is kept; later I/O operations reuse buffers and grants from the pool; only under kernel memory pressure does the guest revoke the grant and invoke the "clear" hypercall, which clears the SIOTT entry so the page can be returned to the kernel.]
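A minimal sketch of the reclaim path under memory pressure, reusing the hypothetical pool and SIOTT helpers from the earlier sketches; only buffers whose "use" flag is 0 are revoked and returned to the kernel.

```c
/* Sketch of shrinking the granted-buffer pool under kernel memory pressure. */
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t grant_ref_t;

struct io_buffer {
    void             *page;
    grant_ref_t       gref;
    struct io_buffer *next;
};

static struct io_buffer *pool;   /* unused, still-granted buffers */

/* Hypothetical helpers (see the earlier sketches). */
static int  siott_can_clear(grant_ref_t gref) { (void)gref; return 1; }
static void clear_hypercall(grant_ref_t gref) { (void)gref; }   /* the slide's "clear" */
static void return_page_to_kernel(void *page) { free(page); }

/* Called on memory pressure: give back up to 'count' pooled pages. */
static void pool_shrink(unsigned int count)
{
    struct io_buffer **prev = &pool;

    while (*prev && count > 0) {
        struct io_buffer *b = *prev;
        if (siott_can_clear(b->gref)) {     /* "use" must be 0 to revoke */
            *prev = b->next;
            clear_hypercall(b->gref);       /* unpin + clear the SIOTT entry */
            return_page_to_kernel(b->page);
            free(b);
            count--;
        } else {
            prev = &b->next;                /* still in use: skip for now */
        }
    }
}
```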
Performance Impact of Grant Reuse with the Software I/O Translation Table
[Chart: driver domain CPU cost]
−Cost saving: grant hypercalls
Impact of Optimizations on Throughput
[Charts: data rate and CPU utilization]
−Multi-queue with grant reuse significantly reduces driver domain cost
−The bottleneck shifts from the driver domain to the guest
−Higher cost in the guest than in native Linux still limits throughput in Xen
Additional Optimizations in the Guest Frontend Driver
LRO (Large Receive Offload) support at the frontend
−Consecutive packets on the same connection are combined into one large packet
−Reduces the cost of processing packets in the network stack
Software prefetch
−Prefetch the next packet and socket buffer struct into the CPU cache while processing the current packet
−Reduces cache misses at the frontend
Avoid full-page buffers
−Use half-page (2 KB) buffers (the maximum packet size is 1500 bytes)
−Reduces the TLB working set and thus TLB misses
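The sketch below illustrates two of these optimizations (software prefetch and half-page buffers) in plain C with assumed data-structure names; __builtin_prefetch is the standard GCC/Clang builtin, and LRO is omitted here.

```c
/* Sketch of prefetch-ahead receive processing and half-page RX buffers. */
#include <stddef.h>
#include <stdlib.h>

#define RX_BUF_SIZE 2048        /* half page: a 1500-byte frame still fits */

struct rx_slot {
    void *data;                 /* packet bytes                */
    void *skb;                  /* socket-buffer-like metadata */
};

/* Allocate a half-page receive buffer (smaller TLB working set than 4 KB pages). */
static void *alloc_rx_buffer(void) { return malloc(RX_BUF_SIZE); }

static void deliver_to_network_stack(struct rx_slot *slot) { (void)slot; }

/* Process a batch of received slots, prefetching one slot ahead. */
static void frontend_rx_poll(struct rx_slot *ring, size_t head,
                             size_t count, size_t mask)
{
    for (size_t i = 0; i < count; i++) {
        struct rx_slot *cur  = &ring[(head + i) & mask];
        struct rx_slot *next = &ring[(head + i + 1) & mask];

        /* Warm the cache for the next packet and its metadata. */
        __builtin_prefetch(next->data);
        __builtin_prefetch(next->skb);

        deliver_to_network_stack(cur);
    }
}
```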
Performance Impact of Guest Frontend Optimizations
[Chart: guest domain CPU cost]
The optimizations bring the CPU cost in the guest close to native Linux
Remaining cost difference:
−Higher cost in netfront than in the physical driver
−Xen functions to send and deliver events
Impact of All Optimizations on Throughput
[Chart: throughput of the current PV driver, the optimized PV driver (1 guest), the optimized PV driver (2 guests), direct I/O (1 guest), and native Linux]
−Multi-queue with the software optimizations achieves the same throughput as direct I/O (~8 Gb/s)
−Two or more guests are able to saturate the 10 gigabit link
Conclusion
Multi-queue support in modern NICs enables high-performance networking with Xen PV drivers
−An attractive alternative to direct I/O: same throughput, although with some additional CPU cycles in the driver domain, and no hardware dependence in the guests
−A lightweight driver domain enables scalability to multiple guests: the driver domain can now handle 10 Gb/s data rates, and multiple guests can leverage multiple CPU cores to saturate the 10 gigabit link
Status
−Performance results were obtained on a modified netfront/netback implementation using the original Netchannel1 protocol
−Currently porting the mechanisms to Netchannel2: basic multi-queue is already available in the public netchannel2 tree; the additional software optimizations are still under discussion with the community and should be included in netchannel2 soon
Thanks to
−Mitch Williams and John Ronciak from Intel for providing samples of Intel NICs and for adding multi-queue support to their driver
−Ian Pratt, Steven Smith, and Keir Fraser for helpful discussions