Remote Direct Memory Access (RDMA) over IP. PFLDNet 2003, Geneva. Stephen Bailey, Sandburst Corp.; Allyn Romanow, Cisco Systems
RDDP Is Coming Soon. “ST [RDMA] Is The Wave Of The Future” – S. Bailey & C. Good, CERN 1999. Need: – standard protocols – host software – accelerated NICs (RNICs) – faster host buses (for > 1 Gb/s). Vendors are finally serious: Broadcom, Intel, Agilent, Adaptec, Emulex, Microsoft, IBM, HP (Compaq, Tandem, DEC), Sun, EMC, NetApp, Oracle, Cisco & many, many others
Overview Motivation Architecture Open Issues
CFP: NICELI, SigComm 03 Workshop on Network-I/O Convergence: Experience, Lessons, Implications. orkshop/niceli/index.html
High Speed Data Transfer Bottlenecks: – protocol performance – router performance – end-station performance (host processing, CPU utilization). The I/O Bottleneck: – interrupts – TCP checksum – copies
What is RDMA? Avoids copying by allowing network adapter under control of application to steer data directly into application buffers Bulk data transfer or kernel bypass for small messages Grid, cluster, supercomputing, data centers Historically, special purpose fabrics – Fibre Channel, VIA, Infiniband, Quadrics, Servernet
Traditional Data Center (figure): each server ("A Machine", running the application) attaches to three networks – Ethernet/IP to The World, a Storage Network (Fibre Channel), and a Database Intermachine Network (VIA, IB, proprietary)
Why RDMA over IP? Business Case TCP/IP not used for high bandwidth interconnection, host processing costs too high High bandwidth transfer to become more prevalent – 10 GE, data centers Special purpose interfaces are expensive IP NICs are cheap, volume
The Technical Problem: the I/O Bottleneck. With TCP/IP, host processing can’t keep up with link bandwidth on receive. Per-byte costs dominate – Clark (89). Well researched by the distributed systems community in the mid-1990s; confirmed by industry experience. Memory bandwidth doesn’t scale – the processor–memory performance gap: Hennessy (97), D. Patterson, T. Anderson (97), STREAM benchmark
Copying. Using IP transports (TCP & SCTP) requires data copying (figure): NIC → packet buffer → packet buffer → user buffer – 2 data copies
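The copy path above can be sketched in a toy Python model (illustrative only – not real kernel or NIC behavior): the conventional receive path moves payload bytes twice before the application sees them, whereas direct placement would hand the application a view of memory the NIC already filled.

```python
# Toy model of the receive copy path described on the slide.
packets = [b"abcd", b"efgh"]              # payloads as they arrive at the NIC

# Conventional path: each hop moves the bytes again.
packet_buffer = b"".join(packets)         # copy into kernel packet buffers
user_buffer = bytearray(packet_buffer)    # copy into the application buffer

# Zero-copy idea: expose the same memory without moving any bytes.
view = memoryview(user_buffer)            # no data is copied here
assert view[:4] == b"abcd"
assert bytes(user_buffer) == b"abcdefgh"
```

The `memoryview` is only an analogy for what an RNIC achieves in hardware: the application reads data in place instead of receiving a fresh copy.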
Why Is Copying Important? Heavy resource use at high speed (1 Gb/s and up): – uses a large % of available CPU – uses a large fraction of available bus bandwidth (minimum 3 trips across the bus). Measured throughput test (64 KB window, 64 KB I/Os, 2P 600 MHz PIII, 9000 B MTU): 1 GbE TCP consumes about 1.2 CPUs, versus a 1 Gb/s RDMA SAN (VIA)
What’s In RDMA For Us? Network I/O becomes ‘free’ (latency remains). 2500 machines spending 30% of CPU on I/O deliver the useful work of 1750 machines spending 0% on I/O
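The 2500-to-1750 equivalence on this slide is simple arithmetic – the CPU fraction left for useful work, times the machine count:

```python
machines = 2500
io_cpu_pct = 30                      # CPU share burned on network I/O

# Useful capacity = machines * (100% - I/O share); integer math keeps it exact.
useful_equivalent = machines - machines * io_cpu_pct // 100
assert useful_equivalent == 1750     # matches the slide's figure
```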
Approaches to Copy Reduction. On-host: special-purpose software and/or hardware, e.g., zero-copy TCP, page flipping – unreliable, idiosyncratic, expensive. Memory-to-memory transfer, using network protocols to carry placement information – satisfactory experience with Fibre Channel, VIA, Servernet – FOR HARDWARE, not software
RDMA over IP Standardization: IETF RDDP (Remote Direct Data Placement) WG; RDMAC (RDMA Consortium)
RDMA over IP Architecture. Two layers: DDP – Direct Data Placement; RDMA – control. Layering (top to bottom): ULP / RDMA control / DDP / IP Transport
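The DDP layer's job – placing payload directly into a registered application buffer named by the sender – can be sketched as a toy model (the names `register`/`place` and the steering-tag dictionary are illustrative, not the DDP wire protocol):

```python
# Toy sketch of tagged direct placement: each inbound segment names a
# steering tag and an offset, so payload lands in the registered buffer
# without an intermediate copy, even when segments arrive out of order.
buffers = {}                          # steering tag -> registered buffer

def register(stag, length):
    """Pre-register an application buffer under a steering tag."""
    buffers[stag] = bytearray(length)

def place(stag, offset, payload):
    """Write a segment's payload directly at its stated offset."""
    buffers[stag][offset:offset + len(payload)] = payload

register(0x10, 8)
place(0x10, 4, b"data")               # later segment arrives first
place(0x10, 0, b"head")
assert bytes(buffers[0x10]) == b"headdata"
```

Out-of-order placement is the point: because every segment is self-describing, the receiver never needs reassembly buffers before delivering data.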
Upper and Lower Layers. ULPs: SDP (Sockets Direct Protocol), iSCSI, MPI; DAFS (standardized, NFSv4-derived) on RDMA. SDP provides the SOCK_STREAM API. Runs over a reliable transport – TCP, SCTP
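SDP's appeal is that applications keep the ordinary SOCK_STREAM API while the transport underneath is swapped for RDMA. A minimal sketch of that unchanged API (plain Python sockets here – SDP itself is transparent to this code):

```python
import socket

# A connected SOCK_STREAM pair; under SDP the same calls would run
# over an RDMA-capable transport with no application changes.
a, b = socket.socketpair()
a.sendall(b"hello")
data = b.recv(5)
assert data == b"hello"
a.close()
b.close()
```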
Open Issues. Security. TCP in-order processing and framing. Atomic ops. Ordering constraints – performance vs. predictability. Other transports – SCTP, TCP, unreliable. Impact on network & protocol behaviors. What is the next performance bottleneck? What new applications? Does it eliminate the need for large MTUs (jumbos)?