IP Communication Fabric
Mike Polston, HP (michael.polston@hp.com)
Agenda
–Data Center Networking Today
–IP Convergence and RDMA
–The Future of Data Center Networking
Communication Fabric versus Communication Network
Communication Fabrics
The Need
–Fast, efficient messaging between two users of a shared network or bus
–Predictable response and fair utilization for any two users of the "fabric"
Examples
–Telephone switch
–Circuit switch
–ServerNet
–Giganet
–InfiniBand
–RDMA over IP
How Many, How Far, How Fast?
[Chart: bus, SAN, LAN, Internet, and "fabrics" compared by number of systems connected (exponential scale), distance, and 1/speed.]
Data Center Connections
Connections for:
–Management
–Public Net Access
–Client (PC) Access
–Storage Access
–Server-to-Server Messaging
–Load Balancing
–Server-to-Server Backup
–Server to DBMS
–Server-to-Server HA
Fabrics Within the Data Center Today: Ethernet Networks
–Pervasive Infrastructure
–Proven Technology
–IT Experience
–Management Tools
–Volume and Cost Leader
–Accelerated Speed Improvements
Fabrics Within the Data Center Today: Clustering
–High Availability
–Computer Clustering
–Some on Ethernet
–Memory Channel
–Other Proprietary
–Async Connections
–Early Standards (ServerNet, Giganet, Myrinet)
Fabrics Within the Data Center Today: Storage Area Networks
–Fibre Channel
–Mostly Standard
–Gaining Acceptance
–Record
–File
–Bulk Transfer
Fabrics Within the Data Center Today: Server Management
–KVM Switches
–HP RILOE, iLO
–KVM over IP
–Private IP Nets
Business Growth … and the Need for Scale
–Processors scale at Moore's Law: doubling every 18 months
–Networks scale at Gilder's Law: doubling every 6 months
–Memory bandwidth grows only 10-15% per year
–Solutions to scalability: scale up (partitionable systems) and scale out (a sea of servers)
Why Scale Out? Provide benefits by adding, not replacing …
Fault Resilience
–HA Failover
–N + 1 Protection
Modular System Growth
–Blades, Density
–Investment Protection
Parallel Processing
–HPTC
–DBMS Processing
–Tiered Architectures
The “Hairy Golf Ball” Problem
Agenda
–Data Center Networking Today
–IP Convergence and RDMA
–The Future of Data Center Networking
IP Convergence
[Diagram: storage networking, remote management, and clustering converging onto a common IP fabric.]
Ethernet Bandwidth Evolution
–1973: 3 Mbps
–1979: 10 Mbps
–1994: 100 Mbps
–1998: 1 Gbps
–2002: 10 Gbps
–20xx: 1xx Gbps
Sockets Scalability: Where Is the Overhead?
Send Message
–9,000 instructions
–2 mode switches
–1 memory registration
–1 CRC calculation
Receive Message
–9,000 instructions (less with CRC & LSS offload)
–2 mode switches
–1 buffer copy
–1 interrupt
–1 CRC calculation
Systemic Effects
–Cache, scheduling
Single RPC request = 2 sends & 2 receives
[Diagram: traditional LAN architecture components spanning the user/kernel boundary, Application, OSV API (Winsock), kernel service(s), protocol stack(s), device driver, LAN media interface; roughly 50-150 ms one-way.]
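Putting the slide's numbers together: a single RPC request (2 sends plus 2 receives) costs roughly 4 x 9,000 = 36,000 instructions, 8 user/kernel mode switches, 2 memory registrations, 2 buffer copies, 2 interrupts, and 4 CRC calculations, before any of the systemic cache and scheduling effects are counted.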
What is RDMA?
Remote DMA (RDMA) is the ability to move data from the memory space of one process into the memory space of another, with minimal use of the remote node's processor.
–Provides error-free data placement without CPU intervention or extra data movement at either node (a.k.a. Direct Data Placement, DDP)
–Capable of being submitted and completed from user mode without subverting memory protection semantics (OS bypass)
–Request processing for messaging and DMA is handled by the receiver without host OS/CPU involvement
The Need for RDMA
At 1 Gbps and above, memory copy overhead is significant, and it's not only the CPU cycles
–Server designs don't have 100 MB/s of additional memory bandwidth to spare for buffer copying
–RDMA makes each segment self-describing, so it can be landed in the right place without copying and/or buffering
Classic networking requires two CPUs to be involved in a request/response pair for data access
–End-to-end latency includes kernel scheduling events at both ends, which is guaranteed to be 10s-100s of milliseconds
–TOE alone doesn't help with the kernel scheduling latency
–RDMA initiates data movement from one CPU only, with no kernel transition; end-to-end latency is 10s of microseconds
Typical RDMA Implementation
[Diagram: applications (DBMS and other apps) call an OS vendor API (WinSock, MPI, other) or a user agent exposing verbs (open/close/map memory, send/receive/read/write); work is posted to queue pairs (QP) and completions reaped from completion queues (CQ) directly against the hardware interface, while a kernel agent handles setup; the fabric media interface may be ServerNet, InfiniBand, or Ethernet.]
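To make the verbs flow in the diagram concrete, the C sketch below shows a user-mode RDMA Write using the OpenFabrics libibverbs API, one possible verbs implementation rather than the one the deck assumes: register a buffer, post the write to a queue pair, and poll the completion queue. Protection domain, QP, and CQ creation, connection setup, and the out-of-band exchange of the peer's remote address and rkey are assumed to have happened elsewhere; treat this as a minimal sketch, not a complete program.

#include <infiniband/verbs.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the verbs flow above: register memory, post an
 * RDMA Write from user mode, and reap the completion. PD/QP/CQ creation
 * and the exchange of the peer's remote_addr/rkey are assumed done. */
static int post_rdma_write(struct ibv_pd *pd, struct ibv_qp *qp,
                           struct ibv_cq *cq, void *local_buf, size_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    /* "Map memory": register the buffer so the NIC can DMA it directly. */
    struct ibv_mr *mr = ibv_reg_mr(pd, local_buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* direct data placement */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;  /* request a completion */
    wr.wr.rdma.remote_addr = remote_addr;        /* peer's advertised buffer */
    wr.wr.rdma.rkey        = rkey;               /* peer's steering tag */

    /* Post to the queue pair (QP) from user mode: no kernel transition
     * on the data path. */
    if (ibv_post_send(qp, &wr, &bad_wr)) {
        ibv_dereg_mr(mr);
        return -1;
    }

    /* Poll the completion queue (CQ); a real application might block on
     * a completion channel instead of busy-polling. */
    struct ibv_wc wc;
    int n;
    do {
        n = ibv_poll_cq(cq, 1, &wc);
    } while (n == 0);

    ibv_dereg_mr(mr);
    return (n == 1 && wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}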
"Big Three" Wins for RDMA
Accelerate legacy sockets apps
–User-space sockets -> SDP -> RDMA
–Universal 25%-35% performance gain in tier 2-3 application communication overhead
Parallel commercial database
–<100 us latency needed to scale real-world apps
–Requires user-space messaging and RDMA
IP-based storage
–Decades-old block storage access model (iSCSI, SRP): command / RDMA transfer / completion
–Emerging user-space file access (DAFS, NFS, CIFS)
–Compaq experiment identified up to a 40% performance advantage; first lab test beat a hand-tuned TPC-C run by 25%
Why IP versus IB?
–Ethernet hardware continues to advance: speed, low cost, ubiquity
–TCP protocol continues to advance
–Management and software tools
–Internet: a worldwide trained staff
–World standards: power, phone, IP
RDMA Consortium (RDMAC)
–Formed in February 2002; went public in May 2002
–Founders were Adaptec, Broadcom, Compaq, HP, IBM, Intel, Microsoft, and NetApp; EMC and Cisco added later
–Open group with no fees, working fast and furious
Deliverables include:
–Framing, DDP, and RDMA specifications
–Sockets Direct
–SCSI mapping investigation
–Deliverables to be submitted to the IETF as informational RFCs
The Stack (RDMA / DDP / MPA / TCP / IP)
–RDMA: converts RDMA Write, RDMA Read, and Send operations into DDP messages
–DDP: segments outbound DDP messages into one or more DDP segments; reassembles one or more DDP segments into a DDP message
–MPA: adds a backward marker at a fixed interval to DDP segments, plus a length and CRC to each MPA segment
–TCP: schedules outbound TCP segments and satisfies delivery guarantees
–IP: adds the necessary network routing information
RDMA Architectural Goals
–Transfer data from the local system into an advertised buffer on a remote system
–Retrieve data from an advertised buffer on a remote system into the local system
–Transfer data from the local system into a non-advertised buffer on a remote system
–Allow the local system to signal completion to the remote system
–Provide reliable, sequential delivery from local to remote
–Provide multiple stream support
RDMAP Data Transfer Operations
–Send
–Send with Invalidate
–Send with Solicited Event (SE)
–Send with SE and Invalidate
–RDMA Write
–RDMA Read
–Terminate
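For illustration, the operations above map onto a small opcode field in the RDMAP header; the values in the sketch below follow the wire protocol as later published by the IETF (RFC 5040) and are offered as a hedged sketch rather than a quote from the consortium drafts.

/* RDMAP operation codes as later published in RFC 5040 (illustrative).
 * The slide's single "RDMA Read" corresponds to a Read Request /
 * Read Response pair on the wire. */
enum rdmap_opcode_sketch {
    RDMAP_RDMA_WRITE         = 0x0,
    RDMAP_RDMA_READ_REQUEST  = 0x1,
    RDMAP_RDMA_READ_RESPONSE = 0x2,
    RDMAP_SEND               = 0x3,
    RDMAP_SEND_INVALIDATE    = 0x4,
    RDMAP_SEND_SE            = 0x5,  /* Send with Solicited Event */
    RDMAP_SEND_SE_INVALIDATE = 0x6,
    RDMAP_TERMINATE          = 0x7
};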
Direct Data Placement
–Segments contain placement information: relative address and record length
–Tagged buffers
–Untagged buffers
–Allows NIC hardware to access application memory (remote DMA)
–Can be implemented with or without TOE
RDMA over MPA/TCP Header Format
[Diagram: an RDMA message (Send, RDMA Write, or RDMA Read) targeting an advertised or anonymous buffer is carried as ULP PDUs / DDP segments; each DDP segment (marker, DDP/RDMA header(s), DDP/RDMA payload, length) becomes the TCP payload of a TCP segment, which rides in an IP datagram inside an Ethernet frame (Ethernet header, IP header, TCP header, data).]
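As a rough illustration of the framing above, the C sketch below names the fields the slide calls out: the MPA length and CRC, and the DDP/RDMA header's steering tag and tagged offset for a tagged segment. Field widths follow the general shape of the RDMAC/IETF wire format, but this is a simplified, host-order sketch, not the authoritative packed, big-endian bit layout.

#include <stdint.h>

/* MPA framing: a 2-byte length prefix in front of each DDP segment. */
struct mpa_hdr_sketch {
    uint16_t ulpdu_length;   /* length of the DDP segment that follows */
};

/* DDP/RDMA header for a tagged segment (e.g. RDMA Write): the steering
 * tag and tagged offset let the NIC place the payload straight into the
 * advertised remote buffer, with no intermediate copy. */
struct ddp_tagged_hdr_sketch {
    uint8_t  ddp_control;    /* tagged flag, last-segment flag, version */
    uint8_t  rdmap_control;  /* RDMAP opcode carried in the ULP byte */
    uint32_t stag;           /* steering tag naming the advertised buffer */
    uint64_t tagged_offset;  /* byte offset of this segment's payload */
};

/* MPA trailer: pad to a 4-byte boundary, then a CRC32c over the FPDU.
 * MPA also inserts 4-byte backward markers at fixed intervals in the
 * TCP byte stream so the receiver can re-locate FPDU boundaries. */
struct mpa_trailer_sketch {
    uint32_t crc32c;
};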
Agenda
–Data Center Networking Today
–IP Convergence and RDMA
–The Future of Data Center Networking
Emerging Fabric Adoption: Two Customer Adoption Waves as Solutions Evolve
Wave 1
–First fabric solutions available (InfiniBand)
–Fabric evaluation within data centers begins
Wave 2
–Fabric computing pervasiveness
–IP fabric solutions (RDMA/TCP) become the leading choice for data center fabric deployments
–Leverage existing investment and improve infrastructure performance and utilization
–InfiniBand used for specialized applications
[Chart: InfiniBand and RDMA/TCP adoption plotted over time across the two waves.]
Ethernet Roadmap: Continued Ethernet Pervasiveness in the Datacenter
Today's Ethernet Infrastructure
–1 Gigabit Ethernet
–Lights-out management (KVM over IP)
Improved Ethernet Performance and Utilization
–TCP/IP offload & acceleration
–10 Gigabit Ethernet
–iSCSI (storage over IP)
Revolutionary IP Improvements & Advancements
–Interconnect convergence
–Scalability & performance
–Resource virtualization
–RDMA/TCP fabrics
–IPsec security acceleration
hp Fabric Leadership: Bringing NonStop Technologies to Industry Standard Computing
–Robust, scalable computing technology & expertise
–Drive fabric standards
–Introduced the first switched fabric
–Breakthrough fabric economics
–High-volume knowledge
–Fabric computing
–Leading storage fabric
Fabrics Within Future Data Centers: Foundation for the Future Adaptive Infrastructure Vision
Move from "tiers" to "elements"
–n-tier architecture, like DISA, replaced by element "pools" available over the fabric
–Resource access managed by tools like ProLiant Essentials and hp OpenView
–Centrally administered automation tools
Heterogeneous fabric "islands"
–Data center fabric connecting "islands" of compute & storage resources
–RDMA/TCP enables practical fabric scaling across the datacenter
–Protocol routers translate between islands
[Diagram: hp utility data center with a data center fabric of routing switches behind a firewall and internet edge, joining web servers, application servers, database servers (UNIX via an IP-to-IB router), NAS and iSCSI SAN storage, a Fibre Channel SAN via an IP-to-FC router, compute fabric, storage fabric, and edge routers; management systems (hp OpenView, ProLiant Essentials) provide provisioning, monitoring, virtualized functions, and service-centric resource management by policy.]
Making the IP Fabric Connection