Network IO Architectures: Partner Concepts. Steven Hunter, IBM Corporation: Blade Architectures, Applications and Benefits with a Networking Focus. Chris Pettey, NextIO: I/O Virtualization Enablement for Blade Servers.
Blade Architectures: Applications and Benefits with a Networking Focus. Steven W. Hunter, PhD, PE, IBM Distinguished Engineer, IBM Corporation. xSeries / BladeCenter Server Architecture, IBM Systems and Technology Group.
Agenda. Blade applications: web hosting; server consolidation; network attached storage. Benefits of blade architectures: reduced cost; density; power consumption; the "ease of" category (install/upgrade/service); dependability; scalable performance; unified management solutions.
Blade Applications. Internet data centers: large-scale centers providing content hosting, load balancing/caching, infrastructure and application hosting. Server consolidation: file/print, mail, terminal serving, infrastructure and e-commerce. Telecommunications and equipment manufacturers: telecommunications network infrastructure, switching, and voice over IP. Clusters: database clusters, high performance clusters, high availability clusters.
Application Hosting with Layer 4/7 Switching. Integrated load balancing and content switching enhance web hosting and caching by further improving scalability, availability, management, etc. This enables a closer coupling between the server and the network for performance, health, power, etc. It is achieved with Layer 4/7 switching, either by integrating with or front-ending other integrated networking technology (e.g., Layer 2). [Diagram labels: Clients; IP Network; One or more IP Address(es); L4/7 Switch; App Server 1, App Server 2, App Server N; Blade Chassis; Management.]
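The load-balancing idea on this slide can be illustrated with a minimal sketch. It assumes a simple round-robin policy that skips unhealthy blades; the slide does not say which policy an L4/7 switch uses, and the names `Blade` and `RoundRobinBalancer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Blade:
    """One application server blade behind the shared IP address (hypothetical model)."""
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Hands out blades in rotation, skipping any blade marked unhealthy."""

    def __init__(self, blades):
        self.blades = list(blades)
        self._index = 0

    def pick(self):
        # Try each blade at most once per call, starting where we left off.
        for _ in range(len(self.blades)):
            blade = self.blades[self._index]
            self._index = (self._index + 1) % len(self.blades)
            if blade.healthy:
                return blade
        raise RuntimeError("no healthy blades available")


if __name__ == "__main__":
    pool = [Blade("app1"), Blade("app2", healthy=False), Blade("app3")]
    balancer = RoundRobinBalancer(pool)
    print([balancer.pick().name for _ in range(4)])
```

Because the balancer sees blade health directly, a failed blade drops out of rotation without clients noticing, which is one way to read the "closer coupling between the server and the network" point.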
Server Consolidation and Virtualization. Virtualization at the blade level enables consolidation of multiple applications or multiple secure instances within a blade chassis. Switch VLANs may be used to guarantee application separation and security across the network. Security must be maintained from the network connection through the virtualization layer to the virtual machine. [Diagram labels: Clients; IP Network; Switch; Server 1, Server 2, Server N; VM; Blade Chassis; Management.]
Network Attached Storage. NAS servers and storage are consolidated via clustering for high availability, scalability and ease of management. Clients and servers communicate via TCP/IP protocols (e.g., CIFS, NFS). Storage is consolidated on a Fibre Channel network with block I/O. [Diagram labels: Client; IP Network; Switch; Server 1, Server 2, Server N; Blade Chassis; Mgmt; FC Switch; Storage Controller.]
Network Attached Storage - Example. Shared file storage consolidation; improved storage utilization; scale storage independent of servers. [Diagram labels: Clients and Servers using file I/O over TCP/IP network protocols (CIFS, NFS); IP Network; NIC; HS20 Storage Server A and HS20 Storage Server B, X346 Storage Server C and X346 Storage Server D, each with an FC HBA; block I/O; FC Switch; DS400 Single Controller, RS, Ctrl A; Storage for Shared Files.]
Benefits of Blade Architectures: Reduced Cost. Chassis infrastructure costs less than 1U as the infrastructure grows. Adding more nodes to the 1U model linearly increases cables and forces other tradeoffs due to limited space. Consolidating power, cooling, networking and management is more cost-effective than the disaggregated approach. Integrating network functions reduces cost by (1) reducing cables and (2) sharing chassis power, cooling, and management. [Diagram labels: Switch; Blade; Switch; Blade; Chassis Midplane.]
Benefits of Blade Architectures: Density. Twice the amount of CPU MIPS in the same amount of space. Consolidating power, cooling and management also improves density. Integrating the network increases density to more than twice that of a 1U solution. The BladeCenter switch form factor is based on the InfiniBand mechanical: width 29 mm, height 112 mm, depth mm. The BladeCenter I/O daughter card area is approximately 110 cm². [Diagram labels: Processor Blade; Switch Module.]
Benefits of Blade Architectures: Power. Power distribution from a common bulk source to all blades reduces electrical power consumption. The BladeCenter power budget for a single switch is 45 watts. Combining power under a common management domain enables further power savings by associating compute power requirements with the workload being applied (e.g., network load balancing). [Diagram labels: Workload Management; Power Management; Blades, each with Workload Measurement, Workload Execution, and Power Control.] More information:
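As a rough illustration of tying powered-on compute to the measured workload, the sketch below decides how many blades to keep powered for a given request rate. It is only a sketch under stated assumptions: the capacity-per-blade figure, the headroom percentage, and the function names `blades_needed` and `plan_power` are hypothetical and do not come from the slide.

```python
def blades_needed(load_rps, capacity_rps_per_blade, headroom_percent=20, min_blades=1):
    """Smallest blade count that covers the load plus a safety margin (assumed policy)."""
    if capacity_rps_per_blade <= 0:
        raise ValueError("capacity per blade must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = load_rps * (100 + headroom_percent)
    denominator = 100 * capacity_rps_per_blade
    required = -(-numerator // denominator)
    return max(min_blades, required)


def plan_power(total_blades, load_rps, capacity_rps_per_blade):
    """Split the chassis into blades to keep powered and blades that may be powered down."""
    active = min(total_blades, blades_needed(load_rps, capacity_rps_per_blade))
    return {"powered_on": active, "powered_down": total_blades - active}


if __name__ == "__main__":
    # Hypothetical numbers: a 14-blade chassis, 500 requests/s per blade.
    for load in (0, 3000, 10000):
        print(load, plan_power(14, load, 500))
```

A common management domain makes this kind of decision practical because the same controller sees both the workload measurement and the per-blade power control.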
Benefits of Blade Architectures: Ease of Install, Upgrade, and Service. Hundreds of cables with 1U vs. tens with blades. An upgrade is a remove/replace in minutes: less time than 1U translates to cost savings, and less skill is required, with less error potential than 1U. Central-site deployment for complex configurations: web caching configurations are deployed worldwide, but they are also complex, so with a disaggregated approach a specialist must configure them on-site.
Dependability. Dependability includes reliability, availability and serviceability, and now also includes aspects of security (e.g., malicious users/attacks). Replacing a blade in seconds, vs. minutes for 1U, improves availability, because steady-state availability is MTTF / (MTTF + MTTR) and a shorter repair time lowers MTTR. Techniques such as software rejuvenation are better enabled under a single control point of a clustered blade system. For example, remove a blade from the load balancing service, reset its software state to the initial level of resource consumption, and add it back to the service pool. More information: Develop proactive self-healing systems. Detect and predict resource exhaustion through Predictive Failure Analysis (PFA).
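The availability formula on this slide can be worked through numerically. The MTTF and MTTR values below are hypothetical and chosen only to show the effect of shrinking repair time from minutes to seconds.

```python
HOURS_PER_YEAR = 24 * 365


def steady_state_availability(mttf_hours, mttr_hours):
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_downtime_minutes_per_year(mttf_hours, mttr_hours):
    """Expected unavailable time per year, in minutes."""
    unavailable = 1.0 - steady_state_availability(mttf_hours, mttr_hours)
    return unavailable * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    mttf = 10_000.0                  # hypothetical mean time to failure, hours
    repair_1u = 30 / 60              # hypothetical 30-minute repair for a 1U server
    repair_blade = 30 / 3600         # hypothetical 30-second blade swap
    for label, mttr in (("1U", repair_1u), ("blade", repair_blade)):
        a = steady_state_availability(mttf, mttr)
        d = expected_downtime_minutes_per_year(mttf, mttr)
        print(f"{label}: availability={a:.7f}, downtime={d:.2f} min/year")
```

With MTTF held constant, availability depends only on MTTR, so a faster blade swap translates directly into less expected downtime.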
Benefits of Blade Architectures: Scalable Performance. Easily scale performance to match workload by physically adding blades to a chassis. Performance can also be scaled dynamically by provisioning “free blades” with an OS and application to meet current workload demands. The goal is to have a non-blocking networking fabric, for example: external switch interfaces (4 external links at 1 Gbps; 1000BaseT; 802.3ad, 802.1p&q); internal processor blade interfaces (14 SerDes links; 1 Gbps link speed); internal management module interfaces (100 Mbps link speed; connected to ports 1 & 2); aggregate switch bandwidth of 18 Gbps (i.e., total number of ports). [Diagram labels: BRCM 56XX; BRCM 56XX; 10G; CPU Subsystem; Ethernet Switching Hardware; Bus; Power and Low-level Management Signals.]
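The 18 Gbps aggregate figure follows from the port counts listed on the slide. The short check below assumes the aggregate counts only the 14 internal blade links and the 4 external links at 1 Gbps each and excludes the 100 Mbps management links; that reading is an assumption based on the slide's "total number of ports" remark.

```python
INTERNAL_BLADE_LINKS = 14   # SerDes links to processor blades, from the slide
EXTERNAL_LINKS = 4          # external 1000BaseT links, from the slide
LINK_SPEED_GBPS = 1         # per-link speed, from the slide


def aggregate_bandwidth_gbps(internal_links, external_links, link_speed_gbps):
    """Sum of per-port speeds across all data ports (management links assumed excluded)."""
    return (internal_links + external_links) * link_speed_gbps


if __name__ == "__main__":
    total = aggregate_bandwidth_gbps(INTERNAL_BLADE_LINKS, EXTERNAL_LINKS, LINK_SPEED_GBPS)
    print(f"aggregate switch bandwidth: {total} Gbps")
```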
Benefits of Blade Architectures: Unified Management Solutions. A common single point of control more tightly couples chassis management for server, networking, power, thermal, cooling, etc. Upward integration into the enterprise management framework. Converged management of communications, storage (i.e., iSCSI), clustering (i.e., RDMA), and systems management; for example, low-level management converged onto the same physical fabric, but kept separate and secure with VLANs. [Diagram labels: Blade with CPU, Ethernet NIC, PCIx, Management Processor, Mgmt Link, [Serial Data], [RMCP]; Mgmt Module with SoL and Telnet Server; Ethernet Switch; 10/100 Mgmt Network, 100 Mbps Link; Public Network, Multiple 1 Gbps, 1 Gbps Link; [VLAN], [RMCP], [SSH/Telnet].]
questions
I/O Virtualization Enablement for Blade Servers Chris Pettey, CTO Next I/O, Inc.
Agenda. Background; Current Blades; Shared I/O; Shared I/O in Blades; One Die Fits All; Shared NIC.
Current Blade System. Multiple fabrics. I/O fixed at order time. Internal vs. external switch compatibility concerns. Management software must contemplate multiple fabrics. [Diagram labels: Blade Server Chassis; three blades, each with CPU, SDRAM, Chipset, PCI-Ex, Ethernet, and Fibre Channel; Gig Ethernet Switch with External Gig Ethernet Port; Fibre Channel Switch with External Fibre Channel Port.]
What is Shared I/O? Virtualization of I/O devices in a physical adapter. Non-shared I/O = one adapter per Root Complex*: a single Operating System Domain (OSD) in a single adapter. Shared I/O = one adapter per multiple Root Complexes: multiple OSDs in a single adapter, one for each Root Complex. Each Root Complex owns a unique virtual adapter in a single shared I/O adapter. Different “views” within one adapter are part of multiple unrelated PCI hierarchies, and each PCI hierarchy runs independently. Complete transparency to application, OS, BIOS, etc. Important for blades: flexible I/O devices are available to multiple blades simultaneously, and dedicated adapters per blade are no longer required with shared I/O. * Root Complex (RC) = processor, chipset and memory.
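The relationship between one physical adapter and its per-Root-Complex views can be pictured with a small data model. This is a conceptual sketch only: the class names `VirtualAdapter` and `SharedAdapter` and the dictionary-based registers are hypothetical and are not taken from any real driver or hardware interface.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualAdapter:
    """The view of the shared adapter that a single Root Complex owns."""
    osd_id: int
    registers: dict = field(default_factory=dict)


class SharedAdapter:
    """One physical adapter exposing an independent view per Root Complex."""

    def __init__(self):
        self._views = {}

    def attach(self, root_complex_id):
        # Each Root Complex gets exactly one view, numbered in attach order.
        if root_complex_id in self._views:
            raise ValueError(f"{root_complex_id} is already attached")
        view = VirtualAdapter(osd_id=len(self._views))
        self._views[root_complex_id] = view
        return view

    def view_for(self, root_complex_id):
        return self._views[root_complex_id]


if __name__ == "__main__":
    nic = SharedAdapter()
    blade0 = nic.attach("RC#0")
    blade1 = nic.attach("RC#1")
    blade0.registers["mtu"] = 9000          # a change made by one OSD...
    print(blade1.registers.get("mtu"))      # ...is not visible to the other
```

Keeping the per-view state separate is what lets each PCI hierarchy run independently while the expensive hardware is shared.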
Shared I/O in Blade Chassis. Standard PCI Express mid-plane. Enhanced PCI Express protocol. Four keys to success: (1) OS transparency, (2) OS isolation, (3) performance, (4) cost. [Diagram labels: Blade Server Chassis; blades with CPU, SDRAM, and PCIe Chipset; OSD 0, OSD 1, OSD 2, OSD 3; PCI Express Switch; Shared Ethernet; Shared Fibre Channel.]
Shared I/O Benefits. Single mid-plane technology is less expensive: fewer switches, simpler mid-plane. Smooth technology transitions, for both new technologies and speed grades. Fewer interoperability concerns: no need to match internal with external switch. Management SW complexity reduced: I/O managed just like rack mount servers. Shared I/O preserves the value of blades and eliminates the drawbacks!
Basic vs Shared I/O NIC. [Diagram, shared I/O NIC: RC#0, RC#1, RC#2; PCI Express Switch; Cfg. Reg and CSR for each of OSD 0, OSD 1, OSD 2; DMA Engine #1 and #2; OSD Specific Statistics; Global Statistics; Receive Buffer; Transmit Buffer; Context Processor; MAC; PHY; Network Fabric. Diagram, basic NIC: RC#1; Cfg. Reg OSD 1; CSR OSD 1; DMA Engine #1 and #2; Global Statistics; Receive Buffer; Transmit Buffer; Context Processor; MAC; PHY; Network Fabric.]
Functionality for Shared I/O NIC. Bus interface: use enhanced PCI Express routing to tunnel to an OSD. CSR: provide independent driver registers for each supported OSD. DMA engine: provide SGL, DMA, and INT resources for each OSD. Data path: partition inbound accesses to the correct internal OSD partition; encapsulate outbound accesses to the correct external OSD partition. Context processor: maintain CMD context for each OSD; provide unique packet headers for each OSD. Statistics: provide required statistics per OSD in addition to global (chip-wide) statistics. Packet replication: replicate inbound broadcast or multicast packets to all internal OSD partitions; replicate and wrap outbound broadcast and multicast packets to internal OSD partitions; wrap outbound unicast packets that match the MAC address of an OSD partition to that inbound OSD partition.
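The packet replication rules above can be sketched as two small routing functions. Several details are assumptions rather than statements from the slide: inbound unicast frames are matched to an OSD by destination MAC, wrapped outbound copies are not returned to the sending OSD, and a locally wrapped unicast frame is not also sent to the fabric. The function names are hypothetical.

```python
def is_group_address(mac):
    """True for broadcast or multicast: the low bit of the first octet is set."""
    return int(mac.split(":")[0], 16) & 1 == 1


def deliver_inbound(dst_mac, osd_macs):
    """OSD ids that receive a frame arriving from the network fabric."""
    dst = dst_mac.lower()
    if is_group_address(dst):
        return sorted(osd_macs)                      # replicate to every OSD partition
    # Assumed: unicast frames go to the OSD whose MAC matches.
    return [osd for osd, mac in sorted(osd_macs.items()) if mac.lower() == dst]


def deliver_outbound(src_osd, dst_mac, osd_macs):
    """Return (send_to_fabric, OSD ids that receive a wrapped copy)."""
    dst = dst_mac.lower()
    # Assumed: the sender does not receive its own frame back.
    others = {osd: mac.lower() for osd, mac in osd_macs.items() if osd != src_osd}
    if is_group_address(dst):
        return True, sorted(others)                  # replicate and wrap, also transmit
    local = [osd for osd, mac in sorted(others.items()) if mac == dst]
    if local:
        return False, local                          # wrap to the matching OSD (assumed: not transmitted)
    return True, []                                  # ordinary unicast to the fabric


if __name__ == "__main__":
    macs = {0: "02:00:00:00:00:00", 1: "02:00:00:00:00:01", 2: "02:00:00:00:00:02"}
    print(deliver_inbound("ff:ff:ff:ff:ff:ff", macs))
    print(deliver_outbound(0, "02:00:00:00:00:02", macs))
```

The wrap cases exist because OSDs sharing one NIC have no external switch between them, so the NIC itself has to loop traffic between partitions.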
Shared I/O Devices for Virtual Machine. Virtual machine = multiple Operating System Domains (OSDs) running on a processor. Multiple OSDs on top of shared I/O can lead to further hardware innovation: DMA protection, MMIO protection, and interrupt isolation & steering. Shared I/O devices may enhance future VM solutions. [Diagram labels: CPU with OSD 0, OSD 1, OSD 2; SDRAM; Chipset; PCI Express; Cfg. Reg and CSR for each of OSD 0, OSD 1, OSD 2; DMA Engine #1 and #2; OSD Specific Statistics; Global Statistics; Receive Buffer; Transmit Buffer; Context Processor; MAC; PHY; Network Fabric.]
Emergence of Shared I/O Devices. Shared I/O gate count is minimal: functions that consume large die area are shared resources, and resources per OSD are small (pointers, counters, etc.). A shared I/O NIC operates in PCIe base mode: single root complex systems “see” a PCI Express base device, and no changes are required to BIOS, driver, or OS. One device for PCI-EX base, VM, & shared I/O!
Summary. Virtual I/O is rapidly evolving. Shared I/O is a natural extension. Shared I/O is important for blades: less expensive, more flexible, easier to manage. Shared NIC = the next I/O: a single die for standard PCI-E, virtual I/O & shared I/O.
© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.