HP StorageWorks Scalable File Share based on Lustre technology for HPC Linux Clusters
Ramakrishna Ghildiyal, Hewlett-Packard India, New Delhi
Scale up, scale out, scale simply!
September 2006
The Linux-Cluster I/O Problem
Need high-performance, efficient, scalable I/O: multiple GB/s of parallel bandwidth; a mix of large and small I/Os; distributed to dozens, hundreds, or thousands of compute nodes.
Need resilience: no single points of failure, for high up-time.
Need the simplicity of standard I/O interfaces: coherent, POSIX-compliant APIs, with MPI-IO support (see the sketch after this slide).
Need robust technology: built on field-proven underlying filesystems.
Traditional NFS, CIFS, and SAN filesystems do not always meet these cluster storage needs.
[Diagram: NFS serving a compute farm]
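As a minimal sketch of the workload these requirements describe (illustrative Python using mpi4py and NumPy; the filename and block size are arbitrary), every MPI rank writes its own disjoint region of one shared file through MPI-IO, which is exactly the coordinated parallel access pattern the cluster filesystem must serve coherently:

```python
# Minimal MPI-IO sketch: N ranks write disjoint blocks of one shared file.
# Assumes mpi4py and NumPy; run with e.g. `mpirun -np 8 python write_shared.py`.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

block = np.full(1 << 20, rank % 256, dtype=np.uint8)   # 1 MiB per rank
offset = rank * block.nbytes                           # disjoint file regions

fh = MPI.File.Open(comm, "shared_output.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Write_at_all(offset, block)                         # collective write
fh.Close()
```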
The Linux-Cluster I/O Solution: HP StorageWorks Scalable File Share (HP SFS)
Scalability: scales to meet almost any bandwidth and capacity need, from 200 MB/s to 35 GB/s (or more).
Fabulous price/performance: two to five times the bandwidth per dollar compared to scalable NFS and other clustered storage solutions.
Integrated storage appliance: integrated, reliable, easy to use; engineered, delivered, and supported worldwide by HP.
[Diagram: compute farm served by HP SFS]
Unified Cluster Portfolio: Data Management Options
[Positioning chart: higher bandwidth and capacity ("HPC class": HP SFS with Lustre and NFS) versus greater HA and transactions ("Enterprise class": Clustered Gateway); NAS cluster scalability (NFS and CIFS) versus SAN SMP and FC scalability (GFS, Matrix, StorNext filesystem and Managed Storage)]

Comparing NAS and SAN: NAS storage works well to connect a handful to thousands of small servers, workstations, and PCs to a shared filesystem within a machine room or among continents. NAS is well suited to delivering high-speed shared storage at multiple gigabytes per second (GB/s) using the new scalable storage servers such as HP SFS and Clustered Gateway. NFS and CIFS are the ubiquitous NAS protocols for Unix/Linux and Windows clients, respectively. Newer protocols, such as the Lustre™ protocol supported by HP SFS, can deliver POSIX-compliant coherent interfaces with parallel read/write locking behavior similar to that found on SMPs and shared SAN filesystems. This eliminates the locking and performance issues inherent in the more ubiquitous NFS protocol. HP SFS supports the object-based Lustre protocol over GbE interconnects and over gigabyte-per-second interconnects such as InfiniBand, Myrinet, and Quadrics. Thus NAS can now deliver hundreds of MB/s of bandwidth to individual compute servers with aggregate cluster storage bandwidths of tens of GB/s. With these new advanced protocols and interconnects, NAS is now faster than SAN for cluster storage, with dramatically superior price/performance.
SAN interconnects are oriented toward serving a smaller number of larger servers over Fibre Channel fabrics. Newer SAN transports include iSCSI over Ethernet. Multiple Fibre Channel connections can sustain multiple GB/s of bandwidth to individual SMPs. (NAS does not yet support multiple-GB/s connections to individual SMPs.) SAN scalability tends to be limited to 256 compute servers or fewer. The price of SAN interconnects is a practical limit, given that SAN costs rise as scalability increases. Many SAN storage devices limit the number of hosts to which they will speak; these limits are typically in the range of 20 to 256 hosts, with more expensive SAN equipment often required for the larger host counts. SAN filesystems tend to provide coherent, POSIX-compliant APIs, much like the parallel I/O behavior on an SMP. Some SAN filesystems (such as ADIC StorNext) can deliver guaranteed bandwidth for video streaming and other time-critical functions.
OS interfaces: NAS interfaces such as NFS and CIFS tend to be ubiquitous standards that are not usually sensitive to operating system upgrades. SAN filesystems, on the other hand, tend to use complex OS interfaces that may change during OS upgrades. The Lustre protocol (supported by HP SFS) currently lies somewhere between these extremes. Lustre is currently supported only on major Linux distributions (Red Hat EL 3, EL 4, Fedora, and SUSE SLES9) and requires a Linux kernel modification. These Lustre patches are expected to be included in future distributions from kernel.org.
Comparing bandwidth/capacity with high availability: solutions that deliver higher availability tend to deliver lower performance per dollar. These differences can be large: factors, not just fractional differences. For example, HP SFS delivers over five times as much large-block sequential bandwidth per dollar as the Clustered Gateway solution. On the other hand, Clustered Gateway maintains continuously available service for Enterprise-class storage. Similarly, the ADIC StorNext solution is geared for high bandwidth, while the Veritas Storage Foundation solution concentrates on high availability and small-random transaction performance. In addition, ADIC offers Managed Storage (transparent migration of data between hierarchical storage tiers such as disk and tape).
Summary:
HP SFS: highest bandwidth (tens of GB/s), highest capacity (512 TB), thousands of Linux clients, limited number of NFS clients, POSIX-coherent I/O, resiliency geared for HPC.
Clustered Gateway: high bandwidth (>2 GB/s), high capacity (16 TB), high transaction rates, many Unix/Linux/Windows clients (NFS), NFS read/write locking efficiency, high availability (HA).
ADIC StorNext with Managed Storage: high bandwidth (tens of GB/s at SAN prices), high capacity (hundreds of TB), maximum 256 clients (also limited by SAN pricing), multi-OS support (HP-UX, Linux, Windows, other UNIX), POSIX-coherent I/O, guaranteed bandwidth, Managed Storage (HSM).
Veritas Storage Foundation Clustered File System: high transaction rates, not geared for high bandwidth, maximum 16 clients, multi-OS support (HP-UX, Linux, Windows, other UNIX), POSIX-coherent I/O, high availability (HA).
[Chart labels: Storage Foundation, SAN Filesystem, Higher Bandwidth & Capacity, Guaranteed BW, Greater HA & Transactions "Enterprise class", SAN fabrics, HP Tape Silos]
SAN versus NAS
NAS (cluster scalability; Ethernet, InfiniBand, Myrinet, Quadrics): large number of small compute servers, thousands of clients; optimized for price/performance; low to medium individual-client bandwidth (~100 to 600 MB/s); ubiquitous access via NFS, CIFS, and other protocols.
SAN (SMP and FC scalability; Fibre Channel, iSCSI over Ethernet): small to medium number of compute servers, tens to a few hundred; high bandwidth to individual large SMPs; QoS with guaranteed bandwidth; low-latency transactions; less emphasis on price/performance.
Typical SAN Filesystems
ADIC StorNext: many OSes, high bandwidth, rich media, guaranteed bandwidth, managed storage (HSM to tape), up to 256 clients.
Red Hat GFS: Linux, targeted at commercial HA, up to 256 clients.
PolyServe Matrix SAN FS: HA, Linux or Windows, 2 to 16 clients.
Symantec (Veritas) Storage Foundation Cluster Filesystem: HA, HP-UX and Linux, 2 to 16 clients, integrated with ServiceGuard.
[Diagram: SAN fabric with GFS]
HP StorageWorks Scalable File Share: Scalable High-Bandwidth Storage Appliance for Linux Clusters
Higher cluster throughput: solves the I/O bottleneck for Linux clusters.
Scalable bandwidth: 200 MB/s to 35 GB/s (more by request).
Scalable capacity: 2 TB to 512 TB (more by request).
Scalable price/performance: 2x to 5x bandwidth-per-dollar advantage.
Scalable connectivity: tens to thousands of compute clients.
Scalable resiliency: flexible reliability and redundancy choices.
Scalable simplicity: integrated storage appliance; engineered, delivered, and supported worldwide by HP.
A cluster of data servers forming a single virtual scalable file server.
[Diagram: scalable bandwidth to a Linux cluster]
HP SFS Attributes
Maximize: parallel bandwidth; capacity; resiliency/reliability/redundancy; ease of use, with one big, fast, easy filesystem.
Minimize: price and TCO, by clustering low-cost, standard, scalable components.
HPC Bandwidth & Scalability versus Enterprise HA & Transactions
HP SFS (Scalable Linux Storage), "HPC class": specific HPC-class storage; emphasis on bandwidth, scalability, and price; high reliability required; service interruptions for maintenance OK; standard backup/restore OK; new technologies preferred (Lustre, SFS20 SATA disks).
Clustered Gateway (Scalable NFS), "Enterprise class": general Enterprise-class storage; emphasis on high availability and transaction counts; high reliability required; uninterrupted service important; business continuity important; rapid recovery; proven technologies preferred (NFS, EVA FC disks).
HP StorageWorks Enterprise File Services Clustered Gateway
Bringing file-serving availability, scalability, and performance to HP's storage solutions.
Availability: all nodes monitor health; fully transparent failover, preserving client state information.
Scalability and performance: additional nodes can be added for a nearly linear increase in performance; supports up to 16 nodes; no forklift upgrades.
Storage scalability: customers can meet both storage and file-serving scalability needs; snapshot integration with EVA storage arrays; HP Clustered File System.

There is no need to have a single point of failure in an HP StorageWorks Enterprise File Services Clustered Gateway. All paths and elements can be made redundant: the servers, the connections to the SAN, the FC switches, the connection to the storage, and the storage itself. The solution is fully distributed and symmetrical, i.e., there is no single server or element that can become a central bottleneck or single point of failure. The Clustered Gateway supports multipath I/O for both redundancy and performance. Note the animation that depicts how a failover occurs: file serving fails over to all of the remaining nodes.
HP StorageWorks Enterprise File Services Clustered Gateway (cont'd)
Bringing file-serving availability, scalability, and performance to HP's storage solutions.
HP EFS cluster architecture: clients connect over the LAN; the gateway nodes connect to storage over the SAN fabric.
The clustered file system delivers: a global namespace across the cluster; load balancing of mount requests; data and cache coherency maintained through single or multiple failures; automatic detection of NFS service failures; fabric and LUN discovery; the ability to configure stripe size for performance and workload, and to stripe across multiple storage arrays with the optional HP StorageWorks Clustered Volume Manager Software.

The customer advantages are clear. Global namespace: get rid of islands of storage that are hard to manage and control. Load balancing: all nodes are utilized and contribute to performance. Data integrity: data and cache coherency are maintained. Detection of NFS failure typically happens in less than 5 seconds; compare this with competitors who can take 30 seconds to minutes to detect. Clustered Volume Manager Software (see later slides for a more complete discussion): configurable striping and the ability to stripe across arrays is powerful. Cost: built with industry-standard components, the dollars-per-MB/s performance provides a clear cost advantage.
HP SFS Focus
Fast storage for scalable computing: clusters and server farms; Linux.
Lustre: the leading open-source, Linux, object-based filesystem standard.
[Diagram: Linux compute farm served by HP SFS]
HP SFS 2.0: superior price, performance, and scalability
[Chart: bandwidth (GB/s) versus relative list price, comparing HP SFS with SFS20 (scaling toward 35,000 MB/s) against Scalable NFS with MSA1000]
HP SFS: 5x better bandwidth at the same price; more than 5x price/performance for bandwidth; over 10x scalability.
NFS Clustered Gateway (Scalable NFS): higher availability; higher transaction rates; good scalability and price/performance.
Typical HP SFS Installation: Large Linux Cluster
TFLOPS of computing, GB/s of bandwidth, tens or hundreds of TB of capacity.
[Diagram: many compute and login nodes and the HP SFS cluster (multiple OSS nodes, MDS, admin nodes) all connected to a high-speed interconnect (GbE, InfiniBand, Myrinet, Quadrics), plus a campus network, GigE Ethernet for boot and system control traffic, and 10/100 Ethernet for out-of-band management (power on/off, etc.)]
Lustre Capabilities Overview
Lustre is a parallel, scalable, distributed filesystem designed to serve the most demanding high-performance technical computing (HPTC) environments. It is sometimes called the "inter-galactic" filesystem for its extremely high scalability, performance, and availability goals.
Designed for very high scalability: thousands of compute client nodes; petabytes of storage; hundreds of gigabytes per second of bandwidth; with full coherence and high reliability.
Lustre delivers fast I/O with very high scalability: scalable in speed (GB/s of bandwidth) and scalable in capacity (terabytes or petabytes).
Lustre Capabilities Overview (cont'd)
Designed for full resiliency: no single points of failure; journaling, failover, redundancy, etc.
Designed to manage the storage independently of the client operating systems: as with NAS (CIFS or NFS), the compute clients need not know the details of managing the Lustre storage.
Not only is it fast, it is also reliable and it simplifies storage management, much as NFS does.
HP Lustre-based File Server
Highly scalable technical-computing storage at DAS prices.
[Diagram: Lustre clients (Linux ProLiant or Integrity) perform system and parallel file I/O with file locking against object storage targets (OSS, Linux ProLiant), and directory, metadata, and concurrency operations against metadata servers (MDS, Linux ProLiant), which also handle recovery, file status, and file creation. Storage is attached to two ProLiants for failover resiliency.]
Minimize single points of failure, at low cost. A simplified striping model is sketched below.
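The sketch below is a simplified, hypothetical model of the object-based idea this diagram describes, not Lustre's actual layout or wire protocol: a file's metadata lives on the MDS, while its data is striped round-robin across objects on several OSTs, so large reads and writes fan out over many servers. The stripe size, OST count, and names are illustrative.

```python
# Simplified round-robin striping model (illustrative only, not Lustre's real layout).
from dataclasses import dataclass

@dataclass
class Layout:
    stripe_size: int   # bytes per stripe, e.g. 1 MiB
    ost_indices: list  # OSTs holding this file's objects, chosen at file creation

def locate(layout: Layout, file_offset: int):
    """Map a file offset to (OST index, offset within that OST's object)."""
    stripe_no = file_offset // layout.stripe_size
    ost = layout.ost_indices[stripe_no % len(layout.ost_indices)]
    # Object offset = completed full passes over all OSTs, plus the
    # position inside the current stripe.
    obj_offset = (stripe_no // len(layout.ost_indices)) * layout.stripe_size \
                 + file_offset % layout.stripe_size
    return ost, obj_offset

layout = Layout(stripe_size=1 << 20, ost_indices=[0, 1, 2, 3])
for off in (0, 1 << 20, 5 * (1 << 20) + 12345):
    print(off, "->", locate(layout, off))
```

Because consecutive stripes land on different servers, a single large sequential transfer is spread across all of the OSTs in the layout, which is how aggregate bandwidth scales with the number of data servers.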
Lustre Server Logical View
[Diagram: Lustre clients running on Linux cluster nodes reach the Lustre file system over the cluster interconnect (Myrinet, Quadrics, Gb Ethernet); OSS nodes and MDS/management nodes sit behind a storage interconnect leading to dual-controller (A/B) storage arrays.]
V1 HP-Lustre: Protocols
Lustre protocols are supported over the Portals intermediate layer on Quadrics, Myrinet, and Ethernet (TCP/IP) to Lustre clients on Linux servers.
Lustre-Portals protocols on Quadrics and Myrinet: optimized RDMA protocols; OS-bypass, low latency, low overhead.
Lustre-Portals protocols on Ethernet: standard TCP/IP software stacks; lower price, but with higher latency and overhead.
Optimized NFS access is not in the scope of v1: NFS access is supported, but not at high bandwidth.
Optimized CIFS access and optimized grid protocols are also not in the scope of v1.
Simplified View: Lustre and NFS Clients
[Diagram: a Lustre client (Lustre client-server protocol over Portals) and an NFS client (NFS v3 over IP) both reach the server engine.]
Lustre over Portals on GigE, Quadrics, Myrinet: scalable performance using the Lustre protocol; low overhead on Quadrics and Myrinet; supports multiple interconnects simultaneously (e.g., Myrinet and Ethernet).
NFS over TCP/IP on Gigabit Ethernet: NFS is supported, but not at high bandwidth; NFS performance is not a focus in the initial product releases; HP-Lustre is not initially targeted as a NAS server.
Lustre Filesystem Protocol View
[Diagram: the Lustre client stack (client filesystem with metadata write-back cache, OBD client, MDS client, lock, networking, recovery, OS-bypass file I/O) talks over Portals NALs (Elan, TCP, ...) and the Portals library/NIO API to the OST (object-based disk server with lock server, recovery, networking, and an Ext3/Reiser/XFS backing filesystem over Fibre Channel) for system and parallel file I/O and file locking, and to the MDS server (request processing, load balancing, lock, recovery, networking, Ext3/Reiser/XFS backing filesystem over Fibre Channel) for directory, metadata, and concurrency operations, file status, and file creation.]
Lustre Combines the Best of SAN and NAS
Shared data (as with NAS); high-bandwidth, low-overhead, resilient access (as with SAN); high scalability (much higher than NAS); storage managed independently of client hosts (as with NAS); highly resilient.
StorageWorks SAN/NAS convergence: can use existing SAN storage (EVA).
Designed to work with multiple interconnects: can use existing message-passing interconnects (Gb Ethernet, 10 Gb Ethernet, Quadrics, Myrinet, ...); lower cost than connecting Fibre Channel to hundreds of compute clients.
[Diagram: Lustre solution with system support, a system area network, a system admin network, I/O to the compute farm, and dedicated resources for metadata service and lock management]
HP SFS20 Storage on InfiniBand
Inexpensive hardware virtualized with Lustre.
Per OSS pair: 16 TB usable capacity; 8 active SFS20s at 2 TB per active SFS20; 1180 MB/s read bandwidth; 800 MB/s write bandwidth.
Low cost: no expensive SAN components; resilient, redundant failover via inexpensive SCSI cables (two per SFS20); plugs into existing IB switches.
[Diagram: an OSS pair of DL380 servers, each with two SmartArray 6404 controllers, connected by GbE and InfiniBand and cabled to eight SFS20 arrays]
A worked scaling example follows below.
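As a rough worked example using only the per-pair figures quoted on this slide (the helper name is invented for illustration), aggregate capacity and bandwidth scale approximately linearly with the number of OSS pairs:

```python
# Rough linear scaling from the per-OSS-pair figures on this slide
# (16 TB usable, 1180 MB/s read, 800 MB/s write per pair). Illustrative only;
# delivered numbers also depend on the interconnect and client count.
PER_PAIR_TB = 16
PER_PAIR_READ_MBS = 1180
PER_PAIR_WRITE_MBS = 800

def sfs_estimate(oss_pairs: int) -> dict:
    return {
        "usable_TB": oss_pairs * PER_PAIR_TB,
        "read_GBs": oss_pairs * PER_PAIR_READ_MBS / 1000,
        "write_GBs": oss_pairs * PER_PAIR_WRITE_MBS / 1000,
    }

for pairs in (1, 4, 15):
    print(pairs, "OSS pairs ->", sfs_estimate(pairs))
# 15 pairs (30 OSSes) -> 240 TB, ~17.7 GB/s read, 12 GB/s write, roughly in
# line with the X-Large configuration quoted later (240 TB, 18 GB/s, 12 GB/s).
```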
HP SFS Smart Cell Pairs
ProLiant servers, resilient storage, interconnect.
HP SFS software with the Lustre protocol.
600 MB/s and 8 TB per modular data server (DL380); a resilient pair of smart cells delivers 1.2 GB/s of bandwidth and 16 TB.
GbE or other high-speed interconnects to the Linux cluster.
StorageWorks SFS20 Disk Array
Designed specifically to work with Lustre for excellent performance: reliable, higher bandwidth at a lower cost. SFS20 with Lustre enables new low-cost, highly reliable, high-bandwidth HPC storage: hundreds of TB at multiple GB/s in HP SFS configurations, with one to four SFS20s per data smart cell server (OSS).
Performance and capacity per SFS20 array: 190 MB/s sustained read and 110 MB/s write (RAID 6) on InfiniBand and Quadrics ELAN4; 1.6 TB or 2 TB usable; under $100 per MB/s and under $6,000 per TB (large configurations).
Recent change: use RAID 6 (ADG) for all new SFS20 sales. With 160 or 250 GB SATA disks: 9 data disks, 2 rotating parity, 1 hot spare = 12 disks (a capacity sketch follows below). RAID 6 performance is competitive with RAID 5: a large increase in protection for a small performance decrease; protection against two simultaneous disk failures; faster rebuilds with lower overhead during the rebuild; very low cost increase (one more SATA disk per SFS20 enclosure); the same read performance and about 10% slower write performance compared to RAID 5.
The RAID 6+1 option delivers ultra-high reliability at low prices: multi-layer redundancy, RAID parity plus mirroring.

HP is the only tier-1 company launching Lustre-based products. Customers wanting fast Linux I/O should turn to HP for this solution. We have invested heavily and early in Lustre technology. We are leaders in the marketplace with a long history of working on Lustre; a proof point is that we are the prime contractor to the DoE for their Lustre project. Another point to emphasize is that Lustre is the strongest technology available for fast I/O on Linux clusters. IBM, SGI, Sun, and others do not have any technology to match Lustre's capabilities.
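As a rough check of the capacity figures above (a sketch only; the exact usable number depends on formatting overhead and on how the parity and hot-spare disks are allocated in each configuration), the 12-disk RAID 6 layout works out as follows:

```python
# Usable capacity of the SFS20 RAID 6 (ADG) layout described above:
# 12 SATA disks = 9 data + 2 rotating parity + 1 hot spare.
def sfs20_usable_tb(disk_gb: int, data_disks: int = 9) -> float:
    """Only the data disks contribute usable capacity; parity and the hot
    spare are overhead. Ignores filesystem/formatting overhead."""
    return data_disks * disk_gb / 1000  # decimal TB

for disk_gb in (160, 250):
    print(f"{disk_gb} GB disks -> ~{sfs20_usable_tb(disk_gb):.2f} TB usable")
# 250 GB disks -> ~2.25 TB, consistent with the quoted "2 TB usable";
# 160 GB disks -> ~1.44 TB, so the quoted 1.6 TB presumably reflects a
# slightly different data/parity/spare split. Treat this as an approximation.
```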
HP StorageWorks Lustre-based Storage Summary
HP HPTCD is the source for high-quality, highly scalable Lustre products and support.
The future for scalable, high-bandwidth, parallel, high-capacity, resilient filesystems: open source, backed by HP.
HP is leading in delivering solutions for high-bandwidth, scalable, shareable storage for Linux clusters.
Adaptive Scalable Bandwidth
Start small, grow as needed: add smart cells (data cells plus metadata/management cells) as needed; double the cells, double the bandwidth.
4 smart cells: 400 MB/s; 8 smart cells: 800 MB/s; 16 smart cells: 1.6 GB/s; 24 smart cells: 2.4 GB/s; 32 smart cells: 3.2 GB/s.
[Diagram: smart cells feeding a Linux cluster, growing in stages past 1, 2, and 3 GB/s of bandwidth]
HP SFS Storage Appliance Value
StorageWorks SFS20: HPC-oriented storage; excellent price/performance.
Resilient hardware and system design: no single points of failure; multiple resiliency options; resilient SATA, with twice the performance per dollar of Fibre Channel.
Lustre: enhanced ease of use; install HP SFS in hours, compared to days to install CFS Lustre; higher performance, lower initial cost, lower TCO, and HP worldwide service and support.

We're implementing this vision at a number of sites around the world. SSCK is a research supercomputing center where we're implementing a cluster based on a Cluster Platform 6000 with Integrity rx2600 nodes, the XC software stack, and HP SFS. NCHC in Taiwan combines a cluster of rx2600s with two 64-way Superdomes to provide a mix of computing styles supporting a variety of applications. BMW has a range of HP systems that support their varied CAE application requirements, including SMPs and clusters, HP-UX and Linux, etc. Sandia is building a large ProLiant (Xeon/EM64T) cluster to support general computing problems. HP is the first commercial member of CERN's grid consortium, showing our commitment to extending the computing environment within a unified cluster to span larger compute grids.
X-Large HP SFS/SFS20 Configuration
240 TB usable, 18 GB/s read, 12 GB/s write; 4 smart cells per cabinet; $1,500,000 list.
Base cabinet: 2 MDSes, 2 OSSes, switches, console; 16 TB usable, 1.2 GB/s read, 0.8 GB/s write (on ELAN4).
Expansion cabinets: 4 OSSes with 4 SFS20s per OSS; per cabinet, 32 TB usable, 2.4 GB/s read, 1.6 GB/s write (ELAN4).
Total depicted: 2 MDSes and 30 OSSes (on ELAN4); $6,250/TB capacity, $85 per MB/s of bandwidth (see the arithmetic sketch below).
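The per-TB and per-MB/s figures follow directly from the list price and totals on this slide; the short sketch below just reproduces that arithmetic (variable names are illustrative):

```python
# Price/capacity and price/bandwidth for the X-Large configuration above.
list_price_usd = 1_500_000
usable_tb = 240
read_bw_mbs = 18_000  # 18 GB/s expressed in MB/s

print(f"${list_price_usd / usable_tb:,.0f} per TB")      # $6,250 per TB
print(f"${list_price_usd / read_bw_mbs:,.0f} per MB/s")  # ~$83 per MB/s
# The slide quotes $85/MB/s, presumably rounded or based on a slightly
# lower delivered bandwidth figure.
```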
SATA vs. FC and SCSI Disk Attributes
Speed (bandwidth per dollar): FC: 15,000 RPM; SATA: 4 x 7,500 RPM = 30,000 RPM aggregate, for higher bandwidth per dollar.
Density (capacity): FC: 36, 72, and 144 GB at 2x to 4x the price; SATA: 160 and 250 GB, for higher capacity.
Resiliency (redundancy and failover): RAID 5 (single parity) or RAID 6 (dual parity); SATA (SFS20): RAID 5, RAID 6, RAID 5+1, RAID 6+1, RAID 5 with dual hot spare, for higher resiliency.
Sample HP SFS Deployments
University consortium in South Central Ontario, Canada.
Scientific Supercomputer Center at Karlsruhe, Germany.
Taiwan.
Swedish defense and research agency.
Nastran is Fast on HP SFS
Replaces extra-disk "fat nodes" with flexible storage. Nastran requires fast I/O.
Traditional approach: special nodes in the cluster with multiple local JBOD disks; expensive and hard to manage.
New approach: use the fast centralized HP SFS filesystem; similar performance; lower cost; shared rather than dedicated storage; easier to use; any node in the cluster can run Nastran.
Studying a similar approach for Fluent and Abaqus.