Literature Review
Interconnection Architectures for Petabyte-Scale High-Performance Storage Systems
Andy D. Hospodor, Ethan L. Miller
IEEE/NASA Goddard Conference on Mass Storage Systems and Technologies, April 2004
Henry Chen
September 24, 2010
Introduction
High-performance storage systems
–Petabytes (2^50 bytes) of data storage
–Supply hundreds or thousands of compute nodes
–Aggregate system bandwidth >100 GB/s
Performance should scale with capacity
Large individual storage systems
–Require high-speed network interface
–Concentration reduces fault tolerance
Proposal
Follow high-performance computing evolution
–Multi-processor networks
Network of commodity devices
Use disk + small (4- to 12-port) 1GbE switch as building block
Explore & simulate interconnect topologies
Commodity Hardware
Network
–1Gb Ethernet: ~$20 per port
–10Gb Ethernet: ~$5000 per port (25x per Gb per port)
 ● Aside: now ~$1000 per port
Disk drive
–ATA/(SATA)
–FibreChannel/SCSI/(SAS)
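As a quick sanity check on the per-gigabit comparison above, a minimal sketch using the 2004 prices quoted on this slide:

```python
# 2004 list prices quoted on the slide (approximate)
cost_per_port_1g = 20      # USD per 1Gb Ethernet port
cost_per_port_10g = 5000   # USD per 10Gb Ethernet port

cost_per_gb_1g = cost_per_port_1g / 1     # $20 per Gb/s of port capacity
cost_per_gb_10g = cost_per_port_10g / 10  # $500 per Gb/s of port capacity

print(cost_per_gb_10g / cost_per_gb_1g)   # 25.0 -> the "25x per Gb per port" figure
```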
Setup
Target 100 GB/s bandwidth
Build system using 250GB drives (2004)
–4096 drives to reach 1PB
–Assume each drive has 25MB/s throughput
1Gb link supports 2–3 disks
10Gb link supports ~25 disks
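A minimal sizing sketch of the figures above; the drive count and per-link disk budgets are the slide's, while the raw ratios are simple division (the slide's 2–3 and ~25 disks per link presumably leave headroom for protocol overhead and contention):

```python
drives = 4096          # 4096 x 250 GB drives ~ 1 PB
drive_bw = 25e6        # 25 MB/s sustained per drive
print(drives * drive_bw / 1e9)   # 102.4 GB/s aggregate, just above the 100 GB/s target

link_1g = 125e6        # 1 Gb/s ~ 125 MB/s raw
link_10g = 1250e6      # 10 Gb/s ~ 1250 MB/s raw
print(link_1g / drive_bw, link_10g / drive_bw)   # 5.0, 50.0 disks per link at raw rate
```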
Basic Interconnection
32 disks/switch
Replicate system 128x
–4096 1Gb ports
–128 10Gb ports
~Networked RAID 0
Data local to each server
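A quick check (my own arithmetic, not from the slide) that one 10Gb uplink per 32-disk switch is not the bottleneck:

```python
disks_per_switch = 32
switches = 128
drive_bw = 25e6                      # 25 MB/s per disk
uplink_bw = 1250e6                   # 10 Gb/s ~ 1250 MB/s

print(disks_per_switch * switches)   # 4096 1Gb disk-facing ports
print(switches)                      # 128 10Gb uplink ports
print(disks_per_switch * drive_bw)   # 800 MB/s per switch, under the 1250 MB/s uplink
```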
Fat Tree
10Gb ports:
–2048 switch to router (128 Sw × 8 Rt × 2)
–112 inter-router
–256 server to router (×2)
Need large, multi-stage routers
~$10M for 10Gb ports
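The port count and the ~$10M figure can be reproduced from the bullets above; reading "×2" as counting both ends of each link is my assumption:

```python
switch_to_router = 128 * 8 * 2   # 128 switches x 8 routers, both link ends -> 2048 ports
inter_router = 112
server_to_router = 256           # "(x2)" per the slide, again counting both ends
ports_10g = switch_to_router + inter_router + server_to_router
print(ports_10g)                 # 2416 10Gb ports
print(ports_10g * 5000)          # ~$12M at ~$5000/port, i.e. on the order of $10M
```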
Butterfly Network
Need “concentrator” switch layer
Each network level carries entire traffic load
Only one path between any server and storage device
Mesh
Routers to servers at mesh edges via 10Gb links
Routers only at edges; mesh provides path redundancy
Torus
Mesh with edges wrapped around
Reduces average path length
No boundary for routers; needs dedicated connection breakout to servers
Hypercube
Special case of torus
Bandwidth scales better than mesh/torus
Connections per node increase with system size
Can group devices into smaller units and connect with torus
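To make the path-length comparison concrete, a brute-force sketch (my own illustration, not from the paper) of average hop counts for 64-node examples of each cube-style topology:

```python
from itertools import product
from math import comb

def avg_hops_torus(k, n):
    """Average shortest-path hops between distinct nodes of a k-ary n-cube (torus)."""
    nodes = list(product(range(k), repeat=n))
    total, pairs = 0, 0
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            total += sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))
            pairs += 1
    return total / pairs

def avg_hops_hypercube(n):
    """Average Hamming distance between distinct corners of an n-dimensional hypercube."""
    return sum(d * comb(n, d) for d in range(1, n + 1)) / (2 ** n - 1)

print(avg_hops_torus(8, 2))    # 8x8 2-D torus, 64 nodes -> ~4.06 hops
print(avg_hops_torus(4, 3))    # 4x4x4 3-D torus, 64 nodes -> ~3.05 hops
print(avg_hops_hypercube(6))   # 6-D hypercube, 64 nodes  -> ~3.05 hops
```

Rearranging the same 64 nodes into more dimensions cuts the average hop count, which is what drives the bandwidth comparison on the next slide.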
Bandwidth
Not all topologies actually capable of 100 GB/s
Maximum simultaneous bandwidth = (link speed × number of links) / (average hops)
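Plugging the formula into some example 4096-node tori (my own illustration, using the slide's 1Gb links and an approximate k/4 average hops per dimension):

```python
def max_simultaneous_bw(link_speed, num_links, avg_hops):
    # Upper bound: every link kept busy, each transfer consuming avg_hops links
    return link_speed * num_links / avg_hops

link_1g = 125e6                          # 1 Gb/s link ~ 125 MB/s
for k, n in [(64, 2), (16, 3), (8, 4)]:  # k-ary n-cubes, all with 4096 nodes
    num_links = n * k ** n               # each node has 2n links, each shared by 2 nodes
    avg_hops = n * k / 4                 # ~k/4 hops per dimension on a torus
    bw = max_simultaneous_bw(link_1g, num_links, avg_hops)
    print(f"{k}-ary {n}-cube: ~{bw / 1e9:.0f} GB/s upper bound")
```

Under these assumptions only the higher-dimensional tori clear the 100 GB/s target, consistent with the conclusions slide.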
Analysis
Embedding switches in storage fabric uses fewer high-speed ports, but more low-speed ports
Router Placement in Cube-Style Topologies
Routers require nearly 100% of link bandwidth
Adjacent routers cause link overload & underload
Use random placement; optimization possible?
Conclusions
Build multiprocessor-style network for storage
Commodity-based storage fabrics can be used to improve reliability and performance; scalable
Rely on large number of lower-speed links; limited number of high-speed links where necessary
Higher-dimension tori (4-D, 5-D) provide a reasonable solution for 100 GB/s from 1PB