Ultimate Integration
Joseph Lappa
Pittsburgh Supercomputing Center
ESCC/Internet2 Joint Techs Workshop
Agenda
–Supercomputing 2004 Conference
–Application
  –Ultimate Integration
–Resource Overview
–Did it work?
–What did we take from it?
Supercomputing 2004
–Annual Conference
  –Supercomputers
  –Storage
  –Network hardware
–Original reason for application: Bandwidth Challenge
  –Didn't apply due to time
Application Requirements
–Runs on Lemieux (PSC's supercomputer)
–Application Gateways (AGW)
–Cisco CRS-1
  –40 Gb/sec OC-768 cards
  –Few exist
–Single application
–Be used with another demo on the show floor if possible
Ultimate Integration Application
–Checkpoint Recovery System
  –Program
    –Garden-variety Laplace solver instrumented to save its memory state in checkpoint files (see the sketch below)
    –Checkpoints memory to remote network clients
    –Runs on 34 Lemieux nodes
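To make the idea concrete, here is a minimal single-node sketch of a Jacobi/Laplace iteration that periodically streams its in-memory state to a remote TCP receiver. It is illustrative only: the checkpoint host, port, grid size, and interval are assumptions, and the real application ran across 34 Lemieux nodes with the checkpoint traffic forwarded through the AGWs rather than over a plain socket from one process.

/* laplace_ckpt.c - sketch: Jacobi/Laplace iteration that periodically
 * streams its memory state ("checkpoint") to a remote TCP receiver.
 * CKPT_HOST, CKPT_PORT, N, and CKPT_EVERY are illustrative, not from the talk. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define N 1024            /* grid dimension */
#define ITERS 1000        /* total iterations */
#define CKPT_EVERY 100    /* checkpoint interval */
#define CKPT_HOST "10.0.0.1"
#define CKPT_PORT 5001

/* Send the whole buffer, handling short writes. */
static int send_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n; len -= (size_t)n;
    }
    return 0;
}

/* Stream the solver state to the remote checkpoint receiver. */
static int checkpoint(const double *grid, int iter) {
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(CKPT_PORT);
    inet_pton(AF_INET, CKPT_HOST, &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) { close(fd); return -1; }
    if (send_all(fd, &iter, sizeof iter) < 0 ||
        send_all(fd, grid, (size_t)N * N * sizeof *grid) < 0) { close(fd); return -1; }
    close(fd);
    return 0;
}

int main(void) {
    double *cur = calloc((size_t)N * N, sizeof *cur);
    double *next = calloc((size_t)N * N, sizeof *next);
    int i, j, iter;
    if (!cur || !next) return 1;

    /* Fixed boundary condition on the top edge. */
    for (j = 0; j < N; j++) cur[j] = 100.0;

    for (iter = 1; iter <= ITERS; iter++) {
        /* One Jacobi sweep over interior points. */
        for (i = 1; i < N - 1; i++)
            for (j = 1; j < N - 1; j++)
                next[i * N + j] = 0.25 * (cur[(i - 1) * N + j] + cur[(i + 1) * N + j] +
                                          cur[i * N + j - 1] + cur[i * N + j + 1]);
        /* Copy interior rows back; boundary rows stay fixed. */
        memcpy(cur + N, next + N, (size_t)(N - 2) * N * sizeof *cur);
        if (iter % CKPT_EVERY == 0 && checkpoint(cur, iter) != 0)
            fprintf(stderr, "checkpoint at iteration %d failed\n", iter);
    }
    free(cur); free(next);
    return 0;
}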
Lemieux TCS System
–750 Compaq AlphaServer ES45 nodes
  –SMP
    –Four 1 GHz Alpha processors
    –4 GB of memory
–Interconnection
  –Quadrics cluster interconnect
    –Shared memory library
Application Gateways
–750 GigE connections are very expensive
–Reuse the Quadrics network to attach cheap Linux boxes with GigE
  –15 AGWs
    –Single-processor Xeons
    –1 Quadrics card
    –2 Intel GigE NICs
  –Each GigE card maxes out at 990 Mb/sec
  –Only need 30 GigE streams to fill the link to TeraGrid (see the arithmetic below)
–Web100 kernel
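A quick check on those numbers: 15 AGWs × 2 GigE NICs × ~990 Mb/sec per NIC ≈ 29.7 Gb/sec, which is just enough to fill a roughly 30 Gb/sec wide-area link; the exact TeraGrid link capacity is inferred from the "30 GigE" figure above rather than stated on the slide.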
Application Gateways
Network
–Cisco 6509
  –Sup720
  –WS-X6748-SFP
  –Two WS-X GE cards
    –Used 4 10GE interfaces
–OSPF load balancing was my real worry
  –>30 GE streams over 4 links (see the note below)
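Why this was a worry, and why it works out (my reasoning, not stated on the slide): assuming per-flow (CEF) load sharing across the 4 equal-cost 10GE paths, an even hash of ~32 GigE streams gives about 8 flows × ~990 Mb/sec ≈ 7.9 Gb/sec per link, comfortably under 10 Gb/sec. The risk is that an unlucky hash piles enough flows onto one link to push it past its 10 Gb/sec capacity.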
Network
–Cisco CRS-1
  –40 Gb/sec per slot
  –16 slots
–For the demo
  –Two OC-768 cards
    –Ken Goodwin's and Kevin McGratten's big worry was the OC-768 transport
  –Two 8-port 10 GE cards
–Running production IOS-XR code
  –Had problems with tracking hardware
–Ran both without 2 of the switching fabrics, with no effect on traffic
Network
–Cisco CRS-1
  –One in the Westinghouse machine room
  –One on the show floor
–Forklift needed to place it
  –7 feet tall
  –939 lbs empty
  –1,657 lbs fully loaded
The Magic Box
–Stratalight OTS 4040 transponder
  –"Compresses" the 40 Gb/s signal to fit into the spectral bandwidth of a traditional 10G wave
  –Uses proprietary encoding techniques
–The Stratalight transponder was connected to the mux/demux of the DWDM system as an alien wavelength
Time Dependencies
–The OC-768 link wasn't worked on until one week before the conference
OC-768
Where Does the Data Land?
–Lustre filesystem
  –Developed by Cluster File Systems
  –POSIX-compliant, open-source, parallel file system
  –Separates metadata and data objects to allow for speed and scaling (clients get the file layout from the metadata server, then read and write data in parallel directly against the object storage targets)
The Show Floor
–8 checkpoint servers with 10 GigE and InfiniBand connections
–5 Lustre OSTs connected via InfiniBand, with 2 SCSI disk shelves (RAID5)
–Lustre metadata server (MDS) connected via InfiniBand
The Show Floor
The Demo
How well did it run?
–Laplace solver w/ checkpoint recovery
  –Using 16 Application Gateways (32 GigE connections): 31.1 Gb/sec (see the arithmetic check below)
  –Only 32 Lemieux nodes were available
–IPERF
  –Using 17 Application Gateways + 3 single-GigE-attached machines: 35 Gb/sec
–Zero SONET errors reported on the interface
–Over 44 TB were transferred
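Both results are roughly consistent with the ~990 Mb/sec per-NIC ceiling noted earlier: 32 GigE streams × ~990 Mb/sec ≈ 31.7 Gb/sec of theoretical headroom against the 31.1 Gb/sec measured for the solver, and (assuming each of the 17 AGWs drove both GigE NICs) 17 × 2 + 3 = 37 streams × ~990 Mb/sec ≈ 36.6 Gb/sec against the 35 Gb/sec measured with iperf.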
The Team
Just Demoware?
–AGWs
  –qsub command now has an AGW option
    –Can do accounting (and possibly billing)
    –MySQL database with Web100 stats
  –Validated that the AGW was a cost-effective solution
–OC-768 metro transport can be done by mere mortals
Just Demoware??
–Application receiver
  –Laplace solver ran at PSC
  –Checkpoint receiver program tested/run at both NCSA and SDSC
    –Ten IA64 compute nodes as receivers
    –~10 Gb/sec network to network (/dev/null) (see the receiver sketch below)
      –990 Mb/sec × 10 streams
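For illustration, here is a minimal sketch of what the network-to-network (/dev/null) receiver side could look like: a TCP sink that accepts a connection and discards whatever it reads. The port, buffer size, and one-connection-at-a-time loop are assumptions for the sketch; the actual tests ran roughly ten parallel ~990 Mb/sec streams into ten IA64 nodes (e.g. one receiver process per stream), and the real checkpoint receiver wrote into the Lustre filesystem instead of discarding the data.

/* ckpt_sink.c - sketch of a checkpoint sink that accepts TCP connections
 * and discards the data (the "/dev/null" case above). The port and buffer
 * size are illustrative, not from the talk. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define PORT 5001
#define BUFSZ (1 << 20)   /* 1 MB read buffer */

int main(void) {
    struct sockaddr_in addr;
    int one = 1;
    static char buf[BUFSZ];
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0) { perror("socket"); return 1; }
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(PORT);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0) { perror("bind"); return 1; }
    if (listen(lfd, 16) < 0) { perror("listen"); return 1; }

    for (;;) {
        unsigned long long total = 0;
        ssize_t n;
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0) continue;
        /* Drain the stream; a real receiver would write the checkpoint to storage. */
        while ((n = read(cfd, buf, sizeof buf)) > 0)
            total += (unsigned long long)n;
        fprintf(stderr, "connection done: %llu bytes discarded\n", total);
        close(cfd);
    }
}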
Thank You