NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
Capability Computing – High-End Resources
Wayne Pfeiffer
Deputy Director, NPACI & SDSC
NPACI Review
July 21, 1999
Compute resources are at 6 sites
  U Michigan
  U Texas
  Caltech
  SDSC
  UC Berkeley
  U Virginia
Complementary roles of 6 compute resource sites
  Leading-edge site, LES (UCSD/SDSC)
    Very high-performance resources: first teraflops system for U.S. academic community
  Mid-range sites (U Texas & U Michigan)
    Smaller systems compatible with LES
    Support for apps with limited scalability, large-memory jobs, apps development, OS testing, & education
  Alternate architecture & cluster sites (Caltech, UC Berkeley, UCSD, & U Virginia)
    Support for leading-edge apps, thrust software development, & evaluation
NPACI has high-end computers of exceptional capability
  Strategic partnerships with IBM, HP, Sun, & Tera
    Early delivery of very large systems
      First teraflops IBM SP with 8-way nodes to be at SDSC
      Largest systems from HP, Sun, & Tera at Caltech & SDSC
    Deep discounts or outright donations plus other leverage
  Allocable systems
    7 -> 10 systems at 4 sites in FY99 -> FY00 to provide increased diversity
    Redeployment of SP from SDSC to U Michigan & U Texas
  Additional systems through extensive leverage
    10 systems at 5 sites in FY00
Evolution of allocable computers
Teraflops IBM SP coming to SDSC in CY99
  Cluster of next-generation SMP nodes
    1,184 Power3 processors at 222 MHz: 1.05 teraflops
      way nodes
      way nodes
    640 GB of memory
    Current generation switch initially
  Staged installation
    28 2-way nodes in June for software development
    4 8-way nodes in July
    way nodes in August
    Full teraflops in fall of CY99
    Switch upgrade in early CY00
  Attractive base for upgrade to 5 teraflops
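The 1.05 teraflops figure for the IBM SP is consistent with a simple peak-rate calculation: processor count times clock rate times floating-point operations per cycle. A minimal Python sketch, assuming 4 flops per cycle per Power3 processor (two fused multiply-add units). The function name and that default are illustrative and do not come from the slides.

```python
def peak_teraflops(processors: int, clock_mhz: float, flops_per_cycle: int = 4) -> float:
    """Theoretical peak rate in teraflops.

    flops_per_cycle=4 is an assumption: two fused multiply-add units
    per Power3 processor, each counted as 2 flops per cycle.
    """
    flops = processors * clock_mhz * 1e6 * flops_per_cycle
    return flops / 1e12


if __name__ == "__main__":
    # Figures from the slide: 1,184 Power3 processors at 222 MHz.
    print(f"{peak_teraflops(1184, 222):.2f} teraflops")
```

This is a theoretical peak only; sustained application performance would be lower.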
NPACI’s road to 5 teraflops
Caltech & HP offer alternate path to teraflops based on IA-64 & large shared-memory domains
  CY99: transition to multi-node HP-UX
    V2250: 32 PA-8200 processors (240 MHz) installed early CY99
    V2500: 128 PA-8500 processors (440 MHz) & 128 GB of memory coming soon in two 64-way ccNUMA domains
  CY00-CY02: evaluation of SuperDome scalability
    Next-generation architecture with PA-RISC or IA-64 processors
    64-way SMP or 256-way ccNUMA domains
    CY00: 64 PA-8600 processors
    CY01: 128 PA-8700 processors
    CY02: teraflops system with IA-64 McKinley processors
  Earliest large systems from HP through strategic partnership & leverage from NASA
Sun & SDSC are also exploring path to teraflops based on large shared-memory domains
  Previous & current systems (deeply discounted)
    Wildfire: 28-processor ccNUMA system for scalability testing
    HPC 10000: 32-way SMP (333 MHz) for data serving
  Coming systems (donated)
    HPC 10000: 64-way SMP (400 MHz) with 64 GB of memory for SAC projects and allocated use; coming in August
    Starcat: 72-way SMP with 72 GB of memory for evaluating potential scalability to teraflops; only alpha system outside of Sun; coming early in CY00
  Exceptional opportunity to work with Sun through strategic partnership
First Tera MTA is at SDSC
  Characteristics
    Multithreaded architecture
    Shared memory
    8 processors now, going to 16 later this calendar year
  Benefits
    Reduced programming effort: single parallel model for one or many processors
    Good scalability
  Leveraged funding
Tera MTA is competitive with CRAY T90 & has better scalability for PULSE3D heart model
NPACI cluster project was initiated
  To exploit the benefits of clusters
    High performance
    Very attractive price/performance
    Widespread interest within scientific computing community
  To build upon NPACI expertise and capability
Near-term activities in cluster project
  Participate in workshop in early August
    To develop strategy for creating, enhancing, & maintaining software for high-end production clusters
    To obtain agreement on priorities & responsibilities with leaders from NPACI, the Alliance, & the broader high-end computing community
  Develop additional functionality & interoperability
    To help users move between clusters & other systems
    To help system administrators manage clusters
  Use clusters & evaluate their capabilities
    To advise users on suitability of clusters vs other systems
    To guide future resource acquisitions, e.g., very large clusters
NPACI is a pioneer in high-performance mass storage
  Large HPSS installations at SDSC & Caltech
    FY99: 100 TB stored at SDSC, the most by HPSS
    FY00: 200 TB expected at SDSC, 100 TB at Caltech
  Performance & capacity upgrades via hardware
    FY99: larger SP servers, additional STK silos, & new HPGNs at SDSC & Caltech
    FY00: tape drives that are faster & higher density at SDSC
  Stability upgrade via software
    FY99: HPSS 3.2 at SDSC resulting in reduced down time
  HPSS metadata backups between SDSC & Caltech
    FY99: by tape
    FY00: by CalREN-2 (at OC-12: 622 Mbps)
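To give a sense of what an OC-12 link means for the HPSS metadata backups, the sketch below estimates an ideal transfer time at 622 Mbps. The 50 GB backup size and the efficiency parameter are hypothetical values chosen for illustration; the slides do not state the metadata volume or the achieved throughput.

```python
def transfer_seconds(size_bytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link of link_mbps megabits per second.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the
    line rate actually achieved; 1.0 gives the ideal lower bound.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_mbps * 1e6 * efficiency)


if __name__ == "__main__":
    backup_bytes = 50e9  # hypothetical metadata backup size, not from the slides
    seconds = transfer_seconds(backup_bytes, 622)  # OC-12 rate quoted on the slide
    print(f"Ideal transfer time: {seconds / 60:.1f} minutes")
```

Protocol overhead and disk or tape speed would lengthen the real transfer, so the result is best read as a lower bound.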
HPSS down time at SDSC is much lower since software upgrade
Applications that require high-speed networks for collaborative research & building the Grid
  Telescience: remote control of scientific instruments between the U.S. and Japan (vBNS at OC-12)
  Ingestion of molecular structure data from SLAC to the PDB at SDSC (NTON at 4xOC-48)
  Federation of clusters at UCSD and Caltech (NTON at 4xOC-48)
  Backup of HPSS metadata between SDSC and Caltech (CalREN-2 at OC-12)
  Backup of NCSA’s large disk array by HPSS at SDSC (vBNS at OC-12 -> vBNS+ at OC-48)
Networking improvements are needed to realize benefits of capability computing
  Better connectivity for
    LES to NTON at 4xOC-48
    LES to vBNS+ at OC-48
    LES to Abilene at OC-12
    Caltech to vBNS at OC-12
    U Texas to vBNS at OC-3
    Other partners in out years
  More networking support for
    Applications and network tuning together with CAIDA & NLANR
    Engineering to design, implement, & integrate networking upgrades
    Security to maintain secure access & foster best practices throughout partnership
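The link speeds named in the connectivity list follow the SONET optical carrier convention, in which OC-n runs at n times 51.84 Mbps. The sketch below tabulates the resulting line rates. Reading "4xOC-48" as four parallel OC-48 channels is an assumption, and the helper name and table layout are illustrative.

```python
OC_BASE_MBPS = 51.84  # OC-1 line rate in megabits per second


def oc_rate_mbps(n: int, channels: int = 1) -> float:
    """Aggregate line rate of `channels` parallel OC-n links, in Mbps."""
    return n * OC_BASE_MBPS * channels


# Connections listed on the slide: (OC level, number of channels).
# Treating "4xOC-48" as 4 channels is an assumption.
LINKS = {
    "LES to NTON": (48, 4),
    "LES to vBNS+": (48, 1),
    "LES to Abilene": (12, 1),
    "Caltech to vBNS": (12, 1),
    "U Texas to vBNS": (3, 1),
}

if __name__ == "__main__":
    for name, (level, channels) in LINKS.items():
        rate = oc_rate_mbps(level, channels)
        print(f"{name:<16} {channels}xOC-{level:<3} {rate:>9.2f} Mbps")
```

These are raw line rates; usable payload bandwidth is somewhat lower once framing overhead is subtracted.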