“Big” and “Not so Big” Iron at SNL
Douglas Doerfler, Sandia National Labs
SOS8, Charleston, SC, April 13th, 2004

SNL CS R&D Accomplishment: Pathfinder for MPP Supercomputing
(Timeline figure: nCUBE, Paragon, ASCI Red, Cplant, ICC, Red Storm)
Sandia successfully led the DOE/DP revolution into MPP supercomputing through CS R&D:
–nCUBE-10
–nCUBE-2
–iPSC/860
–Intel Paragon
–ASCI Red
–Cplant
… and gave DOE a strong, scalable parallel platforms effort.
Computing at SNL is an applications success (i.e., uniquely high scalability & reliability among FFRDCs) because CS R&D paved the way.
Note: There was considerable skepticism in the community that MPP computing would be a success.

Our Approach
Large systems with a few processors per node
Message passing paradigm
Balanced architecture
Efficient systems software
Critical advances in parallel algorithms
Real engineering applications
Vertically integrated technology base
Emphasis on scalability & reliability in all aspects

A Scalable Computing Architecture

ASCI Red
4,576 compute nodes
–9,472 Pentium II processors
800 MB/sec bi-directional interconnect
3.21 TFlops peak
–2.34 TFlops on Linpack
–74% of peak
–9,632 processors
TOS on service nodes
Cougar LWK on compute nodes
1.0 GB/sec parallel file system

Computational Plant
Antarctica - 2,376 nodes
Antarctica has 4 “heads” with a switchable center section:
–Unclassified Restricted Network
–Unclassified Open Network
–Classified Network
Compaq (HP) DS10L “Slates”
–466 MHz EV6, 1 GB RAM
–600 MHz EV67, 1 GB RAM
Re-deployed Siberia XP1000 nodes
–500 MHz EV6, 256 MB RAM
Myrinet
–3D mesh topology
–33 MHz, 64-bit
–A mix of 1,280 and 2,000 Mbit/sec technology
–LANai 7.x and 9.x
Runtime software
–Yod - application loader
–Pct - compute node process control
–Bebopd - allocation
–OpenPBS - batch scheduling
–Portals message passing API
Red Hat Linux 7.2 w/2.4.x kernel
Compaq (HP) Fortran, C, C++
MPICH over Portals
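The stack above ends with MPICH layered over the Portals API, so applications on Cplant are written to the standard MPI interface. As a minimal, hedged illustration of that message-passing model, here is a plain MPI ping between two ranks; it uses only standard MPI calls, is not Cplant- or Portals-specific, and would typically be built with an MPI compiler wrapper such as mpicc.

/* Minimal MPI ping between rank 0 and rank 1.
 * A generic sketch of the message-passing model used on Cplant;
 * it relies only on standard MPI calls, nothing Cplant-specific. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d from rank 0\n", value);
        }
    }

    MPI_Finalize();
    return 0;
}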

Institutional Computing Clusters
Two (classified/unclassified) 256-node clusters in NM:
–236 compute nodes
  –Dual 3.06 GHz Xeon processors, 2 GB memory
  –Myricom Myrinet PCI NIC (XP, Rev D, 2 MB)
–2 admin nodes
–4 login nodes
–2 MetaData Server (MDS) nodes
–12 Object Store Target (OST) nodes
–256-port Myrinet switch
128-node (unclassified) and 64-node (classified) clusters in CA
Compute nodes:
–RedHat Linux 7.3
–Application directory
  –MKL math library
  –TotalView client
  –VampirTrace client
  –MPICH-GM
–OpenPBS client
–PVFS client
–Myrinet GM
Login nodes:
–RedHat Linux 7.3
–Kerberos
–Intel compilers
  –C, C++
  –Fortran
–Open source compilers
  –gcc
  –Java
–TotalView
–VampirTrace
–Myrinet GM
Administrative nodes:
–Red Hat Linux 7.3
–OpenPBS
–Myrinet GM w/Mapper
–SystemImager
–Ganglia Mon
–CAP
–Tripwire

Usage

Red Squall Development Cluster
Hewlett Packard collaboration
–Integration, testing, system SW support
–Lustre and Quadrics expertise
RackSaver BladeRack nodes
–High-density compute server architecture: 66 nodes (132 processors) per rack
–2.0 GHz AMD Opteron, same as Red Storm but with commercial Tyan motherboards
–2 GB of main memory per node (same as Red Storm)
Quadrics QsNetII (Elan4) interconnect
–Best-in-class (commercial cluster interconnect) performance
I/O subsystem uses DDN S2A8500 couplets with Fibre Channel disk drives (same as Red Storm)
–Best-in-class performance
Located in the new JCEL facility

Douglas Doerfler, Sandia National Labs
SOS8, Charleston, SC, April 13th, 2004

Red Storm Goals
Balanced system performance - CPU, memory, interconnect, and I/O.
Usability - functionality of hardware and software meets the needs of users for massively parallel computing.
Scalability - system hardware and software scale from a single-cabinet system to a ~20,000 processor system.
Reliability - machine stays up long enough between interrupts to make real progress on completing an application run (at least 50 hours MTBI); requires full system RAS capability.
Upgradability - system can be upgraded with a processor swap and additional cabinets to 100 TFlops or greater.
Red/Black switching - capability to switch major portions of the machine between classified and unclassified computing environments.
Space, power, cooling - high-density, low-power system.
Price/performance - excellent performance per dollar; use high-volume commodity parts where feasible.

Red Storm Architecture
True MPP, designed to be a single system.
Distributed memory MIMD parallel supercomputer.
Fully connected 3-D mesh interconnect. Each compute node and service and I/O node processor has a high-bandwidth, bi-directional connection to the primary communication network.
108 compute node cabinets and 10,368 compute node processors (2.0 GHz AMD Opteron).
~10 TB of DDR 333 MHz memory.
Red/Black switching - ~1/4, ~1/2, ~1/4.
8 service and I/O cabinets on each end (256 processors for each color).
240 TB of disk storage (120 TB per color).
Functional hardware partitioning - service and I/O nodes, compute nodes, and RAS nodes.
Partitioned operating system (OS) - Linux on service and I/O nodes, LWK (Catamount) on compute nodes, stripped-down Linux on RAS nodes.
Separate RAS and system management network (Ethernet).
Router-table-based routing in the interconnect.
Less than 2 MW total power and cooling.
Less than 3,000 square feet of floor space.

Red Storm Layout
Less than 2 MW total power and cooling.
Less than 3,000 square feet of floor space.
Separate RAS and system management network (Ethernet).
3-D mesh: 27 x 16 x 24 (x, y, z) (a coordinate sketch follows this slide).
Red/Black split: 2688 : 4992 : 2688.
Service & I/O: 2 x 8 x 16.
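To make the 3-D mesh dimensions above concrete, the following is a small, hedged C sketch that maps a linear node index onto (x, y, z) coordinates in a 27 x 16 x 24 mesh and lists its mesh neighbors; the axis order and numbering convention are assumptions for illustration only, not Red Storm's actual numbering or router-table routing.

/* Hypothetical node-index to (x, y, z) mapping for a 27 x 16 x 24 mesh.
 * The numbering convention is assumed for illustration; Red Storm's
 * actual router-table routing is not modeled here. */
#include <stdio.h>

#define NX 27
#define NY 16
#define NZ 24

static void node_to_xyz(int node, int *x, int *y, int *z)
{
    *x = node % NX;
    *y = (node / NX) % NY;
    *z = node / (NX * NY);
}

int main(void)
{
    int x, y, z;
    int node = 5000;    /* arbitrary example index, 0 .. NX*NY*NZ-1 */

    node_to_xyz(node, &x, &y, &z);
    printf("node %d -> (%d, %d, %d)\n", node, x, y, z);

    /* A mesh (non-wraparound) node has up to six neighbors,
     * +/-1 in each dimension, dropped at the mesh faces. */
    if (x > 0)      printf("  -x: (%d, %d, %d)\n", x - 1, y, z);
    if (x < NX - 1) printf("  +x: (%d, %d, %d)\n", x + 1, y, z);
    if (y > 0)      printf("  -y: (%d, %d, %d)\n", x, y - 1, z);
    if (y < NY - 1) printf("  +y: (%d, %d, %d)\n", x, y + 1, z);
    if (z > 0)      printf("  -z: (%d, %d, %d)\n", x, y, z - 1);
    if (z < NZ - 1) printf("  +z: (%d, %d, %d)\n", x, y, z + 1);
    return 0;
}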

Red Storm Cabinet Layout
Compute node cabinet (a processor-count check follows this slide)
–3 card cages per cabinet
–8 boards per card cage
–4 processors per board
–4 NIC/router chips per board
–N+1 power supplies
–Passive backplane
Service and I/O node cabinet
–2 card cages per cabinet
–8 boards per card cage
–2 processors per board
–2 NIC/router chips per board
–PCI-X for each processor
–N+1 power supplies
–Passive backplane
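As a quick consistency check (an added back-of-the-envelope calculation, not from the original slides), the compute cabinet layout above reproduces the processor count given on the architecture slide:
108 cabinets × 3 card cages × 8 boards × 4 processors per board = 10,368 compute processors.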

Red Storm Software
Operating systems
–Linux on service and I/O nodes
–LWK (Catamount) on compute nodes
–Linux on RAS nodes
File systems
–Parallel file system - Lustre (PVFS)
–Unix file system - Lustre (NFS)
Run-time system
–Logarithmic loader
–Node allocator
–Batch system - PBS
–Libraries - MPI, I/O, math
Programming model
–Message passing
–Support for heterogeneous applications
Tools
–ANSI standard compilers - Fortran, C, C++
–Debugger - TotalView
–Performance monitor
System management and administration
–Accounting
–RAS GUI interface
–Single system view

Red Storm Performance
Based on application code testing on production AMD Opteron processors, we now expect Red Storm to deliver around a 10x performance improvement over ASCI Red on Sandia's suite of application codes.
Expected MP-Linpack performance - ~30 TFlops.
Processors
–2.0 GHz AMD Opteron (Sledgehammer)
–Integrated dual-channel DDR 333 MHz memory controller
  –Page miss latency to local processor memory is ~80 nanoseconds.
  –Peak bandwidth of ~5.3 GB/s for each processor.
–3 integrated HyperTransport links, 3.2 GB/s each direction
Interconnect performance
–Latency: <2 µs (neighbor), <5 µs (full machine)
–Peak link bandwidth: ~3.84 GB/s each direction
–Bisection bandwidth: ~2.95 TB/s Y-Z, ~4.98 TB/s X-Z, ~6.64 TB/s X-Y (back-of-the-envelope checks follow this slide)
I/O system performance
–Sustained file system bandwidth of 50 GB/s for each color
–Sustained external network bandwidth of 25 GB/s for each color
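Two of the figures above can be reproduced from the stated parameters; these are added back-of-the-envelope checks, not from the original slides:
Memory: 2 DDR channels × 8 bytes per transfer × 333 MT/s ≈ 5.3 GB/s peak per processor.
Y-Z bisection: a cut across the X dimension severs 16 × 24 = 384 links; 384 × 3.84 GB/s per direction × 2 directions ≈ 2.95 TB/s.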

HPC R&D Efforts at SNL
Advanced architectures
–Next-generation processor & interconnect technologies
–Simulation and modeling of algorithm performance
Message passing
–Portals
–Application characterization of message passing patterns (a generic profiling sketch follows this slide)
Lightweight kernels
–Project to design a next-generation lightweight kernel (LWK) for compute nodes of a distributed memory massively parallel system
–Assess the performance, scalability, and reliability of a lightweight kernel versus a traditional monolithic kernel
–Investigate efficient methods of supporting dynamic operating system services
Lightweight file system
–Only critical I/O functionality (storage, metadata management, security)
–Special functionality implemented in I/O libraries (above LWFS)
Lightweight OS
–Linux configuration to eliminate the need for a remote /root
–“Trimming” the kernel to eliminate unwanted and unnecessary daemons
Cluster management tools
–Diskless cluster strategies and techniques
–Operating system distribution and initialization
Log analysis to improve robustness, reliability, and maintainability
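One common way to do the application characterization of message-passing patterns mentioned above is the MPI profiling (PMPI) interface: a wrapper library intercepts MPI calls, records statistics, and forwards them to the underlying implementation. The sketch below is a generic, hedged illustration of that technique rather than Sandia's actual characterization tooling; the const qualifier on the send buffer assumes an MPI-3-era mpi.h and should be dropped for older MPI headers.

/* Generic PMPI wrapper that counts point-to-point sends and bytes.
 * Linked ahead of the MPI library, MPI_Send calls land here and are
 * forwarded to PMPI_Send. Illustrative only; not Sandia's tooling. */
#include <mpi.h>
#include <stdio.h>

static long send_calls = 0;
static long send_bytes = 0;

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int type_size;

    PMPI_Type_size(datatype, &type_size);
    send_calls += 1;
    send_bytes += (long)count * type_size;
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

int MPI_Finalize(void)
{
    int rank;

    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: %ld MPI_Send calls, %ld bytes sent\n",
           rank, send_calls, send_bytes);
    return PMPI_Finalize();
}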

More Information
Computation, Computers, Information and Mathematics Center