NCSA Terascale Clusters
Dan Reed
Director, NCSA and the Alliance
Chief Architect, NSF ETF TeraGrid
Principal Investigator, NSF NEESgrid
William and Jane Marr Gutgsell Professor, University of Illinois
National Center for Supercomputing Applications

A Blast From the Past …
"Everybody who has analyzed the logical theory of computers has come to the conclusion that the possibilities of computers are very interesting – if they could be made to be more complicated by several orders of magnitude."
Richard Feynman, December 29, 1959
Feynman would be proud!

NCSA Terascale Linux Clusters
1 TF IA-32 Pentium III cluster (Platinum)
– 512 1 GHz dual processor nodes
– Myrinet 2000 interconnect
– 5 TB of RAID storage
– 594 GF (Linpack), production July 2001
1 TF IA-64 Itanium cluster (Titan)
– dual processor Itanium nodes
– Myrinet 2000 interconnect
– 678 GF (Linpack), production March 2002
Large-scale calculations on both
– molecular dynamics (Schulten): first nanosecond/day calculations
– gas dynamics (Woodward)
– others underway via NRAC allocations
Software packaging for communities
– NMI GRIDS Center, Alliance "In a Box" …
Lessons for TeraGrid
[Photo: NCSA machine room]
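As a rough sanity check on the Platinum numbers (my arithmetic, not from the slides; it assumes one double-precision flop per cycle per 1 GHz Pentium III):

512 nodes × 2 processors × 1 GHz × 1 flop/cycle ≈ 1.02 TF peak
594 GF (Linpack) / 1,024 GF (peak) ≈ 58% efficiency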

Platinum Software Configuration
– Linux: RedHat 6.2 and Linux SMP kernel
– OpenPBS: resource management and job control
– Maui Scheduler: advanced scheduling
– Argonne MPICH: parallel programming API
– NCSA VMI: communication middleware (MPICH and Myrinet)
– Myricom GM: Myrinet communication layer
– NCSA cluster monitor
– IBM GPFS
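To make the software stack concrete, here is a minimal MPI program of the kind users ran through this stack (a sketch of my own, not from the slides; it assumes the standard MPICH front ends mpicc and mpirun, with a site batch script wrapping the launch under OpenPBS/Maui):

/* ring.c - minimal MPICH example: each rank reports its host, then a token
 * is passed around a ring so every rank exercises the interconnect.
 * Build: mpicc -O2 -o ring ring.c
 * Run:   mpirun -np 4 ./ring   (use 2 or more processes)
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &name_len);
    printf("rank %d of %d running on %s\n", rank, size, name);

    if (rank == 0) {
        /* Rank 0 starts the token and waits for it to come all the way around. */
        token = 42;
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 0 received the token back: %d\n", token);
    } else {
        /* Every other rank receives from its left neighbor and forwards to the right. */
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}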

Session Questions
Cluster performance and expectations
– generally met, though with the usual hiccups
MTBI and failure modes
– node and disk loss (stay tuned for my next talk …)
– copper Myrinet (fiber much more reliable)
– avoid open house demonstrations
System utilization
– heavily oversubscribed (see queue delays below)
Primary complaints
– long batch queue delays
– capacity vs. capability balance
– ISV code availability
– software tools: debuggers and performance tools
– I/O and parallel file system performance

NCSA IA-32 Cluster Timeline (January to July 2001)
– Order placed with IBM for a 512 compute node cluster
– 2/23: First four racks of IBM hardware arrive
– 3/1: Head nodes operational
– 3/10: First 126 processor Myrinet test jobs
– 3/13: Final IBM hardware shipment
– 3/22: First application for compute nodes (CMS/Koranda/Litvin)
– 3/26: Initial Globus installation
– 3/26: Final Myrinet hardware arrives
– 3/26: First 512 processor MILC and NAMD runs
– 4/1: Friendly user period begins
– 4/5: Myrinet static mapping in place
– 4/7: CMS runs successfully
– April: large HPL runs completing
– 4/12: Myricom engineering assistance
– 5/8: 1000p MP Linpack runs
– May: Top500 run
– May: kernel testing
– 5/28: RedHat 7.1 testing
– July 2001: Production service

NCSA Resource Usage

Alliance HPC Usage
[Chart: normalized CPU hours (NU), 0 to 35,000,000, FY98 through FY02, for NCSA Total and Alliance Partner Total, annotated with "Clusters in Production". Source: PACI Usage Database]

Hero Cluster Jobs
[Chart: CPU hours consumed by hero jobs on Platinum and Titan]

Storm Scale Prediction
Sample four hour forecast
– Center for Analysis and Prediction of Storms
– Advanced Regional Prediction System: full-physics mesoscale prediction system
Execution environment
– NCSA Itanium Linux cluster
– 240 processors, 4 hours per night for 46 days
Fort Worth forecast
– four hour prediction on a 3 km grid
– initial state includes assimilation of WSR-88D reflectivity and radial velocity data, plus surface and upper air data, satellite, and wind
On-demand computing required
Source: Kelvin Droegemeier
[Images: radar observations and 2 hr forecast with radar data assimilation]
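The nightly commitment is easy to quantify (my arithmetic from the numbers above): 240 processors × 4 hours per night × 46 nights ≈ 44,000 processor-hours that must be available at a fixed time each evening, which is what drives the on-demand computing requirement.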

NCSA Multiphase Strategy
Multiple user classes
– ISV software, hero calculations
– distributed resource sharing, parameter studies
Four hardware approaches
– Shared memory multiprocessors: 12 32-way IBM p690 systems (2 TF peak), large memory and ISV support
– TeraGrid IPF clusters: 64-bit Itanium2/Madison (10 TF peak), coupling with SDSC, ANL, Caltech, and PSC
– Xeon clusters: 32-bit systems for hero calculations, dedicated sub-clusters (2-3 TF each) allocated for weeks
– Condor resource pools: parameter studies and load sharing

Extensible TeraGrid Facility (ETF)
– NCSA (compute intensive): 10 TF IA-64 plus large memory nodes, 230 TB disk storage, 3 PB tape storage, GPFS and data mining
– SDSC (data intensive): 4 TF IA-64, DB2 and Oracle servers, 500 TB disk storage, 6 PB tape storage, 1.1 TF Power4
– PSC (compute intensive): 6 TF EV68, 71 TB storage, 0.3 TF EV7 shared memory, 150 TB storage server
– ANL (visualization): 1.25 TF IA-64 visualization nodes, 20 TB storage
– Caltech (data collection analysis): 0.4 TF IA-64, IA-32 Datawulf, 80 TB storage
– Extensible backplane network: LA and Chicago hubs, 30 Gb/s site connections, 40 Gb/s backplane routers
[Diagram legend: cluster, shared memory, visualization cluster, storage server, disk storage, backplane router; processor types IA-64, IA-32, Power4, EV68, EV7, Sun]

NCSA TeraGrid: 10 TF IPF and 230 TB
– 2 TF Itanium2: 256 nodes, each 2p 1 GHz with 4 or 12 GB memory and 73 GB scratch
– ~700 Madison nodes: each 2p Madison with 4 GB memory and 73 GB scratch (being installed now)
– Interactive and spare nodes: login and FTP, 10 2p Itanium2 nodes and 10 2p Madison nodes
– Fabrics: Myrinet fabric and GbE fabric, connected to the TeraGrid network
– Storage: 230 TB behind Brocade switches (256 2x FC), storage I/O over Myrinet and/or GbE
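For a rough sense of how the 10 TF figure decomposes (my arithmetic, not from the slide; Itanium2 and Madison each retire 4 flops per cycle, and the Madison clock, which the slide does not give, is assumed to be roughly 1.3 to 1.5 GHz):

256 nodes × 2 processors × 1 GHz × 4 flops/cycle ≈ 2.05 TF (Itanium2 portion)
~700 nodes × 2 processors × 1.3 to 1.5 GHz × 4 flops/cycle ≈ 7 to 8 TF (Madison portion)

Together these account for roughly the 10 TF IPF total in the slide title.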