Blue Gene Experience at the National Center for Atmospheric Research
Theron Voran
Computer Science Section, National Center for Atmospheric Research
Department of Computer Science, University of Colorado at Boulder
October 4, 2006

Why Blue Gene?
- Extreme scalability, balanced architecture, simple design
- Efficient energy usage
- A change from the IBM POWER systems at NCAR, but familiar:
  - Programming model
  - Chip (similar to POWER4)
  - Linux on the front-end and I/O nodes
- Interesting research platform

Outline
- System Overview
- Applications
- In the Classroom
- Scheduler Development
- TeraGrid Integration
- Other Current Research Activities

Frost Fun Facts
- Collaborative effort:
  - University of Colorado at Boulder (CU)
  - NCAR
  - University of Colorado at Denver
- Debuted in June 2005, tied for 58th place on the Top500
- 5.73 TFlops peak, 4.71 TFlops sustained
- 25 kW loaded power usage
- 4 front ends, 1 service node
- 6 TB usable storage
- Why is it leaning?
(Photo: Henry Tufo and Rich Loft with Frost)

System Internals
(Diagram: the Blue Gene/L system-on-a-chip)

More Details
- Chips
  - 2 cores per node
  - 512 MB of memory per node
  - Coprocessor vs. virtual node mode
  - 1:32 I/O-to-compute node ratio
- Interconnects
  - 3D torus (154 MB/s in one direction)
  - Tree (354 MB/s)
  - Global interrupt
  - GigE
  - JTAG/IDO
- Storage
  - 4 POWER5 systems as a GPFS cluster
  - NFS export to the BG/L I/O nodes
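As a back-of-envelope check of what those per-node figures add up to (a hedged sketch; the 1,024-node count is inferred from the 2,048 processors mentioned later in these slides and the 2 cores per node above):

```python
# Rack-level totals for Frost derived from the per-node figures above.
# Assumption: 1,024 compute nodes, i.e. the 2,048 processors quoted in
# these slides at 2 cores per node.
nodes = 1024
cores_per_node = 2
mem_per_node_mb = 512
io_ratio = 32                             # 1 I/O node per 32 compute nodes

cores = nodes * cores_per_node            # 2,048 processors
mem_gb = nodes * mem_per_node_mb / 1024   # 512 GB aggregate memory
io_nodes = nodes // io_ratio              # 32 I/O nodes

print(f"{cores} cores, {mem_gb:.0f} GB memory, {io_nodes} I/O nodes")
```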

Frost Utilization

HOMME
- High Order Method Modeling Environment
- Spectral element dynamical core
- Proven scalable on other platforms
- Cubed-sphere topology
- Space-filling curves (see the partitioning sketch below)
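To illustrate the idea behind space-filling-curve partitioning (a minimal sketch, not HOMME's actual implementation, which orders cubed-sphere elements rather than a flat grid), a Z-order/Morton curve can linearize a block of elements so that contiguous chunks of the curve become compact, load-balanced partitions:

```python
# Minimal sketch of space-filling-curve partitioning using a Z-order
# (Morton) curve on a flat 2D grid of elements, for illustration only.

def morton_index(x, y, bits=4):
    """Interleave the bits of (x, y) to get the element's position on the curve."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)
        idx |= ((y >> b) & 1) << (2 * b + 1)
    return idx

def partition(n, nprocs):
    """Assign an n x n grid of elements to nprocs ranks in curve order."""
    elems = sorted(((x, y) for x in range(n) for y in range(n)),
                   key=lambda e: morton_index(*e))
    chunk = len(elems) // nprocs
    return [elems[r * chunk:(r + 1) * chunk] for r in range(nprocs)]

# Example: 16 x 16 elements spread over 8 ranks -> 32 elements per rank.
parts = partition(16, 8)
print([len(p) for p in parts])
```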

HOMME Performance
- Ported in 2004 on the BG/L prototype at TJ Watson, with the eventual goal of a Gordon Bell submission in 2005
- Serial and parallel obstacles:
  - SIMD instructions
  - Eager vs. adaptive routing
  - Mapping strategies (see the snake-mapping sketch below)
- Result: good scalability out to 32,768 processors (3 elements per processor)
(Figure: snake mapping on an 8x8x8 3D torus)
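As an illustration of one such mapping strategy (a hedged sketch of a generic "snake" ordering, not the exact mapping used on Frost), consecutive MPI ranks can be placed on neighboring torus coordinates by reversing the traversal direction on alternating rows and planes:

```python
# Sketch of a "snake" (boustrophedon) rank-to-torus mapping on an 8x8x8 torus.
# Consecutive ranks land on neighboring (x, y, z) coordinates, so nearest-
# neighbor MPI traffic mostly travels a single torus hop.

def snake_mapping(nx=8, ny=8, nz=8):
    coords = []
    for z in range(nz):
        ys = range(ny) if z % 2 == 0 else reversed(range(ny))
        for y in ys:
            xs = range(nx) if (y + z) % 2 == 0 else reversed(range(nx))
            for x in xs:
                coords.append((x, y, z))
    return coords  # coords[rank] = torus coordinate of that MPI rank

mapping = snake_mapping()
# Adjacent ranks differ by a single torus hop, even across row/plane boundaries:
print(mapping[0], mapping[1], mapping[63], mapping[64])
```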

HOMME Scalability on 32 Racks

Other Applications
- Popular codes on Frost:
  - WRF
  - CAM, POP, CICE
  - MPIKAIA
  - EULAG
  - BOB
  - PETSc
- Used as a scalability test bed in preparation for runs on the 20-rack BG/W system

Classroom Access
- Henry Tufo’s ‘High Performance Scientific Computing’ course at the University of Colorado
- Let students loose on 2,048 processors
- Thinking BIG:
  - Throughput and latency studies
  - Scalability tests: Conway’s Game of Life (see the sketch below)
  - Final projects
- Feedback from ‘novice’ HPC users
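For reference, the serial kernel of such a Game of Life exercise is tiny; the scaling work (an assumption here, since the slides give no details) would typically split the grid across MPI ranks and exchange halo rows each step. A minimal serial step:

```python
import numpy as np

def life_step(grid):
    """One Conway's Game of Life update on a 2D array of 0s and 1s
    (periodic boundaries). A parallel version would decompose the grid
    across ranks and exchange one-row halos before each step."""
    # Count the eight neighbors of every cell by summing shifted copies.
    nbrs = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0))
    # Standard rules: survive with 2-3 neighbors, birth with exactly 3.
    return ((nbrs == 3) | ((grid == 1) & (nbrs == 2))).astype(grid.dtype)

grid = np.random.randint(0, 2, size=(64, 64))
for _ in range(10):
    grid = life_step(grid)
print(grid.sum(), "live cells after 10 steps")
```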

Cobalt
- Component-Based Lightweight Toolkit
- Open-source resource manager and scheduler
- Developed by ANL along with NCAR/CU
- Component architecture:
  - Communication via XML-RPC (see the sketch below)
  - Process manager, queue manager, scheduler
- ~3,000 lines of Python code
- Also manages traditional clusters
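The XML-RPC pattern behind Cobalt's components looks roughly like the following (a minimal sketch using Python's standard library, not Cobalt's real interfaces; the get_jobs method and port number are hypothetical):

```python
# Sketch of XML-RPC-style component communication, in the spirit of Cobalt's
# queue manager / scheduler split. The "get_jobs" method and port 8900 are
# hypothetical, not Cobalt's actual API.
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def serve_queue_manager():
    server = SimpleXMLRPCServer(("localhost", 8900), logRequests=False)
    jobs = [{"jobid": 1, "nodes": 32, "state": "queued"}]
    server.register_function(lambda: jobs, "get_jobs")
    server.serve_forever()

threading.Thread(target=serve_queue_manager, daemon=True).start()
time.sleep(0.5)  # give the component a moment to start listening

# A scheduler component would poll the queue manager over XML-RPC:
proxy = xmlrpc.client.ServerProxy("http://localhost:8900")
print(proxy.get_jobs())
```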

Cobalt Architecture

Cobalt Development Areas
- Scheduler improvements:
  - Efficient packing
  - Multi-rack challenges
  - Simulation ability
  - Tunable scheduling parameters
- Visualization:
  - Aid in scheduler development
  - Give users (and admins) a better understanding of machine allocation
- Accounting / project management and logging
- Blue Gene/P
- TeraGrid integration

NCAR joins the TeraGrid, June 2006

TeraGrid Testbed
(Diagram: production and experimental environments, with the CSS and NETS switches connecting the computational and storage clusters to the TeraGrid, NLR, FRGP, the CU experimental network, and the datagrid over NCAR's 1 Gb network)

TeraGrid Activities
- Grid-enabling Frost:
  - Common TeraGrid Software Stack (CTSS)
  - Grid Resource Allocation Manager (GRAM) and Cobalt interoperability
  - Security infrastructure
- Storage cluster:
  - 16 OSTs, TB usable storage
  - 10G connectivity
  - GPFS-WAN
  - Lustre-WAN

Other Current Research Activities
- Scalability of CCSM components:
  - POP
  - CICE
- Scalable solver experiments
- Efficient communication mapping:
  - Coupled climate models
  - Petascale parallelism
- Meta-scheduling:
  - Across sites
  - Cobalt vs. other schedulers
- Storage:
  - PVFS2 + ZeptoOS
  - Lustre

Frost has been a success as a…
- Research experiment
  - Utilization rates
- Educational tool
  - Classroom
  - Fertile ground for grad students
- Development platform
  - Petascale problems
  - Systems work

Questions?