Overview of the New Blue Gene/L Computer
Dr. Richard D. Loft, Deputy Director of R&D, Scientific Computing Division, National Center for Atmospheric Research

Outline
– What is Blue Gene/L and why is it interesting?
– How did one end up at NCAR?
– What is the objective of the NCAR Blue Gene/L project?
– What is its status?
– How do I get an account on Blue Gene/L?

Why Blue Gene/L is Interesting
Features
– Massive parallelism - fastest in the world (137 Tflops)
– Achieves high packaging density (2048 PEs/rack)
– Lower power per processor (25 kW/rack)
– Dedicated reduction network (solver scalability)
– Puts network interfaces on chip (embedded tech.)
– Conventional programming model: xlf90 and xlc compilers, MPI (see the example below)
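A minimal sketch of what the conventional programming model means in practice: an ordinary MPI program in C, with every processor running its own rank. The exact cross-compiler wrapper used to build it varies by installation, so no compile command is shown here.

    /* hello_bgl.c - minimal MPI program; every compute node rank reports in.
       A sketch only; build with the site's MPI C cross-compiler wrapper. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0)
            printf("Running on %d MPI ranks\n", size);
        MPI_Finalize();
        return 0;
    }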

Fuel Efficiency: Gflops/Watt
[chart: Top 20 systems compared on Gflops/Watt, based on processor power rating only, with Blue Gene/L systems highlighted]

BG/L Questions/Limitations
Questions
– High reliability? (1/N effect)
– Applications for 100K processors? (Amdahl's Law)
– System robustness: I/O, scheduling flexibility
Limitations
– Node memory limitation (512 MB/node)
– Partitioning is quantized to powers of two (see the sketch below)
– Simple node kernel (no fork, hence no threads or OpenMP)
– No support for multiple executables
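A small sketch of what the last two hardware limits mean for an application: a node request is rounded up to the next power of two, and the per-rank working set has to fit in node memory. The 512 MB figure comes from the slide; the requested node count, footprint, and function names below are purely illustrative.

    /* partition_check.c - illustrative only: round a node request up to the
       next power of two (BG/L partitions are quantized) and check a per-rank
       working set against the 512 MB/node memory quoted on the slide. */
    #include <stdio.h>

    static unsigned next_pow2(unsigned n)
    {
        unsigned p = 1;
        while (p < n)
            p <<= 1;
        return p;
    }

    int main(void)
    {
        const unsigned requested_nodes = 300;          /* hypothetical request */
        const double   node_mem_mb     = 512.0;        /* from the slide */
        const double   per_rank_mb     = 420.0;        /* hypothetical footprint */

        unsigned granted = next_pow2(requested_nodes); /* 300 -> 512 nodes */
        printf("Requested %u nodes, partition granted: %u\n",
               requested_nodes, granted);
        printf("Per-rank footprint %.0f MB %s the %.0f MB node limit\n",
               per_rank_mb, per_rank_mb <= node_mem_mb ? "fits in" : "exceeds",
               node_mem_mb);
        return 0;
    }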

BlueGene/L ASIC

The Blue Gene/L Architecture

BlueGene/L Has Five Networks
3-Dimensional Torus
– interconnects all compute nodes
– 175 MB/sec/link bidirectional
Global Tree
– point-to-point, one-to-all broadcast, reduction functionality (see the example below)
– 1.5 microsecond latency (64K node system)
Global Interrupts
– AND/OR operations for global barriers
– 1.5 microseconds latency (64K system)
Ethernet
– incorporated into every node ASIC
– active in the I/O nodes (1:64 in LLNL configuration)
– 1K 1-Gbit links
– all external comm. (file I/O, control, user interaction, etc.)
JTAG (Control)
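The dedicated tree network is what makes global reductions cheap, and an application reaches it through ordinary MPI collectives; how the library maps the collective onto the hardware is its own business. A hedged sketch of the kind of operation involved:

    /* allreduce_sketch.c - a global sum across all ranks, the class of
       operation the collective (tree) network is designed to accelerate. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        local = (double)rank;          /* stand-in for a locally computed value */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Global sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }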

BlueGene/L System Software Architecture
User applications execute exclusively on the compute nodes
– avoids asynchronous events (e.g., daemons, interrupts)
The outside world interacts only with the I/O nodes, which act as an offload engine (see the sketch below)
– standard solution: Linux
Machine monitoring and control are also offloaded to service nodes: a large SP system or Linux cluster
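From the application's point of view this split is invisible: a compute node issues ordinary file operations and the traffic is serviced through the I/O nodes. A sketch using standard MPI-IO, with each rank writing its own block of one shared file; the file path is a placeholder, not a real Frost path.

    /* io_sketch.c - each rank writes its own contiguous block of a shared
       file; on BG/L the actual file traffic is handled via the I/O nodes.
       The path below is a placeholder. */
    #include <mpi.h>

    #define BLOCK 1024   /* doubles per rank, illustrative */

    int main(int argc, char **argv)
    {
        int rank;
        double buf[BLOCK];
        MPI_File fh;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 0; i < BLOCK; i++)
            buf[i] = rank + 0.001 * i;                /* dummy data */

        MPI_File_open(MPI_COMM_WORLD, "/ptmp/example.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        offset = (MPI_Offset)rank * BLOCK * sizeof(double);
        MPI_File_write_at(fh, offset, buf, BLOCK, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }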

Blue Gene/L system overview

Blue NCAR

How did one get to NCAR?
MRI proposal in partnership with the CU campuses
Elements of the MRI proposal to NSF: proving out an experimental architecture
– Application porting and scalability
– System software testing
  Parallel file systems (Lustre, GPFS)
  Schedulers (LSF, SLURM, COBALT)
– Education

BlueGene/L Collaboration
[diagram: NCAR, CU Boulder, and CU Denver partnering around the Blue Gene/L system]

BlueGene/L Collaborators
NCAR
– Richard Loft
– Janice Coen
– Stephen Thomas
– Wojciech Grabowski
CU Boulder
– Henry Tufo
– Xiao-Chuan Cai
– Charbel Farhat
– Thomas Manteuffel
– Stephen McCormick
CU Denver
– Jan Mandel
– Andrew Knyazev

Details of the NCAR/CU Blue Gene/L
– 2048 processors, 5.73 Tflops peak
– 4.61 Tflops on the Linpack benchmark
– Unofficially, 33rd fastest system in the world (in one rack!)
– 6 Tbytes of high-performance disk
– Delivered to the Mesa Lab: March 15th
– Acceptance tests: began March 23rd, completed March 28th
– First PI meeting: March 30th
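A quick check on the peak figure, assuming the commonly quoted BG/L numbers of a 700 MHz clock and 4 floating-point operations per cycle per processor (two fused multiply-adds on the dual FPU). Those per-processor figures are an assumption of mine, not taken from the slide; they are consistent with the 5.73 Tflops quoted above.

    /* peak_check.c - back-of-the-envelope peak for the one-rack system.
       700 MHz and 4 flops/cycle/processor are assumed BG/L figures. */
    #include <stdio.h>

    int main(void)
    {
        const double clock_hz        = 700e6;
        const double flops_per_cycle = 4.0;
        const int    processors      = 2048;

        double peak = clock_hz * flops_per_cycle * processors;
        printf("Peak = %.2f Tflops\n", peak / 1e12);   /* ~5.73 Tflops */
        return 0;
    }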

BG/L Front-End Architecture

Bring-up of the Frost BG/L System
Criteria for readiness
– Scheduler
– Fine-grain partitions
– I/O subsystem ready
– MSS connection

Current “Frost” BG/L Status
– MSS connections in place
– I/O system issues appear to be behind us
– Partition definitions (512, 256, 128, 4x32) in place
– Codes ported: POP, WRF, HOMME, BOB, BGC5 (pointwise)
– Biggest applications issue: memory footprint (see the sketch below)
– Establishing relationships with other centers
  BG/L Consortium membership
  Other BG/L sites: SDSC, Argonne, LLNL, Edinburgh
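Because the 512 MB/node limit is the dominant porting issue, a typical first step is a back-of-the-envelope estimate of the per-rank footprint for a given domain decomposition. A sketch follows; the grid dimensions, halo width, and field count are entirely hypothetical.

    /* footprint_check.c - estimate per-rank memory for a 3-D field
       decomposition against the 512 MB/node limit; the subdomain size,
       field count, and halo width below are hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        const int    nx = 64, ny = 64, nz = 40;   /* local subdomain */
        const int    halo = 2;                    /* ghost-cell width */
        const int    nfields = 60;                /* prognostic + work arrays */
        const double node_mem_mb = 512.0;         /* from the slide */

        long cells = (long)(nx + 2*halo) * (ny + 2*halo) * (nz + 2*halo);
        double mb  = cells * nfields * sizeof(double) / (1024.0 * 1024.0);

        printf("Estimated footprint: %.1f MB per rank (%s %g MB/node)\n",
               mb, mb <= node_mem_mb ? "fits in" : "exceeds", node_mem_mb);
        return 0;
    }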

“Frost” BG/L I/O performance

Blue Gene/L “Frost” Scheduler Status
IRC chat room scheduler - “hey, get off!” …done
LLNL SLURM scheduler - testing
– installed and tested; available for 512-node “midplane” partitions only
– LLNL testbed system will be used to port SLURM to smaller partitions
Argonne Cobalt scheduler - being installed
– DB2 client on the FEN
– Python
– ElementTree (XML processing library for Python)
– Xerces (XML parser)
– Supporting libraries (OpenSSL)
Platform LSF - development account provided

MRI Investigator Phase
MRI Investigator access only
– Users related to the MRI proposal
– Porting/testing evaluation
Applications
– HOMME atmospheric GCM dycore (Thomas)
– Wildfire modeling (Coen)
– Scalable solvers - algebraic multigrid (Manteuffel, McCormick)
– Numerical flight test simulation (Farhat)
– WRF - high resolution (Hacker)

User Access to Frost
Cycles split
– 50% UCAR
– 40% CU Boulder
– 10% CU Denver
Interested users (access policy TBD)
– UCAR: contact
– CU: contact

Questions?