Presentation transcript:

1 HPC and the ROMS BENCHMARK Program Kate Hedstrom August 2003

2 Outline
- New ARSC systems
- Experience with the ROMS benchmark problem
- Other computer news

3 New ARSC Systems
- Cray X1: 128 MSPs (1.5 TFLOPS), 4 GB/MSP, water cooled
- IBM p690+ and p655+: ... TFLOPS total, at least 2 GB/cpu, air cooled; arriving in September, switch later

4 Cray X1 (klondike)

5 (image-only slide)

6 Cray X1 Node
- Node is a 4-way SMP with 16 GB/node
- Each MSP has four vector/scalar processors
- Processors within an MSP share cache
- Node usable as 4 MSPs or 16 SSPs
- IEEE floating-point hardware

7 Cray Programming Environment
- Fortran, C, C++
- Support for MPI, SHMEM, Co-Array Fortran, UPC, and OpenMP (Fall 2003); a minimal hybrid example is sketched after this slide
- Compiling executes on the CPES (a Sun V480) and happens invisibly to the user
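The programming models named above are easier to picture with a concrete fragment. The following is a minimal sketch of a hybrid MPI + OpenMP program in C (an added illustration, not part of the original slides); the build command in the comment is a generic assumption, since the actual compiler invocations on the X1 and the IBMs differ.

```c
/* Minimal hybrid MPI + OpenMP sketch (added illustration, not from the
 * original slides).  Each MPI rank starts an OpenMP thread team and
 * reports which thread on which rank it is.
 * Build, e.g.:  mpicc -fopenmp hello_hybrid.c   (commands vary by system) */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank, nranks;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

#pragma omp parallel
    {
        printf("rank %d of %d, thread %d of %d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```

SHMEM, Co-Array Fortran, and UPC express the same kind of parallelism through one-sided puts/gets or a partitioned global address space rather than explicit message passing.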

8 (image-only slide)

9 IBM
- Two p690+: like our Regatta, but faster and with more memory (8 GB/cpu); shared memory across 32 CPUs; for big OpenMP jobs
- Six p655+ towers: like our SP, but faster and with more memory (2 GB/cpu); shared memory within each 8-CPU node, 92 nodes in all; for big MPI jobs and small OpenMP jobs

10 (image-only slide)

11 Benchmark Problem
- No external files to read
- Three different resolutions
- Periodic channel representing the Antarctic Circumpolar Current (ACC)
- Steep bathymetry
- Idealized winds, clouds, etc., but full computation of the atmospheric boundary layer
- KPP vertical mixing

12 (image-only slide)

13 (image-only slide)

14 IBM and SX6 Notes
- SX6 peak is 8 GFLOPS, Power4 peak is 5.2 GFLOPS
- Both achieve less than 10% of peak
- IBM scales better; a Cray person says the SX6 is even worse on more than one node
- The SX6 does best with a 1xN tiling, while the IBM does better closer to MxM, even though this problem is 512x64; the tile-shape sketch after this slide illustrates why
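The tiling remark is easier to see with numbers. The sketch below is an added illustration: the NtileI/NtileJ names follow ROMS usage, but the program itself is hypothetical and the tile counts are arbitrary. It splits the 512x64 benchmark grid into tiles and prints each tile's shape plus a rough halo-to-interior ratio. A 1xN split keeps the inner dimension 512 points long, preserving vector length for the SX6, while squarer MxM-style splits give the lower halo overhead and better cache locality that suit the Power4.

```c
/* Illustrative only: tile shapes when a 512 x 64 grid is split into
 * NtileI x NtileJ tiles (names follow ROMS usage).  Long, thin 1xN tiles
 * preserve vector length in the inner dimension; squarer tiles have a
 * smaller halo-to-interior ratio, which tends to help cache-based CPUs. */
#include <stdio.h>

static void show_tiling(int Lm, int Mm, int NtileI, int NtileJ)
{
    int ti = Lm / NtileI;          /* interior points per tile in I      */
    int tj = Mm / NtileJ;          /* interior points per tile in J      */
    double halo = 2.0 * (ti + tj); /* perimeter ~ halo points exchanged  */
    printf("%2d x %2d tiles -> each tile %3d x %2d, halo/interior = %.3f\n",
           NtileI, NtileJ, ti, tj, halo / (ti * tj));
}

int main(void)
{
    const int Lm = 512, Mm = 64;   /* benchmark grid size from the slides */
    show_tiling(Lm, Mm, 1, 16);    /* vector-friendly: long I dimension   */
    show_tiling(Lm, Mm, 4, 4);     /* squarer tiles: better cache reuse   */
    show_tiling(Lm, Mm, 8, 2);
    return 0;
}
```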

15 Cray X1 Notes
- Have a choice of MSP or SSP mode: four SSPs are faster than one MSP, but sixteen MSPs are much faster than 64 SSPs
- On one MSP, vanilla ROMS spends 66% of its time in bulk_flux, 28% in LMD, and 2% in the 2-D engine
- Slower than either the Power4 or the SX6
- LMD can be sped up greatly by inlining lmd_wscale via a compiler option; John Levesque has offered to rewrite bulk_flux, with an aim of 6-8 times faster than the Power4 for CCSM (a back-of-the-envelope check follows this slide)
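As a rough check on where the rewrite effort pays off, the profile numbers above can be pushed through Amdahl's law. The per-routine speedup factors in the sketch below are placeholders (the slide only says LMD can be sped up a great deal and that bulk_flux will be rewritten), so treat the output as illustrative only.

```c
/* Back-of-the-envelope Amdahl's-law check (added illustration, not from
 * the slides): given the one-MSP profile of 66% bulk_flux, 28% LMD and
 * the remainder elsewhere, what overall speedup follows from speeding up
 * just those two routines? */
#include <stdio.h>

int main(void)
{
    const double f_bulk = 0.66, f_lmd = 0.28;      /* fractions from the slide */
    const double f_rest = 1.0 - f_bulk - f_lmd;

    /* Placeholder per-routine speedups; the actual achievable factors
     * are not stated in the slides. */
    const double s_bulk = 8.0, s_lmd = 4.0;

    double t_new = f_bulk / s_bulk + f_lmd / s_lmd + f_rest;
    printf("overall speedup ~ %.1fx\n", 1.0 / t_new);
    return 0;
}
```

With these placeholder factors the answer comes out to roughly 4-5x; the real point is that bulk_flux, at two-thirds of the runtime, bounds what any single-routine fix can deliver, which is why it is the first target.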

16 Clusters
- Can buy rack-mounted turnkey systems running Linux
- Need to spend money on:
  - Memory
  - Processors (single-CPU nodes may be best)
  - Switch: low latency, high bandwidth (a ping-pong sketch after this slide shows how those properties are measured)
  - Disk storage
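"Low latency, high bandwidth" are exactly the quantities a ping-pong microbenchmark reports, so a rough MPI sketch is included below (an added illustration, not from the slides). Run across two nodes, the small-message times approximate the switch latency and the large-message rates approach its bandwidth.

```c
/* Rough MPI ping-pong sketch (added illustration, not from the slides).
 * Ranks 0 and 1 bounce messages of increasing size; small messages
 * expose the switch latency, large ones approach its bandwidth. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {                       /* needs at least two ranks */
        if (rank == 0) fprintf(stderr, "run with >= 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    for (int bytes = 8; bytes <= (1 << 20); bytes *= 4) {
        char *buf = calloc(bytes, 1);
        const int reps = 100;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = (MPI_Wtime() - t0) / (2.0 * reps);   /* one-way time */

        if (rank == 0)
            printf("%8d bytes: %9.2f us  %8.1f MB/s\n",
                   bytes, dt * 1e6, bytes / dt / 1e6);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```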

17 Don Morton’s Experience
- No such thing as a turnkey Beowulf
- Need someone to take care of it:
  - Configure the queuing system to make it useful for more than one user
  - Security updates
  - Backups

18 DARPA Petaflops Award
- Sun, IBM, and Cray each awarded ~$50 million for phase-two development; two of them will be awarded phase 3 in 2006
- Goal is to achieve a petaflops by about 2010, along with easier programming and a more robust operating environment
- Sun: new switch between CPUs and memory
- IBM: huge on-chip cache
- Cray: a mix of heavyweight and lightweight CPUs

19 Conclusions
- Things are still exciting in the computer industry
- The only thing you can count on is change