Review of NCAR. Al Kellie, SCD Director. November 01, 2001.

Presentation transcript:

Review of NCAR. Al Kellie, SCD Director. November 01, 2001

Outline of Presentation
– Introduction to UCAR, NCAR, SCD
– Overview of divisional activities
  – Research data sets (Worley)
  – Mass Storage System (Harano)
  – Extracting model performance (Hammond)
  – Visualization & Earth System GRiD (Middleton)
– Computing RFP (ARCS)

Outline of Presentation
– Introduction
– Overview of three divisional aspects
– Computing RFP (ARCS)

University Corporation for Atmospheric Research (organization chart, dated 12/07/98)
– UCAR: Richard Anthes, President; Board of Trustees; Member Institutions
– Finance & Administration: Katy Schmoll, VP
– Corporate Affairs: Jack Fellows, VP
– UCAR Programs (UOP): Jack Fellows, Director
– NCAR: Tim Killeen, Director
Divisions and programs (with directors):
– Scientific Computing Division (SCD): Al Kellie
– Atmospheric Chemistry Division (ACD): Daniel McKenna
– Atmospheric Technology Division (ATD): David Carlson
– Advanced Study Program (ASP): Al Cooper
– Climate & Global Dynamics Division (CGD): Maurice Blackmon
– Mesoscale & Microscale Meteorological Division (MMM): Robert Gall
– Research Applications Program (RAP): Brant Foote
– High Altitude Observatory (HAO): Michael Knölker
– Environmental & Societal Impacts Group (ESIG): Robert Harriss
– Constellation Observing System for Meteorology Ionosphere Climate (COSMIC): Bill Kuo
– Cooperative Program for Operational Meteorology, Education and Training (COMET): Timothy Spangler
– GPS Science and Technology Program (GST): Randolph Ware
– Unidata: David Fulker
– Visiting Scientists Programs (VSP): Meg Austin
– Joint Office for Science Support (JOSS): Karyn Sawyer
– Digital Library for Earth System Science (DLESE): Mary Marlino
– Information Infrastructure Technology & Applications (IITA): Richard Chinman

NCAR Organization
– UCAR Board of Trustees; UCAR: Rick Anthes
– NCAR: Tim Killeen; Associate Director: Steve Dickson; ISS: K. Kelly; B&P: R. Brasher
– Atmospheric Chemistry: Dan McKenna
– Atmospheric Technology: Dave Carlson
– Climate & Global Dynamics: Maurice Blackmon
– Mesoscale & Microscale Meteorology: Bob Gall
– High Altitude Observatory: Michael Knolker
– Research Applications: Brant Foote
– Scientific Computing: Al Kellie
– ESIG: Bob Harriss
– ASP: Al Cooper

NCAR at a Glance
– 41 years; 850 staff, including 135 scientists
– $128M budget for FY2001
– 9 divisions and programs
– Research tools, facilities, and visitor programs for the NSF and university communities

Total FY2001 funding: $128M

NCAR Peer-Reviewed Publications

NCAR Visitors

Where did SCD come from? The 1959 "Blue Book": "There are four compelling reasons for establishing a National Institute for Atmospheric Research." Reason 2: the requirement for facilities and technological assistance beyond those that can properly be made available at individual universities.

SCD Mission: Enable the best atmospheric & related research, no matter where the investigator is located, through the provision of high-performance computing technologies and related services.

SCIENTIFIC COMPUTING DIVISION
– Director's Office: Al Kellie, Director (12)
– High Performance Systems: Gene Harano (13) – Supercomputer Systems, Mass Storage Systems
– Computational Science: Steve Hammond (8) – Algorithmic Software Development, Model Performance, Research, Science Collaboration, Frameworks, Standards & Benchmarking
– Data Support: Roy Jenne (9) – Data Archives, Data Catalogs, User Assistance
– Operations and Infrastructure Support: Aaron Andersen (18) – Operations Room, Facility Management & Reporting, Database Applications, Site Licenses
– Network Engineering & Telecommunications: Marla Meehl (25) – LAN, MAN, WAN, Dial-up Access, Network Infrastructure
– User Support Section: Ginger Caldwell (21) – Training/Outreach/Consulting, Digital Information, Distributed Servers & Workstations, Allocations & Account Management
– Visualization & Enabling Technologies: Don Middleton (12) – Data Access, Data Analysis, Visualization
Budget: Base $24,874; UCAR $4,027; Outside $2,020; Overhead $1,063

Computing Services for Research
– SCD operates two distinct computational facilities: one for climate simulations and one for the university community.
– Governance of these SCD resources is in the hands of the users, through two external allocation committees.
– Computing leverages a common infrastructure for access, networking, data storage & analysis, research data sets, and support services, including software development and consulting.

Climate Simulation Laboratory (CSL)
– The CSL is a national, multi-agency, special-use computing facility for climate system modeling in support of the U.S. Global Change Research Program (USGCRP), with priority given to projects that require very large amounts of computer time.
– CSL resources are available to individual U.S. researchers, with a preference for research teams, regardless of sponsorship.
– An inter-agency panel selects the projects that use the CSL.

Community Facility
– The Community Facility is used primarily by university-based NSF grantees and NCAR scientists.
– Community resources are allocated evenly between NCAR and the university community: NCAR resources are allocated by the NCAR Director to the various NCAR divisions; university resources are allocated by the SCD Advisory Panel.
– Open to areas of atmospheric and related sciences.

Distribution of Compute Resources

History of Supercomputing at NCAR: CDC 3600, CDC 6600, CDC 7600, Cray 1-A S/N 3, Cray Y-MP/2, Cray 1-A S/N 14, TMC CM2/8192, Cray X-MP/4, Cray Y-MP/8, Cray C90/16, Cray T3D/64, TMC CM5/32, IBM RS/6000 Cluster, IBM SP1/8, CCC Cray 3/, Cray Y-MP/8I, Cray T3D/128, Cray J90/16, Cray J90/20, Cray J90se/24, HP SPP-2000/64, SGI Origin2000/128, Beowulf/16, IBM SP/64, IBM SP/604, Compaq ES40/36 Cluster, IBM SP/32, IBM SP/296, IBM SP/ (chart legend: non-production machines, production machines, currently in production)

2001: STK 9940 (#4, #5)

NCAR Wide Area Connectivity
– OC3 (155 Mbps) to the Front Range GigaPoP; OC12 (622 Mbps) on 1/1/2002
  – OC3 to AT&T Commodity Internet
  – OC3 to C&W Commodity Internet
  – OC3 to Abilene (OC12 on 1/1/2002)
  – OC3 to the vBNS+
– OC12 (622 Mbps) to University of Colorado at Boulder
  – Intra-site research and back-up link to FRGP
– OC12 to NOAA/NIST in Boulder
  – Intra-site research and UUNET Commodity Internet
– Dark fiber metropolitan area network at GigE (1000 Mbps) to other NCAR campus sites

TeraGrid Wide Area Network (diagram): DTF backbone linking Los Angeles, San Diego, Chicago, Indianapolis, and Urbana. Sites include NCSA/UIUC, ANL, UIC, Univ of Chicago, Ill Inst of Tech, and StarLight / Northwestern Univ (international optical peering point), with multiple carrier hubs, I-WIRE, the Abilene NOC at Indianapolis, and Denver. Link types: OC-48 (2.5 Gb/s, Abilene), multiple 10 GbE (Qwest), multiple 10 GbE (I-WIRE dark fiber). Solid lines in place and/or available by October 2001; dashed I-WIRE lines planned for summer 2002.

ARCS Synopsis (credit: Tom Engel)

ARCS RFP Overview: Best Value Procurement
– Technical evaluation
– Delivery schedule
– Production disruption
– Allocation ready state
– Infrastructure
– Maintenance
– Cost impact (i.e., existing equipment)
– Past performance of bidders
– Business proposal review
– Other considerations (invitation to partner)

ARCS Procurement
– Production-level
  – Availability, robust batch capacity, operational sustainability and support
  – Integrated software engineering and development environment
– High-performance execution of existing applications
– Additionally, an environment conducive to development of next-generation models

Workload profile context
– Jobs using > 32 nodes: 0.4% of workload; average 44 nodes (176 PEs)
– Jobs using < 32 nodes: 99.6% of workload; average 6 nodes (24 PEs)
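As a quick consistency check on the figures above (not part of the original slides), here is a short C sketch that computes the implied processors-per-node ratio and the job-count-weighted average job size. It assumes the class averages quoted on the slide and weights by job count rather than by CPU time consumed.

    /* Back-of-the-envelope check of the workload profile above.
       Inputs are the class fractions and averages quoted on the slide;
       weighting is by job count, not by CPU time consumed. */
    #include <stdio.h>

    int main(void)
    {
        double frac_large = 0.004, nodes_large = 44.0, pes_large = 176.0;
        double frac_small = 0.996, nodes_small = 6.0,  pes_small = 24.0;

        /* Both job classes imply the same processors-per-node ratio. */
        printf("PEs per node (large jobs): %.1f\n", pes_large / nodes_large);
        printf("PEs per node (small jobs): %.1f\n", pes_small / nodes_small);

        /* Job-count-weighted average job size. */
        double avg_nodes = frac_large * nodes_large + frac_small * nodes_small;
        double avg_pes   = frac_large * pes_large   + frac_small * pes_small;
        printf("Average job: %.2f nodes, %.1f PEs\n", avg_nodes, avg_pes);
        return 0;
    }

The 4-PEs-per-node result is consistent with the 4-way Winterhawk-2 nodes described later in the ARCS material.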

ARCS – The Goal
– A production-level, high-performance computing system providing for both capability and capacity computing
– A stable and upwardly compatible system architecture, user environment, and software engineering & development environments
– Initial equipment: at least double the current capacity at NCAR
– Long term: achieve 1 TFLOPs sustained by 2005

ARCS – The Process
– SCD began drafting technical requirements Feb 2000
– RFP process (including scientific reps from NCAR divisions, UCAR Contracts, & an external review panel) formally began Mar 2000; RFP released Nov 2000
– Offeror proposal reviews, BAFOs, & supplemental proposals Jan-May 2001
– Technical evaluations, performance projections, risk assessment, etc. Feb-Jun 2001
– SCD recommendation for negotiations 21 Jun; NCAR/UCAR acceptance of recommendation 25 Jun
– Negotiations Jul; technical Ts&Cs completed 14 Aug
– Contract submitted to the NSF 01 Oct
– NSF approval 5 Oct
– Joint press release the week of SC01

ARCS RFP Technical Attributes
– Hardware (processors, nodes, memory, disk, interconnect, network, HIPPI)
– Software (OS, user environment, filesystems, batch subsystem)
– System administration, resource management, user limits, accounting, network/HIPPI, security
– Documentation & training
– System maintenance & support services
– Facilities (power, cooling, space)

Major Requirements
– Critical resource ratios:
  – Disk: 6 bytes/peak-FLOP, with 64+ MB/s single-stream & 2+ GB/s aggregate bandwidth, sustainable
  – Memory: 0.4 bytes/peak-FLOP
– "Full-featured" product set (cluster-aware compilers, debuggers, performance tools, administrative tools, monitoring)
– Hardware & software stability
– Hardware & software vendor support & responsiveness (on-site, call center, development organization, escalation procedures)
– Resource allocation (processor(s), node(s), memory, disk; user limits & disk quotas)
– Batch subsystem and NCAR job scheduler (BPS)
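To make the ratios concrete, here is a minimal C sketch (an illustration, not part of the RFP) that converts a peak FLOP rate into the disk and memory capacities those ratios imply. The 2.0 and 4.8 TFLOPs inputs are taken from the ARCS roadmap later in this deck, and decimal terabytes (1e12 bytes) are assumed.

    /* Disk and memory capacity implied by the critical resource ratios:
       6 bytes of disk and 0.4 bytes of memory per peak FLOP. */
    #include <stdio.h>

    static void implied_capacity(const char *label, double peak_tflops)
    {
        double peak_flops = peak_tflops * 1e12;
        double disk_tb    = 6.0 * peak_flops / 1e12;   /* 6 B per peak FLOP   */
        double memory_tb  = 0.4 * peak_flops / 1e12;   /* 0.4 B per peak FLOP */
        printf("%-20s %4.1f TFLOPs -> %5.1f TB disk, %4.2f TB memory\n",
               label, peak_tflops, disk_tb, memory_tb);
    }

    int main(void)
    {
        implied_capacity("blackforest upgrade", 2.0);   /* peak from the roadmap */
        implied_capacity("bluesky (Colony)",    4.8);   /* peak from the roadmap */
        return 0;
    }

The output can be compared against the per-system memory and GPFS disk figures on the ARCS roadmap slide near the end of the deck.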

ARCS – Benchmarks (1)
– Kernels (Hammond, Harkness, Loft)
  – Single processor: COPY, IA, XPOSE, SHAL, RADABS, ELEFUNT, STREAMC (a sketch of this kind of kernel follows the benchmark lists below)
  – Multi-processor shared memory: PSTREAM
  – Message-passing performance: XPAIR, BISECT, XGLOB, COMMS[1,2,3], STRIDED[1,2], SYNCH, ALLGATHER
– Parallel shared memory applications
  – CCM (T42 30-day & T170 1-day) – CGD, Rosinski
  – WRF Prototype (b_wave 5-day) – MMM, Michalakes

ARCS – Benchmarks (2)
– Parallel (MPI & hybrid) models
  – CCM (T42 30-day & T170 1-day) – CGD, Rosinski
  – MM5 3.3 (t3a 6-hr & "large" 1-hr) – MMM, Michalakes
  – POP 1.0 (medium & large) – CGD, Craig
  – MHD3D (medium & large) – HAO, Fox
  – MOZART2 (medium & large) – ACD, Walters
  – PCM 1.2 (T42) – CGD, Craig
  – WRF Prototype (b_wave 5-day) – MMM, Michalakes
– System tests
  – HIPPI – SCD, Merrill
  – I/O-tester – SCD, Anderson
  – Network – SCD, Mitchell
  – Batch workload (2 I/O-tester, 4 hybrid MM5 3.3 large, 2 hybrid MM5 3.3 t3a, 2 POP 1.0 medium & large, CCM T170, MOZART2 medium, PCM 1.2 T42, 2 MHD3D medium & large, WRF Prototype) – SCD, Engel
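For readers unfamiliar with the kernel names, the following is a minimal sketch of the kind of memory-bandwidth COPY kernel that STREAM-style tests (such as COPY and STREAMC above) measure. It is an illustrative stand-in written in C, not the actual ARCS benchmark source; the array size and repetition count are arbitrary choices.

    /* STREAM-style COPY kernel: times b[i] = a[i] over large arrays and
       reports the best sustained rate, the kind of single-stream bandwidth
       figure behind the 64+ MB/s requirement earlier in this deck.
       Illustrative only; not the ARCS benchmark code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N       20000000L    /* ~160 MB per array of doubles */
    #define NTRIES  10

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        if (!a || !b) { fprintf(stderr, "allocation failed\n"); return 1; }

        for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 0.0; }

        double best = 1e30;
        for (int t = 0; t < NTRIES; t++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (long i = 0; i < N; i++) b[i] = a[i];   /* the COPY kernel */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
            if (sec < best) best = sec;
        }

        /* COPY moves 2*N doubles: one read stream plus one write stream. */
        double mbytes = 2.0 * N * sizeof(double) / 1e6;
        printf("COPY best rate: %.1f MB/s (check: b[0] = %.1f)\n",
               mbytes / best, b[0]);

        free(a); free(b);
        return 0;
    }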

Risks
– Vendor ability to meet commitments
  – Hardware (processor architecture, clock speed boosts, memory architecture)
  – Software (OS, filesystems, processor-aware compilers/libraries, 3rd-party tools)
– Service, support, responsiveness
– Vendor stability (product set, financial)
– Vendor promises vs. reality

Past Performance
– Hardware & software
  – SCD/NCAR experience
  – Other customers' experience
– "Missed promises"
  – Vendor X: ~2-year slip, product line changes
  – Vendor Y: ~on target
  – Vendor Z: ~1.5-year slip, product line changes

Other Considerations
– "Blue Light" project: invitation to develop models for an exploratory supercomputer
  – Invitation to a partnership development; offer for an industrial partnership
  – 256 TFLOPs peak, 8 TB memory, 200 TB disk on 64K nodes; true MPP with torus interconnect
  – Node: 64 GFLOPs, 128 MB memory, 32 KB L1 cache, 4 MB L2 cache
  – Columbia, LLNL, SDSC, Oak Ridge

ARCS Award
– IBM was chosen to supply the NCAR Advanced Research Computing System (ARCS), which will exceed the articulated purpose and goals
– A world-class system to provide reliable production supercomputing to the NCAR Community and the Climate Simulation Laboratory
– A phased introduction of new, state-of-the-art computational, storage, and communications technologies through the life of the contract (3-5 years)
– First equipment delivered Friday, 5 October

ARCS Timetable
3-Year Contract
– Oct 2001: blackforest upgrade; Winterhawk-2 & Nighthawk-2 nodes; 375 MHz POWER3-II
– Sep 2002: bluesky with Colony Switch; Regatta nodes; ~1.35 GHz POWER4
– Sep-Dec 2003: Federation Switch upgrade (blackforest removed after Federation acceptance)
2-Year Extension Option
– Sep-Dec 2004: bluesky upgrade; Armada nodes; ~2.0 GHz POWER4-GP

ARCS Capacities (the slide tabulated total disk capacity and total memory in TB, plus peak TFLOPs, new and cumulative total; the surviving cumulative peak TFLOPs figures are shown in parentheses)
3-Year Contract
– Oct 2001: blackforest upgrade (2.0)
– Sep 2002: bluesky with Colony Switch (6.81+)
– Sep-Dec 2003: Federation Switch upgrade
2-Year Extension Option
– Sep-Dec 2004: bluesky upgrade (8.75+)
+ Negotiated capability commitments may require installation of additional capacity; figures are minimums.

ARCS Commitments
– Minimum model capability commitments:
  – blackforest upgrade: 1.0x (defines 'x')
  – bluesky: 3.1x
  – bluesky upgrade: 4.6x
– Failure to meet these commitments will result in IBM installing additional computational capacity
– Improved user environment functionality, support, and problem resolution response
– Early access to new hardware & software technologies
– NCAR's participation in IBM's "Blue Light" exploratory supercomputer project (PFLOPs)

Proposed Equipment - IBM (two phases: ARO+60 and Sep 2002)
– Nodes: ARO+60: 164 WH2/4 and 5 NH2/; Sep 2002: POWER4 MI SMP/8
– Processor: ARO+60: 375 MHz POWER3; Sep 2002: 1.35 GHz POWER4
– Interconnect: ARO+60: TBMX, 180 MB/s, 22 usec; Sep 2002: Colony/NH2 Adapter†, 345 MB/s, 17 usec
– System software: PSSP/AIX, JFS/GPFS, LoadLeveler
† Federation switch (2400 MB/s, 4 usec) option in 2H03
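As an aside (not from the original slides), here is a simple latency-plus-bandwidth (alpha-beta) estimate in C comparing point-to-point transfer times for the three switch generations quoted on this slide. The message sizes are arbitrary, and real MPI performance also depends on protocol, contention, and topology; this is only a first-order comparison to make the switch figures easier to read side by side.

    /* First-order alpha-beta model: time = latency + bytes / bandwidth,
       using the latency and bandwidth figures quoted on this slide.
       Note 1 MB/s is 1 byte per microsecond, so bytes / bw_mb_s is in usec. */
    #include <stdio.h>

    struct interconnect { const char *name; double latency_us; double bw_mb_s; };

    int main(void)
    {
        struct interconnect sw[] = {
            { "TBMX",       22.0,  180.0 },
            { "Colony/NH2", 17.0,  345.0 },
            { "Federation",  4.0, 2400.0 },   /* 2H03 option */
        };
        double sizes_kb[] = { 1.0, 8.0, 64.0, 1024.0 };  /* arbitrary message sizes */

        for (int i = 0; i < 3; i++) {
            printf("%-11s", sw[i].name);
            for (int j = 0; j < 4; j++) {
                double bytes = sizes_kb[j] * 1024.0;
                double usec  = sw[i].latency_us + bytes / sw[i].bw_mb_s;
                printf("  %8.1f us @ %5.0f KB", usec, sizes_kb[j]);
            }
            printf("\n");
        }
        return 0;
    }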

ARCS Roadmap
Oct '01 – blackforest upgrade:
– blackforest: 2.0 TFLOPs peak, 0.73 TB memory, 10.5 TB GPFS disk, TBMX switch, POWER3-II/375 MHz, 315 WH2/4pe + 3 NH2/16pe, 512 MB memory/pe
Oct '02 – bluesky installation:
– bluesky: 4.8+ TFLOPs peak, 2.8 TB memory, 21 TB GPFS disk, Colony switch, 3 NH2/16pe (P3) plus POWER4/~1.35 GHz P4 (node/pe counts TBD), ~2.0 GB memory/pe
– blackforest: 2.0 TFLOPs peak, 0.73 TB memory, 10.5 TB GPFS disk, TBMX switch, POWER3-II/375 MHz, 315 WH2/4pe (NH2 nodes move to bluesky), 512 MB memory/pe
Oct '03 – Federation upgrade:
– bluesky: 4.8+ TFLOPs peak, 2.8 TB memory, 21 TB GPFS disk, Federation switch, NH2 removed, POWER4/~1.35 GHz P4 (node/pe counts TBD), ~2.0 GB memory/pe
– blackforest: 2.0 TFLOPs peak, 0.73 TB memory, 10.5 TB GPFS disk, TBMX switch, POWER3-II/375 MHz, 315 WH2/4pe, 512 MB memory/pe
Oct '04 – bluesky upgrade:
– bluesky: 3.8 TB memory, 65 TB GPFS disk, Federation switch, POWER4-GP/~2.0 GHz P4 (node/pe counts TBD), ~3.0 GB memory/pe
"TFLOP Option": SCD will likely augment bluesky with additional POWER4 nodes when blackforest is decommissioned.

Thank you all for attending CAS 2001

See you all in 2003!