SCD Update Tom Bettge Deputy Director Scientific Computing Division National Center for Atmospheric Research Boulder, CO USA User Forum 17-19 May 2005.

SCD Update
Tom Bettge, Deputy Director
Scientific Computing Division
National Center for Atmospheric Research
Boulder, CO USA
User Forum, 17-19 May 2005

[Chart: NCAR/SCD computing position by year, 1996 onward, with procurement milestones including IBM POWER3 and IBM POWER4]

SCD Update
• Production HEC Computing
• Mass Storage System
• Services
• Server Consolidation and Decommissions
• Physical Facility Infrastructure Update
• Future HEC at NCAR

News: Production Computing
• Redeployed SGI 3800 as data analysis engine
  – chinook became tempest
  – departure of dave
• IBM POWER3 blackforest decommissioned Jan 2005
  – Loss of 2.0 TFLOPS of peak computing capacity
• IBM Linux cluster lightning joined production pool March 2005
  – Gain of 1.1 TFLOPS of peak computing capacity
  – 256 processors (128 dual-processor nodes)
  – 2.2 GHz AMD Opteron processors
  – 6 TB FastT500 RAID with GPFS
  – 40% faster than bluesky (1.3 GHz POWER4) cluster on parallel POP and CAM simulations
  – 3rd-party vendor compilers
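A quick back-of-the-envelope check of the quoted peak figure (my arithmetic, not from the slides), assuming each 2.2 GHz Opteron retires 2 double-precision floating-point operations per clock:

    # Peak-capacity sanity check for lightning
    processors = 256
    clock_hz = 2.2e9          # 2.2 GHz AMD Opteron
    flops_per_clock = 2       # assumption: 2 double-precision flops per cycle
    peak_tflops = processors * clock_hz * flops_per_clock / 1e12
    print(f"Estimated peak: {peak_tflops:.2f} TFLOPS")   # ~1.13 TFLOPS, consistent with the ~1.1 TFLOPS cited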

Resource Usage FY04
• At the end of FY04, the combined supercomputing capacity at NCAR was ~11 TFLOPS
• Roughly 81% of that capacity was used for climate simulation and analysis (Climate & IPCC)

bluesky Workload by Facility April 2005

Computing Demand
• Science driving demand for scientific computing:
  – Summer 2004: CSL requests 1.5x availability
  – Sept 2004: NCAR requests 2x availability
  – Sept 2004: University requests 3x availability
  – March 2005: University requests 1.7x availability

Computational Campaigns
• BAMEX – Spring 2003
• IPCC – FY 2004
• MMM Spring Real-Time Forecasts – Spring 2004
• WRF Real-Time Hurricane Forecast – Fall 2004
• DTC Winter Real-Time Forecasts – Winter
• MMM Spring Real-Time Forecast – Spring 2005
• MMM East Pacific Hurricane Formation – July 2005

bluesky 8-way

bluesky 32-way

Servicing the Demand – NCAR Computing Facility
• SCD's supercomputers are well utilized, yet average job queue-wait times† are measured in hours (was minutes in '04), not days

Utilization               Apr '05    '04
Bluesky 8-way LPARs       94.6%      89%
Bluesky 32-way LPARs      95.8%      92%
Blackforest               -          82%
Lightning                 48.0%      -

Regular queue wait        CSL        Community
Bluesky 8-way             43m        3h34m
Bluesky 32-way            1h02m      49m
Lightning                 1m

† April 2005 average

Average bluesky Queue-Wait Times (HH:MM)

8-way LPARs
              University                          NCAR
           Jan '05  Feb '05  Mar '05  Apr '05  Jan '05  Feb '05  Mar '05  Apr '05
Premium     0:09     0:34     0:52     0:29     0:13     0:28     1:07     0:31
Regular     0:57     3:44     6:24     2:57     0:21     9:41    11:19     4:27
Economy     1:47     1:12     1:45     1:00     4:06     2:40     3:00     5:44
Stand-by    0:06     0:17     0:10     3:02    10:08    32:41     0:44     4:58

32-way LPARs
              University                          NCAR
           Jan '05  Feb '05  Mar '05  Apr '05  Jan '05  Feb '05  Mar '05  Apr '05
Premium     0:00     0:20     0:02     0:06     0:18     0:21     0:53     0:22
Regular     0:57     1:10     2:30     0:46     1:03     1:28     1:42     0:55
Economy     3:42     1:39     2:08     2:45     4:40     0:48     4:09     1:54
Stand-by    3:36     7:36    19:36     1:58     5:35    15:58    25:28    32:34
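The averages above are reported in HH:MM; as a hypothetical illustration (not part of the original material), a small Python helper shows how waits in that format can be averaged:

    # Hypothetical helper: average a list of "H:MM" / "HH:MM" queue-wait strings
    def to_minutes(hhmm: str) -> int:
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    def average_wait(waits: list[str]) -> str:
        avg = sum(to_minutes(w) for w in waits) // len(waits)
        return f"{avg // 60}:{avg % 60:02d}"

    # Example: the four 2005 monthly averages for the University regular queue, 8-way LPARs
    print(average_wait(["0:57", "3:44", "6:24", "2:57"]))   # -> 3:30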

bluesky Queue-Wait Times
• blackforest removed
• lightning charging did not start until March 1
• Corrective (minor) actions taken:
  – Disallow "batch" node_usage=shared jobs
    · Increases utility of the "share" nodes (4 nodes, 128 PEs)
  – Shift the "facility" split (CSL/Community) from 50/50 to 45/55
    · More accurately reflects the actual allocation distribution
  – Reduce premium charge from 2.0x to 1.5x
    · Encourages use of premium if needed for critical turnaround
  – Reduce the NCAR 30-day allocation limit from 130% to 120%
    · Matches other groups (level playing field)
• SCD is watching closely...
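The premium multiplier feeds directly into GAU charging. As an illustrative sketch only (the exact SCD charging formula is not given in these slides), a charge of this kind could be computed as wallclock hours × processors × machine compute factor × queue multiplier:

    # Illustrative GAU-style charge calculation (formula is an assumption, not SCD's published one)
    def gau_charge(wallclock_hours: float, processors: int,
                   compute_factor: float, queue_multiplier: float) -> float:
        return wallclock_hours * processors * compute_factor * queue_multiplier

    # Hypothetical 6-hour, 32-processor job; compute_factor=1.0 is a placeholder value
    regular = gau_charge(6.0, 32, 1.0, 1.0)
    premium_old = gau_charge(6.0, 32, 1.0, 2.0)   # before the change
    premium_new = gau_charge(6.0, 32, 1.0, 1.5)   # after reducing premium from 2.0x to 1.5x
    print(regular, premium_old, premium_new)      # 192.0 384.0 288.0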

[Chart: Average compute factor per GAU charged, January through May 2005]

Mass Storage System

• Disk cache expanded to service files ≤ 100 MB
  – 60% of files this size being read from cache, not tape mount
• Deployment of 200 GB cartridges (previously 60 GB)
  – Now over 500 TB of data on these cartridges
  – Drives provide 3x increase in transfer rate
  – Full silo holds 1.2 PB; 5 silos hold 6 PB of data
• Users have recently moved to single-copy class of service (motivated by GAU compute charges)
• Embarking on project to address future MSS growth
  – Manageable growth rate
  – User management tools (identify, remove, etc.)
  – User access patterns / user education (archive selectively, tar)
  – Compression
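As a concrete illustration of the "archive selectively, tar" guidance (a hypothetical example, not an SCD-provided script), bundling many small files into one tar archive before writing to the MSS reduces the number of archived objects and tape mounts:

    # Hypothetical example: bundle small output files into one archive before sending to the MSS
    import tarfile
    from pathlib import Path

    def bundle_small_files(src_dir: str, archive_name: str, max_bytes: int = 100 * 1024**2) -> None:
        """Tar files smaller than max_bytes so they are archived as a single MSS object."""
        with tarfile.open(archive_name, "w:gz") as tar:
            for path in Path(src_dir).rglob("*"):
                if path.is_file() and path.stat().st_size < max_bytes:
                    tar.add(path, arcname=str(path.relative_to(src_dir)))

    # e.g. bundle_small_files("run_output", "run_output_small_files.tar.gz")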

SCD Customer Support
• Consistent with the SCD reorganization
• Phased deployment, Dec 2004 – May 2005
• Advantages:
  – Enhanced service: Computer Production Group 24/7
  – Effectively utilize other SCD groups in customer support
  – Easier questions handled sooner
  – Harder questions routed to the correct group sooner
• Feedback plan

SCD will provide a balanced set of services to enable researchers to easily and effectively utilize community resources.

Server Decommissions
• MIGS – MSS access from remote sites
  – Decommissioned April 12, 2005
  – Other contemporary methods now available
• IRJE – job submittal to supercomputers (made obsolete by the firewall)
  – Decommissioned March 21, 2005
• Front-end server consolidation to a single new server over the next few months:
  – UCAR front-end Sun server (meeker)
  – UCAR front-end Linux server (longs)
  – Joint SCD/CSS Sun computational server (k2)
  – SCD front-end Sun server (niwot)

Physical Facility Infrastructure Update
• Chilled-water upgrade continues
  – Brings cooling up to the power capacity of the data center
  – Startup of the new chiller went flawlessly on March 15
  – May: last planned shutdown
• Stand-by generators proved themselves again during the March 13 outage and the Xcel power drops on April 29
• Design phase of electrical distribution upgrades to be completed by late 2005
• Risk assessment identified concerns about substation 3
  – Powers the data center (station is near its lifetime limit)
  – Additional testing completed Feb. 26
  – Awaiting report

Future Plans for HEC at NCAR…

SCD Strategic Plan: High-End Computing

Within the current funding envelope, achieve a 25-fold increase over current sustained computing capacity in five years.

SCD also intends to pursue opportunities for substantial additional funding for computational equipment and infrastructure to support the realization of demanding institutional science objectives.

SCD will continue to investigate and acquire experimental hardware and software systems.

IBM BlueGene/L, 1Q2005
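For context (my arithmetic, not a figure from the slides), a 25-fold increase in sustained capacity over five years implies roughly a doubling every year:

    # Implied annual growth factor for a 25x capacity increase over 5 years
    annual_factor = 25 ** (1 / 5)
    print(f"{annual_factor:.2f}x per year")   # ~1.90x per year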

SCD Target Capacity

Challenges in Achieving Goals
• Capability vs. capacity
  – Costs (price/performance)
  – Need/desire for capability computing (define!)
  – How to balance capability and capacity within the center?
• NCAR/SCD "fixed income"
• Business plans
  – Evaluating the Year 5 option with IBM
  – Engaging vendors to informally analyze the SCD Strategic Plan for HEC
  – Likely to enter a year-long procurement for 4Q2006 deployment of additional capacity and capability

Beyond 2006
• Data center limitations / data center expansion
  – NCAR center limits of power/cooling/space will be reached with the 2006 computing addition
  – New center requirements have been compiled/completed
  – Conceptual design for the new center is near completion
  – Funding options being developed with UCAR
• Opportunity of the NSF Petascale Computing Initiative
• Commitment to balanced and sustained investment in robust cyberinfrastructure:
  – Supercomputing systems
  – Mass storage
  – Networking
  – Data management systems
  – Software tools and frameworks
  – Services and expertise
  – Security

Scientific Computing Division Strategic Plan: to serve the computing, research, and data management needs of the atmospheric and related sciences.

Questions