
1 SCD Update. Tom Bettge, Deputy Director, Scientific Computing Division, National Center for Atmospheric Research, Boulder, CO, USA. User Forum, 17-19 May 2005

2 NCAR/SCD [chart: position (1-350) by year, marking the 1996 procurement and the IBM Power3 and IBM Power4 systems]

3

4 SCD Update
- Production HEC Computing
- Mass Storage System
- Services
- Server Consolidation and Decommissions
- Physical Facility Infrastructure Update
- Future HEC at NCAR

5 News: Production Computing
- Redeployed SGI 3800 as a data analysis engine
  – chinook became tempest
  – departure of dave
- IBM Power 3 blackforest decommissioned Jan 2005
  – Loss of 2.0 Tflops of peak computing capacity
- IBM Linux Cluster lightning joined production pool March 2005
  – Gain of 1.1 Tflops of peak computing capacity
  – 256 processors (128-node, dual-processor configuration)
  – 2.2 GHz AMD Opteron processors
  – 6 TByte FastT500 RAID with GPFS
  – 40% faster than the bluesky (1.3 GHz POWER4) cluster on parallel POP and CAM simulations
  – 3rd-party vendor compilers

6 Resource Usage FY04
- At the end of FY04, the combined supercomputing capacity at NCAR was ~11 TFLOPs
- Roughly 81% of that capacity was used for climate simulation and analysis (Climate & IPCC)

7 bluesky Workload by Facility April 2005

8 Computing Demand
- Science Driving Demand for Scientific Computing
  – Summer 2004: CSL requests 1.5x availability
  – Sept 2004: NCAR requests 2x availability
  – Sept 2004: University requests 3x availability
  – March 2005: University requests 1.7x availability

9 Computational Campaigns
- BAMEX: Spring 2003
- IPCC: FY 2004
- MMM Spring Real-Time Forecasts: Spring 2004
- WRF Real-Time Hurricane Forecast: Fall 2004
- DTC Winter Real-Time Forecasts: Winter 2004-2005
- MMM Spring Real-Time Forecast: Spring 2005
- MMM East Pacific Hurricane Formation: July 2005

10 bluesky 8-way

11 bluesky 32-way

12 Servicing the Demand: NCAR Computing Facility
- SCD's supercomputers are well utilized... yet average job queue-wait times† are measured in hours (was minutes in '04), not days

Utilization:                  Apr '05    2004
  Bluesky 8-way LPARs         94.6%      89%
  Bluesky 32-way LPARs        95.8%      92%
  Blackforest                 -          82%
  Lightning                   48.0%      -

Regular queue wait times†:    CSL        Community
  Bluesky 8-way               43m        3h34m
  Bluesky 32-way              1h02m      49m
  Lightning                   1m

† April 2005 average

13 Average bluesky Queue-Wait Times (HH:MM)

8-way LPARs              University                            NCAR
             Jan '05   Feb '05   Mar '05   Apr '05   Jan '05   Feb '05   Mar '05   Apr '05
Premium      0:09      0:34      0:52      0:29      0:13      0:28      1:07      0:31
Regular      0:57      3:44      6:24      2:57      0:21      9:41      11:19     4:27
Economy      1:47      1:12      1:45      1:00      4:06      2:40      3:00      5:44
Stand-by     0:06      0:17      0:10      3:02      10:08     32:41     0:44      4:58

32-way LPARs             University                            NCAR
             Jan '05   Feb '05   Mar '05   Apr '05   Jan '05   Feb '05   Mar '05   Apr '05
Premium      0:00      0:20      0:02      0:06      0:18      0:21      0:53      0:22
Regular      0:57      1:10      2:30      0:46      1:03      1:28      1:42      0:55
Economy      3:42      1:39      2:08      2:45      4:40      0:48      4:09      1:54
Stand-by     3:36      7:36      19:36     1:58      5:35      15:58     25:28     32:34

14 bluesky Queue Wait Times
- blackforest removed
- lightning charging did not start until March 1
- Corrective (minor) actions taken:
  – Disallow "batch" node_usage=shared jobs
    · Increases utility of the "share" nodes (4 nodes, 128 pes)
  – Shift the "facility" split (CSL/Community) from 50/50 to 45/55
    · More accurately reflects the actual allocation distribution
  – Reduce premium charge from 2.0x to 1.5x (see the charge-factor sketch after this slide)
    · Encourages use of premium if needed for critical turnaround
  – Reduce NCAR 30-day allocation limit from 130% to 120%
    · Matches other groups (leveled playing field)
- SCD is watching closely...
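The slides refer to GAU charging and queue charge factors but never spell out the charging formula. The sketch below is a minimal illustration only, assuming a GAU-style charge of the form wallclock hours × processors × machine compute factor × queue factor; apart from the premium factor change (2.0x to 1.5x) taken from this slide, the formula, the function name, and every other number are assumptions for illustration, not SCD's actual accounting.

```python
# Illustrative sketch only: the real SCD/GAU accounting formula is not given in
# these slides. Assumed form:
#   charge = wallclock_hours * processors * machine_compute_factor * queue_factor

# Queue charge factors; only the premium value (reduced from 2.0x to 1.5x per
# slide 14) comes from the presentation, the others are placeholders.
QUEUE_FACTORS = {
    "premium": 1.5,   # reduced from 2.0 per slide 14
    "regular": 1.0,   # assumed baseline
    "economy": 0.5,   # assumed discount
    "standby": 0.1,   # assumed discount
}

def estimate_charge(wallclock_hours: float, processors: int,
                    queue: str, machine_factor: float = 1.0) -> float:
    """Estimate a GAU-style charge for one batch job (assumed formula)."""
    return wallclock_hours * processors * machine_factor * QUEUE_FACTORS[queue]

if __name__ == "__main__":
    # A 6-hour, 32-processor job: premium now costs 1.5x the regular charge
    # instead of 2.0x, per the corrective action on slide 14.
    print(estimate_charge(6.0, 32, "regular"))   # 192.0
    print(estimate_charge(6.0, 32, "premium"))   # 288.0
```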

15 Average Compute Factor per GAU Charged [chart: Jan 1 through May 1, 2005]

16 Mass Storage System

17

18
- Disk cache expanded to service files up to 100MB
  – 60% of files this size being read from cache, not tape mount
- Deployment of 200GB cartridges (previously 60 GB)
  – Now over 500TB of data on these cartridges
  – Drives provide a 3x increase in transfer rate
  – A full silo holds 1.2 PB; 5 silos hold 6 PB of data
- Users have recently moved to the single-copy class of service (motivated by GAU compute charges)
- Embarking on a project to address future MSS growth
  – Manageable growth rate
  – User management tools (identify, remove, etc.)
  – User access patterns / user education (archive selectively, tar; see the sketch after this slide)
  – Compression
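The "archive selectively, tar" guidance amounts to bundling many small files into a single (optionally compressed) archive before writing it to the MSS, which cuts file counts and tape mounts. Below is a minimal sketch under that assumption; the directory, file pattern, and function name are placeholders, not SCD conventions, and the final transfer to the MSS would use whatever copy tool the site provides.

```python
# Minimal sketch: bundle many small output files into one gzip-compressed tar
# archive before sending it to the mass store, instead of archiving each file
# individually. Paths and names below are placeholders.
import tarfile
from pathlib import Path

def bundle_for_archive(src_dir: str, archive_path: str, pattern: str = "*.nc") -> int:
    """Add every file matching `pattern` under src_dir to one tar.gz and
    return the number of files bundled."""
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:  # compression also helps limit MSS growth
        for f in sorted(Path(src_dir).glob(pattern)):
            tar.add(f, arcname=f.name)
            count += 1
    return count

if __name__ == "__main__":
    n = bundle_for_archive("run01/history", "run01_history.tar.gz")
    print(f"bundled {n} files; transfer run01_history.tar.gz to the MSS as one object")
```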

19 SCD Customer Support
- Consistent with SCD Reorganization
- Phased deployment: Dec 2004 through May 2005
- Advantages:
  – Enhanced service: Computer Production Group 24/7
  – Effectively utilize other SCD groups in customer support
  – Easier questions handled sooner
  – Harder questions routed to the correct group sooner
- Feedback Plan

SCD will provide a balanced set of services to enable researchers to easily and effectively utilize community resources.

20 Server Decommissions
- MIGS: MSS access from remote sites
  – Decommissioned April 12, 2005
  – Other contemporary methods now available
- IRJE: job submittal to the supercomputers (made obsolete by the firewall)
  – Decommissioned March 21, 2005
- Front-end server consolidation to a single new server over the next few months
  – UCAR front-end Sun server (meeker)
  – UCAR front-end Linux server (longs)
  – Joint SCD/CSS Sun computational server (k2)
  – SCD front-end Sun server (niwot)

21 Physical Facility Infrastructure Update
- Chilled water upgrade continues
  – Brings cooling up to the power capacity of the data center
  – Startup of the new chiller went flawlessly on March 15th
  – May 19-22: last planned shutdown
- Stand-by generators proved themselves again during the March 13th outage and the Xcel power drops on April 29
- Design phase of the electrical distribution upgrades to be completed by late 2005
- Risk assessment identified concerns about substation 3
  – Supplies power to the data center (station is near its lifetime limit)
  – Additional testing completed Feb. 26th
  – Awaiting report

22 Future Plans for HEC at NCAR……

23 SCD Strategic Plan: High-End Computing
Within the current funding envelope, achieve a 25-fold increase over current sustained computing capacity in five years. SCD intends as well to pursue opportunities for substantial additional funding for computational equipment and infrastructure to support the realization of demanding institutional science objectives. SCD will continue to investigate and acquire experimental hardware and software systems.
IBM BlueGene/L 1Q2005
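A quick back-of-envelope reading of that target (an inference from the stated goal, not a figure from the plan itself): a 25-fold increase over five years corresponds to an average annual growth factor of

  25^(1/5) ≈ 1.90

so sustained capacity would have to roughly double every year to stay on track.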

24 SCD Target Capacity

25 Challenges in Achieving 2006-2007 Goals
- Capability vs. Capacity
  – Costs (price/performance)
  – Need/desire for capability computing (define!)
  – Balance of capability and capacity within the center. How?
- NCAR/SCD "fixed income"
- Business Plans
  – Evaluating the Year 5 option with IBM
  – Engaging vendors to informally analyze the SCD Strategic Plan for HEC
  – Likely to enter a year-long procurement for 4Q2006 deployment of additional capacity and capability

26 Beyond 2006
- Data Center Limitations / Data Center Expansion
  – NCAR center limits of power/cooling/space will be reached with the 2006 computing addition
  – New center requirements have been compiled/completed
  – Conceptual design for the new center is near completion
  – Funding options being developed with UCAR
- Opportunity of the NSF Petascale Computing Initiative
- Commitment to balanced and sustained investment in robust cyberinfrastructure:
  – Supercomputing systems
  – Mass storage
  – Networking
  – Data management systems
  – Software tools and frameworks
  – Services and expertise
  – Security

27 Scientific Computing Division Strategic Plan 2005-2009
www.scd.ucar.edu
...to serve the computing, research and data management needs of atmospheric and related sciences.

28 Questions

