TG Quarterly Meeting, Breckenridge, CO, Apr 11, 2007
NCSA TG RP Update 1Q07
CSE-Online Science Gateway
  Production date: Mar 9, 2007
  Developed under ITR program
  DAC Community Allocation
  MRAC Community Allocation just awarded
  Dedicated 4 nodes on Mercury
    Results from first 30 days (next slide)
    Gaussian jobs running in restricted shell
  Changing reservation to 1 node based on results; will continue to monitor usage
CSE-Online Utilization
  Dedicated 4 nodes initially, now one node
  Goal: improved turnaround for a large number of small jobs submitted through the gateway
LEAD Science Gateway
  Supported Spring Weather Challenge (www.wxchallenge.com), a forecasting contest for undergraduate atmospheric science students
  Feb 19-26: daily testing, 80 processors, 12pm-5pm
  Feb 26-Apr 27: 160 processors, 12pm-5pm Monday through Thursday
  Actual contest submissions started week of March 26
LEAD Gateway Statistics
  250 jobs per week, consuming 1800 SUs/week
  Each workflow is 5 jobs, so 250 jobs corresponds to 50 workflows
  Expect this to increase once issues are resolved and reliability improves
  LEAD Gateway is typically the most or 2nd most active gateway in terms of resources used (BIRN or GridChem are often ahead)
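The weekly LEAD figures above can be turned into per-job and per-workflow averages. This is only a rough sketch using the three numbers quoted on the slide; the variable names are illustrative, and the results are plain averages, not measured per-job costs.

```python
# Figures quoted on the LEAD Gateway Statistics slide.
jobs_per_week = 250
sus_per_week = 1800
jobs_per_workflow = 5

# Derived averages (simple division; real jobs will vary in size).
workflows_per_week = jobs_per_week // jobs_per_workflow
sus_per_job = sus_per_week / jobs_per_week
sus_per_workflow = sus_per_week / workflows_per_week

print(f"workflows/week:   {workflows_per_week}")
print(f"SUs per job:      {sus_per_job:.1f}")
print(f"SUs per workflow: {sus_per_workflow:.1f}")
```

On these numbers an average workflow consumes about 36 SUs, or roughly 7 SUs per job.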
Issues Uncovered by both Science Gateways
  Remote job submission: great when jobs run, but hard to learn of problems, even simple things such as planned downtime
  Reservation issues: jobs can't overflow past the end of a reservation when many jobs stack up (LEAD)
  If a user assigns an obsolete project, no useful error message comes back
  GridFTP striped server: if one fails, all fail
SG Next Steps
  Meetings with teams to understand usage modes and issues
  CSE-Online: NCSA contingent visiting CSE-Online group at Univ of Utah, Apr 23-25
  LEAD: NCSA and IU RPs setting up a date to visit LEAD group at IU
New Resource - Abe
  Abe: 1955 blade cluster
    2.33 GHz Clovertown quad-core
    1,200 blades / 9,600 cores
    89.5 TF; 9.6 TB RAM; 120 TB disk
    Perceus management; diskless boot
    Cisco InfiniBand, 2:1 oversubscribed
    Lustre over IB, 8.4 GB/s sustained
  Power/Cooling: 500 kW / 140 tons
  TG software deployment: CTSS, Inca
  Production date: May 2007 (anticipated)
  User environment
    Torque/Moab
    SoftEnv
    Intel compiler
    MPI: evaluating Intel MPI, MPICH, MVAPICH, VMI-2, etc.
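As a sanity check, the quoted Abe peak figure can be reproduced from the blade, core, and clock numbers. The sketch below assumes 4 double-precision floating-point operations per cycle per core, which is not stated on the slide, and it treats TB and GB as decimal units; the variable names are illustrative.

```python
# Figures quoted on the Abe slide.
blades = 1200
cores = 9600
clock_hz = 2.33e9
ram_bytes = 9.6e12  # 9.6 TB, taken as decimal terabytes

# Assumption (not on the slide): 4 double-precision flops per cycle per core.
flops_per_cycle = 4

cores_per_blade = cores // blades
peak_tf = cores * clock_hz * flops_per_cycle / 1e12
ram_gb_per_core = ram_bytes / cores / 1e9

print(f"cores per blade: {cores_per_blade}")
print(f"peak:            {peak_tf:.1f} TF")
print(f"RAM per core:    {ram_gb_per_core:.1f} GB")
```

Under that assumption the peak works out to about 89.5 TF, matching the slide, with 8 cores per blade and about 1 GB of RAM per core.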
March Allocations
  25.1M SUs (672M NUs) awarded to NCSA systems
    34% of allocated resources
  Several large supplements coming in after the meeting
  Several 1M+ SU awards at NCSA
    Silas Beane: 2.0M on Tungsten
    Ali Uzun: 2.0M on Abe
    Adrian Roitberg: 1.5M on Abe
    Thom Cheatham: 1.0M on Abe
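The allocation figures above support a little arithmetic. The following sketch totals the four named awards and computes a blended NU-per-SU ratio from the two totals quoted on the slide. The ratio is only an average across NCSA systems, since the conversion differs per machine, and the variable names are illustrative.

```python
# Totals quoted on the March Allocations slide (millions).
total_sus_m = 25.1
total_nus_m = 672.0

# Named 1M+ awards: (PI, millions of SUs, system).
large_awards = [
    ("Silas Beane", 2.0, "Tungsten"),
    ("Ali Uzun", 2.0, "Abe"),
    ("Adrian Roitberg", 1.5, "Abe"),
    ("Thom Cheatham", 1.0, "Abe"),
]

named_total_m = sum(sus for _, sus, _ in large_awards)
abe_total_m = sum(sus for _, sus, system in large_awards if system == "Abe")
named_share = named_total_m / total_sus_m

# Blended conversion only; each system has its own NU factor.
blended_nu_per_su = total_nus_m / total_sus_m

print(f"named awards: {named_total_m:.1f}M SUs ({named_share:.0%} of NCSA total)")
print(f"of which Abe: {abe_total_m:.1f}M SUs")
print(f"blended NU/SU: {blended_nu_per_su:.1f}")
```

The four named awards account for 6.5M SUs, roughly a quarter of the 25.1M awarded to NCSA systems, with 4.5M of that on Abe.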