1 Maui High Performance Computing Center: Open System Support
An AFRL, MHPCC and UH Collaboration
December 18, 2007
Mike McCraney, MHPCC Operations Director
2 Agenda
- MHPCC Background and History
- Open System Description
- Scheduled and Unscheduled Maintenance
- Application Process
- Additional Information Required
- Summary and Q/A
3 An AFRL Center
- An Air Force Research Laboratory Center
- Operational since 1993
- Managed by the University of Hawaii
- Subcontractor Partners – SAIC / Boeing
- A DoD High Performance Computing Modernization Program (HPCMP) Distributed Center
- Task Order Contract – Maximum Estimated Ordering Value = $181,000,000
- Performance Dependent – 10 Years
- 4 Year Base Period with 2, 3-Year Term Awards
4 A DoD HPCMP Distributed Center
Organization:
- Director, Defense Research and Engineering
- DUSD (Science and Technology)
- High Performance Computing Modernization Program
Major Shared Resource Centers:
- Aeronautical Systems Center (ASC)
- Army Research Laboratory (ARL)
- Engineer Research and Development Center (ERDC)
- Naval Oceanographic Office (NAVO)
Allocated Distributed Centers:
- Army High Performance Computing Research Center (AHPCRC)
- Arctic Region Supercomputing Center (ARSC)
- Maui High Performance Computing Center (MHPCC)
- Space and Missile Defense Command (SMDC)
Dedicated Distributed Centers:
- ATC, AFWA, AEDC, AFRL/IF, Eglin, FNMOC, JFCOM/J9, NAWC-AD, NAWC-CD, NUWC, RTTC, SIMAF, SSCSD, WSMR
5 MHPCC HPC History
- IBM P2SC Typhoon Installed
- IBM P3 Tempest Installed
- IBM Netfinity Huinalu Installed
- IBM P2SC Typhoon Retired
- IBM P4 Tempest Installed
- LNXi Evolocity II Koa Installed
- Cray XD1 Hoku Installed
- IBM P3 Tempest Retired
- IBM P4 Tempest Reassigned
- Dell PowerEdge Jaws Installed
6 Hurricane Configuration Summary
Current Hurricane Configuration:
- Eight, 32 processor/32GB “nodes”, IBM P690 Power4
- Jobs may be scheduled across nodes for a total of 288p
- Shared memory jobs can span up to 32p and 32GB
- 10TB Shared Disk available to all nodes
- LoadLeveler Scheduling
- One job per node – 32p chunks – can only support 8 simultaneous jobs
Issues:
- Old technology, reaching end of life, upgradability issues
- Cost prohibitive – Power consumption constant, ~$400,000 annual power cost
7 Dell Configuration Summary
Proposed Shark Configuration:
- 40, 4 processor/8GB “nodes”
- Intel 3.0GHz Dual Core Woodcrest Processors
- Jobs may be scheduled across nodes for a total of 160p
- Shared memory jobs can span up to 8p and 16GB
- 10TB Shared Disk available to all nodes
- LSF Scheduler
- One job per node – 8p chunks – can support up to 40 simultaneous jobs
Features/Issues:
- Shared use as Open system and TDS (test and development system)
- Much lower power cost – Intel power management
- System already maintained and in use
- System covered 24x7 – UPS, generator
- Possible short-notice downtime
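As a rough illustration of the one-job-per-node policy described for Shark, the sketch below works out how many whole nodes a job occupies and how many equally sized jobs could run at once. The node count and processors per node come from the slide; the function names are hypothetical, and the code assumes a job always receives whole nodes, which may not match the actual scheduler configuration.

```python
import math

# Figures taken from the Shark slide; all names here are illustrative.
SHARK_NODES = 40
PROCS_PER_NODE = 4


def nodes_needed(procs: int, procs_per_node: int = PROCS_PER_NODE) -> int:
    """Whole nodes a job occupies when nodes are not shared between jobs."""
    if procs < 1:
        raise ValueError("a job needs at least one processor")
    return math.ceil(procs / procs_per_node)


def max_simultaneous_jobs(procs_per_job: int, total_nodes: int = SHARK_NODES) -> int:
    """How many equally sized jobs fit on the machine at the same time."""
    return total_nodes // nodes_needed(procs_per_job)


if __name__ == "__main__":
    print("total processors:", SHARK_NODES * PROCS_PER_NODE)
    print("single-node jobs at once:", max_simultaneous_jobs(4))
    print("16-processor jobs at once:", max_simultaneous_jobs(16))
```

Under these assumptions, 40 single-node jobs fill the machine, which matches the "up to 40 simultaneous jobs" figure on the slide.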
8 Jaws Architecture
Head Node for System Administration:
- “Build” Nodes
- Running Parallel Tools (pdsh, pdcp, etc.)
- SSH Communications Between Nodes
- Localized Infiniband Network
- Private Ethernet
Dell Remote Access Controllers:
- Private Ethernet
- Remote Power On/Off
- Temperature Reporting
- Operability Status
- Alarms
- 10 Blades Per Chassis
CFS Lustre Filesystem:
- Shared Access
- High Performance
- Using Infiniband Fabric
Diagram labels:
- User Webtop
- 3 Interactive Nodes (12 cores)
- Head Node
- Simulation Engine: 1280 Batch Nodes (5120 Cores)
- Networks: DREN
- Storage: DDN, 200 TB
- Interconnects: 10 Gig-E Ethernet (Fibre), Cisco Infiniband (Copper), Cisco 6500 Core, Fibre Channel
- 24 Lustre I/O Nodes, 1 MDS
- Gig-E nodes with 10 Gig-E uplinks. 40 nodes per uplink.
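The Jaws figures above imply a few derived numbers that are worth working out. The sketch below is back-of-the-envelope arithmetic only: it assumes nominal link rates (1 Gb/s per Gig-E node link, 10 Gb/s per uplink) and assumes every batch node sits behind a 40-node uplink, neither of which the slide states explicitly.

```python
# Figures from the Jaws architecture slide.
BATCH_NODES = 1280
BATCH_CORES = 5120
NODES_PER_UPLINK = 40

# Nominal link rates (assumption: the slide gives link types, not rates).
NODE_LINK_GBPS = 1
UPLINK_GBPS = 10

cores_per_node = BATCH_CORES // BATCH_NODES

# Assumes all batch nodes are grouped 40 to an uplink.
uplinks = BATCH_NODES // NODES_PER_UPLINK

# Worst-case Ethernet demand behind one uplink divided by its capacity.
oversubscription = NODES_PER_UPLINK * NODE_LINK_GBPS / UPLINK_GBPS

if __name__ == "__main__":
    print("cores per batch node:", cores_per_node)
    print("10 Gig-E uplinks:", uplinks)
    print(f"Ethernet oversubscription: {oversubscription:.0f}:1")
```

On these assumptions each batch node has 4 cores, and the Ethernet side is oversubscribed 4:1 at each uplink.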
9 Shark Software
Systems Software:
- Red Hat Enterprise Linux v4 – Kernel
- Infiniband – Cisco Software stack
- MVAPICH – MPICH over IB Library
- GNU C/C++/Fortran
- Intel 9.1 C/C++/Fortran
- Platform LSF HPC 6.2
- Platform Rocks
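Since Shark is scheduled with Platform LSF, a batch job would typically be described by a script carrying `#BSUB` directives. The helper below is a minimal sketch that generates such a script as text. The `#BSUB` options used (`-J`, `-n`, `-R "span[ptile=...]"`, `-W`, `-o`, `-e`) are standard LSF options, but the function name, the job name, and the `./my_app` command are hypothetical, and any site-specific queue or project settings Shark may require are not shown.

```python
def lsf_script(job_name: str, procs: int, procs_per_node: int,
               wall_minutes: int, command: str) -> str:
    """Return the text of an LSF batch script (illustrative helper, not an MHPCC tool)."""
    lines = [
        "#!/bin/bash",
        f"#BSUB -J {job_name}",                      # job name
        f"#BSUB -n {procs}",                         # total processors requested
        f'#BSUB -R "span[ptile={procs_per_node}]"',  # processors placed per node
        f"#BSUB -W {wall_minutes}",                  # wall-clock limit in minutes
        "#BSUB -o %J.out",                           # stdout file, %J expands to the job ID
        "#BSUB -e %J.err",                           # stderr file
        command,                                     # hypothetical user command
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Request 8 processors, 4 per node, for 30 minutes.
    print(lsf_script("demo", 8, 4, 30, "./my_app"), end="")
```

A script written this way would normally be submitted with `bsub < job.lsf`; how the MPI program itself is launched under MVAPICH depends on the site setup and is left out here.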
10 Maintenance Schedule
New Proposed Schedule:
- 8:00am – 5:00pm
- 2nd and 4th Wednesdays (as necessary)
- Check website for maintenance notices
Current:
- 2:00pm – 4:00pm
- 2nd and 4th Thursday (as necessary)
- Check website (mhpcc.hpc.mil) for maintenance notices
- Only take maintenance on scheduled systems
- Check on Mondays before submitting jobs
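For users planning long jobs, the "2nd and 4th Wednesday" rule can be checked programmatically. The sketch below uses only the Python standard library; the function names are hypothetical, and it only identifies candidate dates, since maintenance is taken "as necessary" and the website remains the authoritative source.

```python
import calendar
from datetime import date


def nth_weekdays(year: int, month: int, weekday: int, ordinals=(2, 4)) -> list:
    """Dates of the given occurrences (default 2nd and 4th) of a weekday in a month."""
    days = [d for d in calendar.Calendar().itermonthdates(year, month)
            if d.month == month and d.weekday() == weekday]
    return [days[n - 1] for n in ordinals]


def is_maintenance_day(day: date, weekday: int = calendar.WEDNESDAY) -> bool:
    """True if `day` is a candidate maintenance date under the proposed schedule."""
    return day in nth_weekdays(day.year, day.month, weekday)


if __name__ == "__main__":
    # Proposed schedule: 2nd and 4th Wednesdays of January 2008.
    print(nth_weekdays(2008, 1, calendar.WEDNESDAY))
    # Current schedule: 2nd and 4th Thursdays of December 2007.
    print(nth_weekdays(2007, 12, calendar.THURSDAY))
```

Passing `calendar.THURSDAY` as the weekday gives the candidate dates under the current schedule instead.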
11 Account Applications and Documentation
- Contact Helpdesk or website for application information
Documentation Needed:
- Account names, systems, special requirements
- Project title, nature of work, accessibility of code
- Nationality of applicant
- Collaborative relevance with AFRL
New Requirements:
- “Case File” information
- For use in AFRL research collaboration
- Future AFRL applicability
- Intellectual property shared with AFRL
Annual Account Renewals:
- September 30 is final day of the fiscal year
12 Summary
- Anticipated migration to Shark
- Should be more productive and able to support wide range of jobs
- Cutting edge technology
- Cost savings from Hurricane (~$400,000 annual)
- Stay tuned for timeline – likely end of January, early February
13 Mahalo