Summer Internship
Douglas Drobny
Idaho National Laboratory, High Performance Computing
Who I Worked For
- Idaho National Laboratory, Idaho Falls
- High Performance Computing group
  - Manages ~4 different clusters
  - Supports and maintains software for large research projects
- User Support group
Clusters
- Fission: 12,512 processors, 25 TBytes of memory
- Icestorm: 2,048 processors, 4 TBytes of memory
- Quark
- Eos
Compute Manager
- Current job submissions are made from the command line
- Goals
  - Web interface for the PBS scheduler
  - Easy to use
  - Behaves the same as current job submissions
  - Improved error message handling
Setup
- Application Services
  - Runs on the server head nodes
  - Receives web requests
  - Submits jobs
- Compute Manager
  - Runs on the web server
  - Creates web forms
  - Sends results to Application Services
  - Displays results
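To make the hand-off from a web form to a job submission concrete, here is a minimal sketch of turning form fields into a PBS job script. It is not how Compute Manager or Application Services work internally. The function name, its parameters, and the sample command are hypothetical, and the directives assume PBS Professional-style resource syntax.

```python
def build_pbs_script(job_name, ncpus, walltime, command, queue=None):
    """Return the text of a PBS job script built from (hypothetical) form fields."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name
        f"#PBS -l select=1:ncpus={ncpus}",   # one chunk with the requested cores
        f"#PBS -l walltime={walltime}",      # HH:MM:SS limit
    ]
    if queue is not None:
        lines.append(f"#PBS -q {queue}")     # optional destination queue
    # Run from the directory the job was submitted from.
    lines.append("cd $PBS_O_WORKDIR")
    lines.append(command)
    return "\n".join(lines) + "\n"


# Hypothetical form values; the command is a placeholder.
script = build_pbs_script("demo", 4, "00:10:00", "./run_model input.dat")
print(script)
```

Writing this text to a file and handing it to qsub would mirror a command-line submission, which is the behavior the web interface is meant to preserve.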
What I Did
- Installed Compute Manager and AIF on Eos
- Created test cases for PBS features
- Created test cases for user inputs
- Submitted feedback and bug reports with PBS
- Documented the process for future implementations and troubleshooting
Results: Good
- Easy to create different application forms
- Instant job monitoring
- Restrict input values
- Default input values
- Secure file transfer
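The restricted and default input values listed above can be illustrated with a small validation sketch. This is an assumption about how such checks might look in general, not the product's form definition format. The field names, defaults, and limits are all hypothetical.

```python
# field name: (default, minimum, maximum) -- all values are hypothetical
FORM_FIELDS = {
    "ncpus": (1, 1, 32),
    "walltime_minutes": (30, 1, 1440),
}


def validate_form(submitted):
    """Apply defaults and range checks; return (cleaned values, error messages)."""
    cleaned, errors = {}, []
    for name, (default, low, high) in FORM_FIELDS.items():
        raw = submitted.get(name, "")
        if raw == "":
            cleaned[name] = default          # blank field falls back to the default
            continue
        try:
            value = int(raw)
        except (TypeError, ValueError):
            errors.append(f"{name}: {raw!r} is not a whole number")
            continue
        if not low <= value <= high:
            errors.append(f"{name}: {value} is outside {low}-{high}")
            continue
        cleaned[name] = value
    return cleaned, errors


print(validate_form({"ncpus": "8"}))
print(validate_form({"ncpus": "64", "walltime_minutes": "abc"}))
```

Rejecting bad values before submission is also where clearer error messages can be produced, instead of surfacing a scheduler error after the fact.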
Results: Bad
- Easy to put results in an insecure location
- Always copies the input files
- Missing a form entry can result in lost output files
- Spams the sudo log
- “Fixed in next version (Week after I leave)”
Updating HPC Wiki
- Updated the MoinMoin wiki (Python)
- Used a temporary virtual machine to test the update and fix issues
- Added support for viewing reports
- Deployed on hpcweb
- Note: Learn what type of service monitoring is being used before taking down a system.
Wiki Reports
- Automatically generate a visual report of an XML document each month
- Created the XSL
- Putting data into charts
- Automation ('Right' way vs. working way)
- Editing to reduce transcription errors
XSL/XML
- Goal: Display XSL/XML pages inside of a wiki page
- Problems
  - MoinMoin uses an outdated XSL library
  - XSL can contain JavaScript (XSS)
- Solution
  - Created a wiki macro to convert XML with a specific XSL stylesheet on the server
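One way to read "a specific XSL stylesheet on the server" is that the macro only accepts stylesheets an administrator has placed on the server, so page authors cannot supply XSL containing script. The sketch below shows only that lookup step. It is not the actual macro: the mapping, the path, and the function name are hypothetical, and the XSLT transformation itself would need a separate XSLT library.

```python
# Hypothetical mapping of macro argument -> stylesheet stored on the server.
TRUSTED_STYLESHEETS = {
    "monthly-report": "/srv/wiki/xsl/monthly-report.xsl",
}


def resolve_stylesheet(name):
    """Return the server-side path for a trusted stylesheet name, or raise ValueError."""
    try:
        return TRUSTED_STYLESHEETS[name]
    except KeyError:
        # Anything not registered on the server is refused, including paths and URLs.
        raise ValueError(f"unknown stylesheet: {name!r}") from None


print(resolve_stylesheet("monthly-report"))
```

Because the lookup is by name rather than by path or URL, a wiki page can choose among the registered stylesheets but cannot introduce a new one.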
Intel Compiler Issue (ICC)
- Issue
  - Compile times on Quark are much longer than on Fission (head nodes)
  - Quark should be faster (hardware-wise)
  - 17 minutes on Quark
  - 8 minutes on Fission
Intel Compiler Steps
- Create test cases
- Determine affected systems
- Enable debugging
- Strace
- Wireshark
- Hardware
- Test Environment
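For the strace step, one common approach is to run the compiler under `strace -T`, which appends the time spent in each system call in angle brackets, and then look for the slowest calls. The sketch below parses lines in that format. The sample lines are invented for illustration and are not output from the investigation, and the parser assumes no PID prefix (that is, strace was run without `-f`).

```python
import re

# Matches "name(args) = result <seconds>" as printed by strace -T.
TIMED_CALL = re.compile(
    r"^(?P<call>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>.+?)\s+<(?P<secs>\d+\.\d+)>$"
)


def slowest_calls(trace_lines, top=3):
    """Return up to `top` (seconds, syscall, args) tuples, slowest first."""
    calls = []
    for line in trace_lines:
        match = TIMED_CALL.match(line.strip())
        if match:
            calls.append(
                (float(match.group("secs")), match.group("call"), match.group("args"))
            )
    return sorted(calls, reverse=True)[:top]


# Invented sample lines in strace -T format (not real data).
SAMPLE = [
    'open("/path/to/example.lic", O_RDONLY) = 3 <0.000021>',
    'connect(4, {sa_family=AF_INET, sin_port=htons(12345), '
    'sin_addr=inet_addr("192.0.2.10")}, 16) = 0 <0.450312>',
    'read(3, "...", 4096) = 120 <0.000015>',
]

for secs, call, args in slowest_calls(SAMPLE):
    print(f"{secs:.6f}s  {call}({args})")
```

Slow network calls standing out against fast file reads would point toward a remote service rather than local hardware, which a Wireshark capture can then confirm.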
ICC Solution
- License files were resolved in the order
  1. License manager
  2. User's home directory
  3. /opt/intel
  4. /apps/intel/..../license
- 'Errors' in the license file cause the system to check all of the sources
ICC Solution
- The /opt/intel license files pointed to the license manager
- This caused additional requests to the license manager (takes time)
- Of all the systems, Quark's /opt/intel license files pointed to the license servers the most
- Removed the /opt/intel/license folder to fix the problem.
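The reasoning behind the fix can be sketched as a toy model: if every license source is checked in order, each source that leads back to the license manager costs one more network round trip. This is a simplification for illustration, not a description of the Intel license client. Which sources contact the manager, and the one-request-per-source count, are assumptions.

```python
def manager_requests(sources):
    """Count license-manager round trips when every source is checked in order."""
    return sum(1 for _name, uses_manager in sources if uses_manager)


# (source, does checking it contact the license manager?) -- flags are assumptions
before = [
    ("license manager", True),
    ("user's home directory", False),
    ("/opt/intel", True),                   # license files here pointed back at the manager
    ("/apps/intel/..../license", False),
]

# The fix: drop the /opt/intel source so it no longer triggers extra requests.
after = [source for source in before if source[0] != "/opt/intel"]

print("requests before:", manager_requests(before))
print("requests after: ", manager_requests(after))
```

In this model each license file that redirects to the manager adds a request to every compiler invocation, so a build that runs the compiler many times would pay that cost repeatedly.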
Things Learned
- Python
- XSL
- Creating and signing SSL keys
- Unix permissions
- Strace
- Testing
- Refactoring
- Monitoring
- Vim!