Services for Experienced and Starting HPC Tier 3 Users (SES-HPC)
Jan Steiner
Zentrum für Informations- und Medientechnologie, Universität Siegen
Outline
- Motivation
- Project Structure
- First Experiences
- Status
4.12.17 SES-HPC
Motivation
- HPC tiers: Tier 1 (National/International), Tier 2 (National/Federal), Tier 3 (Federal/Local)
- Permeability between Tier 3 and higher tiers
- Code development on Tier 3, productive runs on Tier 1
- Less experienced users
- Cheaper resources
- Start early
- Get people onto Tier 3
- Facilitate movement to higher tiers
Background: Existing Infrastructure
- HorUS: current cluster at Uni Siegen
- Diverse users
- Planned future cluster
- Proposal underway
- Additional users, e.g. Big Data
- Good position to support users early
Five pillars of support
- Teaching and Training (beginner and advanced developers): hold classes; advise on external courses; gauge demand for new courses
- Performance Analysis (experienced code developers): performance reviews; performance measurement tools
- Third-party Code Support (users of commercial/open-source codes): support in finding optimal settings; find most suitable hardware
- Tier Change Support (dev teams who want to apply for higher-tier hardware): find most suitable hardware; test and evaluation of software
- Knowledge Transfer (all HPC users): establish and maintain wiki; organize networking workshops
Example consulting session
- Client: PhD student
- R script, runs 24 cases in sequence
- Script runs 4 weeks, job would often die before that
- Paper deadline in 4 weeks
- This is not to make fun of him
Example consulting session
Act 1: Troubleshooting
- “What did you set the walltime to?”
- “What’s a walltime?”
- “Default then. Which queue was it in?”
- “What’s a queue?”
- “OK, never mind, let’s have a look at your script.”
Example consulting session
Act 2: The Script
- Script: the same calculation is done 24 times within this one script
- If-blocks with conditions for case 1, 2, …, 24
- “OK, make a shell script with a loop, and set the condition via a command-line argument for the R script. Then call your script with &.”
- “What’s a shell script?”
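The advice from Act 2 (a loop that passes the case number as a command-line argument and starts each run with `&`) could look roughly like the sketch below. It is an illustration under assumptions, not the script from the session: `run_case` is a hypothetical stand-in for the real call (for example `Rscript script.r "$i"`, with the R script reading the case number via `commandArgs(trailingOnly = TRUE)`), and the output directory is a temporary placeholder.

```shell
#!/bin/sh
# Hypothetical stand-in for the real work. On the cluster this would be
# something like:  Rscript script.r "$1"
run_case() {
    echo "case $1 done" > "$OUTDIR/case_$1.log"
}

OUTDIR=$(mktemp -d)   # placeholder output directory
N_CASES=24

i=1
while [ "$i" -le "$N_CASES" ]; do
    run_case "$i" &   # "&" sends this case to the background
    i=$((i + 1))
done

wait                  # block until all background cases have finished

# Count the log files to confirm that every case has run.
N_DONE=$(ls "$OUTDIR" | wc -l | tr -d ' ')
echo "finished cases: $N_DONE"
```

Starting all 24 cases at once only pays off if the node has enough free cores. The `wait` at the end keeps the script, and therefore the batch job, alive until every background case has finished.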
Example consulting session
Act 3: Listen, the Mensa (Cafeteria) Is About to Close
- “Copy your script 24 times, write condition=1, 2, etc. at the top and name them script_1.r, script_2.r, …”
- “In the shell script, you write:” ./script_1.r & ./script_2.r & ...
- “Thank you so much! It runs 24 times faster now!”
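On a cluster scheduled with SLURM, copying the script 24 times can be avoided with a job array. This is a different technique from the one used in the session, shown only as a sketch. It also makes the walltime and the queue from Act 1 explicit. The partition name, the walltime value, the job name, and the file name `script.r` are placeholders, and the R script is assumed to read the case number from its command-line arguments.

```shell
#!/bin/sh
# Hypothetical job name.
#SBATCH --job-name=cases
# One array task per case.
#SBATCH --array=1-24
# Walltime limit per array task (assumed value).
#SBATCH --time=24:00:00
# Queue (partition) to submit to; the name is a placeholder.
#SBATCH --partition=placeholder
#SBATCH --ntasks=1

# SLURM sets SLURM_ARRAY_TASK_ID for each array task. Fall back to 1 so the
# script can also be tried outside the scheduler.
CASE=${SLURM_ARRAY_TASK_ID:-1}

if command -v Rscript >/dev/null 2>&1 && [ -f script.r ]; then
    # Pass the case number to the R script as a command-line argument.
    Rscript script.r "$CASE"
else
    echo "would run: Rscript script.r $CASE"
fi
```

Submitted once with `sbatch`, such a script lets the scheduler start the 24 tasks independently, so they are not limited to the cores of a single node.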
Lessons learned
- We let this guy on the cluster
- Nobody told him anything
- Cluster website
- At least google “SLURM”
- He ran his jobs for months
- I met him by sheer coincidence
- We actually helped him a lot
Lessons learned
- Who is the bigger fool? The fool? The fool who fails to take him by his hand?
- Image source: New Line Cinema
Lessons learned
- Not just the user’s job to inform themselves
- Users don’t know what they don’t know
- Not good: “Keep away from cluster”
- Help but also educate (sustainability)
Status
- Interviews with all institutes that use the cluster
- R-script guy is not completely unique
- Cluster website review
- “Getting started” section
- Additional feedback: Mech.-Eng. students
- Little prior knowledge of Linux
- Seminars “Cluster Introduction”, “Linux Introduction”
Wanted: second position
- Teaching and Training
- Performance Analysis
- Third-party Code Support
- Tier Change Support
- Knowledge Transfer
Thank you for your kind attention.