HPC University Training Meeting
Welcome!!
March 26-27
Goals, Objectives and Outcomes
- Understand community needs in relation to available resources
- Identify competencies and gaps in offerings
- Establish mechanisms to disseminate and promote quality resources
- Expand breadth and depth of training resources to address community needs
- Foster continued information sharing and collaborations
- Other?
Perspectives from the Field
- Defining HPC University - Scott
- HPC RAT recommendations - Laura
- Computational Science Competencies - Steve
- On-line instruction methodologies - Sandie
- Collaborative efforts - Leslie
- Petascale training gaps - Shawn
- Community engagement strategies - Kathy
- Dissemination and Quality assurance - Joiner
- Survey feedback - Julia
Defining HPC University
- Establishing competencies for skilled HPC educators, researchers, and practitioners
- Defining a roadmap for acquiring these competencies, from K-12 to researchers
- Providing access to high-quality resources
- Broadly disseminating information about events, activities, and resources
- Cross-cutting among all disciplines
- Requires collaboration among multiple agencies and organizations for broad impact
- Certificate and degree-granting opportunities
Proposed Discussion Topics
- On-line instruction methodologies
- Quality assurance - VV&A
- Promotion, scaling and dissemination
- Petascale training gaps
- HPC Roadmap
- Collaboration and coordination strategies
Survey Summary
- Responses from 12 sites
- Audience: current and potential users, undergrads, grads, postdocs, industry, senior researchers, non-traditional communities, professionals, sysadmins
Short-term Goals
- Train users in advanced parallel programming (MPI, OpenMP and hybrid) through hands-on workshops, in-depth consulting, and knowledgeable online content to move them into tera- and petascale computing
- Help users learn performance tuning, optimization and scaling to petascale systems
- Educate users to best use HPC facilities and services
- Beginning to advanced courses in parallel programming
- Facilitate effective use of high performance computing resources
- Disseminate knowledge of tools and application software
- Familiarize users with introductory grid computing strategies
Long-term Goals
- On-line self-paced training; record all live training sessions
- Enhance all training events
- Develop the next generation of top HPC researchers
- Prepare users for effective use of future resources
- Prepare people to be effective grid users
- Broaden participation in supercomputing amongst a variety of scientific disciplines and user communities
- Provide more proactive personalized help, supplemented with online resources and infrastructure more capable of responding quickly to user needs
Selecting Training Topics
- Provide both ‘getting started’ courses and advanced courses on scaling and optimization
- Help users to effectively use facilities:
  – access machines, batch systems, program to best use the available hardware, transfer data, use mass storage, performance and debugging tools, compile for performance, etc.
- We use our ticketing system, suggestions on training surveys and through , suggestions from researchers and topics our HPC support staff are interested in to select topics
- Interacting with consulting and applications support staff to identify what users need
- Depends on subject matter experts being available to develop content
- Feedback from workshop evaluations, new architectures and tools, and their fit with Ralph Regula School of Computational Science curricula and competencies
Selecting Topics (cont.)
- Topic selection and level customized for individual
- Job management, data management, security, workflow management systems, storage resource managers
- Site admin training
- Perceived need, user requests and, of course, instructor availability
- We always have an abundance of introductory courses because we continually have new students on campus. Our clientele is mostly graduate students, so the need is there.
Evaluating Impact
- Workshop feedback forms
- Annual User Survey
- Suggestion period at the end of each full or multi-day event
- Post-event evaluation forms for live events
- Optional online form for online tutorials
- Number of projects ported to the OSG grid, number of jobs run, number of papers published in which our infrastructure was used to produce results, number of new students/faculty joining our efforts, number of grid computing courses introduced at different institutions as a result of our training
- Assessment responses about participants’ new knowledge & skills that they can apply to their research after the training class
Live vs. Asynch
- Formal training has been done as face-to-face (f2f) events
- Local presentations are provided as WebEx meetings and teleconferences
- We have done a few remote-only events (Access Grid), but they were poorly attended
- We try to make all presentation and lab materials available online for reference
- Present intermediate to advanced topics at live events and cover introductory topics and how-to programming topics on-line
- We hold small-group meetings with discipline-specific groups to gain a greater understanding of their computational and scientific needs
- Developing simulation-based modules for use as curriculum in the classroom
- Propose to capture as many workshops, seminars and presentations as possible and deliver them asynchronously
New Development
- Additional sources of info available via the Web
- Asynch training via web and NCAST videos of current training
- New topics as our users become more sophisticated
- Multi-core capabilities
- Revising User Information website
- Tutorials associated with conferences and workshops put online
- A textbook lab training text (with exercises)
- More asynchronous training through web technology
- Meeting with members of non-traditional HPC disciplines to identify requirements to bring them to the resources available, looking for ways to help them transform their science
- “Introduction to Parallel Programming and MPI” and “Scientific Visualization” using ParaView
- Asynchronous webcasts of training classes to broaden participation
- Synchronous training class via videoconference (AccessGrid, Polycom)
Major Gaps
- Specialized training for computational scientists running on machines vs. computer scientists
- Debugging, performance measurement, IO strategies, memory management, project management
- Taking a new user from the introductory training sessions to someone who can actually parallelize their thinking, and thus their code
- Multi-core parallelization
- Getting word of HPC libraries to users
- Starting a new project, including training on what tools/code/methods are easily available, what resource providers are accessible and how to pick one versus another, scaling
- Competencies
- Application-specific guidance, i.e. help available for applications in biology, chemistry, mathematics, etc.
- Basic and advanced parallel programming and software design
- Discipline-specific parallel programming
- Real coursework at the university level
Gaps (cont.)
- Workshops are too few and far between
- Workshop content is not delivered with a synchronous remote capability for interested participants who cannot physically attend
- Workshop content is not captured for post-workshop asynchronous delivery
- Inconsistency from one system to the next (compiler commands, for example). Each system should be pre-installed with sample code guaranteed to run on that system as well as supplemental training resources specific to that system.
- Online tutorials lack specificity to a particular system; sample code does not run on most
- Lack of proactive personalized support
- Scaling up code from tens/hundreds of cores to thousands
- Scaling up code to petascale levels, cores >>
- Methodology: synchronous and asynchronous training
- Quality Assurance: accurate, verified, and validated training via synchronous and asynchronous methods
- Coherent and guided set of online training tutorials/modules
Gaps (cont.)
- C programming
- Fortran 90/95/2003 programming
- Unix and Linux as applied in HPC
- Parallel computing/programming
- Distributed & grid
- Data analysis & visualization
What do you want to learn?
- Ideas for training programs and roadmaps
- What training topics are offered? Who coordinates and presents? Audiences? Effective methods?
- Is effective remote training possible and, if so, what technologies are used?
- Are there opportunities for collaborative training events?
- Ideas to improve our training so that users are best served
- Interest in joint development of online tutorials and in collaborating on live training
- Petascale computing techniques
- To identify new and better ways to share materials or develop materials
- New and different ways of making training available
- Understand the training priorities for other sites, and be sensitive to any political issues
RAT Report Focus
- Mentoring, training the trainers, becoming source and editors for CSERD
- Provide more details to the training map and the gaps, e.g. identify multiple training paths
- Training the trainers
- Petascale computing aspects
- Identification of good parallel computing course
- VV&A of collected training materials
- Targeting underrepresented populations
- Capture expertise for asynchronous delivery
Whew!
- The scope and need are much broader than anything that can ever be accomplished via the limited funding for training
- Setting priorities
- Fostering collaboration
- Avoid duplication of effort
- Share best practices, resources, materials, etc.