HPC Operating Committee Spring 2019 Meeting


HPC Operating Committee Spring 2019 Meeting
March 11, 2019
Meeting called by: Kyle Mandli, Chair

Introduction
George Garrett, Manager, Research Computing Services: gsg8@columbia.edu
The HPC Support Team (Research Computing Services): hpc-support@columbia.edu

Agenda
- HPC Clusters - Overview and Usage: Terremoto, Habanero, Yeti
- HPC Updates: Challenges and Possible Solutions, Software Update, Data Center Cooling Expansion Update
- Business Rules
- Support Services and Training
- HPC Publications Reporting
- Feedback

High Performance Computing - Ways to Participate
Four ways to participate: Purchase, Rent, Free Tier, Education Tier

Terremoto - Launched in December 2018!

Terremoto - Participation and Usage
- 24 research groups
- 190 users
- 2.1 million core hours utilized
- 5-year lifetime

Terremoto - Specifications
- 110 compute nodes (2,640 cores): 92 standard nodes (192 GB), 10 high-memory nodes (768 GB), 8 GPU nodes with 2 x NVIDIA V100 GPUs
- 430 TB storage (Data Direct Networks GPFS GS7K)
- 255 TFLOPS of processing power
- Dell hardware, dual Skylake Gold 6126 CPUs, 2.6 GHz, AVX-512
- 100 Gb/s EDR InfiniBand, 480 GB SSD drives
- Slurm job scheduler (see the sample batch script below)
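Since the cluster uses Slurm, a basic batch submission looks like the minimal sketch below. The account name, module name, and executable are placeholders; the 24 tasks per node reflect the dual 12-core Skylake CPUs in a standard Terremoto node.

myjob.sh:
  #!/bin/sh
  #SBATCH --account=<group_account>   # placeholder: your group's Slurm account
  #SBATCH --job-name=example
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=24        # a standard Terremoto node has 24 cores
  #SBATCH --time=01:00:00             # one hour of wall time
  #SBATCH --mem=8G

  module load intel                   # placeholder: load whatever your code needs
  ./my_program                        # placeholder executable

Submit it with: $ sbatch myjob.sh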

Terremoto - Cluster Usage in Core Hours
Maximum theoretical core hours per day = 63,360

Terremoto - Job Size (number of jobs by core count)
- 1 - 49 cores: 28,274
- 50 - 249 cores: 173
- 250 - 499 cores: 10
- Larger sizes: --

Terremoto - Benchmarks
High Performance LINPACK (HPL) measures compute performance and is used to build the TOP500 list of supercomputers. HPL now runs automatically when our HPC nodes start up.
- Terremoto: 1,210 gigaflops per node (Skylake 2.6 GHz CPUs, AVX-512 advanced vector extensions)
- Habanero: 840 gigaflops per node (Broadwell 2.2 GHz CPUs, AVX2)
- The HPL benchmark runs 44% faster on Terremoto than on Habanero
Rebuild your code with the Intel compiler when possible (see the example below).
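For reference, a rebuild targeting AVX-512 with the Intel compilers might look like the following sketch. The module name and source files are placeholders, and the exact flags depend on your code.

  $ module load intel                              # placeholder module name
  $ icc  -O3 -xCORE-AVX512 -o my_app  my_app.c     # C
  $ ifort -O3 -xCORE-AVX512 -o my_sim my_sim.f90   # Fortran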

Terremoto - Expansion Coming Soon
- Terremoto 2019 HPC expansion round is in the planning stages; an announcement will be sent out in April
- No RFP; same CPUs as the first Terremoto round, with a newer model of GPUs
- Purchase round to commence late Spring 2019; go-live in late Fall 2019
- If you are aware of potential demand, including new faculty recruits who may be interested, please contact us at rcs@columbia.edu

Habanero

Habanero - Specifications
- 302 nodes (7,248 cores) after expansion: 234 standard servers, 41 high-memory servers, 27 GPU servers
- 740 TB storage (DDN GS7K GPFS)
- 397 TFLOPS of processing power
- Lifespan: 222 nodes expire December 2020; 80 nodes expire December 2021

Head Nodes
- 2 submit nodes: submit jobs to the compute nodes
- 2 data transfer nodes (10 Gb): scp, rdist, Globus (see the example below)
- 2 management nodes: Bright Cluster Manager, Slurm
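As a sketch, copying results off the cluster through a transfer node with scp looks like the line below. The UNI and transfer-node hostname are placeholders (see the cluster documentation for the actual address); Globus is usually the better choice for very large transfers.

  $ scp <UNI>@<transfer-node-hostname>:~/results.tar.gz .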

HPC - Visualization Server
- Remote GUI access to Habanero storage; reduces the need to download data
- Same configuration as a GPU node (2 x K80)
- NICE Desktop Cloud Visualization software

Habanero - Participation and Usage
- 44 groups
- 1,550 users
- 9 renters
- 160 free tier users
- Education tier: 15 courses since launch

Habanero - Cluster Usage in Core Hours
Maximum theoretical core hours per day = 174,528

Habanero - Job Size (number of jobs by core count)
- 1 - 49 cores: 777,771
- 50 - 249 cores: 1,927
- 250 - 499 cores: 869
- Two larger size ranges: 277 and 185

HPC - Recent Challenges and Possible Solutions
- Complex software stacks are time consuming to install due to many dependencies and incompatibilities with existing software. Solution: Singularity containers (see the following slide).
- Login node(s) are occasionally overloaded. Solutions: train users to use interactive jobs or the transfer nodes (see the example below); apply stricter CPU, memory, and I/O limits on login nodes; remove common applications from login nodes.
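For example, interactive work can be moved off the login node with a Slurm interactive job. The sketch below uses a placeholder account name; partition and resource defaults may differ on each cluster.

  $ srun --pty --account=<group_account> --time=2:00:00 --mem=4G bash
  # You now have a shell on a compute node; run builds, tests, or analysis here
  # instead of on the shared login node.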

HPC Updates - Singularity Containers
- Easy-to-use, secure containers for HPC
- Supports HPC networks and accelerators (InfiniBand, MPI, GPUs)
- Enables reproducibility and complex software stack setup
- Typical use cases: instant deployment of complex software stacks (OpenFOAM, genomics, TensorFlow); bring your own container (use it on your laptop, on HPC, etc.)
- Available now on both Terremoto and Habanero: $ module load singularity
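A typical workflow, sketched below, pulls a public image and runs a command inside it on the cluster. The TensorFlow image is only an example, the image file name assumes a recent Singularity release, and the --nv flag (which exposes the host NVIDIA GPUs) applies only on GPU nodes.

  $ module load singularity
  $ singularity pull docker://tensorflow/tensorflow:latest        # builds a local image file, e.g. tensorflow_latest.sif
  $ singularity exec --nv tensorflow_latest.sif python my_script.py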

HPC Updates - HPC Web Portal (In Progress)
- Open OnDemand HPC web portal: supercomputing, seamlessly; open, interactive HPC via the web
- A modernization of the HPC user experience that makes compute resources much more accessible to a broader audience
- Open source, NSF-funded project: https://openondemand.org
- Coming Spring 2019

Yeti Cluster - Retired
- Yeti Round 1 retired November 2017
- Yeti Round 2 retired March 2019

Data Center Cooling Expansion Update
- A&S, SEAS, EVPR, and CUIT contributed to expanding Data Center cooling capacity
- The cooling expansion project is almost complete; expected completion Spring 2019
- Assures HPC capacity for the next several hardware generations

Business Rules
- Business rules are set by the HPC Operating Committee, which typically meets twice a year and is open to all
- Rules that require revision can be adjusted
- If you have special requests, e.g. a longer wall time or a temporary bump in priority or resources, contact us and we will raise them with the HPC OC chair as needed

Nodes
For each account there are three types of execute nodes:
- Nodes owned by the account
- Nodes owned by other accounts
- Public nodes

Nodes owned by the account: fewest restrictions; priority access for the node owners.

Nodes owned by other accounts: most restrictions; priority access remains with the node owners.

Public nodes: few restrictions; no priority access.
- Habanero public nodes: 25 total (19 standard, 3 high memory, 3 GPU)
- Terremoto public nodes: 7 total (4 standard, 1 high memory, 2 GPU)

Job Wall Time Limits
- Your maximum wall time is 5 days on nodes your group owns and on public nodes
- Your maximum wall time on other groups' nodes is 12 hours

12-Hour Rule
- If your job requests 12 hours of wall time or less, it can run on any node
- If your job requests more than 12 hours of wall time, it can only run on nodes owned by its own account or on public nodes
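In Slurm terms, the wall-time request in your job script (or on the command line) determines which nodes are eligible. A rough sketch, with job.sh as a placeholder script:

  $ sbatch --time=12:00:00 job.sh     # 12 hours or less: may run on any node
  $ sbatch --time=5-00:00:00 job.sh   # up to the 5-day limit: own-group and public nodes only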

Fair Share
- Every job is assigned a priority
- The two most important factors in priority are target share and recent use

Target Share
- Determined by the number of nodes owned by the account
- All members of an account have the same target share

Recent Use
- Number of core-hours used "recently"
- Calculated at the group and user level
- Recent use counts for more than past use; the half-life weight is currently set to two weeks
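As a sketch of what a two-week half-life means, assuming the standard exponential decay used by Slurm's fair-share accounting (an assumption about the local configuration), usage from t days ago is weighted roughly by

  w(t) = 2^{-t/14}

so core-hours used 14 days ago count half as much as today's, 28 days ago a quarter as much, and so on.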

Job Priority
- If recent use is less than the target share, job priority goes up
- If recent use is more than the target share, job priority goes down
- Priority is recalculated every scheduling iteration
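On a Slurm cluster these numbers can be inspected with standard commands; the sketch below assumes the usual multifactor priority setup.

  $ sshare -a           # target shares and decayed recent usage per account and user
  $ sprio -l            # priority factors (including fair share) for pending jobs
  $ squeue -u $USER     # your jobs and their current states in the queue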

Questions regarding business rules?

Support Services - HPC Team
- 6 staff members on the HPC team: 1 manager, 3 senior systems engineers (system administration and Tier 2 support), 2 senior systems analysts (primary providers of Tier 1 support)
- 951 support ticket requests closed in the last 12 months
- Common requests: adding new users, simple and complex software installations, job scheduling and business rule inquiries

Support Services
Email support: hpc-support@columbia.edu

User Documentation
Visit hpc.cc.columbia.edu and click on "Terremoto documentation" or "Habanero documentation".

Office Hours
HPC support staff are available to answer your HPC questions in person on the first Monday of each month.
Where: North West Corner Building, Science & Engineering Library
When: 3-5 pm on the first Monday of the month
RSVP required: https://goo.gl/forms/v2EViPPUEXxTRMTX2

Workshops
Spring 2019 - Intro to HPC Workshop Series
- Tue 2/26: Intro to Linux
- Tue 3/5: Intro to Shell Scripting
- Tue 3/12: Intro to HPC
Workshops are held in the Science & Engineering Library in the North West Corner Building. For a listing of all upcoming events and to register, please visit https://rcfoundations.research.columbia.edu/

Group Information Sessions
HPC support staff can come and talk to your group. Topics can be general and introductory or tailored to your group. Talk to us or contact hpc-support@columbia.edu to schedule a session.

Additional Workshops, Events, and Trainings
- Upcoming Foundations of Research Computing events, including bootcamps (Python, R, Unix, Git), Python user group meetings (Butler Library), and the Distinguished Lecture series
- Research Data Services (Libraries): Data Club (Python, R) and Map Club (GIS)

Questions about support services or training?

HPC Publications Reporting
- Research conducted on the Terremoto, Habanero, Yeti, and Hotfoot machines has led to over 100 peer-reviewed publications in top-tier research journals.
- Reporting publications is critical for demonstrating to University leadership the value of supporting research computing.
- To report new publications that used one or more of these machines, please email srcpac@columbia.edu

Feedback? What feedback do you have about your experience with Habanero and/or Terremoto?

End of Slides
Questions? User support: hpc-support@columbia.edu