Elastic Cyberinfrastructure for Research Computing
Glenn Bresnahan, glenn@bu.edu
Research Computing Services, Boston University
Elastic Cyberinfrastructure
A framework for building shareable computational infrastructure for research computing
Uses the MOC-developed Hardware Isolation Layer (HIL) as a core technology (see the sketch below)
Preserves the best aspects of traditional HPC/HTC computing clusters while providing the elasticity, innovation, and market-driven value of the cloud
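As a rough illustration of how HIL underpins the framework, the sketch below shows how a bare-metal node might be claimed from a shared pool and attached to a project's provisioning network through a HIL-style REST service. The endpoint URL, paths, project, node, and network names are assumptions for illustration, not HIL's documented API.

```python
# Illustrative sketch of allocating a bare-metal node from a shared pool
# through a HIL-style REST service. Base URL, endpoint paths, and field
# names below are assumptions, not HIL's actual API.
import requests

HIL_API = "http://hil.example.mghpcc.org/api"  # hypothetical endpoint
PROJECT = "bu-scc-expansion"                   # hypothetical project name

def allocate_node(node_name: str, network: str) -> None:
    """Claim a free node for a project and attach it to the project's network."""
    # Move the node out of the free pool and into our project.
    r = requests.post(f"{HIL_API}/project/{PROJECT}/connect_node",
                      json={"node": node_name})
    r.raise_for_status()

    # Connect the node's NIC to the project's isolated network so the
    # cluster's own provisioning stack can PXE-boot it.
    r = requests.post(f"{HIL_API}/node/{node_name}/nic/eth0/connect_network",
                      json={"network": network})
    r.raise_for_status()

if __name__ == "__main__":
    allocate_node("node-27", "bu-provisioning-net")
```

The key point of the design is isolation at the hardware/network layer: HIL only moves nodes and wires networks, leaving provisioning and scheduling to whichever environment receives the node.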
Motivation: Experiences with computing environments at the MGHPCC ⃰
Experiences with building and supporting large shared HPC computing environments
Initial experiences in deploying HIL resource pools to extend and expand HPC computing environments
⃰ Massachusetts Green High Performance Computing Center
MGHPCC Computing Environment
MGHPCC provides shared physical infrastructure and WAN connectivity: space, pipe, power, and ping
A small number of independent HPC/HTC clusters per member (BU, Harvard, MIT, NEU, UMass)
Evolving shared systems:
  Northeast ATLAS Tier 2 (NET2) – Traditional HPC/HTC
  Engaging 1, funded by NSF – Mostly traditional HPC
  C3DDB, funded by MLSC ⃰ – Mostly traditional HPC
  Northeast Storage Exchange (NESE) – Cloud (storage)
  Mass Open Cloud – Cloud
⃰ Massachusetts Life Science Center
Traditional HPC clusters (BU example)
Heterogeneous hardware: CPU, GPU, fast fabric, parallel file system, backup/archive
  ~8,000 CPU, 100K GPU, InfiniBand, 3 PB GPFS, backup to campus
Serves a broad base of diverse researchers university-wide
  2,400 users; 524 research projects; 70 departments
Extensive support for researchers
  Training, coding, porting, debugging, application support
Multi-pronged financial model
  Central university support for shared core (compute, storage, infrastructure, services)
  Proportional contributions by campus and/or college
  Condo-style environment for direct researcher investment (buy-in)
  Funding agency support for large-scale, multi-user enhancements
  Buy-in represents ~60% of SCC resources; >$2M investment
Research Computing Challenges
Most research groups do not have the resources or desire to manage their own computing environment
Most researchers are not from traditional computational disciplines
  PIs represent 74 different departments and centers; traditional computational departments: Physics, Chemistry, Astronomy, Engineering, Geosciences
Most research groups are small
  52% of research groups have 3 or fewer users; 70% have 5 or fewer
Computing in multiple environments is hard
  Differences in software stacks, job schedulers, and policies
  Managing multiple copies of data is challenging
  IRB and compliance issues
Support is critical
  The majority of IT/RCS staff effort is user support
  Supporting individualized computing environments is not sustainable
Financial models are important
  Not all dollars are equal (CapEx vs. OpEx)
Distribution of individuals per project [chart]
Elastic Cyberinfrastructure for Regional Research Computing
Multiple resource pools
  Private HPC clusters, shared HPC clusters, MOC HIL pools, MOC IaaS pools
  Storage pools (e.g., NESE)
Robust data center fabric
Compute resources shift from HIL pools to HPC or IaaS clusters on demand (see the rebalancing sketch below)
Shared object storage
Common data management, discovery, and publishing (e.g., Dataverse)
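A minimal sketch of the on-demand shifting of compute resources, assuming Slurm as the cluster scheduler (the deployed clusters may use other schedulers) and hypothetical queue-depth thresholds; the borrow/release callbacks stand in for the HIL allocation steps sketched earlier.

```python
# Sketch of an elasticity policy: when the batch queue backs up, borrow
# bare-metal nodes from the shared HIL pool; when demand drops, release
# them. Slurm and the thresholds below are assumptions for illustration.
import subprocess

PENDING_THRESHOLD = 500  # queued-job count that triggers expansion (assumed)
IDLE_THRESHOLD = 50      # queued-job count below which we contract (assumed)

def pending_jobs() -> int:
    """Count pending jobs in the Slurm queue."""
    out = subprocess.run(["squeue", "-h", "-t", "PENDING"],
                         capture_output=True, text=True, check=True)
    return len(out.stdout.splitlines())

def rebalance(borrow_node, release_node) -> None:
    """Expand or contract the cluster based on queue depth.

    borrow_node/release_node are callbacks that drive the HIL allocation
    steps (see the earlier sketch) and register or deregister the node
    with the cluster's provisioning system and scheduler.
    """
    backlog = pending_jobs()
    if backlog > PENDING_THRESHOLD:
        borrow_node()    # pull a bare-metal node from the shared HIL pool
    elif backlog < IDLE_THRESHOLD:
        release_node()   # return a node to the shared pool
```

In practice such a loop would run periodically per environment, so each cluster grows or shrinks against the shared pools without manual intervention.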
ECI Features
Computing environments, including traditional Linux clusters, are elastic, expanding and contracting on demand
New bare-metal or virtualized computing environments and services can be created dynamically, from scratch or by stitching together existing resources and services
Shared storage allows researchers to create cross-environment workflows, with analysis performed on data in situ (see the storage sketch below)
Data management, sharing, discovery, provenance, and dissemination are provided as common services
Financial models acknowledge the practical need for multiple ownership and researcher investment, and afford the development of more sophisticated economic models
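To make the in-situ, cross-environment workflow idea concrete, here is a sketch that reads shared data through an S3-compatible object-store interface. The assumption that the storage pool (e.g., NESE) exposes such a gateway, along with the endpoint, bucket, credentials, and key names, is illustrative only.

```python
# Sketch of cross-environment, in-situ access to a shared object store.
import boto3

# Hypothetical S3-compatible gateway in front of the shared storage pool.
s3 = boto3.client(
    "s3",
    endpoint_url="https://nese-gw.example.mghpcc.org",
    aws_access_key_id="PROJECT_KEY",
    aws_secret_access_key="PROJECT_SECRET",
)

# Any environment (an HPC cluster node, a MOC IaaS instance) reads the same
# object, so workflows analyze data where it lives instead of copying it.
obj = s3.get_object(Bucket="shared-genomics", Key="runs/2017-05/sample42.vcf")
data = obj["Body"].read()
print(f"fetched {len(data)} bytes")
```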
Status
Current prototypes with ATLAS, the NEU HIL pool, Engaging 1, and C3DDB
NESE object storage project ramping up
Funding request to enhance MGHPCC networking
Funding opportunities to expand deployment