Published by Florence Ball. Modified over 9 years ago.
1
DRAFT
Institutional Research Computing at WSU: A community-based approach
Governance model, access policy, and acquisition strategy for consideration by the ITSAC Research Computing Sub-committee
August 5, 2015
Washington State University
2
Institutional research computing platforms managed by central IT (WSU/Pullman)
http://officeofresearch.wsu.edu/researchcomputing/

Current HPC platform, IBM iDataPlex (2011):
—Compute nodes: 164 CPU nodes (12 cores – 24 GB)
—Large memory nodes: 3 CPU nodes (32 cores – 512 GB)
—Storage: No physical local disk at compute nodes
—Network: Infiniband switch (1); 40 Gb switch (1)

WSU Kamiak (pilot) cluster (2015 +):
—Compute nodes: 32 CPU nodes (20 cores – 256 GB / 512 GB); 2 NVIDIA GPU nodes; 1 Intel Xeon Phi node
—Large memory nodes: 2 TB RAM server (60 cores)
—Storage: NetApp file storage (633 TB)
—Network: Infiniband switch (1); 40 Gb switch (1); 10 Gb switches for network storage (4); 10 GbE switches for network storage (3)

Figure: pilot Kamiak cluster
3
The WSU pilot condominium Kamiak cluster: A $1.3M / 3-rack system that balances compute and data-intensive research needs

Funding: $1.3M = $0.8M (CAHNRS) + $0.5M (VPR)
— Hardware: $1.18M
— Operation: $0.12M = total for 5 years
Location:
— WSU/Pullman: ITB/Room 1010
Operating funds:
— 2 FTEs:
  — Systems administrator for HPC
  — IT consultant for research computing (User Support Group)
Schedule:
— Procurement: January 2015
— Delivery: April 2015
— Installation and testing: January – October 2015
— Open to early adopters: October 2015
WSU institutional research computing:
— http://officeofresearch.wsu.edu/researchcomputing/

Figure: rack layout (Compute / Management / Storage); 1 rack = 16 sq. ft.
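The funding figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the slide's numbers, expressed in thousands of dollars; the variable names and the even per-year split of the operating budget are assumptions made for illustration.

```python
# Figures from the slide, in thousands of dollars ($K).
FUNDING_SOURCES_K = {"CAHNRS": 800, "VPR": 500}
SPENDING_K = {"hardware": 1180, "operation_5yr": 120}

RACKS = 3
SQ_FT_PER_RACK = 16
OPERATION_YEARS = 5

total_funding_k = sum(FUNDING_SOURCES_K.values())
total_spending_k = sum(SPENDING_K.values())

# Assumption: the operating budget is spread evenly over the five years.
operation_per_year_k = SPENDING_K["operation_5yr"] / OPERATION_YEARS

floor_space_sq_ft = RACKS * SQ_FT_PER_RACK

print(f"Total funding:       ${total_funding_k}K")
print(f"Total spending:      ${total_spending_k}K")
print(f"Operation per year:  ${operation_per_year_k:.0f}K")
print(f"Rack floor space:    {floor_space_sq_ft} sq. ft.")
```

Both totals come to $1,300K, so the hardware and operation lines account for the full $1.3M.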
4
The WSU full-size condominium Kamiak cluster (phase 1)
— 9-rack system: Equipment and research grants; start-up funds; and other contributions from faculty, research staff, and academic units

Figure labels: WSU pilot Kamiak cluster; Compute / Management / Storage
5
Principles of condominium research computing

What?
— Condominium computing provides users with shares of institutional cluster-based computer resources.
How?
— Institutional research computing resources are managed and administered by WSU central IT and co-located in the IT building (ITB)
— Investors purchase nodes that become part of the institutional condominium cluster
— Investors retain ownership of the hardware they purchase
— Investors have “on-demand” access to the resources they own
— Unused resources are dynamically harvested and made available to general users
— Multi-tiered queuing system implements different access levels: investors, general users, etc.
Who?
— Available to the entire WSU community as an institutional resource
Why (benefits)?
— Provides access to larger-scale and leveraged cyber-infrastructure for enhanced productivity (“speed of science”)
— Provides mid-scale HPC resources for production runs and development testbed
— Prepares for extreme-scale computing at national facilities
— Enables coordinated software installation, implementation, development, and optimization
— Integrates systems administration roles and responsibilities at the institutional level
— Provides higher level of user support for application domain scientists
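The ownership and harvesting principles above can be pictured with a small data model. This is a minimal sketch, not Kamiak's actual scheduler: the `Node` class, the node names, and the investor names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str           # hypothetical node name
    owner: str          # investor who purchased the node (hypothetical)
    busy: bool = False  # True while a job is running on the node


def owned_by(nodes, investor):
    """Nodes an investor retains ownership of."""
    return [n for n in nodes if n.owner == investor]


def harvestable(nodes):
    """Idle nodes, regardless of owner, that can be lent to general users."""
    return [n for n in nodes if not n.busy]


cluster = [
    Node("cn01", "lab_a", busy=True),
    Node("cn02", "lab_a"),
    Node("cn03", "lab_b"),
]

print([n.name for n in owned_by(cluster, "lab_a")])
print([n.name for n in harvestable(cluster)])
```

Here `lab_a` still owns `cn01` and `cn02`, while the idle nodes `cn02` and `cn03` are the ones open to general users.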
6
Sponsors, investors, and general users
(Table columns: What? / Who? / How? / Comments)

Sponsors
— Who: Colleges; academic units; Office of Research
— How / comments:
  — IT and research computing staffing: systems administrator; user support for research computing
  — Possible contributions to cyber-infrastructure

Investors (Owners)
— Who: Faculty and researchers who require predictable computational availability
— How / comments:
  — Purchase “menu” equipment (compute nodes, storage, etc.)
  — WSU/ITS purchases the nodes, deploys them in the shared infrastructure, and operates them for a fixed number of years
  — Once installed, purchased nodes become part of the Kamiak cluster
  — Cost for a node is price of equipment + markup for IT systems administration and user support

General users
— Who: Entire WSU community
— How / comments:
  — Sponsored by their administrative College
  — Unused compute cycles in the condominium are available for general users
  — Access to “backfill” queue by general users can be preempted at any time by investors’ priority access

Institution
— Who: Office of Provost; Office of Research; Office of Finance; WSU/ITS; Colleges
— How / comments:
  — Physical infrastructure: equipment room space; power, cooling, etc.; racks
  — Possible contributions to cyber-infrastructure
7
Proposed access policy
(Table columns: What? / Who? / How?)

Sponsors
— Who: Colleges; academic units; Office of Research

Investors (Owners)
— Who:
  — Faculty and researchers who require predictable computational availability
  — Faculty and researchers who contribute hardware (compute nodes, storage, etc.) to the institutional shared infrastructure
— How:
  — Investors have “on-demand” access to their own nodes through a dedicated “batch” queue.
  — All jobs submitted by investors to Kamiak in the "batch" queue will run on an investor’s own nodes.
  — Investors have access to their dedicated “batch” queue and to the general “backfill” queue at increased priority in proportion to their investment.

General users
— Who: Entire WSU community
— How:
  — Kamiak general users can submit jobs to the "backfill" queue where they will execute on idle CPUs wherever they are in the cluster.
  — Kamiak implements several policies to ensure fair access.
  — Backfill jobs may be preempted at any time if “investors” need access to their resources.

Institution
— Who: Office of Provost; Office of Research; Office of Finance; WSU/ITS; Colleges
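The interaction between the "batch" and "backfill" queues can be sketched as a toy scheduler. This is an illustration under stated assumptions, not the Kamiak configuration: the class and method names are hypothetical, preempted jobs are assumed to be requeued rather than cancelled, and the investment-proportional backfill priority is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    name: str
    user: str
    queue: str  # "batch" or "backfill"


@dataclass
class Node:
    name: str
    owner: str
    job: Optional[Job] = None  # None means the node is idle


@dataclass
class Cluster:
    nodes: list
    waiting: list = field(default_factory=list)  # jobs waiting for a node

    def submit_backfill(self, job):
        """General users run on any idle node, wherever it is in the cluster."""
        for node in self.nodes:
            if node.job is None:
                node.job = job
                return node.name
        self.waiting.append(job)
        return None

    def submit_batch(self, job):
        """Investors run on their own nodes, preempting backfill jobs if needed."""
        own = [n for n in self.nodes if n.owner == job.user]
        # Prefer an idle node that the investor owns.
        for node in own:
            if node.job is None:
                node.job = job
                return node.name
        # Otherwise evict a backfill job from one of the investor's nodes.
        for node in own:
            if node.job.queue == "backfill":
                self.waiting.append(node.job)  # assumption: preempted jobs are requeued
                node.job = job
                return node.name
        # Every owned node already runs one of the investor's batch jobs.
        self.waiting.append(job)
        return None


cluster = Cluster([Node("cn01", "lab_a"), Node("cn02", "lab_b")])
first = cluster.submit_backfill(Job("sim1", "student", "backfill"))
second = cluster.submit_backfill(Job("sim2", "student", "backfill"))
owner_node = cluster.submit_batch(Job("prod", "lab_a", "batch"))
print(first, second, owner_node, [j.name for j in cluster.waiting])
```

In this run the two backfill jobs fill both nodes, and the investor's batch job then takes back `cn01`, sending `sim1` to the waiting list.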
8
Establishing a Governance Board for condominium computing

Purpose:
— Ensures that research computing assets (systems, cyber-infrastructure, processes, access policy, etc.) are implemented and used according to agreed-upon policies and procedures
— Ensures that research computing assets are properly controlled and maintained
— Ensures that research computing assets are providing value to WSU and the university’s research community.
— Reviews applications for use of Kamiak resources
— Arbitrates special requests for utilization of Kamiak resources
Chairmanship and membership of the Kamiak Governance Board (proposal):
— Chair: Vice President for IT Services and CIO
— Membership: Investors (faculty, staff, etc.) and IT personnel
9
WSU is committing resources to establish a user support group for application software implementation, optimization, and development

Establishment of a software user support group: “IT Research Computing Consultant”
— Focus on research computing
— Provide assistance in software installation, development, and optimization
— Broad spectrum of application domains:
  — Materials science and engineering
  — Chemistry and chemical engineering
  — Bioinformatics
  — Genomics
  — Atmospheric research
  — Parallel scientific computing
— Installation and management of software libraries
— Development of documentation and training material for the effective use of institutional HPC resources
Support:
— Institutional support from Colleges (CAS, CAHNRS, and VCEA) and the Office of the VPR (1 FTE)
10
Hardware acquisition strategy

Principles of WSU’s proposed business model:
— Cost to users is less than purchasing and operating stand-alone equipment
— Cost to WSU is less than users acting independently
Cost model:
— Based on 5-year lifecycle
— Price ranges are driven by how systems administration and infrastructure costs are covered
— Includes full 5-year hardware maintenance
— Includes costs for most IT-related infrastructure – 10GbE local network, 10GbE connection to HSSRC, FDR Infiniband, management nodes, etc.
— Flexible hardware configurations on memory and CPU

(Table columns: Equipment / Specifications / Price range ($K), 5-year)

Standard Compute Node
— Specifications: 2x Intel E5-2680v2, 20 total cores / 40 total threads; 256GB RAM, 400GB SSD, 10GbE, FDR Infiniband
— Price range ($K): 7.5 – 15

Large Compute Node
— Specifications: 2x Intel E5-2680v2, 20 total cores / 40 total threads; 512GB RAM, 400GB SSD, 10GbE, FDR Infiniband
— Price range ($K): 17 – 24

Large Memory Node
— Specifications: 4x Intel E7-4880v2, 60 total cores / 120 total threads; 2TB RAM, 10Gb, FDR Infiniband
— Price range ($K): 57 – 66

GPU Compute Node
— Specifications: 2x Intel E5-2670v3, 24 total cores / 48 total threads; 2 Tesla K80 GPUs / 9984 CUDA cores; 256GB RAM, 400GB SSD, 10GbE, FDR Infiniband
— Price range ($K): 16 – 22

150 TB storage
— Specifications: SSD, 10K, 7.2K tiers
— Price ($K): 50
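One way to compare the menu items is to normalize each price range to dollars per CPU core per year of the 5-year lifecycle. The sketch below uses only the core counts and prices from the table; the metric itself, the function name, and the dictionary layout are assumptions for illustration, and the per-core figure for the GPU node ignores the value of its GPUs.

```python
LIFECYCLE_YEARS = 5

# (CPU cores, low price in $K, high price in $K), taken from the price table.
NODES = {
    "Standard Compute Node": (20, 7.5, 15),
    "Large Compute Node": (20, 17, 24),
    "Large Memory Node": (60, 57, 66),
    "GPU Compute Node": (24, 16, 22),  # GPU value is not separated out
}

STORAGE_TB = 150
STORAGE_PRICE_K = 50


def cost_per_core_year(cores, price_k):
    """Dollars per CPU core per year over the 5-year lifecycle."""
    return price_k * 1000 / (cores * LIFECYCLE_YEARS)


for name, (cores, low_k, high_k) in NODES.items():
    low = cost_per_core_year(cores, low_k)
    high = cost_per_core_year(cores, high_k)
    print(f"{name}: ${low:.0f} to ${high:.0f} per core-year")

storage_per_tb_year = STORAGE_PRICE_K * 1000 / (STORAGE_TB * LIFECYCLE_YEARS)
print(f"Storage: ${storage_per_tb_year:.0f} per TB-year")
```

On this measure a standard compute node costs $75 to $150 per core-year, and a large memory node $190 to $220.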
11
Condominium computing: Adopting best practices from successful implementations

Several successful case studies in academia:
— Clemson University: http://www.clemson.edu/ccit/about/departments/citi/
— Purdue: https://www.rcac.purdue.edu
— UC Berkeley: http://research-it.berkeley.edu/services/high-performance-computing/institutional-and-condo-computing
— UW: http://escience.washington.edu/content/hyak-0

Successful implementation of condominium computing: features common to all models
— Provides benefits to the research community.
— Enables computing “at-scale” by allowing a surge in computing capability.
— Reduces the time and money spent on maintaining computational resources.
— Researchers have confidence in the center being able to meet all their needs.
— Center establishes a proven track record of providing resources, so that researchers focus on their research without worrying about maintaining their hardware
— Sustained institutional support for embedding infrastructure: initial condo system, upgrades, re-capitalization, network, support personnel, “cyber-institute”, etc.

Clemson Palmetto Condo HPC
Clemson Computing and Information Technology:
— Provides cyber-infrastructure resources and HPC capabilities
— Provides advanced knowledge infrastructure through integration of HPC, networks, data visualization, and storage architecture.
Palmetto condo HPC:
— 85 Tflops peak performance
— 1,541 nodes / 8 cores per node
— 120 TB high-performance storage
— Condo owners (“tenants”) buy "preemption units." A preemption unit lets an owner's job preempt general jobs when it needs their resources to run, and protects the owner's job from being preempted itself.
— Unused owners’ resources are dynamically harvested for general users
12
Back-up