DRAFT

Institutional Research Computing at WSU: A community-based approach
Governance model, access policy, and acquisition strategy for consideration by the ITSAC Research Computing Sub-committee
August 5, 2015
Washington State University

DRAFT

Institutional research computing platforms managed by central IT (WSU/Pullman):

Current HPC platform: IBM iDataPlex (2011)
— Compute nodes: 164 CPU nodes (12 cores, 24 GB)
— Large-memory nodes: 3 CPU nodes (32 cores, 512 GB)
— Storage: no physical local disk at compute nodes
— Network: InfiniBand switch (1); 40 Gb switch (1)

WSU Kamiak (pilot) cluster (2015+)
— Compute nodes: 32 CPU nodes (20 cores, 256 GB / 512 GB); 2 NVIDIA GPU nodes; 1 Intel Xeon Phi node
— Large-memory node: 2 TB RAM server (60 cores)
— Storage: NetApp file storage (633 TB)
— Network: InfiniBand switch (1); 40 Gb switch (1); 10 Gb switches for network storage (4); 10 GbE switches for network storage (3)

DRAFT

The WSU pilot condominium Kamiak cluster: a $1.3M, 3-rack system (1 rack = 16 sq. ft.) that balances compute- and data-intensive research needs

Funding: $1.3M = $0.8M (CAHNRS) + $0.5M (VPR)
— Hardware: $1.18M
— Operation: $0.12M total for 5 years

Location:
— WSU/Pullman: ITB/Room 1010

Operating funds:
— 2 FTEs:
  — Systems administrator for HPC
  — IT consultant for research computing (User Support Group)

Schedule:
— Procurement: January 2015
— Delivery: April 2015
— Installation and testing: January – October 2015
— Open to early adopters: October 2015

WSU institutional research computing: [Figure: pilot Kamiak rack layout, racks labeled Compute / Management / Storage]

DRAFT

The WSU full-size condominium Kamiak cluster (phase 1): a 9-rack system funded by equipment and research grants, start-up funds, and other contributions from faculty, research staff, and academic units

[Figure: 9-rack layout extending the WSU pilot Kamiak cluster; racks labeled Compute / Management / Storage]

DRAFT

Principles of condominium research computing

What?
— Condominium computing provides users with shares of institutional cluster-based computing resources.

How?
— Institutional research computing resources are managed and administered by WSU central IT and co-located in the IT building (ITB).
— Investors purchase nodes that become part of the institutional condominium cluster.
— Investors retain ownership of the hardware they purchase.
— Investors have "on-demand" access to the resources they own.
— Unused resources are dynamically harvested and made available to general users.
— A multi-tiered queuing system implements different access levels (investors, general users, etc.); a minimal scheduling sketch follows this slide.

Who?
— Available to the entire WSU community as an institutional resource.

Why (benefits)?
— Provides access to larger-scale, leveraged cyberinfrastructure for enhanced productivity ("speed of science").
— Provides mid-scale HPC resources for production runs and a development testbed.
— Prepares researchers for extreme-scale computing at national facilities.
— Enables coordinated software installation, implementation, development, and optimization.
— Integrates systems administration roles and responsibilities at the institutional level.
— Provides a higher level of user support for application domain scientists.
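The harvesting and preemption behavior above can be made concrete with a small model. This is an illustrative Python sketch, not Kamiak's actual scheduler; the class names and queue behavior are assumptions chosen to mirror the stated policy.

# Illustrative model of condominium scheduling (hypothetical; not
# Kamiak's actual scheduler). Investors get on-demand access to the
# nodes they own; idle owned nodes are harvested for general
# "backfill" jobs, which are preempted whenever the owner needs them.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    owner: str                      # investor who purchased this node
    running: Optional[str] = None   # "owner", "backfill", or None (idle)

class CondoCluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def submit_owner_job(self, investor):
        """Investor jobs run on the investor's own nodes and may
        preempt any backfill job found running there."""
        for node in self.nodes:
            if node.owner == investor and node.running != "owner":
                if node.running == "backfill":
                    print(f"preempting backfill job on {investor}'s node")
                node.running = "owner"
                return node
        return None  # all of this investor's nodes run owner jobs already

    def submit_backfill_job(self):
        """General users harvest any idle node, regardless of owner."""
        for node in self.nodes:
            if node.running is None:
                node.running = "backfill"
                return node
        return None  # no idle nodes; the job waits in the backfill queue

cluster = CondoCluster([Node("smith"), Node("smith"), Node("lee")])
cluster.submit_backfill_job()       # general job lands on an idle node
cluster.submit_owner_job("smith")   # preempts that backfill job if needed

In production this policy would live in the batch scheduler's partition and preemption configuration; the sketch only captures the decision rule.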

DRAFT

Sponsors, investors, and general users

Sponsors
— Who: Colleges, academic units, Office of Research
— How: IT and research computing staffing (systems administrator, user support for research computing)
— Comments: Possible contributions to cyberinfrastructure

Investors (owners)
— Who: Faculty and researchers who require predictable computational availability
— How: Purchase "menu" equipment (compute nodes, storage, etc.); WSU/ITS purchases the nodes, deploys them in the shared infrastructure, and operates them for a fixed number of years; once installed, purchased nodes become part of the Kamiak cluster
— Cost for a node: price of equipment plus a markup for IT systems administration and user support

General users
— Who: Entire WSU community, sponsored by their administrative college
— How: Unused compute cycles in the condominium are available to general users; "backfill" queue access can be preempted at any time by investors' priority access

Institution
— Who: Office of the Provost, Office of Research, Office of Finance, WSU/ITS, colleges
— How: Physical infrastructure (equipment room space; power, cooling, etc.; racks)
— Comments: Possible contributions to cyberinfrastructure

DRAFT

Proposed access policy

Sponsors
— Colleges, academic units, Office of Research

Investors (owners)
— Faculty and researchers who require predictable computational availability and who contribute hardware (compute nodes, storage, etc.) to the institutional shared infrastructure
— Investors have "on-demand" access to their own nodes through a dedicated "batch" queue.
— All jobs submitted by investors to Kamiak in the "batch" queue run on the investor's own nodes.
— Investors also have access to the general "backfill" queue at increased priority in proportion to their investment (see the sketch after this slide).

General users
— Entire WSU community
— Kamiak general users can submit jobs to the "backfill" queue, where they execute on idle CPUs wherever those are in the cluster.
— Kamiak implements several policies to ensure fair access.
— Backfill jobs may be preempted at any time if investors need access to their resources.

Institution
— Office of the Provost, Office of Research, Office of Finance, WSU/ITS, colleges
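One way to read "priority in proportion to investment" is as a boost on top of the general-user baseline. A minimal sketch, assuming a simple linear weighting; the formula, base, and weight are illustrative assumptions, not Kamiak policy parameters.

# Hypothetical illustration of backfill priority "in proportion to
# investment"; the linear formula and weight are assumptions.
def backfill_priority(nodes_owned: int, total_invested_nodes: int,
                      base: float = 1.0, weight: float = 4.0) -> float:
    """General users get the base priority; investors get a boost
    proportional to their share of the condominium's purchased nodes."""
    if total_invested_nodes == 0:
        return base
    return base + weight * (nodes_owned / total_invested_nodes)

print(backfill_priority(0, 32))   # 1.0 -> general user
print(backfill_priority(4, 32))   # 1.5 -> investor owning 4 of 32 nodes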

DRAFT

Establishing a Governance Board for condominium computing

Purpose:
— Ensures that research computing assets (systems, cyberinfrastructure, processes, access policy, etc.) are implemented and used according to agreed-upon policies and procedures
— Ensures that research computing assets are properly controlled and maintained
— Ensures that research computing assets provide value to WSU and the university's research community
— Reviews applications for use of Kamiak resources
— Arbitrates special requests for utilization of Kamiak resources

Chairmanship and membership of the Kamiak Governance Board (proposal):
— Chair: Vice President for IT Services and CIO
— Membership: Investors (faculty, staff, etc.) and IT personnel

DRAFT

WSU is committing resources to establish a user support group for application software implementation, optimization, and development

Establishment of a software user support group: "IT Research Computing Consultant"
— Focus on research computing
— Provide assistance in software installation, development, and optimization
— Broad spectrum of application domains:
  — Materials science and engineering
  — Chemistry and chemical engineering
  — Bioinformatics
  — Genomics
  — Atmospheric research
  — Parallel scientific computing
— Installation and management of software libraries
— Development of documentation and training material for the effective use of institutional HPC resources

Support:
— Institutional support from colleges (CAS, CAHNRS, and VCEA) and the Office of the VPR (1 FTE)

DRAFT

Hardware acquisition strategy

Principles of WSU's proposed business model:
— Cost to users is less than purchasing and operating stand-alone equipment
— Cost to WSU is less than users acting independently

Cost model (a worked cost example follows this slide):
— Based on a 5-year lifecycle
— Price ranges are driven by how systems administration and infrastructure costs are covered
— Includes full 5-year hardware maintenance
— Includes costs for most IT-related infrastructure: 10GbE local network, 10GbE connection to HSSRC, FDR InfiniBand, management nodes, etc.
— Flexible hardware configurations on memory and CPU

Equipment menu (5-year price ranges, $K):
— Standard Compute Node: 2x Intel E5-2680v2 (20 total cores / 40 total threads), 256 GB RAM, 400 GB SSD, 10GbE, FDR InfiniBand: 7.5 – …
— Large Compute Node: same platform with 512 GB RAM, 400 GB SSD, 10GbE, FDR InfiniBand: … – 24
— Large Memory Node: 4x Intel E7-4880v2 (60 total cores / 120 total threads), 2 TB RAM, 10 Gb, FDR InfiniBand: 57 – 66
— GPU Compute Node: 2x Intel E5-2670v3 (24 total cores / 48 threads), 2 Tesla K80 GPUs (9,984 CUDA cores), 256 GB RAM, 400 GB SSD, 10GbE, FDR InfiniBand: 16 – …
— … TB storage (SSD, 10K, and 7.2K tiers): 50
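To make the menu concrete, a back-of-the-envelope sketch of the 5-year cost model. The support markup fraction is a placeholder assumption; the slides only state that a node's cost is the equipment price plus a markup for systems administration and user support.

# Back-of-the-envelope 5-year condominium cost model. The 15% support
# markup is an assumed placeholder, not a published WSU figure.
def annualized_node_cost_k(equipment_price_k: float,
                           support_markup: float = 0.15,
                           lifecycle_years: int = 5) -> float:
    """Annualized cost ($K/year) of an invested node, spreading the
    equipment price plus the administration/support markup over the
    lifecycle (5-year hardware maintenance is assumed bundled in)."""
    return equipment_price_k * (1.0 + support_markup) / lifecycle_years

# Standard compute node at the low end of the quoted range ($7.5K):
print(f"{annualized_node_cost_k(7.5):.2f} $K/year")  # 1.72 $K/year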

DRAFT

Condominium computing: Adopting best practices from successful implementations

Several successful case studies in academia:
— Clemson University
— Purdue
— UC Berkeley: http://research-it.berkeley.edu/services/high-performance-computing/institutional-and-condo-computing
— UW

Successful implementations of condominium computing share common features across all models:
— Provides benefits to the research community
— Enables computing "at scale" by allowing a surge in computing capability
— Reduces the time and money spent on maintaining computational resources
— Researchers have confidence in the center's ability to meet all their needs
— The center establishes a proven track record of providing resources, so researchers focus on their research without worrying about maintaining their hardware
— Sustained institutional support for embedding infrastructure: initial condo system, upgrades, re-capitalization, network, support personnel, "cyber-institute", etc.

Example: Clemson Palmetto condo HPC (Clemson Computing and Information Technology)
— Provides cyberinfrastructure resources and HPC capabilities
— Provides advanced knowledge infrastructure through integration of HPC, networks, data visualization, and storage architecture
— Palmetto condo HPC: 85 Tflops peak performance; 1,541 nodes with 8 cores per node; 120 TB high-performance storage
— Condo owners ("tenants") buy "preemption units": a preemption unit gives an owner job the ability to preempt general jobs in order to acquire the resources it needs to run, and prevents the owner job itself from being preempted (see the sketch after this slide)
— Unused owners' resources are dynamically harvested for general users
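The preemption-unit rule can be captured in a few lines. A sketch under the assumption that holding units simply marks a job as privileged; how Palmetto actually accounts units against cores is not described on the slide.

# Sketch of Clemson-style "preemption units" (terminology from the
# slide above; the decision rule below is an illustrative assumption).
from dataclasses import dataclass

@dataclass
class Job:
    user: str
    preemption_units: int = 0  # bought by condo owners; 0 for general users

def can_preempt(incoming: Job, running: Job) -> bool:
    """An owner job backed by preemption units may displace a general
    job, and a unit-backed job is itself protected from preemption."""
    return incoming.preemption_units > 0 and running.preemption_units == 0

owner_job = Job("tenant-lab", preemption_units=2)
general_job = Job("grad-student")
print(can_preempt(owner_job, general_job))   # True
print(can_preempt(general_job, owner_job))   # False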

DRAFT

Back-up
Washington State University