University of California Governance Models for Research Computing Western Educause Conference San Francisco May 2007
Copyright University of California This work is the intellectual property of the Regents of the University of California. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the Regents. To disseminate otherwise or to republish requires written permission from the Regents.
Presenters
–David Walker, UC Office of the President: Director, Advanced Technologies, Information Resources & Communications
–Heidi Schmidt, UC San Francisco: Director, Customer Support Services, Office of Academic & Administrative Information Systems
–Ann Dobson, UC Berkeley: Associate Director, Client Services, Information Services & Technology
–Jason Crane, PhD, UC San Francisco: Programmer Analyst, Radiology
Perspectives System-wide initiatives Campus models Central campus IT services Shared research computing facility
UC-Wide Activities in Support of Research and Scholarship David Walker Office of the President University of California
Information Technology Guidance Committee (ITGC) Identify strategic directions for IT investments that enable campuses to meet their distinctive needs more effectively while supporting the University’s broader mission, academic programs and strategic goals. Promote the deployment of information technology services to support innovation and the enhancement of academic quality and institutional competitiveness. Leverage IT investment and expertise to fully exploit collective and campus-specific IT capabilities.
Planning Inputs Broad consultation with: UC stakeholders Campus and system wide governing bodies Coordination with related academic and administrative planning processes Environmental scans and competitive analysis
ITGC Timetable
–Launch the ITGC: Feb 2006
–Interim work group reports: Nov 2006
–Summary report to Provost: Jun 2007
–Review and comment: Oct 2007
–Presentations to President, COC, Regents, Academic Council: Nov 2007
Areas Addressed by the ITGC Research and Scholarship Teaching, Learning, and Student Experience University-Wide Administrative and Business Systems Critical Success Factors (e.g., common architecture, end-user support, collaboration infrastructure)
Potential Recommendations for Research and Scholarship Advanced Network Services UC Grid Academic Cyberinfrastructure
Advanced Network Services Upgrade all campus routed Internet connections to 10 Gbps Pilot new network services Non-routed interconnects Lightpath-based, application-dedicated bandwidth End-to-end performance tools, instrumentation, and support
UC Grid
–Enable resource sharing, based on UCTrust
–Implement comprehensive storage services: large-scale computation, project collaboration, (very) long-term preservation
–Explore UC-provided resources: base-level compute and storage, data center space, support services
Academic Cyberinfrastructure Ubiquitous access to services critical to research, scholarship, and instruction Collaboration tools and services Tools for creation and dissemination of electronic information Digital preservation Grant application / administration tools End-user support services
UCTrust A unified identity and access management infrastructure for the University of California Based on InCommon and Shibboleth
More Information IT Guidance Committee UCTrust
Campus Governance Models Heidi Schmidt University of California, San Francisco Office of Academic & Administrative Information Systems
University of California = Diversity Campus governance bodies that may influence research computing include: Academic Senate IT advisory boards & committees Research advisory boards & committees Discipline-based governance groups
Campus-wide Perspectives Ann Dobson University of California, Berkeley Information Services & Technology
UC Berkeley Desire to provide central services Desire to meet needs of less technical, less resource-rich researchers (e.g. social scientists) Tension between one-time grant funding and ongoing expenses Desire to optimize use of resources Need commodity model (one size fits all) If we build it, will they come?
Requirements for LBNL Clusters
Systems in the SCS Program must meet the following requirements to be eligible for support:
–IA32 or AMD64 architecture
–Participating cluster must have a minimum of 8 compute nodes
–Dedicated cluster architecture; no interactive logins on compute nodes
–Red Hat Linux operating system
–Warewulf cluster implementation toolkit
–Sun Grid Engine scheduler
–All slave nodes reachable only from the master node
Clusters that will be located in the computer room must meet the following additional requirements:
–Rack-mounted hardware required; desktop form factor hardware not allowed
–Equipment to be installed in APC NetShelter VX computer racks; prospective cluster owners should include the cost of these racks in their budget
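The eligibility rules above read like a checklist, so they can be sketched as a small validation function. This is only an illustration: the `ClusterSpec` class, its field names, and `unmet_requirements` are hypothetical and are not part of any LBNL tool. The rack requirements for the computer room are left out.

```python
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    """Hypothetical description of a cluster proposed for support."""
    architecture: str
    compute_nodes: int
    interactive_logins_on_compute_nodes: bool
    operating_system: str
    toolkit: str
    scheduler: str
    slaves_reachable_only_from_master: bool


def unmet_requirements(spec: ClusterSpec) -> list:
    """Return the slide's requirements that the proposed cluster does not meet."""
    problems = []
    if spec.architecture not in {"IA32", "AMD64"}:
        problems.append("architecture must be IA32 or AMD64")
    if spec.compute_nodes < 8:
        problems.append("at least 8 compute nodes are required")
    if spec.interactive_logins_on_compute_nodes:
        problems.append("no interactive logins on compute nodes")
    if spec.operating_system != "Red Hat Linux":
        problems.append("operating system must be Red Hat Linux")
    if spec.toolkit != "Warewulf":
        problems.append("cluster toolkit must be Warewulf")
    if spec.scheduler != "Sun Grid Engine":
        problems.append("scheduler must be Sun Grid Engine")
    if not spec.slaves_reachable_only_from_master:
        problems.append("slave nodes must be reachable only from the master node")
    return problems


# Example values are made up for illustration.
eligible = ClusterSpec("AMD64", 16, False, "Red Hat Linux",
                       "Warewulf", "Sun Grid Engine", True)
too_small = ClusterSpec("PowerPC", 4, False, "Red Hat Linux",
                        "Warewulf", "Sun Grid Engine", True)
```

An empty list means the proposal passes every check listed on the slide.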
General Purpose Cluster
–Hardware donated by Sun
–29 Sun v20z servers (1 head node, 28 compute nodes)
–Each server: 2 CPUs, 2 GB RAM, one 73 GB hard drive
–Gigabit Ethernet interconnect
–NFS file storage on SAN
–Housed in central campus data center
General Purpose Cluster (cont.)
–Cluster management provided by LBNL’s Cluster Team
–Operating system: CentOS 4.4 (x86_64)
–MPI version: Open MPI 1.1
–Scheduler: Torque
–Compiler: GCC
–Cluster provided on a recharge basis
Collocation and Support Collocation in central data center $8/RU/month + power charge Cluster management support Varies based on number of nodes About $1500/month for 30-node cluster Assistance in preparing grant requests
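A rough monthly estimate can be worked out from the rates on this slide. The sketch below takes the $8 per rack unit and the roughly $1500 management fee for a 30-node cluster from the slide; the function names are hypothetical, the power charge is not stated in the source so it is left as a parameter, and one rack unit per node is an assumption for the example.

```python
COLLOCATION_RATE_PER_RU = 8.00  # dollars per rack unit per month (from the slide)


def monthly_collocation_cost(rack_units: int, power_charge: float = 0.0) -> float:
    """Collocation cost: per-RU rate plus a power charge (amount not given in the source)."""
    if rack_units < 0 or power_charge < 0:
        raise ValueError("rack_units and power_charge must be non-negative")
    return rack_units * COLLOCATION_RATE_PER_RU + power_charge


def monthly_total(rack_units: int, management_fee: float,
                  power_charge: float = 0.0) -> float:
    """Collocation plus cluster management support for one month."""
    return monthly_collocation_cost(rack_units, power_charge) + management_fee


# Assumption: a 30-node cluster of 1U servers occupies 30 rack units.
# The power charge is omitted because the slide does not state it.
estimate = monthly_total(rack_units=30, management_fee=1500.0)
```

Under these assumptions the management fee dominates the bill, which is consistent with the later challenge of fitting ongoing support into grant budgets.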
Audience Poll Does your campus provide central research computing facilities? Are these services provided on a recharge basis? Are these services centrally funded?
Departmental Clusters
–Survey revealed 24 clusters
–Half are in EECS: data center space provided at no cost; grant funding for support FTE; hardware from donations or from grants; charge for storage, network connections
–Others a variety: biology, geography, statistics, space science, optometry, seismology
–Intel or AMD, many flavors of Linux
Departmental Clusters (cont.)
Chemistry Model
–Chemistry provides machine room space
–Chemistry FTE helps configure and get started
–PI must have a grad student acting as sys admin
–4-5 clusters owned by faculty, each supporting the research of 5-10 grad students
–All run Linux with Rocks or Warewulf
Audience Poll Do departments on your campus provide research computing support to their PIs? On a recharge basis? Subsidized?
Other UC Campuses/Labs UC San Francisco Completely decentralized UC Irvine Research Computing Support Group (1.6 FTE) Data center space ($200/month/rack) Shared clusters for researchers and grad students High speed networking Backup service System administration on recharge basis
Other UC Campuses/Labs (cont.) UCLA Data center space High speed networking Shared clusters Storage Cluster hosting and management Charge for in-depth consulting, long-term projects, and nominal one-time node charge
Other UC Campuses/Labs (cont.) LBNL Data center space High speed networking 3 FTE Pre-purchase consulting and procurement assistance Setup and configuration System administration and cybersecurity Charge for incremental costs to support clusters
Other UC Campuses/Labs (cont.) UC Riverside Data center space Funds for seed clusters Researchers without funds to buy own Researchers with ability to purchase but will use central service Ongoing support of systems will be recharged
Other UC Campuses/Labs (cont.) UC San Diego Decentralized Services on Recharge Basis Server room space Hosting System Administration Consulting Central Services: Network Infrastructure Supported by “knowledge worker” fee San Diego Supercomputer Center
Audience Poll Does your campus have a “knowledge worker” fee?
Challenges Provide a useful central resource Optimize use of clusters Encourage PIs to use central resources even if it costs money Develop funding model that works well with grants
Shared Research Computing Jason Crane, PhD University of California, San Francisco Department of Radiology
Case Study: UCSF Radiology Department Shared Research Computing Resources UCSF Radiology computing Center for Quantitative Biomedical Research (QB3 Institute) computational cluster Incentives and disincentives for sharing computing resources Advice about building collaborations and consensus
UCSF Radiology Department Computing
Organization and Structure
–Ownership: individual research groups (~150 desktop workstations + Linux cluster)
–Administration: Radiology Research Computing Recharge
–Cost Structure: hardware + support recharge, paid from PIs’ research grant direct costs
Computational Needs and Problems
–Underutilized CPUs
–Some researchers have computationally demanding problems
–Serial processing on individual desktop machines takes hours to days
–Manual cycle stealing
–Embarrassingly parallel problems
[Diagram: research groups (Group A … Group B) supported by the Radiology Research Computing Recharge]
UCSF Radiology Department Computing
Solution:
–Deploy resource management software (RMS, Sun Grid Engine) to enable parallel computing on idle desktop machines supported by recharge.
–Group-specific queues: users submit parallel processing jobs to idle machines within their research group.
[Diagram: research group users submit jobs through the RMS job scheduler to machines in Group A, Group B, …, all supported by the Radiology Research Computing Recharge]
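The group-specific queue idea can be sketched as a toy dispatcher: each research group has its own queue, and a task only runs on an idle machine that belongs to the submitting group. This is a simplified illustration, not Sun Grid Engine's API or configuration; the `Machine` and `GroupQueueScheduler` names and the sample hosts are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    group: str         # research group that owns the desktop
    idle: bool = True  # True when nobody is using the machine


@dataclass
class GroupQueueScheduler:
    machines: list
    pending: dict = field(default_factory=dict)  # group name -> deque of tasks

    def submit(self, group: str, tasks: list) -> None:
        """Queue tasks of an embarrassingly parallel job for one group."""
        self.pending.setdefault(group, deque()).extend(tasks)

    def dispatch(self) -> list:
        """Assign queued tasks to idle machines within the same group only."""
        assignments = []
        for machine in self.machines:
            queue = self.pending.get(machine.group)
            if machine.idle and queue:
                task = queue.popleft()
                machine.idle = False
                assignments.append((task, machine.name))
        return assignments


scheduler = GroupQueueScheduler([
    Machine("a1", "A"), Machine("a2", "A"),
    Machine("b1", "B", idle=False), Machine("b2", "B"),
])
scheduler.submit("A", ["t1", "t2", "t3"])
scheduler.submit("B", ["u1"])
assignments = scheduler.dispatch()
```

In this sketch a third Group A task stays queued even if Group B has a free machine, which mirrors the restriction to intra-group sharing described on the slide.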
UCSF Radiology Department Computing
Observations
Clustering
–Increased intra-group CPU utilization
–Increased adoption of computationally demanding software
–Improved research capabilities and throughput
However,
–Inter-group CPU sharing was under-utilized
–Higher-end storage needed to support IO requirements
–Recharge cost for underutilized dedicated cluster doesn’t scale well
–Time sharing is more cost effective than dedicated partially utilized cluster
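The point about underutilized dedicated clusters can be made concrete with simple arithmetic: a fixed monthly recharge spread over fewer used node-hours gives a higher effective price per hour of actual work. The sketch below is illustrative only; the function name, the 730-hour month, and the utilization figures are assumptions, not numbers from the presentation.

```python
def cost_per_used_node_hour(monthly_cost: float, nodes: int, utilization: float,
                            hours_per_month: float = 730.0) -> float:
    """Fixed monthly cost divided by the node-hours actually used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if nodes <= 0 or hours_per_month <= 0:
        raise ValueError("nodes and hours_per_month must be positive")
    return monthly_cost / (nodes * hours_per_month * utilization)


# Hypothetical comparison: same cluster and same fixed cost, but one group
# keeps it 20% busy while a time-shared pool keeps it 80% busy.
dedicated = cost_per_used_node_hour(1500.0, nodes=30, utilization=0.2)
shared = cost_per_used_node_hour(1500.0, nodes=30, utilization=0.8)
```

With the fixed cost held constant, the effective price per used node-hour falls in proportion to utilization, which is the argument for time sharing.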
Interdepartmental Shared Computational Cluster
Organization and Structure
–Users: interdepartmental within QB3 institute
–Cost Structure: PIs’ research grant direct costs fund compute nodes (time share), which fits well with one-time sources of funding; institute grants and endowments fund shared administration and high-end shared hardware
–Governance: technical (cluster admin, technical users); policy (committee of representative PIs)
–Hardware: 1200 cores (Linux), 13 TB NAS
Radiology’s Requirements
–Real-time and interactive apps benefit from a large number of CPUs for short bursts
–Access to shared high-end storage for IO-intensive apps
–Lower cost structure for cluster support by using institute-supported administration
–HIPAA compliance
Interdepartmental Shared Computational Cluster
Experiences to date
–High-end, cost-effective resource for institute’s research
–Varied use patterns benefit all users
–Frees research group time for research
–Radiology’s unique requirements (HIPAA, workflow, accessibility) slow to be implemented
–Evaluate requirements, consider application interoperability: use of Grid standards may have eased the transition for Radiology (cluster design/software porting).
Incentives for Sharing
–Reduce costs: share administrative costs; leverage bulk buying power; increase hardware utilization
–Increase performance and QoS: justify high-end hardware (shared cost, efficient utilization); greater hardware redundancy; design input from larger expertise pool
Disincentives for Sharing
–Sharing isn’t equitable
–Use cases vary from norm
–Sharing may impact my resources
Advice for Sharing
–Establish guidelines for collaboration: equitable cost structure; voting rights/governance
–Develop applications/services to support accepted Grid standards