Transition in Campus CyberInfrastructure: Community Clusters, Storage and Co-Loc. Presented by Dwight McKay, Director of Systems Engineering, ITaP Rosen Center for Advanced Computing, Purdue University. Notes: Introduce yourself.
Outline: Introduction; Community Clusters; Summarize model and issues. Notes: The perspective is that of an infrastructure builder / operator. The focus is on computational and storage allocation, growth strategies, and funding. Start with our structure and the sea change we see as we move from a centrally funded world to being a resource provider on campus and beyond.
Notes: This is the overall structure of ITaP.
Notes: The Rosen Center is one of five business units. We focus on the cyberinfrastructure needs of the campus and beyond.
Rosen Center for Advanced Computing. Notes: This is the structure of RCAC. Talk about each sub-unit.
Transition in Research Computing Support: Change in Direction. From central purchase to researcher / project purchase; from central shared facility to resource or service provider; from service desk to partner. Notes: We are moving from being the center to being a service and a collaborator. Our history has been to share systems, first with job scheduling; then, as we transitioned out of the centrally funded realm, we built clusters from old lab systems and then moved into the condo model that we call community clusters.
Transition Implications: Paying customers; higher expectations; service and support; formal agreements; cultural change. Notes: This transition is the driver for moving into new models of systems acquisition, resource allocation, and collaboration. The biggest change is that we now have customers explicitly paying for services.
Rosen Center for Advanced Computing (diagram areas: Research, User Support, Infrastructure). Notes: While we are structured into specific areas, the boundaries between these areas are more fluid than this diagram suggests. The structure is more of a matrix, with projects and people spanning the reporting boundaries as needed to support our customers. Also note that we incorporate a research group as well as user support.
Rosen Center for Advanced Computing (diagram label: Project). Notes: A typical project has connections into both computational infrastructure (accounts, queues, etc.) and high-level support (consultation, code optimization, application support, etc.). Larger, more complex projects often pull in larger sets of resources, such as project-specific WAN links, project management, software development, and custom infrastructure design and deployment.
Rosen Center for Advanced Computing (diagram label: Project). Notes: Our teams have people who span groups. A person reporting to Seb to manage a system in a grid project would also participate in the system team, come to our meetings, and work and act like a member of our team. This provides a close connection and better meets the grid project's support needs. We also embed people into research teams to provide the IT expertise needed to move a project along.
Transition Implications: New business models needed (HW/SW, infrastructure, support?); unpredictable demand and funding (planning for power/space/cooling?); non-paying “users”. Notes: How do we pay for hardware, software, services, and people in this new environment? If we are not centrally funded, how do we predict the demand we will see from our customers? What about our “general” users, those who did not buy into our services?
Community Clusters: Condominium Model. Purchase computation “by the node”; nodes come with a bundle of services. Purchase storage “by the terabyte”; storage comes with a bundle of services. Custom arrangements for specific projects.
Community Clusters: Services in the Node Bundle. HW installation; HW maintenance; facilities support; network connection; InfiniBand connection; OS installation and management; disk storage; archival storage; on-call support; disaster recovery; security.
Community Clusters: Tiered Cycle Allocation. Owners are guaranteed a specific share; owners are given first pick of idle cycles; owners agree to harvesting of the remaining idle cycles. “Use the whole buffalo.” -- Brad Bird. Notes: Brad Bird is the director of “The Incredibles”. This is important for space/power/cooling, non-paying customers, and cycles for Grid users.
Community Clusters: Tiered Cycle Allocation (diagram tiers: Owner, Pre-empt, Cycle Harvesting). Notes: Mention Condor here.
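The three tiers described on the Tiered Cycle Allocation slides can be illustrated with a minimal sketch. This is not the scheduler configuration actually used on the community clusters. The function name `allocate`, the owner names, and the node counts are hypothetical, and the sketch assumes the cluster consists only of the nodes the owners purchased.

```python
def allocate(shares, demand, harvest_demand):
    """Illustrative three-tier node allocation (hypothetical model).

    shares:         owner -> number of nodes the owner purchased
    demand:         owner -> number of nodes the owner wants right now
    harvest_demand: nodes wanted by opportunistic (pre-emptible) jobs
    Returns (nodes granted per owner, nodes given to harvesting jobs).
    """
    # Tier 1: each owner is guaranteed up to the share they purchased.
    granted = {owner: min(demand.get(owner, 0), nodes)
               for owner, nodes in shares.items()}
    idle = sum(shares.values()) - sum(granted.values())

    # Tier 2: owners who want more than their share get first pick
    # of the nodes other owners have left idle.
    for owner in shares:
        extra = min(demand.get(owner, 0) - granted[owner], idle)
        if extra > 0:
            granted[owner] += extra
            idle -= extra

    # Tier 3: whatever is still idle is harvested by pre-emptible jobs,
    # which would be evicted when owner demand returns.
    harvested = min(harvest_demand, idle)
    return granted, harvested


if __name__ == "__main__":
    shares = {"physics": 10, "chem": 6}           # hypothetical owners
    print(allocate(shares, {"physics": 14, "chem": 2}, 5))
    print(allocate(shares, {"physics": 4, "chem": 6}, 20))
```

In the first call the busy owner absorbs the nodes the other owner left idle, so nothing remains to harvest. In the second call no owner exceeds its share, so the idle nodes go to harvesting jobs.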
Community Clusters: Challenges. Business model that spans system generations and recovers shared infrastructure costs; cluster heterogeneity; multiple communities sharing one cluster (TeraGrid, OSG, NWICG); other architectures and special needs? Notes: We are heading towards the end of a three-year cycle on cluster building. How do we do retirement? How do we pay for interconnection infrastructure and upgrade it over time? What do we do when the particular node we use is no longer available? What about those folks who need a shared memory system?
Community Clusters: Storage. Three primary tiers: fast for scratch files; commodity for home directories; archival for “longer term” storage. Custom storage for specific projects.
Community Clusters: Storage Challenges. Business model that spans the lifetime of data / researcher? Media purchase vs. space rental? Data retention policy. Notes: Initially we had researchers buy disk trays. But storage technology is progressing, and there is a potential danger in being stuck maintaining old storage. How long do we keep something, and how do we help a researcher take their data with them when they might have tens to hundreds of TB?
Community Clusters: Connectivity Challenges. Multiple classes of network are needed. Direct routes: data center to key research WANs; data center to key research labs.
Community Clusters: Four Fold Network. Commodity Secure Research High Performance / Large Data Network Research Low Latency
Summary: Transition to resource provider / partner / service. Architecture: computation -> community clusters; storage -> 3 tiers; connectivity -> direct data center connections to research WAN and lab access. Challenges: customer expectations and implications; serving up storage and other architectures “by the slice”; recovering ALL the costs, especially support.