Campus Bridging in XSEDE, OSG, and Beyond. Dan Fraser, OSG; Jim Ferguson, NICS; Andrew Grimshaw, UVA; Rich Knepper, IU; David Lifka, Cornell. Internet2 Spring Members’ Meeting, Arlington, VA
Overview: Campus Bridging introduction; Open Science Grid and software deployment; XSEDE Campus Bridging Program; Global Federated File System; Campus Initiatives (Red Cloud, POD at IU)
Campus Bridging: Formation. NSF Advisory Committee for CyberInfrastructure (ACCI); community input at multiple workshops; surveys; ACCI Campus Bridging Task Force Report: http://pti.iu.edu/campusbridging
Campus Bridging: Concepts. Making it easier for users to transition from their laptop to large-scale resources; gathering best practices to deploy resources in a way that makes them familiar to users; providing training and documentation that covers research computation at multiple scales
OSG Campus Bridging (Campus High Throughput Computing Infrastructures). Dan Fraser, OSG Production Coordinator and Campus Infrastructure Lead. Internet2 Spring Members’ Meeting, Arlington, VA, Sept 24, 2012
The Open Science Grid (OSG) has focused on campuses from its inception. All OSG computing power comes from campuses. OSG has a footprint on over 100 campuses in the US and abroad. http://display.grid.iu.edu
OSG Campus Bridging Focus: focus on the researcher (…or artist); one step at a time
Simple interfaces are good
Engaging with the Campus: Campuses each have their own “culture”: terminology; access patterns; security; operational styles; processes (authN, authZ, monitoring, accounting, data management, …). The most fundamental issues are not technological in nature. Campus Bridging = Cultural Bridging
Campus Bridging Direction: Help the researcher use local resources; run on a local cluster (on campus); run on several local clusters; use/share resources with a collaborator on another campus; access the national cyberinfrastructure, i.e. OSG (and also XSEDE) resources (OSG is also an XSEDE service provider). Submit Locally, Run Globally
OSG Campus Bridging Today. [Diagram labels: Local User Credential; Submit Host (Bosco); Local Cluster (LSF, PBS, Condor); External Campus; OSG Cloud; (could also submit to XSEDE)]
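The "submit locally, run globally" idea behind the diagram can be illustrated with a small routing sketch. This is not Bosco's actual logic: the `Target` type, the `free_slots` figures, the pairing of schedulers with targets, and the first-fit preference order are all hypothetical, chosen only to show one submit host choosing among the back-ends named on the slide.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Target:
    name: str        # label taken from the slide's diagram
    scheduler: str   # batch system behind the target (hypothetical pairing)
    free_slots: int  # hypothetical availability figure


def choose_target(targets: Sequence[Target], slots_needed: int) -> Optional[Target]:
    """Return the first target, in preference order, that can hold the job."""
    for target in targets:
        if target.free_slots >= slots_needed:
            return target
    return None  # nothing is large enough


# Preference order: local first, then a partner campus, then OSG.
TARGETS = [
    Target("Local Cluster", "PBS", 16),
    Target("External Campus", "LSF", 64),
    Target("OSG Cloud", "Condor", 1000),
]

for need in (8, 32, 500):
    chosen = choose_target(TARGETS, need)
    print(need, "->", chosen.name if chosen else "no target")
```

The point of the sketch is that the researcher always talks to the same submit host; only the routing decision behind it changes.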
Summary: OSG is focused on the researcher/artist; campus bridging = cultural bridging; a single submit model (Bosco) can be useful; OSG is exploring how best to collaborate with XSEDE on campus bridging
Introduction to XSEDE and its Campus Bridging Program. Jim Ferguson, XSEDE TEOS team; Education, Outreach & Training Director, NICS; jwf@utk.edu
Acknowledgements Craig Stewart, Rich Knepper, and Therese Miller of Indiana University, and others on the XSEDE campus bridging team. John Towns, NCSA, XSEDE PI
XD Solicitation/XD Program: eXtreme Digital Resources for Science and Engineering (NSF 08-571). High-Performance Computing and Storage Services (aka Track 2 awardees). High-Performance Remote Visualization and Data Analysis Services: 2 awards; 5 years; $3M/year; proposals due November 4, 2008. Integrating Services (5 years, $26M/year): Coordination and Management Service (CMS), 5 years, $12M/year; Technology Audit and Insertion Service (TAIS), 5 years, $3M/year; Advanced User Support Service (AUSS), 5 years, $8M/year; Training, Education and Outreach Service (TEOS), 5 years, $3M/year
XSEDE Vision: The eXtreme Science and Engineering Discovery Environment (XSEDE) enhances the productivity of scientists and engineers by providing them with new and innovative capabilities, and thus facilitates scientific discovery while enabling transformational science/engineering and innovative educational programs. Providing capabilities and services beyond flops: we provide the integrated environment allowing for the coherent use of the various resources and services supported by NSF.
Science requires diverse digital capabilities. XSEDE is a comprehensive, expertly managed and evolving set of advanced heterogeneous high-end digital services, integrated into a general-purpose infrastructure. XSEDE is about increased user productivity: increased productivity leads to more science; increased productivity is sometimes the difference between a feasible project and an impractical one. Integrating services for TEOS, AUSS, CMS, and TIS
XSEDE’s Distinguishing Characteristics - Governance. World-class leadership: partnership will be led by NCSA, NICS, PSC, TACC and SDSC; CI centers with deep experience; partners who strongly complement these CI centers with expertise in science, engineering, technology and education. Balanced governance model: strong central management provides rapid response to issues and opportunities; delegation and decentralization of decision-making authority; openness to genuine stakeholder participation (stakeholder engagement, advisory committees); improved professional project management practices (formal risk management and change control). Decision making is facilitated by a level of trust developed between partners that is unprecedented…
What is campus bridging? The term was originated by Ed Seidel as he charged six task forces of the NSF Advisory Committee for Cyberinfrastructure (ACCI). Considerable information and the final ACCI Task Force report are online at pti.iu.edu/campusbridging. The Task Force definition: “Campus bridging is the seamlessly integrated use of cyberinfrastructure operated by a scientist or engineer with other cyberinfrastructure on the scientist’s campus, at other campuses, and at the regional, national, and international levels as if they were proximate to the scientist . . .” Catchy name, great ideas … but interest exceeds our ability to implement so far. Vision: help XSEDE create the software, tools and training that will allow excellent interoperation between XSEDE infrastructure and researchers’ local (campus) cyberinfrastructure; enable excellent usability from the researcher’s standpoint for a variety of modalities and types of computing (HPC, HTC, and data-intensive computing); promote better use of local, regional and national CI resources
Campus Bridging Use Cases: InCommon authentication; economies of scale in training and usability; long-term remote interactive graphic session; use of data resources from campus on XSEDE, or from XSEDE at a campus; support for distributed workflows spanning XSEDE and campus-based data, computational, and/or visualization resources; shared use of computational facilities mediated or facilitated by XSEDE; access to “____ as a Service” mediated or facilitated by XSEDE
Year 2 Strategic Plan: Leveraging the XSEDE process of implementing new services via the systems engineering process (Architecture & Design => Software Development and Integration => Operations), begin deploying some new services that deliver campus bridging capabilities. Communicate effectively via Campus Champions, advocates for XSEDE now located at over 100 institutions. Develop relationship and terminology with OSG, as they have been bridging between institutions for several years.
Year 2 Strategic Plan: Complete planned pilot projects: GFFS Pilot Program (CUNY, PI: Paul Muzio; KU, PI: Thorbjorn Axelsson; Miami, PI: Joel Zysman; TAMU, PI: Guy Almes). Begin delivering selected campus-bridging related tools: GFFS; documentation; ROCKS Rolls. Communicate “what is campus bridging”
Example: More consistency in CI setups => economies of scale for all. In reality, the four cluster admins depicted here as being in agreement are all right. Experienced cluster admins all learned their tools while those tools were still developing, so the tool each sysadmin knows best is the tool that lets that sysadmin do their work best. The only way to develop consistency is to provide installers that will make their work easier. The XSEDE architecture group is developing installers for file management tools. *A la Stephen Colbert, the “4 out of 5…” comment is not intended to be a factual statement
Your Comments, Please! Do we have the right direction? What is Campus Bridging to You?
Thank You! jwf@utk.edu
Global Federated File System (GFFS). Andrew Grimshaw
Basic idea and canonical use cases; Accessing the GFFS; Attaching (provisioning) data to the GFFS; Deployment
Basic idea: map resources into a global directory structure; map the global directory structure into the local file system. [Diagram labels: Data (PDB, NCBI, EMBL, SEQ_1, SEQ_2, SEQ_3); Processing (Cluster 1, Cluster 2, Cluster N); Applications (APP 1, APP 2, APP N); Biology; Biochemistry; Research Institution; Partner Institution; Partner Institution]
Canonical use cases. Definitions: a resource is {compute | job | data | identity | …}; access means create, read, update, delete. Use cases: access center resource from campus; access campus resource from center; access campus resource from another campus (sharing file system or instrument data; sharing clusters)
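The definitions on this slide map naturally onto a small data model. The sketch below is only an illustration of that vocabulary; the class and field names (`ResourceKind`, `Access`, `UseCase`, `requester`, `owner`) are hypothetical and are not taken from the GFFS software.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto


class ResourceKind(Enum):
    # The slide's list ends with an ellipsis, so this set is not exhaustive.
    COMPUTE = "compute"
    JOB = "job"
    DATA = "data"
    IDENTITY = "identity"


class Access(Flag):
    # "Access means create, read, update, delete"
    CREATE = auto()
    READ = auto()
    UPDATE = auto()
    DELETE = auto()


@dataclass(frozen=True)
class UseCase:
    requester: str  # where the user sits: "campus" or "center"
    owner: str      # where the resource lives
    kind: ResourceKind
    access: Access


# The three canonical use cases, with example kinds and permissions.
CANONICAL = [
    UseCase("campus", "center", ResourceKind.COMPUTE, Access.CREATE | Access.READ),
    UseCase("center", "campus", ResourceKind.DATA, Access.READ),
    UseCase("campus", "another campus", ResourceKind.DATA, Access.READ | Access.UPDATE),
]

for case in CANONICAL:
    print(f"{case.requester} -> {case.owner}: {case.kind.value}")
```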
Accessing the GFFS via a file system mount: The global directory structure is mapped directly into the local operating system via a FUSE mount. XSEDE resources, regardless of location, can be accessed via the file system. Files and directories can be accessed by programs and shell scripts as if they were local files. Jobs can be started by copying job descriptions into directories. One can see the jobs running or queued by doing an “ls”. One can “cd” into a running job and directly access the working directory where the job is running. Commands:
    mkdir XSEDE
    nohup grid fuse --mount local:XSEDE &
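Because the mount makes grid directories look like local ones, "submit by copying" and "list by ls" need nothing beyond ordinary file operations. The following is a minimal sketch of that idea, run against a temporary directory that stands in for the FUSE mount; the `queues/demo-queue` layout and the file name `job1.jsdl` are hypothetical and say nothing about how a real GFFS namespace is organised.

```python
import shutil
import tempfile
from pathlib import Path


def submit_by_copy(job_description: Path, queue_dir: Path) -> Path:
    """Start a job the way the slide describes: copy its description into a directory."""
    return Path(shutil.copy(job_description, queue_dir))


def list_jobs(queue_dir: Path) -> list:
    """Equivalent of running "ls" in the queue directory."""
    return sorted(entry.name for entry in queue_dir.iterdir())


# Demo against a temporary directory standing in for the mounted XSEDE tree.
with tempfile.TemporaryDirectory() as tmp:
    mount = Path(tmp) / "XSEDE"              # hypothetical mount point
    queue = mount / "queues" / "demo-queue"  # hypothetical queue directory
    queue.mkdir(parents=True)

    job = Path(tmp) / "job1.jsdl"            # hypothetical job description file
    job.write_text("<JobDefinition/>\n")

    submit_by_copy(job, queue)
    print(list_jobs(queue))
```

On a real mount the queue directory would already exist, so the `mkdir` call belongs only to this local stand-in.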
E.g., Access a job’s running directory
Accessing the GFFS via command line tools, e.g.:
    cp local:fred.txt /home/grimshaw/fred.txt
    rm /home/grimshaw/fred.txt
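In the `cp` example, the `local:` prefix appears to mark a path on the local file system, while the unprefixed path lives in the grid namespace. A small sketch of how a tool might tell the two apart follows; it reads the convention off the slide's example only, and `GridPath` and `parse_grid_path` are hypothetical names, not part of the grid client.

```python
from typing import NamedTuple

LOCAL_PREFIX = "local:"


class GridPath(NamedTuple):
    namespace: str  # "local" for the local file system, "grid" for the GFFS
    path: str


def parse_grid_path(arg: str) -> GridPath:
    """Split a command-line argument into (namespace, path)."""
    if arg.startswith(LOCAL_PREFIX):
        return GridPath("local", arg[len(LOCAL_PREFIX):])
    # Assumption: anything without the prefix is a path in the global namespace.
    return GridPath("grid", arg)


source = parse_grid_path("local:fred.txt")
target = parse_grid_path("/home/grimshaw/fred.txt")
print(source, target)
```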
GUI Grid Client: typical folder-based tool; tools to define, run, manage jobs; tools to manage “grid” queues; tools to “export” data. Grid shell: the shell has tab completion, history, help, scripting, etc.
GUI Grid Client: View Access Control To view access control information: Browse to and highlight resource, then select Security tab
Exporting (mapping) data into the Grid: Links directories and files from the source location to a GFFS directory and user-specified name. Presents a unified view of the data across platforms, locations, domains, etc. Sarah controls the authorization policy. [Diagram labels: Data clients; Sarah’s department file server; Sarah’s instrument in the lab; Sarah’s TACC workspace; Windows; TACC; Linux]
Avaki grids provide wide-area access to processing, data, and application resources in a single, uniform operating environment. Avaki grids unify resources across locations and administrative domains with different hardware, operating systems, and system configurations, creating an environment that is secure and easy to administer.
Wide-area data access. Avaki’s data grid provides wide-area access to data at its source location based on business policies, eliminating the need to copy data explicitly each time it is used.
Distributed processing. Avaki’s compute grid provides wide-area access to available processing resources based on business policies, while managing utilization and aggregation of processing resources for fast, efficient job completion with minimum overhead.
Global naming. All resources are given a unique, location-independent name in an Avaki grid. This allows users or applications to refer to them by the same name wherever the user or resource is. It also allows resources to move about the grid when necessary (in the event of system congestion, maintenance or failure) without the user having to know.
Policy-based configuration and administration. Avaki grids easily conform and adapt to business requirements, allow for central or local control over resources, and are easy to administer.
Resource accounting. Resources consumed are accounted for by user, system, application and project.
Fine-grained security. Organizations can control access to grid resources at any level of granularity without the need for a central authority.
Automatic failure detection and recovery. Avaki grids provide fast, transparent recovery from individual system failures, insulating users from scheduled or unscheduled downtime.
Exporting/sharing data. The user selects: the server that will perform the export; the directory path on that server; the path in GFFS to link it to. Example:
    grid export /containers/Big-State-U/Sarah-server /development/sources /home/Sarah/dev
Can also export: Windows shares; directory structures via ssh (slow, like sshFX)
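The three choices the user makes line up with the three positional arguments in the `grid export` example. The sketch below only assembles that command line from named parts so the roles are explicit; it does not run anything, the function and parameter names are hypothetical, and it assumes no options beyond the three arguments shown on the slide.

```python
import shlex


def build_export_command(container: str, source_dir: str, gffs_path: str) -> list:
    """Assemble the export command from the three user choices.

    container  -- the server (container) that will perform the export
    source_dir -- the directory path on that server
    gffs_path  -- the path in the GFFS to link it to
    """
    for value in (container, source_dir, gffs_path):
        if not value:
            raise ValueError("all three arguments are required")
    return ["grid", "export", container, source_dir, gffs_path]


command = build_export_command(
    "/containers/Big-State-U/Sarah-server",
    "/development/sources",
    "/home/Sarah/dev",
)
print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids quoting problems if a path ever contains spaces.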
Deployment: Sites that wish to export or share resources must run a Genesis II or UNICORE 6 container. There will be an installer for the GFFS package for SPs, and a “Campus Bridging” package. There is an installer for client-side access. There are training materials: used at TG 11; in the process of being turned into videos
On-Demand Research Computing: Infrastructure as a Service; Software as a Service. www.cac.cornell.edu/redcloud
Infrastructure as a Service (IaaS) Cloud. Red Cloud provides on-demand: computing cycles (virtual servers in cloud “instances”); storage (virtual disks in Elastic Block Storage (“EBS”) volumes). [Diagram labels: Users; Cloud Management; Virtual Servers (“Cloud Instances”); Virtual Disks (“Elastic Block Storage (EBS)”)]
Software as a Service (SaaS) Cloud with MATLAB. [Diagram labels: Head Node; Compute Nodes; Dell C6100; GPU Chassis; Dell C410x; NVIDIA Tesla M2070s; DDN Storage; Network Interconnect; GridFTP Server; MyProxy Server; Web Server; SQL Server]
Motivation. Research computing means many different things…: scientific workflows have different requirements at each step; cloud is only part of the solution; connecting to and from other CI resources is important. Nobody likes a bad surprise: transparency, no hidden costs; need a way to bound financial risk. Economies of scale: sharing hardware and software where it makes sense; pay for what you need, when you need it. Customized environments for various disciplines: collaboration tools; data storage & analysis tools; flexibility to support different computing models (e.g. Hadoop)
Provides:
Predictable, Reproducible, Reliable Performance. We publish hardware specifications (CPU, RAM, network) and do not oversubscribe.
Convenient. Need a system up and running yesterday. Need a big fast machine for only a few months, weeks or days. Need a small server to run continuously.
No Hidden Costs. No cost for network traffic in or out of the cloud.
Fast Access to Your Data. Fast data transfers via 10Gb Ethernet in or out of the cloud at no additional charge. Globus Online access.
Economies of Scale. IaaS: Infrastructure. SaaS: Software.
Expert Help. System, application, and programming consulting are available.
Easy Budgeting with Subscriptions. No billing surprises!
IaaS is Amazon API Compatible. Migrate when your requirements outgrow Red Cloud.
Some Use Cases to Consider. Support for scientific workflows: pre- & post-processing of data and results; data analysis; Globus Online for fast reliable data transfer (https://www.globusonline.org/). Collaboration: wiki hosting; customized data analysis & computational environments. Web portals: science gateways; domain-specific portals; Hub Zero (http://hubzero.org/pressroom, http://nanohub.org). Event-driven science: https://opensource.ncsa.illinois.edu/confluence/display/SGST/Semantic+Geostreaming+Toolkit. Education, outreach & training: pre-configured systems & software tools providing a consistent training platform; common laboratory computing environment. Bursting: additional software and hardware on demand
Subscription-based Recovery Model
Red Cloud: Cornell University $500/core year*; Other Academic Institutions $750/core year
Red Cloud with MATLAB: Cornell University $750/core year; Other Academic Institutions $1200/core year
*A core year is equal to 8585 hours. Each subscription account includes 50GB of storage
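The subscription prices become easier to compare with per-hour cloud pricing once they are converted to a cost per core hour, using the slide's definition of a core year as 8585 hours. The sketch below does that arithmetic. It assumes the lower pair of prices is the plain service and the higher pair is the MATLAB service, and the dictionary keys and function names are hypothetical.

```python
HOURS_PER_CORE_YEAR = 8585  # from the slide's footnote

# (affiliation, with_matlab) -> dollars per core year, as read from the slide
RATES = {
    ("cornell", False): 500,
    ("other", False): 750,
    ("cornell", True): 750,
    ("other", True): 1200,
}


def cost_per_core_hour(affiliation: str, with_matlab: bool = False) -> float:
    """Dollars per core hour implied by the core-year price."""
    return RATES[(affiliation, with_matlab)] / HOURS_PER_CORE_YEAR


def core_years(cores: int, hours: float) -> float:
    """How many core years a run of `cores` cores for `hours` hours consumes."""
    return cores * hours / HOURS_PER_CORE_YEAR


for key in RATES:
    print(key, f"${cost_per_core_hour(*key):.4f} per core hour")

# Example: 8 cores for 30 days (720 hours), expressed in core years.
print(f"{core_years(8, 720):.3f} core years")
```

At the Cornell rate this works out to a little under six cents per core hour.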
What if ???
Consulting: Cornell Users $59.90/hour; Other Academic Institutions $85.47/hour
Additional Storage: Cornell Users $0.91/GB/year; Other Academic Institutions $1.45/GB/year
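A yearly estimate of the extras can be put together from these rates and the 50GB of storage that each subscription includes. This is a sketch under one assumption not stated on the slides: that additional storage is charged only for the gigabytes above the included 50GB. The function and key names are hypothetical.

```python
INCLUDED_STORAGE_GB = 50  # included with each subscription account

CONSULTING_PER_HOUR = {"cornell": 59.90, "other": 85.47}
STORAGE_PER_GB_YEAR = {"cornell": 0.91, "other": 1.45}


def extras_cost(affiliation: str, storage_gb: float, consulting_hours: float) -> float:
    """Yearly cost of storage above the included amount plus consulting time."""
    # Assumption: only storage beyond the included 50GB is billed.
    extra_gb = max(0.0, storage_gb - INCLUDED_STORAGE_GB)
    total = (extra_gb * STORAGE_PER_GB_YEAR[affiliation]
             + consulting_hours * CONSULTING_PER_HOUR[affiliation])
    return round(total, 2)


# Example: a Cornell user keeping 250GB for a year and using 2 consulting hours.
print(extras_cost("cornell", 250, 2))
```

In the example, 200GB of billable storage at $0.91 plus two hours at $59.90 gives the printed total.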
Penguin Computing / IU Partnership: HPC “cluster as a service” and Cloud Services. Internet2 Spring Members’ Meeting 2012. Rich Knepper (rich@iu.edu), Manager, Campus Bridging, Indiana University
What is POD. On-demand HPC system: compute, storage, low latency fabrics, GPU, non-virtualized. Robust software infrastructure: full automation; user and administration space controls; secure and seamless job migration; extensible framework; complete billing infrastructure. Services: custom product design; site and workflow integration; managed services; application support. HPC support expertise: skilled HPC administrators; leverage 13 yrs serving HPC market. Internet (150Mb, burstable to 1Gb)
Clouds look serene enough - But is ignorance bliss? In the cloud, do you know: Where your data are? What laws prevail over the physical location of your data? What license you really agreed to? What is the security (electronic / physical) around your data? And how exactly do you get to that cloud, or get things out of it? How secure your provider is financially? (The fact that something seems unimaginable, like cloud provider such-and-such going out of business abruptly, does not mean it is impossible!) Photo by http://www.flickr.com/photos/mnsc/ (http://www.flickr.com/photos/mnsc/2768391365/sizes/z/in/photostream/), license: http://creativecommons.org/licenses/by/2.0/
Penguin Computing & IU partner for “Cluster as a Service”. Just what it says: Cluster as a Service. The cluster is physically located on IU’s campus, in IU’s Data Center. Available to anyone at a .edu or FFRDC (Federally Funded Research and Development Center). To use it: go to podiu.penguincomputing.com; fill out the registration form; verify via your email; get out your credit card; go computing. This builds on Penguin’s experience: Penguin currently hosts Life Technologies' BioScope and LifeScope in the cloud (http://lifescopecloud.com)
We know where the data are … and they are secure
An example of NET+ Services / Campus Bridging "We are seeing the early emergence of a meta-university — a transcendent, accessible, empowering, dynamic, communally constructed framework of open materials and platforms on which much of higher education worldwide can be constructed or enhanced.” Charles Vest, president emeritus of MIT, 2006 NET+ Goal: achieve economy of scale and retain reasonable measure of control See: Brad Wheeler and Shelton Waggener. 2009. Above-Campus Services: Shaping the Promise of Cloud Computing for Higher Education. EDUCAUSE Review, vol. 44, no. 6 (November/December 2009): 52-67. Campus Bridging goal – make it all feel like it’s just a peripheral to your laptop (see pti.iu.edu/campusbridging)
IU POD: Innovation Through Partnership. True on-demand HPC for Internet2; creative public/private model to address HPC shortfall; turning lost EC2 dollars into central IT expansion; tiered channel strategy expansion to EDU sector; program and discipline-specific enhancements under way; objective third party resource for collaboration (EDU, Federal and Commercial)
POD IU (Rockhopper) specifications
Server Information
Architecture: Penguin Computing Altus 1804
TFLOPS: 4.4
Clock Speed: 2.1GHz
Nodes: 11 compute; 2 login; 4 management; 3 servers
CPUs: 4 x 2.1GHz 12-core AMD Opteron 6172 processors per compute node
Memory Type: Distributed and Shared
Total Memory: 1408 GB
Memory per Node: 128GB 1333MHz DDR3 ECC
Local Scratch Storage: 6TB locally attached SATA2
Cluster Scratch: 100TB Lustre
Further Details
OS: CentOS 5
Network: QDR (40Gb/s) Infiniband, 1Gb/s ethernet
Job Management Software: SGE
Job Scheduling Software:
Job Scheduling Policy: Fair Share
Access: key-based ssh login to head nodes; remote job control via Penguin's PODShell
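The headline figures in the specification can be cross-checked from the per-node numbers. The short calculation below does so; the only value not on the slide is the figure of 4 double-precision floating-point operations per core per cycle, which is an assumption about this processor generation and is marked as such in the code.

```python
COMPUTE_NODES = 11
SOCKETS_PER_NODE = 4      # "4 x ... processors per compute node"
CORES_PER_SOCKET = 12     # 12-core AMD Opteron 6172
CLOCK_GHZ = 2.1
MEMORY_PER_NODE_GB = 128

# Assumption (not on the slide): 4 double-precision flops per core per cycle.
FLOPS_PER_CYCLE = 4

total_cores = COMPUTE_NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
total_memory_gb = COMPUTE_NODES * MEMORY_PER_NODE_GB
peak_tflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

print(f"cores:        {total_cores}")
print(f"total memory: {total_memory_gb} GB")
print(f"peak:         {peak_tflops:.1f} TFLOPS")
```

Under that assumption, both the listed total memory and the listed TFLOPS figure follow from the compute nodes alone.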
Available applications at POD IU (Rockhopper)
Package name: Summary
COAMPS: Coupled ocean / atmosphere mesoscale prediction system.
Desmond: Desmond is a software package developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional commodity clusters.
GAMESS: GAMESS is a program for ab initio molecular quantum chemistry.
Galaxy: Galaxy is an open, web-based platform for data intensive biomedical research.
GROMACS: GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
HMMER: HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments.
Intel compilers and libraries
LAMMPS: LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.
MM5: The PSU/NCAR mesoscale model (known as MM5) is a limited-area, nonhydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale atmospheric circulation. The model is supported by several pre- and post-processing programs, which are referred to collectively as the MM5 modeling system.
mpiBLAST: mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST.
NAMD: NAMD is a parallel molecular dynamics code for large biomolecular systems.
Thank you! Questions?