VPH NoE, 2010 Accessing European Computing Platforms Stefan Zasada University College London
VPH NoE, 2010 Contents Introduction to grid computing What resources are available to VPH? How to gain access to computational resources Deploying applications on EU resources Making things easier with VPH ToolKit components Coupling multiscale simulations AHE demonstration Going further
VPH NoE, 2010 What is a Grid? Allows resource sharing between multiple, distinct organisations Crosses administrative domains Security is important – only authorized users are able to use resources Differs from cluster computing "Computing as a utility" – Clouds too?
VPH NoE, 2010 Resources Available to VPH Researchers DEISA Heterogeneous HPC infrastructure currently formed by eleven European national supercomputing centres that are interconnected by a dedicated high performance network. EGI/EGEE Consists of 41,000 CPUs available to users 24 hours a day, 7 days a week, in addition to about 5 PB of disk (5 million gigabytes) plus tape MSS storage, and maintains 100,000 concurrent jobs. EGI includes national grid initiatives Clouds – Amazon EC2 etc
VPH NoE, 2010 Compute Allocations for VPH Conducted a survey of all VPH-I projects and Seed EPs to assess their computational requirements. This was used to prepare an application to the DEISA Virtual Communities programme for an allocation of resources to support the work of all VPH-I projects, managed by the NoE. We have set up these VPH VC allocations and run them for the past 2 years – now extended until the end of DEISA in 2011
VPH NoE, 2010 EGEE/EGI EGEE has transformed into EGI and will start to integrate national grid infrastructures Consists of 41,000 CPUs available to users 24 hours a day, 7 days a week, in addition to about 5 PB of disk (5 million gigabytes) plus tape MSS storage, and maintains 100,000 concurrent jobs. EGEE/EGI have agreed to provide access to VPH researchers through their BioMed VO – ask for more details if you would like access
VPH NoE, 2010 DEISA Project Allocations VPH was awarded 2 million standard DEISA core hours for 2009, renewed for 2010 and 2011, on HECToR (Cray, UK) and SARA (IBM Power 6, Netherlands); euHeart joined in the second wave, along with other non-VPH EU projects
VPH NoE, 2010 Choosing the right resources for your application Parallel/MPI/HPC applications: Use DEISA (some DEISA sites have policies to promote large-scale computing) Embarrassingly parallel/task farming: Use EGI/National Grid Initiative Split multiscale applications between appropriate resources
VPH NoE, 2010 Uniform access to resources at all scales Middleware provides the underlying computational infrastructure Allows the different resources to be used seamlessly (grid, cloud etc). Automates machine to machine communication with minimal user input Virtualises resources – user need know nothing of the underlying operating systems etc Requires X.509 certificate!
VPH NoE, 2010 Accessing DEISA/EGEE DEISA Provides access through Unicore/SSH (some sites have Globus) Applications are compiled at the terminal, then submitted to the batch queue or via Unicore EGI/EGEE Accessed via gLite; often you submit a job to compile National Grid Initiatives Should run gLite if part of EGI May run Globus/Unicore
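The gLite route is command-line driven; as a concrete illustration, here is a minimal sketch (not from the slides) of submitting a job to the BioMed VO through the gLite WMS command-line tools, driven from Python. It assumes the gLite UI commands (glite-wms-job-submit, glite-wms-job-status, glite-wms-job-output) are installed and a valid VOMS proxy for the biomed VO already exists; all file names are illustrative.

```python
# Minimal sketch: submitting a batch job to EGEE/EGI through the gLite WMS
# command-line tools from Python. Assumes the gLite UI is installed and a
# valid VOMS proxy for the biomed VO was created beforehand with
# voms-proxy-init. Paths and file names are illustrative only.
import subprocess
import tempfile

JDL = """
Type = "Job";
Executable = "/bin/hostname";
StdOutput  = "std.out";
StdError   = "std.err";
OutputSandbox = {"std.out", "std.err"};
VirtualOrganisation = "biomed";
"""

def submit():
    """Write a JDL file, submit it and return the path of the job-id file."""
    jdl = tempfile.NamedTemporaryFile("w", suffix=".jdl", delete=False)
    jdl.write(JDL)
    jdl.close()
    jobid_file = jdl.name + ".jobid"
    # -a: delegate the proxy automatically, -o: store the job ID in a file
    subprocess.run(["glite-wms-job-submit", "-a", "-o", jobid_file, jdl.name],
                   check=True)
    return jobid_file

def status(jobid_file):
    """Query the status of all jobs whose IDs are stored in jobid_file."""
    subprocess.run(["glite-wms-job-status", "-i", jobid_file], check=True)

def fetch_output(jobid_file, outdir="./results"):
    """Retrieve the output sandbox once the job has finished."""
    subprocess.run(["glite-wms-job-output", "-i", jobid_file, "--dir", outdir],
                   check=True)

if __name__ == "__main__":
    ids = submit()
    status(ids)
    # ...poll until the job is Done, then:
    # fetch_output(ids)
```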
VPH NoE, 2010 The Application Hosting Environment Based on the idea of applications as web services Lightweight hosting environment for running unmodified applications on grid resources (DEISA, TeraGrid) and on local resources (departmental clusters) Community model: an expert user installs and configures an application and uses the AHE to share it with others Simple clients with very limited dependencies Non-invasive – no need for extra software to be installed on resources VPH ToolKit page: for-accessing-compute-resources/ahe.html
VPH NoE, 2010 Motivation for the AHE Problems with current middleware solutions: Difficult for an end user to configure and/or install Dependent on lots of supporting software also being installed Require modified versions of common libraries Require non-standard ports to be opened on firewall We have access to many international grids running different middleware stacks, and need to be able to seamlessly interoperate between them
VPH NoE, 2010 AHE Functionality Launch simulations on multiple grid resources Single interface to monitor and manipulate all simulations launched on the various grid resources Run simulations without having to manually stage files or GSISSH in Retrieve files to the local machine when the simulation is done Can use a combination of different clients – PDA, desktop GUI, command line
VPH NoE, 2010 AHE Design Constraints Client does not require any other middleware to be installed locally Client is NAT'd and firewalled Client does not have to be a single machine Client needs to be able to upload and download files but doesn't have a local installation of GridFTP Client doesn't maintain information on how to run the application Client doesn't care about changes to the backend resources
VPH NoE, 2010 AHE 2.0 Provides a Single Platform to: Launch applications on Unicore and Globus 4 grids by acting as an OGSA-BES and GT4 client Create advanced reservations using HARC and launch applications into the reservation Steer applications using the RealityGrid steering API or GENIUS project steering system Launch cross-site MPIg applications AHE 3 plans to support: SPRUCE urgent job submission Lightweight certificate sharing mechanisms
VPH NoE, 2010 Virtualizing Applications Application Instance/Simulation is the central entity, represented by a stateful WS-Resource. State properties include: simulation owner target grid resource job ID simulation input files and URLs simulation output files and URLs job status
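The state properties above can be pictured as a simple record. The following is an illustrative Python sketch only – it is not the AHE's actual WS-Resource schema – showing the per-application-instance state the AHE keeps.

```python
# Illustrative only: the state properties listed above, expressed as a plain
# Python data structure. This is not the AHE's real WS-Resource representation,
# just a compact way to see what state each hosted application instance carries.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ApplicationInstance:
    owner: str                    # simulation owner
    target_resource: str          # grid resource the job was launched on
    job_id: str                   # ID returned by the back-end middleware
    input_files: Dict[str, str] = field(default_factory=dict)   # name -> staging URL
    output_files: Dict[str, str] = field(default_factory=dict)  # name -> staging URL
    status: str = "PENDING"       # e.g. PENDING / ACTIVE / DONE / FAILED

# Hypothetical instance for a HemeLB run on a DEISA machine
run = ApplicationInstance(owner="szasada", target_resource="deisa-sara-power6",
                          job_id="12345", status="ACTIVE")
```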
VPH NoE, 2010 AHE Reach Diagram showing the resources and middleware the AHE can reach: DEISA (UNICORE), the UK NGS (Leeds, Manchester, Oxford, RAL) and HPCx via GridSAM/Globus, TeraGrid (Globus), and local resources
VPH NoE, 2010 Cross site access Architecture diagram: a scientific application (e.g. the bloodflow simulation HemeLB) is launched through the Application Hosting Environment (AHE), acting as the science-specific client technology; the AHE submits either through a Globus GRAM 4 interface (TeraGrid/NGS Globus 4 stack, security policies with gridmaps, GLUE-based information, GridFTP interface) or through an OGSA-BES interface (UNICORE grid middleware with BES, XACML security policies, GLUE-based information) to HPC resources within the NGS and DEISA; both provide a common environment (DEISA Modules) for massively parallel HemeLB jobs and access to storage resources (e.g. tape archives, robots). Stefan Zasada, Steven Manos, Morris Riedel, Johannes Reetz, Michael Rambadt et al., Preparation for the Virtual Physiological Human (VPH) project that requires interoperability of numerous Grids
VPH NoE, 2010 AHE Deployment AHE Server Released as a VirtualBox VM image – download image file and import to VirtualBox All required services, containers etc pre-configured User just needs to configure services for their target resources/applications AHE Client User’s machine must have Java installed User downloads and untars client package Imports X.509 certificate into Java keystore using provided script Configures client with endpoints of AHE services supplied by expert user
VPH NoE, 2010 Hosting a New Application Expert user must: Install and configure the application on all resources on which it is being shared Create an XSLT template for the application (easily cloned from an existing template) Add the application to the RMInfo.xml file Run a script to reread the configuration Documentation covers the whole process of deploying the AHE & applications on NGS and TeraGrid
VPH NoE, 2010 Authentication via local credentials (Audited Credential Delegation) A designated individual (PI, sysadmin, …) puts a single certificate into a credential repository controlled by our new "gateway" service. The user uses a local authentication service to authenticate to the gateway service. The gateway service provides a session key (not shown) to our modified AHE client and modified AHE Server, enabling the AHE client to authenticate to the AHE Server. The gateway service obtains a proxy certificate from its credential repository as necessary and gives it to the modified AHE Server to interact with the grid middleware and computational grid. The user now has no certificate interaction, and the private key of the certificate is never exposed to the user.
VPH NoE, 2010 ACD Architecture Diagram of the gateway service internals: a credential repository holding project and resource certificates, proxies and keys; a local database authentication server holding user IDs and passwords (u1/pw1, u2/pw2, …); and an authorization module implementing parameterized role-based access control via RolePermission (Role → [Tasks]) and UserRole (UserID → [Roles]) tables, together with trusted and revoked CA lists. A recent AHE + ACD usability study found the combined solution to be much more usable than other interfaces
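To make the RolePermission and UserRole tables from the diagram concrete, here is an illustrative Python sketch of a role-based access check; it is not the ACD implementation, and the user IDs, roles and task names are invented for the example.

```python
# Illustrative sketch of the role-based access control tables named in the
# ACD architecture diagram (UserRole: UserID -> [Role], RolePermission:
# Role -> [Task]). Not the ACD code base; it only shows the idea that a user
# may perform a task if one of their roles permits it.
USER_ROLE = {
    "u1": ["project_member"],
    "u2": ["project_member", "project_admin"],
}

ROLE_PERMISSION = {
    "project_member": ["submit_job", "monitor_job", "get_output"],
    "project_admin":  ["submit_job", "monitor_job", "get_output", "add_user"],
}

def authorized(user_id: str, task: str) -> bool:
    """Return True if any of the user's roles grants the requested task."""
    return any(task in ROLE_PERMISSION.get(role, [])
               for role in USER_ROLE.get(user_id, []))

assert authorized("u1", "submit_job")
assert not authorized("u1", "add_user")
```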
VPH NoE, 2010 Deploying multiscale applications Loosely coupled: Codes launched sequentially – output of one app is the input to the next Could be distributed across multiple resources Perl scripts GSEngine Tightly coupled: Codes launched simultaneously Could be distributed across multiple resources RealityGrid Steering API MPIg
VPH NoE, 2010 Workflow Tools Many tools exist to allow users to connect applications running on a grid into workflows: Taverna, OMII-BPEL, Triana, GSEngine, Scripts Most workflow tools interface to and orchestrate web services Workflows can also include data access, data processing and analysis services Can assist with connecting different VPH models together
VPH NoE, 2010 Constructing workflows with the AHE By calling the command line clients from a Perl script, complex workflows can be achieved Easily create chained or ensemble simulations For example, jobs can be chained together: ahe-prepare prepare a new simulation for the first step ahe-start start the step ahe-monitor poll until step complete ahe-getoutput download output files repeat for next step
VPH NoE, 2010 Constructing a workflow Diagram of the chaining loop: ahe-prepare → ahe-start → ahe-monitor → ahe-getoutput → repeat; a sketch of such a chain in a script follows
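The slides drive these clients from Perl; the following is an equivalent sketch in Python of the chained workflow, assuming the ahe-* commands are on the PATH. The exact command-line options are not shown on the slides, so the argument lists (and the completion string checked for) are placeholders to be replaced from the AHE client documentation.

```python
# Sketch of the chained workflow above: each step is prepared, started,
# polled until complete, and its output downloaded before the next step runs.
# The ahe-* option lists are placeholders; consult the AHE client docs.
import subprocess
import time

# Hypothetical per-step configuration files for a two-step chained simulation
STEPS = ["step1.conf", "step2.conf"]

def run(cmd):
    """Run an AHE command-line client and fail loudly if it returns non-zero."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

for step, conf in enumerate(STEPS, start=1):
    print(f"--- workflow step {step} ({conf}) ---")
    run(["ahe-prepare"])        # prepare a new simulation for this step (options omitted)
    run(["ahe-start"])          # start the step
    while True:                 # poll until the step completes
        result = subprocess.run(["ahe-monitor"], capture_output=True, text=True)
        if "DONE" in result.stdout.upper():   # completion string is an assumption
            break
        time.sleep(60)
    run(["ahe-getoutput"])      # download output files, used as input to the next step
```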
VPH NoE, 2010 Computational Techniques Ensemble MD is suited to HPC grids: simulate each system many times from the same starting position; each run has randomized atomic energies fitting a certain temperature, which allows conformational sampling Diagram: start conformation → equilibration protocols (eq1–eq8) → series of runs → end conformations (C1, C2, C3, C4, … Cx) Launch simultaneous runs (60 sims, each 1.5 ns)
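For the ensemble case the same AHE clients are called once per replica rather than in a chain; a compact sketch follows, with the same caveats as above (placeholder arguments, commands assumed to be on the PATH), and the replica count taken from the slide.

```python
# Sketch of launching the ensemble described above: many replicas of the same
# starting conformation submitted at once. Command options are placeholders.
import subprocess

N_REPLICAS = 60   # replica count from the slide (60 sims, each 1.5 ns)

# Launch every replica; each would be prepared with its own randomised
# initial conditions (options omitted).
for i in range(N_REPLICAS):
    subprocess.run(["ahe-prepare"], check=True)
    subprocess.run(["ahe-start"], check=True)

# Later, poll the replicas and fetch those that have finished:
# subprocess.run(["ahe-monitor"], check=True)
# subprocess.run(["ahe-getoutput"], check=True)
```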
VPH NoE, 2010 GSEngine GSEngine is a workflow orchestration engine developed by the ViroLab project Can be used to orchestrate applications launched by the AHE It allows services to be orchestrated using both point-and-click and scripting interfaces Workflows are stored in a repository and shared between users Many of the aims of ViroLab are similar to the VPH-I projects, so GSEngine will be useful here Included in the VPH ToolKit: noe.eu/home/tools/compute-resources/workflow/gsengine.html
VPH NoE, 2010 Closing the loop between simulation and user Monitoring a running simulation: pause, resume or stop it if required Altering parameters in a running simulation Receiving feedback via on-line visualization Restarting a simulation from a known checkpoint Reproducibility, provenance Computational steering (diagram: the simulate–render–visualize–control loop)
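To make the steering loop concrete, here is a purely hypothetical sketch; the object and method names below are invented for illustration and are not the RealityGrid steering API calls.

```python
# Purely illustrative sketch of the steering loop described above. The methods
# on `sim` (get_status, render_frame, set_parameter, checkpoint, send_control)
# are hypothetical stand-ins, NOT the RealityGrid steering API; they only make
# the monitor -> visualize -> adjust -> checkpoint cycle concrete.
import time

def steer(sim, max_iterations=100):
    """Hypothetical steering loop: monitor, visualise, adjust, checkpoint."""
    for _ in range(max_iterations):
        state = sim.get_status()          # monitor the running simulation
        sim.render_frame()                # feedback via on-line visualization
        if state.diverging:
            # alter a parameter in the running simulation
            sim.set_parameter("timestep", state.timestep / 2)
        if state.milestone_reached:
            sim.checkpoint()              # known restart point (reproducibility, provenance)
        if state.finished or state.user_requested_stop:
            sim.send_control("stop")      # pause, resume or stop if required
            break
        time.sleep(10)
```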
VPH NoE, 2010 AHE/ReG Steering Integration Diagram: the extended AHE client launches and starts both the simulation and the visualization, each linked against the steering library; the steering libraries register with a registry, the client binds to the steering web service through the registry, and data transfer between simulation and visualization takes place over sockets or files. Slide adapted from Andrew Porter
VPH NoE, 2010 Cross-site Runs with MPI-g MPI-g has been designed to allow single applications to run across multiple machines Some problems won't fit on a single machine, and require the RAM/processors of multiple machines on the grid. MPI-g allows jobs to be turned around faster by using small numbers of processors on several machines – essential for clinicians
VPH NoE, 2010 MPI-g Requires Co-Allocation We can reserve multiple resources for specified time periods Co-allocation is useful for meta-computing jobs using MPIg, visualization, and workflow applications. We use HARC – the Highly Available Robust Co-scheduler (developed by Jon Maclaren at LSU). Slide courtesy of Jon Maclaren
VPH NoE, 2010 HARC HARC provides a secure co-allocation service Multiple Acceptors are used Works well provided a majority of Acceptors stay alive Paxos Commit keeps everything in sync Gives the (distributed) service high availability Deployment of 7 acceptors --> Mean Time To Failure ~ years Transport-level security using X.509 certificates HARC is a good platform on which to build portals/other services XML over HTTPS – simpler than SOAP services Easy to interoperate with Very easy to use with the Java Client API
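As a back-of-the-envelope illustration (not from the slide) of why seven acceptors make the service so robust: with Paxos Commit the service only fails when a majority of acceptors is down at once, so if each acceptor is independently unavailable with probability p, the service unavailability is

P(\text{service down}) \;=\; \sum_{k=4}^{7} \binom{7}{k}\, p^{k}\,(1-p)^{7-k} \;\approx\; 35\,p^{4} \quad \text{for small } p .

For example, acceptors that are each down 1% of the time would give a combined unavailability of roughly 3.5 × 10⁻⁷, consistent with a mean time to failure measured in years.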
VPH NoE, 2010 Multiscale Applications on European e-Infrastructures (MAPPER) Distributed multiscale computing needs from VPH, Fusion, Computational Biology, Material Science and Engineering drive MAPPER Partners: UvA, UCL, UU, PSC, Cyfronet, LMU, UNIGE, Chalmers, MPG €2.5M, starting 1st October for 3 years
VPH NoE, 2010 MAPPER Infrastructure
VPH NoE, 2010 Going further 1. Download and install AHE Server: Download: Quick start guide: 2. Host a new application (e.g. /bin/date) Generic app tutorial: 3. Modify these example workflow scripts to run your application: http:// workflow.tgz