Building simple, easy-to-use grids with Styx Grid Services and SSH Jon Blower, Keith Haines Reading e-Science Centre Environmental Systems Science Centre.

Building simple, easy-to-use grids with Styx Grid Services and SSH
Jon Blower, Keith Haines
Reading e-Science Centre, Environmental Systems Science Centre, University of Reading

Motivation
Grid computing is “distributed computing performed transparently across multiple administrative domains”
Implies both ease of use and security
  – hard to get both simultaneously!
Ease of setup and maintenance also highly desirable
Difficulty in achieving this is a major block to uptake of Grid computing
Currently hard for science projects to build their own Grids without very significant technical help
In this talk:
  – Ease of use (transparency) comes from Styx Grid Services
  – Security comes from SSH

Running jobs on a Grid
Basic use of a Grid boils down to:
  – uploading the required input files for a job
  – running the job
  – downloading the output
(More advanced use includes workflows, delegation, etc.)
Many users ask: why not just use SSH?
  – File transfer with SFTP/SCP
  – Execution through SSH exec or SSH login
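As a rough illustration of the "why not just use SSH?" question, the sketch below builds the `scp` and `ssh` command lines for the three basic steps (upload, run, download). It only constructs the commands and does not execute them. The host name, the remote directory `job` and the function name `plain_ssh_job` are assumptions made for this example, not part of the talk.

```python
import shlex


def plain_ssh_job(host, command, inputs, outputs, remote_dir="job"):
    """Return the argv lists needed to run one job with plain scp/ssh.

    host, remote_dir and the file names are hypothetical placeholders.
    """
    # Create a working directory on the remote host.
    steps = [["ssh", host, f"mkdir -p {shlex.quote(remote_dir)}"]]
    # 1. Upload the required input files.
    for name in inputs:
        steps.append(["scp", name, f"{host}:{remote_dir}/{name}"])
    # 2. Run the job inside the remote working directory.
    steps.append(["ssh", host, f"cd {shlex.quote(remote_dir)} && {command}"])
    # 3. Download the output files.
    for name in outputs:
        steps.append(["scp", f"{host}:{remote_dir}/{name}", name])
    return steps


for argv in plain_ssh_job(
    "user@myhost",
    "myprog -i input.dat -o output.dat",
    inputs=["input.dat"],
    outputs=["output.dat"],
):
    print(shlex.join(argv))
```

Every file has to be named twice (once for the transfer, once on the command line), which is the manual bookkeeping that SGS aims to hide.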

Advantages of SSH
Trusted and understood by system administrators
Very widely used, bugs get fixed quickly
Lots of implementations and good tools
  – e.g. WinSCP (Windows Explorer-like interface to remote systems)
Choice of authentication methods including:
  – password
  – public-private key pair
  – pluggable (e.g. GSI-SSH for Globus logins)
Can mount remote filesystems exposed through SSH
  – sshfs for Linux (analogous to NFS)
  – SftpDrive for Windows
  – Hence can work on a remote file without downloading all of it: potentially important in environmental sciences
Can execute remote programs with SSH
Hence SSH can be the nucleus of a simple Grid

What’s wrong with Globus security?
Globus uses X.509 certificates and time-limited proxies
  – Proxies can be used to temporarily delegate authority to a third party
  – Certificates have typical life-time of 1 year
High level of security but proven usability problem
  – Lots of certificate formats: different tools require different formats
  – Users can’t “remember” the certificate so need to have a copy on every computer they use (or on USB stick or shared disk)
  – Illegal sharing of certificates: “mine doesn’t work, can I use yours?”
  – Users known to run SSH as a grid job, then log on to that to get a familiar environment!
Therefore poor usability leads to poor security in practice
  – and annoys users no end
Conclusion: user certificates should be avoided if possible
MyProxy can help with these problems
  – cf. NERC DataGrid
  – but is an extra server to manage

Styx Grid Services
Simple, lightweight system for exposing executables as a service
  – Executable is installed on a service provider (host)
SGSs are executed just like local programs
  – myprog -i input.dat -o output.dat
  – (myprog is a wrapper script that masquerades as the original executable)
  – files transferred automatically, user doesn’t have to know where
Supports interactive use
  – including computational steering
  – But executables exposed through SGS must be non-graphical
“Workflows” can be constructed with shell scripts
  – data can be streamed directly between the services
  – extract | process | render
  – Supported by Taverna
Emphasis on ease of deployment and use, not feature completion

How SGS works
Server contains complete description of executable in XML
  – includes input and output files, command-line parameters
SGSRun program downloads XML description and parses the command line
Creates new service instance and uploads input files
Starts the service and monitors progress
Uploads stdin and downloads stdout and stderr as the service runs, redirecting them from and to the console
Downloads output files when the service finishes
Excerpt from an XML description:
  <param type="unflaggedOption" name="inputfile"/>
  <param type="unflaggedOption" name="outputfile"/>
  <input type="fileFromParam" name="inputfile"/>
  <output type="fileFromParam" name="outputfile"/>
  <output type="stream" name="stderr"/>
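The way a client could combine the XML description with the command line can be sketched as follows. This is a minimal illustration, not the SGSRun implementation: the enclosing `<service>` root element and the function name `plan_transfers` are assumptions added so the excerpt parses as a single document, and only the element and attribute names shown on the slide are used.

```python
import xml.etree.ElementTree as ET

# The five elements come from the slide; the <service> wrapper is an
# assumption made so that the excerpt is a well-formed XML document.
DESCRIPTION = """
<service>
  <param type="unflaggedOption" name="inputfile"/>
  <param type="unflaggedOption" name="outputfile"/>
  <input type="fileFromParam" name="inputfile"/>
  <output type="fileFromParam" name="outputfile"/>
  <output type="stream" name="stderr"/>
</service>
"""


def plan_transfers(description_xml, argv):
    """Work out which files to upload and download for one invocation."""
    root = ET.fromstring(description_xml)
    # Unflagged options are matched to command-line arguments by position.
    names = [p.get("name") for p in root.findall("param")
             if p.get("type") == "unflaggedOption"]
    if len(argv) != len(names):
        raise ValueError(f"expected {len(names)} arguments, got {len(argv)}")
    values = dict(zip(names, argv))
    # Inputs named by a parameter must be uploaded before the service starts.
    uploads = [values[i.get("name")] for i in root.findall("input")
               if i.get("type") == "fileFromParam"]
    # Outputs named by a parameter are downloaded when the service finishes.
    downloads = [values[o.get("name")] for o in root.findall("output")
                 if o.get("type") == "fileFromParam"]
    # Stream outputs are relayed to the console while the service runs.
    streams = [o.get("name") for o in root.findall("output")
               if o.get("type") == "stream"]
    return uploads, downloads, streams


print(plan_transfers(DESCRIPTION, ["input.dat", "output.dat"]))
```

Because the description says which parameters name files, the user types the same command line as for the local program and never lists transfers explicitly.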

SGS and security
SGS server can be run in two modes:
Daemon mode:
  – Standalone server (a container for services)
  – Traffic optionally encrypted through Secure Sockets (SSL)
  – Authentication through custom protocol
  – need to maintain own user database
  – Jobs run as a generic user
Tunnelled mode:
  – Server process executed through Secure Shell (SSH)
  – Client and server communicate down the encrypted channel
  – Authentication through SSH
  – No separate user database, just need login on host system
  – Jobs run with permissions of the specific user
  – analogous to other systems e.g. Subversion
Client interface is the same in both cases
  – Choice is purely down to service providers

SGS + SSH = …
You can execute remote jobs with SSH alone, but only stdin, stdout and stderr are communicated down the line
  – Need to upload and download input and output files "manually"
Styx allows an arbitrary number of channels to be sent down the secure line …
  – Data streams
  – Input and output files
  – Progress and status messages
  – Steering messages
… through use of the Styx protocol for distributed systems
  – File-sharing protocol similar to NFS
  – We have a pure-Java implementation of Styx (http://jstyx.sf.net)
  – Any resource can be represented as a URL: styx+ssh://myhost/myservice/instances/1/outputs/stdout
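To make the URL form concrete, the sketch below splits the example URL into its parts with Python's standard `urllib.parse`. The path layout (service name, then `instances`, then an instance id, then the resource) is inferred from that single example and is an assumption; the function name `parse_sgs_url` is hypothetical.

```python
from urllib.parse import urlsplit


def parse_sgs_url(url):
    """Split a styx+ssh URL into host, service, instance and resource.

    The layout /<service>/instances/<id>/<resource...> is assumed from
    the one example URL on the slide.
    """
    parts = urlsplit(url)
    if parts.scheme != "styx+ssh":
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 3 or segments[1] != "instances":
        raise ValueError(f"unexpected path layout: {parts.path!r}")
    return {
        "host": parts.hostname,
        "service": segments[0],
        "instance": segments[2],
        # Everything after the instance id names a file-like resource.
        "resource": "/".join(segments[3:]),
    }


print(parse_sgs_url("styx+ssh://myhost/myservice/instances/1/outputs/stdout"))
```

Since Styx is a file-sharing protocol, each channel (a stream, a file, a status message) appears as a path under the service instance, much like a file in a directory tree.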

Demo 1: A basic Grid job
Remote execution of GULP (General Utility Lattice Program)
  – Julian Gale
Calculates lots of properties of crystal lattices
  – e.g. Helmholtz free energy
Reads input from stdin, prints output to stdout
  – gulp < infile
Running remote job exactly the same as running locally
Client-side stub and server-side SGS framework communicate through Styx messages on the secure channel
[Diagram: GULP client with GULP stub; Styx messages exchanged on SSH channel; SGS]

Demo 2: Condor job
SGS system can be installed on a Condor submit host
If user specifies a directory of input files instead of a single file, jobs are split across worker nodes in the pool
  – gulp inputs outputs
  – One job per file in the inputs directory
  – SGS system automatically creates Condor submit file and monitors progress
Progress is displayed on the client's console
Easy way to specify parameter sweep jobs, ensemble data processing etc.
Could apply to Sun GridEngine and other DRMs
  – Interactive use may not be possible depending on DRM
[Diagram: GULP client with GULP stub; SSH; SGS on Condor submit host; Condor worker nodes]
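The "one job per file" idea can be sketched as a function that writes a Condor submit description with one `queue` statement per input file. This is an illustration under assumptions, not the file SGS actually generates: the function names, the `.out`/`.err` naming and the `vanilla` universe are choices made for the example. Because GULP reads stdin and writes stdout, the sketch uses the standard `input`, `output` and `error` submit commands, which set a job's stdin, stdout and stderr files.

```python
from pathlib import Path


def condor_submit_lines(executable, input_names, output_dir):
    """Build submit-description lines, one queued job per input file."""
    lines = ["universe = vanilla", f"executable = {executable}"]
    for name in sorted(input_names):
        stem = Path(name).name
        lines += [
            f"input = {name}",                    # file fed to stdin
            f"output = {output_dir}/{stem}.out",  # captured stdout
            f"error = {output_dir}/{stem}.err",   # captured stderr
            "queue",                              # submit this one job
        ]
    return lines


def submit_file_for_directory(executable, input_dir, output_dir):
    """List a directory of input files and return the submit file text."""
    names = [p.as_posix() for p in Path(input_dir).iterdir() if p.is_file()]
    return "\n".join(condor_submit_lines(executable, names, output_dir)) + "\n"


print("\n".join(condor_submit_lines("gulp", ["inputs/a.gin", "inputs/b.gin"], "outputs")))
```

Each `queue` statement picks up the `input`, `output` and `error` values set just before it, so a directory of N files yields N independent jobs.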

Submission to Globus resources
Two options:
Use GSI-SSH instead of SSH
  – SSH with Globus authentication
  – (thanks to CCLRC for Java code to GSI-SSHTerm)
  – doesn’t quite work yet… ;-)
Submit to Condor-G instead of Condor (right)
  – OxGrid uses Condor-G to submit jobs to National Grid Service
  – Very similar to normal Condor operation
[Diagram: GULP client with GULP stub; SSH; SGS on Condor-G submit host; Globus resources]

Long-running jobs and robustness
Client might disconnect the SSH connection deliberately or accidentally
This might bring down the SGS server process!
Client would not be able to re-connect
  – (In daemon mode this is less of a problem as the server is persistent)
We have designed but not yet implemented a solution to this
  – A little coding and a lot of thinking and testing is required!
This is also needed to support workflows properly (services need to connect to one another to transfer data directly)
In progress!

Case study: GCEP project
Grid for Coupled-model Ensemble Prediction
  – Uses clusters in Reading, British Antarctic Survey and RAL
Run climate models (MPI jobs) then analyse output (single-machine jobs)
  – Focusses on ensembles, so want to run same program over different input
Scientists write programs in whatever language they like
Deploy on the GCEP servers and create the XML description
Anyone with SSH access to the servers can then run the programs through SGS as if they were local
  – programs can be run on clusters through Sun Grid Engine
  – Data transfers happen automatically
[Diagram labels: MPI jobs on clusters; trivially parallel jobs on Condor pool of ordinary desktops (Reading Campus Grid)]

Limitations
Robustness
Slow data transfers because encrypted
  – could use alternative transport
  – There are ways to improve this but need more testing
SGS does not provide a resource broker
  – But can use Condor-G for this
Users can't (yet) submit arbitrary executables
Complex executables (that spawn other exes) might be hard to deploy in SGS
  – But we haven't really tried yet
Can't deploy a GUI app as an SGS

Conclusions
To use SGS-SSH all you need is:
  – An SSH login to the remote system
  – The SGS software (5MB of pure Java libraries)
Users run Grid jobs securely just like ordinary local programs
Can submit to Condor, Globus and other DRMs
Can create "workflows" of Styx Grid Services with shell scripts
  – Data can be transferred directly between services
SGS already available: SGS-SSH needs more work
  – Version 0.2.0 of JStyx downloaded 218 times so far
  – (most of them probably just want Styx implementation, not SGS)

Future work
Case studies!
Robustness
Optimize data transfer speed
GridSAM integration (possible)
  – already has framework for submission to various DRMs
  – but limited by JSDL limitation of “one job at a time”
Compare with my_condor_submit
  – From e-Minerals project

Acknowledgements and references
Thanks to…
  – David Wallom of OERC for helping to integrate with OxGrid
  – Tom Oinn of Taverna project for Taverna integration
  – Vita Nuova Holdings Ltd for technical help with Styx protocol
See also…
  – Reading e-Science Centre booth
  – Papers in AHM proceedings 2004, 2005 and 2006
  – http://jstyx.sf.net
