Legion: The Grid OS Architecture and User View Anand Natrajan, Marty Humphrey The Legion Project, University of Virginia
Grid Environment Computers Networks People Data Devices Disjoint file systems Disjoint namespaces Multiple administration domains Unpredictable load, availability, failures Security problems
Grid OS Requirements Wide-area High Performance Complexity Management Extensibility Security Site Autonomy Input / Output Heterogeneity Fault-tolerance Scalability Simplicity Single Namespace Resource Management Platform Independence Multi-language Legacy Support
Legion - A Grid OS
Tools MPI / PVM P-space studies - multi-run Parallel C++ Parallel object-based Fortran CORBA binding Object migration Accounting Remote builds and compilations Fault-tolerant MPI libraries Post-mortem debugger Console objects Parallel 2D file objects Collections Licence support
Commercial Support - Avaki Corp. Mentat Legion Avaki Web Venture funded Headquartered in Boston Growing number of employees Multi-tiered support offering
Protein Folding with CHARMM Molecular Dynamics Simulations structures to sample (r, R_gyr) space [Figure axis: R_gyr]
Resources Available
IBM Blue Horizon, SDSC, 375MHz Power3, 512/1184
HP V-class, CalTech, 440 MHz PA, /128
IBM SP3, UMich, 375MHz Power3, 24/24
IBM Azure, UTexas, 160MHz Power2, 32/64
Sun HPC, SDSC, 400MHz SMP, 32/64
DEC Alpha, UVa, 533MHz EV56, 32/128
Transparent Remote Execution User initiates “run” User/Legion selects site Legion copies binaries Legion copies input files Legion starts job(s) Legion monitors progress Legion copies output files
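The sequence on this slide can be sketched as a small Python simulation. This is an illustration only, not Legion code: the `Site` class, the `run_remotely` function, the "most free CPUs" selection policy, and any file names passed in are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for a remote host; not a Legion type."""
    name: str
    free_cpus: int
    files: set = field(default_factory=set)


def run_remotely(sites, binary, inputs, outputs):
    """Replay the slide's steps for one job; return (chosen site, event log)."""
    log = ["user initiates run"]
    # Assumed policy: pick the site with the most free CPUs.
    site = max(sites, key=lambda s: s.free_cpus)
    log.append(f"select site {site.name}")
    site.files.add(binary)
    log.append(f"copy binary {binary}")
    for name in inputs:
        site.files.add(name)
        log.append(f"copy input {name}")
    log.append("start job")
    # Pretend the job ran and wrote its output files at the remote site.
    site.files.update(outputs)
    log.append("monitor progress: done")
    for name in outputs:
        log.append(f"copy output {name}")
    return site, log
```

The point of the sketch is that the user supplies only the binary and file names; site selection and every copy step happen inside the call.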
Mechanics of CHARMM Runs Legion Register binaries Create task directories & specification Dispatch runs Dispatch more runs
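As a rough sketch of the "create task directories & specification" and "dispatch runs / dispatch more runs" steps, the Python below builds one directory per parameter combination and then hands tasks out in batches. The directory names, the `spec.txt` format, the parameter names `r` and `rgyr`, and the batch size are all hypothetical; the slides do not show Legion's actual specification format.

```python
import itertools
import tempfile
from pathlib import Path


def create_task_dirs(root, grid):
    """Create one task directory per point of the parameter grid."""
    names = sorted(grid)
    tasks = []
    combos = itertools.product(*(grid[name] for name in names))
    for index, values in enumerate(combos):
        task_dir = Path(root) / f"task_{index:04d}"
        task_dir.mkdir(parents=True)
        spec = dict(zip(names, values))
        # Hypothetical spec format: one "name=value" line per parameter.
        lines = "".join(f"{key}={value}\n" for key, value in spec.items())
        (task_dir / "spec.txt").write_text(lines)
        tasks.append((task_dir, spec))
    return tasks


def batches(tasks, batch_size):
    """Yield tasks in groups: dispatch runs, then dispatch more runs."""
    for start in range(0, len(tasks), batch_size):
        yield tasks[start:start + batch_size]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        all_tasks = create_task_dirs(root, {"r": [0.1, 0.2], "rgyr": [10, 12]})
        for group in batches(all_tasks, 3):
            print([task_dir.name for task_dir, _ in group])
```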
Types Of Applications Legacy applications Legion-aware applications –I/O library –2D file object Applications Using Stdgrid Parameter Space Studies Parallel Programs –MPI, PVM, MPL, Basic Fortran Support (BFS)
Grid Application Requirements Security Fault-tolerance Heterogeneity Collaboration … Legion supports these and other needs
Heterogeneous Runs BT-Med Ocean Model
Cross-Organisation Collaboration Different companies Proprietary simulations and data Each needs the other Form virtual partnership
Platforms Windows NT, 2K, 98, 95 Sun (Solaris) SGI (Irix, Origin) Intel (Linux, FreeBSD) DEC (Unix, Linux) Cray (T90, T3E) IBM (AIX, SP-2) HP (HP-UX) Queuing systems: Codine, LoadLeveler, Maui, PBS, NQS, LSF
Applications Biochemistry and Molecular Science Information Retrieval Materials Science Climate Modelling Neuroscience Aerospace Astronomy Graphics NPACI - SDSC, UCSD, Caltech, UTexas, UMich, UCB, UVa. DoD MSRCs - NAVO & ARL, NASA Ames
User View Command-Line Interface
Setup
Set up shell environment variables:
. ~legion/setup.sh
OR
export LEGION=/home/legion/Legion
export LEGION_OPR=/home/maya/OPR
. $LEGION/bin/legion_env.sh
Specifies where binaries and configuration files can be found
Sets root context
Login Authentication to system legion_login /users/stephen Currently uses password - other mechanisms, e.g., Kerberos ticket possible Login object (a.k.a. Authentication object) - /users/stephen - is user’s proxy to world Login object generates certificate identifying user
Context Space [Figure: context tree rooted at /, with node labels hosts, users, home, mach1, mach2, you, me, mydir, prog, file1, tty, subdir] Unix-like: legion_ls, legion_pwd, legion_cd, legion_cat, ...
Context Space Network-wide, transparent file system Location-independent read/write of files Convenient transfer of files between context space and local file system I/O libraries for access Unix-like utilities
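One way to picture location independence is a toy model in which names resolve to object identifiers, and a separate table records where each object currently lives. The Python sketch below is an illustration under that assumption, not Legion's implementation: the `ContextSpace` class, its method names, and the `obj-1` style identifiers are invented for the example (real LOIDs look different).

```python
class ContextSpace:
    """Toy model: names map to object IDs; IDs map separately to hosts."""

    def __init__(self):
        self.root = {}       # nested dicts are contexts; strings are object IDs
        self.location = {}   # object ID -> host currently holding the object

    @staticmethod
    def _parts(path):
        return [part for part in path.split("/") if part]

    def _walk(self, path):
        node = self.root
        for part in self._parts(path):
            node = node[part]   # KeyError if the name is not bound
        return node

    def bind(self, path, object_id, host):
        """Bind a name to an object ID and record the object's host."""
        *contexts, name = self._parts(path)
        node = self.root
        for part in contexts:
            node = node.setdefault(part, {})
        node[name] = object_id
        self.location[object_id] = host

    def ls(self, path="/"):
        return sorted(self._walk(path))

    def lookup(self, path):
        return self._walk(path)

    def whereis(self, path):
        return self.location[self.lookup(path)]

    def migrate(self, path, new_host):
        """Move the object; its name and ID stay the same."""
        self.location[self.lookup(path)] = new_host
```

In this model, migrating an object changes only the location table, so a path such as `/home/me/file1` keeps working wherever the object ends up.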
Context Example legion_ls /
Another Context legion_ls /hosts
Yet Another Context legion_ls /users
More Context Fun
Other Context Commands Locate a LOID in context space legion_list_names Locate an object on a machine legion_whereis Find status of an object legion_object_info List metadata of an object legion_list_attributes
Status Of An Object legion_object_info -c work
Physical Location Of Object legion_whereis -c work
Context Space vs. Local Space Local space = your machine’s directory structure –OS-specific, Machine-specific –Use cp, copy, etc. –e.g., C:\Program Files\, /usr/bin, /mnt/disk1 Context space = Legion’s directory structure –OS-independent, Machine-independent –Use legion_cp, etc.
Context Space and Local Space Transfer one file from local space to context space legion_cp -localsrc Transfer one file from context space to local space legion_cp -localdest
Context Space and Local Space Copying local directory to context space legion_cp -r -localsrc OR legion_import_tree Copying context directory to local space legion_cp -r -localdest
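The four copy cases above differ only in which flags are set. The Python helper below builds the corresponding argument lists without running anything. It uses only the flags shown on the slides (`-r`, `-localsrc`, `-localdest`); the assumption that the source and destination paths follow the flags as positional arguments, and the example paths in the usage note, are not taken from the slides.

```python
def legion_cp_args(src, dest, *, local_src=False, local_dest=False,
                   recursive=False):
    """Build an argument list for legion_cp (assumed argument order)."""
    args = ["legion_cp"]
    if recursive:
        args.append("-r")          # copy a whole directory
    if local_src:
        args.append("-localsrc")   # source is in local space
    if local_dest:
        args.append("-localdest")  # destination is in local space
    # Assumption: source and destination come last, in that order.
    return args + [src, dest]
```

For example, `legion_cp_args("/tmp/data", "/home/me/data", local_src=True, recursive=True)` yields the list for a recursive copy from local space into context space.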
Context Space and Local Space Map (not copy!) local directory to context space temporarily legion_export_dir Does NOT make copy of local directory Merely provides Legion-like access to local directory –Use legion_cat on local files
Making Context Space… Local sub-directory with Legion NFS daemon –Use cat on context files FTP directory with FTP interface Windows directory with Samba interface URL tree with HTTP interface
I/O Performance –X-Axis = number of clients simultaneously performing 1MB reads on 10MB files –Y-Axis = total read bandwidth –Each point = average of multiple runs –Clients = 400MHz Intels, NFS Server = 800MHz Intel
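A plotted point of this kind could be computed as follows. This is a sketch under stated assumptions: each client performs one read of the given size, all clients finish within the measured elapsed time, and the timings passed in are hypothetical rather than measurements from the slide.

```python
from statistics import mean


def total_read_bandwidth(num_clients, read_mb, elapsed_s):
    """Total bandwidth in MB/s when all clients finish within elapsed_s."""
    return num_clients * read_mb / elapsed_s


def plot_point(num_clients, elapsed_per_run, read_mb=1.0):
    """Average the total bandwidth over several runs (one y-value)."""
    return mean(
        total_read_bandwidth(num_clients, read_mb, elapsed)
        for elapsed in elapsed_per_run
    )
```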
Flexible Context Space [Figure labels: Context Directory, Disk, legion_export_dir, legion_import_tree, Samba, NFS, HTTP, FTP]
Access Control MayI for each object implements access control on a per-function basis Users named by login object Sets of users grouped by contexts Usage: legion_change_permissions [+-rwx] [-v] Example: legion_change_permissions +r /users/fred /home/grimshaw/myfile
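Per-function access control can be sketched as a check made before every call on an object. The Python below is a toy model, not Legion's MayI implementation: the `GuardedObject` class, the rule that the owner may always call, and the mapping from the letters `r`, `w`, `x` to the function names `read`, `write`, `execute` are all assumptions for illustration.

```python
class GuardedObject:
    """Toy object whose functions are guarded by a may_i check."""

    def __init__(self, owner):
        self.owner = owner
        self.allowed = {}   # function name -> set of login-object names

    def may_i(self, caller, function):
        # Assumption: the owner may always call every function.
        return caller == self.owner or caller in self.allowed.get(function, set())

    def invoke(self, caller, function):
        if not self.may_i(caller, function):
            raise PermissionError(f"{caller} may not call {function}")
        return f"{function} called by {caller}"


# Hypothetical mapping from permission letters to object functions.
LETTER_TO_FUNCTIONS = {"r": ["read"], "w": ["write"], "x": ["execute"]}


def change_permissions(obj, mode, user):
    """Apply a mode such as "+r" or "-wx" for one user on one object."""
    if len(mode) < 2 or mode[0] not in "+-":
        raise ValueError(f"bad mode: {mode!r}")
    grant = mode[0] == "+"
    for letter in mode[1:]:
        for function in LETTER_TO_FUNCTIONS[letter]:
            users = obj.allowed.setdefault(function, set())
            if grant:
                users.add(user)
            else:
                users.discard(user)
```

In this model, `change_permissions(myfile, "+r", "/users/fred")` plays the role of the example command: the login object name `/users/fred` is added to the set allowed to call `read` on that one object.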
Access Control Example
Unified Console [Figure labels: Prog., File, TTY] User creates tty object User starts running program Legion passes tty LOID to program Program produces stdout, stderr User shares tty LOID
TTY Object Redirect run-time output to central (or multiple) consoles Connect and disconnect dynamically Debug quickly and simply Monitor status, errors, easily Share console with others legion_tty
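The behaviour listed here (several programs writing to one console object, with viewers connecting and disconnecting at run time) resembles a simple publish/subscribe arrangement. The Python sketch below models that idea only; the `TtyObject` class and its methods are hypothetical and do not reflect how `legion_tty` is implemented.

```python
class TtyObject:
    """Toy console object that fans program output out to viewers."""

    def __init__(self):
        self.consoles = {}   # viewer name -> callable that receives a line

    def connect(self, viewer, sink):
        """Attach a viewer; sink is any callable taking one string."""
        self.consoles[viewer] = sink

    def disconnect(self, viewer):
        self.consoles.pop(viewer, None)

    def write(self, program, stream, text):
        """Deliver one line of stdout or stderr to every connected viewer."""
        line = f"[{program}:{stream}] {text}"
        for sink in self.consoles.values():
            sink(line)
```

A usage example: connect two lists' `append` methods as sinks, write from two programs, disconnect one viewer, and write again; only the remaining viewer receives the later line.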
User View Web Interface
Logging In
Listing Contents Of A Context
Control Window
Status Window
StdOut Window
StdErr Window
Listing Classes (Contents of /class)
Listing Hosts (Contents of /hosts)
List Attributes Of An Object
Start A Run
Check The Status Of A Job
Start An Amber (BioGrid) Run
Check The Status Of An Amber Run
Graphically Check An Amber Run
Interact With Amber Run
Start A Hawley-Hydro Run
Check The Status Of A Hydro Run
Graphically Check A Hydro Run
Run RenderGrid Jobs (P-Space Jobs)
Check The Status of A RenderGrid Job
Check Accounting Logs
User View Windows Interface
Windows Browser
Context Space in Windows Ability to export local directories into Legion’s context space Easy-to-use interface Ability of users to control when shared directories are visible to other users
Access Control Ability of users to specify access control policies Fine-grained nature of policies Allow/Deny read access to users or groups Allow/Deny write access to users or groups Ease with which access rights can be changed Speed at which access rights are propagated through Legion space
Windows Legion FTP Daemon
Windows Job Sandbox
Windows Process Control
National Legion Net
Summary Philosophy –Grid as a Single Virtual Machine –Provide mechanisms; let others build policies Architecture –Object-based, integrated –Default policies for scheduling, security, … User Interfaces –Command-line, Web, Windows, FTP, HTTP, …
Future Directions Improved user interfaces More robust system Research activities - University of Virginia Commercial activities - Avaki Corporation Legion-G? Continued support for nationwide grid, grid applications