Manchester Computing Supercomputing, Visualization & e-Science CS 602 — eScience and Grids John Brooke Donal Fellows



Lecture 1: What is a Grid?
We examine how the Grid concept arose and how it relates to other concepts such as e-Science and cyberinfrastructure. We then examine a more precise definition of a Computational Grid. There are other types of Grid, but the Computational Grid is the main focus of this module.

e-Science
“In the future, e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists.”
Dr John Taylor, Director General of the Research Councils, OST

Cyber Infrastructure
- Term coined by a US Blue Ribbon panel. It describes the emergence of an infrastructure linking high-performance computers, experimental facilities and data repositories.
- It seems to be distinguished from the term Grid, which is considered to apply more directly to computation and cluster-style computing.
- It may or may not be the same thing as eScience.
- eScience focuses on the way that science is done; cyber-infrastructure focuses on how the infrastructure is provided to support this way of working.

Grids as Virtual Organizations
- The term is used in the paper “Anatomy of the Grid” (Foster, Kesselman, Tuecke): “… Grid concept is coordinated resource sharing in dynamic, multi-institutional virtual organizations …”
- There is an analogy with an electrical power grid, where producers share resources to provide a unified service to consumers.
- A large unresolved question is how Virtual Organizations federate across security boundaries (e.g. firewalls) and organisational boundaries (e.g. resource allocation).
- Grids may have hierarchical structures, e.g. the EU DataGrid, or more federated structures, e.g. EuroGrid.

What can Grids be used for?
[Figure labels: User with laptop/PDA (web based portal); VR and/or AG nodes; HPC resources (scalable MD, MC, mesoscale modelling); “Instruments” (XMT devices, LUSI, …); Visualization engines; Steering (ReG steering API); Storage devices; Grid infrastructure (Globus, Unicore, …); Performance control/monitoring]
Moving the bottleneck out of the hardware and into the human mind…

Grids for Knowledge/Information Flow
[Figure labels: Hypotheses; Design; Integration; Annotation / Knowledge Representation; Information Sources; Information Fusion; Clinical Resources; Individualised Medicine; Data Mining; Case-Base Reasoning; Data Capture; Clinical; Image/Signal; Genomic/Proteomic; Analysis; Knowledge Repositories; Model & Analysis Libraries]

Parallel and Distributed Computing
- Parallel computing is the synchronous coupling of computing resources, usually within a single machine architecture or a single administrative domain, e.g. a cluster.
- Distributed computing refers to a much looser use of resources, often across multiple administrative domains.
- Grid computing is an attempt to provide a persistent and reliable infrastructure for distributed computing.
- Users may wish to run workflows many times over a set of distributed resources, e.g. in bioinformatics applications.
- Users may wish to couple heterogeneous resources for scientific collaboration, e.g. telescopes, computers, databases, video-conferencing facilities.

Re-usability and Components
- We wish to develop enough reusable components to provide common facilities, so that applications and services can interoperate.
- There are various approaches: in Globus a toolkit is developed, whereas in Unicore all actions on the Grid are modelled by abstractions encapsulated in an inheritance hierarchy.
- As part of this course you should start to identify the strengths and weaknesses of these two approaches.
- A more radical approach is to impose a meta-operating system that presents the resources as a virtual computer. This was tried by the Legion project, and the idea partially survives in the concept of a DataGrid.

Toolkits for Grid Functions
- Software development toolkits
- Standard protocols, services & APIs
- A modular “bag of technologies”
- Enable incremental development of grid-enabled tools and applications
- Reference implementations
- Learn through deployment and applications
- Open source
[Figure labels: Applications; Diverse global services; Core services; Local OS]

Layered Architecture
[Figure layers: Applications / Problem Solving Environments; Application Toolkits; Grid Services; Grid Fabric; Grid Resources]
[Figure labels: HBM, GASS, GSI-FTP, MDS, GSI, GRAM, DUROC, MPICH-G, globusrun, GlobusView; LSF, MPI, NQE, PBS; Solaris, Linux, UNICOS, IRIX, Tru64; SRB, LUSI, QM-LUSI/XMT; Manchester, Imperial College, EPCC, Oxford, QM, Loughborough; Portal, Component Repository, Visualization & Steering, Computational PSE, Component Framework, VIPAR]

Core Functions for Grids
Acknowledgements to Bill Johnston of LBL

CS  The GGF Document “Core Functions for Production Grids” is attempting to define Grids by the minimal set of functions that a Grid must implement to be “usable”  This is a higher level approach that does not attempt to specify how the functions are implemented, or what base technology is used to implement them  In the original Globus Toolkit functions were implemented in C and could be called via APIs, scripts or used on the command line  In Unicore functions were abstracted as a hierarchy of Java classes, then mapped to Perl scripts at a lower level, the “Incarnation process”.  In the Open Grid Services Architecture there is a move to a Web services based approach, the hosting environment assumes prominence. A Set of Core Functions for Grids

Converging Technologies
[Figure labels: Agents; Grid Computing; Web Service & Semantic Web Technologies]

Web Services
- Early Grids were built on the technologies used for accessing supercomputers, e.g. ssh, shell scripts, ftp. Information services were built on directory services such as LDAP (Lightweight Directory Access Protocol).
- However, in the commercial sphere Web Services are becoming dominant, based on SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language) and UDDI.
- Early Grid systems such as Unicore and Globus are trying to refactor their functionality in terms of Web Services.
- The key Grid concept not captured in Web Services is state, e.g. the state of a job queue or the load on a resource.

Other Types of Grid
- The word Grid is used very loosely.
- Some aspects of collaborative video-conferencing and advanced visualization are termed Grid.
- These currently try to use technology developed for running computations, and the results are not always usable.
- This is just one indication that we must conceptualise which abstractions we need to capture in Grid software.
- We also need to develop abstractions for both high- and low-level protocols, for security models and for user access policies.
- The Unicore system we present has captured the key semantics and abstractions of a Computational Grid.

Access Grid
- Manchester: official UK Constellation site
[Figure captions: Solar Terrestrial Physics Workshop; Teleradiology, Denver]

Lecture 2: Computational Resource
If the Grid concept is to move from a vague analogy to a workable scientific concept, its terms need to be defined more carefully. Here we describe one approach to defining one key abstraction, namely computational resource.

Terminology
- We identify a problem: terms in distributed computing are used loosely and are thus not amenable to analysis.
- We identify a possible programme: to seek invariants which are conserved or are subject to identifiable constraints.
- We now try to trace an analysis of the concept of “Computational Resource”, since distributed computing networks are increasingly referred to as Grids.
- An electricity grid distributes electrical power, a water grid distributes water, and an information grid distributes information.
- What does a computational grid distribute?

The Analogy with a Power Grid
- The power grid delivers electrical power in the form of a wave (an AC wave).
- The form of the wave can change over the Grid, but there is a universal (scalar) measure of power: power = voltage x current.
- This universal measure facilitates the underlying economy of the power grid. Since it is indifferent to the way the power is produced (gas, coal, hydro, etc.), different production centres can all switch into the same Grid.
- To define the abstractions necessary for a Computational Grid, we must understand what we mean by computational resource.

Information Grids
- Information can be quantified as bits, with sending and receiving protocols.
- Bandwidth x time gives a measure of information flow. This allows telcos to charge.
- Internet protocols allow discovery of static resources (e.g. WWW pages).
- Information “providers” do not derive income directly according to the volume of information supplied. They use other means (e.g. advertising, grants) to sustain the resources needed.
- The current Web is static and does not need to consider dynamic state; hence extensions are needed for the Open Grid Services Architecture.

What is Computational Power?
- Is there an equivalent of voltage x current? Megaflops?
- Power is a rate of delivery of energy, so should we take Mflops/second?
- However, this is application dependent. Consider two different computations:
  1. … Time factors not important.
  2. Distributed collaborative working on a CFD problem, with computation and visualization of results in multiple locations. Time and synchronicity are important!
- But both may use exactly the same number of Mflops.

Invariants in Distributed Computation
- To draw an analogy with the current situation, we refer to the status of physics in the 17th and 18th centuries.
- It was not clear which quantities were invariant and persisted through changes in physical phenomena.
- Gradually quantities such as momentum, energy and electric charge were isolated, and their invariance was expressed in the form of Conservation Laws.
- Without Conservation Laws, a precise science of physics is inconceivable.
- We have extended our scope to important inequalities, e.g. the Second Law of Thermodynamics and Bell’s inequality.
- We must have constraints and invariants, or analysis and modelling are impossible.

An Abstract Space for Job-Costing
- Define a job as a vector of computational resources: (r1, r2, …, rn)
- A Grid resource advertises a cost function for each resource: (c1, c2, …, cn)
- The cost function takes the job vector as its argument and produces the job cost: r1*c1 + r2*c2 + … + rn*cn
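The job cost above is a plain dot product of the job vector and the cost vector. A minimal sketch in Java, assuming both vectors list the same resources in the same order; the class name, the resource ordering and the numbers are illustrative assumptions, not part of the lecture:

```java
import java.util.Arrays;

// Hypothetical illustration: class and method names are not from the lecture.
final class JobCosting {
    // Scalar job cost: r1*c1 + r2*c2 + ... + rn*cn.
    static double cost(double[] job, double[] unitCosts) {
        if (job.length != unitCosts.length) {
            throw new IllegalArgumentException("job and cost vectors must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < job.length; i++) {
            total += job[i] * unitCosts[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed resource order: processors, memory (GB), disk (GB).
        // A resource the job does not need is simply 0.
        double[] job = {16, 4, 0};
        double[] unitCosts = {2.0, 0.5, 0.1};
        System.out.println("Job " + Arrays.toString(job) + " costs "
                + cost(job, unitCosts) + " tokens");
    }
}
```

With these assumed numbers the cost is 16*2.0 + 4*0.5 + 0*0.1 = 34 tokens.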

A Dual Job-Space
- Thus we have a space of “requests”, defined as a vector space of the computational needs of users over a Grid. For many jobs most of the entries in the vector will be null.
- We have another space of “services”, which can produce “cost vectors” for costing the user jobs (provided they can accommodate them).
- This is an example of a dual vector space.
- A strictly defined dual space is probably too rigid, but it can provide a basis for simulations.
- The abstract job requirements will need to be agreed. It may be a task for a broker to translate a job specification into a “user job” for a given Grid node.
- A Mini-Grid can help to investigate a given Dual Job-Space with vectors of known length.
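One way to picture the pairing of the two spaces is a broker that prices the same job vector against the cost vector of each service and picks the cheapest. The sketch below is only an illustration under stated assumptions: the site names and numbers are invented, and the check that a service can actually accommodate the job is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the request/service pairing; not a real broker.
final class DualSpaceBroker {
    // Pairing of a job vector with a cost vector gives a scalar cost in tokens.
    static double price(double[] job, double[] costVector) {
        double total = 0.0;
        for (int i = 0; i < job.length; i++) {
            total += job[i] * costVector[i];
        }
        return total;
    }

    // Returns the name of the cheapest service, or null if no service is offered.
    static String cheapest(double[] job, Map<String, double[]> services) {
        String best = null;
        double bestPrice = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : services.entrySet()) {
            double p = price(job, e.getValue());
            if (p < bestPrice) {
                bestPrice = p;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] job = {16, 4, 0};                        // assumed: processors, memory, disk
        Map<String, double[]> services = new LinkedHashMap<>();
        services.put("siteA", new double[]{2.0, 0.5, 0.1});   // invented cost vectors
        services.put("siteB", new double[]{1.0, 1.0, 0.2});
        System.out.println("Cheapest service: " + cheapest(job, services));
    }
}
```

Because each service only publishes a cost vector, the broker never needs to know how the service is implemented, which is the point of the dual-space view.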

Dual Space
[Figure labels: Job vector (User Job, 1); Cost vector (2); Cost; Scalar cost in tokens]

Computational Resource
- Computational jobs ask questions about the internal structure of the provider of computational power in a manner that an electrically powered device does not.
- For example, do we require specific compilers, libraries, disk resource, visualization servers?
- If something goes wrong, do we get support?
- If we transfer data and methods of analysis over the Internet, is it secure?
- A resource broker for high performance computation is of a different order of complexity from a broker for an electricity supplier.

Emergent Behaviour
- Given this complexity, self-sustaining global Grids are likely to emerge rather than be planned.
- Planned Grids can be important for specific tasks; the EU DataGrid project is an example. They are not required to be self-sustaining, and questions of accounting and resource transfer are not of central interest.
- We consider the EUROGRID multi-level structure as an emergent phenomenon that could give some pointers to the development of large-scale, complex, self-sustaining computational Grids.
- The Unicore Usite and Vsite structure is an elegant means of encapsulating such structure.

Fractal Structure and Complexity
- Grids are envisaged as having internal structure and also external links.
- Via the external links (WANs, intercontinental networks) Grids can be federated.
- The action of joining Grids raises interesting research questions:
  1. How do we conceptualise the joining of two Grids?
  2. Is there a minimum set of services that defines a Grid?
  3. Are there environments for distributed services and computing that are not Grids (e.g. a cluster)?
- We focus on the emergent properties of virtual organisations in considering whether they are Virtual Organizations.

Resource Requestor and Provider Spaces
- Resource Requestor space (RR), in terms of what the user wants: e.g. Relocatable Weather Model, 10^6 points, 24 hours, full topography.
- Resource Provider space (RP), in terms of what the provider offers: e.g. 128 processors, Origin 3000 architecture, 40 Gigabytes memory, 1000 Gigabytes disk space, 100 Mb/s connection.
- We may even forward requests from one resource provider to another: recasting an O3000 job in terms of an IA64 cluster gives a different resource set.
- Linkage and staging of the different stages of a workflow require environmental support, i.e. a hosting environment.

RR and RP Spaces
[Figure labels: A, B, C, D; RR space; RP space; request; request referral; sync]
Figure 1: Request from RR space at A mapped into resource providers at B and C, with C forwarding a request formulated in RR space to RP space at D. B and C synchronize at end of workflow before results returned to the initiator A.

Resume
- We have shown how some concepts from abstract vector spaces may be able to provide a definition of Computational Resource.
- We do not yet know what conservation laws or constraints could apply to such an abstraction, or whether these would be useful in analysing distributed computing.
- We believe that we can show convincingly that simple scalar measures such as Megaflops are inadequate to the task.
- This invalidates “league table” concepts such as the Top 500 computers. Computational resource will increasingly be judged by its utility within a given infrastructure.

The Resource Universe
- What is the “Universe” of resources for which we should broker?
- One might use a search engine, but there is no agreed resource description language, nor would users be able to run on most of the resources selected.
- Globus uses a hierarchical directory structure, MDS, based on LDAP. Essentially this is a “join the Grid” model, based on the VO concept.
- By making Vsites capable of brokering we can potentially access the whole universe of Vsites.
- The concept of a Shadow Resource DAG makes the resource search structurally similar to its implementation and maintains the AJO abstraction.

Towards a Global Grid Economy?
- Much access to HPC resources is via national grants, or the resources are private (governmental, commercial).
- There are many problems with sharing resources: what are the incentives?
- Grid resources can be owned by international projects, but resources are allocated by national bodies. This is like collaboration in large-scale facilities, e.g. CERN.
- Europe has to go down the shared resource route; the US doesn’t. Will this produce separate types of Grid economy?
- The problems of accounting and resource trading are rarely touched on.
- Mini-Grids can help explore technical issues separately from political ones.

Summary
- The three different views of a distributed infrastructure relate to the way it is used.
- We need to abstract usage patterns and see if we can link them to invariants that can be quantified.
- We have investigated in depth the concept of “Computational Resource”.
- This ties into all three definitions:
  1. eScience collaborations use resources
  2. Cyber-infrastructures connect resources
  3. Grids distribute resources

Human Factors
- A prediction arises from this: that the abstracted idea of human collaboration will be essential to success in this field.
- In an electricity Grid the human participants are completely anonymised and only have influence via mass action, e.g. a power surge.
- Patterns of usage in eScience will be much more complex and dynamic.
- It will belong to the post-Ford model of industrial production; this time the product will be knowledge.
- Our search for abstractions to encapsulate this will be far more challenging and exciting.

Lecture 3: Introduction to Unicore
Unicore is the Grid middleware system you will study in depth. It is a complete system based on a three-tier architecture. We have chosen it as an illustration because of its compact and complete nature and because it is very well engineered for a Computational Grid.
Thanks to Michael Parkin, who created the slides in this lecture.

UNICORE Grid
- UNiform Interface to COmputing REsources
- European Grid infrastructure to give secure and seamless access to High Performance Computing (HPC) resources
  - Secure: Strong authentication of users based on X.509 certificates. Communication uses SSL connections over a TCP/IP Internet connection, as defined in the UNICORE Protocol Layer (UPL) specification.
  - Seamless: Uniform interface and consistent access to computing resources regardless of the underlying hardware, systems software, etc. This is achieved using Abstract Job Objects (AJO).
- HPC resources based in centres in Switzerland, Germany, Poland, France and the United Kingdom are integrated into a single grid

UNICORE Grid Architecture
The UNICORE architecture is based on three layers:
- Client: Interface to the user. Prepares and submits the job over the unsecured network to…
- Gateway: The entry point to the computing centre and secured network. Authenticates the user and passes the job to…
- Server: Schedules the job for execution and translates the job to commands appropriate for the target system.

UNICORE Terminology
- USite: A site providing UNICORE services (e.g. CSAR).
- VSite: A computing resource within the USite.
- USpace: Dedicated file space on the VSite. May only exist during the execution of a job.
- XSpace: Permanent storage on the VSite (e.g. the user's home directory).

UNICORE Security
- Between the user and the computing centre, communications run over SSL
  - The user's X.509 certificate is stored in the client.
  - The certificate encrypts data using Secure Sockets Layer (SSL) technology, the industry-standard method for protecting web communications.
  - bit encryption strength.
  - Defined in the UNICORE Protocol Layer (UPL) standard.
    - Prevents eavesdropping on and tampering with communications and data.
    - Provides instant authentication of the visitor's identity instead of requiring individual usernames and passwords.
- Within the computing centre, communications are within a secure network
  - Local site policy can specify encrypted communication if necessary.

UNICORE Protocol Layer (UPL)
- A set of rules by which data is exchanged between computers.
- Request/reply structure.

The Abstract Job Object (AJO)
- Collection of approximately 250 Java classes representing actions, tasks, dependencies and resources
  - v4.0 can be downloaded from
- Specifies work to be done at a remote site seamlessly
  - No knowledge of the underlying execution mechanism is required.
  - Example classes: ExecuteScriptTask, ListDirectory, CompileTask, Dependency, Processor, Storage
- Signed, serialised Java object transmitted from the Client to the Gateway using the UPL

Simplified AJO Class Diagram (1)
[Class diagram. Classes: AbstractJob, AbstractAction, Dependency, ActionGroup, AbstractTask, FileTask, FileTransfer, ExecuteTask, FileAction, UserTask, Resource. Annotation: {ordered}]
Subclass groups shown in the diagram:
- ExecuteScriptTask
- ChangePermissions, CopyFile, CreateDirectory, DeleteFile, FileCheck, ListDirectory, RenameFile, SymbolicLink
- CopySpooled, DeclarePortfolio, DeleteSpooled, IncarnateFiles, MakePortfolio, Spool, UnSpool
- CopyPortfolioTask, ExportTask, GetPortfolio, ImportTask, PutPortfolio
- Memory, Node, PerformanceResource, Processor, RunTime, Storage, CapacityResource
The diagram shows how an AbstractJob object can be constructed from tasks and groups of tasks. Resources can be allocated to each task.

Simplified AJO Class Diagram (2)
[Class diagram. Classes: Outcome, AbstractTask_Outcome, ActionGroup_Outcome, AbstractJob_Outcome]
Outcome classes shown in the diagram:
- ChangePermissions_Outcome, CopyFile_Outcome, CopyPortfolio_Outcome, CopyPortfolioToOutcome_Outcome, CopySpooled_Outcome, CreateDirectory_Outcome
- DeclarePortfolio_Outcome, DeleteFile_Outcome, DeletePortfolio_Outcome, DeleteSpooled_Outcome
- ExecuteTask_Outcome, ExportTask_Outcome, FileCheck_Outcome, GetPortfolio_Outcome, ImportTask_Outcome, IncarnateFiles_Outcome, ListDirectory_Outcome
- MakeFifo_Outcome, MakePortfolio_Outcome, MoveFifoToOutcome_Outcome, PutPortfolio_Outcome, RenameFile_Outcome
- Spool_Outcome, SymbolicLink_Outcome, UnSpool_Outcome

AJO Example 1: ListDirectory
[Sequence diagram. Objects: :storage, :listDirectory, :abstractJob. Calls: addResource(), add()]
- The directory is set using the setTarget(String target) method.
- The AbstractJob is consigned to the gateway.
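The construction in this example can be sketched with small stand-in classes. These are not the real UNICORE AJO classes: only the method names setTarget(), addResource() and add() come from the slide, while the class names, constructors, the call order and the describe() helper are assumptions made so that the sketch is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in classes for illustration only; NOT the real UNICORE AJO classes.
final class ListDirectorySketch {
    static SketchAbstractJob build(String directory) {
        SketchStorage storage = new SketchStorage("home");   // hypothetical storage name
        SketchListDirectory listDirectory = new SketchListDirectory();
        listDirectory.setTarget(directory);    // directory to list
        listDirectory.addResource(storage);    // storage the task acts on
        SketchAbstractJob job = new SketchAbstractJob();
        job.add(listDirectory);                // the task becomes part of the job
        return job;                            // the real job would now be consigned to the gateway
    }

    public static void main(String[] args) {
        System.out.println(build("/tmp").describe());
    }
}

class SketchStorage {
    final String name;
    SketchStorage(String name) { this.name = name; }
}

class SketchListDirectory {
    private String target;
    private SketchStorage storage;
    void setTarget(String target) { this.target = target; }
    void addResource(SketchStorage storage) { this.storage = storage; }
    String describe() { return "ListDirectory(" + target + ") on storage " + storage.name; }
}

class SketchAbstractJob {
    private final List<SketchListDirectory> actions = new ArrayList<>();
    void add(SketchListDirectory action) { actions.add(action); }
    String describe() {
        List<String> parts = new ArrayList<>();
        for (SketchListDirectory a : actions) {
            parts.add(a.describe());
        }
        return "AbstractJob" + parts;
    }
}
```

The point of the sketch is the shape of the object graph: a resource is attached to a task, and the task is added to the job before the job leaves the client.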

AJO Example 2: ImportTask
- Used to download files on a specified VSite to the Client.
- The import task imports a file from the Storage area to the job's USpace. (A Portfolio represents a collection of files in the USpace.)
[Sequence diagram. Objects: :importTask, :storage, :copyPortfolioToOutcome, :dependency, :abstractJob. Calls: 1. add(), 2. add(), 3. add(), 4. add(), 5. add(), addResource()]
- The file name is set using the addFile(String target) method.
- The dependency ensures that the file(s) are in the USpace before they are copied to the outcome.
- The AbstractJob is consigned to the gateway.
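The role of the dependency can be illustrated with a toy job that orders its actions so that every predecessor runs first. This is a sketch under stated assumptions: actions are represented by plain strings, and the class, its methods and the ordering algorithm are invented for illustration; the real AJO classes and the server's scheduling are not shown in the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a job with dependencies; not the UNICORE implementation.
final class DependencySketch {
    private final List<String> actions = new ArrayList<>();
    // Maps each action to the actions that must finish before it starts.
    private final Map<String, List<String>> predecessors = new HashMap<>();

    void add(String action) {
        actions.add(action);
        predecessors.put(action, new ArrayList<>());
    }

    // Both actions must already have been added to the job.
    void addDependency(String before, String after) {
        predecessors.get(after).add(before);
    }

    // Repeatedly picks actions whose predecessors have all been scheduled.
    List<String> executionOrder() {
        List<String> order = new ArrayList<>();
        while (order.size() < actions.size()) {
            boolean progressed = false;
            for (String a : actions) {
                if (!order.contains(a) && order.containsAll(predecessors.get(a))) {
                    order.add(a);
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic dependency");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        DependencySketch job = new DependencySketch();
        job.add("copyPortfolioToOutcome");   // added first on purpose
        job.add("importTask");
        job.addDependency("importTask", "copyPortfolioToOutcome");
        System.out.println(job.executionOrder());
    }
}
```

Even though copyPortfolioToOutcome is added first, the dependency forces importTask to run before it, which is the guarantee the slide describes.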

AJO Example 3: ExecuteScriptTask
[Sequence diagram, marked “this diagram to be completed…”. Objects: :executeScriptTask, :abstractJob, :makePortfolio, :incarnateFiles, :actionGroup, :scriptType, d1 :dependency, d2 :dependency, :resourceSet. Inputs: Name (String), Script (byte[ ][ ]), Files (String[ ]). Calls: setScriptType(), setResource(), add()]
- Script arguments are set using the setCommandLine(String args) method.
- Dependencies ensure that files arrive before the task is executed.
- The AbstractJob is consigned to the gateway.

Lectures 4-5: Unicore Client
We now present a client-side view of the Computational Grid. This will allow you to begin the practical exercises before engaging with the full complexity of the server-side components and the complete Grid architecture of Unicore.
We thank Ralf Ratering of Intel for permission to use this material.

UNICORE
- A production-ready Grid system that connects supercomputers and clusters to a Computing Grid.
- Originally developed in the German research projects UNICORE ( ) and UNICORE Plus ( )
  - Client implemented by Pallas (now Intel PDSD)
  - Server implemented by Fujitsu as sub-contractor of Pallas
- Further enhanced in European research projects
  - Eurogrid ( ), Grip ( ), OpenMolGrid ( ), NextGrid ( ), SimDat ( ), others
- Used as middleware for NaReGI

The UNICORE Client
- Graphical interface to UNICORE Grids
- Platform-independent Java application
- Open Source, available from the UNICORE Forum
- Functionality:
  - Job preparation, monitoring and control
  - Complex workflows
  - File management
  - Certificate handling
  - Integrated application support

UNICORE Server Components
[Figure labels: Client, Gateway, NJS, TSI, UUDB, IDB, AJO]
- Gateway: Performs authentication. Runs at the DMZ.
- Network Job Supervisor (NJS): Main server component. Manages jobs. Performs authorization.
- UNICORE User Database (UUDB): Maps certificates onto logins.
- Incarnation Database (IDB): Translates the AJO to a platform-specific incarnation. Contains resource descriptions.
- Target System Interface (TSI): Only component that must live on the target system. Perl or Java implementations. Executes jobs or submits jobs to a batch subsystem.
- Abstract Job Object (AJO): Platform-independent description of tasks, dependencies and resources.
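The idea of incarnation, turning an abstract action into a command for one particular target system, can be sketched as a lookup table of command templates. This is a toy model only: the class, its methods and the two templates are assumptions chosen for illustration and do not reflect the actual IDB format.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of incarnation; the real IDB format and API are not shown in the slides.
final class IncarnationSketch {
    // Abstract action name -> command template for this (assumed Unix-like) target system.
    private final Map<String, String> templates = new HashMap<>();

    void register(String abstractAction, String template) {
        templates.put(abstractAction, template);
    }

    // Produces the platform-specific command for an abstract action and its target.
    String incarnate(String abstractAction, String target) {
        String template = templates.get(abstractAction);
        if (template == null) {
            throw new IllegalArgumentException("no incarnation for " + abstractAction);
        }
        return String.format(template, target);
    }

    public static void main(String[] args) {
        IncarnationSketch idb = new IncarnationSketch();
        idb.register("ListDirectory", "ls -l %s");        // assumed template
        idb.register("CreateDirectory", "mkdir -p %s");   // assumed template
        System.out.println(idb.incarnate("ListDirectory", "/tmp"));
    }
}
```

A different target system would register different templates for the same abstract action names, so the client-side job description would not need to change.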

History of UNICORE Client Versions
[Timeline ending at “Today”]
- Early prototypes developed in the UNICORE project
- First stable version 3.0
- Final version in UNICORE Plus: 4.1 Build 5
- UNICORE 5 Open Source available at
- Pallas UNICOREpro version 1

Starting the Client
- Prerequisites: Java ≥
  - If not available, choose the bundled download package
- UNICORE configuration directory in your HOME directory
- Get test certificates from the Test Grid CA service

Ready to go? “Hello Grid World!”
- UNICORE Site == Gateway. Typically represents a computing center.
- Virtual Site == Network Job Supervisor. Typically represents a target system.
DEMO
1. Execute a simple script on the Test Grid
2. Get back standard output and standard error

CS Gateway Behind the Scenes: Authentication Establish SSL Connection Send User Certificate Send Gateway Certificate Trust User Certificate Issuer? Trust Gateway Certificate Issuer? Gateway Certificate Client User Certificate

CS Behind the Scenes: Authorization IDB TSI UUDB Certificate 2 Certificate 3 Certificate 4 Certificate 5 Certificate 1 Login B Login C Login D Login E Login A Typical UNICORE User Test Grid User User Certificate User Login AJO Certificate== SSL Certificate? Client NJS Gateway User Certificate AJO
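The authorization step on the slide above can be sketched in a few lines of Java. This is only an illustration, not UNICORE code: the class `UudbSketch`, its method names and the use of plain strings as stand-ins for certificates are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the UNICORE User Database (UUDB):
// it maps a user certificate (here just a string) onto a local login.
class UudbSketch {
    private final Map<String, String> loginsByCertificate = new HashMap<>();

    void addUser(String certificate, String login) {
        loginsByCertificate.put(certificate, login);
    }

    // Returns the login under which the job may run, or null if rejected.
    // The certificate attached to the AJO must equal the certificate used
    // for the SSL connection, and it must be known to the UUDB.
    String authorize(String sslCertificate, String ajoCertificate) {
        if (sslCertificate == null || !sslCertificate.equals(ajoCertificate)) {
            return null;
        }
        return loginsByCertificate.get(sslCertificate);
    }
}
```

A request whose two certificates differ, or whose certificate has no UUDB entry, yields `null` in this sketch; a real NJS would report an authorization failure instead.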

CS Behind the Scenes: Creation & Submission Script Container Abstract Job Object ExecuteScriptTask IncarnateFiles CLIENT SERVER Script_HelloWorld Create file with script contents 2.Execute as script Job Directory (USpace) A temporary directory at the target system where the job will be executed

CS Monitoring the Job Status Successful: job has finished successfully Not successful: job has finished, but a task failed Executing: Parts of a job are running or queued Running: Task is running Queued: Task is queued at a batch sub system Pending: Task is waiting for a predecessor to finish Killed: Task has been killed manually Held: Task has been held manually Ready: Task is ready to be processed by NJS Never run: Task was never executed

CS The Primes Example public void breakKey() { try { BufferedReader br = new BufferedReader(new FileReader("primes.txt")); while (true) { inputLine = br.readLine(); st = new StringTokenizer(inputLine," "); val = new BigInteger(st.nextToken()); if ( (N.mod(val).compareTo(BigInteger.ZERO)) == 0) { p = val; q = N.divide(val); return; } } } catch (NullPointerException e) { System.out.println("Done!"); } catch (IOException e) { System.err.println("IO Error:" + e); } p = BigInteger.ZERO; q = BigInteger.ZERO; } ArrBreakKey.java Primes.txt
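The slide's method detects the end of primes.txt through a NullPointerException. As a hedged alternative for experimenting outside the Grid, the same trial division can be written as a self-contained method that takes the candidate primes from a list. The class name `PrimesSketch` and the array return value are assumptions of this sketch, not part of ArrBreakKey.java.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

class PrimesSketch {
    // Tries each candidate prime in turn. Returns {p, q} with p * q == n,
    // or {0, 0} if no candidate divides n (the same fallback as the slide).
    static BigInteger[] breakKey(BigInteger n, List<BigInteger> primes) {
        for (BigInteger val : primes) {
            if (n.mod(val).signum() == 0) {
                return new BigInteger[] { val, n.divide(val) };
            }
        }
        return new BigInteger[] { BigInteger.ZERO, BigInteger.ZERO };
    }

    public static void main(String[] args) {
        List<BigInteger> primes = Arrays.asList(
                BigInteger.valueOf(2), BigInteger.valueOf(3), BigInteger.valueOf(5));
        BigInteger[] pq = breakKey(BigInteger.valueOf(35), primes);
        System.out.println(pq[0] + " * " + pq[1]); // prints "5 * 7"
    }
}
```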

CS CLIENT SERVER „Gridify“ the Primes Example ArrBreakKey.java Job Directory (USpace) ArrBreakKey.java 1. Import java file ArrBreakKey.class 2. Compile java file 3. Execute class file 4. Get result in stdout/stderr DEMO

CS CLIENT SERVER Behind the Scenes: Software Resources Command Task Executes a Software Resource, or Command (a binary that will be imported into the Job Directory) APPLICATION javac 1.4 Description „Java Compiler“ INVOCATION [ /usr/local/java/bin/javac ] END Incarnation Database (IDB) Application Resources contain system specific information, absolute paths, libraries, environment variables, etc.
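To make the idea of incarnation concrete, the following sketch maps an abstract software resource (name and version) onto the site-specific invocation stored in the IDB. It is a simplified model under stated assumptions: `IdbSketch` and its methods are hypothetical, and a real IDB entry also carries libraries, environment variables and other details that are omitted here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, much reduced model of an Incarnation Database.
class IdbSketch {
    private final Map<String, String> invocations = new HashMap<>();

    // Corresponds to one APPLICATION ... INVOCATION [ ... ] END entry.
    void addApplication(String name, String version, String invocation) {
        invocations.put(name + " " + version, invocation);
    }

    // Turns an abstract request ("run javac 1.4 with these arguments") into
    // a site-specific command line, or null if the software is not installed.
    String incarnate(String name, String version, String arguments) {
        String invocation = invocations.get(name + " " + version);
        return invocation == null ? null : invocation + " " + arguments;
    }
}
```

With the entry from the slide, `incarnate("javac", "1.4", "ArrBreakKey.java")` would yield `/usr/local/java/bin/javac ArrBreakKey.java`, so the absolute path never has to appear in the job itself.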

CS CLIENT SERVER Behind the Scenes: Fetching Outcome Job Directory (USpace) ArrBreakKey.java Files Directory ArrBreakKey.class 2. Compile java file stdout, stderr 3. Execute class file stdout, stderr Fetch Outcome Session Directory Configurable in User Defaults: Paths->Scratch Directory stdout, stderr

CS Integrated Application Example: POV-Ray Scene Description #include "colors.inc" #include "shapes.inc" camera { location direction z } plane {y, 0.0 texture {pigment {RichBlue }}} object { WineGlass translate -x*12.15} light_source { colour White }... POV-Ray Application CLIENT SERVER Command Line Parameters Display Demo Image from Pov-Ray Distribution Job Directory (USpace) Include Files Libraries Remote File System (XSpace) Input Files Output Image

CS Behind the Scenes: Plug-In Concept  Add your own functionality to the Client! –Heavily used in research projects all over the world –More than 20 plug-ins already exist  No changes to basic Client Software needed  Plug-Ins are written in Java  Distribution as signed Jar Archives

CS Existing Plug-Ins (incomplete)  CPMD, Car-Parrinello Molecular Dynamics (FZ Jülich)  Gaussian (ICM Warsaw)  Amber (ICM Warsaw)  Visualizer (ICM Warsaw)  SQL Database Access (ICM Warsaw)  PDB Search (ICM Warsaw)  Nastran (University of Karlsruhe)  Fluent (University of Karlsruhe)  Star-CD (University of Karlsruhe)  Dyna 3D (T-Systems Germany)  Local Weather Model (DWD)  POV-Ray (Pallas GmbH) ...  Resource Broker (University of Manchester)  Interactive Access (Parallab Norway)  Billing (T-Systems Germany)  Application Coupling (IDRIS France)  Plugin Installer (ICM Warsaw)  Auto Update (Pallas GmbH) ...

CS Using 3rd Party Plug-Ins  Get Plug-in Jar archive from Web-Site, , CD-ROM, etc.  Store it in Client‘s Plug-In directory  Client will check Plug-In Signature Import Plug-In certificates from the Actions menu in the Keystore Editor Is one certificate in the chain a trusted entry in the keystore? Is the signing certificate a trusted entry in the keystore? REJECT yesno Add signing certificate to keystore? LOAD noyes REJECTLOAD yesno
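The signature check in the flowchart above can be read as a small decision function. The sketch below is one plausible reading of the flattened diagram, not the Client's actual implementation: the class `PluginTrustSketch`, the `Decision` values and the use of strings for certificates are assumptions, and the order of the two questions is inferred from the slide text.

```java
import java.util.List;
import java.util.Set;

class PluginTrustSketch {
    enum Decision { LOAD, ASK_USER, REJECT }

    // chain.get(0) is the signing certificate, followed by its issuers.
    // ASK_USER stands for the question "Add signing certificate to keystore?":
    // answering yes loads the plug-in, answering no rejects it.
    static Decision check(List<String> chain, Set<String> keystore) {
        if (chain.isEmpty()) {
            return Decision.REJECT;
        }
        if (keystore.contains(chain.get(0))) {
            return Decision.LOAD; // signing certificate itself is trusted
        }
        for (String certificate : chain) {
            if (keystore.contains(certificate)) {
                return Decision.ASK_USER; // an issuer is trusted, signer is new
            }
        }
        return Decision.REJECT; // nothing in the chain is trusted
    }
}
```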

CS Task Plugins  Add a new type of task to the Client GUI  New task can be integrated into complex jobs  Application support: CPMD, Fluent, Gaussian, etc. Add task item Settings item Icon Plugin info

CS Supporting an Application at a Site  Install the application itself  Add entry to the Incarnation Database (IDB) APPLICATION Boltzmann 1.0 Description „Boltzmann Simulation“ INVOCATION [ /usr/local/boltzmann/bin/linuxExec.bin ] END

CS Plug-In Example: CPMD  Workflow for Car–Parrinello molecular dynamics code Input: conf_file2 RESTART Input: conf_file1 re-iterate Wavefunction Optimization Geometry Optimization further optimization ? MD Run Output: stdout stderr RESTART.1, LATEST,... Other... Visualization ? further evaluation

CS Plug-In Example: CPMD  CPMD plugin constructs UNICORE workflow

CS Plug-In Example: CPMD  CPMD wizard assists in setting up the input parameters

CS Plug-In Example: CPMD  Visualize results

CS Extension Plugins  Add any other functionality  Resource Broker, Interactive Access, etc. JPA toolbar Settings item Extensions menu Virtual site toolbar Plugin info

CS Plug-In Example: Resource Broker  Specify resource requests in your job  Submit it to a broker site  Get back offers from broker

CS CLIENT Example: Steering a Simulation SERVER Lattice-Boltzmann Simulation Code input file reads Editor DEMO output.gif writes Export Panel sample.gif writes Sample Panel control file reads Control Panel Job Directory Plugin Task

CS Specifying Resource Requests  Tasks can have resource sets containing requests  If no resource set is attached, default resources are used  Resource sets can be edited, loaded and saved  If a resource request does not match resources available at a site, the Client displays an error Resource Set 1 Resource Set 2
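The last bullet describes a check of requested resources against what a site offers. A minimal sketch of such a check follows; the class `ResourceCheckSketch`, the map-based representation and the resource names are assumptions for illustration, and the real Client works on its own resource classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ResourceCheckSketch {
    // Compares each requested amount with the maximum the site offers and
    // returns one message per mismatch; an empty list means the request fits.
    static List<String> check(Map<String, Integer> request,
                              Map<String, Integer> siteMaximum) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Integer> e : request.entrySet()) {
            Integer max = siteMaximum.get(e.getKey());
            if (max == null) {
                errors.add(e.getKey() + ": not offered by this site");
            } else if (e.getValue() > max) {
                errors.add(e.getKey() + ": requested " + e.getValue()
                        + ", maximum is " + max);
            }
        }
        return errors;
    }
}
```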

CS Behind the Scenes: Authorization UUDB Client NJS Gateway User Certificate User Login User Certificate AJO User Certificate Sub- AJO Site A UUDB NJS Gateway User Certificate Sub- AJO Site B User Certificate Sub- AJO SSL Certificate == Trusted NJS?

CS Using File Tasks CLIENT SERVER 1 SERVER 2 Home Temp Spool Root Local USpace Home Temp Root USpace Storage Server

CS Complex Workflow: Control Tasks Do N Loop, Do Repeat Loop, Hold Task, If Then Else

CS  UNICORE jobs stop execution when a task fails  Sometimes Task failure is acceptable –If and DoRepeat conditions –Tasks that try to use restart files –Whenever you do not care about task success  Set „Ignore Failure“ flag on Task Behind the Scenes: Ignore Failure Right Mouse Click in Dependency Editor

CS Loops: Accessing the Iteration Counter  Iteration variable: $UC_ITERATION_COUNTS  Lives on server side  Supported in –Script Tasks –File Tasks –Re-direction of stdout/stderr  Nested loops: iteration numbers are separated by „ _ “, e.g. „ 2_3 “  Caution: counter will not be propagated to sub jobs
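The format of the nested counter can be illustrated with a short sketch. The substitution itself happens on the server side, so the code below only mimics the resulting strings; the class `IterationCounterSketch` is hypothetical, and the assumption that the outermost loop comes first is not stated on the slide.

```java
class IterationCounterSketch {
    // Builds the counter string for nested loops, e.g. (2, 3) -> "2_3".
    // Assumption: the outermost loop's iteration number comes first.
    static String counter(int... iterations) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < iterations.length; i++) {
            if (i > 0) {
                sb.append('_');
            }
            sb.append(iterations[i]);
        }
        return sb.toString();
    }

    // Replaces the variable in a file name or script line (literal match).
    static String expand(String text, int... iterations) {
        return text.replace("$UC_ITERATION_COUNTS", counter(iterations));
    }
}
```

For example, `expand("out_$UC_ITERATION_COUNTS.txt", 2, 3)` gives `out_2_3.txt`, which is the kind of name a redirected stdout file would receive in the third pass of an inner loop.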

CS Job Monitor Actions Get new status for a site, job or task Get stdout, stderr and exported files of a job Remove job from server. Deletes local and remote temporary directories Kill job Hold job execution Resume a job that was held by a „Hold Job“ action or a Hold task Copy a job from the job monitor. The job can be pasted into the job preparation tree and re-run e.g. with different parameters Show dependencies of job Show resources for task

CS Caching Resource Information  Client works on cached resource information –UNICORE Sites, Virtual Sites, available resources  Resource Cache will be updated on... –... startup –... refresh on „Job Monitoring“ tree node  Client uses cached information in Offline mode
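The caching behaviour can be pictured as follows. This is a sketch under assumptions, not the Client's code: `ResourceCacheSketch` is a hypothetical class, sites are reduced to strings, and the server is modelled as a `Supplier` that is only contacted on refresh.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class ResourceCacheSketch {
    private final Supplier<List<String>> server; // stand-in for the Grid query
    private List<String> cached = new ArrayList<>();
    private boolean offline;

    ResourceCacheSketch(Supplier<List<String>> server) {
        this.server = server;
    }

    void setOffline(boolean offline) {
        this.offline = offline;
    }

    // Called on startup and on an explicit refresh; does nothing when offline.
    void refresh() {
        if (!offline) {
            cached = new ArrayList<>(server.get());
        }
    }

    // All other Client code reads the cached copy only.
    List<String> getSites() {
        return cached;
    }
}
```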

CS Accessing other UNICORE Sites UNICORE Sites will be read from an XML file Can be a URL on the web Virtual Sites are configured at the UNICORE Site Job Monitor Root Performing a „Refresh“ on this node will reload UNICORE Sites

CS Configuration: Using Different Identities Using different identities Key entries: Who am I?

CS Browsing Remote File Systems  Remote File Chooser –Used in Script Task, Command Task, for File Imports, Exports, etc. Select virtual site or „Local“ Preemptive file chooser mode will enhance performance on fast file systems

CS The Client Log  „clientlog.txt“ or „clientlog.xml“  Used by developers to figure out problems User Defaults->Paths: User Defaults->Logging Settings: Enable under Windows, when no console is used Use PLAIN INFO should be fine

CS Starting the Client Revisited  client.jar in lib directory –start with.exe (Windows) or run script (Unix/Linux) –or: „ java –jar client.jar “  Command line options –Choose an alternative configuration directory: -Dcom.pallas.unicore.configpath= –Enable the security manager: -Dcom.pallas.unicore.security.manager
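A `-D` option on the Java command line sets a system property that the application can read with `System.getProperty`. The sketch below shows how the configuration path option could be evaluated. Only the property name is taken from the slide; the class `ConfigPathSketch` and the fallback directory name `.unicore` are assumptions of this example.

```java
import java.io.File;

class ConfigPathSketch {
    // Property name as given on the slide.
    static final String PROPERTY = "com.pallas.unicore.configpath";

    // Uses the directory passed with -Dcom.pallas.unicore.configpath=...,
    // otherwise falls back to a directory in the user's home directory.
    // The fallback name ".unicore" is an assumption made for this sketch.
    static File configDirectory() {
        String fallback = System.getProperty("user.home") + File.separator + ".unicore";
        return new File(System.getProperty(PROPERTY, fallback));
    }
}
```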

CS Outlook: OGSA Grid Services Client UUDBIDBNJS TSI UPL Grid Service UPL GS Factory Registry UPL GS Factory Handles Register XML File Contains Registry handles in addition to classical UNICORE Site addresses HTTPS Request AJO Passes through firewalls Grid Services invisible to user UPL GS Factory Start UPL GS Handle

CS Summary  With the UNICORE Client you can easily run and monitor complex jobs on a UNICORE Grid  Download the Client from or and have fun...

Manchester Computing Supercomputing, Visualization & e-Science Lectures 6-7: Programming Unicore Client Plug-Ins We now show how the Unicore client can be extended by programming application-specific plugins. This extends standard Java technology to a Grid context and brings flexibility and generality to the Unicore client. We thank Ralf Ratering of Intel for permission to reproduce this material.

CS Overview  Introduction –Existing Plug-Ins  AJO Plugin –An Extension Plugin submitting „raw“ Abstract Job Objects that do appear in the Job Monitor  Small Service Plugin –An Extension Plugin using containers for service jobs that do not appear in the Job Monitor  Boltzmann Plugin –A Task Plugin that integrates the Boltzmann Lattice simulation into the Client GUI

CS  Job Preparation –File, execution and control tasks –Complex workflows –Editing, copying, –saving, etc.  Resource Handling  Job Monitoring  Job Control  Remote File Browsing  Certificate Handling Functionality of the UNICOREpro Client

CS Plug-In Concept  Add your own functionality to the Client! –Heavily used in research projects all over the world –More than 20 plug-ins already exist  No changes to basic Client Software needed  Plug-Ins are written in Java  Distribution as signed Jar Archives

CS Deployment and Installation  User gets plugin jar archive from Web-Site, , CD-ROM, etc.  Store it in Client‘s plugin path 1.Lib directory 2.User Defaults Plugin directory  Client checks plugin jar signature Is one certificate in the chain a trusted entry in the keystore? Is the signing certificate a trusted entry in the keystore? REJECT yesno Add signing certificate to keystore? LOAD noyes REJECTLOAD yesno

CS Task Plugins  Add a new type of task to the Client GUI  New task can be integrated into complex jobs  Application support: CPMD, Fluent, Gaussian, etc. Add task item Settings item Icon Plugin info

CS Extension Plugins  Add any other functionality  Resource Broker, Interactive Access, etc. JPA toolbar Settings item Extensions menu Virtual site toolbar Plugin info

CS Supporting an Application at a Site  Install the application itself  Add entry to the IDB APPLICATION Boltzmann 1.0 Description „Boltzmann Simulation“ INVOCATION [ /usr/local/boltzmann/bin/linuxExec.bin ] END

CS Example Use: CPMD  Workflow for Car–Parrinello molecular dynamics code Input: conf_file2 RESTART Input: conf_file1 re-iterate Wavefunction Optimization Geometry Optimization further optimization ? MD Run Output: stdout stderr RESTART.1, LATEST,... Other... Visualization ? further evaluation

CS Example Use: CPMD  CPMD plugin constructs UNICORE workflow

CS Example Use: CPMD  CPMD wizard assists in setting up the input parameters

CS Example Use: CPMD  Visualize results

CS Example Use: On Demand Weather Prediction  On demand mesoscale weather prediction system  Based on relocatable version of DWD’s prediction model  Works from regular prediction data, topography and soil database

CS Example Use: On Demand Weather Prediction User Workstation Topography & soil data Regular prediction data GME2LM interpolation to LM grid LM calculation of mesoscale prediction 1–5 MByte 50–100 MByte LM-forecast data visualisation ~50 MByte input datasets for LM (1–20 GByte)

CS Example Use: Coupled CAE Applications  Run coupled aerospace simulations (electromagnetism)  Use CORBA as coupling substrate  Provide internal portal for Airbus engineers

CS Example Use: Resource Broker  Specify resource requests in your job  Submit it to a broker site  Get back offers from broker

CS Existing Application Plug-Ins  FZ Jülich –CPMD, OpenMolGrid  ICM Warsaw –Gaussian, Amber, SQL Database Access  University of Karlsruhe –Nastran, Fluent, Star-CD  T-Systems –Dyna 3D  DWD –Local Weather Model  Pallas GmbH –POV-Ray, Script, Command, Compile, Globus Proxy Certificate

CS Existing Extension Plug-Ins  University of Manchester –Resource Broker  Parallab Norway –Interactive Access  T-Systems Germany –Billing  IDRIS France –Application Coupling  ICM Warsaw –Plugin Installer  Pallas GmbH –Auto Update, AJO Submitter, Small Service Plugin

CS AJO Plugin  Idea: Easy way to develop your own AJOs  Use Client infrastructure –Certificates –Usites, Vsites and Resources –User interface  Use JMC to control AJO –Watch status –Fetch and display Outcome –Send Control Actions

CS Example: Execute an Application Resource  Select an Application Resource and execute it at virtual site  Submit AJO containing UserTask  Use Job Monitor to get back output  Implement 2 classes –Main Plugin Class –AJO Request Class  Build a Jar Archive named „*Plugin.jar“  Sign the Jar with your Certificate

CS Using Application Resources Incarnation Data Base APPLICATION AJOTest 1.0 Description „Demo Resource for AJO Plugin“ INVOCATION [ echo „Hello World!“ ] END CLIENT SERVER Network Job Supervisor (NJS) Resource Set Memory (64, 128, 32000)... APPLICATION AJOTest 1.0 APPLICATION CPMD Context MPI... Resource Manager Plugin AJOTest resource available? Add to AJO UserTask Display message Submit as Request

CS Client Requests  GetFilesFromUSpace  SendFilesToUspace  GetFilesFromXSpace  SendFilesToXSpace  GetByteArrayFromXSpace  SendByteArrayToXSpace  GetListings  GetUsites  GetVsites  GetResources  GetRunningJobs  GetJobStatus  GetOutcome  GetSpooledFiles ... Client Observer Request Observable Start as new thread Notify when finished
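The request mechanism on this slide is an observer pattern on top of threads: a request runs in its own thread and notifies its observers when it has finished. The following self-contained sketch mirrors that shape. All types here (`IObserverSketch`, `RequestSketch`, `GetUsitesSketch`) are simplified stand-ins written for this example and are not the Client's own classes.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the Client's observer interface.
interface IObserverSketch {
    void observableUpdate(Object theObserved, Object changeCode);
}

// Stand-in for an observable request that runs as a new thread.
abstract class RequestSketch extends Thread {
    private final List<IObserverSketch> observers = new ArrayList<>();

    void addObserver(IObserverSketch observer) {
        observers.add(observer);
    }

    // The actual work of the request, e.g. talking to a server.
    protected abstract Object execute();

    @Override
    public void run() {
        Object reply = execute();
        for (IObserverSketch observer : observers) {
            observer.observableUpdate(this, reply); // notify when finished
        }
    }
}

// A trivial request that returns a fixed reply instead of querying a Grid.
class GetUsitesSketch extends RequestSketch {
    @Override
    protected Object execute() {
        return "2 sites found";
    }
}
```

A caller registers itself with `addObserver` and then calls `start()`, so the GUI thread is never blocked while the request is in progress.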

CS Class AJORequest public class AJORequest extends ObservableRequestThread {... public void run() { UserTask userTask = new UserTask("UserTask"); userTask.addResource(software); User user = ResourceManager.getUser(vsite); AbstractJob job = new AbstractJob("AJORequest_" + ResourceManager.getNextObjectIdentifier()); job.setVsite(vsite); job.setEndorser(user); job.add(userTask); Reply reply=null; try { reply = polling(job, vsite, user); } catch (Exception e) { logger.log(Level.SEVERE, "Submitting AJO in polling mode failed.", e); } notifyObservers(this, reply); } } public abstract class ObservableRequestThread extends ObservableThread { public void setInterrupted(boolean interrupted); public Reply nonPolling(AbstractJob job, Vsite vsite, User user, Vector streamedFiles); public Reply polling(AbstractJob job, Vsite vsite, User user, Vector streamedFiles); } public abstract class ObservableThread extends Thread implements IObservable { public void addObserver(IObserver anObserver); public void deleteAllObservers(); public void deleteObserver(IObserver anObserver); public void notifyObservers(Object theObserved, Object changeCode); }

CS Class AJOPlugin public abstract class UnicorePlugable { public HelpSet getHelpSet(); public abstract String getPluginInfo(); public JMenuItem getSettingsItem(); public abstract void startPlugin(); public abstract void stopPlugin(); protected Client getClient(); } public abstract class ExtensionPlugable extends UnicorePlugable { public JMenuItem getCustomMenu(); public Component getJPAToolBarComponent(); public Component getVsiteToolBarComponent(); public Object setupSpecialVsiteFeatures( Vsite vsite, AbstractJob job); } public class AJOPlugin extends ExtensionPlugable implements IObserver { public String getPluginInfo() { return "AJO plugin example"; } public Component getVsiteToolBarComponent() { return startButton; } public void startPlugin() { startButton = new JButton(new ServiceAction()); } public void stopPlugin() { /* empty */ } private void submitServiceJob(SoftwareResource software, Vsite vsite) { AJORequest request = new AJORequest(software, vsite); request.addObserver(this); request.start(); } public void observableUpdate(Object theObserved, Object changeCode) { Reply reply = (Reply)changeCode;... } private class ServiceAction {... } }

CS Small Service Plugin  Idea: Do complete handling of jobs from plugin –Build, submit and monitor AJO –Fetch back outcome and exported files  Use Client Containers to construct AJO

CS AJOs and Containers  Client containers encapsulate complex AJOs  Manage imports, exports and execution  Hold parameters, keep status, check errors Execute Group Import Group Export Group

CS Container Hierarchy Add your own container

CS Implementing the Container

CS Small Service Plugin Job Directory serviceOutput.txt SmallServiceContainer CLIENT SERVER Execute writes SmallService AJOGetJobStatus Repeat until Status==DONE GetOutcome Spool Area serviceOutput.txt GetSpooledFiles DeleteJob

CS Class SmallServiceContainer public class SmallServiceContainer extends UserContainer {... public void buildActionGroup() { String unicoreDir = ResourceManager.getUserDefaults().getUnicoreDir(); String userHome = ResourceManager.getUserDefaults().getUserHome(); String filename = userHome + File.separator + "serviceOutput.txt"; FileExport[] exports = { new FileExport(this, FileStorage.NSPACE_STRING, "serviceOutput.txt", filename, true, true)}; setFileExports(exports); super.buildActionGroup(); }

CS Class SmallServicePlugin public class SmallServicePlugin extends ExtensionPlugable implements IObserver { public void observableUpdate(Object theObserved, Object changeCode) { if (theObserved instanceof GetJobStatus) {... if (status == AbstractActionStatus.DONE) { sendGetOutcome(); } } else if (theObserved instanceof GetOutcome) { sendGetSpooledFiles(); } else if (theObserved instanceof GetSpooledFiles) { sendDeleteJobs(); } else if (theObserved instanceof DeleteJob) {} } public void startPlugin() { job = new JobContainer(); task = new SmallServiceContainer(job); job.addTask(task); startButton = new JButton(new ServiceAction()); } private void submitServiceJob(Vsite vsite) { job.setName( ResourceManager.getServicePrefix() + "SmallServiceJob" + ResourceManager.getNextObjectIdentifier()); job.setVsite(vsite); job.setUser(ResourceManager.getUser(vsite)); job.run(); } }

CS The Lattice Boltzmann Application  Simulation of fluid mixing  Output: a gif animation  Intermediate sample files are generated  Control file can change parameters while application is executing Input file: { folder="."; initcond="spinodal"; steerfile="control"; gifanimfile="output.gif"; unicore_demo = 1; writecolour=1; writecolgif=1; makedir = "yes"; g_cc=2.0 ; tau_r = 1.0 ;tau_b = 1.0; rho = 1.0; tmax=5000 ; dt = 10 ; gravity=0.0; nx=128 ; ny=128; } Duration „Mixing Factor“

CS Command Task CLIENT Job Directory Running Boltzmann using a Command Task Input BoltzmannInput.txt SERVER Import with renaming C:\tmp\output.gif Export output.gif readswrites Boltzmann Application Resource Execute Set tmax to 300

CS Disadvantages of Command Task  Input file has to be edited outside Client  Imports and Exports have to be specified manually  No integrated GUI for parameters  Results have to be visualized outside client  No additional functionality possible –sample files –application steering Use a specialized Boltzmann Plugin Task!

CS The Boltzmann Plugin  Task Plugin –Add Boltzmann tasks to jobs –Input file editor –Automatically import input file –Export and visualize sample files –Send control files  Implemented Classes –Main plugin class –Plugin Container –JPA Panel –Sample Panel –Control Panel

CS Class BoltzmannPlugin Icon Format public class BoltzmannPlugin extends TaskPlugable { public ActionContainer getContainerInstance(GroupContainer parentContainer) { BoltzmannContainer container = new BoltzmannContainer(parentContainer); container.setName("New_" + getName() + counter); counter++; return container; } public String getIconPath() { return "org/gridschool/unicore/plugins/boltzmann/boltzmann.gif"; } public String getName() { return "Boltzmann"; } public String getPluginInfo() { return "Grid School Example: The Boltzmann Plugin"; } public JPAPanel getPanelInstance(ActionContainer container) { return new BoltzmannJPAPanel(getClient(), (BoltzmannContainer)container); } public void startPlugin() {} public void stopPlugin() {} }

CS PluginJPAPanel CLIENT Job Directory Run and steer Boltzmann from Plugin Input output.gif SERVER Boltzmann Application Resource PluginContainer Export Input file Execute reads SamplePanel Sample.gif writes Get File From Uspace Request writes ControlPanel Control Send File To Uspace Request reads Editor Export Panel

CS Class BoltzmannJPAPanel  Set parameters in container  Use RemoteTextEditor, ImportPanel and ExportPanel  Implements interface Applyable ContainerJPAPanel applyValues resetValues updateValues

CS Remote Text Editor  Load, edit and save files from remote and local file spaces private RemoteTextEditor textEditor = new RemoteTextEditor(); private void buildComponents() { JTabbedPane tabbedPane = new JTabbedPane(); tabbedPane.add(textEditor, "Input File");... } public void applyValues() { container.setInputFile(textEditor.getFile()); container.setInputString(textEditor.getText());... } public void resetValues() { textEditor.setText(container.getInputString()); textEditor.setFile(container.getInputFile());... } public void updateValues(boolean vsiteChanged) { if (vsiteChanged) { textEditor.setVsite(container.getVsite()); }... }

CS Import and Export Panels  Specify file imports and exports from the GUI  Use out of the box New Import Remove Import Browse file systems

CS Class BoltzmannContainer public class BoltzmannContainer extends UserContainer { private String inputString; protected void buildExecuteGroup() { byte[] contents = StringTools.dos2Unix(inputString).getBytes(); IncarnateFiles incarnateFiles =new IncarnateFiles("INCARNATEFILES"); incarnateFiles.addFile(INPUT_FILENAME, contents); ResourceSet taskResourceSet = getResourceSet().getResourceSetClone(); taskResourceSet.add(getPreinstalledSoftware()); UserTask executeTask = new UserTask(getName(), null, taskResourceSet, getEnv(), getCommandLine(), null, getRedirectStdout(), getRedirectStderr(), isVerboseOn(), isVersionOn(), null,getMeasureTime(), getDebug(), getProfile()); executeGroup = new ActionGroup(getName() + "_EXECUTION"); executeGroup.add(incarnateFiles); executeGroup.add(executeTask); try { executeGroup.addDependency(incarnateFiles, executeTask); } catch (InvalidDependencyException e) { logger.log(Level.SEVERE, "Cannot add dependency.", e); } } public ErrorSet checkContents() { ErrorSet err = super.checkContents(); if (inputString == null || inputString.trim().length() == 0) { err.add(new UError(getIdentifier(), "No input file specified")); } return err; } }

CS Additional Outcome Panels  Implement interface IPanelProvider in Container public class BoltzmannContainer extends UserContainer implements IPanelProvider {.... public int getNrOfPanels() { return 2; } public JPanel getPanel(int i) { if (i == 0) { if (samplePanel == null) { samplePanel = new BoltzmannSamplePanel(); } return samplePanel; } else { if (controlPanel == null) { controlPanel = new BoltzmannControlPanel(); } return controlPanel; } } public String getPanelTitle(int i) { if (i == 0) { return "Sample"; } else { return "Control"; } } public void finalizePanel() {} }

CS Class BoltzmannControlPanel public class BoltzmannControlPanel extends JPanel implements IObserver { private RemoteTextEditor editor;... private JobContainer getJobContainer() { return ResourceManager.getCurrentInstance().getJMCTree().getCurrentJob(); } private BoltzmannContainer getBoltzmannContainer() { return (BoltzmannContainer) ResourceManager.getCurrentInstance().getJMCTree().getFocussedObject(); } private void sendControlFile() { JobContainer jobContainer = getJobContainer(); AJOIdentifier ajoId = (AJOIdentifier)jobContainer.getIdentifier(); Vsite vsite = jobContainer.getVsite(); String[] filenames = {CONTROL_FILE}; byte[][] contents = new byte[1][]; String inputString = StringTools.dos2Unix(editor.getText()); contents[0] = inputString.getBytes(); SendFilesToUspace request = new SendFilesToUspace(ajoId, filenames, contents, vsite); request.addObserver(this); request.start(); } public void observableUpdate(Object theObserved, Object changeCode) { if (theObserved instanceof SendFilesToUspace) { AbstractJob_Outcome outcome = (AbstractJob_Outcome)changeCode; logger.info("SendFilesToUspace result: " + outcome.getStatus()); } } }

CS Class BoltzmannSamplePanel public class BoltzmannSamplePanel extends JPanel implements IObserver {... private void getSampleFile() { JobContainer jobContainer = getJobContainer(); AJOIdentifier ajoId = (AJOIdentifier)jobContainer.getIdentifier(); Vsite vsite = jobContainer.getVsite(); String[] filenames = {SAMPLE_FILE}; GetFilesFromUspace request = new GetFilesFromUspace(ajoId, filenames, vsite); request.addObserver(this); request.start(); } public void observableUpdate(Object theObserved, Object changeCode) { if (theObserved instanceof GetFilesFromUspace) { AbstractJob_Outcome outcome = (AbstractJob_Outcome)changeCode; logger.info("GetFileFromUspace result: " + outcome.getStatus()); if (outcome.getStatus().isEquivalent(AbstractActionStatus.SUCCESSFUL)) { GetFilesFromUspace request = (GetFilesFromUspace)theObserved; File imageFile = (File)request.getLocalFiles().firstElement(); Image image = Toolkit.getDefaultToolkit().createImage(imageFile.getAbsolutePath()); imagePanel.setImage(image); imagePanel.repaint(); return; } } } }

CS Summary  Extension Plugins –Easy way to submit custom AJOs –Use Client infrastructure  Task Plugins –Integrated Application support –Use sub classes of UserContainer –Use Client GUI elements  UNICOREpro Client Plugin Programmer‘s Guide – → Documentswww.unicorepro.com

Manchester Computing Supercomputing, Visualization & e-Science Lecture 8: Resource Broker A resource broker for Unicore. This software was designed to provide an important Grid abstraction, namely that the middleware should find the resources appropriate to the user's request. In this way the user does not need to know what resources are on the Grid or to maintain lists of appropriate resources.

CS Abstract Functions for a Resource Broker  Resource discovery, for workflows as well as single jobs.  Resource capability checking: do the offering sites have ALL necessary capability and environmental support for instantiating the workflow?  Inclusion of Quality of Service policies in the offers.  Information necessary for the negotiation between client and provider and mechanisms for ensuring contract compliance. Document submitted to GPA-RG group of GGF.

CS Design of EuroGrid Resource Broker To utilise the structure of UNICORE, in particular the AJO. To utilise the Usite/Vsite structure, in particular to extend the Vsite to the concept of a Brokering Vsite. Two modes of operation are possible: A simple Resource Check request: “Can this job run here”, checks static qualities like software resources (e.g. Gaussian98 A9) as well as dynamic resources like quotas (disk quotas, CPU, etc.) A Quality of Service request: returns a range of turnaround time, and cost, as part of a Ticket. If the Ticket is presented (within its lifetime) with the job, the turnaround and cost estimates should be met.

CS Ancestral Broker  The API allows two levels of operation: Resource Checking: Static requirements, capability and capacity. QoS Checking: Performance vs cost. Tickets can be issued as a “guarantee”.  Protocol can be used symmetrically by Broker. User Broker NJS Execution NJS Execution NJS 1 CheckQoS 2 CheckQoS 3 CheckQoS_Outcome 4 CheckQoS_Outcome

CS The Brokering Process T3E O3000 VPP300 Broker User 1. QoS Request 4. Ticket(s) 3. Ticket(s) 2. QoS Request
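The brokering process can be sketched as a broker that forwards a QoS request to the sites it knows and returns the collected tickets to the caller. Because the broker offers the same interface as a site, brokers can be nested, which reflects the symmetric use of the protocol. The types below (`TicketSketch`, `QosProviderSketch`, `BrokerSketch`) and the reduction of a request to a processor count are assumptions of this sketch, not the EuroGrid broker API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical ticket: an offer that is only honoured within its lifetime.
class TicketSketch {
    final String site;
    final double cost;
    final int turnaroundMinutes;
    final long expiresAtMillis;

    TicketSketch(String site, double cost, int turnaroundMinutes, long expiresAtMillis) {
        this.site = site;
        this.cost = cost;
        this.turnaroundMinutes = turnaroundMinutes;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isValidAt(long nowMillis) {
        return nowMillis <= expiresAtMillis;
    }
}

// Anything that can answer a QoS request: an execution site or a broker.
interface QosProviderSketch {
    // Returns zero or more offers for the requested number of processors.
    List<TicketSketch> checkQoS(int processors);
}

class BrokerSketch implements QosProviderSketch {
    private final List<QosProviderSketch> children = new ArrayList<>();

    void add(QosProviderSketch child) {
        children.add(child);
    }

    // Forwards the request to every child and passes all tickets back up.
    public List<TicketSketch> checkQoS(int processors) {
        List<TicketSketch> tickets = new ArrayList<>();
        for (QosProviderSketch child : children) {
            tickets.addAll(child.checkQoS(processors));
        }
        return tickets;
    }
}
```

The user would then pick one ticket and present it with the job before `isValidAt` turns false.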

CS Resource Broker Graph Manchester Computing EUROGRID Green (O3000) Fuji (VPP) Turing (T3E) IDRIS Cray T3E IBM ??? LeSC UK Grid ?? ? T3Es `R’ US BrokeringVsite Dumb Vsite Figure 1: Possible Resource Broker Graph

CS Advanced Features of the Broker In addition, the interface allows a Ticket to contain a modified resource set which must be used with the Ticket, and multiple Tickets to be returned from a single site. This powerful mechanism allows a Broker to: Bid for jobs requiring resources it can’t find, e.g. when receiving a bid requiring 256 processors for 1 hour, it could return a Ticket for 128 processors for twice as long. Return a spread of offers, representing different priorities and corresponding costs. Refine abstract application-specific resource requirements to concrete resource requirements, using built-in performance information about the code.
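The counter-offer in the example (128 processors for twice as long instead of 256 processors for 1 hour) follows from keeping the total processor time constant. A minimal sketch of that arithmetic is shown below. It assumes perfect scaling, which real codes rarely achieve, and the class and method names are hypothetical.

```java
class CounterOfferSketch {
    // Run time in minutes on `available` processors for a job that needs
    // `minutes` on `requested` processors, assuming perfect scaling
    // (constant processor-minutes). The result is rounded up.
    static long scaledMinutes(int requested, long minutes, int available) {
        long processorMinutes = (long) requested * minutes;
        return (processorMinutes + available - 1) / available;
    }
}
```

Here `scaledMinutes(256, 60, 128)` returns 120, matching the slide's example; a broker with performance data for the code could replace the perfect-scaling assumption with a measured model.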

CS Example Application-Specific Broker An application Specific Resource Requirement is defined, that can express “Run the DWD local weather model code, over X grid points, simulating a Y hour period”. Standardisation is controlled by the Sun-style Community Source Licence (new interfaces must be returned to the community for publication). A broker can then be designed that takes such a resource requirement, e.g. “DWD local weather model code, over 1000 grid points, simulating over a 24 hour period”, and returns concretised offers, such as “16 T3E processors for an hour, done by midday, for £32”, “32 O3000 processors for 30 minutes, done in an hour, for £45”. So the knowledge of the application performance is kept in a single place – the Broker, away from the users. Consequently, the users never have to learn the code’s performance characteristics.

CS Interoperability across Grids  The emerging infrastructure with multiple Grids is already complex.  One cannot guarantee to have a uniform middleware such as UNICORE or Globus across all Grids.  Therefore a translation service is necessary.  We can link this to semantic information via a Grid Resource Ontology.  We are then starting to get to the right level of abstraction for a genuine infrastructure for computational resource.  It now no longer matters what you call this; the abstractions reflect the underlying reality of usage and must be flexible to change with differing usage.

CS Interoperable Broker: Method 1 1.The Network Job Supervisor delegates the Resource Check to the Broker at the Vsite. 2.The Unicore brokering track utilises the Incarnation Data Base exactly as for the ancestral broker. 3.The Globus track uses a translator of the QoS check object. The translation service is extendable. 4.The results of the translation are used to drive the LDAP search and the Globus broker then utilises MDS to perform this.
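Steps 3 and 4 amount to translating a resource request into an LDAP search filter. The sketch below builds such a filter in the standard LDAP string syntax, requiring every attribute to be at least the requested value. The class `LdapFilterSketch` and the attribute names used in the example are hypothetical and are not taken from the MDS schema; attribute names are also not escaped here.

```java
import java.util.Map;
import java.util.TreeMap;

class LdapFilterSketch {
    // Builds "(&(attr1>=v1)(attr2>=v2)...)" from minimum requirements.
    // Attributes are emitted in alphabetical order so the result is stable.
    static String toFilter(Map<String, Integer> minimums) {
        StringBuilder sb = new StringBuilder("(&");
        for (Map.Entry<String, Integer> e
                : new TreeMap<String, Integer>(minimums).entrySet()) {
            sb.append('(').append(e.getKey())
              .append(">=").append(e.getValue()).append(')');
        }
        return sb.append(')').toString();
    }
}
```

A request for at least 16 processors and 512 MB of memory, expressed with the made-up attributes `cpus` and `memoryMB`, becomes `(&(cpus>=16)(memoryMB>=512))`.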

CS NJS Broker Unicore Broker Globus Broker IDB Translator Filter Basic Translator MDS (GIIS/GRIS) Delegates resource check Lookup resources Delegates translation Uses to drive LDAP search Performs Diagram Of Broker Architecture Architecture: Method 1

CS Ontologies We need ontologies at BOTH application and infrastructure level. If we can create a Grid Resource Ontology, creating specialist translation classes from the basic Grid translator becomes possible. The Incarnation Data Base at each site can be created via the ontology; it contains site-specific information that the client's job specification cannot. So brokers take the client request formulated in RR space, use the translator at each site to convert to RR space, and offers come back with capability and QoS.

CS Architecture: Method 2 [Diagram of broker architecture. Components: NJS, Broker, Unicore Broker, Globus Broker, IDB, Translator, Ontology engine, Filter, Resource Discovery Services, Other Brokers. Arrow labels: “Delegates resource check”, “Look up resources”, “Delegates translation”, “Uses to drive MDS search”. Search modes shown: Hierarchical Grid Search, Nodal Grid Search.]

Manchester Computing Supercomputing, Visualization & e-Science Lectures 9-10: Grid Interoperability A useful test of the validity of the Grid concept is to show that different middleware systems can be made to interoperate. Here we show how the Grid Interoperability Project enabled the high-level abstractions of Unicore to be mapped onto the Globus toolkit. We thank Philipp Wieder for permission to use this material.

CS Outline Introduction to GRIP Interoperability Layer: Design Interoperability Layer: Realisation UNICORE – Globus: Work in Progress Summary

CS The GRid Interoperability Project Development of an interoperability layer between the two Grid systems Interoperable applications Contributions made to the Global Grid Forum UNICORE towards Grid Services … to realise the interoperability of UNICORE and Globus and to work towards standards for interoperability in the Global Grid Forum: See

CS Focus of this Talk … to describe the UNICORE – Globus interoperability layer in detail, concentrating on the design & the implementation. Out of scope: interoperable applications, standardization work. Briefly discussed: interoperable resource broker.

CS Interoperability Layer: Design

CS Starting Point: UNICORE Architecture [Diagram. Components: Client, Gateway, NJS, NJS (brokering), UUDB, IDB, TSI, USite, Target System A with Uspace, Target System B. Labels: UPL (Unicore Protocol Layer), NJS – TSI Protocol, Abstract, Non-abstract, authentication, authorisation, incarnation, multi-site jobs, batch system cmds, files.]

CS Note of Clarification
The software the TSI interfaces to can be:
• Batch system or scheduler
• UNIX shell (batch system emulated)
• GRID resource manager component (like Globus GRAM)
The term “Batch system” is used for all of these in this talk.
Target system == execution system the TSI runs on.

CS Interfacing Globus through UNICORE [Diagram. Components: Client, Gateway, NJS, UUDB, IDB, TSI, Globus client, Target System, Uspace, Globus host, Globus server, MDS.] MDS – Monitoring & Discovery Service

CS Challenges
• Grid approach. UNICORE: user-oriented workflow environment. Globus: services, APIs & portal builder. UNICORE as a workflow portal for Globus.
• Security. UNICORE: end-to-end security model. Globus: requires transitive trust. Don’t violate UNICORE’s security model.
• Resource description. UNICORE: one model for discovery & request. Globus: different models. Map from MDS (LDAP), map to RSL.

CS Interoperability Modules The following modules have been defined: Security Resources & information Job preparation Job submission & monitoring & control Output retrieval & file management

CS Interoperability Layer: Realisation

CS Security Basics
• Public/private key infrastructure to establish connections
• X509v3 certificates (incl. extensions)
• UNICORE: end-to-end security, jobs signed; keys & certificates are stored in a keystore at the client side
• Globus: transitive trust, proxy certificates; keys & certificates are stored on the file system

CS Interfacing Globus through UNICORE [Diagram. Components: Client, Gateway, NJS, TSI, Globus client, Target System, Uspace, Globus host, Globus server, MDS. Labels: X.509 user cert, Globus proxy, GSI enabled auth & comm.] GSI – Grid Security Infrastructure

CS Security Interoperation Proxy Certificate Plugin generates a proxy from the UNICORE user’s private key The proxy certificate is transferred to the user’s Uspace Proxy used for every task involving GSI enabled authentication & communication Configure Globus client (TSI) to use proxy Configure Globus server to trust signing CA Details next slide...

CS Proxy Certificate Creation & Transfer [Diagram. CLIENT: create proxy & encapsulate in Site-specific Security Object (SSO). SERVER: Gateway, Network Job Supervisor (NJS), Job Directory (Uspace); unpack proxy into $USPACE/.proxy.]

CS Resources & Information
• Globus host specific information (hostname, port, ...) is configured at the TSI
• No extensions to the UNICORE Incarnation Database
• Interoperable Resource Broker for UNICORE IDB and Globus Monitoring & Discovery Service (MDS): alpha version; currently mapping between UNICORE & MDS resource descriptions; extensible

CS Resource Broker Architecture (early 2003) [Diagram: John Brooke, University of Manchester. Components: NJS, Broker, UNICORE Broker, Globus Broker, IDB, Translator, Basic Translator, Filter, MDS (GIIS/GRIS). Arrow labels: “Delegates resource check”, “Lookup resources”, “Delegates translation”, “Uses to drive LDAP search”, “Performs LDAP search”.] GIIS – Grid Index Information Service. GRIS – Grid Resource Information Service.

CS The Target System Interface (TSI)
... implements the target system/batch system specific functions to manage the incarnated tasks on the specific system.
• Normally runs as root (set*id)
• Single-threaded, with multiple workers to support the multi-threaded NJS
• NJS – TSI communication via plain sockets
• Two implementations: Perl & Java

CS TSI Flavours [Table. TSI implementations: Perl, Java. Targets: Batch system, Unix Shell, Globus 2, Globus 3, TSI Grid Service. Cell entries: ×, “work in progress”, “prototype using OGSI::Lite”.]

CS TSI: Perl Implementation
Vendor       Type                      OS          Batch Sub-System
Hitachi      SR 8000                   HI-UX/MPP   NQS
IBM          SP                        AIX         LoadLeveler (+DCE), LSF (prototype)
Fujitsu      VPP series                UXP/V       NQS
NEC          SX series                 Super UX    NQS
Cray         T3E, SV1                  UNICOS      NQE
Various      PCs, IA32 clusters        Linux       PBS, CCS
SGI          O2000/3000, Onyx          IRIX        NQS
Workstations (e.g. SUN, SGI, Linux)    native      emulated

CS TSI: Java Implementation
... implements the same functionality as the Perl TSI.
• Alpha version
• Unix only, since it uses set*id via the Java Native Interface (JNI)
• Globus 2 version makes use of the Java CoG Kit
• Basis for interface to Globus Toolkit 3 (work in progress)
• NJS remains unchanged

CS The Globus TSI Implemented interop. modules: Job preparation Job submission & monitoring & control Output retrieval & File management Target system: Globus Toolkit 2.x Perl (beta, inside firewall) & Java (alpha, outside firewall) implementations Both versions under development Current focus: GT3 & TSI Grid Service

CS TSI: Architecture (Perl Implementation) [Diagram. Components: NJS, TSI Shepherd, TSI Workers, Batch-/Operating system (PBS, LSF, Linux, ...). Arrow labels: initiate, control / data, fork, batch / os commands.]

CS Globus TSI [Diagram. Components: NJS, TSI Shepherd, TSI Workers, Globus proxy, Globus client, Globus server (Globus 2), Batch-/Operating System. Arrow labels: initiate, control / data, fork, Globus protocols.]

CS Job Preparation
Mapping from the incarnated UNICORE job:
#TSI_USPACE_DIR /filespace/uspace_d3d775a/
#TSI_OUTCOME_DIR /filespace/outcome_d3d775a/.../
#TSI_TIME 600
#TSI_MEMORY 1024
#TSI_NODES NONE
#TSI_PROCESSORS 1
to the Globus RSL job:
&("executable"=/var/eurogrid/Globus/bin/globus-sh-exec)
("directory"=/filespace/uspace_d3d775a/)
("hostCount"="1")
("count"="1")
("maxTime"="10")
("maxMemory"="1024")
("queue"="low")
("stdout"=
("stderr"=
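The mapping on this slide can be sketched as a small Python function. This is an illustration, not the JobPreparation module itself: the function names are invented, the seconds-to-minutes conversion is inferred from the slide's values (`#TSI_TIME 600` against `"maxTime"="10"`), treating `#TSI_NODES NONE` as one host is likewise read off the slide, and the executable and queue are passed in as parameters. The `stdout` and `stderr` attributes are omitted because their values are cut off in the slide.

```python
def parse_incarnated_job(text):
    """Collect '#TSI_NAME value' directives from an incarnated job script."""
    directives = {}
    for line in text.splitlines():
        if line.startswith("#TSI_"):
            name, _, value = line[1:].partition(" ")
            directives[name] = value.strip()
    return directives


def to_rsl(directives, executable, queue):
    """Build a GRAM RSL string from the directives (assumed mapping)."""
    nodes = directives.get("TSI_NODES", "NONE")
    host_count = "1" if nodes == "NONE" else nodes
    # Assumption: TSI_TIME is in seconds, maxTime in minutes; round up.
    max_minutes = -(-int(directives["TSI_TIME"]) // 60)
    pairs = [
        ("executable", executable),
        ("directory", directives["TSI_USPACE_DIR"]),
        ("hostCount", f'"{host_count}"'),
        ("count", f'"{directives["TSI_PROCESSORS"]}"'),
        ("maxTime", f'"{max_minutes}"'),
        ("maxMemory", f'"{directives["TSI_MEMORY"]}"'),
        ("queue", f'"{queue}"'),
    ]
    return "&" + "".join(f'("{key}"={value})' for key, value in pairs)


if __name__ == "__main__":
    job = "\n".join([
        "#TSI_USPACE_DIR /filespace/uspace_d3d775a/",
        "#TSI_TIME 600",
        "#TSI_MEMORY 1024",
        "#TSI_NODES NONE",
        "#TSI_PROCESSORS 1",
    ])
    print(to_rsl(parse_incarnated_job(job),
                 "/var/eurogrid/Globus/bin/globus-sh-exec", "low"))
```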

CS Job Submission, Monitoring & Control [Diagram. Components: TSI Worker, Globus proxy, GRAM Client, GRAM Gatekeeper, GRAM Job Manager (Globus 2), Batch-/Operating system. Arrow labels: create, Job submission, Job control & status info. RSL shown: &("executable"=/var/eurogrid/Globus/bin/globus-sh-exec) ("directory"=/filespace/uspace_d3d775a/) ("hostCount"="1") ("count"="1") ("maxTime"="10") ("maxMemory"="1024") ("queue"="low") ("stdout"= d775a/.../stdout) ("stderr"= d775a/.../stderr)] GRAM – Globus Resource Allocation Manager

CS Output Retrieval [Diagram. Components: TSI Worker, Globus proxy, GRAM Client, GASS Server, GRAM Gatekeeper, GRAM Job Manager, GASS Client (Globus 2), Batch-/Operating system. Arrow labels: create, Job submission, Job control & status info, stdout & stderr.] GASS – Global Access to Secondary Storage

CS File Management
• Necessary if TSI & Globus are on different target systems
• Usage of GridFTP or GASS (automatic staging possible)
• Maintenance of remote Uspace (“Gspace”)
[Diagram. Components: TSI Worker, Globus proxy, GRAM Client, GASS Server, GRAM Gatekeeper, GRAM Job Manager, GASS Client (Globus 2), USpace, GSpace. Arrow labels: create, Job submission, Job control & status info, stdout & stderr, file staging & maintenance.]

CS Globus 2 Target System Interface [Diagram. Components: NJS, TSI Shepherd, Job preparation, Globus proxy, GRAM Client, GASS Server, GRAM Gatekeeper, GRAM Job Manager, GASS Client (Globus 2), Batch-/Operating System, USpace, GSpace. Arrow labels: initiate, control / data, fork, create, Globus protocols.]

CS Globus API
• Submission: globusrun (returns )
• Monitoring: globusrun –status
• Control: globus-job-cancel
• Output retrieval: globus-gass-server
• File Transfer: globus-url-copy (supports GridFTP & HTTP(S) for GASS transfers)
... or the corresponding Java Commodity Grid (CoG) Kit API methods (Java TSI)

CS Behind the Scenes: Create Uspace
# Incarnation of task
# Incarnation produced for Vsite at
#TSI_IDENTITY zdv190 NONE
#TSI_USPACE_DIR /opt/Unicore/filespace/uspace_8fdce574/
#TSI_EXECUTESCRIPT
# Commands to incarnate a Uspace
/bin/mkdir -p -m700 /opt/Unicore/filespace/uspace_8fdce574/
/bin/mkdir -p -m700 /opt/Unicore/filespace/outcome_8fdce574/
...

CS Behind the Scenes: Job Submission
# Incarnation of task
...
#TSI_IDENTITY zdv190 NONE
#TSI_USPACE_DIR /opt/Unicore/filespace/uspace_8fdce574/
#TSI_SUBMIT
#TSI_JOBNAME SimpleScript
#TSI_OUTCOME_DIR /opt/Unicore/filespace/outcome_8fdce574/AA..
#TSI_TIME 600
#TSI_MEMORY 1024
#TSI_NODES 1
#TSI_PROCESSORS 1
#TSI_HOST_NAME zam289_grip_test
...
#TSI_QUEUE low
#TSI_ NONE
...
# Incarnation of ExecuteTask, UserTask or ExecuteScriptTask
...
RETURNS:

CS Behind the Scenes: Job Monitoring
#TSI_IDENTITY zdv190 NONE
#TSI_GETSTATUSLISTING
RETURNS:
QSTAT
RUNNING

CS TSI Modules
• tsi: Perl script to be executed, TSI configuration; Globus server information
• Initialisation: Contact NJS; create Workers; start repository process
• MainLoop: Listen to NJS & process input; no changes

CS TSI Modules (cont.)
• Submit: Job submission to resource manager, returns jobID; prerequisites: pre-staging complete, target job description available
• GetStatusListing: Returns list of states (SUCCESSFUL, FAILED, PENDING, QUEUED, EXECUTING) for known jobIDs
• JobControl: abort, cancel, hold, resume
[Group label: Job submission / monitoring / control]
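As a rough illustration of GetStatusListing in a Globus TSI, the Python sketch below maps GRAM job states onto the TSI state names listed on this slide. The mapping table is an assumption, not taken from the GRIP code, and the listing layout (a `QSTAT` header followed by one `jobID state` line per job) is likewise assumed from the job monitoring slide.

```python
# Assumed mapping from GRAM job states to the TSI states named on the slide.
GRAM_TO_TSI = {
    "PENDING": "QUEUED",
    "ACTIVE": "EXECUTING",
    "DONE": "SUCCESSFUL",
    "FAILED": "FAILED",
}


def status_listing(gram_states):
    """Build a status listing for all known job IDs.

    gram_states maps job ID -> GRAM state; states without an entry in
    GRAM_TO_TSI (e.g. UNSUBMITTED) are reported as PENDING.
    """
    lines = ["QSTAT"]
    for job_id in sorted(gram_states):
        tsi_state = GRAM_TO_TSI.get(gram_states[job_id], "PENDING")
        lines.append(f"{job_id} {tsi_state}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(status_listing({"job-2": "ACTIVE", "job-1": "DONE"}))
```

A job repository, as mentioned for the Globus module, would be the natural source of the `gram_states` dictionary.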

CS TSI Modules (cont.)
• PutFiles: Writes files sent by NJS to target system
• GetDirectory: Return dir. & content to NJS
• EndProcessing: Job finished (check for stdout & stderr)? Close GASS server, update repository
• Reporting: Logging, debugging; log Globus output
[Group label: File transfer]

CS TSI Modules (cont.)
• BecomeUser: set*id; no changes
• ExecuteScript: Execute script; no changes
• DataTransfer: GASS control
• Globus: Job repository & Globus specific variables
• JobPreparation: Mapping from UNICORE job description to RSL
[Group label: Globus TSI specific]

CS “Classic” TSI Setup
TSI & Globus on target system & inside firewall:
+ Ignore Globus firewall issues
+ Uspace == “Gspace”
- “Restricted” interoperability -> no direct remote access
[Diagram. Components: Client, Client firewall, Gateway, NJS, TSI, Globus Server, Server firewall, demilitarized zone. Labels: SSL Client, SSL Server.]

CS “Remote” TSI Setup
TSI & Globus outside firewall & on different machines:
+ Interoperation with any Globus server possible
- Maintenance of temporary “Gspace”
[Diagram. Components: Client, Client firewall, Gateway, NJS, TSI, Globus Server, Server firewall, demilitarized zone. Labels: SSL Client, SSL Server, SSL.]

CS UNICORE – Globus: Work in Progress

CS TSI Developments
• Java TSI as Grid Service Client (GT3): currently only job submission; add file transfer & other services
• TSI Grid Service & NJS – TSI protocol: TSI portType(s) (WSDL); XML Schema message definition; Perl TSI & OGSI::Lite hosting environment

CS Interfacing GT3 GRAM [Diagram. Components: NJS, TSI Shepherd, TSI Workers, Globus proxy, Grid Service Client, Master Job Factory Service (MJFS), Managed Job Service (MJS), Batch system. Labels: initiate, control/data (as before), createService, creates, create, SOAP, web services, Globus 3 TSI.]

CS TSI Grid Service [Diagram. Components: NJS, TSI Grid Service Client, TSI Grid Service Factory, TSI Grid Service Instances, OGSI::Lite, Batch-/Operating system (PBS, LSF, Linux, ...). Arrow labels: createService, SOAP messages, create, batch / os commands.]

CS Resource Broker Developments UNICORE ontology (basis: JavaDoc) Ontology for MDS (basis: GLUE schema) Ontology mapping Integrate ontology engine into broker Resource broker portType Towards a Grid resource ontology

CS Resource Broker Architecture [Diagram: Donal Fellows, University of Manchester. The broker is hosted in the NJS.
Components: Compute Resource Broker, NJS, IDB, UUDB, TicketManager, ExpertBroker, DWDLMExpert, ICMExpert, Other, LocalResourceChecker, UnicoreRC, GlobusRC, Translator, OntologicalTranslator, SimpleTranslator, Ontology, MDS, GRAM, TSI.
Relation labels: Look up static resources; Look up configuration; Verify delegated identities; Delegate to application-domain expert code; Delegate to Grid architecture-specific engine for local resource check; Pass untranslatable resources to Unicore resource checker; Look up resources; Look up dynamic resources; Delegate resource domain translation; Look up translations appropriate to target Globus resource schema; Get back set of resource filters and set of untranslatable resources; Get signed ticket (contract); Look up signing identity.
Key: UNICORE components, EUROGRID Broker, Globus components, GRIP Broker, inheritance relation.]

CS Other Activities XML Schema for UNICORE resource model OGSI’fication: UUDB portType, resource database portType,... UNICORE Service Data Framework GGF: standardize portTypes, protocols, GRIP is not the end

CS Summary

CS Interoperability Abstraction Single sign-on: Use SSO to transfer alternative security credentials through to the target system Resource discovery: extend resource broker Resource request: map UNICORE job description to representation needed Use batch system specific APIs/commands for job submission/monitoring & data transfer Income/Outcome staging to/from Uspace Note: This MAY imply changes not only to the TSI

CS How to Start? Take interoperability modules as starting point Consider security & resource/information representation/management carefully Define UNICORE client extensions if necessary Are server modifications necessary? Specify Perl modules to be implemented/changed

CS Recommended Reading
• Grid Interoperability Project:
• UNICORE software download:
• UNICORE Plus Final Report: Final-Report.pdf (good intro to UNICORE)
• “An Analysis of the UNICORE Security Model”, GGF public comment period: (contains GRIP part; subsequent docs ready for submission)

CS Recommended Reading (cont.)
• Java Commodity Grid Kit: (also good intro to Globus programming)
• Globus Resource Allocation Manager: management.html
• “Globus Firewall Requirements”: rewall%20Requirements-5.pdf
• OGSI::Lite – A Perl Hosting Environment:

Manchester Computing Supercomputing, Visualization & e-Science Lecture 12 - Case Study This case study presents the RealityGrid project. It has used most types of Grid middleware: Unicore, Globus and its own Perl web services implementation, OGSI::Lite. It has also created application APIs for computational steering.

CS The RealityGrid Project
Mission: “Using Grid technology to closely couple high performance computing, high throughput experiment and visualization, RealityGrid will move the bottleneck out of the hardware and back into the human mind.”
Scientific aims:
• to predict the realistic behavior of matter using diverse simulation methods (Lattice Boltzmann, Molecular Dynamics and Monte Carlo) spanning many time and length scales
• to discover new materials through integrated experiments.

CS Partners
Academic
• University College London
• Queen Mary, University of London
• Imperial College
• University of Manchester
• University of Edinburgh
• University of Oxford
• University of Loughborough
Industrial
• Schlumberger
• Edward Jenner Institute for Vaccine Research
• Silicon Graphics Inc
• Computation for Science Consortium
• Advanced Visual Systems
• Fujitsu

CS RealityGrid Characteristics
• Grid-enabled (Globus, UNICORE)
• Component-based, service-oriented
• Steering is central
  – Computational steering
  – On-line visualisation of large, complex datasets
  – Feedback-based performance control
  – Remote control of novel, grid-enabled, instruments (LUSI)
• Advanced Human-Computer Interfaces (Loughborough)
• Everything is (or should be) distributed and collaborative
• High performance computing, visualization and networks
• All in a materials science domain
  – multiple length scales, many "legacy" codes (Fortran90, C, C++, mostly parallel)

CS Exploring Parameter Space through Computational Steering [Figure. Panel captions: “Initial condition: random water/surfactant mixture.” “Self-assembly starts.” “Rewind and restart from checkpoint.” “Lamellar phase: surfactant bilayers between water layers.” “Cubic micellar phase, low surfactant density gradient.” “Cubic micellar phase, high surfactant density gradient.”]

CS Computational Steering – Why?
• Terascale simulations can generate in days data that takes months to understand
• Problem: to efficiently explore and understand the parameter spaces of materials science simulations
• Computational steering aims to short circuit post facto analysis
  – Brute force parameter sweeps create a huge data-mining problem
  – Instead, we use computational steering to navigate to interesting regions of parameter space
  – Simultaneous on-line visualization develops and engages scientist's intuition
  – thus avoiding wasted cycles exploring barren regions, or even doing the wrong calculation

CS Computational Steering – How?
• We instrument (add "knobs" and "dials" to) simulation codes through a steering library
• Library provides:
  – Pause/resume
  – Checkpoint and windback
  – Set values of steerable parameters
  – Report values of monitored (read-only) parameters
  – Emit "samples" to remote systems for e.g. on-line visualization
  – Consume "samples" from remote systems for e.g. resetting boundary conditions
• Images can be displayed at sites remote from visualization system, using e.g. SGI OpenGL VizServer, or Chromium
• Implemented in 5+ independent parallel simulation codes, F90, C, C++
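The control part of such a library can be illustrated with a toy Python sketch. It is not the RealityGrid steering API: the class, method and command names are invented for illustration, commands arrive through an in-memory list rather than from a remote client, and checkpoints are deep copies of a small state dictionary. The point is the pattern: the simulation calls one library routine per step, and that routine applies pause/resume, parameter changes, checkpoint and windback.

```python
import copy


class SteeringLibrary:
    """Toy steering library: the simulation calls control() once per step."""

    def __init__(self):
        self.steerable = {}    # parameters the client may change
        self.monitored = {}    # read-only values reported to the client
        self.paused = False
        self.checkpoints = {}
        self._inbox = []

    def post(self, command, *args):
        """Called on behalf of the steering client to queue a command."""
        self._inbox.append((command, args))

    def control(self, state):
        """Apply pending commands; return the (possibly restored) state."""
        pending, self._inbox = self._inbox, []
        for command, args in pending:
            if command == "pause":
                self.paused = True
            elif command == "resume":
                self.paused = False
            elif command == "set" and args[0] in self.steerable:
                self.steerable[args[0]] = args[1]
            elif command == "checkpoint":
                self.checkpoints[args[0]] = copy.deepcopy(state)
            elif command == "windback":
                state = copy.deepcopy(self.checkpoints[args[0]])
        return state


def run(lib, state, ticks, script=None):
    """Instrumented main loop; script maps tick -> commands posted at that tick."""
    script = script or {}
    for tick in range(ticks):
        for command in script.get(tick, []):
            lib.post(*command)
        state = lib.control(state)
        if lib.paused:
            continue
        state["value"] += lib.steerable["rate"]
        state["step"] += 1
        lib.monitored["value"] = state["value"]
    return state


if __name__ == "__main__":
    lib = SteeringLibrary()
    lib.steerable["rate"] = 1.0
    script = {2: [("checkpoint", "a")], 4: [("set", "rate", 10.0)], 6: [("windback", "a")]}
    print(run(lib, {"step": 0, "value": 0.0}, 8, script))
```

In the example script the run is checkpointed, steered to a faster rate, then wound back to the checkpoint while keeping the new rate, which mirrors the "rewind and restart from checkpoint" use shown earlier in the lecture.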

CS Philosophy
• Provide right level of steering functionality to application developer
• Instrumentation of existing code for steering
  – should be easy
  – should not bifurcate development tree
• Hide details of implementation and supporting infrastructure
  – e.g. application should not be aware of whether communication with visualisation system is through filesystem, sockets or something else
  – permits multiple implementations
  – application source code is proof against evolution of implementation and infrastructure
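The "hide the transport" point can be shown with a short Python sketch. The interface and class names are hypothetical and unrelated to the actual steering library; the sketch only demonstrates that the simulation code calls one `emit` routine and stays unchanged whether samples travel through files or through some other mechanism (here an in-memory dictionary standing in for sockets).

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class SampleTransport(ABC):
    """What the application sees: emit and consume named samples."""

    @abstractmethod
    def emit(self, name, sample):
        ...

    @abstractmethod
    def consume(self, name):
        ...


class FileTransport(SampleTransport):
    """Samples exchanged through a (possibly shared) file system."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, name):
        return os.path.join(self.directory, name + ".json")

    def emit(self, name, sample):
        with open(self._path(name), "w") as handle:
            json.dump(sample, handle)

    def consume(self, name):
        with open(self._path(name)) as handle:
            return json.load(handle)


class MemoryTransport(SampleTransport):
    """Stand-in for a socket-style transport, kept in memory for the sketch."""

    def __init__(self):
        self._samples = {}

    def emit(self, name, sample):
        self._samples[name] = json.loads(json.dumps(sample))

    def consume(self, name):
        return self._samples[name]


def simulation_step(transport, step):
    """Application code: identical for every transport."""
    transport.emit("density", {"step": step, "values": [step, step * 2]})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        for transport in (FileTransport(directory), MemoryTransport()):
            simulation_step(transport, 3)
            print(type(transport).__name__, transport.consume("density"))
```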

CS Steering and Visualization [Diagram. Components: Simulation (with steering library), Visualization (with steering library), Client, Display. Arrow label: data transfer.]

CS Architecture
Communication modes:
• Shared file system
• Files moved by UNICORE daemon
• GLOBUS-IO
• SOAP over http/https
[Diagram. Components: Simulation, Visualization, Client, Steering library. Arrow label: data transfer.]
Data mostly flows from simulation to visualization. The reverse direction is being exploited to integrate NAMD & VMD into the RealityGrid framework.

CS Steering in the OGSA [Diagram. Components: Steering client, Registry, Steering GS, Simulation (with steering library), Visualization (with steering library), Client. Arrow labels: publish, find, bind, connect, data transfer.]

CS Steering in OGSA continued…
• Each application has an associated OGSA-compliant “Steering Grid Service” (SGS)
• SGS provides public interface to application
  – Use standard grid service technology to do steering
  – Easy to publish our protocol
  – Good for interoperability with other steering clients/portals
  – Future-proofed next step to move away from file-based steering or Modular Visualisation Environments with steering capabilities
• SGSs used to bootstrap direct inter-component connections for large data transfers
• Early working prototype of OGSA Steering Grid Service exists
  – Based on light-weight Perl hosting environment OGSI::Lite
  – Lets us use OGSI on a GT2 Grid such as UK e-Science Grid today

CS Steering client
• Built using C++ and Qt library; executables currently available for Linux and IRIX
• Attaches to any steerable RealityGrid application
• Discovers what commands are supported
• Discovers steerable & monitored parameters
• Constructs appropriate widgets on the fly
• Web client (portal) under development

CS RealityGrid-L2: LB3D on the L2G [Diagram. Simulation: LB3D with RealityGrid Steering API (source excerpt shown: program lbe; use lbe_init_module; use lbe_steer_module; use lbe_invasion_module). Visualization: SGI Onyx, Vtk + VizServer. Laptop: VizServer client, Steering GUI. Labels: GLOBUS used to launch jobs; Simulation data: GLOBUS-IO; SGI OpenGL VizServer; Steering (XML): file-based communication via shared filesystem; ReG steering GUI: X output is tunnelled back using ssh.]

CS Performance Control [Diagram. An application made of component 1, component 2 and component 3, with an application performance steerer and component performance steerers.]

CS Advance Reservation and Co-allocation: Summary of Requirements
• Computational steering + remote, on-line visualization demand:
  – co-allocation of HPC (processors) and visualization (graphics pipes and processors) resources
  – at times to suit the humans in the loop (advance reservation)
• For medium to large datasets, Network QoS is important
  – between simulation and visualization,
  – between visualisation and display
• Integration with Access Grid
  – want to book rooms and operators too
• Cannot assume that all resources are owned by same VO
• Want programmable interfaces that we can rely on
  – must be ubiquitous, standard, and robust
• Reservations (agreements) should be re-negotiable
• Hard to change attitudes of sysadmins and (some) vendors

CS Steering and Workflows
• Steering adds extra channels of information and control to Grid services.
• Steering and steered components must be state-aware; underlying mechanisms in the OS and lower-level schedulers, monitors and brokers must be continually updated with changing state.
• How do we store and restore the metadata for the state of the parameter space search?
• Human factors are built into our architecture; humans continually interact with orchestrated services. What are the implications for workflow languages?

CS Collaborative Aspects
• Multiple groups exploring multiple regions of parameter space.
• How to record and restore the state of the collaboration?
• How to extend the collaboration over multiple sessions?
• What are the services and abstractions necessary to bootstrap collaborative sessions?
• How do we reliably recreate the resources required by the services, in terms of computation, visualization, instrumentation and networking?

CS Integration with Access Grid? [Diagram; red arrows indicate bootstrapping.]
• Service for bootstrapping session: contains “just enough” information to start other services.
• Virtual Venues Server: multicast addressing, bridges.
• Visualization Workflow, Simulation Workflow, Data Source Workflow: workflows saved from previous sessions or created in this session.
• Process Repository: collaborative processes captured using ontology; can be enacted by workflow engines.
• Application Repository: uses application-specific ontology to describe what in silico processes need to be utilised for the session.
• Who participates? Participants' location and access rights.
• What do they use? Application data, computation and visualization requirements.

CS How far have we got?
Linking US Extended Terascale Facilities and UK HPC resources via a Trans-Atlantic Grid
• We used these combined resources as the basis for an exciting project
  – to perform scientific research on a hitherto unprecedented scale
• Computational steering, spawning, migrating of massive simulations for study of defect dynamics in gyroid cubic mesophases
• Visualisation output was streamed to distributed collaborating sites via the Access Grid
• Workshop presentation with FZ Juelich and HLRS, Stuttgart on the theme of computational steering.
• At Supercomputing, Phoenix, USA, November 2003 TRICEPS entry won “Most Innovative Data-Intensive Application”

CS Summary
• All our workflow concepts are built around the idea of Steerable Grid Services.
• Resources used by services have complex state, may migrate, may be reshaped.
• Collaborative aspects of “Humans in the loops” are becoming more and more important.
• The problems of allocating and managing the resources necessary for realistic modelling are very hard; they require (at present) getting below the Grid abstractions.
• Clearly the Grid abstractions are not yet sufficiently comprehensive and in particular lack support for expression of synchronicity.