Published by Elwin Gilmore. Modified over 9 years ago.
1
Manchester Computing Supercomputing, Visualization & e-Science CS 602 — eScience and Grids John Brooke j.m.brooke@man.ac.uk Donal Fellows donal.fellows@man.ac.uk
2
Manchester Computing Supercomputing, Visualization & e-Science Lecture 1: What is a Grid We examine how the Grid concept arose, what its relation is to other concepts such as e-Science and CyberInfrastructure. We examine a more precise definition for a Computational Grid. There are other types of Grid but this is the main focus of this module
3
CS602 3 e-Science “In the future, e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists.” Dr John Taylor, Director General of the Research Councils, OST
4
CS602 4 Cyber Infrastructure Term coined by a US Blue Ribbon panel; it describes the emergence of an infrastructure linking high-performance computers, experimental facilities and data repositories. It seems to be distinguished from the term Grid, which is considered to apply more directly to computation and cluster-style computing. It may or may not be the same thing as eScience: eScience focuses on the way that science is done, cyber-infrastructure on how the infrastructure is provided to support this way of working.
5
CS602 5 Grids as Virtual Organizations Used in paper Anatomy of the Grid (Foster, Kesselman, Tuecke) “ … Grid concept is coordinated resource sharing in dynamic, multi- institutional virtual organizations …” There is an analogy with an electrical Power Grid where producers share resources to provide a unified service to consumers. A large unresolved question is how do Virtual Organizations federate across security boundaries (e.g. firewalls) and organisational boundaries (resource allocation). Grids may have hierarchical structures, e.g. the EU DataGrid, or may have more federated structures, e.g. EuroGrid
6
CS602 6 What can Grids be used for? User with laptop/PDA (web based portal) VR and/or AG nodes HPC resources Scalable MD, MC, mesoscale modelling “Instruments”: XMT devices, LUSI,… Visualization engines Steering ReG steering API Storage devices Grid infrastructure (Globus, Unicore,…) Moving the bottleneck out of the hardware and into the human mind… Performance control/monitoring
7
CS602 7 HypothesesDesign Integration Annotation / Knowledge Representation Information Sources Information Fusion Clinical Resources Individualised Medicine Data Mining Case-Base Reasoning Data Capture Clinical Image/Signal Genomic/Proteomic Analysis Knowledge Repositories Model & Analysis Libraries Grids for Knowledge/Information Flow
8
CS602 8 Parallel and Distributed Computing Parallel computing is the synchronous coupling of computing resource, usually in a single machine architecture or single administrative domain, e.g. a cluster. Distributed computing refers to a much looser use of resources, often across multiple administrative domains. Grid computing is an attempt to provide a persistent and reliable infrastructure for distributed computing. Users may wish to run workflows many times over a set of distributed resources, e.g. in bioinformatics applications. Users may wish to couple heterogeneous resources for scientific collaboration, e.g. telescopes, computers, databases, video-conferencing facilities.
9
CS602 9 Re-usability and Components We wish to develop sufficient reusable components to provide common facilities so that applications and services can interoperate. There are various approaches: in Globus a toolkit is developed, whereas in Unicore all actions on the Grid are modelled by abstractions encapsulated in an inheritance hierarchy. As part of this course you should start to identify the strengths and weaknesses of these two approaches. A more radical approach is to impose a meta-operating system that presents the resources as a virtual computer. This was tried by the Legion project, and the idea partially survives in the concept of a DataGrid.
10
CS602 10 Toolkits for Grid Functions Software development toolkits Standard protocols, services & APIs A modular “bag of technologies” Enable incremental development of grid-enabled tools and applications Reference implementations Learn through deployment and applications Open source Diverse global services Core services Local OS A p p l i c a t i o n s
11
CS602 Layered Architecture
[Figure: layered architecture diagram. Layer labels: Applications / Problem Solving Environments; Grid Services; Grid Fabric; Application Toolkits; Grid Resources. Other labels in the figure: HBM, GASS, GSI-FTP, MDS, GSI, GRAM, DUROC, MPICH-G, globusrun, GlobusView; LSF, MPI, NQE, PBS; Solaris, Linux, UNICOS, IRIX, Tru64; SRB; Manchester, Imperial College, EPCC, Oxford, QM, Loughborough, QM-LUSI/XMT; LUSI Portal, Component Repository, Visualization & Steering, Computational PSE, Component Framework, VIPAR.]
12
CS602 12 Core Functions for Grids Acknowledgements to Bill Johnston of LBL
13
CS602 13 A Set of Core Functions for Grids The GGF document “Core Functions for Production Grids” attempts to define Grids by the minimal set of functions that a Grid must implement to be “usable”. This is a higher-level approach that does not attempt to specify how the functions are implemented, or what base technology is used to implement them. In the original Globus Toolkit, functions were implemented in C and could be called via APIs or scripts, or used on the command line. In Unicore, functions were abstracted as a hierarchy of Java classes, then mapped to Perl scripts at a lower level (the “Incarnation process”). In the Open Grid Services Architecture there is a move to a Web services based approach, in which the hosting environment assumes prominence.
14
CS602 14 Converging Technologies Agents Grid Computing Web Service & Semantic Web Technologies
15
CS602 15 Web Services Early Grids were built on the technologies used for accessing supercomputers, e.g. ssh, shell scripts, ftp. Information services were built on directory services such as LDAP (Lightweight Directory Access Protocol). However, in the commercial sphere Web Services are becoming dominant, based on SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language) and UDDI. Early Grid systems such as Unicore and Globus are trying to refactor their functionality in terms of Web Services. The key Grid concept not captured in Web Services is state, e.g. the state of a job queue, the load on a resource, etc.
16
CS602 16 Other Types of Grid The word Grid is very loosely used. Some aspects of collaborative video-conferencing and advanced visualization are termed Grid. These are currently trying to use technology developed for running computations, the results are not always usable. This is just one indication that we must conceptualise what abstractions we need to capture in Grid software. We also need to develop abstractions for both high and low level protocols, for security models, for user access policies. The Unicore system we present has captured the key semantics and abstractions of a Computational Grid.
17
CS602 17 Access Grid Manchester official UK Constellation site Solar Terrestrial Physics Workshop Teleradiology, Denver
18
Manchester Computing Supercomputing, Visualization & e-Science Lecture 2: Computational Resource If the Grid concept is to move from a vague analogy to a workable scientific concept, the terms need to be more carefully defined. Here we describe one approach to defining one key abstraction, namely computational resource.
19
CS602 19 Terminology We identify a problem: terms in distributed computing are used loosely and are thus not amenable to analysis. We identify a possible programme: to seek invariants which are conserved or are subject to identifiable constraints. We now try to trace an analysis of the concept of “Computational Resource”, since distributed computing networks are increasingly referred to as Grids. An electricity grid distributes electrical power, a water grid distributes water, and an information grid distributes information. What does a computational grid distribute?
20
CS602 20 The Analogy with a Power Grid The power grid delivers electrical power in the form of a wave (A/C wave) The form of the wave can change over the Grid but there is a universal (scalar) measure of power, Power = voltage x current. This universal measure facilitates the underlying economy of the power grid. Since it is indifferent to the way the power is produced (gas, coal, hydro etc…) different production centres can all switch into the same Grid. To define the abstractions necessary for a Computational Grid we must understand what we mean by computational resource.
21
CS602 21 Information Grids Information can be quantified as bits with sending and receiving protocols. Bandwidth x time gives measure of information flow. Allows Telcos to charge. Internet protocols allow discovery of static resource (e.g. WWW pages). Information “providers” do not derive income directly according to volume of information supplied. Use other means (e.g. advertising, grants) to sustain resources needed. Current Web is static, do not need to consider dynamic state, hence extensions needed for Open Grid Services Architecture.
22
CS602 22 What is Computational Power? Is there an equivalent of voltage x current? Megaflops? Power is a rate of delivery of energy, so a rate such as Mflop/s (millions of floating-point operations per second) looks like the natural candidate. However, this is application dependent. Consider two different computations: 1. Seti@home. Time factors are not important. 2. Distributed collaborative working on a CFD problem, with computation and visualization of results in multiple locations. Time and synchronicity are important! Yet both may use exactly the same number of floating-point operations.
23
CS602 23 Invariants in Distributed Computation To draw an analogy with the current situation we refer to the status of physics in the 17th and 18th centuries. It was not clear what the invariant quantities were that persisted through changes in physical phenomena. Gradually quantities such as momentum, energy, electric charge were isolated and their invariance expressed in the form of Conservation Laws. Without Conservation Laws, a precise science of physics is inconceivable. We have extended our scope to important inequalities, e.g. Second Law of Thermodynamics, Bell’s inequality. We must have constraints and invariants or analysis or modeling are impossible.
24
CS602 24 An Abstract Space for Job-Costing Define a job as a vector of computational resources (r1,r2,…,rn) A Grid resource advertises a cost function for each resource (c1,c2,…,cn) Cost function takes vector argument to produce job cost (r1*c1 + r2*c2 + … + rn*cn)
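The job-costing rule above is an inner product of the job vector with the advertised cost vector. A minimal sketch follows, assuming three resource axes (CPU-hours, memory in GB, disk in GB) and token prices that are invented here purely for illustration; the class and method names are likewise hypothetical.

```java
/** Illustrative only: job cost as the inner product of a request vector and a cost vector. */
final class JobCosting {

    /** Returns r1*c1 + r2*c2 + ... + rn*cn. Both vectors must use the same agreed resource ordering. */
    static double cost(double[] request, double[] unitCosts) {
        if (request.length != unitCosts.length) {
            throw new IllegalArgumentException("request and cost vectors differ in length");
        }
        double total = 0.0;
        for (int i = 0; i < request.length; i++) {
            total += request[i] * unitCosts[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical axes: {CPU-hours, memory GB, disk GB}. The disk entry is zero (unused).
        double[] job = {4.0, 2.0, 0.0};
        // Hypothetical token prices advertised by two Grid resources.
        double[] siteA = {2.0, 0.5, 0.25};
        double[] siteB = {1.5, 1.0, 0.25};
        System.out.println("Site A: " + cost(job, siteA) + " tokens"); // 4*2.0 + 2*0.5 + 0 = 9.0
        System.out.println("Site B: " + cost(job, siteB) + " tokens"); // 4*1.5 + 2*1.0 + 0 = 8.0
    }
}
```

The same job is costed differently by each provider, which is what lets a user or broker compare offers on a single scalar.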
25
CS602 25 A Dual Job-Space Thus we have a space of “requests” defined as a vector space of the computational needs of users over a Grid. For many jobs most of the entries in the vector will be null. We have another space of “services” who can produce “cost vectors” for costing for the user jobs (providing they can accommodate them). This is an example of a dual vector space. A strictly defined dual space is probably too rigid but can provide a basis for simulations. The abstract job requirements will need to be agreed. It may be a task for a broker to translate a job specification to a “user job” for a given Grid node. A Mini-Grid can help to investigate a given Dual Job-Space with vectors of known length.
26
CS602 26 Dual Space
[Figure: a cost vector applied to a user job vector yields a scalar cost in tokens.]
27
CS602 27 Computational Resource Computational jobs ask questions about the internal structure of the provider of computational power in a manner that an electrically powered device does not. For example, do we require specific compilers, libraries, disk resource, visualization servers? What if it goes wrong, do we get support? If we transfer data and methods of analysis over the Internet is it secure? A resource broker for high performance computation is a different order of complexity to a broker for an electricity supplier.
28
CS602 28 Emergent Behaviour Given this complexity, self-sustaining global Grids are likely to emerge rather than be planned. Planned Grids can be important for specific tasks, the EU DataGrid project is an example. They are not required to be self-sustaining and questions of accounting and resource transfer are not of central interest. We consider the EUROGRID multi-level structure as an emergent phenomenon that could have some pointers to the development of large scale, complex, self-sustaining computational Grids. The Unicore Usite and Vsite structure is an elegant means of encapsulating such structure.
29
CS602 29 Fractal Structure and Complexity Grids are envisaged as having internal structure and also external links. Via the external links (WANs, intercontinental networks) Grids can be federated. The action of joining Grids raises interesting research questions: 1. How do we conceptualise the joining of two Grids? 2. Is there a minimum set of services that defines a Grid? 3. Are there environments for distributed services and computing that are not Grids (e.g. a cluster)? We focus on the emergent properties of virtual organisations in considering whether they are Virtual Organizations.
30
CS602 30 Resource Requestor and Provider Spaces Resource requestor space (RR), in terms of what the user wants: e.g. Relocatable Weather Model, 10^6 points, 24 hours, full topography. Resource Provider space (RP), 128 processors, Origin 3000 architecture, 40 Gigabytes Memory, 1000 Gigabytes disk space, 100 Mb/s connection. We may even forward on requests from one resource provider to another, recasting of O3000 job in terms of IA64 cluster, gives different resource set. Linkage and staging of different stages of workflow require environmental support, a hosting environment.
31
CS602 31 RR space RP space RR space request Request referral sync Figure 1: Request from RR space at A mapped into resource providers at B and C, with C forwarding a request formulated in RR space to RP space at D. B and C synchronize at end of workflow before results returned to the initiator A. A B C D RR and RP Spaces
32
CS602 32 Resume We have shown how some concepts from abstract vector spaces may be able to provide a definition of Computational Resource. We do not know as yet what conservation laws or constraints could apply to such an abstraction and whether these would be useful in analysing distributed computing. We believe that we can show convincingly that simple scalar measures such as Megaflops are inadequate to the task. This invalidates the “league table” concept such as the Top 500 computers. Computational resource will be increasingly judged by its utility within a given infrastructure.
33
CS602 33 The Resource Universe What is the “Universe” of resources for which we should broker? One might use a search engine but then there is no agreed resource description language nor would users be able to run on most of the resources selected. Globus uses a hierarchical directory structure, MDS based on LDAP. Essentially this is a “join the Grid model”, based on the VO concept. By making Vsites capable of brokering we can potentially access the whole universe of Vsites. Concept of a Shadow Resource DAG makes the resource search structurally similar to its implementation, maintains AJO abstraction.
34
CS602 34 Towards a Global Grid Economy? Much access to HPC resources is via national grants or the resources are private (governmental, commercial). Many problems with sharing resources, what incentives? Grid resources can be owned by international projects but resources are allocated by national bodies. This is like collaboration in large scale facilities, e.g. CERN. Europe has to go down the shared resource route, US doesn’t. Will this produce separate types of Grid economy? The problems of accounting and resource trading are rarely touched on. Mini-Grids can help explore technical issues outside of political ones.
35
CS602 35 Summary The three different views of a distributed infrastructure relate to the way it is used. We need to abstract usage patterns and see if we can link them to invariants that can be quantified. We have investigated in depth the concept of “Computational Resource”. This ties into all three definitions 1.eScience collaborations use resources 2.Cyber-infrastructures connect resources 3.Grids distribute resources
36
CS602 36 Human Factors A prediction arises from this: that the abstracted idea of human collaboration will be essential to success in this field. In an electricity Grid the human participants are completely anonymised and only exert influence via mass action, e.g. a power surge. Patterns of usage in eScience will be much more complex and dynamic. It will belong to the post-Ford model of industrial production; this time the product will be knowledge. Our search for abstractions to encapsulate this will be far more challenging and exciting.
37
Manchester Computing Supercomputing, Visualization & e-Science Lecture 3: Introduction to Unicore Unicore is the Grid middleware system you will study in depth. It is a complete system based on a three tier architecture. We have chosen it as an illustration because of its compact and complete nature and because it is very well-engineered for a Computational Grid. Thanks to Michael Parkin who created the slides in this lecture
38
CS602 38 UNICORE Grid UNiform Interface to COmputing REsources. European Grid infrastructure to give secure and seamless access to High Performance Computing (HPC) resources.
Secure: Strong authentication of users based on X509 certificates. Communication using SSL connections over a TCP/IP/Internet connection. Defined in the UNICORE Protocol Layer (UPL) specification.
Seamless: Uniform interface and consistent access to computing resources regardless of the underlying hardware, systems software, etc. Achieved using Abstract Job Objects (AJO).
HPC resources based in centres in Switzerland, Germany, Poland, France, and United Kingdom integrated into a single grid.
39
CS602 39 UNICORE Grid Architecture The UNICORE architecture is based on three layers:
Client: Interface to the user. Prepares and submits the job over the unsecured network to…
Gateway: The entry point to the computing centre and secured network. Authenticates the user and passes the job to…
Server: Schedules the job for execution and translates the job to commands appropriate for the target system.
40
CS602 40 UNICORE Terminology
USite: A site providing UNICORE services (e.g. CSAR).
VSite: A computing resource within the USite.
USpace: Dedicated file space on the VSite. May only exist during the execution of a job.
XSpace: Permanent storage on the VSite (e.g. the user's home directory).
41
CS602 41 UNICORE Security Between the user and the computing centre, communications run over SSL. The user's X.509 certificate is stored in the client. The certificate encrypts data using Secure Sockets Layer (SSL) technology: - Industry standard method for protecting web communications. - 128-bit encryption strength. Defined in the UNICORE Protocol Layer (UPL) standard. –Prevents eavesdropping on and tampering with communications and data. –Provides instant authentication of the visitor's identity instead of requiring individual usernames and passwords. Within the computing centre, communications stay inside the secure network. Local site policy can specify encrypted communication if necessary.
42
CS602 42 UNICORE Protocol Layer (UPL) Is a set of rules by which data is exchanged between computers. Request/reply structure.
43
CS602 43 The Abstract Job Object (AJO) A collection of approximately 250 Java classes representing actions, tasks, dependencies and resources. v4.0 can be downloaded from www.unicore.org. Specifies work to be done at a remote site seamlessly: no knowledge of the underlying execution mechanism is required. Example classes: ExecuteScriptTask, ListDirectory, CompileTask, Dependency, Processor, Storage. A signed, serialised Java object is transmitted from the Client to the gateway using the UPL.
44
CS602 44 Simplified AJO Class Diagram (1)
[Class diagram. Classes shown: AbstractJob, AbstractAction, Dependency, ActionGroup, AbstractTask, FileTask, FileTransfer, ExecuteTask, FileAction, UserTask, Resource, CapacityResource, ExecuteScriptTask. Groups of classes listed in the diagram: ChangePermissions, CopyFile, CreateDirectory, DeleteFile, FileCheck, ListDirectory, RenameFile, SymbolicLink; CopySpooled, DeclarePortfolio, DeleteSpooled, IncarnateFiles, MakePortfolio, Spool, UnSpool; CopyPortfolioTask, ExportTask, GetPortfolio, ImportTask, PutPortfolio; Memory, Node, PerformanceResource, Processor, RunTime, Storage. One association is marked {ordered}.]
The diagram shows how an Abstract Job object can be constructed from tasks and groups of tasks. Resources can be allocated to each task.
45
CS602 45 Simplified AJO Class Diagram (2)
[Class diagram of outcome classes: Outcome, AbstractTask_Outcome, ActionGroup_Outcome, AbstractJob_Outcome, ChangePermissions_Outcome, CopyFile_Outcome, CopyPortfolio_Outcome, CopyPortfolioToOutcome_Outcome, CopySpooled_Outcome, CreateDirectory_Outcome, DeclarePortfolio_Outcome, DeleteFile_Outcome, DeletePortfolio_Outcome, DeleteSpooled_Outcome, ExecuteTask_Outcome, ExportTask_Outcome, FileCheck_Outcome, GetPortfolio_Outcome, ImportTask_Outcome, IncarnateFiles_Outcome, ListDirectory_Outcome, MakeFifo_Outcome, MakePortfolio_Outcome, MoveFifoToOutcome_Outcome, PutPortfolio_Outcome, RenameFile_Outcome, Spool_Outcome, SymbolicLink_Outcome, UnSpool_Outcome.]
46
CS602 46 AJO Example 1: ListDirectory
[Sequence diagram: objects :storage, :listDirectory and :abstractJob, linked by add() and addResource() calls.]
The directory is set using the setTarget(string target) method. AbstractJob consigned to gateway.
47
CS602 47 AJO Example 2: ImportTask
Used to download files on a specified VSite to the Client. The import task imports a file from the Storage area to the job's USpace. (A Portfolio represents a collection of files in the USpace.) The file name is set using the addFile(string target) method. The Dependency ensures that the file(s) are in the USpace before they are copied to the outcome. AbstractJob consigned to gateway.
[Sequence diagram: objects :importTask, :storage, :copyPortfolioToOutcome, :dependency and :abstractJob; five numbered add() calls and an addResource() call.]
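The role of the Dependency in this example can be sketched with a toy model. The class below is NOT the UNICORE AJO API: `ToyJob`, `add`, `addDependency` and `executionOrder` are hypothetical stand-ins invented here, and actions are represented by plain strings. It only illustrates the idea that a job holds actions plus "before/after" constraints, and that a server must pick an execution order that respects them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a job made of actions and dependencies. Not the real AJO classes. */
final class ToyJob {
    private final List<String> actions = new ArrayList<>();
    private final Map<String, Set<String>> predecessors = new HashMap<>();

    /** Adds an action to the job (each action name is assumed to be added once). */
    void add(String action) {
        actions.add(action);
        predecessors.put(action, new LinkedHashSet<>());
    }

    /** Records that 'before' must finish before 'after' may start. */
    void addDependency(String before, String after) {
        if (!predecessors.containsKey(before) || !predecessors.containsKey(after)) {
            throw new IllegalArgumentException("unknown action");
        }
        predecessors.get(after).add(before);
    }

    /** Returns one execution order in which every action follows all of its predecessors. */
    List<String> executionOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (order.size() < actions.size()) {
            boolean progressed = false;
            for (String action : actions) {
                if (!done.contains(action) && done.containsAll(predecessors.get(action))) {
                    order.add(action);
                    done.add(action);
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic dependency");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        ToyJob job = new ToyJob();
        job.add("copyPortfolioToOutcome");
        job.add("importTask");
        // The file must be in the USpace before it is copied to the outcome.
        job.addDependency("importTask", "copyPortfolioToOutcome");
        System.out.println(job.executionOrder()); // [importTask, copyPortfolioToOutcome]
    }
}
```

Even though the copy action was added first, the dependency forces the import to run before it.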
48
CS602 48 AJO Example 3: ExecuteScriptTask
[Sequence diagram, marked on the slide as “+ this diagram to be completed…”: objects :executeScriptTask, :abstractJob, :makePortfolio, :incarnateFiles, :actionGroup, :scriptType and :resourceSet, with dependencies d1 and d2; calls add(), setScriptType() and setResource(). Labels: IncarnateFiles, MakePortfolio, ResourceSet; fields Name (String), Script (byte[ ][ ]), Files (String[ ]).]
Script arguments are set using the setCommandLine(string args) method. Dependencies ensure that files arrive before the task is executed. AbstractJob consigned to gateway.
49
Manchester Computing Supercomputing, Visualization & e-Science Lectures 4-5: Unicore Client We now present a client side view of the Computational Grid. This will allow you to begin the practical exercises before engaging with the full complexity of the server side components and complete Grid architecture of Unicore. We thank Ralf Ratering of Intel for permission to use this material.
50
CS602 50 UNICORE A production-ready Grid system that connects Supercomputers and Clusters to a Computing Grid. Originally developed in German research projects UNICORE (1997-2000) and UNICORE Plus (2000-2003) –Client implemented by Pallas (now Intel PDSD) –Server implemented by Fujitsu as sub-contractor of Pallas Further enhanced in European research projects –Eurogrid (2000-2003), Grip(2001-2003), OpenMolGrid (2002-2005), NextGrid (2004-2008), SimDat (2004-2008), others Used as middleware for NaReGI
51
CS602 51 The UNICORE Client Graphical Interface to UNICORE Grids Platform-independent Java application Open Source available from UNICORE Forum Functionality: –Job Preparation, Monitoring and Control –Complex Workflows –File Management –Certificate Handling –Integrated Application Support
52
CS602 52 UNICORE Server Components
[Diagram: Client, Gateway, NJS, UUDB, IDB, TSI, with the AJO passed from the Client.]
Gateway: –Performs Authentication –Runs at DMZ
NJS (Network Job Supervisor): –Main server component –Manages jobs –Performs Authorization
UUDB (UNICORE User Database): –Maps certificates onto logins
IDB (Incarnation Database): –Translates AJO to platform specific incarnation –Contains resource descriptions
TSI (Target System Interface): –Only component that must live on the target system –Perl or Java implementations –Executes jobs or submits jobs to the batch sub system
AJO (Abstract Job Object): –Platform independent description of tasks, dependencies and resources
53
CS602 53 History of UNICORE Client Versions
[Timeline figure: 1997, 1998, 1999, 2000, 2001, 2002, 2003, Today. Labels, in the order they appear: Early prototypes developed in UNICORE project; First stable version 3.0; Final version in UNICORE Plus: 4.1 Build 5; UNICORE 5, Open Source, available at www.unicorepro.com, www.unicore.org; Pallas UNICOREpro version 1.]
54
CS602 54 Starting the Client Prerequisites: Java ≥ 1.4.1 –If not available, choose bundled download package UNICORE Configuration directory in your HOME directory Get test certificates from Test Grid CA service
55
CS602 55 Ready to go? “Hello Grid World!”
UNICORE Site == Gateway. Typically represents a computing center.
Virtual Site == Network Job Supervisor. Typically represents a target system.
DEMO: 1. Execute a simple script on the Test Grid. 2. Get back standard output and standard error.
56
CS602 56 Behind the Scenes: Authentication
[Diagram: Client (holding the User Certificate) and Gateway (holding the Gateway Certificate). Steps shown: Establish SSL Connection; Send User Certificate; Send Gateway Certificate; Trust User Certificate Issuer?; Trust Gateway Certificate Issuer?]
57
CS602 57 Behind the Scenes: Authorization
[Diagram: Client, Gateway, NJS, UUDB, IDB, TSI. The User Certificate and the AJO travel from the Client through the Gateway to the NJS, which checks: AJO Certificate == SSL Certificate? The UUDB lists Certificate 1 to Certificate 5 against Login A to Login E, with entries marked “Typical UNICORE User” and “Test Grid User”, turning a User Certificate into a User Login.]
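The authorization step can be illustrated with a small sketch. It assumes, for illustration only, that certificates are compared as plain subject strings and that the user database is an in-memory map; `ToyUudb`, `register` and `authorise` are hypothetical names and do not reflect the real UUDB interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy user database mapping certificates onto logins. Not the real UUDB. */
final class ToyUudb {
    private final Map<String, String> loginByCertificate = new HashMap<>();

    /** Registers which local login a certificate maps onto. */
    void register(String certificate, String login) {
        loginByCertificate.put(certificate, login);
    }

    /**
     * Returns the local login for the job, or empty if the job is not authorised.
     * The certificate that signed the AJO must match the certificate of the SSL
     * connection, and that certificate must be known to the database.
     */
    Optional<String> authorise(String sslCertificate, String ajoCertificate) {
        if (!sslCertificate.equals(ajoCertificate)) {
            return Optional.empty();
        }
        return Optional.ofNullable(loginByCertificate.get(ajoCertificate));
    }

    public static void main(String[] args) {
        ToyUudb uudb = new ToyUudb();
        // Hypothetical certificate subject and login name.
        uudb.register("CN=Test Grid User", "loginA");
        System.out.println(uudb.authorise("CN=Test Grid User", "CN=Test Grid User").isPresent()); // true
        System.out.println(uudb.authorise("CN=Someone Else", "CN=Test Grid User").isPresent());   // false
    }
}
```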
58
CS602 58 Behind the Scenes: Creation & Submission Script Container Abstract Job Object ExecuteScriptTask IncarnateFiles CLIENT SERVER Script_HelloWorld1234... 1.Create file with script contents 2.Execute as script Job Directory (USpace) A temporary directory at the target system where the job will be executed
59
CS602 59 Monitoring the Job Status
Successful: Job has finished successfully
Not successful: Job has finished, but a task failed
Executing: Parts of a job are running or queued
Running: Task is running
Queued: Task is queued at a batch sub system
Pending: Task is waiting for a predecessor to finish
Killed: Task has been killed manually
Held: Task has been held manually
Ready: Task is ready to be processed by NJS
Never run: Task was never executed
60
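CS602 60 The Primes Example
ArrBreakKey.java:
    // Needs: java.io.BufferedReader, java.io.FileReader, java.io.IOException,
    //        java.math.BigInteger, java.util.StringTokenizer
    // inputLine, st, val, N, p and q are fields of the enclosing class (not shown on the slide).
    // Tries each listed prime as a factor of N; on success p and q hold the two factors.
    public void breakKey() {
        try {
            BufferedReader br = new BufferedReader(new FileReader("primes.txt"));
            try {
                // Stop at end of file instead of relying on a NullPointerException.
                while ((inputLine = br.readLine()) != null) {
                    // Only the first token on each line is tested.
                    st = new StringTokenizer(inputLine, " ");
                    val = new BigInteger(st.nextToken());
                    if (N.mod(val).compareTo(BigInteger.ZERO) == 0) {
                        p = val;
                        q = N.divide(val);
                        return;
                    }
                }
                System.out.println("Done!");
            } finally {
                br.close();
            }
        } catch (IOException e) {
            System.err.println("IO Error:" + e);
        }
        // No factor found (or the file could not be read).
        p = BigInteger.ZERO;
        q = BigInteger.ZERO;
    }
Primes.txt:
    2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79...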
61
CS602 61 “Gridify” the Primes Example
[Diagram: the CLIENT holds ArrBreakKey.java; the Job Directory (USpace) on the SERVER holds ArrBreakKey.java and ArrBreakKey.class.]
1. Import java file 2. Compile java file 3. Execute class file 4. Get result in stdout/stderr
DEMO
62
CS602 62 Behind the Scenes: Software Resources
CLIENT: Command Task. Executes a Software Resource, or a Command (a binary that will be imported into the Job Directory).
SERVER: Incarnation Database (IDB). Application Resources contain system specific information: absolute paths, libraries, environment variables, etc.
    APPLICATION javac 1.4
    Description "Java Compiler"
    INVOCATION [ /usr/local/java/bin/javac ]
    END
63
CS602 63 CLIENT SERVER Behind the Scenes: Fetching Outcome Job Directory (USpace) ArrBreakKey.java Files Directory ArrBreakKey.class 2. Compile java file stdout, stderr 3. Execute class file stdout, stderr Fetch Outcome Session Directory Configurable in User Defaults: Paths->Scratch Directory stdout, stderr
64
CS602 64 Integrated Application Example: POV-Ray Scene Description #include "colors.inc" #include "shapes.inc" camera { location direction z } plane {y, 0.0 texture {pigment {RichBlue }}} object { WineGlass translate -x*12.15} light_source { colour White }... POV-Ray Application CLIENT SERVER Command Line Parameters Display Demo Image from Pov-Ray Distribution Job Directory (USpace) Include Files Libraries Remote File System (XSpace) Input Files Output Image
65
CS602 65 Behind the Scenes: Plug-In Concept Add your own functionality to the Client! –Heavily used in research projects all over the world –More than 20 plug-ins already exist No changes to basic Client Software needed Plug-Ins are written in Java Distribution as signed Jar Archives
66
CS602 66 Existing Plug-Ins (incomplete) CPMD, Car-Parrinello Molecular Dynamics (FZ Jülich) Gaussian (ICM Warsaw) Amber (ICM Warsaw) Visualizer (ICM Warsaw) SQL Database Access (ICM Warsaw) PDB Search (ICM Warsaw) Nastran (University of Karlsruhe) Fluent (University of Karlsruhe) Star-CD (University of Karlsruhe) Dyna 3D (T-Systems Germany) Local Weather Model (DWD) POV-Ray (Pallas GmbH) ... Resource Broker (University of Manchester) Interactive Access (Parallab Norway) Billing (T-Systems Germany) Application Coupling (IDRIS France) Plugin Installer (ICM Warsaw) Auto Update (Pallas GmbH) ...
67
CS602 67 Using 3rd Party Plug-Ins Get the Plug-In Jar archive from a web site, email, CD-ROM, etc. Store it in the Client's Plug-In directory. The Client will check the Plug-In signature. Import Plug-In certificates from the Actions menu in the Keystore Editor.
[Flowchart: decision boxes “Is one certificate in the chain a trusted entry in the keystore?”, “Is the signing certificate a trusted entry in the keystore?” and “Add signing certificate to keystore?”, each with yes/no branches leading to LOAD or REJECT.]
68
CS602 68 Task Plugins Add a new type of task to the Client GUI New task can be integrated into complex jobs Application support: CPMD, Fluent, Gaussian, etc. Add task item Settings item Icon Plugin info
69
CS602 69 Supporting an Application at a Site
Install the application itself.
Add an entry to the Incarnation Database (IDB):
    APPLICATION Boltzmann 1.0
    Description "Boltzmann Simulation"
    INVOCATION [ /usr/local/boltzmann/bin/linuxExec.bin ]
    END
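What an IDB entry buys the user can be sketched as a lookup from an abstract application name and version to a site-specific invocation. The sketch below is a toy model, not the real IDB or NJS code: `ToyIdb`, `addApplication` and `incarnate` are hypothetical names, and the argument `input.dat` is invented for illustration. The two invocation paths are the ones shown on the slides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Toy incarnation database: abstract application name and version -> site-specific command. */
final class ToyIdb {
    private final Map<String, String> invocationByApplication = new HashMap<>();

    /** Registers the site-specific invocation for an application, as an IDB entry would. */
    void addApplication(String name, String version, String invocation) {
        invocationByApplication.put(name + " " + version, invocation);
    }

    /** Turns an abstract request into a concrete command line for this site. */
    String incarnate(String name, String version, String... arguments) {
        String invocation = invocationByApplication.get(name + " " + version);
        if (invocation == null) {
            throw new NoSuchElementException("application not supported at this site: " + name + " " + version);
        }
        StringBuilder commandLine = new StringBuilder(invocation);
        for (String argument : arguments) {
            commandLine.append(' ').append(argument);
        }
        return commandLine.toString();
    }

    public static void main(String[] args) {
        ToyIdb idb = new ToyIdb();
        idb.addApplication("javac", "1.4", "/usr/local/java/bin/javac");
        idb.addApplication("Boltzmann", "1.0", "/usr/local/boltzmann/bin/linuxExec.bin");
        // The client only names the application; the absolute path stays a site detail.
        System.out.println(idb.incarnate("Boltzmann", "1.0", "input.dat"));
    }
}
```

A second site could register a different path for the same application name, and the client-side job description would not need to change.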
70
CS602 70 Plug-In Example: CPMD Workflow for Car–Parrinello molecular dynamics code Input: conf_file2 RESTART Input: conf_file1 re-iterate Wavefunction Optimization Geometry Optimization further optimization ? MD Run Output: stdout stderr RESTART.1, LATEST,... Other... Visualization ? further evaluation
71
CS602 71 Plug-In Example: CPMD CPMD plugin constructs UNICORE workflow
72
CS602 72 Plug-In Example: CPMD CPMD wizard assists in setting up the input parameters
73
CS602 73 Plug-In Example: CPMD Visualize results
74
CS602 74 Extension Plugins Add any other functionality Resource Broker, Interactive Access, etc. JPA toolbar Settings item Extensions menu Virtual site toolbar Plugin info
75
CS602 75 Plug-In Example: Resource Broker Specify resource requests in your job Submit it to a broker site Get back offers from broker
76
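CS602 76 Example: Steering a Simulation
[Diagram: on the SERVER, a Lattice-Boltzmann Simulation Code runs in the Job Directory; it reads the input file and the control file, and writes output.gif and sample.gif. On the CLIENT, a Plugin Task provides the matching panels: Editor (input file), Export Panel (output.gif), Sample Panel (sample.gif), Control Panel (control file).]
DEMO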
77
CS602 77 Specifying Resource Requests Tasks can have resource sets containing requests. If no resource set is attached, default resources are used. Resource sets can be edited, loaded and saved. If a resource request does not match the resources available at a site, the Client displays an error. Resource Set 1 Resource Set 2
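The matching rule described here can be sketched as a comparison between a requested resource set and what a site offers. This is a toy model under stated assumptions: resources are named integer quantities, the names `cpus`, `memoryGB` and `diskGB` are invented, and `ToyResourceCheck` is not part of the UNICORE Client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy check of a resource request against the resources a site offers. */
final class ToyResourceCheck {

    /** Falls back to the default resources when no resource set is attached to the task. */
    static Map<String, Integer> effectiveRequest(Map<String, Integer> attached, Map<String, Integer> defaults) {
        return (attached == null || attached.isEmpty()) ? defaults : attached;
    }

    /** Returns the names of requested resources that the site lacks or cannot supply in full. */
    static List<String> unmet(Map<String, Integer> request, Map<String, Integer> available) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : request.entrySet()) {
            Integer offered = available.get(entry.getKey());
            if (offered == null || entry.getValue() > offered) {
                problems.add(entry.getKey());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical site description.
        Map<String, Integer> site = new TreeMap<>();
        site.put("cpus", 128);
        site.put("memoryGB", 40);
        site.put("diskGB", 1000);

        // Hypothetical request that asks for more processors than the site has.
        Map<String, Integer> request = new TreeMap<>();
        request.put("cpus", 256);
        request.put("memoryGB", 20);

        System.out.println(unmet(request, site)); // [cpus]
    }
}
```

A non-empty result corresponds to the case where the Client would report an error before submission.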
78
CS602 78 Behind the Scenes: Authorization UUDB Client NJS Gateway User Certificate User Login User Certificate AJO User Certificate Sub- AJO Site A UUDB NJS Gateway User Certificate Sub- AJO Site B User Certificate Sub- AJO SSL Certificate == Trusted NJS?
79
CS602 79 Using File Tasks CLIENT SERVER 1 SERVER 2 Home Temp Spool Root Local USpace Home Temp Root USpace Storage Server
80
CS602 80 Complex Workflow: Control Tasks Do N Loop, Do Repeat Loop, Hold Task, If Then Else
81
CS602 81 UNICORE jobs stop execution when a task fails Sometimes Task failure is acceptable –If and DoRepeat conditions –Tasks that try to use restart files –Whenever you do not care about task success Set „Ignore Failure“ flag on Task Behind the Scenes: Ignore Failure Right Mouse Click in Dependency Editor
82
CS602 82 Loops: Accessing the Iteration Counter Iteration variable: $UC_ITERATION_COUNTS Lives on server side Supported in –Script Tasks –File Tasks –Re-direction of stdout/stderr Nested loops: iteration numbers are separated by „ _ “, e.g. „ 2_3 “ Caution: counter will not be propagated to sub jobs
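To make the nested-loop naming concrete, here is a minimal Java sketch of how a counter string such as 2_3 is formed and how it would show up in a redirected file name. The real substitution of $UC_ITERATION_COUNTS happens on the server side; the class and method names below are hypothetical and only illustrate the naming scheme described on the slide.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the UNICORE Client API.
final class IterationCounterDemo {
    // Joins the iteration numbers of nested loops (outermost first) with "_".
    static String counterString(List<Integer> iterations) {
        return iterations.stream()
                .map(String::valueOf)
                .collect(Collectors.joining("_"));
    }

    // Replaces the variable in a template, imitating the server-side expansion.
    static String expand(String template, List<Integer> iterations) {
        return template.replace("$UC_ITERATION_COUNTS", counterString(iterations));
    }

    public static void main(String[] args) {
        // Outer loop in iteration 2, inner loop in iteration 3
        System.out.println(expand("stdout_$UC_ITERATION_COUNTS.txt", List.of(2, 3)));
    }
}
```

Using the counter in the stdout/stderr redirection keeps each iteration's output in its own file instead of overwriting the previous one.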
83
CS602 83 Job Monitor Actions Get new status for a site, job or task Get stdout, stderr and exported files of a job Remove job from server. Deletes local and remote temporary directories Kill job Hold job execution Resume a job that was held by a „Hold Job“ action or a Hold task Copy a job from the job monitor. The job can be pasted into the job preparation tree and re-run e.g. with different parameters Show dependencies of job Show resources for task
84
CS602 84 Caching Resource Information Client works on cached resource information –UNICORE Sites, Virtual Sites, available resources Resource Cache will be updated on... –... startup –... refresh on „Job Monitoring“ tree node Client uses cached information in Offline mode
85
CS602 85 Accessing other UNICORE Sites UNICORE Sites will be read from an XML file Can be a URL on the web Virtual Sites are configured at the UNICORE Site Job Monitor Root Performing a „Refresh“ on this node will reload UNICORE Sites
86
CS602 86 Configuration: Using Different Identities Using different identities Key entries: Who am I?
87
CS602 87 Browsing Remote File Systems Remote File Chooser –Used in Script Task, Command Task, for File Imports, Exports, etc. Select virtual site or „Local“ Preemptive file chooser mode will enhance performance on fast file systems
88
CS602 88 The Client Log „clientlog.txt“ or „clientlog.xml“ Used by developers to figure out problems User Defaults->Paths: User Defaults->Logging Settings: Enable under Windows, when no console is used Use PLAIN INFO should be fine
89
CS602 89 Starting the Client Revisited client.jar in lib directory –start with .exe (Windows) or run script (Unix/Linux) –or: „ java -jar client.jar “ Command line options –Choose an alternative configuration directory: -Dcom.pallas.unicore.configpath= –Enable the security manager: -Dcom.pallas.unicore.security.manager
90
CS602 90 Outlook: OGSA Grid Services Client UUDB IDB NJS TSI UPL Grid Service UPL GS Factory Registry UPL GS Factory Handles Register XML File Contains Registry handles in addition to classical UNICORE Site addresses HTTPS Request AJO Passes through firewalls Grid Services invisible to user UPL GS Factory Start UPL GS Handle
91
CS602 91 Summary With the UNICORE Client you can easily run and monitor complex jobs on a UNICORE Grid Download the Client from www.unicore.org or www.unicorepro.com and have fun...
92
Manchester Computing Supercomputing, Visualization & e-Science Lectures 6-7: Programming Unicore Client Plug-Ins We now show how the Unicore client can be extended by programming application-specific plugins. This extends standard Java technology to a Grid context and brings flexibility and generality to the Unicore client. We thank Ralf Ratering of Intel for permission to reproduce this material
93
CS602 93 Overview Introduction –Existing Plug-Ins AJO Plugin –An Extension Plugin submitting „raw“ Abstract Job Objects that do appear in the Job Monitor Small Service Plugin –An Extension Plugin using containers for service jobs that do not appear in the Job Monitor Boltzmann Plugin –A Task Plugin that integrates the Boltzmann Lattice simulation into the Client GUI
94
CS602 94 Job Preparation –File, execution and control tasks –Complex workflows –Editing, copying, –saving, etc. Resource Handling Job Monitoring Job Control Remote File Browsing Certificate Handling Functionality of the UNICOREpro Client
95
CS602 95 Plug-In Concept Add your own functionality to the Client! –Heavily used in research projects all over the world –More than 20 plug-ins already exist No changes to basic Client Software needed Plug-Ins are written in Java Distribution as signed Jar Archives
96
CS602 96 Deployment and Installation User gets plugin jar archive from Web-Site, Email, CD-ROM, etc. Store it in Client‘s plugin path: 1. Lib directory 2. User Defaults Plugin directory Client checks plugin jar signature (flowchart with yes/no branches ending in LOAD or REJECT): Is one certificate in the chain a trusted entry in the keystore? Is the signing certificate a trusted entry in the keystore? Add signing certificate to keystore?
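The signature check can be sketched as a small decision function. The branch order below is one plausible reading of the flattened flowchart and is an assumption: reject if no certificate in the chain is trusted, load if the signing certificate itself is trusted, otherwise ask the user whether to add the signing certificate. The class, enum and parameter names are hypothetical.

```java
// Hypothetical sketch of the plugin trust decision; the branch order is an
// assumption reconstructed from the slide, not taken from the Client source.
final class PluginTrustCheck {
    enum Decision { LOAD, REJECT }

    static Decision decide(boolean chainHasTrustedCert,
                           boolean signerTrusted,
                           boolean userAddsSigner) {
        if (!chainHasTrustedCert) {
            return Decision.REJECT;          // nothing in the chain is known
        }
        if (signerTrusted) {
            return Decision.LOAD;            // signer already in the keystore
        }
        // Chain is anchored, but the signer is new: the user decides.
        return userAddsSigner ? Decision.LOAD : Decision.REJECT;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false, true));
        System.out.println(decide(false, false, true));
    }
}
```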
97
CS602 97 Task Plugins Add a new type of task to the Client GUI New task can be integrated into complex jobs Application support: CPMD, Fluent, Gaussian, etc. Add task item Settings item Icon Plugin info
98
CS602 98 Extension Plugins Add any other functionality Resource Broker, Interactive Access, etc. JPA toolbar Settings item Extensions menu Virtual site toolbar Plugin info
99
CS602 99 Supporting an Application at a Site Install the application itself Add entry to the IDB APPLICATION Boltzmann 1.0 Description „Boltzmann Simulation“ INVOCATION [ /usr/local/boltzmann/bin/linuxExec.bin ] END
100
CS602 100 Example Use: CPMD Workflow for Car–Parrinello molecular dynamics code Input: conf_file2 RESTART Input: conf_file1 re-iterate Wavefunction Optimization Geometry Optimization further optimization ? MD Run Output: stdout stderr RESTART.1, LATEST,... Other... Visualization ? further evaluation
101
CS602 101 Example Use: CPMD CPMD plugin constructs UNICORE workflow
102
CS602 102 Example Use: CPMD CPMD wizard assists in setting up the input parameters
103
CS602 103 Example Use: CPMD Visualize results
104
CS602 104 Example Use: On Demand Weather Prediction On demand mesoscale weather prediction system Based on relocatable version of DWD’s prediction model Works from regular prediction data, topography and soil database
105
CS602 105 Example Use: On Demand Weather Prediction User Workstation Topography & soil data Regular prediction data GME2LM interpolation to LM grid LM calculation of mesoscale prediction 1–5 MByte 50–100 MByte LM-forecast data visualisation ~50 MByte input datasets for LM (1–20 GByte)
106
CS602 106 Example Use: Coupled CAE Applications Run coupled aerospace simulations (electromagnetism) Use CORBA as coupling substrate Provide internal portal for Airbus engineers
107
CS602 107 Example Use: Resource Broker Specify resource requests in your job Submit it to a broker site Get back offers from broker
108
CS602 108 Existing Application Plug-Ins FZ Jülich –CPMD, OpenMolGrid ICM Warsaw –Gaussian, Amber, SQL Database Access University of Karlsruhe –Nastran, Fluent, Star-CD T-Systems –Dyna 3D DWD –Local Weather Model Pallas GmbH –POV-Ray, Script, Command, Compile, Globus Proxy Certificate
109
CS602 109 Existing Extension Plug-Ins University of Manchester –Resource Broker Parallab Norway –Interactive Access T-Systems Germany –Billing IDRIS France –Application Coupling ICM Warsaw –Plugin Installer Pallas GmbH –Auto Update, AJO Submitter, Small Service Plugin
110
CS602 110 AJO Plugin Idea: Easy way to develop your own AJOs Use Client infrastructure –Certificates –Usites, Vsites and Resources –User interface Use JMC to control AJO –Watch status –Fetch and display Outcome –Send Control Actions
111
CS602 111 Example: Execute an Application Resource Select an Application Resource and execute it at virtual site Submit AJO containing UserTask Use Job Monitor to get back output Implement 2 classes –Main Plugin Class –AJO Request Class Build a Jar Archive named „*Plugin.jar“ Sign the Jar with your Certificate
112
CS602 112 Using Application Resources Incarnation Data Base APPLICATION AJOTest 1.0 Description „Demo Resource for AJO Plugin“ INVOCATION [ echo „Hello World!“ ] END CLIENT SERVER Network Job Supervisor (NJS) Resource Set Memory (64, 128, 32000)... APPLICATION AJOTest 1.0 APPLICATION CPMD 3.1... Context MPI... Resource Manager Plugin AJOTest resource available? Add to AJO UserTask Display message Submit as Request
113
CS602 113 Client Requests GetFilesFromUSpace SendFilesToUspace GetFilesFromXSpace SendFilesToXSpace GetByteArrayFromXSpace SendByteArrayToXSpace GetListings GetUsites GetVsites GetResources GetRunningJobs GetJobStatus GetOutcome GetSpooledFiles ... Client Observer Request Observable Start as new thread Notify when finished
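The request pattern on this slide (start as a new thread, notify the observer when finished) can be sketched in plain Java. The types below are hypothetical stand-ins for the Client's own observer and request classes, written only to show the control flow; they are not the UNICORE API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the Client's observer interface.
interface RequestObserver {
    void observableUpdate(Object theObserved, Object changeCode);
}

// Hypothetical stand-in for a Client request: runs in its own thread and
// notifies every registered observer with the result when it finishes.
abstract class ObservableRequest extends Thread {
    private final List<RequestObserver> observers = new CopyOnWriteArrayList<>();

    void addObserver(RequestObserver observer) {
        observers.add(observer);
    }

    // Subclasses perform the (possibly slow) remote call here.
    protected abstract Object execute();

    @Override
    public void run() {
        Object result = execute();
        for (RequestObserver observer : observers) {
            observer.observableUpdate(this, result);
        }
    }
}

final class ObserverDemo {
    // Starts the request, waits for it and returns what the observer received.
    static Object runAndCollect(ObservableRequest request) {
        Object[] received = new Object[1];
        request.addObserver((observed, code) -> received[0] = code);
        request.start();
        try {
            request.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        ObservableRequest getVsites = new ObservableRequest() {
            @Override
            protected Object execute() {
                return List.of("VsiteA", "VsiteB");   // made-up reply
            }
        };
        System.out.println(runAndCollect(getVsites));
    }
}
```

In a GUI client the observer would normally not block with join(); it is used here only so that the example terminates deterministically.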
114
CS602 114 Class AJORequest

public class AJORequest extends ObservableRequestThread {
    ...
    public void run() {
        // Wrap the selected application resource in a UserTask
        UserTask userTask = new UserTask("UserTask");
        userTask.addResource(software);
        User user = ResourceManager.getUser(vsite);
        // Build the AJO with a unique name for the chosen Vsite
        AbstractJob job = new AbstractJob("AJORequest_" + ResourceManager.getNextObjectIdentifier());
        job.setVsite(vsite);
        job.setEndorser(user);
        job.add(userTask);
        Reply reply = null;
        try {
            reply = polling(job, vsite, user);
        } catch (Exception e) {
            logger.log(Level.SEVERE, "Submitting AJO in polling mode failed.", e);
        }
        // Pass the reply (null if submission failed) to the observers
        notifyObservers(this, reply);
    }
}

// API summary of the base classes (method bodies omitted)
public abstract class ObservableRequestThread extends ObservableThread {
    public void setInterrupted(boolean interrupted);
    public Reply nonPolling(AbstractJob job, Vsite vsite, User user, Vector streamedFiles);
    public Reply polling(AbstractJob job, Vsite vsite, User user, Vector streamedFiles);
}

public abstract class ObservableThread extends Thread implements IObservable {
    public void addObserver(IObserver anObserver);
    public void deleteAllObservers();
    public void deleteObserver(IObserver anObserver);
    public void notifyObservers(Object theObserved, Object changeCode);
}
115
CS602 115 Class AJOPlugin

// API summary of the plugin base classes (method bodies omitted)
public abstract class UnicorePlugable {
    public HelpSet getHelpSet() { ... }
    public abstract String getPluginInfo();
    public JMenuItem getSettingsItem() { ... }
    public abstract void startPlugin();
    public abstract void stopPlugin();
    protected Client getClient();
}

public abstract class ExtensionPlugable extends UnicorePlugable {
    public JMenuItem getCustomMenu();
    public Component getJPAToolBarComponent();
    public Component getVsiteToolBarComponent();
    public Object setupSpecialVsiteFeatures(Vsite vsite, AbstractJob job);
}

public class AJOPlugin extends ExtensionPlugable implements IObserver {
    public String getPluginInfo() {
        return "AJO plugin example";
    }
    // The start button is placed in the virtual site toolbar
    public Component getVsiteToolBarComponent() {
        return startButton;
    }
    public void startPlugin() {
        startButton = new JButton(new ServiceAction());
    }
    public void stopPlugin() { /* empty */ }
    // Submit the request in its own thread and register for the reply
    private void submitServiceJob(SoftwareResource software, Vsite vsite) {
        AJORequest request = new AJORequest(software, vsite);
        request.addObserver(this);
        request.start();
    }
    public void observableUpdate(Object theObserved, Object changeCode) {
        Reply reply = (Reply) changeCode;
        ...
    }
    private class ServiceAction { ... }
}
116
CS602 116 Small Service Plugin Idea: Do complete handling of jobs from plugin –Build, submit and monitor AJO –Fetch back outcome and exported files Use Client Containers to construct AJO
117
CS602 117 AJOs and Containers Client containers encapsulate complex AJOs Manage imports, exports and execution Hold parameters, keep status, check errors Execute Group Import Group Export Group
118
CS602 118 Container Hierarchy Add your own container
119
CS602 119 Implementing the Container
120
CS602 120 Small Service Plugin Job Directory serviceOutput.txt SmallServiceContainer CLIENT SERVER Execute writes SmallService AJOGetJobStatus Repeat until Status==DONE GetOutcome Spool Area serviceOutput.txt GetSpooledFiles DeleteJob
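The request sequence in this diagram behaves like a small state machine: poll the status until the job is done, then fetch the outcome, fetch the spooled files and delete the job. The following Java sketch models only that ordering; the enum and method names are hypothetical and no UNICORE classes are used.

```java
// Hypothetical model of the Small Service Plugin request sequence.
final class ServiceJobFlow {
    enum Step { GET_JOB_STATUS, GET_OUTCOME, GET_SPOOLED_FILES, DELETE_JOB, FINISHED }

    // Returns the request to send next, given the one that just completed.
    // jobDone is only consulted after a status request.
    static Step next(Step completed, boolean jobDone) {
        switch (completed) {
            case GET_JOB_STATUS:
                return jobDone ? Step.GET_OUTCOME : Step.GET_JOB_STATUS;
            case GET_OUTCOME:
                return Step.GET_SPOOLED_FILES;
            case GET_SPOOLED_FILES:
                return Step.DELETE_JOB;
            default:
                return Step.FINISHED;
        }
    }

    public static void main(String[] args) {
        Step step = Step.GET_JOB_STATUS;
        int polls = 0;
        while (step != Step.FINISHED) {
            System.out.println(step);
            boolean done = ++polls >= 3;   // pretend the job finishes on the third poll
            step = next(step, done);
        }
    }
}
```

In the plugin itself this ordering is driven by the observer callback, which receives each finished request and sends the next one.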
121
CS602 121 Class SmallServiceContainer

public class SmallServiceContainer extends UserContainer {
    ...
    public void buildActionGroup() {
        String unicoreDir = ResourceManager.getUserDefaults().getUnicoreDir();
        String userHome = ResourceManager.getUserDefaults().getUserHome();
        // Export serviceOutput.txt to a file of the same name in the user's home directory
        String filename = userHome + File.separator + "serviceOutput.txt";
        FileExport[] exports = {
            new FileExport(this, FileStorage.NSPACE_STRING, "serviceOutput.txt", filename, true, true)
        };
        setFileExports(exports);
        super.buildActionGroup();
    }
}
122
CS602 122 Class SmallServicePlugin

public class SmallServicePlugin extends ExtensionPlugable implements IObserver {
    // Each finished request triggers the next one in the sequence
    public void observableUpdate(Object theObserved, Object changeCode) {
        if (theObserved instanceof GetJobStatus) {
            ...
            if (status == AbstractActionStatus.DONE) {
                sendGetOutcome();
            }
        } else if (theObserved instanceof GetOutcome) {
            sendGetSpooledFiles();
        } else if (theObserved instanceof GetSpooledFiles) {
            sendDeleteJobs();
        } else if (theObserved instanceof DeleteJob) {}
    }
    // Build the job once: a JobContainer holding a single SmallServiceContainer
    public void startPlugin() {
        job = new JobContainer();
        task = new SmallServiceContainer(job);
        job.addTask(task);
        startButton = new JButton(new ServiceAction());
    }
    private void submitServiceJob(Vsite vsite) {
        job.setName(ResourceManager.getServicePrefix() + "SmallServiceJob"
            + ResourceManager.getNextObjectIdentifier());
        job.setVsite(vsite);
        job.setUser(ResourceManager.getUser(vsite));
        job.run();
    }
}
123
CS602 123 The Lattice Boltzmann Application Simulation of fluid mixing Output: a gif animation Intermediate sample files are generated Control file can change parameters while application is executing Input file (slide callouts: Duration, „Mixing Factor“):
{
    folder="."; initcond="spinodal"; steerfile="control"; gifanimfile="output.gif";
    unicore_demo = 1; writecolour=1; writecolgif=1; makedir = "yes";
    g_cc=2.0; tau_r = 1.0; tau_b = 1.0; rho = 1.0;
    tmax=5000; dt = 10; gravity=0.0; nx=128; ny=128;
}
124
CS602 124 Command Task CLIENT Job Directory Running Boltzmann using a Command Task Input BoltzmannInput.txt SERVER Import with renaming C:\tmp\output.gif Export output.gif readswrites Boltzmann Application Resource Execute Set tmax to 300
125
CS602 125 Disadvantages of Command Task Input file has to be edited outside Client Imports and Exports have to be specified manually No integrated GUI for parameters Results have to be visualized outside client No additional functionality possible –sample files –application steering Use a specialized Boltzmann Plugin Task!
126
CS602 126 The Boltzmann Plugin Task Plugin –Add Boltzmann tasks to jobs –Input file editor –Automatically import input file –Export and visualize sample files –Send control files Implemented Classes –Main plugin class –Plugin Container –JPA Panel –Sample Panel –Control Panel
127
CS602 127 Class BoltzmannPlugin Icon Format

public class BoltzmannPlugin extends TaskPlugable {
    // Create a new, uniquely named Boltzmann task container
    public ActionContainer getContainerInstance(GroupContainer parentContainer) {
        BoltzmannContainer container = new BoltzmannContainer(parentContainer);
        container.setName("New_" + getName() + counter);
        counter++;
        return container;
    }
    public String getIconPath() {
        return "org/gridschool/unicore/plugins/boltzmann/boltzmann.gif";
    }
    public String getName() {
        return "Boltzmann";
    }
    public String getPluginInfo() {
        return "Grid School Example: The Boltzmann Plugin";
    }
    // Panel shown in the job preparation area for this task
    public JPAPanel getPanelInstance(ActionContainer container) {
        return new BoltzmannJPAPanel(getClient(), (BoltzmannContainer) container);
    }
    public void startPlugin() {}
    public void stopPlugin() {}
}
128
CS602 128 PluginJPAPanel CLIENT Job Directory Run and steer Boltzmann from Plugin Input output.gif SERVER Boltzmann Application Resource PluginContainer Export Input file Execute reads SamplePanel Sample.gif writes Get File From Uspace Request writes ControlPanel Control Send File To Uspace Request reads Editor Export Panel
129
CS602 129 Class BoltzmannJPAPanel Set parameters in container Use RemoteTextEditor, ImportPanel and ExportPanel Implements interface Applyable ContainerJPAPanel applyValues resetValues updateValues
130
CS602 130 Remote Text Editor Load, edit and save files from remote and local file spaces

private RemoteTextEditor textEditor = new RemoteTextEditor();

private void buildComponents() {
    JTabbedPane tabbedPane = new JTabbedPane();
    tabbedPane.add(textEditor, "Input File");
    ...
}
// Copy the GUI state into the container
public void applyValues() {
    container.setInputFile(textEditor.getFile());
    container.setInputString(textEditor.getText());
    ...
}
// Restore the GUI state from the container
public void resetValues() {
    textEditor.setText(container.getInputString());
    textEditor.setFile(container.getInputFile());
    ...
}
public void updateValues(boolean vsiteChanged) {
    if (vsiteChanged) {
        textEditor.setVsite(container.getVsite());
    }
    ...
}
131
CS602 131 Import and Export Panels Specify file imports and exports from the GUI Use out of the box New Import Remove Import Browse file systems
132
CS602 132 Class BoltzmannContainer

public class BoltzmannContainer extends UserContainer {
    private String inputString;

    protected void buildExecuteGroup() {
        // Write the edited input text into the job directory before execution
        byte[] contents = StringTools.dos2Unix(inputString).getBytes();
        IncarnateFiles incarnateFiles = new IncarnateFiles("INCARNATEFILES");
        incarnateFiles.addFile(INPUT_FILENAME, contents);
        // Request the preinstalled Boltzmann software in addition to the task's resources
        ResourceSet taskResourceSet = getResourceSet().getResourceSetClone();
        taskResourceSet.add(getPreinstalledSoftware());
        UserTask executeTask = new UserTask(getName(), null, taskResourceSet, getEnv(),
            getCommandLine(), null, getRedirectStdout(), getRedirectStderr(),
            isVerboseOn(), isVersionOn(), null, getMeasureTime(), getDebug(), getProfile());
        executeGroup = new ActionGroup(getName() + "_EXECUTION");
        executeGroup.add(incarnateFiles);
        executeGroup.add(executeTask);
        try {
            // The input file must exist before the application starts
            executeGroup.addDependency(incarnateFiles, executeTask);
        } catch (InvalidDependencyException e) {
            logger.log(Level.SEVERE, "Cannot add dependency.", e);
        }
    }

    public ErrorSet checkContents() {
        ErrorSet err = super.checkContents();
        if (inputString == null || inputString.trim().length() == 0) {
            err.add(new UError(getIdentifier(), "No input file specified"));
        }
        return err;
    }
}
133
CS602 133 Additional Outcome Panels Implement interface IPanelProvider in Container

public class BoltzmannContainer extends UserContainer implements IPanelProvider {
    ....
    public int getNrOfPanels() {
        return 2;
    }
    // Panel 0 is the sample panel, panel 1 the control panel; both are created lazily
    public JPanel getPanel(int i) {
        if (i == 0) {
            if (samplePanel == null) {
                samplePanel = new BoltzmannSamplePanel();
            }
            return samplePanel;
        } else {
            if (controlPanel == null) {
                controlPanel = new BoltzmannControlPanel();
            }
            return controlPanel;
        }
    }
    public String getPanelTitle(int i) {
        if (i == 0) {
            return "Sample";
        } else {
            return "Control";
        }
    }
    public void finalizePanel() {}
}
134
CS602 134 Class BoltzmannControlPanel

public class BoltzmannControlPanel extends JPanel implements IObserver {
    private RemoteTextEditor editor;
    ...
    private JobContainer getJobContainer() {
        return ResourceManager.getCurrentInstance().getJMCTree().getCurrentJob();
    }
    private BoltzmannContainer getBoltzmannContainer() {
        return (BoltzmannContainer) ResourceManager.getCurrentInstance().getJMCTree().getFocussedObject();
    }
    // Send the edited control file into the Uspace of the running job
    private void sendControlFile() {
        JobContainer jobContainer = getJobContainer();
        AJOIdentifier ajoId = (AJOIdentifier) jobContainer.getIdentifier();
        Vsite vsite = jobContainer.getVsite();
        String[] filenames = {CONTROL_FILE};
        byte[][] contents = new byte[1][];
        String inputString = StringTools.dos2Unix(editor.getText());
        contents[0] = inputString.getBytes();
        SendFilesToUspace request = new SendFilesToUspace(ajoId, filenames, contents, vsite);
        request.addObserver(this);
        request.start();
    }
    public void observableUpdate(Object theObserved, Object changeCode) {
        if (theObserved instanceof SendFilesToUspace) {
            AbstractJob_Outcome outcome = (AbstractJob_Outcome) changeCode;
            logger.info("SendFilesToUspace result: " + outcome.getStatus());
        }
    }
}
135
CS602 135 Class BoltzmannSamplePanel

public class BoltzmannSamplePanel extends JPanel implements IObserver {
    ...
    // Fetch the current sample image from the Uspace of the running job
    private void getSampleFile() {
        JobContainer jobContainer = getJobContainer();
        AJOIdentifier ajoId = (AJOIdentifier) jobContainer.getIdentifier();
        Vsite vsite = jobContainer.getVsite();
        String[] filenames = {SAMPLE_FILE};
        GetFilesFromUspace request = new GetFilesFromUspace(ajoId, filenames, vsite);
        request.addObserver(this);
        request.start();
    }
    public void observableUpdate(Object theObserved, Object changeCode) {
        if (theObserved instanceof GetFilesFromUspace) {
            AbstractJob_Outcome outcome = (AbstractJob_Outcome) changeCode;
            logger.info("GetFileFromUspace result: " + outcome.getStatus());
            if (outcome.getStatus().isEquivalent(AbstractActionStatus.SUCCESSFUL)) {
                // Display the first fetched file in the image panel
                GetFilesFromUspace request = (GetFilesFromUspace) theObserved;
                File imageFile = (File) request.getLocalFiles().firstElement();
                Image image = Toolkit.getDefaultToolkit().createImage(imageFile.getAbsolutePath());
                imagePanel.setImage(image);
                imagePanel.repaint();
                return;
            }
        }
    }
}
136
CS602 136 Summary Extension Plugins –Easy way to submit custom AJOs –Use Client infrastructure Task Plugins –Integrated Application support –Use sub classes of UserContainer –Use Client GUI elements UNICOREpro Client Plugin Programmer‘s Guide –www.unicorepro.com → Documents
137
Manchester Computing Supercomputing, Visualization & e-Science Lecture 8: Resource Broker A resource broker for Unicore. This software was designed to provide an important Grid abstraction, namely that the middleware should find the resources appropriate to the user's request. In this way the user does not need to know what resources are on the Grid or to maintain lists of appropriate resources.
138
CS602 138 Abstract Functions for a Resource Broker Resource discovery, for workflows as well as single jobs. Resource capability checking: do the offering sites have ALL the necessary capability and environmental support for instantiating the workflow? Inclusion of Quality of Service policies in the offers. Information necessary for the negotiation between client and provider, and mechanisms for ensuring contract compliance. Document submitted to the GPA-RG group of GGF.
139
CS602 139 Design of EuroGrid Resource Broker To utilise the structure of UNICORE, in particular the AJO. To utilise the Usite/Vsite structure, in particular to extend the Vsite to the concept of a Brokering Vsite. Two modes of operation are possible: A simple Resource Check request: “Can this job run here?” This checks static qualities such as software resources (e.g. Gaussian98 A9) as well as dynamic resources such as quotas (disk quotas, CPU, etc.). A Quality of Service request: returns a range of turnaround times and costs as part of a Ticket. If the Ticket is presented with the job (within its lifetime), the turnaround and cost estimates should be met.
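The Ticket idea can be illustrated with a small Java value class. This is a sketch under stated assumptions: the field names, the use of a single maximum cost and the validity rule are hypothetical and are not taken from the EuroGrid broker API. It only shows that the estimates bind the site while the ticket is within its lifetime.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical ticket shape; field names are illustrative assumptions.
final class QosTicket {
    final Duration minTurnaround;
    final Duration maxTurnaround;
    final double maxCost;
    final Instant issuedAt;
    final Duration lifetime;

    QosTicket(Duration minTurnaround, Duration maxTurnaround, double maxCost,
              Instant issuedAt, Duration lifetime) {
        this.minTurnaround = minTurnaround;
        this.maxTurnaround = maxTurnaround;
        this.maxCost = maxCost;
        this.issuedAt = issuedAt;
        this.lifetime = lifetime;
    }

    // A ticket presented after its lifetime no longer guarantees the estimates.
    boolean isValidAt(Instant submissionTime) {
        return !submissionTime.isBefore(issuedAt)
                && submissionTime.isBefore(issuedAt.plus(lifetime));
    }

    public static void main(String[] args) {
        Instant issued = Instant.parse("2003-01-01T12:00:00Z");
        QosTicket ticket = new QosTicket(Duration.ofMinutes(30), Duration.ofHours(2),
                45.0, issued, Duration.ofHours(1));
        System.out.println(ticket.isValidAt(issued.plus(Duration.ofMinutes(10))));
        System.out.println(ticket.isValidAt(issued.plus(Duration.ofHours(3))));
    }
}
```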
140
CS602 140 Ancestral Broker The API allows two levels of operation: Resource Checking: Static requirements, capability and capacity. QoS Checking: Performance vs cost. Tickets can be issued as a “guarantee”. Protocol can be used symmetrically by Broker. User Broker NJS Execution NJS Execution NJS 1 CheckQoS 2 CheckQoS 3 CheckQoS_Outcome 4 CheckQoS_Outcome
141
CS602 141 The Brokering Process T3E O3000 VPP300 Broker User 1. QoS Request 4. Ticket(s) 3. Ticket(s) 2. QoS Request
142
CS602 142 Resource Broker Graph Manchester Computing EUROGRID Green (O3000) Fuji (VPP) Turing (T3E) IDRIS Cray T3E IBM ??? LeSC UK Grid ?? ? T3Es `R’ US BrokeringVsite Dumb Vsite Figure 1: Possible Resource Broker Graph
143
CS602 143 Advanced Features of the Broker In addition, the interface allows the following: A Ticket may contain a modified resource set, which must be used with the Ticket. Multiple Tickets may be returned from a single site. This powerful mechanism allows a Broker to: Bid for jobs requiring resources it can’t find, e.g. when receiving a request for 256 processors for 1 hour, it could return a Ticket for 128 processors for twice as long. Return a spread of offers, representing different priorities and corresponding costs. Refine abstract application-specific resource requirements to concrete resource requirements, using built-in performance information about the code.
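The "128 processors for twice as long" example can be written down as a small rescaling rule. The Java sketch below assumes ideal scaling (halving the processors exactly doubles the run time), which is a simplifying assumption made here for illustration; a real broker would use performance information about the code. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a broker producing a modified resource set.
final class TicketOffers {
    static final class Offer {
        final int processors;
        final double hours;

        Offer(int processors, double hours) {
            this.processors = processors;
            this.hours = hours;
        }

        @Override
        public String toString() {
            return processors + " processors for " + hours + " h";
        }
    }

    // Halves the processor count and doubles the time until the request fits
    // the site. Returns an empty list if no such rescaling fits.
    static List<Offer> rescale(int processors, double hours, int available) {
        List<Offer> offers = new ArrayList<>();
        int p = processors;
        double h = hours;
        while (p > available && p % 2 == 0) {
            p /= 2;
            h *= 2;
        }
        if (p <= available) {
            offers.add(new Offer(p, h));
        }
        return offers;
    }

    public static void main(String[] args) {
        // Request from the slide: 256 processors for 1 hour, site has 128
        System.out.println(rescale(256, 1.0, 128));
    }
}
```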
144
CS602 144 Example Application-Specific Broker An application Specific Resource Requirement is defined, that can express “Run the DWD local weather model code, over X grid points, simulating a Y hour period”. Standardisation is controlled by the Sun-style Community Source Licence (new interfaces must be returned to the community for publication). A broker can then be designed that takes such a resource requirement, e.g. “DWD local weather model code, over 1000 grid points, simulating over a 24 hour period”, and returns concretised offers, such as “16 T3E processors for an hour, done by midday, for £32”, “32 O3000 processors for 30 minutes, done in an hour, for £45”. So the knowledge of the application performance is kept in a single place – the Broker, away from the users. Consequently, the users never have to learn the code’s performance characteristics.
145
CS602 145 Interoperability across Grids The emerging infrastructure with multiple Grids is already complex. One cannot guarantee to have a uniform middleware such as UNICORE or Globus across all Grids. Therefore a translation service is necessary. We can link this to semantic information via a Grid Resource Ontology. We are then starting to get to the right level of abstraction for a genuine infrastructure for computational resources. It no longer matters what you call this: the abstractions reflect the underlying reality of usage and must be flexible enough to change with differing usage.
146
CS602 146 Interoperable Broker: Method 1 1.The Network Job Supervisor delegates the Resource Check to the Broker at the Vsite. 2.The Unicore brokering track utilises the Incarnation Data Base exactly as for the ancestral broker. 3.The Globus track uses a translator of the QoS check object. The translation service is extendable. 4.The results of the translation are used to drive the LDAP search and the Globus broker then utilises MDS to perform this.
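Step 4 says that the translated requirements drive an LDAP search. As a rough illustration, the Java sketch below turns a set of minimum requirements into an LDAP filter string in the standard `(&(attr>=value)...)` form. The attribute names in `main` are placeholders modelled on MDS-style naming and should be treated as assumptions, as should the class and method names; the actual translator and schema are not shown in the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: translated resource minimums become an LDAP filter.
final class LdapFilterSketch {
    // Builds a conjunction of ">=" terms, one per required attribute.
    static String minimumFilter(Map<String, Integer> minimums) {
        StringBuilder filter = new StringBuilder("(&");
        for (Map.Entry<String, Integer> entry : minimums.entrySet()) {
            filter.append('(')
                  .append(entry.getKey())
                  .append(">=")
                  .append(entry.getValue())
                  .append(')');
        }
        return filter.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> minimums = new LinkedHashMap<>();
        minimums.put("Mds-Cpu-Total-count", 16);            // placeholder attribute name
        minimums.put("Mds-Memory-Ram-Total-sizeMB", 1024);  // placeholder attribute name
        System.out.println(minimumFilter(minimums));
    }
}
```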
147
CS602 147 NJS Broker Unicore Broker Globus Broker IDB Translator Filter Basic Translator MDS (GIIS/GRIS) Delegates resource check Lookup resources Delegates translation Uses to drive LDAP search Performs Diagram Of Broker Architecture Architecture: Method 1
148
CS602 148 Ontologies Ontologies are needed at BOTH the application and the infrastructure level. If we can create a Grid Resource Ontology, it becomes possible to create specialist translation classes from the basic Grid translator. The Incarnation Data Base at each site can be created via the ontology; it contains site-specific information that the client's job specification cannot. So brokers take a client request formulated in RR space, use the translator at each site to convert to RR space, and offers come back with capability and QoS.
149
CS602 149 NJS Broker Unicore Broker Globus Broker IDB Translator Filter Ontology engine Resource Discovery Service Resource Discovery Service Delegates resource check Lookup resources Delegates translation Uses to drive MDS search Hierarchical Grid Search Diagram Of Broker Architecture Architecture: Method 2 Filter Uses to Drive MDS Search Nodal Grid Search Other Brokers Resource Discovery Service Resource Discovery Service
150
Manchester Computing Supercomputing, Visualization & e-Science Lectures 9-10: Grid Interoperability A useful test of the validity of the Grid concept is to show that different middleware systems can be made to interoperate. Here we show how the Grid Interoperability Project enabled the high-level abstractions of Unicore to be mapped onto the Globus toolkit. We thank Phillip Wieder for permission to use this material.
151
CS602 151 Outline Introduction to GRIP Interoperability Layer: Design Interoperability Layer: Realisation UNICORE – Globus: Work in Progress Summary
152
CS602 152 The GRid Interoperability Project Development of an interoperability layer between the two Grid systems Interoperable applications Contributions made to the Global Grid Forum UNICORE towards Grid Services … to realise the interoperability of UNICORE and Globus and to work towards standards for interoperability in the Global Grid Forum: See http://www.grid-interoperability.org
153
CS602 153 Focus of this Talk … to describe the UNICORE – Globus interoperability layer in detail, concentrating on the design & the implementation. Out of scope Interoperable applications Standardization work Briefly discussed Interoperable resource broker
154
CS602 154 Interoperability Layer: Design
155
CS602 155 Client NJS Gateway TSI UUDB Gateway NJS (brokering) USite Starting Point: UNICORE Architecture UPL (Unicore Protocol Layer) NJS TSI Protocol IDB Abstract Non-abstract authorisation authentication incarnation multi-site jobs Batch system cmds files UPL Target System A Uspace Target System B
156
CS602 156 The software the TSI interfaces to can be: Batch system or scheduler UNIX shell (batch system emulated) GRID resource manager component (like Globus GRAM) Term “Batch system” used in this talk Target system == Execution system TSI runs on Note of Clarification
157
CS602 157 Client NJSUUDB Gateway Interfacing Globus through UNICORE IDB Globus server Target System Uspace TSI Globus client MDS Globus host MDS – Monitoring & Discovery Service
158
CS602 158 Challenges Grid approach UNICORE: User oriented workflow environment Globus: Services, APIs & portal builder UNICORE as a workflow portal for Globus Security UNICORE: End-to-end security model Globus: Requires transitive trust Don’t violate UNICORE’s security model Resource description UNICORE: One model for discovery & request Globus: Different models Map from MDS (LDAP), map to RSL
159
CS602 159 Interoperability Modules The following modules have been defined: Security Resources & information Job preparation Job submission & monitoring & control Output retrieval & file management
160
CS602 160 Interoperability Layer: Realisation
161
CS602 161 Security Basics Public/private key infrastructure to establish connections X509v3 certificates (incl. extensions) UNICORE: End-to-end security, jobs signed Keys & certificates are stored in a keystore at the client side Globus: Transitive trust, proxy certificates Keys & certificates are stored on the file system
162
CS602 162 Client NJS Gateway Interfacing Globus through UNICORE Globus server Target System Uspace TSI Globus client MDS Globus host X.509 user cert Globus proxy GSI enabled auth & comm GSI – Grid Security Infrastructure
163
CS602 163 Security Interoperation Proxy Certificate Plugin generates a proxy from the UNICORE user’s private key The proxy certificate is transferred to the user’s Uspace Proxy used for every task involving GSI enabled authentication & communication Configure Globus client (TSI) to use proxy Configure Globus server to trust signing CA Details next slide...
164
CS602 164 CLIENT SERVER SSO Proxy certificate Create proxy & encapsulate in Site-specific Security Object (SSO) Network Job Supervisor (NJS) Gateway Job Directory (Uspace) Unpack proxy into $USPACE/.proxy Proxy certificate Proxy Certificate Creation & Transfer
165
CS602 165 Resources & Information Globus host specific information (hostname, port,...) is configured at the TSI No extensions to the UNICORE Incarnation Database Interoperable Resource Broker for UNICORE IDB and Globus Monitoring & Discovery Service (MDS) Alpha version Currently mapping between UNICORE & MDS resource descriptions Extensible
166
CS602 166 Filter Resource Broker Architecture (early 2003) Translator GIIS – Grid Index Information Service GRIS – Grid Resource Information Service Broker UNICORE Broker Globus Broker NJS IDB MDS (GIIS/GRIS) Basic Translator Delegates resource check Lookup resources Delegates translation Uses to drive LDAP search Performs LDAP search Diagram: John Brooke, University of Manchester
167
CS602 167 The Target System Interface (TSI)... implements the target system/batch system specific functions to manage the incarnated tasks on the specific system. Normally runs as root (set*id) Single threaded, multiple workers to support multi-threaded NJS NJS – TSI communication via plain sockets Two implementations: Perl & Java
168
CS602 168 TSI Flavours TSI Grid Service work in progress Globus 3 Globus 2 Unix Shell × Batch system JavaPerl TSI Impl. Target × × × work in progress prototype using OGSI::Lite
169
CS602 169 TSI: Perl Implementation
Vendor | Type | OS | Batch Sub-System
Hitachi | SR 8000 | HI-UX/MPP | NQS
IBM | SP | AIX | LoadLeveler (+DCE), LSF (prototype)
Fujitsu | VPP series | UXP/V | NQS
NEC | SX series | Super UX | NQS
Cray | T3E, SV1 | UNICOS | NQE
Various PCs | IA32 clusters | Linux | PBS, CCS
SGI | O2000/3000, Onyx | IRIX | NQS
Workstations (e.g. SUN, SGI, Linux) | | native | emulated
170
CS602 170 TSI: Java Implementation... implements the same functionality as the Perl TSI. Alpha version. Unix only, since it uses set*id via the Java Native Interface (JNI). The Globus 2 version makes use of the Java CoG Kit. Basis for the interface to Globus Toolkit 3 (work in progress). NJS remains unchanged.
171
CS602 171 The Globus TSI Implemented interop. modules: Job preparation Job submission & monitoring & control Output retrieval & File management Target system: Globus Toolkit 2.x Perl (beta, inside firewall) & Java (alpha, outside firewall) implementations Both versions under development Current focus: GT3 & TSI Grid Service
172
CS602 172 TSI: Architecture (Perl Implementation) [Diagram components: NJS, TSI Shepherd, TSI Workers, Batch-/Operating system (PBS, LSF, Linux, ...). Labels: initiate; control / data; fork; batch / os commands.]
173
CS602 173 Globus TSI [Diagram components: NJS, TSI Shepherd, TSI Workers, Globus proxy, Globus client, Globus server, Globus 2, Batch-/Operating System. Labels: initiate; control / data; fork; Globus protocols.]
174
CS602 174 Job Preparation: mapping from the incarnated UNICORE job to a Globus RSL job
Incarnated UNICORE job:
#TSI_USPACE_DIR /filespace/uspace_d3d775a/
#TSI_OUTCOME_DIR /filespace/outcome_d3d775a/.../
#TSI_TIME 600
#TSI_MEMORY 1024
#TSI_NODES NONE
#TSI_PROCESSORS 1
Globus RSL job:
&("executable"=/var/eurogrid/Globus/bin/globus-sh-exec)
("directory"=/filespace/uspace_d3d775a/)
("hostCount"="1")
("count"="1")
("maxTime"="10")
("maxMemory"="1024")
("queue"="low")
("stdout"=https://...:39553/filespace/outcome_d3d775a/.../stdout)
("stderr"=https://...:39553/filespace/outcome_d3d775a/.../stderr)
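The mapping on this slide can be illustrated with a short sketch. It is not the JobPreparation module itself: the function names are hypothetical, and the conversion rules are assumptions read off the slide's example (600 in `#TSI_TIME` becoming `maxTime` 10 suggests seconds to minutes, and `#TSI_NODES NONE` appears as `hostCount` 1). The GASS URL used in the usage note is a placeholder.

```python
def parse_directives(script: str) -> dict:
    """Collect '#TSI_NAME value' lines from an incarnated job into a dict."""
    directives = {}
    for line in script.splitlines():
        if line.startswith("#TSI_"):
            name, _, value = line[1:].partition(" ")
            directives[name] = value.strip()
    return directives


def quote(value: str) -> str:
    return '"' + value + '"'


def tsi_to_rsl(directives: dict, executable: str, outcome_url: str) -> str:
    """Build an RSL string in the style shown on the slide."""
    # Assumption: TSI_TIME is in seconds, maxTime in minutes (round up).
    minutes = -(-int(directives["TSI_TIME"]) // 60)
    nodes = directives.get("TSI_NODES", "NONE")
    relations = [
        ("executable", executable),
        ("directory", directives["TSI_USPACE_DIR"]),
        # Assumption: "NONE" means a single host.
        ("hostCount", quote("1" if nodes == "NONE" else nodes)),
        ("count", quote(directives["TSI_PROCESSORS"])),
        ("maxTime", quote(str(minutes))),
        ("maxMemory", quote(directives["TSI_MEMORY"])),
    ]
    if "TSI_QUEUE" in directives:
        relations.append(("queue", quote(directives["TSI_QUEUE"])))
    relations.append(("stdout", outcome_url + "stdout"))
    relations.append(("stderr", outcome_url + "stderr"))
    return "&" + " ".join(f'("{name}"={value})' for name, value in relations)
```

Usage: `tsi_to_rsl(parse_directives(job_text), "/var/eurogrid/Globus/bin/globus-sh-exec", "https://gass.example.org:39553/outcome/")`, where the URL stands in for the GASS server address that is elided on the slide.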
175
CS602 175 Job Submission, Monitoring & Control [Diagram components: TSI, TSI Worker, Globus proxy, GRAM Client, GRAM Gatekeeper, GRAM Job Manager, Globus 2, Batch-/Operating system. Labels: job submission; create; job control & status info.] Submitted RSL: &("executable"=/var/eurogrid/Globus/bin/globus-sh-exec) ("directory"=/filespace/uspace_d3d775a/) ("hostCount"="1") ("count"="1") ("maxTime"="10") ("maxMemory"="1024") ("queue"="low") ("stdout"=https://...:39553/filespace/outcome_d3d775a/.../stdout) ("stderr"=https://...:39553/filespace/outcome_d3d775a/.../stderr) GRAM – Globus Resource Allocation Manager
176
CS602 176 TSI Worker Globus proxy GRAM Client GASS Server GRAM Gatekeeper Globus 2 GRAM Job Manager GASS Client Batch-/Operating system TSI Output Retrieval stdout & stderr create Job submission Job control & status info GASS – Global Access to Secondary Storage
177
CS602 177 File Management. Necessary if TSI & Globus are on different target systems. Uses GridFTP or GASS (automatic staging possible). Maintenance of a remote Uspace (“Gspace”). [Diagram components: TSI, TSI Worker, Globus proxy, GRAM Client, GASS Server, GRAM Gatekeeper, GRAM Job Manager, GASS Client, Globus 2, GSpace, USpace. Labels: job submission; create; job control & status info; stdout & stderr; file staging & maintenance.]
178
CS602 178 Globus 2 Target System Interface [Diagram components: NJS, TSI Shepherd, TSI, Globus proxy, GRAM Client, GASS Server, GRAM Gatekeeper, GRAM Job Manager, GASS Client, Globus 2, Batch-/Operating System, GSpace, USpace. Labels: initiate; control / data; fork; job preparation; Globus protocols; create.]
179
CS602 179 Globus API. Submission: globusrun (returns ). Monitoring: globusrun -status. Control: globus-job-cancel. Output retrieval: globus-gass-server. File transfer: globus-url-copy (supports GridFTP & HTTP(S) for GASS transfers). ... or the corresponding Java Commodity Grid (CoG) Kit API methods (Java TSI).
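As a rough illustration of how a TSI module might drive the command-line tools named on this slide, the sketch below only assembles argument lists and offers a thin wrapper to run them. The helper names are hypothetical. The `-b` (batch) and `-r` (resource contact) options to `globusrun` are given as I recall them from Globus Toolkit 2 and should be treated as assumptions to check against the local installation; the slide itself names only the commands.

```python
import subprocess


def submit_command(resource_contact: str, rsl: str) -> list:
    # Assumed GT2 usage: submit in batch mode to the given resource manager.
    return ["globusrun", "-b", "-r", resource_contact, rsl]


def status_command(job_contact: str) -> list:
    return ["globusrun", "-status", job_contact]


def cancel_command(job_contact: str) -> list:
    return ["globus-job-cancel", job_contact]


def copy_command(source_url: str, destination_url: str) -> list:
    return ["globus-url-copy", source_url, destination_url]


def run(argv: list) -> str:
    """Run one command and return its standard output without the newline."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout.strip()
```

Keeping command construction separate from execution makes the mapping from TSI operation to Globus tool easy to inspect and to test without a Globus installation.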
180
CS602 180 Behind the Scenes: Create Uspace
# Incarnation of task
# Incarnation produced for Vsite at......
#TSI_IDENTITY zdv190 NONE
#TSI_USPACE_DIR /opt/Unicore/filespace/uspace_8fdce574/
#TSI_EXECUTESCRIPT
# Commands to incarnate a Uspace
/bin/mkdir -p -m700 /opt/Unicore/filespace/uspace_8fdce574/
/bin/mkdir -p -m700 /opt/Unicore/filespace/outcome_8fdce574/...
181
CS602 181 Behind the Scenes: Job Submission
# Incarnation of task...
#TSI_IDENTITY zdv190 NONE
#TSI_USPACE_DIR /opt/Unicore/filespace/uspace_8fdce574/
#TSI_SUBMIT
#TSI_JOBNAME SimpleScript
#TSI_OUTCOME_DIR /opt/Unicore/filespace/outcome_8fdce574/AA..
#TSI_TIME 600
#TSI_MEMORY 1024
#TSI_NODES 1
#TSI_PROCESSORS 1
#TSI_HOST_NAME zam289_grip_test...
#TSI_QUEUE low
#TSI_EMAIL NONE...
# Incarnation of ExecuteTask, UserTask or ExecuteScriptTask...
RETURNS: https://zam289.zam.kfa-juelich.de:32894/2796/1061744497/
182
CS602 182 Behind the Scenes: Job Monitoring
#TSI_IDENTITY zdv190 NONE
#TSI_GETSTATUSLISTING
RETURNS:
QSTAT
https://zam289.zam.kfa-juelich.de:32894/2796/1061744497/ RUNNING
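A consumer of the status reply above might parse it as follows. This is a sketch under an assumption about layout: a `QSTAT` header line followed by one `<job ID> <state>` pair per line. The slide shows the reply flattened, so the line structure is inferred, and the function name is hypothetical.

```python
def parse_status_listing(reply: str) -> dict:
    """Map each job ID in a QSTAT reply to its reported state."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    if not lines or lines[0] != "QSTAT":
        raise ValueError("status listing must start with QSTAT")
    states = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected '<job ID> <state>', got: {line!r}")
        job_id, state = fields
        states[job_id] = state
    return states
```

Because the Globus job contact is a URL without spaces, splitting on whitespace is enough to separate it from the state.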
183
CS602 183 TSI Modules. tsi: Perl script to be executed, TSI configuration; Globus server information. Initialisation: contact NJS; create Workers; start repository process. MainLoop: listen to NJS & process input; no changes.
184
CS602 184 TSI Modules (cont.): job submission/monitoring/control. Submit: job submission to resource manager, returns jobID; prerequisites: pre-staging complete, target job description available. GetStatusListing: returns list of states (SUCCESSFUL, FAILED, PENDING, QUEUED, EXECUTING) for known jobIDs. JobControl: abort, cancel, hold, resume.
185
CS602 185 TSI Modules (cont.): file transfer. PutFiles: writes files sent by NJS to target system. GetDirectory: returns directory & content to NJS. EndProcessing: job finished (check for stdout & stderr)? Close GASS server, update repository. Reporting: logging, debugging; log Globus output.
186
CS602 186 TSI Modules (cont.). BecomeUser: set*id; no changes. ExecuteScript: execute script; no changes. Globus TSI specific: DataTransfer: GASS control. Globus: job repository & Globus-specific variables. JobPreparation: mapping from UNICORE job description to RSL.
187
CS602 187 “Classic” TSI Setup. TSI & Globus on target system & inside firewall: + Ignore Globus firewall issues. + Uspace == “Gspace”. - “Restricted” interoperability -> no direct remote access. [Diagram components: Client, SSL, client firewall, demilitarized zone, Gateway, server firewall, NJS, TSI, Globus Server, SSL server.]
188
CS602 188 “Remote” TSI Setup. TSI & Globus outside firewall & on different machines: + Interoperation with any Globus server possible. - Maintenance of temporary “Gspace”. [Diagram components: Client, SSL, client firewall, demilitarized zone, Gateway, server firewall, NJS, TSI, Globus Server, SSL server.]
189
CS602 189 UNICORE – Globus: Work in Progress
190
CS602 190 TSI Developments. Java TSI as Grid Service Client (GT3): currently only job submission; add file transfer & other services. TSI Grid Service & NJS – TSI protocol: TSI portType(s) (WSDL); XML Schema message definition; Perl TSI & OGSI::Lite hosting environment.
191
CS602 191 Interfacing GT3 GRAM [Diagram components: NJS, TSI Shepherd, TSI Workers, TSI, Globus proxy, Grid Service Client, Master Job Factory Service (MJFS), Managed Job Service (MJS), web services, Globus 3, Batch system. Labels: initiate; control/data as before; createService; creates; create; SOAP.]
192
CS602 192 TSI Grid Service [Diagram components: NJS, TSI Grid Service Client, TSI Grid Service Factory, TSI Grid Service Instances, OGSI::Lite, Batch-/Operating system (PBS, LSF, Linux, ...). Labels: createService; SOAP messages; create; batch / os commands.]
193
CS602 193 Resource Broker Developments UNICORE ontology (basis: JavaDoc) Ontology for MDS (basis: GLUE schema) Ontology mapping Integrate ontology engine into broker Resource broker portType Towards a Grid resource ontology
194
CS602 194 Resource Broker Architecture (Diagram: Donal Fellows, University of Manchester). Broker hosted in NJS. Components: Compute Resource Broker, NJS, IDB, UUDB, ExpertBroker, DWDLMExpert, ICMExpert, Other, LocalResourceChecker, UnicoreRC, GlobusRC, Translator, OntologicalTranslator, SimpleTranslator, Ontology, TicketManager, MDS, GRAM, TSI. Relations: look up static resources; look up configuration; verify delegated identities; delegate to application-domain expert code; delegate to Grid architecture-specific engine for local resource check; pass untranslatable resources to Unicore resource checker; look up resources; look up dynamic resources; delegate resource domain translation; look up translations appropriate to target Globus resource schema; get back set of resource filters and set of untranslatable resources; get signed ticket (contract); look up signing identity. Key: UNICORE Components, EUROGRID Broker, Globus Components, GRIP Broker, inheritance relation.
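The translation step in this architecture (resource request in, "set of resource filters and set of untranslatable resources" out) can be sketched as below. Everything here is illustrative: the UNICORE-side resource names, the mapping table and the function names are hypothetical, and the MDS attribute names are modelled on MDS 2 naming rather than checked against the GLUE schema. Only the `(&(...)(...))` conjunction is standard LDAP filter syntax.

```python
# Hypothetical mapping from UNICORE-style resource names to MDS attributes.
TRANSLATIONS = {
    "processors": ("Mds-Cpu-Total-count", ">="),
    "memory_mb": ("Mds-Memory-Ram-Total-sizeMB", ">="),
}


def translate(request: dict):
    """Split a request into LDAP filter terms and untranslatable resources."""
    filters = []
    untranslatable = {}
    for name, value in request.items():
        if name in TRANSLATIONS:
            attribute, operator = TRANSLATIONS[name]
            # Values are assumed numeric here, so no LDAP escaping is applied.
            filters.append(f"({attribute}{operator}{value})")
        else:
            untranslatable[name] = value
    return filters, untranslatable


def ldap_filter(filters: list) -> str:
    """Combine filter terms into one conjunctive LDAP search filter."""
    if not filters:
        return "(objectClass=*)"
    if len(filters) == 1:
        return filters[0]
    return "(&" + "".join(filters) + ")"
```

In the terms of the diagram, the filter string would drive the MDS lookup, while the untranslatable remainder would be passed back for the UNICORE resource checker to handle.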
195
CS602 195 Other Activities XML Schema for UNICORE resource model OGSI’fication: UUDB portType, resource database portType,... UNICORE Service Data Framework GGF: standardize portTypes, protocols,...... GRIP is not the end
196
CS602 196 Summary
197
CS602 197 Interoperability Abstraction. Single sign-on: use SSO to transfer alternative security credentials through to the target system. Resource discovery: extend resource broker. Resource request: map UNICORE job description to representation needed. Use batch system specific APIs/commands for job submission/monitoring & data transfer. Income/Outcome staging to/from Uspace. Note: this MAY imply changes not only to the TSI.
198
CS602 198 How to Start? Take interoperability modules as starting point Consider security & resource/information representation/management carefully Define UNICORE client extensions if necessary Are server modifications necessary? Specify Perl modules to be implemented/changed
199
CS602 199 Recommended Reading Grid Interoperability Project: http://www.grid-interoperability.org UNICORE software download: http://www.unicore.org/downloads.htm UNICORE Plus Final Report: http://www.unicore.org/documents/UNICOREPlus-Final-Report.pdf (Good intro to UNICORE) “An Analysis of the UNICORE Security Model”, GGF public comment period: http://sourceforge.net/projects/ggf (contains GRIP part; subsequent docs ready for submission)
200
CS602 200 Recommended Reading (cont.) Java Commodity Grid Kit: http://www-unix.globus.org/cog/java/index.php (Also good intro to Globus programming) Globus Resource Allocation Manager: http://www-unix.globus.org/developer/resource-management.html “Globus Firewall Requirements” http://www.globus.org/security/v2.0/Globus%20Firewall%20Requirements-5.pdf OGSI::Lite – A Perl Hosting Environment: http://www.sve.man.ac.uk/Research/AtoZ/ILCT
201
Manchester Computing Supercomputing, Visualization & e-Science Lecture 12 - Case Study This case study presents the RealityGrid project. It has used most types of Grid middleware: UNICORE, Globus and its own Perl web services implementation, OGSI::Lite. It has also created application APIs for computational steering.
202
CS602 202 The RealityGrid Project. Mission: “Using Grid technology to closely couple high performance computing, high throughput experiment and visualization, RealityGrid will move the bottleneck out of the hardware and back into the human mind.” Scientific aims: to predict the realistic behavior of matter using diverse simulation methods (Lattice Boltzmann, Molecular Dynamics and Monte Carlo) spanning many time and length scales; to discover new materials through integrated experiments.
203
CS602 203 Partners Academic University College London Queen Mary, University of London Imperial College University of Manchester University of Edinburgh University of Oxford University of Loughborough Industrial Schlumberger Edward Jenner Institute for Vaccine Research Silicon Graphics Inc Computation for Science Consortium Advanced Visual Systems Fujitsu
204
CS602 204 RealityGrid Characteristics Grid-enabled (Globus, UNICORE) Component-based, service-oriented Steering is central –Computational steering –On-line visualisation of large, complex datasets –Feedback-based performance control –Remote control of novel, grid-enabled, instruments (LUSI) Advanced Human-Computer Interfaces (Loughborough) Everything is (or should be) distributed and collaborative High performance computing, visualization and networks All in a materials science domain –multiple length scales, many "legacy" codes (Fortran90, C, C++, mostly parallel)
205
CS602 205 Exploring Parameter Space through Computational Steering Initial condition: Random water/ surfactant mixture. Self-assembly starts. Rewind and restart from checkpoint. Lamellar phase: surfactant bilayers between water layers. Cubic micellar phase, low surfactant density gradient. Cubic micellar phase, high surfactant density gradient.
206
CS602 206 Computational Steering – Why? Terascale simulations can generate in days data that takes months to understand Problem: to efficiently explore and understand the parameter spaces of materials science simulations Computational steering aims to short circuit post facto analysis –Brute force parameter sweeps create a huge data-mining problem –Instead, we use computational steering to navigate to interesting regions of parameter space –Simultaneous on-line visualization develops and engages scientist's intuition –thus avoiding wasted cycles exploring barren regions, or even doing the wrong calculation
207
CS602 207 Computational Steering – How? We instrument (add "knobs" and "dials" to) simulation codes through a steering library Library provides: –Pause/resume –Checkpoint and windback –Set values of steerable parameters –Report values of monitored (read-only) parameters –Emit "samples" to remote systems for e.g. on-line visualization –Consume "samples" from remote systems for e.g. resetting boundary conditions Images can be displayed at sites remote from visualization system, using e.g. SGI OpenGL VizServer, or Chromium Implemented in 5+ independent parallel simulation codes, F90, C, C++
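To make the list of library functions concrete, here is a toy sketch of an instrumented simulation loop. It is not the RealityGrid steering API: the class, the command names and the in-process command queue are all hypothetical stand-ins, and a real steering library would receive commands from a remote client rather than from a local queue.

```python
import copy
from collections import deque


class SteeredSimulation:
    """Toy simulation loop with pause/resume, checkpoint/windback and steering."""

    def __init__(self, state, step_fn):
        self.state = state
        self.step_fn = step_fn      # step_fn(state, steerable) -> new state
        self.steerable = {}         # parameters a client may change
        self.monitored = {}         # read-only values reported to the client
        self.commands = deque()     # stand-in for messages from a steering client
        self.checkpoints = []       # list of (step_count, saved state)
        self.paused = False
        self.step_count = 0

    def control(self):
        """Apply all queued steering commands; called once per timestep."""
        while self.commands:
            name, arg = self.commands.popleft()
            if name == "pause":
                self.paused = True
            elif name == "resume":
                self.paused = False
            elif name == "set":
                key, value = arg
                if key not in self.steerable:
                    raise KeyError(f"not a steerable parameter: {key}")
                self.steerable[key] = value
            elif name == "checkpoint":
                self.checkpoints.append((self.step_count, copy.deepcopy(self.state)))
            elif name == "windback":
                self.step_count, saved = self.checkpoints[arg]
                self.state = copy.deepcopy(saved)
            else:
                raise ValueError(f"unknown steering command: {name}")

    def advance(self):
        """Run one timestep unless the simulation is paused."""
        self.control()
        if self.paused:
            return
        self.state = self.step_fn(self.state, self.steerable)
        self.step_count += 1
        self.monitored["step"] = self.step_count
```

The point of the design is that the simulation code only gains one call per timestep (`control`), which matches the aim of instrumenting existing codes with minimal intrusion.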
208
CS602 208 Philosophy Provide right level of steering functionality to application developer Instrumentation of existing code for steering –should be easy –should not bifurcate development tree Hide details of implementation and supporting infrastructure –e.g. application should not be aware of whether communication with visualisation system is through filesystem, sockets or something else –permits multiple implementations –application source code is proof against evolution of implementation and infrastructure
209
CS602 209 Steering and Visualization Simulation Visualization data transfer Client Steering library Display
210
CS602 210 Architecture Communication modes: Shared file system Files moved by UNICORE daemon GLOBUS-IO SOAP over http/https Simulation Visualization data transfer Client Steering library Data mostly flows from simulation to visualization. Reverse direction is being exploited to integrate NAMD&VMD into RealityGrid framework.
211
CS602 211 Steering in the OGSA Steering client Simulation Steering library Visualization Registry Steering GS connect publish find bind data transfer publish bind Client Steering library
212
CS602 212 Steering in OGSA continued… Each application has an associated OGSA-compliant “Steering Grid Service” (SGS) SGS provides public interface to application –Use standard grid service technology to do steering –Easy to publish our protocol –Good for interoperability with other steering clients/portals –Future-proofed next step to move away from file-based steering or Modular Visualisation Environments with steering capabilities SGSs used to bootstrap direct inter-component connections for large data transfers Early working prototype of OGSA Steering Grid Service exists –Based on light-weight Perl hosting environment OGSI::Lite –Lets us use OGSI on a GT2 Grid such as UK e-Science Grid today
213
CS602 213 Steering client Built using C++ and Qt library – currently have execs. for Linux and IRIX Attaches to any steerable RealityGrid application Discovers what commands are supported Discovers steerable & monitored parameters Constructs appropriate widgets on the fly Web client (portal) under development
214
CS602 214 program lbe use lbe_init_module use lbe_steer_module use lbe_invasion_module RealityGrid-L2: LB3D on the L2G Visualization SGI Onyx Vtk + VizServer Simulation LB3D with RealityGrid Steering API Laptop Vizserver Client Steering GUI GLOBUS used to launch jobs SGI OpenGL VizServer Simulation Data GLOBUS-IO Steering (XML) File based communication via shared filesystem: Steering GUI X output is tunnelled back using ssh. ReG steering GUI
215
CS602 215 Performance Control application component 1 component 2 component 3 application performance steerer component performance steerer
216
CS602 216 Advance Reservation and Co-allocation: Summary of Requirements Computational steering + remote, on-line visualization demand: –co-allocation of HPC (processors) and visualization (graphics pipes and processors) resources –at times to suit the humans in the loop (advance reservation) For medium to large datasets, Network QoS is important –between simulation and visualization, –visualisation and display Integration with Access Grid –want to book rooms and operators too Cannot assume that all resources are owned by same VO Want programmable interfaces that we can rely on –must be ubiquitous, standard, and robust Reservations (agreements) should be re-negotiable Hard to change attitudes of sysadmins and (some) vendors
217
CS602 217 Steering and Workflows. Steering adds extra channels of information and control to Grid services. Steering and steered components must be state-aware; underlying mechanisms in the OS and in lower-level schedulers, monitors and brokers must be continually updated with changing state. How do we store and restore the metadata for the state of the parameter space search? Human factors are built into our architecture: humans continually interact with orchestrated services. What are the implications for workflow languages?
218
CS602 218 Collaborative Aspects. Multiple groups exploring multiple regions of parameter space. How to record and restore the state of the collaboration? How to extend the collaboration over multiple sessions? What are the services and abstractions necessary to bootstrap collaborative sessions? How do we reliably recreate the resources required by the services, in terms of computation, visualization, instrumentation and networking?
219
CS602 219 Integration with Access Grid? [Diagram; red arrows indicate bootstrapping.] Service for bootstrapping session: contains “just enough” information to start other services. Virtual Venues Server: multicast addressing, bridges. Visualization Workflow, Simulation Workflow, Data Source Workflow: workflows saved from previous sessions or created in this session. Process Repository: collaborative processes captured using ontology; can be enacted by workflow engines. Application Repository: uses application-specific ontology to describe what in silico processes need to be utilised for the session. Who participates? Participants' location and access rights. What do they use? Application data, computation and visualization requirements.
220
CS602 220 How far have we got? Linking US Extended Terascale Facilities and UK HPC resources via a Trans-Atlantic Grid. We used these combined resources as the basis for an exciting project: to perform scientific research on a hitherto unprecedented scale. Computational steering, spawning and migrating of massive simulations for study of defect dynamics in gyroid cubic mesophases. Visualisation output was streamed to distributed collaborating sites via the Access Grid. Workshop presentation with FZ Juelich and HLRS, Stuttgart on the theme of computational steering. At Supercomputing, Phoenix, USA, November 2003, the TRICEPS entry won “Most Innovative Data-Intensive Application”.
221
CS602 221 Summary. All our workflow concepts are built around the idea of Steerable Grid Services. Resources used by services have complex state, may migrate and may be reshaped. Collaborative aspects of “humans in the loop” are becoming more and more important. The problems of allocating and managing the resources necessary for realistic modelling are very hard; at present they require getting below the Grid abstractions. Clearly the Grid abstractions are not yet sufficiently comprehensive and in particular lack support for expressing synchronicity.