Building Problem Solving Environments with Application Web Service Toolkits Choonhan Youn and Marlon Pierce Computer Science, Syracuse University And Community Grid Labs, Indiana University
Presentation Outline Introduction –What is a computational Web portal? –Gateway: a computing Web portal –Limitations of the traditional approach Web Service-Based Computing Portal Architecture Core Web services for Computing Portals –Job submission –File Manipulation –Context Management –Script Generation –Job monitoring Application Web services Web service negotiation. Conclusion
Computational Web Portals Computational Web portals provide seamless access to HPC resources –You can log in from anywhere through any general-purpose Web browser. Portals simplify the use of HPC systems for novice users. –Basics: batch script generation, job submission and monitoring, file service and …… –Computational grid services: Globus, Condor Portals can simplify the use of unfamiliar codes. –GEM codes: disloc, simplex Portals provide a work management environment for all users. –You can see what you did last week. Other PSEs and Web portals –NASA Information Power Grid LaunchPad –NPACI Hotpage –Pacific Northwest National Laboratory’s Ecce system, UNICORE –Our own Gateway/ServoGrid projects
Gateway project Gateway is a computational Web portal project funded through: –DoD HPCMO PET Portal: Kerberos security in a computational Web portal –GEM science: support for codes developed by the earthquake modeling consortium –Alliance: contributions to the NCSA portal –SciDAC (Scientific Discovery through Advanced Computing): DOE project to build portal services for plasma physics Our goal is to provide building-block components that can be used to build specific portals. We also develop browser-based interfaces for basic services and specific science codes. Gateway was developed to support typical, if simple, high performance computing services –Batch script generation, job submission and monitoring, file management and transfer. –Do it all securely
Problems with Traditional Portal Architecture Portals access heterogeneous back ends and grids through a particular middle tier. Most portal projects are not interoperable –Middle tier software is incompatible –Wide range of protocols. Why do we need portal interoperability? –Portal developers don’t have to reinvent every single important service (lesson from GGF GCE). –Users will have access to more services than any one project can provide. –Users will be able to pick the best available implementation of a service. [Figure: Web browser, middle-tier services, and back end resources, with a question mark on the link between separate portals’ services.]
Web Service-Based Computing Portal Architecture Legend: JS: Job Submission; JM: Job Monitoring; FT: File Transfer; CM: Context Manager; SG: Script Generation; AWS: Application Web Service; HIS: Host Independent Service; HSS: Host Specific Service. [Figure: architecture diagram with these labels: Web Browser, HTTP, User Interface Server (SOAP Client, Repository Client), Portal Server (CM, SG, AWS), Service Repository (Publish), Web Services Provider, Middle Tier (Web Server) hosting HIS and HSS, Simulation Component (JS, JM, FT) in front of HPC, Data Component (FT, JS, JM) in front of Data Base, Backend Resources, SOAP links between tiers.]
Core Web services – 1 Given WSDL and SOAP, what can you build? Host-Specific Services (HSS) –Instances of these services are bound to particular hosts. –Job Submission –File Transfer –Job & Host Monitoring Host-Independent Services (HIS) –Informational services that are not tied to specific service points –The service provided does not depend on the location. –Context Management –Script Generation These core services are simple, stateless.
Core Web services - 2 Job Submission –Allows users to execute scientific applications –Executes operating system calls directly, or may interact with Grid services through, for example, the CoG client API to Globus. –We use Java Runtime processes to run external (non-Java) commands, for example, PBS qsub. File Manipulation –Lets users upload and download files between their desktops and various backend destinations. –Allows users to transparently move, rename, and copy files on remote back ends and crossload between different backend sites. –The file upload and download services illustrate the use of SOAP messages with attachments in the RPC messaging style. –SOAP attachments are non-XML files that are appended to the SOAP message and are useful for sending binary data and files with known MIME formats.
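The job submission bullet above mentions running external commands such as PBS qsub from Java processes. A minimal sketch of that idea follows, assuming a hypothetical helper class `CommandRunner` (not part of Gateway) and using `echo` as a stand-in for `qsub` so that it runs on any POSIX host. It is an illustration of the technique, not the portal's actual service code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper: runs an external command and captures its output. */
class CommandRunner {

    /** Exit code plus the combined stdout/stderr text of a finished command. */
    record Result(int exitCode, String output) {}

    static Result run(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        try {
            Process process = builder.start();
            StringBuilder output = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            // Output has been drained, so waiting cannot block on a full pipe.
            return new Result(process.waitFor(), output.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for command", e);
        }
    }

    public static void main(String[] args) {
        // On a PBS host the command might be List.of("qsub", "job.pbs") (assumed file name).
        Result result = run(List.of("echo", "submitted"));
        System.out.println(result.exitCode() + " " + result.output().trim());
    }
}
```

Passing the command as a list of arguments, rather than one shell string, avoids shell quoting problems when file names come from user input.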
Core Web services - 3 Context Management (CM) –Archives interactions with the computational portal and stores all of the metadata associated with user sessions. –Provides the simplest possible data model: CM provides an easy interface to an arbitrarily deep and complex tree-shaped data structure. Context data nodes are defined by a recursive schema and hold optional, unbounded name/value pairs and child nodes. –We use CM to store locations of job scripts, miscellaneous file URIs, users’ application instance XML files, etc. –CM metadata is stored on file systems, XML-native databases, …. Actual data may be anywhere. –Actual service interface for manipulating contexts and the context data: add one or more contexts; search and store the context data with XPath queries; remove the specified context; list the child contexts.
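The context data model described above (a recursive tree of contexts, each holding name/value pairs and child contexts, addressed with XPath) can be sketched with the JDK's built-in DOM and XPath classes. The class `ContextTree`, the element names `context` and `property`, and the method names are assumptions made for this illustration; the real CM schema and WSDL interface are not shown in the slides.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical in-memory context tree; names are assumed not to contain quotes. */
class ContextTree {
    static final String ROOT = "/context[@name='root']";
    private final Document doc;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    ContextTree() {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("context");
        root.setAttribute("name", "root");
        doc.appendChild(root);
    }

    /** Resolves an XPath to exactly one context element or fails. */
    private Element find(String path) {
        try {
            Node node = (Node) xpath.evaluate(path, doc, XPathConstants.NODE);
            if (node instanceof Element element) return element;
            throw new IllegalArgumentException("No context at " + path);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + path, e);
        }
    }

    void addContext(String parentPath, String name) {
        Element child = doc.createElement("context");
        child.setAttribute("name", name);
        find(parentPath).appendChild(child);
    }

    void setProperty(String contextPath, String name, String value) {
        Element property = doc.createElement("property");
        property.setAttribute("name", name);
        property.setAttribute("value", value);
        find(contextPath).appendChild(property);
    }

    /** Returns the stored value, or an empty string when the property is absent. */
    String getProperty(String contextPath, String name) {
        try {
            return xpath.evaluate(contextPath + "/property[@name='" + name + "']/@value", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath for property " + name, e);
        }
    }

    void removeContext(String path) {
        Element target = find(path);
        target.getParentNode().removeChild(target);
    }

    List<String> listChildren(String contextPath) {
        List<String> names = new ArrayList<>();
        for (Node n = find(contextPath).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && e.getTagName().equals("context")) {
                names.add(e.getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ContextTree tree = new ContextTree();
        tree.addContext(ROOT, "alice");
        tree.setProperty(ROOT + "/context[@name='alice']", "jobScript", "file:///tmp/job.pbs");
        System.out.println(tree.listChildren(ROOT));
    }
}
```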
Context Manager Architecture [Figure: Client, SOAP/HTTP, Axis Servlet, Context Manager, Shared WSDL Interface, FS and XMLDB back ends, Internal Communication, Context Data]
Core Web services - 4 Script Generation –For users who are unfamiliar with HPC systems. –The choices a user makes through the portal are stored as the user’s application instance XML document. –Generates the job script, which can be broken down into two parts: a queue script for a particular queuing system such as PBS, LSF, or LoadLeveler, and a user script for running the application code. Job monitoring –Built on a polling method. –Monitors the execution of a job running in a queuing system. –Returns an array of a generated WSDL complex type, effectively an XML data object containing the scheduler’s job status; the job monitoring method takes the user name and the type of queuing system as input parameters.
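The two-part job script described above (a queue script for the scheduler plus a user script that runs the code) might be generated along these lines. This is a sketch only: the class `PbsScriptGenerator`, the `JobRequest` fields, and the particular PBS directives chosen are assumptions, and the example command line is hypothetical.

```java
/** Hypothetical generator that splits a PBS job script into queue and user parts. */
class PbsScriptGenerator {

    /** User choices for one run; field names are illustrative, not the portal's schema. */
    record JobRequest(String jobName, String queue, int nodes, String wallTime,
                      String workDir, String commandLine) {}

    /** Queue script: scheduler directives for PBS (an assumed, minimal selection). */
    static String queueScript(JobRequest job) {
        return String.join("\n",
                "#!/bin/sh",
                "#PBS -N " + job.jobName(),
                "#PBS -q " + job.queue(),
                "#PBS -l nodes=" + job.nodes(),
                "#PBS -l walltime=" + job.wallTime()) + "\n";
    }

    /** User script: change to the working directory and run the application. */
    static String userScript(JobRequest job) {
        return "cd " + job.workDir() + "\n" + job.commandLine() + "\n";
    }

    static String generate(JobRequest job) {
        return queueScript(job) + userScript(job);
    }

    public static void main(String[] args) {
        // The executable path and input file name below are made up for the example.
        JobRequest job = new JobRequest("simplex-run", "batch", 2, "01:00:00",
                "/scratch/alice/run1", "./simplex simplex.in");
        System.out.print(generate(job));
    }
}
```

Keeping the queue part separate means a generator for LSF or LoadLeveler could replace `queueScript` while reusing the same user script.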
File manipulation service Lists user files on the selected host, Solar. File operations include upload, download, copy, rename, and crossload.
Job monitoring service Lists the user’s job status on the selected host, Solar, which runs the PBS queuing system.
Application Web Services (AWS) Application: specifically, some code developed by the scientific community. –Examples: finite element codes, grid generation codes, and so on. AWS are designed to turn scientific applications (e.g., earthquake modeling codes) into Grid resources. An actual application is wrapped by a Java program. We need a meaningful metadata model for applications –Describe application-specific requirements –Describe bindings of applications to host environments and to Web services in a general way that is independent of the particular portal. Running a scientific application draws on several core Web services. –Get files to the right place, generate script submission instructions, submit the job, get notified at various states.
AWS Lifecycle Applications can exist in four states: –Abstract: describes the optional choices and configurations that are available. –Ready: specific choices have been made. –Submitted: the application is running. –Completed: the application is finished, but we need to archive information about it.
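The four lifecycle states listed above can be sketched as a small state machine. The class name `ApplicationLifecycle` is hypothetical, and the strict one-step-forward transition rule is an assumption of this sketch; the slides name the states but do not spell out which transitions are allowed.

```java
/** Hypothetical lifecycle tracker for one application instance. */
class ApplicationLifecycle {

    /** States in the order the slide lists them. */
    enum State { ABSTRACT, READY, SUBMITTED, COMPLETED }

    private State state = State.ABSTRACT;

    State state() {
        return state;
    }

    /** Assumed rule: an application may only advance to the next state in order. */
    static boolean isAllowed(State from, State to) {
        return to.ordinal() == from.ordinal() + 1;
    }

    void moveTo(State target) {
        if (!isAllowed(state, target)) {
            throw new IllegalStateException("Cannot move from " + state + " to " + target);
        }
        state = target;
    }

    public static void main(String[] args) {
        ApplicationLifecycle app = new ApplicationLifecycle();
        app.moveTo(State.READY);      // user has made specific choices
        app.moveTo(State.SUBMITTED);  // job handed to the queuing system
        app.moveTo(State.COMPLETED);  // finished; metadata is kept for archiving
        System.out.println(app.state());
    }
}
```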
AWS Schema Structure Two sets of XML schemas: –Application Descriptors: describe the abstract state and the application options. Used by the application developer to deploy his/her service into the portal. –Application Instance Descriptors: describe particular instance states (ready, running, archived). They record particular user choices and archive them for later browsing and resubmission. Schema sets are arranged hierarchically –Applications contain hosts –Schemas are designed to be pluggable. Don’t like my queue description schema? Plug in your own.
AWS XML Descriptors Application description schema –A “basic information” element that contains information such as application name, version, and option flags. –An “internal communication” element that contains child elements for describing input, output, and error fields for the code. –An “execution environment” element that contains a list of core services needed to execute the application. –An optional, generic parameter to hold arbitrary information about the application. Host description schema –Contains information about the resource, such as DNS name and IP address –Contains all of the information needed to invoke the parent application on that resource, such as the location of the executable, the location of the workspace or scratch directory, and so on. Queue description schema –Contains information needed to perform queue submissions, such as memory size, number of CPUs, and so on (in the case of PBS).
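The descriptor hierarchy above can be pictured as nested data types. The sketch below uses Java records; every type and field name is an assumption chosen to mirror the bullets, and nesting the queue descriptor under the host is also an assumption, since the slides show neither the actual schema element names nor the full nesting.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical object model mirroring the application, host, and queue descriptors. */
class AppDescriptors {

    /** Queue submission details (illustrative fields for a PBS-style system). */
    record QueueDescriptor(String queueSystem, String memory, int cpus) {}

    /** Where and how the application is invoked on one resource. */
    record HostDescriptor(String dnsName, String ipAddress, String executablePath,
                          String workDir, QueueDescriptor queue) {}

    /** Abstract description of an application; applications contain hosts. */
    record ApplicationDescriptor(String name, String version, List<String> optionFlags,
                                 String inputField, String outputField, String errorField,
                                 List<String> requiredServices, List<HostDescriptor> hosts) {

        /** Finds the binding of this application to a named host, if any. */
        Optional<HostDescriptor> hostByName(String dnsName) {
            return hosts.stream().filter(h -> h.dnsName().equals(dnsName)).findFirst();
        }
    }

    public static void main(String[] args) {
        // All values below are made up for the example.
        HostDescriptor solar = new HostDescriptor("solar", "192.0.2.10",
                "/usr/local/bin/simplex", "/scratch", new QueueDescriptor("PBS", "512mb", 4));
        ApplicationDescriptor simplex = new ApplicationDescriptor("Simplex", "1.0",
                List.of(), "simplex.in", "simplex.out", "simplex.err",
                List.of("ScriptGeneration", "JobSubmission", "FileTransfer"), List.of(solar));
        System.out.println(simplex.hostByName("solar").map(HostDescriptor::executablePath).orElse("none"));
    }
}
```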
Example: Deploying an application code, Simplex, on a particular host as a service. This form is used to edit the Application XML descriptor file.
Sample generated user view of the application code Simplex. This form is generated from the Application XML descriptor for a particular application run: the input files used, the location of the output, the resources used for the computation, etc.
Portal Stack Core services provide the basic connection to back end “Grid” services. Application services combine core services and application metadata. User interface portlets are built for each service. Portals aggregate portlet components into portals. [Figure: stack of Core Web Services, Application Web Services and Workflow, User Interfaces, and Aggregate Portals, alongside Message Security, Information]
Portlets for User Interface Components Web services define XML interfaces for accessing services. User interface components (such as JSPs) combine service stubs into useful objects for human interaction. So we actually have two points of interoperability: –At the WSDL interface –At the user interface Portlets combine HTML (and other) user interfaces into aggregate portal interfaces. –EX: Jetspeed from Jakarta
Reliability of Distributed Services Distributed service systems have some important reliability problems –Information must be up to date. The system must adjust when servers become available or unavailable. Service metadata should match the actual capabilities of the system. –Messages should reach the services. We are automating application service metadata through publish/subscribe mechanisms. –Servers contain embedded publisher/subscriber clients –Information aggregators publish requests for information to JMS-style brokers. –All available servers subscribed to the request topic publish their information back to the aggregator.
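The request/reply pattern described above (an aggregator publishes a request to a topic, and every subscribed server publishes its information back) can be sketched with a tiny in-memory broker. This is not the JMS API or the NaradaBrokering API: `Broker`, `Server`, `Aggregator`, and the topic names are all hypothetical, and delivery here is synchronous for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical in-memory illustration of topic-based request/reply discovery. */
class PubSubSketch {

    /** Minimal topic broker with synchronous delivery (an assumption of this sketch). */
    static class Broker {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

        void subscribe(String topic, Consumer<String> handler) {
            subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
        }

        void publish(String topic, String message) {
            // Iterate over a copy so handlers may subscribe while a message is delivered.
            for (Consumer<String> handler : List.copyOf(subscribers.getOrDefault(topic, List.of()))) {
                handler.accept(message);
            }
        }
    }

    /** A server with an embedded subscriber that answers information requests. */
    static class Server {
        Server(String name, Broker broker) {
            // The request message carries the topic on which to reply.
            broker.subscribe("info.request", replyTopic -> broker.publish(replyTopic, name));
        }
    }

    /** Collects replies from whichever servers are currently subscribed. */
    static class Aggregator {
        private final Broker broker;
        private final List<String> available = new ArrayList<>();

        Aggregator(Broker broker) {
            this.broker = broker;
            broker.subscribe("info.reply", available::add);
        }

        List<String> refresh() {
            available.clear();
            broker.publish("info.request", "info.reply");
            return List.copyOf(available);
        }
    }

    public static void main(String[] args) {
        Broker broker = new Broker();
        new Server("hostA", broker);
        new Server("hostB", broker);
        System.out.println(new Aggregator(broker).refresh());
    }
}
```

Because the aggregator rebuilds its list on every refresh, a server that stops subscribing simply drops out of the next result, which is the self-adjusting behavior the slide asks for.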
Bridging Between Client-Server and Messaging Services [Figure: Browser, HTTP, Dynamic User Interface Component, SOAP Web service request for information, Aggregator, Broker, and five Tomcat Servers. Servers run Narada notifiers; peers register themselves with the Aggregator.]
Conclusions Traditional portals have “stovepipes” with interoperability problems. By designing and implementing several core portal services and Application Web Services around Web services, we gain interoperability and reusability. The emphasis is on the development of reusable services that can form the basis for multiple PSEs. The portal developer can construct specific implementations and composites of primitive service components and can also provide services that may be shared among different portals. Application-specific services and data models can be used to encapsulate entire applications independently of the portal implementation. User interfaces to application services become distributed portlets. Everything is distributed –Core Web Services->Application Web Services->User Interface Portlets->Portals –Uses HTTP, SOAP, WSDL, …. It all has to be secured. –A flexible, message-based security system that can be bound to multiple mechanisms and multiple message formats. –The general approach is to use assertions –SAML, WS-Security –Kerberos, PKI