1st Generation of Grid portals
1st Generation Portals
The first generation of Grid portals mainly used a three-tier architecture.
Properties
The architecture has three tiers: an interface tier consisting of a Web browser, a middle tier of Web servers, and a backend tier of services and resources such as databases, high-performance computers, disk storage, and specialized devices. A user makes a secure connection from their browser to a Web server. The Web server then obtains a proxy credential from a proxy credential server and uses it to authenticate the user. When the user has finished defining the parameters of the task they want to execute, the portal Web server launches an application manager, a process that controls and monitors the actual execution of the Grid task(s). The Web server delegates the user's proxy credential to the application manager so that it can act on the user's behalf. In some systems, the application manager publishes an event/message stream describing the state of the application's execution to a persistent event channel-archive, which the user can monitor through their browser.
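The request flow above can be sketched as a small in-memory Java model. This is a minimal illustration under stated assumptions, not the code of any real portal: the class names (PortalWebServer, ApplicationManager, EventChannel, ProxyCredential), the DN format and the 12-hour proxy lifetime are all hypothetical, the proxy credential server is a stub, and the application manager runs synchronously rather than as a separate process.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a delegated proxy credential (subject name + expiry only).
final class ProxyCredential {
    final String subject;
    final long expiresAtMillis;

    ProxyCredential(String subject, long expiresAtMillis) {
        this.subject = subject;
        this.expiresAtMillis = expiresAtMillis;
    }
}

// Persistent event channel-archive: events are appended and never removed,
// so the browser tier can read the full history at any time.
final class EventChannel {
    private final List<String> archive = Collections.synchronizedList(new ArrayList<>());

    void publish(String event) { archive.add(event); }

    List<String> readAll() {
        synchronized (archive) {
            return new ArrayList<>(archive);
        }
    }
}

// Application manager: acts with the delegated proxy and reports the task state.
final class ApplicationManager {
    private final ProxyCredential proxy;
    private final EventChannel channel;

    ApplicationManager(ProxyCredential proxy, EventChannel channel) {
        this.proxy = proxy;
        this.channel = channel;
    }

    // Runs synchronously for simplicity; the text describes a separate process.
    void run(String task) {
        channel.publish("SUBMITTED " + task + " as " + proxy.subject);
        // A real manager would contact backend resources here using the proxy.
        channel.publish("RUNNING " + task);
        channel.publish("DONE " + task);
    }
}

// Middle tier: obtains a proxy, launches the manager and delegates the proxy to it.
final class PortalWebServer {
    private final EventChannel channel = new EventChannel();

    EventChannel channel() { return channel; }

    void launch(String user, String task) {
        ProxyCredential proxy = obtainProxy(user);
        new ApplicationManager(proxy, channel).run(task);
    }

    // Stub for the proxy credential server; DN format and 12-hour lifetime are assumptions.
    private ProxyCredential obtainProxy(String user) {
        return new ProxyCredential("/CN=" + user, System.currentTimeMillis() + 12L * 60 * 60 * 1000);
    }
}
```

The browser tier would poll `channel().readAll()` to display progress; keeping the channel append-only is what lets a user reconnect later and still see the whole execution history.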
Grid Services Provided
Authentication: When users access the Grid via a portal, the portal can authenticate them with their usernames and passwords. Once authenticated, a user can ask the portal to access Grid resources on their behalf.
Job Management: A portal lets users manage their tasks (serial or parallel): launching their applications via the Web browser in a reliable and secure way, monitoring the status of tasks, and pausing or cancelling tasks if necessary.
Data Transfer: A portal allows users to upload the input data sets required by tasks that are to be executed on remote resources. Similarly, it allows result sets and other data to be downloaded via a Web browser to a local desktop.
Information Services: A portal uses discovery mechanisms to find the resources that are needed and available for a particular task. The information collected about resources includes static information such as OS or CPU type and dynamic information such as current CPU load, free memory or file space, and network status. Other details, such as job status and queue information, can also be retrieved.
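The information-services step combines static and dynamic attributes to pick candidate resources. The following Java sketch is illustrative only: the ResourceInfo and ResourceRequirements types, their fields and the "least loaded first" ordering are assumptions, and in a real portal the records would come from a directory such as GT2 MDS rather than being built by hand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical resource record: static info (OS, CPU type) plus dynamic info (load, free memory).
final class ResourceInfo {
    final String host;
    final String os;
    final String cpuType;
    final double cpuLoad;
    final long freeMemoryMB;

    ResourceInfo(String host, String os, String cpuType, double cpuLoad, long freeMemoryMB) {
        this.host = host;
        this.os = os;
        this.cpuType = cpuType;
        this.cpuLoad = cpuLoad;
        this.freeMemoryMB = freeMemoryMB;
    }
}

// Hypothetical task requirements expressed against those attributes.
final class ResourceRequirements {
    final String os;
    final long minFreeMemoryMB;
    final double maxCpuLoad;

    ResourceRequirements(String os, long minFreeMemoryMB, double maxCpuLoad) {
        this.os = os;
        this.minFreeMemoryMB = minFreeMemoryMB;
        this.maxCpuLoad = maxCpuLoad;
    }
}

final class ResourceSelector {
    // Keeps resources whose static and dynamic attributes satisfy the requirements,
    // ordered from least to most loaded.
    static List<ResourceInfo> select(List<ResourceInfo> all, ResourceRequirements req) {
        List<ResourceInfo> matches = new ArrayList<>();
        for (ResourceInfo r : all) {
            boolean staticOk = r.os.equals(req.os);
            boolean dynamicOk = r.freeMemoryMB >= req.minFreeMemoryMB && r.cpuLoad <= req.maxCpuLoad;
            if (staticOk && dynamicOk) {
                matches.add(r);
            }
        }
        matches.sort(Comparator.comparingDouble(r -> r.cpuLoad));
        return matches;
    }
}
```

Because the dynamic values (load, free memory) change constantly, a portal would re-query them shortly before submission rather than cache them for long.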
Implementation
First generation Grid portals mainly use GT2 to provide Grid services, largely because Globus provides a complete package and a standard way of building Grid-enabled services.
The dynamic graphical user interface (GUI) is based on HTML pages made dynamic with JSP (JavaServer Pages) or JavaScript; some portals also use the Common Gateway Interface (CGI) and Perl. CGI is an alternative to JSP for dynamically generating Web content.
The secure connection from the browser to the Web server uses HTTPS, i.e., HTTP over Transport Layer Security (TLS). Typically, a Java Servlet or Java Bean on the Web server services user requests and accesses backend resources.
MyProxy and GT2 GSI are used for user authentication; MyProxy provides secure credential delegation. GT2 GRAM is used for job submission, GT2 MDS for gathering information on resources, and GT2 GSIFTP or GridFTP for data transfer. The Java CoG Kit gives Java programs access to the corresponding Globus services.
MyProxy
MyProxy is an online credential management system for the Grid. It is used to delegate a user's proxy credential to Grid portals, which can then authenticate to Grid resources and access them on the user's behalf. Storing your Grid credentials in a MyProxy repository allows you to retrieve a proxy credential whenever and wherever you need one. You can also allow trusted servers to renew your proxy credentials using MyProxy so that, for example, long-running tasks do not fail because a proxy credential has expired. The figure shows the steps to securely access the Grid via a Grid portal with MyProxy.
myproxy.grid-support.ac.uk
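The renewal idea can be sketched as a trusted component that checks a proxy before each use and fetches a fresh one when little lifetime remains. This is a hypothetical Java model: CredentialRepository stands in for a MyProxy server and is not the real MyProxy client API, and the threshold and lifetime values are assumptions chosen by the caller.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical proxy model: only its expiry time matters here.
final class Proxy {
    final Instant notAfter;

    Proxy(Instant notAfter) { this.notAfter = notAfter; }

    // Negative once the proxy has expired.
    Duration remaining(Instant now) { return Duration.between(now, notAfter); }
}

// Stands in for a MyProxy server; NOT the real MyProxy client API.
interface CredentialRepository {
    Proxy retrieve(String username, Instant now, Duration lifetime);
}

// A trusted renewer that keeps a valid proxy available for a long-running task.
final class ProxyRenewer {
    private final CredentialRepository repo;
    private final String username;
    private final Duration threshold;  // renew when less than this remains
    private final Duration lifetime;   // lifetime requested for each new proxy
    private Proxy current;

    ProxyRenewer(CredentialRepository repo, String username,
                 Duration threshold, Duration lifetime, Proxy initial) {
        this.repo = repo;
        this.username = username;
        this.threshold = threshold;
        this.lifetime = lifetime;
        this.current = initial;
    }

    // Returns a proxy with at least `threshold` remaining, renewing from the repository if needed.
    Proxy proxyFor(Instant now) {
        if (current.remaining(now).compareTo(threshold) < 0) {
            current = repo.retrieve(username, now, lifetime);
        }
        return current;
    }
}
```

Renewing ahead of expiry, rather than after a failure, is what keeps a long-running task from being interrupted mid-transfer or mid-submission.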
Using MyProxy
1. Run the myproxy-init command on the computer where your Grid credential is located to delegate a proxy credential to a MyProxy server. The delegated proxy credential normally has a lifetime of one week. Communication between the computer and the MyProxy server is secured by TLS. You supply the pass phrase of your Grid credential and a user name under which the proxy credential is stored, then choose a different MyProxy pass phrase to protect the delegated proxy credential on the MyProxy server.
2. Log in to the Grid portal with the same user name and MyProxy pass phrase used when delegating the proxy credential.
3. The portal uses the myproxy-get-delegation command to retrieve a delegated proxy credential from the MyProxy server using your user name and MyProxy pass phrase.
4. The portal accesses Grid resources with the proxy credential on your behalf.
5. Logging out of the portal deletes your delegated proxy credential on the portal. If you forget to log out, the proxy credential expires when its specified lifetime ends.
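The five steps can be modelled in memory to show who holds what and when access ends. This Java sketch is hypothetical throughout: MyProxyServerModel and PortalSession are illustrative names, it does not implement the MyProxy protocol, the 12-hour lifetime of the retrieved proxy is an assumption, and the salted SHA-256 pass-phrase check is a brevity shortcut (the real server protects the stored credential itself with the pass phrase).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a MyProxy server; not the MyProxy protocol or API.
final class MyProxyServerModel {
    private static final class Entry {
        final byte[] salt;
        final byte[] passHash;
        final Instant storedUntil;

        Entry(byte[] salt, byte[] passHash, Instant storedUntil) {
            this.salt = salt;
            this.passHash = passHash;
            this.storedUntil = storedUntil;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Step 1 (myproxy-init): store a credential for one week under user name + MyProxy pass phrase.
    void init(String username, String myProxyPassphrase, Instant now) {
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        store.put(username, new Entry(salt, hash(salt, myProxyPassphrase), now.plus(Duration.ofDays(7))));
    }

    // Step 3 (myproxy-get-delegation): return the expiry of a short-lived proxy if the pass phrase matches.
    Instant getDelegation(String username, String myProxyPassphrase, Duration lifetime, Instant now) {
        Entry e = store.get(username);
        if (e == null || now.isAfter(e.storedUntil)
                || !MessageDigest.isEqual(e.passHash, hash(e.salt, myProxyPassphrase))) {
            throw new SecurityException("authentication failed");
        }
        Instant requested = now.plus(lifetime);
        // A delegated proxy cannot outlive the stored credential.
        return requested.isBefore(e.storedUntil) ? requested : e.storedUntil;
    }

    private static byte[] hash(byte[] salt, String passphrase) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            return md.digest(passphrase.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}

// Steps 2, 4 and 5: the portal session holds the delegated proxy until logout or expiry.
final class PortalSession {
    private Instant proxyExpiry; // null when no proxy is held

    void login(MyProxyServerModel server, String user, String passphrase, Instant now) {
        proxyExpiry = server.getDelegation(user, passphrase, Duration.ofHours(12), now);
    }

    boolean canAccessGrid(Instant now) {
        return proxyExpiry != null && now.isBefore(proxyExpiry);
    }

    void logout() { proxyExpiry = null; } // deletes the delegated proxy held by the portal
}
```

The two separate lifetimes mirror the text: a long-lived credential on the MyProxy server and a short-lived proxy on the portal, so a forgotten logout exposes only the short-lived proxy.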
Java CoG Kit
The Java Commodity Grid (CoG) Kit provides access to GT2 services through Java APIs. Its goal is to let Grid developers use much of the Globus functionality as well as the numerous additional libraries and frameworks developed by the Java community. GT3 incorporates parts of the Java CoG Kit; for example, many of the GT3 command-line tools are implemented with it. The Java CoG Kit has focused on client-side issues. Grid services that can be accessed through the toolkit include:
– An information service compatible with the GT2 MDS, implemented with the Java Naming and Directory Interface (JNDI);
– A security infrastructure compatible with the GT2 GSI, implemented with the IAIK security library;
– A data transfer mechanism compatible with a subset of GT2 GridFTP and/or GSIFTP;
– Resource management and job submission with the GT2 GRAM Gatekeeper;
– Advance reservation compatible with GT2 GARA;
– A MyProxy server for managing user credentials.
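Because the CoG Kit's information service is built on JNDI, which ships with the Java SDK, an MDS-style LDAP query can be sketched with JNDI alone. This sketch deliberately does not reproduce the CoG Kit's own classes. The port 2135, the base DN "Mds-Vo-name=local, o=Grid", the object class MdsHost and the attribute names are assumptions based on common GT2 MDS deployments; verify them against your MDS server before use.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

final class MdsQuery {
    // Assumed GT2 MDS defaults: GRIS/GIIS on port 2135, base DN "Mds-Vo-name=local, o=Grid".
    static final String BASE_DN = "Mds-Vo-name=local, o=Grid";

    // JNDI environment for the JDK's built-in LDAP provider.
    static Hashtable<String, String> environment(String host) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://" + host + ":2135");
        return env;
    }

    // Search the whole subtree and return only the requested attributes.
    static SearchControls controls(String... attributes) {
        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
        sc.setReturningAttributes(attributes);
        return sc;
    }

    // Connects and prints matching host entries; requires a reachable MDS server.
    static void printHosts(String host) throws NamingException {
        DirContext ctx = new InitialDirContext(environment(host));
        try {
            NamingEnumeration<SearchResult> results = ctx.search(
                    BASE_DN, "(objectclass=MdsHost)",
                    controls("Mds-Os-name", "Mds-Memory-Ram-freeMB")); // attribute names assumed
            while (results.hasMore()) {
                SearchResult r = results.next();
                System.out.println(r.getNameInNamespace() + " " + r.getAttributes());
            }
        } finally {
            ctx.close();
        }
    }
}
```

Separating the environment and search controls from the network call keeps the query configuration testable without a live directory server.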
GridPort and HPCPortal
GridPort 2.0 (GP2) is a Perl-based Grid portal toolkit. Its purpose is to make it easy to develop application-specific portals. GP2 is a collection of services, scripts and tools that allow developers to connect Web-based interfaces to backend Grid services. The scripts and tools provide consistent interfaces between the underlying infrastructure, which is based on Grid technologies such as GT2, and standard Web technologies such as CGI.
GridPort Layers
Client Layer: represents the consumers of Grid portals, typically Web browsers, PDAs, or even applications capable of pulling data from a Web server. Clients interact with a GP2 portal via HTML form elements and use secure HTTP (HTTPS) to submit requests.
Portal Layer: consists of portal-specific code. Application portals run on standard Web servers, handle client requests, and respond to them. One instance of GP2 can support multiple concurrent application portals, but they must run on the same Web server, where they share the same instance of the GP2 libraries. This allows the application portals to share portal-related user and account data, which makes a single-login environment possible. GP2 portals can also share libraries, file space, and other services.
Portal Services Layer: GP2 and other portal toolkits or libraries reside in this layer. GP2 performs common services for application portals, including the management of session state, portal accounts, and Grid information services with GT2 MDS.
Grid Services Layer: consists of the software components and services needed to handle user requests to access the Grid. GP2 employs simple, reusable middleware technologies, e.g., GT2 GRAM for job submission to remote resources; GT2 GSI and MyProxy for security and authentication; GT2 GridFTP and the San Diego Supercomputer Center (SDSC) Storage Resource Broker (SRB) for distributed file collection and management [56]; and Grid Information Services based primarily on proprietary GP2 information provider scripts and the GT2 MDS.
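The single-login property follows from application portals sharing one portal-services instance. The Java sketch below is a hypothetical illustration of that layering, not GP2 code (GP2 itself is Perl): GridServices, PortalServices and ApplicationPortal are made-up names, passwords are kept in plain text only for brevity, and the Grid layer is a stub.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Grid services layer (GRAM, GSI/MyProxy, GridFTP, SRB, MDS), reduced to one stub call.
interface GridServices {
    String submit(String user, String executable);
}

// Portal services layer: portal accounts and session state shared by all application portals.
final class PortalServices {
    private final Map<String, String> accounts = new HashMap<>(); // user -> password (plain text for brevity)
    private final Set<String> sessions = new HashSet<>();
    private final GridServices grid;

    PortalServices(GridServices grid) { this.grid = grid; }

    void addAccount(String user, String password) { accounts.put(user, password); }

    boolean login(String user, String password) {
        boolean ok = password.equals(accounts.get(user));
        if (ok) {
            sessions.add(user);
        }
        return ok;
    }

    boolean isLoggedIn(String user) { return sessions.contains(user); }

    GridServices grid() { return grid; }
}

// Portal layer: application-specific code handling client-layer requests (e.g. HTML form posts).
final class ApplicationPortal {
    private final String name;
    private final PortalServices services;

    ApplicationPortal(String name, PortalServices services) {
        this.name = name;
        this.services = services;
    }

    boolean login(String user, String password) { return services.login(user, password); }

    String handleSubmit(String user, String executable) {
        if (!services.isLoggedIn(user)) {
            return "please log in";
        }
        return name + ": " + services.grid().submit(user, executable);
    }
}
```

Logging in through one application portal makes the session visible to every other portal built on the same PortalServices instance, which is the single-login environment the text describes.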
Distributed Component Architecture
GridPort, HPCPortal and GROWL
GP2 can be used in two ways. The first approach requires GT2 to be installed, because GP2 wraps the GT2 command-line tools in Perl scripts executed from cgi-bin; GT2 GRAM, GSIFTP and MyProxy are used to access backend Grid services. The second approach does not require GT2 but relies on CGI scripts configured to use a primary GP2 portal as a proxy for accessing GP2 services such as user authentication, job submission, and file transfer. This second approach allows a user to quickly deploy a Web server configured with a set of GP2 CGI scripts to perform generic portal operations.
HPCPortal uses the C API of the Globus Toolkit for MDS, GridFTP and GRAM. A front-end CGI script passes data from the user's form interface to the back-end Globus code, which in turn submits the remote job. The front-end and back-end services can be connected by a Web service call, so they need not be located on the same server (previous figure). GROWL uses the same back-end services but provides a C programming API to the user in the form of a function library.
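The first GP2 approach, wrapping GT2 command-line tools behind CGI, can be sketched in Java as a wrapper that turns validated form input into a command line. This is a hypothetical analogue of a GP2 Perl script, not GP2 code; it assumes the GT2 globus-job-run tool in its basic "contact executable [args]" form, and the whitelist pattern is an illustrative choice. The ProcessBuilder is built but not started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical Java analogue of a GP2 CGI script that wraps a GT2 command-line tool.
final class GlobusJobRunWrapper {
    // Conservative whitelist for form input: no whitespace or shell metacharacters.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9._/:@=+-]+");

    // Builds the argv for "globus-job-run <contact> <executable> [args...]".
    static List<String> command(String contact, String executable, List<String> args) {
        List<String> argv = new ArrayList<>();
        argv.add("globus-job-run");
        argv.add(requireSafe(contact));
        argv.add(requireSafe(executable));
        for (String a : args) {
            argv.add(requireSafe(a));
        }
        return argv;
    }

    private static String requireSafe(String s) {
        if (!SAFE.matcher(s).matches()) {
            throw new IllegalArgumentException("rejected form input: " + s);
        }
        return s;
    }

    // ProcessBuilder passes arguments directly, without a shell; the caller would call start()
    // and stream the combined output back to the browser.
    static ProcessBuilder processFor(String contact, String executable, List<String> args) {
        return new ProcessBuilder(command(contact, executable, args)).redirectErrorStream(true);
    }
}
```

Wrapping existing command-line tools is quick to build, but it ties the portal to whatever tools are installed on the Web server, which is exactly the GT2 dependency the second approach avoids.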
GPDK and the Java™ World
GPDK is another Grid portal toolkit. It uses JavaServer Pages (JSP) for portal presentation and Java Beans to access backend Grid resources via GT2. The beans in GPDK are mostly derived from the Java CoG Kit.
Java Services in GPDK
GPDK's Grid service beans, which can be used to implement Grid portals, fall into the following classes.
Security: The security bean, MyproxyBean, is responsible for obtaining delegated credentials from a MyProxy server. It has a method for setting the username, password, and designated lifetime of a delegated credential on the Web server. In addition, it allows delegated credentials to be uploaded securely to the Web server.
User Profiles: User profiles are controlled by three beans: UserLoginBean, UserAdminBean and UserProfileBean.
– The UserLoginBean provides an optional service to authenticate users to a portal. Currently, it only sets a username/password and checks a password file on the Web server to validate user access.
– The UserAdminBean provides methods for serializing a UserProfileBean and validating a user's profile.
– The UserProfileBean maintains user information including preferences, credential information, submitted job history, and computational resources used. It is generally instantiated with session scope so that it persists for the duration of the user's transactions on the portal.
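The profile beans follow ordinary JavaBean conventions, so their roles can be sketched with standard Java serialization. This is not GPDK's source: the field names, the validation rule and the static helper methods are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical profile bean: no-arg constructor, getters/setters, Serializable.
class UserProfileBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private String username;
    private String credentialSubject;                  // e.g. subject of the delegated proxy
    private List<String> jobHistory = new ArrayList<>(); // IDs of submitted jobs

    public UserProfileBean() {}

    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
    public String getCredentialSubject() { return credentialSubject; }
    public void setCredentialSubject(String subject) { this.credentialSubject = subject; }
    public List<String> getJobHistory() { return jobHistory; }
    public void addJob(String jobId) { jobHistory.add(jobId); }
}

// Hypothetical admin helper: serializes and validates profiles.
final class UserAdminBean {
    static byte[] serialize(UserProfileBean profile) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(profile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Only deserialize data the portal itself wrote; Java deserialization of untrusted input is unsafe.
    static UserProfileBean deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (UserProfileBean) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isValid(UserProfileBean profile) {
        return profile.getUsername() != null && !profile.getUsername().isEmpty();
    }
}
```

In a JSP page, a bean like this would typically be declared with `jsp:useBean` and `scope="session"`, which gives the per-user lifetime described for the UserProfileBean.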
More…
Job Submission: The JobBean contains all the parameters needed to submit a job, including memory requirements, the name of the executable, arguments, number of processors, maximum wall-clock or CPU time, and the submission queue. A JobBean is passed to a JobSubmissionBean, which is responsible for actually launching the job. Two varieties of the JobSubmissionBean exist; one of them, the GramSubmissionBean, submits a job to a GT2 GRAM gatekeeper, which can either run the job interactively or submit it to a scheduling system if one exists. The JobInfoBean can be used to retrieve timestamped information about a job, including the job ID, status, and outputs. The JobHistoryBean uses multiple JobInfoBeans to provide a history of the jobs that have been submitted; this history can be stored in the user's profile.
File Transfer: The FileTransferBean provides methods for transferring files. Both the GSIFTPTransferBean and the GSISCPTransferBean can securely copy files from source to destination hosts using a user's delegated credential. The GSISCPTransferBean requires GSI-enabled SSH [57] to be deployed on the machines involved, since it transfers files with the GSI-enhanced scp. The GSIFTPTransferBean implements a GSI-enhanced FTP for third-party file transfers.
Information Services: The MDSQueryBean provides methods for querying a Lightweight Directory Access Protocol (LDAP) server by setting and retrieving object classes and attributes, such as OS type, memory, and CPU load, for various resources. LDAP is a standard for accessing information directories on the Internet. Currently, the MDSQueryBean uses the Mozilla Directory SDK [27] to interact with an LDAP server.
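Before a GRAM gatekeeper can run a job, JobBean-style parameters have to be expressed as a GT2 RSL (Resource Specification Language) string. The Java sketch below shows one plausible mapping and is not GPDK's code: JobSpec and RslBuilder are hypothetical names, the attribute names (executable, arguments, count, maxWallTime, maxMemory, queue) follow GT2 GRAM RSL as commonly documented, and the units (minutes, megabytes) are assumptions to check against your gatekeeper's documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical JobBean-like holder; field names are illustrative, not GPDK's API.
final class JobSpec {
    String executable;
    List<String> arguments = new ArrayList<>();
    int processors = 1;
    Integer maxWallTimeMinutes; // optional, assumed to be in minutes
    Integer maxMemoryMB;        // optional, assumed to be in megabytes
    String queue;               // optional
}

final class RslBuilder {
    // RSL literal: wrap in double quotes and double any embedded quote.
    static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    // Builds a conjunctive RSL request such as &(executable="/bin/date")(count=1).
    static String toRsl(JobSpec job) {
        StringBuilder rsl = new StringBuilder("&");
        rsl.append("(executable=").append(quote(job.executable)).append(")");
        if (!job.arguments.isEmpty()) {
            rsl.append("(arguments=");
            for (int i = 0; i < job.arguments.size(); i++) {
                if (i > 0) {
                    rsl.append(' ');
                }
                rsl.append(quote(job.arguments.get(i)));
            }
            rsl.append(")");
        }
        rsl.append("(count=").append(job.processors).append(")");
        if (job.maxWallTimeMinutes != null) {
            rsl.append("(maxWallTime=").append(job.maxWallTimeMinutes).append(")");
        }
        if (job.maxMemoryMB != null) {
            rsl.append("(maxMemory=").append(job.maxMemoryMB).append(")");
        }
        if (job.queue != null) {
            rsl.append("(queue=").append(quote(job.queue)).append(")");
        }
        return rsl.toString();
    }
}
```

Optional fields are omitted rather than sent empty, so the gatekeeper or local scheduler applies its own defaults for anything the user did not specify.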
Comparison of 1st Generation Portals
What are the Restrictions?
First generation Grid portals have focused on providing basic task-oriented services, such as user authentication, job submission, monitoring, and data transfer. However, they are typically tightly coupled with Grid middleware tools such as Globus. The main limitations of first generation portals can be summarized as follows.
Lack of Customization: Portals are normally built by portal developers rather than portal users, because the knowledge and expertise required to use the portal toolkits described in this chapter is beyond most Grid end users. When end users access the Grid via a portal, it is almost impossible for them to customize it to meet their specific needs, e.g., to add or remove portal services.
Restricted Grid Services: First generation Grid portals are tightly coupled with specific Grid middleware technologies such as Globus, which restricts the services a portal can offer. It is hard to integrate Grid services provided by different Grid middleware technologies in a portal of this generation.
Static Grid Services: A Grid environment is dynamic by nature, with more and more Grid services being developed. However, first generation portals can only provide static Grid services, because they lack a facility for easily exposing newly created Grid services to users.