SimpleGrid Toolkit: Enabling Efficient Learning and Development of TeraGrid Science Gateway
Shaowen Wang (1, 2), Yan Liu (1, 2), Nancy Wilkins-Diehr (3), Stuart Martin (4, 5)
1. CyberInfrastructure and Geospatial Information Laboratory (CIGI), Department of Geography
2. National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign
3. San Diego Supercomputer Center (SDSC), University of California at San Diego
4. Argonne National Laboratory
5. University of Chicago
November 11, 2007
Purpose
- Simplify the learning of science gateways
- Expedite the prototyping process of developing science gateways
Background
- Grid computing
- Science and engineering gateway
- Problem solving environments (PSE)
Related Work
- Active area
  - Evidenced by TeraGrid Science Gateway activities
- Examples
  - GridPort
  - GridSphere
  - Vine
  - OGCE
The State of the Art
- Evolving and sophisticated web portal technologies
  - GridSphere, Liferay, Sakai, Jetspeed
- Missing simple, robust, and reusable interfaces between applications and portals
- Significant gap between Grid technologies and application problem solving environments
  - Grid middleware complexity
  - Grid technologies focus on enabling resource sharing and federation
  - The development of problem solving environments requires extensible, programmable, reusable, application-oriented software components that support customizable access to Grid and VO capabilities
SimpleGrid Motivation
- Grid and web portal technologies are complex and still rapidly evolving
- An effort to close the gap between Grid computing and scientific applications
SimpleGrid – Component-Based Design
Architecture – External Interfaces
Architecture – Internal Interactions
Efficient Learning and Development
- Three-stage learning
  - Command-line
  - Grid-enabled Java application development
  - Portlet development
- Simple installation and deployment
  - Java, Ant, Tomcat, GridSphere
  - Globus Toolkit 4.0+ only for command-line stage
- Reusable components for development
  - SimpleGrid APIs
  - JSP and Velocity templates
- Development environment setup
  - Manual for SimpleGrid setup in Eclipse
From Individual to Community
- TeraGrid command-line tools for individual use
- SimpleGrid APIs to automate the access to cyberinfrastructure resources
- SimpleGrid portlets to enable community access to scientific problem solving capabilities as deployable components in science gateway portals
SimpleGrid APIs
- SimpleCred: Grid proxy management
- SimpleTran: Data transfer to/from Grids
- SimpleRun: Grid job management
- SimpleViz: Visualization component
- SimpleInfo: Grid information provider
  - Under development
  - Current Grid information is provided statically through a configuration file
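
As a rough illustration of Grid information provided statically through a configuration file, the sketch below reads site endpoints from a properties-style text. The class name StaticGridInfo, the key layout "<site>.<service>", and the host names are all assumptions made for this example; the slides do not show the actual SimpleGrid configuration format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical stand-in for a static Grid information source; not the SimpleInfo API.
class StaticGridInfo {
    private final Properties entries = new Properties();

    // Parses properties-style text such as "ncsa.gridftp=gsiftp://host".
    StaticGridInfo(String configText) {
        try {
            entries.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up "<site>.<service>"; returns null when the entry is absent.
    String endpoint(String site, String service) {
        return entries.getProperty(site + "." + service);
    }

    public static void main(String[] args) {
        // Placeholder hosts, not real TeraGrid endpoints.
        String config = String.join("\n",
            "ncsa.gridftp=gsiftp://gridftp.ncsa.example.org",
            "ncsa.gram=gram.ncsa.example.org",
            "sdsc.gridftp=gsiftp://gridftp.sdsc.example.org");
        StaticGridInfo info = new StaticGridInfo(config);
        System.out.println(info.endpoint("ncsa", "gridftp"));
    }
}
```

A file-backed version would pass a FileReader to Properties.load instead of a StringReader; the lookup logic stays the same.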
SimpleCred
- Fetch Grid credentials
  - Local proxy loading or instantiation
  - Remote proxy instantiation through MyProxy
- Automatic credential renewal
  - Simple interface for Grid proxy renewal, i.e., SimpleCred.get()
- Grid community user support
  - A global SimpleCred instance can be stored in the portal as a shared object for users sharing the same community account
- Programming interface
  - load(), logon(), get()
- Portlet interface
  - Grid credentials can be managed explicitly through a UserPortlet interface
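
The idea of renewing a credential automatically behind a single get() call can be sketched as follows. This is a minimal illustration, not the SimpleCred implementation: the class RenewingCredential, its constructor parameters, and the renewal margin are hypothetical, and the fetcher stands in for whatever loads a local proxy or contacts MyProxy.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache that re-fetches a credential shortly before it expires.
class RenewingCredential<T> {
    private final Supplier<T> fetcher;      // stand-in for proxy loading or a MyProxy logon
    private final long lifetimeMillis;      // assumed lifetime of each fetched credential
    private final long renewMarginMillis;   // renew when this little lifetime remains
    private final LongSupplier clock;       // injectable clock, so renewal can be tested
    private T current;
    private long expiresAt;
    private int fetchCount;

    RenewingCredential(Supplier<T> fetcher, long lifetimeMillis,
                       long renewMarginMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.lifetimeMillis = lifetimeMillis;
        this.renewMarginMillis = renewMarginMillis;
        this.clock = clock;
    }

    // Returns the cached credential, fetching a new one if it is missing or near expiry.
    // Synchronized so one instance can be shared by several portal users.
    synchronized T get() {
        long now = clock.getAsLong();
        if (current == null || expiresAt - now <= renewMarginMillis) {
            current = fetcher.get();
            expiresAt = now + lifetimeMillis;
            fetchCount++;
        }
        return current;
    }

    synchronized int fetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        RenewingCredential<String> cred = new RenewingCredential<>(
            () -> "proxy-" + System.nanoTime(), 60_000L, 5_000L, System::currentTimeMillis);
        System.out.println(cred.get());
        System.out.println(cred.get().equals(cred.get())); // same cached proxy while still valid
    }
}
```

Callers never check expiry themselves, which is the property that makes a single shared instance practical for a community account.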
SimpleTran
- A wrapper of GridFTP
- Threaded implementation
  - Allows responsive interactions between portal and client browser
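
Why a threaded transfer keeps the portal responsive can be shown with a small sketch: the copy runs on a worker thread and the caller receives a future immediately. A local file copy stands in for GridFTP here, and the class BackgroundTransfer with its methods is hypothetical, not part of SimpleTran.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a local file copy stands in for a GridFTP transfer.
class BackgroundTransfer {
    // Starts the copy on the pool and returns at once; the future yields the bytes copied.
    static CompletableFuture<Long> transferAsync(Path source, Path target, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                return Files.size(target);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, pool);
    }

    // Writes content to a temp file, transfers it in the background, and reads the copy back.
    static String roundTrip(String content) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Path source = Files.createTempFile("dataset", ".txt");
            Path target = Files.createTempFile("copied", ".txt");
            Files.write(source, content.getBytes(StandardCharsets.UTF_8));
            CompletableFuture<Long> pending = transferAsync(source, target, pool);
            // The calling thread is free here, e.g. to render a "transfer in progress" page.
            pending.join();
            String copied = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
            Files.delete(source);
            Files.delete(target);
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("x,y,value\n1,2,3\n"));
    }
}
```

In a portlet, the request handler would store the future in the session and report its state on later page loads instead of calling join().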
SimpleRun
- A wrapper of GRAM and WS-GRAM
  - Supports both GT2 and GT4 job submission
  - User selectable
- Depends on SimpleTran to transfer datasets
- Programming interface
  - execute()
  - getStatus()
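
The execute() and getStatus() pair suggests a job handle that is submitted once and then polled. The sketch below shows that pattern with a plain Runnable in place of a Grid job. The class SketchJob and its status names are assumptions for illustration; they loosely follow common job-state naming and are not the SimpleRun or GRAM API.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical job handle: submit once with execute(), then poll getStatus().
class SketchJob {
    enum Status { UNSUBMITTED, PENDING, ACTIVE, DONE, FAILED }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.UNSUBMITTED);
    private final Runnable work; // stands in for the remote executable

    SketchJob(Runnable work) {
        this.work = work;
    }

    // Hands the work to the executor; a second call on the same job is rejected.
    void execute(Executor executor) {
        if (!status.compareAndSet(Status.UNSUBMITTED, Status.PENDING)) {
            throw new IllegalStateException("job already submitted");
        }
        executor.execute(() -> {
            status.set(Status.ACTIVE);
            try {
                work.run();
                status.set(Status.DONE);
            } catch (RuntimeException e) {
                status.set(Status.FAILED);
            }
        });
    }

    // Safe to call from any thread, e.g. from a portlet refreshing a status page.
    Status getStatus() {
        return status.get();
    }

    public static void main(String[] args) {
        SketchJob job = new SketchJob(() -> System.out.println("running analysis"));
        job.execute(Runnable::run); // inline executor: runs on the calling thread
        System.out.println(job.getStatus());
    }
}
```

Passing a thread pool instead of the inline executor makes the submission non-blocking, and the caller then observes the intermediate states through getStatus().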
SimpleViz
- Visualization mechanisms
  - JFreeChart
  - Google Maps
  - ParaView (under development)
- Threaded implementation
- Portlet interface
  - Google Maps-based JavaScript library
Portlet Components and Interfaces
- UserPortlet
  - User information and Grid credential management
  - Interface: JSP
  - Portlet: GridSphere ActionPortlet
- DMSPortlet
  - A typical scientific computational analysis process
  - Interface: Velocity
  - Portlet: VelocityPortlet
- Portlet container
  - GridSphere
- http://www.collab-ogce.org/ogce2/velocity-portlets.html
Case Study
- Two-dimensional spatial interpolation in Geographic Information Systems
  - Nearest-neighbor search procedure
  - Compute-intensive for large spatial datasets and/or high-resolution interpolation
- A fast two-dimensional spatial interpolation algorithm called DMS (Dynamically Memorized Search)
- Parameter-sweeping application for sensitivity analysis
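
To make the cost of the nearest-neighbor search concrete, the sketch below interpolates a value from the k nearest samples using inverse-distance weighting and sorts all samples for every query. It is a brute-force baseline chosen for illustration and is not the DMS algorithm, whose details are not given in these slides. The class name, the weighting scheme, and the choice of k as the swept parameter are all assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

// Brute-force baseline for illustration only; this is NOT the DMS algorithm.
class NaiveInterpolation {
    // Squared Euclidean distance from a sample {x, y, value} to the query point.
    static double dist2(double[] sample, double x, double y) {
        double dx = sample[0] - x;
        double dy = sample[1] - y;
        return dx * dx + dy * dy;
    }

    // Inverse-squared-distance weighted mean of the k nearest samples.
    static double interpolate(double[][] samples, double x, double y, int k) {
        if (samples.length == 0 || k < 1) {
            throw new IllegalArgumentException("need at least one sample and k >= 1");
        }
        double[][] byDistance = samples.clone();
        // Sorting every sample for every query is the cost a faster search avoids.
        Arrays.sort(byDistance, Comparator.comparingDouble(s -> dist2(s, x, y)));
        int n = Math.min(k, byDistance.length);
        double weightSum = 0.0;
        double valueSum = 0.0;
        for (int i = 0; i < n; i++) {
            double d2 = dist2(byDistance[i], x, y);
            if (d2 == 0.0) {
                return byDistance[i][2]; // query coincides with a sample
            }
            double w = 1.0 / d2;
            weightSum += w;
            valueSum += w * byDistance[i][2];
        }
        return valueSum / weightSum;
    }

    public static void main(String[] args) {
        double[][] samples = { {0, 0, 10}, {2, 0, 20}, {0, 2, 30}, {2, 2, 40} };
        // A tiny parameter sweep: the same query under different neighbor counts.
        for (int k : new int[] {1, 2, 4}) {
            System.out.println("k=" + k + " -> " + interpolate(samples, 0.5, 0.5, k));
        }
    }
}
```

On a Grid, each parameter value in such a sweep can run as an independent job, which is what makes the sensitivity analysis a natural fit for parallel submission.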
TeraGrid-Based DMS Analysis
- Request an individual or community account on TeraGrid
- Install DMS executables on three TeraGrid sites
- Prepare a dataset on a local machine
- Transfer a specified dataset to a TeraGrid site (e.g., NCSA)
- Submit a Grid job to the specified TeraGrid site with a parameter value
  - The submitted job is scheduled to be executed on one compute node on the specified TeraGrid cluster
  - When the job is finished, the analysis result is written into the data directory of the DMS installation on the TeraGrid cluster
- Transfer the result back to the local machine
- Visualize the result using the DMS visualization tool
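
The run-time part of these steps (transfer the dataset, submit the job, wait for completion, transfer the result back) can be sketched as a small driver. The GridSteps interface, its method names, and the in-memory RecordingGrid are hypothetical and exist only so the sequence can be run without a Grid; a real gateway would back them with transfer and job-submission services.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver for the transfer / submit / wait / fetch sequence.
class DmsWorkflowSketch {
    // Assumed operations; names and signatures are illustrative only.
    interface GridSteps {
        void upload(String localDataset, String site);
        String submit(String site, double parameter); // returns a job id
        boolean isFinished(String jobId);
        void download(String jobId, String localResult);
    }

    // Runs the sequence and returns how many polls saw an unfinished job.
    static int run(GridSteps grid, String dataset, String site,
                   double parameter, String localResult, int maxPolls) {
        grid.upload(dataset, site);
        String jobId = grid.submit(site, parameter);
        int polls = 0;
        while (!grid.isFinished(jobId)) {
            // A real client would sleep between polls.
            if (++polls > maxPolls) {
                throw new IllegalStateException("job " + jobId + " did not finish");
            }
        }
        grid.download(jobId, localResult);
        return polls;
    }

    // In-memory fake that records each call and finishes after a set number of polls.
    static class RecordingGrid implements GridSteps {
        final List<String> log = new ArrayList<>();
        private int unfinishedPolls;

        RecordingGrid(int unfinishedPolls) {
            this.unfinishedPolls = unfinishedPolls;
        }

        public void upload(String localDataset, String site) {
            log.add("upload " + localDataset + " -> " + site);
        }

        public String submit(String site, double parameter) {
            log.add("submit " + site + " " + parameter);
            return "job-1";
        }

        public boolean isFinished(String jobId) {
            return unfinishedPolls-- <= 0;
        }

        public void download(String jobId, String localResult) {
            log.add("download " + jobId + " -> " + localResult);
        }
    }

    public static void main(String[] args) {
        RecordingGrid grid = new RecordingGrid(2);
        run(grid, "sample.dat", "ncsa", 0.5, "result.dat", 10);
        grid.log.forEach(System.out::println);
    }
}
```

Visualization is left out of the driver because it happens locally after the result file has been transferred back.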
DMS Analysis Portlet
Case Study Summary
- 16 participants
  - Various levels of software development experience and Grid computing knowledge
- In 2.5 hours, all participants, including those with minimal Java programming knowledge
  - Mastered the SimpleGrid APIs for the DMS analysis
  - Successfully set up a portlet for the analysis in a GridSphere portal server
Concluding Discussion
- The SimpleGrid toolkit
  - Makes an abstraction of generic Grid middleware services
  - Enables science gateway developers to concentrate on developing PSE by working on reusable and extensible software components
  - Hides the complexity of evolving web portal technologies by tailoring to application requirements for developing PSE
- Service-oriented architecture
- Component-based framework
  - Simplifies science gateway development
  - Helps overcome the learning curve of science gateway technologies
Ongoing Work
- APIs
  - Grid-based visualization
  - SimpleInfo
  - Workflow
- Automation tools
  - Enable automatic application integration as science gateway portal components (portlets)
  - User interface definition and generation
  - Workflow code stubs and Grid-related server-side code skeletons
Acknowledgements
- CyberInfrastructure and Geospatial Information Laboratory (CIGI)
- National Center for Supercomputing Applications (NCSA)
- NSF TeraGrid
Demo