The Kepler Actor Repository: Enabling Remote Storage, Query and Retrieval of Workflow Components
Chad Berkley
National Center for Ecological Analysis and Synthesis (NCEAS), University of California, Santa Barbara
February 13, 2007, Berkeley, California
Purpose of the Repository
- Provide an easy way for workflow authors to share components
- Serve as a common archive for components
- Enable strong versioning
- Let workflow components serve as metadata for research papers
- Help with lineage tracking
(Diagram: Repository, Client, Component)
Functional Requirements
- Kepler users should be able to easily locate and use components
- Kepler users should be able to easily add components to the archive
- Users should be able to restrict access to their components
- Components should be browsable or searchable
Important Differences Between Kepler and Ptolemy
Kepler Object Manager (OM)
- A database of all objects registered with the system
- Objects are read in at startup
- The OM organizes objects based on an ontology
Kepler Objects
- Each component has one or more semantic types
- Domain-specific ordering via semantic types/ontologies
- Each object has a unique LSID (Life Science Identifier)
Ontology: a specification of a conceptualization within a knowledge domain
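An LSID is a URN of the form urn:lsid:authority:namespace:object, optionally followed by :revision. A minimal Java sketch of parsing and comparing such identifiers is shown below; the class Lsid, its fields and the example.org authority are hypothetical and are not Kepler's own LSID implementation. The optional revision is what lets two versions of the same component share authority, namespace and object ID while remaining distinct.

```java
// Hypothetical value class for an LSID: urn:lsid:authority:namespace:object[:revision]
final class Lsid {
    final String authority;
    final String namespace;
    final String object;
    final String revision; // null when the LSID carries no revision

    private Lsid(String authority, String namespace, String object, String revision) {
        this.authority = authority;
        this.namespace = namespace;
        this.object = object;
        this.revision = revision;
    }

    static Lsid parse(String text) {
        // Limit -1 keeps trailing empty strings, so "a:b:" is rejected below.
        String[] parts = text.split(":", -1);
        boolean prefixOk = parts.length >= 5
            && parts[0].equalsIgnoreCase("urn")
            && parts[1].equalsIgnoreCase("lsid");
        if (!prefixOk || parts.length > 6) {
            throw new IllegalArgumentException("Not an LSID: " + text);
        }
        for (int i = 2; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                throw new IllegalArgumentException("Empty LSID component in: " + text);
            }
        }
        String rev = parts.length == 6 ? parts[5] : null;
        return new Lsid(parts[2], parts[3], parts[4], rev);
    }

    // LSIDs differing only in revision name different versions of one object.
    boolean sameObjectAs(Lsid other) {
        return authority.equals(other.authority)
            && namespace.equals(other.namespace)
            && object.equals(other.object);
    }

    @Override
    public String toString() {
        return "urn:lsid:" + authority + ":" + namespace + ":" + object
            + (revision == null ? "" : ":" + revision);
    }
}
```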
Kepler Archive (KAR) Files
- Used for component transfer and archiving
- The OM can create or ingest KAR files
- A KAR consists of actor metadata, a manifest and, eventually, class/jar files
- Each object's LSID is listed in the manifest
- The KAR itself has an LSID
- KAR files are used to transport components between the client and the server
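One way to picture a KAR is as a JAR-style archive whose manifest carries an LSID for the archive as a whole and one for each contained object. The sketch below uses only the standard java.util.jar API; the manifest attribute name lsid, the entry name actor.xml and the class KarSketch are assumptions for illustration, not the actual KAR layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical KAR-like archive: a JAR whose manifest records LSIDs.
final class KarSketch {
    // Hypothetical manifest attribute used to store LSIDs.
    static final Attributes.Name LSID = new Attributes.Name("lsid");

    // Builds an archive with one metadata entry and returns its bytes.
    static byte[] write(String karLsid, String entryName, String entryLsid, String metadataXml) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(LSID, karLsid);                 // the KAR's own LSID
        Attributes perEntry = new Attributes();
        perEntry.put(LSID, entryLsid);           // LSID of the contained object
        manifest.getEntries().put(entryName, perEntry);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            jar.putNextEntry(new JarEntry(entryName));
            jar.write(metadataXml.getBytes(StandardCharsets.UTF_8));
            jar.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Maps entry names to LSIDs; the key "" holds the KAR's own LSID.
    static Map<String, String> readLsids(byte[] kar) {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(kar))) {
            Manifest manifest = jar.getManifest();
            Map<String, String> result = new HashMap<>();
            result.put("", manifest.getMainAttributes().getValue(LSID));
            for (Map.Entry<String, Attributes> e : manifest.getEntries().entrySet()) {
                result.put(e.getKey(), e.getValue().getValue(LSID));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```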
Architecture
The Repository
- All services are provided via the EarthGrid (formerly known as the EcoGrid)
- Web services: Get/Query, Put, Auth
- A web interface allows users to search for and download components outside of Kepler
Component Storage
- KAR files
- A metadata file, external to the KAR, used for indexing
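The Get/Query and Put services, together with the separate metadata index, can be sketched as a small Java interface with an in-memory implementation. ComponentRepository and InMemoryRepository are hypothetical names, and the sketch does not reproduce the actual EarthGrid web-service interface; it only shows the split between stored KAR bytes and the metadata that queries run against. Auth is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical shape of the repository services; not the real EarthGrid API.
interface ComponentRepository {
    void put(String lsid, String metadata, byte[] kar); // Put
    byte[] get(String lsid);                            // Get; null if absent
    List<String> query(String keyword);                 // Query; returns LSIDs
}

// In-memory stand-in: KAR bytes and searchable metadata are stored separately,
// mirroring a metadata file kept outside the KAR for indexing.
final class InMemoryRepository implements ComponentRepository {
    private final Map<String, byte[]> karStore = new TreeMap<>();
    private final Map<String, String> metadataIndex = new TreeMap<>();

    @Override
    public void put(String lsid, String metadata, byte[] kar) {
        karStore.put(lsid, kar.clone());
        metadataIndex.put(lsid, metadata);
    }

    @Override
    public byte[] get(String lsid) {
        byte[] kar = karStore.get(lsid);
        return kar == null ? null : kar.clone();
    }

    @Override
    public List<String> query(String keyword) {
        // Searches only the metadata index, never the KAR contents.
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : metadataIndex.entrySet()) {
            if (e.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(e.getKey()); // TreeMap keeps results sorted by LSID
            }
        }
        return hits;
    }
}
```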
Client Interface
Object Manager
- Handles local get/put/query
- Handles remote get/put/query through the EarthGrid interface
- The user right-clicks on a component to upload it
- Remote search results are integrated into the actor library
- The user drags a component from the search results to download it
- The internal database of LSIDs is synced to the server via the EarthGrid interfaces
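The local-first behavior of the Object Manager and the LSID synchronization could look roughly like the sketch below. RemoteCatalog stands in for the EarthGrid calls; it and ObjectManagerSketch are hypothetical names, and the real Object Manager API is not shown. The sketch assumes that synchronization means comparing the sets of LSIDs known on each side.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical remote side: only the two calls this sketch needs.
interface RemoteCatalog {
    Set<String> listLsids();
    String fetchMetadata(String lsid); // null if unknown
}

// Hypothetical Object Manager: answers from the local database first,
// falls back to the remote catalog, and caches what it fetches.
final class ObjectManagerSketch {
    private final Map<String, String> local = new HashMap<>();
    private final RemoteCatalog remote;

    ObjectManagerSketch(RemoteCatalog remote) { this.remote = remote; }

    void putLocal(String lsid, String metadata) { local.put(lsid, metadata); }

    String get(String lsid) {
        String found = local.get(lsid);
        if (found == null) {
            found = remote.fetchMetadata(lsid);
            if (found != null) {
                local.put(lsid, found);
            }
        }
        return found;
    }

    // LSIDs known on the server but not yet in the local database.
    Set<String> missingLocally() {
        Set<String> diff = new TreeSet<>(remote.listLsids());
        diff.removeAll(local.keySet());
        return diff;
    }

    // LSIDs registered locally that the server has not seen.
    Set<String> missingRemotely() {
        Set<String> diff = new TreeSet<>(local.keySet());
        diff.removeAll(remote.listLsids());
        return diff;
    }
}
```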
Uploading and Searching
Downloading to the Client
- Remote components are downloaded when dragged to the canvas
- For initial display (in the results tree), only the actor metadata is loaded
- After the initial download, the KAR file is cached
- We want and need dynamic class loading to make this more useful (back to that in a minute)
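The two-stage download (metadata for the results tree, then the full KAR on drag, then a cached copy) amounts to a lazy, memoizing fetch. In the sketch below, RemoteComponent, KarCache and the downloader function are hypothetical names; the downloader stands in for whatever network call actually retrieves the KAR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Downloads each KAR at most once and serves later requests from the cache.
final class KarCache {
    private final Map<String, byte[]> cached = new HashMap<>();
    private final Function<String, byte[]> downloader; // hypothetical network call
    int downloads = 0;                                  // counter for illustration only

    KarCache(Function<String, byte[]> downloader) { this.downloader = downloader; }

    // synchronized so that two concurrent drags do not download the same KAR twice.
    synchronized byte[] getKar(String lsid) {
        return cached.computeIfAbsent(lsid, id -> {
            downloads++;
            return downloader.apply(id);
        });
    }
}

// Hypothetical search result: holds only metadata until dropped on the canvas.
final class RemoteComponent {
    final String lsid;
    final String metadata; // what the results tree displays
    private final KarCache cache;

    RemoteComponent(String lsid, String metadata, KarCache cache) {
        this.lsid = lsid;
        this.metadata = metadata;
        this.cache = cache;
    }

    // Called when the component is dragged onto the canvas.
    byte[] materialize() { return cache.getKar(lsid); }
}
```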
Authentication
- Uses the EarthGrid interface
- The backend is currently LDAP
- Currently, a component can be either public or private, but control could be finer-grained
- The authentication interface in Kepler is extensible and allows for other authentication schemes, such as GAMA
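The current public/private model and the pluggable authentication interface can be sketched as follows. AuthScheme, ComponentAccess and Visibility are hypothetical names; an LDAP- or GAMA-backed scheme would be one implementation of AuthScheme and is not shown. Finer-grained control could replace the two-valued Visibility with a per-user access list.

```java
// Hypothetical pluggable authentication: LDAP today, other schemes later.
@FunctionalInterface
interface AuthScheme {
    // Returns the authenticated user name, or null if the credentials are rejected.
    String authenticate(String user, String password);
}

// Two-valued access model matching the slide: public or private.
enum Visibility { PUBLIC, PRIVATE }

final class ComponentAccess {
    final String owner;
    final Visibility visibility;

    ComponentAccess(String owner, Visibility visibility) {
        this.owner = owner;
        this.visibility = visibility;
    }

    // authenticatedUser is null for anonymous requests.
    boolean canRead(String authenticatedUser) {
        return visibility == Visibility.PUBLIC
            || (authenticatedUser != null && authenticatedUser.equals(owner));
    }

    // Authenticates through whichever scheme is plugged in, then applies the access rule.
    static boolean mayDownload(AuthScheme auth, String user, String password, ComponentAccess c) {
        String who = (user == null) ? null : auth.authenticate(user, password);
        return c.canRead(who);
    }
}
```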
Documentation
- Kepler uses the Ptolemy documentation system
- To display docs remotely, Kepler uses a custom attribute and inserts the docs directly into the actor metadata on the server
- The docs can then be transformed for viewing on the website
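Inserting documentation into the actor metadata on the server can be illustrated with the DOM and Transformer APIs that ship with the JDK. The element name property, the attribute value remoteDocumentation and the class DocEmbedder are assumptions for illustration; Kepler's actual custom documentation attribute is not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical server-side step: embed documentation text into actor metadata XML
// so the website can later transform it for display.
final class DocEmbedder {
    static final String DOC_PROPERTY = "remoteDocumentation"; // hypothetical name

    static String embed(String actorXml, String docText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(actorXml.getBytes(StandardCharsets.UTF_8)));
            Element prop = doc.createElement("property");
            prop.setAttribute("name", DOC_PROPERTY);
            prop.setTextContent(docText); // markup characters are escaped on output
            doc.getDocumentElement().appendChild(prop);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not embed documentation", e);
        }
    }
}
```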
Viewing Documentation
Future Work
- Use the repository as the main storage location for components instead of shipping Kepler with an extensive library
- Make the web interface more usable
- Dynamic class loading…
Dynamic Class Loading
Motivation: allow multiple versions of the same class to be used in one workflow execution.
Problems
- Loading classes isn't too hard, but reloading classes requires discarding the entire classloader.
- Two different actors may use two different versions of the same class.
- We don't always want to use the already-loaded class with the same name.
- Not always using the preloaded Java classes raises security issues.
Potential Solutions
- Create a custom classloader that allows reloading and co-loading
- Create a new classloader for each loaded class (which can be discarded if necessary)
Suggestions?
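The second proposed solution, a separate classloader per component, can be sketched with the JDK's URLClassLoader. The registry class ComponentLoaders and the use of LSIDs as keys are assumptions for illustration. The JVM unloads classes only when their defining classloader becomes unreachable, which is why the loader, not the individual class, is the unit of replacement.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: one URLClassLoader per component version, keyed by LSID.
final class ComponentLoaders {
    private final Map<String, URLClassLoader> loaders = new HashMap<>();
    private final ClassLoader parent;

    ComponentLoaders(ClassLoader parent) { this.parent = parent; }

    // Creates (or reuses) the loader for one component version's jars.
    synchronized ClassLoader loaderFor(String lsid, URL... jars) {
        return loaders.computeIfAbsent(lsid, id -> new URLClassLoader(jars, parent));
    }

    // Dropping the loader is the only way to "unload" its classes: once nothing
    // references the loader or its classes, they become eligible for collection.
    synchronized void discard(String lsid) {
        URLClassLoader loader = loaders.remove(lsid);
        if (loader != null) {
            try {
                loader.close(); // releases open jar files
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    synchronized int size() { return loaders.size(); }
}
```

URLClassLoader delegates to its parent first, so core classes such as java.lang.String always come from the parent. That keeps the security concern above in check, but it also means a class already visible to the parent wins over a component's own copy; supporting two versions of such a class would need a child-first loader that overrides loadClass, which is where the security trade-off reappears.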
More info:
This material is based upon work supported by the National Science Foundation under award and others.
SEEK Partner Institutions
- University of New Mexico
- Napier University, Edinburgh, Scotland
- University of Kansas
- University of Vermont
- University of California, Santa Barbara
- National Center for Ecological Analysis and Synthesis
- University of California, Davis
- Arizona State University
- University of North Carolina
- San Diego Supercomputer Center
Kepler Partner Projects
- SEEK
- Ptolemy
- ROADNet
- SDM/SPA
- SDM/CPES
- GEON
- Resurgence