Workflow Exchange and Archival: The KSW File and the Kepler Object Manager Shawn Bowers (For Chad Berkley & Matt Jones) University of California, Davis.

Slides:



Advertisements
Similar presentations
Fujitsu Laboratories of Europe © 2004 What is a (Grid) Resource? Dr. David Snelling Fujitsu Laboratories of Europe W3C TAG - Edinburgh September 20, 2005.
Advertisements

ECOOP Data Management System (T2.2/WP2) Declan Dunne 13 th February 2008, Athens.
DSpace Devika P. Madalli DRTC, ISI Bangalore.
Chad Berkley National Center for Ecological Analysis and Synthesis (NCEAS), University of California, Santa Barbara February.
Software Connectors. Attach adapter to A Maintain multiple versions of A or B Make B multilingual Role and Challenge of Software Connectors Change A’s.
1 CS 502: Computing Methods for Digital Libraries Lecture 22 Repositories.
GIS Actors in Kepler - Java-based, GDAL-JNI, and C++(Grass) Routines Dan Higgins - UC Santa Barbara (NCEAS) Chad Berkley – UC Santa Barbara (NCEAS) Jianting.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Improving Data Discovery in Metadata Repositories through Semantic Search Chad Berkley 1, Shawn Bowers 2, Matt Jones 1, Mark Schildhauer 1, Josh Madin.
Digital Library Architecture and Technology
January, 23, 2006 Ilkay Altintas
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Requirements for caBIG Infrastructure to Support Semantic Workflows Yolanda.
EARTH SCIENCE MARKUP LANGUAGE “Define Once Use Anywhere” INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE.
San Diego Supercomputer CenterUniversity of California, San Diego Preservation Research Roadmap Reagan W. Moore San Diego Supercomputer Center
THE GITB TESTING FRAMEWORK Jacques Durand, Fujitsu America | December 1, 2011 GITB |
Digital Object Architecture
Refactoring the EarthGrid SOAP API to REST style and implementing it to Metacat Serhan Akın Ph.D. candidate in Earth System Sciences Institute of Earth.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
Using the Open Metadata Registry (openMDR) to create Data Sharing Interfaces October 14 th, 2010 David Ervin & Rakesh Dhaval, Center for IT Innovations.
Cyberinfrastructure Overview Core Cyberinfrastructure Team Matthew B. Jones National Center for Ecological Analysis and Synthesis (NCEAS) University of.
Pipelines and Scientific Workflows with Ptolemy II Deana Pennington University of New Mexico LTER Network Office Shawn Bowers UCSD San Diego Supercomputer.
AN ORGANISATION FOR A NATIONAL EARTH SCIENCE INFRASTRUCTURE PROGRAM The Spatial Information Services Stack – infrastructure for the AuScope Community Earth.
Science Environment for Ecological Knowledge: EcoGrid Matthew B. Jones National Center for.
Semantic Mediation in SEEK/Kepler: Exploiting Semantic Annotation for Discovery, Analysis, and Integration of Scientific Data and Workflows Bertram Ludäscher.
Accomplishments and Remaining Challenges: THREDDS Data Server and Common Data Model Ethan Davis Unidata Policy Committee Meeting May 2011.
SEEK EcoGrid l Integrate diverse data networks from ecology, biodiversity, and environmental sciences l Metacat, DiGIR, SRB, Xanthoria,... l EML is the.
Chad Berkley NCEAS National Center for Ecological Analysis and Synthesis (NCEAS), University of California Santa Barbara Long Term Ecological Research.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Design of a Search Engine for Metadata Search Based on Metalogy Ing-Xiang Chen, Che-Min Chen,and Cheng-Zen Yang Dept. of Computer Engineering and Science.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Integrated Grid workflow for mesoscale weather modeling and visualization Zhizhin, M., A. Polyakov, D. Medvedev, A. Poyda, S. Berezin Space Research Institute.
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
Creating Archive Information Packages for Data Sets: Early Experiments with Digital Library Standards Ruth Duerr, NSIDC MiQun Yang, THG Azhar Sikander,
National Geospatial Digital Archive Greg Janée University of California at Santa Barbara.
The SEEK EcoGrid: A Data Grid System for Ecology Arcot Rajasekar Matthew Jones Bertram Ludäscher
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Ricardo Pereira Software Engineer TDWG Infrastructure Project (TIP)
Using Desktop Data in Kepler Dan Higgins – NCEAS Prepared for: Ecoinformatics Training for Ecologists LTER (Albuquerque) January 8-12, 2007
May 2003National Coastal Data Development Center Brief Introduction Two components Data Exchange Infrastructure (DEI) Spatial Data Model (SDM) Together,
Kepler includes contributors from GEON, SEEK, SDM Center and Ptolemy II, supported by NSF ITRs (SEEK), EAR (GEON), DOE DE-FC02-01ER25486.
Alternative Architecture for Information in Digital Libraries Onno W. Purbo
GEON2 and OpenEarth Framework (OEF) Bradley Wallet School of Geology and Geophysics, University of Oklahoma
RSISIPL1 SERVICE ORIENTED ARCHITECTURE (SOA) By Pavan By Pavan.
Presented by Jens Schwidder Tara D. Gibson James D. Myers Computing & Computational Sciences Directorate Oak Ridge National Laboratory Scientific Annotation.
Mercury – A Service Oriented Web-based system for finding and retrieving Biogeochemical, Ecological and other land- based data National Aeronautics and.
Ocean Observatories Initiative OOI Cyberinfrastructure Data Management Michael Meisinger & David Stuebe OOI Cyberinfrastructure Life Cycle Objectives Milestone.
1 Service Creation, Advertisement and Discovery Including caCORE SDK and ISO21090 William Stephens Operations Manager caGrid Knowledge Center February.
A Scalable Virtual Registry Service for jGMA Matthew Grove DSG Seminar 3 rd May 2005.
DSpace - Digital Library Software
SEEK Science Environment for Ecological Knowledge l EcoGrid l Ecological, biodiversity and environmental data l Computational access l Standardized, open.
REST By: Vishwanath Vineet.
Implementing PREMIS in DigiTool Michael Kaplan ALA 2007 Update.
OOI Cyberinfrastructure and Semantics OOI CI Architecture & Design Team UCSD/Calit2 Ocean Observing Systems Semantic Interoperability Workshop, November.
Steven Perry Dave Vieglais. W a s a b i Web Applications for the Semantic Architecture of Biodiversity Informatics Overview WASABI is a framework for.
Satisfying Requirements BPF for DRA shall address: –DAQ Environment (Eclipse RCP): Gumtree ISEE workbench integration; –Design Composing and Configurability,
Harokopio University of Athens – Department of Informatics and Telematics HAROKOPIOUNIVERSITY A Distributed Architecture for Building Federated Digital.
web Metadata Properties Thumbnails Properties by Kind MRU Change Notifications Bulk Access File broker functions Search Deep/Shallow Enumeration.
Store and exchange data with colleagues and team Synchronize multiple versions of data Ensure automatic desktop synchronization of large files B2DROP is.
EcoGrid in SEEK A Data Grid System for Ecology Bertram Ludaescher University of California, Davis Arcot Rajasekar San Diego Supercomputer Center, University.
Improving Data Discovery Through Semantic Search
An Overview of Data-PASS Shared Catalog
Flanders Marine Institute (VLIZ)
CS 501: Software Engineering Fall 1999
NSDL Data Repository (NDR)
Session 2: Metadata and Catalogues
LOD reference architecture
A Semantic Type System and Propagation
Information Services for Dynamically Assembled Semantic Grids
Presentation transcript:

Workflow Exchange and Archival: The KSW File and the Kepler Object Manager Shawn Bowers (For Chad Berkley & Matt Jones) University of California, Davis May, 2005

Outline 1.The Kepler Object Manager 2.Archival and Exchange via KSW Files 3.Local Object Cache 4.Work in Progress (literally)

Motivation In SEEK (and generally, Kepler) we: Envision a large number of science-specific actors E.g., ecology, biodiversity, geoscience, bioinformatics, chemistry Envision a large number of available data sets Some may be fairly large; already various “collections” (e.g., LTER, GEON) Want to enable search / discovery of actors, datasets, and workflows Want to enable wide-scale sharing and re-use of data and workflows (and their components) across disciplines … and have desired more than CVS to accomplish these goals

Managing distributed collections of workflow “objects” Remote WFs/ActorsDatasets WFs/Actors Local Download Publish Search Archive WFs Actors Data Archived WFs Actors Data Archived WFs Kepler Object Manager

Main Issues and Goals Handle various kinds of objects –Workflows, Actors, Datasets, Libraries, Applications, Ontologies, Metadata, … Transport objects –Publish to and download from remote repositories –Recognize local versus remote objects Search for objects –Both local and remote repositories w/in Kepler Version objects –E.g., managing conflicts in dependency chains Enable “functional” groups –Core group plus packages, e.g., biodiversity, geospatial, R, etc. Kepler Object Manager WFs/ActorsDatasets

Overall Architecture Life-Science Identifiers (LSIDs) –Similar to DOIs (urn’s, etc.) –Includes (some) versioning support –Well-defined APIs Kepler Scientific Workflow (KSW) Files –An archive/jar file of objects –KSW Metadata (a la MoML) –For publish/download/archive/groups EcoGrid –Remote Access/Query –A “thin” client / protocol Local Object Repository –Object retrieval, indexing, etc. Local Object Cache –Managing local vs. remote objects –Dynamic class loader Kepler Object Manager WFs/ActorsDatasets KSW Cache Manager Local Repository

An archive/jar file of objects –Actor code and metadata, libraries; Dataset, metadata, etc. –A manifest (what’s in the file) All objects in KSW files have associated LSID identifiers –An LSID is a urn, e.g., urn:lsid:kepler.org:actor:1000:2 Objects have associated metadata files (basically MoML) –Distinguish between “object definitions” and “object references” Objects within a KSW file have corresponding definitions Dependent objects not included are denoted via references –Semantic types, dependency information, ports and types, etc. KSW Files AuthorityNamespaceOIDVersion (opt.)

KSW Metadata example (notional) Kepler Object Manager WFs/ActorsDatasets KSW-File (jar) Cache Manager Local Repository Publish New Workflow Creates New Actor Classes New Libraries KSW Metadata (MoML + ID) Workflow Metadata (MoML + ID) id refs Manifest NOTE: Only *new* classes/libraries/etc. are included … known via object manager Unpacks

Local Object Cache Helps manage LSIDs –Creates them locally (for local actors, datasets, etc.) –Resolves LSIDs (both local and remote) –Handles publication (local id -> repository id, etc.) Support for packing/unpacking KSW files Provides simple database-like capabilities –“Persistent” storage / file indexing –Result caching (e.g., for remote queries) –Temporary files –Object indexing (e.g., dependency graphs, etc.) Handles multiple formats (within Kepler) –Stream versus File access –Representation conversion (e.g., binary interleaved, ascii grid)

Work In Progress Still much to do … –Still finalizing Object-Manager interfaces / APIs –The object cache is partly implemented –Simple KSW files can be packed / unpacked –The KSW metadata format still being finalized Working on MoML parsing, handling ids, etc. Defining properties, like semtypes, dependencies, refs, etc. We have a simple implementation of lsids for actor lib. Our goal is to leverage Ptolemy’s strengths … –Most of what we have added / plan to add is layered on top of Ptolemy –We want well-defined, generic interfaces –We are soliciting volunteers !