The SEEK EcoGrid: A Data Grid System for Ecology Arcot Rajasekar Matthew Jones Bertram Ludäscher

Presentation transcript:

The SEEK EcoGrid: A Data Grid System for Ecology Arcot Rajasekar Matthew Jones Bertram Ludäscher UC DAVIS Department of Computer Science San Diego Supercomputer Center

Science Environment for Ecological Knowledge
Large collaborative NSF/ITR ( )
Bringing together ecologists, IT experts, CS researchers, …
SEEK.ecoinformatics.org

SWDB, Aug 29, 2004
What is SEEK?
Multidisciplinary research project to facilitate …
Access to ecological, environmental, and biodiversity data
– Enable data sharing & re-use
– Enhance data discovery at global scales
Scalable analysis and synthesis
– Taxonomic, spatial, temporal, and conceptual integration of data, addressing data heterogeneity issues
– Enable communication and collaboration for analysis
– Enable re-use of analytical components

SEEK Components
Main components:
Kepler
– Problem-solving environment for scientific data analysis and visualization → “scientific workflows”
EcoGrid
– Distributed data network for environmental, ecological, and systematics data
– Making diverse environmental data systems interoperate
Semantic Mediation System
– “Smart” data discovery and integration
Knowledge Representation WG
Taxon WG
BEAM WG
Education, Outreach, Training

Ecological Metadata Language
Metadata: a means to manage ecological data
– There is no universal data model for ecology
– Accommodate heterogeneity and dispersion
EML
– Common language for archiving and transporting data
– Discovery information: Creator, Title, Abstract, Keyword, etc.
– Content
– Context
– Physical, logical structure
SEEK adds semantic structure

An Example EML Document
Alegria Temperatures
PISCO: Intertidal Temperature Data: Alegria, California:
Carol Blanchette
PISCO
UCSB Marine Science Institute Santa Barbara CA
These temperature data were collected at Alegria Beach, California, and were...
OceanographicSensorData
Thermistor
PISCOCategories
Please contact the authors for permission to use these data. Please also acknowledge the authors in any publications.
C.Blanchette
Transform
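The markup of the example document did not survive in this transcript, so the following is only a minimal sketch of how the discovery fields listed earlier (creator, title, abstract, keywords, usage rights) might be assembled into an XML record. The element names and nesting are assumptions chosen for illustration and should be checked against the EML schema; the function build_metadata is hypothetical. The field values are taken from the slide text.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, creator, organization, abstract, keywords, rights):
    """Build a small EML-like discovery record.

    Element names here are illustrative assumptions, not a validated EML layout.
    """
    root = ET.Element("dataset")
    ET.SubElement(root, "title").text = title
    creator_el = ET.SubElement(root, "creator")
    ET.SubElement(creator_el, "individualName").text = creator
    ET.SubElement(creator_el, "organizationName").text = organization
    ET.SubElement(root, "abstract").text = abstract
    keyword_set = ET.SubElement(root, "keywordSet")
    for keyword in keywords:
        ET.SubElement(keyword_set, "keyword").text = keyword
    ET.SubElement(root, "intellectualRights").text = rights
    return root


doc = build_metadata(
    title="PISCO: Intertidal Temperature Data: Alegria, California",
    creator="Carol Blanchette",
    organization="PISCO",
    abstract="These temperature data were collected at Alegria Beach, California, and were...",
    keywords=["OceanographicSensorData", "Thermistor"],
    rights="Please contact the authors for permission to use these data.",
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Keeping discovery fields in predictable elements is what lets a catalog index title, creator, and keywords without understanding the data table itself.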

SEEK Overview

EcoGrid Focus
Data and Metadata
– Distributed Data
– XML-based Metadata
Service to Semantic Mediation Layer
– Access to Ontologies and Taxon Services
– Helping with Semantic Data Integration
Service to Analysis and Modelling Layer
– Interaction with Kepler (workflows)
– Interaction with Grid Computing Facilities
Access to Legacy Apps
– LifeMapper
– Spatial Data Workbench

SEEK EcoGrid
Goal: allow diverse environmental data systems to interoperate
– Hides complexity of underlying systems using lightweight interfaces
– Integrates diverse data networks from ecology, biodiversity, and environmental sciences
Data systems
– Any system can implement these interfaces
– Prototyping using: Metacat, SRB, DiGIR, Xanthoria, etc.
Supports multiple metadata standards
– EML, Darwin Core as foci

Web services
Service Oriented Architecture (SOA)
– Remote discovery and execution of services
Network transport of data (HTTP)
Message format (SOAP/XML)
Service interface description (WSDL)
[Diagram: Morpho; steps 1, 2, 3]

Grid Services
A Grid service is a Web service plus:
– Lifecycle management (persisting the service over outages)
– State management (tracking sessions across multiple requests)
– Factory services (allowing many clients to connect)
– Security (authorization)
– …
EcoGrid defines a standard set of grid interfaces for use by many data servers

EcoGrid Example
[Diagram: Morpho, the EcoGrid Registry, and an EcoGrid service exposing query() and get()]
EcoGrid WSDL
– query(session, query)
– get(session, identifier)
Steps
1. Publish
2. Find service
3. Return service description
4. Execute search, handle response
5. Execute get, handle response
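The five-step exchange on this slide can be sketched with in-memory stand-ins. This is an illustration under stated assumptions, not the EcoGrid implementation: Node, Registry, the node name "demo-node", and the stored objects are all hypothetical, and a real deployment would exchange SOAP messages described by WSDL rather than call Python methods directly.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for one EcoGrid data node (for example a Metacat or SRB wrapper)."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> bytes

    def query(self, session, text):
        # Return identifiers whose content mentions the search text.
        needle = text.lower().encode()
        return sorted(i for i, data in self.objects.items() if needle in data.lower())

    def get(self, session, identifier):
        return self.objects[identifier]


class Registry:
    """Stand-in for the EcoGrid Registry: nodes publish, clients find."""

    def __init__(self):
        self._nodes = {}

    def publish(self, node):  # step 1: a node publishes its service
        self._nodes[node.name] = node

    def find(self, name):  # steps 2-3: client finds the service, registry returns it
        return self._nodes[name]


registry = Registry()
registry.publish(Node("demo-node", {"doc.1": b"Peromyscus leucopus", "doc.2": b"Quercus alba"}))

service = registry.find("demo-node")
hits = service.query("session-1", "peromyscus")  # step 4: execute search
payload = service.get("session-1", hits[0])      # step 5: execute get
print(hits, payload)
```

The client only ever depends on query() and get(), which is why any data system that implements those two operations can join.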

EcoGrid Query Interfaces
Provides a mechanism for search and retrieval of metadata and federated data
– Supports third-party interaction with search results: forwarding of result set identifiers to another service instance for retrieval
Different levels of compliance
– Low barrier for participation
– Bulk of data will be accessible through Type I
[Diagram labels: Result, Query]

EcoGrid Query Level I
Basic, entry-level exposure of data and metadata for EcoGrid and SEEK
Response contains data; intended for direct communications rather than 3rd-party indirection
ResultsetType query(SessionID, QueryType)
byte[] get(SessionID, objectID)
[Diagram labels: Result, Query]
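As a rough illustration of the two Level I operations, the sketch below renders the slide's signatures as a Python Protocol and pairs it with a toy node whose query response carries the data directly. The class names, the dictionary-shaped result records, and the second sample record are assumptions made for the example; the actual ResultsetType and QueryType are XML structures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class QueryLevelOne(Protocol):
    """Python rendering of the two Level I operations named on the slide."""

    def query(self, session_id: str, query: str) -> list: ...

    def get(self, session_id: str, object_id: str) -> bytes: ...


class ToyNode:
    """Hypothetical node whose query response carries the data itself."""

    def __init__(self, records: dict):
        self._records = records  # object_id -> bytes

    def query(self, session_id: str, query: str) -> list:
        # Level I style: each hit includes the matching data, not only an ID.
        return [
            {"object_id": oid, "data": data}
            for oid, data in sorted(self._records.items())
            if query.encode() in data
        ]

    def get(self, session_id: str, object_id: str) -> bytes:
        return self._records[object_id]


node = ToyNode({"mvz1": b"PEROMYSCUS LEUCOPUS", "mvz2": b"QUERCUS ALBA"})
resultset = node.query("session-1", "PEROMYSCUS")
print(isinstance(node, QueryLevelOne), resultset)
```

Because the response already contains the data, a client can use the result without a second round trip, which fits the slide's point about direct communication.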

Query Conditions
Language-independent representation of a query structure
Transformed into the appropriate native language of the data store
Example:
<condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
[Diagram labels: NULL, Query]
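To make the "transformed into the native language of the data store" step concrete, here is a minimal sketch that turns the condition element from the slide into a parameterized SQL statement and runs it against an in-memory SQLite table. The function condition_to_sql, the operator whitelist, the table and column names, and the sample rows are all assumptions for illustration; they are not part of the EcoGrid query schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

ALLOWED_OPERATORS = {"LIKE", "=", "<", ">"}  # assumed whitelist, not from the EcoGrid schema


def condition_to_sql(condition_xml, table, column_for_concept):
    """Translate one <condition> element into (sql, params) for a SQL store.

    `table` and the column names come from trusted configuration; only the
    condition value is user input, so it is passed as a bound parameter.
    """
    cond = ET.fromstring(condition_xml)
    operator = cond.get("operator")
    concept = cond.get("concept")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if concept not in column_for_concept:
        raise ValueError(f"unknown concept: {concept!r}")
    column = column_for_concept[concept]
    return f"SELECT * FROM {table} WHERE {column} {operator} ?", (cond.text,)


condition = '<condition operator="LIKE" concept="ScientificName">peromyscus%</condition>'
sql, params = condition_to_sql(condition, "specimens", {"ScientificName": "scientific_name"})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimens (scientific_name TEXT)")
conn.executemany(
    "INSERT INTO specimens VALUES (?)",
    [("Peromyscus leucopus",), ("Quercus alba",)],
)
rows = conn.execute(sql, params).fetchall()
print(sql, params, rows)
```

A wrapper for a non-relational store would keep the same input and emit its own native query instead, which is the point of keeping the condition language-independent.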

Specifying the Resultset
Specify the list of concepts (fields) to be returned in the resultset
Simple paths are used to identify elements or document subtrees
Effectively flattens the structure of the records, but allows generic representation
Example:
/ScientificName
/Longitude
/Latitude
[Diagram label: Query]
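The flattening effect can be sketched as follows. This assumes that a return-field path names an element to be located anywhere inside a record, which is one plausible reading of "simple paths"; the sample record, its nesting, and the coordinate values are made up for the example, and flatten_record is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up nested record; only the field names come from the slide.
RECORD_XML = """
<record>
  <ScientificName>Peromyscus leucopus</ScientificName>
  <Location>
    <Longitude>-122.1</Longitude>
    <Latitude>37.4</Latitude>
  </Location>
</record>
"""


def flatten_record(record, return_fields):
    """Map each return-field path to the text of the first matching element.

    Assumption: a path such as "/Longitude" matches that element at any depth.
    """
    row = {}
    for path in return_fields:
        element = record.find(".//" + path.lstrip("/"))
        row[path] = element.text if element is not None else None
    return row


record = ET.fromstring(RECORD_XML)
row = flatten_record(record, ["/ScientificName", "/Longitude", "/Latitude"])
print(row)
```

The nested Location element disappears from the output: the caller gets one flat row per record regardless of how the source document was structured.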

Full Query Example
<egq:query queryId="query-digir.1.1" system="…"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query beta1"
    xmlns:xsi="…"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
  …/2003/1.0
  /ScientificName
  /Longitude
  /Latitude
  Peromyscus genus query
  Peromyscus
[Diagram label: Query]

Query Result Set Structure
<rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:xsi="…"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
  T16:45:50-09:
  <record number="1" system="1" identifier="mvz1">
    PEROMYSCUS LEUCOPUS
    …
[Diagram label: Result]

EcoGrid Get & Put
get
– Retrieves the content of a dataset or file from systems such as SRB and Metacat
– Also enables SQL querying of relational databases (Oracle, DB2, etc.) that are pre-registered as data sources in SRB
put for data
– Allows users to create (upload) files in EcoGrid resources such as Metacat and SRB
put for metadata
– Allows ingestion of metadata, such as EML in Metacat or user-defined metadata in SRB
– Depends on the availability of an authentication and access control system
put(sessionID, objectID, object, type)
delete(sessionID, objectID)
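A minimal sketch of the get, put, and delete operations, with a session check standing in for the authentication and access control system the slide says put depends on. InMemoryResource, the session strings, the object identifier, and the sample payload are hypothetical; a real resource would delegate to Metacat or SRB.

```python
class InMemoryResource:
    """Toy EcoGrid resource that stores objects in a dictionary."""

    def __init__(self, valid_sessions):
        self._valid_sessions = set(valid_sessions)
        self._objects = {}  # object_id -> (type, bytes)

    def _check(self, session_id):
        # Placeholder for real authentication and access control.
        if session_id not in self._valid_sessions:
            raise PermissionError("unknown session")

    def put(self, session_id, object_id, obj, type_):
        self._check(session_id)
        self._objects[object_id] = (type_, bytes(obj))

    def get(self, session_id, object_id):
        self._check(session_id)
        return self._objects[object_id][1]

    def delete(self, session_id, object_id):
        self._check(session_id)
        del self._objects[object_id]


resource = InMemoryResource(valid_sessions={"session-1"})
resource.put("session-1", "alegria.1", b"site,temp_c\nalegria,14.2\n", "data")
stored = resource.get("session-1", "alegria.1")
resource.delete("session-1", "alegria.1")
print(stored)
```

The type argument is kept alongside the bytes so that the same put call can serve both data files and metadata documents, mirroring the two put cases on the slide.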

Building the EcoGrid
[Diagram. Legend: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system. Site labels: AND, LUQ, NTL, VCR, HBR]
– LTER Network (24)
– Natural History Collections (>> 100)
– Organization of Biological Field Stations (180)
– UC Natural Reserve System (36)
– Partnership for Interdisciplinary Studies of Coastal Oceans (4)
– Multi-agency Rocky Intertidal Network (60)

EcoGrid Client Interactions
Modes of interaction
– Client-server
– Fully distributed
– Peer-to-peer
EcoGrid Registry
– Node discovery
– Service discovery
Aggregation services
– Centralized access
– Reliability
– Data preservation
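One way to picture an aggregation service is a single entry point that fans a search out to every known node and keeps going when a node is unreachable. The sketch below assumes each node is represented by a plain search function; aggregate_query, the node names, and the failure behaviour are illustrative choices, not a description of how EcoGrid aggregation is built.

```python
def aggregate_query(nodes, text):
    """Send one search to every node; collect hits and note unreachable nodes."""
    hits, unreachable = {}, []
    for name, search in nodes.items():
        try:
            hits[name] = search(text)
        except ConnectionError:
            # One failing node should not block results from the others.
            unreachable.append(name)
    return hits, unreachable


def knb_search(text):
    # Hypothetical node that always answers.
    return [f"knb:{text}"]


def offline_search(text):
    # Hypothetical node that is currently down.
    raise ConnectionError("node offline")


hits, unreachable = aggregate_query(
    {"knb": knb_search, "museum": offline_search}, "peromyscus"
)
print(hits, unreachable)
```

Reporting unreachable nodes separately lets a client show partial results while still signalling that the answer is incomplete.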

Layers in EcoGrid

EcoGrid Queries in Kepler

Metadata-driven analysis cycle

Status
Read, Query & Register completed
Simple Registry operational
EcoGrid wrappers completed for:
– Metacat
– SRB
– DiGIR
– Xanthoria
Available interfaces
– WSDL
– Simple Web Interactivity
– Kepler

Acknowledgements
This material is based upon work supported by:
The National Science Foundation under Grant Numbers , , , , , and
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of California, Davis, University of Kansas (Center for Biodiversity Research)
Kepler contributors: SEEK, Ptolemy II, DOE SDM/SciDAC, GEON, and others.

Q & A

Frequently Asked Questions …
Which version of Grid services do you use?
– We currently use 3.2.x because it was the last stable version based on OGSA. It seems that WSRF does not support the OGSA Factory pattern, which is the main Grid Service feature that we use and would not want to lose. We may migrate to WSRF eventually.
How can a user (or developer) discover what catalogs are on the EcoGrid?
– In Kepler, click the "Sources" button on the Data tab. The UI allows a basic query of the EcoGrid registry to discover new nodes and choose which should be searched.
– Developers can program to the EcoGrid Registry API.
How much is the EcoGrid *integrated*? Is there a common query language?
– Yes, there is a common query syntax for expressing path-based metadata queries. This syntax does not do any mapping among the various metadata languages. We still need a system that can translate a query that uses terms from one metadata language (e.g., Darwin Core) into queries for another metadata language (e.g., EML). The SEEK SMS system will help with this mapping.

Frequently Asked Questions …
Is the EcoGrid a "federation of federations"?
– In a sense. The EcoGrid is an *API* (specifically a Grid Services API) that allows clients to use a common set of communication protocols to access diverse data systems. The EcoGrid API has been implemented for Metacat, DiGIR, and SRB, all of which are federations. Because clients can access the various systems via EcoGrid, it can be considered a federation of federations. The EcoGrid Registry has a list of systems that have published EcoGrid interfaces that are accessible to clients.
Where are the WSDLs?
– /ecogrid/EcoGridQueryInterfaceLevelOneService?wsdl
What’s on the EcoGrid right now?
– The KNB network is gathering data and metadata from NCEAS, 24 LTER sites, and about 200 other field stations (KNB EcoGrid node).
– The DiGIR system federates access to museum collections data in the form of Darwin Core records. The EcoGrid node at KU points at this network of about 150 museums that are accessible through DiGIR.
– SRB is currently used to hold some data objects that are described via EML metadata records that are in the KNB Metacat.

Frequently Asked Questions …
Where is the code for the EcoGrid?
– Most code is in CVS at seek/projects/ecogrid. Some Kepler-specific client-side UI code is in the Kepler CVS.
– There are also EcoGrid design docs, meeting notes, etc.
Are there plans for an "EcoGrid Portal" so that end users can easily access and contribute data?
– Yes, this is under development. In the interim, one can search the KNB and DiGIR sites individually, or use Kepler.