The SEEK EcoGrid: A Data Grid System for Ecology
Arcot Rajasekar, Matthew Jones, Bertram Ludäscher
UC Davis Department of Computer Science
San Diego Supercomputer Center
Science Environment for Ecological Knowledge
Large collaborative NSF/ITR ( )
Bringing together ecologists, IT experts, CS researchers, …
SEEK.ecoinformatics.org
SWDB, Aug 29, 2004
What is SEEK?
Multidisciplinary research project to facilitate …
Access to ecological, environmental, and biodiversity data
–Enable data sharing & re-use
–Enhance data discovery at global scales
Scalable analysis and synthesis
–Taxonomic, spatial, temporal, and conceptual integration of data, addressing data heterogeneity issues
–Enable communication and collaboration for analysis
–Enable re-use of analytical components
SEEK Components
Main Components:
Kepler
–Problem-solving environment for scientific data analysis and visualization (“scientific workflows”)
EcoGrid
–Distributed data network for environmental, ecological, and systematics data
–Making diverse environmental data systems interoperate
Semantic Mediation System
–“Smart” data discovery and integration
Knowledge Representation WG
Taxon WG
BEAM WG
Education, Outreach, Training
Ecological Metadata Language
Metadata: a means to manage ecological data
–There is no universal data model for ecology
–Accommodate heterogeneity and dispersion
EML
–Common language for archiving and transporting data
–Discovery information: Creator, Title, Abstract, Keyword, etc.
–Content
–Context
–Physical, logical structure
SEEK adds semantic structure
An Example EML Document
Alegria Temperatures
PISCO: Intertidal Temperature Data: Alegria, California:
Carol Blanchette
PISCO
UCSB Marine Science Institute
Santa Barbara CA
These temperature data were collected at Alegria Beach, California, and were...
OceanographicSensorData
Thermistor
PISCOCategories
Please contact the authors for permission to use these data. Please also acknowledge the authors in any publications.
C.Blanchette
Transform
SEEK Overview
EcoGrid Focus
Data and Metadata
–Distributed Data
–XML-based Metadata
Service to Semantic Mediation Layer
–Access to Ontologies and Taxon Services
–Helping with Semantic Data Integration
Service to Analysis and Modelling Layer
–Interaction with Kepler - Workflows
–Interaction with Grid Computing Facilities
Access to Legacy Apps
–LifeMapper
–Spatial Data Workbench
SEEK EcoGrid
Goal: allow diverse environmental data systems to interoperate
–Hides complexity of underlying systems using lightweight interfaces
–Integrates diverse data networks from ecology, biodiversity, and environmental sciences
Data systems
–Any system can implement these interfaces
–Prototyping using: Metacat, SRB, DiGIR, Xanthoria, etc.
Supports multiple metadata standards
–EML, Darwin Core as foci
Web services
Service Oriented Architecture (SOA)
–Remote discovery and execution of services
Network transport of data (HTTP)
Message format (SOAP/XML)
Service interface description (WSDL)
[Diagram: Morpho client, steps 1, 2, 3]
Grid Services
A Grid service is a Web service plus:
Lifecycle management
–(persisting the service over outages)
State management
–(tracking sessions across multiple requests)
Factory services
–(allowing many clients to connect)
Security
–(authorization)
…
EcoGrid defines a standard set of grid interfaces for use by many data servers
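The factory and state-management ideas on this slide can be illustrated with a minimal, in-process sketch. It is not the Grid toolkit API: the class names `QueryFactory` and `QueryInstance` are hypothetical, and a real Grid service would create instances over the network rather than as local objects. The sketch only shows how a factory can hand each client its own instance, so that state is tracked per client across requests.

```python
import itertools


class QueryInstance:
    """Hypothetical per-client service instance that keeps state across requests."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.history = []  # queries seen so far by this instance only

    def query(self, text):
        # Record the request and report how many this instance has handled.
        self.history.append(text)
        return len(self.history)


class QueryFactory:
    """Hypothetical factory: every create() call yields a fresh, independent instance."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def create(self):
        instance = QueryInstance(f"instance-{next(self._ids)}")
        self.instances[instance.instance_id] = instance
        return instance


factory = QueryFactory()
client_a = factory.create()
client_b = factory.create()
client_a.query("peromyscus%")
count_a = client_a.query("mus%")     # second request on client A's instance
count_b = client_b.query("rattus%")  # first request on client B's instance
```

Because each client talks to its own instance, the request count for one client is unaffected by the other.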
EcoGrid Example
Participants: Morpho (client), EcoGrid Registry, EcoGrid service exposing query() and get()
EcoGrid WSDL: query(session, query), get(session, identifier)
1. Publish
2. Find service
3. Return service description
4. Execute search, handle response
5. Execute get, handle response
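The five steps on this slide can be walked through with a toy, in-memory sketch. Everything here is an assumption made for illustration: the `Registry` and `ServiceDescription` classes, the service name `toy-node`, and the second sample record are made up, and the real registry and services are reached over SOAP rather than as local Python objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceDescription:
    """Stand-in for a WSDL entry: a name plus the two EcoGrid operations."""
    name: str
    query: Callable[[str, str], List[str]]  # query(session, query) -> identifiers
    get: Callable[[str, str], bytes]        # get(session, identifier) -> content


@dataclass
class Registry:
    """Hypothetical in-memory registry of published services."""
    services: Dict[str, ServiceDescription] = field(default_factory=dict)

    def publish(self, description: ServiceDescription) -> None:
        self.services[description.name] = description

    def find(self, name: str) -> ServiceDescription:
        return self.services[name]


# Toy data node: two objects keyed by identifier (sample values are illustrative).
_STORE = {"mvz1": b"PEROMYSCUS LEUCOPUS", "mvz2": b"MUS MUSCULUS"}


def node_query(session: str, query: str) -> List[str]:
    # A case-insensitive substring match stands in for a real query engine.
    needle = query.upper().encode()
    return [identifier for identifier, content in _STORE.items() if needle in content]


def node_get(session: str, identifier: str) -> bytes:
    return _STORE[identifier]


registry = Registry()
registry.publish(ServiceDescription("toy-node", node_query, node_get))  # 1. Publish
service = registry.find("toy-node")               # 2. Find service, 3. Return description
hits = service.query("session-1", "peromyscus")   # 4. Execute search, handle response
content = service.get("session-1", hits[0])       # 5. Execute get, handle response
```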
EcoGrid Query Interfaces
Provides a mechanism for search and retrieval of metadata and federated data
–Supports third-party interaction with search results: forwarding of result set identifiers to another service instance for retrieval
Different levels of compliance
–Low barrier for participation
–Bulk of data will be accessible through Type I
EcoGrid Query Level I
Basic, entry-level exposure of data and metadata for EcoGrid and SEEK
Response contains data – intended for direct communications rather than third-party indirection
ResultsetType query(SessionID, QueryType)
byte[] get(SessionID, objectID)
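The two Level I signatures can be sketched as an abstract interface with a toy implementation. This is an illustration under stated assumptions, not the EcoGrid WSDL: `Resultset`, `QueryLevelOne`, and `InMemoryNode` are hypothetical names, the query is reduced to exact field matching, and JSON is used only as a convenient byte encoding for `get`. The point it shows is that a Level I response carries the matching records directly.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resultset:
    """Hypothetical stand-in for ResultsetType: matching records carried inline."""
    records: List[Dict[str, str]]


class QueryLevelOne(ABC):
    """The two Level I operations, mirrored from the slide's signatures."""

    @abstractmethod
    def query(self, session_id: str, query: Dict[str, str]) -> Resultset: ...

    @abstractmethod
    def get(self, session_id: str, object_id: str) -> bytes: ...


class InMemoryNode(QueryLevelOne):
    """Toy data node; a real node would wrap Metacat, SRB, DiGIR, and so on."""

    def __init__(self, objects: Dict[str, Dict[str, str]]):
        self._objects = objects

    def query(self, session_id, query):
        # Exact field matching stands in for the real query language.
        matches = [
            dict(record, identifier=object_id)
            for object_id, record in self._objects.items()
            if all(record.get(key) == value for key, value in query.items())
        ]
        return Resultset(matches)

    def get(self, session_id, object_id):
        # Return the stored object as bytes, as in byte[] get(SessionID, objectID).
        return json.dumps(self._objects[object_id], sort_keys=True).encode("utf-8")


node = InMemoryNode({"mvz1": {"ScientificName": "PEROMYSCUS LEUCOPUS"}})
result = node.query("session-1", {"ScientificName": "PEROMYSCUS LEUCOPUS"})
raw = node.get("session-1", "mvz1")
```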
Query Conditions
Language-independent representation of a query structure
Transformed into the appropriate native language of the data store
Example: <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
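As a rough illustration of transforming a condition into a data store's native language, the sketch below turns the example condition into a parameterized SQL WHERE clause and runs it against an in-memory SQLite table. The concept-to-column mapping, the allowed operator set, the `specimens` table, and the sample rows are all assumptions made for this example; the actual EcoGrid wrappers may translate conditions quite differently.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed for illustration: operators we accept and how concepts map to columns.
ALLOWED_OPERATORS = {"LIKE", "=", "!="}
COLUMN_FOR_CONCEPT = {"ScientificName": "scientific_name"}


def condition_to_sql(condition_xml):
    """Translate one <condition> element into (where_clause, parameters)."""
    elem = ET.fromstring(condition_xml)
    operator = elem.get("operator")
    concept = elem.get("concept")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if concept not in COLUMN_FOR_CONCEPT:
        raise ValueError(f"unknown concept: {concept!r}")
    # The value travels as a bound parameter, never spliced into the SQL text.
    return f"{COLUMN_FOR_CONCEPT[concept]} {operator} ?", [elem.text or ""]


def run_condition(conn, condition_xml):
    where, params = condition_to_sql(condition_xml)
    sql = f"SELECT scientific_name FROM specimens WHERE {where} ORDER BY scientific_name"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimens (scientific_name TEXT)")
conn.executemany(
    "INSERT INTO specimens VALUES (?)",
    [("Peromyscus leucopus",), ("Peromyscus maniculatus",), ("Mus musculus",)],
)

CONDITION = '<condition operator="LIKE" concept="ScientificName">peromyscus%</condition>'
matches = run_condition(conn, CONDITION)
```

Restricting operators and concepts to known values keeps the generated SQL predictable, since only the bound value comes from the query document.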
Specifying the Resultset
Specify the list of concepts (fields) to be returned in the resultset
Simple paths used to identify elements or document subtrees
Effectively flattens the structure of the records, but allows generic representation
Example:
/ScientificName
/Longitude
/Latitude
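The flattening described on this slide can be sketched as follows: each requested simple path is resolved against a record, and the result is a flat mapping from path to value. The record layout, the `flatten` helper, and the sample values are hypothetical; the sketch assumes paths are relative to the record's root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record; element names follow the paths on the slide,
# and the values are made up for illustration.
RECORD = (
    "<record>"
    "<ScientificName>PEROMYSCUS LEUCOPUS</ScientificName>"
    "<Collector>example collector</Collector>"
    "<Longitude>-122.26</Longitude>"
    "<Latitude>37.87</Latitude>"
    "</record>"
)


def flatten(record_xml, paths):
    """Return {path: text} for each requested path; None when a path is absent."""
    root = ET.fromstring(record_xml)
    row = {}
    for path in paths:
        # "/ScientificName" is resolved relative to the record root.
        node = root.find("." + path)
        row[path] = node.text if node is not None else None
    return row


row = flatten(RECORD, ["/ScientificName", "/Longitude", "/Latitude"])
missing = flatten(RECORD, ["/Elevation"])
```

Fields that were not requested (here, Collector) are simply left out of the flattened row.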
Full Query Example
<egq:query queryId="query-digir.1.1" system=" xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query beta1" xmlns:xsi=" xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
/2003/1.0
/ScientificName
/Longitude
/Latitude
Peromyscus genus query
Peromyscus
Query Result Set Structure
<rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here" xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1" xmlns:xsi=" xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
T16:45:50-09:
<record number="1" system="1" identifier="mvz1">
PEROMYSCUS LEUCOPUS
…
EcoGrid Get & Put
get retrieves the content of a dataset or file from an EcoGrid resource such as SRB or Metacat.
get also enables SQL querying of relational databases (Oracle, DB2, etc.) that are pre-registered as a data source in SRB.
put for data: allows users to create (upload) files in EcoGrid resources such as Metacat or SRB.
put for metadata: the EcoGrid put service also allows ingestion of metadata, such as EML in Metacat or user-defined metadata in SRB.
–Depends on the availability of an authentication and access control system
–put(sessionID, objectID, object, type)
–delete(sessionID, objectID)
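The dependence of put and delete on access control can be sketched with a toy store. The `ToyStore` class, the `AccessDenied` exception, and the rule that a fixed set of session IDs may write are all assumptions made for illustration; the slide does not say how authentication or authorization is actually implemented.

```python
class AccessDenied(Exception):
    """Raised when a session is not allowed to modify the store."""


class ToyStore:
    """Hypothetical in-memory resource with put/get/delete shaped like the slide's signatures."""

    def __init__(self, writer_sessions):
        self._writers = set(writer_sessions)  # sessions allowed to put and delete
        self._objects = {}                    # object_id -> (content, type)

    def _require_writer(self, session_id):
        if session_id not in self._writers:
            raise AccessDenied(f"session {session_id!r} may not modify this store")

    def put(self, session_id, object_id, obj, type_):
        # type_ distinguishes data from metadata, mirroring put(..., object, type).
        self._require_writer(session_id)
        self._objects[object_id] = (obj, type_)

    def get(self, session_id, object_id):
        # Reads are open to any session in this sketch.
        return self._objects[object_id][0]

    def delete(self, session_id, object_id):
        self._require_writer(session_id)
        del self._objects[object_id]


store = ToyStore(writer_sessions={"session-writer"})
store.put("session-writer", "doc.1", b"<eml/>", "metadata")
fetched = store.get("session-reader", "doc.1")

try:
    store.put("session-reader", "doc.2", b"1,2,3", "data")
    denied = False
except AccessDenied:
    denied = True

store.delete("session-writer", "doc.1")
```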
Building the EcoGrid
[Diagram: network of EcoGrid nodes. Node types: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system. Sites labeled: AND, LUQ, NTL, VCR, HBR.]
LTER Network (24)
Natural History Collections (>> 100)
Organization of Biological Field Stations (180)
UC Natural Reserve System (36)
Partnership for Interdisciplinary Studies of Coastal Oceans (4)
Multi-agency Rocky Intertidal Network (60)
EcoGrid Client Interactions
Modes of interaction
–Client-server
–Fully distributed
–Peer-to-peer
EcoGrid Registry
–Node discovery
–Service discovery
Aggregation services
–Centralized access
–Reliability
–Data preservation
Layers in EcoGrid
EcoGrid Queries in Kepler
Metadata-driven analysis cycle
Status
Read, Query & Register completed
Simple Registry operational
EcoGrid wrappers completed for:
–Metacat
–SRB
–DiGIR
–Xanthoria
Available interfaces
–WSDL
–Simple Web Interactivity
–Kepler
Acknowledgements
This material is based upon work supported by: The National Science Foundation under Grant Numbers , , , , , and
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of California, Davis, University of Kansas (Center for Biodiversity Research)
Kepler contributors: SEEK, Ptolemy II, DOE SDM/SciDAC, GEON, and others.
Q & A
Frequently Asked Questions …
Which version of Grid services do you use?
–We currently use 3.2.x because it was the last stable version based on OGSA. It seems that WSRF does not support the OGSA Factory pattern, which is the main Grid Service feature that we use and would not want to lose. We may migrate to WSRF eventually.
How can a user (or developer) discover what catalogs are on the EcoGrid?
–In Kepler, click the "Sources" button on the Data tab. The UI allows a basic query of the EcoGrid registry to discover new nodes and choose which should be searched.
–Developers can program to the EcoGrid Registry API.
How much is the EcoGrid *integrated*? Is there a common query language?
–Yes, there is a common query syntax for expressing path-based metadata queries. This syntax does not do any mapping among various metadata languages. We still need a system that can translate a query that uses terms from one metadata language (e.g., Darwin Core) into queries for another metadata language (e.g., EML). The SEEK SMS system will help with this mapping.
Frequently Asked Questions …
Is the EcoGrid a "federation of federations"?
–In a sense. The EcoGrid is an *API* (specifically a Grid Services API) that allows clients to use a common set of communication protocols to access diverse data systems. The EcoGrid API has been implemented for Metacat, DiGIR, and SRB, all of which are federations. Because clients can access these systems via EcoGrid, it can be considered a federation of federations. The EcoGrid Registry lists the systems that have published EcoGrid interfaces accessible to clients.
Where are the WSDLs?
– /ecogrid/EcoGridQueryInterfaceLevelOneService?wsdl
What’s on the EcoGrid right now?
–The KNB network is gathering data and metadata from NCEAS, 24 LTER sites, and about 200 other field stations (KNB EcoGrid node).
–The DiGIR system federates access to museum collections data in the form of Darwin Core records. The EcoGrid node at KU points at this network of about 150 museums that are accessible through DiGIR.
–SRB is currently used to hold some data objects that are described via EML metadata records in the KNB Metacat.
Frequently Asked Questions …
Where is the code for the EcoGrid?
–Most code is in CVS at seek/projects/ecogrid. Some Kepler-specific client-side UI code is in the Kepler CVS.
–
–There are also EcoGrid design docs, meeting notes, etc.
Are there plans for an "EcoGrid Portal" so that end users can easily access and contribute data?
–Yes, this is under development. In the interim, one can search the KNB and DiGIR sites individually, or use Kepler.