Published by Milton Solomon Stephens. Modified over 9 years ago.
1
http://knb.ecoinformatics.org
http://seek.ecoinformatics.org
Science Environment for Ecological Knowledge: EcoGrid
Matthew B. Jones
National Center for Ecological Analysis and Synthesis
University of California Santa Barbara
2
Science Environment for Ecological Knowledge
Research Objectives
- Access to ecological, environmental, and biodiversity data
- Enable data sharing & re-use
- Enhance data discovery at global scales
- Scalable analysis and synthesis
- Taxonomic, spatial, temporal, and conceptual integration of data
- Address data heterogeneity issues
- Enable communication and collaboration for analysis
- Enable re-use of analytical components
Collaborators
- NCEAS, UNM, SDSC, U Kansas
- Vermont, Napier, ASU, UNC
3
SEEK Components
Science Environment for Ecological Knowledge
- Kepler: modeling scientific workflows
- EcoGrid: making diverse environmental data systems interoperate
- Semantic Mediation System: “smart” data discovery and integration
- Knowledge Representation WG
- Taxon WG
- BEAM WG
- Education, Outreach, Training
4
Scientific Workflows
- Model the way scientists work with their data now
- Today scientists mentally coordinate the export and import of data among software systems
- Workflows emphasize data flow
- Output generation includes creating appropriate metadata
- The analysis workflow itself becomes metadata
- The workflow describes the data lineage as the data are transformed
- Derived data sets can be stored in EcoGrid with their provenance
- Query EcoGrid to find data
- Archive output to EcoGrid with workflow metadata
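The idea that the analysis workflow itself becomes metadata can be illustrated with a small sketch. Everything here is hypothetical and is not Kepler's API: `run_workflow` applies named steps in order and appends one lineage entry per step, so the derived data set carries a record of how it was produced and could be archived together with that record.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DerivedDataset:
    """A data product together with the lineage that produced it."""
    data: Any
    lineage: list[dict] = field(default_factory=list)


def run_workflow(source_id: str, source_data: Any,
                 steps: list[tuple[str, Callable[[Any], Any]]]) -> DerivedDataset:
    """Apply (name, function) steps in order, logging each one as lineage.

    Hypothetical helper: the lineage list plays the role of the
    "workflow as metadata" that would be archived with the output.
    """
    result = DerivedDataset(data=source_data,
                            lineage=[{"step": "read", "input": source_id}])
    for name, fn in steps:
        result.data = fn(result.data)
        result.lineage.append({"step": name})
    return result
```

For example, `run_workflow("hypothetical.data.1", [3, 1, 2], [("sort", sorted), ("top2", lambda xs: xs[:2])])` yields the data `[1, 2]` with lineage steps `read`, `sort`, `top2`.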
5
Kepler: scientific workflows
A collaborative effort of SEEK, SciDAC/SDM, GEON, and the Ptolemy Project
6
Kepler understands EML data
7
Kepler: molecular biology example
8
SEEK EcoGrid
Goal: allow diverse environmental data systems to interoperate
- Hide the complexity of underlying systems behind lightweight interfaces
- Data are already standardized via EML; standard APIs are still needed
- Integrate diverse data networks from ecology, biodiversity, and environmental sciences
Data systems
- Any system can implement these interfaces
- Prototyping using Metacat, SRB, DiGIR, Xanthoria, etc.
- Supports multiple metadata standards, with EML and Darwin Core as foci
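A minimal sketch of the lightweight-interface idea, assuming a single hypothetical `find(concept, value)` call rather than the actual EcoGrid API. Two toy adapters keep records in different native layouts (a nested, EML-like record and a flat, Darwin Core-like record), yet a client searches both through the same call and never sees either layout.

```python
from typing import Protocol


class EcoGridNode(Protocol):
    """Hypothetical minimal interface that every participating system exposes."""
    def find(self, concept: str, value: str) -> list[str]: ...


class EmlStore:
    """Toy stand-in for an EML-based catalog (e.g. a Metacat-like system)."""

    def __init__(self, docs: dict[str, dict]) -> None:
        self.docs = docs  # document ID -> nested metadata record

    def find(self, concept: str, value: str) -> list[str]:
        if concept != "ScientificName":
            return []  # this toy adapter maps only one concept
        # Native layout: the taxon name sits under coverage -> taxonomic
        return [docid for docid, doc in self.docs.items()
                if doc.get("coverage", {}).get("taxonomic") == value]


class DarwinCoreStore:
    """Toy stand-in for a Darwin Core provider (e.g. a DiGIR-like resource)."""

    def __init__(self, records: dict[str, dict]) -> None:
        self.records = records  # catalog ID -> flat record keyed by concept

    def find(self, concept: str, value: str) -> list[str]:
        return [rid for rid, rec in self.records.items() if rec.get(concept) == value]


def federated_find(nodes: list[EcoGridNode], concept: str, value: str) -> list[str]:
    """Ask every node the same question and merge the identifiers."""
    hits: list[str] = []
    for node in nodes:
        hits.extend(node.find(concept, value))
    return hits
```

The design choice being illustrated is that each data system pays the cost of mapping its own layout onto shared concepts, which keeps the client side uniform.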
9
EcoGrid client interactions
Modes of interaction
- Client-server
- Fully distributed
- Peer-to-peer
EcoGrid Registry
- Node discovery
- Service discovery
Aggregation services
- Centralized access
- Reliability
- Data preservation
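Node and service discovery through the registry could look roughly like the following sketch. The `Registry` class, its method names, the endpoint URLs, and the service labels are all assumptions for illustration; the slide does not say how the EcoGrid registry is implemented.

```python
from collections import defaultdict


class Registry:
    """Hypothetical EcoGrid registry mapping service types to node endpoints."""

    def __init__(self) -> None:
        self._services: defaultdict = defaultdict(set)  # service -> endpoints

    def register(self, endpoint: str, services: list[str]) -> None:
        """Node discovery: a node announces itself and the services it implements."""
        for service in services:
            self._services[service].add(endpoint)

    def deregister(self, endpoint: str) -> None:
        """Drop a node, e.g. when it goes offline."""
        for endpoints in self._services.values():
            endpoints.discard(endpoint)

    def discover(self, service: str) -> list[str]:
        """Service discovery: which nodes currently offer this service?"""
        return sorted(self._services.get(service, set()))
```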
10
EcoGrid Query Interfaces
- Provide a mechanism for search and retrieval of metadata and federated data
- Support third-party interaction with search results: result set identifiers can be forwarded to another service instance for retrieval
- Different levels of compliance
- Low barrier for participation
- The bulk of data will be accessible through Type I
11
Query Interfaces Implemented
- Initial prototype supports query and retrieval from:
- Storage Resource Broker (SRB)
- Metacat
- Distributed Generic Information Retrieval (DiGIR)
- Xanthoria
- We encourage further experimentation with, and feedback from, implementations on other systems
12
EcoGrid Query Level I
- Basic, entry-level exposure of data and metadata for EcoGrid and SEEK
- The response contains the data itself: intended for direct communication rather than third-party indirection
  ResultsetType query(SessionID, QueryType)
  byte[] get(SessionID, objectID)
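The two Level I calls can be sketched with an in-memory store. This is a hypothetical toy, not an EcoGrid implementation: sessions are just a set of accepted IDs, and `QueryType` is reduced to a single concept/value match. What it shows is the Level I contract: `query` returns matching records inline, with no identifier to hand to a third party, and `get` returns an object's raw bytes.

```python
from dataclasses import dataclass


@dataclass
class Record:
    identifier: str
    fields: dict[str, str]


class LevelOneNode:
    """Toy node exposing query/get; all names and data layouts are assumptions."""

    def __init__(self, objects: dict[str, bytes],
                 index: dict[str, dict[str, str]]) -> None:
        self._objects = objects          # objectID -> raw bytes
        self._index = index              # objectID -> searchable fields
        self._sessions = {"anonymous"}   # stand-in for real authentication

    def _check(self, session_id: str) -> None:
        if session_id not in self._sessions:
            raise PermissionError(f"unknown session {session_id!r}")

    def query(self, session_id: str, concept: str, value: str) -> list[Record]:
        self._check(session_id)
        # Results come back directly in the response
        return [Record(oid, f) for oid, f in self._index.items()
                if f.get(concept) == value]

    def get(self, session_id: str, object_id: str) -> bytes:
        self._check(session_id)
        return self._objects[object_id]
```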
13
Query Conditions
- Language-independent representation of a query structure
- Transformed into the appropriate native query language of the data store
- Example:
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
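One way a node backed by a relational database might translate a condition into its native language is sketched below. The concept-to-column mapping and every operator name other than `LIKE` are assumptions. Whitelisting the column and operator, and binding the value as a parameter, keeps user-supplied text out of the SQL string.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from shared concepts to one store's column names
CONCEPT_TO_COLUMN = {"ScientificName": "sci_name", "Latitude": "lat", "Longitude": "lon"}
# LIKE comes from the slide; the other operator names are assumptions
OPERATORS = {"LIKE": "LIKE", "EQUALS": "=", "LESS_THAN": "<", "GREATER_THAN": ">"}


def condition_to_sql(condition_xml: str) -> tuple[str, list[str]]:
    """Translate one <condition> element into a parameterized SQL predicate."""
    cond = ET.fromstring(condition_xml)
    op = OPERATORS[cond.get("operator")]             # KeyError if operator unknown
    column = CONCEPT_TO_COLUMN[cond.get("concept")]  # KeyError if concept unmapped
    # Only whitelisted names reach the SQL text; the value is a bound parameter
    return f"{column} {op} ?", [cond.text or ""]
```

With SQLite, `LIKE` is case-insensitive for ASCII letters by default, so `peromyscus%` also matches `Peromyscus leucopus`; other databases may differ, which is one reason each node owns its own translation.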
14
Specifying the Resultset
- Specify the list of concepts (fields) to be returned in the resultset
- Simple paths identify elements or document subtrees
- This effectively flattens the structure of the records, but allows a generic representation
- Example:
  /ScientificName
  /Longitude
  /Latitude
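Flattening a record through a list of simple paths can be sketched with Python's standard `xml.etree.ElementTree`. The record element names are assumptions modeled on the slide's example. A path naming a leaf element yields its text, a path naming a subtree yields that subtree serialized as XML, and a missing path yields `None`.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def flatten(record: ET.Element, paths: list[str]) -> dict[str, Optional[str]]:
    """Return one flat row: path -> text (leaf), XML string (subtree), or None."""
    row: dict[str, Optional[str]] = {}
    for path in paths:
        # ElementTree rejects absolute paths on an element, so "/X" becomes "./X"
        element = record.find("." + path)
        if element is None:
            row[path] = None
        elif len(element):
            # The path names a subtree: keep it whole as serialized XML
            row[path] = ET.tostring(element, encoding="unicode")
        else:
            row[path] = element.text
    return row
```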
15
Full Query Example
<egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
  http://digir.net/schema/conceptual/darwin/2003/1.0
  /ScientificName
  /Longitude
  /Latitude
  Peromyscus genus query
  Peromyscus
</egq:query>
16
Query Result Set Structure
<rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
  2003-05-02T16:45:50-09:00
  1 2
  http://digir.net/schema/conceptual/darwin/2003/1.0
  <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  <record number="1" system="1" identifier="mvz1">
    PEROMYSCUS LEUCOPUS NOVEBORACENSIS
    100 200
    …
17
EcoGrid Query Level II
- More detailed handling of results
- Uses result set identifiers (RSIDs): handles that can be passed to a third party
  RSID search(SessionID, query)
  Resultset retrieve(SessionID, RSID, start, numrecs)
  query decodeResultsetIdentifier(SessionID, RSID)
  statusinfo getResultStatus(SessionID)
  int transfer(SessionID, sourceURL, destURL, ObjectID)
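The RSID mechanism can be sketched as a node that keeps each result set server-side behind an opaque handle. This is a hypothetical toy: the RSID format, the dictionary-based query, and the snake_case method names are assumptions, sessions are accepted but not checked, and `getResultStatus` and `transfer` are omitted. Because the handle is all a caller needs, a different session (a third party) can page through results it never searched for.

```python
import uuid


class LevelTwoNode:
    """Toy node that keeps result sets server-side behind opaque RSIDs."""

    def __init__(self, records: list[dict]) -> None:
        self._records = records
        self._resultsets: dict = {}  # RSID -> (query, matching records)

    def search(self, session_id: str, query: dict) -> str:
        # A query here is simply field -> required value
        hits = [r for r in self._records
                if all(r.get(k) == v for k, v in query.items())]
        rsid = f"rsid:{uuid.uuid4()}"
        self._resultsets[rsid] = (query, hits)
        return rsid  # only the handle goes back, not the data

    def retrieve(self, session_id: str, rsid: str,
                 start: int, numrecs: int) -> list[dict]:
        _, hits = self._resultsets[rsid]
        return hits[start:start + numrecs]  # one page of the result set

    def decode_resultset_identifier(self, session_id: str, rsid: str) -> dict:
        query, _ = self._resultsets[rsid]
        return query
```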
18
EcoGrid Write
- Used to push data back to sources (e.g., publishing EML documents)
- Depends on the availability of an authentication and access control system
  put(sessionID, objectID, object, type)
  delete(sessionID, objectID)
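Why `put` and `delete` depend on authentication and access control can be shown with a toy store. The session table and the per-user set of writable object IDs are hypothetical stand-ins for a real authentication and access control system, and the object type string `"eml"` is also an assumption.

```python
class WritableNode:
    """Toy store: every write is checked for identity and permission first."""

    def __init__(self, sessions: dict[str, str], acl: dict[str, set[str]]) -> None:
        self._sessions = sessions   # sessionID -> authenticated user
        self._acl = acl             # user -> object IDs that user may write
        self._objects: dict = {}    # objectID -> (type, bytes)

    def _authorize(self, session_id: str, object_id: str) -> None:
        user = self._sessions.get(session_id)
        if user is None:
            raise PermissionError("session is not authenticated")
        if object_id not in self._acl.get(user, set()):
            raise PermissionError(f"{user} may not write {object_id}")

    def put(self, session_id: str, object_id: str, obj: bytes, obj_type: str) -> None:
        self._authorize(session_id, object_id)
        self._objects[object_id] = (obj_type, obj)

    def delete(self, session_id: str, object_id: str) -> None:
        self._authorize(session_id, object_id)
        del self._objects[object_id]

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][1]
```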
19
Data Instance Query
- New requirement: support direct query and retrieval against arbitrary data sets
- Generally there are no common schemas across different data instances
- Two options:
- Push the data instance to a service that can query the object (e.g., the SRB)
- Implement the interface at the data instance's location
- A simple JDBC/SQL interface?
  dbSchema getDataSchema(sessionID, objectID)
  dbResultset search(sessionID, objectID, SQL)
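The simple SQL interface option can be sketched with Python's built-in `sqlite3`: load one tabular data object into an in-memory table, report its columns as the schema, and run SQL against it. Class and method names are hypothetical, the data object is assumed to be CSV with a header row of plain column names, and every value is loaded as text; a fuller version could take column types from the data set's EML attribute descriptions. `PRAGMA query_only` makes the connection refuse writes once loading is done.

```python
import csv
import io
import sqlite3


class DataInstance:
    """Hypothetical wrapper exposing one tabular data object through SQL."""

    def __init__(self, object_id: str, csv_text: str) -> None:
        self.object_id = object_id
        rows = list(csv.reader(io.StringIO(csv_text)))
        header, body = rows[0], rows[1:]
        self._db = sqlite3.connect(":memory:")
        # Columns are declared without types, so values stay as loaded (text)
        cols = ", ".join(f'"{c}"' for c in header)
        self._db.execute(f"CREATE TABLE data ({cols})")
        marks = ", ".join("?" for _ in header)
        self._db.executemany(f"INSERT INTO data VALUES ({marks})", body)
        self._db.commit()
        # From here on, refuse any statement that would modify the data
        self._db.execute("PRAGMA query_only = ON")

    def get_data_schema(self) -> list[tuple[str, str]]:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        return [(r[1], r[2]) for r in self._db.execute("PRAGMA table_info(data)")]

    def search(self, sql: str) -> list[tuple]:
        return self._db.execute(sql).fetchall()
```

Because values are stored as text, numeric comparisons need an explicit cast, e.g. `CAST(count AS INTEGER) > 2` for a hypothetical `count` column.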
20
Building the EcoGrid
[Figure: EcoGrid node types (Metacat, SRB, DiGIR, VegBank, Xanthoria, legacy systems) at sites including AND, LUQ, NTL, VCR, and HBR]
- LTER Network (24)
- Natural History Collections (>> 100)
- Organization of Biological Field Stations (180)
- UC Natural Reserve System (36)
- Partnership for Interdisciplinary Studies of Coastal Oceans (4)
- Multi-agency Rocky Intertidal Network (60)
21
Metadata-driven analysis cycle
22
Acknowledgements
This material is based upon work supported by:
- The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676
- The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus
- The Andrew W. Mellon Foundation
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)