Science Environment for Ecological Knowledge: EcoGrid. Matthew B. Jones, National Center for Ecological Analysis and Synthesis.

Presentation transcript:

Science Environment for Ecological Knowledge: EcoGrid
Matthew B. Jones
National Center for Ecological Analysis and Synthesis
University of California Santa Barbara

Science Environment for Ecological Knowledge
Research Objectives
- Access to ecological, environmental, and biodiversity data
  - Enable data sharing & re-use
  - Enhance data discovery at global scales
- Scalable analysis and synthesis
  - Taxonomic, Spatial, Temporal, Conceptual integration of data
  - Address data heterogeneity issues
- Enable communication and collaboration for analysis
  - Enable re-use of analytical components
Collaborators
- NCEAS, UNM, SDSC, U Kansas
- Vermont, Napier, ASU, UNC

SEEK Components
Science Environment for Ecological Knowledge
- Kepler: Modeling scientific workflows
- EcoGrid: Making diverse environmental data systems interoperate
- Semantic Mediation System: “Smart” data discovery and integration
- Knowledge Representation WG
- Taxon WG
- BEAM WG
- Education, Outreach, Training

Scientific Workflows
- Model the way scientists work with their data now
  - Mentally coordinate export and import of data among software systems
- Workflows emphasize data flow
- Output generation includes creating appropriate metadata
- The analysis workflow itself becomes metadata
  - The workflow describes the data lineage as it has been transformed
  - Derived data sets can be stored in EcoGrid with provenance
(Diagram labels: Query EcoGrid to find data; Archive output to EcoGrid with workflow metadata)

Kepler: scientific workflows
- Collaborative effort of SEEK, SciDAC/SDM, GEON, Ptolemy Project

Kepler understands EML data

Kepler: molecular biology example

SEEK EcoGrid
- Goal: allow diverse environmental data systems to interoperate
- Hides complexity of underlying systems using lightweight interfaces
  - We have standardized data via EML, need standard APIs
- Integrate diverse data networks from ecology, biodiversity, and environmental sciences
- Data systems
  - Any system can implement these interfaces
  - Prototyping using: Metacat, SRB, DiGIR, Xanthoria, etc.
- Supports multiple metadata standards
  - EML, Darwin Core as foci

EcoGrid client interactions
- Modes of interaction
  - Client-server
  - Fully distributed
  - Peer-to-peer
- EcoGrid Registry
  - Node discovery
  - Service discovery
- Aggregation services
  - Centralized access
  - Reliability
  - Data preservation

EcoGrid Query Interfaces
- Provides a mechanism for search and retrieval of metadata and federated data
- Supports third-party interaction with search results: forwarding of result set identifiers to another service instance for retrieval
- Different levels of compliance
  - Low barrier for participation
  - Bulk of data will be accessible through Type I
(Diagram labels: Result, Query)

Query Interfaces Implemented
- Initial prototype to support query and retrieval from:
  - Storage Resource Broker (SRB)
  - Metacat
  - Distributed Generic Information Retrieval (DiGIR)
  - Xanthoria
- Encourage additional experimentation with and feedback based on other system implementations

EcoGrid Query Level I
- Basic, entry-level exposure of data and metadata for EcoGrid and SEEK
- Response contains data; intended for direct communication rather than third-party indirection
- Operations:
  ResultsetType query(SessionID, QueryType)
  byte[] get(SessionID, objectID)
(Diagram labels: Result, Query)
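The two Level I operations above can be sketched as a small interface. This is a minimal illustration in Python, not the EcoGrid implementation: the `Resultset` class, the dictionary-shaped query, the exact-match semantics, and the `InMemoryNode` class are all assumptions made for the example. Only the operation names `query` and `get` and their argument order come from the slide.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Resultset:
    """Hypothetical stand-in for the slide's ResultsetType."""
    records: list = field(default_factory=list)


class EcoGridQueryLevelOne(ABC):
    """The two Level I operations named on the slide."""

    @abstractmethod
    def query(self, session_id, query):
        """Return a Resultset that carries the matching records directly."""

    @abstractmethod
    def get(self, session_id, object_id):
        """Return the raw bytes of one object."""


class InMemoryNode(EcoGridQueryLevelOne):
    """Toy node backed by dicts (assumption; a real node would wrap a data system)."""

    def __init__(self, metadata, objects):
        self._metadata = metadata  # object_id -> {concept: value}
        self._objects = objects    # object_id -> bytes

    def query(self, session_id, query):
        # Assumed query shape: {"concept": ..., "value": ...}, exact match only.
        hits = [
            {"identifier": oid, **meta}
            for oid, meta in self._metadata.items()
            if meta.get(query["concept"]) == query["value"]
        ]
        return Resultset(records=hits)

    def get(self, session_id, object_id):
        return self._objects[object_id]


node = InMemoryNode(
    metadata={"mvz1": {"ScientificName": "Peromyscus leucopus"}},
    objects={"mvz1": b"raw specimen record"},
)
result = node.query("session-1", {"concept": "ScientificName", "value": "Peromyscus leucopus"})
payload = node.get("session-1", result.records[0]["identifier"])
```

Because the response already contains the data, the client that issued the query is also the one that consumes the result, which is what distinguishes this level from handle-based retrieval.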

Query Conditions
- Language-independent representation of a query structure
- Transformed into the appropriate native language of the data store
- Example:
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
(Diagram labels: NULL, Query)
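To illustrate what "transformed into the appropriate native language of the data store" could look like, here is a hedged sketch that turns one condition element into a parameterized SQL predicate. The concept-to-column mapping, the allowed operator set, and the function name `condition_to_sql` are assumptions for the example; the slide does not say how any particular node performs the translation.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical mapping from EcoGrid concepts to columns in one local store.
CONCEPT_TO_COLUMN = {"ScientificName": "scientific_name"}
# Assumed operator set; the slide only shows LIKE.
ALLOWED_OPERATORS = {"LIKE", "=", "<", ">"}


def condition_to_sql(condition_xml: str) -> tuple[str, list[str]]:
    """Translate one <condition> element into a SQL predicate plus parameters."""
    elem = ET.fromstring(condition_xml)
    operator = elem.attrib["operator"].upper()
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    column = CONCEPT_TO_COLUMN[elem.attrib["concept"]]
    # The value travels as a bound parameter, never spliced into the SQL text.
    return f"{column} {operator} ?", [elem.text or ""]


predicate, params = condition_to_sql(
    '<condition operator="LIKE" concept="ScientificName">peromyscus%</condition>'
)
```

A node backed by an XML store or a DiGIR provider would implement the same function with a different target language, while clients keep sending the same condition.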

Specifying the Resultset
- Specify the list of concepts (fields) to be returned in the resultset
- Simple paths used to identify elements or document subtrees
- Effectively flattens the structure of the records, but allows generic representation
- Example:
  /ScientificName
  /Longitude
  /Latitude
(Diagram label: Query)
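The flattening described above can be sketched as follows. The nested record layout (`taxon`, `locality`) is invented for illustration, and resolving each simple path by element name anywhere in the record is an assumption; only the three field paths come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested source record; the wrapper element names are invented.
RECORD_XML = """
<record>
  <taxon><ScientificName>Peromyscus leucopus</ScientificName></taxon>
  <locality>
    <Longitude>-119.70</Longitude>
    <Latitude>34.42</Latitude>
  </locality>
</record>
"""


def flatten(record, return_fields):
    """Return one flat row holding only the requested fields."""
    row = {}
    for path in return_fields:
        name = path.strip("/")
        # Assumption: a simple path names an element found at any depth.
        match = record.find(f".//{name}")
        row[path] = match.text if match is not None else None
    return row


row = flatten(ET.fromstring(RECORD_XML), ["/ScientificName", "/Longitude", "/Latitude"])
```

The nesting of the source record is lost in `row`, but every node can return the same generic shape regardless of its native schema.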

Full Query Example
<egq:query queryId="query-digir.1.1" system="…"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
    xmlns:xsi="…"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
  … 003/1.0
  /ScientificName
  /Longitude
  /Latitude
  Peromyscus genus query
  Peromyscus
(Diagram label: Query)

Query Result Set Structure
<rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:xsi="…"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
  T16:45:50-09:
  <system id="1">
  <record number="1" system="1" identifier="mvz1">
    PEROMYSCUS LEUCOPUS NOVEBORACENSIS
    …
(Diagram label: Result)

EcoGrid Query Level II
- More detailed handling of results
- Uses RSIDs to identify resultsets: handles that can be passed to a third party
- Operations:
  RSID search(SessionID, query)
  Resultset retrieve(SessionID, RSID, start, numrecs)
  query decodeResultsetIdentifier(SessionID, RSID)
  statusinfo getResultStatus(SessionID)
  int transfer(SessionID, sourceURL, destURL, ObjectID)
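The handle-based pattern above can be sketched as a toy node that keeps resultsets on the server and hands back an identifier. The class name, the substring matching, and the use of a random hex string as the RSID are assumptions for illustration; `getResultStatus` and `transfer` are left out because the slide gives no detail on their behavior.

```python
import uuid


class LevelTwoNode:
    """Toy node: search() stores results on the node and returns a handle."""

    def __init__(self, records):
        self._records = records
        self._resultsets = {}  # RSID -> (query, matching records)

    def search(self, session_id, query):
        # Assumed matching rule: case-insensitive substring.
        hits = [r for r in self._records if query.lower() in r.lower()]
        rsid = uuid.uuid4().hex  # opaque handle; the real RSID format is not given
        self._resultsets[rsid] = (query, hits)
        return rsid

    def retrieve(self, session_id, rsid, start, numrecs):
        # Any session holding the RSID can page through the stored results.
        _, hits = self._resultsets[rsid]
        return hits[start:start + numrecs]

    def decode_resultset_identifier(self, session_id, rsid):
        query, _ = self._resultsets[rsid]
        return query


node = LevelTwoNode(
    ["Peromyscus leucopus", "Peromyscus maniculatus", "Microtus californicus"]
)
rsid = node.search("client-session", "peromyscus")
# A third party that was handed the RSID retrieves the records itself.
first_page = node.retrieve("third-party-session", rsid, 0, 1)
second_page = node.retrieve("third-party-session", rsid, 1, 5)
original_query = node.decode_resultset_identifier("third-party-session", rsid)
```

Passing only the RSID keeps large results off the requesting client, at the cost of the node having to hold resultset state between calls.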

EcoGrid Write
- Used to push data back to sources (e.g. publishing EML documents)
- Depends on the availability of an authentication and access control system
- Operations:
  put(sessionID, objectID, object, type)
  delete(sessionID, objectID)
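The dependency of the write operations on access control can be sketched like this. The set of authorized session IDs stands in for whatever authentication system a real deployment would use; that model, the class name, and the `exists` helper are assumptions for the example.

```python
class WritableNode:
    """Toy node where put/delete are gated by a per-session write permission."""

    def __init__(self, writers):
        self._writers = set(writers)  # session IDs allowed to write (assumed ACL model)
        self._store = {}              # object_id -> (type, bytes)

    def _require_writer(self, session_id):
        if session_id not in self._writers:
            raise PermissionError(f"session {session_id!r} may not write")

    def put(self, session_id, object_id, obj, type_):
        self._require_writer(session_id)
        self._store[object_id] = (type_, obj)

    def delete(self, session_id, object_id):
        self._require_writer(session_id)
        del self._store[object_id]

    def exists(self, object_id):
        # Hypothetical helper, not part of the slide's interface.
        return object_id in self._store


node = WritableNode(writers={"curator-session"})
node.put("curator-session", "eml.1.1", b"<eml/>", "text/xml")

try:
    node.delete("guest-session", "eml.1.1")
    denied = False
except PermissionError:
    denied = True
```

The check runs before any state changes, so a rejected call leaves the stored object untouched.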

Data Instance Query
- New requirement to support direct query and retrieval with arbitrary data sets
- Generally no common schemas between different instances
- Could either
  - Push data instance to service that can query object (e.g. the SRB)
  - Implement interface at the data instance location
- Simple JDBC / SQL interface?
  dbSchema getDataSchema(sessionID, objectID)
  dbResultset search(sessionID, objectID, SQL)
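One way to picture the proposed SQL-style interface is a service that loads each data instance into its own database and then exposes the schema and a query call. The sketch below uses Python's `sqlite3` module with in-memory databases; the `load` helper, the table and column names, and the dictionary returned for the schema are assumptions, since the slide only poses the interface as a question.

```python
import sqlite3


class DataInstanceService:
    """Toy service holding one in-memory SQLite database per data instance."""

    def __init__(self):
        self._instances = {}  # object_id -> sqlite3.Connection

    def load(self, object_id, table, columns, rows):
        # Hypothetical helper: trusted, caller-supplied table and column definitions.
        conn = sqlite3.connect(":memory:")
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        self._instances[object_id] = conn

    def get_data_schema(self, session_id, object_id):
        conn = self._instances[object_id]
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        schema = {}
        for (table,) in tables:
            info = conn.execute(f"PRAGMA table_info({table})").fetchall()
            schema[table] = [column[1] for column in info]  # column[1] is the name
        return schema

    def search(self, session_id, object_id, sql):
        return self._instances[object_id].execute(sql).fetchall()


service = DataInstanceService()
service.load(
    "plot-survey.1",
    "survey",
    ["species TEXT", "abundance INTEGER"],
    [("Peromyscus leucopus", 4), ("Microtus californicus", 2)],
)
schema = service.get_data_schema("session-1", "plot-survey.1")
rows = service.search(
    "session-1", "plot-survey.1", "SELECT species FROM survey WHERE abundance > 3"
)
```

Since instances share no common schema, a client would call the schema operation first and build its SQL from the answer. A real service would also need to restrict the SQL it accepts, for example to read-only statements.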

Building the EcoGrid
(Diagram)
Node types: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, Legacy system
Node labels: AND, LUQ, NTL, VCR, HBR
Networks:
- LTER Network (24)
- Natural History Collections (>> 100)
- Organization of Biological Field Stations (180)
- UC Natural Reserve System (36)
- Partnership for Interdisciplinary Studies of Coastal Oceans (4)
- Multi-agency Rocky Intertidal Network (60)

Metadata-driven analysis cycle

Acknowledgements
This material is based upon work supported by:
- The National Science Foundation under Grant Numbers , , , , , and
- The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number ), the University of California, and the UC Santa Barbara campus.
- The Andrew W. Mellon Foundation.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)