EcoGrid in SEEK: A Data Grid System for Ecology. Bertram Ludaescher, University of California, Davis; Arcot Rajasekar, San Diego Supercomputer Center, University of California, San Diego.



EcoGrid in SEEK: A Data Grid System for Ecology
Bertram Ludaescher, University of California, Davis
Arcot Rajasekar, San Diego Supercomputer Center, University of California, San Diego

What is SEEK?
- Science Environment for Ecological Knowledge (SEEK)
- Multidisciplinary research project to create:
  - Distributed data network (EcoGrid)
    - Environmental, ecological, and systematics data
  - Scalable systems for scientific analysis (workflow systems)
  - Systems for semi-automated data and model integration
- Collaborators: NCEAS, UNM, SDSC, U Kansas, Vermont, Napier, ASU, UNC

Science Environment for Ecological Knowledge: Research Objectives
- Access to ecological, environmental, and biodiversity data
  - Enable data sharing & re-use
  - Enhance data discovery at global scales
- Scalable analysis and synthesis
  - Taxonomic, spatial, temporal, and conceptual integration of data
  - Address data heterogeneity issues
- Enable communication and collaboration for analysis
  - Enable re-use of analytical components
- Collaborators: NCEAS, UNM, SDSC, U Kansas, Vermont, Napier, ASU, UNC

SEEK Overview

SEEK Components
Science Environment for Ecological Knowledge
- Kepler: modeling scientific workflows
- EcoGrid: making diverse environmental data systems interoperate
- Semantic Mediation System: “smart” data discovery and integration
- Knowledge Representation WG
- Taxon WG
- BEAM WG
- Education, Outreach, Training

SEEK EcoGrid
- Goal: allow diverse environmental data systems to interoperate
- Hides complexity of underlying systems using lightweight interfaces
- Integrate diverse data networks from ecology, biodiversity, and environmental sciences
- Data systems
  - Any system can implement these interfaces
  - Prototyping using: Metacat, SRB, DiGIR, Xanthoria, etc.
- Supports multiple metadata standards
  - EML, Darwin Core as foci

EcoGrid client interactions
- Modes of interaction
  - Client-server
  - Fully distributed
  - Peer-to-peer
- EcoGrid Registry
  - Node discovery
  - Service discovery
- Aggregation services
  - Centralized access
  - Reliability
  - Data preservation
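The registry's two roles named on this slide, node discovery and service discovery, can be pictured with a small in-memory sketch. Everything here is hypothetical: the class names, method names, service labels, and example.org endpoints are invented for illustration and are not the EcoGrid registry interface.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One EcoGrid node; all field names here are illustrative assumptions."""
    name: str
    endpoint: str
    services: frozenset = field(default_factory=frozenset)


class Registry:
    """In-memory stand-in for a registry offering node and service discovery."""

    def __init__(self):
        self._nodes = {}

    def register(self, node):
        # Re-registering a name simply replaces the earlier entry.
        self._nodes[node.name] = node

    def discover_nodes(self):
        """Node discovery: names of all registered nodes, sorted."""
        return sorted(self._nodes)

    def discover_services(self, service):
        """Service discovery: names of nodes that offer the given service."""
        return sorted(n.name for n in self._nodes.values() if service in n.services)


# Hypothetical nodes and endpoints, for illustration only.
registry = Registry()
registry.register(Node("metacat-1", "https://example.org/metacat",
                       frozenset({"query", "get", "put"})))
registry.register(Node("digir-1", "https://example.org/digir",
                       frozenset({"query"})))
```

A client in client-server mode would ask the registry which nodes offer a service (for example `registry.discover_services("put")`) and then contact those endpoints directly.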

EcoGrid Focus
- Data and Metadata
  - Distributed Data
  - XML-based Metadata
- Service to Semantic Mediation Layer
  - Access to Ontologies and Taxon Services
  - Helping with Semantic Data Integration
- Service to Analysis and Modelling Layer
  - Interaction with Kepler - Workflows
  - Interaction with Grid Computing Facilities
- Access to Legacy Apps
  - LifeMapper
  - Spatial Data Workbench

EcoGrid Node

Layers in EcoGrid

Ecological Metadata Language
- Metadata: a means to manage ecological data
  - There is no universal data model for ecology
  - Accommodate heterogeneity and dispersion
- EML
  - Common language for archiving and transporting data
  - Discovery information: Creator, Title, Abstract, Keyword, etc.
  - Content
  - Context
  - Physical, logical structure
  - SEEK will add semantic structure

An Example EML Document
Alegria Temperatures
PISCO: Intertidal Temperature Data: Alegria, California:
Carol Blanchette
PISCO
UCSB Marine Science Institute, Santa Barbara, CA
These temperature data were collected at Alegria Beach, California, and were...
OceanographicSensorData
Thermistor
PISCOCategories
Please contact the authors for permission to use these data. Please also acknowledge the authors in any publications.
C.Blanchette
Transform

Metadata-driven data ingestion
- Key information needed to read and machine-process a data file is in the metadata
  - File descriptors (CSV, Excel, RDBMS, etc.)
  - Entity (table) and Attribute (column) descriptions
    - Name
    - Type (integer, float, string, etc.)
    - Codes (missing values, nulls, etc.)
    - Integrity constraints
- In the future, this will include semantic typing
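A minimal sketch of what metadata-driven ingestion could look like: the reader knows nothing about the file except what a metadata record declares (delimiter, header lines, attribute names, types, and missing-value codes). The metadata layout and field names below are assumptions made for illustration; they are not EML element names, and the sample data is invented.

```python
import csv
import io

# Hypothetical metadata record; field names are illustrative, not EML elements.
METADATA = {
    "delimiter": ",",
    "header_lines": 1,
    "attributes": [
        {"name": "site", "type": "string", "missing": []},
        {"name": "temp_c", "type": "float", "missing": ["-999", "NA"]},
        {"name": "count", "type": "integer", "missing": [""]},
    ],
}

# Map declared attribute types to Python conversion functions.
CASTS = {"string": str, "float": float, "integer": int}


def ingest(text, metadata):
    """Parse delimited text into typed rows using only what the metadata declares."""
    reader = csv.reader(io.StringIO(text), delimiter=metadata["delimiter"])
    for _ in range(metadata["header_lines"]):
        next(reader, None)  # skip the header lines declared in the metadata
    attributes = metadata["attributes"]
    rows = []
    for raw in reader:
        if len(raw) != len(attributes):
            raise ValueError(f"expected {len(attributes)} fields, got {len(raw)}")
        row = {}
        for value, attr in zip(raw, attributes):
            value = value.strip()
            # Declared missing-value codes become None instead of being cast.
            if value in attr["missing"]:
                row[attr["name"]] = None
            else:
                row[attr["name"]] = CASTS[attr["type"]](value)
        rows.append(row)
    return rows


# Invented sample data for the sketch.
SAMPLE = "site,temp_c,count\nAlegria,14.2,3\nAlegria,-999,\n"
ROWS = ingest(SAMPLE, METADATA)
```

Integrity constraints (for example a permitted range for `temp_c`) could be added as further per-attribute entries checked in the same loop.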

Heterogeneous data integration
- Requires advanced metadata and processing
  - Attributes must be semantically typed
  - Collection protocols must be known
  - Units and measurement scale must be known
  - Measurement relationships must be known, e.g., that ArealDensity = Count / Area
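To make the ArealDensity = Count / Area relationship concrete, the sketch below merges two invented datasets: one records counts and plot areas, the other records densities directly, and both use hectares. The record layout, the unit table, and the function name are assumptions for illustration only.

```python
# Conversion factors to square metres; a tiny stand-in for a real unit dictionary.
AREA_TO_M2 = {"m2": 1.0, "ha": 10_000.0, "km2": 1_000_000.0}


def areal_density(record):
    """Return density in individuals per square metre for one observation record."""
    if "density" in record:
        # Density already measured: only the area unit in the denominator changes.
        return record["density"] / AREA_TO_M2[record["per_area_unit"]]
    # Otherwise derive it from the relationship ArealDensity = Count / Area.
    area_m2 = record["area"] * AREA_TO_M2[record["area_unit"]]
    return record["count"] / area_m2


# Two hypothetical datasets that record the same quantity differently.
plot_counts = [{"site": "A", "count": 50, "area": 0.5, "area_unit": "ha"}]
densities = [{"site": "B", "density": 200.0, "per_area_unit": "ha"}]

merged = [
    {"site": r["site"], "density_per_m2": areal_density(r)}
    for r in plot_counts + densities
]
```

The merge is only possible because each record states its units and which quantities it holds, which is the information the slide says the metadata must carry.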

Ecological ontologies
- What was measured (e.g., biomass)
- Type of measurement (e.g., Energy)
- Context of measurement (e.g., Psychotria limonensis)
- How it was measured (e.g., dry weight)
- SEEK intends to enable community-created ecological ontologies using OWL
  - Represents a controlled vocabulary for ecological metadata
  - More about this in Bertram’s talk
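One way to picture the four facets listed on this slide is as a small annotation record attached to each data attribute. This is only a sketch: the class, its field names, and the `comparable` rule are assumptions for illustration, not part of any SEEK ontology or of OWL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementAnnotation:
    """Four facets from the slide; the field names are illustrative only."""
    measured: str  # what was measured, e.g. biomass
    kind: str      # type of measurement, e.g. Energy
    context: str   # context of measurement, e.g. a taxon name
    method: str    # how it was measured, e.g. dry weight


def comparable(a, b):
    """Assumed rule: two attributes can be compared across contexts when
    what was measured, the measurement type, and the method all agree."""
    return (a.measured, a.kind, a.method) == (b.measured, b.kind, b.method)


# The first annotation uses the slide's own examples; the others are invented.
psychotria = MeasurementAnnotation("biomass", "Energy", "Psychotria limonensis", "dry weight")
other_taxon = MeasurementAnnotation("biomass", "Energy", "hypothetical taxon", "dry weight")
wet_weight = MeasurementAnnotation("biomass", "Energy", "Psychotria limonensis", "wet weight")
```

With terms drawn from a controlled vocabulary rather than free strings, a check like `comparable` becomes a matter of comparing ontology terms.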

EcoGrid Resources
[Figure: map of EcoGrid nodes. Legend: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system. Site labels: AND, LUQ, HBR, NTL, VCR.]
- LTER Network (24)
- Natural History Collections (>> 100)
- Organization of Biological Field Stations (180)
- UC Natural Reserve System (36)
- Partnership for Interdisciplinary Studies of Coastal Oceans (4)
- Multi-agency Rocky Intertidal Network (60)

EcoGrid Resources
[Figure labels: EcoGrid Registry; SRB, Metacat, Xanthoria, DiGIR, VegBank]

EcoGrid Node

EcoGrid Query Service
EcoGrid Query adopts a query schema, the Query Document Schema, as a common query language within EcoGrid.
<egq:query queryId="test.1.1" system=" xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1" xmlns:xsi=" xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
eml://ecoinformatics.org/eml
size owner min. value max. value value units -->
metadata query for Eco Models
/home/whywhere.seek %World Geodetic System%
/home/whywhere.seek %World Geodetic System% 39.11
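As a rough illustration of how a client might assemble a query document of this kind, the sketch below builds one with Python's standard `xml.etree.ElementTree`. Only the `egq` namespace URI, the `queryId` value, the query title, and the two condition values come from the slide. The child element names (`title`, `AND`, `condition`), the attribute names (`concept`, `operator`), and the concept labels are assumptions and should not be read as the Query Document Schema itself.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears on the slide.
EGQ_NS = "ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
ET.register_namespace("egq", EGQ_NS)


def build_query(query_id, title, conditions):
    """Build a query document; child element names are assumed, not from the schema."""
    root = ET.Element(f"{{{EGQ_NS}}}query", {"queryId": query_id})
    ET.SubElement(root, "title").text = title
    group = ET.SubElement(root, "AND")  # all conditions must hold (assumed semantics)
    for concept, operator, value in conditions:
        cond = ET.SubElement(group, "condition",
                             {"concept": concept, "operator": operator})
        cond.text = value
    return root


# Values taken from the slide; the concept labels are hypothetical.
query = build_query(
    "test.1.1",
    "metadata query for Eco Models",
    [
        ("collection", "LIKE", "/home/whywhere.seek"),
        ("datum", "LIKE", "%World Geodetic System%"),
    ],
)
xml_text = ET.tostring(query, encoding="unicode")
```

Because every node accepts the same query document, a client such as Kepler only needs one query builder regardless of which backend (Metacat, SRB, DiGIR, and so on) answers it.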

EcoGrid Services implementation for GET/PUT
- The ‘get’ call from an EcoGrid client retrieves the content of a dataset or file held in a resource such as SRB or Metacat.
- The ‘get’ function also enables SQL querying of relational databases (Oracle, DB2, etc.) that are pre-registered as data sources in SRB.
- Put for data: the EcoGrid put service allows users to create (upload) files in EcoGrid resources such as Metacat and SRB.
- Put for metadata: the EcoGrid put service also allows ingestion of metadata, such as EML in Metacat or user-defined metadata in SRB.
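The get/put behaviour described above can be sketched as a small abstract interface that any backend wrapper would implement. This is an illustration under stated assumptions: the class names, method signatures, the `kind` parameter, and the identifiers are hypothetical and do not reproduce the EcoGrid WSDL.

```python
from abc import ABC, abstractmethod


class EcoGridNode(ABC):
    """Hypothetical lightweight interface that a backend wrapper would implement."""

    @abstractmethod
    def get(self, identifier: str) -> bytes:
        """Return the content stored under the identifier."""

    @abstractmethod
    def put(self, identifier: str, content: bytes, kind: str = "data") -> None:
        """Store content as either 'data' or 'metadata'."""


class InMemoryNode(EcoGridNode):
    """Toy backend standing in for a wrapped system such as Metacat or SRB."""

    def __init__(self):
        self._store = {"data": {}, "metadata": {}}

    def put(self, identifier, content, kind="data"):
        if kind not in self._store:
            raise ValueError(f"unknown kind: {kind!r}")
        self._store[kind][identifier] = content

    def get(self, identifier):
        # Look in the data store first, then in the metadata store.
        for kind in ("data", "metadata"):
            if identifier in self._store[kind]:
                return self._store[kind][identifier]
        raise KeyError(identifier)


# Invented identifiers and content for the sketch.
node = InMemoryNode()
node.put("temps.1.1", b"site,temp_c\nAlegria,14.2\n")            # put for data
node.put("temps.1.1.eml", b"<eml>...</eml>", kind="metadata")    # put for metadata
```

Keeping the interface this small is what lets very different systems sit behind it: a wrapper only has to translate `get` and `put` into the backend's own calls.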

EcoGrid Queries in Kepler

EML Metadata Display in Kepler

EcoGrid Sources in Kepler

Query Builder

Status
- Read, Query & Register: completed
- Simple Registry: operational
- EcoGrid wrappers completed for: Metacat, SRB, DiGIR, Xanthoria
- Available interfaces: WSDL, Simple Web Interactivity, Kepler

Acknowledgements
This material is based upon work supported by:
- The National Science Foundation under Grant Numbers , , , , , and
- The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number ), the University of California, and the UC Santa Barbara campus.
- The Andrew W. Mellon Foundation.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON