Grid Technologies
Arcot Rajasekar (SEEK)
Paul Watson (North East eScience Centre)


Plan of the Day
09:00-10:30
  EcoGrid Design & Architecture (Rajasekar & Jones)
  EcoGrid Interfaces (Vieglais)
  EcoGrid Services (Zhu, Tao & Spears)
10:30-11:00 Tea
11:00-12:30
  Introduction. Paul Watson, Newcastle University
  DataGrid Projects at Daresbury Labs, Ananta Manadhar
  Eldas. Stephen Rutherford, Edikt Project
12:30-14:00 Lunch
14:00-17:30 Demos & Discussions

EcoGrid Design & Architecture
The resource-access fabric for the Science Environment for Ecological Knowledge
Arcot Rajasekar, San Diego Supercomputer Center
Matt Jones, National Center for Ecological Analysis and Synthesis

What is SEEK?
Science Environment for Ecological Knowledge (SEEK)
Multidisciplinary research project to create:
  Distributed data network (EcoGrid)
    Environmental, ecological, and systematics data
  Scalable systems for scientific analysis (workflow systems)
  Systems for semi-automated data and model integration
Collaborators
  NCEAS, UNM, SDSC, U Kansas
  Vermont, Napier, ASU, UNC

What is EcoGrid?
Seamless access service to distributed data and metadata
  Scalability, multiplicity of platforms and storage devices
  Authentication through single sign-on
  Multi-level access control
Maintain a registry for data, metadata, ontologies, and services
Allow rapid incorporation of new data sources as well as decades of legacy ecological data
Provide extensible, ecologically relevant metadata based on the Ecological Metadata Language
Replicate and version data to provide fault tolerance, disaster recovery, and load balancing
Enable execution of applications as workflows
Help semantic integration of data resources

Principal foci for SEEK
  Planning
  Metadata Entry
  Data Acquisition
  Quality Assurance
  Storage and Access
  Data Integration
  Analysis and Modeling
  Synthesis

SEEK Overview

SEEK: Science Environment for Ecological Knowledge
  EcoGrid: Uniform interfaces to manage environmental data
  Kepler: Modeling scientific workflows
  Sparrow: “Smart” data discovery and integration
  Knowledge Representation
  Classification and Nomenclature
  Biodiversity and Ecological Analysis and Modeling

SEEK EcoGrid
Goal: standardize interfaces (using web and grid services)
  We have standardized data via EML
Integrate diverse data networks from ecology, biodiversity, and environmental sciences
Grid-standardized interfaces
  Uniform interface to: Metacat, SRB, DiGIR, Xanthoria, etc.
  Anyone can implement these interfaces
  Hides complexity of underlying systems
Metadata-mediated data access
  Supports multiple metadata standards
  EML, Darwin Core as foci
Computational services
  Pre-defined analytical services
  On-the-fly analytical services
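
The idea of one uniform interface over several back ends can be sketched as follows. This is only an illustration under stated assumptions: the class and method names (EcoGridInterface, DictBackedWrapper, query, read) are hypothetical and are not the actual EcoGrid API, and the in-memory dictionary stands in for a real Metacat, SRB, or DiGIR system.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class EcoGridInterface(ABC):
    """Hypothetical uniform interface that every wrapper implements."""

    @abstractmethod
    def query(self, keyword: str) -> List[str]:
        """Return identifiers of records whose text mentions the keyword."""

    @abstractmethod
    def read(self, identifier: str) -> str:
        """Return the record stored under the given identifier."""


class DictBackedWrapper(EcoGridInterface):
    """Stand-in wrapper; a real one would translate calls to its back end."""

    def __init__(self, prefix: str, documents: Dict[str, str]) -> None:
        self.prefix = prefix
        self._documents = documents

    def query(self, keyword: str) -> List[str]:
        needle = keyword.lower()
        return sorted(
            f"{self.prefix}:{key}"
            for key, text in self._documents.items()
            if needle in text.lower()
        )

    def read(self, identifier: str) -> str:
        prefix, _, key = identifier.partition(":")
        if prefix != self.prefix:
            raise KeyError(identifier)
        return self._documents[key]


def federated_query(nodes: List[EcoGridInterface], keyword: str) -> List[str]:
    """Ask every node the same question through the same interface."""
    hits: List[str] = []
    for node in nodes:
        hits.extend(node.query(keyword))
    return hits
```

Because the client only sees EcoGridInterface, a new data source can be added by writing one more wrapper, without changing federated_query.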

EcoGrid client interactions
Modes of interaction
  Client-server
  Fully distributed
  Peer-to-peer
EcoGrid Registry
  Node discovery
  Service discovery
Aggregation services
  Centralized access
  Reliability
  Data preservation
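
A registry that supports node discovery and service discovery could look roughly like the sketch below. Everything in it is assumed for illustration: the Registry and NodeEntry names, the service labels, and the example.org endpoints are hypothetical and do not describe the real EcoGrid registry.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class NodeEntry:
    """One registered node: a name, an endpoint, and the services it offers."""
    name: str
    endpoint: str
    services: FrozenSet[str]


class Registry:
    """Hypothetical in-memory registry keyed by node name."""

    def __init__(self) -> None:
        self._nodes: Dict[str, NodeEntry] = {}

    def register(self, entry: NodeEntry) -> None:
        # Re-registering a name replaces the earlier entry.
        self._nodes[entry.name] = entry

    def discover_nodes(self) -> List[str]:
        """Node discovery: list the names of all known nodes."""
        return sorted(self._nodes)

    def discover_service(self, service: str) -> List[str]:
        """Service discovery: endpoints of the nodes offering a service."""
        return sorted(
            entry.endpoint
            for entry in self._nodes.values()
            if service in entry.services
        )


registry = Registry()
registry.register(NodeEntry("metacat-1", "https://example.org/metacat",
                            frozenset({"query", "read", "register"})))
registry.register(NodeEntry("digir-1", "https://example.org/digir",
                            frozenset({"query"})))
```

A client would first call discover_service("query") and then contact each returned endpoint, which is what allows the same client code to work in client-server and more distributed modes.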

EcoGrid Focus
Data and Metadata
  Distributed Data
  XML-based Metadata
Service to Semantic Mediation Layer
  Access to Ontologies and Taxon Services
  Helping with Semantic Data Integration
Service to Analysis and Modelling Layer
  Interaction with Kepler - Workflows
  Interaction with Grid Computing Facilities
Access to Legacy Apps
  LifeMapper
  Spatial Data Workbench

EcoGrid Resources
[Figure: map of EcoGrid nodes. Legend: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system. Site labels: AND, LUQ, HBR, NTL, VCR.]
  LTER Network (24)
  Natural History Collections (>> 100)
  Organization of Biological Field Stations (180)
  UC Natural Reserve System (36)
  Partnership for Interdisciplinary Studies of Coastal Oceans (4)
  Multi-agency Rocky Intertidal Network (60)

Ecological Metadata Language
Metadata: a means to manage ecological data
  There is no universal data model for ecology
  Accommodate heterogeneity and dispersion
EML
  Common language for archiving and transporting data
  Discovery information
    Creator, Title, Abstract, Keyword, etc.
  Content
  Context
  Physical, logical structure
SEEK will add semantic structure

An Example EML Document
  Alegria Temperatures
  PISCO: Intertidal Temperature Data: Alegria, California:
  Carol Blanchette
  PISCO
  UCSB Marine Science Institute
  Santa Barbara CA
  These temperature data were collected at Alegria Beach, California, and were...
  OceanographicSensorData
  Thermistor
  PISCOCategories
  Please contact the authors for permission to use these data. Please also acknowledge the authors in any publications.
  C.Blanchette
  Transform
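
The slide's XML markup did not survive in this transcript, so the sketch below rebuilds a small EML-like fragment from the values that did. It is a simplified illustration, not a schema-valid EML document: the element names follow common EML usage (dataset, title, creator, keywordSet), but their arrangement here is an assumption, and build_dataset and discovery_info are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def build_dataset() -> ET.Element:
    """Build a simplified, EML-like dataset element from the slide's values."""
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = (
        "PISCO: Intertidal Temperature Data: Alegria, California"
    )
    creator = ET.SubElement(dataset, "creator")
    person = ET.SubElement(creator, "individualName")
    ET.SubElement(person, "givenName").text = "Carol"
    ET.SubElement(person, "surName").text = "Blanchette"
    ET.SubElement(creator, "organizationName").text = "PISCO"
    keyword_set = ET.SubElement(dataset, "keywordSet")
    for word in ("OceanographicSensorData", "Thermistor"):
        ET.SubElement(keyword_set, "keyword").text = word
    ET.SubElement(keyword_set, "keywordThesaurus").text = "PISCOCategories"
    return dataset


def discovery_info(dataset: ET.Element) -> Dict[str, Union[str, None, List[str]]]:
    """Pull out the discovery fields a catalogue search would index."""
    return {
        "title": dataset.findtext("title"),
        "creator": dataset.findtext("creator/individualName/surName"),
        "keywords": [k.text or "" for k in dataset.findall("keywordSet/keyword")],
    }


# Serialize and parse again to show that the fields survive transport as XML.
xml_text = ET.tostring(build_dataset(), encoding="unicode")
info = discovery_info(ET.fromstring(xml_text))
```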

Metadata driven data ingestion
Key information needed to read and machine process a data file is in the metadata
  File descriptors (CSV, Excel, RDBMS, etc.)
  Entity (table) and Attribute (column) descriptions
    Name
    Type (integer, float, string, etc.)
    Codes (missing values, nulls, etc.)
    Integrity constraints
In the future, this will include semantic typing
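
A minimal sketch of metadata-driven ingestion, assuming the entity description has already been extracted from the metadata into a plain dictionary. The dictionary layout, the ingest function, and the column names and sample values are all hypothetical; real EML stores this information in XML elements.

```python
import csv
import io
from typing import Any, Dict, List

# Map declared attribute types to Python converters.
CASTS = {"integer": int, "float": float, "string": str}


def ingest(text: str, entity: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Parse delimited text using only what the entity metadata declares."""
    reader = csv.DictReader(io.StringIO(text), delimiter=entity["delimiter"])
    rows = []
    for raw in reader:
        row = {}
        for attr in entity["attributes"]:
            value = raw[attr["name"]]
            if value in attr.get("missing_codes", ()):
                row[attr["name"]] = None  # declared missing-value code
            else:
                row[attr["name"]] = CASTS[attr["type"]](value)
        rows.append(row)
    return rows


# Hypothetical entity description for a two-column temperature table.
ENTITY = {
    "delimiter": ",",
    "attributes": [
        {"name": "site", "type": "string"},
        {"name": "temp_c", "type": "float", "missing_codes": ["-999"]},
    ],
}

SAMPLE = "site,temp_c\nAlegria,14.2\nAlegria,-999\n"
rows = ingest(SAMPLE, ENTITY)
```

The reader itself knows nothing about temperature data; changing the metadata is enough to ingest a different table.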

Metadata revision
Metadata needs to be revised following any transformation
Versioning of metadata and data is important for reuse and repeatability
The process describes the data lineage as it has been transformed
Derived data sets can be stored in EcoGrid with provenance
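
One way to picture revision and lineage is a chain of versions in which each derived version points back to its parent and names the process that produced it. The sketch below assumes that model; MetadataVersion, transform, and lineage are hypothetical names, and the identifier scheme is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MetadataVersion:
    """One immutable revision of a metadata document."""
    doc_id: str
    revision: int
    description: str
    derived_from: Optional["MetadataVersion"] = None
    process: Optional[str] = None  # transformation that produced this revision


def transform(parent: MetadataVersion, process: str,
              description: str) -> MetadataVersion:
    """Record a transformation as a new revision linked to its parent."""
    return MetadataVersion(parent.doc_id, parent.revision + 1,
                           description, parent, process)


def lineage(version: Optional[MetadataVersion]) -> List[str]:
    """Walk back to the original and return the chain, oldest first."""
    steps = []
    while version is not None:
        label = f"{version.doc_id}.{version.revision}"
        if version.process:
            label += f" ({version.process})"
        steps.append(label)
        version = version.derived_from
    return list(reversed(steps))


raw = MetadataVersion("temps", 1, "raw sensor readings")
cleaned = transform(raw, "drop missing values", "cleaned readings")
monthly = transform(cleaned, "monthly mean", "monthly averages")
```

Here lineage(monthly) lists every revision and the step that created it, which is the provenance a derived data set would carry when stored.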

AMS: Workflows in SEEK
In the SEEK model, data ingestion/cleaning can be metadata driven (specifically with EML)
Output generation includes creating appropriate metadata
The analysis pipeline itself becomes metadata
Query EcoGrid to find data
Archive output to EcoGrid

Kepler: scientific workflows
EML provides semi-automated data binding
Scientific workflows represent knowledge about the process; Kepler captures this knowledge

Kepler: grid services access

Kepler: ecological modeling

GARP Invasive Species Model
[Figure: GARP workflow diagram. Slide from D. Pennington.]
Data sources: EcoGrid Query against DiGIR and SRB
Labelled data items:
  (a) Species presence & absence points (native range; invasion area), DiGIR
  (b) Environmental layers (native range; invasion area), SRB
  (c) Integrated layers (native range; invasion area)
  (d) Training sample; test sample
  (e) GARP rule set
  (f) Native range prediction map; invasion area prediction map
  (g) Model quality parameter
Steps shown: Layer Integration, Sample Data, Calculation, Map, Validation, User
Scientific workflows represent knowledge about the process; AMS captures this knowledge

SMS: Semantic Mediation
Label data with semantic types
Label inputs and outputs of analytical components with semantic types
Use reasoning engines to generate transformation steps
  Beware analytical constraints
Use reasoning engine to discover relevant components
[Diagram labels: Data, Ontology, Workflow Components]
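
Component discovery by semantic type can be sketched with a toy ontology and a subsumption check. This is a deliberately small stand-in for a reasoning engine: the type names, the component names, and the functions is_a and relevant_components are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy ontology: each semantic type maps to its parent type (None at the root).
ONTOLOGY: Dict[str, Optional[str]] = {
    "Observation": None,
    "SpeciesOccurrence": "Observation",
    "TemperatureSeries": "Observation",
}

# Components labelled with (accepted input type, produced output type).
COMPONENTS: Dict[str, Tuple[str, str]] = {
    "garp_model": ("SpeciesOccurrence", "PredictionMap"),
    "summary_stats": ("Observation", "SummaryTable"),
}


def is_a(sem_type: Optional[str], ancestor: str) -> bool:
    """True if sem_type equals ancestor or descends from it."""
    while sem_type is not None:
        if sem_type == ancestor:
            return True
        sem_type = ONTOLOGY.get(sem_type)
    return False


def relevant_components(data_type: str) -> List[str]:
    """Components whose declared input type subsumes the data's type."""
    return sorted(
        name
        for name, (accepts, _produces) in COMPONENTS.items()
        if is_a(data_type, accepts)
    )
```

A temperature series matches only the generic statistics component, while species occurrences also match the niche model, because the check follows the type hierarchy rather than comparing names.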

Homogeneous data integration
Integration of homogeneous or mostly homogeneous data via EML metadata is relatively straightforward

Heterogeneous data integration
Requires advanced metadata and processing
  Attributes must be semantically typed
  Collection protocols must be known
  Units and measurement scale must be known
  Measurement relationships must be known, e.g., that ArealDensity = Count/Area
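
The ArealDensity relationship shows why units and measurement relationships matter. In the sketch below, one hypothetical data set reports a count and a plot area while another reports density per hectare; both can be compared only after deriving density and converting to a common unit. The function names, unit labels, and sample numbers are assumptions for illustration.

```python
from typing import Dict

# Square metres covered by one unit of area, per density unit.
M2_PER_AREA_UNIT: Dict[str, float] = {
    "per_m2": 1.0,
    "per_hectare": 10_000.0,  # 1 hectare = 10,000 square metres
}


def areal_density(count: float, area_m2: float) -> float:
    """ArealDensity = Count / Area, with area in square metres."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return count / area_m2


def to_per_m2(value: float, unit: str) -> float:
    """Convert a reported density to individuals per square metre."""
    return value / M2_PER_AREA_UNIT[unit]


# Data set A: 50 individuals counted in a 25 square-metre plot.
density_a = areal_density(50, 25.0)
# Data set B: density already reported as 20,000 individuals per hectare.
density_b = to_per_m2(20_000, "per_hectare")
```

Both values come out in individuals per square metre, so they can be placed in the same column of an integrated table.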

Ecological ontologies
  What was measured (e.g., biomass)
  Type of measurement (e.g., Energy)
  Context of measurement (e.g., Psychotria limonensis)
  How it was measured (e.g., dry weight)
SEEK intends to enable community-created ecological ontologies using OWL
  Represents a controlled vocabulary for ecological metadata
More about this in Bertram’s talk

Layers in EcoGrid

EcoGrid Node

EcoGrid Resources
[Diagram labels: EcoGrid Registry, SRB, Metacat, Xanthoria, DiGIR, VegBank]

Status
Read, Query & Register completed
Simple Registry operational
EcoGrid wrappers completed for:
  Metacat
  SRB
  DiGIR
  Xanthoria
Available interfaces
  WSDL
  Simple Web Interactivity
  Kepler

Acknowledgements
This material is based upon work supported by:
  The National Science Foundation under Grant Numbers , , , , , and
  The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number ), the University of California, and the UC Santa Barbara campus.
  The Andrew W. Mellon Foundation.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON