Overview of the Science Environment for Ecological Knowledge (SEEK) Ricardo Scachetti Pereira.

Presentation transcript:

Overview of the Science Environment for Ecological Knowledge (SEEK)
Ricardo Scachetti Pereira (with many, many slides from Matt Jones, Bertram Ludäscher, Ilkay Altintas, Chad Berkley and others)
University of Kansas, USA
June 30, 2005

SWDB Aug 29, 2004 June, 2005
Outline
  – Introduction to SEEK
  – Introduction to Kepler
  – Kepler capabilities and sample workflows
  – Current and future developments

What is SEEK?
Science Environment for Ecological Knowledge
Multidisciplinary project to create:
  Scientific-workflow system (Kepler)
    – Design, document, reuse, and execute scientific analyses
  Distributed data network (EcoGrid)
    – Environmental, ecological, and systematics data
  Knowledge Representation & Semantic Mediation
    – Discover, integrate, and compose hard-to-relate data and services via ontologies
  Taxonomic, Biology, and Education subcomponents
Collaborators (the SEEK team)
  NCEAS, UNM, SDSC/UCSD, U Kansas, UC Davis, Vermont, Napier, ASU, UNC

Scientific Workflows
Model the way scientists work with their data now
  – Mentally coordinate export and import of data among software systems
    1) Capture data in the field
    2) Digitize it into Excel spreadsheets
    3) Export as CSV files
    4) Import into statistical package
    5) Perform analysis
    6) Export results, tables and graphics
    7) Write and publish article
Query EcoGrid to find data
Archive output to EcoGrid with workflow metadata

Scientific Workflows
Scientific workflows:
  – Are not linear
  – Involve multiple data sets
  – Involve multiple analytical steps
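
A non-linear workflow with several data sets and analytical steps can be pictured as a directed acyclic graph of steps. The sketch below is only an illustration of that idea: the step names are hypothetical, and it uses Python's standard `graphlib` module rather than anything from Kepler.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "load_species_data": set(),
    "load_climate_data": set(),
    "clean_species_data": {"load_species_data"},
    "join_data": {"clean_species_data", "load_climate_data"},
    "fit_model": {"join_data"},
    "plot_results": {"fit_model"},
}

def execution_order(steps):
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(steps).static_order())

order = execution_order(workflow)
print(order)
```

Two data sets feed a single join step here, so no single straight line of steps describes the analysis; any order that respects the dependencies is acceptable.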

Metadata driven data ingestion
Key information needed to read and machine process a data file is in the metadata
  – File descriptors (CSV, Excel, RDBMS, etc.)
  – Entity (table) and Attribute (column) descriptions
      Name
      Type (integer, float, string, etc.)
      Codes (missing values, nulls, etc.)
In the future, this will include semantic typing
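
As a rough illustration of metadata-driven ingestion, the sketch below reads a delimited file using only what a metadata record says about it: the delimiter, each attribute's name and type, and its missing-value codes. The dictionary layout and field names are assumptions made for this example; they are not EML syntax.

```python
import csv
import io

# Hypothetical metadata for one entity (table); the field names are
# illustrative and are not taken from EML.
metadata = {
    "delimiter": ",",
    "attributes": [
        {"name": "site", "type": "string", "missing": [""]},
        {"name": "count", "type": "integer", "missing": ["-999"]},
        {"name": "area_m2", "type": "float", "missing": ["NA"]},
    ],
}

CASTS = {"string": str, "integer": int, "float": float}

def read_table(text, meta):
    """Parse delimited text into typed rows, mapping missing-value codes to None."""
    reader = csv.reader(io.StringIO(text), delimiter=meta["delimiter"])
    next(reader)  # skip the header row; attribute names come from the metadata
    rows = []
    for raw in reader:
        row = {}
        for attr, value in zip(meta["attributes"], raw):
            if value in attr["missing"]:
                row[attr["name"]] = None
            else:
                row[attr["name"]] = CASTS[attr["type"]](value)
        rows.append(row)
    return rows

sample = "site,count,area_m2\nA,12,4.0\nB,-999,NA\n"
rows = read_table(sample, metadata)
print(rows)
```

The reader itself contains no knowledge of this particular file; changing the metadata record is enough to ingest a differently shaped table.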

Metadata driven data ingestion
Metadata is revised following any transformation
Versioning of metadata and data is very important
This process results in a lineage of the data file as it has been transformed
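
One way to picture the lineage described above is an append-only list of metadata versions, one per transformation. This is a minimal sketch under that assumption; the class and field names are hypothetical and do not describe how SEEK or Kepler store provenance.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MetadataVersion:
    """One immutable version of a data file's metadata (hypothetical structure)."""
    version: int
    columns: tuple
    derived_by: str = "original"

@dataclass
class Lineage:
    """Ordered history of metadata versions for one data file."""
    versions: list = field(default_factory=list)

    def record(self, columns, step):
        """Append a new metadata version describing the output of `step`."""
        new = MetadataVersion(len(self.versions) + 1, tuple(columns), step)
        self.versions.append(new)
        return new

lineage = Lineage()
lineage.record(["site", "count", "area_m2"], "original")
lineage.record(["site", "density"], "compute_density")
for v in lineage.versions:
    print(v.version, v.derived_by, v.columns)
```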

Data integration
Integration of heterogeneous data requires much more advanced metadata and processing
  – Attributes must be semantically typed
  – Collection protocols must be known
  – Units and measurement scale must be known
  – Measurement mechanics must be known (e.g., that Density = Count/Area)
  – This is an advanced research topic within the SEEK project
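
The Density = Count/Area example shows why units and measurement mechanics must be known before data sets can be combined. The sketch below is illustrative only; the unit labels and the conversion table are assumptions, not part of any SEEK ontology.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    """A value together with its unit label (unit strings are illustrative)."""
    value: float
    unit: str

# Hypothetical conversion factors from an area unit to square meters.
AREA_TO_M2 = {"m2": 1.0, "ha": 10_000.0}

def density(count, area):
    """Derive density (individuals per m2) as Count / Area, normalizing area units."""
    if count.unit != "individuals":
        raise ValueError(f"expected a count, got unit {count.unit!r}")
    if area.unit not in AREA_TO_M2:
        raise ValueError(f"unknown area unit {area.unit!r}")
    area_m2 = area.value * AREA_TO_M2[area.unit]
    return Measurement(count.value / area_m2, "individuals/m2")

# Two surveys that report area in different units yield comparable densities.
a = density(Measurement(50, "individuals"), Measurement(100.0, "m2"))
b = density(Measurement(5000, "individuals"), Measurement(1.0, "ha"))
print(a, b)
```

Without the unit labels, the raw numbers 50/100 and 5000/1 would look unrelated; with them, the two surveys can be integrated on a common scale.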

Label data with semantic types
Label inputs and outputs of analytical components with semantic types
Use SMS to generate transformation steps
  – Beware analytical constraints
Use SMS to discover relevant components
Ontology = specification of a conceptualization (a knowledge map)
Semantic typing: Data, Ontology, Workflow Components
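
To make the idea of discovering components through semantic types concrete, here is a toy sketch. The ontology, the concept names, and the component names are all hypothetical, and the matching rule (a component accepts data whose type is the same as, or a descendant of, its declared input type) is an assumption for illustration rather than a description of how SMS works.

```python
# Hypothetical mini-ontology: each concept maps to its parent concept.
ONTOLOGY = {
    "SpeciesCount": "Abundance",
    "Density": "Abundance",
    "Abundance": "Observation",
    "Temperature": "Observation",
    "Observation": None,
}

def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is a descendant of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False

# Hypothetical components annotated with the semantic type of their input.
COMPONENTS = {
    "ShannonIndex": "Abundance",
    "GrowingDegreeDays": "Temperature",
    "SummaryStatistics": "Observation",
}

def compatible_components(data_type):
    """Names of components whose input type subsumes the data's semantic type."""
    return sorted(name for name, needs in COMPONENTS.items() if is_a(data_type, needs))

print(compatible_components("SpeciesCount"))
```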

SEEK Components Revisited

SEEK EcoGrid
Goal: allow diverse environmental data systems to interoperate
  – Hides complexity of underlying systems using lightweight interfaces
  – Integrates diverse data networks from ecology, biodiversity, and environmental sciences
Data systems
  – Any system can implement these interfaces
  – Prototyping using: Metacat, SRB, DiGIR, Xanthoria, etc.
Supports multiple metadata standards
  – EML, Darwin Core as foci
Implemented as OGSA Grid Services
  – Query()
  – Get()
  – Put()
  – Login()
  – …
Tiered implementation critical to adoption
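
The notion of a lightweight interface that any data system can implement might look like the following sketch. Only the four operation names come from the slide; the method signatures, the `EcoGridNode` and `InMemoryNode` classes, and the toy in-memory backend are assumptions for illustration and do not reflect the actual OGSA service definitions.

```python
from abc import ABC, abstractmethod

class EcoGridNode(ABC):
    """Hypothetical Python rendering of the four operations named on the slide."""

    @abstractmethod
    def login(self, user, password): ...

    @abstractmethod
    def query(self, text): ...

    @abstractmethod
    def get(self, identifier): ...

    @abstractmethod
    def put(self, identifier, data): ...

class InMemoryNode(EcoGridNode):
    """Toy backend standing in for a real data system."""

    def __init__(self):
        self._store = {}
        self._session = None

    def login(self, user, password):
        # No real authentication: any non-empty user name yields a session token.
        if not user:
            raise PermissionError("user name required")
        self._session = f"session-{user}"
        return self._session

    def query(self, text):
        # Return identifiers of objects whose content contains the search text.
        return sorted(k for k, v in self._store.items() if text in v)

    def get(self, identifier):
        return self._store[identifier]

    def put(self, identifier, data):
        if self._session is None:
            raise PermissionError("login required before put")
        self._store[identifier] = data

node = InMemoryNode()
node.login("alice", "secret")
node.put("doc.1", "bird survey 2004")
node.put("doc.2", "soil moisture")
print(node.query("bird"), node.get("doc.2"))
```

A client written against `EcoGridNode` would not need to know which backend it is talking to, which is the point of hiding the underlying systems behind a small common interface.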

Kepler: Scientific Workflows
Implements the workflow system in SEEK
Open, collaborative effort of:
  – SEEK, SciDAC/SDM, GEON, Ptolemy Project
  – Ecology, biodiversity, molecular bio, geology, engineering
Based on Ptolemy II system
Kepler aims to extend the Ptolemy system with:
  – Web and grid service access
  – Data integration support
  – Semantic reasoning
Kepler actors are written in Java but can wrap other applications (such as MATLAB, GRASS)
Actors can call arbitrary Web (or Grid) Services
Ptolemy already has a very large inventory of actors
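
For readers unfamiliar with the actor idea, the following toy sketch shows components that each consume a token and produce a token, chained by a simple scheduler. It is written in Python purely for illustration; the `Actor` class and `run_pipeline` function are hypothetical and are not the Kepler or Ptolemy II Java API, where execution is governed by a separate director.

```python
class Actor:
    """Minimal actor: consumes one input token and produces one output token."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def fire(self, token):
        return self.func(token)

def run_pipeline(actors, token):
    """Toy stand-in for a director: fire each actor in order, passing tokens along."""
    for actor in actors:
        token = actor.fire(token)
    return token

pipeline = [
    Actor("Parse", lambda text: [float(x) for x in text.split(",")]),
    Actor("Mean", lambda xs: sum(xs) / len(xs)),
    Actor("Format", lambda m: f"mean={m:.2f}"),
]
result = run_pipeline(pipeline, "1,2,3,6")
print(result)
```

Because each actor hides its implementation behind `fire`, one of them could just as well wrap an external program or a web service call without the rest of the pipeline changing.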

Actor Search and Browse
Actors Panel
  – Large number of actors
  – Organized hierarchically
  – Search makes it easy to find the right actor
  – Ontology-based
Plan to support multiple views

EcoGrid: EML Data Access

EcoGrid: Queries

EcoGrid: Queries

EML Metadata Display

EcoGrid: DarwinCore Access

Kepler: database access

Kepler: web service example

Kepler: grid services access

Kepler: ecological modeling

New ENM Workflow

Data Analysis: Biodiversity Indices

R in Kepler (Source: Dan Higgins, Kepler/SEEK)

ORB

Kepler today
Supports scientific workflows
  – Ecology, molecular bio, geology, …
  – Variety of analytical components (including spatial data transformations)
  – Support for R scripts and Matlab scripts
EcoGrid access to heterogeneous data
  – EML Data support
      Experimental data, survey data, spatial raster and vector data, etc.
  – DarwinCore Data support
      Museum collections
  – EcoGrid registry to discover data sources
Ontology-based browsing for analytical components
  – Exploit semantics to improve the user experience
Demonstration workflows
  – Ecology: Ecological Niche Modeling
  – Genomics: Promoter Identification Workflow
  – Geology: Geologic Map Information Integration
  – Oceanography: Real-time Revelle example of data access

Kepler this year
Usability engineering
  – Full evaluation and user-oriented customization of all UI components
Distributed computing/grid computing
  – Large jobs, lots of machines
  – Detached execution
Component repository / downloadable components
Smart data and component discovery
  – Support annotating data sources
Automated data and service integration and transformation using ontologies
Complete EcoGrid access
  – Full EML support
  – Support for large data and 3rd-party transfer
  – More data sources and types of data sources (e.g., JDBC, GEON data)
Provenance and metadata propagation

Acknowledgements
This material is based upon work supported by:
The National Science Foundation under Grant Numbers , , , , , and
Collaborators: NCEAS (UC Santa Barbara), University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research), University of Vermont, University of North Carolina, Napier University, Arizona State University, UC Davis
The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number ), the University of California, and the UC Santa Barbara campus.
The Andrew W. Mellon Foundation.
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON