Science Environment for Ecological Knowledge Bertram Ludäscher San Diego Supercomputer Center University of California, San Diego

Science Environment for Ecological Knowledge
Bertram Ludäscher
San Diego Supercomputer Center, University of California, San Diego
UC Santa Barbara, UC San Diego, U New Mexico, U Kansas, Vermont, Napier, ASU, UNC

SEEK Overview, 3/ Architecture Overview
Analysis & Modeling System
  – Design and execution of ecological models and analysis
  – End user focus – application-/upperware
Semantic Mediation System
  – Data integration of hard-to-relate sources and processes
  – Semantic types and ontologies – upper middleware
EcoGrid
  – Access to ecology data and tools – middle-/underware
Plus Working Groups:
  – Knowledge Representation (SEEK-KR)
  – Classification and Nomenclature (TAXON)
  – Biodiversity and Ecological Analysis and Modeling (BEAM)
(cf. GEON + Cyberinfrastructure)

SEEK EcoGrid
Goal: standardize interfaces (using web and grid services)
  – We have standardized data via EML
  – Integrate diverse data networks from ecology, biodiversity, and environmental sciences
Grid-standardized interfaces
  – Uniform interface to: Metacat, SRB, DiGIR, Xanthoria, etc.
  – Anyone can implement these interfaces
  – Hides complexity of underlying systems
Metadata-mediated data access
  – Supports multiple metadata standards
  – EML, Darwin Core as foci
Computational services
  – Pre-defined analytical services
  – On-the-fly analytical services
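
The idea of one uniform interface in front of several back ends can be sketched roughly as follows. This is only an illustration: the class names, the `query` method, and the in-memory data are assumptions made for the example and are not the actual EcoGrid, Metacat, or DiGIR APIs.

```python
from abc import ABC, abstractmethod


class EcoGridNode(ABC):
    """Hypothetical uniform interface; the method name is illustrative only."""

    @abstractmethod
    def query(self, keyword):
        """Return a list of {'id', 'title', 'source'} records matching keyword."""


class MetadataCatalogNode(EcoGridNode):
    """In-memory stand-in for a metadata catalog (not the real Metacat API)."""

    def __init__(self, documents):
        self._documents = documents  # maps document id -> title

    def query(self, keyword):
        return [
            {"id": doc_id, "title": title, "source": "catalog"}
            for doc_id, title in self._documents.items()
            if keyword.lower() in title.lower()
        ]


class SpecimenNode(EcoGridNode):
    """In-memory stand-in for a specimen provider (not the real DiGIR API)."""

    def __init__(self, records):
        self._records = records  # list of (catalog number, scientific name)

    def query(self, keyword):
        return [
            {"id": number, "title": name, "source": "specimens"}
            for number, name in self._records
            if keyword.lower() in name.lower()
        ]


def federated_query(nodes, keyword):
    """Send the same query to every node and merge the results in node order."""
    results = []
    for node in nodes:
        results.extend(node.query(keyword))
    return results
```

A client only ever calls `federated_query`; which system sits behind each node stays hidden, which is the point the slide makes about hiding the complexity of the underlying systems.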

Grid versus Web Services
Grid Services are Web Services
  – Add authentication, lifecycle management, notification, etc.
  – Globus Toolkit 3: Implements Open Grid Services Architecture (OGSA)
Implications for use
  – Write a normal web service extending GridService base class
  – When deployed within GT3, you get these extra functions for ‘free’
  – Supports distributed computation via proxy authentication
Problems
  – Complex system to understand
  – GT3 can be difficult to deploy
  – Proposals to incorporate grid services within the Web services community (Web Services Resource Framework [WSRF])

EcoGrid client interactions
Modes of interaction
  – Client-server
  – Fully distributed
  – Peer-to-peer
EcoGrid Registry
  – Node discovery
  – Service discovery
Aggregation services
  – Centralized access
  – Reliability
  – Data preservation

Building the EcoGrid
[Figure: diagram of EcoGrid nodes. Legend: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system. Sites labeled: AND, LUQ, HBR, NTL, VCR. Networks: LTER Network (24), Natural History Collections (>> 100), Organization of Biological Field Stations (180), UC Natural Reserve System (36), Partnership for Interdisciplinary Studies of Coastal Oceans (4), Multi-agency Rocky Intertidal Network (60).]

Kepler: Scientific Workflows
EML provides semi-automated data binding
Scientific workflows represent knowledge about the process; Kepler captures this knowledge
Query EcoGrid to find data
Archive output to EcoGrid

GARP Invasive Species Model
[Figure: GARP workflow diagram. Data sources: DiGIR, (a) species presence & absence points (native range; invasion area); SRB, (b) environmental layers (native range; invasion area). Workflow steps: EcoGrid Query, Layer Integration, Sample Data (+A1, +A2, +A3), Calculation, Map, Validation, User. Products: (c) integrated layers (native range; invasion area), (d) training sample and test sample, (e) GARP rule set, (f) native range prediction map and invasion area prediction map, (g) model quality parameter.]
Slide from D. Pennington
Scientific workflows represent knowledge about the process; AMS captures this knowledge

Kepler Team, Projects, Sponsors
Ilkay Altintas (SDM)
Chad Berkley (SEEK)
Shawn Bowers (SEEK)
Jeffrey Grethe (BIRN)
Christopher H. Brooks (Ptolemy II)
Zhengang Cheng (SDM)
Efrat Jaeger (GEON)
Matt Jones (SEEK)
Edward A. Lee (Ptolemy II)
Kai Lin (GEON)
Bertram Ludäscher (BIRN, GEON, SDM, SEEK)
Steve Mock (NMI)
Steve Neuendorffer (Ptolemy II)
Jing Tao (SEEK)
Mladen Vouk (SDM)
Yang Zhao (Ptolemy II)
… Ptolemy II

Kepler Understands EML Data (Chad Berkley, SEEK)

Kepler: Ecological Modeling (Chad Berkley, SEEK)

Database Access (Efrat Jaeger, GEON)
Note: EML descriptions of relational sources would allow automated data ingestion

Mineral Classification with Kepler … (Efrat Jaeger, GEON)

… inside the Classifier

Standard BrowserUI: Client-Side SVG

SWF Reengineering (Ilkay, SDM; Ashraf, Efrat, Kai, GEON)

DataMapper Sub-Workflow

Result launched via BrowserUI actor (coupling with ESRI’s ArcIMS)

Distributed Workflows in KEPLER
Web and Grid Service plug-ins
  – WSDL (now) and Grid services (stay tuned …)
  – ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
  – SSH, SCP, SDSC SRB, OGS?-???… coming
WS Harvester
  – Import query-defined WS operations as Kepler actors
XSLT and XQuery Data Transformers
  – to link not “designed-to-fit” web services
WS-deployment interface (planned)

Web Service Actor (Ilkay Altintas, SDM)
Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.
Configure: select service operation
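
A rough sketch of the "customizes itself" step: given a WSDL document and an operation name, derive the input and output ports an actor would expose. This is not Kepler's implementation. The sample WSDL, the `getDensity` operation, and the function name `ports_for_operation` are all made up for illustration, and the sketch only reads a WSDL 1.1 `portType`; it does not call the service.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, minimal WSDL 1.1 fragment used only for this example.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Demo">
  <message name="GetDensityRequest">
    <part name="count" type="xsd:int"/>
    <part name="area" type="xsd:double"/>
  </message>
  <message name="GetDensityResponse">
    <part name="density" type="xsd:double"/>
  </message>
  <portType name="DemoPort">
    <operation name="getDensity">
      <input message="tns:GetDensityRequest"/>
      <output message="tns:GetDensityResponse"/>
    </operation>
  </portType>
</definitions>"""


def ports_for_operation(wsdl_text, operation_name):
    """Return the port names an actor would need for one WSDL operation."""
    root = ET.fromstring(wsdl_text)
    # Map each message name to the names of its parts.
    messages = {
        m.get("name"): [p.get("name") for p in m.findall("wsdl:part", WSDL_NS)]
        for m in root.findall("wsdl:message", WSDL_NS)
    }
    for op in root.iterfind(".//wsdl:portType/wsdl:operation", WSDL_NS):
        if op.get("name") == operation_name:

            def parts(tag):
                # The message attribute is a QName such as "tns:Name"; keep the local part.
                ref = op.find(f"wsdl:{tag}", WSDL_NS).get("message")
                return messages[ref.split(":")[-1]]

            return {"inputs": parts("input"), "outputs": parts("output")}
    raise KeyError(operation_name)
```

Selecting a different operation name yields a different set of ports, which mirrors the configure step on the slide.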

Set Parameters and Commit

Specialized WS Actor (after instantiation)

Web Service Harvester (Ilkay Altintas, SDM)
Imports the web services in a repository into the actor library. Has the capability to search for web services based on a keyword.

Kepler: Grid Services Access (Steve Mock, NMI)

An (oversimplified) Model of the Grid
Hosts : {h1, h2, h3, …}
Hosts : i }, j }, …
Hosts : i }, j }, …
Given : data/workflow:
  – … as a functional plan: […; Y := f(X); Z := g(Y); …]
  – … as a logic plan: […; f(X,Y) ∧ g(Y,Z); …]
Find Host Assignment : d_i → h_i, f_j → h_j for all d_i, f_j … s.t. […; := …] is a valid plan
[Figure: dataflow with nodes X, f, Y, g, Z]

Shipping & Handling Algebra (SHA)
plan = of =
1. [ to A, := to C ]
2. [ => B, := to C ]
3. [ to C, => C, := ]
Logical view
Physical view: SHA Plans (1) (3) (2)

Grid-Enabling PTII: Handles
[Figure: actors A and B in Kepler space; grid nodes GA and GB in Grid space]
1. A → GA: get_handle
2. GA → A: return &X
3. A → B: send &X
4. B → GB: request &X
5. GB → GA: request &X
6. GA → GB: send *X
7. GB → B: send done(&X)
Example : &X = “GA.17” *X =
Candidate Formalisms : GridFTP, SSH, SCP, SDSC SRB, OGS?-???, …, WSRF?
Logical token transfer (3) requires get_handle (1, 2); then exec_handle (4, 5, 6, 7) for completion.
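
The seven-step handle exchange can be mimicked with a small in-process simulation. This is a sketch under assumptions, not the Kepler or Ptolemy II mechanism: `GridNode`, its methods, and the handle numbering are invented for illustration, and "sending" data is just a dictionary copy.

```python
from dataclasses import dataclass, field


@dataclass
class GridNode:
    """Hypothetical grid-side store (GA or GB on the slide)."""

    name: str
    store: dict = field(default_factory=dict)  # handle -> payload held on this node
    counter: int = 0

    def get_handle(self, payload):
        """Steps 1-2: register the payload and return a handle (&X), not the data."""
        self.counter += 1
        handle = f"{self.name}.{self.counter}"
        self.store[handle] = payload
        return handle

    def request(self, handle, nodes):
        """Steps 4-7: pull the payload (*X) from the owning node, then acknowledge."""
        owner = nodes[handle.split(".")[0]]  # step 5: find the node that issued the handle
        if owner is not self:
            self.store[handle] = owner.store[handle]  # step 6: owner sends *X
        return f"done({handle})"  # step 7: acknowledgement back to the actor


def logical_transfer(payload, source, target, nodes):
    """Step 3 as seen from the workflow: only the handle moves between actors."""
    handle = source.get_handle(payload)   # get_handle (steps 1, 2)
    ack = target.request(handle, nodes)   # exec_handle (steps 4, 5, 6, 7)
    return handle, ack
```

The actors A and B never touch the payload itself; the bulk data moves only between the two grid nodes, which is the reason for passing handles.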

Homogeneous Data Integration
Integration of homogeneous or mostly homogeneous data via EML metadata is relatively straightforward

Heterogeneous Data Integration
Requires advanced metadata and processing
  – Attributes must be semantically typed
  – Collection protocols must be known
  – Units and measurement scale must be known
  – Measurement relationships must be known, e.g., that ArealDensity = Count/Area
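
A small worked example of why units and measurement relationships matter, assuming two hypothetical datasets: one records a raw count and a plot area in square metres, the other already records density per hectare. The function names and sample values are invented for illustration; only the relation ArealDensity = Count/Area comes from the slide.

```python
M2_PER_HECTARE = 10_000.0  # 1 hectare = 10,000 square metres


def areal_density(count, area_m2):
    """ArealDensity = Count / Area, in individuals per square metre."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return count / area_m2


def per_hectare_to_per_m2(density_per_ha):
    """Convert individuals per hectare to individuals per square metre."""
    return density_per_ha / M2_PER_HECTARE


def integrate(plot_records, density_records):
    """Merge both datasets into (site, individuals per square metre) pairs.

    plot_records:    (site, count, area in square metres)
    density_records: (site, density in individuals per hectare)
    """
    merged = [(site, areal_density(count, area)) for site, count, area in plot_records]
    merged += [(site, per_hectare_to_per_m2(d)) for site, d in density_records]
    return merged
```

Without metadata stating that the second dataset is per hectare, or that the first can be turned into a density by dividing count by area, the two columns could not be combined safely.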

Semantic Mediation
Label data with semantic types
Label inputs and outputs of analytical components with semantic types
Use reasoning engines to generate transformation steps
  – Beware analytical constraints
Use reasoning engine to discover relevant components
[Figure: Data, Ontology, Workflow Components]

Ecological ontologies
What was measured (e.g., biomass)
Type of measurement (e.g., Energy)
Context of measurement (e.g., Psychotria limonensis)
How it was measured (e.g., dry weight)
SEEK intends to enable community-created ecological ontologies using OWL
  – Represents a controlled vocabulary for ecological metadata

Extensions: Semantic Types
Take concepts and relationships from an ontology to “semantically type” the data-in/out ports
Application: e.g., design support:
  – smart/semi-automatic wiring, generation of “massaging actors”
[Figure: actor m1 (normalize) with ports p3 and p4. Takes Abundance Count Measurements for Life Stages; returns Mortality Rate Derived Measurements for Life Stages]

Semantic Types
The semantic type signature
  – Type expressions over the (OWL) ontology
[Figure: actor m1 (normalize) with ports p3 and p4]
SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
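
One way to picture how such signatures could support semi-automatic wiring is a simple compatibility check between an output port type and an input port type. This is a deliberately naive sketch, not the SEEK type system: each conjunct is treated as an opaque string, and the subclass relation `DerivedObservation` ⊑ `Observation` is an assumption made for the example rather than something stated in the slides. A real system would use an OWL reasoner.

```python
# Hypothetical fragment of a class hierarchy: child -> parent (assumed, not from the source).
SUBCLASS_OF = {"DerivedObservation": "Observation"}

# Conjuncts of the m1 signature, copied from the slide and treated as opaque strings.
M1_INPUT = frozenset({
    "Observation",
    "itemMeasured.AbundanceCount",
    "hasContext.appliesTo.LifeStageProperty",
})
M1_OUTPUT = frozenset({
    "DerivedObservation",
    "itemMeasured.MortalityRate",
    "hasContext.appliesTo.LifeStageProperty",
})


def with_ancestors(concept):
    """Return the concept together with all of its superclasses."""
    found = {concept}
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        found.add(concept)
    return found


def can_wire(output_type, input_type):
    """True if every conjunct the input requires is provided by the output type."""
    provided = set()
    for concept in output_type:
        provided |= with_ancestors(concept)
    return set(input_type) <= provided
```

Under these assumptions, m1's output can feed a port that only asks for an `Observation` with a life-stage context, but it cannot be wired back into m1's own input, which requires an abundance count.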

Extended Type System (here: OWL Semantic Types)
SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
Substructure association: XML raw-data =(X)Query=> object model =link=> OWL ontology

Semantic Types for Scientific Workflows

Deriving Data Transformations from Semantic Service Registration [Bowers-Ludaescher, DILS’04]

Structural and Semantic Mappings [Bowers-Ludaescher, DILS’04]

SEEK Impact
Fundamental improvements for researchers
  – Global access to ecologically relevant data
  – Rapidly locate and utilize distributed computation
  – Capture, reproduce, extend analysis process

Acknowledgements
This material is based upon work supported by: The National Science Foundation under Grant Numbers , , , , , and
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON