Semantic Mediation in SEEK/Kepler: Exploiting Semantic Annotation for Discovery, Analysis, and Integration of Scientific Data and Workflows Bertram Ludäscher.

Presentation transcript:

Semantic Mediation in SEEK/Kepler: Exploiting Semantic Annotation for Discovery, Analysis, and Integration of Scientific Data and Workflows
Bertram Ludäscher, Dept. of Computer Science, UC Davis; UC Davis Genome Center (ucdavis.edu)
Shawn Bowers, UC Davis Genome Center (ucdavis.edu)
seek.ecoinformatics.org | kepler-project.org | www.sdsc.edu | dbis.ucdavis.edu | genomics.ucdavis.edu

Semantic Mediation System, SEEK/Kepler

Science Environment for Ecological Knowledge
SEEK is an NSF-funded, multidisciplinary research project to facilitate …
Access to distributed ecological, environmental, and biodiversity data
– Enable data sharing & reuse
– Enhance data discovery at global scales
Scalable analysis and synthesis
– Taxonomic, spatial, temporal, conceptual integration of data, addressing data heterogeneity issues
– Enable communication and collaboration for analysis
– Enable reuse of analytical components
– Support scientific workflow design and modeling

SEEK data access, analysis, mediation
Data Access (EcoGrid)
– Distributed data network for environmental, ecological, and systematics data
– Interoperate diverse environmental data systems
Workflow Tools (Kepler)
– Problem-solving environment for scientific data analysis and visualization → “scientific workflows”
Semantic Mediation (SMS)
– Leverage ontologies for “smart” data/component discovery and integration

Managing Data Heterogeneity
Data comes from heterogeneous sources
– Real-world observations
– Spatial-temporal contexts
– Collection/measurement protocols and procedures
– Many representations for the same information (count, area, density)
– Data, syntax, schema, and semantic heterogeneity
Discovery and “synthesis” (integration) performed manually
– Discovery often based on an intuitive notion of “what is out there”
– Synthesis of data is very time consuming, and limits use

Scientific workflow systems support data analysis
[Figure: KEPLER]

A simple Kepler workflow (T. McPhillips)
[Figure label: Composite Component (Sub-workflow)]
Loops are often used in SWFs, e.g., in genomics and bioinformatics (collections of data, nested data, statistical regressions, ...)

A simple Kepler workflow (T. McPhillips)
[Figure annotations, in workflow order:]
– Lists Nexus files to process (project)
– Reads text files
– Parses Nexus format
– PhylipPars infers trees from discrete, multi-state characters.
– Workflow runs PhylipPars iteratively to discover all of the most parsimonious trees.
– UniqueTrees discards redundant trees in each collection.
– Draws phylogenetic trees

A simple Kepler workflow
An example workflow run, executed as a Dataflow Process Network
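
To make the process-network idea concrete, here is a minimal sketch in plain Python rather than Kepler itself. It assumes each actor runs in its own thread and exchanges tokens over FIFO queues; the names `source`, `transform`, `sink`, `run_pipeline`, and the `DONE` sentinel are illustrative and are not part of Kepler or Ptolemy II.

```python
import queue
import threading

DONE = object()  # sentinel token marking the end of a stream (illustrative convention)


def source(items, out_q):
    """Actor that emits each item as a token, then signals completion."""
    for item in items:
        out_q.put(item)
    out_q.put(DONE)


def transform(fn, in_q, out_q):
    """Actor that blocks on its input queue, applies fn, and forwards the result."""
    while True:
        token = in_q.get()
        if token is DONE:
            out_q.put(DONE)
            return
        out_q.put(fn(token))


def sink(in_q, results):
    """Actor that collects tokens until the stream ends."""
    while True:
        token = in_q.get()
        if token is DONE:
            return
        results.append(token)


def run_pipeline(items, fn):
    """Wire source -> transform -> sink with FIFO queues and run them concurrently."""
    q1, q2 = queue.Queue(), queue.Queue()
    results = []
    actors = [
        threading.Thread(target=source, args=(items, q1)),
        threading.Thread(target=transform, args=(fn, q1, q2)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for t in actors:
        t.start()
    for t in actors:
        t.join()
    return results


output = run_pipeline(["a.nex", "b.nex"], str.upper)
```

Because each queue has a single producer and a single consumer, token order is preserved from source to sink even though the actors run concurrently.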

SMS motivation
Scientific Workflow Life-cycle
– Resource Discovery
  discover relevant datasets
  discover relevant actors or workflow templates
– Workflow Design and Configuration
  data → actor (data binding)
  data → data (data integration / merging / interlinking)
  actor → actor (actor / workflow composition)
Challenge: do all this in the presence of …
– 100’s of workflows and templates
– 1000’s of actors (e.g., actors for web services, data analytics, …)
– 10,000’s of datasets
– 1,000,000’s of data items
– … highly complex, heterogeneous data
– price to pay for these resources: $$$ (lots)
– scientist’s time wasted: priceless!

Approach & SMS capabilities
[Diagram components:]
– Ontologies
– Semantic Annotation
– Iterative Development
– Resource Discovery
– Workflow Validation
– Resource Integration
– Workflow Elaboration

Approach & SMS capabilities: Ontologies
SEEK KR group is developing OWL-DL ontologies:
– Various workflow-component ontologies (for categorizing by function, project, scientific discipline, …)
– Scientific observation ontology (OBOE), an upper ontology for defining and relating observations, measurements, and units
– Domain-specific ontologies that extend OBOE (standard and derived units, ecology and biodiversity concepts, …)

Approach & SMS capabilities: Semantic Annotation
Annotations “connect” resources to ontologies
– Conceptually describe a resource and/or its “data schema”
– Annotations provide the means for ontology-based discovery, integration, …

“Hybrid” types … Semantic + Structural Typing
Structural Types: Given a structural type language $\mathcal{S}$
– Datasets, inputs, and outputs can be assigned structural types $S \in \mathcal{S}$
Semantic Types: Given an ontology language $\mathcal{O}$ (e.g., OWL-DL)
– Datasets, inputs, and outputs can be assigned ontology types $O \in \mathcal{O}$
Example:
– $O$ : $\mathit{Observation} \sqcap \exists\,\mathit{obsProperty}.\mathit{SpeciesOccurrence}$
– $S$ : $\mathit{SpeciesData}(\mathit{site}, \mathit{day}, \mathit{spp}, \mathit{occ})$
[Figure: actor A1 with output port types ($S_{out}$, $O_{out}$) connected to actor A2 with input port types ($S_{in}$, $O_{in}$); the connection is semantically compatible but structurally incompatible.]
Semantic & structural types can be combined using logic constraints:
$$\alpha := (\forall\, \mathit{site}, \mathit{day}, \mathit{sp}, \mathit{occ})\; \mathit{SpeciesData}(\mathit{site}, \mathit{day}, \mathit{sp}, \mathit{occ}) \rightarrow (\exists\, y)\; \mathit{Observation}(y) \wedge \mathit{obsProp}(y, \mathit{occ}) \wedge \mathit{SpeciesOccurrence}(\mathit{occ})$$
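
The following is a minimal sketch of how a connection between two ports might be checked against both kinds of type. It is heavily simplified: semantic types are plain class names in a toy subclass hierarchy instead of OWL-DL expressions, and structural compatibility is reduced to equality of attribute tuples. The class name `SpeciesOccurrenceObservation`, the `HybridType` record, and `check_connection` are hypothetical and do not come from the SMS implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Toy ontology: class -> direct superclass (hypothetical hierarchy for illustration).
SUBCLASS_OF = {
    "SpeciesOccurrenceObservation": "Observation",
    "Observation": None,
}


def is_subclass(sub: Optional[str], sup: str) -> bool:
    """True if `sub` equals `sup` or reaches it by following superclass links."""
    while sub is not None:
        if sub == sup:
            return True
        sub = SUBCLASS_OF.get(sub)
    return False


@dataclass(frozen=True)
class HybridType:
    structural: tuple  # attribute names of the data schema
    semantic: str      # ontology class name


def check_connection(out: HybridType, inp: HybridType) -> dict:
    """Check an output port against an input port on both type dimensions."""
    return {
        # Semantic check: the produced class must be subsumed by the expected class.
        "semantic_ok": is_subclass(out.semantic, inp.semantic),
        # Structural check (simplified): schemas must match exactly.
        "structural_ok": out.structural == inp.structural,
    }


a1_out = HybridType(("site", "day", "spp", "occ"), "SpeciesOccurrenceObservation")
a2_in = HybridType(("location", "date", "species", "count"), "Observation")
result = check_connection(a1_out, a2_in)
```

Here `result` reports a connection that is semantically compatible but structurally incompatible, which is the situation where a structural adapter would be needed between the two actors.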

Semantic Type Annotation in Kepler
Component input and output port annotation
– Each port can be annotated with multiple classes from multiple ontologies
– Annotations are stored within the component metadata

Component Annotation and Indexing
Component Annotations
– New components can be annotated and indexed into the component library (e.g., specializing generic actors)
– Existing components can also be revised, annotated, and indexed (hiding previous versions)

Approach & SMS capabilities: Resource Discovery
Ontology-based “smart” search
– Find components by semantic types
– Find components by input/output semantic types
– Ontology-based query rewriting for discovery/integration
Joint work with GEON project (see SSDBM-04, SWDB-04)

Smart Search
Find a component (here: an actor) in different locations (“categories”) … based on the semantic annotation of the component (or its ports)
[Figure panels: Browse for Components; Search for Component Name; Search for Category / Keyword]

Searching in context
Search for components with compatible input/output semantic types
– … searches over actor library
– … applies subsumption checking on port annotations
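
A context-sensitive search of this kind can be sketched as a filter over an annotated actor library. The sketch below assumes a toy ontology given as a superclass map and a library of dictionaries; the actor names, the class names other than those implied by the slides, and the function `find_compatible_actors` are hypothetical and do not reflect Kepler's actual search API.

```python
# Hypothetical toy ontology: class -> list of direct superclasses.
SUPERCLASSES = {
    "Biomass": ["Measurement"],
    "SpeciesAbundance": ["Measurement"],
    "Measurement": [],
}


def subsumes(general, specific):
    """True if `general` is `specific` or one of its transitive superclasses."""
    stack, seen = [specific], set()
    while stack:
        cls = stack.pop()
        if cls == general:
            return True
        if cls not in seen:
            seen.add(cls)
            stack.extend(SUPERCLASSES.get(cls, []))
    return False


# Hypothetical actor library: each actor lists the semantic annotations of its input ports.
ACTOR_LIBRARY = [
    {"name": "BiomassSummary", "inputs": ["Biomass"]},
    {"name": "GenericPlotter", "inputs": ["Measurement"]},
    {"name": "AbundanceModel", "inputs": ["SpeciesAbundance"]},
]


def find_compatible_actors(output_class, library=ACTOR_LIBRARY):
    """Return actors with an input port whose annotation subsumes `output_class`."""
    return [
        actor["name"]
        for actor in library
        if any(subsumes(port, output_class) for port in actor["inputs"])
    ]


matches = find_compatible_actors("Biomass")
```

With this toy data, an output annotated as `Biomass` matches both the actor that expects `Biomass` and the more general actor that accepts any `Measurement`, but not the actor that expects `SpeciesAbundance`.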

Approach & SMS capabilities: Workflow Validation
Workflow validation and analysis
– Check that workflows are semantically & structurally well-typed
– Infer semantic type annotations of derived data (i.e., type inference)
  An initial approach and prototype based on mapping composition (see QLQP-05)
– User-oriented provenance
  Collect & query data-lineage of WF runs (see IPAW-06)

Workflow validation in Kepler
Statically perform semantic and structural type checking
Navigate errors and warnings within the workflow
– Search for and insert “adapters” to fix (structural and semantic) errors …

Approach & SMS capabilities: Resource Integration
Integrating and transforming data
– Merge (“smart union”) datasets
– Find mappings between data schemas for transformation
  data binding, component connections (see DILS-04)

Smart (Data) Integration: Merge
Discover data of interest
… connect to merge actor
… “compute merge”
– align attributes via annotations
– open dialog for user refinement
– store merge mapping in MOML
… enjoy!
– … your merged dataset
– almost, can be much more complicated

Under the hood of “Smart Merge” …
Exploits semantic type annotations and ontology definitions to find mappings between sources
Executing the merge actor results in an integrated data product (via “outer union”)
[Figure: two source tables with columns a1–a4 and a5–a8 are merged into a result table with columns a1, a3, a4; columns a1/a8 and a3/a6 are aligned through annotations to ontology concepts (Biomass, Site).]
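
The outer-union idea can be sketched as follows, assuming that each annotated column is renamed to its ontology concept and unannotated columns are kept under a table-qualified name. The data values, the assignment of a1/a8 to Site and a3/a6 to Biomass, and the function `smart_merge` are assumptions made for illustration; they are not taken from the merge actor's implementation.

```python
def smart_merge(tables, annotations):
    """Outer-union tables whose columns are aligned through shared annotations.

    tables:      {table_name: [row_dict, ...]}
    annotations: {(table_name, column): concept}
    """
    merged = []
    for name, rows in tables.items():
        for row in rows:
            # Annotated columns collapse onto their concept; others stay table-qualified.
            merged.append(
                {annotations.get((name, col), f"{name}.{col}"): val for col, val in row.items()}
            )
    # Collect output columns in first-seen order.
    columns = []
    for row in merged:
        for col in row:
            if col not in columns:
                columns.append(col)
    # Pad every row with None for columns it does not have (the "outer" part of the union).
    return columns, [{col: row.get(col) for col in columns} for row in merged]


# Illustrative data only; values and concept assignments are assumed.
tables = {
    "t1": [{"a1": "a", "a3": 5, "a4": 10}, {"a1": "b", "a3": 6, "a4": 11}],
    "t2": [{"a6": 0.1, "a8": "a"}, {"a6": 0.2, "a8": "c"}, {"a6": 0.3, "a8": "d"}],
}
annotations = {
    ("t1", "a1"): "Site", ("t2", "a8"): "Site",
    ("t1", "a3"): "Biomass", ("t2", "a6"): "Biomass",
}
columns, rows = smart_merge(tables, annotations)
```

Rows from both sources end up under the shared `Site` and `Biomass` columns, while the unaligned column `a4` is retained and left empty for rows that come from the second table.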

Approach & SMS capabilities: Workflow Elaboration (Automated SWF Refinement)
Workflow design support
– (Semi-)automatically combine resource discovery, integration, and validation
– Abstract → Executable WF
– … ongoing work!

Summary
Outlook:
– Ontologies and semantic annotations for WF design & reuse
– Put ontologies to actual use in Kepler
– Continue to develop Kepler tools for annotation (KR observation ontology), discovery, integration, design, …
Issues & Challenges:
– Tools/approaches for ontology (OWL) management, organization, reasoning
– Open source (distributed) ontology (OWL) storage and reasoning
– Tools and techniques for robust ontology versioning and extension
Acknowledgements
– Timothy McPhillips, Dave Thau (UC Davis)
– Mark Schildhauer, Josh Madin, Matt Jones (UCSB)
– Deana Pennington (UNM)
– Rich Williams (Microsoft Research)
– Ferdinando Villa, Sergey Krivov (UVM)