Using observational data models to enhance data interoperability for integrative biodiversity and ecological research Mark Schildhauer*, Luis Bermudez,

Presentation transcript:

Using observational data models to enhance data interoperability for integrative biodiversity and ecological research
Mark Schildhauer*, Luis Bermudez, Shawn Bowers, Phillip C. Dibner, Corinna Gries, Matthew B. Jones, Deborah L. McGuinness, Steve Kelling, Huiping Cao, Ben Leinfelder, Margaret O’Brien, Carl Lagoze, Hilmar Lapp, and Joshua Madin
Rauischholzhausen, Germany: meeting on “Data repositories in environmental sciences: concepts, definitions, technical solutions and user requirements”, Feb
* presenter; see end of presentation for affiliations

Integrative Environmental Research
Analyses require a wide range of data
– Broad scales: geospatial, temporal, and biological
– Diverse topics: abiotic and biotic phenomena
– Predicting the impact of invasive insect species on crop production
– Documenting effects of climate change on forest composition
Large amounts of relevant data…
– E.g., over 25,000 data sets are available in the Knowledge Network for Biocomplexity (KNB) repository
But researchers struggle to…
– Discover relevant datasets for a study
– Combine these into an integrated product to analyze
Marburg 2011

How to discover and interpret data needed for integrative, synthetic environmental science?
Metadata and keywords are a good start, but not enough: ambiguous, idiosyncratic, hard to parse
Controlled vocabularies: an improvement, but we can do more with today’s technology
Ontologies: based on Web standards (W3C) such as RDF, SKOS, and OWL
– Provide inferencing capabilities
– Establish relationships among terms (subclass relationships, object properties, domain/range constraints)

Observational data
Environmental and earth science data often consist of “observations”
Data sets are often stored in tables (e.g., flat files, spreadsheets)
– Represent collections of associated measurements
– Highly heterogeneous (format, content, semantics)
– (Cell) values represent measurements

Examples of “raw” observational data

Observational Data Models
Emerging conceptual models for observations
– Many earth science communities
– Motivated by need for intra- and inter-disciplinary data discovery and integration
Provide high-level representations of observations
– Based on a standard set of “core concepts”
– Entities, their measured properties, units, protocols, etc.
– Specific terms and how these are modeled vary

Several prospective observation models…
Project | Domain | Observational data model
VSTO | Atmospheric sciences | Ontologies for interoperability among different meteorological metadata standards and other atmospheric measurements
SERONTO | Socioecological research | Ontology for integrating socio-ecological data
OGC’s O&M | Geospatial | Observations and Measurements standard for enhancing sensor data interoperability
SEEK’s OBOE | Ecology | Extensible Observation Ontology for describing data as observations and measurements
PATO’s EQ | Phenotype/Evolution | Underlying model for describing phenotypic traits to link with genomic data

Observational Data Models
High degree of similarity across models
Potentially enable better data interoperability and uniform access
– Domain-neutral “foundational” template
– Abstracts away underlying format issues
– Domain ontologies help formalize semantics of terms used to describe measurements

Observational Data Model
Implemented as an OWL-DL ontology
– Provides basic concepts for describing observations
– Specific “extension points” for domain-specific terms
[Diagram: classes Entity, Characteristic, Observation, Measurement, Protocol, Standard, and Value; attributes precision : decimal and method : anyType; relations labeled Context and ObservedEntity]

Observational Data Model
Observations are of entities (e.g., Tree, Plot, …)
– An observation can have multiple measurements
– Each measurement is taken of the observed entity

Observational Data Model
A measurement consists of
– The characteristic measured (e.g., Height)
– The standard used (e.g., unit, coding scheme)
– The measurement protocol
– The measurement value

Observational Data Model
Observations can have context
– E.g., the geographic, temporal, or biotic/abiotic environment in which some measurement was taken
– Context is an observation too
– Context is transitive
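The core model described on the preceding slides can be sketched in a few lines of Python. This is only an illustrative sketch, not the OBOE ontology itself: the class and field names, the `all_context` helper, and the sample values ("GCE", "A", 21.6) are assumptions chosen to mirror the slide vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    characteristic: str             # what was measured, e.g. "Height"
    standard: str                   # unit or coding scheme, e.g. "Meter"
    value: object                   # the measured value
    protocol: Optional[str] = None  # how it was measured (optional here)


@dataclass
class Observation:
    entity: str                                                  # e.g. "Tree", "Plot"
    measurements: List[Measurement] = field(default_factory=list)
    context: List["Observation"] = field(default_factory=list)   # context is an observation too


def all_context(obs: Observation) -> List[Observation]:
    """Collect direct and indirect context observations (context is transitive)."""
    seen: List[Observation] = []
    stack = list(obs.context)
    while stack:
        current = stack.pop()
        if any(current is s for s in seen):
            continue  # already visited; guards against shared or cyclic context
        seen.append(current)
        stack.extend(current.context)
    return seen


# Hypothetical sample data: a tree observed within a plot, within a site.
site = Observation("Site", [Measurement("Name", "SiteCode", "GCE")])
plot = Observation("Plot", [Measurement("Name", "PlotCode", "A")], context=[site])
tree = Observation("Tree", [Measurement("Height", "Meter", 21.6)], context=[plot])

context_entities = [o.entity for o in all_context(tree)]
print(context_entities)
```

Because the tree's context is the plot and the plot's context is the site, following the chain yields both the plot and the site as context for the tree measurement.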

Similarities among Observational Data Models
OGC’s Observations and Measurements (O&M)
[Diagram: classes FeatureOfInterest, ObservationContext, ObservedProperty, OM_Observation, Result, and OM_Process; relations carrierOfCharacteristic, forProperty, relatedContextObservation, hasResult, usesProcedure, and ofFeature]

Similarities among Observational Data Models
SEEK/Semtools Extensible Observation Ontology (OBOE)
[Diagram: classes Entity, Context (other Observation), Characteristic, Observation, Measurement, Standard, Protocol, and Precision; relations hasCharacteristic, hasMeasurement, ofEntity, hasContext, usesStandard, usesProtocol, hasPrecision, ofCharacteristic, and hasValue]

Similarities among Observational Data Models
SERONTO basic classes [figure]

Developing a core model (SONet project)
– Identify the key observational models in the earth and environmental sciences
– Are these various observational models easily reconciled and/or harmonized?
– Are there special capabilities and features enabled by some observational approaches?
– What services should be developed around these observational models?

Similarities among Observational Data Models
OBOE | O&M
Entity | FeatureOfInterest
Characteristic | ObservedProperty
Measurement | OM_Observation
Protocol | OM_Process
Standard, Value, Precision | Result
Context | ObservationContext

How to use observational data models…

Linking data values to concepts through observations
– Observational data models provide a high-level, domain-neutral abstraction of scientific observations and measurements
– Can link data (or metadata) through the observational data model to terms from domain-specific ontologies
– Context can inter-relate values in a tuple
– Can clarify the semantics of a data set as a whole, not just “independent” values

ObsDB – Observational Data Model
Terms drawn from domain-specific ontologies
– E.g., for Entities, Characteristics, Standards, Protocols
Figure from O’Brien

SONet/Semtools Semantic Approach
Data -> metadata -> annotations -> ontologies
– Annotations link EML metadata elements to concepts in an ontology through the Observation Ontology
– EML metadata describe data and its structures

Semantic annotation
Attribute mappings [figure]

Morpho
- Documents ecological data through formal metadata
- Based on Ecological Metadata Language (EML), an XML schema
- Local and network storage and querying
- Supports attribute-level descriptions of tabular data
- Originally developed under the NSF-funded KNB project
- Free, multi-platform, Java-based EML-editing and KNB querying tool
- Prospective querying client for the DataONE repository

Semtools
Extends Morpho codebase
- Builds on existing rich metadata corpus (KNB)
- Semantic annotation of data through metadata
- Maps attribute-level metadata descriptions to the observation model
- Uses core model defined by SONet
- Accesses domain ontologies through OBOE
- Semantic querying

Load Domain Ontology
Can load a custom OBOE-compatible ontology
Ontology development work underway:
- Santa Barbara Coastal LTER ontology
- Plant Trait Ontology (TraitNet, CEFE/CNRS, TRY, etc.)
- Others

Load and Use Multiple Ontologies

Semantic Annotation
Apply semantic annotation to data attribute “veg_plant_height”
- Characteristic (Height)
- Entity (Plant)
- Standard (Meters)
Terms from Observation Ontology (OBOE.OWL)
Terms from Domain Ontology (Plant-trait.OWL)
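One way to picture such an attribute-level annotation is as a lookup from a column name to its Entity, Characteristic, and Standard terms. The sketch below is a hypothetical illustration only: the `Annotation` class, the `describe` helper, and the sample row values are assumptions, not the Semtools or Morpho API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    attribute: str       # column name in the data table
    entity: str          # what was observed (domain ontology term)
    characteristic: str  # what was measured (domain ontology term)
    standard: str        # unit or coding scheme


# The annotation from the slide: veg_plant_height -> Plant / Height / Meters.
ANNOTATIONS = {
    "veg_plant_height": Annotation("veg_plant_height", "Plant", "Height", "Meters"),
}


def describe(row: dict) -> list:
    """Turn one table row into (entity, characteristic, value, standard) tuples."""
    described = []
    for column, value in row.items():
        ann = ANNOTATIONS.get(column)
        if ann is None:
            continue  # unannotated columns are skipped in this sketch
        described.append((ann.entity, ann.characteristic, value, ann.standard))
    return described


# Hypothetical row; the values are made up for illustration.
example = describe({"veg_plant_height": 1.8, "notes": "windy"})
print(example)
```

Keeping the annotation separate from the data table means the raw file is untouched while its columns gain machine-readable meaning.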

Open Data Annotation Frame

Semantic annotation
Formal syntax for annotation
Can provide “key-like” capabilities
[Dataset table: columns site, plot, spp, ht, dbh, pH; rows GCE 1 A piru …, GCE 1 B piru …, …, GCE 9 A abba …]
Observation “schema” for Dataset (attribute mappings):
Observation “o2” Entity “exp:ExperimentalReplicate”
  Measurement “m2” Entity “oboe:Name” ...
Observation “o3” Entity “oboe:Tree”
  Measurement “m3” Characteristic “oboe:TaxonType” ...
  Measurement “m4” Characteristic “units:Height” Standard “units:Meter” ...
  Context “o2” ...

Semantic Annotation in Morpho

Semantic Search
– Enable structured search against annotations to increase precision
– Enable ontological term expansion to increase recall
– Precisely define a measured characteristic, the standard used to measure it, and its relation to other observations, via an observational data model

Query Precision
Keyword-based search
- “kelp”
- 3 data sets found
Observational semantics-based search
- Entity=“kelp”
- 1 data set found

Query Expansion
Entity=Kelp AND Characteristic=DryMass
- 1 record
- Macrocystis is subclass of Kelp
Entity=Kelp AND Characteristic=Mass
- 2 records
- DryMass is subclass of Mass
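Term expansion of this kind can be sketched as a walk up a subclass hierarchy. The two subclass facts below come from the slide; the record list, the record ids, and the `is_a` and `search` helpers are hypothetical and stand in for whatever reasoner and index a real system would use.

```python
# Subclass relationships stated on the slide (child -> parent).
SUBCLASS_OF = {
    "Macrocystis": "Kelp",
    "DryMass": "Mass",
}

# Hypothetical annotated records, made up to illustrate the two queries.
RECORDS = [
    {"id": "r1", "entity": "Macrocystis", "characteristic": "DryMass"},
    {"id": "r2", "entity": "Kelp", "characteristic": "Mass"},
]


def is_a(term, target):
    """True if term equals target or is a (transitive) subclass of it."""
    while term is not None:
        if term == target:
            return True
        term = SUBCLASS_OF.get(term)
    return False


def search(entity, characteristic):
    """Return ids of records whose terms match the query or specialize it."""
    return [
        r["id"]
        for r in RECORDS
        if is_a(r["entity"], entity) and is_a(r["characteristic"], characteristic)
    ]


dry_mass_hits = search("Kelp", "DryMass")
mass_hits = search("Kelp", "Mass")
print(dry_mass_hits, mass_hits)
```

The narrower query matches only the Macrocystis dry-mass record, while the broader Mass query also picks up the record annotated directly with Mass, which is how expansion raises recall.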

Query by Observation
Measurements are from the same sample instance
– Entity=Kelp AND Characteristic=DryMass AND Characteristic=WetMass
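The point of querying by observation is that all requested characteristics must belong to one observation, not merely appear somewhere in the same data set. A minimal sketch, assuming each observation is stored with the set of characteristics measured on it (the data and the `same_observation` helper are hypothetical):

```python
# Hypothetical observations; each lists the characteristics measured on one sample.
OBSERVATIONS = [
    {"id": "o1", "entity": "Kelp", "characteristics": {"DryMass", "WetMass"}},
    {"id": "o2", "entity": "Kelp", "characteristics": {"DryMass"}},
    {"id": "o3", "entity": "Kelp", "characteristics": {"WetMass"}},
]


def same_observation(entity, required):
    """Ids of observations of `entity` that carry every required characteristic."""
    return [
        o["id"]
        for o in OBSERVATIONS
        if o["entity"] == entity and required <= o["characteristics"]
    ]


both_masses = same_observation("Kelp", {"DryMass", "WetMass"})
print(both_masses)
```

Observations o2 and o3 together cover both characteristics, but only o1 has both measured on the same sample, so only o1 is returned.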

Query by Observation

Future Directions
- Continue building corpus of semantically annotated data
- Refine “design patterns” for observation-compliant domain ontologies
- Align/integrate ontologies at common points (e.g., mass, units)
- Iterate design for annotation interface
- Stronger inferencing: measurement types, transitivity along properties (e.g., partonomy), data “value-based” querying
- Semi-automated aggregation, integration

ObsDB – Query Support
Querying observations: simple examples…
Tree
– Selects all observations of Tree entities
Tree[Height] in d1
– Selects d1 observations of trees with height measurements
Tree[Height, DBH Meter]
– Same as above, but with diameter in meters

ObsDB – Query Support
More examples…
Tree[Height > 20 Meter]
– Selects observations of trees with height > 20 m
– Supports standard SQL comparators…
Tree[Height between 12 and 25 Meter]
– Same as above, but 12 ≤ height ≤ 25
(Tree[Height Meter], Soil[Acidity pH])
– Selects all observations of trees (with height measures) and soils (with acidity measures)
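The comparator queries above can be approximated with a small filter over observations. This is a sketch of the selection semantics only, not the ObsDB implementation or its query parser: the in-memory data layout, the sample values, and the `select` and `select_between` functions are assumptions.

```python
import operator

# Comparators assumed to correspond to the "standard SQL comparators" on the slide.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}

# Hypothetical observations: characteristic -> (value, standard).
OBSERVATIONS = [
    {"entity": "Tree", "measurements": {"Height": (25.0, "Meter")}},
    {"entity": "Tree", "measurements": {"Height": (12.0, "Meter")}},
    {"entity": "Soil", "measurements": {"Acidity": (5.5, "pH")}},
]


def select(entity, characteristic, comparator, threshold, standard):
    """Rough analogue of Entity[Characteristic <comparator> threshold Standard]."""
    compare = COMPARATORS[comparator]
    hits = []
    for obs in OBSERVATIONS:
        if obs["entity"] != entity:
            continue
        measured = obs["measurements"].get(characteristic)
        if measured is None:
            continue
        value, unit = measured
        if unit == standard and compare(value, threshold):
            hits.append(obs)
    return hits


def select_between(entity, characteristic, low, high, standard):
    """Rough analogue of Entity[Characteristic between low and high Standard]."""
    at_least_low = select(entity, characteristic, ">=", low, standard)
    return [o for o in at_least_low if o["measurements"][characteristic][0] <= high]


tall_trees = select("Tree", "Height", ">", 20, "Meter")       # Tree[Height > 20 Meter]
mid_trees = select_between("Tree", "Height", 12, 25, "Meter")  # Tree[Height between 12 and 25 Meter]
print(len(tall_trees), len(mid_trees))
```

This sketch requires the stored standard to match the queried one exactly; a fuller system would presumably convert between compatible units rather than skip mismatches.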

ObsDB – Query Support
Context examples…
Tree[Height] -> Soil[Acidity]
– Selects tree and soil observations where soil contextualizes the tree measurement
Tree -> Plot -> Site
– Context chains (Tree, Plot, and Site observations returned)
(Tree, Soil) -> Plot -> Site
– Tree and Soil observations contextualized by the same Plot observation
(Tree, Soil) -> (Plot, Zone)
– Tree and soil contextualized by the (same) plot and zone

Acknowledgements
Mark Schildhauer*, Matthew B. Jones, Ben Leinfelder: NCEAS, Santa Barbara CA, USA
Luis Bermudez: Open Geospatial Consortium Inc., Wayland MA, USA
Shawn Bowers: Gonzaga University, Spokane WA, USA
Phillip C. Dibner: OGCii, Berkeley CA, USA
Corinna Gries: University of Wisconsin, Madison WI, USA
Deborah L. McGuinness: Rensselaer Polytechnic Institute, Troy NY, USA
Margaret O’Brien: UCSB, Santa Barbara CA, USA
Huiping Cao: New Mexico State University, Las Cruces NM, USA
Simon J.D. Cox: Earth Science & Resource Engrg, CSIRO, Bentley WA, AUS
Steve Kelling, Carl Lagoze: Cornell University, Ithaca NY, USA
Hilmar Lapp: NESCent, Durham NC, USA
Joshua Madin: Macquarie University, Sydney NSW, AUS
* presenter

Further Acknowledgements
Thanks as well:
Marie-Angelique LaPorte: CEFE/CNRS, Montpellier
Farshid Ahrestani: TraitNet/Columbia
Daniel Bunker: TraitNet, NJIT
