Semantic annotation on the SONet and Semtools projects: Challenges for broad multidisciplinary exchange of observational data
Mark Schildhauer, NCEAS/UCSB


TDWG meeting, Woods Hole, Observations Activity Group, Sep. 29, 2010

Nature of scientific data sets
– Scientific data are often stored in tables
– Tables consist of rows (records) and columns (attributes)
– The grouping of specific columns (a tuple) in a scientific data set is often a non-normalized (materialized) view, with special meaning or use for the researcher
– Individual cells contain values that are measurements of a characteristic of some thing (an entity)

SONet/Semtools Semantic Approach
Data -> metadata -> annotations -> ontologies
– Ontology: formal knowledge representation in OWL-DL
  – Hierarchical structure of concepts
  – Relationships can link concepts
– Annotations link EML metadata elements to concepts in an ontology through the Observation Ontology
– EML metadata describe the data and their structure

Linking data values to concepts
Extensible Observation Ontology (OBOE)
– Provides a high-level abstraction of scientific observations and measurements
– Enables data (or metadata) structures to be linked to domain-specific ontology concepts
– Can inter-relate the values in a tuple
– Clarifies the semantics of the data set as a whole, not just of "independent" values

Concepts of Semantic Search
– Annotations give metadata attributes semantic meaning with respect to an ontology
– They enable structured search against annotations, to increase precision
– They enable ontological term expansion, to increase recall
– Via OBOE, they precisely define a measured characteristic and the standard used to measure it
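A minimal Python sketch of the two search ideas, assuming a toy subclass hierarchy and an in-memory annotation index. The concept names StemDiameter, BasalDiameter and Length, and the documents doc1 to doc3, are hypothetical and not taken from the SONet ontologies. Matching on a (characteristic, standard) pair is the precision-oriented structured search; expanding the query concept to all of its subclasses is the recall-oriented term expansion.

```python
from collections import defaultdict

# Hypothetical class hierarchy (child -> parent), for illustration only.
SUBCLASS_OF = {
    "DBH": "StemDiameter",
    "BasalDiameter": "StemDiameter",
    "StemDiameter": "Length",
    "Height": "Length",
}

# Hypothetical annotation index: (characteristic, standard) pairs per metadata document.
ANNOTATIONS = {
    "doc1": [("DBH", "Centimeter"), ("Year", "DateTime")],
    "doc2": [("BasalDiameter", "Millimeter")],
    "doc3": [("Height", "Meter")],
}

def expand(concept):
    """Return the concept together with all of its transitive subclasses."""
    children = defaultdict(set)
    for child, parent in SUBCLASS_OF.items():
        children[parent].add(child)
    result, stack = {concept}, [concept]
    while stack:
        for sub in children[stack.pop()]:
            if sub not in result:
                result.add(sub)
                stack.append(sub)
    return result

def search(characteristic, standard=None, use_expansion=True):
    """Structured search over annotations; returns ids of matching documents."""
    wanted = expand(characteristic) if use_expansion else {characteristic}
    return sorted(
        doc for doc, pairs in ANNOTATIONS.items()
        if any(c in wanted and (standard is None or s == standard) for c, s in pairs)
    )

print(search("StemDiameter", use_expansion=False))    # []
print(search("StemDiameter"))                         # ['doc1', 'doc2']
print(search("StemDiameter", standard="Centimeter"))  # ['doc1']
```

Without expansion the query finds nothing, because no document is annotated with the exact term StemDiameter; with expansion it also matches documents annotated with its subclasses, and adding the standard narrows the result again.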

Logical Architecture

Annotations
– An XML schema defines the annotation properties
– Namespaces identify the sources of terms
– Search is performed against the annotations, not the metadata itself
– Search returns the metadata documents that are linked to matching annotations
– Reasoning (term expansion, consistency checking, etc.) is done through the domain ontology

XML Links

KNB metadata catalog
– Stores EML (XML) and raw data objects
– Extended to store ontologies, both domain ontologies and OBOE (OWL-DL serialized as XML)
– Extended to store annotations (XML)
– Jena facilitates querying the ontologies
– Pellet performs reasoning (ontology consistency; class subsumption)

Metacat Implementation

OBOE Conceptual Model (OWL-DL)
[Figure: classes Observation, Entity, Measurement, Characteristic, Standard, Value, Context and Relationship, connected by the properties ofEntity, hasMeasurement, ofCharacteristic, usesStandard, hasValue, hasContext, hasContextObservation and hasContextRelationship]
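One way to picture the conceptual model in code is the Python sketch below. It assumes the usual OBOE reading of the diagram: an Observation is ofEntity an Entity and hasMeasurement Measurements; a Measurement has ofCharacteristic, usesStandard and hasValue; and a Context points to another Observation (hasContextObservation) and a Relationship (hasContextRelationship). Characteristic, Standard and Relationship are reduced to plain strings for brevity, and the values are illustrative; this is not the OWL-DL ontology itself.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entity:
    entity_type: str            # e.g. "Tree", "Plot"

@dataclass
class Measurement:
    characteristic: str         # ofCharacteristic, e.g. "DBH"
    standard: str               # usesStandard, e.g. "Centimeter"
    value: object               # hasValue

@dataclass
class Context:
    observation: "Observation"  # hasContextObservation
    relationship: str           # hasContextRelationship, e.g. "Within"

@dataclass
class Observation:
    entity: Entity                                                 # ofEntity
    measurements: List[Measurement] = field(default_factory=list)  # hasMeasurement
    contexts: List[Context] = field(default_factory=list)          # hasContext

# A tree observed within a plot (illustrative values)
plot_obs = Observation(Entity("Plot"), [Measurement("EntityName", "Nominal", "A")])
tree_obs = Observation(
    Entity("Tree"),
    [Measurement("DBH", "Centimeter", 35.8),
     Measurement("TaxonomicTypeName", "ITIS", "Picea rubens")],
    [Context(plot_obs, "Within")],
)

for m in tree_obs.measurements:
    print(tree_obs.entity.entity_type, m.characteristic, m.value, m.standard)
```

Keeping the context as a separate object, rather than a bare link between observations, leaves room for the relationship type and for qualifiers such as "identifying" that appear in the annotation examples.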

Annotation Examples (12/18/2009)
[Figure: Annotation, Dataset, OBOE Model (individuals/triples) and OBOE Concepts, connected by "Define (view def.)", "Materialize", "instantiates", "uses terms from" and "observation-based representation of"; Query*]
* Conceptually, we want to query datasets via annotations.

Annotation Examples
Annotation Syntax
  observation "o1"
    entity "TemporalRange"
    measurement "m1"
      characteristic "Year"
      standard "DateTime"
  observation "o2"
    entity "Tree"
    measurement "m2" precision: "0.1"
      characteristic "DBH"
      standard "Centimeter"
    measurement "m3"
      characteristic "TaxonomicTypeName"
      standard "ITIS"
    measurement "m4"
      characteristic "EntityName"
      standard "LocalTreeNames"
    context
      observation "o1"
      relationship "Within"
  map "yr" to "m1"
  map "diam" to "m2" if diam > 0
  map "spec" to "m4"
  map "spp" to "m3" if spp == "piru" value="Picea rubens"
  map "spp" to "m3" if spp == "abba" value="Abies balsamea"
* Code exists to read/write annotations using this XML format
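To make the map rules concrete, here is a hypothetical Python sketch that applies them to a single data row. Encoding each condition as a lambda, and the names MAP_RULES and apply_maps, are assumptions for illustration; the slide's footnote refers to separate code that reads and writes the XML form of annotations. The cell values are made up.

```python
# Hypothetical encoding of the map rules:
# (column, measurement id, condition on the raw cell or None, substituted value or None)
MAP_RULES = [
    ("yr", "m1", None, None),
    ("diam", "m2", lambda v: float(v) > 0, None),
    ("spec", "m4", None, None),
    ("spp", "m3", lambda v: v == "piru", "Picea rubens"),
    ("spp", "m3", lambda v: v == "abba", "Abies balsamea"),
]

def apply_maps(row):
    """Map one row (column name -> raw string) to {measurement id: value}.

    A rule fires only if its column is present and its condition (if any) holds;
    a cell that satisfies no rule for its column is left unmapped.
    """
    values = {}
    for column, meas_id, condition, substitute in MAP_RULES:
        raw = row.get(column)
        if raw is None or (condition is not None and not condition(raw)):
            continue
        values[meas_id] = substitute if substitute is not None else raw
    return values

print(apply_maps({"yr": "2007", "spec": "1", "spp": "piru", "diam": "35.8"}))
# {'m1': '2007', 'm2': '35.8', 'm4': '1', 'm3': 'Picea rubens'}
print(apply_maps({"yr": "2008", "spec": "2", "spp": "abba", "diam": "0"}))
# {'m1': '2008', 'm4': '2', 'm3': 'Abies balsamea'}
```

In the second row the diameter of 0 fails the "diam > 0" condition, so no value is produced for m2, while the local code "abba" is replaced by the ITIS name.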

Annotation Examples
[Dataset table: columns yr, spec, spp, dbh]
Annotation:
  observation "o1"
    entity "TemporalRange"
    measurement "m1"
      characteristic "Year"
      standard "DateTime"
  observation "o2"
    entity "Tree"
    measurement "m2" precision: "0.1"
      characteristic "DBH"
      standard "Centimeter"
    measurement "m3"
      characteristic "TaxonomicTypeName"
      standard "ITIS"
    measurement "m4"
      characteristic "EntityName"
      standard "LocalTreeNames"
    context
      observation "o1"
      relationship "Within"
  map "yr" to "m1"
  map "dbh" to "m2" if dbh > 0
  map "spec" to "m4"
  map "spp" to "m3" if spp == "piru" value="Picea rubens"
  map "spp" to "m3" if spp == "abba" value="Abies balsamea"
Basic idea: go row by row through the dataset, generating individuals/triples. "External" terms should carry a namespace prefix (URI).
[Figure: individuals generated for three rows (2007, tree 1, Picea; 2007, tree 1, Picea; 2008, tree 2, Abies). Each row yields a TemporalRange observation with a Year (DateTime) measurement and a Tree observation with EntityName (LocalTreeNames), TaxonomicTypeName (ITIS) and DBH (Centimeter) measurements, linked by hasContext]
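A naive version of the row-by-row idea, sketched in Python under several assumptions: the prefixes oboe:, dom: and ex: are placeholders for real namespace URIs, hasContext links the two observations directly instead of going through a Context individual, value substitutions such as piru to "Picea rubens" are omitted, and the dbh values are made up. Because it creates fresh individuals for every row, it produces three Tree individuals even though the first two rows describe the same tree.

```python
from itertools import count

OBOE = "oboe:"  # hypothetical prefix for OBOE terms
DOM = "dom:"    # hypothetical prefix for domain-ontology ("external") terms
EX = "ex:"      # hypothetical prefix for generated individuals

# (observation role, characteristic, standard, column) for each measurement
MEASUREMENTS = [
    ("year", "Year", "DateTime", "yr"),
    ("tree", "EntityName", "LocalTreeNames", "spec"),
    ("tree", "TaxonomicTypeName", "ITIS", "spp"),
    ("tree", "DBH", "Centimeter", "dbh"),
]

def materialize(rows):
    """Naive materialization: every row gets fresh observation and entity individuals."""
    ids = count(1)
    triples = []
    for row in rows:
        obs = {"year": f"{EX}obs{next(ids)}", "tree": f"{EX}obs{next(ids)}"}
        triples.append((obs["year"], f"{OBOE}ofEntity", f"{EX}temporalRange{next(ids)}"))
        triples.append((obs["tree"], f"{OBOE}ofEntity", f"{EX}tree{next(ids)}"))
        # Simplification: link the tree observation straight to its context observation
        triples.append((obs["tree"], f"{OBOE}hasContext", obs["year"]))
        for role, characteristic, standard, column in MEASUREMENTS:
            meas = f"{EX}meas{next(ids)}"
            triples += [
                (obs[role], f"{OBOE}hasMeasurement", meas),
                (meas, f"{OBOE}ofCharacteristic", f"{DOM}{characteristic}"),
                (meas, f"{OBOE}usesStandard", f"{DOM}{standard}"),
                (meas, f"{OBOE}hasValue", row[column]),
            ]
    return triples

rows = [  # made-up dbh values; column names follow the slide (yr, spec, spp, dbh)
    {"yr": "2007", "spec": "1", "spp": "piru", "dbh": "30.1"},
    {"yr": "2007", "spec": "1", "spp": "piru", "dbh": "30.4"},
    {"yr": "2008", "spec": "2", "spp": "abba", "dbh": "12.7"},
]
triples = materialize(rows)
trees = {o for s, p, o in triples if p == f"{OBOE}ofEntity" and o.startswith(f"{EX}tree")}
print(len(triples), len(trees))  # 57 3
```

Each row contributes 3 structural triples plus 4 per measurement (19 in total), and the duplicate tree individuals are exactly the problem the key and distinct qualifiers address.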

Annotation Examples
Dataset and annotation as on the previous slide.
[Figure: the same materialized individuals as on the previous slide]
– Same trees! (both have name = 1)
– Same year, and the same year observation!

Annotation Examples
Dataset and annotation as on the previous slides, with three qualifiers added:
  observation "o1" distinct yes
  measurement "m1" key yes
  measurement "m4" key yes
[Figure: materialized individuals; the two 2007 rows now share a single TemporalRange observation, and the two tree-1 observations refer to the same Tree instance]
Every observation has an implicit "distinct" attribute (set to "no"), and every measurement has an implicit "key" attribute (set to "no").

Annotation Examples
Observation measurement keys
– Like a primary key constraint
– State that observation instances with the same key measurement values are of the same entity instance
– Do not imply the same observation instance, unless the observation is declared distinct
– All key measurements of an observation together form the primary key
Distinct observations
– Only apply if at least one key measurement is defined
– State that observation instances with the same entity instance are the same observation instance
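The key and distinct rules can be sketched as an identity-resolution pass. The hypothetical Python function below (resolve is an illustrative name, not part of the Semtools code) assigns entity ids from the tuple of key measurement values and, only when the observation is declared distinct and has at least one key, reuses one observation id per entity. Applied to the yr/spec example, it merges the two 2007 year observations and the two observations of tree 1.

```python
def resolve(observations, key_measurements, distinct=False):
    """Assign (observation id, entity id) pairs to a sequence of observation rows.

    observations: one dict per row, mapping measurement id -> value.
    key_measurements: ids of measurements declared 'key yes'; together they
        form the primary key of the observed entity.
    distinct: whether the observation is declared 'distinct yes'; it only has
        an effect when at least one key measurement is defined.
    """
    entity_ids, obs_ids, result = {}, {}, []
    for row_no, values in enumerate(observations):
        if key_measurements:
            key = tuple(values[m] for m in key_measurements)
            entity = entity_ids.setdefault(key, f"ent{len(entity_ids) + 1}")
        else:  # no key: every row describes a new entity instance
            entity = entity_ids.setdefault(("row", row_no), f"ent{len(entity_ids) + 1}")
        if distinct and key_measurements:  # same entity -> same observation
            obs = obs_ids.setdefault(entity, f"obs{len(obs_ids) + 1}")
        else:                              # otherwise a new observation per row
            obs = obs_ids.setdefault(("row", row_no), f"obs{len(obs_ids) + 1}")
        result.append((obs, entity))
    return result

# TemporalRange observation "o1": key on Year (m1), distinct yes
years = [{"m1": "2007"}, {"m1": "2007"}, {"m1": "2008"}]
print(resolve(years, ["m1"], distinct=True))
# [('obs1', 'ent1'), ('obs1', 'ent1'), ('obs2', 'ent2')]

# Tree observation "o2": key on EntityName (m4), not distinct
tree_rows = [{"m4": "1"}, {"m4": "1"}, {"m4": "2"}]
print(resolve(tree_rows, ["m4"]))
# [('obs1', 'ent1'), ('obs2', 'ent1'), ('obs3', 'ent2')]
```

The tree case shows the difference between the two qualifiers: the rows share an entity instance through the key, but remain separate observations because "o2" is not declared distinct.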

Annotation Examples
Dataset:
  plt  spp   dbh
  A    piru  35.8
  A    piru  36.2
  B    piru  33.2
Annotation:
  observation "o1" distinct yes
    entity "Plot"
    measurement "m1" key yes
      characteristic "EntityName"
      standard "Nominal"
  observation "o2"
    entity "Tree"
    measurement "m2" precision: "0.1"
      characteristic "DBH"
      standard "Centimeter"
    measurement "m3" key yes
      characteristic "TaxonomicTypeName"
      standard "ITIS"
    context
      observation "o1"
      relationship "Within"
  map "plt" to "m1"
  map "dbh" to "m2"
  map "spp" to "m3" if spp == "piru" value="Picea rubens"
  map "spp" to "m3" if spp == "abba" value="Abies balsamea"
[Figure: materialized individuals: observations of plots A and B, and three Tree observations (TaxonomicTypeName Picea, DBH in Centimeter) linked to their plot observation by hasContext]
Here we don't have unique IDs for trees. But assume each spp name within a plot uniquely identifies a tree, i.e., at most one tree of a particular type was measured (possibly multiple times) in each plot.

Annotation Examples
Dataset and annotation as on the previous slide.
[Figure: the same materialized individuals as on the previous slide]
The Tree entity instance should depend on the plot it is in! (context)

Annotation Examples
Dataset and annotation as on the previous slides, with the context of observation "o2" now marked as identifying:
  context identifying yes
    observation "o1"
    relationship "Within"
[Figure: materialized individuals; the piru tree in plot A and the piru tree in plot B are now distinct Tree instances]
Every context relationship has an implicit "identifying" qualifier (set to "no"). Setting it to "yes" makes the key unique only within the context observation, similar to a weak-entity constraint in ER modeling.
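A minimal sketch of the identifying-context rule on the plt/spp/dbh rows, assuming that the key of the context Plot (its name) is simply prepended to the tree's own key (spp) when the context is identifying. tree_entities is a hypothetical helper, not part of the SONet/Semtools code.

```python
def tree_entities(rows, identifying_context):
    """Assign Tree entity ids for the plt/spp/dbh example.

    The tree's key measurement is spp. With an identifying context, the key of
    the context (Plot) entity is prepended, so a tree is unique only within its plot.
    """
    ids, out = {}, []
    for row in rows:
        key = (row["plt"], row["spp"]) if identifying_context else (row["spp"],)
        out.append(ids.setdefault(key, f"tree{len(ids) + 1}"))
    return out

rows = [
    {"plt": "A", "spp": "piru", "dbh": 35.8},
    {"plt": "A", "spp": "piru", "dbh": 36.2},
    {"plt": "B", "spp": "piru", "dbh": 33.2},
]
print(tree_entities(rows, identifying_context=False))  # ['tree1', 'tree1', 'tree1']
print(tree_entities(rows, identifying_context=True))   # ['tree1', 'tree1', 'tree2']
```

Without the identifying context, the spp key alone wrongly merges the piru trees of plots A and B into one entity; with it, the two plot-A measurements still refer to one tree, as the slide's assumption requires.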

Annotation Examples
Representing instances …
Simple relational schema for OBOE models (individuals/triples):
– Annotation(AnnotId, Resource)
– Observation(ObsId, AnnotId, EntId)
– Measurement(MeasId, ObsId, MeasType, Value)
– Context(ObsId1, ObsId2, Rel)
– Relationship(RelId, RelType)
– Entity(EntId, EntType)
This schema could be queried directly and/or mapped to triples. Note that ObsIds are unique across annotations, and the two ObsIds in a Context row must belong to the same annotation.
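The relational schema can be tried directly in SQLite from Python. In this sketch the column types, the direction of Context (ObsId1 is the observation, ObsId2 its context observation), the use of MeasType to hold the characteristic name, and the resource id are assumptions; the data are the plot/tree rows from the annotation examples. The second query checks the rule that both observations in a Context row belong to the same annotation.

```python
import sqlite3

# Tables from the slide; column types are assumptions.
SCHEMA = """
CREATE TABLE Annotation   (AnnotId INTEGER PRIMARY KEY, Resource TEXT);
CREATE TABLE Entity       (EntId INTEGER PRIMARY KEY, EntType TEXT);
CREATE TABLE Observation  (ObsId INTEGER PRIMARY KEY, AnnotId INTEGER, EntId INTEGER);
CREATE TABLE Measurement  (MeasId INTEGER PRIMARY KEY, ObsId INTEGER, MeasType TEXT, Value TEXT);
CREATE TABLE Relationship (RelId INTEGER PRIMARY KEY, RelType TEXT);
CREATE TABLE Context      (ObsId1 INTEGER, ObsId2 INTEGER, Rel INTEGER);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO Annotation VALUES (1, 'plot-tree-dataset')")  # hypothetical resource id
db.executemany("INSERT INTO Entity VALUES (?, ?)",
               [(1, "Plot"), (2, "Plot"), (3, "Tree"), (4, "Tree")])
# Observations: 1 = plot A, 2 and 3 = the piru tree in A, 4 = plot B, 5 = the piru tree in B
db.executemany("INSERT INTO Observation VALUES (?, ?, ?)",
               [(1, 1, 1), (2, 1, 3), (3, 1, 3), (4, 1, 2), (5, 1, 4)])
db.executemany("INSERT INTO Measurement VALUES (?, ?, ?, ?)",
               [(1, 1, "EntityName", "A"), (2, 2, "DBH", "35.8"), (3, 3, "DBH", "36.2"),
                (4, 4, "EntityName", "B"), (5, 5, "DBH", "33.2")])
db.execute("INSERT INTO Relationship VALUES (1, 'Within')")
# Assumed direction: ObsId1 is the observation, ObsId2 its context observation
db.executemany("INSERT INTO Context VALUES (?, ?, ?)", [(2, 1, 1), (3, 1, 1), (5, 4, 1)])

# DBH values of observations made within plot A
within_a = db.execute("""
    SELECT m.Value
    FROM Context c
    JOIN Relationship r ON r.RelId = c.Rel AND r.RelType = 'Within'
    JOIN Measurement p  ON p.ObsId = c.ObsId2 AND p.MeasType = 'EntityName' AND p.Value = 'A'
    JOIN Measurement m  ON m.ObsId = c.ObsId1 AND m.MeasType = 'DBH'
    ORDER BY m.MeasId
""").fetchall()

# Both observations in a Context row must come from the same annotation
mismatched = db.execute("""
    SELECT COUNT(*)
    FROM Context c
    JOIN Observation o1 ON o1.ObsId = c.ObsId1
    JOIN Observation o2 ON o2.ObsId = c.ObsId2
    WHERE o1.AnnotId <> o2.AnnotId
""").fetchone()[0]

print(within_a, mismatched)  # [('35.8',), ('36.2',)] 0
```

Querying the tables directly like this is one of the two options the slide mentions; the same rows could instead be emitted as triples.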

Ongoing Activities
– Developing compatible domain ontologies (design patterns for use with the observation ontology)
– Scalability of the materialization algorithm from annotations (data result sets)
– Testing and developing capabilities motivated by use cases (coastal ecosystems and plant traits)
– SONet and JWG-ODMS continue to meet and discuss

Acknowledgements: Shawn Bowers, Huiping Cao, the SEEK KR/SMS working group, and all members of the SONet and Semtools projects. Thanks also to Chad Berkley and Ben Leinfelder, project software engineers. Work supported by National Science Foundation awards , , , , ,