Published by Hubert Howard. Modified over 9 years ago.
1
Directions in observational data organization: from schemas to ontologies
Matthew B. Jones (1), Chad Berkley (1), Shawn Bowers (2), Joshua Madin (3), Mark Schildhauer (1)
(1) National Center for Ecological Analysis and Synthesis (NCEAS), University of California, Santa Barbara
(2) University of California, Davis
(3) Macquarie University
2
Ecological studies
Ecological studies focus on
–Distribution and abundance of organisms
–Organism interactions
–Population and community processes
–Ecosystem processes
–Mechanistic understanding of ecosystems
Diverse data sources, e.g.,
–Biodiversity monitoring
–Experimental manipulations
–Environmental monitoring
3
Synthesis over ecological process
Gruner et al. 2008
–Ecology Letters (2008) 11: 740–755
Meta-analysis of 191 factorial manipulations of nutrients and herbivores
Experimenters manipulated
–nutrient addition
–herbivore removal
Effect on producer biomass
4
Synthesis over space Costanza et al. Nature 1997
5
Synthesis over time Jackson et al., Science 2001
6
How did they do it?
As a scientist, could you:
–Locate the precise data used?
–Locate the analytical processes used? Reconstruct them?
Today, only a slim chance...
–Why?
7
Insufficient sharing
Researchers don’t publish their data
Researchers don’t publish their analytical code
In general, we have no way to verify or reproduce the conclusions in papers
8
Preserving data for synthesis
Synthesis requires access to global ecological data
Single-schema databases do not suffice
Loosely-coupled metadata and data collections
–No constraints on data schemas
Knowledge Network for Biocomplexity (KNB)
National Biological Information Infrastructure (NBII)
9
Ecological Metadata Language
Grass roots metadata
22 independent modules
open, modular, extensible
[Diagram of module groups: Identity and Discovery Information; Coverage: Space, Time, Taxa; Methods; Logical Data Model; Physical Data Format; Access and Distribution]
Describe what data you have... rather than prescribe what to produce.
10
EML: Selected relationships
[Timeline figure, 1990 to 2009. Labels: FGDC created; CSDGM 1.0; Michener ’97 paper; ESA FLED Report; NBII BDP; ISO 19115; Dublin Core; XML 1.0; EML 1.0.0; EML 1.3.0; EML 1.4.x; EML 2.0.0; EML 2.0.1; EML 2.1.0?; OBOE]
11
Logical Model: Attribute structure
Describes data tables and their attributes
A typical data table with 10 attributes
–some metadata are likely apparent, others ambiguous
–missing value code is present
–definitions need to be explicit, as well as data typing

YEAR  MONTH  DATE        SITE  TRANSECT  SECTION  SP_CODE  SIZE  OBS_CODE  NOTES
2001  8      2001-08-22  ABUR  1         0-20     CLIN     5     06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     11    06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     10    06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     14    06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     7     06        .
2001  8      2001-08-22  ABUR  1         21-40    OPIC     19    06        .
2001  8      2001-08-22  ABUR  1         21-40    COTT     5     06        .
2001  8      2001-08-22  ABUR  2         0-20     CLIN     5     06        .
2001  8      2001-08-22  ABUR  2         21-40    NF       0     06        .
2001  8      2001-08-27  AHND  1         0-20     NF       0     03        .

Callouts on the table: Species Codes, Value bounds, Date Format, Code definitions
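To illustrate why explicit definitions, data typing, and missing value codes matter, here is a minimal Python sketch that interprets two rows of the table above using attribute-level metadata. The `ATTRIBUTES` dictionary, its keys, and the `interpret` helper are hypothetical and are not EML syntax; treating "." as the missing value code for NOTES is an assumption read off the table.

```python
import csv
import io
from datetime import date

# Hypothetical attribute metadata for three of the ten columns.
# These definitions are illustrative assumptions, not taken from an EML record.
ATTRIBUTES = {
    "DATE": {"type": "date"},                                # ISO format YYYY-MM-DD
    "SIZE": {"type": "integer", "minimum": 0},               # value bound
    "NOTES": {"type": "string", "missing_value_code": "."},  # "." means no value
}

SAMPLE = """YEAR,MONTH,DATE,SITE,TRANSECT,SECTION,SP_CODE,SIZE,OBS_CODE,NOTES
2001,8,2001-08-22,ABUR,1,0-20,CLIN,5,06,.
2001,8,2001-08-27,AHND,1,0-20,NF,0,03,.
"""


def interpret(name, raw):
    """Convert a raw string to a typed value using the attribute metadata."""
    meta = ATTRIBUTES.get(name)
    if meta is None:
        return raw  # undescribed columns stay as uninterpreted strings
    if raw == meta.get("missing_value_code"):
        return None
    if meta["type"] == "integer":
        value = int(raw)
        if "minimum" in meta and value < meta["minimum"]:
            raise ValueError(f"{name}={value} is below the declared minimum")
        return value
    if meta["type"] == "date":
        return date.fromisoformat(raw)
    return raw


rows = [
    {name: interpret(name, raw) for name, raw in record.items()}
    for record in csv.DictReader(io.StringIO(SAMPLE))
]
```

Columns without metadata (for example OBS_CODE) remain plain strings, so a code such as "06" keeps its leading zero rather than being silently read as a number.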
12
Logical Model: unit Dictionary
Consistent assignment of measurement units
–Quantitative definitions in terms of SI units
–‘unitType’ expresses dimensionality
time, length, mass, energy are all ‘unitType’s
second, meter, gram, pound, joule are all ‘unit’s
[Diagram: UnitType Mass; Units kilogram and gram, related by x1000]
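The idea of defining each unit quantitatively against an SI unit, with a unit type that expresses dimensionality, can be sketched in Python as follows. The `Unit` class, the `UNIT_DICTIONARY` entries, and the `convert` function are hypothetical names chosen for illustration and do not reproduce the EML unit dictionary format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    unit_type: str           # dimensionality, e.g. "mass"
    multiplier_to_si: float  # factor that converts a value to the SI unit


# Hypothetical dictionary entries; kilogram is the SI unit for mass.
UNIT_DICTIONARY = {
    "kilogram": Unit("kilogram", "mass", 1.0),
    "gram": Unit("gram", "mass", 0.001),
    "meter": Unit("meter", "length", 1.0),
    "second": Unit("second", "time", 1.0),
}


def convert(value, from_unit, to_unit):
    """Convert between two units, refusing to mix unit types."""
    source = UNIT_DICTIONARY[from_unit]
    target = UNIT_DICTIONARY[to_unit]
    if source.unit_type != target.unit_type:
        raise ValueError(
            f"cannot convert {source.unit_type} to {target.unit_type}"
        )
    # Go through the SI unit: source -> SI -> target.
    return value * source.multiplier_to_si / target.multiplier_to_si
```

Because every unit is defined relative to the SI unit of its type, any two units of the same type can be converted without a pairwise table, and a conversion across unit types (gram to meter, say) is rejected.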
14
An EML Record at NCEAS
15
Knowledge Network for Biocomplexity (KNB)
Building a data preservation network
Preserve primary data
Rich metadata descriptions
Redundant backup via replication
Access controlled by contributors
[Network diagram; node labels: KNB 1, KNB II, PISCO, AND... (26), GCE, LTER, NCEAS, ESA, OBFS]
16
Knowledge Network for Biocomplexity (KNB)
South African Data Network
[Network diagram; KNB node labels: KNB 1, KNB II, PISCO, AND... (26), GCE, LTER, NCEAS, ESA, OBFS]
[South African node labels: Mozambique, Mapungubwe, Marakele, Kruger, SAEON, Grahamstown, Cape Town, San Parks, Wilderness, Cape Town U, Addo, Karoo, Tsitsikama, Phalabora; clusters: Savannah Cluster, Marine Cluster]
17
South African National Parks Metacat
19
Metacat deployments
20
International LTER
Recommendation for producing EML across all ILTER sites
Recommendation for producing continental and regional metadata caches
–one or more in each ILTER region
–initial nodes may use Metacat
21
Dynamic Data Retrieval
[Architecture diagram. Components: User Client, Metadata Catalog, Data Manager (Metadata Parser, Data Loader, Data Store, DB). Flows: metadata is parsed to issue CREATE TABLE...; data is loaded into the data store; a query (SELECT * FROM...) returns results as a table of attributes (attr1 | attr2 | attr3 ...)]
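The flow in the diagram (parse metadata, create a table from it, load the data, then query) can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the actual Data Manager implementation: the `METADATA` structure, the table name `fish_survey`, and the helper functions are hypothetical, and the sample rows are borrowed from the attribute-structure slide.

```python
import sqlite3

# Hypothetical result of parsing a metadata record: a table name plus
# (attribute name, storage type) pairs.
METADATA = {
    "table": "fish_survey",
    "attributes": [("SITE", "TEXT"), ("SP_CODE", "TEXT"), ("SIZE", "INTEGER")],
}
DATA = [("ABUR", "CLIN", 5), ("ABUR", "OPIC", 11), ("AHND", "NF", 0)]


def create_table_sql(metadata):
    """Build a CREATE TABLE statement from the attribute descriptions."""
    columns = ", ".join(
        f'"{name}" {sql_type}' for name, sql_type in metadata["attributes"]
    )
    return f'CREATE TABLE "{metadata["table"]}" ({columns})'


def load(connection, metadata, rows):
    """Create the table described by the metadata and insert the data rows."""
    connection.execute(create_table_sql(metadata))
    placeholders = ", ".join("?" for _ in metadata["attributes"])
    connection.executemany(
        f'INSERT INTO "{metadata["table"]}" VALUES ({placeholders})', rows
    )


connection = sqlite3.connect(":memory:")
load(connection, METADATA, DATA)
results = connection.execute(
    'SELECT "SP_CODE", "SIZE" FROM "fish_survey" WHERE "SITE" = ? ORDER BY "SIZE"',
    ("ABUR",),
).fetchall()
```

The point of the design is that no schema is fixed in advance: the table definition is derived from each data set's own metadata at load time.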
22
Join Query
[Diagram: the Client sends a Query in a Request; Results are returned in the Response]
23
Importance of semantics
So far we’ve dealt only with the logical data model
–any semantics in EML are expressed in natural language
The computer doesn’t really understand:
–what is being measured
–how measurements relate to one another
–how semantics map to logical structure
Analysis depends on understanding the semantic contextual relationships among data measurements
–e.g., density measured within subplot
24
Semantic annotation
Relational data lacks critical semantic information
–no way for computer to determine that “Ht.” represents a “height” measurement
–no way for computer to determine if Plot is nested within Site or vice-versa
–no way for computer to determine if the Temp applies to Site or Plot or Species
Mapping between data and the ontology via semantic annotation
[Diagram: Data set, linked by semantic annotation to the Observation Ontology]
slide from J. Madin
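A semantic annotation can be pictured as a mapping from each column to the entity it describes, the characteristic measured, and the column that provides its context. The Python sketch below is only an illustration: the `ANNOTATIONS` structure and the term names are hypothetical, and the particular answers it encodes (Plot nested within Site, Temp applying to Site) are assumptions made for the example, since the slide does not say which is true for any real data set.

```python
# Hypothetical annotations for the columns named on the slide.
# "context" names the column whose observation contains this one.
ANNOTATIONS = {
    "Site": {"entity": "Site", "characteristic": "Name", "context": None},
    "Plot": {"entity": "Plot", "characteristic": "Name", "context": "Site"},
    "Ht.": {"entity": "Tree", "characteristic": "Height", "context": "Plot"},
    "Temp": {"entity": "Air", "characteristic": "Temperature", "context": "Site"},
}


def context_chain(column):
    """Return the enclosing context columns, innermost first."""
    chain = []
    current = ANNOTATIONS[column]["context"]
    while current is not None:
        chain.append(current)
        current = ANNOTATIONS[current]["context"]
    return chain
```

With annotations like these, a program can answer the questions the bare table cannot: `ANNOTATIONS["Ht."]["characteristic"]` identifies a height measurement, and `context_chain("Ht.")` shows that it was taken within a plot, which is itself within a site.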
25
Scientific Observations An Observation is the Measurement of the Value of a Characteristic of some Entity in a particular Context
26
Observation ontology (OBOE)
Goal: semantically describe the structure of scientific observation and measurement as found in a data set
Provide extension points for loading specialized domain ontologies
Entities represent real-world objects or concepts that can be measured.
Observations are made about particular entities.
Every measurement has a characteristic, which defines the property of the entity being measured.
Observations can provide context for other observations.
slide from J. Madin
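The relationships stated above (an observation is about an entity, holds measurements of characteristics, and may have another observation as context) can be sketched as plain Python data classes. This is a simplified illustration, not the OBOE ontology itself, which is not expressed as Python classes; the class names, fields, and example values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    characteristic: str         # property of the entity being measured
    value: object
    unit: Optional[str] = None


@dataclass
class Observation:
    entity: str                 # real-world object or concept observed
    measurements: List[Measurement] = field(default_factory=list)
    context: Optional["Observation"] = None  # enclosing observation, if any


def contexts(observation):
    """List the entities of all enclosing observations, innermost first."""
    found = []
    current = observation.context
    while current is not None:
        found.append(current.entity)
        current = current.context
    return found


# Hypothetical example: a tree height observed within a plot within a site.
site = Observation("Site", [Measurement("Name", "ABUR")])
plot = Observation("Plot", [Measurement("Area", 25.0, "squareMeter")], context=site)
tree = Observation("Tree", [Measurement("Height", 12.5, "meter")], context=plot)
```

Calling `contexts(tree)` walks outward from the tree to the plot and then the site, which is the kind of contextual relationship a flat data table leaves implicit.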
28
Datasets vs. Observations
EML describes “data sets”
–collections of related observations with relatively unspecified semantics
–mostly natural language descriptions
OBOE describes “scientific observations”
–semantically-precise descriptions of scientific measurements
–allows understanding of relationships among measurements and context of an observation
29
Model correspondences
30
TDWG Observations Task Group
An Observation is the Measurement of the Value of a Characteristic of some Entity in a particular Context
Create: Community-sanctioned, extensible, and unified ontology model for observational data
–Compatible with existing standards
–Integrate with metadata standards such as EML, CSDGM, etc.
–Reduce the “babel” of scientific dialects
31
Questions?
http://www.nceas.ucsb.edu/ecoinformatics/
http://knb.ecoinformatics.org/
http://seek.ecoinformatics.org/
http://kepler-project.org/
32
Acknowledgments
This material is based upon work supported by:
The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
Collaborators: NCEAS (UC Santa Barbara), University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research), University of Vermont, University of North Carolina, Napier University, Arizona State University, UC Davis
The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
The Andrew W. Mellon Foundation.
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON, RoadNet, EOL, Resurgence