TWC
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semantic Technologies
Xiaogang (Marshall) Ma
Tetherless World Constellation, Rensselaer Polytechnic Institute
x.marshall.ma rpi.edu/~max MarshallXMa
William Smith's 1815 geologic map of England and Wales with part of Scotland
William Smith
(Image source: Geological Society of London)
Evolution of the Geological Map of British Islands / UK
1874 and 1906 editions (Image source: British Geological Survey)
Definition of “Quaternary” in several versions of the International Stratigraphic Chart
Sorry, no Quaternary…
Distributed datasets: Regional geologic time scales (Haq, 2007)
Distributed datasets: Mismatches of geological units across political boundaries
Italy/France near Cuneo/Colmar: Cambrian / Carboniferous
Wyoming/Colorado: Felsic and hornblendic gneisses / Granitic rocks
(Asch et al., 2012) (Ma et al., 2014)
(Base map courtesy: OneGeology-Europe and USGS)
Data and models, vocabularies, and ontologies
–Have we ever had model-independent datasets?
Ontology dynamics and a data life cycle
CONCEPT *Initial concepts *Questions and answers *Grant info
COLLECTION *Questionnaire *Coded instrument *CAI metadata *Paradata
PROCESSING *Data specs *Recodes *Summary descriptive info
DISTRIBUTION *Terms of use *Citation *Packaging info
DISCOVERY *Catalog record *Indexing *Related publications
ANALYSIS *Replication code *Publications
ARCHIVING *Preservation metadata *Confidentiality *Additional processing
REPURPOSING *Post-hoc harmonization *Data transformations
Diagram reproduced from (Spencer, 2012)
Ontology dynamics (Flouris et al., 2008)
Ontology Mapping
Ontology Morphism
Ontology Matching
Ontology Articulation
Ontology Translation
Ontology Evolution
Ontology Debugging
Ontology Versioning
Ontology Integration
Ontology Merging
Potential challenges (Ma et al., 2014)
Reworking of the extant data in a data center
–e.g. caused by ontology/vocabulary versioning
Semantic mismatch among data sources
–e.g. heterogeneity in ontologies of the same topic
Different understandings of the same dataset between data providers and data users
–e.g. a data provider understands Quaternary as Ma-present, and a data user understands it as Ma-present
Error propagation in cross-discipline data re-use
–e.g. heterogeneous datasets may cause misconceptions in subsequent work
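The provider/user mismatch described in the challenges can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, and the two boundary ages (2.588 Ma and 1.806 Ma) are commonly cited values for different definitions of the base of the Quaternary, used here as assumptions and not recovered from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalDefinition:
    """One version-specific definition of a named geologic time interval."""
    name: str
    version: str     # hypothetical version label
    base_ma: float   # older boundary, in million years ago (Ma)
    top_ma: float    # younger boundary, in Ma (0.0 = present)

    def contains(self, age_ma: float) -> bool:
        """True if an age in Ma falls inside this definition."""
        return self.top_ma <= age_ma <= self.base_ma


def interpretations_differ(age_ma: float,
                           a: IntervalDefinition,
                           b: IntervalDefinition) -> bool:
    """True if exactly one of the two definitions contains the age."""
    return a.contains(age_ma) != b.contains(age_ma)


# Assumed boundary ages, for illustration only.
provider_def = IntervalDefinition("Quaternary", "provider-version", 2.588, 0.0)
user_def = IntervalDefinition("Quaternary", "user-version", 1.806, 0.0)

for age in (1.0, 2.0, 3.0):
    differs = interpretations_differ(age, provider_def, user_def)
    print(f"{age} Ma: {'mismatch' if differs else 'consistent'}")
```

Under these assumed values, a sample dated at 2.0 Ma is "Quaternary" for the provider but not for the user, which is the kind of silent disagreement that recording the vocabulary version alongside the data would expose.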
A few recent works of interest
OneGeology-Europe
20 European nations providing national geologic maps at scale ~1:1M
Harmonized geological terms and map legends
Multilingual labels in 18 languages
Central portal for data browsing/query among distributed data sources
A contribution to INSPIRE
Federated query: Results for geologic units with age ‘Cenozoic - from 66 million years to today’
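The idea behind such a federated query can be sketched as follows: the same age filter is sent to each distributed source and the matches are merged into one result list. This is a minimal in-memory sketch, not the portal's actual implementation. The source names, unit names, and age bounds are made up, and the choice to return only units lying entirely within the queried interval is an assumption.

```python
from typing import Dict, List, Tuple

# (unit name, older bound in Ma, younger bound in Ma); all records are made up.
Unit = Tuple[str, float, float]

SOURCES: Dict[str, List[Unit]] = {
    "survey_a": [("Unit A1", 23.0, 5.3), ("Unit A2", 145.0, 66.0)],
    "survey_b": [("Unit B1", 2.6, 0.0), ("Unit B2", 300.0, 250.0)],
}


def query_source(units: List[Unit], older_ma: float, younger_ma: float) -> List[str]:
    """Return names of units whose age range lies entirely within the interval."""
    return [
        name
        for name, unit_older, unit_younger in units
        if unit_older <= older_ma and unit_younger >= younger_ma
    ]


def federated_query(sources: Dict[str, List[Unit]],
                    older_ma: float,
                    younger_ma: float) -> List[Tuple[str, str]]:
    """Apply the same age filter to every source and merge the results."""
    results: List[Tuple[str, str]] = []
    for source_name, units in sources.items():
        for name in query_source(units, older_ma, younger_ma):
            results.append((source_name, name))
    return results


# Cenozoic as stated on the slide: from 66 million years ago to today.
cenozoic_units = federated_query(SOURCES, older_ma=66.0, younger_ma=0.0)
print(cenozoic_units)
```

The merge is only meaningful if every source expresses age against the same harmonized time scale, which is why the harmonized terms and legends come first.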
Recently finished CGI vocabularies
Earth Resource Form; Environmental Impact Value; Exploration Activity Type; Exploration Result; UNFC Value; Earth Resource Expression; Earth Resource Shape; Enduse Potential; Mineral Occurrence Type; Mining Activity Type; Processing Activity Type; Mining Waste Type Value; Commodity Code; Mineral Deposit Group; Mineral Deposit Type; Product Value
Construct a collection of vocabularies for populating information interchange documents and enabling interoperability
Provide labels for concepts, scoped to various communities defined by language, science domain, or application domain
CGI Geoscience Terminology Workgroup geoscience_terminology_working_group.html
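Providing labels for a concept, scoped by language, can be pictured as a lookup keyed by a stable concept identifier. The sketch below is an assumption-laden illustration: the identifiers, the dictionary layout, and the `label_for` helper are hypothetical and do not reflect how the CGI vocabularies are actually encoded or served.

```python
from typing import Dict, Optional

# Hypothetical concept identifiers mapped to labels per language code.
VOCABULARY: Dict[str, Dict[str, str]] = {
    "concept/granite": {"en": "granite", "de": "Granit", "it": "granito"},
    "concept/gneiss": {"en": "gneiss", "de": "Gneis"},
}


def label_for(concept_id: str, language: str, fallback: str = "en") -> Optional[str]:
    """Return the label in the requested language, else the fallback language.

    Returns None when the concept is unknown or has no usable label.
    """
    labels = VOCABULARY.get(concept_id)
    if labels is None:
        return None
    return labels.get(language, labels.get(fallback))


print(label_for("concept/granite", "de"))
print(label_for("concept/gneiss", "it"))
```

Keeping the identifier separate from its labels lets datasets reference the concept itself, so adding a language or correcting a label does not require reworking the data.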
USGS Online Geologic Maps
Standardized vocabulary with detailed annotation
Forward and backward queries between spatial data and attribute data
Links to further data sources, e.g. aeromagnetic survey, mineral resources data, soils, geochemical samples, etc.
state/map.html
Records of a point in the San Francisco area
Recommendations
Communities of practice on ontology and vocabulary
–Bottom-up, self-organized, with loose top-down control
Formalize the ‘Concept’ step in a data life cycle
–Top-down, adopting outputs from the bottom-up approach
Create a virtuous circle between the bottom-up and top-down approaches
Thanks for