Presentation is loading. Please wait.

Presentation is loading. Please wait.

In Search of What Some of It Means RDA Semantics and Metadata Workshop Feb 23, 2015 Peter Fox (RPI) Tetherless World Constellation.

Similar presentations


Presentation on theme: "In Search of What Some of It Means RDA Semantics and Metadata Workshop Feb 23, 2015 Peter Fox (RPI) Tetherless World Constellation."— Presentation transcript:

1 In Search of What Some of It Means RDA Semantics and Metadata Workshop Feb 23, 2015 Peter Fox (RPI) pfox@cs.rpi.edupfox@cs.rpi.edu Tetherless World Constellation

2 Metadata and documentation

3 Not more code!

4 Spectral synthesis components and flow

5 Getting the metadata?

6 6 What I wanted ~ 1994-6 Scientists should be able to access a global, distributed knowledge base of scientific data that: appears to be integrated appears to be locally available But… data is obtained by multiple means (instruments, models, analysis) using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non- existent) metadata. It may be inconsistent, incomplete, evolving, and distributed. And, it is almost always created in a manner to facilitate its generation not its use. And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

7 What I was doing… pro read_spec, spectra_name, description, auxiliary_info, model_size, mu_size, wave_size, model, smodel, mu, wave0, wavelength, intensity, brightness_temperature, index1, index2, percent ncopts = 0; description_start=0 description_edges=80 i=0 j=0 k=0 ; Construct the DB filename ncid=ncdf_open(string(getenv("SPECTRA "))) inq_struct=ncdf_inquire(ncid) ; /* get dimension info */ tmp_id = ncdf_dimid(ncid, "comment_dim") ncdf_diminq,ncid, tmp_id, dummy, comment_dim tmp_id=ncdf_dimid(ncid, "mu_dim") ncdf_diminq,ncid, tmp_id, dummy, mu_dim tmp_id=ncdf_dimid(ncid, "wave_dim") ncdf_diminq,ncid, tmp_id, dummy, wave_dim tmp_id=ncdf_dimid(ncid, "model_dim") ncdf_diminq,ncid, tmp_id, dummy, model_dim tmp_id=ncdf_dimid(ncid, "smodel_dim") ncdf_diminq,ncid, tmp_id, dummy, smodel_dim tmp_id=ncdf_dimid(ncid, "item_dim") ncdf_diminq,ncid, tmp_id, dummy, item_dim

8 What I was doing… etc. tmp_id = ncdf_varid (ncid, "description") ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, description ; Id's for variables tmp_id=ncdf_varid(ncid, "spectra_name") ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, spectra_name tmp_id=ncdf_varid(ncid, "auxiliary_info") ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, auxiliary_info tmp_id=ncdf_varid(ncid, "model_size") ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=item_dim, model_size start=intarr(1) edges=intarr(1) start(0)=0 edges(0)=model_size tmp_id=ncdf_varid(ncid, "mu_size") ncdf_varget,ncid, tmp_id, mu_size, OFFSET=start, COUNT=edges tmp_id=ncdf_varid(ncid, "model") ncdf_varget,ncid, tmp_id, model, OFFSET=start, COUNT=edges start=intarr(2) edges=intarr(2) start(0)=0 edges(0)=smodel_dim start(1)=0 edges(1)=model_size tmp_id=ncdf_varid(ncid, "smodel") ncdf_varget,ncid, tmp_id, smodel, OFFSET=start, COUNT=edges

9 What does It all Mean?

10 Some version of this… 10 DataInformationKnowledge Context Presentation Organization Integration Conversation Creation Gathering Experience ~Metadata?

11 It and Meaning It = things that matter –Context Meaning = duh -> semantics Relations!! Real ones! But it was more than that, though that often comes later… –Syntax (structure/form) –Semantics (meaning) –Pragmatics (use)

12 Metadata-Information- Knowledge Ecosystem 12 MetadataInformationKnowledge Context Formalization Organization Integration Shared Conceptualization Creation Gathering Experience

13 Provenance Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility Provenance: metadata in a given context! Swallow that. Knowledge provenance; meaning and relations in multiple contexts!

14 Perfect is the enemy of the good… (thanks Voltaire)

15 Origins … In 2000-2001 the need for capturing and preserving knowledge in science data became very clear but the barriers were high In 2004 we started a virtual observatory project based on semantic technologies Use case driven – in solar and solar-terrestrial physics with an emphasis on instrument-based measurements and real data pipelines; we needed implementations We knew we also needed integration and provenance (but that came later) We aimed to push semantics into our systems to build new ‘prototypes’ but we ‘failed’ ;-) Tetherless World Constellation15

16 In 2004 2004 – OWL was a W3 recommendation!! Protégé 2.x and the Protégé-Java-OWL API SWOOP was a viable editor Jena and the Jena API were in good shape Pellet worked SPARQL was still a twinkle in the RDF working group’s eye Semantics were still the realm of computer scientists Tetherless World Constellation16

17 Design and Development We made a conscious decision only to develop ontologies that were required to answer specific use cases and migrate metadata –Both Classes AND Properties (uh-oh…) We made a conscious effort to use whatever ontologies were available (cf. trends in metadata… nuff said) We were pretty sure that rules would be needed (complex logic or late semantic binding) We ignored query (see implementation) Tetherless World Constellation17

18 18 Use Case example Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the non-vertical mode during January 2000 as a time series. –Meanings and relations Objects=Things! –Neutral temperature is a (temperature is a) parameter –Millstone Hill is a (ground-based observatory is a) observatory –Fabry-Perot is a interferometer is a optical instrument is a instrument –Non-vertical mode is a instrument operating mode –January 2000 is a date-time range –Time is a independent variable/ coordinate –Time series is a data plot is a data product Metadata just appeared everywhere…

19 19 Knowledge representation Statements as triples: {subject-predicate-object} interferometer is-a optical instrument Fabry-Perot is-a interferometer Optical instrument has focal length Optical instrument is-a instrument Instrument has instrument operating mode Instrument has measured parameter Instrument operating mode has measured parameter NeutralTemperature is-a temperature Temperature is-a parameter A query*: select all optical instruments which have operating mode vertical An inference: infer operating modes for a Fabry-Perot Interferometer which measures neutral temperature

20 Semantics - Modern informatics enables a new scale-free** framework approach Use cases Stakeholders Distributed authority Access control Ontologies Maintaining Identity

21 21 Developing ontologies (c. 2005) Use cases and small team (7-8; 2-3 domain/ data experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe) Identify classes and minimal properties (leverage controlled vocab.) –Start with narrower terms, generalize when needed or possible –Adopt a suitable conceptual decomposition (e.g. SWEET) –Import modules when concepts are orthogonal –Add service classes and properties where needed Review, vet, publish Only code them (in RDF or OWL) when needed (CMAP, …) Ontologies: small and modular

22 Semantics between 2004 and 2009 Ontologies were needed for data integration and provenance and mediation for data mining Protégé 3.x and then 4.0 came out SWOOP development was interrupted Cmap added OWL predicate support* SPARQL became a recommendation Triple stores exploded in use and capability Linked Open Data started to take off Pellet 2.0 came out I used the “M” word less frequently! Tetherless World Constellation22

23 Working with knowledge Expressivity Maintainability/ Extensibility Implementability

24 Working with semantics Query Rule execution Inference

25 Semantics between 2009 and now Semantic data framework (SeSF) Substantial knowledge provenance work Data quality, uncertainty and bias representations and applications (oh, these are in production at NASA) Multi-sensor data synergy advisor Applications: –Sea Ice, Carbon Observatory, Integrated Ecosystem Assessments, globalchange.gov, ocean.data.gov, energy.data.gov …. Tetherless World Constellation25

26 Respect and Mediation … how

27 Discovering new data

28 NCA links to GCIS entities 28 http://data.globalchange.gov

29 Information model 29 Ontology

30 Core and Framework Semantics - Multi-tiered interoperability used by

31 Closing thoughts Go ahead, create all the metadata you want, we’ll “materialize” some of it into triples based on semantics for use! Go ahead, create all the schema and encodings you want but remember – semantics now lives in an open-world (some of it). You are not the only source of metadata. Not all formal. Link over map. Semantics make metadata useful but we do not need all of your metadata Tetherless World Constellation31

32 Contact pfox@cs.rpi.edu http://tw.rpi.edu @taswegian


Download ppt "In Search of What Some of It Means RDA Semantics and Metadata Workshop Feb 23, 2015 Peter Fox (RPI) Tetherless World Constellation."

Similar presentations


Ads by Google