Published by Kelly Campbell. Modified over 9 years ago.
1
An Open-World Iterative Methodology for the Development and Evaluation of Semantically-Enabled Applications
IAAI - Session 23F, Robert S. Engelmore Award* Lecture
Standing in for Deborah L. McGuinness*
July 17, 2013
Peter Fox (RPI), pfox@cs.rpi.edu, @taswegian
Tetherless World Constellation
2
Outline
Origins of this effort and the working premise
Semantics in 2004
Ontologies and the software and production!
Semantics between 2004 and 2009
The design and development methodology
The expressivity and implementability balance
and one more … 2009-2013 …
And, a bit about what we are up to and where it is going…
3
Origins (1) …
In 2000-2001 the need for capturing and preserving knowledge in Earth sciences data settings became very clear, but the barriers were high
In 2004 we started a virtual observatory project based on semantic technologies
4
Content: Coupling Energetics and Dynamics of Atmospheric Regions Community data archive for observations and models of Earth's upper atmosphere and geophysical indices and parameters needed to interpret them. Includes browsing capabilities by periods, instruments, models, …
5
Content: Mauna Loa Solar Observatory Near real-time data from Hawaii from a variety of solar instruments. Source for space weather, solar variability, and basic solar physics
6
Working premise
Scientists – actually ANYONE – should be able to access a global, distributed knowledge base of scientific data and information that:
–appears to be integrated
–appears to be locally available
–is accessible in a suitable vocabulary
Then BAM - Data intensive – volume, complexity, mode, scale, heterogeneity, … in an OPEN WORLD
7
Origins (2) …
Use case driven – in solar and solar-terrestrial physics with an emphasis on instrument-based measurements and real data pipelines; we needed implementations
We knew we also needed integration and provenance (but that came later)
We aimed to push semantics into our systems to build new ‘prototypes’ but we ‘failed’ ;-)
8
Early days of VOs
[Diagram labels: VO 1, VO 2, VO 3, …; DB 1, DB 2, DB 3, …, DB n; "?"]
9
The Astronomy approach; data-types as a service
[Diagram labels: VO App 1, VO App 2, VO App 3, …; VO layer: VOTable, Simple Image Access Protocol, Simple Spectrum Access Protocol, Simple Time Access Protocol; DB 1, DB 2, DB 3, …, DB n]
Limited interoperability
Lightweight semantics
Limited meaning, hard coded
Limited extensibility
10
In 2004
11
Design and Development
Only develop ontologies that were required to answer specific use cases
Use whatever ontologies were available**
Would rules be needed?
We ignored query*
12
Science and technical use cases
Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
Provide semantically-enabled, smart data query services via the Web for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
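The second use case asks for constraints on Instrument, Date-Time, and Parameter that can be supplied in any order and any combination. A minimal sketch of that idea follows, assuming a flat in-memory list of records; the `Record` type, the `select` function, and the sample rows are hypothetical illustrations and are not taken from VSTO.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one catalog entry."""
    instrument: str
    parameter: str
    day: date


def select(records, instrument=None, parameter=None, date_range=None):
    """Apply whichever constraints are given; an omitted constraint matches everything."""
    result = []
    for r in records:
        if instrument is not None and r.instrument != instrument:
            continue
        if parameter is not None and r.parameter != parameter:
            continue
        if date_range is not None and not (date_range[0] <= r.day <= date_range[1]):
            continue
        result.append(r)
    return result


# Made-up sample rows, for illustration only.
RECORDS = [
    Record("Millstone Hill Fabry-Perot", "neutral temperature", date(2000, 1, 15)),
    Record("Millstone Hill Fabry-Perot", "hypothetical parameter", date(2000, 2, 10)),
    Record("Hypothetical instrument", "neutral temperature", date(2001, 6, 21)),
]

january_2000 = (date(2000, 1, 1), date(2000, 1, 31))
by_parameter = select(RECORDS, parameter="neutral temperature")
by_instrument_and_time = select(
    RECORDS, date_range=january_2000, instrument="Millstone Hill Fabry-Perot"
)
```

Because every constraint is an optional keyword argument, callers can pass them in any order and leave any of them out, which is the property the use case calls for.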
13
Use Case example
Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the non-vertical mode during January 2000 as a time series.
Objects:
–Neutral temperature is a (temperature is a) parameter
–Millstone Hill is a (ground-based observatory is a) observatory
–Fabry-Perot is a interferometer is a optical instrument is a instrument
–Non-vertical mode is a instrument operating mode
–January 2000 is a date-time range
–Time is a independent variable/coordinate
–Time series is a data plot is a data product
14
Knowledge representation
Statements as triples: {subject-predicate-object}
–interferometer is-a optical instrument
–Fabry-Perot is-a interferometer
–Optical instrument has focal length
–Optical instrument is-a instrument
–Instrument has instrument operating mode
–Instrument has measured parameter
–Instrument operating mode has measured parameter
–NeutralTemperature is-a temperature
–Temperature is-a parameter
A query*: select all optical instruments which have operating mode vertical
An inference: infer operating modes for a Fabry-Perot Interferometer which measures neutral temperature
ISWC paper award 2006, IAAI best paper (2007), Fox et al. 2009 in Computers and Geosciences.
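The triples on this slide can be explored with a few lines of plain Python. The sketch below is only an illustration of following is-a links and inheriting "has" statements; it is not the VSTO implementation, it does not use OWL or a reasoner, and the identifiers are simplified spellings of the slide's terms.

```python
# Triples copied from the slide; identifiers are simplified for illustration.
TRIPLES = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("FabryPerot", "is-a", "Interferometer"),
    ("OpticalInstrument", "has", "FocalLength"),
    ("OpticalInstrument", "is-a", "Instrument"),
    ("Instrument", "has", "InstrumentOperatingMode"),
    ("Instrument", "has", "MeasuredParameter"),
    ("InstrumentOperatingMode", "has", "MeasuredParameter"),
    ("NeutralTemperature", "is-a", "Temperature"),
    ("Temperature", "is-a", "Parameter"),
}


def ancestors(term, triples):
    """Return every class reachable from term by following is-a links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is-a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inherited_properties(term, triples):
    """Collect 'has' objects declared on term or on any of its ancestors."""
    classes = {term} | ancestors(term, triples)
    return {o for s, p, o in triples if p == "has" and s in classes}


def subclasses_of(term, triples):
    """Return every class that has term among its ancestors."""
    subjects = {s for s, p, _ in triples if p == "is-a"}
    return {s for s in subjects if term in ancestors(s, triples)}


fabry_perot_properties = inherited_properties("FabryPerot", TRIPLES)
optical_instruments = subclasses_of("OpticalInstrument", TRIPLES)
```

Here `fabry_perot_properties` shows that a Fabry-Perot inherits an operating mode and a measured parameter from Instrument, which is the kind of step the slide's inference example relies on.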
15
Fox - APAC 2007, Driving e-research: Grids and Semantics
[Diagram labels: VO Portal, Web Serv., VO API, …; DB 1, DB 2, DB 3, …, DB n; Semantic mediation layer - VSTO - low level; Semantic mediation layer – mid-upper-level; Education, clearinghouses, other services, disciplines, etc.]
[Diagram labels: Metadata, schema, data; Query, access and use of data; Semantic query, hypothesis and inference; Semantic interoperability; Added value]
Mediation Layer
Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
Maps queries to underlying data
Generates access requests for metadata, data
Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
16
Partial exposure of Instrument class hierarchy - users seem to LIKE THIS
Semantic filtering by domain or instrument hierarchy
17
17
18
Inferred plot type and return required axes data
19
Semantic Web Services
20
Semantic Web Services
OWL document returned using VSTO ontology - can be used either syntactically or semantically
21
http://escience.rpi.edu/schemas/vsto_all.owl
22
Developing ontologies
Use cases and small team
Identify classes and minimal properties (leverage controlled vocab.)
Review, vet, publish
Only code them (in RDF or OWL) when needed (CMAP, …)
Ontologies: small and modular
23
Implications and OWL 1.0
Lack of numeric support meant that the rules and procedural logic were implemented in Java, i.e. in the code
On several occasions the tools (not to be named) pushed us into OWL-Full, introduced inconsistencies, etc.
Finally, they stabilized, and in 2005 (and again in 2006 and twice in 2007) we had stable releases
24
Expressivity VSTO 1.0
25
Expressivity VSTO dev. version
26
Yikes
27
Ontologies and the software
Protégé 2.x and then 3.x built from our ontology on the web
Java class generation
Eclipse as a development environment
Leveraged a portal code base (from the Earth System Grid project)
28
Implementation choices
Our big challenge was time – in use cases and in the representation
–Depending on the level of granularity there were > 200,000 day-time records, and > 70,000,000 sub-day time intervals – no triple store could handle this**
Reasoning in finite time does not mean 3-4 secs!
We descoped our effort to delay use cases such as: find all neutral temperature data around the summer solstice for the last decade
We chose a minimal time encoding in the ontology and delegated time handling to a relational DB
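The last point, keeping time out of the ontology and handing it to a relational database, can be sketched as follows. This is only an illustration using Python's built-in `sqlite3`; the `coverage` table, its columns, and the dataset names are hypothetical, and the slides do not say which database or schema was actually used.

```python
import sqlite3


def build_time_index(rows):
    """Load (dataset, start_time, end_time) rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE coverage (dataset TEXT, start_time TEXT, end_time TEXT)"
    )
    conn.executemany("INSERT INTO coverage VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_coverage ON coverage (start_time, end_time)")
    return conn


def datasets_overlapping(conn, candidates, start, end):
    """Keep only the candidate datasets whose coverage overlaps [start, end].

    `candidates` would come from the semantic layer (instrument and parameter
    reasoning); the time filter is done entirely in SQL. Timestamps are ISO 8601
    strings, which compare correctly as plain text.
    """
    if not candidates:
        return []
    placeholders = ",".join("?" for _ in candidates)
    sql = (
        "SELECT DISTINCT dataset FROM coverage "
        f"WHERE dataset IN ({placeholders}) "
        "AND start_time <= ? AND end_time >= ? "
        "ORDER BY dataset"
    )
    return [row[0] for row in conn.execute(sql, [*candidates, end, start])]


# Made-up coverage rows, for illustration only.
conn = build_time_index([
    ("ds_a", "2000-01-01T00:00:00", "2000-01-31T23:59:59"),
    ("ds_b", "2000-02-01T00:00:00", "2000-02-28T23:59:59"),
    ("ds_c", "2000-01-15T00:00:00", "2000-02-15T23:59:59"),
])
hits = datasets_overlapping(
    conn, ["ds_a", "ds_b"], "2000-01-10T00:00:00", "2000-01-20T00:00:00"
)
```

The split keeps the triple store small: the ontology decides which datasets are semantically relevant, and the indexed SQL range test handles the millions of time intervals.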
29
29
30
VSTO - semantics and ontologies in an operational environment: www.vsto.org
Web Service
31
Semantic Web Benefits (1)
Unified/abstracted query workflow: Parameters, Instruments, Date-Time
Decreased input requirements for query: in one case reducing the number of selections from eight to three
Generates only syntactically correct queries: something that could not always be ensured in previous implementations without semantics
32
Semantic Web Benefits (2)
Semantic query support: only expose coherent queries (portal and services)
Semantic integration: mediated understanding of coordinate systems, relationships, data synthesis, transformations; returns independent variables and related parameters
A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)
33
Evaluation
Formative and Summative
34
Iteration
We needed the ability to evolve the ontology and not break the framework
As we broadened re-use of these ontologies and creation of new ones
35
Maintenance
Support for collaborative feedback, evolution
Change management
Support for ‘comments’ and ‘annotations’, i.e. self-documentation
Package management: creation, dependency, consistency checking
36
Semantics between 2004 and 2009
Ontologies were needed for data integration and provenance and mediation for data mining
Protégé 3.x and then 4.0 came out
SWOOP development was interrupted
Cmap added OWL predicate support*
SPARQL became a recommendation
Triple stores exploded in use and capability
Linked Open Data started to take off
Pellet 2.0 came out
We invaded OWLED 2006, 2007, 2009, (2010)
37
Semantics approach
[Diagram labels: Use cases; Stakeholders; Distributed authority; Access control; Ontologies; Maintaining Identity]
38
Working with knowledge
[Diagram labels: Expressivity; Maintainability/Extensibility; Implementability; Query; Rule execution; Inference]
39
Expressivity/Implementation
[Diagram labels: Declarative; Procedural; Linked open data; URI/http/RDF; *; Ontology encoded]
40
Semantics between 2009 and 2013
Semantic data frameworks (SeSF)
Substantial knowledge provenance work
Data quality, uncertainty and bias representations and applications
–Multi-sensor advisor**
Applications:
–Sea Ice, Carbon Observatory, Integrated Ecosystem Assessments, globalchange.gov, ocean.data.gov, energy.data.gov ….
41
Anomaly Example: South Pacific Anomaly
MODIS Level 3 dataday definition leads to artifact in correlation
42
…is caused by an Overpass Time Difference
43
RuleSet Development
[DiffNEQCT:
    (?s rdf:type gio:RequestedService),
    (?s gio:input ?a), (?a rdf:type gio:DataSelection),
    (?s gio:input ?b), (?b rdf:type gio:DataSelection),
    (?a gio:sourceDataset ?a.ds), (?b gio:sourceDataset ?b.ds),
    (?a.ds gio:fromDeployment ?a.dply), (?b.ds gio:fromDeployment ?b.dply),
    (?a.dply rdf:type gio:SunSynchronousOrbitalDeployment),
    (?b.dply rdf:type gio:SunSynchronousOrbitalDeployment),
    (?a.dply gio:hasNominalEquatorialCrossingTime ?a.neqct),
    (?b.dply gio:hasNominalEquatorialCrossingTime ?b.neqct),
    notEqual(?a.neqct, ?b.neqct)
    ->
    (?s gio:issueAdvisory giodata:DifferentNEQCTAdvisory)]
Multi-sensor Data Synergy Advisor (NASA), Leptoukh, Lynnes, Zednik, et al.
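The DiffNEQCT rule reads: if a requested service takes two data selections whose source datasets come from sun-synchronous deployments with different nominal equatorial crossing times, issue a DifferentNEQCTAdvisory. A plain-Python restatement of that logic is sketched below. It is not the advisor's code and does not run a rule engine; the `Deployment` and `DataSelection` classes, the dataset names, and the crossing times are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Deployment:
    """Hypothetical stand-in for a gio deployment individual."""
    name: str
    sun_synchronous: bool
    neqct: str  # nominal equatorial crossing time, e.g. "10:30"


@dataclass(frozen=True)
class DataSelection:
    """Hypothetical stand-in for one input of a requested service."""
    dataset: str
    deployment: Deployment


def neqct_advisories(inputs):
    """Return one advisory per pair of inputs whose crossing times differ.

    The original rule matches ordered pairs and attaches a single advisory to
    the service; this sketch reports each unordered pair once instead.
    """
    found = []
    for a, b in combinations(inputs, 2):
        da, db = a.deployment, b.deployment
        if da.sun_synchronous and db.sun_synchronous and da.neqct != db.neqct:
            found.append(("DifferentNEQCTAdvisory", a.dataset, b.dataset))
    return found


# Illustrative inputs; names and times are assumptions, not from the slides.
morning = Deployment("morning platform", True, "10:30")
afternoon = Deployment("afternoon platform", True, "13:30")
request_inputs = [
    DataSelection("dataset A", morning),
    DataSelection("dataset B", afternoon),
]
advisories = neqct_advisories(request_inputs)
```

Comparing two selections from the same deployment, or from deployments that are not sun-synchronous, produces no advisory, matching the rule's conditions.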
44
Advisor Knowledge Base
Advisor Rules test for potential anomalies, create association between service metadata and anomaly metadata in Advisor KB
45
Semantic Advisor Architecture RPI Multi-sensor Data Synergy Advisor (NASA), Leptoukh, Lynnes, Zednik, et al.
46
Advisory Report
Tabular representation of the semantic equivalence of comparable data source and processing properties
Advise of and describe potential data anomalies/bias
47
Advisory Report (Dimension Comparison Detail)
48
Going forward…
No surprise – lots of applications
–Water quality, environmental conditions
–Global change information system; traceable accounts
–Deep Carbon Observatory
–And … maturing ontologies
Discrete and Continuous Additive effects (in Event Calculus) – driven by the desire to explore physical processes in Earth’s atmosphere when a volcano erupts
49
RPI Tetherless World Constellation, tw.rpi.edu
[Diagram labels: Themes; Lots of RDF; Web Infrastructure; Scaling and Distributed Query and Reasoning; AI, Rule reasoning; Visualizing SW ‘data’; Social SW, SMWiki; Policy Lang; Ontologies/Tools, …]
[Diagram labels: Future Web; Web Science; Policy; Social; Xinformatics; Data Science; Semantic eScience; Data Frameworks; Semantic Foundations; Knowledge Provenance; Inference; Trust]
[Diagram labels: Hendler; Fox; McGuinness; + ~ 40 = Post-doc, Staff, Grad, UGrad; Government Data; Health care/Life Sciences; Environmental Informatics; Luciano, Erickson]
50
Respect vocabularies!
51
Discovering new data
52
Core and Framework Semantics - Multi-tiered interoperability used by
53
High-level architecture
54
Summary (thanks to AI)
In 2004 we set out to build a prototype and ended up with a production semantic data framework
–Languages and tools served us well (standards)
Even with modest expressivity we challenged the tools of the time and made many compromises
All along the way, we evaluated our ontology developments and implementations to gauge the benefits of semantics
Maintainability, esp. modularization, drove new expressivity needs
Xinformatics and a repeatable methodology are the key (information models) - we continue to need to bridge computer science and application communities (“It’s the language stupid”, i.e. semantics)
56
Contact
pfox@cs.rpi.edu, dlm@cs.rpi.edu
http://tw.rpi.edu
@taswegian
57
Ontology Spectrum
[Diagram labels: Catalog/ID; Selected Logical Constraints (disjointness, inverse, …); Terms/glossary; Thesauri “narrower term” relation; Formal is-a; Frames (properties); Informal is-a; Formal instance; Value Restrs.; General Logical constraints]
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness.
Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
58
Semantic Web Layers
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/