Foundations I: Methodologies, Knowledge Representation

Presentation transcript:

Foundations I: Methodologies, Knowledge Representation Professor Deborah McGuinness TA-Weijing Chen Other lectures from Professor Peter Fox, Professor Joanne Luciano, grad student Jim McCusker, and possibly others from http://tw.rpi.edu/web/People CSCI 6962 - 01, 86933 , CSCI 4969 - 01, 87927 ITWS 6960 - 01, 87198 , ITWS 4969 - 01, 87928 Week 2, September 12, 2011

Review of reading Assignment 1 Ontologies 101, Semantic Web, e-Science, RDFS, OWL guide Any comments, questions? One pass around room on highlights

Contents Review of methodologies Elements of KR in semantic web context And in e-Science Choices of representation, models Examples of KR Encoding and understanding representations Assignment 1

Semantic Web Methodology and Technology Development Process Establish and improve a well-defined methodology vision for Semantic Technology-based application development Leverage controlled vocabularies, etc. Adopt Technology Approach Leverage Technology Infrastructure Science/Expert Review & Iteration Rapid Prototype Open World: Evolve, Iterate, Redesign, Redeploy Use Tools Evaluation Analysis Use Case Develop model/ ontology Small Team, mixed skills

KR and methodologies Procedural Knowledge: Knowledge is encoded in functions/procedures. This can be viewed as hard-coded and less flexible. E.g.: function Person(X) return boolean is if (X = "Socrates") or (X = "Hillary") then return true else return false; OR function Mortal(X) return boolean is return Person(X); Networks: A compromise between declarative and procedural schemes. Knowledge is represented in a labeled, directed graph whose nodes represent concepts and entities, while its arcs represent relationships between these entities and concepts. Adapted from Deepak Kumar (Bryn Mawr) updated 2009

KR and methodologies Frames: Much like a semantic network, except each node represents prototypical concepts and/or situations. Each node has several property slots whose values may be specified or inherited. Logic: A way of declaratively representing knowledge. For example: person(Socrates). person(Hillary). forall X [person(X) ---> mortal(X)] Variants: description logic (DL), first-order logic (FOL), higher-order logic (HOL) Adapted from Deepak Kumar (Bryn Mawr) updated 2009
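A short sketch of both ideas, under stated assumptions: the `Frame` class, its slot names, and the `forward_chain` helper are hypothetical and exist only for this example. Frames look up a slot locally and then in the parent frame; the logic part applies the slide's single rule, person(X) implies mortal(X), until nothing new can be derived.

```python
class Frame:
    """A frame with named slots; missing slots are inherited from the parent."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


person_frame = Frame("Person", mortal=True)
socrates = Frame("Socrates", parent=person_frame, occupation="philosopher")

# Declarative logic: facts plus a rule, kept as data rather than code.
facts = {("person", "Socrates"), ("person", "Hillary")}
rules = [("person", "mortal")]  # forall X [person(X) ---> mortal(X)]


def forward_chain(facts, rules):
    """Apply unary rules repeatedly until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, arg in list(derived):
                if predicate == premise and (conclusion, arg) not in derived:
                    derived.add((conclusion, arg))
                    changed = True
    return derived
```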

KR and methodologies Decision Trees: Concepts are organized in the form of a tree. Statistical Knowledge: The use of certainty factors, Bayesian networks, Dempster-Shafer theory, fuzzy logic, etc. Rules: The use of production systems to encode condition-action rules (as in expert systems). Adapted from Deepak Kumar (Bryn Mawr)

KR and methodologies Parallel Distributed Processing: The use of connectionist models. Subsumption Architectures: Behaviors are encoded (represented) using layers of simple (numeric) finite-state machine elements. Hybrid Schemes: Any representation formalism employing a combination of KR schemes. Adapted from Deepak Kumar (Bryn Mawr)

Remember, in any knowledge encoding: Some of the knowledge is lost when it is placed into any particular representation structure, or may not be reusable (e.g. Frames). So, you may ask something that cannot be answered or inferred. Knowledge evolves, i.e. changes. Knowledge and understanding are very often context dependent (and discipline, language, and skill-level dependent, and …)

And, if you are used to logic You are working mostly within the world of logic, whereas we are trying to represent knowledge with logic and we are usually dealing with tangible objects, such as trees, clouds, rock, storms, etc. Because of this, we have to be very careful when translating real things into logical symbols - this can, surprisingly, be a difficult challenge. Consider your method of representation (yes, we do want to compute with it) Adapted from Conrad Barski under CC license.

Thus A person who wants to encode knowledge needs to decouple the ambiguities of interpretation from the mathematical certainty of (any form of) logic. The nature of interpretation is critical in formal knowledge representation and is carefully formalized by KR scientists in order to guarantee that no ambiguity exists in the logical structure of the represented knowledge. Adapted from Conrad Barski under CC license.

Representing Knowledge With Objects Take all individuals that we need to keep track of and place them into different buckets based on how similar they are to each other. Each bucket is given a description based on what objects it contains. Since the individuals in a given bucket are at least somewhat similar, we can avoid needing to describe every inconsequential detail about each individual. Instead, properties that are common to all individuals in a bucket can just be assigned to the entire bucket at once. Properties are typically either primitive values (such as numbers or text strings) or may be references to other buckets. Adapted from Conrad Barski under CC license.

Representing Knowledge With Objects Some buckets will be more similar to each other than others and we can arrange the buckets into a hierarchy based on the similarity. If all buckets in a branch in the tree of buckets share a property, the information can be further simplified by assigning the property only to the parent bucket. Other buckets (and individuals) are said to inherit that property. Buckets may have different names: e.g. Classes, Frames, or Nodes BUT, once we move to (e.g.) DL, not all object rules apply, e.g. cannot override properties Multiple inheritance is not always obvious to people Adapted from Conrad Barski under CC license.
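The remark that properties cannot be overridden once we move to DL can be illustrated with a small contrast. This is a simplified sketch, not a description logic reasoner: the `Bird`/`Penguin` names are hypothetical, and `can_fly` is treated as single-valued purely for illustration.

```python
# Object-style buckets: a child bucket may override an inherited property.
class Bird:
    can_fly = True


class Penguin(Bird):
    can_fly = False  # the child value simply replaces the parent's


# DL-style reading (simplified): class-level statements are axioms that hold
# for every member, so a subclass can only add statements, never retract one.
axioms = {
    "Bird": {("can_fly", True)},
    "Penguin": {("can_fly", False)},
}
parents = {"Penguin": "Bird"}


def entailed(cls):
    """Collect the axioms stated on `cls` and on all of its ancestors."""
    collected = set()
    while cls is not None:
        collected |= axioms.get(cls, set())
        cls = parents.get(cls)
    return collected


def is_consistent(cls):
    """Assumes each property is single-valued: two values mean a clash."""
    values = {}
    for prop, value in entailed(cls):
        values.setdefault(prop, set()).add(value)
    return all(len(v) == 1 for v in values.values())
```

Here `Penguin().can_fly` is `False` in the object model, while the axiom-style model reports a clash for `Penguin` instead of silently overriding.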

Re-enter Semantic Web At its core, the Semantic Web can be thought of as a methodology for linking pieces of structured and unstructured information into commonly-shared description logics ontologies. Adapted from Conrad Barski under CC license.

Semantic Web Layers http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/

Elements of KR in Semantic Web Declarative Knowledge Statements as triples: {subject-predicate-object}: interferometer is-a optical instrument; Fabry-Perot is-a interferometer; Optical instrument has focal length; Optical instrument is-a instrument; Instrument has instrument operating mode; Instrument has measured parameter; Instrument operating mode has measured parameter; NeutralTemperature is-a temperature; Temperature is-a parameter. A query: select all optical instruments which have operating mode vertical. An inference: infer operating modes for a Fabry-Perot Interferometer which measures neutral temperature
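The triples on this slide can be written down directly as data. The sketch below uses plain Python tuples instead of an RDF library; the CamelCase identifiers and the helper names `ancestors` and `properties` are choices made for this example. It shows the kind of inference the slide mentions: a Fabry-Perot inherits properties stated on Instrument through the is-a chain.

```python
triples = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("FabryPerot", "is-a", "Interferometer"),
    ("OpticalInstrument", "has", "FocalLength"),
    ("OpticalInstrument", "is-a", "Instrument"),
    ("Instrument", "has", "InstrumentOperatingMode"),
    ("Instrument", "has", "MeasuredParameter"),
    ("InstrumentOperatingMode", "has", "MeasuredParameter"),
    ("NeutralTemperature", "is-a", "Temperature"),
    ("Temperature", "is-a", "Parameter"),
}


def ancestors(term):
    """All terms reachable from `term` by following is-a statements."""
    found, frontier = set(), [term]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p == "is-a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def properties(term):
    """Properties stated on `term` itself or inherited via is-a."""
    scope = {term} | ancestors(term)
    return {o for s, p, o in triples if p == "has" and s in scope}
```

For example, `properties("FabryPerot")` includes `InstrumentOperatingMode` even though no triple states that fact about Fabry-Perot directly.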

Ontology Spectrum (figure), from lightweight to more expressive: Catalog/ID; Terms/glossary; Thesauri “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Selected Logical Constraints (disjointness, inverse, …); General Logical constraints. Originally from AAAI 1999- Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; – updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html

OWL or RDF or OWL 2 RL? In representing knowledge you will need to balance expressivity with implementability OWL (Lite, DL, Full) 1 or 2 and if OWL 2, then which profile? RDF and RDFS Rules, e.g. SWRL or OWL 2 RL You will need to consider the sources of your knowledge You will need to consider what you want to do with the represented knowledge

The knowledge base Using, Re-using, Re-purposing, Extending, Subsetting Approach: Bottom-up (instance level or vocabularies) Top-down (upper-level or foundational) Mid-level (use case) Coding and testing (understanding) Using tools (some this class, more over the next two classes) Iterating (later) Maintaining and evolving (curation, preservation) (later)

‘Collecting’ the ‘data’ Part of the (meta)data information is present in tools ... but thrown away at output. E.g., a business chart can be generated by a tool: it ‘knows’ the structure, the classification, etc. of the chart, but usually this information is lost; storing it in web data would be easy! Semantic Web-aware tools are around (even if you do not know it...), though more would be good: Photoshop CS stores metadata in RDF in, say, jpg files (using XMP); RSS 1.0 feeds are generated by (almost) all blogging systems (a huge amount of RDF data!). Scraping: different tools, services, etc., come around every day to get RDF data associated with images, for example: a service to get RDF from flickr images; a service to get RDF from XMP; XSLT scripts to retrieve microformat data from XHTML files; RSS scraping in use in Virtual Observatory projects in Japan; scripts to convert spreadsheets to RDF. SQL: a huge amount of data is in relational databases. Although tools exist, it is not feasible to convert that data into RDF. Instead, SQL ⇋ RDF ‘bridges’ are being developed: a query to RDF data is transformed into SQL on-the-fly.
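As a rough illustration of the "scripts to convert spreadsheets to RDF" item, the sketch below turns CSV rows into N-Triples lines using only the standard library. The namespace `http://example.org/`, the column names, and the sample row are hypothetical, and the function assumes key values and column names are already safe to place inside an IRI.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace for this example


def rows_to_ntriples(csv_text, key_column):
    """Emit one N-Triples line per non-empty cell; the key column names the subject."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}{column}> "{literal}" .')
    return lines


sheet = "id,name,operatingMode\nfpi1,Fabry-Perot,vertical\n"
for line in rows_to_ntriples(sheet, "id"):
    print(line)
```

Every cell becomes a plain string literal here; a real converter would also need datatype handling and IRI escaping.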

More Collecting RDFa (formerly known as RDF/A) extends XHTML by: extending the link and meta elements to include child elements; adding metadata to any element (a bit like the class in microformats, but via dedicated properties). It is very similar to microformats, but with more rigor: it is a general framework (instead of an “agreement” on the meaning of, say, a class attribute value); terminologies can be mixed more easily. GRDDL: Gleaning Resource Descriptions from Dialects of Languages. ATOM: XML-based Web content and metadata syndication format (used with RSS).

Foundational Ontologies Domain-independent concepts and relations: physical object, process, event, …, participates, … (Usually) rigorously defined: formal logic, philosophical principles, highly structured. Examples: DOLCE – Descriptive Ontology for Linguistic and Cognitive Engineering; SUMO – Suggested Upper Merged Ontology; CYC Upper Level Ontology; BFO – Basic Formal Ontology; GFO – General Formal Ontology (developed by Onto-Med). Adapted from Boyan Brodaric

Foundational Ontologies PURPOSE: help integrate domain ontologies “…and then there was one…” Diagram labels: Foundational ontology; Geophysics ontology; Marine ontology; Water ontology; Planetary ontology; Geology ontology; Struc ontology; Rock ontology. Courtesy: Boyan Brodaric

Foundational Ontologies PURPOSE: help organize domain ontologies “…a place for everything, and everything in its place…” Diagram: Foundational ontology shale rock formation lithification Courtesy: Boyan Brodaric

Problem scenario Little work done on linking foundational ontologies with geoscience ontologies Such linkage might benefit various scenarios requiring cross-disciplinary knowledge, e.g.: water budgets: groundwater (geology) and surface water (hydro) hazards risk: hazard potential (geology, geophysics) and items at threat (infrastructure, people, environment, economic) health: toxic substances (geochemistry) and people, wildlife many others… Courtesy: Boyan Brodaric

DOLCE - Descriptive Ontology for Linguistic and Cognitive Engineering

SUMO - Suggested Upper Merged Ontology Physical Object SelfConnectedObject ContinuousObject CorpuscularObject Collection Process Abstract SetClass Relation Proposition Quantity Number PhysicalQuantity Attribute

BFO – Basic Formal Ontology Snap comes from a snapshot at any given time http://www.ifomis.org/Research/IFOMISReports/IFOMIS%20Report%2005_2003.pdf

Span comes from spanning time; sometimes considered a 4D description

Using SNAP/ SPAN

SWEET 2.0 Modular Design Supports easy extension by domain specialists Organized by subject (theoretical to applied) Reorganization of classes, but no significant changes to content Importation is unidirectional Math, Time, Space Basic Science Geoscience Processes Geophysical Phenomena Applications Needs for ontology package design led to SWEET 2.0 importation

SWEET 2.0 Ontologies

Using SWEET Plug-in (import) domain detailed modules Lots of classes, few relations (properties) Version 2.0 is re-usable and extensible

Mix-n-Match The hybrid example: Collect a lot of different ontologies representing different terms, levels of concepts, etc. into a base form: RDF

Mid-Level: Developing ontologies Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe) Identify classes and properties (leverage controlled vocab.) Start with narrower terms, generalize when needed or possible Adopt a suitable conceptual decomposition (e.g. SWEET) Import modules when concepts are orthogonal Review, vet, publish Only code them (in RDF or OWL) when needed (CMAP, …) Ontologies: small and modular

Use Case example Plot the neutral temperature from the Millstone Hill Fabry-Perot, operating in the non-vertical mode during January 2000, as a time series. Objects: Neutral temperature is a (temperature is a) parameter; Millstone Hill is a (ground-based observatory is an) observatory; Fabry-Perot is an interferometer is an optical instrument is an instrument; Non-vertical mode is an instrument operating mode; January 2000 is a date-time range; Time is an independent variable/coordinate; Time series is a data plot is a data product

Class and property example Parameter Has coordinates (independent variables) Observatory Operates instruments Instrument Has operating mode Instrument operating mode Has measured parameters Date-time interval Data product

Higher level use case Find data which represents the state of the neutral atmosphere above 100km, toward the arctic circle at any time of high geomagnetic activity

Extending the KR for a purpose GeoMagneticActivity has ProxyRepresentation; GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere); Kp is a GeophysicalIndex; hasTemporalDomain: “daily”; hasHighThreshold: xsd_number = 8; Date/time when Kp >= 8. Specification needed for query to CEDARWEB: Instrument; Parameter(s); Operating Mode; Observatory; Date/time; Return-type: data. Input Physical properties: State of neutral atmosphere. Spatial: Above 100km; Toward arctic circle (above 45N). Conditions: High geomagnetic activity. Action: Return Data
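The step "Date/time when Kp >= 8" is the point where the abstract condition "high geomagnetic activity" becomes something a data query can use. A minimal sketch follows, assuming a daily Kp series is available as a mapping from date to value; the sample values are invented for illustration and are not real Kp observations.

```python
from datetime import date

HIGH_THRESHOLD = 8  # hasHighThreshold from the slide

# Hypothetical daily Kp values, invented for illustration only.
kp_daily = {
    date(2000, 1, 10): 3,
    date(2000, 1, 11): 8,
    date(2000, 1, 12): 9,
    date(2000, 1, 13): 5,
}


def high_activity_dates(index, threshold=HIGH_THRESHOLD):
    """Dates on which the proxy index meets or exceeds the threshold."""
    return sorted(day for day, kp in index.items() if kp >= threshold)


print(high_activity_dates(kp_daily))
```

The resulting dates would then fill the Date/time field of the query specification listed on the slide.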

Translating the Use-Case - ctd. NeutralAtmosphere is a subRealm of TerrestrialAtmosphere; hasPhysicalProperties: NeutralTemperature, Neutral Wind, etc.; hasSpatialDomain: [0,360],[0,180],[100,150]; hasTemporalDomain: (blank). NeutralTemperature is a Temperature (which) is a Parameter. Specification needed for query to CEDARWEB: Instrument; Parameter(s); Operating Mode; Observatory; Date/time; Return-type: data. Input Physical properties: State of neutral atmosphere. Spatial: Above 100km; Toward arctic circle (above 45N). Conditions: High geomagnetic activity. Action: Return Data. FabryPerotInterferometer is an Interferometer, (which) is an Optical Instrument (which) is an Instrument; hasFilterCentralWavelength: Wavelength; hasLowerBoundFormationHeight: Height. ArcticCircle is a GeographicRegion; hasLatitudeBoundary: (blank); hasLatitudeUpperBoundary: (blank). GeoMagneticActivity has ProxyRepresentation; GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere); Kp is a GeophysicalIndex; hasTemporalDomain: “daily”; hasHighThreshold: xsd_number = 8; Date/time when Kp >= 8

Knowledge representation - visual UML – Unified Modeling Language: Ontology Definition Metamodel/Meta Object Facility (OMG) for UML; provides standardized notation. CMAP Ontology Editor (concept mapping tool from IHMC - http://cmap.ihmc.us/coe ): drag/drop visual development of classes, subclass (is-a) and property relationships; reads and writes OWL; formal convention (OWL/RDF tags, etc.). White board, text file

Representing processes From Volcano-Atmosphere use case

Is OWL/RDF the only option? No… SKOS - Simple Knowledge Organization System, for taxonomies http://www.w3.org/2004/02/skos/ Annotations (RDFa) – for un- or semi-structured information sources http://www.w3.org/TR/xhtml-rdfa-primer/ http://rdfa.info Atom (and RSS) – for representing syndication feeds – structured http://tools.ietf.org/html/rfc4287 More expressive languages: IKL, CL, … Languages aimed at different paradigms – e.g., rule languages

Query Querying knowledge representations in OWL and/or RDF: SPARQL for RDF http://www.sparql.org/ and http://www.w3.org/TR/rdf-sparql-query/; OWL-QL (for OWL) http://projects.semwebcentral.org/projects/owl-ql/; XQuery (for XML); SeRQL (for Sesame); RDFQuery (RDF). Few as yet for natural language representations. JSON = JavaScript Object Notation
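To give a feel for what a SPARQL basic graph pattern does, here is a toy matcher in plain Python. It is a sketch of the idea only and does not use or imitate any SPARQL engine; the instance names (`fpi1`, `isr1`, `fpi2`) and property names are hypothetical. Terms beginning with `?` are variables, and a solution must bind each variable consistently across all patterns.

```python
def match(patterns, triples, binding=None):
    """Yield one dict of variable bindings per way of satisfying all patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


data = [
    ("fpi1", "type", "OpticalInstrument"),
    ("fpi1", "hasOperatingMode", "vertical"),
    ("isr1", "type", "Radar"),
    ("isr1", "hasOperatingMode", "vertical"),
    ("fpi2", "type", "OpticalInstrument"),
    ("fpi2", "hasOperatingMode", "nonVertical"),
]

# Roughly: SELECT ?i WHERE { ?i :type :OpticalInstrument ; :hasOperatingMode :vertical }
query = [
    ("?i", "type", "OpticalInstrument"),
    ("?i", "hasOperatingMode", "vertical"),
]
print(list(match(query, data)))
```

This corresponds to the earlier example query, "select all optical instruments which have operating mode vertical", minus any subclass reasoning.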

Best practices (some) Ontologies/vocabularies must be shared and reused - swoogle.umbc.edu, bioportal, OOR. Examine ‘core vocabularies’ to start with: SKOS Core: about knowledge systems; Dublin Core: about information resources, digital libraries, with extensions for rights, permissions, digital rights management; FOAF: about people and their organizations; SIOC: about communities; DOAP: on the descriptions of software projects. DOLCE seems the most promising to match science ontologies. Go “Lite” as much as possible, then increase the logic, balancing expressivity vs. implementability. Minimal properties to start, add only when needed

Summary The science of knowledge representation has, throughout its history, consisted of a compromise between pragmatism, scientific rigor, and accessibility to domain experts Many different options for ontology development and encoding, i.e. knowledge representation Sometimes, your choice of representation may need to change based on language and tools availability/ capability… Balancing expressivity and implementability means we favor an object-type, e.g. DL representation (but also suggests the need for a meta-representation: e.g. KIF – Knowledge Interchange Format) Next class (3) – ontology engineering Use cases should drive the functional requirements of both your ontology and how you will ‘build’ one (see class 4)

Upcoming Logistics Next week – Jim McCusker on ontologies. He will run a hands-on workshop walking you through building an ontology. Following week – Peter Fox on use cases. He will introduce the format and also give examples. http://tw.rpi.edu/web/Courses/SemanticeScience/2011

Assignment for Week 2 Assignment 1: Reading: Semantic Web for the Working Ontologist Alternate reading: Pizza Tutorial Assignment 1: Representing Knowledge and Understanding Representations HW1: http://tw.rpi.edu/media/latest/SeS2011_HW.pdf HW2: http://tw.rpi.edu/media/latest/SeS2011_HW2.pdf

Extras

Selected Technical Benefits Integrating Multiple Data Sources Semantic Drill Down / Focused Perusal Statements about Statements Inference Translation Smart (Focused) Search Smarter Search … Configuration Proof and Trust Updated material reused from “The Substance of the Web”. McGuinness and Dean. Semantic Web Applications for National Security. May, 2005. http://www.schafertmd.com/swans/agenda.html

1: Integrating Multiple Data Sources The Semantic Web lets us merge statements from different sources. The RDF Graph Model allows programs to use data uniformly regardless of the source. Figuring out where to find such data is a motivator for Semantic Web Services. Diagram labels: #Ionosphere; hasCoordinates #magnetic; name “Terrestrial Ionosphere”; hasLowerBoundaryValue “100”; hasLowerBoundaryUnit “km”. Different line & text colors represent different data sources, merged because of the use of common URIs; this is a key improvement of RDF over pure XML.
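Because every statement has the same three-part shape and shared identifiers, merging sources reduces to a set union. The sketch below assumes one plausible reading of the diagram labels (which statement came from which source is a guess) and stores the lower boundary as a number with a separate unit.

```python
# Two hypothetical sources describing the same resource, "#Ionosphere".
source_a = {
    ("#Ionosphere", "name", "Terrestrial Ionosphere"),
    ("#Ionosphere", "hasCoordinates", "#magnetic"),
}
source_b = {
    ("#Ionosphere", "hasLowerBoundaryValue", 100),
    ("#Ionosphere", "hasLowerBoundaryUnit", "km"),
}

# Merging graphs is a set union; the shared identifier links the statements.
merged = source_a | source_b


def describe(subject, graph):
    """Collect every property/value pair stated about `subject`."""
    return {p: o for s, p, o in graph if s == subject}


print(describe("#Ionosphere", merged))
```

The dictionary returned by `describe` is only suitable when each property has a single value per subject, which holds for this small example.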

2: Drill Down / Focused Perusal The Semantic Web uses Uniform Resource Identifiers (URIs) to name things. These can typically be resolved to get more information about the resource. This essentially creates a web of data analogous to the web of text created by the World Wide Web. Ontologies are represented using the same structure as content: we can resolve class and property URIs to learn about the ontology. For example, we can resolve the URI of the organization that someone worksFor to learn more about it. Diagram labels: …#NeutralTemperature, …#Norway, ...#ISR, ...#FPI, …#EISCAT, ...#MillstoneHill, Internet; properties: measuredby, locatedIn, type, operatedby.

3: Statements about Statements The Semantic Web allows us to make statements about statements: Timestamps; Provenance / Lineage; Authoritativeness / Probability / Uncertainty; Security classification; … This is an unsung virtue of the Semantic Web. Diagram labels: #Danny’s, #Aurora, hasSource, hasDateTime, hascolor, 20031031, Red. The statement itself becomes the subject of these other statements. Ontologies Workshop, APL May 26, 2006
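One way to picture a statement becoming the subject of other statements is to use the triple itself as a subject. The sketch assumes one reading of the diagram labels: the statement is "#Aurora hasColor Red", with a source and a date-time attached to it. RDF has its own reification vocabulary for this; plain tuples are used here only to show the shape of the idea.

```python
# The base statement, under the assumed reading of the diagram labels.
statement = ("#Aurora", "hasColor", "Red")

# Statements whose subject is the statement above.
about_statement = [
    (statement, "hasSource", "#Danny"),
    (statement, "hasDateTime", "20031031"),
]


def annotations_for(target, annotations):
    """Return the metadata (provenance, timestamp, ...) attached to `target`."""
    return {p: o for s, p, o in annotations if s == target}


print(annotations_for(statement, about_statement))
```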

4: Inference The formal foundations of the Semantic Web allow us to infer additional (implicit) statements that are not explicitly made. Unambiguous semantics allow question answerers to infer that objects are the same, objects are related, objects have certain restrictions, … SWRL allows us to make additional inferences beyond those provided by the ontology. Diagram labels: #Millstone Hill, #Interferometer, #VerticalMeans; properties: OperatesInstrument, hasInstrument, isOperatedBy, Measures, hasOperatingMode, hasTypeofData, hasMeasuredData. Asserted statements are drawn in black and inferred statements in red.
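A small sketch of one kind of inference an ontology can license: if two properties are declared inverses, asserting one direction yields the other. Treating `operatesInstrument` and `isOperatedBy` as an inverse pair is an assumption based on the diagram labels, not something the slide states.

```python
# Assumed inverse pair; the mapping is made symmetric so either direction works.
INVERSES = {"operatesInstrument": "isOperatedBy"}
INVERSES.update({v: k for k, v in list(INVERSES.items())})


def with_inverses(triples):
    """Return the asserted triples plus those implied by inverse properties."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            inferred.add((o, INVERSES[p], s))
    return inferred


asserted = {("#MillstoneHill", "operatesInstrument", "#Interferometer")}
print(sorted(with_inverses(asserted)))
```

Applying `with_inverses` a second time adds nothing new, so the result is already closed under this one rule.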

5: Translation While encouraging sharing, the Semantic Web allows multiple URIs to refer to the same thing. There are multiple levels of mapping: Classes, Properties, Instances, Ontologies. OWL supports equivalence and specialization; SWRL allows more complex mappings. Diagram labels: #precipitation name ont1:EduLevel ont1:Precipitation VO:Scientist; #precipitation name ont2:EduLevel ont2:Rain EduVO:K-12. In this example, we translate the classes, properties, and instances used from one ontology to another. FIPS (Federal Information Processing Standards): http://www.census.gov/geo/www/fips/fips.html ISO (International Organization for Standardization): http://www.studentsoftheworld.info/country_information.php?Pays=GBR

6: Smart (Focused) Search The Semantic Web associates one or more classes with each object. We can use ontologies to enhance search by: Query expansion; Sense disambiguation; Type with restrictions; …
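Query expansion can be sketched as follows: a search for a class is widened to include everything below it in the class hierarchy. The hierarchy reuses instrument classes from earlier slides; the `Radar` class, the dataset names, and the catalog are hypothetical.

```python
# child class -> parent class
subclass_of = {
    "FabryPerotInterferometer": "Interferometer",
    "Interferometer": "OpticalInstrument",
    "OpticalInstrument": "Instrument",
    "Radar": "Instrument",
}


def expand(term):
    """Return the term plus every class below it in the hierarchy."""
    expanded = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in expanded and child not in expanded:
                expanded.add(child)
                changed = True
    return expanded


def search(term, catalog):
    """Names of catalog entries whose class matches the expanded query."""
    wanted = expand(term)
    return sorted(name for name, cls in catalog.items() if cls in wanted)


catalog = {
    "dataset-a": "FabryPerotInterferometer",
    "dataset-b": "Radar",
    "dataset-c": "Interferometer",
}
print(search("OpticalInstrument", catalog))
```

A plain keyword match on "OpticalInstrument" would return nothing from this catalog; the expanded query finds both interferometer datasets.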

7: Smarter Search / Configuration

GEONGRID Ontology Search and Data Integration Example Uses emerging web standards to enable smart web applications. Given an upper-level domain choice: Ecology. Illustrate or list contained concepts/hierarchy: VegetationCover, TreeRings, etc. Retrieve some specific options from the web: maps, tree-ring data. Info: https://portal.geongrid.org:8443/gridsphere/gridsphere

8: Proof The logical foundations of the Semantic Web allow us to construct proofs that can be used to improve transparency, understanding, and trust Proof and Trust are on-going research areas for the Semantic Web: e.g., See PML and Inference Web hasCalibration #FlatField #Critical Dataset hasPeerReview #Solar Physics Paper For example, Bob is allowed to access the W3C member web site if he can prove that he is an employee of a member company. *** What are we trying to prove? That the critical dataset has been peer reviewed? This picture says FlatField hasCalibration Critical Dataset, which hasPeerReview Solar Physics Paper. “Critical Dataset has been calibrated with a flat field program that is published in the peer reviewed literature.”

Inference Web Framework for explaining reasoning tasks by storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by multiple distributed reasoners. OWL-based Proof Markup Language (PML) specification as an interlingua for proof interchange IWExplainer for generating and presenting interactive explanations from PML proofs providing multiple dialogues and abstraction options IWBrowser for displaying (distributed) PML proofs IWBase distributed repository of proof-related meta-data such as inference engines/rules/languages/sources Integrated with theorem provers, text analyzers, web services, … http://iw.rpi.edu

Semantic Discovery Service Inference Web Infrastructure (McGuinness et al., 2004 http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html ) Files/WWW Toolkit Proof Markup Language (PML) CWM (NSF TAMI) JTP (DAML/NIMD) SPARK (DARPA CALO) UIMA (DTO NIMD Exp Aggregation) IW Explainer/ Abstractor IWBase IWBrowser IWSearch Trust Justification Provenance N3 KIF SPARK-L Text Analytics IWTrust provenance registration search engine based publishing Expert friendly Visualization End-user friendly visualization Trust computation Semantic Discovery Service (DAML/SNRC) OWL-S/BPEL Framework for explaining question answering tasks by abstracting, storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by question answerers. Pacific Northwest Division

An abstracted explanation SW Questions & Answers Users can explore extracted entities and relationships, create new hypotheses, ask questions, browse answers and get explanations for answers. A context for explaining the answer A question An answer An abstracted explanation (this graphical interface done by Battelle supported by Stanford KSL)

Summary Semantics are a key ingredient for progress in informatics and eScience A sustained involvement of key inter-disciplinary team members is very important -> leads to incentives, rewards, etc. and a balance of research and production This is what we will be teaching you in this class

DOLCE + SWEET Benefits Issues full coverage rich relations home for orphans single superclasses DOLCE = SWEET < SWEET Physical-body BodyofGround, BodyofWater,… Material-Artifact Infrastructure, Dam, Product,… Physical-Object LivingThing, MarineAnimal Amount-of-Matter Substance Activity HumanActivity Physical-Phenomenon Phenomena Process State StateOfMatter Quality Quantity, Moisture,… Physical-Region Basalt,… Temporal-Region Ordovician,… Issues individuals (e.g. Planet Earth) roles (contaminant) features (SeaFloor) Courtesy: Boyan Brodaric

Conclusions Surprisingly good fit amongst ontologies so far: no show-stopper conflicts, a few difficult conflicts DOLCE richness benefits geoscience ontologies good conceptual foundation helps clear some existing problems Unresolved issues in modeling science entities modeling classifications, interpretations, theories, models,… Same procedure with GeoSciML Courtesy: Boyan Brodaric

Blumenthal NC basic attributes CF attributes IRIDL attributes/objects CF data objects CF Standard Names (RDF object) SWEET Ontologies (OWL) Location CF Standard Names As Terms IRIDL Terms SWEET as Terms Search Terms Gazetteer Terms Blumenthal

IRI RDF Architecture Blumenthal MMI Data Servers Ontologies JPL bibliography Start Point Standards Organizations RDF Crawler Location Canonicalizer RDFS Semantics Owl Semantics SWRL Rules SeRQL CONSTRUCT Time Canonicalizer Sesame Search Queries Blumenthal Search Interface

CLCE - Common Logic Controlled English CLCE: If a set x is the set of (a cat, a dog, and an elephant), then the cat is an element of x, the dog is an element of x, and the elephant is an element of x. PC:~(∃x:Set)(∃x1:Cat)(∃x2:Dog)(∃x3:Elephant)(Set(x,x1,x2,x3) ∧ ~(x1∈x ∧ x2∈x ∧ x3∈x))
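The predicate-calculus form above is a negated existential. Pushing the negation inward gives an equivalent universal implication, which may be easier to compare with the "if ... then ..." shape of the CLCE sentence. This is a standard logical rewriting, not something shown on the original slide:

```latex
\neg(\exists x{:}\mathrm{Set})(\exists x_1{:}\mathrm{Cat})(\exists x_2{:}\mathrm{Dog})(\exists x_3{:}\mathrm{Elephant})
  \bigl(\mathrm{Set}(x,x_1,x_2,x_3) \land \neg(x_1 \in x \land x_2 \in x \land x_3 \in x)\bigr)
\;\equiv\;
(\forall x{:}\mathrm{Set})(\forall x_1{:}\mathrm{Cat})(\forall x_2{:}\mathrm{Dog})(\forall x_3{:}\mathrm{Elephant})
  \bigl(\mathrm{Set}(x,x_1,x_2,x_3) \rightarrow (x_1 \in x \land x_2 \in x \land x_3 \in x)\bigr)
```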

Use Case Provide a decision support capability for an analyst to determine an individual’s susceptibility to avian flu without having to be precise in terminology (-nyms)

Building SKOS ThManager Protégé (4) plugin for SKOS

Is OWL the only option II? No… Natural Language (NL) Read results from a web search and transform to a usable form Find/filter out inconsistencies, concepts/relations that cannot be represented Popular options CLCE (Common Logic Controlled English) Rabbit, e.g. ShellfishCourse is a Meal Course that (if has drink) always has drink Potable Liquid that has Full body and which either has Moderate or Strong flavour PENG (Processable English) Really need PSCI - processable science, but that’s another story (research project)

Sydney syntax If X has Y as a father then Y is the only father of X. The class person is equivalent to male or female, and male and female are mutually exclusive. equivalent to The classes male and female are mutually exclusive. The class person is fully defined as anything that is a male or a female.

PENG - Processable English If X is a research programmer then X is a programmer. Bill Smith is a research programmer who works at the CLT. Who is a programmer and works at the CLT?
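The PENG example combines one rule, two facts, and a question. A minimal sketch of how the question could be answered, assuming the sentences have already been parsed into triples (the triple layout and function names are hypothetical and do not reflect how a PENG processor works internally):

```python
# Facts from the example, written as (subject, relation, object) triples.
facts = {
    ("Bill Smith", "is_a", "research programmer"),
    ("Bill Smith", "works_at", "CLT"),
}


def apply_rule(known):
    """Rule: if X is a research programmer then X is a programmer."""
    derived = set(known)
    for subject, relation, obj in known:
        if relation == "is_a" and obj == "research programmer":
            derived.add((subject, "is_a", "programmer"))
    return derived


def who_is_programmer_at(known, place):
    """Question: who is a programmer and works at the given place?"""
    return sorted(
        subject
        for (subject, relation, obj) in known
        if relation == "is_a"
        and obj == "programmer"
        and (subject, "works_at", place) in known
    )


answer = who_is_programmer_at(apply_rule(facts), "CLT")
print(answer)
```

Without applying the rule first, the query returns nothing, since the facts never state directly that Bill Smith is a programmer.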

Rules (aka ‘Logic’) OWL is based on Description Logic OWL DL follows it precisely There are things that DL cannot express (though there are things that are difficult to express with rules and easy in DL...) A well-known example is Horn rules (e.g., the ‘uncle’ relationship): (P1 ∧ P2 ∧ ...) → C e.g.: parent(?x,?y) ∧ brother(?y,?z) ⇒ uncle(?x,?z) Or, for any X, Y and Z: if Y is a parent of X, and Z is a brother of Y, then Z is an uncle of X

Examples from http://www.w3.org/Submission/SWRL/ A simple use of these rules would be to assert that the combination of the hasParent and hasBrother properties implies the hasUncle property. Informally, this rule could be written as: hasParent(?x1,?x2) ∧ hasBrother(?x2,?x3) ⇒ hasUncle(?x1,?x3) In the abstract syntax the rule would be written like: Implies(Antecedent(hasParent(I-variable(x1) I-variable(x2)) hasBrother(I-variable(x2) I-variable(x3)))Consequent(hasUncle(I-variable(x1) I-variable(x3)))) From this rule, if John has Mary as a parent and Mary has Bill as a brother then John has Bill as an uncle.
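The uncle rule can be sketched as a join over two sets of pairs. This is plain Python for illustration, not a SWRL engine; the function name infer_uncles and the pair-based data layout are assumptions.

```python
def infer_uncles(has_parent, has_brother):
    """Apply hasParent(x1, x2) AND hasBrother(x2, x3) => hasUncle(x1, x3)."""
    return {
        (x1, x3)
        for (x1, x2) in has_parent
        for (y2, x3) in has_brother
        if x2 == y2  # the shared variable ?x2 links the two antecedent atoms
    }


# John has Mary as a parent; Mary has Bill as a brother.
has_parent = {("John", "Mary")}
has_brother = {("Mary", "Bill")}

has_uncle = infer_uncles(has_parent, has_brother)
print(has_uncle)
```

The rule needs a variable shared across two different properties, which is the kind of pattern the slides describe as outside what DL alone can express.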

Examples An even simpler rule would be to assert that Students are Persons, as in Student(?x1) ⇒ Person(?x1). Implies(Antecedent(Student(I-variable(x1))) Consequent(Person(I-variable(x1)))) However, this kind of use for rules in OWL just duplicates the OWL subclass facility. It is logically equivalent to write instead Class(Student partial Person) or SubClassOf(Student Person) which would make the information directly available to an OWL reasoner.

Semantic Web with Rules Metalog RuleML SWRL RIF OWL 2 RL WRL Cwm Jess - rules engine

Developing a service ontology Use case: find and display in the same projection, sea surface temperature and land surface temperature from a global climate model. Classes/ concepts: Temperature Surface (sea/ land) Model Climate Global Projection Display …

Service ontology Climate model is a model Model has domain Climate Model has component representation Land surface is-a component representation Ocean is-a component representation Sea surface is part of ocean Model has spatial representation (and temporal) Spatial representation has dimensions Latitude-longitude is a horizontal spatial representation Displaced pole is a horizontal spatial representation Ocean model has displaced pole representation Land surface model has latitude-longitude representation Lambert conformal is a geographic spatial representation Reprojection is a transform between spatial representation ….
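Several of the statements above can be written down as triples and queried. This is a minimal sketch; the identifier spellings and relation names (is_a, part_of, has_representation) are hypothetical choices for illustration and not an agreed vocabulary.

```python
# A subset of the slide's statements as (subject, relation, object) triples.
TRIPLES = [
    ("ClimateModel", "is_a", "Model"),
    ("LandSurface", "is_a", "ComponentRepresentation"),
    ("Ocean", "is_a", "ComponentRepresentation"),
    ("SeaSurface", "part_of", "Ocean"),
    ("LatitudeLongitude", "is_a", "HorizontalSpatialRepresentation"),
    ("DisplacedPole", "is_a", "HorizontalSpatialRepresentation"),
    ("OceanModel", "has_representation", "DisplacedPole"),
    ("LandSurfaceModel", "has_representation", "LatitudeLongitude"),
    ("LambertConformal", "is_a", "GeographicSpatialRepresentation"),
]


def objects(subject, relation):
    """All objects o such that (subject, relation, o) is asserted."""
    return [o for (s, r, o) in TRIPLES if s == subject and r == relation]


def subjects(relation, obj):
    """All subjects s such that (s, relation, obj) is asserted."""
    return [s for (s, r, o) in TRIPLES if r == relation and o == obj]


ocean_grid = objects("OceanModel", "has_representation")
horizontal = subjects("is_a", "HorizontalSpatialRepresentation")
print(ocean_grid, horizontal)
```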

Service ontology A sea surface model has grid representation displaced pole and land surface model has grid representation latitude-longitude and both must be transformed to Lambert conformal for display
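The conclusion above can be sketched as a small planning step: look up each model's grid representation and list the reprojections needed before display. The names GRID_OF, DISPLAY_PROJECTION and plan_reprojections are hypothetical, and no actual reprojection is performed.

```python
# Grid representation of each model, as stated on the slide.
GRID_OF = {
    "sea surface model": "displaced pole",
    "land surface model": "latitude-longitude",
}

# Common projection required for display, as stated on the slide.
DISPLAY_PROJECTION = "Lambert conformal"


def plan_reprojections(models, target=DISPLAY_PROJECTION):
    """List (model, source grid, target projection) for every model that
    is not already in the target projection."""
    return [
        (model, GRID_OF[model], target)
        for model in models
        if GRID_OF[model] != target
    ]


plan = plan_reprojections(["sea surface model", "land surface model"])
for model, source, target in plan:
    print(f"{model}: {source} -> {target}")
```

A service built on the ontology could derive this plan from the asserted representations instead of hard-coding it per dataset.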