A Really Brief Crash Course in Semantic Web Technologies
Rocky Dunlap, Spencer Rugaber
Georgia Tech

Languages you may encounter...
- XML (eXtensible Markup Language)
- XML Schema
- XPath (navigate an XML document)
- XQuery (query an XML document)
- XSLT (Extensible Stylesheet Language Transformations)
- RDF (Resource Description Framework)
- RDF Schema
- OWL (Web Ontology Language)
- SPARQL (Query language for RDF triples)
- SQL (Structured Query Language – for RDBMS)
- UML (Unified Modeling Language – conceptual)
- SKOS (Simple Knowledge Organization System) – glossary

Links to language specs
Name              Source    Description
RDF               W3C       Resource Description Framework
RDFS              W3C       RDF Schema
SKOS              W3C       Simple Knowledge Organization System
SPARQL            W3C       RDF/OWL Query Language
SQL               ANSI/ISO  Structured Query Language
UML               OMG       Unified Modeling Language
OWL               W3C       Web Ontology Language
XML               W3C       Extensible Markup Language
XML Schema (XSD)  W3C       XML Schema
XPath             W3C       XML Path Language
XQuery            W3C       XML Query Language
XSLT              W3C       Extensible Stylesheet Language Transformations

XML
- General-purpose markup language
- Mechanism for structured data exchange between heterogeneous systems
- Basically: elements (tags) and attributes
- Not really for human consumption, although it is easy for us to read and write in small amounts
- An XML file is often called an instance document

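To make "elements and attributes" concrete, here is a minimal sketch that reads a small instance document with Python's standard library. The document itself (the `meeting`, `title`, `location`, and `attendee` names and the `id` and `role` attributes) is invented for illustration and is not part of any Curator format.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; every element and attribute name is invented.
DOC = """
<meeting id="curator-2007">
  <title>Curator meeting</title>
  <location>GFDL</location>
  <attendee role="speaker">Balaji</attendee>
</meeting>
"""

root = ET.fromstring(DOC.strip())

# Attributes are read from the element; child elements are found by tag name.
meeting_id = root.get("id")
location = root.findtext("location")
speakers = [a.text for a in root.findall("attendee") if a.get("role") == "speaker"]

print(meeting_id, location, speakers)
```

The same data could be laid out differently (for example, location as an attribute); XML itself does not decide that, which is where a schema comes in.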
XML Schema
- Defines the allowed structure of a set of instance documents
- Defines a set of “types”: valid chunks of XML
- Typically the schema is defined up front and applications are written to process valid (schema-conforming) instance documents
- The schema is a way to achieve standardization, like a contract: “If you provide a valid document, we’ll provide you with tools that do X, Y, and Z.”

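The "contract" idea can be sketched without a real schema processor. The check below is a hand-rolled stand-in, not XML Schema (XSD) validation, which also covers types, ordering, and cardinality; Python's standard library has no XSD validator. The `REQUIRED_CHILDREN` table and the `meeting` document type are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: required child elements per root element.
# Real XSD is far richer; this only illustrates "valid vs. not valid".
REQUIRED_CHILDREN = {
    "meeting": ["title", "location"],
}

def is_valid(xml_text):
    """Return True if the root element is known and has every required child."""
    root = ET.fromstring(xml_text)
    required = REQUIRED_CHILDREN.get(root.tag)
    if required is None:
        return False  # unknown document type: the contract does not cover it
    return all(root.find(child) is not None for child in required)

good = "<meeting><title>Curator meeting</title><location>GFDL</location></meeting>"
bad = "<meeting><title>Curator meeting</title></meeting>"

print(is_valid(good), is_valid(bad))
```

Tools written against the contract only need to handle documents for which the check passes.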
RDF
- A knowledge representation language
- Conceptual in nature
- It really has nothing to do with XML
  - But there happens to be an XML representation
- A way to make statements about pretty much anything you want:
  - “The Curator meeting is at GFDL.”
  - “The Curator meeting is Oct 18-19, 2007.”
  - “Balaji works at GFDL.”

RDF Statements
“The Curator meeting is at GFDL.”
  Curator meeting --hasLocation--> GFDL
  (subject --predicate--> object)

RDF Statements
“The Curator meeting is Oct 18-19, 2007.”
  Curator meeting --hasLocation--> GFDL           (resource)
  Curator meeting --starts-->      “18 Oct 2007”  (literal)
  Curator meeting --ends-->        “19 Oct 2007”  (literal)

RDF Statements
“Balaji works at GFDL.”
  Curator meeting --hasLocation--> GFDL
  Curator meeting --starts-->      “18 Oct 2007”
  Curator meeting --ends-->        “19 Oct 2007”
  Balaji          --worksAt-->     GFDL

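The statements above form a small graph of subject, predicate, object triples. A minimal sketch of that graph in plain Python follows; the `Resource`, `Literal`, and `Triple` classes and the two helper functions are illustrative names, not an RDF library API.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Resource:
    """A node that can be the subject or object of further statements."""
    name: str

@dataclass(frozen=True)
class Literal:
    """A plain value such as a date string; it cannot be a subject."""
    value: str

@dataclass(frozen=True)
class Triple:
    subject: Resource
    predicate: str
    obj: Union[Resource, Literal]

meeting = Resource("CuratorMeeting")
gfdl = Resource("GFDL")
balaji = Resource("Balaji")

graph = {
    Triple(meeting, "hasLocation", gfdl),
    Triple(meeting, "starts", Literal("18 Oct 2007")),
    Triple(meeting, "ends", Literal("19 Oct 2007")),
    Triple(balaji, "worksAt", gfdl),
}

def objects(graph, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}

def subjects_pointing_at(graph, obj):
    """All subjects that have some statement whose object is obj."""
    return {t.subject for t in graph if t.obj == obj}

print(objects(graph, meeting, "hasLocation"))
print(subjects_pointing_at(graph, gfdl))
```

Because GFDL is a resource rather than a literal, both the meeting and Balaji can point at the same node, which is what makes the statements a graph rather than a set of isolated records.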
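RDF XML Representation
[RDF/XML listing of the Curator meeting statements; markup not preserved. Surviving literals: “18 Oct 2007”, “19 Oct 2007”]
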
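As an illustration of what an RDF/XML serialization of the Curator meeting statements can look like, the sketch below builds one with Python's standard library. It is not the listing from the original slide. The `http://example.org/curator#` namespace and the `ex` prefix are assumptions; only the `rdf` namespace IRI is the standard one.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
EX_NS = "http://example.org/curator#"                   # hypothetical vocabulary

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)

def q(ns, local):
    """Build a namespace-qualified name in ElementTree's {ns}local form."""
    return "{%s}%s" % (ns, local)

root = ET.Element(q(RDF_NS, "RDF"))

# One rdf:Description per subject; each child element is one predicate.
desc = ET.SubElement(root, q(RDF_NS, "Description"),
                     {q(RDF_NS, "about"): EX_NS + "CuratorMeeting"})
# Resource-valued property: the object is referenced by IRI.
ET.SubElement(desc, q(EX_NS, "hasLocation"),
              {q(RDF_NS, "resource"): EX_NS + "GFDL"})
# Literal-valued properties: the object is the element text.
ET.SubElement(desc, q(EX_NS, "starts")).text = "18 Oct 2007"
ET.SubElement(desc, q(EX_NS, "ends")).text = "19 Oct 2007"

rdf_xml = ET.tostring(root, encoding="unicode")
print(rdf_xml)
```

The XML here is only a carrier: the same three triples could be written in any other RDF syntax without changing their meaning.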
RDF Schema
- Defines a domain-specific data model for RDF
- Includes classes and properties (along with subclasses and subproperties)
- Properties are first-class (they are not defined as part of a particular class)

RDF Schema
Classes: Event, Meeting, Flight, Person, Place
Properties:
  Property     Domain  Range
  hasLocation  Event   Place
  starts       Event   date
  ends         Event   date
  worksAt      Person  Place

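In RDF Schema, domain and range declarations are used to infer types rather than to reject data. The sketch below applies the properties from the slide to two statements. Treating Meeting and Flight as subclasses of Event is an assumption read from the slide layout, and the function names are illustrative.

```python
# Subclass links are an assumption; domain and range come from the slide.
SUBCLASS_OF = {"Meeting": "Event", "Flight": "Event"}
DOMAIN = {"hasLocation": "Event", "starts": "Event", "ends": "Event", "worksAt": "Person"}
RANGE = {"hasLocation": "Place", "worksAt": "Place"}  # date-valued ranges omitted

def superclasses(cls):
    """Yield cls followed by each of its ancestors."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)

def infer_types(triples, declared):
    """Infer classes: a property's domain types the subject, its range the object."""
    types = {}

    def add(individual, cls):
        types.setdefault(individual, set()).update(superclasses(cls))

    for individual, cls in declared.items():
        add(individual, cls)
    for s, p, o in triples:
        if p in DOMAIN:
            add(s, DOMAIN[p])
        if p in RANGE:
            add(o, RANGE[p])
    return types

triples = [
    ("CuratorMeeting", "hasLocation", "GFDL"),
    ("Balaji", "worksAt", "GFDL"),
]
types = infer_types(triples, {"CuratorMeeting": "Meeting"})
print(types)
```

Nothing ever stated that GFDL is a Place or that Balaji is a Person; both follow from how the properties were declared.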
OWL (Web Ontology Language)
- Builds on RDF by adding expressivity
- Every OWL file is RDF (but not necessarily the reverse)

RDF vs. OWL
RDF:
- Classes
- Subclasses
- Properties
- Subproperties
- Individuals
OWL (in addition to the above):
- Property constraints
  - allValuesFrom
  - someValuesFrom
  - hasValue
- Cardinality constraints on properties
  - cardinality (exact)
  - minCardinality
  - maxCardinality
- Class definitions
  - intersection
  - union
  - complement
  - equivalentClass
  - disjointWith
  - oneOf (enum)
- Transitive Properties
- Symmetric Properties
- Individuals
  - sameAs
  - differentFrom

Things you can NOT say in RDF, but can say in OWL
- The class TriangularUnstructuredGrid is the intersection of TriangularGrid and UnstructuredGrid
- UnstructuredGrid is the complement of StructuredGrid
- A Dataset is generated by exactly one Model
- A Model is made up of at least one Component
- An AtmosphereComponent is a Component with ScienceType equal to “Atmosphere”
- If X subComponent Y and Y subComponent Z, then X subComponent Z (transitive property)

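The subComponent rule is the kind of conclusion a reasoner draws once a property is declared transitive. A minimal sketch of that inference over plain pairs follows; it is a naive fixed-point loop written for clarity, not how an OWL reasoner is implemented.

```python
def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Asserted statements: X subComponent Y, Y subComponent Z.
sub_component = {("X", "Y"), ("Y", "Z")}

# Statements that were never asserted but follow from transitivity.
inferred = transitive_closure(sub_component) - sub_component
print(inferred)
```

In plain RDF the pair (X, Z) would have to be stated by hand; declaring the property transitive lets tools derive it.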
Things you can NOT say in RDF, but can say in OWL
- The class Model is equivalent to ConfiguredModel
- ScienceType is the exact enumeration Atmosphere, Ocean, Ice, and Land
- ObservationDataset is disjoint from SimulationDataset
- Dataset123 is the same object as DatasetXYZ

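Two of these statements interact: once Dataset123 and DatasetXYZ are declared the same object, they share every class, and a disjointness declaration can then expose a contradiction. The sketch below imitates that check with plain sets. The asserted class memberships are invented for the example, and the functions are illustrative rather than a reasoner API.

```python
SAME_AS = [("Dataset123", "DatasetXYZ")]
DISJOINT = [("ObservationDataset", "SimulationDataset")]

def merge_same_as(types, same_as):
    """Individuals linked by sameAs end up with the union of their classes."""
    merged = {ind: set(classes) for ind, classes in types.items()}
    changed = True
    while changed:  # repeat so that chains of sameAs links are also merged
        changed = False
        for a, b in same_as:
            union = merged.get(a, set()) | merged.get(b, set())
            if merged.get(a) != union or merged.get(b) != union:
                merged[a] = set(union)
                merged[b] = set(union)
                changed = True
    return merged

def disjoint_violations(types, disjoint):
    """Individuals that belong to both classes of some disjoint pair."""
    return sorted({ind for ind, classes in types.items()
                   for a, b in disjoint if a in classes and b in classes})

# Hypothetical assertions, chosen to conflict once sameAs is applied.
asserted = {
    "Dataset123": {"ObservationDataset"},
    "DatasetXYZ": {"SimulationDataset"},
}

merged = merge_same_as(asserted, SAME_AS)
violations = disjoint_violations(merged, DISJOINT)
print(violations)
```

Before the merge neither individual violates anything; the inconsistency only appears when the sameAs and disjointWith statements are considered together.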
SPARQL
- A language for querying RDF/OWL triples
- Example query:
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?x ?name
    WHERE { ?x foaf:name ?name }

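The WHERE clause of the example query is a triple pattern: variables start with `?` and everything else must match exactly. The sketch below evaluates one such pattern over a small list of triples in plain Python. It is not a SPARQL engine, and the sample data (the `ex:` identifiers and names) is invented for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data set; subjects use made-up "ex:" identifiers.
triples = [
    ("ex:balaji", FOAF + "name", "Balaji"),
    ("ex:balaji", "ex:worksAt", "ex:GFDL"),
    ("ex:rocky", FOAF + "name", "Rocky Dunlap"),
]

def match(pattern, triples):
    """Return one variable-binding dict per triple that matches the pattern.

    Pattern terms starting with '?' are variables; other terms must be equal
    to the corresponding term of the triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results

# Equivalent of: SELECT ?x ?name WHERE { ?x foaf:name ?name }
rows = match(("?x", FOAF + "name", "?name"), triples)
for row in rows:
    print(row["?x"], row["?name"])
```

A real query usually joins several patterns on shared variables; this sketch covers only the single-pattern case used on the slide.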
Curator’s Current Strategy
- Curator data model written in XML Schema
- Models and Datasets (Resources*) annotated with conforming XML instance documents
- Portions of XML translated into RDF and exposed by CDP-Curator faceted search
- This means:
  - Low-level details remain in the XML instance
  - Higher-level concepts are pulled out into the RDF
- Can we confirm this strategy?

Technical Challenges
- XML to RDF translation
  - From hierarchical, low-level XML to graph-based, conceptual RDF
  - Is there a need to go from RDF back to XML?
  - What stays in XML? What goes to RDF?
- Automation of translation
  - Schema level (e.g., schema evolution)
  - Instance level (e.g., submission of a new resource to CDP-Curator)

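To give a feel for instance-level translation, here is a minimal sketch that walks a hierarchical XML document and emits flat triples. The input document, the rule that elements carrying an `id` become resources, and the `has<Tag>` predicate naming are all assumptions made for this example; they are not the Curator schema or the CDP-Curator translation.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; not taken from the Curator schema.
DOC = """
<model id="model-1">
  <name>ExampleModel</name>
  <component id="comp-1">
    <scienceType>Atmosphere</scienceType>
  </component>
</model>
"""

def xml_to_triples(element, triples=None):
    """Flatten an XML tree into (subject, predicate, object) triples.

    Assumed rule: a child element with an 'id' becomes a resource linked from
    its parent by 'has<Tag>'; any other child becomes a literal-valued property.
    """
    if triples is None:
        triples = []
    subject = element.get("id")
    for child in element:
        child_id = child.get("id")
        if child_id is not None:
            triples.append((subject, "has" + child.tag.capitalize(), child_id))
            xml_to_triples(child, triples)
        else:
            triples.append((subject, child.tag, (child.text or "").strip()))
    return triples

triples = xml_to_triples(ET.fromstring(DOC.strip()))
for t in triples:
    print(t)
```

Even this toy rule shows the open questions from the slide: nesting order is lost in the triples, so going back from RDF to XML would need extra information, and a schema change (say, renaming `component`) silently changes the generated predicates.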