Advanced Database Systems: DBS CB, 2nd Edition. Advanced Topics of Interest: Semantic Web and Data Integration


1 Advanced Database Systems: DBS CB, 2nd Edition. Advanced Topics of Interest: Semantic Web and Data Integration

2 Outline: Semantic Web Overview and Data Integration

3 Semantic Web Overview

4 Semantic Web Basics
- The Semantic Web Cake
- The WWW is evolving toward the vision of the Semantic Web: a machine-understandable model
- The pioneer is Tim Berners-Lee
- URL: Uniform Resource Locator
- URI: Uniform Resource Identifier
- qname (global): qualified name = namespace:identifier. Examples: rdf:xxx, rdfs:yyy, owl:zzz
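
The qualified-name rule above (namespace:identifier) can be sketched in a few lines of Python. This is only an illustration: the helper name `expand_qname` and the `PREFIXES` table are assumptions made for the example, while the three namespace URIs are the standard W3C ones for the rdf, rdfs, and owl prefixes.

```python
# Standard W3C namespace URIs for the three prefixes named on the slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI: namespace URI + local name."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


print(expand_qname("rdf:type"))
print(expand_qname("rdfs:subClassOf"))
```

A qname is therefore just shorthand: two qnames denote the same resource exactly when they expand to the same URI.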

5 Semantic Web Basics
- URI format: <scheme>:<hierarchical-part>[?<query>][#<fragment>]

    foo://example.com:8024/over/there?name=ferret#nose
    \_/   \______________/\_________/ \_________/ \__/
     |           |            |            |        |
  scheme     authority       path        query   fragment

  The authority and the path together form the hierarchical part.
- RDF (Resource Description Framework): a data model that helps transform unstructured data into structured data that a machine can process. The model is built from triples.
- A triple can be thought of as: <subject> <predicate> <object>. All three items can be given as URIs; only the object may optionally be expressed as a literal.
- Example:
  - http://www.example.org/index.html: the subject is index.html
  - http://purl.org/dc/elements/1.1/creator: the predicate is creator
  - http://www.example.org/staffid/85740: the object is the staff member identified by staff ID 85740
- A URIRef comes in two flavors: the URI reference is written out completely, or as a qualified name, i.e., prefix:name
- Vocabulary: a set of URIRefs (resources) that are related to each other in a specific context, such as the Dublin Core vocabulary
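
To see the URI components concretely, the slide's example URI can be split with Python's standard `urllib.parse.urlsplit`. This is a small sketch for illustration only; nothing in the slides prescribes Python or this function.

```python
from urllib.parse import urlsplit

# The example URI from the slide.
parts = urlsplit("foo://example.com:8024/over/there?name=ferret#nose")

print("scheme:   ", parts.scheme)    # text before the first ':'
print("authority:", parts.netloc)    # host and optional port
print("path:     ", parts.path)
print("query:    ", parts.query)     # text after '?', without the '?'
print("fragment: ", parts.fragment)  # text after '#', without the '#'
```

Note that `urlsplit` calls the authority component `netloc`; the host and port are also available separately as `parts.hostname` and `parts.port`.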

6 Semantic Web Basics
- Blank node: used for n-way relationships. A blank node is named _:name. Both the subject and the object can be blank nodes
- RDF can be expressed in XML
- RDF containers (rdf:Bag, rdf:Seq, rdf:Alt): the identified resources are members of the container; the container does not say that no other members exist
- RDF collections: a form of closed container (i.e., all members are listed)
- RDFS (RDF Schema): describes groups of related resources and the relationships between these resources (roughly analogous to an SQL schema)
- OWL (Web Ontology Language): Web 3.0 depends on RDF and RDFS, and it can be enhanced by using OWL. OWL adds vocabulary for describing classes and properties

7 Semantic Web Basics: Sample RDF expressed in XML
XML syntax for RDF.
Literal values in the example: August 16, 1999 and 1999-08-16
URIs used in the example:
- http://www.w3.org/2001/XMLSchema#
- http://www.w3.org/1999/02/22-rdf-syntax-ns#
- http://purl.org/dc/elements/1.1/
- http://www.example.org/terms/
- http://www.example.org/index.html
- http://www.w3.org/2001/XMLSchema#date
- http://www.example.org/staffid/85740
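
As a rough illustration of how RDF/XML maps onto triples, the sketch below parses a small document with Python's standard `xml.etree.ElementTree`. The document is an assumption assembled from the URIs and the date literal listed above; in particular the property name `exterms:creation-date` and the helper `triples_from_rdfxml` are hypothetical, and the parser handles only flat `rdf:Description` elements, not the full RDF/XML grammar.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document: only the URIs and the date literal come from the slide.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1999-08-16</exterms:creation-date>
    <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Yield (subject, predicate, object) for flat rdf:Description elements only."""
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as '{namespace}local'; joining the two
            # parts gives the predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The object is a resource reference if present, else the literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource", prop.text)
            yield subject, predicate, obj


triples = list(triples_from_rdfxml(RDF_XML))
for triple in triples:
    print(triple)
```

Each child element of `rdf:Description` becomes one triple whose subject is the `rdf:about` URI, which is the "each property is a statement about the resource" reading of the XML syntax.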

8 Semantic Web Basics: Sample RDF expressed in XML
Literal value in the example: Overnighter
URIs used in the example:
- http://www.w3.org/TR/rdf-syntax-grammar
- http://www.example.com/terms/Tent

9 Semantic Web Basics
- OWL provides a higher degree of specificity and, as a result, increases the quality of the information
- SPARQL is a query language for RDF data, much as SQL is a query language for tables. SPARQL queries are executed against RDF datasets
- A SPARQL endpoint accepts queries and returns results via HTTP. An endpoint can be generic or specific
- SPARQL query results can be returned in a variety of formats: XML, JSON, RDF, HTML. RDF results can in turn be serialized in several ways: RDF/XML, N-Triples, Turtle, etc.
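
The core idea behind a SPARQL query, matching a triple pattern with variables against a set of triples, can be sketched in plain Python. This is not a SPARQL engine and uses no SPARQL library; the graph data, the `match` helper, and the `ex:` and `dc:` names are assumptions made for the illustration.

```python
# Hypothetical data: a tiny in-memory RDF graph as (subject, predicate, object) tuples.
GRAPH = [
    ("ex:index.html", "dc:creator", "ex:staff85740"),
    ("ex:index.html", "dc:language", "en"),
    ("ex:about.html", "dc:creator", "ex:staff85740"),
]


def match(pattern, graph):
    """Return one variable binding (a dict) per triple that fits the pattern.

    Pattern terms starting with '?' are variables; all other terms must
    match the triple's term exactly.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated inside one pattern must bind to one value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?page WHERE { ?page dc:creator ex:staff85740 }
pages = match(("?page", "dc:creator", "ex:staff85740"), GRAPH)
print(pages)
```

A real query joins several such patterns on their shared variables, which plays the role that joins play in SQL.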

10 Semantic Web Basics: Representing an RDBMS table as triples
- Use the PK to uniquely identify a row in the table (e.g., product)
- Use a namespace (e.g., mfg:) for the RDBMS schema
- Unique row: mfg:product1
- Unique column: mfg:product-model#
- Each cell in the table is a triple

Table columns: PK id | Model # | Division | SKU (rows 1, 2, 3, ...)

Subject      | Predicate      | Object
mfg:product1 | mfg:product-ID | 1
mfg:product2 | mfg:product-ID | 2
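
The row-to-triple mapping described above can be sketched as follows. The cell values, the column keys, and the helper `table_to_triples` are assumptions for the example; only the `mfg:` namespace and the `mfg:product1` / `mfg:product-ID` naming pattern come from the slide.

```python
# Hypothetical rows; the slide names the columns but gives no cell values.
ROWS = [
    {"ID": 1, "model": "M-100", "division": "Outdoor", "SKU": "A1"},
    {"ID": 2, "model": "M-200", "division": "Indoor", "SKU": "B2"},
]


def table_to_triples(rows, table="product", pk="ID", ns="mfg:"):
    """Turn each cell into one triple: (row URI, column URI, cell value)."""
    triples = []
    for row in rows:
        subject = f"{ns}{table}{row[pk]}"        # e.g. mfg:product1
        for column, value in row.items():
            predicate = f"{ns}{table}-{column}"  # e.g. mfg:product-ID
            triples.append((subject, predicate, value))
    return triples


triples = table_to_triples(ROWS)
for triple in triples:
    print(triple)
```

A table with R rows and C columns therefore yields R x C triples, all sharing the row URI as subject within a row and the column URI as predicate within a column.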

11 Semantic Web Overview: Data Model - RDF

12 Resource Description Framework (RDF)
- A framework (not a language) for describing resources
- Model for data
- Syntax to allow exchange and use of information stored in various locations
- The point is to facilitate reading and correct use of information by computers, not necessarily by people
- Find the official recommendation at http://www.w3.org/RDF/
- Note the subtle difference between a standard and a recommendation
  - W3C has no power to enforce compliance
  - Obeying the rules in the recommendation allows a site to participate in the World Wide Web cooperative enterprise

13 Identification and description
- RDF identifies resources with URIs
  - Often, though not always, the same as a URL
  - Anything that can have a URI is a RESOURCE
- RDF describes resources with properties and property values
  - A property is a resource that has a name. Ex.: Author, Book, Address, Client, Product
  - A property value is the value of the property. Ex.: "Joanna Santillo", http://www.someplace.com/, etc.
- A property value can be another resource, allowing nested descriptions

14 Statements
- Resource, Property, Property Value
- Also known as the subject, predicate, and object of a statement
- Predicates are not the same as English-language verbs
  - They specify a relationship between the subject and the object
- Statement: "The author of http://www.w3schools.com/RDF is Jan Egil Refsnes".
  - Subject: http://www.w3schools.com/RDF
  - Predicate: author
  - Object: Jan Egil Refsnes
- Statement: "The homepage of http://www.w3schools.com/RDF is http://www.w3schools.com".
  - Subject: http://www.w3schools.com/RDF
  - Predicate: homepage
  - Object: http://www.w3schools.com

15 Binary predicates
- RDF offers only binary predicates
- Think of them as P(x, y), where P is the relationship between the objects x and y
- From the example:
  - x = http://www.w3schools.com/RDF
  - y = Jan Egil Refsnes
  - P = author
- Graph: http://www.w3schools.com/RDF --author--> Jan Egil Refsnes
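
The P(x, y) reading of a statement can be written down directly. In this minimal sketch the two statements are the ones from the slides, while the `Statement` type and the helpers `holds` and `objects` are names invented for the illustration.

```python
from collections import namedtuple

# One RDF statement: predicate P relates subject x to object y, i.e. P(x, y).
Statement = namedtuple("Statement", ["subject", "predicate", "object"])

statements = [
    Statement("http://www.w3schools.com/RDF", "author", "Jan Egil Refsnes"),
    Statement("http://www.w3schools.com/RDF", "homepage", "http://www.w3schools.com"),
]


def holds(p, x, y):
    """True if P(x, y) is asserted in the statement list."""
    return Statement(x, p, y) in statements


def objects(p, x):
    """All y such that P(x, y) is asserted."""
    return [s.object for s in statements if s.subject == x and s.predicate == p]


print(holds("author", "http://www.w3schools.com/RDF", "Jan Egil Refsnes"))
print(objects("homepage", "http://www.w3schools.com/RDF"))
```

Because every predicate takes exactly two arguments, a relationship among three or more things has to be broken into several statements, typically around an intermediate node.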

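16 Binary Predicates
Example RDF/XML document describing CDs. Property values in the example:

Artist       | Country | Company     | Price | Year
Bob Dylan    | USA     | Columbia    | 10.90 | 1985
Bonnie Tyler | UK      | CBS Records | 9.90  | 1988
…

Annotations on the markup:
- rdf:RDF: root element of RDF documents
- xmlns:rdf: source of the namespace for elements with the rdf prefix
- xmlns:cd: source of the namespace for elements with the cd prefix
- The rdf:Description element describes the resource identified by the rdf:about attribute
- cd:country etc. are properties of the resource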

17 RDF Validator
- Check the correctness of an RDF document: http://www.w3.org/RDF/Validator/
- The result shows the subject, predicate, and object of each element of the document, and a graph of the model
- Containers: groups of things
  - rdf:Bag: unordered list; duplicates allowed
  - rdf:Seq: ordered list; duplicates allowed
  - rdf:Alt: list of alternatives; one will be selected

18 Example
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description rdf:about="http://www.recshop.fake/cd/Beatles">
Listed members: CD, Record, Tape. Exactly one of these formats applies.

19 Limiting the Scope
- Collection: describes a group that contains only the specified members, no others
<rdf:Description rdf:about="http://recshop.fake/cd/Beatles">
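
The difference between an open container and a closed collection shows up in the triples each one produces. In the sketch below, the `rdf:_1`, `rdf:_2`, ... membership properties and the `rdf:first` / `rdf:rest` / `rdf:nil` list vocabulary are standard RDF; the helper names and the blank-node labels are assumptions made for the example.

```python
RDF = "rdf:"  # qname prefix standing in for the RDF namespace


def bag_triples(node, members):
    """Open container: one rdf:_n membership triple per member."""
    triples = [(node, RDF + "type", RDF + "Bag")]
    for i, member in enumerate(members, start=1):
        triples.append((node, f"{RDF}_{i}", member))
    return triples


def collection_triples(members):
    """Closed collection: an rdf:first / rdf:rest linked list ending in rdf:nil."""
    triples = []
    head = RDF + "nil"
    # Build back to front so each cell can point at the cell that follows it.
    for i in range(len(members), 0, -1):
        cell = f"_:c{i}"  # blank-node label; the naming is arbitrary
        triples.append((cell, RDF + "first", members[i - 1]))
        triples.append((cell, RDF + "rest", head))
        head = cell
    return head, triples


formats = ["CD", "Record", "Tape"]
bag = bag_triples("_:formats", formats)
head, coll = collection_triples(formats)
print(bag)
print(head, coll)
```

Nothing in the bag's triples rules out a later `rdf:_4` statement, whereas the collection's final `rdf:rest rdf:nil` link marks the end of the list.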

20 RDF Schema
- Extension to RDF to allow definition of application-specific classes and properties
  - Does not define the classes or properties itself
  - Provides a framework to describe them
  - Classes are similar to classes in OOP
- Allows instances and subclasses of classes

21 RDF Schema Example
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:base="http://www.animals.fake/animals#">
Horse is defined as a subclass of animal.

22 RDF Schema Example (2)
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:base="http://www.animals.fake/animals#">
Abbreviated version. It works because an RDFS class is an RDF resource: use rdfs:Class instead of rdf:Description and drop the rdf:type information.
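
What a subclass statement buys you is inference: an instance of a subclass is also an instance of every superclass. The sketch below shows that reasoning step in plain Python. Only "horse is a subclass of animal" comes from the slides; the class `pony`, the instance `ed`, and both helper functions are hypothetical.

```python
# (subclass, superclass) pairs, read as rdfs:subClassOf.
# 'horse' -> 'animal' follows the slide; 'pony' is made up for the example.
SUBCLASS_OF = {("horse", "animal"), ("pony", "horse")}
# (instance, class) pairs, read as rdf:type. 'ed' is a made-up instance.
TYPE = {("ed", "pony")}


def superclasses(cls, subclass_of):
    """All classes reachable from cls by following rdfs:subClassOf transitively."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in found:
                found.add(sup)
                stack.append(sup)
    return found


def inferred_types(instance, types, subclass_of):
    """Direct types of the instance plus every superclass of those types."""
    direct = {c for i, c in types if i == instance}
    result = set(direct)
    for c in direct:
        result |= superclasses(c, subclass_of)
    return result


print(sorted(superclasses("pony", SUBCLASS_OF)))
print(sorted(inferred_types("ed", TYPE, SUBCLASS_OF)))
```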

23 Semantic Web Overview: Ontologies and OWL

24 What are we doing today?
- OWL, but OWL is just a representation. The hard part is what we are representing!
  - What is an ontology?
  - Why develop an ontology?
  - Step-by-step: developing an ontology
  - Going deeper: common problems and solutions
  - Current research issues in ontology engineering
- Computers are good at syntax. People aren't. So we will explore an ontology development tool, Protégé

25 What Is an Ontology
- An ontology is an explicit description of a domain:
  - concepts
  - properties and attributes of concepts
  - constraints on properties and attributes
  - individuals (often, but not always)
- An ontology defines
  - a common vocabulary
  - a shared understanding

26 Ontology Examples
- Taxonomies on the Web
  - Google Directory
- Catalogs for on-line shopping
  - Amazon.com product catalog
- Domain-specific standard terminology
  - Unified Medical Language System (UMLS) and MeSH
- Broad general ontologies
  - Cyc

27 Why Develop an Ontology?
- To share a common understanding of the structure of information
  - among people
  - among software agents
- To enable reuse of domain knowledge
  - to avoid "re-inventing the wheel"
  - to introduce standards to allow interoperability

28 More Reasons
- To make domain assumptions explicit
  - easier to change domain assumptions (consider a genetics knowledge base)
  - easier to understand and update legacy data
- To separate domain knowledge from operational knowledge
  - re-use domain and operational knowledge separately (e.g., configuration based on constraints)

29 Consider Questions Like:
- Which wine should I serve with seafood today?
- What wines should I buy next Monday for my reception in Villanova, PA?
- Is there a market for the products of another small winery in this area?
- What online source is the best for wines for my party in Texas next fall?

30 Wines and Wineries

31 Ontology-Development Process
- General approach: determine scope, consider reuse, enumerate terms, define classes, define properties, define constraints, create instances
- Usually a highly iterative process

32 Ontology Engineering versus Object-Oriented Modeling
- An ontology
  - OWL is a knowledge representation language for authoring ontologies
  - reflects the structure of the world
  - OWL describes a set of "individuals" and a set of "property assertions" which relate these individuals to each other
  - An ontology consists of a set of axioms which place constraints on sets of individuals (called classes) and the types of relationships permitted between them
  - These axioms allow us to infer additional information
  - actual physical representation is not an issue
- An OO class structure
  - reflects the structure of the data and code
  - is usually about behavior (methods)
  - describes the physical representation of data (long int, char, etc.)

33 Preliminaries - Tools
- All screenshots in this tutorial are from Protégé-2000, which:
  - is a graphical ontology-development tool
  - supports a rich knowledge model
  - is open-source and freely available (http://protege.stanford.edu)
- Some other available tools:
  - Ontolingua and Chimaera
  - OntoEdit
  - OilEd

34 Define Classes and the Class Hierarchy
- A class is a concept in the domain
  - a class of wines
  - a class of wineries
  - a class of red wines
- A class is a collection of elements with similar properties
- Instances of classes
  - a glass of California wine you'll have for lunch

35 Class Inheritance
- Classes usually constitute a taxonomic hierarchy (a subclass-superclass hierarchy)
- A class hierarchy is usually an IS-A hierarchy: an instance of a subclass is an instance of a superclass
- If you think of a class as a set of elements, a subclass is a subset

36 Class Inheritance - Example
- Apple is a subclass of Fruit
  - Every apple is a fruit
- Red wine is a subclass of Wine
  - Every red wine is a wine
- Chianti wine is a subclass of Red wine
  - Every Chianti wine is a red wine

37 Modes of Development
- top-down: define the most general concepts first and then specialize them
- bottom-up: define the most specific concepts and then organize them in more general classes
- combination: define the more salient concepts first and then generalize and specialize them

38 Documentation
- Classes (and slots) usually have documentation
  - Describing the class in natural language
  - Listing domain assumptions relevant to the class definition
  - Listing synonyms
- Documenting classes and slots is as important as documenting computer code!

39 Define Properties of Classes: Slots
- Slots in a class definition describe attributes of instances of the class and relations to other instances
- Each wine will have color, sugar content, producer, etc.

40 Properties (Slots)
- Types of properties
  - "intrinsic" properties: flavor and color of wine
  - "extrinsic" properties: name and price of wine
  - parts: ingredients in a dish
  - relations to other objects: producer of wine (winery)
- Simple and complex properties
  - simple properties (attributes): contain primitive values (strings, numbers)
  - complex properties: contain (or point to) other objects (e.g., a winery instance)

41 Slot and Class Inheritance
- A subclass inherits all the slots from the superclass
  - If a wine has a name and flavor, a red wine also has a name and flavor
- If a class has multiple superclasses, it inherits slots from all of them
  - Port is both a dessert wine and a red wine. It inherits "sugar content: high" from the former and "color: red" from the latter
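
The Port example can be mimicked with Python's multiple inheritance. This is only an analogy under the caveat from the earlier slide that ontology classes are not OO classes; the class and attribute names below are transliterations of the slide's wording, not part of any ontology API.

```python
class Wine:
    # Slots every wine has; None means "no value assigned yet".
    name = None
    flavor = None


class DessertWine(Wine):
    sugar_content = "high"


class RedWine(Wine):
    color = "red"


class Port(DessertWine, RedWine):
    """Inherits sugar_content from DessertWine and color from RedWine."""


port = Port()
print(port.sugar_content, port.color)
print(hasattr(port, "name"), hasattr(port, "flavor"))
```

The slots `name` and `flavor` reach `Port` through both parents but exist only once, which matches the slide's reading of inheritance as "has all the slots of every superclass".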

42 Property Constraints
- Property constraints (facets) describe or limit the set of possible values for a slot
  - The name of a wine is a string
  - The wine producer is an instance of Winery
  - A winery has exactly one location

43 Common Facets
- Slot cardinality: the number of values a slot has
- Slot value type: the type of values a slot has
- Minimum and maximum value: a range of values for a numeric slot
- Default value: the value a slot has unless explicitly specified otherwise

44 Common Facets: Slot Cardinality
- Cardinality
  - Cardinality N means that the slot must have N values
- Minimum cardinality
  - Minimum cardinality 1 means that the slot must have a value (required)
  - Minimum cardinality 0 means that the slot value is optional
- Maximum cardinality
  - Maximum cardinality 1 means that the slot can have at most one value (single-valued slot)
  - Maximum cardinality greater than 1 means that the slot can have more than one value (multiple-valued slot)
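
The cardinality rules above amount to two comparisons on the number of values a slot holds. A minimal sketch, in which the function name `check_cardinality` and the sample values are assumptions for illustration:

```python
def check_cardinality(values, min_card=0, max_card=None):
    """Return a list of violation messages for one slot's values.

    min_card=1 makes the slot required; max_card=1 makes it single-valued;
    max_card=None means there is no upper bound.
    """
    problems = []
    if len(values) < min_card:
        problems.append(f"needs at least {min_card} value(s), has {len(values)}")
    if max_card is not None and len(values) > max_card:
        problems.append(f"allows at most {max_card} value(s), has {len(values)}")
    return problems


# "A winery has exactly one location": cardinality 1 is min 1 and max 1.
print(check_cardinality(["Pauillac"], min_card=1, max_card=1))
print(check_cardinality([], min_card=1, max_card=1))
print(check_cardinality(["Pauillac", "Napa"], min_card=1, max_card=1))
```

Exact cardinality N is expressed here as `min_card=N, max_card=N`, so no third parameter is needed.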

45 Common Facets: Value Type
- String: a string of characters ("Château Lafite")
- Number: an integer or a float (15, 4.5)
- Boolean: a true/false flag
- Enumerated type: a list of allowed values (high, medium, low)
- Complex type: an instance of another class
  - Specify the class to which the instances belong
  - The Wine class is the value type for the slot "produces" at the Winery class

46 Domain and Range of Slot
- Domain of a slot: the class (or classes) that have the slot
  - More precisely: the class (or classes) instances of which can have the slot
- Range of a slot: the class (or classes) to which slot values belong
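
In the frame-style view used on these slides, domain and range act as checks on who may have a slot and what its values may be. The sketch below follows that reading; the slot table, the instance names, and the helper `check_slot` are hypothetical, and only the "produces" slot between Winery and Wine comes from the slides. (An RDFS reasoner would instead use domain and range to infer types, which this sketch does not attempt.)

```python
# Hypothetical schema: slot name -> (domain class, range class).
SLOTS = {"produces": ("Winery", "Wine")}

# Hypothetical instances with their direct types only.
TYPES = {
    "lafite_winery": "Winery",
    "lafite_1990": "Wine",
    "pauillac": "Region",
}


def check_slot(subject, slot, value):
    """Return violation messages if subject or value has the wrong type for the slot."""
    domain, range_ = SLOTS[slot]
    errors = []
    if TYPES.get(subject) != domain:
        errors.append(f"{subject} is not a {domain}, so it cannot have slot {slot}")
    if TYPES.get(value) != range_:
        errors.append(f"{value} is not a {range_}, so it cannot be a value of {slot}")
    return errors


print(check_slot("lafite_winery", "produces", "lafite_1990"))
print(check_slot("lafite_winery", "produces", "pauillac"))
```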

47 Create Instances
- Create an instance of a class
  - The class becomes a direct type of the instance
  - Any superclass of the direct type is a type of the instance
- Assign slot values for the instance frame
  - Slot values should conform to the facet constraints
  - Knowledge-acquisition tools often check that

48 Ontology Languages
- RDF and RDFS
- DAML+OIL
- OWL
- Cyc-L
- Others
OWL has a special status within the Semantic Web community

49 RDF(S) Terminology and Semantics
- Classes and a class hierarchy
  - All classes are instances of rdfs:Class
  - A class hierarchy is defined by rdfs:subClassOf
- Instances of a class
  - Defined by rdf:type
- Properties
  - Properties are global: a property name in one place is the same as the property name in another (assuming the same namespace)
  - Properties form a hierarchy, too (rdfs:subPropertyOf)

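50 Property Constraints in RDF(S)
- Cardinality constraints
  - No explicit cardinality constraints
  - Any property can have multiple values
- Range of a property
  - a property can have only one range (true of the early RDFS drafts; the 2004 RDFS Recommendation allows several rdfs:range statements, and a value must then belong to all of the stated classes)
- Domain of a property
  - a property can have more than one domain (can be attached to more than one class)
- No default values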

51 OWL: Classes and a Class Hierarchy
- Classes:
  - Each class is an instance of owl:Class
- Class hierarchy:
  - Defined by rdfs:subClassOf
- More ways to specify organization of classes:
  - Disjointness (owl:disjointWith)
  - Equivalence (owl:equivalentClass)
- Predefined:
  - owl:Thing, owl:Nothing

52 Properties in OWL
- Two kinds of property:
  - owl:ObjectProperty: object properties relate objects to other objects. Example: Goes-well-with
  - owl:DatatypeProperty: datatype properties relate objects to datatype values. Example: Has-cost
- Properties can be equivalent:
  - owl:equivalentProperty. Example: bottled-by and produced-by

53 Some Special Properties in OWL
- owl:TransitiveProperty
  - Is more expensive than
- owl:SymmetricProperty
  - Comes from vineyard
- owl:FunctionalProperty
  - Has UPC code
- owl:InverseFunctionalProperty
  - Is UPC code of
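
Two of these property characteristics are easy to illustrate without an OWL reasoner. In the sketch below, transitivity is shown as computing the implied pairs of "is more expensive than", and functionality as spotting subjects that carry two different values of "has UPC code". The wine names, codes, and function names are invented for the example; note also that a real OWL reasoner treats two values of a functional property as evidence that the values denote the same thing, and reports a problem only when they are known to differ, as distinct literal codes are.

```python
def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def functional_conflicts(pairs):
    """Subjects that have more than one distinct value for a functional property."""
    seen = {}
    for subject, value in pairs:
        seen.setdefault(subject, set()).add(value)
    return {subject for subject, values in seen.items() if len(values) > 1}


# Hypothetical "is more expensive than" facts.
more_expensive = {("lafite", "chianti"), ("chianti", "table_wine")}
print(sorted(transitive_closure(more_expensive)))

# Hypothetical "has UPC code" facts; 'chianti' wrongly carries two codes.
upc = [("lafite", "0001"), ("chianti", "0002"), ("chianti", "0003")]
print(functional_conflicts(upc))
```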

54 Ontology Languages
- What is the "right" level of expressiveness?
- What is the "right" semantics?
- When does the language make "too many" assumptions?

55 Semantic Web and Data Integration

56 Unified Access to Interoperable Information from Multi-Data Sources
- Environment: a different ontology is used for a given domain; RDF triples (the data model) represent instance data and metadata; an ontology describes the semantic relationships among the vocabulary terms (roughly, the schema). SPARQL is the unified access language.
- Possible Data Integration Model (diagram). Components shown:
  - Data sources: Content / Email, Content / Web, ..., Data Warehouse
  - Agent 1, Agent 2, Agent 3
  - Data Source Registration
  - Ontology Mgmt
  - Integrated Information Representation (IIR)
  - Information Query Interface
  - Query Generator & return result
  - Partition Query into Subqueries
  - Applying the query against IIR
  - Querying the underlying Data Sources
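
The "partition query into subqueries" step of the model can be sketched as routing each part of a query to the source registered for it and merging the answers. Everything in this sketch is an assumption made for illustration: the sources, their triples, the registration table keyed by predicate, and the function names. It is not the architecture's actual interface, which the slide does not specify.

```python
# Hypothetical sources: each agent wraps one data source exposed as triples.
SOURCES = {
    "email": [("msg:1", "ex:from", "alice"), ("msg:1", "ex:about", "ex:order7")],
    "web": [("ex:order7", "ex:status", "shipped")],
    "warehouse": [("ex:order7", "ex:total", 120)],
}

# Hypothetical registration table: which source answers which predicate.
REGISTRY = {
    "ex:from": "email",
    "ex:about": "email",
    "ex:status": "web",
    "ex:total": "warehouse",
}


def partition(patterns):
    """Group (subject, predicate) lookups by the source registered for the predicate."""
    plan = {}
    for subject, predicate in patterns:
        plan.setdefault(REGISTRY[predicate], []).append((subject, predicate))
    return plan


def run(patterns):
    """Send each subquery to its source and merge the answers into one result."""
    merged = {}
    for source, subquery in partition(patterns).items():
        for subject, predicate in subquery:
            for s, p, o in SOURCES[source]:
                if s == subject and p == predicate:
                    merged[(s, p)] = o
    return merged


query = [("ex:order7", "ex:status"), ("ex:order7", "ex:total")]
print(partition(query))
result = run(query)
print(result)
```

The caller sees one merged result and never addresses a source directly, which is the point of putting registration and partitioning behind a single query interface.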

57 Issues
- Data/query mapping issues:
  - Foreign key support
  - Complex joins
  - RDB constraint representation should be included in the mapping (UNIQUE, NOT NULL, etc.)
  - Effective query rewrite (filter expressions)
  - …
- SPARQL 1.0 missing capabilities (aggregates, GROUP BY, and SELECT expressions were later added in SPARQL 1.1):
  - Aggregation support
  - GROUP BY support
  - SELECT expression support
  - …

SQL              | Semantic Web
Triggers         | Rules
Constraints      | OWL
Table Definition | RDFS or N-dimensional RDF Graph
Relational Model | RDF

58 Relevant Semantic Web Work
- Protégé (Stanford): open-source tool to build domain models and knowledge-based applications with ontologies.
- SquirrelRDF (HPL Bristol, Jena): SPARQL access to RDB, LDAP, and IMAP servers.
- D2RQ (University of Berlin, successor to D2R): popular tool for RDB mapping; can be embedded in Java applications to access an RDB through the Jena and Sesame APIs.
- DARQ: query engine for federated SPARQL queries; extends ARQ (Andy Seaborne).
- SPIDERS (Semantic P2p data Interchange with Distributed agEnts among netwoRk management Systems; University of Texas at Austin): telco application to implement Operations Support Systems (OSS).
- IBM Integrated Ontology Development Toolkit (IODT): ontology storage and management for RDFS/OWL, including a Java API to manipulate ontologies.
- Asio Tool Suite (BBN): allows SPARQL queries that use a domain ontology to span multiple data sources.
- Advanced Knowledge Technologies (AKT): text analytics, knowledge management, and ontology-based cross-media annotation.
- Knowledge extraction from RDF (Georgia University, Dept. of Computer Science): http://iswc2006.semanticweb.org/items/Ramakrishnan2006kx.pdf
- Virtuoso (OpenLink): creates a virtual triple store from one or more data sources and queries it with SPARQL.

59 END

