Published by Lawrence Hornbrook. Modified over 10 years ago.
1
TU/e technische universiteit eindhoven
Web Data and Metadata
Geert-Jan Houben
2
Contents
Evolution in Web data
Techniques and languages for Web data:
–XML
–XML querying: XQuery
–RDF (& RQL)
–OWL
Note: here the context, not the details!
3
Evolution
4
Future of the Web
1. Common syntax: XML
HTML: a fixed set of tags complicates the identification of information elements
XML allows data structures to be defined:
–Tags with freely chosen names (no predefined tags); this enables definition, transmission, validation and interpretation of data between applications (and organizations)
–Freely chosen attributes
Simple definition: DTD
Extended definition: XML Schema
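To make the idea of freely chosen tags and attributes concrete, here is a minimal Python sketch. The element and attribute names (`catalog`, `product`, `ref`, `name`, `price`) are assumptions made up for illustration; the slides do not prescribe them. The point is only that an application can agree on its own vocabulary and then read the data back by name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tag and attribute names are freely chosen,
# they do not come from any predefined vocabulary.
DOC = """
<catalog>
  <product ref="X23"><name>Camera</name><price>359.99</price></product>
  <product ref="Z25"><name>PC</name><price>1299.99</price></product>
</catalog>
"""

def read_products(xml_text):
    """Return (ref, name, price) tuples for every product element."""
    root = ET.fromstring(xml_text)
    return [
        (p.get("ref"), p.findtext("name"), float(p.findtext("price")))
        for p in root.iter("product")
    ]

products = read_products(DOC)
```

Note that the standard-library parser only checks well-formedness; validating against a DTD or an XML Schema would need an additional tool.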
5
Example XML data (element markup not preserved): Bob Quilt Peter Quilt XML-GL Quilt Karin Alice
6
//person/name[../know-how="Quilt"]
union
//seminar[topic="Quilt"]/participant/name
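The query above can be imitated in plain Python as a rough sketch. The document structure used here is an assumption inferred only from the paths in the query (`person/name`, `person/know-how`, `seminar/topic`, `seminar/participant/name`); the original markup is not available. The union is approximated on name strings, whereas a real XPath union works on nodes.

```python
import xml.etree.ElementTree as ET

# Assumed structure, inferred from the paths in the query; not the original data.
DOC = """
<data>
  <person><name>Bob</name><know-how>Quilt</know-how></person>
  <person><name>Peter</name><know-how>XML-GL</know-how></person>
  <seminar>
    <topic>Quilt</topic>
    <participant><name>Karin</name></participant>
    <participant><name>Alice</name></participant>
  </seminar>
</data>
"""

def quilt_names(xml_text):
    root = ET.fromstring(xml_text)
    # //person/name[../know-how="Quilt"]
    experts = [
        n.text
        for p in root.iter("person")
        if any(k.text == "Quilt" for k in p.findall("know-how"))
        for n in p.findall("name")
    ]
    # //seminar[topic="Quilt"]/participant/name
    attendees = [
        n.text
        for s in root.iter("seminar")
        if s.findtext("topic") == "Quilt"
        for n in s.findall("participant/name")
    ]
    # Union: keep each name once, in first-seen order.
    return list(dict.fromkeys(experts + attendees))

names = quilt_names(DOC)
```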
7
Future of the Web
2. Specification of meaning: RDF
Resource: denotes an information item, e.g. via a URL
Property type: name of a property of a resource
Value: value for that property
Example:
–Resource = URL of web page
–Property type = “author”
–Value = “John Smith”
8
Example RDF description (markup not preserved): John Smith smith@home.net Home, Inc.
9
Future of the Web
3. Meaning: ontologies
Ontology = a vocabulary with associated meaning
Possibility to define synonyms, specializations and other relationships
Use of same ontology = contract on meaning of words (tags, attributes)
Often industry or domain dependent
10
Future of the Web
4. Logic to derive conclusions
Necessary in electronic commerce: what do the messages exchanged between supplier and customer mean?
5. Goal: trust in the meaning of communication between Web systems, and hence the possibility to automate using agents
Ref: www.w3.org
11
Web Data Integration
The WIS repository (back-end) is typically assembled from different heterogeneous sources, e.g. databases, files, WWW
To manage (coordinate) data from different sources, metadata helps to structure the data
12
Metadata
Describing the data and its availability
Sometimes provided by sources
Needed by IS Engineering metadata:
–Meaning
–Validity
–Quality
Specifying “logistics” of data
13
XML
Semistructured data
14
XML: Complex data
Structure is irregular (missing/extra data)
Schema does not exist or is unknown
Schema is rapidly evolving
Relational and ODB models are too rigid
Standard is a document/hypertext language: HTML
Solution: semistructured data model
XML
–data model consists of a type definition language, a query/update language and more
15
XML Environment
Follow-up of SGML, markup language for documents, and OO databases
XML: eXtensible Mark-up Language
–W3C and most industrial companies [B2B]
–Main idea: separate content and presentation
–Use tags to represent structure and semantics
Ref: www-rocq.inria.fr/~abitebou/pub/lics01.ppt
16
HTML = Hypertext Language
Information System:
Ref   Name    Price
X23   Camera  359.99
R2D2  Robot   19350.00
Z25   PC      1299.99
HTML: The X23 new camera replaces the X22. It comes equipped with a flash (worth by itself 53.99 $) and provides great quality for only 359.99 $.
Text + presentation
Where is the data? (hard)
17
XML = Semistructured Data
Information System:
Ref   Name    Price
X23   Camera  359.99
R2D2  Robot   19350.00
Z25   PC      1299.99
...
XML (element markup not preserved): camera 359.99 … Robot 19350 … ...
Data + Structure
Semistructured: more flexible (easy)
18
XML Flexibility
no fixed set of tags
no fixed interpretation/rendering of tags
no fixed structure
19
Example purchase order (element markup not preserved): Alice Smith 123 Maple Street Mill Valley CA 90952 Robert Smith 8 Oak Avenue Old Town PA 95819 Hurry, my lawn is going wild! Lawnmower 1 148.95 Confirm this is electric Baby Monitor 1 39.98 1999-05-21
20
XML Documents
elements and attributes
elements are ordered
attribute values are strings
well-formed documents (e.g. proper nesting)
namespaces: vocabularies for tags
valid documents: DTD, Schema
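The difference between a well-formed and a malformed document can be checked with a few lines of Python. This is a minimal sketch using the standard-library parser; the sample element names are made up, and the check covers well-formedness only, not validity against a DTD or Schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested: every opening tag is closed in reverse order.
ok = is_well_formed("<order><item>Lawnmower</item></order>")
# Improperly nested: <item> is still open when </order> appears.
bad = is_well_formed("<order><item>Lawnmower</order></item>")
```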
21
DTD: a grammar
Catalog → Product*
Product → Name Price? Cat (Part Quantity)*
Part → BasicPart + ComposedPart
BasicPart → Name
ComposedPart → Name (Part Quantity)*
22
XML Schema
to define a class of documents: those conforming to a schema
in XML syntax
built-in types
23
Example XML Schema (markup not preserved): Purchase order schema for Example.com. Copyright 2000 Example.com. All rights reserved. ...
24
...
25
Typing XML
Not really the true spirit of the Web, but essential for data management: query optimization, user interfaces, applications
Differences with standard database typing
–Collections are sequences instead of sets
–Types may be very large (e.g., from integration)
–Data is more irregular, so types should be more permissive
–Sometimes new issues: you have the data and must extract its type (an approximate type)
26
More on XML
The Database Models course in BIS, given by De Bra and Paredaens, will pay much more attention to the XML data model.
Also, look at the W3C site: w3c.org
27
XML Querying
XQuery
28
XML query language
XML is used for data exchange on the Web
W3C develops a standard: XML Query Working Group
XML Query Data Model
XPath and XQuery
Ref: www.w3.org/XML/Query
29
XPath
Path expressions in OO databases
/Students/Student/Status
Semistructured:
–missing parts: /Students//Status
–conditions: /Students/Student[Status="U4"]
Indexing, wildcards
Selection, string manipulation, aggregation, attribute existence, union
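The three path forms above can be tried out with the limited XPath subset in Python's standard library. This is only a sketch: the sample document, including the `Name` and `Info` elements, is an assumption made up for illustration, and the paths are written relative to the `Students` root element because that is how `ElementTree` evaluates them.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: one Status is a direct child, one is nested, one is missing.
DOC = """
<Students>
  <Student><Name>Ann</Name><Status>U4</Status></Student>
  <Student><Name>Ben</Name><Info><Status>G1</Status></Info></Student>
  <Student><Name>Cy</Name></Student>
</Students>
"""

root = ET.fromstring(DOC)  # root is the Students element

# /Students/Student/Status : Status must be a direct child of Student
direct = [s.text for s in root.findall("Student/Status")]

# /Students//Status : Status at any depth below Students
anywhere = [s.text for s in root.findall(".//Status")]

# /Students/Student[Status="U4"] : condition on the text of a child element
u4 = [s.findtext("Name") for s in root.findall("Student[Status='U4']")]
```

The `//` form is what makes path expressions tolerant of irregular, semistructured data: the nested `Status` is found even though its position differs from the first record.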
30
XSLT
XSL: Extensible Stylesheet Language
–(XSLT: XSL Transformations)
declarative language for transforming XML documents
using an XSLT processor
31
XQuery
http://www.w3.org/XML/Query
“the” standard for XML querying
Goal of the WG: “data model for XML documents, a set of query operators on that data model, and a query language based on these query operators”
General query language (next to XPath + XSLT)
32
XQuery Path Expressions
Based on XPath
In the second chapter of the document named “zoo.xml”, find the figure(s) with caption “Tree Frogs”.
document("zoo.xml")/chapter[2]//figure[caption="Tree Frogs"]
Find captions of figures that are referenced by elements in the chapter of “zoo.xml” with title “Frogs”.
document("zoo.xml")/chapter[title="Frogs"]//figref/@refid->fig/caption
33
XQuery Element Constructor
Generate an element that has an “empid” attribute. The value of the attribute and the content of the subelements are specified by variables that are bound in other parts of the query.
(element markup not preserved) {$name} {$job}
34
XQuery FLWR Expression
FOR var IN expr      binding clause
LET var := expr      binding clause
WHERE expr           select predicate
RETURN expr          output generation
List the titles of books published by Morgan Kaufmann in 1998.
FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
RETURN $b/title
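For readers more at home in a general-purpose language, the FOR / WHERE / RETURN pattern maps closely onto a list comprehension. The following Python sketch is an analogy, not XQuery itself; the `BIB` string is a made-up stand-in for `bib.xml`, and the helper name `titles` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for bib.xml; the records are invented for illustration.
BIB = """
<bib>
  <book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>2000</year></book>
  <book><title>Book C</title><publisher>Other Press</publisher><year>1998</year></book>
</bib>
"""

def titles(xml_text, publisher, year):
    root = ET.fromstring(xml_text)
    return [
        b.findtext("title")                      # RETURN $b/title
        for b in root.iter("book")               # FOR $b IN ...//book
        if b.findtext("publisher") == publisher  # WHERE $b/publisher = ...
        and b.findtext("year") == year           #   AND $b/year = ...
    ]

result = titles(BIB, "Morgan Kaufmann", "1998")
```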
35
FLWR Expression
List each publisher and the average price of its books.
FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")/book[publisher=$p]/price)
RETURN (element markup not preserved) {$p/text()} {$a}
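The interplay of FOR (iterate over distinct publishers) and LET (bind the whole sequence of matching prices once per iteration) can be sketched in Python as follows. The sample data and the function name `average_price_by_publisher` are assumptions for illustration; the result is returned as a dictionary rather than as constructed XML elements.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Stand-in for bib.xml; publishers and prices are invented.
BIB = """
<bib>
  <book><publisher>P1</publisher><price>10.0</price></book>
  <book><publisher>P2</publisher><price>30.0</price></book>
  <book><publisher>P1</publisher><price>20.0</price></book>
</bib>
"""

def average_price_by_publisher(xml_text):
    root = ET.fromstring(xml_text)
    books = root.findall("book")
    # FOR $p IN distinct(...//publisher): distinct names in first-seen order
    publishers = list(dict.fromkeys(b.findtext("publisher") for b in books))
    result = {}
    for p in publishers:
        # LET $a := avg(.../book[publisher=$p]/price): one value per iteration
        prices = [float(b.findtext("price")) for b in books
                  if b.findtext("publisher") == p]
        result[p] = mean(prices)
    return result

averages = average_price_by_publisher(BIB)
```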
36
Operators and Functions
Find the maximum depth of the document named “partlist.xml”.
NAMESPACE xsd=http://www.w3.org/2001/XMLSchema-datatypes
FUNCTION depth(ELEMENT $e) RETURNS xsd:integer
{
  -- An empty element has depth 1
  -- Otherwise, add 1 to max depth of children
  IF empty($e/*) THEN 1
  ELSE max(depth($e/*)) + 1
}
depth(document("partlist.xml"))
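The same recursion is easy to express in Python, which may help in checking the logic. This is a sketch of the idea, not a translation of the draft XQuery syntax; the inline sample document stands in for `partlist.xml` and is made up.

```python
import xml.etree.ElementTree as ET

def depth(element):
    """An element without child elements has depth 1; otherwise 1 + max child depth."""
    children = list(element)
    if not children:
        return 1
    return 1 + max(depth(child) for child in children)

# Hypothetical part list: a part containing a part containing a part.
root = ET.fromstring("<part><part><part/></part><part/></part>")
max_depth = depth(root)
```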
37
Conditional Expression
Make a list of holdings, ordered by title. For journals, include the editor, and for all other holdings, include the author.
FOR $h IN //holding
RETURN (element markup not preserved) {$h/title, IF $h/@type="Journal" THEN $h/editor ELSE $h/author } SORTBY (title)
38
Quantified Expressions
Find titles of books in which both sailing and windsurfing are mentioned in the same paragraph.
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES contains($p,"sailing") AND contains($p,"windsurfing")
RETURN $b/title
Find titles of books in which sailing is mentioned in every paragraph.
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES contains($p,"sailing")
RETURN $b/title
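SOME and EVERY correspond to existential and universal quantification, which Python exposes as `any` and `all`. The sketch below uses a made-up three-book document (the `library` wrapper and the titles are assumptions) to show that the two queries select different books.

```python
import xml.etree.ElementTree as ET

# Invented sample data for illustration.
DOC = """
<library>
  <book><title>Wind and Water</title>
    <para>sailing and windsurfing on the lake</para>
    <para>more about sailing</para>
  </book>
  <book><title>Two Sports</title>
    <para>sailing basics</para>
    <para>windsurfing basics</para>
  </book>
  <book><title>Sailing Only</title>
    <para>sailing</para>
    <para>sailing again</para>
  </book>
</library>
"""

def text_of(elem):
    """Concatenate all text inside an element, including nested elements."""
    return "".join(elem.itertext())

books = list(ET.fromstring(DOC).iter("book"))

# SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
some_titles = [
    b.findtext("title") for b in books
    if any("sailing" in text_of(p) and "windsurfing" in text_of(p)
           for p in b.iter("para"))
]

# EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every_titles = [
    b.findtext("title") for b in books
    if all("sailing" in text_of(p) for p in b.iter("para"))
]
```

As with `all` on an empty list, a universally quantified condition over a book with no paragraphs would hold trivially.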
39
Other expressions
Sequence-related expressions
–Example: ($x,$y,$z)
–PRECEDES, FOLLOWS
Operators on data types
–INSTANCEOF
–CAST
–TREAT
40
More on XQuery
The Database Models course in BIS, given by De Bra and Paredaens, will pay much more attention to XML query languages.
Also, look at the W3C site: w3c.org
41
RDF
RQL
42
Resource Description Framework
W3C standard for metadata description
Describes the “meaning” of data like Web sites, parts of HTML pages, etc.
Makes data “machine-understandable” – allows automated data processing
Framework that allows you to make simple assertions about anything: distributed and extensible (as is the Web)
“meaning” expressed via “subclass of”
Ref: www.w3.org/RDF, www.w3.org/TR/rdf-primer
43
Basic RDF Model
Recognizes 3 object types:
–Resources – always named by URI, e.g. web site, part of web page, others
–Properties – an attribute of a Resource, its characteristics
–Statements – Resource + Property + Property Value
44
Basic RDF Model Example
RDF representation of the sentence: “Ora Lassila is the creator of the resource www.w3.org/Home/Lassila.”
Statement:
–Subject (Resource): www.w3.org/Home/Lassila
–Predicate (Property): Creator
–Object (Literal): “Ora Lassila”
45
Basic RDF Model Example
In general: subject HAS predicate object
Here: www.w3.org/Home/Lassila HAS Creator Ora Lassila
Diagram of the statement: www.w3.org/Home/Lassila --Creator--> Ora Lassila
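A statement in this model is just a three-part record, which a short Python sketch can make tangible. The class name `Statement` and the helper `describe` are hypothetical names chosen for illustration; only the three values come from the example above.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    subject: str    # the resource, named by a URI
    predicate: str  # the property
    obj: str        # the property value (a literal in this example)

def describe(st):
    """Render a statement in the 'subject HAS predicate object' reading."""
    return f"{st.subject} HAS {st.predicate} {st.obj}"

lassila = Statement("www.w3.org/Home/Lassila", "Creator", "Ora Lassila")
sentence = describe(lassila)
```

A set of such tuples is one simple way to hold an RDF model in memory; when the object is itself a resource, further statements can be made about it.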
46
RDF and XML
RDF can be implemented using XML
The complete XML for the previous example is:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:s="http://description.org/schema/">
  <rdf:Description rdf:about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>
47
Structured Value Example
“The employee with ID 85740, Ora Lassila, with Email lassila@w3.org, is the creator of the resource www.w3.org/Home/Lassila”
Diagram:
www.w3.org/Home/Lassila --Creator--> www.w3.org/staffid/85740
www.w3.org/staffid/85740 --Name--> Ora Lassila
www.w3.org/staffid/85740 --Email--> lassila@w3.org
In XML it is (markup not preserved): Ora Lassila lassila@w3.org
48
RDF - more
Property value can be a literal or a resource
One subject can have more than one property
It is possible to make statements about statements
It is possible to refer to a collection of resources (containers) of 3 types:
–Bag – a property has multiple values, order has no significance
–Sequence – a property has multiple values, order is significant
–Alternative – list of literals/resources representing alternatives for a single property
49
RDF Schemas and Namespaces
The meaning of terms used in statements, like “Creator”, “Name”, “Email”, is expressed by reference to RDF Schemas (“domain definition”)
An RDF Schema provides information about the interpretation of the statements in a given RDF model
An RDF Schema is usually a separate document
To avoid confusion between different definitions of the same term, RDF Schemas use the Namespace facility.
xmlns:s="http://description.org/schema"
xmlns:v="http://description.org/differentschema"
(remaining markup not preserved) Ora Lassila
50
RDF Query Language
Querying RDF metadata
–SQL/XQL style approach, viewing RDF metadata as a relational or XML database [RDF Query Specification (IBM)]
–viewing Web descriptions by RDF metadata as a knowledge base, applying knowledge representation and reasoning techniques [W3C related]
RQL
Ref: 139.91.183.30:9090/RDF/publications/bda01.PDF
139.91.183.30:8999/RQLdemo/
51
RQL
subClassOf(Artist)
subClassOf^(Artist)
SELECT $C1, $C2 FROM {$C1}creates{$C2}
SELECT X, Y FROM {X}last_modified{Y} WHERE Y >= 2000-01-01
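The first two queries ask a schema-level question about the class hierarchy. The Python sketch below illustrates the distinction under an assumption that should be checked against the RQL documentation: that `subClassOf(Artist)` yields all subclasses transitively, while the `^` variant restricts the result to direct subclasses. The hierarchy itself (`Painter`, `Sculptor`, `Cubist`) is hypothetical.

```python
# Hypothetical class hierarchy: class -> its direct superclass.
SUBCLASS_OF = {
    "Painter": "Artist",
    "Sculptor": "Artist",
    "Cubist": "Painter",
}

def direct_subclasses(cls):
    """Immediate subclasses only, in the spirit of subClassOf^(cls) (assumed semantics)."""
    return sorted(c for c, parent in SUBCLASS_OF.items() if parent == cls)

def all_subclasses(cls):
    """Direct and indirect subclasses, in the spirit of subClassOf(cls) (assumed semantics)."""
    found = []
    for child in direct_subclasses(cls):
        found.append(child)
        found.extend(all_subclasses(child))
    return sorted(found)

direct = direct_subclasses("Artist")
transitive = all_subclasses("Artist")
```

Being able to query the schema in this way, and not only the data, is what sets this style of language apart from a purely relational or XML view of RDF.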
52
OWL
53
OWL
Web Ontology Language
used to explicitly represent the meaning of terms in vocabularies and the relationships between terms: ontology
–ontology engineering
beyond XML and RDF(S)
revision of DAML+OIL
54
Stack
XML: surface syntax for structured documents (no semantic constraints on meaning)
XML Schema: restricting structure of XML documents
RDF: data model for objects (resources) and relationships, provides simple semantics for this data model
RDF Schema: vocabulary for describing properties and classes of RDF resources, with semantics for generalization hierarchies
OWL: adds vocabulary for describing properties and classes, e.g. relations between classes (disjoint), cardinality (exactly one), equality, richer typing of properties, characteristics of properties (symmetry), enumerated classes
55
OWL Sublanguages
OWL Lite: classification hierarchy and simple constraints
OWL DL: maximum expressiveness while retaining computational completeness and decidability (description logics)
OWL Full: maximum expressiveness and syntactic freedom of RDF with no computational guarantees
56
OWL Lite
RDF Schema features: Class, rdf:Property, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, Individual
(In)Equality: equivalentClass, equivalentProperty, sameIndividualAs, differentFrom, allDifferent
Property characteristics: inverseOf, TransitiveProperty, SymmetricProperty, FunctionalProperty, InverseFunctionalProperty
Property type restrictions: allValuesFrom, someValuesFrom
Restricted cardinality: minCardinality (0/1), maxCardinality (0/1), cardinality (0/1)
Class intersection: intersectionOf
57
OWL DL and Full
Class axioms: oneOf, disjointWith, equivalentClass, rdfs:subClassOf (both applied to class expressions)
Boolean combinations of class expressions: unionOf, intersectionOf, complementOf
Arbitrary cardinality: minCardinality, maxCardinality, cardinality
58
References
There is a lot of information available through the W3C site.
Depending on your background, have a close look at some of the languages and the ideas behind them.