An Introduction to XML and Related Standards

Presentation transcript:

1 XML Basics –Semi-structured data –DTD –XML Schema XML transforming and querying –XPath –XSLT –XQuery Semantic Web –RDF –OWL An introduction to XML and Related Standards

2 Background: Markup and Markup Language Markup –Annotations (tags) for carrying information about a document’s content, e.g. a writer’s handwritten notes for typesetting or an editor’s corrections in a manuscript Markup Language –A language that defines a syntax and grammar for tags

3 Background: SGML SGML –Standard Generalized Markup Language –Standardized in 1986 (ISO) –A language for defining markup languages –And for marking-up content –Syntax + Document Type Definition (DTD) –Tools aimed at document management

4 Background: HTML HTML –A markup language –A particular SGML Document Type (called an “application”) –Tools for browsing and authoring

5 Background: Limitations of SGML and HTML SGML –Complex, many options and shortcuts –Must know the DTD to parse correctly –Cost of SGML technology is high HTML –Not extensible—can’t define new tags –Tags for presenting data not describing it –Doesn’t capture much document structure or content meaning

6 Enter XML XML (Extensible Markup Language) –Standardized by W3C in 1998 –For data interchange over the Web –A simpler SGML: actually a subset of SGML; DTDs are optional; fewer features and options –Widely available tools for parsing, authoring, browsing, etc.

7 Uses for XML Why XML? –Capture logical structure of documents: presentation independent –Data interchange: XML is implementation independent –Storage format: any successful interchange format becomes a storage format –Metadata: searching, filtering, organizing –Data packaging, movement, and processing: client-side processing, server-to-server communication, non-browser-based clients, simplified server processing, etc.

8 The Many Standards of XML –XML Document: XML, DTD –Query: XQuery, XQL, XML-QL –Programming: Document Object Model (DOM) –Transformation: XSLT, for rearranging and restructuring XML documents –Transport: XML-RPC, SOAP, XML-Protocol, for message and object serialization and remote procedure calls –Metadata: RDF, OWL, using XML to define resource metadata –Schema and Types: XML Schema –Linking: XLink, for simple and complex hyperlinks between XML documents –Addressing: XPath and XPointer, for addressing XML subdocuments

9 The Running Example Lego Product Catalogs –catalogs have: a publishing date, an identifier, a title, etc. –catalogs are made up of products either a kit or accessory each has an item #, price, name, picture, etc. kits can have an age level, # of pieces, set type (duplo, basic), a theme (star wars), a system (space)

10 An Example XML Catalog Document 2000 X-Wing Fighter Star Wars Take to the skies with Luke as he battles the forces of evil!

11 An Example XML Document prolog body elements have start and end-tags elements can also contain content elements are nested “boxes within boxes” 2000 X-Wing Fighter Star Wars Take to the skies with Luke as he battles the forces of evil! …
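
The markup on this slide did not survive extraction, so the following is only a minimal sketch of the structure it describes: a prolog, a single root element, and nested elements with start and end tags. It uses Python's standard `xml.etree.ElementTree`. The element names are assumptions borrowed from the DTD slides (`LegoCatalog`, `pubDate`, `kit`, `name`, `theme`, `desc`); which element held which value in the original is a guess.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction: element names are assumed, not recovered from the slide.
CATALOG_XML = """<?xml version="1.0"?>
<LegoCatalog>
  <pubDate>2000</pubDate>
  <kit>
    <name>X-Wing Fighter</name>
    <theme>Star Wars</theme>
    <desc>Take to the skies with Luke as he battles the forces of evil!</desc>
  </kit>
</LegoCatalog>
"""

def outline(element, depth=0):
    """Return one indented line per element, showing the 'boxes within boxes' nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(CATALOG_XML)
catalog_outline = outline(root)
print("\n".join(catalog_outline))
```

The printed outline makes the nesting visible: `kit` sits inside `LegoCatalog`, and `name`, `theme` and `desc` sit inside `kit`.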

12 Well Formed Documents Well-formed XML documents: –A single root element –Start and end tags required (unlike HTML) X-Wing Fighter empty-element tags: –Elements must be properly nested 263 –More rules: naming elements, document has at least one element, etc. This is NOT properly nested!!!
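
The well-formedness rules can be tried out with any XML parser, since a conforming parser must reject input that breaks them. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample element names (`kit`, `name`, `pieces`, `image`) are illustrative assumptions, not the slide's lost examples.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule on the slide.
samples = {
    "properly nested": "<kit><name>X-Wing Fighter</name><pieces>263</pieces></kit>",
    "empty-element tag": "<kit><image/></kit>",
    "overlapping tags": "<kit><name>X-Wing Fighter<pieces></name>263</pieces></kit>",
    "missing end tag": "<kit><name>X-Wing Fighter</kit>",
    "two root elements": "<kit/><kit/>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first two samples parse; the other three each violate one rule (nesting, required end tags, single root).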

13 XML Attributes Elements can contain attributes. [Figure: a start tag labelled with its element name and three attribute name / attribute value pairs] Attributes are always assigned in element start tags (or empty-element tags), their values are always quoted (with single or double quotes), and each attribute name must be unique within the element

14 Attributes vs. Content In general, it is up to the document designer In SGML, content usually was for data you see and attributes for metadata

15 DTD and XML Schema

16 Document Type Definition Why DTDs? –To standardize tags and structure for interchange and creation –To make the documents machine processable What is a DTD? –A grammar for describing XML documents (tags, attributes, nesting, etc.) –An XML document that is well-formed and conforms to a DTD is said to be valid

17 An Example DTD: Elements <!ELEMENT kit (name, ages, pieces, theme?, series?, desc)> An element content model for LegoCatalog; a character data content model for pubDate. Operators: * zero or more; + one or more; ? optional; | choice; , strict sequence; () grouping. There are also Empty, Any, and Mixed content models

18 An Example DTD: Attributes <!ATTLIST kit price CDATA #REQUIRED shipWeight CDATA #REQUIRED avail (yes | no) #IMPLIED image CDATA "na.jpg" unitId ID #IMPLIED > <!ATTLIST accessory forKits IDREFS #IMPLIED orderStatus CDATA #FIXED "special" > Each attribute has the form: attr-name type default-decl. Types: CDATA = character data; ID = unique identifier; IDREF = reference to an ID; IDREFS = list of references; enumeration = list of possible values. Default declarations: #REQUIRED = must appear; #IMPLIED = optionally appears; #FIXED + default = if the attribute is missing, the parser assumes the value, and if present it must equal that value; default only = if the attribute is missing, the default is assumed, otherwise any value is allowed

19 Limitations of DTDs DTDs are not optimal –Not well-formed XML can’t parse them with an XML parser need different tools to create them + but at least you can sort-of read/understand them –Limited support for defining data types –Limited modeling capabilities hard to express some structures no support for reusing structure

20 XML Schema W3C proposed recommendation (2001) Divided into 2 parts: structures, datatypes Main features –Schemas are themselves well-formed XML documents –A schema can span multiple documents –Can define new data types and constraints –Inheritance among content model types –Improves data interchange: offers more precision for computer-to-computer transfer

21 The.xsd file <xs:schema xmlns:xs=“ targetNamespace=“ version=“1.1”> …. xmlns:xs - use the ‘xs’ prefix to reference elements defined in a schema from another namespace targetNamespace - all the elements and types defined in this schema come from this namespace. Use this URI to import or include these definitions in other schemas

22 Example XML Schema <xs:element name="kit" type="Product" minOccurs="1" maxOccurs="unbounded"/> <xs:element name="accessory" type="Product" minOccurs="0" maxOccurs="unbounded"/> ... Many ways to describe new data types (not just regular expressions) ComplexType = Content Model

23 Main Schema Components Definitions of: –Complex types = sub-elements + attributes –Simple types = no sub-elements, constraints on strings(datatypes) Declarations of: –elements (of simple and complex types) –attributes (simple types), attribute groups

24 Simple Type Definitions Can have: built-in, pre-declared or anonymous simple type definitions. ……

25 Example of Complex Type Definition …

26 Constraints on Element Content content = –textOnly : only character data –mixed : character data appears alongside subelements –elementOnly : only subelements –empty : no content (only attributes) –any

27 Datatype Example This creates a new datatype called 'TelephoneNumber'. Elements of this type can hold string values, but the string must be exactly 8 characters long and must follow the pattern ddd-dddd, where 'd' represents a digit.
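
The two constraints described for 'TelephoneNumber' (a length of exactly 8 and the pattern ddd-dddd) can be sketched outside XML Schema as an ordinary regular-expression check. This is a Python illustration of the same constraints, not the schema declaration from the slide; the function name `is_telephone_number` is an assumption.

```python
import re

# \d{3}-\d{4} spells out ddd-dddd; a full match also forces the length to 8.
TELEPHONE_PATTERN = re.compile(r"\d{3}-\d{4}")

def is_telephone_number(value):
    """Check a string against the length and pattern constraints described on the slide."""
    return len(value) == 8 and TELEPHONE_PATTERN.fullmatch(value) is not None

for candidate in ["555-1212", "5551212", "555-12123", "abc-defg"]:
    print(candidate, is_telephone_number(candidate))
```

With this particular pattern the length check is redundant, but it is kept to mirror the two separate constraints the slide lists.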

28 XPath

29 What is XPath? XPath is a syntax used for selecting parts of an XML document The way XPath describes paths to elements is similar to the way an operating system describes paths to files XPath is almost a small simple programming language; it has functions, tests, and expressions XPath is a W3C standard XPath is not itself written as XML, but is used heavily in XSLT, XML Schema and XQuery

30 Terminology library is the parent of book; book is the parent of the two chapters. The two chapters are the children of book, and the section is the child of the second chapter. The two chapters of the book are siblings (they have the same parent). library, book, and the second chapter are the ancestors of the section. The two chapters, the section, and the two paragraphs are the descendants of the book
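
The example document for this slide was lost, so here is a minimal sketch with a hypothetical document of the shape the text describes (a library, one book, two chapters, a section in the second chapter, two paragraphs in the section). It uses Python's `xml.etree.ElementTree` to recover parents, ancestors and descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with the shape the slide describes.
LIBRARY_XML = """
<library>
  <book>
    <chapter/>
    <chapter>
      <section>
        <paragraph/>
        <paragraph/>
      </section>
    </chapter>
  </book>
</library>
"""
root = ET.fromstring(LIBRARY_XML)

# ElementTree elements do not store a parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(element):
    """Walk up the parent map, returning tags from the parent up to the root."""
    tags = []
    while element in parent_of:
        element = parent_of[element]
        tags.append(element.tag)
    return tags

book = root.find("book")
section = root.find("book/chapter/section")
book_children = [child.tag for child in book]
section_ancestors = ancestors(section)
book_descendants = [e.tag for e in book.iter()][1:]  # iter() yields book itself first
print(book_children, section_ancestors, book_descendants)
```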

31 Slashes A path that begins with a / represents an absolute path, starting from the top of the document –Example: /message/header/from –Note that even an absolute path can select more than one element –A slash by itself means “the whole document” A path that does not begin with a / represents a path starting from the current element –Example: header/from A path that begins with // can start from anywhere in the document –Example: //header/from selects every from element that is a child of a header element –This can be expensive, since it involves searching the entire document
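
A small sketch of relative and "anywhere" paths, using Python's `xml.etree.ElementTree`. Note the assumptions: ElementTree implements only a subset of XPath and always evaluates a path relative to an element, so the absolute path /message/header/from is written as `header/from` from the root element, and //header/from is written as `.//header/from`. The document content is hypothetical; only the element names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical message document.
MESSAGE_XML = """
<message>
  <header><from>alice</from><to>bob</to></header>
  <body>
    <quoted><header><from>carol</from></header></quoted>
  </body>
</message>
"""
root = ET.fromstring(MESSAGE_XML)  # root is the <message> element

# Path starting from the current element: the XPath header/from
direct = [e.text for e in root.findall("header/from")]

# Path that can start anywhere: the XPath //header/from
anywhere = [e.text for e in root.findall(".//header/from")]

print(direct)
print(anywhere)
```

The second query also finds the `from` inside the nested `header`, which is why it has to search the whole tree.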

32 Brackets and last() A number in brackets selects a particular matching child (counting starts from 1, except in Internet Explorer) –Example: /library/book[1] selects the first book of the library –Example: //chapter/section[2] selects the second section of every chapter in the XML document –Example: //book/chapter[1]/section[2] –Only matching elements are counted; for example, if a book has both section s and exercise s, the latter are ignored when counting section s The function last() in brackets selects the last matching child –Example: /library/book/chapter[last()] You can even do simple arithmetic –Example: /library/book/chapter[last()-1]
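
The positional predicates above can be tried with Python's `xml.etree.ElementTree`, whose XPath subset supports `[n]`, `[last()]` and `[last()-n]`. This is a sketch on a hypothetical library; the `id` attributes exist only to make the results readable, and paths are written relative to the `library` root element rather than as absolute paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical library: the first book has three chapters, the second has one.
LIBRARY_XML = """
<library>
  <book>
    <chapter id="c1"/>
    <chapter id="c2"/>
    <chapter id="c3"/>
  </book>
  <book>
    <chapter id="c4"/>
  </book>
</library>
"""
root = ET.fromstring(LIBRARY_XML)

def ids(path):
    """Return the id attribute of every element the path selects."""
    return [e.get("id") for e in root.findall(path)]

first_book_chapters = ids("book[1]/chapter")   # /library/book[1]/chapter
last_chapters = ids("book/chapter[last()]")    # /library/book/chapter[last()]
next_to_last = ids("book/chapter[last()-1]")   # /library/book/chapter[last()-1]
print(first_book_chapters, last_chapters, next_to_last)
```

The second book contributes nothing to the last query because it has no next-to-last chapter.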

33 Stars A star, or asterisk, is a “wildcard” -- it means “all the elements at this level” –Example: /library/book/chapter/* selects every child of every chapter of every book in the library –Example: //book/* selects every child of every book –Example: /*/*/*/paragraph selects every paragraph that has exactly three ancestors –Example: //* selects every element in the entire document
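
The wildcard examples can be sketched with Python's `xml.etree.ElementTree` on a hypothetical document. Because ElementTree evaluates paths relative to an element, the root element plays the role of the first `*` in /*/*/*/paragraph, and `.//*` returns every element below the root rather than the root itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical library with paragraphs at two different depths.
LIBRARY_XML = """
<library>
  <book>
    <title/>
    <chapter>
      <section><paragraph/></section>
      <paragraph/>
    </chapter>
  </book>
</library>
"""
root = ET.fromstring(LIBRARY_XML)

def tags(path):
    """Return the tag of every element the path selects."""
    return [e.tag for e in root.findall(path)]

chapter_children = tags("book/chapter/*")  # /library/book/chapter/*
book_children = tags(".//book/*")          # //book/*
depth_four = tags("*/*/paragraph")         # /*/*/*/paragraph, with root as the first *
all_descendants = tags(".//*")             # //* without the root element
print(chapter_children, book_children, depth_four, all_descendants)
```

Only the paragraph directly under `chapter` has exactly three element ancestors; the one inside `section` has four and is not selected.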

34 Attributes I You can select attributes by themselves, or elements that have certain attributes –Remember: an attribute consists of a name-value pair, for example in <chapter num="…">, the attribute is named num –To choose the attribute itself, prefix the name with @ –Example: //@num will choose every attribute named num –Example: //@* will choose every attribute, everywhere in the document To choose elements that have a given attribute, put the attribute name in square brackets –Example: //chapter[@num] will select every chapter element (anywhere in the document) that has an attribute named num

35 Attributes II //chapter[@num] selects every chapter element with an attribute num; //chapter[not(@num)] selects every chapter element that does not have a num attribute; //chapter[@*] selects every chapter element that has any attribute; //chapter[not(@*)] selects every chapter element with no attributes

36 Values of attributes //chapter[@num='3'] selects every chapter element with an attribute num with value 3 The normalize-space() function can be used to remove leading and trailing spaces from a value before comparison –Example: //chapter[normalize-space(@num)='3']
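
A sketch of the attribute tests from the last three slides, using Python's `xml.etree.ElementTree` on a hypothetical document. ElementTree's XPath subset handles `[@num]` and `[@num='3']` directly; `not()`, `@*` and `normalize-space()` are outside that subset, so they are emulated here in plain Python rather than expressed as paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical book: attribute values chosen to exercise each test.
BOOK_XML = """
<book>
  <chapter num="1"/>
  <chapter num=" 3 "/>
  <chapter num="3" title="Axes"/>
  <chapter/>
</book>
"""
root = ET.fromstring(BOOK_XML)
chapters = root.findall(".//chapter")

with_num = root.findall(".//chapter[@num]")       # //chapter[@num]
num_is_3 = root.findall(".//chapter[@num='3']")   # //chapter[@num='3']

# Emulations of predicates that ElementTree does not support.
without_num = [c for c in chapters if "num" not in c.attrib]  # //chapter[not(@num)]
any_attribute = [c for c in chapters if c.attrib]             # //chapter[@*]
no_attributes = [c for c in chapters if not c.attrib]         # //chapter[not(@*)]
num_is_3_trimmed = [c for c in chapters                       # normalize-space(@num)='3'
                    if " ".join(c.get("num", "").split()) == "3"]

counts = [len(x) for x in (with_num, num_is_3, without_num,
                           any_attribute, no_attributes, num_is_3_trimmed)]
print(counts)
```

The plain comparison misses the chapter whose value is " 3 ", while the whitespace-normalized comparison finds it.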

37 Location Path The central construct is the location path: location path = location step / … / location step. Example: child::section [ position()<6 ] / descendant::cite / attribute::href selects all href attributes in cite elements in the first 5 sections of a document. A location step is evaluated with respect to some context. A location path is evaluated left-to-right, starting with some initial context: each node resulting from the evaluation of one step is used as the context for evaluating the next step, and the results are unioned together

38 Location Step location step = axis :: node-test [ predicate ] axis a rough set of candidate nodes – e.g. the child nodes of the context node node-test performs an initial filtration based on – types: chardata node, processing instruction, etc. – names: element name predicates a further, more complex, filtration. only candidates for which the predicates evaluate to true are kept child::section [ position()<6 ] / descendant::cite / attribute::href

39 Axes :: Node-test [ Predicate ] Example: child::section [ position()<6 ] / descendant::cite / attribute::href –Axes: child, descendant, parent, ancestor, following-sibling, preceding-sibling, following, preceding, attribute, namespace, self, descendant-or-self, ancestor-or-self –Node tests: name, *, text(), comment(), processing-instruction(), node() –Predicates: [attribute::name="flour"], [attribute::name!="flour"], [attribute::amount="0.5" and attribute::unit="cup"], [position()=2]

40 Abbreviations –child:: is abbreviated to nothing (so child is the default axis) –attribute:: is abbreviated to @ –/descendant-or-self::node()/ is abbreviated to // –self::node() is abbreviated to . –parent::node() is abbreviated to .. Example: .//@href selects all href attributes in descendants of the context node. Example: section[position()<6]//cite[@href="there"] selects all cite elements with href="there" attributes in the first 5 sections

41 XSL

42 XSL (eXtensible Stylesheet Language) Why do we need it? –Store in one format, display in another, e.g. transforming XML to XHTML and displaying it in a browser –Convert to a more useful format –Make the document more compact: extract from XML documents only the data we need. In short, we want to produce another document that looks the way we specify

43 XSL (eXtensible Stylesheet Language) consists of two parts: –XSL Transformations (XSLT) XSLT stylesheet is an XML document defining transformation from one class of XML documents into another –XSL Formatting Objects (XSL-FO) Specifying formatting in a more low-level and detailed way

44 A Simple Example File data.xml: Howdy! File render.xsl:

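45 The .xsl File An XSLT document has the .xsl extension The XSLT document begins with: –<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> Contains one or more templates, such as: –<xsl:template match="/"> ... </xsl:template> And ends with: –</xsl:stylesheet>

46 Explanation of render.xsl The <xsl:template match="/"> rule chooses the root. The literal markup that precedes the value is written to the output file. The contents of message is written to the output file. The literal closing markup is written to the output file. The resultant file contains Howdy! wrapped in that literal markup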

47 How XSLT Works The XML text document is read in and stored as a tree of nodes The template is used to select the entire tree The rules within the template are applied to the matching nodes, thus changing the structure of the XML tree –If there are other templates, they must be called explicitly from the main template Unmatched parts of the XML tree are not changed After the template is applied, the tree is written out again as a text document

48 xsl:value-of <xsl:value-of select="XPath expression"/> selects the contents of an element and adds it to the output stream –The select attribute is required –Notice that xsl:value-of is not a container, hence it needs to end with a slash Example (from an earlier slide): <xsl:value-of select="message"/>

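49 xsl:for-each xsl:for-each is a kind of loop statement The syntax is <xsl:for-each select="XPath expression"> Text to insert and rules to apply </xsl:for-each> Example: to select every book ( //book ) and make an unordered list ( <ul> ) of their titles ( title ), use: <ul> <xsl:for-each select="//book"> <li> <xsl:value-of select="title"/> </li> </xsl:for-each> </ul>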

50 Filtering output You can filter (restrict) output by adding a criterion to the select attribute’s value: <xsl:value-of select="title[../author='Terry Smith']"/> This will select book titles by Terry Smith

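51 Filter details Here is the filter we just used: <xsl:value-of select="title[../author='Terry Smith']"/> author is a sibling of title, so from title we have to go up to its parent, book, then back down to author This filter requires a quote within a quote, so we need both single quotes and double quotes Legal filter operators are: = != &lt; &gt; (the escaped forms of < and >, needed because the filter sits inside an XML attribute)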

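52 But it doesn’t work right! Here’s what we did: <xsl:for-each select="//book"> <li> <xsl:value-of select="title[../author='Terry Smith']"/> </li> </xsl:for-each> This will output <li> and </li> for every book, so we will get empty bullets for authors other than Terry Smith There is no obvious way to solve this with just xsl:value-of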

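53 xsl:if xsl:if allows us to include content if a given condition (in the test attribute) is true Example: <xsl:for-each select="//book"> <xsl:if test="author='Terry Smith'"> <li> <xsl:value-of select="title"/> </li> </xsl:if> </xsl:for-each> This does work correctly!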
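
The difference between filtering inside xsl:value-of and guarding the whole list item with xsl:if can be mimicked in a few lines of Python. This is an analogue of the two stylesheet fragments, not XSLT itself; the catalogue, its titles and the function names are made up for illustration (only the author names appear in the slides).

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; titles are invented.
BOOKS_XML = """
<library>
  <book><title>Title One</title><author>Terry Smith</author></book>
  <book><title>Title Two</title><author>Gregory Brill</author></book>
  <book><title>Title Three</title><author>Terry Smith</author></book>
</library>
"""
root = ET.fromstring(BOOKS_XML)

def filtered_value_of(author):
    """Loop over every book and filter only the value: <li> is emitted regardless."""
    items = []
    for book in root.iter("book"):
        title = book.findtext("title") if book.findtext("author") == author else ""
        items.append(f"<li>{title}</li>")
    return items

def guarded_by_if(author):
    """Loop over every book but emit the <li> only when the test holds."""
    return [f"<li>{book.findtext('title')}</li>"
            for book in root.iter("book")
            if book.findtext("author") == author]

print(filtered_value_of("Terry Smith"))
print(guarded_by_if("Terry Smith"))
```

The first function reproduces the empty bullet for the book by another author; the second does not.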

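54 xsl:choose The xsl:choose ... xsl:when ... xsl:otherwise construct is XSLT’s equivalent of Java’s switch ... case ... default statement The syntax is: <xsl:choose> <xsl:when test="some condition"> ... some code ... </xsl:when> <xsl:otherwise> ... some code ... </xsl:otherwise> </xsl:choose> xsl:choose is often used within an xsl:for-each loop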

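55 xsl:sort You can place an xsl:sort inside an xsl:for-each The select attribute of the sort tells what field to sort on Example: <ul> <xsl:for-each select="//book"> <xsl:sort select="author"/> <li> <xsl:value-of select="title"/> by <xsl:value-of select="author"/> </li> </xsl:for-each> </ul> –This example creates a list of titles and authors, sorted by author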
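
As a rough Python analogue of sorting inside a for-each loop (not XSLT itself), the sketch below orders the selected books by author before emitting one list item per book. The catalogue and its titles are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; titles are invented.
BOOKS_XML = """
<library>
  <book><title>Title One</title><author>Terry Smith</author></book>
  <book><title>Title Two</title><author>Gregory Brill</author></book>
</library>
"""
root = ET.fromstring(BOOKS_XML)

# Sort the selected books on the author field; ties keep document order.
books_by_author = sorted(root.iter("book"), key=lambda b: b.findtext("author"))
items = [f"<li>{b.findtext('title')} by {b.findtext('author')}</li>"
         for b in books_by_author]
html = "<ul>" + "".join(items) + "</ul>"
print(html)
```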

56 xsl:apply-templates If you apply a template to an element that has child elements, templates are not automatically applied to those child elements The <xsl:apply-templates> element applies a template rule to the current element or to the current element’s child nodes If we add a select attribute, it applies the template rule only to the child that matches If we have multiple <xsl:apply-templates> elements with select attributes, the child nodes are processed in the same order as the <xsl:apply-templates> elements

57 Applying templates to children XML Terry Smith by With this line: XML by Gregory Brill Without this line: XML

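58 Calling named templates You can name a template, then call it, similar to the way you would call a method in Java The named template: <xsl:template name="templateName"> ...body of template... </xsl:template> A call to the template: <xsl:call-template name="templateName"/> Or: <xsl:call-template name="templateName"> ...parameters... </xsl:call-template>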

59 Processing model A list of source nodes is processed to create a result tree fragment. The result tree is constructed by processing a list containing just the root node. A list of source nodes is processed by appending the result tree structure created by processing each of the members of the list in order. A node is processed by finding all the template rules with patterns that match the node, and choosing the best amongst them; the chosen rule's template is then instantiated with the node as the current node and with the list of source nodes as the current node list. A template typically contains instructions that select an additional list of source nodes for processing. The process of matching, instantiation and selection is continued recursively until no new source nodes are selected for processing.
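
The recursive match, instantiate and select cycle described above can be sketched as a tiny rule engine in Python. This is a loose model under stated assumptions, not an XSLT processor: rules match on element name only, the priority scheme (an exact name beats the `*` wildcard) is a simplification, the root element stands in for the root node, and the names `process`, `process_list` and `RULES` are hypothetical.

```python
import xml.etree.ElementTree as ET

def process(node, rules):
    """Process one node: find the matching rules, choose the best, instantiate it."""
    matching = [rule for rule in rules if rule["match"] in (node.tag, "*")]
    # Assumed priority scheme: an exact name beats the "*" wildcard.
    best = max(matching, key=lambda rule: rule["match"] != "*")
    # The template may select further nodes through the apply callback.
    return best["template"](node, lambda nodes: process_list(nodes, rules))

def process_list(nodes, rules):
    """Process a node list in order and append the results."""
    return "".join(process(node, rules) for node in nodes)

RULES = [
    # Fallback rule: emit nothing for the node itself, recurse into its children.
    {"match": "*", "template": lambda node, apply: apply(list(node))},
    {"match": "book", "template": lambda node, apply:
        "<li>" + apply(node.findall("title")) + "</li>"},
    {"match": "title", "template": lambda node, apply: node.text or ""},
]

root = ET.fromstring(
    "<library><book><title>A</title><author>X</author></book>"
    "<book><title>B</title></book></library>")
result = process_list([root], RULES)  # start from a list holding just the root
print(result)
```

The recursion stops when a template selects no further nodes, as the `title` rule does here; `author` is never output because no template selects it.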

60 XQuery

61 Enter XQuery XML documents generalize relational data. A relation R with columns A, B, C and rows (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) becomes: <R> <tuple> <A>a1</A> <B>b1</B> <C>c1</C> </tuple> <tuple> <A>a2</A> <B>b2</B> <C>c2</C> </tuple> … </R> How should query languages like SQL be similarly generalized?
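
The encoding of a relation as nested tuple elements can be sketched in a few lines of Python with `xml.etree.ElementTree`. The function name `relation_to_xml` is an assumption; the relation, column and value names are the ones on the slide.

```python
import xml.etree.ElementTree as ET

def relation_to_xml(name, columns, rows):
    """Encode a relation as <name><tuple><col>value</col>...</tuple>...</name>."""
    relation = ET.Element(name)
    for row in rows:
        tuple_element = ET.SubElement(relation, "tuple")
        for column, value in zip(columns, row):
            ET.SubElement(tuple_element, column).text = value
    return relation

r = relation_to_xml("R", ["A", "B", "C"],
                    [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3")])
xml_text = ET.tostring(r, encoding="unicode")
print(xml_text)
```

Every row becomes one `tuple` element and every column one child element, which is why any relational table fits into XML while the reverse does not hold in general.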

62 FLWOR Expressions The main engine of XQuery is the FLWOR expression: –For-Let-Where-Order-Return –pronounced "flower" –generalizes SELECT-FROM-WHERE from SQL
for $d in document("depts.xml")//deptno
let $e := document("emps.xml")//employee[deptno = $d]
where count($e) >= 10
order by avg($e/salary) descending
return { $d, {count($e)}, {avg($e/salary)} }
–for: generates an ordered list of bindings of deptno values $d –let: for each $d, $e = the list of emp elements with that department number, so we have an ordered list of tuples ($d,$e) –where: filters that list to retain only the desired tuples –order by: sorts that list by the given criteria –return: constructs for each tuple a resulting value
The result is a list of departments with at least 10 employees, sorted by average salaries.
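
To make the five clauses concrete, here is a Python analogue of the query, with one statement per clause. It is a sketch, not XQuery: the function name `big_departments`, the `minimum` parameter and the sample data are assumptions, and the result is a list of plain triples rather than constructed elements.

```python
import xml.etree.ElementTree as ET
from statistics import mean

def big_departments(depts_root, emps_root, minimum=10):
    """Return (deptno, headcount, average salary) triples, highest average first."""
    tuples = []
    for d in depts_root.iter("deptno"):                        # for $d in ...//deptno
        e = [emp for emp in emps_root.iter("employee")
             if emp.findtext("deptno") == d.text]              # let $e := ...
        if len(e) >= minimum:                                  # where count($e) >= 10
            salaries = [float(emp.findtext("salary")) for emp in e]
            tuples.append((d.text, len(e), mean(salaries)))
    tuples.sort(key=lambda t: t[2], reverse=True)              # order by avg(...) descending
    return tuples                                              # return ...

def employee(deptno, salary):
    """Build one hypothetical employee element as text."""
    return f"<employee><deptno>{deptno}</deptno><salary>{salary}</salary></employee>"

# Hypothetical stand-ins for depts.xml and emps.xml.
DEPTS = ET.fromstring("<depts>" + "".join(
    f"<dept><deptno>{n}</deptno></dept>" for n in (10, 20, 30)) + "</depts>")
EMPS = ET.fromstring("<emps>" + "".join(
    employee(d, s) for d, s in
    [(10, 100), (10, 200), (20, 300), (20, 400), (30, 900)]) + "</emps>")

report = big_departments(DEPTS, EMPS, minimum=2)
print(report)
```

The sample uses a threshold of 2 instead of 10 so that a handful of employees is enough to show the filter dropping department 30.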

63 List Expressions XQuery expressions often manipulate lists of values for $p in distinct-values(document("bib.xml")//publisher) let $a := avg(document("bib.xml")//book[publisher = $p]/price) return { $p/text() } { $a } List functions: distinct-values, avg, …

64 Conditional expressions XQuery supports a general if-then-else construction. The query below extracts from the holdings of a library the titles and either editors or authors: for $h in document("library.xml")//holding return { $h/title, if ($h/@type = "Journal") then $h/editor else $h/author }

65 Quantified Expressions for $b in document("bib.xml")//book where some $p in $b//paragraph satisfies ( contains($p,"sailing") and contains($p,"fishing") ) return $b/title finds the titles of all books which mention both sailing and fishing in the same paragraph. for $b in document("bib.xml")//book where every $p in $b//paragraph satisfies contains($p,"sailing") return $b/title finds the titles of all books which mention sailing in every paragraph.
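
The `some` and `every` quantifiers correspond closely to Python's `any()` and `all()`. The sketch below reruns both queries as Python analogues on a hypothetical bibliography; the titles and paragraph texts are invented. As with `every` over an empty sequence in XQuery, `all()` is true for a book with no paragraphs.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography.
BIB_XML = """
<bib>
  <book><title>Coastal Life</title>
    <paragraph>We went sailing and fishing.</paragraph>
    <paragraph>Then we went sailing again.</paragraph>
  </book>
  <book><title>Two Hobbies</title>
    <paragraph>I like sailing.</paragraph>
    <paragraph>I also like fishing.</paragraph>
  </book>
</bib>
"""
root = ET.fromstring(BIB_XML)

def paragraphs(book):
    """Return the full text of every paragraph anywhere inside the book."""
    return ["".join(p.itertext()) for p in book.iter("paragraph")]

# some $p in $b//paragraph satisfies (contains(...) and contains(...))
both_in_one_paragraph = [
    b.findtext("title") for b in root.iter("book")
    if any("sailing" in p and "fishing" in p for p in paragraphs(b))]

# every $p in $b//paragraph satisfies contains($p, "sailing")
sailing_everywhere = [
    b.findtext("title") for b in root.iter("book")
    if all("sailing" in p for p in paragraphs(b))]

print(both_in_one_paragraph, sailing_everywhere)
```

The second book mentions both words, but never in the same paragraph, so the `some` query skips it.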