XML-QL: A Query Language for XML, Version 0.6
04/2000 XML-QL
Outline
  * Introduction
  * Examples in XML-QL
  * A Data Model for XML
  * Advanced Examples in XML-QL
  * Extensions and Open Issues
  * Summary
Why do we need a query language?
  The XML standard doesn't address:
  * Extraction: How will data be extracted from large XML documents?
  * Transformation: How will XML data be exchanged between user communities using different but related DTDs?
  * Integration: How will XML data from multiple XML sources be integrated?
  * Conversion: How will data be converted between relational or object-oriented databases and XML?
What does XML-QL do?
  * Extraction of data pieces from XML documents
  * Transformation: mapping XML data between different DTDs
  * Integration/combination of XML data from different sources
Data Extraction
  How will data be extracted from large XML documents?
Data Transformation
  How will XML data be exchanged between user communities using different but related DTDs?
Data Integration
* "Relational complete"
    * Can express selection, join, etc.
    * Nested queries
* Precise semantics
    * To support reasoning
* XML-specific features
    * Regular-path expressions and tag variables
    * No DTD required; a DTD is exploited when available
* Rewritability
* Preserve order and association
    * Ordering of elements, grouping of subelements
    * Server-side processing
* Prototype implementation in Java
Requirements for a query language for XML
  * Selection and extraction
  * Preserve structure
  * Reduction
  * Restructuring
  * Join
  (will be shown in the first part)
Requirements for a query language for XML (cont)
  * Tag variables
  * Regular path expressions
  * Transforming XML data between DTDs
  * No schema required
  * Indexing
  * Sorting
  (will be shown in the advanced part)
Class number: CS 401
  Class: Web and XML
  Instructor: Sanjay Madria
  Lesson title: XML-QL
BIB.DTD
Basic Examples: Selection/Extraction
  Find all the names of the authors whose publisher is Addison-Wesley:
    WHERE <book>
            <publisher><name>Addison-Wesley</name></publisher>
            <title> $t </title>
            <author> $a </author>
          </book> IN "…"
    CONSTRUCT $a
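The selection in this slide can be approximated outside XML-QL. The following is a minimal Python sketch, not part of the original deck: the sample document, the `bib` root element and the function name `authors_by_publisher` are assumptions made for illustration, and the element names follow the book example used in the deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the deck's book example.
BIB = """
<bib>
  <book year="1995">
    <title>An Introduction to DB Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundations for OR Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1999">
    <title>Some Other Book</title>
    <author><lastname>Smith</lastname></author>
    <publisher><name>Other Press</name></publisher>
  </book>
</bib>
"""


def authors_by_publisher(xml_text, publisher):
    """Return the last names of all authors of books from `publisher`."""
    root = ET.fromstring(xml_text)
    names = []
    for book in root.findall("book"):
        # The pattern <publisher><name>...</name></publisher> acts as a filter.
        if book.findtext("publisher/name") == publisher:
            # One binding of $a per <author> element of a matching book.
            for author in book.findall("author"):
                names.append(author.findtext("lastname"))
    return names


print(authors_by_publisher(BIB, "Addison-Wesley"))
```

As in the XML-QL query, an author who wrote several matching books appears once per book.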
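Basic Examples, syntax (cont)
  The use of </> instead of a full closing tag such as </book>:
    WHERE <book>
            <publisher><name>Addison-Wesley</></>
            <title> $t </>
            <author> $a </>
          </> IN "…"
    CONSTRUCT $a
Result of first query
  The output is in XML form:
    <lastname>Date</lastname>
    <lastname>Darwen</lastname>
    <lastname>Date</lastname>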
Constructing new XML data: Reduction & Restructuring
    WHERE <book>
            <publisher><name>Addison-Wesley</></>
            <title> $t </>
            <author> $a </>
          </> IN "…"
    CONSTRUCT <result>
                <author> $a </>
                <title> $t </>
              </>
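XML-QL Example Data
    <book year="1995">
      <title>An Introduction to DB Systems</title>
      <author><lastname>Date</lastname></author>
      <publisher><name>Addison-Wesley</name></publisher>
    </book>
    <book year="1998">
      <title>Foundations for OR Databases</title>
      <author><lastname>Date</lastname></author>
      <author><lastname>Darwen</lastname></author>
      <publisher><name>Addison-Wesley</name></publisher>
    </book>
Constructing new XML data: (result)
    <result>
      <author><lastname>Date</lastname></author>
      <title>An Introduction to DB Systems</title>
    </result>
    <result>
      <author><lastname>Date</lastname></author>
      <title>Foundation for OR Databases</title>
    </result>
    <result>
      <author><lastname>Darwen</lastname></author>
      <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    </result>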
Grouping with Nested Queries: Preserve structure
    WHERE <book> $p </> IN "…",
          <publisher><name>Addison-Wesley</></> IN $p,
          <title> $t </> IN $p
    CONSTRUCT <result>
                <title> $t </>
                WHERE <author> $a </> IN $p
                CONSTRUCT <author> $a </>
              </>
Reduction
    WHERE <book>
            <publisher><name>Addison-Wesley</></>
            <title> $t </> ELEMENT_AS $x
            <author> $a </> ELEMENT_AS $y
          </> IN "…"
    CONSTRUCT <result> $x $y </>
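Grouping with Nested Queries: Preserve structure (using CONTENT_AS)
    WHERE <book>
            <publisher><name>Addison-Wesley</></>
            <title> $t </>
          </> CONTENT_AS $p IN "…"
    CONSTRUCT <result>
                <title> $t </>
                WHERE <author> $a </> IN $p
                CONSTRUCT <author> $a </>
              </>
Grouping with Nested Queries: (result)
    <result>
      <title>An Introduction to Database Systems</title>
      <author><lastname>Date</lastname></author>
    </result>
    <result>
      <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
      <author><lastname>Date</lastname></author>
      <author><lastname>Darwen</lastname></author>
    </result>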
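The effect of the nested query (one result per book, with all of that book's authors grouped under its title) can be sketched in Python as follows. This is an illustrative approximation only: the sample data, the `bib` root and the function name `group_authors_by_title` are assumptions, not part of the deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the deck's book example.
BIB = """
<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>
"""


def group_authors_by_title(xml_text, publisher):
    """Return one (title, [author last names]) pair per matching book."""
    root = ET.fromstring(xml_text)
    results = []
    for book in root.iter("book"):            # outer query: one binding of $p per book
        if book.findtext("publisher/name") != publisher:
            continue
        title = book.findtext("title")
        # Nested query: iterate over the authors inside the same $p.
        authors = [a.findtext("lastname") for a in book.findall("author")]
        results.append((title, authors))
    return results


for title, authors in group_authors_by_title(BIB, "Addison-Wesley"):
    print(title, authors)
```

Compared with the flat query, each title now appears once, which is what "preserve structure" refers to.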
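Joining elements by value
    WHERE <article>
            <author>
              <firstname> $fn </>   -- firstname $fn
              <lastname> $ln </>    -- lastname $ln
            </>
          </> CONTENT_AS $a IN "…",
          <book year=$y>
            <author>
              <firstname> $fn </>   -- join on the same firstname $fn
              <lastname> $ln </>    -- join on the same lastname $ln
            </>
          </> IN "…",
          $y > 1995
    CONSTRUCT <article> $a </>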
ELEMENT_AS vs. CONTENT_AS
    WHERE <article>
            <author>
              <firstname> $fn </>   -- firstname $fn
              <lastname> $ln </>    -- lastname $ln
            </>
          </> ELEMENT_AS $a IN "…"
          …
    CONSTRUCT $a                    -- no need for <article> … </>
A data model for XML
  * XML is a data format (syntax)
  * Query operations assume a data model
  * XML graph
    * Directed, labeled graph
    * Element tags on edges
    * Attribute values on nodes
    * Each node is represented by an OID (a unique string)
    * Unordered and ordered models
    * Leaves labeled with values
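One way to picture the XML graph described above is to build it explicitly. The following Python sketch is only an illustration under stated assumptions: the OID format (`o1`, `o2`, ...), the function name `xml_to_graph` and the three-part return value are invented here and are not XML-QL's internal representation.

```python
import itertools
import xml.etree.ElementTree as ET


def xml_to_graph(xml_text):
    """Turn an XML document into (edges, attrs, values).

    edges  : list of (source_oid, tag, target_oid), element tags label the edges
    attrs  : dict oid -> attribute dict, attributes sit on nodes
    values : dict oid -> string, only leaf nodes carry values
    """
    counter = itertools.count(1)
    edges, attrs, values = [], {}, {}

    def visit(elem, parent_oid):
        oid = f"o{next(counter)}"          # hypothetical OID scheme
        edges.append((parent_oid, elem.tag, oid))
        if elem.attrib:
            attrs[oid] = dict(elem.attrib)
        children = list(elem)
        if children:
            for child in children:
                visit(child, oid)
        else:
            values[oid] = (elem.text or "").strip()

    visit(ET.fromstring(xml_text), "root")
    return edges, attrs, values


edges, attrs, values = xml_to_graph(
    '<bib><book year="1995"><title>T</title></book></bib>'
)
print(edges)
print(attrs)
print(values)
```

The list of edges keeps document order, so the same structure can serve the ordered model; ignoring that order gives the unordered model.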
XML graph for the example XML book elements
  [Figure: XML graph. From the root, two book edges lead to nodes carrying year="1995" and year="1998". Each book node has title, author and publisher edges; author leads to lastname and publisher leads to name. Leaf values: An introduction …, Date, Addison-Wesley, Foundations for..., Date, Darwen.]
Element Identity, IDs, and ID References
  * XML reserves an attribute of type ID, which allows a unique key to be associated with an element.
  * An attribute of type IDREF allows an element to refer to another element with the designated key, and an attribute of type IDREFS may refer to multiple elements.
  * Example: adding an attribute ID of type ID and an attribute author of type IDREFS:
Element Identity, IDs, and ID References (cont)
  * and the element definitions:
      John Smith ... … 1995
XML graph including ID & IDREFS
  [Figure: XML graph. From the root, person and article edges. The person node has first name and last name edges (John, Smith); the article node has title, year (1995) and author edges.]
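Writing queries using IDs
  * Without IDs:
      WHERE <article><author><lastname> $n </></></> IN "abc.xml"
  * Using IDREF (all last name, title pairs):
      WHERE <article author=$i>
              <title></> ELEMENT_AS $t
            </>,
            <person ID=$i>
              <lastname></> ELEMENT_AS $l
            </>
      CONSTRUCT <result> $t $l </>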
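The IDREF-based query joins each article to the persons its author attribute refers to. A rough Python equivalent is sketched below; the sample data, the ID values `p1` and `p2`, the `db` root and the function name are hypothetical and chosen only to make the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: person elements carry an ID, articles refer to them
# through a whitespace-separated IDREFS attribute named "author".
DATA = """
<db>
  <person ID="p1"><firstname>John</firstname><lastname>Smith</lastname></person>
  <person ID="p2"><firstname>Ann</firstname><lastname>Jones</lastname></person>
  <article author="p1 p2"><title>Sample Title</title><year>1995</year></article>
</db>
"""


def lastname_title_pairs(xml_text):
    """Return (lastname, title) pairs by following IDREFS to persons."""
    root = ET.fromstring(xml_text)
    # Index persons by their ID so each reference is a dictionary lookup.
    by_id = {p.get("ID"): p for p in root.iter("person")}
    pairs = []
    for article in root.iter("article"):
        for ref in article.get("author", "").split():
            person = by_id.get(ref)
            if person is not None:          # skip dangling references
                pairs.append((person.findtext("lastname"),
                              article.findtext("title")))
    return pairs


print(lastname_title_pairs(DATA))
```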
Another Example
    <book id = "b1" author idref = "a1"> Memoris… 1997 Arthur Golden
  [Figure: XML graph for the catalogue. The root has book and author edges to nodes oid1 and oid2 (IDs b1 and a1); leaves Golden, Arthur, 1997, Memories...; reference edges publication=b1 and author=a1.]
Scalar Values
  * Only leaf nodes in the XML graph may contain values, and each leaf may have only one value.
  * Example: the XML fragment
      <title>A trip to <titlepart>The Moon</titlepart></title>
    can be translated, in order to fit the data model, into
      <title><CDATA>A trip to</CDATA><titlepart><CDATA>The Moon</CDATA></titlepart></title>
  * The value of a leaf node is its oid.
Scalar Values (cont)
  [Figure: XML graph for the fragment above. The title edge leads to a node with a CDATA leaf "A trip to" and a titlepart edge whose CDATA leaf is "The Moon".]
Element Order
  * XML-QL supports two distinct data models: an unordered one and an ordered one.
  * An ordered graph is like an unordered one but includes, for each node, a total order on its successors.
  * The price of the ordered model is a more complex query-language semantics and less efficient evaluation.
Mapping XML graphs into XML documents
  * An XML graph does not have a unique representation as an XML document because:
    * element order is unspecified;
    * nodes may be shared.
  * To create an XML document we have to choose some order that conforms to a DTD.
Advanced examples in XML-QL
  * Tag variables
  * Regular-path expressions
  * Transforming XML data
  * Integrating data from multiple XML sources
  * No schema required
  * Function definitions and DTDs
  * External functions
  * Ordered model: sorting, indexing
Tag Variables, No schema required
  All publications published in 1995 in which Date is either an author or an editor:
    WHERE <$p>                    -- $p can be {article, book}
            <title> $t </>
            <year> 1995 </>       -- referring to an attribute as an element
            <$e> Date </>
          </> IN "bib.xml",
          $e IN {author, editor}
    CONSTRUCT <$p>
                <title> $t </>
                <$e> Date </>
              </>
Query Result
    Date
    An Introduction to Database Systems
    Date
    The New Jersey Machine-Code Toolkit
Regular Path Expressions
  * XML data can specify nested and cyclic structures, such as trees, directed acyclic graphs, and arbitrary graphs.
  * The following DTD defines a self-recursive element part:
Regular Path Expressions (cont)
  Here part* is a regular path expression; it matches any sequence of edges, all of which are labeled part:
    WHERE <part*> <name> $r </> <brand> Ford </> </> IN "…"
    CONSTRUCT <result> $r </>
Regular Path Expressions (cont)
  * The path pattern
      <part*> <name> $r </> <brand> Ford </> </>
  * is equivalent to the following infinite sequence of patterns:
    * <name> $r </> <brand> Ford </>
    * <part> <name> $r </> <brand> Ford </> </>
    * <part> <part> <name> $r </> <brand> Ford </> </> </>
    * ...
Regular Path Expressions (cont)
  * The wildcard '*' matches any tag and may appear wherever a tag is permitted.
  * Example:
      WHERE <$*> <name> $r </> <brand> Ford </> </> IN "…"
      CONSTRUCT <result> $r </>
Regular Path Expressions (cont)
  * '.' denotes concatenation of regular expressions:
      <part.name> … </>  =  <part> <name> … </> </>
  * '|' denotes alternation of regular expressions
  * The '+' operator means one or more:
      <part+>  =  <part.part*>
  * Tag variables make it possible to write a query that can be applied to two or more XML data sources with similar but not identical DTDs.
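The meaning of part* (follow zero or more part edges, then match the rest of the pattern) can be illustrated with a small recursive traversal. This Python sketch is an approximation for illustration: the sample data, the `parts` root, the name and brand element names and the function name `names_via_part_star` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts data with a self-recursive <part> element.
PARTS = """
<parts>
  <part><name>Engine</name><brand>Ford</brand>
    <part><name>Piston</name><brand>Ford</brand></part>
    <part><name>Belt</name><brand>Acme</brand></part>
  </part>
  <part><name>Radio</name><brand>Sony</brand></part>
</parts>
"""


def names_via_part_star(root, brand):
    """Names of nodes reachable from `root` over zero or more 'part' edges
    whose <brand> child equals `brand`."""
    found = []

    def walk(node):
        # Try to match the inner pattern at the current node ...
        if node.findtext("brand") == brand and node.find("name") is not None:
            found.append(node.findtext("name"))
        # ... then extend the path by one more 'part' edge.
        for child in node.findall("part"):
            walk(child)

    walk(root)
    return found


print(names_via_part_star(ET.fromstring(PARTS), "Ford"))
```

Replacing `node.findall("part")` with `list(node)` would approximate the wildcard, which follows an edge with any tag.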
Regular Path Expressions (cont)
    WHERE $r IN "parts.xml"   -- please take a look at parts.XML
    CONSTRUCT $r
  Result:
    Motor Hamilton B&O A2D AMD Woofer Speakers Labtec
Transforming XML data
  * Translate data from one DTD into another.
  * Example: besides BIB.DTD we have another DTD that defines a person:
      <!ELEMENT person (lastname, firstname, address?, phone?, publicationtitle*)>
  * The next query transforms data that conforms to BIB.DTD into data that conforms to the person DTD.
  * The query uses OIDs (object identifiers) and Skolem functions to group results in the same element.
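Transforming XML data with a Skolem function
    WHERE <$> <author>
                <firstname> $fn </>
                <lastname> $ln </>
              </>
              <title> $t </>
          </> IN "…"
    CONSTRUCT <person ID=PersonID($fn, $ln)>
                <firstname> $fn </>
                <lastname> $ln </>
                <publicationtitle> $t </>
              </>
Query Result
    <person>
      <firstname>Mary</firstname>
      <lastname>Fernandez</lastname>
      <publicationtitle>The New Jersey Machine-Code Toolkit</publicationtitle>
    </person>
    <person>
      <firstname>Dan</firstname>
      <lastname>Date</lastname>
      <publicationtitle>The New Jersey Machine-Code Toolkit</publicationtitle>
    </person>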
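A Skolem function can be read as a deterministic key: the same arguments always yield the same object identifier, so every binding with the same first and last name contributes to a single person element. The Python sketch below illustrates that grouping with a dictionary; the key layout, the function name `group_by_skolem` and the sample rows are assumptions for illustration.

```python
def group_by_skolem(rows):
    """Group (firstname, lastname, title) bindings into one person per key.

    The tuple ("PersonID", fn, ln) plays the role of the Skolem term:
    equal arguments give the same key, hence the same output element.
    """
    persons = {}
    for fn, ln, title in rows:
        key = ("PersonID", fn, ln)
        person = persons.setdefault(
            key, {"firstname": fn, "lastname": ln, "publicationtitle": []}
        )
        if title not in person["publicationtitle"]:
            person["publicationtitle"].append(title)
    return list(persons.values())


# Hypothetical variable bindings produced by a WHERE clause.
rows = [
    ("Mary", "Fernandez", "Title A"),
    ("Dan", "Date", "Title A"),
    ("Mary", "Fernandez", "Title B"),
]
for person in group_by_skolem(rows):
    print(person)
```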
Integrating data from multiple XML sources
    WHERE ELEMENT_AS $n
          $ssn
          IN "payroll.xml",        -- take a look at payroll.XML
          $ssn
          ELEMENT_AS $i
          IN "taxpayers.xml"       -- take a look at taxpayers.XML
    CONSTRUCT $n $i
Integrating data from multiple XML sources (result)
    M. Smith
    R. Johnson
    J. Doe
Integrating data from multiple XML sources (Skolem function)
    { WHERE ELEMENT_AS $n
            $ssn
            IN "payroll.xml"
      CONSTRUCT $n }
    { WHERE $ssn
            ELEMENT_AS $i
            IN "taxpayers.xml"
      CONSTRUCT $n $i }
Integrating data (result)
    M. Smith
    R. Johnson
    P. Kent
    J. Doe 35000
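The two-block query with a Skolem function behaves like an outer join: each block contributes to the result element identified by the same key, so a person appears even when only one source mentions them. A minimal Python sketch of that idea follows; the SSN values, incomes, the key layout and the function name `integrate` are hypothetical.

```python
def integrate(payroll, taxpayers):
    """Merge (ssn, name) and (ssn, income) records into one result per ssn.

    ("ResultID", ssn) stands in for the Skolem term shared by both blocks.
    """
    results = {}
    for ssn, name in payroll:                 # first block of the query
        results.setdefault(("ResultID", ssn), {})["name"] = name
    for ssn, income in taxpayers:             # second block of the query
        results.setdefault(("ResultID", ssn), {})["income"] = income
    return list(results.values())


# Hypothetical records extracted from the two sources.
payroll = [("111", "J. Doe"), ("222", "M. Smith")]
taxpayers = [("111", 35000), ("333", 20000)]
print(integrate(payroll, taxpayers))
```

A plain join, as in the single-WHERE version of the query, would keep only the entries present in both sources.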
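Integrating data (cont)
  All titles published in '95, plus the month for journal articles and the publisher for books:
    WHERE <$e>
            <title> $t </>
            <year> 1995 </>
          </> CONTENT_AS $p IN "…"
    CONSTRUCT <result ID=ResultID($p)> <title> $t </> </>
      { WHERE $e = "article", <month> $m </> IN $p
        CONSTRUCT <result ID=ResultID($p)> <month> $m </> </> }
      { WHERE $e = "book", <publisher> $q </> IN $p
        CONSTRUCT <result ID=ResultID($p)> <publisher> $q </> </> }
Query Result
    <result>
      <title>The New Jersey Machine-Code Toolkit</title>
      <month>June</month>
    </result>
    <result>
      <title>An Introduction to Database Systems</title>
      <publisher><name>Addison Wesley</name></publisher>
    </result>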
Function definitions and DTDs
    function query() {
      CONSTRUCT findDeclaredIncome("taxpayers.xml", "payroll.xml")
    }
    function findDeclaredIncome($Taxpayers, $Employees) {
      WHERE $s $x IN $Taxpayers,
            $s $n IN $Employees
      CONSTRUCT $n $x
    }
Function definitions and DTDs (cont)
  Restrictions by DTDs:
    function findDeclaredIncome($Taxpayers: "…", $Employees: "…"): "…" {
      WHERE ….
      CONSTRUCT ….
    }
Embedding queries in data
    WHERE $t $y IN "…",
          $y > 1995
    CONSTRUCT $t
    WHERE $t $y IN "…",
          $y > 1995
    CONSTRUCT $t
Support for an ordered model
  * Variable order is important for binding.
  * Example: the patterns
      WHERE $x $y
    and
      WHERE $y $x
    match the same objects, but in a different order.
  * Take the following XML data:
      string1 string2 string3 string4
Support for an ordered model (cont)
  * With the pattern written as WHERE $x $y, the output is ordered first by $x and then by $y:
      $x       $y
      string1  string2
      string1  string4
      string3  string2
      string3  string4
Support for an ordered model (cont)
  * With the pattern written as WHERE $y $x, the output is ordered first by $y and then by $x:
      $x       $y
      string1  string2
      string3  string2
      string1  string4
      string3  string4
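The two orderings correspond to swapping the outer and inner loop of a nested iteration over the bindings. The short Python sketch below reproduces both tables; the assumption (not stated explicitly in the deck) is that $x ranges over string1 and string3 and $y over string2 and string4.

```python
# Assumed binding candidates, in document order.
x_values = ["string1", "string3"]
y_values = ["string2", "string4"]

# WHERE $x $y: $x is bound in the outer loop, $y in the inner loop.
x_then_y = [(x, y) for x in x_values for y in y_values]

# WHERE $y $x: $y is bound in the outer loop, $x in the inner loop.
y_then_x = [(x, y) for y in y_values for x in x_values]

print(x_then_y)
print(y_then_x)
```

Both lists contain the same four pairs; only their order differs.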
Indexes for elements
  * XML-QL supports element-order variables.
  * Example: …
  * Here $i and $j are bound to integers 0, 1, 2, … that represent the index in the local order of the edges.
Indexes for elements (graph)
  [Figure: the XML graph of the book example, with every edge annotated with its local index (book[0], book[1], title[0], publisher[1], author[2], author[3], name[0], lastname[0]) and nodes numbered (1) to (15). The book nodes carry year="1995" and year="1998"; leaf values: An introduction …, Addison-Wesley, Foundations for..., Date, Darwen.]
Indexes for elements (cont.)
  Example: retrieve all persons whose lastname precedes the firstname:
    WHERE <person> $p </> IN "…",
          <lastname[$j]> $x </> IN $p,
          <firstname[$k]> $y </> IN $p,
          $j < $k
    CONSTRUCT <person> $p </>
ORDER-BY
  Order publications by year and month; preserve the order of the original document for publications within the same year/month:
    WHERE <pub> $p </> IN "…",
          <title> $t </> IN $p,
          <year> $y </> IN $p,
          <month> $m </> IN $p
    ORDER-BY value($y), value($m)
    CONSTRUCT $t
  * The value function returns the CDATA value of a node.
  * In the absence of value, the result would be ordered by OIDs.
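Keeping document order among publications with the same year and month is what a stable sort provides. Python's built-in `sorted` is stable, so a sketch of the ORDER-BY behaviour is short; the sample tuples and titles are hypothetical.

```python
# Hypothetical (title, year, month) bindings, listed in document order.
pubs = [
    ("T1", 1999, 6),
    ("T2", 1995, 3),
    ("T3", 1999, 6),
    ("T4", 1995, 1),
]

# Sort by (year, month). Because sorted() is stable, T1 stays ahead of T3,
# which share the same key, exactly as they appear in the document.
ordered = sorted(pubs, key=lambda p: (p[1], p[2]))
titles = [title for title, _, _ in ordered]
print(titles)
```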
ORDER-BY (cont.)
  Reverse the order of all authors in a publication:
    WHERE <pub> $p </> IN "…"
    CONSTRUCT <pub>
                WHERE <author[$k]> $a </> IN $p
                ORDER-BY $k DESCENDING
                CONSTRUCT <author> $a </>
                WHERE <$e> $v </> IN $p,
                      $e != "author"
                CONSTRUCT <$e> $v </>
              </>
Extensions and open issues
  * Entities
  * User-defined predicates
  * String regular expressions
  * Name spaces
  * Aggregates
  * XML syntax
  * Extensions to other XML-related standards
Entities
  Recognizing entity references, e.g.:
User-defined predicates
    WHERE &M
          $a
          ELEMENT_AS $x IN "abc.xml",
          ATTAddress($a)
    CONSTRUCT $x
String regular expressions
    WHERE &M 'Fern*'
          $a
          ELEMENT_AS $x IN "abc.xml",
          ATTAddress($a)
    CONSTRUCT $x
Name spaces
    WHERE 'Fern*'
          $a
          ELEMENT_AS $x IN "abc.xml",
          ATTAddress($a)
    CONSTRUCT $x
Aggregates
    WHERE ELEMENT_AS $p,
          $x
          IN "bib.xml"
    GROUP-BY $p
    CONSTRUCT $p $min($x) $max($x)
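The proposed GROUP-BY with min and max can be pictured as collecting the $x values per group key and aggregating each group. The Python sketch below shows that idea; the grouping key (a publisher name), the values (years) and the function name `min_max_by_group` are assumptions, since the deck's pattern does not survive in full.

```python
from collections import defaultdict


def min_max_by_group(pairs):
    """Return {group: (min, max)} for an iterable of (group, value) pairs."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)          # GROUP-BY $p
    # $min($x) and $max($x) computed per group.
    return {key: (min(vals), max(vals)) for key, vals in groups.items()}


# Hypothetical (publisher, year) bindings.
bindings = [
    ("Addison-Wesley", 1995),
    ("Addison-Wesley", 1998),
    ("Other Press", 1999),
]
print(min_max_by_group(bindings))
```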
Other Query Languages: XSL vs. XML-QL
  * XSL is intended primarily for specifying the style and layout of XML documents. It consists of:
    * Transformation operations
    * Formatting vocabulary
                          XML-QL   XSL
    XML output               x      x
    No schema required       x      x
    Data extraction          x      x
    Data restructuring       x      x
    Data integration         x
    Schema browsing          x      x
    Relational complete      x
Other Query Languages (cont)
  * Other competing query languages are: Lorel, YATL, XQL, XML-GL, WebL
  * Comparison on a simple query.
  XML-QL:
    CONSTRUCT <bib> {
      WHERE <bib>
              <book year=$y>
                <title> $t </title>
                <publisher><name>Addison-Wesley</name></publisher>
              </book>
            </bib> IN "…",
            $y > 1991
      CONSTRUCT <book year=$y> <title> $t </title> </book>
    } </bib>
Other Query Languages (cont)
  YATL:
    make bib [ *book [ $y ], title [ $t ] ] ]
    match "…"
    with bib [ *book [ $y ], title [ $t ] ], publisher [ name [ $n ] ] ]
    where $n = "Addison-Wesley" and $y > 1991
Other Query Languages (cont)
  XQL:
    document("…
      { book[publisher/name="Addison-Wesley"
          | title }
      }
    -- XQL doesn't have a constructor clause
Other Query Languages (cont)
  LOREL:
    select xml(bib:(
      (select title:t})
      from bib.book b, b.title t, b.year y
      where b.publisher = "Addison-Wesley" and y > 1991)})
Other Query Languages (cont)
  Languages compared: XML-QL, LOREL, XSL, XQL, XML-GL
    XML output               x x x x
    All query operations     x x x
    No schema required       x x x x x
    XML representation       x
    XML embedded             x x x x
    Exploit avail. schema    x x
    Preserve order           x x x x x
Summary/Conclusions
  * XML-QL is a declarative language that provides support for querying, constructing, transforming, and integrating XML data.
  * XML-QL supports both ordered and unordered views of an XML document.
  * XML-QL is based on a model of semi-structured data developed in database research.
  * XML-QL satisfies the absolute set of requirements for a query language cited in the W3C XML Query Requirements Working Draft.
  * XML-QL is a good candidate to become the standard XML query language.
Bibliography
  Articles:
  * XML-QL: A Query Language for XML, W3C, 19/8/98, 99
  * Querying XML Data, IEEE, 1999
  * XML Query Requirements, W3C Working Draft, 31/1/00
  * XML Query Languages: Experiences and Exemplars, -~mff
  WWW sites:
  * db.cis.upenn.edu/~adeutsch/xmlql-demo/html/
  * www-db.research.belllabs.com/user/simeon/xquery.html
Appendix - XML data examples: BIB.DTD
Appendix - XML data examples: BIB.XML
    An Introduction to Database Systems
    Date
    Addison-Wesley
    Foundation for Object/Relational Databases: The Third Manifesto
    Date
    Darwen
    Addison-Wesley
Appendix - XML data examples: Parts.DTD
    <!DOCTYPE Parts [ ]>
Appendix - XML data examples: Parts.XML
    Green Power Juicer
    Green Power
    Motor
    Hamilton
Appendix - XML data examples: Parts.XML (continued)
    Toyota Tercel
    Toyota
    Sony Stereo X11-3
    Sony
    Woofer
    B&O
    A2D
    AMD
    Speakers
    Labtec
Appendix - XML data examples: Payroll.XML
    J. Doe
    M. Smith
    R. Johnson
    P. Kent
    33000
Appendix - XML data examples: TaxPayers.XML