Query Languages
Aswin Yedlapalli
XML Query data model
- A document is viewed as a labeled tree of nodes.
- The successors of a node may be:
  - an ordered sequence of nodes (e.g., for subelements), or
  - an unordered set of nodes (e.g., for attributes).
- The model is compatible with XML schemas.
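A minimal sketch of this model, assuming a hypothetical `Element` class (not part of any standard API): attributes are kept in a dict, whose equality ignores order, and subelements in a list, whose equality is positional. That mirrors the unordered-set vs. ordered-sequence distinction above.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A labeled node in a simplified XML data model (illustrative sketch)."""

    label: str
    # Attributes: an unordered set of name/value pairs. Dict equality in
    # Python ignores insertion order, matching the unordered semantics.
    attributes: dict = field(default_factory=dict)
    # Subelements and text: an ordered sequence. List equality is positional.
    children: list = field(default_factory=list)


book = Element(
    "book",
    attributes={"year": "1999", "isbn": "123"},
    children=[
        Element("title", children=["Title A"]),
        Element("author", children=["Author 1"]),
        Element("author", children=["Author 2"]),
    ],
)

# Same attributes listed in a different order: still the same node.
same = Element("book", attributes={"isbn": "123", "year": "1999"}, children=book.children)
# Same subelements in a different order: a different document.
reordered = Element("book", attributes=book.attributes, children=book.children[::-1])

print(same == book)       # True
print(reordered == book)  # False
```

The data values here are hypothetical; only the ordered/unordered behavior is the point.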
Comparison of XML and semistructured data
Similarities:
- Both are best described by a labeled graph.
- Both are schema-less and self-describing.
Differences:
- XML is ordered; semistructured data is unordered.
- XML can mix text and elements (mixed content).
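The ordering difference can be made concrete with two equality tests, sketched here under an assumed encoding (a tree is a string leaf or a `(label, children)` pair; the function names are hypothetical). XML equality compares children position by position; the unordered comparison sorts children recursively, treating them as a multiset.

```python
def xml_equal(a, b):
    """Ordered (XML) comparison: children must match position by position."""
    return a == b


def canonical(tree):
    """Canonical form for the unordered model: sort children recursively.

    A tree is either a string leaf or a (label, children) pair.
    """
    if isinstance(tree, str):
        return tree
    label, children = tree
    return (label, tuple(sorted((canonical(c) for c in children), key=repr)))


def ssd_equal(a, b):
    """Unordered (semistructured) comparison: children form a multiset."""
    return canonical(a) == canonical(b)


a = ("paper", [("title", ["T"]), ("year", ["1999"])])
b = ("paper", [("year", ["1999"]), ("title", ["T"])])
print(xml_equal(a, b), ssd_equal(a, b))  # False True

# Mixed content: text and elements interleaved under one XML node.
# Reordering changes the XML document but not the unordered view.
mixed = ("p", ["Some ", ("b", ["bold"]), " text"])
mixed_swapped = ("p", [" text", ("b", ["bold"]), "Some "])
print(xml_equal(mixed, mixed_swapped), ssd_equal(mixed, mixed_swapped))  # False True
```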
Required features for a query language
Expressive power
- The query language must be at least as expressive as SQL on relational data.
- It should be able to restructure data.
- It should be able to navigate data with arbitrary nesting.
Semantics
- A precise semantics is essential for query transformation and optimization.
Compositionality
- Queries must stay within one data model: a query's output must be in the same model as its input, not in a different one, so that queries can be composed.
Schema
- When a schema is defined, the query language should exploit it for optimization, type checking, etc.
Query languages
For semistructured data:
- Lorel (Lightweight Object REpository Language)
- UnQL (Unstructured Query Language)
- StruQL, MSL, W3QL, WebSQL, Weblog, etc.
For XML:
- XML-QL (XML Query Language)
- XSLT and structural recursion
- XML Query Algebra
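Formal semantics
Given a query $Q = \texttt{SELECT } E[X_1, \ldots, X_n] \texttt{ FROM } F \texttt{ WHERE } C$ and a database $DB$, the answer $\mathrm{Answer}(Q, DB)$ is defined in two steps:
- Step 1: compute all bindings of the variables $X_1, \ldots, X_n$:
$$\begin{array}{ccc} X_1 & \cdots & X_n \\ \hline C_{11} & \cdots & C_{1n} \\ \vdots & & \vdots \\ C_{m1} & \cdots & C_{mn} \end{array}$$
  where the $C_{ij}$ are node oids or atomic values. Every binding must satisfy the paths in $F$ and the conditions in $C$.
- Step 2: the answer is $E[C_{11}, \ldots, C_{1n}], \ldots, E[C_{m1}, \ldots, C_{mn}]$, i.e., $E$ evaluated on each binding.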
- When $E$ contains nested subqueries, apply the semantics recursively.
Note: so far we have assumed an unordered model. What must change to handle order?
Complexity: PTIME in $|DB|$ (but not in $|Q|$).
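The two-step semantics can be sketched as a tiny evaluator over an object graph. Everything here is an assumption for illustration: the toy database, the oid names, and the encoding of FROM as `(source_var, label_path, new_var)` triples with WHERE and SELECT as Python functions. Step 1 enumerates bindings by nested loops and filters them; Step 2 applies $E$ to each binding.

```python
# Hypothetical toy database: complex oids map to (label, target) edges,
# atomic oids map to values.
edges = {
    "&root": [("biblio", "&b")],
    "&b": [("book", "&k1"), ("book", "&k2")],
    "&k1": [("author", "&a1"), ("year", "&y1")],
    "&k2": [("author", "&a2"), ("year", "&y2")],
}
atoms = {"&a1": "Smith", "&y1": 1999, "&a2": "Jones", "&y2": 2001}


def follow(oid, path):
    """All oids reachable from oid along the label sequence path."""
    frontier = [oid]
    for label in path:
        frontier = [t for o in frontier for (l, t) in edges.get(o, []) if l == label]
    return frontier


def bindings(from_clause, where):
    """Step 1: all variable bindings satisfying the paths in F and condition C."""
    envs = [{"root": "&root"}]
    for source, path, var in from_clause:
        envs = [{**env, var: t} for env in envs for t in follow(env[source], path)]
    return [env for env in envs if where(env)]


def answer(select, from_clause, where):
    """Step 2: evaluate E on every binding from Step 1."""
    return [select(env) for env in bindings(from_clause, where)]


# SELECT A FROM biblio.book X, X.year Y, X.author A WHERE Y = 1999
from_clause = [("root", ["biblio", "book"], "X"), ("X", ["year"], "Y"), ("X", ["author"], "A")]
where = lambda env: atoms.get(env["Y"]) == 1999
select = lambda env: ("author", atoms[env["A"]])

print(answer(select, from_clause, where))  # [('author', 'Smith')]
```

For a fixed query with $n$ variables, the nested loops produce at most polynomially many bindings in the size of the database, which is one way to see the PTIME-in-$|DB|$ bound.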
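LOREL
- Minor syntactic differences in regular path expressions: % instead of _ (any single label), # instead of _* (any sequence of labels).
- Common path convention: occurrences of a path prefix such as biblio.book in SELECT and WHERE refer to the object bound in FROM. The query
  SELECT biblio.book.author FROM biblio.book WHERE biblio.book.year = 1999
  becomes
  SELECT X.author FROM biblio.book X WHERE X.year = 1999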
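A sketch of how the two wildcards could be matched against label paths, assuming a nested-list tree encoding and hypothetical function names. It models only the behavior stated above (% as one whole label, # as any sequence, including the empty one); it is not Lore's implementation.

```python
from functools import lru_cache


def matches(pattern, labels):
    """True if the label sequence `labels` matches the path `pattern`.

    '%' matches exactly one label; '#' matches any sequence of labels
    (including the empty one); any other token must equal the label.
    """

    @lru_cache(maxsize=None)
    def go(i, j):
        if i == len(pattern):
            return j == len(labels)
        tok = pattern[i]
        if tok == "#":
            # Either '#' matches nothing more, or it consumes one more label.
            return go(i + 1, j) or (j < len(labels) and go(i, j + 1))
        if j == len(labels):
            return False
        return (tok == "%" or tok == labels[j]) and go(i + 1, j + 1)

    return go(0, 0)


def select_path(tree, pattern, prefix=()):
    """Yield every value reached from `tree` along a label path matching `pattern`."""
    if matches(pattern, prefix):
        yield tree
    if isinstance(tree, list):
        for label, child in tree:
            yield from select_path(child, pattern, prefix + (label,))


# Hypothetical data: a node is a list of (label, child) edges.
db = [("biblio", [
    ("book", [("author", "Smith"), ("year", 1999)]),
    ("paper", [("author", "Jones"), ("year", 2001)]),
])]

print(list(select_path(db, ("biblio", "%", "author"))))     # ['Smith', 'Jones']
print(list(select_path(db, ("#", "year"))))                 # [1999, 2001]
print(list(select_path(db, ("biblio", "book", "author"))))  # ['Smith']
```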
Lorel
- The query language of the Lore system; it adapts OQL to semistructured data.
- Example:
  select X.title from bib.article X where "tova milo" in X.author
  returns {title: "type inf…"}
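The example query can be traced by hand with nested loops, assuming a nested-list encoding of the data. The titles and the second author are hypothetical placeholders (the real result above is truncated). Note that `in X.author` is a membership test over all author values of X, and each selected title becomes a new edge in the answer.

```python
# Hypothetical toy database: a node is a list of (label, value) edges.
db = [("bib", [
    ("article", [("title", "Paper A"), ("author", "tova milo"), ("author", "someone else")]),
    ("article", [("title", "Paper B"), ("author", "someone else")]),
])]


def children(node, label):
    """All values reached from node by one edge with the given label."""
    return [v for l, v in node if l == label] if isinstance(node, list) else []


def query(db):
    result = []
    for bib in children(db, "bib"):
        for x in children(bib, "article"):             # from bib.article X
            if "tova milo" in children(x, "author"):   # where "tova milo" in X.author
                # select X.title: each title becomes a new edge in the answer
                result.extend(("title", t) for t in children(x, "title"))
    return result


print(query(db))  # [('title', 'Paper A')]
```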
Features of Lorel
Differences from typed query languages:
- performs implicit coercions;
- handles missing attributes;
- handles set-valued attributes, e.g., in x.year > 1998, x may have several year values.
Other features:
- The select clause creates new nodes.
- Nested queries are allowed.
- Regular path expressions are allowed.
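One way to read a predicate like `x.year > 1998` under these rules, sketched with hypothetical helpers: the comparison is existential over all year values, a numeric string is coerced to a number, and a missing or non-numeric year makes the predicate false instead of raising a type error. Lorel's exact coercion rules are richer; this only illustrates the three behaviors listed above.

```python
def coerce_number(value):
    """Assumed coercion: numbers pass through, numeric strings are parsed, else None."""
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return None
    return None


def year_greater(obj, bound):
    """x.year > bound: true if SOME year value, after coercion, exceeds bound.

    A missing year yields no values, so the predicate is simply false.
    """
    years = [v for label, v in obj if label == "year"]
    return any(n is not None and n > bound for n in map(coerce_number, years))


print(year_greater([("year", 1997), ("year", "1999")], 1998))  # True (set-valued + coercion)
print(year_greater([("title", "no year")], 1998))              # False (missing attribute)
print(year_greater([("year", "unknown")], 1998))               # False (coercion fails)
```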
UnQL (Unstructured Query Language)
- UnQL is an extension of basic Lorel.
- Unlike Lorel, UnQL does not perform coercions.
- The where clause contains two kinds of constructs:
  - generators: variables are bound via patterns;
  - conditions: as in Lorel.
- No from clause is needed, because variables are bound in patterns.
UnQL queries
- Example:
  select title:T where {bib:article:{title:T, year:Y}} in db, Y > 1998
- The root of the database is represented explicitly: db.
- UnQL queries can be rewritten in Lorel. The equivalent Lorel query for the example above is:
  select title:T from bib.article A, A.title T, A.year Y where Y > 1998
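The generator in the UnQL example can be sketched as a pattern matcher, assuming a nested-list encoding of both data and patterns and a hypothetical `Var` marker for pattern variables. Matching the pattern against `db` yields every consistent binding of T and Y; the condition on Y then filters those bindings, and no separate from clause is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A pattern variable such as T or Y (hypothetical encoding)."""
    name: str


def match(pattern, data, env):
    """Yield every extension of env under which pattern matches data."""
    if isinstance(pattern, Var):
        if pattern.name not in env:
            yield {**env, pattern.name: data}
        elif env[pattern.name] == data:
            yield env
    elif isinstance(pattern, list):
        if isinstance(data, list):
            yield from match_edges(pattern, data, env)
    elif pattern == data:  # constant in the pattern
        yield env


def match_edges(edges, data, env):
    """Every pattern edge (label, sub) must match some data edge with that label."""
    if not edges:
        yield env
        return
    (label, sub), rest = edges[0], edges[1:]
    for l, child in data:
        if l == label:
            for env2 in match(sub, child, env):
                yield from match_edges(rest, data, env2)


# Hypothetical data; the third article has no year, so it cannot match.
db = [("bib", [
    ("article", [("title", "Paper A"), ("year", 1999)]),
    ("article", [("title", "Paper B"), ("year", 1995)]),
    ("article", [("title", "Paper C")]),
])]

# select title:T where {bib:article:{title:T, year:Y}} in db, Y > 1998
pattern = [("bib", [("article", [("title", Var("T")), ("year", Var("Y"))])])]
result = [("title", env["T"]) for env in match(pattern, db, {}) if env["Y"] > 1998]
print(result)  # [('title', 'Paper A')]
```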
Additional features of LOREL
- Label variables:
  - can combine "schema" and "data" information;
  - can turn tables into data and vice versa;
  - can perform group-by operations.
- Variables can be matched against regular expressions.
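A rough sketch of what label variables buy, assuming a relation stored as one `row` edge per tuple and hypothetical helper names. Binding a variable L to edge labels, as in a pattern {row: {L: V}}, turns attribute names (schema) into ordinary values. The reverse direction uses a data value as an edge label, which also groups the rows, much like a group-by.

```python
# Hypothetical relation stored as semistructured data: one "row" edge per tuple.
db = [
    ("row", [("name", "Alice"), ("dept", "Sales")]),
    ("row", [("name", "Bob"), ("dept", "Sales")]),
]


def label_bindings(db):
    """Bind label variable L and value variable V for every edge row.L -> V."""
    for label, row in db:
        if label == "row":
            for L, V in row:
                yield L, V


def schema_as_data(db):
    """Schema to data: attribute names (labels) become ordinary values."""
    return [[("attribute", L), ("value", V)] for L, V in label_bindings(db)]


def group_by_dept(db):
    """Data to schema: each dept VALUE becomes an edge LABEL grouping names."""
    groups = {}
    for label, row in db:
        if label != "row":
            continue
        fields = dict(row)
        groups.setdefault(fields["dept"], []).append(("name", fields["name"]))
    return list(groups.items())


print(schema_as_data(db)[0])  # [('attribute', 'name'), ('value', 'Alice')]
print(group_by_dept(db))      # [('Sales', [('name', 'Alice'), ('name', 'Bob')])]
```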
References
- Managing XML and Semistructured Data, lecture series by Prof. Dan Suciu. Website: iu/COURSES/590DS/